
Hadoop configuration


May 26, 2021 Hadoop




Get ready before you configure Hadoop

1. Modify the hostname. I created three virtual machines here, named node-1, node-2, and node-3. Open the network file on each host, delete its existing contents, and write the hostname directly:

vi /etc/sysconfig/network
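On CentOS 6, for example, the file would end up looking roughly like this on node-1 (an illustrative sketch; use each host's own name):

NETWORKING=yes
HOSTNAME=node-1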

2. Map each IP address to its hostname in /etc/hosts, then reboot the host so the changes take effect.

[root@node-1 redis-cluster]# vim /etc/hosts
192.168.0.1 node-1
192.168.0.2 node-2
192.168.0.3 node-3
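Once the hosts are back up, a quick way to confirm the mapping works (an illustrative check, not part of the original steps):

ping -c 1 node-2    # should resolve to 192.168.0.2
ping -c 1 node-3    # should resolve to 192.168.0.3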

3. Check the firewall and shut it down. Different systems shut down the firewall in different ways; the commands below can be used as a reference.

1. service iptables stop        # stop the firewall
2. chkconfig iptables off       # disable it permanently; it will not start again unless enabled manually
3. chkconfig --list iptables    # view whether it is enabled
4. service iptables status      # view the firewall status
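The iptables commands above apply to older systems such as CentOS 6. On systems that use firewalld instead (for example CentOS 7), the equivalent steps would be roughly as follows (an assumption, not part of the original steps):

systemctl stop firewalld        # stop the firewall
systemctl disable firewalld     # keep it from starting on boot
systemctl status firewalld      # view the firewall status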

4. Set up passwordless SSH login

Enter the command ssh-keygen -t rsa and press Enter four times to accept the defaults.


Then copy the public key to each host with ssh-copy-id. After that, those hosts can be logged in to without a password via "ssh <hostname or IP>", as in the sketch below.

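A minimal sketch of the whole key exchange, run from node-1 (assuming the three hosts above and the root account):

ssh-keygen -t rsa           # press Enter through all prompts
ssh-copy-id root@node-2     # copy the public key to node-2
ssh-copy-id root@node-3     # copy the public key to node-3
ssh node-2                  # should now log in without a password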

Start configuring Hadoop-related files

Upload the Hadoop installation package and decompress it. All the configuration files below live in the hadoop-2.7.6/etc/hadoop directory, and every property shown is written inside the <configuration> tags at the end of its respective file.
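For example, assuming the package was uploaded to /export/server (the path used for scp later on):

cd /export/server
tar -zxvf hadoop-2.7.6.tar.gz      # decompress the installation package
cd hadoop-2.7.6/etc/hadoop         # the configuration files edited below live here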

First: hadoop-env.sh

vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_171   # set JAVA_HOME to your own JDK installation path

Second: core-site.xml

vi core-site.xml
<!-- Specify the file system schema (URI) used by Hadoop, i.e. the address of the HDFS master (NameNode) -->
<property>
	<name>fs.defaultFS</name>
	<value>hdfs://node-1:9000</value>
</property>
<!-- Specify the storage directory for files generated while Hadoop is running -->
<property>
	<name>hadoop.tmp.dir</name>
	<value>/export/data/hddata</value>
</property>
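For reference, inside core-site.xml these properties sit within the <configuration> element, roughly like this (a minimal sketch; the same layout applies to the other *-site.xml files below):

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node-1:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/export/data/hddata</value>
	</property>
</configuration>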

Third: hdfs-site.xml

vi hdfs-site.xml
<!-- Specify the number of HDFS replicas; if not modified, the default is 3 -->
<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>
<!-- Specify which host runs the HDFS SecondaryNameNode -->
<property>
	<name>dfs.namenode.secondary.http-address</name>
	<value>node-2:50090</value>
</property>

Fourth: mapred-site.xml

mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<!-- Specify the framework that MapReduce runs on; here it is set to yarn, the default is local -->
<property>
	<name>mapreduce.framework.name</name>
	<value>yarn</value>
</property>

Fifth: yarn-site.xml

vi yarn-site.xml
<!-- Specify the address of the YARN master, the ResourceManager -->
<property>
	<name>yarn.resourcemanager.hostname</name>
	<value>node-1</value>
</property>


<!-- Auxiliary service that runs on the NodeManager; it must be set to mapreduce_shuffle for MapReduce programs to run -->
<property>
	<name>yarn.nodemanager.aux-services</name>
	<value>mapreduce_shuffle</value>
</property>

Sixth: the slaves file, which lists the hostnames of the slave nodes

vi slaves
node-1
node-2
node-3

Seventh: Add Hadoop to the environment variables

vi /etc/profile
export HADOOP_HOME=/export/server/hadoop-2.7.6
export PATH=$HADOOP_HOME/bin:$PATH
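After editing, reload the profile and check that the hadoop command is found (a quick sanity check; the exact output depends on your build):

source /etc/profile
hadoop version    # should report Hadoop 2.7.6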

The Hadoop installation directory is then sent to each of the other nodes over scp:

scp -r /export/server/hadoop-2.7.6/ root@node-2:/export/server/
scp -r /export/server/hadoop-2.7.6/ root@node-3:/export/server/

The configured environment variable file is also sent to the other nodes:

scp -r /etc/profile root@node-2:/etc/
scp -r /etc/profile root@node-3:/etc/

Then, on every machine, make the environment variables take effect:

source /etc/profile

Configuration instructions for Hadoop
On Hadoop's official website, click Documentation in the lower left corner, select the appropriate version, and scroll to the bottom left, where the *-default.xml files are listed.
- *-default.xml holds the default configuration; it takes effect when the user does not override those options.
- *-site.xml holds user-defined configuration.
Options in the site files take precedence over the defaults: if an option appears in a *-site.xml file, it overrides the corresponding value in *-default.xml.
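As a concrete illustration of this precedence, using the settings above: dfs.replication defaults to 3 in hdfs-default.xml, and the hdfs-site.xml configured earlier sets it to 2, so 2 is the value that takes effect.

<!-- hdfs-default.xml, shipped with Hadoop (not edited by the user) -->
<property>
	<name>dfs.replication</name>
	<value>3</value>
</property>

<!-- hdfs-site.xml, user-defined (as configured above); this value wins -->
<property>
	<name>dfs.replication</name>
	<value>2</value>
</property>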