May 26, 2021 Hadoop
1. Modify the hostname. I created three virtual machines here, named node-1, node-2, and node-3. Open the network file, delete its contents, and write the hostname directly:
vi /etc/sysconfig/network
2. Map IPs to hostnames, then reboot the host:
[root@node-1 redis-cluster]# vim /etc/hosts
192.168.0.1 node-1
192.168.0.2 node-2
192.168.0.3 node-3
3. Check the firewall (it should be shut down). Different systems shut down the firewall in different ways; the following is for reference:
1. service iptables stop      # stop the firewall
2. chkconfig iptables off     # disable it at boot; it will not start again unless enabled manually
3. chkconfig iptables         # check whether it is enabled at boot
4. service iptables status    # check the firewall status
4. SSH password-free login
Enter the command ssh-keygen -t rsa, then press Enter four times to accept the defaults.
The public key is then copied to each host with ssh-copy-id <hostname/IP>; after that, those hosts can be logged in to without a password via ssh <hostname/IP>.
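The key-generation step can be sketched non-interactively: the -N "" and -f flags replace the four Enter presses. The path /tmp/demo_rsa below is a scratch path for illustration only; the real default location is ~/.ssh/id_rsa.

```shell
# Remove any leftover demo keys, then generate an RSA key pair
# without prompts (equivalent to ssh-keygen -t rsa plus four Enters).
rm -f /tmp/demo_rsa /tmp/demo_rsa.pub
ssh-keygen -t rsa -N "" -f /tmp/demo_rsa -q

# On the real cluster, push the public key to every node, e.g.:
#   ssh-copy-id node-1
#   ssh-copy-id node-2
#   ssh-copy-id node-3
ls -l /tmp/demo_rsa /tmp/demo_rsa.pub
```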
Upload the Hadoop installation package, extract it, and go into the hadoop-2.7.6/etc/hadoop directory.
All of the properties below go between the <configuration> tags at the end of their respective configuration files.
First: hadoop-env.sh
vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_171   # set JAVA_HOME to your own JDK installation path
Second: core-site.xml
vi core-site.xml
<!-- Specify the file system schema (URI) that Hadoop uses, i.e. the address of the HDFS master, the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node-1:9000</value>
</property>
<!-- Specify the storage directory for files generated while Hadoop runs (overriding the default) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/export/data/hddata</value>
</property>
Third: hdfs-site.xml
vi hdfs-site.xml
<!-- Specify the number of HDFS block replicas; if unmodified, the default is 3 -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Specify which host runs the HDFS SecondaryNameNode -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node-2:50090</value>
</property>
Fourth: mapred-site.xml
mv mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<!-- Specify the framework MapReduce runs on; here it is yarn (the default is local) -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Fifth: yarn-site.xml
vi yarn-site.xml
<!-- Specify the address of YARN's master, the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node-1</value>
</property>
<!-- Auxiliary service running on the NodeManager; it must be set to mapreduce_shuffle for MapReduce programs to run -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
Sixth: the slaves file, which lists the hostnames of the slave nodes
vi slaves
node-1
node-2
node-3
Seventh: Add Hadoop to the environment variable
vi /etc/profile
export HADOOP_HOME=/export/server/hadoop-2.7.6
export PATH=$HADOOP_HOME/bin:$PATH
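A common variant of the two lines above also puts Hadoop's sbin directory, which holds the cluster start-up scripts such as start-dfs.sh and start-yarn.sh, on the PATH. This is optional and assumes the same install path:

```shell
# /etc/profile additions: bin for the hadoop/hdfs commands,
# sbin for the cluster start/stop scripts (optional variant).
export HADOOP_HOME=/export/server/hadoop-2.7.6
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```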
The Hadoop installation package is then sent to each of the other nodes with scp:
scp -r /export/server/hadoop-2.7.6/ root@node-2:/export/server/
scp -r /export/server/hadoop-2.7.6/ root@node-3:/export/server/
The configured environment-variable file is also sent to the other nodes:
scp -r /etc/profile root@node-2:/etc/
scp -r /etc/profile root@node-3:/etc/
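The four scp commands above follow one fan-out pattern, so they can be written as a loop. The sketch below demonstrates the loop shape with a local cp into hypothetical scratch directories under /tmp, so it runs without remote hosts; the commented-out scp lines are what you would run on the real cluster:

```shell
# Demo of the fan-out pattern using local directories under /tmp.
rm -rf /tmp/demo-src /tmp/node-2 /tmp/node-3
mkdir -p /tmp/demo-src /tmp/node-2 /tmp/node-3
echo "hadoop-2.7.6" > /tmp/demo-src/marker

for host in node-2 node-3; do
  # real cluster: scp -r /export/server/hadoop-2.7.6/ root@$host:/export/server/
  #               scp    /etc/profile                root@$host:/etc/
  cp -r /tmp/demo-src /tmp/$host/
done
ls /tmp/node-2/demo-src/marker /tmp/node-3/demo-src/marker
```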
Then run the following on all machines so the environment variables take effect:
source /etc/profile
Configuration notes for Hadoop
On Hadoop's official website, click Documentation in the lower-left corner, choose the appropriate version, and scroll to the bottom-left corner, where the .xml files are listed:
*-default.xml is the default configuration; it takes effect for any option the user does not modify
*-site.xml is the user-defined configuration
Options in the site file take precedence over the defaults: if an option appears in the site file, it overrides the same option in the default file.
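The precedence rule can be illustrated with a small shell sketch. The variable names are made up; they simply mimic hdfs-default.xml versus the hdfs-site.xml written above for dfs.replication:

```shell
# dfs.replication as shipped in hdfs-default.xml
replication_default=3
# dfs.replication as set in our hdfs-site.xml above
replication_site=2

# The site value wins when it is set; otherwise the default applies.
effective=${replication_site:-$replication_default}
echo "$effective"   # prints 2

unset replication_site   # simulate a site file with no override
effective=${replication_site:-$replication_default}
echo "$effective"   # prints 3
```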