The preliminary setup is the same as for the pseudo-distributed and fully distributed installations.

What differs are the configuration files; if any of the earlier steps are unclear, see the fully distributed article.

Prerequisite for HA: a ZooKeeper cluster must already be set up and started.
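For reference, a minimal zoo.cfg for a three-node quorum might look like the sketch below; the install path and dataDir are assumptions, so adjust them to your own ZooKeeper layout, and remember that each node needs a myid file (containing 1, 2 or 3) under dataDir.

# assumed location: /usr/local/src/zookeeper/conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
# dataDir is an assumption; each node's dataDir must contain its myid file (1, 2 or 3)
dataDir=/usr/local/src/zookeeper/data
clientPort=2181
server.1=master:2888:3888
server.2=slave1:2888:3888
server.3=slave2:2888:3888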

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoopcluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/src/hadoop/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>
</configuration>
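Note that fs.defaultFS points at the logical nameservice hadoopcluster rather than a single host; clients locate the active NameNode through the failover proxy provider configured in hdfs-site.xml below. Once HDFS is running, a quick sanity check is to list the root of the nameservice:

hdfs dfs -ls hdfs://hadoopcluster/
hdfs dfs -ls /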

hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.nameservices</name>
            <value>hadoopcluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.hadoopcluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hadoopcluster.nn1</name>
            <value>master:9000</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.hadoopcluster.nn1</name>
            <value>master:50070</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hadoopcluster.nn2</name>
            <value>slave1:9000</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.hadoopcluster.nn2</name>
            <value>slave1:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://master:8485;slave1:8485;slave2:8485/hadoopcluster</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/usr/local/src/hadoop/tmp/dfs/journal</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/usr/local/src/hadoop/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/usr/local/src/hadoop/tmp/dfs/data</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.hadoopcluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>
                sshfence
                shell(/bin/true)
            </value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_rsa</value>
        </property>
    </configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarnresourcemanager</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>master</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>slave2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>master:2181,slave1:2181,slave2:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>

mapred-site.xml


<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/usr/local/src/hadoop/history</value>
    </property>
</configuration>

hadoop-env.sh

export HADOOP_HOME=/usr/local/src/hadoop/
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export JAVA_HOME=/usr/local/src/jdk

yarn-env.sh

export JAVA_HOME=/usr/local/src/jdk

mapred-env.sh

export JAVA_HOME=/usr/local/src/jdk
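
All of the configuration files above must be identical on every node. One straightforward way is to push the whole config directory out from master (the paths follow this article's layout; adjust if yours differs):

scp -r /usr/local/src/hadoop/etc/hadoop root@slave1:/usr/local/src/hadoop/etc/
scp -r /usr/local/src/hadoop/etc/hadoop root@slave2:/usr/local/src/hadoop/etc/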

Cluster plan

Since only nn1 and nn2 are configured, slave2 runs no NameNode and therefore no ZKFC either.

Since only rm1 and rm2 are configured, slave1 runs no ResourceManager.

master           slave1           slave2
NameNode         NameNode         -
JournalNode      JournalNode      JournalNode
DataNode         DataNode         DataNode
Zookeeper        Zookeeper        Zookeeper
ZKFC             ZKFC             -
ResourceManager  -                ResourceManager
NodeManager      NodeManager      NodeManager

First, start ZooKeeper on each node

[root@master ~]# zkServer.sh start
[root@slave1 ~]# zkServer.sh start
[root@slave2 ~]# zkServer.sh start
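
Before moving on, confirm the quorum has formed; one node should report Mode: leader and the other two Mode: follower:

[root@master ~]# zkServer.sh status
[root@slave1 ~]# zkServer.sh status
[root@slave2 ~]# zkServer.sh status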

Then start a JournalNode on every node

[root@master ~]# hadoop-daemon.sh start journalnode
[root@slave1 ~]# hadoop-daemon.sh start journalnode
[root@slave2 ~]# hadoop-daemon.sh start journalnode
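
At this point jps on every node should list a QuorumPeerMain (ZooKeeper) and a JournalNode process:

[root@master ~]# jps
[root@slave1 ~]# jps
[root@slave2 ~]# jps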

Format and start the NameNode on nn1 (master)

[root@master ~]# hdfs namenode -format
[root@master ~]# hdfs --daemon start namenode

slave1 can only sync the metadata after the NameNode on master has been started; if the NameNode on master is not running, the sync on slave1 will fail.

Sync nn1's metadata to nn2

[root@slave1 ~]# hdfs namenode -bootstrapStandby

Start nn2

[root@slave1 ~]# hdfs --daemon start namenode
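
Both NameNode web UIs should now answer on the HTTP ports configured above (both NameNodes stay in standby until ZKFC is running). A quick check, where any 2xx/3xx status code means the NameNode is up:

curl -s -o /dev/null -w "%{http_code}\n" http://master:50070
curl -s -o /dev/null -w "%{http_code}\n" http://slave1:50070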

If you ever need to reformat, first delete the NameNode data directory (the dfs.namenode.name.dir path) under the hadoop directory.

On master, format the ZKFC state znode in ZooKeeper, otherwise the ZKFCs will fail to start

hdfs zkfc -formatZK 
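
If the format succeeded, ZooKeeper now contains a /hadoop-ha/hadoopcluster znode; you can confirm this from any ZooKeeper node (ls /hadoop-ha should list hadoopcluster):

[root@master ~]# zkCli.sh -server master:2181
ls /hadoop-ha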

Start HDFS

start-dfs.sh
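
start-dfs.sh also brings up the ZKFC daemons, so one of the two NameNodes should now be elected active. Check the HA state of both:

[root@master ~]# hdfs haadmin -getServiceState nn1
[root@master ~]# hdfs haadmin -getServiceState nn2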

Start YARN

start-yarn.sh
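
Likewise, check the ResourceManager HA state; one of rm1/rm2 should be active and the other standby. On Hadoop 3.x, start-yarn.sh starts both ResourceManagers; with the older 2.x scripts you may need to start the second one manually on slave2 with yarn-daemon.sh start resourcemanager.

[root@master ~]# yarn rmadmin -getServiceState rm1
[root@master ~]# yarn rmadmin -getServiceState rm2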

Or start all of Hadoop at once (equivalent to start-dfs.sh plus start-yarn.sh)

start-all.sh
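
To verify that automatic failover really works, a common smoke test is to kill the currently active NameNode (assumed here to be nn1 on master) and watch the standby take over; <NameNode pid> below is a placeholder for the pid shown by jps:

[root@master ~]# jps
[root@master ~]# kill -9 <NameNode pid>
[root@slave1 ~]# hdfs haadmin -getServiceState nn2
[root@master ~]# hdfs --daemon start namenode

After the kill, nn2 should report active within a few seconds, and the restarted NameNode on master rejoins the cluster as standby.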