Hadoop 完全分布式搭建(超详细无坑版,直接复制执行)
# Hadoop 完全分布式搭建【超详细无坑版|直接复制执行】
适配:**CentOS 7 / Rocky Linux 7+**
架构:`1主 + 2从`
- master:NameNode、ResourceManager
- node1:DataNode、NodeManager
- node2:DataNode、NodeManager
前置统一规范:
- 主机名:master / node1 / node2
- 统一用户:root
- 统一目录:`/usr/local/hadoop`
- JDK:1.8
- 关闭防火墙、SELinux、免密登录、固定IP
---
# 一、三台机器 前置统一配置(所有节点执行)
## 1. 固定主机名
```bash
# master 执行
hostnamectl set-hostname master
# node1 执行
hostnamectl set-hostname node1
# node2 执行
hostnamectl set-hostname node2
```
## 2. 配置 hosts(三台全部执行)
```bash
cat >> /etc/hosts << EOF
192.168.10.100 master
192.168.10.101 node1
192.168.10.102 node2
EOF
```
> 替换为你三台真实 IP
## 3. 关闭防火墙 & 开机不自启
```bash
systemctl stop firewalld
systemctl disable firewalld
```
## 4. 关闭 SELinux
```bash
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0
```
## 5. 安装依赖
```bash
yum install -y wget net-tools vim lrzsz tar
```
---
# 二、配置 SSH 免密登录【仅 master 执行】
```bash
# 生成密钥 一路回车
ssh-keygen -t rsa
# 分发公钥到三台机器
ssh-copy-id master
ssh-copy-id node1
ssh-copy-id node2
```
测试:`ssh node1` 无需密码即为成功
---
# 三、统一安装 JDK1.8(三台全部)
## 1. 解压安装
```bash
# 上传 jdk-8u341-linux-x64.tar.gz 到 /usr/local
tar -zxvf /usr/local/jdk-8u341-linux-x64.tar.gz -C /usr/local/
mv /usr/local/jdk1.8.0_341 /usr/local/jdk
```
## 2. 配置环境变量
```bash
cat >> /etc/profile << EOF
# JDK
export JAVA_HOME=/usr/local/jdk
export JRE_HOME=\$JAVA_HOME/jre
export CLASSPATH=.:\$JAVA_HOME/lib:\$JRE_HOME/lib
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
source /etc/profile
java -version
```
---
# 四、Hadoop 安装【master 操作,后续分发】
## 1. 解压
```bash
# 上传 hadoop-3.3.6.tar.gz 到 /usr/local
tar -zxvf /usr/local/hadoop-3.3.6.tar.gz -C /usr/local/
mv /usr/local/hadoop-3.3.6 /usr/local/hadoop
```
## 2. Hadoop 环境变量
```bash
cat >> /etc/profile << EOF
# HADOOP
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=\$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=\$HADOOP_HOME/lib/native
export PATH=\$HADOOP_HOME/bin:\$HADOOP_HOME/sbin:\$PATH
EOF
source /etc/profile
hadoop version
```
## 3. 修改 hadoop-env.sh
```bash
vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
```
写入:
```bash
export JAVA_HOME=/usr/local/jdk
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
```
---
# 五、核心配置文件(全部复制覆盖)
## 1. core-site.xml
```bash
vim /usr/local/hadoop/etc/hadoop/core-site.xml
```
```xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<name>hdfs.trash.interval</name>
<value>1440</value>
</property>
</configuration>
```
## 2. hdfs-site.xml
```bash
vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```
```xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>node1:50090</value>
</property>
</configuration>
```
## 3. mapred-site.xml
```bash
vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
```
```xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
```
## 4. yarn-site.xml
```bash
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
```
```xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
```
## 5. workers 从节点列表
```bash
vim /usr/local/hadoop/etc/hadoop/workers
```
清空原有内容,写入:
```
node1
node2
```
---
# 六、分发 Hadoop 到 node1、node2
```bash
scp -r /usr/local/hadoop node1:/usr/local/
scp -r /usr/local/hadoop node2:/usr/local/
# 分发环境变量
scp /etc/profile node1:/etc/
scp /etc/profile node2:/etc/
```
两台从节点执行生效:
```bash
source /etc/profile
```
---
# 七、格式化 & 启动集群【仅 master】
## 1. 初始化临时目录
```bash
mkdir -p /usr/local/hadoop/tmp
```
## 2. 格式化 NameNode(**只执行一次!**)
```bash
hdfs namenode -format
```
## 3. 启动 HDFS
```bash
start-dfs.sh
```
## 4. 启动 YARN
```bash
start-yarn.sh
```
---
# 八、集群进程检查
## master 执行:`jps`
```
NameNode
ResourceManager
```
## node1 / node2 执行:`jps`
```
DataNode
NodeManager
```
---
# 九、Web 访问
- HDFS:http://master:9870
- YARN:http://master:8088
---
# 十、常用启停命令
```bash
# 全集群停止
stop-dfs.sh
stop-yarn.sh
# 全集群启动
start-dfs.sh
start-yarn.sh
```
---
# 十一、避坑关键点(必看)
1. 所有节点 **IP+hosts 一一对应**
2. 只在 **第一次部署** 执行 `hdfs namenode -format`,重复格式化集群损坏
3. 免密登录必须通,否则启动卡死
4. 防火墙、SELinux 必须关闭
5. 三台机器 **JDK、Hadoop 路径完全一致**

