centos7安装hadoop3
# CentOS 7 安装 Hadoop 3.x 完整教程(单机/伪分布式/完全分布式通用前置) ## 一、环境准备(所有节点必做) 系统:CentOS 7 软件:JDK8、Hadoop3.3.6(稳定版) ### 1. 关闭防火墙 & SELinux ```bash # 关闭防火墙 systemctl stop firewalld systemctl disable firewalld # 关闭SELinux setenforce 0 sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config ``` ### 2. 安装 JDK 1.8 ```bash yum install -y java-1.8.0-openjdk-devel ``` 查看Java路径 ```bash readlink -f $(which java) # 示例真实路径:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64 ``` ### 3. 配置 Hosts & 主机名 ```bash # 设置主机名(示例) hostnamectl set-hostname hadoop-master # 配置hosts vim /etc/hosts # 添加 192.168.10.100 hadoop-master 192.168.10.101 hadoop-slave1 192.168.10.102 hadoop-slave2 ``` ### 4. 配置 SSH 免密登录 ```bash # 生成密钥 一路回车 ssh-keygen -t rsa # 本地免密 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys chmod 600 ~/.ssh/authorized_keys # 分发到从节点(伪分布式只需要本机) ssh-copy-id hadoop-master ssh-copy-id hadoop-slave1 ssh-copy-id hadoop-slave2 ``` --- ## 二、下载并解压 Hadoop3 ### 1. 下载(国内镜像) ```bash cd /opt wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz ``` ### 2. 解压 ```bash tar -zxvf hadoop-3.3.6.tar.gz mv hadoop-3.3.6 hadoop ``` ### 3. 配置全局环境变量 ```bash vim /etc/profile # 末尾添加 export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64 export HADOOP_HOME=/opt/hadoop export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop export HADOOP_HDFS_HOME=$HADOOP_HOME export HADOOP_YARN_HOME=$HADOOP_HOME export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin ``` 生效 ```bash source /etc/profile # 验证 hadoop version ``` --- ## 三、Hadoop 核心配置文件 配置文件目录:`/opt/hadoop/etc/hadoop` ### 1. hadoop-env.sh ```bash vim /opt/hadoop/etc/hadoop/hadoop-env.sh # 添加 export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.382.b05-1.el7_9.x86_64 export HDFS_NAMENODE_USER=root export HDFS_DATANODE_USER=root export HDFS_SECONDARYNAMENODE_USER=root export YARN_RESOURCEMANAGER_USER=root export YARN_NODEMANAGER_USER=root ``` ### 2. core-site.xml ```xml <configuration> <!-- 指定默认文件系统 --> <property> <name>fs.defaultFS</name> <value>hdfs://hadoop-master:9000</value> </property> <!-- 临时目录 --> <property> <name>hadoop.tmp.dir</name> <value>/opt/hadoop/tmp</value> </property> <!-- 关闭权限检查 --> <property> <name>hadoop.proxyuser.root.hosts</name> <value>*</value> </property> <property> <name>hadoop.proxyuser.root.groups</name> <value>*</value> </property> </configuration> ``` ### 3. hdfs-site.xml ```xml <configuration> <!-- 副本数 --> <property> <name>dfs.replication</name> <value>2</value> </property> <!-- 关闭权限 --> <property> <name>dfs.permissions.enabled</name> <value>false</value> </property> </configuration> ``` ### 4. mapred-site.xml ```xml <configuration> <!-- 指定MR运行在YARN上 --> <property> <name>mapreduce.framework.name</name> <value>yarn</value> </property> <property> <name>yarn.app.mapreduce.am.env</name> <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> </property> <property> <name>mapreduce.map.env</name> <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> </property> <property> <name>mapreduce.reduce.env</name> <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value> </property> </configuration> ``` ### 5. yarn-site.xml ```xml <configuration> <property> <name>yarn.nodemanager.aux-services</name> <value>mapreduce_shuffle</value> </property> <!-- ResourceManager地址 --> <property> <name>yarn.resourcemanager.hostname</name> <value>hadoop-master</value> </property> </configuration> ``` ### 6. workers(3.x 替代 slaves) ```bash vim /opt/hadoop/etc/hadoop/workers # 写入节点 hadoop-master hadoop-slave1 hadoop-slave2 ``` --- ## 四、分发配置(分布式必做) ```bash scp -r /opt/hadoop root@hadoop-slave1:/opt/ scp -r /opt/hadoop root@hadoop-slave2:/opt/ # 同步环境变量 scp /etc/profile root@hadoop-slave1:/etc/ scp /etc/profile root@hadoop-slave2:/etc/ ``` --- ## 五、格式化 & 启动集群 ### 1. 格式化 NameNode(只执行一次!) ```bash hdfs namenode -format ``` ### 2. 启动 HDFS ```bash start-dfs.sh ``` ### 3. 启动 YARN ```bash start-yarn.sh ``` ### 4. 查看进程 `jps` - master:NameNode、ResourceManager、SecondaryNameNode - slave:DataNode、NodeManager --- ## 六、访问 Web UI - HDFS:`http://hadoop-master:9870` - YARN:`http://hadoop-master:8088` --- ## 七、常用启停命令 ```bash # 全部停止 stop-dfs.sh stop-yarn.sh # 全部启动 start-dfs.sh start-yarn.sh ``` --- ## 常见报错解决 1. **JAVA_HOME 找不到** 核对 `hadoop-env.sh` 内 JAVA_HOME 绝对路径 2. **连接从节点拒绝** 检查 SSH 免密、hosts 解析 3. **启动后无 DataNode** 删除 `tmp` 目录,重新格式化 NameNode



