Hadoop Installation
http://hadoop.apache.org/releases.html
For CentOS, download http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
After downloading, extract it:
tar zxf hadoop-3.0.0.tar.gz
Move the extracted files to a directory of your own choosing, here /usr/local/hadoop
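For example, assuming the archive was extracted in the current directory and produced the default top-level folder name:
mv hadoop-3.0.0 /usr/local/hadoop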
Set JAVA_HOME
If you don't know the JDK path, you can find it with java -verbose
export JAVA_HOME=/usr/java/jdk1.8.0_112/
Check the Java version:
java -version
Set HADOOP_HOME:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
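These exports only last for the current shell session. One way to persist them (a sketch; adjust the paths to your installation) is to append them to /etc/profile and reload it:
echo 'export JAVA_HOME=/usr/java/jdk1.8.0_112/' >> /etc/profile
echo 'export HADOOP_HOME=/usr/local/hadoop' >> /etc/profile
echo 'export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin' >> /etc/profile
source /etc/profile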
Verify that Hadoop works with the following command:
hadoop version
Configure SSH
apt-get install ssh
CentOS 7 ships with ssh installed by default, so the command above is not needed there.
Create a new SSH key with an empty passphrase to enable passwordless login:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
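If ssh localhost still prompts for a password, overly permissive file modes are a common cause, since sshd ignores an authorized_keys file it considers unsafe. Tightening them usually helps:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys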
Test with the following command:
ssh localhost
Because we are logged in as root here, the following files need to be edited.
Edit the file: vi /usr/local/hadoop/sbin/start-yarn.sh
Add the users at the top:
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
Edit the file: vi /usr/local/hadoop/sbin/start-dfs.sh and add at the top:
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
In releases after 3.0, HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER.
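So on those releases the start-dfs.sh block above would instead read (same settings, only the renamed variable swapped in):
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root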
Start the HDFS and YARN daemons:
/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh
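If everything came up, jps should list the five daemons (process IDs will differ):
jps
# expected processes: NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager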
If starting YARN fails with the following error: localhost: ERROR: JAVA_HOME is not set and could not be found.
then set JAVA_HOME in etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/opt/jdk1.8.0_211/
If you find that the DataNode did not start,
it is because formatting the NameNode more than once has left the NameNode and DataNode inconsistent with each other.
So you need to delete the previously configured data directory (the folder created for dfs.data.dir), delete the temp and logs folders, and then reformat the NameNode;
the mismatch can be seen in the logs/hadoop-root-namenode-node02.log log file.
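A sketch of that cleanup, assuming the data, temp, and logs directories live under the Hadoop install directory (the data path below is hypothetical; substitute the actual dfs.data.dir from your hdfs-site.xml):
/usr/local/hadoop/sbin/stop-dfs.sh
rm -rf /usr/local/hadoop/data /usr/local/hadoop/temp /usr/local/hadoop/logs
hdfs namenode -format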
If you see a warning here,
refer to http://www.javashuo.com/article/p-umbxmzht-mw.html
NameNode port: 9870
DataNode port: 9864
NodeManager port: 8042
YARN ResourceManager web UI: http://localhost:8088
Open the firewall port:
firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --reload
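The other web UI ports listed above can be opened the same way if you want to reach them from another machine:
firewall-cmd --zone=public --add-port=9870/tcp --permanent
firewall-cmd --zone=public --add-port=9864/tcp --permanent
firewall-cmd --zone=public --add-port=8042/tcp --permanent
firewall-cmd --reload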
Run the following commands (use the examples jar whose version matches your installation; here 3.0.0 was installed):
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -get output output
$ cat output/*
If Hadoop reports "Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster":
1. Run the command hadoop classpath
2. Copy its output directly into the yarn-site.xml file:
<configuration>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/hadoop-3.1.2//etc/hadoop:/opt/hadoop-3.1.2//share/hadoop/common/lib/*:/opt/hadoop-3.1.2//share/hadoop/common/*:/opt/hadoop-3.1.2//share/hadoop/hdfs:/opt/hadoop-3.1.2//share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.2//share/hadoop/hdfs/*:/opt/hadoop-3.1.2//share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.2//share/hadoop/mapreduce/*:/opt/hadoop-3.1.2//share/hadoop/yarn:/opt/hadoop-3.1.2//share/hadoop/yarn/lib/*:/opt/hadoop-3.1.2//share/hadoop/yarn/*</value>
    </property>
</configuration>
Then restart.
The first run failed because 8 GB of memory was not enough... a second task attempt was then created automatically.
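One way to keep YARN within a small machine's memory (a sketch using standard property names; tune the values to your box) is to cap container sizes in yarn-site.xml and mapred-site.xml:
<!-- yarn-site.xml: total memory YARN may hand out per node, and the per-container ceiling -->
<property><name>yarn.nodemanager.resource.memory-mb</name><value>4096</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>2048</value></property>
<!-- mapred-site.xml: memory requested by each map/reduce task -->
<property><name>mapreduce.map.memory.mb</name><value>1024</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>1024</value></property>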
Programmatic Access
Create a new Maven project hadoopdemo and add the dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
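hadoop-core 1.2.1 is the old Hadoop 1.x artifact; the code below compiles against it because it only uses the org.apache.hadoop.mapreduce API. If you would rather build against the same line as the cluster installed above, the usual dependency is hadoop-client with a matching version:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.0.0</version>
</dependency>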
Define the Mapper:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // The year occupies columns 15-18 of each NCDC record
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            // parseInt doesn't accept a leading plus sign
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Emit (year, temperature) only for valid, non-missing readings
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}
Define the Reducer:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Track the highest temperature seen for this year
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}
Define the driver program:
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Exit 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
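A note on the driver: new Job() is what the hadoop-core 1.2.1 API expects, but it is deprecated in Hadoop 2.x/3.x, where the factory method is preferred. A minimal sketch of the newer form (org.apache.hadoop.conf.Configuration):
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "Max temperature");
job.setJarByClass(MaxTemperature.class);
// the rest of the driver is unchanged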
Package with Maven:
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>
If this is a Spring Boot application, note that the packaging plugin here uses the configuration above.
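Build the jar with the usual command; the artifact lands under target/ (the hadoop-demo.jar name used below is whatever your artifactId/finalName produces):
mvn clean package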
Upload the jar to the CentOS machine; here we put it in /opt/hadoop/ncdc/
Download the test data: https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all
Here we download 1901.gz and extract it into the current directory:
gunzip 1901.gz
Run the test:
export HADOOP_CLASSPATH=hadoop-demo.jar
hadoop org.mythsky.hadoopdemo.MaxTemperature 1901 output
Here org.mythsky.hadoopdemo is the package name; if the class is not inside a package, just use the class name directly. The third argument is the input file and the fourth is the output directory.
The run in progress:
View the output directory:
View the files:
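Depending on whether the job wrote to HDFS or the local filesystem, either pair of commands shows the result (part-r-00000 is the default reducer output file name):
hadoop fs -ls output
hadoop fs -cat output/part-r-00000
# or, for a local run:
ls output
cat output/part-r-00000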
This is the computed result.