Hadoop

Installing Hadoop

Releases page: http://hadoop.apache.org/releases.html

On CentOS, download http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
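If you prefer to fetch the archive directly on the server, a one-line sketch (assumes wget is installed):

# Download the 3.0.0 tarball from the mirror listed above
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz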

After downloading, extract it:

tar zxf hadoop-3.0.0.tar.gz

Move the extracted files to a directory of your choosing; here it is /usr/local/hadoop/
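For example, a minimal sketch of placing it there, assuming the archive was extracted in the current working directory:

# Move the extracted directory to the chosen install location
mv hadoop-3.0.0 /usr/local/hadoop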

Set JAVA_HOME

If you do not know the JDK path, you can find it with java -verbose.

export JAVA_HOME=/usr/java/jdk1.8.0_112/

Check the Java version:

java -version

Set HADOOP_HOME:

export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
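These exports only apply to the current shell session. A sketch of persisting them across logins, assuming ~/.bashrc is the startup file in use:

# Persist the variables across logins (the startup file is an assumption)
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/java/jdk1.8.0_112/
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source ~/.bashrc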

Use the following command to check that Hadoop works:

hadoop version

 

Configuring SSH

apt-get install ssh

CentOS 7 ships with SSH installed by default, so the command above is not needed there.

Create a new SSH key with an empty passphrase to enable passwordless login:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Test it with:

ssh localhost

 

Because we are logged in as root, the following files need to be edited.

Edit /usr/local/hadoop/sbin/start-yarn.sh:

Add the users at the top:

YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root

Edit /usr/local/hadoop/sbin/start-dfs.sh and add at the top:

HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Note: since version 3.0, HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER.
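For reference, a sketch of the same block written with the newer variable name; use whichever name your Hadoop version expects:

# Top of start-dfs.sh when running the daemons as root on newer 3.x releases
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root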

Start the HDFS and YARN daemons:

/usr/local/hadoop/sbin/start-dfs.sh
/usr/local/hadoop/sbin/start-yarn.sh

If starting YARN fails with: localhost: ERROR: JAVA_HOME is not set and could not be found.

then set JAVA_HOME in etc/hadoop/hadoop-env.sh:

export JAVA_HOME=/opt/jdk1.8.0_211/

If the DataNode does not start,

it is usually because the NameNode was formatted multiple times, leaving the NameNode and DataNode metadata inconsistent.

Delete the previously configured data directory (the folder created for dfs.data.dir), delete the temp and logs folders as well, and then reformat the NameNode.

The details can be seen in the logs/hadoop-root-namenode-node02.log log file.
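A minimal cleanup sketch; the directory paths below are assumptions, so replace them with whatever dfs.data.dir and your temp/logs locations actually are:

# Stop HDFS before touching the storage directories
/usr/local/hadoop/sbin/stop-dfs.sh
# Paths are assumptions; use the directories from your own configuration
rm -rf /usr/local/hadoop/data /usr/local/hadoop/temp /usr/local/hadoop/logs
# Reformat the NameNode, then start HDFS again
hdfs namenode -format
/usr/local/hadoop/sbin/start-dfs.sh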

If you see a warning at this point, refer to http://www.javashuo.com/article/p-umbxmzht-mw.html

NameNode web UI port: 9870

DataNode web UI port: 9864

NodeManager web UI port: 8042

 

 

YARN ResourceManager web UI: http://localhost:8088
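A quick sanity check that the web UIs answer, assuming all daemons run on this host and use the default ports listed above:

# Print the HTTP status code returned by each daemon's web UI
for port in 9870 9864 8042 8088; do
  curl -s -o /dev/null -w "port $port -> %{http_code}\n" http://localhost:$port
done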

Open the firewall port:

firewall-cmd --zone=public --add-port=8088/tcp --permanent
firewall-cmd --reload
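If the other web UIs also need to be reachable from another machine, the same pattern applies (a sketch; open only the ports you actually need):

# Open the remaining web UI ports listed earlier, then reload the firewall
firewall-cmd --zone=public --add-port=9870/tcp --permanent
firewall-cmd --zone=public --add-port=9864/tcp --permanent
firewall-cmd --zone=public --add-port=8042/tcp --permanent
firewall-cmd --reload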

Run the example job:

$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
$ bin/hdfs dfs -put etc/hadoop input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar grep input output 'dfs[a-z.]+'
$ bin/hdfs dfs -get output output
$ cat output/*

If you get an error that Hadoop could not find or load the main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster:

1. Run the command hadoop classpath.

2. Copy its output into yarn-site.xml as the value of yarn.application.classpath:

<configuration>
    <property>
        <name>yarn.application.classpath</name>
        <value>/opt/hadoop-3.1.2//etc/hadoop:/opt/hadoop-3.1.2//share/hadoop/common/lib/*:/opt/hadoop-3.1.2//share/hadoop/common/*:/opt/hadoop-3.1.2//share/hadoop/hdfs:/opt/hadoop-3.1.2//share/hadoop/hdfs/lib/*:/opt/hadoop-3.1.2//share/hadoop/hdfs/*:/opt/hadoop-3.1.2//share/hadoop/mapreduce/lib/*:/opt/hadoop-3.1.2//share/hadoop/mapreduce/*:/opt/hadoop-3.1.2//share/hadoop/yarn:/opt/hadoop-3.1.2//share/hadoop/yarn/lib/*:/opt/hadoop-3.1.2//share/hadoop/yarn/*</value>
    </property>
</configuration>

Then restart and run the job again.

 

The first run failed because 8 GB of memory was not enough... a second task attempt was created automatically.

Accessing Hadoop from a program

Create a new Maven project named hadoopdemo and add the dependency:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>

Define the Mapper:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class MaxTemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // A temperature value of 9999 marks a missing reading in the NCDC records
    private static final int MISSING = 9999;

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Parse the fixed-width NCDC record: columns 15-19 hold the year, 87-92 the air temperature
        String line = value.toString();
        String year = line.substring(15, 19);
        int airTemperature;
        if (line.charAt(87) == '+') {
            airTemperature = Integer.parseInt(line.substring(88, 92));
        } else {
            airTemperature = Integer.parseInt(line.substring(87, 92));
        }
        String quality = line.substring(92, 93);
        // Emit (year, temperature) only for valid, non-missing readings
        if (airTemperature != MISSING && quality.matches("[01459]")) {
            context.write(new Text(year), new IntWritable(airTemperature));
        }
    }
}

Define the Reducer:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class MaxTemperatureReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Keep the highest temperature seen for this year
        int maxValue = Integer.MIN_VALUE;
        for (IntWritable value : values) {
            maxValue = Math.max(maxValue, value.get());
        }
        context.write(key, new IntWritable(maxValue));
    }
}

Define the driver (main program):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperature <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(MaxTemperature.class);
        job.setJobName("Max temperature");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(MaxTemperatureMapper.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Package with Maven:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
    </plugins>
</build>

If the project is a Spring Boot application, note that the packaging plugin here must use the configuration shown above.

Upload the jar to the CentOS machine; here it is placed in /opt/hadoop/ncdc/.
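A sketch of building and copying the jar, assuming the artifact ends up as target/hadoop-demo.jar and the server is reachable as node02 (both names are assumptions taken from elsewhere in this post):

# Build the jar with Maven, then copy it to the server
mvn clean package
scp target/hadoop-demo.jar root@node02:/opt/hadoop/ncdc/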

Download the test data: https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all

Here we download 1901.gz and extract it in the current directory:

gunzip 1901.gz

Run the test:

export HADOOP_CLASSPATH=hadoop-demo.jar
hadoop org.mythsky.hadoopdemo.MaxTemperature 1901 output

Here org.mythsky.hadoopdemo is the package name; if the class is not inside a package, use the class name alone. The next argument (1901) is the input file and the last (output) is the output directory.

The job run:

Check the output directory:

View the output file:
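A sketch of inspecting the result, assuming the job wrote to a local output directory next to where it was launched; if the output went to HDFS instead, use hdfs dfs -ls and hdfs dfs -cat:

# List the output directory and print the result file written by the reducer
ls output
cat output/part-r-00000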

This is the computed result.
