1. Downloads
Hadoop: http://hadoop.apache.org/releases.html
JDK: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
2. Three virtual machines
192.168.17.178 192.168.17.179 192.168.17.180
3. Remove the JDK bundled with CentOS and install jdk-7u80. Run this step on all three machines.
rpm -qa | grep java
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
rpm -ivh jdk-7u80-linux-x64.rpm
After the installation, configure the JDK path:
vi /etc/profile
export JAVA_HOME=/usr/java/latest
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile    # make the changes take effect immediately
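To confirm the JDK is usable on each machine, a quick check (the 1.7.0_80 version string is assumed from the jdk-7u80 RPM above):
java -version        # should report java version "1.7.0_80"
echo $JAVA_HOME      # should print /usr/java/latest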
4. Set up passwordless SSH login from 178 to 179 and 180
Run the following on 178:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ scp .ssh/id_dsa.pub root@192.168.17.179:/root
$ scp .ssh/id_dsa.pub root@192.168.17.180:/root
Run the following on 179 and 180:
mkdir .ssh
cat id_dsa.pub >> .ssh/authorized_keys
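To verify the passwordless login works, a quick check from 178 (if a password is still requested, the usual cause is permissions on the target machines; the chmod values below are the commonly required ones, adjust as needed):
ssh root@192.168.17.179 hostname   # should log in without a password prompt
ssh root@192.168.17.180 hostname
# If a password is still requested, tighten permissions on 179/180:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys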
5. Upload hadoop-2.6.4 to the 178 machine and extract it to /usr/local/hadoop
Configure /usr/local/hadoop/etc/hadoop/hadoop-env.sh:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
# Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop
Configure /usr/local/hadoop/etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.17.178:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>300</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/usr/local/hadoop/dfs/namesecondary</value>
  </property>
</configuration>
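Since hadoop.tmp.dir and fs.checkpoint.dir point under /usr/local/hadoop, the directories can be created up front (Hadoop normally creates them itself, so this is only a precaution):
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/dfs/namesecondary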
Configure /usr/local/hadoop/etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.http.address</name>
    <value>192.168.17.178:50070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>192.168.17.178:50090</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.nameservices</name>
    <value>192.168.17.178</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.17.178:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
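Likewise, the dfs.namenode.name.dir and dfs.datanode.data.dir directories can be pre-created on every node (optional; the format and startup steps below will normally create them if they are missing):
mkdir -p /usr/local/hadoop/dfs/name
mkdir -p /usr/local/hadoop/dfs/data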
Configure /usr/local/hadoop/etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.jobtracker.http.address</name>
    <value>192.168.17.178:50030</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.17.178:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.17.178:19888</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.17.178:9001</value>
  </property>
</configuration>
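Note: a fresh Hadoop 2.6.x distribution usually ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template before adding the properties above:
cd /usr/local/hadoop/etc/hadoop
cp mapred-site.xml.template mapred-site.xml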
Configure /usr/local/hadoop/etc/hadoop/yarn-site.xml:
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>192.168.17.178</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.17.178:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.17.178:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.17.178:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.17.178:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.17.178:8088</value>
  </property>
</configuration>
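Once the daemons are started in step 7, this YARN configuration can be sanity-checked from 178 (expecting one NodeManager per host listed in the slaves file below):
bin/yarn node -list          # should list 3 RUNNING NodeManagers
bin/yarn application -list   # empty list on a fresh cluster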
/usr/local/hadoop/etc/hadoop/master (note: this file only sets which node runs the SecondaryNameNode; it does not configure master/slave roles)
192.168.17.178
/usr/local/hadoop/etc/hadoop/slaves
192.168.17.178
192.168.17.179
192.168.17.180
6. Copy the configured hadoop-2.6.4 to the 179 and 180 machines
scp -r /usr/local/hadoop 192.168.17.179:/usr/local/hadoop
scp -r /usr/local/hadoop 192.168.17.180:/usr/local/hadoop
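A quick, optional way to confirm that the copies landed and carry the same configuration:
ssh root@192.168.17.179 'md5sum /usr/local/hadoop/etc/hadoop/*.xml'
ssh root@192.168.17.180 'md5sum /usr/local/hadoop/etc/hadoop/*.xml'
md5sum /usr/local/hadoop/etc/hadoop/*.xml   # compare with the output above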
7. Starting and stopping Hadoop
bin/hdfs namenode -format
sbin/start-dfs.sh
sbin/stop-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
sbin/mr-jobhistory-daemon.sh start historyserver
sbin/mr-jobhistory-daemon.sh stop historyserver
sbin/hadoop-daemon.sh start secondarynamenode
sbin/hadoop-daemon.sh stop secondarynamenode
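After start-dfs.sh and start-yarn.sh, jps shows which daemons are running. Because 178 appears both as the master and in slaves, it runs master and worker daemons; the expected process names below are the standard Hadoop ones:
# On 178: NameNode, SecondaryNameNode, ResourceManager, DataNode, NodeManager
#         (plus JobHistoryServer if the history server was started)
# On 179 and 180: DataNode, NodeManager
jps   # run on each machine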
Check the status of Hadoop through the following three addresses:
http://192.168.17.178:8088 (YARN ResourceManager)
http://192.168.17.178:19888 (JobHistory Server)
http://192.168.17.178:50070 (HDFS NameNode)
Configuring a Hadoop development environment on Windows
Hadoop is extracted to D:\hadoop\hadoop-2.6.4.
1. Install the Hadoop-Eclipse-Plugin
Download the plugin from https://github.com/winghc/hadoop2x-eclipse-plugin and copy hadoop2x-eclipse-plugin-master\release\hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins folder (I use Eclipse Luna (4.4.2)).
Open Eclipse and configure the Hadoop path D:\hadoop\hadoop-2.6.4 under Window -> Preferences -> Hadoop Map/Reduce.
2. Download the Hadoop 2.6 winutils and the matching hadoop.dll, and put them into the D:\hadoop\hadoop-2.6.4\bin directory. This resolves the error: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
3. Put hadoop.dll into the C:\Windows\System32 directory or into the Eclipse project. This resolves the error: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
If you put it into C:\Windows\System32, note that hadoop.dll must match both the operating system and the JDK in bitness (32-bit or 64-bit). I once had a machine with 64-bit Windows 7, a 32-bit JDK and 32-bit Eclipse that kept failing for exactly this reason.
If hadoop.dll is placed inside the Eclipse project (at the same level as src), it only needs to match the JDK's bitness.
4. Create a new project via File -> New -> Other -> Map/Reduce Project.
Running the example: the articles are stored as files under the hdfs://192.168.17.178:9000/mongo directory. The job segments every article into words and counts how many times each word appears; word segmentation uses the ansj library.
import java.io.IOException;
import java.util.List;

import org.ansj.domain.Term;
import org.ansj.splitWord.analysis.ToAnalysis;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NewsWordCount {

    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            System.out.println("map value=" + value.toString());
            List<Term> parse = ToAnalysis.parse(value.toString());
            System.out.println(parse);
            for (Term term : parse) {
                String natrue = term.getNatureStr();
                String name = term.getName();
                word.set(name + "_" + natrue);
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            System.out.println("reduce text=" + key + " result=" + result);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        System.setProperty("hadoop.home.dir", "D:/hadoop/hadoop-2.6.4");
        System.setProperty("HADOOP_USER_NAME", "root");
        args = new String[] { "hdfs://192.168.17.178:9000/mongo", "hdfs://192.168.17.178:9000/mongo_output" };
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "news word count");
        job.setJarByClass(NewsWordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
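Before running the job, the input text files need to be in HDFS under /mongo. A minimal sketch, run on 178 (the local file path below is illustrative, not from the original setup):
bin/hdfs dfs -mkdir -p /mongo
bin/hdfs dfs -put /tmp/articles/*.txt /mongo   # hypothetical local article files
bin/hdfs dfs -rm -r /mongo_output              # the job fails if the output directory already exists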
View the result:
bin/hdfs dfs -cat /mongo_output/part-r-00000
References
http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/SingleCluster.html
http://hadoop.apache.org/docs/r2.6.4/hadoop-project-dist/hadoop-common/ClusterSetup.html