I'd had a passing acquaintance with Hadoop before, but nothing deep. Lately it has started to seem genuinely interesting, so I decided to study it properly. A couple of days ago I bought two Hadoop books and gave them a quick skim; today I sat down to get the environment up and running.
Environment: CentOS 6.2, JDK 7u45, Hadoop 2.2.0
I'll skip the download and unpacking steps and go straight to configuration (setting the JAVA_HOME and HADOOP_HOME environment variables is also omitted here). Reference: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/SingleCluster.html
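For completeness, the skipped environment-variable setup might look something like the following (the install paths here are assumptions; substitute wherever you actually unpacked the JDK and Hadoop):

```shell
# Append to ~/.bashrc (or /etc/profile), then run: source ~/.bashrc
# Paths below are examples, not the required locations.
export JAVA_HOME=/usr/java/jdk1.7.0_45
export HADOOP_HOME=/opt/hadoop-2.2.0
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```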
1. Edit hadoop-env.sh and set JAVA_HOME
2. Edit the core-site.xml configuration file
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
    <final>true</final>
  </property>
</configuration>
3. Edit the hdfs-site.xml configuration file
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data/hadoop/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
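The directories named in core-site.xml and hdfs-site.xml above do not create themselves; it's safest to create them (with write permission for the user running Hadoop) before formatting:

```shell
# Create the tmp, namenode, and datanode directories used in the configs above.
mkdir -p /data/hadoop/tmp /data/hadoop/dfs/name /data/hadoop/dfs/data
```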
4. Copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!--
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value></value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value></value>
    <final>true</final>
  </property>
  -->
</configuration>
5. Edit the yarn-site.xml configuration file
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
    <description>hostname of the RM</description>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:5274</value>
    <description>host is the hostname of the resource manager and port is the port on which the NodeManagers contact the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:5273</value>
    <description>host is the hostname of the resourcemanager and port is the port on which the Applications in the cluster talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:5271</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on which the clients can talk to the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value></value>
    <description>the local directories used by the nodemanager</description>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>localhost:5272</value>
    <description>the nodemanagers bind to this port</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>10240</value>
    <description>the amount of memory on the NodeManager in MB</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value></value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
</configuration>
That completes the single-node Hadoop configuration.
1) Next, format the namenode, then start it
hadoop namenode -format
hadoop-daemon.sh start namenode
You can then open http://localhost:50070/dfshealth.jsp and check the logs there (files with namenode*.log in the name) to confirm startup; if there are no errors, the namenode started successfully.
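Besides the web UI, the same log can be tailed directly from the shell; assuming $HADOOP_HOME points at the install directory (logs are written under its logs/ subdirectory by default), something like:

```shell
# Show the most recent namenode log lines; the exact file name includes
# the local user and hostname, hence the wildcards.
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
```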
2) Next, start the HDFS datanode
hadoop-daemon.sh start datanode
Again you can check the corresponding log file from the start page (with datanode*.log in the name); if there are no errors and it communicated with the namenode, it started successfully.
You can also type jps at the command line to check the running Java processes.
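For reference, once every daemon in this walkthrough is up, the jps listing should include one entry per daemon (PIDs will of course differ from machine to machine):

```shell
# jps ships with the JDK and lists local JVM processes by main class.
# With all four daemons running, expect to see lines named:
#   NameNode, DataNode, ResourceManager, NodeManager (plus Jps itself).
jps
```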
3) Then start YARN
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Verifying that these started is done the same way as above.
Finally, go into the hadoop-2.2.0/share/hadoop/mapreduce directory and run a test job:
hadoop jar hadoop-mapreduce-examples-2.2.0.jar randomwriter out
Then check whether the job ran successfully.
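Beyond watching the job's console output, the result can be checked on HDFS itself; randomwriter writes its files under the output path given on the command line (out here), so listing it should show the generated data:

```shell
# List the job's output directory on HDFS; non-empty contents plus a
# zero exit status indicate the example job completed.
hadoop fs -ls out
```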