Learning Hadoop

Example 1: Counting the words in a text file

[hadoop@mylinux ~]$ cd hadoop-0.20.2/
[hadoop@mylinux hadoop-0.20.2]$ mkdir input   # create the directory input on the local file system
[hadoop@mylinux hadoop-0.20.2]$ cd input      # enter the input directory
[hadoop@mylinux input]$ vi file01             # create a new file, file01, with the three lines below
hello hadoop 
this is first examples by huxin
hadoop
[hadoop@mylinux input]$ cd
[hadoop@mylinux ~]$ cd hadoop-0.20.2/
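[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -mkdir input   # create the directory input in the Hadoop file system (HDFS); do not confuse it with the local input directory above
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -put ./input/file01 input   # copy the local file01 into the HDFS input directory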
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls input
Found 1 items
-rw-r--r--   1 hadoop supergroup         54 2011-06-22 19:25 /user/hadoop/input/file01
[hadoop@mylinux hadoop-0.20.2]$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output   # count the words in the HDFS input directory; the output directory is created automatically when the job finishes
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2011-06-22 19:26 /user/hadoop/output/_logs
-rw-r--r--   1 hadoop supergroup         61 2011-06-22 19:26 /user/hadoop/output/part-r-00000
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -cat output/part-r-00000   # view part-r-00000
by      1
examples        1
first   1
hadoop  2
hello   1
huxin   1
is      1
this    1
-----------------------
Done!
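
To make the result above easier to read, here is a minimal plain-Java sketch of the same computation, with no Hadoop involved. It assumes words are simply whitespace-separated tokens; the class name LocalWordCount and the method count are made up for this illustration and are not part of Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Hadoop.
class LocalWordCount {
    // Count whitespace-separated tokens. A TreeMap keeps the keys sorted,
    // which mirrors the sorted order of the lines in part-r-00000.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The three lines typed into file01 above.
        List<String> file01 = Arrays.asList(
            "hello hadoop ",
            "this is first examples by huxin",
            "hadoop");
        // Print "word<TAB>count", one pair per line.
        for (Map.Entry<String, Integer> e : count(file01).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Run on the contents of file01, this prints the same eight word/count pairs as part-r-00000. The real job does the same counting, but spread across map and reduce tasks on the cluster.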

 

The WordCount source code is at src/examples/org/apache/hadoop/examples/WordCount.java.
Now let's compile and run this source code by hand:
[hadoop@mylinux ~]$ cd hadoop-0.20.2/
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground/src
[hadoop@mylinux hadoop-0.20.2]$ mkdir playground/classes
[hadoop@mylinux hadoop-0.20.2]$ cp src/examples/org/apache/hadoop/examples/WordCount.java  playground/src/WordCount.java
[hadoop@mylinux hadoop-0.20.2]$ javac -classpath hadoop-0.20.2-core.jar:lib/commons-cli-1.2.jar -d playground/classes/ playground/src/WordCount.java
[hadoop@mylinux hadoop-0.20.2]$ jar -cvf playground/wordcount.jar -C playground/classes/ .
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -rmr output   # remember to delete the output directory created earlier first; the job will not run if it already exists
Deleted hdfs://master:9000/user/hadoop/output
[hadoop@mylinux hadoop-0.20.2]$ hadoop jar playground/wordcount.jar org.apache.hadoop.examples.WordCount input output
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -ls output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2011-06-22 19:55 /user/hadoop/output/_logs
-rw-r--r--   1 hadoop supergroup         61 2011-06-22 19:55 /user/hadoop/output/part-r-00000
[hadoop@mylinux hadoop-0.20.2]$ hadoop fs -cat output/part-r-00000   # view part-r-00000
by      1
examples        1
first   1
hadoop  2
hello   1
huxin   1
is      1
this    1
-----
Done!
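
For readers curious about what the class we just compiled does, the following is a rough plain-Java sketch of the map, shuffle, and reduce phases of a word count. It does not use the Hadoop API at all. The names MapReduceSketch, map, shuffle, reduce, and run are made up for this illustration, and the real WordCount.java is organized around Hadoop's own Mapper and Reducer classes instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical illustration of the three phases; not the Hadoop API.
class MapReduceSketch {
    // Map phase: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle phase: group all values by key, with keys kept in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Wire the three phases together for a list of input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(pairs).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }
}
```

Splitting the work this way is what lets Hadoop run many map calls in parallel on different parts of the input and then hand each key's values to a reducer.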
