Reprinted from: http://blog.csdn.net/a925907195/article/details/42325579
1 Installing Pig
Pig only needs to be installed on the namenode.
1.1 Download and extract
Download: get pig-0.12.1.tar.gz (release 0.12.1) from http://pig.apache.org/releases.html
Save it under: /home/hadoop/
Extract: tar -zxvf pig-0.12.1.tar.gz; rename it: mv pig-0.12.1 pig; then move it under /usr/local/hadoop
1.2 Change the owner of the pig directory
chown -R hadoop:hadoop /usr/local/hadoop/pig
1.3 Edit the configuration file
Add Pig to the PATH: open /etc/profile (vi /etc/profile) and append the following at the end:
#pig path
export PATH=$PATH:/usr/local/hadoop/pig/bin
Make the change take effect: source /etc/profile
1.4 Verify the installation
Run the pig -x local command. When the "grunt>" prompt appears, Pig has been installed successfully, as shown below:
pig -x local
1.5 Configure Pig's MapReduce mode
Edit /etc/profile and add the hadoop/conf path:
vim /etc/profile
export PATH=$PATH:/usr/local/hadoop/pig/bin
export PIG_CLASSPATH=/usr/local/hadoop/conf
Make the configuration take effect:
source /etc/profile
1.6 Verify MapReduce mode
Simply run the pig command; the "grunt>" prompt should appear (Hadoop must be started first).
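As a quick sanity check once the prompt appears, a sketch (the HDFS path below is hypothetical; use one that exists on your cluster):
grunt> fs -ls /user/hadoop
grunt> quit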
1.7 Change Pig's log file directory
By default Pig writes its logs to the current directory, which is inconvenient for analysis and management, so change the log file directory as follows:
1) Create a logs folder under /usr/local/hadoop/pig
mkdir /usr/local/hadoop/pig/logs
2) Set pig.logfile in /usr/local/hadoop/pig/conf/pig.properties to /usr/local/hadoop/pig/logs
Open /usr/local/hadoop/pig/conf/pig.properties, find pig.logfile, and change it as follows:
pig.logfile=/usr/local/hadoop/pig/logs
1.8 Common Pig commands
pig -x local: start Pig in local mode
pig: start Pig in HDFS (MapReduce) mode
Testing Pig Latin statements
Commonly used statements:
LOAD: specifies how to load data
FOREACH: scans each row and applies some processing
FILTER: filters rows
DUMP: displays the result on the screen
STORE: saves the result to a file
Typical order in which they are written and executed:
LOAD -> FOREACH -> STORE
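A minimal end-to-end sketch of this order, assuming a space-delimited input file access.log with two fields (the file name, field names, and output path are hypothetical):
A = LOAD 'access.log' USING PigStorage(' ') AS (ip:chararray, url:chararray);
B = FILTER A BY ip IS NOT NULL;
C = FOREACH B GENERATE ip;
STORE C INTO 'ips_out' USING PigStorage();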
1.9 Test job execution with Pig in MapReduce mode
Step 1: upload passwd.txt to the HDFS file system
cat /home/hadoop/fjshtest/passwd.txt
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
sync:x:4:65534:sync:/bin:/bin/sync
games:x:5:60:games:/usr/games:/bin/sh
man:x:6:12:man:/var/cache/man:/bin/sh
lp:x:7:7:lp:/var/spool/lpd:/bin/sh
mail:x:8:8:mail:/var/mail:/bin/sh
news:x:9:9:news:/var/spool/news:/bin/sh
uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh
proxy:x:13:13:proxy:/bin:/bin/sh
www-data:x:33:33:www-data:/var/www:/bin/sh
backup:x:34:34:backup:/var/backups:/bin/sh
list:x:38:38:Mailing List Manager:/var/list:/bin/sh
bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in
bin/hadoop fs -ls /user/hadoop/in
-rw-r--r-- 2 hadoop supergroup 1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt
-rw-r--r-- 2 hadoop supergroup 1026 2015-01-01 22:23 /user/hadoop/in/pigtest
-rw-r--r-- 2 hadoop supergroup 12 2014-11-14 23:18 /user/hadoop/in/test1.txt
-rw-r--r-- 2 hadoop supergroup 13 2014-11-14 23:18 /user/hadoop/in/test2.txt
Step 2: run the following commands, in order, at the grunt shell prompt
A = load '/user/hadoop/in/passwd.txt' using PigStorage(':');
B = foreach A generate $0 as id;
dump B;
The result of the commands can be viewed directly on the screen:
- Total input paths to process : 1
(root)
(daemon)
(bin)
(sys)
(sync)
(games)
(man)
(lp)
(mail)
(news)
(uucp)
(proxy)
(www-data)
(backup)
(list)
(irc)
(gnats)
(nobody)
(libuuid)
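To keep the result in HDFS instead of printing it to the screen, STORE can be used in place of DUMP. A sketch, assuming the output directory does not exist yet (the path is hypothetical):
store B into '/user/hadoop/out/ids' using PigStorage();
The output files can then be inspected with fs -cat /user/hadoop/out/ids/part* from the grunt shell.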
Summary of common errors:
Problem 1: a Pig statement needs a space on both sides of the equals sign, otherwise an error is reported.
A=load 'test.txt' as {ip:chararray, other:chararray} using PigStorage(' ');
--> error
grunt> A=load 'test.txt' as {ip:chararray, other:chararray} using PigStorage(' ');
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "A=load "" at line 2, column 1.
Problem 2: when loading data, using PigStorage(' ') must be written before AS.
A = LOAD 'test.txt' AS (ip:chararray, other:chararray) using PigStorage(' ');
--> error
grunt> A = LOAD 'test.txt' AS (ip:chararray, other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54> mismatched input 'using' expecting SEMI_COLON
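For reference, a corrected form of the statement above (spaces around the equals sign, USING moved before AS, schema in parentheses), assuming test.txt is space-delimited:
A = LOAD 'test.txt' USING PigStorage(' ') AS (ip:chararray, other:chararray);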
Problem 3: some functions and keywords, such as COUNT and PigStorage, are case-sensitive; otherwise Pig reports that they do not exist.
C = foreach B {generate ip,count(ip);};
--> error
grunt> C = foreach B {generate ip,count(ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log
Problem 4: when projecting a field, you must specify which relation it belongs to (e.g. A.ip).
C = foreach B {generate ip,COUNT(ip);};
--> error
grunt> C = foreach B {generate ip,COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray,A:bag{:tuple(ip:chararray,other:chararray)}.
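For reference, a corrected sequence that addresses problems 3 and 4 (uppercase COUNT, and the field projected through the bag A), a sketch assuming B is produced by grouping A on ip as in the error's schema:
B = GROUP A BY ip;
C = FOREACH B GENERATE group, COUNT(A.ip);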