Installing Pig on Ubuntu

Reposted from: http://blog.csdn.net/a925907195/article/details/42325579html

1 Installation

Pig only needs to be installed on the NameNode.

1.1 Download and extract

Download: from http://pig.apache.org/releases.html, get pig-0.12.1.tar.gz (Pig version 0.12.1).

Save it to: /home/hadoop/

Extract: tar -zxvf pig-0.12.1.tar.gz

Rename: mv pig-0.12.1 pig

Then move the pig directory under /usr/local/hadoop.

1.2 Change the owner of the pig directory

chown -R hadoop:hadoop /usr/local/hadoop/pig

1.3 Edit the configuration file

Add Pig to PATH: open /etc/profile (vi /etc/profile) and append the following at the end:

#pig path

export PATH=$PATH:/usr/local/hadoop/pig/bin

Apply the change: source /etc/profile
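The PATH step can also be scripted so that running it twice does not duplicate the line. The following is only a minimal sketch: append_once and PROFILE_FILE are hypothetical names introduced here, and by default the script writes to a temporary file rather than to /etc/profile, so it is safe to try.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): append a line to a file
# only if that exact line is not already present.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Assumption: PROFILE_FILE defaults to a scratch file; set it to
# /etc/profile (as root) to apply the change for real.
profile="${PROFILE_FILE:-$(mktemp)}"

append_once '#pig path' "$profile"
append_once 'export PATH=$PATH:/usr/local/hadoop/pig/bin' "$profile"
# Second call is a no-op because the line already exists.
append_once 'export PATH=$PATH:/usr/local/hadoop/pig/bin' "$profile"

cat "$profile"
```

After editing the real /etc/profile, source /etc/profile is still needed in the current shell.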

1.4 Verify the installation

Run the command pig -x local. If the "grunt>" prompt appears, Pig has been installed successfully:

pig -x local

1.5 Configure Pig's MapReduce mode

Edit /etc/profile and add the hadoop/conf path:

vim /etc/profile

export PATH=$PATH:/usr/local/hadoop/pig/bin

export PIG_CLASSPATH=/usr/local/hadoop/conf

Reload the file so that the settings take effect:

source /etc/profile

1.6 Verify Pig's MapReduce mode

Run pig with no arguments; the "grunt>" prompt should appear (Hadoop must be started first).

 

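1.7 Change Pig's log file directory

By default Pig writes its logs to the current directory, which makes them awkward to analyze and manage, so change the log directory as follows:

1) Create a logs folder under the Pig directory (/usr/local/hadoop/pig)

mkdir /usr/local/hadoop/pig/logs

2) In /usr/local/hadoop/pig/conf/pig.properties, set pig.logfile=/usr/local/hadoop/pig/logs

Open /usr/local/hadoop/pig/conf/pig.properties, find pig.logfile, and change it to:

pig.logfile=/usr/local/hadoop/pig/logs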
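Both log-directory steps can be combined into one small function. This is a sketch under assumptions: set_pig_logfile is a hypothetical helper name, and it assumes the property appears in pig.properties as a single line that may be commented out; any such line is dropped and a fresh pig.logfile entry is appended.

```shell
#!/bin/sh
# Hypothetical helper: create the log directory and point pig.logfile at it.
# $1 = path to pig.properties, $2 = log directory
set_pig_logfile() {
    props="$1"
    logdir="$2"
    mkdir -p "$logdir"
    tmp=$(mktemp)
    # Keep every line except an existing pig.logfile entry (commented or not).
    grep -v '^[#[:space:]]*pig\.logfile=' "$props" > "$tmp" || true
    printf 'pig.logfile=%s\n' "$logdir" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$props"
    rm -f "$tmp"
}

# Example call for the layout used in this post (run as a user with write access):
# set_pig_logfile /usr/local/hadoop/pig/conf/pig.properties /usr/local/hadoop/pig/logs
```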

 

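1.8 Common Pig commands

pig -x local: start Pig in local mode

pig: start Pig directly in HDFS (MapReduce) mode

Testing Pig Latin statements

Common statements:

LOAD: specifies how the data is loaded

FOREACH: scans row by row and applies some processing

FILTER: filters rows

DUMP: prints the result to the screen

STORE: saves the result to a file

Typical order when writing a script:

LOAD -> FOREACH -> STORE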
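To make the LOAD -> FOREACH -> STORE order concrete, here is a rough sketch that writes a three-statement Pig Latin script to a file and runs it in local mode only when pig is available. The file name first_field.pig, the output directory first_field_out, and a local copy of passwd.txt in the current directory are assumptions made for illustration.

```shell
#!/bin/sh
# Write a small Pig Latin script; the quoted 'EOF' keeps $0 literal.
cat > first_field.pig <<'EOF'
-- LOAD: read colon-separated lines from a local file
A = LOAD 'passwd.txt' USING PigStorage(':');
-- FOREACH: keep only the first field of each row
B = FOREACH A GENERATE $0 AS id;
-- STORE: write the result to an output directory
STORE B INTO 'first_field_out';
EOF

# Run in local mode only if pig is on PATH, the input exists,
# and the output directory has not been created yet.
if command -v pig >/dev/null 2>&1 && [ -f passwd.txt ] && [ ! -e first_field_out ]; then
    pig -x local first_field.pig
else
    echo "Script written to first_field.pig; pig was not run."
fi
```

Replacing STORE with DUMP B; prints the rows to the screen instead of writing files.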

1.9 Test running a Pig job in MapReduce mode

Step 1: upload passwd.txt to the HDFS file system

cat /home/hadoop/fjshtest/passwd.txt

root:x:0:0:root:/root:/bin/bash

daemon:x:1:1:daemon:/usr/sbin:/bin/sh

bin:x:2:2:bin:/bin:/bin/sh

sys:x:3:3:sys:/dev:/bin/sh

sync:x:4:65534:sync:/bin:/bin/sync

games:x:5:60:games:/usr/games:/bin/sh

man:x:6:12:man:/var/cache/man:/bin/sh

lp:x:7:7:lp:/var/spool/lpd:/bin/sh

mail:x:8:8:mail:/var/mail:/bin/sh

news:x:9:9:news:/var/spool/news:/bin/sh

uucp:x:10:10:uucp:/var/spool/uucp:/bin/sh

proxy:x:13:13:proxy:/bin:/bin/sh

www-data:x:33:33:www-data:/var/www:/bin/sh

backup:x:34:34:backup:/var/backups:/bin/sh

list:x:38:38:Mailing List Manager:/var/list:/bin/sh

bin/hadoop fs -put /home/hadoop/fjshtest/passwd.txt /user/hadoop/in

bin/hadoop fs -ls /user/hadoop/in

-rw-r--r--   2 hadoop supergroup       1705 2015-01-01 22:46 /user/hadoop/in/passwd.txt

-rw-r--r--   2 hadoop supergroup       1026 2015-01-01 22:23 /user/hadoop/in/pigtest

-rw-r--r--   2 hadoop supergroup         12 2014-11-14 23:18 /user/hadoop/in/test1.txt

-rw-r--r--   2 hadoop supergroup         13 2014-11-14 23:18 /user/hadoop/in/test2.txt

 

Step 2: run the following commands in order at the grunt prompt

A = load '/user/hadoop/in/passwd.txt' using PigStorage(':');

B = foreach A generate $0 as id;

dump B;

The result is printed directly to the screen:

- Total input paths to process : 1

(root)

(daemon)

(bin)

(sys)

(sync)

(games)

(man)

(lp)

(mail)

(news)

(uucp)

(proxy)

(www-data)

(backup)

(list)

(irc)

(gnats)

(nobody)

(libuuid)
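A quick way to sanity-check output like this without Hadoop is to extract the same first field with standard shell tools. The sketch below uses only three of the sample lines; the variable names sample and ids are introduced here for illustration.

```shell
#!/bin/sh
# Three lines taken from the passwd.txt sample above.
sample='root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh'

# cut takes field 1 with ':' as the delimiter (like PigStorage(':') and $0);
# sed wraps each value in parentheses to mimic how DUMP prints tuples.
ids=$(printf '%s\n' "$sample" | cut -d: -f1 | sed 's/.*/(&)/')

printf '%s\n' "$ids"
# prints:
# (root)
# (daemon)
# (bin)
```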

 

Common errors:

Problem 1: a Pig statement needs a space on each side of the equals sign; otherwise it fails
A=load 'test.txt' as {ip:chararray,other:chararray} using PigStorage(' ');
--> error
grunt> A=load 'test.txt' as {ip:chararray,other:chararray} using PigStorage(' ');
2014-07-04 16:05:35,935 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Encountered " <PATH> "A=load "" at line 2, column 1.
Problem 2: when loading data, using PigStorage(' ') must be written before as
A = LOAD 'test.txt' AS (ip:chararray,other:chararray) using PigStorage(' ');
--> error
grunt> A = LOAD 'test.txt' AS (ip:chararray,other:chararray) using PigStorage(' ');
2014-07-04 16:03:35,421 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 1, column 54>  mismatched input 'using' expecting SEMI_COLON

Problem 3: some function names, such as COUNT and PigStorage, are case-sensitive; with the wrong case Pig reports that they cannot be found
C = foreach B {generate ip,count(ip);};
--> error
grunt> C = foreach B {generate ip,count(ip);};
2014-07-04 16:19:40,167 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve count using imports: [, java.lang., org.apache.pig.builtin., org.apache.pig.impl.builtin.]
Details at logfile: /app01/pig-0.13.0/pig_1404460981802.log

Problem 4: when referring to a field name, you must specify which relation it belongs to (A.ip)
C = foreach B {generate ip,COUNT(ip);};
--> error
grunt> C = foreach B {generate ip,COUNT(ip);};
2014-07-04 16:18:54,919 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: <line 4, column 24> Invalid field projection. Projected field [ip] does not exist in schema: group:chararray,A:bag{:tuple(ip:chararray,other:chararray)}.
