Installing Pig is straightforward. Keep the following points in mind:
1. Set the system environment variables:
export PIG_HOME=.../pig-x.y.z
export PATH=$PATH:$PIG_HOME/bin
Once these are set, run pig -help to verify the installation.
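If pig -help fails, the two exports are the first thing to check. The following is a minimal sketch of such a check; check_pig_env is a hypothetical helper name, not something shipped with Pig, and it only inspects the two variables set above.

```shell
#!/bin/sh
# check_pig_env is a hypothetical helper, not part of Pig.
# It verifies the two settings described above: PIG_HOME and PATH.
check_pig_env() {
    # PIG_HOME must be set and point at an existing directory.
    if [ -z "${PIG_HOME:-}" ] || [ ! -d "$PIG_HOME" ]; then
        echo "PIG_HOME is unset or not a directory"
        return 1
    fi
    # $PIG_HOME/bin must be one of the PATH entries so that `pig` resolves.
    case ":$PATH:" in
        *":$PIG_HOME/bin:"*) ;;
        *) echo "PATH does not contain $PIG_HOME/bin"; return 1 ;;
    esac
    echo "Pig environment looks consistent"
}

# Run the check; on failure, print a hint instead of aborting the shell.
check_pig_env || echo "adjust the exports above, then open a new shell"
```

The check does not run Pig itself, so pig -help remains the real verification step.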
2. Two execution modes:
Local mode: Pig accesses the local file system. Start the shell with: pig -x local
MapReduce mode: Pig translates queries into MapReduce jobs and runs them on a Hadoop cluster. Start the shell with: pig -x mapreduce, or simply pig
hadoop@master:/usr/local/hadoop/conf$ pig -x mapreduce
Warning: $HADOOP_HOME is deprecated.
2013-08-16 16:18:52,388 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-08-16 16:18:52,389 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/hadoop/conf/pig_1376641132384.log
2013-08-16 16:18:52,470 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2013-08-16 16:18:52,760 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://master:9000
2013-08-16 16:18:53,174 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: master:9001
Note: MapReduce mode requires editing Hadoop's configuration file hadoop-env.sh and adding:
export PIG_CLASSPATH=$HADOOP_HOME/conf
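The point of PIG_CLASSPATH is to let Pig find the cluster's configuration, which is where the file system and job tracker addresses in the log above come from. The sketch below checks that a directory actually contains those files. It assumes a Hadoop 1.x layout (core-site.xml and mapred-site.xml in the conf directory) and a PIG_CLASSPATH holding a single directory; has_hadoop_conf is a hypothetical helper name.

```shell
#!/bin/sh
# has_hadoop_conf is a hypothetical helper, not part of Pig or Hadoop.
# Assumption: Hadoop 1.x layout, where the conf directory holds
# core-site.xml (file system address) and mapred-site.xml (job tracker address).
has_hadoop_conf() {
    dir=$1
    for f in core-site.xml mapred-site.xml; do
        # Report the first missing file and stop.
        if [ ! -f "$dir/$f" ]; then
            echo "missing $dir/$f"
            return 1
        fi
    done
    echo "Hadoop configuration found in $dir"
}

# Assumption: PIG_CLASSPATH is a single directory, as in the export above.
has_hadoop_conf "${PIG_CLASSPATH:-}" || echo "PIG_CLASSPATH does not point at a Hadoop conf directory"
```

If the files are missing, Pig may still start but is unlikely to connect to the intended cluster.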
The file .../in/ncdc/micro-tab/sample.txt contains:
1950	0	1
1950	22	1
1950	-11	1
1949	111	1
1949	78	1
Run the following commands in the Pig shell:
grunt> -- max_temp.pig: Finds the maximum temperature by year
grunt> records = LOAD 'hdfs://master:9000/in/ncdc/micro-tab/sample.txt' -- use the full HDFS path if you are unsure what your default path is
>> AS (year:chararray, temperature:int, quality:int);
grunt> filtered_records = FILTER records BY temperature != 9999 AND
>> (quality == 0 OR quality == 1 OR quality == 4 OR quality == 5 OR quality == 9);
grunt> grouped_records = GROUP filtered_records BY year;
grunt> max_temp = FOREACH grouped_records GENERATE group,
>> MAX(filtered_records.temperature);
grunt> DUMP max_temp;
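For a dataset this small, the result can be cross-checked without a cluster. The sketch below is not Pig; it is a plain awk pipeline that applies the same filter, grouping, and maximum to the five sample records. max_temp_by_year is a hypothetical helper name, and the input is assumed to be tab-separated as in sample.txt.

```shell
#!/bin/sh
# max_temp_by_year is a hypothetical helper that mirrors the Pig script:
# FILTER on temperature and quality, GROUP BY year, MAX(temperature).
# Input: tab-separated lines of year, temperature, quality on stdin.
max_temp_by_year() {
    awk -F '\t' '
        # Same predicate as the FILTER statement.
        $2 != 9999 && ($3 == 0 || $3 == 1 || $3 == 4 || $3 == 5 || $3 == 9) {
            # Keep the largest temperature seen so far for each year.
            if (!($1 in max) || $2 + 0 > max[$1]) max[$1] = $2 + 0
        }
        END { for (y in max) printf "%s\t%d\n", y, max[y] }
    ' | sort
}

# The five sample records shown above.
printf '1950\t0\t1\n1950\t22\t1\n1950\t-11\t1\n1949\t111\t1\n1949\t78\t1\n' | max_temp_by_year
# prints:
# 1949	111
# 1950	22
```

The same year and maximum pairs should appear in the tuples that DUMP max_temp prints.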
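Pig also provides the ILLUSTRATE operator, which runs the script on a small, representative sample of the data and shows the result of each step.
grunt> ILLUSTRATE max_temp;
The output is:
The guide's example on comments is reproduced here, slightly modified to add a schema. Note that chararry in the listing is not a type name, so Pig treats it as a field name with the default type bytearray and auto-names the bare int field val_0, as the DESCRIBE output shows: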
grunt> B = LOAD 'input/pig/join/B' AS (chararry,int);
grunt> A = LOAD 'input/pig/join/A' AS (int,chararry);
grunt> C = JOIN A BY $0, /* ignored */ B BY $1;
grunt> DESCRIBE C
C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}
grunt> ILLUSTRATE C
The output is:
----------------------------------------------------
| A     | val_0:int     | chararry:bytearray     |
----------------------------------------------------
|       | 3             | Hat                    |
|       | 3             | Hat                    |
----------------------------------------------------
----------------------------------------------------
| B     | chararry:bytearray     | val_0:int     |
----------------------------------------------------
|       | Eve                    | 3             |
|       | Eve                    | 3             |
----------------------------------------------------
-----------------------------------------------------------------------------------------------------------
| C     | A::val_0:int     | A::chararry:bytearray     | B::chararry:bytearray     | B::val_0:int     |
-----------------------------------------------------------------------------------------------------------
|       | 3                | Hat                       | Eve                       | 3                |
|       | 3                | Hat                       | Eve                       | 3                |
|       | 3                | Hat                       | Eve                       | 3                |
|       | 3                | Hat                       | Eve                       | 3                |
-----------------------------------------------------------------------------------------------------------
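Note: Pig Latin uses mixed rules for case sensitivity:
Operators and commands are case-insensitive;
Aliases and function names are case-sensitive.
For example, continuing the example above, describe works in either case, but the alias c does not resolve to C: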
grunt> describe c
2013-08-16 17:14:49,397 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1005: No plan for c to describe
Details at logfile: /usr/local/hadoop/conf/pig_1376641755235.log
grunt> describe C
C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}
grunt> DESCRIBE C
C: {A::val_0: int,A::chararry: bytearray,B::chararry: bytearray,B::val_0: int}