1. Have a working Hadoop cluster (I used hadoop-2.7.3).
2. Prepare the basic Windows environment:
Configure the JDK and Eclipse.
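1. Download Pig
Download the latest Pig package from Apache; the download page suggests the fastest mirror. The mirror I used: http://mirror.bit.edu.cn/apache/pig/
2. Upload Pig to the server (I uploaded it to /opt/bigdata)
3. Extract the archive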
[hadoop@wangmaster sbin]$ cd /opt/bigdata/
[hadoop@wangmaster bigdata]$ ls
docs hadoop-2.7.3.tar.gz hbase-1.2.5-bin.tar.gz jdk1.8.tar.gz opt pig-0.17.0 zookeeper-3.4.10
hadoop-2.7.3 hbase-1.2.5 jdk1.8 maxtemperaurte.jar output pig-0.17.0.tar.gz zookeeper-3.4.10.tar.gz
[hadoop@wangmaster bigdata]$ tar -xzvf pig-0.17.0.tar.gz
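4. Set the environment variables
sudo vi /etc/profile
## Set PIG_HOME and add Pig to PATH (there must be no space after "$PATH:").
export PIG_HOME=/opt/bigdata/pig-0.17.0
export PATH=$PATH:/opt/bigdata/hadoop-2.7.3/bin:$PIG_HOME/bin
## PIG_CLASSPATH is used by Pig in MapReduce mode and normally points at the Hadoop configuration directory (the path below assumes the default hadoop-2.7.3 layout).
export PIG_CLASSPATH=/opt/bigdata/hadoop-2.7.3/etc/hadoop
## Apply the changes
source /etc/profile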
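5. Verify the installation
Log in to the terminal again, make sure the Hadoop cluster is running, and type pig. You should see Pig report that it has connected to the Hadoop cluster and then drop into the grunt shell.
To exit, type quit in the grunt shell.
start-all.sh does not start the MapReduce JobHistory server (the service whose address is set by mapreduce.jobhistory.address). If it is not running, start it manually:
./mr-jobhistory-daemon.sh start historyserver (run this from the sbin directory of the Hadoop installation)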
Example: take a student table (student number, name, sex, age, department) with the following records, saved in /opt/bigdata/ziliao/student.txt:
201000101:Lihua:men:20:CST
201000102:Wangli:women:19:CST
201000103:Xiangming:women:18:CAT
201000104:Lixiao:men:19:CST
201000105:Wuda:women:19:CA
201000106:Huake:men:21:CST
201000107:Beihang:men:20:CA
201000108:Bob:women:17:CAT
201000109:Smith:men:19:CAT
201000110:Gxl:men:19:CST
201000111:Songwei:women:19:CA
201000112:Weihua:men:21:CAT
201000113:Weilei:women:18:CA
201000114:Luozheng:men:19:CA
201000115:Shangsi:women:20:CAT
201000116:Fandong:men:19:CST
201000117:Laosh:women:22:CAT
201000118:Haha:men:19:CA
The fields have the following data types:
Student(sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray)
We will extract the name and age of every student under each execution mode. The expected result is:
(Lihua,20)
(Wangli,19)
(Xiangming,18)
(Lixiao,19)
(Wuda,19)
(Huake,21)
(Beihang,20)
(Bob,17)
(Smith,19)
(Gxl,19)
(Songwei,19)
(Weihua,21)
(Weilei,18)
(Luozheng,19)
(Shangsi,20)
(Fandong,19)
(Laosh,22)
(Haha,19)
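To sanity-check the expected output without Pig, the same projection can be reproduced in plain Python. This is only a sketch: the helper name project_name_age and the three inlined sample records are illustrative and are not part of the Pig workflow.

```python
# Plain-Python cross-check of the name/age projection (no Pig involved).
# project_name_age is a hypothetical helper name used only for illustration.

SAMPLE = """\
201000101:Lihua:men:20:CST
201000102:Wangli:women:19:CST
201000103:Xiangming:women:18:CAT
"""


def project_name_age(text):
    """Return (sname, sage) tuples from colon-delimited student records."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sno, sname, ssex, sage, sdept = line.split(":")
        rows.append((sname, int(sage)))  # sage is declared as int in the schema
    return rows


if __name__ == "__main__":
    for sname, sage in project_name_age(SAMPLE):
        print(f"({sname},{sage})")
```

Running it on the full student.txt contents should give the same pairs as the list above.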
1. Local mode
Enter the grunt shell:
[hadoop@wangmaster sbin]$ pig -x local
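-- Load the data (note: put a space on both sides of "=")
grunt> A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
-- Select the wanted fields from A (again, spaces on both sides of "=")
grunt> B = foreach A generate sname, sage;
-- Print the contents of B to the screen
grunt> dump B;
-- Store the contents of B in the local file system (result.txt is created as a directory of part files)
grunt> store B into '/opt/bigdata/ziliao/result.txt';
-- View the stored result; the path is written without quotes
grunt> cat /opt/bigdata/ziliao/result.txt;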
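Because store writes a directory rather than a single file, reading the result outside of grunt means collecting the part files inside it. The sketch below assumes the usual layout (files named part-* with tab-separated fields, which is the PigStorage default); read_pig_output is a hypothetical helper name.

```python
# Sketch: read a local Pig "store" output directory back into Python tuples.
# Assumptions: data files are named part-* and fields are tab-separated.
import glob
import os


def read_pig_output(directory):
    """Return a list of (sname, sage) tuples from all part-* files."""
    rows = []
    for path in sorted(glob.glob(os.path.join(directory, "part-*"))):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if not line:
                    continue  # ignore empty lines
                sname, sage = line.split("\t")
                rows.append((sname, int(sage)))
    return rows


if __name__ == "__main__":
    # Path taken from the store statement above.
    for row in read_pig_output("/opt/bigdata/ziliao/result.txt"):
        print(row)
```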
(An alternative to typing the statements interactively is a script file.) Save the following statements as script.pig:
A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = foreach A generate sname, sage;
dump B;
store B into '/opt/bigdata/ziliao/result1.txt';
Run it with: pig -x local script.pig
View the result: grunt> cat /opt/bigdata/ziliao/result1.txt;
2. MapReduce mode
First upload /opt/bigdata/ziliao/student.txt to HDFS (here, to the root directory):
hdfs dfs -put /opt/bigdata/ziliao/student.txt /
Type pig to enter the grunt shell and check that the file is there:
grunt> ls /
hdfs://wangmaster:9000/docs<r 3> 104
hdfs://wangmaster:9000/hbase <dir>
hdfs://wangmaster:9000/input <dir>
hdfs://wangmaster:9000/output <dir>
hdfs://wangmaster:9000/student.txt<r 3> 525
hdfs://wangmaster:9000/tmp <dir>
hdfs://wangmaster:9000/wang <dir>
Then run the same operations on it. The input path becomes hdfs://wangmaster:9000/student.txt and the output path becomes hdfs://wangmaster:9000/result0.txt (the same change applies to the script).
A = load 'hdfs://wangmaster:9000/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = foreach A generate sname, sage;
dump B;
store B into 'hdfs://wangmaster:9000/result0.txt';
cat hdfs://wangmaster:9000/result0.txt;
Example 2: find the oldest student in each department (using the same data as above).
Run the following in the grunt shell:
A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = group A by sdept;
dump B;
max_age = foreach B generate group, MAX(A.sage);
dump max_age;
Output:
(CA,20)
(CAT,22)
(CST,21)
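Look up the matching records:
-- Oldest student in the CA department
CA = filter A by sdept == 'CA' and sage == 20;
-- Oldest student in the CAT department (the alias is CAT0 because cat is a reserved word and cannot be used as an alias)
CAT0 = filter A by sdept == 'CAT' and sage == 22;
-- Oldest student in the CST department
CST = filter A by sdept == 'CST' and sage == 21;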
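The filters above hard-code the maximum ages read off the previous output. As a cross-check, the sketch below computes the per-department maximum and the matching records in one pass over the same data in plain Python. It is not a Pig script; parse and oldest_per_dept are hypothetical helper names.

```python
# Plain-Python sketch: oldest student(s) per department, without hard-coded ages.
# parse and oldest_per_dept are hypothetical helper names; the records are
# the contents of student.txt as listed earlier.

STUDENTS = """\
201000101:Lihua:men:20:CST
201000102:Wangli:women:19:CST
201000103:Xiangming:women:18:CAT
201000104:Lixiao:men:19:CST
201000105:Wuda:women:19:CA
201000106:Huake:men:21:CST
201000107:Beihang:men:20:CA
201000108:Bob:women:17:CAT
201000109:Smith:men:19:CAT
201000110:Gxl:men:19:CST
201000111:Songwei:women:19:CA
201000112:Weihua:men:21:CAT
201000113:Weilei:women:18:CA
201000114:Luozheng:men:19:CA
201000115:Shangsi:women:20:CAT
201000116:Fandong:men:19:CST
201000117:Laosh:women:22:CAT
201000118:Haha:men:19:CA
"""


def parse(text):
    """Split each record into (sno, sname, ssex, sage:int, sdept)."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            sno, sname, ssex, sage, sdept = line.split(":")
            rows.append((sno, sname, ssex, int(sage), sdept))
    return rows


def oldest_per_dept(rows):
    """Map each department to the record(s) holding its maximum age."""
    max_age = {}
    for row in rows:
        age, dept = row[3], row[4]
        max_age[dept] = max(age, max_age.get(dept, age))
    return {
        dept: [r for r in rows if r[4] == dept and r[3] == max_age[dept]]
        for dept in sorted(max_age)
    }


if __name__ == "__main__":
    for dept, records in oldest_per_dept(parse(STUDENTS)).items():
        for record in records:
            print(dept, record)
```

If several students in a department share the maximum age, all of them are returned, which the hard-coded filters would also do.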