Pig Installation, Configuration, and Examples

1. Prerequisites

  1) A working Hadoop cluster (this guide uses hadoop-2.7.3).

  2) Basic Windows client environment: JDK and Eclipse configured.

2. Setting Up the Pig Environment

  1) Download Pig

    Download the latest Pig release from Apache; clicking the download link recommends the fastest mirror site. Download address: http://mirror.bit.edu.cn/apache/pig/

   

  2) Upload Pig to the server (here it is uploaded under /opt/bigdata).

    

  3) Extract the archive

[hadoop@wangmaster sbin]$ cd /opt/bigdata/
[hadoop@wangmaster bigdata]$ ls
docs          hadoop-2.7.3.tar.gz  hbase-1.2.5-bin.tar.gz  jdk1.8.tar.gz       opt     pig-0.17.0         zookeeper-3.4.10
hadoop-2.7.3  hbase-1.2.5          jdk1.8                  maxtemperaurte.jar  output  pig-0.17.0.tar.gz  zookeeper-3.4.10.tar.gz
[hadoop@wangmaster bigdata]$
tar -xzvf pig-0.17.0.tar.gz

  4) Set environment variables

sudo vi /etc/profile
## Set PIG_HOME and add the Pig and Hadoop bin directories to PATH.
## For MapReduce mode, PIG_CLASSPATH should point Pig at the Hadoop configuration directory:
export PIG_HOME=/opt/bigdata/pig-0.17.0
export PIG_CLASSPATH=/opt/bigdata/hadoop-2.7.3/etc/hadoop
export PATH=$PATH:/opt/bigdata/hadoop-2.7.3/bin:$PIG_HOME/bin
## Apply the changes
source /etc/profile

  5) Verify the installation

    Log in to a fresh terminal session, make sure the Hadoop cluster is running, then type the pig command. You should see Pig connect to the Hadoop cluster and drop into the grunt shell prompt:

    To exit, type quit at the grunt shell prompt.

    

3. Examples

  If start-all.sh did not start the job history server (mapreduce.jobhistory.address) when the Hadoop cluster was launched, start it manually:

./mr-jobhistory-daemon.sh start historyserver  (run from the sbin directory under the Hadoop installation)

  Example requirement: given a student table (student number, name, gender, age, department) containing the following records, saved in the file /opt/bigdata/ziliao/student.txt:

201000101:Lihua:men:20:CST
201000102:Wangli:women:19:CST
201000103:Xiangming:women:18:CAT
201000104:Lixiao:men:19:CST
201000105:Wuda:women:19:CA
201000106:Huake:men:21:CST
201000107:Beihang:men:20:CA
201000108:Bob:women:17:CAT
201000109:Smith:men:19:CAT
201000110:Gxl:men:19:CST
201000111:Songwei:women:19:CA
201000112:Weihua:men:21:CAT
201000113:Weilei:women:18:CA
201000114:Luozheng:men:19:CA
201000115:Shangsi:women:20:CAT
201000116:Fandong:men:19:CST
201000117:Laosh:women:22:CAT
201000118:Haha:men:19:CA

The corresponding schema is:
Student(sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray)
We will extract each student's name and age fields under the different run modes; the expected output is:

(Lihua,20)
(Wangli,19)
(Xiangming,18)
(Lixiao,19)
(Wuda,19)
(Huake,21)
(Beihang,20)
(Bob,17)
(Smith,19)
(Gxl,19)
(Songwei,19)
(Weihua,21)
(Weilei,18)
(Luozheng,19)
(Shangsi,20)
(Fandong,19)
(Laosh,22)
(Haha,19) 
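Before running anything in Pig, the load-and-project step can be sanity-checked with a short plain-Python sketch (not Pig itself; only the first three sample records are shown here for brevity):

```python
# Simulate Pig's "load ... using PigStorage(':')" followed by
# "foreach A generate sname, sage" on a few of the sample records.
records = [
    "201000101:Lihua:men:20:CST",
    "201000102:Wangli:women:19:CST",
    "201000103:Xiangming:women:18:CAT",
]

# PigStorage(':') splits each line on ':' into the declared fields.
A = [line.split(":") for line in records]

# "generate sname, sage" keeps only the name and age columns.
B = [(sname, int(sage)) for _sno, sname, _ssex, sage, _sdept in A]

for row in B:
    print(row)
# prints ('Lihua', 20), ('Wangli', 19), ('Xiangming', 18)
```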

  1) Local mode

    Enter the grunt shell:

[hadoop@wangmaster sbin]$ pig -x local
-- Load the data (note the spaces around "=")
grunt> A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
-- Project the relevant Student fields from A (again, spaces around "=")
grunt> B = foreach A generate sname, sage;
-- Print the contents of B to the screen
grunt> dump B;

    

-- Store the contents of B into a local file
grunt> store B into '/opt/bigdata/ziliao/result.txt';
-- View the local file contents (no quotes around the path)
grunt> cat /opt/bigdata/ziliao/result.txt;

    (Alternatively, run the same statements as a script file.) Save the following into script.pig:

A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = foreach A generate sname, sage;
dump B;
store B into '/opt/bigdata/ziliao/result1.txt';

    Run the command pig -x local script.pig

    View the result: grunt> cat /opt/bigdata/ziliao/result1.txt;

  2) MapReduce mode

First, put /opt/bigdata/ziliao/student.txt into HDFS:
hdfs dfs -put /opt/bigdata/ziliao/student.txt /in
Type pig to enter the grunt shell, then:
grunt> ls /in
hdfs://wangmaster:9000/docs<r 3>    104
hdfs://wangmaster:9000/hbase    <dir>
hdfs://wangmaster:9000/input    <dir>
hdfs://wangmaster:9000/output    <dir>
hdfs://wangmaster:9000/student.txt<r 3>    525
hdfs://wangmaster:9000/tmp    <dir>
hdfs://wangmaster:9000/wang    <dir>

    而後對其進行操做

    輸入目錄變爲hdfs://wangmaster:9000/in/student.txt

    輸出目錄變爲hdfs://wangmaster:9000/in/result.txt

    (注意:腳本也是如此)。

A = load 'hdfs://wangmaster:9000/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = foreach A generate sname, sage;
dump B;
store B into 'hdfs://wangmaster:9000/result0.txt';
cat hdfs://wangmaster:9000/result0.txt;

  Second example: find the information of the oldest student in each department (same data as above).

 

Run in the grunt shell:

A = load '/opt/bigdata/ziliao/student.txt' using PigStorage(':') as (sno:chararray, sname:chararray, ssex:chararray, sage:int, sdept:chararray);
B = group A by sdept;
dump B;
max_age = foreach B generate group, MAX(A.sage);
dump max_age;

Output:
(CA,20)
(CAT,22)
(CST,21)
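For reference, the group-and-MAX logic can be reproduced over the full sample data with a plain-Python sketch (not Pig itself):

```python
# Simulate "B = group A by sdept" plus "MAX(A.sage)" per group.
data = [
    "201000101:Lihua:men:20:CST",
    "201000102:Wangli:women:19:CST",
    "201000103:Xiangming:women:18:CAT",
    "201000104:Lixiao:men:19:CST",
    "201000105:Wuda:women:19:CA",
    "201000106:Huake:men:21:CST",
    "201000107:Beihang:men:20:CA",
    "201000108:Bob:women:17:CAT",
    "201000109:Smith:men:19:CAT",
    "201000110:Gxl:men:19:CST",
    "201000111:Songwei:women:19:CA",
    "201000112:Weihua:men:21:CAT",
    "201000113:Weilei:women:18:CA",
    "201000114:Luozheng:men:19:CA",
    "201000115:Shangsi:women:20:CAT",
    "201000116:Fandong:men:19:CST",
    "201000117:Laosh:women:22:CAT",
    "201000118:Haha:men:19:CA",
]

# Track the maximum age seen per department.
max_age = {}
for line in data:
    sno, sname, ssex, sage, sdept = line.split(":")
    max_age[sdept] = max(max_age.get(sdept, 0), int(sage))

for dept in sorted(max_age):
    print((dept, max_age[dept]))
# prints ('CA', 20), ('CAT', 22), ('CST', 21)
```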
Look up the matching rows:

CA = filter A by sdept == 'CA' and sage == 20;    (the oldest student(s) in CA)
CAT0 = filter A by sdept == 'CAT' and sage == 22; (CAT itself cannot be used as the alias) (the oldest student(s) in CAT)
CST = filter A by sdept == 'CST' and sage == 21;  (the oldest student(s) in CST)
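The three filter statements can likewise be sketched in plain Python: keep each row whose age equals its department's maximum (the maxima (CA,20), (CAT,22), (CST,21) come from the MAX step above):

```python
# Simulate "filter A by sdept == ... and sage == ..." for each department.
data = [
    "201000101:Lihua:men:20:CST",
    "201000102:Wangli:women:19:CST",
    "201000103:Xiangming:women:18:CAT",
    "201000104:Lixiao:men:19:CST",
    "201000105:Wuda:women:19:CA",
    "201000106:Huake:men:21:CST",
    "201000107:Beihang:men:20:CA",
    "201000108:Bob:women:17:CAT",
    "201000109:Smith:men:19:CAT",
    "201000110:Gxl:men:19:CST",
    "201000111:Songwei:women:19:CA",
    "201000112:Weihua:men:21:CAT",
    "201000113:Weilei:women:18:CA",
    "201000114:Luozheng:men:19:CA",
    "201000115:Shangsi:women:20:CAT",
    "201000116:Fandong:men:19:CST",
    "201000117:Laosh:women:22:CAT",
    "201000118:Haha:men:19:CA",
]
rows = [line.split(":") for line in data]

# Per-department maxima found by the MAX step.
dept_max = {"CA": 20, "CAT": 22, "CST": 21}

# Keep rows whose age (field 3) equals their department's (field 4) maximum.
oldest = [r for r in rows if int(r[3]) == dept_max[r[4]]]
for r in oldest:
    print(r)
# prints the full records of Huake (CST), Beihang (CA), and Laosh (CAT)
```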