Check which Pig version matches your Hadoop release: http://www.aboutyun.com/blog-61-62.html
First, start Hadoop using start-dfs.sh and start-yarn.sh.
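A quick sanity check after starting the daemons (a minimal sketch; jps ships with the JDK, and the exact process names depend on your Hadoop version and node layout):

start-dfs.sh
start-yarn.sh
jps    # expect NameNode/DataNode for HDFS and ResourceManager/NodeManager for YARN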
Add the following lines to the /home/hadoop/.bashrc file:
# set Java, Hadoop, Pig, HBase, and Hive environment
PIG_HOME=/home/hadoop/pig-0.9.2
HBASE_HOME=/home/hadoop/hbase-0.94.3
HIVE_HOME=/home/hadoop/hive-0.9.0
HADOOP_HOME=/home/hadoop/hadoop-1.1.1
JAVA_HOME=/home/hadoop/jdk1.7.0
PATH=$JAVA_HOME/bin:$PIG_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin:$HADOOP_HOME/bin:$PATH
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$HBASE_HOME/lib:$PIG_HOME/lib:$HIVE_HOME/lib:$JAVA_HOME/lib/tools.jar
export PIG_HOME
export HBASE_HOME
export HADOOP_HOME
export JAVA_HOME
export HIVE_HOME
export PATH
export CLASSPATH
Reboot the machine, or use the source command to make the changes take effect:
Change to the directory containing .bashrc and run:
source .bashrc
If running the pig command reports "permission denied", run chmod +x pig to make it executable.
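To confirm the environment took effect, you can check that pig now resolves from PATH and reports its version (standard commands; the path shown assumes the layout configured above):

which pig       # should print /home/hadoop/pig-0.9.2/bin/pig
pig -version    # should report the Apache Pig release, e.g. 0.9.2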
Now run a simple example: extract the first column of the Linux /etc/passwd file and print it, running in MapReduce mode. The result is a listing of all usernames.
First, put the /etc/passwd file onto Hadoop's HDFS with the following command:
hadoop fs -put /etc/passwd /user/root/passwd
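It is worth verifying that the upload landed before starting Pig (standard hadoop fs commands; the byte count will vary with your passwd file):

hadoop fs -ls /user/root/passwd
hadoop fs -cat /user/root/passwd | head -3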
Then enter the Pig shell and run the following commands: load the file into A, splitting on ':'; put the first column of A into B; then dump B to print it.
[root@hadoop-namenodenew]# pig
grunt> A = load 'passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;
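dump only prints to the console; to persist the result on HDFS instead, you could store the relation (a sketch; 'passwd_ids' is a hypothetical output directory and must not already exist, or the job will fail):

grunt> store B into 'passwd_ids' using PigStorage();
grunt> fs -cat passwd_ids/part*    # grunt passes fs commands through to HDFS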
Dumping B produces output like the following:
(tens of thousands of lines omitted....)
Input(s):
Successfully read 29 records (1748 bytes) from: "hdfs://192.168.12.67:8020/user/root/passwd"
Output(s):
Successfully stored 29 records (325 bytes) in: "hdfs://192.168.12.67:8020/tmp/temp1558767875/tmp-1327634226"
Counters:
Total records written : 29
Total bytes written : 325
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
Job DAG:
job_1401631066126_0005
(tens of thousands of lines omitted....)
(root)
(bin)
(daemon)
(adm)
(lp)
(sync)
(shutdown)
(halt)
(mail)
(uucp)
(operator)
(games)
(gopher)
(ftp)
(nobody)
(dbus)
(vcsa)
(rpc)
(abrt)
(rpcuser)
(nfsnobody)
(haldaemon)
(ntp)
(saslauth)
(postfix)
(sshd)
(tcpdump)
(oprofile)
(riak)
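For quick experiments without touching the cluster, Pig can also run in local mode, where paths refer to the local filesystem rather than HDFS (the -x local flag is standard Pig):

pig -x local
grunt> A = load '/etc/passwd' using PigStorage(':');
grunt> B = foreach A generate $0 as id;
grunt> dump B;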