This Pig installation is done on a single Hadoop pseudo-distributed node.
Pig is a project that Yahoo donated to Apache. It offers an SQL-like, high-level query language built on top of MapReduce: its operations are compiled into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
Pig is a client-side application; even to run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.
First, download the Pig tarball from the official site and upload it to the server. Extract it with:
[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz
For easier configuration, rename the extracted directory:
[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2
Add the Pig environment variables to the hadoop user's .bash_profile:
[hadoop@hadoop1 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/usr/lib/jvm/java-1.7.0/
export HADOOP_HOME=/home/hadoop/hadoop2
export PIG_HOME=/home/hadoop/pig2
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin
[hadoop@hadoop1 ~]$ source .bash_profile
Pig has two execution modes:
The first is local mode. In this mode Pig runs inside a single JVM and accesses the local filesystem; it is only suitable for small data sets and is generally used just to try Pig out. It does not use Hadoop's local job runner: Pig translates the query into a physical plan and executes it itself.
Type
% pig -x local
in a terminal to enter local mode.
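As a quick smoke test of local mode, you can prepare a tiny data file and a minimal Pig Latin script like the one below (the file names and sample data are made up for illustration):

```shell
# Create a small tab-separated sample file on the local filesystem.
mkdir -p /tmp/pig-demo
printf 'alice\t20\nbob\t22\n' > /tmp/pig-demo/students.txt

# A minimal Pig Latin script: load the file and keep rows with age > 21.
cat > /tmp/pig-demo/demo.pig <<'EOF'
A = LOAD '/tmp/pig-demo/students.txt' AS (name:chararray, age:int);
B = FILTER A BY age > 21;
DUMP B;
EOF

# Run it in local mode (assumes pig is on the PATH):
# pig -x local /tmp/pig-demo/demo.pig
```

Because local mode reads the local filesystem, no HDFS upload is needed for this kind of experiment.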
The second is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster; the cluster can be either fully distributed or pseudo-distributed.
[hadoop@hadoop1 ~]$ pig
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-09-10 21:04:09,149 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-09-10 21:04:09,150 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log
2014-09-10 21:04:09,435 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-10 21:04:10,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000
2014-09-10 21:04:10,360 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-09-10 21:04:12,820 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:12,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001
2014-09-10 21:04:12,831 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
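From here, Pig Latin statements typed at the grunt prompt are run as MapReduce jobs on the cluster. As a sketch, a classic word count could be written into a script file like this (the HDFS input and output paths below are hypothetical):

```shell
# Word count in Pig Latin; in MapReduce mode LOAD/STORE paths refer to HDFS.
cat > /tmp/wordcount.pig <<'EOF'
lines  = LOAD '/user/hadoop/input/sample.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO '/user/hadoop/output/wordcount';
EOF

# Submit it to the cluster (MapReduce mode is the default):
# pig /tmp/wordcount.pig
```

The same script runs unchanged in local mode with `pig -x local`, reading and writing local paths instead of HDFS.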
grunt> help
Commands:
<pig latin statement>; - See the Pig Latin manual for details: http://hadoop.apache.org/pig
File system commands:
fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
describe <alias>[::<alias>] - Show the schema for the alias. Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and write the results to stdout.
Utility Commands:
exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop jobid.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<scriptname>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.
grunt>
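To illustrate a few of these commands together, a hypothetical grunt session might look like the following (the alias and HDFS path are made up):

```
grunt> set default_parallel 4;
grunt> fs -ls /user/hadoop
grunt> A = LOAD '/user/hadoop/input/sample.txt' AS (line:chararray);
grunt> describe A;
A: {line: chararray}
grunt> quit
```

Here `set` tunes reduce parallelism for the session, `fs` runs an HDFS shell command without leaving grunt, and `describe` prints the schema of an alias before any job is actually launched.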