This Pig installation is done on a single Hadoop pseudo-distributed node.
Pig is a project that Yahoo donated to Apache. It offers an SQL-like, high-level query language built on top of MapReduce: its operations are compiled into the Map and Reduce phases of the MapReduce model, and users can define their own functions.
Pig is a client-side application; even to run Pig against a Hadoop cluster, nothing extra needs to be installed on the cluster itself.
First, download the Pig tarball from the official site and upload it to the server. Extract it with:
[hadoop@hadoop1 soft]$ tar -zxvf pig-0.13.0.tar.gz
For easier configuration, rename the extracted directory:
[hadoop@hadoop1 ~]$ mv pig-0.13.0 pig2
Add the Pig environment variables to the hadoop user's .bash_profile:
[hadoop@hadoop1 ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
export JAVA_HOME=/usr/lib/jvm/java-1.7.0/
export HADOOP_HOME=/home/hadoop/hadoop2
export PIG_HOME=/home/hadoop/pig2
export PIG_CLASSPATH=$HADOOP_HOME/etc/hadoop/
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PIG_HOME/bin
[hadoop@hadoop1 ~]$ source .bash_profile
Pig has two execution modes:
The first is local mode. In this mode Pig runs inside a single JVM and accesses the local filesystem; it is only suitable for small data sets and is generally used just to try Pig out. It does not use Hadoop's local job runner: Pig translates the query into a physical plan and executes it itself.
Type
% pig -x local
in a terminal to enter local mode.
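As a quick smoke test of local mode, you can prepare a tiny data file and a minimal Pig Latin script like the one below (the file names and sample data are made up for illustration):

```shell
# Create a small tab-separated sample file on the local filesystem.
mkdir -p /tmp/pig-demo
printf 'alice\t20\nbob\t22\n' > /tmp/pig-demo/students.txt

# A minimal Pig Latin script: load the file and keep rows with age > 21.
cat > /tmp/pig-demo/demo.pig <<'EOF'
A = LOAD '/tmp/pig-demo/students.txt' AS (name:chararray, age:int);
B = FILTER A BY age > 21;
DUMP B;
EOF

# Run it in local mode (assumes pig is on the PATH):
# pig -x local /tmp/pig-demo/demo.pig
```

Because local mode reads the local filesystem, no HDFS upload is needed for this kind of experiment.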
The second is Hadoop (MapReduce) mode. Only in this mode does Pig actually translate queries into MapReduce jobs and submit them to a Hadoop cluster; the cluster can be either fully distributed or pseudo-distributed.
[hadoop@hadoop1 ~]$ pig
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : LOCAL
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Trying ExecType : MAPREDUCE
14/09/10 21:04:08 INFO pig.ExecTypeProvider: Picked MAPREDUCE as the ExecType
2014-09-10 21:04:09,149 [main] INFO org.apache.pig.Main - Apache Pig version 0.13.0 (r1606446) compiled Jun 29 2014, 02:27:58
2014-09-10 21:04:09,150 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig2/pig-err.log
2014-09-10 21:04:09,435 [main] INFO org.apache.pig.impl.util.Utils - Default bootup file /home/hadoop/.pigbootup not found
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:10,345 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-09-10 21:04:10,346 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://hadoop1:9000
2014-09-10 21:04:10,360 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
2014-09-10 21:04:12,820 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2014-09-10 21:04:12,821 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: hadoop1:9001
2014-09-10 21:04:12,831 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
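From here, Pig Latin statements typed at the grunt prompt are run as MapReduce jobs on the cluster. As a sketch, a classic word count could be written into a script file like this (the HDFS input and output paths below are hypothetical):

```shell
# Word count in Pig Latin; in MapReduce mode LOAD/STORE paths refer to HDFS.
cat > /tmp/wordcount.pig <<'EOF'
lines  = LOAD '/user/hadoop/input/sample.txt' AS (line:chararray);
words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grpd   = GROUP words BY word;
counts = FOREACH grpd GENERATE group, COUNT(words);
STORE counts INTO '/user/hadoop/output/wordcount';
EOF

# Submit it to the cluster (MapReduce mode is the default):
# pig /tmp/wordcount.pig
```

The same script runs unchanged in local mode with `pig -x local`, reading and writing local paths instead of HDFS.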
grunt> help
Commands:
<pig latin statement>; - See the Pig Latin manual for details: http://hadoop.apache.org/pig
File system commands:
fs <fs arguments> - Equivalent to Hadoop dfs command: http://hadoop.apache.org/common/docs/current/hdfs_shell.html
Diagnostic commands:
describe <alias>[::<alias>] - Show the schema for the alias. Inner aliases can be described as A::B.
explain [-script <pigscript>] [-out <path>] [-brief] [-dot|-xml] [-param <param_name>=<param_value>]
[-param_file <file_name>] [<alias>] - Show the execution plan to compute the alias or for entire script.
-script - Explain the entire script.
-out - Store the output into directory rather than print to stdout.
-brief - Don't expand nested plans (presenting a smaller graph for overview).
-dot - Generate the output in .dot format. Default is text format.
-xml - Generate the output in .xml format. Default is text format.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
alias - Alias to explain.
dump <alias> - Compute the alias and write the results to stdout.
Utility Commands:
exec [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment including aliases.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
run [-param <param_name>=param_value] [-param_file <file_name>] <script> -
Execute the script with access to grunt environment.
-param <param_name> - See parameter substitution for details.
-param_file <file_name> - See parameter substitution for details.
script - Script to be executed.
sh <shell command> - Invoke a shell command.
kill <job_id> - Kill the hadoop job specified by the hadoop jobid.
set <key> <value> - Provide execution parameters to Pig. Keys and values are case sensitive.
The following keys are supported:
default_parallel - Script-level reduce parallelism. Basic input size heuristics used by default.
debug - Set debug on or off. Default is off.
job.name - Single-quoted name for jobs. Default is PigLatin:<scriptname>
job.priority - Priority for jobs. Values: very_low, low, normal, high, very_high. Default is normal
stream.skippath - String that contains the path. This is used by streaming.
any hadoop property.
help - Display this message.
history [-n] - Display the list statements in cache.
-n Hide line numbers.
quit - Quit the grunt shell.
grunt>
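To illustrate a few of these commands together, a hypothetical grunt session might look like the following (the alias and HDFS path are made up):

```
grunt> set default_parallel 4;
grunt> fs -ls /user/hadoop
grunt> A = LOAD '/user/hadoop/input/sample.txt' AS (line:chararray);
grunt> describe A;
A: {line: chararray}
grunt> quit
```

Here `set` tunes reduce parallelism for the session, `fs` runs an HDFS shell command without leaving grunt, and `describe` prints the schema of an alias before any job is actually launched.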