Using Pig to count the number of hits from each IP address

The log file has the following format:
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
1. Download Pig:
Download URL: http://www.apache.org/dyn/closer.cgi/pig

2. Install Pig:
Extract the archive
[grid@hadoop1 ~]$ tar -zxf pig-0.14.0.tar.gz

Set the environment variables
[grid@hadoop1 ~]$ vi .bash_profile
PIG_INSTALL=/home/grid/pig-0.14.0
PIG_CLASSPATH=/home/grid/hadoop-1.2.1/conf/
PATH=$PATH:$PIG_INSTALL/bin
export PIG_INSTALL PATH PIG_CLASSPATH

Set JAVA_HOME
Edit the hosts file (so that cluster hostnames such as hadoop1 resolve)

Verify the installation
[grid@hadoop1 ~]$ pig -help

Connect to the Hadoop cluster
[grid@hadoop1 ~]$ pig
grunt> ls
hdfs://hadoop1:9000/user/grid/in    <dir>
hdfs://hadoop1:9000/user/grid/out    <dir>

3. Run the job
Load the data
grunt> A = LOAD 'in/8/access_log.txt' USING PigStorage (' ') AS ( ip, page);
grunt> DESCRIBE A;
A: {ip: bytearray,page: bytearray}
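To make the effect of the space delimiter concrete, here is a minimal Python sketch (not part of the Pig job) that splits one of the sample log lines on single spaces. The helper name `split_fields` is hypothetical. It suggests that the first field is the client IP, and that the second field, which the schema above names `page`, actually holds the `-` placeholder in these sample lines. That mismatch is harmless here because only `ip` is used afterwards.

```python
# One of the sample lines from the log excerpt at the top of this article.
LINE = (
    '208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] '
    '"GET /robots.txt HTTP/1.1" 200 582 "-" '
    '"Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"'
)


def split_fields(log_line):
    """Split a log line on single spaces (hypothetical helper, for illustration)."""
    return log_line.rstrip("\n").split(" ")


fields = split_fields(LINE)
ip = fields[0]        # first field: the client IP
second = fields[1]    # second field: "-" in this sample, not a page

print("ip     =", ip)
print("second =", second)
```

If the requested path were needed, it would be the seventh space-separated field (`fields[6]`) of this line, so the schema would have to name more columns or use a different parsing approach.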
Drop the fields that are not needed
grunt> B = FOREACH A GENERATE ip;
Group by IP
grunt> C = GROUP B BY ip;
grunt> DESCRIBE C;
C: {group: bytearray,B: {(ip: bytearray)}}
Count the hits in each group
grunt> D = FOREACH C GENERATE group AS ip, COUNT(B) AS count;
View the results
grunt> DUMP D;
(127.0.0.1,2)
(1.59.65.67,2)
(112.4.2.19,9)
(112.4.2.51,80)
(60.2.99.33,42)
(69.28.58.5,1)
(69.28.58.6,9)
(69.28.58.8,5)
(1.193.3.227,3)
(1.202.221.3,6)
(117.136.9.4,6)
(121.31.62.3,26)
(182.204.8.4,59)
(183.9.112.2,25)
(221.12.37.6,25)
(223.4.16.88,2)
(27.9.110.75,122)
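For readers who want to sanity-check the numbers on a small file without a cluster, the same project, group and count logic can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `count_hits_per_ip` is hypothetical, the sample lines are abbreviated stand-ins rather than real log records, and the sketch skips blank lines, which the Pig job above does not do explicitly.

```python
from collections import Counter


def count_hits_per_ip(lines):
    """Count lines per client IP, taking the first space-separated field as the IP."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: ignore blank lines
        ip = line.split(" ", 1)[0]  # keep only the ip field
        counts[ip] += 1             # group by ip and count
    return counts


# Abbreviated, made-up lines for illustration only.
sample_lines = [
    '127.0.0.1 - - [31/Jan/2012:00:02:32 +0800] "GET /a HTTP/1.1" 200 10',
    '1.59.65.67 - - [31/Jan/2012:00:03:10 +0800] "GET /b HTTP/1.1" 200 20',
    '127.0.0.1 - - [31/Jan/2012:00:04:01 +0800] "GET /c HTTP/1.1" 200 30',
]

counts = count_hits_per_ip(sample_lines)
for ip, n in sorted(counts.items()):
    print("({},{})".format(ip, n))
```

To run it against a real file, pass an open file object instead of the list, for example `count_hits_per_ip(open("access_log.txt"))`.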