Using AWK to filter specific values out of nginx logs

  Although this article is labeled as original, it actually contains a lot of help from friends, and I want to thank them here!
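  The day before yesterday, a developer colleague asked me to help analyze the nginx access log. I used AWStats to produce charts, but he said he didn't want charts; he only wanted 4 values from the access log (he could have said so earlier!). I looked at the nginx log format; here is an excerpt: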
124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "uid=355288&cuid=355287&timestamp=1264484517&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -
124.240.39.49 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "cid=2&lid=4&oid=2&action=researchLayer&cuid=496990&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18" 200 219 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30618)" -
121.236.118.126 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "check=8ec1521fc3df9c03d83af9a4d933dbb0&cuid=509590&timestamp=1264398703&oid=2&action=oreInfo" 200 261 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 1.7; .NET CLR 2.0.50727)" -
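To make the layout of these lines explicit, here is a minimal Python sketch that splits one line into named fields with a regular expression. The pattern is an assumption inferred only from the three sample lines above; it is not read from the server's actual nginx log_format. The group names (ip, time, params and so on) and the function parse_line are hypothetical labels.

```python
import re

# Assumed layout, inferred from the sample lines only:
# IP - - [time zone] "request" "parameter string" status bytes "referer" "user agent" -
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<time>[^ \]]+) (?P<tz>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'"(?P<params>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" '
    r'"(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the named fields of one log line as a dict, or None if it doesn't fit."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


sample = (
    '124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] '
    '"POST /design/game.php HTTP/1.1" '
    '"uid=355288&cuid=355287&timestamp=1264484517'
    '&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" '
    '200 1019 "http://www.ime.com/design/flash/main.swf?v=439/`DYNAMIC`/1" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" -'
)

rec = parse_line(sample)
print(rec["ip"], rec["time"], rec["params"])
```

The parameter string (params) is where cuid and action live, and their order inside it varies from line to line.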
My colleague wanted the IP address, the time, and the values of cuid= and action=.
It looks messy, but there is still a pattern. Many lines contain neither action nor cuid, so I filtered those out first:
awk '/action/{print $0}' access.log > action.log
Because any line that has action will certainly also have cuid, filtering on action alone is enough.
Now every remaining line contains both cuid and action.
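The rule that every line with action also has cuid is an assumption about this application's logs, and it is cheap to check. Below is a minimal Python sketch that keeps the same lines as awk '/action/' and also counts kept lines lacking cuid. The function name filter_action and the demo lines are hypothetical.

```python
def filter_action(lines):
    """Keep the lines that contain "action" (what awk '/action/' prints) and
    count how many of those kept lines do not contain "cuid"."""
    kept = [line for line in lines if "action" in line]
    missing_cuid = sum(1 for line in kept if "cuid" not in line)
    return kept, missing_cuid


# Hypothetical, shortened lines for illustration only.
demo = [
    '1.2.3.4 - - [25/Jan/2010:13:42:07 +0800] "GET /index.html HTTP/1.1" "-" 200 10',
    '1.2.3.5 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "cuid=1&action=oreInfo" 200 10',
    '1.2.3.6 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" "action=initData" 200 10',
]
kept, missing = filter_action(demo)
print(len(kept), missing)
```

On the real file, `with open("access.log") as f: kept, missing = filter_action(f)` works the same way. A non-zero count means the single filter on action is not enough.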
Next, I changed the format a bit to make it clearer:
awk -F "[ '&''[']" '{print $1"\t"$5"\t"$10"\t"$11"\t"$12"\t"$13"\t"$14"\t"$15}' action.log > newlog
This is a bit clumsy, but it does make the output clearer. Here is the result:
117.83.131.36    25/Jan/2010:14:31:34    "uid=438824    cuid=511252    timestamp=1264401079    check=fbb9ad922f01888e6c0757d117bf304e    action=plantInfo"    200
221.9.32.181    25/Jan/2010:14:31:34    "cuid=506517    action=plantInfo    timestamp=1264401075    check=01661377f346538eba790e856dd3713a    uid=539860"    200
221.178.128.146    25/Jan/2010:14:31:34    "timestamp=1264401105    check=7d5e41feeb3ae0482e1fe990f27ddc67    cuid=303367    display=1    action=plantInfuid=303367"
124.131.80.68    25/Jan/2010:14:31:34    "cuid=393678    timestamp=1264401093    action=checkResearchLayer    check=2f2cc50cc99aa9e05f02b6f6a47cbef6"200    765
125.107.199.28    25/Jan/2010:14:31:34    "timestamp=1264401094    oid=4    uid=350003    action=oreInfo    check=5d835e252b841c86da041b8b63b4b67e    cuid=356549"
111.167.145.209    25/Jan/2010:14:31:34    "action=plantInfo    cuid=154228    timestamp=1264401094    check=5d835e252b841c86da041b8b63b4b67e    uid=372981"    200
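Emulating the separator shows why the awk command above prints $5 as the time and $10 to $15 as the parameters. This is a Python sketch, assuming the shell passes awk the bracket expression for space, single quote, & and [. The repeated quotes in the -F string only add duplicates to that set. There is no +, so the space followed by [ yields an empty $4. The helper name awk_fields is hypothetical.

```python
import re

# awk -F "[ '&''[']" : after shell quoting, awk gets a bracket expression
# matching ONE character out of: space, single quote, &, [.
FS = re.compile(r"[ '&\[]")


def awk_fields(record):
    """Emulate awk's fields for this FS: index 0 is $0, index 1 is $1, ..."""
    return [record] + FS.split(record)


line1 = ('124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" '
         '"uid=355288&cuid=355287&timestamp=1264484517'
         '&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019')
line2 = ('124.240.39.49 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" '
         '"cid=2&lid=4&oid=2&action=researchLayer&cuid=496990'
         '&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18" 200 219')

f1, f2 = awk_fields(line1), awk_fields(line2)
print(repr(f1[4]), f1[5], f1[11], f2[11], f2[14])
```

In the first sample line cuid is $11, while in the second it is $14. That is why no fixed set of column numbers can pick it out.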

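At this point I got stuck: cuid and action don't sit in fixed columns, so simple AWK field selection won't work, and AWK loops and conditionals are needed. I had never done that, so I posted a request for help in a chat group. Two friends replied: 輝太郎 and jeremy.zhang.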
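Before looking at their answers, here is a different technique that is neither friend's method. The parameter string can be treated like a URL query string and looked up by key, so column position stops mattering. The sketch below uses Python's urllib.parse.parse_qsl, and the helper name pick is hypothetical. Note that parse_qsl also percent-decodes values and turns + into spaces, which may or may not be wanted.

```python
from urllib.parse import parse_qsl


def pick(params, keys=("cuid", "action")):
    """Look up keys by name in an 'a=1&b=2' string, wherever they appear.
    Missing keys map to None; if a key repeats, the last value wins."""
    pairs = dict(parse_qsl(params.strip('"')))
    return {key: pairs.get(key) for key in keys}


print(pick('"uid=355288&cuid=355287&timestamp=1264484517'
           '&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo"'))
print(pick('"cid=2&lid=4&oid=2&action=researchLayer&cuid=496990'
           '&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18"'))
```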
The two friends' solutions differ: one uses a Perl script, the other uses awk directly.
Let's start with Perl. I don't really know Perl either, so I'll just paste the script he wrote:
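#!/usr/bin/perl -w
use strict;

# Read the file filtered out earlier, one line at a time
open(my $fh, '<', '/mnt/disk/newlog') || die "$!";
while (my $str = <$fh>) {
    # Text before the first "[" (the IP column)
    if ($str =~ m/(.*?)\[/s) {
        print $1;
    }
    # Text between "[" and the next double quote (the time column)
    if ($str =~ m/\[(.*?)\"/s) {
        print $1;
    }
    # Digits after cuid=
    if ($str =~ m/cuid=(\d+)/s) {
        print "cuid=", $1, "\t";
    }
    # Word after action=; the newline is printed only in this branch,
    # so a line without a match runs into the next record
    if ($str =~ m/action=(\w+)/s) {
        print "action=", $1, "\n";
    }
}
close($fh);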

/mnt/disk/newlog is the file I filtered out earlier. Run the script with perl:
perl 1.sh > newlog1
However, when I ran it, the output format came out slightly off:
124.197.61.124  25/Jan/2010:14:42:17    cuid=430334     action=plantInfo
124.79.7.236    25/Jan/2010:14:42:17    cuid=318701     action=petsInfo
122.230.66.90   25/Jan/2010:14:42:17    cuid=223422     action=compQuest
113.128.147.225 25/Jan/2010:14:42:17    cuid=362043     action=plantInfo
220.184.20.99   25/Jan/2010:14:42:17    cuid=484582     action=wordInfo
222.161.49.201  25/Jan/2010:14:42:17    cuid=304167     218.95.48.90    25/Jan/2010:14:42:17    cuid=476480     action=plantInfo
218.106.242.20  25/Jan/2010:14:42:17    cuid=501942     action=oreInfo
221.137.223.58  25/Jan/2010:14:42:17    cuid=445595     action=takeQuest
124.126.155.202 25/Jan/2010:14:42:17    cuid=0  action=initData
113.224.227.68  25/Jan/2010:14:42:17    cuid=529218     action=editName
121.4.66.146    25/Jan/2010:14:42:17    cuid=187626     action=researchLayer
220.190.82.170  25/Jan/2010:14:42:17    cuid=62789      action=steal
218.5.38.250    25/Jan/2010:14:42:17    cuid=456212     124.90.203.86   25/Jan/2010:14:42:17    cuid=492016     action=oreInfo

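Some records run together, for example the one starting with 222.161.49.201. This happens because the script prints the newline only inside the action branch: when action=(\w+) does not match a line, nothing ends that record, and the next one is appended to it. Still, on the whole the result was acceptable. Thanks, 輝太郎!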
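The Perl script prints the newline only when action=(\w+) matches, so one record can run into the next. Below is a Python sketch of the same one-regex-per-value idea that always emits exactly one record per input line, leaving a value empty when it is missing. It assumes raw access-log lines as input rather than the intermediate file, and the function names extract and extract_all are hypothetical.

```python
import re

# One pattern per wanted value, applied to a raw access-log line.
IP_RE = re.compile(r"^(\S+)")
TIME_RE = re.compile(r"\[([^\s\]]+)")
CUID_RE = re.compile(r"\bcuid=(\d+)")
ACTION_RE = re.compile(r"\baction=(\w+)")


def extract(line):
    """Return 'IP<TAB>time<TAB>cuid=..<TAB>action=..' for one line.
    A value that is not found is left empty, so every input line
    produces exactly one output record."""
    def first(rx):
        m = rx.search(line)
        return m.group(1) if m else ""

    return "\t".join([first(IP_RE), first(TIME_RE),
                      "cuid=" + first(CUID_RE), "action=" + first(ACTION_RE)])


def extract_all(lines):
    # Joining with "\n" guarantees one record per line, instead of
    # printing the newline only when action matched.
    return "\n".join(extract(line) for line in lines)
```

The \b before cuid keeps the pattern from matching inside a longer key name ending in cuid. It still matches the cuid= that follows & in uid=355288&cuid=355287.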
Now let's look at jeremy's awk commands.
Step 1: keep only the lines that contain action
awk '/action/{print $0}' access.log > tmp.log
Step 2: drop the columns we don't need
awk '{print $1"\t"$4"\t"$9}' tmp.log > action.log
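Step 2 relies on awk's default field splitting, where runs of spaces or tabs separate fields. The Python sketch below emulates that; the helper names awk_default_fields, field and step2 are hypothetical. It shows that $4 is the time with a leading [ and $9 is the whole parameter string. That only works because the parameter string contains no spaces.

```python
def awk_default_fields(record):
    """Emulate awk's default FS: runs of blanks separate fields and
    leading/trailing blanks are ignored. Index 0 is $0."""
    return [record] + record.split()


def field(fields, i):
    # In awk, a field past NF is simply the empty string.
    return fields[i] if i < len(fields) else ""


def step2(record):
    """Python equivalent of step 2: print $1, $4 and $9 separated by tabs."""
    f = awk_default_fields(record)
    return "\t".join([field(f, 1), field(f, 4), field(f, 9)])


line = ('124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" '
        '"uid=355288&cuid=355287&timestamp=1264484517'
        '&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019')
print(step2(line))
```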
Step 3: filter and print the IP, the time, cuid= and action=
awk -F"[ '['\"'&''=']+" '{printf "%s\t%s\t", $1, $2; for(i=3;i<=NF;i++){if($i=="cuid" || $i=="action") printf "%s=%s\t", $i, $(i+1)}; printf "\n"}' action.log > cuid_action.log
Here is the final result:
202.113.30.144        25/Jan/2010:13:42:07        cuid=181188    action=compound   
124.227.66.162        25/Jan/2010:13:42:07        cuid=355287    action=plantInfo   
124.240.39.49        25/Jan/2010:13:42:07        action=researchLayer    cuid=496990   
121.236.118.126        25/Jan/2010:13:42:07        cuid=509590    action=oreInfo   
113.139.18.82        25/Jan/2010:13:42:07        cuid=512461    action=oreInfo   
222.184.232.183        25/Jan/2010:13:42:07        cuid=520595    action=oreInfo   
218.59.80.95        25/Jan/2010:13:42:07        cuid=293339    action=questInfo   
221.6.38.37        25/Jan/2010:13:42:07        action=plantInfo    cuid=518015   
125.39.143.96        25/Jan/2010:13:42:07        cuid=133987    action=pkResult   
119.180.17.218        25/Jan/2010:13:42:07        cuid=452667    action=wordInfo
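The -F value in step 3 is hard to read. After the shell removes its double quotes, it is a bracket expression over space, single quote, [, double quote, & and =, followed by +. Any run of those characters therefore counts as one separator. The Python sketch below emulates that split and the loop on lines of action.log; the function name step3 is hypothetical.

```python
import re

# Jeremy's -F value after shell quoting: one or more characters from the set
# space, single quote, [, double quote, &, = . Tab is NOT in the set.
FS = re.compile(r"""[ '\["&=]+""")


def step3(record):
    """Emulate the step-3 awk program on one line of action.log."""
    f = [record] + FS.split(record)   # f[1] is $1, f[2] is $2, ...
    nf = len(f) - 1                   # awk's NF

    def field(i):
        # awk returns "" for a field past NF
        return f[i] if i <= nf else ""

    out = field(1) + "\t" + field(2) + "\t"   # $1, tab, $2, tab
    for i in range(3, nf + 1):                # for (i = 3; i <= NF; i++)
        if f[i] in ("cuid", "action"):        # a key name is followed by its value
            out += f[i] + "=" + field(i + 1) + "\t"
    return out


r1 = ('124.227.66.162\t[25/Jan/2010:13:42:07\t"uid=355288&cuid=355287'
      '&timestamp=1264484517&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo"')
r2 = ('124.240.39.49\t[25/Jan/2010:13:42:07\t"cid=2&lid=4&oid=2&action=researchLayer'
      '&cuid=496990&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18"')
print(repr(step3(r1)))
print(repr(step3(r2)))
```

Because tab is not a separator, $1 and $2 keep the tab written by step 2, so each is followed by two tabs in the output. This likely explains the wide gaps in the result above. The loop also keeps cuid and action in whatever order they appear in the log line.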
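These three steps could actually be merged into one, but keeping them separate makes things clearer. You can adapt the commands above to pull out whatever fields you need. I hope this helps.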
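As noted, the three steps can be merged. Here is a Python sketch of a single pass over the raw log lines; the function name one_pass is hypothetical. Instead of jeremy's regex separator, it splits the parameter string on & and then on =. Like the awk version, it keeps cuid and action in the order they appear in each line.

```python
def one_pass(lines):
    """Steps 1-3 in one pass over raw access-log lines.
    Yields 'IP<TAB>time<TAB>key=value...' keeping cuid/action in log order."""
    for line in lines:
        if "action" not in line:            # step 1: like awk '/action/'
            continue
        f = line.split()                    # step 2: default field splitting
        if len(f) < 9:                      # too short to have a $9; skip it
            continue
        ip, time, params = f[0], f[3].lstrip("["), f[8].strip('"')
        picked = []
        for pair in params.split("&"):      # step 3: scan the key=value pairs
            key, _, value = pair.partition("=")
            if key in ("cuid", "action"):
                picked.append(key + "=" + value)
        yield "\t".join([ip, time] + picked)


logs = [
    '124.227.66.162 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" '
    '"uid=355288&cuid=355287&timestamp=1264484517'
    '&check=68230e418e28a9d05b8cf1e2f7cbf392&action=plantInfo" 200 1019',
    '124.240.39.49 - - [25/Jan/2010:13:42:07 +0800] "POST /design/game.php HTTP/1.1" '
    '"cid=2&lid=4&oid=2&action=researchLayer&cuid=496990'
    '&timestamp=1264398138&check=b50cd4ade18c0797df24cb1a8828ae18" 200 219',
]
for row in one_pass(logs):
    print(row)
```

On the real file, `with open("access.log") as f:` followed by `for row in one_pass(f): print(row)` gives the same kind of output without intermediate files.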
Thanks again, jeremy!