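An exercise picked up from the instructor Oldboy (老男孩):
Process the file below: extract the domain name from each line, then count the occurrences and sort them (an interview question from Baidu and Sohu).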
oldboy.log
http://www.etiantian.org/index.html
http://www.etiantian.org/1.html
http://post.etiantian.org/index.html
http://mp3.etiantian.org/index.html
http://www.etiantian.org/3.html
http://post.etiantian.org/2.html
Shell implementations:
awk -F "/" '{print $3}' oldboy.log | sort -r | uniq -c
cut -d "/" -f3 oldboy.log | sort -r | uniq -c
cat oldboy.log | sed 's/^http:\/\///g' | sed 's/\/.*$//g' | sort -r | uniq -c

The three approaches above are fairly simple. A fourth uses an awk associative array:

awk -F "/" '{++S[$3]} END {for(key in S) print key,S[key]}' oldboy.log | sort -k2n
The fourth approach in detail:
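
The awk one-liner splits each line on "/" so that $3 holds the host name, increments S[$3] once per line, and prints every key with its count in the END block. A minimal Python sketch of the same counting logic follows, assuming one URL per line in the layout shown above. The name `count_hosts` and the inline `sample` list are hypothetical and not part of the original post.

```python
def count_hosts(lines):
    """Mimic awk -F "/" '{++S[$3]}' with a plain dict (awk's $3 is index 2 here)."""
    counts = {}
    for line in lines:
        fields = line.strip().split("/")
        if len(fields) < 3:
            # awk would count an empty key for such a line; this sketch skips it
            continue
        host = fields[2]
        counts[host] = counts.get(host, 0) + 1
    return counts


# Hypothetical in-memory copy of oldboy.log
sample = [
    "http://www.etiantian.org/index.html",
    "http://www.etiantian.org/1.html",
    "http://post.etiantian.org/index.html",
    "http://mp3.etiantian.org/index.html",
    "http://www.etiantian.org/3.html",
    "http://post.etiantian.org/2.html",
]

# Equivalent of the END block piped through a numeric sort on the count
for host, n in sorted(count_hosts(sample).items(), key=lambda kv: kv[1]):
    print(host, n)
# mp3.etiantian.org 1
# post.etiantian.org 2
# www.etiantian.org 3
```

The dict plays the role of the awk array S: keys are created on first use, and the iteration order only matters after the explicit sort.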
Python implementation:
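# coding: utf-8
import sys
from itertools import groupby

# ListFile = sys.argv[1]

def demo(ListFile):
    """Return a list of (domain, count) pairs for the URLs in ListFile."""
    reList = []
    # open() works on Python 2 and 3; file() exists only in Python 2
    with open(ListFile, 'r') as files:
        for item in files:
            # print(item)
            # "http://host/path".split("/") gives ['http:', '', 'host', 'path']
            rLIst = item.split("/")
            if len(rLIst) > 2:  # skip blank or malformed lines
                reList.append(rLIst[2])
    # groupby only merges adjacent equal items, so sort first
    result = [(a, len(list(b))) for a, b in groupby(sorted(reList))]
    return result

if __name__ == "__main__":
    # demo(ListFile)
    print(demo("/tmp/oldboy.log"))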
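
As an alternative sketch, the standard library's `collections.Counter` and `urllib.parse.urlparse` can stand in for the manual split and the `groupby` step. This assumes Python 3 and one URL per line. The name `count_domains` and the inline `sample` list are hypothetical and not from the original post.

```python
from collections import Counter
from urllib.parse import urlparse


def count_domains(lines):
    """Return (domain, count) pairs, most frequent first."""
    # netloc is the host part of the URL; it is empty for blank or non-URL lines
    hosts = (urlparse(line.strip()).netloc for line in lines)
    return Counter(h for h in hosts if h).most_common()


# Hypothetical in-memory copy of oldboy.log
sample = [
    "http://www.etiantian.org/index.html",
    "http://www.etiantian.org/1.html",
    "http://post.etiantian.org/index.html",
    "http://mp3.etiantian.org/index.html",
    "http://www.etiantian.org/3.html",
    "http://post.etiantian.org/2.html",
]

print(count_domains(sample))
# [('www.etiantian.org', 3), ('post.etiantian.org', 2), ('mp3.etiantian.org', 1)]
```

To run it against the file, pass an open file object instead of the list, for example `with open("/tmp/oldboy.log") as f: print(count_domains(f))`. Unlike the `groupby` version, `most_common()` returns the pairs already ordered by count, so no separate sort is needed.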