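Core idea:
Create a list, prefixList, to store parent-node information. Whenever a line starting with Tab+name is read, store that line's parent-node information in prefixList[number of Tabs]; that is, prefixList[i] holds the parent node whose line starts with i Tabs.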
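When a line starting with Tab+ptr is read, we have reached a leaf (child) node. Its parents (the prefix) must be prefixList[0] + ... + prefixList[number of Tabs - 1], so the final result is: prefix + the current leaf's information.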
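When another line starting with Tab+name is read, one of the parents of the leaves that follow has changed. We only need to overwrite the corresponding prefixList[number of Tabs], because no later node will need the old value stored there.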
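The prefix-list idea can be sketched in a self-contained way that works on in-memory lines instead of a file. This is a minimal sketch, not the author's code. It assumes exactly the sample format used in this post: one Tab per depth level, `name: [X]` for parents and `ptr->X-->[V]` for leaves. It handles one service body only, swaps the common.py helpers for regular expressions, and the names `flatten_trace`, `NAME_RE`, `PTR_RE` and `SAMPLE` are hypothetical.

```python
import re

# Assumed line shapes (taken from the sample trace in this post):
#   <Tabs>name: [X]       -> a parent node X at depth len(<Tabs>)
#   <Tabs>ptr->X-->[V]    -> a leaf X with value V
NAME_RE = re.compile(r'^(\t*)name:\s*\[([^\]]*)\]')
PTR_RE = re.compile(r'^(\t*)ptr->(.*?)-->\[([^\]]*)\]')


def flatten_trace(lines):
    """Return 'p0.p1...leaf[value]' for every leaf in one service body."""
    prefix = []  # prefix[i] = name of the most recent parent seen at depth i
    out = []
    for line in lines:
        m = NAME_RE.match(line)
        if m:
            depth = len(m.group(1))
            if depth >= len(prefix):
                prefix.extend([''] * (depth + 1 - len(prefix)))
            # Overwrite only this depth; deeper entries may be stale, but any
            # leaf below this node passes a name line that overwrites them first.
            prefix[depth] = m.group(2)
            continue
        m = PTR_RE.match(line)
        if m:
            depth = len(m.group(1))
            parents = prefix[:depth]  # prefix[0] .. prefix[depth - 1]
            out.append('%s[%s]' % ('.'.join(parents + [m.group(2)]), m.group(3)))
    return out


# Part of one service body from the sample trace, with real Tab characters.
SAMPLE = (
    "name: [1]\n{\n"
    "\tname:[11]\n\t{\n"
    "\t\tname: [111]\n\t\t{\n\t\t\tptr->1111-->[value0]\n\t\t}\n"
    "\t\tname: [112]\n\t\t{\n"
    "\t\t\tname: [1121]\n\t\t\t{\n\t\t\t\tptr->111211-->[value2]\n\t\t\t}\n"
    "\t\t}\n"
    "\t}\n"
    "\tname:[12]\n\t{\n\t\tptr->121-->[value3]\n\t}\n"
    "}\n"
).splitlines()

if __name__ == "__main__":
    for path in flatten_trace(SAMPLE):
        print(path)
```

The regular expressions stand in for getValue/getFiledNum only for brevity. The depth-indexed overwrite is the same mechanism as prefixList.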
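Implementation:
To simulate a debug trace, create a text file 1.txt with the following content. Each indentation level is one Tab character, because the parser counts Tabs to find each line's depth.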
service[hi]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr->1111-->[value0]
			ptr->1112-->[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr->111211-->[value2]
			}
		}
	}
	name:[12]
	{
		ptr->121-->[value3]
	}
	name:[13]
	{
		ptr->131-->[value4]
	}
}
service[Jeff]
name: [1]
{
	name:[11]
	{
		name: [111]
		{
			ptr->1111-->[value0]
			ptr->1112-->[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr->111211-->[value2]
			}
		}
	}
	name:[12]
	{
		ptr->121-->[value3]
	}
	name:[13]
	{
		ptr->131-->[value4]
	}
}
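Tabs are easy to lose when a trace is copied from a web page. They often turn into spaces, and the parser would then see depth 0 everywhere. As a convenience, the small script below regenerates the same sample with real Tab characters. `TREE`, `emit` and `build_sample_lines` are hypothetical names. The spacing after `name:` is normalized to one space, which the parser accepts either way because getValue only looks for `[` and `]`.

```python
# Hypothetical layout of one service body: an inner node is (name, [children]),
# a leaf is (name, value). It mirrors the sample 1.txt.
TREE = ("1", [
    ("11", [
        ("111", [("1111", "value0"), ("1112", "value1")]),
        ("112", [("1121", [("111211", "value2")])]),
    ]),
    ("12", [("121", "value3")]),
    ("13", [("131", "value4")]),
])


def emit(node, depth, out):
    """Append the trace lines for node, indented with `depth` Tab characters."""
    name, rest = node
    tabs = "\t" * depth
    if isinstance(rest, list):  # inner node: name line, then braced children
        out.append("%sname: [%s]" % (tabs, name))
        out.append(tabs + "{")
        for child in rest:
            emit(child, depth + 1, out)
        out.append(tabs + "}")
    else:  # leaf: a single ptr line
        out.append("%sptr->%s-->[%s]" % (tabs, name, rest))


def build_sample_lines(services=("hi", "Jeff")):
    """Return every line of the sample trace, one service block per name."""
    lines = []
    for service in services:
        lines.append("service[%s]" % service)
        emit(TREE, 0, lines)
    return lines


if __name__ == "__main__":
    with open("1.txt", "w") as f:
        f.write("\n".join(build_sample_lines()) + "\n")
```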
The parsing program is as follows:
1. common.py
'''
Created on 2012-5-28

@author: Jeff_Yu
'''


def getValue(string, key1, key2):
    """
    get the value between key1 and the first key2 after it in string
    """
    index1 = string.find(key1)
    # search for key2 only after key1, and skip the whole of key1 (not just 1 char)
    index2 = string.find(key2, index1 + len(key1))
    value = string[index1 + len(key1):index2]
    return value


def getFiledNum(string, key, begin):
    """
    get the number of key in string from begin position
    (counts every occurrence, not only leading ones)
    """
    keyNum = 0
    start = begin
    while True:
        index = string.find(key, start)
        if index == -1:
            break
        keyNum = keyNum + 1
        start = index + 1
    return keyNum
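Note that getFiledNum counts every Tab in the line, not only the leading ones, so a Tab inside a value would inflate the depth. A stricter variant is sketched below under that assumption. `leading_tabs` and `get_between` are hypothetical names. Unlike getValue, `get_between` returns None when a key is missing instead of a misleading slice.

```python
def leading_tabs(line):
    """Depth of a trace line: the number of Tab characters at its very start."""
    return len(line) - len(line.lstrip("\t"))


def get_between(line, left, right):
    """Text between the first `left` and the next `right` after it, or None."""
    start = line.find(left)
    if start == -1:
        return None
    start += len(left)
    end = line.find(right, start)
    if end == -1:
        return None
    return line[start:end]


if __name__ == "__main__":
    leaf = "\t\t\tptr->1111-->[value0]\n"
    print(leading_tabs(leaf), get_between(leaf, ">", "-->"), get_between(leaf, "[", "]"))
```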
2. main.py
'''
Created on 2012-6-1

@author: Jeff_Yu
'''
import common

fileNameRead = "1.txt"
fileNameWrite = '%s%s' % ("Result_", fileNameRead)

writeList = []
prefixList = []

with open(fileNameRead, 'r') as fr:
    for data in fr:
        # find the service name: each service starts a new tree
        if data.startswith("service"):
            # prefixList[i] holds the parent at depth i (i leading Tabs);
            # the first name always starts with 0 Tabs
            prefixList = ["0"] * 30
            writeList.append('%s\n' % data.rstrip('\r\n'))
            continue

        # find name: overwrite the parent stored at this depth
        if data.find("name") != -1:
            tabNumOfData = common.getFiledNum(data, '\t', 0)
            value = common.getValue(data, '[', ']')
            prefixList[tabNumOfData] = value + "."

        # find ptr: a leaf whose parents are prefixList[0 .. tabs - 1]
        if data.find("ptr") != -1:
            tabNumOfLeaf = common.getFiledNum(data, '\t', 0)
            valueOfLeaf = common.getValue(data, '[', ']')
            nameOfLeaf = common.getValue(data, '>', '-->')
            LeafPartstring = '%s[%s]' % (nameOfLeaf, valueOfLeaf)
            finalString = '%s%s' % (''.join(prefixList[:tabNumOfLeaf]), LeafPartstring)
            # append line to writeList
            writeList.append(finalString + "\n")

# write writeList to the result file in a single call
with open(fileNameWrite, 'w') as fw:
    fw.writelines(writeList)
Parsing result, Result_1.txt:
service[hi]
1.11.111.1111[value0]
1.11.111.1112[value1]
1.11.112.1121.111211[value2]
1.12.121[value3]
1.13.131[value4]
service[Jeff]
1.11.111.1111[value0]
1.11.111.1112[value1]
1.11.112.1121.111211[value2]
1.12.121[value3]
1.13.131[value4]
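The real trace file is more complicated than this one. Because it involves company information, I won't post that implementation, but the core idea is the same as above.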
This version is much faster: parsing a 5 MB file used to take more than 2 minutes, and now it takes only 1 second.
This version makes the following optimizations:
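1. String concatenation is rewritten in the form all = '%s%s%s%s' % (str0, str1, str2, str3).
2. The content to be written is kept in a list and written out all at once at the end with f.writelines(list).
3. The algorithm reduces the number of file reads: useful information is saved as soon as it is read, so the file never has to be read backwards.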
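Optimizations 1 and 2 can be illustrated with a small, self-contained comparison. This is a sketch, not the author's benchmark. `io.StringIO` stands in for the output file, and the record layout `(parents, leaf, value)` and both function names are assumptions. The timings it prints will vary by machine and are not meant to reproduce the 2-minutes-to-1-second figure, which also depends on optimization 3.

```python
import io
import timeit


def write_batched(f, records):
    """Format each line in one step, collect them in a list, write once."""
    lines = ['%s[%s]\n' % ('.'.join(list(parents) + [leaf]), value)
             for parents, leaf, value in records]
    f.writelines(lines)
    return len(lines)


def write_line_by_line(f, records):
    """Baseline: build each line with '+' and issue one write() per line."""
    for parents, leaf, value in records:
        line = ''
        for p in parents:
            line = line + p + '.'
        f.write(line + leaf + '[' + value + ']\n')


if __name__ == "__main__":
    records = [(["1", "11", "111"], "1111", "value0")] * 100000
    for fn in (write_line_by_line, write_batched):
        seconds = timeit.timeit(lambda: fn(io.StringIO(), records), number=5)
        print("%s: %.3f s" % (fn.__name__, seconds))
```

The batched version keeps the whole result in memory before writing. That is fine for a trace of a few megabytes but worth reconsidering for much larger inputs.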