Parsing a Tree-Structured File in Python

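Background:

I recently needed to parse some tree-structured debug trace text. To make it easier to read, I wanted to flatten each leaf into an a.b.c path.

A parent node and its child node are distinguished by one Tab of indentation, and a leaf node starts with ptr (after its leading Tabs).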
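As a small illustration of these two rules, the sketch below computes a line's depth by counting its leading Tabs and checks whether the line is a leaf. The helper names `depth_of` and `is_leaf` are made up for this example and are not part of the program shown later.

```python
def depth_of(line):
    """Return the number of leading Tab characters in line."""
    return len(line) - len(line.lstrip('\t'))


def is_leaf(line):
    """Return True if line starts with 'ptr' once leading Tabs are removed."""
    return line.lstrip('\t').startswith('ptr')


sample = "\t\t\tptr -- [1111]--[value0]"
print(depth_of(sample))              # 3
print(is_leaf(sample))               # True
print(is_leaf("\t\tname: [111]"))    # False
```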

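Core idea:

First find a leaf node, then walk backwards through the preceding lines to find each parent in turn (a parent has one Tab fewer than the current node). The walk stops at the top of the tree, that is, at the unindented name line just above the outermost "{".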
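The backward walk can be tried out on a few lines held in memory. This is only a minimal sketch: the function name `leaf_path`, the regular expression used to pull out the first bracketed value, and the tiny sample trace are assumptions made for illustration, and the walk is assumed to end once the unindented name line has been found.

```python
import re


def leaf_path(lines, leaf_index):
    """Build the dotted path for the leaf at lines[leaf_index]."""
    leaf = lines[leaf_index]
    depth = len(leaf) - len(leaf.lstrip('\t'))
    # The first [...] on a line holds the node's value
    parts = [re.search(r'\[(.*?)\]', leaf).group(1)]

    wanted = depth - 1      # Tab count of the parent being sought
    i = leaf_index - 1
    while wanted >= 0 and i >= 0:
        line = lines[i]
        if line.startswith('\t' * wanted + 'name'):
            parts.append(re.search(r'\[(.*?)\]', line).group(1))
            wanted -= 1
        i -= 1

    # Parents were collected leaf-first, so reverse before joining
    return '.'.join(reversed(parts))


trace = [
    "name: [1]",
    "{",
    "\tname:[11]",
    "\t{",
    "\t\tptr -- [111]--[value0]",
    "\t}",
    "\tname:[12]",
    "\t{",
    "\t\tptr -- [121]--[value1]",
    "\t}",
    "}",
]
print(leaf_path(trace, 4))   # 1.11.111
print(leaf_path(trace, 8))   # 1.12.121
```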

To simulate a debug trace, create a text file named 1.txt.

Its contents are as follows:

service[hi]
name: [1]
{
	name:[11]
	{	
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}

		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}
service[Jeff]
name: [1]
{
	name:[11]
	{	
		name: [111]
		{
			ptr -- [1111]--[value0]
			ptr -- [1112]--[value1]
		}
		name: [112]
		{
			name: [1121]
			{
				ptr -- [111211]--[value2]
			}

		}
	}
	name:[12]
	{
		ptr -- [121]--[value3]
	}
	name:[13]
	{
		ptr -- [131]--[value4]
	}
}

The parsing program is as follows:

1. common.py

'''
Created on 2012-5-26

author: Jeff
'''

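def getValue(string, key1, key2):
    """
    get the value between the first key1 and the next key2 in string
    """
    index1 = string.find(key1)
    # start after key1 so that a multi-character key1 is skipped in full
    start = index1 + len(key1)
    # look for key2 only after key1
    index2 = string.find(key2, start)

    value = string[start:index2]
    return value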

def getFiledNum(string,key,begin):
    """
    count the occurrences of key in string, starting from position begin
    (every occurrence is counted, not only the leading ones)
    """
    keyNum = 0
    start = begin

    while True:
        index = string.find(key, start)
        if index == -1:
            break
    
        keyNum = keyNum + 1
        start = index + 1

    return keyNum

2. main.py

'''
Created on 2012-5-26

author: Jeff
'''

import common
import linecache       

fileName = "1.txt"
fileNameWrite = "result.txt"
leafNode = "ptr"
curLine = 0
nextLine = 0

f = open(fileName,'r')
fw = open(fileNameWrite,'w')

# read the trace line by line
while True:
    data = f.readline()

    if not data:
        break

    curLine = curLine + 1

    # a "service" line starts a new tree; copy it through unchanged
    if data.startswith("service"):
        header = data.rstrip('\n')
        print(header)
        fw.write(header + '\n')
        continue

    # find the leafNode
    if data.find(leafNode) != -1:
        # value of leaf node
        value = common.getValue(data, '[', ']')
        string = value

        # the number of tabs gives the depth of the leaf
        tabNum = common.getFiledNum(data, '\t', 0)

        # i is the number of the previous line to read
        # j is the number of tabs in the prefix of the parent being sought
        i = curLine - 1
        j = tabNum - 1

        # walk backwards until the root name line (no tab) has been found
        while j >= 0 and i >= 1:
            prefix = '\t' * j + 'name'

            # get previous line
            preline = linecache.getline(fileName, i)

            if preline.startswith(prefix):
                # this line is the parent at depth j: prepend its value
                value = common.getValue(preline, '[', ']')
                string = value + "." + string
                j = j - 1

            i = i - 1

        print(string)
        fw.write(string + '\n')

fw.close()
f.close()

Parsed output in result.txt:

service[hi]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131
service[Jeff]
1.11.111.1111
1.11.111.1112
1.11.112.1121.111211
1.12.121
1.13.131


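Optimizations:

1. Replace the string concatenation with the all = '%s%s%s%s' % (str0, str1, str2, str3) form or the ''.join form.

2. Keep the content to be written in a list and write it all at once at the end with f.writelines(list).

3. For the algorithm optimization, see my blog post "Parsing a Tree-Structured File in Python (Algorithm Optimization)".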
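To show how points 1 and 2 might look together, here is a hedged sketch that builds each path with '.'.join and collects all output lines in a list before a single writelines call. It also reads the input in one forward pass, keeping the current ancestors on a stack. That single-pass approach is my own assumption for this example and is not necessarily the algorithm described in the referenced post. The function name `flatten` and the short sample are hypothetical, and sys.stdout.writelines stands in for writing to result.txt.

```python
import re
import sys


def flatten(lines):
    """Return output lines: service headers plus one dotted path per leaf."""
    out = []
    stack = []      # values of the current ancestors, root first
    for raw in lines:
        line = raw.rstrip('\n')
        body = line.lstrip('\t')
        depth = len(line) - len(body)
        if body.startswith('service'):
            stack = []
            out.append(body + '\n')
        elif body.startswith('name'):
            # drop ancestors at this depth or deeper, then push this node
            del stack[depth:]
            stack.append(re.search(r'\[(.*?)\]', body).group(1))
        elif body.startswith('ptr'):
            leaf = re.search(r'\[(.*?)\]', body).group(1)
            out.append('.'.join(stack[:depth] + [leaf]) + '\n')
    return out


sample = [
    "service[hi]\n",
    "name: [1]\n",
    "{\n",
    "\tname:[11]\n",
    "\t{\n",
    "\t\tptr -- [111]--[value0]\n",
    "\t}\n",
    "}\n",
]
result = flatten(sample)
# one call writes every collected line
sys.stdout.writelines(result)
```

With a real file, the same list could be passed to writelines on a file object opened for writing, so the output file is touched once instead of once per leaf.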
