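For data analysts, CSV and XLS files are often much more convenient to work with. JSON, however, is the common format for transferring data over the network, so a lot of data ends up stored as JSON; tools such as scrapy and pyspider both provide JSON output by default. As a result, there are many situations where JSON needs to be converted to XLS, so I wrote a conversion script myself: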
def get_data_from_dict(data, result):
    # Any key not seen before gets a new column
    for key in data:
        if key not in result:
            # With no columns yet, start with an empty list; otherwise pad
            # the new column to the length of the first existing column
            if not result:
                result[key] = []
            else:
                result[key] = len(result[next(iter(result))]) * [""]
    # Then append this record's value (or "" if missing) to every column
    for key in result:
        if key in data:
            result[key].append(data[key])
        else:
            result[key].append("")
    return result


def get_all_child(father_dict, spread_dict=None):
    '''Expand all nested dicts into one flat dict'''
    if spread_dict is None:
        spread_dict = {}
    for key in father_dict:
        if isinstance(father_dict[key], dict):
            spread_dict = get_all_child(father_dict[key], spread_dict)
        else:
            spread_dict[key] = father_dict[key]
    return spread_dict
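As a quick illustration of what the two helpers above do, here is a minimal sketch that runs them on two made-up records. The `records` list and its field names are invented for this example and are not taken from the Anjuke data; the helper bodies are condensed restatements of the functions above so that the snippet runs on its own.

```python
def get_all_child(father_dict, spread_dict=None):
    """Flatten nested dicts into one dict of leaf keys (same logic as above)."""
    if spread_dict is None:
        spread_dict = {}
    for key, value in father_dict.items():
        if isinstance(value, dict):
            get_all_child(value, spread_dict)
        else:
            spread_dict[key] = value
    return spread_dict


def get_data_from_dict(data, result):
    """Append one flat record to a dict of equal-length columns."""
    for key in data:
        if key not in result:
            # Pad a new column with "" for every row already stored.
            rows_so_far = len(next(iter(result.values()))) if result else 0
            result[key] = [""] * rows_so_far
    for key in result:
        result[key].append(data.get(key, ""))
    return result


# Hypothetical records, invented for this illustration.
records = [
    {"name": "A", "info": {"price": 100, "area": "east"}},
    {"name": "B", "info": {"price": 200}, "tag": "new"},
]

columns = {}
for record in records:
    columns = get_data_from_dict(get_all_child(record), columns)

print(columns)
```

After both records are processed, every column has two entries: `area` is `["east", ""]` because the second record has no area, and `tag` is `["", "new"]` because the key first appears in the second record and is padded backwards for the first.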
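With json.load we can turn a JSON file into a Python dictionary, so for Python, processing a JSON file really just means processing dictionaries. Two functions do the work here: get_all_child() expands nested dictionaries level by level into a single flat one, and get_data_from_dict() combines the flattened dictionaries into the column layout that a pandas DataFrame expects (one list per key, all of the same length). The example uses a JSON file from Anjuke (安居客) with one JSON object per line; the data is available on my GitHub if you want to follow along. The complete code is as follows: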
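# coding: utf8
import io
import json
from collections import defaultdict

import pandas as pd


def get_data_from_dict(data, result=None):
    if result is None:
        result = {}
    # Any key not seen before gets a new column
    for key in data:
        if key not in result:
            # With no columns yet, start with an empty list; otherwise pad
            # the new column to the length of the first existing column
            if not result:
                result[key] = []
            else:
                result[key] = len(result[next(iter(result))]) * [""]
    # Then append this record's value (or "" if missing) to every column
    for key in result:
        if key in data:
            result[key].append(data[key])
        else:
            result[key].append("")
    return result


def get_all_child(father_dict, spread_dict=None):
    '''Expand all nested dicts into one flat dict'''
    if spread_dict is None:
        spread_dict = {}
    for key in father_dict:
        if isinstance(father_dict[key], dict):
            spread_dict = get_all_child(father_dict[key], spread_dict)
        else:
            spread_dict[key] = father_dict[key]
    return spread_dict


filename = u"安居客新房.json"
# io.open accepts an encoding on both Python 2 and Python 3
with io.open(filename, "r", encoding="utf8") as fr:
    result = defaultdict(list)
    for line in fr:
        # Each non-empty line holds one JSON object
        if not line.strip():
            continue
        linedata = json.loads(line)
        spread_data = get_all_child(linedata)
        result = get_data_from_dict(spread_data, result)

data = pd.DataFrame(result)
# Note: recent pandas releases no longer accept encoding= or write .xls;
# there, drop the argument and write to an .xlsx file instead
data.to_excel(u"安居客.xls", encoding="utf8")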
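One thing to keep in mind with get_all_child() is that it keeps only the leaf key names, so two keys with the same name at different nesting levels overwrite each other in the flattened result. If that matters for your data, one possible variant is to join the parent and child keys into a path. The sketch below is only an illustration of that idea, not part of the original script; the function name `flatten_with_paths`, its `sep` parameter, and the sample `record` are all hypothetical.

```python
def flatten_with_paths(nested, parent_key="", sep="."):
    """Flatten nested dicts, joining parent and child keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        # Build "parent.child" style names so same-named leaves stay distinct.
        full_key = parent_key + sep + str(key) if parent_key else str(key)
        if isinstance(value, dict):
            flat.update(flatten_with_paths(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Hypothetical record: "name" appears at the top level and inside "developer".
record = {
    "name": "A",
    "developer": {"name": "X Corp"},
    "price": {"min": 1, "max": 2},
}

flat_record = flatten_with_paths(record)
print(flat_record)
```

Here the two `name` values end up in separate columns, `name` and `developer.name`, whereas flattening by leaf key alone would keep only one of them. The flat dict can be passed to get_data_from_dict() in the same way as the output of get_all_child().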
---------------------END----WRITE----AT----http://my.oschina.net/Kanonpy/---------------------------