First Encounter with pandas

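  A while ago a colleague told me he could not make sense of the logs exported from Kibana, yet he still wanted to analyze some of the data. It was a real headache for him: each time he had to ask a developer to tidy the data up, but the developers were busy and he needed the data fairly often. I happened to be on this stakeholder's project at the time, responsible for testing. So one evening, while working overtime, I wrote him a very small program that lets him look at the data whenever he wants without taking up much of anyone's time.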

  Approach:

  1. Read the raw report

    The config.ini here holds the file name of the raw report:

[filenames]
file_name=XXXXXX.csv

  2. Split the data

  3. Count the rows that satisfy the flag rule

  4. Join the data into a new report and output it
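
Steps 2 and 3 can be tried in isolation before touching a real export. The sketch below uses only the standard library; the key names (`data`, `urlparams`, `jrtt_report`, `convert_id`) are the ones the script reads, while the sample values and the helper names `split_message` and `is_flagged` are made up for illustration.

```python
import json

# Hypothetical 'message' values; only the key names come from the script
messages = [
    '{"data": {"urlparams": "a=1", "jrtt_report": "ok", "convert_id": "c-1"}}',
    '{"data": {"urlparams": "a=2", "jrtt_report": null, "convert_id": "c-2"}}',
    '{"data": {"urlparams": "a=3", "jrtt_report": "ok", "convert_id": null}}',
]


def split_message(raw):
    """Step 2: pull the three fields of interest out of one JSON string."""
    data = json.loads(raw)["data"]
    return data["urlparams"], data["jrtt_report"], data["convert_id"]


def is_flagged(jrtt_report, convert_id):
    """Step 3: a row counts only when both values are present (not null)."""
    return jrtt_report is not None and convert_id is not None


rows = [split_message(m) for m in messages]
flag = sum(is_flagged(report, cid) for _, report, cid in rows)
print(flag)
```

Only the first sample row has both values, so it is the only one counted. JSON `null` becomes Python `None`, which is why the rule is written with `is not None`.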

  

'''
@create on : 20190311
@Update : 20190311
@description: This module directly produces the most readable report

'''

import pandas as pd
import configparser
import os
import json


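# Directory that contains this script (the project root)
dirpath = os.path.dirname(os.path.realpath(__file__))

# Careful with os.path.join: a component that starts with a slash is absolute,
# and joining restarts from the last such component
secondpath = os.path.join(dirpath, "log_file")
config = configparser.ConfigParser()
# config.ini is looked up relative to the current working directory
config.read("config.ini")
filename = config.get("filenames", "file_name")

# Append the file name taken from config.ini
finalpath = os.path.join(secondpath, filename)

# DataFrame holding the CSV data that was read in
log_df = pd.read_csv(finalpath, encoding="utf-8")
print(log_df)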


# Intermediate table: only the timestamp and the raw message columns
def mergedf():
    return log_df[['@timestamp', 'message']].copy()


def oprate_df():
    # Number of rows that carry both jrtt_report and convert_id
    flag = 0
    urlParams, jrtt_reports, convert_ids = [], [], []
    # Built outside the try block so it always exists when returned
    goal_df = mergedf()

    try:
        for line in range(len(log_df)):
            data = json.loads(log_df.loc[line, 'message'])["data"]
            print(log_df.loc[line, '@timestamp'])
            if data["jrtt_report"] is not None and data["convert_id"] is not None:
                flag = flag + 1

            urlParams.append(data["urlparams"])
            jrtt_reports.append(data["jrtt_report"])
            convert_ids.append(data["convert_id"])
        print(flag)
    except Exception as e:
        print("Failed to parse the log file: " + str(e))

    try:
        # insert() raises ValueError when a list is shorter than the table,
        # which happens if parsing stopped early above
        goal_df.insert(0, 'urlParams', urlParams)
        goal_df.insert(0, 'jrtt_report', jrtt_reports)
        goal_df.insert(0, 'convert_id', convert_ids)
    except Exception as e:
        print("Failed to assemble the table: " + str(e))
    return goal_df


if __name__ == '__main__':

    total_df = oprate_df()
    excelFile = "D:/anylysis/dataResult/workResult.xlsx"
    # The with block saves and closes the workbook;
    # ExcelWriter.save() was removed in pandas 2.0
    with pd.ExcelWriter(excelFile) as writer:
        total_df.to_excel(writer, sheet_name='FinalResult')
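
The row-by-row loop is easy to follow, but the same table can probably be built without it. What follows is a rough sketch of one alternative, not the script that was actually used: it parses the whole `message` column at once and flattens it with `pd.json_normalize`. The tiny `log_df` and its values are hypothetical stand-ins for the real CSV, and the output column names are illustrative.

```python
import json

import pandas as pd

# Hypothetical stand-in for log_df; the real one comes from pd.read_csv
log_df = pd.DataFrame({
    "@timestamp": ["t1", "t2"],
    "message": [
        '{"data": {"urlparams": "a=1", "jrtt_report": "ok", "convert_id": "c-1"}}',
        '{"data": {"urlparams": "a=2", "jrtt_report": null, "convert_id": "c-2"}}',
    ],
})

# Parse every message, then flatten the nested "data" dict into columns
# named "data.urlparams", "data.jrtt_report" and "data.convert_id"
parsed = log_df["message"].apply(json.loads)
fields = pd.json_normalize(parsed.tolist())
fields = fields[["data.convert_id", "data.jrtt_report", "data.urlparams"]]
fields.columns = ["convert_id", "jrtt_report", "urlParams"]

# Extracted fields first, then the original timestamp and message columns
result = pd.concat([fields, log_df[["@timestamp", "message"]]], axis=1)

# Count the rows where both values are present
flag = int((result["jrtt_report"].notna() & result["convert_id"].notna()).sum())
print(flag)
```

One trade-off: a single malformed message makes `json.loads` raise for the whole column, so the loop version may still be the better fit when the export is known to contain broken rows.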