Generate Your Personal WeChat Data Report in One Click and Explore Your WeChat Social History

Date: 2019-11-09

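Introduction

Have you ever wanted a personal WeChat data report that shows your WeChat social history? This project uses Python to analyze your WeChat friends from several angles: nickname, gender, age, region, remark name, signature, avatar, group chats, and official accounts.

For friend types, it counts your strangers, starred friends, friends you hide your Moments from, and friends whose Moments you do not view. For regions, it plots how all your friends are distributed across the country and then breaks down the province with the most friends. It also reports the gender ratio of your friends, guesses your closest friends, analyzes your special friends, finds the friends who share the most group chats with you, analyzes your friends' signatures, and analyzes their avatars, including detecting which friends use a real face as their avatar.

Many articles online cover this kind of analysis, but their code is tedious to run. This program is simple to run: scanning a QR code to log in is the only step.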

Screenshots

How to run

# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Run the program
python generate_wx_data.py

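How to package it as a binary executable

# Install pyinstaller
pip install pyinstaller
# Change to the project directory
cd directory_name
# Uninstall the dependencies first
pip uninstall -y -r requirement.txt
# Then reinstall the dependencies
pip install -r requirement.txt
# Upgrade setuptools
pip install --upgrade setuptools
# Build the executable
pyinstaller generate_wx_data.py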

Implementation approach

First, initialize the program and start the WeChat bot with options chosen for the current operating system.

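# Imports used by this snippet; init_folders() is defined elsewhere in the project
from platform import system
from wxpy import Bot

# Create the folders the program needs
init_folders()

# Start the WeChat bot with options suited to the operating system
if 'Windows' in system():
    # Windows
    bot = Bot(cache_path=True)
elif 'Darwin' in system():
    # macOS
    bot = Bot(cache_path=True)
elif 'Linux' in system():
    # Linux: console_qr=2 prints the login QR code in the terminal
    bot = Bot(console_qr=2, cache_path=True)
else:
    # Unknown platform: configure it yourself
    print(u"Unable to identify your operating system; please configure it yourself")
    exit()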
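
The platform dispatch above can also be written as a small lookup that returns the keyword arguments for `Bot`. This is a minimal sketch rather than the project's code: `bot_kwargs_for` and `current_bot_kwargs` are hypothetical helpers, and they only build the arguments, so the block runs without wxpy or a WeChat login.

```python
from platform import system


def bot_kwargs_for(system_name):
    """Return keyword arguments for Bot() on the given platform.

    Hypothetical helper for illustration; the values mirror the snippet above.
    """
    kwargs = {'cache_path': True}
    if 'Windows' in system_name or 'Darwin' in system_name:
        # Desktop platforms use the default QR code display
        return kwargs
    if 'Linux' in system_name:
        # On Linux the snippet above asks for a QR code drawn in the console
        kwargs['console_qr'] = 2
        return kwargs
    raise ValueError('Unrecognized operating system: %r' % system_name)


def current_bot_kwargs():
    """Arguments for the platform this process is running on."""
    return bot_kwargs_for(system())


# Intended use, assuming wxpy is installed:
#     bot = Bot(**current_bot_kwargs())
```

Keeping the decision in a pure function makes the per-platform behaviour easy to check without logging in.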

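After logging in to WeChat, fetch the friend data and the group chat data.

# Get all friends
friends = bot.friends(update=False)

# Get all active group chats
groups = bot.groups()

For the shared group chat analysis, check each friend in turn against every group chat.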

def group_common_in():

    # Get all active group chats
    groups = bot.groups()

    # Number of group chats each friend shares with you
    dict_common_in = {}

    # Iterate over all friends; index 0 is yourself, so skip it
    for x in friends[1:]:
        # Search each group chat in turn
        for y in groups:
            # x is a member of y
            if x in y:
                # Use the WeChat nickname by default
                name = x.nick_name
                # Prefer the remark name when one is set
                if x.remark_name:
                    name = x.remark_name

                # Increment the count for this friend
                dict_common_in[name] = dict_common_in.get(name, 0) + 1
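
The counting step can be tried without a WeChat login. The sketch below is an illustration under stated assumptions: `count_common_groups` is a hypothetical function, and plain sets of names stand in for wxpy group objects.

```python
from collections import Counter


def count_common_groups(friend_names, groups):
    """Count, for each friend, how many groups contain that friend.

    friend_names: iterable of display names (remark name if set, else nickname).
    groups: iterable of sets of member names, a stand-in for real group objects.
    """
    counts = Counter()
    for name in friend_names:
        for members in groups:
            if name in members:
                counts[name] += 1
    return counts


# Made-up sample data
friends_demo = ['Alice', 'Bob', 'Carol']
groups_demo = [{'Alice', 'Bob'}, {'Alice'}, {'Dave'}]
counts_demo = count_common_groups(friends_demo, groups_demo)
print(counts_demo.most_common(1))  # [('Alice', 2)]
```

Because the key is the display name, two friends with the same name would share one counter; keying on a unique identifier would avoid that if it matters.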

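Download the friends' avatars for further analysis. Downloading avatars is slow, so it is done with multiple threads. A queue holds the friends whose avatars are still to be fetched, and each thread takes a friend from the queue and downloads that avatar to local disk.

from queue import Queue
from threading import Thread

# Create a queue for multithreaded avatar downloads to speed things up
queue_head_image = Queue()

# Put every friend into the queue
# For easier debugging, insert only a few entries, e.g. friends[1:10]
for user in friends:
    queue_head_image.put(user)

# Start 10 threads to download avatars
for i in range(1, 11):
    t = Thread(target=download_head_image, args=(i,))
    t.start()

download_head_image is implemented as follows: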

import random

# Download friend avatars; this step takes a while
def download_head_image(thread_name):

    # Keep going while the queue is not empty
    while(not queue_head_image.empty()):
        # Take one friend from the queue
        user = queue_head_image.get()

        # Download the avatar and save it under a random 15-digit file name
        random_file_name = ''.join([str(random.randint(0,9)) for x in range(15)])
        user.get_avatar(save_path='image/' + random_file_name + '.jpg')

        # Print progress
        print(u'Thread %d: downloading WeChat friend avatars, progress %d/%d, please wait...' %(thread_name, len(friends)-queue_head_image.qsize(), len(friends)))
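
One caveat with the worker above: between the `empty()` check and the `get()` call, another thread may take the last item, and `get()` then blocks indefinitely. The sketch below shows one common way around that, using `get_nowait()` and treating `Empty` as the stop signal. `run_workers` and `handle` are hypothetical names, and the handler stands in for the avatar download.

```python
from queue import Queue, Empty
from threading import Thread, Lock


def run_workers(items, handle, num_threads=10):
    """Process every item with num_threads threads; return how many were handled."""
    work = Queue()
    for item in items:
        work.put(item)

    handled = []
    lock = Lock()

    def worker():
        while True:
            try:
                # Never blocks: raises Empty once the queue is drained
                item = work.get_nowait()
            except Empty:
                return
            handle(item)
            with lock:
                handled.append(item)

    threads = [Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Wait for all workers so the caller knows the work is finished
    for t in threads:
        t.join()
    return len(handled)


results_demo = []
handled_demo = run_workers(range(50), results_demo.append, num_threads=4)
```

Joining the threads also gives a clear point after which all avatars are on disk and the later analysis steps can start.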

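Next come the gender and region statistics, with the generated HTML files saved locally. This part is straightforward, so it is not covered in detail.

from pyecharts import Pie  # pyecharts 0.5.x API

# Analyze the gender ratio of friends
def sex_ratio():

    # Initialize the counters
    male, female, other = 0, 0, 0

    # Tally each friend: 1 is male, 2 is female, anything else is unset
    for user in friends:
        if(user.sex == 1):
            male += 1
        elif(user.sex == 2):
            female += 1
        else:
            other += 1

    name_list = ['Male', 'Female', 'Not set']
    num_list = [male, female, other]

    pie = Pie("Gender ratio of WeChat friends")
    pie.add("", name_list, num_list, is_label_show=True)
    pie.render('data/好友性別比例.html')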

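Next, analyze the friends you know, your closest friends, and your special friends. Taking special friends as an example, friends are split into starred friends (very important people), friends who cannot see your Moments, friends whose Moments you do not view, friends pinned to the top of the chat list, and strangers. The classification is based on the StarFriend and ContactFlag fields exposed by itchat. Empirically, StarFriend equal to 1 marks a starred friend; ContactFlag values 1 and 3 mark an ordinary friend; 259 and 33027 mean the friend cannot see your Moments; 65539, 65537, and 66051 mean you do not view their Moments; 65795 means both restrictions are set; and 73731 marks a stranger.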

from pyecharts import Bar  # pyecharts 0.5.x API

# Analyze special friends
def analyze_special_friends():

    # Starred friends (very important people), friends who cannot see my Moments,
    # friends whose Moments I do not view, pinned friends, strangers
    star_friends, hide_my_post_friends, hide_his_post_friends, sticky_on_top_friends, stranger_friends = 0, 0, 0, 0, 0

    for user in friends:

        # StarFriend is 1 for starred and 0 for not starred;
        # contacts without the field are counted as strangers
        if 'StarFriend' in user.raw:
            if user.raw['StarFriend'] == 1:
                star_friends += 1
        else:
            stranger_friends += 1

        # ContactFlag values: 1 and 3 are friends; 259 and 33027 cannot see my Moments;
        # 65539, 65537 and 66051 are friends whose Moments I do not view;
        # 65795 has both restrictions; 73731 is a stranger
        if user.raw['ContactFlag'] in [259, 33027, 65795]:
            hide_my_post_friends += 1
        if user.raw['ContactFlag'] in [66051, 65537, 65539, 65795]:
            hide_his_post_friends += 1

        # Pinned friends have ContactFlag 2051
        if user.raw['ContactFlag'] in [2051]:
            sticky_on_top_friends += 1

        # Strangers
        if user.raw['ContactFlag'] in [73731]:
            stranger_friends += 1

    bar = Bar('Special friends')
    bar.add(name='', x_axis=['Starred', 'Cannot see my Moments', 'Moments I do not view', 'Pinned', 'Strangers'], y_axis=[star_friends, hide_my_post_friends, hide_his_post_friends, sticky_on_top_friends, stranger_friends], legend_orient="vertical", legend_pos="left")
    bar.render('data/特殊好友分析.html')
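
The flag rules described above can be isolated in a function that labels one raw contact record, which makes them easy to check on sample data. This is a sketch under stated assumptions: `classify_contact` and the label names are hypothetical, and the flag values are the empirical ones quoted in the text, not documented constants. Unlike the function above, a contact that both lacks StarFriend and has ContactFlag 73731 is labeled a stranger only once.

```python
# Empirical ContactFlag values quoted in the text (not official constants)
CANNOT_SEE_MY_MOMENTS = {259, 33027, 65795}
MOMENTS_NOT_VIEWED = {65537, 65539, 66051, 65795}
PINNED = {2051}
STRANGER = {73731}


def classify_contact(raw):
    """Return the set of labels that apply to one raw contact dict."""
    labels = set()
    flag = raw.get('ContactFlag')

    if raw.get('StarFriend') == 1:
        labels.add('starred')
    # A missing StarFriend field or the stranger flag both mean "stranger"
    if 'StarFriend' not in raw or flag in STRANGER:
        labels.add('stranger')
    if flag in CANNOT_SEE_MY_MOMENTS:
        labels.add('cannot_see_my_moments')
    if flag in MOMENTS_NOT_VIEWED:
        labels.add('moments_not_viewed')
    if flag in PINNED:
        labels.add('pinned')
    return labels


print(classify_contact({'StarFriend': 0, 'ContactFlag': 65795}))
```

Counting is then a matter of tallying the labels over all friends, for example with `collections.Counter`.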

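Analyze the friends' signatures and draw a word cloud. This step is more involved: the list of signatures is joined into one string and sent to an NLP word segmentation API, and the returned data is trimmed. The segmented words are then filtered against a stop-word list and counted. Finally, pyecharts draws the word cloud. The code is heavily commented, so it is not explained further here.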

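import re
from requests import post
from pyecharts import WordCloud  # pyecharts 0.5.x API

# Analyze signatures
def analyze_signature():

    # List of signatures
    data = []
    for user in friends:

        # Remove WeChat emoji from the signature, i.e. <span class.*?</span>
        # Regex search and replace: user.signature is the source text and each match is replaced with an empty string
        new_signature = re.sub(re.compile(r"<span class.*?</span>", re.S), "", user.signature)

        # Keep only single-line signatures and drop multi-line ones
        if(len(new_signature.split('\n')) == 1):
            data.append(new_signature)

    # Join the list of signatures into one string
    data = '\n'.join(data)

    # Segment the text into words by calling a remote API
    # jieba and snownlp are not used because they cannot be packaged into an exe, or make the packaged file very large
    postData = {'data':data, 'type':'exportword', 'arg':'', 'beforeSend':'undefined'}
    response = post('http://life.chacuo.net/convertexportword',data=postData)
    data = response.text.replace('{"status":1,"info":"ok","data":["', '')
    # Decode the escaped Unicode characters
    data = data.encode('utf-8').decode('unicode_escape')

    # Keep only the part of the response before the separator line
    data = data.split("=====================================")[0]

    # Turn the segmentation result into a list; the words are separated by two spaces
    data = data.split('  ')

    # Remove meaningless words from the segmentation result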
    stop_words_list = [',', '，', '、', 'the', 'a', 'is', '…', '·', 'э', 'д', 'э', 'м', 'ж', 'и', 'л', 'т', 'ы', 'н', 'з', 'м', '…', '…', '…', '…', '…', '、', '.', '。', '!', '！', ':', '：', '~', '|', '▽', '`', 'ノ', '♪', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '\'', '‘', '’', '「', '」', '的', '了', '是', '你', '我', '他', '她','=', '\r', '\n', '\r\n', '\t', '如下關鍵詞', '[', ']', '{', '}', '(', ')', '（', '）', 'span', '<', '>', 'class', 'html', '?', '就', '於', '下', '在', '嗎', '嗯']
    tmp_data = []
    for word in data:
        if(word not in stop_words_list):
            tmp_data.append(word)
    data = tmp_data


    # Count word frequencies and store them in the dict signature_dict
    signature_dict = {}
    for index, word in enumerate(data):

        print(u'Counting words in friend signatures, progress %d/%d, please wait...' % (index + 1, len(data)))

        if(word in signature_dict.keys()):
            signature_dict[word] += 1
        else:
            signature_dict[word] = 1

    # Draw the word cloud
    name = list(signature_dict.keys())
    value = list(signature_dict.values())
    wordcloud = WordCloud('Word cloud of WeChat friend signatures')
    wordcloud.add("", name, value, shape='star', word_size_range=[1,100])
    wordcloud.render('data/好友個性簽名詞雲.html')
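
The cleaning and counting steps can be exercised offline, without calling the segmentation API. The sketch below assumes the text has already been segmented into words separated by two spaces, as described above. `clean_signature` and `word_frequencies` are hypothetical helpers, and unlike the function above the sketch also drops empty strings.

```python
import re
from collections import Counter

# Same pattern as above: WeChat renders emoji as <span class="..."></span>
EMOJI_SPAN = re.compile(r'<span class.*?</span>', re.S)


def clean_signature(signature):
    """Strip emoji spans; treat a missing signature as an empty string."""
    return EMOJI_SPAN.sub('', signature or '')


def word_frequencies(segmented, stop_words):
    """Count words in text whose words are separated by two spaces."""
    words = (w.strip() for w in segmented.split('  '))
    return Counter(w for w in words if w and w not in stop_words)


cleaned_demo = clean_signature('hello<span class="emoji emoji1f604"></span> world')
freq_demo = word_frequencies('life  is  life  good', {'is'})
print(cleaned_demo)              # hello world
print(freq_demo.most_common(1))  # [('life', 2)]
```

Passing the stop words as a set keeps each membership test cheap even for a long list.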

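Stitch all friend avatars into one image using PIL's image processing. The avatars are counted first, and the tile size is derived from that count so that the result is a rectangular image. WeChat avatars are 640 * 640, which keeps the arithmetic simple.

import math
from os import listdir
from PIL import Image

# Stitch all WeChat friend avatars together
def merge_head_image():
    # Every file in the image directory is one friend's avatar
    pics = listdir('image')
    numPic = len(pics)
    # Side length of each square thumbnail; increase it if the tiles look too small
    eachsize = int(math.sqrt(float(640 * 640) / numPic))
    # Number of thumbnails per row
    numrow = int(640 / eachsize)
    # Number of rows, rounded down
    numcol = int(numPic / numrow)
    # Create the blank canvas for the collage
    toImage = Image.new('RGB', (eachsize * numrow, eachsize * numcol))

    x = 0  # Column index of the next thumbnail
    y = 0  # Row index of the next thumbnail

    for index, i in enumerate(pics):

        print(u'Stitching WeChat friend avatars, progress %d/%d, please wait...' % (index + 1, len(pics)))

        try:
            # Open the image
            img = Image.open('image/' + i)
        except IOError:
            print(u'Error: file not found or could not be read')
        else:
            # Shrink the image; LANCZOS is the filter formerly named ANTIALIAS, which Pillow 10 removed
            img = img.resize((eachsize, eachsize), Image.LANCZOS)
            # Paste the thumbnail at its grid position
            toImage.paste(img, (x * eachsize, y * eachsize))
            x += 1
            if x == numrow:
                x = 0
                y += 1

    toImage.save('data/拼接' + ".jpg")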
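
The grid arithmetic is easier to follow with concrete numbers. The sketch below repeats the same calculation in a hypothetical helper, `grid_layout`, and also reports how many avatars fall outside the canvas because the row count is rounded down.

```python
import math


def grid_layout(num_pics, canvas=640):
    """Return (tile_size, tiles_per_row, rows, dropped) for a collage.

    Mirrors the arithmetic above: tiles are sized so that num_pics of them
    roughly fill a canvas x canvas square, and the row count is rounded down.
    """
    if num_pics <= 0:
        raise ValueError('need at least one picture')
    tile_size = max(1, int(math.sqrt(canvas * canvas / num_pics)))
    tiles_per_row = canvas // tile_size
    rows = num_pics // tiles_per_row
    dropped = num_pics - tiles_per_row * rows
    return tile_size, tiles_per_row, rows, dropped


print(grid_layout(100))  # (64, 10, 10, 0)
print(grid_layout(50))   # (90, 7, 7, 1)
```

With 50 avatars the grid holds 7 * 7 = 49 tiles, so one avatar is left out; rounding the row count up instead would keep every avatar at the cost of some empty cells in the last row.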

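Count the friends who use a human face as their avatar. This uses OpenCV's face detection with OpenCV's default model. Each image is loaded and converted to grayscale, and the face detection model is then run on it; if more than zero faces are found, the avatar contains a face. Avatars that fail to load must be skipped.

from os import listdir
from cv2 import CascadeClassifier, imread, cvtColor, COLOR_BGR2GRAY

# Count the friends who use a real human face as their avatar
def detect_human_face():

    # Every file name in the image directory is one friend's avatar
    pics = listdir('image')

    # Number of avatars that contain a face
    count_face_image = 0

    # File names of the avatars that contain a face
    list_name_face_image = []

    # Load the face detection model
    face_cascade = CascadeClassifier('model/haarcascade_frontalface_default.xml')

    for index, file_name in enumerate(pics):
        print(u'Running face detection, progress %d/%d, please wait...' % (index+1, len(pics)))
        # Read the image
        img = imread('image/' + file_name)

        # Skip the image if it could not be read
        if img is None:
            continue

        # Convert the image to grayscale
        gray = cvtColor(img, COLOR_BGR2GRAY)
        # Run the detection; scaleFactor (1.3) is how much the image is shrunk at each scale step,
        # and minNeighbors (5) is how many neighboring detections a candidate needs to be kept
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        if (len(faces) > 0):
            count_face_image += 1
            list_name_face_image.append(file_name)

    print(u'Avatars with a face: %d/%d' %(count_face_image,len(pics)))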

Once all the statistics are done, generate a single summary HTML page so the results can be viewed in one place.

# Generate an HTML file and save it to file_name
def generate_html(file_name):
    with open(file_name, 'w', encoding='utf-8') as f:
        data = '''
            <meta http-equiv='Content-Type' content='text/html; charset=utf-8'>
            <meta charset="UTF-8">
            <title>Generate your personal WeChat data report in one click (explore your WeChat social history)</title>
            <meta name='keywords' content='personal WeChat data'>
            <meta name='description' content=''>
            <iframe name="iframe1" marginwidth=0 marginheight=0 width=100% height=60% src="data/好友地區分佈.html" frameborder=0></iframe>
            <iframe name="iframe2" marginwidth=0 marginheight=0 width=100% height=60% src="data/某省好友地區分佈.html" frameborder=0></iframe>
            <iframe name="iframe3" marginwidth=0 marginheight=0 width=100% height=60% src="data/好友性別比例.html" frameborder=0></iframe>
            <iframe name="iframe4" marginwidth=0 marginheight=0 width=100% height=60% src="data/你認識的好友比例.html" frameborder=0></iframe>
            <iframe name="iframe5" marginwidth=0 marginheight=0 width=100% height=60% src="data/你最親密的人.html" frameborder=0></iframe>
            <iframe name="iframe6" marginwidth=0 marginheight=0 width=100% height=60% src="data/特殊好友分析.html" frameborder=0></iframe>
            <iframe name="iframe7" marginwidth=0 marginheight=0 width=100% height=60% src="data/共同所在羣聊分析.html" frameborder=0></iframe>
            <iframe name="iframe8" marginwidth=0 marginheight=0 width=100% height=60% src="data/好友個性簽名詞雲.html" frameborder=0></iframe>
            <iframe name="iframe9" marginwidth=0 marginheight=0 width=100% height=60% src="data/微信好友頭像拼接圖.html" frameborder=0></iframe>
            <iframe name="iframe10" marginwidth=0 marginheight=0 width=100% height=60% src="data/使用人臉的微信好友頭像拼接圖.html" frameborder=0></iframe>
        '''
        f.write(data)
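
Since the ten iframe lines differ only in their index and source path, the page could also be built from a list of report paths. This is a sketch of that alternative, not the project's code: `build_index_html` is a hypothetical function and the paths in the example are placeholders.

```python
from html import escape


def build_index_html(title, pages):
    """Return an HTML page that embeds each report page in its own iframe."""
    frames = '\n'.join(
        '<iframe name="iframe%d" marginwidth="0" marginheight="0" '
        'width="100%%" height="60%%" src="%s" frameborder="0"></iframe>'
        % (index, escape(page, quote=True))
        for index, page in enumerate(pages, start=1)
    )
    return '<meta charset="UTF-8">\n<title>%s</title>\n%s\n' % (escape(title), frames)


# Placeholder report paths for illustration
html_demo = build_index_html('Report', ['data/a.html', 'data/b.html'])
print(html_demo)
```

Adding or removing a report then only changes the list, and escaping the paths keeps unusual file names from breaking the markup.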