Python3 + itchat Crawler in Practice

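This post records how to use Python with itchat to fetch WeChat friend information, then draw a pie chart of friends' genders and a word cloud of their signatures. The following modules are involved:

itchat: an open-source API for personal WeChat accounts; it can send and receive messages, fetch the friend list, and more.

jieba: a Chinese word-segmentation library for Python, used when building the word cloud.

matplotlib: a Python plotting library.

wordcloud: used to generate the word cloud.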

How do you download them?

How do you install them?

Want the details?

Click the bold module names above to find out.

OK! Let's get started.

Environment: Python3 + Windows 10

Step 1: Log in to WeChat from Python and fetch all friends' information

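import itchat

def my_friends():
    # Log in by scanning a QR code
    itchat.auto_login()
    # Fetch the friend list; update=True refreshes it instead of using a cached copy
    friends = itchat.get_friends(update=True)
    return friends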

Running this function shows a QR code on screen; scan it with WeChat on your phone to log in. Meanwhile the terminal prints the following:

    Getting uuid of QR code.
    Downloading QR code.
    Please scan the QR code to log in.
    Please press confirm on your phone.
    Loading the contact, this may take a little while.
    Login successfully as (your nickname)

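itchat's get_friends method returns information for all friends. Note that the returned friends is a list whose elements are dicts, and that element 0 of the list is yourself; this matters in the data processing later. That completes step 1.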
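To make the shape of that return value concrete, here is a minimal sketch that uses a hand-made stand-in for the list. The sample entries and the helper name `split_self_and_friends` are hypothetical; real entries returned by itchat carry many more keys.

```python
# Hypothetical stand-in for the list returned by itchat.get_friends():
# a list of dicts whose first element describes the logged-in account.
sample_friends = [
    {"NickName": "me", "Sex": 1, "Signature": "hello"},
    {"NickName": "friend_a", "Sex": 2, "Signature": ""},
    {"NickName": "friend_b", "Sex": 0, "Signature": "busy"},
]


def split_self_and_friends(friends):
    """Return (own_entry, other_entries) without modifying the input list."""
    if not friends:
        return None, []
    return friends[0], friends[1:]


me, others = split_self_and_friends(sample_friends)
print(me["NickName"])                   # the account owner
print([f["NickName"] for f in others])  # everyone else
```

Separating the first element once, up front, avoids repeating the `friends[1:]` slice in every later step.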

Step 2: Extract the data

After step 1, all friend data sits in the friends list. Next, iterate over the list and pull out what we need.

1. Gender statistics

def my_friends_sex(friends):
    # Dict that holds the head count per gender
    friends_sex = dict()
    # Keys of that dict: male, female, other
    male = "male"
    female = "female"
    other = "other"

    # Iterate over every friend, skipping element 0 (yourself)
    for i in friends[1:]:
        sex = i["Sex"]
        if sex == 1:
            # Look up the key (defaulting to 0) and add 1 to its value
            friends_sex[male] = friends_sex.get(male, 0) + 1
        elif sex == 2:
            friends_sex[female] = friends_sex.get(female, 0) + 1
        elif sex == 0:
            friends_sex[other] = friends_sex.get(other, 0) + 1
    # Print the gender dict for debugging
    # print(friends_sex)
    # Number of friends, again excluding yourself
    total = len(friends[1:])
    if total == 0:
        # No friends: nothing to report, and this avoids dividing by zero
        return friends_sex

    # Use get() so a category with no friends counts as 0 instead of raising KeyError
    proportion = [friends_sex.get(key, 0) / total * 100 for key in (male, female, other)]
    print(
        "Male friends: %.2f%%\n" % proportion[0] +
        "Female friends: %.2f%%\n" % proportion[1] +
        "Other: %.2f%%" % proportion[2]
    )
    return friends_sex

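The comments are detailed enough, I hope; mostly I was afraid I would forget it all in a couple of days.

While iterating over the friends list, this function reads the Sex key of each element, because Sex is the field that holds the gender. A few other commonly used keys: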

       'NickName'      friend's nickname
       'RemarkName'    remark name (your own alias for the friend)
       'Signature'     signature
       'Province'      province
       'City'          city
       'Sex'           gender: 1 male, 2 female, 0 other

The returned friends_sex is a dict with up to three keys: male, female, and other. Since the goal is to chart friends by gender, we need the head count for each gender.
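The same tally can also be written with `collections.Counter` from the standard library. The sketch below is an alternative, not the code used in this post: the names `SEX_LABELS`, `count_by_sex`, and `percentages` are made up for illustration, and it assumes each entry carries a numeric `Sex` field as described above.

```python
from collections import Counter

# Mapping from the numeric Sex field to a label (labels are illustrative).
SEX_LABELS = {1: "male", 2: "female", 0: "other"}


def count_by_sex(friends):
    """Count friends per gender label, skipping index 0 (the account owner)."""
    # Pre-fill every label with 0 so all three keys are always present.
    counts = Counter({label: 0 for label in SEX_LABELS.values()})
    for friend in friends[1:]:
        label = SEX_LABELS.get(friend.get("Sex", 0), "other")
        counts[label] += 1
    return counts


def percentages(counts):
    """Convert counts to percentages; returns zeros when there are no friends."""
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in counts}
    return {label: n / total * 100 for label, n in counts.items()}


sample = [{"Sex": 1}, {"Sex": 1}, {"Sex": 2}, {"Sex": 0}, {"Sex": 2}]
counts = count_by_sex(sample)
print(dict(counts))
print(percentages(counts))
```

Pre-filling the counter means a category with no members still appears with a count of 0, which keeps the later percentage step free of missing-key checks.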

2. Fetching friends' signatures

import io
import re

import jieba

def my_friends_style(friends):
    # List that collects the cleaned signatures
    style = []
    # Pattern for leftover emoji codes (such as 1f604) and markup characters
    rep = re.compile(r'1f\d+\w*|[<>/=]')
    for friend in friends:
        # Each friend is a dict; the signature is stored under the key 'Signature'.
        # strip() trims whitespace at both ends; the replace() calls drop the words
        # 'span', 'class' and 'emoji' that are left over from emoji markup
        signature = friend['Signature'].strip().replace('span', '').replace('class', '').replace('emoji', '')
        # rep.sub works like str.replace, but with a regular expression
        signature = rep.sub('', signature)
        # Add the cleaned signature to the list
        style.append(signature)
    # join() concatenates all cleaned signatures into one string
    text = ''.join(style)
    # Segment the text with jieba and save the result to a file
    with io.open(r'F:\python_實戰\itchat\微信好友個性簽名詞雲\text.txt', 'a', encoding='utf-8') as f:
        wordlist = jieba.cut(text, cut_all=False)
        word_space_split = ' '.join(wordlist)
        f.write(word_space_split)

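Processing signatures is a bit more involved than counting genders. Signatures tend to be rather individual, and many contain emoji or special symbols. So after reading Signature, we trim whitespace from both ends with strip, remove special symbols with a regular expression, segment the text with jieba, and write the result to a file that the word cloud step uses later.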
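The cleaning step can be tried in isolation on a made-up signature. The sketch below applies the same strip, replace, and regular-expression sequence as the function above; the helper name `clean_signature` and the sample string are hypothetical, and the sample assumes emoji arrive as `<span class="emoji ...">` markup.

```python
import re

# Same pattern as in the post: emoji code points such as 1f604, plus markup characters.
EMOJI_DEBRIS = re.compile(r'1f\d+\w*|[<>/=]')


def clean_signature(raw):
    """Trim whitespace, drop emoji markup words, then strip codes and symbols."""
    text = raw.strip()
    for word in ('span', 'class', 'emoji'):
        text = text.replace(word, '')
    return EMOJI_DEBRIS.sub('', text)


sample = '  <span class="emoji emoji1f604"></span>Keep going  '
cleaned = clean_signature(sample)
print(repr(cleaned))
```

Note that double quotes are not in the character class `[<>/=]`, so they pass through this filter; add `"` to the class if they should be removed as well.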

cut_all=False selects jieba's precise mode. If you set it to True, the word cloud turns out rather...

Step 3: Draw the charts

1. Pie chart of friends' genders

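import matplotlib.pyplot as plt

def drow_sex(friends_sex):
    # Labels and sizes of the pie wedges
    labels = list(friends_sex.keys())
    sizes = [friends_sex[key] for key in labels]
    # Wedge colors; they are reused in a cycle if there are fewer colors than wedges
    colors = ['red', 'yellow', 'blue']
    # Offset of each wedge from the center: only the first one is pulled out.
    # explode needs exactly one entry per wedge, so build it from len(sizes)
    explode = [0.1 if i == 0 else 0 for i in range(len(sizes))]
    # autopct='%1.2f%%' shows percentages with two decimals; shadow=True adds depth
    # startangle is the starting angle (default 0 degrees); 90 usually looks better
    plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.2f%%', shadow=True, startangle=90)
    # Equal scaling on both axes keeps the pie circular
    plt.axis('equal')
    # Legend that maps colors to labels
    plt.legend()
    # Title; Chinese text coming out garbled is a pitfall, but I found a fix
    plt.suptitle("微信好友性別統計圖")
    # Save before show(): show() leaves a blank figure behind, so saving afterwards writes an empty image
    plt.savefig(r'F:\python_實戰\itchat\好友性別餅狀圖.png')
    plt.show()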

It is all plain matplotlib usage, so there is not much to say.
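One detail in the pie call is easy to trip over: `plt.pie` expects `explode` to have one entry per wedge, and `autopct` is a format string applied to each wedge's percentage. The sketch below prepares those inputs in plain Python without drawing anything; the helper names `pie_inputs` and `autopct_text` are hypothetical.

```python
def pie_inputs(friends_sex):
    """Build labels, sizes, and a matching explode tuple for a pie chart."""
    labels = list(friends_sex)
    sizes = [friends_sex[label] for label in labels]
    # One explode entry per wedge; only the first wedge is pulled out.
    explode = tuple(0.1 if i == 0 else 0 for i in range(len(sizes)))
    return labels, sizes, explode


def autopct_text(size, sizes, fmt="%1.2f%%"):
    """Format one wedge's share of the total with an autopct-style format string."""
    return fmt % (size / sum(sizes) * 100)


labels, sizes, explode = pie_inputs({"male": 3, "female": 1})
print(labels, sizes, explode)
print(autopct_text(3, sizes))
```

Building `explode` from the number of wedges keeps the call valid when one of the gender categories is missing from the dict.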

If the Chinese title comes out garbled, add the following at the start of the program:
import matplotlib as mpl
mpl.rcParams['font.sans-serif'] = ['SimHei']

2. Word cloud of friends' signatures

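import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

def wordart():
    # The original called imread() without showing its import; loading the mask
    # through PIL and NumPy is a substitute that yields a uint8 array
    back_color = np.array(Image.open(r'F:\python_實戰\itchat\微信好友個性簽名詞雲\貓咪.png'))
    wc = WordCloud(background_color='white',  # background color
                   max_words=1000,
                   mask=back_color,  # words are drawn only where the mask is not white
                   max_font_size=100,
                   font_path="C:/Windows/Fonts/STFANGSO.ttf",  # a font with Chinese glyphs, to avoid garbled text
                   random_state=42,  # fixed seed so the result is reproducible
                   )

    # Read the segmented text written earlier
    with open(r"F:\python_實戰\itchat\微信好友個性簽名詞雲\text.txt", encoding='utf-8') as f:
        text = f.read()
    wc.generate(text)
    # Color generator based on the mask image; it is created here but only takes
    # effect if passed to wc.recolor(color_func=image_colors)
    image_colors = ImageColorGenerator(back_color)
    # Draw the word cloud with the axes hidden
    plt.imshow(wc)
    plt.axis("off")
    # Save the image
    wc.to_file(r"F:\python_實戰\itchat\微信好友個性簽名詞雲\詞雲.png")
    plt.show()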
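To get a feel for what the word cloud is built from, the segmented text can be counted directly. This is a simplified stand-in, not WordCloud's own tokenizer, which applies further rules such as stopword filtering; the function name `top_words` and the sample string are made up for illustration.

```python
from collections import Counter


def top_words(segmented_text, n=3, min_length=2):
    """Count space-separated tokens, ignoring tokens shorter than min_length."""
    tokens = [t for t in segmented_text.split() if len(t) >= min_length]
    return Counter(tokens).most_common(n)


# Hypothetical jieba output: words joined by single spaces.
sample = "努力 生活 努力 工作 的 生活 努力"
print(top_words(sample))
```

Dropping single-character tokens is a crude way to keep particles out of the ranking; it is an assumption of this sketch, not something the post's code does.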

Done!

Supplementary Python basics:

1. Dict operations

Example
    b={'A':1,'B':2,'C':3,'D':4}
    b['A']
    Out[28]: 1
    b['D']
    Out[29]: 4

2. The dict get method

Syntax of get():
dict.get(key, default=None)
Parameters
key -- the key to look up in the dict.
default -- the value returned if the key is not present.
Example
info = {'Name': 'Zara', 'Age': 27}
print("Value : %s" % info.get('Age'))
print("Value : %s" % info.get('Sex', "Never"))
Output:
Value : 27
Value : Never
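The default argument is what makes the counting idiom in the gender function work: a key that has not been seen yet reads as 0. A minimal sketch, with the function name `tally` chosen for illustration:

```python
def tally(items):
    """Count occurrences of each item using dict.get with a default of 0."""
    counts = {}
    for item in items:
        # First occurrence: get() returns 0, so the stored value becomes 1.
        counts[item] = counts.get(item, 0) + 1
    return counts


result = tally([1, 2, 1, 0, 1])
print(result)
```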

3. Writing the list contents straight to a file

with open(r'F:\python_實戰\itchat\friends.txt', 'a+', encoding='utf-8') as f:
    for friend in friends:
        f.write(str(friend))
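Writing `str()` of each dict is fine for eyeballing the data, but hard to read back. If the file should be parsed later, one option is one JSON object per line. This sketch assumes every entry holds only JSON-serializable values, which has not been verified here for real itchat output; the function names, the sample data, and the temporary path are all illustrative.

```python
import json
import os
import tempfile


def dump_friends(friends, path):
    """Append one JSON object per line so the file can be parsed back later."""
    with open(path, "a+", encoding="utf-8") as f:
        for friend in friends:
            # ensure_ascii=False keeps Chinese text readable in the file.
            f.write(json.dumps(friend, ensure_ascii=False) + "\n")


def load_friends(path):
    """Read the file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


sample = [{"NickName": "小明", "Sex": 1}, {"NickName": "friend_b", "Sex": 2}]
path = os.path.join(tempfile.mkdtemp(), "friends.txt")
dump_friends(sample, path)
print(load_friends(path) == sample)
```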

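4. The strip() method

Removes the specified characters from both ends of a string; by default it removes whitespace. strip() returns a new string and leaves the original unchanged, so print the result:
a = "assdgheas"
print(a.strip('as'))
Output: dghe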
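A point that is easy to miss is that the argument to strip() is a set of characters, not a substring: every leading and trailing character found in that set is removed, in any order. A short sketch with made-up strings, which also shows the one-sided variants lstrip() and rstrip():

```python
wrapped = "xyxhello worldyx"

both = wrapped.strip("xy")    # removes any run of 'x'/'y' at either end
left = wrapped.lstrip("xy")   # leading side only
right = wrapped.rstrip("xy")  # trailing side only

print(both)
print(left)
print(right)
print(wrapped)  # strings are immutable, so the original is unchanged
```

Characters inside the string are never touched; stripping stops at the first character on each side that is not in the set.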