A Hands-On Guide to Scraping QQ Music Data with Python (Part 3)

Date: 2020-12-08


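【1. Project Goal】

In Part 1 of this series, we retrieved the song title, album name, and playback link for every song on a chosen number of pages of a given artist's singles chart on QQ Music.

In Part 2, we retrieved the lyrics of a given song and the hot comments shown on its first page.

This time, building on Part 2, we fetch more comments and generate a word cloud image from them, which makes up Part 3.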

【2. Required Libraries】

The main libraries involved are requests, json, wordcloud, and jieba.

To change the background image of the word cloud, you also need numpy and PIL (pip install pillow).

【3. Implementation】

1. First, a quick recap. Below is the code from Part 2 that fetches the first page of hot comments for a given song:

    import requests

    def get_comment(i):
        # i is the song name (used in the output file name);
        # id is the global song ID set by get_id() in Part 2
        url_3 = 'https://c.y.qq.com/base/fcgi-...'
        headers = {
            # identifies which device and browser the request comes from
            'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
        }
        params = {'g_tk_new_20200303': '5381', 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8', 'outCharset': 'GB2312', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0', 'cid': '205360772', 'reqtype': '2', 'biztype': '1', 'topid': id, 'cmd': '8', 'needmusiccrit': '0', 'pagenum': '0', 'pagesize': '25', 'lasthotcommentid': '', 'domain': 'qq.com', 'ct': '24', 'cv': '10101010'}
        # send the request
        res_music = requests.get(url_3, headers=headers, params=params)
        js_2 = res_music.json()
        # The second key was lost from the source text; 'commentlist' is assumed here.
        comments = js_2['hot_comment']['commentlist']
        # save to a txt file ('評論' means 'comments')
        with open(i + '評論.txt', 'a', encoding='utf-8') as f2:
            for item in comments:
                comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
                f2.write(comment)
                # print(comment)
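
The parsing step in the recap can be tried without touching the network. The sketch below is only an illustration: `extract_comments` and `sample_body` are hypothetical names, the sample body is made up and heavily trimmed, and the `commentlist` key is an assumption about the response layout. `hot_comment` and `rootcommentcontent` are the names used in the code above.

```python
import json


def extract_comments(payload, section='hot_comment'):
    """Return the comment texts found under payload[section]['commentlist'].

    Missing sections or entries without text are skipped instead of raising.
    """
    entries = (payload.get(section) or {}).get('commentlist') or []
    return [e['rootcommentcontent'] for e in entries if e.get('rootcommentcontent')]


# Hypothetical, heavily trimmed response body used only for illustration.
sample_body = (
    '{"hot_comment": {"commentlist": ['
    '{"rootcommentcontent": "first"}, {"rootcommentcontent": "second"}, {}]}}'
)
texts = extract_comments(json.loads(sample_body))
```

Returning an empty list for a missing section keeps the calling loop simple, since a page without comments then needs no special case.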

2. Next, let's work out how to fetch the later comments. The figure below shows the params of the comment request from Part 2.

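3. The page offers no way to pick a comment page number. To see later comments you can only click "Click to load more" again and again, so let's click it once and see how the params change.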

![]()

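4. A small trick helps here: first click the clear button shown below to empty the Network panel, then click "Click to load more". The request for the second page of data is then easy to find.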

![]()

5. After clicking "Click to load more", the following appears.

![]()

6. Not only pagenum changed: cmd and pagesize changed as well. Which parameter actually matters? Let's look at the third page.
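
Comparing two requests by eye is error-prone, so the comparison in step 6 can also be done mechanically. The following is a minimal sketch: `diff_params` is a hypothetical helper, only a subset of the parameters is shown, and the page-2 `pagenum` value of `'1'` is an assumption. The other values are taken from the two code listings in this article.

```python
def diff_params(before, after):
    """Return {key: (old, new)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }


# Subset of the query parameters; the page-2 pagenum of '1' is an assumption.
page_1 = {'reqtype': '2', 'biztype': '1', 'cmd': '8', 'pagenum': '0', 'pagesize': '25'}
page_2 = {'reqtype': '2', 'biztype': '1', 'cmd': '6', 'pagenum': '1', 'pagesize': '15'}
changed = diff_params(page_1, page_2)
```

Running the same helper on the second and third page requests would show whether pagenum is the only key that still differs.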

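7. Only pagenum changed. So let's try setting pagenum to "0" and leaving everything else as it is. Does the first page of data still display correctly?

First comment on the first page

Last comment on the first page.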

![]()

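8. It does, so the approach is settled: take the params of the second-page request, write a for loop that assigns each page index to pagenum, and save the comments to a txt file as in Part 2.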
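
The idea in step 8 can be sketched on its own, before any request is sent. In the example below, `page_params` is a hypothetical helper and `base` holds only three of the parameters for brevity. It shows that only pagenum varies from page to page while the template stays untouched.

```python
from urllib.parse import urlencode


def page_params(base_params, page):
    """Return a copy of base_params with pagenum set to the given page index."""
    params = dict(base_params)  # copy, so the template is never mutated
    params['pagenum'] = str(page)
    return params


# Trimmed template taken from the second-page request (illustrative subset).
base = {'cmd': '6', 'pagesize': '15', 'pagenum': '1'}
queries = [urlencode(page_params(base, n)) for n in range(3)]
```

Copying the dictionary for each page avoids subtle bugs if the same template is later reused for another song.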

9. Implementation: to avoid putting too much load on the server, we only crawl 20 pages this time.

    import requests
    import json

    def get_id(i):
        # id is kept as a global because the Part 2 code reads it directly
        global id
        # URL of the search request used to look up the song by name
        url_1 = 'https://c.y.qq.com/soso/fcgi-...'
        headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        params = {'ct': '24', 'qqmusic_ver': '1298', 'new_json': '1', 'remoteplace': 'txt.yqq.song', 'searchid': '71600317520820180', 't': '0', 'aggr': '1', 'cr': '1', 'catZhida': '1', 'lossless': '0', 'flag_qc': '0', 'p': '1', 'n': '10', 'w': i, 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8', 'outCharset': 'utf-8', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0'}
        res_music = requests.get(url_1, headers=headers, params=params)
        json_music = res_music.json()
        # Part of this key path was lost from the source text;
        # ['song'] and [0] (first search result) are assumed here.
        id = json_music['data']['song']['list'][0]['id']
        # print(id)
        return id

    def get_comment(i):
        url_3 = 'https://c.y.qq.com/base/fcgi-...'
        headers = {'user-agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'}
        # save to a txt file ('評論' means 'comments')
        with open(i + '評論.txt', 'a', encoding='utf-8') as f2:
            for n in range(20):  # page indexes 0 to 19
                # params of the second-page request; only pagenum varies.
                # topid and lasthotcommentid are hard-coded to the song inspected above.
                params = {'g_tk_new_20200303': '5381', 'g_tk': '5381', 'loginUin': '0', 'hostUin': '0', 'format': 'json', 'inCharset': 'utf8', 'outCharset': 'GB2312', 'notice': '0', 'platform': 'yqq.json', 'needNewCode': '0', 'cid': '205360772', 'reqtype': '2', 'biztype': '1', 'topid': '247347346', 'cmd': '6', 'needmusiccrit': '0', 'pagenum':n, 'pagesize': '15', 'lasthotcommentid': 'song_247347346_3297354203_1576305589', 'domain': 'qq.com', 'ct': '24', 'cv': '10101010'}
                res_music = requests.get(url_3, headers=headers, params=params)
                js_2 = res_music.json()
                # The second key was lost from the source text; 'commentlist' is assumed here.
                comments = js_2['comment']['commentlist']
                for item in comments:
                    comment = item['rootcommentcontent'] + '\n——————————————————————————————————\n'
                    f2.write(comment)
                    # print(comment)
        input('Download complete. Press Enter to exit!')
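
Limiting the crawl to 20 pages is one way to go easy on the server. Pausing between requests and stopping at the first empty page are two more. The sketch below illustrates both under stated assumptions: `crawl_pages` and `fetch_page` are hypothetical names, the one-second default delay is an arbitrary choice, and a dictionary of fake pages stands in for the real network call.

```python
import time


def crawl_pages(fetch_page, max_pages=20, delay_seconds=1.0, sleep=time.sleep):
    """Collect comments page by page, pausing between requests.

    fetch_page(n) must return the list of comment texts for page n.
    The loop stops early when a page comes back empty.
    """
    collected = []
    for n in range(max_pages):
        comments = fetch_page(n)
        if not comments:
            break  # no more data, so stop sending requests
        collected.extend(comments)
        if n < max_pages - 1:
            sleep(delay_seconds)  # be polite to the server
    return collected


# Stand-in for the real network call: three pages of fake data, then nothing.
fake_pages = {0: ['a', 'b'], 1: ['c'], 2: ['d']}
pauses = []
result = crawl_pages(lambda n: fake_pages.get(n, []), delay_seconds=0.5, sleep=pauses.append)
```

Passing the sleep function as an argument makes the loop testable without real waiting. In actual use the default `time.sleep` applies.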

    def main(i):
        get_id(i)
        get_comment(i)
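
The saved comment file is the raw material for the word cloud, which ultimately needs word counts. The sketch below shows that counting step with the standard library only. It is an illustration, not the article's word cloud code: `split_comments` and `count_words` are hypothetical helpers, and a plain whitespace split stands in for a real Chinese tokenizer. A function such as `jieba.lcut` could be passed as `tokenize` instead.

```python
from collections import Counter

# Separator line written between comments by get_comment() above.
SEPARATOR = '——————————————————————————————————'


def split_comments(text):
    """Split the saved file content back into individual comments."""
    parts = (part.strip() for part in text.split(SEPARATOR))
    return [part for part in parts if part]


def count_words(comments, tokenize=str.split, min_length=2):
    """Count tokens across all comments, skipping very short tokens."""
    counts = Counter()
    for comment in comments:
        counts.update(tok for tok in tokenize(comment) if len(tok) >= min_length)
    return counts


# Made-up file content in the same layout that get_comment() writes.
sample = 'great song\n' + SEPARATOR + '\n' + 'great voice\n' + SEPARATOR + '\n'
word_counts = count_words(split_comments(sample))
```

A frequency mapping like `word_counts` is the kind of input that wordcloud's `generate_from_frequencies` accepts, so this step slots in between scraping and drawing.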