Scraping Kugou Music with Python and beautifulsoup4

Date: 2019-12-09



Disclaimer: this article is for technical exchange only. Do not use it for any other purpose.
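I often listen to music online, but on some sites many songs can only be downloaded for a fee. Since I know a little about web scraping, I wrote this script in my spare time. As of the end of April it worked without problems. Songs are downloaded to the current directory. The main dependency is the bs4 library; the code also imports requests and uses the lxml parser, so those need to be installed as well.
Installation: pip install beautifulsoup4 requests lxml
The complete code is below. Double-click the file to run it directly.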

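```python
import re

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36'
}
url = 'https://songsearch.kugou.com/song_search_v2?&page=1&pagesize=30&userid=-1&clientver=&platform=WebFilter&tag=em&filter=2&iscorrection=1&privilege_filter=0&_=1555124510574'
# To scrape something else, just change this JSON data URL
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')  # the 'lxml' parser requires: pip install lxml
title_list = soup.select('.pc_temp_songlist ul li')  # not used below
# Match the hidden data directly with regular expressions
file_hashes = re.findall(r',"FileHash":"(.*?)"', r.text)
file_names = re.findall(r',"FileName":"(.*?)"', r.text)
print(file_hashes)
print(file_names)

for file_hash, file_name in zip(file_hashes, file_names):
    # This URL does not need to be changed
    url_a = f'https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery1910212680783679835_1555073815772&hash={file_hash}&album_id=18784389'
    c = requests.get(url_a, headers=headers)
    # Cut off the JSONP wrapper: the callback name plus "(" (40 characters) and the last 3 characters
    a = c.text[40:-3]
    m = re.search(r'"play_url":"(.*)","authors":', a)
    if not m or not m.group(1):
        continue  # no play URL returned for this song (possibly a paid track), skip it
    # Remove the JSON escape backslashes, e.g. https:\/\/ -> https://
    play_url = re.sub(r'\\', '', m.group(1))
    f = requests.get(play_url)
    # Save to the current directory as <file name>.mp3
    with open(file_name + '.mp3', 'wb') as d:
        d.write(f.content)
    print(file_name)
```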
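The download loop strips the JSONP wrapper with a fixed slice, `c.text[40:-3]`, which only works while the callback name keeps exactly the length used in `url_a`, and it writes files named straight from `FileName`. A minimal sketch of a less brittle variant follows. It assumes the response has the form `callback({...});` and that `play_url` sits under a `data` key; that nesting is an assumption about Kugou's response, not something the original confirms. Taking the text between the first `(` and the last `)` and passing it to `json.loads` also turns `\/` back into `/`, so the backslash-stripping step is no longer needed. The helper names `unwrap_jsonp` and `safe_filename` are hypothetical, and stripping `<em>` tags assumes the `tag=em` search parameter wraps matches in them.

```python
import json
import re


def unwrap_jsonp(text: str) -> dict:
    """Parse a JSONP body such as 'jQuery123_456({...});' into a dict.

    Assumes the JSON payload sits between the first '(' and the last ')'.
    """
    start = text.index("(") + 1
    end = text.rindex(")")
    return json.loads(text[start:end])


def safe_filename(name: str) -> str:
    """Drop <em> highlight tags and replace characters Windows forbids in file names."""
    name = re.sub(r"</?em>", "", name)          # assumed to come from tag=em in the search URL
    name = re.sub(r'[\\/:*?"<>|]', "_", name)  # reserved characters -> underscore
    return name.strip() or "untitled"


if __name__ == "__main__":
    # Hypothetical sample response; a real one contains many more fields.
    sample = ('jQuery1910212680783679835_1555073815772('
              '{"status":1,"data":{"play_url":"https:\\/\\/example.com\\/a.mp3","authors":[]}});')
    data = unwrap_jsonp(sample)
    play_url = data.get("data", {}).get("play_url")  # assumed location of play_url
    print(play_url)                                   # https://example.com/a.mp3
    print(safe_filename("Artist - <em>Song</em>/Live"))  # Artist - Song_Live
```

In the loop, `a = c.text[40:-3]` and the regex lookup could then be replaced by `unwrap_jsonp(c.text)`, and the file opened as `safe_filename(file_name) + '.mp3'`.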

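The only hard part of scraping Kugou is getting the hash values; it took me more than an hour to find them. Kugou is a bit easier than NetEase Cloud Music in that you do not have to compute the hash yourself: Kugou's hash already exists in the returned data and only has to be found, whereas NetEase Cloud Music's has to be generated by a function.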
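Because the hash is already present in the search data, the per-song work reduces to reading `FileHash` and `FileName` and building the `play/getdata` URL. The sketch below does this with `json.loads` instead of regular expressions, assuming the search response parses as plain JSON (the script's own comment calls it a JSON data URL). Rather than guessing the exact nesting, it walks the whole parsed tree and yields every object that has both keys, which, like the original `,"FileHash":"` regex, also picks up any nested duplicates. The query parameters and the fixed `album_id` are copied from `url_a`; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

GETDATA = "https://wwwapi.kugou.com/yy/index.php"
CALLBACK = "jQuery1910212680783679835_1555073815772"  # copied from url_a in the script


def iter_songs(obj):
    """Yield (FileHash, FileName) for every dict in a parsed JSON tree that has both keys."""
    if isinstance(obj, dict):
        if "FileHash" in obj and "FileName" in obj:
            yield obj["FileHash"], obj["FileName"]
        for value in obj.values():
            yield from iter_songs(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_songs(item)


def getdata_url(file_hash: str, album_id: str = "18784389") -> str:
    """Rebuild the play/getdata URL from the script for one hash.

    album_id is the fixed value the original script uses for every song.
    """
    params = {"r": "play/getdata", "callback": CALLBACK,
              "hash": file_hash, "album_id": album_id}
    return GETDATA + "?" + urlencode(params, safe="/")  # keep "play/getdata" unescaped


if __name__ == "__main__":
    # Hypothetical, heavily trimmed search response.
    sample = '{"data":{"lists":[{"FileName":"A - B","FileHash":"ABC123"}]}}'
    for file_hash, file_name in iter_songs(json.loads(sample)):
        print(file_name, getdata_url(file_hash))
```

With a live response, `json.loads(r.text)` would replace the sample string; if the endpoint ever returns JSONP instead of plain JSON, the wrapper has to be removed first.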
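That is my introduction to getting the MP3 download URLs of Kugou Music's Top 500 with Python. I hope it helps. If you have any questions, leave me a message and I will reply promptly. Many thanks also for your support of the 脚本之家 (Script Home) website!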
