Scraping Guo Degang's solo crosstalk with a simple Python crawler

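Search Ximalaya for Lao Guo's (Guo Degang's) solo crosstalk, open the browser's Inspect mode (developer tools), and refresh the page while watching the JSON requests.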

Nothing of much value shows up. But... clear the request list, click on one of the crosstalk pieces, and see which requests appear.

Notice some new URLs?

Let's click on this one. First look at its Headers: doesn't this URL look nice and tidy?
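To see which parameters that Request URL carries, its query string can be split with the standard library. A minimal sketch; `query_params` is a hypothetical helper, and the parameter meanings noted here (album id, page number, sort order, tracks per page) are guessed from the names, not documented by Ximalaya.

```python
from urllib.parse import urlparse, parse_qs

# The Request URL as shown in the Headers tab
URL = 'https://www.ximalaya.com/revision/play/album?albumId=9742745&pageNum=1&sort=-1&pageSize=30'


def query_params(url):
    """Return the query-string parameters of url as a dict of single strings."""
    parsed = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep only the first value
    return {key: values[0] for key, values in parsed.items()}


params = query_params(URL)
print(params)
# {'albumId': '9742745', 'pageNum': '1', 'sort': '-1', 'pageSize': '30'}
```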

Next, check the Preview tab, or simply open that Request URL in the browser.
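The Preview tab shows the JSON body. The fields the script relies on sit under `data` → `tracksAudioPlay`, one object per track. A sketch of pulling them out of an already-parsed response; `extract_tracks` is a hypothetical helper, and `sample` is a hypothetical, trimmed stand-in with placeholder values (not real API output) containing only the keys the script reads. The script prefixes `https:` to `trackCoverPath`, so that path is assumed to start with `//`.

```python
def extract_tracks(res_json):
    """Turn the parsed album JSON into (title, audio_url, cover_url) tuples."""
    tracks = []
    for track in res_json['data']['tracksAudioPlay']:
        tracks.append((
            track['trackName'],
            track['src'],
            # Assumed protocol-relative ("//..."), so prefix the scheme
            'https:' + track['trackCoverPath'],
        ))
    return tracks


# Hypothetical, trimmed response with placeholder values (not real API output)
sample = {
    'data': {
        'tracksAudioPlay': [
            {
                'albumName': 'example-album',
                'trackName': 'example-track',
                'src': 'https://example.com/audio.mp3',
                'trackCoverPath': '//example.com/cover.jpg',
            }
        ]
    }
}

print(extract_tracks(sample))
# [('example-track', 'https://example.com/audio.mp3', 'https://example.com/cover.jpg')]
```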

There you go: this is the data interface the site exposes. With it, downloading the files is quite easy.
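Because the interface is an ordinary GET URL, the same address can be rebuilt for another album or page by changing its query parameters. A sketch using `urlencode`; `album_url` is a hypothetical helper whose defaults are copied from the URL above, and it assumes other albums accept the same parameters, which is not verified here.

```python
from urllib.parse import urlencode

# Base address of the album interface found in the developer tools
API = 'https://www.ximalaya.com/revision/play/album'


def album_url(album_id, page_num=1, sort=-1, page_size=30):
    """Build the album interface URL (hypothetical helper)."""
    query = urlencode({
        'albumId': album_id,
        'pageNum': page_num,
        'sort': sort,
        'pageSize': page_size,
    })
    return f'{API}?{query}'


print(album_url(9742745))
# https://www.ximalaya.com/revision/play/album?albumId=9742745&pageNum=1&sort=-1&pageSize=30
```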
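```python
# -*- coding:utf-8 -*-
# Author : Niuli
# Date : 2019-03-13 16:08

import os
import re

import requests

# Data source: the album interface found above
URL = 'https://www.ximalaya.com/revision/play/album?albumId=9742745&pageNum=1&sort=-1&pageSize=30'
# Fake request headers so the site treats the script like a normal browser
XMLY_HEADER = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Safari/537.36'}


def safe_name(name):
    # Replace characters that are not allowed in file or folder names
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


res = requests.get(URL, headers=XMLY_HEADER, timeout=10)
res.raise_for_status()
res_json = res.json()

play_list = res_json['data']['tracksAudioPlay']
ALL_PATH = safe_name(play_list[0]['albumName'])

MUSIC_PATH = os.path.join(ALL_PATH, 'MUSIC')
COVER_PATH = os.path.join(ALL_PATH, 'COVER')

# Create the local album folders (no shell call, so spaces in the name are safe)
os.makedirs(MUSIC_PATH, exist_ok=True)
os.makedirs(COVER_PATH, exist_ok=True)

for track in play_list:
    # Track info: title, audio URL, cover image URL
    url_title = safe_name(track['trackName'])
    url_music_path = track['src']
    url_cover_path = 'https:' + track['trackCoverPath']

    # Skip tracks that come back without an audio URL
    if not url_music_path:
        continue

    # Download and save the audio file
    music_file = requests.get(url_music_path, headers=XMLY_HEADER, timeout=30)
    music_file.raise_for_status()
    local_music_path = os.path.join(MUSIC_PATH, f'{url_title}.mp3')  # folder + file name + extension
    with open(local_music_path, 'wb') as f:
        f.write(music_file.content)

    # Download and save the cover image
    cover_file = requests.get(url_cover_path, headers=XMLY_HEADER, timeout=30)
    cover_file.raise_for_status()
    local_cover_path = os.path.join(COVER_PATH, f'{url_title}.jpg')  # folder + file name + extension
    with open(local_cover_path, 'wb') as f:
        f.write(cover_file.content)
```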
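The script reads each audio file fully into memory through `.content` before writing it. For long recordings, `requests` can instead stream the body in chunks (`stream=True` plus `Response.iter_content`). A sketch; `save_stream` is a hypothetical helper that accepts any object with an `iter_content` method, so it can be tried without a network connection.

```python
def save_stream(response, path, chunk_size=64 * 1024):
    """Write a streamed HTTP response to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Possible use inside the download loop (needs network access):
#   with requests.get(url_music_path, headers=XMLY_HEADER, stream=True, timeout=30) as r:
#       r.raise_for_status()
#       save_stream(r, local_music_path)
```

Only one chunk is held in memory at a time, which keeps memory use flat regardless of the recording length.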

Other audio can be downloaded the same way.
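The script only fetches page 1 (30 tracks). To walk a whole album, `pageNum` can be incremented until a page comes back short. A sketch assuming the interface returns fewer than `pageSize` tracks on the last page (not verified here); `iter_album_tracks` and the `fetch` callable are hypothetical, with `fetch` standing in for `requests.get(...).json()` so the loop can be tested offline. `max_pages` is a safety cap in case that assumption does not hold.

```python
def iter_album_tracks(album_id, fetch, page_size=30, max_pages=100):
    """Yield track dicts page by page until a short or empty page is returned.

    fetch(album_id, page_num, page_size) must return the parsed JSON of one page.
    """
    for page_num in range(1, max_pages + 1):
        tracks = fetch(album_id, page_num, page_size)['data']['tracksAudioPlay']
        yield from tracks
        if len(tracks) < page_size:  # assumed: a short page means the last page
            break


# Fake fetch for offline testing: an album with 65 placeholder tracks
def fake_fetch(album_id, page_num, page_size):
    start = (page_num - 1) * page_size
    names = [f'track-{n}' for n in range(start, min(start + page_size, 65))]
    return {'data': {'tracksAudioPlay': [{'trackName': n} for n in names]}}


titles = [t['trackName'] for t in iter_album_tracks(9742745, fake_fetch)]
print(len(titles), titles[0], titles[-1])
# 65 track-0 track-64

# A real fetch might look like this (needs network and XMLY_HEADER from the script):
#   def real_fetch(album_id, page_num, page_size):
#       url = ('https://www.ximalaya.com/revision/play/album'
#              f'?albumId={album_id}&pageNum={page_num}&sort=-1&pageSize={page_size}')
#       return requests.get(url, headers=XMLY_HEADER, timeout=10).json()
```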
