Python Web Scraping Series: Downloading Bilibili's Top 100 Short Videos

Date: 2021-04-23

Introduction

Today I'll show you how to download Bilibili's top 100 short videos. Let's get started~

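Development tools

Python version: 3.6.4

Environment setup

Install Python and add it to your PATH, then install the required modules with pip (the code below uses requests and click).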

How it works

First, of course, open the page that ranks Bilibili's short videos:

http://vc.bilibili.com/p/eden/rank#/?tab=%E5%85%A8%E9%83%A8

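Then open the browser's developer tools. A quick look at the captured requests shows that calling the ranking API, http://api.vc.bilibili.com/board/v1/ranking/top (the info_url used in the function below), returns the videos' real download addresses.

The request must carry the following parameters: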

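page_size: 10     # number of videos returned per page, as the name suggests
next_offset:      # paging down shows 11 on the second page and 21 on the third, so this is presumably the current offset
tag: 今日热门      # ranking tag ("Today's Hot"), fixed value
platform: pc      # declares the platform, fixed value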
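To make the paging concrete, here is a minimal sketch that computes which (next_offset, page_size) pairs a crawl for the first top_n videos would send. It follows the scheme that getVideoTopNLinks below actually uses (offsets 0, 10, 20, ... with the last page shrunk to whatever is left), not the 11, 21 seen in the browser. plan_pages is a hypothetical helper name, and that the API returns exactly these slices is an assumption.

```python
def plan_pages(top_n, page_size=10):
    """Return the (next_offset, page_size) pairs covering the first top_n items.

    Hypothetical helper mirroring the paging in getVideoTopNLinks: offsets
    start at 0 and grow by the page size; the final page is shrunk so that
    exactly top_n items are requested in total.
    """
    if top_n <= 0:
        raise ValueError('top_n must be larger than zero')
    pages = []
    offset = 0
    remaining = top_n
    while remaining > 0:
        size = min(page_size, remaining)  # last page may be smaller
        pages.append((offset, size))
        offset += size
        remaining -= size
    return pages


print(plan_pages(25))  # [(0, 10), (10, 10), (20, 5)]
```

For top_n = 100 this yields ten full pages of 10, which is what the crawl below requests.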

Based on this analysis, let's define a function that automatically fetches the links of Bilibili's top 100 short videos:

import random
import time

import requests


def getVideoTopNLinks(top_n):
    '''Get the links of Bilibili's top_n short videos as [title, link] pairs.'''
    assert top_n > 0, '<top_n> in function getVideoTopNLinks must be larger than zero.'
    print('[INFO]: Start to get video topn links...')
    info_url = 'http://api.vc.bilibili.com/board/v1/ranking/top?'
    headers = {
        'Referer': 'http://vc.bilibili.com/p/eden/rank',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    params_base = {
        'page_size': 10,
        'next_offset': -10,   # the first loop iteration advances this to 0
        'tag': '今日热门',     # "Today's Hot" ranking tag, fixed value
        'platform': 'pc'
    }
    video_infos = []
    while True:
        # advance the offset by the size of the previous page
        params_base['next_offset'] += params_base['page_size']
        if top_n <= 10:
            # last page: only request as many items as are still needed
            params_base['page_size'] = top_n
            top_n = 0
        else:
            top_n = top_n - 10
        try:
            res = requests.get(info_url, params=params_base, headers=headers, timeout=10)
            items = res.json()['data']['items']
            for item in items:
                title = item['item']['description']
                # strip characters that are not allowed in Windows file names
                for char in '/\\:：*?？"<>|':
                    title = title.replace(char, '')
                link = item['item']['video_playurl']
                video_infos.append([title, link])
                print('[INFO]: Got %s...' % title)
        except Exception as err:
            print('[Warning]: Something went wrong when getting video links: %s' % err)
        if top_n <= 0:
            break
        # random pause of up to 2 s between pages to go easy on the API
        time.sleep(random.random() * 2)
    print('[INFO]: Finish, got %d links in total...' % len(video_infos))
    return video_infos
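Because the parsing step is the part most likely to break when the API changes, it can help to separate it from the HTTP request and check it offline. The sketch below assumes the response layout the function above reads (data → items → item → description / video_playurl); extract_videos and clean_title are hypothetical names, and the sample payload is made up for illustration.

```python
# Characters removed from titles so they can be used as file names.
INVALID_CHARS = '/\\:：*?？"<>|'


def clean_title(title):
    """Strip characters that are not allowed in Windows file names."""
    for char in INVALID_CHARS:
        title = title.replace(char, '')
    return title


def extract_videos(payload):
    """Turn one decoded ranking-API payload into [title, link] pairs.

    Assumes the layout read by getVideoTopNLinks:
    payload['data']['items'][i]['item'] holds 'description' and 'video_playurl'.
    """
    videos = []
    for entry in payload['data']['items']:
        item = entry['item']
        videos.append([clean_title(item['description']), item['video_playurl']])
    return videos


# Made-up sample payload, shaped like what the function above expects.
sample = {'data': {'items': [
    {'item': {'description': 'a/b?c', 'video_playurl': 'http://example.com/1.mp4'}},
]}}
print(extract_videos(sample))  # [['abc', 'http://example.com/1.mp4']]
```

In the real crawl, payload would be res.json() for each page.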

Next, write a function that downloads a single video:

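import os
from contextlib import closing

import click
import requests


def downloadVideo(video_info, savepath):
    '''Download a single video; video_info is a [title, link] pair.'''
    os.makedirs(savepath, exist_ok=True)  # creates the output folder (replaces the undefined checkDir helper)
    savename, video_link = '%s.mp4' % video_info[0], video_info[1]
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
    }
    with closing(requests.get(video_link, headers=headers, stream=True, verify=False, timeout=10)) as res:
        if res.status_code != 200:
            print('[Warning]: HTTP %d for %s' % (res.status_code, video_link))
            return
        # content-length can be missing; fall back to 0 instead of raising KeyError
        total_size = int(res.headers.get('content-length', 0))
        label = '[%s, FileSize]:%0.2f MB' % (savename, total_size / (1024 * 1024))
        with click.progressbar(length=total_size, label=label) as progressbar:
            with open(os.path.join(savepath, savename), 'wb') as f:
                for chunk in res.iter_content(chunk_size=1024):
                    if chunk:  # skip empty keep-alive chunks
                        f.write(chunk)
                        progressbar.update(len(chunk))  # the last chunk may be shorter than 1024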
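A note on the streaming write: advancing a progress count by a fixed 1024 per chunk overcounts whenever the final chunk is shorter. The hypothetical helper below, save_chunks, sketches the same chunked write with the count taken from len(chunk); as an extra assumption beyond the original code, it writes to a temporary .part file and renames it only once the download is complete, so an interrupted run never leaves a truncated .mp4 behind. In downloadVideo, res.iter_content(chunk_size=1024) would be the chunks argument.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written.

    Data goes to path + '.part' first and is renamed to path only after the
    last chunk, so a partial download never appears under the final name.
    """
    written = 0
    tmp_path = path + '.part'
    with open(tmp_path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                written += len(chunk)  # exact count, even for a short last chunk
    os.replace(tmp_path, path)
    return written


# Offline demo with in-memory chunks instead of a real HTTP response.
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, 'demo.mp4')
    n = save_chunks([b'a' * 1024, b'', b'xyz'], target)
    print(n, os.path.getsize(target))  # 1027 1027
```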

Finally, loop over the list of video links obtained above and download each one, and you're done:

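savepath = 'videos'  # example output directory; any path works
video_infos = getVideoTopNLinks(100)
for video_info in video_infos:
    try:
        downloadVideo(video_info, savepath)
    except Exception as err:
        print('[Warning]: Fail to download %s (%s)...' % (video_info[1], err))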
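The loop above only prints a warning when a download fails. If you would rather retry or report the failures afterwards, a small variant can collect the failed links instead. download_all is a hypothetical helper; it takes the download function as a parameter (downloadVideo in practice), which also lets the demo run offline with a fake downloader.

```python
def download_all(video_infos, savepath, download):
    """Call download(video_info, savepath) for every entry; return the failed links.

    One broken video does not stop the rest, and the returned list can be
    retried or logged once the loop is finished.
    """
    failed = []
    for video_info in video_infos:
        try:
            download(video_info, savepath)
        except Exception as err:
            print('[Warning]: Failed to download %s (%s)' % (video_info[1], err))
            failed.append(video_info[1])
    return failed


# Offline demo: a fake downloader that fails on one made-up link.
def fake_download(video_info, savepath):
    if 'bad' in video_info[1]:
        raise IOError('connection reset')


result = download_all(
    [['ok', 'http://example.com/ok.mp4'], ['x', 'http://example.com/bad.mp4']],
    'videos',
    fake_download,
)
print(result)  # ['http://example.com/bad.mp4']
```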

All done~ The complete source code is available via the link in my profile~