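Until now I had only scraped text from web pages and had never scraped video or audio files, so I tried scraping Bilibili and NetEase Cloud Music and recorded the whole process here for future study.
1. Scraping Bilibili videos
1.1 Page analysis
Python machine learning has been popular lately, so let's scrape some machine learning videos. Open the Bilibili site and search for 「python機器」 ("python machine"). Inspecting the elements of the results page shows that every video series has a unique ID; for example, av28879057 is the ID of one video.
Knowing that each video has a unique ID, click into a video and look at its address. Video URLs take two main forms:
1. https://www.bilibili.com/video/av28879057 (a single-episode video; the URL ends with the ID observed above)
2. https://www.bilibili.com/video/av30292394/?p=3 (a multi-episode series; the trailing parameter p=3 selects the third episode under that ID)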
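The two URL forms can be captured in a small helper. This is only a sketch: the name build_video_url and its parameters are made up for illustration, and it simply reproduces the two patterns observed above.

```python
def build_video_url(av_id, page=None):
    """Build a Bilibili video page URL from a numeric av ID.

    build_video_url, av_id and page are hypothetical names; the URL
    patterns are the two forms observed in the browser.
    """
    url = "https://www.bilibili.com/video/av{}".format(av_id)
    if page is not None:
        # Multi-episode series: p selects the episode under this ID
        url += "/?p={}".format(page)
    return url


print(build_video_url(28879057))
print(build_video_url(30292394, page=3))
```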
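That settles how each video page URL is built; the next step is to find the video's download address. Refresh the page, start playback, and look at the network requests sorted by size: there is a request transferring a large x-flv file, which should be the video download address. That request takes 7 parameters, and comparing it with other videos shows that two of them change dynamically: ssig and trid. Neither parameter appears in the other JSON responses, so the last resort was to search the page source for a function that generates them. It turned out that the page source contains the download addresses directly, inside a JSON dictionary assigned to window.__playinfo__={}, so a regular expression match is all that is needed.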
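To see the 7 parameters of such a request, the query string can be split with the standard library. A small sketch, written for Python 3: query_params is a hypothetical helper name, and the URL is one of the segment addresses from this article's sample data, which expired long ago and is used here only as text.

```python
from urllib.parse import urlsplit, parse_qsl

# Segment URL taken from the sample window.__playinfo__ data (expired)
SEGMENT_URL = (
    "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/"
    "52808345-1-32.flv?expires=1554535500&platform=pc"
    "&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713"
    "&trid=eee825f5aa484900b4976d25ac8b876e"
    "&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1"
)


def query_params(url):
    """Return the query-string parameters of url as (name, value) pairs, in order."""
    return parse_qsl(urlsplit(url).query)


params = query_params(SEGMENT_URL)
for name, value in params:
    print(name, "=", value)
```

Running this on URLs of two different videos makes it easy to spot which values change between requests.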
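Extracting that dictionary gives the result below. The video is split into several smaller segments numbered in sequence: order is the sequence number and url is the download address of the segment. So it is enough to download each segment and concatenate them at the end.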
{ "code": 0, "message": "0", "ttl": 1, "data": { "from": "local", "result": "suee", "message": "", "quality": 32, "format": "flv480", "timelength": 7121936, "accept_format": "flv720,flv480,flv360", "accept_description": ["高清 720P", "清晰 480P", "流暢 360P"], "accept_quality": [64, 32, 16], "video_codecid": 7, "seek_param": "start", "seek_type": "offset", "durl": [{ "order": 1, "length": 363246, "size": 24653145, "ahead": "EZA=", "vhead": "AWQAH//hAB5nZAAfrNlAvD3m//DQEM/xAAADAAEAAAMAPA8YMZYBAAVo6+zyPA==", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?expires=1554535500&platform=pc&ssig=tz7ktrLd7bdj8qukIG9cjQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-1-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=732e5ee7aad2a9a08406b92aa0bb2ca3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 2, "length": 330944, "size": 23865726, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?expires=1554535500&platform=pc&ssig=LemBQ8rVic-aAAN9iXwWGg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-2-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=bb0c67342e48e1a8b438dcc9606f9e91&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 3, "length": 352981, "size": 25848758, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?expires=1554535500&platform=pc&ssig=vSDeETHYfUOLYf8caLiW5Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-3-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=30faa351c57a559f7b69654809418da9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 4, "length": 394413, "size": 26565740, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?expires=1554535500&platform=pc&ssig=uaupgm_tbgSyVbou66oO-A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-4-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=2bb21503e670b1a82769ed6524ea7c25&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 5, "length": 388312, "size": 26901267, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?expires=1554535500&platform=pc&ssig=DM7BjFfnFGzoux7NA7Ix5g&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-5-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=68a9f6b8213285eb7fba15736e2c683b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 6, "length": 239979, "size": 15473865, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?expires=1554535500&platform=pc&ssig=KGQ7DIH2XeAfW0QU4C7X7w&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-6-32.flv?e=ig8euxZM2rNcNbRjhwdVhoM17bdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=4e27dfa3076edd399b0e6ee547f1dd51&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 7, "length": 426645, "size": 29245686, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?expires=1554535500&platform=pc&ssig=X_NsbB2FEjaE4W2yGI2YMQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-7-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=c3ef5ea3bdd2ab1ac310970d85341c80&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 8, "length": 423211, "size": 30372670, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?expires=1554535500&platform=pc&ssig=rU90cc9rkqn--2je747LAQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-8-32.flv?e=ig8euxZM2rNcNbRa7zUVhoM17zuBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=98d5301937834486e0bd9c2996cd73f4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 9, "length": 291178, "size": 19475045, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?expires=1554535500&platform=pc&ssig=sMfGnyjVuKCsOzIp9EAanQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-9-32.flv?e=ig8euxZM2rNcNbRj7WdVhoM17bUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=08439132f1831b423be6577c7bd5ef89&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 10, "length": 370880, "size": 25219151, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?expires=1554535500&platform=pc&ssig=kKqhofi4ayRRMoquCxz-pw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-10-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=3d5d140e0dd02a83245ae86da23eb8b9&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 11, "length": 381612, "size": 26624914, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?expires=1554535500&platform=pc&ssig=HFFhsFFGyXOV8Q3QmF8sJQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-11-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=63f2b71981080c752eed5166a9a85332&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 12, "length": 361344, "size": 25254786, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?expires=1554535500&platform=pc&ssig=UuAqqNbr1xC5gMlu5FUYdQ&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-12-32.flv?e=ig8euxZM2rNcNbRahbUVhoM17zNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1705e8d3f1075a717c6a91ae018396fe&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 13, "length": 334912, "size": 24639608, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?expires=1554535500&platform=pc&ssig=MQbcDgFo8iqQ2Uf4yO-L0A&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-13-32.flv?e=ig8euxZM2rNcNbR3hbUVhoM1nwNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=a5f1be479528b8a92a462acab849af46&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 14, "length": 365845, "size": 24930389, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?expires=1554535500&platform=pc&ssig=bpVSp4oDvkaLf1HTlWl5xA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-14-32.flv?e=ig8euxZM2rNcNbRahwdVhoM17zdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=31a76487e1d32acd5b573f45a4169997&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 15, "length": 338347, "size": 23943047, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?expires=1554535500&platform=pc&ssig=ieioDVAxcZLksQ55egulgg&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-15-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=36cdd93257a88fdc90a0c85f2b9babe3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 16, "length": 475181, "size": 34293360, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?expires=1554535500&platform=pc&ssig=Ps_lae8ZoX800sJZh-eRRA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-16-32.flv?e=ig8euxZM2rNcNbR3hwdVhoM1nwdVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=fe34135d841548c79f78f687282c6bc3&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 17, "length": 204846, "size": 13746922, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": 
["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?expires=1554535500&platform=pc&ssig=mzbEJYcCFWAO0ioYePxG_Q&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-17-32.flv?e=ig8euxZM2rNcNbRj7zUVhoM17buBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=915e7dc2c91a4072e91bd43988379c8b&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 18, "length": 469078, "size": 32875195, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?expires=1554535500&platform=pc&ssig=gdm21_hyrHYWZfsmgPkMDA&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-18-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=e644f16f487b7bd5625326a550716479&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 19, "length": 328213, "size": 21350561, "ahead": "", "vhead": "", "url": 
"http://cn-hbwh-cmcc-v-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?expires=1554535500&platform=pc&ssig=3LoFiUwUGXFRJHBpigewOw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-19-32.flv?e=ig8euxZM2rNcNbRjhbUVhoM17bNBhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=ec0d506311d0efdbfbf576d297c3ebba&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }, { "order": 20, "length": 280769, "size": 19777669, "ahead": "", "vhead": "", "url": "http://cn-hbwh-cmcc-v-04.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", "backup_url": ["http://cn-fjfz-cmcc-v-01.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?expires=1554535500&platform=pc&ssig=r8NbvnHMQ58qfdYJHoD4kw&oi=3083732713&trid=eee825f5aa484900b4976d25ac8b876e&nfb=maPYqpoel5MI3qOUX6YpRA==&nfc=1", 
"http://cn-sdjn3-cmcc-acache-02.acgvideo.com/upgcxcode/45/83/52808345/52808345-20-32.flv?e=ig8euxZM2rNcNbRa7WdVhoM17zUVhwdEto8g5X10ugNcXBlqNxHxNEVE5XREto8KqJZHUa6m5J0SqE85tZvEuENvNC8xNEVE9EKE9IMvXBvE2ENvNCImNEVEK9GVqJIwqa80WXIekXRE9IMvXBvEuENvNCImNEVEua6m2jIxux0CkF6s2JZv5x0DQJZY2F8SkXKE9IB5QK==&deadline=1554535694&gen=playurl&nbs=1&oi=3083732713&os=acache&platform=pc&trid=eee825f5aa484900b4976d25ac8b876e&uipk=5&upsig=1991fbc0d72dfe2aaac05943f26d54e4&uparams=e,deadline,gen,nbs,oi,os,platform,trid,uipk"] }] }, "session": "e5c0e030d13633062a9889d1390010d9", "videoFrame": {} }
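1.2 Downloading the video
Based on the analysis above, the scraping steps are:
1. Build the video URL from the video ID.
2. Request the video URL and run a regular expression over the returned page to get all segment download addresses and sequence numbers.
3. Save each segment locally from its download address (the request headers must include Referer and Origin, otherwise the server returns HTTP 458).
The code is as follows: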
#coding:utf-8
import requests
import re
import json
import os
import time
import subprocess

# Download every segment of the video at video_url
def down_video(video_url, path="temp_videos"):
    """
    video_url  URL of the video page to download
    path       directory where the downloaded segments are saved
    """
    #video_url = "https://www.bilibili.com/video/av30292394?p=3"
    #video_url = "https://www.bilibili.com/video/av28879057"
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; r…) Gecko/20100101 Firefox/66.0",
    }
    response = requests.get(video_url, headers=headers)
    # Match the video address info embedded in the page source.
    # re.S lets "." match newlines, so the whole string is matched as one unit.
    match_text = re.search(r'<script>window.__playinfo__=(\{.*?\})</script>', response.text, re.S)
    if match_text is None:
        print("window.__playinfo__ not found in the page: %s" % video_url)
        return
    json_data = json.loads(match_text.group(1))  # match_text.group(1) is a unicode string
    urls = json_data["data"]["durl"]  # the video has several parts; this list holds each part's url
    content_size = sum([item["size"] for item in urls])  # total size of the video
    print("Total video size: %0.2f Mb" % (content_size / (1024.0 * 1024)))
    if not os.path.exists(path):
        os.mkdir(path)
    header = {
        "Origin": "https://www.bilibili.com",
        "Referer": video_url,  # the Referer header is required
    }
    headers.update(header)
    size = 0
    start = time.time()
    for i, item in enumerate(urls):
        url = item["url"]
        try:
            result = requests.get(url, headers=headers, stream=True, verify=False)
            print(result.status_code)
            video_path = os.path.join(path, "{}.mp4".format(i))
            with open(video_path, "wb") as f:
                for chunk in result.iter_content(1024):
                    f.write(chunk)
                    f.flush()  # flush the write buffer
                    size = size + len(chunk)
                    #print("Downloaded: %0.2f Mb" % (size / (1024.0 * 1024)))
        except Exception as e:
            print("Error downloading url: %s" % url)
            print(e)
    stop = time.time()
    print("Download finished in %0.2f seconds" % (stop - start))
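The extraction step (step 2) can be tried without any network access. The sketch below runs the same kind of regular expression over a made-up page source; SAMPLE_HTML, its example.com URLs and the helper name extract_segments are all hypothetical, and a real page carries many more fields.

```python
import json
import re

# Hypothetical, heavily shortened page source (segments deliberately out of order)
SAMPLE_HTML = (
    '<html><head><script>window.__playinfo__={"code": 0, "data": {"durl": ['
    '{"order": 2, "size": 23865726, "url": "http://example.com/52808345-2-32.flv"}, '
    '{"order": 1, "size": 24653145, "url": "http://example.com/52808345-1-32.flv"}'
    ']}}</script></head></html>'
)


def extract_segments(html):
    """Return the durl entries of window.__playinfo__, sorted by their order field."""
    match = re.search(r"<script>window\.__playinfo__=(\{.*?\})</script>", html, re.S)
    if match is None:
        raise ValueError("window.__playinfo__ not found")
    playinfo = json.loads(match.group(1))
    return sorted(playinfo["data"]["durl"], key=lambda item: item["order"])


segments = extract_segments(SAMPLE_HTML)
total_bytes = sum(item["size"] for item in segments)
for item in segments:
    print(item["order"], item["size"], item["url"])
print("total: %.2f MB" % (total_bytes / (1024 * 1024)))
```

Sorting by order is a precaution on my part; in the sample data the list already arrives in sequence.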
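1.3 Concatenating the segments
The downloaded segments can be played directly, but playing them one by one is tedious, so ffmpeg can be used to concatenate them.
First download ffmpeg (https://ffmpeg.zeranoe.com/builds/), unzip it and copy it to a suitable folder, then add ffmpeg.exe in the bin directory to the PATH environment variable. Run ffmpeg -version on the command line; if it prints version information, the installation succeeded.
The ffmpeg command for concatenating videos is: ffmpeg -f concat -safe 0 -i path.txt -c copy output.mp4
Here path.txt lists the paths of the videos to concatenate, in the following format (the first entry means v_1.mp4 under the video directory):
file 'video/v_1.mp4'
file 'video/v_2.mp4'
file 'video/v_3.mp4'
output.mp4 is where the concatenated video is written; it can also be written as video/output.mp4 to save it in the video folder.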
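As an illustration of the list-file format and the command, here is a sketch that builds both as plain data without running ffmpeg. The helper names build_concat_list and build_ffmpeg_command are hypothetical, and sorting by the number in the file name is my own assumption about how the segments are named.

```python
import os


def build_concat_list(segment_paths):
    """Return the text of a concat list file, one "file '...'" line per segment."""
    def numeric_key(p):
        # Order by the digits in the file name so that v_10 follows v_9, not v_1
        stem = os.path.splitext(os.path.basename(p))[0]
        digits = "".join(ch for ch in stem if ch.isdigit())
        return int(digits) if digits else 0

    ordered = sorted(segment_paths, key=numeric_key)
    return "".join("file '{}'\n".format(p) for p in ordered)


def build_ffmpeg_command(list_file, output_file):
    """Return the concat command described above as an argument list."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output_file]


print(build_concat_list(["video/v_10.mp4", "video/v_2.mp4", "video/v_1.mp4"]), end="")
print(" ".join(build_ffmpeg_command("path.txt", "video/output.mp4")))
```

Passing the command as a list to subprocess avoids quoting problems when the output name contains spaces.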
The final concatenation code is as follows:
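import os
import shutil
import subprocess

# Concatenate the downloaded segments into one complete video
def concatenate(path, title, output="videos"):
    """
    path    directory holding the segments to concatenate
    title   name of the concatenated video
    output  directory where the concatenated video is saved
    """
    with open("path.txt", 'w') as f:
        for root, dirs, files in os.walk(path):
            # Sort by (length, name) so that 10.mp4 comes after 9.mp4 rather than after 1.mp4
            for file in sorted(files, key=lambda name: (len(name), name)):
                if os.path.splitext(file)[1] in [".flv", ".mkv", ".mp4"]:
                    v_path = os.path.join(root, file)
                    f.write("file '{}'\n".format(v_path))
    if os.path.exists("path.txt"):
        if not os.path.exists(output):
            os.mkdir(output)
        try:
            print("Start merging the video")
            path_name = os.path.join(output, title + ".mp4")
            # If D:\ffmpeg-win32-static\bin is on PATH, plain "ffmpeg" can be used here instead
            ffmpeg = r"D:\ffmpeg-win32-static\bin\ffmpeg"
            ffmpeg_command = [ffmpeg, "-f", "concat", "-safe", "0", "-i", "path.txt", "-c", "copy", path_name]
            #print(ffmpeg_command)
            subprocess.call(ffmpeg_command)
            # rmdir and del are cmd built-ins that subprocess.call cannot run without a shell,
            # so the cleanup is done in Python instead
            shutil.rmtree(path)    # delete the segment directory
            os.remove("path.txt")  # delete the list file
        except Exception as e:
            print(e)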
The complete script consists of the imports and the two functions above (down_video and concatenate), followed by this entry point:
if __name__=="__main__":
    # down_video("https://www.bilibili.com/video/av28879057")
    # concatenate("temp_videos",title="python")
    down_video("https://www.bilibili.com/video/av30292394?p=3")
    concatenate("temp_videos",title="python機器學習與量化分析")
References:
https://amberwest.github.io/2018/09/11/%E7%94%A8python%E4%B8%8B%E8%BD%BD%E5%93%94%E5%93%A9%E5%93%94%E5%93%A9%E8%A7%86%E9%A2%91/
https://github.com/Henryhaohao/Bilibili_video_download
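2. Scraping NetEase Cloud Music
2.1 Page analysis
On the web version of NetEase Cloud Music every song likewise has an ID, and the corresponding address is https://music.163.com/song?id=1353372483 (when the page is requested, NetEase automatically inserts a "#", turning it into a URL of the form https://music.163.com/#/song?id=1352541009).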
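Since the same song can appear under both URL forms, a small helper can pull the id out of either one. This is a sketch for Python 3; song_id_from_url is a hypothetical name, and it relies only on the two URL shapes shown above.

```python
from urllib.parse import urlsplit, parse_qs


def song_id_from_url(url):
    """Extract the id query parameter from either form of the song URL."""
    parts = urlsplit(url)
    query = parts.query
    # In the "#/song?id=..." form, the path and query sit inside the fragment
    if not query and "?" in parts.fragment:
        query = parts.fragment.split("?", 1)[1]
    values = parse_qs(query).get("id")
    if not values:
        raise ValueError("no id parameter in %r" % url)
    return values[0]


print(song_id_from_url("https://music.163.com/song?id=1353372483"))
print(song_id_from_url("https://music.163.com/#/song?id=1352541009"))
```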
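Next, refresh the page and look at the network requests, again sorted by size. There is a fairly large mp3 transfer request; its URL is the download URL of the song, and requesting it directly downloads the audio file. What remains is how to obtain the download URL of each song.
Looking through the responses of the other XHR requests turns up one that contains the song's details, including the url we need. The request is a POST that submits form data with two main parameters, 'params' and 'encSecKey', but both are encrypted, so the encryption method has to be worked out.
To summarize, downloading a song takes three steps:
1. Send a GET request to https://music.163.com/song?id=1353372483 to get the song's name, lyrics and other basic information.
2. Send a POST request with the two parameters 'params' and 'encSecKey' to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= ; the returned JSON contains the song's download address, size and other information.
3. Request the song's download address (http://m10.music.126.net/20190407154531/74f897c9d014dede19a0905644433907/ymusic/035c/5458/530f/46ebf59083c2f04cc090de3b1e0beaf0.mp3) and write the response to a local file, which completes the download.
So what is left is constructing the two encrypted parameters 'params' and 'encSecKey'. Open the Sources panel of the browser developer tools and search each JS file for encSecKey (or press Ctrl+Shift+F for a global search). The relevant code, which contains exactly the two parameters we need, is in one of the JS files.
In that code the actual work is done by var bYl2x = window.asrsea(). Searching for this function leads to the statement window.asrsea = d, so it is the function d, and d calls function a once, function b twice and function c once.
Function a generates a random string; here a(16) produces a random string of 16 characters. The JS code and a corresponding Python implementation are: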
// function a
function a(a) {
    var d, e, b = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", c = "";
    for (d = 0; a > d; d += 1)
        e = Math.random() * b.length,
        e = Math.floor(e),
        c += b.charAt(e);
    return c
}

# Python code that generates the random string
import os
import binascii

def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits as an ASCII byte string;
    # [:16] keeps the first 16 characters
    return binascii.hexlify(os.urandom(size))[:16]
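The Python version above draws from hex digits only, while the JS function draws from letters and digits. For comparison, here is a sketch of a closer port. The name random_alnum and the optional rng parameter are my own, and I have not checked whether the server treats the two kinds of key differently; the hex-based version is the one used in the rest of this article.

```python
import random
import string

# Same alphabet as the string b in the JS function a
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def random_alnum(size, rng=None):
    """Return size random characters drawn from a-z, A-Z and 0-9."""
    rng = rng or random.SystemRandom()
    return "".join(rng.choice(ALPHABET) for _ in range(size))


key = random_alnum(16)
print(key, len(key))
```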
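Function b encrypts the data with AES, a symmetric cipher. The JS code and a corresponding Python implementation follow.
The Python version needs the Crypto module. Installing it with pip install crypto causes problems; install it as follows instead (this worked on Windows 7 with Python 2.7):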
python -m pip install pycrypto
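// function b
function b(a, b) {
    var c = CryptoJS.enc.Utf8.parse(b)
      , d = CryptoJS.enc.Utf8.parse("0102030405060708")
      , e = CryptoJS.enc.Utf8.parse(a)
      , f = CryptoJS.AES.encrypt(e, c, {
        iv: d,
        mode: CryptoJS.mode.CBC
    });
    return f.toString()
}

# Python implementation of function b
from Crypto.Cipher import AES
import base64

def to_bytes(s):
    # AES works on bytes: encode text strings, pass byte strings through unchanged
    return s if isinstance(s, bytes) else s.encode("utf-8")

def get_params(text, key):  # AES symmetric encryption
    iv = b'0102030405060708'
    text = to_bytes(text)
    pad = 16 - len(text) % 16
    text = text + to_bytes(chr(pad)) * pad  # pad to a multiple of 16 bytes
    encryptor = AES.new(to_bytes(key), AES.MODE_CBC, iv)
    result = encryptor.encrypt(text)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str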
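The padding arithmetic in get_params (pad = 16 - len(text) % 16, then appending pad copies of the byte pad) is the scheme usually called PKCS#7 padding. It can be looked at on its own, without the Crypto module. A minimal sketch for Python 3, with hypothetical helper names pkcs7_pad and pkcs7_unpad:

```python
def pkcs7_pad(data, block_size=16):
    """Append pad bytes, each equal to pad, so the length is a multiple of block_size."""
    pad = block_size - len(data) % block_size
    return data + bytes([pad]) * pad


def pkcs7_unpad(data):
    """Remove the padding added by pkcs7_pad (no validation, for illustration only)."""
    pad = data[-1]
    return data[:-pad]


padded = pkcs7_pad(b"abc")
print(len(padded), padded[-1])
print(pkcs7_unpad(padded))
```

Note that input whose length is already a multiple of 16 still gets a full extra block of padding, which is what makes the padding removable without ambiguity.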
Function c encrypts the data with RSA, an asymmetric cipher. The JS code and a corresponding Python implementation are:
// function c
function c(a, b, c) {
    var d, e;
    return setMaxDigits(131),
    d = new RSAKeyPair(b,"",c),
    e = encryptedString(d, a)
}

# Python implementation of function c
import binascii

def get_encSecKey(text, pubkey, modulus):  # RSA asymmetric encryption
    # text must be a byte string, as returned by random_str
    text = text[::-1]  # reverse the text
    rs = pow(int(binascii.hexlify(text), 16), int(pubkey, 16), int(modulus, 16))
    return format(rs, 'x').zfill(256)
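To check that this really is plain modular exponentiation, the same computation can be run with a toy key that is small enough to decrypt by hand. In this sketch (Python 3) the function name rsa_encrypt_hex and the width parameter are hypothetical, and the toy key n = 61 * 53 = 3233, e = 17, d = 2753 is far too small for any real use.

```python
import binascii


def rsa_encrypt_hex(text, pubkey_hex, modulus_hex, width=256):
    """Reverse text, read it as a big integer, raise it to e mod n, return zero-padded hex."""
    reversed_text = text[::-1]
    m = int(binascii.hexlify(reversed_text), 16)
    c = pow(m, int(pubkey_hex, 16), int(modulus_hex, 16))
    return format(c, "x").zfill(width)


# Toy key: e = 17 = 0x11, n = 3233 = 0xca1, private exponent d = 2753
cipher_hex = rsa_encrypt_hex(b"A", "11", "ca1", width=4)
recovered = pow(int(cipher_hex, 16), 2753, 3233)
print(cipher_hex, recovered, chr(recovered))
```

With the real modulus there is no private exponent available, so only the server can recover the random key.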
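Next, analyze the four arguments passed to window.asrsea(). This requires a breakpoint: click a line in the Sources panel to set one, then play a song. When execution stops at the breakpoint, use the two stepping buttons on the right (the first steps over a call, the second executes a single statement). Selecting one of the four arguments, as you would to copy it, shows its value.
For example, selecting the second argument shows the value "010001", so the second argument is a constant. Checking the others shows that the second, third and fourth arguments are all constants, while the first is JSON data that depends on the song id.
Example values of the four arguments: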
first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"
With only the song's ID and the three constants above, the final encrypted data can be constructed. All that remains is writing the code.
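Since only the ids field of the first argument changes from song to song, it can be produced by a small function. A sketch, where build_first_param is a hypothetical name and the other three fields are copied from the example values observed in the debugger:

```python
import json


def build_first_param(song_id):
    """Return the first argument of window.asrsea() for one song id."""
    return {
        "ids": "[%s]" % song_id,
        "level": "standard",
        "encodeType": "aac",
        "csrf_token": "",
    }


# This JSON text is what gets AES-encrypted in the next step
payload = json.dumps(build_first_param(1353194608))
print(payload)
```

Building a fresh dictionary per song avoids modifying a shared module-level dictionary between calls.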
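2.2 Downloading the song
Following the analysis above, the code proceeds as follows:
1. Using the song id, request https://music.163.com/song?id=1353372483 and match the page content with a regular expression to get the song title.
2. Compute the encrypted parameters 'params' and 'encSecKey' and send a POST request to https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token= to get the song's url and size.
3. Request the song's download address and write the result to a local file.
The complete code is as follows: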
#coding:utf-8
import os
import binascii
from Crypto.Cipher import AES
import base64
import json
import requests
import re

first_param = {"ids":"[1353194608]","level":"standard","encodeType":"aac","csrf_token":""}
second_param = "010001"
third_param = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"
fourth_param = "0CoJUm6Qyw8W8jud"

headers = {
    "Referer": "https://music.163.com/",
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Mobile Safari/537.36"
}

def to_bytes(s):
    # AES works on bytes: encode text strings, pass byte strings through unchanged
    return s if isinstance(s, bytes) else s.encode("utf-8")

def random_str(size):
    # binascii.hexlify() takes a byte string and returns its hex digits as an ASCII byte string
    return binascii.hexlify(os.urandom(size))[:16]

def get_params(text, key):  # AES symmetric encryption
    iv = b'0102030405060708'
    text = to_bytes(text)
    pad = 16 - len(text) % 16
    text = text + to_bytes(chr(pad)) * pad  # pad to a multiple of 16 bytes
    encryptor = AES.new(to_bytes(key), AES.MODE_CBC, iv)
    result = encryptor.encrypt(text)
    result_str = base64.b64encode(result).decode('utf-8')
    return result_str

def get_encSecKey(text, pubkey, modulus):  # RSA asymmetric encryption
    text = text[::-1]
    rs = pow(int(binascii.hexlify(text), 16), int(pubkey, 16), int(modulus, 16))
    return format(rs, 'x').zfill(256)

def encrypt_data(first_param, second_param, third_param, fourth_param):
    data = {}
    i = random_str(16)
    temp = get_params(json.dumps(first_param), fourth_param)
    params = get_params(temp, i)
    encSecKey = get_encSecKey(i, second_param, third_param)
    data['params'] = params
    data['encSecKey'] = encSecKey
    return data

# Get the song title
def get_song_title(id):
    url = "https://music.163.com/song?id=%s" % (id)
    response = requests.get(url, headers=headers)
    title = re.search(r'<title>(.*?)\s-', response.text).group(1)  # match the song title
    #print(title)
    return title

# Get the song's download address, size and other information
def get_song_info(id):
    first_param['ids'] = "[%s]" % id
    data = encrypt_data(first_param, second_param, third_param, fourth_param)
    url = "https://music.163.com/weapi/song/enhance/player/url/v1?csrf_token="
    response = requests.post(url, headers=headers, data=data)
    #print(response.status_code)
    json_data = json.loads(response.text)
    return json_data

# Download the song
def down_song(id, down_url, song_title, size):
    filename = song_title + str(id) + ".mp3"
    print("Song size: %0.2f Mb" % (size / (1024.0 * 1024)))
    try:
        result = requests.get(down_url, headers=headers)
        with open(filename, "wb") as f:
            for chunk in result.iter_content(1024):
                f.write(chunk)
                f.flush()
        print("Download finished")
    except Exception as e:
        print("Download failed, id: %s" % id)
        print(e)

if __name__ == "__main__":
    id = input("Enter the song id, e.g. 1353194608: ")
    song_title = get_song_title(id)
    song_info = get_song_info(id)
    down_url = song_info["data"][0]["url"]
    size = song_info["data"][0]["size"]
    #print(down_url, size)
    down_song(id, down_url, song_title, size)
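The script reads song_info["data"][0]["url"] and song_info["data"][0]["size"] directly. A slightly more defensive way to read those two fields is sketched below. The helper name extract_download_info and the sample response are hypothetical, and the idea that the url field may come back empty for some songs is an assumption on my part rather than something verified here.

```python
def extract_download_info(song_info):
    """Return (url, size) from the player/url response, or raise if no url is present."""
    entries = song_info.get("data") or []
    if not entries:
        raise ValueError("response has no data entries")
    entry = entries[0]
    url = entry.get("url")
    if not url:
        raise ValueError("no download url in the response")
    return url, entry.get("size", 0)


# Hypothetical response, reduced to the two fields the script reads
sample = {"data": [{"url": "http://example.com/song.mp3", "size": 3145728}]}
url, size = extract_download_info(sample)
print(url, "%0.2f Mb" % (size / (1024 * 1024)))
```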
References:
https://blog.csdn.net/qq_38282706/article/details/80251666
https://github.com/Jack-Cherish/python-spider/blob/master/Netease/Netease.py