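Points to improve:
During the earlier analysis, Chrome's built-in network inspector did not display the AJAX response data, which misled me into thinking the data was generated in JS or sent directly over TCP. In fact, pasting the AJAX request URL into the browser shows the response content directly. Also note that the AJAX URLs for celebrity accounts and ordinary users are different.
Store the data in a database. Writing straight to a CSV file is simple, but it is awkward to work with and hard to query.
When the returned data is shorter than 500, do not stop the program right away. The proxy IP may have been detected, so retry a few times first.
Improve crawl throughput: try a thread pool and Scrapy again (I have not yet figured out what went wrong with Scrapy).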
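The retry and thread-pool points could be sketched roughly as follows. This is only a minimal sketch: `fetch` is a hypothetical callable that takes a URL and returns the response text (in practice it would probably switch proxies between calls), and the names `fetch_with_retry`, `crawl_all`, `max_attempts` and `delay` are illustrative, not from the original crawler.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

MIN_DATA_LENGTH = 500  # threshold taken from the note above


def fetch_with_retry(
    fetch: Callable[[str], str],  # hypothetical: returns the response text for a URL
    url: str,
    max_attempts: int = 3,
    delay: float = 0.0,
) -> Optional[str]:
    """Call fetch(url) until the payload is at least MIN_DATA_LENGTH characters long."""
    for attempt in range(1, max_attempts + 1):
        data = fetch(url)
        if len(data) >= MIN_DATA_LENGTH:
            return data
        # A short payload may mean the proxy IP was detected: wait, then retry.
        if attempt < max_attempts:
            time.sleep(delay)
    return None  # give up on this URL only, instead of stopping the whole program


def crawl_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    max_workers: int = 8,
) -> List[Optional[str]]:
    """Fetch all URLs concurrently; results come back in the same order as urls."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: fetch_with_retry(fetch, u), urls))
```

A URL that still returns a short payload after all attempts yields `None`, so the caller can log it and move on.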
id 1594052081
Super Topic (超話)
https://m.weibo.cn/api/container/getIndex?uid=1594052081&luicode=10000011&lfid=1005051594052081&type=uid&value=1594052081&containerid=2314751594052081
Use this response to get the containerid: res["data"]["cardlistInfo"]["hide_oids"][0].split(":")[1]
1008083508b653c61b7212de392d3b67cc14a3
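With it you can assemble the Hall of Fame URL. The die-hard fans URL is an AJAX request at the same level as the Hall of Fame one.
Hall of Fame (名人堂)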
https://m.weibo.cn/api/container/getIndex?containerid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser&luicode=10000011&lfid=2314751594052081
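A minimal sketch of the two steps above: pulling the containerid out of the Super Topic response and assembling the Hall of Fame URL. The function names are illustrative. Building `lfid` as `231475` plus the uid is an assumption inferred from the example URLs in these notes, not a documented rule.

```python
from urllib.parse import urlencode

API = "https://m.weibo.cn/api/container/getIndex"


def extract_containerid(res: dict) -> str:
    """Take the part after the colon of the first hide_oids entry."""
    return res["data"]["cardlistInfo"]["hide_oids"][0].split(":")[1]


def hall_of_fame_url(containerid: str, uid: str) -> str:
    """Assemble the Hall of Fame URL in the same shape as the example above."""
    params = {
        "containerid": f"{containerid}_-_hotuser",
        "luicode": "10000011",
        "lfid": f"231475{uid}",  # assumption: prefix inferred from the sample URLs
    }
    return f"{API}?{urlencode(params)}"
```

Here `res` is the already-parsed JSON body of the Super Topic request, for example the result of `json.loads` on the response text.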
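The Hall of Fame URL in turn yields the URLs for the host team, weekly contribution ranking, top fans, and die-hard fans listed below:
res["data"]["cards"][""]
Host team (主持人團隊)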
https://m.weibo.cn/api/container/getIndex?containerid=1073033508b653c61b7212de392d3b67cc14a3_-_ext_super_emcee_team&luicode=10000011&lfid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser
https://m.weibo.cn/p/index?containerid=1073033508b653c61b7212de392d3b67cc14a3_-_ext_super_emcee_team&luicode=10000011&lfid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser
Weekly contribution ranking (周貢獻榜)
https://m.weibo.cn/api/container/getIndex?containerid=2311403508b653c61b7212de392d3b67cc14a3_-_contribute&luicode=10000011&lfid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser
Top fans (粉絲大咖)
https://m.weibo.cn/api/container/getIndex?title=粉絲大咖&containerid=1073033508b653c61b7212de392d3b67cc14a3_-_ext_super_fan_big_shot&luicode=10000011&lfid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser
Die-hard fans (鐵桿粉絲)
https://m.weibo.cn/api/container/getIndex?containerid=1008083508b653c61b7212de392d3b67cc14a3_-_hotuser&luicode=10000011&lfid=2314751594052081&page=2
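Since the die-hard fans URL is just the Hall of Fame URL with a `page` parameter appended, paging could be handled by a small helper like the one below. This is a sketch only. `with_page` is a hypothetical name, and how many pages exist or how the last page is signalled is not covered in these notes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return url with its page query parameter set (or replaced) to page."""
    parts = urlsplit(url)
    # Drop any existing page parameter so it is never duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```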
1008083508b653c61b7212de392d3b67cc14a3
1195230310
https://weibo.com/p/aj/v6/mblog/mbloglist?ajwvr=6&domain=100306&is_search=0&visible=0&is_all=1&is_tag=0&profile_ftype=1&page=1&pagebar=1&pl_name=Pl_Official_MyProfileFeed__22&id=100306{uid}&script_uri=/hejiong&feed_type=0&pre_page=1&domain_op=100306
AJAX requests
First
uid="1594052081",page=2,pagebar=0,prepage=1
Second
uid="1594052081",page=2,pagebar=0,prepage=2
Third
uid="1594052081",page=2,pagebar=1,prepage=2
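The three requests listed above for page 2 could be generated as in the sketch below. Generalising the pattern to an arbitrary page `n` as `(n, 0, n-1)`, `(n, 0, n)`, `(n, 1, n)` is an assumption extrapolated from that single example. The `domain` and `script_uri` defaults are copied from the template URL and are probably account-specific. The function names are illustrative.

```python
from typing import List, Tuple
from urllib.parse import urlencode

BASE = "https://weibo.com/p/aj/v6/mblog/mbloglist"


def page_requests(page: int) -> List[Tuple[int, int, int]]:
    """(page, pagebar, pre_page) for the three AJAX requests of one page.

    Assumption: the pattern observed for page 2 holds for every page.
    """
    return [(page, 0, page - 1), (page, 0, page), (page, 1, page)]


def mbloglist_url(uid: str, page: int, pagebar: int, pre_page: int,
                  domain: str = "100306", script_uri: str = "/hejiong") -> str:
    """Fill in the template URL from the notes; parameter order follows the template."""
    params = {
        "ajwvr": 6, "domain": domain, "is_search": 0, "visible": 0,
        "is_all": 1, "is_tag": 0, "profile_ftype": 1,
        "page": page, "pagebar": pagebar,
        "pl_name": "Pl_Official_MyProfileFeed__22",
        "id": f"{domain}{uid}", "script_uri": script_uri,
        "feed_type": 0, "pre_page": pre_page, "domain_op": domain,
    }
    # safe="/" keeps the slash in script_uri unescaped, as in the template.
    return f"{BASE}?{urlencode(params, safe='/')}"
```

For one page, the URLs are then `[mbloglist_url(uid, *r) for r in page_requests(page)]`.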
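Post-process the data string: replace \/ with /; replace / with (replacement text missing); remove \r and \u200b; convert the literal \t into a real tab (so the escape text becomes whitespace).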
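A minimal sketch of that clean-up, assuming the raw `data` string still contains the escape sequences as literal text (backslash followed by a character) rather than decoded characters. Only the rules that are legible in the notes are implemented, and `clean_data` is an illustrative name.

```python
def clean_data(raw: str) -> str:
    """Apply the clean-up rules from the notes to the raw data string."""
    text = raw.replace("\\/", "/")  # literal backslash-slash becomes a plain slash
    # Remove carriage returns and zero-width spaces, whether they appear as
    # literal escape text or as already-decoded characters.
    for seq in ("\\r", "\\u200b", "\r", "\u200b"):
        text = text.replace(seq, "")
    text = text.replace("\\t", "\t")  # literal backslash-t becomes a real tab
    return text
```

If the response is parsed with a JSON decoder first, most of these escapes are already resolved and only the zero-width space and carriage return removal would still apply.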