Use Python to scrape Zhihu's hot list and extract each entry's title and link.
Environment and tools: Ubuntu 16.04, Python 3, requests, XPath
1. Open Zhihu in a browser and log in.
2. Get the Cookie and User-Agent from the browser.
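Both values can usually be copied from the request headers shown in the Network tab of the browser's developer tools. As a minimal sketch, assuming you would rather not paste the Cookie into the script itself, the headers could be built from environment variables. The variable names ZHIHU_COOKIE and ZHIHU_USER_AGENT and the helper build_headers are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical environment variable names (assumptions, not part of any API).
COOKIE_ENV = "ZHIHU_COOKIE"
UA_ENV = "ZHIHU_USER_AGENT"

# Fallback User-Agent used when UA_ENV is not set.
DEFAULT_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
)


def build_headers(env=None):
    """Return a headers dict built from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    cookie = env.get(COOKIE_ENV, "").strip()
    if not cookie:
        # Fail early instead of sending a request that is not logged in.
        raise ValueError("set " + COOKIE_ENV + " to the Cookie copied from the browser")
    return {"Cookie": cookie, "User-Agent": env.get(UA_ENV, DEFAULT_UA)}
```

The returned dict can be passed as the headers argument of requests.get.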
3. The code:
import requests
from lxml import etree


def get_html(url):
    # Paste the Cookie copied from your logged-in browser session.
    headers = {
        'Cookie': 'your Cookie',
        # 'Host': 'www.zhihu.com',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
    }

    r = requests.get(url, headers=headers)

    # Only parse the page when the request succeeded.
    if r.status_code == 200:
        deal_content(r.text)


def deal_content(r):
    html = etree.HTML(r)
    # Titles (h2 elements) and links (href attributes) of the hot-list entries.
    title_list = html.xpath('//*[@id="TopstoryContent"]/div/section/div[2]/a/h2')
    link_list = html.xpath('//*[@id="TopstoryContent"]/div/section/div[2]/a/@href')
    # Open the output file once and append every entry to it.
    with open('zhihu.txt', 'a', encoding='utf-8') as f:
        for title, link in zip(title_list, link_list):
            print(title.text)
            print(link)
            f.write(title.text + '\n')
            f.write('\tLink: ' + link + '\n')
            f.write('*' * 50 + '\n')


def main():
    url = 'https://www.zhihu.com/hot'
    get_html(url)


main()
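The output step can also be separated from the parsing so that it is easy to check without touching the network. The following is a rough sketch under that assumption; format_entries and save_entries are hypothetical helpers, not part of the original script, and they write a similar layout (title, link, separator line) for already-extracted strings.

```python
SEPARATOR = "*" * 50


def format_entries(titles, links):
    """Pair each title with its link and return the text to be written."""
    blocks = []
    # zip stops at the shorter list, so an unmatched title or link is skipped
    # instead of raising IndexError.
    for title, link in zip(titles, links):
        blocks.append(title + "\n\tLink: " + link + "\n" + SEPARATOR + "\n")
    return "".join(blocks)


def save_entries(titles, links, path="zhihu.txt"):
    """Append all entries to the file at path, opening it only once."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(format_entries(titles, links))
```

With lxml results, the titles would be the .text of each h2 element and the links the href strings returned by the XPath query.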
4. Scraping results