Scraping Zhihu Trending Topics with Python

Date: 2019-11-24

Tags: python, trending topics. Category: Python

This example follows Cui's book Python3網絡爬蟲開發實戰 (Python 3 Web Crawler Development in Practice).

Inspecting the page shows that the trending topics all sit inside div elements that carry both the explore-feed and feed-item classes.

The source code is as follows:

import requests
from pyquery import PyQuery as pq

url='https://www.zhihu.com/explore'   # hottest today
#url='https://www.zhihu.com/explore#monthly-hot'   # hottest this month
headers={
    'User-Agent':"Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
}
html=requests.get(url,headers=headers).text
doc=pq(html)
#print(doc)
items=doc('.explore-feed.feed-item').items()
for item in items:
    # get the question
    question=item.find('h2').text()
    print(question)
    # get the author
    author=item.find('.author-link').text()
    print(author)
    # get the answer (the teacher's version, which I did not understand; it may need some jQuery knowledge)
    answer=pq(item.find('.content').html()).text()
    print(answer)
    print('===='*10)
    # my own way of getting the answer
    answer1=item.find('.zh-summary').text()
    print(answer1)

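    # First way to write: open the file and close it explicitly
    file=open('知乎.txt','a',encoding='utf-8')
    file.write('\n'.join([question,author,answer]))
    file.write('\n'+'****'*50+'\n')
    file.close()

    # Second way to write: the with block closes the file, so no close() call is needed
    # (both ways are shown for comparison; keeping both appends every record twice)
    with open('知乎.txt','a',encoding='utf-8') as fp:
        fp.write('\n'.join([question, author, answer]))
        fp.write('\n' + '****' * 50 + '\n')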
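
The two write methods above do the same job, so in practice only one of them is needed. Here is a minimal sketch of the second one wrapped in a helper, so each record is appended exactly once. The names format_record and append_record, the demo file name, and the sample values are made up for illustration and are not from the book.

```python
import os
import tempfile

def format_record(question, author, answer):
    """Join the three fields with newlines and end with a row of asterisks."""
    return '\n'.join([question, author, answer]) + '\n' + '****' * 50 + '\n'

def append_record(path, question, author, answer):
    """Append one record; the with block closes the file automatically."""
    with open(path, 'a', encoding='utf-8') as fp:
        fp.write(format_record(question, author, answer))

# Demo with made-up values, written to a temporary file instead of 知乎.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'zhihu_demo.txt')
append_record(demo_path, 'sample question', 'sample author', 'sample answer')
with open(demo_path, encoding='utf-8') as fp:
    print(fp.read())
```

Inside the scraping loop, a single call such as append_record('知乎.txt', question, author, answer) would replace both write blocks.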

The output is as follows:

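One odd thing, though: the "hottest today" URL and the "hottest this month" URL return exactly the same results, and in both cases only the contents of five divs can be scraped. This is probably because Zhihu builds the page dynamically, so Selenium is likely needed.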
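
One likely reason the two URLs give identical results: the #monthly-hot part is a URL fragment. HTTP clients do not send the fragment to the server, so both requests ask for the same page, and any switch to the monthly list would have to happen in the browser through JavaScript. A minimal sketch using only the standard library shows that the two URLs are the same once the fragment is removed (the helper name without_fragment is made up for illustration):

```python
from urllib.parse import urldefrag

daily = 'https://www.zhihu.com/explore'
monthly = 'https://www.zhihu.com/explore#monthly-hot'

def without_fragment(url):
    """Return the URL with any #fragment removed."""
    base, _fragment = urldefrag(url)
    return base

# Compare what is left of each URL after dropping the fragment.
print(without_fragment(daily))
print(without_fragment(monthly))
print(without_fragment(daily) == without_fragment(monthly))
```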

One more thing:

answer=pq(item.find('.content').html()).text()
# get the answer (the teacher's version, which I did not understand; it may need some jQuery knowledge)

I did not understand this line.
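
As far as I can tell, the line works in three steps: item.find('.content') selects the element, .html() returns its inner markup as a string, and pq(...).text() parses that string again and keeps only the text without the tags. Below is a rough sketch of that last step, using the standard library's html.parser in place of pyquery so that it runs without the Zhihu page. The TextOnly class, the strip_tags helper, and the sample string are made up for illustration, and pyquery's whitespace handling may differ.

```python
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    """Collect text nodes and ignore the tags around them."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every piece of text found between tags.
        self.chunks.append(data)

def strip_tags(markup):
    """Parse an HTML string and return its text, pieces joined by spaces."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return ' '.join(chunk.strip() for chunk in parser.chunks if chunk.strip())

# A made-up stand-in for the string that .html() would return.
sample = '<p>First paragraph.</p><p>Second <b>bold</b> one.</p>'
print(strip_tags(sample))
```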

I still need to learn jQuery.
