xiaolinBot (a Twitter joke-collection crawler bot) Step1 - A Minimal Crawler

Date: 2019-12-08

Tags: xiaolinbot, joke-collection crawler, bot, step1, step. Category: web crawlers

Step1 - A Minimal Crawler

Environment Setup

Python 3.5, preferably inside a venv.

Two libraries are also required:

requests : a Python library that wraps HTTP requests
pyquery : similar to jQuery and very convenient to use

$ pip install requests
$ pip install pyquery

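Getting Started

Implementing the first application

Our first application mainly does the following:

Visit a page; here we use Qiushibaike (http://www.qiushibaike.com/) as the example
Fetch the page's content
Do some simple processing to extract the content we need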

import requests
from pyquery import PyQuery as pq

__author__ = 'BONFY CHEN <foreverbonfy@163.com>'


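SITE = 'http://www.qiushibaike.com/'

# Fetch the page and stop if the server did not answer with 200 OK
r = requests.get(SITE)
assert r.status_code == 200

# Parse the HTML and select every element with class "article" inside a div
d = pq(r.text)
contents = d("div .article")
for item in contents:
    # Iterating yields raw lxml elements, so wrap each one with pq() again
    i = pq(item)
    # The joke text lives in the element with class "content"
    content = i("div .content").text()
    print(content)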