xiaolinBot (Twitter Joke Collection Crawler Bot) Step 2 - Code Optimization

Step 2 - Code Optimization

Introduction

In this post we briefly discuss code optimization, focusing on three points:

  1. From procedural code to functions

  2. Adding media handling

  3. PEP8

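The code we wrote in Step 1 was procedural, which makes it hard to reuse. So we wrap the earlier code in functions, which makes it easier to extend later and easier for others to call.

Python code should also follow the PEP8 style guide, so that both we and others can read it easily.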

Coding

Create utils/common.py

import os
import requests

PROXIES = None

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1)'
                  ' AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/38.0.2125.122 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,'
              'application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'zh-CN,zh;q=0.8'
}


################################
#
# requests Operation
#
################################


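def GetPage(url, proxies=PROXIES, headers=HEADERS):
    """Fetch a page and return its body as text."""
    r = requests.get(url, proxies=proxies, headers=headers)
    # Stop here on any non-200 response instead of parsing an error page.
    assert r.status_code == 200
    return r.text


def GetMedia(
        url, proxies=PROXIES, headers=HEADERS, chunk_size=512,
        media_type='pic'):
    """Save a media file under download/<media_type>/ and return its path."""
    r = requests.get(url, proxies=proxies, headers=headers, stream=True)
    dirname = 'download/' + media_type
    # Create the target directory on first use so open() does not fail.
    if not os.path.isdir(dirname):
        os.makedirs(dirname)
    filename = dirname + '/' + os.path.basename(url)
    # Stream the body to disk in chunks rather than loading it all in memory.
    with open(filename, 'wb') as fd:
        for chunk in r.iter_content(chunk_size):
            fd.write(chunk)
    return filename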

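This module wraps two functions: GetPage and GetMedia.

GetPage: takes a page URL and returns the HTML of the whole page.

GetMedia: takes the URL of an image or video and downloads the media file to download/pic or download/video (mainly so that videos from 百思不得姐 (Budejie) can be supported later).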
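To make the path handling in GetMedia easier to see, here is a minimal sketch of just the file-name step, with no network access. The helper name media_path and its root parameter are hypothetical and do not exist in the original code. Note one deliberate difference: the original applies os.path.basename to the whole URL, so a query string such as ?v=2 would end up in the file name, while this sketch drops it first. Whether that is wanted is an assumption on our part.

```python
import posixpath
from urllib.parse import urlsplit


def media_path(url, media_type='pic', root='download'):
    """Build the local path for a media URL (hypothetical helper)."""
    # Keep only the path component, so '?v=2' style query strings are dropped.
    name = posixpath.basename(urlsplit(url).path)
    return '/'.join([root, media_type, name])


print(media_path('http://example.com/imgs/app123.jpg'))
print(media_path('http://example.com/v/clip.mp4?v=2', media_type='video'))
```

The media_type argument plays the same role as in GetMedia: it only selects the sub-directory, so supporting a new kind of media would need nothing more than a new value.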

Change main.py to:

# coding: utf-8

from pyquery import PyQuery as pq
from utils.common import GetMedia, GetPage

__author__ = 'BONFY CHEN <foreverbonfy@163.com>'


####################
#
# main function
#
####################

def qiushi():
    url = 'http://www.qiushibaike.com/'
    page = GetPage(url)
    d = pq(page)
    # Each joke on the page is wrapped in an element with class "article".
    contents = d("div .article")
    for item in contents:
        i = pq(item)
        # Posts with an image carry a thumbnail; text-only posts do not.
        pic_url = i("div .thumb img").attr.src
        content = i("div .content").text()
        item_id = i.attr.id  # avoid shadowing the built-in id()
        if pic_url:
            pic_path = GetMedia(pic_url)
            print('pic - {id}: {content} \npic saved to {pic_path}'.format(
                id=item_id, content=content, pic_path=pic_path))
        else:
            print('text - {id}: {content}'.format(
                id=item_id, content=content))


def main():
    qiushi()


if __name__ == '__main__':
    main()

Output of a run:

[Figure: run result]
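In the same "procedure to function" spirit, the branching inside qiushi() could also be pulled out into a small pure function, which makes the output format easy to check without fetching any page. This is only a sketch: format_item is a hypothetical name that is not part of the repository, and the wording of the message is our own.

```python
def format_item(item_id, content, pic_path=None):
    """Return the line to print for one joke (hypothetical helper)."""
    if pic_path:
        # Posts with a picture also report where the file was stored.
        return 'pic - {id}: {content} \npic saved to {pic_path}'.format(
            id=item_id, content=content, pic_path=pic_path)
    return 'text - {id}: {content}'.format(id=item_id, content=content)


print(format_item('qiushi_tag_1', 'hello'))
print(format_item('qiushi_tag_2', 'look', 'download/pic/x.jpg'))
```

With a helper like this, qiushi() would only need to collect the fields and call print(format_item(...)), and a later site adapter could reuse the same formatting.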

PEP8

$ pip install pep8
$ pep8 xiaolinBot

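If any code does not conform to the standard, the tool prints a warning for it, and we simply fix the reported lines. (The pep8 package has since been renamed pycodestyle.)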

[Figure: PEP8 check output]
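To give a feel for the kind of rule such a checker enforces, here is a rough sketch of one of them: PEP8's default limit of 79 characters per line. This is an illustration only and not how the pep8 tool is implemented; the function name long_lines is hypothetical.

```python
def long_lines(source, limit=79):
    """Return (line_number, length) for every line longer than limit."""
    found = []
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > limit:
            found.append((number, len(line)))
    return found


# Two lines of sample source; the second one is 86 characters long.
sample = "x = 1\n" + "y = '" + "a" * 80 + "'\n"
print(long_lines(sample))  # [(2, 86)]
```

The real tool covers many more rules (whitespace, blank lines, imports and so on), so for actual projects it is better to run the checker than to hand-roll checks like this.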

Full code

See https://github.com/bonfy/xiaolinBot for details.

Feel free to follow along and join the discussion.

Stay tuned for the next post: Adapters
