A Simple Web Crawler in Python

Date: 2019-12-09

Tags: python, simple web crawler    Category: Python


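Python 2.x and Python 3.x differ considerably. In Python 2.x a page is fetched with urllib.urlopen(), but the same call under Python 3.x fails with:

AttributeError: module 'urllib' has no attribute 'urlopen'

The reason is that in Python 3.x urlopen() moved into the urllib.request submodule, so the call becomes urllib.request.urlopen() (after import urllib.request).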
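A minimal sketch, assuming code that may need to run on both Python 2.x and 3.x: try the Python 3 location first and fall back to the Python 2 one. The check at the end shows why the original call fails on Python 3, where the top-level urllib package has no urlopen attribute.

```python
from __future__ import print_function  # keeps print() working on Python 2 as well

import urllib

try:
    # Python 3.x: urlopen lives in the urllib.request submodule
    from urllib.request import urlopen
except ImportError:
    # Python 2.x: urlopen is available directly on the urllib module
    from urllib import urlopen

# On Python 3 this is False, which is why urllib.urlopen() raises AttributeError
has_top_level_urlopen = hasattr(urllib, "urlopen")
print("urllib.urlopen exists:", has_top_level_urlopen)
print("using urlopen from:", urlopen.__module__)
```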

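After the page was downloaded, I called the webbrowser module with

webbrowser.open_new_tab('baidu.com.html')

which returned True. (True only means that a browser was launched, not that the file it was asked to show has any content.)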

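I then wrote the downloaded page to the target directory with

open('baidu.com.html', 'w').write(html)

However, the saved file was 0 KB and showed a blank page when opened. After changing the code above to

open('baidu.com.html', 'wb').write(html)

the page opened normally. The reason is that in Python 3 read() returns bytes, while a file opened in text mode ('w') only accepts str. open() has already created (or truncated) the file by the time write() raises TypeError, so an empty file is left behind; binary mode ('wb') writes the bytes unchanged.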
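The 0 KB file can be reproduced without any network access. In this sketch a hard-coded bytes literal stands in for the result of page.read() (an assumption for illustration), and the file goes into a temporary directory rather than the author's target directory.

```python
import os
import tempfile

# Stand-in for page.read(): urlopen responses return bytes in Python 3
html = b"<html><body>hello</body></html>"
path = os.path.join(tempfile.mkdtemp(), "baidu.com.html")

# Text mode: open() creates the empty file first, then write() rejects bytes
try:
    with open(path, "w") as f:
        f.write(html)
except TypeError as exc:
    print("text mode failed:", exc)
size_text = os.path.getsize(path)
print("size after 'w':", size_text)  # 0

# Binary mode: the bytes are written unchanged
with open(path, "wb") as f:
    f.write(html)
size_binary = os.path.getsize(path)
print("size after 'wb':", size_binary)
```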

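import urllib.request
import webbrowser
from pathlib import Path


def getHtml(url):
    # urlopen() returns a response object; read() returns the body as bytes
    with urllib.request.urlopen(url) as page:
        return page.read()


# The URL is assumed from the file name; the original never shows the call
html = getHtml('http://www.baidu.com')

# html is bytes, so the file must be opened in binary mode ('wb')
with open('baidu.com.html', 'wb') as f:
    f.write(html)

# Open the file only after it has been written; an absolute file:// URI
# avoids relying on how each browser resolves a relative file name
webbrowser.open_new_tab(Path('baidu.com.html').resolve().as_uri())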
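If the page is needed as text rather than bytes (for example, to search it or to save it in text mode), the body has to be decoded. A minimal sketch, assuming the server declares its charset in the Content-Type header and falling back to UTF-8 when it does not; the function names decode_body, get_html_text and save_text are hypothetical, not part of the original code.

```python
import urllib.request


def decode_body(body, charset=None):
    # Fallback to UTF-8 is an assumption; undecodable bytes become U+FFFD
    return body.decode(charset or "utf-8", errors="replace")


def get_html_text(url, timeout=10):
    # headers.get_content_charset() reads the charset from Content-Type,
    # or returns None if the server did not send one
    with urllib.request.urlopen(url, timeout=timeout) as page:
        return decode_body(page.read(), page.headers.get_content_charset())


def save_text(text, filename):
    # A str can be written in text mode; the explicit encoding keeps the
    # saved file independent of the platform's default encoding
    with open(filename, "w", encoding="utf-8") as f:
        f.write(text)
```

For example, save_text(get_html_text('http://www.baidu.com'), 'baidu.com.html') would fetch the page and save it as UTF-8 text (this needs network access). If a page declares a different charset in a <meta> tag, a browser may misread the re-encoded file, which is one reason the plain 'wb' approach is the simpler way to save a page unchanged.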
