Python Standard Library: urllib

Date: 2019-12-20


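urllib is a Python module for fetching URLs. It offers a very simple interface in the form of the urlopen function, which can fetch URLs over a variety of protocols. It also offers a slightly more complex interface for handling common situations such as basic authentication, cookies, and proxies; these are handled by objects called openers and handlers.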
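As a rough illustration of the opener and handler objects mentioned above, here is a minimal sketch. It assumes Python 3, where these classes live in urllib.request (the rest of this article uses the Python 2 urllib module). The User-Agent string and the data: URL are made up for the example, and the data: URL is used only so the sketch runs without network access.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_demo_opener():
    """Return an opener assembled from explicit handlers (illustrative only)."""
    cookie_jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),                 # empty mapping: use no proxy
        urllib.request.HTTPCookieProcessor(cookie_jar),  # store and resend cookies
    )
    # Hypothetical User-Agent, sent with every HTTP request made by this opener.
    opener.addheaders = [("User-Agent", "urllib-demo/0.1")]
    return opener, cookie_jar


opener, cookie_jar = build_demo_opener()

# A data: URL keeps the sketch offline; an http:// URL is opened the same way.
with opener.open("data:text/plain,hello") as response:
    body = response.read()

print(body)
print(len(cookie_jar))  # number of cookies collected so far
```

Each handler covers one concern (proxies, cookies, authentication and so on), and build_opener chains them together, so opener.open() is used just like urlopen.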

urllib

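import urllib  # Python 2 module; in Python 3 these functions live in urllib.request

s = urllib.urlopen('http://tieba.baidu.com/p/3606519228')
print s.read()  # prints the full HTML source of the page

s.readline() # returns the next line of the response (an empty string once read() has consumed the body)
s.getcode()  # returns the HTTP status code: for an HTTP request, 200 means success and 404 means the page was not found
s.info()     # returns an httplib.HTTPMessage object holding the headers sent back by the remote server
s.geturl()   # returns the URL that was actually retrieved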

>>> s = urllib.urlopen('http://www.alwme.com/')
>>> byte = s.read()
>>> print("Fetched %s bytes from %s" % (len(byte), s.geturl()))
Fetched 26834 bytes from http://alwme.com/
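For readers on Python 3, the urlopen calls above could be sketched roughly as follows. This is an illustration under stated assumptions, not the article's original code. To run without internet access it starts a throwaway HTTP server on 127.0.0.1 (DemoHandler and PAGE are hypothetical names) and bypasses any system proxy. Against a real site, urllib.request.urlopen(url) returns the same kind of response object.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>\n<body>hello</body>\n</html>\n"  # hypothetical page content


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_from_local_server():
    # Port 0 lets the operating system pick a free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Empty ProxyHandler: make sure the loopback request is sent directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = "http://127.0.0.1:%d/" % server.server_port
    try:
        with opener.open(url, timeout=5) as s:
            return {
                "first_line": s.readline(),  # first line of the body
                "rest": s.read(),            # everything after that line
                "status": s.status,          # same value as getcode()
                "content_type": s.headers.get_content_type(),  # s.headers is what info() returns
                "url": s.geturl(),
            }
    finally:
        server.shutdown()
        server.server_close()


result = fetch_from_local_server()
total = len(result["first_line"]) + len(result["rest"])
print("Fetched %d bytes from %s" % (total, result["url"]))
```

Note that readline() is called before read() here: once read() has consumed the body, later reads return empty bytes.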

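The urlretrieve method downloads the HTML file that a URL points to onto your local disk. If no filename is given, the content is saved to a temporary file.

urlretrieve() returns a two-element tuple: the local file name and the response headers.

Saving to a temporary file: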

>>> filename = urllib.urlretrieve('http://www.alwme.com/')
>>> type(filename)
<type 'tuple'>
>>> print filename
('/tmp/tmpaOdE2g', <httplib.HTTPMessage instance at 0x7f1b021e8680>)

Saving to a local file:

>>> filename = urllib.urlretrieve('http://www.alwme.com/',filename='/home/zhg/temptest/alwme.html')
>>> type(filename)
<type 'tuple'>
>>> print filename
('/home/zhg/temptest/alwme.html', <httplib.HTTPMessage instance at 0x7f1b021e8a28>)

urllib.urlcleanup()   # clears the cache left behind by urllib.urlretrieve()
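The same urlretrieve and urlcleanup workflow can be sketched for Python 3, where both functions live in urllib.request and are documented as a legacy interface. The data: URL and the page.txt file name below are assumptions chosen so the sketch runs offline; a real http:// URL would be passed in the same way.

```python
import os
import tempfile
import urllib.request

SOURCE_URL = "data:text/plain,hello"  # stand-in for a real http:// URL

# No filename given: the content is written to a temporary file.
tmp_path, tmp_headers = urllib.request.urlretrieve(SOURCE_URL)

# Explicit filename: the content is written to the path we choose.
target_dir = tempfile.mkdtemp()
target_path = os.path.join(target_dir, "page.txt")  # hypothetical file name
saved_path, saved_headers = urllib.request.urlretrieve(SOURCE_URL, filename=target_path)

with open(saved_path, "rb") as fh:
    saved_body = fh.read()

tmp_existed = os.path.exists(tmp_path)
urllib.request.urlcleanup()  # deletes temporary files created by urlretrieve
tmp_exists_after = os.path.exists(tmp_path)

print(saved_path == target_path, saved_body)
print(tmp_existed, tmp_exists_after)
```

Only the temporary file is removed by urlcleanup(); the file saved under an explicit filename stays on disk.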
