Learning urllib, Part 2

Date: 2019-12-14

Tags: urllib, learning


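Encoding and decoding:

Python 2 usage:
    urllib.urlencode()     encode
    urlparse.parse_qs()    decode


Python 3 usage:
    urllib.parse.urlencode()    encode
    urllib.parse.parse_qs()     decode

Purpose:
    1) Convert dictionary data into a URL-encoded query string
    2) Uses
        a) Encoding URL parameters
        b) Encoding form data sent in a POST request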

Example (Python)

# Python 2.x

# -*- coding: utf-8 -*-
import urllib
import urlparse

def urlencode():
    params = {'score': 100, 'name': '爬蟲基礎', 'comment': 'very good'}
    qs = urllib.urlencode(params)      # encode the dict as a query string
    print(qs)
    unqs = urlparse.parse_qs(qs)       # decode the query string into a dict of lists
    print(unqs)

if __name__ == '__main__':
    urlencode()


# Python 3.x

import urllib.parse

def urlencode():
    params = {'score': 100, 'name': '爬蟲基礎', 'comment': 'very good'}
    qs = urllib.parse.urlencode(params)    # encode the dict as a query string
    print(qs)
    # parse_qs decodes a query string; urlparse would only split a URL into parts
    unqs = urllib.parse.parse_qs(qs)
    print(unqs)

if __name__ == '__main__':
    urlencode()
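
As a further illustration of the round trip, here is a minimal sketch using only the Python 3 standard library. The helper names `encode_params` and `decode_query` are made up for this example and are not part of urllib. It shows one detail that is easy to miss: parse_qs returns every value as a list of strings, so the integer 100 comes back as ['100'].

```python
from urllib.parse import urlencode, parse_qs

def encode_params(params):
    """Encode a dict as a URL-encoded query string (hypothetical helper)."""
    return urlencode(params)

def decode_query(qs):
    """Decode a query string; every value comes back as a list of strings."""
    return parse_qs(qs)

params = {'score': 100, 'name': '爬蟲基礎', 'comment': 'very good'}
qs = encode_params(params)
decoded = decode_query(qs)

print(qs)        # spaces become '+', non-ASCII text becomes %XX escapes
print(decoded)   # values are lists because a key may repeat in a query string
```

If only a single value per key is expected, taking `decoded['name'][0]` is a common way to unwrap it.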

[Figures: output of the python3.x and python2.x examples]


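Two important concepts in urllib2: Openers and Handlers

    1. Openers:
        When you fetch a URL you use an opener (an instance of urllib2.OpenerDirector).
        Normally we use the default opener, via urlopen.
        But you can also create custom openers.

    2. Handlers:
        Openers use handlers; all the "heavy lifting" is done by the handlers.
        Each handler knows how to open URLs for a particular protocol, or how to handle one aspect of opening a URL, such as HTTP redirects or HTTP cookies.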
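
To make the opener/handler split concrete, here is a minimal sketch in Python 3, where urllib2 lives on as urllib.request. The class `AddHeaderHandler` and the header it sets are hypothetical, invented for this illustration. The handler hooks into one aspect of opening a URL (preparing the outgoing request), and the opener chains it with the default handlers.

```python
import urllib.request

class AddHeaderHandler(urllib.request.BaseHandler):
    """Hypothetical handler that stamps a custom header on every HTTP request."""

    def http_request(self, req):
        # The opener calls this hook before an http:// request is sent.
        req.add_header('X-My-Header', 'my value')
        return req

    # Reuse the same hook for https:// requests.
    https_request = http_request

# build_opener adds our handler to the default chain and returns an OpenerDirector.
opener = urllib.request.build_opener(AddHeaderHandler())

# Call the hook directly so the sketch runs without network access.
req = urllib.request.Request('http://www.douban.com')
AddHeaderHandler().http_request(req)
print(req.header_items())
```

Calling `opener.open(req)` would run the same hook automatically before sending the request.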


Example (Python 3):

    import http.cookiejar
    import urllib.request

    def cookies():
        cookiejar = http.cookiejar.CookieJar()
        handler = urllib.request.HTTPCookieProcessor(cookiejar=cookiejar)
        # HTTPHandler(debuglevel=1) prints debugging information
        opener = urllib.request.build_opener(handler, urllib.request.HTTPHandler(debuglevel=1))
        s = opener.open("http://www.douban.com")
        print(s.read(100))
        s.close()
        print('=' * 80)
        print(cookiejar._cookies)
        print('=' * 80)
        s = opener.open("http://www.douban.com")
        s.close()

    cookies()


urllib2.Request

Custom headers

    # -*- coding: utf-8 -*-
    import urllib2

    def request():
        # Customize the HTTP headers
        headers = {'User-Agent': 'Mozilla/5.0', 'x-my-header': 'my value'}  # custom HTTP headers conventionally start with "x-"
        req = urllib2.Request('http://blog.kamidox.com', headers=headers)  # create a request
        s = urllib2.urlopen(req)   # open the request; urlopen accepts either a URL string or a Request object
        print(s.read(100))
        s.close()

    if __name__ == '__main__':
        request()
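
For readers on Python 3, the same idea can be sketched with urllib.request.Request. This is an illustrative sketch rather than the author's code; the `fetch` helper is a made-up name and is deliberately not called, so the block runs without network access. It also shows that a Request object can be inspected before it is sent.

```python
import urllib.request

headers = {'User-Agent': 'Mozilla/5.0', 'x-my-header': 'my value'}
req = urllib.request.Request('http://blog.kamidox.com', headers=headers)

# Request stores header names capitalized ("User-agent", "X-my-header").
print(req.get_method())      # GET, because no data was supplied
print(req.header_items())

def fetch(request):
    """Send the request; needs network access, so it is not called here."""
    with urllib.request.urlopen(request) as s:
        return s.read(100)
```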

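urllib2.build_opener



    Lets us customize HTTP behavior.

    1) BaseHandler and its subclasses

        BaseHandler is the parent class of all handlers

        a. HTTPHandler (handles HTTP requests)

        b. HTTPSHandler (handles requests over secure connections)

        c. HTTPCookieProcessor (handles cookies)

    2) build_opener

        a. Takes a list of handlers and chains them together. Once chained, they work like a pipeline: each request and its response flows through the handlers, and each handler deals with its own aspect.

        b. Returns an OpenerDirector. Its most important method is open, which opens the remote URL and processes the data.

    3) Handler chain created by default

        The handler chain is really an array of handlers. When build_opener is called, it creates the following chain by default:

        a. ProxyHandler (if a proxy is configured)

        b. UnknownHandler (called when the protocol is unknown)

        c. HTTPHandler (handles HTTP requests)

        d. HTTPDefaultErrorHandler (handles error responses)

        e. HTTPRedirectHandler (handles redirects, for example when HTTP returns a 301 or 302 status code)

        f. FTPHandler (supports the FTP protocol)

        g. FileHandler (supports opening local files)

        h. HTTPErrorProcessor (processes HTTP errors)

        i. HTTPSHandler (if the ssl module is installed)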
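
The default chain can be inspected directly. The following is a minimal sketch for Python 3 (urllib.request); the exact list depends on the Python version and build, so treat the printed names as indicative rather than definitive. Python 3, for instance, also includes a DataHandler that urllib2 did not have.

```python
import urllib.request

# With no arguments, build_opener creates the default handler chain.
opener = urllib.request.build_opener()

# OpenerDirector keeps its handler instances in the `handlers` list.
handler_names = [type(h).__name__ for h in opener.handlers]
for name in handler_names:
    print(name)
```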


Example:

    # -*- coding: utf-8 -*-
    import urllib2
    import urllib

    def request_post_debug():
        # POST
        data = {'username': 'kamidox', 'password': 'xxxxxxxx'}   # request body
        # headers = {'User-Agent': 'Mozilla/5.0', 'Content-Type': 'plain/text'}
        headers = {'User-Agent': 'Mozilla/5.0'}   # custom headers
        # Create a request to be sent to Douban
        req = urllib2.Request('http://www.douban.com', data=urllib.urlencode(data), headers=headers)
        # Create an opener. With no arguments, build_opener uses the default handlers.
        # A handler passed in replaces the default handler of the same kind, or is added if there is none.
        opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
        s = opener.open(req)        # open the request with this opener
        print(s.read(100))          # print the first 100 bytes
        s.close()

    if __name__ == '__main__':
        request_post_debug()
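
Two details of the example above are worth checking in isolation. The sketch below is a Python 3 adaptation (urllib.request and urllib.parse), offered as an illustration rather than the author's code, and it sends nothing over the network. It assumes Python 3 semantics, where the POST body must be bytes, and it verifies that the handler passed to build_opener replaces the default HTTPHandler instead of being added next to it.

```python
import urllib.parse
import urllib.request

data = {'username': 'kamidox', 'password': 'xxxxxxxx'}
headers = {'User-Agent': 'Mozilla/5.0'}

# In Python 3 the request body must be bytes, so encode the query string.
body = urllib.parse.urlencode(data).encode('ascii')
req = urllib.request.Request('http://www.douban.com', data=body, headers=headers)

debug_handler = urllib.request.HTTPHandler(debuglevel=1)
opener = urllib.request.build_opener(debug_handler)

# Collect the plain HTTPHandler instances in the chain.
http_handlers = [h for h in opener.handlers
                 if type(h) is urllib.request.HTTPHandler]

print(req.get_method())     # supplying data turns the request into a POST
print(len(http_handlers))   # how many plain HTTPHandler instances are chained
```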

If I have created an opener, how can later functions reuse it? How do we save this opener?

Saving an opener as the default

    1. urllib2.install_opener (stores the opener we created in the urllib2 library, so that later calls to urllib2.urlopen use the installed opener directly)

    2. Example: install_debug_opener

Example:

    # -*- coding: utf-8 -*-
    import urllib
    import urllib2

    def request():
        # Customize the HTTP headers
        headers = {'User-Agent': 'Mozilla/5.0', 'x-my-header': 'my value'}
        req = urllib2.Request('http://blog.kamidox.com', headers=headers)
        s = urllib2.urlopen(req)
        print(s.read(100))
        print(req.headers)
        s.close()

    def request_post_debug():
        # POST
        data = {'username': 'kamidox', 'password': 'xxxxxxxx'}
        # headers = {'User-Agent': 'Mozilla/5.0', 'Content-Type': 'plain/text'}
        headers = {'User-Agent': 'Mozilla/5.0'}
        req = urllib2.Request('http://www.douban.com', data=urllib.urlencode(data), headers=headers)
        opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
        s = opener.open(req)
        print(s.read(100))
        s.close()

    def install_debug_handler():
        # Debug-enabled handlers for both the HTTP and HTTPS protocols
        opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1),
                                      urllib2.HTTPSHandler(debuglevel=1))

        urllib2.install_opener(opener)   # install as the default opener, so urlopen uses it from now on

    if __name__ == '__main__':
        install_debug_handler()
        request()
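
The effect of install_opener can be observed without any network traffic. The sketch below uses Python 3 (urllib.request) and a made-up `demo://` scheme served by a hypothetical `DemoHandler`; neither exists in the standard library, they are assumptions for illustration only. Once the opener is installed, a plain urlopen call is routed through it.

```python
import io
import urllib.request
import urllib.response

class DemoHandler(urllib.request.BaseHandler):
    """Hypothetical handler for a made-up demo:// scheme (no network needed)."""

    def demo_open(self, req):
        # Return a canned in-memory response instead of contacting a server.
        body = io.BytesIO(b'hello from the installed opener')
        return urllib.response.addinfourl(body, {}, req.full_url, 200)

def install_demo_opener():
    opener = urllib.request.build_opener(DemoHandler())
    urllib.request.install_opener(opener)   # becomes the opener behind urlopen()
    return opener

install_demo_opener()

# urlopen now uses the installed opener, so it understands demo:// URLs.
with urllib.request.urlopen('demo://hello') as s:
    content = s.read()
print(content)
```

Note that install_opener changes process-wide state, so code that needs a special opener only occasionally may prefer to call `opener.open()` directly.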

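Cookies

1) cookielib.CookieJar

  Provides an interface for parsing and storing cookies. Cookies have lifetimes and many other attributes, and this class takes care of handling them.

2) HTTPCookieProcessor

  Handles cookies automatically. Its parent class is also BaseHandler, so we can chain it into an opener alongside the other handlers.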



Example: handle_cookies

    # -*- coding: utf-8 -*-
    import cookielib
    import urllib2

    def handle_cookie():                   # define a function that handles cookies
        cookiejar = cookielib.CookieJar()  # first create a CookieJar object
        handler = urllib2.HTTPCookieProcessor(cookiejar=cookiejar)  # create an HTTPCookieProcessor and pass it the CookieJar

        # Also pass an HTTPHandler that prints debugging information
        opener = urllib2.build_opener(handler, urllib2.HTTPHandler(debuglevel=1))
        s = opener.open('http://www.douban.com')
        print(s.read(100))
        s.close()

    if __name__ == '__main__':
        handle_cookie()

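Output: the response contains a Set-Cookie header, which sets a cookie named bid.


Once the response has been received, our CookieJar contains the cookies returned by the server, and we can print them. The code is as follows: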

    # -*- coding: utf-8 -*-
    import cookielib
    import urllib2

    def handle_cookie():                   # define a function that handles cookies
        cookiejar = cookielib.CookieJar()  # first create a CookieJar object
        handler = urllib2.HTTPCookieProcessor(cookiejar=cookiejar)  # create an HTTPCookieProcessor and pass it the CookieJar
        # Also pass an HTTPHandler that prints debugging information
        opener = urllib2.build_opener(handler, urllib2.HTTPHandler(debuglevel=1))
        s = opener.open('http://www.douban.com')
        print(s.read(100))
        s.close()
        print('=' * 80)
        print(cookiejar._cookies)  # this attribute holds all the cookies set by the server
        print('=' * 80)

    if __name__ == '__main__':
        handle_cookie()


The opener actually carries these cookies with it, so the next time I send a request through it, the cookies are sent along as well.
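
How a stored cookie ends up on the next request can be sketched offline in Python 3, where cookielib is named http.cookiejar. In this illustration the cookie is built by hand instead of being received from a server; the domain `www.example.com` and the value `abc123` are placeholders, not values from the original run. HTTPCookieProcessor calls `add_cookie_header` on every outgoing request, and the sketch calls it directly.

```python
import http.cookiejar
import urllib.request

jar = http.cookiejar.CookieJar()

# Hand-build a cookie as if a server had sent "Set-Cookie: bid=abc123".
cookie = http.cookiejar.Cookie(
    version=0, name='bid', value='abc123',
    port=None, port_specified=False,
    domain='www.example.com', domain_specified=False, domain_initial_dot=False,
    path='/', path_specified=True,
    secure=False, expires=None, discard=True,
    comment=None, comment_url=None, rest={},
)
jar.set_cookie(cookie)

# HTTPCookieProcessor calls add_cookie_header() on each outgoing request;
# here it is called directly so no network access is needed.
req = urllib.request.Request('http://www.example.com/')
jar.add_cookie_header(req)
print(req.get_header('Cookie'))
```

A request to a different host would not receive the header, because the jar only returns cookies whose domain and path match the request.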
