Requests Source Code Reading - Day 2

The get method

Next, look at this file:
tests/test_requests.py

def test_DIGEST_HTTP_200_OK_GET(self, httpbin):

        for authtype in self.digest_auth_algo:
            auth = HTTPDigestAuth('user', 'pass')
            url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
            pytest.set_trace()

            r = requests.get(url, auth=auth)
            assert r.status_code == 200

            r = requests.get(url)
            assert r.status_code == 401
            print(r.headers['WWW-Authenticate'])

            s = requests.session()
            s.auth = HTTPDigestAuth('user', 'pass')
            r = s.get(url)
            assert r.status_code == 200

Here we analyze the requests.get method. In the requests package, open the __init__.py file
and we see:

from .api import request, get, head, post, patch, put, delete, options



OK, without further ado, go straight to api.py:

def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)


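This method sends a GET request to the address given by url.

Its input parameters are:

url: URL stands for Uniform Resource Locator, the unique address of the target resource on the internet.

params: optional; a dictionary (or, per the docstring, a list of tuples or bytes) of query parameters that are ultimately encoded into the URL.

**kwargs: the ** prefix collects any extra keyword arguments into a dictionary inside the function; they are passed on as the optional arguments of request.

kwargs.setdefault('allow_redirects', True) sets a default key/value pair: if the key 'allow_redirects' is not present, it is inserted with the boolean value True.

The function returns the result of calling request, which is a Response object.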
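To make the roles of **kwargs and setdefault concrete, here is a minimal sketch. The functions fake_get and fake_request are hypothetical stand-ins, not part of Requests; they only mirror how get fills in allow_redirects before delegating.

```python
def fake_request(method, url, **kwargs):
    # Hypothetical stand-in for requests.request: report what was received.
    return {"method": method, "url": url, **kwargs}


def fake_get(url, params=None, **kwargs):
    # Extra keyword arguments arrive as a plain dict named kwargs.
    # setdefault inserts the key only when the caller did not supply it.
    kwargs.setdefault("allow_redirects", True)
    return fake_request("get", url, params=params, **kwargs)


default_call = fake_get("https://example.org")
explicit_call = fake_get("https://example.org", allow_redirects=False, timeout=3)

print(default_call["allow_redirects"])   # True
print(explicit_call["allow_redirects"])  # False
```

A value passed explicitly by the caller always wins, because setdefault never overwrites an existing key.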

request

Now look at the request function:

def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)


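The request function accepts many input parameters.

with sessions.Session() as session: the with statement guarantees that the session object is closed properly whether or not the block runs normally, so an exception cannot leave sockets open.

Finally, it returns the result of session.request, a Response object.

So what is with? It is the statement that implements context management.

And what is context management? It guarantees that the object managed by with is cleaned up correctly whether or not the block completes normally.

The general form of the with statement is: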

with expression [as variable]:
    with-block

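The [as variable] part of the with statement is optional. If as variable is given, variable is bound to the object returned by the __enter__() method of the context manager that expression evaluates to.

with-block is the body. When with-block finishes, the with statement automatically calls the context manager's __exit__() method to clean up resources.

This is why the with form is recommended for everyday file reading and writing.

Example: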

file = open("welcome.txt")
data = file.read()
print(data)

file.close()

Using with:

with open("welcome.txt") as file:
    data = file.read()
    # do something

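With the with statement in mind, this part of the code becomes clear:

session = sessions.Session().__enter__()  # i.e. the Session instance itself
session.request(method=method, url=url, **kwargs)  # the body of the with statement

When the session.request call in the body completes, the Session's __exit__(self, *args) method is called, which in turn runs the Session's close(self) method and releases the Session's resources before the with statement exits.

That is what the with statement is for.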
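The call order described above can be sketched without any network access. FakeSession and FakeAdapter below are hypothetical stand-ins, not Requests classes; they only mimic the __enter__ / __exit__ / close shape and record the order in which things happen, including when the body raises.

```python
events = []


class FakeAdapter:
    # Hypothetical adapter that only records when it is closed.
    def __init__(self, name):
        self.name = name

    def close(self):
        events.append("close " + self.name)


class FakeSession:
    def __init__(self):
        events.append("init")
        self.adapters = {"https://": FakeAdapter("https"),
                         "http://": FakeAdapter("http")}

    def __enter__(self):
        events.append("enter")
        return self  # the value bound by "as session"

    def __exit__(self, *args):
        events.append("exit")
        self.close()  # returns None, so exceptions are not suppressed

    def close(self):
        for adapter in self.adapters.values():
            adapter.close()

    def request(self, method, url, fail=False):
        events.append("request")
        if fail:
            raise RuntimeError("simulated network error")
        return method.upper() + " " + url


def fake_request(method, url, **kwargs):
    with FakeSession() as session:
        return session.request(method=method, url=url, **kwargs)


result = fake_request("get", "https://example.org")
normal_events = list(events)

events.clear()
try:
    fake_request("get", "https://example.org", fail=True)
except RuntimeError:
    pass
failed_events = list(events)

print(normal_events == failed_events)  # True: cleanup runs in both cases
```

In both runs the recorded order is init, enter, request, exit, then one close per adapter, which is the guarantee the with statement provides.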

In fact, once the with statement has finished, requests.get has finished too, and one request is complete.

Session

The sessions mentioned above is the imported sessions.py module, which contains:

class Session(SessionRedirectMixin):
    """A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('https://httpbin.org/get')
      <Response [200]>

    Or as a context manager::

      >>> with requests.Session() as s:
      ...     s.get('https://httpbin.org/get')
      <Response [200]>
    """
    ...

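What is Session? The source is worth reading carefully here.

Main features:

It persists cookies, uses urllib3's connection pooling, holds configuration, supplies parameters for request objects, and provides all the request methods.

So every setting we make actually takes effect inside the Session object.

Session also inherits from SessionRedirectMixin, the class that implements the redirect-handling methods.

A redirect works like this: we request a resource at the path given by the URL, the server finds that the resource is not at that path and responds with its new address, and we then send another request to the new URL. This also touches on what Mixin classes are for, which will be explained in a separate chapter.

Next, let's analyze how Session is invoked.

As mentioned earlier, Session is used through a with statement,
so let's look at the part of the source that implements the context manager methods: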

sessions.py

class Session(SessionRedirectMixin):

    ...
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
        ...
    
    ...
    def close(self):
        """Closes all adapters and as such the session"""
        for v in self.adapters.values():
            v.close()

Session.__enter__

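Back in the with statement, session is bound to the return value of __enter__(self) on the context manager sessions.Session().

Evaluating sessions.Session() first creates the instance, which runs the initializer __init__(self). __enter__ is then called and returns that same instance, for example <requests.sessions.Session object at 0x7fb690228080>.

Next, let's analyze the __init__ method.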

Session.__init__

def __init__(self):

        #: A case-insensitive dictionary of headers to be sent on each
        #: :class:`Request <Request>` sent from this
        #: :class:`Session <Session>`.
        self.headers = default_headers()

        #: Default Authentication tuple or object to attach to
        #: :class:`Request <Request>`.
        self.auth = None
        ...


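The initializer mainly sets default values for attributes, including headers, auth, proxies, stream, verify, cookies, hooks, and so on.

First, let's see how headers is set:

If no headers argument is given when making a request, the default headers are used, and they are built by default_headers().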

utils.py

def default_headers():
    """
    :rtype: requests.structures.CaseInsensitiveDict
    """
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })

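You will notice that the default 'User-Agent' header is set to a string of the form "python-requests/<version>",

so if you are writing a crawler to scrape some website, consider changing the user agent early on, because the server may quickly reject requests that identify themselves as coming from Python.

What is CaseInsensitiveDict for? Let's keep digging.

It is actually a class in structures.py. What does it do?

Its main purpose: a dict-like key/value store whose keys are case-insensitive.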

class CaseInsensitiveDict(MutableMapping):
    def __init__(self, data=None, **kwargs):
        self._store = OrderedDict()
        if data is None:
            data = {}
        self.update(data, **kwargs)
       ...

First, it inherits from the MutableMapping class; then its initializer assigns an OrderedDict object to self._store.
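To show how a MutableMapping subclass backed by an OrderedDict can give case-insensitive lookups, here is a simplified sketch. TinyCaseInsensitiveDict is a hypothetical name and this is not the Requests implementation; it assumes keys are strings and indexes them by their lowercased form while remembering the original spelling.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class TinyCaseInsensitiveDict(MutableMapping):
    """Simplified, hypothetical sketch; not the Requests implementation."""

    def __init__(self, data=None, **kwargs):
        self._store = OrderedDict()
        if data is None:
            data = {}
        # update() is inherited from MutableMapping and calls __setitem__.
        self.update(data, **kwargs)

    def __setitem__(self, key, value):
        # Index by the lowercased key, but remember the original spelling.
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (original_key for original_key, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = TinyCaseInsensitiveDict({"Accept": "*/*"})
headers["User-Agent"] = "my-crawler/0.1"

print(headers["user-agent"])  # my-crawler/0.1
print(headers["ACCEPT"])      # */*
print(list(headers))          # ['Accept', 'User-Agent']
```

Only the five abstract methods are written by hand; update, pop, setdefault, and the in operator come from the MutableMapping and Mapping base classes.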

Two questions:

1. What is the purpose of inheriting from MutableMapping?

2. What is an OrderedDict object?

Let's take them one at a time:

Question 1

Look at the import at the top of structures.py:

from .compat import Mapping, MutableMapping

We can see that it actually imports the MutableMapping class from compat.py.

Open compat.py; the first thing it does is check the Python version:

import chardet

import sys

# -------
# Pythons
# -------

# Syntax sugar.
_ver = sys.version_info

#: Python 2.x?
is_py2 = (_ver[0] == 2)

#: Python 3.x?
is_py3 = (_ver[0] == 3)
...

My machine runs Python 3, so go straight to the Python 3 branch:

elif is_py3:
    from urllib.parse import urlparse, urlunparse, urljoin, urlsplit, urlencode, quote, unquote, quote_plus, unquote_plus, urldefrag
    from urllib.request import parse_http_list, getproxies, proxy_bypass, proxy_bypass_environment, getproxies_environment
    from http import cookiejar as cookielib
    from http.cookies import Morsel
    from io import StringIO
    # Keep OrderedDict for backwards compatibility.
    from collections import OrderedDict
    from collections.abc import Callable, Mapping, MutableMapping

    builtin_str = str
    str = str
    bytes = bytes
    basestring = (str, bytes)
    numeric_types = (int, float)
    integer_types = (int,)

So what is actually used is the MutableMapping class from the collections.abc module. Let's keep going.

collections.abc is a standard library module; find it on the Python module path:

collections/abc.py

from _collections_abc import *
from _collections_abc import __all__

It re-exports the names defined in the _collections_abc module, which also ships with Python. Let's keep going:
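The re-export chain can be checked directly. This is a small sketch that assumes CPython, where _collections_abc is an importable implementation module; other interpreters may organize this differently.

```python
import collections.abc
import _collections_abc  # implementation module; treat as a CPython detail

# Both names refer to the very same class object.
same_class = collections.abc.MutableMapping is _collections_abc.MutableMapping
print(same_class)  # True
```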

Open _collections_abc.py and find the MutableMapping class:

class MutableMapping(Mapping):

    __slots__ = ()
    """A MutableMapping is a generic container for associating
    key/value pairs.

    This class provides concrete generic implementations of all
    methods except for __getitem__, __setitem__, __delitem__,
    __iter__, and __len__.

    """

    @abstractmethod
    def __setitem__(self, key, value):
        raise KeyError

    @abstractmethod
    def __delitem__(self, key):
        raise KeyError

    __marker = object()

    def pop(self, key, default=__marker):
        '''D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
          If key is not found, d is returned if given, otherwise KeyError is raised.
        '''
        try:
            value = self[key]
        except KeyError:
            if default is self.__marker:
                raise
            return default
        else:
            del self[key]
            return value

    def popitem(self):
        '''D.popitem() -> (k, v), remove and return some (key, value) pair
           as a 2-tuple; but raise KeyError if D is empty.
        '''
        try:
            key = next(iter(self))
        except StopIteration:
            raise KeyError
        value = self[key]
        del self[key]
        return key, value

    def clear(self):
        'D.clear() -> None.  Remove all items from D.'
        try:
            while True:
                self.popitem()
        except KeyError:
            pass

    def update(*args, **kwds):
        ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
            If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
            If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
            In either case, this is followed by: for k, v in F.items(): D[k] = v
        '''
        if not args:
            raise TypeError("descriptor 'update' of 'MutableMapping' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('update expected at most 1 arguments, got %d' %
                            len(args))
        if args:
            other = args[0]
            if isinstance(other, Mapping):
                for key in other:
                    self[key] = other[key]
            elif hasattr(other, "keys"):
                for key in other.keys():
                    self[key] = other[key]
            else:
                for key, value in other:
                    self[key] = value
        for key, value in kwds.items():
            self[key] = value

    def setdefault(self, key, default=None):
        'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
        try:
            return self[key]
        except KeyError:
            self[key] = default
        return default


MutableMapping.register(dict)

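First, it inherits from the Mapping class. What is Mapping?

Digging further, its base classes are built on the ABCMeta metaclass. What does this metaclass do? I will write a separate chapter on it.

Reading through the code, MutableMapping implements a set of generic dictionary operations (pop, popitem, clear, update, setdefault) on top of a few abstract methods.

The key point is the last line:

MutableMapping.register(dict)

This registers dict as a "virtual subclass" of the abstract base class, so issubclass and isinstance checks succeed even though dict does not inherit from it.

For example: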

from abc import ABC

class MyABC(ABC):
    pass

MyABC.register(tuple)

assert issubclass(tuple, MyABC)
assert isinstance((), MyABC)

Question 2

What is an OrderedDict object?

We can see it is imported from the collections module:

from collections import OrderedDict

collections is also a standard library module; look at its contents on the module path:

Find the OrderedDict class in collections/__init__.py:

class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries
    
    ....

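It is a dict that keeps its entries in order. Why build an ordered dict, and what problem does it solve?

What is wrong with using a plain dict?

Before Python 3.7, the language did not guarantee any ordering for a plain dict, because entries are stored by hash. OrderedDict remembers the order in which keys were inserted and guarantees that iteration follows that insertion order; it does not sort the keys. Since Python 3.7 a plain dict also preserves insertion order.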

The update method mentioned earlier is inherited from the MutableMapping class imported through compat. Digging further,
open _collections_abc.py and find the update method:

from .compat import Mapping, MutableMapping


class CaseInsensitiveDict(MutableMapping):
....
def update(*args, **kwds):
        ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
            If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
            If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
            In either case, this is followed by: for k, v in F.items(): D[k] = v
        '''
        if not args:
            raise TypeError("descriptor 'update' of 'MutableMapping' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('update expected at most 1 arguments, got %d' %
                            len(args))
        if args:
            other = args[0]
            if isinstance(other, Mapping):
                for key in other:
                    self[key] = other[key]
            elif hasattr(other, "keys"):
                for key in other.keys():
                    self[key] = other[key]
            else:
                for key, value in other:
                    self[key] = value

        for key, value in kwds.items():
            self[key] = value

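update assigns key/value pairs into the dictionary: every entry from the mapping, the iterable of pairs, or the keyword arguments is stored with self[key] = value.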
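Because the inherited update stores every entry with self[key] = value, a subclass's own __setitem__ sees each assignment. The sketch below illustrates this with RecordingDict, a hypothetical class written only for this example, and exercises the three input forms that update accepts.

```python
from collections.abc import MutableMapping


class RecordingDict(MutableMapping):
    """Hypothetical mapping that logs every __setitem__ call."""

    def __init__(self):
        self._data = {}
        self.assigned = []

    def __setitem__(self, key, value):
        self.assigned.append(key)
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


d = RecordingDict()
d.update({"Accept": "*/*"})               # a mapping
d.update([("Connection", "keep-alive")])  # an iterable of (key, value) pairs
d.update(Referer="https://example.org")   # keyword arguments

print(d.assigned)  # ['Accept', 'Connection', 'Referer']
```

This is why a class such as CaseInsensitiveDict can call self.update(data, **kwargs) in its initializer and still have every key pass through its own __setitem__.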
