<!-- TOC -->
Requests Source Code Reading - Day 2
<!-- /TOC -->
Next, let's look at this file:

tests/test_requests.py
```python
def test_DIGEST_HTTP_200_OK_GET(self, httpbin):
    for authtype in self.digest_auth_algo:
        auth = HTTPDigestAuth('user', 'pass')
        url = httpbin('digest-auth', 'auth', 'user', 'pass', authtype, 'never')
        pytest.set_trace()  # breakpoint added while reading the source

        # With digest credentials the request succeeds.
        r = requests.get(url, auth=auth)
        assert r.status_code == 200

        # Without credentials the server answers 401 and asks for auth.
        r = requests.get(url)
        assert r.status_code == 401
        print(r.headers['WWW-Authenticate'])

        # Credentials set once on a session apply to its requests.
        s = requests.session()
        s.auth = HTTPDigestAuth('user', 'pass')
        r = s.get(url)
        assert r.status_code == 200
```
Here we analyze the `requests.get` method. In the requests package, open `__init__.py`, where you will see:

```python
from .api import request, get, head, post, patch, put, delete, options
```
OK, without further ado, let's go straight to api.py:

```python
def get(url, params=None, **kwargs):
    r"""Sends a GET request.

    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    :param \*\*kwargs: Optional arguments that ``request`` takes.
    :return: :class:`Response <Response>` object
    :rtype: requests.Response
    """

    kwargs.setdefault('allow_redirects', True)
    return request('get', url, params=params, **kwargs)
```
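This function sends a GET request to the address given by `url`.

Its parameters are:

- `url`: the Uniform Resource Locator, i.e. the unique address of the target resource on the Internet.
- `params`: optional; a dictionary (or a list of tuples, or bytes) of query parameters, which are eventually encoded into the URL's query string.
- `**kwargs`: prefixing a parameter with `**` collects any extra keyword arguments into a dict, which is passed on to `request` as its optional arguments.

`kwargs.setdefault('allow_redirects', True)` sets a default: if the key `'allow_redirects'` is not already present, it is inserted with the boolean value `True`; a value the caller passed explicitly is left untouched.

The function returns whatever `request` returns, i.e. a `Response` object.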
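The forwarding pattern in `get` can be tried without any network access. `sketch_get` and `sketch_request` are hypothetical stand-ins, not part of requests: `sketch_request` just echoes what it receives, so you can see how `**kwargs` is collected and how `setdefault` fills in `allow_redirects` only when the caller did not pass it.

```python
def sketch_request(method, url, **kwargs):
    # Hypothetical stand-in for requests.request: returns its inputs unchanged.
    return method, url, kwargs


def sketch_get(url, params=None, **kwargs):
    # Same shape as requests.get: default allow_redirects to True
    # only when the caller did not supply it.
    kwargs.setdefault('allow_redirects', True)
    return sketch_request('get', url, params=params, **kwargs)


print(sketch_get('https://example.com', timeout=5))
# ('get', 'https://example.com', {'params': None, 'timeout': 5, 'allow_redirects': True})
print(sketch_get('https://example.com', allow_redirects=False)[2]['allow_redirects'])
# False
```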
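Next, look at the `request` function:

```python
def request(method, url, **kwargs):
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)
```

`request` accepts many optional parameters (documented in its docstring, omitted above), all of which are passed through `**kwargs`.

In `with sessions.Session() as session`, the with statement guarantees that the session is cleaned up whether or not the block completes normally, so an exception cannot leave sockets open. The function returns the result of `session.request(...)`.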
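So what is `with`?

`with` implements context management.

And what is context management?

A context manager is an object that defines `__enter__` and `__exit__`. The with statement calls `__enter__` when the block is entered and `__exit__` when it is left, so cleanup runs whether the block finishes normally or raises an exception.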
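The general form of the with statement is:

```
with expression [as variable]:
    with-block
```

`[as variable]` is optional. If the `as variable` clause is given, `variable` is bound to the object returned by the context manager's `expression.__enter__()` method.

`with-block` is the body. When `with-block` finishes, the with statement automatically calls `expression.__exit__()` to clean up resources, even if the block raised an exception.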
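The protocol described above can be seen with a tiny hand-written context manager. `Resource` is a hypothetical class used only for illustration; it logs when `__enter__` and `__exit__` run, and returning `False` from `__exit__` means exceptions are not swallowed.

```python
class Resource:
    """Hypothetical resource that logs the with-statement protocol."""

    def __init__(self, log):
        self.log = log

    def __enter__(self):
        self.log.append('enter')
        return self  # this is what "as variable" is bound to

    def __exit__(self, exc_type, exc_value, traceback):
        self.log.append('exit')
        return False  # a false value lets any exception propagate


log = []
res = Resource(log)
with res as r:
    log.append('body')
print(r is res)  # True
print(log)       # ['enter', 'body', 'exit']

log2 = []
try:
    with Resource(log2):
        log2.append('body')
        raise ValueError('boom')
except ValueError:
    log2.append('caught')
print(log2)      # ['enter', 'body', 'exit', 'caught']
```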
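This is why the with form is recommended for reading and writing files.

Example:

```python
file = open("welcome.txt")
data = file.read()
print(data)
file.close()  # never reached if read() or print() raises
```

Using with:

```python
with open("welcome.txt") as file:
    data = file.read()
    # do something
# the file is closed here, even if the block raised
```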
With this in mind, the code in `request` reads roughly as:

```python
session = sessions.Session().__enter__()  # i.e. the Session instance itself
session.request(method=method, url=url, **kwargs)  # the body of the with statement
```
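When the body's `session.request` call returns, `Session.__exit__(self, *args)` is called, which in turn calls the session's `close(self)` method. That releases the Session's resources, and then the with statement exits.

That is the purpose of the with statement here.

In fact, once the with statement finishes, `requests.get` has also finished, and one request is complete.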
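The call chain above can be mimicked offline. `FakeSession` and `FakeAdapter` are hypothetical classes that only copy the shape of `Session.__enter__`, `__exit__` and `close` from sessions.py (where `close` closes every adapter); they are not the real requests classes. The sketch also shows a subtle point: the `return` inside `with` computes the result first, and only then does `__exit__` run.

```python
class FakeAdapter:
    """Hypothetical stand-in for a transport adapter; it only records close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeSession:
    """Hypothetical sketch with the same __enter__/__exit__/close shape as Session."""

    def __init__(self):
        self.adapters = {'https://': FakeAdapter(), 'http://': FakeAdapter()}

    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()

    def close(self):
        for v in self.adapters.values():
            v.close()

    def request(self, method, url, **kwargs):
        # Report whether any adapter was already closed while the request ran.
        any_closed = any(a.closed for a in self.adapters.values())
        return method, url, any_closed


seen_sessions = []


def fake_request(method, url, **kwargs):
    # Same structure as requests.api.request.
    with FakeSession() as session:
        seen_sessions.append(session)  # kept only so we can inspect it afterwards
        return session.request(method=method, url=url, **kwargs)


result = fake_request('get', 'https://example.com')
print(result)  # ('get', 'https://example.com', False)
print(all(a.closed for a in seen_sessions[0].adapters.values()))  # True
```

Because `__exit__` runs after the return value is computed, the caller still receives the result even though the session has already been closed.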
The `sessions` used above is the module sessions.py, which defines:

```python
class Session(SessionRedirectMixin):
    """A Requests session.

    Provides cookie persistence, connection-pooling, and configuration.

    Basic Usage::

      >>> import requests
      >>> s = requests.Session()
      >>> s.get('https://httpbin.org/get')
      <Response [200]>

    Or as a context manager::

      >>> with requests.Session() as s:
      ...     s.get('https://httpbin.org/get')
      <Response [200]>
    """
    ...
```
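What is a Session? This part of the source is worth studying carefully.

Main features:

- cookies that persist across requests;
- connection pooling through urllib3;
- configuration: default parameters that are supplied to each request;
- methods for every HTTP verb.

So all the settings we pass are actually put into effect inside the Session object.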
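A rough illustration of settings being applied inside the Session: the session holds defaults, and each request merges its own values on top. `MiniSession` and `merge_settings` are hypothetical names for this sketch only; real requests does this merging while preparing a request, with more rules (cookies, auth, hooks and so on) than shown here.

```python
def merge_settings(request_setting, session_setting):
    """Hypothetical helper: per-request values win over session defaults."""
    merged = dict(session_setting)  # copy, so the session's defaults stay intact
    if request_setting is not None:
        merged.update(request_setting)
    return merged


class MiniSession:
    """Hypothetical sketch: a session stores defaults and applies them to every request."""

    def __init__(self):
        self.headers = {'Accept': '*/*'}
        self.params = {}

    def prepare(self, url, headers=None, params=None):
        return {
            'url': url,
            'headers': merge_settings(headers, self.headers),
            'params': merge_settings(params, self.params),
        }


s = MiniSession()
s.headers['X-Token'] = 'abc'  # set once on the session, hypothetical value
req = s.prepare('https://example.com', headers={'Accept': 'application/json'})
print(req['headers'])  # {'Accept': 'application/json', 'X-Token': 'abc'}
print(s.headers)       # {'Accept': '*/*', 'X-Token': 'abc'}
```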
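Session also inherits from `SessionRedirectMixin`, a class that implements the redirect-handling methods.

A redirect works like this: we request a resource at the path given by a URL, but the resource is not there; the server's response (a 3xx status with a `Location` header) gives the new address, and we then send a new request to that URL. This also raises the question of what Mixin classes are for, which I will cover in a separate chapter.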
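The redirect loop described above can be sketched against a fake in-memory server, so no network is needed. `FAKE_SERVER`, `follow_redirects` and the limit of 30 hops are assumptions for illustration only; `SessionRedirectMixin` in requests handles many more details than this.

```python
from urllib.parse import urljoin

# Hypothetical in-memory "server": URL -> (status code, response headers).
FAKE_SERVER = {
    'http://example.com/old': (301, {'Location': '/new'}),
    'http://example.com/new': (302, {'Location': 'http://example.com/final'}),
    'http://example.com/final': (200, {}),
}

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url, max_redirects=30):
    """Follow 3xx responses until a non-redirect status; return the visited URLs."""
    history = [url]
    for _ in range(max_redirects):
        status, headers = FAKE_SERVER[url]
        if status not in REDIRECT_CODES or 'Location' not in headers:
            return history
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers['Location'])
        history.append(url)
    raise RuntimeError('too many redirects')


print(follow_redirects('http://example.com/old'))
# ['http://example.com/old', 'http://example.com/new', 'http://example.com/final']
```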
Next, let's analyze how Session is invoked.

As mentioned earlier, Session is used through a with statement. Here is the relevant part of the source, i.e. the context-manager methods that the `expression` of the with statement relies on:

sessions.py

```python
class Session(SessionRedirectMixin):
    ...
    def __enter__(self):
        return self

    def __exit__(self, *args):
        self.close()
    ...

    def close(self):
        """Closes all adapters and as such the session"""
        for v in self.adapters.values():
            v.close()
```
Session.__enter__
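Back in the with statement, `session` is bound to the return value of `sessions.Session().__enter__()`. Evaluating `sessions.Session()` first runs the initializer `__init__(self)` to create the instance; `__enter__` then simply returns that same instance, which shows up as `<requests.sessions.Session object at 0x7fb690228080>`.

Next, let's analyze `__init__`.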
Session.__init__
```python
def __init__(self):

    #: A case-insensitive dictionary of headers to be sent on each
    #: :class:`Request <Request>` sent from this
    #: :class:`Session <Session>`.
    self.headers = default_headers()

    #: Default Authentication tuple or object to attach to
    #: :class:`Request <Request>`.
    self.auth = None
    ...
```
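`__init__` mainly sets default values for the session's parameters, including headers, auth, proxies, stream, verify, cookies, hooks and so on.

First, let's see how `headers` is set up.

If a request is made without a `headers` argument, the session's default headers are used. They come from `default_headers()`: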
utils.py
```python
def default_headers():
    """
    :rtype: requests.structures.CaseInsensitiveDict
    """
    return CaseInsensitiveDict({
        'User-Agent': default_user_agent(),
        'Accept-Encoding': ', '.join(('gzip', 'deflate')),
        'Accept': '*/*',
        'Connection': 'keep-alive',
    })
```
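Note that the default `'User-Agent'` header is produced by `default_user_agent()` and looks like `python-requests/<version>`.

If you are writing a scraper, consider changing the User-Agent early, because the target server may quickly reject requests that identify themselves as coming from Python.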
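What is `CaseInsensitiveDict` for? Let's dig in.

It is a class defined in structures.py. What does it do?

Its main purpose: a dict whose keys are case-insensitive, so `'accept'` and `'Accept'` refer to the same entry. This fits HTTP, where header names are case-insensitive.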
```python
class CaseInsensitiveDict(MutableMapping):

    def __init__(self, data=None, **kwargs):
        self._store = OrderedDict()
        if data is None:
            data = {}
        self.update(data, **kwargs)
    ...
```
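It inherits from `MutableMapping`, and `__init__` creates an `OrderedDict` as the internal store `self._store`, then fills it through `self.update(data, **kwargs)`.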
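Here is a simplified sketch of the idea, modeled on the approach requests takes: each entry is stored under the lowercased key, with the original spelling kept next to the value. `MiniCaseInsensitiveDict` is a hypothetical name, and this is not the full requests class.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class MiniCaseInsensitiveDict(MutableMapping):
    """Simplified, hypothetical sketch of a case-insensitive mapping."""

    def __init__(self, data=None, **kwargs):
        # Maps lowercased key -> (original key, value).
        self._store = OrderedDict()
        if data is None:
            data = {}
        self.update(data, **kwargs)  # inherited from MutableMapping

    def __setitem__(self, key, value):
        # Index by the lowercased key, but remember the caller's spelling.
        self._store[key.lower()] = (key, value)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        # Yield keys with the spelling used when they were last set.
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = MiniCaseInsensitiveDict({'Accept': '*/*'})
headers['user-agent'] = 'my-crawler/0.1'  # hypothetical User-Agent value
print(headers['ACCEPT'])        # */*
print('User-Agent' in headers)  # True
print(list(headers))            # ['Accept', 'user-agent']
```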
Two questions:

1. What does inheriting from `MutableMapping` achieve?
2. What is `OrderedDict`?

Let's take them one at a time.
Look at the imports at the top of structures.py:

```python
from .compat import Mapping, MutableMapping
```

So `MutableMapping` is actually imported from compat.py.
Opening compat.py, we see that it first checks the Python version:
```python
import chardet

import sys

# -------
# Pythons
# -------

# Syntax sugar.
_ver = sys.version_info

#: Python 2.x?
is_py2 = (_ver[0] == 2)

#: Python 3.x?
is_py3 = (_ver[0] == 3)
...
```
My machine runs Python 3, so let's look directly at the Python 3 branch:
```python
elif is_py3:
    from urllib.parse import urlparse, urlunparse, urljoin, urlsplit, urlencode, quote, unquote, quote_plus, unquote_plus, urldefrag
    from urllib.request import parse_http_list, getproxies, proxy_bypass, proxy_bypass_environment, getproxies_environment
    from http import cookiejar as cookielib
    from http.cookies import Morsel
    from io import StringIO
    # Keep OrderedDict for backwards compatibility.
    from collections import OrderedDict
    from collections.abc import Callable, Mapping, MutableMapping

    builtin_str = str
    str = str
    bytes = bytes
    basestring = (str, bytes)
    numeric_types = (int, float)
    integer_types = (int,)
```
So `MutableMapping` actually comes from the `collections.abc` module. Let's keep going.

`collections.abc` is part of the standard library; locate it on the Python module path:

collections/abc.py
```python
from _collections_abc import *
from _collections_abc import __all__
```
It simply re-exports everything from `_collections_abc`, which is also a standard-library module (`_collections_abc.py`). Open that module and find `MutableMapping`:
```python
class MutableMapping(Mapping):

    __slots__ = ()

    """A MutableMapping is a generic container for associating
    key/value pairs.

    This class provides concrete generic implementations of all
    methods except for __getitem__, __setitem__, __delitem__,
    __iter__, and __len__.

    """

    @abstractmethod
    def __setitem__(self, key, value):
        raise KeyError

    @abstractmethod
    def __delitem__(self, key):
        raise KeyError

    __marker = object()

    def pop(self, key, default=__marker):
        '''D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
          If key is not found, d is returned if given, otherwise KeyError is raised.
        '''
        try:
            value = self[key]
        except KeyError:
            if default is self.__marker:
                raise
            return default
        else:
            del self[key]
            return value

    def popitem(self):
        '''D.popitem() -> (k, v), remove and return some (key, value) pair
           as a 2-tuple; but raise KeyError if D is empty.
        '''
        try:
            key = next(iter(self))
        except StopIteration:
            raise KeyError
        value = self[key]
        del self[key]
        return key, value

    def clear(self):
        'D.clear() -> None.  Remove all items from D.'
        try:
            while True:
                self.popitem()
        except KeyError:
            pass

    def update(*args, **kwds):
        ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
            If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
            If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
            In either case, this is followed by: for k, v in F.items(): D[k] = v
        '''
        if not args:
            raise TypeError("descriptor 'update' of 'MutableMapping' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('update expected at most 1 arguments, got %d' %
                            len(args))
        if args:
            other = args[0]
            if isinstance(other, Mapping):
                for key in other:
                    self[key] = other[key]
            elif hasattr(other, "keys"):
                for key in other.keys():
                    self[key] = other[key]
            else:
                for key, value in other:
                    self[key] = value
        for key, value in kwds.items():
            self[key] = value

    def setdefault(self, key, default=None):
        'D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D'
        try:
            return self[key]
        except KeyError:
            self[key] = default
        return default

MutableMapping.register(dict)
```
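`MutableMapping` inherits from `Mapping`. What is `Mapping`? Digging further, its metaclass is `ABCMeta`. What does the `ABCMeta` metaclass do? I will cover that in a separate chapter.

Reading through the code, `MutableMapping` supplies generic dict operations (`pop`, `popitem`, `clear`, `update`, `setdefault`) built only on the abstract methods `__getitem__`, `__setitem__`, `__delitem__`, `__iter__` and `__len__`. This answers the first question: by inheriting from it, `CaseInsensitiveDict` only has to implement those five methods to get a dict-like interface.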
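To see what inheriting from `MutableMapping` buys, compare a class that implements only one abstract method with one that implements all five. Both classes are hypothetical; the point is that `pop` and `setdefault` (from `MutableMapping`) and `get` (from `Mapping`) come for free, while an incomplete subclass cannot be instantiated.

```python
from collections.abc import MutableMapping


class Incomplete(MutableMapping):
    """Hypothetical class implementing only __getitem__, so it stays abstract."""

    def __getitem__(self, key):
        raise KeyError(key)


class Store(MutableMapping):
    """Hypothetical minimal mapping: only the five abstract methods are written."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


try:
    Incomplete()
except TypeError:
    print('Incomplete is still abstract')

s = Store()
print(s.setdefault('a', 1))         # 1  (setdefault comes from MutableMapping)
print(s.pop('a'))                   # 1  (so does pop)
print(s.pop('missing', 'default'))  # default
print(s.get('x', 0), len(s))        # 0 0  (get comes from Mapping)
```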
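The key line is the last one:

```python
MutableMapping.register(dict)
```

It registers `dict` as a "virtual subclass" of the abstract base class `MutableMapping`: `issubclass(dict, MutableMapping)` and `isinstance({}, MutableMapping)` return True, even though `dict` does not actually inherit from it.

For example: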
```python
from abc import ABC

class MyABC(ABC):
    pass

MyABC.register(tuple)

assert issubclass(tuple, MyABC)
assert isinstance((), MyABC)
```
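The same mechanism applies to `dict`. This sketch checks that `dict` passes the `MutableMapping` checks without inheriting from it, and uses a hypothetical `Greeter` ABC to show that `register` only affects `isinstance`/`issubclass`: the registered class does not gain the ABC's methods.

```python
from abc import ABC
from collections.abc import MutableMapping

# dict passes the checks because it was registered...
print(isinstance({}, MutableMapping))    # True
print(issubclass(dict, MutableMapping))  # True
# ...but MutableMapping is not among dict's real base classes.
print(MutableMapping in dict.__mro__)    # False


class Greeter(ABC):
    """Hypothetical ABC with one concrete helper method."""

    def greet(self):
        return 'hi'


Greeter.register(tuple)
print(isinstance((), Greeter))  # True
print(hasattr((), 'greet'))     # False: registering does not add any methods
```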
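What is `OrderedDict`?

compat.py imports it from the `collections` module:

```python
from collections import OrderedDict
```

`collections` is also part of the standard library. Look at its source on the module path and find the `OrderedDict` class in collections/__init__.py: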
```python
class OrderedDict(dict):
    'Dictionary that remembers insertion order'
    # An inherited dict maps keys to values.
    # The inherited dict provides __getitem__, __len__, __contains__, and get.
    # The remaining methods are order-aware.
    # Big-O running times for all methods are the same as regular dictionaries.
    ...
```
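It is a dict that remembers insertion order. Why build an ordered dict, and what problem does it solve? What was wrong with a plain dict?

Historically, Python dicts made no ordering guarantee, because entries are stored by hash. `OrderedDict` guarantees that iteration follows insertion order (it keeps order; it does not sort). Since Python 3.7, regular dicts also preserve insertion order as a language guarantee, which is why compat.py keeps `OrderedDict` only "for backwards compatibility".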
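A quick comparison, assuming Python 3.7+, where both types keep insertion order. The differences that remain: equality between two `OrderedDict`s is order-sensitive while plain `dict` equality is not, and `OrderedDict` has order-specific methods such as `move_to_end`.

```python
from collections import OrderedDict

d1 = {'a': 1, 'b': 2}
d2 = {'b': 2, 'a': 1}
print(list(d1), d1 == d2)  # ['a', 'b'] True  (plain dict ignores order in ==)

o1 = OrderedDict(d1)
o2 = OrderedDict(d2)
print(o1 == o2)            # False  (two OrderedDicts compare order too)

o1.move_to_end('a')        # reorder explicitly
print(list(o1))            # ['b', 'a']
```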
The `update` method called in `CaseInsensitiveDict.__init__` is inherited from `MutableMapping` (imported via compat). Digging into _collections_abc.py again, we find `update`:
```python
# requests/structures.py
from .compat import Mapping, MutableMapping


class CaseInsensitiveDict(MutableMapping):
    ...  # defines no update() of its own, so MutableMapping.update is used


# _collections_abc.py (standard library)
class MutableMapping(Mapping):
    ...

    def update(*args, **kwds):
        ''' D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
            If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
            If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
            In either case, this is followed by: for k, v in F.items(): D[k] = v
        '''
        if not args:
            raise TypeError("descriptor 'update' of 'MutableMapping' object "
                            "needs an argument")
        self, *args = args
        if len(args) > 1:
            raise TypeError('update expected at most 1 arguments, got %d' %
                            len(args))
        if args:
            other = args[0]
            if isinstance(other, Mapping):
                for key in other:
                    self[key] = other[key]
            elif hasattr(other, "keys"):
                for key in other.keys():
                    self[key] = other[key]
            else:
                for key, value in other:
                    self[key] = value
        for key, value in kwds.items():
            self[key] = value
```
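`update` assigns key/value pairs into the mapping: from a `Mapping` (iterating its keys), from any other object that has a `keys()` method, or from an iterable of `(key, value)` pairs, and finally from the keyword arguments. Every assignment goes through `self[key] = value`, so in `CaseInsensitiveDict` each one passes through its own case-insensitive `__setitem__`.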
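To see the branches of `update` in action, here is a small sketch using a hypothetical `LoggingDict` (a `MutableMapping` that records every `__setitem__` call) and a hypothetical `KeysOnly` class that has `keys()` but is not a `Mapping`. It relies only on the standard-library `update` shown above.

```python
from collections.abc import MutableMapping


class LoggingDict(MutableMapping):
    """Hypothetical mapping that records every key passed to __setitem__."""

    def __init__(self):
        self._data = {}
        self.calls = []

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self.calls.append(key)
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class KeysOnly:
    """Hypothetical non-Mapping object with keys() and __getitem__."""

    def keys(self):
        return ['k']

    def __getitem__(self, key):
        return 'v'


d = LoggingDict()
d.update({'a': 1})         # Mapping branch
d.update([('b', 2)])       # iterable-of-pairs branch
d.update(KeysOnly(), c=3)  # keys() branch, then keyword arguments
print(d.calls)             # ['a', 'b', 'k', 'c']
print(dict(d))             # {'a': 1, 'b': 2, 'k': 'v', 'c': 3}
```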