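In day-to-day work, making HTTP requests is about as common as it gets. There are plenty of libraries for it, and Requests is one of the nicer ones.
Besides the usual GET, POST, DELETE and PUT helpers, its timeout parameter is very handy: it keeps a request from blocking for too long. For example: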
    >>> requests.get('http://google.com', timeout=1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 75, in get
        return request('get', url, params=params, **kwargs)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/api.py", line 60, in request
        return session.request(method=method, url=url, **kwargs)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
        resp = self.send(prep, **send_kwargs)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
        r = adapter.send(request, **kwargs)
      File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/adapters.py", line 504, in send
        raise ConnectTimeout(e, request=request)
    requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='google.com', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x10b467790>, 'Connection to google.com timed out. (connect timeout=1)'))
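Since this parameter is so useful, I naturally wanted to know how it is implemented, hence this exploration!
Let's take GET, the call we use most, as the example. First we need to sort out the call chain (it is fairly long, so I drew a simple diagram to help):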
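From the call diagram above, two key points stand out:
Combined with the source below, we can see that the get, post, delete, put and similar helpers we normally call have no real content of their own. They are thin wrappers that end up in session.send, and the real work is ultimately done by HTTPAdapter.send: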
    # requests/api.py
    def get(url, params=None, **kwargs):
        kwargs.setdefault('allow_redirects', True)
        return request('get', url, params=params, **kwargs)

    def request(method, url, **kwargs):
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)

    # requests/sessions.py
    class Session(SessionRedirectMixin):
        def send(self, request, **kwargs):
            ...
            # Get the appropriate adapter to use
            adapter = self.get_adapter(url=request.url)

            # Send the request
            r = adapter.send(request, **kwargs)
            ...
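The delegation described above can be sketched with a toy model. This is only an illustration: ToyAdapter, ToySession and the `calls` list are hypothetical stand-ins written for this example and are not part of Requests. The point is simply that the timeout keyword travels untouched from get down to the adapter.

```python
# Toy model of the delegation chain. Every class and function here is a
# simplified, hypothetical stand-in, not the real Requests code.
calls = []  # records the order in which each layer is entered


class ToyAdapter:
    def send(self, request, **kwargs):
        calls.append("adapter.send")
        # The timeout has travelled unchanged all the way down to here.
        return {"url": request["url"], "timeout": kwargs.get("timeout")}


class ToySession:
    def __init__(self):
        self.adapter = ToyAdapter()

    def request(self, method, url, timeout=None, **kwargs):
        calls.append("session.request")
        prepared = {"method": method.upper(), "url": url}
        return self.send(prepared, timeout=timeout)

    def send(self, request, **kwargs):
        calls.append("session.send")
        return self.adapter.send(request, **kwargs)


def request(method, url, **kwargs):
    calls.append("api.request")
    return ToySession().request(method=method, url=url, **kwargs)


def get(url, **kwargs):
    calls.append("api.get")
    return request("get", url, **kwargs)


result = get("http://example.com", timeout=1)
print(" -> ".join(calls))
print(result)
```

Each outer layer only forwards its arguments, which is why the interesting behaviour has to be looked for further down.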
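HTTPAdapter maintains a PoolManager and ProxyManager objects. ProxyManager inherits from PoolManager, so it has all of PoolManager's features and additionally handles the proxy case. Here we only take a quick look at PoolManager.
As the name suggests, PoolManager manages pools, which come in two main kinds: HTTPConnectionPool and HTTPSConnectionPool.
When the user calls a function to issue a request, PoolManager breaks the request parameters apart and builds a pool_key from them. The pool_key is mainly made up of the following fields: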
    # urllib3/poolmanager.py
    key_fields = (
        "key_scheme",  # str
        "key_host",  # str
        "key_port",  # int
        "key_timeout",  # int or float or Timeout
        "key_retries",  # int or Retry
        "key_strict",  # bool
        "key_block",  # bool
        "key_source_address",  # str
        "key_key_file",  # str
        "key_key_password",  # str
        "key_cert_file",  # str
        "key_cert_reqs",  # str
        "key_ca_certs",  # str
        "key_ssl_version",  # str
        "key_ca_cert_dir",  # str
        "key_ssl_context",  # instance of ssl.SSLContext or urllib3.util.ssl_.SSLContext
        "key_maxsize",  # int
        "key_headers",  # dict
        "key__proxy",  # parsed proxy url
        "key__proxy_headers",  # dict
        "key_socket_options",  # list of (level (int), optname (int), value (int or str)) tuples
        "key__socks_options",  # dict
        "key_assert_hostname",  # bool or string
        "key_assert_fingerprint",  # str
        "key_server_hostname",  # str
    )
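We can think of it this way: even across different requests, if all of the fields above happen to match, the cached Pool is hit and the cost of building a new one is saved.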
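The caching idea can be sketched as a dictionary keyed by a named tuple. This is a minimal sketch under stated assumptions: ToyPoolKey, ToyPool and ToyPoolManager are hypothetical names invented for this example, and only three of the real key fields are modelled. The real PoolManager is more elaborate.

```python
from collections import namedtuple

# Hypothetical, reduced pool key: only three of the real fields are modelled.
ToyPoolKey = namedtuple("ToyPoolKey", ["scheme", "host", "port"])


class ToyPool:
    """Stand-in for a connection pool; it only remembers its key."""

    def __init__(self, key):
        self.key = key


class ToyPoolManager:
    def __init__(self):
        self.pools = {}  # pool key -> pool

    def connection_from_url_parts(self, scheme, host, port):
        # Normalise the parts so equivalent requests produce equal keys.
        key = ToyPoolKey(scheme.lower(), host.lower(), port)
        pool = self.pools.get(key)
        if pool is None:
            # Cache miss: build the pool once and remember it.
            pool = ToyPool(key)
            self.pools[key] = pool
        return pool


manager = ToyPoolManager()
a = manager.connection_from_url_parts("http", "example.com", 80)
b = manager.connection_from_url_parts("HTTP", "Example.com", 80)
c = manager.connection_from_url_parts("https", "example.com", 443)
print(a is b, a is c, len(manager.pools))
```

Because a named tuple is hashable and compares field by field, two requests with identical fields land on the same pool, while any differing field yields a separate one.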
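Each Pool manages its own HTTPConnection objects. An HTTPConnection here is not the raw http/tcp connection itself; it acts more like a connection manager that takes care of actually sending the request, handling the data, closing the connection and so on.
Let's look at the HTTPConnectionPool source first (only the key functions _get_conn, _new_conn and _make_request are shown):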
    # urllib3/connectionpool.py
    class HTTPConnectionPool(ConnectionPool, RequestMethods):

        ConnectionCls = HTTPConnection

        def _get_conn(self, timeout=None):
            conn = None
            try:
                conn = self.pool.get(block=self.block, timeout=timeout)

            except AttributeError:  # self.pool is None
                raise ClosedPoolError(self, "Pool is closed.")

            except queue.Empty:
                if self.block:
                    raise EmptyPoolError(
                        self,
                        "Pool reached maximum size and no more connections are allowed.",
                    )
                pass  # Oh well, we'll create a new connection then

            # If this is a persistent connection, check if it got disconnected
            if conn and is_connection_dropped(conn):
                log.debug("Resetting dropped connection: %s", self.host)
                conn.close()
                if getattr(conn, "auto_open", 1) == 0:
                    conn = None

            return conn or self._new_conn()

        def _new_conn(self):
            """
            Return a fresh :class:`HTTPConnection`.
            """
            self.num_connections += 1
            log.debug(
                "Starting new HTTP connection (%d): %s:%s",
                self.num_connections,
                self.host,
                self.port or "80",
            )

            conn = self.ConnectionCls(
                host=self.host,
                port=self.port,
                timeout=self.timeout.connect_timeout,
                strict=self.strict,
                **self.conn_kw
            )
            return conn

        def _make_request(
            self, conn, method, url, timeout=_Default, chunked=False, **httplib_request_kw
        ):
            self.num_requests += 1

            timeout_obj = self._get_timeout(timeout)
            timeout_obj.start_connect()
            conn.timeout = timeout_obj.connect_timeout
            ...
            if chunked:
                conn.request_chunked(method, url, **httplib_request_kw)
            else:
                conn.request(method, url, **httplib_request_kw)

            # Reset the timeout for the recv() on the socket
            read_timeout = timeout_obj.read_timeout

            # App Engine doesn't have a sock attr
            if getattr(conn, "sock", None):
                if read_timeout == 0:
                    raise ReadTimeoutError(
                        self, url, "Read timed out. (read timeout=%s)" % read_timeout
                    )
                if read_timeout is Timeout.DEFAULT_TIMEOUT:
                    conn.sock.settimeout(socket.getdefaulttimeout())
                else:  # None or a value
                    conn.sock.settimeout(read_timeout)

            # Receive the response from the server
            try:
                try:
                    # Python 2.7, use buffering of HTTP responses
                    httplib_response = conn.getresponse(buffering=True)
                except TypeError:
                    # Python 3
                    try:
                        httplib_response = conn.getresponse()
                    except BaseException as e:
                        # Remove the TypeError from the exception chain in
                        # Python 3 (including for exceptions like SystemExit).
                        # Otherwise it looks like a bug in the code.
                        six.raise_from(e, None)
            except (SocketTimeout, BaseSSLError, SocketError) as e:
                self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
                raise
            ...
            return httplib_response

        ... (rest omitted)
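As the earlier call diagram and the source above show, the Pool calls urlopen, which uses _get_conn to obtain an HTTPConnection. That function first tries the Pool's own queue and returns a connection straight away if one is available. If not, it creates a new one through _new_conn (which is put into the queue once processing finishes).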
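The get-or-create behaviour just described can be sketched with the standard-library queue module. This is a simplified illustration, not urllib3's code: ToyConnection and ToyConnectionPool are hypothetical names, and the sketch leaves out blocking mode and the dropped-connection check.

```python
import queue


class ToyConnection:
    """Stand-in for HTTPConnection; it only carries an id."""

    def __init__(self, ident):
        self.ident = ident


class ToyConnectionPool:
    def __init__(self, maxsize=1):
        # Pre-fill with None placeholders: a None slot means
        # "no connection has been built for this slot yet".
        self.pool = queue.LifoQueue(maxsize)
        for _ in range(maxsize):
            self.pool.put(None)
        self.num_connections = 0

    def _new_conn(self):
        self.num_connections += 1
        return ToyConnection(self.num_connections)

    def _get_conn(self):
        try:
            conn = self.pool.get(block=False)
        except queue.Empty:
            conn = None  # queue exhausted: fall through and build a new one
        return conn or self._new_conn()

    def _put_conn(self, conn):
        try:
            self.pool.put(conn, block=False)
        except queue.Full:
            pass  # pool is full: the extra connection is simply dropped


pool = ToyConnectionPool(maxsize=1)
first = pool._get_conn()   # the queue holds None, so a connection is built
pool._put_conn(first)      # returned to the queue after use
second = pool._get_conn()  # the same object comes back out
print(first is second, pool.num_connections)
```

Reusing the queued object is what saves the cost of setting up a fresh connection for every request.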
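Once an HTTPConnection has been obtained, _make_request puts it to work. It mainly does three things:
And here, at last, is the timeout we have been looking for! That took some digging!
Let's look directly at conn.sock.settimeout. Before that, a small detour: although an HTTPConnection class can be found in connection.py, it is not the whole story, because it inherits from another class of the same name: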
from .packages.six.moves.http_client import HTTPConnection as _HTTPConnection
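A quick search told me that this six module exists for Python 2 and 3 compatibility. But the current directory only contains a single six.py, so how can the corresponding file be found more easily?
Then it occurred to me to ask the Python console, which revealed the actual source location: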
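The original console session is not reproduced here. One way to do the same lookup in a current Python 3 interpreter, offered as a sketch rather than the exact commands used at the time, relies on the standard inspect module. It assumes Python 3, where six.moves.http_client resolves to the standard-library http.client; on Python 2 the same alias resolves to httplib.

```python
import http.client
import inspect

# On Python 3, six.moves.http_client is an alias for http.client, so the
# base class can be inspected without installing six at all.
source_path = inspect.getsourcefile(http.client.HTTPConnection)
print(source_path)
```

The printed path points at the standard library's own file, which is where the base HTTPConnection class actually lives.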
So that's it! Now we can go straight to httplib.py:
    # httplib.py
    class HTTPConnection:

        def __init__(self, host, port=None, strict=None,
                     timeout=socket._GLOBAL_DEFAULT_TIMEOUT, source_address=None):
            self.timeout = timeout
            self.source_address = source_address
            self.sock = None
            ...
            (self.host, self.port) = self._get_hostport(host, port)

            self._create_connection = socket.create_connection

        def connect(self):
            """Connect to the host and port specified in __init__."""
            self.sock = self._create_connection((self.host,self.port),
                                                self.timeout, self.source_address)

        def send(self, data):
            """Send `data' to the server."""
            if self.sock is None:
                if self.auto_open:
                    self.connect()
                else:
                    raise NotConnected()

            if self.debuglevel > 0:
                print "send:", repr(data)
            blocksize = 8192
            if hasattr(data,'read') and not isinstance(data, array):
                if self.debuglevel > 0: print "sendIng a read()able"
                datablock = data.read(blocksize)
                while datablock:
                    self.sock.sendall(datablock)
                    datablock = data.read(blocksize)
            else:
                self.sock.sendall(data)

        ... (rest omitted)
One function in the code above deserves special attention: socket.create_connection. What is it? When working with sockets we usually do this directly:
    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    ...
This works, but it is not flexible when both IPv4 and IPv6 are in play, so the socket module offers a more convenient alternative:
    socket.create_connection(address[, timeout[, source_address]])

    Connect to a TCP service listening on the Internet address (a 2-tuple (host, port)), and return the socket object. This is a higher-level function than socket.connect(): if host is a non-numeric hostname, it will try to resolve it for both AF_INET and AF_INET6, and then try to connect to all possible addresses in turn until a connection succeeds. This makes it easy to write clients that are compatible to both IPv4 and IPv6.

    Passing the optional timeout parameter will set the timeout on the socket instance before attempting to connect. If no timeout is supplied, the global default timeout setting returned by getdefaulttimeout() is used.
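A small local experiment can confirm the documented behaviour. This sketch assumes a throwaway listener on the loopback interface (the address and the 1.5 second value are purely illustrative) so that it runs without network access.

```python
import socket

# A throwaway listener on the loopback interface; port 0 lets the OS pick
# a free port. The address is purely illustrative.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()

# create_connection connects and applies the timeout to the new socket.
client = socket.create_connection((host, port), timeout=1.5)
applied_timeout = client.gettimeout()
print(applied_timeout)

client.close()
server.close()
```

The returned object is an ordinary socket, and gettimeout reports the value that was passed in, so the timeout really is a property of the socket itself.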
With that, the picture is mostly clear: conn.sock is the socket object created by socket.create_connection, so settimeout is simply a method of that socket.
So how does this setting take effect? It comes into play during connect and recv:
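The recv side is easy to observe locally. In the sketch below, a loopback server accepts the connection but never sends anything, so the client's recv gives up once the timeout set with settimeout expires. The 0.2 second value and the loopback address are assumptions chosen for the demo. The connect side works the same way, but it needs an unresponsive remote host to trigger, so it is not shown.

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.create_connection(server.getsockname(), timeout=1.0)
peer, _ = server.accept()   # the server accepts but never sends anything

client.settimeout(0.2)      # the same kind of call _make_request makes on conn.sock
timed_out = False
try:
    client.recv(1024)       # waits for data for at most 0.2 seconds
except socket.timeout:      # an alias of TimeoutError since Python 3.10
    timed_out = True
print("recv timed out:", timed_out)

peer.close()
client.close()
server.close()
```

This low-level socket exception is what the layers above catch and re-raise as their own timeout errors.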
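I thought this would be a simple thing that could be sorted out quickly, but the deeper I went, the bigger the hole became. The layers of calls are so involved that I had to draw a call diagram just to keep track (and even that is not complete).
Still, consider this a teaser: when I have the energy to analyse how the whole library works, I can add more. Requests really is excellent; its interface and architecture feel clear, and caching has been thought through as well, so it is well worth spending time on.
I also found, somewhat unexpectedly, that drawing diagrams like this makes things clearer and easier to explain, and it exercises my ability to organise and abstract my own thinking. Give it a try.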
Comments and discussion are welcome. QQ discussion group: 258498217
Please credit the source when reposting: http://www.javashuo.com/article/p-zsagomam-do.html