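Official documentation: https://2.python-requests.org//en/master/
At work I needed to upload an attachment to an API endpoint. The endpoint is specified as follows:
Submit the attachment via HTTP POST in multipart/form-data format. URL: http://test.com/flow/upload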
    Fields:
    md5:      // MD5 hash of "<random value>_<current timestamp>"
    filesize: // file size
    file:     // file content (must include the file name)
    Return value:
    {"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}
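Since I mostly work in Python and the project already uses the requests library, I planned to implement this with requests. I expected it to be a trivial task, but it ended up taking a lot of time: the example in the official requests documentation only covers uploading a file and does not pass any extra parameters:
https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file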
    >>> url = 'https://httpbin.org/post'
    >>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

    >>> r = requests.post(url, files=files)
    >>> r.text
    {
      ...
      "files": {
        "file": "<censored...binary...data>"
      },
      ...
    }
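When extra parameters have to be sent along with the file, you need two arguments of requests.post: data and files. Pass the file to upload in files and the other parameters in data; the requests library merges the two into a single multipart body and sends it to the server.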
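The merge can be observed without a server by preparing a request locally and inspecting it. The following is only a minimal sketch: the md5 value and the file content are placeholder values chosen for illustration, and nothing is sent over the network.

```python
import requests

# Placeholder values for illustration only; they are not from a real upload.
url = 'http://test.com/flow/upload'
data = {'md5': 'd41d8cd98f00b204e9800998ecf8427e', 'filesize': 11}
files = {'file': ('tmp.xml', b'<xml></xml>')}

# Build and prepare the request locally; no network call is made.
prepared = requests.Request('POST', url, data=data, files=files).prepare()

content_type = prepared.headers['Content-Type']
body = prepared.body

# The header announces a multipart body, and the body carries both the
# plain form fields from data and the file part from files.
print(content_type.split(';')[0])
print(b'name="md5"' in body, b'name="filesize"' in body, b'filename="tmp.xml"' in body)
```

Inspecting a prepared request like this is a convenient way to check what would be sent before pointing the code at a real endpoint.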
The final implementation looks like this:
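    import hashlib
    import os
    import time

    import requests

    # md5 of "<random value>_<current timestamp>"; the random part is fixed at 0 here
    token = '%d_%d' % (0, int(time.time()))
    request_data = {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': os.path.getsize(file_name),  # size in bytes
    }
    MyLogger().getlogger().info('url:%s' % request_url)
    # Open in binary mode and close the handle once the request has been sent
    with open(file_name, 'rb') as f:
        files = {'file': (file_name, f)}
        resp = requests.post(request_url, data=request_data, files=files)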
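The final code looks simple, but it took me considerable effort to confirm that it actually works, including reading the requests source code. Below is a record of that walk through the source:
First, find the implementation of post, in requests/api.py: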
    def post(url, data=None, json=None, **kwargs):
        r"""Sends a POST request.

        :param url: URL for the new :class:`Request` object.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param \*\*kwargs: Optional arguments that ``request`` takes.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response
        """

        return request('post', url, data=data, json=json, **kwargs)
It calls request, so we follow that function next, also in requests/api.py:
    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary, list of tuples or bytes to send
            in the query string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
                the server's TLS certificate, or a string, in which case it must be a path
                to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'https://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)
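The files entry in the docstring above lists the accepted tuple forms. As a minimal sketch of the 4-tuple form (the file name, content, content type and the Expires header are illustrative values, and the request is only prepared locally, never sent):

```python
import requests

# 4-tuple: (filename, file content or file object, content type, custom headers).
# All values here are illustrative.
files = {
    'file': ('tmp.xml', b'<xml></xml>', 'application/xml', {'Expires': '0'}),
}

prepared = requests.Request(
    'POST', 'http://test.com/flow/upload', files=files
).prepare()

# The content type and the custom header end up in the headers of the
# file's own part inside the multipart body, not in the request headers.
has_part_type = b'Content-Type: application/xml' in prepared.body
has_custom_header = b'Expires: 0' in prepared.body
print(has_part_type, has_custom_header)
```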
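This function has a long docstring. It already shows that the files parameter is used to send files, but it still does not say what to do when parameters and a file must be sent together. Next we follow session.request, in requests/sessions.py: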
    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp
Skimming this method: it first prepares the request, and the last step calls send, which presumably sends it. So the place to look is prepare_request, also in requests/sessions.py:
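The same two steps, prepare and then send, can also be driven by hand. The sketch below stops after the prepare step so that no request is sent; the X-Demo header and the form value are hypothetical and only there to show that session settings are merged into the prepared request.

```python
import requests

session = requests.Session()
session.headers['X-Demo'] = 'from-session'  # hypothetical session-level header

req = requests.Request(
    'POST', 'http://test.com/flow/upload', data={'filesize': 11}
)

# Step 1: prepare. Session headers and cookies are merged with the request's
# own settings. No network traffic happens here.
prepared = session.prepare_request(req)

print(prepared.method, prepared.url)
print(prepared.headers['X-Demo'])
print(prepared.body)

# Step 2 (not executed in this sketch): session.send(prepared) would perform
# the actual HTTP call.
```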
    def prepare_request(self, request):
        """Constructs a :class:`PreparedRequest <PreparedRequest>` for
        transmission and returns it. The :class:`PreparedRequest` has settings
        merged from the :class:`Request <Request>` instance and those of the
        :class:`Session`.

        :param request: :class:`Request` instance to prepare with this
            session's settings.
        :rtype: requests.PreparedRequest
        """
        cookies = request.cookies or {}

        # Bootstrap CookieJar.
        if not isinstance(cookies, cookielib.CookieJar):
            cookies = cookiejar_from_dict(cookies)

        # Merge with session cookies
        merged_cookies = merge_cookies(
            merge_cookies(RequestsCookieJar(), self.cookies), cookies)

        # Set environment's basic authentication if not explicitly set.
        auth = request.auth
        if self.trust_env and not auth and not self.auth:
            auth = get_netrc_auth(request.url)

        p = PreparedRequest()
        p.prepare(
            method=request.method.upper(),
            url=request.url,
            files=request.files,
            data=request.data,
            json=request.json,
            headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
            params=merge_setting(request.params, self.params),
            auth=merge_setting(auth, self.auth),
            cookies=merged_cookies,
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p
prepare_request creates a PreparedRequest object and calls its prepare method. Following prepare, in requests/models.py:
    def prepare(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request with the given parameters."""

        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_cookies(cookies)
        self.prepare_body(data, files, json)
        self.prepare_auth(auth, url)

        # Note that prepare_auth must be last to enable authentication schemes
        # such as OAuth to work on a fully prepared request.

        # This MUST go after prepare_auth. Authenticators could add a hook
        self.prepare_hooks(hooks)
This calls a number of prepare_xx methods. We only care about the one that handles data, files and json, so we follow prepare_body, in requests/models.py:
    def prepare_body(self, data, files, json=None):
        """Prepares the given HTTP body data."""

        # Check if file, fo, generator, iterator.
        # If not, run through normal process.

        # Nottin' on you.
        body = None
        content_type = None

        if not data and json is not None:
            # urllib3 requires a bytes-like body. Python 2's json.dumps
            # provides this natively, but Python 3 gives a Unicode string.
            content_type = 'application/json'
            body = complexjson.dumps(json)
            if not isinstance(body, bytes):
                body = body.encode('utf-8')

        is_stream = all([
            hasattr(data, '__iter__'),
            not isinstance(data, (basestring, list, tuple, Mapping))
        ])

        try:
            length = super_len(data)
        except (TypeError, AttributeError, UnsupportedOperation):
            length = None

        if is_stream:
            body = data

            if getattr(body, 'tell', None) is not None:
                # Record the current file position before reading.
                # This will allow us to rewind a file in the event
                # of a redirect.
                try:
                    self._body_position = body.tell()
                except (IOError, OSError):
                    # This differentiates from None, allowing us to catch
                    # a failed `tell()` later when trying to rewind the body
                    self._body_position = object()

            if files:
                raise NotImplementedError('Streamed bodies and files are mutually exclusive.')

            if length:
                self.headers['Content-Length'] = builtin_str(length)
            else:
                self.headers['Transfer-Encoding'] = 'chunked'
        else:
            # Multi-part file uploads.
            if files:
                (body, content_type) = self._encode_files(files, data)
            else:
                if data:
                    body = self._encode_params(data)
                    if isinstance(data, basestring) or hasattr(data, 'read'):
                        content_type = None
                    else:
                        content_type = 'application/x-www-form-urlencoded'

            self.prepare_content_length(body)

            # Add content-type if it wasn't explicitly provided.
            if content_type and ('content-type' not in self.headers):
                self.headers['Content-Type'] = content_type

        self.body = body
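The non-streaming branches of prepare_body can be told apart by the Content-Type header they produce. A small sketch, using a hypothetical helper content_type_for and placeholder values, with every request prepared locally and never sent:

```python
import requests

URL = 'http://test.com/flow/upload'


def content_type_for(**kwargs):
    """Hypothetical helper: prepare a POST locally and return its media type."""
    prepared = requests.Request('POST', URL, **kwargs).prepare()
    return prepared.headers.get('Content-Type', '').split(';')[0]


# json only: the "if not data and json is not None" branch
json_only = content_type_for(json={'a': 1})

# data only: the _encode_params branch
form_only = content_type_for(data={'a': 1})

# data and files together: the _encode_files branch
with_files = content_type_for(
    data={'a': 1}, files={'file': ('tmp.xml', b'<xml/>')}
)

print(json_only)
print(form_only)
print(with_files)
```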
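prepare_body is fairly long. The part to focus on is the "if files:" branch under the "Multi-part file uploads" comment, which calls _encode_files. We follow that method, also in requests/models.py: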
    def _encode_files(files, data):
        """Build the body for a multipart/form-data request.

        Will successfully encode files when passed as a dict or a list of
        tuples. Order is retained if data is a list of tuples but arbitrary
        if parameters are supplied as a dict.
        The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
        or 4-tuples (filename, fileobj, contentype, custom_headers).
        """
        if (not files):
            raise ValueError("Files must be provided.")
        elif isinstance(data, basestring):
            raise ValueError("Data must not be a string.")

        new_fields = []
        fields = to_key_val_list(data or {})
        files = to_key_val_list(files or {})

        for field, val in fields:
            if isinstance(val, basestring) or not hasattr(val, '__iter__'):
                val = [val]
            for v in val:
                if v is not None:
                    # Don't call str() on bytestrings: in Py3 it all goes wrong.
                    if not isinstance(v, bytes):
                        v = str(v)

                    new_fields.append(
                        (field.decode('utf-8') if isinstance(field, bytes) else field,
                         v.encode('utf-8') if isinstance(v, str) else v))

        for (k, v) in files:
            # support for explicit filename
            ft = None
            fh = None
            if isinstance(v, (tuple, list)):
                if len(v) == 2:
                    fn, fp = v
                elif len(v) == 3:
                    fn, fp, ft = v
                else:
                    fn, fp, ft, fh = v
            else:
                fn = guess_filename(v) or k
                fp = v

            if isinstance(fp, (str, bytes, bytearray)):
                fdata = fp
            elif hasattr(fp, 'read'):
                fdata = fp.read()
            elif fp is None:
                continue
            else:
                fdata = fp

            rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
            rf.make_multipart(content_type=ft)
            new_fields.append(rf)

        body, content_type = encode_multipart_formdata(new_fields)

        return body, content_type
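That is the end of the trail. After reading this code carefully, the roles of the data and files arguments of requests.post become clear: requests merges the two here and uses the result as the body of the POST.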
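The merge step can be reproduced in miniature with the two urllib3 helpers that _encode_files relies on. This is a simplified sketch and not the requests implementation itself: the field values are placeholders, and the data fields are written as ready-made byte strings instead of being converted the way _encode_files does.

```python
from urllib3.fields import RequestField
from urllib3.filepost import encode_multipart_formdata

# Plain form fields (the role of data), as (name, value) tuples.
# Placeholder values for illustration.
fields = [('md5', b'placeholder-md5'), ('filesize', b'11')]

# The file part (the role of files), built as a RequestField with a filename.
part = RequestField(name='file', data=b'<xml></xml>', filename='tmp.xml')
part.make_multipart(content_type=None)
fields.append(part)

# One call encodes both kinds of field into a single multipart body.
body, content_type = encode_multipart_formdata(fields)

print(content_type.split(';')[0])
# Data fields are appended first, so they come before the file part.
print(body.index(b'name="md5"') < body.index(b'name="file"'))
```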