Uploading files with requests

Official documentation: https://2.python-requests.org//en/master/

At work I needed to implement a feature that uploads an attachment to an API endpoint. The endpoint is specified as follows:

The attachment is submitted with an HTTP POST in multipart/form-data format. URL: http://test.com/flow/upload

    Fields:
    md5:       // md5 hash of (random value_current timestamp)
    filesize:  // file size
    file:      // file content (must include the file name)
    Response:
    {"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}
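As a rough sketch of the two non-file fields and of reading the response, the snippet below uses only the standard library. The helper name build_fields and the exact token format (two integers joined by an underscore) are assumptions made for illustration; the spec only says "random value_current timestamp".

```python
import hashlib
import json
import random
import time


def build_fields(payload: bytes) -> dict:
    """Build the md5 and filesize form fields (token format is an assumption)."""
    # Assumed reading of "random value_current timestamp": "<int>_<unix seconds>".
    token = "%d_%d" % (random.randint(0, 999999), int(time.time()))
    return {
        "md5": hashlib.md5(token.encode("utf-8")).hexdigest(),
        "filesize": len(payload),  # size in bytes
    }


fields = build_fields(b"<xml/>")

# The sample response from the spec, parsed with the standard json module.
sample = r'{"success":true,"uploadName":"tmp.xml","uploadPath":"uploads\/201311\/758e875fb7c7a508feef6b5036119b9f"}'
result = json.loads(sample)
print(fields["filesize"], result["success"], result["uploadPath"])
```

Note that the escaped slashes in the JSON response decode to plain slashes, so the path can be used directly after parsing.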

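Since I mainly use Python at work and the project already uses the requests library elsewhere, I planned to implement this with requests. I expected it to be a trivial task, but it ended up taking a lot of time. The official requests example only covers uploading a file and does not pass any extra parameters: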

https://2.python-requests.org//en/master/user/quickstart/#post-a-multipart-encoded-file

    >>> url = 'https://httpbin.org/post'
    >>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

    >>> r = requests.post(url, files=files)
    >>> r.text
    {
      ...
      "files": {
        "file": "<censored...binary...data>"
      },
      ...
    }

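When extra parameters have to be sent as well, two arguments of requests come into play: data and files. Pass the file to upload in files and the other parameters in data; requests merges the two into a single multipart body and sends it to the server.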
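One way to check this without touching a server is to prepare a request and inspect its body. The sketch below assumes requests is installed; the field values and file content are made up for illustration, and nothing is sent over the network.

```python
import requests

# Hypothetical values for illustration; the request is only prepared, never sent.
req = requests.Request(
    "POST",
    "http://test.com/flow/upload",
    data={"md5": "0123abcd", "filesize": 5},
    files={"file": ("tmp.xml", b"<a/>\n")},
)
prepared = req.prepare()

content_type = prepared.headers["Content-Type"]
body = prepared.body  # bytes: one multipart part per data field and per file

print(content_type)
print(body.decode("utf-8"))
```

The printed body should show the md5 and filesize fields and the file as separate parts sharing one boundary.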

The final implementation looks like this:

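    import hashlib
    import os
    import time

    import requests

    # file_name, request_url and MyLogger come from the surrounding project.
    token = '%d_%d' % (0, int(time.time()))  # 0 stands in for the random value
    request_data = {
        'md5': hashlib.md5(token.encode('utf-8')).hexdigest(),
        'filesize': os.path.getsize(file_name),  # size in bytes
    }
    MyLogger().getlogger().info('url:%s' % request_url)
    # Open in binary mode; the with block closes the handle after the upload.
    with open(file_name, 'rb') as f:
        files = {'file': (file_name, f)}
        resp = requests.post(request_url, data=request_data, files=files)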

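Although the final code looks simple, it took me considerable effort to confirm that it was correct, and along the way I read through the requests source. Below is a record of that walk through the source: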

First, find the implementation of post, in requests/api.py:

    def post(url, data=None, json=None, **kwargs):
        r"""Sends a POST request.

        :param url: URL for the new :class:`Request` object.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json data to send in the body of the :class:`Request`.
        :param \*\*kwargs: Optional arguments that ``request`` takes.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response
        """

        return request('post', url, data=data, json=json, **kwargs)

Here we can see that it calls request. Following that function, also in requests/api.py:

    def request(method, url, **kwargs):
        """Constructs and sends a :class:`Request <Request>`.

        :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary, list of tuples or bytes to send
            in the query string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) A JSON serializable Python object to send in the body of the :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the :class:`Request`.
        :param files: (optional) Dictionary of ``'name': file-like-objects`` (or ``{'name': file-tuple}``) for multipart encoding upload.
            ``file-tuple`` can be a 2-tuple ``('filename', fileobj)``, 3-tuple ``('filename', fileobj, 'content_type')``
            or a 4-tuple ``('filename', fileobj, 'content_type', custom_headers)``, where ``'content-type'`` is a string
            defining the content type of the given file and ``custom_headers`` a dict-like object containing additional headers
            to add for the file.
        :param auth: (optional) Auth tuple to enable Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How many seconds to wait for the server to send data
            before giving up, as a float, or a :ref:`(connect timeout, read
            timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Boolean. Enable/disable GET/OPTIONS/POST/PUT/PATCH/DELETE/HEAD redirection. Defaults to ``True``.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
                the server's TLS certificate, or a string, in which case it must be a path
                to a CA bundle to use. Defaults to ``True``.
        :param stream: (optional) if ``False``, the response content will be immediately downloaded.
        :param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair.
        :return: :class:`Response <Response>` object
        :rtype: requests.Response

        Usage::

          >>> import requests
          >>> req = requests.request('GET', 'https://httpbin.org/get')
          <Response [200]>
        """

        # By using the 'with' statement we are sure the session is closed, thus we
        # avoid leaving sockets open which can trigger a ResourceWarning in some
        # cases, and look like a memory leak in others.
        with sessions.Session() as session:
            return session.request(method=method, url=url, **kwargs)
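The file-tuple forms described in the files parameter can be compared side by side. This is a small sketch assuming requests is installed; the helper multipart_body, the file name, and the content are hypothetical, and the requests are prepared but never sent.

```python
import requests


def multipart_body(file_value):
    """Prepare (without sending) a POST with one file and return the encoded body."""
    req = requests.Request(
        "POST", "http://test.com/flow/upload", files={"file": file_value}
    )
    return req.prepare().body


# 2-tuple: (filename, content)
two = multipart_body(("report.xls", b"demo"))
# 3-tuple: adds a content type for the part
three = multipart_body(("report.xls", b"demo", "application/vnd.ms-excel"))
# 4-tuple: adds extra per-part headers
four = multipart_body(
    ("report.xls", b"demo", "application/vnd.ms-excel", {"Expires": "0"})
)

print(four.decode("utf-8"))
```

Printing each body makes it easy to see which part headers each tuple form adds.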

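The docstring of request is long, and it already shows that the files parameter is used for sending files, but it still does not say how to send parameters and a file together. Continue into session.request, in requests/sessions.py: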

    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary, list of tuples, bytes, or file-like
            object to send in the body of the :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a :ref:`(connect timeout,
            read timeout) <timeouts>` tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol or protocol and
            hostname to the URL of the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) Either a boolean, in which case it controls whether we verify
            the server's TLS certificate, or a string, in which case it must be a path
            to a CA bundle to use. Defaults to ``True``.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        :rtype: requests.Response
        """
        # Create the Request.
        req = Request(
            method=method.upper(),
            url=url,
            headers=headers,
            files=files,
            data=data or {},
            json=json,
            params=params or {},
            auth=auth,
            cookies=cookies,
            hooks=hooks,
        )
        prep = self.prepare_request(req)

        proxies = proxies or {}

        settings = self.merge_environment_settings(
            prep.url, proxies, stream, verify, cert
        )

        # Send the request.
        send_kwargs = {
            'timeout': timeout,
            'allow_redirects': allow_redirects,
        }
        send_kwargs.update(settings)
        resp = self.send(prep, **send_kwargs)

        return resp
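The two stages in Session.request, preparing and then sending, can also be driven by hand, which is handy for inspecting what would go over the wire. A minimal sketch assuming requests is installed; the X-Demo header and the field values are hypothetical, and the send step is left commented out so nothing touches the network.

```python
import requests

session = requests.Session()
session.headers.update({"X-Demo": "1"})  # hypothetical session-level header

req = requests.Request(
    "POST",
    "http://test.com/flow/upload",
    data={"filesize": 4},
    files={"file": ("tmp.xml", b"<a/>")},
)

# Stage 1: merge session settings into the request and encode the body.
prepared = session.prepare_request(req)
print(prepared.method, prepared.headers["X-Demo"], prepared.headers["Content-Type"])

# Stage 2 would actually send it:
# resp = session.send(prepared, timeout=10)
session.close()
```

The session header shows up on the prepared request because prepare_request merges the session's headers with those of the request.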

At a glance, Session.request first prepares the request and, as its last step, calls send, which presumably sends it. So the next place to look is prepare_request, also in requests/sessions.py:

    def prepare_request(self, request):
        """Constructs a :class:`PreparedRequest <PreparedRequest>` for
        transmission and returns it. The :class:`PreparedRequest` has settings
        merged from the :class:`Request <Request>` instance and those of the
        :class:`Session`.

        :param request: :class:`Request` instance to prepare with this
            session's settings.
        :rtype: requests.PreparedRequest
        """
        cookies = request.cookies or {}

        # Bootstrap CookieJar.
        if not isinstance(cookies, cookielib.CookieJar):
            cookies = cookiejar_from_dict(cookies)

        # Merge with session cookies
        merged_cookies = merge_cookies(
            merge_cookies(RequestsCookieJar(), self.cookies), cookies)

        # Set environment's basic authentication if not explicitly set.
        auth = request.auth
        if self.trust_env and not auth and not self.auth:
            auth = get_netrc_auth(request.url)

        p = PreparedRequest()
        p.prepare(
            method=request.method.upper(),
            url=request.url,
            files=request.files,
            data=request.data,
            json=request.json,
            headers=merge_setting(request.headers, self.headers, dict_class=CaseInsensitiveDict),
            params=merge_setting(request.params, self.params),
            auth=merge_setting(auth, self.auth),
            cookies=merged_cookies,
            hooks=merge_hooks(request.hooks, self.hooks),
        )
        return p

prepare_request creates a PreparedRequest object and calls its prepare method. Following prepare, in requests/models.py:

    def prepare(self,
            method=None, url=None, headers=None, files=None, data=None,
            params=None, auth=None, cookies=None, hooks=None, json=None):
        """Prepares the entire request with the given parameters."""

        self.prepare_method(method)
        self.prepare_url(url, params)
        self.prepare_headers(headers)
        self.prepare_cookies(cookies)
        self.prepare_body(data, files, json)
        self.prepare_auth(auth, url)

        # Note that prepare_auth must be last to enable authentication schemes
        # such as OAuth to work on a fully prepared request.

        # This MUST go after prepare_auth. Authenticators could add a hook
        self.prepare_hooks(hooks)

prepare calls a series of prepare_xx methods. We only care about the one that handles data, files, and json, so follow prepare_body, in requests/models.py:

 1 def prepare_body(self, data, files, json=None):
 2         """Prepares the given HTTP body data."""
 3 
 4         # Check if file, fo, generator, iterator.
 5         # If not, run through normal process.
 6 
 7         # Nottin' on you.
 8         body = None
 9         content_type = None
10 
11         if not data and json is not None:
12             # urllib3 requires a bytes-like body. Python 2's json.dumps
13             # provides this natively, but Python 3 gives a Unicode string.
14             content_type = 'application/json'
15             body = complexjson.dumps(json)
16             if not isinstance(body, bytes):
17                 body = body.encode('utf-8')
18 
19         is_stream = all([
20             hasattr(data, '__iter__'),
21             not isinstance(data, (basestring, list, tuple, Mapping))
22         ])
23 
24         try:
25             length = super_len(data)
26         except (TypeError, AttributeError, UnsupportedOperation):
27             length = None
28 
29         if is_stream:
30             body = data
31 
32             if getattr(body, 'tell', None) is not None:
33                 # Record the current file position before reading.
34                 # This will allow us to rewind a file in the event
35                 # of a redirect.
36                 try:
37                     self._body_position = body.tell()
38                 except (IOError, OSError):
39                     # This differentiates from None, allowing us to catch
40                     # a failed `tell()` later when trying to rewind the body
41                     self._body_position = object()
42 
43             if files:
44                 raise NotImplementedError('Streamed bodies and files are mutually exclusive.')
45 
46             if length:
47                 self.headers['Content-Length'] = builtin_str(length)
48             else:
49                 self.headers['Transfer-Encoding'] = 'chunked'
50         else:
51             # Multi-part file uploads.
52             if files:
53                 (body, content_type) = self._encode_files(files, data)
54             else:
55                 if data:
56                     body = self._encode_params(data)
57                     if isinstance(data, basestring) or hasattr(data, 'read'):
58                         content_type = None
59                     else:
60                         content_type = 'application/x-www-form-urlencoded'
61 
62             self.prepare_content_length(body)
63 
64             # Add content-type if it wasn't explicitly provided.
65             if content_type and ('content-type' not in self.headers):
66                 self.headers['Content-Type'] = content_type
67 
68         self.body = body

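prepare_body is fairly long. The part to focus on is line 52 of the listing, the "if files:" branch, where _encode_files is called on the next line. Following that method: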

    def _encode_files(files, data):
        """Build the body for a multipart/form-data request.

        Will successfully encode files when passed as a dict or a list of
        tuples. Order is retained if data is a list of tuples but arbitrary
        if parameters are supplied as a dict.
        The tuples may be 2-tuples (filename, fileobj), 3-tuples (filename, fileobj, contentype)
        or 4-tuples (filename, fileobj, contentype, custom_headers).
        """
        if (not files):
            raise ValueError("Files must be provided.")
        elif isinstance(data, basestring):
            raise ValueError("Data must not be a string.")

        new_fields = []
        fields = to_key_val_list(data or {})
        files = to_key_val_list(files or {})

        for field, val in fields:
            if isinstance(val, basestring) or not hasattr(val, '__iter__'):
                val = [val]
            for v in val:
                if v is not None:
                    # Don't call str() on bytestrings: in Py3 it all goes wrong.
                    if not isinstance(v, bytes):
                        v = str(v)

                    new_fields.append(
                        (field.decode('utf-8') if isinstance(field, bytes) else field,
                         v.encode('utf-8') if isinstance(v, str) else v))

        for (k, v) in files:
            # support for explicit filename
            ft = None
            fh = None
            if isinstance(v, (tuple, list)):
                if len(v) == 2:
                    fn, fp = v
                elif len(v) == 3:
                    fn, fp, ft = v
                else:
                    fn, fp, ft, fh = v
            else:
                fn = guess_filename(v) or k
                fp = v

            if isinstance(fp, (str, bytes, bytearray)):
                fdata = fp
            elif hasattr(fp, 'read'):
                fdata = fp.read()
            elif fp is None:
                continue
            else:
                fdata = fp

            rf = RequestField(name=k, data=fdata, filename=fn, headers=fh)
            rf.make_multipart(content_type=ft)
            new_fields.append(rf)

        body, content_type = encode_multipart_formdata(new_fields)

        return body, content_type

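OK, that is the end of the trail. After reading _encode_files carefully, the roles of the data and files arguments to requests.post become clear: requests merges the two here into a single multipart body for the POST.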
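To make the merge concrete, here is a deliberately simplified, standard-library-only sketch of the same idea: plain fields first, then files, all joined under one boundary. It is not the code requests runs (requests delegates to encode_multipart_formdata, as the listing shows); the function name encode_multipart is hypothetical, and details such as per-part content types and escaping of quotes in names are left out.

```python
import uuid


def encode_multipart(fields, files):
    """Merge plain fields and files into one multipart/form-data body (simplified)."""
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Plain form fields become parts without a filename.
    for name, value in fields.items():
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    # Files become parts that also carry a filename.
    for name, (filename, content) in files.items():
        header = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            name, filename)
        lines += [delimiter, header.encode("utf-8"), b"", content]
    lines += [delimiter + b"--", b""]  # closing delimiter
    return b"\r\n".join(lines), "multipart/form-data; boundary=" + boundary


body, content_type = encode_multipart(
    {"md5": "0123abcd", "filesize": 4},
    {"file": ("tmp.xml", b"<a/>")},
)
print(content_type)
print(body.decode("utf-8"))
```

The output has three parts, two for the data fields and one for the file, which mirrors how new_fields is filled from fields and then from files before being encoded.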
