Understanding the HTTP Protocol

What is the HTTP protocol

  • HTTP stands for HyperText Transfer Protocol.
  • The Hypertext Transfer Protocol (HTTP) is a stateless, application-level protocol for distributed, collaborative, hypertext information systems (source: Wikipedia).

Chrome Developer Tools

Press Ctrl+Shift+I to open the developer tools; the Network panel lists each HTTP request and its response.

Accessing a website with curl

curl -v http://baidu.com > tmp.txt

* Rebuilt URL to: http://baidu.com/
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 123.125.114.144...
* TCP_NODELAY set
* Connected to baidu.com (123.125.114.144) port 80 (#0)
> GET / HTTP/1.1
> Host: baidu.com
> User-Agent: curl/7.55.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Sat, 20 Apr 2019 08:15:07 GMT
< Server: Apache
< Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
< ETag: "51-47cf7e6ee8400"
< Accept-Ranges: bytes
< Content-Length: 81
< Cache-Control: max-age=86400
< Expires: Sun, 21 Apr 2019 08:15:07 GMT
< Connection: Keep-Alive
< Content-Type: text/html
<
{ [81 bytes data]
100    81  100    81    0     0     81      0  0:00:01 --:--:--  0:00:01   470
* Connection #0 to host baidu.com left intact
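
In the verbose transcript above, curl prefixes its informational lines with *, the request headers it sends with >, and the response headers it receives with <. A minimal sketch of sorting a transcript by those markers follows; the function name split_curl_verbose and the shortened sample text are illustrative assumptions, not part of curl.

```python
def split_curl_verbose(transcript):
    """Group the lines of a `curl -v` transcript by their prefix marker."""
    groups = {"info": [], "request": [], "response": []}
    markers = {"*": "info", ">": "request", "<": "response"}
    for line in transcript.splitlines():
        kind = markers.get(line[:1])
        if kind is None:
            continue  # progress-meter and body-size lines carry no marker
        text = line[1:].strip()
        if text:  # a bare ">" or "<" is the blank line that ends the headers
            groups[kind].append(text)
    return groups


# Shortened copy of the transcript shown above.
sample = """\
* Connected to baidu.com (123.125.114.144) port 80 (#0)
> GET / HTTP/1.1
> Host: baidu.com
>
< HTTP/1.1 200 OK
< Content-Length: 81
<
{ [81 bytes data]
"""

parts = split_curl_verbose(sample)
print(parts["request"])
print(parts["response"])
```

The two lists correspond to the Request and Response sections discussed next.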

Request

> GET / HTTP/1.1
# Start line: method, path, protocol version
> Host: baidu.com
> User-Agent: curl/7.55.1
> Accept: */*
# Headers: key: value
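
The request layout above (a start line, then one "key: value" header per line, then an empty line) can be sketched in a few lines of Python. The helper name build_request is an assumption made for illustration, and the sketch only covers requests without a body.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Assemble the text of an HTTP/1.x request that has no body."""
    lines = ["{} {} {}".format(method, path, version)]
    lines.extend("{}: {}".format(key, value) for key, value in headers.items())
    # Every line ends with CRLF; an empty line closes the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


message = build_request(
    "GET",
    "/",
    {"Host": "baidu.com", "User-Agent": "curl/7.55.1", "Accept": "*/*"},
)
print(message)
```

Printing the message reproduces the lines that curl marked with > in the transcript.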

Response

< HTTP/1.1 200 OK
# Status line: protocol version, status code, reason phrase
< Date: Sat, 20 Apr 2019 08:15:07 GMT
< Server: Apache
< Last-Modified: Tue, 12 Jan 2010 13:48:00 GMT
< ETag: "51-47cf7e6ee8400"
< Accept-Ranges: bytes
< Content-Length: 81
< Cache-Control: max-age=86400
< Expires: Sun, 21 Apr 2019 08:15:07 GMT
< Connection: Keep-Alive
< Content-Type: text/html
# Headers: key: value
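
Going the other way, the response head can be split back into its parts. The sketch below is a simplified illustration, not a complete HTTP parser: parse_response_head is a hypothetical helper, and it assumes one header per line with no repeated header names.

```python
def parse_response_head(raw):
    """Split a response head into version, status code, reason and headers."""
    lines = raw.strip().splitlines()
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase may be absent
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers


head = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Sat, 20 Apr 2019 08:15:07 GMT\r\n"
    "Content-Length: 81\r\n"
    "Content-Type: text/html\r\n"
)
version, status, reason, headers = parse_response_head(head)
print(version, status, reason)
print(headers["content-type"])
```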

Message Body

<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>
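
The body above contains no visible content, only a meta refresh tag that sends the browser on to http://www.baidu.com/. As a rough sketch, the target can be pulled out with the standard library's html.parser; the class name MetaRefreshParser is an assumption for illustration, and the parsing of the content attribute only handles the simple "seconds;url=..." form seen here.

```python
from html.parser import HTMLParser


class MetaRefreshParser(HTMLParser):
    """Record the target URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0;url=http://example.com/".
        _, _, rest = (attrs.get("content") or "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.target = value.strip()


body = """<html>
<meta http-equiv="refresh" content="0;url=http://www.baidu.com/">
</html>
"""
parser = MetaRefreshParser()
parser.feed(body)
print(parser.target)
```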

A few small programs

  • urllib
  • requests
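  1. In Python 2, urllib and urllib2 are separate, independent modules. In Python 3, urllib2 no longer exists; use urllib.request (and urllib.error) instead.
  2. The requests library is built on urllib3, which pools connections so that repeated requests to the same host can reuse one socket.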
  • urllib
import urllib.request as urllib2  # Python 3: urllib.request replaces urllib2

def use_simple_urllib2():
    url = 'http://httpbin.org/ip'
    response = urllib2.urlopen(url)
    print('>>>Response Headers')
    print(response.info())
    print('>>>Response Body')
    # read() returns bytes, so decode it to a string before printing
    print(response.read().decode())

use_simple_urllib2()
>>>Response Headers
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Sat, 20 Apr 2019 08:38:52 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 51
Connection: Close

>>>Response Body
{
  "origin": "122.205.61.100, 122.205.61.100"
}
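
The example above needs network access to httpbin.org. For experimenting offline, the same urllib calls can be pointed at a throwaway local server. This is only a sketch under stated assumptions: EchoIPHandler and fetch_from_local_server are hypothetical names, and the JSON body is made up to resemble the httpbin response.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoIPHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a small JSON body."""

    def do_GET(self):
        body = b'{"origin": "127.0.0.1"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), EchoIPHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:{}/ip".format(server.server_address[1])
        # An empty ProxyHandler bypasses any proxy configured in the environment.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as response:
            content_type = response.info()["Content-Type"]
            return response.getcode(), content_type, response.read().decode()
    finally:
        server.shutdown()
        server.server_close()


status, content_type, text = fetch_from_local_server()
print(status, content_type, text)
```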
import urllib.parse  # needed for urlencode(); the alias import above does not provide it

def use_param_urllib2():
    url_get = 'http://httpbin.org/get'
    param = {'param1': 'hello', 'param2': 'world'}
    # urlencode() turns the dict into a query string such as param1=hello&param2=world
    param = urllib.parse.urlencode(param)
    print('>>>Request Params')
    print(param)
    response = urllib2.urlopen(url_get + '?' + param)
    print('>>>Response Headers')
    print(response.info())
    print('>>>Status Code')
    print(response.getcode())
    print('>>>Response Body')
    # read() returns bytes, so decode it to a string before printing
    print(response.read().decode())

use_param_urllib2()
>>>Request Params
param2=world&param1=hello
>>>Response Headers
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Content-Type: application/json
Date: Sat, 20 Apr 2019 09:04:11 GMT
Referrer-Policy: no-referrer-when-downgrade
Server: nginx
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
Content-Length: 299
Connection: Close

>>>Status Code
200
>>>Response Body
{
  "args": {
    "param1": "hello", 
    "param2": "world"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.5"
  }, 
  "origin": "122.205.61.100, 122.205.61.100", 
  "url": "https://httpbin.org/get?param2=world&param1=hello"
}
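
The query string handling can be checked without sending anything. The sketch below builds a URL with urlencode and parses it back with urlsplit and parse_qs; build_url is a hypothetical helper, and the third parameter q is added only to show how spaces and & are escaped. On Python 3.7 and later, dicts keep insertion order, so the parameters appear in the order they were written (the Python 3.5 run above printed them in a different order).

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, params):
    """Append params to base as a percent-encoded query string."""
    return base + "?" + urlencode(params)


url = build_url(
    "http://httpbin.org/get",
    {"param1": "hello", "param2": "world", "q": "a b&c"},
)
query = urlsplit(url).query
decoded = parse_qs(query)  # every value comes back as a list of strings

print(url)
print(decoded)
```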
  • requests
import requests

def use_simple_request():
    url = 'http://httpbin.org/ip'
    response = requests.get(url)
    print('>>>Response Headers')
    print(response.headers)
    print('>>>Response Body')
    print(response.text)

use_simple_request()
def use_param_request():
    url_get = 'http://httpbin.org/ip'
    param = {'param1': 'hello', 'param2': 'world'}
    print('>>>Request Params')
    print(param)
    # requests encodes the dict into the query string itself
    response = requests.get(url_get, params=param)
    print('>>>Response Headers')
    print(response.headers)
    print('>>>Status Code')
    print(response.status_code)
    print(response.reason)
    print('>>>Response Body')
    # json() decodes the JSON body into a Python dict
    print(response.json())

use_param_request()
>>>Request Params
{'param2': 'world', 'param1': 'hello'}
>>>Response Headers
{'Access-Control-Allow-Origin': '*', 'X-XSS-Protection': '1; mode=block', 'Content-Type': 'application/json', 'Access-Control-Allow-Credentials': 'true', 'X-Content-Type-Options': 'nosniff', 'Content-Length': '58', 'X-Frame-Options': 'DENY', 'Server': 'nginx', 'Date': 'Sat, 20 Apr 2019 09:13:01 GMT', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Referrer-Policy': 'no-referrer-when-downgrade'}
>>>Status Code
200
OK
>>>Response Body
{'origin': '115.156.141.224, 115.156.141.224'}
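
To see what requests does with the params argument without contacting a server, a request can be prepared but not sent. This is a small sketch that assumes the requests package is installed; preview_request is a hypothetical helper name, and the /get URL is used only because it matches the urllib example.

```python
import requests


def preview_request(url, params):
    """Build the request that requests would send, without using the network."""
    prepared = requests.Request("GET", url, params=params).prepare()
    return prepared.method, prepared.url


method, full_url = preview_request(
    "http://httpbin.org/get",
    {"param1": "hello", "param2": "world"},
)
print(method, full_url)
```

The prepared URL already carries the encoded query string, which is the same result the urllib version assembles by hand.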
