Starting from TCP: Writing a Web Framework in Python (Part 1)

Date: 2020-06-13

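I wanted to try writing a web framework. This isn't because Django, Flask, Sanic, tornado and the like aren't good enough. Building the wheel yourself gives you a deeper understanding of how frameworks work, and for that deeper understanding we naturally shouldn't depend on third-party libraries (only the standard library is used).

Most articles about writing a web framework focus on the application layer, for example building the framework on top of the WSGI interface. There's nothing wrong with that, but it leaves out what happens a bit lower down, such as where the request actually comes from. I understand why they do it, though: parsing HTTP requests really isn't very exciting.

This article starts from TCP and covers, in order, TCP transport, HTTP parsing, routing, and the framework itself. It won't implement a template engine, since that deserves an article of its own.

The framework implementation goes through three stages: single-threaded, multi-threaded, and asynchronous I/O.

The end goal is a framework whose usage is roughly similar to Flask or Sanic.

Because HTTP covers a lot of ground, this article naturally won't implement the whole protocol.

The article is organized as follows:

TCP transport
HTTP parsing
Routing
Web framework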

Environment

Python: 3.6.8, no third-party libraries

Later versions should also work

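The HTTP protocol

HTTP is arguably the most widely used application-layer protocol, bar none.

HTTP generally involves two sides, the client and the server, where the client is usually a browser. The client sends an HTTP request to the server, and the server responds based on that request.

So what do these requests and responses actually look like? Below we simulate an HTTP request and response at the TCP level.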

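TCP transport

HTTP is an application-layer protocol, and a protocol is just a set of conventions: how the first line must be written, how the content is organized, and so on.

TCP, as the transport layer, is what carries that content. So without any HTTP library we can use TCP to simulate an HTTP request, or more precisely, to actually send one. Transport boils down to sending (send) and receiving (recv).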
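One detail is worth seeing before the real example: TCP delivers a byte stream with no message boundaries, so two `send` calls may arrive in one `recv`, or one `send` may be split across several. A receiver therefore keeps calling `recv` until it sees a delimiter, and HTTP uses an empty line (`\r\n\r\n`) to end the header section. A minimal sketch, assuming a local `socket.socketpair()` stands in for a real client/server connection; the helper name `recv_until` is hypothetical:

```python
import socket

def recv_until(sock, marker=b"\r\n\r\n", bufsize=4):
    """Hypothetical helper: read from sock until marker appears or the peer closes."""
    data = b""
    while marker not in data:
        # a tiny buffer makes the chunked delivery obvious
        chunk = sock.recv(bufsize)
        if not chunk:
            break  # peer closed the connection
        data += chunk
    return data

a, b = socket.socketpair()
# two separate sends: the receiver just sees one continuous stream of bytes
a.sendall(b"GET / HTTP/1.1\r\n")
a.sendall(b"Host: example.com\r\n\r\n")
request = recv_until(b)
a.close()
b.close()
print(request)
```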

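#socket_http_client.py

import socket

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# With keep-alive the server does not close the connection on its own,
# so use a timeout to stop recv() from blocking until the server gives up
client.settimeout(5)

CRLF = b"\r\n"
# HTTP/1.1 requires a Host header; the empty line (CRLF) ends the headers
req = b"GET / HTTP/1.1" + CRLF + b"Host: www.baidu.com" + CRLF + CRLF

client.connect(("www.baidu.com", 80))
client.sendall(req)

resp = b""
while True:
    try:
        data = client.recv(1024)
    except socket.timeout:
        # nothing more arrived within the timeout: treat the response as complete
        break
    if not data:
        # the server closed the connection
        break
    resp += data

client.close()
# First 1024 bytes, undecoded
print(resp[:1024])
print()
# First 1024 characters after decoding
print(resp.decode("utf8")[:1024])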

The output is as follows:

b'HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nCache-Control: no-cache\r\nConnection: keep-alive\r\nContent-Length: 14615\r\nContent-Type: text/html\r\nDate: Wed, 10 Jun 2020 10:14:37 GMT\r\nP3p: CP=" OTI DSP COR IVA OUR IND COM "\r\nP3p: CP=" OTI DSP COR IVA OUR IND COM "\r\nPragma: no-cache\r\nServer: BWS/1.1\r\nSet-Cookie: BAIDUID=32C6E7B012F4DBAAB40756844698B7DF:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com\r\nSet-Cookie: BIDUPSID=32C6E7B012F4DBAAB40756844698B7DF; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com\r\nSet-Cookie: PSTM=1591784077; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com\r\nSet-Cookie: BAIDUID=32C6E7B012F4DBAA3C9883ABA2DD201E:FG=1; max-age=31536000; expires=Thu, 10-Jun-21 10:14:37 GMT; domain=.baidu.com; path=/; version=1; comment=bd\r\nTraceid: 159178407703725358186803341565479700940\r\nVary: Accept-Encoding\r\nX-Ua-Compatible: IE=Edge,chrome=1\r\n\r\n<!DOCTYPE html><!--STATUS OK-->\r\n<html>\r\n<head>\r\n\t<meta http-equi'

HTTP/1.1 200 OK
Accept-Ranges: bytes
Cache-Control: no-cache
Connection: keep-alive
Content-Length: 14615
Content-Type: text/html
Date: Wed, 10 Jun 2020 10:14:37 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Pragma: no-cache
Server: BWS/1.1
Set-Cookie: BAIDUID=32C6E7B012F4DBAAB40756844698B7DF:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=32C6E7B012F4DBAAB40756844698B7DF; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=1591784077; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BAIDUID=32C6E7B012F4DBAA3C9883ABA2DD201E:FG=1; max-age=31536000; expires=Thu, 10-Jun-21 10:14:37 GMT; domain=.baidu.com; path=/; version=1; comment=bd
Traceid: 159178407703725358186803341565479700940
Vary: Accept-Encoding
X-Ua-Compatible: IE=Edge,chrome=1

<!DOCTYPE html><!--STATUS OK-->
<html>
<head>
        <meta http-equi

Since plain TCP is enough to make an HTTP request as a client, implementing the server side should follow just as naturally.

#socket_http_server.py

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Without this, the OS keeps a closed socket around for a while (TIME_WAIT) and the same port can't be rebound immediately
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

CRLF = b"\r\n"
host = "127.0.0.1"
port = 6666
server.bind((host, port))
server.listen()
print("Server started: http://{}:{}".format(host, port))

# Status line, then an empty line (no headers), then the body
resp = b"HTTP/1.1 200 OK" + (CRLF * 2) + b"Hello world"

while True:
    peer, addr = server.accept()
    print("Client connected from: {}".format(str(addr)))

    # A single recv is enough for a small request like this one
    data = peer.recv(1024)
    print("Received request:")
    print("As bytes")
    print(data)
    print()
    print("As str")
    print(data.decode("utf8"))
    peer.sendall(resp)
    # Closing the connection tells the client that the body has ended
    peer.close()
    # Ctrl+C can't interrupt accept() on Windows, so exit after one request
    break

Once it's running, we can test it with requests:

In [1]: import requests
In [2]: resp = requests.get("http://127.0.0.1:6666")
In [3]: resp.ok
Out[3]: True
In [4]: resp.text
Out[4]: 'Hello world'

The server then prints some information and exits:

Received request:
As bytes
b'GET / HTTP/1.1\r\nHost: 127.0.0.1:6666\r\nUser-Agent: python-requests/2.18.4\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'

As str
GET / HTTP/1.1
Host: 127.0.0.1:6666
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

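The reason I keep printing both the bytes and the str versions is to draw your attention to the \r\n in them: these two invisible characters are very important.

But who says invisible characters are invisible? Didn't I just see them in the bytes output? That's actually a rather interesting question.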
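The answer lies in `repr`: printing a `bytes` object shows its representation, in which non-printable bytes such as `\r` and `\n` are written as escape sequences, whereas printing a `str` writes the characters themselves, so the terminal performs the line breaks. A short sketch:

```python
raw = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:6666\r\n\r\n"

# print(bytes) shows repr(bytes): \r and \n appear as escape sequences
print(raw)
# print(str) writes the actual characters, so the terminal breaks the lines
print(raw.decode("utf8"))

# "\r\n" is two bytes: carriage return (13) and line feed (10)
crlf = b"\r\n"
print(len(crlf), list(crlf))  # 2 [13, 10]
```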

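At this point we know that HTTP (HyperText Transfer Protocol) is exactly what its name says: it defines the text format a client must use to send a request, and the text format a server must use to answer it.

Having simulated both the HTTP client and the server, we can go a step further and wrap the server's response in an abstraction: a Response class.

Why not also abstract the client's request into a Request class? Because what this article sets out to write is a server-side web framework : ).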

# response.py

from collections import namedtuple

RESP_STATUS = namedtuple("RESP_STATUS", ["code", "phrase"])
CRLF = "\r\n"

status_ok = RESP_STATUS(200, "ok")
status_bad_request = RESP_STATUS(400, "Bad Request")
statue_server_error = RESP_STATUS(500, "Internal Server Error")

default_header = {"Server": "youerning", "Content-Type": "text/html"}

class Response(object):
    http_version = "HTTP/1.1"
    def __init__(self, resp_status=status_ok, headers=None, body=None):
        self.resp_status = resp_status
        if not headers:
            headers = default_header
        if not body:
            body = "hello world"

        self.headers = headers
        self.body = body

    def to_bytes(self):
        status_line = "{} {} {}".format(self.http_version, self.resp_status.code, self.resp_status.phrase)
        header_lines = ["{}: {}".format(k, v) for k,v in self.headers.items()]
        headers_text = CRLF.join(header_lines)
        if self.body:
            headers_text += CRLF

        message_body = self.body
        data = CRLF.join([status_line, headers_text, message_body])

        return data.encode("utf8")

So the earlier server can be written like this:

# socket_http_server2.py

import socket
from response import Response

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Without this, the OS keeps a closed socket around for a while (TIME_WAIT) and the same port can't be rebound immediately
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

host = "127.0.0.1"
port = 6666
server.bind((host, port))
server.listen()
print("Server started: http://{}:{}".format(host, port))

resp = Response()

while True:
    peer, addr = server.accept()
    print("Client connected from: {}".format(str(addr)))

    data = peer.recv(1024)
    print("Received request:")
    print("As bytes")
    print(data)
    print()
    print("As str")
    print(data.decode("utf8"))
    peer.sendall(resp.to_bytes())
    peer.close()
    # Ctrl+C can't interrupt accept() on Windows, so exit after one request
    break

The result is much the same; the only difference is that this response also carries HTTP headers.
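Because the server closes the connection right after sending, the client can treat everything up to the close as the body. When a connection stays open (keep-alive), the client instead relies on a `Content-Length` header to know where the body ends. A minimal sketch of such framing, assuming a hypothetical helper `build_response` (not part of the article's `response.py`): each line ends with `\r\n`, an empty line separates headers from body, and `Content-Length` counts encoded bytes, not characters.

```python
def build_response(code, phrase, headers, body):
    """Hypothetical helper: frame an HTTP/1.1 response as bytes."""
    body_bytes = body.encode("utf8")
    headers = dict(headers)  # copy so the caller's dict is not modified
    # Content-Length counts bytes of the encoded body, not characters
    headers["Content-Length"] = str(len(body_bytes))
    lines = ["HTTP/1.1 {} {}".format(code, phrase)]
    lines += ["{}: {}".format(k, v) for k, v in headers.items()]
    # every line ends with CRLF, and one extra CRLF marks the end of the headers
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("utf8") + body_bytes

# two characters, but six bytes in UTF-8
resp = build_response(200, "OK", {"Content-Type": "text/html; charset=utf-8"}, "你好")
print(resp)
```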

For the precise definitions of HTTP requests (Request) and responses (Response), see the following links:

https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5

https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6

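HTTP parsing

Although the code above simulates an HTTP exchange, it can't yet return a specific response for a specific request, because we haven't parsed what the client sent and therefore can't tell one request from another.

Below are two fairly common requests.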

GET request

# As bytes
b'GET / HTTP/1.1\r\nHost: 127.0.0.1:6666\r\nUser-Agent: python-requests/2.18.4\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\n\r\n'

# As str
GET / HTTP/1.1
Host: 127.0.0.1:6666
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive

POST request

# As bytes
b'POST / HTTP/1.1\r\nHost: 127.0.0.1:6666\r\nUser-Agent: python-requests/2.18.4\r\nAccept-Encoding: gzip, deflate\r\nAccept: */*\r\nConnection: keep-alive\r\nContent-Length: 29\r\nContent-Type: application/x-www-form-urlencoded\r\n\r\nusername=admin&password=admin'

# As str
POST / HTTP/1.1
Host: 127.0.0.1:6666
User-Agent: python-requests/2.18.4
Accept-Encoding: gzip, deflate
Accept: */*
Connection: keep-alive
Content-Length: 29
Content-Type: application/x-www-form-urlencoded

username=admin&password=admin

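Once again I've "redundantly" shown both forms. When a str is printed, its invisible characters are rendered (\n, for example, becomes a line break), and the only reason we see HTTP as neat line-by-line text is that printing formats it that way. Without being aware of this, we couldn't tell where the Request Line, the Request Header Fields, and the message-body begin and end.

(The specification's English terms are used deliberately to avoid ambiguity.)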
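The three parts can be separated purely by those invisible delimiters: the first `\r\n` ends the Request Line, and the first `\r\n\r\n` ends the header fields; whatever follows is the message-body. A minimal sketch using `bytes.partition` on a shortened copy of the POST request above:

```python
raw = (b"POST / HTTP/1.1\r\nHost: 127.0.0.1:6666\r\n"
       b"Content-Length: 29\r\nContent-Type: application/x-www-form-urlencoded\r\n"
       b"\r\nusername=admin&password=admin")

# everything before the first CRLF is the request line
request_line, _, rest = raw.partition(b"\r\n")
# the header fields end at the first empty line (CRLF CRLF); the rest is the body
header_block, _, body = rest.partition(b"\r\n\r\n")

print(request_line)                 # b'POST / HTTP/1.1'
print(header_block.split(b"\r\n"))  # one entry per header field
print(body)                         # b'username=admin&password=admin'
```

Note that the body is exactly 29 bytes, matching its `Content-Length` header.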

To better abstract the information the client sends, we write a Request class that holds everything about a request.

# request.py

class Request(object):
    def __init__(self):
        self.method = None
        self.path = None
        self.raw_path = None
        self.query_params = {}
        self.path_params = {}
        self.headers = {}
        self.raw_body = None
        self.data = None
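The `path` and `query_params` fields can be filled in with the standard library's `urllib.parse`. A quick sketch of what it returns for a path with a repeated query parameter (the example path is illustrative):

```python
from urllib import parse

url_obj = parse.urlparse("/a/b/c?filter=name&filter=id")
print(url_obj.path)                   # /a/b/c
# parse_qs maps every name to a list, because a name may appear more than once
print(parse.parse_qs(url_obj.query))  # {'filter': ['name', 'id']}
# the same function handles an application/x-www-form-urlencoded body
print(parse.parse_qs("username=admin&password=admin"))  # {'username': ['admin'], 'password': ['admin']}
```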

Now let's parse it:

# http_parser.py

import re
import json
from urllib import parse
from request import Request
# http_exceptions is not shown in this part of the article
from http_exceptions import BadRequestException, InternalServerErrorException

CRLF = b"\r\n"
SEPARATOR = CRLF + CRLF
HTTP_VERSION = b"1.1"
# Request line: METHOD SP request-target SP HTTP/1.1
# \S+ accepts any target without whitespace, including %-encoded characters
REQUEST_LINE_REGEXP = re.compile(br"[a-z]+ \S+ http/%s" % re.escape(HTTP_VERSION), flags=re.IGNORECASE)
SUPPORTED_METHODS = {"GET", "POST"}

def http_parse(buffer):
    request = Request()

    def remove_buffer(buffer, stop_index):
        buffer = buffer[stop_index:]
        return buffer

    def parse_request_line(line):
        method, raw_path = line.split()[:2]
        method = method.upper()

        if method not in SUPPORTED_METHODS:
            raise BadRequestException("{} method not supported".format(method))

        request.method = method
        request.raw_path = raw_path

        # Handle the path, e.g. /a/b/c?username=admin&password=admin
        # The path is /a/b/c
        # Everything after ? is the query parameters
        # Note that query parameters can repeat, e.g. /a/b/c?filter=name&filter=id
        # so each parsed name maps to a list, e.g. {"filter": ["name", "id"]}
        url_obj = parse.urlparse(raw_path)
        path = url_obj.path
        query_params = parse.parse_qs(url_obj.query)
        request.path = path
        request.query_params = query_params

    def parse_headers(header_lines):
        # bytes would probably be faster, but str keeps this consistent with parse_request_line
        header_iter = (line for line in header_lines.split(CRLF.decode("utf8")) if line)
        headers = {}

        for line in header_iter:
            # Split on the first colon only, so "Host: 127.0.0.1:6666" keeps its port
            header, value = [i.strip() for i in line.split(":", 1)]
            # Header names are case-insensitive
            header = header.lower()
            headers[header] = value

        request.headers = headers

    def parse_body(raw_body, body_parser):
        # No exception handling, to keep the code short
        data = body_parser(raw_body)

        request.raw_body = raw_body
        request.data = data

    # Check whether there is a request line
    if REQUEST_LINE_REGEXP.match(buffer):
        line = buffer.split(CRLF, maxsplit=1)[0].decode("utf8")
        parse_request_line(line)
        # The request line has been handled, so it can be removed
        first_line_end = buffer.index(CRLF)
        # + 2 because \r\n takes two bytes
        # I think the reference article is wrong not to add 2 here: without removing the \r\n,
        # the check for HTTP headers below fails when there are no headers.
        # del buffer[:first_line_end + 2]
        buffer = remove_buffer(buffer, first_line_end + 2)

    # If \r\n\r\n is present, HTTP headers come before it
    if SEPARATOR in buffer:
        header_end = buffer.index(SEPARATOR)
        header_lines = buffer[:header_end].decode("utf8")
        parse_headers(header_lines)
        # Same as above
        # del buffer[:header_end + 4]
        buffer = remove_buffer(buffer, header_end + 4)

    headers = request.headers
    if headers and "content-length" in headers:
        # Only two body content types are handled: application/x-www-form-urlencoded and application/json
        # application/x-www-form-urlencoded looks like username=admin&password=admin, just like URL query params
        # application/json is a JSON string
        # Drop parameters such as "; charset=utf-8" before comparing
        content_type = headers.get("content-type", "").split(";")[0].strip()
        # content_length = headers.get("content-length", "0")

        body_parser = parse.parse_qs
        if content_type == "application/json":
            # The reference article seems to get this wrong
            body_parser = json.loads

        # This version doesn't worry about whether the whole body has been received
        raw_body = buffer.decode("utf8")
        parse_body(raw_body, body_parser)

    return request

Then let's test it.

# Start the server
python socket_http_server3.py

Send an HTTP request with requests:

In [115]: requests.post("http://127.0.0.1:6666/test/path?asd=aas", data={"username": "admin", "password": "admin"})
Out[115]: <Response [200]>

The server prints the following:

Client connected from: ('127.0.0.1', 1853)
Received request:
Method: POST
Path: /test/path
Query params: {'asd': ['aas']}
Headers: {'host': '127.0.0.1:6666', 'user-agent': 'python-requests/2.18.4', 'accept-encoding': 'gzip, deflate', 'accept': '*/*', 'connection': 'keep-alive', 'content-length': '29', 'content-type': 'application/x-www-form-urlencoded'}
Body: {'username': ['admin'], 'password': ['admin']}

At this point, by parsing the request the client sent, we get a Request object that holds all the information we need.
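The parser does not check whether the whole body has arrived, and the servers so far read with a single `recv(1024)`, which may return only part of a larger POST. A minimal sketch of one way to close that gap, assuming a blocking socket and a hypothetical helper `read_request`: read until the end of the headers, then keep reading until `Content-Length` bytes of body have arrived. It is demonstrated with `socket.socketpair()` instead of a real client.

```python
import socket

def read_request(sock, bufsize=1024):
    """Hypothetical helper: return the raw bytes of one HTTP request."""
    buffer = b""
    # 1. read until the empty line that ends the headers
    while b"\r\n\r\n" not in buffer:
        chunk = sock.recv(bufsize)
        if not chunk:
            return buffer  # peer closed before sending complete headers
        buffer += chunk
    head, _, body = buffer.partition(b"\r\n\r\n")

    # 2. find Content-Length (header names are case-insensitive); skip the request line
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())

    # 3. keep reading until the whole body has arrived
    # (bytes beyond Content-Length, e.g. a pipelined request, are not handled here)
    while len(body) < length:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body

client, server = socket.socketpair()
# the body is sent in two pieces to mimic a request arriving in chunks
client.sendall(b"POST / HTTP/1.1\r\nContent-Length: 29\r\n\r\nusername=admin")
client.sendall(b"&password=admin")
raw = read_request(server, bufsize=8)
client.close()
server.close()
print(raw)
```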

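Routing

Route resolution

Experience tells us that different page paths correspond to different content. Returning different content depending on the path is generally called route resolution.

So once we have a request, we need to decide what to return based on the path the client accessed. The object that stores these mappings is usually called a router.

A router provides at least two interfaces: one to add such a mapping, and one that, given a path, returns a callable that can handle the request. That callable is usually called a handler.

Paths generally come in two kinds: static and dynamic.

Static routes

Static routes are easy: a single dictionary does the job, using the (request method, path) pair as the key and the corresponding handler as the value, as follows: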

# router1.py

import re
from collections import namedtuple
from functools import partial

def home():
    return "home"

def info():
    return "info"

def not_found():
    return "not found"

class Router(object):
    def __init__(self):
        self._routes = {}

    def add(self, path, handler, methods=None):
        if methods is None:
            methods = ["GET"]

        if not isinstance(methods, list):
            raise Exception("methods must be a list")

        for method in methods:
            key = (method, path)
            if key in self._routes:
                raise Exception("Duplicate route: {}".format(path))
            self._routes[key] = handler

    def get_handler(self, method, path):
        method_path = (method, path)
        return self._routes.get(method_path, not_found)

route = Router()
route.add("/home", home)
route.add("/info", info, methods=["GET", "POST"])
print(route.get_handler("GET", "/home")())
print(route.get_handler("POST", "/home")())
print(route.get_handler("GET", "/info")())
print(route.get_handler("POST", "/info")())
print(route.get_handler("GET", "/xxxxxx")())

Output:

home
not found
info
info
not found

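Dynamic routes

Dynamic routes are a bit more involved and need regular expressions. To keep things simple, though, there's no support for constraining the type of a dynamic segment, i.e. fancy tricks like /user/{id:int}.

The code is as follows: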

# router2.py

import re
from collections import namedtuple
from functools import partial

Route = namedtuple("Route", ["methods", "pattern", "handler"])

def home():
    return "home"

def item(name):
    return name

def not_found():
    return "not found"

class Router(object):
    def __init__(self):
        self._routes = []

    @classmethod
    def build_route_regex(cls, regexp_str):
        # A route path comes in two forms
        # 1. /home has no dynamic variable; it becomes the regex ^/home$
        # 2. /item/{name} has a dynamic variable; it becomes ^/item/(?P<name>[a-zA-Z0-9_-]+)$
        def named_groups(matchobj):
            return '(?P<{0}>[a-zA-Z0-9_-]+)'.format(matchobj.group(1))

        re_str = re.sub(r'{([a-zA-Z0-9_-]+)}', named_groups, regexp_str)
        re_str = ''.join(('^', re_str, '$',))
        return re.compile(re_str)

    @classmethod
    def match_path(cls, pattern, path):
        match = pattern.match(path)
        # None if the path doesn't match, otherwise the captured variables as a dict
        if match is None:
            return None
        return match.groupdict()

    def add(self, path, handler, methods=None):
        if methods is None:
            methods = {"GET"}
        else:
            methods = set(methods)
        pattern = self.build_route_regex(path)
        route = Route(methods, pattern, handler)

        if route in self._routes:
            raise Exception("Duplicate route: {}".format(path))
        self._routes.append(route)

    def get_handler(self, method, path):
        for route in self._routes:
            if method in route.methods:
                params = self.match_path(route.pattern, path)

                if params is not None:
                    return partial(route.handler, **params)

        return not_found

route = Router()
route.add("/home", home)
route.add("/item/{name}", item, methods=["GET", "POST"])
print(route.get_handler("GET", "/home")())
print(route.get_handler("POST", "/home")())
print(route.get_handler("GET", "/item/item1")())
print(route.get_handler("POST", "/item/item1")())
print(route.get_handler("GET", "/xxxxxx")())

Output:

home
not found
item1
item1
not found
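To see what the router actually stores for a dynamic path, here is the same substitution as `build_route_regex` above, copied into a standalone function so it can run on its own:

```python
import re

def build_route_regex(path):
    # same substitution as Router.build_route_regex:
    # "{name}" becomes a named group that matches one path segment
    def named_groups(matchobj):
        return "(?P<{0}>[a-zA-Z0-9_-]+)".format(matchobj.group(1))

    re_str = re.sub(r"{([a-zA-Z0-9_-]+)}", named_groups, path)
    return re.compile("^" + re_str + "$")

pattern = build_route_regex("/item/{name}")
print(pattern.pattern)                           # ^/item/(?P<name>[a-zA-Z0-9_-]+)$
print(pattern.match("/item/item1").groupdict())  # {'name': 'item1'}
# "/" is not in the group's character class, so a nested path does not match
print(pattern.match("/item/a/b"))                # None
```

A static path such as `/home` matches with an empty `groupdict()`, which is why `get_handler` can call `partial(route.handler, **params)` for both kinds of route.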

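Registering routes with a decorator

Route registration gets its own section because calling add explicitly doesn't feel fancy (sweet) enough : ). Adding routes through a decorator (syntactic sugar), as Flask does, is a great (sweet) choice.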
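Stripped of the routing details, a Flask-style registration decorator is a function that takes the route's arguments and returns the real decorator; that decorator records the handler and hands it back unchanged. A minimal sketch with a hypothetical module-level `registry` dict:

```python
registry = {}

def route(path, methods=None):
    # route(...) is called first with the arguments and returns the real decorator
    methods = set(methods or ["GET"])

    def decorator(handler):
        for method in methods:
            registry[(method, path)] = handler
        # return the handler unchanged so it can still be called directly
        return handler

    return decorator

@route("/home")
def home():
    return "home"

# the decorator syntax above is equivalent to: home = route("/home")(home)
print(registry[("GET", "/home")]())  # home
print(home())                        # home
```

Normalizing `methods` in the outer function, rather than inside the inner one, means the inner function never assigns to it, so no `nonlocal` is needed.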

# router3.py

import re
from collections import namedtuple
from functools import partial
from functools import wraps

SUPPORTED_METHODS = {"GET", "POST"}
Route = namedtuple("Route", ["methods", "pattern", "handler"])

class View:
    pass

class Router(object):
    def __init__(self):
        self._routes = []

    @classmethod
    def build_route_regex(cls, regexp_str):
        # A route path comes in two forms
        # 1. /home has no dynamic variable; it becomes the regex ^/home$
        # 2. /item/{name} has a dynamic variable; it becomes ^/item/(?P<name>[a-zA-Z0-9_-]+)$
        def named_groups(matchobj):
            return '(?P<{0}>[a-zA-Z0-9_-]+)'.format(matchobj.group(1))

        re_str = re.sub(r'{([a-zA-Z0-9_-]+)}', named_groups, regexp_str)
        re_str = ''.join(('^', re_str, '$',))
        return re.compile(re_str)

    @classmethod
    def match_path(cls, pattern, path):
        match = pattern.match(path)
        # None if the path doesn't match, otherwise the captured variables as a dict
        if match is None:
            return None
        return match.groupdict()

    def add_route(self, path, handler, methods=None):
        if methods is None:
            methods = {"GET"}
        else:
            methods = set(methods)
        pattern = self.build_route_regex(path)
        route = Route(methods, pattern, handler)

        if route in self._routes:
            raise Exception("Duplicate route: {}".format(path))
        self._routes.append(route)

    def get_handler(self, method, path):
        for route in self._routes:
            if method in route.methods:
                params = self.match_path(route.pattern, path)

                if params is not None:
                    return partial(route.handler, **params)

        return not_found

    def route(self, path, methods=None):
        def wrapper(handler):
            # A nested function that assigns to a variable treats it as local and won't look it up in the enclosing scope, hence nonlocal
            nonlocal methods
            if callable(handler):
                if methods is None:
                    methods = {"GET"}
                else:
                    methods = set(methods)
                self.add_route(path, handler, methods)

            return handler
        return wrapper

route = Router()

@route.route("/home")
def home():
    return "home"

@route.route("/item/{name}", methods=["GET", "POST"])
def item(name):
    return name

def not_found():
    return "not found"

print(route.get_handler("GET", "/home")())
print(route.get_handler("POST", "/home")())
print(route.get_handler("GET", "/item/item1")())
print(route.get_handler("POST", "/item/item1")())
print(route.get_handler("GET", "/xxxxxx")())

The output is as follows, identical to the version without decorators.