Reading the Python http source code

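The previous article, on reading the bottle source code, did not explain in detail how bottle handles HTTP requests; that requires reading Python's http source first. This week we read through the Python http source together to see how Python builds an HTTP service and responds to HTTP requests, filling in that foundation. This article has the following parts: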

  • Structure of the http-related code
  • socket
  • selector
  • socketserver
  • http-server

1. Structure of the http-server related code

This reading uses Python 3.6.5 or later; any Python environment will do (Windows, macOS and other systems differ in the select part). The code involved:

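File              Description
socket.py         The socket API
select.py         A stub file; the select module itself is implemented in C and provides the low-level I/O multiplexing primitives
selectors.py      High-level wrapper around select; the recommended interface
socketserver.py   Default implementations such as TCPServer and UDPServer
http/server.py    A simple HTTP server implementation
http/client.py    A simple HTTP client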

2. socket

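For socket basics I recommend reading reference link 1 directly; it is very detailed. This article sticks to reading the source, starting with how the socket object is created: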

class socket(_socket.socket):

    """A subclass of _socket.socket adding the makefile() method."""

    __slots__ = ["__weakref__", "_io_refs", "_closed"]

    def __init__(self, family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None):
        _socket.socket.__init__(self, family, type, proto, fileno)
        self._io_refs = 0
        self._closed = False

    def __enter__(self):
        return self

    def __exit__(self, *args):
        if not self._closed:
            self.close()

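From the socket class we can see:

  • __slots__ optimizes the object by fixing its attribute set, which saves memory
  • __enter__ and __exit__ let a socket object be used as a context manager
  • family and type are two very important parameters that determine what kind of socket is created. Common family values are AF_INET (IPv4) and AF_INET6 (IPv6); common type values are SOCK_STREAM for TCP and SOCK_DGRAM for UDP.

The socket API is large; it can be browsed in socket.pyi or _socket.py. We focus on the calls that appear in the TCP Socket Flow figure below.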

[Figure: TCP Socket Flow]

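All of these calls are implemented by the low-level module except accept. The accept function runs on the server: it accepts an incoming connection and creates a new socket object to represent that connection.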

def accept(self):
    fd, addr = self._accept()  # local file descriptor and remote address of the new connection
    type = self.type & ~globals().get("SOCK_NONBLOCK", 0)
    sock = socket(self.family, type, self.proto, fileno=fd)  # wrap it in a new socket and return it
    if getdefaulttimeout() is None and self.gettimeout():
        sock.setblocking(True)
    return sock, addr

The socket module also provides the SocketIO class, which shows how to wrap a socket for reading and writing:

class SocketIO(io.RawIOBase):
    def __init__(self, sock, mode):
        if mode not in ("r", "w", "rw", "rb", "wb", "rwb"):  # a socket is file-like, with read/write modes
            raise ValueError("invalid mode: %r" % mode)
        io.RawIOBase.__init__(self)
        self._sock = sock
        if "b" not in mode:
            mode += "b"  # sockets are binary
        self._mode = mode
        self._reading = "r" in mode  # readable flag
        self._writing = "w" in mode  # writable flag

    def readinto(self, b):  # read data from the socket into the given buffer
        return self._sock.recv_into(b)

    def write(self, b):  # write data to the socket
        return self._sock.send(b)
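
To see SocketIO at work without writing a server, here is a minimal sketch that assumes socket.socketpair() is available on the platform. The function name roundtrip is made up for illustration; makefile() is the method that builds the SocketIO wrapper behind the scenes.

```python
import socket


def roundtrip(payload: bytes) -> bytes:
    """Send one line through a connected socket pair using file-like wrappers."""
    left, right = socket.socketpair()  # two sockets that are already connected
    with left, right:
        # makefile() wraps each socket in a SocketIO plus a buffered layer
        with left.makefile("wb") as writer, right.makefile("rb") as reader:
            writer.write(payload + b"\n")
            writer.flush()  # push the buffered bytes into the socket
            return reader.readline().rstrip(b"\n")
```

The file-like wrappers give readline() and write() on top of recv_into() and send(), which is what the request handlers later in this article rely on.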

2.1 socket examples

Here are two pairs of examples showing how to send and receive data over TCP and UDP with socket.

# tcp-server

import socket

HOST = '127.0.0.1'
PORT = 65432

with socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM) as s:
    s.bind((HOST, PORT))
    s.listen()
    conn, addr = s.accept()
    with conn:
        print('Connected by', addr)
        while True:
            data = conn.recv(1024)
            if not data:
                break
            print("recv data", data, len(data))
            conn.sendall(data)

The tcp-server socket goes through three steps, bind, listen and accept; it receives data with recv and sends data with sendall. The tcp-client needs to connect to the server.

# tcp-client

import socket

HOST = '127.0.0.1'  # The server's hostname or IP address
PORT = 65432  # The port used by the server

with socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM) as sock:
    sock.connect((HOST, PORT))
    sock.sendall(b'Hello, world')
    data = sock.recv(1024)
    print('Received', repr(data))

Log output of the TCP example:

# tcp-server
Connected by ('127.0.0.1', 64203)
recv data b'Hello, world' 12

# tcp-client
Received b'Hello, world'

With UDP both server and client are simpler; there is no listen or accept step:

# udp-server

import socket

HOST = 'localhost'
PORT = 65432

with socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM) as sock:  # the type differs from TCP
    # Bind the socket to the port
    sock.bind((HOST, PORT))
    while True:
        data, address = sock.recvfrom(4096)  # receive data directly
        print("recv data", data, address)
        if data:
            sock.sendto(data, address)  # sendto sends to the given address

# udp-client
import socket

HOST = '127.0.0.1'  # The server's hostname or IP address
PORT = 65432  # The port used by the server

# Create a UDP socket
with socket.socket(family=socket.AF_INET, type=socket.SOCK_DGRAM) as sock:
    # Send data
    sock.sendto(b'Hello, world', (HOST, PORT))
    # Receive response
    data, server = sock.recvfrom(4096)
    print("recv data", data, server)

Log of the UDP example:

# udp-server
recv data b'Hello, world' ('127.0.0.1', 55429)

# udp-client
recv data b'Hello, world' ('127.0.0.1', 65432)

3. selector

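In the examples so far the server can read and write on only one connection, which does not meet the need to serve several connections at once. With multiple clients connected at the same time, the server has to switch between connections, and that is what select is for. The following is taken from the official documentation:

select.select(rlist, wlist, xlist[, timeout])
This is a straightforward interface to the Unix select() system call. The first three arguments are iterables of "waitable objects": either integers representing file descriptors or objects with a parameterless method named fileno() returning such an integer:

rlist: wait until ready for reading

wlist: wait until ready for writing

xlist: wait for an "exceptional condition" (see the manual page for what your system considers such a condition)

Empty iterables are allowed, but acceptance of three empty iterables is platform-dependent. (It is known to work on Unix but not on Windows.) The optional timeout argument specifies a time-out as a floating point number in seconds. When the timeout argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks.

The return value is a triple of lists of objects that are ready: subsets of the first three arguments. When the time-out is reached without a file descriptor becoming ready, three empty lists are returned.

Among the acceptable object types in the iterables are Python file objects (e.g. sys.stdin, or objects returned by open() or os.popen()) and socket objects returned by socket.socket(). You may also define a wrapper class yourself, as long as it has an appropriate fileno() method (that really returns a file descriptor, not just a random integer).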

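Put simply, select is an event hub: it manages multiple connections, receives notifications from the operating system's network layer and dispatches read and write events to the application. For concrete usage, look at the high-level API in selectors.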
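
The behaviour described in the quoted documentation can be tried directly. The sketch below assumes socket.socketpair() is available; count_readable is a made-up helper name. It polls once before any data is sent and once after.

```python
import select
import socket


def count_readable():
    """Return how many sockets select() reports readable before and after a send."""
    sender, receiver = socket.socketpair()
    with sender, receiver:
        # timeout=0 means poll: nothing has been sent yet, so rlist comes back empty
        before, _, _ = select.select([receiver], [], [], 0)
        sender.sendall(b"ping")
        # wait up to one second for the data to become readable
        after, _, _ = select.select([receiver], [], [], 1.0)
        return len(before), len(after)
```

The returned lists are subsets of the lists passed in, so a server can pass every connection it holds and act only on the ones that come back.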

3.1 Implementation of selector

The read and write events defined by selectors:

EVENT_READ = (1 << 0)
EVENT_WRITE = (1 << 1)
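
Because the two events are distinct bits, they can be combined with | and tested with &. A small sketch, where describe is a hypothetical helper and not part of the selectors module:

```python
import selectors


def describe(mask: int) -> list:
    """Translate an event bit mask into readable names."""
    names = []
    if mask & selectors.EVENT_READ:   # bit 0
        names.append("read")
    if mask & selectors.EVENT_WRITE:  # bit 1
        names.append("write")
    return names
```

Calling describe(selectors.EVENT_READ | selectors.EVENT_WRITE) reports both events, which is the same test the selector implementations below apply to each file descriptor.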

SelectorKey is defined as a namedtuple; _fileobj_to_fd resolves a file object to its file descriptor:

def _fileobj_to_fd(fileobj):
    if isinstance(fileobj, int):
        fd = fileobj
    else:
        try:
            fd = int(fileobj.fileno())  # get the file descriptor
        except (AttributeError, TypeError, ValueError):
            raise ValueError("Invalid file object: "
                             "{!r}".format(fileobj)) from None
    if fd < 0:
        raise ValueError("Invalid file descriptor: {}".format(fd))
    return fd

SelectorKey = namedtuple('SelectorKey', ['fileobj', 'fd', 'events', 'data'])

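BaseSelector is an abstract base class (its metaclass is ABCMeta) that requires every concrete subclass to implement the register, unregister and select methods: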

class BaseSelector(metaclass=ABCMeta):
    @abstractmethod
    def register(self, fileobj, events, data=None):
        raise NotImplementedError
    
    @abstractmethod
    def unregister(self, fileobj):
        raise NotImplementedError
    
    @abstractmethod
    def select(self, timeout=None):
        raise NotImplementedError
    ...

The implementations of register and unregister are simple as well: a dict manages the corresponding SelectorKey objects.

class _BaseSelectorImpl(BaseSelector):
    
    def __init__(self):
        # this maps file descriptors to keys
        self._fd_to_key = {}
        # read-only mapping returned by get_map()
        self._map = _SelectorMapping(self)
        
    def register(self, fileobj, events, data=None):

        key = SelectorKey(fileobj, self._fileobj_lookup(fileobj), events, data)

        self._fd_to_key[key.fd] = key
        return key
    
    def unregister(self, fileobj):
        key = self._fd_to_key.pop(self._fileobj_lookup(fileobj))
        return key
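
The bookkeeping can be observed through the public API. This is a sketch that assumes socket.socketpair() is available; register_roundtrip and the "demo" payload are made up for illustration.

```python
import selectors
import socket


def register_roundtrip():
    """Register a socket, inspect the returned key, then unregister it."""
    left, right = socket.socketpair()
    with left, right, selectors.DefaultSelector() as sel:
        key = sel.register(left, selectors.EVENT_READ, data="demo")
        registered = len(sel.get_map())   # one entry in the fd -> key mapping
        removed = sel.unregister(left)    # returns the key that was stored
        remaining = len(sel.get_map())
        return key.fd == left.fileno(), key.data, registered, removed == key, remaining
```

The data field is free-form; the multi-client server later in this article stores a callback there.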

Different operating systems provide different select implementations:

class SelectSelector(_BaseSelectorImpl):
    """Select-based selector."""
    
if hasattr(select, 'poll'):

    class PollSelector(_BaseSelectorImpl):
        """Poll-based selector."""
        
if hasattr(select, 'epoll'):

    class EpollSelector(_BaseSelectorImpl):
        """Epoll-based selector."""
        
if hasattr(select, 'devpoll'):

    class DevpollSelector(_BaseSelectorImpl):
        """Solaris /dev/poll selector."""
        
if hasattr(select, 'kqueue'):
    class KqueueSelector(_BaseSelectorImpl):
        """Kqueue-based selector."""

# Choose the best implementation, roughly:
#    epoll|kqueue|devpoll > poll > select.
if 'KqueueSelector' in globals():
    DefaultSelector = KqueueSelector
elif 'EpollSelector' in globals():
    DefaultSelector = EpollSelector
elif 'DevpollSelector' in globals():
    DefaultSelector = DevpollSelector
elif 'PollSelector' in globals():
    DefaultSelector = PollSelector
else:
    DefaultSelector = SelectSelector

The comment gives the efficiency ranking: epoll|kqueue|devpoll > poll > select. Let's study the simplest one, SelectSelector, which uses two extra sets to track the file descriptors it holds:

class SelectSelector(_BaseSelectorImpl):
    """Select-based selector."""

    def __init__(self):
        super().__init__()
        self._readers = set()
        self._writers = set()

    def register(self, fileobj, events, data=None):
        key = super().register(fileobj, events, data)
        if events & EVENT_READ:
            self._readers.add(key.fd)
        if events & EVENT_WRITE:
            self._writers.add(key.fd)
        return key

    def unregister(self, fileobj):
        key = super().unregister(fileobj)
        self._readers.discard(key.fd)
        self._writers.discard(key.fd)
        return key

The key part is the select method, which wraps _select:

    _select = select.select
    
    def select(self, timeout=None):
        timeout = None if timeout is None else max(timeout, 0)
        ready = []
        try:
            r, w, _ = self._select(self._readers, self._writers, [], timeout)
        except InterruptedError:
            return ready
        r = set(r)
        w = set(w)
        for fd in r | w:
            events = 0
            if fd in r:
                events |= EVENT_READ
            if fd in w:
                events |= EVENT_WRITE

            key = self._fd_to_key.get(fd)  # None if the fd is no longer registered
            if key:
                ready.append((key, events & key.events))
        return ready  # the ready objects

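The more efficient epoll and poll variants differ only in their internals and can be studied later. Applications normally use the DefaultSelector API and let the module pick the most efficient mechanism available on the platform.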
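
Whatever mechanism DefaultSelector resolves to, select() returns the same list of (key, events) pairs. A minimal sketch, assuming socket.socketpair() is available; ready_events is a made-up name.

```python
import selectors
import socket


def ready_events():
    """Register one socket for reading and report what select() returns."""
    sender, receiver = socket.socketpair()
    with sender, receiver, selectors.DefaultSelector() as sel:
        sel.register(receiver, selectors.EVENT_READ, data="receiver")
        sender.sendall(b"ping")
        # each entry is a (SelectorKey, events) pair
        return [(key.data, events) for key, events in sel.select(timeout=1.0)]
```

Only the registered socket shows up, with the EVENT_READ bit set.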

3.2 selector example

A server built on selectors that supports multiple client connections:

# multi-server

import socket
import selectors
HOST = '127.0.0.1'
PORT = 65432

sel = selectors.DefaultSelector()

def accept(sock, mask):  # accept a new connection
    conn, addr = sock.accept()  # Should be ready
    print('accepted', conn, 'from', addr)
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)  # register the connection with the selector as well

def read(conn, mask):  # read data
    data = conn.recv(1000)  # Should be ready
    if data:
        print('echoing', repr(data), 'to', conn)
        conn.send(data)  # Hope it won't block
    else:
        print('closing', conn)
        sel.unregister(conn)
        conn.close()

serverd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverd.bind((HOST, PORT))
serverd.listen(100)
serverd.setblocking(False)  # non-blocking
sel.register(serverd, selectors.EVENT_READ, accept)  # register for read events only

while True:  # loop forever, waiting for events
    events = sel.select()
    for key, mask in events:
        callback = key.data
        callback(key.fileobj, mask)

The client is similar to the earlier tcp-client; a sleep is added so that several clients can be started by hand and overlap:

# multi-client

import socket
import time

HOST = '127.0.0.1'  # The server's hostname or IP address
PORT = 65432  # The port used by the server

with socket.socket(family=socket.AF_INET, type=socket.SOCK_STREAM) as sock:
    sock.connect((HOST, PORT))
    for x in range(10):
        sock.sendall(b'Hello, world')
        data = sock.recv(1024)
        print('Received', repr(data))
        time.sleep(1)

After starting the server, start several clients and watch the server log:

accepted <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63288)> from ('127.0.0.1', 63288)
echoing b'Hello, world' to <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63288)>
echoing b'Hello, world' to <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63288)>
accepted <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63295)> from ('127.0.0.1', 63295)
echoing b'Hello, world' to <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63295)>
echoing b'Hello, world' to <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63288)>
...
echoing b'Hello, world' to <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63295)>
closing <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63288)>
echoing b'Hello, world' to <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63295)>
closing <socket.socket fd=6, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 65432), raddr=('127.0.0.1', 63295)>

The log clearly shows two clients connecting, exchanging data and then closing.

4. socketserver

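The comments in socketserver are very thorough and are the best aid to understanding its code. The diagram below, for instance, lays out the structure of socketserver; we focus on the TCPServer implementation.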

        +------------+
        | BaseServer |
        +------------+
              |
              v
        +-----------+        +------------------+
        | TCPServer |------->| UnixStreamServer |
        +-----------+        +------------------+
              |
              v
        +-----------+        +--------------------+
        | UDPServer |------->| UnixDatagramServer |
        +-----------+        +--------------------+

4.1 TCPServer

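BaseServer defines a basic socket service model: it is initialized with the server address and the request handler class, uses serve_forever to keep listening for connections, and fixes the flow for processing a request: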

class BaseServer:
    def __init__(self, server_address, RequestHandlerClass):
        self.server_address = server_address
        self.RequestHandlerClass = RequestHandlerClass
        ...
    
    def serve_forever(self, poll_interval=0.5):
        with selectors.SelectSelector() as selector:
            selector.register(self, selectors.EVENT_READ)

            while not self.__shutdown_request:
                ready = selector.select(poll_interval)
                if ready:
                    self._handle_request_noblock()

    def _handle_request_noblock(self):
        try:
            request, client_address = self.get_request()  # implemented by subclasses
        except OSError:
            return

        self.RequestHandlerClass(request, client_address, self)  # layering: the request handler class does the actual work
        self.close_request(request)  # implemented by subclasses

TCPServer implements bind, listen and accept for the TCP protocol:

class TCPServer(BaseServer):
    
    address_family = socket.AF_INET

    socket_type = socket.SOCK_STREAM
    
    request_queue_size = 5

    def __init__(self, server_address, RequestHandlerClass, bind_and_activate=True):
        """Constructor.  May be extended, do not override."""
        BaseServer.__init__(self, server_address, RequestHandlerClass)
        self.socket = socket.socket(self.address_family,
                                    self.socket_type)
        if bind_and_activate:
            try:
                self.server_bind()
                self.server_activate()
            except:
                self.server_close()
                raise
                
    def server_bind(self):
        self.socket.bind(self.server_address)
        self.server_address = self.socket.getsockname()

    def server_activate(self):
        self.socket.listen(self.request_queue_size)
    
    def get_request(self):
        return self.socket.accept()

4.2 ThreadingMixIn

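ThreadingMixIn is also important: it shows how to serve requests with multiple threads. In the full source, _handle_request_noblock reaches the handler through process_request, and the mix-in overrides that method to hand each request to a new thread: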

class ThreadingMixIn:
    """Mix-in class to handle each request in a new thread."""

    def process_request_thread(self, request, client_address):
        try:
            self.finish_request(request, client_address)  # back to the parent class's standard implementation
        except Exception:
            self.handle_error(request, client_address)
        finally:
            self.shutdown_request(request)

    def process_request(self, request, client_address):
        """Start a new thread to process the request."""
        t = threading.Thread(target = self.process_request_thread,
                             args = (request, client_address))  # handle the request in a new thread
        t.daemon = self.daemon_threads
        t.start()

4.3 RequestHandler

Request logic is handled by a RequestHandler. The base class, BaseRequestHandler, defines the main flow of request processing, setup -> handle -> finish:

class BaseRequestHandler:
    def __init__(self, request, client_address, server):
        self.request = request
        self.client_address = client_address
        self.server = server
        self.setup()  # overridden by subclasses
        try:
            self.handle()  # overridden by subclasses
        finally:
            self.finish()  # overridden by subclasses

StreamRequestHandler handles the TCP case; it mainly wraps the connection (a socket):

class StreamRequestHandler(BaseRequestHandler):
    rbufsize = -1
    wbufsize = 0 
    
    def setup(self):
        self.connection = self.request
        self.rfile = self.connection.makefile('rb', self.rbufsize)  # wrap reads
        if self.wbufsize == 0:
            self.wfile = _SocketWriter(self.connection)  # wrap writes
        ...

    def finish(self):
        if not self.wfile.closed:
            self.wfile.flush()
        self.wfile.close()
        self.rfile.close()

class _SocketWriter(BufferedIOBase):

    def __init__(self, sock):
        self._sock = sock

    def write(self, b):  # write data
        self._sock.sendall(b)
        with memoryview(b) as view:
            return view.nbytes

The most important method, handle, is left blank for the application to implement.
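
To show what filling in handle looks like, here is a minimal sketch. UpperCaseHandler and ask_server are hypothetical names; the server binds to port 0 so the operating system picks a free port, and it runs in a background thread only for the length of one request.

```python
import socket
import socketserver
import threading


class UpperCaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical handler: reads one line and writes it back upper-cased."""

    def handle(self):
        line = self.rfile.readline()    # rfile was prepared by setup()
        self.wfile.write(line.upper())  # wfile writes to the client socket


def ask_server(message: bytes) -> bytes:
    # Port 0 asks the OS for a free port; server_bind() stores the real address
    with socketserver.ThreadingTCPServer(("127.0.0.1", 0), UpperCaseHandler) as server:
        worker = threading.Thread(target=server.serve_forever, daemon=True)
        worker.start()
        try:
            with socket.create_connection(server.server_address) as client:
                client.sendall(message + b"\n")
                with client.makefile("rb") as reader:
                    return reader.readline().rstrip(b"\n")
        finally:
            server.shutdown()  # ask serve_forever() to stop
            worker.join()
```

Everything else (bind, listen, accept, the per-request thread, wrapping the socket) comes from the classes read above; the application only supplies handle.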

5. http-server

Having passed the three stages of socket, selector and TCPServer, we finally reach our topic, http-server. A simple HTTP service can be started like this:

python3 -m http.server
Serving HTTP on :: port 8000 (http://[::]:8000/) ...

I condensed the startup process into the following code:

HandlerClass = SimpleHTTPRequestHandler  # the RequestHandler class
HandlerClass.protocol_version = "HTTP/1.0"  # HTTP protocol version
with ThreadingHTTPServer(addr, HandlerClass) as httpd:  # create the HTTP server
    host, port = httpd.socket.getsockname()[:2]
    url_host = f'[{host}]' if ':' in host else host
    print(
        f"Serving HTTP on {host} port {port} "
        f"(http://{url_host}:{port}/) ..."
    )
    httpd.serve_forever()  # start serving

Thanks to the solid encapsulation underneath, ThreadingHTTPServer (available since Python 3.7) is very simple and needs little explanation:

class HTTPServer(socketserver.TCPServer):

    def server_bind(self):
        """Override server_bind to store the server name."""
        socketserver.TCPServer.server_bind(self)
        host, port = self.server_address[:2]
        self.server_name = socket.getfqdn(host)
        self.server_port = port

class ThreadingHTTPServer(socketserver.ThreadingMixIn, HTTPServer):
    daemon_threads = True  # handler threads run as daemon threads

5.1 HTTPRequestHandler

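From HTTPServer we can tell that the HTTP service is mostly implemented in SimpleHTTPRequestHandler. First look at its parent class, BaseHTTPRequestHandler:

class BaseHTTPRequestHandler(socketserver.StreamRequestHandler):  # inherits from StreamRequestHandler; unsurprising, since HTTP runs on top of TCP, so request handling is stream-based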
    
    def handle(self):
        """Handle multiple requests if necessary."""
        self.close_connection = True

        self.handle_one_request()  # handle one request
        while not self.close_connection:  # for keep-alive and similar cases
            self.handle_one_request()

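The core of BaseHTTPRequestHandler is implementing the handle method that StreamRequestHandler leaves empty. The name handle_one_request reflects a trait of HTTP: each request is handled as a one-off exchange.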

def handle_one_request(self):
    self.raw_requestline = self.rfile.readline(65537)  # read the request line
    self.parse_request()  # parse the request line and headers
    mname = 'do_' + self.command
    method = getattr(self, mname)
    method()  # dispatch to the handler for the HTTP method
    self.wfile.flush()  # send the response
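
Because dispatch goes through 'do_' + self.command, supporting an HTTP method amounts to defining a do_<METHOD> method. The sketch below assumes Python 3.7 or later (for ThreadingHTTPServer); HelloHandler and fetch_once are made-up names, and the server again binds to port 0 and lives only for one request.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):  # found by getattr(self, 'do_' + 'GET')
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet; the default logs to stderr


def fetch_once():
    with ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler) as httpd:
        worker = threading.Thread(target=httpd.serve_forever, daemon=True)
        worker.start()
        try:
            conn = http.client.HTTPConnection("127.0.0.1", httpd.server_port, timeout=5)
            conn.request("GET", "/")
            response = conn.getresponse()
            result = response.status, response.read()
            conn.close()
            return result
        finally:
            httpd.shutdown()
            worker.join()
```

A request with a method the handler does not define would fail the getattr lookup; the full source turns that case into an error response.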

Before reading further, it helps to look briefly at the HTTP protocol by using curl to hit our HTTP service:

curl -v http://127.0.0.1:8000
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8000 (#0)
> GET / HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.64.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.8.5
< Date: Wed, 27 Jan 2021 11:03:08 GMT
< Content-type: text/html; charset=utf-8
< Content-Length: 1570
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Directory listing for /</title>
</head>
<body>
<h1>Directory listing for /</h1>
<hr>
<ul>
...
</ul>
<hr>
</body>
</html>
* Closing connection 0

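In the log, lines starting with > are the request and lines starting with < are the response. For the details of the HTTP protocol, see reference link 2. A diagram of an HTTP request:

[Figure: HTTP_Request]

Combining the diagram with the log, the first line of our request is GET / HTTP/1.1, followed by some header lines. Back to the code, here is the implementation of parse_request: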

def parse_request(self):
    self.raw_requestline = self.rfile.readline(65537)
    
    # validate the request line
    requestline = str(self.raw_requestline, 'iso-8859-1')
    requestline = requestline.rstrip('\r\n')
    self.requestline = requestline
    words = requestline.split()  # split on whitespace
    if not 2 <= len(words) <= 3:  # length check: method, path and an optional version
        return False

    if len(words) >= 3:  # Enough to determine protocol version
        version = words[-1]
        if not version.startswith('HTTP/'):
            raise ValueError
                    
        base_version_number = version.split('/', 1)[1]
        version_number = base_version_number.split(".")
        if len(version_number) != 2:
            raise ValueError
            ...
    
    # separate the method and the path
    command, path = words[:2]
    self.command, self.path = command, path

    # parse the HTTP headers
    self.headers = http.client.parse_headers(self.rfile, _class=self.MessageClass)
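
The request-line checks can be condensed into a standalone sketch. split_request_line is a hypothetical helper, not part of http.server; it assumes, as the real handler does, that a two-word request line means HTTP/0.9.

```python
def split_request_line(raw: bytes):
    """Hypothetical helper mirroring the request-line checks in parse_request."""
    words = str(raw, "iso-8859-1").rstrip("\r\n").split()
    if not 2 <= len(words) <= 3:
        raise ValueError("bad request line: %r" % raw)
    # A request line without a version is treated as HTTP/0.9
    version = words[2] if len(words) == 3 else "HTTP/0.9"
    if not version.startswith("HTTP/"):
        raise ValueError("bad version: %r" % version)
    major, minor = version.split("/", 1)[1].split(".")
    return words[0], words[1], (int(major), int(minor))
```

For the curl request shown earlier, the input b"GET / HTTP/1.1\r\n" yields the method GET, the path / and the version pair (1, 1).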

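Working backwards from the code, the first line of an HTTP request must be a space-separated triple: the HTTP method, the path and the protocol version, where the version consists of the HTTP/ prefix followed by a version number. Next, the remaining request headers are parsed: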

# http/client.py

def parse_headers(fp, _class=HTTPMessage):
    headers = []
    while True:
        line = fp.readline(_MAXLINE + 1)  # keep reading lines
        if len(line) > _MAXLINE:
            raise LineTooLong("header line")
        headers.append(line)
        if len(headers) > _MAXHEADERS:
            raise HTTPException("got more than %d headers" % _MAXHEADERS)
        if line in (b'\r\n', b'\n', b''):  # a blank line ends the headers
            break
    hstring = b''.join(headers).decode('iso-8859-1')
    return email.parser.Parser(_class=_class).parsestr(hstring)  # parse and wrap the headers
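
parse_headers only needs an object with readline, so it can be tried on an in-memory buffer. In this sketch read_headers is a made-up wrapper and the header values are taken from the curl log above.

```python
import io
import http.client


def read_headers(raw: bytes):
    """Parse a header block from bytes and look up a few fields."""
    message = http.client.parse_headers(io.BytesIO(raw))
    # Lookups are case-insensitive; get() returns None for a missing header
    return message["Host"], message.get("user-agent"), message.get("Missing")
```

Reading stops at the first blank line, so anything after it (the request body) stays in the stream for the handler to read.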

The HTTP methods themselves are implemented in the subclass SimpleHTTPRequestHandler:

    def do_GET(self):
        """Serve a GET request."""
        f = self.send_head()
        if f:
            try:
                self.copyfile(f, self.wfile)  # copy the result to the socket output
            finally:
                f.close()
    
    def send_head(self):
        ...
        path = self.translate_path(self.path)
        f = None
        if os.path.isdir(path):
            ...
            return self.list_directory(path)
        ...

Rendering the directory listing:

    def list_directory(self, path):
        list = os.listdir(path)
        r = []
        displaypath = urllib.parse.unquote(self.path,
                                           errors='surrogatepass')
        enc = sys.getfilesystemencoding()
        title = 'Directory listing for %s' % displaypath
        r.append('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
                 '"http://www.w3.org/TR/html4/strict.dtd">')
        r.append('<html>\n<head>')
        r.append('<meta http-equiv="Content-Type" '
                 'content="text/html; charset=%s">' % enc)
        r.append('<title>%s</title>\n</head>' % title)
        r.append('<body>\n<h1>%s</h1>' % title)
        r.append('<hr>\n<ul>')
        for name in list:
            fullname = os.path.join(path, name)
            displayname = linkname = name
            # Append / for directories or @ for symbolic links
            if os.path.isdir(fullname):
                displayname = name + "/"
                linkname = name + "/"
            if os.path.islink(fullname):
                displayname = name + "@"
                # Note: a link to a directory displays with @ and links with /
            r.append('<li><a href="%s">%s</a></li>'
                    % (urllib.parse.quote(linkname,
                                          errors='surrogatepass'),
                       html.escape(displayname, quote=False)))
        r.append('</ul>\n<hr>\n</body>\n</html>\n')
        encoded = '\n'.join(r).encode(enc, 'surrogateescape')
        
        f = io.BytesIO()
        f.write(encoded)
        f.seek(0)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html; charset=%s" % enc)
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        return f

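As we can see, list_directory reads the directory, converts it to HTML text and returns it. It also sets the HTTP headers; the response helpers look like this: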

def send_response(self, code, message=None):
    self.send_response_only(code, message)
    self.send_header('Server', self.version_string())
    self.send_header('Date', self.date_time_string())

def send_response_only(self, code, message=None):
    """Send the response header only."""
    message = self.responses[code][0]
    self._headers_buffer.append(("%s %d %s\r\n" %
                (self.protocol_version, code, message)).encode(
                    'latin-1', 'strict'))  # three-part status line: HTTP/1.0 200 OK
                        
def send_header(self, keyword, value):
    """Send a MIME header to the headers buffer."""
    if self.request_version != 'HTTP/0.9':
        if not hasattr(self, '_headers_buffer'):
            self._headers_buffer = []
        self._headers_buffer.append(
            ("%s: %s\r\n" % (keyword, value)).encode('latin-1', 'strict'))

def end_headers(self):
    """Send the blank line ending the MIME headers."""
    if self.request_version != 'HTTP/0.9':
        self._headers_buffer.append(b"\r\n")  # blank line separating the headers from the body
        self.flush_headers()

def flush_headers(self):
    if hasattr(self, '_headers_buffer'):
        self.wfile.write(b"".join(self._headers_buffer))  # write the header data
        self._headers_buffer = []
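
Taken together, these helpers build the response head as a list of encoded lines and write it in one go. The sketch below reproduces that byte layout in a single function; build_response_head is a hypothetical name, not an API of http.server.

```python
def build_response_head(code, phrase, headers, protocol="HTTP/1.0"):
    """Hypothetical helper: status line, header lines, then a blank line."""
    buffer = [("%s %d %s\r\n" % (protocol, code, phrase)).encode("latin-1", "strict")]
    for keyword, value in headers:
        buffer.append(("%s: %s\r\n" % (keyword, value)).encode("latin-1", "strict"))
    buffer.append(b"\r\n")  # the blank line that ends the headers
    return b"".join(buffer)
```

Buffering the lines and writing once avoids sending each header as its own small packet.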

The HTTP status table is generated like this:

responses = {
        v: (v.phrase, v.description)
        for v in HTTPStatus.__members__.values()
    }  # dict comprehension

class HTTPStatus(IntEnum):
    
    def __new__(cls, value, phrase, description=''):
        obj = int.__new__(cls, value)
        obj._value_ = value

        obj.phrase = phrase
        obj.description = description
        return obj
    
    OK = 200, 'OK', 'Request fulfilled, document follows'
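
Since HTTPStatus is an IntEnum with extra attributes, one member carries both the numeric code and its text. A short sketch, where status_line is a made-up helper:

```python
from http import HTTPStatus


def status_line(code: int, protocol: str = "HTTP/1.0") -> str:
    """Look up the reason phrase for a numeric status code."""
    status = HTTPStatus(code)  # raises ValueError for an unknown code
    return "%s %d %s" % (protocol, status.value, status.phrase)
```

This is why send_response(HTTPStatus.OK) works in list_directory: the member behaves as the integer 200 and also supplies the phrase OK.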

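That covers the main flow of http-server. Many implementation details remain and can be studied when a concrete problem calls for it.

Tips

The mix-in pattern organizes implementation classes well. Below, pairwise combinations produce six implementation classes: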

if hasattr(os, "fork"):
    class ForkingUDPServer(ForkingMixIn, UDPServer): pass
    class ForkingTCPServer(ForkingMixIn, TCPServer): pass

class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass
class ThreadingTCPServer(ThreadingMixIn, TCPServer): pass

if hasattr(socket, 'AF_UNIX'):

    class UnixStreamServer(TCPServer):
        address_family = socket.AF_UNIX

    class UnixDatagramServer(UDPServer):
        address_family = socket.AF_UNIX

    class ThreadingUnixStreamServer(ThreadingMixIn, UnixStreamServer): pass

    class ThreadingUnixDatagramServer(ThreadingMixIn, UnixDatagramServer): pass
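
The mechanism behind these one-line classes is method resolution order: the mix-in is listed first, so its method runs first and can delegate with super(). A toy sketch with made-up class names:

```python
class Base:
    def process(self, item):
        return "handled %s" % item


class LoggingMixIn:
    """Hypothetical mix-in: records each item, then defers to the next class."""

    def process(self, item):
        self.log.append(item)
        return super().process(item)  # resolves to Base.process via the MRO


class LoggingServer(LoggingMixIn, Base):
    def __init__(self):
        self.log = []
```

ThreadingMixIn follows the same idea: it overrides process_request and leaves everything else to the server class listed after it.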

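Use an abstract base class, declared through the ABCMeta metaclass, to force subclasses to implement the required methods:

class BaseSelector(metaclass=ABCMeta):  # ABCMeta is the metaclass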
    @abstractmethod
    def register(self, fileobj, events, data=None):
        raise NotImplementedError
    
    @abstractmethod
    def unregister(self, fileobj):
        raise NotImplementedError
    
    @abstractmethod
    def select(self, timeout=None):
        raise NotImplementedError
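
The enforcement happens at instantiation time: a subclass that leaves an abstract method unimplemented cannot be instantiated. A minimal sketch with hypothetical classes:

```python
from abc import ABCMeta, abstractmethod


class Shape(metaclass=ABCMeta):
    @abstractmethod
    def area(self):
        raise NotImplementedError


class Incomplete(Shape):
    pass  # area() is not implemented


class UnitSquare(Shape):
    def area(self):
        return 1


def can_instantiate(cls) -> bool:
    try:
        cls()
        return True
    except TypeError:  # raised when abstract methods remain
        return False
```

Here can_instantiate(Incomplete) is False while can_instantiate(UnitSquare) is True, which is the guarantee BaseSelector gives for register, unregister and select.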

References:

  1. realpython.com/python-sock…
  2. developer.mozilla.org/zh-CN/docs/…