Tornado 1.0 Source Code Analysis: Web Framework

Date: 2019-11-15


#Web Framework

Author: MetalBug
Date: 2015-03-02
Source: http://my.oschina.net/u/247728/blog
Notice: All rights reserved; infringement will be pursued.

tornado.web — RequestHandler and Application classes

A Tornado web application maps URLs or URL patterns to subclasses of RequestHandler. Each subclass defines methods such as get() or post() to handle the corresponding HTTP request methods.

Here is an example:

import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("You requested the main page")

# Map the root URL to MainHandler.
application = tornado.web.Application([(r"/", MainHandler),])
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(8080)
tornado.ioloop.IOLoop.instance().start()

MainHandler inherits from RequestHandler and overrides get(). The Application maps it to the URL /, so a GET request to host:/ returns the string "You requested the main page".

##1.Application##

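Application holds the mapping from URLs to their corresponding handlers (subclasses of RequestHandler). It defines __call__, so it can be passed to HTTPServer as the request_callback. When a client requests a URL, the matching handler is invoked.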

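###Internals: Data Structures###

self.transforms is used to chunk and compress the output.

self.handlers is the list of per-host routes. Each element is a (host, URLSpec objects) pair.

self.named_handlers is a dictionary that maps a name to its handler. It is used for the reverse lookup in reverse_url.

self.settings holds the configuration, for example static_path and static_url_prefix.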

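###Internals: Main Functions###

Application.__init__()

Initializes the Application. It mainly does the following:

1. Initializes self.transforms. By default this contains ChunkedTransferEncoding, preceded by GZipContentEncoding when the gzip setting is enabled.
2. Initializes self.handlers: the static file routes are set up first, then the supplied routing rules are added.
3. If debug mode is enabled (and the application is not running under WSGI), starts autoreload.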

def __init__(self, handlers=None, default_host="", transforms=None,
             wsgi=False, **settings):
    if transforms is None:
        self.transforms = []
        if settings.get("gzip"):
            self.transforms.append(GZipContentEncoding)
        self.transforms.append(ChunkedTransferEncoding)
    else:
        self.transforms = transforms
    ######
    if self.settings.get("static_path"):
        path = self.settings["static_path"]
        handlers = list(handlers or [])
        static_url_prefix = settings.get("static_url_prefix",
                                         "/static/")
        handlers = [
            (re.escape(static_url_prefix) + r"(.*)", StaticFileHandler,
             dict(path=path)),
            (r"/(favicon\.ico)", StaticFileHandler, dict(path=path)),
            (r"/(robots\.txt)", StaticFileHandler, dict(path=path)),
        ] + handlers
    if handlers: self.add_handlers(".*$", handlers)
    ####
    if self.settings.get("debug") and not wsgi:
        import autoreload
        autoreload.start()
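
The route ordering set up in the constructor can be illustrated in isolation. The following is only a sketch under simplifying assumptions: `build_routes` is a hypothetical helper, handlers are plain strings instead of classes, and the path `/var/www/static` is made up. It shows that the static file routes are placed in front of the caller's routes.

```python
import re

def build_routes(handlers, settings):
    """Return the route list with the static-file routes placed first.

    Hypothetical helper: handlers are plain strings here, whereas
    Tornado uses handler classes.
    """
    handlers = list(handlers or [])
    if settings.get("static_path"):
        prefix = settings.get("static_url_prefix", "/static/")
        static_patterns = [
            re.escape(prefix) + r"(.*)",
            r"/(favicon\.ico)",
            r"/(robots\.txt)",
        ]
        # Static routes come first, so they are tried before user routes.
        handlers = [(p, "StaticFileHandler") for p in static_patterns] + handlers
    return handlers

# Assumed example settings; the path is made up for illustration.
routes = build_routes([(r"/", "MainHandler")],
                      {"static_path": "/var/www/static"})
for pattern, handler in routes:
    print(pattern, handler)
```

Because the list is searched in order, a request for a path under the static prefix is served by the static handler even if a user route would also match it.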

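Application.add_handlers()

Application.add_handlers() adds routing rules to self.handlers.

self.handlers is the list of per-host routes. Each element is a tuple containing a host pattern and a list of routes (URLSpec objects).

Application.add_handlers() first combines host_pattern (the host name pattern) and handlers (the route list) into a tuple, then adds that tuple to self.handlers.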

def add_handlers(self, host_pattern, host_handlers):
    ####
    if self.handlers and self.handlers[-1][0].pattern == '.*$':
        self.handlers.insert(-1, (re.compile(host_pattern), handlers))
    else:
        self.handlers.append((re.compile(host_pattern), handlers))

    for spec in host_handlers:
        if spec.name:
    ####
            self.named_handlers[spec.name] = spec
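
The insertion rule in the excerpt above can be tried out on its own. This is a minimal sketch, not Tornado's code: `add_host_group` and `specs_for_host` are hypothetical names, and the route specs are plain strings. It shows that a host group added after the wildcard `.*$` group still ends up in front of it.

```python
import re

def add_host_group(host_groups, host_pattern, specs):
    """Add a (compiled host regex, specs) pair, keeping '.*$' last."""
    entry = (re.compile(host_pattern), list(specs))
    if host_groups and host_groups[-1][0].pattern == ".*$":
        # A wildcard group already exists: insert just before it.
        host_groups.insert(-1, entry)
    else:
        host_groups.append(entry)

def specs_for_host(host_groups, host):
    """Return the specs of the first group whose pattern matches host."""
    for pattern, specs in host_groups:
        if pattern.match(host):
            return specs
    return None

groups = []
add_host_group(groups, ".*$", ["default-handler"])
add_host_group(groups, r"www\.example\.com$", ["example-handler"])

print([p.pattern for p, _ in groups])
print(specs_for_host(groups, "www.example.com"))
print(specs_for_host(groups, "other.host"))
```

If the wildcard group were not kept last, it would match every host first and the more specific group would never be reached.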

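Application.__call__()

Application defines __call__() so that its instances can be called and used as the request_callback of HTTPServer. The function proceeds as follows:

1. Instantiates each entry of self.transforms with the request. These transforms will chunk and compress the outgoing data.
2. Uses the request's host to obtain the route list, then matches request.path against each entry in turn to find the handler, extracting the path parameters (match.groups()) along the way.
3. The matched handler is a RequestHandler object. Its _execute() method is called, which invokes the function that corresponds to the HTTP method.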

def __call__(self, request):
    transforms = [t(request) for t in self.transforms]
    ####
    handlers = self._get_host_handlers(request)
    ####
    for spec in handlers:
        match = spec.regex.match(request.path)
        if match:
            handler = spec.handler_class(self, request, **spec.kwargs)
            kwargs=dict((k, unquote(v)) for (k, v) in match.groupdict().iteritems())
            args=[unquote(s) for s in match.groups()]
            break
    if not handler:
        handler = ErrorHandler(self, request, 404)
    ####
    handler._execute(transforms, *args, **kwargs)
    return handler
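
The matching step can be sketched without Tornado. The example below is an illustration under assumptions: `match_route` is a hypothetical function, handlers are plain strings, and it uses Python 3's `urllib.parse.unquote` rather than the Python 2 `unquote` of the excerpt. It mirrors how positional and named groups are URL-decoded before being passed on.

```python
import re
from urllib.parse import unquote

def match_route(routes, path):
    """Return (handler_name, args, kwargs) for the first matching route.

    Returns (None, [], {}) when nothing matches; a caller would then
    fall back to a 404 handler.
    """
    for regex, handler_name in routes:
        match = regex.match(path)
        if match:
            kwargs = {k: unquote(v) for k, v in match.groupdict().items()}
            args = [unquote(s) for s in match.groups()]
            return handler_name, args, kwargs
    return None, [], {}

# Hypothetical routes, used only for illustration.
routes = [
    (re.compile(r"/user/(\d+)$"), "UserHandler"),
    (re.compile(r"/tag/(?P<name>[^/]+)$"), "TagHandler"),
]

print(match_route(routes, "/user/42"))
print(match_route(routes, "/tag/hello%20world"))
print(match_route(routes, "/missing"))
```

Note that `match.groups()` also includes named groups, so in this sketch (as in the excerpt) a named group shows up both in `args` and in `kwargs`.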

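###Internals: Details###

During Application initialization, add_handlers(".*$", handlers) is called.

Here .* serves as the default host pattern. Because .* matches any string, the route list passed to the constructor becomes the default route list.

For the same reason, Application.add_handlers() has to make sure that this entry stays at the end of the list, so that more specific host patterns are tried first.

Why does Application define __call__()?

The following describes __call__(). It is similar to a C++ functor and is mainly useful when internal state has to be kept.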

__call__(self, [args...]) Allows an instance of a class to be called as a function. Essentially, this means that x() is the same as x.__call__(). Note that __call__ takes a variable number of arguments; this means that you define __call__ as you would any other function, taking however many arguments you'd like it to. __call__ can be particularly useful in classes whose instances that need to often change state.

For the current Application, however, __call__() serves no special purpose. Passing an ordinary method such as self.callback would work just as well.
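
A small sketch makes the point concrete. `CountingApp` and `serve` are hypothetical stand-ins for Application and HTTPServer; the only assumption is that the server needs nothing more than a callable. Both the instance itself and one of its bound methods satisfy that, and both keep access to the instance state.

```python
class CountingApp:
    """Hypothetical stand-in for Application: callable and stateful."""

    def __init__(self):
        self.request_count = 0

    def __call__(self, request):
        self.request_count += 1
        return "handled %s" % request

def serve(request_callback, requests):
    """Hypothetical stand-in for HTTPServer: it only needs a callable."""
    return [request_callback(r) for r in requests]

app = CountingApp()
results = serve(app, ["/a", "/b"])        # the instance is the callback
via_method = serve(app.__call__, ["/c"])  # a bound method works as well

print(results, via_method, app.request_count)
```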

##2.RequestHandler##

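In Application.__call__(), the RequestHandler exposes _execute() to the Application. This function implements the actual dispatching and handling of the HTTP request. In practice, we subclass RequestHandler and override get(), post(), and so on to handle HTTP requests.

###Internals: Data Structures###

self.request is the request (HTTPRequest) that the RequestHandler has to handle.

self._auto_finish is used to support asynchronous handling.

###Internals: Main Functions###

RequestHandler._execute()

RequestHandler._execute() calls the function that corresponds to the HTTP request method. The main flow is:

1. If the request is a POST and the xsrf check is enabled, verifies the xsrf cookie first.
2. Calls self.prepare(), which subclasses override to do any preparation before the request is handled.
3. Calls the handler function that matches the HTTP method.
4. If self._auto_finish is True, calls self.finish() to end the request.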

def _execute(self, transforms, *args, **kwargs):
    self._transforms = transforms
    try:
        if self.request.method not in self.SUPPORTED_METHODS:
            raise HTTPError(405)
        if self.request.method == "POST" and \
           self.application.settings.get("xsrf_cookies"):
            self.check_xsrf_cookie()
        self.prepare()
        if not self._finished:
            getattr(self, self.request.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()
    except Exception, e:
        self._handle_request_exception(e)
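
The dispatch by method name can be reproduced in a few lines. The following is a simplified sketch, not the Tornado class: `MiniHandler` and `MethodNotAllowed` are hypothetical, the xsrf check and exception handling are left out, and calls are recorded in a list so the order is visible.

```python
class MethodNotAllowed(Exception):
    """Hypothetical stand-in for HTTPError(405)."""

class MiniHandler:
    SUPPORTED_METHODS = ("GET", "HEAD", "POST", "DELETE", "PUT")

    def __init__(self, method):
        self.method = method
        self.calls = []
        self._finished = False
        self._auto_finish = True

    def prepare(self):
        self.calls.append("prepare")

    def get(self, *args):
        self.calls.append("get:" + ",".join(args))

    def finish(self):
        self.calls.append("finish")
        self._finished = True

    def execute(self, *args, **kwargs):
        if self.method not in self.SUPPORTED_METHODS:
            raise MethodNotAllowed(self.method)
        self.prepare()
        if not self._finished:
            # "GET" -> self.get, "POST" -> self.post, and so on.
            getattr(self, self.method.lower())(*args, **kwargs)
            if self._auto_finish and not self._finished:
                self.finish()

sync_handler = MiniHandler("GET")
sync_handler.execute("42")
print(sync_handler.calls)

# With _auto_finish switched off, finish() is left to the handler itself,
# which is what an asynchronous handler relies on.
async_handler = MiniHandler("GET")
async_handler._auto_finish = False
async_handler.execute()
print(async_handler.calls)
```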

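RequestHandler.finish()

RequestHandler.finish() performs the work that follows the business logic. It mainly does the following cleanup:

1. Sets the response headers.
2. Calls self.flush() to send the buffered data out through IOStream.
3. Ends the request (self.request.finish()), which closes the connection.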

def finish(self, chunk=None):
    if chunk is not None: self.write(chunk)
    if not self._headers_written:
        pass  # the response headers are set here (elided)
    if hasattr(self.request, "connection"):
        self.request.connection.stream.set_close_callback(None)
    if not self.application._wsgi:
        self.flush(include_footers=True)
        self.request.finish()
        self._log()
    self._finished = True

RequestHandler.flush()

RequestHandler.flush() first passes the buffered data through the transforms for chunking and compression, then sends it to the client.

def flush(self, include_footers=False):
    if self.application._wsgi:
        raise Exception("WSGI applications do not support flush()")
    chunk = "".join(self._write_buffer)
    self._write_buffer = []
    if not self._headers_written:
        self._headers_written = True
        for transform in self._transforms:
            self._headers, chunk = transform.transform_first_chunk(
                self._headers, chunk, include_footers)
        headers = self._generate_headers()
    else:
        for transform in self._transforms:
            chunk = transform.transform_chunk(chunk, include_footers)
        headers = ""

    if self.request.method == "HEAD":
        if headers: self.request.write(headers)
        return

    if headers or chunk:
        self.request.write(headers + chunk)
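
To give an idea of what a chunking transform does to each flushed block, here is a minimal sketch of HTTP/1.1 chunked transfer encoding. `chunk_encode` is a hypothetical function written for this illustration, not Tornado's ChunkedTransferEncoding, and it works on bytes as Python 3 requires.

```python
def chunk_encode(block, finishing):
    """Frame one block using HTTP/1.1 chunked transfer encoding.

    Each non-empty block becomes "<hex length>\\r\\n<data>\\r\\n".
    When finishing is true, the terminating zero-length chunk is added.
    """
    out = b""
    if block:
        out += ("%x" % len(block)).encode("ascii") + b"\r\n" + block + b"\r\n"
    if finishing:
        out += b"0\r\n\r\n"
    return out

# Two flushes: the second one also ends the response.
body = chunk_encode(b"hello", False) + chunk_encode(b", world", True)
print(body)
```

The `finishing` flag plays the role that `include_footers` plays in the excerpt: only the last flush emits the terminating chunk.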

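###Internals: Details###

RequestHandler.finish() sets the close callback of self.request.connection.stream (close_callback below) to None. Since the request has ended, clearing close_callback prevents the RequestHandler from being reclaimed late. If it were not cleared and the connection were a keep-alive one, the RequestHandler would not be garbage collected after a request ended, because close_callback would still be bound to the connection's stream.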

def finish(self, chunk=None):
    ####
    if hasattr(self.request, "connection"):
        # Now that the request is finished, clear the callback we
        # set on the IOStream (which would otherwise prevent the
        # garbage collection of the RequestHandler when there
        # are keepalive connections)
        self.request.connection.stream.set_close_callback(None)
    if not self.application._wsgi:
        self.flush(include_footers=True)
        self.request.finish()
        self._log()
    self._finished = True
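
The effect described in that comment can be observed with a weak reference. This is a sketch with hypothetical classes (`FakeStream`, `FakeHandler`), and it relies on CPython's reference counting: a long-lived stream that stores a bound method of the handler keeps the handler alive until the callback is cleared.

```python
import gc
import weakref

class FakeStream:
    """Hypothetical long-lived stream, e.g. a keep-alive connection."""

    def __init__(self):
        self._close_callback = None

    def set_close_callback(self, callback):
        self._close_callback = callback

class FakeHandler:
    def on_connection_close(self):
        pass

def handler_survives(clear_callback):
    """Report whether the handler is still alive after we drop it."""
    stream = FakeStream()
    handler = FakeHandler()
    # The bound method holds a strong reference to the handler.
    stream.set_close_callback(handler.on_connection_close)
    ref = weakref.ref(handler)
    if clear_callback:
        stream.set_close_callback(None)
    del handler
    gc.collect()
    return ref() is not None

print(handler_survives(clear_callback=False))
print(handler_survives(clear_callback=True))
```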

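In the code above, close_callback is first set to None and only then is request.finish() called. Based on the earlier analysis of HTTPRequest and IOStream, _close_callback has already been set to None by that point, so it is not called during request.finish(). Why is it done this way?

The point to note here is that RequestHandler.on_connection_close() and the close callback of IOStream do not mean the same thing.

In RequestHandler, on_connection_close() is meant for the case where the client is detected to have disconnected. It is called in asynchronous handlers and can be used for error handling and similar work.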

def on_connection_close(self):
    """Called in async handlers if the client closed the connection.

    You may override this to clean up resources associated with
    long-lived connections.

    Note that the select()-based implementation of IOLoop does not detect
    closed connections and so this method will not be called until
    you try (and fail) to produce some output.  The epoll- and kqueue-
    based implementations should detect closed connections even while
    the request is idle.
    """
    pass

In IOStream, self._close_callback is called in IOStream.close(), that is, it would be called during request.finish().

def set_close_callback(self, callback):
    """Call the given callback when the stream is closed."""
    self._close_callback = callback

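#Summary

From the analysis of Application and RequestHandler, we can see that the Tornado 1.0 web framework handles a request as follows:

1. The web application creates and initializes a RequestHandler object for each request.
2. The web application calls RequestHandler.prepare(). It is called regardless of the HTTP method used, and subclasses override it.
3. The web application calls the handler function for the HTTP method, such as get(), post(), or put(). If the URL's regular expression contains capture groups, the matched values are passed to the method as arguments.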

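Of course, we can also see that the design of RequestHandler in Tornado 1.0 still has shortcomings. One is the meaning of close_callback discussed above. Another: since prepare() can be overridden to prepare before handling, why doesn't finish() call an on_finish() hook for user-defined cleanup? These points leave room for improvement; see how later versions of Tornado handle them.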
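
The on_finish() idea can be sketched as a plain template-method hook. Everything here is hypothetical (`HookedHandler`, `LoggingHandler`, the recorded event names) and only illustrates the suggestion; it is not how Tornado 1.0 works.

```python
class HookedHandler:
    """Hypothetical handler whose finish() calls an overridable hook."""

    def __init__(self):
        self.events = []
        self._finished = False

    def on_finish(self):
        """Override in subclasses for cleanup; does nothing by default."""

    def finish(self):
        self.events.append("flush")  # stands in for the real flushing
        self._finished = True
        self.on_finish()             # user-defined cleanup runs last

class LoggingHandler(HookedHandler):
    def on_finish(self):
        self.events.append("cleanup")

handler = LoggingHandler()
handler.finish()
print(handler.events)
```

With such a hook, prepare() and on_finish() would bracket the request symmetrically, and subclasses would not need to override finish() itself.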

PS: My own knowledge of web development is fairly limited, so please point out anything I got wrong. Thank you.