【Python】【Web.py】A Detailed Reading of the application.py Module in Python's web.py Framework

Date: 2019-11-29


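This article introduces the application.py module of Python's web.py framework. The author analyzes the web.py source code in depth; readers who need it may find it a useful reference.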

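This article analyzes the code in the application.py module of the web.py library. In short, this module implements a WSGI-compatible interface so that an application can be called by a WSGI application server. WSGI stands for Web Server Gateway Interface; see the WSGI wiki page for details.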
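To make the WSGI contract concrete, here is a minimal sketch written for this article (it is not web.py code, and the name `hello_app` is hypothetical). A WSGI application is a callable that receives the request `environ` dict and a `start_response` callable, reports the status line and headers through `start_response`, and returns an iterable of byte strings as the response body.

```python
def hello_app(environ, start_response):
    """A minimal WSGI application (hypothetical example, not part of web.py)."""
    body = b"Hello, world!"
    status = "200 OK"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # WSGI: report the status and headers first...
    start_response(status, headers)
    # ...then return an iterable of bytes that forms the response body.
    return [body]
```

A server calls `hello_app(environ, start_response)` once per request and writes the returned chunks to the client; the rest of this article shows how web.py builds such a callable for you.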
Using the interface
Using web.py's built-in HTTP server

The following example is the Hello World from the official documentation; code like this usually serves as the application's entry point:

 
```python
import web

urls = ("/.*", "hello")
app = web.application(urls, globals())

class hello:
    def GET(self):
        return 'Hello, world!'

if __name__ == "__main__":
    app.run()
```

The example above shows the most basic building blocks of a web.py application:

A URL routing table
A web.application instance, app
A call to app.run()

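Here, calling app.run() sets up the WSGI interface (through wsgifunc, optionally wrapped in middleware) and starts a built-in HTTP server connected to it. The code is as follows: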

```python
def run(self, *middleware):
    return wsgi.runwsgi(self.wsgifunc(*middleware))
```

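Integrating with a WSGI application server

If your application has to run under a WSGI application server such as uWSGI or gunicorn, the entry-point code is written differently: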

 
```python
import web

class hello:
    def GET(self):
        return 'Hello, world!'

urls = ("/.*", "hello")
app = web.application(urls, globals())
application = app.wsgifunc()
```

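In this setup the application code does not start an HTTP server itself; instead, it exposes a WSGI-compatible interface for the WSGI server to call. web.py implements this interface for us: all you need is application = app.wsgifunc(), and the resulting application variable is the WSGI interface (this will become clear once we have walked through the code).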
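The following sketch shows roughly what a WSGI server does with that variable for each request (servers are typically pointed at it as `module:application`, e.g. `gunicorn myapp:application`, where `myapp` is a hypothetical module name). The stand-in `application` and the helper `call_like_a_server` are hypothetical and greatly simplified; only `wsgiref.util.setup_testing_defaults`, which fills in a plausible `environ`, comes from the standard library.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    # Hypothetical stand-in for the callable returned by app.wsgifunc().
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, world!"]


def call_like_a_server(app, path="/"):
    """Roughly what a WSGI server does for one request (simplified sketch)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, wsgi.input, etc.
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = headers

    result = app(environ, start_response)
    try:
        # The server iterates the returned iterable to get the body.
        body = b"".join(result)
    finally:
        # PEP 3333: call close() if the returned iterable provides it.
        if hasattr(result, "close"):
            result.close()
    return response["status"], response["headers"], body
```

The point is that the server owns the socket and the HTTP parsing; the application only ever sees `environ` and `start_response`.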
Analysis of the WSGI interface implementation

The analysis centers on the following two lines:

```python
app = web.application(urls, globals())
application = app.wsgifunc()
```

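Instantiating web.application

Creating the instance takes two arguments: the URL routing tuple and the result of globals().

A third argument, autoreload, can also be passed. It controls whether Python modules are automatically re-imported, which is handy during debugging; when it is not given, it follows the debug setting in web.config (see the code below). We can ignore it when analyzing the main flow.

The initialization code of the application class is: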

 
```python
class application:
    def __init__(self, mapping=(), fvars={}, autoreload=None):
        if autoreload is None:
            autoreload = web.config.get('debug', False)
        self.init_mapping(mapping)
        self.fvars = fvars
        self.processors = []

        self.add_processor(loadhook(self._load))
        self.add_processor(unloadhook(self._unload))

        if autoreload:
            ...
```

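The autoreload-related code is omitted. Besides storing fvars and creating an empty self.processors list, the remaining code does two main things:

self.init_mapping(mapping): initializes the URL routing map.
self.add_processor(): adds two processors.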

Initializing the URL routing map

```python
def init_mapping(self, mapping):
    self.mapping = list(utils.group(mapping, 2))
```

This function calls a utility function, utils.group, whose effect is as follows:

 
```python
urls = ("/", "Index",
        "/hello/(.*)", "Hello",
        "/world", "World")
```
        

If this is the tuple the user passes in at initialization, then after init_mapping runs:

 
```python
self.mapping = [["/", "Index"],
                ["/hello/(.*)", "Hello"],
                ["/world", "World"]]
```
        

Later, when the framework routes a URL, it walks through this list.
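A minimal sketch of the pairing step, assuming `utils.group(seq, size)` simply cuts the flat tuple into consecutive chunks of `size` items. The `list(...)` call in `init_mapping` suggests web.py's own version returns an iterator; this hypothetical re-implementation returns a list directly.

```python
def group(seq, size):
    """Split seq into consecutive lists of `size` items (hypothetical sketch)."""
    items = list(seq)
    return [items[i:i + size] for i in range(0, len(items), size)]


urls = (
    "/", "Index",
    "/hello/(.*)", "Hello",
    "/world", "World",
)
mapping = group(urls, 2)

# Routing later walks the pairs like this:
for pat, what in mapping:
    pass  # try to match pat against the request path
```

In this sketch an unpaired trailing element ends up in a shorter final chunk, which would make the `for pat, what in mapping` loop fail to unpack it, so an odd-length routing tuple is worth double-checking.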
Adding processors

```python
self.add_processor(loadhook(self._load))
self.add_processor(unloadhook(self._unload))
```

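These two lines add two processors built from self._load and self._unload, each of which is first wrapped (by loadhook and unloadhook respectively). Processors run before and after an HTTP request is handled. They do not handle the request themselves, but they can do extra work; for example, the official tutorial's recipe for adding a session to a sub-application uses a processor: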

 
```python
def session_hook():
    web.ctx.session = session

app.add_processor(web.loadhook(session_hook))
```

Defining and using processors is fairly involved, so they get their own section later.
The wsgifunc function

wsgifunc returns a WSGI-compatible function, and that function internally implements URL routing and related features.

 
```python
def wsgifunc(self, *middleware):
    """Returns a WSGI-compatible function for this application."""
    ...
    for m in middleware:
        wsgi = m(wsgi)

    return wsgi
```

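Apart from the nested function definitions, that is all there is to wsgifunc: each middleware wraps the function produced so far, and if no middleware is given, it simply returns the internally defined wsgi function.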
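To see how the middleware loop composes, here is a small self-contained sketch; `make_header_middleware`, `base_app` and `apply_middleware` are hypothetical names, not web.py APIs. `apply_middleware` uses the same `for m in middleware: wsgi = m(wsgi)` loop as `wsgifunc`, so each middleware wraps the result of the previous one and the last one passed ends up outermost.

```python
def make_header_middleware(name, value):
    """Hypothetical middleware factory: appends one response header."""
    def middleware(app):
        def wrapped(environ, start_response):
            def start_with_header(status, headers, exc_info=None):
                return start_response(status, headers + [(name, value)], exc_info)
            return app(environ, start_with_header)
        return wrapped
    return middleware


def base_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def apply_middleware(wsgi, *middleware):
    # Same loop shape as the end of wsgifunc: each m wraps the previous result,
    # so the LAST middleware passed becomes the OUTERMOST layer.
    for m in middleware:
        wsgi = m(wsgi)
    return wsgi
```

Because `X-A` is added by the inner layer, the outer `X-B` layer receives a header list that already contains `X-A` and appends its own header after it.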
The wsgi function

This function implements the WSGI-compatible interface and also carries out URL routing and related work.

 
```python
def wsgi(env, start_resp):
    # clear threadlocal to avoid inteference of previous requests
    self._cleanup()

    self.load(env)
    try:
        # allow uppercase methods only
        if web.ctx.method.upper() != web.ctx.method:
            raise web.nomethod()

        result = self.handle_with_processors()
        if is_generator(result):
            result = peep(result)
        else:
            result = [result]
    except web.HTTPError, e:
        result = [e.data]

    result = web.safestr(iter(result))

    status, headers = web.ctx.status, web.ctx.headers
    start_resp(status, headers)

    def cleanup():
        self._cleanup()
        yield ''  # force this function to be a generator

    return itertools.chain(result, cleanup())

# (back in the body of wsgifunc)
for m in middleware:
    wsgi = m(wsgi)

return wsgi
```

Let's go through this function in detail:

```python
self._cleanup()
self.load(env)
```

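self._cleanup() internally calls utils.ThreadedDict.clear_all(), which clears all thread-local data. This avoids memory leaks (web.py keeps a lot of its data in thread-local variables) and, as the source comment says, keeps data from a previous request from interfering with the current one.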

self.load(env) initializes web.ctx from the values in env. These fields describe the current request and may be used in application code, for example web.ctx.fullpath.
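The idea behind `_cleanup()` and `load(env)` can be sketched with the standard library's `threading.local`. This is a simplified, hypothetical analogue of `utils.ThreadedDict` and `web.ctx`, not web.py's implementation: the class name `ThreadLocalRegistry`, the `load` helper, and the way `fullpath` is built from `PATH_INFO` and `QUERY_STRING` are all assumptions made for illustration.

```python
import threading


class ThreadLocalRegistry(threading.local):
    """Hypothetical analogue of utils.ThreadedDict (simplified sketch).

    Attributes are stored per thread; clear_all() wipes the calling
    thread's data in every instance that has been created.
    """
    _instances = set()

    def __init__(self):
        # threading.local calls __init__ again in each new thread;
        # adding to a set makes the repeated registration harmless.
        ThreadLocalRegistry._instances.add(self)

    def clear(self):
        self.__dict__.clear()  # only the current thread's attributes

    @classmethod
    def clear_all(cls):
        for inst in list(cls._instances):
            inst.clear()


ctx = ThreadLocalRegistry()  # plays the role of web.ctx in this sketch


def load(env):
    """Hypothetical: copy a few request fields from the WSGI environ into ctx."""
    ctx.method = env.get("REQUEST_METHOD", "GET")
    ctx.path = env.get("PATH_INFO", "/")
    query = env.get("QUERY_STRING", "")
    ctx.fullpath = ctx.path + ("?" + query if query else "")
```

Each thread sees only its own attributes, which is why clearing at the start of every request (as `wsgi` does) keeps one request's data from leaking into the next request handled by the same thread.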

 
```python
try:
    # allow uppercase methods only
    if web.ctx.method.upper() != web.ctx.method:
        raise web.nomethod()

    result = self.handle_with_processors()
    if is_generator(result):
        result = peep(result)
    else:
        result = [result]
except web.HTTPError, e:
    result = [e.data]
```

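The core of this part is the call to self.handle_with_processors(). That function routes the request URL, finds the class or sub-application that should handle the request, and also runs the registered processors to do additional work (processors are covered in their own section later). The result is then handled in one of three ways:

If the result is a generator, it is passed through peep() so that it can be iterated safely.
Any other return value is placed in a one-element list.
If an HTTPError is raised (for example when we return a result with raise web.OK("hello, world")), the exception's data, e.data, is wrapped in a list.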
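The three cases can be sketched as follows. `HTTPError` here is a hypothetical stand-in for `web.HTTPError`, `normalize` is a hypothetical name, and `peep` is my own sketch of the helper the article elides: it pulls the first chunk eagerly, so any code a generator runs before its first `yield` executes before `start_response` is called. The sketch treats only generator objects as streaming results; web.py's own check differs (the `unloadhook` shown later, for instance, tests for a `next` method).

```python
import itertools
import types


class HTTPError(Exception):
    """Hypothetical stand-in for web.HTTPError: carries a body in .data."""
    def __init__(self, data):
        super().__init__(data)
        self.data = data


def peep(iterator):
    """Pull the first chunk now, then return an equivalent iterator (sketch)."""
    try:
        first = next(iterator)
    except StopIteration:
        return iter([])
    return itertools.chain([first], iterator)


def normalize(handler):
    """Turn whatever the handler produces into an iterable of chunks."""
    try:
        result = handler()
        if isinstance(result, types.GeneratorType):
            result = peep(result)   # case 1: streaming result
        else:
            result = [result]       # case 2: any other value
    except HTTPError as e:
        result = [e.data]           # case 3: raised HTTPError
    return result
```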

 
```python
result = web.safestr(iter(result))

status, headers = web.ctx.status, web.ctx.headers
start_resp(status, headers)

def cleanup():
    self._cleanup()
    yield ''  # force this function to be a generator

return itertools.chain(result, cleanup())
```

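The next block applies web.safestr to the result list, converting each of its chunks to a string; this forms the body of the HTTP response. Then, following the WSGI specification, it does two things:

It calls start_resp with the status and headers stored in web.ctx.
It returns the body as an iterator, itertools.chain(result, cleanup()), in which the trailing cleanup() generator clears the thread-local data only after the server has consumed the whole body.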
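The effect of `itertools.chain(result, cleanup())` can be checked in isolation. In this sketch, `state`, `cleanup` and `make_response_iter` are hypothetical names; `cleanup` mirrors the nested generator in `wsgi`. Since a generator's body does not run until it is iterated, the cleanup only happens after the server has pulled every real chunk. One plausible reason for this design is that a lazily generated body may still need the thread-local request data while it is being produced.

```python
import itertools

state = {"cleaned": False}


def cleanup():
    # Mirrors the nested cleanup() in wsgi(): this body runs only when the
    # chain reaches it, i.e. after all real chunks were consumed.
    state["cleaned"] = True
    yield ""  # force this function to be a generator


def make_response_iter(result):
    return itertools.chain(result, cleanup())
```

Note the extra empty chunk produced by `yield ""`, which adds nothing to the response body.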

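Now you can see that application = app.wsgifunc() simply assigns the wsgi function (possibly wrapped by middleware) to the application variable, so that an application server can talk to our application through the WSGI standard.
Handling HTTP requests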

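The code analyzed so far explains how web.py implements its WSGI-compatible interface, that is, how an HTTP request reaches the framework and how the response goes back to the application server. But how does the framework call our application code to handle a request? To answer that, we need to look closely at how processors are added and invoked, which we skipped earlier.
The loadhook and unloadhook decorators

These two functions wrap the actual hook functions in the manner of decorators (although they are called directly rather than applied with the @ syntax). The resulting processors run before request handling (loadhook) and after request handling (unloadhook), respectively.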
loadhook

 
```python
def loadhook(h):
    def processor(handler):
        h()
        return handler()
    return processor
```

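This function returns a function, processor, which makes sure the hook function h that you supplied is called first, and then calls handler, the rest of the processing chain.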
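`loadhook` is ordinary Python, so a runnable copy with a hypothetical hook (`my_hook`) and a hypothetical next step (`next_step`) shows the call order. It also makes one consequence visible: if the hook raises, `handler` is never called, so nothing later in the chain runs.

```python
def loadhook(h):
    # Python 3 copy of the loadhook shown above.
    def processor(handler):
        h()               # 1. the user's hook runs first
        return handler()  # 2. then the next processor (or the real handler)
    return processor


calls = []


def my_hook():
    # Hypothetical hook; in web.py this could be something like session_hook.
    calls.append("hook")


def next_step():
    # Hypothetical stand-in for the rest of the processing chain.
    calls.append("handler")
    return "response"


processor = loadhook(my_hook)
```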
unloadhook

 
```python
def unloadhook(h):
    def processor(handler):
        try:
            result = handler()
            is_generator = result and hasattr(result, 'next')
        except:
            # run the hook even when handler raises some exception
            h()
            raise

        if is_generator:
            return wrap(result)
        else:
            h()
            return result

    def wrap(result):
        def next():
            try:
                return result.next()
            except:
                # call the hook at the and of iterator
                h()
                raise

        result = iter(result)
        while True:
            yield next()

    return processor
```

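This function also returns a processor, but it first calls the handler passed in and only then calls your hook function. If handler raises an exception, the hook still runs before the exception is re-raised; if handler returns an iterator (detected through its next method), the result is wrapped by wrap() so that the hook runs only when the iterator is exhausted or raises.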
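Because `unloadhook` relies on the Python 2 `next` method, here is a Python 3 sketch of the same logic (not web.py's code). It checks for `__next__` instead of `next`, and `wrap` returns when the wrapped iterator raises `StopIteration` instead of re-raising it, since under PEP 479 a `StopIteration` escaping a generator becomes a `RuntimeError`.

```python
def unloadhook(h):
    """Python 3 sketch of the unloadhook shown above (not web.py's code)."""
    def processor(handler):
        try:
            result = handler()
            is_generator = bool(result) and hasattr(result, "__next__")
        except BaseException:
            h()  # run the hook even when the handler raises
            raise
        if is_generator:
            return wrap(result)  # defer h() until the iterator ends
        h()
        return result

    def wrap(result):
        it = iter(result)
        while True:
            try:
                chunk = next(it)
            except StopIteration:
                h()     # iterator finished: now run the hook
                return  # PEP 479: do not let StopIteration escape
            except BaseException:
                h()
                raise
            yield chunk

    return processor
```

For a streaming response, the hook therefore runs only after the last chunk has been consumed.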
The handle_with_processors function

 
```python
def handle_with_processors(self):
    def process(processors):
        try:
            if processors:
                p, processors = processors[0], processors[1:]
                return p(lambda: process(processors))
            else:
                return self.handle()
        except web.HTTPError:
            raise
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            print >> web.debug, traceback.format_exc()
            raise self.internalerror()

    # processors must be applied in the resvere order. (??)
    return process(self.processors)
```

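This function is fairly intricate, and its core is implemented with recursion (I suspect the same behavior could be achieved without recursion). To make things clear, let's walk through an example.

As mentioned earlier, initializing an application instance adds two processors to self.processors: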

```python
self.add_processor(loadhook(self._load))
self.add_processor(unloadhook(self._unload))
```

So self.processors now looks like this:

```python
self.processors = [loadhook(self._load), unloadhook(self._unload)]
```

To simplify the discussion that follows, let's abbreviate it:

```python
self.processors = [load_processor, unload_processor]
```

When the framework runs handle_with_processors, it executes these processors one after another. Let's break the code down, starting with a simplified version of handle_with_processors:

 
```python
def handle_with_processors(self):
    def process(processors):
        try:
            if processors:  # Position 2
                p, processors = processors[0], processors[1:]
                return p(lambda: process(processors))  # Position 3
            else:
                return self.handle()  # Position 4
        except web.HTTPError:
            raise
        # ... other except clauses omitted

    # processors must be applied in the resvere order. (??)
    return process(self.processors)  # Position 1
```

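Execution starts at Position 1, which calls the nested function process(processors).
If the check at Position 2 finds that the processor list is not empty, execution enters the if block.
At Position 3, the processor to run this time is called with a lambda as its argument, and its result is returned.
If the check at Position 2 finds that the list is empty, self.handle() is executed (Position 4); this is the function that actually calls our application code (covered below).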
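The walkthrough can be reproduced with a stripped-down, runnable version of the recursion. This is a sketch: `trace`, `handle`, and the simplified `unloadhook` (which ignores the generator case) are hypothetical, and the comments mark the same Positions 1 to 4.

```python
trace = []


def loadhook(h):
    def processor(handler):
        h()
        return handler()
    return processor


def unloadhook(h):
    # Simplified: ignores the generator case handled by the real unloadhook.
    def processor(handler):
        try:
            return handler()
        finally:
            h()
    return processor


def handle():
    # Stand-in for self.handle(), which would run the user's GET/POST code.
    trace.append("handle")
    return "Hello, world!"


def handle_with_processors(processors):
    def process(processors):
        if processors:                                    # Position 2
            p, processors = processors[0], processors[1:]
            return p(lambda: process(processors))         # Position 3
        return handle()                                   # Position 4
    return process(processors)                            # Position 1


processors = [
    loadhook(lambda: trace.append("_load")),
    unloadhook(lambda: trace.append("_unload")),
]
```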

In our example there are currently two processors:

```python
self.processors = [load_processor, unload_processor]
```

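Entering at Position 1, the check at Position 2 finds that there are processors left to run, so execution reaches Position 3, where the code being executed is: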

```python
return load_processor(lambda: process([unload_processor]))
```

load_processor is a function produced by loadhook, so at run time it behaves like this:

 
```python
def load_processor(handler):  # handler is: lambda: process([unload_processor])
    self._load()
    return handler()          # i.e. process([unload_processor]), the lambda passed in
```

It first runs self._load() and then continues into process, which again reaches Position 3. This time the code being executed is:

```python
return unload_processor(lambda: process([]))
```

unload_processor is a function produced by unloadhook, so at run time it behaves like this:

 
```python
def unload_processor(handler):  # handler is: lambda: process([])
    try:
        result = handler()  # i.e. process([]), the lambda passed in
        is_generator = result and hasattr(result, 'next')
    except:
        # run the hook even when handler raises some exception
        self._unload()
        raise

    if is_generator:
        return wrap(result)  # wrap() is the helper defined inside unloadhook
    else:
        self._unload()
        return result
```

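Now process([]) runs first and reaches Position 4 (where self.handle() is called), producing the application's result; only then is this processor's hook, self._unload(), called.

To summarize, the execution order is: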

 
```python
self._load()
self.handle()
self._unload()
```

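With more processors, execution continues in the same way: processors wrapped by loadhook run in the order they were added, while processors wrapped by unloadhook run in reverse order (the one added last runs first).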
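As noted earlier, the recursion is probably not essential. One non-recursive way to get the same nesting, offered as a sketch rather than web.py's code, is to build the call chain from the inside out with `functools.partial`. With four processors built from hypothetical hooks A to D, both versions confirm the ordering rule: load hooks in the order added, unload hooks in reverse.

```python
import functools

order = []


def loadhook(h):
    def processor(handler):
        h()
        return handler()
    return processor


def unloadhook(h):
    # Simplified non-generator version of unloadhook.
    def processor(handler):
        try:
            return handler()
        finally:
            h()
    return processor


def handle():
    order.append("handle")
    return "body"


def run_recursive(processors):
    def process(ps):
        if ps:
            return ps[0](lambda: process(ps[1:]))
        return handle()
    return process(processors)


def run_iterative(processors):
    # Build the chain from the inside out: handle is innermost and
    # processors[0] ends up outermost, exactly like the recursion.
    call = handle
    for p in reversed(processors):
        call = functools.partial(p, call)
    return call()


def log(name):
    return lambda: order.append(name)


processors = [loadhook(log("load A")), unloadhook(log("unload B")),
              loadhook(log("load C")), unloadhook(log("unload D"))]
```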
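The handle function

After all that, we finally reach the place where our own code is called. Once all the load processors have run, self.handle() is executed, and it calls the application code we wrote, for example a GET method that returns "Hello, world!". self.handle is defined as follows: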

 
```python
def handle(self):
    fn, args = self._match(self.mapping, web.ctx.path)
    return self._delegate(fn, self.fvars, args)
```
        

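This function is easy to follow: the first line calls self._match to do the routing and find the matching class or sub-application, and the second line, self._delegate, calls that class or passes the request on to the sub-application.
The _match function

_match is defined as follows: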

 
```python
def _match(self, mapping, value):
    for pat, what in mapping:
        if isinstance(what, application):  # Position 1
            if value.startswith(pat):
                f = lambda: self._delegate_sub_application(pat, what)
                return f, None
            else:
                continue
        elif isinstance(what, basestring):  # Position 2
            what, result = utils.re_subm('^' + pat + '$', what, value)
        else:  # Position 3
            result = utils.re_compile('^' + pat + '$').match(value)

        if result:  # it's a match
            return what, [x for x in result.groups()]
    return None, None
```

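Here mapping is self.mapping, the URL routing table, and value is web.ctx.path, the path of the current request. The function walks through self.mapping and acts according to the type of the handler object in each entry:

Position 1: if the handler is an application instance, i.e. a sub-application, and the path starts with pat, it returns an anonymous function that calls self._delegate_sub_application to handle the request; otherwise it moves on to the next entry.
Position 2: if the handler is a string, utils.re_subm is called. It replaces the part of value (i.e. web.ctx.path) that matches pat with what (the handler string we specified for this URL pattern) and returns the substituted result together with the match object. Because the pattern is wrapped in ^ and $, the whole path is replaced, so what may contain group references such as \1.
Position 3: in any other case, for example when a class object is given directly as the handler, value is simply matched against the anchored pattern.

If result is a match, the function returns the handler object and a list of arguments (the regex groups, which become the arguments passed to the GET and similar methods we implement). If no entry matches, it returns (None, None).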
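A simplified sketch of these matching rules; the name `match_route` and the sample routes are hypothetical, and the sub-application branch at Position 1 is left out. For string handlers it calls `Match.expand` on the anchored match, which in this sketch gives the same result as substituting the whole path with `what` the way `utils.re_subm` is described above; this is why a handler string can contain group references such as `\1`.

```python
import re


def match_route(mapping, path):
    """Simplified sketch of _match (sub-application prefix matching omitted).

    mapping: list of [pattern, target] pairs, as produced by init_mapping.
    Returns (target, args) on a match, or (None, None).
    """
    for pat, what in mapping:
        m = re.compile("^" + pat + "$").match(path)
        if not m:
            continue
        if isinstance(what, str):
            # Position 2: the whole path matched, so substituting it with
            # `what` is the same as expanding \1, \2, ... inside `what`.
            what = m.expand(what)
        # Position 3 (class targets) falls through unchanged.
        return what, list(m.groups())
    return None, None


class Index:  # a class target, handled like Position 3
    pass


mapping = [
    ["/", Index],
    ["/hello/(.*)", "Hello"],
    ["/(user|admin)/home", r"\1_home"],
]
```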
The _delegate function

The results returned by _match are passed to _delegate as arguments:

```python
fn, args = self._match(self.mapping, web.ctx.path)
return self._delegate(fn, self.fvars, args)
```

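Here:

fn: the object that should handle the current request, usually a class name.
args: the arguments to pass to the handler object.
self.fvars: the global namespace passed in when the application was instantiated, used to look up the handler object.

_delegate is implemented as follows: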

 
```python
def _delegate(self, f, fvars, args=[]):
    def handle_class(cls):
        meth = web.ctx.method
        if meth == 'HEAD' and not hasattr(cls, meth):
            meth = 'GET'
        if not hasattr(cls, meth):
            raise web.nomethod(cls)
        tocall = getattr(cls(), meth)
        return tocall(*args)

    def is_class(o): return isinstance(o, (types.ClassType, type))

    if f is None:
        raise web.notfound()
    elif isinstance(f, application):
        return f.handle_with_processors()
    elif is_class(f):
        return handle_class(f)
    elif isinstance(f, basestring):
        if f.startswith('redirect '):
            url = f.split(' ', 1)[1]
            if web.ctx.method == "GET":
                x = web.ctx.env.get('QUERY_STRING', '')
                if x:
                    url += '?' + x
            raise web.redirect(url)
        elif '.' in f:
            mod, cls = f.rsplit('.', 1)
            mod = __import__(mod, None, None, [''])
            cls = getattr(mod, cls)
        else:
            cls = fvars[f]
        return handle_class(cls)
    elif hasattr(f, '__call__'):
        return f()
    else:
        return web.notfound()
```
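The class-dispatch part of `_delegate` can be sketched in Python 3. `NotFound`, `NoMethod`, `handle_class`, `resolve`, `delegate` and `Hello` are hypothetical names; errors are modeled as plain exceptions, the `redirect ` string branch and sub-applications are omitted, `isinstance(target, type)` replaces the Python 2 `types.ClassType` check, and `importlib.import_module` is swapped in for the `__import__(mod, None, None, [''])` idiom, since both return the named submodule itself.

```python
import importlib


class NotFound(Exception):
    """Hypothetical stand-in for web.notfound()."""


class NoMethod(Exception):
    """Hypothetical stand-in for web.nomethod()."""


def handle_class(cls, method, args):
    # HEAD falls back to GET when the class has no HEAD method.
    if method == "HEAD" and not hasattr(cls, method):
        method = "GET"
    if not hasattr(cls, method):
        raise NoMethod(method)
    # A fresh instance per request; URL groups become positional arguments.
    return getattr(cls(), method)(*args)


def resolve(target, fvars):
    """Turn a handler name into a class, like the string branch of _delegate."""
    if "." in target:
        mod_name, cls_name = target.rsplit(".", 1)
        return getattr(importlib.import_module(mod_name), cls_name)
    return fvars[target]


def delegate(target, fvars, method, args=()):
    if target is None:
        raise NotFound()
    if isinstance(target, type):
        return handle_class(target, method, args)
    if isinstance(target, str):
        return handle_class(resolve(target, fvars), method, args)
    if callable(target):
        return target()
    raise NotFound()


class Hello:
    def GET(self, name):
        return "Hello, " + name
```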