As is well known, Python's exec function can execute a string containing Python source code:
    >>> code = '''
    ... a = "hello"
    ... print(a)
    ... '''
    >>> exec(code)
    hello
    >>> a
    'hello'
exec is very powerful, so use it with caution. If you really have to use it, pay attention to the following security issues.
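By default, code run by exec can read the local and global variables that are visible where exec is called, and it can also modify the global variables. If the code passed to exec is generated from user-submitted data, this default behavior is a security risk.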
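To make the risk concrete, here is a minimal sketch run at module level. The names api_key and leaked are hypothetical and only serve the illustration; the executed string stands in for code built from user input.

```python
# A module-level variable the program relies on (hypothetical name).
api_key = "secret"

# With no namespace arguments, exec uses the caller's globals and locals.
# At module level both are the module namespace, so the executed code can
# read api_key and also rebind it.
exec('leaked = api_key\napi_key = "overwritten"')

print(leaked)   # the original value was readable
print(api_key)  # the global has been replaced
```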
How can this default behavior be changed? Pass two more arguments to exec, the globals and locals namespaces (see the earlier post about exec for details):
    >>> g = {}
    >>> l = {'b': 'world'}
    >>> exec('hello = "hello" + b', g, l)
    >>> l
    {'b': 'world', 'hello': 'helloworld'}
    >>> g
    {'__builtins__': {...}}
    >>> hello
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    ...
    NameError: name 'hello' is not defined
To restrict access to the built-in functions as well, define the __builtins__ key in the globals argument:
    >>> g = {}
    >>> l = {}
    >>> exec('a = int("1")', g, l)
    >>> l
    {'a': 1}
    >>> g = {'__builtins__': {}}
    >>> exec('a = int("1")', g, l)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<string>", line 1, in <module>
    NameError: name 'int' is not defined
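The two restrictions can be combined into a small helper that exposes only a whitelist of built-ins. This is a sketch under assumptions: the function name exec_with_whitelist and its default whitelist are made up for illustration, and on its own this does not make exec safe.

```python
import builtins


def exec_with_whitelist(source, allowed=("int", "len", "range")):
    """Run source in fresh namespaces with only the whitelisted built-ins."""
    # Build a __builtins__ mapping that contains just the allowed names.
    safe_builtins = {name: getattr(builtins, name) for name in allowed}
    global_ns = {"__builtins__": safe_builtins}
    local_ns = {}
    exec(source, global_ns, local_ns)
    return local_ns


# Whitelisted built-ins work as usual.
result = exec_with_whitelist('a = int("1") + len("ab")')
print(result)

# Anything outside the whitelist raises NameError inside the executed code.
try:
    exec_with_whitelist('f = open("some_file")')
except NameError as exc:
    blocked = str(exc)
    print(blocked)
```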
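We have now restricted reading and modifying global variables as well as the use of built-in functions. Is everything safe at this point? Unfortunately not: there are still other ways to get hold of the built-in functions, and even of os.system.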
Through a function object:
    >>> def a(): pass
    ...
    >>> a.__globals__['__builtins__']
    <module 'builtins' (built-in)>
    >>> a.__globals__['__builtins__'].open
    <built-in function open>
Through a built-in type object:
    >>> for cls in {}.__class__.__base__.__subclasses__():
    ...     if cls.__name__ == 'WarningMessage':
    ...         b = cls.__init__.__globals__['__builtins__']
    ...         b['open']
    ...
    <built-in function open>
Getting to os.system (via the os module):
    >>> cls = [x for x in [].__class__.__base__.__subclasses__() if x.__name__ == '_wrap_close'][0]
    >>> cls.__init__.__globals__['path'].os
    <module 'os' from '/usr/local/var/pyenv/versions/3.5.1/lib/python3.5/os.py'>
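The lookups above were typed at the interactive prompt, but the same walk also works from inside exec when __builtins__ is empty. The sketch below illustrates that; payload, sandbox and found are hypothetical names, and which class ends up supplying the globals depends on what the interpreter has already imported.

```python
# Code that an attacker might submit. It needs no built-in names at all:
# it starts from a tuple literal and walks up to object's subclasses.
payload = '''
found = None
for cls in ().__class__.__base__.__subclasses__():
    try:
        # Classes written in Python expose their module globals here.
        found = cls.__init__.__globals__['__builtins__']
        break
    except:
        # C-level __init__ wrappers have no __globals__; skip them.
        pass
'''

sandbox = {'__builtins__': {}}
exec(payload, sandbox)

# __builtins__ may be the builtins module or its dict, depending on the module.
recovered = sandbox['found']
open_func = recovered['open'] if isinstance(recovered, dict) else recovered.open
print(open_func)
```

Even though the sandbox started without any built-ins, the executed code ends up holding a reference to the real open function.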
How can we defend against these approaches? One option is to forbid access to any attribute whose name starts with _:
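- If you control how the code is generated, do the check while generating it.
- If you don't, you can analyze the generated code with the dis module (on the Python 3.5 used here, dis does not descend into nested functions; since Python 3.7, dis.dis disassembles nested code objects recursively).
- Or use the tokenize module: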
    In [67]: from tokenize import tokenize

    In [68]: from io import BytesIO

    In [69]: code = '''
       ....: a = 'b'
       ....: a.__str__
       ....: def b():
       ....:     b.__get__
       ....: '''

    In [70]: t = tokenize(BytesIO(code.encode()).readline)

    In [71]: for x in t:
       ....:     print(x)
       ....:
    TokenInfo(type=59 (ENCODING), string='utf-8', start=(0, 0), end=(0, 0), line='')
    TokenInfo(type=58 (NL), string='\n', start=(1, 0), end=(1, 1), line='\n')
    TokenInfo(type=1 (NAME), string='a', start=(2, 0), end=(2, 1), line="a = 'b'\n")
    TokenInfo(type=53 (OP), string='=', start=(2, 2), end=(2, 3), line="a = 'b'\n")
    TokenInfo(type=3 (STRING), string="'b'", start=(2, 4), end=(2, 7), line="a = 'b'\n")
    TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 7), end=(2, 8), line="a = 'b'\n")
    TokenInfo(type=1 (NAME), string='a', start=(3, 0), end=(3, 1), line='a.__str__\n')
    TokenInfo(type=53 (OP), string='.', start=(3, 1), end=(3, 2), line='a.__str__\n')
    TokenInfo(type=1 (NAME), string='__str__', start=(3, 2), end=(3, 9), line='a.__str__\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 9), end=(3, 10), line='a.__str__\n')
    TokenInfo(type=1 (NAME), string='def', start=(4, 0), end=(4, 3), line='def b():\n')
    TokenInfo(type=1 (NAME), string='b', start=(4, 4), end=(4, 5), line='def b():\n')
    TokenInfo(type=53 (OP), string='(', start=(4, 5), end=(4, 6), line='def b():\n')
    TokenInfo(type=53 (OP), string=')', start=(4, 6), end=(4, 7), line='def b():\n')
    TokenInfo(type=53 (OP), string=':', start=(4, 7), end=(4, 8), line='def b():\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(4, 8), end=(4, 9), line='def b():\n')
    TokenInfo(type=5 (INDENT), string='    ', start=(5, 0), end=(5, 4), line='    b.__get__\n')
    TokenInfo(type=1 (NAME), string='b', start=(5, 4), end=(5, 5), line='    b.__get__\n')
    TokenInfo(type=53 (OP), string='.', start=(5, 5), end=(5, 6), line='    b.__get__\n')
    TokenInfo(type=1 (NAME), string='__get__', start=(5, 6), end=(5, 13), line='    b.__get__\n')
    TokenInfo(type=4 (NEWLINE), string='\n', start=(5, 13), end=(5, 14), line='    b.__get__\n')
    TokenInfo(type=6 (DEDENT), string='', start=(6, 0), end=(6, 0), line='')
    TokenInfo(type=0 (ENDMARKER), string='', start=(6, 0), end=(6, 0), line='')
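From the output above we can see that when a token's type is OP and its string is '.', the next record is the attribute name that follows the dot. So the check can be written like this: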
    import io
    import tokenize


    def check_unsafe_attributes(string):
        """Raise AttributeError if the source accesses an attribute starting with '_'."""
        g = tokenize.tokenize(io.BytesIO(string.encode('utf-8')).readline)
        pre_op = ''  # the most recent operator token seen so far
        for toktype, tokval, _, _, _ in g:
            # A NAME whose preceding operator is '.' is an attribute name.
            if toktype == tokenize.NAME and pre_op == '.' and tokval.startswith('_'):
                attr = tokval
                msg = "access to attribute '{0}' is unsafe.".format(attr)
                raise AttributeError(msg)
            elif toktype == tokenize.OP:
                pre_op = tokval
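As a usage sketch, the check can be run before handing the source to exec with an empty __builtins__. The wrapper name guarded_exec is hypothetical, and the checker is repeated here only so the snippet runs on its own. Treat it as an illustration of combining the measures, not as a complete sandbox.

```python
import io
import tokenize


def check_unsafe_attributes(string):
    """Raise AttributeError if the source accesses an attribute starting with '_'."""
    g = tokenize.tokenize(io.BytesIO(string.encode('utf-8')).readline)
    pre_op = ''
    for toktype, tokval, _, _, _ in g:
        if toktype == tokenize.NAME and pre_op == '.' and tokval.startswith('_'):
            raise AttributeError("access to attribute '{0}' is unsafe.".format(tokval))
        elif toktype == tokenize.OP:
            pre_op = tokval


def guarded_exec(source):
    """Hypothetical wrapper: reject '_' attributes, then exec without built-ins."""
    check_unsafe_attributes(source)
    namespace = {'__builtins__': {}}
    exec(source, namespace)
    # Drop the bookkeeping key so only user-defined names are returned.
    namespace.pop('__builtins__', None)
    return namespace


# Ordinary attribute access passes the check.
allowed = guarded_exec("a = 'b'\nc = a.upper()")
print(allowed)

# The subclass walk is rejected before any code runs.
try:
    guarded_exec("x = ().__class__.__base__.__subclasses__()")
except AttributeError as exc:
    rejected = str(exc)
    print(rejected)
```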
These are the security issues I know of that need attention when using exec. If you know of others, please leave a comment.