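Python's built-in str.split(), e.g. text.split('.'), can only split text on a single separator.
What if you want to split on several delimiters at once, for example periods, commas, semicolons, exclamation marks, question marks, and other punctuation?
In that case, use re.split().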
import re

# Split on "; ", ", ", "*" or a newline; raw strings keep the backslashes intact.
a = 'Beautiful uef filenrfwe, is not really right; better*than\nugly'
print(re.split(r'(; |, |\*|\n)', a))

# Split on any of ; . , ? !
text = 'If you have a; suspicion about, an activity. but are !unsure if it ?warrants, escalation'
pattern = r'(;|\.|,|\?|\!)'
new = re.split(pattern, text)
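Explanation:
pattern = r'(;|\.|,|\?|\!)'
| means "or". \ is the escape character: . and ? have special meanings in a regex (any character, and "optional"), so a backslash is put in front of them, giving \. and \? to match a literal period and question mark. The comma and ! have no special meaning here, so , needs no escape and the backslash in \! is optional but harmless. The r prefix makes the pattern a raw string, so Python passes the backslashes through to the regex engine unchanged. The parentheses () form a capturing group, whose effect is that the result of the split still includes the punctuation marks themselves. Compare: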
new = re.split(r'(;|\.|,|\?|\!)', text)
The output is:
['If you have a', ';', ' suspicion about', ',', ' an activity', '.', ' but are ', '!', 'unsure if it ', '?', 'warrants', ',', ' escalation']
Whereas:
new = re.split(r';|\.|,|\?|\!', text)
The output is:
['If you have a', ' suspicion about', ' an activity', ' but are ', 'unsure if it ', 'warrants', ' escalation']
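As a possible follow-up, the two variants above can be wrapped into small helpers. This is only a sketch: the names DELIMS, split_plain and split_keep are hypothetical and not from the original post. It assumes every delimiter is a single character, so the alternation can be written as a character class, inside which . and ? are literal and need no escaping.

```python
import re

# Hypothetical name: character class equivalent to ;|\.|,|\?|\!
DELIMS = r'[;.,?!]'

def split_plain(text):
    """Split on the delimiters, drop them, and strip surrounding whitespace."""
    return [part.strip() for part in re.split(DELIMS, text) if part.strip()]

def split_keep(text):
    """Split on the delimiters but keep each one attached to the chunk before it."""
    # With a capturing group the result alternates: text, delimiter, text, ...
    pieces = re.split(f'({DELIMS})', text)
    chunks = [pieces[i] + pieces[i + 1] for i in range(0, len(pieces) - 1, 2)]
    if pieces[-1]:
        # Trailing text after the last delimiter, if there is any.
        chunks.append(pieces[-1])
    return chunks

text = 'If you have a; suspicion about, an activity. but are !unsure if it ?warrants, escalation'
print(split_plain(text))
print(split_keep(text))
```

split_plain corresponds to the version without parentheses, and split_keep builds on the version with the capturing group, where the delimiters sit at the odd indices of the result.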