NLPIR/ICTCLAS Chinese word segmentation system (http://ictclas.nlpir.org)
PyNLPIR is a Python wrapper for this Chinese word segmentation system (http://pynlpir.readthedocs.io...)
Installation steps:
① pip install pynlpir
② pynlpir update (fetches the latest NLPIR license file)
Chinese word segmentation example from the official documentation:
```python
import pynlpir

pynlpir.open()  # load the NLPIR data files; raises RuntimeError if initialization fails
s = '歡迎科研人員、技術工程師、企事業單位與個人參與NLPIR平臺的建設工作。'
result = pynlpir.segment(s)  # list of (word, part-of-speech) tuples
print(result)
pynlpir.close()  # release the resources NLPIR allocated

# output:
# [('歡迎', 'verb'), ('科研', 'noun'), ('人員', 'noun'), ('、', 'punctuation mark'),
#  ('技術', 'noun'), ('工程師', 'noun'), ('、', 'punctuation mark'), ('企事業', 'noun'),
#  ('單位', 'noun'), ('與', 'conjunction'), ('個人', 'noun'), ('參與', 'verb'),
#  ('NLPIR', 'noun'), ('平臺', 'noun'), ('的', 'particle'), ('建設', 'verb'),
#  ('工作', 'verb'), ('。', 'punctuation mark')]
```
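The output shows that `pynlpir.segment()` returns a list of `(word, part_of_speech)` tuples. The following is a minimal sketch of consuming that structure. It uses a few hard-coded pairs in the same shape so it runs without NLPIR installed. The helpers `drop_punctuation` and `words_only` are hypothetical and are not part of PyNLPIR.

```python
from collections import Counter

# A few (word, part-of-speech) pairs in the same shape as pynlpir.segment() output.
result = [('歡迎', 'verb'), ('科研', 'noun'), ('人員', 'noun'),
          ('、', 'punctuation mark'), ('技術', 'noun'), ('工程師', 'noun'),
          ('。', 'punctuation mark')]


def drop_punctuation(tagged):
    """Keep only the pairs whose POS tag is not 'punctuation mark'."""
    return [(w, pos) for w, pos in tagged if pos != 'punctuation mark']


def words_only(tagged):
    """Strip the POS tags, leaving just the segmented words."""
    return [w for w, _ in tagged]


content = drop_punctuation(result)
print(' / '.join(words_only(content)))  # 歡迎 / 科研 / 人員 / 技術 / 工程師
print(Counter(pos for _, pos in content))  # Counter({'noun': 4, 'verb': 1})
```

If only the words are needed, the PyNLPIR documentation describes a `pos_tagging=False` argument to `pynlpir.segment()`. With it, the function returns plain strings instead of tuples.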
Problems you may run into:
① Calling pynlpir.open() raises RuntimeError("NLPIR function 'NLPIR_Init' failed."). This usually means the bundled license file has expired.
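Solution:
Go to the https://github.com/NLPIR-team... repository,
download a license file, for example NLPIR.user from the NLPIR-ICTCLAS word segmentation system license,
and use it to replace the file of the same name under path_to_local_python/Lib/site-packages/pynlpir/Data. This renews the license.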
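The manual license swap described above can also be scripted. This is a minimal sketch that assumes the new `NLPIR.user` has already been downloaded. `replace_license` and `default_data_dir` are hypothetical helpers. The sketch locates the `Data` directory with `importlib.util.find_spec` instead of a hard-coded `site-packages` path, and it keeps the old file as `NLPIR.user.bak` in case the new license does not work.

```python
import importlib.util
import shutil
from pathlib import Path


def default_data_dir():
    """Return pynlpir's Data directory, or None if pynlpir is not installed."""
    spec = importlib.util.find_spec("pynlpir")
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at pynlpir/__init__.py; the Data folder sits next to it.
    return Path(spec.origin).parent / "Data"


def replace_license(new_license, data_dir):
    """Copy a downloaded NLPIR.user over the one in data_dir, keeping a backup."""
    new_license = Path(new_license)
    target = Path(data_dir) / "NLPIR.user"
    if not new_license.is_file():
        raise FileNotFoundError(new_license)
    if target.exists():
        # Keep the expired license around as NLPIR.user.bak.
        shutil.copy2(target, target.with_name(target.name + ".bak"))
    shutil.copy2(new_license, target)
    return target
```

Typical use would be `replace_license("path/to/downloaded/NLPIR.user", default_data_dir())`, followed by another call to `pynlpir.open()`. Check first that `default_data_dir()` did not return `None`.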
Chinese stopword list:
["啊","阿","哎","哎呀","哎喲","唉","俺","俺們","按","按照","吧","吧噠","把","罷了","被","本","本着","比","比方","好比","鄙人","彼","彼此","邊","別","別的","別說","並","而且","不比","不成","不單","不但","不獨","無論","不光","不過","不只","不拘","不論","不怕","否則","不如","不特","不唯","不問","不僅","朝","朝着","趁","趁着","乘","衝","除","除此以外","除非","除了","此","此間","此外","從","從而","打","待","但","可是","當","當着","到","得","的","的話","等","等等","地","第","叮咚","對","對於","多","多少","而","而況","並且","而是","而外","而言","而已","爾後","反過來","反過來講","反之","非但","非徒","不然","嘎","嘎登","該","趕","個","各","各個","各位","各類","各自","給","根據","跟","故","故此","當然","關於","管","歸","果真","果然","過","哈","哈哈","呵","和","何","何處","況且","什麼時候","嘿","哼","哼唷","呼哧","乎","譁","仍是","還有","換句話說","換言之","或","或是","或者","極了","及","及其","及至","即","即使","即或","即令","即若","即便","幾","幾時","己","既","既然","既是","繼而","加之","假如","倘若","假使","鑑於","將","較","較之","叫","接着","結果","借","緊接着","進而","盡","儘管","經","通過","就","就是","就是說","據","具體地說","具體說來","開始","開外","靠","咳","可","可見","但是","能夠","何況","啦","來","來着","離","例如","哩","連","連同","二者","了","臨","另","另外","另外一方面","論","嘛","嗎","慢說","漫說","冒","麼","每","每當","們","莫若","某","某個","某些","拿","哪","哪邊","哪兒","哪一個","哪裏","哪年","哪怕","哪天","哪些","哪樣","那","那邊","那兒","那個","那會兒","那裏","那麼","那麼些","那麼樣","那時","那些","那樣","乃","乃至","呢","能","你","大家","您","寧","寧肯","寧可","寧願","哦","嘔","啪達","旁人","呸","憑","憑藉","其","其次","其二","其餘","其它","其一","其他","其中","起","起見","豈但","偏偏相反","先後","前者","且","然而","而後","然則","讓","人家","任","任何","任憑","如","如此","若是","如何","如其","如若","如上所述","若","若非","如果","啥","上下","尚且","設若","設使","甚而","甚麼","甚至","免得","時候","什麼","什麼樣","使得","是","是的","首先","誰","誰知","順","順着","似的","雖","雖然","雖然說","雖則","隨","隨着","所","因此","他","他們","他人","它","它們","她","她們","倘","倘或","倘然","假若","倘使","騰","替","經過","同","同時","哇","萬一","往","望","爲","爲什麼","爲了","爲何","爲着","喂","嗡嗡","我","咱們","嗚","嗚呼","烏乎","不管","無寧","毋寧","嘻","嚇","相對而言","像","向","向着","噓","呀","焉","沿","沿着","要","要不","要否則","要不是","要麼","要是","也","也罷","也好","一","通常","一旦","一方面","一來","一切","同樣","一則","依","依照","矣","以","以便","以及","以避免","以致","以致於","以至","抑或","因","所以","於是","由於","喲","用","由","因而可知","因爲","有","有的","有關","有些","又","於","因而","因而乎","與","與此同時","與否","與其","越是","云云","哉","再說","再者","在","在下"
,"咱","我們","則","怎","怎麼","怎麼辦","怎麼樣","怎樣","咋","照","照着","者","這","這邊","這兒","這個","這會兒","這就是說","這裏","這麼","這麼點兒","這麼些","這麼樣","這時","這些","這樣","正如","吱","之","之類","之因此","之一","只是","只限","只要","只有","至","至於","諸位","着","着呢","自","自從","自個兒","自各兒","本身","自家","自身","綜上所述","總的來看","總的來講","總的說來","總而言之","總之","縱","縱令","縱然","縱使","遵守","做爲","兮","呃","唄","咚","咦","喏","啐","喔唷","嗬","嗯","噯","啊哈","啊呀","啊喲","挨次","挨個","挨家挨戶","挨門挨戶","挨門逐戶","挨着","按理","定期","按時","按說","暗地裏","暗中","暗自","昂然","八成","白白","半","梆","保管","保險","飽","背地裏","背靠背","倍感","倍加","本人","自己","甭","比起","好比說","比照","畢竟","必","一定","必將","必須","便","別人","並不是","並肩","並沒","並無","並排","並沒有","勃然","不","沒必要","不常","不大","不得","不得不","不得了","不得已","不迭","不定","不對","不妨","無論怎樣","不會","不只僅","不只僅是","不經意","不可開交","不可抗拒","不力","不了","不料","不滿","難免","不能不","不起","不巧","否則的話","不日","很多","不勝","不時","不是","不一樣","不能","不要","不外","不外乎","不下","不限","不消","不已","不亦樂乎","不禁得","再也不","不擇手段","不怎麼","未曾","不知不覺","不止","不止一次","不至於","才","才能","策略地","差很少","差一點","常","經常","常言道","常言說","常言說得好","長此下去","長話短說","長期以來","長線","敞開兒","徹夜","陳年","趁便","趁機","趁熱","趁勢","趁早","成年","成年累月","成心","伺機","乘勝","乘勢","乘隙","乘虛","誠然","早晚","充分","充其極","充其量","抽冷子","臭","初","出","出來","出去","除此","除此而外","除此之外","除開","除去","除卻","除外","到處","川流不息","傳","傳說","傳聞","串行","純","純粹","此後","此中","次第","匆匆","從不","今後","今後之後","古往今來","從古至今","從今之後","從寬","歷來","從輕","從速","從頭","從未","從無到有","從小","重新","從嚴","從優","從早到晚","從中","從重","湊巧","粗","存心","達旦","打從","打開天窗說亮話","大","大不了","大大","大抵","大都","大多","大凡","大概","你們","大舉","大略","大面兒上","大事","大致","大致上","大約","大張旗鼓","大體","呆呆地","帶","殆","待到","單","單純","單單","希望","彈指之間","當場","當兒","立即","當口兒","固然","當庭","當頭","當下","當真","當中","倒不如","倒不如說","卻是","處處","到底","到了兒","到目前爲止","到頭","到頭來","得起","得天獨厚","的確","等到","叮噹","頂多","定","動不動","動輒","陡然","都","獨","獨自","斷然","頓時","屢次","多多","多多少少","多多益善","多虧","多年來","多年前","然後","而論","而又","爾等","二話不說","二話沒說","反倒","反卻是","反而","反手","反之亦然","反之則","方","方纔","方能","放量","很是","非得","分期","分期分批","分頭","奮勇","憤然","風雨無阻","逢","弗","甫","嘎嘎","該當","概","趕快","趕早不趕晚","敢","敢情","勇於","剛","剛纔","恰好","剛巧","高低","格外","隔日","隔夜","我的","各式","更","更加","更進一步","更爲","公然","共","共總","夠瞧的","姑且","古來",
"故而","故意","固","怪","怪不得","慣常","光","光是","歸根到底","歸根結底","過於","絕不","毫無","毫無保留地","毫無例外","好在","何須","未嘗","何妨","何苦","何樂而不爲","何必","何止","很","不少","不多","轟然","後來","呼啦","忽地","突然","互","互相","嘩啦","話說","還","恍然","會","豁然","活","夥同","或多或少","或許","基本","基本上","基於","極","極大","極度","極端","極力","極其","極爲","急匆匆","即將","即刻","便是說","幾度","幾番","幾乎","幾經","既...又","繼之","加上","加以","間或","簡而言之","簡言之","簡直","見","將才","將近","將要","交口","較比","較爲","接連不斷","接下來","皆可","截然","截至","藉以","藉此","藉以","屆時","僅","僅僅","謹","進來","進去","近","近幾年來","近來","近年來","儘管如此","儘量","儘快","儘可能","盡然","盡如人意","盡心竭力","全力以赴","儘早","精光","常常","竟","居然","究竟","就此","就地","就算","竟然","局外","舉凡","據稱","據此","據實","聽說","據我所知","據悉","具體來講","決不","決非","絕","毫不","絕頂","絕對","絕非","均","喀","看","看來","看起來","看上去","看樣子","可好","可能","恐怕","快","快要","來不及","來得及","來說","來看","攔腰","緊緊","老","老大","老老實實","總是","累次","累年","理當","理該","理應","歷","立","立地","馬上","立馬","立時","聯袂","連連","連日","連日來","連聲","連袂","臨到","另方面","另行","另外一個","路經","屢","多次","多次三番","屢屢","縷縷","率爾","率然","略","略加","略微","略爲","論說","立刻","蠻","滿","沒","沒有","每逢","往往","每時每刻","猛然","猛然間","莫","莫不","莫非","莫如","默默地","默然","吶","那末","奈","難道","可貴","難怪","難說","內","年復一年","凝神","偶而","偶爾","怕","砰","碰巧","譬如","恰恰","乒","平素","頗","迫於","撲通","其後","其實","奇","齊","起初","起來","起首","起頭","起先","豈","豈非","豈止","迄","恰逢","剛好","偏偏","恰巧","恰如","恰似","千","萬","千萬","千萬千萬","切","切不可","切莫","切切","切勿","竊","親口","親身","親手","親眼","親自","頃","頃刻","頃刻間","頃刻之間","請勿","窮年累月","取道","去","權時","全都","全力","整年","全然","全身心","然","人人","仍","仍舊","仍然","日復一日","日見","日漸","日益","日臻","如常","如此等等","如次","現在","如期","如前所述","如上","以下","汝","三番兩次","三番五次","三天兩頭","瑟瑟","沙沙","上","上來","上去","一.","一一","一下","一個","一些","一何","一則經過","一天","必定","一時","一次","一片","一番","一直","一致","一塊兒","一轉眼","一邊","一面","上升","上述","上面","下","下列","下去","下來","下面","不一","不久","不變","不可","不夠","不盡","不盡然","不敢","不斷","不若","不足","與其說","專門","且不說","且說","嚴格","嚴重","個別","中小","中間","豐富","爲主","爲什麼","爲止","爲此","主張","主要","舉行","乃至於","以前","以後","之後","也就是說","也是","瞭解","爭取","二來","云爾","些","亦","產生","人","人們","什麼","今","從此","今天","今年","今後","介於","從事","他是","他的","代替","以上","如下","覺得","之前","之後","之外","以後","以故","以期","以來","任務","企圖","偉大","彷佛",
"但凡","何以","餘外","你是","你的","使","使用","依據","依靠","便於","促進","保持","作到","儻然","兒","容許","元/噸","先不先","前後","先後","先生","全體","所有","全面","共同","具體","具備","兼之","再","再其次","再則","再有","再次","再者說","決定","準備","凡","凡是","出於","出現","分別","則甚","別處","別是","別管","前此","前進","前面","加入","增強","十分","即如","卻","卻不","原來","又及","及時","雙方","反應","反映","取得","受到","變成","另悉","只","只當","只怕","只消","叫作","召開","各人","各地","各級","合理","同一","一樣","後","後者","後面","向使","周圍","呵呵","咧","惟有","啷噹","嘍","嗡","嘿嘿","因了","因着","在於","堅定","堅持","處在","處理","複雜","多麼","多數","大力","大多數","大批","大量","失去","她是","她的","好","好的","好象","如同","如是","始而","存在","孰料","孰知","它們的","它是","它的","安全","徹底","完成","實現","實際","宣佈","容易","密切","對應","對待","對方","對比","小","少數","爾","爾爾","尤爲","就是了","就要","屬於","左右","巨大","鞏固","已","已矣","已經","巴","巴巴","幫助","並不","並非","廣大","普遍","應當","應用","應該","庶乎","庶幾","開展","引發","強烈","強調","歸齊","當前","當地","當時","造成","完全","彼時","每每","後來","後面","得了","得出","獲得","內心","必然","必要","怎奈","怎麼","老是","總結","您們","您是","唯其","意思","願意","成爲","我是","個人","或則","或曰","戰鬥","所在","所幸","全部","所謂","擴大","掌握","接著","數/","整個","方便","方面","無","沒法","既往","明顯","明確","是否是","是以","是否","顯然","顯著","普通","廣泛","曾","曾經","替代","最","最後","最大","最好","最後","最近","最高","有利","有力","有及","有所","有效","有時","有點","有的是","有着","有著","末##末","本地","來自","來講","構成","某某","根本","歡迎","歟","正值","正在","正巧","正常","正是","此地","此處","此時","這次","每一個","天天","每一年","比及","比較","沒奈何","注意","深刻","清楚","知足","然後","特別是","特殊","特色","猶且","猶自","現代","如今","甚且","甚或","甚至於","用來","由是","由此","目前","直到","直接","類似","相信","相反","相同","相對","相應","至關","相等","看出","看到","看看","看見","真是","真正","眨眼","矣乎","矣哉","知道","肯定","種","積極","移動","突出","忽然","當即","竟而","第二","類如","練習","組成","結合","繼後","繼續","維持","考慮","聯繫","可否","可以","自後","自打","至今","至若","致","般的","良好","若夫","若果","範圍","莫否則","得到","行爲","行動","代表","表示","要求","規定","以爲","譬喻","認爲","認真","認識","許多","設或","誠如","說明","說來","說說","諸","諸如","誰人","誰料","賊死","賴以","距","轉動","轉變","轉貼","達到","迅速","過去","過來","運用","還要","這一來","此次","這點","這種","這般","這麼","進入","進步","進行","適應","適當","適用","逐步","逐漸","一般","形成","遇到","遭到","遵循","避免","那般","那麼","部分","採起","裏面","重大","從新","重要","針對","問題","防止","附近","限制","隨後","隨時","隨著","難道說","集中","須要","非特"
,"非獨","高興","若果"]
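A stopword list like this is typically applied after segmentation: every token that appears in the list is dropped. This is a minimal sketch. It assumes the list is saved as a JSON array (the file name is up to you) and that tokens arrive as `(word, part_of_speech)` pairs like the ones `pynlpir.segment()` returns. `load_stopwords` and `remove_stopwords` are hypothetical helpers. Storing the words in a `set` makes each lookup constant-time. Each entry is also `strip()`ed, so a stray space cannot stop a word from matching.

```python
import json


def load_stopwords(path):
    """Load a JSON array of stopwords into a set, stripping stray whitespace."""
    with open(path, encoding="utf-8") as f:
        return {w.strip() for w in json.load(f) if w.strip()}


def remove_stopwords(tagged, stopwords):
    """Drop (word, pos) pairs whose word is a stopword or pure whitespace."""
    return [(w, pos) for w, pos in tagged if w.strip() and w not in stopwords]


# A few entries from the list above, one of them with a stray trailing space.
stopwords = {w.strip() for w in ["的", "與", "歡迎", "若果 "]}
tagged = [('歡迎', 'verb'), ('科研', 'noun'), ('人員', 'noun'),
          ('與', 'conjunction'), ('平臺', 'noun'), ('的', 'particle')]
print(remove_stopwords(tagged, stopwords))
# [('科研', 'noun'), ('人員', 'noun'), ('平臺', 'noun')]
```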