Two ways to do Chinese word segmentation in R: Rwordseg and jiebaR
R environment configuration:
R_Path:
C:\Program Files\R\R-3.1.2
Path:
%R_Path%
Method 1: Rwordseg
(1) Configure the Java environment variables:
JAVA_HOME:
C:\Program Files\Java\jdk1.8.0_31
Path:
%JAVA_HOME%\bin;%JAVA_HOME%\jre\bin
CLASSPATH:
%JAVA_HOME%\lib\dt.jar;%JAVA_HOME%\lib\tools.jar
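After setting the variables, it can be worth confirming from a freshly started R session that they are actually visible. The following is a minimal sketch in base R. The helper name `check_env_vars` is made up for illustration, and the variable names are the ones used in this post.

```r
# Hypothetical helper: report which environment variables are set
# (non-empty) in the current R session.
check_env_vars <- function(vars = c("R_Path", "JAVA_HOME", "CLASSPATH")) {
  # names = FALSE returns a plain character vector of values
  values <- Sys.getenv(vars, unset = "", names = FALSE)
  data.frame(
    variable = vars,
    is_set   = nzchar(values),
    value    = values,
    stringsAsFactors = FALSE
  )
}

print(check_env_vars())
```

Any row with `is_set` equal to FALSE points at a variable that the R process did not inherit.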
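(2) Download the Rwordseg package to your local disk. The current version is available at https://r-forge.r-project.org/R/?group_id=1054. Then:
1. Install rJava:
> install.packages("rJava")
2. Add the following paths to the Path environment variable:
• %JAVA_HOME%\jre\bin
• %JAVA_HOME%\jre\bin\server
• %R_Path%\library\rJava\jri
3. Install Rwordseg from the downloaded file:
> install.packages("path/to/downloaded/Rwordseg/folder/Rwordseg_0.2-1.zip", repos=NULL, type="source")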
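If loading Rwordseg fails later, a useful first check is whether rJava can start a JVM with the Path settings above. This sketch uses rJava's `.jinit()` and `.jcall()`. The wrapper name `java_version_or_na` is illustrative, and it returns NA instead of failing when rJava is missing or the JVM cannot be started.

```r
# Hypothetical helper: return the Java version seen by rJava,
# or NA if rJava is unavailable or the JVM cannot be started.
java_version_or_na <- function() {
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return(NA_character_)
  }
  tryCatch({
    rJava::.jinit()  # start (or attach to) the JVM
    # Call the static Java method System.getProperty("java.version")
    rJava::.jcall("java/lang/System", "S", "getProperty", "java.version")
  }, error = function(e) NA_character_)
}

print(java_version_or_na())
```

An NA here suggests rechecking JAVA_HOME and the Path entries from step 2 before looking at Rwordseg itself.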
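(3) Enter the commands:
> library("rJava")
> library("Rwordseg")
> # Sample sentence: "A sanitation worker was dismissed for warming up by a fire in the cold wind"
> words <- "環衛工因在寒風中烤火取暖被辭退"
> segment.options(isNameRecognition = TRUE) # enable person-name recognition
> segmentCN(words)
Result:
[1] "環衛" "工" "因" "在" "寒風" "中" "烤火" "取暖" "被" "辭退"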
With words <- "我的名字是R語言" ("My name is R language") instead:
Result: [1] "我" "的" "名字" "是" "R語言"
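In the first result above, 環衛工 (sanitation worker) is split into two tokens. If you would rather keep it whole, Rwordseg lets you add entries to its user dictionary with `insertWords()`. A sketch, assuming Rwordseg is installed as described above. The variable name `has_rwordseg` is illustrative, and the resulting segmentation is not shown because it has not been verified here.

```r
# Only run the example when Rwordseg is actually available
has_rwordseg <- requireNamespace("Rwordseg", quietly = TRUE)

if (has_rwordseg) {
  library("Rwordseg")
  # Register a user-defined word so the segmenter can treat it as one unit
  insertWords("環衛工")
  print(segmentCN("環衛工因在寒風中烤火取暖被辭退"))
} else {
  message("Rwordseg is not installed; skipping the example")
}
```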
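Method 2: jiebaR
(1) Enter the commands:
> install.packages("jiebaR") # install the jiebaR package
> library("jiebaRD") # load jiebaRD, the dictionary data package used by jiebaR
> library("jiebaR")
> words <- "環衛工因在寒風中烤火取暖被辭退"
> test <- worker() # create a segmentation worker with default settings
> test <= words # `<=` is jiebaR's segmentation operator here, not a comparison
(2) Output:
[1] "環衛工" "因在" "寒風" "中" "烤火" "取暖" "被" "辭退"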
With words <- "我的名字是R語言" ("My name is R language") instead:
Result: [1] "我" "的" "名字" "是" "R" "語言"
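The two results for the first sentence can be compared directly. The sketch below uses only base R and the token vectors printed in this post. It checks that both segmentations reassemble to the same sentence and lists the tokens that differ. The variable names are illustrative.

```r
# Token vectors copied from the two results shown in this post
rwordseg_tokens <- c("環衛", "工", "因", "在", "寒風", "中", "烤火", "取暖", "被", "辭退")
jiebar_tokens   <- c("環衛工", "因在", "寒風", "中", "烤火", "取暖", "被", "辭退")

# Both segmentations should rebuild the same original sentence
same_text <- identical(paste(rwordseg_tokens, collapse = ""),
                       paste(jiebar_tokens, collapse = ""))

# Tokens produced by one tool but not by the other
only_rwordseg <- setdiff(rwordseg_tokens, jiebar_tokens)
only_jiebar   <- setdiff(jiebar_tokens, rwordseg_tokens)

print(same_text)
print(only_rwordseg)
print(only_jiebar)
```

For this sentence the tools agree on everything after the first four characters: Rwordseg emits 環衛, 工, 因, 在 as separate tokens, while jiebaR groups them as 環衛工 and 因在.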
For more posts, visit www.crxy.cn