For Chinese word segmentation, there is jieba, a segmenter developed in China.
https://nlp.stanford.edu/software/crf-faq.html#ajava
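The post mentions jieba but does not show how the training and test TSV files were built. The following is a minimal sketch, assuming jieba is used for segmentation and that labels are assigned by hand. `jieba.lcut` is the library's list-returning segmenter; the helper names `segment` and `to_tsv` are hypothetical. The one-token-per-line, tab-separated layout follows the `map=word=0,answer=1` setting echoed in the training log.

```python
def segment(text):
    """Split a Chinese sentence into words with jieba (third-party: pip install jieba)."""
    import jieba  # imported lazily so to_tsv() below works without jieba installed
    return jieba.lcut(text)


def to_tsv(tokens, labels):
    """Render one 'token<TAB>label' line per token (column 0 = word, column 1 = answer)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return "".join(f"{tok}\t{lab}\n" for tok, lab in zip(tokens, labels))


# Tokens and gold labels copied from the test output shown in this post.
tokens = ["我", "今天", "晚上", "吃", "了", "兩", "盤", "回鍋肉"]
labels = ["O", "O", "TIME", "O", "O", "QUANTITY", "UNIT", "FOOD"]
tsv_text = to_tsv(tokens, labels)
```

In practice `tokens` would come from `segment(sentence)`; jieba's actual split of a given sentence may differ from the tokens hard-coded here, so the labels need to be aligned after segmentation.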
C:\my_study\ML\NLP\stanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop chinese.meal.fpp.prop
Invoked on Thu Mar 22 16:34:06 CST 2018 with arguments: -prop chinese.meal.fpp.prop
usePrevSequences=true
useClassFeature=true
useTypeSeqs2=true
useSequences=true
wordShape=chris2useLC
useTypeySequences=true
useDisjunctive=true
noMidNGrams=true
serializeTo=ner-model.ser.gz
maxNGramLeng=6
useNGrams=true
usePrev=true
useNext=true
maxLeft=1
trainFile=chinese.meal.fpp.tsv
map=word=0,answer=1
useWord=true
useTypeSeqs=true
numFeatures = 564
Time to convert docs to feature indices: 0.0 seconds
numClasses: 5 [0=O,1=TIME,2=QUANTITY,3=UNIT,4=FOOD]
numDocuments: 1
numDatums: 56
numFeatures: 564
Time to convert docs to data/labels: 0.0 seconds
numWeights: 6460
QNMinimizer called on double function of 6460 variables, using M = 25.
An explanation of the output:
Iter         The number of iterations
evals        The number of function evaluations
SCALING      <D> Diagonal scaling was used; <I> Scaled Identity
LINESEARCH   [## M steplength] Minpack linesearch
                 1-Function value was too high
                 2-Value ok, gradient positive, positive curvature
                 3-Value ok, gradient negative, positive curvature
                 4-Value ok, gradient negative, negative curvature
             [.. B] Backtracking
VALUE        The current function value
TIME         Total elapsed time
|GNORM|      The current norm of the gradient
{RELNORM}    The ratio of the current to initial gradient norms
AVEIMPROVE   The average improvement / current value
EVALSCORE    The last available eval score

Iter ## evals ## <SCALING> [LINESEARCH] VALUE TIME |GNORM| {RELNORM} AVEIMPROVE EVALSCORE

Iter 1 evals 1 <D> [M 1.000E-1] 9.068E2 0.04s |4.550E1| {4.995E-1} 0.000E0 -
Iter 2 evals 2 <D> [M 1.000E0] 6.222E2 0.05s |3.525E1| {3.870E-1} 2.287E-1 -
Iter 3 evals 3 <D> [M 1.000E0] 2.386E2 0.07s |5.406E1| {5.935E-1} 9.334E-1 -
Iter 4 evals 4 <D> [M 1.000E0] 9.082E1 0.08s |1.571E1| {1.724E-1} 2.246E0 -
Iter 5 evals 5 <D> [M 1.000E0] 7.031E1 0.10s |1.181E1| {1.297E-1} 2.379E0 -
Iter 6 evals 6 <D> [M 1.000E0] 5.308E1 0.11s |1.025E1| {1.125E-1} 2.681E0 -
Iter 7 evals 7 <D> [1M 2.740E-1] 2.988E1 0.14s |7.586E0| {8.328E-2} 4.193E0 -
Iter 8 evals 9 <D> [1M 1.292E-1] 2.234E1 0.16s |6.471E0| {7.105E-2} 4.949E0 -
Iter 9 evals 11 <D> [1M 1.801E-1] 1.615E1 0.18s |5.573E0| {6.118E-2} 6.127E0 -
Iter 10 evals 13 <D> [1M 1.815E-1] 1.218E1 0.24s |4.477E0| {4.915E-2} 7.346E0 -
Iter 11 evals 15 <D> [1M 3.119E-1] 8.873E0 0.30s |4.694E0| {5.154E-2} 6.912E0 -
Iter 12 evals 17 <D> [1M 4.760E-1] 6.621E0 0.31s |2.092E0| {2.296E-2} 3.504E0 -
Iter 13 evals 19 <D> [M 1.000E0] 6.093E0 0.32s |1.906E0| {2.092E-2} 1.390E0 -
Iter 14 evals 20 <D> [M 1.000E0] 5.844E0 0.33s |9.067E-1| {9.955E-3} 1.103E0 -
Iter 15 evals 21 <D> [M 1.000E0] 5.721E0 0.33s |5.774E-1| {6.339E-3} 8.279E-1 -
Iter 16 evals 22 <D> [M 1.000E0] 5.660E0 0.34s |3.535E-1| {3.881E-3} 4.279E-1 -
Iter 17 evals 23 <D> [M 1.000E0] 5.640E0 0.35s |1.946E-1| {2.137E-3} 2.961E-1 -
Iter 18 evals 24 <D> [M 1.000E0] 5.632E0 0.36s |7.832E-2| {8.599E-4} 1.868E-1 -
Iter 19 evals 25 <D> [M 1.000E0] 5.631E0 0.38s |3.559E-2| {3.907E-4} 1.163E-1 -
Iter 20 evals 26 <D> [M 1.000E0] 5.631E0 0.39s |2.149E-2| {2.359E-4} 5.758E-2 -
Iter 21 evals 27 <D> [M 1.000E0] 5.631E0 0.41s |1.027E-2| {1.128E-4} 1.758E-2 -
Iter 22 evals 28 <D> [M 1.000E0] 5.631E0 0.42s |3.631E-3| {3.986E-5} 8.218E-3 -
Iter 23 evals 29 <D> [M 1.000E0] 5.631E0 0.44s |1.629E-3| {1.789E-5} 3.791E-3 -
Iter 24 evals 30 <D> [M 1.000E0] 5.631E0 0.45s |9.548E-4| {1.048E-5} 1.596E-3 -
Iter 25 evals 31 <D> [M 1.000E0] 5.631E0 0.45s |5.724E-4| {6.284E-6} 5.196E-4 -
Iter 26 evals 32 <D> [M 1.000E0] 5.631E0 0.47s |1.578E-4| {1.732E-6} 1.686E-4 -
QNMinimizer terminated due to average improvement: | newest_val - previous_val | / |newestVal| < TOL
Total time spent in optimization: 0.49s
CRFClassifier training ... done [0.6 sec].
Serializing classifier to ner-model.ser.gz... done.
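The contents of chinese.meal.fpp.prop are not shown in the post, but the trainer echoes every setting it read. As a sketch, the file can be reconstructed from those echoed key/value pairs. The name `render_props` is hypothetical, and the ordering of the keys is my own choice (the original file's order and any comments in it are unknown).

```python
# Key/value pairs copied from the arguments echoed in the training log above.
PROPS = {
    "trainFile": "chinese.meal.fpp.tsv",
    "serializeTo": "ner-model.ser.gz",
    "map": "word=0,answer=1",
    "useClassFeature": "true",
    "useWord": "true",
    "useNGrams": "true",
    "noMidNGrams": "true",
    "maxNGramLeng": "6",
    "usePrev": "true",
    "useNext": "true",
    "useSequences": "true",
    "usePrevSequences": "true",
    "maxLeft": "1",
    "useTypeSeqs": "true",
    "useTypeSeqs2": "true",
    "useTypeySequences": "true",
    "wordShape": "chris2useLC",
    "useDisjunctive": "true",
}


def render_props(props):
    """Render a dict as Java-properties text, one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


prop_text = render_props(PROPS)
```

Saving `prop_text` as chinese.meal.fpp.prop next to stanford-ner.jar should give the trainer the same settings that appear in the log.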
C:\my_study\ML\NLP\stanford-ner-2018-02-27>java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
Invoked on Thu Mar 22 16:30:48 CST 2018 with arguments: -loadClassifier ner-model.ser.gz -testFile chinese.meal.fpp.test.tsv
testFile=chinese.meal.fpp.test.tsv
loadClassifier=ner-model.ser.gz
Loading classifier from ner-model.ser.gz ... done [0.1 sec].
我 O O
今天 O O
晚上 TIME TIME
吃 O O
了 O O
兩 QUANTITY QUANTITY
盤 UNIT UNIT
回鍋肉 FOOD FOOD
CRFClassifier tagged 8 words in 1 documents at 88.89 words per second.
  Entity       P       R      F1  TP  FP  FN
    FOOD  1.0000  1.0000  1.0000   1   0   0
QUANTITY  1.0000  1.0000  1.0000   1   0   0
    TIME  1.0000  1.0000  1.0000   1   0   0
    UNIT  1.0000  1.0000  1.0000   1   0   0
  Totals  1.0000  1.0000  1.0000   4   0   0
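To make the score table easier to read, here is a rough sketch that recomputes precision, recall and F1 from the three-column tagged output (word, gold label, predicted label). It counts per token, which is a simplification: CRFClassifier reports entity-level scores, and the two only coincide here because every entity in this test sentence is a single token. The function name `score` is hypothetical.

```python
from collections import Counter

# Tagged lines copied from the test run above: word, gold label, predicted label.
TAGGED = """\
我 O O
今天 O O
晚上 TIME TIME
吃 O O
了 O O
兩 QUANTITY QUANTITY
盤 UNIT UNIT
回鍋肉 FOOD FOOD
"""


def score(tagged, background="O"):
    """Return {label: (P, R, F1, TP, FP, FN)} from token-level comparisons."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for line in tagged.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        _word, gold, pred = parts
        if gold == pred:
            if gold != background:
                tp[gold] += 1
        else:
            if pred != background:
                fp[pred] += 1  # predicted an entity label that is wrong
            if gold != background:
                fn[gold] += 1  # missed the gold entity label
    results = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / actual if actual else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results[label] = (p, r, f1, tp[label], fp[label], fn[label])
    return results


scores = score(TAGGED)
```

For multi-token entities, a token-level count like this would diverge from the tool's own table, so it is best treated as a sanity check only.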
Not bad!
Ref:
1. Stanford NLP NER: https://nlp.stanford.edu/software/CRF-NER.html