Research Progress in Chinese Electronic Medical Record Named Entity Recognition (CNER)

Date: 2021-01-29


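The goal of Chinese clinical named entity recognition (Chinese-CNER) on electronic medical records is to identify and extract clinically relevant entity mentions from the plain text of a given electronic medical record and to assign each mention to a predefined category. I recently put some material I had collected on CNER research progress on GitHub. It mainly covers a list of Chinese-CNER papers and the current state-of-the-art results on the main datasets. I hope it is useful to readers interested in CNER.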

GitHub: https://github.com/lingluodlut/Chinese-BioNLP
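To make the task definition concrete: entity recognition on Chinese clinical text is commonly framed as character-level sequence labeling, where each character receives a tag that marks entity boundaries and type. The sketch below is a minimal illustration, assuming the BIO tagging scheme and hypothetical label names (`BODY`, `SYMPTOM`). The sentence, the span format, and the `spans_to_bio` helper are made up for illustration and do not come from the repository or from any CCKS corpus.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end_exclusive, entity_type), in character offsets.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Convert character-offset entity spans into one BIO tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans):
        if not (0 <= start < end <= len(text)):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first character of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining characters of the entity
    return tags


# Made-up sentence ("the patient has abdominal pain"), not taken from any dataset.
text = "患者腹部疼痛"
spans = [(2, 4, "BODY"), (4, 6, "SYMPTOM")]
tags = spans_to_bio(text, spans)

for ch, tag in zip(text, tags):
    print(ch, tag)
```

A tagger is then trained to predict `tags` from `text`; decoding the predicted tags back into spans gives the extracted entities and their categories.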

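Papers on Chinese EMR Entity Recognition

Many methods have been proposed for entity recognition in Chinese electronic medical records. This work concentrates on domain-specific features: starting from general-domain NER methods, it adds Chinese character features and electronic medical record knowledge features to improve model performance.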

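Survey papers

電子病歷命名實體識別和實體關係抽取研究綜述 (A survey of named entity recognition and entity relation extraction for electronic medical records). 楊錦鋒, 於秋濱, 關毅, et al. Acta Automatica Sinica, 2014, 40(8):1537-1561. [paper]
中文電子病歷的命名實體識別研究進展 (Research progress in named entity recognition for Chinese electronic medical records). 楊飛洪, 張宇, 覃露, et al. China Digital Medicine, 2020, 15(02):9-12. [paper]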
Overview of CCKS 2018 Task 1: Named Entity Recognition in Chinese Electronic Medical Records. Zhang J, Li J, Jiao Z, et al. In China Conference on Knowledge Graph and Semantic Computing, Springer, 2019:158-164. [paper]
Overview of the CCKS 2019 Knowledge Graph Evaluation Track: Entity, Relation, Event and QA. Han X, Wang Z, Zhang J, et al. arXiv preprint, 2020, arXiv:2003.03875. [paper]

Method papers

HITSZ_CNER: a hybrid system for entity recognition from Chinese clinical text. Hu J, Shi X, Liu Z, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2017), Chengdu, China, 2017:1-6. [paper]
Clinical named entity recognition from Chinese electronic health records via machine learning methods. Zhang Y, Wang X, Hou Z, et al. JMIR medical informatics. 2018;6(4):e50. [paper]
A BiLSTM-CRF Method to Chinese Electronic Medical Record Named Entity Recognition. Ji B, Liu R, Li S, et al. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence, 2018:1-6. [paper]
A multitask bi-directional RNN model for named entity recognition on Chinese electronic medical records. Chowdhury S, Dong X, Qian L, et al. BMC bioinformatics. 2018, 19(17):75-84. [paper]
A Conditional Random Fields Approach to Clinical Name Entity Recognition. Yang X, Huang W. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2018). Tianjin, China, 2018:1-6. [paper]
DUTIR at the CCKS-2018 Task1: A Neural Network Ensemble Approach for Chinese Clinical Named Entity Recognition. Luo L, Li N, Li S, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2018). Tianjin, China, 2018:1-6. [paper]
Incorporating dictionaries into deep neural networks for the chinese clinical named entity recognition. Wang Q, Zhou Y, Ruan T, et al. Journal of biomedical informatics, 2019, 92: 103133. [paper]
A hybrid approach for named entity recognition in Chinese electronic medical record. Ji B, Liu R, Li S, et al. BMC medical informatics and decision making. 2019 Apr;19(2):149-58. [paper]
Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field. Qiu J, Zhou Y, Wang Q, et al. IEEE Transactions on NanoBioscience. 2019, 18(3):306-315. [paper]
An attention-based deep learning model for clinical named entity recognition of Chinese electronic medical records. Li L, Zhao J, Hou L, et al. BMC medical informatics and decision making. 2019, 19(5):1-1. [paper]
Chinese clinical named entity recognition with word-level information incorporating dictionaries. Lu N, Zheng J, Wu W, et al. In 2019 International Joint Conference on Neural Networks (IJCNN), 2019,1-8. [paper]
Fine-tuning BERT for joint entity and relation extraction in Chinese medical text. Xue K, Zhou Y, Ma Z, et al. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2019, 892-897. [paper]
Chinese clinical named entity recognition with radical-level feature and self-attention mechanism. Yin M, Mou C, Xiong K, et al. Journal of biomedical informatics. 2019, 98:103289. [paper]
Adversarial training based lattice LSTM for Chinese clinical named entity recognition. Zhao S, Cai Z, Chen H, et al. Journal of biomedical informatics. 2019, 99:103290. [paper]
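基於句子級 Lattice-長短記憶神經網絡的中文電子病歷命名實體識別 (Chinese EMR named entity recognition based on a sentence-level lattice long short-term memory network). 潘璀然, 王青華, 湯步洲, et al. Academic Journal of Second Military Medical University, 2019, 40(05):497-507. [paper]
基於BERT與模型融合的醫療命名實體識別 (Medical named entity recognition based on BERT and model ensembling). 喬銳, 楊笑然, 黃文亢. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2019). [paper]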
Noisy Label Learning for Chinese Medical Named Entity Recognition Based on Uncertainty Strategy. Li Z, Gan Z, Zhang B, et al. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2020) [paper]
基於BERT與字形字音特徵的醫療命名實體識別 (Medical named entity recognition based on BERT with glyph and pronunciation features). 晏陽天, 趙新宇, 吳賢. Proceedings of the Evaluation Tasks at the China Conference on Knowledge Graph and Semantic Computing (CCKS 2020). [paper]
Cross domains adversarial learning for Chinese named entity recognition for online medical consultation. Wen G, Chen H, Li H, et al. Journal of Biomedical Informatics. 2020 Dec 1;112:103608. [paper]
Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree. Wang C, Wang H, Zhuang H, et al. Journal of Biomedical Informatics. 2020, 111:103583. [paper]
Chinese Clinical Named Entity Recognition in Electronic Medical Records: Development of a Lattice Long Short-Term Memory Model With Contextualized Character Representations. Li Y, Wang X, Hui L, et al. JMIR Medical Informatics. 2020;8(9):e19848. [paper]
Chinese clinical named entity recognition with variant neural structures based on BERT methods. Li X, Zhang H, Zhou XH. Journal of biomedical informatics. 2020, 107:103422. [paper]
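融入語言模型和注意力機制的臨牀電子病歷命名實體識別 (Clinical EMR named entity recognition incorporating a language model and an attention mechanism). 唐國強, 高大啓, 阮彤, et al. Computer Science, 2020, 47(03):211-216. [paper]
基於筆畫ELMo和多任務學習的中文電子病歷命名實體識別研究 (Chinese EMR named entity recognition based on stroke ELMo and multi-task learning). 羅凌, 楊志豪, 宋雅文, et al. Chinese Journal of Computers, 2020, 43(10):1943-1957. [paper]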

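Performance of Existing Methods for Chinese EMR Entity Recognition

This section lists the datasets for the Chinese EMR entity recognition task and the performance of systems on each of them. Publicly available annotated Chinese EMR data remain very scarce. To improve CNER performance on Chinese clinical text, the China Conference on Knowledge Graph and Semantic Computing (CCKS) has organized a named entity recognition evaluation task for Chinese electronic medical records in each of the past few years, so the results below focus on the CCKS CNER datasets.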

CCKS 2017

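CCKS17 dataset: the original dataset is split into a training set and a test set. The training set contains 300 medical records, manually annotated with five entity types (symptoms and signs, examinations and tests, diseases and diagnoses, treatments, and body parts). The test set contains 100 medical records.

Corpus statistics

	Symptoms and signs	Examinations and tests	Diseases and diagnoses	Treatments	Body parts	Total
Training set	7,831	9,546	722	1,048	10,719	29,866
Test set	2,311	3,143	553	465	3,021	9,493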
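As a quick check of the table above, the sketch below sums the per-type counts of the CCKS 2017 training set and computes each type's share of all annotated entities. The numbers are copied from the training-set row; the English label names and variable names are assumptions chosen for illustration.

```python
# Entity counts copied from the CCKS 2017 training-set row of the table above.
train_counts = {
    "symptoms and signs": 7831,
    "examinations and tests": 9546,
    "diseases and diagnoses": 722,
    "treatments": 1048,
    "body parts": 10719,
}
reported_total = 29866  # "Total" column of the same row

# The per-type counts should add up to the reported total.
total = sum(train_counts.values())
assert total == reported_total

# Share of each entity type, in percent of all annotated entities.
shares = {label: 100 * count / total for label, count in train_counts.items()}
for label, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{label:<24} {share:5.1f}%")
```

The per-type counts are strongly imbalanced: diseases/diagnoses and treatments together make up only a small fraction of the training entities, and these are also the two types with the lowest per-type F-scores among the systems that report them below.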

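Performance comparison of existing methods (F-score, %)

Method	Symptoms and signs	Examinations and tests	Diseases and diagnoses	Treatments	Body parts	Overall	Paper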
HIT-CNER (Hu et al., 2017) Top1	96.00	94.43	78.97	81.47	87.48	91.14	HITSZ_CNER: a hybrid system for entity recognition from Chinese clinical text
BiLSTM-CRF-DIC (Wang et al., 2019)	-	-	-	-	-	91.24	Incorporating dictionaries into deep neural networks for the chinese clinical named entity recognition
RD-CNN-CRF (Qiu et al., 2019)	-	-	-	-	-	91.32	Chinese Clinical Named Entity Recognition Using Residual Dilated Convolutional Neural Network with Conditional Random Field
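Tang et al. (2019)	-	-	-	-	-	91.34	融入語言模型和注意力機制的臨牀電子病歷命名實體識別 (Clinical EMR named entity recognition incorporating a language model and an attention mechanism)
PDET Feature in Model-II (Lu et al., 2019)	-	-	-	-	-	92.68	Chinese Clinical Named Entity Recognition with Word-Level Information Incorporating Dictionaries
BiLSTM-CRF-SP+ELMo (Luo et al., 2020)	95.37	94.94	81.13	83.32	88.74	91.75	基於筆畫ELMo和多任務學習的中文電子病歷命名實體識別研究 (Chinese EMR named entity recognition based on stroke ELMo and multi-task learning)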
FT-BERT + BiLSTM + CRF+Fea (Li et al., 2020)	96.57	94.09	81.26	82.62	88.37	91.60	Chinese clinical named entity recognition with variant neural structures based on BERT methods

Note: "Top" marks systems that ranked in the top three of the evaluation at the time.
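The F-scores in these tables are the harmonic mean of precision and recall over predicted entities. The sketch below is a minimal illustration, assuming strict matching, where a predicted entity counts as correct only if its document, boundaries, and type all match a gold entity. The tuple layout, the `strict_prf` helper, and the toy data are hypothetical, and the official CCKS scoring scripts may differ in detail.

```python
from typing import Iterable, Tuple

# Hypothetical entity format: (doc_id, start, end_exclusive, entity_type).
Entity = Tuple[int, int, int, str]


def strict_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact boundary and type match."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: two exact matches, one boundary error, one spurious prediction.
gold = [(0, 2, 4, "BODY"), (0, 4, 6, "SYMPTOM"), (1, 0, 3, "DISEASE")]
pred = [(0, 2, 4, "BODY"), (0, 3, 6, "SYMPTOM"), (1, 0, 3, "DISEASE"), (1, 5, 7, "TREATMENT")]

p, r, f = strict_prf(gold, pred)
print(f"P={p:.4f} R={r:.4f} F1={f:.4f}")
```

Restricting `gold` and `pred` to one entity type before calling the helper gives a per-type score of the kind shown in the per-type columns, while scoring all types together gives an overall score.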

CCKS 2018

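CCKS18 dataset: the original dataset consists of a training set and a test set. The training set contains 600 medical records, manually annotated with five entity types (anatomical sites, symptom descriptions, independent symptoms, drugs, and surgeries). The test set contains 400 raw medical records.

Corpus statistics

	Anatomical sites	Symptom descriptions	Independent symptoms	Drugs	Surgeries	Total
Training set	9,472	2,484	3,712	1,221	1,329	18,218
Test set	6,339	918	1,327	813	735	10,132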

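Performance comparison of existing methods (F-score, %)

Method	Anatomical sites	Symptom descriptions	Independent symptoms	Drugs	Surgeries	Overall	Paper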
Alihealth Lab (Yang and Huang) (2018) Top1	87.97	90.59	92.45	94.49	85.43	89.13	A Conditional Random Fields Approach to Clinical Name Entity Recognition
DUTIR (Luo et al., 2018) Top3	87.59	90.77	91.72	91.53	86.41	88.63	DUTIR at the CCKS-2018 Task1: A Neural Network Ensemble Approach for Chinese Clinical Named Entity Recognition
BiLSTM-CRF (Ji et al., 2018)	86.65	89.13	90.69	91.15	85.61	87.68	A BiLSTM-CRF Method to Chinese Electronic Medical Record Named Entity Recognition
Lattice-LSTM (Pan et al., 2019)	-	-	-	-	-	89.75	基於句子級 Lattice-長短記憶神經網絡的中文電子病歷命名實體識別 (Chinese EMR named entity recognition based on a sentence-level lattice long short-term memory network)
Attention-BiLSTM-CRF + all (Ji et al., 2019)	-	-	-	-	-	90.82	A hybrid approach for named entity recognition in Chinese electronic medical record
MSD_DT_NER (Wang et al., 2020)	88.01	92.57	90.71	94.58	85.62	89.88	Chinese medical named entity recognition based on multi-granularity semantic dictionary and multimodal tree
BiLSTM-CRF-SP+ELMo (Luo et al., 2020)	89.69	91.83	92.01	91.30	86.22	90.05	基於筆畫ELMo和多任務學習的中文電子病歷命名實體識別研究 (Chinese EMR named entity recognition based on stroke ELMo and multi-task learning)
FT-BERT + BiLSTM + CRF+Fea (Li et al., 2020)	89.12	90.66	92.94	87.99	87.59	89.56	Chinese clinical named entity recognition with variant neural structures based on BERT methods

Note: "Top" marks systems that ranked in the top three of the evaluation at the time.

CCKS 2019

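CCKS19 dataset: the original dataset consists of a training set and a test set. The training set contains 1,000 medical records, manually annotated with six entity types (diseases and diagnoses, examinations, laboratory tests, surgeries, drugs, and anatomical sites). The test set contains 379 raw medical records.

Corpus statistics (number of unique entities)

	Diseases and diagnoses	Examinations	Laboratory tests	Surgeries	Drugs	Anatomical sites	Total
Training set	2,116	222	318	765	456	1,486	5,363
Test set	682	91	193	140	263	447	1,816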

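Performance comparison of existing methods (F-score, %)

Method	Diseases and diagnoses	Examinations	Laboratory tests	Surgeries	Drugs	Anatomical sites	Overall	Paper
Alihealth (Qiao et al., 2019) Top1	84.29	86.29	76.94	83.33	96.02	86.18	85.62	基於BERT與模型融合的醫療命名實體識別 (Medical named entity recognition based on BERT and model ensembling)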
MSIIP (Liu et al., 2019) Top2	-	-	-	-	-	-	85.59	Team MSIIP at CCKS 2019 Task 1
DUTIR (Li et al., 2019) Top3	82.81	88.01	75.65	86.79	94.49	85.99	85.16	DUTIR at the CCKS-2019 Task 1: Improving Chinese clinical named entity recognition using stroke ELMo and transfer learning

Note: "Top" marks systems that ranked in the top three of the evaluation at the time.

CCKS 2020

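CCKS20 dataset: the original dataset consists of a training set and a test set. The training set contains 1,050 medical records, manually annotated with six entity types (diseases and diagnoses, examinations, laboratory tests, surgeries, drugs, and anatomical sites). The test set has not been released.

Corpus statistics

	Diseases and diagnoses	Examinations	Laboratory tests	Surgeries	Drugs	Anatomical sites	Total
Training set	4,345	1,002	1,297	923	1,935	8,811	18,313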

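Performance comparison of existing methods (F-score, %)

Method	Diseases and diagnoses	Examinations	Laboratory tests	Surgeries	Drugs	Anatomical sites	Overall	Paper
CASIA_Unisound (Li et al., 2020) Top1	90.93	89.96	85.94	94.85	93.56	91.62	91.56	Noisy Label Learning for Chinese Medical Named Entity Recognition Based on Uncertainty Strategy
TMAIL (Yan et al., 2020) Top2	90.53	88.47	83.50	96.21	93.75	92.00	91.54	基於BERT與字形字音特徵的醫療命名實體識別 (Medical named entity recognition based on BERT with glyph and pronunciation features)
ChiEHRBert (Yang et al., 2020) Top3	91.10	88.62	85.71	95.52	92.93	91.16	91.24	基於 ChiEHRBert 與多模型融合的醫療命名實體識別 (Medical named entity recognition based on ChiEHRBert and multi-model ensembling)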