Abstract:
Traditional Chinese Medicine (TCM) medical records are an important data resource for TCM research. They are still recorded mainly as free text, so structured information must be extracted from them before deeper analysis is possible, and named entity extraction is the basic step of that process. Using 413 manually annotated Chinese medical records (with Chinese characters as features) and four types of feature templates, this paper studies the extraction of named entities such as symptoms, diseases and inducing factors. It compares the results of TCM medical record named entity extraction obtained with Conditional Random Field (CRF), Hidden Markov Model (HMM) and Maximum Entropy Markov Model (MEMM). Combined with appropriate feature templates, CRF performs well, with F1 scores of 0.80 for symptoms, 0.74 for disease names and 0.74 for inducing factors. Compared with HMM and MEMM, CRF achieves the highest precision and recall. These preliminary results indicate that CRF is an applicable method for named entity extraction from Chinese medical records.
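As a rough illustration of what character-level feature templates for a CRF tagger can look like, the sketch below expands a few window templates over a short Chinese sentence. The template set, the `<PAD>` marker, the BIO-style labels and the example sentence are all assumptions made for illustration. The abstract does not list the four template types or the labeling scheme actually used.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical templates, written as relative character offsets.
# Illustrative only: the four template types of the paper are not given here.
TEMPLATES: List[Tuple[int, ...]] = [(-1,), (0,), (1,), (-1, 0), (0, 1)]

PAD = "<PAD>"  # assumed marker for positions outside the sentence


def char_at(chars: Sequence[str], i: int) -> str:
    """Return the character at position i, or PAD when i is out of range."""
    return chars[i] if 0 <= i < len(chars) else PAD


def expand_templates(
    chars: Sequence[str],
    templates: Sequence[Tuple[int, ...]] = TEMPLATES,
) -> List[Dict[str, str]]:
    """Build one feature dictionary per character from the offset templates."""
    features = []
    for i in range(len(chars)):
        feats = {}
        for offsets in templates:
            name = "|".join(f"c[{o}]" for o in offsets)
            feats[name] = "|".join(char_at(chars, i + o) for o in offsets)
        features.append(feats)
    return features


# Assumed example: "患者咳嗽" (the patient coughs), with a BIO-style symptom label.
sentence = list("患者咳嗽")
labels = ["O", "O", "B-SYM", "I-SYM"]  # hypothetical label names
feats = expand_templates(sentence)

for ch, lab, f in zip(sentence, labels, feats):
    print(ch, lab, f)
```

Feature dictionaries of this shape, paired with one label per character, are the usual input to a linear-chain CRF trainer. Widening the window or adding longer n-gram templates changes only the `TEMPLATES` list.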
Keywords:
Traditional Chinese Medicine (TCM) medical records,
named entity extraction,
corpus annotation system,
Conditional Random Field (CRF),
feature template
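Abstract (from the Chinese version): TCM clinical medical records are an important research data resource for TCM, but they are still expressed mainly as text. In-depth analysis of the record data presupposes structured information extraction, of which named entity extraction is the foundational step. To address the extraction of named entities such as symptoms, diseases and inducing factors from TCM clinical records, 413 manually annotated records (with Chinese characters as features) and four types of feature templates are used in experiments that apply Conditional Random Field (CRF), Hidden Markov Model (HMM) and Maximum Entropy Markov Model (MEMM) to TCM record named entity extraction, and the results are compared. The results show that, combined with appropriate feature templates, the CRF method achieves good performance, with F1 values of 0.80 for symptoms, 0.74 for disease names and 0.74 for inducing factors. Compared with HMM and MEMM, CRF has the highest precision and recall, making it a fairly suitable method for named entity extraction from TCM clinical records.
Keywords (from the Chinese version):
TCM clinical medical records,
named entity extraction,
corpus annotation system,
Conditional Random Field,
feature template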
LIU Kai, ZHOU Xue-zhong, YU Jian, ZHANG Run-shun. Named Entity Extraction of Traditional Chinese Medicine Medical Records Based on Conditional Random Field[J]. Computer Engineering.
刘凯,周雪忠,于剑,张润顺. 基于条件随机场的中医临床病历命名实体抽取[J]. 计算机工程.