Computer Engineering ›› 2020, Vol. 46 ›› Issue (8): 297-304. doi: 10.19678/j.issn.1000-3428.0055363

• Development Research and Engineering Application •

Named Entity Recognition Method for Laotian in Military Field Combining CRF and Rules

HE Yangyu1, YAN Lei2, YI Mianzhu1, LI Hongxin1,3

  1. PLA Strategic Support Force Information Engineering University (Luoyang Campus), Luoyang, Henan 471003, China;
  2. College of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;
  3. State Key Laboratory of Cryptology, Beijing 100878, China
  • Received: 2019-07-02 Revised: 2019-08-21 Published: 2019-09-02
  • About the authors: HE Yangyu (born 1992), male, Ph.D. candidate, whose main research interests are natural language processing and knowledge graphs; YAN Lei, master's student; YI Mianzhu, professor and doctoral supervisor; LI Hongxin, lecturer, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61701539); Open Project of the State Key Laboratory of Cryptology (MMKFKT201825); National Defense Science and Technology Innovation Special Zone Project.

Abstract: To address the inaccurate formulation and incomplete coverage of rules in Laotian Named Entity Recognition (NER) for the military field, this paper proposes a method that combines a Conditional Random Field (CRF) with rules. Based on an analysis of the characteristics of the Laotian language and of domain texts, atomic features such as the word, part of speech, generic name, boundary word and dictionary membership are selected to build combined feature templates. A CRF model is trained on a self-built annotated corpus and evaluated on a test corpus. To correct the misrecognized cases, rules that express linguistic certainty are added as a post-processing step to improve recognition performance. Experimental results show that the overall precision, recall and F-measure of the method reach 91.49%, 90.96% and 91.22% respectively, effectively improving Laotian NER in the military field.

Key words: Named Entity Recognition (NER), military field, Laotian, Conditional Random Field (CRF), Information Extraction (IE)
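
The abstract names the atomic features (word, part of speech, general or generic name, boundary word, dictionary) and a rule-based post-processing step, but gives no concrete templates or rules. The following is only a minimal sketch of how such features and one deterministic rule might be expressed. The resource sets, the window size, the feature names, the label scheme and both function names are hypothetical placeholders, not taken from the paper.

```python
from typing import Dict, List

# Hypothetical resources (placeholder English strings, not from the paper).
GENERIC_NAMES = {"division", "regiment"}   # generic-name cue words
BOUNDARY_WORDS = {"of", "at"}              # words that often delimit an entity
LEXICON = {"Vientiane": "LOC"}             # dictionary entry -> entity type


def atomic_features(words: List[str], pos_tags: List[str], i: int) -> Dict[str, str]:
    """Atomic features of the token at position i."""
    w = words[i]
    return {
        "w": w,
        "pos": pos_tags[i],
        "generic": str(w in GENERIC_NAMES),
        "boundary": str(w in BOUNDARY_WORDS),
        "dict": LEXICON.get(w, "NONE"),
    }


def token_features(words: List[str], pos_tags: List[str], i: int,
                   window: int = 1) -> Dict[str, str]:
    """Atomic features over a context window plus two combined features."""
    feats: Dict[str, str] = {}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(words):
            for name, value in atomic_features(words, pos_tags, j).items():
                feats[f"{name}[{off}]"] = value
        else:
            # Mark positions that fall outside the sentence.
            feats[f"pad[{off}]"] = "BOS" if j < 0 else "EOS"
    # Combined features: conjunctions of atomic features.
    feats["w[0]|pos[0]"] = f"{words[i]}|{pos_tags[i]}"
    if i > 0:
        feats["pos[-1]|pos[0]"] = f"{pos_tags[i - 1]}|{pos_tags[i]}"
    return feats


def apply_rules(words: List[str], labels: List[str]) -> List[str]:
    """Hypothetical post-processing rule: a dictionary word that the
    model left as 'O' is relabelled as the start of an entity."""
    fixed = list(labels)
    for i, w in enumerate(words):
        if fixed[i] == "O" and w in LEXICON:
            fixed[i] = f"B-{LEXICON[w]}"
    return fixed
```

In a setup like this, the per-token feature dictionaries would be passed to a CRF toolkit for training, and `apply_rules` would run on the predicted label sequence afterwards. Which toolkit, templates and rules the authors actually used is not stated in this abstract.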

CLC Number:
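
The three figures reported in the abstract can be checked against each other. The sketch below assumes that the F measure is the balanced F1 score (the harmonic mean of the first two figures, with beta = 1); the abstract does not state the weighting, and the function name `f_measure` is illustrative.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
f1 = f_measure(91.49, 90.96)
print(round(f1, 2))  # prints 91.22
```

Under that assumption the result matches the reported 91.22%, so the three reported values are mutually consistent.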