Computer Engineering, 2021, Vol. 47, Issue (11): 84-92. doi: 10.19678/j.issn.1000-3428.0059810

• Artificial Intelligence and Pattern Recognition •

Chinese Named Entity Recognition Based on Knowledge Enhancement

HU Xinbang1, YU Xuqiao2, LI Shaomei1, ZHANG Jianpeng1

  1. Institute of Information Technology, PLA Strategic Support Force Information Engineering University, Zhengzhou 450003, China;
    2. The University of Melbourne, Melbourne 3010, Australia
  • Received: 2020-10-23 Revised: 2020-12-21 Published: 2020-12-24
  • About the authors: HU Xinbang (born 1995), male, master's student, main research interests: natural language processing and knowledge graphs; YU Xuqiao, undergraduate student; LI Shaomei, associate research fellow, Ph.D.; ZHANG Jianpeng, assistant research fellow, Ph.D.
  • Funding:
    Youth Program of the National Natural Science Foundation of China (62002384); National Key Research and Development Program of China (2016QY03D0502); Zhengzhou Major Special Project for Collaborative Innovation (162/32410218).

Abstract: Chinese Named Entity Recognition (CNER) models based on joint character-word representations can use both character-level and word-level information. However, Out-of-Vocabulary (OOV) words strongly affect them, and they tend to be insufficiently trained on small-scale datasets. To address these problems, this paper builds on the existing LR-CNN model and proposes a knowledge-enhanced CNER model. The model uses a multi-head attention mechanism with relative position encoding to better capture contextual information. It also injects prior knowledge through an entity dictionary, which reduces the impact of OOV words and strengthens the model's learning ability. Experimental results on the MSRA, People Daily, Resume, and Weibo datasets show that the model achieves clearly higher F1 values than SoftLexicon, FLAT, and other models. It does so while keeping decoding fast and computational resource usage low, and it shows strong robustness and generalization ability.

Key words: Chinese Named Entity Recognition (CNER), attention mechanism, knowledge enhancement, Out-of-Vocabulary (OOV) word, small-scale dataset
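The abstract names multi-head attention with relative position encoding but gives no formulation. Below is a minimal pure-Python sketch that assumes the Shaw et al. (2018) scheme on the key side only. In that scheme, a learned vector for the clipped relative distance j - i is added to each key before the scaled dot product, so a score depends on both the direction and the distance between two characters. The class name, the `max_rel` clipping parameter, and the random (untrained) weights are illustrative assumptions, not the paper's implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class RelativeMultiHeadAttention:
    """Self-attention in which each head adds a learned embedding of the clipped
    relative distance (j - i) to the key (Shaw et al., 2018 style, key side only).
    Weights are random placeholders, not trained parameters."""

    def __init__(self, d_model: int, n_heads: int, max_rel: int, seed: int = 0):
        assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
        rng = random.Random(seed)
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.max_rel = max_rel
        self.w_q = rand_matrix(d_model, d_model, rng)
        self.w_k = rand_matrix(d_model, d_model, rng)
        self.w_v = rand_matrix(d_model, d_model, rng)
        # Per head: one vector for each clipped distance in [-max_rel, max_rel].
        self.rel_k = [rand_matrix(2 * max_rel + 1, self.d_head, rng) for _ in range(n_heads)]
        self.last_weights: List[Matrix] = []

    def _rel_index(self, i: int, j: int) -> int:
        """Map relative distance j - i to a row of rel_k, clipping to +/- max_rel."""
        return max(-self.max_rel, min(self.max_rel, j - i)) + self.max_rel

    def __call__(self, x: Matrix) -> Matrix:
        q, k, v = matmul(x, self.w_q), matmul(x, self.w_k), matmul(x, self.w_v)
        n = len(x)
        out: Matrix = [[] for _ in range(n)]
        self.last_weights = []
        for h in range(self.n_heads):
            lo, hi = h * self.d_head, (h + 1) * self.d_head
            qh = [row[lo:hi] for row in q]
            kh = [row[lo:hi] for row in k]
            vh = [row[lo:hi] for row in v]
            head_weights: Matrix = []
            for i in range(n):
                scores = []
                for j in range(n):
                    rel = self.rel_k[h][self._rel_index(i, j)]
                    key = [a + b for a, b in zip(kh[j], rel)]  # position-aware key
                    dot = sum(a * b for a, b in zip(qh[i], key))
                    scores.append(dot / math.sqrt(self.d_head))
                w = softmax(scores)
                head_weights.append(w)
                # Weighted sum of this head's value vectors, appended per head.
                out[i].extend(sum(w[j] * vh[j][d] for j in range(n)) for d in range(self.d_head))
            self.last_weights.append(head_weights)
        return out  # heads concatenated; the output projection W_O is omitted


# Usage: 5 characters with 8-dimensional embeddings, 2 heads.
chars = rand_matrix(5, 8, random.Random(1))
attn = RelativeMultiHeadAttention(d_model=8, n_heads=2, max_rel=3)
y = attn(chars)
print(len(y), len(y[0]))  # 5 8
```

Clipping distances beyond `max_rel` keeps the number of relative embeddings fixed, whatever the sentence length. Absolute sinusoidal encodings instead fold position into the input vectors once, before attention.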
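The abstract says prior knowledge enters through an entity dictionary but does not say how matches are encoded. One common approach is sketched here as an assumption. It loosely follows the BMES lexicon tagging used by SoftLexicon. The code finds every dictionary entry that occurs in the sentence and tags each character with the Begin/Middle/End/Single role it plays in each matched entity, together with that entity's type. The function names, the `max_len` cap, and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def match_entities(sentence: str, entity_dict: Dict[str, str], max_len: int = 6) -> List[Span]:
    """Return every dictionary entry found in the sentence; overlapping matches are kept."""
    matches: List[Span] = []
    for start in range(len(sentence)):
        for length in range(1, max_len + 1):
            end = start + length
            if end > len(sentence):
                break
            word = sentence[start:end]
            if word in entity_dict:
                matches.append((start, end, entity_dict[word]))
    return matches


def char_dictionary_features(sentence: str, matches: List[Span]) -> List[Set[str]]:
    """Tag each character with BMES-style labels for every matched entity covering it."""
    features: List[Set[str]] = [set() for _ in sentence]
    for start, end, etype in matches:
        if end - start == 1:
            features[start].add(f"S-{etype}")  # single-character entity
            continue
        features[start].add(f"B-{etype}")
        for i in range(start + 1, end - 1):
            features[i].add(f"M-{etype}")
        features[end - 1].add(f"E-{etype}")
    return features


# Usage with a toy (hypothetical) entity dictionary.
sentence = "我在北京大学读书"
lexicon = {"北京": "LOC", "北京大学": "ORG"}
spans = match_entities(sentence, lexicon, max_len=4)
feats = char_dictionary_features(sentence, spans)
print(spans)  # [(2, 4, 'LOC'), (2, 6, 'ORG')]
```

Both overlapping readings (北京 as a location and 北京大学 as an organization) are kept, so the downstream model decides from context which one applies. A word absent from the training data can still receive these features if it appears in the dictionary.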
