Computer Engineering (计算机工程)


Additional Knowledge Enhanced Chinese Named Entity Recognition


  • Published: 2020-12-24 Online: 2020-12-24



Abstract: Character-word hybrid models for Chinese named entity recognition can exploit both character-level and word-level information and have attracted wide attention in recent years, but such models are strongly affected by out-of-vocabulary words and learn insufficiently on small datasets. To address this, this paper proposes AKE, a model built on LR-CNN that incorporates prior knowledge. It uses a multi-head attention mechanism with relative position encoding to improve the capture of contextual information, and injects prior knowledge through an entity dictionary to reduce the impact of out-of-vocabulary words and strengthen the model's learning ability. Experimental results show that, compared with SoftLexicon, FLAT, and other models, while maintaining fast decoding and low resource usage, the model improves the F1 value by 0.12%, 0.12%, 0.06%, and 2.69% on the MSRA, People Daily, Resume, and Weibo datasets respectively, and also performs well on OntoNotes5.0 and Boson.

Abstract: Chinese named entity recognition (CNER) models that fuse character-level and word-level information have attracted wide attention recently for their strong capacity for information integration. However, such models are hampered by out-of-vocabulary (OOV) words and perform unstably on small datasets. To address these problems, this paper proposes the Additional Knowledge Enhanced (AKE) model, which builds on the LR-CNN model and effectively exploits prior knowledge. The model uses a multi-head attention mechanism with relative position encoding to improve context modelling, and injects prior knowledge through an entity dictionary to reduce the impact of OOV words and enhance the model's generalization ability. Experimental results show that, compared with SoftLexicon, FLAT, and other models, the proposed model increases the F1 value by 0.12%, 0.12%, 0.06%, and 2.69% on MSRA, People Daily, Resume, and Weibo respectively, and achieves satisfactory results on OntoNotes5.0 and Boson, while maintaining high inference speed and low resource usage.
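The attention mechanism the abstract describes can be illustrated for a single head roughly as follows. This is a minimal numpy sketch of scaled dot-product attention with a learned relative-position bias in the style of Shaw et al.'s relative position encoding; the function and parameter names (`relative_attention`, `rel_emb`, `max_dist`) are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relative_attention(X, Wq, Wk, Wv, rel_emb, max_dist):
    """One attention head with a relative-position bias (illustrative).

    X:       (seq_len, d_model) input character representations
    Wq/Wk/Wv:(d_model, d_head) projection matrices
    rel_emb: (2*max_dist + 1, d_head) learned relative-position embeddings
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    n, d = Q.shape
    # standard content-content scores
    scores = Q @ K.T / np.sqrt(d)
    # clipped signed distance i - j, shifted into [0, 2*max_dist]
    idx = np.clip(np.arange(n)[:, None] - np.arange(n)[None, :],
                  -max_dist, max_dist) + max_dist
    # content-position scores: each query attends to a distance embedding
    rel_scores = np.einsum('id,ijd->ij', Q, rel_emb[idx]) / np.sqrt(d)
    return softmax(scores + rel_scores) @ V
```

In a full multi-head layer this computation would run once per head with separate projections, with the heads' outputs concatenated; the relative bias lets attention weights depend on how far apart two characters are, not just on their content.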
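The entity-dictionary idea can be sketched with a simple substring lookup: every dictionary entry found in a sentence yields a candidate span whose type can be fed to the model as extra evidence, which is how a lexicon can help with words never seen in training. This brute-force matcher is a hypothetical illustration of the general technique, not the paper's matching procedure:

```python
def lexicon_match(sentence, entity_dict):
    """Enumerate all dictionary entities occurring in the sentence.

    sentence:    a string of characters (e.g. a Chinese sentence)
    entity_dict: mapping from entity string to entity type
    Returns a list of (start, end, word, type) spans, end exclusive.
    """
    matches = []
    n = len(sentence)
    for i in range(n):
        for j in range(i + 1, n + 1):
            word = sentence[i:j]
            if word in entity_dict:
                matches.append((i, j, word, entity_dict[word]))
    return matches
```

For example, with a toy dictionary `{"北京": "LOC", "北京大学": "ORG"}`, the sentence "我在北京大学" yields both the LOC span for 北京 and the longer ORG span for 北京大学; a real system would additionally resolve such overlapping matches (production matchers would also use a trie rather than this O(n²) scan).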