
计算机工程 (Computer Engineering), 2022, Vol. 48, Issue (10): 73-80. doi: 10.19678/j.issn.1000-3428.0062525

• Artificial Intelligence and Pattern Recognition •

基于CNN-Head Transformer编码器的中文命名实体识别

史占堂1,2,3, 马玉鹏1,2,3, 赵凡1,2,3, 马博1,2,3   

  1. 中国科学院新疆理化技术研究所, 乌鲁木齐 830011;
    2. 中国科学院大学, 北京 100049;
    3. 新疆民族语音语言信息处理实验室, 乌鲁木齐 830011
  • 收稿日期:2021-08-28 修回日期:2021-10-22 发布日期:2021-11-16
  • About the authors: SHI Zhantang (born 1995), male, M.S. candidate; his main research interest is natural language processing. MA Yupeng and ZHAO Fan are research fellows and hold Ph.D. degrees; MA Bo is an associate research fellow and holds a Ph.D. degree.
  • Funding:
    National Key Research and Development Program of China (2018YFC0825300, 2018YFC0823002); Major Special Program of Xinjiang Uygur Autonomous Region (2020A03004-4); "Western Light" Talent Training Program (2018-XBQNXZ-A-003); Youth Innovation Promotion Association of the Chinese Academy of Sciences (Ke Fa Ren Han Zi [2019] No. 26).

Chinese Named Entity Recognition Based on CNN-Head Transformer Encoder

SHI Zhantang1,2,3, MA Yupeng1,2,3, ZHAO Fan1,2,3, MA Bo1,2,3   

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China
  • Received: 2021-08-28; Revised: 2021-10-22; Published: 2021-11-16

摘要: 基于多头自注意力机制的Transformer作为主流特征提取器在多种自然语言处理任务中取得了优异表现,但应用于命名实体识别任务时存在一字多词、增加额外存储与词典匹配时间等问题。提出一种CNN-Head Transformer编码器(CHTE)模型,在未使用外部词典和分词工具的基础上,通过自注意力机制捕获全局语义信息,利用不同窗口大小的CNN获取Transformer中6个注意力头的Value向量,使CHTE模型在保留全局语义信息的同时增强局部特征和潜在词信息表示,并且应用自适应的门控残差连接融合当前层和子层特征,提升了Transformer在命名实体识别领域的性能表现。在Weibo和Resume数据集上的实验结果表明,CHTE模型的F1值相比于融合词典信息的Lattice LSTM和FLAT模型分别提升了3.77、2.24和1.30、0.31个百分点,具有更高的中文命名实体识别准确性。

关键词: 命名实体识别, 自注意力机制, Transformer编码器, 卷积神经网络, 残差连接

Abstract: The Transformer, built on the multi-head self-attention mechanism, is a mainstream feature extractor that achieves excellent performance in a variety of Natural Language Processing (NLP) tasks, but applying it to Named Entity Recognition (NER) raises problems such as one character matching multiple words, extra storage, and lexicon-matching time. This study proposes a Convolutional Neural Network (CNN)-Head Transformer Encoder (CHTE) model that captures global semantic information through the self-attention mechanism without using external lexicons or word segmentation tools. CNNs with different window sizes produce the Value vectors of six attention heads in the Transformer, so the model preserves global semantic information while strengthening local features and latent word representations, and an adaptive gated residual connection fuses the current-layer and sub-layer features, improving the Transformer's performance on NER. Experimental results show that the F1 value of the CHTE model exceeds those of the lexicon-augmented Lattice Long Short-Term Memory (Lattice LSTM) and FLAT-lattice Transformer (FLAT) models by 3.77 and 2.24 percentage points on the Weibo dataset and by 1.30 and 0.31 percentage points on the Resume dataset, respectively, demonstrating higher accuracy in Chinese NER.
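To make the architecture described in the abstract concrete, below is a minimal PyTorch sketch of a CHTE-style attention layer: 1-D CNNs with different window sizes supply the Value vectors of six of the attention heads, the remaining heads use the usual linear Value projection, and an adaptive gate blends the layer input with the sub-layer output. The model dimension, head count, kernel sizes, and the exact form of the gate are illustrative assumptions, not the authors' published configuration.

```python
# Minimal sketch of a CNN-Head attention layer with a gated residual
# connection, assuming PyTorch.  Hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNNHeadAttention(nn.Module):
    def __init__(self, d_model=128, n_heads=8, cnn_heads=6,
                 kernel_sizes=(2, 3, 4, 5, 6, 7)):
        super().__init__()
        assert d_model % n_heads == 0 and len(kernel_sizes) == cnn_heads
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        # Standard linear Value projection for the remaining heads.
        self.v_proj = nn.Linear(d_model, (n_heads - cnn_heads) * self.d_head)
        # One convolution per CNN head; different kernel sizes capture
        # local n-gram (latent word) features of different lengths.
        self.v_convs = nn.ModuleList([
            nn.Conv1d(d_model, self.d_head, k, padding=k // 2)
            for k in kernel_sizes
        ])
        self.out_proj = nn.Linear(d_model, d_model)
        # Adaptive gate for the residual connection (assumed form).
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head)
        k = self.k_proj(x).view(b, t, self.n_heads, self.d_head)
        # Values for the linear heads.
        v_lin = self.v_proj(x).view(b, t, -1, self.d_head)
        # Values for the CNN heads: convolve over the sequence dimension,
        # then trim even-sized kernels back to length t.
        v_cnn = [conv(x.transpose(1, 2))[:, :, :t].transpose(1, 2)
                 for conv in self.v_convs]
        v_cnn = torch.stack(v_cnn, dim=2)      # (b, t, cnn_heads, d_head)
        v = torch.cat([v_cnn, v_lin], dim=2)   # (b, t, n_heads, d_head)

        scores = torch.einsum('bqhd,bkhd->bhqk', q, k) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        ctx = torch.einsum('bhqk,bkhd->bqhd', attn, v).reshape(b, t, -1)
        sub = self.out_proj(ctx)

        # Gated residual connection: per dimension, decide how much of the
        # layer input x versus the sub-layer output to keep.
        g = torch.sigmoid(self.gate(torch.cat([x, sub], dim=-1)))
        return g * x + (1 - g) * sub


# Example usage: layer = CNNHeadAttention(); y = layer(torch.randn(2, 20, 128))
```

The intent of the design, as the abstract describes it, is that the convolutional Value vectors inject local and latent-word information without an external lexicon, while the gate lets each position adaptively weigh the current-layer input against the sub-layer features.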

Key words: Named Entity Recognition (NER), self-attention mechanism, Transformer encoder, Convolutional Neural Network (CNN), residual connection
