
计算机工程 ›› 2023, Vol. 49 ›› Issue (8): 85-95. doi: 10.19678/j.issn.1000-3428.0065455

• Artificial Intelligence and Pattern Recognition •

Chinese Named Entity Recognition Based on Dilated Gated Convolution Feature Fusion

Changpei YANG1, Liefa LIAO1,2,*

  1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China
    2. School of Software Engineering, Jiangxi University of Science and Technology, Nanchang 330000, Jiangxi, China
  • Received: 2022-08-08 Online: 2023-08-15 Published: 2023-08-15
  • Corresponding author: Liefa LIAO
  • About the first author:

    Changpei YANG (born 1996), male, M.S. candidate; his main research interests are natural language processing and named entity recognition

  • Funding:
    National Natural Science Foundation of China (71462018); National Natural Science Foundation of China (71761018)



Abstract:

In Chinese Named Entity Recognition(NER), the long short-term memory network, with its recurrent structure, addresses long-distance dependencies by capturing temporal features, but it captures features in only one way and its capacity to acquire information is limited. A Convolutional Neural Network(CNN) processes text in parallel through multiple convolutional layers, which speeds up computation and captures the spatial features of the text; however, simply stacking convolutional layers easily leads to vanishing gradients. To obtain multi-dimensional text features while mitigating the vanishing-gradient problem, this paper proposes a Chinese NER model, RoBERTa-wwm-DGCNN-BiLSTM-BMHA-CRF. First, the pre-trained language model RoBERTa-wwm, based on whole-word masking, represents the text as character-level embedding vectors that capture deep contextual semantics. Second, a gating mechanism and a residual structure are used to improve the Dilated CNN(DCNN) and reduce the risk of gradient vanishing; the Bi-directional Long Short-Term Memory(BiLSTM) network and the Dilated Gated CNN(DGCNN) then capture the temporal and spatial characteristics of the text, respectively. Third, a Bi-linear Multi-Head Attention(BMHA) mechanism dynamically fuses the multi-dimensional text features. Finally, a Conditional Random Field(CRF) constrains the output to obtain the optimal label sequence. Experimental results show that the proposed model achieves F1 values of 97.20%, 74.28%, and 95.74% on the Resume, Weibo, and MSRA datasets, respectively, demonstrating its effectiveness for Chinese NER.
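The gated dilated-convolution block described above combines three ideas: dilated convolutions widen the receptive field without extra layers, a sigmoid gate controls how much convolved context flows through, and a residual-style identity path eases gradient propagation. The following minimal sketch illustrates these ideas on scalar features; the weights, function names, and the highway-style blend `y = (1 - g) * x + g * conv(x)` are illustrative assumptions, not the paper's exact configuration:

```python
import math

def dilated_conv1d(x, w, dilation):
    # 1-D dilated convolution with zero ("same") padding: each output
    # position mixes inputs spaced `dilation` steps apart, so stacked
    # layers grow the receptive field rapidly without pooling.
    k = len(w)
    pad = (k - 1) * dilation // 2
    xp = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w[i] * xp[t + i * dilation] for i in range(k))
            for t in range(len(x))]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_dilated_block(x, w_conv, w_gate, dilation):
    # Gate g in (0, 1) blends the convolved context with the unchanged
    # input: y = (1 - g) * x + g * conv(x). The identity term acts as a
    # residual-style path that reduces the risk of vanishing gradients
    # when many such blocks are stacked.
    conv = dilated_conv1d(x, w_conv, dilation)
    gate = [sigmoid(g) for g in dilated_conv1d(x, w_gate, dilation)]
    return [(1.0 - g) * xi + g * ci for xi, g, ci in zip(x, gate, conv)]

# Toy character-feature sequence (one scalar feature per character).
seq = [0.5, -1.0, 2.0, 0.0, 1.5]
out = gated_dilated_block(seq, w_conv=[0.2, 0.5, 0.2],
                          w_gate=[0.1, 0.3, 0.1], dilation=2)
assert len(out) == len(seq)  # gating and residual mixing preserve shape
```

In the paper's setting the inputs are vector-valued character embeddings and the gate is learned jointly with the rest of the network; the highway-style blend shown here is one common formulation of gated dilated convolutions.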

Key words: Named Entity Recognition(NER), RoBERTa-wwm model, dilated convolution, attention mechanism, feature fusion