
Computer Engineering ›› 2021, Vol. 47 ›› Issue (8): 78-83, 92. doi: 10.19678/j.issn.1000-3428.0058838

• Artificial Intelligence and Pattern Recognition •

BERT-Based Chinese Named Entity Recognition Method in Motor Field

GU Yiran1, HUO Jianlin1, YANG Haigen2, LU Yifei1, GUO Yuwen1

  1. College of Automation & College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing 210023, China;
    2. Engineering Research Center of Wideband Wireless Communication Technology, Ministry of Education, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
  • Received: 2020-07-06  Revised: 2020-08-11  Published: 2020-08-27
  • About the authors: GU Yiran (b. 1972), female, professor, Ph.D.; her main research interests are complex networks and embedded systems. HUO Jianlin is a master's student; YANG Haigen is an associate professor with a Ph.D.; LU Yifei and GUO Yuwen are master's students.
  • Funding: National ministry-level fund.

Abstract: To address the relatively low accuracy of entity recognition in the motor field, a Chinese Named Entity Recognition (NER) method incorporating the BERT pre-training language model is proposed. The BERT model is used to enhance the semantic representation of characters and to dynamically generate character vectors according to contextual features. The character-vector sequence is then fed into a Bidirectional Long Short-Term Memory (BiLSTM) neural network for bidirectional encoding, and the entity recognition results are labeled by a Conditional Random Field (CRF) layer. A self-built data set is annotated according to the characteristics of motor-related texts, and the entities are divided into four categories: physical objects, characteristic descriptions, problems/faults, and methods/technologies. Experimental results show that, compared with entity recognition methods based on BiLSTM-CRF, BiLSTM-CNN and BiGRU, the proposed method achieves higher accuracy, recall and F1 value, and it effectively solves the problems of insufficient annotated data and fuzzy entity boundaries in NER tasks for the motor field.
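To make the pipeline described in the abstract concrete, the following is a minimal sketch of a BERT-BiLSTM-CRF tagger, not the authors' implementation: it assumes PyTorch, the Hugging Face transformers package with the public bert-base-chinese checkpoint, and the third-party pytorch-crf package; the BIO tag set, hidden size, and example sentence are illustrative assumptions only.

import torch.nn as nn
from transformers import BertModel, BertTokenizerFast
from torchcrf import CRF  # third-party package: pip install pytorch-crf

# Illustrative BIO tag set for the four entity categories named in the abstract
TAGS = ["O",
        "B-OBJ", "I-OBJ",          # physical objects
        "B-CHAR", "I-CHAR",        # characteristic descriptions
        "B-FAULT", "I-FAULT",      # problems/faults
        "B-METHOD", "I-METHOD"]    # methods/technologies

class BertBiLstmCrf(nn.Module):
    def __init__(self, bert_name="bert-base-chinese", lstm_hidden=256, num_tags=len(TAGS)):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)   # context-dependent character vectors
        self.lstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                            batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)    # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)          # sequence-level label decisions

    def _emissions(self, input_ids, attention_mask):
        # BERT generates a character vector for each position according to its context
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        # BiLSTM encodes the character-vector sequence in both directions
        encoded, _ = self.lstm(hidden)
        return self.emit(encoded)

    def forward(self, input_ids, attention_mask, tags):
        # Training objective: negative log-likelihood of the gold tag sequence under the CRF
        emissions = self._emissions(input_ids, attention_mask)
        return -self.crf(emissions, tags, mask=attention_mask.bool())

    def decode(self, input_ids, attention_mask):
        # Viterbi decoding returns the most likely tag sequence for each sentence
        emissions = self._emissions(input_ids, attention_mask)
        return self.crf.decode(emissions, mask=attention_mask.bool())

# Interface demo on a hypothetical motor-domain sentence (the model here is untrained,
# so the predicted tags are meaningless until it is fine-tuned on labeled data).
tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertBiLstmCrf()
batch = tokenizer(["电机轴承过热导致转速下降"], return_tensors="pt")
predicted = model.decode(batch["input_ids"], batch["attention_mask"])
print([TAGS[i] for i in predicted[0]])

The CRF layer models dependencies between adjacent tags, which is what allows valid BIO transitions to be enforced on top of the BiLSTM's per-character scores.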

Key words: Named Entity Recognition (NER), BERT pre-training language model, motor field, deep learning, transfer learning

CLC Number: