Computer Engineering (计算机工程), 2023, Vol. 49, Issue (5): 90-96. doi: 10.19678/j.issn.1000-3428.0064366

• Artificial Intelligence and Pattern Recognition •

Enterprise-Named Entity Recognition Model Based on Knowledge Distillation

MAO Liang1, ZHAO Linjun1, YU Dunhui1,2, SUN Bin1,2

  1. College of Computer and Information Engineering, Hubei University, Wuhan 430062, China;
    2. Hubei Education Informationization Engineering and Technology Center, Wuhan 430062, China
  • Received: 2022-04-02  Revised: 2022-06-02  Published: 2022-08-22
  • About the authors: MAO Liang (born 1998), male, M.S. candidate, whose research focuses on knowledge graphs; ZHAO Linjun, undergraduate student; YU Dunhui, professor, Ph.D.; SUN Bin (corresponding author), lecturer, M.S.
  • Funding:
    National Key Research and Development Program of China (2017YFB1400602); National Natural Science Foundation of China (61977021); Technological Innovation Special Project of Hubei Province (2018ACA13).

Abstract: The Bidirectional Encoder Representations from Transformers (BERT) word embedding model can resolve the low prediction accuracy of simple named entity recognition models, but complex BERT-based word embedding models suffer from high computational complexity and long prediction time. To address this problem, a named entity recognition model based on knowledge distillation is constructed. A BERT+Conditional Random Field (CRF) model serves as the teacher model to obtain high named entity recognition accuracy, and, following the principle of structural similarity, a Bidirectional Gated Recurrent Unit (BiGRU)+CRF model serves as the student model, with knowledge distillation performed while the student model is trained. The tag probability matrices output by the Softmax layers of the teacher model and the student model are taken as the teacher's knowledge and the student's knowledge, respectively; the gap between them, measured with a mean-square loss function, is the soft-label error, and the error between the tags predicted by the student model and the ground-truth tags is the hard-label error. The total error is a weighted sum of the soft-label and hard-label errors. The model is trained through error backpropagation, which narrows the gap between the teacher's and the student's knowledge while reducing the total error, so that the prediction accuracy of the student model approaches that of the teacher model. The student model is then used for prediction, achieving accuracy close to the teacher model's while keeping the prediction time relatively short. Experimental results on the DuIE2.0 dataset show that, at the cost of a 2.6% loss in F1 value, the model's parameter count is reduced by 93.7% and its computation time by 65.2%.
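As an illustration of the distillation objective described in the abstract, the following PyTorch-style snippet shows how the soft-label error (mean-square loss between the teacher's and student's Softmax probability matrices) and the hard-label error (loss against the ground-truth tags) can be combined into a weighted total error. It is a minimal sketch rather than the authors' implementation: the function name distillation_loss, the weight alpha, and the teacher_model/student_model/batch_tokens/batch_tags names are placeholders, and plain cross-entropy stands in for the CRF negative log-likelihood used by the student model in the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, true_labels, alpha=0.5):
    # Soft-label error: mean-square loss between the tag probability matrices
    # produced by the Softmax layers of the student and the teacher.
    soft_error = F.mse_loss(F.softmax(student_logits, dim=-1),
                            F.softmax(teacher_logits, dim=-1))
    # Hard-label error: error between the student's predicted tags and the
    # ground-truth tags (plain cross-entropy here; the paper's student ends in
    # a CRF layer, whose negative log-likelihood would take this term's place).
    hard_error = F.cross_entropy(student_logits.reshape(-1, student_logits.size(-1)),
                                 true_labels.reshape(-1))
    # Total error: weighted sum of soft- and hard-label errors; backpropagating
    # it trains the student while pulling its outputs toward the teacher's.
    return alpha * soft_error + (1.0 - alpha) * hard_error

# Hypothetical usage: the teacher's outputs are computed without gradients so
# that only the student model is updated.
# with torch.no_grad():
#     teacher_logits = teacher_model(batch_tokens)   # (batch, seq_len, num_tags)
# student_logits = student_model(batch_tokens)       # (batch, seq_len, num_tags)
# loss = distillation_loss(student_logits, teacher_logits, batch_tags)
# loss.backward()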

Key words: knowledge distillation, named entity recognition, teacher model, student model, BERT model

CLC Number: