
Computer Engineering ›› 2021, Vol. 47 ›› Issue (5): 73-79. doi: 10.19678/j.issn.1000-3428.0057416

• Artificial Intelligence and Pattern Recognition •

Intention Classification Method Based on BERT Model and Knowledge Distillation

LIAO Shenglan1, JI Jianmin1, YU Chang2, CHEN Xiaoping1

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China;
    2. School of Software Engineering, University of Science and Technology of China, Hefei 230031, China
  • Received: 2020-02-18  Revised: 2020-04-14  Published: 2020-04-29
  • About the authors: LIAO Shenglan (b. 1995), female, M.S. candidate, whose main research interests are text classification and semantic parsing; JI Jianmin (corresponding author), associate professor; YU Chang, M.S. candidate; CHEN Xiaoping, professor.
  • Funding:
    National Natural Science Foundation of China (U1613216); Science and Technology Program of Guangdong Province (2017B010110011); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AA000500); 2019 Huawei-USTC Joint Innovation Project on Fundamental System Software.



Abstract: As an important module in dialogue systems, intention classification is a domain-specific short text classification task that has developed from traditional template matching to deep learning methods. With the introduction of the Bidirectional Encoder Representations from Transformers (BERT) model, large-scale pre-trained language models have become the mainstream approach in natural language processing. However, these pre-trained models are huge and require large amounts of data and computing resources to train. To address this problem, this paper proposes an intention classification method based on Knowledge Distillation (KD). The method employs the pre-trained BERT as the "teacher" model, and small-scale models such as the Text Convolutional Neural Network (Text-CNN) as "student" models. The knowledge in the teacher model is transferred to the student models through a large amount of unlabeled data produced by a Generative Adversarial Network (GAN). An intention classification dataset from a real-world power business scenario is used for the experiments, and a large number of unlabeled texts generated by the GAN are used as augmented data. Experimental results on these data show that, by using the teacher model to guide the training of the student models, the intention classification accuracy of the student models can be improved by up to 3.8 percentage points without additional data or computing resources.

Key words: intention classification, pre-trained model, Knowledge Distillation (KD), Generative Adversarial Network (GAN), dialogue system
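To make the teacher-student transfer described in the abstract concrete, the following is a minimal sketch of a standard temperature-scaled knowledge distillation loss in PyTorch. The temperature value, tensor shapes, and variable names are illustrative assumptions for exposition, not details taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then measure how far
    # the student's predictions are from the teacher's soft labels.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # The t**2 factor (Hinton et al.) keeps gradient magnitudes comparable
    # across different temperature settings.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t ** 2)

# Toy usage: on unlabeled (e.g. GAN-generated) texts, only the teacher's
# soft predictions supervise the student; no gold intent labels are needed.
batch_size, num_intents = 8, 10
teacher_logits = torch.randn(batch_size, num_intents)  # would come from the fine-tuned BERT teacher
student_logits = torch.randn(batch_size, num_intents, requires_grad=True)  # would come from the Text-CNN student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")

In a full pipeline, this soft-label loss on the generated texts would typically be combined with an ordinary cross-entropy term on the labeled power-business data, but that weighting is a design choice the abstract does not specify.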

CLC Number: