
Computer Engineering ›› 2021, Vol. 47 ›› Issue (5): 73-79. doi: 10.19678/j.issn.1000-3428.0057416

• Artificial Intelligence and Pattern Recognition •

Intention Classification Method Based on BERT Model and Knowledge Distillation

LIAO Shenglan1, JI Jianmin1, YU Chang2, CHEN Xiaoping1

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, China;
    2. School of Software Engineering, University of Science and Technology of China, Hefei 230031, China
  • Received: 2020-02-18  Revised: 2020-04-14  Published: 2020-04-29
  • About the authors: LIAO Shenglan (b. 1995), female, M.S. candidate, whose main research interests are text classification and semantic parsing; JI Jianmin (corresponding author), associate professor; YU Chang, M.S. candidate; CHEN Xiaoping, professor.
  • Funding:
    National Natural Science Foundation of China (U1613216); Science and Technology Program of Guangdong Province (2017B010110011); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AA000500); 2019 Huawei-USTC Joint Innovation Project on Fundamental System Software.



Abstract: As an important module in dialogue systems, intention classification is a domain-specific short text classification task that has developed from traditional template matching to deep learning methods. With the introduction of the Bidirectional Encoder Representations from Transformers (BERT) model, large-scale pre-trained language models have become the mainstream approach in natural language processing. However, these pre-trained models are huge and require large amounts of data and computing resources to train. To address this problem, this paper proposes an intention classification method based on Knowledge Distillation (KD). The method employs the pre-trained BERT as the "teacher" model, and small-scale models such as the Text Convolutional Neural Network (Text-CNN) as "student" models. The knowledge in the teacher model is transferred to the student models through a large amount of unlabeled data produced by a Generative Adversarial Network (GAN). An intention classification dataset from a real-world power business scenario is used for the experiments, and a large number of unlabeled texts generated by the GAN are used as augmented data. Experimental results on these data show that, by using the teacher model to guide the training of the student models, the intention classification accuracy of the student models can be improved by up to 3.8 percentage points without additional data or computing resources.

Key words: intention classification, pre-trained model, Knowledge Distillation (KD), Generative Adversarial Network (GAN), dialogue system
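To make the teacher-student transfer described in the abstract concrete, the following is a minimal sketch of a standard temperature-scaled knowledge distillation loss in PyTorch. The temperature value, tensor shapes, and variable names are illustrative assumptions for exposition, not details taken from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then measure how far
    # the student's predictions are from the teacher's soft labels.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    # The t**2 factor (Hinton et al.) keeps gradient magnitudes comparable
    # across different temperature settings.
    return F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t ** 2)

# Toy usage: on unlabeled (e.g. GAN-generated) texts, only the teacher's
# soft predictions supervise the student; no gold intent labels are needed.
batch_size, num_intents = 8, 10
teacher_logits = torch.randn(batch_size, num_intents)  # would come from the fine-tuned BERT teacher
student_logits = torch.randn(batch_size, num_intents, requires_grad=True)  # would come from the Text-CNN student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(f"distillation loss: {loss.item():.4f}")

In a full pipeline, this soft-label loss on the generated texts would typically be combined with an ordinary cross-entropy term on the labeled power-business data, but that weighting is a design choice the abstract does not specify.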

CLC Number: