
Computer Engineering (计算机工程), 2022, Vol. 48, Issue (7): 97-103. doi: 10.19678/j.issn.1000-3428.0061790

• Artificial Intelligence and Pattern Recognition •

Event Argument Extraction Method Based on Knowledge Distillation and Model Ensemble

WANG Shihao, WANG Zhongqing, LI Shoushan, ZHOU Guodong

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2021-05-31  Revised: 2021-09-16  Online: 2022-07-15  Published: 2022-07-12
  • About the authors: WANG Shihao (born 1997), male, M.S. candidate; his main research interest is event argument extraction. WANG Zhongqing (corresponding author), associate professor. LI Shoushan and ZHOU Guodong, professors.
  • Funding: National Natural Science Foundation of China (61806137, 61702518); Natural Science Research General Program of the Jiangsu Higher Education Institutions (18KJB520043).

Abstract: State-of-the-art event argument extraction methods typically use BERT as the encoder, but BERT's huge number of parameters reduces efficiency and prevents such models from running on devices with limited computing resources, leading to high computation cost and high latency. To address these problems, this paper proposes an Event Argument Extraction method based on knowledge Distillation and model Ensemble (EAEDE), which distills an event argument extraction teacher model into two different student models and then ensembles the two students. First, a teacher model combining BERT with a Graph Convolutional Network (GCN) is constructed, along with two student models that use a single-layer Convolutional Neural Network (CNN) and a single-layer Long Short-Term Memory (LSTM) network, respectively. During distillation, the intermediate-layer representations of each student are first matched to those of the teacher with a Mean Squared Error (MSE) loss; the classification layer is then distilled, using an MSE loss and a Cross-Entropy (CE) loss so that the students learn both from the teacher's classification layer and from the ground-truth labels. Finally, the two student models are combined by weighted averaging to improve event argument extraction performance. Experiments on the ACE2005 English dataset show that, compared with the student models alone, the method improves the event argument extraction F1 score by 5.05 percentage points on average, while reducing inference time by 90.85% and the number of parameters by 99.25% relative to the teacher model.
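To make the training objective and inference step concrete, the following is a minimal PyTorch-style sketch of the two-stage distillation loss (MSE on intermediate representations, then MSE plus CE on the classification layer) and the weighted-average ensemble described above. All names (hidden_distill_loss, proj, alpha, w_cnn, w_lstm) and the dimensions in the demo are illustrative assumptions, not the authors' released code.

    # Sketch under assumed shapes/hyperparameters; not the paper's implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    mse = nn.MSELoss()

    def hidden_distill_loss(student_hidden, teacher_hidden, proj):
        # Stage 1: project the student's intermediate representation to the
        # teacher's width and match it with an MSE loss.
        return mse(proj(student_hidden), teacher_hidden)

    def logit_distill_loss(student_logits, teacher_logits, gold_labels, alpha=0.5):
        # Stage 2: learn from the teacher's classification layer (MSE on logits)
        # and from the ground-truth labels (cross-entropy), mixed by alpha.
        soft = mse(student_logits, teacher_logits)
        hard = F.cross_entropy(student_logits, gold_labels)
        return alpha * soft + (1.0 - alpha) * hard

    def ensemble_predict(cnn_logits, lstm_logits, w_cnn=0.5, w_lstm=0.5):
        # Weighted average of the two students' class probabilities.
        probs = w_cnn * F.softmax(cnn_logits, dim=-1) + w_lstm * F.softmax(lstm_logits, dim=-1)
        return probs.argmax(dim=-1)

    if __name__ == "__main__":
        # Toy tensors standing in for model outputs; sizes are illustrative only.
        batch, seq, d_student, d_teacher, n_roles = 2, 8, 128, 768, 36
        proj = nn.Linear(d_student, d_teacher)        # aligns student/teacher hidden sizes
        s_hid = torch.randn(batch, seq, d_student)    # e.g. single-layer CNN/LSTM states
        t_hid = torch.randn(batch, seq, d_teacher)    # e.g. BERT+GCN teacher states
        s_log = torch.randn(batch, n_roles)
        t_log = torch.randn(batch, n_roles)
        gold = torch.randint(0, n_roles, (batch,))
        loss = hidden_distill_loss(s_hid, t_hid, proj) + logit_distill_loss(s_log, t_log, gold)
        print(float(loss))
        print(ensemble_predict(torch.randn(batch, n_roles), torch.randn(batch, n_roles)))

In practice the two stages would be run sequentially during student training, and the ensemble weights could be tuned on a development set rather than fixed at 0.5.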

Key words: event argument extraction, knowledge distillation, model ensemble, pre-trained language model, model compression

CLC number: