
Computer Engineering (计算机工程), 2022, Vol. 48, Issue (7): 97-103. doi: 10.19678/j.issn.1000-3428.0061790

• Artificial Intelligence and Pattern Recognition •

Event Argument Extraction Method Based on Knowledge Distillation and Model Ensemble

WANG Shihao, WANG Zhongqing, LI Shoushan, ZHOU Guodong

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
  • Received: 2021-05-31  Revised: 2021-09-16  Online: 2022-07-15  Published: 2022-07-12
  • About the authors: WANG Shihao (born 1997), male, M.S. candidate; his main research interest is event argument extraction. WANG Zhongqing (corresponding author), associate professor. LI Shoushan and ZHOU Guodong, professors.
  • Funding: National Natural Science Foundation of China (61806137, 61702518); Natural Science Research General Program of the Jiangsu Higher Education Institutions (18KJB520043).

Abstract: State-of-the-art event argument extraction methods typically use BERT as the encoder, but BERT's huge number of parameters reduces efficiency and prevents such models from running on devices with limited computing resources, leading to high computation cost and high latency. To address these problems, this paper proposes an Event Argument Extraction method based on knowledge Distillation and model Ensemble (EAEDE), which distills an event argument extraction teacher model into two different student models and then ensembles the two students. First, a teacher model combining BERT with a Graph Convolutional Network (GCN) is constructed, along with two student models that use a single-layer Convolutional Neural Network (CNN) and a single-layer Long Short-Term Memory (LSTM) network, respectively. During distillation, the intermediate-layer representations of each student are first matched to those of the teacher with a Mean Squared Error (MSE) loss; the classification layer is then distilled, using an MSE loss and a Cross-Entropy (CE) loss so that the students learn both from the teacher's classification layer and from the ground-truth labels. Finally, the two student models are combined by weighted averaging to improve event argument extraction performance. Experiments on the ACE2005 English dataset show that, compared with the student models alone, the method improves the event argument extraction F1 score by 5.05 percentage points on average, while reducing inference time by 90.85% and the number of parameters by 99.25% relative to the teacher model.
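To make the training objective and inference step concrete, the following is a minimal PyTorch-style sketch of the two-stage distillation loss (MSE on intermediate representations, then MSE plus CE on the classification layer) and the weighted-average ensemble described above. All names (hidden_distill_loss, proj, alpha, w_cnn, w_lstm) and the dimensions in the demo are illustrative assumptions, not the authors' released code.

    # Sketch under assumed shapes/hyperparameters; not the paper's implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    mse = nn.MSELoss()

    def hidden_distill_loss(student_hidden, teacher_hidden, proj):
        # Stage 1: project the student's intermediate representation to the
        # teacher's width and match it with an MSE loss.
        return mse(proj(student_hidden), teacher_hidden)

    def logit_distill_loss(student_logits, teacher_logits, gold_labels, alpha=0.5):
        # Stage 2: learn from the teacher's classification layer (MSE on logits)
        # and from the ground-truth labels (cross-entropy), mixed by alpha.
        soft = mse(student_logits, teacher_logits)
        hard = F.cross_entropy(student_logits, gold_labels)
        return alpha * soft + (1.0 - alpha) * hard

    def ensemble_predict(cnn_logits, lstm_logits, w_cnn=0.5, w_lstm=0.5):
        # Weighted average of the two students' class probabilities.
        probs = w_cnn * F.softmax(cnn_logits, dim=-1) + w_lstm * F.softmax(lstm_logits, dim=-1)
        return probs.argmax(dim=-1)

    if __name__ == "__main__":
        # Toy tensors standing in for model outputs; sizes are illustrative only.
        batch, seq, d_student, d_teacher, n_roles = 2, 8, 128, 768, 36
        proj = nn.Linear(d_student, d_teacher)        # aligns student/teacher hidden sizes
        s_hid = torch.randn(batch, seq, d_student)    # e.g. single-layer CNN/LSTM states
        t_hid = torch.randn(batch, seq, d_teacher)    # e.g. BERT+GCN teacher states
        s_log = torch.randn(batch, n_roles)
        t_log = torch.randn(batch, n_roles)
        gold = torch.randint(0, n_roles, (batch,))
        loss = hidden_distill_loss(s_hid, t_hid, proj) + logit_distill_loss(s_log, t_log, gold)
        print(float(loss))
        print(ensemble_predict(torch.randn(batch, n_roles), torch.randn(batch, n_roles)))

In practice the two stages would be run sequentially during student training, and the ensemble weights could be tuned on a development set rather than fixed at 0.5.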

Key words: event argument extraction, knowledge distillation, model ensemble, pre-trained language model, model compression

CLC number: