
Computer Engineering ›› 2024, Vol. 50 ›› Issue (5): 41-50. doi: 10.19678/j.issn.1000-3428.0068121

• Hot Topics and Reviews •


Machine Reading Comprehension Model Based on MacBERT and Adversarial Training

ZHOU Zhaochen1, FANG Qingmao2, WU Xiaohong1, HU Ping2, HE Xiaohai1   

  1. School of Electronic Information, Sichuan University, Chengdu 610065, Sichuan, China;
    2. Sichuan Academy of Chinese Medicine Sciences, Chengdu 610041, Sichuan, China
  • Received:2023-07-20 Revised:2023-11-01 Published:2023-12-29
  • Contact: WU Xiaohong, E-mail: wxh@scu.edu.cn
  • Supported by: Chengdu Major Science and Technology Application Demonstration Project (2019-YF09-00120-SN).


Abstract: Machine reading comprehension aims to enable machines to understand natural language text as humans do and to answer questions accordingly. In recent years, with the development of deep learning and large-scale datasets, machine reading comprehension has attracted widespread attention; in practical applications, however, input questions typically contain noise and interference that can distort a model's predictions. To improve model generalizability and robustness, a machine reading comprehension model based on Masked language modeling as correction Bidirectional Encoder Representations from Transformers (MacBERT) and Adversarial Training (AT) is proposed. First, MacBERT performs word embedding on the input question and text, converting them into vector representations. Next, adversarial samples are generated by adding small perturbations to the original word vectors, guided by the gradients obtained from backpropagation on the original samples. Finally, both the original and adversarial samples are fed into a Bidirectional Long Short-Term Memory (BiLSTM) network, which further extracts contextual features of the text and outputs the predicted answer. Experimental results show that, relative to the baseline model, the proposed model improves the F1 and Exact Match (EM) scores by 1.39 and 3.85 percentage points, respectively, on the simplified Chinese dataset CMRC2018; by 1.22 and 1.71 percentage points on the traditional Chinese dataset DRCD; and by 2.86 and 1.85 percentage points on the English dataset SQuADv1.1, outperforming most existing machine reading comprehension models. A comparison with the baseline model on real question-answering results further verifies that the proposed model is more robust and generalizes better, maintaining stronger performance when the input questions contain noise.
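
The abstract describes the adversarial step only as adding a small gradient-derived perturbation to the original word vectors; this matches the widely used Fast Gradient Method (FGM), in which the perturbation is r = ε·g/‖g‖₂ for embedding gradient g. The sketch below is a minimal, hypothetical PyTorch realization under that assumption: the model class, the hfl/chinese-macbert-base checkpoint, and all hyperparameters are illustrative choices, not details taken from the paper.

    # Minimal sketch (not the authors' code): MacBERT encoder + BiLSTM span head,
    # trained with FGM-style adversarial perturbations on the word embeddings.
    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class MRCModel(nn.Module):
        def __init__(self, encoder_name="hfl/chinese-macbert-base", lstm_hidden=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            self.bilstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                                  batch_first=True, bidirectional=True)
            self.span_head = nn.Linear(2 * lstm_hidden, 2)  # start/end logits

        def forward(self, input_ids, attention_mask):
            h = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
            h, _ = self.bilstm(h)  # further contextual feature extraction
            start_logits, end_logits = self.span_head(h).unbind(dim=-1)
            return start_logits, end_logits

    class FGM:
        """Fast Gradient Method: perturb the embedding matrix by
        r = epsilon * g / ||g||_2, then restore it after the adversarial pass."""
        def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
            self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
            self.backup = {}

        def attack(self):
            for name, p in self.model.named_parameters():
                if p.requires_grad and self.emb_name in name and p.grad is not None:
                    self.backup[name] = p.data.clone()
                    norm = torch.norm(p.grad)
                    if norm != 0 and not torch.isnan(norm):
                        p.data.add_(self.epsilon * p.grad / norm)

        def restore(self):
            for name, p in self.model.named_parameters():
                if name in self.backup:
                    p.data = self.backup[name]
            self.backup.clear()

    def train_step(model, fgm, batch, optimizer):
        """One clean pass plus one adversarial pass, as the abstract describes."""
        criterion = nn.CrossEntropyLoss()
        start_logits, end_logits = model(batch["input_ids"], batch["attention_mask"])
        loss = (criterion(start_logits, batch["start_positions"]) +
                criterion(end_logits, batch["end_positions"]))
        loss.backward()               # gradients of the clean (original) sample
        fgm.attack()                  # perturb embeddings along the gradient
        adv_start, adv_end = model(batch["input_ids"], batch["attention_mask"])
        adv_loss = (criterion(adv_start, batch["start_positions"]) +
                    criterion(adv_end, batch["end_positions"]))
        adv_loss.backward()           # accumulate adversarial gradients
        fgm.restore()                 # undo the perturbation before the update
        optimizer.step()
        optimizer.zero_grad()

In this setup, each training step backpropagates the clean loss, perturbs only the embedding matrix, accumulates the adversarial gradients, restores the embeddings, and then applies a single optimizer update, so the model is fitted on the original and adversarial samples jointly.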

Key words: machine reading comprehension, Adversarial Training (AT), pre-trained model, Masked language modeling as correction Bidirectional Encoder Representations from Transformers (MacBERT), Bidirectional Long Short-Term Memory (BiLSTM) network

CLC Number: