
Computer Engineering ›› 2024, Vol. 50 ›› Issue (5): 41-50. doi: 10.19678/j.issn.1000-3428.0068121

• Hot Topics and Reviews •


Machine Reading Comprehension Model Based on MacBERT and Adversarial Training

ZHOU Zhaochen1, FANG Qingmao2, WU Xiaohong1, HU Ping2, HE Xiaohai1   

  1. School of Electronic Information, Sichuan University, Chengdu 610065, Sichuan, China;
    2. Sichuan Academy of Chinese Medicine Sciences, Chengdu 610041, Sichuan, China
  • Received:2023-07-20 Revised:2023-11-01 Published:2023-12-29
  • Contact: WU Xiaohong, E-mail: wxh@scu.edu.cn
  • Supported by: Chengdu Major Science and Technology Application Demonstration Project (2019-YF09-00120-SN).


Abstract: Machine reading comprehension aims to enable machines to understand natural language text as humans do and to answer questions accordingly. In recent years, with the development of deep learning and large-scale datasets, machine reading comprehension has attracted widespread attention; in practical applications, however, input questions typically contain noise and interference that can distort a model's predictions. To improve model generalizability and robustness, a machine reading comprehension model based on Masked language modeling as correction Bidirectional Encoder Representations from Transformers (MacBERT) and Adversarial Training (AT) is proposed. First, MacBERT performs word embedding on the input question and text, converting them into vector representations. Next, adversarial samples are generated by adding small perturbations to the original word vectors, guided by the gradients obtained from backpropagation on the original samples. Finally, both the original and adversarial samples are fed into a Bidirectional Long Short-Term Memory (BiLSTM) network, which further extracts contextual features of the text and outputs the predicted answer. Experimental results show that, relative to the baseline model, the proposed model improves the F1 and Exact Match (EM) scores by 1.39 and 3.85 percentage points, respectively, on the simplified Chinese dataset CMRC2018; by 1.22 and 1.71 percentage points on the traditional Chinese dataset DRCD; and by 2.86 and 1.85 percentage points on the English dataset SQuADv1.1, outperforming most existing machine reading comprehension models. A comparison with the baseline model on real question-answering results further verifies that the proposed model is more robust and generalizes better, maintaining stronger performance when the input questions contain noise.
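
The abstract describes the adversarial step only as adding a small gradient-derived perturbation to the original word vectors; this matches the widely used Fast Gradient Method (FGM), in which the perturbation is r = ε·g/‖g‖₂ for embedding gradient g. The sketch below is a minimal, hypothetical PyTorch realization under that assumption: the model class, the hfl/chinese-macbert-base checkpoint, and all hyperparameters are illustrative choices, not details taken from the paper.

    # Minimal sketch (not the authors' code): MacBERT encoder + BiLSTM span head,
    # trained with FGM-style adversarial perturbations on the word embeddings.
    import torch
    import torch.nn as nn
    from transformers import AutoModel

    class MRCModel(nn.Module):
        def __init__(self, encoder_name="hfl/chinese-macbert-base", lstm_hidden=256):
            super().__init__()
            self.encoder = AutoModel.from_pretrained(encoder_name)
            self.bilstm = nn.LSTM(self.encoder.config.hidden_size, lstm_hidden,
                                  batch_first=True, bidirectional=True)
            self.span_head = nn.Linear(2 * lstm_hidden, 2)  # start/end logits

        def forward(self, input_ids, attention_mask):
            h = self.encoder(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
            h, _ = self.bilstm(h)  # further contextual feature extraction
            start_logits, end_logits = self.span_head(h).unbind(dim=-1)
            return start_logits, end_logits

    class FGM:
        """Fast Gradient Method: perturb the embedding matrix by
        r = epsilon * g / ||g||_2, then restore it after the adversarial pass."""
        def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
            self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
            self.backup = {}

        def attack(self):
            for name, p in self.model.named_parameters():
                if p.requires_grad and self.emb_name in name and p.grad is not None:
                    self.backup[name] = p.data.clone()
                    norm = torch.norm(p.grad)
                    if norm != 0 and not torch.isnan(norm):
                        p.data.add_(self.epsilon * p.grad / norm)

        def restore(self):
            for name, p in self.model.named_parameters():
                if name in self.backup:
                    p.data = self.backup[name]
            self.backup.clear()

    def train_step(model, fgm, batch, optimizer):
        """One clean pass plus one adversarial pass, as the abstract describes."""
        criterion = nn.CrossEntropyLoss()
        start_logits, end_logits = model(batch["input_ids"], batch["attention_mask"])
        loss = (criterion(start_logits, batch["start_positions"]) +
                criterion(end_logits, batch["end_positions"]))
        loss.backward()               # gradients of the clean (original) sample
        fgm.attack()                  # perturb embeddings along the gradient
        adv_start, adv_end = model(batch["input_ids"], batch["attention_mask"])
        adv_loss = (criterion(adv_start, batch["start_positions"]) +
                    criterion(adv_end, batch["end_positions"]))
        adv_loss.backward()           # accumulate adversarial gradients
        fgm.restore()                 # undo the perturbation before the update
        optimizer.step()
        optimizer.zero_grad()

In this setup, each training step backpropagates the clean loss, perturbs only the embedding matrix, accumulates the adversarial gradients, restores the embeddings, and then applies a single optimizer update, so the model is fitted on the original and adversarial samples jointly.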

Key words: machine reading comprehension, Adversarial Training (AT), pre-trained model, Masked language modeling as correction Bidirectional Encoder Representations from Transformers (MacBERT), Bidirectional Long Short-Term Memory (BiLSTM) network

CLC Number: