
Computer Engineering (计算机工程), 2022, Vol. 48, Issue (10): 67-72, 80. doi: 10.19678/j.issn.1000-3428.0062206

• Artificial Intelligence and Pattern Recognition •

Chinese Machine Reading Comprehension Based on Hybrid Attention Mechanism

LIU Gaojun1,2, LI Yaxin1,2, DUAN Jianyong1,2

  1. School of Information, North China University of Technology, Beijing 100144, China;
    2. CNONIX National Standard Application and Promotion Laboratory, North China University of Technology, Beijing 100144, China
  • Received: 2021-07-29  Revised: 2021-11-03  Published: 2021-11-15
  • About the authors: LIU Gaojun (born 1962), male, professor; his main research interests include data processing and software services. LI Yaxin is an M.S. candidate. DUAN Jianyong is a professor.
  • Supported by: National Natural Science Foundation of China (61972003, 61672040).



Abstract: Pre-trained language models perform well in machine reading comprehension. However, compared with their performance on English machine reading comprehension, reading comprehension models built on pre-trained language models perform worse when processing Chinese text and learn only shallow semantic matching information. To improve the model's understanding of Chinese text, this paper proposes a Chinese machine reading comprehension model based on a hybrid attention mechanism. In the encoding layer, the model uses a pre-trained model to obtain the sequence representation and further deepens context interaction through BiLSTM processing. The sequence is then processed by a hybrid attention layer composed of two self-attention variants, which aims to learn deep semantic representations and deepen the understanding of the text's semantic information. The fusion layer combines multiple fusion mechanisms to obtain multi-level representations, so that the output sequence carries richer information. Finally, a two-layer BiLSTM processes the sequence and the output layer predicts the answer position. Experimental results on the CMRC2018 dataset show that, compared with the reproduced baseline model, the EM and F1 values of the proposed model increase by 2.05 and 0.465 percentage points, respectively, indicating that the model learns the deep semantic information of the text and effectively improves on the pre-trained language model.
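The abstract outlines a layered pipeline: pre-trained encoder, BiLSTM, a hybrid attention layer built from two self-attention variants, a fusion layer, a two-layer BiLSTM, and a span-prediction output. The full paper is not reproduced on this page, so the PyTorch sketch below only mirrors that layering under stated assumptions: the concrete self-attention variants, the fusion functions, the hidden sizes, and the RoBERTa checkpoint name (hfl/chinese-roberta-wwm-ext) are illustrative placeholders rather than the authors' implementation.

```python
# A minimal sketch of the layered design described in the abstract:
# encoder -> BiLSTM -> hybrid attention -> fusion -> two-layer BiLSTM -> span output.
# The two attention "variants" and the gated fusion are stand-ins (assumptions),
# since the abstract does not specify them.
import torch
import torch.nn as nn
from transformers import AutoModel


class HybridAttentionReader(nn.Module):
    def __init__(self, plm_name="hfl/chinese-roberta-wwm-ext", hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)      # pre-trained RoBERTa encoder
        d = self.encoder.config.hidden_size
        self.context_lstm = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
        # two self-attention variants, modelled here as two independent multi-head blocks
        self.attn_a = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.attn_b = nn.MultiheadAttention(2 * hidden, num_heads=8, batch_first=True)
        # fusion: gated combination of the two attention views and the BiLSTM states
        self.fuse = nn.Linear(6 * hidden, 2 * hidden)
        self.gate = nn.Linear(6 * hidden, 2 * hidden)
        self.model_lstm = nn.LSTM(2 * hidden, hidden, num_layers=2,
                                  batch_first=True, bidirectional=True)
        self.span_head = nn.Linear(2 * hidden, 2)                # start / end logits

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.context_lstm(h)                              # deepen context interaction
        pad = attention_mask == 0                                # True where tokens are padding
        a, _ = self.attn_a(h, h, h, key_padding_mask=pad)        # attention variant 1
        b, _ = self.attn_b(h, h, h, key_padding_mask=pad)        # attention variant 2
        cat = torch.cat([h, a, b], dim=-1)
        fused = torch.sigmoid(self.gate(cat)) * torch.tanh(self.fuse(cat)) + h
        m, _ = self.model_lstm(fused)                            # two-layer BiLSTM
        start_logits, end_logits = self.span_head(m).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

A faithful reproduction would replace the two generic multi-head blocks and the gated fusion with the attention variants and multiple fusion mechanisms defined in the paper, and train the span head with the usual start/end cross-entropy loss on CMRC2018.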

Key words: Chinese machine reading comprehension, attention mechanism, fusion mechanism, pre-training model, RoBERTa model

CLC Number: