
Computer Engineering (计算机工程), 2022, Vol. 48, Issue (10): 67-72, 80. doi: 10.19678/j.issn.1000-3428.0062206

• Artificial Intelligence and Pattern Recognition •

Chinese Machine Reading Comprehension Based on Hybrid Attention Mechanism

LIU Gaojun1,2, LI Yaxin1,2, DUAN Jianyong1,2

  1. School of Information, North China University of Technology, Beijing 100144, China;
    2. CNONIX National Standard Application and Promotion Laboratory, North China University of Technology, Beijing 100144, China
  • Received: 2021-07-29  Revised: 2021-11-03  Published: 2021-11-15
  • About the authors: LIU Gaojun (born 1962), male, professor; his main research interests include data processing and software services. LI Yaxin is an M.S. candidate. DUAN Jianyong is a professor.
  • Supported by: National Natural Science Foundation of China (61972003, 61672040).



Abstract: Pre-trained language models perform well in machine reading comprehension. However, compared with their performance on English machine reading comprehension, reading comprehension models built on pre-trained language models perform worse when processing Chinese text and learn only shallow semantic matching information. To improve the model's understanding of Chinese text, this paper proposes a Chinese machine reading comprehension model based on a hybrid attention mechanism. In the encoding layer, the model uses a pre-trained model to obtain the sequence representation and further deepens context interaction through BiLSTM processing. The sequence is then processed by a hybrid attention layer composed of two self-attention variants, which aims to learn deep semantic representations and deepen the understanding of the text's semantic information. The fusion layer combines multiple fusion mechanisms to obtain multi-level representations, so that the output sequence carries richer information. Finally, a two-layer BiLSTM processes the sequence and the output layer predicts the answer position. Experimental results on the CMRC2018 dataset show that, compared with the reproduced baseline model, the EM and F1 values of the proposed model increase by 2.05 and 0.465 percentage points, respectively, indicating that the model learns the deep semantic information of the text and effectively improves on the pre-trained language model.
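The abstract outlines a layered pipeline: pre-trained encoder, BiLSTM, a hybrid attention layer built from two self-attention variants, a fusion layer, a two-layer BiLSTM, and a span-prediction output. The full paper is not reproduced on this page, so the PyTorch sketch below only mirrors that layering under stated assumptions: the concrete self-attention variants, the fusion functions, the hidden sizes, and the RoBERTa checkpoint name (hfl/chinese-roberta-wwm-ext) are illustrative placeholders rather than the authors' implementation.

```python
# A minimal sketch of the layered design described in the abstract:
# encoder -> BiLSTM -> hybrid attention -> fusion -> two-layer BiLSTM -> span output.
# The two attention "variants" and the gated fusion are stand-ins (assumptions),
# since the abstract does not specify them.
import torch
import torch.nn as nn
from transformers import AutoModel


class HybridAttentionReader(nn.Module):
    def __init__(self, plm_name="hfl/chinese-roberta-wwm-ext", hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)      # pre-trained RoBERTa encoder
        d = self.encoder.config.hidden_size
        self.context_lstm = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
        # two self-attention variants, modelled here as two independent multi-head blocks
        self.attn_a = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
        self.attn_b = nn.MultiheadAttention(2 * hidden, num_heads=8, batch_first=True)
        # fusion: gated combination of the two attention views and the BiLSTM states
        self.fuse = nn.Linear(6 * hidden, 2 * hidden)
        self.gate = nn.Linear(6 * hidden, 2 * hidden)
        self.model_lstm = nn.LSTM(2 * hidden, hidden, num_layers=2,
                                  batch_first=True, bidirectional=True)
        self.span_head = nn.Linear(2 * hidden, 2)                # start / end logits

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.context_lstm(h)                              # deepen context interaction
        pad = attention_mask == 0                                # True where tokens are padding
        a, _ = self.attn_a(h, h, h, key_padding_mask=pad)        # attention variant 1
        b, _ = self.attn_b(h, h, h, key_padding_mask=pad)        # attention variant 2
        cat = torch.cat([h, a, b], dim=-1)
        fused = torch.sigmoid(self.gate(cat)) * torch.tanh(self.fuse(cat)) + h
        m, _ = self.model_lstm(fused)                            # two-layer BiLSTM
        start_logits, end_logits = self.span_head(m).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)
```

A faithful reproduction would replace the two generic multi-head blocks and the gated fusion with the attention variants and multiple fusion mechanisms defined in the paper, and train the span head with the usual start/end cross-entropy loss on CMRC2018.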

Key words: Chinese machine reading comprehension, attention mechanism, fusion mechanism, pre-training model, RoBERTa model

CLC Number: