An Attention-enhanced Natural Language Reasoning Model

Computer Engineering, 2020, Vol. 46, Issue 7: 91-97. doi: 10.19678/j.issn.1000-3428.0054953

LI Guanyu^a,b, ZHANG Pengfei^a,b, JIA Caiyan^a,b

a. School of Computer and Information Technology; b. Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China

Received: 2019-05-20 Revised: 2019-08-09 Published: 2019-08-20
About the authors: LI Guanyu (born 1993), male, master's student, main research interests: natural language processing and machine learning; ZHANG Pengfei, master's student; JIA Caiyan, professor, Ph.D.
Funding: National Natural Science Foundation of China (61876016); Fundamental Research Funds for the Central Universities (2017JBM023).

Abstract

Attention mechanisms can measure the importance of individual words in natural language processing tasks. Building on this, the paper proposes aESIM, an attention-enhanced natural language inference model. aESIM adds a word attention layer and an adaptive direction weight layer to the bidirectional LSTM network of the ESIM model, so that it learns word and sentence representations more effectively and models local inference between the premise and hypothesis texts more efficiently. Experimental results on the SNLI, MultiNLI, and Quora datasets show that aESIM improves accuracy by 0.5% to 1% compared with ESIM, HBMP, SSE, and other models.

Key words: natural language processing, natural language inference, ESIM model, attention mechanism, bidirectional LSTM network
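
The abstract names two additions to the bidirectional LSTM but gives no equations, so the following is only a rough sketch of the general idea and not the authors' formulation. It assumes a scalar weight per direction (`w_f`, `w_b`) and a dot-product score against a context vector (`context`); in a real model these would be learned parameters, and the learned projection that attention layers usually apply before scoring is omitted. All function and variable names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def combine_directions(forward, backward, w_f, w_b):
    """Mix forward and backward hidden states with one scalar weight per
    direction (an assumed stand-in for an adaptive direction weight layer)."""
    return [
        [w_f * f + w_b * b for f, b in zip(f_state, b_state)]
        for f_state, b_state in zip(forward, backward)
    ]


def word_attention(hidden_states, context):
    """Score every time step against a context vector, normalise the scores
    to attention weights, and rescale each hidden state by its weight."""
    weights = softmax([dot(h, context) for h in hidden_states])
    attended = [[w * x for x in h] for w, h in zip(weights, hidden_states)]
    return weights, attended


# Toy hidden states for a three-word sentence, two dimensions per direction.
forward = [[0.1, 0.3], [0.7, 0.2], [0.4, 0.9]]
backward = [[0.5, 0.1], [0.2, 0.6], [0.3, 0.3]]

mixed = combine_directions(forward, backward, w_f=0.6, w_b=0.4)
weights, attended = word_attention(mixed, context=[1.0, 0.5])
print(weights)
```

The sequence length is preserved here, so the attended states could still feed a later word-by-word matching step between premise and hypothesis; summing `attended` over time would instead give a single sentence vector.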

CLC number: TP18

LI Guanyu, ZHANG Pengfei, JIA Caiyan. An Attention-enhanced Natural Language Reasoning Model[J]. Computer Engineering, 2020, 46(7): 91-97.

https://www.ecice06.com/CN/Y2020/V46/I7/91

