
Computer Engineering, 2025, Vol. 51, Issue (1): 312-320. doi: 10.19678/j.issn.1000-3428.0068177

• Development Research and Engineering Application •


Text Summarization Method Incorporating RNN and Sparse Self-Attention

LIU Zhong1,2,*, TANG Hong1,2, WANG Ningzhe1,2, ZHU Chuanrun1,2

  1. School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
    2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-08-01  Online: 2025-01-15  Published: 2024-04-02
  • Contact: LIU Zhong
  • Supported by: National Natural Science Foundation of China (61971080)


Abstract:

With the rapid development of deep learning, text summarization methods based on the Sequence-to-Sequence (Seq2Seq) architecture have become a research focus. However, most existing text summarization models are limited by long-term dependencies and ignore both the complexity of the attention mechanism and the impact of word-order information on summary generation; the resulting summaries lose key information and deviate from the content and intent of the original text, degrading the user experience. To address these issues, an improved Transformer-based text summarization method that combines a Recurrent Neural Network (RNN) with sparse self-attention is proposed. First, a windowed RNN module divides the input text into windows; each RNN compresses the word-order information within its window, and the window-level representations are then integrated into a representation of the whole text, strengthening the model's ability to capture local dependencies. Second, a cache module based on a recursive loop mechanism carries the representation of the previous text segment over to the current segment, allowing the model to better capture long-term dependencies and global information. Finally, a sparse self-attention module partitions the attention matrix into blocks via a block-sparse matrix and attends only to the important token pairs it selects, rather than spreading attention evenly over all token pairs, which lowers the time complexity of attention and improves the efficiency of long-text summarization. Experimental results show that, compared with the LoBART model, the proposed method lowers the BPC score by 0.02 on the text8 and enwik8 datasets and lowers the PPL score by more than 1.0 on the wikitext-103 and ptb datasets, validating its feasibility and effectiveness.
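To make the first two components more concrete, the following is a minimal sketch in Python (PyTorch is an assumed dependency); the names WindowRNN and SegmentCache, the choice of a GRU cell, and all tensor shapes are illustrative assumptions rather than the authors' implementation. Each fixed-size window of token embeddings is compressed by the RNN into one window-level vector, and a cache keeps the previous segment's representations, detached from the gradient, so the next segment can reuse them.

    import torch
    import torch.nn as nn


    class WindowRNN(nn.Module):
        """Compress the word-order information inside each fixed-size window
        of token embeddings into a single window-level vector with a GRU."""

        def __init__(self, dim, window_size):
            super().__init__()
            self.window_size = window_size
            self.rnn = nn.GRU(dim, dim, batch_first=True)

        def forward(self, x):                        # x: (batch, seq_len, dim)
            b, n, d = x.shape
            w = self.window_size                     # assumes seq_len % window_size == 0
            windows = x.reshape(b * (n // w), w, d)  # one GRU pass per window
            _, h = self.rnn(windows)                 # h: (1, b * n // w, d)
            return h.reshape(b, n // w, d)           # window-level representations


    class SegmentCache:
        """Recurrence across text segments: hidden states of the previous
        segment are cached without gradient and prepended as extra memory."""

        def __init__(self):
            self.memory = None

        def extend(self, hidden):                    # hidden: (batch, len, dim)
            if self.memory is None:
                out = hidden
            else:
                out = torch.cat([self.memory, hidden], dim=1)
            self.memory = hidden.detach()            # stop gradients across segments
            return out


    if __name__ == "__main__":
        win = WindowRNN(dim=64, window_size=16)
        cache = SegmentCache()
        seg1 = torch.randn(2, 128, 64)               # first text segment
        seg2 = torch.randn(2, 128, 64)               # next text segment
        mem = cache.extend(win(seg1))                # (2, 8, 64)
        mem = cache.extend(win(seg2))                # (2, 16, 64): previous + current
        print(mem.shape)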

Key words: Sequence-to-Sequence (Seq2Seq) architecture, text summarization, Transformer model, Recurrent Neural Network (RNN), recursive loop mechanism, sparse self-attention mechanism
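The sparse self-attention step described in the abstract can be sketched in the same spirit: score block pairs instead of token pairs, keep only the diagonal block plus a few top-scoring blocks per query block, and mask out the rest. The selection rule, the function name block_sparse_attention, and the fact that the sketch still materializes a full token-level mask (a real implementation would compute only the kept blocks) are simplifying assumptions, not the paper's exact procedure.

    import torch
    import torch.nn.functional as F


    def block_sparse_attention(q, k, v, block_size=64, keep_blocks=2):
        """q, k, v: (batch, seq_len, dim); seq_len must be divisible by block_size."""
        b, n, d = q.shape
        nb = n // block_size                                  # number of blocks per sequence

        # Coarse block-level scores: average queries/keys inside each block,
        # then score block pairs instead of all token pairs.
        qb = q.reshape(b, nb, block_size, d).mean(dim=2)      # (b, nb, d)
        kb = k.reshape(b, nb, block_size, d).mean(dim=2)      # (b, nb, d)
        block_scores = qb @ kb.transpose(-2, -1) / d ** 0.5   # (b, nb, nb)

        # Always keep the diagonal block (local context) plus the
        # top-`keep_blocks` highest-scoring blocks for each query block.
        keep = torch.zeros_like(block_scores, dtype=torch.bool)
        keep |= torch.eye(nb, dtype=torch.bool, device=q.device)
        topk = block_scores.topk(min(keep_blocks, nb), dim=-1).indices
        keep.scatter_(-1, topk, True)

        # Expand the block mask to token level and run masked attention.
        mask = (keep.repeat_interleave(block_size, dim=1)
                    .repeat_interleave(block_size, dim=2))    # (b, n, n)
        scores = q @ k.transpose(-2, -1) / d ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v                  # (b, n, d)


    if __name__ == "__main__":
        x = torch.randn(2, 256, 64)
        print(block_sparse_attention(x, x, x).shape)          # torch.Size([2, 256, 64])

Under these assumptions, the attention cost per query block scales with the number of kept blocks rather than with the full sequence length, which is the efficiency argument the abstract makes for long-text summarization.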