
Computer Engineering ›› 2024, Vol. 50 ›› Issue (2): 98-104. doi: 10.19678/j.issn.1000-3428.0066847

• Artificial Intelligence and Pattern Recognition •

Abstractive Text Summarization Method Incorporating Convolutional Shrinkage Gating

Chenmin GAN1,2,*, Hong TANG1,2, Haolan YANG1,2, Xiaojie LIU1,2, Jie LIU1,2

  1. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  2. Chongqing Key Lab of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received: 2023-01-31 Online: 2024-02-15 Published: 2023-04-18
  • Contact: Chenmin GAN

  • Supported by: Program for Changjiang Scholars and Innovative Research Team in University (IRT_16R72)

Abstract:

Driven by deep learning techniques, the Sequence-to-Sequence (Seq2Seq) model, built on an encoder-decoder architecture combined with an attention mechanism, is one of the most widely used models in text summarization research and has achieved remarkable results, particularly on abstractive text summarization tasks. However, existing models based on Recurrent Neural Networks (RNN) suffer from limited parallelism and low time efficiency, fail to fully summarize useful information, ignore the connections between words and sentences, and tend to produce summaries that are redundant, repetitive, or semantically irrelevant. To address these challenges, a text summarization method based on Transformer and convolutional shrinkage gating is proposed. BERT serves as the encoder, extracting text representations at different levels to obtain a contextual encoding. A convolutional shrinkage gating unit then adjusts the encoding weights, strengthens global relevance, and removes interference from useless information, yielding the filtered final encoding output. Three decoders are designed: a basic Transformer decoding module, a decoding module sharing the encoder, and a decoding module using the Generative Pre-trained Transformer (GPT). These decoders strengthen the association between the encoder and decoder and serve to explore model structures capable of generating high-quality summaries. On both the LCSTS and CNNDM datasets, the evaluation scores of the resulting TCSG, ES-TCSG, and GPT-TCSG models improve by no less than 1.0 over mainstream benchmark models, verifying the validity and feasibility of the method.
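As a rough illustration of the filtering step described in the abstract, the sketch below gives one plausible PyTorch reading of a convolutional shrinkage gating unit applied to BERT encoder states. The module name ConvShrinkageGate, the kernel size, the sigmoid gating network, and the soft-thresholding formula are assumptions made for illustration, not the authors' published implementation.

```python
# A minimal sketch, assuming a PyTorch implementation. The module name
# ConvShrinkageGate, the kernel size, and the gating formula are illustrative
# assumptions; the paper does not publish its code on this page.
import torch
import torch.nn as nn


class ConvShrinkageGate(nn.Module):
    """One plausible reading of a convolutional shrinkage gating unit."""

    def __init__(self, hidden_size: int, kernel_size: int = 3):
        super().__init__()
        # 1-D convolution over the token dimension to capture local context.
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                              padding=kernel_size // 2)
        # Gating network that scales a per-channel soft threshold.
        self.gate = nn.Sequential(nn.Linear(hidden_size, hidden_size),
                                  nn.Sigmoid())

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (batch, seq_len, hidden_size) contextual encoding from BERT.
        x = self.conv(enc.transpose(1, 2)).transpose(1, 2)
        # Per-channel magnitude statistics -> learned thresholds tau.
        stats = x.abs().mean(dim=1)                      # (batch, hidden)
        tau = (stats * self.gate(stats)).unsqueeze(1)    # (batch, 1, hidden)
        # Soft thresholding: shrink weak activations toward zero.
        shrunk = torch.sign(x) * torch.clamp(x.abs() - tau, min=0.0)
        # Residual connection preserves the original encoding after filtering.
        return enc + shrunk


if __name__ == "__main__":
    gate = ConvShrinkageGate(hidden_size=768)
    dummy = torch.randn(2, 16, 768)   # stand-in for BERT encoder outputs
    print(gate(dummy).shape)          # torch.Size([2, 16, 768])
```

Under this reading, soft thresholding with a learned per-channel threshold suppresses weak, likely irrelevant activations, while the residual connection preserves the original contextual encoding, matching the abstract's description of removing interference from useless information before decoding.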

Key words: abstractive text summarization, Sequence-to-Sequence (Seq2Seq) model, Transformer model, BERT encoder, convolutional shrinkage gating unit, decoder
