
计算机工程(Computer Engineering), 2024, Vol. 50, Issue (2): 98-104. doi: 10.19678/j.issn.1000-3428.0066847

• 人工智能与模式识别 •

融合卷积收缩门控的生成式文本摘要方法

甘陈敏1,2,*, 唐宏1,2, 杨浩澜1,2, 刘小洁1,2, 刘杰1,2

  1. 重庆邮电大学通信与信息工程学院, 重庆 400065
    2. 重庆邮电大学移动通信技术重庆市重点实验室, 重庆 400065
  • 收稿日期:2023-01-31 出版日期:2024-02-15 发布日期:2023-04-18
  • 通讯作者: 甘陈敏
  • 基金资助:
    长江学者和创新团队发展计划(IRT_16R72)

Abstractive Text Summarization Method Incorporating Convolutional Shrinkage Gating

Chenmin GAN1,2,*, Hong TANG1,2, Haolan YANG1,2, Xiaojie LIU1,2, Jie LIU1,2

  1. College of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
    2. Chongqing Key Laboratory of Mobile Communications Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
  • Received:2023-01-31 Online:2024-02-15 Published:2023-04-18
  • Contact: Chenmin GAN

摘要:

在深度学习技术的推动下,基于编码器-解码器架构并结合注意力机制的序列到序列模型成为文本摘要研究中应用最广泛的模型之一,尤其在生成式文本摘要任务中取得显著效果。然而,现有的采用循环神经网络的模型存在并行能力不足和时效低下的局限性,无法充分概括有用信息,忽视单词与句子间的联系,易产生冗余重复或语义不相关的摘要。为此,提出一种基于Transformer和卷积收缩门控的文本摘要方法。利用BERT作为编码器,提取不同层次的文本表征得到上下文编码,采用卷积收缩门控单元调整编码权重,强化全局相关性,去除无用信息的干扰,过滤后得到最终的编码输出,并通过设计基础Transformer解码模块、共享编码器的解码模块和采用生成式预训练Transformer(GPT)的解码模块3种不同的解码器,加强编码器与解码器的关联,以此探索能生成高质量摘要的模型结构。在LCSTS和CNNDM数据集上的实验结果表明,相比主流基准模型,设计的TCSG、ES-TCSG和GPT-TCSG模型的评价分数增量均不低于1.0,验证了该方法的有效性和可行性。
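
The abstract describes the convolutional shrinkage gating unit only at a high level, so the following PyTorch sketch is merely one way to realize the stated idea: a 1-D convolution over the BERT hidden states, a learned soft threshold that shrinks weak activations, and a gate that mixes the filtered signal back into the original encoding. The class name ConvShrinkageGate, the layer sizes, and the thresholding scheme are assumptions for illustration, not the paper's actual definition.

```python
import torch
import torch.nn as nn

class ConvShrinkageGate(nn.Module):
    """Illustrative convolutional shrinkage gating unit (all details are
    assumptions): a 1-D convolution extracts local features, a small
    bottleneck predicts per-channel soft thresholds, weak activations are
    shrunk toward zero, and a gate mixes the filtered signal with the
    original encoding."""

    def __init__(self, hidden_size: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(hidden_size, hidden_size, kernel_size,
                              padding=kernel_size // 2)
        # Bottleneck that predicts a per-channel shrinkage threshold in (0, 1).
        self.threshold_net = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.ReLU(),
            nn.Linear(hidden_size // 4, hidden_size),
            nn.Sigmoid(),
        )
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_size), e.g. BERT's last_hidden_state.
        c = self.conv(h.transpose(1, 2)).transpose(1, 2)        # local features
        scale = c.abs().mean(dim=1)                             # (batch, hidden)
        tau = scale * self.threshold_net(scale)                 # soft thresholds
        shrunk = torch.sign(c) * torch.clamp(c.abs() - tau.unsqueeze(1), min=0.0)
        g = torch.sigmoid(self.gate(torch.cat([h, shrunk], dim=-1)))
        return g * h + (1.0 - g) * shrunk                       # filtered output
```

A quick shape check: ConvShrinkageGate(768)(torch.randn(2, 16, 768)) returns a tensor of shape (2, 16, 768), i.e. the unit filters the contextual encoding without changing its dimensions.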

关键词: 生成式文本摘要, 序列到序列模型, Transformer模型, BERT编码器, 卷积收缩门控单元, 解码器

Abstract:

Driven by deep learning, the Sequence to Sequence (Seq2Seq) model, built on an encoder-decoder architecture with an attention mechanism, has become one of the most widely used models in text summarization research and has achieved remarkable results in abstractive summarization tasks. However, existing models based on Recurrent Neural Networks (RNN) offer limited parallelism and low time efficiency, fail to fully capture useful information, ignore the connections between words and sentences, and tend to produce redundant, repetitive, or semantically irrelevant summaries. To address these problems, a text summarization method based on the Transformer and convolutional shrinkage gating is proposed. BERT is used as the encoder to extract text representations at different levels and obtain contextual encodings. A convolutional shrinkage gating unit then adjusts the encoding weights to strengthen global relevance and filter out the interference of useless information, yielding the final encoder output. Three decoders are designed, namely a basic Transformer decoding module, a decoding module that shares the encoder, and a decoding module based on the Generative Pre-trained Transformer (GPT), to strengthen the association between the encoder and decoder and to explore model structures capable of generating high-quality summaries. Experimental results on the LCSTS and CNNDM datasets show that, compared with mainstream baseline models, the proposed TCSG, ES-TCSG, and GPT-TCSG models improve the evaluation scores by no less than 1.0, verifying the validity and feasibility of the method.
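
To make the overall pipeline concrete, the sketch below wires a BERT encoder, the gating unit sketched above, and a standard Transformer decoder into a TCSG-style model using PyTorch and the Hugging Face transformers library. The class name TCSGSketch, the bert-base-chinese checkpoint, and all hyperparameters are illustrative assumptions; the shared-encoder (ES-TCSG) and GPT-based (GPT-TCSG) variants mentioned in the abstract would swap out the decoder stack and are not shown here.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TCSGSketch(nn.Module):
    """Illustrative TCSG-style pipeline: BERT encoder -> convolutional
    shrinkage gate -> Transformer decoder -> token logits.
    Names and sizes are assumptions, not the paper's implementation."""

    def __init__(self, model_name: str = "bert-base-chinese", num_layers: int = 6):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        # Reuses the ConvShrinkageGate sketched after the Chinese abstract above.
        self.gate = ConvShrinkageGate(hidden)
        layer = nn.TransformerDecoderLayer(d_model=hidden, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.lm_head = nn.Linear(hidden, self.encoder.config.vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Contextual encoding from BERT, then filtered by the shrinkage gate.
        memory = self.encoder(input_ids=src_ids,
                              attention_mask=src_mask).last_hidden_state
        memory = self.gate(memory)
        # Reuse BERT's embedding table for the summary-side input tokens.
        tgt = self.encoder.embeddings(tgt_ids)
        # Causal mask: each position attends only to earlier summary tokens.
        seq_len = tgt_ids.size(1)
        causal = torch.triu(torch.ones(seq_len, seq_len, device=tgt_ids.device),
                            diagonal=1).bool()
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                      # (batch, seq_len, vocab)
```

Training such a sketch would follow the standard Seq2Seq recipe: minimize the cross-entropy between the output logits and the reference summary tokens shifted by one position, then decode with greedy or beam search at inference time.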

Key words: abstractive text summarization, Sequence to Sequence (Seq2Seq) model, Transformer model, BERT encoder, convolutional shrinkage gating unit, decoder