
Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 303-311. doi: 10.19678/j.issn.1000-3428.0064382

• Development Research and Engineering Application •

Research on Summarization Generation Based on Scene and Dialogue Structure

LI Jianzhi, WANG Hongling, WANG Zhongqing

  1. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China
  • Received:2022-04-06 Revised:2022-05-17 Published:2022-06-20
  • About the authors: LI Jianzhi (born 1997), male, M.S. candidate; his main research interest is natural language processing. WANG Hongling and WANG Zhongqing are associate professors with Ph.D. degrees.
  • Supported by:
    National Natural Science Foundation of China (61976146).




Abstract: Dialogue summarization extracts the key information from a complex dialogue and condenses it into a short text that lets users browse the conversation quickly. Compared with traditional text summarization, dialogue summarization data are longer and more complex in structure. Traditional summarization models cannot fully exploit this long-text information and neglect the structural information of the dialogue. To this end, this paper combines extractive and generative models and proposes a summarization generation method based on scene and dialogue structure, which generates a dialogue summary from the scene, the roles, and the dialogue content. Through dialogue parsing, a dialogue-structure graph whose elements are roles, action descriptions, and sessions is constructed. A BERT pre-trained model is fine-tuned with a sequence-labeling task to produce vector representations at the granularity of dialogue sentences, and a graph neural network models the dialogue structure to select the sentences that carry the key information. The extracted result is then fed to the generation model, which uses the Bidirectional and Auto-Regressive Transformers (BART) pre-trained model as its basic framework, introduces additional role and scene information at the encoder to enrich the semantic features, and generates the summary with a decoder equipped with multi-head attention. Experimental results show that, compared with BART, MV_BART, HMNet, and other methods, the proposed method improves the ROUGE-1 score by up to 5.3 percentage points.
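
As a rough illustration of the extract-then-generate pipeline the abstract describes, the following sketch assumes PyTorch and Hugging Face transformers; the model checkpoints, the linear stand-in for a graph-convolution layer, and the plain-text injection of scene and role information are assumptions for illustration, not the authors' released implementation. It scores utterances from BERT sentence vectors propagated over a dialogue-structure graph, keeps the top-scoring ones, and passes them to BART together with scene and role strings.

    # Illustrative sketch only: checkpoints, graph construction, and the
    # plain-text scene/role injection are assumptions, not the paper's code.
    import torch
    import torch.nn as nn
    from transformers import (BertModel, BertTokenizerFast,
                              BartForConditionalGeneration, BartTokenizerFast)

    class UtteranceScorer(nn.Module):
        """Extractive stage: BERT sentence vectors are propagated over a
        dialogue-structure graph (roles, action descriptions, sessions)
        by one graph-convolution step, then scored for salience."""
        def __init__(self, hidden=768):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.gcn = nn.Linear(hidden, hidden)   # stand-in for one GCN layer
            self.score = nn.Linear(hidden, 1)      # salience per utterance

        def forward(self, input_ids, attention_mask, adj):
            # pooled [CLS] vector of each utterance as its sentence representation
            h = self.bert(input_ids, attention_mask=attention_mask).pooler_output
            # one round of message passing: (normalized adjacency) @ H @ W
            h = torch.relu(adj @ self.gcn(h))
            return self.score(h).squeeze(-1)

    def summarize(utterances, roles, scene, adj, top_k=8):
        """Keep the top-k utterances, then generate with BART, prepending
        scene and role strings to the encoder input as plain text."""
        btok = BertTokenizerFast.from_pretrained("bert-base-uncased")
        scorer = UtteranceScorer().eval()
        enc = btok(utterances, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            scores = scorer(enc.input_ids, enc.attention_mask, adj)
        keep = scores.topk(min(top_k, len(utterances))).indices.sort().values
        extracted = " ".join(f"{roles[i]}: {utterances[i]}" for i in keep.tolist())

        gtok = BartTokenizerFast.from_pretrained("facebook/bart-base")
        bart = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
        source = f"scene: {scene} | roles: {', '.join(dict.fromkeys(roles))} | {extracted}"
        batch = gtok(source, return_tensors="pt", truncation=True)
        out = bart.generate(**batch, num_beams=4, max_length=80)
        return gtok.decode(out[0], skip_special_tokens=True)

In this sketch, adj would encode the edges of the dialogue-structure graph (e.g., linking utterances that share a role or session); the reported ROUGE-1 gains come from the paper's full model, not from this simplification.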

Key words: dialogue summarization, long text summarization, text structure, dialogue structure, Bidirectional and Auto-Regressive Transformers (BART) pre-training
