
Computer Engineering ›› 2020, Vol. 46 ›› Issue (9): 110-116. doi: 10.19678/j.issn.1000-3428.0055047

• Artificial Intelligence and Pattern Recognition •

Automatic Summary Generation Method Based on Multidimensional Text Feature

WANG Qingsong, ZHANG Heng, LI Fei

  1. College of Information, Liaoning University, Shenyang 110036, China
  • Received: 2019-05-29 Revised: 2019-09-12 Published: 2019-09-20
  • About the authors: WANG Qingsong (born 1974), male, associate professor, M.S.; main research interests: big data technology and data mining. ZHANG Heng and LI Fei, master's students.
  • Funding:
    National Natural Science Foundation of China (61802160); Shenyang Special Fund Program for Emerging Industry Development, "Liaoning Provincial Engineering Laboratory for Public Opinion and Network Security Big Data Systems" [2016(294)].

Abstract: Existing automatic summary generation methods for long texts rely on overly narrow sentence features and cannot comprehensively measure the similarity between sentences, which reduces the accuracy of the generated summaries. To address this problem, this paper proposes an automatic summary generation method based on a graph integration model. The method first computes the word-frequency, semantic, and syntactic features of the sentences in a text, then uses the naive Bayes method to recast the fusion of these multidimensional text features as a graph integration problem, which improves the accuracy of sentence similarity computation. On this basis, the TextRank algorithm generates the text summary. Experimental results show that, compared with a traditional summary generation method based on the sequence-to-sequence model and a summary extraction method based on multidimensional sentence features, the proposed method achieves higher ROUGE scores, effectively combines the multidimensional features of sentences, and improves the accuracy of summary generation.

Key words: sentence similarity, graph integration model, text summarization, naive Bayes, multidimensional feature
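
The pipeline outlined in the abstract (per-feature sentence similarities, a naive Bayes style fusion into one graph, then TextRank) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the fusion formula, so the rule in `fuse_similarities` is an assumption (each score is read as an independent probability that two sentences are related, with a uniform prior). The function names, the toy similarity matrices, and the damping factor of 0.85 are likewise illustrative choices.

```python
from math import prod


def fuse_similarities(matrices, eps=1e-9):
    """Fuse several sentence-similarity matrices into one edge-weight matrix.

    Assumption (not taken from the paper): each score s_k in [0, 1] is read
    as the probability that two sentences are related given feature k alone,
    features are conditionally independent, and the prior is uniform.
    """
    n = len(matrices[0])
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-loops in the sentence graph
            # Clamp scores away from 0 and 1 to avoid a zero denominator.
            scores = [min(max(m[i][j], eps), 1 - eps) for m in matrices]
            related = prod(scores)
            unrelated = prod(1 - s for s in scores)
            fused[i][j] = related / (related + unrelated)
    return fused


def textrank(weights, damping=0.85, iterations=100, tol=1e-8):
    """Weighted TextRank by power iteration (normalized so scores sum to 1)."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * rank[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_rank.append((1 - damping) / n + damping * incoming)
        converged = max(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank


def summarize(sentences, matrices, k=2):
    """Return the k highest-ranked sentences in their original order."""
    scores = textrank(fuse_similarities(matrices))
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(best)]


# Toy, hand-made similarity matrices for three sentences (illustrative only).
word_freq = [[0, 0.8, 0.2], [0.8, 0, 0.3], [0.2, 0.3, 0]]
semantic = [[0, 0.9, 0.1], [0.9, 0, 0.2], [0.1, 0.2, 0]]
syntactic = [[0, 0.7, 0.3], [0.7, 0, 0.3], [0.3, 0.3, 0]]
sentences = ["Sentence A.", "Sentence B.", "Sentence C."]

print(summarize(sentences, [word_freq, semantic, syntactic], k=2))
```

In this sketch, sentence pairs that all three features rate as similar receive a fused weight close to 1, while pairs with consistently low scores are pushed toward 0, so agreement across features sharpens the graph before ranking. In practice the three matrices would come from word-frequency, semantic, and syntactic similarity measures computed over the actual sentences.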

CLC number: