Computer Engineering ›› 2019, Vol. 45 ›› Issue (3): 273-277. doi: 10.19678/j.issn.1000-3428.0051615

• Development Research and Engineering Application •

Extraction of Chinese Text Summarization Based on Improved TextRank Algorithm

XU Xintao, CHAI Xiaoli, XIE Bin, SHEN Chen, WANG Jingping

  1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
  • Received: 2018-05-22 Online: 2019-03-15 Published: 2019-03-15
  • About the authors: XU Xintao (born 1995), female, master's student, main research interest: natural language processing; CHAI Xiaoli, research fellow; XIE Bin, senior engineer; SHEN Chen, bachelor's degree; WANG Jingping, engineer.
  • Supported by:

    National Ministries and Commissions Fund.

Abstract:

To improve the accuracy of Chinese text summary extraction, this paper proposes DK-TextRank, an automatic Chinese text summarization algorithm that combines the Doc2Vec model, the K-means algorithm, and the TextRank algorithm. The Doc2Vec model is used to vectorize the text, and an improved K-means algorithm clusters similar texts. Within each cluster, a TextRank algorithm augmented with weight influence factors ranks the sentences, and the topic sentences are extracted to generate the summary. Experimental results show that the DK-TextRank algorithm reaches an F value of 79.36% when the summary contains 7 sentences, and that its extracted summaries are of higher quality than those of the traditional TF-IDF and TextRank algorithms.

Key words: Doc2Vec model, K-means algorithm, TextRank algorithm, summarization extraction, weight influence factor
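
As a rough illustration of the ranking step described in the abstract, the following minimal Python sketch runs a TextRank-style power iteration over a sentence-similarity matrix and computes an F value against a reference selection. The similarity values, the per-sentence weights, the damping factor of 0.85, and the names `weighted_textrank`, `top_sentences`, and `f_value` are all assumptions made for illustration. The abstract does not specify the paper's weight influence factors, its K-means improvement, or its evaluation protocol, so this should not be read as the authors' implementation.

```python
from typing import List, Optional, Sequence


def weighted_textrank(
    sim: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-8,
) -> List[float]:
    """Score sentences by power iteration on a similarity graph.

    sim[i][j] is an assumed non-negative similarity between sentences i and j.
    weights is a hypothetical per-sentence bias (for example, position in the
    text); it is normalised and used as the teleport distribution.
    """
    n = len(sim)
    if n == 0:
        return []
    if weights is None:
        weights = [1.0] * n
    if len(weights) != n or sum(weights) <= 0:
        raise ValueError("weights must have one positive-sum entry per sentence")
    total_w = sum(weights)
    bias = [w / total_w for w in weights]

    # Total outgoing edge weight of each sentence, ignoring self-loops.
    out_sum = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]

    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                if j != i and out_sum[j] > 0:
                    rank += sim[j][i] / out_sum[j] * scores[j]
            new_scores.append((1 - damping) * bias[i] + damping * rank)
        converged = max(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def top_sentences(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k best-scoring sentences in document order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def f_value(selected: Sequence[int], reference: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over selected sentence indices."""
    sel, ref = set(selected), set(reference)
    overlap = len(sel & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(sel)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy similarity matrix for four sentences (made-up values).
    demo_sim = [
        [0.0, 0.8, 0.6, 0.1],
        [0.8, 0.0, 0.7, 0.1],
        [0.6, 0.7, 0.0, 0.1],
        [0.1, 0.1, 0.1, 0.0],
    ]
    demo_scores = weighted_textrank(demo_sim)
    print(top_sentences(demo_scores, 3))
```

In a fuller pipeline of the kind the abstract outlines, the similarity matrix would presumably be derived from Doc2Vec sentence or document vectors, and the ranking would be run separately inside each K-means cluster. Those stages are omitted here to keep the sketch free of external dependencies.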

CLC number: