
Computer Engineering ›› 2019, Vol. 45 ›› Issue (3): 273-277. doi: 10.19678/j.issn.1000-3428.0051615


Extraction of Chinese Text Summarization Based on Improved TextRank Algorithm

XU Xintao, CHAI Xiaoli, XIE Bin, SHEN Chen, WANG Jingping

  1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
  • Received: 2018-05-22 Online: 2019-03-15 Published: 2019-03-15

基于改进TextRank算法的中文文本摘要提取

徐馨韬,柴小丽,谢彬,沈晨,王敬平   

  1. 中国电子科技集团公司第三十二研究所,上海 201808
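  • About the authors: XU Xintao (born 1995), female, master's student, main research interest: natural language processing; CHAI Xiaoli, research fellow; XIE Bin, senior engineer; SHEN Chen, bachelor's degree; WANG Jingping, engineer.
  • Supported by:

    National ministry and commission fund.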

Abstract:

This paper proposes DK-TextRank, a summary extraction algorithm for Chinese text that combines the Doc2Vec model, the K-means algorithm, and the TextRank algorithm to improve summarization accuracy. After vectorizing the text with the Doc2Vec model, DK-TextRank clusters similar text with an improved K-means algorithm, then applies a TextRank algorithm with weight influence factors within each cluster to rank the sentences and extract topic sentences, from which the summary is generated. Experimental results show that DK-TextRank reaches an F value of 79.36% when the summary contains 7 sentences, and that its extracted summaries are of higher quality than those produced by the traditional TF-IDF and TextRank algorithms.
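The three stages named in the abstract (vectorize, cluster, rank within each cluster) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: `embed` is a bag-of-words stand-in for Doc2Vec, `kmeans` is plain K-means rather than the improved variant, and the way the per-sentence `weights` scale the graph edges is a hypothetical choice, since the abstract does not define the weight influence factors. All function names are illustrative, and sentences are assumed to be pre-segmented into space-separated tokens.

```python
import math
import random
from collections import Counter


def embed(sentences):
    """Stand-in for Doc2Vec (assumption): unit-length bag-of-words vectors."""
    vocab = sorted({w for s in sentences for w in s.split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.split())
        v = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain K-means; the paper's improved variant is not reproduced here."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def textrank(vectors, weights=None, d=0.85, iters=50):
    """Weighted TextRank by fixed-count iteration.

    Edge j -> i is the dot product of the two unit vectors (their cosine)
    multiplied by weights[i]; this scaling is a hypothetical placeholder
    for the paper's weight influence factors.
    """
    n = len(vectors)
    weights = weights or [1.0] * n
    w = [[sum(x * y for x, y in zip(vectors[j], vectors[i])) * weights[i]
          if i != j else 0.0
          for i in range(n)]
         for j in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                                    for j in range(n) if out_sum[j] > 0)
                  for i in range(n)]
    return scores


def summarize(sentences, k=2, per_cluster=1, weights=None):
    """Cluster the sentences, rank inside each cluster, keep the top ones."""
    vectors = embed(sentences)
    labels = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        cluster_weights = [weights[i] for i in idx] if weights else None
        scores = textrank([vectors[i] for i in idx], cluster_weights)
        ranked = sorted(zip(scores, idx), key=lambda t: -t[0])
        chosen.extend(i for _, i in ranked[:per_cluster])
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=2)` on a list of tokenized sentences returns at most one sentence per cluster, in document order. Ranking inside each cluster rather than over the whole document means every topic group contributes to the summary, so a single dominant topic cannot crowd out the others.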

Key words: Doc2Vec model, K-means algorithm, TextRank algorithm, summarization extraction, weight influence factor

摘要:

为提高中文文本摘要提取的准确度,融合Doc2Vec模型、K-means算法和TextRank算法,提出一种中文文本摘要自动提取算法(DK-TextRank)。使用Doc2Vec模型进行文本向量化,采用改进的K-means算法实现相似文本聚类,在每个聚类簇中应用加入权重影响因子的TextRank算法对文本语句进行排序,并提取主题句生成摘要。实验结果表明,DK-TextRank算法在摘要语句数量为7时F值达到79.36%,相比传统TF-IDF、TextRank算法提取的摘要质量更高。

关键词: Doc2Vec模型, K-means算法, TextRank算法, 摘要提取, 权重影响因子

CLC Number: