Computer Engineering

• Advanced Computing and Data Processing

Research on Automatic Summarization of Personal Events Based on CR-PageRank Algorithm

GAO Yongbing, WANG Yu, MA Zhanfei

  1. (School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou, Inner Mongolia 014010, China)
  • Received: 2015-10-29 Online: 2016-11-15 Published: 2016-11-15
  • About the authors: GAO Yongbing (born 1974), male, associate professor, main research interests: data management and information retrieval; WANG Yu, master's student; MA Zhanfei, professor.
  • Funding:
    National Natural Science Foundation of China (61163025); Natural Science Foundation of Inner Mongolia (2015MS0621).

Abstract: Automatic text summarization is one way to obtain the important information in microblogs, but the short text, high redundancy, and high noise of microblogs make summarization considerably harder. To address this problem, an event summary extraction algorithm based on the content and relativity of personal microblogs is proposed, called Content and Relativity PageRank (CR-PageRank). A set of microblog events is built into an event graph. Combined with the content quality of each microblog, the CR-PageRank algorithm computes the total weight of each microblog, and representative microblogs are selected to generate an initial summary. The initial summary then undergoes readability processing to make the final summary more readable. Experimental results show that, compared with the TextRank and LexRank algorithms, the proposed algorithm significantly improves precision and recall, and the generated summaries are concise, comprehensive, and readable.

Key words: CR-PageRank algorithm, content quality, personal event, event summarization, manual evaluation
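
The pipeline outlined in the abstract (event graph, content quality, PageRank-style weighting, selection of representative posts) can be illustrated with a minimal Python sketch. It is not the paper's CR-PageRank formula, which this page does not give. It assumes that edges are weighted by bag-of-words cosine similarity and that a per-post content-quality score, taken as given, biases the teleport step of a standard weighted PageRank. The names `cosine`, `rank_posts`, and `summarize`, the damping value, and the iteration count are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_posts(posts, quality, damping=0.85, iters=50):
    """Score posts with a quality-biased PageRank over a similarity graph.

    posts:   list of token lists, one per microblog post
    quality: non-negative content-quality scores (assumed to be given)
    """
    n = len(posts)
    if n == 0:
        return []
    vecs = [Counter(p) for p in posts]
    # Edge weights: pairwise similarity, with no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]
    # Teleport distribution proportional to content quality
    # (uniform if every quality score is zero).
    total_q = sum(quality)
    tele = [q / total_q for q in quality] if total_q > 0 else [1.0 / n] * n
    score = [1.0 / n] * n
    for _ in range(iters):
        # Posts with no similar neighbours hand their mass to the teleport step.
        dangling = sum(score[i] for i in range(n) if out[i] == 0)
        new = []
        for j in range(n):
            inflow = sum(score[i] * w[i][j] / out[i]
                         for i in range(n) if out[i] > 0)
            new.append((1 - damping) * tele[j]
                       + damping * (inflow + dangling * tele[j]))
        score = new
    return score


def summarize(posts, quality, k=2):
    """Return indices of the k highest-scoring posts, in original order."""
    scores = rank_posts(posts, quality)
    top = sorted(range(len(posts)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)
```

Returning the selected indices in their original order keeps the chosen posts in posting sequence, which is one simple way to approach the readability step mentioned in the abstract. Redundancy removal and the paper's actual readability processing are left out of this sketch.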

CLC Number: