Computer Engineering

• Advanced Computing and Data Processing

Automatic Summarization of Microblog Based on High Quality Information Extraction

PENG Min 1,2, GAO Binlong 1, HUANG Jimin 1, LIU Jiping 1

  (1. Computer School, Wuhan University, Wuhan 430072, China; 2. Shenzhen Institute, Wuhan University, Shenzhen 518057, China)
  • Received: 2014-07-02 Online: 2015-07-15 Published: 2015-07-15
  • About the authors: PENG Min (born 1973), female, professor and postdoctoral researcher, main research interests: natural language processing, social computing; GAO Binlong, master's student; HUANG Jimin, undergraduate student; LIU Jiping, lecturer, Ph.D.
  • Supported by:
    National Natural Science Foundation of China (61070083); 2013 Shenzhen Knowledge Innovation Program.

Abstract:

Automatic text summarization is an important means of obtaining key information from microblog platforms. Existing microblog summarization methods focus on extracting sentences or keywords from a document collection but lack effective means of removing redundant information and content noise, so the extracted microblog content is of low quality, which directly affects summary performance. To address this problem, this paper takes the microblog platform as its research object and proposes an information extraction method based on time-frequency domain transformation, which obtains high-quality microblog texts that are highly relevant to a given topic, low in redundancy, and rich in information. Microblogs with high composite scores form the sample set for summary generation; each sentence of each microblog in this set is scored by weight, and the highest-weighted sentences are selected to compose the summary. Experimental results show that the method effectively filters redundant information and content noise, and that its summaries outperform those of existing automatic summarization methods under both automatic and manual evaluation.

Key words: microblog automatic summarization, redundancy removal, information extraction, automatic evaluation, manual evaluation
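
The two-stage selection described in the abstract (keep the highest-scoring microblogs, then keep their highest-weighted sentences) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-microblog quality scores are taken as given inputs (the time-frequency transformation step is not reproduced), the sentence weight is assumed to be the mean frequency of its words within the sample set, and the names `split_sentences`, `tokenize`, and `summarize` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on ., ! or ? followed by whitespace (a simplifying assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphanumeric tokens; real microblog text needs a proper tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(scored_posts, top_posts=2, top_sentences=2):
    """scored_posts: list of (text, quality_score); the scores are assumed given."""
    # Stage 1: keep the highest-scoring posts as the sample set.
    sample = sorted(scored_posts, key=lambda p: p[1], reverse=True)[:top_posts]
    sentences = [s for text, _ in sample for s in split_sentences(text)]

    # Stage 2: weight each sentence by the mean frequency of its words
    # within the sample set (an assumed stand-in for the paper's weighting).
    freq = Counter(w for s in sentences for w in tokenize(s))

    def weight(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Drop exact duplicate sentences, then keep the highest-weighted ones.
    unique = list(dict.fromkeys(sentences))
    return sorted(unique, key=weight, reverse=True)[:top_sentences]


posts = [
    ("Flood hits the city. Rescue teams reach the city.", 0.9),
    ("Flood warning issued for the city.", 0.8),
    ("Buy cheap phones now!", 0.1),
]
print(summarize(posts))
```

In this toy input the low-scoring promotional post never reaches the sentence-scoring stage, which mirrors the idea of filtering noise before summarization rather than during it.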

CLC number: