
Computer Engineering, 2010, Vol. 36, Issue (22): 64-65. doi: 10.3969/j.issn.1000-3428.2010.22.022

• Software Technology and Database •

A Novel Multi-document Summarization Method Based on Spectral Clustering

LIN Li¹, HU Xia², ZHU Jun-yan¹

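  (1. School of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. Hangzhou Information Institute of Science and Technology, Hangzhou 310001, China)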
  • Publication date: 2010-11-20  Release date: 2010-11-18
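  • About the authors: LIN Li (born 1985), male, master's student, research interests: information retrieval and network technology; HU Xia, assistant researcher, M.S.; ZHU Jun-yan, master's student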
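  • Funding: Supported by the project "Key Technologies of Barrier-free Information Services for People with Disabilities and Information Resource Support" (2008BAH26B02)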

Abstract:

This paper proposes a multi-document summarization method based on spectral clustering. After the topic-related sentences in the documents are clustered, the method also takes the importance of each topic cluster into account and combines it with sentence position, length and other factors to compute an importance score for every sentence. Sentences are then extracted in descending order of score, subject to the required word count, to form the summary. Experimental results show that the method outperforms traditional summarization methods and effectively improves summary quality.

Key words: multi-document summarization, spectral clustering, information retrieval
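The abstract names the pipeline (spectral clustering of sentences, a per-cluster importance weight, position and length factors, then score-ordered extraction under a word limit) but gives no formulas. The Python code below is a minimal sketch under explicit assumptions, not the authors' implementation: sentence similarity is bag-of-words cosine; clustering follows the Ng-Jordan-Weiss recipe (top-k eigenvectors of D^(-1/2) W D^(-1/2), row-normalised, then k-means); cluster importance is assumed to be relative cluster size; and the position score, the 15-word length cap and the `weights` tuple are made-up placeholders. All function names (`summarize`, `spectral_clusters`, `jacobi_eigen`, `kmeans`) are hypothetical. A small pure-Python Jacobi eigensolver keeps the example dependency-free, which makes it suitable only for small inputs.

```python
import math
import re
from collections import Counter

def bag_of_words(sentence):
    """Lower-cased word counts (a deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def jacobi_eigen(m, sweeps=50):
    """Eigenvalues and eigenvectors (as columns) of a small symmetric matrix."""
    n = len(m)
    a = [[float(x) for x in row] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-18:
            break  # off-diagonal mass is negligible: converged
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation that zeroes a[p][q] (smaller root for numerical stability)
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- P^T A (rows p and q)
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    return [a[i][i] for i in range(n)], v

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    def dist(x, y):
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def spectral_clusters(sim, k):
    """Ng-Jordan-Weiss style spectral clustering of a similarity matrix."""
    n = len(sim)
    inv = [1 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, sim)]
    # M = D^-1/2 W D^-1/2; its top-k eigenvectors are the bottom-k of L_sym = I - M
    m = [[inv[i] * sim[i][j] * inv[j] for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(m)
    top = sorted(range(n), key=lambda i: -vals[i])[:k]
    rows = []
    for i in range(n):
        r = [vecs[i][j] for j in top]
        norm = math.sqrt(sum(x * x for x in r)) or 1.0  # row-normalise the embedding
        rows.append([x / norm for x in r])
    return kmeans(rows, k)

def summarize(docs, k=2, max_words=50, weights=(0.5, 0.3, 0.2)):
    """Cluster sentences, score them, then extract greedily under a word limit."""
    sents = []  # (sentence, position score in (0, 1]); position formula is an assumption
    for doc in docs:
        parts = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
        sents += [(s, 1 - i / len(parts)) for i, s in enumerate(parts)]
    n = len(sents)
    if n == 0:
        return []
    bows = [bag_of_words(s) for s, _ in sents]
    sim = [[cosine(bows[i], bows[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    labels = spectral_clusters(sim, min(k, n))
    size = Counter(labels)
    lengths = [len(s.split()) for s, _ in sents]
    wc, wp, wl = weights  # hypothetical weights, not values from the paper
    scores = [wc * size[labels[i]] / n          # cluster importance: relative size (assumed)
              + wp * sents[i][1]                # position: earlier sentences score higher
              + wl * min(lengths[i], 15) / 15   # length: favour sentences up to ~15 words
              for i in range(n)]
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda j: -scores[j]):
        if used + lengths[i] <= max_words:  # skip sentences that would exceed the limit
            chosen.append(i)
            used += lengths[i]
    return [sents[i][0] for i in sorted(chosen)]
```

Calling `summarize(docs, k=2, max_words=12)` on a list of document strings returns the chosen sentences in their original order. The greedy loop skips any sentence that would exceed `max_words` and keeps scanning, so a shorter, lower-scored sentence can still fill the remaining budget; redundancy between sentences of the same cluster is not controlled in this sketch.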
