Computer Engineering ›› 2007, Vol. 33 ›› Issue (14): 166-167. doi: 10.3969/j.issn.1000-3428.2007.14.058

• Artificial Intelligence and Recognition Technology •

English Multi-document Summarization Based on Basic Element Vector Space

LIU Dexi1,2, HE Yanxiang2, JI Donghong3, YANG Hua2

  (1. Department of Physics, Xiangfan University, Xiangfan 441053; 2. School of Computer, Wuhan University, Wuhan 430079; 3. Institute for Infocomm Research, Singapore 119613)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-07-20 Published: 2007-07-20

Abstract: This paper proposes MSBEC, a multi-document summarization strategy for English text based on clustering basic element (BE) vectors. Before clustering, each sentence is represented by a BE vector rather than a word or term vector. Clustering uses the k-means algorithm, combined with a cluster analysis technique that automatically detects the number of clusters, k. Experimental results indicate that the BE-based strategy outperforms the traditional summarization strategy based on word-vector clustering.

Key words: multi-document summarization, basic element, k-means clustering
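
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that BEs have already been extracted and are given as hypothetical (head, modifier, relation) tuples, that sentence similarity is cosine similarity over BE counts, and that k is chosen by the mean silhouette score, which stands in for the paper's own k-detection technique (not specified in the abstract). All function names and the toy data are illustrative.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {f: w / len(vectors) for f, w in total.items()}

def kmeans(vectors, k, iters=20):
    """k-means under cosine similarity with deterministic farthest-first seeding."""
    centers = [vectors[0]]
    while len(centers) < k:
        # Next seed: the vector least similar to every seed chosen so far.
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels

def silhouette(vectors, labels):
    """Mean silhouette score using cosine distance (1 - cosine similarity)."""
    if len(set(labels)) < 2:
        return -1.0
    n = len(vectors)
    scores = []
    for i in range(n):
        same = [1 - cosine(vectors[i], vectors[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # singleton cluster
            continue
        a = sum(same) / len(same)
        b = min(
            sum(1 - cosine(vectors[i], vectors[j])
                for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

def detect_k(vectors, k_max=None):
    """Assumed stand-in for automatic k detection: maximize the silhouette score."""
    n = len(vectors)
    k_max = min(k_max or n - 1, n - 1)
    best_score, best_k, best_labels = -2.0, None, None
    for k in range(2, k_max + 1):
        labels = kmeans(vectors, k)
        score = silhouette(vectors, labels)
        if score > best_score:
            best_score, best_k, best_labels = score, k, labels
    return best_k, best_labels

# Toy input: each sentence is a list of hypothetical BE triples (head, modifier, relation).
A1, A2, A3 = ("bomb", "car", "nn"), ("exploded", "bomb", "subj"), ("injured", "people", "obj")
B1, B2, B3 = ("rates", "interest", "nn"), ("raised", "bank", "subj"), ("raised", "rates", "obj")
sentences = [[A1, A2], [A1, A3], [A2, A3], [B1, B2], [B1, B3], [B2, B3]]

vectors = [Counter(bes) for bes in sentences]  # one BE count vector per sentence
best_k, labels = detect_k(vectors)
print(best_k, labels)
```

On this toy input the first three sentences share BEs only with each other, as do the last three, so the sketch should recover two clusters. Selecting the summary sentences from each cluster is a separate step that the abstract does not describe and the sketch does not cover.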

CLC Number: