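Abstract: In English multi-document summarization based on a basic element (BE) vector space, sentences are represented by BE vectors rather than by term or word vectors. When the k-means clustering algorithm is applied, a technique for automatically detecting the value of k is used. Experiments show that MSBEC, the BE-based multi-document summarization system, outperforms the word-based approach.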
Key words:
multi-document summarization,
basic element,
k-means clustering
Abstract: This paper proposes a multi-document summarization strategy based on basic element (BE) vector clustering. In this strategy, sentences are represented by BE vectors instead of word or term vectors before clustering. The BE vectors are clustered with the k-means method, and a cluster analysis technique is used to detect the number of clusters, k, automatically. The experimental results indicate that the proposed strategy outperforms the traditional summarization strategy based on word vector clustering.
Key words:
multi-document summarization,
basic element,
k-means clustering
CLC number:
LIU Dexi; HE Yanxiang; JI Donghong; YANG Hua. English Multi-document Summarization Based on Basic Element Vector Space[J]. Computer Engineering, 2007, 33(14): 166-167.
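
The pipeline outlined in the abstract (BE vectors, k-means clustering, automatic choice of k) can be illustrated with a minimal Python sketch. Several points here are assumptions and are not taken from the paper: each BE is modeled as a (head, modifier, relation) triple that has already been extracted, vectors hold raw BE counts, k-means uses a farthest-point initialization, and k is chosen by the mean silhouette score, which stands in for the paper's own detection technique because the abstract does not describe it. The function names `be_vectors`, `kmeans`, `silhouette`, and `choose_k` are hypothetical.

```python
import math
from collections import Counter

def be_vectors(sentences):
    """Map each sentence (a list of BE triples) to a count vector over all BEs."""
    vocab = sorted({be for s in sentences for be in s})
    index = {be: i for i, be in enumerate(vocab)}
    vectors = []
    for s in sentences:
        v = [0.0] * len(vocab)
        for be, n in Counter(s).items():
            v[index[be]] = float(n)
        vectors.append(v)
    return vectors

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iters=50):
    """Plain k-means; farthest-point initialization is an assumption of this sketch."""
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(max(vectors, key=lambda v: min(dist(v, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist(v, centers[j])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(vectors, labels):
    """Mean silhouette score; singleton clusters contribute 0 by convention."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, v in enumerate(vectors):
        same = [dist(v, w) for j, w in enumerate(vectors)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(dist(v, w) for w, l in zip(vectors, labels) if l == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(vectors):
    """Try k = 2 .. n-1 and keep the clustering with the best silhouette score."""
    best = None
    for k in range(2, len(vectors)):
        labels = kmeans(vectors, k)
        score = silhouette(vectors, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best[1], best[2]

# Toy input: four sentences given as hypothetical, pre-extracted BE triples.
sentences = [
    [("summit", "held", "obj"), ("leaders", "met", "subj")],
    [("summit", "held", "obj"), ("leaders", "met", "subj"), ("talks", "began", "subj")],
    [("storm", "hit", "subj"), ("coast", "hit", "obj")],
    [("storm", "hit", "subj"), ("coast", "hit", "obj"), ("damage", "caused", "obj")],
]
k, labels = choose_k(be_vectors(sentences))
print(k, labels)
```

In a full summarizer, one representative sentence would then be selected from each cluster; that step is omitted here because the abstract does not specify how it is done.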