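Abstract (translated from the Chinese): This paper proposes a Chinese automatic summarization method based on thematic sentence discovery. The method uses terms instead of conventional words as the minimal semantic unit and computes term weights with the term length term frequency method to obtain feature words. An improved k-means clustering algorithm clusters the sentences, and thematic sentences are discovered from the clustering results. Experiments show that the summaries produced by the algorithm outperform conventional summaries on every evaluation metric.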
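Key words (translated from the Chinese): Thematic sentence discovery, Automatic summarization, Sentence clustering, Natural language processing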
Abstract: Automatic summarization is one of the main research fields in natural language processing. This paper proposes a Chinese automatic summarization method based on discovering thematic sentences. The method uses terms rather than words as the minimal semantic unit and employs term length term frequency (TLTF) to compute term weights and obtain feature words. It then clusters sentences with an improved k-means method and discovers thematic sentences from the clustering results. Experimental results indicate that the proposed method clearly outperforms the traditional method under the proposed evaluation scheme.
Key words: Thematic sentence discovery, Automatic text summarization, Sentence clustering, Natural language processing
Chinese Library Classification (CLC) number:
王萌; 李春贵; 唐培和; 王晓荣. 一种主题句发现的中文自动文摘研究[J]. 计算机工程, 2007, 33(08): 180-181.
WANG Meng; LI Chungui; TANG Peihe; WANG Xiaorong. Chinese Automatic Summarization Based on Thematic Sentence Discovery[J]. Computer Engineering, 2007, 33(08): 180-181.