Computer Engineering ›› 2007, Vol. 33 ›› Issue (01): 209-211. doi: 10.3969/j.issn.1000-3428.2007.01.073

• Artificial Intelligence and Recognition Technology •

Validation of Textual Document Clustering Techniques

LIU Wuhua, LUO Tiejian, WANG Wenjie   

  1. (Graduate School, Chinese Academy of Sciences, Beijing 100080)
  • Online: 2007-01-05 Published: 2007-01-05

Abstract: This paper presents criteria for evaluating clustering quality. Using these criteria, it assesses three document clustering algorithms experimentally. The comparison and analysis show that the STC (Suffix Tree Clustering) algorithm outperforms the k-Means and Ant-based clustering algorithms. STC performs better because it takes linguistic properties into account when processing documents. The performance of the Ant-based clustering algorithm varies with its input parameters. Incorporating linguistic properties is necessary to improve the performance of Ant-based text clustering.

Key words: Document clustering, Clustering validation, STC, Ant-based

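Abstract (translated from the Chinese): This paper discusses criteria for quantitatively evaluating clustering using classified test sets. On this basis, the k-Means clustering algorithm, the STC (Suffix Tree Clustering) algorithm, and an Ant-based clustering algorithm are compared experimentally. The experiments show that the STC algorithm takes full account of the characteristics of text when processing documents and produces better clusters; the partitioning quality of the Ant-based algorithm depends strongly on its input parameters, and its results show no advantage over those of STC; introducing text characteristics into the Ant clustering algorithm can improve its text clustering results.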

Key words (translated from the Chinese): Text clustering, Clustering validation, Suffix tree clustering, Ant-based
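
The abstract describes scoring clusterings against a pre-classified test set but does not name the specific criteria used. As a minimal sketch of what such a quantitative evaluation can look like, the Python below computes two commonly used external measures, purity and a pairwise F-measure. The choice of these two measures, the function names `purity` and `pairwise_f1`, and the toy labels are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def purity(labels_true, labels_pred):
    """Fraction of documents that belong to the majority class of their cluster."""
    if not labels_true or len(labels_true) != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    clusters = {}
    for truth, cluster in zip(labels_true, labels_pred):
        clusters.setdefault(cluster, Counter())[truth] += 1
    majority_total = sum(max(counts.values()) for counts in clusters.values())
    return majority_total / len(labels_true)


def pairwise_f1(labels_true, labels_pred):
    """F1 over document pairs; a pair is predicted positive when it shares a cluster."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_class = labels_true[i] == labels_true[j]
        same_cluster = labels_pred[i] == labels_pred[j]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: six documents, two reference classes, three clusters.
reference = ["sports", "sports", "sports", "finance", "finance", "finance"]
clustering = [0, 0, 1, 1, 2, 2]

print("purity:", purity(reference, clustering))
print("pairwise F1:", pairwise_f1(reference, clustering))
```

Because both functions only compare label lists, the same scoring could in principle be applied to the output of any of the three algorithms (k-Means, STC, Ant-based) once each document has been assigned a cluster identifier. Note that STC can place a document in more than one cluster, which this sketch does not handle.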