Abstract: This paper proposes similarity-based soft clustering (SISC), an efficient soft clustering algorithm for document clustering that operates on a given similarity measure. Experiments comparing SISC with existing hard clustering algorithms such as K-means indicate that SISC clusters quickly and effectively, and that it is suitable for document clustering. The paper closes with an outlook on the challenges and opportunities of document mining in information technology.
Key words:
Web document mining; document clustering; soft clustering; similarity
JIANG Yali (姜亚莉), GUAN Zequn (关泽群). A Similarity-based Soft Clustering Algorithm for Web Documents (用于 Web 文档聚类的基于相似度的软聚类算法)[J]. Computer Engineering (计算机工程), 2006, 32(2): 59-61.