Abstract: Most existing semi-supervised clustering methods make little use of the spatial structure of the dataset, which limits their performance. This paper proposes a Density-based Constraint Expansion (DCE) method. The dataset is represented as a graph, and a density-based graph similarity is defined on it. The known constraint set is then expanded according to the distances and similarities between sample points, and the expanded set can be used with a variety of semi-supervised clustering algorithms. The constrained complete-link algorithm and the pairwise constrained K-means algorithm are used as examples to illustrate how the expansion is applied. Experimental results on several synthetic and real-world datasets show that DCE effectively improves the performance of semi-supervised clustering algorithms.
Key words: semi-supervised clustering, density-based distance, constraint expansion
CLC number:
ZHANG Liang, LI Min-qiang. Density-based Constraint Expansion Method for Semi-supervised Clustering[J]. Computer Engineering (计算机工程), 2008, 34(10): 13-15.
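The abstract describes expanding a constraint set using a density-based similarity on a graph, but it does not give the formulas. The following is only a rough sketch of that general idea, not the authors' DCE definition. It assumes a complete Euclidean graph, uses the minimax path distance (the largest edge on the best path between two points) as a stand-in for a density-based distance, and treats points within a hypothetical threshold `eps` of a constrained point as sharing its constraints. The names `minimax_distances`, `expand_constraints`, and `eps` are illustrative and do not come from the paper.

```python
import numpy as np


def minimax_distances(X):
    """All-pairs minimax path distance on the complete Euclidean graph.

    The cost of a path is its longest edge, so points joined by a chain of
    short hops (a dense region) end up close to each other.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    for k in range(len(X)):
        # Floyd-Warshall variant: route through k if that lowers the longest edge.
        via_k = np.maximum(d[:, k:k + 1], d[k:k + 1, :])
        d = np.minimum(d, via_k)
    return d


def _pairs_between(group_a, group_b):
    """Unordered index pairs (i, j), i < j, with one point from each group."""
    return {(min(i, j), max(i, j)) for i in group_a for j in group_b if i != j}


def expand_constraints(X, must_link, cannot_link, eps):
    """Spread each pairwise constraint to the dense neighbourhoods of its endpoints.

    Assumption (not from the paper): two points belong to the same dense
    neighbourhood when their minimax distance is at most eps.
    """
    near = minimax_distances(X) <= eps
    groups = [[int(i) for i in np.flatnonzero(row)] for row in near]
    ml, cl = set(), set()
    for a, b in must_link:
        ml |= _pairs_between(groups[a], groups[b])
    for a, b in cannot_link:
        cl |= _pairs_between(groups[a], groups[b])
    # Drop pairs that would be both must-link and cannot-link.
    conflicts = ml & cl
    return ml - conflicts, cl - conflicts
```

The two returned sets could then be passed to any constraint-based clusterer that accepts must-link and cannot-link pairs. Choosing `eps` too large merges distinct clusters into one neighbourhood, so in practice it would need to be tuned per dataset.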