
Computer Engineering ›› 2008, Vol. 34 ›› Issue (10): 13-15. doi: 10.3969/j.issn.1000-3428.2008.10.005

• Doctoral Dissertation •

Density-based Constraint Expansion Method for Semi-supervised Clustering

ZHANG Liang, LI Min-qiang

  1. (School of Management, Tianjin University, Tianjin 300072)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2008-05-20 Published: 2008-05-20

Abstract: Existing semi-supervised clustering methods make little use of the spatial structure of the dataset, which limits clustering performance. This paper proposes a Density-based Constraint Expansion (DCE) method. The dataset is represented as a graph, and a density-based graph similarity is defined on it. The known constraint set is then expanded according to the distance and similarity relations between samples, and the expanded set can be used by a variety of semi-supervised clustering algorithms. The constrained complete-link algorithm and the pairwise constrained K-means algorithm serve as examples of how the expansion is applied. Experimental results on several synthetic and real-world datasets show that DCE effectively improves the performance of semi-supervised clustering algorithms.

Key words: semi-supervised clustering, density-based distance, constraint expansion
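
The abstract does not give the exact similarity measure or expansion rule used by DCE, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the density-based distance is the minimax path distance on the complete Euclidean graph (the smallest achievable "largest hop" between two points), and that a constraint on a point is propagated to every point within a threshold `eps` of it under that distance. The function names and the `eps` parameter are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def minimax_distances(points):
    """All-pairs minimax path distance on the complete Euclidean graph.

    Assumption: this stands in for the paper's density-based distance.
    Two points joined by a chain of short hops are "close" even if
    their straight-line distance is large.
    """
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Floyd-Warshall variant: path cost is the largest edge on the path.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


def expand_constraints(points, must_link, cannot_link, eps):
    """Propagate pairwise constraints to density-connected neighbours.

    Assumed rule (hypothetical): a point within `eps` of a constrained
    point, in minimax distance, inherits that point's constraints.
    Returns (must_link_set, cannot_link_set) as sets of frozenset pairs.
    """
    d = minimax_distances(points)
    n = len(points)
    near = [{j for j in range(n) if d[i][j] <= eps} for i in range(n)]

    ml = {frozenset(pair) for pair in must_link}
    cl = {frozenset(pair) for pair in cannot_link}

    for a, b in must_link:
        group = near[a] | near[b]
        ml.update(frozenset(p) for p in combinations(sorted(group), 2))
    for a, b in cannot_link:
        cl.update(
            frozenset((x, y)) for x in near[a] for y in near[b] if x != y
        )

    # Drop pairs that ended up in both sets rather than guess a winner.
    conflicts = ml & cl
    return ml - conflicts, cl - conflicts
```

The expanded sets can then be passed to any constraint-based clusterer in place of the original constraints. The cubic-time distance computation is only practical for small datasets; a sparse neighbourhood graph would be the natural substitute for larger ones.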

CLC Number: