Abstract: Nonparametric Information theoretic Clustering (NIC) clusters data by maximizing the estimated mutual information between data points and clusters, using a non-parametric estimate of the average cluster entropies, which reduces the cost of manual intervention. However, the algorithm assumes that all features of the samples under analysis contribute equally to classification, an assumption that is inconsistent with existing research findings and with practice. This paper therefore proposes a feature-weighted algorithm, R-NIC, that accounts for the different influence of each feature dimension on pattern classification. R-NIC uses ReliefF to weight and transform the features, suppressing redundant features and strengthening effective ones, and then applies NIC in the transformed feature space to improve the clustering results. Experimental results on UCI datasets show that R-NIC achieves high clustering performance and outperforms the NIC algorithm.
Key words:
unsupervised,
clustering,
mutual information,
Nonparametric Information theoretic Clustering (NIC) algorithm,
accuracy,
feature weighting
CLC number:
CHEN Xiaolin, JI Bo, YE Yangdong. An R-NIC Algorithm Based on ReliefF Feature Weighting[J]. Computer Engineering.
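
The feature-weighting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses a simplified ReliefF variant (every sample is visited, Manhattan distance on min-max scaled features), it assumes class labels are available for the weighting step (the abstract does not say how they are obtained), and it assumes the transform is a plain per-feature multiplication by the non-negative part of each weight. The function names `relieff_weights` and `weight_features` and the parameter `n_neighbors` are hypothetical. The NIC clustering step itself is not shown.

```python
import numpy as np


def relieff_weights(X, y, n_neighbors=2):
    """Simplified ReliefF sketch: one weight per feature (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape

    # Min-max scale each feature so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span

    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))

    w = np.zeros(d)
    for i in range(n):
        diff = np.abs(Xs - Xs[i])   # per-feature differences to sample i
        dist = diff.sum(axis=1)     # Manhattan distance to sample i
        for c in classes:
            idx = np.where(y == c)[0]
            idx = idx[idx != i]     # never use the sample as its own neighbour
            if idx.size == 0:
                continue
            k = min(n_neighbors, idx.size)
            nearest = idx[np.argsort(dist[idx])[:k]]
            mean_diff = diff[nearest].mean(axis=0)
            if c == y[i]:
                # Nearest hits: differing from same-class neighbours lowers the weight.
                w -= mean_diff
            else:
                # Nearest misses: differing from other classes raises the weight,
                # scaled by that class's prior relative to all other classes.
                w += prior[c] / (1.0 - prior[y[i]]) * mean_diff
    return w / n


def weight_features(X, w):
    """Assumed transform: scale each feature by its weight, clipping negatives to 0."""
    return np.asarray(X, dtype=float) * np.clip(w, 0.0, None)


# Toy data: feature 0 separates the two classes, feature 1 does not.
X_demo = np.array([[0.0, 0.0], [0.1, 1.0], [0.2, 0.5],
                   [0.9, 0.5], [1.0, 0.0], [1.1, 1.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1])
w_demo = relieff_weights(X_demo, y_demo)
X_weighted = weight_features(X_demo, w_demo)  # input for the clustering step
```

In this sketch a feature with a negative weight is zeroed out entirely, which is one possible reading of "suppressing redundant features"; a softer rescaling would be an equally plausible choice.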