Computer Engineering

• Artificial Intelligence and Recognition Technology •

An R-NIC Algorithm Based on ReliefF Feature Weighting

CHEN Xiaolin, JI Bo, YE Yangdong

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450052, China)
  • Received: 2014-03-26 Online: 2015-04-15 Published: 2015-04-15
  • About the authors: CHEN Xiaolin (b. 1989), female, master's student, main research interests: data mining and machine learning; JI Bo, associate professor, Ph.D.; YE Yangdong, professor and doctoral supervisor.
  • Supported by:
    National Natural Science Foundation of China project "Research on the Multivariate IB Method and Its Algorithms" (61170223); National Natural Science Foundation of China Joint Fund project "Research on Automatic Mapping of Cross-Media Complex Problems in Scalable Transfer Learning" (U1204610).

Abstract: Nonparametric Information theoretic Clustering (NIC) clusters data by maximizing the estimated mutual information between data points and clusters, using a nonparametric estimate of the average cluster entropy, which reduces the need for manual intervention. However, the algorithm assumes that all features of the samples under analysis contribute equally to the clustering, an assumption that is inconsistent with existing research results. This paper therefore proposes a feature-weighted algorithm, named R-NIC, that accounts for the different influence of each feature dimension on pattern classification. R-NIC uses ReliefF to apply a weighting transformation to the features, suppressing redundant features and strengthening effective ones, and then runs NIC in the transformed feature space to improve the clustering result. Experimental results on UCI datasets show that R-NIC achieves high clustering performance and outperforms the NIC algorithm.

Key words: unsupervised, clustering, mutual information, Nonparametric Information theoretic Clustering (NIC) algorithm, accuracy, feature weighting
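
The feature-weighting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made only for illustration: class labels `y` are taken to be available for estimating the weights (the abstract does not say how ReliefF is applied in the clustering setting), every sample is used once as the query instance instead of drawing a random subset, negative weights are clipped to zero, and the names `relieff_weights` and `weight_features` are hypothetical. The NIC clustering step itself is not shown; any clustering routine would then be run on the weighted data.

```python
from collections import Counter


def relieff_weights(X, y, k=3):
    """Simplified ReliefF: every sample is used once as the query instance."""
    n, d = len(X), len(X[0])
    # Per-feature value range, used to normalise differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    prior = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        for c in prior:
            # Up to k nearest neighbours of sample i inside class c (excluding i).
            idx = [t for t in range(n) if t != i and y[t] == c]
            idx.sort(key=lambda t: dist(X[i], X[t]))
            near = idx[:k]
            if not near:
                continue
            if c == y[i]:
                scale = -1.0  # nearest hits: penalise features that differ
            else:
                # nearest misses: reward features that differ, weighted by class prior
                scale = prior[c] / (1.0 - prior[y[i]])
            for j in range(d):
                total = sum(diff(j, X[i], X[t]) for t in near)
                w[j] += scale * total / (n * len(near))
    return w


def weight_features(X, w):
    """Scale each feature by its weight (assumption: negative weights become 0)."""
    w = [max(v, 0.0) for v in w]
    return [[v * wj for v, wj in zip(row, w)] for row in X]


# Toy data (made up): feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.8], [1.0, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
w = relieff_weights(X, y, k=2)
X_weighted = weight_features(X, w)
```

On this toy data the discriminative feature receives a positive weight and the noisy feature a negative one, so the transformed space keeps only the first feature. A clustering algorithm applied to `X_weighted` then works in a space where uninformative dimensions are suppressed, which is the effect the abstract attributes to the weighting transformation.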

CLC Number: