Abstract: Traditional feature selection algorithms address either the relevance between features and the class or the redundancy among features, but not both. To overcome this, this paper proposes a feature selection algorithm based on conditional mutual information. Following the basic idea of k-means, the algorithm groups interdependent features into clusters and selects from each cluster the feature with the greatest relevance to the class (the maximum mutual information with the class). This removes irrelevant and redundant features. Experiments on five datasets show that the algorithm achieves better classification accuracy than traditional feature selection algorithms.
Keywords: data mining, feature selection, mutual information, conditional mutual information, clustering, metric distance
CLC number:
LIU Hai-Yan, WANG Chao, NIU Jun-Yu. Improved Feature Selection Algorithm Based on Conditional Mutual Information[J]. Computer Engineering, 2012, 38(14): 135-137.
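The abstract describes the method only at a high level: cluster features with a k-means-style procedure under a distance based on conditional mutual information, then keep the most class-relevant feature from each cluster. What follows is a minimal sketch of that pipeline for discrete features. It is not the authors' implementation. The abstract gives no distance formula, so the normalized distance d(X, Y) = 1 - I(X;Y|C) / H(X,Y|C) is an assumption. Because features are compared only through pairwise distances, the k-means idea is realized here as k-medoids, which is a swapped-in technique. The farthest-first seeding and all function names (`entropy`, `cmi_distance`, `select_features`) are hypothetical as well.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(*cols: Sequence[int]) -> float:
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def mutual_info(x: Sequence[int], y: Sequence[int]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x: Sequence[int], y: Sequence[int], c: Sequence[int]) -> float:
    """I(X;Y|C) = H(X,C) + H(Y,C) - H(X,Y,C) - H(C)."""
    return entropy(x, c) + entropy(y, c) - entropy(x, y, c) - entropy(c)


def cmi_distance(x: Sequence[int], y: Sequence[int], c: Sequence[int]) -> float:
    """Assumed distance 1 - I(X;Y|C) / H(X,Y|C); 0 means X and Y are redundant given C."""
    h_xy_given_c = entropy(x, y, c) - entropy(c)
    if h_xy_given_c <= 1e-12:  # both features are fully determined by the class
        return 0.0
    return max(0.0, 1.0 - cond_mutual_info(x, y, c) / h_xy_given_c)


def select_features(features: List[Sequence[int]], labels: Sequence[int],
                    k: int, max_iter: int = 20) -> List[int]:
    """Cluster features k-medoids style, then keep the most class-relevant feature per cluster."""
    m = len(features)
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and the number of features")
    dist = [[cmi_distance(features[i], features[j], labels) for j in range(m)]
            for i in range(m)]
    relevance = [mutual_info(f, labels) for f in features]  # I(X;C) per feature

    # Deterministic farthest-first seeding, starting from the most relevant feature.
    medoids = [max(range(m), key=lambda i: relevance[i])]
    while len(medoids) < k:
        rest = [i for i in range(m) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(dist[i][j] for j in medoids)))

    for _ in range(max_iter):
        # Assignment: each medoid keeps itself; other features join the nearest medoid.
        clusters = {med: [med] for med in medoids}
        for i in range(m):
            if i not in clusters:
                clusters[min(medoids, key=lambda med: dist[i][med])].append(i)
        # Update: the new medoid minimizes the total distance within its cluster.
        new_medoids = [min(members, key=lambda a: sum(dist[a][b] for b in members))
                       for members in clusters.values()]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids

    # From each cluster keep the feature with maximum mutual information with the class.
    return sorted(max(members, key=lambda i: relevance[i]) for members in clusters.values())


if __name__ == "__main__":
    c = [0, 0, 0, 0, 1, 1, 1, 1]   # class labels
    f0 = [0, 0, 1, 1, 0, 0, 1, 1]  # noise with respect to the class
    f1 = list(f0)                  # exact duplicate of f0
    f2 = [0, 1, 0, 1, 0, 1, 0, 1]  # independent noise
    f3 = [1 - v for v in f2]       # complement of f2, redundant with it
    print(select_features([f0, f1, f2, f3, c], c, k=3))  # [0, 2, 4]
```

In the toy run, the class-identical feature (index 4) survives. Only one feature from each redundant pair {f0, f1} and {f2, f3} is kept. That is the behavior the abstract claims for removing redundant features. On real data, the choice of k and the exact distance would need to follow the paper itself.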