
Computer Engineering ›› 2012, Vol. 38 ›› Issue (14): 135-137. doi: 10.3969/j.issn.1000-3428.2012.14.040

• Artificial Intelligence and Recognition Technology •

Improved Feature Selection Algorithm Based on Conditional Mutual Information

LIU Hai-yan, WANG Chao, NIU Jun-yu

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2011-11-21 Online: 2012-07-20 Published: 2012-07-20
  • About the authors: LIU Hai-yan (born 1980), female, master's student, main research interest: Web information retrieval; WANG Chao, master's student; NIU Jun-yu, associate professor
  • Funding:
    National High Technology Research and Development Program of China ("863" Program) (2009AA01Z429)

Abstract: Traditional feature selection algorithms tend to focus on only one of two concerns: the relevance of features to the class, or the redundancy among features. To address both, this paper proposes a feature selection algorithm based on conditional mutual information. Following the basic idea of k-means, the algorithm groups interdependent features into clusters and selects from each cluster the one feature with the maximum mutual information with the class, so that irrelevant and redundant features are removed. Experiments on 5 datasets show that the algorithm achieves better classification accuracy than traditional feature selection algorithms.

Key words: data mining, feature selection, mutual information, conditional mutual information, clustering, metric distance
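The abstract does not spell out the distance measure or the clustering details, so the following is only a minimal sketch of the general idea, under stated assumptions. The function names, the dissimilarity `distance` (the sum of the two conditional mutual information terms I(Fi;C|Fj) + I(Fj;C|Fi)), the farthest-first initialisation, and the medoid-style centre update are all hypothetical choices made for illustration and should not be read as the paper's actual method. Features and class labels are assumed to be discrete.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    cxy, cx, cy = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * log2(c * n / (cx[a] * cy[b])) for (a, b), c in cxy.items())


def conditional_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) in bits for three equal-length discrete sequences."""
    n = len(x)
    cxyz, cz = Counter(zip(x, y, z)), Counter(z)
    cxz, cyz = Counter(zip(x, z)), Counter(zip(y, z))
    return sum(
        c / n * log2(c * cz[k] / (cxz[a, k] * cyz[b, k]))
        for (a, b, k), c in cxyz.items()
    )


def distance(fi, fj, labels):
    # Hypothetical dissimilarity (an assumption, not the paper's metric):
    # the class information each feature still adds once the other is known.
    return (conditional_mutual_information(fi, labels, fj)
            + conditional_mutual_information(fj, labels, fi))


def select_features(features, labels, k, max_iter=20):
    """Cluster feature columns around k centres, then keep the most
    class-relevant feature of each cluster. Returns sorted column indices."""
    m = len(features)
    relevance = [mutual_information(f, labels) for f in features]
    dist = [[distance(fi, fj, labels) for fj in features] for fi in features]

    # Farthest-first initialisation (assumption): start from the most relevant
    # feature, then repeatedly add the feature farthest from the chosen centres.
    centres = [max(range(m), key=lambda i: relevance[i])]
    while len(centres) < k:
        centres.append(max(range(m), key=lambda i: min(dist[i][c] for c in centres)))

    for _ in range(max_iter):
        # Assignment step: each feature joins its nearest centre.
        clusters = {c: [] for c in centres}
        for i in range(m):
            clusters[min(centres, key=lambda c: dist[i][c])].append(i)
        groups = [g for g in clusters.values() if g]
        # Update step: features have no mean, so use the medoid of each cluster.
        new_centres = [min(g, key=lambda i: sum(dist[i][j] for j in g)) for g in groups]
        if set(new_centres) == set(centres):
            break
        centres = new_centres

    return sorted(max(g, key=lambda i: relevance[i]) for g in groups)


# Toy data: the class is determined by bits a and b; n is unrelated noise.
rows = [(a, b, n) for a in (0, 1) for b in (0, 1) for n in (0, 1)]
labels = [2 * a + b for a, b, n in rows]
features = [
    [a for a, b, n in rows],  # informative
    [a for a, b, n in rows],  # exact duplicate of the first column (redundant)
    [b for a, b, n in rows],  # informative and complementary
    [n for a, b, n in rows],  # independent of the class (irrelevant)
]
print(select_features(features, labels, k=2))
```

On this toy data the duplicate column falls into the same cluster as the column it copies, and the noise column loses the relevance comparison within its cluster, so one column per informative bit is kept. Counting-based estimates like these are only reasonable for discrete data with enough samples per value combination; continuous features would need discretisation first.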
