Computer Engineering ›› 2010, Vol. 36 ›› Issue (8): 37-39. doi: 10.3969/j.issn.1000-3428.2010.08.013

• Software Technology and Database •

Feature Selection Method Based on Improved ID3 Information Gain

ZHU Hao-dong (1,2), ZHONG Yong (1,2)

  (1. Chengdu Institute of Computer Application, Chinese Academy of Sciences, Chengdu 610041; 2. Graduate University of Chinese Academy of Sciences, Beijing 100039)
  • Online: 2010-04-20 Published: 2010-04-20

Abstract: To address the shortcomings of information gain (IG) in the ID3 algorithm, this paper improves IG by introducing attribute dependence and proposes a combined feature selection method. The method first applies an optimized document frequency (DF) method for preliminary feature selection, which reduces the sparsity of the text vectors, and then applies the improved IG method to select features further, yielding a representative feature subset. Experimental results show that the method outperforms the IG, chi-square statistic (CHI), and mutual information (MI) methods.

Key words: feature selection, document frequency (DF), ID3 algorithm, information gain (IG), attribute dependence
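
The two-stage pipeline described in the abstract (DF-based preliminary selection followed by IG-based ranking) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: the DF stage is a plain threshold filter (the hypothetical parameters `min_df` and `max_df_ratio`), each document is a set of terms, and the second stage uses standard information gain. The attribute-dependence correction is not defined in the abstract, so it is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Standard IG of the binary feature 'term occurs in the document'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    # Weighted entropy of the class labels after splitting on the term.
    conditional = sum(len(part) / len(labels) * entropy(part)
                      for part in (present, absent) if part)
    return entropy(labels) - conditional


def select_features(docs, labels, min_df=2, max_df_ratio=0.9, top_k=2):
    """Stage 1 (assumed form): keep terms with min_df <= DF <= max_df_ratio * N.
    Stage 2: rank the surviving terms by standard IG and keep the top_k.
    Ties are broken alphabetically so the result is deterministic."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    candidates = [t for t, n in df.items()
                  if min_df <= n <= max_df_ratio * n_docs]
    ranked = sorted(candidates,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:top_k]


# Toy corpus (hypothetical data, for illustration only).
docs = [
    {"ball", "goal", "team"},
    {"ball", "match", "team"},
    {"stock", "market", "team"},
    {"stock", "bank", "market"},
]
labels = ["sport", "sport", "finance", "finance"]

print(select_features(docs, labels))
```

In this toy corpus the DF filter drops the terms that occur in only one document, and the IG ranking then places "team" last among the survivors because it appears in both classes.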
