Abstract: To address the shortcomings of Information Gain (IG) in the ID3 algorithm, this paper improves IG by introducing attribute dependence and proposes a combined feature selection method. An optimized Document Frequency (DF) method first pre-selects features to reduce the sparsity of the text vectors; the improved IG method then selects further among the remaining features to obtain a representative feature subset. Experimental results show that the method outperforms the IG, χ² statistic (CHI), and Mutual Information (MI) methods.
Key words: feature selection, Document Frequency (DF), ID3 algorithm, Information Gain (IG), attribute dependence
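The two-stage idea in the abstract (DF pre-selection followed by IG-based ranking) can be illustrated with a minimal Python sketch. It uses the standard ID3 information gain on binary term presence, not the paper's improved variant: the attribute-dependence correction and the DF optimization are not specified in this abstract, so they are not reproduced. The function names, the thresholds `min_df` and `max_df_ratio`, and the cutoff `k` are assumptions made for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for terms in docs:
        df.update(set(terms))
    return df


def information_gain(term, docs, labels):
    """Standard ID3 information gain of the binary feature 'term is present'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


def select_features(docs, labels, min_df=2, max_df_ratio=0.9, k=2):
    """Stage 1: keep terms within the DF bounds (hypothetical thresholds).
    Stage 2: rank the survivors by plain IG and return the top k."""
    df = document_frequency(docs)
    n = len(docs)
    candidates = [t for t, c in df.items()
                  if c >= min_df and c / n <= max_df_ratio]
    # Ties in IG are broken alphabetically so the result is deterministic.
    ranked = sorted(candidates,
                    key=lambda t: (-information_gain(t, docs, labels), t))
    return ranked[:k]
```

For example, with four toy documents in which "the" appears everywhere and "ball" and "stock" each appear in exactly one class, the DF stage drops "the" together with the terms that occur only once, and the IG stage ranks "ball" and "stock" highest.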
ZHU Hao-dong, ZHONG Yong. Feature Selection Method Based on Improved ID3 Information Gain[J]. Computer Engineering, 2010, 36(8): 37-39.