
Computer Engineering ›› 2011, Vol. 37 ›› Issue (15): 181-183. doi: 10.3969/j.issn.1000-3428.2011.15.057

• Artificial Intelligence and Recognition Technology •

Feature Selection Based on Mutual Information and Rough Set Theory

ZHU Hao-dong, LI Hong-chan

  1. (School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450002, China)
  • Received: 2011-01-13 Online: 2011-08-05 Published: 2011-08-05
  • About the authors: ZHU Hao-dong (born 1980), male, Ph.D.; main research interests: text mining, intelligent information processing, and computational intelligence. LI Hong-chan, M.S.
  • Supported by:
    Henan Province Basic and Frontier Technology Research Program (102300 410266); Doctoral Research Fund of Zhengzhou University of Light Industry

Abstract: Feature selection is a research hotspot in automatic text categorization. After analyzing Mutual Information (MI) and its limited precision as a feature selection criterion, this paper introduces Rough Set (RS) theory and gives an attribute reduction algorithm based on relation union theory (关系积理论). Building on this algorithm, a feature selection method suited to massive text data sets is proposed. The method first uses MI to pre-select features, then applies the proposed attribute reduction algorithm to eliminate redundancy, yielding a more representative feature subset. Experimental results show that the method obtains feature subsets that have low redundancy and are more representative.

Key words: feature selection, Mutual Information (MI), Rough Set (RS), relation union theory, attribute reduction
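
The two-stage idea in the abstract (MI pre-selection followed by redundancy elimination) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not give the MI formula or the relation-based reduction algorithm, so the sketch uses the common pointwise form log(P(t,c) / (P(t)P(c))) maximized over classes, and a generic rough-set style backward elimination that keeps the positive region unchanged. The function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

def mi_scores(docs, labels):
    """Score each term by max over classes of log(P(t,c) / (P(t) * P(c)))."""
    n = len(docs)
    class_count = defaultdict(int)
    term_count = defaultdict(int)
    joint = defaultdict(int)
    for words, c in zip(docs, labels):
        class_count[c] += 1
        for t in set(words):          # document frequency, not term frequency
            term_count[t] += 1
            joint[(t, c)] += 1
    scores = {}
    for (t, c), a in joint.items():
        mi = math.log(a * n / (term_count[t] * class_count[c]))
        scores[t] = max(scores.get(t, -math.inf), mi)
    return scores

def positive_region_size(doc_sets, labels, features):
    """Count documents whose presence/absence pattern maps to a single class."""
    blocks = defaultdict(list)
    for words, c in zip(doc_sets, labels):
        key = tuple(f in words for f in features)
        blocks[key].append(c)
    return sum(len(cs) for cs in blocks.values() if len(set(cs)) == 1)

def select_features(docs, labels, k):
    """Stage 1: top-k terms by MI. Stage 2: drop terms that add no discernibility."""
    scores = mi_scores(docs, labels)
    # Ties are broken alphabetically so the result is deterministic.
    candidates = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    doc_sets = [set(d) for d in docs]
    target = positive_region_size(doc_sets, labels, candidates)
    kept = list(candidates)
    for f in reversed(candidates):    # try removing the lowest-ranked term first
        trial = [g for g in kept if g != f]
        if positive_region_size(doc_sets, labels, trial) == target:
            kept = trial              # f was redundant given the remaining terms
    return kept

# Hypothetical toy corpus: two classes, four tokenized documents.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "win"],
    ["vote", "party", "law"],
    ["vote", "law", "win"],
]
labels = ["sport", "sport", "politics", "politics"]
print(select_features(docs, labels, 4))
```

On this toy corpus the rare terms "match" and "party" receive the same MI score as the terms that occur in every document of their class, which is one way the imprecision of MI shows up; the second stage then removes every pre-selected term that does not change which documents can be classified unambiguously.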
