Computer Engineering ›› 2019, Vol. 45 ›› Issue (9): 183-187, 193. doi: 10.19678/j.issn.1000-3428.0051013

• Artificial Intelligence and Recognition Technology •

Two-stage Feature Selection Method Based on IG_CDmRMR

ZHU Wenfeng, YU Shujuan, HE Wei

  1. College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210000, China
  • Received: 2018-03-30 Revised: 2018-08-28 Online: 2019-09-15 Published: 2019-09-03
  • About the authors: ZHU Wenfeng (born 1994), male, master's student, with research interests in machine learning and natural language processing; YU Shujuan, associate professor, master's degree; HE Wei, master's student.
  • Funding:
    National Natural Science Foundation of China (61302155, 61276429).

Abstract: To improve the text classification accuracy of feature extraction methods, this paper combines Information Gain (IG) with an improved minimal Redundancy Maximal Relevance (mRMR) criterion and proposes IG_CDmRMR, a two-stage text feature selection method. In the first stage, IG extracts a set of features that are strongly correlated with the category. In the second stage, the class difference degree dynamically adjusts the weight of the mutual information between each feature and the category in mRMR, and the optimal feature subset is selected and used to obtain the text classification result. Experimental results show that, with the same number of features, the proposed method improves accuracy by about 2% compared with the IG method and the IG_mRMR method.

Key words: Information Gain (IG), minimal Redundancy Maximal Relevance (mRMR), class difference degree, feature selection, text categorization
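
The abstract does not define the class difference degree or the exact weighting formula, so the following is only a minimal Python sketch of a two-stage pipeline of the kind described: an IG filter followed by a greedy mRMR search whose relevance term is reweighted. The function names, the binary term-presence representation, and in particular `class_difference_weight` (a stand-in that uses the spread of per-class term frequencies) are assumptions made for illustration, not the authors' definitions. For a binary term-presence feature, the information gain of a term equals the mutual information between that feature and the class label, which is what stage 1 ranks by.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) in bits for two discrete 1-D arrays of equal length."""
    xs, xi = np.unique(np.asarray(x).ravel(), return_inverse=True)
    ys, yi = np.unique(np.asarray(y).ravel(), return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    np.add.at(joint, (xi, yi), 1)          # contingency table of counts
    joint /= joint.sum()
    return entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0)) - entropy(joint.ravel())

def class_difference_weight(X, y):
    """HYPOTHETICAL stand-in for the class difference degree:
    1 + (largest minus smallest per-class frequency of each term)."""
    rates = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
    return 1.0 + rates.max(axis=0) - rates.min(axis=0)

def two_stage_select(X, y, k1, k2):
    """Stage 1: keep the k1 features with the highest IG.
    Stage 2: greedy mRMR over those candidates, with the relevance
    term multiplied by the (assumed) class-difference weight."""
    n_features = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    candidates = [int(j) for j in np.argsort(-relevance, kind="stable")[:k1]]
    weight = class_difference_weight(X, y)

    selected = []
    while candidates and len(selected) < k2:
        best, best_score = None, -np.inf
        for f in candidates:
            # Redundancy: mean mutual information with already selected features.
            redundancy = (
                np.mean([mutual_information(X[:, f], X[:, s]) for s in selected])
                if selected else 0.0
            )
            score = weight[f] * relevance[f] - redundancy
            if score > best_score:         # strict '>' keeps the first of any ties
                best, best_score = f, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

Here `X` is assumed to be a documents-by-terms 0/1 NumPy array and `y` an array of class labels; `k1` is the size of the IG candidate set and `k2` the size of the final subset. Swapping in the paper's own class difference degree would only require replacing `class_difference_weight`.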

CLC Number: