Computer Engineering ›› 2012, Vol. 38 ›› Issue (9): 186-188,192. doi: 10.3969/j.issn.1000-3428.2012.09.056

• Artificial Intelligence and Recognition Technology •

Comprehensive Feature Selection Based on Category Discrimination Degree and Correlation Analysis

CHEN Jian-hua, WANG Zhi-he, JIANG Yun

  1. (College of Mathematics and Information Science, Northwest Normal University, Lanzhou 730070, China)
  • Received: 2011-07-08 Online: 2012-05-05 Published: 2012-05-05
  • About the authors: CHEN Jian-hua (born 1988), female, master's student, main research direction: data mining; WANG Zhi-he, professor; JIANG Yun, associate professor
  • Supported by:
    National Natural Science Foundation of China (60873196); Natural Science Foundation of Gansu Province (1010RJZA022); Northwest Normal University 2010 Third-Phase Knowledge and Innovation Engineering Research Backbone Fund (nwnu-kjcxgc-03-67)

Abstract: This paper proposes a comprehensive feature selection algorithm based on category discrimination degree and correlation analysis. The algorithm uses the category discrimination degree to extract feature terms with strong class-discriminating ability, which reduces the sparsity of the feature space. It applies correlation analysis to measure the relevance between features and categories and the redundancy among features, and selects feature terms that are representative of their categories and not redundant with one another. Experimental results show that the proposed algorithm effectively improves classifier performance.

Key words: text categorization, feature selection, correlation analysis, category discrimination degree, relevant independence degree
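
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a two-stage filter: a stand-in discrimination score (the highest per-class document frequency of a term minus the mean over the other classes) prunes the vocabulary, and a greedy mRMR-style step then trades off relevance to the class labels against redundancy with already selected terms, both measured by mutual information. The names `discrimination`, `select_features`, `keep`, and `k` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def discrimination(term, docs, labels):
    """Assumed stand-in score, not the paper's definition: highest per-class
    document frequency of the term minus the mean over the other classes.
    Requires at least two classes."""
    freqs = []
    for c in sorted(set(labels)):
        in_class = [d for d, l in zip(docs, labels) if l == c]
        freqs.append(sum(term in d for d in in_class) / len(in_class))
    top = max(freqs)
    return top - (sum(freqs) - top) / (len(freqs) - 1)


def select_features(docs, labels, keep, k):
    """docs: list of token sets; labels: one class label per document."""
    vocab = sorted(set().union(*docs))

    # Stage 1 (assumed): keep the `keep` terms with the highest discrimination score.
    ranked = sorted(vocab, key=lambda t: discrimination(t, docs, labels), reverse=True)
    candidates = ranked[:keep]

    # Stage 2 (assumed, mRMR-style): greedily pick the term whose relevance to the
    # labels most exceeds its mean redundancy with the terms already selected.
    presence = {t: [t in d for d in docs] for t in candidates}
    relevance = {t: mutual_information(presence[t], labels) for t in candidates}
    selected = []
    while candidates and len(selected) < k:
        def score(t):
            if not selected:
                return relevance[t]
            redundancy = sum(
                mutual_information(presence[t], presence[s]) for s in selected
            ) / len(selected)
            return relevance[t] - redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy corpus: "goal" and "match" always co-occur, and "the" appears everywhere.
docs = [
    {"goal", "match", "the"}, {"goal", "match", "the"},
    {"stock", "the"}, {"stock", "the"},
    {"chip", "the"}, {"chip", "the"},
]
labels = ["sport", "sport", "finance", "finance", "tech", "tech"]
print(select_features(docs, labels, keep=4, k=3))
```

On this toy corpus the first stage drops "the", which occurs equally in every class, and the second stage keeps only one of the perfectly redundant pair "goal" and "match".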
