Computer Engineering ›› 2007, Vol. 33 ›› Issue (16): 150-152. doi: 10.3969/j.issn.1000-3428.2007.16.052

• Artificial Intelligence and Recognition Technology •

Feature Selection Algorithm Based on Association Features Enhancement

GU Ping, ZHU Qing-sheng, HE Xi-ping, LI Yun-feng

  1. (School of Computer Science, Chongqing University, Chongqing 400044)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-08-20 Published:2007-08-20

Abstract: Feature selection is a common preprocessing step in text classification: by reducing the dimensionality of the document feature space, it can improve classification performance. Most feature selection algorithms, however, ignore the co-occurrence relationships among feature words. This paper proposes a method that uses association features to enhance text classification performance. Because feature expansion produces a high-dimensional vector space, the paper also designs a fast algorithm for removing redundant features and selecting relevant ones, so that the enhanced features meet practical requirements for both classification performance and execution efficiency. Experiments use Naïve Bayes as the classifier and compare the method with other feature selection algorithms in terms of dimensionality reduction, classification performance, and execution efficiency.

Key words: text classification, feature selection, association feature
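
The abstract does not say how association features are mined, so the following is only an illustrative sketch of the general idea of expanding a document's features with co-occurring word pairs. It assumes document-level co-occurrence and a minimum support threshold; the function names `mine_association_features` and `expand_features`, the `min_support` parameter, and the toy documents are all hypothetical and are not taken from the paper. The redundancy-removal step is not shown.

```python
from collections import Counter
from itertools import combinations


def mine_association_features(docs, min_support=2):
    """Return word pairs that co-occur in at least `min_support` documents.

    Assumption: co-occurrence is counted once per document, not per sentence
    or window. The paper may define association features differently.
    """
    pair_counts = Counter()
    for tokens in docs:
        vocab = sorted(set(tokens))  # sort so each pair has one canonical order
        pair_counts.update(combinations(vocab, 2))
    return {pair for pair, count in pair_counts.items() if count >= min_support}


def expand_features(tokens, association_features):
    """Append every association feature whose two words both occur in the document."""
    present = set(tokens)
    extra = [
        f"{a}+{b}"
        for a, b in sorted(association_features)
        if a in present and b in present
    ]
    return list(tokens) + extra


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["neural", "network", "training"],
    ["neural", "network", "pruning"],
    ["network", "cable", "router"],
]
pairs = mine_association_features(docs, min_support=2)
expanded = [expand_features(d, pairs) for d in docs]
```

In this toy corpus only the pair ("network", "neural") reaches the support threshold, so the first two documents gain one extra feature and the third is unchanged. Expansion of this kind increases dimensionality, which is why the abstract pairs it with a fast step for removing redundant features.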
