
Computer Engineering, 2007, Vol. 33, Issue (04): 184-186.

• Artificial Intelligence and Recognition Technology •

A Feature Selection Method Fitting for Large Data Set

ZHANG Li (张莉), CHEN Gonghe (陈恭和)

  1. (School of Information Technology & Management Engineering, University of International Business and Economics, Beijing 100029)
  • Received: 1900-01-01 Revised: 1900-01-01 Online: 2007-02-20 Published: 2007-02-20

Abstract: This paper studies the problem of selecting important features from training samples and proposes a feature selection method suited to large-scale data sets. Features are selected in different sample spaces using feature similarity and the idea of floating search; classifiers are selected by weighting mutual information and classification accuracy; and a Bagging-based selective ensemble algorithm is proposed to improve the stability of the feature selection algorithm. The performance of the algorithm is validated on the intrusion detection data from KDD Cup'99.

Key words: feature selection, feature similarity, floating search, selective ensemble
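The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of stabilizing feature selection with Bagging. It makes several assumptions that are not from the paper: features are discrete, each bootstrap sample ranks features by plain mutual information with the class label (standing in for the feature-similarity and floating-search step), and the per-sample results are combined by simple majority vote (standing in for the selection weighted by mutual information and accuracy). The function names `mutual_information`, `top_k_features`, and `bagged_selection` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest MI with the label.

    Ties are broken in favor of the lower feature index.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([r[j] for r in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: (-scores[j], j))
    return ranked[:k]


def bagged_selection(rows, labels, k, n_bags=25, seed=0):
    """Keep the features chosen in more than half of the bootstrap samples."""
    rng = random.Random(seed)
    n = len(rows)
    votes = Counter()
    for _ in range(n_bags):
        # Bootstrap sample: draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        votes.update(top_k_features(sample_rows, sample_labels, k))
    return sorted(j for j, v in votes.items() if v > n_bags / 2)


# Toy data (illustrative only): feature 0 equals the label,
# feature 1 is constant, feature 2 alternates independently of the label.
labels = [0] * 20 + [1] * 20
rows = [[y, 0, i % 2] for i, y in enumerate(labels)]
print(bagged_selection(rows, labels, k=1))
```

Voting across bootstrap samples is one simple way to make the selected subset less sensitive to the particular training sample; a weighted combination, as the abstract describes, would replace the majority-vote line with a weighted score per feature.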