
Computer Engineering ›› 2008, Vol. 34 ›› Issue (11): 61-62,6. doi: 10.3969/j.issn.1000-3428.2008.11.022

• Software Technology and Database •

Hybrid Algorithm of Feature Selection and Its Application

WANG Xiu-ying1,2, SHAO Zhi-qing1, LIU Hong-li1

  (1. College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237; 2. Department of Computer and Information, Shanghai Xinqiao Vocational and Technical College, Shanghai 200237)
  • Online: 2008-06-05 Published: 2008-06-05

Abstract: A new feature selection algorithm that combines information entropy with association rule (frequent itemset) theory is proposed and applied to a dataset for detecting P2P downloading, with the aim of identifying users on a local area network (LAN) whose downloads occupy a large share of the bandwidth. The method computes the entropy of every attribute in two datasets, one that contains P2P downloading traffic and one that does not, compares the two values for each attribute, and labels attributes whose entropy changes substantially as attributes "of interest". Association rule mining is then run on the dataset that contains P2P downloading, and the attributes of interest that form frequent 1-itemsets with support above a given threshold are retained, yielding a reduced attribute set. Based on this reduced attribute set, a detection model is proposed for identifying LAN users engaged in large-scale P2P downloading; the model achieves good results in the reported examples.

Key words: feature selection, association rule, P2P downloading, intrusion detection
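
The two-stage selection described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: attribute values are assumed to be already discretized, an "item" is assumed to be an (attribute, value) pair, the thresholds `min_delta` (entropy change) and `min_support` are hypothetical parameters, and the records and attribute names (`protocol`, `port`, `size`, `duration`) are invented toy data.

```python
from collections import Counter
from math import log2

# Hypothetical toy datasets: each record maps an attribute to a discretized value.
NORMAL = [
    {"protocol": "tcp", "port": "80", "size": "small", "duration": "short"},
    {"protocol": "tcp", "port": "443", "size": "small", "duration": "short"},
    {"protocol": "udp", "port": "53", "size": "small", "duration": "short"},
    {"protocol": "tcp", "port": "80", "size": "large", "duration": "short"},
]
P2P = [
    {"protocol": "tcp", "port": "high", "size": "large", "duration": "long"},
    {"protocol": "udp", "port": "high", "size": "large", "duration": "long"},
    {"protocol": "tcp", "port": "high", "size": "large", "duration": "short"},
    {"protocol": "udp", "port": "high", "size": "small", "duration": "medium"},
]
ATTRS = ["protocol", "port", "size", "duration"]


def entropy(values):
    """Shannon entropy (in bits) of a non-empty list of discrete values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def interesting_attributes(p2p, normal, attributes, min_delta):
    """Stage 1: keep attributes whose entropy differs by at least min_delta
    between the P2P dataset and the non-P2P dataset."""
    return [
        a for a in attributes
        if abs(entropy([r[a] for r in p2p]) - entropy([r[a] for r in normal])) >= min_delta
    ]


def frequent_1_itemset_attributes(records, attributes, min_support):
    """Stage 2 (assumed reading): keep an attribute if at least one of its
    (attribute, value) items reaches min_support in the given records."""
    n = len(records)
    kept = []
    for a in attributes:
        best = max(Counter(r[a] for r in records).values())
        if best / n >= min_support:
            kept.append(a)
    return kept


def select_features(p2p, normal, attributes, min_delta, min_support):
    """Entropy filter first, then the support filter on the P2P dataset."""
    candidates = interesting_attributes(p2p, normal, attributes, min_delta)
    return frequent_1_itemset_attributes(p2p, candidates, min_support)


if __name__ == "__main__":
    print(select_features(P2P, NORMAL, ATTRS, min_delta=0.5, min_support=0.6))  # ['port']
```

On this toy data, `port` and `duration` pass the entropy filter (each changes by 1.5 bits), but only `port` has a value whose support in the P2P set reaches 0.6. Because both filters are per-attribute predicates, applying them in the opposite order would give the same set; filtering by entropy first simply limits the support counting to the attributes of interest.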
