Abstract:
A novel feature selection algorithm is presented and applied to the detection of P2P downloading. It is based on information entropy and frequent itemset theory. Two datasets are used, one containing P2P downloading and one without it, and the entropy values of their attributes are compared. An attribute is labeled "of interest" if its entropy differs greatly between the two datasets. An attribute of interest is extracted if its support value exceeds a threshold. A detection model is built from the extracted attributes, and its effectiveness is demonstrated with examples.
Key words:
feature selection,
association rule,
P2P downloading,
intrusion detection
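Abstract: A new feature selection algorithm is proposed that combines information entropy with association rule theory. It is applied to datasets for detecting P2P downloading, with the aim of identifying users on a local area network whose downloads consume a large share of bandwidth. The method computes the entropy of each attribute in a dataset that contains P2P downloading and in one that does not, compares the two, and marks the attributes whose entropy changes greatly as attributes of interest. Association rule mining is then applied to the dataset containing P2P downloading to find the attributes that form frequent 1-itemsets with support above a given threshold, which yields a reduced attribute set. A detection model based on this reduced attribute set is proposed for detecting users engaged in large-scale P2P downloading on the local area network, and it achieves good results.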
Key words:
feature selection,
association rule,
P2P downloading,
intrusion detection
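The two-stage selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that every attribute is discrete, that "large difference" means the absolute entropy gap exceeds a cutoff (`entropy_gap`), and that the support of an attribute is the relative frequency of its most common value in the P2P dataset (`min_support`). The function names, both threshold values, and the toy records with fields `protocol`, `dst_port`, and `flag` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def attributes_of_interest(p2p, non_p2p, entropy_gap):
    """Attributes whose entropy differs by more than entropy_gap between datasets."""
    selected = []
    for attr in p2p[0]:
        h_p2p = shannon_entropy([record[attr] for record in p2p])
        h_non = shannon_entropy([record[attr] for record in non_p2p])
        if abs(h_p2p - h_non) > entropy_gap:
            selected.append(attr)
    return selected


def frequent_attributes(p2p, candidates, min_support):
    """Keep candidates whose most common value forms a frequent 1-itemset.

    Assumption: support = share of P2P records carrying that value.
    """
    kept = []
    for attr in candidates:
        top_count = Counter(record[attr] for record in p2p).most_common(1)[0][1]
        if top_count / len(p2p) > min_support:
            kept.append(attr)
    return kept


def select_features(p2p, non_p2p, entropy_gap=0.5, min_support=0.5):
    """Entropy comparison followed by support filtering (illustrative thresholds)."""
    candidates = attributes_of_interest(p2p, non_p2p, entropy_gap)
    return frequent_attributes(p2p, candidates, min_support)


# Hypothetical toy records, not data from the paper.
p2p = [
    {"protocol": "tcp", "dst_port": "6881", "flag": "SF"},
    {"protocol": "tcp", "dst_port": "6881", "flag": "S0"},
    {"protocol": "tcp", "dst_port": "6881", "flag": "REJ"},
    {"protocol": "tcp", "dst_port": "6882", "flag": "RSTO"},
]
non_p2p = [
    {"protocol": "tcp", "dst_port": "80", "flag": "SF"},
    {"protocol": "tcp", "dst_port": "443", "flag": "SF"},
    {"protocol": "tcp", "dst_port": "25", "flag": "SF"},
    {"protocol": "tcp", "dst_port": "53", "flag": "SF"},
]

selected = select_features(p2p, non_p2p)
print(selected)  # ['dst_port']
```

On this toy data, `protocol` is dropped in the first stage because its entropy is zero in both datasets. `flag` passes the entropy test but is dropped in the second stage because no single value covers more than half of the P2P records. Only `dst_port` passes both stages.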
WANG Xiu-ying; SHAO Zhi-qing; LIU Hong-li. Hybrid Algorithm of Feature Selection and Its Application[J]. Computer Engineering, 2008, 34(11): 61-62,6.
王秀英;邵志清;刘红丽. 一种杂交特征选择算法及其应用[J]. 计算机工程, 2008, 34(11): 61-62,6.