
Computer Engineering, 2014, Vol. 40, Issue (12): 104-107, 113. doi: 10.3969/j.issn.1000-3428.2014.12.019

• Security Technology •

Detection Method of Malicious Code Model Clustering Based on K-L Divergence

BIAN Genqing (a), GONG Peijiao (a), SHAO Bilin (b)

  (a) School of Information and Control Engineering, (b) School of Management, Xi’an University of Architecture and Technology, Xi’an 710055, China
  • Received: 2013-11-18; Revised: 2014-02-10; Online: 2014-12-15; Published: 2015-01-16
  • About the authors: BIAN Genqing (born 1968), male, associate professor, research interests: information security and massive information processing; GONG Peijiao (corresponding author), master's student; SHAO Bilin, professor.
  • Funding: National Natural Science Foundation of China (61272458).

Abstract: In cloud computing environments, service systems are becoming more and more complex. Network security vulnerabilities and attacks are therefore increasing sharply, and traditional malicious code detection techniques and protection models can no longer meet the needs of cloud storage environments. To address this, this paper introduces a Gaussian Mixture Model (GMM) to build a layered detection mechanism for malicious code. It uses information gain and document frequency to analyze and extract feature values from the sample data. Building on the properties of the K-L Divergence (KLD), it then proposes a model clustering detection method for malicious code based on K-L divergence. The KDDCUP99 data set is used, and data preprocessing and clustering analysis are performed with the open-source software Weka. Experimental results show that, when features are analyzed by combining information gain and document frequency, the proposed method improves on the Bayes algorithm in a virtual environment. It reduces the average malicious code detection time by 16.6% and raises the average malicious code detection rate by 1.05%.

Key words: malicious code, Gaussian Mixture Model (GMM), K-L Divergence (KLD), model clustering, information gain, document frequency
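The abstract names information gain (IG) and document frequency (DF) as the feature-analysis step but does not give the formulas or thresholds. The sketch below is a minimal illustration under two assumptions. First, each sample has already been discretised into a set of feature tokens such as `proto=tcp`, which is a hypothetical encoding of KDDCUP99 attributes. Second, each sample carries a class label. Under these assumptions, DF counts how many samples contain a token and is used here to discard rare tokens. IG(t) = H(C) − [P(t)·H(C|t) + P(¬t)·H(C|¬t)] then ranks the remaining tokens. The function names and the `min_df` and `top_k` thresholds are illustrative choices, not the authors' settings.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def document_frequency(samples):
    """DF: number of samples in which each feature token occurs at least once."""
    df = Counter()
    for tokens in samples:
        df.update(set(tokens))
    return df


def information_gain(samples, labels, feature):
    """IG of splitting the samples on 'feature present' vs 'feature absent'."""
    with_f = [y for s, y in zip(samples, labels) if feature in s]
    without_f = [y for s, y in zip(samples, labels) if feature not in s]
    n = len(labels)
    conditional = (len(with_f) / n) * entropy(with_f) + (len(without_f) / n) * entropy(without_f)
    return entropy(labels) - conditional


def select_features(samples, labels, min_df=2, top_k=10):
    """Drop tokens with DF below min_df, then keep the top_k tokens ranked by IG."""
    df = document_frequency(samples)
    candidates = [f for f, count in df.items() if count >= min_df]
    candidates.sort(key=lambda f: information_gain(samples, labels, f), reverse=True)
    return candidates[:top_k]


if __name__ == "__main__":
    # Hypothetical toy records: tokens stand in for discretised KDDCUP99 attributes.
    samples = [{"proto=tcp", "flag=S0"}, {"proto=tcp", "flag=S0"},
               {"proto=udp", "flag=SF"}, {"proto=tcp", "flag=SF"}]
    labels = ["attack", "attack", "normal", "normal"]
    print(sorted(select_features(samples, labels, min_df=2, top_k=2)))
    # prints ['flag=S0', 'flag=SF']
```

In the toy data, `flag=S0` appears only in attacks and `flag=SF` only in normal traffic, so both reach the maximum IG of 1 bit. They are selected ahead of `proto=tcp` (about 0.31 bit). `proto=udp` is removed earlier by the DF filter because it occurs in only one sample.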

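The abstract does not say how the K-L divergence is computed or how it drives the model clustering. The sketch below illustrates the quantity itself and is not the authors' procedure. As a simplification of the paper's GMMs, it assumes that each behaviour class is summarised by a Gaussian with diagonal covariance. Under that assumption, the code computes the closed-form KL divergence. It then uses a symmetrised version to assign a batch of feature vectors to the nearest reference model. The KL divergence between two GMMs has no closed form, so the code also estimates it for 1-D mixtures by Monte Carlo sampling. All function names, the reference models and the sample values below are hypothetical.

```python
import math
import random


def kl_diag_gaussian(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(P || Q) for two Gaussians with diagonal covariance.

    Per dimension: ln(sigma_q / sigma_p) + (var_p + (mu_p - mu_q)^2) / (2 var_q) - 1/2.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += 0.5 * math.log(vq / vp) + (vp + (mp - mq) ** 2) / (2.0 * vq) - 0.5
    return total


def symmetric_kl(p, q):
    """KL is asymmetric; adding both directions gives a symmetric dissimilarity."""
    return kl_diag_gaussian(*p, *q) + kl_diag_gaussian(*q, *p)


def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Maximum-likelihood mean and per-dimension variance (floored to stay positive)."""
    n, dims = len(vectors), len(vectors[0])
    mu = [sum(v[d] for v in vectors) / n for d in range(dims)]
    var = [max(sum((v[d] - mu[d]) ** 2 for v in vectors) / n, var_floor) for d in range(dims)]
    return mu, var


def assign_to_model(batch, reference_models):
    """Fit a Gaussian to the batch and return the name of the closest reference model."""
    fitted = fit_diag_gaussian(batch)
    return min(reference_models, key=lambda name: symmetric_kl(fitted, reference_models[name]))


def gmm_logpdf(x, weights, means, variances):
    """Log density of a 1-D Gaussian mixture, computed with log-sum-exp for stability."""
    terms = [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gmm_sample(rng, weights, means, variances):
    """Draw one value: pick a component by weight, then sample from that Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return rng.gauss(means[k], math.sqrt(variances[k]))


def kl_gmm_monte_carlo(p, q, n=20000, seed=0):
    """KL between GMMs has no closed form; estimate E_p[log p(x) - log q(x)] by sampling p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = gmm_sample(rng, *p)
        total += gmm_logpdf(x, *p) - gmm_logpdf(x, *q)
    return total / n


if __name__ == "__main__":
    # Hypothetical reference models (mean vector, variance vector) for two behaviour classes.
    models = {"normal": ([0.0, 0.0], [1.0, 1.0]), "attack": ([5.0, 5.0], [1.0, 1.0])}
    batch = [[4.8, 5.1], [5.2, 4.9], [5.0, 5.3], [4.9, 4.7]]
    print(assign_to_model(batch, models))  # prints attack
    p = ([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0])  # (weights, means, variances)
    q = ([0.5, 0.5], [-1.0, 3.0], [1.0, 1.0])
    print(kl_gmm_monte_carlo(p, q))
```

The code symmetrises the divergence because KL(P‖Q) and KL(Q‖P) generally differ, and a clustering distance is usually expected to be symmetric. The Monte Carlo estimate converges to the true KL as `n` grows. For single-component mixtures it can be checked against `kl_diag_gaussian`.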