Malicious Code Detection Method Based on Dynamic Behavior and Machine Learning

doi:10.19678/j.issn.1000-3428.0056409

Abstract

Abstract: Malicious code is emerging frequently and becoming increasingly resistant to recognition, and existing signature-based detection methods cannot identify unknown or hidden malicious code. To address this problem, this paper proposes a malicious code detection method that combines dynamic behavior analysis with machine learning. A Cuckoo sandbox for automated analysis is built to record the behavior information and network traffic of malicious code. The Cuckoo sandbox is then combined with an improved DynamoRIO system to form a virtual environment, from which the Application Programming Interface (API) call sequences and network behavior features of malicious code samples are extracted and fused. On this basis, a malicious code detection model based on the Bidirectional Gated Recurrent Unit (BGRU) is established, and its performance is verified on a dataset containing 12,170 malicious code samples and 5,983 benign application samples. Experimental results show that the proposed method captures the behavior information of malicious code comprehensively, and that the BGRU model detects malicious code better than models such as Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BLSTM): its precision and F1 score reach 97.84% and 98.07% respectively, and its training speed is 1.26 times that of the BLSTM model.

Key words: malicious code, Application Programming Interface (API) sequence, traffic analysis, Cuckoo sandbox, DynamoRIO system, Bidirectional Gated Recurrent Unit (BGRU) network
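
To make the BGRU idea in the abstract concrete, the following is a minimal sketch of a bidirectional GRU forward pass over an API call sequence, written in plain Python. It is not the authors' model: the embedding layer, the layer sizes, the random untrained weights, the API vocabulary and the class names `GRUCell` and `BGRUClassifier` are all assumptions made for illustration, and the fused network-behavior features are left out.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """One GRU direction with update (z), reset (r) and candidate (h) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Each gate owns an input matrix W, a recurrent matrix U and a bias b.
        self.params = {
            gate: (rand_matrix(hidden_size, input_size, rng),
                   rand_matrix(hidden_size, hidden_size, rng),
                   [0.0] * hidden_size)
            for gate in ("z", "r", "h")
        }

    def _affine(self, gate, x, h):
        w, u, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(w, x), matvec(u, h), b)]

    def step(self, x, h_prev):
        z = [sigmoid(v) for v in self._affine("z", x, h_prev)]
        r = [sigmoid(v) for v in self._affine("r", x, h_prev)]
        gated = [ri * hi for ri, hi in zip(r, h_prev)]
        cand = [math.tanh(v) for v in self._affine("h", x, gated)]
        # Convention used here: h_t = (1 - z) * h_{t-1} + z * candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]

    def run(self, xs):
        """Return the final hidden state after consuming the whole sequence."""
        h = [0.0] * self.hidden_size
        for x in xs:
            h = self.step(x, h)
        return h


class BGRUClassifier:
    """Hypothetical binary classifier: embedding -> forward/backward GRU -> sigmoid."""

    def __init__(self, vocab_size, embed_size=8, hidden_size=6, seed=0):
        rng = random.Random(seed)
        self.embed = rand_matrix(vocab_size, embed_size, rng)
        self.fwd = GRUCell(embed_size, hidden_size, rng)
        self.bwd = GRUCell(embed_size, hidden_size, rng)
        self.out_w = [rng.uniform(-0.1, 0.1) for _ in range(2 * hidden_size)]
        self.out_b = 0.0

    def predict_proba(self, api_ids):
        xs = [self.embed[i] for i in api_ids]
        # Read the trace left-to-right and right-to-left, then concatenate.
        features = self.fwd.run(xs) + self.bwd.run(xs[::-1])
        logit = sum(w * f for w, f in zip(self.out_w, features)) + self.out_b
        return sigmoid(logit)


# Illustrative vocabulary and trace; real traces would come from sandbox logs.
API_VOCAB = {"CreateFileW": 0, "WriteFile": 1, "RegSetValueExW": 2, "connect": 3}
model = BGRUClassifier(vocab_size=len(API_VOCAB))
trace = ["CreateFileW", "WriteFile", "connect"]
score = model.predict_proba([API_VOCAB[name] for name in trace])
```

Because the weights are untrained, `score` carries no meaning as a detection result; the sketch only shows how the two directions read the same trace and how their final states are concatenated before classification. A practical model would be trained with a deep learning framework rather than hand-written loops.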

CLC number: TP309

CHEN Jiajie, PENG Bozhuang, WU Peize. Malicious Code Detection Method Based on Dynamic Behavior and Machine Learning[J]. Computer Engineering, 2021, 47(3): 166-173.

https://www.ecice06.com/CN/Y2021/V47/I3/166

Figures/Tables: 10

