
Computer Engineering ›› 2021, Vol. 47 ›› Issue (3): 166-173. doi: 10.19678/j.issn.1000-3428.0056409

• Cyberspace Security •

Malicious Code Detection Method Based on Dynamic Behavior and Machine Learning

CHEN Jiajie, PENG Bozhuang, WU Peize

  1. Digital Power Grid Research Institute Co., Ltd., China Southern Power Grid, Guangzhou 510000, China
  • Received: 2019-10-25 Revised: 2020-04-15 Published: 2020-04-26
  • About the authors: CHEN Jiajie (1995-), male, researcher, M.S.; main research interests: industrial control security and artificial intelligence. PENG Bozhuang and WU Peize, engineers.
  • Funding:
    Science and Technology Project of China Southern Power Grid (ZBKJXM20180749).

Abstract: Malicious code now appears frequently and with growing resistance to identification, and existing signature-based detection methods cannot identify unknown or hidden malicious code. This paper proposes a malicious code detection method that combines dynamic behavior and machine learning. An automated-analysis Cuckoo sandbox is built to record the behavior information and network traffic of malicious code, and the Cuckoo sandbox is combined with an improved DynamoRIO system as the virtual environment to extract and fuse the Application Programming Interface (API) call sequences and network behavior features of malicious code samples. On this basis, a malicious code detection model based on the Bidirectional Gated Recurrent Unit (BGRU) is established and evaluated on a dataset containing 12 170 malicious code samples and 5 983 benign application samples. Experimental results show that the method captures the behavior information of malicious code comprehensively, that the BGRU model detects better than Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BLSTM), and other models, with precision and F1 value reaching 97.84% and 98.07% respectively, and that its training speed is 1.26 times that of the BLSTM model.
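The F1 value quoted in the abstract combines precision and recall. As a minimal sketch of those standard definitions (not the authors' evaluation code), the Python below computes the three metrics from confusion-matrix counts. The counts in the example are hypothetical and chosen only for illustration, and `implied_recall` is a helper introduced here that inverts the F1 formula under the assumption that precision and F1 were measured on the same class.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def implied_recall(precision, f1):
    """Solve F1 = 2PR / (P + R) for R, given precision P and F1."""
    return f1 * precision / (2 * precision - f1)


# Hypothetical counts for illustration only (not the paper's results).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")

# Recall implied by a given precision/F1 pair, under the assumption above.
print(f"implied recall={implied_recall(p, f1):.4f}")
```

Precision is the fraction of samples flagged as malicious that really are malicious, which is a different quantity from accuracy (the fraction of all samples classified correctly).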

Key words: malicious code, Application Programming Interface (API) sequence, traffic analysis, Cuckoo sandbox, DynamoRIO system, Bidirectional Gated Recurrent Unit (BGRU) network
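To illustrate what a bidirectional GRU does with an API call sequence, here is a minimal pure-Python sketch of the forward pass. It is not the authors' model: the weights are random and untrained, the API vocabulary size, hidden size, and call IDs are hypothetical, and a real detector would be trained in a deep learning framework with a classification layer on top of the encoding.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_gru(input_size, hidden_size, rng):
    """Random, untrained weights for one GRU direction (illustration only)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    # One (W, U, b) triple each for update gate z, reset gate r, candidate h.
    return {g: (mat(hidden_size, input_size),
                mat(hidden_size, hidden_size),
                [0.0] * hidden_size) for g in ("z", "r", "h")}


def gru_step(params, x, h):
    """One GRU step: returns the new hidden state."""
    def gate(name, h_in, act):
        w, u, b = params[name]
        return [act(a + c + d) for a, c, d in zip(matvec(w, x), matvec(u, h_in), b)]
    z = gate("z", h, sigmoid)                      # update gate
    r = gate("r", h, sigmoid)                      # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = gate("h", reset_h, math.tanh)           # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_gru(params, seq, hidden_size):
    h = [0.0] * hidden_size
    for x in seq:
        h = gru_step(params, x, h)
    return h


def bgru_encode(fwd, bwd, seq, hidden_size):
    """Concatenate the final states of a forward and a backward pass."""
    return (run_gru(fwd, seq, hidden_size)
            + run_gru(bwd, list(reversed(seq)), hidden_size))


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


rng = random.Random(0)
VOCAB, HIDDEN = 5, 4             # hypothetical sizes
fwd = init_gru(VOCAB, HIDDEN, rng)
bwd = init_gru(VOCAB, HIDDEN, rng)
api_ids = [0, 3, 1, 4, 2]        # hypothetical API call IDs
sequence = [one_hot(i, VOCAB) for i in api_ids]
features = bgru_encode(fwd, bwd, sequence, HIDDEN)
print(len(features))             # 2 * HIDDEN values
```

Reading the sequence in both directions lets the encoding of each sample reflect calls that come both before and after a given API call, which a single-direction GRU cannot do.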

CLC number: