
Computer Engineering (计算机工程) ›› 2023, Vol. 49 ›› Issue (6): 162-169,179. doi: 10.19678/j.issn.1000-3428.0065643

• Cyberspace Security •

Real-time Anomaly Detection Framework via System Calls Based on Integrated Learning (基于集成学习的系统调用实时异常检测框架)

CHEN Zhonglei1,2, YI Peng1,2, CHEN Xiang1,2, HU Tao1,2

  1. Information Technology Institute, Information Engineering University, Zhengzhou 450002, China;
  2. Endogenous Safety and Security Research Center, National Digital Switching System Engineering & Technological Research Center, Zhengzhou 450002, China
  • Received: 2022-08-31 Revised: 2022-10-13 Published: 2022-11-14
  • About the authors: CHEN Zhonglei (陈仲磊, born 1997), male, master's student, main research interest: intrusion detection; YI Peng (伊鹏), research fellow, Ph.D.; CHEN Xiang (陈祥), assistant research fellow, Ph.D. candidate; HU Tao (胡涛), Ph.D.
  • Funding: National Key Research and Development Program of China (2019YFB802505, 2020YFB806402).

Abstract: Anomaly detection based on system call data cannot perceive intrusion behavior synchronously within a process's lifecycle, and its real-time detection accuracy is low. This paper proposes a real-time anomaly detection framework for system calls based on ensemble (integrated) learning, comprising three modules: data processing and slicing, ensemble learning, and anomaly detection and feedback. In the data processing and slicing module, the behavior trajectories of processes that are still within their lifecycle are collected and analyzed, and two segmentation strategies for system call trajectories are designed to meet the different timeliness requirements of online data awaiting analysis and offline model training data. In the ensemble learning module, the improved GPT language model and Gated Recurrent Unit (GRU) are used to construct behavior profiles of system call trajectory fragments, and heterogeneous anomaly detection models are fused, following the ensemble learning idea, to capture unidirectional semantic features and statistical features at the same time. In the anomaly detection and feedback module, an anomaly decision method that accounts for the importance of individual system calls is adopted, and an anomaly warning mechanism in which synchronous perception and real-time adjudication coexist is introduced. Experimental results on public datasets show that the framework can perceive intrusions synchronously within the process lifecycle, and that the constructed ensemble model achieves a high anomaly detection accuracy (99.3%) while maintaining a low false alarm rate (0.2%), outperforming comparison models such as Decision Tree (DT), one-class SVM, and BiLSTM.

Key words: anomaly detection, system calls, host intrusion detection, integrated learning, semi-supervised learning
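The abstract names two trajectory segmentation strategies and an importance-aware anomaly decision but gives no parameters, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes system calls are encoded as integer IDs, that offline training data are cut with a fixed-length strided window, that online data are served from a rolling buffer, and that two models' per-call scores are fused by a simple weighted average. The names `offline_slices`, `OnlineSlicer`, `weighted_anomaly_score`, `window`, `stride`, and `alpha` are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


def offline_slices(trace: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Cut a completed trace into fixed-length, possibly overlapping fragments.

    Assumption: offline training data can afford to wait for the whole trace.
    """
    if len(trace) < window:
        # A trace shorter than one window is kept as a single short fragment.
        return [list(trace)] if trace else []
    return [list(trace[i:i + window])
            for i in range(0, len(trace) - window + 1, stride)]


class OnlineSlicer:
    """Rolling buffer that exposes the latest calls of a still-running process.

    Assumption: online analysis must react to every new call, so a fragment
    is returned on each push instead of waiting for the process to exit.
    """

    def __init__(self, window: int) -> None:
        self.buffer: Deque[int] = deque(maxlen=window)

    def push(self, syscall_id: int) -> List[int]:
        self.buffer.append(syscall_id)  # the oldest call is dropped when full
        return list(self.buffer)


def weighted_anomaly_score(scores_a: Sequence[float],
                           scores_b: Sequence[float],
                           importance: Sequence[float],
                           alpha: float = 0.5) -> float:
    """Fuse per-call scores from two models, weighting each call by importance.

    Assumption: `alpha` balances the two models and `importance` holds one
    non-negative weight per system call in the fragment.
    """
    if not (len(scores_a) == len(scores_b) == len(importance)):
        raise ValueError("all inputs must have the same length")
    total = sum(importance)
    if total == 0:
        return 0.0
    fused = (alpha * a + (1 - alpha) * b for a, b in zip(scores_a, scores_b))
    return sum(w * s for w, s in zip(importance, fused)) / total
```

In this sketch a fragment would be flagged when the fused score exceeds a threshold tuned on normal traces; the threshold and the source of the importance weights are left open because the abstract does not specify them.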
