
Computer Engineering, 2022, Vol. 48, Issue 7: 151-158, 167. doi: 10.19678/j.issn.1000-3428.0061750

• Cyberspace Security

Log Anomaly Detection Method Based on CNN-BiLSTM Model

SUN Jia1,2, ZHANG Jianhui2, BU Youjun2, CHEN Bo2, HU Nan1,2, WANG Fangyu1,2   

  1. Zhong Yuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China;
    2. PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China
  • Received: 2021-05-25  Revised: 2021-08-25  Online: 2022-07-15  Published: 2021-08-30

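  • About the authors: SUN Jia (born 1995), male, M.S. candidate, main research interests: network security and machine learning; ZHANG Jianhui and BU Youjun, associate research fellows; CHEN Bo, Ph.D.; HU Nan and WANG Fangyu, M.S. candidates.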
  • Funding:
    National Natural Science Foundation of China (62176264); Zhengzhou Major Collaborative Innovation Project (20XTZX-X010).

Abstract: Log anomaly detection currently faces several difficulties: large data volumes, faults and attack threats that are highly concealed, and the complex feature engineering required by traditional methods. The rapid development of deep learning offers new ways to address these problems. This paper proposes a CNN-BiLSTM deep learning model that combines the strengths of a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (Bi-LSTM). The model captures the pronounced time-series characteristics of log keys while also accounting for the spatial position characteristics of log parameters, and it fuses the two feature sets with a concatenation (splicing) mapping method that avoids, as far as possible, one set of features drowning out the other. After a model complexity analysis confirms its feasibility, experiments on the Hadoop HDFS log dataset compare CNN-BiLSTM with CNN and Bi-LSTM and verify its superior classification performance, reaching a log anomaly detection accuracy of about 91%. The model also reaches 94% detection accuracy on the WC98_day Web log dataset, which verifies its good generalization ability. Finally, ablation experiments analyze the importance of the word embedding and the fully connected layer structure in the CNN-BiLSTM model.
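The abstract gives no layer sizes or code, so the following is only a minimal sketch, in plain Python, of what a concatenation ("splicing") fusion followed by a fully connected mapping could look like. The vectors `temporal` (standing in for a Bi-LSTM output over log keys) and `spatial` (standing in for a pooled CNN output over log parameters), the helper names, the 7 x 2 weight matrix, and the two-class normal/anomalous output are all hypothetical, not the authors' implementation.

```python
import math
import random


def concat_fuse(temporal, spatial):
    """Splice the two feature vectors end to end (hypothetical helper).

    Unlike element-wise addition, concatenation keeps every component of both
    vectors in its own coordinate, so a branch with larger magnitudes is not
    summed into the other's components, and the widths need not match.
    """
    return list(temporal) + list(spatial)


def dense(x, weights, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
        for j in range(len(bias))
    ]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(temporal, spatial, weights, bias):
    """Fuse by concatenation, map with a dense layer, return class probabilities."""
    return softmax(dense(concat_fuse(temporal, spatial), weights, bias))


# Hypothetical features: a 4-dim temporal (Bi-LSTM) and a 3-dim spatial (CNN) branch.
temporal = [0.2, -0.1, 0.5, 0.3]
spatial = [1.0, 0.0, -0.4]
fused = concat_fuse(temporal, spatial)

# Hypothetical, randomly initialised 7 x 2 weights (classes: normal, anomalous).
rng = random.Random(0)
weights = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(len(fused))]
bias = [0.0, 0.0]
probs = classify(temporal, spatial, weights, bias)
print(len(fused), [round(p, 3) for p in probs])
```

Concatenation is what allows the two branches to have different widths here (4 and 3); element-wise addition would require equal widths and would mix the two signals before the dense layer ever sees them separately.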

Key words: log anomaly detection, deep learning, feature fusion, generalization ability, ablation experiment

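Abstract: Log anomaly detection currently faces difficulties such as large data volumes, highly concealed faults and attack threats, and the complex feature engineering of traditional methods. Research on rapidly developing deep learning techniques, such as Convolutional Neural Networks (CNN) and Recurrent Neural Networks, can provide new approaches to these problems. This paper proposes a CNN-BiLSTM deep learning model that combines the advantages of CNN and the Bidirectional Long Short-Term Memory recurrent neural network (Bi-LSTM). Building on the pronounced time-series features of log keys, the model also takes into account the spatial position features of log parameters, and fuses the two through a concatenation mapping method that avoids feature submergence as far as possible. On this basis, the model complexity is analyzed, and experiments on the Hadoop HDFS log dataset compare CNN-BiLSTM with a Support Vector Machine (SVM), CNN, and Bi-LSTM to verify its classification performance. The analysis and experimental results show that CNN-BiLSTM achieves an average log anomaly detection accuracy of 91% and a detection accuracy of 94% on the WC98_day Web log dataset, which verifies the model's good generalization ability, and that it outperforms SVM, CNN, and Bi-LSTM in detection performance. In addition, ablation experiments show that the word embedding and the fully connected layer structure play an important role in improving model accuracy.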
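The abstract mentions a model complexity analysis without stating the configuration. As a rough illustration only, the sketch below counts trainable parameters for one hypothetical CNN-BiLSTM layout, using the common single-bias conventions (as in Keras; PyTorch's `nn.LSTM` keeps two bias vectors per gate and would count slightly more). The vocabulary sizes, embedding width, hidden size, filter count, kernel size, and the use of global max pooling are invented for the example and are not the paper's values.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per token in the vocabulary.
    return vocab_size * dim


def conv1d_params(in_channels, filters, kernel_size):
    # Each filter spans kernel_size positions x in_channels, plus one bias.
    return filters * (kernel_size * in_channels + 1)


def lstm_params(input_size, hidden_size):
    # Four gates, each with input weights, recurrent weights and one bias vector.
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)


def bilstm_params(input_size, hidden_size):
    # Forward and backward directions have independent weights.
    return 2 * lstm_params(input_size, hidden_size)


def dense_params(in_features, out_features):
    # Weight matrix plus one bias per output unit.
    return in_features * out_features + out_features


# Hypothetical configuration (not taken from the paper).
KEY_VOCAB, PARAM_VOCAB, EMB_DIM = 30, 100, 16
HIDDEN, FILTERS, KERNEL, CLASSES = 32, 32, 3, 2

layers = {
    "log-key embedding": embedding_params(KEY_VOCAB, EMB_DIM),
    "parameter embedding": embedding_params(PARAM_VOCAB, EMB_DIM),
    "Bi-LSTM (log keys)": bilstm_params(EMB_DIM, HIDDEN),
    "Conv1D (log parameters)": conv1d_params(EMB_DIM, FILTERS, KERNEL),
    # Assumed global max pooling has no weights; fused width is 2*HIDDEN + FILTERS.
    "dense (fused -> classes)": dense_params(2 * HIDDEN + FILTERS, CLASSES),
}
total = sum(layers.values())
for name, count in layers.items():
    print(f"{name:26s}{count:>8d}")
print(f"{'total':26s}{total:>8d}")
```

Under these assumed sizes the Bi-LSTM branch holds most of the parameters, because its count grows with `hidden_size * (input_size + hidden_size)` in each of two directions, while the convolution grows only with `filters * kernel_size * in_channels`.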

