
Computer Engineering, 2023, Vol. 49, Issue (5): 173-180. doi: 10.19678/j.issn.1000-3428.0065478

• Cyberspace Security •

Research on Abnormal Traffic Detection in Industrial Control Network Based on CVAE-CatBoost

ZHANG Zixuan1,2, ZONG Xuejun1,2, HE Kan1,2, LIAN Lian1,2   

  1. School of Information Engineering, Shenyang University of Chemical Technology, Shenyang 110142, China;
  2. Liaoning Province Petrochemical Industry Key Laboratory of Information Security, Shenyang 110142, China
  • Received: 2022-08-09 Revised: 2022-09-19 Published: 2022-10-21

基于CVAE-CatBoost的工业控制网络异常流量检测研究

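  • About the authors: ZHANG Zixuan (张子宣, born 1996), male, master's student, main research interest: industrial information security; ZONG Xuejun (宗学军, corresponding author), professor; HE Kan (何戡), associate professor, master's degree; LIAN Lian (连莲), lecturer, Ph.D.
  • Funding:
    Liaoning Province "Xingliao Talents Program" (Liaoning Revitalization Talents Program), No. XLYC2002085.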

Abstract: To address the imbalanced data distribution and low detection rates of existing models for abnormal traffic detection in Industrial Control Networks (ICN), an abnormal traffic detection model based on the Conditional Variational Autoencoder (CVAE) and the Categorical Features Gradient Boosting (CatBoost) algorithm is proposed. CVAE uses label information as a constraint to control the category of the generated samples. CatBoost overcomes gradient bias by introducing unbiased estimation, which improves prediction accuracy, and it reduces the risk of overfitting by adopting several tree growth modes. CVAE is first used for data augmentation: it expands the rare attack samples to build a balanced dataset with a uniform class distribution. CatBoost then serves as the abnormal traffic detection model, identifying attack samples such as DoS and Fuzzers and outputting the classification results. Experimental results show that on the UNSW-NB15 dataset, after data augmentation with CVAE, CatBoost improves the F1 value on minority-class samples by 25.16 percentage points on average, and the overall precision, recall, and F1 value reach 87.85%, 87.87%, and 87.86%, respectively. On the ZYELL_NCTU NetTraffic_1.0 dataset, after data augmentation with CVAE, CatBoost improves the F1 value on minority-class samples by 16.32% on average, and the overall precision, recall, and F1 value all reach 99.85%. The proposed model effectively avoids the data imbalance problem and achieves better detection performance and generalization ability than machine learning and deep learning algorithms such as K-Nearest Neighbor (KNN), Random Forest (RF), and Convolutional Neural Network (CNN).

Key words: Industrial Control Network (ICN), anomaly detection, data imbalance, Conditional Variational Autoencoder (CVAE), CatBoost algorithm
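
The two-stage workflow summarized in the abstract (generate rare-class samples until the classes are balanced, then score the detector per class with F1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `augmentation_plan`, `balance`, `toy_generator`, and `per_class_f1` are hypothetical names, the class counts are made up, and `toy_generator` is only a placeholder for sampling a trained CVAE decoder conditioned on a class label. Training the CatBoost classifier on the balanced output is left out.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = List[float]


def augmentation_plan(labels: Sequence[str]) -> Dict[str, int]:
    """Synthetic samples needed per class to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance(
    features: Sequence[Sample],
    labels: Sequence[str],
    generate: Callable[[str, int], List[Sample]],
) -> Tuple[List[Sample], List[str]]:
    """Append generated samples so every class reaches the majority count.

    `generate(label, n)` is a hypothetical stand-in for drawing n samples
    from a trained CVAE decoder conditioned on `label`.
    """
    out_x, out_y = list(features), list(labels)
    for cls, missing in augmentation_plan(labels).items():
        if missing > 0:
            out_x.extend(generate(cls, missing))
            out_y.extend([cls] * missing)
    return out_x, out_y


def toy_generator(label: str, n: int) -> List[Sample]:
    """Placeholder only: returns Gaussian noise and ignores the label.

    A real CVAE would decode latent draws together with the class label.
    """
    rng = random.Random(0)
    return [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]


def per_class_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    # Illustrative, made-up class counts; not taken from either dataset.
    labels = ["Normal"] * 6 + ["DoS"] * 3 + ["Fuzzers"] * 1
    features = [[float(i), float(i)] for i in range(len(labels))]
    balanced_x, balanced_y = balance(features, labels, toy_generator)
    print(Counter(balanced_y))
```

Reporting F1 per class rather than only overall is what makes an improvement on minority classes visible, since the overall figure is dominated by the majority class.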

