
Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 227-237. doi: 10.19678/j.issn.1000-3428.0069281

• Cyberspace Security •

Network Traffic Anomaly Detection for Data Centers in Imbalanced Datasets

WANG Guangming*, LI Dongqing, JIANG Congfeng

  1. Cloud Technology Research Center, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
  • Received: 2024-01-22 Revised: 2024-04-07 Published: 2025-08-15 Online: 2024-08-16
  • Contact: WANG Guangming
  • Funding:
    National Natural Science Foundation of China, General Program (61972118)

Abstract:

As critical infrastructure of the information age, data centers host a wide range of key information services, which makes them a primary target of network attacks. To improve network security, this study proposes an anomaly detection method for data center network traffic. The work covers feature selection, classification on imbalanced datasets, and abnormal traffic detection. First, a classification method for imbalanced datasets is proposed, in which ensemble-based feature selection and a mixed sampling algorithm improve classification performance. Second, traffic anomaly detection methods based on Random Forest (RF) and Light Gradient Boosting Machine (LightGBM) are introduced to exploit their strengths in handling imbalanced data and resisting noise. The method is validated on the public CSE-CIC-IDS2018 dataset. The results show that the proposed method achieves high precision and recall: of the 15 traffic types, 9 are classified with a precision above 90%, and 13 with a precision above 74%. The study contributes to improving data center security, safeguarding service quality, and advancing network traffic anomaly detection. It provides an effective means of addressing escalating network threats and supports the stable operation of data centers and the reliability of information services.

Key words: data center, network traffic, anomaly detection, imbalanced dataset, ensemble learning
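
The abstract names a mixed sampling step and reports precision per traffic type, but it does not spell out either procedure. The following is a minimal sketch, not the authors' algorithm: it assumes plain random undersampling of large classes and random oversampling (with replacement) of small ones, and it computes per-class precision and recall from scratch. The function names `hybrid_resample` and `per_class_precision_recall`, the `target` parameter, and the toy labels are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def hybrid_resample(samples, labels, target, seed=0):
    """Bring every class to exactly `target` samples.

    Assumption (not from the paper): classes larger than `target` are
    randomly undersampled without replacement; smaller classes are
    randomly oversampled with replacement.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) > target:
            chosen = rng.sample(xs, target)  # undersample majority class
        else:
            extra = [rng.choice(xs) for _ in range(target - len(xs))]
            chosen = xs + extra  # oversample minority class
        out_x.extend(chosen)
        out_y.extend([y] * target)
    return out_x, out_y


def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} for a multi-class prediction."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    report = {}
    for c in set(y_true) | set(y_pred):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        report[c] = (precision, recall)
    return report


# Toy imbalanced data: 95 samples of one class, 5 of another (illustrative only).
toy_labels = ["Benign"] * 95 + ["Bot"] * 5
toy_samples = list(range(100))
bal_x, bal_y = hybrid_resample(toy_samples, toy_labels, target=20)

# Toy predictions to show why per-class figures matter on imbalanced data.
toy_report = per_class_precision_recall(
    ["Benign", "Benign", "Bot", "Bot"],
    ["Benign", "Bot", "Bot", "Bot"],
)
```

Reporting one precision value per traffic type, as the abstract does, keeps a rare attack class from being hidden behind the overall accuracy on the dominant class. In practice the resampled data would be passed to an RF or LightGBM classifier; that step is omitted here to keep the sketch dependency-free.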