
Computer Engineering ›› 2026, Vol. 52 ›› Issue (6): 53-67. doi: 10.19678/j.issn.1000-3428.0252301

• Frontier Perspectives and Reviews •

Review of Traffic Analysis Methods Based on Machine Learning and Pre-trained Model

LI Xuexiang, ZHENG Yongli, ZHANG Yize, DUAN Pengsong*

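  1. School of Cyber Space Security, Zhengzhou University, Zhengzhou 450002, Henan, China
  • Received: 2025-04-08 Revised: 2025-07-19 Publication date: 2026-06-15 Release date: 2025-08-27
  • Corresponding author: DUAN Pengsong
  • About the authors: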

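    LI Xuexiang, male, professor; main research interests: cloud computing and Internet of Things security

    ZHENG Yongli, master's student

    ZHANG Yize, master's student

    DUAN Pengsong (CCF professional member, corresponding author), associate professor, Ph.D.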

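  • Funding:
    Zhengzhou Collaborative Innovation Major Project (20XTZX06013); Strategic Consulting Research Project of the Henan Research Institute of China Engineering Science and Technology Development Strategy (2022HENYB03); Henan Province Science and Technology Research Project (232102210050); Henan Province Science and Technology Research Project (242102210060)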

Review of Traffic Analysis Methods Based on Machine Learning and Pre-trained Model

LI Xuexiang, ZHENG Yongli, ZHANG Yize, DUAN Pengsong*

  1. School of Cyber Space Security, Zhengzhou University, Zhengzhou 450002, Henan, China
  • Received: 2025-04-08 Revised: 2025-07-19 Online: 2026-06-15 Published: 2025-08-27
  • Contact: DUAN Pengsong

Abstract (translated from the Chinese):

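As the Internet becomes ubiquitous and applications diversify, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper surveys network traffic analysis methods based on Machine Learning (ML) and on pre-trained models, aiming to advance research in the field through multidimensional comparison and analysis. First, it walks through the complete traffic classification workflow, covering data acquisition, preprocessing, and feature extraction, and analyzes the practical value of data balancing techniques. It also introduces the data format, scale, and scenario suitability of mainstream public datasets, compares them from multiple perspectives, and points out their problems with data distribution, feature redundancy, and timeliness. Next, at the method level, it summarizes the limitations of traditional algorithms in processing high-dimensional data and in real-time performance, and, through a focused comparison of experimental results, it summarizes the trend of applying pre-trained model techniques to traffic analysis, including breakthroughs in traffic classification achieved by Transformer-based pre-trained models, models fused with Deep Learning (DL), and lightweight models. Finally, in light of evolving research trends, it discusses the opportunities and challenges of applying pre-trained models in the future, analyzes their limitations in computational cost and privacy protection, proposes future research directions, and offers an outlook on research prospects.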

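Keywords (translated from the Chinese): traffic analysis, machine learning, deep learning, pre-trained model, feature extraction, federated learning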

Abstract:

With the popularization of the Internet and the diversification of applications, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper presents an overview of network traffic analysis methods based on Machine Learning (ML) and on pre-trained models, with the aim of promoting further research in this field through multidimensional comparison and analysis. First, the complete traffic classification pipeline is described, covering data acquisition, preprocessing, and feature extraction, and the practical value of data balancing techniques is examined. The data format, scale, and scenario suitability of mainstream public datasets are introduced and compared from multiple perspectives, highlighting their problems with data distribution, feature redundancy, and timeliness. Second, the limitations of traditional algorithms in handling high-dimensional data and meeting real-time requirements are summarized, and the trend of applying pre-trained models to traffic analysis is outlined through a focused comparative analysis of experimental results. This trend includes breakthroughs in traffic classification achieved by Transformer-based pre-trained models, by models that fuse pre-training with Deep Learning (DL), and by lightweight pre-trained models. Finally, in light of evolving research trends, the opportunities and challenges of future applications of pre-trained models are discussed, their limitations in computational cost and privacy protection are analyzed, and future research directions and prospects are outlined.

Key words: traffic analysis, Machine Learning (ML), Deep Learning (DL), pre-trained model, feature extraction, Federated Learning (FL)