Computer Engineering

Review of Traffic Analysis Methods Based on Machine Learning and Pre-trained Model

  • Published: 2025-08-27

Abstract: With the spread of the Internet and the diversification of applications, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper surveys network traffic analysis methods based on machine learning and on pre-trained models, aiming to advance research in the field through multi-dimensional comparison and analysis. First, the complete traffic classification workflow is described, covering data acquisition, preprocessing, and feature extraction, and the practical value of data balancing techniques is examined. The data formats, scale, and scenario suitability of mainstream public datasets are introduced and compared from multiple perspectives, and their problems with data distribution, feature redundancy, and timeliness are pointed out. Second, at the methodological level, the limitations of traditional algorithms in high-dimensional data processing and real-time performance are summarized; in addition, through comparative analysis of experimental results, trends in applying pre-trained models to traffic analysis are summarized, including breakthroughs in traffic classification achieved by the Transformer-based pre-trained model BERT, by models that fuse pre-trained models with deep learning, and by lightweight models. Finally, in light of current research trends, the opportunities and challenges of applying pre-trained models in the future are discussed, their limitations in computational cost and privacy protection are analyzed, and future research directions and prospects are outlined.