LI Xuexiang, ZHENG Yongli, ZHANG Yize, DUAN Pengsong
Accepted: 2025-08-27
With the spread of the Internet and the diversification of applications, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper surveys network traffic analysis methods based on machine learning and on pre-trained models, aiming to promote further research in the field through multi-dimensional comparison and analysis. First, the complete traffic classification workflow is analyzed, covering data acquisition, preprocessing, and feature extraction, and the practical value of data balancing techniques is examined. Mainstream public datasets are introduced and compared in terms of data format, scale, and scenario suitability, and their problems with data distribution, feature redundancy, and timeliness are pointed out. Second, at the methodological level, the limitations of traditional algorithms in high-dimensional data processing and real-time performance are summarized, and, through comparative analysis of experimental results, the trends in applying pre-trained models to traffic analysis are identified, including breakthroughs achieved by the Transformer-based pre-trained model BERT, by fusion models that combine large models with deep learning, and by lightweight large models in traffic classification. Finally, in light of current research trends, the opportunities and challenges facing future applications of pre-trained models are discussed, their limitations regarding computational cost and privacy protection are analyzed, and future research directions are proposed.