Review of Traffic Analysis Methods Based on Machine Learning and Pre-trained Model

doi:10.19678/j.issn.1000-3428.0252301

Abstract

Abstract: With the spread of the Internet and the diversification of applications, fine-grained classification of massive network traffic has become key to optimizing quality of service and analyzing user behavior patterns. This paper surveys network traffic analysis methods based on machine learning and on pre-trained models, aiming to advance research in the field through multi-dimensional comparison and analysis. First, it walks through the complete traffic classification workflow, covering data acquisition, preprocessing, and feature extraction, and examines the practical value of data balancing techniques. It then introduces the data format, scale, and scenario suitability of mainstream public datasets, compares them from multiple perspectives, and points out their problems with data distribution, feature redundancy, and timeliness. Second, at the method level, it summarizes the limitations of traditional algorithms in high-dimensional data processing and real-time performance. Through a comparative analysis of experimental results, it also summarizes the trend toward applying pre-trained models in traffic analysis, including breakthroughs in traffic classification achieved by the Transformer-based pre-trained model BERT, by models that fuse pre-trained models with deep learning, and by lightweight models. Finally, in light of current research trends, it discusses the opportunities and challenges of applying pre-trained models in the future, analyzes their limitations in computational cost and privacy protection, and proposes future research directions together with an outlook on research prospects.

LI Xuexiang (李学相), ZHENG Yongli (郑永利), ZHANG Yize (张怡泽), DUAN Pengsong (段鹏松). Review of Traffic Analysis Methods Based on Machine Learning and Pre-trained Model[J]. Computer Engineering (计算机工程), doi: 10.19678/j.issn.1000-3428.0252301.

