Malicious Encrypted Traffic Classification in IoT Based on Multi-Representation Fusion

doi:10.19678/j.issn.1000-3428.0252127

Abstract

Models for malicious encrypted traffic classification enrich the discriminative representations they learn by increasing the dimensionality of traffic features. Three problems remain, however: the chosen models are often poorly matched to the characteristics of malicious encrypted traffic data, feature selection is insufficient, and the characteristics of encrypted traffic data are rarely examined. To address these problems, a classification model based on multi-representation fusion is proposed for malicious encrypted traffic classification in the Internet of Things (IoT). An abstract representation learning model learns packet-level byte association representations and session statistical representations of traffic sessions, while a plaintext representation learning model learns session connection representations from the unencrypted plaintext. The classification results of the two models are then fused according to the confidence score that the abstract representation learning model assigns to its own classification result, yielding the final malicious traffic classification. To evaluate the model, its performance is compared with that of seven baseline models built on different approaches; it achieves an F1 score of 0.7694, substantially higher than every baseline. In addition, to examine how well each module suits traffic representation learning and how complementary the discriminative representations in the selected features are, 10 variant models with different inputs and architectures are constructed and compared. The results show that the proposed model achieves better detection performance than the variants, demonstrating that the model architecture fits the task and that the representations complement one another.
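The confidence-based fusion step can be illustrated with a minimal sketch. The abstract states only that the two classification results are fused according to the confidence score of the abstract representation learning model; it does not give the exact rule. The gating rule below (trust the abstract model when its top class probability reaches a cut-off, otherwise fall back to the plaintext model), the function name `fuse_predictions`, and the default `threshold` of 0.9 are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def fuse_predictions(
    abstract_probs: Sequence[float],
    plaintext_probs: Sequence[float],
    threshold: float = 0.9,  # hypothetical cut-off, not taken from the paper
) -> int:
    """Pick a class label from two per-class probability vectors.

    Assumed rule: the confidence score is the abstract model's highest
    class probability. If it reaches the threshold, the abstract model's
    prediction is used; otherwise the plaintext model's prediction is used.
    """
    if len(abstract_probs) != len(plaintext_probs):
        raise ValueError("both models must score the same set of classes")
    if not abstract_probs:
        raise ValueError("probability vectors must not be empty")

    confidence = max(abstract_probs)
    if confidence >= threshold:
        return argmax(abstract_probs)
    return argmax(plaintext_probs)


# Confident abstract model: its own prediction (class 0) is kept.
confident_label = fuse_predictions([0.95, 0.03, 0.02], [0.10, 0.80, 0.10])

# Uncertain abstract model: the plaintext model's prediction (class 1) is used.
fallback_label = fuse_predictions([0.50, 0.40, 0.10], [0.10, 0.80, 0.10])
```

A hard gate is only one possible reading; a weighted combination of the two probability vectors, with the weight derived from the confidence score, would also fit the description in the abstract.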

WANG Guanyu, GU Yijun. Malicious Encrypted Traffic Classification in IoT Based on Multi-Representation Fusion[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252127.

