Transport Layer Security-Encrypted Abnormal Traffic Detection Based on Supervised Autoencoder

doi:10.19678/j.issn.1000-3428.0070021

Abstract:

As users grow more aware of privacy protection, more websites and services adopt the Transport Layer Security (TLS) protocol to safeguard user data, so TLS-encrypted traffic accounts for a steadily rising share of network traffic. However, most current abnormal traffic detection methods are general-purpose models that target all traffic or all encrypted traffic, and few methods focus specifically on TLS-encrypted traffic. This study therefore proposes a supervised autoencoder-based method for detecting abnormal TLS-encrypted traffic. At its core, the method trains a supervised autoencoder that takes network traffic as input and generates reconstructed traffic with the same dimensionality as the input. Normal traffic is required to be highly similar to its reconstruction, whereas abnormal traffic is required to be highly dissimilar to its reconstruction. To meet these requirements, a reconstruction loss function is designed to optimize the internal parameters of the autoencoder in a supervised manner. During detection, the cosine similarity between the input traffic and its reconstruction determines whether the input is abnormal. In addition, a dataset dedicated to TLS-encrypted abnormal traffic detection is constructed by integrating relevant data. Experiments on this dataset show that the proposed method reaches an accuracy of 99.52% on the binary classification task of TLS-encrypted abnormal traffic detection, outperforming the compared models, and several visualization strategies further illustrate its effectiveness.

Key words: TLS encryption, autoencoder, abnormal traffic detection, reconstruction loss, visual analysis
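
The detection rule described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the form of the reconstruction loss or the decision threshold, so the loss below (one minus similarity for normal flows, clipped similarity for abnormal flows), the function names, and the default threshold of 0.5 are all assumptions chosen only to match the stated goal of high similarity for normal traffic and low similarity for abnormal traffic.

```python
import numpy as np


def cosine_similarity(x, x_hat, eps=1e-12):
    """Row-wise cosine similarity between flows x and their reconstructions x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    num = np.sum(x * x_hat, axis=1)
    den = np.linalg.norm(x, axis=1) * np.linalg.norm(x_hat, axis=1)
    # Guard against division by zero for all-zero feature vectors.
    return num / np.maximum(den, eps)


def supervised_reconstruction_loss(x, x_hat, is_abnormal):
    """Hypothetical supervised loss (an assumption, not the paper's formula).

    Normal flows are penalised by (1 - similarity), pushing similarity toward 1.
    Abnormal flows are penalised by max(similarity, 0), pushing it toward 0 or below.
    """
    sim = cosine_similarity(x, x_hat)
    is_abnormal = np.asarray(is_abnormal, dtype=bool)
    per_flow = np.where(is_abnormal, np.maximum(sim, 0.0), 1.0 - sim)
    return float(np.mean(per_flow))


def detect_abnormal(x, x_hat, threshold=0.5):
    """Flag a flow as abnormal when its similarity falls below the threshold.

    The threshold value is an assumed placeholder; in practice it would be
    tuned on validation data.
    """
    return cosine_similarity(x, x_hat) < threshold


if __name__ == "__main__":
    # Toy data: the first flow is reconstructed exactly, the second is not.
    flows = np.array([[1.0, 0.0], [0.0, 1.0]])
    reconstructions = np.array([[1.0, 0.0], [1.0, 0.0]])
    print(cosine_similarity(flows, reconstructions))
    print(detect_abnormal(flows, reconstructions))
```

In a real pipeline, `x_hat` would come from the trained autoencoder's forward pass, and the loss would be written in a framework with automatic differentiation so that it can drive the parameter updates.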

YANG Mingfen, GAN Yun, ZHANG Xingpeng. Transport Layer Security-Encrypted Abnormal Traffic Detection Based on Supervised Autoencoder[J]. Computer Engineering, 2025, 51(9): 192-200.

https://www.ecice06.com/CN/Y2025/V51/I9/192

Figures/Tables: 11

Fig.1 TLS-encrypted abnormal traffic detection process based on supervised autoencoder

Fig.2 Dataset processing flow

Fig.3 Confusion matrix of experimental results

Fig.4 ROC results of different methods

Fig.5 Trend of loss values during the training phase

Fig.6 Histogram of the similarity distribution between original and reconstructed traffic before and after training

Fig.7 t-SNE distribution of normal traffic and reconstructed traffic before and after training

Fig.8 t-SNE distribution of abnormal traffic and reconstructed traffic before and after training

