Tor Traffic Analysis Model Based on Data Enhancement and Stream Data Processing

doi:10.19678/j.issn.1000-3428.0064386

Abstract

Tor traffic identification provides technical support for combating criminal activity on the dark web that relies on the Tor anonymous communication tool. However, current approaches face several challenges: data are difficult to collect, datasets are unbalanced, and Tor analysis models detect and adapt to concept drift poorly. To address these problems, a Tor traffic analysis model combining a Stacked Denoising Auto-Encoder (SDAE) and an Online Sequential Extreme Learning Machine (OS-ELM) is proposed. First, the collected raw Tor PCAP traffic is segmented, denoised, and processed into byte sequences. Then, the one-dimensional sequences are transformed into grayscale images and input to an improved multi-size Deep Convolutional Generative Adversarial Network (DCGAN), which generates Tor traffic samples to balance the dataset. Finally, the SDAE reduces the dimensionality of the sequences, and the extracted features are input to the OS-ELM to identify Tor traffic online. Experimental results show that the improved DCGAN improves dataset quality and raises the model's recognition rate by about 2.8 percentage points. The traffic analysis model combining OS-ELM and SDAE reaches an accuracy of 95.7%, and its recognition efficiency is considerably higher than that of traditional Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) network models.
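As a rough illustration of two stages named in the abstract, the sketch below reshapes a byte sequence into a grayscale image and then trains an OS-ELM classifier one sample at a time. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `bytes_to_gray_image`, `image_to_features`, and `OSELM`, the image side length, the sigmoid activation, the ridge-style initialisation, and the chunk size of 1 are all hypothetical choices, and the DCGAN and SDAE stages are omitted entirely.

```python
import math
import random


def bytes_to_gray_image(payload: bytes, side: int = 28) -> list:
    """Truncate or zero-pad a byte sequence, then reshape it into a
    side x side grid of 0-255 pixel values (a grayscale image)."""
    n = side * side
    flat = list(payload[:n]) + [0] * max(0, n - len(payload))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def image_to_features(image: list) -> list:
    """Flatten an image and scale pixels to [0, 1] (assumed preprocessing)."""
    return [pixel / 255.0 for row in image for pixel in row]


class OSELM:
    """Online sequential ELM updated one sample at a time (chunk size 1)."""

    def __init__(self, n_inputs, n_hidden, n_classes, reg=1e-2, seed=0):
        rng = random.Random(seed)
        # Hidden-layer weights and biases are random and never trained.
        self.W = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        # Assumed initialisation: P = (reg * I)^-1 and beta = 0.
        self.P = [[1.0 / reg if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]
        self.beta = [[0.0] * n_classes for _ in range(n_hidden)]
        self.n_classes = n_classes

    def _hidden(self, x):
        # Sigmoid hidden activations; inputs are expected in [0, 1].
        return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                for row, b in zip(self.W, self.b)]

    def _scores(self, h):
        return [sum(h[i] * self.beta[i][c] for i in range(len(h)))
                for c in range(self.n_classes)]

    def partial_fit(self, x, label):
        h = self._hidden(x)
        n = len(h)
        Ph = [sum(p * v for p, v in zip(row, h)) for row in self.P]  # P h
        denom = 1.0 + sum(a * v for a, v in zip(h, Ph))
        # Sherman-Morrison step: P <- P - (P h)(P h)^T / (1 + h^T P h).
        for i in range(n):
            for j in range(n):
                self.P[i][j] -= Ph[i] * Ph[j] / denom
        # beta <- beta + P_new h (t - h^T beta), where P_new h = P h / denom.
        scores = self._scores(h)
        for c in range(self.n_classes):
            err = (1.0 if c == label else 0.0) - scores[c]
            for i in range(n):
                self.beta[i][c] += Ph[i] / denom * err

    def predict(self, x):
        scores = self._scores(self._hidden(x))
        return max(range(self.n_classes), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy stream: two made-up 4-byte "flows" with different byte patterns.
    model = OSELM(n_inputs=4, n_hidden=20, n_classes=2)
    flow_a = image_to_features(bytes_to_gray_image(bytes([250, 250, 5, 5]), side=2))
    flow_b = image_to_features(bytes_to_gray_image(bytes([5, 5, 250, 250]), side=2))
    for _ in range(100):
        model.partial_fit(flow_a, 0)
        model.partial_fit(flow_b, 1)
    print(model.predict(flow_a), model.predict(flow_b))
```

Because each update touches only the running matrix `P` and the output weights `beta`, the classifier never revisits earlier samples, which is the property that makes this family of models attractive for streaming traffic. In a pipeline like the one described, the input to `partial_fit` would presumably be the SDAE-reduced features rather than raw pixels.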

Key words: onion router, concept drift, stream data mining, data enhancement, Deep Convolutional Generative Adversarial Network (DCGAN), Stacked Denoising Auto-Encoder (SDAE), Online Sequential Extreme Learning Machine (OS-ELM)


CLC Number: TP309

XI Rongkang, CAI Manchun, LU Tianliang. Tor Traffic Analysis Model Based on Data Enhancement and Stream Data Processing[J]. Computer Engineering, 2023, 49(3): 177-184.



URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0064386

http://www.ecice06.com/EN/Y2023/V49/I3/177


