
Computer Engineering, 2021, Vol. 47, Issue (11): 129-135, 143. doi: 10.19678/j.issn.1000-3428.0059492

• Cyberspace Security •

Tor Traffic Online Identification Method Based on Artificial Bee Colony Algorithm

LIANG Xiaomeng (1), YAN Ming (1, 2), WU Jie (1, 2)

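  1. School of Computer Science, Fudan University, Shanghai 200433, China;
  2. Engineering Research Center of Cyber Security Auditing and Monitoring, Ministry of Education, Shanghai 200433, China
  • Received: 2020-09-10 Revised: 2020-11-17 Published: 2020-12-04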
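  • About the authors: LIANG Xiaomeng (born 1996), female, master's student, main research interest: information security; YAN Ming (corresponding author), engineer, M.S.; WU Jie, research fellow, Ph.D.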
  • Funding:
    National Key Research and Development Program of China (2017YFB0803203).

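Abstract: Classifying and identifying Tor and other anonymous traffic is important for network operators supervising network security. However, existing Tor traffic classification and detection techniques commonly suffer from low identification accuracy, poor real-time performance, and an inability to handle high-dimensional data effectively. To address these problems, an online Tor traffic identification method is proposed. The method builds a deep neural network based on logistic regression that extracts the matching degree of Tor traffic features to achieve feature enhancement, and it uses an artificial bee colony mechanism in place of common iterative algorithms such as gradient descent to obtain the traffic classification and identification results. On this basis, a real-time traffic detection tool is built and applied in a real production environment. Experimental results on a public Tor dataset show that, compared with the logistic regression, random forest, and KNN algorithms, the proposed algorithm improves precision and recall by 10% to 50%, and compared with the gradient-descent iterative algorithm, it improves accuracy by 7% to 8%.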

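Key words: Tor traffic identification, network traffic classification, feature extraction, network traffic analysis, deep learning, artificial bee colony algorithm, logistic regression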
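The abstract does not spell out the training procedure. The snippet below is a minimal sketch of its central idea, replacing gradient descent with an artificial bee colony (ABC) search over classifier weights. It assumes the standard ABC scheme of employed, onlooker, and scout bees and fits a plain logistic-regression classifier. It does not reproduce the paper's deep network or its feature-matching step. The function names (`abc_train`, `log_loss`, `make_toy_data`), the synthetic two-feature data, and every hyperparameter (`n_food`, `limit`, `max_cycles`, `bound`) are hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w, X, y):
    """Mean binary cross-entropy; w holds one weight per feature plus a trailing bias."""
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total / len(X)


def abc_train(X, y, n_food=20, limit=30, max_cycles=150, bound=5.0, seed=0):
    """Hypothetical helper: search logistic-regression weights with ABC (no gradients)."""
    rng = random.Random(seed)
    dim = len(X[0]) + 1  # feature weights + bias

    def random_source():
        return [rng.uniform(-bound, bound) for _ in range(dim)]

    foods = [random_source() for _ in range(n_food)]
    costs = [log_loss(f, X, y) for f in foods]
    trials = [0] * n_food

    def try_neighbour(i):
        # Perturb one coordinate: v_ij = x_ij + phi * (x_ij - x_kj), phi ~ U(-1, 1), k != i.
        k = rng.choice([m for m in range(n_food) if m != i])
        j = rng.randrange(dim)
        cand = foods[i][:]
        cand[j] += rng.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
        cand[j] = max(-bound, min(bound, cand[j]))
        c = log_loss(cand, X, y)
        if c < costs[i]:  # greedy selection keeps the better source
            foods[i], costs[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    best_i = min(range(n_food), key=lambda i: costs[i])
    best, best_cost = foods[best_i][:], costs[best_i]
    for _ in range(max_cycles):
        for i in range(n_food):  # employed bees: one local move per source
            try_neighbour(i)
        fits = [1.0 / (1.0 + c) for c in costs]  # log loss >= 0, so fitness is in (0, 1]
        for _ in range(n_food):  # onlooker bees: pick sources in proportion to fitness
            try_neighbour(rng.choices(range(n_food), weights=fits)[0])
        for i in range(n_food):  # remember the best source found so far
            if costs[i] < best_cost:
                best, best_cost = foods[i][:], costs[i]
        for i in range(n_food):  # scout bees: abandon sources that stopped improving
            if trials[i] > limit:
                foods[i] = random_source()
                costs[i] = log_loss(foods[i], X, y)
                trials[i] = 0
    return best, best_cost


def predict(w, x):
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5 else 0


def make_toy_data(n=80, seed=42):
    """Hypothetical stand-in for flow features: label is 1 when x1 + x2 > 0."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    y = [1 if a + b > 0 else 0 for a, b in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    w, cost = abc_train(X, y)
    acc = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(X)
    print(f"log loss = {cost:.3f}, training accuracy = {acc:.2f}")
```

On this separable toy set the search should drive the log loss well below log 2, which is the loss of all-zero weights. The sketch shows the design property that makes the swap possible: ABC needs only loss evaluations, not gradients. Whether this matches the paper's reasons for choosing ABC is not stated in the abstract.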
