
Computer Engineering ›› 2022, Vol. 48 ›› Issue (4): 148-157,164. doi: 10.19678/j.issn.1000-3428.0061003

• Cyberspace Security •

基于CNN-SIndRNN的恶意TLS流量快速识别方法

李小剑1, 谢晓尧1,2, 徐洋2, 张思聪2   

  1. 贵州师范大学 数学科学学院, 贵阳 550001;
  2. 贵州师范大学 贵州省信息与计算科学重点实验室, 贵阳 550001
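  • Received: 2020-03-04  Revised: 2020-04-15  Published: 2021-04-15
  • About the authors: LI Xiaojian (b. 1981), male, Ph.D. candidate, whose main research interests are cyberspace security and deep learning; XIE Xiaoyao, professor, Ph.D., doctoral supervisor; XU Yang, professor, Ph.D.; ZHANG Sicong, Ph.D.
  • Funding:
    Central Government Guided Local Science and Technology Development Special Fund (黔科中引地[2018]4008); Guizhou Provincial Science and Technology Program (黔科合支撑[2020]2Y013); Guizhou Provincial Graduate Education Innovation Program (黔教合YJSCXJH[2019]043).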

Fast Identification Method of Malicious TLS Traffic Based on CNN-SIndRNN

LI Xiaojian1, XIE Xiaoyao1,2, XU Yang2, ZHANG Sicong2   

  1. School of Mathematical Science, Guizhou Normal University, Guiyang 550001, China;
  2. Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guiyang 550001, China
  • Received: 2020-03-04  Revised: 2020-04-15  Published: 2021-04-15

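Abstract (translated from Chinese): Traditional shallow machine learning methods for identifying malicious TLS traffic depend on expert experience and represent traffic inadequately, while existing deep neural network detection models take too long to train because of their complex layered structure. This paper proposes a lightweight, end-to-end method for identifying malicious encrypted traffic based on CNN-SIndRNN. A multi-layer one-dimensional convolutional neural network extracts local pattern features from the traffic byte sequence, and global max pooling reduces dimensionality to cut the number of parameters. To strengthen the traffic representation, an improved recurrent neural network is designed to capture long-range dependencies among traffic bytes. On this basis, Independently Recurrent Neural Network (IndRNN) units replace the conventional RNN units, and a sliced parallel computing structure replaces the serial computing structure of a conventional RNN. The features extracted by the two types of deep neural network are concatenated to form the representation of malicious TLS traffic. Experiments on the public CTU-Malware-Capture dataset show that the method achieves an F1 score of 0.9657 in binary classification and an overall accuracy of 0.8489 in multi-class classification, and that it reduces training time and detection time by 98.47% and 98.28%, respectively, compared with the BotCatcher model.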

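Key words (translated from Chinese): malicious TLS traffic, independently recurrent neural network, sliced recurrent neural network, one-dimensional convolution, global pooling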

Abstract: Traditional shallow machine learning methods for identifying malicious TLS traffic rely heavily on expert experience and represent traffic poorly. In addition, existing deep neural network detection models are time-consuming to train because of their deep hierarchical structure. To address these problems, a lightweight end-to-end method for malicious encrypted traffic detection is proposed based on CNN-SIndRNN. The method employs a multi-layer one-dimensional convolutional neural network to extract local pattern features from a traffic byte sequence, and uses global max pooling to reduce dimensionality and the number of parameters. At the same time, to enhance traffic representation, an improved recurrent neural network is designed in parallel to capture long-distance dependencies among traffic bytes. On this basis, the Independently Recurrent Neural Network (IndRNN) unit replaces the traditional Recurrent Neural Network (RNN) unit, and a sliced parallel computing structure replaces the serial computing structure of the traditional RNN. The features extracted by the two types of deep neural network are then concatenated to represent malicious TLS traffic. The effectiveness of the proposed method is verified on two open datasets. The experimental results show that the method achieves an F1 score of 0.9657 in the binary classification experiment and an overall accuracy of 84.89% in the multi-class classification experiment. Compared with the BotCatcher model, the CNN-SIndRNN model improves classification performance while reducing training time by 98.47% and test time by 98.28%.

Key words: malicious TLS traffic, independently recurrent neural network, sliced recurrent neural network, one-dimensional convolutional neural network, global pooling
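
The two feature branches described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the layer sizes, the random weights, the 64-byte input, the two-level slicing, and all function names (`indrnn_last_state`, `sliced_indrnn`, `conv1d_global_max`) are assumptions chosen for illustration. The only parts taken from the literature are the IndRNN recurrence h_t = ReLU(W x_t + u ⊙ h_{t-1} + b), in which the recurrent weight u is a per-unit vector rather than a matrix, and the sliced-RNN idea of running short independent sub-sequences whose final states feed a higher-level recurrence.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def indrnn_last_state(seq, W, u, b):
    """IndRNN recurrence h_t = relu(W x_t + u * h_{t-1} + b); returns the final state.
    u is a vector: each hidden unit only sees its own previous value."""
    h = [0.0] * len(u)
    for x in seq:
        wx = matvec(W, x)
        h = relu([wx[i] + u[i] * h[i] + b[i] for i in range(len(u))])
    return h

def sliced_indrnn(seq, n_slices, params_low, params_high):
    """Two-level sliced recurrence (assumed layout). The slices do not depend on
    each other, so the low-level runs could execute in parallel.
    Trailing items are dropped if len(seq) is not divisible by n_slices."""
    size = len(seq) // n_slices
    slices = [seq[i * size:(i + 1) * size] for i in range(n_slices)]
    slice_states = [indrnn_last_state(s, *params_low) for s in slices]
    return indrnn_last_state(slice_states, *params_high)

def conv1d_global_max(seq, kernels):
    """Valid 1-D convolution over a scalar sequence with ReLU,
    followed by global max pooling: one value per kernel."""
    feats = []
    for k in kernels:
        width = len(k)
        acts = [max(0.0, sum(k[j] * seq[t + j] for j in range(width)))
                for t in range(len(seq) - width + 1)]
        feats.append(max(acts))
    return feats

def make_params(rng, hidden, in_dim):
    """Random (W, u, b) for one IndRNN layer; hypothetical initialisation."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    u = [rng.uniform(0.0, 1.0) for _ in range(hidden)]
    b = [0.0] * hidden
    return W, u, b

rng = random.Random(0)
flow_bytes = [rng.randrange(256) for _ in range(64)]  # stand-in for the leading bytes of a flow
x = [v / 255.0 for v in flow_bytes]                   # scale bytes to [0, 1]

kernels = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
cnn_feat = conv1d_global_max(x, kernels)              # local-pattern branch, 4 values
rnn_feat = sliced_indrnn([[v] for v in x], 8,
                         make_params(rng, 6, 1),      # low level: scalar byte input
                         make_params(rng, 6, 6))      # high level: slice states as input
features = cnn_feat + rnn_feat                        # concatenated flow representation
print(len(features))
```

In a trained model the concatenated vector would be passed to a classifier layer; here the weights are random, so only the shapes and the data flow are meaningful.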

CLC Number: