
Computer Engineering

   

An attention-based BERT-CNN-GRU detection method

  

  • Published: 2024-04-19


Abstract: To address the generally poor performance of existing methods on short domain names, a detection method combining BERT-CNN-GRU with an attention mechanism is proposed. First, BERT is employed to extract effective features of the domain name and the compositional logic between its characters. Deep domain-name features are then extracted by two parallel branches: a Convolutional Neural Network (CNN) fused with a simplified attention mechanism, and a Gated Recurrent Unit (GRU) network based on multi-head attention. The CNN, arranged in an n-gram fashion, extracts domain-name information at different granularities, and Batch Normalization (BN) is applied to optimize the convolution results. The GRU better captures compositional differences across the domain name, while the multi-head attention mechanism excels at capturing its internal compositional relationships. The outputs of the two parallel branches are concatenated, exploiting the strengths of both networks to the fullest, and a localized loss function is adopted to focus on the domain-name classification problem, ultimately improving classification performance. Experimental results show that the model achieves the best performance on binary classification. On the short-domain multi-class dataset, the Weighted F1-score over 15 classes reaches 86.21%, exceeding the BiLSTM-Seq-Attention model by 0.88 percentage points; on the UMUDGA dataset, the Weighted F1-score over 50 classes reaches 85.51%, an improvement of 0.45 percentage points. Moreover, the model performs strongly in detecting variant domain names and word-based DGAs, demonstrating robustness to imbalanced domain-name data distributions and a broader detection capability.
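The parallel CNN/GRU architecture described in the abstract can be sketched roughly as follows in PyTorch. This is a minimal illustration, not the authors' implementation: the BERT encoder is replaced by a plain character embedding so the example stays self-contained, and all layer widths, kernel sizes, head counts, and the 15-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNGRUDetector(nn.Module):
    """Sketch of a BERT-CNN-GRU-style classifier: parallel CNN and GRU
    branches over character features, concatenated before classification.
    All hyperparameters here are assumptions, not the paper's values."""

    def __init__(self, vocab_size=40, embed_dim=64, num_classes=15):
        super().__init__()
        # Stand-in for BERT: a simple character embedding.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # CNN branch: n-gram-style kernels (2/3/4) with Batch Normalization,
        # each followed by max-pooling over the sequence.
        self.convs = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(embed_dim, 64, kernel_size=k, padding=k // 2),
                nn.BatchNorm1d(64),
                nn.ReLU(),
                nn.AdaptiveMaxPool1d(1),
            )
            for k in (2, 3, 4)
        ])
        # GRU branch followed by multi-head self-attention.
        self.gru = nn.GRU(embed_dim, 64, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(128, num_heads=4, batch_first=True)
        # Classifier over the concatenated branch outputs (3*64 + 128).
        self.fc = nn.Linear(3 * 64 + 128, num_classes)

    def forward(self, x):
        e = self.embed(x)                                   # (B, L, E)
        # CNN branch expects (B, E, L); pool each n-gram response to one value.
        cnn_out = torch.cat(
            [c(e.transpose(1, 2)).squeeze(-1) for c in self.convs], dim=1
        )
        h, _ = self.gru(e)                                  # (B, L, 128)
        a, _ = self.attn(h, h, h)                           # self-attention
        gru_out = a.mean(dim=1)                             # pool over sequence
        # Concatenate both branches, as the paper describes, then classify.
        return self.fc(torch.cat([cnn_out, gru_out], dim=1))

# Usage: a batch of 8 domain names, each encoded as 24 character ids.
model = CNNGRUDetector()
logits = model(torch.randint(0, 40, (8, 24)))
print(logits.shape)  # torch.Size([8, 15])
```

The concatenation of the pooled CNN features with the attention-weighted GRU features mirrors the paper's fusion step; in the actual method the embedding would come from a pretrained BERT and the loss would be the localized loss function mentioned in the abstract.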
