
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 258-268. doi: 10.19678/j.issn.1000-3428.0068479

• Graphics and Image Processing •

A BERT-CNN-GRU Detection Method Based on Attention Mechanism

ZHENG Yazhou1, LIU Wanping1,*, HUANG Dong2

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
    2. Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-09-27 Online: 2025-01-15 Published: 2024-04-19
  • Contact: LIU Wanping

  • Supported by: Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0594); Action Plan for High-Quality Development of Graduate Education of Chongqing University of Technology (gzlcx20233226)

Abstract:

To address the poor detection performance of existing methods on short domain names, a detection method combining BERT-CNN-GRU with an attention mechanism is proposed. First, BERT is employed to extract the effective features of domain names and the compositional logic between their characters. Then, deep domain-name features are extracted by a parallel combination of a Convolutional Neural Network (CNN) fused with simplified attention and a Gated Recurrent Unit (GRU) based on the multi-head attention mechanism. The CNN, with filters arranged by n-gram width, extracts domain-name information at different levels, and Batch Normalization (BN) is applied to optimize the convolution results. The GRU better captures compositional differences across the character sequence, while the multi-head attention mechanism excels at capturing the internal compositional relationships of domain names. Finally, the outputs of the two parallel detection networks are concatenated to maximize the advantages of both, and a local loss function is employed to focus on the domain-name classification problem, thereby improving classification performance. Experimental results demonstrate that the model achieves optimal performance in binary classification. On the short-domain multi-classification dataset, the weighted F1 value over 15 categories reaches 86.21%, surpassing the BiLSTM-Seq-Attention model by 0.88 percentage points; on the UMUDGA dataset, the weighted F1 value over 50 categories reaches 85.51%, an improvement of 0.45 percentage points. Moreover, the model performs well in detecting variant domain names and word-based Domain Generation Algorithms (DGA), demonstrating the ability to handle imbalanced domain-name data distributions and a broader range of detection capabilities.

Key words: malicious short domain name, BERT pre-training, Batch Normalization (BN), attention mechanism, Gated Recurrent Unit (GRU), parallel Convolutional Neural Network (CNN)
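As a rough illustration only (not the authors' implementation), the parallel fusion described in the abstract can be sketched in NumPy: random weights stand in for trained parameters and for BERT's output embeddings, a single attention head replaces multi-head attention, and the toy dimensions (sequence length 10, embedding size 16, 8-dim GRU state) are arbitrary choices for this sketch. The CNN branch applies n-gram-width filters with a BN-style standardization and max-pooling; the GRU branch attention-pools its hidden states; the two feature vectors are concatenated as input to a classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def cnn_branch(x, ngrams=(2, 3, 4)):
    """One max-pooled feature per n-gram filter width, with a batch-norm-style
    standardization applied to each filter's convolution outputs."""
    T, d = x.shape
    feats = []
    for n in ngrams:
        w = rng.standard_normal(n * d) * 0.1          # random stand-in filter
        conv = np.array([x[t:t + n].ravel() @ w for t in range(T - n + 1)])
        conv = (conv - conv.mean()) / (conv.std() + 1e-5)  # BN-style normalization
        feats.append(conv.max())                           # max-pool over time
    return np.array(feats)

def gru_attention_branch(x, h_dim=8):
    """Minimal GRU run over the sequence, then scaled dot-product attention
    pooling of its hidden states (one head here, for brevity)."""
    T, d = x.shape
    Wz, Wr, Wh = (rng.standard_normal((h_dim, d + h_dim)) * 0.1 for _ in range(3))
    h, states = np.zeros(h_dim), []
    for t in range(T):
        xi = np.concatenate([x[t], h])
        z, r = sigmoid(Wz @ xi), sigmoid(Wr @ xi)          # update / reset gates
        h_tilde = np.tanh(Wh @ np.concatenate([x[t], r * h]))
        h = (1 - z) * h + z * h_tilde
        states.append(h)
    H = np.array(states)                                   # (T, h_dim)
    scores = (H @ H[-1]) / np.sqrt(h_dim)                  # query = final state
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                   # softmax attention weights
    return alpha @ H                                       # attention-pooled state

# Toy "domain name": 10 characters embedded in 16 dims (random stand-ins
# for the BERT character representations).
x = rng.standard_normal((10, 16))
fused = np.concatenate([cnn_branch(x), gru_attention_branch(x)])
print(fused.shape)  # (11,) = 3 n-gram features + 8-dim pooled GRU state
```

The concatenated vector `fused` corresponds to the abstract's fusion step: the classifier sees both the CNN's level-wise n-gram features and the GRU's attention-weighted sequential summary.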
