
Computer Engineering ›› 2025, Vol. 51 ›› Issue (1): 258-268. doi: 10.19678/j.issn.1000-3428.0068479

• Graphics and Image Processing •

A BERT-CNN-GRU Detection Method Based on Attention Mechanism

ZHENG Yazhou1, LIU Wanping1,*, HUANG Dong2

  1. College of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400054, China
    2. Key Laboratory of Advanced Manufacturing Technology of the Ministry of Education, Guizhou University, Guiyang 550025, Guizhou, China
  • Received: 2023-09-27 Online: 2025-01-15 Published: 2024-04-19
  • Contact: LIU Wanping

  • Supported by: Natural Science Foundation of Chongqing (cstc2021jcyj-msxmX0594); Action Plan for High-Quality Development of Graduate Education of Chongqing University of Technology (gzlcx20233226)

Abstract:

To address the poor detection performance of existing methods on short domain names, a detection method combining BERT-CNN-GRU with an attention mechanism is proposed. First, BERT is employed to extract the effective features of domain names and the compositional logic between their characters. Then, deep domain-name features are extracted by a parallel combination of a Convolutional Neural Network (CNN) fused with simplified attention and a Gated Recurrent Unit (GRU) based on the multi-head attention mechanism. The CNN, with filters arranged by n-gram width, extracts domain-name information at different levels, and Batch Normalization (BN) is applied to optimize the convolution results. The GRU better captures compositional differences across the character sequence, while the multi-head attention mechanism excels at capturing the internal compositional relationships of domain names. Finally, the outputs of the two parallel detection networks are concatenated to maximize the advantages of both, and a local loss function is employed to focus on the domain-name classification problem, thereby improving classification performance. Experimental results demonstrate that the model achieves optimal performance in binary classification. On the short-domain multi-classification dataset, the weighted F1 value over 15 categories reaches 86.21%, surpassing the BiLSTM-Seq-Attention model by 0.88 percentage points; on the UMUDGA dataset, the weighted F1 value over 50 categories reaches 85.51%, an improvement of 0.45 percentage points. Moreover, the model performs well in detecting variant domain names and word-based Domain Generation Algorithms (DGA), demonstrating the ability to handle imbalanced domain-name data distributions and a broader range of detection capabilities.

Key words: malicious short domain name, BERT pre-training, Batch Normalization (BN), attention mechanism, Gated Recurrent Unit (GRU), parallel Convolutional Neural Network (CNN)
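As a rough illustration only (not the authors' implementation), the parallel fusion described in the abstract can be sketched in NumPy: random weights stand in for trained parameters and for BERT's output embeddings, a single attention head replaces multi-head attention, and the toy dimensions (sequence length 10, embedding size 16, 8-dim GRU state) are arbitrary choices for this sketch. The CNN branch applies n-gram-width filters with a BN-style standardization and max-pooling; the GRU branch attention-pools its hidden states; the two feature vectors are concatenated as input to a classifier.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def cnn_branch(x, ngrams=(2, 3, 4)):
    """One max-pooled feature per n-gram filter width, with a batch-norm-style
    standardization applied to each filter's convolution outputs."""
    T, d = x.shape
    feats = []
    for n in ngrams:
        w = rng.standard_normal(n * d) * 0.1          # random stand-in filter
        conv = np.array([x[t:t + n].ravel() @ w for t in range(T - n + 1)])
        conv = (conv - conv.mean()) / (conv.std() + 1e-5)  # BN-style normalization
        feats.append(conv.max())                           # max-pool over time
    return np.array(feats)

def gru_attention_branch(x, h_dim=8):
    """Minimal GRU run over the sequence, then scaled dot-product attention
    pooling of its hidden states (one head here, for brevity)."""
    T, d = x.shape
    Wz, Wr, Wh = (rng.standard_normal((h_dim, d + h_dim)) * 0.1 for _ in range(3))
    h, states = np.zeros(h_dim), []
    for t in range(T):
        xi = np.concatenate([x[t], h])
        z, r = sigmoid(Wz @ xi), sigmoid(Wr @ xi)          # update / reset gates
        h_tilde = np.tanh(Wh @ np.concatenate([x[t], r * h]))
        h = (1 - z) * h + z * h_tilde
        states.append(h)
    H = np.array(states)                                   # (T, h_dim)
    scores = (H @ H[-1]) / np.sqrt(h_dim)                  # query = final state
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                   # softmax attention weights
    return alpha @ H                                       # attention-pooled state

# Toy "domain name": 10 characters embedded in 16 dims (random stand-ins
# for the BERT character representations).
x = rng.standard_normal((10, 16))
fused = np.concatenate([cnn_branch(x), gru_attention_branch(x)])
print(fused.shape)  # (11,) = 3 n-gram features + 8-dim pooled GRU state
```

The concatenated vector `fused` corresponds to the abstract's fusion step: the classifier sees both the CNN's level-wise n-gram features and the GRU's attention-weighted sequential summary.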
