Computer Engineering ›› 2025, Vol. 51 ›› Issue (10): 238-249. doi: 10.19678/j.issn.1000-3428.0069949

• Cyberspace Security •

Discovery of Nuisance Website Domain Name Generation Based on Domain Name Semantic Information and Similarity

YU Jie1,2, ZHAO Chunlei1,2,*, DONG Guozhong3, REN Huaishuo1,2, YOU Wei1,2

    1. Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, China
    2. Tianjin Key Laboratory of Intelligent Computing and Novel Software Technology, Tianjin 300384, China
    3. Department of Novel Network Research, Pengcheng Laboratory, Shenzhen 518055, Guangdong, China
  • Received: 2024-06-03 Revised: 2024-07-20 Online: 2025-10-15 Published: 2024-09-24
  • Contact: ZHAO Chunlei

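  • Supported by:
    Joint Funds of the National Natural Science Foundation of China (U1536122); National Natural Science Foundation of China (62272440); Major Special Project of the Tianjin Science and Technology Commission (15ZXDSGX00030); Major Key Project of Pengcheng Laboratory (PCL2023A05)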

Abstract:

Using domain name generation to discover nuisance website domains offers broad coverage, supplies large volumes of research data, and allows dissemination to be blocked and prevented in a timely manner. Existing generation models based on domain similarity make insufficient use of domain features, produce highly redundant output, and yield a low concentration of nuisance website domains among the generated names. To address these issues, this study proposes a nuisance website domain name generation and discovery model based on domain name semantic information and domain similarity. First, the model uses a Transformer encoder to extract the semantic features of domain names and uses them as feature vectors to guide generation, which improves feature utilization. Second, it improves the Sequence Generative Adversarial Network (SeqGAN) so that generation attends to the semantic features of domain names and discrimination attends to their contextual information, raising both the quality of the generated domains and the accuracy of the discriminator. Finally, the generated domains are checked through initial filtering, multi-tool rechecking, and final selection. Experimental results show that, compared with existing domain similarity-based generation models, the proposed model discovers more nuisance website domain names through generation and performs better on key metrics such as generation quality, expansion rate, and active monitoring capability.

Key words: nuisance website domain name, generation algorithm, semantic feature, Transformer encoder, attention mechanism
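
The abstract names three detection stages (initial filtering, multi-tool rechecking, final selection) without describing them in detail. The following is a minimal Python sketch of how such a staged pipeline might be organized. It is not the authors' implementation: the syntax rule, the majority-vote criterion, the `min_votes` threshold, and all function names are illustrative assumptions, and the checkers stand in for whatever external detection tools are actually used.

```python
import re
from typing import Callable, Dict, Iterable, List, Set

# Assumed syntax rule: each label is 1-63 letters, digits, or hyphens,
# and does not start or end with a hyphen.
LABEL = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")

# A checker is any callable that returns True when it flags a domain.
# Real checkers would wrap external tools; here they are hypothetical.
Checker = Callable[[str], bool]


def is_valid_domain(name: str) -> bool:
    """Return True if the name has at least two well-formed labels."""
    labels = name.split(".")
    return (
        len(name) <= 253
        and len(labels) >= 2
        and all(LABEL.match(label) for label in labels)
    )


def initial_filter(candidates: Iterable[str], known: Set[str]) -> List[str]:
    """Stage 1 (assumed): drop malformed, duplicate, and already-known names."""
    seen: Set[str] = set()
    kept: List[str] = []
    for raw in candidates:
        name = raw.strip().lower()
        if is_valid_domain(name) and name not in known and name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


def multi_tool_recheck(domains: List[str], checkers: List[Checker]) -> Dict[str, int]:
    """Stage 2 (assumed): count how many checkers flag each domain."""
    return {d: sum(1 for check in checkers if check(d)) for d in domains}


def final_selection(votes: Dict[str, int], min_votes: int) -> List[str]:
    """Stage 3 (assumed): keep domains flagged by at least min_votes checkers."""
    return [d for d, v in votes.items() if v >= min_votes]


def detect(
    candidates: Iterable[str],
    known: Set[str],
    checkers: List[Checker],
    min_votes: int = 2,
) -> List[str]:
    """Run the three stages in order on a batch of generated domains."""
    filtered = initial_filter(candidates, known)
    votes = multi_tool_recheck(filtered, checkers)
    return final_selection(votes, min_votes)
```

In this sketch, the early filter removes redundant or malformed candidates cheaply, so that the more expensive checkers run only on names that could plausibly be registered. Requiring agreement between several checkers is one simple way to reduce false positives from any single tool.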
