
Computer Engineering, 2026, Vol. 52, Issue (1): 76-85. doi: 10.19678/j.issn.1000-3428.0252752

• Service Computing in the Era of Large Models •

A Method for Domain Detection and Web Page Analysis Targeting Web3 Phishing Websites

LIU Ronglong, LI Ziwei, WAN Yue, WU Jiajing, JIANG Zigui*

  1. School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, Guangdong, China
  • Received: 2025-07-11 Revised: 2025-12-04 Online: 2026-01-15 Published: 2026-01-15
  • Corresponding author: JIANG Zigui
  • About the authors:

    LIU Ronglong (CCF student member), male, master's student; research interests: blockchain, cryptocurrency, and anti-fraud

    LI Ziwei, Ph.D. candidate

    WAN Yue, undergraduate student

    WU Jiajing, professor

    JIANG Zigui (CCF senior member, corresponding author), associate professor

  • Funding:
    National Key Research and Development Program of China (2023YFB2704703); National Natural Science Foundation of China (62372485, 623B2102, 62472457); Guangdong Basic and Applied Basic Research Foundation (2023A1515011314, 2023A1515011336); Fundamental Research Funds for the Central Universities, Sun Yat-sen University (24lgqb018)

Abstract:

Web3, the paradigm of a "decentralized next-generation Internet" built on blockchain technology, has become a highly promising emerging field in the digital-intelligence service ecosystem. However, Web3 phishing websites pose a serious threat to the health of this ecosystem. Phishers carefully craft domain names as their primary bait, luring users into visiting the sites and performing high-risk operations so that their digital assets can be stolen. Existing Web3 anti-phishing work focuses mainly on phishing account detection, phishing transaction detection, and phishing gang mining, whereas existing phishing domain detection work mainly targets traditional phishing websites and suffers from limitations such as insufficient adaptability and a lack of systematic analysis. To address this, a detection method for Web3 phishing website domain names, WPWHunter, is proposed; the real Web3 phishing websites it detects are analyzed along multiple dimensions, and the potential of Large Language Models (LLMs) for web page analysis is explored. The WPWHunter detection algorithm checks for three features in Web3 phishing domain names: inducing words, visual deception, and project name imitation. Experimental results show that WPWHunter can effectively detect suspicious Web3 phishing domains, achieving a G-means score of 0.769 on the test set, 0.048 higher than the best-performing baseline method. In addition, as a supplementary exploratory experiment, three general-purpose LLMs are used to analyze the content of Web3 phishing web pages that WPWHunter failed to detect, and the grounds on which the LLMs judge a website to be a Web3 phishing site are summarized.

Key words: Web3, cyber security, phishing website, domain detection, Large Language Model (LLM)
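
To make the three domain features and the G-means metric named in the abstract more concrete, the following is a minimal sketch and not the WPWHunter implementation. The word lists, the function names `domain_flags` and `g_mean`, the edit-distance threshold of 1, and the non-ASCII/Punycode check for visual deception are all assumptions chosen for illustration; the abstract does not specify how each feature is computed. G-means is taken here in its usual sense, the geometric mean of sensitivity and specificity.

```python
import math
import re

# Hypothetical word lists for illustration; NOT the lexicons used by WPWHunter.
LURE_WORDS = {"airdrop", "claim", "mint", "reward", "bonus"}
PROJECT_NAMES = {"uniswap", "opensea", "metamask"}


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def domain_flags(domain):
    """Return one boolean per feature; a heuristic stand-in, not the paper's algorithm."""
    lowered = domain.lower()
    labels = lowered.split(".")
    name = lowered.rsplit(".", 1)[0]                    # drop the top-level domain
    tokens = [t for t in re.split(r"[-.]", name) if t]
    return {
        # Inducing word: a lure term appears anywhere in the name.
        "inducing_word": any(w in name for w in LURE_WORDS),
        # Visual deception (assumed proxy): non-ASCII characters or a Punycode label.
        "visual_deception": (any(ord(c) > 127 for c in lowered)
                             or any(label.startswith("xn--") for label in labels)),
        # Project name imitation (assumed proxy): a near-miss spelling of a known name.
        "project_imitation": any(0 < edit_distance(t, p) <= 1
                                 for t in tokens for p in PROJECT_NAMES),
    }


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)
```

Under these assumptions, `domain_flags("unlswap-airdrop.com")` sets `inducing_word` and `project_imitation` but not `visual_deception`. G-means rewards a detector only when it performs well on both the phishing and the benign class, which is why it is a common choice for imbalanced detection tasks.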