
Computer Engineering ›› 2026, Vol. 52 ›› Issue (4): 22-38. doi: 10.19678/j.issn.1000-3428.0252743

• Frontier Perspectives and Reviews •

Research on Watermarking Attack of Deep Neural Network Models

WANG Wen, YANG Kuiwu, TONG Songsong, WEI Jianghong, XUE Yan, ZHOU Rongkui   

  1. School of Data and Target Engineering, The PLA Information Engineering University, Zhengzhou 450001, Henan, China
  • Received: 2025-07-10  Revised: 2025-10-09  Published: 2026-04-08

  • About the authors: WANG Wen (CCF student member), female, master's student; her research interests include artificial intelligence security and model watermarking. YANG Kuiwu (corresponding author), associate professor. TONG Songsong, master's student, E-mail: yangkw@aliyun.com. WEI Jianghong, lecturer and postdoctoral researcher. XUE Yan, undergraduate student. ZHOU Rongkui, Ph.D. candidate.
  • Funding:
    National Natural Science Foundation of China (62172434); Henan Province Higher Education Teaching Reform Research and Practice Project (2024SJGLX0095).

Abstract: Model intellectual property protection is an issue that cannot be ignored in model security. Watermarking technology, as the core means of model traceability, provides technical support for copyright verification by embedding special identifiers into model parameters or generated content. However, a trained watermarked model can easily be copied and redistributed, which enables attackers to destroy or remove the watermark embedded in a Deep Neural Network (DNN) model using specific techniques such as fine-tuning, pruning, or adversarial example attacks, making verification of model ownership impossible. To provide a deeper understanding of model watermarking attack methods, this study first introduces model watermarking attacks and then classifies attack methods into two categories, white-box watermarking attacks and black-box watermarking attacks, according to the attacker's access rights to and ability to obtain information about the target model. It reviews and analyzes the motivations, harms, attack principles, and concrete implementations of DNN model watermarking attacks. Moreover, it compares and summarizes existing research on model watermarking attacks in terms of attacker capabilities and performance impacts. Finally, it discusses the potential positive roles of neural network model watermarking attacks in future research and offers suggestions for in-depth study in the fields of model security and intellectual property protection.
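The pruning attack named in the abstract can be illustrated on a toy white-box scheme. The sketch below is a minimal, hypothetical simulation, not the paper's method: it directly embeds a bit string into a weight vector through a secret random projection (a simplification of training-time white-box embedding) and then measures how magnitude pruning raises the watermark's extraction bit-error rate. All names (`X`, `watermark`, `extract`, `prune`) and the margin value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.normal(size=256)            # stand-in for one layer's weights
X = rng.normal(size=(32, 256))            # secret projection matrix (the owner's key)
watermark = rng.integers(0, 2, size=32)   # 32-bit ownership message

# Direct embedding (a simplification of training-time regularization):
# shift the weights minimally so the projected logits hit a signed margin.
target = 4.0 * (2 * watermark - 1)        # +4 for bit 1, -4 for bit 0
delta, *_ = np.linalg.lstsq(X, target - X @ weights, rcond=None)
weights = weights + delta

def extract(w):
    """Recover the watermark as the sign of the secret projection."""
    return (X @ w > 0).astype(int)

assert (extract(weights) == watermark).all()   # ownership verifies on the intact model

def prune(w, p):
    """Magnitude-pruning attack: zero the fraction p of smallest-magnitude weights."""
    w = w.copy()
    w[np.abs(w) < np.quantile(np.abs(w), p)] = 0.0
    return w

for p in (0.3, 0.6, 0.9):
    ber = (extract(prune(weights, p)) != watermark).mean()
    print(f"pruned {p:.0%} of weights -> watermark bit-error rate {ber:.2f}")
```

A fine-tuning attack could be simulated analogously by perturbing `weights` with gradient updates on some downstream loss; in both cases the attack succeeds once the bit-error rate makes ownership verification fail.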

Key words: deep learning, model security, watermarking technology, Artificial Intelligence (AI) security, copyright protection

