
Computer Engineering ›› 2026, Vol. 52 ›› Issue (4): 22-38. doi: 10.19678/j.issn.1000-3428.0252743

• Frontier Perspectives and Reviews •

Research on Watermarking Attack of Deep Neural Network Models

WANG Wen, YANG Kuiwu, TONG Songsong, WEI Jianghong, XUE Yan, ZHOU Rongkui   

  1. School of Data and Target Engineering, The PLA Information Engineering University, Zhengzhou 450001, Henan, China
  • Received: 2025-07-10  Revised: 2025-10-09  Published: 2026-04-08

  • About the authors: WANG Wen (CCF student member), female, master's student; her research interests include artificial intelligence security and model watermarking. YANG Kuiwu (corresponding author), associate professor. TONG Songsong, master's student, E-mail: yangkw@aliyun.com. WEI Jianghong, lecturer and postdoctoral researcher. XUE Yan, undergraduate student. ZHOU Rongkui, Ph.D. candidate.
  • Funding:
    National Natural Science Foundation of China (62172434); Henan Province Higher Education Teaching Reform Research and Practice Project (2024SJGLX0095).

Abstract: Model intellectual property protection is an issue that cannot be ignored in model security. Watermarking technology, as the core means of model traceability, provides technical support for copyright verification by embedding special identifiers into model parameters or generated content. However, a trained watermarked model can easily be copied and redistributed, which enables attackers to destroy or remove the watermark embedded in a Deep Neural Network (DNN) model using specific techniques such as fine-tuning, pruning, or adversarial example attacks, making verification of model ownership impossible. To provide a deeper understanding of model watermarking attack methods, this study first introduces model watermarking attacks and then classifies attack methods into two categories, white-box watermarking attacks and black-box watermarking attacks, according to the attacker's access rights to and ability to obtain information about the target model. It reviews and analyzes the motivations, harms, attack principles, and concrete implementations of DNN model watermarking attacks. Moreover, it compares and summarizes existing research on model watermarking attacks in terms of attacker capabilities and performance impacts. Finally, it discusses the potential positive roles of neural network model watermarking attacks in future research and offers suggestions for in-depth study in the fields of model security and intellectual property protection.
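The pruning attack named in the abstract can be illustrated on a toy white-box scheme. The sketch below is a minimal, hypothetical simulation, not the paper's method: it directly embeds a bit string into a weight vector through a secret random projection (a simplification of training-time white-box embedding) and then measures how magnitude pruning raises the watermark's extraction bit-error rate. All names (`X`, `watermark`, `extract`, `prune`) and the margin value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

weights = rng.normal(size=256)            # stand-in for one layer's weights
X = rng.normal(size=(32, 256))            # secret projection matrix (the owner's key)
watermark = rng.integers(0, 2, size=32)   # 32-bit ownership message

# Direct embedding (a simplification of training-time regularization):
# shift the weights minimally so the projected logits hit a signed margin.
target = 4.0 * (2 * watermark - 1)        # +4 for bit 1, -4 for bit 0
delta, *_ = np.linalg.lstsq(X, target - X @ weights, rcond=None)
weights = weights + delta

def extract(w):
    """Recover the watermark as the sign of the secret projection."""
    return (X @ w > 0).astype(int)

assert (extract(weights) == watermark).all()   # ownership verifies on the intact model

def prune(w, p):
    """Magnitude-pruning attack: zero the fraction p of smallest-magnitude weights."""
    w = w.copy()
    w[np.abs(w) < np.quantile(np.abs(w), p)] = 0.0
    return w

for p in (0.3, 0.6, 0.9):
    ber = (extract(prune(weights, p)) != watermark).mean()
    print(f"pruned {p:.0%} of weights -> watermark bit-error rate {ber:.2f}")
```

A fine-tuning attack could be simulated analogously by perturbing `weights` with gradient updates on some downstream loss; in both cases the attack succeeds once the bit-error rate makes ownership verification fail.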

Key words: deep learning, model security, watermarking technology, Artificial Intelligence (AI) security, copyright protection

