Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (2): 132-139. doi: 10.19678/j.issn.1000-3428.0067063

• Cyberspace Security •

Robust Backward Model Watermarking Method Based on Backdoor

Jiaxin ZENG1, Weiming ZHANG2,*, Rong ZHANG1

  1. School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
  2. School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, Anhui, China
  • Received: 2023-03-01 Online: 2024-02-15 Published: 2023-05-25
  • Corresponding author: Weiming ZHANG
  • Supported by: Key Program of the Joint Funds of the National Natural Science Foundation of China (U20B2047)

Abstract:

Deep learning models are expensive to train but cheap to steal, which makes them easy to copy and redistribute. The copyright owner of a model can embed a watermark in it, for example through a backdoor, and later prove ownership by verifying the embedded watermark. Depending on the stage at which the watermark is embedded, model watermarking can be divided into forward and backward model watermarking: forward model watermarking embeds the watermark from the very beginning of training, whereas backward model watermarking embeds it after training on the original task has finished, which requires less computation and is more flexible. However, existing backward model watermarks are weakly robust: they cannot withstand watermark-erasure attacks such as fine-tuning and pruning. This study analyzes why backward model watermarking is less robust than forward model watermarking and, on that basis, proposes a general robust backward model watermarking method. During watermark embedding, the method introduces constraints on the model's middle-layer features and on its outputs, reducing the impact of the watermark task on the original task and thereby strengthening the robustness of the backward watermark. Experimental results on the CIFAR-10, CALTECH-101, GTSRB, and other datasets demonstrate that the method effectively improves the robustness of backward model watermarking against fine-tuning attacks; with the optimal constraint setting on CIFAR-10, the watermark verification success rate increases by an average of 24.2 percentage points over the backward model watermarking baseline. The method also improves robustness against pruning and other attacks.
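To make the procedure concrete, below is a minimal PyTorch sketch of constrained backward watermark embedding and trigger-set verification. It is an illustration under stated assumptions, not the authors' implementation: the identifiers (embed_watermark, verify_watermark, feature_layer, lambda_feat, lambda_out, threshold) are invented here, and since the abstract does not specify the exact loss forms or weights, the KL-divergence output constraint and MSE feature constraint below are plausible stand-ins.

```python
# Hypothetical sketch of backdoor-based backward watermarking with
# feature/output constraints. Names and loss forms are assumptions,
# not the paper's code.
import copy

import torch
import torch.nn.functional as F


def embed_watermark(model, trigger_loader, clean_loader, feature_layer,
                    lambda_feat=1.0, lambda_out=1.0, lr=1e-4, epochs=5):
    """Fine-tune a trained model on a backdoor trigger set while keeping
    its middle-layer features and outputs on clean data close to those of
    the frozen original model, limiting damage to the original task."""
    original = copy.deepcopy(model).eval()  # frozen reference copy
    for p in original.parameters():
        p.requires_grad_(False)

    feats = {}

    def save_to(key):
        def hook(module, inputs, output):
            feats[key] = output
        return hook

    # Capture intermediate activations of both models at the chosen layer.
    h_new = dict(model.named_modules())[feature_layer].register_forward_hook(save_to("new"))
    h_old = dict(original.named_modules())[feature_layer].register_forward_hook(save_to("old"))

    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for (x_t, y_t), (x_c, _) in zip(trigger_loader, clean_loader):
            # 1) Watermark task: map trigger samples to their target labels.
            loss_wm = F.cross_entropy(model(x_t), y_t)

            # 2) Output constraint: clean-data predictions should not drift.
            logits_new = model(x_c)  # also refreshes feats["new"] with clean-batch features
            with torch.no_grad():
                logits_old = original(x_c)
            loss_out = F.kl_div(F.log_softmax(logits_new, dim=1),
                                F.softmax(logits_old, dim=1),
                                reduction="batchmean")

            # 3) Feature constraint: middle-layer features should stay close.
            loss_feat = F.mse_loss(feats["new"], feats["old"].detach())

            loss = loss_wm + lambda_out * loss_out + lambda_feat * loss_feat
            opt.zero_grad()
            loss.backward()
            opt.step()

    h_new.remove()
    h_old.remove()
    return model


def verify_watermark(model, trigger_loader, threshold=0.8):
    """Claim ownership if a suspect model still assigns the secret trigger
    set its target labels at a sufficiently high rate."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x_t, y_t in trigger_loader:
            correct += (model(x_t).argmax(dim=1) == y_t).sum().item()
            total += y_t.numel()
    success_rate = correct / max(total, 1)
    return success_rate >= threshold, success_rate
```

Freezing a copy of the original model and penalizing drift in both the logits and an intermediate feature map is a standard distillation-style device; it matches the abstract's intuition that preventing the watermark task from disturbing the original task's representations is what makes the embedded watermark harder to erase by fine-tuning or pruning.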

Key words: deep learning model, model copyright protection, model watermarking, backdoor, robustness