
计算机工程 ›› 2026, Vol. 52 ›› Issue (1): 242-253. doi: 10.19678/j.issn.1000-3428.0069948

• Cyberspace Security •

DeepFake Detection Method Based on Multi-Scale Dual-Stream Network

JIANG Cuiling, CHENG Ziyuan, YU Xingui, WAN Yongjing*

  1. School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
  • Received: 2024-06-03  Revised: 2024-09-06  Online: 2026-01-15  Published: 2024-10-23
  • Corresponding author: WAN Yongjing
  • About the authors:

    JIANG Cuiling, female, associate professor, Ph.D.; her main research interests include multimedia information security, information hiding, and tampering forensics

    CHENG Ziyuan, master's student

    YU Xingui, master's student

    WAN Yongjing (corresponding author), professor

  • Funding:
    National Natural Science Foundation of China (62272164)

Abstract:

The abuse of face DeepFake technology poses serious security risks to society and individuals, and DeepFake detection has therefore become an active research topic. Existing deep learning-based forgery detection methods perform well on High-Quality (HQ) datasets but poorly on Low-Quality (LQ) datasets and in cross-dataset settings. To improve the generalization of DeepFake detection, this paper proposes a detection method based on a Multi-Scale Dual-Stream Network (MSDSnet). The network input is divided into a spatial-domain feature stream and a high-frequency noise feature stream. First, Multi-Scale Fusion (MSF) modules capture the coarse-grained facial features of spatial-domain tampering and the fine-grained high-frequency noise features of forged images under different conditions. The MSF module then fully fuses the features of the two streams, and a Multi-modal Interaction Attention (MIA) module lets the streams interact further so that the dual-stream feature information is thoroughly learned. Finally, a Frequency Channel Attention Network (FcaNet) captures the global information of the forged-face features and completes detection and classification. Experimental results show that the proposed method achieves 98.54% accuracy on the HQ dataset Celeb-DF v2 and 93.11% accuracy on the LQ dataset FaceForensics++, and it also outperforms comparable methods in cross-dataset experiments.

Key words: DeepFake detection, dual-stream network, Multi-Scale Fusion (MSF), Multi-modal Interaction Attention (MIA), high-frequency noise
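As a concrete illustration of the pipeline described in the abstract, the following is a minimal PyTorch sketch of a dual-stream detector in the same spirit: an RGB spatial stream, a high-frequency noise stream produced by a fixed high-pass filter, a multi-scale fusion block, a cross-attention interaction block, and a channel-attention classification head. All class names, channel widths, the high-pass kernel, and the squeeze-and-excitation head used here in place of FcaNet's DCT-based channel attention are illustrative assumptions, not the paper's actual MSF, MIA, and FcaNet designs.

```python
# Minimal sketch of a dual-stream forgery detector in the spirit of MSDSnet.
# All names, widths, and kernels below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HighFreqNoise(nn.Module):
    """Fixed 3x3 high-pass filter exposing high-frequency noise residuals per RGB channel."""
    def __init__(self):
        super().__init__()
        k = torch.tensor([[-1., -1., -1.],
                          [-1.,  8., -1.],
                          [-1., -1., -1.]]) / 8.0
        self.register_buffer("kernel", k.expand(3, 1, 3, 3).clone())  # depthwise weights

    def forward(self, x):
        return F.conv2d(x, self.kernel, padding=1, groups=3)


class MultiScaleFusion(nn.Module):
    """Fuse two feature maps with parallel 1x1 / 3x3 / 5x5 branches (multi-scale mixing)."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(2 * channels, channels, k, padding=k // 2) for k in (1, 3, 5)]
        )
        self.proj = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, a, b):
        x = torch.cat([a, b], dim=1)
        return self.proj(torch.cat([branch(x) for branch in self.branches], dim=1))


class InteractionAttention(nn.Module):
    """Cross-attention letting one stream reweight the other (a stand-in for the MIA module)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, a, b):
        n, c, h, w = a.shape
        qa = a.flatten(2).transpose(1, 2)      # queries from stream a: (N, HW, C)
        kb = b.flatten(2).transpose(1, 2)      # keys/values from stream b
        out, _ = self.attn(qa, kb, kb)
        return out.transpose(1, 2).reshape(n, c, h, w) + a   # residual connection


class DualStreamDetector(nn.Module):
    """Spatial stream + high-frequency noise stream -> fusion -> interaction -> classifier."""
    def __init__(self, channels=64):
        super().__init__()
        self.noise = HighFreqNoise()

        def stem():  # small illustrative stem; a pretrained backbone would go here in practice
            return nn.Sequential(
                nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            )

        self.spatial_stem, self.noise_stem = stem(), stem()
        self.msf = MultiScaleFusion(channels)
        self.mia = InteractionAttention(channels)
        self.se = nn.Sequential(                  # channel attention over pooled features
            nn.Linear(channels, channels // 4), nn.ReLU(inplace=True),
            nn.Linear(channels // 4, channels), nn.Sigmoid(),
        )
        self.fc = nn.Linear(channels, 2)          # real vs. fake logits

    def forward(self, x):
        s = self.spatial_stem(x)                  # spatial-domain features
        r = self.noise_stem(self.noise(x))        # high-frequency noise features
        fused = self.mia(self.msf(s, r), r)       # multi-scale fusion, then cross-stream attention
        g = fused.mean(dim=(2, 3))                # global average pooling
        return self.fc(g * self.se(g))


if __name__ == "__main__":
    logits = DualStreamDetector()(torch.randn(2, 3, 112, 112))
    print(logits.shape)                           # torch.Size([2, 2])
```

In a full system, the small stems would be replaced by stronger (typically pretrained) backbones, and the fusion, interaction, and channel-attention blocks by the paper's specific MSF, MIA, and FcaNet designs.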