[1] 李旭嵘,纪守领,吴春明,等.深度伪造与检测技术综述[J].
软件学报, 2021.
Li Xurong, Ji Shouling, Wu Chunming, et al. A Survey on
Deepfakes and Detection Techniques[J]. Journal of
Software, 2021.
[2] 任延珍,刘晨雨,刘武洋,等.语音伪造及检测技术研究综
述[J].信号处理, 2021, 37(12): 28.
Ren Yanzhen, Liu Chenyu, Liu Wuyang, et al. A Survey
on Speech Forgery and Detection Techniques[J]. Journal
of Signal Processing, 2021, 37(12): 28.
[3] 梁瑞刚,吕培卓,赵月,等.视听觉深度伪造检测技术研究
综述[J].信息安全学报, 2020, 5(2): 1-17.
Liang Ruigang, Lv Peizhuo, Zhao Yue, et al. A Survey of
Audiovisual Deepfake Detection Techniques[J]. Journal
of Cyber Security, 2020, 5(2): 1-17.
[4] 李泽宇,张旭鸿,蒲誉文,等.多模态深度伪造及检测技术
综述[J].计算机研究与发展,2023,60(6): 1396-1416.
Li Zeyu, Zhang Xuhong, Pu Yuwen, et al. A Survey on Multimodal Deepfake and Detection Techniques[J].
Journal of Computer Research and Development, 2023,
60(6): 1396-1416.
[5] FaceSwap. FaceSwap github[EB/OL]. [2022-09-14].
https://github.com/MarekKowalski/FaceSwap.
[6] Deepfakes. Deepfakes github[EB/OL]. [2022-09-14].
https://github.com/Deepfakes/faceswap.
[7] Li L, Bao J, Yang H, et al. FaceShifter: Towards high
fidelity and occlusion aware face swapping[C]//
Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition, 2020: 5074-5083.
[8] Xu C, Zhang J, Hua M, et al. Region-Aware Face
Swapping[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. New
Orleans, LA, USA: IEEE, 2022: 7632-7641.
[9] Liu Zhian, Li Maomao, Zhang Yong, et al. Fine-Grained
Face Swapping Via Regional GAN Inversion[C]//
Proceedings of the IEEE/CVF Conference on Computer
Vision and Pattern Recognition. Vancouver, BC, Canada:
IEEE, 2023: 8578-8587.
[10] Zhao W, Rao Y, Shi W, et al. DiffSwap: High-Fidelity and
Controllable Face Swapping via 3D-Aware Masked
Diffusion[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. Vancouver,
BC, Canada: IEEE, 2023: 8568-8577.
[11] Baliah S, Lin Q, Liao S, et al. Realistic and Efficient Face
Swapping: A Unified Approach with Diffusion Models[J].
arXiv preprint arXiv:2409.07269, 2024.
[12] Thies J, Zollhöfer M, Theobalt C, et al. HeadOn: Real-time
Reenactment of Human Portrait Videos[J]. ACM
Transactions on Graphics, 2018, 37(4): 1-13.
[13] Prajwal K R, Mukhopadhyay R, Namboodiri V P, et al. A
Lip Sync Expert Is All You Need for Speech to Lip
Generation In The Wild[C]//Proceedings of the 28th ACM
international conference on multimedia. New York, NY,
USA: Association for Computing Machinery, 2020:
484-492.
[14] Liang B, Pan Y, Guo Z, et al. Expressive Talking Head
Generation with Granular Audio-Visual
Control[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. New Orleans,
LA, USA: IEEE, 2022: 3387-3396.
[15] Choi Y, Choi M, Kim M, et al. StarGAN: Unified
Generative Adversarial Networks for Multi-Domain
Image-to-Image Translation[C]//Proceedings of the IEEE
conference on computer vision and pattern recognition.
Salt Lake City, UT, USA: IEEE, 2018: 8789-8797.
[16] Gao Y, Wei F, Bao J, et al. High-fidelity and Arbitrary
Face Editing[C]//Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition.
Nashville, TN, USA: IEEE, 2021: 16115-16124.
[17] Xu Y, Yin Y, Jiang L, et al. TransEditor:
Transformer-Based Dual-Space GAN for Highly
Controllable Facial Editing[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition. New Orleans, LA, USA: IEEE, 2022:
7683-7692.
[18] Karras T, Aila T, Laine S, et al. Progressive Growing of
GANs for Improved Quality, Stability, and Variation[C]//
Proceedings of International Conference on Learning
Representations. Vancouver, Canada: OpenReview, 2018.
[19] Karras T, Laine S, Aila T. A Style-Based Generator
Architecture for Generative Adversarial
Networks[C]//Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. Long Beach,
CA, USA: IEEE, 2019: 4401-4410.
[20] Karras T, Laine S, Aittala M, et al. Analyzing and
Improving the Image Quality of
StyleGAN[C]//Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. Seattle, WA,
USA: IEEE, 2020: 8110-8119.
[21] Karras T, Aittala M, Laine S, et al. Alias-Free Generative
Adversarial Networks[C]//Proceedings of the 35th
International Conference on Neural Information
Processing Systems. Red Hook, NY, USA: Curran
Associates Inc, 2021, 66: 1-12.
[22] Xia W, Yang Y, Xue J H, et al. TediGAN: Text-Guided
Diverse Face Image Generation and
Manipulation[C]//Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition.
Nashville, TN, USA: IEEE, 2021: 2256-2265.
[23] Sun J, Deng Q, Li Q, et al. AnyFace: Free-style
Text-to-Face Synthesis and Manipulation[C]//Proceedings
of the IEEE/CVF conference on computer vision and
pattern recognition. New Orleans, LA, USA: IEEE, 2022:
18687-18696.
[24] Rabiner L R, Schafer R W. Introduction to Digital Speech
Processing[J]. Foundations and Trends in Signal
Processing, 2007, 1(1–2): 1-194.
[25] Balestri M, Pacchiotti A, Quazza S. Choose the best to
modify the least: A new generation concatenative
synthesis system[C]//European Conference on Speech
Communication and Technology. Budapest,
Hungary: ISCA, 1999: 2291-2294.
[26] Nishigaki Y, Takamichi S, Toda T, et al. HMM-Based
Speech Synthesis System with Prosody Modification
Based on Speech Input[J]. IEICE Technical Report, 2014,
114(365): 81-86.
[27] Arık S Ö, Chrzanowski M, Coates A, et al. Deep Voice:
Real-time Neural Text-to-Speech[C]//International
Conference on Machine Learning. Sydney, Australia:
PMLR, 2017: 195-204.
[28] Ping W, Peng K, Gibiansky A, et al. Deep Voice 3:
2000-Speaker Neural Text-to-Speech[C]//International
Conference on Learning Representations. Vancouver, BC,
Canada: OpenReview, 2018, 79: 1094-1099.
[29] Van Den Oord A, Dieleman S, Zen H, et al. Wavenet: A
Generative Model for Raw Audio[C]//ISCA Speech
Synthesis Workshop. Sunnyvale, USA, 2016:
13-15.
[30] Wang Y, Skerry-Ryan R J, Stanton D, et al. Tacotron: A
Fully End-to-End Text-to-Speech Synthesis Model[J].
arXiv preprint arXiv:1703.10135, 2017.
[31] Ren Y, Ruan Y, Tan X, et al. FastSpeech: Fast, Robust and
Controllable Text to Speech[J]. Advances in Neural
Information Processing Systems, 2019, 32.
[32] Kong J, Kim J, Bae J. HiFi-GAN: Generative Adversarial
Networks for Efficient and High Fidelity Speech
Synthesis[J]. Advances in Neural Information Processing
Systems, 2020, 33: 17022-17033.
[33] Elias I, Zen H, Shen J, et al. Parallel Tacotron:
Non-Autoregressive and Controllable TTS[C]//ICASSP
2021-2021 IEEE International Conference on Acoustics,
Speech and Signal Processing. Toronto, ON, Canada:
IEEE, 2021: 5709-5713.
[34] Jeong M, Kim H, Cheon S J, et al. Diff-TTS: A Denoising
Diffusion Model for Text-to-Speech[C]//Interspeech. Brno,
Czechia: ISCA, 2021: 3605-3609.
[35] Kawanami H, Iwami Y, Toda T, et al. GMM-based Voice
Conversion Applied to Emotional Speech
Synthesis[C]//Eurospeech. Geneva, Switzerland: ISCA, 2003.
[36] Wu D Y, Lee H. One-Shot Voice Conversion by Vector
Quantization[C]//ICASSP 2020-2020 IEEE International
Conference on Acoustics, Speech and Signal Processing.
Barcelona, Spain: IEEE, 2020: 7734-7738.
[37] Suda H, Kotani G, Saito D. Nonparallel Training of
Exemplar-Based Voice Conversion System Using
INCA-Based Alignment Technique[C]//Interspeech.
Shanghai, China: ISCA, 2020: 4681-4685.
[38] Suda H, Kotani G, Saito D. INmfCA Algorithm for
Training of Nonparallel Voice Conversion Systems Based
on Non-Negative Matrix Factorization[J]. IEICE
Transactions on Information and Systems, 2022, 105(6):
1196-1210.
[39] Xu L, Zhong R, Liu Y, et al. Flow-VAE VC: End-to-End
Flow Framework with Contrastive Loss for Zero-shot
Voice Conversion[C]//Interspeech. Dublin, Ireland: ISCA,
2023: 2293-2297.
[40] Choi H Y, Lee S H, Lee S W. DDDM-VC: Decoupled
Denoising Diffusion Models with Disentangled
Representation and Prior Mixup for Verified Robust Voice
Conversion[C]//Proceedings of the AAAI Conference on
Artificial Intelligence. Vancouver, Canada: AAAI,
2024, 38(16): 17862-17870.
[41] Zhang X, Wang J, Cheng N, et al. Voice Conversion with
Denoising Diffusion Probabilistic GAN
Models[C]//International Conference on Advanced Data
Mining and Applications. Berlin, Heidelberg:
Springer-Verlag, 2023: 154-167.
[42] Zi B, Chang M, Chen J, et al. WildDeepfake: A
challenging real-world dataset for deepfake
detection[C]//Proceedings of the 28th ACM international
conference on multimedia. New York, NY, USA:
Association for Computing Machinery, 2020: 2382-2390.
[43] Jiang L, Li R, Wu W, et al. DeeperForensics-1.0: A
large-scale dataset for real-world face forgery
detection[C]//Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. Seattle, WA,
USA: IEEE, 2020: 2889-2898.
[44] Dang H, Liu F, Stehouwer J, et al. On the detection of
digital face manipulation[C]//Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Seattle, WA, USA: IEEE, 2020: 5781-5790.
[45] Kwon P, You J, Nam G, et al. KoDF: A large-scale Korean
deepfake detection dataset[C]//Proceedings of the
IEEE/CVF international conference on computer vision.
Montreal, QC, Canada: IEEE, 2021: 10744-10753.
[46] Nadimpalli A V, Rattani A. GBDF: gender balanced
deepfake dataset towards fair deepfake
detection[C]//International Conference on Pattern
Recognition. Berlin, Heidelberg: Springer-Verlag, 2022:
320-337.
[47] Narayan K, Agarwal H, Thakral K, et al. DF-Platter:
Multi-face heterogeneous deepfake
dataset[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Vancouver, BC,
Canada: IEEE, 2023: 9739-9748.
[48] Liu X, Wang X, Sahidullah M, et al. ASVspoof 2021:
Towards spoofed and deepfake speech detection in the
wild[J]. IEEE/ACM Transactions on Audio, Speech, and
Language Processing, 2023, 31: 2507-2522.
[49] Yi J, Bai Y, Tao J, et al. Half-truth: A partially fake audio
detection dataset[C]//Interspeech. Brno, Czechia: ISCA,
2021: 1654-1658.
[50] Yi J, Fu R, Tao J, et al. ADD 2022: the first audio deep
synthesis detection challenge[C]//ICASSP 2022-2022
IEEE International Conference on Acoustics, Speech and
Signal Processing. Singapore, Singapore: IEEE, 2022:
9216-9220.
[51] Yi J, Tao J, Fu R, et al. ADD 2023: the second audio
deepfake detection challenge[C]//IJCAI 2023
Workshop on Deepfake Audio Detection and Analysis.
Macao, S.A.R., 2023.
[52] Sanderson C. The VidTIMIT Database[R]. Martigny, Switzerland: IDIAP Research Institute, 2002.
[53] Sun Chengzhe, Jia Shan, Hou Shuwei, et al.
AI-Synthesized Voice Detection Using Neural Vocoder
Artifacts[C]// Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition Workshops.
Vancouver, BC, Canada: IEEE, 2023: 904-912.
[54] Du J, Lin I, Chiu I, et al. DFADD: The Diffusion and
Flow-Matching Based Audio Deepfake Dataset[J]. arXiv
preprint arXiv:2409.08731, 2024.
[55] Lu Y, Xie Y, Fu R, et al. Codecfake: An Initial Dataset for
Detecting LLM-based Deepfake Audio[C]//Interspeech.
Kos Island, Greece: ISCA, 2024: 1390-1394.
[56] Li Xinfeng, Li Kai, Zheng Yifan, et al. SafeEar: Content
Privacy-Preserving Audio Deepfake Detection[C]//
Proceedings of the 2024 ACM SIGSAC Conference on
Computer and Communications Security (CCS). New
York, NY, USA: Association for Computing Machinery,
2024: 3585-3599.
[57] Khalid H, Tariq S, Kim M, et al. FakeAVCeleb: A novel
audio-video multimodal deepfake dataset[C]//Conference
on Neural Information Processing Systems Track on
Datasets and Benchmarks. Seattle, WA, USA: MIT Press,
2021.
[58] Cai Z, Stefanov K, Dhall A, et al. Do you really mean that?
content driven audio-visual deepfake dataset and
multimodal method for temporal forgery
localization[C]//International Conference on Digital
Image Computing: Techniques and Applications. Sydney,
Australia: IEEE, 2022: 1-10.
[59] Hou Y, Fu H, Chen C, et al. PolyGlotFake: A Novel
Multilingual and Multimodal DeepFake Dataset[J]. arXiv
preprint arXiv:2405.08838, 2024.
[60] Cai Z, Ghosh S, Adatia A P, et al. AV-Deepfake1M: A
large-scale LLM-driven audio-visual deepfake
dataset[C]//Proceedings of the 32nd ACM International
Conference on Multimedia. New York, NY, USA:
Association for Computing Machinery, 2024: 7414-7423.
[61] Yang W, Zhou X, Chen Z, et al. AVoiD-DF: Audio-visual
joint learning for detecting deepfake[J]. IEEE
Transactions on Information Forensics and Security, 2023,
18: 2015-2029.
[62] Pan X, Zhang X, Lyu S. Exposing image splicing with
inconsistent local noise variances[C]//IEEE International
Conference on Computational Photography. Seattle, WA,
USA: IEEE, 2012: 1-10.
[63] Ciftci U A, Demir I, Yin L. FakeCatcher: Detection of
synthetic portrait videos using biological signals[J]. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
2020: 1-1.
[64] Haliassos A, Vougioukas K, Petridis S, et al. Lips don't lie:
A generalisable and robust approach to face forgery
detection[C]//Proceedings of the IEEE/CVF conference
on computer vision and pattern recognition. Nashville, TN,
USA: IEEE, 2021: 5039-5049.
[65] Wang H, Liu Z, Wang S. Exploiting complementary
dynamic incoherence for deepfake video detection[J].
IEEE Transactions on Circuits and Systems for Video
Technology, 2023, 33(8): 4027-4040.
[66] Nguyen H M, Derakhshani R. Eyebrow recognition for
identifying deepfake videos[C]//International Conference
of the Biometrics Special Interest Group. Darmstadt,
Germany: IEEE, 2020: 1-5.
[67] Nirkin Y, Wolf L, Keller Y, et al. Deepfake detection based
on discrepancies between faces and their context[J]. IEEE
Transactions on Pattern Analysis and Machine
Intelligence, 2021, 44(10): 6111-6121.
[68] Khormali A, Yuan J S. DFDT: an end-to-end deepfake
detection framework using vision transformer[J]. Applied
Sciences, 2022, 12(6): 2953.
[69] Gu Z, Chen Y, Yao T, et al. Spatiotemporal inconsistency
learning for deepfake video detection[C]//Proceedings of
the 29th ACM international conference on multimedia.
New York, NY, USA: Association for Computing
Machinery, 2021: 3473-3481.
[70] Zhang D, Lin F, Hua Y, et al. Deepfake video detection
with spatiotemporal dropout transformer[C]//Proceedings
of the 30th ACM international conference on multimedia.
New York, NY, USA: Association for Computing
Machinery, 2022: 5833-5841.
[71] Xu Y, Liang J, Jia G, et al. TALL: Thumbnail layout for
deepfake video detection[C]//Proceedings of the
IEEE/CVF international conference on computer vision.
Paris, France: IEEE, 2023: 22658-22668.
[72] Zhao H, Zhou W, Chen D, et al. Multi-attentional
deepfake detection[C]//Proceedings of the IEEE/CVF
conference on computer vision and pattern recognition.
Nashville, TN, USA: IEEE, 2021: 2185-2194.
[73] Li X, Ni R, Yang P, et al. Artifacts-Disentangled
Adversarial Learning for Deepfake Detection[J]. IEEE
Transactions on Circuits and Systems for Video
Technology, 2023, 33(4): 1658-1670.
[74] Hua Y, Shi R, Wang P, et al. Learning Patch-Channel
Correspondence for Interpretable Face Forgery
Detection[J]. IEEE Transactions on Image Processing,
2023, 32: 1668-1680.
[75] Zhang Yiwen, Cai Manchun, Chen Yonghao, et al.
Multi-Scale Deepfake Detection Method
with Fusion of Spatial Features[J]. Computer Engineering,
2024, 50(7): 240-250.
[76] Qiao T, Xie S, Chen Y, et al. Fully Unsupervised Deepfake
Video Detection Via Enhanced Contrastive Learning[J].
IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2024, 46(7): 4654-4668.
[77] Lu W, Liu L, Zhang B, et al. Detection of Deepfake
Videos Using Long-Distance Attention[J]. IEEE
Transactions on Neural Networks and Learning Systems,
2024, 35(7): 9366-9379.
[78] Zhou Q, Zhang K Y, Yao T, et al. Test-Time Domain
Generalization for Face Anti-Spoofing[C]//IEEE/CVF
Conference on Computer Vision and Pattern Recognition.
Seattle, WA, USA: IEEE, 2024: 175-187.
[79] Wu Z, Chng E S, Li H. Detecting converted speech and
natural speech for anti-spoofing attack in speaker
recognition[C]//Interspeech. Portland, OR, USA: ISCA,
2012: 1700-1703.
[80] Chitale M, Dhawale A, Dubey M, et al. A Hybrid
CNN-LSTM Approach for Deepfake Audio Detection[C]//
International Conference on Artificial Intelligence For
Internet of Things. Vellore, India: IEEE, 2024: 1-6.
[81] Martín-Doñas J M, Álvarez A. The Vicomtech audio
deepfake detection system based on wav2vec2 for the
2022 ADD challenge[C]//ICASSP 2022-2022 IEEE
International Conference on Acoustics, Speech and Signal
Processing. Singapore, Singapore: IEEE, 2022: 9241-9245.
[82] Guo Y, Huang H, Chen X, et al. Audio Deepfake
Detection With Self-Supervised WavLM And Multi-Fusion
Attentive Classifier[C]//ICASSP 2024-2024 IEEE
International Conference on Acoustics, Speech and Signal
Processing. Seoul, Korea, Republic of: IEEE, 2024:
12702-12706.
[83] Xie Y, Cheng H, Wang Y, et al. Learning A
Self-Supervised Domain-Invariant Feature Representation
for Generalized Audio Deepfake
Detection[C]//Interspeech. Dublin, Ireland: ISCA, 2023:
2808-2812.
[84] Lei Z, Yang Y, Liu C, et al. Siamese Convolutional Neural
Network Using Gaussian Probability Feature for Spoofing
Speech Detection[C]//Interspeech. Shanghai, China: ISCA,
2020: 1116-1120.
[85] Hamza A, Javed A R R, Iqbal F, et al. Deepfake audio
detection via MFCC features using machine learning[J].
IEEE Access, 2022, 10: 134018-134028.
[86] Li X, Li N, Weng C, et al. Replay and synthetic speech
detection with Res2Net architecture[C]//ICASSP
2021-2021 IEEE international conference on acoustics,
speech and signal processing. Toronto, ON, Canada: IEEE,
2021: 6354-6358.
[87] Liu X, Liu M, Wang L, et al. Leveraging positional-related
local-global dependency for synthetic speech
detection[C]//ICASSP 2023-2023 IEEE International
Conference on Acoustics, Speech and Signal Processing.
Rhodes Island, Greece: IEEE, 2023: 1-5.
[88] Ge W, Patino J, Todisco M, et al. Raw differentiable
architecture search for speech deepfake and spoofing
detection[C]//Edition of the Automatic Speaker
Verification and Spoofing Countermeasures Challenge.
ISCA, 2021: 22-28.
[89] Tak H, Jung J, Patino J, et al. End-to-end spectro-temporal
graph attention networks for speaker verification
anti-spoofing and speech deepfake detection[C]//Edition
of the Automatic Speaker Verification and Spoofing
Countermeasures Challenge. ISCA, 2021: 1-8.
[90] Zhou Y, Lim S N. Joint audio-visual deepfake
detection[C]//Proceedings of the IEEE/CVF International
Conference on Computer Vision. Montreal, QC, Canada:
IEEE, 2021: 14800-14809.
[91] Ilyas H, Javed A, Malik K M. AVFakeNet: A unified
end-to-end Dense Swin Transformer deep learning model
for audio–visual deepfakes detection[J]. Applied Soft
Computing, 2023, 136: 110124.
[92] Raza M A, Malik K M. Multimodaltrace: Deepfake
detection using audiovisual representation
learning[C]//Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. Vancouver, BC,
Canada: IEEE, 2023: 993-1000.
[93] Wang R, Ye D, Tang L, et al. AVT2-DWF: Improving
Deepfake Detection with Audio-Visual Fusion and
Dynamic Weighting Strategies[J]. arXiv preprint
arXiv:2403.14974, 2024.
[94] Mittal T, Bhattacharya U, Chandra R, et al. Emotions don't
lie: An audio-visual deepfake detection method using
affective cues[C]//Proceedings of the 28th ACM
international conference on multimedia. New York, NY,
USA: Association for Computing Machinery, 2020:
2823-2832.
[95] Chugh K, Gupta P, Dhall A, et al. Not made for each
other-audio-visual dissonance-based deepfake detection
and localization[C]//Proceedings of the 28th ACM
international conference on multimedia. New York, NY,
USA: Association for Computing Machinery, 2020:
439-447.
[96] Cheng H, Guo Y, Wang T, et al. Voice-face homogeneity
tells deepfake[J]. ACM Transactions on Multimedia
Computing, Communications and Applications, 2023,
20(3): 1-22.
[97] Katamneni V S, Rattani A. MIS-AVoiDD: Modality
invariant and specific representation for audio-visual
deepfake detection[C]//International Conference on
Machine Learning and Applications. Jacksonville, FL,
USA: IEEE, 2023: 1371-1378.
[98] Liu X, Yu Y, Li X, et al. MCL: Multimodal contrastive
learning for deepfake detection[J]. IEEE Transactions on
Circuits and Systems for Video Technology, 2024, 34(4):
2803-2813.
[99] Liu M, Wang J, Qian X, et al. Audio-Visual Temporal
Forgery Detection Using Embedding-Level Fusion and
Multi-Dimensional Contrastive Loss[J]. IEEE
Transactions on Circuits and Systems for Video
Technology, 2024, 34(8): 6937-6948.
[100] Zou H, Shen M, Hu Y, et al. Cross-Modality and
Within-Modality Regularization for Audio-Visual
Deepfake Detection[C]//ICASSP 2024-2024 IEEE
International Conference on Acoustics, Speech and Signal
Processing. Seoul, Korea, Republic of: IEEE, 2024:
4900-4904.
[101] Oorloff T, Koppisetti S, Bonettini N, et al. AVFF:
Audio-Visual Feature Fusion for Video Deepfake
Detection[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. Seattle, WA,
USA: IEEE, 2024: 27102-27112.
[102] Yu Y, Liu X, Ni R, et al. PVASS-MDD: Predictive
visual-audio alignment self-supervision for multimodal
deepfake detection[J]. IEEE Transactions on Circuits and
Systems for Video Technology, 2023.
[103] Li X, Liu Z, Chen C, et al. Zero-Shot Fake Video
Detection by Audio-Visual Consistency[C]//Interspeech.
Kos, Greece: ISCA, 2024: 2935-2939.
[104] Feng C, Chen Z, Owens A. Self-supervised video
forensics by audio-visual anomaly
detection[C]//Proceedings of the IEEE/CVF Conference
on Computer Vision and Pattern Recognition. Vancouver,
BC, Canada: IEEE, 2023: 10491-10503.
[105] Li X, Yu K, Ji S, et al. Fighting against deepfake:
Patch&pair convolutional neural networks
(PPCNN)[C]//Companion Proceedings of the Web
Conference. New York, NY, USA: Association for
Computing Machinery, 2020: 88-89.
[106] Dai Lei, Cao Lin, Guo Yanan, et al. Deepfake
Cross-Model Defense Method Based
on Generative Adversarial Network[J]. Computer
Engineering, 2024, 50(10): 100-109.
[107] Haq Ijaz Ul, Malik Khalid Mahmood, Muhammad
Khan. [J]. ACM Transactions on Multimedia Computing,
Communications and Applications, 2024, 20(11): 341.