[1] PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al. A lip sync expert is all you need for speech to lip generation in the wild[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: ACM Press, 2020: 484-492. [2] PRAJWAL K R, MUKHOPADHYAY R, PHILIP J, et al. Towards automatic face-to-face translation[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York, USA: ACM Press, 2019: 1428-1436. [3] MITTAL G, WANG B Y. Animating face using disentangled audio representations[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2020: 3290-3298. [4] AFOURAS T, CHUNG J S, ZISSERMAN A. LRS3-TED: a large-scale dataset for visual speech recognition[EB/OL].[2024-05-11]. https://arxiv.org/abs/1809.00496. [5] AFOURAS T, CHUNG J S, SENIOR A, et al. Deep audio-visual speech recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 8717-8727. [6] KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2019: 4401-4410. [7] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. [8] ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein generative adversarial networks[C]//Proceedings of the International Conference on Machine Learning.[S. l.]: PMLR, 2017: 214-223. [9] GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 5769-5779. [10] MIYATO T, KATAOKA T, KOYAMA M, et al. Spectral normalization for generative adversarial networks[EB/OL].[2024-05-11]. https://arxiv.org/abs/1802.05957. [11] 张慧妍, 梁勇, 兰景宏, 等. 基于记忆模块与过滤式生成对抗网络的入侵检测方法[J]. 计算机工程, 2024, 50(6): 197-207. ZHANG H Y, LIANG Y, LAN J H, et al. Intrusion detection method based on memory module and filtered generative adversarial network[J]. Computer Engineering, 2024, 50(6): 197-207. (in Chinese) [12] KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 8110-8119. [13] PATASHNIK O, WU Z Z, SHECHTMAN E, et al. StyleCLIP: text-driven manipulation of StyleGAN imagery[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 2065-2074. [14] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the International Conference on Machine Learning.[S. l.]: PMLR, 2021: 8748-8763. [15] SUWAJANAKORN S, SEITZ S M, KEMELMACHER-SHLIZERMAN I. Synthesizing Obama[J]. ACM Transactions on Graphics, 2017, 36(4): 1-13. [16] GUO Y D, CHEN K Y, LIANG S, et al. AD-NeRF: audio driven neural radiance fields for talking head synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 5764-5774. [17] YE Z, JIANG Z, REN Y, et al. GeneFace: generalized and high-fidelity audio-driven 3D talking face synthesis[EB/OL].[2024-05-11]. https://arxiv.org/abs/2301.13430. [18] LAHIRI A, KWATRA V, FRUEH C, et al. LipSync3D: data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 2755-2764. [19] FRIED O, TEWARI A, ZOLLHÖFER M, et al. Text-based editing of talking-head video[J]. ACM Transactions on Graphics, 2019, 38(4): 1-14. [20] THIES J, ELGHARIB M, TEWARI A, et al. Neural voice puppetry: audio-driven facial reenactment[C]//Proceedings of the 16th European Conference on Computer Vision. Berlin, Germany: Springer International Publishing, 2020: 716-731. [21] CHUNG J S, JAMALUDIN A, ZISSERMAN A. You said that?[EB/OL].[2024-05-11]. https://ludwig.guru/s/you+said+that. [22] ZHOU H, LIU Y, LIU Z W, et al. Talking face generation by adversarially disentangled audio-visual representation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2019: 9299-9306. [23] ZHOU H, SUN Y S, WU W, et al. Pose-controllable talking face generation by implicitly modularized audio-visual representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 4176-4186. [24] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer International Publishing, 2018: 3-19. [25] CHUNG J S, ZISSERMAN A. Out of time: automated lip sync in the wild[C]//Proceedings of ACCV’16. Berlin, Germany: Springer International Publishing, 2016: 251-263. [26] KINGMA D P, BA J. Adam: a method for stochastic optimization[EB/OL].[2024-05-11]. https://arxiv.org/abs/1412.6980. [27] WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612. [28] PARK S J, KIM M, HONG J, et al. SyncTalkFace: talking face generation with precise lip-syncing via audio-lip memory[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2022: 2062-2070. [29] CHEN L L, MADDOX R K, DUAN Z Y, et al. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2019: 7824-7833. [30] ZHANG Z M, HU Z P, DENG W J, et al. DINet: deformation inpainting network for realistic face visually dubbing on high resolution video[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 3543-3551. |