| 1 |
PRAJWAL K R, MUKHOPADHYAY R, NAMBOODIRI V P, et al. A lip sync expert is all you need for speech to lip generation in the wild[C]//Proceedings of the 28th ACM International Conference on Multimedia. New York, USA: ACM Press, 2020: 484-492.
|
| 2 |
PRAJWAL K R, MUKHOPADHYAY R, PHILIP J, et al. Towards automatic face-to-face translation[C]//Proceedings of the 27th ACM International Conference on Multimedia. New York, USA: ACM Press, 2019: 1428-1436.
|
| 3 |
MITTAL G, WANG B Y. Animating face using disentangled audio representations[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2020: 3290-3298.
|
| 4 |
|
| 5 |
AFOURAS T, CHUNG J S, SENIOR A, et al. Deep audio-visual speech recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(12): 8717-8727.
doi: 10.1109/TPAMI.2018.2889052
|
| 6 |
KARRAS T, LAINE S, AILA T M. A style-based generator architecture for generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2019: 4401-4410.
|
| 7 |
GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
doi: 10.1145/3422622
|
| 8 |
ARJOVSKY M, CHINTALA S, BOTTOU L. Wasserstein generative adversarial networks[C]//Proceedings of the International Conference on Machine Learning. [S. l.]: PMLR, 2017: 214-223.
|
| 9 |
GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 5769-5779.
|
| 10 |
|
| 11 |
张慧妍, 梁勇, 兰景宏, 等. 基于记忆模块与过滤式生成对抗网络的入侵检测方法[J]. 计算机工程, 2024, 50(6): 197-207.
doi: 10.19678/j.issn.1000-3428.0068157
|
|
ZHANG H Y, LIANG Y, LAN J H, et al. Intrusion detection method based on memory module and filtered generative adversarial network[J]. Computer Engineering, 2024, 50(6): 197-207.
doi: 10.19678/j.issn.1000-3428.0068157
|
| 12 |
KARRAS T, LAINE S, AITTALA M, et al. Analyzing and improving the image quality of StyleGAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 8110-8119.
|
| 13 |
PATASHNIK O, WU Z Z, SHECHTMAN E, et al. StyleCLIP: text-driven manipulation of StyleGAN imagery[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 2065-2074.
|
| 14 |
RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]//Proceedings of the International Conference on Machine Learning. [S. l.]: PMLR, 2021: 8748-8763.
|
| 15 |
SUWAJANAKORN S, SEITZ S M, KEMELMACHER-SHLIZERMAN I. Synthesizing Obama: learning lip sync from audio[J]. ACM Transactions on Graphics, 2017, 36(4): 1-13.
|
| 16 |
GUO Y D, CHEN K Y, LIANG S, et al. AD-NeRF: audio driven neural radiance fields for talking head synthesis[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 5764-5774.
|
| 17 |
YE Z, JIANG Z, REN Y, et al. GeneFace: generalized and high-fidelity audio-driven 3D talking face synthesis[EB/OL]. [2024-05-11]. https://arxiv.org/abs/2301.13430.
|
| 18 |
LAHIRI A, KWATRA V, FRUEH C, et al. LipSync3D: data-efficient learning of personalized 3D talking faces from video using pose and lighting normalization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 2755-2764.
|
| 19 |
FRIED O, TEWARI A, ZOLLHÖFER M, et al. Text-based editing of talking-head video[J]. ACM Transactions on Graphics, 2019, 38(4): 1-14.
|
| 20 |
THIES J, ELGHARIB M, TEWARI A, et al. Neural voice puppetry: audio-driven facial reenactment[C]//Proceedings of the 16th European Conference on Computer Vision. Berlin, Germany: Springer International Publishing, 2020: 716-731.
|
| 21 |
|
| 22 |
ZHOU H, LIU Y, LIU Z W, et al. Talking face generation by adversarially disentangled audio-visual representation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2019: 9299-9306.
|
| 23 |
ZHOU H, SUN Y S, WU W, et al. Pose-controllable talking face generation by implicitly modularized audio-visual representation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 4176-4186.
|
| 24 |
WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer International Publishing, 2018: 3-19.
|
| 25 |
CHUNG J S, ZISSERMAN A. Out of time: automated lip sync in the wild[C]//Proceedings of the Asian Conference on Computer Vision (ACCV). Berlin, Germany: Springer International Publishing, 2016: 251-263.
|
| 26 |
|
| 27 |
WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
doi: 10.1109/TIP.2003.819861
|
| 28 |
PARK S J, KIM M, HONG J, et al. SyncTalkFace: talking face generation with precise lip-syncing via audio-lip memory[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2022: 2062-2070.
|
| 29 |
CHEN L L, MADDOX R K, DUAN Z Y, et al. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2019: 7824-7833.
|
| 30 |
ZHANG Z M, HU Z P, DENG W J, et al. DINet: deformation inpainting network for realistic face visually dubbing on high resolution video[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 3543-3551.
|