[1] 朱欣娟, 徐晨溦. 基于风格迁移的虚拟试穿研究[J]. 纺织高校基础科学学报, 2023, 36(1): 65.
Zhu X, Xu C. Research on virtual try-on based on style transfer[J]. Basic Sciences Journal of Textile Universities, 2023, 36(1): 65.
[2] 祖雅妮, 张毅. 基于大规模预训练文本图像模型的虚拟试穿方法[J]. 丝绸, 2023, 60(8): 99.
Zu Y, Zhang Y. Virtual try-on method based on large-scale pre-trained text-image models[J]. Journal of Silk, 2023, 60(8): 99.
[3] 黄东晋, 李晓敏, 刘金华, 等. 基于姿势引导下的虚拟试穿网络[J]. 上海大学学报, 2024, 30(3): 491.
Huang D, Li X, Liu J, et al. Pose-guided virtual try-on network[J]. Journal of Shanghai University, 2024, 30(3): 491.
[4] Fang Z, Zhai W, Su A, et al. Vivid: Video virtual try-on using diffusion models[J]. arXiv preprint arXiv:2405.11794, 2024.
[5] Li G, Zheng S, Zhang H, et al. MagicTryOn: Harnessing Diffusion Transformer for Garment-Preserving Video Virtual Try-on[J]. arXiv preprint arXiv:2505.21325, 2025.
[6] Chong Z, Zhang W, Zhang S, et al. Catv2ton: Taming diffusion transformers for vision-based virtual try-on with temporal concatenation[J]. arXiv preprint arXiv:2501.11325, 2025.
[7] Jiang J, Wang T, Yan H, et al. Clothformer: Taming video virtual try-on in all module[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 10799-10808.
[8] Deng Z, He X, Peng Y, et al. MV-Diffusion: Motion-aware video diffusion model[C]//Proceedings of the 31st ACM International Conference on Multimedia. 2023: 7255-7263.
[9] Dong H, Liang X, Shen X, et al. Fw-gan: Flow-navigated warping gan for video virtual try-on[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 1161-1170.
[10] Zheng J, Wang J, Zhao F, et al. Dynamic Try-On: Taming Video Virtual Try-on with Dynamic Attention Mechanism[J]. arXiv preprint arXiv:2412.09822, 2024.
[11] Rombach R, Blattmann A, Lorenz D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022: 10684-10695.
[12] Blattmann A, Dockhorn T, Kulal S, et al. Stable video diffusion: Scaling latent video diffusion models to large datasets[J]. arXiv preprint arXiv:2311.15127, 2023.
[13] Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144.
[14] Nguyen H, Nguyen Q Q V, Nguyen K, et al. Swifttry: Fast and consistent video virtual try-on with diffusion models[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(6): 6200-6208.
[15] He Z, Chen P, Wang G, et al. Wildvidfit: Video virtual try-on in the wild via image-based controlled diffusion models[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024: 123-139.
[16] Xu Z, Chen M, Wang Z, et al. Tunnel try-on: Excavating spatial-temporal tunnels for high-quality virtual try-on in videos[C]//Proceedings of the 32nd ACM International Conference on Multimedia. 2024: 3199-3208.
[17] Peebles W, Xie S. Scalable diffusion models with transformers[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2023: 4195-4205.
[18] Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models[J]. Advances in neural information processing systems, 2020, 33: 6840-6851.
[19] Wan T, Wang A, Ai B, et al. Wan: Open and advanced large-scale video generative models[J]. arXiv preprint arXiv:2503.20314, 2025.
[20] Kong W, Tian Q, Zhang Z, et al. Hunyuanvideo: A systematic framework for large video generative models[J]. arXiv preprint arXiv:2412.03603, 2024.
[21] Jiang B, Hu X, Luo D, et al. Fitdit: Advancing the authentic garment details for high-fidelity virtual try-on[J]. arXiv preprint arXiv:2411.10499, 2024.
[22] Hu E J, Shen Y, Wallis P, et al. LoRA: Low-rank adaptation of large language models[C]//International Conference on Learning Representations. 2022.
[23] Choi S, Park S, Lee M, et al. VITON-HD: High-resolution virtual try-on via misalignment-aware normalization[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021: 14131-14140.
[24] Esser P, Kulal S, Blattmann A, et al. Scaling rectified flow transformers for high-resolution image synthesis[C]//Forty-first international conference on machine learning. 2024.
[25] Black Forest Labs. Flux[EB/OL]. https://github.com/black-forest-labs/flux, 2024.
[26] Wu J Z, Ge Y, Wang X, et al. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2023: 7623-7633.
[27] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3d convolutional networks[C]//Proceedings of the IEEE international conference on computer vision. 2015: 4489-4497.
[28] Polyak A, Zohar A, Brown A, et al. Movie gen: A cast of media foundation models[J]. arXiv preprint arXiv:2410.13720, 2024.
[29] Yang Z, Teng J, Zheng W, et al. Cogvideox: Text-to-video diffusion models with an expert transformer[J]. arXiv preprint arXiv:2408.06072, 2024.
[30] Lipman Y, Chen R T Q, Ben-Hamu H, et al. Flow matching for generative modeling[J]. arXiv preprint arXiv:2210.02747, 2022.
[31] Zhang R, Isola P, Efros A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 586-595.
[32] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: from error visibility to structural similarity[J]. IEEE transactions on image processing, 2004, 13(4): 600-612.
[33] Carreira J, Zisserman A. Quo vadis, action recognition? a new model and the kinetics dataset[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6299-6308.
[34] Hara K, Kataoka H, Satoh Y. Can spatiotemporal 3d cnns retrace the history of 2d cnns and imagenet?[C]//Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. 2018: 6546-6555.
[35] Kim J, Gu G, Park M, et al. StableVITON: Learning semantic correspondence with latent diffusion model for virtual try-on[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024: 8176-8185.
[36] Xu Y, Gu T, Chen W, et al. OOTDiffusion: Outfitting fusion based latent diffusion for controllable virtual try-on[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(9): 8996-9004.
[37] Choi Y, Kwak S, Lee K, et al. Improving diffusion models for authentic virtual try-on in the wild[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024: 206-235.