| 1 |  | 
																													
																							| 2 | LI X, THICKSTUN J, GULRAJANI I, et al. Diffusion-lm improves controllable text generation[C]//Proceedings of Advances in Neural Information Processing Systems. [S. l. ]: AAAI Press, 2022: 4328-4343. | 
																													
																							| 3 |  | 
																													
																							| 4 | 闫志浩, 周长兵, 李小翠.  生成扩散模型研究综述. 计算机科学, 2024, 51 (1): 273- 283. | 
																													
																							|  |  YAN Z H ,  ZHOU C B ,  LI X C .  Survey on generative diffusion model. Computer Science, 2024, 51 (1): 273- 283. | 
																													
																							| 5 | RONNEBERGER O, FISCHER P, BROX T. U-Net: convolutional networks for biomedical image segmentation[C]//Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. Berlin, Germany: Springer, 2015: 234-241. | 
																													
																							| 6 | 李豪宇, 陈晔曜, 蒋志迪, 等.  基于子光场遮挡融合的无监督光场深度估计. 光电工程, 2024, 51 (10): 240166. | 
																													
																							|  |  LI H Y ,  CHEN Y Y ,  JIANG Z D , et al.  Unsupervised light field depth estimation based on sub-light field occlusion fusion. Opto-Electronic Engineering, 2024, 51 (10): 240166. | 
																													
																							| 7 | PEEBLES W, XIE S. Scalable diffusion models with transformers[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2023: 4195-4205. | 
																													
																							| 8 |  | 
																													
																							| 9 | CAO Y, LI S, LIU Y, et al. A comprehensive survey of AI-Generated Content (AIGC): a history of generative AI from gan to ChatGPT[EB/OL]. [2023-10-03]. https://arxiv.org/abs/2303.04226 . | 
																													
																							| 10 |  HINTON G E ,  SALAKHUTDINOV R R .  Reducing the dimensionality of data with neural networks. science, 2006, 313 (5786): 504- 507.  doi: 10.1126/science.1127647
 | 
																													
																							| 11 | GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]// Proceedings of the 28th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2014: 2672-2680. | 
																													
																							| 12 | HO J, JAIN A, ABBEEL P. Denoising diffusion probabilistic models[C]//Proceedings of Advances in Neural Information Processing Systems. [S. l. ]: AAAI Press, 2020: 6840-6851. | 
																													
																							| 13 |  SIDDIQUE N ,  PAHEDING S ,  ELKIN C P , et al.  U-Net and its variants for medical image segmentation: a review of theory and applications. IEEE Access, 2021, 9, 82031- 82057.  doi: 10.1109/ACCESS.2021.3086020
 | 
																													
																							| 14 |  WU J ,  LIU W L ,  LI C , et al.  A state-of-the-art survey of U-Net in microscopic image analysis: from simple usage to structure mortification. Neural Computing and Applications, 2023, 36, 3317- 3346. | 
																													
																							| 15 | ROMBACH R, BLATTMANN A, LORENZ D, et al. High-resolution image synthesis with latent diffusion models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 10684-10695. | 
																													
																							| 16 | 赵宏, 李文改.  基于扩散生成对抗网络的文本生成图像模型研究. 电子与信息学报, 2023, 45 (12): 4371- 4381. | 
																													
																							|  |  ZHAO H ,  LI W G .  Text-to-image generation model based on diffusion Wasserstein generative adversarial networks. Journal of Electronics & Information Technology, 2023, 45 (12): 4371- 4381. | 
																													
																							| 17 | VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010. | 
																													
																							| 18 |  LIU Y ,  ZHANG Y ,  WANG Y X , et al.  A survey of visual transformers. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35 (6): 7478- 7498.  doi: 10.1109/TNNLS.2022.3227717
 | 
																													
																							| 19 |  | 
																													
																							| 20 |  | 
																													
																							| 21 |  | 
																													
																							| 22 |  | 
																													
																							| 23 | HASSANI A, WALTON S, LI J, et al. Neighborhood attention transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 6185-6194. | 
																													
																							| 24 | DENG J, DONG W, SOCHER R, et al. ImageNet: a large-scale hierarchical image database[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2009: 248-255. | 
																													
																							| 25 | HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[EB/OL]. [2023-10-03]. https://arxiv.org/pdf/1706.08500 . | 
																													
																							| 26 |  | 
																													
																							| 27 | HESSEL J, HOLTZMAN A, FORBES M, et al. CLIPscore: a reference-free evaluation metric for image captioning[EB/OL]. [2023-10-03]. https://arxiv.org/pdf/1801.01973 . | 
																													
																							| 28 | SZEGEDY C, LIU W, JIA Y Q, et al. Going deeper with convolutions[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2015: 1-9. | 
																													
																							| 29 |  |