[1] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from Scratch[J]. Journal of Machine Learning Research, 2011, 12: 2493-2537. [2] ANDREW A M. Multiple view geometry in computer vision[J]. Kybernetes, 2001, 30(9/10): 1333-1341. [3] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial networks[J]. Communications of the ACM, 2020, 63(11): 139-144. [4] TAO M, TANG H, WU S S, et al. DF-GAN: deep fusion generative adversarial networks for text-to-image synthesis[EB/OL].[2024-01-02]. https://arxiv.org/abs/2008.05865v1. [5] YANG Z L, DAI Z H, YANG Y M, et al. XLNet: generalized autoregressive pretraining for language understanding[J]. Advances in Neural Information Processing Systems, 2019, 8: 5753-5763. [6] GRAVES A. Supervised sequence labelling with recurrent neural networks[M]. Berlin, Germany: Springer, 2012: 37-45. [7] 任欢, 王旭光. 注意力机制综述[J]. 计算机应用, 2021, 41(S1): 1-6. REN H, WANG X G. Review of attention mechanism[J]. Journal of Computer Applications, 2021, 41(S1): 1-6. (in Chinese) [8] CHEN T, KORNBLITH S, NOROUZI M, et al. A simple framework for contrastive learning of visual representations[C]//Proceedings of the 37th International Conference on Machine Learning. New York, USA: ACM Press, 2020: 1597-1607. [9] REED S, AKATA Z, YAN X, et al. Generative adversarial text to image synthesis[C]//Proceedings of International Conference on Machine Learning. New York, USA: ACM Press, 2016: 1060-1069. [10] ZHANG H, XU T, LI H S, et al. StackGAN: text to photo-realistic image synthesis with stacked generative adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2017: 5908-5916. [11] ZHANG H, XU T, LI H, et al. StackGAN++: realistic image synthesis with stacked generative adversarial networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1947-1962. [12] XU T, ZHANG P C, HUANG Q Y, et al. AttnGAN: fine-grained text to image generation with attentional generative adversarial networks[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 1316-1324. [13] QIAO T T, ZHANG J, XU D Q, et al. MirrorGAN: learning text-to-image generation by redescription[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 1505-1514. [14] ZHU M F, PAN P B, CHEN W, et al. DM-GAN: dynamic memory generative adversarial networks for text-to-image synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2019: 5795-5803. [15] LIAO W T, HU K, YANG M Y, et al. Text to image generation with semantic-spatial aware GAN[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 18166-18175. [16] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[J]. Advances in Neural Information Processing Systems, 2017, 30: 5998-6008. [17] 刘建伟, 宋志妍. 循环神经网络研究综述[J]. 控制与决策, 2022, 37(11): 2753-2768. LIU J W, SONG Z Y. Overview of recurrent neural networks[J]. Control and Decision, 2022, 37(11): 2753-2768. (in Chinese) [18] DEVLIN J, CHANG M W, LEE K. BERT: pre-training of deep bidirectional Transformers for language understanding[EB/OL].[2024-01-02]. https://arxiv.org/pdf/1810.04805. [19] GUO M H, XU T X, LIU J J, et al. Attention mechanisms in computer vision: a survey[J]. Computational Visual Media, 2022, 8(3): 331-368. [20] RADFORD A, NARASIMHAN K, SALIMANS T,et al. Improving language understanding by generative pre-training[EB/OL].[2024-01-02]. https://arxiv.org/abs/1810.04805. [21] CHEN K, WANG J, CHEN L C, et al. ABC-CNN: an attention based convolutional neural network for visual question answering[EB/OL].[2024-01-02]. https://arxiv.org/abs/1511.05960. [22] XU X, WANG T, YANG Y, et al. Cross-modal attention with semantic consistence for image-text matching[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(12): 5412-5425. [23] GUAN S Y, LOEW M. Evaluation of generative adversarial network performance based on direct analysis of generated images[C]//Proceedings of the IEEE Applied Imagery Pattern Recognition Workshop. Washington D. C., USA: IEEE Press, 2019: 1-5. [24] OBUKHOV A, KRASNYANSKIY M. Quality assessment method for GAN based on modified metrics inception score and Fréchet inception distance[M]//RADEK S, PETR S, ZDENKA P. Software engineering perspectives in intelligent systems. Berlin, Germany: Springer, 2020: 102-114. [25] 张佳, 张丽红. 基于条件增强和注意力机制的文本生成图像方法[J]. 测试技术学报, 2023, 37(2): 112-119. ZHANG J, ZHANG L H. Research on text to image based on conditioning augmentation and attention mechanism[J]. Journal of Test and Measurement Technology, 2023, 37(2): 112-119. (in Chinese) [26] ZHANG H, GOODFELLOW I, METAXAS D, et al. Self-attention generative adversarial networks[EB/OL].[2024-01-02]. https://arxiv.org/abs/1805.08318. |