[1] AYOUB S, GULZAR Y, AHMAD R F, et al. Generating image captions using Bahdanau attention mechanism and transfer learning[J]. Symmetry, 2022, 14(12): 2681.
[2] ZHAO D X, YANG R X, WANG Z H, et al. A cooperative approach based on self-attention with interactive attribute for image caption[J]. Multimedia Tools and Applications, 2023, 82(1): 1223-1236.
[3] GENG Y G, MEI H Y, XUE X R, et al. Image-caption model based on fusion feature[J]. Applied Sciences, 2022, 12(19): 9861.
[4] KHAN R, SHUJAH ISLAM M, KANWAL K, et al. Attention based sequence-to-sequence framework for auto image caption generation[J]. Journal of Intelligent & Fuzzy Systems, 2022, 43(1): 159-170.
[5] CHANG Y H, CHEN Y J, HUANG R H, et al. Enhanced image captioning with color recognition using deep learning methods[J]. Applied Sciences, 2021, 12(1): 209.
[6] CHANG X J, REN P Z, XU P F, et al. A comprehensive survey of scene graphs: generation and application[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 1-26.
[7] 苗益, 赵增顺, 杨雨露, 等. 图像描述技术综述[J]. 计算机科学, 2020, 47(12): 149-160.
MIAO Y, ZHAO Z S, YANG Y L, et al. Survey of image captioning methods[J]. Computer Science, 2020, 47(12): 149-160. (in Chinese)
[8] MING Y, HU N N, FAN C X, et al. Visuals to text: a comprehensive review on automatic image captioning[J]. IEEE/CAA Journal of Automatica Sinica, 2022, 9(8): 1339-1365.
[9] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2015: 3156-3164.
[10] XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[EB/OL]. [2023-02-05]. https://arxiv.org/abs/1502.03044.
[11] LU J S, XIONG C M, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 3242-3250.
[12] CHEN L, ZHANG H W, XIAO J, et al. SCA-CNN: spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 5659-5667.
[13] ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 6077-6086.
[14] HAN K, WANG Y H, CHEN H T, et al. A survey on vision Transformer[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 87-110.
[15] HUANG L, WANG W M, CHEN J, et al. Attention on attention for image captioning[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D.C., USA: IEEE Press, 2019: 4633-4642.
[16] HERDADE S, KAPPELER A, BOAKYE K, et al. Image captioning: transforming objects into words[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2019: 11137-11147.
[17] CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory transformer for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2020: 10575-10584.
[18] ZHANG X Y, SUN X S, LUO Y P, et al. RSTNet: captioning with adaptive attention on visual and non-visual words[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2021: 15460-15469.
[19] CHEN J, GUO H, YI K, et al. VisualGPT: data-efficient adaptation of pretrained language models for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2022: 18009-18019.
[20] 季长清, 高志勇, 秦静, 等. 基于卷积神经网络的图像分类算法综述[J]. 计算机应用, 2022, 42(4): 1044-1049.
JI C Q, GAO Z Y, QIN J, et al. Review of image classification algorithms based on convolutional neural network[J]. Journal of Computer Applications, 2022, 42(4): 1044-1049. (in Chinese)
[21] 郭玥秀, 杨伟, 刘琦, 等. 残差网络研究综述[J]. 计算机应用研究, 2020, 37(5): 1292-1297.
GUO Y X, YANG W, LIU Q, et al. Survey of residual network[J]. Application Research of Computers, 2020, 37(5): 1292-1297. (in Chinese)
[22] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[EB/OL]. [2023-02-05]. https://arxiv.org/abs/1807.06521.
[23] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 7132-7141.
[24] JIAN L H, XIANG H Q, LE G B. LSTM-based attentional embedding for English machine translation[J]. Scientific Programming, 2022, 2022: 3909726.
[25] WANG C Z, GU X D. Dynamic-balanced double-attention fusion for image captioning[J]. Engineering Applications of Artificial Intelligence, 2022, 114: 105194.
[26] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2017: 1179-1195.
[27] GEBHARDT E, WOLF M. CAMEL dataset for visual and thermal infrared multiple object detection and tracking[C]//Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance. Washington D.C., USA: IEEE Press, 2018: 1-6.
[28] LI C L, CHENG H, HU S Y, et al. Learning collaborative sparse representation for grayscale-thermal tracking[J]. IEEE Transactions on Image Processing, 2016, 25(12): 5743-5756.
[29] LI C L, LIANG X Y, LU Y J, et al. RGB-T object tracking: benchmark and baseline[J]. Pattern Recognition, 2019, 96: 106977.