[1] YANG Nan,NAN Lin,ZHANG Dingyi.Research on image interpretation based on deep learning[J].Infrared and Laser Engineering,2018,47(2):9-16.(in Chinese)杨楠,南琳,张丁一.基于深度学习的图像描述研究[J].红外与激光工程,2018,47(2):9-16.
[2] VINYALS O,TOSHEV A,BENGIO S,et al.Show and tell:a neural image caption generator[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2015:3156-3164.
[3] XU K,BA J,KIROS R,et al.Show,attend and tell:neural image caption generation with visual attention[C]//Proceedings of the 32nd International Conference on Machine Learning.New York,USA:ACM Press,2015:2048-2057.
[4] YANG Jin,LIU Jianbei,ZHAO Jing.Image local feature descriptor based on discrete cosine transform[J].Computer Engineering,2012,38(14):173-176.(in Chinese)杨进,刘建波,赵静.基于离散余弦变换的图像局部特征描述子[J].计算机工程,2012,38(14):173-176.
[5] CHEN Long,ZHANG Hanwang,XIAO Jun,et al.SCA-CNN:spatial and channel-wise attention in convolutional networks for image captioning[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:1-12.
[6] LU J S,XIONG C M,PARIKH D,et al.Knowing when to look:adaptive attention via a visual sentinel for image captioning[C]//Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:3242-3250.
[7] HOU Xingchen,WANG Jin.Image caption based on adaptive attention model[J].Computer and Modernization,2020,30(6):95-100.(in Chinese)侯星晨,王锦.基于自适应注意模型的图像描述[J].计算机与现代化,2020,30(6):95-100.
[8] LU J S,YANG J W,BATRA D,et al.Neural baby talk[C]//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2018:1-11.
[9] ANDERSON P,HE X D,BUEHLER C,et al.Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2018:1122-1136.
[10] DAI Bo,ZHANG Yuqi,LIN Dahua.Detecting visual relationships with deep relational networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:3298-3308.
[11] HU Han,GU Jiayuan,ZHANG Zheng,et al.Relation networks for object detection[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2018:3588-3597.
[12] LU C W,KRISHNA R,BERNSTEIN M,et al.Visual relationship detection with language priors[C]//Proceedings of European Conference on Computer Vision.Berlin,Germany:Springer,2016:852-869.
[13] XU D F,ZHU Y K,CHOY C B,et al.Scene graph generation by iterative message passing[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:3097-3106.
[14] YANG Xu,TANG Kaihua,ZHANG Hanwang,et al.Auto-encoding scene graphs for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2019:10685-10694.
[15] TANG Pengjie,TAN Yunlan,LI Jinzhong.Image description based on the fusion of scene and object category prior knowledge[J].Journal of Image and Graphics,2017,22(9):1251-1260.(in Chinese)汤鹏杰,谭云兰,李金忠.融合图像场景及物体先验知识的图像描述生成模型[J].中国图象图形学报,2017,22(9):1251-1260.
[16] BAHDANAU D,CHO K,BENGIO Y.Neural machine translation by jointly learning to align and translate[EB/OL].[2020-08-20].https://arxiv.org/abs/1409.
[17] YANG Z C,YANG D Y,DYER C,et al.Hierarchical attention networks for document classification[C]//Proceedings of 2016 Conference of the North American Chapter of the Association for Computational Linguistics.Stroudsburg,USA:Association for Computational Linguistics,2016:236-256.
[18] VASWANI A,SHAZEER N,PARMAR N,et al.Attention is all you need[EB/OL].[2020-08-20].https://arxiv.org/pdf/1706.03762.pdf.
[19] LIN Z H,FENG M W,CICERO N D S,et al.A structured self-attentive sentence embedding[EB/OL].[2020-08-20].https://arxiv.org/pdf/1703.03130.pdf.
[20] REN S Q,HE K M,GIRSHICK R,et al.Faster R-CNN:towards real-time object detection with region proposal networks[J].IEEE Transactions on Pattern Analysis and Machine Intelligence,2017,39(6):1137-1149.
[21] HE Kaiming,ZHANG Xiangyu,REN Shaoqing,et al.Deep residual learning for image recognition[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:770-778.
[22] KRISHNA R,ZHU Y,GROTH O,et al.Visual genome:connecting language and vision using crowdsourced dense image annotations[EB/OL].[2020-08-20].https://link.springer.com/content/pdf/10.1007%2Fs11263-016-0981-7.pdf.
[23] LIN Y T,MAIRE M,BELONGIE S,et al.Microsoft COCO:common objects in context[C]//Proceedings of European Conference on Computer Vision.Berlin,Germany:Springer,2014:740-755.
[24] PAPINENI K,ROUKOS S,WARD T,et al.BLEU:a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,USA:Association for Computational Linguistics,2002:311-318.
[25] DENKOWSKI M,LAVIE A.Meteor universal:language specific translation evaluation for any target language[C]//Proceedings of the 9th Workshop on Statistical Machine Translation.Stroudsburg,USA:Association for Computational Linguistics,2014:376-380.
[26] VEDANTAM R,LAWRENCE Z C,PARIKH D.CIDEr:consensus-based image description evaluation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2015:4566-4575.
[27] LIU Y,LIU Z,CHUA T S,et al.Topical word embeddings[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence.Palo Alto,USA:AAAI Press,2015:2418-2424.
[28] RENNIE S J,MARCHERET E,MROUEH Y,et al.Self-critical sequence training for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2017:7008-7024.