[1] LI Z X, LIN L, ZHANG C L, et al. A semi-supervised learning approach based on adaptive weighted fusion for automatic image annotation[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2021, 17(1): 1-23.
[2] HENG H J, FAN Y C, WANG J L. Multifaceted feature coding image caption generation algorithm based on Transformer[J]. Computer Engineering, 2023, 49(2): 199-205.
[3] ZHUO Y Q, WEI J H, LI Z X. Research on image captioning based on double attention model[J]. Acta Electronica Sinica, 2022, 50(5): 1123-1130.
[4] CHEN F L, ZHANG D Z, HAN M L, et al. VLP: a survey on vision-language pre-training[J]. Machine Intelligence Research, 2023, 20(1): 38-56. doi: 10.1007/s11633-022-1369-5.
[5] YANG X, ZHANG H W, GAO C Y, et al. Learning to collocate visual-linguistic neural modules for image captioning[J]. International Journal of Computer Vision, 2023, 131(1): 82-100. doi: 10.1007/s11263-022-01692-8.
[6] FENG Y M, LAN L, ZHANG X, et al. AttResNet: attention-based ResNet for image captioning[C]//Proceedings of 2018 International Conference on Algorithms, Computing and Artificial Intelligence. New York, USA: ACM Press, 2018: 1-6.
[7] HOSSAIN M Z, SOHEL F, SHIRATUDDIN M F, et al. A comprehensive survey of deep learning for image captioning[J]. ACM Computing Surveys, 2019, 51(6): 1-36.
[8] WANG D F, HU H F, CHEN D H. Transformer with sparse self-attention mechanism for image captioning[J]. Electronics Letters, 2020, 56(15): 764-766. doi: 10.1049/el.2020.0635.
[9] HUANG L, WANG W M, CHEN J, et al. Attention on attention for image captioning[C]//Proceedings of IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2019: 4634-4643.
[10] GUO L T, LIU J, ZHU X X, et al. Normalized and geometry-aware self-attention network for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 10327-10336.
[11] ZHAO J W, LIN S L, MEI T, et al. Research on instance segmentation algorithm based on YOLACT and Transformer[J]. Semiconductor Optoelectronics, 2023, 44(1): 134-140.
[12]
[13] VINYALS O, TOSHEV A, BENGIO S, et al. Show and tell: a neural image caption generator[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 3156-3164.
[14] ANDERSON P, HE X D, BUEHLER C, et al. Bottom-up and top-down attention for image captioning and visual question answering[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2018: 6077-6086.
[15] LI Z X, WEI H Y, HUANG F C, et al. Combine visual features and scene semantics for image captioning[J]. Chinese Journal of Computers, 2020, 43(9): 1624-1640.
[16] SONG J K, ZENG P P, GU J Y, et al. End-to-end image captioning via visual region aggregation and dual-level collaboration[J]. Journal of Software, 2023, 34(5): 2152-2169.
[17] PAN Y W, YAO T, LI Y H, et al. X-linear attention networks for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 10968-10977.
[18] SONG Z L, ZHOU X F, DONG L H, et al. Direction relation Transformer for image captioning[C]//Proceedings of the 29th ACM International Conference on Multimedia. New York, USA: ACM Press, 2021: 5056-5064.
[19] JIANG H Z, MISRA I, ROHRBACH M, et al. In defense of grid features for visual question answering[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 10264-10273.
[20] WU M R, ZHANG X Y, SUN X S, et al. DIFNet: boosting visual information flow for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 18020-18029.
[21] RENNIE S J, MARCHERET E, MROUEH Y, et al. Self-critical sequence training for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 1179-1195.
[22]
[23] KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 3128-3137.
[24] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. [S. l.]: ACL, 2002: 311-318.
[25] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[EB/OL]. [2023-08-05]. https://aclanthology.org/W05-0909/.
[26]
[27] VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 4566-4575.
[28]
[29] NGUYEN V Q, SUGANUMA M, OKATANI T. GRIT: faster and better image captioning Transformer using dual visual features[EB/OL]. [2023-08-05]. https://arxiv.org/abs/2207.09666.
[30] LU J S, XIONG C M, PARIKH D, et al. Knowing when to look: adaptive attention via a visual sentinel for image captioning[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2017: 3242-3250.
[31] LIU M F, SHI Q, NIE L Q. Image captioning based on visual relevance and context dual attention[J]. Journal of Software, 2022, 33(9): 3210-3222.
[32]
[33] CORNIA M, STEFANINI M, BARALDI L, et al. Meshed-memory Transformer for image captioning[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2020: 10575-10584.
[34] JI J Y, LUO Y P, SUN X S, et al. Improving image captioning by leveraging intra- and inter-layer global representation in Transformer network[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2): 1655-1663. doi: 10.1609/aaai.v35i2.16258.
[35] ZHANG X Y, SUN X S, LUO Y P, et al. RSTNet: captioning with adaptive attention on visual and non-visual words[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2021: 15460-15469.
[36] LUO Y P, JI J Y, SUN X S, et al. Dual-level collaborative Transformer for image captioning[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2286-2293. doi: 10.1609/aaai.v35i3.16328.