1. GHANDI T, POURREZA H, MAHYAR H. Deep learning approaches on image captioning: a review. ACM Computing Surveys, 2024, 56(3): 1-39. doi: 10.1145/3617592
2. LI Y P, ZHANG X R, CHENG X N, et al. Learning consensus-aware semantic knowledge for remote sensing image captioning. Pattern Recognition, 2024, 145: 109893. doi: 10.1016/j.patcog.2023.109893
3. SHI Y L, YANG W Z, DU H X, et al. Overview of image captions based on deep learning. Acta Electronica Sinica, 2021, 49(10): 2048-2060 (in Chinese). doi: 10.12263/DZXB.20200669
4. STEFANINI M, CORNIA M, BARALDI L, et al. From show to tell: a survey on deep learning-based image captioning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(1): 539-559. doi: 10.1109/TPAMI.2022.3148210
5. WANG J, XU W, WANG Q, et al. On distinctive image captioning via comparing and reweighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(2): 2088-2103. doi: 10.1109/TPAMI.2022.3159811
6. YANG X, ZHANG H, CAI J. Deconfounded image captioning: a causal retrospect. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12996-13010. doi: 10.1109/TPAMI.2021.3121705
7.
8. GUO L T, LIU J, ZHU X X, et al. Normalized and geometry-aware self-attention network for image captioning[EB/OL]. [2023-12-05]. https://arxiv.org/abs/2003.08897.
9.
10.
11.
12. JI J Y, LUO Y P, SUN X S, et al. Improving image captioning by leveraging intra- and inter-layer global representation in transformer network. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(2): 1655-1663. doi: 10.1609/aaai.v35i2.16258
13. LUO Y P, JI J Y, SUN X S, et al. Dual-level collaborative transformer for image captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 35(3): 2286-2293. doi: 10.1609/aaai.v35i3.16328
14. LI Z X, WEI H Y, HUANG F C, et al. Combine visual features and scene semantics for image captioning. Chinese Journal of Computers, 2020, 43(9): 1624-1640 (in Chinese).
15. ZHOU D M, ZHANG C L, LI Z X, et al. Image captioning model based on multi-level visual fusion. Acta Electronica Sinica, 2021, 49(7): 1286-1290 (in Chinese).
16. LIU M F, SHI Q, NIE L Q. Image captioning based on visual relevance and context dual attention. Journal of Software, 2022, 33(9): 3210-3222 (in Chinese). doi: 10.13328/j.cnki.jos.006623
17. SONG J K, ZENG P P, GU J Y, et al. End-to-end image captioning via visual region aggregation and dual-level collaboration. Journal of Software, 2023, 34(5): 2152-2169 (in Chinese). doi: 10.13328/j.cnki.jos.006773
18.
19. CHENG B W, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 1290-1299.
20.
21. WU M R, ZHANG X Y, SUN X S, et al. DIFNet: boosting visual information flow for image captioning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2022: 18020-18029.
22. LIU Z, LIN Y T, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. Washington D. C., USA: IEEE Press, 2021: 9992-10002.
23. KARPATHY A, JOULIN A, LI F F. Deep fragment embeddings for bidirectional image sentence mapping. Advances in Neural Information Processing Systems, 2014, 3: 1889-1897. doi: 10.5555/2969033.2969038
24. VEDANTAM R, ZITNICK C L, PARIKH D. CIDEr: consensus-based image description evaluation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2015: 4566-4575.
25. PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. [S. l.]: ACL, 2002: 311-318.
26.
27.
28. ZHANG X Y, SUN X S, LUO Y P, et al. RSTNet: captioning with adaptive attention on visual and non-visual words[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D. C., USA: IEEE Press, 2021: 15465-15474.
29. WANG Y Y, XU J G, SUN Y F. End-to-end transformer based model for image captioning. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(3): 2585-2594. doi: 10.1609/aaai.v36i3.20160
30. LI Y N, MA Y W, ZHOU Y Y, et al. Semantic-guided selective representation for image captioning. IEEE Access, 2023, 11: 14500-14510. doi: 10.1109/ACCESS.2023.3243952