| 1 |
LOWE D G. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 2004, 60(2): 91-110.
doi: 10.1023/B:VISI.0000029664.99615.94
|
| 2 |
ZHANG Y, JIN R, ZHOU Z H. Understanding bag-of-words model: a statistical framework. International Journal of Machine Learning and Cybernetics, 2010, 1(1): 43-52.
|
| 3 |
JELODAR H, WANG Y L, YUAN C, et al. Latent Dirichlet Allocation (LDA) and topic modeling: models, applications, a survey. Multimedia Tools and Applications, 2019, 78(11): 15169-15211.
doi: 10.1007/s11042-018-6894-4
|
| 4 |
HARDOON D R, SZEDMAK S, SHAWE-TAYLOR J. Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 2004, 16(12): 2639-2664.
doi: 10.1162/0899766042321814
|
| 5 |
ZHENG W M, ZHOU X Y, ZOU C R, et al. Facial expression recognition using Kernel Canonical Correlation Analysis (KCCA). IEEE Transactions on Neural Networks, 2006, 17(1): 233-238.
doi: 10.1109/TNN.2005.860849
|
| 6 |
BENTON A, KHAYRALLAH H, GUJRAL B, et al. Deep generalized canonical correlation analysis[C]//Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019). [S.l.]: ACL, 2019: 1-6.
|
| 7 |
GAO D H, SHENG L J, XU X D, et al. Joint feature approach for image-text cross-modal retrieval. Journal of Xidian University, 2024, 51(4): 128-138. (in Chinese)
|
| 8 |
LU J S, BATRA D, PARIKH D, et al. ViLBERT: pretraining task-agnostic visiolinguistic representations for vision-and-language tasks[EB/OL]. [2024-05-05]. https://arxiv.org/abs/1908.02265.
|
| 9 |
RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-05]. https://arxiv.org/abs/2103.00020.
|
| 10 |
LI J N, LI D X, XIONG C M, et al. BLIP: bootstrapping language-image pre-training for unified vision-language understanding and generation[EB/OL]. [2024-05-05]. https://arxiv.org/abs/2201.12086.
|
| 11 |
LI J N, LI D X, SAVARESE S, et al. BLIP-2: bootstrapping language-image pre-training with frozen image encoders and large language models[C]//Proceedings of the 40th International Conference on Machine Learning. New York, USA: ACM Press, 2023: 19730-19742.
|
| 12 |
|
| 13 |
ZHANG K, MAO Z D, WANG Q, et al. Negative-aware attention framework for image-text matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 15640-15649.
|
| 14 |
YANG J Y, DUAN J L, TRAN S, et al. Vision-language pre-training with triple contrastive learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 15650-15659.
|
| 15 |
LIU Z, PEI X L, GAO S S, et al. Perceive, reason, and align: context-guided cross-modal correlation learning for image-text retrieval. Applied Soft Computing, 2024, 154: 111395.
doi: 10.1016/j.asoc.2024.111395
|
| 16 |
KRISHNA R, ZHU Y K, GROTH O, et al. Visual genome: connecting language and vision using crowd sourced dense image annotations. International Journal of Computer Vision, 2017, 123(1): 32-73.
doi: 10.1007/s11263-016-0981-7
|
| 17 |
REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
doi: 10.1109/TPAMI.2016.2577031
|
| 18 |
DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding[EB/OL]. [2024-05-05]. https://arxiv.org/abs/1810.04805.
|
| 19 |
CHEN J C, HU H X, WU H, et al. Learning the best pooling strategy for visual semantic embedding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 15784-15793.
|
| 20 |
DELIÈGE A, ISTASSE M, KUMAR A, et al. Ordinal pooling[C]//Proceedings of the 30th British Machine Vision Conference. Cardiff, UK: BMVA Press, 2019: 76.
|
| 21 |
KARPATHY A, LI F F. Deep visual-semantic alignments for generating image descriptions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2015: 3128-3137.
|
| 22 |
LI K P, ZHANG Y L, LI K, et al. Visual semantic reasoning for image-text matching[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2019: 4653-4661.
|
| 23 |
MITHUN N C, PANDA R, PAPALEXAKIS E E, et al. Webly supervised joint embedding for cross-modal image-text retrieval[C]//Proceedings of the 26th ACM International Conference on Multimedia. New York, USA: ACM Press, 2018: 1856-1864.
|
| 24 |
JI Z, CHEN K X, HE Y Q, et al. Heterogeneous memory enhanced graph reasoning network for cross-modal retrieval. Science China Information Sciences, 2022, 65(7): 172104.
doi: 10.1007/s11432-021-3367-y
|
| 25 |
ZHENG Z D, ZHENG L, GARRETT M, et al. Dual-path convolutional image-text embeddings with instance loss. ACM Transactions on Multimedia Computing, Communications, and Applications, 2020, 16(2): 1-23.
|
| 26 |
HUANG Y, WU Q, SONG C F, et al. Learning semantic concepts and order for image and sentence matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2018: 6163-6171.
|
| 27 |
CHEN H, DING G G, LIU X D, et al. IMRAM: iterative matching with recurrent attention memory for cross-modal image-text retrieval[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2020: 12652-12660.
|
| 28 |
YANG X Y, LI C, CHEN S Y, et al. Text-image cross-modal retrieval based on Transformer. Computer Science, 2023, 50(4): 141-148. (in Chinese)
|
| 29 |
LIANG Y P, LIU X E, MA Z G, et al. Causal image-text retrieval embedded with consensus knowledge. Chinese Journal of Engineering, 2024(2): 317-328. (in Chinese)
|
| 30 |
LIAO L C, ZOU W D, YANG J L, et al. Broad learning system based on attention mechanism and tracking differentiator. Journal of Shenzhen University (Science and Engineering), 2024, 41(5): 583-593. (in Chinese)
|