[1] HU Jing, ZHAO Xinyu, PENG Mingchao. Image-Text Retrieval via Cross-Domain Feature Disentanglement and Semantic Prototype Guidance[J/OL]. Computer Engineering, 2026. DOI:10.19678/j.issn.1000-3428.0252767. (in Chinese)
[2] YANG Yuxue, HE Tian, FAN Jinghang, LIU Ruiying, LI Teng. Research on Cross-Modal Image-Text Retrieval Based on Cross Attention and Feature Aggregation[J/OL]. Computer Engineering, 2026, 52(2): 311-321. DOI:10.19678/j.issn.1000-3428.0070119. (in Chinese)
[3] SANGKLOY P, JITKRITTUM W, YANG D, et al. A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch[A/OL]. arXiv, 2022[2025-11-28]. http://arxiv.org/abs/2208.03354. DOI:10.48550/arXiv.2208.03354.
[4] CHOWDHURY P N, BHUNIA A K, SAIN A, et al. SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text[A/OL]. arXiv, 2023[2025-11-28]. http://arxiv.org/abs/2204.11964. DOI:10.48550/arXiv.2204.11964.
[5] LEVY M, BEN-ARI R, DARSHAN N, et al. Chatting Makes Perfect: Chat-based Image Retrieval[J]. Advances in Neural Information Processing Systems, 2023, 36: 61437-61449.
[6] HUANG Xiaoju. Interactive Clothing Retrieval Based on Attribute Disentangled Representations[J/OL]. Computer & Digital Engineering, 2025, 53(3): 829-834[2026-01-11]. https://kns.cnki.net/kcms2/article/abstract?v=SQKXI91EiTp0CGtl5Rf8eW087z7OMVV71F131ywdaIrBv5GE6bu5LLCV6r03kJ5u8a2BGj257RIeZg43H8X9YocoSe0LIfp689s3zQinFlirGM_LXQXdMBc-bmZn5ISf1SbLLhRniAv1STjWnIxSxrCGuK5F8g2bfKacxOPyHOTgS6Hlp0SldA==&uniplatform=NZKPT&language=CHS. (in Chinese)
[7] ZHU H, HUANG J H, RUDINAC S, et al. Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models[C/OL]//Proceedings of the 2024 International Conference on Multimedia Retrieval. 2024: 978-987[2025-11-24]. http://arxiv.org/abs/2404.18746. DOI:10.1145/3652583.3658032.
[8] LIU F, ZOU C, DENG X, et al. SceneSketcher: Fine-Grained Image Retrieval with Scene Sketches[C/OL]//VEDALDI A, BISCHOF H, BROX T, et al. Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020: 718-734. DOI:10.1007/978-3-030-58529-7_42.
[9] WU Z, WANG Q, YANG J. SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation[A/OL]. arXiv, 2024[2025-11-28]. http://arxiv.org/abs/2405.18801. DOI:10.48550/arXiv.2405.18801.
[10] KARTHIK S, ROTH K, MANCINI M, et al. Vision-by-Language for Training-Free Compositional Image Retrieval[A/OL]. arXiv, 2023[2024-11-26]. https://arxiv.org/abs/2310.09291. DOI:10.48550/arXiv.2310.09291.
[11] LÜLF C, LIMA MARTINS D M, VAZ SALLES M A, et al. CLIP-Branches: Interactive Fine-Tuning for Text-Image Retrieval[C/OL]//Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval. New York, NY, USA: Association for Computing Machinery, 2024: 2719-2723[2026-01-10]. https://dl.acm.org/doi/10.1145/3626772.3657678. DOI:10.1145/3626772.3657678.
[12] LEE S, YU S, PARK J, et al. Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach[C/OL]//KU L W, MARTINS A, SRIKUMAR V. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Bangkok, Thailand: Association for Computational Linguistics, 2024: 791-809[2025-12-26]. https://aclanthology.org/2024.acl-long.46/. DOI:10.18653/v1/2024.acl-long.46.
[13] LONG Z, LIANG K, ARAGON CAMARASA G, et al. Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval[C/OL]//Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval. Padua, Italy: ACM, 2025: 823-832[2025-11-21]. https://dl.acm.org/doi/10.1145/3726302.3729950. DOI:10.1145/3726302.3729950.
[14] HO J, JAIN A, ABBEEL P. Denoising Diffusion Probabilistic Models[C/OL]//Advances in Neural Information Processing Systems: Vol. 33. Curran Associates, Inc., 2020: 6840-6851[2025-04-02]. https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html.
[15] ROMBACH R, BLATTMANN A, LORENZ D, et al. High-Resolution Image Synthesis with Latent Diffusion Models[C/OL]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022: 10674-10685[2025-04-09]. https://ieeexplore.ieee.org/document/9878449. DOI:10.1109/CVPR52688.2022.01042.
[16] GAO C, LIU Q, XU Q, et al. SketchyCOCO: Image Generation from Freehand Scene Sketches[C/OL]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020[2026-01-08]. https://openaccess.thecvf.com/content_CVPR_2020/html/Gao_SketchyCOCO_Image_Generation_From_Freehand_Scene_Sketches_CVPR_2020_paper.html.
[17] HAN T, SCHLANGEN D. Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task[C/OL]//KONDRAK G, WATANABE T. Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Taipei, Taiwan: Asian Federation of Natural Language Processing, 2017: 361-365[2026-01-11]. https://aclanthology.org/I17-2061/.
[18] SONG J, SONG Y Z, XIANG T, et al. Fine-Grained Image Retrieval: the Text/Sketch Input Dilemma[C/OL]//Proceedings of the British Machine Vision Conference 2017. London, UK: British Machine Vision Association, 2017: 45[2026-01-11]. http://www.bmva.org/bmvc/2017/papers/paper045/index.html. DOI:10.5244/C.31.45.
[19] DEY S, DUTTA A, GHOSH S K, et al. Learning Cross-Modal Deep Embeddings for Multi-Object Image Retrieval using Text and Sketch[C/OL]//2018 24th International Conference on Pattern Recognition (ICPR). 2018: 916-921[2026-01-11]. https://ieeexplore.ieee.org/document/8545452. DOI:10.1109/ICPR.2018.8545452.
[20] KOLEY S, BHUNIA A K, SAIN A, et al. You’ll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval[A/OL]. arXiv, 2024[2025-11-04]. http://arxiv.org/abs/2403.07222. DOI:10.48550/arXiv.2403.07222.
[21] LIU F, DENG X, ZOU C, et al. SceneSketcher-v2: Fine-Grained Scene-Level Sketch-Based Image Retrieval Using Adaptive GCNs[J/OL]. IEEE Transactions on Image Processing, 2022, 31: 3737-3751. DOI:10.1109/TIP.2022.3175403.
[22] GATTI P, PARIKH K, PAUL D P, et al. Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions[A/OL]. arXiv, 2025[2025-11-28]. http://arxiv.org/abs/2502.08438. DOI:10.48550/arXiv.2502.08438.
[23] ZUO R, HU H, DENG X, et al. SceneDiff: Generative Scene-Level Image Retrieval with Text and Sketch Using Diffusion Models[C/OL]//Thirty-Third International Joint Conference on Artificial Intelligence. 2024: 1825-1833[2025-03-28]. https://www.ijcai.org/proceedings/2024/202. DOI:10.24963/ijcai.2024/202.
[24] CHOWDHURY P N, SAIN A, BHUNIA A K, et al. FS-COCO: Towards Understanding of Freehand Sketches of Common Objects in Context[C/OL]//AVIDAN S, BROSTOW G, CISSÉ M, et al. Computer Vision – ECCV 2022. Cham: Springer Nature Switzerland, 2022: 253-270. DOI:10.1007/978-3-031-20074-8_15.
[25] BAI S, CAI Y, CHEN R, et al. Qwen3-VL Technical Report[A/OL]. arXiv, 2025[2026-01-12]. http://arxiv.org/abs/2511.21631. DOI:10.48550/arXiv.2511.21631.
[26] LI J, LI D, XIONG C, et al. BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation[C/OL]//Proceedings of the 39th International Conference on Machine Learning. 2022[2025-04-10]. https://proceedings.mlr.press/v162/li22n.html.
[27] MOU C, WANG X, XIE L, et al. T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion Models[J/OL]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(5): 4296-4304. DOI:10.1609/aaai.v38i5.28226.
[28] XUE L, SHU M, AWADALLA A, et al. xGen-MM (BLIP-3): A Family of Open Large Multimodal Models[A/OL]. arXiv, 2025[2026-01-11]. http://arxiv.org/abs/2408.08872. DOI:10.48550/arXiv.2408.08872.