[1] SHARMA V, MURRAY N, LARLUS D, et al. Unsupervised meta-domain adaptation for fashion retrieval[C]//Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2021: 1-10.
[2] KIAPOUR M H, HAN X, LAZEBNIK S, et al. Where to buy it: matching street clothing photos in online shops[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2015: 3343-3351.
[3] ZHU S M, ZOU X X, QIAN J J, et al. Learning structured relation embeddings for fine-grained fashion attribute recognition[J]. IEEE Transactions on Multimedia, 2023, 26: 1652-1664.
[4] GU J Y, WANG K, LUO H, et al. MSINet: twins contrastive search of multi-scale interaction for object ReID[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 19243-19253.
[5] 姜爱萍, 刘骊, 付晓东, 等. 跨模态时尚检索的服装分层特征表示和关联学习[J]. 计算机辅助设计与图形学学报, 2025, 37(4): 654-667. JIANG A P, LIU L, FU X D, et al. Clothing hierarchical feature representation and association learning for cross-modal fashion retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2025, 37(4): 654-667. (in Chinese)
[6] LIU Z W, LUO P, QIU S, et al. DeepFashion: powering robust clothes recognition and retrieval with rich annotations[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2016: 1096-1104.
[7] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL].[2024-03-11]. https://arxiv.org/pdf/1409.1556v1.
[8] WANG Z H, GU Y J, ZHANG Y, et al. Clothing retrieval with visual attention model[C]//Proceedings of the IEEE Visual Communications and Image Processing (VCIP). Washington D.C., USA: IEEE Press, 2017: 1-4.
[9] 韩华, 黄丽, 田瑾, 等. 基于双中间模态的四流网络跨模态行人重识别[J]. 计算机工程, 2023, 49(8): 302-309. HAN H, HUANG L, TIAN J, et al. Cross-modality person re-identification using four-stream network based on dual-intermediate modalities[J]. Computer Engineering, 2023, 49(8): 302-309. (in Chinese)
[10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2017: 6000-6010.
[11] WANG X X, JIANG B, WANG X, et al. Rethinking batch sample relationships for data representation: a batch-graph Transformer based approach[J]. IEEE Transactions on Multimedia, 2023, 26: 1578-1588.
[12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL].[2024-03-11]. https://www.semanticscholar.org/paper/An-Image-is-Worth-16x16-Words%3A-Transformers-for-at-Dosovitskiy-Beyer/268d347e8a55b5eb82fb5e7d2f800e33c75ab18a.
[13] HE S T, LUO H, WANG P C, et al. TransReID: Transformer-based object re-identification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2021: 14993-15002.
[14] 安国成, 江波, 王晓龙, 等. 基于拓展图文对比学习的多模态语义对齐[J]. 计算机工程, 2024, 50(11): 152-162. AN G C, JIANG B, WANG X L, et al. Multi-modal semantic alignment based on extended image-text contrastive learning[J]. Computer Engineering, 2024, 50(11): 152-162. (in Chinese)
[15] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL].[2024-03-11]. https://cdn.openai.com/papers/Learning_Transferable_Visual_Models_From_Natural_Language.pdf.
[16] CHIA P J, ATTANASIO G, BIANCHI F, et al. FashionCLIP: connecting language and images for product representations[EB/OL].[2024-03-11]. https://arxiv.org/pdf/2204.03972v2.
[17] ZHOU K Y, YANG J K, LOY C C, et al. Learning to prompt for vision-language models[J]. International Journal of Computer Vision, 2022, 130(9): 2337-2348.
[18] ZHOU K Y, YANG J K, LOY C C, et al. Conditional prompt learning for vision-language models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 16795-16804.
[19] LI S Y, SUN L, LI Q L. CLIP-ReID: exploiting vision-language model for image re-identification without concrete text labels[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 1405-1413.
[20] Tianchi. Watch and buy 2021[EB/OL].[2024-03-11]. https://tianchi.aliyun.com/dataset/75730.
[21] 邓远飞, 李加伟, 蒋运承. 基于知识注入提示学习的专利短语相似度计算[J]. 计算机工程, 2024, 50(4): 294-302. DENG Y F, LI J W, JIANG Y C. Similarity computation of patent phrases based on knowledge injection prompt learning[J]. Computer Engineering, 2024, 50(4): 294-302. (in Chinese)
[22] 周炫余, 吴莲华, 郑勤华, 等. 联合语义提示和记忆增强的弱监督跳绳视频异常检测方法[J]. 计算机工程, 2024, 50(7): 87-95. ZHOU X Y, WU L H, ZHENG Q H, et al. Weakly supervised video anomaly detection method for rope skipping combined with semantic prompts and memory enhancement[J]. Computer Engineering, 2024, 50(7): 87-95. (in Chinese)
[23] LESTER B, AL-RFOU R, CONSTANT N. The power of scale for parameter-efficient prompt tuning[EB/OL].[2024-03-11]. https://www.semanticscholar.org/paper/The-Power-of-Scale-for-Parameter-Efficient-Prompt-Lester-Al-Rfou/ffdbd7f0b03b85747b001b4734d5ee31b5229aa4.
[24] HERMANS A, BEYER L, LEIBE B. In defense of the triplet loss for person re-identification[EB/OL].[2024-03-11]. https://arxiv.org/abs/1703.07737.
[25] ZHUANG Z J, WEI L H, XIE L X, et al. Rethinking the distribution gap of person re-identification with camera-based batch normalization[C]//Proceedings of the European Conference on Computer Vision (ECCV). Berlin, Germany: Springer, 2020.
[26] XIE W, LI X H, CAO C C, et al. ViT-CX: causal explanation of vision Transformers[EB/OL].[2024-03-11]. https://arxiv.org/abs/2211.03064.