[1] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[EB/OL]. [2024-05-11]. https://arxiv.org/abs/2103.00020.
[2] JIA C, YANG Y F, XIA Y, et al. Scaling up visual and vision-language representation learning with noisy text supervision[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2102.05918.
[3] ZHAI X H, WANG X, MUSTAFA B, et al. LiT: zero-shot transfer with locked-image text tuning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 18102-18112.
[4] LIU H, LI C, WU Q, et al. Visual instruction tuning[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2023: 34892-34916.
[5] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2010.11929.
[6] ZONG Z F, SONG G L, LIU Y. DETRs with collaborative hybrid assignments training[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 6725-6735.
[7] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2018: 833-851.
[8] 冯耀功, 于剑, 桑基韬, 等. 基于知识的零样本视觉识别综述. 软件学报, 2021, 32(2): 370-405.
    FENG Y G, YU J, SANG J T, et al. Survey on knowledge-based zero-shot visual recognition. Journal of Software, 2021, 32(2): 370-405.
[9] PHAM H, DAI Z, GHIASI G, et al. Combined scaling for zero-shot transfer learning. Neurocomputing, 2023, 555: 126658. doi: 10.1016/j.neucom.2023.126658.
[10] LAMPERT C H, NICKISCH H, HARMELING S. Learning to detect unseen object classes by between-class attribute transfer[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C., USA: IEEE Press, 2009: 951-958.
[11] PALATUCCI M, POMERLEAU D, HINTON G, et al. Zero-shot learning with semantic output codes[C]//Proceedings of the 22nd International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2009: 1410-1418.
[12] ROMERA-PAREDES B, TORR P H S. An embarrassingly simple approach to zero-shot learning. Berlin, Germany: Springer, 2017.
[13] CHANGPINYO S, CHAO W L, SHA F. Predicting visual exemplars of unseen classes for zero-shot learning[C]//Proceedings of the IEEE International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2017: 3496-3505.
[14] KODIROV E, XIANG T, GONG S G. Semantic autoencoder for zero-shot learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2017: 4447-4456.
[15] LI Y N, WANG D H, HU H H, et al. Zero-shot recognition using dual visual-semantic mapping paths[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2017: 5207-5215.
[16] 刘欢, 郑庆华, 罗敏楠, 等. 基于跨域对抗学习的零样本分类. 计算机研究与发展, 2019, 56(12): 2519-2535.
    LIU H, ZHENG Q H, LUO M N, et al. Cross-domain adversarial learning for zero-shot classification. Journal of Computer Research and Development, 2019, 56(12): 2519-2535.
[17] JIANG H J, WANG R P, SHAN S G, et al. Transferable contrastive network for generalized zero-shot learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2019: 9764-9773.
[18] XIAN Y Q, SHARMA S, SCHIELE B, et al. f-VAEGAN-D2: a feature generating framework for any-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2019: 10267-10276.
[19] HAN Z Y, FU Z Y, CHEN S, et al. Contrastive embedding for generalized zero-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2021: 2371-2381.
[20] 张杰, 廖盛斌, 张浩峰, 等. 基于类别扩展的广义零样本图像分类方法. 电子学报, 2023, 51(4): 1068-1080.
    ZHANG J, LIAO S B, ZHANG H F, et al. Category expansion based generalized zero-shot image classification. Acta Electronica Sinica, 2023, 51(4): 1068-1080.
[21] WU J Z, LI X T, XU S L, et al. Towards open vocabulary learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(7): 5092-5113. doi: 10.1109/TPAMI.2024.3361862.
[22] ALLINGHAM J U, REN J, DUSENBERRY M W, et al. A simple zero-shot prompt weighting technique to improve prompt ensembling in text-image models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2302.06235.
[23] SHU M L, NIE W L, HUANG D A, et al. Test-time prompt tuning for zero-shot generalization in vision-language models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2209.07511.
[24] HASSAN J, GANI H, HUSSEIN N, et al. Align your prompts: test-time prompting with distribution alignment for zero-shot generalization[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2311.01459.
[25] MA X, ZHANG J, GUO S, et al. SwapPrompt: test-time prompt adaptation for vision-language models[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2024: 65252-65264.
[26] ZHANG D C, ZHOU Z, LI Y F. Robust test-time adaptation for zero-shot prompt tuning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 16714-16722.
[27] GE Y H, REN J, GALLAGHER A, et al. Improving zero-shot generalization and robustness of multi-modal models[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 11093-11101.
[28] PRATT S, COVERT I, LIU R, et al. What does a platypus look like? Generating customized prompts for zero-shot image classification[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 15645-15655.
[29]
[30] NOVACK Z, MCAULEY J, LIPTON Z C, et al. CHiLS: zero-shot image classification with hierarchical label sets[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2302.02551.
[31] UDANDARAO V, GUPTA A, ALBANIE S. SuS-X: training-free name-only transfer of vision-language models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 2725-2736.
[32]
[33] GUO Z Y, ZHANG R R, QIU L T, et al. CALIP: zero-shot enhancement of CLIP with parameter-free attention[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2023: 746-754.
[34]
[35] LI J N, SAVARESE S, HOI S C H. Masked unsupervised self-training for label-free image classification[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2206.02967.
[36] LI X, BEHPOUR S, DOAN T, et al. UP-DP: unsupervised prompt learning for data pre-selection with vision-language models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2307.11227.
[37] LI X, WEN C C, HU Y, et al. RS-CLIP: zero shot remote sensing scene classification via contrastive vision-language supervision. International Journal of Applied Earth Observation and Geoinformation, 2023, 124: 103497. doi: 10.1016/j.jag.2023.103497.
[38] ESMAEILPOUR S, LIU B, ROBERTSON E, et al. Zero-shot out-of-distribution detection based on the pre-trained model CLIP[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2022: 6568-6576.
[39] LIU K, FU Z H, CHEN C, et al. Category-extensible out-of-distribution detection via hierarchical context descriptions[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2407.16725.
[40] SHIPARD J, WILIEM A, THANH K N, et al. Diversity is definitely needed: improving model-agnostic zero-shot classification via stable diffusion[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. Washington D.C., USA: IEEE Press, 2023: 769-778.
[41] WANG Z B, LIANG J, HE R, et al. Improving zero-shot generalization for CLIP with synthesized prompts[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2307.07397.
[42] LIAO N, LIU Y F, LI X B, et al. CoHOZ: contrastive multimodal prompt tuning for hierarchical open-set zero-shot recognition[C]//Proceedings of the 30th ACM International Conference on Multimedia. New York, USA: ACM Press, 2022: 3262-3271.
[43]
[44] TANG B W, ZHANG J, YAN L, et al. Data-free generalized zero-shot learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 5108-5117.
[45]
[46] WANG H, LIU F, JIAO L C, et al. ViLT-CLIP: video and language tuning CLIP with multimodal prompt learning and scenario-guided optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 5390-5400.
[47] WANG Q, DU J L, YAN K, et al. Seeing in flowing: adapting CLIP for action recognition with motion prompts learning[C]//Proceedings of the 31st ACM International Conference on Multimedia. New York, USA: ACM Press, 2023: 5339-5347.
[48] ZHENG Z W, MA M Y, WANG K, et al. Preventing zero-shot transfer degradation in continual learning of vision-language models[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 19068-19079.
[49] ANDONIAN A, CHEN S X, HAMID R. Robust cross-modal representation learning with progressive self-distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 16409-16420.
[50] GAO Y T, LIU J F, XU Z H, et al. PyramidCLIP: hierarchical feature alignment for vision-language model pretraining[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2204.14095.
[51] YU H Y, WANG X C, LI B, et al. Chinese text recognition with a pre-trained CLIP-like model through image-IDS aligning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 11909-11918.
[52] LU M Y, CHEN B W, ZHANG A, et al. Visual language pretrained multiple instance zero-shot transfer for histopathology images[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 19764-19775.
[53] WANG Z F, WU Z B, AGARWAL D, et al. MedCLIP: contrastive learning from unpaired medical images and text[C]//Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Stroudsburg, USA: Association for Computational Linguistics, 2022: 3876-3887.
[54] LIN Y Q, CHEN M H, ZHANG K P, et al. TagCLIP: a local-to-global framework to enhance open-vocabulary multi-label classification of CLIP without training[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 3513-3521.
[55] GU X Y, LIN T Y, KUO W C, et al. Open-vocabulary object detection via vision and language knowledge distillation[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2104.13921.
[56] DU Y, WEI F Y, ZHANG Z H, et al. Learning to prompt for open-vocabulary object detection with vision-language model[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 14064-14073.
[57] LIU Z M, HU X F, NEVATIA R. Efficient feature distillation for zero-shot annotation object detection[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Washington D.C., USA: IEEE Press, 2024: 882-891.
[58] MA Z Y, LUO G, GAO J, et al. Open-vocabulary one-stage detection with hierarchical visual-language knowledge distillation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 14054-14063.
[59] RASHEED H, MAAZ M, KHATTAK M U, et al. Bridging the gap between object and image-level representations for open-vocabulary detection[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2207.03482.
[60] WANG L T, LIU Y, DU P H, et al. Object-aware distillation pyramid for open-vocabulary object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 11186-11196.
[61] ZHONG Y W, YANG J W, ZHANG P C, et al. RegionCLIP: region-based language-image pretraining[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 16772-16782.
[62] GAO M F, XING C, NIEBLES J C, et al. Open vocabulary object detection with pseudo bounding-box labels[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2111.09452.
[63] ZHAO S Y, ZHANG Z X, SCHULTER S, et al. Exploiting unlabeled data with vision and language models for object detection[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022.
[64]
[65]
[66] FENG C J, ZHONG Y J, JIE Z Q, et al. PromptDet: towards open-vocabulary detection using uncurated images[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 701-717.
[67] WU X S, ZHU F, ZHAO R, et al. CORA: adapting CLIP for open-vocabulary detection with region prompting and anchor pre-matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 7031-7040.
[68] FANG R H, PANG G S, BAI X. Simple image-level classification improves open-vocabulary object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 1716-1725.
[69] KUO W C, CUI Y, GU X Y, et al. F-VLM: open-vocabulary object detection upon frozen vision and language models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2209.15639.
[70] JEONG J, PARK G, YOO J, et al. ProxyDet: synthesizing proxy novel classes via classwise mixup for open-vocabulary object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 2462-2470.
[71] LIN J H, SHEN Y H, WANG B Q, et al. Weakly supervised open-vocabulary object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 3404-3412.
[72] SHEN H Z, ZHAO T C, ZHU M W, et al. GroundVLP: harnessing zero-shot visual grounding from vision-language pre-training and open-vocabulary object detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 4766-4775.
[73] YAO L W, HAN J H, WEN Y P, et al. DetCLIP: dictionary-enriched visual-concept paralleled pre-training for open-world detection[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2209.09407.
[74] ZHOU X Y, GIRDHAR R, JOULIN A, et al. Detecting twenty-thousand classes using image-level supervision[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022.
[75] WANG T. Learning to detect and segment for open vocabulary object detection[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 7051-7060.
[76] XIE J, ZHENG S. Zero-shot object detection through vision-language embedding alignment[C]//Proceedings of the IEEE International Conference on Data Mining Workshops (ICDMW). Washington D.C., USA: IEEE Press, 2022: 1-15.
[77] ZHANG D M, LI C, ZHANG R R, et al. FM-OV3D: foundation model-based cross-modal knowledge blending for open-vocabulary 3D detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 16723-16731.
[78] MAO Y, DENG J, ZHOU W, et al. CLIP4HOI: towards adapting CLIP for practical zero-shot HOI detection[C]//Proceedings of the 37th International Conference on Neural Information Processing Systems. New York, USA: ACM Press, 2024: 45895-45906.
[79] NAG S, ZHU X T, SONG Y Z, et al. Zero-shot temporal action detection via vision-language prompting[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022: 681-697.
[80] LI L, XIAO J, CHEN G K, et al. Zero-shot visual relation detection via composite visual cues from large language models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2305.12476.
[81] YANG S, WANG Y Q, JI X F, et al. Multi-modal prompting for open-vocabulary video visual relationship detection[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 6513-6521.
[82] XU W H, XU R T, WANG C W, et al. Spectral prompt tuning: unveiling unseen classes for zero-shot semantic segmentation[C]//Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto, USA: AAAI Press, 2024: 6369-6377.
[83] MA C F, YANG Y H, WANG Y F, et al. Open-vocabulary semantic segmentation with frozen vision-language models[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2210.15138.
[84] JEONG J, ZOU Y, KIM T, et al. WinCLIP: zero-/few-shot anomaly classification and segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 19606-19616.
[85]
[86] DING J, XUE N, XIA G S, et al. Decoupling zero-shot semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2022: 11573-11582.
[87]
[88] GHIASI G, GU X Y, CUI Y, et al. Scaling open-vocabulary image segmentation with image-level labels[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022.
[89] PANDEY P, CHASMAI M, NATARAJAN M, et al. A language-guided benchmark for weakly supervised open vocabulary semantic segmentation[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2302.14163.
[90] LIANG F, WU B C, DAI X L, et al. Open-vocabulary semantic segmentation with mask-adapted CLIP[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 7061-7070.
[91] YU Q H, HE J, DENG X Q, et al. Convolutions die hard: open-vocabulary segmentation with single frozen convolutional CLIP[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2308.02487.
[92] FAHES M, VU T H, BURSUC A, et al. PØDA: prompt-driven zero-shot domain adaptation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 18577-18587.
[93]
[94] ZHOU C, LOY C C, DAI B. Extract free dense labels from CLIP[C]//Proceedings of the European Conference on Computer Vision. Berlin, Germany: Springer, 2022.
[95] JIAO S Y, WEI Y C, WANG Y W, et al. Learning mask-aware CLIP representations for zero-shot segmentation[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2310.00240.
[96] ZHOU Z Q, LEI Y J, ZHANG B W, et al. ZegCLIP: towards adapting CLIP for zero-shot semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 11175-11185.
[97] YU S, SEO P H, SON J. Zero-shot referring image segmentation with global-local context features[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 19456-19465.
[98] WANG Y B, HUANG S F, GAO Y L, et al. Transferring CLIP's knowledge into zero-shot point cloud semantic segmentation[C]//Proceedings of the 31st ACM International Conference on Multimedia. New York, USA: ACM Press, 2023: 3745-3754.
[99]
[100] LI W, ZHU L C, WEN L Y, et al. DeCap: decoding CLIP latents for zero-shot captioning via text-only training[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2303.03032.
[101] QIU L T, NING S, HE X M. Mining fine-grained image-text alignment for zero-shot captioning via text-only training[EB/OL]. [2024-05-11]. http://arxiv.org/abs/2401.02347.
[102] FEI J J, WANG T, ZHANG J R, et al. Transferable decoding with visual entities for zero-shot image captioning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 3113-3123.
[103] 刘天义, 吴祖煊, 陈静静, 等. 面向视觉语言理解与生成的多模态预训练方法. 软件学报, 2023, 34(5): 2024-2034.
    LIU T Y, WU Z X, CHEN J J, et al. Multimodal pre-training method for vision-language understanding and generation. Journal of Software, 2023, 34(5): 2024-2034.
[104] GUO J Y, WANG C F, WU Y, et al. Zero-shot generative model adaptation via image-specific prompt learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 11494-11503.
[105] BALDRATI A, AGNOLUCCI L, BERTINI M, et al. Zero-shot composed image retrieval with textual inversion[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Washington D.C., USA: IEEE Press, 2023: 15292-15301.
[106] SAIN A, BHUNIA A K, CHOWDHURY P N, et al. CLIP for all things zero-shot sketch-based image retrieval, fine-grained or not[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 2765-2775.
[107] JIANG R X, LIU L B, CHEN C W. CLIP-Count: towards text-guided zero-shot object counting[C]//Proceedings of the 31st ACM International Conference on Multimedia. New York, USA: ACM Press, 2023: 4535-4545.
[108] LUO J Q, WANG Z N, WU C H, et al. Zero-shot model diagnosis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Washington D.C., USA: IEEE Press, 2023: 11631-11640.