Computer Engineering ›› 2026, Vol. 52 ›› Issue (3): 141-151. doi: 10.19678/j.issn.1000-3428.0070093

• Computer Vision and Image Processing •

Few-shot Object Detection Method Based on Query Guidance and Semantic Enhancement

XIE Binhong, SHI Yufei*, ZHANG Rui, ZHANG Yingjun

  1. School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China
  • Received: 2024-07-09 Revised: 2024-08-29 Online: 2026-03-15 Published: 2026-03-10
  • Contact: SHI Yufei

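  • About the authors:

    XIE Binhong (CCF member), male, professor, M.S.; main research interests: intelligent software engineering and machine learning

    SHI Yufei (corresponding author), master's student

    ZHANG Rui, associate professor, Ph.D.

    ZHANG Yingjun, professor-level senior engineer, M.S.

  • Supported by:
    General Program of the Shanxi Province Basic Research Program (20210302123216); Lüliang City Key R&D Project for Introducing High-Level Scientific and Technological Talent (2022RC08); Shanxi Province Industry-Education Integration Postgraduate Joint Training Demonstration Base Project (2022JD11)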

Abstract:

This study proposes a Few-Shot Object Detection (FSOD) method based on a query-guided strategy and a semantic enhancement mechanism. It targets three problems in the meta-learning paradigm: prototypes that lack key information, insufficient adaptation to query images, and a detector that is sensitive to novel-class variance and therefore misclassifies. The Query Guidance Module (QGM) learns the correlation between query and support features and conditionally couples query-aware information into the support features, so that a specific and representative prototype is generated for each query image. The Visual Semantic Enhancement Module (VSEM) distils, from textual semantic information, knowledge that matches the visual features of novel classes and adaptively enhances those features, improving their discriminability and mitigating variance sensitivity for better classification. In addition, the classification and regression tasks are decoupled, and semantic enhancement is performed on the classification branch to help the model understand target semantics. Experimental results show that, compared with the currently known state-of-the-art method SMPCCNet, the proposed approach improves novel-class Average Precision (nAP) on the PASCAL VOC dataset by 2.2 percentage points on average and Average Precision (AP) on the MS COCO dataset by 1.0 percentage points on average, validating its effectiveness.

Key words: object detection, few-shot learning, meta learning, query-guided prototype, semantic enhancement
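
As a rough illustration of the query-guided prototype idea summarized in the abstract, the sketch below contrasts a query-agnostic prototype (the plain mean of the support features) with one whose support features are weighted by their similarity to the query. It is not the paper's QGM: the scaled dot-product attention with softmax, the function names, and the toy two-dimensional vectors are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_prototype(support):
    """Query-agnostic baseline: average the support feature vectors."""
    n = len(support)
    return [sum(col) / n for col in zip(*support)]


def query_guided_prototype(query, support):
    """Hypothetical query-aware prototype.

    Each support vector is weighted by its scaled dot-product similarity
    to the query, so support features that correlate with the query
    contribute more to the prototype.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in support])
    proto = [
        sum(w * s[i] for w, s in zip(weights, support))
        for i in range(len(query))
    ]
    return proto, weights


# Toy data (assumed): three support features and one query feature.
support = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]

proto, weights = query_guided_prototype(query, support)
baseline = mean_prototype(support)
print("attention weights:", weights)
print("query-guided prototype:", proto)
print("mean prototype:", baseline)
```

In this toy setting the support vectors aligned with the query receive larger weights, so the prototype shifts toward the query's dominant dimension relative to the plain mean. A learned module would replace the fixed dot product with trainable projections.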
