
计算机工程 (Computer Engineering)


A Review of Attention Mechanisms in Object Detection

  • Published: 2024-04-15

Abstract: Transformers have shown remarkable performance in natural language processing, inspiring researchers to explore their application to computer vision tasks. DETR treats object detection as a set prediction problem and introduces the Transformer model to solve it, thereby avoiding the proposal generation and post-processing steps of traditional methods. However, the original DETR suffered from slow training convergence and poor small-object detection. To address these issues, researchers have made improvements on several fronts, substantially enhancing DETR's performance. We present an in-depth study of DETR's basic modules and recent enhancements, including modifications to the backbone structure, query design strategies, and improvements to the attention mechanism. We also compare and analyze various detectors, evaluating their performance and network architectures; examine the limitations and challenges facing DETR; and look ahead to future directions in this field. Through this survey, we demonstrate the potential and application prospects of DETR in computer vision tasks.
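As background for the mechanism this survey centers on, a minimal NumPy sketch of scaled dot-product attention, the building block of the Transformer that DETR applies to detection, might look as follows. The toy shapes and random inputs are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D query/key/value arrays."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                            # attention-weighted sum of values

# Toy example: 2 queries attending over 3 key/value pairs of dimension 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4): one output vector per query
```

In DETR, a fixed set of learned object queries plays the role of `Q`, attending over encoded image features (`K`, `V`), which is what lets the decoder predict all boxes as a set without proposals or NMS.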