Computer Engineering (计算机工程)

A Comprehensive Review of the DETR Object Detection Algorithm Based on Transformer (基于Transformer的DETR目标检测算法研究综述)

  • Online: 2024-12-11 Published: 2024-12-11

Abstract: Convolutional neural networks (CNNs) have long dominated object detection and are widely recognized in the research community for their accuracy and scalability. The field has produced a series of representative models, such as the R-CNN family (including Fast R-CNN and Faster R-CNN) and the YOLO family. Following the success of the Transformer in natural language processing, researchers began applying it to computer vision, which led to visual backbone networks such as ViT and Swin Transformer. In 2020, the Facebook research team introduced DETR, a Transformer-based end-to-end object detection algorithm designed to reduce the prior knowledge and post-processing that object detection requires. Although DETR has shown promise, it also has shortcomings, including slow convergence, relatively low accuracy, and object queries whose physical meaning is unclear. These shortcomings have prompted many researchers to study and improve the algorithm. This paper organizes and summarizes the improvements proposed for DETR and analyzes their strengths and weaknesses. It also surveys frontier research and specialized application areas that build on DETR, and concludes with an outlook on DETR's future in computer vision.