
计算机工程


CAFR-YOLO:基于YOLOv8的多尺度目标检测算法

  • 发布日期:2025-11-07

CAFR-YOLO: A Multi-Scale Object Detection Algorithm Based on YOLOv8

  • Published: 2025-11-07

摘要: 针对复杂场景下多尺度目标检测任务中存在的跨层级特征交互不足、特征表达能力有限等问题,提出一种基于 YOLOv8 的改进模型 CAFR-YOLO。首先,设计了一种新颖的跨尺度特征重组流程,构建了通道注意力引导的跨尺度特征重组模块(CAFR)。该模块以特定层级为融合主干,结合尺度对齐、注意力加权融合及特征子集拼接策略,有效缓解了传统特征金字塔结构中跨层级交互不足的问题。其次,在局部层面,主干网络中引入 C2f_DCNv3 模块,利用可变形卷积的动态采样特性显著提升了模型的几何适应性;在全局层面,结合可切换空洞卷积(SAC)与 C2f 模块构建 C2f_SAConv 模块,通过动态空洞率优化了多尺度语义特征融合,二者从不同维度增强了模型对复杂场景的鲁棒性。最后,采用 SPDConv 替代传统卷积架构,通过空间-通道维度的特征重组增强了模型表征能力,同时降低了计算复杂度。实验结果表明,在 PASCAL VOC 数据集上,CAFR-YOLO 取得了 86.3% 的 mAP@0.5 和 67.2% 的 mAP@0.5:0.95,计算量与原模型相当;在 MS COCO 数据集上,mAP@0.5 和 mAP@0.5:0.95 分别提升了 3.5% 和 3.9%。与现有主流方法相比,CAFR-YOLO 在多项指标上均表现出显著优势,在保持计算效率的同时,显著提升了多尺度目标检测的精度和鲁棒性,为实时目标检测任务提供了新的解决方案。

Abstract: This paper proposes CAFR-YOLO, an improved YOLOv8-based model that addresses insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline is designed to construct the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific pyramid level as the fusion backbone and combining scale alignment, attention-weighted fusion, and feature-subset concatenation, the module alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the C2f_DCNv3 module is introduced into the backbone network, where the dynamic sampling of deformable convolution significantly improves geometric adaptability; at the global level, the C2f_SAConv module combines Switchable Atrous Convolution (SAC) with the C2f block, optimizing multi-scale semantic feature fusion through dynamic atrous-rate adjustment. Together, the two modules strengthen robustness to complex scenes from complementary perspectives. Finally, SPDConv replaces traditional convolution structures, enhancing feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results show that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset with computational cost comparable to the original model; on the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared with existing mainstream methods, CAFR-YOLO shows clear advantages across multiple metrics, substantially improving the accuracy and robustness of multi-scale object detection while maintaining computational efficiency, and offers a new solution for real-time object detection tasks.
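The abstract page carries no code; as an illustration only, the attention-weighted cross-scale fusion that CAFR describes — align every pyramid level to a reference level, gate each by channel attention, then concatenate — can be sketched in NumPy. All shapes, the nearest-neighbour alignment, and the squeeze-and-excitation-style gate below are assumptions for the sketch, not the authors' implementation (the paper's learned weights are replaced by fixed random stand-ins):

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Squeeze-and-excitation-style gate: global average pool over H, W,
    a two-layer bottleneck, then a per-channel sigmoid weight.
    The weight matrices are random stand-ins for learned parameters."""
    c = feat.shape[0]
    squeezed = feat.mean(axis=(1, 2))             # (C,) global descriptor
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, in (0, 1)
    return feat * gate[:, None, None]             # reweight channels

def fuse_cross_scale(feats, ref_idx=1):
    """Nearest-neighbour-align every level to the reference level's H, W,
    gate each by channel attention, then concatenate along channels."""
    _, rh, rw = feats[ref_idx].shape
    aligned = []
    for f in feats:
        _, h, w = f.shape
        ys = np.arange(rh) * h // rh              # row indices to sample
        xs = np.arange(rw) * w // rw              # column indices to sample
        aligned.append(channel_attention(f[:, ys][:, :, xs]))
    return np.concatenate(aligned, axis=0)

# Three pyramid levels with a shared channel count and halving resolution.
p3 = np.ones((8, 32, 32)); p4 = np.ones((8, 16, 16)); p5 = np.ones((8, 8, 8))
fused = fuse_cross_scale([p3, p4, p5], ref_idx=1)
print(fused.shape)  # (24, 16, 16): 3 levels x 8 channels at the P4 scale
```

The design point the module makes is that every level contributes to the fused map at the reference scale, rather than only adjacent levels interacting as in a plain top-down FPN pass.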
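SPDConv's core idea is a lossless space-to-depth rearrangement followed by a non-strided convolution, in place of a strided convolution that discards pixels. A minimal NumPy sketch of the rearrangement step (the convolution is omitted; shapes are assumptions, not the paper's configuration):

```python
import numpy as np

def space_to_depth(feat, block=2):
    """Lossless downsampling: move each block x block spatial patch into
    the channel axis, so (C, H, W) -> (C*block*block, H/block, W/block).
    Unlike strided convolution or pooling, no pixel is discarded."""
    c, h, w = feat.shape
    assert h % block == 0 and w % block == 0
    # Split H and W into (H/block, block) and (W/block, block) ...
    f = feat.reshape(c, h // block, block, w // block, block)
    # ... then fold both intra-block axes into the channel axis.
    return f.transpose(0, 2, 4, 1, 3).reshape(
        c * block * block, h // block, w // block)

x = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
y = space_to_depth(x, block=2)
print(y.shape)  # (8, 2, 2): 4x the channels, half the resolution
# Every input value survives the rearrangement:
print(np.sort(y.ravel()).tolist() == np.sort(x.ravel()).tolist())  # True
```

This is why the abstract can claim stronger representation at lower cost: the subsequent convolution runs on a quarter of the spatial positions while still seeing all of the original information.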
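The "dynamic atrous rate" behind C2f_SAConv comes from Switchable Atrous Convolution: the same kernel is evaluated at a small and a large dilation rate, and a switch blends the two. The 1-D NumPy sketch below shows only this mechanism (the gate is a scalar here, whereas SAC learns it; kernel and rates are illustrative assumptions):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1-D convolution whose taps are spaced `rate` apart,
    enlarging the receptive field with no extra parameters."""
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def switchable_atrous(x, kernel, s):
    """SAC-style soft switch: blend a small-rate and a large-rate branch
    with a gate s in [0, 1] (position-wise and learned in SAC)."""
    return (s * dilated_conv1d(x, kernel, rate=1)
            + (1 - s) * dilated_conv1d(x, kernel, rate=3))

x = np.zeros(9); x[4] = 1.0                    # unit impulse
k = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, k, rate=1).tolist())   # response within +/-1 of the impulse
print(dilated_conv1d(x, k, rate=3).tolist())   # response at +/-3: wider receptive field
print(switchable_atrous(x, k, s=0.5))          # blend of both receptive fields
```

The impulse responses make the trade-off visible: rate 1 captures local detail for small objects, rate 3 captures wider context for large ones, and the switch lets the network choose per input rather than committing to a single receptive field.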