
计算机工程


CAFR-YOLO:基于YOLOv8的多尺度目标检测算法

  • 发布日期:2025-11-07

CAFR-YOLO: A Multi-Scale Object Detection Algorithm Based on YOLOv8

  • Published: 2025-11-07

摘要: 针对复杂场景下多尺度目标检测任务中存在的跨层级特征交互不足、特征表达能力有限等问题,提出一种基于 YOLOv8 的改进模型 CAFR-YOLO。首先,设计了一种新颖的跨尺度特征重组流程,构建了通道注意力引导的跨尺度特征重组模块(CAFR)。该模块以特定层级为融合主干,结合尺度对齐、注意力加权融合及特征子集拼接策略,有效缓解了传统特征金字塔结构中跨层级交互不足的问题。其次,在局部层面,主干网络中引入 C2f_DCNv3 模块,利用可变形卷积的动态采样特性显著提升了模型的几何适应性;在全局层面,结合可切换空洞卷积(SAC)与 C2f 模块构建 C2f_SAConv 模块,通过动态空洞率优化了多尺度语义特征融合,二者从不同维度增强了模型对复杂场景的鲁棒性。最后,采用 SPDConv 替代传统卷积架构,通过空间-通道维度的特征重组增强了模型表征能力,同时降低了计算复杂度。实验结果表明,在 PASCAL VOC 数据集上,CAFR-YOLO 取得了 86.3% 的 mAP@0.5 和 67.2% 的 mAP@0.5:0.95,计算量与原模型相当;在 MS COCO 数据集上,mAP@0.5 和 mAP@0.5:0.95 分别提升了 3.5% 和 3.9%。与现有主流方法相比,CAFR-YOLO 在多项指标上均表现出显著优势,在保持计算效率的同时,显著提升了多尺度目标检测的精度和鲁棒性,为实时目标检测任务提供了新的解决方案。

Abstract: This paper proposes CAFR-YOLO, an improved YOLOv8-based model that addresses insufficient cross-level feature interaction and limited feature representation capability in multi-scale object detection under complex scenes. First, a novel cross-scale feature reorganization pipeline is designed to construct the Channel Attention-guided Feature Reorganization (CAFR) module. By using a specific pyramid level as the fusion backbone and combining scale alignment, attention-weighted fusion, and feature-subset concatenation, the module alleviates the insufficient cross-level interaction of traditional feature pyramid structures. Second, at the local level, the C2f_DCNv3 module is introduced into the backbone network, where the dynamic sampling of deformable convolution significantly improves geometric adaptability; at the global level, the C2f_SAConv module combines Switchable Atrous Convolution (SAC) with the C2f block, optimizing multi-scale semantic feature fusion through dynamic atrous-rate adjustment. Together, the two modules strengthen robustness to complex scenes from complementary perspectives. Finally, SPDConv replaces traditional convolution structures, enhancing feature representation through spatial-channel reorganization while reducing computational complexity. Experimental results show that CAFR-YOLO achieves 86.3% mAP@0.5 and 67.2% mAP@0.5:0.95 on the PASCAL VOC dataset with computational cost comparable to the original model; on the MS COCO dataset, it improves mAP@0.5 and mAP@0.5:0.95 by 3.5% and 3.9%, respectively. Compared with existing mainstream methods, CAFR-YOLO shows clear advantages across multiple metrics, substantially improving the accuracy and robustness of multi-scale object detection while maintaining computational efficiency, and offers a new solution for real-time object detection tasks.
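The abstract page carries no code; as an illustration only, the attention-weighted cross-scale fusion that CAFR describes — align every pyramid level to a reference level, gate each by channel attention, then concatenate — can be sketched in NumPy. All shapes, the nearest-neighbour alignment, and the squeeze-and-excitation-style gate below are assumptions for the sketch, not the authors' implementation (the paper's learned weights are replaced by fixed random stand-ins):

```python
import numpy as np

def channel_attention(feat, reduction=4):
    """Squeeze-and-excitation-style gate: global average pool over H, W,
    a two-layer bottleneck, then a per-channel sigmoid weight.
    The weight matrices are random stand-ins for learned parameters."""
    c = feat.shape[0]
    squeezed = feat.mean(axis=(1, 2))             # (C,) global descriptor
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    hidden = np.maximum(w1 @ squeezed, 0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid, in (0, 1)
    return feat * gate[:, None, None]             # reweight channels

def fuse_cross_scale(feats, ref_idx=1):
    """Nearest-neighbour-align every level to the reference level's H, W,
    gate each by channel attention, then concatenate along channels."""
    _, rh, rw = feats[ref_idx].shape
    aligned = []
    for f in feats:
        _, h, w = f.shape
        ys = np.arange(rh) * h // rh              # row indices to sample
        xs = np.arange(rw) * w // rw              # column indices to sample
        aligned.append(channel_attention(f[:, ys][:, :, xs]))
    return np.concatenate(aligned, axis=0)

# Three pyramid levels with a shared channel count and halving resolution.
p3 = np.ones((8, 32, 32)); p4 = np.ones((8, 16, 16)); p5 = np.ones((8, 8, 8))
fused = fuse_cross_scale([p3, p4, p5], ref_idx=1)
print(fused.shape)  # (24, 16, 16): 3 levels x 8 channels at the P4 scale
```

The design point the module makes is that every level contributes to the fused map at the reference scale, rather than only adjacent levels interacting as in a plain top-down FPN pass.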
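SPDConv's core idea is a lossless space-to-depth rearrangement followed by a non-strided convolution, in place of a strided convolution that discards pixels. A minimal NumPy sketch of the rearrangement step (the convolution is omitted; shapes are assumptions, not the paper's configuration):

```python
import numpy as np

def space_to_depth(feat, block=2):
    """Lossless downsampling: move each block x block spatial patch into
    the channel axis, so (C, H, W) -> (C*block*block, H/block, W/block).
    Unlike strided convolution or pooling, no pixel is discarded."""
    c, h, w = feat.shape
    assert h % block == 0 and w % block == 0
    # Split H and W into (H/block, block) and (W/block, block) ...
    f = feat.reshape(c, h // block, block, w // block, block)
    # ... then fold both intra-block axes into the channel axis.
    return f.transpose(0, 2, 4, 1, 3).reshape(
        c * block * block, h // block, w // block)

x = np.arange(2 * 4 * 4).reshape(2, 4, 4).astype(float)
y = space_to_depth(x, block=2)
print(y.shape)  # (8, 2, 2): 4x the channels, half the resolution
# Every input value survives the rearrangement:
print(np.sort(y.ravel()).tolist() == np.sort(x.ravel()).tolist())  # True
```

This is why the abstract can claim stronger representation at lower cost: the subsequent convolution runs on a quarter of the spatial positions while still seeing all of the original information.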
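The "dynamic atrous rate" behind C2f_SAConv comes from Switchable Atrous Convolution: the same kernel is evaluated at a small and a large dilation rate, and a switch blends the two. The 1-D NumPy sketch below shows only this mechanism (the gate is a scalar here, whereas SAC learns it; kernel and rates are illustrative assumptions):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1-D convolution whose taps are spaced `rate` apart,
    enlarging the receptive field with no extra parameters."""
    k = len(kernel)
    pad = rate * (k - 1) // 2
    xp = np.pad(x, pad)
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

def switchable_atrous(x, kernel, s):
    """SAC-style soft switch: blend a small-rate and a large-rate branch
    with a gate s in [0, 1] (position-wise and learned in SAC)."""
    return (s * dilated_conv1d(x, kernel, rate=1)
            + (1 - s) * dilated_conv1d(x, kernel, rate=3))

x = np.zeros(9); x[4] = 1.0                    # unit impulse
k = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, k, rate=1).tolist())   # response within +/-1 of the impulse
print(dilated_conv1d(x, k, rate=3).tolist())   # response at +/-3: wider receptive field
print(switchable_atrous(x, k, s=0.5))          # blend of both receptive fields
```

The impulse responses make the trade-off visible: rate 1 captures local detail for small objects, rate 3 captures wider context for large ones, and the switch lets the network choose per input rather than committing to a single receptive field.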