A Vehicle Detection Algorithm Based on Frequency-Domain Enhancement and Adaptive Multi-Scale Feature Fusion

doi:10.19678/j.issn.1000-3428.0260227

Abstract

In complex road traffic scenarios, vehicle detection faces significant challenges, including large variations in object scale, frequent occlusion, and the difficulty of achieving high accuracy and real-time performance simultaneously. To address these issues, an improved vehicle detection algorithm, termed YOLOv13n-FCM, is proposed based on the YOLOv13n baseline. First, Frequency Dynamic Convolution (FDConv) is introduced into the backbone to strengthen the modeling of multi-frequency information, thereby enhancing the representation of vehicle edge structures and fine-grained details. Second, a Channel–Spatial Fusion (CSF) module is designed to jointly model channel-wise and spatial features, enabling the network to focus on salient vehicle regions while suppressing background interference in complex scenes. Finally, a Multi-Branch Fusion (MBF) module is incorporated into the detection head to perform adaptive, weighted multi-scale feature fusion, further improving detection performance for vehicles at different scales. Experimental results on two public datasets, Vehicle Detection Dataset and BITVehicle, show that YOLOv13n-FCM achieves good detection performance across various road vehicle scenarios. Specifically, on the Vehicle Detection Dataset, mAP50 reaches 60.1% and mAP50:95 reaches 42.6%, which are 2.7% and 2.6% higher, respectively, than those of the original YOLOv13n model, and 2.7% and 1.8% higher, respectively, than those of the best competing method. On BITVehicle, the proposed method also outperforms the baseline model, indicating a degree of cross-scenario adaptability. In addition, after hardware acceleration on an NVIDIA Jetson AGX Orin edge device, YOLOv13n-FCM runs at 78.5 FPS at an input resolution of 640×640. Overall, the proposed method substantially improves detection accuracy while maintaining real-time performance, demonstrating strong practicality for engineering applications.
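
The abstract does not give the internals of the MBF module, so the following is only a minimal sketch of what "adaptive, weighted multi-scale feature fusion" commonly means: each branch receives a scalar score, the scores are normalized with a softmax, and the branch feature maps are summed with those weights. The function names (`softmax`, `fuse_branches`), the use of one scalar logit per branch, and the assumption that all branches have already been resized to a common resolution are illustrative choices, not the authors' design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branches, logits):
    """Fuse same-shaped 2-D feature maps with softmax-normalized weights.

    branches: list of H x W feature maps (lists of lists of floats),
              assumed to be already resized to a common resolution.
    logits:   one scalar per branch; in a trained network these would be
              learnable parameters (hypothetical here, plain floats).
    Returns the fused H x W map and the normalized branch weights.
    """
    if len(branches) != len(logits):
        raise ValueError("need exactly one logit per branch")
    weights = softmax(logits)
    height, width = len(branches[0]), len(branches[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for weight, fmap in zip(weights, branches):
        for i in range(height):
            for j in range(width):
                fused[i][j] += weight * fmap[i][j]
    return fused, weights


# Toy example: three constant 2 x 2 maps standing in for three scales.
small = [[1.0, 1.0], [1.0, 1.0]]
medium = [[2.0, 2.0], [2.0, 2.0]]
large = [[4.0, 4.0], [4.0, 4.0]]

# Equal logits give equal weights, so the result is the plain average.
fused, weights = fuse_branches([small, medium, large], [0.0, 0.0, 0.0])
```

Raising the logit of one branch shifts the fused map toward that scale, which is the sense in which such a fusion is "adaptive"; how YOLOv13n-FCM actually computes its weights is not stated in the abstract.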


WU Jiaheng, DUAN Jiancheng, ZHANG Ronghui, CHEN Junzhou. A Vehicle Detection Algorithm Based on Frequency-Domain Enhancement and Adaptive Multi-Scale Feature Fusion[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260227.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260227

