Small Object Detection Algorithm Based on Large Kernel Adaptive Fusion

doi:10.19678/j.issn.1000-3428.0068540

Abstract

To address the problems that current single-stage object detection algorithms based on convolutional neural networks (such as the YOLO series and VFNet) face in high-altitude aerial imaging scenarios, including complex backgrounds, low detection accuracy, and feature overlap, this study proposes an end-to-end object detection algorithm called CSPENet. First, CSPNeXt, a backbone built on large-kernel depthwise convolutions, is adopted to strengthen the model's ability to capture global context. Second, a Feature Refinement Module (FRM) generates adaptive weights in both the spatial and channel dimensions, which effectively suppresses overlapping features, and a Receptive Field Attention (RFA) mechanism based on mobile networks is added in the feature fusion stage to address the parameter-sharing problem of large kernels. Finally, the Efficient Intersection over Union (EIoU) loss is used as the model's regression loss; it separates the influence factors of the aspect ratio between the predicted and ground-truth boxes, which leads to faster convergence and improved localization accuracy. Experimental results show that CSPENet improves mean Average Precision (mAP) by 4.4 percentage points over the DINO algorithm on the VisDrone-DET dataset, offering a new reference solution for research on and applications of small object detection.
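As a rough illustration of the regression loss named in the abstract, the sketch below follows the EIoU formulation as it is usually written in the literature: 1 - IoU, plus a center-distance term normalized by the squared diagonal of the smallest enclosing box, plus separate width and height terms normalized by the enclosing box's squared width and height. It is not the CSPENet implementation. The function name `eiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` constant are assumptions made for this example.

```python
def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Illustrative sketch only: names and box format are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Widths and heights of the predicted and ground-truth boxes.
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared center distance, normalized by the enclosing diagonal squared.
    dx = (px1 + px2) / 2.0 - (tx1 + tx2) / 2.0
    dy = (py1 + py2) / 2.0 - (ty1 + ty2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height are penalized as two separate terms,
    # instead of through a single coupled aspect-ratio term.
    width_term = (pw - tw) ** 2 / (cw * cw + eps)
    height_term = (ph - th) ** 2 / (ch * ch + eps)

    return 1.0 - iou + center_term + width_term + height_term
```

For example, `eiou_loss((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1 - 1/7 + 2/18, about 0.968: the two boxes have the same width and height, so only the IoU and center-distance terms contribute. A box that is centered correctly but too wide would instead be penalized mainly through the width term, which is what gives the loss a direct gradient on each side length.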

Key words: large kernel, small object, contextual information, feature refinement, adaptive fusion, receptive field

WANG Lei, HU Junhong, REN Yang. Small Object Detection Algorithm Based on Large Kernel Adaptive Fusion[J]. Computer Engineering, 2025, 51(6): 65-73.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068540

https://www.ecice06.com/EN/Y2025/V51/I6/65

Figures/Tables 13

Fig.1 Structure of CSPENet model

Fig.2 Large kernel convolution block

Fig.3 Dark structure

Fig.4 Overall structure of the FRM

Fig.5 Visualization results of FRM feature map

Fig.6 Spatial feature map of receptive field

Fig.7 Schematic diagram of the receptive field attention module

Fig.8 Visualization results of different models
