BS-YOLO: A Small Object Detection Algorithm Based on BSAM Attention Mechanism and SCConv

doi:10.19678/j.issn.1000-3428.0070159

Abstract

In recent years, deep-learning-based object detection algorithms have made significant progress in accuracy and robustness and have been widely applied in industry. In small object detection, however, current algorithms still suffer from high rates of missed detections and false positives. This study therefore proposes BS-YOLO, a YOLO-based small object detection algorithm built on SCConv and the BSAM attention mechanism. First, to address the large amount of redundant information generated in the feature extraction network, the backbone network is reconstructed with SCConv through a new module, C3SC. This module reduces redundancy in both the spatial and channel dimensions of the extracted feature maps, which improves the quality of the backbone features and, in turn, detection accuracy. Second, a new attention mechanism, BSAM, is proposed by combining CBAM with the BiFormer self-attention mechanism; it allocates weights along both the spatial and channel dimensions so that the feature map focuses on effective information and background interference is suppressed. Finally, to address the uneven distribution of hard and easy samples in small object detection, Slideloss is used to optimize the loss function, improving the effectiveness of the algorithm on small objects. Experimental results on the RSOD dataset show that BS-YOLO achieves a precision of 94.2%, a recall of 91.6%, and a mAP@0.5 of 95.9%, improvements of 3.3, 0.1, and 3.6 percentage points, respectively, over the original YOLOv5 algorithm. These results indicate that BS-YOLO can effectively improve the accuracy of small object detection and reduce the missed detection rate.

Key words: small object detection, attention mechanism, feature purification, computer vision, deep learning
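
The gains quoted in the abstract are absolute differences in percentage points, not relative percentages. The short sketch below, using only the numbers stated in the abstract, recovers the YOLOv5 baseline those gains imply and contrasts the two readings. The function names are illustrative, and the missed-detection rate is taken here as 100 minus recall, which is the usual definition but an assumption about how the paper measures it.

```python
# BS-YOLO results on RSOD (percent) and gains over YOLOv5 (percentage points),
# both copied from the abstract.
bs_yolo = {"precision": 94.2, "recall": 91.6, "mAP@0.5": 95.9}
gain_pp = {"precision": 3.3, "recall": 0.1, "mAP@0.5": 3.6}


def implied_baseline(result, gain):
    """Subtract each percentage-point gain to get the implied baseline."""
    return {k: round(result[k] - gain[k], 1) for k in result}


def relative_gain_percent(result, baseline):
    """Relative improvement in percent, for contrast with percentage points."""
    return {
        k: round(100.0 * (result[k] - baseline[k]) / baseline[k], 2)
        for k in result
    }


def miss_rate(recall_percent):
    """Missed-detection rate, assumed here to be 100 - recall (percent)."""
    return round(100.0 - recall_percent, 1)


baseline = implied_baseline(bs_yolo, gain_pp)
print(baseline)                                  # implied YOLOv5 figures
print(relative_gain_percent(bs_yolo, baseline))  # relative, not absolute
print(miss_rate(baseline["recall"]), miss_rate(bs_yolo["recall"]))
```

Under these assumptions the implied baseline is 90.9% precision, 91.5% recall, and 92.3% mAP@0.5, so the 0.1-point recall gain corresponds to the missed-detection rate falling from 8.5% to 8.4%.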

CAO Jiwei, LUO Fei, DING Weichao. BS-YOLO: A Small Object Detection Algorithm Based on BSAM Attention Mechanism and SCConv[J]. Computer Engineering, 2026, 52(3): 119-127.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0070159

https://www.ecice06.com/EN/Y2026/V52/I3/119

Figures/Tables 12

Fig.1 Structure of the YOLOv5 network

Fig.2 Improved network structure

Fig.3 Structure of C3SC

Fig.4 Structure of SCConv

Fig.5 Structure of SRU

Fig.6 Structure of CRU

Fig.7 Structure of BSAM

Fig.8 Weight distribution of Slideloss

Fig.9 Visual comparison
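
Fig.8 refers to the weight distribution of Slideloss, but the figure itself and the formula are not reproduced on this page. As a hedged illustration only, the sketch below implements the Slide weighting function as it is commonly described in the YOLO-FaceV2 work: samples are weighted by their IoU relative to a threshold `mu` (the mean IoU of all boxes), with a fixed margin of 0.1. The function name, the margin value, and the piecewise form are assumptions drawn from that description and may differ from what BS-YOLO actually uses.

```python
import math


def slide_weight(iou, mu, margin=0.1):
    """Per-sample loss weight under the (assumed) Slide weighting scheme.

    iou    : IoU between a predicted box and its ground truth, in [0, 1]
    mu     : threshold separating easy and hard samples (mean IoU, assumed)
    margin : width of the band just below mu (0.1 is an assumed default)
    """
    if iou <= mu - margin:
        # Clearly below the threshold: weight left unchanged.
        return 1.0
    if iou < mu:
        # Just below the threshold: constant boost of e^(1 - mu).
        return math.exp(1.0 - mu)
    # At or above the threshold: boost decays from e^(1 - mu) toward 1.
    return math.exp(1.0 - iou)


if __name__ == "__main__":
    mu = 0.5
    for iou in (0.2, 0.45, 0.5, 0.75, 1.0):
        print(f"IoU={iou:.2f}  weight={slide_weight(iou, mu):.3f}")
```

The intent of this shape is that samples near the threshold, which are the hardest to classify as positive or negative, receive the largest weights, while clearly easy cases keep a weight close to 1.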
