Infrared Target Detection Algorithm Based on Strip Pooling and Attention Mechanism in Street Scene

doi:10.19678/j.issn.1000-3428.0065481

Abstract

Infrared images of street scenes contain little detail and have complex backgrounds, so existing object detection models suffer from low accuracy and slow detection speed. To address these issues, a new infrared target detection algorithm based on strip pooling and an attention mechanism is proposed. A Mixed Pooling Module (MPM), which comprises strip pooling and a Pyramid Pooling Module (PPM), is used to improve the Spatial Pyramid Pooling Fast (SPPF) module. Strip pooling addresses the feature loss and feature contamination that traditional pooling operations introduce during target detection, improves feature extraction for long, narrow targets, and establishes global dependencies between isolated targets so that the model gathers more feature information. Global pooling operations along the horizontal and vertical directions are added to the attention module to obtain the position of the target across the whole feature map. This position information is embedded into the feature channels so that the algorithm localizes targets more accurately and is less affected by complex backgrounds. Batch-Free Normalization (BFN) is used to block the accumulation of estimation shift in Batch Normalization (BN), which resolves the performance degradation and further improves detection performance. Experimental results on the FLIR dataset show that the algorithm achieves an mAP (at an IoU of 0.5) of 80.7% and an F1 value of 78.0%, which are 1.9 and 2.4 percentage points higher than those of YOLOv5, respectively.

Key words: infrared target detection, strip pooling, pyramid pooling, attention mechanism, Batch-Free Normalization (BFN)
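The directional pooling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' MPM or attention module: it assumes a single-channel feature map stored as nested lists, omits the learned convolutions and the PPM branch, and uses a plain sigmoid gate as a stand-in for the attention weighting. The names `strip_pool` and `directional_gate` are hypothetical.

```python
import math


def strip_pool(fmap):
    """Average a single-channel H x W feature map along each spatial axis."""
    height, width = len(fmap), len(fmap[0])
    # Horizontal strips: one value per row, averaged over the full width.
    row_means = [sum(row) / width for row in fmap]
    # Vertical strips: one value per column, averaged over the full height.
    col_means = [sum(col) / height for col in zip(*fmap)]
    return row_means, col_means


def directional_gate(fmap):
    """Reweight each position by a sigmoid of its row and column context.

    Assumption: a fixed sigmoid replaces the learned layers of a real
    attention module, so this only shows how position information from
    both directions reaches every location.
    """
    row_means, col_means = strip_pool(fmap)
    return [
        [value / (1.0 + math.exp(-(row_means[i] + col_means[j])))
         for j, value in enumerate(row)]
        for i, row in enumerate(fmap)
    ]


fmap = [
    [0.0, 1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 9.0, 10.0, 11.0],
]
rows, cols = strip_pool(fmap)
print(rows)  # [1.5, 5.5, 9.5]
print(cols)  # [4.0, 5.0, 6.0, 7.0]
```

Because each strip spans an entire row or column, a long, narrow target contributes to a single pooled value without being mixed with unrelated regions the way a square window would mix them.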

Qianglong LI, Xinwen ZHOU, Meng'en WEI, Yangzhou GAN. Infrared Target Detection Algorithm Based on Strip Pooling and Attention Mechanism in Street Scene[J]. Computer Engineering, 2023, 49(8): 310-320.

https://www.ecice06.com/CN/Y2023/V49/I8/310

Figures and tables (13)

Fig.1 Different pooling operations

Fig.2 Overall structure of the SPPF and SSPPF modules

Fig.3 MPM module structure

Fig.4 Improved attention module structure

Fig.5 AC3 module structure

Fig.6 Strip-YOLO model structure

Table 1 Comparison of detection performance of different models on the FLIR dataset (AP and mAP at an IoU of 0.5)

Model	AP People/%	AP Bicycle/%	AP Car/%	P/%	R/%	mAP/%	F1/%	Time/ms	Params/10⁶	FLOPs/10⁹
Faster R-CNN	71.3	61.8	79.6	—	—	70.9	—	131.6	—	—
SSD	63.1	47.5	75.8	—	—	62.1	—	40.1	—	—
RefineDet	77.2	57.2	84.5	—	—	73.0	—	44.8	—	—
YOLOv3-tiny	71.9	51.9	85.3	79.5	61.9	69.7	69.6	9.1	8.9	13.3
YOLOv5s	82.6	63.4	90.4	80.6	71.2	78.8	75.6	10.8	7.1	16.0
Strip-YOLO	84.8	67.1	90.5	85.2	71.9	80.7	78.0	20.4	8.1	19.3
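The F1 and mAP columns in Table 1 can be cross-checked from the other columns. The sketch below assumes that F1 is the harmonic mean of the reported P and R and that mAP is the unweighted mean of the three per-class AP values; the page does not state these definitions, but the tabulated numbers are consistent with them. The function names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def mean_ap(per_class_ap):
    """Unweighted mean of per-class average precision values."""
    return sum(per_class_ap) / len(per_class_ap)


# Values copied from Table 1.
print(round(f1_score(80.6, 71.2), 1))         # 75.6 (YOLOv5s)
print(round(f1_score(85.2, 71.9), 1))         # 78.0 (Strip-YOLO)
print(round(mean_ap([82.6, 63.4, 90.4]), 1))  # 78.8 (YOLOv5s)
```

Recomputing from the rounded per-class values can differ from a reported mAP by 0.1, since the table entries are themselves rounded.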

Table 2 Effect of different improvements on the detection performance of YOLOv5s (AP and mAP at an IoU of 0.5; √ marks a module that is enabled, - one that is not)

SSPPF	AC3	BFN	AP People/%	AP Bicycle/%	AP Car/%	mAP/%	F1/%	Time/ms	Params/10⁶	FLOPs/10⁹
-	-	-	82.6	63.4	90.4	78.8	75.6	10.8	7.1	16.0
√	-	-	84.4	64.0	90.5	79.7	77.7	14.2	7.3	18.6
-	√	-	83.3	63.6	90.4	79.1	77.2	16.8	7.9	16.6
-	-	√	83.9	64.8	89.9	79.6	77.0	11.3	7.1	15.9
√	√	√	84.8	67.1	90.5	80.7	78.0	20.4	8.1	19.3
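To read the ablation as a trade-off, the mAP gain and the added inference time of each variant can be computed relative to the YOLOv5s baseline row. The sketch below only restates numbers from Table 2; the dictionary layout and the helper `deltas` are illustrative assumptions.

```python
# mAP (%) and inference time (ms) copied from Table 2.
BASELINE = {"mAP": 78.8, "time_ms": 10.8}
VARIANTS = {
    "SSPPF": {"mAP": 79.7, "time_ms": 14.2},
    "AC3": {"mAP": 79.1, "time_ms": 16.8},
    "BFN": {"mAP": 79.6, "time_ms": 11.3},
    "SSPPF+AC3+BFN": {"mAP": 80.7, "time_ms": 20.4},
}


def deltas(variant):
    """Difference between a variant and the baseline, rounded to 0.1."""
    return {key: round(variant[key] - BASELINE[key], 1) for key in BASELINE}


for name, row in VARIANTS.items():
    print(name, deltas(row))
```

On these figures BFN adds 0.8 percentage points of mAP for 0.5 ms, while the full model adds 1.9 points for 9.6 ms.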

Table 3 Effect of different improvements on per-class precision (P) and recall (R) (unit: %; √ marks a module that is enabled, - one that is not)

SSPPF	AC3	BFN	P People	P Bicycle	P Car	Mean P	R People	R Bicycle	R Car	Mean R
-	-	-	84.9	74.9	82.1	80.6	73.0	55.3	85.4	71.2
√	-	-	86.9	78.0	84.5	83.2	73.7	58.9	85.6	72.8
-	√	-	85.8	78.9	85.6	83.4	73.6	56.6	85.0	71.8
-	-	√	84.9	75.3	83.2	81.3	74.7	59.1	85.3	73.1
√	√	√	86.2	82.2	87.1	85.2	73.0	58.9	83.7	71.9

Fig.7 Dynamic impact of different improvements on model mAP

Fig.8 Heat maps of the feature maps used to detect targets of different sizes

Fig.9 Comparison of loss values of different improvements on the FLIR validation set

Fig.10 Partial detection results of the two models

