A Box Regression Algorithm Based on Key Point Distance of Bounding Box

doi:10.19678/j.issn.1000-3428.0065328

Abstract

To address the low detection accuracy and slow convergence of current box regression methods based on Intersection-over-Union (IoU) in practical applications, a box regression method based on Key point distance based Intersection-over-Union (KIoU) is proposed. Starting from geometry, the method takes three vertices and the center point of a rectangle as key points and uses the distances between corresponding points to measure the differences in position and shape between the predicted box and the ground-truth box. A new loss function based on the key point IoU loss is constructed; it computes the difference between the key point IoU of the predicted and ground-truth boxes in the actual case and in the ideal case. The distances between corresponding key points serve as a penalty term on IoU to accelerate model convergence, and the efficiency and accuracy of key point information for localization are leveraged to improve detection accuracy. With the single-stage detector Single Shot MultiBox Detector (SSD) and the two-stage detector Faster Region-based Convolutional Neural Network (Faster R-CNN) as benchmark algorithms, KIoU is compared with IoU, Generalized IoU (GIoU), Distance IoU (DIoU), and Complete IoU (CIoU) on the PASCAL VOC and COCO datasets. In detection accuracy, KIoU on Faster R-CNN improves on IoU by 2.91% and on DIoU, currently a strong performer, by 0.11%; on SSD it improves on IoU and DIoU by 0.96% and 0.06%, respectively. In visual results, KIoU localizes targets more accurately and, to some extent, reduces missed detections.

Key words: object detection, bounding box regression, Intersection-over-Union (IoU), Key point distance based Intersection-over-Union (KIoU), key corresponding point
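
The idea summarized in the abstract (IoU plus a penalty built from distances between corresponding key points) can be illustrated with a minimal sketch. The abstract does not give the exact KIoU formula, so everything below is an assumption made for illustration: the choice of which three vertices are used, the averaging of squared distances, and the DIoU-style normalization by the squared diagonal of the smallest enclosing box. The function names `key_points`, `keypoint_penalty`, and `keypoint_iou_loss` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2, y1 < y2


def iou(a: Box, b: Box) -> float:
    """Standard Intersection-over-Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def key_points(box: Box) -> List[Tuple[float, float]]:
    """Three vertices plus the center (ASSUMPTION: which three vertices is our choice)."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), ((x1 + x2) / 2, (y1 + y2) / 2)]


def keypoint_penalty(pred: Box, gt: Box) -> float:
    """Mean squared key point distance, normalized by the squared diagonal
    of the smallest enclosing box (ASSUMPTION: DIoU-style normalization)."""
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    diag2 = cw * cw + ch * ch
    if diag2 == 0:
        return 0.0
    d2 = [(px - gx) ** 2 + (py - gy) ** 2
          for (px, py), (gx, gy) in zip(key_points(pred), key_points(gt))]
    return sum(d2) / (len(d2) * diag2)


def keypoint_iou_loss(pred: Box, gt: Box) -> float:
    """Illustrative loss: 1 - IoU + key point distance penalty (not the paper's formula)."""
    return 1.0 - iou(pred, gt) + keypoint_penalty(pred, gt)


print(keypoint_iou_loss((0, 0, 2, 2), (1, 1, 3, 3)))  # overlapping boxes
print(keypoint_iou_loss((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

In this sketch the penalty stays informative even when the boxes do not overlap and IoU is zero, which is one plausible reading of why a distance term could speed up convergence.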

Zhiyong NIE, Yuwei YIN, Jiaxin TANG, Zhigang TU. A Box Regression Algorithm Based on Key Point Distance of Bounding Box[J]. Computer Engineering, 2023, 49(7): 65-75.

https://www.ecice06.com/CN/Y2023/V49/I7/65

Figures/Tables (9)

IoU variant	mAP	Relative improvement (%)
SSD+IoU	77.3	baseline
SSD+GIoU	77.4	0.129
SSD+DIoU	77.5	0.258
SSD+CIoU	77.8	0.646
SSD+KIoU	78.1	1.035
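
The last column of the table above appears to be the percentage gain of each variant over the SSD+IoU baseline. A small sketch, assuming that reading, reproduces the column from the mAP values; the helper name `relative_improvement` is ours, and the printed values may differ from the table in the third decimal because the table's rounding is not fully consistent.

```python
def relative_improvement(baseline: float, score: float) -> float:
    """Percentage gain of `score` over `baseline`."""
    return (score - baseline) / baseline * 100.0


# mAP values copied from the SSD table above
ssd_map = {"IoU": 77.3, "GIoU": 77.4, "DIoU": 77.5, "CIoU": 77.8, "KIoU": 78.1}

gains = {
    name: relative_improvement(ssd_map["IoU"], value)
    for name, value in ssd_map.items()
    if name != "IoU"
}

for name, gain in gains.items():
    print(f"SSD+{name}: {gain:.3f}%")
```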

IoU loss	AP50	AP75
SSD+$L_{\mathrm{IoU}}$ (baseline)	51.01	54.74
SSD+$L_{\mathrm{GIoU}}$	51.06	55.48
Relative improvement (%)	0.10	1.35
SSD+$L_{\mathrm{DIoU}}$	51.31	55.71
Relative improvement (%)	0.59	1.77
SSD+$L_{\mathrm{CIoU}}$	51.44	56.16
Relative improvement (%)	0.84	2.59
SSD+$L_{\mathrm{KIoU}}$	51.50	56.21
Relative improvement (%)	0.96	2.69

Loss	AP	AP75	APs	APm	APl
$L_{\mathrm{IoU}}$	37.93	40.79	21.58	40.82	50.14
$L_{\mathrm{GIoU}}$	38.02	41.11	21.45	41.06	50.21
Relative improvement (%)	0.24	0.78	-0.60	0.59	0.14
$L_{\mathrm{DIoU}}$	38.09	41.23	21.66	41.18	50.32
Relative improvement (%)	0.42	0.76	0.31	0.88	0.36
$L_{\mathrm{CIoU}}$	38.65	41.96	21.32	41.83	51.51
Relative improvement (%)	1.90	2.87	-1.20	2.47	2.73
$L_{\mathrm{KIoU}}$	38.76	42.04	21.56	41.92	52.21
Relative improvement (%)	2.19	3.06	-0.09	2.69	4.13
