Improved YOLOv8-based Algorithm for Instance Segmentation in Traffic Scenes

doi:10.19678/j.issn.1000-3428.0068677

Abstract

To enable assisted driving and vehicle-road coordination, traffic scenes must be detected and segmented in real time with high precision. Instance segmentation in traffic scenarios is difficult, however: complex environments, stacked (overlapping) objects, and low object resolution lead to false detections, missed detections, and missing masks. Moreover, high-precision instance segmentation studies mostly adopt two-stage models, whose large parameter counts usually prevent them from meeting real-time requirements. This paper proposes DE-YOLO, an instance segmentation algorithm based on an improved YOLOv8. To reduce interference from complex image backgrounds, an Efficient Multi-scale Attention (EMA) mechanism is introduced, whose cross-dimensional interaction distributes spatial semantic features evenly within each feature group. In the backbone network, the deformable convolution DCNv2 is combined with the C2f convolutional layer to overcome the fixed sampling grid of standard convolutions and increase flexibility. To reduce harmful gradients and improve the overall accuracy of the detector, the dynamic non-monotonic focusing mechanism Wise-IoU (WIoU) replaces the CIoU loss function for quality assessment, optimizing bounding-box localization and improving segmentation accuracy. In addition, Mixup data augmentation is enabled to enrich the training features of the dataset and improve the model's learning ability. Experimental results on the Cityscapes urban-scene dataset show that, compared with the baseline model YOLOv8n-seg, DE-YOLO improves the mask mean average precision (mAPmask) by 2.0 percentage points and APmask@0.5 by 3.2 percentage points. While improving accuracy, DE-YOLO retains good detection speed and a small parameter count, requiring 2.2-31.3 percentage points fewer parameters than similar models.
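The DCNv2 operator mentioned above computes each output as a modulated sum over a kernel whose sampling points are shifted by learned offsets: y(p) = Σ_k w_k · x(p + p_k + Δp_k) · Δm_k, where Δp_k is a fractional offset and Δm_k ∈ [0, 1] is a learned modulation scalar. The minimal pure-Python sketch below evaluates one output value for a single-channel 3×3 kernel using bilinear sampling. It is illustrative only: the function names are hypothetical, the offsets and modulation logits are passed in directly instead of being predicted by the parallel convolution branch that DCNv2 uses, and how DE-YOLO places the operator inside C2f is not shown because the abstract does not specify it.

```python
import math

def bilinear_sample(img, y, x):
    """Bilinearly sample a 2-D list `img` at fractional (y, x); out-of-bounds pixels count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < h and 0 <= xx < w:
                # Weight of each neighbour falls off linearly with distance.
                value += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return value

# Regular 3x3 sampling grid p_k relative to the output location p.
GRID = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 0), (0, 1), (1, -1), (1, 0), (1, 1)]

def modulated_deform_conv_at(img, weight, py, px, offsets, mask_logits):
    """One output of a single-channel 3x3 modulated deformable convolution (DCNv2-style sketch).

    weight: 3x3 kernel; offsets: nine (dy, dx) pairs (Delta p_k);
    mask_logits: nine raw scalars mapped to Delta m_k in (0, 1) by a sigmoid.
    """
    out = 0.0
    for k, (gy, gx) in enumerate(GRID):
        dy, dx = offsets[k]
        m = 1.0 / (1.0 + math.exp(-mask_logits[k]))  # modulation scalar
        sample = bilinear_sample(img, py + gy + dy, px + gx + dx)
        out += weight[gy + 1][gx + 1] * sample * m
    return out
```

With all offsets at zero the sampling points fall back onto the ordinary 3×3 grid, so the operator reduces to a standard convolution scaled by the modulation values. For real feature maps, a tensor implementation such as torchvision.ops.deform_conv2d, which accepts an optional mask argument, is the practical choice.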
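The WIoU loss named above can be sketched from the definitions in the Wise-IoU paper: L_IoU = 1 - IoU; WIoU v1 multiplies L_IoU by R_WIoU = exp(((x - x_gt)² + (y - y_gt)²) / (W_g² + H_g²)), where (x, y) and (x_gt, y_gt) are box centres and W_g, H_g are the width and height of the smallest enclosing box; the dynamic non-monotonic v3 further scales the loss by r = β / (δ · α^(β - δ)), where the outlier degree β = L_IoU / mean(L_IoU) uses a running mean. The pure-Python sketch below handles a single box pair. The values α = 1.9, δ = 3 and the momentum of the running mean are assumptions for illustration (the abstract does not give DE-YOLO's settings), and since there is no autograd here, the gradient detachment that WIoU applies to the enclosing-box term and to β is only noted in comments.

```python
import math

def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def wiou_v1(pred, gt):
    """Return (L_WIoUv1, L_IoU) for one predicted box and its ground-truth box."""
    l_iou = 1.0 - box_iou(pred, gt)
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gcx, gcy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Smallest enclosing box; WIoU detaches W_g and H_g from the gradient graph.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp(((pcx - gcx) ** 2 + (pcy - gcy) ** 2) / (wg ** 2 + hg ** 2))
    return r_wiou * l_iou, l_iou

def focusing_coefficient(beta, alpha=1.9, delta=3.0):
    """Non-monotonic gain r = beta / (delta * alpha ** (beta - delta)); equals 1 when beta == delta."""
    return beta / (delta * alpha ** (beta - delta))

class WIoUv3:
    """Stateful WIoU v3 sketch; alpha, delta and momentum are illustrative assumptions."""

    def __init__(self, alpha=1.9, delta=3.0, momentum=0.01):
        self.alpha, self.delta, self.momentum = alpha, delta, momentum
        self.iou_mean = 1.0  # running mean of L_IoU (a detached statistic in WIoU)

    def __call__(self, pred, gt):
        l_v1, l_iou = wiou_v1(pred, gt)
        beta = l_iou / self.iou_mean  # outlier degree: small beta means a high-quality box
        r = focusing_coefficient(beta, self.alpha, self.delta)
        # Update the running mean after using it (the exact schedule here is an assumption).
        self.iou_mean = (1 - self.momentum) * self.iou_mean + self.momentum * l_iou
        return r * l_v1
```

Because r rises and then falls as β grows, both very good boxes and outliers (large β) receive a small gradient gain, which is what lets WIoU concentrate on ordinary-quality boxes instead of the harmful gradients of low-quality samples.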
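Mixup, as introduced by Zhang et al., draws λ ~ Beta(a, a) and trains on the convex combinations λ·x_i + (1 - λ)·x_j of two inputs and λ·y_i + (1 - λ)·y_j of their one-hot labels. Detection and segmentation pipelines in the YOLO family usually adapt this by blending the two images and keeping the objects (boxes, masks) of both. The sketch below shows both forms on small nested-list images using only the standard library; the function names and the Beta parameters (0.2 for classification, 32.0 for the detection-style variant) are illustrative assumptions, since the abstract does not say which variant or mixing distribution DE-YOLO uses.

```python
import random

def blend(a, b, lam):
    """Pixel-wise lam * a + (1 - lam) * b for two equally sized 2-D images (nested lists)."""
    return [[lam * pa + (1.0 - lam) * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def mixup_classification(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Original mixup: the same lam ~ Beta(alpha, alpha) mixes the inputs and the one-hot labels."""
    lam = rng.betavariate(alpha, alpha)
    y = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(y_a, y_b)]
    return blend(x_a, x_b, lam), y, lam

def mixup_detection(img_a, objs_a, img_b, objs_b, alpha=32.0, rng=random):
    """Detection/segmentation-style variant: blend the images, keep every object from both."""
    lam = rng.betavariate(alpha, alpha)  # a large alpha concentrates lam near 0.5
    return blend(img_a, img_b, lam), list(objs_a) + list(objs_b), lam
```

For example, mixup_detection([[0.0, 0.0]], ["car"], [[10.0, 10.0]], ["person"], rng=random.Random(0)) returns a blended image together with both the "car" and "person" objects, so every instance mask remains a training target.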
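The reported metrics are mask-based: a predicted instance counts as a true positive when its mask IoU with a still-unmatched ground-truth instance reaches the threshold (0.5 for APmask@0.5). Assuming mAPmask follows the usual convention (COCO style, which the Cityscapes instance benchmark also follows), it averages AP over the IoU thresholds 0.50, 0.55, ..., 0.95. The simplified single-class, single-image sketch below (hypothetical function names) uses greedy matching in score order and all-point interpolated AP; the official COCO and Cityscapes evaluators differ in details such as interpolation points, ignore regions, and size filters, so its numbers are not comparable with the reported results.

```python
def mask_iou(m1, m2):
    """IoU of two binary masks given as equally sized 2-D lists of 0/1 values."""
    inter = sum(a & b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    union = sum(a | b for r1, r2 in zip(m1, m2) for a, b in zip(r1, r2))
    return inter / union if union else 0.0

def average_precision(preds, gts, thr):
    """All-point interpolated AP for one class in one image.

    preds: list of (score, mask); gts: list of ground-truth masks; thr: mask-IoU threshold.
    """
    if not gts:
        return 0.0
    matched, hits = set(), []
    for _, mask in sorted(preds, key=lambda p: p[0], reverse=True):
        # Greedy: match to the best-overlapping ground truth that is still unmatched.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in matched:
                iou = mask_iou(mask, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j is not None and best_iou >= thr:
            matched.add(best_j)
            hits.append(1)
        else:
            hits.append(0)
    precisions, recalls, tp = [], [], 0
    for i, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / i)
        recalls.append(tp / len(gts))
    # Area under the monotone precision envelope (all-point interpolation).
    ap, prev_recall = 0.0, 0.0
    for i, recall in enumerate(recalls):
        ap += (recall - prev_recall) * max(precisions[i:])
        prev_recall = recall
    return ap

def map_50_95(preds, gts):
    """Mean of AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.5 + 0.05 * k, 2) for k in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

A prediction whose mask overlaps its target with IoU 0.5 is a hit at the 0.5 threshold but a miss at every stricter one, which is why APmask@0.5 is typically higher than mAPmask and why the two figures in the abstract improve by different amounts.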


ZHAO Nannan, GAO Feichen. Improved YOLOv8-based Algorithm for Instance Segmentation in Traffic Scenes[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0068677.



URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068677

