Real-Time Segmentation Network of Yard Images Based on Improved SwiftNet

doi:10.19678/j.issn.1000-3428.0067995

Abstract

In a storage yard environment, real-time semantic image segmentation provides intuitive scene category information. To conserve the hardware resources of edge devices such as industrial computers, and to supply semantic category information for multi-source information fusion, this study proposes a lightweight real-time semantic segmentation network. First, an upsampling fusion module guided by spatial attention is proposed: by introducing spatial attention and a residual attention structure, a lightweight decoder is designed that restores spatial details during upsampling, suppresses redundant information, and fuses feature maps from different sources. Second, a lightweight cascaded atrous spatial pyramid module is proposed, which uses cascaded atrous convolution units to enlarge the receptive field of the network and effectively extract multi-scale features. Finally, channel splitting, channel shuffling, and channel pooling are used to reduce the computational cost of multi-scale aggregation. On the public CamVid dataset, the model achieves a mean Intersection over Union (MIoU) of 70.1% at an inference speed of 146.3 frames/s, outperforming models such as ENet and ICNet in both segmentation accuracy and inference speed; ablation experiments also confirm the effectiveness of each proposed module. On a real storage yard image dataset, the model achieves an MIoU of 93.5% at an inference speed of 123.8 frames/s, indicating that the model structure generalizes well.

Key words: real-time semantic segmentation, attention mechanism, atrous convolution, receptive field, yard image
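The MIoU figures quoted in the abstract are class-averaged overlap scores. The paper's evaluation code is not shown here, so the following is only a minimal sketch of how such a score is commonly computed from a confusion matrix. The function name `mean_iou` and the parameters `num_classes` and `ignore_index` are assumptions made for illustration, and the sketch assumes every predicted label lies in the range 0 to `num_classes - 1`.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean Intersection over Union between two integer label maps.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    # Flatten and widen the dtype so the index arithmetic below cannot overflow.
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)

    # Optionally drop pixels whose ground-truth label should not be scored.
    if ignore_index is not None:
        keep = target != ignore_index
        pred, target = pred[keep], target[keep]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat_index = target * num_classes + pred
    conf = np.bincount(flat_index, minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes that occur in the prediction or the ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    prediction = [[0, 0], [1, 1]]
    ground_truth = [[0, 1], [1, 1]]
    print(mean_iou(prediction, ground_truth, num_classes=2))
```

Averaging only over classes that actually appear is one common convention; some benchmarks instead average over a fixed class list, which can shift the reported value slightly.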

CLC number: TP391

CHEN Xiaoyu, SHEN Chen, SHEN Yue, KONG Deming. Real-Time Segmentation Network of Yard Images Based on Improved SwiftNet[J]. Computer Engineering, 2024, 50(6): 296-303.

Link to this article: https://www.ecice06.com/CN/10.19678/j.issn.1000-3428.0067995

https://www.ecice06.com/CN/Y2024/V50/I6/296

References

[1] TIAN X, WANG L, DING Q. Review of image semantic segmentation based on deep learning[J]. Journal of Software, 2019, 30(2):440-468.(in Chinese)
[2] JING Z W, GUAN H Y, PENG D F, et al. Survey of research in image semantic segmentation based on deep neural network[J]. Computer Engineering, 2020, 46(10):1-17.(in Chinese)
[3] HUANG L K, WANG M J J. Image thresholding by minimizing the measures of fuzziness[J]. Pattern Recognition, 1995, 28(1):41-51.
[4] LU J F, LIN H, PAN Z G. Adaptive region growing algorithm in medical images segmentation[J]. Journal of Computer Aided Design & Computer Graphics, 2005, 17(10):2168-2173.(in Chinese)
[5] HONG L, WAN Y F, JAIN A. Fingerprint image enhancement:algorithm and performance evaluation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(8):777-789.
[6] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2015:3431-3440.
[7] BADRINARAYANAN V, KENDALL A, CIPOLLA R. SegNet:a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(12):2481-2495.
[8] RONNEBERGER O, FISCHER P, BROX T. U-Net:convolutional networks for biomedical image segmentation[EB/OL].[2023-06-05].https://arxiv.org/abs/1505.04597.
[9] ORSIC M, KRESO I, BEVANDIC P, et al. In defense of pre-trained ImageNet architectures for real-time semantic segmentation of road-driving images[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2019:12607-12616.
[10] MA N N, ZHANG X Y, ZHENG H T, et al. ShuffleNetV2:practical guidelines for efficient CNN architecture design[C]//Proceedings of the 15th European Conference on Computer Vision. New York,USA:ACM Press,2018:122-138.
[11] PASZKE A, CHAURASIA A, KIM S, et al. ENet:a deep neural network architecture for real-time semantic segmentation[EB/OL].[2023-06-05].https://arxiv.org/abs/1606.02147.
[12] ZHAO H S, QI X J, SHEN X Y, et al. ICNet for real-time semantic segmentation on high-resolution images[C]//Proceedings of the 15th European Conference on Computer Vision. New York,USA:ACM Press,2018:418-434.
[13] YU C Q, WANG J B, PENG C, et al. BiSeNet:bilateral segmentation network for real-time semantic segmentation[C]//Proceedings of the 15th European Conference on Computer Vision. New York,USA:ACM Press,2018:334-349.
[14] YU C Q, GAO C X, WANG J B, et al. BiSeNetV2:bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11):3051-3068.
[15] LI H C, XIONG P F, FAN H Q, et al. DFANet:deep feature aggregation for real-time semantic segmentation[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2019:9522-9531.
[16] CHOLLET F. Xception:deep learning with depthwise separable convolutions[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2017:1251-1258.
[17] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2018:7132-7141.
[18] WOO S, PARK J, LEE J Y, et al. CBAM:convolutional block attention module[EB/OL].[2023-06-05].https://arxiv.org/abs/1807.06521.
[19] LI X, WANG W H, HU X L, et al. Selective kernel networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2019:510-519.
[20] WANG Q L, WU B G, ZHU P F, et al. ECA-Net:efficient channel attention for deep convolutional neural networks[C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2020:11534-11542.
[21] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2017:2881-2890.
[22] HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9):1904-1916.
[23] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab:semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4):834-848.
[24] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[EB/OL].[2023-06-05].https://arxiv.org/abs/1706.05587.
[25] CHEN L C, ZHU Y K, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the 15th European Conference on Computer Vision. New York,USA:ACM Press,2018:833-851.
[26] MA S G, CHEN Q M, HOU Z Q, et al. Semantic segmentation algorithm based on dense connection and feature enhancement[J]. Computer Engineering, 2023, 49(3):263-270.(in Chinese)
[27] WANG F, JIANG M Q, QIAN C, et al. Residual attention network for image classification[C]//Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Washington D.C.,USA:IEEE Press,2017:3156-3164.
[28] WANG P Q, CHEN P F, YUAN Y, et al. Understanding convolution for semantic segmentation[C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision. Washington D.C.,USA:IEEE Press,2018:1451-1460.
