Real-Time Semantic Segmentation Network Based on Attention Mechanism and Multi-Scale Pooling

doi:10.19678/j.issn.1000-3428.0065885

Abstract:

Existing semantic segmentation algorithms achieve high accuracy, but their speed falls short of real-time requirements. To increase segmentation speed while maintaining high accuracy, a new real-time semantic segmentation network is proposed. First, a Fusion Channel Attention Module (FCAM) is designed: max pooling and average pooling capture global features, and the pooled feature maps are concatenated, convolved, and reshaped to obtain the weight of each channel. The original feature map is then matrix-multiplied with the channel weights to obtain the fused channel weights, which are multiplied element-wise with the original feature map so that the channel weights are effectively fused with it. Second, a lightweight pyramid scene parsing module is proposed. It uses multi-scale pooling to capture the features of targets at multiple scales and, relative to the original pyramid scene parsing module, reduces the number of channels in the pooled feature maps, which lowers the computational cost. The pooled feature maps are concatenated, and the input feature map is used to guide the concatenated feature map so that high-level and low-level feature maps are fused effectively. Experiments on the public Cityscapes image dataset show that the network achieves an accuracy of 74.6% on the validation set and 73.8% on the test set at a segmentation speed of 60.6 frames/s, outperforming networks such as ICNet and DFANet-A.

Key words: semantic segmentation, global feature, attention mechanism, pyramid scene parsing, multi-scale pooling
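
The two modules described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no tensor shapes, kernel sizes, activations, or pooling scales, so everything below is an assumption made only to keep the shapes consistent. The function names `fcam_sketch` and `pyramid_pool_sketch`, the `conv_weight` and `projections` matrices (stand-ins for learned 1x1 convolutions), the bin sizes, and the reading of the matrix multiplication as a (HW x C) by (C x 1) product are all hypothetical. The step in which the input feature map guides the concatenated map is omitted because the abstract does not say which operation is used.

```python
import numpy as np


def fcam_sketch(x, conv_weight):
    """One shape-consistent reading of the FCAM steps (assumed, not the paper's code).

    x:           feature map of shape (C, H, W)
    conv_weight: (C, 2C) matrix standing in for a learned 1x1 convolution
    """
    c, h, w = x.shape
    # Global max pooling and average pooling: one value per channel.
    max_feat = x.max(axis=(1, 2))                     # (C,)
    avg_feat = x.mean(axis=(1, 2))                    # (C,)
    # Concatenate the pooled descriptors, apply the 1x1 convolution, reshape.
    pooled = np.concatenate([max_feat, avg_feat])     # (2C,)
    channel_w = (conv_weight @ pooled).reshape(c, 1)  # (C, 1)
    # Matrix multiplication of the flattened feature map with the weights.
    flat = x.reshape(c, h * w)                        # (C, HW)
    fused = (flat.T @ channel_w).reshape(1, h, w)     # (1, H, W)
    # Element-wise multiplication, broadcast over the channel axis.
    return x * fused


def pyramid_pool_sketch(x, bins, projections):
    """Multi-scale average pooling with channel reduction (assumed layout).

    x:           feature map of shape (C, H, W); H and W divisible by each bin
    bins:        pooling grid sizes, e.g. (1, 2, 4)
    projections: one (C_reduced, C) matrix per bin, standing in for 1x1 convs
    """
    c, h, w = x.shape
    branches = []
    for b, proj in zip(bins, projections):
        # Average-pool to a b x b grid.
        pooled = x.reshape(c, b, h // b, b, w // b).mean(axis=(2, 4))  # (C, b, b)
        # Reduce the channel count of the pooled map.
        small = np.einsum("oc,cij->oij", proj, pooled)                 # (C_reduced, b, b)
        # Nearest-neighbour upsampling back to H x W.
        up = small.repeat(h // b, axis=1).repeat(w // b, axis=2)
        branches.append(up)
    # Concatenate the branches along the channel axis.
    return np.concatenate(branches, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
print(fcam_sketch(x, rng.standard_normal((8, 16))).shape)  # (8, 4, 4)

bins = (1, 2, 4)
projections = [rng.standard_normal((2, 8)) for _ in bins]
print(pyramid_pool_sketch(x, bins, projections).shape)     # (6, 4, 4)
```

In this sketch the pooling branches output 2 channels each instead of 8, which is where the reduction in computation would come from under these assumptions; the actual channel counts used by the network are not stated in the abstract.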

Zhuo WANG, Shaojun QU. Real-Time Semantic Segmentation Network Based on Attention Mechanism and Multi-Scale Pooling[J]. Computer Engineering, 2023, 49(10): 222-229, 238.

https://www.ecice06.com/CN/Y2023/V49/I10/222

Figures/Tables 10

Fig.1 FCAM structure

Fig.2 SPPM structure

Fig.3 Network structure of the proposed method

Fig.4 Visualization of the comparative experiments

Fig.5 Visualization of the ablation experiments

