Image Semantic Segmentation Based on Multi-level Superposition and Attention Mechanism

doi:10.19678/j.issn.1000-3428.0065940

Abstract

Complex target spaces often cause semantic segmentation models to lose small-scale targets and to produce discontinuous boundaries. To address these problems, an image semantic segmentation model based on multi-level superposition and an attention mechanism is built on the DeepLabv3+ network structure. In the encoder stage, average pooling operations at different scales form a multi-scale average pooling module; atrous (dilated) convolutions with different dilation rates form a multi-scale superposition module that enlarges the receptive field of the convolution operations and strengthens the extraction of local features; and an attention module combining channel and spatial attention suppresses uninformative features and enhances informative ones, improving segmentation accuracy for small-scale targets and target boundaries. In the decoder stage, bilinear interpolation restores the resolution of the feature map, pixel filling combined with channel-dimension information supplements the feature information, and a Softmax activation function produces the semantic segmentation prediction. Experimental results show that the model reaches a Mean Intersection over Union (MIoU) of 85.6% on the PASCAL VOC2012 public dataset and 60.8% on the SUIM public dataset. It clearly outperforms most image semantic segmentation models in overall segmentation accuracy and in small-scale image segmentation.
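The MIoU figures reported above average the per-class intersection over union between predicted and ground-truth label maps. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code. The function name `mean_iou`, the toy label values, and the convention of skipping classes absent from both maps are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over flattened per-pixel class labels.

    Classes that appear in neither map are skipped. This is one common
    convention and is an assumption, not something stated in the abstract.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of pixels")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one map.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps flattened row by row (hypothetical values).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, num_classes=3))
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is approximately 0.5.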

Keywords: semantic segmentation, small-scale target, attention mechanism, multi-scale superposition, multi-scale average pooling
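The multi-scale superposition module relies on atrous convolutions to enlarge the receptive field without adding weights. As a rough illustration, a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) input positions along each axis. The sketch below evaluates that formula; the helper name `effective_kernel_size` is hypothetical, and the rates 6, 12, and 18 are the ones used by the original DeepLabv3+ ASPP module, chosen here as an assumption because the abstract does not state the model's own rates.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in input positions, per axis) covered by a dilated kernel."""
    if kernel_size < 1 or dilation < 1:
        raise ValueError("kernel_size and dilation must be positive integers")
    # A dilation of d inserts d - 1 gaps between each pair of adjacent taps.
    return kernel_size + (kernel_size - 1) * (dilation - 1)


# Rates taken from the original DeepLabv3+ ASPP (an assumption for this model).
for rate in (1, 6, 12, 18):
    span = effective_kernel_size(3, rate)
    print(f"3x3 kernel, dilation {rate:2d} -> covers {span}x{span} input positions")
```

A 3x3 kernel therefore covers 13x13, 25x25, and 37x37 positions at rates 6, 12, and 18 while still using nine weights, which is why stacking branches with different rates captures context at several scales.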


Xiaodong SU, Shizhou LI, Jiayuan ZHAO, Hongyu LIANG, Yurong ZHANG, Hongyan XU. Image Semantic Segmentation Based on Multi-level Superposition and Attention Mechanism[J]. Computer Engineering, 2023, 49(9): 265-271, 278.



URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0065940

http://www.ecice06.com/EN/Y2023/V49/I9/265

Figures/Tables 14

Fig.1 DeepLabv3+ network structure

Fig.2 Xception network structure

Fig.3 Structure of image semantic segmentation model based on multi-level superposition and attention mechanism

Fig.4 Multi-scale average pooling module structure

Fig.5 Multi-scale superposition module structure

Fig.6 Attention mechanism module structure

Fig.7 Decoder module structure

Fig.8 Segmentation visualization results of different models on the PASCAL VOC2012 dataset

Fig.9 Segmentation visualization results of DmsefNet

Fig.10 Segmentation visualization results of different models on the SUIM dataset


