Computer Engineering ›› 2023, Vol. 49 ›› Issue (9): 265-271, 278. doi: 10.19678/j.issn.1000-3428.0065940

• Graphics and Image Processing •

Image Semantic Segmentation Based on Multi-level Superposition and Attention Mechanism

Xiaodong SU1,2, Shizhou LI1,2,*, Jiayuan ZHAO1,2, Hongyu LIANG1,2, Yurong ZHANG1,2, Hongyan XU1,2   

  1. School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150028, China
  2. Heilongjiang Key Laboratory of Electronic Commerce and Intelligent Information Processing, Harbin 150028, China
  • Received: 2022-10-08 Online: 2023-09-15 Published: 2023-01-03
  • Contact: Shizhou LI

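  • About the authors:

    Xiaodong SU (born 1965), male, professor; main research interest: computer vision

    Jiayuan ZHAO, master's student

    Hongyu LIANG, master's degree

    Yurong ZHANG, master's student

    Hongyan XU, master's student

  • Funding:
    Natural Science Foundation of Heilongjiang Province (LH2022F035); Graduate Innovative Research Project of Harbin University of Commerce (YJSCX2022-743HSD); 2022 Teacher Innovation Support Program of Harbin University of Commerce (XL0068)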

Abstract:

To address common problems such as the loss of small-scale targets and discontinuous boundary segmentation caused by the complexity of the target space, a semantic image segmentation model based on multi-level superposition and an attention mechanism is established, drawing on the DeepLabv3+ network structure. In the encoder stage, average pooling operations at different scales are used to construct a multi-scale average pooling module; atrous (dilated) convolutions with different dilation rates form a multi-scale superposition module that expands the receptive field of the convolution operations and enhances the ability to capture local features; and an attention module combining channel attention and spatial attention suppresses meaningless features and enhances meaningful ones, improving the segmentation accuracy for small-scale targets and target boundaries. In the decoder stage, bilinear interpolation restores the resolution of the feature map, and pixel filling guided by channel-dimension information supplements the feature information. A Softmax activation function produces the semantic segmentation output prediction. Experimental results show that the model reaches a Mean Intersection over Union (MIoU) of 85.6% on the PASCAL VOC2012 public dataset and 60.8% on the SUIM public dataset. It significantly outperforms most image semantic segmentation models in overall segmentation accuracy and in small-scale image segmentation performance.

Key words: semantic segmentation, small-scale target, attention mechanism, multi-scale superposition, multi-scale average pooling
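
The abstract reports results as Mean Intersection over Union (MIoU). For readers unfamiliar with the metric, the following is a minimal sketch of how MIoU is commonly computed from predicted and ground-truth label maps. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the choice to skip classes that appear in neither the prediction nor the ground truth are all assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union for two flat sequences of integer class labels.

    Illustrative sketch only; the averaging convention (ignoring classes that
    are absent from both pred and target) is an assumption, not taken from the paper.
    """
    # confusion[t][p] counts pixels whose true class is t and predicted class is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]                                       # correctly labeled as c
        fp = sum(confusion[t][c] for t in range(num_classes)) - tp  # predicted c, truly another class
        fn = sum(confusion[c]) - tp                                 # truly c, predicted another class
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]  # toy ground-truth labels for six pixels
    pred = [0, 1, 1, 1, 2, 0]    # toy predicted labels
    print(mean_iou(pred, target, num_classes=3))
```

In this toy example the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. Because the score is averaged over classes rather than pixels, a class that occupies few pixels, such as a small-scale target, counts as much as a large one.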
