
Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (8): 282-289. doi: 10.19678/j.issn.1000-3428.0067682

• Graphics and Image Processing •


Semantic Segmentation Algorithm Based on Multi-Attention Mechanism and Cross-Feature Fusion

Li MIN*, Bingjie DONG, Dong AN

  1. School of Mechanical Engineering, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China
  • Received: 2023-05-23 Online: 2024-08-15 Published: 2024-03-19
  • Contact: Li MIN
  • Supported by: General Program of the National Natural Science Foundation of China (51975130); Project of the Department of Education of Liaoning Province (LJKMZ20220915)


Abstract:

Image semantic segmentation is widely used in defect detection, medical diagnosis, and autonomous driving. To address common problems of existing semantic segmentation models, such as high training cost, poor segmentation of target contours, and mis-segmentation or missed segmentation of small targets, this study proposes an image semantic segmentation algorithm based on the DeepLabv3+ framework that combines a multi-attention mechanism with Cross-Feature Fusion (CFF). The lightweight MobileNetv2 network is selected as the backbone to shorten the training time. The dilation rates of the atrous convolutions in the Atrous Spatial Pyramid Pooling (ASPP) module are optimized to improve the extraction of multiscale semantic features and strengthen the segmentation of small targets, and a Convolutional Block Attention Module (CBAM), which combines channel and spatial attention, is introduced into the ASPP module so that the network focuses on the regions that are decisive for segmentation, thereby enhancing the extraction of target boundaries. A CFF module is designed in the encoder to aggregate the spatial and semantic information of feature maps at different levels, improving the feature-learning ability of the network. A Coordinate Attention (CA) mechanism is introduced in both the encoder and the decoder; it embeds positional information into the channel attention by decomposing global average pooling into direction-wise pooling, yielding accurate locations of the segmented targets. The experimental results show that the proposed algorithm, F3crc-DeepLabv3+, achieves mean Intersection over Union (mIoU) values of 75.06% and 73.06%, average accuracies of 84.16% and 82.05%, and precisions of 86.18% and 85.43% on the PASCAL VOC 2012 augmented dataset and the Cityscapes dataset, respectively, with training times of 10 h and 13.8 h, demonstrating favorable network performance.
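
To make the attention modules described above more concrete, the following PyTorch sketch outlines a generic CBAM block, a Coordinate Attention block, and an ASPP head with configurable dilation rates and a trailing CBAM. It is an illustrative sketch only, not the authors' implementation of F3crc-DeepLabv3+: the dilation rates (the conventional DeepLabv3+ defaults of 6, 12, 18 are used as placeholders), channel widths, reduction ratios, and the wiring of the cross-feature fusion module and decoder are not given in this abstract and are therefore assumptions.

# Illustrative sketch only (PyTorch). Generic CBAM, Coordinate Attention, and an
# ASPP head with configurable dilation rates; hyperparameters are placeholders,
# not values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP for channel attention over avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        # 7x7 convolution for spatial attention over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.max(dim=1, keepdim=True).values], dim=1)))
        return x * sa


class CoordinateAttention(nn.Module):
    """Coordinate Attention: global average pooling is decomposed into per-direction
    (height and width) pooling so that positional information is preserved."""

    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1, bias=False)
        self.bn = nn.BatchNorm2d(mid)
        self.conv_h = nn.Conv2d(mid, channels, 1, bias=False)
        self.conv_w = nn.Conv2d(mid, channels, 1, bias=False)

    def forward(self, x):
        _, _, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                       # (N, C, H, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (N, C, W, 1)
        y = F.relu(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                       # attention along height
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))   # attention along width
        return x * a_h * a_w


class ASPPWithCBAM(nn.Module):
    """ASPP head with configurable dilation rates, followed by a CBAM block."""

    def __init__(self, in_ch: int, out_ch: int = 256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 1, bias=False)] +
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False) for r in rates]
        )
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.project = nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False)
        self.cbam = CBAM(out_ch)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.cbam(self.project(torch.cat(feats, dim=1)))

Under the same assumptions, such a head could be applied as ASPPWithCBAM(320, 256) to the 320-channel high-level feature map of a MobileNetv2 backbone; the 320-channel width is MobileNetv2's final bottleneck output and is used here purely for illustration.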

Key words: semantic segmentation, DeepLabv3+ network, MobileNetv2 network, Coordinate Attention (CA), Convolutional Block Attention Module (CBAM), Cross-Feature Fusion (CFF)