
Computer Engineering, 2025, Vol. 51, Issue (2): 344-355. doi: 10.19678/j.issn.1000-3428.0068781

• Graphics and Image Processing •

Research on Weakly Supervised Semantic Segmentation Algorithm Based on Modulation-Global Reasoning

LIU Zhoufeng*, LI Bingrui, YANG Ruimin, LI Chunlei, HE Yuan, DING Shumin

  1. School of Electrical and Information Engineering, Zhongyuan University of Technology, Zhengzhou 450007, Henan, China
  • Received: 2023-11-07  Issue date: 2025-02-15  Released online: 2024-04-26
  • Corresponding author: LIU Zhoufeng
  • Funding:
    National Natural Science Foundation of China (62072489); Program for Science and Technology Innovation Teams in Universities of Henan Province (21IRTSTHN013); Central Plains Science and Technology Innovation Leading Talents Program (234200510009); Henan Province Science and Technology Research Projects (222102210008, 232102211002, 232102211030)


Abstract:

Weakly supervised semantic segmentation methods based on image-level labels reduce the annotation burden by training a network with only a small amount of image-level annotation. However, existing methods based on class activation mapping (CAM) produce incomplete segmentation regions. To make the final segmentation predictions cover more of the foreground objects, this study proposes a weakly supervised semantic segmentation method based on modulation-global reasoning. In the classification network, a spatial-channel activation modulation module is first designed to extract more complete features of the target objects and to prevent the class activation maps from focusing excessively on salient regions. Next, a global inference unit module captures the global relationships between disjoint and distant regions of the feature map, which helps select more complete target objects and further enhances the features of non-salient regions. Finally, a potential object mining module lowers the false negative rate of the pseudo labels and recovers the information they miss, alleviating the incompleteness of target regions in the initial pseudo labels. In the segmentation network, the initial predictions generated by the classification network are combined with the pseudo labels, and a non-salient region mining module further generates masked pseudo labels to improve segmentation. Experimental results show that, using only image-level labels, the method achieves an accuracy of 69.5% and 69.8% on the Pascal VOC 2012 validation and test sets, respectively, and 32.8% on the MS COCO 2014 validation set. The method effectively alleviates incomplete segmentation regions and outperforms existing methods.

Key words: semantic segmentation, weakly supervised, non-saliency region, activation modulation, global inference unit
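The abstract describes the global inference unit only at a high level. Its stated purpose, relating disjoint and distant regions of a feature map, matches the Global Reasoning (GloRe) unit of Chen et al. ("Graph-Based Global Reasoning Networks", CVPR 2019), so what follows is a minimal sketch of a GloRe-style unit, assuming the paper's module follows that design; it is not the authors' implementation. Features at all L = H×W positions are projected onto N graph nodes through a projection matrix B, a graph convolution Z = ReLU((I − A_g)VW_g) mixes the node states, and B^T maps the result back to the pixels, where it is added residually. The function and parameter names (`global_reasoning_unit`, `w_phi`, `w_theta`, `a_g`, `w_g`, `w_out`) are hypothetical, and the weights are random placeholders rather than learned parameters.

```python
# Hypothetical sketch of a GloRe-style global reasoning unit (after Chen et al.,
# CVPR 2019); NOT the authors' code. Weights are random placeholders, not learned.
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def rand_matrix(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> Matrix:
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def global_reasoning_unit(x: Matrix, w_phi: Matrix, w_theta: Matrix,
                          a_g: Matrix, w_g: Matrix, w_out: Matrix) -> Matrix:
    """Apply one global reasoning step to flattened features.

    x:       L x C   features at L = H*W positions
    w_phi:   C x C'  channel reduction (a 1x1 conv in a CNN)
    w_theta: C x N   pixel-to-node projection (a 1x1 conv in a CNN)
    a_g:     N x N   node adjacency
    w_g:     C' x C' node-state update
    w_out:   C' x C  channel restoration
    """
    phi = matmul(x, w_phi)                 # L x C'
    b = transpose(matmul(x, w_theta))      # N x L projection B
    v = matmul(b, phi)                     # N x C': every node aggregates ALL positions
    n = len(a_g)
    i_minus_a = [[(1.0 if i == j else 0.0) - a_g[i][j] for j in range(n)]
                 for i in range(n)]
    z = matmul(matmul(i_minus_a, v), w_g)  # graph convolution (I - A_g) V W_g
    z = [[max(0.0, val) for val in row] for row in z]  # ReLU
    y = matmul(matmul(transpose(b), z), w_out)  # back to L x C via B^T
    return [[xi + yi for xi, yi in zip(rx, ry)] for rx, ry in zip(x, y)]  # residual


if __name__ == "__main__":
    rng = random.Random(0)
    height, width, channels, reduced, nodes = 4, 4, 6, 3, 2
    x = rand_matrix(height * width, channels, rng, scale=1.0)
    out = global_reasoning_unit(
        x,
        rand_matrix(channels, reduced, rng),
        rand_matrix(channels, nodes, rng),
        rand_matrix(nodes, nodes, rng),
        rand_matrix(reduced, reduced, rng),
        rand_matrix(reduced, channels, rng),
    )
    print(len(out), len(out[0]))  # 16 6
```

Because every position contributes to every node through B, changing the feature at one position can change the output at any other position, however far apart they are. This is how a single unit relates distant regions without stacking many local convolutions.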