
Computer Engineering ›› 2022, Vol. 48 ›› Issue (8): 240-248, 257. doi: 10.19678/j.issn.1000-3428.0062066

• Graphics and Image Processing •

Dual-Mode Semantic Segmentation Network with an Independent Fusion Branch

TIAN Le, WANG Huan

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2021-07-13  Revised: 2021-09-18  Published: 2022-08-09
  • About the authors: TIAN Le (1996-), male, M.S. candidate; his main research interests are computer vision, image processing, and artificial intelligence. WANG Huan (corresponding author), associate professor.
  • Funding: National Natural Science Foundation of China (61703209).


Abstract: Scene semantic segmentation based on visible and infrared dual-mode data typically outperforms single-mode segmentation in a variety of complex environments. However, achieving good segmentation results presupposes that both the visible-light camera and the infrared thermal imager produce clear images. Real scenes contain many unfavorable environmental factors, such as poor lighting and bad weather, that interfere with the visible or infrared modality to varying degrees and thus limit the performance of dual-mode semantic segmentation methods. To address this problem, an improved dual-mode semantic segmentation model is developed in this study. On top of a dual-stream network architecture, a pixel-level fusion module for the infrared and visible images is added as an independent branch network and fused at the feature level with the two existing visible and infrared branches, so that both pixel-level and feature-level dual-mode fusion are realized. In addition, spatial and channel attention mechanisms are added to the fusion branch to mine the complementary features of the two modalities at the pixel level. Experimental results show that on the two public datasets MF and FR-T, the model's mIoU is 6.5 and 0.6 percentage points higher, respectively, than that of RTFNet-50, the second-best model, and that the model retains good segmentation performance when the dual-mode images are degraded or fail.
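To make the described architecture concrete, the following is a minimal PyTorch sketch of the three-branch idea: an RGB branch, a thermal branch, and an independent fusion branch that consumes a pixel-level fusion of both inputs, applies channel and spatial attention, and is then merged with the other two branches at the feature level. The CBAM-style attention blocks, channel widths, and element-wise addition are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a dual-stream network with an independent fusion branch.
# Design details (CBAM-style attention, channel widths, additive fusion)
# are assumptions for illustration, not the paper's exact code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)  # reweight channels


class SpatialAttention(nn.Module):
    """7x7 conv over pooled maps, as in CBAM."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class FusionBranch(nn.Module):
    """Independent branch fed with a pixel-level fusion of RGB and thermal."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, out_channels, 3, padding=1),  # 3 RGB + 1 thermal channel
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

    def forward(self, rgb, thermal):
        x = self.stem(torch.cat([rgb, thermal], dim=1))  # pixel-level fusion
        return self.sa(self.ca(x))                       # mine complementary cues


class TriBranchFusion(nn.Module):
    """Feature-level fusion of the RGB, thermal, and fusion branches."""
    def __init__(self, channels=64):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, channels, 3, padding=1)
        self.thermal_enc = nn.Conv2d(1, channels, 3, padding=1)
        self.fusion_branch = FusionBranch(channels)

    def forward(self, rgb, thermal):
        f_rgb = self.rgb_enc(rgb)
        f_th = self.thermal_enc(thermal)
        f_fuse = self.fusion_branch(rgb, thermal)
        return f_rgb + f_th + f_fuse  # element-wise feature-level fusion


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 480, 640)
    thermal = torch.randn(1, 1, 480, 640)
    print(TriBranchFusion()(rgb, thermal).shape)  # torch.Size([1, 64, 480, 640])
```

In the full model these three branch features would feed an encoder-decoder segmentation head; the sketch only shows how an independent fusion branch can sit alongside the two modality branches.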

Key words: semantic segmentation, dual-mode, attention mechanism, feature fusion, adaptive fusion branch
