
Computer Engineering ›› 2022, Vol. 48 ›› Issue (8): 240-248, 257. doi: 10.19678/j.issn.1000-3428.0062066

• Graphics and Image Processing •

Dual-Mode Semantic Segmentation Network with an Independent Fusion Branch

TIAN Le, WANG Huan

  1. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2021-07-13  Revised: 2021-09-18  Published: 2022-08-09
  • About the authors: TIAN Le (1996-), male, M.S. candidate; his main research interests are computer vision, image processing, and artificial intelligence. WANG Huan (corresponding author), associate professor.
  • Funding: National Natural Science Foundation of China (61703209).


Abstract: Scene semantic segmentation based on visible and infrared dual-mode data typically outperforms single-mode segmentation in a variety of complex environments. However, achieving good segmentation results presupposes that both the visible-light camera and the infrared thermal imager produce clear images. Real scenes contain many unfavorable environmental factors, such as poor lighting and bad weather, that interfere with the visible or infrared modality to varying degrees and thus limit the performance of dual-mode semantic segmentation methods. To address this problem, an improved dual-mode semantic segmentation model is developed in this study. On top of a dual-stream network architecture, a pixel-level fusion module for the infrared and visible images is added as an independent branch network and fused at the feature level with the two existing visible and infrared branches, so that both pixel-level and feature-level dual-mode fusion are realized. In addition, spatial and channel attention mechanisms are added to the fusion branch to mine the complementary features of the two modalities at the pixel level. Experimental results show that on the two public datasets MF and FR-T, the model's mIoU is 6.5 and 0.6 percentage points higher, respectively, than that of RTFNet-50, the second-best model, and that the model retains good segmentation performance when the dual-mode images are degraded or fail.
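To make the described architecture concrete, the following is a minimal PyTorch sketch of the three-branch idea: an RGB branch, a thermal branch, and an independent fusion branch that consumes a pixel-level fusion of both inputs, applies channel and spatial attention, and is then merged with the other two branches at the feature level. The CBAM-style attention blocks, channel widths, and element-wise addition are illustrative assumptions, not the authors' released implementation.

```python
# Sketch of a dual-stream network with an independent fusion branch.
# Design details (CBAM-style attention, channel widths, additive fusion)
# are assumptions for illustration, not the paper's exact code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.mlp(x)  # reweight channels


class SpatialAttention(nn.Module):
    """7x7 conv over pooled maps, as in CBAM."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))


class FusionBranch(nn.Module):
    """Independent branch fed with a pixel-level fusion of RGB and thermal."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(4, out_channels, 3, padding=1),  # 3 RGB + 1 thermal channel
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.ca = ChannelAttention(out_channels)
        self.sa = SpatialAttention()

    def forward(self, rgb, thermal):
        x = self.stem(torch.cat([rgb, thermal], dim=1))  # pixel-level fusion
        return self.sa(self.ca(x))                       # mine complementary cues


class TriBranchFusion(nn.Module):
    """Feature-level fusion of the RGB, thermal, and fusion branches."""
    def __init__(self, channels=64):
        super().__init__()
        self.rgb_enc = nn.Conv2d(3, channels, 3, padding=1)
        self.thermal_enc = nn.Conv2d(1, channels, 3, padding=1)
        self.fusion_branch = FusionBranch(channels)

    def forward(self, rgb, thermal):
        f_rgb = self.rgb_enc(rgb)
        f_th = self.thermal_enc(thermal)
        f_fuse = self.fusion_branch(rgb, thermal)
        return f_rgb + f_th + f_fuse  # element-wise feature-level fusion


if __name__ == "__main__":
    rgb = torch.randn(1, 3, 480, 640)
    thermal = torch.randn(1, 1, 480, 640)
    print(TriBranchFusion()(rgb, thermal).shape)  # torch.Size([1, 64, 480, 640])
```

In the full model these three branch features would feed an encoder-decoder segmentation head; the sketch only shows how an independent fusion branch can sit alongside the two modality branches.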

Key words: semantic segmentation, dual-mode, attention mechanism, feature fusion, adaptive fusion branch
