
计算机工程 (Computer Engineering)



RGB-T Fusion Network for Semantic Segmentation in Low-Light Scenarios

  • Published: 2025-09-29


Abstract: RGB-T (RGB-Thermal) semantic segmentation is a solution for reliable semantic scene understanding under poor illumination or in complete darkness. By capturing the infrared radiation emitted by objects, thermal imaging preserves stable edge information in low-light conditions and thereby compensates for the texture details that RGB images lose in such environments. However, existing RGB-T semantic segmentation methods fail to fully exploit the complementary information between modalities during multi-level feature interaction, which leads to inaccurate predictions. To address this issue, this work constructs CMFANet (Cross-Modal Fusion Attention Network). First, a cross-modal fusion module is designed to establish complementary relationships between RGB and thermal features. Second, considering the importance of multi-dimensional and multi-scale information, a multi-dimensional attention module is introduced in the encoder to strengthen deep feature extraction, and a multi-scale feature aggregation module is added in the decoder to help the model capture texture details and contour information. Finally, the decoder combines wavelet transforms with convolution so that their strengths complement each other, improving segmentation accuracy. On the MFNet dataset, CMFANet achieves 73.8% mean accuracy (mAcc) and 59.0% mean intersection-over-union (mIoU); on the PST900 dataset, it attains 90.71% mAcc and 85.15% mIoU. Compared with existing state-of-the-art methods, the model performs particularly well on key targets, such as cars, persons, and bikes in MFNet and survivors and backpacks in PST900. Visualization results confirm that it effectively fuses RGB and thermal modality information, recovers texture details and object contours in low-light scenes, and exhibits better segmentation performance and good generalization ability.
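
The cross-modal fusion module described above is meant to establish complementary relationships between RGB and thermal features. The paper's exact design is not reproduced here, so the following is only a minimal PyTorch-style sketch under an assumed structure: a hypothetical CrossModalFusion block in which each modality is re-weighted by channel attention computed from the other modality before the two streams are merged.

```python
# Minimal illustrative sketch of RGB-T cross-modal fusion. This is NOT the
# paper's CMFANet module; the structure (per-modality channel attention,
# cross-weighting, summation, 1x1 mixing) is an assumption made for illustration.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):  # hypothetical name
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()

        def gate() -> nn.Sequential:
            # Squeeze-and-excite style channel attention: global pool + small MLP.
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        self.rgb_gate = gate()       # attention computed from the RGB stream
        self.thermal_gate = gate()   # attention computed from the thermal stream
        self.mix = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        # Cross-weighting: each modality is emphasized by cues from the other, so
        # thermal evidence can highlight RGB channels degraded by low light.
        rgb_enhanced = rgb_feat * self.thermal_gate(thermal_feat)
        thermal_enhanced = thermal_feat * self.rgb_gate(rgb_feat)
        return self.mix(rgb_enhanced + thermal_enhanced)


if __name__ == "__main__":
    fuse = CrossModalFusion(channels=64)
    rgb = torch.randn(1, 64, 60, 80)      # encoder feature map from the RGB branch
    thermal = torch.randn(1, 64, 60, 80)  # encoder feature map from the thermal branch
    print(fuse(rgb, thermal).shape)       # torch.Size([1, 64, 60, 80])
```

In a two-stream encoder, such a block would typically be applied at several stages, which matches the abstract's emphasis on multi-level interaction between the modalities.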
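
The abstract also states that the decoder combines wavelet transforms with convolution so that their strengths complement each other. As a rough illustration of the property such a design exploits, the sketch below (a hypothetical haar_dwt2 helper, not the paper's decoder) performs a single-level 2D Haar decomposition that splits a feature map into a low-frequency approximation and three high-frequency detail bands carrying edge and contour information.

```python
# Single-level 2D Haar wavelet decomposition, shown only to illustrate how a
# wavelet transform separates coarse structure from edge details; the actual
# wavelet/convolution decoder of the paper is not reproduced here.
import torch


def haar_dwt2(x: torch.Tensor):
    """x: (N, C, H, W) with even H and W; returns (LL, LH, HL, HH), each (N, C, H/2, W/2)."""
    a = x[..., 0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # low-frequency approximation (coarse structure)
    lh = (a - b + c - d) / 2  # detail across columns (vertical edges)
    hl = (a + b - c - d) / 2  # detail across rows (horizontal edges)
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


if __name__ == "__main__":
    feat = torch.randn(1, 64, 120, 160)
    ll, lh, hl, hh = haar_dwt2(feat)
    print(ll.shape)  # torch.Size([1, 64, 60, 80])
```

Feeding the high-frequency bands to convolutional layers alongside the low-frequency band is one common way to let a decoder recover sharp contours; this is consistent with, though not necessarily identical to, the design described in the abstract.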
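
The reported results use mean accuracy (mAcc) and mean intersection-over-union (mIoU). These are standard semantic-segmentation metrics; the sketch below shows one common way to compute them from a class confusion matrix (the function name and the toy matrix are illustrative, not taken from the paper).

```python
# Standard mAcc / mIoU computation from a pixel-level confusion matrix.
import numpy as np


def segmentation_metrics(conf: np.ndarray):
    """conf[i, j] = number of pixels whose ground-truth class is i and predicted class is j."""
    tp = np.diag(conf).astype(float)
    gt = conf.sum(axis=1).astype(float)    # ground-truth pixels per class
    pred = conf.sum(axis=0).astype(float)  # predicted pixels per class
    acc = tp / np.maximum(gt, 1)           # per-class accuracy (recall)
    iou = tp / np.maximum(gt + pred - tp, 1)
    return acc.mean(), iou.mean()          # mAcc, mIoU


if __name__ == "__main__":
    # Toy 3-class confusion matrix used only to exercise the formulas.
    conf = np.array([[50, 3, 2],
                     [4, 40, 6],
                     [1, 5, 44]])
    macc, miou = segmentation_metrics(conf)
    print(f"mAcc={macc:.3f}, mIoU={miou:.3f}")
```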