[1] 包俊, 刘宏哲, 褚文博. 环视鱼眼图像处理深度学习研究进展[J]. 中国图象图形学报, 2021, 26(12): 2778-2799.
BAO J, LIU H Z, CHU W B. Research progress of fisheye image processing based on deep learning[J]. Journal of Image and Graphics, 2021, 26(12): 2778-2799. (in Chinese)
[2] 江云峰, 罗敏, 何红星, 等. 鱼眼镜头的研究进展及应用[J]. 红外技术, 2023, 45(4): 342-351.
JIANG Y F, LUO M, HE H X, et al. Research progress and application of fisheye lenses[J]. Infrared Technology, 2023, 45(4): 342-351. (in Chinese)
[3] ZHANG Y C, ZHOU S J. Study on methods for fish-eye image correction based on spherical projection model[C]//International Conference on Frontiers of Manufacturing Science and Measuring Technology (FMSMT). Taiyuan: Atlantis Press, 2017: 848-854.
[4] LEE M, KIM H, PAIK J. Correction of barrel distortion in fisheye lens images using image-based estimation of distortion parameters[J]. IEEE Access, 2019, 7: 45723-45733.
[5] DENG L Y, YANG M, QIAN Y Q, et al. CNN based semantic segmentation for urban traffic scenes using fisheye camera[C]//IEEE Intelligent Vehicles Symposium (IV). Redondo Beach: IEEE, 2017: 231-236.
[6] DENG L Y, YANG M, LI H, et al. Restricted deformable convolution-based road scene semantic segmentation using surround view cameras[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 21(10): 4350-4362.
[7] JIANG J Z, XU C, LIU H Z, et al. DSA: deformable segmentation attention for multi-scale fisheye image segmentation[J]. Electronics, 2023, 12(19): 4059.
[8] 刘岗顶, 王玲军, 张玉常, 等. 基于改良MaskFormer的鱼眼相机天空图像分割方法[J]. 科学技术创新, 2025, (16): 223-228.
LIU G D, WANG L J, ZHANG Y C, et al. Sky image segmentation for fisheye cameras using an improved MaskFormer[J]. Scientific and Technological Innovation, 2025, (16): 223-228. (in Chinese)
[9] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2018: 7132-7141.
[10] 谭镭. 面向地面无人驾驶的语义分割方法及模型实现[D]. 南京: 南京理工大学, 2020.
TAN L. Semantic segmentation approaches and model implementation for ground-based autonomous driving[D]. Nanjing: Nanjing University of Science and Technology, 2020. (in Chinese)
[11] PAUL S, PATTERSON Z, BOUGUILA N. FishSegSSL: a semi-supervised semantic segmentation framework for fish-eye images[J]. Journal of Imaging, 2024, 10(3): 71-85.
[12] CARLSSON O, GERKEN J E, LINANDER H, et al. HEAL-SWIN: a vision Transformer on the sphere[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 6067-6077.
[13] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: hierarchical vision Transformer using shifted windows[C]//IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 9992-10002.
[14] EL JURDI R, SEKKAT A R, DUPUIS Y, et al. Fully residual UNet-based semantic segmentation of automotive fisheye images: a comparison of rectangular and deformable convolutions[J]. Multimedia Tools and Applications, 2024, 83(13): 40269-40291.
[15] MANZOOR A, MOHANDAS R, SCANLAN A, et al. A comparison of spherical neural networks for surround-view fisheye image semantic segmentation[J]. IEEE Open Journal of Vehicular Technology, 2025, 6: 717-740.
[16] KUMAR V R, YOGAMANI S, RASHED H, et al. OmniDet: surround view cameras based multi-task visual perception network for autonomous driving[J]. IEEE Robotics and Automation Letters, 2021, 6(2): 2830-2837.
[17] DAO T, GU A. Transformers are SSMs: generalized models and efficient algorithms through structured state space duality[C]//International Conference on Machine Learning (ICML). Vienna: PMLR, 2024: 10041-10071.
[18] YOGAMANI S, HUGHES C, HORGAN J, et al. WoodScape: a multi-task, multi-camera fisheye dataset for autonomous driving[C]//IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 9308-9318.
[19] YE Y Z, YANG K L, XIANG K T, et al. Universal semantic segmentation for fisheye urban driving images[C]//IEEE International Conference on Systems, Man, and Cybernetics (SMC). Piscataway: IEEE, 2020: 648-655.
[20] CORDTS M, OMRAN M, RAMOS S, et al. The Cityscapes dataset for semantic urban scene understanding[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 3213-3223.
[21] WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]//European Conference on Computer Vision (ECCV). Munich: Springer, 2018: 3-19.
[22] MMSegmentation Contributors. MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark[EB/OL]. [2025-03-06]. https://github.com/open-mmlab/mmsegmentation.
[23] EVERINGHAM M, ESLAMI S M A, VAN GOOL L, et al. The pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 2014, 111(1): 98-136.
[24] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2017: 2881-2890.
[25] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2016: 770-778.
[26] XIAO T T, LIU Y C, ZHOU B L, et al. Unified perceptual parsing for scene understanding[C]//European Conference on Computer Vision (ECCV). Cham, Switzerland: Springer, 2018: 432-448.
[27] CHU X X, TIAN Z, WANG Y Q, et al. Twins: revisiting the design of spatial attention in vision Transformers[J]. Advances in Neural Information Processing Systems, 2021, 34: 9355-9366.
[28] YUAN Y, CHEN X, WANG J. Object-contextual representations for semantic segmentation[C]//European Conference on Computer Vision (ECCV). Cham, Switzerland: Springer, 2020: 173-190.
[29] HUANG Z L, WANG X G, WEI Y C, et al. CCNet: criss-cross attention for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(6): 6896-6908.
[30] CHENG B, MISRA I, SCHWING A G, et al. Masked-attention mask Transformer for universal image segmentation[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2022: 1280-1289.
[31] GUO M H, LU C Z, HOU Q, et al. SegNeXt: rethinking convolutional attention design for semantic segmentation[J]. Advances in Neural Information Processing Systems, 2022, 35: 1140-1156.
[32] HATAMIZADEH A, KAUTZ J. MambaVision: a hybrid Mamba-Transformer vision backbone[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2025: 25261-25270.
[33] YEOM S K, VON K J. U-MixFormer: UNet-like Transformer with mix-attention for efficient semantic segmentation[C]//IEEE Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE, 2025: 7721-7730.
[34] WU Y H, ZHANG S C, LIU Y, et al. Low-resolution self-attention for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025, 47(9): 8180-8192.
[35] LI X, ZHONG Z S, WU J L, et al. Expectation maximization attention networks for semantic segmentation[C]//IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 9166-9175.
[36] HE J J, DENG Z Y, QIAO Y. Dynamic multi-scale filters for semantic segmentation[C]//IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2019: 3561-3571.
[37] STRUDEL R, GARCIA R, LAPTEV I, et al. Segmenter: Transformer for semantic segmentation[C]//IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE, 2021: 7242-7252.
[38] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: transformers for image recognition at scale[C]//International Conference on Learning Representations (ICLR). Vienna: ICLR, 2021.
[39] FAN Q H, HUANG H B, CHEN M R, et al. RMT: retentive networks meet vision Transformers[C]//IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE, 2024: 5641-5651.