Domain Adaptive Remote Sensing Image Segmentation Based on Hierarchical Attention

doi:10.19678/j.issn.1000-3428.0070162

Abstract

Semantic segmentation of remote sensing images has important applications in resource management, natural disaster management, and environmental monitoring and protection. However, remote sensing image datasets often exhibit spectral confusion between different objects (different objects sharing similar spectra) and spectral variation within the same object. These phenomena greatly reduce the generalization performance of deep learning models, and remote sensing semantic segmentation algorithms also suffer degraded prediction performance across domains. To address these problems, this work optimizes both the neural network architecture and the domain adaptation strategy. First, a TransConv network based on a hierarchical multihead self-attention mechanism and multiscale feature fusion is proposed. Through sliding window patching, multilayer self-attention modules, and a lightweight feedforward neural network, it strengthens feature extraction and fusion and thereby improves the model's generalization performance. Second, a self-training-based domain adaptation technique is proposed that optimizes the image input, the model parameters, and the learning process, transferring knowledge from the labeled source domain to the unlabeled target domain and substantially improving segmentation performance in the target domain. Experimental results show that the improved TransConv network significantly outperforms other algorithms in generalization performance, and that the self-training-based domain adaptation technique performs well on domain adaptation tasks. The proposed approach improves the accuracy and generalization capability of remote sensing image semantic segmentation, reduces the impact of erroneous pseudo-labels, and addresses the class imbalance problem, providing more reliable technical support for practical applications.

Key words: remote sensing image, Convolutional Neural Network (CNN), Transformer network, hierarchical attention, domain adaptive
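The abstract mentions self-training with pseudo-labels on an unlabeled target domain and reducing the impact of erroneous pseudo-labels. The sketch below illustrates one common way this kind of scheme can be set up; it is not taken from the paper. The confidence threshold, the `ignore_index` value, the mean-teacher style moving-average update, and all function names are assumptions chosen for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_labels(teacher_logits, threshold=0.9, ignore_index=255):
    """Turn per-pixel teacher logits into hard pseudo-labels.

    Pixels whose top softmax probability is below `threshold` receive
    `ignore_index`, so a loss that skips that index would not be trained
    on low-confidence (likely erroneous) pseudo-labels.
    Both default values are illustrative assumptions.
    """
    labels = []
    for pixel_logits in teacher_logits:
        probs = softmax(pixel_logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append(best if probs[best] >= threshold else ignore_index)
    return labels

def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Moving-average update: teacher = alpha * teacher + (1 - alpha) * student."""
    return [alpha * t + (1.0 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]

# Toy example: three "pixels", two classes, and two scalar "weights".
logits = [[4.0, 0.0], [0.2, 0.0], [0.0, 5.0]]
labels = pseudo_labels(logits, threshold=0.9)
teacher = ema_update([1.0, 0.0], [0.0, 1.0], alpha=0.9)
```

In this toy run the second pixel is nearly a coin flip between the two classes, so it is masked out rather than assigned a label, while the two confident pixels keep their predicted classes.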

WANG Shasha, LI Weitao, LIU Xingyu, GAO Hui. Domain Adaptive Remote Sensing Image Segmentation Based on Hierarchical Attention[J]. Computer Engineering, 2026, 52(4): 176-186.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0070162

https://www.ecice06.com/EN/Y2026/V52/I4/176

Figures/Tables 14

Fig.1 Overall architecture of TransConv model

Fig.2 Overlapping slice embedding procedure

Fig.3 Feedforward neural network structure

Fig.4 Depthwise separable convolution process

Fig.5 Multi-scale feature fusion structure

Fig.6 Self-training-based domain adaptation technology framework

Fig.7 Potsdam dataset image and label data

Fig.8 Vaihingen dataset image and label data

Fig.9 Visualization results of Potsdam predicting Vaihingen

Fig.10 Visualization results of Vaihingen predicting Potsdam
