
Computer Engineering ›› 2024, Vol. 50 ›› Issue (9): 33-45. doi: 10.19678/j.issn.1000-3428.0068296

• Hot Topics and Reviews •

  • Supported by: National Natural Science Foundation of China (62071421)

Super-Resolution Reconstruction of Spatiotemporal Fusion for Dual-Stream Remote Sensing Images Based on Swin Transformer

WANG Zhihao1,*(), QIAN Yuntao2   

  1. Polytechnic Institute, Zhejiang University, Hangzhou 310015, Zhejiang, China
    2. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, Zhejiang, China
  • Received: 2023-08-28 Online: 2024-09-15 Published: 2024-09-20
  • Contact: WANG Zhihao


Abstract:

The spatiotemporal fusion super-resolution reconstruction of remote sensing images extracts information from low-resolution images with high temporal density and high-resolution images with low temporal density to generate remote sensing images with both high temporal density and high spatial resolution. This process directly affects the implementation of subsequent tasks such as interpretation, detection, and tracking. With the rapid advancement of Convolutional Neural Networks (CNNs), researchers have proposed a series of CNN-based spatiotemporal fusion methods. However, because of the inherent limitations of convolution operations, these methods still face challenges in global information extraction. Inspired by the global modeling capability of the Swin Transformer, this paper proposes a super-resolution reconstruction model based on the Swin Transformer. In the feature extraction stage, a dual-stream structure is introduced, dividing the feature extraction network into two parts that extract temporal and spatial information separately; the global modeling capability of the Swin Transformer enhances model performance. In the feature fusion stage, a Convolutional Block Attention Module (CBAM) that combines channel and spatial attention is introduced to emphasize important features and improve image reconstruction accuracy. Comparative experiments against various spatiotemporal fusion super-resolution reconstruction models are conducted on the Coleambally Irrigation Area (CIA) and Lower Gwydir Catchment (LGC) datasets. The results show that the proposed model achieves the best results across all evaluation metrics, demonstrating superior performance and stronger generalization capability.
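The CBAM described above applies channel attention followed by spatial attention to the fused features. The following is a minimal NumPy sketch of that two-stage gating, not the paper's implementation: the weight matrices are illustrative stand-ins for the learned shared MLP, and the learned 7×7 convolution of the original spatial attention is simplified to an average of the pooled maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Channel attention: squeeze spatial dims by average- and max-pooling,
    pass both descriptors through a shared 2-layer MLP, then gate channels.
    x: (C, H, W); w1: (C//r, C); w2: (C, C//r) with reduction ratio r."""
    avg = x.mean(axis=(1, 2))                        # (C,)
    mx = x.max(axis=(1, 2))                          # (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))   # (C,)
    return x * att[:, None, None]

def spatial_attention(x):
    """Spatial attention: pool over the channel axis; the learned 7x7 conv
    of the original CBAM is replaced by a simple average (illustrative)."""
    avg = x.mean(axis=0)                             # (H, W)
    mx = x.max(axis=0)                               # (H, W)
    att = sigmoid((avg + mx) / 2.0)                  # (H, W)
    return x * att[None, :, :]

def cbam(x, w1, w2):
    """CBAM order: channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2))
```

Because both gates lie in (0, 1), the module rescales rather than amplifies features: every output magnitude is bounded by the corresponding input magnitude, which is how less informative channels and locations are suppressed before reconstruction.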

Key words: spatiotemporal fusion, super-resolution reconstruction, Swin Transformer algorithm, dual-stream structure, Convolutional Neural Network (CNN)