
Computer Engineering ›› 2024, Vol. 50 ›› Issue (1): 216-223. doi: 10.19678/j.issn.1000-3428.0066941

• Graphics and Image Processing •

Remote Sensing Image Detection Based on Perceptually Enhanced Swin Transformer

Bingyan ZHU1, Zhihua CHEN1,*(), Bin SHENG2

  1. College of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
    2. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2023-02-15 Online: 2024-01-15 Published: 2024-01-16
  • Contact: Zhihua CHEN
  • Supported by:
    National Natural Science Foundation of China (62272164); Open Fund of the Space Intelligent Control Technology Laboratory (HTKJ2022KL502010)



Abstract:

Owing to the rapid development of remote sensing technology, remote sensing image detection is used extensively in agriculture, the military, national defense security, and other fields. Compared with conventional images, remote sensing images are more difficult to detect; efficient and accurate detection has therefore become a research hotspot in the field. To address the high computational complexity, large scale-range variation, and scale imbalance of remote sensing images, this study proposes a perceptually enhanced Swin Transformer network that improves remote sensing image detection. Exploiting the hierarchical design and shifted windows of the basic Swin Transformer to keep computational complexity low, the network inserts spatial local perception blocks into each stage, enhancing local feature extraction with a negligible increase in computation. An area-distributed regression loss is introduced to assign larger weights to small objects and thus resolve the scale imbalance; it is combined with an improved IoU-aware classification loss that eliminates the discrepancy between the classification and regression branches and reduces both losses. Experimental results on the public DOTA dataset show that the proposed network achieves a mean Average Precision (mAP) of 78.47% and a detection speed of 10.8 frames/s, outperforming classical object detection networks (e.g., Faster R-CNN and Mask R-CNN) as well as existing state-of-the-art remote sensing image detection networks. Additionally, the network performs well on objects of all types and scales.
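The abstract describes the area-distributed regression loss only at a high level; its exact formulation is given in the paper body. A minimal sketch of the underlying idea, scaling each box's regression loss by a factor that grows as the ground-truth area shrinks, might look as follows. The function names, the smooth-L1 base loss, and the specific weighting formula are illustrative assumptions, not the authors' definition:

```python
def smooth_l1(x, beta=1.0):
    # Standard smooth-L1 (Huber-style) penalty on a single coordinate residual.
    x = abs(x)
    return 0.5 * x * x / beta if x < beta else x - 0.5 * beta

def area_weighted_reg_loss(pred_boxes, gt_boxes, max_area=1024.0 ** 2):
    """Box regression loss that gives small objects larger weights.

    Boxes are (x1, y1, x2, y2) tuples. The weight decays from ~2.0 for tiny
    ground-truth boxes toward 1.0 for boxes near max_area, so small targets
    dominate the gradient. This mirrors the intent of the paper's
    area-distributed regression loss; the exact weighting differs.
    """
    total = 0.0
    for pred, gt in zip(pred_boxes, gt_boxes):
        area = max((gt[2] - gt[0]) * (gt[3] - gt[1]), 1.0)
        weight = 1.0 + (1.0 - min(area / max_area, 1.0))  # small area -> larger weight
        box_loss = sum(smooth_l1(p - g) for p, g in zip(pred, gt))
        total += weight * box_loss
    return total / max(len(gt_boxes), 1)
```

With identical coordinate errors, a small ground-truth box then contributes roughly twice the loss of a very large one, which is the weighting behavior the abstract attributes to small objects.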

Key words: remote sensing image, object detection, Swin Transformer, multi-scale feature, deep learning
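As a companion sketch, IoU-aware classification losses in general supervise the classification score with the predicted box's IoU against its ground truth instead of a hard 0/1 label, so that confidence reflects localization quality, the branch discrepancy the abstract's improved loss aims to close. The following is a generic illustration of that idea, not the paper's variant, and all names here are assumptions:

```python
import math

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def iou_aware_bce(cls_score, pred_box, gt_box):
    """Binary cross-entropy against an IoU soft target.

    Instead of a hard 1.0 target for a positive sample, the target is the
    IoU between the predicted and ground-truth boxes, tying classification
    confidence to localization quality.
    """
    target = iou(pred_box, gt_box)
    eps = 1e-9
    return -(target * math.log(cls_score + eps)
             + (1.0 - target) * math.log(1.0 - cls_score + eps))
```

The loss is minimized when the predicted score equals the box's actual IoU, so a well-localized but under-confident detection (or vice versa) is penalized.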