
Computer Engineering ›› 2024, Vol. 50 ›› Issue (9): 276-285. doi: 10.19678/j.issn.1000-3428.0068246

• Graphics and Image Processing •

Multi-Stage Motion Blur Image Restoration Network Based on Transformer

ZHU Kai1,*, LI Li1,2, ZHANG Tong1, JIANG Sheng1, BIE Yiming3

  1. School of Physics, Changchun University of Science and Technology, Changchun 130022, Jilin, China
  2. School of Electronical and Information Engineering, Changchun University of Science and Technology, Changchun 130022, Jilin, China
  3. School of Transportation, Jilin University, Changchun 130022, Jilin, China
  • Received: 2023-08-17 Online: 2024-09-15 Published: 2024-09-04
  • Contact: ZHU Kai
  • Supported by:
    Key Research and Development Project of the Jilin Province Science and Technology Development Plan (20210203214SF)

Abstract:

Motion blur is a common cause of image degradation; it reduces image readability and impairs subsequent processing. To address the limited receptive field of convolutional networks and the information loss in conventional multi-stage networks, a multi-stage deblurring network based on the Transformer is proposed. The network adopts a multi-stage encoder-decoder structure, with skip connections both within each stage and between stages to strengthen information propagation. First, an efficient Transformer module uses channel attention and depthwise convolution to process the global and local information of the image. Second, a multi-branch feedforward network introduces multiple parallel branches to extract and fuse features at different scales and levels. Finally, multi-stage residual processing is used to obtain better restoration results. Experimental results show that the network achieves a Peak Signal-to-Noise Ratio (PSNR) of 32.23 dB and a Structural Similarity Index Measure (SSIM) of 0.955 on the GoPro dataset, and a PSNR of 30.15 dB and an SSIM of 0.930 on the HIDE dataset, outperforming DeepDeblur, DeblurGAN-V2, and other models.

Key words: deep learning, Transformer model, attention mechanism, image restoration, multi-scale network
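
For readers unfamiliar with the PSNR figures reported in the abstract, the metric can be sketched as follows. This is a minimal illustration of the standard definition, PSNR = 10 · log10(MAX² / MSE), and not the authors' evaluation code; the function name `psnr`, the nested-list image layout, and the sample pixel values are all assumptions made for the example.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Images are nested lists (rows of pixel values). This layout and the
    function name are illustrative assumptions, not taken from the paper.
    """
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out) or not ref:
        raise ValueError("images must be non-empty and have the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 3x3 patches: a sharp reference and a restored estimate.
sharp = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
deblurred = [[54, 55, 60], [59, 77, 61], [75, 62, 59]]
print(f"PSNR: {psnr(sharp, deblurred):.2f} dB")
```

A higher PSNR means the restored image is closer to the sharp reference in the mean-squared-error sense. Benchmark figures such as those on GoPro and HIDE are typically averaged over all test images, and the exact protocol (color handling, value range) may differ from this sketch.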