
Computer Engineering, 2023, Vol. 49, Issue (9): 226-233, 245. doi: 10.19678/j.issn.1000-3428.0065513

• Graphics and Image Processing •

Image Deblurring Based on Transformer and Multi-scale CNN

Xianguo LI1,2, Bin LI1

  1. School of Electronics and Information Engineering, Tiangong University, Tianjin 300387, China
  2. Tianjin Key Laboratory of Photoelectric Detection Technology and System, Tianjin 300387, China
  • Received: 2022-08-15 Online: 2023-09-15 Published: 2023-09-14
  • About the authors:

    Xianguo LI (born 1981), male, professor, Ph.D.; main research interests: intelligent information processing and photoelectric detection

    Bin LI, master's student

  • Funding:
    Key Science and Technology Support Project of the Tianjin Key Research and Development Program (18YFZCGX00930)

Abstract:

When used alone for image deblurring, a Convolutional Neural Network (CNN) is limited by its restricted receptive field. A Transformer can effectively mitigate this limitation, but its computational complexity grows quadratically with the spatial resolution of the input image. To address this, the study proposes T-MIMO-UNet, an image deblurring network based on a Transformer and a multi-scale CNN. The multi-scale CNN extracts spatial features, and the global modeling capability of the Transformer is embedded to capture long-range pixel information. A local enhanced Transformer module, a local Multi-Head Self-Attention (MHSA) computing network, and an Enhanced Feed-Forward Network (EFFN) are designed. MHSA is computed locally, block by block, within windows, and depthwise separable convolution layers are added to strengthen the information exchange between different windows. Experimental results on the GoPro test dataset show that the Peak Signal-to-Noise Ratio (PSNR) of T-MIMO-UNet is 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB higher than that of MIMO-UNet, DeepDeblur, DeblurGAN, and SRN, respectively, and that its number of parameters is reduced by 1/2 compared with MPRNet. These results indicate that T-MIMO-UNet can effectively address image blurring in dynamic scenes.

Key words: image deblurring, multi-scale Convolutional Neural Network (CNN), Transformer encoder, Multi-Head Self-Attention (MHSA), Enhanced Feed-Forward Network (EFFN)
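
The quadratic growth mentioned in the abstract, and the saving obtained by computing self-attention inside windows, can be illustrated by counting query-key pairs. The following is a minimal sketch, not the authors' implementation: the function names and the 8×8 window size are assumptions chosen for illustration, and the count ignores attention heads, channel width, and the depthwise separable convolutions.

```python
def global_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every pixel attends to every other pixel."""
    tokens = height * width
    return tokens * tokens


def windowed_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention is restricted to non-overlapping
    window x window blocks (hypothetical helper; assumes even division)."""
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


# Compare the two counts for a few square feature-map sizes.
# The window size of 8 is an assumption, not a value taken from the paper.
for side in (64, 128, 256):
    g = global_attention_pairs(side, side)
    w = windowed_attention_pairs(side, side, window=8)
    print(f"{side}x{side}: global={g:,} windowed={w:,} ratio={g // w}")
```

Under these assumptions the windowed count grows linearly with the number of pixels, while the global count grows with its square; the ratio between them is (height × width) / window².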