Image Deblurring Based on Transformer and Multi-scale CNN

doi:10.19678/j.issn.1000-3428.0065513

Abstract:

When used alone for image deblurring, a Convolutional Neural Network (CNN) is limited by its restricted receptive field. A Transformer can mitigate this limitation, but its computational complexity grows quadratically with the spatial resolution of the input image. This study therefore proposes T-MIMO-UNet, an image deblurring network based on a Transformer and a multi-scale CNN. The multi-scale CNN extracts spatial features, and an embedded Transformer exploits its global modeling ability to capture long-range pixel information. A locally enhanced Transformer module, a local Multi-Head Self-Attention (MHSA) network, and an Enhanced Feed-Forward Network (EFFN) are designed. MHSA is computed locally, window by window, and added depthwise separable convolution layers strengthen the information exchange between different windows. Experiments on the GoPro test dataset show that the Peak Signal-to-Noise Ratio (PSNR) of T-MIMO-UNet is 0.39 dB, 2.89 dB, 3.42 dB, and 1.86 dB higher than that of MIMO-UNet, DeepDeblur, DeblurGAN, and SRN, respectively, while its parameter count is half that of MPRNet. These results indicate that T-MIMO-UNet effectively addresses image blur in dynamic scenes.

Key words: image deblurring, multi-scale Convolutional Neural Network (CNN), Transformer encoder, Multi-Head Self-Attention (MHSA), Enhanced Feed-Forward Network (EFFN)
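The abstract combines two mechanisms: MHSA restricted to non-overlapping windows, which keeps cost manageable at high resolution, and a feed-forward network with depthwise separable convolution, which lets information cross window boundaries. The NumPy sketch below illustrates both under stated assumptions. The function names, the projection matrices `wq`, `wk`, `wv`, `wo`, `w1`, `w2`, the window size `w`, and the EFFN order (pointwise expand, 3×3 depthwise convolution, GELU, pointwise project) are hypothetical choices for illustration, not the authors' published design. Normalization, residual connections, biases, and padding for sizes that are not multiples of `w` are omitted.

```python
import numpy as np


def window_partition(x, w):
    """Split an (H, W, C) map into non-overlapping w x w windows: (H*W // w**2, w*w, C)."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "H and W must be multiples of the window size"
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)


def window_reverse(windows, w, H, W):
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    C = windows.shape[-1]
    x = windows.reshape(H // w, W // w, w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def local_mhsa(x, wq, wk, wv, wo, num_heads, w):
    """Multi-head self-attention computed independently inside each w x w window."""
    H, W, C = x.shape
    assert C % num_heads == 0, "channels must split evenly across heads"
    d = C // num_heads
    t = window_partition(x, w)  # (num_windows, N, C) with N = w*w tokens per window
    nW, N, _ = t.shape

    def heads(m):  # (nW, N, C) -> (nW, num_heads, N, d)
        return m.reshape(nW, N, num_heads, d).transpose(0, 2, 1, 3)

    q, k, v = heads(t @ wq), heads(t @ wk), heads(t @ wv)
    attn = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(d))  # (nW, heads, N, N)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(nW, N, C) @ wo
    return window_reverse(out, w, H, W)


def depthwise_conv3x3(x, k):
    """One 3x3 filter per channel (k has shape (3, 3, C)), zero padding, stride 1."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += xp[i:i + H, j:j + W, :] * k[i, j, :]
    return out


def gelu(z):
    # tanh approximation of GELU
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z ** 3)))


def effn(x, w1, k, w2):
    """Hypothetical EFFN: pointwise expand -> 3x3 depthwise conv -> GELU -> pointwise project."""
    return gelu(depthwise_conv3x3(x @ w1, k)) @ w2


def attention_scores(H, W, w=None):
    """Query-key scores per head: (H*W)**2 for global attention, H*W*w*w with w x w windows."""
    n = H * W
    return n * n if w is None else n * w * w
```

Counting attention scores per head makes the complexity argument concrete: for a 64×64 feature map, global attention needs 4096² = 16,777,216 scores, whereas 8×8 windows need 4096 × 64 = 262,144; doubling both sides of the map multiplies the first count by 16 but the second only by 4. In the sketch, perturbing one pixel changes only its own window's outputs after `local_mhsa`, whereas after `effn` the change also reaches the adjacent pixel in the neighbouring window, which is the cross-window interaction the abstract attributes to the depthwise convolution.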

Xianguo LI, Bin LI. Image Deblurring Based on Transformer and Multi-scale CNN[J]. Computer Engineering, 2023, 49(9): 226-233, 245.

http://www.ecice06.com/CN/Y2023/V49/I9/226

Figures and tables (10)

Fig.1 Structure of the T-MIMO-UNet

Fig.2 Structure of the locally enhanced Transformer module

Fig.3 Structure of the local multi-head self-attention network

Fig.4 Structure of the enhanced feed-forward network

Fig.5 Deblurring effects on the GoPro test dataset
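Figure 5 and the abstract report results on the GoPro test dataset, with comparisons stated as PSNR gains in dB. The following is a minimal sketch of that metric, assuming images stored as arrays with a known peak value (255 for 8-bit data, 1.0 for normalized floats). The function name `psnr` and its `max_val` parameter are illustrative; the paper's exact evaluation protocol (color space, cropping, averaging over the test set) is not stated in this section.

```python
import numpy as np


def psnr(restored, reference, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two equally shaped images."""
    a = np.asarray(restored, dtype=np.float64)
    b = np.asarray(reference, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((a - b) ** 2)  # mean squared error over all pixels and channels
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))
```

Because PSNR is logarithmic in the MSE, a gain of Δ dB at a fixed peak value means the MSE is divided by 10^(Δ/10). For a single image, the reported 0.39 dB gain over MIMO-UNet would correspond to roughly 8.6% lower MSE; a PSNR averaged over a test set does not map exactly onto an averaged MSE, so this is only a per-image intuition.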
