Medical Image Fusion with Local-Global Feature Coupling and Cross-Scale Attention

doi:10.19678/j.issn.1000-3428.0064891

Abstract

Abstract: Existing deep learning-based multimodal medical image fusion methods suffer from insufficient global feature representation. To address this, this study proposes a medical image fusion method based on local-global feature coupling and cross-scale attention. The method comprises three parts: an encoder, a fusion rule, and a decoder. In the encoder, a dual-branch network consisting of a parallel Convolutional Neural Network (CNN) and Transformer extracts the local features and the global representation of the image, respectively. At different scales, a feature coupling module embeds the local features of the CNN branch into the global feature representation of the Transformer branch to combine complementary features as fully as possible, while a cross-scale attention module is introduced to make effective use of the multiscale feature representations. The encoder extracts the local, global, and multiscale feature representations of the source images to be fused; the feature representations of the different source images are fused according to the fusion rule and then fed into the decoder to generate the fused image. Experimental results show that, compared with the CBF, PAPCNN, IFCNN, DenseFuse, and U2Fusion methods, the proposed method improves five evaluation metrics, namely feature mutual information, spatial frequency, edge information transfer factor, structural similarity, and perceptual image fusion quality, by 6.29%, 3.58%, 29.01%, 5.34%, and 5.77% on average, respectively. Subjectively, the fused images retain clearer texture details and higher contrast, which facilitates disease diagnosis and treatment.

Key words: medical image fusion, encoder-decoder network, Transformer network, feature coupling, cross-scale attention
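Of the five evaluation metrics named in the abstract, spatial frequency is the simplest to state. The sketch below uses the commonly cited definition, SF = sqrt(RF^2 + CF^2), where RF and CF are the root-mean-square differences between horizontally and vertically adjacent pixels. The abstract does not give the exact formula the authors used, so the normalization by the total pixel count and the function name `spatial_frequency` are assumptions made for illustration only.

```python
import math


def spatial_frequency(img):
    """Spatial frequency of a 2-D grayscale image given as a list of rows.

    Assumed definition (not taken from the paper): SF = sqrt(RF^2 + CF^2),
    with both terms normalized by the total number of pixels.
    """
    rows, cols = len(img), len(img[0])
    n = rows * cols
    # Row frequency term: squared differences between horizontal neighbours.
    rf_sq = sum((img[i][j] - img[i][j - 1]) ** 2
                for i in range(rows) for j in range(1, cols)) / n
    # Column frequency term: squared differences between vertical neighbours.
    cf_sq = sum((img[i][j] - img[i - 1][j]) ** 2
                for i in range(1, rows) for j in range(cols)) / n
    return math.sqrt(rf_sq + cf_sq)


flat = [[5, 5], [5, 5]]        # no intensity variation
checker = [[0, 1], [1, 0]]     # maximal variation between neighbours
print(spatial_frequency(flat))     # 0.0
print(spatial_frequency(checker))  # 1.0
```

A higher value indicates more gray-level activity in the image, which is why the metric is often read as a proxy for how much texture detail a fused image retains.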

CLC number: TP391

ZHANG Jiong, WANG Lifang, LIN Suzhen, QIN Pinle, MI Jia, LIU Yang. Medical Image Fusion with Local-Global Feature Coupling and Cross-Scale Attention[J]. Computer Engineering, 2023, 49(3): 238-247.

https://www.ecice06.com/CN/Y2023/V49/I3/238

Figures/Tables: 13

