MTM3D: 3D Medical Image Classification Network Integrating Mamba and Improved TTM

doi:10.19678/j.issn.1000-3428.0070764

Abstract

Abstract: Biomedical imaging plays a crucial role in the diagnosis and treatment of many diseases. Applying deep learning to medical image analysis can improve the readability of medical images and provide more reliable support for clinical decision-making. However, traditional medical image processing methods are limited in their ability to capture spatial features and complex structural information in 3D images; accuracy and generalization often suffer, especially on complex 3D medical images produced by different imaging modalities. To address this challenge, an MTM3D model is proposed for medical image classification. The model combines the strong performance of Mamba on complex sequence tasks with the external memory of an improved Token Turing Machines (TTM) network. By introducing a cyclic chain storage structure, MTM3D lets features from different spatial structures interact effectively within the memory units, which improves its ability to capture complex spatial relationships. In addition, Mamba strengthens the interaction between the memory units and the processing units, giving the model stronger generalization across different medical imaging datasets. Experimental results on the MedMNIST v2 dataset show that, compared with the best existing medical image analysis networks, MTM3D improves average accuracy (ACC) by 3.97% and average area under the curve (AUC) by 2.00%, demonstrating its potential for medical image interpretation and for assisting healthcare professionals in diagnosis and treatment planning.

YANG Yang, WEI Hongkai, SUN Shijie, HU Hongli, WANG Rong, WANG Tiantian. MTM3D: 3D Medical Image Classification Network Integrating Mamba and Improved TTM[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0070764.

