Myocardial Ischemia Image Classification via Fusion of Multi-Head Self-Attention and Bilinear Pooling

doi:10.19678/j.issn.1000-3428.0068166

Computer Engineering, 2025, Vol. 51, Issue 11: 246-257.

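Funding: National Natural Science Foundation of China (12373113); National Natural Science Foundation of China (62004201); Shanghai Talent Development Fund (E1322E1); Knowledge Graph Application Development and Testing Project of Shanghai Nuclear Engineering Research and Design Institute Co., Ltd. (E3423E1)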

ZHOU Jiawen¹,², ZHENG Xiaoying¹,²,*, ZHU Yongxin¹,², LIN Simin³, CHEN Lingyao¹,², ZENG Hongbin⁴, GUO Yu⁴, WANG Xinying⁴

1. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
2. University of Chinese Academy of Sciences, Beijing 101408, China
3. Radiology Department of Xiamen Cardiovascular Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361006, Fujian, China
4. Shanghai Nuclear Engineering Research and Design Institute Co., Ltd., Shanghai 200030, China

Received: 2023-08-01; Revised: 2024-06-03; Online: 2025-11-15; Published: 2025-11-26
Contact: ZHENG Xiaoying

Abstract


Deep learning has significant application value in the auxiliary diagnosis of myocardial ischemia. However, conventional deep learning networks for medical image classification fail to capture the subtle inter-class differences in myocardial Computed Tomography (CT) scans and lose the three-dimensional (3D) structural information of CT data. To address these issues, this study proposes DBTMed3D, a network that improves the convolutional modules of the conventional Med3D architecture with 3D bilinear fine-grained pooling and can process multimodal medical imaging data, including CT and Magnetic Resonance Imaging (MRI). Following the ResNet design, skip connections are introduced within the modules to fuse fine-grained second-order image features with the features extracted by the convolutional blocks, so that the network attends to local details while preserving global characteristics. In addition, 3D class activation maps are used to overlay heat maps on the original myocardial CT slices, highlighting the myocardial regions on which the model focuses. Finally, a 3D hierarchical multi-head self-attention module is designed to address fine-grained classification of 3D medical images by capturing local image features. Experimental results show that DBTMed3D achieves a classification accuracy of 86.4% on the myocardial CT dataset, 6.7 percentage points higher than the baseline 3D ResNet-50.
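
As a rough illustration of the second-order pooling and skip-connection fusion mentioned in the abstract, the following NumPy sketch applies classic bilinear pooling to a 3D feature volume and adds a projected second-order descriptor to the globally averaged first-order features. It is not the authors' DBTMed3D module: the function names `bilinear_pool_3d` and `fuse_with_skip`, the projection matrix `proj`, the additive fusion, and all tensor sizes are assumptions made purely for illustration.

```python
import numpy as np


def bilinear_pool_3d(feat):
    """Bilinear (second-order) pooling of a feature volume shaped (C, D, H, W)."""
    c = feat.shape[0]
    x = feat.reshape(c, -1)                  # (C, N) with N = D*H*W voxels
    b = x @ x.T / x.shape[1]                 # (C, C) channel outer products averaged over voxels
    b = b.reshape(-1)                        # flatten to a C*C descriptor
    b = np.sign(b) * np.sqrt(np.abs(b))      # signed square-root normalisation
    return b / (np.linalg.norm(b) + 1e-12)   # L2 normalisation


def fuse_with_skip(feat, proj):
    """Hypothetical fusion: add a projected second-order descriptor to the
    global-average-pooled (first-order) features, residual style."""
    first_order = feat.mean(axis=(1, 2, 3))        # (C,)
    second_order = proj @ bilinear_pool_3d(feat)   # (C,) after projection from C*C
    return first_order + second_order              # skip connection as element-wise addition


# Toy example with assumed sizes: 4 channels on a 2x3x3 voxel grid.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 2, 3, 3))
proj = rng.standard_normal((4, 16)) * 0.1          # stand-in for a learned projection
desc = bilinear_pool_3d(feat)
fused = fuse_with_skip(feat, proj)
print(desc.shape, fused.shape)
```

The addition in `fuse_with_skip` is only one plausible way to combine the two feature types; concatenation or a learned gate would be equally valid readings of "fusion", and the paper's actual design may differ.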

Key words: myocardial ischemia, Convolutional Neural Network (CNN), bilinear fine-grained, multi-head self-attention mechanism, class activation map, skip connection
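
To make the class activation map keyword concrete, here is a minimal NumPy sketch of how a 3D CAM could be computed and overlaid on a CT slice, following the general CAM formulation (a class-weighted sum of the last convolutional feature maps). The helper names `cam_3d`, `upsample_nearest`, and `overlay_slice`, the nearest-neighbour upsampling, the min-max scaling, the blending factor, and the array sizes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def cam_3d(feat, fc_weights, class_idx):
    """Weighted sum of feature maps (K, D, H, W) using the classifier
    weights of one class, min-max scaled to [0, 1] for display."""
    cam = np.tensordot(fc_weights[class_idx], feat, axes=1)  # (D, H, W)
    lo, hi = cam.min(), cam.max()
    return (cam - lo) / (hi - lo) if hi > lo else np.zeros_like(cam)


def upsample_nearest(vol, factor):
    """Nearest-neighbour upsampling of a 3D volume by an integer factor."""
    for axis in range(3):
        vol = np.repeat(vol, factor, axis=axis)
    return vol


def overlay_slice(ct_slice, heat_slice, alpha=0.4):
    """Alpha-blend a heat-map slice onto a CT slice rescaled to [0, 1]."""
    lo, hi = ct_slice.min(), ct_slice.max()
    ct01 = (ct_slice - lo) / (hi - lo) if hi > lo else np.zeros_like(ct_slice, dtype=float)
    return (1 - alpha) * ct01 + alpha * heat_slice


# Toy example with assumed sizes: 8 feature maps on a 2x4x4 grid, 2 classes.
rng = np.random.default_rng(1)
feat = rng.standard_normal((8, 2, 4, 4))
fc_weights = rng.standard_normal((2, 8))     # stand-in for the final linear layer
ct = rng.uniform(-1000, 1000, (4, 8, 8))     # synthetic CT volume in Hounsfield-like units
heat = upsample_nearest(cam_3d(feat, fc_weights, class_idx=1), factor=2)
blended = overlay_slice(ct[0], heat[0])
print(heat.shape, blended.shape)
```

In practice a smoother interpolation and a colour map would normally replace the nearest-neighbour upsampling and grey-level blend used here.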

ZHOU Jiawen, ZHENG Xiaoying, ZHU Yongxin, LIN Simin, CHEN Lingyao, ZENG Hongbin, GUO Yu, WANG Xinying. Myocardial Ischemia Image Classification via Fusion of Multi-Head Self-Attention and Bilinear Pooling[J]. Computer Engineering, 2025, 51(11): 246-257.

https://www.ecice06.com/CN/Y2025/V51/I11/246

Figures/Tables (17)

Fig.1 Bilinear fine-grained network structure

Fig.2 Cardiac CT with overlaid myocardial label

Fig.3 3D stereogram of myocardial contour

Fig.4 Myocardial CT image

Fig.5 Network architecture of the DBTMed3D model

Fig.6 Operational process of the 3D Bilinear Transformation module

Fig.7 Transformation of data dimensions within the model

Fig.8 Training process of DBTMed3D on myocardial CT data

Fig.9 Training process of DBTMed3D on OrganMNIST3D data

Fig.10 Three network structures for the bilinear pooling module ablation experiment

Fig.11 Ablation experiment results of the bilinear pooling module

Fig.12 Network structure for the 3D hierarchical multi-head self-attention module ablation experiment

Fig.13 Ablation experiment results of the 3D hierarchical multi-head self-attention module

Fig.14 Ablation experiment results of the hierarchical multi-head self-attention module

Fig.15 Visualized slices of 3D CAM

