
Computer Engineering ›› 2025, Vol. 51 ›› Issue (11): 246-257. doi: 10.19678/j.issn.1000-3428.0068166

• Graphics and Image Processing •

Myocardial Ischemia Image Classification via Fusion of Multi-Head Self-Attention and Bilinear Pooling

ZHOU Jiawen1,2, ZHENG Xiaoying1,2,*, ZHU Yongxin1,2, LIN Simin3, CHEN Lingyao1,2, ZENG Hongbin4, GUO Yu4, WANG Xinying4

  1. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
  2. University of Chinese Academy of Sciences, Beijing 101408, China
  3. Radiology Department of Xiamen Cardiovascular Hospital of Xiamen University, School of Medicine, Xiamen University, Xiamen 361006, Fujian, China
  4. Shanghai Nuclear Engineering Research and Design Institute Co., Ltd., Shanghai 200030, China
  • Received: 2023-08-01 Revised: 2024-06-03 Online: 2025-11-15 Published: 2025-11-26
  • Contact: ZHENG Xiaoying
  • Supported by:
    National Natural Science Foundation of China (12373113); National Natural Science Foundation of China (62004201); Shanghai Talent Development Fund (E1322E1); Knowledge Graph Application Development and Testing Project of Shanghai Nuclear Engineering Research and Design Institute Co., Ltd. (E3423E1)

Abstract:

Deep learning has significant application value in the auxiliary diagnosis of myocardial ischemia. However, conventional deep learning networks for medical image classification cannot capture the subtle inter-class differences in myocardial Computed Tomography (CT) scans, and they lose the three-dimensional (3D) structural information of CT data. To address these issues, this study proposes DBTMed3D, a network that improves the convolutional modules of the conventional Med3D architecture with 3D bilinear fine-grained pooling and can process multimodal medical imaging data, including both CT and MRI. Following the ResNet design, skip connections are introduced within the modules to fuse the fine-grained second-order image features with the features extracted by the convolutional modules, so that the network preserves global features while attending to local ones. In addition, 3D class activation maps are introduced: heat maps are overlaid on the original myocardial CT slices to highlight the myocardial regions on which the model focuses. Finally, a 3D hierarchical multi-head self-attention module is designed to address fine-grained classification of 3D medical images by capturing local image features. Experimental results show that DBTMed3D achieves a classification accuracy of 86.4% on the myocardial CT dataset, an improvement of 6.7 percentage points over the baseline 3D ResNet-50 network.

Key words: myocardial ischemia, Convolutional Neural Network (CNN), bilinear fine-grained, multi-head self-attention mechanism, class activation map, skip connection
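
The abstract describes fusing second-order (bilinear) features with convolutional features through a skip connection. The following is a minimal, illustrative sketch of that general idea in plain Python, not the authors' DBTMed3D implementation. The function names, the signed square root and L2 normalization steps, and the choice to fuse by concatenating with global-average-pooled features are all assumptions made for illustration; the abstract does not specify how the fusion is performed.

```python
import math


def bilinear_pool_3d(features):
    """Second-order pooling of a 3D feature map (illustrative only).

    features: list of C channels, each a flat list of N = D*H*W voxel
    activations. Returns a flattened C*C descriptor.
    """
    c = len(features)
    n = len(features[0])
    # Average outer product over voxels: G[i][j] = mean_v x_i(v) * x_j(v)
    gram = [
        sum(features[i][v] * features[j][v] for v in range(n)) / n
        for i in range(c)
        for j in range(c)
    ]
    # Signed square root, then L2 normalization (assumed, common practice
    # for bilinear pooling; not stated in the abstract).
    rooted = [math.copysign(math.sqrt(abs(g)), g) for g in gram]
    norm = math.sqrt(sum(r * r for r in rooted)) or 1.0
    return [r / norm for r in rooted]


def global_avg_pool_3d(features):
    """First-order summary: mean activation per channel."""
    return [sum(channel) / len(channel) for channel in features]


def fuse_with_skip(features):
    """Hypothetical fusion: the skip path carries first-order features,
    which are concatenated with the second-order descriptor."""
    return global_avg_pool_3d(features) + bilinear_pool_3d(features)


# Toy input: 2 channels over a 2x2x1 volume flattened to 4 voxels.
toy = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
fused = fuse_with_skip(toy)
```

With C channels, the fused vector in this sketch has C + C² entries: the first C summarize overall channel responses, while the remaining C² capture pairwise channel interactions, which is the kind of information bilinear pooling contributes for fine-grained distinctions.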