[1] VINYALS O, TOSHEV A, BENGIO S, et al. Show and
tell: a neural image caption generator[C]//Proceedings of
the IEEE Conference on Computer Vision and Pattern
Recognition. 2015:3156-3164.
[2] JING B Y, XIE P T, XING E. On the automatic
generation of medical imaging reports[C]//Proceedings of the 56th
Annual Meeting of the Association for Computational
Linguistics. 2018:2577-2586.
[3] ZHANG Z, XIE Y, XING F, et al. MDNet: a semantically
and visually interpretable medical image diagnosis
network[C]//Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. 2017:6428-6436.
[4] SHIN H C, ROBERTS K, LU L, et al. Learning to read
chest x-rays: recurrent neural cascade model for
automated image annotation[C]//Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition. 2016:2497-2506.
[5] LIU F L, YIN C, WU X, et al. Contrastive attention for
automatic chest x-ray report generation[C]//Findings of
the Association for Computational
Linguistics: ACL-IJCNLP 2021. 2021:269-280.
[6] LI Y, LIANG X, HU Z, et al. Hybrid retrieval-generation
reinforced agent for medical image report
generation[C]//Advances in Neural Information
Processing Systems 31: Annual Conference on Neural
Information Processing Systems 2018. 2018:1537-1547.
[7] CHEN Z, SONG Y, CHANG T H, et al. Generating
radiology reports via memory-driven
transformer[C]//Proceedings of the 2020 Conference on
Empirical Methods in Natural Language Processing.
2020:1439-1449.
[8] CHEN Z, SHEN Y, SONG Y, et al. Cross-modal memory
networks for radiology report generation[C]//Proceedings
of the 59th Annual Meeting of the Association for
Computational Linguistics and the 11th International
Joint Conference on Natural Language
Processing. 2021:5904-5914.
[9] 邢素霞, 方俊泽, 鞠子涵, 等. 基于记忆驱动的多模态医学
影像报告自动生成研究[J]. 生物医学工程学杂志,
2024, 41(1):60-69.
XING S X, FANG J Z, JU Z H, et al. Research on
automatic generation of multimodal medical image
reports based on memory-driven methods[J]. Journal of
Biomedical Engineering, 2024, 41(1):60-69. (in Chinese)
[10] 沈秀轩, 吴春雷, 冯叶棋, 等. 基于双分支特征融合的医学
报告生成方法[J]. 计算机工程, 2023, 49(6):
274-283,291.
SHEN X X, WU C L, FENG Y Q, et al. Medical report
generation method based on dual-branch feature
fusion[J]. Computer Engineering, 2023, 49(6):
274-283,291. (in Chinese)
[11] HOU W, XU K, CHENG Y, et al. ORGAN:
observation-guided radiology report generation via tree
reasoning[C]//Proceedings of the 61st Annual Meeting of
the Association for Computational
Linguistics. 2023:8108-8122.
[12] SONG X, ZHANG X, JI J, et al. Cross-modal contrastive
attention model for medical report
generation[C]//Proceedings of the 29th International
Conference on Computational Linguistics.
2022:2388-2397.
[13] LIU F, WU X, GE S, et al. Exploring and distilling
posterior and prior knowledge for radiology report
generation[C]//Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition. 2021.
[14] JIN H B, CHE H X, LIN Y, et al. PromptMRG:
diagnosis-driven prompts for medical report
generation[C]//Proceedings of the Thirty-Eighth
AAAI Conference on Artificial
Intelligence. 2024, 38:2607-2615.
[15] KIRILLOV A, MINTUN E, RAVI N, et al. Segment
anything[J]. arXiv preprint arXiv:2304.02643, 2023.
[16] HE K, ZHANG X, REN S, et al. Deep residual learning
for image recognition[C]//Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition.
2016:770-778.
[17] JAIN S, AGRAWAL A, SAPORTA A, et al. RadGraph:
extracting clinical entities and relations from radiology
reports[J]. arXiv preprint arXiv:2106.14463, 2021.
[18] VASWANI A, SHAZEER N, PARMAR N, et al. Attention
is all you need[C]//Advances in Neural Information
Processing Systems. 2017:5998-6008.
[19] REIMERS N, GUREVYCH I. Sentence-BERT: sentence
embeddings using Siamese
BERT-networks[C]//Proceedings of the 2019 Conference
on Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on Natural
Language Processing (EMNLP-IJCNLP).
2019:3982-3992.
[20] LI J, HU Y, TAO H. A self-guided framework for
radiology report generation[C]//International Conference
on Medical Image Computing and Computer-Assisted
Intervention. 2022:588-598.
[21] DEMNER-FUSHMAN D, KOHLI M, ROSENMAN M,
et al. Preparing a collection of radiology examinations
for distribution and retrieval[J]. Journal of the American
Medical Informatics Association, 2016, 23(2):304-310.
[22] JOHNSON A, POLLARD T, BERKOWITZ S, et al.
MIMIC-CXR: a large publicly available database of
labeled chest radiographs[J]. arXiv preprint
arXiv:1901.07042, 2019.
[23] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a
method for automatic evaluation of machine
translation[C]//Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics.
2002:311-318.
[24] LIN C Y. ROUGE: a package for automatic evaluation of
summaries[C]//Text Summarization Branches Out.
2004:74-81.
[25] BANERJEE S, LAVIE A. METEOR: an automatic
metric for MT evaluation with improved correlation with
human judgments[C]//Proceedings of the ACL Workshop
on Intrinsic and Extrinsic Evaluation Measures for
Machine Translation and/or Summarization. 2005:65-72.
[26] DENG J, DONG W, SOCHER R, et al. ImageNet: a
large-scale hierarchical image database[C]//IEEE
Conference on Computer Vision and Pattern Recognition.
2009:248-255.
[27] LOSHCHILOV I, HUTTER F. Decoupled weight decay
regularization[C]//7th International Conference on
Learning Representations. 2019.
[28] YANG Y, YU J, ZHANG J, et al. Joint embedding of
deep visual and semantic features for medical image
report generation[J]. IEEE Transactions on Multimedia,
2021, 25: 167-178.
[29] ZHANG J, SHEN X, WAN S, et al. A novel deep
learning model for medical report generation by
inter-intra information calibration[J]. IEEE Journal of
Biomedical and Health Informatics, 2023, 27:
5110-5121.
[30] YANG X, WU X, GE S, et al. Radiology report
generation with a learned knowledge base and
multi-modal alignment[J]. Medical Image Analysis, 2023,
86:102798.
[31] ZHANG K, JIANG H, ZHANG J, et al. Semi-supervised
medical report generation via graph-guided hybrid
feature consistency[J]. IEEE Transactions on Multimedia,
2023, 26: 904-915.
[32] WANG Z, LIU L, WANG L, et al. R2GenGPT: radiology
report generation with frozen LLMs[J].
Meta-Radiology, 2023, 1(3):100033.
[33] JIN Y, CHEN W, TIAN Y, et al. Improving radiology
report generation with D2-Net: when diffusion meets
discriminator[C]//IEEE International Conference on
Acoustics, Speech and Signal
Processing. 2024:2215-2219.
[34] LIU Z, ZHU Z, ZHENG S, et al. From observation to
concept: A flexible multi-view paradigm for medical
report generation[J]. IEEE Transactions on Multimedia,
2024, 26: 5987-5995.
[35] CHEN W, LIU Y, WANG C, et al. Cross-modal causal
intervention for medical report generation[J]. arXiv
preprint arXiv:2303.09117, 2023.
[36] WANG X, WANG F, WANG B, et al. Activating
associative disease-aware vision token memory for
LLM-based x-ray report generation[J]. arXiv preprint
arXiv:2501.03458, 2025.
[37] TU T, AZIZI S, DRIESS D, et al. Towards generalist
biomedical AI[J]. arXiv preprint arXiv:2307.14334, 2023.