
计算机工程


医学图像分割:深度学习模型架构与性能评估方法综述

  • 发布日期:2025-12-15

Review on Deep Learning Model Architectures and Performance Evaluation Methods for Medical Image Segmentation

  • Published: 2025-12-15

摘要: 医学图像分割在多模态成像数据中实现病灶或结构的像素级定位,是支撑辅助诊断与临床决策的关键任务。针对医学图像分割网络架构快速演化与评价指标存在的语义歧义、统计不稳等局限,本文旨在系统梳理网络结构、任务特征和评价指标三者间的适配关系,揭示方法发展路径与性能边界,构建面向实际应用需求的结构—指标匹配机制。基于2020—2025年Web of Science核心数据库的代表性文献,本文首先梳理 Transformer、图神经网络、扩散模型等主干架构的设计机制与演化路径,再总结轻量化、混合结构及提示引导范式的关键特征。其次,结合公开数据集实证研究,对不同网络结构在器官、肿瘤与脑组织等典型任务中的分割性能进行定量对比,涵盖DSC、HD95等常用指标,识别出HD95在边界复杂任务中波动较大,DSC对小目标敏感性不足,IoU在结构区分方面存在局限等问题。本文进一步揭示了指标误用与任务特征不匹配的统计根源,构建了任务结构–指标推荐映射,提出基于任务粒度的指标选择策略,并探讨动态网络、自监督学习与跨模态建模等方向对模型泛化能力的潜在促进作用。

Abstract: Medical image segmentation provides pixel-level localization of lesions and anatomical structures in multimodal imaging data and is a key task supporting computer-aided diagnosis and clinical decision-making. Network architectures for this task are evolving rapidly, while commonly used evaluation metrics suffer from semantic ambiguity and statistical instability. This paper therefore systematically examines how network structure, task characteristics, and evaluation metrics fit one another, traces the development path and performance boundaries of existing methods, and establishes a structure-metric matching mechanism tailored to practical clinical needs. Drawing on representative literature from the Web of Science Core Collection published between 2020 and 2025, it first reviews the design mechanisms and evolutionary pathways of backbone architectures such as Transformers, graph neural networks (GNNs), and diffusion models, and then summarizes the key characteristics of lightweight, hybrid, and prompt-guided paradigms. Next, building on empirical studies on public datasets, it quantitatively compares the segmentation performance of different architectures on typical organ, tumor, and brain-tissue tasks, using common metrics such as the Dice similarity coefficient (DSC) and the 95th-percentile Hausdorff distance (HD95). The comparison indicates that HD95 fluctuates considerably on tasks with complex boundaries, that DSC is insufficiently sensitive to small targets, and that intersection over union (IoU) is limited in its ability to discriminate between structures. The paper further identifies the statistical roots of metric misuse and of mismatches between metrics and task characteristics, constructs a mapping from task and structure to recommended metrics, proposes a metric selection strategy based on task granularity, and discusses how dynamic networks, self-supervised learning, and cross-modal modeling may improve model generalization.
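
For readers less familiar with the three metrics named in the abstract, the following is a minimal illustrative sketch, not the evaluation code used in the reviewed studies. It assumes binary masks represented as sets of (row, column) foreground pixels, and it computes HD95 over all foreground pixels with a linear-interpolation percentile. Published toolkits often restrict HD95 to boundary pixels or use a different percentile convention, so values can differ between implementations. All function names here are hypothetical.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient (DSC) between two sets of foreground pixels."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks count as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def iou(pred, truth):
    """Intersection over union (IoU, Jaccard index) between two pixel sets."""
    union = pred | truth
    if not union:
        return 1.0  # same assumed convention as in dice()
    return len(pred & truth) / len(union)


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def hd95(pred, truth):
    """Symmetric 95th-percentile Hausdorff distance, in pixel units.

    Assumption: distances are taken over all foreground pixels rather than
    boundary pixels only, and the two directed percentiles are combined by max.
    """
    if not pred or not truth:
        return math.inf  # undefined when one mask is empty; infinity is one convention
    d_pred_to_truth = [min(math.dist(p, t) for t in truth) for p in pred]
    d_truth_to_pred = [min(math.dist(t, p) for p in pred) for t in truth]
    return max(percentile(d_pred_to_truth, 95), percentile(d_truth_to_pred, 95))


# Toy example: a 2x2 ground-truth square and a prediction shifted right by one pixel.
truth = {(0, 0), (0, 1), (1, 0), (1, 1)}
pred = {(0, 1), (0, 2), (1, 1), (1, 2)}

dsc_value = dice(pred, truth)   # 0.5
iou_value = iou(pred, truth)    # 1/3
hd95_value = hd95(pred, truth)  # 1.0
```

For a single prediction and ground-truth pair, DSC and IoU are linked by IoU = DSC / (2 - DSC), so they rank individual cases identically. HD95 measures boundary displacement and can behave quite differently from the two overlap metrics.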