
Computer Engineering ›› 2026, Vol. 52 ›› Issue (5): 81-94. doi: 10.19678/j.issn.1000-3428.0253035

• Frontier Perspectives and Reviews •

Review on Deep Learning Model Architectures and Performance Evaluation Methods for Medical Image Segmentation

LI Hui¹, LIU Jiayu¹, XU Yaping²,*

  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
  2. Department of Orthopaedics, China-Japan Friendship Hospital, Beijing 100029, China
  • Received: 2025-09-18 Revised: 2025-11-18 Publication date: 2026-05-15 Release date: 2025-12-15
  • Corresponding author: XU Yaping
  • About the authors:

    LI Hui (CCF senior member), male, associate professor, Ph.D.; main research interest: computer graphics

    LIU Jiayu, master's student

    XU Yaping (corresponding author), associate chief nurse, master's degree

  • Funding:
    National High Level Hospital Clinical Research Funding (2025NHLHCRFHLA04)

Abstract:

Medical image segmentation localizes lesions or anatomical structures at the pixel level in multimodal imaging data and is a key task underpinning computer-aided diagnosis and clinical decision-making. Segmentation network architectures are evolving rapidly, while existing evaluation metrics suffer from semantic ambiguity and statistical instability. This review therefore systematically examines how network structure, task characteristics, and evaluation metrics fit one another, traces the development path and performance boundaries of current methods, and builds a structure-metric matching mechanism oriented toward practical application needs. Drawing on representative literature from the Web of Science Core Collection published between 2020 and 2025, it first reviews the design mechanisms and evolutionary pathways of backbone architectures such as Transformers, Graph Neural Networks (GNNs), and diffusion models, and then summarizes the key characteristics of lightweight, hybrid, and prompt-guided paradigms. Next, using empirical studies on public datasets, it quantitatively compares the segmentation performance of different architectures on typical tasks involving organs, tumors, and brain tissue, using common metrics including the 95% Hausdorff Distance (HD95), the Dice Similarity Coefficient (DSC), and Intersection over Union (IoU). The comparison shows that HD95 fluctuates considerably on tasks with complex boundaries, DSC is insufficiently sensitive to small targets, and IoU is limited in its ability to discriminate between structures. Finally, the review identifies the statistical causes of metric misuse and of mismatch between metrics and task characteristics, constructs a recommendation mapping from task structure to metrics, proposes a metric selection strategy based on task granularity, and discusses the potential of dynamic networks, self-supervised learning, and cross-modal modeling to improve model generalization.

Key words: medical image segmentation, deep learning, network architecture, evaluation metric system, task adaptation
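
For readers less familiar with the three metrics named in the abstract, the following is a minimal illustrative sketch of DSC, IoU, and HD95 on binary masks. It is not taken from the reviewed paper. The function names, the representation of a mask as a set of pixel coordinates, the empty-mask convention, and the toy masks are all assumptions made for illustration. HD95 has several variants in practice: this sketch pools both directed nearest-neighbour distance lists and takes a nearest-rank 95th percentile, and it measures distances over all foreground pixels rather than extracted surfaces, which is a simplification.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def iou(pred, truth):
    """Intersection over Union: |A ∩ B| / |A ∪ B|."""
    if not pred and not truth:
        return 1.0  # same assumed convention as above
    return len(pred & truth) / len(pred | truth)


def hd95(pred, truth):
    """95th-percentile Hausdorff distance (pooled, nearest-rank variant)."""
    if not pred or not truth:
        raise ValueError("HD95 is undefined when either mask is empty")

    def directed(a, b):
        # Distance from each pixel in a to its nearest pixel in b.
        return [min(math.dist(p, q) for q in b) for p in a]

    distances = sorted(directed(pred, truth) + directed(truth, pred))
    # Nearest-rank percentile: smallest value with at least 95% of data at or below it.
    k = max(0, math.ceil(0.95 * len(distances)) - 1)
    return distances[k]


# Toy example (hypothetical data): a 4x4 square and the same square shifted by one column.
truth = {(r, c) for r in range(4) for c in range(4)}
pred = {(r, c) for r in range(4) for c in range(1, 5)}

print(dice(pred, truth))  # 0.75
print(iou(pred, truth))   # 0.6
print(hd95(pred, truth))  # 1.0
```

On this toy pair the two overlap metrics and the distance metric describe the same one-pixel shift on different scales. This is one reason why reported scores depend on which metric, and which variant of it, is chosen.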