
Computer Engineering (计算机工程)



Research on Cancer Survival Prediction Based on Multi-level Optimal Transport

  • Published: 2025-11-13

Abstract: Deep learning-based survival prediction has advanced the integration of whole-slide images (WSI) and genomics, yet the ultra-high resolution of WSIs and the high dimensionality of transcriptomics pose substantial challenges for feature extraction and cross-modal fusion. Although prototype aggregation reduces computational burden by compressing tiles and gene expression profiles into morphological and pathway prototypes, two key bottlenecks remain: capturing fine-grained interactions between the prototypes of the two modalities, and addressing the pronounced representational heterogeneity between WSI morphological prototypes and genomic pathway prototypes. To tackle these issues, we propose a weakly supervised survival prediction model based on multi-level optimal transport (MOTSurv), comprising three synergistic innovations. First, a dual-modality prototype encoder (a pathology encoder with a Pyramid Position Encoding Generator, PPEG, and a pathway encoder that models intra-pathway dependencies) strengthens intra-modality structure while preserving modality specificity. Second, a cascaded multi-level optimal transport fusion mechanism performs coarse global alignment followed by refined matching with correction of mismatches, balancing alignment accuracy and information preservation. Third, an Orthogonal Disentanglement Module (ODM) enforces multi-level constraints (inter-modal specificity orthogonality, intra-modal specific-shared orthogonality, and global specific-shared orthogonality) to achieve explicit feature disentanglement and enhance interpretability. Experiments on the TCGA BLCA, BRCA, and LUAD datasets demonstrate that MOTSurv improves the C-index by an average of 4.22% over state-of-the-art methods. Ablation studies further validate the independent and synergistic contributions of each module, highlighting the model's comprehensive advantages in multimodal alignment, structured representation, and biological interpretability.
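
The abstract describes the optimal transport fusion only at a high level. As an illustration of the kind of operation involved (not the authors' implementation), the sketch below aligns a set of morphological prototypes with a set of pathway prototypes via entropy-regularized optimal transport solved with Sinkhorn iterations; all tensor names, dimensions, and hyperparameters here are assumptions made for the example.

```python
import torch

def sinkhorn_alignment(path_protos, gene_protos, eps=0.1, n_iter=50):
    """Entropy-regularized OT between two prototype sets.

    path_protos: (M, d) morphological prototypes from the WSI branch.
    gene_protos: (N, d) pathway prototypes from the transcriptomics branch.
    Returns the transport plan (M, N) and the pathway prototypes
    re-expressed in pathology-prototype space (a coarse global alignment).
    """
    # Cost matrix: 1 - cosine similarity between every prototype pair.
    p = torch.nn.functional.normalize(path_protos, dim=-1)
    g = torch.nn.functional.normalize(gene_protos, dim=-1)
    cost = 1.0 - p @ g.t()                      # (M, N)

    # Uniform marginals over the two prototype sets.
    M, N = cost.shape
    a = torch.full((M,), 1.0 / M)
    b = torch.full((N,), 1.0 / N)

    # Standard Sinkhorn iterations on the Gibbs kernel.
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = torch.diag(u) @ K @ torch.diag(v)    # (M, N) transport plan

    # Barycentric mapping: each pathology prototype as a weighted
    # mixture of pathway prototypes under the transport plan.
    aligned_gene = (plan / plan.sum(dim=1, keepdim=True)) @ gene_protos
    return plan, aligned_gene

# Example with assumed sizes: 16 morphological and 50 pathway prototypes.
plan, aligned = sinkhorn_alignment(torch.randn(16, 256), torch.randn(50, 256))
```

A cascaded scheme in the spirit of the paper would run such an alignment first on coarse (global) prototype summaries and then again on finer prototype sets, using the first plan to constrain or correct the second; the details of that cascade are specific to MOTSurv and are not reproduced here.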
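The Orthogonal Disentanglement Module is likewise only named in the abstract. As a rough sketch of how multi-level orthogonality penalties of this kind are commonly imposed (an assumption, not the paper's definition), one can penalize the cross-correlations between modality-specific and shared embeddings at the three levels the abstract lists:

```python
import torch

def orth_penalty(x, y):
    """Mean squared cross-correlation between two batches of embeddings;
    approaches zero when the two subspaces are orthogonal."""
    x = torch.nn.functional.normalize(x, dim=-1)
    y = torch.nn.functional.normalize(y, dim=-1)
    return (x @ y.t()).pow(2).mean()

def disentanglement_loss(path_spec, gene_spec, path_shared, gene_shared):
    """Illustrative multi-level orthogonality constraints: inter-modal
    specific vs. specific, intra-modal specific vs. shared, and global
    specific vs. shared. All argument names are hypothetical."""
    specific = torch.cat([path_spec, gene_spec], dim=0)
    shared = torch.cat([path_shared, gene_shared], dim=0)
    return (orth_penalty(path_spec, gene_spec)        # inter-modal specificity
            + orth_penalty(path_spec, path_shared)    # intra-modal, pathology
            + orth_penalty(gene_spec, gene_shared)    # intra-modal, genomics
            + orth_penalty(specific, shared))         # global specific vs. shared
```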
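For reference, the concordance index (C-index) reported in the experiments counts, over all comparable patient pairs under right censoring, the fraction whose predicted risks are ordered consistently with their observed survival times. A minimal implementation is sketched below; it is not the evaluation code used in the paper.

```python
import numpy as np

def concordance_index(times, events, risks):
    """C-index for right-censored survival data.

    times:  observed time for each patient.
    events: 1 if the event (death) was observed, 0 if censored.
    risks:  predicted risk scores (higher = shorter expected survival).
    """
    concordant, comparable = 0.0, 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable only if patient i had an observed event
            # strictly before patient j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy example: the highest-risk patient dies first, so the C-index is 1.0.
print(concordance_index(np.array([2.0, 5.0, 8.0]),
                        np.array([1, 1, 0]),
                        np.array([0.9, 0.4, 0.1])))
```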