
Computer Engineering

   

Generating explanation subgraphs for synthetic lethality prediction

  

Published: 2025-12-12


Abstract: Predicting synthetic lethality (SL) interactions holds significant promise for anticancer drug discovery. However, existing interpretable SL prediction methods typically generate a fixed number of explanatory subgraphs for each gene pair. This limits their ability to capture the inherent diversity of SL mechanisms. In this study, we propose DiSE4SL, a model that formulates the generation of explanatory subgraphs as a stochastic process in function space, thereby addressing the key challenge of adaptively determining the number of explanatory patterns. DiSE4SL is built on the neural process framework and first uses a base SL predictor to obtain prediction scores and node embeddings for gene pairs. A context encoder then fuses structural features with predictive semantics into a unified vector representation. This representation parameterizes the conditional posterior of a Gaussian mixture model (GMM), so that distinct explanatory patterns map to different Gaussian components. During training, latent variables are sampled with the Gumbel-Softmax mechanism, and mode-aware attention weights sparsify local subgraphs to produce explanations. In addition, a contrastive loss and Lipschitz regularization encourage the components to learn explanatory patterns that are discriminative across components yet smooth. Finally, DiSE4SL samples latent variables and clusters them without a preset number of clusters, which lets it adaptively extract multiple explanatory subgraphs for each gene pair. Experiments on benchmark datasets validate the effectiveness of DiSE4SL. It achieves predictive performance competitive with the strongest baseline (AUPR 0.9337) without sacrificing accuracy. It also improves explanation diversity and fidelity by 29.1% and 9.5%, respectively, over the second-best method.
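The abstract does not give DiSE4SL's actual encoder, attention form or clustering algorithm. The following is only a minimal NumPy sketch of the sampling-then-clustering idea it describes, built on several assumptions:

- The context encoder's output is stood in for by toy arrays (`logits`, `means`, `log_stds`) defining a GMM posterior.
- The mode-aware attention is a hypothetical bilinear score `sigmoid(e^T W z)` followed by top-k edge selection.
- The clustering step is a simple radius-based connected-components procedure, used as a stand-in because it needs no preset cluster count. It is not necessarily the authors' choice.

All function names (`gumbel_softmax`, `sample_latent`, `edge_mask`, `cluster_without_k`, `explain`) are illustrative. The contrastive and Lipschitz losses are omitted.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng, hard=False):
    """Relaxed one-hot sample over K mixture components (Gumbel-Softmax).

    hard=True returns the exact one-hot argmax (Gumbel-Max trick); this sketch
    uses it at extraction time, while a soft sample would be used in training.
    """
    u = rng.uniform(1e-10, 1.0, size=logits.shape)
    y = (logits - np.log(-np.log(u))) / tau      # add Gumbel(0, 1) noise, scale by temperature
    if hard:
        return np.eye(len(logits))[np.argmax(y)]
    e = np.exp(y - y.max())                      # numerically stable softmax
    return e / e.sum()


def sample_latent(logits, means, log_stds, tau, rng, hard=False):
    """Draw z from a GMM posterior: choose a component, then reparameterize a Gaussian."""
    w = gumbel_softmax(logits, tau, rng, hard)   # (K,) component weights
    eps = rng.standard_normal(means.shape)       # (K, D) standard normal noise
    comps = means + np.exp(log_stds) * eps       # one reparameterized draw per component
    return w @ comps                             # (D,) weighted combination of the draws


def edge_mask(edge_feats, z, W, keep_ratio):
    """Hypothetical mode-aware attention: score each edge against z, keep the top fraction."""
    scores = 1.0 / (1.0 + np.exp(-(edge_feats @ W @ z)))   # (E,) sigmoid attention weights
    k = max(1, int(round(keep_ratio * len(scores))))
    mask = np.zeros(len(scores), dtype=bool)
    mask[np.argsort(-scores)[:k]] = True                     # sparsify: keep k highest-scoring edges
    return scores, mask


def cluster_without_k(points, radius):
    """Connected components of the graph linking points closer than `radius`.

    The number of clusters is determined by the data, not fixed in advance.
    """
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    labels = np.full(n, -1)
    n_clusters = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = n_clusters
        stack = [i]
        while stack:                             # depth-first flood fill of one component
            j = stack.pop()
            for nb in np.flatnonzero((dist[j] < radius) & (labels < 0)):
                labels[nb] = n_clusters
                stack.append(nb)
        n_clusters += 1
    return labels, n_clusters


def explain(logits, means, log_stds, edge_feats, W, n_samples=200,
            radius=1.0, keep_ratio=0.3, seed=0):
    """Sample latents, cluster them, and return one edge mask (explanation) per cluster."""
    rng = np.random.default_rng(seed)
    zs = np.stack([sample_latent(logits, means, log_stds, 0.5, rng, hard=True)
                   for _ in range(n_samples)])
    labels, n_clusters = cluster_without_k(zs, radius)
    return [edge_mask(edge_feats, zs[labels == c].mean(axis=0), W, keep_ratio)[1]
            for c in range(n_clusters)]


# Toy usage: all values below are made up for illustration.
rng = np.random.default_rng(42)
K, D, E, F = 3, 4, 20, 8
means = 5.0 * np.eye(K, D)                        # three well-separated components
log_stds = np.full((K, D), np.log(0.1))
edge_feats = rng.standard_normal((E, F))          # features of a 20-edge local subgraph
W = rng.standard_normal((F, D))

multi = explain(np.zeros(K), means, log_stds, edge_feats, W)                     # flat posterior
single = explain(np.array([10.0, -10.0, -10.0]), means, log_stds, edge_feats, W)  # one dominant mode
print(len(multi), len(single))
```

In this toy setting, the flat posterior should yield three explanations and the sharply peaked one a single explanation. This illustrates how the number of explanatory subgraphs can follow the posterior's modes rather than a preset count.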
