Computer Engineering (计算机工程)

Multi-Representation Fusion Model for Molecular Property Prediction

Online: 2025-06-20; Published: 2025-06-20

Abstract: Drug development is a complex and costly process with a low success rate. Molecular property prediction is a fundamental yet challenging task in this process: accurate predictions can accelerate development and reduce its cost. Advances in machine learning, particularly deep learning, have driven substantial progress in molecular property prediction. However, many existing methods rely on a single molecular representation, or do not fuse multiple representations by exploiting the latent correlations among them. This study therefore proposes a new molecular property prediction method, the Multi-Representation Fusion Model for Molecular Property Prediction (MRFP). MRFP introduces a molecular representation fusion algorithm that combines two distinct types of representation, molecular fingerprints and molecular graphs, to produce a more comprehensive and fine-grained molecular representation, which in turn provides a more informative input for property prediction. In addition, to better extract features from molecular graphs, we design a new graph readout module informed by molecular characteristics, the Tri-Step Convolutional Readout Module (TCNN), which effectively captures the information encoded in a molecular graph. Experiments on six classification datasets and three regression datasets from MoleculeNet demonstrate the effectiveness of the method: classification metrics improve by 2.8% on average, and regression metrics decrease by 0.47 on average. Beyond offering a new approach to molecular property prediction, this work supports molecular design and screening in drug development and has broad potential for application.
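To make the idea of fusing a fingerprint with a graph representation concrete, here is a minimal sketch under stated assumptions; it is not MRFP's fusion algorithm or its TCNN readout, neither of which is specified in this abstract. The names `toy_fingerprint`, `graph_embedding`, `fuse` and `VOCAB` are hypothetical. The fingerprint is a toy hashed bit vector standing in for real fingerprints such as Morgan/ECFP, the graph branch uses one round of sum message passing followed by a plain mean readout (a placeholder where MRFP would use TCNN), and fusion is simple concatenation, a common baseline rather than the paper's method.

```python
import zlib
from typing import List, Sequence, Tuple

Edge = Tuple[int, int]

# Hypothetical atom vocabulary for one-hot node features (illustration only).
VOCAB = ["C", "N", "O", "S", "Cl"]


def toy_fingerprint(atoms: Sequence[str], edges: Sequence[Edge], n_bits: int = 16) -> List[float]:
    """Toy hashed fingerprint (a stand-in for Morgan/ECFP bits, not equivalent to them).

    Sets one bit per atom symbol and one bit per bonded atom pair. crc32 is used
    instead of hash() so the result is deterministic across Python runs.
    """
    bits = [0.0] * n_bits
    tokens = list(atoms) + ["-".join(sorted((atoms[i], atoms[j]))) for i, j in edges]
    for tok in tokens:
        bits[zlib.crc32(tok.encode()) % n_bits] = 1.0
    return bits


def one_hot(symbol: str) -> List[float]:
    """One-hot encode an atom symbol over VOCAB (all zeros if unknown)."""
    return [1.0 if symbol == v else 0.0 for v in VOCAB]


def graph_embedding(atoms: Sequence[str], edges: Sequence[Edge]) -> List[float]:
    """One round of sum message passing, then a mean readout over atoms."""
    h = [one_hot(a) for a in atoms]
    neighbors: List[List[int]] = [[] for _ in atoms]
    for i, j in edges:  # bonds are undirected
        neighbors[i].append(j)
        neighbors[j].append(i)
    # New atom state = own features + sum of neighbor features.
    h_new = [
        [h[v][k] + sum(h[u][k] for u in neighbors[v]) for k in range(len(VOCAB))]
        for v in range(len(atoms))
    ]
    # Mean readout: average atom states into a single graph-level vector.
    n = len(atoms)
    return [sum(row[k] for row in h_new) / n for k in range(len(VOCAB))]


def fuse(fingerprint: List[float], graph_vec: List[float]) -> List[float]:
    """Simplest possible fusion: concatenate the two representations."""
    return fingerprint + graph_vec


if __name__ == "__main__":
    # Ethanol heavy-atom graph: C-C-O
    atoms, edges = ["C", "C", "O"], [(0, 1), (1, 2)]
    fused = fuse(toy_fingerprint(atoms, edges), graph_embedding(atoms, edges))
    print(len(fused), fused[16:])
```

For ethanol the graph vector is [5/3, 0, 2/3, 0, 0] and the fused vector has 16 + 5 = 21 entries. Concatenation treats the two views independently; a fusion scheme that models correlations between them, as the abstract describes for MRFP, would replace `fuse` with a learned interaction.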