Multi-Representation Fusion Model for Molecular Property Prediction

doi:10.19678/j.issn.1000-3428.0252139

Abstract

Drug development is a complex and costly process with a low success rate. Molecular property prediction is a fundamental yet challenging task within it: accurate predictions can speed up drug development and reduce its cost. Advances in machine learning, and in deep learning in particular, have driven substantial progress in molecular property prediction. However, many existing methods rely on a single molecular representation, or combine several representations without exploiting the latent relationships among them. This study therefore proposes the Multi-Representation Fusion Model for Molecular Property Prediction (MRFP). MRFP introduces a molecular representation fusion algorithm that combines two different types of representation, molecular fingerprints and molecular graphs, into a more comprehensive and fine-grained molecular representation, which gives the property predictor a more informative input. In addition, to extract features from molecular graphs more effectively, we design a new graph readout module based on molecular characteristics, the Tri-Step Convolutional Readout Module (TCNN), which captures the information encoded in a molecular graph. Experiments on six classification datasets and three regression datasets from MoleculeNet show that MRFP improves the classification metrics by 2.8% on average and reduces the regression metrics by 0.47 on average. This work offers a new approach to molecular property prediction and supports molecular design and screening in drug development, with broad potential for application.
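The abstract does not describe the internals of the MRFP fusion algorithm or of TCNN, so the following is only a point of reference for the general idea it builds on: reduce a molecular graph's per-atom embeddings to one graph-level vector with a readout, then combine that vector with a fingerprint bit vector before a prediction head. Mean pooling and concatenation are generic baselines swapped in for illustration; they are not the paper's TCNN or fusion method. The function names, the toy inputs, and the logistic head `predict_proba` are all hypothetical.

```python
import math
from typing import List, Sequence


def mean_readout(node_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Collapse per-atom embeddings into one graph-level vector by averaging.

    Generic baseline readout; this is NOT the paper's TCNN module.
    """
    if not node_embeddings:
        raise ValueError("a molecular graph needs at least one atom")
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(atom[d] for atom in node_embeddings) / n for d in range(dim)]


def concat_fusion(fingerprint: Sequence[int], graph_vector: Sequence[float]) -> List[float]:
    """Fuse two representations by concatenation: [fingerprint bits || graph vector].

    Simple baseline only; MRFP's own fusion algorithm is not given in the abstract.
    """
    return [float(bit) for bit in fingerprint] + [float(x) for x in graph_vector]


def predict_proba(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical logistic head for a binary property label."""
    if len(features) != len(weights):
        raise ValueError("feature and weight lengths differ")
    logit = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy, hypothetical inputs: a 3-atom graph with 2-d atom embeddings
# and an 8-bit fingerprint.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fingerprint = [0, 1, 0, 0, 1, 0, 0, 1]

graph_vector = mean_readout(atoms)                     # both entries equal 2/3
fused = concat_fusion(fingerprint, graph_vector)       # 8 + 2 = 10 features
prob = predict_proba(fused, [0.0] * len(fused), 0.0)   # zero weights give 0.5
print(len(fused), prob)
```

In a trained model, the readout and the fusion step would be learned components (in MRFP, TCNN and the proposed fusion algorithm) rather than fixed averaging and concatenation; concatenation is shown here because it is a common baseline against which fusion schemes are compared.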

Kewei Zhang, Xin Wen, Wenhui Zhang, Rui Cao. Multi-Representation Fusion Model for Molecular Property Prediction[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252139.
