Computer Engineering ›› 2022, Vol. 48 ›› Issue (12): 72-78. doi: 10.19678/j.issn.1000-3428.0063069

• Artificial Intelligence and Pattern Recognition •

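  • About the authors: ZHANG Yi (born 1977), female, professor, Ph.D.; her main research interests are bioinformatics, machine learning, and service computing. ZHENG Jing, CAI Gangsheng, and WANG Zhenmei are master's students.
  • Funding:
    National Natural Science Foundation of China (62166014); General Program of the Guangxi Natural Science Foundation (2020GXNSFAA297255); Project of the Guangxi Key Laboratory of Embedded Technology and Intelligent System (2019-01-06).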

Association Prediction Based on Duplex Polymerize Operation in GAT and Inductive Matrix Completion

ZHANG Yi1,2, ZHENG Jing1, CAI Gangsheng1, WANG Zhenmei1   

  1. School of Information Science and Engineering, Guilin University of Technology, Guilin, Guangxi 541004, China;
  2. Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin, Guangxi 541004, China
  • Received: 2021-10-27 Revised: 2022-01-04 Published: 2022-01-17

Abstract: Computational models can effectively replace biological experiments in predicting long non-coding RNA (lncRNA)-disease associations. However, because the known association data are sparse, the prediction accuracy of existing models remains low. To address this limitation, a Dual Fusion Mechanism Prediction model for lncRNA-Disease Association (DFMP-LDA) is proposed, based on the Graph Attention Network (GAT) and Inductive Matrix Completion (IMC). In the first step, a multi-head attention mechanism is introduced to design a GAT with duplex polymerizers (dual aggregators), which enhances the features of lncRNA nodes and disease nodes and thereby mitigates the loss of accuracy caused by data sparsity. In the second step, because a traditional GAT cannot be applied directly to predicting potential lncRNA-disease associations, IMC is introduced to reconstruct the lncRNA-disease association network from the enhanced node features, further improving prediction accuracy. The results of 5-fold cross-validation show that DFMP-LDA predicts lncRNA-disease associations with an AUC of 0.9322 and an AUPR of 0.7705, and reduces time cost by 33.89%, 32.17%, and 16.12% compared with the DMF-LDA, SDLDA, and TPGLDA models, respectively, indicating better prediction performance.

Key words: Graph Attention Network (GAT), Inductive Matrix Completion (IMC), association prediction, duplex polymerizer, feature enhancement
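
The second step described in the abstract, reconstructing the lncRNA-disease association matrix from node features, can be illustrated with a minimal NumPy sketch of inductive matrix completion. This is not the authors' implementation: it assumes a squared-error loss with L2 regularization and a low-rank model A ≈ X U Vᵀ Yᵀ trained by plain gradient descent, whereas the abstract does not state DFMP-LDA's objective, optimizer, or hyperparameters. The function names `imc_fit` and `imc_scores`, the toy data, and all parameter values are hypothetical, and random row-normalized features stand in for the GAT-enhanced features.

```python
import numpy as np


def imc_fit(A, X, Y, rank=3, lam=0.1, lr=0.005, epochs=800, seed=0):
    """Fit A ~ X @ U @ V.T @ Y.T by gradient descent (assumed objective).

    A: (n_l, n_d) known association matrix (1 = known association, 0 = unknown)
    X: (n_l, k_l) lncRNA node features
    Y: (n_d, k_d) disease node features
    Returns the projection matrices U, V and the loss recorded at each epoch.
    """
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((X.shape[1], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    losses = []
    for _ in range(epochs):
        P = X @ U            # projected lncRNA features, (n_l, rank)
        Q = Y @ V            # projected disease features, (n_d, rank)
        R = P @ Q.T - A      # residual of the current reconstruction
        reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
        losses.append(0.5 * np.sum(R ** 2) + reg)
        grad_U = X.T @ R @ Q + lam * U
        grad_V = Y.T @ R.T @ P + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V, losses


def imc_scores(X, Y, U, V):
    """Predicted association score for every lncRNA-disease pair."""
    return (X @ U) @ (Y @ V).T


# Toy data (hypothetical sizes): 40 lncRNAs, 30 diseases, 5-dim features.
rng = np.random.default_rng(42)
X = rng.standard_normal((40, 5))
Y = rng.standard_normal((30, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # row-normalize features
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
A = (rng.random((40, 30)) < 0.1).astype(float)  # sparse known associations

U, V, losses = imc_fit(A, X, Y)
scores = imc_scores(X, Y, U, V)
```

Because the scores depend only on node features and the learned projections, a pair with no known association still receives a score, which is what makes the completion "inductive". In a cross-validation setting such as the 5-fold protocol mentioned above, held-out entries of `A` would be zeroed before fitting and then ranked by `scores`.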

CLC number: