
Computer Engineering ›› 2021, Vol. 47 ›› Issue (6): 299-304. doi: 10.19678/j.issn.1000-3428.0057626

• Development Research and Engineering Application •

Disease Association Prediction Algorithm Using GAN-Based Heterogeneous Network Representation Learning

GUO Mengjie, XIONG Yun

  1. Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai 201203, China
  • Received: 2020-03-09 Revised: 2020-05-13 Published: 2020-05-21
  • About the authors: GUO Mengjie (born 1994), female, master's student, whose main research interest is data mining; XIONG Yun, professor and doctoral supervisor.
  • Funding: National Natural Science Foundation of China (U1936213, U1636207); Development Fund of the Shanghai Science and Technology Commission (19511121204, 19DZ1200802); Shanghai Science and Technology Innovation Action Plan Project (18511107800).
  • Contact: E-mail: mjguo17@fudan.edu.cn

Abstract: Analyzing the associations between diseases and biological entities such as genes and miRNAs is an important goal of biological research. However, biological experiments over massive data are prohibitively expensive. This paper proposes an association prediction algorithm based on network representation learning. A biological heterogeneous network is constructed by integrating multi-source datasets, and on this basis a heterogeneous network representation learning algorithm based on a Generative Adversarial Network (GAN) is proposed to learn robust vector representations. Both the discriminator and the generator take the relations in the network into account to capture rich heterogeneous semantic information, and they are trained through adversarial learning. Associations between diseases and genes, and between diseases and miRNAs, are then predicted by measuring the similarity between entity vectors. Experimental results show that, compared with HSSVM, GAN, and other algorithms, the proposed algorithm achieves the highest AUC on both association prediction tasks, and that introducing more heterogeneous data for training further improves its performance.

Keywords: heterogeneous network, network representation learning, disease association prediction, Generative Adversarial Network (GAN), adversarial learning
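The final prediction step described in the abstract (scoring disease-entity pairs by the similarity of their learned vectors and evaluating with AUC) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not state which similarity measure is used, so cosine similarity is swapped in, and the embeddings, entity names, and labels are hypothetical toy values rather than data or results from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical embeddings standing in for vectors learned on the
# heterogeneous network; the values are made up for this example.
embeddings = {
    "disease:D1": [0.9, 0.1, 0.0],
    "gene:G1": [0.8, 0.2, 0.1],
    "gene:G2": [0.0, 0.1, 0.9],
    "mirna:M1": [0.7, 0.3, 0.0],
    "mirna:M2": [0.1, 0.0, 1.0],
}

# (disease, entity, label) triples; label 1 marks a known association.
candidate_pairs = [
    ("disease:D1", "gene:G1", 1),
    ("disease:D1", "gene:G2", 0),
    ("disease:D1", "mirna:M1", 1),
    ("disease:D1", "mirna:M2", 0),
]

scores = [
    cosine_similarity(embeddings[d], embeddings[e]) for d, e, _ in candidate_pairs
]
labels = [y for _, _, y in candidate_pairs]
toy_auc = auc(labels, scores)
print(f"AUC on toy pairs: {toy_auc:.3f}")
```

The pairwise AUC here is quadratic in the number of candidate pairs, which is fine for a toy example; a rank-based computation would be the usual choice at realistic scale.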

CLC Number: