Computer Engineering ›› 2023, Vol. 49 ›› Issue (8): 69-76. doi: 10.19678/j.issn.1000-3428.0065072

• Artificial Intelligence and Pattern Recognition •

Functional Module Detection Based on Deep Network Embedding of Edge Weighing Information in PPIN

Zeshui LI, Junzhong JI*, Cuicui YANG

  1. Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing 100124, China
  • Received: 2022-06-24 Online: 2023-08-15 Published: 2023-08-15
  • Contact: Junzhong JI
  • About the authors:

    Zeshui LI (born 1997), male, master's student; main research interests: machine learning and bioinformatics

    Cuicui YANG, associate professor, Ph.D.

  • Funding:
    National Natural Science Foundation of China (61375059)

Abstract:

Existing network-embedding-based methods for detecting functional modules in Protein-Protein Interaction Networks (PPINs) usually embed only protein node information and ignore the edge weight information between proteins, which degrades the quality of the detected modules. To address this problem, a functional module detection method based on deep network embedding of edge weight information in PPINs is proposed. Combining the topological structure of the PPIN with attribute information from Gene Ontology (GO), the method measures first-order edge weight information between proteins using the attention coefficients of a Graph Attention Network (GAT) and embeds it through neighborhood aggregation. The forget and input gates of a Long Short-Term Memory (LSTM) network are then used to measure high-order edge weight information between proteins, and this information is embedded as well. From the low-dimensional vectors produced by the embedding, a core-attachment clustering algorithm mines core cliques and adds attachment proteins to them, yielding the final protein functional modules. Experimental results on the Collins, Gavin, and Krogan protein datasets show that, compared with core-attachment-based functional module detection methods such as COACH, the proposed method improves accuracy and F1 score by up to 18.1 and 12.9 percentage points, respectively.

Key words: Protein-Protein Interaction Network (PPIN), functional module detection, deep learning, network embedding, core attachment clustering
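
As a rough illustration of the first-order step described in the abstract, the sketch below computes attention coefficients in the standard single-head GAT formulation and uses them as edge weights during neighborhood aggregation. It is not the authors' implementation: the function name `gat_edge_weights`, the toy four-protein graph, the random feature matrix standing in for GO attributes, and the parameters `W` and `a` are all assumptions made for illustration.

```python
import numpy as np

def gat_edge_weights(adj, feats, W, a, slope=0.2):
    """One single-head GAT-style layer (illustrative sketch).

    adj   : (n, n) 0/1 adjacency matrix of the PPIN
    feats : (n, d) protein attribute vectors (hypothetical GO-derived features)
    W     : (d, k) projection matrix
    a     : (2k,) attention vector
    Returns (alpha, embedding): attention coefficients and aggregated vectors.
    """
    n = adj.shape[0]
    h = feats @ W                                # projected features, shape (n, k)
    k = h.shape[1]
    # e_ij = LeakyReLU(a^T [h_i || h_j]); the dot product splits into two parts
    src = h @ a[:k]                              # contribution of node i
    dst = h @ a[k:]                              # contribution of node j
    e = src[:, None] + dst[None, :]              # raw scores, shape (n, n)
    e = np.where(e > 0, e, slope * e)            # LeakyReLU
    # keep only existing edges plus self-loops, then softmax over each row
    mask = (adj > 0) | np.eye(n, dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    w = np.exp(e)                                # masked entries become 0
    alpha = w / w.sum(axis=1, keepdims=True)     # first-order edge weights
    return alpha, alpha @ h                      # neighborhood aggregation

# Toy PPIN with four proteins; all values below are made up.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
feats = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 3))
a = rng.normal(size=6)
alpha, emb = gat_edge_weights(adj, feats, W, a)
```

Each row of `alpha` sums to 1, and entries for protein pairs with no interaction are zero, so `alpha[i, j]` can be read as the weight placed on edge (i, j) when protein i aggregates its neighbors. Here `W` and `a` are random; in a trained model they would be learned.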