作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2022, Vol. 48 ›› Issue (2): 65-71,78. doi: 10.19678/j.issn.1000-3428.0060360

• 人工智能与模式识别 • 上一篇    下一篇

基于层次聚类的生物网络全局比对算法

田盼盼1, 陈璟1,2   

  1. 1. 江南大学 人工智能与计算机学院, 江苏 无锡 214122;
    2. 江南大学 江苏省模式识别与计算智能工程实验室, 江苏 无锡 214122
  • 收稿日期:2020-12-22 修回日期:2021-02-08 发布日期:2021-02-25
  • 作者简介:田盼盼(1995-),女,硕士研究生,主研方向为复杂网络;陈璟(通信作者),副教授、博士。
  • 基金资助:
    江苏省青年科学基金(BK20150159)。

Global Biological Network Alignment Algorithm Based on Hierarchical Clustering

TIAN Panpan1, CHEN Jing1,2   

  1. 1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu 214122, China;
    2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computing Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China
  • Received:2020-12-22 Revised:2021-02-08 Published:2021-02-25

摘要: 生物网络比对是研究生物进化过程的重要手段,不同物种间的比对不仅有助于理解物种的知识转移,同时也有助于进行功能预测和检测保守功能成分。然而,现有比对算法很难实现拓扑度量和生物度量同时最优。设计JAlign算法,将拓扑相似性与归一化序列相似性相结合构成目标函数,基于种子-扩展算法和模块检测进行全局比对。在种子筛选阶段,利用Jerarca聚类算法划分功能模块,借助目标函数计算模块间的相似性进行最优模块匹配,并从匹配结果中提取部分节点对作为种子节点。在扩展阶段,将比对从种子节点扩展至其邻居节点,在选择节点对进行扩展比对时综合考虑节点之间的连接关系、度差值、节点相似性等因素。在此基础上,为避免遗漏分散节点,找到剩余未匹配的节点构建二分图,以贪心方式进行最大加权二分图匹配,并将匹配结果合并到比对集合中,完成最终匹配。实验结果表明,JAlign算法能够实现拓扑度量和生物度量的良好平衡,其边正确性指标、诱导保守子结构得分、对称子结构得分和生物质量使用功能一致性指标均优于L-GRAAL、SPINAL和ModuleAlign算法,在时间效率上也具有优势。

关键词: 蛋白质相互作用网络, 网络比对, 层次聚类, 功能模块检测, 种子-扩展算法

Abstract: Biological network alignment is an important means to study the process of biological evolution.The comparison between different species assists not only in understanding the knowledge transfer between species, but also in functional prediction and conserved functional component detection.However, existing comparison algorithms are usually unable to achieve optimal topological measure and biological measure at the same time.This paper presents design of the JAlign algorithm, which combines topological similarity and normalized sequence similarity to form the objective function, and performs global alignment based on the seed-and-extend algorithm and module detection.In the seed selection stage, the Jerarca clustering algorithm is used to divide the functional modules.The similarity between the modules is calculated by using the objective function to perform optimal module matching, and some node pairs are extracted from the matching results as seed nodes.In the extension stage, the comparison extends from the seed node to its neighbor nodes.When selecting the node pair for extended comparison, the conection relationship between nodes, degree difference, node similarity and other factors are comprehensively considered.On this basis, in order to avoid missing scattered nodes, the remaining unmatched nodes are found to build a bipartite graph.The maximum weighted bipartite graph is matched in the greedy way, and the matching results are merged into the comparison set to complete the final matching.The experimental results show that the JAlign algorithm can achieve a good balance between topological measure and biological measure.It provides better results in Edge Correctness(EC) index, Induced Con-served-structure Score(ICS), Symmetric Sub-structure Score(S3) and Functional Coherence(FC) index than L-GRAAL, SPINAL and ModuleAlign algorithms.The proposed algorithm also displays advantages in time efficiency.

Key words: Protein-Protein Interaction(PPI) network, network alignment, hierarchical clustering, functional module detection, seed-and-extend algorithm

中图分类号: