作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2025, Vol. 51 ›› Issue (3): 197-207. doi: 10.19678/j.issn.1000-3428.0068477

• 体系结构与软件技术 • 上一篇    下一篇

面向大规模动态图的异构图计算系统设计

张明1,2, 郭文康1, 王海峰1,2,*()   

  1. 1. 临沂大学信息科学与工程学院, 山东 临沂 276002
    2. 山东省网络重点实验室临沂大学研究所, 山东 临沂 214215
  • 收稿日期:2023-09-27 出版日期:2025-03-15 发布日期:2025-03-25
  • 通讯作者: 王海峰
  • 基金资助:
    山东省自然科学基金面上项目(ZR2023MF090); 山东省科技型中小企业创新能力提升工程项目(2023TSGC0449); 山东省高等学校青创团队引育计划(2021QCYY003); 山东省重点研发资助项目(2019GGX101003)

Design of Heterogeneous Graph Computing System for Large-Scale Dynamic Graph

ZHANG Ming1,2, GUO Wenkang1, WANG Haifeng1,2,*()   

  1. 1. School of Information Science and Engineering, Linyi University, Linyi 276002, Shandong, China
    2. Shandong Provincial Network Key Laboratory Linyi University Research Institute, Linyi 214215, Shandong, China
  • Received:2023-09-27 Online:2025-03-15 Published:2025-03-25
  • Contact: WANG Haifeng

摘要:

图形处理器(GPU)异构集群中处理大规模动态图时GPU计算资源未被充分利用, 并且面向GPU的图划分方法存在局限性导致出现性能瓶颈。为提高图计算系统性能, 提出一种中央处理器(CPU)/GPU分布式异构图计算系统引擎(DH-Engine), 用于提升异构处理器的计算性能。首先提出新的异构图分割算法, 该分割算法以流式图划分为核心, 通过贪心策略调整顶点位置, 进而实现计算节点之间、CPU/GPU之间的动态负载均衡。在初始图划分时基于最多邻居顶点分配图顶点, 在迭代时基于最少连接边动态调整顶点位置。其次, 设计GPU异构计算模型, 通过CPU/GPU功能并行的方式实现协同计算。CPU与GPU并行执行图算法, 提高CPU核心的利用率, 进而提升图计算效率。实验以图算法PageRank、CC(Connected Components)、SSSP(Single-Source Shortest Path)与k-core为例, 将DH-Engine与其他图计算系统展开对比。与未考虑异构计算的图引擎相比, DH-Engine能更好地平衡各节点计算负载以及计算节点内部的异构处理器之间的负载, 通过缩短局部时延来提高整体的计算速度。实验结果表明DH-Engine的CPU/GPU协同度趋于1。相较于其他图计算系统, DH-Engine异构计算的加速比达到5倍, 可以提供更好的图异构计算方案。

关键词: 异构计算, 负载均衡, 动态图, 加速比, 图划分

Abstract:

Graphics Processing Unit (GPU) is not fully utilized when processing large-scale dynamic graphs, and the limitations of GPU-oriented graph partitioning methods lead to performance bottlenecks. To improve the performance of graph computing, a Central Processing Unit (CPU)/GPU Distributed Heterogeneous Engine (DH-Engine) is proposed to improve the performance of heterogeneous processors. First, a new heterogeneous graph partitioning algorithm is proposed. It uses a streaming algorithm for graph partitioning as the core to achieve dynamic load balancing between the computing nodes and between the CPU and GPU. The greedy strategy assigns vertices based on the maximum number of neighboring vertices during the initial graph partitioning and dynamically adjusts the vertex position based on the minimum number of connected edges during the iteration. Second, the system introduces a GPU heterogeneous computing model to improve graph computing efficiency through functional parallelism. The experiment used PageRank, Connected Components(CC), Single-Source Shortest Path(SSSP), and k-core as examples to conduct comparative experiments with other graph computing systems. Compared with other graph engines, DH-Engine can better balance the computing load of each node and the load between heterogeneous processors to shorten the delay and accelerate the overall computing speed. The results show that the CPU/GPU synergy of this system tends to 1, and the heterogeneous computing has speedup ratio of 5 times compared to other graph computing systems. DH-Engine provides an improved heterogeneous graph scheme.

Key words: heterogeneous computing, load balance, dynamic graph, speedup ratio, graph partitioning