Design of Heterogeneous Graph Computing System for Large-Scale Dynamic Graph

doi:10.19678/j.issn.1000-3428.0068477

Abstract

When large-scale dynamic graphs are processed on heterogeneous Graphics Processing Unit (GPU) clusters, GPU computing resources are not fully utilized, and the limitations of GPU-oriented graph partitioning methods lead to performance bottlenecks. To improve the performance of graph computing systems, a Central Processing Unit (CPU)/GPU Distributed Heterogeneous Engine (DH-Engine) is proposed to improve the computing performance of heterogeneous processors. First, a new heterogeneous graph partitioning algorithm is proposed. It takes streaming graph partitioning as its core and uses a greedy strategy to adjust vertex positions, achieving dynamic load balancing between computing nodes and between the CPU and GPU. During the initial graph partitioning, vertices are assigned based on the maximum number of neighboring vertices; during iteration, vertex positions are adjusted dynamically based on the minimum number of connected edges. Second, a GPU heterogeneous computing model is designed that achieves cooperative computing through CPU/GPU functional parallelism. The CPU and GPU execute graph algorithms in parallel, which raises the utilization of CPU cores and in turn improves graph computing efficiency. The experiments use the graph algorithms PageRank, Connected Components (CC), Single-Source Shortest Path (SSSP), and k-core to compare DH-Engine with other graph computing systems. Compared with graph engines that do not consider heterogeneous computing, DH-Engine better balances the computing load across nodes and the load between heterogeneous processors within each computing node, and it raises overall computing speed by shortening local latency. The experimental results show that the CPU/GPU synergy of DH-Engine tends to 1. Compared with other graph computing systems, the heterogeneous computing speedup ratio of DH-Engine reaches 5 times, providing a better heterogeneous graph computing scheme.

Key words: heterogeneous computing, load balance, dynamic graph, speedup ratio, graph partitioning
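
The two-phase greedy partitioning described in the abstract can be illustrated with a minimal sketch. This is not DH-Engine's implementation: the function names, the per-partition `capacities` (standing in for CPU/GPU capability weights), and the tie-breaking rules are all assumptions made for illustration. The initial pass places each streamed vertex in the partition that already holds most of its neighbors; the adjustment pass moves the vertices with the fewest edges into an overloaded partition, which is one possible reading of "minimum number of connected edges".

```python
def initial_partition(stream, capacities):
    """Streaming greedy placement (illustrative, not the paper's code).

    stream: iterable of (vertex, neighbors) pairs in arrival order.
    capacities: hypothetical max vertex count per partition, e.g. a
                larger share for a GPU than for a CPU.
    """
    owner = {}
    sizes = [0] * len(capacities)
    for v, nbrs in stream:
        # Count already-placed neighbors of v in each partition.
        score = [0] * len(capacities)
        for u in nbrs:
            if u in owner:
                score[owner[u]] += 1
        # Only partitions with free capacity are candidates.
        open_parts = [p for p in range(len(capacities)) if sizes[p] < capacities[p]]
        if not open_parts:
            raise ValueError("total capacity is smaller than the vertex count")
        # Most neighbors wins; ties go to the relatively emptiest partition.
        best = max(open_parts, key=lambda p: (score[p], -sizes[p] / capacities[p]))
        owner[v] = best
        sizes[best] += 1
    return owner


def rebalance(adj, owner, src, dst, count):
    """Move up to `count` vertices from partition src to partition dst.

    Assumed rule: pick the vertex with the fewest edges into src,
    breaking ties by the most edges into dst, then by vertex id.
    """
    def edges_into(v, part):
        return sum(1 for u in adj[v] if owner.get(u) == part)

    for _ in range(count):
        candidates = [v for v, p in owner.items() if p == src]
        if not candidates:
            break
        v = min(candidates, key=lambda x: (edges_into(x, src), -edges_into(x, dst), x))
        owner[v] = dst
    return owner


def edge_cut(adj, owner):
    """Number of undirected edges whose endpoints lie in different partitions."""
    return sum(1 for v in adj for u in adj[v] if v < u and owner[v] != owner[u])


if __name__ == "__main__":
    # Two triangles joined by the edge 2-3.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
    owner = initial_partition(adj.items(), capacities=[3, 3])
    print(owner, edge_cut(adj, owner))
```

On this toy graph the initial pass keeps each triangle in one partition, so only the bridging edge is cut. A real system would derive the capacities and the migration trigger from measured CPU/GPU load rather than fixed numbers.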

ZHANG Ming, GUO Wenkang, WANG Haifeng. Design of Heterogeneous Graph Computing System for Large-Scale Dynamic Graph[J]. Computer Engineering, 2025, 51(3): 197-207.

https://www.ecice06.com/CN/Y2025/V51/I3/197

Figures/Tables: 9

Fig.1 Graph heterogeneous computing framework

Fig.2 Graph load balancing module

Fig.3 Heterogeneous graph computing module

Fig.4 Graph replication factor and partition time

Fig.5 Execution time of graph algorithms in each stage

Fig.6 Heterogeneous workloads of different graph algorithms

Fig.7 Comparison with other graph algorithms

