
Computer Engineering (计算机工程)

• Development Research and Engineering Applications •

Programming Framework for Node-Heterogeneous GPU Clusters

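SHENG Chongchong (盛冲冲), HU Xinming (胡新明), LI Jiajia (李佳佳), WU Baifeng (吴百锋)

  1. (School of Computer Science, Fudan University, Shanghai 201203)
  • Received: 2014-03-12 Publication date: 2015-02-15 Release date: 2015-02-13
  • About the authors: SHENG Chongchong (born 1988), male, master's student, research interests: embedded systems and parallel computing; HU Xinming and LI Jiajia, master's degrees; WU Baifeng, professor.
  • Funding:
    Fund of the State Key Laboratory of ASIC and System, Fudan University; Huawei Innovation Research Program.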

Programming Framework for Node Heterogeneous GPU Cluster

SHENG Chongchong, HU Xinming, LI Jiajia, WU Baifeng

  1. (School of Computer Science, Fudan University, Shanghai 201203, China)
  • Received: 2014-03-12 Online: 2015-02-15 Published: 2015-02-13

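Abstract (Chinese version, translated): The mainstream approach to programming heterogeneous GPU clusters is hybrid MPI and CUDA programming or a simple variant of it. Because this approach does not hide the underlying cluster architecture, programmers who write GPU cluster applications with MPI and CUDA must account for the hardware computing resources by hand, which leads to high complexity and poor portability. To address this, a new programming framework for node-heterogeneous GPU cluster architectures, the Distributed Parallel Programming Framework (DISPAR), is designed and implemented on the basis of the data flow model. DISPAR comprises two subsystems: (1) the code conversion system StreamCC, an automatic translator from DISPAR source code to MPI + CUDA code; (2) the task assignment system StreamMAP, a runtime system that automatically discovers heterogeneous computing resources and automatically maps tasks onto them. Experimental results show that the framework effectively simplifies the writing of GPU cluster applications and makes efficient use of the computing resources of heterogeneous GPU clusters; programs do not depend on the hardware platform and are therefore reasonably portable.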

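Keywords (Chinese version, translated): GPU cluster, heterogeneous, distributed parallel programming framework, code conversion, task assignment, portability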

Abstract: The mainstream programming method for heterogeneous GPU clusters is hybrid MPI/CUDA or a simple variant of it. However, because this approach does not hide the underlying cluster architecture, programmers who write hybrid MPI/CUDA code for heterogeneous GPU clusters need detailed knowledge of the hardware resources, which makes programs more complicated and less portable. This paper presents the Distributed Parallel Programming Framework (DISPAR), a new programming framework for node-level heterogeneous GPU clusters based on the data flow model. DISPAR contains two subsystems, StreamCC and StreamMAP. StreamCC is a code conversion tool that converts DISPAR code into hybrid MPI/CUDA code. StreamMAP is a run-time system that automatically detects heterogeneous computing resources and maps tasks to appropriate computing units. Experimental results show that the framework makes efficient use of computing resources and simplifies programming on heterogeneous GPU clusters. In addition, programs have better portability and scalability because the code does not rely on the execution platform.

Key words: GPU cluster, heterogeneous, Distributed Parallel Programming Framework (DISPAR), code conversion, task assignment, portability
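
The abstract describes StreamMAP only at a high level: it discovers heterogeneous computing resources and maps tasks onto suitable units. As a rough illustration of what heterogeneity-aware mapping can mean, the following Python sketch assigns tasks greedily to the device that would finish each one earliest. It is not the StreamMAP algorithm, which the abstract does not specify; the `Device` class, the `map_tasks` function, the device names, the relative `speed` values, and the task costs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical compute unit; `speed` is relative throughput (work units per second)."""
    name: str
    speed: float
    busy_until: float = 0.0           # time at which the device becomes free
    tasks: list = field(default_factory=list)


def map_tasks(task_costs, devices):
    """Greedy mapping: take tasks largest first and place each on the
    device that would complete it earliest given its current load."""
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        best = min(devices, key=lambda d: d.busy_until + cost / d.speed)
        best.busy_until += cost / best.speed
        best.tasks.append(task)
    return {d.name: d.tasks for d in devices}


# Assumed example: one fast GPU and one slow GPU on different nodes.
devices = [Device("node0-gpu0", speed=4.0), Device("node1-gpu0", speed=1.0)]
task_costs = {"k0": 8.0, "k1": 6.0, "k2": 4.0, "k3": 2.0}
mapping = map_tasks(task_costs, devices)
print(mapping)
```

In this assumed setup the faster device receives three of the four tasks, and the application code never names a specific GPU, which is the kind of hardware independence the abstract attributes to DISPAR programs.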
