Thread Mapping Model from CUDA to Heterogeneous Many-core Architecture

Computer Engineering ›› 2012, Vol. 38 ›› Issue (9): 282-284,287. doi: 10.3969/j.issn.1000-3428.2012.09.086

YU Yong, PANG Jian-min, SHAN Zheng, LIU Xiao-nan

(Institute of Information Engineering, PLA Information Engineering University, Zhengzhou 450002, China)

Received: 2011-09-06 Online: 2012-05-05 Published: 2012-05-05
About the authors: YU Yong (born 1984), male, master's student, main research interest: reverse engineering; PANG Jian-min, professor and doctoral supervisor; SHAN Zheng, associate professor; LIU Xiao-nan, doctoral student
Funding:
National "863" Program key project (2009AA012201); "Core Electronic Devices, High-end Generic Chips and Basic Software" ("He-Gao-Ji") National Major Project (2009ZX01036-001-001); Henan Province Major Science and Technology Research Program project (092101210501)

Abstract

A thread-count mismatch arises when Compute Unified Device Architecture (CUDA) programs are ported to other heterogeneous many-core architectures. To address it, this paper proposes a hierarchical thread mapping model. At the first level, the model maps CUDA host threads and device threads to the target platform's master core and slave core array, respectively. At the second level, it uses thread loops to eliminate the synchronization operations among the threads of a Cooperative Thread Array (CTA), and maps the whole CTA to a single slave core of the slave core array. Experimental results show that the model enables CUDA programs to run effectively on other heterogeneous many-core systems.

Key words: code porting, Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA), heterogeneous many-core architecture, Streaming Multiprocessor (SM), thread loop
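The abstract does not give the transformation itself, so the following C++ fragment is only a minimal sketch of the general idea behind the two mapping levels. It assumes a hypothetical CUDA kernel that reverses each block of an array through shared memory with one `__syncthreads()` barrier. The function names (`run_cta_reverse`, `launch_reverse`), the round-robin assignment of CTAs to slave cores, and the sequential simulation of those cores are all assumptions made for illustration, not the authors' implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CUDA kernel being ported (shown for reference only):
//   __shared__ int tile[BLOCK];
//   tile[threadIdx.x] = in[blockIdx.x * blockDim.x + threadIdx.x];
//   __syncthreads();
//   out[blockIdx.x * blockDim.x + threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];

// Second level (sketch): one whole CTA runs on a single core.
// The kernel body is split at the barrier, and each part is wrapped in a
// loop over the logical thread index, so no real synchronization is needed.
void run_cta_reverse(std::size_t block_idx, std::size_t block_dim,
                     const std::vector<int>& in, std::vector<int>& out) {
    std::vector<int> tile(block_dim);  // stands in for CUDA shared memory
    const std::size_t base = block_idx * block_dim;

    // Thread loop 1: code that preceded __syncthreads().
    for (std::size_t t = 0; t < block_dim; ++t) {
        tile[t] = in[base + t];
    }
    // The barrier is implicit here: loop 1 has finished for every logical thread.

    // Thread loop 2: code that followed __syncthreads().
    for (std::size_t t = 0; t < block_dim; ++t) {
        out[base + t] = tile[block_dim - 1 - t];
    }
}

// First level (sketch): the "host" side runs here on the master core and
// hands CTAs to slave cores. Slave cores are simulated sequentially, and
// CTAs are assigned round-robin (an assumption of this example).
std::vector<int> launch_reverse(const std::vector<int>& in,
                                std::size_t block_dim,
                                std::size_t num_slave_cores) {
    assert(block_dim > 0 && num_slave_cores > 0);
    assert(in.size() % block_dim == 0);
    std::vector<int> out(in.size());
    const std::size_t grid_dim = in.size() / block_dim;
    for (std::size_t core = 0; core < num_slave_cores; ++core) {
        for (std::size_t b = core; b < grid_dim; b += num_slave_cores) {
            run_cta_reverse(b, block_dim, in, out);
        }
    }
    return out;
}
```

In this sketch, splitting the kernel body at the barrier is what lets the number of logical CUDA threads exceed the number of physical cores: a CTA of any size is executed by one core, and the grid is spread over however many slave cores are available.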

CLC number: TP945.12

YU Yong, PANG Jian-min, SHAN Zheng, LIU Xiao-nan. Thread Mapping Model from CUDA to Heterogeneous Many-core Architecture[J]. Computer Engineering, 2012, 38(9): 282-284,287.

http://www.ecice06.com/CN/Y2012/V38/I9/282

