Abstract:
The efficiency of a parallel algorithm on the GPU depends on the average execution efficiency of its kernels on the streaming multiprocessors. On this basis, this paper analyzes how a GPU kernel is executed and how grids, thread blocks, and threads relate to one another, and then applies kernel refinement to a ray-tracing algorithm. Experimental results show that the size setting and distribution direction of the kernel affect the internal consistency of a thread block, and that refining the kernel increases the number of warps that run simultaneously within a block.
Key words:
Graphics Processing Unit (GPU),
Compute Unified Device Architecture (CUDA),
ray tracing
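As a rough illustration of why the size and direction of a block matter, the sketch below counts the warps in a two-dimensional thread block and reports the pixel area covered by a single warp. It relies on two facts of the CUDA execution model: a warp holds 32 threads, and threads in a block are linearized with the x index varying fastest. The one-thread-per-pixel mapping and the helper names `warps_per_block` and `warp_footprint` are assumptions made for this example; they are not taken from the paper.

```python
WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def warps_per_block(block_x, block_y):
    """Number of warps needed to hold a block of block_x * block_y threads."""
    threads = block_x * block_y
    return (threads + WARP_SIZE - 1) // WARP_SIZE


def warp_footprint(block_x, block_y, warp_index):
    """Return (width, height) of the bounding box, in threads, of one warp.

    Assumes one thread per pixel (hypothetical mapping) and CUDA's
    x-fastest linearization: tid = x + y * block_x.
    """
    first = warp_index * WARP_SIZE
    last = min(first + WARP_SIZE, block_x * block_y) - 1
    xs = [tid % block_x for tid in range(first, last + 1)]
    ys = [tid // block_x for tid in range(first, last + 1)]
    return (max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


if __name__ == "__main__":
    # Three block shapes with the same thread count but different directions.
    for shape in [(32, 8), (16, 16), (8, 32)]:
        print(shape, warps_per_block(*shape), warp_footprint(*shape, 0))
```

All three shapes contain 256 threads and therefore 8 warps, but the first warp covers a 32x1 strip, a 16x2 strip, or an 8x4 tile respectively. Rays from a compact tile are more likely to follow similar paths, which is one plausible reading of how block shape influences consistency within a block.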
JIAO Liang-Bao, CHEN Rui. Research on Refinement of GPU Kernel[J]. Computer Engineering, 2010, 36(18): 10-12.
焦良葆, 陈瑞. GPU核函数细化研究[J]. 计算机工程, 2010, 36(18): 10-12.