Computer Engineering

• Architecture and Software Technology •

Research on Memory Access Optimization of NVIDIA Tegra K1 Heterogeneous Computing Platform

LIANG Jun 1a, LI Wei 1b, XIAO Lin 2, XU Xinkai 1a

  (1a. Training Center of Electronic Information; 1b. College of Automation, Beijing Union University, Beijing 100101, China; 2. College of Applied Science and Technology, Beijing Union University, Beijing 102200, China)
  • Received: 2016-04-29 Online: 2016-12-15 Published: 2016-12-15
  • About the authors: LIANG Jun (born 1962), female, associate professor, M.S.; main research interests: heterogeneous parallel computing and digital image processing. LI Wei, B.S.; XIAO Lin, associate professor; XU Xinkai, lecturer.
  • Foundation items: Major Research Plan of the National Natural Science Foundation of China (91420202); General Program of the Science and Technology Plan of Beijing Municipal Education Commission (SQKM201411417010, KM201511417003).

Abstract: When digital image processing algorithms are ported to and optimized on heterogeneous computing platforms, their memory access performance has become the main factor limiting system performance. To address this, taking into account both the hardware architecture of the NVIDIA Tegra K1 and the characteristics of each algorithm, this paper studies the implementation and memory access optimization of a matrix transpose algorithm and a Laplacian filtering algorithm on the NVIDIA Tegra K1 heterogeneous computing platform, covering coalesced and vectorized memory access and the elimination of bank and channel conflicts in global memory access. Experimental results show that, after optimization, both algorithms achieve a considerable improvement in memory access performance on the NVIDIA Tegra K1 platform and offer good real-time performance.

Key words: GPU optimization, memory access bandwidth, data localization, vectorization, coalesced access, Laplacian filtering algorithm
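The abstract names coalesced access and conflict elimination without showing why the access pattern of a warp matters. The following is a simplified Python model, not the authors' method, that counts how many 128-byte global-memory segments one warp touches and how many distinct words a single shared-memory bank must serve. It assumes 32-thread warps, 32 shared-memory banks that are 4 bytes wide, 4-byte float elements and aligned 128-byte segments (Tegra K1's GPU is a Kepler-class design, but these parameters are modelling assumptions). All function names are hypothetical. Note that the bank part models shared-memory bank conflicts in a tiled transpose, a commonly used companion technique, rather than the global-memory bank and channel conflicts the paper itself addresses.

```python
"""Simplified model of warp-level memory access patterns in a matrix transpose.

Assumptions (not taken from the paper): 32 threads per warp, 32 shared-memory
banks of 4 bytes each, 4-byte elements, and global memory serviced in aligned
128-byte segments. All names here are hypothetical.
"""
from collections import defaultdict

WARP_SIZE = 32
NUM_BANKS = 32
ELEM_BYTES = 4
SEGMENT_BYTES = 128


def bank_conflict_degree(word_indices):
    """Worst-case number of distinct words mapped to one shared-memory bank.

    1 means conflict-free. Threads reading the same word are served by a
    broadcast, so duplicate word indices do not increase the count.
    """
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[w % NUM_BANKS].add(w)
    return max(len(ws) for ws in words_per_bank.values())


def segments_touched(element_indices):
    """Number of distinct 128-byte segments touched by one warp-wide access."""
    return len({(i * ELEM_BYTES) // SEGMENT_BYTES for i in element_indices})


def global_row_access(n, row, col0):
    """Thread tx accesses matrix[row][col0 + tx] of a row-major n x n matrix."""
    return [row * n + col0 + tx for tx in range(WARP_SIZE)]


def global_column_access(n, row0, col):
    """Thread tx accesses matrix[row0 + tx][col]: a stride of n elements."""
    return [(row0 + tx) * n + col for tx in range(WARP_SIZE)]


def tile_column_read(tile_width, column):
    """Thread tx reads tile[tx][column] from a row-major shared tile whose
    rows are tile_width words long (32 unpadded, 33 with one word of padding)."""
    return [tx * tile_width + column for tx in range(WARP_SIZE)]


if __name__ == "__main__":
    n = 1024
    # Naive transpose out[x][y] = in[y][x] with threads spread along x:
    # the read walks along a row (coalesced), the write walks down a column.
    print("row access segments:", segments_touched(global_row_access(n, 0, 0)))
    print("column access segments:", segments_touched(global_column_access(n, 0, 0)))
    # Tiled transpose: reading a 32-wide shared tile by column hits one bank
    # 32 times; padding each row to 33 words spreads the reads over all banks.
    print("32-wide tile:", bank_conflict_degree(tile_column_read(32, 0)))
    print("33-wide tile:", bank_conflict_degree(tile_column_read(33, 0)))
```

Run as a script, it reports 1 segment for the row access and 32 for the column access, a 32-way conflict for the unpadded tile and none once each row is padded to 33 words. This is why tiled transposes commonly stage data through a shared tile declared as `[32][33]`: both global reads and global writes then run along rows, and the padding keeps the column-wise reads of the tile conflict-free.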
