Computer Engineering, 2019, Vol. 45, Issue (3): 41-46. doi: 10.19678/j.issn.1000-3428.0052189

• Architecture and Software Technology •

Design and Implementation of Tucker Decomposition Module Based on CUDA and CUBLAS

ZHOU Qi, CHAI Xiaoli, MA Kejie, YU Zeren

  1. The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China
  • Received: 2018-07-23; Online: 2019-03-15; Published: 2019-03-15
  • About the authors: ZHOU Qi (born 1990), male, master's student, research interests: heterogeneous computing and FPGA technology; CHAI Xiaoli, research fellow; MA Kejie and YU Zeren, engineers.
  • Funding: China Electronics Technology Group Corporation "Anke" (secure and controllable) systems free-hardware new technology R&D project (170225)

Abstract:

Tucker decomposition of tensors is widely used in image processing, face recognition, signal processing and other fields, which has made Tucker decomposition algorithms a key research topic. However, the currently popular Tucker decomposition algorithms must unfold (matricize) the tensor many times, so much of the achievable speedup is consumed by these repeated unfoldings. To address this problem, a modified Tucker decomposition module for the Compute Unified Device Architecture (CUDA) platform is proposed. By jointly optimizing the Tucker decomposition algorithm and its CUDA implementation, the module omits the tensor unfolding step, which improves acceleration efficiency and lowers the requirements on the acceleration system. Experimental results show that the modified Tucker decomposition algorithm achieves clearly better acceleration performance on the CUDA platform.

Key words: Tucker decomposition algorithm, tensor decomposition, Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), tensor norm
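The central idea above is computing the mode-n products inside Tucker decomposition without explicitly unfolding the tensor. The paper's own kernels are not reproduced here; what follows is a minimal sketch of one well-known way to do this, assuming a dense tensor stored flat in row-major order and zero-based mode indices. If X has shape (I_1, ..., I_N), it can be read as L = I_1⋯I_{n-1} contiguous I_n × R slabs with R = I_{n+1}⋯I_N, so Y = X ×_n U reduces to L independent small matrix products U · slab at fixed strides, and no permuted copy of the data is made. The function names mode_n_product, flat_index and mode_n_reference are hypothetical, and plain Python loops stand in for a BLAS call.

```python
from itertools import product
from math import prod


def mode_n_product(x, dims, mode, u):
    """Return Y = X x_mode U for a flat row-major tensor x with shape dims.

    u is a list of J rows, each of length dims[mode]. The mode-n unfolding is
    never built: x is read as `left` contiguous (I_n x right) slabs, and each
    slab is multiplied on the left by u (one small GEMM per slab).
    """
    in_dim = dims[mode]
    left = prod(dims[:mode])        # product of dimensions before the mode
    right = prod(dims[mode + 1:])   # product of dimensions after the mode
    j_dim = len(u)
    y = [0.0] * (left * j_dim * right)
    for b in range(left):                  # batch index over slabs
        xs = b * in_dim * right            # slab offset in x (stride I_n * right)
        ys = b * j_dim * right             # slab offset in y (stride J * right)
        for j in range(j_dim):
            row = u[j]
            for r in range(right):
                y[ys + j * right + r] = sum(
                    row[i] * x[xs + i * right + r] for i in range(in_dim))
    return y


def flat_index(idx, dims):
    """Row-major linear index of multi-index idx in a tensor of shape dims."""
    k = 0
    for i, d in zip(idx, dims):
        k = k * d + i
    return k


def mode_n_reference(x, dims, mode, u):
    """Direct evaluation of the definition y[..j..] = sum_i u[j][i] * x[..i..]."""
    out = list(dims)
    out[mode] = len(u)
    y = [0.0] * prod(out)
    for idx in product(*(range(d) for d in out)):
        src = list(idx)
        s = 0.0
        for i in range(dims[mode]):
            src[mode] = i
            s += u[idx[mode]][i] * x[flat_index(src, dims)]
        y[flat_index(idx, out)] = s
    return y


dims = (2, 3, 4)
x = [float(k) for k in range(prod(dims))]
u = [[1.0 + j + 0.5 * i for i in range(dims[1])] for j in range(2)]
y = mode_n_product(x, dims, 1, u)   # shape (2, 2, 4), no unfolding performed
```

Because every slab has the same shape and the slabs sit at constant strides (I_n·R in X, J·R in Y), the loop over slabs is the kind of work that cuBLAS's strided batched GEMM routines (cublas<t>gemmStridedBatched) can issue as a single call, once cuBLAS's column-major convention is accounted for. The abstract does not say whether the authors used this mapping, so treat it as an assumption.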
