Computer Engineering, 2019, Vol. 45, Issue (2): 284-289. doi: 10.19678/j.issn.1000-3428.0049718

• Development Research and Engineering Application •

Parallel Cholesky Decomposition and Its Application Based on GPU

SHEN Yan1, DAI Yuxing1,2

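  1. College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; 2. College of Mathematics, Physics and Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang 325035, China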
  • Received: 2017-12-15 Published: 2019-02-15 Online: 2019-02-15
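  • About the authors: SHEN Yan (born 1985), male, Ph.D. candidate, whose main research interests are machine learning, computer vision, and robotics; DAI Yuxing, professor and doctoral supervisor.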
  • Funding: Key Project of Zhejiang Provincial Natural Science Foundation (LZ16E050002).

Abstract:

In the clMAGMA library for the OpenCL parallel computing framework, the Cholesky decomposition algorithm uses a large-block parallel method, which cannot fully exploit the GPU's high-speed local memory and requires multiple data transfers between the GPU and the CPU during computation. To address this problem, this paper proposes a small-block parallel method that makes full use of the GPU's high-speed local memory and reuses the inverses of matrix sub-blocks, achieving efficient Cholesky decomposition of symmetric positive definite matrices. The method can also be applied to decomposing the large positive definite matrices that arise in bundle adjustment for three-dimensional vision. Experimental results show that the method speeds up Cholesky decomposition by more than 50% compared with clMAGMA, and on bundle adjustment problems it is about 38 times faster than the Eigen library used in Ceres Solver.

Key words: positive definite system, Cholesky decomposition, parallel computing, OpenCL framework, bundle adjustment
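The abstract describes the method only at a high level. As a hedged illustration, the pure-Python sketch below shows one plausible reading of "small-block parallelism with reuse of sub-block inverses": a right-looking tiled Cholesky factorization in which the inverse of each factored diagonal tile is computed once and then reused, as a plain matrix product, for every row of the panel below it instead of a separate triangular solve per tile. It runs serially on the CPU; on a GPU, the panel rows and the trailing-update entries are independent, and a small tile size `b` is what would let each tile fit in local memory. The function names `chol_tile`, `inv_lower_tile`, `blocked_cholesky`, and `cholesky_solve`, and the parameter `b`, are hypothetical and are not the authors' OpenCL kernels or the clMAGMA API. `cholesky_solve` shows how the factor would then be used to solve a positive definite system such as the normal equations in bundle adjustment.

```python
# Sketch (assumption): one reading of the abstract's small-tile Cholesky with
# reuse of the diagonal-tile inverse; not the authors' OpenCL implementation.
import math
from typing import List

Matrix = List[List[float]]


def chol_tile(A: Matrix, r0: int, nb: int) -> None:
    """Unblocked Cholesky of the diagonal tile A[r0:r0+nb, r0:r0+nb], in place.

    Only the lower triangle is read and written. Updates from earlier
    block columns must already have been applied to this tile.
    """
    for j in range(r0, r0 + nb):
        s = A[j][j] - sum(A[j][p] ** 2 for p in range(r0, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        A[j][j] = math.sqrt(s)
        for i in range(j + 1, r0 + nb):
            dot = sum(A[i][p] * A[j][p] for p in range(r0, j))
            A[i][j] = (A[i][j] - dot) / A[j][j]


def inv_lower_tile(A: Matrix, r0: int, nb: int) -> Matrix:
    """Return W = L^-1 for the lower-triangular tile L = A[r0:r0+nb, r0:r0+nb]."""
    W = [[0.0] * nb for _ in range(nb)]
    for j in range(nb):
        W[j][j] = 1.0 / A[r0 + j][r0 + j]
        for i in range(j + 1, nb):
            s = sum(A[r0 + i][r0 + p] * W[p][j] for p in range(j, i))
            W[i][j] = -s / A[r0 + i][r0 + i]
    return W


def blocked_cholesky(A_in: Matrix, b: int) -> Matrix:
    """Right-looking tiled Cholesky A = L L^T with tile size b; returns L."""
    n = len(A_in)
    A = [row[:] for row in A_in]  # work on a copy; only the lower triangle is used
    for k0 in range(0, n, b):
        nb = min(b, n - k0)  # the last tile may be smaller than b
        # 1) Factor the diagonal tile: L_kk = chol(A_kk).
        chol_tile(A, k0, nb)
        # 2) Invert L_kk once; W is reused for every panel row below it.
        W = inv_lower_tile(A, k0, nb)
        # 3) Panel: L_ik = A_ik * W^T, a plain product instead of a triangular solve.
        for r in range(k0 + nb, n):
            row = A[r][k0:k0 + nb]  # original values of this panel row
            for c in range(nb):
                A[r][k0 + c] = sum(row[p] * W[c][p] for p in range(c + 1))
        # 4) Trailing update of the lower triangle: A_ij -= L_ik * L_jk^T.
        for i in range(k0 + nb, n):
            for j in range(k0 + nb, i + 1):
                A[i][j] -= sum(A[i][k0 + p] * A[j][k0 + p] for p in range(nb))
    return [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def cholesky_solve(L: Matrix, rhs: List[float]) -> List[float]:
    """Solve (L L^T) x = rhs by forward, then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(L[i][p] * y[p] for p in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[p][i] * x[p] for p in range(i + 1, n))) / L[i][i]
    return x


if __name__ == "__main__":
    n = 7
    # Hypothetical SPD test matrix: A = M M^T + n I.
    M = [[float((3 * i + 5 * j) % 7) - 3.0 for j in range(n)] for i in range(n)]
    A = [[sum(M[i][p] * M[j][p] for p in range(n)) + (n if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = blocked_cholesky(A, b=3)  # tiles of size 3, 3 and 1
    err = max(abs(sum(L[i][p] * L[j][p] for p in range(n)) - A[i][j])
              for i in range(n) for j in range(n))
    x = cholesky_solve(L, [1.0] * n)
    print(f"max |L L^T - A| = {err:.2e}, x[0] = {x[0]:.4f}")
```

In this sketch, precomputing `W` turns each panel update into a matrix product, which is the kind of operation that maps naturally onto GPU work-groups; the cost is one extra inversion of a `b`-by-`b` triangular tile per block column, which stays cheap when `b` is small.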
