Parallel Cholesky Decomposition and Its Application Based on GPU

doi:10.19678/j.issn.1000-3428.0049718

Computer Engineering, 2019, Vol. 45, Issue 2: 284-289.

SHEN Yan¹, DAI Yuxing¹,²

1. College of Electrical and Information Engineering, Hunan University, Changsha 410082, China; 2. College of Mathematics, Physics and Electronic Information Engineering, Wenzhou University, Wenzhou, Zhejiang 325035, China

Received: 2017-12-15 Online: 2019-02-15 Published: 2019-02-15
About the authors: SHEN Yan (born 1985), male, Ph.D. candidate; main research interests: machine learning, computer vision, and robotics. DAI Yuxing, professor and doctoral supervisor.
Funding: Key Project of the Natural Science Foundation of Zhejiang Province (LZ16E050002).

Abstract

In the clMAGMA library for the OpenCL parallel computing framework, the Cholesky decomposition algorithm uses a large-block parallel method. This cannot make full use of the GPU's high-speed local memory, and it requires repeated data transfers between the GPU and the CPU during the computation. To address this, a small-block parallel method is proposed. It makes full use of the GPU's high-speed local memory and reuses the inverse of each matrix sub-block, achieving efficient Cholesky decomposition of symmetric positive definite matrices. The method can be applied to the decomposition of the large positive definite matrices that arise in bundle adjustment for three-dimensional vision. Experimental results show that the proposed Cholesky decomposition is more than 50% faster than clMAGMA, and that on the bundle adjustment problem it is about 38 times faster than the Eigen library used in Ceres Solver.

Key words: positive definite system, Cholesky decomposition, parallel computing, OpenCL framework, bundle adjustment
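
The small-block idea summarized in the abstract can be illustrated with a sequential sketch. The code below is not the authors' OpenCL implementation; it is a plain-Python, CPU-only illustration of a right-looking blocked Cholesky factorization in which the inverse of each diagonal block's factor is computed once and then reused for every block row beneath it. The function names (`cholesky_unblocked`, `invert_lower`, `blocked_cholesky`) and the block-size parameter `bs` are hypothetical, chosen only for this example. On a GPU, the blocks would presumably be sized to fit in local memory and the panel and trailing updates would run in parallel.

```python
import math


def cholesky_unblocked(a):
    """Return lower-triangular l with l * l^T = a for a small dense SPD block.

    Only the lower triangle of `a` is read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def invert_lower(l):
    """Invert a lower-triangular block by forward substitution, column by column."""
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for r in range(c, n):
            rhs = 1.0 if r == c else 0.0
            s = sum(l[r][k] * inv[k][c] for k in range(c, r))
            inv[r][c] = (rhs - s) / l[r][r]
    return inv


def blocked_cholesky(a, bs):
    """Right-looking blocked Cholesky with block size bs (assumed parameter).

    Returns the lower-triangular factor as a new list of lists.
    """
    n = len(a)
    w = [row[:] for row in a]  # working copy; its lower triangle becomes L
    for k in range(0, n, bs):
        ke = min(k + bs, n)
        # Step 1: factor the small diagonal block.
        lkk = cholesky_unblocked([row[k:ke] for row in w[k:ke]])
        for i in range(k, ke):
            for j in range(k, ke):
                w[i][j] = lkk[i - k][j - k]
        # Step 2: invert the diagonal factor once.
        inv = invert_lower(lkk)
        # Step 3: panel update, L_ik = A_ik * inv(L_kk)^T.
        # The same `inv` is reused for every row below the diagonal block.
        for i in range(ke, n):
            row = w[i][k:ke]  # slice copy, safe to read while overwriting w[i]
            for j in range(ke - k):
                w[i][k + j] = sum(row[m] * inv[j][m] for m in range(j + 1))
        # Step 4: trailing update of the lower triangle, A_ij -= L_ik * L_jk^T.
        for i in range(ke, n):
            for j in range(ke, i + 1):
                w[i][j] -= sum(w[i][m] * w[j][m] for m in range(k, ke))
    return [[w[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
```

Replacing the triangular solve in the panel step with a multiplication by a precomputed inverse turns that step into independent small matrix products, which is one plausible reason the reuse pays off on a GPU. The sketch does not model local memory or GPU-CPU transfers.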

CLC number: TP361

SHEN Yan, DAI Yuxing. Parallel Cholesky Decomposition and Its Application Based on GPU[J]. Computer Engineering, 2019, 45(2): 284-289.

https://www.ecice06.com/CN/Y2019/V45/I2/284

