Research on Storage Format Optimization of Sparse Matrix Based on GPU

doi:10.19678/j.issn.1000-3428.0053513

Computer Engineering, 2019, Vol. 45, Issue (9): 23-31,39.

YANG Shiwei^a, JIANG Guoping^b, SONG Yurong^b, TU Xiao^a

a. School of Computer Science; b. School of Automation, Nanjing University of Posts and Telecommunications, Nanjing 210023, China

Received: 2018-12-28  Revised: 2019-02-19  Online: 2019-09-15  Published: 2019-09-03

About the authors: YANG Shiwei (1994-), male, master's student, whose main research interests are complex networks and GPU parallel computing; JIANG Guoping and SONG Yurong, professors and doctoral supervisors; TU Xiao, Ph.D.

Funding: National Natural Science Foundation of China (61672298, 61873326, 61373136).
Supported by:
This work is supported by National Key R&D Program of China (No.2017YFB1201003-020), Provincial Key R&D Program of Gansu (No.18YF1FA058).

Abstract

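Abstract: Sparse matrix-vector multiplication (SpMV) computed with existing sparse matrix storage formats is inefficient, and the results produced with the Blocked Row-Column (BRC) storage format lack reproducibility and determinism. To solve these problems, this paper proposes an improved storage format, Blocked Row-Column Plus (BRCP). BRCP adopts a different two-dimensional blocking strategy and adaptively adjusts the blocking parameters according to the statistical characteristics of the distribution of nonzero elements across the rows of the matrix, which improves the parallelism of SpMV on the GPU platform. A GPU kernel function based on a fast segmented summation algorithm is also designed to guarantee that the results are deterministic and reproducible across different GPU platforms. Experimental results show that the BRCP storage format achieves high computational efficiency and, compared with the BRC format, reduces the SpMV computation error in parallel environments and improves the accuracy of PageRank ranking.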

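Keywords: Sparse Matrix-Vector Multiplication (SpMV), Compute Unified Device Architecture (CUDA), Graphics Processing Unit (GPU), storage format, floating-point operation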
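The abstract does not state BRCP's actual rule for choosing its blocking parameters. As a rough illustration of the general idea only, the Python sketch below derives a column-chunk width from the mean and (population) standard deviation of the nonempty row lengths of a CSR matrix, then splits every row into chunks of at most that many nonzeros. The function names, the `std <= mean` threshold, the halving for skewed rows, and the cap of 32 (one warp) are all hypothetical choices, not the paper's method.

```python
import statistics
from typing import List, Sequence, Tuple


def row_lengths(row_ptr: Sequence[int]) -> List[int]:
    """Number of nonzeros in each row of a CSR matrix."""
    return [row_ptr[r + 1] - row_ptr[r] for r in range(len(row_ptr) - 1)]


def choose_block_width(lengths: Sequence[int], warp: int = 32) -> int:
    """Hypothetical rule: pick a chunk width from row-length statistics."""
    nonempty = [n for n in lengths if n > 0]
    if not nonempty:
        return 1
    mean = statistics.mean(nonempty)
    std = statistics.pstdev(nonempty)
    # Similar row lengths (low spread): use the mean as the width.
    # Skewed rows (high spread): halve it so long rows split into more chunks.
    width = mean if std <= mean else mean / 2
    return max(1, min(warp, int(round(width))))


def block_rows(row_ptr: Sequence[int], width: int) -> List[Tuple[int, int, int]]:
    """Split each row into (row, start, end) chunks of at most `width` nonzeros."""
    chunks = []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        for s in range(start, end, width):
            chunks.append((r, s, min(s + width, end)))
    return chunks
```

For `row_ptr = [0, 3, 4, 10]` (row lengths 3, 1 and 6) this rule picks a width of 3, so the last row becomes two chunks of three nonzeros. How chunks are then mapped to GPU threads or warps is outside the scope of this sketch.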


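A likely source of the non-reproducibility that the abstract attributes to BRC is that floating-point addition is not associative. If the partial sums of one row are combined in an order that depends on thread scheduling (for example through atomic additions), the rounded result can differ between runs or GPUs. The Python sketch below shows the effect and contrasts it with a segmented inclusive scan in the Hillis-Steele style, whose addition pattern depends only on the data layout. It assumes a CSR layout with illustrative arrays `row_ptr`, `col_idx` and `vals`; the function names are hypothetical, and this is not the paper's fast segmented summation kernel, whose details the abstract does not give.

```python
from typing import List, Sequence


def sum_in_order(terms: Sequence[float], order: Sequence[int]) -> float:
    """Accumulate terms in the given order, as a scheduling-dependent reduction might."""
    total = 0.0
    for i in order:
        total += terms[i]
    return total


def segmented_scan(values: Sequence[float], heads: Sequence[bool]) -> List[float]:
    """Inclusive segmented scan with a fixed Hillis-Steele step pattern.

    At step d, element i absorbs element i - d unless it already lies at or
    after a segment head, so the order of additions is fixed by the layout.
    """
    v, f = list(values), list(heads)
    d = 1
    while d < len(v):
        nv, nf = v[:], f[:]
        for i in range(d, len(v)):
            if not f[i]:
                nv[i] = v[i - d] + v[i]
                nf[i] = f[i - d]
        v, f = nv, nf
        d *= 2
    return v


def spmv_segmented(row_ptr, col_idx, vals, x) -> List[float]:
    """y = A x for a CSR matrix, computing the row sums with segmented_scan."""
    prods = [vals[k] * x[col_idx[k]] for k in range(len(vals))]
    heads = [False] * len(prods)
    for r in range(len(row_ptr) - 1):
        if row_ptr[r] < row_ptr[r + 1]:
            heads[row_ptr[r]] = True  # first nonzero of each nonempty row
    scanned = segmented_scan(prods, heads)
    # The last scanned value of a segment is that row's dot product.
    return [scanned[row_ptr[r + 1] - 1] if row_ptr[r + 1] > row_ptr[r] else 0.0
            for r in range(len(row_ptr) - 1)]
```

Summing `[1e20, 1.0, -1e20]` with `sum_in_order` gives 0.0 for the order `[0, 1, 2]` but 1.0 for `[0, 2, 1]`. In contrast, `spmv_segmented` always performs the same additions in the same order for a given matrix, so repeated runs return bit-identical results.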

CLC number: TP312

YANG Shiwei, JIANG Guoping, SONG Yurong, TU Xiao. Research on Storage Format Optimization of Sparse Matrix Based on GPU[J]. Computer Engineering, 2019, 45(9): 23-31,39.

https://www.ecice06.com/CN/Y2019/V45/I9/23

Figures/Tables: 15
