Sparse Diagonal Matrix Adaptive SpMV Optimization Method for GPU

doi:10.19678/j.issn.1000-3428.0069807

Computer Engineering, 2026, Vol. 52, Issue 3: 332-345. doi: 10.19678/j.issn.1000-3428.0069807

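About the authors:
WANG Yuhua (CCF senior member), male, Ph.D., associate professor; main research interests: parallel computing and artificial intelligence
HE Junfei, master's student
ZHANG Yuqi, master's student
LAN Haiyan (corresponding author), Ph.D., lecturer
CAO Linlin, master's student
Funding:
National Natural Science Foundation of China (62072135)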

Sparse Diagonal Matrix Adaptive SpMV Optimization Method for GPU

WANG Yuhua¹,², HE Junfei¹, ZHANG Yuqi¹, LAN Haiyan¹,*, CAO Linlin¹

1. School of Computer Science, Harbin Engineering University, Harbin 150001, Heilongjiang, China
2. National Engineering Laboratory of Modeling and Emulation in E-Government, Harbin 150001, Heilongjiang, China

Received: 2024-04-29; Revised: 2024-08-19; Published: 2026-03-15; Released online: 2024-10-28
Corresponding author: LAN Haiyan

Abstract:

Sparse matrix-vector multiplication (SpMV) is the computational core and bottleneck of iterative solvers for sparse linear systems, so its efficiency directly affects overall solver performance, and its optimization has long been an active research topic in scientific computing and engineering applications. Discretizing partial differential equations produces sparse diagonal matrices whose nonzero elements follow diverse distributions; as a result, no single SpMV method achieves the best running time on all matrices. To address this problem, this paper proposes Adaptive SpMV Tuning (AST), an adaptive SpMV optimization method for sparse diagonal matrices on Graphics Processing Units (GPUs). AST designs a feature space and builds a feature extractor to capture fine-grained structural features of a matrix. By analyzing the correlation between these features and SpMV methods, it establishes an extensible set of candidate methods, forms a mapping from features to the optimal method, and builds a performance prediction tool that efficiently predicts the optimal method for a given matrix. Experimental results show that AST achieves a prediction accuracy of 85.8% with an average time performance loss of 0.09. Compared with Diagonal (DIA), Hacked DIA (HDIA), Hybrid of DIA and Compressed Sparse Row (HDC), DIA-Adaptive, and Divide-Rearrange and Merge (DRM), AST achieves average kernel runtime speedups of 20.19, 1.86, 3.06, 3.72, and 1.53 times, respectively, and floating-point performance ratios of 1.05, 1.28, 12.45, 1.94, and 0.97 times, respectively; the last ratio is below 1, meaning AST is slightly behind DRM on that metric.

Key words: Sparse Matrix-Vector multiplication (SpMV), sparse diagonal matrix, Graphics Processing Unit (GPU), adaptive optimization method, matrix structural feature
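
As a rough illustration of the ideas summarized in the abstract, the sketch below stores a small matrix in a DIA-style layout, multiplies it by a vector, and computes a few simple structural features of the kind an adaptive selector could use. It is plain Python for readability rather than GPU code, and it is not the authors' implementation: the function names (`to_dia`, `spmv_dia`, `structure_features`) and the feature definitions (`n_diags`, `dia_fill_ratio`, `max_offset`) are assumptions made for this example and are not the paper's definitions of N_Ndiags, R_{ER_DIA}, or N_{max_offset}.

```python
from collections import Counter


def to_dia(dense):
    """Convert a square dense matrix (list of lists) to a DIA-style layout.

    Returns (offsets, data), where data[d][i] holds A[i][i + offsets[d]]
    and is padded with 0.0 where the diagonal leaves the matrix.
    """
    n = len(dense)
    offsets = sorted({j - i for i in range(n) for j in range(n) if dense[i][j] != 0})
    data = []
    for off in offsets:
        diag = []
        for i in range(n):
            j = i + off
            diag.append(dense[i][j] if 0 <= j < n else 0.0)
        data.append(diag)
    return offsets, data


def spmv_dia(offsets, data, x):
    """Compute y = A @ x for a matrix stored by to_dia."""
    n = len(x)
    y = [0.0] * n
    for off, diag in zip(offsets, data):
        for i in range(n):
            j = i + off
            if 0 <= j < n:  # skip padded slots outside the matrix
                y[i] += diag[i] * x[j]
    return y


def structure_features(dense):
    """Hypothetical structural features (illustrative definitions only)."""
    n = len(dense)
    nnz_per_diag = Counter(
        j - i for i in range(n) for j in range(n) if dense[i][j] != 0
    )
    nnz = sum(nnz_per_diag.values())
    n_diags = len(nnz_per_diag)
    stored = n_diags * n  # slots a DIA-style layout allocates, padding included
    return {
        "n_diags": n_diags,
        "dia_fill_ratio": nnz / stored if stored else 0.0,
        "max_offset": max((abs(o) for o in nnz_per_diag), default=0),
    }
```

For a 4×4 tridiagonal matrix, this layout stores 3 diagonals of length 4, and 10 of the 12 slots hold true nonzeros. A low fill ratio would indicate that a pure DIA layout spends memory and bandwidth on padding, which is the kind of signal that motivates choosing among several candidate methods per matrix.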

WANG Yuhua, HE Junfei, ZHANG Yuqi, LAN Haiyan, CAO Linlin. Sparse Diagonal Matrix Adaptive SpMV Optimization Method for GPU[J]. Computer Engineering, 2026, 52(3): 332-345.

https://www.ecice06.com/CN/Y2026/V52/I3/332

Figures/Tables (18)

Fig.1 Examples of sparse diagonal matrices

Fig.2 Comparison of kernel running time of five SpMV methods

Fig.3 Framework of the adaptive optimization method

Fig.4 Distribution of five SpMV methods over N_{Ndiags} and R_{ER_DIA}

Fig.5 Distribution of five SpMV methods over N_{var_diags_nnz} and N_{avg_diags_nnz}

Fig.6 Distribution of five SpMV methods over N_{max_offset} and N_{splashes}

Fig.7 Comparison of the average time performance loss of six SpMV methods

Fig.8 Comparison of total kernel running time of six SpMV methods on twenty matrices

Fig.9 Comparison of kernel running time of six SpMV methods

Fig.10 Comparison of total GFLOPs of six SpMV methods on twenty matrices

Fig.11 Comparison of GFLOPs of six SpMV methods

Fig.12 GFLOPs speedup ratio of the AST method compared with the other five methods
