[1] YIN Mengjia, XU Xianbin, HE Shuibing, et al. Performance model construction for sparse matrix-vector multiplication on GPU[J]. Computer Science, 2017, 44(4): 182-187. (in Chinese)
[2] BRIN S, PAGE L. The anatomy of a large-scale hypertextual Web search engine[J]. Computer Networks and ISDN Systems, 1998, 30(1): 107-117.
[3] TONG Hanghang, FALOUTSOS C, PAN Jiayu. Random walk with restart: fast solutions and applications[J]. Knowledge and Information Systems, 2008, 14(3): 327-346.
[4] LANGR D, TVRDÍK P. Evaluation criteria for sparse matrix storage formats[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(2): 428-440.
[5] LI Jiajia, ZHANG Xiuxia, TAN Guangming, et al. Research on selecting the optimal storage format for sparse matrix multiplication[J]. Journal of Computer Research and Development, 2014, 51(4): 882-894. (in Chinese)
[6] LIU Fangfang, YANG Chao, YUAN Xinhui, et al. Implementation and optimization of SpMV for the domestic Sunway 26010 many-core processor[J]. Journal of Software, 2018, 29(12): 3921-3932. (in Chinese)
[7] LINDHOLM E, NICKOLLS J, OBERMAN S, et al. NVIDIA Tesla: a unified graphics and computing architecture[J]. IEEE Micro, 2008, 28(2): 39-55.
[8] MAGGIONI M, BERGER-WOLF T. Optimization techniques for sparse matrix-vector multiplication on GPUs[J]. Journal of Parallel and Distributed Computing, 2016, 93(C): 66-86.
[9] ZHANG Heng, ZHANG Libo, WU Yanjun. Large-scale graph data processing on a Multi-GPU platform[J]. Journal of Computer Research and Development, 2018, 55(2): 273-288. (in Chinese)
[10] CHENG Kai, TIAN Jin, MA Ruilin. Research on an efficient GPU-based sparse matrix storage format[J]. Computer Engineering, 2018, 44(8): 54-60. (in Chinese)
[11] LIU Yongchao, SCHMIDT B. LightSpMV: faster CUDA-compatible sparse matrix-vector multiplication using compressed sparse rows[J]. Journal of Signal Processing Systems, 2018, 90(1): 69-86.
[12] LIU Weifeng, VINTER B. CSR5: an efficient storage format for cross-platform sparse matrix-vector multiplication[C]//Proceedings of ACM International Conference on Supercomputing. New York, USA: ACM Press, 2015: 339-350.
[13] ASHARI A, SEDAGHATI N, EISENLOHR J, et al. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs[M]. Berlin, Germany: Springer, 2014.
[14] ASHARI A, SEDAGHATI N, EISENLOHR J, et al. A model-driven blocking strategy for load balanced sparse matrix-vector multiplication on GPUs[J]. Journal of Parallel and Distributed Computing, 2015, 76: 3-15.
[15] WILKINSON J H. Error analysis of floating-point computation[J]. Numerische Mathematik, 1960, 2(1): 319-340.
[16] HILLIS W D, STEELE G L. Data parallel algorithms[J]. Communications of the ACM, 1986, 29(12): 1170-1183.
[17] HARRIS M, SENGUPTA S, OWENS J D. Parallel prefix sum (scan) with CUDA[M]//NGUYEN H. GPU Gems 3. New Jersey, USA: Addison Wesley, 2007: 851-876.
[18] SENGUPTA S, HARRIS M, ZHANG Yao, et al. Scan primitives for GPU computing[C]//Proceedings of ACM SIGGRAPH/EUROGRAPHICS Symposium on Graphics Hardware. New York, USA: ACM Press, 2007: 97-106.
[19] FILIPPONE S, CARDELLINI V, BARBIERI D, et al. Sparse matrix-vector multiplication on GPGPUs[J]. ACM Transactions on Mathematical Software, 2017, 43(4): 1-49.
[20] BLELLOCH G E, HEROUX M A, ZAGHA M. Segmented operations for sparse matrix computation on vector multiprocessors: CMU-CS-93-173[R]. Pittsburgh, USA: Carnegie Mellon University, 1993: 1-7.
[21] CHENG J, GROSSMAN M, MCKERCHER T. Professional CUDA C Programming[M]. [S.l.]: Wrox, 2014.
[22] NVIDIA. CUDA C programming guide[EB/OL]. [2018-11-27]. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html.
[23] DAVIS T A, HU Yifan. The University of Florida sparse matrix collection[J]. ACM Transactions on Mathematical Software, 2011, 38(1): 1-25.