[1] 李亿渊, 薛巍, 陈德训, 等.稀疏矩阵向量乘法在申威众核架构上的性能优化[J].计算机学报, 2020, 43(6):1010-1024. LI Y Y, XUE W, CHEN D X, et al.Performance optimization for sparse matrix-vector multiplication on Sunway architecture[J].Chinese Journal of Computers, 2020, 43(6):1010-1024.(in Chinese) [2] VUDUC R, DEMMEL J W, YELICK K A, et al.Performance optimizations and bounds for sparse matrix-vector multiply[C]//Proceedings of 2002 ACM/IEEE Conference on Supercomputing.Washington D.C., USA:IEEE Press, 2002:1-35. [3] BELL N, GARLAND M.Implementing sparse matrix-vector multiplication on throughput-oriented processors[C]//Proceedings of 2009 ACM Conference on High Performance Computing Networking, Storage and Analysis.New York, USA:ACM Press, 2009:1-11. [4] CHOI J W, SINGH A, VUDUC R W.Model-driven autotuning of sparse matrix-vector multiply on GPUs[C]//Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.New York, USA:ACM Press, 2010:115-126. [5] KOURTIS K, KARAKASIS V, GOUMAS G, et al.CSX:an extended compression format for SpMV on shared memory systems[C]//Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming.New York, USA:ACM Press, 2011:247-256. [6] LIU W F, VINTER B.CSR5:an efficient storage format for cross-platform sparse matrix-vector multiplication[C]//Proceedings of the 29th ACM on International Conference on Supercomputing.New York, USA:ACM Press, 2015:339-350. [7] LIU W F, VINTER B.Speculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors[J].Parallel Computing, 2015, 49:179-193. [8] SU B Y, KEUTZER K.clSpMV:a cross-platform OpenCL SpMV framework on GPUs[C]//Proceedings of the 26th ACM international conference on Supercomputing.New York, USA:ACM Press, 2012:353-364. [9] XIE B W, ZHAN J F, LIU X, et al.CVR:efficient vectorization of SpMV on X86 processors[C]//Proceedings of 2018 International Symposium on Code Generation and Optimization.New York, USA:ACM Press, 2018:149-162. [10] YAN S G, LI C, ZHANG Y Q, et al.yaSpMV:yet another SpMV framework on GPUs[C]//Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.New York, USA:ACM Press, 2014:107-118. [11] 袁娥, 张云泉, 刘芳芳, 等.SpMV的自动性能优化实现技术及其应用研究[J].计算机研究与发展, 2009, 46(7):1117-1126. YUAN E, ZHANG Y Q, LIU F F, et al.Automatic performance tuning of sparse matrix-vector multiplication:implementation techniques and its application research[J].Journal of Computer Research and Development, 2009, 46(7):1117-1126.(in Chinese) [12] VUDUC R, DEMMEL J W, YELICK K A.OSKI:a library of automatically tuned sparse matrix kernels[J].Journal of Physics:Conference Series, 2005, 16:521-530. [13] WILLIAMS S, OLIKER L, VUDUC R, et al.Optimization of sparse matrix-vector multiplication on emerging multicore platforms[J].Parallel Computing, 2009, 35(3):178-194. [14] 李佳佳, 张秀霞, 谭光明, 等.选择稀疏矩阵乘法最优存储格式的研究[J].计算机研究与发展, 2014, 51(4):882-894. LI J J, ZHANG X X, TAN G M, et al.Study of choosing the optimal storage format of sparse matrix vector multiplication[J].Journal of Computer Research and Development, 2014, 51(4):882-894.(in Chinese) [15] SEDAGHATI N, MU T, POUCHET L N, et al.Automatic selection of sparse matrix representation on GPUs[C]//Proceedings of the 29th ACM on International Conference on Supercomputing.New York, USA:ACM Press, 2015:99-108. [16] BENATIA A, JI W X, WANG Y Z, et al.Sparse matrix format selection with multiclass SVM for SpMV on GPU[C]//Proceedings of the 45th International Conference on Parallel Processing.Washington D.C., USA:IEEE Press, 2016:496-505. [17] NISA I, SIEGEL C, RAJAM A S, et al.Effective machine learning based format selection and performance modeling for SpMV on GPUs[C]//Proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium Workshops.Washington D.C., USA:IEEE Press, 2018:1056-1065. [18] ZHAO Y, ZHOU W J, SHEN X P, et al.Overhead-conscious format selection for SpMV-based applications[C]//Proceedings of 2018 IEEE International Parallel and Distributed Processing Symposium.Washington D.C., USA:IEEE Press, 2018:950-959. [19] ZHOU W J, ZHAO Y, SHEN X P, et al.Enabling runtime SpMV format selection through an overhead conscious method[J].IEEE Transactions on Parallel and Distributed Systems, 2020, 31(1):80-93. [20] ZHAO Y, LI J J, LIAO C H, et al.Bridging the gap between deep learning and sparse matrix format selection[C]//Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.New York, USA:ACM Press, 2018:94-108. [21] IOFFE S, SZEGEDY C.Batch normalization:accelerating deep network training by reducing internal covariate shift[C]//Proceedings of the 32nd International Conference on International Conference on Machine Learning.New York, USA:ACM Press, 2015:448-456. [22] WILLIAMSON D F, PARKER R A, KENDRICK J S.The box plot:a simple visual method to interpret data[J].Annals of Internal Medicine, 1989, 110(11):916-921. [23] DAVIS T A, HU Y F.The university of Florida sparse matrix collection[J].ACM Transactions on Mathematical Software, 2011, 38(1):1-25. [24] BOISVERT R F, POZO R, REMINGTON K A.The matrix market exchange formats[EB/OL].[2020-11-15].https://www.researchgate.net/publication/213880672_The_Matrix_Market_Exchange_Format_Initial_Design. |