[1] DATTA K, MURPHY M, VOLKOV V, et al.Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures[C]//Proceedings of 2008 ACM/IEEE Conference on Supercomputing.Washington D.C., USA:IEEE Press, 2009:1-12. [2] KRISHNAMOORTHY S, BASKARAN M, BONDHUGULA U, et al.Effective automatic parallelization of stencil computations[J].ACM SIGPLAN Notices, 2007, 42(6):235-244. [3] HUANG J Q, HAN W T, WANG X Y, et al.Heterogeneous parallel algorithm design and performance optimization for WENO on the Sunway Taihulight supercomputer[J].Tsinghua Science and Technology, 2019, 25(1):56-67. [4] ZHANG K F, SU H Y, DOU Y.Multilevel parallelism optimization of stencil computations on SIMDlized NUMA architectures[J].The Journal of Supercomputing, 2021, 77(11):13584-13600. [5] MENEGHIN M, MAHMOUD A H, JAYARAMAN P K, et al.Neon:a multi-GPU programming model for grid-based computations[C]//Proceedings of IEEE International Parallel and Distributed Processing Symposium.Washington D.C., USA:IEEE Press, 2022:817-827. [6] LI K, YUAN L, ZHANG Y Q, et al.Reducing redundancy in data organization and arithmetic calculation for stencil computations[C]//Proceedings of International Conference for High Performance Computing, Networking, Storage and Analysis.Washington D.C., USA:IEEE Press, 2022:1-15. [7] SHEN J, WU Y, OKITA M, et al.Accelerating GPU-based out-of-core stencil computation with on-the-fly compression[EB/OL].[2022-02-20].https://arxiv.org/abs/2109.05410. [8] PEARSON C, HIDAYETOĞLU M, ALMASRI M, et al.Node-aware stencil communication for heterogeneous supercomputers[C]//Proceedings of International Parallel and Distributed Processing Symposium Workshops.Washington D.C., USA:IEEE Press, 2020:796-805. [9] SULAIMAN M, HALIM Z, WAQAS M, et al.A hybrid list-based task scheduling scheme for heterogeneous computing[J].The Journal of Supercomputing, 2021, 77(9):10252-10288. [10] BRODTKORB A R, DYKEN C, HAGEN T R, et al.State-of-the-art in heterogeneous computing[J].Scientific Programming, 2010, 18(1):1-33. [11] CHANG L W, GÓMEZ-LUNA J, EL HAJJ I, et al.Collaborative computing for heterogeneous integrated systems[C]//Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering.New York, USA:ACM Press, 2017:385-388. [12] GAN L, FU H H, XUE W, et al.Scaling and analyzing the stencil performance on multi-core and many-core architectures[C]//Proceedings of the 20th IEEE International Conference on Parallel and Distributed Systems.Washington D.C., USA:IEEE Press, 2015:103-110. [13] FAIZUR RAHMAN S M, YI Q, QASEM A.Understanding stencil code performance on multicore architectures[C]//Proceedings of the 8th ACM International Conference on Computing Frontiers.New York, USA:ACM Press, 2011:1-10. [14] SMITH L, BULL M.Development of mixed mode MPI/OpenMP applications[J].Scientific Programming, 2001, 9(3):83-98. [15] LI D, DE SUPINSKI B R, SCHULZ M, et al.Hybrid MPI/OpenMP power-aware computing[C]//Proceedings of IEEE International Symposium on Parallel & Distributed Processing.Washington D.C., USA:IEEE Press, 2010:1-12. [16] 张琨, 贾金芳, 严文昕, 等.GRAPES动力框架中大规模稀疏线性系统并行求解及优化[J].计算机工程, 2022, 48(1):149-154, 162. ZHANG K, JIA J F, YAN W X, et al.Parallel solution and optimization of large-scale sparse linear system in GRAPES dynamic framework[J].Computer Engineering, 2022, 48(1):149-154, 162.(in Chinese) [17] DAGUM L, MENON R.OpenMP:an industry standard API for shared-memory programming[J].IEEE Computational Science and Engineering, 1998, 5(1):46-55. [18] GABRIEL E, FAGG G E, BOSILCA G, et al.Open MPI:goals, concept, and design of a next generation MPI implementation[M]//KRANZLMÜLLER D, KACSUK P, DONGARRA J.Recent Advances in Parallel Virtual Machine and Message Passing Interface.Berlin, Germany:Springer, 2004:97-104. [19] DE SUPINSKI B R, SCOGLAND T R W, DURAN A, et al.The ongoing evolution of OpenMP[J].Proceedings of the IEEE, 2018, 106(11):2004-2019. [20] KWEDLO W, CZOCHANSKI P J.A hybrid MPI/OpenMP parallelization of means algorithms accelerated using the triangle inequality[J].IEEE Access, 2019, 7:42280-42297. [21] ZHENG R H, PAI S.Efficient execution of graph algorithms on CPU with SIMD extensions[C]//Proceedings of IEEE/ACM International Symposium on Code Generation and Optimization.Washington D.C., USA:IEEE Press, 2021:262-276. [22] ZHONG D, CAO Q L, BOSILCA G, et al.Using advanced vector extensions AVX-512 for MPI reductions[C]//Proceedings of the 27th European MPI Users'Group Meeting.New York, USA:ACM Press, 2020:1-10. [23] BIAN H, HUANG J, LIU L, et al.ALBUS:a method for efficiently processing SpMV using SIMD and load balancing[J].Future Generation Computer Systems, 2021, 116:371-392. [24] 郭渝洛, 边浩东, 董润婷, 等.基于SIMD的并行傅里叶空间图像相似度计算[J].计算机工程, 2021, 47(11):247-253. GUO Y L, BIAN H D, DONG R T, et al.Parallel Fourier space image similarity calculation based on SIMD[J].Computer Engineering, 2021, 47(11):247-253.(in Chinese) [25] GARLAND M, LE GRAND S, NICKOLLS J, et al.Parallel computing experiences with CUDA[J].IEEE Micro, 2008, 28(4):13-27. [26] BUCK I.GPU computing with NVIDIA CUDA[C]//Proceedings of ACM SIGGRAPH 2007 Courses.New York, USA:ACM Press, 2007:6-12. [27] 徐国伟, 陈建, 成怡.基于GPU并行计算的雷达杂波模拟研究[J].计算机工程, 2020, 46(11):306-314. XU G W, CHEN J, CHENG Y.Research on radar clutter simulation based on GPU parallel computing[J].Computer Engineering, 2020, 46(11):306-314.(in Chinese) [28] CHOQUETTE J, GANDHI W.NVIDIA A100 GPU:performance & innovation for GPU computing[C]//Proceedings of IEEE Hot Chips 32 Symposium.Washington D.C., USA:IEEE Press, 2020:1-43. [29] NARASIMAN V, SHEBANOW M, LEE C J, et al.Improving GPU performance via large warps and two-level warp scheduling[C]//Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture.Washington D.C., USA:IEEE Press, 2017:308-317. [30] 肖汉, 郭宝云, 李彩林, 等.面向异构架构的传递闭包并行算法[J].计算机工程, 2021, 47(8):131-139. XIAO H, GUO B Y, LI C L, et al.Parallel transitive closure algorithm for heterogeneous architecture[J].Computer Engineering, 2021, 47(8):131-139.(in Chinese) [31] MITTAL S, VETTER J S.A survey of CPU-GPU heterogeneous computing techniques[J].ACM Computing Surveys, 2015, 47(4):1-35. |