[1] LIU Y, WANG P, YANG R, et al. Research on parallel ISODATA clustering for remote sensing image based on OpenMP[J]. Computer Engineering, 2016, 42(7): 238-243, 250. (in Chinese)
[2] TIOTTO E, MAHJOUR B, TSANG W, et al. OpenMP 4.5 compiler optimization for GPU offloading[J]. IBM Journal of Research and Development, 2020, 64(3/4): 1-14.
[3] NETH B, SCOGLAND T R W, STROUT M M, et al. Unified sequential optimization directives in OpenMP[C]//Proceedings of the 16th International Workshop on OpenMP. Berlin, Germany: Springer, 2020: 85-97.
[4] MOSSERI I, ALON L O, HAREL R, et al. ComPar: optimized multi-compiler for automatic OpenMP S2S parallelization[C]//Proceedings of the 16th International Workshop on OpenMP. Berlin, Germany: Springer, 2020: 247-262.
[5] SHAO Y X, XI J, ZHANG Z P. Method for starting Loongson 3A1000 by using domestic device[J]. Ordnance Industry Automation, 2020, 39(7): 33-35. (in Chinese)
[6] SOUZA J D, BECKER P H E, BECK A C S. Improving multitask performance and energy consumption with partial-ISA multicores[J]. Journal of Parallel and Distributed Computing, 2021, 153: 1-14.
[7] MCINTOSH-SMITH S, DE SUPINSKI B R, KLINKENBERG J. OpenMP: enabling massive node-level parallelism[M]. Berlin, Germany: Springer, 2021.
[8] LÖFF J, GRIEBLER D, MENCAGLI G, et al. The NAS parallel benchmarks for evaluating C++ parallel programming frameworks on shared-memory architectures[J]. Future Generation Computer Systems, 2021, 125: 743-757.
[9] ZHU H D, HUANG Y L, SONG B W. Pointer data prefetching method based on CMP[J]. Computer Engineering, 2011, 37(6): 71-73. (in Chinese)
[10] ONODERA N, IDOMURA Y, HASEGAWA Y, et al. GPU acceleration of multigrid preconditioned conjugate gradient solver on block-structured Cartesian grid[C]//Proceedings of International Conference on High Performance Computing in Asia-Pacific Region. New York, USA: ACM Press, 2021: 120-128.
[11] PEREIRA F H, LOPES VERARDI S L, NABETA S I. A fast algebraic multigrid preconditioned conjugate gradient solver[J]. Applied Mathematics and Computation, 2006, 179(1): 344-351.
[12] PAL S, PATHAK S, RAJASEKARAN S. On speeding-up parallel Jacobi iterations for SVDs[C]//Proceedings of the 18th IEEE International Conference on High Performance Computing and Communications. Washington D.C., USA: IEEE Press, 2016: 9-16.
[13] YANG X, MITTAL R. Efficient relaxed-Jacobi smoothers for multigrid on parallel computers[J]. Journal of Computational Physics, 2017, 332: 135-142.
[14] KUDO S, YAMAMOTO Y, BEČKA M, et al. Performance of the parallel one-sided block Jacobi SVD algorithm on a modern distributed-memory parallel computer[C]//Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics. Washington D.C., USA: IEEE Press, 2016: 594-604.
[15] CERVINI S. System and method for efficiently executing single program multiple data programs: USA, US7904905[P]. 2011-03-08.
[16] Intel Corporation. Architecture and method for data parallel single program multiple data execution: USA, US20200104139[P]. 2020-05-10.
[17] SPRENGER S, ZEUCH S, LESER U. Exploiting automatic vectorization to employ SPMD on SIMD registers[C]//Proceedings of the 34th IEEE International Conference on Data Engineering Workshops. Washington D.C., USA: IEEE Press, 2018: 90-95.
[18] ZHU W R, CUVILLO J, GAO G R. Performance characteristics of OpenMP language constructs on a many-core-on-a-chip architecture[C]//Proceedings of International Workshop on OpenMP. Berlin, Germany: Springer, 2008: 90-95.
[19] STELLE G, MOSES W S, OLIVIER S L, et al. OpenMPIR: implementing OpenMP tasks with Tapir[C]//Proceedings of the 4th Workshop on LLVM Compiler Infrastructure in HPC. New York, USA: ACM Press, 2017: 1-12.
[20] BOURAOUI H, CASTRILLON J, JERAD C. Comparing dataflow and OpenMP programming for speaker recognition applications[C]//Proceedings of the 10th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms. Washington D.C., USA: IEEE Press, 2019: 1-6.
[21] SCOGLAND T R W, GYLLENHAAL J, KEASLER J, et al. Enabling region merging optimizations in OpenMP[M]. Berlin, Germany: Springer, 2015.
[22] ALDINUCCI M, CESARE V, COLONNELLI I, et al. Practical parallelization of scientific applications with OpenMP, OpenACC and MPI[J]. Journal of Parallel and Distributed Computing, 2021, 157(11): 13-29.
[23] HONGXUE J, DANBING L, XILA L. Parallel efficiency analysis of large increment method based on OpenMP[J]. Earth and Environmental Science, 2021, 787(1): 012052.
[24] CAI Y, SUN C G, DU Z H, et al. CPU-side high performance BLAS library optimization in heterogeneous HPL algorithm[J]. Journal of Software, 2021, 32(8): 2289-2306. (in Chinese)