[1] GROPP W, LUSK E, SKJELLUM A. Using MPI: portable parallel programming with the message passing interface[M]. Cambridge, USA: MIT Press, 1994.
[2] STELLNER G. CoCheck: checkpointing and process migration for MPI[C]//Proceedings of International Conference on Parallel Processing. Washington D.C., USA: IEEE Press, 1996: 526-531.
[3] GABRIEL E, FAGG G E, BOSILCA G, et al. Open MPI: goals, concept, and design of a next generation MPI implementation[C]//Proceedings of European Parallel Virtual Machine/Message Passing Interface Users' Group Meeting. Berlin, Germany: Springer, 2004: 97-104.
[4] STONE J E, GOHARA D, SHI G C. OpenCL: a parallel programming standard for heterogeneous computing systems[J]. Computing in Science & Engineering, 2010, 12(3): 66-73.
[5] JÄÄSKELÄINEN P, LAMA C S, SCHNETTER E, et al. POCL: a performance-portable OpenCL implementation[J]. International Journal of Parallel Programming, 2015, 43(5): 752-785.
[6] DAGUM L, MENON R. OpenMP: an industry standard API for shared-memory programming[J]. IEEE Computational Science and Engineering, 1998, 5(1): 46-55.
[7] CHAPMAN B, JOST G, VAN DER PAS R. Using OpenMP: portable shared memory parallel programming[M]. Cambridge, USA: MIT Press, 2008.
[8] ATZENI S, GOPALAKRISHNAN G, RAKAMARIC Z, et al. ARCHER: effectively spotting data races in large OpenMP applications[C]//Proceedings of 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Washington D.C., USA: IEEE Press, 2016: 53-62.
[9] WIENKE S, SPRINGER P, TERBOVEN C. OpenACC: first experiences with real-world applications[C]//Proceedings of European Conference on Parallel Processing. Berlin, Germany: Springer, 2012: 859-870.
[10] KIRK D. NVIDIA CUDA software and GPU parallel computing architecture[C]//Proceedings of the 6th International Symposium on Memory Management. New York, USA: ACM Press, 2007: 103-104.
[11] YANG Z Y, ZHU Y T, PU Y. Parallel image processing based on CUDA[C]//Proceedings of International Conference on Computer Science and Software Engineering. Washington D.C., USA: IEEE Press, 2008: 198-201.
[12] SANDERS J, KANDROT E. CUDA by example: an introduction to general-purpose GPU programming[M]. [S.l.]: Addison-Wesley Professional, 2010.
[13] FONSECA A, CABRAL B, RAFAEL J, et al. Automatic parallelization: executing sequential programs on a task-based parallel runtime[J]. International Journal of Parallel Programming, 2016, 44(6): 1337-1358.
[14] ZHANG Y Q, CAO T, LI S G, et al. Parallel processing systems for big data: a survey[J]. Proceedings of the IEEE, 2016, 104(11): 2114-2136.
[15] OH T, BEARD S R, JOHNSON N P, et al. A generalized framework for automatic scripting language parallelization[C]//Proceedings of the 26th International Conference on Parallel Architectures and Compilation Techniques (PACT). Washington D.C., USA: IEEE Press, 2017: 356-369.
[16] BASKARAN M M, RAMANUJAM J, SADAYAPPAN P. Automatic C-to-CUDA code generation for affine programs[M]. Berlin, Germany: Springer, 2010.
[17] BONDHUGULA U, RAMANUJAM J. Pluto: a practical and fully automatic polyhedral parallelizer and locality optimizer[EB/OL]. [2022-01-11]. https://www.xueshufan.com/publication/2034761517.
[18] BASTOUL C. Efficient code generation for automatic parallelization and optimization[C]//Proceedings of the 2nd International Symposium on Parallel and Distributed Computing. Washington D.C., USA: IEEE Press, 2003: 23-30.
[19] VERDOOLAEGE S, JUEGA J C, COHEN A, et al. Polyhedral parallel code generation for CUDA[J]. ACM Transactions on Architecture and Code Optimization, 2013, 9(4): 1-23.
[20] NUGTEREN C, CORPORAAL H. Bones: an automatic skeleton-based C-to-CUDA compiler for GPUs[J]. ACM Transactions on Architecture and Code Optimization, 2015, 11(4): 1-35.
[21] LEE S I, JOHNSON T A, EIGENMANN R. Cetus: an extensible compiler infrastructure for source-to-source transformation[C]//Proceedings of International Workshop on Languages and Compilers for Parallel Computing. Berlin, Germany: Springer, 2003: 539-553.
[22] DAVE C, BAE H, MIN S J, et al. Cetus: a source-to-source compiler infrastructure for multicores[J]. Computer, 2009, 42(12): 36-42.
[23] BAE H S, MUSTAFA D, LEE J W, et al. The Cetus source-to-source compiler infrastructure: overview and evaluation[J]. International Journal of Parallel Programming, 2013, 41(6): 753-767.
[24] AMINI M, CREUSILLET B, EVEN S, et al. Par4All: from convex array regions to heterogeneous computing[EB/OL]. [2022-01-11]. https://hal-mines-paristech.archives-ouvertes.fr/hal-00744733.
[25] QUINLAN D. ROSE: compiler support for object-oriented frameworks[J]. Parallel Processing Letters, 2000, 10(2): 215-226.
[26] QUINLAN D, LIAO C H. ROSE source-to-source compiler infrastructure[EB/OL]. [2022-01-11]. https://engineering.purdue.edu/Cetus/cetusworkshop/papers/4-1.pdf.
[27] LIU S, ZHAO B, JIANG Q, et al. A semi-automatic coarse-grained parallelization approach for loop optimization and irregular code sections[J]. Chinese Journal of Computers, 2017, 40(9): 2127-2147. (in Chinese)
[28] LI Y B, ZHAO R C, HAN L, et al. Parallelizing compilation framework for heterogeneous many-core processors[J]. Journal of Software, 2019, 30(4): 981-1001. (in Chinese)
[29] WANG P X, HAN L, DING L L, et al. Effect and evaluation of automatic parallelization of typical compilers[J]. Journal of Information Engineering University, 2018, 19(2): 186-190. (in Chinese)
[30] DING L L, LI Y B, ZHANG S P, et al. Auto-parallelization research based on branch nested loops[J]. Computer Science, 2017, 44(5): 14-19, 52. (in Chinese)
[31] GAO Y C, ZHAO R C, HAN L, et al. Research on automatic parallelization of loops[J]. Journal of Information Engineering University, 2019, 20(1): 82-89. (in Chinese)
[32] PARR T J, QUONG R W. ANTLR: a predicated-LL(k) parser generator[J]. Software: Practice and Experience, 1995, 25(7): 789-810.
[33] PARR T. The definitive ANTLR 4 reference[M]. [S.l.]: Pragmatic Bookshelf, 2013.