[1] GAO W, ZHAO R C, HAN L, et al. Research on SIMD auto-vectorization compiling optimization[J]. Journal of Software, 2015, 26(6):1265-1284. (in Chinese)
[2] ZHOU H, XUE J L. Exploiting mixed SIMD parallelism by reducing data reorganization overhead[C]//Proceedings of 2016 International Symposium on Code Generation and Optimization. New York, USA:ACM Press, 2016:59-69.
[3] PANDEY M, SARDA S. LLVM cookbook[M]. Packt Publishing Ltd., [s.n.]:2015.
[4] RALF K, SEBASTIAN H. Whole-function vectorization[C]//Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. Washington D.C., USA:IEEE Press, 2011:141-150.
[5] TIAN X M, SAITO H, SU E, et al. LLVM framework and IR extensions for parallelization, SIMD vectorization and offloading[C]//Proceedings of the 3rd Workshop on LLVM Compiler Infrastructure in HPC. Washington D.C., USA:IEEE Press, 2016:21-31.
[6] DORIT N, IRA R, AYAL Z. Auto-vectorization of interleaved data for SIMD[J]. Association for Computing Machinery, 2006, 6:132-143.
[7] PETROGALLI F, WALKER P. LLVM and the automatic vectorization of loops invoking math routines: -fsimdmath[C]//Proceedings of 2018 IEEE/ACM Workshop on the LLVM Compiler Infrastructure in HPC. New York, USA:ACM Press, 2018:30-38.
[8] PORPODAS V. SuperGraph-SLP auto-vectorization[C]//Proceedings of 2017 International Conference on Parallel Architecture and Compilation. Washington D.C., USA:IEEE Press, 2017:330-342.
[9] PORPODAS V, ROCHA R C O, LUÍS F W. Look-ahead SLP: auto-vectorization in the presence of commutative operations[C]//Proceedings of International Symposium on Code Generation and Optimization. Washington D.C., USA:IEEE Press, 2018:163-174.
[10] HAO Z, XUE J L. A compiler approach for exploiting partial SIMD parallelism[J]. ACM Transactions on Architecture and Code Optimization, 2016, 13(1):1-26.
[11] MOLDOVANOVA O V, KURNOSOV M G, MELNIKOV A. Energy efficiency and performance of auto-vectorized loops on Intel Xeon processors[C]//Proceedings of 2018 Russian-Pacific Conference on Computer Technology and Applications. Washington D.C., USA:IEEE Press, 2018:1-6.
[12] LI W, LIANG J, ZHANG Z, et al. Parallel optimization strategy of airborne SAR imaging algorithm based on ARM GPU[J]. Computer Engineering, 2020, 46(10):240-247. (in Chinese)
[13] VASILEIOS P, RODRIGO C O R, LUÍS F W G. VW-SLP: auto-vectorization with adaptive vector width[C]//Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques. New York, USA:ACM Press, 2018:1-15.
[14] ZHOU H, XUE J L. A compiler approach for exploiting partial SIMD parallelism[J]. ACM Transactions on Architecture and Code Optimization, 2016, 13(11):26-35.
[15] RODRIGO C O R, VASILEIOS P, PAVLOS P, et al. Vectorization-aware loop unrolling with seed forwarding[C]//Proceedings of the 29th International Conference on Compiler Construction. New York, USA:ACM Press, 2020:1-13.
[16] ZHOU H, XUE J. Exploiting mixed SIMD parallelism by reducing data reorganization overhead[C]//Proceedings of 2016 International Symposium on Code Generation and Optimization. New York, USA:ACM Press, 2016:59-69.
[17] SIMON M, SHREY S, MATTHIAS K, et al. Multi-dimensional vectorization in LLVM[C]//Proceedings of the 5th Workshop on Programming Models for SIMD/Vector Processing. New York, USA:ACM Press, 2019:1-8.
[18] ANDREW A, AVINASH M, DAVID G. Automatic vectorization of interleaved data revisited[J]. ACM Transactions on Architecture and Code Optimization, 2016, 13(2):1-25.
[19] LIU J, ZHANG Y R, JANG O Y, et al. A compiler framework for extracting superword level parallelism[C]//Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation. New York, USA:ACM Press, 2012:347-358.
[20] BOHM C, PLANT C. Mining massive vector data on single instruction multiple data microarchitectures[C]//Proceedings of 2015 IEEE International Conference on Data Mining Workshop. Washington D.C., USA:IEEE Press, 2015:597-606.
[21] PORPODAS V, ROCHA R, BREVNOV E, et al. Super-node SLP: optimized vectorization for code sequences containing operators and their inverse elements[C]//Proceedings of 2019 IEEE/ACM International Symposium on Code Generation and Optimization. Washington D.C., USA:IEEE Press, 2019:206-216.
[22] YAO J Y, ZHAO R C, WANG Q, et al. Loop-nest auto-vectorization method based on benefit analysis[C]//Proceedings of the 2nd International Conference on Advances in Image Processing. New York, USA:ACM Press, 2018:240-244.