Automatic Vectorization Transplant and Optimization of LLVM for Domestic Processors

doi:10.19678/j.issn.1000-3428.0060240

Abstract

Abstract: Automatic vectorization is an important way to exploit SIMD extension units and is already implemented in the LLVM compiler. However, differences in vector length and instruction set capabilities mean that, on domestic platforms, automatic vectorization tends to miss vectorization opportunities or to produce slowdowns after vectorization. To make full use of SIMD, this paper refines the instruction cost information according to the instruction set features of domestic platforms. This improves the accuracy of the benefit analysis, so that automatic vectorization generates concise, efficient vector instructions that the back end supports. On this basis, an improved control-flow vectorization method is proposed, and adding instruction cost information improves the adaptability of automatic vectorization. The result is an LLVM-based automatic vectorization system for domestic platforms. Experimental results show that, compared with the state before the automatic vectorization port, the ported and optimized system improves overall SPEC performance by 10.8%, the speedup on the TSVC test suite by 16%, the speedup under precise cost guidance by 42%, and the speedup under control-flow vectorization by 51%.

Key words: automatic vectorization, vectorization cost, transplant, LLVM compiler, domestic processor
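
The role that instruction cost information plays in the benefit analysis can be illustrated with a minimal sketch. It is not the paper's implementation and does not use LLVM's actual cost-model API: the opcode names, the cost values, and the vectorization factor are all hypothetical, chosen only to show how an overestimated cost for one vector instruction can make a profitable control-flow loop look unprofitable.

```python
# All opcode names and cost values below are hypothetical illustrations,
# not figures from the paper or from LLVM's real cost tables.

def scalar_cost(body, costs):
    """Estimated cost of one scalar iteration: sum of count * per-op cost."""
    return sum(count * costs[op] for op, count in body.items())


def vector_cost_per_element(body, costs, vf):
    """Estimated cost of one vector iteration, divided by the vectorization factor."""
    return sum(count * costs[op] for op, count in body.items()) / vf


def should_vectorize(body, scalar_costs, vector_costs, vf):
    """Vectorize only if the per-element vector cost beats the scalar cost."""
    return vector_cost_per_element(body, vector_costs, vf) < scalar_cost(body, scalar_costs)


# Instruction mix of a loop body with control flow after if-conversion:
#   b[i] = (a[i] > 0) ? a[i] * c[i] : b[i]
LOOP_BODY = {"load": 3, "fcmp": 1, "fmul": 1, "select": 1, "store": 1}

VF = 4  # assumed number of elements per vector register

SCALAR_COSTS = {"load": 1, "fcmp": 1, "fmul": 1, "select": 1, "store": 1}

# Generic table: assumes the target has no vector select, so it is priced
# as if it were expanded into many scalar operations (assumed value).
GENERIC_VECTOR_COSTS = {"load": 1, "fcmp": 1, "fmul": 1, "select": 30, "store": 1}

# Tuned table: assumes the target does provide a cheap vector select.
TUNED_VECTOR_COSTS = {"load": 1, "fcmp": 1, "fmul": 1, "select": 1, "store": 1}

if __name__ == "__main__":
    print("generic table:", should_vectorize(LOOP_BODY, SCALAR_COSTS, GENERIC_VECTOR_COSTS, VF))
    print("tuned table:  ", should_vectorize(LOOP_BODY, SCALAR_COSTS, TUNED_VECTOR_COSTS, VF))
```

With the generic table the loop is rejected, and with the tuned table it is accepted. The opposite error is also possible: if a table underestimates the cost of an instruction the back end must expand, the loop is vectorized and runs slower than the scalar version.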

CLC number: TP314

LI Jia'nan, HAN Lin, CHAI Yunda. Automatic Vectorization Transplant and Optimization of LLVM for Domestic Processors[J]. Computer Engineering, 2022, 48(1): 142-148.

http://www.ecice06.com/CN/Y2022/V48/I1/142

Figures/Tables: 9

