Computer Engineering, 2022, Vol. 48, Issue (1): 142-148. doi: 10.19678/j.issn.1000-3428.0060240

• Advanced Computing and Data Processing •

Automatic Vectorization Transplant and Optimization of LLVM for Domestic Processors

LI Jia'nan1, HAN Lin2, CHAI Yunda1

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China;
  2. National Supercomputing Center in Zhengzhou, Zhengzhou 450000, China
  • Received: 2020-12-09 Revised: 2021-01-19 Published: 2022-01-04
  • About the authors: LI Jia'nan (born 1994), female, master's student, whose main research interests are advanced compilation techniques and high-performance computing; HAN Lin, associate professor; CHAI Yunda, master's student.
  • Funding:
    National Key Research and Development Program of China, "Research on Key Technologies of the Global Earth Observation Results Management and Sharing Service System" (2018YFB0505000).

Abstract: Automatic vectorization is an important means of exploiting SIMD extensions and is already implemented in the LLVM compiler. However, differences in vector length and instruction-set functionality cause domestic platforms to miss vectorization opportunities during automatic vectorization, or to run slower after vectorization than before. To make full use of SIMD, this paper refines the instruction cost information according to the instruction-set features of the domestic platform. This improves the accuracy of the profitability analysis, so that automatic vectorization generates concise, efficient vector instructions that the back end supports. On this basis, an improved control-flow vectorization method is proposed; adding instruction cost information improves the adaptability of automatic vectorization, and the result is an LLVM-based automatic vectorization system for domestic platforms. Experimental results show that, compared with the platform before automatic vectorization was ported, porting and optimizing with this method improves overall performance on the SPEC tests by 10.8%, improves the speedup on the TSVC test suite by 16%, improves the speedup under precise cost guidance by 42%, and improves the speedup under control-flow vectorization by 51%.

Key words: automatic vectorization, vectorization cost, transplant, LLVM compiler, domestic processor
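
To illustrate what control-flow vectorization has to deal with, the following is a minimal sketch and not the method proposed in the paper. It shows a loop whose body contains a branch, next to an if-converted form in which the branch becomes a per-element select. The function names and the loop itself are hypothetical, chosen only for illustration. Whether a vectorizer treats the converted loop as profitable depends on the cost that the target assigns to the compare and select instructions, and that cost information is what the abstract describes refining.

```c
#include <stddef.h>

/* Hypothetical example loop: the branch in the body is the control flow
 * that a vectorizer has to remove or mask before it can emit SIMD code. */
void add_positive_branchy(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (b[i] > 0.0f) {
            a[i] = a[i] + b[i];
        }
    }
}

/* If-converted form of the same loop: both outcomes are computed and the
 * condition chooses between them, so the body is straight-line code that
 * maps naturally onto a vector compare followed by a vector select. */
void add_positive_select(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        float sum = a[i] + b[i];
        a[i] = (b[i] > 0.0f) ? sum : a[i];
    }
}
```

Both functions produce the same result for the same inputs. When compiling with Clang, the remark flags `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize` report which loops were vectorized and which were not, which is one way to observe the effect of a target's cost information.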

CLC number: