Computer Engineering ›› 2019, Vol. 45 ›› Issue (6): 82-88. doi: 10.19678/j.issn.1000-3428.0050322

• Advanced Computing and Data Processing •

Vector Algorithm Library Implementation and Optimization Based on the ARM V8 Platform

WANG Jing a, ZHANG Yunquan a, LIANG Jun b

  1. a. Beijing Key Laboratory of Information Service Engineering; b. Demonstration Center of Experimental Teaching in Comprehensive Engineering, Beijing Union University, Beijing 100101, China
  • Received: 2018-01-29 Online: 2019-06-15 Published: 2019-06-15
  • About the authors: WANG Jing (b. 1993), female, master's student, whose main research interests are high-performance computing and parallel computing; ZHANG Yunquan, research fellow; LIANG Jun, professor.
  • Funding:
    National Key Research and Development Program of China (2017YFB0202105, 2016YFB0200803, 2017YFB0202302); Key Program of the National Natural Science Foundation of China (61272136); Scientific Research Program of the Beijing Municipal Education Commission (KM201811417006).

Abstract: Building on VecOp, a vector algorithm library for the ARM V8 architecture, this paper proposes a scheme for implementing and optimizing basic vector algorithms on the ARM V8 platform. The vector algorithm functions are fine-tuned in four respects: memory-access alignment optimization, instruction set optimization, basic block optimization, and vector branch optimization. Experimental results show that the vector algorithm library implemented with this scheme on an ARM V8 computing platform achieves performance improvements of 10% to 300%.

Key words: mathematical function library, ARM V8 architecture, vector algorithm library, Single Instruction Multiple Data (SIMD), memory access optimization
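
To make the optimization categories named in the abstract more concrete, the following is a minimal sketch in portable C. It is not code from VecOp or from the paper: the function `vec_relu_scale`, the helper `relu_scale_one`, and the 16-byte alignment target (the width of one 128-bit SIMD register) are all assumptions chosen for illustration. It shows a scalar peel loop that brings the destination pointer to an aligned boundary, a main loop that handles four elements per iteration, and a conditional written as a side-effect-free expression that a compiler may lower to a select instead of a jump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element kernel (not part of VecOp): x > 0 ? x * k : 0.
 * The conditional has no side effects, so a compiler is free to emit a
 * select rather than a branch. */
static inline float relu_scale_one(float x, float k)
{
    return x > 0.0f ? x * k : 0.0f;
}

/* Hypothetical vector routine: dst[i] = relu_scale_one(src[i], k). */
void vec_relu_scale(float *dst, const float *src, size_t n, float k)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;

    /* 1. Alignment peel: scalar steps until dst + i is 16-byte aligned
     *    (assumed here to match one 128-bit SIMD register). */
    while (i < n && ((uintptr_t)(dst + i) & 15u) != 0) {
        dst[i] = relu_scale_one(src[i], k);
        i++;
    }

    /* 2. Main loop: one block of four floats per iteration, a shape that
     *    maps onto a single 128-bit vector operation. */
    for (; i + 4 <= n; i += 4) {
        for (size_t j = 0; j < 4; j++) {
            dst[i + j] = relu_scale_one(src[i + j], k);
        }
    }

    /* 3. Tail: the remaining zero to three elements. */
    for (; i < n; i++) {
        dst[i] = relu_scale_one(src[i], k);
    }
}
```

A caller would pass any pair of `float` arrays of length `n`; the peel loop means the routine gives the same results whether or not the destination starts on an aligned address. On an actual ARM V8 target the four-element block is the natural place to substitute SIMD instructions, which this sketch deliberately leaves to the compiler.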
