Computer Engineering

• Architecture and Software Technology •

Design and Implementation of SIMD Unaligned Memory Access Structure

YU Chenglong, WANG Yongwen

  1. (School of Computer Science, National University of Defense Technology, Changsha 410073, China)
  • Received: 2015-09-15 Online: 2016-09-15 Published: 2016-09-15
  • About the authors: YU Chenglong (born 1990), male, master's student, main research interest: high-performance microprocessor design; WANG Yongwen, research fellow, Ph.D.
  • Funding:
    National Natural Science Foundation of China project "Research on Many-Thread Wide-Vector Microarchitecture for Ultra-High-Performance Computing" (61170045).

Abstract: Single Instruction Multiple Data (SIMD) is an effective way to exploit data-level parallelism, but accesses to unaligned addresses seriously hinder program vectorization and degrade processor performance. To reduce the latency of unaligned memory accesses, the memory access structure of high-performance applications is modeled, and two SIMD unaligned memory access structures are designed and implemented: one based on a split buffer line and one based on a dual-bank cache. Experimental results show that, with the dual-bank cache structure, SIMD-vectorized code that adds two arrays using unaligned accesses reaches 99% of the performance of the aligned version, improving the memory access efficiency of SIMD vectorization.

Key words: high-performance computing, Data Level Parallelism (DLP), vectorization, Single Instruction Multiple Data (SIMD) extension, unaligned memory access, Gem5 simulator
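
To illustrate why an unaligned SIMD access costs more than an aligned one, the following minimal sketch models a vector load that may straddle two cache lines. It is not the authors' design: the 64-byte line size, the 16-byte vector width, the even/odd interleaving of lines across two banks, and all function names are assumptions made for illustration only, since the abstract does not describe the actual cache organization.

```python
LINE_BYTES = 64  # assumed cache line size
VEC_BYTES = 16   # assumed SIMD vector width (128 bits)


def is_aligned(addr, size=VEC_BYTES):
    """True when the access starts on a multiple of the vector width."""
    return addr % size == 0


def lines_touched(addr, size=VEC_BYTES, line=LINE_BYTES):
    """Indices of the cache lines covered by the byte range [addr, addr + size)."""
    first = addr // line
    last = (addr + size - 1) // line
    return list(range(first, last + 1))


def banks_touched(addr, size=VEC_BYTES, line=LINE_BYTES):
    """Bank of each touched line, assuming even lines sit in bank 0 and odd lines in bank 1."""
    return [idx % 2 for idx in lines_touched(addr, size, line)]


def two_bank_load(memory, addr, size=VEC_BYTES, line=LINE_BYTES):
    """Read the requested bytes line by line and merge the pieces.

    In this hypothetical model, a line-crossing access reads one piece from
    each bank; because the two lines are in different banks, hardware could
    fetch them in the same cycle instead of issuing two serial accesses.
    """
    pieces = []
    for idx in lines_touched(addr, size, line):
        base = idx * line
        lo = max(addr, base) - base                 # first needed byte in this line
        hi = min(addr + size, base + line) - base   # one past the last needed byte
        pieces.append(memory[base:base + line][lo:hi])
    return b"".join(pieces)


if __name__ == "__main__":
    memory = bytes(range(256))
    for addr in (48, 56):
        print(addr, is_aligned(addr), lines_touched(addr), banks_touched(addr))
```

Under these assumptions, a load at address 48 stays within one line, whereas a load at address 56 covers lines 0 and 1, which fall in different banks.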

CLC number: