作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2023, Vol. 49 ›› Issue (12): 10-24. doi: 10.19678/j.issn.1000-3428.0066548

• 计算机系统前沿技术 • 上一篇    下一篇

面向E级超算系统的众核片上存储层次研究

方燕飞, 刘齐, 董恩铭, 李雁冰, 过锋, 王谛, 何王全, 漆锋滨   

  1. 国家并行计算机工程技术研究中心, 北京 100190
  • 收稿日期:2022-12-16 出版日期:2023-12-15 发布日期:2023-12-15
  • 作者简介:

    方燕飞(1980—),女,高级工程师、硕士,主研方向为软件体系结构

    刘齐,工程师、硕士

    董恩铭,工程师、博士

    李雁冰,工程师、博士

    过锋,副研究员、博士

    王谛,工程师、博士

    何王全,研究员、博士

    漆锋滨,研究员、博士

  • 基金资助:
    国家重点研发计划(2021FYB03011100)

Research on Manycore On-chip Storage Hierarchy for Exascale Supercomputer Systems

Yanfei FANG, Qi LIU, Enming DONG, Yanbing LI, Feng GUO, Di WANG, Wangquan HE, Fengbin QI   

  1. National Research Center of Parallel Computer Engineering and Technology, Beijing 100190, China
  • Received:2022-12-16 Online:2023-12-15 Published:2023-12-15

摘要:

当前众核已成为构建高性能计算(HPC)超级计算机的主流微处理器架构,为HPC领域E级超算提供强大的算力。随着众核处理器片上集成的运算核心数量不断增加,众多核心对存储资源竞争愈加激烈,“访存墙”问题越来越突出。众核片上存储层次是缓解“访存墙”问题并帮助HPC应用更好地发挥众核处理器的计算优势以提升实际应用性能的重要结构。众核片上存储层次的设计对众核片上系统性能、功耗和面积具有重要影响,是众核结构设计中的重要环节,也是业界的研究热点。由于众核芯片发展历史和片上微体系结构设计技术的不同,以及所面向的应用领域需求不同等原因,目前的HPC主流众核片上存储层次结构并不单一,但从横向比较和各处理器自身纵向发展趋势,以及从HPC与数据科学、机器学习不断融合发展带来的应用需求变化来看,SPM+Cache的混合结构最可能成为今后HPC E级超算系统众核处理器片上存储层次设计的主流选择。在面向E级计算的软件和算法层面,开展针对众核存储层次特点的设计与优化,可以帮助HPC应用更好地发挥众核处理器的计算优势,从而有效提升实际应用性能,因此面向众核片上存储层次特点的软件及算法设计与优化技术也是业界的研究热点之一。首先按照不同的组织方式将片上存储层次分为多级Cache结构、SPM结构和SPM+Cache混合结构,并总结分析3种结构的优缺点。然后分析国际主流GPU、同构众核、国产众核等面向主流E级超算系统的众核处理器片上存储层次设计现状与发展趋势。最后从众核LLC管理与缓存一致性协议、SPM空间管理与数据移动优化、SPM+Cache混合结构的全局视角优化等角度综述国际上的存储层次设计与优化相关软硬件技术的研究现状。在此基础上,从软硬件及算法设计等不同角度展望了片上存储层次的未来研究方向。

关键词: E级超算, 众核处理器, 存储层次, 高性能计算, 便签式存储器, 末级缓存

Abstract:

Manycore has become the mainstream processor architecture for building HPC supercomputer systems, providing powerful computing power for High Performance Computing(HPC) exascale supercomputers. With the increasing number of cores integrated on manycore processor chips, the competition for large-scale cores for memory resources has become more intense. Manycore on-chip memory hierarchy is an important structure that alleviates the "memory wall" problem, aids HPC applications better play the computing advantages of manycore processors, and improves the performance of practical applications. The design has a significant impact on the performance, power consumption, and area of an on-chip system. The design of a many-call on-chip memory hierarchy has a significant impact on the performance, power consumption, and area of manycore systems. It is an important part of the structural design of manycore systems and is a research interest in the industry. Owing to the differences in the development history of manycore chips, the design technology of on-chip microarchitecture, and the different requirements of the application fields, the current HPC mainstream manycore on-chip storage hierarchy is different; however, from the perspective of horizontal comparison and the vertical development trend of each processor, as well as from the changes in application requirements brought by the continuous integration and development of HPC, data science, and machine learning, the hybrid structure of the SPM+Cache would most likely become the mainstream choice for the on-chip storage hierarchy designs of manycore processors in HPC exascale supercomputer systems in the future. For exascale computing software and algorithms, the designs and optimization based on the characteristics of the manycore memory hierarchy can aid HPC applications benefit from the computing advantages of manycore processors, thus effectively improving the performance of practical applications. Therefore, software, algorithm design, and optimization technology for the characteristics of the manycore on-chip storage hierarchy is also a research interest in the industry. This study first partitioned the on-chip memory hierarchy into multilevel Cache, SPM, and SPM+Cache hybrid structures according to different organizations, and then summarized and analyzed the advantages and disadvantages of these structures. This study analyzed the current status and development trend of the memory hierarchy designs of the chips of mainstream exascale supercomputer systems, such as the international mainstream GPU, homogeneous manycore, and domestic manycore. In summary, the research status of software and hardware technologies is related to the design and optimization of the memory hierarchy from the manycore of the manycore LLC management and cache consistency protocol, SPM management and data movement optimization, and the global perspective optimization of the SPM+cache hybrid architecture. Thus, this study looks forward to the future research direction of on-chip memory hierarchy based on different perspectives, such as hardware, software, and algorithm designs.

Key words: Exascale supercomputer, manycore processor, storage hierarchy, High Performance Computing(HPC), Scratchpad Memory(SPM), Last Level Cache(LLC)