Computer Engineering ›› 2019, Vol. 45 ›› Issue (6): 45-51. doi: 10.19678/j.issn.1000-3428.0051226

• Advanced Computing and Data Processing •

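Many-Core Optimization Techniques for Unstructured Grids on Sunway TaihuLight

NI Hong, LIU Xin

  1. National Research Centre of Parallel Computer Engineering and Technology, Beijing 100190, China
  • Received: 2018-04-16 Publication date: 2019-06-15 Release date: 2019-06-15
  • About the authors: NI Hong (born 1989), male, engineer, M.S.; main research interests: parallel algorithms and applications. LIU Xin, associate research fellow, Ph.D.
  • Funding:

    National Key Research and Development Program of China, "Development of a Coupling Platform for Large-Scale, Multi-Model, Multi-Process Earth System Models" (2016YFA0602200).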

Multi-Core optimization technology of unstructured grid based on Sunway TaihuLight

NI Hong, LIU Xin

  1. National Research Centre of Parallel Computer Engineering and Technology, Beijing 100190, China
  • Received: 2018-04-16 Online: 2019-06-15 Published: 2019-06-15

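Abstract:

To address the discrete memory access problem of unstructured grids in high-performance computing, this paper takes the domestically developed Sunway TaihuLight supercomputer as its platform and, based on the architectural characteristics of the heterogeneous many-core processor SW26010, proposes a general many-core optimization algorithm built on a sorting approach to reduce random memory accesses in unstructured-grid computation. Following the principle of mesh partitioning, the non-zero elements of the generated sparse matrix are reordered in parallel in O(n) time. An internal mapping scheme is used to extend or transform the computation vectors, converting fine-grained memory accesses into coarse-grained accesses free of write conflicts. Many-core optimization is applied to the flux computation of several real application cases. The results show that, compared with the serial algorithm on the main core, the proposed algorithm achieves an average speedup of more than 10 times.

Keywords: discrete memory access, unstructured grid, flux computation, heterogeneous many-core optimization, parallel sorting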

Abstract:

To solve the discrete memory access problem of unstructured grids in high-performance computing, this paper proposes a general many-core optimization algorithm tailored to the architecture of the heterogeneous many-core processor SW26010. The algorithm targets the Chinese supercomputer Sunway TaihuLight and is based on a sorting approach that reduces random memory accesses in unstructured-grid computation. Following the principle of mesh partitioning, the non-zero elements of the generated sparse matrix are reordered in parallel in O(n) time. An internal mapping method extends or transforms the computation vectors, so that fine-grained memory accesses become coarse-grained accesses without write conflicts. Many-core optimization is applied to the flux calculation in several practical cases. Experimental results show that, compared with the serial algorithm on the main core, the proposed algorithm achieves an average speedup of more than 10 times.

Key words: discrete memory access, unstructured grid, flux calculation, heterogeneous many-core optimization, parallel sorting
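
The abstract does not give the details of the reordering step, so the following is only a minimal sketch of one way non-zeros can be regrouped in linear time: a generic counting (bucket) sort of coordinate-format entries by row block. It is not the authors' implementation. The function name `reorder_by_row_block`, the `block_size` parameter, and the choice of the row index as the bucket key are all assumptions made for illustration.

```python
from typing import List, Tuple


def reorder_by_row_block(
    rows: List[int],
    cols: List[int],
    vals: List[float],
    n_rows: int,
    block_size: int,
) -> Tuple[List[int], List[int], List[float], List[int]]:
    """Hypothetical helper: stable counting sort of COO non-zeros by row block.

    Runs in O(nnz + n_blocks) time. Returns the reordered (rows, cols, vals)
    and an offsets list in which block b occupies [offsets[b], offsets[b + 1]).
    """
    n_blocks = (n_rows + block_size - 1) // block_size

    # Pass 1: count how many non-zeros fall into each row block.
    offsets = [0] * (n_blocks + 1)
    for r in rows:
        offsets[r // block_size + 1] += 1

    # Prefix sum: turn the counts into the start offset of each block.
    for b in range(n_blocks):
        offsets[b + 1] += offsets[b]

    # Pass 2: scatter every entry into the next free slot of its block.
    cursor = offsets[:-1]  # slicing copies, so offsets stays intact
    nnz = len(rows)
    out_rows = [0] * nnz
    out_cols = [0] * nnz
    out_vals = [0.0] * nnz
    for r, c, v in zip(rows, cols, vals):
        b = r // block_size
        p = cursor[b]
        out_rows[p], out_cols[p], out_vals[p] = r, c, v
        cursor[b] = p + 1

    return out_rows, out_cols, out_vals, offsets
```

After such a reordering, the entries of each block are contiguous in memory and touch a disjoint range of rows, so in this sketch a worker that owns one block could read it in one coarse-grained transfer and accumulate its results without conflicting with other workers' writes.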

CLC number: