Computer Engineering (计算机工程) ›› 2024, Vol. 50 ›› Issue (2): 232-238. doi: 10.19678/j.issn.1000-3428.0067084

• 体系结构与软件技术 (Architecture and Software Technology) •

基于GPU的LBM迁移模块算法优化

黄斌1, 柳安军2, 潘景山1,*, 田敏1, 张煜3, 朱光慧1

  1. 齐鲁工业大学(山东省科学院)山东省计算中心(国家超级计算济南中心), 山东 济南 251013
    2. 济南超级计算技术研究院高性能计算实验室, 山东 济南 251013
    3. 哈尔滨工业大学能源科学与工程学院, 黑龙江 哈尔滨 150001
  • 收稿日期:2023-03-01 出版日期:2024-02-15 发布日期:2023-04-28
  • 通讯作者: 潘景山
  • 基金资助:
    国家自然科学基金(62002186); 山东省重点研发计划项目(2021RZB01002)

GPU-based Algorithm Optimization for Streaming Module of Lattice Boltzmann Method

Bin HUANG1, Anjun LIU2, Jingshan PAN1,*, Min TIAN1, Yu ZHANG3, Guanghui ZHU1

  1. Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 251013, Shandong, China
    2. High Performance Computing Laboratory, Jinan Institute of Supercomputer Technology, Jinan 251013, Shandong, China
    3. School of Energy Science and Engineering, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
  • Received: 2023-03-01 Published: 2024-02-15 Online: 2023-04-28
  • Contact: Jingshan PAN
  • Supported by:
    National Natural Science Foundation of China (62002186); Key Research and Development Program of Shandong Province (2021RZB01002)

摘要:

格子玻尔兹曼方法(LBM)是一种基于介观模拟尺度的计算流体力学方法,其在计算时设置大量的离散格点,具有适合并行的特性。图形处理器(GPU)中有大量的算术逻辑单元,适合大规模的并行计算。基于GPU设计LBM的并行算法,能够提高计算效率。但是LBM算法迁移模块中每个格点的计算都需要与其他格点进行通信,存在较强的数据依赖。提出一种基于GPU的LBM迁移模块算法优化策略。首先分析迁移部分的实现逻辑,通过模型降维,将三维模型按照速度分量离散为多个二维模型,降低模型的复杂度;然后分析迁移模块计算前后格点中的数据差异,通过数据定位找到迁移模块的通信规律,并对格点之间的数据交换方式进行分类;最后使用分类的交换方式对离散的二维模型进行区域划分,设计新的数据通信方式,由此消除数据依赖的影响,将迁移模块完全并行化。对并行算法进行测试,结果显示:该算法在1.3×10⁸规模网格下能达到1.92的加速比,表明算法具有良好的并行效果;同时对比未将迁移模块并行化的算法,所提优化策略能提升算法30%的并行计算效率。

关键词: 高性能计算, 格子玻尔兹曼方法, 图形处理器, 并行优化, 数据重排

Abstract:

The Lattice Boltzmann Method (LBM) is a Computational Fluid Dynamics (CFD) method that operates at the mesoscopic scale. It computes on a large number of discrete lattice points, which makes it well suited to parallel execution. A Graphics Processing Unit (GPU) contains a large number of arithmetic logic units and is well suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the streaming module of the LBM algorithm, the computation at each lattice point requires communication with other lattice points, which creates strong data dependence. This study proposes a GPU-based optimization strategy for the LBM streaming module. First, the implementation logic of the streaming step is analyzed and, through model dimension reduction, the three-dimensional model is decomposed by velocity component into several two-dimensional models, which reduces the complexity of the model. Second, the differences in lattice-point data before and after the streaming computation are analyzed, the communication pattern of the streaming module is identified through data positioning, and the data exchange modes between lattice points are classified. Finally, the classified exchange modes are used to partition the two-dimensional models into regions, and a new data communication scheme is designed, which eliminates the effect of data dependence and fully parallelizes the streaming module. In tests, the parallel algorithm achieves a speedup of 1.92 on a grid of 1.3×10⁸ lattice points, which indicates good parallel performance. Compared with an algorithm whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%.

Key words: High Performance Computing(HPC), Lattice Boltzmann Method(LBM), Graphics Processing Unit(GPU), parallel optimization, data rearrangement
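
For readers unfamiliar with the streaming step, the following minimal sketch illustrates the data dependence the abstract refers to. It is not the authors' algorithm: it assumes a generic two-dimensional D2Q9 lattice with periodic boundaries and uses the common "pull" scheme with a separate output buffer, so that each output entry reads exactly one input entry and the per-node work becomes independent. The names `VELOCITIES`, `stream_pull`, and `make_field` are hypothetical and chosen only for this illustration.

```python
# D2Q9 discrete velocities (cx, cy). This lattice is an assumption made for
# illustration; the abstract does not state which velocity set is used.
VELOCITIES = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)]


def stream_pull(f_old, nx, ny):
    """Return new distributions after one periodic streaming step.

    f_old[i][x][y] is the population moving along VELOCITIES[i] at node (x, y).
    Every output entry reads one input entry and writes only its own slot,
    so all (i, x, y) iterations are independent and could run in parallel.
    """
    f_new = [[[0.0] * ny for _ in range(nx)] for _ in VELOCITIES]
    for i, (cx, cy) in enumerate(VELOCITIES):
        for x in range(nx):
            for y in range(ny):
                # Pull from the upstream neighbour, wrapping at the boundary.
                f_new[i][x][y] = f_old[i][(x - cx) % nx][(y - cy) % ny]
    return f_new


def make_field(nx, ny):
    """Give every population a unique value so its movement can be traced."""
    return [[[float(i * 10000 + x * 100 + y) for y in range(ny)]
             for x in range(nx)] for i in range(len(VELOCITIES))]
```

Updating a single array in place would instead overwrite values that neighbouring nodes still need, which is the kind of dependence a parallel streaming scheme has to remove. The separate output buffer used here trades extra memory for that independence; it is one common approach and is shown only as a point of comparison.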