GPU-based Algorithm Optimization for Streaming Module of Lattice Boltzmann Method

doi:10.19678/j.issn.1000-3428.0067084

Abstract

The Lattice Boltzmann Method (LBM) is a Computational Fluid Dynamics (CFD) method that operates at the mesoscopic scale. It performs its computation on a large number of discrete lattice points, which makes it well suited to parallel execution. A Graphics Processing Unit (GPU) contains a large number of arithmetic logic units and is therefore well suited to large-scale parallel computing, so a GPU-based parallel LBM algorithm can improve computational efficiency. However, in the streaming module of the LBM algorithm, the calculation at each lattice point requires communication with other lattice points, which creates strong data dependence. This study proposes a GPU-based optimization strategy for the LBM streaming module. First, the implementation logic of the streaming step is analyzed, and the three-dimensional model is decomposed by velocity component into several two-dimensional models through model dimension reduction, which reduces the complexity of the model. Second, the differences in lattice point data before and after the streaming calculation are analyzed, the communication pattern of the streaming module is identified through data positioning, and the data exchange modes between lattice points are classified. Finally, the two-dimensional models are partitioned into regions according to the classified exchange modes and a new data communication scheme is designed, which eliminates the effect of data dependence and makes the streaming module fully parallel. In tests, the parallel algorithm achieves a speedup of 1.92 at a grid scale of 1.3×10⁸, indicating good parallel performance. Compared with an algorithm whose streaming module is not parallelized, the proposed strategy improves parallel computing efficiency by 30%.

Key words: High Performance Computing (HPC), Lattice Boltzmann Method (LBM), Graphics Processing Unit (GPU), parallel optimization, data rearrangement
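
The data dependence in the streaming step, and one known way to remove it, can be illustrated with a small sketch. The code below is not the algorithm proposed in this paper, whose region partitioning is not detailed in the abstract. It is a minimal pure-Python illustration of the classic swap-based in-place streaming scheme, under assumptions made here for brevity: a two-dimensional D2Q9 lattice (not the D3Q19 model of Fig.2), periodic boundaries, and hypothetical function names (`stream_two_buffers`, `stream_swap_in_place`, `make_grid`). Each swap touches two storage slots that no other swap touches, so on a GPU every swap could in principle be assigned to its own thread.

```python
# D2Q9 velocities, ordered so that C[i + HALF] == -C[i] for i in 1..HALF.
# (Assumed ordering for this sketch; index 0 is the rest velocity.)
C = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 1),
     (-1, 0), (0, -1), (-1, -1), (1, -1)]
HALF = 4


def make_grid(nx, ny):
    """Build an nx-by-ny grid whose 9 populations per cell are all distinct."""
    return [[[float(100 * x + 10 * y + i) for i in range(9)]
             for y in range(ny)] for x in range(nx)]


def stream_two_buffers(f, nx, ny):
    """Reference streaming: read from f, write into a second buffer."""
    out = [[[0.0] * 9 for _ in range(ny)] for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            for i, (cx, cy) in enumerate(C):
                # Population i moves from (x, y) to its neighbor along C[i].
                out[(x + cx) % nx][(y + cy) % ny][i] = f[x][y][i]
    return out


def stream_swap_in_place(f, nx, ny):
    """In-place streaming using only swaps (no second buffer)."""
    # Pass 1: inside every cell, swap each population with its opposite.
    for x in range(nx):
        for y in range(ny):
            cell = f[x][y]
            for i in range(1, HALF + 1):
                cell[i], cell[i + HALF] = cell[i + HALF], cell[i]
    # Pass 2: swap slot i + HALF of this cell with slot i of the neighbor
    # along C[i]. Every slot takes part in exactly one swap, so the swaps
    # are mutually independent and their order does not matter.
    for x in range(nx):
        for y in range(ny):
            here = f[x][y]
            for i in range(1, HALF + 1):
                cx, cy = C[i]
                there = f[(x + cx) % nx][(y + cy) % ny]
                here[i + HALF], there[i] = there[i], here[i + HALF]
    return f


nx, ny = 4, 3
expected = stream_two_buffers(make_grid(nx, ny), nx, ny)
actual = stream_swap_in_place(make_grid(nx, ny), nx, ny)
print(actual == expected)
```

The two-buffer version avoids the dependence by never overwriting data it still has to read, at the cost of doubling memory. The swap version keeps a single buffer, which matters at large grid scales where GPU memory is the limiting resource.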

Bin HUANG, Anjun LIU, Jingshan PAN, Min TIAN, Yu ZHANG, Guanghui ZHU. GPU-based Algorithm Optimization for Streaming Module of Lattice Boltzmann Method[J]. Computer Engineering, 2024, 50(2): 232-238.

http://www.ecice06.com/CN/Y2024/V50/I2/232

Figures/Tables 15

Fig.1 Implementation process of LBM

Fig.2 D3Q19 velocity component model

Fig.3 Operation rule of the swap function

Fig.4 Schematic diagram of converting a three-dimensional model into two-dimensional models

Fig.5 Arrangement of exchange rays in the grid of a two-dimensional model

Fig.6 Data exchange mode when the number of lattice points in the same direction is greater than or equal to 3

Fig.7 Data exchange mode when the number of lattice points in the same direction is 1 or 2

Fig.8 Comparison of the execution time of each part of an iteration in the serial and parallel algorithms

Fig.9 Efficiency comparison of three algorithms at different grid scales

Fig.10 Comparison of speedup at different grid scales
