Computer Engineering

GPU Parallelization of the GAMG Algorithm in OpenFOAM

  • Published: 2025-12-04

Abstract: In computational fluid dynamics (CFD), the algebraic multigrid (AMG) algorithm can substantially shorten solution time. OpenFOAM, currently the most widely used open-source CFD software, employs the geometric agglomerated algebraic multigrid (GAMG) algorithm based on the lower-diagonal-upper (LDU) matrix format to accelerate flow-field solutions on CPUs. In recent years, heterogeneous CPU+GPU parallel computing systems have developed rapidly, and domestic GPGPUs have achieved breakthroughs that make domestic substitution possible. GPU-accelerated algorithms for CFD have been studied extensively on such systems. A heterogeneous parallel design of OpenFOAM's GAMG algorithm on domestic platforms can make full use of domestic computing power and greatly improve the efficiency of flow-field simulation. Targeting a heterogeneous platform that combines CPUs with domestic GPGPU accelerator cards, this work designs and implements a parallel acceleration method for GAMG on the LDU matrix format. The method exploits the multithreaded parallelism of the GPU and parallelizes all GAMG algorithm components on the GPU. Benchmark tests on the 3D lid-driven cavity flow case and the motorBike external-flow case are used to verify correctness and measure performance of the heterogeneous GAMG implementation at different problem sizes. The experiments show that the proposed algorithm matches the computational accuracy of the original version. Heterogeneous GAMG configured with a Jacobi smoother achieves a 10–27× speedup over serial CPU execution configured with a Gauss-Seidel smoother. Performance analysis shows that the restriction and smoothing operators, which account for a large share of the runtime, become significantly faster. These results validate the effectiveness and computational potential of the parallel GAMG solver framework on domestic heterogeneous platforms, and they provide a feasible path and a technical foundation for heterogeneous parallelization and engineering application of CFD solvers on domestic GPGPU platforms.
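To illustrate why a Jacobi smoother suits a many-threaded device, the following is a minimal CPU-side sketch of one Jacobi sweep over a matrix stored in LDU form (per-cell diagonal plus per-face lower and upper coefficients with owner and neighbour addressing). The `LduMatrix` struct and the `jacobiSweep` function are hypothetical names introduced here for illustration; they are not OpenFOAM's API and not the implementation described in the abstract. Every cell update reads only the previous iterate, so the final loop has no dependence between cells, whereas a Gauss-Seidel sweep consumes values updated earlier in the same sweep.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal LDU container (illustrative names, not OpenFOAM's API).
struct LduMatrix {
    std::vector<double> diag;            // one diagonal coefficient per cell
    std::vector<double> lower;           // per face: entry in row upperAddr, column lowerAddr
    std::vector<double> upper;           // per face: entry in row lowerAddr, column upperAddr
    std::vector<std::size_t> lowerAddr;  // owner cell of each face
    std::vector<std::size_t> upperAddr;  // neighbour cell of each face
};

// One Jacobi sweep: xNew_i = (b_i - sum_{j != i} a_ij * x_j) / a_ii.
std::vector<double> jacobiSweep(const LduMatrix& A,
                                const std::vector<double>& b,
                                const std::vector<double>& x) {
    const std::size_t nCells = A.diag.size();
    const std::size_t nFaces = A.lowerAddr.size();
    assert(A.upperAddr.size() == nFaces);
    assert(A.lower.size() == nFaces && A.upper.size() == nFaces);
    assert(b.size() == nCells && x.size() == nCells);

    // Accumulate off-diagonal contributions face by face.
    // This loop scatters into offDiag, so a GPU version would need
    // atomics or a reordering; that choice is an assumption left open here.
    std::vector<double> offDiag(nCells, 0.0);
    for (std::size_t f = 0; f < nFaces; ++f) {
        offDiag[A.upperAddr[f]] += A.lower[f] * x[A.lowerAddr[f]];
        offDiag[A.lowerAddr[f]] += A.upper[f] * x[A.upperAddr[f]];
    }

    // Each cell is updated independently from the old iterate x,
    // which is what maps naturally to one thread per cell.
    std::vector<double> xNew(nCells);
    for (std::size_t i = 0; i < nCells; ++i) {
        xNew[i] = (b[i] - offDiag[i]) / A.diag[i];
    }
    return xNew;
}
```

As a usage example, a three-cell 1D Laplacian (diagonal 2, off-diagonals -1, faces 0-1 and 1-2) with right-hand side (1, 0, 1) has the exact solution (1, 1, 1), and repeated calls to `jacobiSweep` starting from zero converge toward it.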