
Computer Engineering, 2025, Vol. 51, Issue (7): 263-274. doi: 10.19678/j.issn.1000-3428.0068818

• Computer Architecture and Software Technology •

  • Funding: Major Science and Technology Project of Henan Province (221100210600)

Heterogeneous Implementation of Fluid-Structure Interaction Immersed Boundary Algorithm for Deep Compute Unit

SHANG Jiandong1,*, XIONG Wei1,2, HUA Haobo1,3, SONG Zhaolu1,2, GUO Hengliang1, ZHANG Jun4

  1. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, Henan, China
  2. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, Henan, China
  3. School of Mathematics, Zhengzhou University of Aeronautics, Zhengzhou 450006, Henan, China
  4. College of Aerospace Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu, China
  • Received: 2023-11-13 Published: 2025-07-15 Online: 2024-06-06
  • Contact: SHANG Jiandong


Abstract:

The direct-forcing immersed boundary method is widely used to solve fluid-structure interaction problems because it handles complex geometries effectively, including moving and deforming solids. However, three-dimensional simulations of complex flows involve large grids and long run times, so traditional serial algorithms running on a single-core processor often cannot meet computational requirements. Research on fluid-structure interaction problems on domestic platforms is currently limited, and implementing the direct-forcing immersed boundary method on such platforms can enrich their application ecosystems. To this end, this study uses the domestic Deep Compute Unit (DCU) accelerator and CPU-DCU heterogeneous programming to design and implement a parallel program that solves fluid-structure interaction problems with the three-dimensional direct-forcing immersed boundary method. First, a serial algorithm is implemented on the CPU and profiled to identify its hotspots, which are then offloaded to the DCU accelerator. Second, building on this heterogeneous implementation, the kernel functions are optimized in light of the DCU's hardware characteristics, using techniques such as shared memory, loop tiling, and memory access reordering. Finally, the correctness and performance of the program are tested on two cases: flow around a sphere and the self-propelled swimming of a biomimetic fish. The experimental results show that at Reynolds numbers of 100 and 200, the drag coefficients of the sphere are 1.11 and 0.78, respectively, both in good agreement with the relevant literature. In the self-propelled swimming experiment with the biomimetic fish at a Reynolds number of 7,142, the average forward velocity once swimming has stabilized is 0.396, which is consistent with the relevant literature. In the flow-around-a-sphere experiment, the parallel program achieves an 83.7-fold speedup over the serial program at a grid size of 50.33 million. These two types of fluid-structure interaction numerical experiments verify the effectiveness and accuracy of the CPU-DCU parallel direct-forcing immersed boundary method on domestic heterogeneous platforms, providing a solid foundation for research on Computational Fluid Dynamics (CFD) algorithms on such platforms.

Key words: heterogeneous computing, DCU accelerator components, memory access optimization, direct-forcing immersed boundary method, fluid-structure interaction
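
The abstract names the direct-forcing immersed boundary method but does not give its formulation. As a rough illustration only, the following one-dimensional Python sketch shows the generic interpolate, force, and spread cycle commonly associated with direct forcing. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`hat_kernel`, `interpolate`, `direct_forcing_step`), the linear hat kernel, the uniform grid, and the choice of a Lagrangian marker volume equal to the grid spacing.

```python
def hat_kernel(r):
    """Linear hat function, a simple discrete delta kernel (assumed here);
    r is the distance in units of the grid spacing, support |r| < 1."""
    r = abs(r)
    return 1.0 - r if r < 1.0 else 0.0


def interpolate(u, h, x_lag):
    """Interpolate grid values u (nodes at x_i = i*h) to the marker at x_lag."""
    return sum(u_i * hat_kernel((i * h - x_lag) / h) for i, u_i in enumerate(u))


def direct_forcing_step(u_star, h, dt, markers, u_body):
    """Correct a provisional velocity u_star so that it moves toward the
    body velocity at each Lagrangian marker (hypothetical 1D sketch)."""
    force_grid = [0.0] * len(u_star)
    for x_lag, u_target in zip(markers, u_body):
        # 1. Interpolate the provisional velocity to the marker.
        u_lag = interpolate(u_star, h, x_lag)
        # 2. Direct forcing: the force that would impose u_target in one step.
        force_lag = (u_target - u_lag) / dt
        # 3. Spread the force back to the grid with the same kernel
        #    (marker volume assumed equal to h, so the 1/h factor cancels).
        for i in range(len(u_star)):
            force_grid[i] += force_lag * hat_kernel((i * h - x_lag) / h)
    # 4. Apply the forcing to obtain the corrected velocity.
    return [u_i + dt * f_i for u_i, f_i in zip(u_star, force_grid)]


h, dt = 0.1, 0.01
u_star = [1.0] * 11  # uniform provisional flow
u_new = direct_forcing_step(u_star, h, dt, markers=[0.5], u_body=[0.0])
print(interpolate(u_new, h, 0.5))  # close to 0 when the marker sits on a node
```

In this sketch a marker that coincides with a grid node reaches the target velocity in one pass, whereas a marker between nodes only moves toward it. In a three-dimensional solver the interpolation and spreading loops run over every marker and its neighbouring grid nodes, which makes them natural candidates for the kind of accelerator offloading the abstract describes.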