Heterogeneous Implementation of Fluid-Structure Interaction Immersed Boundary Algorithm for Deep Compute Unit

doi:10.19678/j.issn.1000-3428.0068818

Abstract:

The direct-forcing immersed boundary method is widely used to solve fluid-structure interaction problems because it handles complex geometries effectively, including moving and deforming solids. However, three-dimensional simulations of complex flows involve large grids and long run times, so traditional serial algorithms running on a single-core processor often cannot meet computational requirements. Research on fluid-structure interaction problems on domestic platforms is currently limited, and implementing the direct-forcing immersed boundary method on such platforms can enrich their application ecosystem. To this end, this study uses the domestic Deep Compute Unit (DCU) accelerator and CPU-DCU heterogeneous programming to design and implement a parallel program that solves fluid-structure interaction problems with the three-dimensional direct-forcing immersed boundary method. First, the serial algorithm is implemented on a CPU and profiled to identify its hotspots, which are then offloaded to the DCU accelerator. Second, building on the heterogeneous implementation and the hardware characteristics of the DCU, the kernel functions are optimized using shared memory, loop tiling, and reordered memory accesses. Finally, the correctness and performance of the program are tested on two cases: flow around a sphere and the self-propelled swimming of a biomimetic fish. The results show that at Reynolds numbers of 100 and 200 the drag coefficients of the sphere are 1.11 and 0.78, respectively, in good agreement with the relevant literature. In the self-propelled swimming experiment at a Reynolds number of 7,142, the average forward velocity after the swimming stabilizes is 0.396, which is consistent with the relevant literature. In the flow-around-a-sphere experiment at a grid scale of 50.33 million, the parallel program achieves an 83.7-fold speedup over the serial program. These two types of fluid-structure interaction numerical experiments verify the effectiveness and accuracy of the CPU-DCU parallel direct-forcing immersed boundary algorithm on domestic heterogeneous platforms, providing a solid foundation for research on Computational Fluid Dynamics (CFD) algorithms on these platforms.

Key words: heterogeneous computing, DCU accelerator components, memory access optimization, direct-forcing immersed boundary method, fluid-structure interaction
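
The direct-forcing step named in the abstract can be illustrated with a minimal one-dimensional sketch. It assumes Peskin's 4-point regularized δ-function, a single boundary marker whose width equals the grid spacing, and repeated (multi-direct) forcing passes. The function names, grid size, time step, and iteration count are hypothetical choices for illustration. The paper's program is three-dimensional and runs on the DCU, and its actual kernels are not given in this abstract.

```python
import math

def delta_kernel(r):
    """Peskin's 4-point regularized delta kernel; r is measured in grid spacings."""
    a = abs(r)
    if a <= 1.0:
        return (3.0 - 2.0 * a + math.sqrt(1.0 + 4.0 * a - 4.0 * a * a)) / 8.0
    if a <= 2.0:
        return (5.0 - 2.0 * a - math.sqrt(-7.0 + 12.0 * a - 4.0 * a * a)) / 8.0
    return 0.0

def support(x_b, h, n):
    """Grid indices and kernel weights that a marker at x_b touches (nodes at i*h)."""
    s = x_b / h
    i0 = math.floor(s)
    return [(i, delta_kernel(s - i)) for i in range(i0 - 1, i0 + 3) if 0 <= i < n]

def interpolate(u, x_b, h):
    """Interpolate the grid velocity u to the boundary marker at x_b."""
    return sum(w * u[i] for i, w in support(x_b, h, len(u)))

def multi_direct_forcing(u, x_b, u_b, h, dt, n_iter):
    """Apply n_iter forcing passes; return the corrected velocity and the grid force."""
    u = list(u)
    force = [0.0] * len(u)
    for _ in range(n_iter):
        # Direct forcing: the force that would drive the marker velocity to u_b.
        f_b = (u_b - interpolate(u, x_b, h)) / dt
        # Spread the marker force back to the grid with the same kernel
        # (assumption: marker width equals h, so the spreading weight is w).
        for i, w in support(x_b, h, len(u)):
            force[i] += f_b * w
            u[i] += dt * f_b * w
    return u, force

# Uniform flow of speed 1 past a stationary marker (target velocity 0).
u0 = [1.0] * 32
h, dt = 0.1, 0.01
x_b = 10.3 * h
u1, _ = multi_direct_forcing(u0, x_b, 0.0, h, dt, 1)
u30, _ = multi_direct_forcing(u0, x_b, 0.0, h, dt, 30)
slip_after_1 = abs(interpolate(u1, x_b, h))
slip_after_30 = abs(interpolate(u30, x_b, h))
```

For this kernel the squared weights sum to 3/8, so each pass removes 3/8 of the remaining slip velocity at the marker. A single pass therefore leaves a residual, which is why the sketch repeats the forcing step.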

SHANG Jiandong, XIONG Wei, HUA Haobo, SONG Zhaolu, GUO Hengliang, ZHANG Jun. Heterogeneous Implementation of Fluid-Structure Interaction Immersed Boundary Algorithm for Deep Compute Unit[J]. Computer Engineering, 2025, 51(7): 263-274.

https://www.ecice06.com/CN/Y2025/V51/I7/263

Figures/Tables: 17

Fig.1 Logic architecture of the DCU accelerator

Fig.2 Schematic diagram of the WENO23 stencil calculation

Fig.3 Program hotspot analysis results

Fig.4 Flow of the MDF-IBM heterogeneous program

Fig.5 Schematic diagram of the V-cycle

Fig.6 Memory access order when M=2

Fig.7 δ-function interpolation of the solid boundary velocity

Fig.8 Schematic diagram of the computational domain for flow around a sphere

Fig.9 Drag coefficient versus time

Fig.10 Triangular surface mesh of the eel

Fig.11 Forward swimming velocity of the eel

Fig.12 Vorticity near the swimming fish body and in its wake at different times


