
Computer Engineering, 2024, Vol. 50, Issue (8): 207-215. doi: 10.19678/j.issn.1000-3428.0067530

• Graphics and Image Processing •

Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines

Huawei WANG1,2,*, Ruoyan LIU1, Zhiwei AI1,2, Yi CAO1,2

  1. Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Beijing 100088, China
  2. CAEP Software Center for High Performance Numerical Simulation, Beijing 100088, China
  • Received: 2023-05-04 Online: 2024-08-15 Published: 2023-12-29
  • Contact: Huawei WANG
  • Funding: National Key Research and Development Program of China (2017YFB0202203)

Abstract:

Volume rendering of the large-scale scientific data produced by numerical simulations relies on high-density ray sampling to capture complex physical features, which incurs substantial computational overhead and a large increase in data volume. On supercomputers built around domestically developed CPUs, a single processor core delivers less computing power than a commercial CPU core, so more cores must be used to share a volume rendering task; this in turn creates a scalability bottleneck in the parallel communication of sampling data. To make full use of these supercomputers for efficient volume rendering, this paper proposes a performance optimization technique for large-scale parallel volume rendering based on multiple rendering pipelines. The technique reduces the parallel scale of each individual pipeline through two-level parallelism: first across pipelines, then across the processes within a pipeline. The target image is divided into multiple sub-regions, and the rendering processes are grouped accordingly. Each process group independently executes one rendering pipeline to produce its sub-region of the image. Finally, all sub-regions are collected and assembled into the complete output image. Experiments show that the optimized volume rendering algorithm scales to approximately 10,000 processor cores on such supercomputers and completes volume rendering tasks effectively.

Key words: volume rendering, multiple pipelines, two-level parallelism, parallel scalability, performance optimization
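
The two-level scheme described in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the image is split into horizontal strips, the processes of a group are assumed to own contiguous depth slabs of each ray, partial results are merged with the standard front-to-back "over" operator, and `sample` is a made-up stand-in for real ray sampling. All function names (`partition`, `sample`, `over`, `rank_partial`, `run_pipeline`, `render`) are hypothetical.

```python
def partition(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sample(x, y, z):
    """Placeholder for ray sampling: premultiplied (color, alpha) of one sample."""
    alpha = 0.1 + 0.05 * ((x + 2 * y + 3 * z) % 5)
    color = ((x * y + z) % 7) / 7.0
    return (color * alpha, alpha)


def over(front, back):
    """Front-to-back 'over' compositing of premultiplied (color, alpha) pairs."""
    c_f, a_f = front
    c_b, a_b = back
    return (c_f + (1.0 - a_f) * c_b, a_f + (1.0 - a_f) * a_b)


def rank_partial(x, y, z_range):
    """Work of one simulated process: composite its own depth slab for one pixel."""
    acc = (0.0, 0.0)
    for z in range(*z_range):
        acc = over(acc, sample(x, y, z))
    return acc


def run_pipeline(group_size, rows, width, depth):
    """One rendering pipeline: a process group renders one strip of the image."""
    slabs = partition(depth, group_size)  # one depth slab per process in the group
    strip = []
    for y in range(*rows):
        row = []
        for x in range(width):
            pixel = (0.0, 0.0)
            for slab in slabs:  # stands in for intra-group communication
                pixel = over(pixel, rank_partial(x, y, slab))
            row.append(pixel)
        strip.append(row)
    return strip


def render(width, height, depth, num_procs, num_pipelines):
    """Level 1: split image and processes into pipelines. Level 2: run each group."""
    if not 1 <= num_pipelines <= num_procs:
        raise ValueError("need 1 <= num_pipelines <= num_procs")
    strips = partition(height, num_pipelines)
    groups = partition(num_procs, num_pipelines)
    image = []
    for rows, (g0, g1) in zip(strips, groups):
        image.extend(run_pipeline(g1 - g0, rows, width, depth))  # collect sub-regions
    return image
```

With 8 simulated processes, `render(6, 5, 16, 8, 4)` runs four pipelines of two processes each, whereas `render(6, 5, 16, 8, 1)` runs a single pipeline of eight. Because the over operator is associative, both produce the same image up to floating-point rounding; only the number of participants per pipeline changes, and that per-pipeline parallel scale is the quantity the abstract says the technique reduces.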