Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines

doi:10.19678/j.issn.1000-3428.0067530

Abstract:

For large-scale scientific data produced by numerical simulations, volume rendering methods must perform high-density ray sampling to capture complex physical features, which incurs heavy computational cost and a large growth in data volume. On supercomputers built with domestic autonomous CPUs, a single processor core has less computing power than a commercial CPU core, so more cores must share the volume rendering workload; this creates a scalability bottleneck in the parallel communication of sampling data. Making full use of domestic autonomous-CPU supercomputers to complete volume rendering tasks efficiently is therefore an urgent problem. To address it, this paper proposes a performance optimization technique for large-scale parallel volume rendering based on multiple rendering pipelines, which reduces the parallel scale of each rendering pipeline through two-level parallelism: at the pipeline level and at the process level. In the optimized large-scale parallel volume rendering, the target image is first divided into multiple sub-regions, and the rendering processes are grouped accordingly. Each process group then executes its own rendering pipeline independently to render the corresponding image sub-region. Finally, all sub-regions are collected to form the complete image, which is then output. Experiments show that the optimized volume rendering algorithm scales to approximately 10 000 processor cores on domestic autonomous-CPU supercomputers and completes volume rendering tasks effectively.
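The grouping and two-level division described in the abstract can be sketched in a few lines. This is a minimal single-process simulation under stated assumptions, not the authors' implementation: the image is cut into horizontal strips (one per process group, i.e. per pipeline), each strip is cut again into row bands (one per process in that group, following the "two-level division of images between and inside pipelines" named in the Fig.5 caption), ranks are assigned to groups contiguously, and the process count is a multiple of the group count. The names `split_range`, `two_level_partition`, `render_band` and `gather_image` are hypothetical, and `render_band` only fills pixels with its group id instead of ray casting.

```python
"""Two-level (pipeline-level, then process-level) image partition sketch.

Hypothetical layout: horizontal strips per process group, row bands per
process inside a group, contiguous rank-to-group assignment.
"""
from typing import Dict, List, Tuple

# Half-open row range [row_start, row_end) of the image.
Rows = Tuple[int, int]


def split_range(start: int, end: int, parts: int) -> List[Rows]:
    """Split [start, end) into `parts` nearly equal contiguous ranges."""
    base, extra = divmod(end - start, parts)
    ranges, lo = [], start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def two_level_partition(height: int, nprocs: int,
                        ngroups: int) -> Dict[int, Tuple[int, Rows, Rows]]:
    """Map each rank to (group id, its group's strip, its own band)."""
    if nprocs % ngroups != 0:
        raise ValueError("this sketch assumes nprocs is a multiple of ngroups")
    per_group = nprocs // ngroups
    strips = split_range(0, height, ngroups)       # level 1: between pipelines
    layout = {}
    for rank in range(nprocs):
        g = rank // per_group                       # contiguous rank grouping
        local = rank % per_group                    # rank index inside its pipeline
        bands = split_range(*strips[g], per_group)  # level 2: inside a pipeline
        layout[rank] = (g, strips[g], bands[local])
    return layout


def render_band(rows: Rows, width: int, group: int) -> List[List[int]]:
    """Stand-in for one process's rendering work: fills pixels with its group id."""
    return [[group] * width for _ in range(rows[0], rows[1])]


def gather_image(layout: Dict[int, Tuple[int, Rows, Rows]],
                 width: int, height: int) -> List[List[int]]:
    """Final step: collect every sub-region into one complete image."""
    image = [[-1] * width for _ in range(height)]
    for _rank, (g, _strip, band) in layout.items():
        for r, row in zip(range(*band), render_band(band, width, g)):
            image[r] = row
    return image


if __name__ == "__main__":
    W, H, P, G = 8, 10, 6, 2
    layout = two_level_partition(H, P, G)
    for rank, (g, strip, band) in layout.items():
        print(f"rank {rank}: pipeline {g}, strip {strip}, band {band}")
    img = gather_image(layout, W, H)
    print([row[0] for row in img])
```

The point of the grouping is that any exchange of sampling data stays inside one pipeline, which now involves only `nprocs // ngroups` processes instead of all of them. In an MPI code the groups would typically be formed with `MPI_Comm_split` (`comm.Split(color, key)` in mpi4py) using the group index as `color`; the source does not state which mechanism the authors use.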

Key words: volume rendering, multiple pipelines, two-level parallelism, parallel scalability, performance optimization

王华维, 刘若妍, 艾志玮, 曹轶. 基于多绘制管线的大规模并行体绘制性能优化技术[J]. 计算机工程, 2024, 50(8): 207-215.

Huawei WANG, Ruoyan LIU, Zhiwei AI, Yi CAO. Performance Optimization Technique for Large-Scale Parallel Volume Rendering Based on Multiple Rendering Pipelines[J]. Computer Engineering, 2024, 50(8): 207-215.

https://www.ecice06.com/CN/Y2024/V50/I8/207

Figures and tables (11)

Fig.1 The principle of ray-casting volume rendering

Fig.2 Schematic diagram of the pipeline of parallel ray-casting volume rendering

Fig.3 The variation of volume rendering time with the number of processor cores

Fig.4 Schematic diagram of the optimization technique based on multiple rendering pipelines for large-scale parallel volume rendering algorithm

Fig.5 Schematic diagram of two-level division of images between and inside pipelines

Fig.6 Volume rendering effect of the resulting data from a numerical simulation of laser plasma interaction

Fig.7 The variation of rendering time and communication time with the number of processor cores after optimizing the volume rendering algorithm

Fig.8 Comparison of communication time before and after optimization of volume rendering algorithm

Fig.9 Comparison of data reading time before and after optimization of volume rendering algorithm

Fig.10 Volume rendering effect of the resulting data from the numerical simulation of laser propagation and filamentation in plasma

Fig.11 The variation of rendering time and communication time with the number of processor cores and process groups after optimizing the volume rendering algorithm
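Figs. 1 and 2 concern ray-casting volume rendering and its parallel pipeline, whose sampling data are what the abstract identifies as the communication bottleneck. As background only, the textbook front-to-back compositing along a single ray can be sketched as below, together with the `over` operator that merges premultiplied partial results of consecutive ray segments in depth order; this is what allows segments sampled by different processes to be combined after communication. The function names, the RGBA tuple layout and the 0.99 early-termination threshold are illustrative assumptions, not details taken from the paper.

```python
"""Front-to-back compositing along one ray, and merging of partial results.

Textbook formulation; each sample is a non-premultiplied (r, g, b) color with
a per-sample opacity a, assumed to come from a transfer function.
"""
from typing import Iterable, Tuple

RGBA = Tuple[float, float, float, float]


def composite_front_to_back(samples: Iterable[RGBA],
                            early_stop: float = 0.99) -> RGBA:
    """Accumulate samples ordered from the eye outward.

    Returns the premultiplied accumulated color and accumulated opacity.
    Stops once opacity reaches `early_stop` (illustrative threshold).
    """
    cr = cg = cb = ca = 0.0
    for r, g, b, a in samples:
        w = (1.0 - ca) * a      # share of this sample not hidden by earlier ones
        cr += w * r
        cg += w * g
        cb += w * b
        ca += w
        if ca >= early_stop:    # early ray termination
            break
    return (cr, cg, cb, ca)


def over(front: RGBA, back: RGBA) -> RGBA:
    """Merge two premultiplied partial results; `front` is the segment nearer the eye."""
    t = 1.0 - front[3]
    return (front[0] + t * back[0], front[1] + t * back[1],
            front[2] + t * back[2], front[3] + t * back[3])


if __name__ == "__main__":
    ray = [(1.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 0.5), (0.0, 1.0, 0.0, 0.5)]
    whole = composite_front_to_back(ray, early_stop=1.1)  # no early stop
    # Two "processes" composite consecutive segments, then merge in depth order.
    merged = over(composite_front_to_back(ray[:1], early_stop=1.1),
                  composite_front_to_back(ray[1:], early_stop=1.1))
    print(whole, merged)
```

With early termination disabled, compositing the whole ray and merging the two segment results with `over` yield the same pixel, because `over` is associative. Early termination saves samples along rays that have already become nearly opaque.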
