GPU-Accelerated Algorithm Optimization for Molecular Dynamics Simulation of Crystalline Silicon

Computer Engineering, 2023, Vol. 49, Issue (4): 166-173. doi: 10.19678/j.issn.1000-3428.0064457

LIN Lin¹, ZHU Aiqi², ZHAO Mingcan², ZHANG Shuai², YE Yanhao², XU Ji², HAN Lin³, ZHAO Rongcai³, HOU Chaofeng²

1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China;
2. Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China;
3. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China

Received: 2022-04-13 Revised: 2022-06-02 Published: 2022-08-22
About the authors: LIN Lin (born 1997), female, master's student, whose main research interest is molecular dynamics simulation; ZHU Aiqi, ZHAO Mingcan, ZHANG Shuai, and YE Yanhao, Ph.D. candidates; XU Ji, associate research fellow, Ph.D.; HAN Lin, associate professor, Ph.D.; ZHAO Rongcai, professor, Ph.D.; HOU Chaofeng (corresponding author), associate research fellow, Ph.D.
Funding: National Natural Science Foundation of China (21776280, 22073103); Beijing Natural Science Foundation (JQ21034); Major Science and Technology Project of Henan Province (201400211300).

Abstract: Molecular dynamics (MD) simulation is one of the main methods for studying the thermodynamic properties of silicon nano-films; however, large data volumes, high computational intensity, and complex interatomic interaction models limit its wider application. In crystalline silicon MD simulation algorithms, non-contiguous data access and a large number of branch tests waste parallel resources and leave threads waiting. To address these problems, this study designs a crystalline silicon MD simulation algorithm around the hardware architecture of the Nvidia Tesla V100 Graphics Processing Unit (GPU). Optimizations including coalesced global memory access, loop unrolling, and atomic operations exploit the GPU's parallel and floating-point computing capabilities to reduce device memory accesses, branch conflicts, and conditional instructions during execution, improving the overall computing performance of the algorithm. Test results show that the optimized crystalline silicon MD simulation algorithm runs 1.69 to 1.97 times faster than the unoptimized version, and 3.20 to 3.47 times and 17.40 to 38.04 times faster than the mainstream GPU-accelerated MD simulation packages HOOMD-blue and LAMMPS, respectively.

Key words: molecular dynamics (MD), Graphics Processing Unit (GPU), coalesced access, loop unrolling, atomic operation, performance optimization
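
The abstract names coalesced global memory access as one of the optimizations, but this page does not show the data layout the authors used. As a rough illustration only, the Python sketch below counts how many 32-byte memory sectors a single 32-thread warp touches when each thread loads one 4-byte field of its own atom, first for an array-of-structures layout and then for a structure-of-arrays layout. The sector size, the 8-float atom record, the atom count, and all function names are assumptions made for this example, not details taken from the paper.

```python
WARP_SIZE = 32      # threads per warp on NVIDIA GPUs
SECTOR_BYTES = 32   # assumed memory sector granularity
FLOAT_BYTES = 4     # single-precision field


def sectors_touched(addresses):
    """Count the distinct sectors covered by one warp-wide load."""
    return len({addr // SECTOR_BYTES for addr in addresses})


def aos_addresses(field, fields_per_atom):
    """Byte addresses of one field for atoms 0..31, array-of-structures.

    Each atom is a contiguous record, so neighbouring threads are
    separated by a stride of fields_per_atom floats.
    """
    return [(atom * fields_per_atom + field) * FLOAT_BYTES
            for atom in range(WARP_SIZE)]


def soa_addresses(field, n_atoms):
    """Byte addresses of one field for atoms 0..31, structure-of-arrays.

    Each field is its own contiguous array, so neighbouring threads
    read neighbouring floats.
    """
    return [(field * n_atoms + atom) * FLOAT_BYTES
            for atom in range(WARP_SIZE)]


# Hypothetical record of 8 floats per atom and 1024 atoms in total.
aos_sectors = sectors_touched(aos_addresses(field=0, fields_per_atom=8))
soa_sectors = sectors_touched(soa_addresses(field=0, n_atoms=1024))
print(aos_sectors, soa_sectors)
```

Under these assumptions the strided layout touches 32 sectors per warp-wide load while the contiguous layout touches 4, which is the kind of reduction in memory traffic that coalescing aims for. Real hardware behaviour also depends on caching and alignment, which this model ignores.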

CLC number: TP391.9

LIN Lin, ZHU Aiqi, ZHAO Mingcan, ZHANG Shuai, YE Yanhao, XU Ji, HAN Lin, ZHAO Rongcai, HOU Chaofeng. GPU-Accelerated Algorithm Optimization for Molecular Dynamics Simulation of Crystalline Silicon[J]. Computer Engineering, 2023, 49(4): 166-173.

https://www.ecice06.com/CN/Y2023/V49/I4/166

Figures/Tables: 11

