Computer Engineering ›› 2023, Vol. 49 ›› Issue (4): 166-173. doi: 10.19678/j.issn.1000-3428.0064457

• Advanced Computing and Data Processing •

GPU-Accelerated Algorithm Optimization for Molecular Dynamics Simulation of Crystalline Silicon

LIN Lin1, ZHU Aiqi2, ZHAO Mingcan2, ZHANG Shuai2, YE Yanhao2, XU Ji2, HAN Lin3, ZHAO Rongcai3, HOU Chaofeng2

    1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China;
    2. Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China;
    3. National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
  • Received: 2022-04-13 Revised: 2022-06-02 Published: 2022-08-22
  • About the authors: LIN Lin (born 1997), female, master's student, main research interest: molecular dynamics simulation; ZHU Aiqi, ZHAO Mingcan, ZHANG Shuai, and YE Yanhao, Ph.D. candidates; XU Ji, associate researcher, Ph.D.; HAN Lin, associate professor, Ph.D.; ZHAO Rongcai, professor, Ph.D.; HOU Chaofeng (corresponding author), associate researcher, Ph.D.
  • Funding:
    National Natural Science Foundation of China (21776280, 22073103); Beijing Natural Science Foundation (JQ21034); Major Science and Technology Project of Henan Province (201400211300).

Abstract: Molecular Dynamics (MD) simulation is one of the main methods for studying the thermodynamic properties of silicon nano-films. However, the large volume of data, the heavy computation, and the complex interatomic interaction models involved limit its wider application. In the crystalline silicon MD simulation algorithm, discontinuous data access and frequent branching waste parallel resources and leave threads waiting. To address these problems, this study designs a crystalline silicon MD simulation algorithm around the hardware architecture of the Nvidia Tesla V100 Graphics Processing Unit (GPU). Optimizations such as coalesced global memory access, loop unrolling, and atomic operations exploit the GPU's parallel and floating-point computing capability, reduce device memory accesses, and cut branch conflicts and conditional instructions during execution, which improves the overall computing performance of the algorithm. Test results show that the optimized crystalline silicon MD simulation algorithm runs 1.69-1.97 times faster than the unoptimized version, and 3.20-3.47 times and 17.40-38.04 times faster than the mainstream GPU-accelerated MD simulation packages HOOMD-blue and LAMMPS, respectively, demonstrating effective simulation acceleration.

Key words: Molecular Dynamics (MD), Graphics Processing Unit (GPU), coalesced access, loop unrolling, atomic operation, performance optimization
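
The abstract names three optimizations without showing them, so the following is a minimal CPU-side C++ sketch of the general ideas, not the authors' CUDA implementation. Everything in it is an assumption made for illustration: the names `Atoms`, `atomic_add`, `compute_atom`, and `compute_all` are hypothetical, the fixed count of four neighbors reflects the diamond-cubic structure of crystalline silicon, and a zero-rest-length harmonic spring stands in for the real silicon potential, which the abstract does not specify. Coordinates are stored as one contiguous array per component and the neighbor list is indexed as `nbr[k * n + i]`, the kind of layout that lets adjacent GPU threads read adjacent addresses. The neighbor loop has a compile-time trip count so it can be fully unrolled, and the total energy is accumulated with an atomic add.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption: every atom has exactly four neighbors (diamond-cubic silicon).
constexpr std::size_t kNeighbors = 4;

// Structure-of-arrays layout: one contiguous array per component.
struct Atoms {
    std::vector<float> x, y, z;     // positions
    std::vector<float> fx, fy, fz;  // forces (output)
    std::vector<std::size_t> nbr;   // nbr[k * n + i] = k-th neighbor of atom i
};

// Atomic add on a float via compare-exchange
// (std::atomic<float>::fetch_add only exists from C++20 onward).
inline void atomic_add(std::atomic<float>& target, float value) {
    float expected = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
        // On failure, 'expected' now holds the current value; retry.
    }
}

// Work for a single atom i; in a GPU version this would be one thread's body.
// Placeholder force model: F_i = k_spring * (r_j - r_i) for each neighbor j.
inline void compute_atom(Atoms& a, std::size_t i, float k_spring,
                         std::atomic<float>& energy) {
    const std::size_t n = a.x.size();
    const float xi = a.x[i], yi = a.y[i], zi = a.z[i];
    float fx = 0.0f, fy = 0.0f, fz = 0.0f, e = 0.0f;

    // Fixed trip count known at compile time: the compiler can unroll this
    // loop completely, leaving no data-dependent loop-exit branch.
    for (std::size_t k = 0; k < kNeighbors; ++k) {
        const std::size_t j = a.nbr[k * n + i];
        const float dx = a.x[j] - xi;
        const float dy = a.y[j] - yi;
        const float dz = a.z[j] - zi;
        fx += k_spring * dx;
        fy += k_spring * dy;
        fz += k_spring * dz;
        // Half of the pair energy 0.5 * k * r^2; the neighbor adds the rest.
        e += 0.25f * k_spring * (dx * dx + dy * dy + dz * dz);
    }

    a.fx[i] = fx;
    a.fy[i] = fy;
    a.fz[i] = fz;
    atomic_add(energy, e);  // one atomic update per atom
}

// Serial driver over all atoms; returns the total potential energy.
inline float compute_all(Atoms& a, float k_spring) {
    const std::size_t n = a.x.size();
    assert(a.nbr.size() == kNeighbors * n);
    assert(a.fx.size() == n && a.fy.size() == n && a.fz.size() == n);
    std::atomic<float> energy{0.0f};
    for (std::size_t i = 0; i < n; ++i) {
        compute_atom(a, i, k_spring, energy);
    }
    return energy.load();
}
```

Because `compute_atom` writes only to atom `i`'s own force slots and updates the shared energy atomically, calls for different atoms could run concurrently. In a CUDA port, the same body would typically become a kernel indexed by thread ID, with `atomic_add` replaced by the built-in `atomicAdd`.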

CLC number: