
Computer Engineering ›› 2022, Vol. 48 ›› Issue (9): 130-138. doi: 10.19678/j.issn.1000-3428.0062139

• Architecture and Software Technology •

Parallel Compilation Optimization Method for Sunway High Performance Multi-Core Processors

ZHOU Yonghao1, XU Jinlong2, LI Bin1, QIAN Hong3, NIE Kai2

  1. School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China;
  2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450001, China;
  3. Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083, China
  • Received: 2021-07-20 Revised: 2021-11-07 Published: 2021-11-11
  • About the authors: ZHOU Yonghao (born 2000), male, undergraduate student, main research interest: advanced compilation techniques; XU Jinlong, lecturer, Ph.D.; LI Bin, associate professor, Ph.D.; QIAN Hong, senior engineer, M.S.; NIE Kai, Ph.D. candidate.
  • Supported by:
    National Key Research and Development Program of China, "High Performance Computing" Key Special Project (2016YFB0200503).

Abstract: On Sunway high-performance multi-core servers, the automatic parallelization compiling system identifies and declares the parallelism in a program, but the OpenMP programs it generates are not sufficiently optimized. They follow a simple fork-join model and contain a large number of nested parallel loops, which results in low running efficiency. To improve the running efficiency of these generated OpenMP programs, a parallel region reconstruction optimization technique is proposed. By merging parallel regions in a program and extending the scope of parallel regions in nested loops, parallel region reconstruction reduces the number of parallel regions in an OpenMP program, lowers the control overhead of frequently creating and merging thread groups, and transforms OpenMP programs based on the simple fork-join model into OpenMP programs based on the more efficient Single Program Multiple Data (SPMD) model. Experimental results show that, on the new-generation Sunway high-performance multi-core server platform SW1621, the technique improves running efficiency on the NPB3.3-OMP and SPEC OMP2012 test sets by 10.77% and 7.94%, respectively, and can effectively improve the execution efficiency of OpenMP programs generated by the automatic parallelization compiling system.

Key words: Sunway high performance multi-core processors, OpenMP programming, parallel region reconstruction, fork-join model, Single Program Multiple Data (SPMD) model
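
The two transformations named in the abstract, merging adjacent parallel regions and extending a parallel region outward over an enclosing loop, can be illustrated with a small hand-written sketch. This is not the paper's compiler pass or its generated output; the function names and loop bodies are hypothetical and chosen only to show the before and after shapes. The code uses only standard OpenMP pragmas, so it also compiles and runs serially when OpenMP is not enabled.

```c
#include <stddef.h>

/* Fork-join style: each loop opens and closes its own parallel region,
   so the thread team is created and joined twice. */
void scale_then_shift_fork_join(double *a, double *b, size_t n)
{
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        a[i] = 2.0 * a[i];

    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        b[i] = a[i] + 1.0;
}

/* Merged: a single parallel region containing two work-sharing loops.
   The implicit barrier at the end of the first "omp for" guarantees
   that a[] is fully updated before the second loop reads it. */
void scale_then_shift_merged(double *a, double *b, size_t n)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            a[i] = 2.0 * a[i];

        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            b[i] = a[i] + 1.0;
    }
}

/* Parallel region nested inside a serial outer loop:
   one fork and one join per outer iteration. */
void relax_fork_join(double *x, size_t n, int steps)
{
    for (int t = 0; t < steps; t++) {
        #pragma omp parallel for
        for (long i = 0; i < (long)n; i++)
            x[i] = 0.5 * x[i] + 1.0;
    }
}

/* Extended region: the parallel region now encloses the outer loop.
   Every thread runs the outer loop itself (SPMD style, t is private
   because it is declared inside the region), while the inner loop is
   still divided among the threads by "omp for". */
void relax_expanded(double *x, size_t n, int steps)
{
    #pragma omp parallel
    {
        for (int t = 0; t < steps; t++) {
            #pragma omp for
            for (long i = 0; i < (long)n; i++)
                x[i] = 0.5 * x[i] + 1.0;
        }
    }
}
```

Each pair of functions computes the same result; the rewritten versions differ only in how many times a parallel region is entered and left, which is the control overhead the abstract refers to.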

CLC Number: