Computer Engineering ›› 2025, Vol. 51 ›› Issue (8): 354-363. doi: 10.19678/j.issn.1000-3428.0068758

• Development Research and Engineering Application •

HPL-MxP Multiple lookahead Optimization for Kunpeng Processors

GAO Ang1,2, WANG Yinshan1,2,*, YAN Wen1,2, SONG Changcheng3, WANG Long3, YAO Erlin1,2

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  2. University of Chinese Academy of Sciences, Beijing 101408, China
  3. Huawei Technologies Co., Ltd., Hangzhou 310052, Zhejiang, China
  • Received: 2023-11-03 Revised: 2023-12-25 Online: 2025-08-15 Published: 2025-08-15
  • Contact: WANG Yinshan
  • Supported by: Youth Innovation Promotion Fund of the Chinese Academy of Sciences (E345060)

Abstract:

The HPL-MxP benchmark is widely used to measure the mixed-precision computing capability of supercomputers. Because of the benchmark's parallel algorithm, choosing the matrix block size (NB) is a trade-off between matrix multiplication efficiency and load balancing. To address this trade-off, this paper presents an optimization study on the Kunpeng 920 system and proposes a multi-level lookahead optimization strategy: the matrix is partitioned with a small NB to achieve better load balancing, while multiple rounds of trailing-matrix updates are merged to raise the effective NB, so that good load balancing and high matrix multiplication efficiency are obtained together. To implement the multi-level lookahead scheme, the study restructures the Panel storage mode, designs a fine-grained computation and communication pipeline, and extends the HPL-MxP source program interfaces. Mixed single- and double-precision tests on a multi-node Kunpeng 920 platform show that multi-level lookahead effectively resolves the NB trade-off and incurs no significant additional overhead compared with the single-level lookahead strategy.

Key words: HPL-MxP benchmark test program, matrix blocking, mixed precision, multi-level lookahead optimization strategy, Panel storage mode
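
The idea of merging several rounds of trailing-matrix updates to raise the effective NB can be illustrated with a small sketch. It is not the authors' implementation: it only shows, in pure Python with integer data, that applying several narrow updates of width NB one after another gives the same result as a single update whose inner dimension is NB times the merge depth. The names `matmul`, `subtract`, `trailing_update`, `NB` and `DEPTH` are hypothetical and chosen for illustration. The dependence of later panel factorizations on earlier updates, which a real lookahead scheme must respect, is deliberately left out.

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def subtract(c, d):
    """Element-wise C - D, returning a new matrix."""
    return [[x - y for x, y in zip(rc, rd)] for rc, rd in zip(c, d)]


def trailing_update(c, l, u, inner_width):
    """Apply C -= L[:, blk] @ U[blk, :] in chunks of `inner_width` columns.

    Each chunk stands for one matrix-multiply call; a wider chunk means
    fewer calls, each with a larger inner dimension.
    """
    gemm_calls = 0
    for start in range(0, len(u), inner_width):
        l_blk = [row[start:start + inner_width] for row in l]
        u_blk = u[start:start + inner_width]
        c = subtract(c, matmul(l_blk, u_blk))
        gemm_calls += 1
    return c, gemm_calls


# Assumed toy sizes: 6x6 trailing matrix, panel width 2, merge depth 4.
M, NB, DEPTH = 6, 2, 4
K = NB * DEPTH
L = [[(3 * i + k) % 5 - 2 for k in range(K)] for i in range(M)]
U = [[(2 * k + j) % 7 - 3 for j in range(M)] for k in range(K)]
C = [[i + j for j in range(M)] for i in range(M)]

# One update per panel (inner dimension NB) versus one merged update
# (inner dimension NB * DEPTH).
stepwise, stepwise_calls = trailing_update(C, L, U, NB)
merged, merged_calls = trailing_update(C, L, U, NB * DEPTH)
```

With these toy inputs, `stepwise` and `merged` hold the same matrix, while the merged variant issues fewer multiply calls with a larger inner dimension, which is the property the effective NB relies on.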