Computer Engineering ›› 2024, Vol. 50 ›› Issue (5): 291-297. doi: 10.19678/j.issn.1000-3428.0067233

• Development Research and Engineering Application •

Study of FPGA-based Error-controllable Floating-point Operation Accelerators

GUAN Mingxiao, LIU Jiakun, ZHANG Hongrui, HE Anping   

  1. School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, Gansu, China
  • Received: 2023-03-22 Revised: 2023-07-20 Published: 2023-09-06
  • Contact: HE Anping, E-mail: heap@lzu.edu.cn

Abstract: Floating-point operations are fundamental to High-Performance Computing (HPC). In the context of big data and cloud computing, the volume of data that HPC platforms must process keeps growing, and floating-point round-off errors accumulate over large-scale, long-running computations. It is therefore important to guarantee the reliability of results while improving floating-point performance. Exploiting the programmability, low power consumption, and flexibility of the Field Programmable Gate Array (FPGA), this study designs a floating-point operation accelerator for floating-point polynomials that contain complex single-term operations. Following the idea of error-free transformation, the round-off error is computed and compensated back into the floating-point value, which makes the error controllable. Computation is accelerated in an asynchronous parallel manner, and a CPU-FPGA platform is built to maximize the use of computing resources and keep task execution efficient. Test results show that, in numerical relativity simulations without symmetry restrictions, the accelerator reaches a peak performance of 91.85 MFLOPs at a clock frequency of 200 MHz. Compared with an Intel i7 6700K CPU running its maximum number of threads, the accelerator achieves a speedup of 50.54 and, under the same conditions, an average accurate-result percentage of 53.6% together with a lower relative error, indicating high reliability.

Keywords: Field Programmable Gate Array (FPGA), floating-point operation accelerator, controllable error, heterogeneous system, high reliability
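
The abstract describes error-free transformation only in outline: compute the round-off error of an operation, then compensate it back into the result. The sketch below illustrates that general idea in software, using Knuth's TwoSum and a compensated summation built on it. It is not the accelerator's hardware design, and the abstract does not say which transformations the authors implement; the function names `two_sum`, `compensated_sum`, and `naive_sum` are chosen here purely for illustration.

```python
def two_sum(a, b):
    """Knuth's TwoSum: return (s, e) where s is the rounded sum a + b
    and e is its rounding error, so that a + b == s + e exactly
    (IEEE 754 binary64, round-to-nearest, no overflow)."""
    s = a + b
    b_virtual = s - a           # the part of s that came from b
    a_virtual = s - b_virtual   # the part of s that came from a
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def compensated_sum(values):
    """Sum the values while accumulating each step's rounding error
    separately, then add the accumulated error back once at the end."""
    total = 0.0
    error = 0.0
    for x in values:
        total, e = two_sum(total, x)
        error += e
    return total + error


def naive_sum(values):
    """Plain left-to-right floating-point summation, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16, 1.0]   # the exact sum is 2.0
    print(naive_sum(data))           # 1.0: the first 1.0 is lost to rounding
    print(compensated_sum(data))     # 2.0: the lost 1.0 is recovered
```

The comparison uses an explicit loop rather than Python's built-in `sum`, because recent Python versions already apply a compensated algorithm to float inputs and would hide the effect.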
