Study of FPGA-based Error-controllable Floating-point Operation Accelerators

doi:10.19678/j.issn.1000-3428.0067233


Abstract: Floating-point operations are fundamental to High-Performance Computing (HPC). With the growth of big data and cloud computing, the volume of data that HPC platforms must process keeps increasing, and floating-point round-off errors accumulate over large-scale, long-running computations. It is therefore crucial to improve floating-point performance while also ensuring that the results are reliable. Exploiting the programmability, low power consumption, and flexibility of Field Programmable Gate Arrays (FPGAs), this study designs a floating-point operation accelerator for floating-point polynomials that contain complex single-term operations. Following the idea of error-free transformation, the accelerator computes the round-off error values and compensates them back into the floating-point results, so that the error is controllable. Computation is accelerated through asynchronous parallelism, and a CPU-FPGA platform is built to make maximal use of the available computing resources and keep task execution efficient. Test results show that, in a numerical relativity simulation without symmetry restrictions, the accelerator reaches a peak performance of 91.85 MFLOPS at a clock frequency of 200 MHz. Compared with an Intel Core i7-6700K CPU running its maximum number of threads, the accelerator achieves a speedup of 50.54, and under the same conditions it obtains an average accurate-result percentage of 53.6% and a lower relative error, demonstrating high reliability.
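The abstract's central idea, computing the round-off error of an operation exactly and compensating it back into the result, rests on error-free transformations (EFTs). The Python sketch below is not the authors' FPGA design. It uses the classic TwoSum (Knuth) and TwoProduct (Dekker, with Veltkamp splitting) algorithms and combines them in a compensated Horner scheme, a well-known software technique for polynomial evaluation that is swapped in here purely for illustration. It assumes IEEE 754 binary64 with round-to-nearest and no overflow or underflow. All function names are hypothetical.

```python
from fractions import Fraction

# Sketch only: classic error-free transformations (EFTs) in IEEE 754 binary64.
# Assumes round-to-nearest and no overflow/underflow. Function names are
# illustrative (hypothetical), not taken from the paper.

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def split(a: float) -> tuple[float, float]:
    """Veltkamp split: a == hi + lo, each part fitting in 26 significand bits."""
    c = (2.0**27 + 1.0) * a
    hi = c - (c - a)
    return hi, a - hi


def two_product(a: float, b: float) -> tuple[float, float]:
    """Dekker's TwoProduct: p = fl(a * b) and a * b == p + e exactly."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = a_lo * b_lo - (((p - a_hi * b_hi) - a_lo * b_hi) - a_hi * b_lo)
    return p, e


def horner(coeffs: list[float], x: float) -> float:
    """Plain Horner evaluation; coeffs are ordered from highest degree down."""
    r = 0.0
    for a in coeffs:
        r = r * x + a
    return r


def comp_horner(coeffs: list[float], x: float) -> float:
    """Compensated Horner: capture each step's rounding error with EFTs,
    evaluate those errors as a correction polynomial, and add it at the end."""
    s = coeffs[0]
    c = 0.0  # accumulated correction term
    for a in coeffs[1:]:
        p, pi = two_product(s, x)  # s * x == p + pi exactly
        s, sigma = two_sum(p, a)   # p + a == s + sigma exactly
        c = c * x + (pi + sigma)
    return s + c


if __name__ == "__main__":
    # (x - 1)^5 expanded: badly conditioned near its 5-fold root x = 1.
    coeffs = [1.0, -5.0, 10.0, -10.0, 5.0, -1.0]
    x = 1.0001
    exact = (Fraction(x) - 1) ** 5  # exact value at the binary64 input x
    for name, f in (("horner", horner), ("comp_horner", comp_horner)):
        y = f(coeffs, x)
        rel = abs(Fraction(y) - exact) / exact
        print(f"{name:12s} {y: .17e}  relative error {float(rel):.2e}")
```

Near the multiple root, plain Horner loses essentially all significant digits, while the compensated version stays close to the exact value at roughly the cost of a few extra additions and multiplications per step. That extra work is independent across steps only in part, which is one reason such schemes map naturally onto pipelined hardware.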

Key words: Field Programmable Gate Array (FPGA), floating-point operation accelerator, controllable error, heterogeneous system, high reliability

CLC number: TP391

GUAN Mingxiao, LIU Jiakun, ZHANG Hongrui, HE Anping. Study of FPGA-based Error-controllable Floating-point Operation Accelerators[J]. Computer Engineering, 2024, 50(5): 291-297.

https://www.ecice06.com/CN/Y2024/V50/I5/291

