摘要: 浮点融合乘加部件会增加独立浮点加减法、乘法等运算延迟。为克服该缺陷,研究将乘加部件独立乘法、加减法等运算延迟由6拍减为4拍时对浮点性能的影响。以某支持乘加运算的国产处理器为基础,修改相关的RTL级设计代码,利用硬件仿真加速器平台,对SPEC CPU2000浮点测试课题进行评估。实验结果表明,该延迟优化有利于提高浮点性能,最大提高5.25%,平均提高1.61%。
关键词:
浮点加法,
浮点乘法,
融合乘加,
硬件仿真,
浮点性能,
运算延迟
Abstract: A Fused Multiply-Add (FMA) unit increases the latency of separate floating-point add/subtract and multiply operations. To address this shortcoming, this paper studies how floating-point performance changes when the latency of separate add/subtract and multiply operations on the FMA unit is reduced from 6 cycles to 4 cycles. The RTL design of a domestically developed processor with FMA support is modified, and the SPEC CPU2000 floating-point benchmarks are evaluated on a hardware emulation accelerator platform. The results show that the latency optimization improves floating-point performance, by up to 5.25% and by 1.61% on average.
Key words:
floating-point add,
floating-point multiply,
Fused Multiply-Add (FMA),
hardware emulation,
floating-point performance,
operation latency
中图分类号:
何军, 田增, 郭勇, 陈诚. 浮点乘加部件延迟对浮点性能影响的研究[J]. 计算机工程, 2013, 39(7): 311-313,317.
HE Jun, TIAN Ceng, GUO Yong, CHEN Cheng. Study of Floating-point Multiply-Add Unit Latency Effect on Floating-point Performance[J]. Computer Engineering, 2013, 39(7): 311-313,317.