
Computer Engineering ›› 2013, Vol. 39 ›› Issue (7): 311-313,317. doi: 10.3969/j.issn.1000-3428.2013.07.069

• Networks and Communications •

Study of Floating-point Multiply-Add Unit Latency Effect on Floating-point Performance

HE Jun, TIAN Zeng, GUO Yong, CHEN Cheng   

  1. (Shanghai High Performance IC Design Centre, Shanghai 201204, China)
  • Received: 2012-08-03 Online: 2013-07-15 Published: 2013-07-12

浮点乘加部件延迟对浮点性能影响的研究

何 军,田 增,郭 勇,陈 诚   

  1. (上海高性能集成电路设计中心,上海 201204)
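  • About the authors: HE Jun (born 1980), male, Ph.D. candidate, CCF member; main research interest: microprocessor design. TIAN Zeng, GUO Yong, and CHEN Cheng hold master's degrees.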

Abstract: A Fused Multiply-Add (FMA) unit increases the latency of separate floating-point add/subtract and multiply operations. To address this shortcoming, this paper studies how an FMA latency optimization, which reduces the latency of separate floating-point add/subtract and multiply operations from 6 cycles to 4 cycles, affects floating-point performance. The RTL design of a domestically developed processor with an FMA unit is modified, and the SPEC CPU2000 floating-point benchmarks are run on a hardware emulation acceleration platform to estimate the effect of the optimization. The results show that floating-point performance improves on all of the benchmarks, by at most 5.25% and by 1.61% on average, which indicates that the optimization benefits floating-point performance.

Key words: floating-point add, floating-point multiply, Fused Multiply-Add (FMA), hardware emulation, floating-point performance, operation latency

摘要: 浮点融合乘加部件会增加独立浮点加减法、乘法等运算延迟。为克服该缺陷,研究将乘加部件独立乘法、加减法等运算延迟由6拍减为4拍时对浮点性能的影响。以某支持乘加运算的国产处理器为基础,修改相关的RTL级设计代码,利用硬件仿真加速器平台,对SPEC CPU2000浮点测试课题进行评估。实验结果表明,该延迟优化有利于提高浮点性能,最大提高5.25%,平均提高1.61%。

关键词: 浮点加法, 浮点乘法, 融合乘加, 硬件仿真, 浮点性能, 运算延迟
