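Abstract: To address the data dependences that arise when current floating-point units process vector dot products, this paper proposes a low-latency, single-cycle accumulator structure. The structure is used in a 7-stage pipelined configurable multiply-accumulate unit that supports double-precision floating-point, dual single-precision floating-point, and 32-bit signed integer operands, and it allows operand isolation and clock gating to be applied to downstream modules for low power. Experimental results on the Virtex-4 platform show that the structure offers high performance, low latency, and single-cycle data throughput; compared with a design using Xilinx floating-point IP, the area-time product is reduced by more than 30%.
Keywords:
floating-point unit,
multiply-accumulate,
vector dot product,
double precision,
dual single precision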
Abstract: To resolve data hazards in vector dot product operations on a floating-point unit, a low-latency, single-cycle accumulator architecture is proposed and used in a 7-stage pipelined configurable multiply-accumulate unit. The unit is compatible with double-precision floating-point, dual single-precision floating-point, and 32-bit signed integer operands, and supports both fused multiply-add and continuous multiply-accumulate operations. In addition, power is reduced using operand isolation and clock gating. Implementation results on Virtex-4 show that the accumulator architecture achieves high performance, low latency, and single-cycle throughput; its area-time product is more than 30% lower than that of a design built with Xilinx floating-point IP cores.
Key words:
Floating-Point Unit (FPU),
multiply-accumulator,
vector dot product,
double-precision,
dual single-precision
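The data hazard the abstract refers to can be illustrated with a toy cycle-count model of a dot product. This is only a sketch under stated assumptions, not the authors' design: the function name `simulate_dot`, the split of the 7 pipeline stages into 6 multiplier stages plus 1 accumulator stage, and the comparison latency of 4 cycles are all hypothetical. The model assumes products leave the multiplier one per cycle and that each accumulation must wait for the previous partial sum.

```python
def simulate_dot(a, b, mul_latency=6, acc_latency=1):
    """Toy model: return (dot product, cycle at which the final sum is ready).

    mul_latency and acc_latency are illustrative assumptions, not figures
    taken from the paper. acc_latency must be at least 1.
    """
    total = 0.0
    acc_free_at = 0  # cycle at which the accumulator can accept a new product
    for i, (x, y) in enumerate(zip(a, b)):
        # The multiplier is fully pipelined: product i is ready at this cycle.
        product_ready = mul_latency + i
        # Read-after-write dependence: the next add cannot start until the
        # previous partial sum has been written back.
        start = max(product_ready, acc_free_at)
        total += x * y
        acc_free_at = start + acc_latency
    return total, acc_free_at


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [4.0, 5.0, 6.0]
    print(simulate_dot(a, b, acc_latency=1))  # single-cycle accumulator
    print(simulate_dot(a, b, acc_latency=4))  # multi-cycle accumulator stalls
```

In this model a single-cycle accumulator finishes an n-element dot product in `mul_latency + n` cycles, whereas an accumulator with latency k needs `mul_latency + k * n` cycles because each addition stalls on the previous one.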
CLC number:
沈俊, 沈海斌, 虞玉龙. 一种低延迟高吞吐率的浮点整型乘累加单元[J]. 计算机工程, 2013, 39(6): 91-94,102.
SHEN Jun, SHEN Hai-bin, YU Yu-long. A Low Latency High Throughput Multiply-accumulator Unit for Float Point and Integer[J]. Computer Engineering, 2013, 39(6): 91-94,102.