
Computer Engineering, 2013, Vol. 39, Issue (6): 91-94, 102. doi: 10.3969/j.issn.1000-3428.2013.06.018

Section: Architecture and Software Technology

A Low-Latency, High-Throughput Floating-Point and Integer Multiply-Accumulate Unit

SHEN Jun, SHEN Hai-bin, YU Yu-long

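  1. (Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China)
  • Received: 2012-07-02; Publication date: 2013-06-15; Release date: 2013-06-14
  • About the authors: SHEN Jun (born 1987), male, master's student, research interest: integrated circuit design; SHEN Hai-bin, professor; YU Yu-long, engineer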

A Low-Latency, High-Throughput Multiply-Accumulator Unit for Floating-Point and Integer Operands

SHEN Jun, SHEN Hai-bin, YU Yu-long   

  1. (Institute of VLSI Design, Zhejiang University, Hangzhou 310027, China)
  • Received:2012-07-02 Online:2013-06-15 Published:2013-06-14

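Abstract: Current floating-point units suffer from data dependences (hazards) when computing vector dot products. To address this, a low-latency, single-cycle accumulator architecture is proposed. The structure is used in a 7-stage pipelined, configurable multiply-accumulate unit that supports double-precision floating-point, dual single-precision floating-point, and 32-bit signed integer operands, and it applies low-power techniques, operand isolation and clock gating, to the downstream modules. Experimental results on the Virtex-4 platform show that the structure offers high performance, low latency, and single-cycle data throughput, and that its area-time product is more than 30% lower than that of a design using Xilinx floating-point IP cores.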
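
The data hazard mentioned in the abstract comes from the loop-carried dependence of accumulation: each addition needs the previous partial sum. The following is a minimal cycle-count sketch under hypothetical assumptions (not the paper's design): a fully pipelined multiplier delivers one product per cycle, and each accumulate step takes `add_latency` cycles and cannot start until the previous sum is ready. The function name and the latency values are illustrative only.

```python
def accumulate_cycles(n: int, add_latency: int) -> int:
    """Cycle at which the final dot-product sum is available (hypothetical model).

    Assumptions: product i becomes ready at cycle i + 1 (one product per cycle),
    an add starts only when both its product and the previous partial sum are
    ready, and its result is ready add_latency cycles after it starts.
    """
    sum_ready = 0  # cycle at which the running partial sum becomes available
    for i in range(n):
        product_ready = i + 1                     # pipelined multiplier output
        start = max(product_ready, sum_ready)     # wait on the data dependence
        sum_ready = start + add_latency
    return sum_ready


if __name__ == "__main__":
    print(accumulate_cycles(64, 1))  # 65: single-cycle accumulation keeps pace
    print(accumulate_cycles(64, 4))  # 257: every add waits for the previous sum
```

In this model a single-cycle accumulator finishes an n-element dot product in n + 1 cycles, while a hypothetical 4-cycle adder needs 4n + 1 cycles, so its throughput drops to one element every four cycles.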

Keywords: floating-point unit, multiply-accumulate, vector dot product, double precision, dual single precision

Abstract: To resolve the data hazards that arise when a floating-point unit computes vector dot products, a low-latency, single-cycle accumulator architecture is used in a 7-stage pipelined, configurable multiply-accumulator design. The design is compatible with double-precision floating-point, dual single-precision floating-point, and 32-bit signed integer operands, and it supports both fused multiply-add operations and continuous multiply-accumulate operations. In addition, power consumption is reduced through operand isolation and clock gating. Implementation results on Virtex-4 show that the accumulator architecture achieves high performance, low latency, and single-cycle throughput, and that its area-time product is more than 30% lower than that of a design built with Xilinx floating-point IP cores.
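
Fused multiply-add differs from a separate multiply followed by an add in how often the result is rounded. The sketch below assumes IEEE 754 fusedMultiplyAdd semantics (one rounding of the exact a*b + c); the abstract does not state the unit's rounding behavior, so this is an illustration of the general technique, not the paper's datapath. The helper names are hypothetical, and the exact value is formed with Python's `fractions.Fraction` and rounded once when converted back to `float` (finite inputs assumed).

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c with a single rounding (hypothetical helper, finite inputs only)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational result
    return float(exact)                              # rounded to double once


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """a*b + c with two roundings: after the multiply and after the add."""
    return a * b + c


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30   # exact product is 1 - 2**-60
c = -1.0

if __name__ == "__main__":
    print(unfused_multiply_add(a, b, c))  # 0.0: the product rounds to 1.0 first
    print(fused_multiply_add(a, b, c))    # -2**-60: the tiny residual survives
```

The unfused path loses the residual because a*b rounds to exactly 1.0 before c is added; the single-rounding path keeps it.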

Keywords: Floating-Point Unit (FPU), multiply-accumulator, vector dot product, double precision, dual single precision
