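Abstract (Chinese version): In data-intensive applications, a large share of energy and latency is spent moving data between compute and memory units, giving rise to the von Neumann bottleneck. The problem persists in systems integrated with 2.5D packaging. To address it, a new hardware acceleration scheme is proposed. In-memory computing is introduced into the 2.5D system so that the off-chip memory gains the ability to compute. The memory is partitioned into several banks that support parallel access across banks, and configurable acceleration units are placed in the memory array to exploit its full bandwidth for parallel computation, reducing the latency and energy of data transfers. The architecture is implemented with inverse quantization and inverse transform in H.264 decoding as a case study. Simulation results show that, compared with a conventional software implementation, the scheme achieves a 7.1× performance improvement and saves 80.5% of the energy while adding only 2% area overhead.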
Keywords (Chinese version):
in-memory computing,
von Neumann bottleneck,
2.5D packaged system,
H.264 decoding,
data-intensive applications
Abstract: For data-intensive applications, the large amounts of energy and latency spent transporting data between off-chip memory and on-chip computing elements create a limitation known as the von Neumann bottleneck. The bottleneck persists even in 2.5D integrated systems. To address this problem, this paper proposes a novel hardware acceleration framework that enables computing in the off-chip memory array of a 2.5D system. It divides the memory into multiple banks and places an accelerator designed for the H.264 decoder in the memory to exploit the high bandwidth provided by the memory array. Simulation results show that, compared with a traditional software implementation, this framework achieves a 7.1× improvement in performance and an 80.5% reduction in energy consumption, while adding only 2% area overhead.
Key words:
in-memory computing,
von Neumann bottleneck,
2.5D integrated system,
H.264 decoding,
data-intensive application
CLC number: