
Computer Engineering ›› 2023, Vol. 49 ›› Issue (12): 55-62. doi: 10.19678/j.issn.1000-3428.0066172

• Frontiers in Computer Systems •

Design of Sparse CNN Accelerator Based on Inter-Frame Data Reuse

Qirun HONG, Qin WANG   

  1. School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  • Received: 2022-11-04  Online: 2023-12-15  Published: 2023-02-08

  • About the authors:

    HONG Qirun (b. 1996), male, M.S. candidate; his main research interest is neural network accelerator design.

    WANG Qin, research fellow, Ph.D.

  • Supported by:
    National Key Research and Development Program of China (2018YFA0701500)

Abstract:

Convolutional Neural Networks (CNNs) are widely used for object detection and other tasks in video applications. However, conventional CNN accelerators only accelerate single-image inference and do not exploit the data redundancy between successive video frames. Existing CNN accelerators that use inter-frame data reuse suffer from low sparsity, large model size, and high computational complexity. To address these problems, a learned step-size low-precision quantization method is proposed to increase the sparsity of differential frames, and a power-of-two constraint on the quantization scale factors is introduced to make the quantization hardware friendly. The design also uses the Winograd algorithm to reduce the computational complexity of the convolution operator, and on this basis proposes an input-channel bitmap compression scheme that exploits the sparsity of both activations and weights to skip all zero-valued computations. Based on the YOLOv3-tiny network, the proposed quantization method and sparse CNN accelerator are verified on a Field Programmable Gate Array (FPGA) platform using a subset of the ImageNet ILSVRC2015 VID dataset and the DAC2020 dataset. The results show that the proposed quantization method achieves 4-bit full-integer quantization with a loss of less than 2% in mean Average Precision (mAP). Owing to inter-frame data reuse, the designed sparse CNN accelerator achieves a performance of 814.2×10⁹ operations/s and an energy efficiency of 201.1×10⁹ operations/s/W. Compared with other FPGA-based accelerators, it delivers 1.77 to 8.99 times higher performance and 1.91 to 5.56 times higher energy efficiency.
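
To make the first two ideas concrete, here is a minimal NumPy sketch of quantizing a differential frame with a step size constrained to a power of two. It is an illustration under assumptions rather than the authors' implementation: the step-size value, tensor shapes, and symmetric clipping range are invented for the example, and in the actual method the step size would be learned during training (LSQ-style) rather than fixed.

    import numpy as np

    def pow2_scale(s):
        # Constrain a learned step size to the nearest power of two,
        # so rescaling in hardware reduces to a bit shift.
        return 2.0 ** np.round(np.log2(s))

    def quantize(x, s, bits=4):
        # Uniform symmetric quantization to `bits`-bit integers.
        qmax = 2 ** (bits - 1) - 1                   # 7 for 4-bit
        return np.clip(np.round(x / s), -qmax - 1, qmax).astype(np.int8)

    # Quantize the frame-to-frame difference instead of the raw frame:
    # small changes round to zero, so the quantized differential frame
    # is highly sparse and those multiply-accumulates can be skipped.
    rng = np.random.default_rng(0)
    prev = rng.standard_normal((8, 8)).astype(np.float32)
    curr = prev + 0.01 * rng.standard_normal((8, 8)).astype(np.float32)

    s = pow2_scale(0.1)          # assumed value; the method learns it
    diff_q = quantize(curr - prev, s)
    print("differential-frame sparsity:", float(np.mean(diff_q == 0)))

Because the scale is a power of two, rescaling between layers reduces to bit shifts, which is what makes the quantization hardware friendly.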
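
The Winograd step can be illustrated with the smallest one-dimensional instance, F(2,3), which produces two outputs of a 3-tap convolution using four multiplications instead of six. The accelerator itself presumably uses a two-dimensional variant whose tile size the abstract does not state; the matrices below are the standard F(2,3) transforms.

    import numpy as np

    # Standard Winograd F(2,3) transforms: Y = A^T [(G g) * (B^T d)]
    BT = np.array([[1,  0, -1,  0],
                   [0,  1,  1,  0],
                   [0, -1,  1,  0],
                   [0,  1,  0, -1]], dtype=np.float32)
    G  = np.array([[1.0,  0.0, 0.0],
                   [0.5,  0.5, 0.5],
                   [0.5, -0.5, 0.5],
                   [0.0,  0.0, 1.0]], dtype=np.float32)
    AT = np.array([[1, 1,  1,  0],
                   [0, 1, -1, -1]], dtype=np.float32)

    def winograd_f23(d, g):
        # d: 4-sample input tile, g: 3-tap filter -> 2 outputs,
        # using only 4 element-wise multiplies (the Hadamard product).
        return AT @ ((G @ g) * (BT @ d))

    d = np.array([1., 2., 3., 4.], dtype=np.float32)
    g = np.array([1., 0., -1.], dtype=np.float32)
    print(winograd_f23(d, g))                     # [-2. -2.]
    print(np.convolve(d, g[::-1], mode="valid"))  # direct conv, matches

The element-wise product stage is also where sparsity pays off: an input tile of the differential frame that is entirely zero transforms to zero, so all of its multiplies can be skipped.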
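
Finally, one plausible reading of the input-channel bitmap compression, sketched from the abstract alone: one bit per input channel flags a nonzero entry, only the nonzero values are stored, and a multiply-accumulate is issued only where the activation bitmap and the weight bitmap are both set. The function names and packing format below are hypothetical.

    import numpy as np

    def bitmap_compress(vec):
        # Hypothetical scheme: 1 bit per input channel marks a nonzero
        # entry; only the nonzero values themselves are stored.
        mask = vec != 0
        return np.packbits(mask), vec[mask]

    def bitmap_decompress(bitmap, values, n):
        # Rebuild the dense per-channel vector from bitmap + values.
        mask = np.unpackbits(bitmap, count=n).astype(bool)
        out = np.zeros(n, dtype=values.dtype)
        out[mask] = values
        return out

    # A sparse slice of a quantized differential frame, 16 input channels:
    act = np.array([0, 3, 0, 0, -2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 5],
                   dtype=np.int8)
    bm, vals = bitmap_compress(act)
    assert np.array_equal(bitmap_decompress(bm, vals, act.size), act)
    # Zero skipping across BOTH operands: a MAC is needed only where the
    # activation bitmap AND the weight bitmap are set (bitwise AND).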

Key words: Convolutional Neural Network (CNN), low-precision quantization, inter-frame data reuse, Winograd algorithm, accelerator, Field Programmable Gate Array (FPGA)
