
Computer Engineering ›› 2022, Vol. 48 ›› Issue (3): 170-174,196. doi: 10.19678/j.issn.1000-3428.0060675

• Computer Architecture and Software Technology •

Design of Quantized CNN Acceleration System Based on FPGA

GONG Jie, ZHAO Shuo, HE Hu, DENG Ning   

  1. Institute of Microelectronics, Tsinghua University, Beijing 100084, China
  • Received:2021-01-22 Revised:2021-03-19 Published:2022-03-11

  • About the authors: GONG Jie (1995-), male, M.S., whose main research interest is CNN acceleration systems; ZHAO Shuo, M.S.; HE Hu, Associate Professor; DENG Ning, Professor.
  • Funding:
    National Key R&D Program of China (2016YFA0201800).

Abstract: The convolution and fully connected layers in deep Convolutional Neural Network (CNN) models contain a large number of convolution operations, resulting in a significant increase in network scale, parameter count, and computation. Deep CNNs are therefore unsuitable for mobile devices, and their parallel computing performance is poor when deployed on CPU/GPU platforms. Thus, it is necessary to quantize the convolution parameters and design hardware-based acceleration. Field Programmable Gate Arrays (FPGA), with their low power consumption and high flexibility, meet the requirements of CNN parallel computing. Therefore, a CNN quantization method and its acceleration system are designed based on FPGA. The general dynamic fixed-point quantization method proposed in this study quantizes each layer of the network at a different precision, reducing both the loss in network accuracy and the storage requirements of the network parameters. On this basis, a dedicated accelerator and its on-chip system are designed for the quantized CNN to accelerate the network's forward inference. Using the ImageNet ILSVRC2012 dataset, the performance of the designed quantization method and acceleration system is verified on the VGG-16 and ResNet-50 networks. Experimental results show that after quantization, the network sizes of VGG-16 and ResNet-50 are only 13.8% and 24.8% of the originals, respectively, while the Top-1 accuracy loss is less than 1%, indicating that the quantization method is highly effective. Meanwhile, when running VGG-16, the acceleration system outperforms three other FPGA-based acceleration systems, achieving a peak performance of 614.4 GOPs (up to 4.5 times higher) and an energy efficiency of 113.99 GOPs/W (up to 4.7 times higher).
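To illustrate the idea behind per-layer dynamic fixed-point quantization as described above, the sketch below picks a fractional length for each tensor (e.g., one layer's weights) so that its dynamic range fits within the given bit width. This is a generic, minimal illustration of the technique, not the paper's exact algorithm; the function names and the 8-bit default are assumptions for the example.

```python
import numpy as np

def dynamic_fixed_point_quantize(w, bitwidth=8):
    """Quantize a tensor to signed fixed point, choosing the fractional
    length per tensor (i.e., per layer) to cover its dynamic range.
    Returns the integer codes and the fractional length used."""
    max_abs = float(np.max(np.abs(w)))
    # Integer bits needed to cover max_abs (conservative), plus handle w == 0.
    il = int(np.ceil(np.log2(max_abs))) + 1 if max_abs > 0 else 0
    fl = bitwidth - il - 1          # fractional length; 1 bit reserved for sign
    scale = 2.0 ** fl
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax).astype(np.int32)
    return q, fl

def dequantize(q, fl):
    """Map fixed-point codes back to real values."""
    return q.astype(np.float64) / (2.0 ** fl)

# Example: a layer with small weights gets a large fractional length,
# preserving precision where the dynamic range allows it.
w = np.array([0.5, -0.25, 0.75])
q, fl = dynamic_fixed_point_quantize(w, bitwidth=8)
```

Because each layer's fractional length is chosen independently, layers with small-magnitude weights keep more fractional precision than layers with large activations, which is what limits the Top-1 accuracy loss while shrinking storage to 8 bits per parameter.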

Key words: Convolutional Neural Network(CNN), dynamic fixed-point quantization, hardware acceleration, Field Programmable Gate Array(FPGA), model compression

