
Computer Engineering ›› 2022, Vol. 48 ›› Issue (3): 170-174,196. doi: 10.19678/j.issn.1000-3428.0060675

• Computer Architecture and Software Technology •

Design of Quantized CNN Acceleration System Based on FPGA

GONG Jie, ZHAO Shuo, HE Hu, DENG Ning   

  1. Institute of Microelectronics, Tsinghua University, Beijing 100084, China
  • Received:2021-01-22 Revised:2021-03-19 Published:2022-03-11

  • About the authors: GONG Jie (1995-), male, M.S., whose main research interest is CNN acceleration systems; ZHAO Shuo, M.S.; HE Hu, Associate Professor; DENG Ning, Professor.
  • Funding:
    National Key R&D Program of China (2016YFA0201800).

Abstract: The convolution and fully connected layers in deep Convolutional Neural Network (CNN) models contain a large number of convolution operations, resulting in a significant increase in network scale, parameter count, and computation. Deep CNNs are therefore unsuitable for mobile devices, and their parallel computing performance is poor when deployed on CPU/GPU platforms. Thus, it is necessary to quantize the convolution parameters and design hardware-based acceleration. Field Programmable Gate Arrays (FPGA), with their low power consumption and high flexibility, meet the requirements of CNN parallel computing. Therefore, a CNN quantization method and its acceleration system are designed based on FPGA. The general dynamic fixed-point quantization method proposed in this study quantizes each layer of the network at a different precision, reducing both the loss in network accuracy and the storage requirements of the network parameters. On this basis, a dedicated accelerator and its on-chip system are designed for the quantized CNN to accelerate the network's forward inference. Using the ImageNet ILSVRC2012 dataset, the performance of the designed quantization method and acceleration system is verified on the VGG-16 and ResNet-50 networks. Experimental results show that after quantization, the network sizes of VGG-16 and ResNet-50 are only 13.8% and 24.8% of the originals, respectively, while the Top-1 accuracy loss is less than 1%, indicating that the quantization method is highly effective. Meanwhile, when running VGG-16, the acceleration system outperforms three other FPGA-based acceleration systems, achieving a peak performance of 614.4 GOPs (up to 4.5 times higher) and an energy efficiency of 113.99 GOPs/W (up to 4.7 times higher).
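To illustrate the idea behind per-layer dynamic fixed-point quantization as described above, the sketch below picks a fractional length for each tensor (e.g., one layer's weights) so that its dynamic range fits within the given bit width. This is a generic, minimal illustration of the technique, not the paper's exact algorithm; the function names and the 8-bit default are assumptions for the example.

```python
import numpy as np

def dynamic_fixed_point_quantize(w, bitwidth=8):
    """Quantize a tensor to signed fixed point, choosing the fractional
    length per tensor (i.e., per layer) to cover its dynamic range.
    Returns the integer codes and the fractional length used."""
    max_abs = float(np.max(np.abs(w)))
    # Integer bits needed to cover max_abs (conservative), plus handle w == 0.
    il = int(np.ceil(np.log2(max_abs))) + 1 if max_abs > 0 else 0
    fl = bitwidth - il - 1          # fractional length; 1 bit reserved for sign
    scale = 2.0 ** fl
    qmin, qmax = -(2 ** (bitwidth - 1)), 2 ** (bitwidth - 1) - 1
    q = np.clip(np.round(w * scale), qmin, qmax).astype(np.int32)
    return q, fl

def dequantize(q, fl):
    """Map fixed-point codes back to real values."""
    return q.astype(np.float64) / (2.0 ** fl)

# Example: a layer with small weights gets a large fractional length,
# preserving precision where the dynamic range allows it.
w = np.array([0.5, -0.25, 0.75])
q, fl = dynamic_fixed_point_quantize(w, bitwidth=8)
```

Because each layer's fractional length is chosen independently, layers with small-magnitude weights keep more fractional precision than layers with large activations, which is what limits the Top-1 accuracy loss while shrinking storage to 8 bits per parameter.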

Key words: Convolutional Neural Network(CNN), dynamic fixed-point quantization, hardware acceleration, Field Programmable Gate Array(FPGA), model compression

