
Computer Engineering ›› 2021, Vol. 47 ›› Issue (7): 189-195,204. doi: 10.19678/j.issn.1000-3428.0058640

• Computer Architecture and Software Technology •

FPGA-based Accelerator for Sparse Convolutional Neural Network

DI Xinkai1,2, YANG Haigang1,2   

  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2020-06-15 Revised:2020-08-08 Published:2020-08-17

  • About the authors: DI Xinkai (1992-), male, Ph.D. candidate; his research interests include high-performance computing and computer architecture. YANG Haigang, Ph.D., professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61876172); Major Research Program of the Beijing Municipal Science and Technology Commission (Z171100000117019).

Abstract: To eliminate the invalid operations caused by the sparsity of model parameters in the forward pass of a Convolutional Neural Network (CNN), a dataflow and a parallel accelerator for sparse neural network models are designed on a Field Programmable Gate Array (FPGA). A dedicated logic module picks out, along the input-channel direction, the non-zero elements of the feature map matrices and the convolution filter matrices, and passes only this valid data to an array of Digital Signal Processor (DSP) blocks for multiply-accumulate operations. All related intermediate results are then reduced by an adder tree to produce each final output feature map point, while coarse-grained parallelism is exploited along the width, height and output-channel dimensions of the feature maps and the optimal design parameters are searched for. Experiments on Xilinx FPGAs show that the design achieves 678.2 GOPS on the sparse convolution layers of VGG-16 with an energy efficiency of 69.45 GOPS/W, a considerable improvement in both performance and energy efficiency over existing FPGA-based accelerators for dense and sparse networks.
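To make the described dataflow concrete, the following is a minimal behavioral sketch (not the authors' RTL or exact hardware design): along each input channel, only positions where both the activation and the weight are non-zero are forwarded to a multiply-accumulate stage, and the per-channel partial sums are then reduced by an adder tree to form one output feature map point. All function names (select_nonzero_pairs, mac_array, adder_tree, sparse_conv_point) are illustrative assumptions, not identifiers from the paper.

# Behavioral sketch of the sparse convolution dataflow described in the abstract.
import numpy as np

def select_nonzero_pairs(acts, wts):
    """Keep only (activation, weight) pairs where both operands are non-zero."""
    mask = (acts != 0) & (wts != 0)
    return acts[mask], wts[mask]

def mac_array(acts, wts):
    """One DSP lane: multiply the valid pairs and accumulate the products."""
    return np.dot(acts, wts)

def adder_tree(partials):
    """Pairwise reduction of per-channel partial sums (log-depth in hardware)."""
    vals = list(partials)
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)] + \
               ([vals[-1]] if len(vals) % 2 else [])
    return vals[0] if vals else 0

def sparse_conv_point(ifmap_patch, kernel):
    """Compute one output pixel; ifmap_patch and kernel have shape (C_in, K, K)."""
    partials = []
    for c in range(ifmap_patch.shape[0]):              # input-channel direction
        a, w = select_nonzero_pairs(ifmap_patch[c].ravel(), kernel[c].ravel())
        partials.append(mac_array(a, w))               # zero operands are skipped
    return adder_tree(partials)

# Quick check against a dense convolution of the same sparse patch
rng = np.random.default_rng(0)
patch = rng.integers(-2, 3, size=(4, 3, 3)) * (rng.random((4, 3, 3)) > 0.6)
kern = rng.integers(-2, 3, size=(4, 3, 3)) * (rng.random((4, 3, 3)) > 0.6)
assert sparse_conv_point(patch, kern) == np.sum(patch * kern)

In the accelerator itself, this per-point computation would be replicated across the width, height and output-channel dimensions to realize the coarse-grained parallelism mentioned above; the sketch only models the functional behavior of a single output point.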

Key words: Convolutional Neural Network (CNN), sparsity, Field Programmable Gate Array (FPGA), parallel accelerator, Digital Signal Processor (DSP), adder tree


