
Computer Engineering ›› 2023, Vol. 49 ›› Issue (12): 1-9. doi: 10.19678/j.issn.1000-3428.0066701

• Frontiers in Computer Systems •

FPGA Accelerator Design for Hybrid Precision Frequency Domain Convolutional Neural Network

Yi CHEN, Bosheng LIU, Yongqi XU, Jigang WU*   

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China
  • Received:2023-01-06 Online:2023-12-15 Published:2023-12-12
  • Contact: Jigang WU

  • About the authors:

    Yi CHEN (born 1998), male, Ph.D. candidate; his main research interests include deep learning and its hardware acceleration

    Bosheng LIU, lecturer, Ph.D.

    Yongqi XU, undergraduate student

  • Supported by:
    National Natural Science Foundation of China (62072118)

Abstract:

Deep Convolutional Neural Networks (CNNs) have large models and high computational complexity, which makes them difficult to deploy on Field Programmable Gate Arrays (FPGAs) with limited hardware resources. Hybrid precision CNNs can trade off model size against accuracy, offering an effective way to reduce the model's memory footprint. The Fast Fourier Transform (FFT) is a fast algorithm that converts traditional spatial domain CNNs into the frequency domain, effectively reducing the computational complexity of the model. This study presents an FPGA-based accelerator design for 8-bit and 16-bit hybrid precision frequency domain CNNs. The accelerator supports the dynamic configuration of 8-bit and 16-bit frequency domain convolutions and can pack 8-bit frequency domain multiplication operations to reuse DSPs for higher performance. A DSP-based Frequency-domain Processing Element (FPE) is designed to support 8-bit and 16-bit frequency domain convolution operations; it packs a pair of 8-bit frequency domain multiplications onto one DSP to boost throughput. In addition, a mapping dataflow is proposed that supports both the 8-bit and 16-bit computation patterns and minimizes redundant data processing and data movement through data reuse. The proposed accelerator is evaluated on the ResNet-18 and VGG16 models using the ImageNet dataset. The experimental results show that the accelerator achieves energy efficiency ratios (the ratio of GOP to energy consumption) of 29.74 and 56.73 on ResNet-18 and VGG16, respectively, which is 1.2 to 6.0 times higher than those of existing frequency domain FPGA accelerators.

Key words: Convolutional Neural Network (CNN), hardware accelerator, frequency domain, hybrid precision, Field Programmable Gate Array (FPGA)
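
The two ideas the abstract relies on, frequency domain convolution via the FFT and the packing of two 8-bit multiplications onto a single DSP multiplier, can be made concrete with a short sketch. The code below is only an illustrative sketch of these general techniques and not the paper's implementation: fft_conv2d demonstrates the convolution theorem that underlies frequency domain CNNs, and packed_mul shows a commonly used bit-field packing trick (the 18-bit offset and all function names here are assumptions) by which one wide multiplication yields two 8-bit products that share an operand.

import numpy as np

def fft_conv2d(image, kernel):
    # Frequency domain convolution: pointwise multiplication of the 2-D FFTs,
    # followed by an inverse FFT (convolution theorem).
    fh = image.shape[0] + kernel.shape[0] - 1
    fw = image.shape[1] + kernel.shape[1] - 1
    F_img = np.fft.fft2(image, s=(fh, fw))       # zero-padded to the linear-convolution size
    F_ker = np.fft.fft2(kernel, s=(fh, fw))
    return np.real(np.fft.ifft2(F_img * F_ker))

def packed_mul(a, b, w, shift=18):
    # Generic DSP-packing trick (illustrative, not necessarily the paper's scheme):
    # compute a*w and b*w with one wide multiplication by placing a and b in
    # disjoint bit fields. a, b, w are signed 8-bit integers; shift=18 keeps the
    # two product fields apart.
    wide = ((a << shift) + b) * w                # the single wide multiplication
    low = wide & ((1 << shift) - 1)              # lower field holds b*w in two's complement
    bw = low - (1 << shift) if low >= (1 << (shift - 1)) else low
    aw = (wide >> shift) + (1 if bw < 0 else 0)  # correct the borrow when b*w is negative
    return aw, bw

if __name__ == "__main__":
    a, b, w = 57, -98, -113                      # arbitrary signed 8-bit test values
    assert packed_mul(a, b, w) == (a * w, b * w)
    out = fft_conv2d(np.ones((4, 4)), np.ones((3, 3)))
    print(out.round(2))                          # 6x6 linear convolution of two all-ones blocks

In the accelerator described by the abstract, such packing would be applied to complex-valued frequency domain operands inside DSP slices rather than in software; the sketch is only meant to illustrate the two underlying arithmetic ideas.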
