
Computer Engineering ›› 2021, Vol. 47 ›› Issue (7): 189-195,204. doi: 10.19678/j.issn.1000-3428.0058640

• Computer Architecture and Software Technology •

FPGA-based Accelerator for Sparse Convolutional Neural Network

DI Xinkai1,2, YANG Haigang1,2   

  1. Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received:2020-06-15 Revised:2020-08-08 Published:2020-08-17

  • About the authors: DI Xinkai (1992-), male, Ph.D. candidate; his research interests include high-performance computing and computer architecture. YANG Haigang, Ph.D., professor and doctoral supervisor.
  • Funding:
    National Natural Science Foundation of China (61876172); Major Research Program of the Beijing Municipal Science and Technology Commission (Z171100000117019).

Abstract: To eliminate the invalid operations caused by the sparsity of model parameters in the forward pass of a Convolutional Neural Network (CNN), a dataflow and a parallel accelerator for sparse neural network models are designed on a Field Programmable Gate Array (FPGA). A dedicated logic module picks out, along the input-channel direction, the non-zero elements of the feature map matrices and the convolution filter matrices, and passes only this valid data to an array of Digital Signal Processor (DSP) blocks for multiply-accumulate operations. All related intermediate results are then reduced by an adder tree to produce each final output feature map point, while coarse-grained parallelism is exploited along the width, height and output-channel dimensions of the feature maps and the optimal design parameters are searched for. Experiments on Xilinx FPGAs show that the design achieves 678.2 GOPS on the sparse convolution layers of VGG-16 with an energy efficiency of 69.45 GOPS/W, a considerable improvement in both performance and energy efficiency over existing FPGA-based accelerators for dense and sparse networks.
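To make the described dataflow concrete, the following is a minimal behavioral sketch (not the authors' RTL or exact hardware design): along each input channel, only positions where both the activation and the weight are non-zero are forwarded to a multiply-accumulate stage, and the per-channel partial sums are then reduced by an adder tree to form one output feature map point. All function names (select_nonzero_pairs, mac_array, adder_tree, sparse_conv_point) are illustrative assumptions, not identifiers from the paper.

# Behavioral sketch of the sparse convolution dataflow described in the abstract.
import numpy as np

def select_nonzero_pairs(acts, wts):
    """Keep only (activation, weight) pairs where both operands are non-zero."""
    mask = (acts != 0) & (wts != 0)
    return acts[mask], wts[mask]

def mac_array(acts, wts):
    """One DSP lane: multiply the valid pairs and accumulate the products."""
    return np.dot(acts, wts)

def adder_tree(partials):
    """Pairwise reduction of per-channel partial sums (log-depth in hardware)."""
    vals = list(partials)
    while len(vals) > 1:
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)] + \
               ([vals[-1]] if len(vals) % 2 else [])
    return vals[0] if vals else 0

def sparse_conv_point(ifmap_patch, kernel):
    """Compute one output pixel; ifmap_patch and kernel have shape (C_in, K, K)."""
    partials = []
    for c in range(ifmap_patch.shape[0]):              # input-channel direction
        a, w = select_nonzero_pairs(ifmap_patch[c].ravel(), kernel[c].ravel())
        partials.append(mac_array(a, w))               # zero operands are skipped
    return adder_tree(partials)

# Quick check against a dense convolution of the same sparse patch
rng = np.random.default_rng(0)
patch = rng.integers(-2, 3, size=(4, 3, 3)) * (rng.random((4, 3, 3)) > 0.6)
kern = rng.integers(-2, 3, size=(4, 3, 3)) * (rng.random((4, 3, 3)) > 0.6)
assert sparse_conv_point(patch, kern) == np.sum(patch * kern)

In the accelerator itself, this per-point computation would be replicated across the width, height and output-channel dimensions to realize the coarse-grained parallelism mentioned above; the sketch only models the functional behavior of a single output point.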

Key words: Convolutional Neural Network (CNN), sparsity, Field Programmable Gate Array (FPGA), parallel accelerator, Digital Signal Processor (DSP), adder tree


