Computer Engineering

• Architecture and Software Technology •

FPGA-based Accelerator for Convolutional Neural Network

YU Zijian 1, MA De 2, YAN Xiaolang 1, SHEN Juncheng 1

  1. College of Electrical Engineering, Zhejiang University, Hangzhou 310027, China; 2. College of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China
  • Received: 2016-01-25  Publication date: 2017-01-15  Released online: 2017-01-13
  • About the authors: YU Zijian (1990–), male, master's student, research interest: system on chip (SoC); MA De, lecturer, Ph.D.; YAN Xiaolang, professor, Ph.D.; SHEN Juncheng, Ph.D. candidate.
  • Funding:
    National High Technology Research and Development Program of China ("863" Program) project "Design, Development and Manufacture of CMC Series Chips" (2012AA041701).

Abstract: Existing software implementations struggle to meet the computing-performance and power-consumption requirements of Convolutional Neural Networks (CNNs). To address this, this paper designs a Field Programmable Gate Array (FPGA)-based CNN accelerator. The convolution computation units are parallelized at the coarse-grained level, and the computation of a complete single layer is fully pipelined, so that 20 multiply-accumulate (MAC) operations are completed in a single clock cycle, which improves computational efficiency. Experimental results on MNIST handwritten digit recognition show that, at an operating frequency of 75 MHz, the accelerator reaches a peak performance of 0.676 GMAC/s on the FPGA and runs 4 times faster than a general-purpose CPU platform, while consuming only 2.68% of the CPU's power.

Key words: Convolutional Neural Network (CNN), Field Programmable Gate Array (FPGA), accelerator, pipeline, parallelization
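The reported speed-up and power figures together imply an energy ratio. The following is a minimal worked calculation, assuming that both figures refer to the same MNIST workload, that "4 times faster" means the CPU run takes four times as long as the FPGA run, that 2.68% is the ratio of average power over the run, and that energy is E = P·t:

```latex
\frac{E_{\mathrm{FPGA}}}{E_{\mathrm{CPU}}}
  = \frac{P_{\mathrm{FPGA}}}{P_{\mathrm{CPU}}}\cdot\frac{t_{\mathrm{FPGA}}}{t_{\mathrm{CPU}}}
  = 0.0268 \times \frac{1}{4} = 0.0067
\quad\Longrightarrow\quad
\frac{E_{\mathrm{CPU}}}{E_{\mathrm{FPGA}}} \approx 149
```

Under these assumptions, the accelerator would use roughly 149 times less energy for the same workload. This is an estimate derived from the two reported ratios, not a measurement.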
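The abstract describes coarse-grained parallelism across convolution units, combined with a fully pipelined datapath that completes 20 MACs per clock cycle. The Python sketch below is a software model for intuition only and does not reproduce the authors' hardware design. It assumes a single input channel, square inputs and kernels, and a hypothetical scheme in which each of `num_units` convolution units computes one output feature map. All helper names (`conv2d_valid`, `conv_layer_grouped`, `conv_layer_macs`, `min_cycles`) are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, w: Matrix) -> Matrix:
    """Single-channel 'valid' 2-D convolution (cross-correlation, no kernel flip,
    as is common in CNN code). Each output pixel costs k*k multiply-accumulates."""
    k = len(w)
    rows, cols = len(x) - k + 1, len(x[0]) - k + 1
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for u in range(k):  # MAC loop over the k x k window
                for v in range(k):
                    acc += x[i + u][j + v] * w[u][v]
            out[i][j] = acc
    return out


def conv_layer_grouped(x: Matrix, kernels: List[Matrix],
                       num_units: int) -> Tuple[List[Matrix], int]:
    """Hypothetical coarse-grained schedule: output feature maps are handed to
    `num_units` convolution units in groups. Hardware would run one group
    concurrently; this model evaluates it in a loop and counts the number of
    sequential groups ('rounds')."""
    outputs: List[Matrix] = []
    rounds = 0
    for start in range(0, len(kernels), num_units):
        for w in kernels[start:start + num_units]:
            outputs.append(conv2d_valid(x, w))
        rounds += 1
    return outputs, rounds


def conv_layer_macs(in_size: int, k: int, num_maps: int) -> int:
    """MAC count of a single-input-channel layer with a square input and kernels."""
    out = in_size - k + 1
    return out * out * k * k * num_maps


def min_cycles(total_macs: int, macs_per_cycle: int) -> int:
    """Lower bound on cycles for a fully pipelined datapath that completes
    `macs_per_cycle` MACs every cycle (ceiling division)."""
    return -(-total_macs // macs_per_cycle)


if __name__ == "__main__":
    # Hypothetical LeNet-style first layer: 28x28 MNIST image, six 5x5 kernels.
    macs = conv_layer_macs(28, 5, 6)
    cycles = min_cycles(macs, 20)  # 20 MACs per cycle, as stated in the abstract
    print(macs, cycles, cycles / 75e6)  # seconds at 75 MHz, ideal utilization
```

For this hypothetical layer, `conv_layer_macs(28, 5, 6)` gives 24 × 24 × 25 × 6 = 86,400 MACs. A datapath sustaining 20 MACs per cycle therefore needs at least 4,320 cycles, about 57.6 µs at 75 MHz. This is an idealized lower bound: pipeline fill, memory stalls, and data movement are not modeled.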
