Design of SoC System for Convolution Acceleration Based on RISC-V Processor

doi:10.19678/j.issn.1000-3428.0057835

Computer Engineering, 2021, Vol. 47, Issue (4): 153-157.

ZHANG Kunning¹, ZHAO Shuo¹, HE Hu¹, DENG Ning¹, YANG Xu²

1. Institute of Microelectronics, Tsinghua University, Beijing 100084, China;
2. School of Software, Beijing Institute of Technology, Beijing 100081, China

Received: 2020-03-23 Revised: 2020-05-13 Published: 2020-04-24
About the authors: ZHANG Kunning (born 1995), female, M.S., whose main research interest is the design and optimization of convolution accelerators; ZHAO Shuo, M.S.; HE Hu (corresponding author), associate professor, Ph.D.; DENG Ning, professor, Ph.D.; YANG Xu, associate professor, Ph.D.
Funding:
National Natural Science Foundation of China (91846303).

Abstract

Abstract: To improve the computational and energy efficiency of Convolutional Neural Networks (CNNs), this paper designs a convolution accelerator that takes 8-bit fixed-point data as input and supports the computation types common in CNNs, including activation, Batch Normalization (BN), and pooling. The loop computation order is optimized and combined with data reuse to raise the efficiency of convolution computation. Following a software/hardware co-design approach, an SoC system containing a RISC-V processor and the convolution accelerator is built. Because the RISC-V processor is based on an open-source instruction set standard, its instructions can be extended to meet specific design requirements. The SoC system is deployed on a Xilinx ZCU102 development board, with the RISC-V processor running at 100 MHz and the convolution accelerator at 300 MHz. Test results show that the accelerator reaches a computing throughput of 153.6 GOP/s and delivers a good speedup when running image inference on the VGG16 network.
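As a rough sanity check on the figures quoted in the abstract, the reported throughput and the accelerator clock together determine how many operations the accelerator must complete per cycle. The sketch below works that out. The convention that one multiply-accumulate (MAC) counts as two operations is an assumption; it is common in accelerator papers but is not stated in the abstract, so the MAC count is only indicative.

```python
# Figures quoted in the abstract.
PEAK_OPS_PER_SECOND = 153_600_000_000   # 153.6 GOP/s
ACCELERATOR_CLOCK_HZ = 300_000_000      # 300 MHz

# Assumption (not stated in the abstract): one MAC = 1 multiply + 1 add = 2 ops.
OPS_PER_MAC = 2


def ops_per_cycle(ops_per_second: int, clock_hz: int) -> int:
    """Operations the accelerator must finish in each clock cycle."""
    quotient, remainder = divmod(ops_per_second, clock_hz)
    if remainder:
        raise ValueError("throughput is not a whole number of ops per cycle")
    return quotient


OPS_PER_CYCLE = ops_per_cycle(PEAK_OPS_PER_SECOND, ACCELERATOR_CLOCK_HZ)
MACS_PER_CYCLE = OPS_PER_CYCLE // OPS_PER_MAC

if __name__ == "__main__":
    print(f"{OPS_PER_CYCLE} ops/cycle")
    print(f"{MACS_PER_CYCLE} MACs/cycle under the 2-ops-per-MAC assumption")
```

Under that assumption the reported throughput corresponds to 256 MACs per cycle; the abstract does not say how those multipliers are organized.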

Key words: convolution acceleration, loop computation optimization, data reuse, RISC-V processor, SoC system, co-design of software and hardware
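The loop reordering and data reuse named in the abstract and keywords can be illustrated with a small pure-Python model. This is a sketch under stated assumptions, not the paper's design: the abstract does not give the actual loop order, so `conv2d_weight_major` shows one generic reuse-oriented ordering in which each weight is fetched once and reused over all output pixels, and `bn_relu_maxpool` assumes batch normalization folded into a per-channel integer scale, bias, and right shift. All function names are hypothetical. The inputs stand in for 8-bit fixed-point values accumulated in wider integers.

```python
# Sketch only. The loop order, the folded batch-normalization form and all
# function names are illustrative assumptions, not taken from the paper.


def _output_shape(x, w):
    """Return (c_in, c_out, k, out_h, out_w) for a stride-1, unpadded convolution."""
    c_in, h, width = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    return c_in, c_out, k, h - k + 1, width - k + 1


def conv2d_pixel_major(x, w):
    """Baseline order: the full kernel is re-read for every output pixel."""
    c_in, c_out, k, out_h, out_w = _output_shape(x, w)
    y = [[[0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for oc in range(c_out):
        for i in range(out_h):
            for j in range(out_w):
                for ic in range(c_in):
                    for ki in range(k):
                        for kj in range(k):
                            y[oc][i][j] += w[oc][ic][ki][kj] * x[ic][i + ki][j + kj]
    return y


def conv2d_weight_major(x, w):
    """Reordered loops: each weight is fetched once and reused across all output pixels."""
    c_in, c_out, k, out_h, out_w = _output_shape(x, w)
    y = [[[0] * out_w for _ in range(out_h)] for _ in range(c_out)]
    for oc in range(c_out):
        for ic in range(c_in):
            for ki in range(k):
                for kj in range(k):
                    weight = w[oc][ic][ki][kj]  # single fetch, then reused below
                    for i in range(out_h):
                        for j in range(out_w):
                            y[oc][i][j] += weight * x[ic][i + ki][j + kj]
    return y


def bn_relu_maxpool(y, scale, bias, shift):
    """Per-channel integer scale/bias (folded BN), ReLU, then 2x2 max pooling with stride 2."""
    pooled = []
    for oc, plane in enumerate(y):
        act = [[max(0, (v * scale[oc] + bias[oc]) >> shift) for v in row] for row in plane]
        pooled.append([
            [max(act[i][j], act[i][j + 1], act[i + 1][j], act[i + 1][j + 1])
             for j in range(0, len(act[0]) - 1, 2)]
            for i in range(0, len(act) - 1, 2)
        ])
    return pooled
```

Both orderings produce identical results; they differ only in how often each operand is fetched, which is the property that data reuse exploits in hardware.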

CLC number: TP332

ZHANG Kunning, ZHAO Shuo, HE Hu, DENG Ning, YANG Xu. Design of SoC System for Convolution Acceleration Based on RISC-V Processor[J]. Computer Engineering, 2021, 47(4): 153-157.

https://www.ecice06.com/CN/Y2021/V47/I4/153

Figures/Tables: 7
