Special Instruction Set Processor for Convolutional Neural Network Based on RISC-V

doi:10.19678/j.issn.1000-3428.0058371

Computer Engineering, 2021, Vol. 47, Issue (7): 196-204.

LIAO Hansong^1,2, WU Zhaohui^1,2, LI Bin^1,2

1. School of Microelectronics, South China University of Technology, Guangzhou 510641, China;
2. Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), Guangzhou 510330, China

Received: 2020-05-19 Revised: 2020-07-11 Published: 2020-06-30
About the authors: LIAO Hansong (born 1995), male, master's student, whose main research interest is digital integrated circuit design; WU Zhaohui, associate professor; LI Bin, professor and doctoral supervisor.
Funding:
Key-Area Research and Development Program of Guangdong Province (2018B010142001).

Abstract

Abstract: Commercial x86 and ARM CPU architectures are encumbered by patents and licensing, which raises customization cost and limits flexibility. To address this problem, this paper proposes a special instruction set processor for Convolutional Neural Networks (CNNs), built on the open-source RISC-V instruction set and aimed at the Internet of Things (IoT). The processor uses custom extension instructions to invoke an accelerator that speeds up the convolution and pooling operations of lightweight CNNs, improving the energy efficiency of terminal devices. The per-layer information of the CNN is configured to make the accelerator compute in groups, so that it adapts to input data of different sizes. The accelerator's data path is also adjustable, so that the time-consuming operations can run separately or in combination, which adapts the processor to different lightweight networks. Verification on an FPGA platform shows that, at an operating frequency of 100 MHz, the processor infers SqueezeNet in about 40.89 ms with a power consumption of 1.966 W, which is faster than single-core computation on a mobile phone processor. Compared with the AMD Ryzen7 3700X, NVIDIA RTX2070 Super, and Qualcomm Snapdragon 835 platforms, it consumes fewer resources and less power, and it also has an advantage in performance-to-power ratio.
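The latency and power figures quoted in the abstract can be combined into a few derived quantities. The sketch below is only an illustration: it assumes that 1.966 W is the average power sustained over one single-image SqueezeNet inference, and it uses inferences per joule as a stand-in for the performance-to-power ratio, which may differ from the metric used in the paper. All function and constant names are hypothetical.

```python
# Figures quoted in the abstract (FPGA prototype running SqueezeNet).
LATENCY_S = 40.89e-3  # inference time in seconds
POWER_W = 1.966       # power consumption in watts
CLOCK_HZ = 100e6      # operating frequency in hertz


def energy_per_inference_j(latency_s: float, power_w: float) -> float:
    """Energy in joules, assuming power_w is the average power over the inference."""
    return latency_s * power_w


def inferences_per_second(latency_s: float) -> float:
    """Throughput, assuming one image at a time with no overlap between inferences."""
    return 1.0 / latency_s


def inferences_per_joule(latency_s: float, power_w: float) -> float:
    """Throughput per watt (assumed metric); equal to 1 / energy per inference."""
    return inferences_per_second(latency_s) / power_w


def cycles_per_inference(latency_s: float, clock_hz: float) -> float:
    """Clock cycles elapsed during one inference at the given frequency."""
    return latency_s * clock_hz


if __name__ == "__main__":
    print(f"energy per inference: {energy_per_inference_j(LATENCY_S, POWER_W) * 1e3:.1f} mJ")
    print(f"throughput: {inferences_per_second(LATENCY_S):.2f} inferences/s")
    print(f"efficiency: {inferences_per_joule(LATENCY_S, POWER_W):.2f} inferences/J")
    print(f"cycles per inference: {cycles_per_inference(LATENCY_S, CLOCK_HZ):.3e}")
```

Under these assumptions, one inference costs roughly 80 mJ and about 4.1 million clock cycles, which corresponds to about 24 inferences per second and about 12 inferences per joule.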

Keywords: RISC-V instruction set, Convolutional Neural Network (CNN), Domain Specific Architecture (DSA), special instruction set processor, hardware acceleration

CLC number: TP332

LIAO Hansong, WU Zhaohui, LI Bin. Special Instruction Set Processor for Convolutional Neural Network Based on RISC-V[J]. Computer Engineering, 2021, 47(7): 196-204.

https://www.ecice06.com/CN/Y2021/V47/I7/196

Figures/Tables: 14

