
Computer Engineering ›› 2021, Vol. 47 ›› Issue (7): 196-204. doi: 10.19678/j.issn.1000-3428.0058371

• Architecture and Software Technology •

Special Instruction Set Processor for Convolutional Neural Network Based on RISC-V

LIAO Hansong1,2, WU Zhaohui1,2, LI Bin1,2

  1. School of Microelectronics, South China University of Technology, Guangzhou 510641, China;
  2. Guangdong Artificial Intelligence and Digital Economy Laboratory (Guangzhou), Guangzhou 510330, China
  • Received: 2020-05-19 Revised: 2020-07-11 Published: 2020-06-30
  • About the authors: LIAO Hansong (born 1995), male, master's student, main research interest: digital integrated circuit design; WU Zhaohui, associate professor; LI Bin, professor and doctoral supervisor.
  • Funding:
    Key-Area Research and Development Program of Guangdong Province (2018B010142001).

Abstract: Commercial x86 and ARM CPUs are constrained by patents and licensing, which makes customization costly and limits flexibility. To address this problem, this paper proposes a special instruction set processor for Convolutional Neural Networks (CNNs), built on the open-source RISC-V instruction set and targeting the Internet of Things (IoT). The processor uses custom extended instructions to invoke an accelerator that speeds up the convolution and pooling operations of lightweight CNNs, improving the energy efficiency of terminal devices. The per-layer information of the CNN is configured to make the accelerator compute in groups, so that it adapts to input data of different sizes. The data path of the accelerator is also adjustable, so that the time-consuming operations can be executed separately or in combination to suit different lightweight networks. Verification on an FPGA platform shows that, at an operating frequency of 100 MHz, the processor infers SqueezeNet in about 40.89 ms with a power consumption of 1.966 W, which is faster than a single core of a mobile phone processor. Compared with the AMD Ryzen7 3700X, NVIDIA RTX2070 Super, and Qualcomm Snapdragon 835 platforms, it consumes fewer resources and less power, and it also has an advantage in performance-to-power ratio.
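The latency and power figures quoted in the abstract can be combined into a few derived efficiency numbers. The sketch below is only back-of-the-envelope arithmetic on those three reported values; it assumes that power stays constant during an inference and that inferences run back to back. The function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (SqueezeNet inference on the FPGA prototype).
LATENCY_S = 40.89e-3   # inference time: about 40.89 ms
POWER_W = 1.966        # reported power consumption in watts
CLOCK_HZ = 100e6       # operating frequency: 100 MHz


def energy_per_inference_j(latency_s: float, power_w: float) -> float:
    """Energy for one inference, assuming constant power over the run."""
    return latency_s * power_w


def inferences_per_second_per_watt(latency_s: float, power_w: float) -> float:
    """Throughput per watt, assuming back-to-back inferences with no idle time."""
    return (1.0 / latency_s) / power_w


def clock_cycles(latency_s: float, clock_hz: float) -> float:
    """Number of clock cycles that elapse during one inference."""
    return latency_s * clock_hz


if __name__ == "__main__":
    energy_mj = energy_per_inference_j(LATENCY_S, POWER_W) * 1e3
    print(f"energy per inference: {energy_mj:.2f} mJ")          # 80.39 mJ
    print(f"inferences/s/W: "
          f"{inferences_per_second_per_watt(LATENCY_S, POWER_W):.2f}")  # 12.44
    print(f"cycles per inference: "
          f"{clock_cycles(LATENCY_S, CLOCK_HZ):.3e}")            # 4.089e+06
```

Under these assumptions, one SqueezeNet inference costs roughly 80 mJ and about 4.1 million clock cycles. The abstract gives no figures for the comparison platforms, so they are not included here.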

Keywords: RISC-V instruction set, Convolutional Neural Network (CNN), Domain Specific Architecture (DSA), special instruction set processor, hardware acceleration
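The abstract says the accelerator is invoked through custom extended instructions but does not give their encoding. As a rough illustration of what such an instruction could look like, the sketch below packs an R-type word into the `custom-0` major opcode that the RISC-V base specification sets aside for non-standard extensions. The choice of `custom-0`, the `funct7`/`funct3` values, and the idea of a "start convolution" instruction are all assumptions made for this example and are not the paper's actual instruction format.

```python
CUSTOM0_OPCODE = 0b0001011  # "custom-0" major opcode from the RISC-V opcode map


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = CUSTOM0_OPCODE) -> int:
    """Pack an R-type instruction: funct7 | rs2 | rs1 | funct3 | rd | opcode."""
    fields = ((funct7, 7), (rs2, 5), (rs1, 5), (funct3, 3), (rd, 5), (opcode, 7))
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value  # append the field below the previous ones
    return word


def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-type word back into its named fields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x07,
        "rd": (word >> 7) & 0x1F,
        "opcode": word & 0x7F,
    }


if __name__ == "__main__":
    # Hypothetical "start convolution" instruction: funct7 = 1 selects the
    # operation, x10 and x11 hold operands, and the status goes to x12.
    word = encode_r_type(funct7=1, rs2=11, rs1=10, funct3=0, rd=12)
    print(f"0x{word:08X}")  # 0x02B5060B
    print(decode_r_type(word))
```

Using a reserved custom opcode keeps such an extension from colliding with standard instructions. How the real processor assigns opcodes and function fields is something only the full paper can answer.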

CLC Number: