
Computer Engineering ›› 2021, Vol. 47 ›› Issue (9): 185-190, 196. doi: 10.19678/j.issn.1000-3428.0059351

• Architecture and Software Technology •

Design and Implementation of Accelerator for Lightweight Neural Network

HUANG Rui1, JIN Guanghao1, LI Lei1, JIANG Wenchao2, SONG Qingzeng1

  1. School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China;
    2. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-08-24  Revised: 2020-10-19  Published: 2020-11-02
  • About the authors: HUANG Rui (born in 1995), male, master's degree candidate, whose main research interest is deep learning; JIN Guanghao, lecturer, Ph.D.; LI Lei, master's degree candidate; JIANG Wenchao, lecturer, Ph.D.; SONG Qingzeng (corresponding author), associate professor, Ph.D.
  • Supported by:
    Natural Science Foundation of Guangdong Province (2018A030313061); Science and Technology Program of Guangdong Province (2017B010124001, 201902020016, 2019B010139001).

Abstract: This paper designs a network accelerator on the Field Programmable Gate Array (FPGA) platform for lightweight convolutional networks represented by MobileNet. By optimizing the DW (depthwise) and PW (pointwise) lightweight modules and implementing commonly used functional modules such as convolution and ReLU, the accelerator meets the low-power and low-latency requirements of neural network inference. In addition, an instruction-based design allows the accelerator to support MobileNet and its various variants. Target detection experiments are performed by configuring YoloV3 tiny (without lightweight modules) instructions and YoloV3&MobileNet (with lightweight modules) instructions from the host computer. Experimental results show that the accelerator achieves fast inference, reaching 85 frame/s for the YoloV3 tiny structure and 62 frame/s for the YoloV3&MobileNet structure.
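The DW and PW modules named in the abstract are the two stages of a depthwise separable convolution, which is what makes MobileNet-style networks lightweight: a per-channel spatial filter (DW) followed by a 1x1 cross-channel filter (PW), typically with ReLU applied afterwards. The sketch below is only a generic NumPy software reference of that decomposition under assumed shapes (3x3 DW kernels, stride 1, no padding); it is not the paper's FPGA implementation, in which the convolution and ReLU modules are realized as hardware units.

# Generic sketch of the depthwise-separable (DW + PW) convolution used by
# MobileNet-style lightweight modules. Illustrative software reference only;
# array shapes and the 3x3 kernel size are assumptions, not the paper's design.
import numpy as np

def depthwise_conv(x, dw_kernels):
    """DW stage: one KxK spatial filter per input channel (no cross-channel sum).

    x          : (C, H, W) input feature map
    dw_kernels : (C, K, K) one filter per channel
    returns    : (C, H-K+1, W-K+1) feature map (stride 1, no padding)
    """
    c, h, w = x.shape
    _, k, _ = dw_kernels.shape
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):                      # each channel is filtered independently
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])
    return out

def pointwise_conv(x, pw_kernels):
    """PW stage: 1x1 convolution that mixes channels.

    x          : (C_in, H, W) feature map from the DW stage
    pw_kernels : (C_out, C_in) 1x1 filter weights
    returns    : (C_out, H, W) feature map
    """
    # A 1x1 convolution is just a matrix multiply over the channel dimension.
    c_in, h, w = x.shape
    return (pw_kernels @ x.reshape(c_in, -1)).reshape(-1, h, w)

# Example: 16-channel 32x32 input, 3x3 DW filters, 32 output channels, ReLU.
x = np.random.rand(16, 32, 32)
y_dw = depthwise_conv(x, np.random.rand(16, 3, 3))
y = np.maximum(pointwise_conv(y_dw, np.random.rand(32, 16)), 0.0)
print(y.shape)  # (32, 30, 30)

Compared with a standard convolution producing the same output, this decomposition reduces the multiply-accumulate count by roughly a factor of 1/C_out + 1/K^2 (about 8-9x for 3x3 kernels), which is why DW/PW modules fit the low-power, low-latency budget targeted by such accelerators.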

Key words: hardware acceleration, model compression, lightweight neural network, Field Programmable Gate Array (FPGA), parallel computing

CLC Number: