
Computer Engineering ›› 2021, Vol. 47 ›› Issue (9): 185-190, 196. doi: 10.19678/j.issn.1000-3428.0059351

• Architecture and Software Technology •

Design and Implementation of Accelerator for Lightweight Neural Network

HUANG Rui1, JIN Guanghao1, LI Lei1, JIANG Wenchao2, SONG Qingzeng1

  1. School of Computer Science and Technology, Tianjin Polytechnic University, Tianjin 300387, China;
    2. Faculty of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2020-08-24  Revised: 2020-10-19  Published: 2020-11-02
  • About the authors: HUANG Rui (born in 1995), male, master's degree candidate, whose main research interest is deep learning; JIN Guanghao, lecturer, Ph.D.; LI Lei, master's degree candidate; JIANG Wenchao, lecturer, Ph.D.; SONG Qingzeng (corresponding author), associate professor, Ph.D.
  • Supported by:
    Natural Science Foundation of Guangdong Province (2018A030313061); Science and Technology Program of Guangdong Province (2017B010124001, 201902020016, 2019B010139001).

Abstract: This paper designs a network accelerator on the Field Programmable Gate Array (FPGA) platform for lightweight convolutional networks represented by MobileNet. By optimizing the DW (depthwise) and PW (pointwise) lightweight modules and implementing commonly used functional modules such as convolution and ReLU, the accelerator meets the low-power and low-latency requirements of neural network inference. In addition, an instruction-based design allows the accelerator to support MobileNet and its various variants. Target detection experiments are performed by configuring YoloV3 tiny (without lightweight modules) instructions and YoloV3&MobileNet (with lightweight modules) instructions from the host computer. Experimental results show that the accelerator achieves fast inference, reaching 85 frame/s for the YoloV3 tiny structure and 62 frame/s for the YoloV3&MobileNet structure.
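The DW and PW modules named in the abstract are the two stages of a depthwise separable convolution, which is what makes MobileNet-style networks lightweight: a per-channel spatial filter (DW) followed by a 1x1 cross-channel filter (PW), typically with ReLU applied afterwards. The sketch below is only a generic NumPy software reference of that decomposition under assumed shapes (3x3 DW kernels, stride 1, no padding); it is not the paper's FPGA implementation, in which the convolution and ReLU modules are realized as hardware units.

# Generic sketch of the depthwise-separable (DW + PW) convolution used by
# MobileNet-style lightweight modules. Illustrative software reference only;
# array shapes and the 3x3 kernel size are assumptions, not the paper's design.
import numpy as np

def depthwise_conv(x, dw_kernels):
    """DW stage: one KxK spatial filter per input channel (no cross-channel sum).

    x          : (C, H, W) input feature map
    dw_kernels : (C, K, K) one filter per channel
    returns    : (C, H-K+1, W-K+1) feature map (stride 1, no padding)
    """
    c, h, w = x.shape
    _, k, _ = dw_kernels.shape
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):                      # each channel is filtered independently
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                out[ch, i, j] = np.sum(x[ch, i:i + k, j:j + k] * dw_kernels[ch])
    return out

def pointwise_conv(x, pw_kernels):
    """PW stage: 1x1 convolution that mixes channels.

    x          : (C_in, H, W) feature map from the DW stage
    pw_kernels : (C_out, C_in) 1x1 filter weights
    returns    : (C_out, H, W) feature map
    """
    # A 1x1 convolution is just a matrix multiply over the channel dimension.
    c_in, h, w = x.shape
    return (pw_kernels @ x.reshape(c_in, -1)).reshape(-1, h, w)

# Example: 16-channel 32x32 input, 3x3 DW filters, 32 output channels, ReLU.
x = np.random.rand(16, 32, 32)
y_dw = depthwise_conv(x, np.random.rand(16, 3, 3))
y = np.maximum(pointwise_conv(y_dw, np.random.rand(32, 16)), 0.0)
print(y.shape)  # (32, 30, 30)

Compared with a standard convolution producing the same output, this decomposition reduces the multiply-accumulate count by roughly a factor of 1/C_out + 1/K^2 (about 8-9x for 3x3 kernels), which is why DW/PW modules fit the low-power, low-latency budget targeted by such accelerators.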

Key words: hardware acceleration, model compression, lightweight neural network, Field Programmable Gate Array (FPGA), parallel computing

CLC Number: