
计算机工程 ›› 2023, Vol. 49 ›› Issue (4): 138-148. doi: 10.19678/j.issn.1000-3428.0064142

• 先进计算与数据处理 •

OclDNN:一种可应用于TensorFlow的通用DNN库

陈锐1, 孙羽菲1,2, 郭强1, 隋轶丞1, 周振辉1, 石昌青1, 张玉志1,2   

  1. 南开大学 软件学院, 天津 300457;
    2. 先进计算与关键软件海河实验室, 天津 300459
  • 收稿日期:2022-03-09 修回日期:2022-04-14 发布日期:2022-07-25
  • About the authors: CHEN Rui (born 1991), male, Ph.D. candidate, whose main research interests are deep learning and heterogeneous computing; SUN Yufei, distinguished researcher; GUO Qiang, M.S. candidate; SUI Yicheng, Ph.D. candidate; ZHOU Zhenhui and SHI Changqing, M.S. candidates; ZHANG Yuzhi, professor.
  • Funding:
    This work was supported by the National Key Research and Development Program of China (2021YFB0300104).

OclDNN: A General-Purpose DNN Library for TensorFlow

CHEN Rui1, SUN Yufei1,2, GUO Qiang1, SUI Yicheng1, ZHOU Zhenhui1, SHI Changqing1, ZHANG Yuzhi1,2   

  1. College of Software, Nankai University, Tianjin 300457, China;
    2. Haihe Lab of ITAI, Tianjin 300459, China
  • Received:2022-03-09 Revised:2022-04-14 Published:2022-07-25

摘要: 深度学习模型的构建、训练以及推理离不开TensorFlow等机器学习框架中深度学习算子的支撑,对于卷积、池化等深度学习中被高频调用或计算量较大的算子,机器学习框架一般通过调用深度神经网络(DNN)库来提升计算效能。现有DNN库主要由英伟达、AMD等少数国外厂商开发并根据自有硬件设备特点进行优化,但其封闭性导致其他厂商生产的通用加速器难以在深度学习领域发挥作用。为解决现有DNN库无法支持国产加速器的问题,使得深度学习模型能够调用国产加速器进行运算,研究跨平台的通用DNN库,通过对开源MIOpen的结构特点和调用方式进行分析,提出修改和重构该库的方法,并实现一种基于OpenCL的DNN(OclDNN)库。考虑到TensorFlow较高的流行度及其对DNN库调用的特殊性与复杂性,研究通用DNN库在TensorFlow中的集成方法,通过StreamExecutor中的OpenCL平台实现对OclDNN的调用。实验结果表明,OclDNN在英伟达、华为等不同厂商的计算设备上运算结果正确可靠,在相同实验环境下,深度学习算子使用OclDNN时的加速性能比传统CPU并行算法提升了5~60倍。
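For illustration, the sketch below shows the basic pattern the abstract describes: a deep learning operator (here a ReLU) written against the standard OpenCL host API, so that the same host code can run on accelerators from different vendors. This is a minimal sketch assuming an installed OpenCL runtime; the kernel and all host-side names are hypothetical examples and are not the actual OclDNN interface.

// Minimal sketch, assuming an OpenCL runtime is installed. The relu kernel
// and this standalone program are illustrative only, not the OclDNN API.
#include <CL/cl.h>
#include <cstdio>
#include <vector>

static const char* kReluSrc = R"CLC(
__kernel void relu(__global float* x, const int n) {
    int i = get_global_id(0);
    if (i < n) x[i] = x[i] > 0.0f ? x[i] : 0.0f;
}
)CLC";

int main() {
    // Pick the first available platform and device; a real library would
    // enumerate devices and select one according to the caller's request.
    cl_platform_id platform;
    cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_DEFAULT, 1, &device, nullptr);

    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    // Compile the kernel from source at run time, which is what makes the
    // same binary portable across Nvidia, AMD, Huawei and other devices.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &kReluSrc, nullptr, &err);
    clBuildProgram(prog, 1, &device, "", nullptr, nullptr);
    cl_kernel kernel = clCreateKernel(prog, "relu", &err);

    std::vector<float> x = {-2.0f, -1.0f, 0.5f, 3.0f};
    const cl_int n = static_cast<cl_int>(x.size());
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                n * sizeof(float), x.data(), &err);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kernel, 1, sizeof(cl_int), &n);
    size_t global = static_cast<size_t>(n);
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr, 0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float), x.data(), 0, nullptr, nullptr);

    for (float v : x) std::printf("%.1f ", v);  // expected output: 0.0 0.0 0.5 3.0
    std::printf("\n");

    clReleaseMemObject(buf);
    clReleaseKernel(kernel);
    clReleaseProgram(prog);
    clReleaseCommandQueue(queue);
    clReleaseContext(ctx);
    return 0;
}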

关键词: 深度神经网络库, 深度学习, 开放计算语言, 硬件加速器, TensorFlow框架

Abstract: In machine learning frameworks such as TensorFlow, the construction, training, and inference of deep learning models rely on the support of deep learning operators. For operators that are frequently invoked or computationally heavy, such as convolution and pooling, machine learning frameworks generally improve computational efficiency by calling Deep Neural Network (DNN) libraries. Existing DNN libraries are mainly developed by a few foreign manufacturers such as Nvidia and AMD and are optimized for the characteristics of their own hardware; their closed nature makes it difficult for general-purpose accelerators from other manufacturers to play a role in the deep learning field. To solve the problem that existing DNN libraries cannot support domestic accelerators, and to enable deep learning models to perform computations on them, a cross-platform general-purpose DNN library was studied. By analyzing the structural characteristics and calling conventions of the open-source library MIOpen, a method for modifying and restructuring it was developed, and OclDNN, a DNN library based on OpenCL, was implemented. Considering the high popularity of TensorFlow and the particularity and complexity of its DNN library calls, the integration of a general-purpose DNN library into TensorFlow was also studied, and calls to OclDNN were implemented through the OpenCL platform in StreamExecutor. The experimental results indicate that OclDNN produces correct and reliable results on computing devices from Nvidia, Huawei, and other manufacturers, and that, in the same experimental environment, deep learning operators using OclDNN achieve 5 to 60 times the acceleration of traditional CPU parallel algorithms.
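As a rough conceptual sketch of the integration idea mentioned above (the framework routing operator calls to a pluggable DNN library selected per platform), the code below uses an invented DnnBackend interface and registry. It is not TensorFlow's StreamExecutor API and not the paper's actual integration code; it only illustrates the dispatch pattern.

// Conceptual sketch only: the backend interface, registry, and both
// implementations are invented for illustration.
#include <functional>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Minimal interface a DNN library would expose to the framework.
struct DnnBackend {
    virtual ~DnnBackend() = default;
    virtual void Relu(std::vector<float>& x) = 0;
};

// CPU reference implementation (the kind of baseline the paper compares against).
struct CpuBackend : DnnBackend {
    void Relu(std::vector<float>& x) override {
        for (float& v : x) v = v > 0.0f ? v : 0.0f;
    }
};

// Stand-in for an OpenCL-backed library such as OclDNN; it delegates to the
// CPU version here so the sketch stays self-contained and runnable.
struct OpenClBackend : DnnBackend {
    void Relu(std::vector<float>& x) override { CpuBackend{}.Relu(x); }
};

// Registry keyed by platform name, so the framework can choose a backend at
// run time ("host" for CPU, "opencl" for an accelerator).
std::map<std::string, std::function<std::unique_ptr<DnnBackend>()>>& Registry() {
    static std::map<std::string, std::function<std::unique_ptr<DnnBackend>()>> r;
    return r;
}

int main() {
    Registry()["host"]   = [] { return std::unique_ptr<DnnBackend>(new CpuBackend()); };
    Registry()["opencl"] = [] { return std::unique_ptr<DnnBackend>(new OpenClBackend()); };

    std::vector<float> x = {-1.5f, 2.0f, -0.3f};
    auto backend = Registry()["opencl"]();  // the framework picks the platform
    backend->Relu(x);
    for (float v : x) std::cout << v << ' ';  // prints: 0 2 0
    std::cout << '\n';
    return 0;
}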

Key words: Deep Neural Network(DNN) library, deep learning, Open Computing Language(OpenCL), hardware accelerator, TensorFlow framework
