
Computer Engineering ›› 2019, Vol. 45 ›› Issue (12): 153-159. doi: 10.19678/j.issn.1000-3428.0053855

• Artificial Intelligence and Recognition Technology •

A General Parallel Convolution Algorithm for Sunway TaihuLight

SHU Jiaming, AN Hong, WU Zheng, CHEN Junshi   

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230000, China
  • Received:2019-01-30 Revised:2019-04-04 Published:2019-05-22
  • About the authors:SHU Jiaming (1995-), male, M.S. candidate; his main research interests are deep learning, computer architecture, and high-performance computing. AN Hong, professor, Ph.D. WU Zheng and CHEN Junshi, Ph.D. candidates.
  • Supported by:National Key Research and Development Program of China (2016YFB1000403).

Abstract: The parallel convolution algorithm in the deep learning library of Sunway TaihuLight suffers from a batch-size limitation, and the traditional GEMM-based convolution algorithm is inefficient on its hardware architecture. To address these problems, a general parallel convolution algorithm without batch limitation is proposed for the Sunway heterogeneous many-core processor. Combining asynchronous DMA memory access with register communication between slave cores, the algorithm reduces the slave cores' memory access overhead through data reuse and software pipelining, and fully exploits their floating-point computing power through manual vectorization. Experimental results show that the proposed algorithm achieves better acceleration performance than the basic seven-loop algorithm, the GEMM algorithm, and the MKL-DNN algorithm on the Intel platform.
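For context, the "basic seven-loop algorithm" used as a baseline is the textbook direct convolution, with one loop per tensor dimension. Below is a minimal C sketch under assumed conventions (unit stride, no padding, NCHW input / KCRS filter layouts); all names and shapes are illustrative, not taken from the paper:

    /* Direct convolution: seven nested loops.
     * Input  in [N][C][H][W], filter flt[K][C][R][S],
     * output out[N][K][P][Q] with P = H-R+1, Q = W-S+1. */
    static void conv_direct(int N, int K, int C, int H, int W, int R, int S,
                            const float *in, const float *flt, float *out)
    {
        int P = H - R + 1, Q = W - S + 1;
        for (int n = 0; n < N; ++n)                 /* 1: batch           */
          for (int k = 0; k < K; ++k)               /* 2: output channels */
            for (int p = 0; p < P; ++p)             /* 3: output rows     */
              for (int q = 0; q < Q; ++q) {         /* 4: output columns  */
                float acc = 0.0f;
                for (int c = 0; c < C; ++c)         /* 5: input channels  */
                  for (int r = 0; r < R; ++r)       /* 6: filter rows     */
                    for (int s = 0; s < S; ++s)     /* 7: filter columns  */
                      acc += in [((n*C + c)*H + p + r)*W + (q + s)]
                           * flt[((k*C + c)*R + r)*S + s];
                out[((n*K + k)*P + p)*Q + q] = acc;
              }
    }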
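The software-pipelining scheme the abstract describes can be pictured as double buffering: while a slave core computes on one tile held in its local memory, the DMA engine fetches the next tile from main memory. The portable mock below uses memcpy in place of the asynchronous DMA get, so the transfers do not actually overlap with computation here; on the real hardware the Sunway athread DMA interface would be used instead. TILE, the buffer layout, and all function names are assumptions for illustration:

    #include <string.h>

    #define TILE 256                      /* elements per DMA tile (assumed) */

    /* Stand-in for the per-tile computation done on the slave core. */
    static void compute_tile(float *t) { for (int i = 0; i < TILE; ++i) t[i] *= 2.0f; }

    static void pipelined_pass(const float *src, float *dst, int ntiles)
    {
        if (ntiles <= 0) return;
        float buf[2][TILE];               /* two local buffers: double buffering */
        memcpy(buf[0], src, sizeof buf[0]);          /* prologue: fetch tile 0  */
        for (int i = 0; i < ntiles; ++i) {
            int cur = i & 1, nxt = cur ^ 1;
            if (i + 1 < ntiles)           /* issue fetch of tile i+1 (async DMA
                                             on real hardware, memcpy here)    */
                memcpy(buf[nxt], src + (i + 1) * TILE, sizeof buf[nxt]);
            compute_tile(buf[cur]);       /* overlaps with the fetch on real HW */
            memcpy(dst + i * TILE, buf[cur], sizeof buf[cur]);  /* write back   */
        }
    }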

Key words: Sunway TaihuLight, Convolutional Neural Network (CNN), data reuse, software pipelining, batch limitation
