Convolution Calculation Optimization Method Based on Matrix Transformation

doi:10.19678/j.issn.1000-3428.0051507

Computer Engineering, 2019, Vol. 45, Issue (7): 217-221,228.

FANG Yuling^a, CHEN Qingkui^a,b

a. Business School; b. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China

Received: 2018-05-09 Revised: 2018-06-11 Online: 2019-07-15 Published: 2019-07-23
About the authors: FANG Yuling (born 1990), female, Ph.D. candidate; her main research interests are computer vision, parallel computing, and GPU cluster reliability analysis. CHEN Qingkui, professor, Ph.D., doctoral supervisor.
Funding:
National Natural Science Foundation of China (61572325, 60970012); Specialized Research Fund for the Doctoral Program of Higher Education, doctoral supervisor category (20113120110008); Shanghai Key Science and Technology Project (14511107902, 16DZ1203603); Shanghai Engineering Center Construction Project (GCZX14014); Shanghai Engineering Center for Large-Scale IoT Common Technology of Smart Home (GCZX14014); Shanghai First-Class Discipline Construction Project (XTKX2012); Hujiang Foundation Research Base Special Project (C14001).

Abstract

Abstract: This paper proposes MCFA, an efficient convolution calculation optimization method based on matrix transformation. The input matrix is partitioned into blocks according to the width of the output matrix and the size of the convolution kernel. The input sub-blocks and the kernel matrix are then transformed with the im2col method, and the matrix-matrix multiplication library packaged with the Compute Unified Device Architecture (CUDA) is used to accelerate the convolution. The output sub-blocks are then arranged in order to form the complete output matrix. Experimental results show that the method saves 61.25% of the computing space compared with im2col and increases computing speed by 20.57% compared with MEC. With blocking, it also relieves the cache pressure caused by large input matrices and improves cache utilization.

Keywords: deep learning, convolution calculation, direct convolution, matrix blocking, Compute Unified Device Architecture (CUDA), convolution optimization
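The pipeline summarized in the abstract (partition the input, apply im2col to each sub-block, multiply by the flattened kernel, place the output sub-blocks in order) can be illustrated with a minimal pure-Python sketch. This is not the authors' MCFA implementation. The row-strip partition, the `block_rows` parameter, and the function names `im2col`, `matmul`, `conv2d_blocked`, and `conv2d_direct` are assumptions made for illustration, and the naive `matmul` only stands in for a GPU matrix-matrix multiplication library. Stride 1, no padding, and a single channel are assumed.

```python
def im2col(rows, kh, kw):
    """Unroll every kh x kw window of `rows` (stride 1, no padding) into one row."""
    out_h = len(rows) - kh + 1
    out_w = len(rows[0]) - kw + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            cols.append([rows[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return cols


def matmul(a, b):
    """Naive matrix-matrix product; a stand-in for a GEMM library call."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def conv2d_blocked(image, kernel, block_rows):
    """Convolve block by block; each block covers `block_rows` output rows.

    As is usual in deep learning, the kernel is not flipped
    (this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    # Flatten the kernel into a (kh*kw) x 1 column matrix.
    kernel_col = [[v] for row in kernel for v in row]
    output = []
    for start in range(0, out_h, block_rows):
        stop = min(start + block_rows, out_h)
        # Input rows needed to produce output rows [start, stop).
        strip = image[start:stop + kh - 1]
        # The im2col buffer holds only (stop-start)*out_w rows at a time.
        flat = matmul(im2col(strip, kh, kw), kernel_col)
        # Append this block's output rows in order.
        for r in range(stop - start):
            output.append([flat[r * out_w + c][0] for c in range(out_w)])
    return output


def conv2d_direct(image, kernel):
    """Reference direct convolution used to check the blocked version."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, 1]]
result = conv2d_blocked(image, kernel, block_rows=2)
print(result)  # [[7, 9, 11], [15, 17, 19], [23, 25, 27]]
```

In this sketch a full im2col buffer would need out_h × out_w × kh × kw entries, while the blocked version needs at most block_rows × out_w × kh × kw at any one time. That shows in principle why blocking lowers the temporary memory footprint; it does not reproduce the savings reported in the abstract.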

CLC number: TP391

FANG Yuling, CHEN Qingkui. Convolution Calculation Optimization Method Based on Matrix Transformation[J]. Computer Engineering, 2019, 45(7): 217-221,228.

http://www.ecice06.com/CN/Y2019/V45/I7/217
