Optimization Method for Convolutional Computing Based on Vector Transformation

doi:10.19678/j.issn.1000-3428.0069199

Abstract:

To address efficiency problems in convolution computation, this paper proposes OAC, an optimization method for convolution computation. The goal is to speed up convolution to meet the growing demand for fast convolution in fields such as deep learning. OAC is based on vector transformation and optimizes convolution in three steps. First, the input matrix is read row by row and concatenated into a single vector. Second, the convolution kernel is stretched and zero-padded at positions determined by the width of the input matrix and the size of the kernel, forming a second vector. This transformation is designed so that the kernel vector produces correct results against the transformed input vector while keeping redundant operations to a minimum. Finally, additional optimization techniques are applied to accelerate the vector computation. Experimental results show that the computation speed of OAC is 58.9% higher than that of the conventional MEC method and 90.1% higher than that of the im2col method, and that its memory usage is 53.7% lower than that of MEC. OAC therefore improves computational efficiency substantially and offers an efficient, practical solution for computation-heavy tasks such as deep learning.

Key words: deep learning, convolution computation, convolution optimization, vector transformation, acceleration library
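The two vector transformations summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single-channel H x W input, a kh x kw kernel, stride 1 and no padding, and it omits the "other optimization methods" of the final step. The helper names `flatten_rows`, `stretch_kernel` and `vector_conv2d` are hypothetical. The assumed zero-padding rule inserts W - kw zeros after every kernel row except the last. When the kernel vector is aligned with flat position i*W + j of the input vector, each kernel row then lands on the matching row of the convolution window, so one dot product yields one output element.

```python
from typing import List

Matrix = List[List[float]]


def flatten_rows(x: Matrix) -> List[float]:
    """Step 1: concatenate the input matrix row by row into one vector."""
    return [v for row in x for v in row]


def stretch_kernel(k: Matrix, input_width: int) -> List[float]:
    """Step 2 (assumed padding rule): stretch a kh x kw kernel into one vector.

    (input_width - kw) zeros follow every kernel row except the last, so the
    vector has length (kh - 1) * input_width + kw and kernel row r falls on
    input row i + r when the vector is laid over the flattened input.
    """
    kw = len(k[0])
    if input_width < kw:
        raise ValueError("kernel is wider than the input")
    gap = [0] * (input_width - kw)
    out: List[float] = []
    for r, row in enumerate(k):
        out.extend(row)
        if r < len(k) - 1:
            out.extend(gap)
    return out


def vector_conv2d(x: Matrix, k: Matrix) -> Matrix:
    """Valid (stride 1, no padding) 2-D cross-correlation in vector form."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    if h < kh:
        raise ValueError("kernel is taller than the input")
    xv = flatten_rows(x)
    kv = stretch_kernel(k, w)
    out_h, out_w = h - kh + 1, w - kw + 1
    y: Matrix = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            base = i * w + j  # flat index of the window's top-left element
            # One dot product between the kernel vector and a contiguous
            # slice of the input vector gives one output element.
            row.append(sum(c * xv[base + t] for t, c in enumerate(kv)))
        y.append(row)
    return y


if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    k = [[1, 0], [0, -1]]
    print(stretch_kernel(k, 4))  # [1, 0, 0, 0, 0, -1]
    print(vector_conv2d(x, k))   # [[-5, -5, -5], [-5, -5, -5], [-5, -5, -5]]
```

The sketch only evaluates positions with j <= W - kw. Aligning the kernel vector further right would wrap into the next input row, so those positions are skipped rather than computed and discarded. The inserted zeros contribute nothing to each dot product, and a tuned implementation would likely avoid multiplying by them. They are kept here so that the code mirrors the transformation as described.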

WANG Peiji, ZOU Chengming. Optimization Method for Convolutional Computing Based on Vector Transformation[J]. Computer Engineering, 2025, 51(6): 74-82.

https://www.ecice06.com/CN/Y2025/V51/I6/74

Figures/Tables (11)

Fig.1 Direct convolution, im2col convolution, and MEC convolution

Fig.2 Evolution of the three convolution methods

Fig.3 Complete convolution process of the OAC method

Fig.4 Memory overhead of the three convolution methods (bar chart)

Fig.5 Running time of the three convolution methods (bar chart)

Fig.6 Running time of the two OAC convolution variants (bar chart)
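Fig.1 and Fig.4 contrast the lowering strategies, and the abstract reports a 53.7% memory reduction relative to MEC. The back-of-the-envelope sketch below is not the paper's measurement method. It counts the extra elements each lowering allocates for one channel with stride 1 and no padding. The im2col count (one kh*kw column per output pixel) and the MEC count (out_w vertical strips of H x kw) follow the standard descriptions of those methods. The count for the vector form rests on an assumption: the row-major input buffer is reused as the input vector, so only the stretched kernel of length (kh - 1)W + kw is newly allocated. That assumption and the function name `lowered_buffer_elements` are illustrative only.

```python
from typing import Dict


def lowered_buffer_elements(h: int, w: int, kh: int, kw: int) -> Dict[str, int]:
    """Extra elements each lowering allocates (one channel, stride 1, no padding)."""
    out_h, out_w = h - kh + 1, w - kw + 1
    return {
        # im2col: a kh*kw patch copied out for every output pixel.
        "im2col": out_h * out_w * kh * kw,
        # MEC: out_w overlapping vertical strips, each h rows by kw columns.
        "MEC": out_w * h * kw,
        # Vector form (assumption): the row-major input doubles as the input
        # vector, so only the zero-padded kernel vector is allocated.
        "vector": (kh - 1) * w + kw,
    }


if __name__ == "__main__":
    print(lowered_buffer_elements(224, 224, 3, 3))
    # {'im2col': 443556, 'MEC': 149184, 'vector': 451}
```

These idealized counts ignore output buffers, multiple channels, batching and any workspace the additional optimizations may need. They will therefore not reproduce the measured 53.7% saving, which comes from the authors' full implementation.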
