
Computer Engineering ›› 2023, Vol. 49 ›› Issue (5): 97-104. doi: 10.19678/j.issn.1000-3428.0064139

• Artificial Intelligence and Pattern Recognition •

  • About the authors: MA Jiaxiang (b. 1996), male, master's student; his main research interest is network model acceleration. SONG Xiaoning (corresponding author), professor, Ph.D., doctoral supervisor.
  • Funding: National Natural Science Foundation of China (61876072); National Social Science Foundation of China (21&ZD166); Natural Science Foundation of Jiangsu Province (BK20221535).

Soft Pruning Algorithm Based on Lottery Ticket Hypothesis

MA Jiaxiang, SONG Xiaoning   

  1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, Jiangsu, China
  • Received: 2022-03-09 Revised: 2022-05-01 Published: 2022-05-25


Abstract: The increasing number of neural network layers causes network complexity to grow exponentially, limiting the scenarios in which such networks can be deployed. To solve this problem, this study proposes a soft pruning algorithm based on the lottery ticket hypothesis to accelerate networks. Wrongly pruned parameters are recovered through compensation, in which the pruned network from the previous stage serves as the teacher for knowledge distillation, and a sparsity constraint is added to the distillation loss function to preserve sparsity. On this basis, the pruned network obtained at the current stage is fused with the student network obtained through knowledge distillation. During network fusion, the similarity between the pruned network and the student network is computed, and a dedicated fusion formula is designed to highlight similar network parameters and suppress dissimilar ones, so that the network continues to perform well as the pruning rate increases. Experimental results for the VGG16, ResNet-18, and ResNet-56 models on the CIFAR-10/100 datasets indicate the following: at a pruning rate of 80%, the classification accuracy of VGG16 on the CIFAR-10 dataset decreases by 0.07 percentage points; at a pruning rate of 60%, the classification accuracy of ResNet-56 on the CIFAR-10 dataset improves by 0.06 percentage points; and at pruning rates of 85%, 95%, and 99%, the accuracy of ResNet-18 on the CIFAR-100 dataset decreases by only 1.03, 1.51, and 2.04 percentage points, respectively. These results show that the proposed algorithm maintains high accuracy while increasing the pruning rate, verifying its effectiveness.

Key words: network acceleration, lottery ticket hypothesis, global pruning, sparse distillation, model fusion
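The pipeline summarized in the abstract, namely global magnitude pruning in the lottery ticket style, distillation with a sparsity penalty, and similarity-weighted fusion of the pruned and student networks, can be sketched in NumPy as follows. All function names here are illustrative, and the cosine-similarity fusion rule is an assumption standing in for the paper's specific fusion formula, which is not reproduced on this page.

```python
import numpy as np

def global_magnitude_mask(weights, prune_rate):
    """Lottery-ticket-style global pruning: compute one threshold over all
    layers jointly and mask out the smallest-magnitude fraction of weights."""
    flat = np.concatenate([w.ravel() for w in weights])
    k = int(prune_rate * flat.size)
    threshold = np.partition(np.abs(flat), k)[k] if k > 0 else 0.0
    return [np.abs(w) >= threshold for w in weights]

def sparse_distill_loss(student_logits, teacher_logits, student_weights, lam=1e-4):
    """Distillation loss (MSE to the teacher's logits, as a simple stand-in
    for a KL term) plus an L1 penalty that keeps the student sparse while
    wrongly pruned parameters are being recovered."""
    distill = np.mean((student_logits - teacher_logits) ** 2)
    sparsity = sum(np.abs(w).sum() for w in student_weights)
    return distill + lam * sparsity

def fuse(pruned_w, student_w, eps=1e-8):
    """Similarity-weighted fusion: layers where the pruned network and the
    distilled student agree (high cosine similarity) are averaged with full
    weight, while dissimilar layers fall back toward the pruned network."""
    fused = []
    for p, s in zip(pruned_w, student_w):
        cos = float(np.dot(p.ravel(), s.ravel()) /
                    (np.linalg.norm(p) * np.linalg.norm(s) + eps))
        alpha = 0.5 * (1.0 + cos)  # maps cosine from [-1, 1] into [0, 1]
        fused.append(alpha * 0.5 * (p + s) + (1 - alpha) * p)
    return fused
```

In an iterative schedule, each stage would apply `global_magnitude_mask` at a higher `prune_rate`, distill the masked network against the previous stage's network under `sparse_distill_loss`, and then call `fuse` before the next stage begins.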
