Research on Optimization of Convolution Pretraining Model Based on Causal Intervention and Invariance

doi:10.19678/j.issn.1000-3428.0061188

Abstract

Deep learning models based on Convolutional Neural Networks (CNNs) are widely used in image recognition and classification. However, such models still fall short in capturing global features, in extracting concept-level invariant features, and in establishing clear causal relationships between variables, which limits their flexibility, adaptability, and generalization. Drawing on causal intervention and invariance, this study proposes a directional pruning and network structure optimization method for pretrained CNN models. The method first applies invariance-based intervention modulation to the model input. It then uses the resulting sequence of modulated images to analyze the output distributions of the convolutional substructures in the pretrained network, and screens out and directionally prunes the noise-sensitive substructures. On this basis, an objective function based on inter-class differentiation is constructed, and the inter-layer connections of the network are built with the help of the Capital Asset Pricing Model (CAPM) from economics. This yields a network topology that increases inter-class differentiation for a single classification task, and stable concept-level features are optimized layer by layer. Experiments on the ImageNet-2012 dataset show that the optimized model improves on the classification accuracy of the ResNet50 pretrained baseline by about 5 percentage points while greatly reducing the size of the training set.
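The screening step described in the abstract (compare each convolutional substructure's output distribution on clean versus modulated inputs, then prune the most noise-sensitive ones) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name a distribution distance, so the 1-D Wasserstein distance is an assumed choice, and the function names (`wasserstein_1d`, `channel_sensitivity`, `prune_mask`), the toy activations, and the 25% pruning fraction are all hypothetical.

```python
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal-sized samples this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("samples must have the same size")
    diffs = [abs(x - y) for x, y in zip(sorted(a), sorted(b))]
    return sum(diffs) / len(diffs)


def channel_sensitivity(clean, modulated_runs):
    """Average distribution shift of each channel under modulation.

    clean[c] holds channel c's activations over a set of images;
    modulated_runs[k][c] holds the same channel's activations when the
    images have been modulated with the k-th (assumed) intervention.
    """
    scores = []
    for c, clean_acts in enumerate(clean):
        dists = [wasserstein_1d(clean_acts, run[c]) for run in modulated_runs]
        scores.append(sum(dists) / len(dists))
    return scores


def prune_mask(scores, prune_fraction=0.25):
    """Return a keep-mask that drops the most sensitive channels."""
    n_prune = int(prune_fraction * len(scores))
    order = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    worst = set(order[:n_prune])
    return [i not in worst for i in range(len(scores))]


# Toy data: 4 channels, 64 images. Channels 0-2 are invariant to the
# modulation; channel 3 shifts with the modulation strength.
rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(4)]
modulated_runs = []
for shift in (0.5, 1.0, 1.5):
    run = [list(channel) for channel in clean]
    run[3] = [v + shift for v in run[3]]
    modulated_runs.append(run)

scores = channel_sensitivity(clean, modulated_runs)
keep = prune_mask(scores)
print(keep)  # prints [True, True, True, False]
```

In a real setting the activations would come from forward passes of the pretrained network on the original and modulated images. The lists here only stand in for those outputs.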

Keywords: image recognition and classification, Convolutional Neural Network (CNN), causal intervention, invariance, Capital Asset Pricing Model (CAPM)

CLC Number: TP391

HU Xuan, XING Kai, LI Yaming, WANG Zhiyong, DENG Hongwu. Research on Optimization of Convolution Pretraining Model Based on Causal Intervention and Invariance[J]. Computer Engineering, 2022, 48(4): 89-98.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0061188

http://www.ecice06.com/EN/Y2022/V48/I4/89
