[1] MCCULLOCH W S, PITTS W.A logical calculus of the ideas immanent in nervous activity[J].Bulletin of Mathematical Biology, 1990, 52(1/2):99-115.
[2] HINTON G E, SALAKHUTDINOV R R.Reducing the dimensionality of data with neural networks[J].Science, 2006, 313(5786):504-507.
[3] ROSENBLATT F.The perceptron:a probabilistic model for information storage and organization in the brain[J].Psychological Review, 1958, 65(6):386-408.
[4] LECUN Y, BOTTOU L, BENGIO Y, et al.Gradient-based learning applied to document recognition[J].Proceedings of the IEEE, 1998, 86(11):2278-2324.
[5] SI N W, ZHANG W L, QU D, et al.A review on representation visualization of convolutional neural networks[J/OL].Acta Automatica Sinica:1-31[2021-02-12].https://doi.org/10.16383/j.aas.c200554.(in Chinese)
[6] RUMELHART D E, HINTON G E, WILLIAMS R J.Learning representations by back-propagating errors[J].Nature, 1986, 323(6088):533-536.
[7] LIU Q.An improved deep convolutional neural network and its weight initialization[D].Baoding:Hebei University, 2018.(in Chinese)
[8] SHEN C K.Research on initialization method of convolutional neural networks[D].Beijing:Beijing University of Technology, 2017.(in Chinese)
[9] BURKARDT J.The truncated normal distribution[EB/OL].[2021-06-01].https://www.doc88.com/p-1176985733398.html.
[10] LI Y J, SHEN C K, YANG H L, et al.PCA shuffling initialization of convolutional neural networks[J].Journal of Beijing University of Technology, 2017, 43(1):22-27.(in Chinese)
[11] SHEN H.Towards a mathematical understanding of the difficulty in learning with feedforward neural networks[EB/OL].[2021-06-01].https://arxiv.org/abs/1611.05827.
[12] HE K M, ZHANG X Y, REN S Q, et al.Delving deep into rectifiers:surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision.Washington D.C., USA:IEEE Press, 2015:1026-1034.
[13] ZHANG H, ZHANG Q, YU J Y.A review of the development and property analysis of activation function[J].Journal of Xihua University(Natural Science Edition), 2021, 40(4):1-10.(in Chinese)
[14] LI J.Research and application of weight initialization of convolutional neural networks[D].Qingdao:Qingdao University, 2020.(in Chinese)
[15] HAN X, ZHANG Z Y, DING N, et al.Pre-trained models:past, present and future[J].AI Open, 2021, 2:225-250.
[16] KETKAR N S.Introduction to PyTorch[M].Berlin, Germany:Springer, 2017.
[17] HAN J, MORAGA C.The influence of the sigmoid function parameters on the speed of backpropagation learning[C]//Proceedings of IEEE International Workshop on Artificial Neural Networks.Washington D.C., USA:IEEE Press, 1995:195-201.
[18] DAHL G E, SAINATH T N, HINTON G E.Improving deep neural networks for LVCSR using rectified linear units and dropout[C]//Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing.Washington D.C., USA:IEEE Press, 2013:8609-8613.
[19] KRIZHEVSKY A, SUTSKEVER I, HINTON G E.ImageNet classification with deep convolutional neural networks[J].Communications of the ACM, 2017, 60(6):84-90.
[20] HE K M, ZHANG X Y, REN S Q, et al.Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C., USA:IEEE Press, 2016:770-778.
[21] BARABASI A L, ALBERT R.Emergence of scaling in random networks[J].Science, 1999, 286(5439):509-512.
[22] KANG G L, DONG X Y, ZHENG L, et al.PatchShuffle regularization[EB/OL].[2021-06-01].https://arxiv.org/abs/1707.07103.
[23] MCMAHAN H B, HOLT G, SCULLEY D, et al.Ad click prediction:a view from the trenches[C]//Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York, USA:ACM Press, 2013:1222-1230.