
Computer Engineering ›› 2022, Vol. 48 ›› Issue (7): 104-113. doi: 10.19678/j.issn.1000-3428.0062017

• Artificial Intelligence and Pattern Recognition •


Research on Weight Initialization Method in Deep Learning

XING Tongtong, SUN Rencheng, SHAO Fengjing, SUI Yi   

  1. School of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Received: 2021-07-08 Revised: 2021-09-06 Online: 2022-07-15 Published: 2022-07-12
  • About the authors: XING Tongtong (born 1997), female, M.S. candidate; her main research interests are deep learning and big data. SUN Rencheng (corresponding author), professor, Ph.D.; SHAO Fengjing, professor, Ph.D., doctoral supervisor; SUI Yi, associate professor, Ph.D.
  • Funding: Young Scientists Fund of the National Natural Science Foundation of China (41706198).

Abstract: The essence of deep neural network training is the continual adjustment of initialized weights, and the training process is time-consuming and requires large amounts of data. Pre-trained networks consist essentially of trained weight data; if the distribution rules of pre-trained network weights can be identified and used to initialize untrained networks, network training time can be reduced. In this study, a probability distribution analysis is performed on the weights of AlexNet and ResNet18 models pre-trained on the ImageNet dataset. The results show that the weight distribution exhibits the characteristics of a one-sided power-law distribution, and a double-logarithmic fit further verifies that each one-sided weight distribution obeys a truncated power law. Combining this distribution law with the regularization idea of preventing overfitting, an initialization method based on a Normalized Symmetric Power Law (NSPL) distribution is proposed. The NSPL method is compared experimentally with He initialization based on the normal and uniform distributions, using the AlexNet and ResNet32 networks on the CIFAR10 dataset. The results show that the NSPL method converges faster than the two He initialization methods and achieves higher accuracy on ResNet32.
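The distribution analysis described in the abstract can be illustrated with a short sketch. The snippet below is an assumption-laden reconstruction, not the authors' code: it pools the convolutional weights of an ImageNet-pretrained AlexNet obtained through torchvision (the paper's exact extraction procedure is not given here) and fits the positive side of their empirical density on double-logarithmic axes, where power-law behavior appears as a straight line.

```python
import numpy as np
import torch
from torchvision import models

# Pool all convolutional weights of a pretrained AlexNet (4-D tensors only).
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
w = torch.cat([p.detach().flatten()
               for name, p in net.named_parameters()
               if name.endswith("weight") and p.dim() == 4])

pos = w[w > 0].numpy()                        # one-sided (positive) weights
dens, edges = np.histogram(pos, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = dens > 0
# A power law p(w) ~ w^(-alpha) is a straight line in log-log space,
# so the slope of a linear fit estimates the exponent alpha.
slope, intercept = np.polyfit(np.log(centers[mask]), np.log(dens[mask]), 1)
print(f"estimated power-law exponent: {-slope:.2f}")
```

The abstract does not specify how the NSPL distribution is parameterized or sampled. The following is a minimal sketch of what such an initializer could look like, assuming inverse-transform sampling of a truncated power law, random signs for symmetry, and He-style variance normalization; the function name `nspl_init_` and all default parameter values are hypothetical.

```python
import torch

def nspl_init_(weight: torch.Tensor, alpha: float = 2.5,
               x_min: float = 1e-3, x_max: float = 1.0) -> torch.Tensor:
    """Hypothetical NSPL-style initializer (a sketch, not the paper's method).

    Magnitudes are drawn from a power law truncated to [x_min, x_max] by
    inverse-transform sampling, random signs make the distribution symmetric
    about zero, and the tensor is rescaled to the He variance 2 / fan_in.
    """
    fan_in = weight[0].numel()                # inputs feeding one output unit
    u = torch.rand_like(weight)
    # Inverse CDF of p(x) ~ x^(-alpha) on [x_min, x_max]
    a = x_min ** (1.0 - alpha)
    b = x_max ** (1.0 - alpha)
    mag = (a + u * (b - a)) ** (1.0 / (1.0 - alpha))
    sign = torch.randint_like(weight, 0, 2) * 2.0 - 1.0   # ±1 with equal odds
    w = sign * mag
    w = w * (2.0 / fan_in) ** 0.5 / w.std()   # normalize to He variance
    with torch.no_grad():
        weight.copy_(w)
    return weight

# Example use: initialize every convolutional layer of a model this way.
# for m in model.modules():
#     if isinstance(m, torch.nn.Conv2d):
#         nspl_init_(m.weight)
```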

Key words: deep learning, Convolutional Neural Network (CNN), pre-trained model, weight initialization, symmetric power law distribution
