
Computer Engineering ›› 2022, Vol. 48 ›› Issue (7): 104-113. doi: 10.19678/j.issn.1000-3428.0062017

• Artificial Intelligence and Pattern Recognition •


Research on Weight Initialization Method in Deep Learning

XING Tongtong, SUN Rencheng, SHAO Fengjing, SUI Yi   

  1. School of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  • Received: 2021-07-08 Revised: 2021-09-06 Online: 2022-07-15 Published: 2022-07-12
  • About the authors: XING Tongtong (born 1997), female, M.S. candidate; her main research interests are deep learning and big data. SUN Rencheng (corresponding author), professor, Ph.D.; SHAO Fengjing, professor, Ph.D., doctoral supervisor; SUI Yi, associate professor, Ph.D.
  • Funding: Young Scientists Fund of the National Natural Science Foundation of China (41706198).

Abstract: The essence of deep neural network training is the continual adjustment of initialized weights, and the training process is time-consuming and requires large amounts of data. Pre-trained networks consist essentially of trained weight data; if the distribution rules of pre-trained network weights can be identified and used to initialize untrained networks, network training time can be reduced. In this study, a probability distribution analysis is performed on the weights of AlexNet and ResNet18 models pre-trained on the ImageNet dataset. The results show that the weight distribution exhibits the characteristics of a one-sided power-law distribution, and a double-logarithmic fit further verifies that each one-sided weight distribution obeys a truncated power law. Combining this distribution law with the regularization idea of preventing overfitting, an initialization method based on a Normalized Symmetric Power Law (NSPL) distribution is proposed. The NSPL method is compared experimentally with He initialization based on the normal and uniform distributions, using the AlexNet and ResNet32 networks on the CIFAR10 dataset. The results show that the NSPL method converges faster than the two He initialization methods and achieves higher accuracy on ResNet32.
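The distribution analysis described in the abstract can be illustrated with a short sketch. The snippet below is an assumption-laden reconstruction, not the authors' code: it pools the convolutional weights of an ImageNet-pretrained AlexNet obtained through torchvision (the paper's exact extraction procedure is not given here) and fits the positive side of their empirical density on double-logarithmic axes, where power-law behavior appears as a straight line.

```python
import numpy as np
import torch
from torchvision import models

# Pool all convolutional weights of a pretrained AlexNet (4-D tensors only).
net = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1)
w = torch.cat([p.detach().flatten()
               for name, p in net.named_parameters()
               if name.endswith("weight") and p.dim() == 4])

pos = w[w > 0].numpy()                        # one-sided (positive) weights
dens, edges = np.histogram(pos, bins=200, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
mask = dens > 0
# A power law p(w) ~ w^(-alpha) is a straight line in log-log space,
# so the slope of a linear fit estimates the exponent alpha.
slope, intercept = np.polyfit(np.log(centers[mask]), np.log(dens[mask]), 1)
print(f"estimated power-law exponent: {-slope:.2f}")
```

The abstract does not specify how the NSPL distribution is parameterized or sampled. The following is a minimal sketch of what such an initializer could look like, assuming inverse-transform sampling of a truncated power law, random signs for symmetry, and He-style variance normalization; the function name `nspl_init_` and all default parameter values are hypothetical.

```python
import torch

def nspl_init_(weight: torch.Tensor, alpha: float = 2.5,
               x_min: float = 1e-3, x_max: float = 1.0) -> torch.Tensor:
    """Hypothetical NSPL-style initializer (a sketch, not the paper's method).

    Magnitudes are drawn from a power law truncated to [x_min, x_max] by
    inverse-transform sampling, random signs make the distribution symmetric
    about zero, and the tensor is rescaled to the He variance 2 / fan_in.
    """
    fan_in = weight[0].numel()                # inputs feeding one output unit
    u = torch.rand_like(weight)
    # Inverse CDF of p(x) ~ x^(-alpha) on [x_min, x_max]
    a = x_min ** (1.0 - alpha)
    b = x_max ** (1.0 - alpha)
    mag = (a + u * (b - a)) ** (1.0 / (1.0 - alpha))
    sign = torch.randint_like(weight, 0, 2) * 2.0 - 1.0   # ±1 with equal odds
    w = sign * mag
    w = w * (2.0 / fan_in) ** 0.5 / w.std()   # normalize to He variance
    with torch.no_grad():
        weight.copy_(w)
    return weight

# Example use: initialize every convolutional layer of a model this way.
# for m in model.modules():
#     if isinstance(m, torch.nn.Conv2d):
#         nspl_init_(m.weight)
```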

Key words: deep learning, Convolutional Neural Network (CNN), pre-trained model, weight initialization, symmetric power law distribution
