
Computer Engineering ›› 2021, Vol. 47 ›› Issue (4): 68-76, 83. doi: 10.19678/j.issn.1000-3428.0057340

• Artificial Intelligence and Pattern Recognition •

A Synchronized Gradient Descent Algorithm Based on Distributed Coding

LI Bowen, XIE Zaipeng, MAO Yingchi, XU Yuanyuan, ZHU Xiaorui, ZHANG Ji

  1. School of Computer and Information, Hohai University, Nanjing 211100, China
  • Received: 2020-02-06  Revised: 2020-04-04  Published: 2020-04-14
  • About the authors: LI Bowen (born 1996), male, M.S. candidate; his main research interests include edge computing and distributed computing. XIE Zaipeng, associate professor, Ph.D.; MAO Yingchi, professor, Ph.D.; XU Yuanyuan and ZHU Xiaorui, Ph.D.; ZHANG Ji, M.S. candidate.
  • Funding:
    Key Program of the National Natural Science Foundation of China (61832005); National Key Research and Development Program of China (2016YFC0402710).



Abstract: The Asynchronous Stochastic Gradient Descent (ASGD) algorithm based on data parallelism requires frequent exchanges of gradient data between distributed computing nodes, which reduces the execution efficiency of the algorithm. This paper proposes a Synchronized Stochastic Gradient Descent (SSGD) algorithm based on distributed coding. The algorithm uses a redundant distribution strategy for computation tasks to quantify the transmission time of the intermediate results of each node, thereby reducing the training time of a single batch. It then reduces the total amount of data transmitted between nodes through the grouped data exchange mode of the coding strategy for data transmission. Experimental results show that, with a suitable hyperparameter configuration, the proposed algorithm reduces the average distributed training time of Deep Neural Network (DNN) and Convolutional Neural Network (CNN) models by 53.97% and 26.89% compared with the SSGD algorithm, and by 39.11% and 26.37% compared with the ASGD algorithm, demonstrating that it can effectively reduce the communication load of the distributed cluster while preserving the training accuracy of the neural networks.
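
The sketch below only illustrates the general idea behind the abstract: synchronous data-parallel gradient descent in which computation tasks are distributed redundantly, so the aggregator can complete a batch even when a worker straggles. It is a toy NumPy example under assumed names (n_workers, redundancy, partial_grad), not the coding or grouped data-exchange scheme proposed in the paper, whose details are given in the full text.

# Minimal conceptual sketch: synchronous gradient descent with redundant
# partition assignment on a toy least-squares problem. Illustrative only;
# NOT the paper's distributed coding algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Toy linear-regression data and model.
X = rng.normal(size=(1200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1200)
w = np.zeros(10)

n_workers, redundancy, n_parts = 4, 2, 4
parts = np.array_split(np.arange(len(X)), n_parts)

# Redundant task distribution: worker i holds partitions i, i+1, ... (mod n_parts),
# so every partition is replicated `redundancy` times across the workers.
assignment = {i: [(i + j) % n_parts for j in range(redundancy)]
              for i in range(n_workers)}

def partial_grad(w, idx):
    """Gradient of 0.5 * mean squared error on one data partition."""
    Xp, yp = X[idx], y[idx]
    return Xp.T @ (Xp @ w - yp) / len(idx)

lr = 0.1
for step in range(100):
    # Simulate one straggling worker per synchronous step; redundancy lets the
    # aggregator still obtain a gradient contribution for every partition.
    straggler = step % n_workers
    received = {}
    for i in range(n_workers):
        if i == straggler:
            continue
        for p in assignment[i]:
            received.setdefault(p, partial_grad(w, parts[p]))
    assert len(received) == n_parts   # every partition covered despite the straggler
    full_grad = sum(received.values()) / n_parts
    w -= lr * full_grad               # synchronous model update

print("parameter error:", np.linalg.norm(w - w_true))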

Key words: neural network, deep learning, distributed coding, Gradient Descent (GD), communication load

CLC Number: