[1] SAK H,SENIOR A,BEAUFAYS F.Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition[EB/OL].[2020-01-04].https://arxiv.org/abs/1402.1128.
[2] SERCU T,PUHRSCH C,KINGSBURY B,et al.Very deep multilingual convolutional neural networks for LVCSR[C]//Proceedings of 2016 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2016:4955-4959.
[3] KRIZHEVSKY A,SUTSKEVER I,HINTON G E.ImageNet classification with deep convolutional neural networks[C]//Proceedings of International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2012:1097-1105.
[4] HE Kaiming,ZHANG Xiangyu,REN Shaoqing,et al.Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition.Washington D.C.,USA:IEEE Press,2016:770-778.
[5] MIKOLOV T,SUTSKEVER I,CHEN K,et al.Distributed representations of words and phrases and their compositionality[C]//Proceedings of International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2013:3111-3119.
[6] KINGMA D P,BA J.Adam:a method for stochastic optimization[C]//Proceedings of International Conference on Learning Representations.Washington D.C.,USA:IEEE Press,2015:1-7.
[7] LUO Liangchen,XIONG Yuanhao,LIU Yan,et al.Adaptive gradient methods with dynamic bound of learning rate[C]//Proceedings of International Conference on Learning Representations.Washington D.C.,USA:IEEE Press,2019:15-25.
[8] DEAN J,CORRADO G S,MONGA R,et al.Large scale distributed deep networks[C]//Proceedings of International Conference on Neural Information Processing Systems.New York,USA:ACM Press,2012:1223-1231.
[9] POVEY D,ZHANG X H,KHUDANPUR S.Parallel training of DNNs with natural gradient and parameter averaging[C]//Proceedings of International Conference on Learning Representations.New York,USA:ACM Press,2015:7-18.
[10] NIU F,RECHT B,RE C,et al.HOGWILD!:a lock-free approach to parallelizing stochastic gradient descent[C]//Proceedings of the 25th Conference on Neural Information Processing Systems.New York,USA:ACM Press,2011:693-701.
[11] DAI Wei,ZHOU Yi,DONG Nanqing,et al.Toward understanding the impact of staleness in distributed machine learning[C]//Proceedings of International Conference on Learning Representations.New York,USA:ACM Press,2019:1-8.
[12] ZHENG Shuxin,MENG Qi,WANG Taifeng,et al.Asynchronous stochastic gradient descent with delay compensation[C]//Proceedings of International Conference on Machine Learning.New York,USA:ACM Press,2017:28-45.
[13] LI S Z,MADDAH-ALI M A,YU Q,et al.A fundamental tradeoff between computation and communication in distributed computing[J].IEEE Transactions on Information Theory,2018,64(1):109-128.
[14] LI S Z,MADDAH-ALI M A.Compressed coded distributed computing[C]//Proceedings of 2018 IEEE International Symposium on Information Theory.Washington D.C.,USA:IEEE Press,2018:2032-2036.
[15] FERDINAND N,AL-LAWATI H,DRAPER S,et al.Anytime minibatch:exploiting stragglers in online distributed optimization[EB/OL].[2020-01-04].https://arxiv.org/abs/2006.05752.
[16] FERDINAND N,GHARACHORLOO B,DRAPER S C.Anytime exploitation of stragglers in synchronous stochastic gradient descent[C]//Proceedings of the 16th IEEE International Conference on Machine Learning and Applications.Washington D.C.,USA:IEEE Press,2017:141-146.
[17] YU Q,MADDAH-ALI M A,AVESTIMEHR A S.Straggler mitigation in distributed matrix multiplication:fundamental limits and optimal coding[C]//Proceedings of IEEE International Symposium on Information Theory.Washington D.C.,USA:IEEE Press,2018:2157-2162.
[18] YU Q,MADDAH-ALI M A,AVESTIMEHR A S.Polynomial codes:an optimal design for high-dimensional coded matrix multiplication[C]//Proceedings of the 31st Conference on Neural Information Processing Systems.New York,USA:ACM Press,2017:4406-4416.
[19] SEIDE F,FU H,DROPPO J,et al.1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs[C]//Proceedings of the 15th Annual Conference of the International Speech Communication Association.Singapore:ISCA,2014:1058-1062.
[20] CHEN Kai,HUO Qiang.Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering[C]//Proceedings of 2016 IEEE International Conference on Acoustics,Speech and Signal Processing.Washington D.C.,USA:IEEE Press,2016:2379-2384.
[21] ASSRAN M,LOIZOU N,BALLAS N,et al.Stochastic gradient push for distributed deep learning[EB/OL].[2020-01-04].https://arxiv.org/abs/1811.10792.
[22] LEE K,LAM M,PEDARSANI R,et al.Speeding up distributed machine learning using codes[J].IEEE Transactions on Information Theory,2018,64(3):1514-1529.