
Computer Engineering ›› 2022, Vol. 48 ›› Issue (10): 103-109. doi: 10.19678/j.issn.1000-3428.0062812

• Artificial Intelligence and Pattern Recognition •

Robust Speech Recognition Technology Based on Federated Learning with Local Distillation

BAI Caitong1,3, CUI Xiaolong2,3, LI Ai1,3   

  1. Graduate Group, Engineering University of PAP, Xi'an 710086, China;
    2. Urumqi Campus of Engineering University of PAP, Urumqi 830049, China;
    3. Anti-terrorism Command Information Engineering Research Team, Engineering University of PAP, Xi'an 710086, China
  • Received: 2021-09-26  Revised: 2021-10-29  Published: 2021-11-05

  • About the authors: BAI Caitong (born 1995), male, master's student, whose main research interest is intelligent speech recognition; CUI Xiaolong (corresponding author), professor; LI Ai, master's degree.
  • Funding: National Natural Science Foundation of China (U1603261); Network Information Integration Project (LXJH-10(A)-09).

Abstract: When Federated Learning (FL) is applied to robust speech recognition tasks, the training data are Non-Independent and Identically Distributed (Non-IID) and the client models lack personalization. To address these problems, this study proposes a personalized local-distillation-based FL algorithm, called PLD-FLD. First, each client uploads its local logits through the uplink, and the central server aggregates them. Second, the aggregated parameters are accepted through the downlink only when the edge model's test performance exceeds that of the local model, which preserves both the personalization and the generalization of the local model. Finally, the model parameters and global logits are sent down to the clients, where local distillation learning is performed to overcome the Non-IID distribution of the training samples. Experimental results on the AISHELL and PERSONAL datasets show that PLD-FLD achieves a good balance between model performance and communication cost, reaching a speech recognition accuracy of 91% on a military equipment control task, and that it converges faster and is more robust than the distributed-training FL and Federated Learning Distillation (FLD) algorithms.
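
To make the round described in the abstract concrete, the following Python/PyTorch sketch outlines one PLD-FLD-style communication round, assuming simple averaging of client logits and parameters at the central server, a client-side acceptance test for the aggregated model, and a local distillation loss that mixes cross-entropy with a KL term toward the global logits. All names and hyperparameters here (server_aggregate, client_update, DISTILL_WEIGHT, TEMPERATURE) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

# Assumed hyperparameters, not taken from the paper.
DISTILL_WEIGHT = 0.5   # mixing weight between task loss and distillation loss
TEMPERATURE = 2.0      # softmax temperature for distillation


def server_aggregate(client_logits, client_state_dicts):
    """Average per-client logits and parameters (FedAvg-style sketch).

    Assumes every client reports logits of the same shape and that all
    model parameters are floating point.
    """
    global_logits = torch.stack(client_logits).mean(dim=0)
    global_params = {
        name: torch.stack([sd[name] for sd in client_state_dicts]).mean(dim=0)
        for name in client_state_dicts[0]
    }
    return global_logits, global_params


def client_update(local_model, aggregated_model, global_logits, loader, test_fn):
    """One PLD-FLD-style round on a client.

    The aggregated parameters are accepted only if they test better than the
    current local model (the conditional download described in the abstract);
    training then distills from the global logits on the local Non-IID data.
    """
    if test_fn(aggregated_model) > test_fn(local_model):
        local_model.load_state_dict(aggregated_model.state_dict())

    optimizer = torch.optim.Adam(local_model.parameters(), lr=1e-4)
    for inputs, targets in loader:
        logits = local_model(inputs)
        task_loss = F.cross_entropy(logits, targets)
        # Distillation toward the class-averaged global logits; broadcasting
        # a [num_classes] target over the batch is an illustrative choice.
        distill_loss = F.kl_div(
            F.log_softmax(logits / TEMPERATURE, dim=-1),
            F.softmax(global_logits / TEMPERATURE, dim=-1),
            reduction="batchmean",
        ) * TEMPERATURE ** 2
        loss = (1 - DISTILL_WEIGHT) * task_loss + DISTILL_WEIGHT * distill_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return local_model

In this sketch, the conditional acceptance step is what distinguishes PLD-FLD from plain parameter averaging: a client whose personalized model already outperforms the aggregate keeps its own weights and benefits from global knowledge only through the distillation term.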

Key words: robust speech recognition, Federated Learning (FL), local distillation, Non-Independent and Identically Distributed (Non-IID), distributed training
