
Computer Engineering ›› 2022, Vol. 48 ›› Issue (10): 103-109. doi: 10.19678/j.issn.1000-3428.0062812

• Artificial Intelligence and Pattern Recognition •

Robust Speech Recognition Technology Based on Federated Learning with Local Distillation

BAI Caitong1,3, CUI Xiaolong2,3, LI Ai1,3   

  1. Graduate Group, Engineering University of PAP, Xi'an 710086, China;
    2. Urumqi Campus of Engineering University of PAP, Urumqi 830049, China;
    3. Anti-terrorism Command Information Engineering Research Team, Engineering University of PAP, Xi'an 710086, China
  • Received: 2021-09-26  Revised: 2021-10-29  Published: 2021-11-05

  • About the authors: BAI Caitong (born 1995), male, master's student, whose main research interest is intelligent speech recognition; CUI Xiaolong (corresponding author), professor; LI Ai, master's degree.
  • Funding: National Natural Science Foundation of China (U1603261); Network Information Integration Project (LXJH-10(A)-09).

Abstract: When Federated Learning (FL) is applied to robust speech recognition tasks, the training data are Non-Independent and Identically Distributed (Non-IID) and the client models lack personalization. To address these problems, this study proposes a personalized local-distillation-based FL algorithm, called PLD-FLD. First, each client uploads its local logits through the uplink, and the central server aggregates them. Second, the aggregated parameters are accepted through the downlink only when the edge model's test performance exceeds that of the local model, which preserves both the personalization and the generalization of the local model. Finally, the model parameters and global logits are sent down to the clients, where local distillation learning is performed to overcome the Non-IID distribution of the training samples. Experimental results on the AISHELL and PERSONAL datasets show that PLD-FLD achieves a good balance between model performance and communication cost, reaching a speech recognition accuracy of 91% on a military equipment control task, and that it converges faster and is more robust than the distributed-training FL and Federated Learning Distillation (FLD) algorithms.
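
To make the round described in the abstract concrete, the following Python/PyTorch sketch outlines one PLD-FLD-style communication round, assuming simple averaging of client logits and parameters at the central server, a client-side acceptance test for the aggregated model, and a local distillation loss that mixes cross-entropy with a KL term toward the global logits. All names and hyperparameters here (server_aggregate, client_update, DISTILL_WEIGHT, TEMPERATURE) are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

# Assumed hyperparameters, not taken from the paper.
DISTILL_WEIGHT = 0.5   # mixing weight between task loss and distillation loss
TEMPERATURE = 2.0      # softmax temperature for distillation


def server_aggregate(client_logits, client_state_dicts):
    """Average per-client logits and parameters (FedAvg-style sketch).

    Assumes every client reports logits of the same shape and that all
    model parameters are floating point.
    """
    global_logits = torch.stack(client_logits).mean(dim=0)
    global_params = {
        name: torch.stack([sd[name] for sd in client_state_dicts]).mean(dim=0)
        for name in client_state_dicts[0]
    }
    return global_logits, global_params


def client_update(local_model, aggregated_model, global_logits, loader, test_fn):
    """One PLD-FLD-style round on a client.

    The aggregated parameters are accepted only if they test better than the
    current local model (the conditional download described in the abstract);
    training then distills from the global logits on the local Non-IID data.
    """
    if test_fn(aggregated_model) > test_fn(local_model):
        local_model.load_state_dict(aggregated_model.state_dict())

    optimizer = torch.optim.Adam(local_model.parameters(), lr=1e-4)
    for inputs, targets in loader:
        logits = local_model(inputs)
        task_loss = F.cross_entropy(logits, targets)
        # Distillation toward the class-averaged global logits; broadcasting
        # a [num_classes] target over the batch is an illustrative choice.
        distill_loss = F.kl_div(
            F.log_softmax(logits / TEMPERATURE, dim=-1),
            F.softmax(global_logits / TEMPERATURE, dim=-1),
            reduction="batchmean",
        ) * TEMPERATURE ** 2
        loss = (1 - DISTILL_WEIGHT) * task_loss + DISTILL_WEIGHT * distill_loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return local_model

In this sketch, the conditional acceptance step is what distinguishes PLD-FLD from plain parameter averaging: a client whose personalized model already outperforms the aggregate keeps its own weights and benefits from global knowledge only through the distillation term.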

Key words: robust speech recognition, Federated Learning (FL), local distillation, Non-Independent and Identically Distributed (Non-IID), distributed training
