
Computer Engineering ›› 2023, Vol. 49 ›› Issue (6): 131-143. doi: 10.19678/j.issn.1000-3428.0065658

• Cyberspace Security •

Client Selection Method Based on Local Model Quality

WEN Yilin1, ZHAO Nailiang1, ZENG Yan1,2,3, HAN Meng1, YUE Lupeng1, ZHANG Jilin1,2,3   

  1. School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China;
    2. Key Laboratory of Complex System Modeling and Simulation, Ministry of Education, Hangzhou Dianzi University, Hangzhou 310018, China;
    3. Zhejiang Engineering Research Center of Data Security Governance, Hangzhou 310018, China
  • Received:2022-09-02 Revised:2022-11-03 Published:2023-06-10

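  • About the authors: WEN Yilin (born 1998), female, master's student, main research interest: federated learning; ZHAO Nailiang, professor; ZENG Yan, associate professor; HAN Meng and YUE Lupeng, Ph.D. candidates; ZHANG Jilin (corresponding author), professor.
  • Funding:
    National Key Research and Development Program of China (2019YFB2102101); National Natural Science Foundation of China (62072146, 61972358); Key Research and Development Program of Zhejiang Province (2021C03187).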

Abstract: Federated learning is a distributed machine learning method for settings in which data are spread across multiple clients that collaborate to train a model. Ideally, all clients participate in every round of training, but in practice only a randomly selected subset does. Randomly selected clients often fail to reflect the global data distribution, which reduces the training efficiency and accuracy of the global model. To address this, ChFL, a client selection method based on local model quality, is proposed. The main factors that affect model accuracy and convergence speed are analyzed, and two indicators that reflect the quality of a client model are extracted: loss value and training time. The local loss value and training time are modeled jointly to evaluate client model quality. Client selection is then guided by client quality and combined, in a certain proportion, with a random selection strategy to improve the accuracy of the global model. Selecting clients with high-quality data and better computational performance improves model accuracy and accelerates convergence. Experimental results on the FEMNIST, CIFAR-10, MNIST, CINIC-10, and EMNIST datasets show that, compared with the three baseline algorithms (FedAvg, FedProx, and FedNova) used alone, combining ChFL with each baseline on average accelerates convergence by about 10% and improves accuracy by about 4 percentage points.

Key words: federated learning, heterogeneous data, loss value, training time, client selection

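The selection idea summarized in the abstract (score each client from its local loss and training time, pick most participants by score, and fill the remainder at random) can be illustrated with a short Python sketch. This is not the ChFL formulation itself: the abstract does not give the scoring formula, so the min-max normalization, the weight `alpha`, the `prefer_high_loss` switch, and the function names `quality_scores` and `select_clients` are all assumptions made for illustration. In particular, whether a high or a low loss should count as "high quality" is not stated in the abstract, so the direction is left as a parameter.

```python
import random
from typing import Dict, List, Optional


def _min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Scale values to [0, 1]; a constant input maps to 0.0 everywhere."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span > 0 else 0.0 for k, v in values.items()}


def quality_scores(
    losses: Dict[str, float],
    times: Dict[str, float],
    alpha: float = 0.5,
    prefer_high_loss: bool = True,
) -> Dict[str, float]:
    """Hypothetical quality score: weighted sum of a loss term and a speed term.

    ASSUMPTION: this formula is illustrative only and is not taken from the paper.
    """
    norm_loss = _min_max(losses)
    norm_time = _min_max(times)
    scores = {}
    for cid in losses:
        loss_term = norm_loss[cid] if prefer_high_loss else 1.0 - norm_loss[cid]
        speed_term = 1.0 - norm_time[cid]  # shorter training time scores higher
        scores[cid] = alpha * loss_term + (1.0 - alpha) * speed_term
    return scores


def select_clients(
    losses: Dict[str, float],
    times: Dict[str, float],
    k: int,
    random_fraction: float = 0.2,
    alpha: float = 0.5,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick k clients: most by score, the rest uniformly at random."""
    rng = rng or random.Random()
    scores = quality_scores(losses, times, alpha)
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = min(k, len(ranked))
    n_random = round(k * random_fraction)  # share reserved for random picks
    n_quality = k - n_random
    chosen = ranked[:n_quality]            # highest-scoring clients first
    remaining = ranked[n_quality:]
    chosen += rng.sample(remaining, min(n_random, len(remaining)))
    return chosen


if __name__ == "__main__":
    # Made-up per-client statistics from a previous round.
    losses = {"c1": 0.9, "c2": 0.2, "c3": 0.5, "c4": 0.7}
    times = {"c1": 12.0, "c2": 30.0, "c3": 8.0, "c4": 15.0}
    print(select_clients(losses, times, k=3, random_fraction=0.34))
```

Keeping a random share in every round means low-scoring clients are still sampled occasionally, which is one plausible reading of why the abstract combines quality-guided selection with a random strategy.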

CLC Number: