Client Selection Method Based on Local Model Quality

doi:10.19678/j.issn.1000-3428.0065658

Abstract

Federated learning is a distributed machine learning method for environments in which data are spread across multiple clients that collaborate to train a model. Ideally, all clients participate in every training round, but in practice only a randomly selected subset does. Randomly selected clients often fail to reflect the global data distribution, which reduces both the training efficiency and the accuracy of the global model. To address this, ChFL, a client selection method based on local model quality, is proposed. The main factors that affect model accuracy and convergence speed are analyzed, and two indicators that reflect client model quality are extracted: loss value and training time. Local loss values and training time are then modeled jointly to evaluate the quality of each client model. On this basis, client selection is guided by client quality and combined with a random selection strategy in a certain proportion to improve the accuracy of the global model. Selecting clients with high-quality data and better computational performance improves model accuracy and accelerates convergence. Experimental results on the FEMNIST, CIFAR-10, MNIST, CINIC-10, and EMNIST datasets show that, for the three baseline algorithms (FedAvg, FedProx, and FedNova), combining ChFL with a baseline speeds up convergence by about 10% on average and improves accuracy by about 4 percentage points.

Key words: federated learning, heterogeneous data, loss value, training time, client selection
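
The selection idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the ChFL scoring formula, so the min-max normalization, the weighted sum, the weight `alpha`, the parameter `random_fraction`, and the function names are all hypothetical. The sketch also assumes that a higher local loss marks a more informative client and that a shorter training time marks a faster one; the paper may define quality differently.

```python
import random
from typing import Dict, List, Optional


def min_max(values: Dict[int, float]) -> Dict[int, float]:
    """Scale values to [0, 1]; return all zeros if every value is equal."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    if span == 0:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / span for k, v in values.items()}


def quality_scores(losses: Dict[int, float],
                   times: Dict[int, float],
                   alpha: float = 0.5) -> Dict[int, float]:
    """Hypothetical quality score (an assumption, not the paper's formula).

    Combines normalized local loss with normalized training time:
    higher loss and shorter time both raise the score.
    """
    norm_loss = min_max(losses)
    norm_time = min_max(times)
    return {c: alpha * norm_loss[c] + (1.0 - alpha) * (1.0 - norm_time[c])
            for c in losses}


def select_clients(losses: Dict[int, float],
                   times: Dict[int, float],
                   k: int,
                   random_fraction: float = 0.2,
                   alpha: float = 0.5,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Pick k clients: most by quality score, the rest uniformly at random."""
    if k > len(losses):
        raise ValueError("k exceeds the number of clients")
    rng = rng or random.Random()
    scores = quality_scores(losses, times, alpha)
    n_random = int(round(k * random_fraction))
    n_top = k - n_random
    # Rank by descending score; break ties by client id for determinism.
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    chosen = ranked[:n_top]
    # Fill the remaining slots at random from the clients not yet chosen.
    chosen += rng.sample(ranked[n_top:], n_random)
    return chosen
```

For example, `select_clients(losses, times, k=10, random_fraction=0.2)` would take eight clients by score and two at random. Keeping a random share gives low-scoring clients some chance to participate, which is one plausible reading of why the abstract combines quality-guided and random selection.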


CLC Number: TP181

WEN Yilin, ZHAO Nailiang, ZENG Yan, HAN Meng, YUE Lupeng, ZHANG Jilin. Client Selection Method Based on Local Model Quality[J]. Computer Engineering, 2023, 49(6): 131-143.



URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0065658

http://www.ecice06.com/EN/Y2023/V49/I6/131


