Hierarchical Federated Learning Algorithm Based on EMD Optimal Matching

doi:10.19678/j.issn.1000-3428.0068389

Abstract

Federated learning allows multiple clients to cooperatively train a high-performance global model without sharing private data. In horizontal federated learning for cross-silo scenarios, statistical heterogeneity in the clients' local data distributions degrades the performance of the global model. This paper proposes a new hybrid federated learning algorithm, FedAvg-Match, to improve global model performance without sacrificing client privacy or increasing computing costs. Its basic idea is to improve the quality of client models by improving the federated learning algorithm. The algorithm targets data heterogeneity characterized by unbalanced label distributions. Under a hierarchical federated learning framework, it uses a client group aggregation algorithm to reduce the impact of client data heterogeneity on model performance. To solve the optimal client grouping problem, a client-matching algorithm based on Dynamic Programming (DP), DP-ClientMatch, is designed. It determines the optimal client group matching from the Earth Mover's Distance (EMD) between client data distributions. Experiments were run on three datasets: MNIST, Fashion-MNIST, and CIFAR-10. Compared with other federated learning algorithms, FedAvg-Match significantly improves global model performance on image classification tasks. In federated learning scenarios with high statistical heterogeneity, it raises the global model's test accuracy by at least 10 percentage points.

Key words: federated learning, non-Independent and Identically Distributed (non-IID) data, optimal matching, EMD optimal matching, model quality
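The abstract describes grouping clients by the EMD between their label distributions and solving the grouping with dynamic programming. It does not give the EMD ground distance or the exact DP formulation of DP-ClientMatch. The sketch below is therefore a hypothetical stand-in and not the authors' algorithm. It rests on three assumptions:

- a 0-1 ground distance between class labels, under which the EMD equals the total variation distance (half the L1 distance between class-proportion vectors);
- groups of exactly two clients;
- an objective of minimizing the summed EMD between each pair's merged label distribution and the global label distribution.

The names `label_distribution`, `emd_01`, and `best_pairing` are illustrative.

```python
from functools import lru_cache

import numpy as np


def label_distribution(counts):
    """Normalize per-class sample counts into a probability vector."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()


def emd_01(p, q):
    """EMD between two label distributions, ASSUMING a 0-1 ground distance
    (moving mass between any two distinct classes costs 1). Under that metric
    the EMD reduces to the total variation distance 0.5 * ||p - q||_1."""
    return 0.5 * float(np.abs(np.asarray(p) - np.asarray(q)).sum())


def best_pairing(client_counts):
    """Hypothetical matching step (not the paper's DP-ClientMatch): split an even
    number of clients into pairs so that the summed EMD between each pair's merged
    label distribution and the global label distribution is minimal.
    Exact bitmask dynamic programming over subsets, O(2^n * n)."""
    counts = np.asarray(client_counts, dtype=float)  # shape: (clients, classes)
    n = len(counts)
    assert n % 2 == 0, "this sketch forms pairs, so the client count must be even"
    global_dist = label_distribution(counts.sum(axis=0))

    def pair_cost(i, j):
        # EMD of the pair's pooled label distribution to the global one
        return emd_01(label_distribution(counts[i] + counts[j]), global_dist)

    @lru_cache(maxsize=None)
    def solve(mask):
        # mask: bit k set means client k has already been matched
        if mask == (1 << n) - 1:
            return 0.0, ()
        # always match the lowest-index free client first to avoid duplicate states
        i = next(k for k in range(n) if not mask & (1 << k))
        best = (float("inf"), ())
        for j in range(i + 1, n):
            if not mask & (1 << j):
                rest_cost, rest_pairs = solve(mask | (1 << i) | (1 << j))
                cost = rest_cost + pair_cost(i, j)
                if cost < best[0]:
                    best = (cost, ((i, j),) + rest_pairs)
        return best

    return solve(0)
```

Consider two-class clients with counts `[[10, 0], [10, 0], [0, 10], [0, 10]]`. The sketch pairs each single-class client with a client holding the other class and returns `(0.0, ((0, 2), (1, 3)))`. Because the DP enumerates subsets, it is only practical for a small number of clients. That may be acceptable in cross-silo settings but would need a different solver at larger scale.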

WU Xiaohong, LI Pei, GU Yonggen, TAO Jie. Hierarchical Federated Learning Algorithm Based on EMD Optimal Matching[J]. Computer Engineering, 2025, 51(2): 170-178.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0068389

https://www.ecice06.com/EN/Y2025/V51/I2/170

Figures/Tables 8

Fig.1 Framework of hybrid federated learning

Fig.2 Schematic diagram of client grouping

Fig.3 Client data distribution

Fig.4 Average local model loss versus global communication rounds under the Dir(0.2) scenario on different datasets

Fig.5 Learning curves versus global communication rounds under the Dir(0.2) scenario on different datasets

Fig.6 Learning curves versus global communication rounds under the D(1) scenario on different datasets
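Figs. 4 and 5 refer to a Dir(0.2) scenario that the captions do not define. In non-IID federated learning experiments this notation commonly denotes a Dirichlet label-skew partition with concentration α = 0.2. The sketch below assumes the per-class variant: for each class, the share of its samples given to each client is drawn from a symmetric Dirichlet distribution. This is an assumption about the experimental setup, not the authors' partitioning code. The D(1) setting in Fig. 6 is not defined here and is not modeled. `dirichlet_label_partition` and `label_counts` are hypothetical names.

```python
import numpy as np


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """ASSUMED Dirichlet label-skew split: for every class, the proportions of its
    samples sent to each client are drawn from Dir(alpha, ..., alpha).
    Smaller alpha (e.g. 0.2) concentrates each class on fewer clients."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        proportions = rng.dirichlet(np.full(num_clients, alpha))
        # cumulative cut points; flooring means the last client absorbs rounding
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, chunk in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(chunk.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


def label_counts(labels, partition, num_classes):
    """Per-client class histogram, one row per client."""
    labels = np.asarray(labels)
    return np.stack(
        [np.bincount(labels[ix], minlength=num_classes) for ix in partition]
    )
```

Every sample is assigned to exactly one client, so each column of `label_counts` sums to that class's total. Each row is a client's per-class count vector, which is the kind of label distribution that an EMD-based comparison of clients operates on.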
