Computer Engineering ›› 2025, Vol. 51 ›› Issue (2): 170-178. doi: 10.19678/j.issn.1000-3428.0068389

• Cyberspace Security •

Hierarchical Federated Learning Algorithm Based on EMD Optimal Matching

WU Xiaohong1,2, LI Pei1, GU Yonggen1,2,*, TAO Jie1,2

  1. School of Information Engineering, Huzhou University, Huzhou 313000, Zhejiang, China
  2. Zhejiang Key Laboratory of Intelligent Management and Application of Modern Agricultural Resources, Huzhou University, Huzhou 313000, Zhejiang, China
  • Received: 2023-09-17 Online: 2025-02-15 Published: 2025-02-28
  • Contact: GU Yonggen

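  • Supported by:
    National Natural Science Foundation of China, Youth Science Fund Project (61906066); National Natural Science Foundation of China, Youth Science Fund Project (2022ZD2002)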

Abstract:

Federated learning allows multiple clients to collaboratively train a high-performance global model without sharing private data. In cross-silo horizontal federated learning, statistical heterogeneity in the clients' local data distributions degrades the performance of the global model. To improve global model performance without sacrificing client privacy or increasing computational cost, this paper proposes a new hybrid federated learning algorithm, FedAvg-Match, whose basic idea is to improve the quality of client models by improving the federated learning algorithm itself. Targeting data heterogeneity characterized by unbalanced label distributions, a client group aggregation algorithm is designed within a hierarchical federated learning framework to reduce the impact of client data heterogeneity on model performance. To solve the optimal client grouping problem, a client-matching algorithm based on dynamic programming, DP-ClientMatch, is designed; it determines the optimal matching of clients into groups according to the Earth Mover's Distance (EMD) of the clients' data distributions. Experimental results on three datasets, MNIST, Fashion-MNIST, and CIFAR-10, show that, compared with other federated learning algorithms, FedAvg-Match significantly improves the performance of the global model on image classification tasks. In federated learning scenarios with high statistical heterogeneity, the test accuracy of the global model improves by at least 10 percentage points.
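
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the paper's DP-ClientMatch, whose details the abstract does not give. It rests on several labeled assumptions: groups are pairs, the distance is the L1 difference between a group's pooled label distribution and the overall label distribution (a form often used as EMD for categorical labels in non-IID federated learning work), and the optimal pairing is found with a bitmask dynamic program. The function names `label_counts`, `group_distance`, and `match_pairs` are hypothetical.

```python
from functools import lru_cache

import numpy as np


def label_counts(labels, num_classes):
    """Per-class sample counts for one client's local labels."""
    return np.bincount(labels, minlength=num_classes)


def group_distance(counts_list, target):
    """L1 distance between a group's pooled label distribution and `target`.

    Assumption: this L1 form stands in for EMD over categorical labels;
    the paper's exact definition may differ.
    """
    pooled = np.sum(counts_list, axis=0).astype(float)
    return float(np.abs(pooled / pooled.sum() - target).sum())


def match_pairs(client_counts):
    """Split an even number of clients into pairs with minimal total distance.

    Assumption: groups of size two and a bitmask dynamic program over the
    set of already matched clients. Exponential in the number of clients,
    so suitable only for small illustrative inputs.
    """
    n = len(client_counts)
    if n % 2 != 0:
        raise ValueError("this sketch needs an even number of clients")
    total = np.sum(client_counts, axis=0).astype(float)
    target = total / total.sum()  # overall label distribution
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def best(mask):
        # `mask` has bit k set when client k is already matched.
        if mask == full:
            return 0.0, ()
        i = next(k for k in range(n) if not (mask >> k) & 1)
        best_cost, best_pairs = float("inf"), ()
        for j in range(i + 1, n):
            if (mask >> j) & 1:
                continue
            cost = group_distance([client_counts[i], client_counts[j]], target)
            rest_cost, rest_pairs = best(mask | (1 << i) | (1 << j))
            if cost + rest_cost < best_cost:
                best_cost = cost + rest_cost
                best_pairs = ((i, j),) + rest_pairs
        return best_cost, best_pairs

    return best(0)
```

For example, with four clients whose two-class label counts are `[10, 0]`, `[10, 0]`, `[0, 10]`, and `[0, 10]`, `match_pairs` pairs each single-class client with a client holding the other class, so every pooled group matches the balanced overall distribution.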

Key words: federated learning, non-Independent and Identically Distributed (non-IID) data, optimal matching, EMD optimal matching, model quality
