
Computer Engineering, 2022, Vol. 48, Issue (6): 107-114, 123. doi: 10.19678/j.issn.1000-3428.0061911

• Artificial Intelligence and Pattern Recognition •

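  • Author profiles: WANG Shufen (born 1982), female, associate professor, M.S., main research interests: edge computing and federated learning; ZHANG Zhe and MA Shiyao, master's students; CHEN Yuqiang, professor, Ph.D.; WU Yi (corresponding author), professor.
  • Funding: National Natural Science Foundation of China project "Research on Key Technologies for the Authentication of 3D Images Rendered by DIBR" (61702224).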

A Robust Semi-Supervised Federated Learning System

WANG Shufen¹, ZHANG Zhe², MA Shiyao², CHEN Yuqiang³, WU Yi²

  1. School of Information Engineering, Harbin Institute of Petroleum, Harbin 150028, China;
  2. School of Data Science and Technology, Heilongjiang University, Harbin 150080, China;
  3. School of Information and Communication Engineering, Guangzhou Maritime University, Guangzhou 510725, China
  • Received: 2021-06-15; Revised: 2021-07-23; Published: 2021-08-12


Abstract: Federated Learning (FL) allows edge devices or clients to collaboratively train a shared global model while keeping their data stored locally. Mainstream FL systems typically assume that clients' local data are labeled; in practice, however, client data generally lack ground-truth labels. Moreover, data availability and data heterogeneity are major challenges for FL systems. This paper designs a robust Semi-Supervised Federated Learning (SSFL) system for scenarios in which clients' local data are unlabeled. The FedMix method exploits the implicit relationships between successive global model iterations and learns the supervised model (trained on labeled data) and the unsupervised model (trained on unlabeled data) separately. The FedLoss aggregation method mitigates the impact of non-Independent and Identically Distributed (non-IID) data across clients on the convergence speed and stability of the global model by dynamically adjusting each local model's weight in the global model according to the loss function value of that client's model. Experimental results on the CIFAR-10 dataset show that the classification accuracy of the proposed system is approximately 3 percentage points higher than that of mainstream FL systems, and that the system is more robust to client data with different levels of non-IID-ness.
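The abstract says only that FedLoss adjusts each local model's weight according to its loss value; it does not give the formula or even the direction of the adjustment. As a minimal sketch under stated assumptions, the Python below converts client losses into aggregation weights with a softmax over the negative loss (so lower-loss clients count more) and then averages client parameters with those weights. The function names `loss_based_weights` and `aggregate`, the `temperature` parameter, the softmax rule, and the dict-of-lists parameter format are all hypothetical illustrations, not the authors' FedLoss implementation.

```python
import math
from typing import Dict, List

# Hypothetical model format: parameter name -> flat list of floats.
# Real systems would use framework tensors instead.
Params = Dict[str, List[float]]


def loss_based_weights(losses: List[float], temperature: float = 1.0) -> List[float]:
    """Map each client's reported loss to an aggregation weight.

    ASSUMPTION: lower loss -> larger weight, via softmax(-loss / temperature).
    The paper's actual FedLoss formula is not stated in the abstract.
    """
    if not losses:
        raise ValueError("need at least one client")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [-loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(client_params: List[Params], losses: List[float],
              temperature: float = 1.0) -> Params:
    """Weighted average of client parameters using the loss-based weights."""
    if len(client_params) != len(losses):
        raise ValueError("one loss value per client is required")
    weights = loss_based_weights(losses, temperature)
    global_params: Params = {}
    for name, values in client_params[0].items():
        # Element-wise weighted sum of this parameter across all clients.
        global_params[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(len(values))
        ]
    return global_params


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    client_losses = [0.5, 1.5]
    print(loss_based_weights(client_losses))
    print(aggregate(clients, client_losses))
```

With losses 0.5 and 1.5 and `temperature=1.0`, the lower-loss client receives about 73% of the weight. Equal losses reduce this rule to an unweighted average, whereas standard FedAvg weights clients by their local sample counts; raising `temperature` flattens the weights toward uniform.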

Key words: Federated Learning (FL), Semi-Supervised Federated Learning (SSFL), data heterogeneity, consistency loss, robustness
