
Computer Engineering ›› 2023, Vol. 49 ›› Issue (7): 110-117. doi: 10.19678/j.issn.1000-3428.0064016

• Artificial Intelligence and Pattern Recognition •

Federated Learning Architecture for Non-IID Data

Tianchen QIU1,2, Xiaoying ZHENG1,2, Yongxin ZHU1,2,*, Songlin FENG1,2

  1. Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2022-02-23 Published: 2023-07-15 Online: 2022-08-19
  • Contact: Yongxin ZHU
  • About the authors:

    Tianchen QIU (born 1998), male, master's student; main research interest: federated learning

    Xiaoying ZHENG, associate researcher, Ph.D.

    Songlin FENG, researcher, Ph.D.

  • Supported by:
    Joint Fund of the National Natural Science Foundation of China (U2032125); National Key Research and Development Program of China (2019YFB2204204)

Abstract:

In federated learning scenarios involving an ultra-large number of edge devices, participants' local data are non-independent and identically distributed (non-IID), which leaves the overall training data imbalanced and makes poisoning attacks difficult to defend against. The prior knowledge required by most methods for improving data balance in supervised learning conflicts with the privacy protection principle of federated learning. Furthermore, existing defense algorithms against poisoning attacks in non-IID scenarios are overly complex or violate data privacy. This study proposes FedFog, a multi-server architecture that clusters participants with similar data distributions without disclosing their local data distributions, thereby converting the non-IID training data into multiple IID data subsets. Based on the cluster centers, the global server calculates the weights that the features extracted from each category of data receive in the global model update, which alleviates the negative impact of the overall training data imbalance. In addition, FedFog moves the poisoning attack defense task from the full set of participants into each individual cluster, thereby solving the poisoning attack defense problem. Experimental results show that, when the overall training data are imbalanced, FedFog improves global model accuracy by up to 4.2 percentage points over FedSGD; when the overall data are balanced but 1/3 of the participants are poisoning attackers, the convergence of FedFog is close to that of FedSGD in the attack-free scenario.

Key words: non-independent and identically distributed (non-IID), privacy protection, clustering, data balance, poisoning attack defense
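
The two-level aggregation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how clusters are formed, which defense rule runs inside a cluster, or how the global weights are computed. The sketch therefore assumes that cluster assignments are already available (`cluster_ids`), uses a coordinate-wise median as a stand-in defense within each cluster, and defaults to equal weight per cluster as a stand-in for the rebalancing weights. The names `aggregate_cluster`, `aggregate_global`, and `cluster_weights` are hypothetical.

```python
import numpy as np


def aggregate_cluster(updates):
    """Coordinate-wise median of the updates inside one cluster.

    Assumption: the median is a stand-in robust rule; the abstract does
    not say which defense FedFog applies within a cluster.
    """
    return np.median(np.asarray(updates, dtype=float), axis=0)


def aggregate_global(updates, cluster_ids, cluster_weights=None):
    """Combine per-cluster aggregates into one global update.

    Assumption: cluster_ids are given (the clustering step itself is not
    sketched here), and equal weight per cluster is a hypothetical
    default for the rebalancing weights.
    """
    updates = np.asarray(updates, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    clusters = np.unique(cluster_ids).tolist()
    if cluster_weights is None:
        cluster_weights = {c: 1.0 for c in clusters}
    # Defend inside each cluster first, where honest updates look alike.
    per_cluster = np.stack(
        [aggregate_cluster(updates[cluster_ids == c]) for c in clusters]
    )
    w = np.array([cluster_weights[c] for c in clusters], dtype=float)
    w = w / w.sum()  # normalise so the weights sum to 1
    return w @ per_cluster


# Toy data: two clusters of three participants, one poisoner in each.
updates = [
    [1.0, 1.0], [1.0, 1.0], [50.0, -50.0],   # cluster 0
    [3.0, 3.0], [3.0, 3.0], [50.0, -50.0],   # cluster 1
]
cluster_ids = [0, 0, 0, 1, 1, 1]

global_update = aggregate_global(updates, cluster_ids)
naive_update = np.mean(np.asarray(updates), axis=0)  # plain averaging, no defense
print("clustered:", global_update)
print("naive mean:", naive_update)
```

In this toy setting the honest updates inside a cluster agree with each other, so a single poisoned update per cluster is discarded by the median, whereas the plain mean over all participants is pulled far from every honest update. Passing an explicit `cluster_weights` dictionary shows how a global server could emphasise or de-emphasise particular clusters.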