
计算机工程 (Computer Engineering), 2026, Vol. 52, Issue 5: 129-138. doi: 10.19678/j.issn.1000-3428.0070288

• Computational Intelligence and Pattern Recognition •

Federated Learning on Long-Tail Data Combining Rotational Self-Supervision and CLIP Guidance

LIU Haijun1, FU Xiaodong1,2,*

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
    2. Yunnan Key Laboratory of Computer Technology Application, Kunming University of Science and Technology, Kunming 650500, Yunnan, China
  • Received: 2024-08-26; Revised: 2024-11-21; Online: 2026-05-15; Published: 2025-01-03
  • Corresponding author: FU Xiaodong
  • About the authors:

    LIU Haijun (CCF student member), male, M.S. candidate; his research interests include federated learning and long-tail data.

    FU Xiaodong (corresponding author), Ph.D., professor, and doctoral supervisor.

  • Funding:
    National Natural Science Foundation of China (62362043); Yunnan "Xingdian Talent Support Program" (KKXY202203008); Yunnan Provincial Science and Technology Program (202205AF150003, 202204BQ040010, 202102AD080002)



Abstract:

Real-world data often follow a long-tail distribution, and federated learning methods that assume a balanced global data distribution struggle to classify the tail classes of such data accurately. Existing studies mitigate the impact of long-tail data by retraining a balanced classifier for the global model. However, this approach considers neither balancing the model's feature extractor nor how to enable the feature extractor to learn high-quality image features, leading to poor global-model performance. To let the model learn high-quality image features without bias during the feature-learning stage, this study proposes a federated learning method combining rotational self-supervision and Contrastive Language-Image Pre-training (CLIP) guidance. The method uses rotational self-supervision to guide the training of local client models, reducing the impact of long-tail data on the client models and enabling them to learn high-quality image features. At the same time, CLIP is used to guide both the model's standard training and its training on the rotated images, transferring CLIP's rich knowledge to the client models and further improving the feature extractor. In experiments on the CIFAR-10 and CIFAR-100 datasets under different long-tail distributions, the proposed method improves the global model's classification accuracy by 2.35 to 4.72 percentage points compared with existing federated learning methods.
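
As a rough illustration of the two training signals described in the abstract, the following PyTorch-style sketch combines a four-way rotation pretext loss with CLIP-guided distillation on both the original and rotated images. All names (rotate_batch, rot_head, clip_text_features) and the loss weights are illustrative assumptions, not the paper's actual code or hyperparameters.

    import torch
    import torch.nn.functional as F

    def rotate_batch(images):
        # Four-way rotation pretext task: rotate each image by 0/90/180/270
        # degrees and label it with the index of the rotation applied.
        rotations = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
        rot_images = torch.cat(rotations, dim=0)
        rot_labels = torch.arange(4, device=images.device).repeat_interleave(images.size(0))
        return rot_images, rot_labels

    def client_loss(backbone, cls_head, rot_head, clip_image_encoder,
                    clip_text_features, images, labels,
                    tau=0.07, lam_rot=0.5, lam_kd=0.5):
        # (1) Supervised loss on the original images.
        feats = backbone(images)
        ce_loss = F.cross_entropy(cls_head(feats), labels)

        # (2) Rotation self-supervision: predict which rotation was applied.
        rot_images, rot_labels = rotate_batch(images)
        rot_feats = backbone(rot_images)
        rot_loss = F.cross_entropy(rot_head(rot_feats), rot_labels)

        # (3) CLIP guidance on both original and rotated images: distill
        # CLIP's zero-shot logits into the client classifier via KL divergence.
        # Assumes images are already sized/normalized for the CLIP encoder.
        with torch.no_grad():
            clip_feats = clip_image_encoder(torch.cat([images, rot_images], dim=0))
            clip_feats = F.normalize(clip_feats, dim=-1)
            teacher_logits = clip_feats @ clip_text_features.t() / tau
        student_logits = torch.cat([cls_head(feats), cls_head(rot_feats)], dim=0)
        kd_loss = F.kl_div(F.log_softmax(student_logits / tau, dim=-1),
                           F.softmax(teacher_logits, dim=-1),
                           reduction="batchmean")

        return ce_loss + lam_rot * rot_loss + lam_kd * kd_loss

The distillation term assumes clip_text_features are pre-computed, L2-normalized CLIP text embeddings of the class-name prompts, so CLIP's zero-shot logits act as a soft teacher for both the original and rotated views.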

Key words: federated learning, long-tailed distribution, heterogeneous data, self-supervised learning, Contrastive Language-Image Pre-training (CLIP)