
Computer Engineering, 2025, Vol. 51, Issue (12): 68-81. doi: 10.19678/j.issn.1000-3428.0069639

• Hot Topics and Reviews •

Privacy-Preserving Logistic Regression Model Training Scheme Based on Homomorphic Encryption

MIAO Weijie1,2, WU Wenyuan1,2,*

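  1. Chongqing Key Laboratory of Secure Computing for Biology, Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing 400714, China
  2. Chongqing School, University of Chinese Academy of Sciences, Chongqing 400714, China
  • Received: 2024-03-22  Revised: 2024-06-25  Online: 2025-12-15  Published: 2024-08-20
  • Corresponding author: WU Wenyuan
  • Funding:
    National Key Research and Development Program of China (2020YFA0712300); Chongqing Academician-Led Science and Technology Innovation Guidance Project (2022YSZX-JCX0011CSTB); Chongqing Academician-Led Science and Technology Innovation Guidance Project (cstc2021yszx-jcyjX0004); Chongqing Academician-Led Science and Technology Innovation Guidance Project (CSTB2023YSZX-JCX0008); Chongqing Academician-Led Science and Technology Innovation Guidance Project (cstc2021jcyj-msxmX0821)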

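Abstract:

Logistic regression is widely used in big data applications to predict the probability that an event occurs. For a scenario in which two users participate and the data are horizontally partitioned, this study designs a logistic regression model training scheme based on the Cheon-Kim-Kim-Song (CKKS) encryption scheme. The scheme replaces gradient descent with Newton's method, which uses a second-order (quadratic) approximation, to reduce the number of training iterations. It solves for the Newton update direction with the conjugate gradient method, avoiding the ciphertext division that inverting the Hessian matrix would require. It also uses a two-party interactive design that introduces a small amount of interaction to avoid inversion in the ciphertext domain, reducing ciphertext-domain computation. In addition, a new encoding method reduces both the number of ciphertext multiplications and the communication overhead. Experimental results show that, with Newton's method, at most 3 iterations suffice on most datasets to reach accuracy comparable to that of existing privacy-preserving schemes after 5 to 7 iterations. The scheme also runs efficiently on datasets with high feature dimensions: for sample datasets with 60 and 112 dimensions, comparable existing schemes need 90 s and 165 s, respectively, to complete 5 iterations, whereas the proposed scheme needs only 8 s and 27 s. Its communication overhead is halved relative to the existing scheme, requiring only 30.8 Mb and 62.7 Mb to complete training, so it can meet the requirements of specific application scenarios.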

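Key words: privacy protection, Newton-conjugate gradient method, logistic regression, homomorphic encryption, Cheon-Kim-Kim-Song (CKKS) scheme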
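The abstract combines two numerical ideas: Newton's method for logistic regression, where the update direction is obtained with conjugate gradient (CG) instead of by inverting the Hessian, and horizontal partitioning, where the gradient and Hessian are sums of per-party terms. The plaintext Python sketch below illustrates only those two ideas, under stated assumptions. It contains no CKKS encryption, no interaction protocol, and none of the paper's encoding. The function names (`local_stats`, `hess_vec`, `conjugate_gradient`, `train_newton_cg`), the small ridge term `lam` that keeps the Hessian positive definite, and the zero starting point are hypothetical illustrative choices, not the authors' method. Each Newton step solves H·step = g, where g = Σ_k X_kᵀ(p_k − y_k) + lam·w and H = Σ_k X_kᵀ S_k X_k + lam·I. CG needs only Hessian-vector products, so H is never inverted or even formed.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^{-z}).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def local_stats(X, y, w):
    """One party's plaintext statistics over its own rows (hypothetical helper).

    Returns the local gradient X^T (p - y) and the per-row weights
    s_i = p_i (1 - p_i) that define the local Hessian X^T S X.
    """
    grad = [0.0] * len(w)
    s = []
    for xi, yi in zip(X, y):
        p = sigmoid(dot(xi, w))
        for j, xij in enumerate(xi):
            grad[j] += (p - yi) * xij
        s.append(p * (1.0 - p))
    return grad, s


def hess_vec(X, s, v):
    """Compute (X^T S X) v without forming the Hessian matrix."""
    out = [0.0] * len(v)
    for xi, si in zip(X, s):
        c = si * dot(xi, v)
        for j, xij in enumerate(xi):
            out[j] += c * xij
    return out


def conjugate_gradient(matvec, b, tol=1e-24):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(len(b)):  # at most d steps in exact arithmetic
        if rs <= tol:
            break
        ap = matvec(p)
        # CG still needs this scalar division. Under CKKS it would be a
        # ciphertext reciprocal, the kind of ciphertext-domain inversion the
        # abstract says its two-party interaction avoids (not modelled here).
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


def train_newton_cg(parties, d, newton_iters=3, lam=1e-4):
    """Newton-CG for L2-regularised logistic regression on row-split data.

    `parties` is a list of (X, y) pairs, one per data holder (horizontal
    partition). Only per-party sums are combined, so the result matches
    training on the pooled rows.
    """
    w = [0.0] * d
    for _ in range(newton_iters):
        stats = [local_stats(X, y, w) for X, y in parties]
        # Global gradient = sum of local gradients + ridge term lam * w.
        g = [lam * w[j] + sum(gr[j] for gr, _ in stats) for j in range(d)]

        def matvec(v):
            # Global Hessian-vector product = sum of local products + lam * v.
            out = [lam * vj for vj in v]
            for (X, _), (_, s) in zip(parties, stats):
                for j, hj in enumerate(hess_vec(X, s, v)):
                    out[j] += hj
            return out

        step = conjugate_gradient(matvec, g)  # solves H step = g
        w = [wj - sj for wj, sj in zip(w, step)]
    return w
```

As a usage example, `train_newton_cg([(X_a, y_a), (X_b, y_b)], d)` trains on two parties' row sets, where each row includes a leading 1.0 bias feature. The default of 3 Newton iterations only echoes the iteration count the abstract reports and is not a tuned value. Because only summed gradients and Hessian-vector products cross party boundaries, this plaintext sketch shows which quantities the parties would have to combine. It does not show how the paper protects those quantities.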