Collaborative Computing of Privacy-Preserving Logistic Regression Based on Homomorphic Encryption

doi:10.19678/j.issn.1000-3428.0064391

Abstract

With the establishment and standardization of data exchange markets, a new demand has emerged for multiple parties to collaboratively train a machine learning model. Federated learning enables multiple data owners to jointly train a model, but it assumes that the model is both trained and shared by all participants. Existing federated learning frameworks therefore cannot be applied to scenarios in which the data owners and the model demander have different requirements and the model is jointly trained but not shared. This paper proposes a collaborative computing scheme for privacy-preserving logistic regression based on homomorphic encryption that does not depend on any third-party computing platform. The scheme consists of a multiparty collaborative computing framework, comprising multiple data owners, a model demander, and a key generator, together with an interactive collaborative computing process built on that framework. With this scheme, a model can be collaboratively trained without leaking model information or any party's private data. The security of the scheme is analyzed by establishing an attack model. A prototype system is implemented on a small computer cluster using Cheon-Kim-Kim-Song (CKKS), an advanced fully homomorphic encryption scheme for floating-point numbers. The prototype is optimized for both computation and communication, including early termination of the training process and offloading ciphertext homomorphic operations to the Graphics Processing Unit (GPU) to improve computational efficiency. The experimental results show that the computational optimizations speed up the system by approximately 50 times, and that the prototype system can satisfy practical requirements for small and medium-sized data sets.

Key words: data sharing, collaborative computing, privacy-preserving computing, homomorphic encryption, logistic regression
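To make two ideas from the abstract concrete, the following is a minimal plaintext sketch, not the authors' implementation. CKKS evaluates only additions and multiplications on ciphertexts, so homomorphic logistic regression typically replaces the sigmoid with a low-degree polynomial; the degree-3 coefficients below (a least-squares fit on [-8, 8]) are an assumption of this sketch, as are the synthetic data, the helper names `poly_sigmoid`, `make_data`, `train`, and `accuracy`, and the stopping rule. No encryption is performed; the code only shows arithmetic of the kind that could be carried out homomorphically, plus a simple early-termination check.

```python
import math
import random


def poly_sigmoid(z):
    """Degree-3 polynomial stand-in for the sigmoid (assumed coefficients).

    Uses only additions and multiplications, the operations CKKS supports.
    """
    return 0.5 + 0.15012 * z - 0.001593 * z ** 3


def make_data(n=500, seed=0):
    """Synthetic data: two features in [-1, 1], a constant bias term, noisy labels."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0, 0.0]  # hypothetical ground-truth weights
    X, y = [], []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]
        p = 1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(x, true_w))))
        X.append(x)
        y.append(1.0 if rng.random() < p else 0.0)
    return X, y


def train(X, y, lr=1.0, max_iters=1000, tol=1e-4):
    """Batch gradient descent that stops early once the weight update is below tol."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for it in range(1, max_iters + 1):
        grad = [0.0] * d
        for x, label in zip(X, y):
            err = poly_sigmoid(sum(a * b for a, b in zip(x, w))) - label
            for j in range(d):
                grad[j] += err * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
        # Early termination: the update has become negligible.
        if lr * math.sqrt(sum(g * g for g in grad)) < tol:
            return w, it
    return w, max_iters


def accuracy(X, y, w):
    """Fraction of samples whose predicted class (score > 0) matches the label."""
    hits = sum(
        (sum(a * b for a, b in zip(x, w)) > 0) == (label == 1.0)
        for x, label in zip(X, y)
    )
    return hits / len(X)


X, y = make_data()
w, iters = train(X, y)
print(f"stopped after {iters} iterations, training accuracy {accuracy(X, y, w):.2f}")
```

The polynomial is only a reasonable substitute while the inner product stays within the fitted interval, which is why the sketch keeps features in [-1, 1] and uses noisy labels. In an encrypted setting the stopping test could not be evaluated directly on ciphertexts by the computing party, so how and by whom early termination is decided is left open here.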


CLC Number: TP309.7

YANG Yuejia, HUA Bei, ZHONG Zhiwei, GAO Mi. Collaborative Computing of Privacy-Preserving Logistic Regression Based on Homomorphic Encryption[J]. Computer Engineering, 2023, 49(4): 23-31.

杨越佳, 华蓓, 钟志威, 高咪. 基于同态加密的隐私保护逻辑回归协同计算[J]. 计算机工程, 2023, 49(4): 23-31.


URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0064391

http://www.ecice06.com/EN/Y2023/V49/I4/23


