Computer Engineering, 2020, Vol. 46, Issue (1): 93-101. doi: 10.19678/j.issn.1000-3428.0053592

• Cyberspace Security •

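  • About the authors: LI Yuanhang (born 1993), male, master's student; main research interests: privacy protection, machine learning, and data mining. CHEN Xianlai, professor, Ph.D., doctoral supervisor. LIU Li, associate professor, M.S. AN Ying and LI Zhongmin, associate professors, Ph.D.
  • Funding:
    National Key Research and Development Program of China (2016YFC0901705); Natural Science Foundation of Hunan Province (2018JJ2534).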

Random Forest Algorithm for Differential Privacy Protection

LI Yuanhang(a,b), CHEN Xianlai(b,c), LIU Li(b,c), AN Ying(b,c), LI Zhongmin(b,c)

  1. a. School of Computer Science and Engineering; b. National Engineering Laboratory for Medical Big Data Application Technology; c. Information Security and Big Data Research Institute, Central South University, Changsha 410083, China
  • Received: 2019-01-07; Revised: 2019-03-07; Online: 2020-01-15; Published: 2019-03-15

Abstract: Privacy protection in data mining is a research hotspot in information security. To address classification under privacy protection requirements, this paper proposes RFDPP-Gini, a random forest algorithm with differential privacy protection. It combines random forests with differential privacy to improve classification accuracy while protecting private information. A CART classification tree serves as each decision tree in the forest, and the Laplace mechanism and the exponential mechanism are used to add noise and to select the best splitting feature. Experimental results show that RFDPP-Gini handles both discrete and continuous features. Its classification accuracy reaches up to 86.335% on the Adult dataset and 100% on the Mushroom dataset, and accuracy declines only slightly after noise is added.

Key words: privacy protection, differential privacy, random forest, decision tree, CART classification tree
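
The abstract names two standard differential privacy building blocks without giving the algorithm's details. As a minimal sketch, and not the authors' RFDPP-Gini implementation, the Python below shows how a Laplace mechanism can perturb a count and how an exponential mechanism can pick a split feature using Gini impurity as the score. All function names, the `sensitivity` parameter with its placeholder default, and the restriction to discrete features are assumptions made for illustration; how the paper divides the privacy budget or handles continuous features is not covered here.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(rows, labels, feature):
    """Weighted Gini impurity after partitioning on one discrete feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


def laplace_noise(scale, rng):
    """Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count, epsilon, rng):
    """Laplace mechanism for a counting query (sensitivity 1)."""
    return true_count + laplace_noise(1.0 / epsilon, rng)


def choose_split_feature(rows, labels, features, epsilon, rng, sensitivity=2.0):
    """Exponential mechanism over candidate features.

    Utility is the negative weighted Gini impurity. `sensitivity` is a
    hypothetical, caller-supplied bound on how much the utility can change
    between neighbouring datasets; the default is a placeholder only.
    """
    utilities = [-split_gini(rows, labels, f) for f in features]
    best = max(utilities)  # shift by the max for numerical stability
    weights = [
        math.exp(epsilon * (u - best) / (2.0 * sensitivity)) for u in utilities
    ]
    return rng.choices(features, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
    labels = [0, 0, 1, 1]
    print(choose_split_feature(rows, labels, ["a", "b"], epsilon=1.0, rng=rng))
    print(noisy_count(len(labels), epsilon=1.0, rng=rng))
```

With a large `epsilon` the selection concentrates on the lowest-impurity feature, and with a small `epsilon` it approaches a uniform draw, which is the usual trade-off between privacy and accuracy.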
