
Computer Engineering, 2012, Vol. 38, Issue (3): 12-13,18. doi: 10.3969/j.issn.1000-3428.2012.03.005

• Doctoral Dissertation •

Privacy-preserving Classification Mining Based on Probability Theory

LI Guang, WANG Ya-dong, SU Xiao-hong

  1. (Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China)
  • Received: 2011-07-25 Online: 2012-02-05 Published: 2012-02-05
  • About the authors: LI Guang (born 1982), male, Ph.D. candidate, main research interests: data mining and bioinformatics; WANG Ya-dong and SU Xiao-hong, professors and doctoral supervisors
  • Funding:
    National High Technology Research and Development Program of China ("863" Program) (2007AA02Z329)

Abstract: In existing privacy-preserving classification mining algorithms based on data perturbation, the perturbed data remain correlated with the original data, so private data are not fully protected. In addition, the perturbation algorithm and the classification algorithm are tightly coupled, which makes these methods impractical. To address these problems, this paper proposes a privacy-preserving classification mining algorithm based on probability theory. Perturbation yields a data set that is independent of the original data and identically distributed with it, so the perturbed data are no longer related to the original data, and any classification algorithm can be applied to the perturbed data directly.

Key words: data mining, privacy protection, data perturbation, random noise, decision tree
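
The abstract does not say how the distribution of the original data is obtained, so the following is only a rough sketch of the general idea (release freshly sampled records instead of the originals), not the authors' algorithm. It assumes numeric attributes modeled as independent Gaussians within each class; the function name `perturb` and the toy data are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def perturb(records, labels, seed=None):
    """Return synthetic (records, labels) sampled from per-class Gaussian fits.

    Assumption (not from the paper): every attribute is numeric and is
    modeled as an independent Gaussian within each class.
    """
    rng = random.Random(seed)

    # Group the original rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(records, labels):
        by_class[label].append(row)

    new_records, new_labels = [], []
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [statistics.fmean(col) for col in columns]
        # Population standard deviation; it is 0.0 for a single row.
        stdevs = [statistics.pstdev(col) for col in columns]
        # Draw as many synthetic rows as the class originally had.
        for _ in rows:
            new_records.append([rng.gauss(m, s) for m, s in zip(means, stdevs)])
            new_labels.append(label)
    return new_records, new_labels


# Toy data, invented for illustration only.
original = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.5, 6.2]]
classes = ["a", "a", "b", "b"]
synthetic, synthetic_classes = perturb(original, classes, seed=0)
```

Because `synthetic` has the same shape and label set as `original`, an off-the-shelf classifier such as a decision tree could be trained on it without modification, which is the decoupling the abstract describes. How closely the synthetic data follow the original distribution depends entirely on the assumed model.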

CLC number: