Privacy-preserving Classification Mining Based on Probability Theory

doi:10.3969/j.issn.1000-3428.2012.03.005

Computer Engineering, 2012, Vol. 38, Issue 3: 12-13, 18.


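LI Guang, WANG Ya-dong, SU Xiao-hong

(Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin 150001, China)

Received: 2011-07-25 Online: 2012-02-05 Published: 2012-02-05
About the authors: LI Guang (1982-), male, Ph.D. candidate; main research interests: data mining and bioinformatics. WANG Ya-dong and SU Xiao-hong: professors and Ph.D. supervisors.
Funding: National High Technology Research and Development Program of China ("863" Program) (2007AA02Z329)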


Abstract

In existing privacy-preserving classification mining algorithms based on data perturbation, the perturbed data remain correlated with the original data, so private data are not fully protected. In addition, the perturbation algorithm is tightly coupled to the classification algorithm, which makes these methods hard to use in practice. To overcome these shortcomings, a privacy-preserving classification mining algorithm based on probability theory is proposed. Perturbation produces a set of data that is independent of the original data and identically distributed with it, so the perturbed data are no longer correlated with the original records, and existing classification algorithms can be applied to the perturbed data directly.

Key words: data mining, privacy preserving, data perturbation, random noise, decision tree
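The abstract does not spell out the perturbation procedure. As a rough illustration of the general idea of releasing data drawn from an estimated distribution instead of noised copies of the original records, the following Python sketch fits class-conditional value frequencies to categorical data and samples a fresh labeled data set from them. Everything here is an assumption, not the authors' algorithm: the functions `fit_class_conditional_model` and `sample_perturbed_dataset` are hypothetical, attributes are assumed categorical and independent given the class label, and no formal privacy guarantee is claimed (the fitted frequencies still summarize the original data).

```python
import random
from collections import Counter, defaultdict


def fit_class_conditional_model(records, labels):
    """Count class frequencies and per-class value frequencies of each attribute.

    Hypothetical helper: assumes every attribute is categorical.
    """
    class_counts = Counter(labels)
    # attr_counts[c][j] counts the values attribute j takes among records of class c
    attr_counts = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(records, labels):
        for j, value in enumerate(row):
            attr_counts[c][j][value] += 1
    return class_counts, attr_counts


def sample_perturbed_dataset(records, labels, n, seed=None):
    """Draw n new labeled records from the fitted frequencies.

    Hypothetical sketch: the class is drawn from P(class), then each attribute
    is drawn independently from P(attribute | class) (a naive-Bayes-style
    assumption). Output records come from aggregate counts, not from any
    single original record.
    """
    rng = random.Random(seed)
    class_counts, attr_counts = fit_class_conditional_model(records, labels)
    classes = list(class_counts)
    class_weights = [class_counts[c] for c in classes]
    n_attrs = len(records[0])
    new_records, new_labels = [], []
    for _ in range(n):
        c = rng.choices(classes, weights=class_weights)[0]
        row = []
        for j in range(n_attrs):
            counts = attr_counts[c][j]
            values = list(counts)
            row.append(rng.choices(values, weights=[counts[v] for v in values])[0])
        new_records.append(tuple(row))
        new_labels.append(c)
    return new_records, new_labels


# Usage on a toy categorical data set (illustrative values only)
data = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
X_new, y_new = sample_perturbed_dataset(data, y, n=8, seed=0)
```

Because the output is an ordinary labeled data set, a classifier such as a decision tree can be trained on `X_new` and `y_new` without modification, which is the decoupling the abstract emphasizes. The conditional-independence assumption keeps the sketch short but discards correlations between attributes within a class, so accuracy on real data may be lower than with a model of the full joint distribution.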

CLC number: TP391.4

LI Guang, WANG Ya-dong, SU Xiao-hong. Privacy-preserving Classification Mining Based on Probability Theory[J]. Computer Engineering, 2012, 38(3): 12-13, 18.

http://www.ecice06.com/CN/Y2012/V38/I3/12
