Computer Engineering ›› 2008, Vol. 34 ›› Issue (1): 14-16. doi: 10.3969/j.issn.1000-3428.2008.01.005

• Degree Paper •

Privacy-preserving Bayesian Network Learning on Horizontally Partitioned Data with Missing Values

WANG Hong-mei1,2, ZENG Yuan1, ZHAO Zheng3   

  1. School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072; 2. Tianjin Nankai Guard Company of Limited Liability, Tianjin 300457; 3. School of Computer Science and Technology, Tianjin University, Tianjin 300072
  • Received:1900-01-01 Revised:1900-01-01 Online:2008-01-05 Published:2008-01-05

Abstract: Privacy concerns can prevent parties from sharing their data. This paper proposes PPHI-EM, a privacy-preserving EM method for learning Bayesian networks from horizontally partitioned data with missing values, which lets the parties learn a joint model without disclosing their data. The method replaces the sufficient statistics that do not exist because of missing values with their expected values, so that the scoring function decomposes across the parties, and, building on AMS-EM, it iteratively improves the network structure until convergence. The intersection and union of the parties' directed edges are computed with a secure directed-edge statistics algorithm based on Pohlig-Hellman encryption. The intersection serves as the initial structure, and each edge that is in the union but not in the intersection is then added one at a time; whether the edge is retained depends on the value of the scoring function. The method alternates between iterations that optimize the parameters of the current candidate model and iterations that search for a different model. The parameters of the current structure are computed with a secure matrix summation algorithm using suitably chosen weights. Experimental results show that the method is effective.
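
The edge intersection and union step can be illustrated with a minimal two-party sketch. It relies only on the commutativity of the Pohlig-Hellman exponentiation cipher (encrypting with key a and then key b gives the same result as b and then a), so doubly encrypted edges can be compared without exchanging plaintext edges. The modulus `P`, the hashing of edges, and the function names `keygen`, `encode_edge`, `encrypt`, and `edge_intersection_and_union` are assumptions made for illustration and are not taken from the paper, whose actual protocol may differ.

```python
import hashlib
import math
import secrets

# Assumption: a public prime modulus shared by all parties. 2**127 - 1 is a
# Mersenne prime and keeps the sketch short; it is not a recommended parameter.
P = 2**127 - 1


def keygen():
    """Pick a secret exponent coprime to P - 1 so that encryption is invertible."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def encode_edge(edge):
    """Hash a directed edge (parent, child) to an integer in [1, P - 1]."""
    digest = hashlib.sha256(f"{edge[0]}->{edge[1]}".encode()).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(value, key):
    """Pohlig-Hellman exponentiation cipher: value**key mod P."""
    return pow(value, key, P)


def edge_intersection_and_union(edges_a, edges_b):
    """Compare doubly encrypted edges instead of plaintext edges.

    For brevity both parties are simulated in one process; in a real protocol
    each party would hold only its own key and its own ciphertext-to-edge map.
    """
    key_a, key_b = keygen(), keygen()
    # Each party encrypts its own edges, then the other party adds its layer.
    double_a = {encrypt(encrypt(encode_edge(e), key_a), key_b): e for e in edges_a}
    double_b = {encrypt(encrypt(encode_edge(e), key_b), key_a): e for e in edges_b}
    common = double_a.keys() & double_b.keys()
    intersection = {double_a[c] for c in common}
    union = set(double_a.values()) | set(double_b.values())
    return intersection, union
```

In this sketch the intersection would play the role of the initial structure, and the edges in `union - intersection` would be the candidates that are added one at a time and kept or dropped according to the score.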

Key words: privacy-preserving data mining, Bayesian network, distributed database, secure multiparty computation
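
The weighted parameter aggregation mentioned in the abstract can be pictured with a generic masked ring sum, a standard secure-sum construction that is used here as a stand-in because the abstract does not spell out the paper's own secure matrix summation algorithm. The initiator hides its input behind a random mask, each party adds its weighted local matrix to the running total, and the initiator removes the mask at the end. The names `MODULUS`, `SCALE`, `to_fixed`, and `secure_matrix_sum`, the fixed-point encoding, and the restriction to non-negative entries are all assumptions of this sketch.

```python
import secrets

MODULUS = 2**64  # Assumption: all fixed-point totals stay well below this value.
SCALE = 10**6    # Assumption: six decimal digits of fixed-point precision.


def to_fixed(matrix, weight):
    """Scale a party's local matrix by its weight and round to fixed point.

    Assumes non-negative entries, e.g. expected counts.
    """
    return [[round(weight * x * SCALE) % MODULUS for x in row] for row in matrix]


def secure_matrix_sum(local_matrices, weights):
    """Masked ring sum, simulated in one process.

    Intermediate totals are uniformly random modulo MODULUS because of the
    mask, so a party that sees the running total learns nothing from it alone.
    """
    rows, cols = len(local_matrices[0]), len(local_matrices[0][0])
    mask = [[secrets.randbelow(MODULUS) for _ in range(cols)] for _ in range(rows)]
    running = [row[:] for row in mask]
    for matrix, weight in zip(local_matrices, weights):
        share = to_fixed(matrix, weight)
        running = [
            [(r + s) % MODULUS for r, s in zip(run_row, share_row)]
            for run_row, share_row in zip(running, share)
        ]
    # Only the initiator knows the mask and can recover the true weighted sum.
    return [
        [((r - m) % MODULUS) / SCALE for r, m in zip(run_row, mask_row)]
        for run_row, mask_row in zip(running, mask)
    ]
```

The weights here are hypothetical; one plausible choice would be each party's share of the total number of records, so that the result is a weighted average of the local expected statistics.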
