
Computer Engineering ›› 2010, Vol. 36 ›› Issue (8): 58-59. doi: 10.3969/j.issn.1000-3428.2010.08.020

• Software Technology and Database •

Mining Algorithm of Strongly Correlated Item Pairs Based on Threshold Estimation

LI Qiang, ZHANG Yong-shi   

  1. (School of Computer Science and Technology, Harbin Engineering University, Harbin 150001)
  • Received:1900-01-01 Revised:1900-01-01 Online:2010-04-20 Published:2010-04-20

Abstract: The problem of mining a relational database with a minimum correlation threshold is transformed into the problem of mining the Top-K strongly correlated item pairs. Using the structural information of the relational database, an appropriate threshold is estimated, and a Top-K strongly correlated item pair mining algorithm based on threshold estimation is proposed. The algorithm is derived theoretically by proving a set of theorems, and experiments are conducted on an independently developed simulation platform with two authoritative database samples. The theoretical analysis and the simulation results show that the algorithm obtains mining results efficiently and quickly.

Key words: data mining, threshold, strongly correlated item pairs, relational database, Pearson’s correlation coefficient
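
To make the Top-K formulation concrete, the sketch below scores every item pair with Pearson's correlation coefficient for binary items (the phi coefficient) and keeps the K highest-scoring pairs. It is a brute-force baseline written for illustration only and does not reproduce the threshold estimation proposed in the paper. The function names `phi_coefficient` and `top_k_pairs`, and the representation of each record as a set of items, are assumptions rather than anything taken from the article.

```python
from itertools import combinations
from math import sqrt
import heapq


def phi_coefficient(n, n_a, n_b, n_ab):
    """Pearson correlation of two binary items, computed from support counts.

    n    : total number of transactions
    n_a  : transactions containing item a
    n_b  : transactions containing item b
    n_ab : transactions containing both a and b
    """
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        # An item present in every transaction (or in none) has zero
        # variance, so the correlation is undefined; report 0.0 here.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def top_k_pairs(transactions, k):
    """Return the k item pairs with the highest phi coefficient.

    Brute force: every pair of observed items is scored (assumed baseline,
    not the pruning strategy of the paper).
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for t in transactions:
        items = sorted(set(t))
        for a in items:
            item_count[a] = item_count.get(a, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    scored = []
    for a, b in combinations(sorted(item_count), 2):
        n_ab = pair_count.get((a, b), 0)
        score = phi_coefficient(n, item_count[a], item_count[b], n_ab)
        scored.append((score, (a, b)))
    # nlargest returns (score, pair) tuples sorted by descending score.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    tx = [{"a", "b"}, {"a", "b"}, {"c"}, {"c"}]
    print(top_k_pairs(tx, 1))
```

In this formulation the score of the K-th returned pair plays the role of the minimum correlation threshold: running a threshold-based miner with that value would return the same pairs, apart from ties at the cutoff.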
