Abstract: The problem of mining item pairs in a relational database against a minimum correlation threshold is transformed into the problem of mining the Top-K strongly correlated item pairs. Using the structural information of the relational database, an appropriate threshold is estimated effectively, and a Top-K strongly correlated item pair mining algorithm based on threshold estimation is proposed. The algorithm is derived theoretically through a series of theorems and their proofs. Simulation experiments are conducted on an independently developed simulation platform and two authoritative database samples. Theoretical analysis and simulation results show that the algorithm obtains the mining results efficiently and quickly.
Key words: data mining, threshold, strongly correlated item pairs, relational database, Pearson's correlation coefficient
LI Qiang, ZHANG Yong-shi. Mining Algorithm of Strongly Correlated Item Pairs Based on Threshold Estimation[J]. Computer Engineering, 2010, 36(8): 58-59.
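The abstract does not spell out how the threshold is estimated. As a point of reference only, the minimal Python sketch below shows one generic way to mine Top-K correlated item pairs with a self-adjusting threshold. It treats each item as a binary column, so the Pearson correlation of two items reduces to the phi coefficient, and it uses the K-th best value found so far as the current threshold. Pairs are pruned with the standard support-based upper bound sqrt((s_lo/s_hi)(1-s_hi)/(1-s_lo)), which follows from supp(AB) <= min(supp(A), supp(B)). This is not the authors' threshold-estimation algorithm. The function names `phi`, `phi_upper_bound`, and `top_k_pairs` and the list-of-sets input format are hypothetical.

```python
import heapq
import math
from itertools import combinations

# Hypothetical illustration of Top-K correlated pair mining; NOT the paper's method.


def phi(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """Pearson (phi) correlation of two binary items from co-occurrence counts."""
    pa, pb, pab = n_a / n, n_b / n, n_ab / n
    return (pab - pa * pb) / math.sqrt(pa * (1 - pa) * pb * (1 - pb))


def phi_upper_bound(n_a: int, n_b: int, n: int) -> float:
    """Upper bound on phi obtained by setting supp(AB) = min(supp(A), supp(B))."""
    hi, lo = max(n_a, n_b) / n, min(n_a, n_b) / n
    return math.sqrt((lo / hi) * ((1 - hi) / (1 - lo)))


def top_k_pairs(transactions: list[set[str]], k: int) -> list[tuple[float, str, str]]:
    """Return the k item pairs with the largest phi, best first."""
    if k <= 0:
        return []
    n = len(transactions)
    counts: dict[str, int] = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    # phi is undefined for items that occur in every row (zero variance).
    items = sorted(i for i, c in counts.items() if 0 < c < n)

    heap: list[tuple[float, str, str]] = []  # min-heap holding the k best pairs so far
    for a, b in combinations(items, 2):
        # Once the heap is full, heap[0][0] (the k-th best phi) is the threshold.
        if len(heap) == k and phi_upper_bound(counts[a], counts[b], n) <= heap[0][0]:
            continue  # this pair cannot beat the threshold, so skip counting it
        n_ab = sum(1 for t in transactions if a in t and b in t)
        r = phi(n_ab, counts[a], counts[b], n)
        if len(heap) < k:
            heapq.heappush(heap, (r, a, b))
        elif r > heap[0][0]:
            heapq.heapreplace(heap, (r, a, b))
    return sorted(heap, reverse=True)
```

For example, `top_k_pairs([{"a", "b"}, {"a", "b"}, {"c"}, {"c"}], 1)` returns `[(1.0, 'a', 'b')]`. The pruning never discards a pair that could enter the result, because a pair's correlation cannot exceed its bound. A higher threshold early on lets more pairs be skipped, which is why a good initial threshold matters.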