
Computer Engineering, 2011, Vol. 37, Issue (5): 65-67, 70. doi: 10.3969/j.issn.1000-3428.2011.05.022

• Software Technology and Database •

Association Rules Mining Based on Probability Distribution and Dimensions Coding

WANG Sheng, DONG Li-gang, LI Qun

  1. (College of Information & Electronic Engineering, Zhejiang Gongshang University, Hangzhou 310018, China)
  • Online: 2011-03-05 Published: 2012-10-31
  • About the authors: WANG Sheng (1987-), male, undergraduate student, main research interests: data mining and network communication technology; DONG Li-gang, associate professor, PhD; LI Qun, M.S.
  • Funding:
    Science and Technology Program of Zhejiang Province (2009C31066, 2008C21093)


Abstract: This paper designs BF-Apriori, an improved Apriori algorithm based on binary numbers and the support distribution of items. The algorithm analyzes the probability distribution of the items, sorts the items in each itemset in descending order of probability, and applies dimension coding to turn them into binary numbers, which reduces the cost of reading and storing the transaction database. Slice operations and an effective pruning scheme are also used to reduce the time complexity of rule mining. Experimental results show that BF-Apriori reduces storage overhead by about 50% and execution time by more than 400% (that is, the baseline takes several times as long), improving both the storage efficiency and the computational speed of data mining.
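The abstract does not spell out the encoding scheme. The Python sketch below shows one plausible reading of "sort by probability, then dimension-code as binary numbers": items are ranked by descending support, and each transaction becomes a single integer with one bit per item dimension, so a subset test is one bitwise AND. The function names (`build_dimension_order`, `encode`, `support`), the sample transactions, the alphabetical tie-breaking, and the choice to give the most frequent item the highest-order bit are all assumptions for illustration, not the authors' BF-Apriori.

```python
from collections import Counter

def build_dimension_order(transactions):
    """Rank items (dimensions) by descending support; ties are broken by
    item name so the order is deterministic (an assumption)."""
    counts = Counter(item for t in transactions for item in set(t))
    return sorted(counts, key=lambda item: (-counts[item], item))

def encode(transactions, order):
    """Encode each transaction as one integer with one bit per dimension.

    Assumption: the most frequent item (order[0]) takes the highest-order bit,
    so reading a code left to right follows descending support.
    """
    n = len(order)
    bit = {item: 1 << (n - 1 - i) for i, item in enumerate(order)}
    codes = []
    for t in transactions:
        code = 0
        for item in t:
            code |= bit[item]
        codes.append(code)
    return codes, bit

def support(codes, bit, itemset):
    """Count encoded transactions that contain every item of itemset."""
    mask = 0
    for item in itemset:
        mask |= bit[item]
    return sum(1 for c in codes if c & mask == mask)

# Hypothetical sample database (not from the paper).
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]

order = build_dimension_order(transactions)
codes, bit = encode(transactions, order)
print(order)  # ['bread', 'diaper', 'milk', 'beer', 'cola', 'eggs']
print([format(c, "06b") for c in codes])  # ['101000', '110101', '011110', '111100', '111010']
print(support(codes, bit, {"diaper", "beer"}))  # 3
```

Each transaction is stored as one small integer instead of a list of item strings, which is the kind of storage saving the abstract describes; how the paper lays out the bits may differ.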

Key words: item support distribution, Reverse Transform on Row (RTR), Transform on Column (TC), slice operation, reverse coding
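The keywords name a column transform and a slice operation without defining them here. A minimal sketch, assuming "Transform on Column" means transposing the database into one bitset per item and "slice" means selecting item columns and ANDing them, is a level-wise Apriori whose support counts come from a bitwise AND plus a bit count, combined with the standard Apriori pruning rule (every (k-1)-subset of a candidate must be frequent). The names `column_bitsets`, `popcount`, `frequent_itemsets` and the sample data are hypothetical; RTR and reverse coding are not modeled.

```python
from itertools import combinations

def column_bitsets(transactions):
    """Transpose row-wise transactions into one column bitset per item:
    bit j of cols[item] is set when transaction j contains item."""
    cols = {}
    for j, t in enumerate(transactions):
        for item in set(t):
            cols[item] = cols.get(item, 0) | (1 << j)
    return cols

def popcount(x):
    """Number of set bits (portable alternative to int.bit_count)."""
    return bin(x).count("1")

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori over column bitsets; returns {itemset: support}."""
    cols = column_bitsets(transactions)
    level = {frozenset([i]): c for i, c in cols.items() if popcount(c) >= min_support}
    result = {s: popcount(c) for s, c in level.items()}
    k = 2
    while level:
        frequent_prev = set(level)
        keys = list(level)
        candidates = {}
        for a in range(len(keys)):
            for b in range(a + 1, len(keys)):
                union = keys[a] | keys[b]
                if len(union) != k or union in candidates:
                    continue
                # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in frequent_prev for sub in combinations(union, k - 1)):
                    # ANDing the two parent columns marks the transactions
                    # that contain every item of the union.
                    candidates[union] = level[keys[a]] & level[keys[b]]
        level = {s: c for s, c in candidates.items() if popcount(c) >= min_support}
        result.update({s: popcount(c) for s, c in level.items()})
        k += 1
    return result

# Hypothetical sample database (not from the paper).
transactions = [
    ["bread", "milk"],
    ["bread", "diaper", "beer", "eggs"],
    ["milk", "diaper", "beer", "cola"],
    ["bread", "milk", "diaper", "beer"],
    ["bread", "milk", "diaper", "cola"],
]

found = frequent_itemsets(transactions, min_support=3)
for itemset, sup in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Because each support count is one AND and one bit count over item columns, the database is not rescanned for every candidate, which is the usual motivation for bitmap-based Apriori variants.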
