Computer Engineering

• Advanced Computing and Data Processing •

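  • About the authors: ZHANG Shu-yun (born 1989), female, master's student, main research interests: data warehousing and data mining; ZHANG Shou-zhi, associate professor.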

Mining Algorithm of Frequent Closed Itemsets Based on Uncertain Data

ZHANG Shu-yun, ZHANG Shou-zhi   

  1. (School of Computer Science, Fudan University, Shanghai 200433, China)
  • Received: 2012-12-10 Published: 2014-03-15 Online: 2014-03-13

Abstract: For uncertain data, the traditional method of judging whether an itemset is frequent cannot express how accurate that judgment is, and for large datasets the set of frequent itemsets is huge and redundant. To address these two shortcomings, this paper proposes UFCIM, an algorithm for mining frequent closed itemsets from uncertain data that builds on the level-wise mining algorithm Apriori. UFCIM uses a confidence probability to express how accurate the judgment of frequency is: the higher the confidence probability, the more likely the itemset is to be frequent. Because frequent closed itemsets are a compact, lossless representation of frequent itemsets, the algorithm uses them in place of the much larger set of frequent itemsets. Experimental results show that UFCIM mines frequent closed itemsets from uncertain data effectively and quickly, reducing redundancy while preserving the accuracy and completeness of the itemsets.

Key words: uncertain data, frequent closed itemsets, data mining, level mining, probability of confidence
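
The abstract does not spell out how UFCIM computes its confidence probability or tests closedness, so the Python sketch below is only one plausible reading of the idea, not the authors' implementation. It assumes that each item in a transaction carries an independent existence probability, that the confidence probability of an itemset is the probability that its support reaches a minimum count, and that an itemset is closed when no superset with one more item has the same confidence probability. The function names, the sample database, and both thresholds are hypothetical.

```python
from itertools import combinations
from math import isclose, prod


def containment_probs(db, itemset):
    """Probability that each transaction contains all items (items assumed independent)."""
    return [prod(t.get(i, 0.0) for i in itemset) for t in db]


def frequent_probability(probs, minsup):
    """P(support >= minsup) when transaction i contains the itemset with probability probs[i]."""
    # dist[k] = probability of exactly k containing transactions so far;
    # every count >= minsup is pooled into dist[minsup].
    dist = [1.0] + [0.0] * minsup
    for p in probs:
        new = [0.0] * (minsup + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1.0 - p)
            new[min(k + 1, minsup)] += mass * p
        dist = new
    return dist[minsup]


def mine_frequent(db, minsup, min_conf):
    """Apriori-style level-wise search; returns {itemset: confidence probability}."""
    frequent = {}
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    while level:
        survivors = []
        for cand in level:
            fp = frequent_probability(containment_probs(db, cand), minsup)
            if fp >= min_conf:
                frequent[cand] = fp
                survivors.append(cand)
        size = len(level[0]) + 1
        # Join step: merge survivors that differ in exactly one item.
        joined = {a | b for a, b in combinations(survivors, 2) if len(a | b) == size}
        # Prune step: every subset one item smaller must itself be frequent.
        level = [c for c in joined if all(c - {i} in frequent for i in c)]
    return frequent


def closed_only(frequent):
    """Keep itemsets that no one-item-larger frequent superset matches in probability."""
    return {
        x: fp
        for x, fp in frequent.items()
        if not any(
            len(y) == len(x) + 1 and x < y and isclose(fp, fy, abs_tol=1e-12)
            for y, fy in frequent.items()
        )
    }


# Hypothetical uncertain database: item -> existence probability per transaction.
db = [
    {"a": 0.9, "b": 1.0, "c": 0.4},
    {"a": 0.8, "b": 1.0},
    {"b": 1.0, "c": 0.7},
    {"a": 0.6, "b": 1.0, "c": 0.5},
]
frequent = mine_frequent(db, minsup=2, min_conf=0.7)
closed = closed_only(frequent)
for itemset, fp in closed.items():
    print(sorted(itemset), round(fp, 3))
```

In this toy database, item b is certain whenever a may occur, so {a} and {a, b} have the same confidence probability and only {a, b} is kept alongside {b}. Checking only supersets that are one item larger is enough here because the confidence probability never increases when an item is added.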

Chinese Library Classification (CLC) number: