
Computer Engineering (计算机工程) ›› 2007, Vol. 33 ›› Issue (02): 74-76. doi: 10.3969/j.issn.1000-3428.2007.02.025

• Software Technology and Database •

长生物数据集中频繁闭合模式挖掘算法研究

周 明,李 宏   

  1. (中南大学信息科学与工程学院,长沙 410083)
  • 收稿日期:1900-01-01 修回日期:1900-01-01 出版日期:2007-01-20 发布日期:2007-01-20

Research of Frequent Closed Pattern Mining in Long Biological Datasets

ZHOU Ming, LI Hong   

  1. (Institute of Information Science and Engineering, Central South University, Changsha 410083)
  • Received:1900-01-01 Revised:1900-01-01 Online:2007-01-20 Published:2007-01-20

摘要: 传统频繁项集挖掘算法在处理稠密或长数据集(如基因表达数据集)时效率低且产生大量冗余模式,为解决这些问题一些学者提出了闭合模式的概念和挖掘闭合模式的算法,研究证明挖掘闭合模式可以显著减少项集数量并消除大量冗余模式。该文针对生物数据特点提出了一个新颖的挖掘频繁闭合模式的算法REMFOR,该算法在闭合模式概念和行枚举思想的基础上,采用垂直数据结构和fp-tree技术,对行集建立行fp-tree来挖掘频繁闭合模式。通过实例和实验证明该算法是正确有效的。

关键词: 数据挖掘, 频繁项集, 闭合模式

Abstract: Traditional frequent-itemset mining algorithms are inefficient and produce many redundant patterns when applied to dense or long datasets, such as gene expression datasets. To address these problems, researchers have proposed the concept of closed patterns, together with algorithms for mining them, and studies have shown that mining closed patterns substantially reduces the number of itemsets and eliminates many redundant patterns. Targeting the characteristics of biological data, this paper proposes a novel algorithm, REMFOR, for mining frequent closed patterns. Building on the closed-pattern concept and row enumeration, it uses a vertical data structure and the fp-tree technique, constructing a row fp-tree over the row set to mine frequent closed patterns. A worked example and experiments show that the algorithm is correct and effective.

Key words: Data mining, Frequent itemsets, Closed pattern
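
To make the closed-pattern and row-enumeration ideas named in the abstract concrete, here is a minimal Python sketch. It is not the REMFOR algorithm: it builds no fp-tree and uses no vertical data structure, and simply enumerates every subset of rows by brute force, which is only feasible when the number of rows is small (as is typical of gene expression data). The function name and the toy dataset are hypothetical and chosen for illustration only.

```python
from itertools import combinations


def closed_patterns_by_row_enumeration(rows, min_support):
    """Brute-force sketch: map each frequent closed itemset to its support.

    The items shared by any set of rows always form a closed itemset,
    so enumerating row subsets (rather than item subsets) yields closed
    patterns directly.
    """
    row_sets = [frozenset(r) for r in rows]
    closed = {}
    # A subset of k rows guarantees support >= k, so start at min_support.
    for k in range(max(1, min_support), len(row_sets) + 1):
        for combo in combinations(range(len(row_sets)), k):
            common = frozenset.intersection(*(row_sets[i] for i in combo))
            if not common:
                continue  # these rows share no item
            # Support = number of rows that contain the shared itemset.
            closed[common] = sum(1 for r in row_sets if common <= r)
    return closed


# Hypothetical toy dataset: four rows (samples), items a..e.
rows = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "b", "c", "d"},
    {"a", "e"},
]
patterns = closed_patterns_by_row_enumeration(rows, min_support=2)
for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

On this toy dataset, with a minimum support of 2, the sketch reports four closed patterns ({a}, {a, b}, {a, b, c}, {a, b, d}), whereas the same data contains eleven frequent itemsets in total. This is the kind of reduction the abstract refers to.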