Computer Engineering ›› 2019, Vol. 45 ›› Issue (5): 169-174, 181. doi: 10.19678/j.issn.1000-3428.0052261

• Artificial Intelligence and Recognition Technology •

Efficient Algorithm for High Utility Pattern Mining Based on Top-k

ZHAO Linliu, LV Xin, TAO Feifei

  1. College of Computer and Information, Hohai University, Nanjing 211100, China
  • Received: 2018-07-30 Online: 2019-05-15 Published: 2019-05-15
  • About the authors: ZHAO Linliu (born 1994), female, master's student, main research interest: data mining; LV Xin (corresponding author), lecturer, Ph.D.; TAO Feifei, associate professor, Ph.D.
  • Funding:

    National Key Research and Development Program of China (2018YFC0407105, 2016YFC0400910); General Program of the National Natural Science Foundation of China (61272543); Key Program of the NSFC-Guangdong Joint Fund (U1301252).

Abstract:

Algorithms that obtain high utility patterns through a user-specified threshold are inefficient, and the mining results may not meet the user's needs. To address this problem, a Top-k high utility pattern mining algorithm based on the EFIM algorithm is proposed. Instead of setting a threshold manually, the user specifies the number of high utility patterns to return. A dual pruning strategy based on extended utility and remaining utility effectively controls pattern growth. During database projection, transaction sorting and merging strategies are applied to reduce running time and memory consumption. Experimental results show that the algorithm has considerable advantages in running time and memory consumption and is especially suitable for high utility pattern mining on dense datasets.

Key words: high utility pattern, Top-k pattern, extended utility, remaining utility, database projection
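
The idea of replacing a fixed threshold with a requested pattern count k can be illustrated with a minimal sketch. This is not the algorithm from the paper: it is a simplified depth-first search that keeps the k best itemsets in a min-heap, treats the smallest utility in the heap as a rising internal threshold, and prunes with a plain remaining-utility upper bound. It omits EFIM's extended (sub-tree) utility bound, database projection, and transaction merging. The function name `top_k_high_utility`, the dictionary-based transaction format, and the sample data are assumptions made for illustration; item utilities are assumed to be positive.

```python
import heapq


def top_k_high_utility(transactions, k):
    """Return the k itemsets with the highest utility as (utility, itemset) pairs.

    Each transaction is a dict mapping an item to its (positive) utility
    in that transaction. Illustrative sketch only, not the paper's method.
    """
    items = sorted({item for t in transactions for item in t})
    heap = []  # min-heap of (utility, itemset); heap[0] is the current k-th best

    def threshold():
        # The internal threshold rises as better itemsets are found.
        return heap[0][0] if len(heap) == k else 0

    def search(prefix, rows, start):
        # rows holds (transaction, utility of prefix in that transaction)
        for idx in range(start, len(items)):
            item = items[idx]
            new_rows = [(t, u + t[item]) for t, u in rows if item in t]
            if not new_rows:
                continue
            itemset = prefix + (item,)
            utility = sum(u for _, u in new_rows)
            if len(heap) < k:
                heapq.heappush(heap, (utility, itemset))
            elif utility > heap[0][0]:
                heapq.heapreplace(heap, (utility, itemset))
            # Remaining utility: items that could still be appended to itemset.
            remaining = sum(v for t, _ in new_rows
                            for i, v in t.items() if i > item)
            # Extend only if some superset could still beat the threshold.
            if utility + remaining > threshold():
                search(itemset, new_rows, idx + 1)

    search((), [(t, 0) for t in transactions], 0)
    return sorted(heap, reverse=True)


# Hypothetical sample database: item -> utility within each transaction.
sample_db = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 9},
]
top3 = top_k_high_utility(sample_db, 3)
```

On this sample data, `top3` is `[(22, ('a', 'c')), (16, ('b', 'c', 'd')), (15, ('a',))]`. The user never supplies a utility threshold; the heap minimum plays that role and tightens the pruning as the search proceeds.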

CLC Number: