作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2024, Vol. 50 ›› Issue (3): 60-67. doi: 10.19678/j.issn.1000-3428.0067255

• 人工智能与模式识别 • 上一篇    下一篇

高效的一次性弱间隙序列模式挖掘算法

杨鸿茜1, 武优西1,*(), 耿萌1, 刘靖宇1, 李艳2   

  1. 1. 河北工业大学人工智能与数据科学学院, 天津 300401
    2. 河北工业大学经济管理学院, 天津 300401
  • 收稿日期:2023-03-23 出版日期:2024-03-15 发布日期:2024-03-13
  • 通讯作者: 武优西
  • 基金资助:
    国家自然科学基金(61976240)

Efficient One-off Weak Gap Sequential Pattern Mining Algorithm

Hongxi YANG1, Youxi WU1,*(), Meng GENG1, Jingyu LIU1, Yan LI2   

  1. 1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
    2. School of Economics and Management, Hebei University of Technology, Tianjin 300401, China
  • Received:2023-03-23 Online:2024-03-15 Published:2024-03-13
  • Contact: Youxi WU

摘要:

间隙约束序列模式挖掘作为序列模式挖掘的一个重要分支,可以发现模式在序列中的重复出现。然而,当前研究主要针对单项序列进行挖掘,并且序列中每一项都被认为具有相同意义。为解决该问题,提出一次性弱间隙序列模式挖掘(OWP)算法,该算法由准备阶段、支持度计算和候选模式生成3个步骤组成。在准备阶段,建立倒排索引,并对不频繁的项进行剪枝;在支持度计算方面,利用倒排索引结构记录出现位置,避免对原始数据集的重复扫描;在候选模式生成方面,采用模式连接策略,减少冗余候选模式的生成。在项集序列和单项序列共6个真实数据集上的实验结果表明,OWP算法相比OWP-p、Ows-OWP和OWP-e算法在运行时间上分别提升了2.653、1.348、3.592倍,在内存消耗上分别减少了3.51%、0.07%、5%,说明OWP算法可以更高效地挖掘出用户感兴趣的模式。此外,OWP算法在以D1数据集为基础的6倍大小的数据集上的运行时间比D1数据集增长了3.763倍,内存消耗增长了2.310倍,运行时间和内存消耗的增加倍数均小于数据集大小的增加倍数,说明OWP算法具有良好的可扩展性。

关键词: 序列模式挖掘, 项集挖掘, 间隙约束, 一次性条件, 弱间隙约束

Abstract:

Sequential Pattern Mining(SPM) with gap constraint, as an important branch of sequential pattern mining, can discover the repetitive occurrences of patterns in sequences. However, the current studies mainly focus on datasets with items, and each item in the sequence is considered to have the same effects. To address this problem, the One-off Weak gap sequential Pattern mining(OWP) algorithm is proposed, which includes three steps: preparation stage, support calculation, and candidate pattern generation. In the preparation stage, an inverted index is constructed and infrequent items are pruned. For support calculation, the occurrence positions are recorded using an inverted index, which avoids repeated scanning of the original dataset. For candidate pattern generation, a pattern- join strategy is used to reduce the generation of redundant candidate patterns. Finally, from the experimental results on six real datasets comprising datasets with items and itemsets, it is observed that the OWP algorithm improves the runtime by 2.653, 1.348, and 3.592 times compared with the OWP-p, Ows-OWP, and OWP-e algorithms, respectively, and as concerns memory usage, it reduces by 3.51%, 0.07%, and 5%, respectively. This indicates that the OWP algorithm can mine the patterns of user interest more efficiently. In addition, the OWP algorithm on the dataset with six times the size based on D1 increases by 3.763 times and memory usage increases by 2.310 times compared to that of D1. Clearly, both the increases in runtime and memory usage are smaller than those in dataset size, implying that the OWP algorithm has good scalability.

Key words: Sequential Pattern Mining(SPM), mining with itemset, gap constraint, one-off condition, weak-gap constraint