
Computer Engineering

   

A Causal Pattern Mining Algorithm

  

  • Published: 2025-07-16


Abstract: Sequential pattern mining aims to extract frequent ordered subsequences from data. However, sequential patterns do not directly represent causal relationships, because an earlier event does not necessarily trigger the events that follow it. Causal inference can be used to discover causal patterns in sequence databases, but existing causal discovery methods for sequential data rely predominantly on expert-defined priors, which limits their applicability in knowledge-scarce scenarios. To address this issue, this paper proposes CP (Causal Pattern Mining), an algorithm based on mining both positive and negative sequential patterns. CP adopts a pattern join strategy to reduce the number of candidate positive sequential patterns, uses an intersection-based method to efficiently compute the occurrence lists of candidate negative sequential patterns, and introduces a matching sequence pair algorithm to improve the credibility of the results. Experimental results show that CP improves running time by 11.8%, 60.336%, 55.501%, 25.737%, and 84.252% compared with CP-a, CP-b, CP-d, CP-e, and CP-m, respectively. It also reduces the number of candidate causal patterns by 56.057% compared with CP-m, and the number of candidate positive patterns by 66.415% compared with both CP-b and CP-d. Moreover, CP improves the F1-score by approximately 50% over NOTEARS. Whereas PC can mine only single-variable causal patterns, CP can also mine combinatorial causal patterns. These results indicate that CP outperforms the compared algorithms in causal pattern mining.
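
The abstract names the pattern join strategy without specifying its rule. As an illustration only, the following minimal sketch assumes a GSP-style join, in which two frequent patterns of length k are combined when the first pattern without its first item equals the second pattern without its last item. The function names (`join_candidates`, `mine_positive`), the toy database, and the `min_support` threshold are hypothetical and are not taken from the paper; the join rule used by CP may differ.

```python
def is_subsequence(pattern, sequence):
    """Return True if the items of pattern appear in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def occurrences(pattern, database):
    """Return the set of sequence ids whose sequence contains pattern."""
    return {sid for sid, seq in enumerate(database) if is_subsequence(pattern, seq)}


def join_candidates(frequent_k):
    """Assumed GSP-style join: p and q combine when p[1:] == q[:-1]."""
    return sorted({p + q[-1:] for p in frequent_k for q in frequent_k if p[1:] == q[:-1]})


def mine_positive(database, min_support):
    """Level-wise mining of frequent positive patterns using the join above."""
    items = sorted({item for seq in database for item in seq})
    level = [(i,) for i in items if len(occurrences((i,), database)) >= min_support]
    frequent = list(level)
    while level:
        candidates = join_candidates(level)
        level = [c for c in candidates if len(occurrences(c, database)) >= min_support]
        frequent.extend(level)
    return frequent


# Hypothetical toy sequence database; each tuple is one sequence of events.
db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c"), ("b", "c")]
patterns = mine_positive(db, min_support=2)
print(patterns)
```

On this toy database the join produces a single length-3 candidate, `("a", "b", "c")`, from the three frequent length-2 patterns, whereas extending each of those patterns by every frequent item would produce nine candidates. This is the kind of candidate reduction the abstract attributes to the join strategy, shown here under the stated assumptions.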
