A Causal Pattern Mining Algorithm

doi:10.19678/j.issn.1000-3428.0252246

Abstract

Sequential pattern mining aims to extract frequent ordered subsequences from data. However, sequential patterns do not directly represent causal relationships, because an earlier event does not necessarily trigger the events that follow it. Causal inference can be used to discover causal patterns in sequence databases, but existing causal discovery methods for sequential data rely predominantly on expert-defined priors, which limits their applicability in knowledge-scarce scenarios. To address this issue, this paper proposes CP (Causal Pattern Mining), an algorithm based on mining both positive and negative sequential patterns. CP adopts a pattern join strategy to reduce the number of candidate positive sequential patterns, uses an intersection-based method to efficiently calculate the occurrence lists of candidate negative sequential patterns, and introduces a matching sequence pair algorithm to improve the credibility of the results. Experimental results show that CP improves running time by 11.8%, 60.336%, 55.501%, 25.737%, and 84.252% compared with CP-a, CP-b, CP-d, CP-e, and CP-m, respectively. It also reduces the number of candidate causal patterns by 56.057% compared with CP-m, and the number of candidate positive patterns by 66.415% compared with both CP-b and CP-d. Moreover, CP achieves approximately a 50% improvement in F1-score over NOTEARS. Unlike PC, which can only mine single-variable causal patterns, CP is capable of mining combinatorial causal patterns. These results demonstrate that CP outperforms the compared algorithms in causal pattern mining.
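
The abstract does not spell out how the pattern join strategy works. The sketch below illustrates the generic join idea used in sequential pattern mining, not necessarily CP's exact procedure: two frequent patterns of length m are joined into a candidate of length m+1 only when the suffix of the first equals the prefix of the second. The function names, the single-character event encoding, and the toy data are assumptions made for illustration.

```python
def join_candidates(frequent):
    """Join length-m frequent patterns into length-(m+1) candidates.

    p and q are joined when p without its first event equals
    q without its last event; the candidate is p followed by q's last event.
    (Hypothetical helper; patterns are strings of single-character events.)
    """
    by_prefix = {}
    for q in frequent:
        by_prefix.setdefault(q[:-1], []).append(q)
    candidates = set()
    for p in frequent:
        for q in by_prefix.get(p[1:], []):
            candidates.add(p + q[-1])
    return sorted(candidates)


def enumerate_candidates(frequent, alphabet):
    """Baseline for comparison: extend every pattern by every event."""
    return sorted({p + a for p in frequent for a in alphabet})


# Toy input: assumed frequent patterns of length 2 over events a-d.
frequent_2 = ["ab", "bc", "bd", "ca"]
alphabet = "abcd"

joined = join_candidates(frequent_2)
enumerated = enumerate_candidates(frequent_2, alphabet)
print(len(joined), "joined candidates vs", len(enumerated), "enumerated")
```

On this toy input the join yields 4 candidates where blind enumeration yields 16, which is the kind of candidate reduction the abstract attributes to the strategy.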


HUO Ziyue, WU Youxi, GENG Meng, LIU Jingyu, LI Yan. A Causal Pattern Mining Algorithm[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0252246.



URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0252246

