Fast Causal Rule Mining Algorithm

doi:10.19678/j.issn.1000-3428.0260095

Abstract

Causal relationship mining aims to reveal latent causal mechanisms in complex data. Most existing studies rely on Bayesian network frameworks or on simple filtering of association rules. They generally suffer from low mining efficiency and have difficulty controlling unobserved confounding variables, which limits the accuracy and robustness of causal identification. To address these problems, this paper proposes a fast causal rule mining algorithm. The algorithm mines frequent patterns over a prefix-tree structure and integrates multiple pruning strategies to substantially improve mining efficiency. It also introduces a covariate mechanism and a matched transaction pair technique to control confounding factors, which improves the reliability of the mined causal rules. Experimental results show that the proposed algorithm is 3 to 4 orders of magnitude faster than the baseline algorithms. On large-scale datasets, its running time is a further 30%–50% lower than that of similar variants. In terms of accuracy, compared with baseline causal discovery methods, the algorithm maintains a stable Precision of 0.69–0.90 and generally improves F1-score by 40%–60% or more. These results validate the efficiency and effectiveness of the proposed algorithm for large-scale causal rule mining.
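
The abstract says only that frequent patterns are mined over a prefix tree with several pruning strategies. As a minimal illustration, and not the authors' implementation, the Python sketch below assumes a set-enumeration prefix tree explored depth-first. In this sketch each node carries the set of transaction ids (tid-set) that contain its itemset. It shows two standard pruning rules: support pruning of single items and anti-monotone subtree pruning, where an infrequent node cannot have frequent descendants. It also adds a hypothetical depth bound `max_len`. The names `mine_frequent`, `min_support` and `max_len` are illustrative.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_frequent(
    transactions: List[Set[str]], min_support: int, max_len: int = 3
) -> Dict[FrozenSet[str], int]:
    """Depth-first mining over a set-enumeration prefix tree (illustrative sketch).

    Each tree node is an itemset; its children extend it with items that come
    later in a fixed global order, so every itemset is generated exactly once.
    """
    # Vertical layout: item -> ids of the transactions that contain it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Pruning 1: infrequent single items can never appear in a frequent itemset.
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_support)
    frequent: Dict[FrozenSet[str], int] = {}

    def expand(prefix: Tuple[str, ...], prefix_tids: Set[int], start: int) -> None:
        for k in range(start, len(items)):
            item = items[k]
            # Support of prefix + item = size of the tid-set intersection.
            tids = prefix_tids & tidsets[item] if prefix else tidsets[item]
            # Pruning 2 (anti-monotonicity): skip the whole subtree below an
            # infrequent node, since no superset of it can be frequent.
            if len(tids) < min_support:
                continue
            itemset = prefix + (item,)
            frequent[frozenset(itemset)] = len(tids)
            # Pruning 3 (hypothetical depth bound): stop growing long itemsets.
            if len(itemset) < max_len:
                expand(itemset, tids, k + 1)

    expand((), set(), 0)
    return frequent


# Toy data (hypothetical), used only to exercise the sketch.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(mine_frequent(transactions, min_support=3))
```

On this toy data with `min_support=3`, every single item (support 4) and every pair (support 3) is kept. The subtree below `{a, b, c}` (support 2) is pruned. Because child supports come from tid-set intersections, the database is scanned only once, to build the vertical layout.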

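The covariate mechanism and matched transaction pairs are also described only at a high level. One common way to realize them is exact matching, which is assumed here rather than taken from the paper. Each transaction that contains the candidate antecedent (the exposure) is paired 1:1 with a transaction that lacks it but agrees on every covariate, and only discordant pairs are counted. The matched-pair odds ratio n12/n21 then receives a normal-approximation confidence interval on the log scale. A rule is treated as causal when the lower bound exceeds 1. The function name `matched_pair_test`, the greedy within-stratum pairing, the toy data and the default `z = 1.96` are illustrative choices.

```python
import math
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Set, Tuple


def matched_pair_test(
    transactions: List[Set[str]],
    exposure: FrozenSet[str],
    outcome: str,
    covariates: List[str],
    z: float = 1.96,
) -> Optional[Tuple[float, float, float]]:
    """Return (odds_ratio, lower, upper) for exposure -> outcome, or None."""
    # Stratify by covariate signature so that paired transactions agree on
    # every covariate (exact matching).
    strata: Dict[Tuple[bool, ...], Tuple[List[Set[str]], List[Set[str]]]] = defaultdict(
        lambda: ([], [])
    )
    for t in transactions:
        exposed, unexposed = strata[tuple(c in t for c in covariates)]
        (exposed if exposure <= t else unexposed).append(t)

    # Greedy 1:1 pairing inside each stratum; only discordant pairs are counted.
    n12 = n21 = 0  # n12: only the exposed member has the outcome; n21: the reverse
    for exposed, unexposed in strata.values():
        for t_exp, t_unexp in zip(exposed, unexposed):
            y_exp, y_unexp = outcome in t_exp, outcome in t_unexp
            if y_exp and not y_unexp:
                n12 += 1
            elif y_unexp and not y_exp:
                n21 += 1

    if n12 == 0 or n21 == 0:
        return None  # odds ratio is 0, infinite or undefined; the sketch declines to decide
    odds_ratio = n12 / n21
    se = math.sqrt(1 / n12 + 1 / n21)  # standard error of log(odds ratio)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical toy data: "male" is the covariate, {"smoke"} the candidate antecedent.
data = [{"smoke", "male", "cancer"}] * 20 + [{"male"}] * 20 + [{"smoke"}] * 2 + [{"cancer"}] * 2
result = matched_pair_test(data, frozenset({"smoke"}), "cancer", ["male"])
is_causal = result is not None and result[1] > 1.0
print(result, is_causal)
```

Here the toy data yields 20 discordant pairs in favour of the exposure and 2 against, giving an odds ratio of 10 with a lower bound above 1. Matching balances the covariates within each pair, so their influence on the outcome is largely removed from the discordant-pair counts. This sketch handles only confounders that are recorded as covariates.
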

LIU Shuohan, WU Youxi, ZHANG Yajie, LIU Jingyu, LI Yan. Fast Causal Rule Mining Algorithm[J]. Computer Engineering, doi: 10.19678/j.issn.1000-3428.0260095.

URL: https://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0260095
