Recursive Causal Inference Algorithm Based on Partial Correlation Test

doi:10.19678/j.issn.1000-3428.0062524

Computer Engineering, 2022, Vol. 48, Issue (10): 123-129.

Artificial Intelligence and Pattern Recognition

CHEN Mingjie^1,2, ZHANG Hao^2,3, PENG Yuzhong^4, XIE Feng^5, PANG Yue^3,6

1. School of Computer Science and Technology, Dongguan University of Technology, Dongguan, Guangdong 523808, China;
2. School of Computer, Guangdong University of Petrochemical Technology, Maoming, Guangdong 525099, China;
3. School of Computer Science, Fudan University, Shanghai 200433, China;
4. School of Computer and Information Engineering, Nanning Normal University, Nanning 530001, China;
5. School of Mathematical Sciences, Peking University, Beijing 100871, China;
6. China UnionPay Post-Doctoral Research Station, Shanghai 201201, China

Received: 2021-08-28; Revised: 2021-10-29; Published: 2021-11-05

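About the authors: CHEN Mingjie (born 1999), male, master's student, research interests: machine learning and causal inference; ZHANG Hao (corresponding author), lecturer, Ph.D.; PENG Yuzhong, professor, Ph.D.; XIE Feng and PANG Yue, postdoctoral researchers.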
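Funding: National Natural Science Foundation of China (62006051); China Postdoctoral Science Foundation (2020M680225); Guangdong Provincial Universities Young Innovative Talents Project (2020KQNCX049).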

Abstract

Causal inference is an important means of uncovering relationships between phenomena. In high-dimensional settings, however, the Conditional Independence (CI) tests performed by causal inference algorithms suffer from many redundant tests and low test efficiency, which limits the application of causal inference to high-dimensional datasets. This study proposes a recursive causal inference algorithm based on partial correlation tests. A "divide and conquer" strategy recursively performs causal segmentation of the variable set, producing low-dimensional sub-datasets that are easier to handle and thus improving the efficiency of dataset processing. Local causal inference is then performed on each sub-dataset, which reduces the computation required by each inference and increases the running speed of the algorithm. On this basis, a merging strategy that compares significance values integrates all sub-results into a complete causal structure, ensuring the accuracy of the overall result. During the divide-and-conquer process, an efficient partial correlation test replaces high-complexity kernel density estimation, further improving efficiency. Experiments on ten classical datasets show that, at the same accuracy as the classical inference algorithm CAPA, the proposed algorithm runs two to ten times faster, and the speedup is more pronounced on datasets with larger sample sizes. This demonstrates that the recursive causal inference algorithm can handle high-dimensional datasets effectively, improving computational efficiency while maintaining accuracy.

Key words: causal inference, causal network, Conditional Independence (CI) test, partial correlation test, recursive algorithm
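Two ingredients named in the abstract are standard techniques that can be illustrated independently of the paper: a partial correlation CI test and local causal inference on a single low-dimensional sub-dataset. The following is a minimal sketch, not the authors' implementation, assuming roughly linear-Gaussian data; it uses Fisher's z transform for the test and a PC-style skeleton search for the local step. The recursive segmentation and the significance-value merging strategy are not reproduced, and all function names (`pearson`, `partial_corr`, `fisher_z_pvalue`, `pc_skeleton`) are hypothetical.

```python
import itertools
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the tuple `cond`,
    using the recursive formula on the correlation matrix `corr`."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial correlation == 0, given n samples
    and k conditioning variables (Fisher's z transform, normal approximation)."""
    if n - k - 3 <= 0:
        raise ValueError("need n > k + 3 samples")
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    stat = math.sqrt(n - k - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def pc_skeleton(data, alpha=0.05):
    """PC-style skeleton search on one low-dimensional sub-dataset.
    `data` is a list of columns (one list of floats per variable).
    Returns the undirected edges as (i, j) pairs with i < j."""
    p, n = len(data), len(data[0])
    corr = [[pearson(data[i], data[j]) for j in range(p)] for i in range(p)]
    adj = {i: set(range(p)) - {i} for i in range(p)}
    size = 0  # current conditioning-set size
    while any(len(adj[i]) - 1 >= size for i in range(p)):
        for i, j in itertools.permutations(range(p), 2):
            if j not in adj[i]:
                continue
            # Condition on subsets of i's other current neighbours.
            for cond in itertools.combinations(sorted(adj[i] - {j}), size):
                r = partial_corr(corr, i, j, cond)
                if fisher_z_pvalue(r, n, size) > alpha:
                    adj[i].discard(j)  # i and j judged independent given cond
                    adj[j].discard(i)
                    break
        size += 1
    return {(i, j) for i in range(p) for j in adj[i] if i < j}


# Chain X -> Z -> Y built from mutually orthogonal, zero-mean +/-1 patterns.
x = [1.0, -1.0] * 32
z = [a + b for a, b in zip(x, [1.0, 1.0, -1.0, -1.0] * 16)]
y = [a + b for a, b in zip(z, ([1.0] * 4 + [-1.0] * 4) * 8)]
print(sorted(pc_skeleton([x, z, y])))  # [(0, 1), (1, 2)]
```

In this toy chain, X and Y are marginally correlated (r = 1/√3), but their partial correlation given Z is zero, so the X-Y edge is dropped at conditioning-set size 1 while X-Z and Z-Y survive. Each test needs only the precomputed correlation matrix, which is why it is cheap compared with kernel density estimation; the trade-off is that it detects only linear dependence, so a zero partial correlation implies conditional independence only under assumptions such as joint Gaussianity.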

CLC Number: TP18

CHEN Mingjie, ZHANG Hao, PENG Yuzhong, XIE Feng, PANG Yue. Recursive Causal Inference Algorithm Based on Partial Correlation Test[J]. Computer Engineering, 2022, 48(10): 123-129.

URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0062524

http://www.ecice06.com/EN/Y2022/V48/I10/123

Figures/Tables: 6
