
Computer Engineering, 2023, Vol. 49, Issue (3): 87-94. doi: 10.19678/j.issn.1000-3428.0064165

• Artificial Intelligence and Pattern Recognition •

Causal Structure Learning Algorithm Based on Recursive Decomposition

CAI Ruichu1, ZHANG Wenhui1, QIAO Jie1, HAO Zhifeng1,2

  1. School of Computer Science and Technology, Guangdong University of Technology, Guangzhou 510006, China;
  2. College of Science, Shantou University, Shantou 515063, Guangdong, China
  • Received: 2022-03-14 Revised: 2022-06-17 Published: 2022-05-04
  • About the authors: CAI Ruichu (born 1983), male, professor and doctoral supervisor, whose main research interests are artificial intelligence and causal discovery; ZHANG Wenhui (corresponding author), master's student; QIAO Jie, doctoral student; HAO Zhifeng, professor and doctoral supervisor.
  • Supported by:
    National Science Fund for Excellent Young Scholars (6212200101); National Natural Science Foundation of China (61876043, 61976052).

Abstract: In high-dimensional, small-sample settings, existing constraint-based causal structure learning methods suffer from low learning efficiency and from the Markov equivalence class problem. Targeting nonlinear non-Gaussian high-dimensional small-sample data, this paper proposes CADR, a causal structure learning algorithm based on recursive decomposition. To improve learning efficiency, CADR recursively decomposes the high-dimensional variable set into multiple smaller subsets until no further decomposition is possible or the subset size reaches a threshold. Because each subset contains fewer variables, the search space of candidate conditioning sets for the conditional independence tests shrinks, which improves learning efficiency. To further resolve the Markov equivalence class, CADR exploits the irreversibility of the causal direction in nonlinear non-Gaussian models: it identifies the causal direction by testing whether the fitted noise term is independent of the cause variable. Experimental results on simulated data and on data from real causal structures show that CADR not only improves the efficiency of conditional independence testing but also effectively distinguishes Markov equivalence classes and learns more accurate causal structures. In the real causal structure experiments, CADR improves the F1 score by 5% to 12% over the existing Xie_rec, PC_ANM, and Notear_Sob methods.

Key words: causal relationship discovery, conditional independence test, high-dimensional small samples, recursive decomposition, Markov equivalence class
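
The direction-identification idea in the abstract (fit the effect from the candidate cause, then check whether the fitted noise is independent of that cause) can be illustrated with a minimal sketch. This is not the CADR implementation: the function names `residuals`, `hsic`, and `anm_direction` are hypothetical, the leave-one-out local-linear smoother and the HSIC dependence score are stand-ins chosen for illustration, and comparing the two scores replaces a proper statistical independence test.

```python
import math
import random


def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    n = len(v)
    mean = sum(v) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / std for a in v]


def residuals(cause, effect, h=0.2):
    """Leave-one-out local-linear fit of effect on cause; returns effect minus fit.

    The bandwidth h is an illustrative assumption, not a tuned value.
    """
    out = []
    for j, x0 in enumerate(cause):
        s0 = s1 = s2 = t0 = t1 = 0.0
        for i, (xi, yi) in enumerate(zip(cause, effect)):
            if i == j:
                continue  # leave the target point out to limit overfitting
            d = xi - x0
            w = math.exp(-d * d / (2 * h * h))
            s0 += w
            s1 += w * d
            s2 += w * d * d
            t0 += w * yi
            t1 += w * d * yi
        det = s0 * s2 - s1 * s1
        # Fall back to a local average if the local line is ill-conditioned.
        fit = (s2 * t0 - s1 * t1) / det if det > 1e-12 else t0 / max(s0, 1e-12)
        out.append(effect[j] - fit)
    return out


def hsic(a, b):
    """Biased HSIC estimate, unit-bandwidth Gaussian kernels on standardized inputs."""
    a, b = standardize(a), standardize(b)
    n = len(a)
    K = [[math.exp(-0.5 * (a[i] - a[j]) ** 2) for j in range(n)] for i in range(n)]
    L = [[math.exp(-0.5 * (b[i] - b[j]) ** 2) for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in K]  # K is symmetric, so row means equal column means
    total = sum(row) / n
    acc = 0.0
    # trace(K H L H) equals the sum over all pairs of (doubly centered K) * L.
    for i in range(n):
        for j in range(n):
            acc += (K[i][j] - row[i] - row[j] + total) * L[i][j]
    return acc / (n * n)


def anm_direction(x, y):
    """Pick the direction whose fitted noise looks less dependent on the cause."""
    x, y = standardize(x), standardize(y)
    forward = hsic(x, residuals(x, y))   # noise vs. cause under x -> y
    backward = hsic(y, residuals(y, x))  # noise vs. cause under y -> x
    return ("x->y" if forward < backward else "y->x"), forward, backward


# Synthetic nonlinear, non-Gaussian example: y = x^2 + uniform noise.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(300)]
ys = [x * x + random.uniform(-1, 1) for x in xs]
direction, fwd, bwd = anm_direction(xs, ys)
print(direction, fwd, bwd)
```

In the true direction the residual is close to the injected noise and carries little information about the cause, whereas in the reverse direction the residual spread varies with the regressor, so its dependence score is larger. A practical implementation would typically replace the score comparison with a significance test and a more flexible regressor.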
