Computer Engineering, 2023, Vol. 49, Issue (12): 63-70. doi: 10.19678/j.issn.1000-3428.0066474

• Artificial Intelligence and Pattern Recognition •

Missing Value Causal Learning Algorithm Based on Structural Equation Likelihood Framework

Zhifeng HAO1,2, Jianhua YU1, Jie QIAO1, Ruichu CAI1,*

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  2. College of Science, Shantou University, Shantou 515063, Guangdong, China
  • Received: 2022-12-08 Online: 2023-12-15 Published: 2023-12-14
  • Contact: Ruichu CAI
  • About the authors:

    Zhifeng HAO (born 1968), male, professor and doctoral supervisor; main research interests: artificial intelligence, algebra and its applications

    Jianhua YU, master's student

    Jie QIAO, doctoral student

  • Funding:
    National Natural Science Foundation of China (61876043, 61976052, 62206064); National Science Fund for Excellent Young Scholars (62122022); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501)

Abstract:

Exploring causal relationships between entities is a core problem in data science. In practical scenarios, missing values pose significant challenges to both constraint-based and structural equation model-based methods. Although existing causal learning methods for missing values can handle causal structure learning on data that are missing at random, several problems remain unsolved for data that are missing not at random: learning causal pairs and Markov equivalence class structures in the causal network, and correcting the wrong causal directions caused by missingness. To address these issues, this paper proposes MV-SELF, a new missing-value causal learning algorithm based on the structural equation likelihood framework. Exploiting the property that the conditional probability distribution of a nonlinear Additive Noise Model (ANM) can be expressed through its noise distribution, MV-SELF defines a maximum likelihood-based score and uses it in a score-based causal structure search. To handle data that are missing not at random, MV-SELF applies Inverse Probability Weighting (IPW) correction to recover the joint distribution of the data with missing values, thereby correcting the redundant edges and wrong causal directions caused by missingness and enabling high-dimensional causal structure search on incomplete data. Simulation experiments show that, compared with the TD-PC, MVPC, and Structure EM algorithms, MV-SELF improves the F1 value by 3% to 19% and can effectively distinguish Markov equivalence classes.

Key words: structural equation likelihood framework, missing data, Inverse Probability Weighting (IPW), causal direction learning, Additive Noise Model (ANM)
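The likelihood score rests on one property of additive noise models: if X_j = f_j(PA_j) + N_j with the noise N_j independent of the parents PA_j, then p(x_j | pa_j) = p_{N_j}(x_j - f_j(pa_j)), so the log-likelihood of a candidate graph is the sum, over variables, of noise log-densities evaluated at regression residuals. The Python sketch below illustrates that idea on complete data only and is not the authors' MV-SELF implementation. The leave-one-out Nadaraya-Watson regression, the leave-one-out Gaussian kernel density estimate with Silverman's bandwidth, the absence of any complexity penalty, and the names `loo_kde_logpdf`, `loo_kernel_regression`, and `anm_log_likelihood` are all assumptions made for illustration.

```python
import numpy as np


def loo_kde_logpdf(values: np.ndarray) -> np.ndarray:
    """Leave-one-out Gaussian KDE log-density at each sample (Silverman's bandwidth)."""
    n = values.shape[0]
    h = 1.06 * max(values.std(ddof=1), 1e-12) * n ** (-1 / 5)
    u = (values[:, None] - values[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    np.fill_diagonal(kernel, 0.0)  # exclude each sample from its own density estimate
    density = kernel.sum(axis=1) / ((n - 1) * h)
    return np.log(density + 1e-300)  # guard against log(0) for isolated points


def loo_kernel_regression(parents: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Leave-one-out Nadaraya-Watson estimate of E[target | parents] at each sample."""
    n, d = parents.shape
    scale = np.maximum(parents.std(axis=0, ddof=1), 1e-12)
    z = (parents - parents.mean(axis=0)) / scale
    h = 1.06 * n ** (-1 / (d + 4))  # rule-of-thumb bandwidth on standardized inputs
    sq_dist = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-0.5 * sq_dist / h**2)
    np.fill_diagonal(weights, 0.0)  # do not let a sample predict itself
    return weights @ target / np.maximum(weights.sum(axis=1), 1e-300)


def anm_log_likelihood(data: np.ndarray, parents: dict[int, list[int]]) -> float:
    """Sum over variables of the estimated noise log-density at the ANM residuals.

    `parents[j]` lists the parent columns of column j; a missing key means no parents.
    Hypothetical helper for illustration; no penalty term is applied.
    """
    total = 0.0
    for j in range(data.shape[1]):
        pa = parents.get(j, [])
        if pa:
            # p(x_j | pa_j) = p_N(x_j - f_j(pa_j)): score the regression residual
            residual = data[:, j] - loo_kernel_regression(data[:, pa], data[:, j])
        else:
            # for a root node the "noise" is the variable itself (centering leaves the KDE unchanged)
            residual = data[:, j] - data[:, j].mean()
        total += float(loo_kde_logpdf(residual).sum())
    return total
```

For example, on data generated as Y = 2X + noise, scoring the graph X → Y with `anm_log_likelihood(data, {1: [0]})` gives a much higher value than the empty graph `anm_log_likelihood(data, {})`. Comparing X → Y with Y → X is where the ANM assumption matters: the two directions are distinguishable in general only when f is nonlinear or the noise is non-Gaussian, so a linear-Gaussian example like this one should not be expected to separate them.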
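The IPW correction can be illustrated on the simplest case in which it is known to work: the missingness indicator R of a variable Y depends only on a fully observed discrete variable X, so each observed row is reweighted by 1 / P(R = 1 | X). The sketch below is an illustration under that assumption, not the paper's procedure. The per-group propensity estimate, the simulated missingness mechanism, and the names `ipw_weights` and `weighted_mean` are hypothetical, and the non-random missingness mechanisms handled by MV-SELF may require a different way of estimating the weights.

```python
import numpy as np


def ipw_weights(observed: np.ndarray, group: np.ndarray) -> np.ndarray:
    """Return 1 / P(R = 1 | group) for the rows where the variable is observed.

    Assumes (hypothetically) that missingness depends only on the fully observed,
    discrete variable `group`; P(R = 1 | group) is estimated as a per-group fraction.
    """
    weights = np.zeros(observed.shape[0])
    for g in np.unique(group):
        in_group = group == g
        p_obs = observed[in_group].mean()
        if p_obs == 0:
            raise ValueError(f"no observed rows in group {g!r}")
        weights[in_group & observed] = 1.0 / p_obs
    return weights[observed]  # aligned with the complete-case rows


def weighted_mean(values: np.ndarray, weights: np.ndarray) -> float:
    """Self-normalized (Hajek) weighted mean."""
    return float(np.sum(weights * values) / np.sum(weights))


# Simulated data: Y is observed far less often when X = 1.
rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, n)
y = 2.0 * x + rng.standard_normal(n)  # true E[Y] = 1.0
p_keep = np.where(x == 1, 0.3, 0.9)
observed = rng.uniform(size=n) < p_keep

w = ipw_weights(observed, x)
complete_case_mean = float(y[observed].mean())        # list-wise deletion
ipw_mean = weighted_mean(y[observed], w)              # reweighted complete cases
ipw_p_x1 = weighted_mean((x[observed] == 1).astype(float), w)
```

Here list-wise deletion keeps mostly rows with X = 0, so the complete-case mean of Y lands near 0.5 instead of the true value 1.0; the reweighted mean returns to roughly 1.0, and the weighted frequency of X = 1 equals its frequency in the full sample. In the abstract's terms, this recovery of the joint distribution is the step that precedes scoring candidate structures.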
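The reported gains are F1 values over the learned graph. The abstract does not state whether F1 is computed on directed edges or on the undirected skeleton, so the sketch below, with the hypothetical helpers `f1_score` and `skeleton`, computes both. The difference matters here because a reversed edge, the kind of error the IPW correction is meant to fix, counts as a miss under directed F1 but as a hit under skeleton F1.

```python
def f1_score(true_items: set, est_items: set) -> float:
    """Harmonic mean of precision and recall between true and estimated edge sets."""
    tp = len(true_items & est_items)
    if tp == 0:
        return 0.0
    precision = tp / len(est_items)
    recall = tp / len(true_items)
    return 2 * precision * recall / (precision + recall)


def skeleton(edges: set[tuple[int, int]]) -> set[frozenset[int]]:
    """Drop edge directions: (i, j) and (j, i) map to the same unordered pair."""
    return {frozenset(edge) for edge in edges}


true_edges = {(0, 1), (1, 2)}               # 0 → 1 → 2
estimated_edges = {(0, 1), (2, 1), (0, 2)}  # one reversed edge, one extra edge

directed_f1 = f1_score(true_edges, estimated_edges)
skeleton_f1 = f1_score(skeleton(true_edges), skeleton(estimated_edges))
print(directed_f1, skeleton_f1)
```

With one correct edge out of three estimated and two true edges, directed F1 is 2 · (1/3) · (1/2) / (1/3 + 1/2) = 0.4, while ignoring directions gives 0.8 (printed values may differ by floating-point rounding).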