Computer Engineering ›› 2024, Vol. 50 ›› Issue (4): 78-86. doi: 10.19678/j.issn.1000-3428.0066901

• Artificial Intelligence and Pattern Recognition •

Causal Discovery Algorithm Based on Non-stationary Additive Noise Model

Zhifeng HAO1,2, Kaipei DING1, Ruichu CAI1,*, Wei CHEN1

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, Guangdong, China
  2. College of Science, Shantou University, Shantou 515063, Guangdong, China
  • Received: 2023-02-10 Published: 2024-04-15 Online: 2023-04-18
  • Contact: Ruichu CAI
  • Supported by:
    National Natural Science Foundation of China (61876043); National Natural Science Foundation of China (61976052); National Natural Science Foundation of China (62206064); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2021ZD0111501); National Science Fund for Excellent Young Scholars (62122022)

Abstract:

Causal discovery aims to uncover causal relationships among variables from observational data. Most existing methods assume that the data-generating process is stationary. In practice this assumption often fails, which makes the results unreliable. In some scenarios, however, the non-stationary disturbance is highly correlated with temporal information. Building on the additive noise model, this study therefore models the non-stationary disturbance as a function of temporal information, formulates a non-stationary additive noise model, gives its identifiability conditions, and proposes a two-stage causal structure learning algorithm. In the first stage, each variable is regressed to obtain its residual, and the independence between the residual and the set of regression features is tested to select a leaf node; iterating this step yields the causal order of the observed variables. In the second stage, regression and independence tests are performed again to remove the redundant causal relationships left by the first stage, which yields the causal structure of the observed variables. Experimental results show that, compared with Constraint-based causal Discovery from heterogeneous/NOn-stationary Data (CD-NOD), LPCMCI, and TiMINo, the proposed algorithm achieves the best performance on synthetic datasets, with an average F1 score of 0.85. On datasets with real-world causal structures, its F1 score improves by 41.12% on average, indicating that it recovers more causal-structure information from non-stationary data.

Key words: causal discovery, causal structure, non-stationary disturbances, additive noise model, functional causal model
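
The two-stage procedure summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. It assumes each variable has the form X_j = f_j(parents) + g_j(t) + noise, which is one reading of "a function of temporal information", and it handles the time term simply by adding the time index `t` to every regression. The polynomial least-squares regressor and the HSIC permutation test are stand-ins chosen for brevity; the abstract does not say which regressor or independence test the paper uses. All function names (`design`, `residual`, `hsic`, `causal_order`, `prune_edges`) are hypothetical.

```python
import numpy as np


def _standardize(A):
    A = np.asarray(A, dtype=float).reshape(len(A), -1)
    return (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-12)


def design(X, cols, t):
    """Regressor matrix: the chosen columns of X plus the time index t."""
    return np.column_stack([X[:, c] for c in cols] + [t])


def residual(y, Z, degree=3):
    """Residual of y after a per-feature polynomial least-squares fit on Z
    (an assumed stand-in for a nonparametric regressor)."""
    Zs = _standardize(Z)
    Phi = np.hstack([np.ones((len(y), 1))] + [Zs ** k for k in range(1, degree + 1)])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return y - Phi @ coef


def _gram(A):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    A = _standardize(A)
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    pos = sq[sq > 0]
    bw = np.median(pos) if pos.size else 1.0
    return np.exp(-sq / bw)


def hsic(a, B, n_perm=0, rng=None):
    """Return (HSIC statistic, permutation p-value or None if n_perm == 0)."""
    n = len(a)
    H = np.eye(n) - 1.0 / n  # centering matrix I - (1/n) 11^T
    Kc, L = H @ _gram(a) @ H, _gram(B)
    stat = float((Kc * L).sum()) / n ** 2
    if n_perm == 0:
        return stat, None
    rng = np.random.default_rng(0) if rng is None else rng
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        hits += float((Kc * L[np.ix_(p, p)]).sum()) / n ** 2 >= stat
    return stat, (1 + hits) / (1 + n_perm)


def causal_order(X, t):
    """Stage 1: repeatedly pick as leaf the variable whose residual looks
    least dependent on its regressors (remaining variables plus time)."""
    remaining, order = list(range(X.shape[1])), []
    while len(remaining) > 1:
        scores = {}
        for j in remaining:
            Z = design(X, [k for k in remaining if k != j], t)
            scores[j] = hsic(residual(X[:, j], Z), Z)[0]
        leaf = min(scores, key=scores.get)
        order.insert(0, leaf)
        remaining.remove(leaf)
    return remaining + order  # root first, leaves last


def prune_edges(X, t, order, alpha=0.05, n_perm=100, seed=0):
    """Stage 2: drop a candidate parent if the residual stays independent
    of all predecessors (and time) without it. adj[i, j] = 1 means i -> j."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    adj = np.zeros((d, d), dtype=int)
    for pos, j in enumerate(order):
        preds = list(order[:pos])
        parents = list(preds)
        Z_all = design(X, preds, t)
        for p in preds:
            trial = [q for q in parents if q != p]
            r = residual(X[:, j], design(X, trial, t))
            if hsic(r, Z_all, n_perm, rng)[1] > alpha:
                parents = trial
        for p in parents:
            adj[p, j] = 1
    return adj
```

A call such as `order = causal_order(X, t)` followed by `adj = prune_edges(X, t, order)` takes an `(n, d)` data matrix `X` and a length-`n` time index `t` and returns a causal order and a directed adjacency matrix. Ranking candidates by the raw HSIC statistic in stage 1 and using a p-value threshold only in stage 2 is a simplification made here to keep the sketch short; the significance level `alpha` and the polynomial degree are arbitrary illustrative defaults.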