
Computer Engineering ›› 2023, Vol. 49 ›› Issue (1): 65-72. doi: 10.19678/j.issn.1000-3428.0063615

• Artificial Intelligence and Pattern Recognition •

Study of Hawkes Process for Granger Causality Discovery Among Faults

CAI Ruichu, WU Siyu, QIAO Jie

  1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2021-12-24 Revised: 2022-01-26 Published: 2022-03-22
  • About the authors: CAI Ruichu (born 1983), male, professor and doctoral supervisor; his research interests include causality, machine learning, and data mining. WU Siyu, master's student. QIAO Jie (corresponding author), doctoral student.
  • Funding:
    National Natural Science Foundation of China (61876043); National Natural Science Foundation of China Excellent Young Scientists Fund (62122022); Guangzhou Science and Technology Program (201902010058).

Abstract: When existing causality modeling methods are applied to fault event sequences, they struggle to incorporate causal priors, so their results are overly dense, and the causal relationships they recover are unreliable on sparse data with low temporal precision. This paper models the causal relationships among events of different fault types as Granger causality based on the Hawkes process, and proposes a Hawkes process model for Granger causality discovery in fault sequences. The Hawkes process is extended to the discrete-time domain to handle data with low temporal precision; an objective function based on the Bayesian Information Criterion (BIC) is constructed to keep the causal structure sparse; and causal priors are introduced through an iterative optimization algorithm that combines the Expectation-Maximization (EM) algorithm with the Hill-Climbing method, which improves the model's reliability. Experimental results show that the method performs well on simulated data generated under different parameter settings, and that on real datasets from two communication networks its F1 score is more than 15.18% higher than those of the ADM4, MLE-SGL, TSSO, and PCMCI algorithms. Introducing root-cause labels and causal dependency priors further improves the F1 score by more than 22.43%, which verifies the effectiveness of the priors.

Key words: event sequence, Granger causality, Hawkes process, Bayesian Information Criterion (BIC), Expectation-Maximization (EM) algorithm, Hill-Climbing method
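
The abstract names two ingredients that a short sketch can make concrete: a Hawkes process in discrete time and a BIC-based objective that favors sparse causal structures. The Python below is an illustrative sketch only, not the authors' formulation, which the abstract does not spell out. It assumes Poisson-distributed event counts per time bin, an exponentially decaying influence kernel truncated at `max_lag` bins, and a penalty of half the number of free parameters times the log of the number of bins. All function and parameter names (`intensity`, `bic_score`, `decay`, `max_lag`) are hypothetical.

```python
import math


def intensity(counts, mu, alpha, v, t, decay=1.0, max_lag=5):
    """Assumed discrete-time intensity of event type v in time bin t.

    counts[u][s] is the number of type-u events in bin s, mu[v] > 0 is the
    baseline rate, and alpha[u][v] >= 0 is the influence of type u on type v.
    A non-zero alpha[u][v] is read as a Granger-causal edge u -> v.
    """
    lam = mu[v]
    for u in range(len(counts)):
        if alpha[u][v] == 0.0:
            continue  # no edge u -> v, so no contribution
        for lag in range(1, min(max_lag, t) + 1):
            # Past events excite the present with exponential decay (assumption).
            lam += alpha[u][v] * counts[u][t - lag] * math.exp(-decay * lag)
    return lam


def log_likelihood(counts, mu, alpha, decay=1.0, max_lag=5):
    """Poisson log-likelihood of the binned counts under the intensity above."""
    total = 0.0
    for v in range(len(counts)):
        for t in range(len(counts[v])):
            lam = intensity(counts, mu, alpha, v, t, decay, max_lag)
            x = counts[v][t]
            total += x * math.log(lam) - lam - math.lgamma(x + 1)
    return total


def bic_score(counts, mu, alpha, decay=1.0, max_lag=5):
    """BIC-style score (higher is better): likelihood minus a sparsity penalty."""
    n_bins = len(counts[0])
    n_edges = sum(1 for row in alpha for a in row if a != 0.0)
    n_params = len(mu) + n_edges  # baselines plus active edges
    ll = log_likelihood(counts, mu, alpha, decay, max_lag)
    return ll - 0.5 * n_params * math.log(n_bins)


# Toy data: two fault types observed over four time bins.
counts = [[1, 0, 0, 2], [0, 1, 0, 0]]
mu = [0.5, 0.5]
no_edges = [[0.0, 0.0], [0.0, 0.0]]
edge_0_to_1 = [[0.0, 0.8], [0.0, 0.0]]

print(bic_score(counts, mu, no_edges))
print(bic_score(counts, mu, edge_0_to_1))
```

Under these assumptions, a structure search such as hill climbing could compare `bic_score` across candidate edge sets and keep an edge only when the likelihood gain outweighs the penalty; fitting `mu` and `alpha` for each candidate (for example with EM) is outside this sketch.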

CLC Number: