
Computer Engineering ›› 2023, Vol. 49 ›› Issue (11): 150-159. doi: 10.19678/j.issn.1000-3428.0066262

• Cyberspace Security •

Research on Imbalance Fraud Detection Based on Graph Neural Network

Anqi CHEN1, Rui CHEN1,*, Zhufang KUANG1, Huajun HUANG2

  1. School of Computer and Information Engineering, Central South University of Forestry and Technology, Changsha 410004, China
  2. School of Information Technology and Management, Hunan University of Finance and Economics, Changsha 410205, China
  • Received: 2022-11-15 Online: 2023-11-15 Published: 2023-11-08
  • Contact: Rui CHEN
  • About the authors:

    Anqi CHEN (born 1998), female, master's student; main research interest: fraud detection

    Zhufang KUANG, professor, Ph.D.

    Huajun HUANG, professor, Ph.D.

  • Funding:
    National Key Research and Development Program of China (2019YFE0122600); National Natural Science Foundation of China (62072477); National Natural Science Foundation of China (61309027); Key Research and Development Program of Hunan Province (63223008)

Abstract:

Graph neural networks (GNNs) are now widely used in fraud detection, but the class imbalance that typically arises in fraud detection degrades the performance of GNN-based models. To address this problem, an imbalanced fraud detection model based on a graph neural network is proposed. The model distinguishes two forms of imbalance in graph-structured data: neighborhood imbalance and center imbalance. For neighborhood imbalance, a Multilayer Perceptron (MLP) and a Gaussian kernel function first measure the distance (similarity) in non-Euclidean space between a center node and its neighborhood nodes. A Markov decision process then dynamically updates the sampling threshold, so that the neighborhood nodes undergo multi-layer adaptive undersampling. Finally, in each layer, only the center node's original features and its hidden embedding from the previous layer are aggregated to obtain its target embedding. For center imbalance, a weighted cross-entropy loss function assigns a dynamic weight to the loss of each center node to achieve center balance. Experimental results on the Yelp and Amazon datasets show that the model significantly outperforms the best-performing baseline model in terms of Area Under the Curve (AUC) and Recall: on the two datasets, AUC and Recall increase by 5.52% and 5.42%, and by 1.57% and 4.31%, respectively.

Key words: Graph Neural Network (GNN), fraud detection, class imbalance, Markov decision, weighted cross-entropy loss function
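
The two balancing ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name in it is hypothetical. It assumes the node embeddings have already been produced (the MLP is omitted), uses a fixed similarity threshold where the paper updates the threshold through Markov decision, and uses static inverse-frequency class weights where the paper sets dynamic per-node weights.

```python
import math


def gaussian_similarity(center, neighbor, sigma=1.0):
    """Gaussian kernel similarity between two embedding vectors (1.0 means identical)."""
    sq_dist = sum((c - n) ** 2 for c, n in zip(center, neighbor))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def undersample_neighbors(center, neighbors, threshold, sigma=1.0):
    """Return indices of neighbors whose similarity to the center reaches the threshold.

    Assumption: the threshold is a fixed constant here; the paper adapts it dynamically.
    """
    return [i for i, nb in enumerate(neighbors)
            if gaussian_similarity(center, nb, sigma) >= threshold]


def inverse_frequency_weights(labels):
    """Class weights n / (num_classes * class_count), so the rare class weighs more.

    Assumption: a static stand-in for the paper's dynamic per-node weights.
    """
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    n = len(labels)
    return {y: n / (len(counts) * c) for y, c in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted binary cross-entropy; probs are predicted fraud probabilities."""
    eps = 1e-12
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
        norm += weights[y]
    return total / norm


# Toy data: one center node with three neighbors, and four labeled center nodes.
center = [0.0, 0.0]
neighbors = [[0.1, 0.0], [2.0, 2.0], [0.0, 0.3]]
kept = undersample_neighbors(center, neighbors, threshold=0.5)

labels = [0, 0, 0, 1]  # 1 = fraud (minority class)
weights = inverse_frequency_weights(labels)
loss_missed_fraud = weighted_cross_entropy([0.1, 0.1, 0.1, 0.1], labels, weights)
loss_false_alarm = weighted_cross_entropy([0.9, 0.1, 0.1, 0.9], labels, weights)
```

In this toy run the distant neighbor is filtered out, and missing the single fraud node costs more than raising one false alarm on a benign node, which is the effect a weighted loss is meant to have under class imbalance.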