Computer Engineering, 2024, Vol. 50, Issue (3): 156-165. doi: 10.19678/j.issn.1000-3428.0067948

• Cyberspace Security •

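  • Supported by:
    National Natural Science Foundation of China (61772004); Natural Science Foundation of Fujian Province (2020J01161); External Cooperation Project of the Fujian Provincial Department of Science and Technology (2023I0013)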

Angle-based Graph Neural Network Method for Anomaly Detection in High Dimensional Data

Jun WANG1, Huixia LAI1,2,*, Yue WAN1, Shi ZHANG1,3,*

  1. College of Computer and Cyber Security, Fujian Normal University, Fuzhou 350117, Fujian, China
  2. Fujian Provincial Key Laboratory of Network Security and Cryptography, Fuzhou 350117, Fujian, China
  3. Digit Fujian Internet-of-Things Laboratory of Environmental Monitoring, Fujian Normal University, Fuzhou 350117, Fujian, China
  • Received: 2023-06-28 Published: 2024-03-15 Online: 2023-10-12
  • Contact: Huixia LAI, Shi ZHANG


Abstract:

In high-dimensional data spaces, most data points lie near the edges of the space and are sparsely distributed. The resulting "curse of dimensionality" prevents existing anomaly detection methods from guaranteeing detection accuracy. To address this problem, an Angle-based Graph Neural Network (A-GNN) method for anomaly detection in high-dimensional data is proposed. First, the training data are augmented by uniformly sampling the data space and perturbing the initial training data. Second, a k-nearest neighbor graph of the training data is constructed, and the variance of the distance-weighted angles among each node's k nearest neighbors is used as that node's initial anomaly factor. Finally, a GNN model is trained so that information is exchanged between nodes and adjacent nodes learn from each other, yielding an effective anomaly assessment. A-GNN is compared experimentally with nine typical anomaly detection methods on six natural datasets. The results show that A-GNN achieves the highest Area Under the Curve (AUC) value on five of the datasets and substantially improves detection accuracy for data of various dimensionalities; on some "true high-dimensional data", the AUC increases by more than 40%. Compared with three k-nearest neighbor-based anomaly detection methods at different values of k, A-GNN uses the information exchange between GNN nodes to effectively avoid the influence of k on the detection results, and is therefore more robust.

Keywords: anomaly detection, angle-based anomaly assessment, Graph Neural Network (GNN), high-dimensional data, k-nearest neighbor
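
The second step of the abstract, an initial anomaly factor computed from the variance of distance-weighted angles among a point's k nearest neighbors, can be illustrated with a small sketch. The abstract does not give the exact formula, so the code below assumes a FastABOD-style weighting, in which the cosine of the angle between two neighbor difference vectors is divided by the product of their lengths. The function names `knn_indices` and `angle_variance_factor`, the brute-force neighbor search, and the toy data are all illustrative assumptions rather than the authors' implementation, and the data augmentation and GNN training steps are not shown.

```python
import math
import random
from itertools import combinations
from statistics import pvariance


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (brute force)."""
    nbrs = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: math.dist(p, points[j]))
        nbrs.append(order[:k])
    return nbrs


def angle_variance_factor(points, k):
    """Variance of distance-weighted angles over each point's k nearest neighbors.

    Assumed weighting (FastABOD-style, not taken from the paper):
    <a, b> / (|a|^2 * |b|^2), i.e. cos(a, b) / (|a| * |b|),
    where a and b are difference vectors from the point to two neighbors.
    """
    nbrs = knn_indices(points, k)
    scores = []
    for i, p in enumerate(points):
        diffs = [[x - y for x, y in zip(points[j], p)] for j in nbrs[i]]
        vals = []
        for a, b in combinations(diffs, 2):
            dot = sum(x * y for x, y in zip(a, b))
            na2 = sum(x * x for x in a)
            nb2 = sum(x * x for x in b)
            # Small epsilon guards against duplicate points (zero-length vectors).
            vals.append(dot / (na2 * nb2 + 1e-12))
        scores.append(pvariance(vals))
    return scores, nbrs


# Toy data: a 2-D Gaussian cluster plus one point placed far away from it.
random.seed(0)
points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
points.append([20.0, 20.0])

scores, nbrs = angle_variance_factor(points, k=10)
# Under this convention, the smallest variance marks the most suspicious point.
suspect = min(range(len(points)), key=scores.__getitem__)
```

Under this assumed convention a low variance signals a likely anomaly: a distant point sees all of its neighbors in nearly the same direction, whereas a point inside a cluster sees neighbors at widely varying angles and distances. In a pipeline of the kind the abstract outlines, `scores` would serve as initial node values and `nbrs` as the edges of the k-nearest neighbor graph passed to the GNN.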