Anomaly Detection of IoT Water Quality Monitoring Data Based on Explainable Deep Learning

doi:10.19678/j.issn.1000-3428.0067570

Abstract

Abstract: As Internet-of-Things (IoT) technology develops and its range of applications expands, the number and variety of IoT devices and sensors continue to grow. IoT water quality sensors play a vital role in ecological monitoring and protection. To address the large volume, high dimensionality, and lack of labels in the monitoring data these sensors collect, this study proposes an unsupervised anomaly detection algorithm based on explainable deep learning. The algorithm combines an Auto-Encoder (AE) with the SHAP algorithm to detect anomalies in multi-dimensional water quality datasets. The AE model is trained and data with large reconstruction errors are flagged; SHAP is then used to interpret the AE and compute the importance of each feature in the flagged data. The final anomalies are determined from these feature importances. Experimental results on an IoT water quality monitoring dataset show that the algorithm effectively detects anomalous data, achieving an F1 score of 0.875 and outperforming algorithms commonly used in unsupervised anomaly detection. The algorithm is therefore of practical value for processing IoT water quality monitoring data, and it can also be applied to anomaly detection on massive IoT monitoring data in other fields, such as meteorology and the environment.

Keywords: deep learning, Auto-Encoder (AE), anomaly detection, explainable machine learning, unsupervised learning
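
The pipeline summarized in the abstract (train an autoencoder, flag samples with large reconstruction error, then attribute that error to individual features) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the data are synthetic, the deep AE is replaced by a linear autoencoder fitted in closed form via SVD (equivalent to PCA), the flagging threshold is the mean error plus three standard deviations, and exact Shapley values against a mean baseline stand in for the SHAP library. The abstract does not spell out the paper's final decision rule, so the sketch only reports the feature that contributes most to each flagged sample's error.

```python
import itertools
import math

import numpy as np


def fit_linear_autoencoder(X, k):
    """Closed-form linear autoencoder: project onto the top-k principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k].T  # W has shape (n_features, k)


def reconstruction_error(x, mean, W):
    """Squared error between x and its encode-then-decode reconstruction."""
    x_hat = mean + ((x - mean) @ W) @ W.T
    return float(np.sum((x - x_hat) ** 2))


def shapley_values(x, baseline, value_fn):
    """Exact Shapley values of value_fn at x; absent features take the baseline value."""
    d = len(x)
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for r in range(d):
            weight = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
            for subset in itertools.combinations(others, r):
                idx = np.array(subset, dtype=int)
                without_i = baseline.copy()
                without_i[idx] = x[idx]          # coalition without feature i
                with_i = without_i.copy()
                with_i[i] = x[i]                 # same coalition plus feature i
                phi[i] += weight * (value_fn(with_i) - value_fn(without_i))
    return phi


# Synthetic stand-in for multi-dimensional sensor data: 4 correlated features
# driven by 2 latent factors, plus small measurement noise (all hypothetical).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.0, 1.0, 1.0],
                   [0.0, 1.0, 1.0, -1.0]])
X = rng.normal(size=(200, 2)) @ mixing + 0.05 * rng.normal(size=(200, 4))

# Inject one anomaly: a spike in feature 0 that breaks the learned correlations.
anomaly_idx = 17
X[anomaly_idx] = 0.05 * rng.normal(size=4)
X[anomaly_idx, 0] += 10.0

# Step 1: fit the autoencoder and score every sample by reconstruction error.
mean, W = fit_linear_autoencoder(X, k=2)
errors = np.array([reconstruction_error(x, mean, W) for x in X])

# Step 2: flag samples whose error is unusually large (assumed 3-sigma rule).
threshold = errors.mean() + 3.0 * errors.std()
flagged = np.flatnonzero(errors > threshold)

# Step 3: attribute each flagged sample's error to its features.
for i in flagged:
    phi = shapley_values(X[i], mean, lambda v: reconstruction_error(v, mean, W))
    print(f"row {i}: error={errors[i]:.2f}, top feature={int(np.argmax(phi))}")
```

Exact enumeration visits every feature subset, so it is only practical for a handful of features; for higher-dimensional data a sampling-based approximation would be the more realistic choice. Because the baseline is the training mean, whose reconstruction error is zero, the Shapley values of a sample sum to its reconstruction error.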

CLC number: TP181

LI Yongfei, LI Mingyang, CHANG Xin, CAO Kexin. Anomaly Detection of IoT Water Quality Monitoring Data Based on Explainable Deep Learning[J]. Computer Engineering, 2024, 50(6): 179-187.

https://www.ecice06.com/CN/Y2024/V50/I6/179
