
Computer Engineering, 2025, Vol. 51, Issue (4): 107-118. doi: 10.19678/j.issn.1000-3428.0068987

• Artificial Intelligence and Pattern Recognition •

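Robust Internet of Things Multidimensional Time Series Data Prediction Method

SHEN Chen1,2, HE Yong1,2,*, PENG Anlang3

  1. State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, Guizhou, China
  2. College of Computer Science and Technology, Guizhou University, Guiyang 550025, Guizhou, China
  3. Guizhou Zhaoxin Digital Technology Co., Ltd., Guiyang 550025, Guizhou, China
  • Received: 2023-12-07 Published: 2025-04-15 Online: 2024-05-21
  • Corresponding author: HE Yong
  • Funding:
    Guizhou Provincial Science and Technology Support Program (黔科合支撑[2022]一般267)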

Abstract:

In Internet of Things (IoT) scenarios, data are susceptible to noise during collection and transmission, which introduces outliers and missing values. Existing temporal regularized matrix factorization models typically use the squared loss to measure reconstruction error, overlooking the fact that, when multidimensional time series contain anomalous data, the quality of the matrix factorization is itself a key factor in prediction performance. This paper therefore proposes TARNMF, a Time-Aware Robust Non-negative Matrix Factorization framework for multidimensional time series prediction based on the L2,log norm. TARNMF captures the spatiotemporal correlations of multidimensional time series through Non-negative Matrix Factorization (NMF) and an autoregressive (AR) temporal regularization term with learnable parameters. Assuming that data containing outliers follow a Laplace distribution, it uses the L2,log norm to measure the error between the original data and the reconstructed matrix in robust NMF, reducing the interference of anomalous data with the prediction model. The L2,log norm retains the properties of existing robust metric functions, addresses the approximation problem of the L1 loss, and reduces the influence of outliers on the objective function by compressing their residuals. In addition, an optimization method based on projected gradient descent is proposed to solve the model. Experimental results show that TARNMF has good scalability and robustness; on the high-dimensional Solar dataset in particular, it reduces the relative mean absolute error by 8.64% compared with the second-best result. Experiments on noisy data further verify that TARNMF can efficiently process and predict IoT time series that contain anomalous data.

Key words: L2,log norm, Non-negative Matrix Factorization (NMF), temporal regularized matrix factorization, multidimensional time series data prediction, robustness
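The abstract describes the objective only in words. As a rough illustration, the plain-Python sketch below evaluates an objective built from the same three ingredients: a column-wise, log-compressed reconstruction loss, an AR penalty on the latent temporal factors, and the non-negativity projection used in projected gradient descent. This page does not give the exact L2,log definition, so the form sum_t log(1 + ||e_t||_2) over the residual column e_t of each time step is an assumption. The TRMF-style AR penalty with fixed per-factor lag weights, the matrix shapes, and every function name (`column_norms`, `l2log_loss`, `l2log_grad_magnitudes`, `ar_penalty`, `objective`, `project_nonneg`) are hypothetical as well. This is not the authors' implementation.

```python
import math

# Minimal sketch under stated assumptions; NOT the TARNMF authors' code.
# Convention (assumed): Y is n x T (n series, T time steps), F is n x k,
# X is k x T. Matrices are plain lists of rows.

def matmul(A, B):
    """Return the matrix product A @ B for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def column_norms(E):
    """L2 norm of each column of E, i.e. one residual norm per time step."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*E)]

def squared_loss(E):
    """Squared Frobenius loss: sum over time steps of ||e_t||_2^2."""
    return sum(n * n for n in column_norms(E))

def l2log_loss(E):
    """ASSUMED L2,log form: sum over time steps of log(1 + ||e_t||_2)."""
    return sum(math.log1p(n) for n in column_norms(E))

def l2log_grad_magnitudes(E):
    """Size of the gradient of log(1 + ||e_t||_2) w.r.t. e_t: 1 / (1 + ||e_t||_2).

    At e_t = 0 this returns 1.0, the limiting value of that magnitude.
    """
    return [1.0 / (1.0 + n) for n in column_norms(E)]

def ar_penalty(X, lags, W):
    """AR regularizer: sum_t sum_i (X[i][t] - sum_l W[l][i] * X[i][t-l])^2.

    W maps each lag l to one weight per latent factor (learnable in the
    paper's description, fixed inputs here). Steps t < max(lags) are skipped.
    """
    k, T = len(X), len(X[0])
    total = 0.0
    for t in range(max(lags), T):
        for i in range(k):
            pred = sum(W[l][i] * X[i][t - l] for l in lags)
            total += (X[i][t] - pred) ** 2
    return total

def objective(Y, F, X, lags, W, lam):
    """Assumed robust objective: L2,log reconstruction loss + lam * AR penalty."""
    FX = matmul(F, X)
    E = [[y - r for y, r in zip(y_row, r_row)] for y_row, r_row in zip(Y, FX)]
    return l2log_loss(E) + lam * ar_penalty(X, lags, W)

def project_nonneg(M):
    """Projection onto the non-negative orthant, applied after each gradient step."""
    return [[max(0.0, v) for v in row] for row in M]

if __name__ == "__main__":
    # Four time steps; the last column is corrupted by a large outlier.
    E = [[0.1, -0.1, 0.1, 10.0],
         [0.1, 0.1, -0.1, 10.0]]
    norms = column_norms(E)
    print("outlier share, squared loss:", norms[-1] ** 2 / squared_loss(E))
    print("outlier share, L2,log loss: ", math.log1p(norms[-1]) / l2log_loss(E))
    print("gradient sizes:", l2log_grad_magnitudes(E))
```

Under the assumed form, the gradient of log(1 + ||e_t||_2) with respect to e_t has magnitude 1/(1 + ||e_t||_2), so a heavily corrupted time step pulls on the factors less than a clean one. Under the squared loss the magnitude is 2||e_t||_2 and grows with the outlier. A full projected-gradient solver could alternate gradient steps on F, X, and the AR weights, calling `project_nonneg` on F and X after each step.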