
Computer Engineering (计算机工程) ›› 2025, Vol. 51 ›› Issue (11): 90-99. doi: 10.19678/j.issn.1000-3428.0069678

• Artificial Intelligence and Pattern Recognition •

  • Supported by: National Natural Science Foundation of China (62072146); Zhejiang Province "Special Support Program for High-Level Talents" Leading Talent in Science and Technology Innovation Project (2022R52043); Key Research and Development Program of Zhejiang Province (2023C03194); Key Research and Development Program of Zhejiang Province (2021C03187)

Time Series Imputation Based on Self-Attention Mechanism

XU Lei, ZENG Yan, YUAN Junfeng*(), YUE Lupeng, YIN Yuyu, ZHANG Jilin, XUE Meiting, HAN Meng   

  1. College of Computer Science, Hangzhou Dianzi University, Hangzhou 310000, Zhejiang, China
  • Received: 2024-04-01 Revised: 2024-05-15 Online: 2025-11-15 Published: 2024-08-21
  • Contact: YUAN Junfeng


Abstract:

As core data for maritime traffic, ship trajectory data can be used for trajectory prediction, early warning, and other tasks, and exhibit pronounced temporal characteristics. However, owing to factors such as harsh marine environments and poor communication reliability, collected ship trajectory data commonly suffer from missing values, and learning from time series that contain missing data significantly degrades the accuracy of time series analysis. The current mainstream solution is to approximately impute the missing data, mainly using convolutional models that reshape the time series along the time axis to capture its local features; however, such models are weak at capturing the global features of long time series. The Transformer captures the relationships among the time points of a series through its core self-attention mechanism, thereby strengthening a model's ability to capture global features. However, because attention is computed by matrix multiplication, it ignores the temporal ordering of the series, and the resulting global feature weights lack time-span dependency. Therefore, to address the problem of capturing global features in long time series, this study proposes GANet, a variant network based on the self-attention mechanism. GANet first obtains a basic global feature weight matrix over the time points through self-attention, and then uses a Gated Recurrent Unit (GRU) to forget and update this matrix along the time axis, yielding a global feature weight matrix with time-span dependency; this matrix is then used to reconstruct the data and impute the missing values. By combining the self-attention and gating mechanisms, GANet captures global features while accounting for the influence of time span on each time point, making the captured global features time-span dependent. Experimental results show that, compared with existing models such as Autoformer and Informer, GANet achieves better imputation performance on the Trajectory, ETT, and Electricity datasets.
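To make the mechanism described in the abstract concrete, the following NumPy sketch illustrates the general idea (it is not the authors' implementation): a self-attention weight matrix is computed over the time points, a GRU cell then forgets and updates its rows along the time axis so the weights become time-span dependent, and the gated weights finally reconstruct the series to fill missing positions. All function names, weight initializations, and dimensions here are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU cell step: update gate z, reset gate r, candidate state."""
    z = 1.0 / (1.0 + np.exp(-(x @ Wz + h @ Uz)))
    r = 1.0 / (1.0 + np.exp(-(x @ Wr + h @ Ur)))
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh)
    return (1.0 - z) * h + z * h_cand

def ganet_sketch(x, mask, rng):
    """x: (T, D) time series; mask: (T, D), 1 = observed, 0 = missing."""
    T, D = x.shape
    x = x * mask                                   # zero out missing entries first
    Wq, Wk, Wv = (0.1 * rng.standard_normal((D, D)) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = softmax(q @ k.T / np.sqrt(D))         # (T, T) basic global weight matrix
    # Run a GRU over the rows of the weight matrix so that earlier weights
    # can be forgotten/updated, giving the weights time-span dependency.
    params = [0.1 * rng.standard_normal((T, T)) for _ in range(6)]
    h = np.zeros(T)
    gated = np.empty_like(scores)
    for t in range(T):
        h = gru_step(h, scores[t], *params)
        gated[t] = h
    weights = softmax(gated)                       # renormalize each row
    recon = weights @ v                            # reconstructed series (T, D)
    return mask * x + (1.0 - mask) * recon         # keep observed, impute missing
```

In a trained model the projection and GRU parameters would be learned against a reconstruction loss on the observed entries; here they are random, so the sketch only demonstrates the data flow, with the final masked sum guaranteeing that observed values pass through unchanged.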

Key words: self-attention mechanism, gated recurrent unit, global feature capture, time span dependency, time series imputation