作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2024, Vol. 50 ›› Issue (1): 60-67. doi: 10.19678/j.issn.1000-3428.0066632

• 人工智能与模式识别 • 上一篇    下一篇

支持均匀缩放的不等长时间子序列查询方法

熊浩然1,*(), 何震瀛2   

  1. 1. 复旦大学软件学院, 上海 200433
    2. 复旦大学计算机科学技术学院, 上海 200433
  • 收稿日期:2022-12-27 出版日期:2024-01-15 发布日期:2023-03-30
  • 通讯作者: 熊浩然
  • 基金资助:
    国家重点研发计划(2021YFB3300502)

Variable-Length Time Series Subsequence Query Method Supporting Uniform Scaling

Haoran XIONG1,*(), Zhenying HE2   

  1. 1. School of Software, Fudan University, Shanghai 200433, China
    2. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received:2022-12-27 Online:2024-01-15 Published:2023-03-30
  • Contact: Haoran XIONG

摘要:

作为时序数据分析中的基础技术之一,时间序列的子序列查询旨在寻找与目标序列相似的子序列。现有的子序列查询方法大多仅支持查询与目标序列长度相同的子序列,因而均匀缩放技术常被用于解决子序列查询中的不等长问题。但现有支持均匀缩放的子序列查询技术大多未考虑子序列的Z-标准化,且对查询效率仍有改善的空间。针对该问题,提出一种基于索引技术且支持均匀缩放的子序列查询方法。结合现有索引方法ULISSE提供的树状数据结构,设计可保证非漏报的下界距离,为索引结构的剪枝提供理论保证,并利用索引中存储的元数据,提出精确K-近邻查询算法。所提方法适用于非归一化和归一化两种场景。实验结果表明,较UCR-US和ULISSE基线方法,该基于索引的不等长子序列查询方法在CAP、GAP两个真实数据集以及随机游走人工合成数据集上均实现了查询效率的显著提升,针对在非归一化和归一化两种场景下的不等长子序列查询,该方法的平均效率提升分别为2.33和2.51倍。

关键词: 时间序列, 子序列查询, 均匀缩放, 索引, 下界距离, K-近邻

Abstract:

Subsequence query, a fundamental technique in time-series data analysis, aims to find subsequences similar to the target sequence. Most existing methods for subsequence query support the query of subsequences of the same length as the target sequence. Therefore, uniform scaling is often used to address the problem of variable lengths in subsequence query. However, most existing subsequence query techniques that support uniform scaling do not consider the Z-normalization of subsequences, and the query efficiency requires improvement. To address this problem, a novel subsequence query method based on indexing techniques and supporting uniform scaling is proposed. Combined with the tree data structure provided by the existing ULISSE index method, a lower bound distance is designed to guarantee non-dismissal matching, which provides a theoretical guarantee for the pruning of the index structure. Furthermore, an exact K-Nearest Neighbor(K-NN) query algorithm is proposed using the metadata stored in the index. In addition, the entire set of methods is applicable to both nonnormalized and normalized scenarios. The experimental results show that this index-based query method achieves a significant improvement in efficiency compared with the baseline methods, UCR-US and ULISSE, on real datasets, CAP and GAP, as well as on synthetic datasets using random walking. For variable-length query in nonnormalized and normalized scenarios, the average efficiency improvement of this method is 2.33 times and 2.51 times, respectively.

Key words: time series, subsequence query, uniform scaling, index, lower bound distance, K-Nearest Neighbor(K-NN)