作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2025, Vol. 51 ›› Issue (7): 161-170. doi: 10.19678/j.issn.1000-3428.0069290

• 先进计算与数据处理 • 上一篇    下一篇

不确定时间序列Top-k窗口聚合查询方法

张航, 熊浩然, 何震瀛*()   

  1. 复旦大学计算机科学技术学院, 上海 200433
  • 收稿日期:2024-01-23 出版日期:2025-07-15 发布日期:2024-06-14
  • 通讯作者: 何震瀛
  • 基金资助:
    国家重点研发计划(2021YFB3300502)

Top-k Window Aggregation Query Method for Uncertain Time Series

ZHANG Hang, XIONG Haoran, HE Zhenying*()   

  1. School of Computer Science, Fudan University, Shanghai 200433, China
  • Received:2024-01-23 Online:2025-07-15 Published:2024-06-14
  • Contact: HE Zhenying

摘要:

近年来, 如何分析挖掘不确定时间序列数据逐渐受到业界关注。Top-k查询作为数据库领域研究的热点问题, 旨在从大规模数据中检索出最符合用户查询条件的前k项结果。然而, 尽管Top-k查询在其他领域已被广泛应用, 针对不确定时间序列的Top-k查询研究仍然较少。这种查询可以有效帮助用户从不确定时间序列提取重要信息。提出一种新的Top-k查询问题——不确定时间序列Top-k窗口聚合查询, 并针对该问题给出高效的查询方法。这个查询可以作为一个基础工具, 辅助用户探索和分析不确定时间序列数据。现有能够支持这个查询的方法均存在查询效率较低或所需存储空间过高的问题。针对该问题, 提出一种基于子窗口拼接策略的两级Top-k查询方法, 并提出高效计算阈值上界方法解决基于子窗口拼接策略引入的阈值计算复杂难题。该方法能够以较少的预计算存储空间, 高效支持不确定时间序列Top-k窗口聚合查询。为了验证所提方法的有效性, 在真实和人造数据集上进行实验。实验结果表明, 所提方法与基于TA的Top-k查询方法相比, 明显降低了预计算列表的存储空间; 与基于遍历的FSEC-S方法相比, 所提方法以及使用计算阈值上界优化方法的平均查询效率分别提升了7.27倍和20.04倍。

关键词: 不确定时间序列, Top-k查询, 窗口, 聚合查询, 有序列表, 阈值

Abstract:

The analysis and mining of uncertain time series data has attracted attention in various industries. Top-k queries, a popular research Topic in the database field, aim to retrieve the Top-k results that best match a user's query conditions from large-scale data. Although Top-k queries have been extensively explored and applied in various fields, research on Top-k queries specifically for uncertain time series is limited. Such queries can effectively help users extract important information from uncertain time series. This study proposes a new Top-k query problem, i.e., Top-k window aggregate queries over uncertain time series, and provides an efficient algorithm to address this problem. This query can serve as a fundamental tool to assist users in exploring and analyzing uncertain time series. Existing methods supporting this query suffer from low efficiency or require high storage space. To address these issues, this study proposes a novel two-level Top-k query method based on the sub-window stitching strategy and a method for efficiently computing the upper bound of thresholds to solve the complexity issues introduced by the sub-window stitching strategy. This method efficiently supports Top-k window aggregate queries over uncertain time series with less pre-computed storage space. The effectiveness and efficiency of the proposed method are evaluated on both real and synthetic datasets. The results demonstrate that the proposed method significantly reduces the storage space for pre-computed lists compared with Top-k query methods based on TA, overcoming challenges that hinder practical application. The average query efficiency of the proposed method and its further optimization using the upper bound of the thresholds are 7.27 times and 20.04 times better than those of the traversal method FSEC-S, respectively.

Key words: uncertain time series, Top-k query, window, aggregation query, sorted list, threshold