Top-k Window Aggregation Query Method for Uncertain Time Series

doi:10.19678/j.issn.1000-3428.0069290

Abstract

Abstract:

The analysis and mining of uncertain time series data has attracted growing attention in industry. Top-k queries, an active research topic in the database field, aim to retrieve from large-scale data the k results that best match a user's query conditions. Although Top-k queries have been widely applied in other fields, research on Top-k queries for uncertain time series remains limited, even though such queries can help users extract important information from uncertain time series. This study proposes a new Top-k query problem, the Top-k window aggregation query over uncertain time series, and provides an efficient method for answering it. The query can serve as a basic tool that helps users explore and analyze uncertain time series data. Existing methods that can support this query suffer from either low query efficiency or high storage requirements. To address this, the study proposes a two-level Top-k query method based on a sub-window stitching strategy, together with an efficient method for computing an upper bound of the threshold, which resolves the complex threshold computation that the sub-window stitching strategy introduces. The method supports Top-k window aggregation queries over uncertain time series efficiently while using less pre-computed storage space. Its effectiveness is evaluated on both real and synthetic datasets. The results show that, compared with TA-based Top-k query methods, the proposed method significantly reduces the storage space needed for pre-computed lists, removing an obstacle to practical application. Compared with the traversal-based method FSEC-S, the proposed method and its variant optimized with the threshold upper bound improve average query efficiency by factors of 7.27 and 20.04, respectively.

Key words: uncertain time series, Top-k query, window, aggregation query, sorted list, threshold
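The abstract compares against TA-based Top-k query methods that work over pre-computed sorted lists. As background only, the following is a minimal sketch of the classic Threshold Algorithm (TA) of Fagin et al. for a sum aggregate. It is not the paper's two-level method or its sub-window stitching strategy. The function name, the dictionary-based list layout, the use of summation as the aggregate, and the assumption that every object appears in every list are all illustrative assumptions.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_algorithm(lists: List[Dict[str, float]], k: int) -> List[Tuple[str, float]]:
    """Generic TA sketch: top-k objects by summed score across all lists.

    Assumption (not from the paper): every object has a score in every list,
    and the aggregate is a plain sum, which is monotone.
    """
    # Sorted access order: each list sorted by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: -kv[1]) for l in lists]
    depth = len(sorted_lists[0])
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap of (total score, object id)

    for row in range(depth):
        threshold = 0.0
        for s in sorted_lists:
            obj, score = s[row]
            threshold += score  # sum of the scores last seen under sorted access
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: look up the object's score in every list.
            total = sum(l[obj] for l in lists)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # Stop once the k-th best total is at least the threshold:
        # no unseen object can score higher than the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break

    return sorted(((obj, total) for total, obj in top), key=lambda p: -p[1])


example_lists = [
    {"a": 9.0, "b": 8.0, "c": 1.0},
    {"a": 2.0, "b": 7.0, "c": 6.0},
]
result = threshold_algorithm(example_lists, k=2)
```

In this sketch the early-termination test is what makes sorted lists useful: the scan can stop before reading every list to the end, at the cost of storing one sorted list per scoring dimension, which is the storage overhead the abstract refers to.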

ZHANG Hang, XIONG Haoran, HE Zhenying. Top-k Window Aggregation Query Method for Uncertain Time Series[J]. Computer Engineering, 2025, 51(7): 161-170.

https://www.ecice06.com/CN/Y2025/V51/I7/161

Figures/Tables (9)

Fig.1 Meteorological stations near New York City, USA

Fig.2 An example of uncertain time series

Fig.3 Pre-computed lists and query example based on the TA algorithm

Fig.4 Pre-computed lists and query example based on the HTA algorithm

Fig.5 Sorted lists of sub-window aggregates for time series instance T_i

Fig.6 Comparison of the time and storage space needed to pre-compute sorted lists for the TA and HTA algorithms

Fig.7 Query performance under different values of k

Fig.8 Query performance under different window lengths

Fig.9 Query performance under different numbers of time series instances

