Time-Series Semantic Mining Algorithm Based on Sub-Series Similarity

doi:10.19678/j.issn.1000-3428.0062832

Abstract: A time series is a sequence of values obtained by measuring an object or system continuously at a fixed interval. Mining the latent semantic information in a time series reveals the regularities of a system or its sudden anomalies, which can guide practice and analysis. However, most current time-series semantic mining algorithms place constraints on the characteristics of the data, which makes it difficult for them to handle large volumes of time-series data with differing characteristics. To address this problem, a time-series semantic mining algorithm based on sub-series similarity is proposed. First, by calculating the similarity of sub-series, the algorithm partitions the time series into a sequence of segments for two-level clustering and identifies the underlying physical states in the time series. Second, the algorithm introduces a probability-based iterative mode that dynamically adjusts the probability of each sub-series being selected as a reference sub-series according to the candidate segmentation, so that the reference sub-series cover all physical states. Experimental results show that the recognition accuracy of the algorithm exceeds 90% on each of five real data sets, including PAMAP and Barbet. Compared with the FLUSS, pHMM, and AutoPlait algorithms, the proposed algorithm achieves higher recognition accuracy and operating efficiency and is more versatile.

Key words: time-series, semantic mining, similarity measurement, clustering, k-Nearest Neighbor (kNN)
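The first step described in the abstract, segmenting a time series by sub-series similarity, can be illustrated with a minimal sketch. It is not the authors' implementation: the z-normalized Euclidean distance, the fixed non-overlapping window width, the hand-picked reference sub-series, and the function names `znorm`, `distance`, `label_windows`, and `segments` are all assumptions made for illustration. The two-level clustering and the probability-based selection of reference sub-series are not shown.

```python
import math


def znorm(window):
    """Z-normalize a window so similarity reflects shape, not offset or scale."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(v - mean) / std for v in window]


def distance(a, b):
    """Euclidean distance between z-normalized sub-series (an assumed measure)."""
    za, zb = znorm(a), znorm(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


def label_windows(series, width, references):
    """Label each non-overlapping window with the index of its nearest reference."""
    labels = []
    for start in range(0, len(series) - width + 1, width):
        window = series[start:start + width]
        dists = [distance(window, ref) for ref in references]
        labels.append(dists.index(min(dists)))
    return labels


def segments(labels):
    """Merge runs of equal labels into (first_window, last_window, label) tuples."""
    out = []
    for i, lab in enumerate(labels):
        if out and out[-1][2] == lab:
            out[-1] = (out[-1][0], i, lab)
        else:
            out.append((i, i, lab))
    return out


# Toy series with two regimes: a sine wave followed by a square wave.
sine = [math.sin(2 * math.pi * i / 8) for i in range(32)]
square = [1.0 if i % 8 < 4 else -1.0 for i in range(32)]
series = sine + square

# Hypothetical reference sub-series, one taken from each regime.
references = [series[0:8], series[32:40]]

labels = label_windows(series, 8, references)
segs = segments(labels)
print(segs)
```

In this toy setup each window is matched to the reference with the same shape, so the series splits into one segment per regime. Real data would need overlapping windows and references that are chosen automatically, which is the role the abstract assigns to the probability-based iterative mode.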


CLC Number: TP391

LU Yi, WANG Peng, WANG Wei. Time-Series Semantic Mining Algorithm Based on Sub-Series Similarity[J]. Computer Engineering, 2022, 48(10): 88-94.



URL: http://www.ecice06.com/EN/10.19678/j.issn.1000-3428.0062832

http://www.ecice06.com/EN/Y2022/V48/I10/88


