Trajectory Clustering Algorithm Based on Group and Density

doi:10.19678/j.issn.1000-3428.0057425

Abstract

Abstract: Existing density-based clustering methods are designed mainly for point data and are not suitable for large-scale trajectory data. To address this problem, this paper proposes a trajectory clustering algorithm based on groups and density. Following the Minimum Description Length (MDL) principle, trajectories are first partitioned into sub-trajectory segments with similar characteristics. The trajectory dataset is then traversed twice to obtain a set of groups built on these sub-trajectory segments, and group search replaces distance calculation to reduce the cost of searching for neighborhood object sets during clustering. Finally, the trajectory dataset is clustered by combining groups and density. Experimental results on the Atlantic hurricane track dataset show that, compared with the density-based TRACLUS trajectory clustering algorithm, the proposed algorithm runs faster and produces more accurate clustering results. Running time is reduced by 73.79% on the small dataset and by 84.19% on the large dataset, and the reduction grows as the trajectory dataset becomes larger.

Key words: group, density, group reachability, neighborhood search, trajectory clustering
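
The idea of narrowing the neighborhood search with groups can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes 2D points (for example, segment midpoints) under Euclidean distance, a simple leader-style grouping, and hypothetical function names `build_groups` and `neighbors`. Each point joins the first group whose leader lies within `eps`; by the triangle inequality, any point within `eps` of a query must belong to a group whose leader lies within `2 * eps` of that query, so all other groups can be skipped.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2D points (an assumed stand-in metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def build_groups(points, eps):
    """Leader-style grouping: a point joins the first group whose leader is
    within eps; otherwise it becomes the leader of a new group."""
    groups = []  # list of (leader_index, member_indices)
    for i, p in enumerate(points):
        for leader, members in groups:
            if dist(points[leader], p) <= eps:
                members.append(i)
                break
        else:
            groups.append((i, [i]))
    return groups


def neighbors(q, points, groups, eps):
    """Indices within eps of points[q], scanning only groups whose leader is
    within 2 * eps of the query (triangle-inequality pruning)."""
    found = []
    for leader, members in groups:
        if dist(points[q], points[leader]) <= 2 * eps:
            found.extend(j for j in members if dist(points[q], points[j]) <= eps)
    return sorted(found)


def brute_force_neighbors(q, points, eps):
    """Reference result: compare the query against every point."""
    return [j for j, p in enumerate(points) if dist(points[q], p) <= eps]


points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (5, 5)]
groups = build_groups(points, eps=1.5)
print(groups)
print(neighbors(0, points, groups, eps=1.5))
```

In this sketch the pruned search returns the same neighborhoods as the brute-force scan while skipping groups that are far from the query; the saving grows with the number of distant groups.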

CLC number: TP391

YU Qingying, ZHAO Yajun, YE Zitong, HU Fan, XIA Yun. Trajectory Clustering Algorithm Based on Group and Density[J]. Computer Engineering, 2021, 47(4): 100-107.

https://www.ecice06.com/CN/Y2021/V47/I4/100

Figures/Tables: 12

