
Computer Engineering ›› 2021, Vol. 47 ›› Issue (4): 100-107. doi: 10.19678/j.issn.1000-3428.0057425

• Artificial Intelligence and Pattern Recognition •

Trajectory Clustering Algorithm Based on Group and Density

YU Qingying1,2, ZHAO Yajun1,2, YE Zitong1,2, HU Fan1,2, XIA Yun1,2

  1. School of Computer and Information, Anhui Normal University, Wuhu, Anhui 241002, China;
  2. Anhui Provincial Key Laboratory of Network and Information Security, Anhui Normal University, Wuhu, Anhui 241002, China
  • Received: 2020-02-19 Revised: 2020-03-24 Published: 2020-04-01
  • About the authors: YU Qingying (born 1980), female, associate professor, Ph.D., main research interests: spatial data processing and information security; ZHAO Yajun, undergraduate student; YE Zitong and HU Fan, master's students; XIA Yun, lecturer, M.S.
  • Funding:
    National Natural Science Foundation of China (61702010, 61972439).

Abstract: Existing density-based clustering methods are designed mainly for point data and are not suitable for large-scale trajectory data. To address this problem, this paper proposes a trajectory clustering algorithm based on groups and density. Trajectories are first partitioned according to the Minimum Description Length (MDL) principle to find sub-trajectory segments with similar characteristics. A set of groups over these sub-trajectory segments is then obtained by traversing the trajectory dataset twice, and group search replaces distance calculation to reduce the cost of searching for neighborhood object sets during clustering. Finally, the trajectory dataset is clustered by combining groups and density. Experimental results on the Atlantic hurricane track dataset show that, compared with the density-based TRACLUS trajectory clustering algorithm, the proposed algorithm runs faster and produces more accurate clusters. Its running time is reduced by 73.79% on the small dataset and by 84.19% on the large dataset, and the reduction in running time grows as the trajectory dataset becomes larger.

Key words: group, density, group reachability, neighborhood search, trajectory clustering
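The abstract does not give the group construction in detail, so the following is only a minimal sketch of the general idea of using groups to narrow a neighborhood search. It rests on several assumptions that are not taken from the paper: each sub-trajectory segment is stood in for by a 2-D point, Euclidean distance replaces the segment distance, and the names `build_groups`, `reachable_groups`, and `neighbors` are hypothetical. Because every member lies within `eps` of its group representative, the triangle inequality implies that any two objects within `eps` of each other belong to groups whose representatives are at most `3 * eps` apart, so only those groups need to be examined.

```python
import math


def dist(a, b):
    """Euclidean distance between two 2-D points (stand-in for a segment distance)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def build_groups(points, eps):
    """First pass: put each point into the first group whose representative
    is within eps; otherwise the point starts a new group as its representative."""
    reps = []       # index of the representative point of each group
    members = []    # member indices of each group
    group_of = []   # group id of each point
    for i, p in enumerate(points):
        for g, r in enumerate(reps):
            if dist(p, points[r]) <= eps:
                members[g].append(i)
                group_of.append(g)
                break
        else:
            reps.append(i)
            members.append([i])
            group_of.append(len(reps) - 1)
    return reps, members, group_of


def reachable_groups(points, reps, eps):
    """Second pass: for each group, list the groups whose representatives lie
    within 3 * eps (an assumed reachability bound from the triangle inequality)."""
    n = len(reps)
    return [
        [h for h in range(n) if dist(points[reps[g]], points[reps[h]]) <= 3 * eps]
        for g in range(n)
    ]


def neighbors(i, points, members, group_of, reach, eps):
    """eps-neighborhood of point i, checking only members of reachable groups."""
    candidates = (j for h in reach[group_of[i]] for j in members[h])
    return sorted(j for j in candidates if dist(points[i], points[j]) <= eps)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (1.2, 0.0), (5.0, 5.0), (5.4, 5.2), (10.0, 0.0)]
    eps = 1.0
    reps, members, group_of = build_groups(pts, eps)
    reach = reachable_groups(pts, reps, eps)
    for i in range(len(pts)):
        print(i, neighbors(i, pts, members, group_of, reach, eps))
```

In this sketch the group lookup only prunes candidates and distances are still computed inside reachable groups; the paper's own group-reachability definition may avoid more of that work. The resulting neighborhoods could then feed a density-based step in the style of DBSCAN.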

CLC number: