作者投稿和查稿 主编审稿 专家审稿 编委审稿 远程编辑

计算机工程 ›› 2021, Vol. 47 ›› Issue (3): 62-70. doi: 10.19678/j.issn.1000-3428.0057108

• 人工智能与模式识别 • 上一篇    下一篇

面向轨迹流数据的索引构建与存储方法研究

蔡瑞初1, 林峰极1, 郝志峰1,2, 王立3, 温雯1   

  1. 1. 广东工业大学 计算机学院, 广州 510006;
    2. 佛山科学技术学院 数学与大数据学院, 广东 佛山 528000;
    3. 依图网络科技有限公司 新加坡研发部, 新加坡 018960
  • 收稿日期:2020-01-03 修回日期:2020-03-02 发布日期:2020-03-06
  • 作者简介:蔡瑞初(1983-),男,教授、博士,主研方向为数据挖掘、高性能计算;林峰极(通信作者),硕士研究生;郝志峰,教授、博士;王立,高级工程师、博士;温雯,教授、博士。
  • 基金资助:
    国家自然科学基金(61100148,61876043);国家自然科学基金-广东省联合基金(U1501254);广东省自然科学基金(2014A030306004,2014A030308008);广东省特支计划(2015TQ01X140);广州市科技计划(201902010058);广州市珠江科技新星专项(201610010101)。

Research on Index Construction and Storage Method for Trajectory Stream Data

CAI Ruichu1, LIN Fengji1, HAO Zhifeng1,2, WANG Li3, WEN Wen1   

  1. 1. School of Computer, Guangdong University of Technology, Guangzhou 510006, China;
    2. School of Mathematics and Big Data, Foshan University, Foshan, Guangdong 528000, China;
    3. Singapore Research and Development Department, Yitu Network Technology Co., Ltd., Singapore 018960, Singapore
  • Received:2020-01-03 Revised:2020-03-02 Published:2020-03-06

摘要: 移动社交网络等基于定位服务应用的快速发展导致时空数据流规模呈爆炸式增长,要求底层数据存储系统支持高吞吐量轨迹数据的插入以及空间和时间约束下的低延迟查询,而现有HBase等数据存储方案因索引更新开销过高无法满足该需求。针对时空数据流的应用特性,提出一种数据流内存索引及存储方法。根据键值和时间范围对历史与增量数据元组进行物理分区,将其以模板B+树的形式写入内存并构建索引以增强快速写入和查询能力,同时对数据进行压缩存储提升索引效率。在此基础上,采用多级索引根据数据分区将复杂查询分解为可独立处理的子查询。实验结果表明,与传统HBase、WaterWheel等方法相比,该方法在不同数据插入和查询条件下的数据存储性能与查询效率更优。

关键词: 轨迹流数据, 数据分区, 存储, 多级索引, Bloom过滤器

Abstract: The rapid development of location-based services and applications such as mobile social networks leads to the explosive growth of the scale of spatiotemporal data stream,which requires the underlying data storage system to support high-throughput trajectory data insertion and low-latency query under space and time constraints.However,the existing data storage schemes such as HBase fail to meet the requirements due to the high index update overhead. According to the application characteristics of spatiotemporal data stream,this paper proposes a data stream memory index and storage method. According to the key value and time range,the tuple of historical and incremental data is physically partitioned, which is written into memory in the form of template B+ tree,and an index is built to enhance the ability of fast writing and query. At the same time,the data is compressed to improve the index efficiency.On this basis,the multi-level index is adopted to decompose the complex query into sub-queries that can be independently processed according to the data partition.Experimental results show that under different data insertion and query conditions,the proposed method outperforms traditional HBase,WaterWheel and other methods in terms of data storage performance and query efficiency.

Key words: trajectory stream data, data partition, storage, multilevel index, Bloom filter

中图分类号: