
Computer Engineering

• Advanced Computing and Data Processing •

Auto-Sharding Mechanism in MongoDB Based on Cold and Hot Data

FENG Chaozheng 1, JIANG Yi 1, HE Jun 1,2, MA Xiangjun 3

  1. School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China; 2. Chongqing ZTE Research Institute, Chongqing 401121, China; 3. Chongqing Municipal Public Security Bureau Network Security Corps, Chongqing 401147, China
  • Received: 2016-02-18 Online: 2017-03-15 Published: 2017-03-15
  • About the authors: FENG Chaozheng (born 1989), male, master's student, whose main research interest is massive information processing; JIANG Yi, professor-level senior engineer; HE Jun and MA Xiangjun, senior engineers.
  • Funding:
    Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ1400414); 2012 Special Project for Internet of Things Development of the Ministry of Industry and Information Technology (2-5); Doctoral Start-up Fund of Chongqing University of Posts and Telecommunications (A2015-17).

Abstract: The Auto-Sharding mechanism in MongoDB triggers shard migration based only on data volume, which leads to load imbalance. To address this problem, this paper proposes an optimized Auto-Sharding mechanism based on the hot and cold access characteristics of data. It uses the naive Bayes algorithm to classify data as hot or cold according to its access characteristics, takes the proportion of hot data in a data block as the heat load that determines when data migration occurs, and establishes a new data migration strategy based on the heat load differences between data blocks. Experimental results show that, under high concurrency, the data throughput of the optimized mechanism is higher than that of the original Auto-Sharding mechanism.

Key words: Auto-Sharding mechanism, cold and hot data, naive Bayes, heat load, data migration
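
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two discretized features (access frequency and recency), the training rows, the block names, and the fixed `threshold` used to decide when to migrate are all hypothetical assumptions made for illustration, and the small categorical naive Bayes classifier stands in for whatever feature set and model details the paper actually uses.

```python
from collections import Counter
from math import log


class TinyNaiveBayes:
    """Categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # counts[c][i][v] = how often feature i took value v within class c
        self.counts = {c: [Counter() for _ in range(n_features)]
                       for c in self.classes}
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[label][i][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c in self.classes:
            # log prior plus sum of smoothed log likelihoods
            score = log(self.class_counts[c] / total)
            for i, v in enumerate(row):
                num = self.counts[c][i][v] + 1
                den = self.class_counts[c] + len(self.values[i])
                score += log(num / den)
            if score > best_score:
                best, best_score = c, score
        return best


def heat_load(block_rows, model):
    """Proportion of records in a data block that are classified as hot."""
    hot = sum(1 for row in block_rows if model.predict(row) == "hot")
    return hot / len(block_rows)


def pick_migration(heat_loads, threshold):
    """Return (source, target) when the heat load gap exceeds the
    hypothetical threshold, otherwise None."""
    source = max(heat_loads, key=heat_loads.get)
    target = min(heat_loads, key=heat_loads.get)
    if heat_loads[source] - heat_loads[target] > threshold:
        return source, target
    return None


# Hypothetical training data: (access frequency bucket, recency bucket)
train_rows = [("high", "recent"), ("high", "old"), ("low", "recent"),
              ("low", "old"), ("high", "recent"), ("low", "old")]
train_labels = ["hot", "hot", "cold", "cold", "hot", "cold"]
model = TinyNaiveBayes().fit(train_rows, train_labels)

# Hypothetical data blocks holding records described by the same features
blocks = {
    "block_a": [("high", "recent"), ("high", "old"),
                ("high", "recent"), ("low", "old")],
    "block_b": [("low", "old"), ("low", "recent"),
                ("low", "old"), ("high", "recent")],
}
heat_loads = {name: heat_load(rows, model) for name, rows in blocks.items()}
decision = pick_migration(heat_loads, threshold=0.3)
```

In this toy data the two blocks hold the same number of records, so a purely volume-based balancer would see nothing to do, while the heat load values differ enough that the sketch proposes moving hot data from `block_a` to `block_b`.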
