Abstract: The Auto-Sharding mechanism in MongoDB triggers shard migration based solely on data volume, which can leave the load unbalanced across shards. To address this problem, this paper proposes an optimized Auto-Sharding mechanism based on the hot/cold access characteristics of data. A naive Bayes classifier labels data as hot or cold according to its access characteristics. The proportion of hot data within a shard is taken as that shard's heat load, which determines when data migration is triggered. A new migration strategy is then built on the heat-load differences between shards. Experimental results show that, under high concurrency, the optimized mechanism achieves higher data throughput than the original Auto-Sharding mechanism.
Key words:
Auto-Sharding mechanism,
cold and hot data,
naive Bayes,
heat load,
data migration
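The abstract outlines three steps: classify data as hot or cold with naive Bayes, compute each shard's heat load as its fraction of hot data, and trigger migration when shards' heat loads diverge. The following Python sketch illustrates these steps under stated assumptions. It is not the paper's implementation. Several elements are hypothetical: the two per-record features (recent access count and seconds since last access), the Gaussian form of the naive Bayes likelihood, the labelled training data, and the `threshold` value. The abstract does not describe the paper's actual feature set or which data are moved.

```python
import math


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: features are treated as independent given the class."""

    def fit(self, X, y):
        self.priors, self.stats = {}, {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.priors[label] = len(rows) / len(y)
            self.stats[label] = []
            for col in zip(*rows):  # one column per feature
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-6  # avoid zero variance
                self.stats[label].append((mean, var))
        return self

    def _log_posterior(self, x, label):
        # log P(label) + sum_i log N(x_i; mean_i, var_i), up to a constant shared by all labels
        score = math.log(self.priors[label])
        for v, (mean, var) in zip(x, self.stats[label]):
            score -= 0.5 * math.log(2 * math.pi * var) + (v - mean) ** 2 / (2 * var)
        return score

    def predict(self, x):
        return max(self.priors, key=lambda label: self._log_posterior(x, label))


def heat_load(records, clf):
    """Heat load of a shard: the fraction of its records classified as 'hot'."""
    if not records:
        return 0.0
    return sum(1 for r in records if clf.predict(r) == "hot") / len(records)


def plan_migration(shards, clf, threshold=0.2):
    """Return (source, target) shard names if the heat-load gap exceeds `threshold`, else None.

    `threshold` is a hypothetical tuning parameter, not a value taken from the paper.
    """
    loads = {name: heat_load(recs, clf) for name, recs in shards.items()}
    hottest = max(loads, key=loads.get)
    coolest = min(loads, key=loads.get)
    if loads[hottest] - loads[coolest] > threshold:
        return hottest, coolest
    return None


if __name__ == "__main__":
    # Hypothetical features per record: (accesses in recent window, seconds since last access)
    X = [(50, 5), (40, 10), (60, 2), (1, 3000), (2, 2500), (0, 4000)]
    y = ["hot", "hot", "hot", "cold", "cold", "cold"]
    clf = GaussianNaiveBayes().fit(X, y)
    shards = {
        "shard-a": [(55, 4), (48, 6), (52, 3), (2, 3200)],
        "shard-b": [(1, 2800), (0, 3900), (2, 3100), (47, 7)],
    }
    print(plan_migration(shards, clf))  # ('shard-a', 'shard-b')
```

In this toy data, `shard-a` has a heat load of 0.75 and `shard-b` has 0.25. The gap exceeds the threshold, so hot data would be moved from `shard-a` toward `shard-b`. This differs from a purely volume-based balancer, which would see two shards of equal size and do nothing.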
CLC number:
FENG Chaozheng, JIANG Yi, HE Jun, MA Xiangjun. Auto-Sharding Mechanism in MongoDB Based on Cold and Hot Data[J]. Computer Engineering.