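Abstract: All operations in the HBase column-oriented database are written by appending data, so its compaction mechanism consumes excessive resources and degrades read performance. To address this problem, a compaction mechanism based on data redundancy is proposed: files in a column family whose proportion of deleted data reaches a set threshold are compacted, reducing the space that useless data occupies in the system. Experimental results show that, compared with HBase's original compaction mechanism, which considers only file size, file count, and time interval, the improved mechanism increases HBase query efficiency and Major compaction performance.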
Keywords: column-oriented database, storage, HBase compaction mechanism, CPU utilization, read performance
Abstract: In HBase, all operations are written to the database by appending data. As a result, the HBase Compaction mechanism occupies a large share of system resources, which degrades read performance. To solve this problem, a Compaction mechanism based on data redundancy is proposed. By compacting the files in a column family whose ratio of deleted data reaches a set threshold, the algorithm reduces space occupation, because it reduces the number of files while cleaning out useless data. Experimental results show that, compared with the original HBase Compaction mechanism, which considers only the size and number of files and the time interval, the proposed mechanism improves HBase query efficiency and Major compaction performance.
Key words: column database, storage, HBase Compaction mechanism, CPU utilization rate, read performance
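The selection rule summarized in the abstract (compact the files in a column family whose share of deleted data reaches a threshold) might be sketched as follows. This is a minimal illustration, not the authors' implementation and not an HBase API: the `StoreFile` class, its cell counters, the `select_for_compaction` function, and the threshold value of 0.3 are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoreFile:
    """Hypothetical stand-in for one file in a column family (not an HBase class)."""
    name: str
    total_cells: int      # all cells stored in the file
    deleted_cells: int    # cells covered by delete markers (assumed to be tracked)

    @property
    def deleted_ratio(self) -> float:
        # Share of the file that is useless (deleted) data; empty files count as 0.
        if self.total_cells == 0:
            return 0.0
        return self.deleted_cells / self.total_cells


def select_for_compaction(files: List[StoreFile], threshold: float) -> List[StoreFile]:
    """Return the files whose deleted-data ratio reaches the threshold."""
    return [f for f in files if f.deleted_ratio >= threshold]


# Illustrative data only; the threshold 0.3 is an arbitrary assumption.
files = [
    StoreFile("hfile-a", total_cells=1000, deleted_cells=50),
    StoreFile("hfile-b", total_cells=1000, deleted_cells=400),
    StoreFile("hfile-c", total_cells=200, deleted_cells=80),
]
selected = select_for_compaction(files, threshold=0.3)
print([f.name for f in selected])
```

Under these assumed counts, only the files carrying a large share of deleted cells are chosen, which is the contrast the abstract draws with a policy driven solely by file size, file count, and time interval.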
CLC number:
XIONG Anping, WANG Yunping, ZOU Yang. Research on HBase Compaction Mechanism Based on Data Redundancy (基于数据冗余的HBase合并机制研究)[J]. Computer Engineering (计算机工程).