Research on HBase Compaction Mechanism Based on Data Redundancy (基于数据冗余的HBase合并机制研究)

doi:10.3969/j.issn.1000-3428.2017.02.011

Computer Engineering (计算机工程)

XIONG Anping (熊安萍), WANG Yunping (王运萍), ZOU Yang (邹洋)

(School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Received: 2015-12-29 Online: 2017-02-15 Published: 2017-02-15
About the authors: XIONG Anping (born 1970), female, professor, Ph.D., main research interests: high-performance computing and information security; WANG Yunping, master's student; ZOU Yang, lecturer, M.S.
Funding:
Science and Technology Research Project of the Chongqing Municipal Education Commission (KJ1400414); Doctoral Start-up Fund of Chongqing University of Posts and Telecommunications (A2015-17); Natural Science Foundation of Chongqing University of Posts and Telecommunications (A2011-29).

Abstract

Abstract: In the HBase column-oriented database, every operation is written by appending data. As a result, the compaction mechanism consumes excessive system resources and degrades read performance. To address this problem, a compaction mechanism based on data redundancy is proposed: files under a column family are compacted once their proportion of deleted data reaches a set threshold, which reduces the space occupied by useless data. Experimental results show that, compared with the original HBase compaction mechanism, which considers only file size, file count, and time interval, the proposed mechanism improves HBase query efficiency and Major compaction performance.

Key words: column-oriented database, storage, HBase compaction mechanism, CPU utilization, read performance
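The selection rule described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `StoreFileStats`, `select_for_compaction`, the 0.3 threshold, and the choice to measure deleted data as a fraction of cells are hypothetical and are not taken from the paper or from the HBase API.

```python
from dataclasses import dataclass


@dataclass
class StoreFileStats:
    """Hypothetical per-file statistics for one column family (not an HBase class)."""
    name: str
    total_cells: int
    deleted_cells: int  # assumed: cells masked by delete markers

    @property
    def deleted_ratio(self) -> float:
        # Guard against empty files so the ratio is always defined.
        return self.deleted_cells / self.total_cells if self.total_cells else 0.0


def select_for_compaction(files, threshold=0.3):
    """Return the files whose deleted-data ratio reaches the threshold."""
    return [f for f in files if f.deleted_ratio >= threshold]


files = [
    StoreFileStats("hfile-a", total_cells=1000, deleted_cells=450),  # ratio 0.45
    StoreFileStats("hfile-b", total_cells=1000, deleted_cells=50),   # ratio 0.05
    StoreFileStats("hfile-c", total_cells=200, deleted_cells=80),    # ratio 0.40
]

selected = select_for_compaction(files, threshold=0.3)
print([f.name for f in selected])  # prints ['hfile-a', 'hfile-c']
```

Under these assumptions, only files carrying a large share of deleted data are picked, whereas a policy based purely on file size, file count, and elapsed time would not distinguish them.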

CLC number: TP311

XIONG Anping, WANG Yunping, ZOU Yang. Research on HBase Compaction Mechanism Based on Data Redundancy[J]. Computer Engineering.

https://www.ecice06.com/CN/Y2017/V43/I2/63

