Computer Engineering

Design and Implementation of HBase-based Data Fully Localization Analysis Platform

LEI Xiaofeng (1,2), LI Qiang (1,2), SUN Zhenyu (1,2), SUN Gongxing (1)

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2015-06-29   Online: 2016-06-15   Published: 2016-06-15

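  • About the authors: LEI Xiaofeng (born 1987), female, PhD candidate, main research area: distributed computing; LI Qiang and SUN Zhenyu, PhD candidates; SUN Gongxing, research professor.
  • Funding:
    National Natural Science Foundation of China (11375223, 11375221); NSFC-CAS Joint Fund for Research Based on Large-Scale Scientific Facilities (11179020).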

Abstract: To make full use of I/O resources and improve data analysis efficiency, this paper takes into account the characteristics of the data analysis procedure and of data storage, develops new C++ interfaces for accessing HBase through the Java Native Interface (JNI), and builds on them a fully data-localized analysis platform. It also redesigns and reimplements the relevant MapReduce algorithms and software components, and optimizes the allocation and combination of Mapper tasks to improve CPU utilization. In addition, it provides a new user-friendly interface that integrates the data analysis environment, the job management system and the ROOT graphics module. Test results show that the new platform is faster and more scalable than a traditional data analysis system based on file storage.

Key words: data localization, MapReduce model, HBase database, Java Native Interface (JNI), Cairngorm framework, Django framework

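Abstract (Chinese edition): To make full use of I/O resources and improve data analysis efficiency, and in view of the characteristics of the high energy physics data analysis process and of data storage, this paper uses Java Native Interface technology to propose a fully data-localized analysis platform built on a C++ access interface to HBase. It designs the related algorithms and components of the MapReduce model and improves CPU utilization through optimized allocation and combination of Mapper tasks. By integrating the high energy physics data analysis environment, the job management system, the ROOT plotting module and other components, it implements an entirely new Web user interface that simplifies user operations. Test results show that, compared with traditional data analysis systems based on file storage, the platform analyzes data faster and scales better.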

