Computer Engineering (计算机工程)

• Advanced Computing and Data Processing •

HBase-based Storage and Analysis Platform for High Energy Physics Data

LEI Xiaofeng 1,2, LI Qiang 1,2, SUN Gongxing 1

  1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China
  2. University of Chinese Academy of Sciences, Beijing 100049, China
  • Received: 2014-07-17 Published: 2015-06-15 Online: 2015-06-15
  • About the authors: LEI Xiaofeng (born 1987), female, Ph.D. candidate, main research interest: distributed computing; LI Qiang, Ph.D. candidate; SUN Gongxing, research fellow.
  • Funding:

    National Natural Science Foundation of China (11375223, 11375221); National Natural Science Foundation of China A3 Foresight Program (61161140454); Joint Fund of the National Natural Science Foundation of China and the Chinese Academy of Sciences for Large-Scale Scientific Facilities (11179020).

Abstract:

High energy physics colliders produce tens of billions of physics events over their lifetime, while a physics analysis selects only a few thousand meaningful events from them. The analysis is therefore a typical big data processing and data mining application, and it is important to design efficient data structures and storage and access mechanisms so that meaningful events can be selected quickly. This paper introduces the data structure of events and the storage and processing technologies in common use, analyzes the characteristics of high energy physics data, and proposes a new data storage and processing system for high energy physics based on HBase, ROOT, BEAN, and MapReduce. The system uses HBase to store data and MapReduce to implement parallel processing, and adopts ROOT and BEAN as the high energy physics analysis frameworks. The paper also describes the concrete design and implementation of the platform. Test results show that, compared with the traditional high energy physics data storage system, the new system processes data faster and, when the preselection service takes effect, uses I/O and CPU resources more effectively.

Key words: high energy physics data, big data, HBase database, ROOT framework, BEAN framework, MapReduce framework
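To make the preselection idea in the abstract concrete, the following is a minimal sketch in plain Python, not the platform's implementation. It assumes that each stored event carries a few small summary columns next to a large payload, so that a cut can be evaluated without reading the payload. The table layout, the column names (`run`, `n_tracks`, `total_energy`, `payload`), the row-key format, and the cut thresholds are all hypothetical, and an in-memory dictionary stands in for the HBase table.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Iterator, Tuple

# Hypothetical event record: small summary columns plus a large payload.
Event = Dict[str, object]
Row = Tuple[str, Event]

# In-memory stand-in for an event table keyed by a sortable row key.
TABLE: Dict[str, Event] = {
    "run001-evt0001": {"run": 1, "n_tracks": 2, "total_energy": 3.1, "payload": b"raw"},
    "run001-evt0002": {"run": 1, "n_tracks": 1, "total_energy": 3.5, "payload": b"raw"},
    "run002-evt0001": {"run": 2, "n_tracks": 4, "total_energy": 2.0, "payload": b"raw"},
    "run002-evt0002": {"run": 2, "n_tracks": 3, "total_energy": 3.7, "payload": b"raw"},
    "run002-evt0003": {"run": 2, "n_tracks": 2, "total_energy": 3.2, "payload": b"raw"},
}


def passes_cut(event: Event) -> bool:
    """Hypothetical cut that reads only the summary columns."""
    return event["n_tracks"] >= 2 and event["total_energy"] > 3.0


def preselect(rows: Iterable[Row], cut: Callable[[Event], bool]) -> Iterator[Row]:
    """Yield only the rows whose summary columns pass the cut."""
    for key, event in rows:
        if cut(event):
            yield key, event


def map_event(key: str, event: Event) -> Iterator[Tuple[int, int]]:
    """Map step: emit (run number, 1) for each selected event."""
    yield event["run"], 1


def reduce_counts(pairs: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Reduce step: sum the emitted counts per run number."""
    totals: Counter = Counter()
    for run, count in pairs:
        totals[run] += count
    return dict(totals)


def run_job(table: Dict[str, Event], cut: Callable[[Event], bool]) -> Dict[int, int]:
    """Scan rows in key order, preselect, then apply the map and reduce steps."""
    selected = preselect(sorted(table.items()), cut)
    pairs = (pair for key, event in selected for pair in map_event(key, event))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(run_job(TABLE, passes_cut))
```

In this sketch the map step only ever sees events that survived the cut, which is the sense in which preselection can reduce the I/O and CPU spent on uninteresting events. In a real deployment the cut would more plausibly be pushed into the storage scan itself, which this example does not attempt to model.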

CLC number: