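Abstract:
High energy physics colliders produce tens of billions of physics events, while a physics analysis selects only a few thousand meaningful events from them, which makes the analysis a typical big data processing and data mining application. It is therefore important to design efficient data structures and storage and access mechanisms so that meaningful events can be picked out quickly. This paper introduces the data structure, storage, and processing techniques for events, analyzes the characteristics of high energy physics data, and proposes a new storage and processing system for high energy physics data built on HBase, ROOT, BEAN, and MapReduce. The system stores data in HBase, uses MapReduce for parallel processing, and adopts ROOT and BEAN as the high energy physics analysis frameworks; a concrete design and implementation are given. Test results show that, compared with a traditional high energy physics data storage system, the system processes data faster and, when the pre-selection service is in effect, uses I/O and CPU resources more effectively.
Keywords:
high energy physics data,
big data,
HBase database,
ROOT framework,
BEAN framework,
MapReduce framework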
Abstract:
A high energy collider produces tens of billions of events over its lifetime. Physics analysis selects a few thousand meaningful events from them, making it a typical big data processing and data mining application. It is therefore important to design an efficient data structure and efficient storage and access mechanisms so that the meaningful events can be selected quickly. This paper introduces the event data structure and the storage and processing technologies in common use, analyzes the features of high energy physics data, and proposes a new data storage and processing technology for high energy physics. The proposed system uses HBase to store data, uses MapReduce to implement parallel processing, and selects ROOT and BEAN as the high energy physics analysis frameworks. The paper also describes the specific design and implementation of the new platform. Test results show that, compared with a traditional high energy physics data storage system, the system processes data faster and uses I/O and CPU resources more effectively when preselection is in effect.
Key words:
high energy physics data,
big data,
HBase database,
ROOT framework,
BEAN framework,
MapReduce framework
CLC number:
雷晓凤,李强,孙功星. 基于HBase 的高能物理数据存储及分析平台[J]. 计算机工程.
LEI Xiaofeng, LI Qiang, SUN Gongxing. HBase-based Storage and Analysis Platform for High Energy Physics Data[J]. Computer Engineering.