
Computer Engineering

• Advanced Computing and Data Processing •

Metadata Management for BESIII Distributed Computing

LIN Lei 1, SUN Yong 1, LI Wei-dong 2, DENG Zi-yan 2, ZHANG Xiao-mei 2, Nicholson Caitriana 3

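  (1. School of Computer Science and Technology, Soochow University, Suzhou 215006, Jiangsu, China; 2. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China)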
  • Received: 2013-02-04 Online: 2014-02-15 Published: 2014-02-13
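  • About the authors: LIN Lei (born 1985), female, master's student, main research interests: distributed computing and metadata management; SUN Yong, associate professor; LI Wei-dong, research fellow and doctoral supervisor; DENG Zi-yan and ZHANG Xiao-mei, associate research fellows; Nicholson Caitriana, Ph.D.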
  • Funding:
    National Basic Research Program of China ("973" Program) (2009CB825200); National Natural Science Foundation of China (11205180, 11179020, 11121092, U1232201)

Metadata Management for BESIII Distributed Computing

LIN Lei 1, SUN Yong 1, LI Wei-dong 2, DENG Zi-yan 2, ZHANG Xiao-mei 2, Nicholson Caitriana 3   

  (1. School of Computer Science and Technology, Soochow University, Suzhou 215006, China; 2. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 3. University of Chinese Academy of Sciences, Beijing 100049, China)
  • Received: 2013-02-04 Online: 2014-02-15 Published: 2014-02-13

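Abstract (translated from Chinese): The Beijing Spectrometer III (BESIII) high energy physics experiment produces experimental data at the petabyte scale, and processing and analyzing this volume of data poses a considerable challenge for computing resources. Distributed computing is a feasible way to integrate heterogeneous computing resources and relieve the shortage of computing resources. Based on the requirements of the BESIII experiment, this paper studies the metadata management needed for distributed computing, proposes a metadata model for data files, and designs and implements a metadata management system using the catalog service of the DIRAC middleware. The system uses a tree-like directory structure, dynamic construction of physical file names, and virtual datasets to organize and store various types of metadata, and it implements the mapping among query requests, logical files, and physical files. Digital certificates and the OpenSSL protocol are used to secure the system. The system is applied to experimental data analysis and processing. Test results show that with 300 concurrent users the query time is only 0.3 s, which demonstrates that the system performs well and can meet the needs of the BESIII experiment.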

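Keywords (translated from Chinese): metadata, metadata model, distributed computing, metadata management, catalog service, high energy physics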

Abstract: The Beijing Spectrometer III (BESIII) high energy physics experiment produces experimental data at the petabyte (PB) scale, which poses a major challenge for existing computing resources. Distributed computing is regarded as one of the most practical ways to relieve this bottleneck. Based on the needs of the BESIII experiment, metadata management is studied as an important component of BESIII distributed computing. A metadata model for data files is designed, and a metadata management system is implemented using the catalog service of the DIRAC middleware. The system uses a tree-like directory structure, dynamic construction of physical file names, and virtual datasets to organize and store all kinds of metadata, and it provides the mapping among query requests, logical files, and physical files. Digital certificates and the OpenSSL protocol are used to secure the system. The system has been deployed and applied to data processing and analysis. Test results show that the query time is only 0.3 s with 300 concurrent clients, so its performance meets the requirements of the BESIII experiment.

Key words: metadata, metadata model, distributed computing, metadata management, catalog service, high energy physics
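
The mapping among query requests, logical files, and physical files that the abstract describes can be pictured with a small in-memory sketch. This is not the authors' system and does not use the DIRAC API. The class `Catalog`, its methods, the storage element name `SE-A`, the URL prefix, and the directory and metadata names are all hypothetical, chosen only to illustrate three ideas: metadata attached to a directory tree, physical file names built on demand rather than stored, and a virtual dataset kept as a saved query.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy in-memory catalog. Every name here is hypothetical, not a DIRAC API."""
    dir_meta: dict = field(default_factory=dict)   # directory path -> metadata dict
    files: dict = field(default_factory=dict)      # logical file name -> storage elements
    se_prefix: dict = field(default_factory=dict)  # storage element -> base URL
    datasets: dict = field(default_factory=dict)   # dataset name -> saved query

    def set_dir_meta(self, path, **meta):
        # Attach metadata to a directory; files below it inherit these values.
        self.dir_meta.setdefault(path.rstrip("/"), {}).update(meta)

    def add_file(self, lfn, se_names):
        # Register a logical file name (LFN) and where its replicas live.
        self.files[lfn] = list(se_names)

    def effective_meta(self, lfn):
        # Merge metadata from the top-level directory down to the file's parent.
        merged = {}
        path = ""
        for part in lfn.split("/")[1:-1]:
            path += "/" + part
            merged.update(self.dir_meta.get(path, {}))
        return merged

    def query(self, **conditions):
        # Query request -> sorted list of matching logical file names.
        return sorted(
            lfn for lfn in self.files
            if all(self.effective_meta(lfn).get(k) == v
                   for k, v in conditions.items())
        )

    def pfns(self, lfn):
        # Build physical file names on demand: storage prefix + LFN.
        return [self.se_prefix[se] + lfn for se in self.files[lfn]]

    def define_dataset(self, name, **conditions):
        # A virtual dataset stores the query, not a fixed file list.
        self.datasets[name] = conditions

    def resolve_dataset(self, name):
        return self.query(**self.datasets[name])


# Hypothetical usage: two files under a small directory tree.
catalog = Catalog()
catalog.se_prefix["SE-A"] = "srm://se-a.example.org/storage"
catalog.set_dir_meta("/bes/data", data_type="dst")
catalog.set_dir_meta("/bes/data/round02", round="round02")
catalog.add_file("/bes/data/round02/run_0001.dst", ["SE-A"])
catalog.add_file("/bes/data/round03/run_0002.dst", ["SE-A"])
catalog.define_dataset("round02-dst", data_type="dst", round="round02")

for lfn in catalog.resolve_dataset("round02-dst"):
    print(lfn, catalog.pfns(lfn))
```

Because the dataset is stored as a query, a file registered later under `/bes/data/round02` would appear in `round02-dst` without redefining the dataset, and moving a storage element only requires changing its prefix, not every file record.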

Chinese Library Classification (CLC) number: