Abstract: The Beijing Spectrometer III (BESIII) high energy physics experiment produces experimental data at the petabyte (PB) scale, and processing and analyzing this volume of data poses a considerable challenge for the available computing resources. Distributed computing is a feasible way to integrate heterogeneous computing resources and relieve the resource shortage. Based on the requirements of the BESIII experiment, the metadata management needed for distributed computing is studied, a metadata model for data files is proposed, and a metadata management system is designed and implemented on the catalog service of the DIRAC middleware. The system uses a tree-like directory structure, dynamic construction of physical file names, and virtual datasets to organize and store the various types of metadata, and it provides the mapping among query requests, logical files, and physical files. Digital certificates and the OpenSSL implementation of the Secure Sockets Layer protocol secure the system. The system has been applied to experimental data processing and analysis. Test results show that the query time is only 0.3 s with 300 concurrent users, which indicates that its performance meets the needs of the BESIII experiment.
Key words: metadata; metadata model; distributed computing; metadata management; catalog service; high energy physics
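The mapping named in the abstract, from a query request to logical files and then to physical files, can be illustrated with a minimal Python sketch. It is a toy in-memory model, not the DIRAC catalog API and not the authors' implementation. The class `Catalog`, its methods, the directory layout under `/bes`, the metadata keys, and the `example.org` storage URLs are all hypothetical and chosen only for illustration. The sketch assumes that metadata is attached to directories and inherited down the tree, that a physical file name is built on demand from a storage prefix plus the logical file name, and that a virtual dataset is a saved query.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy in-memory catalog (hypothetical; not the DIRAC File Catalog API)."""
    dir_meta: dict = field(default_factory=dict)   # directory path -> metadata
    files: dict = field(default_factory=dict)      # LFN -> storage element names
    se_prefix: dict = field(default_factory=dict)  # storage element -> URL prefix
    datasets: dict = field(default_factory=dict)   # dataset name -> saved query

    def effective_meta(self, lfn):
        """Merge metadata from the top directory down to the file's directory."""
        merged = {}
        parts = lfn.strip("/").split("/")[:-1]  # drop the file name itself
        for depth in range(1, len(parts) + 1):
            merged.update(self.dir_meta.get("/" + "/".join(parts[:depth]), {}))
        return merged

    def query(self, **conditions):
        """Map a metadata query to the sorted list of matching LFNs."""
        return sorted(
            lfn for lfn in self.files
            if all(self.effective_meta(lfn).get(k) == v
                   for k, v in conditions.items())
        )

    def pfns(self, lfn):
        """Build physical file names on demand instead of storing them."""
        return [self.se_prefix[se] + lfn for se in self.files[lfn]]

    def resolve_dataset(self, name):
        """A virtual dataset is a saved query, evaluated when it is used."""
        return self.query(**self.datasets[name])


def demo_catalog():
    """Populate a catalog with made-up directories, files, and storage sites."""
    cat = Catalog()
    cat.dir_meta["/bes"] = {"experiment": "BESIII"}
    cat.dir_meta["/bes/round02"] = {"round": "round02"}
    cat.dir_meta["/bes/round02/dst"] = {"type": "dst"}
    cat.dir_meta["/bes/round02/raw"] = {"type": "raw"}
    cat.files["/bes/round02/dst/run_0001.dst"] = ["SE-A", "SE-B"]
    cat.files["/bes/round02/raw/run_0001.raw"] = ["SE-A"]
    cat.se_prefix = {
        "SE-A": "srm://se-a.example.org/data",
        "SE-B": "srm://se-b.example.org/data",
    }
    cat.datasets["round02-dst"] = {"round": "round02", "type": "dst"}
    return cat


if __name__ == "__main__":
    cat = demo_catalog()
    for lfn in cat.resolve_dataset("round02-dst"):
        print(lfn, cat.pfns(lfn))
```

Under these assumptions, storing only the logical file name and the list of storage sites keeps the catalog small, and a dataset stays current as files are added because its query is evaluated at lookup time.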
CLC number:
LIN Lei, SUN Yong, LI Wei-dong, DENG Zi-yan, ZHANG Xiao-mei, Nicholson Caitriana. Metadata Management for BESIII Distributed Computing[J]. Computer Engineering (计算机工程).