
Computer Engineering

• Architecture and Software Technology •

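Data Aggregation System for Metadata Discovery in the CMS Experiment

LIANG Dong 1, ZANG Dong-song 1, HUO Jing 1, SUN Gong-xing 1, Valentin Kuznetsov 2

  (1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 2. Cornell University, New York 14850, USA)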
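  • Received: 2013-03-11 Online: 2014-04-15 Published: 2014-04-14
  • About the authors: LIANG Dong (b. 1984), male, Ph.D. candidate, research interests: data discovery and distributed systems; ZANG Dong-song and HUO Jing, Ph.D. candidates; SUN Gong-xing, research fellow and doctoral supervisor; Valentin Kuznetsov, associate professor.
  • Funding:
    Supported by the A3 Foresight Program of the National Natural Science Foundation of China (6116140454) and by General Program grants of the National Natural Science Foundation of China (11179020, 11375223).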

Data Aggregation System for Meta-data Discovery in CMS Experiment

LIANG Dong 1, ZANG Dong-song 1, HUO Jing 1, SUN Gong-xing 1, Valentin Kuznetsov 2

  (1. Institute of High Energy Physics, Chinese Academy of Sciences, Beijing 100049, China; 2. Cornell University, New York 14850, USA)
  • Received: 2013-03-11 Online: 2014-04-15 Published: 2014-04-14

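Abstract: The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) is characterized by large data volumes (petabyte scale), complex data types, and data that are geographically distributed around the world. The metadata describing these data reach the terabyte scale and are stored in different formats across different relational and non-relational data sources. By adding a caching layer on top of these heterogeneous data sources, this paper implements a data aggregation system that provides precise keyword queries. The system answers user queries through multiple mapping and aggregation, and uses an effective cache management strategy to raise the query hit rate. Experimental results show that the system can answer more than 70% of user queries from the cache and delivers good query performance.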

Key words: keyword query, data aggregation, metadata discovery, cache management, mapping, heterogeneous data sources

Abstract: The Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) produces petabytes of physics data. These data are not only huge in volume but also complex in type and distributed all over the world. As a result, the meta-data describing how the physics data are organized reach terabytes in size, and they are kept in different formats in different relational and non-relational data sources. To meet the data discovery requirement, a unified query interface is needed. By adding a caching layer on top of these data sources, this paper implements a data aggregation system that provides a precise keyword-style search interface. It demonstrates how to support user queries through multiple mapping and aggregation, and how to manage the cache efficiently. Experimental results show that more than 70% of user queries can be answered from the cache, and that the system has good query performance.

Key words: keyword search, data aggregation, meta-data discovery, cache management, mapping, heterogeneous data sources
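
The idea summarized in the abstract (a caching layer in front of heterogeneous sources, with field mapping and aggregation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the two stand-in sources, their field names, the `dataset=<name>` query form, the time-to-live policy, and the class name `AggregatingCache`.

```python
import time
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Source = Callable[[str], List[Record]]


def catalog_source(dataset: str) -> List[Record]:
    # Hypothetical relational-style source with its own column names.
    return [{"DATASET_NAME": dataset, "N_FILES": 12}]


def site_source(dataset: str) -> List[Record]:
    # Hypothetical document-style source with a different schema.
    return [{"name": dataset, "site": "T2_EXAMPLE"}]


SOURCES: Dict[str, Source] = {"catalog": catalog_source, "sites": site_source}

# Per-source mapping from native field names to one common vocabulary.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "catalog": {"DATASET_NAME": "dataset", "N_FILES": "nfiles"},
    "sites": {"name": "dataset", "site": "site"},
}


class AggregatingCache:
    """Answer keyword queries from a cache; on a miss, query every source,
    map fields to the common vocabulary, and merge them into one record."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self.store: Dict[str, Tuple[float, Record]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, text: str) -> Record:
        key, _, value = text.partition("=")
        key, value = key.strip(), value.strip()
        if key != "dataset" or not value:
            raise ValueError("this sketch only supports 'dataset=<name>'")
        cache_key = f"{key}={value}"  # normalized form, so spacing is ignored
        now = time.monotonic()
        entry = self.store.get(cache_key)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        merged: Record = {}
        for name, fetch in SOURCES.items():
            for raw in fetch(value):
                for field, val in raw.items():
                    merged[FIELD_MAPS[name].get(field, field)] = val
        self.store[cache_key] = (now, merged)
        return merged
```

In this sketch, a repeated query such as `dataset=/a/b/c` is served from the cache until its entry expires, and the `hits` and `misses` counters give the kind of hit-rate figure the abstract reports. The actual system's mapping rules and cache management strategy may differ.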

CLC number: