
Computer Engineering ›› 2009, Vol. 35 ›› Issue (20): 73-75. doi: 10.3969/j.issn.1000-3428.2009.20.025

• Software Technology and Database •

Parallel Data Warehouses Architecture Based on PC Cluster

YOU Jin-guo, XI Jian-qing, XIAO Yu-hong

  1. (School of Computer Science and Engineering, South China University of Technology, Guangzhou 510641)
  • Online: 2009-10-20 Published: 2009-10-20

Abstract: As data warehouses grow in size, ensuring the performance of ad hoc analytical queries over massive data becomes a major challenge. To address this problem, this paper proposes HDW, a parallel data warehouse architecture built on a PC cluster. HDW uses Google's GFS and Bigtable techniques for distributed storage management and MapReduce to parallelize OLAP computation. It also provides front-end applications with a unified interface that conforms to the XMLA specification. Experiments on an 18-node cluster show that HDW scales well and can quickly process data sets of at least 10 million tuples.

Key words: data warehouse, OLAP, cluster
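
The abstract does not show how HDW maps OLAP work onto MapReduce, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It simulates, in a single Python process, a group-by aggregation split into a map phase over fact-table partitions and a reduce phase that sums measures per key. The sample rows and the names `FACTS`, `map_phase`, `reduce_phase`, and `aggregate` are all hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical fact rows: (region, product, sales). Not data from the paper.
FACTS = [
    ("east", "tv", 100),
    ("east", "radio", 40),
    ("west", "tv", 70),
    ("west", "tv", 30),
    ("east", "tv", 10),
]


def map_phase(partition, dims):
    """Emit (group-by key, measure) pairs for one partition of the fact table."""
    for row in partition:
        key = tuple(row[d] for d in dims)
        yield key, row[-1]  # the last column is the measure


def reduce_phase(pairs):
    """Sum the measures that share a group-by key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def aggregate(facts, dims, n_partitions=2):
    """Split the rows, map each partition independently, then reduce.

    On a real cluster each partition would be mapped on a different node;
    here the partitions are processed one after another for illustration.
    """
    partitions = [facts[i::n_partitions] for i in range(n_partitions)]
    mapped = chain.from_iterable(map_phase(p, dims) for p in partitions)
    return reduce_phase(mapped)


by_region = aggregate(FACTS, dims=(0,))
by_region_product = aggregate(FACTS, dims=(0, 1))
```

Because summation is associative and commutative, the result does not depend on how the rows are partitioned, which is the property that lets this kind of aggregation spread across cluster nodes.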
