Abstract:
As data warehouses grow in size, ensuring the performance of ad hoc queries over massive data becomes a major challenge. To address this issue, this paper proposes HDW, a parallel data warehouse architecture built on a PC cluster. It employs Google's GFS and Bigtable for distributed storage management and MapReduce to parallelize OLAP computation tasks. In addition, it provides an XMLA-compliant interface for front-end applications. Experiments conducted on an 18-node cluster show that HDW scales well and can quickly process data sets of at least 10 million tuples.
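The MapReduce-style parallelization of OLAP described above can be illustrated with a minimal sketch. This is not HDW's actual implementation; the tuple layout, names, and the single-machine map/reduce functions below are illustrative assumptions, showing only the general idea of expressing a GROUP BY aggregation as a map phase (emit a dimension key with a measure) and a reduce phase (sum measures per key).

```python
from collections import defaultdict

# Hypothetical fact tuples: (region, product, sales) — layout is illustrative.
facts = [("East", "tv", 100), ("West", "tv", 80), ("East", "radio", 50)]

def map_phase(tuples):
    # Emit (dimension key, measure) pairs, one per fact tuple.
    # Here the key is the region and the measure is sales.
    for region, product, sales in tuples:
        yield (region, sales)

def reduce_phase(pairs):
    # Aggregate the measure for each dimension key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

result = reduce_phase(map_phase(facts))
# result == {"East": 150, "West": 80}
```

In a real MapReduce deployment, map tasks would run in parallel over partitions of the fact table stored in the distributed file system, and the framework would shuffle intermediate pairs to reduce tasks by key.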
Key words:
data warehouse,
OLAP,
cluster
YOU Jin-guo; XI Jian-qing; XIAO Yu-hong. Parallel Data Warehouses Architecture Based on PC Cluster[J]. Computer Engineering, 2009, 35(20): 73-75.