Computer Engineering

• Advanced Computing and Data Processing •

Multiple Query Optimization System Based on Cloud Computing

GE Xing 1, SHEN Yao 1, XU Chang-liang 2

  1. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
  2. Alibaba Cloud Computing Co., Ltd., Hangzhou 310099, China
  • Received: 2013-09-25 Online: 2014-09-15 Published: 2014-09-12
  • About the authors: GE Xing (b. 1987), male, master's student, research interests: cloud computing and distributed computing; SHEN Yao (corresponding author), associate professor; XU Chang-liang, Ph.D.
  • Supported by:
    National High Technology Research and Development Program of China ("863" Program), major project "Development of a Network Operating System Mainly Supporting E-Commerce" (2011AA01A202).

Abstract: In routine massive data analysis jobs, CPU/IO-intensive queries are typically complex and time-consuming, yet they contain many common components that could be reused. Detecting, sharing, and reusing the common components among the thousands of SQL-like queries in a recurring query set is therefore a pressing problem. To address it, this paper proposes a signature-index approach and builds LSShare, a Multiple Query Optimization (MQO) system suited to cloud computing. Based on the Abstract Syntax Tree (AST) of each query, LSShare divides the query into query levels, extracts a feature vector for each level, and computes a signature from it. A simple but efficient signature index table then identifies the components common to multiple queries, and SQL rewriting is used to reuse them. As the number of runs increases, LSShare progressively optimizes the recurring query set in the cloud. Experimental results show that the system runs more efficiently than a traditional query system and saves nearly one third of the execution time.
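The signature-index idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the LSShare implementation: the `QueryLevel` class is a hypothetical stand-in for one query level of an AST, and the choice of features (tables, columns, predicates, child signatures) and the use of SHA-1 as the signature function are assumptions made only for illustration. The sketch shows how per-level signatures can be collected in an index so that levels appearing in more than one query are found by a simple lookup.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class QueryLevel:
    """Hypothetical stand-in for one query level (one SELECT block) of an AST."""
    tables: list
    columns: list
    predicates: list
    children: list = field(default_factory=list)  # nested subquery levels


def feature_vector(level):
    # Assumed features: each list is sorted so that blocks differing only
    # in the order of tables, columns, or predicates normalize identically.
    return (
        tuple(sorted(t.lower() for t in level.tables)),
        tuple(sorted(c.lower() for c in level.columns)),
        tuple(sorted(p.replace(" ", "") for p in level.predicates)),
        tuple(sorted(signature(c) for c in level.children)),
    )


def signature(level):
    # Assumed signature function: SHA-1 over the normalized feature vector.
    return hashlib.sha1(repr(feature_vector(level)).encode("utf-8")).hexdigest()


def walk(level):
    # Visit a level and every level nested below it.
    yield level
    for child in level.children:
        yield from walk(child)


def build_index(queries):
    # Signature index table: signature -> ids of the queries containing it.
    index = defaultdict(set)
    for query_id, root in queries.items():
        for level in walk(root):
            index[signature(level)].add(query_id)
    return index


def shared_levels(index):
    # A signature seen in two or more queries marks a candidate common part.
    return {sig: ids for sig, ids in index.items() if len(ids) > 1}


def daily_orders():
    # Illustrative subquery used by both example queries below.
    return QueryLevel(["orders"], ["user_id", "amount"], ["dt = '2013-09-25'"])


queries = {
    "q1": QueryLevel([], ["user_id", "SUM(amount)"], [], [daily_orders()]),
    "q2": QueryLevel(["users"], ["name"], ["users.id = t.user_id"], [daily_orders()]),
}
shared = shared_levels(build_index(queries))
```

In this sketch, `shared` maps the signature of the common `orders` subquery to the ids of both queries. A rewriting step, not shown here, could then evaluate that level once and substitute its result into each query.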

Key words: cloud computing, Multiple Query Optimization (MQO), query processing, subexpression identification, massive data processing, recurring query set

CLC number: