Abstract:
With the advance of the Semantic Web, the Resource Description Framework (RDF) data published on the Web has reached the scale of ten billion triples and is growing geometrically. SPARQL Protocol and RDF Query Language (SPARQL) query methods that run on a stand-alone machine are no longer applicable at this scale. To address this problem, this paper proposes a SPARQL Basic Graph Pattern (BGP) search algorithm based on the Bulk Synchronous Parallel (BSP) model. Based on the graph nature of RDF data and the definition of BGP, it divides the whole process into a “matching” stage and an “iteration” stage: it first matches each triple pattern and then iterates to obtain the final query results. The algorithm is implemented with the HAMA distributed computing framework. Experimental results show that it achieves higher query efficiency than a MapReduce-based SPARQL algorithm and that it can support SPARQL queries over large-scale RDF data.
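The two stages named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' HAMA implementation. The function names, the representation of bindings as dictionaries, and the choice to join one triple pattern per round (each round standing in for one BSP superstep) are all assumptions made for illustration. The abstract does not describe the actual message-passing scheme.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Return the variable bindings if `triple` matches `pattern`, else None."""
    binding: Binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            # A variable that occurs twice must bind to the same value.
            if binding.get(p, t) != t:
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def merge(a: Binding, b: Binding) -> Optional[Binding]:
    """Join two partial solutions if they agree on every shared variable."""
    for key, value in b.items():
        if a.get(key, value) != value:
            return None
    return {**a, **b}


def evaluate_bgp(triples: List[Triple], patterns: List[Triple]) -> List[Binding]:
    # Matching stage: collect the partial solutions of each triple pattern.
    partial = [
        [b for t in triples if (b := match_pattern(p, t)) is not None]
        for p in patterns
    ]
    # Iteration stage (assumed strategy): each round joins one more pattern
    # into the running set of solutions, starting from the empty solution.
    solutions: List[Binding] = [{}]
    for candidates in partial:
        solutions = [
            m for s in solutions for c in candidates
            if (m := merge(s, c)) is not None
        ]
    return solutions
```

In a real BSP setting the join in each round would be carried out by messages exchanged between vertices or peers and separated by a synchronization barrier. The nested loop here only mimics that round structure.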
Key words:
Semantic Web,
Resource Description Framework (RDF),
SPARQL search,
Basic Graph Pattern (BGP),
Bulk Synchronous Parallel (BSP) model,
HAMA framework
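Abstract: As the Semantic Web continues to develop, the Resource Description Framework (RDF) data published on the Internet has reached the ten-billion-triple scale and is growing geometrically, so stand-alone SPARQL query methods for RDF data are no longer applicable. To address this, a SPARQL basic graph pattern query algorithm based on the Bulk Synchronous Parallel (BSP) model is proposed. Based on the directed-graph nature of RDF data and the definition of the basic graph pattern, the query process is divided into two stages, matching and iteration: after the triple patterns required by the query are matched, iteration brings the partial solutions step by step toward the complete solution, which yields the final query result. The algorithm is implemented with the HAMA distributed computing framework. Experimental results show that, compared with the MapReduce-based SPARQL query algorithm, it has higher query efficiency and can support fast SPARQL queries over large-scale RDF data.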
Key words:
Semantic Web,
Resource Description Framework,
SPARQL query,
basic graph pattern,
Bulk Synchronous Parallel model,
HAMA framework
CLC Number:
LI Guo-ding, FENG Zhi-yong, RAO Guo-zheng, WANG Xin. SPARQL Basic Graph Pattern Search Algorithm Based on Bulk Synchronous Parallel[J]. Computer Engineering.
李国鼎,冯志勇,饶国政,王鑫. 基于BSP 的SPARQL 基本图模式查询算法[J]. 计算机工程.