Computer Engineering, 2020, Vol. 46, Issue (11): 117-123. doi: 10.19678/j.issn.1000-3428.0056311

• Advanced Computing and Data Processing •

RDF Grouping Compression Based on Delta Encoding

WU Weixin, HAN Jingyu, ZHU Man

  1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
  • Received: 2019-10-16  Revised: 2019-12-16  Published: 2020-01-09
  • About the authors: WU Weixin (born 1994), male, master's student, whose main research interests are RDF data compression and big data management; HAN Jingyu, professor, Ph.D.; ZHU Man, lecturer, Ph.D.
  • Funding:
    National Natural Science Foundation of China (61602260); Key Project of the Social Science Foundation of Jiangsu Province (18GLA004).

Abstract: With the development of Semantic Web technology, the volume of Resource Description Framework (RDF) data is growing rapidly, and so are its demands on storage space and transmission bandwidth. Existing general-purpose compression methods and RDF-specific compression methods address this problem, but they still leave redundancy in the data. To this end, this paper proposes an RDF grouping compression algorithm based on delta encoding. The algorithm groups RDF data according to the combination of predicates connected to each object, which eliminates object redundancy while further reducing predicate redundancy. On this basis, it applies delta encoding to the subject sequences obtained after grouping in order to further reduce their storage space. Experimental results show that, compared with the Plain, HDT and HDT++ algorithms, the proposed algorithm improves performance by 17% on average on the less structured datasets Archives Hub, Linkedmdb, rdfabout and DBpedia, and by 23% on the highly structured dataset dbtune, which indicates that it achieves good RDF compression performance on datasets with different degrees of structure.

Key words: Semantic Web, Resource Description Framework (RDF), degree of structure, data compression, delta encoding
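
The two ideas named in the abstract, grouping by the combination of predicates connected to an object and delta encoding of the resulting subject sequences, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes triples have already been mapped to integer IDs by a dictionary, and the function names and the nested output layout are hypothetical choices made only for illustration.

```python
from collections import defaultdict


def delta_encode(sorted_ids):
    """Keep the first ID as-is, then store each gap to the previous ID."""
    gaps, prev = [], 0
    for value in sorted_ids:
        gaps.append(value - prev)
        prev = value
    return gaps


def delta_decode(gaps):
    """Inverse of delta_encode: a running sum restores the original IDs."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def group_by_predicate_combination(triples):
    """Group objects by the set of predicates that point to them.

    triples: iterable of (subject_id, predicate_id, object_id) integers
    (assumed layout). Returns
    {predicate_combination: {object_id: {predicate_id: delta-encoded subjects}}}.
    """
    subjects = defaultdict(lambda: defaultdict(set))  # object -> predicate -> subjects
    for s, p, o in triples:
        subjects[o][p].add(s)

    groups = defaultdict(dict)
    for o, by_pred in subjects.items():
        combo = tuple(sorted(by_pred))  # predicates connected to this object
        groups[combo][o] = {
            p: delta_encode(sorted(ss)) for p, ss in by_pred.items()
        }
    return dict(groups)


# Toy data: objects 7 and 8 share the predicate combination (1, 2).
triples = [
    (100, 1, 7), (103, 1, 7), (104, 1, 7), (100, 2, 7),
    (250, 1, 8), (251, 1, 8), (251, 2, 8),
    (300, 1, 9),
]
groups = group_by_predicate_combination(triples)
```

In this toy input, objects 7 and 8 fall into the same group, so the predicate combination is stored once per group rather than once per object, and the sorted subject IDs 100, 103, 104 become the smaller gaps 100, 3, 1, which a variable-length integer code could then store in fewer bytes.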

CLC number: