Computer Engineering, 2010, Vol. 36, Issue (21): 251-253.

• Networks and Communications

Differential Backup Method Based on Duplicated Data Elimination

WU Xiao-yong, YANG Pin, HU Xiao-qin, ZANG Wen-juan   

  1. (College of Computer Science, Sichuan University, Chengdu 610065, China)
  • Online: 2010-11-05; Published: 2010-11-03
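  • About the authors: WU Xiao-yong (1985-), male, master's degree candidate, main research interest: network security; YANG Pin, associate professor; HU Xiao-qin, lecturer; ZANG Wen-juan, master's degree candidate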
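  • Supported by:
    National Natural Science Foundation of China (60873246); Cultivation Fund of the Key Scientific and Technical Innovation Project, Ministry of Education of China (708075); Doctoral Program Foundation of the Ministry of Education of China (20070610032)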

Abstract: To eliminate the impact of duplicated data on transmission and storage, this paper proposes a differential backup method based on duplicated data elimination. Files are divided into fixed-size blocks, with the block length chosen according to a set of intervals (the regional block length), and a Hash table gives each block a unique identifier, which lets the Rsync algorithm detect duplicated data across different files. Segmenting the Hash table into groups lets blocks be matched locally, and a local checksum file allows only the differences between versions of a file to be transmitted. Experimental results show that, compared with the Rsync algorithm, the method reduces the amount of data transmitted, lowers the storage required at the backup center, and improves block search efficiency.
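The method builds on the Rsync algorithm. As a hedged reminder of how Rsync finds matching blocks, the sketch below implements the standard rsync weak rolling checksum, a Hash table keyed on that checksum, and a strong-hash confirmation step, producing a delta of copy and literal operations. It is not the authors' method: the strong hash (MD5), the function names `signature`, `delta`, and `patch`, and the choice to leave the old file's trailing short block unindexed are all assumptions made to keep the sketch short.

```python
import hashlib

MOD = 1 << 16  # each weak-checksum half is kept modulo 2**16, as in rsync


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the rsync weak checksum halves (a, b) of a block from scratch."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte right in O(1): drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


def signature(old: bytes, block_len: int) -> dict[int, list[tuple[int, bytes]]]:
    """Hash table for the old file: weak checksum -> [(block index, MD5 digest)]."""
    table: dict[int, list[tuple[int, bytes]]] = {}
    for idx in range(len(old) // block_len):  # assumption: trailing short block not indexed
        block = old[idx * block_len:(idx + 1) * block_len]
        a, b = weak_checksum(block)
        table.setdefault(a | (b << 16), []).append((idx, hashlib.md5(block).digest()))
    return table


def delta(new: bytes, table: dict[int, list[tuple[int, bytes]]], block_len: int) -> list[tuple[str, object]]:
    """Encode `new` as ("copy", block index) and ("data", literal bytes) operations."""
    ops: list[tuple[str, object]] = []
    literal = bytearray()
    i, n = 0, len(new)
    a = b = 0
    if n >= block_len:
        a, b = weak_checksum(new[:block_len])
    while i + block_len <= n:
        match = None
        candidates = table.get(a | (b << 16))
        if candidates:  # weak hit: confirm with the strong hash
            strong = hashlib.md5(new[i:i + block_len]).digest()
            match = next((idx for idx, s in candidates if s == strong), None)
        if match is not None:
            if literal:  # flush pending literal bytes before the copy
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_len
            if i + block_len <= n:  # restart the checksum on the next full window
                a, b = weak_checksum(new[i:i + block_len])
        else:
            literal.append(new[i])
            if i + block_len < n:  # roll only if a next full window exists
                a, b = roll(a, b, new[i], new[i + block_len], block_len)
            i += 1
    literal.extend(new[i:])  # tail shorter than one block is always literal
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old: bytes, ops: list[tuple[str, object]], block_len: int) -> bytes:
    """Rebuild the new file from the old file plus the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_len:(value + 1) * block_len]
        else:
            out += value
    return bytes(out)
```

For example, with `old` set to four copies of the 44-byte sentence `b"The quick brown fox jumps over the lazy dog."` and `new = b"XX" + old + b"!"`, a 16-byte block length gives eleven copy operations plus two short literals, and `patch(old, ops, 16)` reproduces `new`. Only the literals and block indices would need to cross the network, which is the saving the abstract compares against.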

Key words: Rsync algorithm, duplicated data, regional block length, grouped Hash
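The key words name a regional block length and a grouped Hash table. The abstract gives neither the interval boundaries nor the grouping rule, so the sketch below is one hypothetical reading: the block length is picked from made-up file-size intervals (`REGIONS`, `LARGE_BLOCK`), unique blocks are keyed by SHA-1 digest, and the Hash table is split into one group per block length so that a file is only matched against the group its blocks could belong to. All names and thresholds here are illustrative assumptions, not the authors' parameters.

```python
import hashlib

# Hypothetical size intervals: (upper bound in bytes, block length in bytes).
# The paper's real intervals are not stated in the abstract.
REGIONS = [(64 * 1024, 512), (1024 * 1024, 2048)]
LARGE_BLOCK = 8192  # assumed block length for files at or above the last bound


def regional_block_len(file_size: int) -> int:
    """Pick a fixed block length from the interval the file size falls into."""
    for upper, block_len in REGIONS:
        if file_size < upper:
            return block_len
    return LARGE_BLOCK


class GroupedBlockIndex:
    """Unique-block store whose Hash table is split into one group per block length.

    Full blocks of different lengths cannot be identical, so a file only
    searches the group for its own regional block length.
    """

    def __init__(self) -> None:
        self.groups: dict[int, dict[bytes, bytes]] = {}

    def add_file(self, data: bytes) -> tuple[int, list[bytes], int]:
        """Store a file's blocks; return (block length, digest recipe, bytes newly stored)."""
        block_len = regional_block_len(len(data))
        group = self.groups.setdefault(block_len, {})
        recipe: list[bytes] = []
        new_bytes = 0
        for start in range(0, len(data), block_len):
            block = data[start:start + block_len]
            digest = hashlib.sha1(block).digest()
            if digest not in group:  # first time this block is seen in the group
                group[digest] = block
                new_bytes += len(block)
            recipe.append(digest)
        return block_len, recipe, new_bytes

    def restore(self, block_len: int, recipe: list[bytes]) -> bytes:
        """Reassemble a file from its recipe of block digests."""
        group = self.groups[block_len]
        return b"".join(group[d] for d in recipe)
```

As a usage example, storing a 10,240-byte file of distinct blocks and then a second file of the same size that shares its first half and ends in 5,120 zero bytes adds only one new 512-byte block for the second file: the shared half is already in the group, and the ten zero blocks collapse to a single entry.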