
Computer Engineering ›› 2007, Vol. 33 ›› Issue (24): 66-68. doi: 10.3969/j.issn.1000-3428.2007.24.022

• Software Technology and Database •

A Low-Overhead, Non-Blocking Coordinated Checkpointing Algorithm

WAN Guo-wei, LU Yu-tong, XIE Min, SHEN Zhi-yu

  1. Computer School, National University of Defense Technology, Changsha 410073
  • Online: 2007-12-20 Published: 2007-12-20

Abstract: Coordinated checkpointing with rollback recovery is a simple and effective fault-tolerance technique that is widely used in parallel and distributed systems. To further reduce the overhead of coordinated checkpointing, this paper presents a non-blocking coordinated checkpointing algorithm based on reconstructible checkpoints. Because errors in a parallel program trigger rollback recovery far less often than checkpoints are taken, the algorithm shifts part of the checkpointing cost to the rollback recovery stage, which lowers fault-tolerance overhead and improves system scalability.

Key words: checkpoint, fault tolerance, rollback recovery, non-blocking
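
The cost argument in the abstract can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the paper's algorithm. It only shows why moving work from the frequent operation (checkpointing) to the rare one (rollback recovery) can lower total overhead. The function name `total_overhead` and every number in it are hypothetical and chosen for illustration only.

```python
def total_overhead(n_checkpoints, n_rollbacks, ckpt_cost, recovery_cost):
    """Total fault-tolerance cost of a run, in arbitrary time units.

    Simple linear model (an assumption, not taken from the paper):
    every checkpoint costs `ckpt_cost`, every rollback costs `recovery_cost`.
    """
    return n_checkpoints * ckpt_cost + n_rollbacks * recovery_cost


# Hypothetical workload: checkpoints are far more frequent than rollbacks.
N_CHECKPOINTS = 1000
N_ROLLBACKS = 10

# Baseline: all checkpoint work is done when the checkpoint is taken.
eager = total_overhead(N_CHECKPOINTS, N_ROLLBACKS, ckpt_cost=10, recovery_cost=20)

# Deferred: 4 units of per-checkpoint work are postponed and paid
# only if a rollback actually happens (checkpoint 10 -> 6, recovery 20 -> 24).
deferred = total_overhead(N_CHECKPOINTS, N_ROLLBACKS, ckpt_cost=6, recovery_cost=24)

print(f"eager:    {eager}")
print(f"deferred: {deferred}")
```

Under these assumed numbers the deferred scheme pays more per rollback but less overall, because the postponed work is incurred 10 times instead of 1000 times. The saving shrinks as rollbacks become more frequent relative to checkpoints.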
