Abstract: Coordinated checkpointing with rollback recovery is a simple and effective fault-tolerance technique that is widely used in parallel and distributed computer systems. To further reduce the overhead of coordinated checkpointing, this paper proposes a low-overhead, non-blocking coordinated checkpoint algorithm based on reconstructible checkpoints. In a parallel program, a failure that forces rollback recovery is far less likely than a checkpoint being taken. The algorithm exploits this property by shifting part of the checkpointing overhead to the rollback-recovery phase. This lowers the overall cost of fault tolerance and improves system scalability.
Key words: checkpoint, fault tolerance, rollback recovery, non-blocking
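The abstract's core argument is that work moved from checkpointing (frequent) to recovery (rare) is paid far less often. The simple expected-cost model below makes that trade-off concrete. It is an illustrative sketch only. The parameter names (`num_checkpoints`, `expected_rollbacks`, `ckpt_cost`, `recovery_cost`) and all numbers are hypothetical and do not come from the paper. The model also says nothing about how reconstructible checkpoints are actually built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical expected-cost model for one run of a parallel program."""
    num_checkpoints: int       # checkpoints taken during the run
    expected_rollbacks: float  # expected number of rollback recoveries in the run
    ckpt_cost: float           # seconds of overhead per checkpoint
    recovery_cost: float       # seconds of overhead per rollback recovery

    def total_overhead(self) -> float:
        """Expected fault-tolerance overhead (seconds) for the whole run."""
        return (self.num_checkpoints * self.ckpt_cost
                + self.expected_rollbacks * self.recovery_cost)


def shift_cost(m: CostModel, saved_per_ckpt: float,
               added_per_recovery: float) -> CostModel:
    """Move work out of every checkpoint and into every recovery."""
    if saved_per_ckpt > m.ckpt_cost:
        raise ValueError("cannot save more than the checkpoint costs")
    return CostModel(m.num_checkpoints, m.expected_rollbacks,
                     m.ckpt_cost - saved_per_ckpt,
                     m.recovery_cost + added_per_recovery)


def shift_pays_off(m: CostModel, saved_per_ckpt: float,
                   added_per_recovery: float) -> bool:
    """True when the extra recovery work costs less than the checkpoint savings."""
    return (m.expected_rollbacks * added_per_recovery
            < m.num_checkpoints * saved_per_ckpt)


if __name__ == "__main__":
    # Hypothetical numbers: 100 checkpoints, on average half a rollback per run.
    baseline = CostModel(num_checkpoints=100, expected_rollbacks=0.5,
                         ckpt_cost=2.0, recovery_cost=30.0)
    shifted = shift_cost(baseline, saved_per_ckpt=0.5, added_per_recovery=20.0)
    print(baseline.total_overhead())  # 215.0
    print(shifted.total_overhead())   # 175.0
    print(shift_pays_off(baseline, 0.5, 20.0))  # True
```

In this model the shift pays off whenever `expected_rollbacks * added_per_recovery < num_checkpoints * saved_per_ckpt`. Even a large increase in recovery cost is therefore worthwhile as long as rollbacks remain much rarer than checkpoints, which is the property the abstract relies on.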
WAN Guo-wei; LU Yu-tong; XIE Min; SHEN Zhi-yu. Coordinated Checkpoint Algorithm of Low-overhead and Non-blocking[J]. Computer Engineering, 2007, 33(24): 66-68.