Abstract: The idea of distributed sparing is applied to large-scale parallel file systems to provide a fast recovery mechanism for systems that store data with redundancy. A Markov model is used to construct an analytic model of the mean time to data loss (MTTDL), which shows how to balance the requirement for high data reliability against the overhead of redundant data. According to this reliability analysis, an m-n redundancy mechanism under the fast recovery mechanism meets the reliability requirements of large-scale storage systems, provided that n ≥ m+2 and the computation time for data recovery is negligible compared with disk I/O time.
Key words:
Reliability; Object-based storage system; Parallel file system
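The MTTDL reasoning can be illustrated with a small birth-death Markov chain. The following is a minimal sketch, not the authors' model, and it rests on several assumptions. An "m-n" group is read as n disks of which any m surviving disks suffice to rebuild the data, so the group tolerates f = n - m concurrent failures. Disk failures are independent and exponential with rate λ = 1/MTTF. One repair proceeds at a time with rate μ = 1/(repair time). Groups fail independently, so system MTTDL ≈ group MTTDL / number of groups. Writing T_i for the expected time to data loss from state i (i failed disks) and D_i = T_i - T_{i+1}, the chain gives D_0 = 1/(nλ), D_i = (1 + μ·D_{i-1}) / ((n - i)λ), and MTTDL = T_0 = D_0 + … + D_f. The function names and all numeric parameters below are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def mttdl_group(n: int, m: int, disk_mttf_h: float, repair_h: float) -> float:
    """Mean time to data loss (hours) of one hypothetical m-n group.

    Assumed chain: state i = number of failed disks, i -> i+1 at rate
    (n - i) * lam, i -> i-1 at rate mu (one repair at a time); state
    f + 1 with f = n - m is absorbing (data lost).
    """
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    if disk_mttf_h <= 0 or repair_h <= 0:
        raise ValueError("times must be positive")
    lam = 1.0 / disk_mttf_h  # per-disk failure rate
    mu = 1.0 / repair_h      # repair rate; faster (distributed) rebuild means larger mu
    f = n - m                # number of concurrent failures the group tolerates
    total = 0.0
    d_prev = 0.0             # D_{i-1}; multiplied by b = 0 when i = 0
    for i in range(f + 1):
        a = (n - i) * lam            # failure rate out of state i
        b = mu if i > 0 else 0.0     # repair rate out of state i (none in state 0)
        d = (1.0 + b * d_prev) / a   # D_i = T_i - T_{i+1}
        total += d
        d_prev = d
    return total  # T_0 = sum of D_i because T_{f+1} = 0


def mttdl_system_years(groups: int, n: int, m: int,
                       disk_mttf_h: float, repair_h: float) -> float:
    """Approximate system MTTDL in years, assuming independent groups."""
    return mttdl_group(n, m, disk_mttf_h, repair_h) / groups / HOURS_PER_YEAR


if __name__ == "__main__":
    mttf = 500_000.0  # hypothetical disk MTTF in hours
    groups = 1_000    # hypothetical number of redundancy groups
    for repair_h in (24.0, 1.0):  # slow rebuild vs hypothetical fast distributed rebuild
        for n in (9, 10):         # n = m + 1 vs n = m + 2 with m = 8
            years = mttdl_system_years(groups, n, 8, mttf, repair_h)
            print(f"m=8 n={n} repair={repair_h:>4} h: system MTTDL ~ {years:.3g} years")
```

For n = 2, m = 1 (mirroring) the recursion reduces to the familiar (3λ + μ)/(2λ²). Running the loop shows the two levers the abstract names: shortening repair time (fast recovery) raises μ, and moving from n = m+1 to n = m+2 adds one more factor of roughly μ/λ to the MTTDL.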
TAN Huafang, HOU Zifeng. Reliability Mechanisms for Very Large Parallel File System[J]. Computer Engineering, 2006, 32(9): 25-27.