[1]KUSNEZOV D,BINKLEY S,HARROD B,et al.DOE exascale initiative[EB/OL].[2018-01-04].https://energy.gov/downloads/doe-exascale-initiative.
[2]KOGGE P,BERGMAN K,BORKAR S,et al.Exascale computing study:technology challenges in achieving exascale systems:TR 2008-13[R/OL].[2018-01-04].http://www.cse.nd.edu/Reports/2008/TR-2008-13.pdf.
[3]SCHROEDER B,GIBSON G A.A large-scale study of failures in high-performance computing systems[J].IEEE Transactions on Dependable and Secure Computing,2010,7(4):337-350.
[4]LIANG Yinglung,ZHANG Yanyong.Blue Gene/L failure analysis and prediction models[C]//Proceedings of the 43rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks.Washington D.C.,USA:IEEE Press,2006:425-434.
[5]ZHENG Ziming,YU Li,TANG Wei.Co-analysis of RAS log and job log on Blue Gene/P[C]//Proceedings of 2011 IEEE International Parallel and Distributed Processing Symposium.Washington D.C.,USA:IEEE Press,2011:840-851.
[6]NIE Bin,TIWARI D,GUPTA S,et al.A large-scale study of soft-errors on GPUs in the field[C]//Proceedings of 2016 IEEE International Symposium on High Performance Computer Architecture.Washington D.C.,USA:IEEE Press,2016:519-530.
[7]SRIDHARAN V,JSTEARLEY J,DEBARDELEBEN N.Feng shui of supercomputer memory:positional effects in DRAM and SRAM faults[C]//Proceedings of International Conference on High Performance Computing.Washington D.C.,USA:IEEE Press,2013:22-27.
[8]BASEMAN E,DE BARDELEBEN N,FERREIRA K,et al.Improving DRAM fault characterization through machine learning[C]//Proceedings of the 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshop.Washington D.C.,USA:IEEE Press,2016:1-5.
[9]HEIEN E,LAPINE D,KONDO D,et al.Modeling and tolerating heterogeneous failures in large parallel systems[C]//Proceedings of 2011 International Conference for High Performance Computing,Networking,Storage and Analysis.Washington D.C.,USA:IEEE Press,2011:45-52.
[10]SCHROEDER B,PINHEIRO E,WEBER W.DRAM errors in the wild:a large-scale field study[C]//Proceedings of the 11th International Joint Conference on Measurement and Modeling of Computer Systems.Washington D.C.,USA:IEEE Press,2009:193-204.
[11]王波,左德承,钱军,等.面向安藤架构的分层内存故障注入方法[J].计算机工程,2012,38(4):70-72.
[12]LIU Ruitao,CHEN Zuoning.A large-scale study of failures on petascale supercomputers[J].Journal of Computer Science and Technology,2018,33(1):24-41.
[13]张志华.可靠性理论及工程应用[M].北京:科学出版社,2012.
[14]薛毅,陈立萍.统计建模与R软件[M].北京:清华大学出版社,2007.
[15]盛骤,谢式千,潘承毅.概率论与数理统计[M].北京:高等教育出版社,2008.
[16]SRIKANT R,AGRAWAL R.Mining sequential patterns:generalizations and performance improvements[C]//Proceedings of the 5th International Conference on Extending Database Technology:Advances in Database Technology.Washington D.C.,USA:IEEE Press,2005:3-17. |