An Approach of Algorithm Based Fault Tolerance for High Performance Computing Systems

Publication Year: 1390 (Solar Hijri)
Document type: Conference paper
Language: English

This paper is 9 pages long and is available for download in PDF format.

National scientific document ID:

SASTECH05_116

Indexing date: 22 Mordad 1391 (Solar Hijri)

Abstract:

We present a new approach to algorithm-based fault tolerance (ABFT) for high performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. We have implemented a systematic procedure for introducing structured redundancy into ABFT. ABFT has been recommended as a cost-effective concurrent error detection scheme, and it offers a novel computing paradigm for providing fault tolerance in numerical algorithms. To that end, a matrix-based model has been developed and, based on that model, algorithms for both the design and the analysis of ABFT systems are formulated.
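
The abstract does not spell out the paper's matrix-based model, so the following is not the author's procedure. It is a minimal sketch of the classic checksum scheme for matrix multiplication (usually attributed to Huang and Abraham, 1984), shown only to illustrate what "structured redundancy" and "concurrent error detection" can look like in a numerical algorithm. All function names are hypothetical, and the sketch assumes integer entries and at most one corrupted data element; floating-point data would need a tolerance instead of exact comparison.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def locate_error(c_full):
    """Return (row, col) of a single corrupted data element, or None.

    c_full is the full-checksum product: its last row and last column
    hold the checksums of the data part. Assumption: at most one data
    element is faulty; any other pattern raises ValueError.
    """
    n = len(c_full) - 1       # number of data rows
    m = len(c_full[0]) - 1    # number of data columns
    bad_rows = [i for i in range(n)
                if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single data-element fault")


def correct_error(c_full, pos):
    """Rebuild the faulty element in place from its row checksum."""
    i, j = pos
    m = len(c_full[0]) - 1
    others = sum(c_full[i][:m]) - c_full[i][j]
    c_full[i][j] = c_full[i][m] - others


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c_full = matmul(with_column_checksum(a), with_row_checksum(b))

c_full[0][1] = 99                  # inject a fault into one data element
pos = locate_error(c_full)         # intersection of the failing row and column
correct_error(c_full, pos)         # recover the original value
```

The design point is that the checksums are carried through the multiplication itself, so detection needs no second run of the computation: a single faulty element breaks exactly one row check and one column check, and their intersection locates it.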

Keywords:

Algorithm Based Fault Tolerance (ABFT), Checkpointing, Error Correction, Matrix operations

Authors

H Hamidi

Islamic Azad University, Doroud Branch

  • SASTech 2011, Khavaran Higher-education Institute, Mashhad, Iran. May 12-14. ...