Task failure prediction in cloud computing systems

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English
Length: 9 pages (PDF)

National scientific document ID: ICIRT01_021

Indexing date: 9 Azar 1404

Abstract:

As cloud data centers grow in scale and complexity, ensuring high service reliability and minimizing failures have become critical challenges. Despite technological advances, hardware and software failures persist, disrupting tasks, wasting resources, and degrading service reliability. Accurately predicting task or job failures before they occur is essential for reducing downtime and unnecessary resource consumption. Traditional fault-tolerance methods such as checkpointing and replication are insufficient for the complexity of modern systems. Consequently, machine learning and deep learning techniques have been adopted to analyze system logs and predict failures more accurately. Federated learning extends this approach by training models collaboratively across nodes without centralizing their data, which preserves privacy while improving prediction accuracy. In this paper, we propose a fault prediction mechanism based on federated learning and a deep neural network that identifies patterns leading to task failures. Our model achieved a prediction accuracy of 95.3%, making it a robust solution for failure prediction in cloud computing environments.
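The abstract describes nodes that train a failure predictor collaboratively without sharing their raw logs. The sketch below illustrates that idea, assuming federated averaging (FedAvg) as the aggregation rule and a single-layer logistic model in place of the paper's deep neural network. The abstract specifies neither the aggregation rule nor the architecture. The features, the synthetic data generator, and every function name are hypothetical and were chosen only for illustration.

```python
import math
import random
from typing import List, Tuple

# A sample is (feature_vector, label); label 1 = task failed, 0 = task finished.
Sample = Tuple[List[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: List[float]) -> float:
    # w[0] is the bias; w[1:] are per-feature weights.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def local_train(w: List[float], data: List[Sample],
                lr: float = 0.1, epochs: int = 5) -> List[float]:
    # Plain SGD on the log-loss, using only this node's private samples.
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, x) - y  # gradient of log-loss w.r.t. the logit
            w[0] -= lr * err
            for j, xj in enumerate(x):
                w[j + 1] -= lr * err * xj
    return w


def fed_avg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    # FedAvg: average the client models, weighted by local sample counts.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(cw[k] * n for cw, n in zip(client_weights, client_sizes)) / total
            for k in range(dim)]


def accuracy(w: List[float], data: List[Sample]) -> float:
    # Fraction of samples whose thresholded prediction matches the label.
    correct = sum(int((predict_proba(w, x) >= 0.5) == bool(y)) for x, y in data)
    return correct / len(data)


def make_node_data(rng: random.Random, n: int) -> List[Sample]:
    # Hypothetical features in [0, 1]: [cpu_util, mem_util, prior_failure_rate].
    # Synthetic rule for illustration only: tasks tend to fail when the sum is high.
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        y = 1 if sum(x) + rng.gauss(0, 0.1) > 1.5 else 0
        data.append((x, y))
    return data


def federated_training(nodes: List[List[Sample]], rounds: int = 10) -> List[float]:
    global_w = [0.0] * (len(nodes[0][0][0]) + 1)
    for _ in range(rounds):
        # Each node starts from the current global model; raw data never leaves it.
        local_ws = [local_train(global_w, d) for d in nodes]
        global_w = fed_avg(local_ws, [len(d) for d in nodes])
    return global_w


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = [make_node_data(rng, n) for n in (200, 150, 100)]
    test_set = make_node_data(rng, 300)
    w = federated_training(nodes)
    print(f"held-out accuracy: {accuracy(w, test_set):.3f}")
```

Running the script trains for ten rounds over three nodes of different sizes and prints the held-out accuracy on synthetic data. That figure is unrelated to the 95.3% reported in the paper. Only model weights travel to the aggregator. Weighting by sample count keeps larger nodes from being underrepresented in the global model.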

Authors

Milad Mahdudi

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, IRAN

Pooya Jamshidi

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, IRAN

Shahpour Rahmani

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, IRAN

Nasser Yazdani

School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, IRAN