Resource Optimization in Large Language Model Deployment Using Reinforcement Learning and Adaptive Software Engineering

Publish Year: 1404
Document type: Conference paper
Language: English

This paper is 5 pages long and available for download in PDF format.

National scientific document ID:

ICIRES21_022

Indexing date: 19 Mordad 1404

Abstract:

Large Language Models (LLMs) are extremely resource-intensive to deploy, demanding high memory and compute. Static provisioning often leads to waste or unmet demand. We propose a conceptual framework that uses reinforcement learning (RL) and self-adaptive software engineering to optimize resource use in LLM deployments. An RL agent monitors system metrics (throughput, latency, GPU/CPU utilization) and takes actions such as scaling instances, adjusting model precision, or modifying batch sizes. The system employs a Monitor-Analyze-Plan-Execute (MAPE-K) loop where dynamic configuration parameters are tuned online to maximize throughput and minimize cost. We illustrate the approach with examples: RL-driven autoscaling (showing ~40–50% higher GPU utilization) and adaptive inference optimizations like key-value caching (up to 4× speedup). Real-world LLM deployments (cloud services and edge settings) exhibit highly variable workloads; our framework adapts to these changes. Experiments and industry reports show that RL-based adaptation can significantly improve resource efficiency and performance.
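The abstract's RL-driven autoscaling idea can be sketched as a small MAPE-K-style control loop. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a tabular Q-learning agent, a toy environment where GPU utilization is simply demand divided by replica count, and invented thresholds and reward terms (the 40–80% utilization band and the per-replica cost penalty are assumptions for illustration).

```python
import random

ACTIONS = ("scale_down", "hold", "scale_up")

def discretize(util):
    """Monitor/Analyze: map GPU utilization in [0, 1] to a coarse state."""
    if util < 0.4:
        return "low"
    if util < 0.8:
        return "ok"
    return "high"

class QLearningAutoscaler:
    """Plan step backed by a Q-table (the shared Knowledge in MAPE-K)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy selection over learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

def simulate(agent, steps=500, demand=4.0):
    """Toy environment: utilization = demand / replicas. Reward favors
    the 40-80% utilization band and penalizes over-provisioning."""
    replicas = 1
    for _ in range(steps):
        util = min(demand / replicas, 1.0)
        state = discretize(util)                  # Monitor + Analyze
        action = agent.act(state)                 # Plan
        if action == "scale_up":                  # Execute
            replicas = min(replicas + 1, 16)
        elif action == "scale_down":
            replicas = max(replicas - 1, 1)
        new_util = min(demand / replicas, 1.0)
        reward = 1.0 if 0.4 <= new_util < 0.8 else -1.0
        reward -= 0.05 * replicas                 # cloud-cost penalty
        agent.learn(state, action, reward, discretize(new_util))
    return replicas

if __name__ == "__main__":
    random.seed(0)
    agent = QLearningAutoscaler()
    print("final replicas:", simulate(agent))
```

In a real deployment the same loop shape applies, but the state would come from live telemetry (latency, queue depth, GPU utilization) and the actions would call a cluster API; the other actions the paper mentions, such as changing model precision or batch size, would simply extend the action set.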

Authors

Parvaneh Asghari

Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran

Alireza Rahimipour Anaraki

Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran