Resource Optimization in Large Language Model Deployment Using Reinforcement Learning and Adaptive Software Engineering

Publish Year: 1404
Document type: Conference paper
Language: English

This paper is 5 pages long and available for download in PDF format.

National scientific document ID:

ICIRES21_022

Indexing date: 19 Mordad 1404

Abstract:

Large Language Models (LLMs) are extremely resource-intensive to deploy, demanding high memory and compute. Static provisioning often leads to waste or unmet demand. We propose a conceptual framework that uses reinforcement learning (RL) and self-adaptive software engineering to optimize resource use in LLM deployments. An RL agent monitors system metrics (throughput, latency, GPU/CPU utilization) and takes actions such as scaling instances, adjusting model precision, or modifying batch sizes. The system employs a Monitor-Analyze-Plan-Execute (MAPE-K) loop where dynamic configuration parameters are tuned online to maximize throughput and minimize cost. We illustrate the approach with examples: RL-driven autoscaling (showing ~40–50% higher GPU utilization) and adaptive inference optimizations like key-value caching (up to 4× speedup). Real-world LLM deployments (cloud services and edge settings) exhibit highly variable workloads; our framework adapts to these changes. Experiments and industry reports show that RL-based adaptation can significantly improve resource efficiency and performance.
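The abstract's RL-driven autoscaling idea can be sketched as a small MAPE-K-style control loop. The following is a minimal illustrative sketch, not the authors' implementation: it assumes a tabular Q-learning agent, a toy environment where GPU utilization is simply demand divided by replica count, and invented thresholds and reward terms (the 40–80% utilization band and the per-replica cost penalty are assumptions for illustration).

```python
import random

ACTIONS = ("scale_down", "hold", "scale_up")

def discretize(util):
    """Monitor/Analyze: map GPU utilization in [0, 1] to a coarse state."""
    if util < 0.4:
        return "low"
    if util < 0.8:
        return "ok"
    return "high"

class QLearningAutoscaler:
    """Plan step backed by a Q-table (the shared Knowledge in MAPE-K)."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {}  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy selection over learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def learn(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

def simulate(agent, steps=500, demand=4.0):
    """Toy environment: utilization = demand / replicas. Reward favors
    the 40-80% utilization band and penalizes over-provisioning."""
    replicas = 1
    for _ in range(steps):
        util = min(demand / replicas, 1.0)
        state = discretize(util)                  # Monitor + Analyze
        action = agent.act(state)                 # Plan
        if action == "scale_up":                  # Execute
            replicas = min(replicas + 1, 16)
        elif action == "scale_down":
            replicas = max(replicas - 1, 1)
        new_util = min(demand / replicas, 1.0)
        reward = 1.0 if 0.4 <= new_util < 0.8 else -1.0
        reward -= 0.05 * replicas                 # cloud-cost penalty
        agent.learn(state, action, reward, discretize(new_util))
    return replicas

if __name__ == "__main__":
    random.seed(0)
    agent = QLearningAutoscaler()
    print("final replicas:", simulate(agent))
```

In a real deployment the same loop shape applies, but the state would come from live telemetry (latency, queue depth, GPU utilization) and the actions would call a cluster API; the other actions the paper mentions, such as changing model precision or batch size, would simply extend the action set.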

Authors

Parvaneh Asghari

Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran

Alireza Rahimipour Anaraki

Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran