Resource Optimization in Large Language Model Deployment Using Reinforcement Learning and Adaptive Software Engineering
Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English
This 5-page paper is available for download in PDF format.
National scientific document ID: ICIRES21_022
Indexing date: 19 Mordad 1404
Abstract:
Large Language Models (LLMs) are extremely resource-intensive to deploy, demanding substantial memory and compute. Static provisioning often leads to waste or unmet demand. We propose a conceptual framework that uses reinforcement learning (RL) and self-adaptive software engineering to optimize resource use in LLM deployments. An RL agent monitors system metrics (throughput, latency, GPU/CPU utilization) and takes actions such as scaling instances, adjusting model precision, or modifying batch sizes. The system employs a Monitor-Analyze-Plan-Execute loop over a shared Knowledge base (MAPE-K), in which dynamic configuration parameters are tuned online to maximize throughput and minimize cost. We illustrate the approach with examples: RL-driven autoscaling (showing roughly 40–50% higher GPU utilization) and adaptive inference optimizations such as key-value caching (up to 4× speedup). Real-world LLM deployments (cloud services and edge settings) exhibit highly variable workloads; our framework adapts to these changes. Experiments and industry reports show that RL-based adaptation can significantly improve resource efficiency and performance.
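The MAPE-K-with-RL idea from the abstract can be sketched in a few lines. The following is an illustrative toy, not the paper's implementation: the state discretization, action set, and reward shaping are all assumptions made for the example. A tabular RL agent (here, a simple contextual-bandit update) observes a coarse GPU-utilization state from the Monitor step and learns which scaling action to Execute.

```python
import random

# Hypothetical MAPE-K sketch: an RL agent maps observed GPU utilization
# to scaling actions. All names and the reward function are illustrative
# assumptions, not taken from the paper.
ACTIONS = ["scale_up", "scale_down", "hold"]
STATES = ["low_util", "ok_util", "high_util"]  # discretized GPU utilization

def monitor(gpu_util):
    """Monitor/Analyze: map a raw utilization reading (0..1) to a state."""
    if gpu_util < 0.4:
        return "low_util"
    if gpu_util > 0.85:
        return "high_util"
    return "ok_util"

def reward(state, action):
    """Illustrative reward: favor actions that push toward 'ok_util'."""
    good = {("low_util", "scale_down"),
            ("high_util", "scale_up"),
            ("ok_util", "hold")}
    return 1.0 if (state, action) in good else -1.0

def train(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    """Plan/Execute: learn a tabular Q-function with epsilon-greedy choice;
    the Q-table plays the role of the shared Knowledge base."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = monitor(rng.random())            # Monitor + Analyze
        if rng.random() < epsilon:               # explore
            action = rng.choice(ACTIONS)
        else:                                    # exploit current Knowledge
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        r = reward(state, action)                # Execute, observe outcome
        q[(state, action)] += alpha * (r - q[(state, action)])
    return q

q = train()
# Greedy policy extracted after training; it should recommend scaling up
# under high utilization and scaling down under low utilization.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(policy)
```

In a real deployment the reward would combine measured throughput, latency-SLO violations, and instance cost, and the action would call the platform's scaling API rather than update a toy table.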
Authors
Parvaneh Asghari
Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran
Alireza Rahimipour Anaraki
Department of Computer Engineering, CT.C., Islamic Azad University, Tehran, Iran