M.Tech Research Thesis Defense: CDS: 04 April 2024, “Intelligent Methods for Cloud Workload Orchestration”

When

4 Apr 24    
4:00 PM - 5:00 PM

Event Type

DEPARTMENT OF COMPUTATIONAL AND DATA SCIENCES

M.Tech Research Thesis Defense


Speaker : Mr. Prathamesh Saraf Vinayak

S.R. Number : 06-18-01-10-22-21-1-19717
Title : “Intelligent Methods for Cloud Workload Orchestration”
Research Supervisor : Dr. J. Lakshmi
Date & Time : April 04, 2024 (Thursday), 04:00 PM

Venue : The thesis defense will be held on Microsoft Teams.

Please click on the following link to join the Thesis Defense:

MS Teams link


ABSTRACT

Cloud workload orchestration is pivotal in optimizing the performance, resource utilization, and cost-effectiveness of applications in data centers. As businesses migrate their IT operations to the cloud, understanding the dynamics of cloud data centers has become indispensable. Two perspectives shape workload orchestration in data centers. The first is that of the cloud provider, whose goal is to provision as many applications as possible on the available resources while abiding by SLA constraints and increasing return on investment. The second is that of enterprises and individual customers, often referred to as end users, whose primary objective is to ensure application performance at a reduced deployment cost. Containerization has gained popularity for deploying applications on public clouds, where large enterprises manage numerous applications through thousands of containers placed onto Virtual Machines (VMs). While the need for cost-efficient placement in cloud data centers is undeniable, the complexity of achieving it should not be underestimated. This placement problem is usually modeled as a multi-dimensional Vector Bin-packing Problem (VBP). Solving VBP optimally is NP-hard, so practical solutions that must make real-time decisions rely on heuristics. This work explores the landscape of cloud data centers, emphasizing the significance of efficient bin packing in achieving optimal cost and resource utilization. Traditional methods, including heuristics and optimal algorithms, face limitations in handling continuous request arrivals and the dynamic nature of cloud workloads. Integer Linear Programming (ILP) can provide optimal solutions only for small problems with tens of requests, and even at such scales it may take minutes to hours to complete.
Moreover, optimal algorithms inherently demand perfect knowledge of all current and future requests to be placed within the bins, rendering them unsuitable for the dynamic and often unpredictable online placement scenarios prevalent in cloud setups.
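For readers less familiar with VBP, the sketch below illustrates the kind of online heuristic the abstract contrasts with ILP: first-fit placement of containers with (CPU, memory) demands onto VMs, deciding each request as it arrives. It is a generic textbook heuristic, not CARL and not necessarily one of the baselines evaluated in the thesis. The single VM type, its capacity `VM_CAPACITY`, its price `VM_COST`, and the names `VM` and `first_fit` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical single VM type: capacity per dimension (CPU cores, memory GiB).
VM_CAPACITY = (8.0, 32.0)
# Hypothetical price per VM-hour, for illustration only.
VM_COST = 0.40


@dataclass
class VM:
    # Remaining free capacity in each resource dimension.
    free: list = field(default_factory=lambda: list(VM_CAPACITY))

    def fits(self, demand):
        # A container fits only if EVERY dimension has enough room (vector packing).
        return all(d <= f for d, f in zip(demand, self.free))

    def place(self, demand):
        for i, d in enumerate(demand):
            self.free[i] -= d


def first_fit(requests):
    """Place (cpu, mem) requests online, in arrival order, on the first VM that fits.

    Returns the VM index chosen for each request and the number of VMs opened.
    """
    vms = []
    assignment = []
    for demand in requests:
        if any(d > c for d, c in zip(demand, VM_CAPACITY)):
            raise ValueError(f"request {demand} exceeds VM capacity")
        for idx, vm in enumerate(vms):
            if vm.fits(demand):
                vm.place(demand)
                assignment.append(idx)
                break
        else:
            # No open VM has room in all dimensions: provision a new one.
            vm = VM()
            vm.place(demand)
            vms.append(vm)
            assignment.append(len(vms) - 1)
    return assignment, len(vms)


if __name__ == "__main__":
    requests = [(4, 8), (2, 24), (4, 4), (2, 2), (6, 10)]
    assignment, n_vms = first_fit(requests)
    print(assignment, n_vms, n_vms * VM_COST)
```

Because first-fit commits to each request as it arrives, it needs no knowledge of future requests, unlike an offline ILP. The cost is that early decisions can strand capacity: in the example, the first VM ends up with 2 spare cores but no free memory, so those cores cannot be used and extra VMs are opened.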

To address these challenges, this work introduces CARL (Cost-optimized container placement using Adversarial Reinforcement Learning), a novel approach that solves VBP through Reinforcement Learning (RL) trained on an enterprise's historical container workload traces. CARL learns from a semi-optimal VBP solver while optimizing VM costs, and this work evaluates its effectiveness against traditional methods. Beyond CARL itself, the research provides insights into the advantages and disadvantages of heuristics, optimal algorithms, and learning approaches. We trained and evaluated CARL on workloads derived from realistic Google Cloud and Alibaba traces, placing 10,000 container requests onto more than 8,000 VMs. CARL is fast: for request sets of 124 containers per second, it makes placement decisions within 65 ms across thousands of potential VMs. It is also efficient, achieving up to 13.98% lower VM costs than baseline heuristics for larger traces. To push the boundaries further, we incorporate a Mixture of Experts (MoE) strategy into CARL, in which multiple experts help CARL learn a combination of the placement policies of different approaches. The MoE strategy enhances CARL’s adaptability to changes in workload distribution, maintaining competitive performance in scenarios with skewed resource needs or inter-arrival times.


ALL ARE WELCOME