Loading web-font TeX/Main/Regular
Learning-Based Two-Tiered Online Optimization of Region-Wide Datacenter Resource Allocation | IEEE Journals & Magazine | IEEE Xplore

Learning-Based Two-Tiered Online Optimization of Region-Wide Datacenter Resource Allocation


Abstract:

Online optimization of resource management for large-scale data centers and infrastructures to meet dynamic capacity reservation demands and various practical constraints...Show More

Abstract:

Online optimization of resource management for large-scale data centers and infrastructures to meet dynamic capacity reservation demands and various practical constraints (e.g., feasibility and robustness) is a very challenging problem. Mixed Integer Programming (MIP) approaches suffer from recognized limitations in such a dynamic environment, while learning-based approaches may face with prohibitively large state/action spaces. To this end, this paper presents a novel two-tiered online optimization to enable a learning-based Resource Allowance System (RAS). To solve optimal server-to-reservation assignment in RAS in an online fashion, the proposed solution leverages a reinforcement learning (RL) agent to make high-level decisions, e.g., how much resource to select from the Main Switch Boards (MSBs), and then a low-level Mixed Integer Linear Programming (MILP) solver to generate the local server-to-reservation mapping, conditioned on the RL decisions. We take into account fault tolerance, server movement minimization, and network affinity requirements and apply the proposed solution to large-scale RAS problems. To provide interpretability, we further train a decision tree model to explain the learned policies and to prune unreasonable corner cases at the low-level MILP solver, resulting in further performance improvement. Extensive evaluations show that our two-tiered solution outperforms baselines such as pure MIP solver by over 15% while delivering 100\times speedup in computation.
Published in: IEEE Transactions on Network and Service Management ( Volume: 22, Issue: 1, February 2025)
Page(s): 572 - 581
Date of Publication: 21 October 2024

ISSN Information:

Funding Agency:


Contact IEEE to Subscribe

References

References is not available for this document.