I. Introduction
The growing demand for software-as-a-service has increased the pressure on warehouse infrastructures to support more robust cloud services, which host applications from many domains, such as machine learning, biomedicine, and video/audio processing. In such clouds, the workloads are highly heterogeneous and often originate from requests issued by different clients. The main challenge is to serve these requests with the lowest possible latency while using the available resources efficiently and exploiting Request-Level Parallelism (RLP).