Optimal Resource Allocation Using Genetic Algorithm in Container-Based Heterogeneous Cloud

This paper tackles the complex problem of optimizing resource configuration for microservice management in heterogeneous cloud environments. To address this challenge, an enhanced framework, the multi-objective microservice allocation (MOMA) algorithm, is developed, formulating cloud microservice resource management as a constrained optimization problem guided by resource utilization and network communication overhead, two key factors in microservice resource allocation. The proposed framework simplifies the deployment of cloud services and streamlines workload monitoring and analysis within a diverse cloud system. A comprehensive comparison is made between the proposed algorithm and existing algorithms on real-world datasets, with a focus on resource balancing, network overhead, and network reliability. Experimental results reveal that the proposed algorithm significantly enhances resource utilization, reduces network transmission overhead, and improves reliability.


I. INTRODUCTION
In recent years, microservices architecture has risen as a way to break large-scale applications down into smaller independent components, with microservice applications invoking numerous internal microservices to construct responses. Containers are a typical technology that meets the requirements of a microservices architecture: by using containers, developers can focus on service development through operating-system-level virtualization. Docker, one of the most successful container frameworks [1], provides independent execution environments with isolated file systems, portability, and superior resource utilization compared to virtual machines [2], and has therefore become an important technology for current microservices. Examples of container orchestration platforms that offer automated deployment include Docker Swarm, Apache Mesos, and Kubernetes [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Huaqing Li .
Despite the rapid technological development of microservices architecture, many tasks remain to be tackled. For example, the default resource allocation method in Kubernetes only targets physical resource utilization [4] and does not address the costs and reliability of network transmission. Moreover, reliability is a critical issue in cloud service environments, as specifically addressed in [5] and [6]. Given that existing approaches mainly operate across homogeneous clouds, handling resource heterogeneity within multi-cluster environments may pose even more problems. Furthermore, given the characteristics of microservices, monitoring all service components and their interactions can be complex. Monitoring metrics, including resource metrics (e.g., CPU and memory utilization) and platform metrics (e.g., number of requests per second, distribution of time required for each request, and average execution time of queries), may be built for individual services, providing visibility into the distributed system for evaluating the application's performance [7]. In this work, we use resource utilization as the optimization goal. Thus, resource metrics, namely CPU utilization and memory consumption, are applied to effectively manage resources and further enhance both performance and service reliability.
Since container resource allocation is recognized as an NP-hard problem [8], finding polynomial-time algorithms remains an open issue. Many researchers therefore turn to meta-heuristic algorithms to obtain near-optimal solutions for these resource allocation problems. Each heuristic algorithm possesses its own strengths and weaknesses, and contextual analysis can determine the most suitable algorithm for a given scenario [9]. For instance, [10] compares different algorithms to propose the most fitting one for a specific context and acquires results through iterative experimentation. Genetic algorithms (GA) are considered effective in addressing such problems [11], and the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) is one of the most widely used genetic algorithms [12]. Accordingly, further research is needed to address these concerns and advance the field.
In this study, an elitism-based genetic algorithm, the multi-objective microservice allocation (MOMA) algorithm, is developed to determine the optimal placement of microservices within the cluster, taking into account the current state of the cluster and the microservices themselves, where a cluster is a group of servers or nodes that participate in workload management. The proposed framework aims to facilitate the placement of workloads into a Kubernetes cluster, which may consist of a PC and physical computer systems (e.g., a Raspberry Pi and an NVIDIA Jetson Nano). This operation involves considering factors such as resource balancing, inter-dependencies among microservices, network characteristics, and performance requirements. By analyzing these factors, the system can devise an effective distribution strategy that ensures efficient resource utilization for the microservices. Thus, the goal is to find the best possible arrangement that maximizes overall system performance and minimizes potential bottlenecks or resource constraints.
The main contributions and features of this study are as follows: 1) This work addresses the heterogeneity challenge of microservice resource allocation by developing an enhanced GA with two objective models, considering resource utilization and network communication overhead, together with empirical parameter settings derived from real-world data, for optimizing resource management in heterogeneous cloud environments.
2) The proposed framework tackles the heterogeneity aspects of cluster monitoring and resource selection, simplifies the deployment of cloud services, and streamlines workload management and analysis within a diverse cloud system. 3) A comprehensive evaluation of the proposed framework is presented on real-world datasets. The experimental results show that the proposed framework outperforms existing methods in terms of resource balancing, network overhead, and network reliability. The organization of this paper is as follows: Section II reviews related work on cluster resource allocation and multi-objective optimization. Section III presents the proposed system architecture and workload analysis framework. Section IV describes a customized multi-objective optimization model for evaluating heterogeneous system performance. Section V details the design of the proposed MOMA algorithm. Sections VI and VII describe the experimental settings and present a performance comparison between the proposed scheme and existing works. Finally, Section VIII draws conclusions and outlines future research directions.

II. RELATED WORKS
This section reviews related works about cluster resource allocation and multi-objective optimization in various cloud environments.

B. CLOUD ENVIRONMENTS
1) SINGLE-CLOUD SCENARIO
A large portion of the literature focuses on issues related to single-cloud scenarios. Fu et al. [21] use a PSO algorithm to allocate resources and improve efficiency in a single-cloud environment. Abdallah et al. [22] utilize a simulated annealing (SA) algorithm and tabu search to emphasize fair allocation procedures for multiple resource types. Liu et al. [30] propose a multi-objective container scheduling algorithm that considers five criteria to select the most suitable node for deployment. Kaewkasi et al. [31] develop a new Docker scheduler and use the ACO algorithm to balance resources. Gupta et al. [32] implement enhanced algorithms (e.g., Max-Min and Greedy) for load balancing in cloud environments. Qiu et al. [33], Guo et al. [34], Li [35], and Ali [36] adopt machine learning models (e.g., reinforcement learning and transfer learning) as the main algorithms for service deployment, microservice selection, delay optimization, and deployment cost reduction with a fixed service set. However, due to the issue of model retraining, these algorithms may not be suitable for a microservices system that must accommodate new services within a short execution time.

2) MULTI-CLOUD SCENARIO
In the context of multi-cloud environments, [24], [25], [26] employ NSGA-II to address application availability and energy consumption requirements in container-based clouds. Han et al. [37] propose a Greedy algorithm for optimizing microservice placement across multiple Kubernetes clusters. They also introduce an empirical analysis framework to provide systematic and reliable measurement data. Frincu et al. [38] utilize a GA algorithm to achieve high availability and fault tolerance for applications. In [39], the monitoring of multi-cloud services is discussed and implemented via Prometheus and Grafana. Moreover, Lee et al. [40] propose a hierarchical monitoring framework for multi-cloud environments that takes workloads into account but does not specifically address heterogeneity.

3) HETEROGENEOUS CLOUD SCENARIO
Instead of considering single and multi-cloud scenarios, Rocha et al. [41] address the importance of heterogeneous clusters in the cloud and propose an algorithm for accessing heterogeneous resources, which significantly reduces energy consumption and runtime. Ali et al. [42] present an improved NSGA-II algorithm for minimizing range and total cost in heterogeneous environments. Hasan [43] introduces a resource monitoring framework for heterogeneous clusters without detailed consideration of workloads.
In this work, we further extend the scope of resource management to a scenario of multiple heterogeneous clouds, focusing on microservice placement and resource allocation. We summarize these findings in Table 1, including different cloud architectures, resource management approaches, objective models, and corresponding algorithms. Here we examine the problem background and the contributions of the proposed framework from two different perspectives.
From the algorithm perspective, the default algorithm in Kubernetes considers too few factors, focusing solely on resource allocation balance without applying optimization strategies. The Greedy algorithm (e.g., [37]) simplifies calculations and achieves a locally optimal solution, which may fail to capture the multifaceted considerations that multi-cluster cloud algorithms must address. Moreover, the existing GA-based algorithms (e.g., [24], [25], [26], [41], [42]) only focus on resource management in multiple clouds or heterogeneous clusters in the cloud.
Therefore, to address the lack of heterogeneity-aware requirements and optimization strategies for container management, the proposed enhanced framework allows for a broader exploration of resource management considerations. This study thus focuses on this gap and proposes a microservice placement and workflow scheduling approach, along with resource allocation strategies, tailored for heterogeneous environments. The proposed approach is further analyzed based on two pivotal factors, resource utilization and network communication overhead, to tackle these aspects of heterogeneity in cluster monitoring and resource selection.

III. FRAMEWORK
A. SYSTEM MODEL
Referring to the microservice placement framework in [37], we propose a novel system framework. Derived from empirical analysis, the proposed framework provides several improvements with respect to throughput, latency, and distribution strategies for microservices, which are depicted in Figure 1. The proposed framework is divided into four main components: the Monitoring Unit, the Data Analysis Unit, the Optimization Algorithm Placement Unit, and the Kubernetes Management Unit. Through the interactions of these units, the framework facilitates the placement of workloads into the Kubernetes cluster. The four components are described as follows:

1) MONITORING UNIT
It monitors the resource usage of the cluster, keeping track of metrics like CPU utilization and memory consumption. This information helps in managing and optimizing resource allocation. Moreover, the system collects performance data of microservices, including metrics such as latency and throughput, which enables performance evaluation and identification of bottlenecks for further optimization. By monitoring these aspects, the system helps maintain the stability, performance, and overall health of the cluster environment.

2) DATA ANALYSIS UNIT
The collected monitoring data undergoes comprehensive analysis to evaluate the state of the cluster and assess the performance of microservices. This analysis involves examining various metrics, such as resource utilization, response time, and throughput. By analyzing this data, valuable insights can be gained regarding the efficiency and effectiveness of the cluster and its microservices. The analyzed data is then stored for further use by other components or units within the system, enabling informed decision-making, optimization of resource allocation, and performance enhancements.

3) PLACEMENT OPTIMIZATION UNIT
The optimization algorithms are utilized to determine the optimal placement of microservices within the cluster, taking into account the current state of the cluster and the microservices themselves. This involves considering factors such as resource balancing, inter-dependencies among microservices, and performance requirements. By analyzing these factors, the system can devise an effective distribution strategy that ensures efficient resource utilization for the microservices. The goal is to find the best possible arrangement that maximizes the overall system performance and minimizes any potential bottlenecks or resource constraints.

4) KUBERNETES MANAGEMENT UNIT
The resource management system interacts with the cluster through the Kubernetes API, enabling it to perform various tasks. By leveraging the Kubernetes infrastructure, the system ensures efficient deployment of the workload by assigning the microservices to the appropriate nodes within the cluster.

B. FRAMEWORK WORKFLOW
Figure 2 illustrates the overall workflow of the three-stage framework and explains how it communicates with users and the cluster. In Stage one, the user sends a request that is received by the Kubernetes Management Unit within the framework, where the Kubernetes Management Unit is responsible for deploying the application to the selected cluster. Next, the Monitoring Unit is utilized to monitor the workload and gather information about resource utilization and microservices performance within the cluster. The collected data are then stored using a persistent volume.
In Stage two, the stored measurement data are passed to the Data Analysis Unit for analyzing the captured values and deriving stable workload results, which are then stored in the Analysis Database. This procedure is repeated for each microservice within the application, ensuring completion for all microservices. In Stage three, after organizing the analyzed data in the database, they are passed to the Placement Optimization Unit, which executes the designed algorithm for determining an approximate optimal placement strategy. The results of the algorithm execution provide insights into the placement of microservices. Subsequently, the Kubernetes Management Unit is used to deploy the microservices onto the cluster. Finally, the results can be applied for strengthening the monitoring and evaluation of microservice management.

C. APPLICATION TYPE
Since cloud-native technologies empower organizations to build and run scalable applications in modern, dynamic environments, including public, private, and hybrid clouds, in this work we consider cloud services that can operate on both edges and clouds. We establish three cloud environment services to represent the heterogeneous environment, as illustrated in Figure 3.

1) KUBEFLOW APPLICATION
Kubeflow is a model development platform built on top of Kubernetes, which provides all the necessary tools for developing models and leverages Kubernetes to achieve flexible control over resources and networking. During the execution of the sample program, we utilize the Chicago Taxi Trips dataset, which is included in Kubeflow's built-in test datasets. We adopt the XGBoost demo example in Kubeflow for training. The process involves various units such as data preprocessing, model training, prediction, data normalization, and data validation.

2) SOCK SHOP APPLICATION
Sock Shop is a well-known microservices application, widely used for demonstration and testing of microservice environments such as Kubernetes. It is built using Spring Boot, Go kit, and Node.js, and is packaged in Docker containers. We use Locust to conduct HTTP workload testing and simulate the performance of the store application under real-world usage scenarios.

3) EDGEX FOUNDRY APPLICATION
EdgeX Foundry provides an open-source platform for industrial-grade edge computing in the Internet of Things (IoT) domain. We use a Raspberry Pi 4 as the edge device, combined with a DHT-11 sensor, to collect temperature and humidity data as an example IoT service.
The three applications above serve as deployable applications in a heterogeneous hybrid cloud environment, where such components are often used as reference points for testing. Integrating the proposed framework with these applications opens up greater possibilities for the future adoption and promotion of cloud-native services.

IV. THE OPTIMIZATION MODEL
This section provides an overview of the optimization model, which integrates the objectives of establishing load-balanced cluster environments and reducing network transmission overhead for reliable microservice communication, as specified by the problem model. The notations and descriptions are summarized in Table 2.

A. PROBLEM MODEL
1) OBJECTIVE 1: MAXIMUM RESOURCE UTILIZATION
The problem model aims to balance the resources in multiple clusters, which is referred to as the multi-resource load balancing problem. To tackle this, we adopt the "server's dominant load" method to maximize the load of all resource types [44]. To avoid significant disparities, the proportional values between loads in each cluster and its associated nodes are calculated and normalized with the standard deviations σ_{ℓ,1} and σ_{ℓ,2} as scalar factors for the memory and CPU resources of cluster ℓ, respectively. Compared to the similar model in [45], the proposed load balancing model achieves a more even distribution of resources in heterogeneous multi-cluster systems. In the resulting objective, i is the microservice index, ℓ is the cluster index, j is the node index, c_i represents the number of Kubernetes clusters of the ith microservice, n_ℓ represents the total number of physical nodes in the ℓth cluster, and m is the total number of microservices comprising the application. Note that mem_i^req, mem_{ℓ,j}^res, cpu_i^req, and cpu_{ℓ,j}^res are resource elements as described in Table 2.
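As an illustrative sketch only (not necessarily the paper's exact formulation), a dominant-load objective of this form, assuming a binary placement indicator x_{i,ℓ,j} that equals 1 when microservice i runs on node j of cluster ℓ (a symbol introduced here for illustration), can be written as:

```latex
\min \; \max_{1 \le \ell \le c_i,\; 1 \le j \le n_\ell}
\left\{
  \sigma_{\ell,1} \sum_{i=1}^{m} x_{i,\ell,j}\,
    \frac{mem^{req}_{i}}{mem^{res}_{\ell,j}},\;\;
  \sigma_{\ell,2} \sum_{i=1}^{m} x_{i,\ell,j}\,
    \frac{cpu^{req}_{i}}{cpu^{res}_{\ell,j}}
\right\}
```

Minimizing this maximum normalized load pushes the memory and CPU utilization toward an even distribution across all nodes of all clusters.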

2) OBJECTIVE 2: REDUCING COMMUNICATION OVERHEADS
To improve data availability in edge computing, the importance of retransmission mechanisms in the context of IoT and edge computing has been emphasized [46], [47]. Therefore, we take this aspect into account and design a model that focuses on the impact of the retransmission mechanism in heterogeneous clouds, where Fail, Distance, and Interaction are network elements as described in Table 2. Here Fail represents the probability or frequency at which a node may experience a failure or become unavailable to the microservice. Distance represents the network distance between two nodes used by the microservice. Interaction represents the total volume of data transmission for sending and receiving operations between two nodes used by the microservice.
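As an illustrative sketch only, a retransmission-aware overhead objective of this kind might weight each pairwise interaction by network distance and by the expected number of transmissions under retransmission, approximated by 1/(1 − Fail); here ν(i) denotes the node hosting microservice i, a notation introduced purely for illustration:

```latex
\min \; \sum_{i=1}^{m} \sum_{i'=1}^{m}
  \frac{Interaction_{i,i'} \cdot Distance_{\nu(i),\,\nu(i')}}
       {1 - Fail_{\nu(i')}}
```

Under such a cost, placements that co-locate heavily interacting microservices on nearby, reliable nodes reduce the total communication overhead.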

B. MULTI-OBJECTIVE MICROSERVICE ALLOCATION MODEL
Based on the aforementioned problem model, an allocation model aiming at optimizing two objectives is designed to fulfill the requirements through the following constraints.
Equations (3) and (4) represent the two optimization objectives: minimizing the maximum resource utilization on physical nodes across multiple clusters and minimizing the network transmission overhead for reliable microservice communication. Equations (5) and (6) represent the constraints of the model, which ensure that the resources of each microservice can be fully allocated on a single node.
As is well known, exact solutions to multi-objective problems are difficult to find, and searching for near-optimal approximate solutions is often required. With their parallel search capability and scalability for efficiently solving complex and challenging problems, GA algorithms have been widely used. However, GA algorithms may easily get trapped in local optima. Therefore, based on the NSGA-II algorithm, we enhance its adaptability to this domain-specific problem via the proposed multi-objective microservice allocation (MOMA) algorithm, as detailed in Section V.

V. MOMA ALGORITHM DESIGN
This section explains the design principles of the proposed MOMA algorithm, which aims to provide better resource allocation for a container-based heterogeneous cloud. The proposed MOMA algorithm defines the chromosome representation, the crossover operators, the mutation operator methods, the parameter settings, and the algorithm flow. Since the quality of a GA algorithm is greatly influenced by the definition of each component, in the following subsections the overall algorithm structure is described, and the operation procedures are summarized in Algorithm 1.

A. REPRESENTATION
When using a GA algorithm to solve a problem, it is essential to analyze the problem to determine the decision variables (i.e., the genes) [48]. After encoding the genes through a series of processes, we refer to them as chromosomes, which typically consist of individual genes. To manipulate and optimize the chromosomes, the binary encoding scheme [49] is used to ultimately find optimal or near-optimal solutions. Thus, we define a microservice list based on different allocations, which represents the assignment of containers to the various workers in a cluster that implement the microservices. The workers consist of a heterogeneous combination of general-purpose computers and edge devices. Figure 4 shows a typical example of such an allocation list. Note that ms_i represents the ith microservice and 1-2 represents performing the microservice via node 2 in cluster 1.
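As a minimal sketch of this representation (the cluster topology and all names below are illustrative assumptions, not part of the framework), an allocation list can be generated as follows:

```python
import random

# Hypothetical cluster topology: cluster index -> number of worker nodes.
# (The real framework obtains this from the Monitoring Unit.)
CLUSTER_NODES = {1: 2, 2: 2, 3: 2}

def random_chromosome(num_microservices, cluster_nodes=CLUSTER_NODES):
    """Encode one allocation as a list of (cluster, node) genes.

    Gene i answers: on which cluster/node does microservice ms_i run?
    A gene (1, 2) means "run this microservice on node 2 of cluster 1",
    matching the "1-2" notation of Figure 4.
    """
    chromosome = []
    for _ in range(num_microservices):
        cluster = random.choice(list(cluster_nodes))
        node = random.randint(1, cluster_nodes[cluster])
        chromosome.append((cluster, node))
    return chromosome

allocation = random_chromosome(5)
```

Each chromosome is thus a variable-content list whose length equals the number of microservices, over which crossover and mutation operate.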

B. CROSSOVER
During the crossover process, genes are randomly paired from replicated genetic material using pairing methods including single-point crossover, two-point crossover, uniform crossover, mask-arithmetical crossover, and simulated binary crossover (SBX) [50], [51]. However, not every individual is required to mate in each generation, which leads to the introduction of a crossover probability, Prob_crossover.
The SBX operator primarily aims to emulate the characteristics of single-point crossover in binary-encoded chromosomes. When applying SBX to two parent individuals (P_1 and P_2), two offspring individuals (Q_1 and Q_2) are generated as Q_1 = 0.5((P_1 + P_2) − β(P_2 − P_1)), Q_2 = 0.5((P_1 + P_2) + β(P_2 − P_1)), (7) where β depends on a random number u drawn uniformly from [0, 1], as shown by equation (8): β = (2u)^(1/(η+1)) if u ≤ 0.5, and β = (1/(2(1 − u)))^(1/(η+1)) otherwise, (8) where η is a constant representing the distribution index. The value of η is set to the commonly used value of 10; the larger the value of η, the more the offspring resemble their parents. Given two parent individuals (P_1 and P_2) and referring to the SBX operator in equations (7) and (8), Figure 5 shows the generation process of two offspring individuals (Q_1 and Q_2).
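A minimal sketch of the SBX operator of equation (7) on a single real-valued gene pair, with the spread factor β computed from u and η in the standard way (η = 10 as in the text):

```python
import random

def sbx_pair(p1, p2, eta=10.0):
    """Simulated binary crossover on one real-valued gene pair.

    Implements Q1 = 0.5*((P1 + P2) - beta*(P2 - P1)) and
    Q2 = 0.5*((P1 + P2) + beta*(P2 - P1)), with the spread
    factor beta derived from a uniform random number u.
    """
    u = random.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    q1 = 0.5 * ((p1 + p2) - beta * (p2 - p1))
    q2 = 0.5 * ((p1 + p2) + beta * (p2 - p1))
    return q1, q2
```

Note that Q_1 + Q_2 = P_1 + P_2, so SBX preserves the parents' mean while β controls how far the offspring spread around it.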

C. MUTATION
Mutation is a method of changing the genes of offspring with a certain probability, both to prevent falling into a local optimal solution and to maintain genetic diversity. Referring to [48], displacement-based operators, namely the insertion and deletion operators, are applied to newly generated individuals.
To maintain genetic diversity, the insertion and deletion operations act as point mutations, analogous to inserting or deleting a gene in a DNA sequence. Based on the chromosome representation in Figure 4, the insertion mutation adds variation by randomly adding values to the allocation list. As shown in Figure 6(a), the microservice ms_7 is newly arranged to be performed via node 2 in cluster 2. The deletion mutation randomly deletes values from the allocation list; Figure 6(b) depicts the deletion of the task performing microservice ms_1 via node 2 in cluster 2. Accordingly, as shown in Figure 6, the allocation list can be scaled, making mutations more flexible.
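Under the same illustrative (cluster, node) list representation (all names below are our own assumptions), the insertion and deletion mutations can be sketched as:

```python
import random

def insertion_mutation(chromosome, cluster_nodes):
    """Insert a randomly generated (cluster, node) gene at a random position,
    e.g. newly scheduling ms_7 on node 2 of cluster 2 (cf. Figure 6(a))."""
    cluster = random.choice(list(cluster_nodes))
    node = random.randint(1, cluster_nodes[cluster])
    pos = random.randint(0, len(chromosome))
    return chromosome[:pos] + [(cluster, node)] + chromosome[pos:]

def deletion_mutation(chromosome):
    """Delete one randomly chosen gene from the allocation list
    (cf. Figure 6(b)); empty lists are returned unchanged."""
    if not chromosome:
        return chromosome
    pos = random.randrange(len(chromosome))
    return chromosome[:pos] + chromosome[pos + 1:]
```

Both operators return a new list rather than mutating in place, so parents survive intact for the elitist selection step.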

D. ALGORITHM PARAMETERS
Parameter configuration is a crucial aspect of a GA algorithm. Referring to the empirical parameter settings in existing studies, in this work we derive the parameter settings through empirical analysis, including population size, offspring size, crossover rate, mutation rate, mutation type, and termination criterion. When adjusting the population size and offspring size, the number of individuals ranged from 50 to 500 in increments of 50. Through experimentation, a size of 200 is determined to be the optimal value. For the crossover rate, the suggested setting of 0.5 is adopted. In terms of mutation, two types of operations are considered, decrease (deletion) mutation and increase (insertion) mutation, with probabilities set at 0.5 each. This configuration is found to be suitable for our experimental environment. As for the termination criterion, experiments are conducted at intervals of 1000, and a termination criterion of 25000 is observed to provide the best parameter configuration. Table 3 summarizes the parameters derived from all the evaluated experiments.
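Collecting the empirically derived settings reported above into one place (the key names are our own, and the offspring size is assumed to follow the same 200-individual optimum found when tuning the population size):

```python
# Empirical MOMA parameter settings reported in the text (cf. Table 3).
MOMA_PARAMS = {
    "population_size": 200,          # tuned over 50..500 in steps of 50
    "offspring_size": 200,
    "crossover_rate": 0.5,           # suggested SBX crossover probability
    "increase_mutation_rate": 0.5,   # insertion mutation
    "decrease_mutation_rate": 0.5,   # deletion mutation
    "sbx_distribution_index": 10,    # eta used by the SBX operator
    "termination_criterion": 25000,  # tuned at intervals of 1000
}
```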

E. ALGORITHM DESIGN
In the algorithm workflow, the parameters (i.e., cluster information, microservice information, and the predefined problem model and algorithm parameters) are fed into the system. The workflow and the inputs/outputs of the proposed MOMA algorithm are briefly described in Algorithm 1. In Step 1, we initialize the individuals by creating a population P. In Step 2, the algorithm's operations are executed until the termination criterion is reached, operating on the predefined population size. The binary tournament selection method is applied to select two parents, and offspring are then generated by the SBX method. We then determine whether mutation is required and, if so, apply mutation to the two offspring. After performing the mutation, we combine the offspring with the parents to form a new generation, sort the parents and offspring, and calculate the crowding distance to measure how close an individual is to its neighbors. Accordingly, we set the next generation of individuals P. In Step 3, the final output is the ultimate result (i.e., the Pareto front).
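As one concrete piece of this loop, the crowding distance used to compare individuals within the same non-dominated rank can be sketched as follows (a generic NSGA-II-style computation, not the paper's exact code):

```python
def crowding_distance(front):
    """Crowding distance for a list of objective vectors (NSGA-II style).

    Boundary solutions receive infinite distance so they are always kept;
    interior solutions accumulate the normalized gap between their two
    neighbors along every objective. Larger distance = less crowded.
    """
    n = len(front)
    if n == 0:
        return []
    num_obj = len(front[0])
    dist = [0.0] * n
    for m in range(num_obj):
        order = sorted(range(n), key=lambda k: front[k][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if f_max == f_min:
            continue  # objective is constant on this front
        for rank in range(1, n - 1):
            lo = front[order[rank - 1]][m]
            hi = front[order[rank + 1]][m]
            dist[order[rank]] += (hi - lo) / (f_max - f_min)
    return dist
```

Sorting the combined parent and offspring population by non-domination rank and then by this distance yields the elitist selection of the next generation P.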

VI. EXPERIMENTAL SETTINGS AND ANALYSIS
This section describes the experimental environment and examines the workload distribution for each application. Figure 7 depicts the cluster structure used for generating experimental data on the heterogeneous cluster under workload, where a private repository is set up to store images, allowing us to easily perform local pulls.

A. EXPERIMENTAL ENVIRONMENT
To set up the cluster nodes, we use Ubuntu 20.04 and partition the disk adequately to meet the requirements of the cloud architecture, including backup and mount areas. We then install the NVIDIA driver, CUDA, and cuDNN on the system. Once these steps are completed, we proceed to combine the heterogeneous systems by using Kubeadm. We utilize Helm to install Prometheus and Grafana (Figure 8) for monitoring the cluster and visualizing its current status. Additionally, we install DCGM to specifically monitor GPU resource usage.
Referring to Figure 7, three different types of clusters are established. First, for the primary node of each cluster, we use the local PC to create virtual machines (VMs) for control purposes. The PC specifications are as follows: an Intel Core i9-10900KF CPU @ 3.7GHz with 20 cores, 256GB of Micron Crucial PRO DDR4 2666MHz RAM, and an NVIDIA RTX 3080 GPU. The VM specifications for the primaries follow the official recommendation of a minimum of 2 cores and 4GB RAM. In the first cluster, the configuration of the first node consists of an Intel Xeon(R) Silver 4110 CPU @ 2.1GHz with 32 cores (2 CPUs), 64GB of Samsung DDR4 2933MHz RAM (4 modules), and 2 NVIDIA GeForce RTX 2080 Ti GPUs. Additionally, we add a Raspberry Pi to create a different architecture. In the second cluster, the configuration of the first node includes an Intel i7-12700 CPU @ 4.9GHz with 24 cores, 32GB of Micron Crucial DDR4 3200MHz RAM (2 modules), and an NVIDIA GeForce RTX 2060 GPU. It also includes a Raspberry Pi. Finally, in the third cluster, the configuration of the first node is the same as in the second cluster, with an Intel i7-12700 CPU @ 4.9GHz with 24 cores and 32GB of Micron Crucial DDR4 3200MHz RAM (2 modules). Instead of adding a Raspberry Pi, an NVIDIA Jetson Nano 4GB is applied here to generate architectural diversity.
After setting up the clusters, the proposed framework is utilized to deploy workloads across the three heterogeneous cloud applications, which involves selecting the appropriate deployment strategies and configurations tailored to each application's specific requirements and characteristics. By leveraging our framework, we can effectively distribute and manage the workloads, optimizing performance and resource utilization across the heterogeneous cloud environment. This paper integrates the jMetal framework [52], a generic framework for metaheuristic multi-objective optimization, for implementing and validating the proposed algorithm. To better address the challenges of the multi-objective optimization problem, we utilize a modified version of the jMetalPy framework [53] for the algorithm development. To determine the quality of solutions, we employ the hypervolume metric, calculated using the pygmo framework [54], which is particularly advantageous for large-scale parallel environments. By evaluating the hypervolume values, we can assess the dispersion of solutions and ensure the retention of a superior Pareto front.
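For intuition, the hypervolume metric can be sketched for the two-objective minimization case as the area dominated by the front up to a reference point (a simplified stand-in for the pygmo computation, not its actual implementation):

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-objective minimization front w.r.t. a reference point.

    Sweeps the non-dominated points by increasing first objective and sums
    the rectangular strips they dominate below the reference point.
    """
    # Keep only points that strictly dominate the reference point.
    pts = sorted(p for p in front if p[0] < ref[0] and p[1] < ref[1])
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        if y < prev_y:  # skip points dominated within the front
            hv += (ref[0] - x) * (prev_y - y)
            prev_y = y
    return hv
```

A larger hypervolume indicates a front that is both closer to the true Pareto front and better spread, which is why it is used here to compare solution quality.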

B. EMPIRICAL ANALYSIS OF RESOURCE REQUIREMENT
This subsection explains the generation of measurement data, primarily within the Monitoring Unit. We utilize the Prometheus system to monitor nodes and display the results through Grafana. This setup enables a continuous integration and continuous deployment (CI/CD) process, allowing us to complete the entire data collection workflow in this manner. Figure 9 showcases the data results displayed in Grafana after Prometheus measures the metrics.
In the experiment, we use Locust to issue requests and distribute the workload for each application (Figure 10). Shell scripts are applied to send 1 to 10 user requests per second continuously for 450 seconds. Thus, the inter-arrival time of requests to each microservice follows a uniform distribution, which is applied for the empirical analysis of resource requirements in this work. Consequently, 20 iterations of the results are collected as the data for analysis. Note that here we only present the data related to the resources. As for the GPU part, Kubernetes does not provide specific deployment options for GPUs; therefore, we monitor GPU usage but do not take any specific actions regarding it.
Tables 4 and 5 present the resource requirements with respect to the number of users, ranging from 1 to 10. The CPU and Memory columns represent the analysis results obtained from monitoring and measurement, measured in cores and MBytes, respectively. The Network Interaction column represents the total amount of network communication (send/receive) per second.

VII. PERFORMANCE EVALUATION
To assess the effectiveness of the proposed resource allocation scheme, its performance is compared with that of three related algorithms (i.e., the Multiopt algorithm [24], the greedy-based heuristic algorithm [37], and the Kubernetes default algorithm [55]) in three distinct cloud environments with respect to resource utilization, network communication overhead, and reliability. The experiments are conducted using the data measured by the framework, with user requests ranging from 1 to 10 at an interval of 1.

A. SYSTEM PERFORMANCE
This subsection explores the characteristics of the proposed system. Assume the failure rate Fail of the multi-heterogeneous clusters is in a given range, namely Fail = [0.01, 0.03] [56]. In a multi-heterogeneous cluster, there may be variations in performance (e.g., reliability and failure rates) among different nodes, which are factors we need to consider in resource allocation and load management. Similarly, in the context of multi-heterogeneous networks, we set the distance in a given range, namely Distance = [1.0, 4.0]. Notice that the variation in distance is an important consideration for network communication and transmission overheads, especially in scenarios with diverse node characteristics and geographic locations.
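A minimal sketch of how these per-node attributes can be drawn for simulation is given below. The two ranges come from the system model above; the function name and the dictionary layout are illustrative only.

```python
import random

# Parameter ranges from the system model: per-node failure rate [56]
# and inter-cluster network distance.
FAIL_RANGE = (0.01, 0.03)
DISTANCE_RANGE = (1.0, 4.0)

def sample_node_attributes(n_nodes, seed=42):
    """Draw a failure rate and a distance for each node, uniformly
    within the ranges assumed by the system model."""
    rng = random.Random(seed)
    return [
        {"fail": rng.uniform(*FAIL_RANGE),
         "distance": rng.uniform(*DISTANCE_RANGE)}
        for _ in range(n_nodes)
    ]
```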
To further depict the performance efficiency, a test is conducted by varying the percentage of failed requests relative to the total number of user requests (i.e., 1%, 1.5%, 2%, 2.5%, and 3%, respectively), executed on the three different microservice applications. The results are shown in Figures 11(a), 11(b), and 11(c). It can be observed that the overall number of failed requests increases rapidly when the percentage of failed requests exceeds 2.5%, which effectively demonstrates the system usability with respect to the number of failed requests. Through this analysis, we can gain insights into the overall system performance and identify its condition, which may further enhance the system capabilities.

B. COMPARATIVE PERFORMANCE ANALYSIS
This subsection presents a comparative analysis of the effectiveness of the MOMA algorithm against the Multiopt algorithm [24], the greedy-based heuristic algorithm [37], and the Kubernetes default algorithm [55]. The key characteristics of these algorithms are outlined as follows.
For the Kubernetes default algorithm [55], similar to the Binpack strategy used in Docker Swarm [57], the scheduler sorts the nodes by their available resources and assigns pods to the nodes with the lowest resource utilization. The Multiopt algorithm [24] considers the CPU and memory usage of every node, the association between containers and nodes, and the clustering of containers; these objectives align with the goals considered in the proposed algorithm as well. The greedy-based heuristic algorithm [37] aims to place all microservices in the same cluster by selecting a target cluster and then prioritizing the placement based on the interaction values among the microservices. Having outlined these characteristics, the algorithms are examined from the perspectives of resource utilization, network communication overhead, and reliability.

1) RESOURCE UTILIZATION
Figures 12(a), 12(b), and 12(c) depict the computing resource usage in the three different applications. Similarly, Figures 12(d), 12(e), and 12(f) indicate the utilization of memory resources. We observe that the standard deviation of CPU and memory usage for Kubeflow is the lowest compared to EdgeX Foundry and Socks Shop. This is primarily because Kubeflow experiences a higher workload, requiring more significant resource consumption; as a result, the standard deviation is lower due to the consistent and substantial resource utilization demands placed on the system, distinguishing it from EdgeX Foundry and Socks Shop. Moreover, it is evident that regardless of the application, the proposed algorithm consistently exhibits the lowest resource utilization, indicating greater efficiency in resource consumption. From a statistical perspective, the overall standard deviation of a cluster, σ_cluster, can be obtained by combining the standard deviations of CPU and memory utilization, σ_cpu and σ_mem, as given by equation (9). Referring to equation (9), Figures 13(a), 13(b), and 13(c) show that the greedy algorithm [37] has the largest standard deviation among all the algorithms in terms of resource utilization. This is because it focuses more on the interaction of microservices and does not prioritize or balance the cluster workloads. The performance of Multiopt [24] is similar to that of the default Kubernetes algorithm [55] due to its consideration of CPU and memory utilization for placement. However, none of these algorithms takes into account the heterogeneity of the architecture, so their performance is slightly inferior to that of the proposed algorithm.
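The cluster-level balance metric can be sketched as follows. The exact combination of σ_cpu and σ_mem is given by equation (9) in the paper, which is not reproduced here; the root-sum-of-squares combination below is our own assumption, used only to illustrate how the per-node utilization samples feed the metric.

```python
import statistics as st

def cluster_std(cpu_usage, mem_usage):
    """Balance metric for one cluster: combine the (population) standard
    deviations of per-node CPU and memory utilization. The root-sum-of-squares
    combination is an assumption; see equation (9) for the paper's exact form."""
    s_cpu = st.pstdev(cpu_usage)
    s_mem = st.pstdev(mem_usage)
    return (s_cpu ** 2 + s_mem ** 2) ** 0.5
```

A perfectly balanced cluster (identical utilization on every node) yields zero, so lower values indicate better resource balancing.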
Observe that for the EdgeX Foundry application, given a smaller number of user requests, say 2, the Multiopt and the proposed MOMA algorithms realize 5.6% and 11.1% improvements, respectively, in cluster resource utilization over the default Kubernetes algorithm. In contrast, for a larger number of user requests, say 8, only the proposed MOMA algorithm realizes an improvement, of 7.0%, over the default Kubernetes algorithm. Similarly, for the Socks Shop application, with the number of user requests equal to 2, the Multiopt and the proposed MOMA algorithms realize improvements of 6.5% and 11.3%, respectively; with the number of user requests equal to 8, they realize improvements of 8.3% and 10.0%, respectively. Furthermore, for the Kubeflow application, with the number of user requests equal to 2, the Multiopt and the proposed MOMA algorithms realize improvements of 4.7% and 7.0%, respectively; with the number of user requests equal to 8, they realize improvements of 2.6% and 5.3%, respectively, over the default Kubernetes algorithm.

2) NETWORK COMMUNICATION OVERHEAD
Given different user-request scenarios, Figure 14 shows that with a smaller number of user requests (e.g., less than or equal to 3), the four algorithms exhibit similar network communication overhead for the three applications. However, with a larger number of user requests (e.g., larger than 3), the default Kubernetes algorithm [55] incurs the largest communication overhead because it only considers the resource aspect. Moreover, Multiopt [24] attempts to place related containers together but does not consider the placement order, while Greedy [37] places all containers in the same cluster, significantly reducing the network transmission cost and approaching the performance of the proposed MOMA method.
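The overhead being compared can be sketched as the interaction volume between each pair of microservices weighted by the distance between their host clusters, which is why co-locating chatty services (as Greedy does) drives the cost down. The function and data layout below are illustrative, not the paper's exact cost model.

```python
def network_overhead(interactions, placement, distance):
    """Total network transmission cost: per-pair traffic volume weighted by the
    distance between the clusters hosting the two microservices.
    Co-located services are assumed to incur zero transfer cost."""
    total = 0.0
    for (a, b), volume in interactions.items():
        ca, cb = placement[a], placement[b]
        if ca != cb:
            # Distance is assumed symmetric; accept either key orientation.
            d = distance.get((ca, cb), distance.get((cb, ca), 0.0))
            total += volume * d
    return total
```

For example, moving a service so that it shares a cluster with its heaviest communication partner removes that pair's entire contribution to the total.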
In the three different applications, as shown in Figures 14(a) and 14(b), EdgeX Foundry and Socks Shop have significantly higher overheads because of higher network demands, involving data creation and transmission to databases and employing socket-based communication mechanisms. In contrast, as shown in Figure 14(c), Kubeflow primarily tests data through standard databases and does not require redundant data fetching. Therefore, the overhead in Kubeflow is not as high as that in EdgeX Foundry and Socks Shop.

3) RELIABILITY
Because they neglect the node failure rate, the default Kubernetes [55], Multiopt [24], and Greedy [37] algorithms exhibit degraded performance compared with the proposed algorithm. However, as shown in Figures 15(a), 15(b), and 15(c), Greedy's approach of placing the majority of services in the same cluster may mitigate the impact of the failure rate to some extent. Nevertheless, as the user requests increase (e.g., beyond 6), the failure rate still rises, and the number of failures gradually becomes more challenging to control.
Different applications exhibit varying proportions of failures. For instance, in the case of Kubeflow, where training units require substantial resources, a failed unit creation can result in significant cascading errors, leading to a higher rate of failures. In EdgeX Foundry, the high interdependency among units means that a failure at an earlier stage can have a pronounced ripple effect, also contributing to a higher rate of failures. Conversely, Socks Shop, with fewer constraints among its web microservices, does not exhibit as much interdependence in the event of failures, allowing it to continue functioning relatively independently.
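The interdependency effect described above can be made concrete with a standard series-reliability calculation: a request that must traverse a chain of dependent microservices succeeds only if every node in the chain succeeds. The sketch below is our own illustration of that reasoning, not the paper's reliability model.

```python
def chain_reliability(fail_rates):
    """Reliability of a request that must traverse every microservice in a
    dependency chain, assuming nodes fail independently with the given rates."""
    r = 1.0
    for f in fail_rates:
        r *= (1.0 - f)
    return r
```

With the Fail = [0.01, 0.03] range from Section VII-A, a three-stage chain at the low end already drops below 0.98 reliability, which is why tightly coupled applications such as EdgeX Foundry amplify failures more than loosely coupled ones such as Socks Shop.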

C. COMPARATIVE SUMMARY
The comparative analysis explores the characteristics and performances of the Kubernetes default algorithm [55], the Multiopt algorithm [24], the Greedy algorithm [37], and the proposed MOMA algorithm. As shown in Figures 14 and 15, because the Greedy algorithm [37] consolidates most services within the same cluster, its network communication overheads may be suppressed, and the failure rates may have less of an impact on network reliability. However, as shown in Figures 12 and 13, this strategy may deteriorate the performance of cluster resource utilization. As for the Kubernetes default algorithm [55] and the Multiopt algorithm [24], the former only allocates pods to the nodes with the lowest resource utilization, while the latter considers the CPU and memory usage of every node to handle microservice interaction. Therefore, as depicted in Figures 12 to 15, these two algorithms perform similarly with respect to resource utilization, network communication overhead, and network reliability.
Overall, the proposed MOMA algorithm outperforms the Kubernetes default algorithm [55], the Multiopt algorithm [24], and the Greedy algorithm [37] in terms of resource utilization, network transmission overhead, and reliability in the three applications within multi-heterogeneous cluster environments. This is because the three compared algorithms lack consideration for node failure rates and heterogeneous architectures, resulting in relative performance degradation. Moreover, the proposed system is scalable and able to ingest an increasing number of heterogeneous Kubernetes clusters and services. For instance, when the system adds a new heterogeneous Kubernetes cluster or service, the proposed MOMA algorithm can be applied to re-optimize resource allocation. Regarding robustness, the proposed framework can be employed to implement a robust microservices environment with resource balancing. Referring to the analysis in Section VII-B3, the architectural principles and design patterns of the proposed framework can help in building a reliable microservice architecture. The summarized findings are presented in Tables 6-8.

VIII. CONCLUSION
This work establishes a bi-objective optimization model: (1) maximizing resource utilization and (2) reducing network communication overhead. To evaluate the performance of the proposed MOMA model, we apply three different microservice applications (i.e., EdgeX Foundry, Socks Shop, and Kubeflow) and examine the framework via microservice workload analysis with measurement data from heterogeneous architectures in real-world scenarios. To achieve a more diverse and better set of solutions, we develop the MOMA algorithm based on the improved elitist NSGA-II. We design a genetic representation, utilize the SBX crossover operator, and employ two different mutation operators. To evaluate the quality of our solutions, hypervolume is used as a metric. Compared with the existing algorithms, the experimental results show that the proposed algorithm demonstrates significant improvements in resource utilization, network transmission overhead, and reliability across the three different applications.
To further extend this study on resource allocation in multiple heterogeneous clouds, possible future works include (1) considering GPU management in microservice resource allocation, due to emerging microservice applications with GPU utilization; (2) integrating cloud-native services from certain Graduated projects into our framework; (3) deriving theoretical bounds on the resource matrices to further investigate important characteristics of a microservices system and to provide a baseline for the overall health of the system; (4) exploring platform metrics for monitoring microservice health, energy consumption, or the entire microservices application; (5) investigating the algorithm's performance by enlarging the number of heterogeneous Kubernetes clusters and services; and (6) using a large and diverse evaluation set to abstract the system characteristics (e.g., scalability, generalizability, and reliability) and to benchmark cloud/edge computing platforms [58], [59]. We also plan to explore and experiment with a wider array of meta-heuristic algorithms through comparative analysis to further optimize our approach, and to extend the current research by incorporating a larger heterogeneous resource pool, such as investigating the possibility of adding virtual machines as additional worker nodes in the architecture, making the heterogeneous infrastructure more comprehensive and versatile.

FIGURE 1 .
FIGURE 1. Architecture diagram of the empirical analysis framework.

FIGURE 2 .
FIGURE 2. Information flowchart of the workflow process.

FIGURE 4 .
FIGURE 4. An example of chromosome representation.

FIGURE 6 .
FIGURE 6. An example of the insertion and deletion mutation operations.

FIGURE 11 .
FIGURE 11. Performance efficiency with varying the percentage of failed requests relative to the total number of user requests.


FIGURE 12 .
FIGURE 12. The comparison results of standard deviations in resource utilization.

FIGURE 13 .
FIGURE 13. The comparison results of standard deviations in cluster resource utilization.

FIGURE 14 .
FIGURE 14. The comparison results of network communication overhead.

FIGURE 15 .
FIGURE 15. The comparison results of the mean number of failed requests.

TABLE 1 .
Summary of resource management.

TABLE 2 .
Notations and Definitions for the Problem Model.

TABLE 3 .
The values of execution parameters.

TABLE 4 .
Network interactions and resource requirements with one user request.

TABLE 5 .
Network interactions and resource requirements with ten user requests.

TABLE 6 .
A summary of the results of EdgeX Foundry.

TABLE 7 .
A summary of the results of Socks shop.

TABLE 8 .
A summary of the results of Kubeflow.