Dynamic K-Means Clustering of Workload and Cloud Resource Configuration for Cloud Elastic Model

Cloud elasticity involves timely provisioning and de-provisioning of computing resources and adjusting resource sizes to meet dynamic workload demand. This requires fast and accurate resource scaling methods that match workload demands at minimum cost (e.g. pay as you go). Two dynamically changing parameters must be defined in an elastic model: the workload resource demand classes and the data center resource reconfiguration classes. These parameters are not labeled for the cloud management system while data center logs are being captured. Building an advanced elastic model, which defines multiple classes under these two categories (i.e. for workload and for provisioning), is a critical task. A dynamic method is therefore required to define the workload classes and resource provisioning classes during the configuration time window. Unsupervised learning models such as K-Means face many challenges, such as time complexity, selection of the optimum number of clusters (representing the classes), and determination of the cluster centroid values. Centroid-based clustering methods depend on minimizing the mean square error between each class center and the members of that class. These methods are often enhanced with guidelines for finding the centroids, but they still suffer from the limitations of K-Means. For clustering cloud log traces, most of the reported work uses K-Means clustering to label workload types. However, no reported work labels data center scaling classes. In this work, a novel method is proposed to analyze the characteristics of both workloads and data center configurations using a clustering method guided by a random variable model transformation (kernel density estimator). This method enhances K-Means clustering by automatically determining the optimum number of classes and finding the mean centroids of the clusters. In addition, it improves the accuracy and the time complexity of the standard K-Means clustering model by correlating the clustering attributes using statistical correlation methods.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Haruna Chiroma.
The Cloud Elastic Model (i.e. rapid elasticity) is one of the basic cloud service characteristics as per the NIST definition [1]. The configurable resources must be tuned dynamically, with minimum management effort, to meet client workload demands "such that at each point in time the available resources match the current demand as closely as possible" [2]. Cloud resource scaling has three main methods: i) horizontal scaling, which increases or decreases the number of virtual instances such as application containers or virtual machines (VMs), ii) vertical scaling, which increases or decreases the resources of virtual instances, such as memory size, number or performance of processors, and storage, and iii) migration, which moves VMs or applications from one physical host server to another. All these methods must consider the size of the resource units that can match the workload demands. On the other hand, workload types are also an important consideration in cloud elasticity and must be investigated and characterized. Workload is dynamic and disparate in nature; it depends on user and web-to-web activities and on events such as those in social media Web 2.0 multi-tier applications. The workload is affected by inter-user activities, e.g. a famous person's tweet can cause a massive workload from other people, which may increase or decrease the resource demands unpredictably. The workload type and the resource configuration are therefore two related factors that must be considered in cloud elastic actions. Good matching between workload demands and optimal resource provisioning reduces the cost for both cloud service providers and cloud users.
The challenges [3] in cloud elastic resource provisioning are as follows: 1) accuracy in scaling resources to match workload demands, 2) management cost, i.e. the time and space complexity of finding the optimal configuration set, 3) configuration cost, cloud orchestration and initial service setup time (spin-up time [4]), and 4) scaling dimensionality (scale in or out), i.e. the resource scaling type (vertical or horizontal [5]) and the unit of scale capacity in resource units. Cloud resource capacities have been enhanced and developed by big cloud players such as Amazon, Google, and Microsoft, who introduced cloud computing hardware platforms to support customers' business requirements. The idea is to increase the granularity of provisioned resources such as server processes, storage and network, which allows more control over performance and cost. Intel and Amazon tried to make the processor workload-aware by increasing or decreasing the processor speed and the number of instructions per clock cycle (MIPS) [6]. Facebook also worked on the cloud hardware platform by introducing an open source project called the Open Compute Project (OCP) [7]. The goal of this project is to allow cloud services to choose the most suitable hardware (server, storage, network) design for the cloud data center [8]. Microsoft innovated the Olympus hardware project, which is a "next generation hyperscale cloud hardware design and a new model for open source hardware development with the (OCP) community" [9]. The Google cloud data center cluster is built by utilizing existing servers dynamically [10], using a web-based subscription architecture model to link resource units. The combination of all these technologies increases the complexity of provisioning and of mapping resources to workload demands. Labeling resources using an unsupervised machine learning clustering method will allow the data center cloud manager to configure the resources accurately and efficiently at reduced cost.
Workload applications vary in cloud environments, especially with changes in virtualization technologies and overlay networking setup. Software Defined Networking (SDN) [12] and Network Function Virtualization (NFV) [11] affect resource provisioning through the network topology setup and services, which need to be investigated and labeled as classes of demands and of released resources. Workload clustering in SDN and NFV environments is crucial because it reduces resource orchestration cost and time by defining architecture groups of computing resources that can be interchanged when provisioning and releasing resources during service time. This allows the cloud manager to pick the group of data center resources that matches the overlay network changes.
Clustering of cloud workloads and configuration sets is also very important for making elastic decisions. Clustering the workload and the data center configuration defines a labeled data set, so that the cloud management system can decide the best action, using a lookup table, by mapping predefined demand classes to provisioned resource classes. Much work has investigated cloud workloads using real log traces. Google Cluster workload traces [13] have been clustered and analyzed in [14]-[23]. Alibaba cluster traces are investigated in [24] to validate workload behavior in a real cloud environment. The MapReduce workload at the Taobao e-commerce company has been studied in [25] to understand workload characterization in a large-scale cloud environment.
Reading cloud resource and workload activity logs in real time for use in management decisions is investigated in [26], [27]. Dimensionality reduction methods (wavelet transform and Principal Component Analysis, PCA) are used to store and replay logs of Google Cluster workload traces so that the machine learning model can operate within the decision time. Another method, which reduces the clustering time of cloud workloads by using a hyper-gamma distribution fitted with the method of moments, is proposed in [28]. Defining a set of classes for workload types and for cloud data center resource configurations is a difficult task that needs to be addressed, since the provisioning decision is based on these two parameters.
Our contributions in this work are an enhanced K-Means clustering approach for cloud workload and data center configuration types, and a correlation matrix that captures the relation between cloud resources and workloads. Our proposed method analyzes workload and data center configuration traces using a kernel density estimator to find the number of classes and the class centers. It labels cloud workload demands and data center configuration capabilities in order to investigate workload types by related submitted jobs, and data center capabilities by server types, capacity, and configuration setup. A dynamic mapping method based on demands and provisioned resources (an expert system) is proposed to take the proper action within the accepted configuration time. This will allow cloud service providers to respect service level agreements and minimize cost for customers and service providers.
This will also allow the cloud manager to find the best match between requested and provisioned resources, considering all workload and data center characteristics, by using real data from cloud data center traces. The tasks carried out to achieve this goal are as follows: 1) analyze data center resource capabilities and configurations, 2) analyze workload demand types and behavior from different perspectives, such as the resource types needed (CPU, RAM, Network, Storage), demand speed, mass, format and messaging, protocols, and communications, 3) label workload and data center configuration with appropriate classes, 4) develop a proper mapping that achieves the best elastic match between demands and resource provisioning, and 5) apply methods to reduce resource provisioning complexity, time, and cost by using a hash lookup with a key-value pair of demand and provisioned resources, respectively. This paper is organized as follows. Section II presents the related work, and the methodology background is discussed in Section III. Our proposed Dynamic Clustering method is presented in Section IV. Experiments and results are shown in Section V, and the conclusion is presented in Section VI.
VOLUME 8, 2020

II. RELATED WORK
Workload clustering and classification have been studied in the literature for many purposes, such as cloud modeling and evaluation, workload emulation and forecasting, and cloud resource reconfiguration. An adaptable model for generic large-scale workloads is proposed in [29]. In this work, the authors formulate an accurate, realistic, and adapted workload model. They use the Google Cluster Workload Trace schema to represent large-scale workloads. The analysis of the workload is defined using four laws: 1) submission time, defined by the inter-arrival rate of tasks and modeled by a Pareto distribution, 2) type, i.e. either task type or service type, 3) makespan, defined by the task duration and modeled using long-tailed distributions such as Pareto or log-normal, and 4) priority, i.e. the order of tasks based on importance. K-Means is used to cluster submitted jobs with k equal to four classes for task and service types. The modeled workload characteristics are dynamic in nature, with different parameters such as frequency, mass, and disparity. In [24], the authors cluster Alibaba cloud cluster traces using the K-Means method to analyze and identify job group characteristics and the relationships among different job groups, which in turn helps in scheduling jobs at run time. The authors in [23] proposed simple set theory to enhance K-Means clustering methods for cloud workloads and data center configuration parameters. In [30], a Markov model with hidden layers is used to characterize stochastic workload behavior in order to cluster and classify workload patterns. K-Means is used in [20] to cluster Google trace jobs and tasks as workload characterization, to gain insight into the performance of workload demands. Hierarchical and conventional K-Means clustering are used in [21] to characterize workload by assigning common groups of jobs and common groups of machines. The goal is to schedule submitted jobs on appropriate data center resources while reducing power and batch job assignment latency.
Elastic scaling frameworks for cloud computing layers that combine AI methods with optimization are discussed in [31], [32] and [33]. The authors in [31] analyzed and clustered workload using a metaheuristic-based method for elastic scaling. Their work depends on two AI methods: Genetic Algorithm (GA) and Fuzzy C-means. An elastic framework with hybrid workload clustering using two methods, K-Means and the Imperialist Competitive Algorithm, is proposed in [32]. In [33], a resource provisioning framework is proposed for elastic scaling in the cloud PaaS layer using autonomic computing and reinforcement learning (RL).
VM consolidation that considers workload characterization patterns in the cloud data center is proposed in [18]. The authors propose a fully distributed and threshold-free Dynamic Virtual Machine Consolidation (DVMC) algorithm called GLAP that combines Q-Learning with a gossip-based protocol. A VM consolidation method that uses optimization methods to reduce multi-tier workload latency is proposed in [34]. Workload modeling based on common pattern clustering of the workload is proposed in [22]. Selecting a clustering method that matches workload and data center characteristics is an open issue. A new approach to selecting a clustering method that relates VMs to tasks in large data centers is proposed in [35], [36]. The framework selects the best clustering algorithm from a set of clustering algorithms based on cluster validation methods. In [37], the author proposed a method in which the scaling dimension is determined using a linear algorithm related to the workload type. Workload analysis is used in [38] for optimal cloud resource configuration. In this work, deep reinforcement learning is used to handle heterogeneous big data workloads. The authors proposed a new model (named SARA), which performs three tasks: 1) cluster workloads into groups using Bisecting K-Means, 2) search for the optimal configuration of the clustered workloads using deep reinforcement learning, and 3) continue to cluster new workloads and find the best configuration set.
The researchers in [17] analyze data center resources to find zombie servers that consume energy by running in an idle state. Through the workload analysis, it was shown that significant reductions in power consumption and CO2 emission can be achieved by optimum resource scaling. Workload forecasting for elastic resource management in edge clouds is proposed in [39]. The authors combine a time series analysis method, Auto Regressive Moving Average (ARMA), with an Elman Neural Network (ENN) to forecast workload based on error correction. The authors in [19] combined the K-medoids clustering algorithm with a multilayer perceptron (MLP) neural network model for workload prediction. The proposed work focuses on predicting the workload pattern of newly submitted tasks based on a pool of historical tasks.
Cloud resources and capabilities have been investigated in [40], where the authors surveyed cloud hardware resource design for AI-enabled cloud computing. A deep machine learning workload framework is proposed in [41] to handle the machine learning stages (storing data sets, training phase, evaluation, and production model) in cloud environments. This allows users to deploy and test machine learning models in distributed computing models, without considering resource limitations and configuration, using a standard Application Programming Interface (API). In [42], the authors used a reinforcement learning method on GPU clusters for deep learning workload scheduling. To the best of our knowledge, there is no method that dynamically clusters cloud workloads and the data center resource configuration set. Specifying the number of classes and the initial group characterization for cloud workload and resource configuration is the first step in dynamically clustering the data center workload. All workload aspects, such as needed resources (RAM, CPU, Disk, Network), duration, volume, speed, location, and tasks, must be considered in the analysis. In addition, data center resources in cloud environments continue to change in terms of capabilities, computation power and configuration technology. Servers, storage, and network infrastructure must be characterized such that they can be adapted to new resource types, workload types and provisioning methods. This work focuses on determining the number of classes dynamically, adapting to changes in workload demands and data center resources, using a hybrid method that combines probabilistic statistical theory and unsupervised learning with optimization for selecting the best solution. Table 1 summarizes the related work based on four criteria: clustering method, workload type, experiment design, and performance evaluation.

III. METHODOLOGIES REVIEW
The methodologies used in this work are based on statistics, optimization theory and machine learning. Statistical analysis is first used to find the mean and variance of the cloud data center trace logs; a mapping function of a random variable model is then formulated to model the workload and data center configuration characteristics. A customized unsupervised K-Means method is then applied to minimize the population set inertia (the mean square error within each group), guided by the density function to find the number of centers (means) and their initial values.

A. KERNEL DENSITY ESTIMATE
Kernel Density Estimate (KDE) [43] is a method that describes the probability distribution of a sampled statistical population using a random variable definition. The probability density function (pdf) for a continuous model, or the probability mass function (pmf) for a discrete model [44], is a transformation function for the events in the sample space that associates probability values with the outcomes of a random experiment. The estimator can expose characteristics of statistical data such as skewness and multi-modality. A common approximation method for the estimator is the histogram, which is a frequency distribution of the population events. It is found by dividing the range of the data into intervals, named bins, and then counting the number of data points in each interval, which gives the height of the chart. Define h as the bin width, x_0 as the origin (start of the sample range) of the population set X, m as the number of bins, and i ∈ {0, 1, ..., m − 1} as an integer bin index; then the bin interval is defined as [x_0 + ih, x_0 + (i + 1)h). The histogram f̂(x) can be defined as in Equation 1, which gives the number of elements per bin divided by the product of the bin width and the total number of data points, N. The term N is used in the equation for normalization. A generalization of the histogram method is obtained by making the bin size variable according to the data set distribution, as shown in Equation 2.
From the histogram, the probability of each bin can be obtained by dividing the number of elements in the bin by the total number of elements. Using this method, a probability mass function f(x) can be formulated as a random variable that describes the behavior of the sample set. The two main statistical parameters considered for any random variable are the mean (µ), which is the expected value defined as E(x) = Σ x f(x), and the variance, which is the square of the standard deviation (σ²). The mean points to the central value of the population, and the variance shows how far the population deviates from the mean. These values can be obtained from the data set using the sampled mean and variance. However, these values are not enough to characterize the data fully and build a statistical prediction model.
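As an illustration of the histogram description above, the following minimal sketch builds equal-width bins, derives the bin probabilities, and computes the mean and variance of the resulting probability mass function. The function names (`histogram_pmf`, `pmf_mean_variance`) and the sample values are hypothetical, and representing each bin by its center is an assumption made only for this example.

```python
def histogram_pmf(data, bin_width):
    """Equal-width histogram: returns (bin_starts, counts, pmf, density)."""
    x0 = min(data)                                    # origin of the sample range
    m = int((max(data) - x0) // bin_width) + 1        # number of bins
    counts = [0] * m
    for x in data:
        i = min(int((x - x0) // bin_width), m - 1)    # bin index of x
        counts[i] += 1
    n = len(data)
    pmf = [c / n for c in counts]                     # probability per bin
    density = [c / (n * bin_width) for c in counts]   # count / (N * h)
    starts = [x0 + i * bin_width for i in range(m)]
    return starts, counts, pmf, density


def pmf_mean_variance(starts, pmf, bin_width):
    """Mean and variance of the pmf, using bin centers as representative values."""
    centers = [s + bin_width / 2.0 for s in starts]
    mean = sum(c * p for c, p in zip(centers, pmf))
    variance = sum(p * (c - mean) ** 2 for c, p in zip(centers, pmf))
    return mean, variance


# Hypothetical normalized resource requests.
samples = [1, 2, 2, 3, 7, 8, 8, 9]
starts, counts, pmf, density = histogram_pmf(samples, bin_width=2)
mean, variance = pmf_mean_variance(starts, pmf, bin_width=2)
```

The density values integrate to one over the bins, which is the role of the normalization term N in the text.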

FIGURE 1. Kernel types.
A solution to this limitation is to use a random variable model for the population, which can provide all information about any population, including the mean and variance. The histogram is a simple clustering method that helps to find population classes by grouping elements in bins (a naive description of a random variable). In machine learning [45], unsupervised learning, also called descriptive learning, aims to find interesting patterns in the data without knowledge of the system output. The histogram clusters the sample input data D without any information about the expected output values, where the input data is defined as D = {x_i}, i = 1, ..., N, and N is the number of data points. Formulating a probability mass function for the sample data from the histogram distribution, as a random variable, is used in clustering to indicate the number of classes and the initial centroid values. However, the histogram method is limited in clustering and nonparametric analysis by the discontinuity of its probability density function. A solution to this problem is to introduce a function that represents a weight for each population point and to find the total overlapped weight. Using this function, called the kernel k(x), the density estimator can be defined as in Equation 3. A kernel function must satisfy three conditions: 1) it must be a positive function, k(x) ≥ 0, 2) it must be symmetric, k(x) = k(−x), and 3) it must be continuous and decreasing for x > 0, i.e. k'(x) ≤ 0. Many types of kernel function can be used, but the most commonly used in machine learning Python libraries [46]-[49] are Gaussian, Triangular and Cosine, as depicted in Figure 1.
A window width attribute h is introduced for the kernel function and acts as a smoother for the kernel estimator, as Equation 4 depicts. The bandwidth of the estimator determines how much the densities of neighboring points overlap in the sampled space. This exposes the neighbor relation between points as likelihood estimators. Generalizing the kernel estimator with weights corrects the estimator for highly related points, which are used as validation for the function fitting, as shown in Equation 5. The weights, together with the kernel function, smooth the overall density estimator. This function provides a tradeoff between overfitting and underfitting.
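To make the role of the kernel, the bandwidth h, and the point weights concrete, here is a minimal sketch of a weighted kernel density estimator with a Gaussian kernel. It uses the standard textbook form, in which the weights sum to one, and this may differ in detail from Equations 3 to 5; the names `gaussian_kernel` and `kde` and the sample values are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Gaussian kernel: positive, symmetric, and decreasing for u > 0."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, data, h, weights=None, kernel=gaussian_kernel):
    """Weighted kernel density estimate at x with bandwidth h.

    Assumption: weights are non-negative and sum to one; when omitted,
    every sample gets the uniform weight 1/N.
    """
    n = len(data)
    if weights is None:
        weights = [1.0 / n] * n
    return sum(w * kernel((x - xi) / h) for w, xi in zip(weights, data)) / h


# Hypothetical samples with uniform and non-uniform weights.
samples = [0.0, 1.0, 5.0]
uniform_at_1 = kde(1.0, samples, h=0.5)
weighted_at_1 = kde(1.0, samples, h=0.5, weights=[0.5, 0.25, 0.25])
```

Because each kernel integrates to one and the weights sum to one, the estimate is itself a valid density regardless of the chosen bandwidth.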
The choice of kernel function matters little when the number of sampled points is high, because the fitted shape is nearly the same for all kernel types, as shown in Figure 2 for a RAM request workload. Here the number of bins is selected experimentally as 150. The important factor is the choice of the bandwidth h of the kernel function. It affects the overall shape of the density function because it controls its smoothness: a wider bandwidth makes the function smoother and more resilient against spikes (non-deterministic events), as shown in Figure 3.
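The effect of the bandwidth on smoothness can be illustrated by counting the local maxima (modes) of a Gaussian kernel density estimate on a grid. This is only a sketch under assumed values: the sample data, the grid size, and the helper names `gaussian_kde` and `count_modes` are hypothetical and are not taken from the traces used in this work.

```python
import math


def gaussian_kde(x, data, h):
    """Unweighted Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def count_modes(data, h, grid_points=400):
    """Count local maxima of the estimate on a uniform grid around the data."""
    lo = min(data) - 3.0 * h
    hi = max(data) + 3.0 * h
    step = (hi - lo) / (grid_points - 1)
    ys = [gaussian_kde(lo + i * step, data, h) for i in range(grid_points)]
    # '>' on the left and '>=' on the right counts a flat two-point peak once.
    return sum(
        1
        for i in range(1, grid_points - 1)
        if ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
    )


# Hypothetical requests forming two well-separated groups.
requests = [0.9, 1.0, 1.1, 1.2, 4.9, 5.0, 5.1, 5.2]
narrow_modes = count_modes(requests, h=0.3)   # narrow bandwidth keeps both groups
wide_modes = count_modes(requests, h=5.0)     # wide bandwidth smooths them into one
```

A narrow bandwidth preserves the two groups as separate modes, while a very wide one merges them, which is why the bandwidth choice matters more than the kernel shape.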
Statisticians Ramsay and Silverman developed a complex statistical method for choosing the bandwidth h. The theory of this method is outside the scope of this paper and is provided in [50]. In our work, a toolbox from a Python library is used. The method uses samples of the population. The Python library offers three types of bandwidth estimator that derive the bandwidth from the statistical population. The default method with the Gaussian kernel, called "normal_reference", is given by Equation 6, where A = min(s²(X), IQR(X)), IQR is the interquartile range, N is the number of data points, and C is a constant matrix of Hansen values based on Silverman's rule of thumb [50].
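For orientation, the widely used one-dimensional normal reference rule of thumb can be sketched as below. This is an assumption-laden illustration, not a reproduction of Equation 6: it uses the common textbook form h = C · A · N^(−1/5) with C = 1.06 for a Gaussian kernel and A = min(standard deviation, IQR / 1.349), which may differ from the definition of A used in this paper. The function names are hypothetical.

```python
import math
import statistics


def normal_reference_bandwidth(data, c=1.06):
    """Rule-of-thumb bandwidth h = c * A * N ** (-1/5) (textbook form).

    Assumptions: c = 1.06 (Gaussian kernel) and
    A = min(sample standard deviation, IQR / 1.349).
    """
    n = len(data)
    s = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)   # quartiles of the sample
    iqr = q3 - q1
    a = min(s, iqr / 1.349) if iqr > 0 else s
    return c * a * n ** (-1.0 / 5.0)


def number_of_bins(data, h):
    """Number of bins of width h needed to cover the sample range."""
    return max(1, math.ceil((max(data) - min(data)) / h))


# Hypothetical attribute vector and the same values observed ten times over.
small = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
large = small * 10
h_small = normal_reference_bandwidth(small)
h_large = normal_reference_bandwidth(large)
bins_small = number_of_bins(small, h_small)
```

With more samples of similar spread the bandwidth shrinks, so the same range is covered by more, narrower bins.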

B. K-MEANS
K-Means is an unsupervised learning model that clusters a data set into a specified number of cluster groups defined by random or selected centroid values µ_1, ..., µ_k, where µ_k represents the centroid of cluster k: µ_k = (1/p) Σ_{i ∈ c_k} x_i, where p = |c_k| is the number of data points in cluster k, and c_k, a subset of C, is the set of points grouped in cluster k. It works by iteratively minimizing the distance between each center and all points x_i in the same cluster group c_k until a distance threshold is reached. The objective function J is defined as the Mean Square Error (MSE) of the distance between the centroid and the data point members of the same group, as shown in Equation 7. This objective function needs to be minimized at each iteration.
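The iteration described above can be sketched for one-dimensional data as follows. This is a minimal illustration of standard K-Means (Lloyd iterations) with caller-supplied initial centroids, not the customized method of this paper; the function names, the tolerance, and the sample values are hypothetical.

```python
def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
        for x in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members."""
    new_centroids = []
    for k, mu in enumerate(centroids):
        members = [x for x, label in zip(points, labels) if label == k]
        # An empty cluster keeps its previous centroid (an assumption).
        new_centroids.append(sum(members) / len(members) if members else mu)
    return new_centroids


def mse(points, labels, centroids):
    """Objective J: mean squared distance of points to their own centroid."""
    return sum((x - centroids[l]) ** 2 for x, l in zip(points, labels)) / len(points)


def kmeans(points, initial_centroids, tol=1e-9, max_iter=100):
    """Iterate assignment and update until centroids move less than tol."""
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    labels = assign(points, centroids)
    return centroids, labels, mse(points, labels, centroids)


# Hypothetical one-dimensional attribute values and initial centroids.
centroids, labels, j = kmeans([1, 2, 3, 10, 11, 12], [0.0, 5.0])
```

The result depends on the initial centroids, which is the sensitivity that the density-guided initialization in this work is meant to address.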
K-Means works in two nested loops whose time complexity is the product of the number of group members and the number of centroids, i.e. O(k × p). There are many clustering methods, which are characterized from the following perspectives [51]: 1) architecture, 2) robustness, 3) restrictions on latent features, and 4) reconstruction loss. K-Means is the simplest and most widely used method, with three limitations: 1) setting the number of clusters k and the initial centroid values µ_k, 2) determining the number of iterations per minimization cycle (minimizing the cost function J(C, P)) by defining the inertia distance threshold value, and 3) validating the clustering and the accuracy of the class members with respect to the logical relation between group points. Many enhanced versions of K-Means overcome some of these limitations and have been used in cloud workload clustering [38], such as Bisecting K-Means [52], which addresses the local minimum problem in the optimization of the objective function. It works by setting k to 2 on the data set to find two initial groups. It is close to the top-down (divisive) hierarchical clustering method: it evaluates each group using the sum of squared errors and divides repeatedly until the number of groups reaches k. All clustering algorithms perform an unguided search over the set of data points, which can produce illogical results. Supplying the clustering method with information about the population significantly steers the result towards the most consistent and accurate groups. In this work, a novel approach is used to enhance K-Means clustering with a kernel density estimator in order to cluster cloud workloads and data center configuration options dynamically.

IV. DYNAMIC CLUSTERING METHOD
In this paper, a combined method to cluster cloud workloads and data center configurations is proposed using the histogram, the kernel density estimator and K-Means. This guided clustering method dynamically determines the number of classes based on the statistical population characteristics of cloud workloads and data center configurations. Figure 4 depicts the flow chart of the proposed dynamic clustering method. It is built by integrating five phases: 1) Silverman, to find the bin width and number of bins, 2) Histogram, to describe the log trace distribution for attribute extraction and correlation, 3) KDE, to find the random variable for the log type that generates the PDF and CDF, 4) K-Means, to cluster correlated logs, and 5) Validation of the clustering process, to check the accuracy of clustering. The proposed method starts with data cleaning and ends with validating the clustering result, as depicted in Algorithms 1, 2, 3, 4, and 5, which are discussed in detail later.
The flow chart in Figure 4 begins in Algorithm 1 by gathering and storing the logs and replaying them in a timely manner for processing. Data cleaning and extraction are done using Algorithm 2 to evaluate the log values, more specifically to check whether a value is empty or of a wrong data type or format. This is followed by extracting the log attributes as vectors, which are processed individually. The attribute vectors are statistically analyzed to find the population mean, variance and Hansen constant matrix. These values are used to calculate the bin bandwidth using the Silverman rule (Equation 6), in order to find the number of bins over the whole population range. After that, the histogram is applied, as defined in Equation 2, and the sampled mean and variance are obtained to evaluate the population relation for all points of each group and to create the CDF using Equation 5 in Algorithm 2. A new vector M = (x̄_1, ..., x̄_k) represents the sampled means of the point members of each bin group and is used as the K-Means initial centroids. The empty bins must be removed, and the number of remaining bins becomes the k value, the number of clusters. The information required for clustering when correlating two attributes is the number of means (vector size) and the initial value of each bin center. The two attribute vectors are joined on the smaller number of classes: the vector with the higher number of classes is merged down to the number of classes of the vector with the lower number (Algorithm 3). Algorithm 4 applies K-Means clustering using Equation 7 and evaluates the clustering inertia using the kernel density estimate (Equation 3) distributions. Full information about the data set attribute vectors is formulated using KDE, which generates the probability distribution function and the probability density function. These two functions give the probability volume of each bin group, using a convenient ratio factor as the event probability weight.
Bin boundaries can be defined based on the bandwidth of the kernel function by segmenting the distribution into the most representative bins, that is, the bins that have a higher probability value. The result of this stage is used to validate the relation between group points. Finally, correlation and validation are applied to the clustering, as implemented in Algorithm 5. Table 2 summarizes all symbols used in the equations and algorithms.
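The step in which non-empty histogram bins provide both the number of clusters k and the initial centroids can be sketched as follows. This is a simplified, hypothetical illustration of the idea for a single attribute vector with a given bin width; it is not Algorithm 2 or Algorithm 3, and the function name and data are assumptions.

```python
def histogram_initial_centroids(data, bin_width):
    """Derive k and initial centroids from the non-empty histogram bins.

    Each value is placed in an equal-width bin; empty bins never appear in
    the dictionary, so they are dropped implicitly. The sampled mean of each
    remaining bin becomes one initial centroid.
    """
    x0 = min(data)
    bins = {}
    for x in data:
        index = int((x - x0) // bin_width)
        bins.setdefault(index, []).append(x)
    centroids = [sum(members) / len(members) for _, members in sorted(bins.items())]
    return len(centroids), centroids


# Hypothetical attribute vector: five bins span the range, one of them is empty.
k, initial_centroids = histogram_initial_centroids([1, 2, 2, 3, 7, 8, 8, 9], bin_width=2)
```

The returned centroids can then be passed to any K-Means routine that accepts explicit initial centers, so that the search starts from the populated regions of the distribution.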
The five phases of the proposed dynamic clustering process are presented in detail in the following algorithms:
1) Algorithm 1, Logs Cleaning and Normalization phase: In this part, the algorithm starts by reading the data center and workload demand trace attributes in real time. These trace attributes are processed individually. Cloud data center and workload log traces are raw data that must be processed before being clustered. The processing involves extracting valid and meaningful log traces by checking the log values for missing or invalid entries. The algorithm approximates a missing value by averaging the neighbors of the current point, as shown in Line 2 of Algorithm 1. Log values must be normalized to the maximum value for all log types, as a rational value in resource units, as shown in Line 6. Tables 3 and 4 list the workload and data center configuration attributes. Table 3 summarizes the extracted workload trace attributes, where each attribute is denoted as WL_i. Table 4 summarizes the datacentre [...]
[...] the histogram empty bins are identified and removed, as shown in Line 4 of Algorithm 3. Next, two attribute vectors are combined based on the minimum number of vector bins. The merging process converts the longer vector to the same number of bins as the shorter attribute vector by finding the merge step size, as shown in Line 7. After the merge step is obtained, this value is used to find the bin boundaries and the new bin labels, as shown in Lines 11 and 12, respectively. In Line 13, the new point count for each bin is calculated. Finally, in Lines 14 to 17, the new mean and variance are calculated for the merged bins.
4) Algorithm 4, Correlation and Clustering phase: This part applies K-Means using the number of bins as the class number k and the mean of the members of each bin as the centroids, as shown in Lines 1 and 2 of Algorithm 4. The clustering accuracy error is evaluated using the maximum inertia value, as shown in Line 4. This is an iterative process that continues until the error value becomes lower than the threshold value. Here, inertia is the Mean Square Error matrix of the clustering classes for each bin member point. Next, the class centroid values are updated in Line 3, using the standard deviation as a step-by-step projection guide. The inertia error matrix is compared with the bin variance vector between group members, as shown in Line 7, and the overall mean square error is calculated in Line 8. In the second test, the Silhouette clustering index, which indicates the consistency of the relation between the point members of the classes, is checked against a threshold value of 0.6. If the Silhouette value is lower than 0.6, it indicates a good K-Means clustering relation, as shown in Line 13. In the third test, the slope value per bin is compared against a threshold value, as shown in Line 15. The threshold value is selected as 0.7 of the maximum slope, since it was found statistically (in our previous work [23]) that 0.7 of the maximum slope shows a normal relation between the attributes.
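The cleaning and normalization described for Algorithm 1 can be sketched as follows. This is a hypothetical illustration of the two steps named in the text (replace a missing or invalid value with the average of its neighbors, then normalize to the maximum value); the function name, the treatment of non-numeric entries, and the fallback of 0.0 when no valid neighbor exists are assumptions.

```python
def clean_and_normalize(values):
    """Impute invalid entries from their neighbors, then scale to [0, 1]."""

    def is_valid(v):
        # Accept real numbers only; reject None, strings, booleans and NaN.
        return isinstance(v, (int, float)) and not isinstance(v, bool) and v == v

    cleaned = list(values)
    for i, v in enumerate(values):
        if not is_valid(v):
            neighbours = [
                values[j]
                for j in (i - 1, i + 1)
                if 0 <= j < len(values) and is_valid(values[j])
            ]
            # Assumption: fall back to 0.0 when no valid neighbor exists.
            cleaned[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0

    peak = max(cleaned) if cleaned else 0.0
    if peak <= 0:
        # Assumption: non-negative resource values; avoid division by zero.
        return [0.0 for _ in cleaned]
    return [v / peak for v in cleaned]


# Hypothetical raw log values with one missing and one malformed entry.
normalized = clean_and_normalize([2, None, 4, "bad", 8])
```

Only the original neighbors are used for imputation, so two adjacent invalid entries do not feed each other's estimates.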

A. ALGORITHMS COMPLEXITY
Our proposed approach captures logs during a time window of fixed size (sizeof(X)), so in general the complexity of the whole algorithm is constant. Since the buffer size is also fixed, the time complexity is also constant. The complexity analysis differs for the individual algorithms. For Algorithm 1 (Logs Cleaning and Normalization) and Algorithm 2 (Preparation and Initialization), the complexity is O(1). For Algorithm 3 (Merge), the time complexity depends on the minimum Silverman bin number (Min) and the log dimensions (Attno = sizeof(X)). There are nested dependent loops that [...] O(1), where the K-Means class numbers are set and the center initialization is updated using the projection multiplication process. This is because the update process is limited by a constant iteration number that does not exceed the standard deviation. The cost of Algorithm 5 (Validation Phase) is O(Min × Attno) for two nested independent loops. The overall complexity of our proposed approach is therefore O(Min × Attno).

Algorithm 4 Correlation and Clustering Phase
Require: Bba1, Bpla1, Ha1, Ha2, µ1, µ2

A. IMPLEMENTATION LIBRARIES AND TOOLS
The Python programming language is used for the development and implementation of the proposed methodology. For machine learning and statistical analysis, the following Python libraries are used: (i) [...] [47], [48] for K-Means clustering and (ii) the Python library developed by Vanderplas [49] for kernel density estimation.

Algorithm 4 Correlation and Clustering Phase (continued)
11: if E_ij ≤ Eth && V_ij ≤ 0.6 && S_ij ≥ Sth then
12:   MM_i,j ← 1 /* one means a high correlation and clustering. */
13: if V_ij ≤ 0.6 && S_ij ≥ Sth then
14:   MM_i,j ← 0.75
15: if S_ij ≥ Sth then
16:   [...]
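To illustrate the kernel density estimation step named in this subsection, the following sketch computes a Silverman rule-of-thumb bandwidth and a Gaussian kernel density estimate using only the Python standard library. It is an illustration under stated assumptions: it does not use the libraries cited in [47] to [49], the names `silverman_bandwidth` and `gaussian_kde` are hypothetical, and the way the proposed method converts the estimate into a bin number is not reproduced here.

```python
import math
import statistics
from typing import Callable, List


def silverman_bandwidth(samples: List[float]) -> float:
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    n = len(samples)
    sigma = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    iqr_scale = (q3 - q1) / 1.34
    # Use the smaller of the two spread estimates; fall back to sigma if the IQR is zero.
    spread = min(sigma, iqr_scale) if iqr_scale > 0 else sigma
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(samples: List[float], bandwidth: float) -> Callable[[float], float]:
    """Return a density function built from one Gaussian kernel per sample."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x: float) -> float:
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


samples = [0.10, 0.15, 0.20, 0.22, 0.60, 0.65, 0.70, 0.90]  # toy normalized log values
h = silverman_bandwidth(samples)
pdf = gaussian_kde(samples, h)

# Numerical check that the estimated density integrates to roughly one.
grid = [i / 100 for i in range(-200, 301)]
area = sum(pdf(x) for x in grid) * 0.01
print(h, area)
```

The bandwidth controls how strongly neighboring samples are smoothed together, which is why it is a natural guide for how many distinct groups a trace can support.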

B. RESULTS
In this section, a full analysis of the proposed dynamic clustering method is presented. For attribute correlation and clustering mapping, the relation between all attributes is shown in Table 5. In this table, the upper-right triangle, separated by blue cells, gives the total MSE (J) as a ratio normalized to the maximum value of 100000, which represents the clustering accuracy. The bottom-left triangle gives the number of classes k used for the clustering. There is an inverse relation between the number of classes and the total MSE: the lower the number of classes, the higher the error. In some cases, however, such as between datacenter attributes DC5 and DC4, the error is very small because the distributions of the statistical points are close (e.g. machine list and CPU resource units are correlated factors).
The algorithm tests the relation between all attribute types to find the magnitude of the attribute relation. The relation between the attributes is described using the normalized MSE, as shown in Equation 7. A good relation has a lower index value. Clustering defines the number of classes and the member points that belong to each class, which creates labeled data.
The matrix in Table 5 shows the results of all the correlation experiments between all trace attributes, which indicate the matching relation between the attributes. There are two matching aspects. The first is logical matching, where the correlated vectors belong to the same characteristics set. For example, in Table 3 workload clustering is defined based on the correlation of all attributes with each other, such as defining job classes according to the number of submitted tasks. The second is statistical analysis of the log trace correlation, which measures the nature of the relation between the traces.
In Figure 5, jobs (WL1) initially have 7 bins distributed over the jobs range, which become three bins after histogram cleaning. Similarly, tasks (WL2) have a distribution of 403 histogram bins, which become 164 active sets after cleaning. The merging set that shows the correlation between jobs and tasks uses the minimum number of classes, because the population density (the PDF) of jobs is concentrated in a narrow range, as Figure 6 depicts. In Figure 6, the CDF shows the probability of tasks and jobs occurring within small ranges. This gives a small number of job classes based on their relation to the number of task classes, which is a good logical attribute for the workload characteristics, as the labeled points in Figure 7 indicate.
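The bin cleaning and merging described above can be sketched as follows. This is a minimal illustration with toy counts, not the Algorithm 3 listing: the function names are hypothetical, and defining the merge step as the ceiling of the ratio between the two bin counts is an assumption made for this example.

```python
import math
from typing import List, Tuple


def drop_empty_bins(counts: List[int]) -> List[Tuple[int, int]]:
    """Keep (original_bin_index, count) pairs for bins that contain points."""
    return [(i, c) for i, c in enumerate(counts) if c > 0]


def merge_to_size(counts: List[int], target: int) -> List[int]:
    """Merge consecutive bins so that at most `target` bins remain."""
    step = math.ceil(len(counts) / target)  # assumed definition of the merge step size
    return [sum(counts[i:i + step]) for i in range(0, len(counts), step)]


jobs = [5, 0, 0, 3, 0, 0, 2]               # toy histogram: 7 bins, 3 of them active
tasks = [1, 2, 0, 4, 1, 3, 0, 2, 2, 1]     # toy histogram: 10 bins, 8 of them active

active_jobs = [c for _, c in drop_empty_bins(jobs)]
active_tasks = [c for _, c in drop_empty_bins(tasks)]

# The longer vector is reduced to the size of the shorter one.
merged_tasks = merge_to_size(active_tasks, len(active_jobs))
print(active_jobs, merged_tasks)
```

Merging by summing consecutive bins preserves the total point count, so the merged histogram still describes the same population at a coarser resolution.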
Repeating the correlation process for all workload and data center configuration attributes reveals some relations that are not directly connected or correlated, such as the one between submitted jobs (WL1) and CPU demands (WL3). As Figure 8 depicts, the histogram distribution of jobs (7 bins) is not related to that of CPU demands (75 bins), so classifying jobs based on CPU demands will not be accurate. In Figure 9, the PDFs show two inconsistent distributions, and the job CDF slope is close to one, which indicates that jobs arrive in batches with a fast growth rate. The CPU demand PDF, on the other hand, is more spread out, its CDF slope is lower in magnitude, and its variance is higher. Figure 10 shows the cluster labels of jobs and CPU demands. From Figure 10 we can conclude that most CPU demand bins are segregated from the job bin ranges, except for a limited number of job classes, so many granular details of the CPU demand classes may be lost.
In contrast, a logical attribute pairing for the workload characteristics, shown in Figure 13, indicates a high correlation between tasks and CPU demands. Figure 11 shows a high overlap in the population histogram between the number of tasks and CPU demands. Both attributes have a large number of classes: 403 for WL2 and 75 for WL3. The PDF and CDF in Figure 12 depict the clustering result of CPU demands per submitted task. They show that the CPU demand distribution grows faster with respect to tasks and is concentrated in a narrow range with a high slope value, whereas the submitted tasks distribution is wider in range and has a lower CDF slope. After the merging and bin cleaning phase of Algorithm 3, the CPU demand bin number becomes 74 and the task bin number becomes 164.
Figure 13 shows the clustering results of CPU demand and tasks for 74 classes, which is the minimum of the CPU demand and task bin numbers. For the datacenter configuration attribute set, the mapping between all DC configuration attributes is listed in Table 4. All the attributes are correlated, but the most correlated attributes for the elastic model are as follows: machine list (DC2) correlated with the datacenter capacity of both available CPU and RAM resources (DC9, DC10) and with the provisioned CPU and RAM resources (DC3, DC4). The total MSE of each correlation is given in Table 5, and some examples are discussed in the following text. Not all of them are covered, because of the paper size limitation and the similarity of the result analysis. The similarity among DC attributes comes from the datacenter configuration method, which simply adds or removes physical hosts with minor differences in machine resource units.
Figure 14 shows the mapping of two histograms: provisioned RAM (DC4) with 34 bins and datacenter RAM capacity (DC10) with 78 bins. The PDF of provisioned RAM is more compact than that of the datacenter configuration capacity (number of machines), which is continuously increasing. This means that the DC machines are being expanded (scaled out), as Figure 15 depicts. There is some lag in hardware scaling compared to resource provisioning, which is due to the capacity resource scaling, as Figure 17 depicts. The full available resources are provisioned as they become available, for both CPU and RAM. This reflects that the existing machine lists are scaled out or that newly added machines are included in the DC production resources. The clustering of datacenter RAM capacity with respect to RAM provisioning is depicted in Figure 16. The classes here are very clear, with scaling units (RAM resource units of almost equal scale) that can participate in each configuration time window.
Another example that should be considered when clustering elastic attributes is the correlation between DC and WL attributes, which is not logical for elastic model correlation (because it reflects the datacenter actions, not the elastic scaling condition). Figure 18 depicts RAM demands (WL4), with 125 bins, against provisioned RAM (DC4), with 34 bins. The PDF and CDF distributions in Figure 19 show that the resources are demanded over a longer duration, whereas provisioning is done faster, in a shorter period, as the CDF slope indicates. This violates the elastic feature (the same scale of RAM resource units) that participates in each configuration time window. Labeling this kind of relation (provisioned to demand) directly, without checking the elastic conditions, will cause violations in elastic scaling and the SLA, as shown in Figure 20.
The elastic model matches the provisioned class to the demand class using a simple look-up method that connects the labeled demand class to the provisioned data center configuration. Mapping resources to demands in the cloud resource orchestrator depends on an appropriate link between the demand workload class and the data center provisioned resource class. This link is supposed to be independent of pre-existing actions (the old map between demands and provisioned resources). The solution is to cluster the scaling resource units with respect to the provisioned resources and then map them to demands. This ensures more effective actions by the cloud elastic manager. The results show the overlap between correlated attributes, which indicates the relation between each workload demand class and its best-matching provisioned class.
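The look-up idea in this paragraph can be sketched as a table from demand class labels to provisioned class labels. The sketch below is purely illustrative: the centroid values, the names `nearest_class` and `map_demand`, and the rule of choosing the smallest provisioned class whose centroid covers the demand centroid are all assumptions, since the matching rule itself is not specified in this section.

```python
from typing import Dict, List


def nearest_class(value: float, centroids: List[float]) -> int:
    """Label a value with the index of its closest centroid."""
    return min(range(len(centroids)), key=lambda k: abs(value - centroids[k]))


demand_centroids = [0.1, 0.4, 0.8]   # hypothetical workload demand classes
provision_centroids = [0.5, 1.0]     # hypothetical provisioned resource classes

# Build the look-up table once per configuration time window.
lookup: Dict[int, int] = {}
for d, demand in enumerate(demand_centroids):
    covering = [p for p, cap in enumerate(provision_centroids) if cap >= demand]
    # Assumed rule: smallest covering class, or the largest class if none covers.
    lookup[d] = (min(covering, key=lambda p: provision_centroids[p])
                 if covering else len(provision_centroids) - 1)


def map_demand(value: float) -> int:
    """Label an incoming demand and return its provisioned class."""
    return lookup[nearest_class(value, demand_centroids)]


print(lookup, map_demand(0.45), map_demand(0.75))
```

Because the table is rebuilt from the current cluster centroids rather than from past provisioning actions, the mapping stays independent of the old demand-to-resource map.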

C. VALIDATION
We adopted three ways to validate clustering accuracy and consistency. First, clustering accuracy is determined by running K-Means with varying k values. A statistical measurement shows that k values close to the extracted number of bins produce the maximum Euclidean distance between cluster member nodes, with the minimum rate per bin. This indicates a good relationship between the member points of each bin cluster.
The second method tests the cluster member consistency using the Silhouette method. In this validation method, K-Means clustering is evaluated on a scale in the range [−1, 1], where a high value means the member objects are well matched. Figure 21 depicts the relationship between cluster members, which is not negative and is more than 0.6. This indicates that each object is well matched to its own cluster. The third validation method uses the slope of the Cumulative Distribution Function (CDF). The CDF, $F(X) = \sum_{i=0}^{M} f(x_i)$, which is defined as the accumulation of the bin group probabilities, describes the behavior of each cloud log attribute population as a random variable. Using the CDF slope, we can describe the growth of the attribute bin probability within the vector range. Close slope values between two attribute vectors indicate a high relation between these two attributes. The slope of the CDF curve indicates the speed at which probability density accumulates over events. With large slope values, the probability intensity of the events is high, whereas with smaller slope values the events in that range are rare and the probability intensity is lower. Using this method, we can identify the highest range in the two attribute populations that can be correlated and grouped, by comparing with the clustering methods over the population ranges. This gives an instantaneous view of specific events to check whether there is a relation between consecutive bins. If the slopes of consecutive bins are equal, then CDF segmentation finds a high relation between the member points of these bins.
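The CDF slope test described above can be sketched from histogram counts alone. This is an illustration with toy counts rather than the validation code of the proposed method: the function names are hypothetical, the bins are assumed to have unit width, and the 0.7 factor is the threshold on the maximum slope quoted in the description of Algorithm 4.

```python
from itertools import accumulate
from typing import List


def cdf(counts: List[int]) -> List[float]:
    """Accumulate bin probabilities into an empirical CDF."""
    total = sum(counts)
    return list(accumulate(c / total for c in counts))


def cdf_slopes(counts: List[int], bin_width: float = 1.0) -> List[float]:
    """Finite-difference slope of the CDF across each bin."""
    values = [0.0] + cdf(counts)
    return [(values[i + 1] - values[i]) / bin_width for i in range(len(counts))]


def steep_bins(counts: List[int], factor: float = 0.7) -> List[bool]:
    """Flag bins whose CDF slope reaches `factor` times the maximum slope."""
    slopes = cdf_slopes(counts)
    threshold = factor * max(slopes)
    return [s >= threshold for s in slopes]


concentrated = [1, 6, 2, 1]   # toy attribute with most points in one bin
spread = [2, 3, 2, 3]         # toy attribute with points spread over all bins

print(cdf_slopes(concentrated), steep_bins(concentrated))
print(cdf_slopes(spread), steep_bins(spread))
```

A concentrated attribute produces one dominant slope, while a spread attribute produces several bins of similar slope, which is the contrast the slope comparison between two attributes relies on.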
Because the probability volume is obtained by area integration with the histogram bins as the increment step, the member points in the same bin have a very high relation, and the points in neighboring bins are also related. Updating the class boundaries of the bin centroid groups relies on the K-Means minimum-distance inertia value between members of the same bin group and on the total mean square error, as shown in Equation 7. Combining related centroids (merging similar bin groups) reduces the M attribute vectors and the K-Means iteration cost, because a new sample mean and variance are found on the merged bins of the attribute vector instead of using random substitution. In this way, the number of centroid updates is reduced, because the mean and variance of the bin distribution are highly correlated under the KDE probability distribution.
The validation of this work is based on the aforementioned three methods, which can be applied to all attribute clustering combinations regardless of whether they are logically connected (for the elastic condition). The first method calculates the maximum Euclidean Distance (ED) over the elements of all clustering group classes and compares it with the standard deviation error (SE). In all cases, we found ED < SE, which indicates that good clusters are formed using the proposed methodology. In the second method, the Silhouette method is applied to all clustering cases, achieving a reasonable data consistency index (Silhouette score) of 0.4, as Figure 21 depicts. The Silhouette coefficients of all clustering classes exceed the Silhouette score value, indicating good cluster formations. Finally, the last method checks the normalized MSE calculated using Equation 7. A value less than 0.70 of the maximum error ratio indicates a good cluster. With these constraints we achieved a reliable dynamic clustering method, as Table 5 depicts. In some cases, however, the absolute mean square error between some attributes violates the constraint. We found 24 such cases out of a total of 288, which is about 8.33%. The accuracy of the clustering can therefore be considered as 91.67%, which is an acceptable value.
To test the behavior of the algorithms and their reliability, we ran the experiment 10 times for each of the two attributes. Figure 22 depicts the variation (confidence interval) in Silhouette and Euclidean distance for each experiment. The outputs are normalized to the maximum value to make the figure more readable. The confidence range is represented in the graph as a black middle bar on the output value. From the figure we can conclude that the error range for each experiment does not exceed 0.093 in the WL1 and WL3 cases, which is an acceptable value and shows that the algorithms work properly.
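For readers unfamiliar with the Silhouette check used in the second validation method, the following sketch computes per-point Silhouette coefficients for one-dimensional data in plain Python. It illustrates the standard definition only: the data are toy values, the function name is hypothetical, and it does not reproduce the scores reported in Figure 21.

```python
from typing import Dict, List


def silhouette_scores(points: List[float], labels: List[int]) -> List[float]:
    """Per-point Silhouette coefficient s = (b - a) / max(a, b) for 1-D data."""
    clusters: Dict[int, List[float]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores: List[float] = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(abs(p - q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [0.10, 0.12, 0.11, 0.80, 0.82, 0.81]
good = silhouette_scores(points, [0, 0, 0, 1, 1, 1])   # labels follow the two groups
bad = silhouette_scores(points, [0, 1, 0, 1, 0, 1])    # labels ignore the two groups
good_mean = sum(good) / len(good)
bad_mean = sum(bad) / len(bad)
print(good_mean, bad_mean)
```

Values near 1 indicate points that sit well inside their own cluster, values near 0 indicate points on a boundary, and negative values indicate points that are closer to another cluster.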

D. PERFORMANCE EVALUATION
The experiments were run with K-Means by selecting the number of clusters and centers dynamically, with the maximum number of iterations fixed to 300 as the default value. The time complexity achieved with the proposed method is reduced to O(1) rather than O(k × p). This is achieved by initializing the K-Means class centroids and updating them using the variance as a guide, which reduces the number of K-Means iterations. The execution time is significantly reduced, as shown in Figure 23. With the proposed enhanced method, which selects the number of classes and initializes the centroids using statistical measurements of the attributes instead of random values, the number of iterations is reduced to a maximum of 10, as Figure 24 depicts. This is a significant reduction from the standard K-Means method, where the maximum number of iterations is found to be 42. The time complexity of the validation methods depends on the bin number, which is acceptable. The only challenge in validation is calculating the CDF, which uses an integration method. However, this is not a major issue, since the validation process can be done offline.
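The effect of centroid initialization on the iteration count can be illustrated with a small one-dimensional Lloyd-style K-Means. This is a toy sketch, not the experimental setup of this paper: the data, the function name `kmeans_1d`, and the convergence tolerance are assumptions, and the iteration counts it prints are not the values reported in Figure 24.

```python
from typing import List, Tuple


def kmeans_1d(points: List[float], centroids: List[float],
              max_iter: int = 300, tol: float = 1e-9) -> Tuple[List[float], int]:
    """Run Lloyd iterations from the given centroids; return centers and iterations used."""
    centers = list(centroids)
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest center.
        groups: List[List[float]] = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            groups[k].append(p)
        # Update step: each center moves to the mean of its group (unchanged if empty).
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if max(abs(u - c) for u, c in zip(updated, centers)) <= tol:
            return updated, iteration
        centers = updated
    return centers, max_iter


points = [0.10, 0.12, 0.11, 0.50, 0.52, 0.51, 0.90, 0.92, 0.91]

# Centroids seeded with the bin means, in the spirit of the guided initialization.
seeded_centers, seeded_iters = kmeans_1d(points, [0.11, 0.51, 0.91])

# Centroids seeded poorly, all inside the first group.
poor_centers, poor_iters = kmeans_1d(points, [0.10, 0.11, 0.12])

print(seeded_iters, seeded_centers)
print(poor_iters, poor_centers)
```

In this toy case the seeded run stops after its first pass because the bin means already are the cluster means, whereas the poorly seeded run needs more passes and settles on a worse partition.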

VI. CONCLUSION
In this work, we have introduced a customized guided K-Means clustering method for the cloud elastic model that depends on KDE and the Silverman method to find the initial centers and the number of classes. The proposed approach can reduce K-Means time complexity and enhance clustering accuracy. From our detailed analysis, a reduction of about 75% on average is obtained in execution time compared with the regular K-Means algorithm, and an 87.5% reduction is obtained in K-Means iterations. With this method, datacenter configuration and workload demands are mapped dynamically with adaptation to changes in their characteristics, which allows the cloud management system to accommodate any type of workload and datacenter hardware configuration. Our next work will address the classification of workload and DC configuration for the elastic scaling module in the cloud management system.