A Novel Approach for Gaussian Mixture Model Clustering Based on Soft Computing Method

Determining the number of clusters in a data set is a significant and difficult problem in cluster analysis. In this study, a new model-based clustering approach is proposed for estimating the number of clusters. In the proposed method, the number of components in each variable is determined using univariate Gaussian mixture models. The numbers of alternative cluster centres and mixture models are then determined from the numbers of components in the heterogeneous variables. In this study, appropriate Gaussian mixture models are determined, for the first time, with the help of a "mixture model soft computing method" (MMSCM). Vector arrays showing the number and locations of the clusters in the appropriate Gaussian mixture models are created, the parameters of the models that fit these arrays are estimated, and the best model is selected using information criteria. The clustering performance of the proposed method is compared with that of the Gaussian mixture model clustering and model selection methods implemented in the R packages mclust, clustvarsel, varselLCM, selvarMix and vscc. Each method is used to determine the number of clusters in the synthetic-1, synthetic-2, Iris and Landsat Satellite Image data sets, and the correct classification rate is evaluated. The results revealed that the proposed method gives better results both in determining the number of clusters and in correct classification rate. The novelty of the study is a new model-based dimension reduction method for estimating the number of clusters, together with a deterministic clustering approach that achieves clustering and classification success on the reduced data.


INTRODUCTION
Model-based clustering is widely used in cluster analysis to cluster data arising from a mixture of Gaussian distributions. McLachlan and Rathnayake, Bozdogan, Scrucca and Raftery, and McNicholas are among the authors who use mixtures of multivariate Gaussian distributions in cluster analysis [1][2][3][4].
In model selection studies on the perspectives and strategies of mixture models, Celeux et al. proposed using cluster analysis based on mixture models to determine the number of components ($g$) in finite mixture models [5]. In multivariate data, the components of the heterogeneous variables are used to determine the number and location of the clusters in the mixture model [6]. Each sub-group (component) in a variable corresponds to at least one cluster in the mixture model [7]. In model-based clustering, mixture models are created according to the number of components in the variables or in subsets of variables. A variable with $g < 2$ components (that is, a single component) is called a homogeneous variable; because such a variable has no effect on forming a subset, it is excluded from the calculations [8]. Galimberti and Soffritti obtained multiple cluster structures in mixture models depending on the number and location of the sub-groups in the variables [9]. Galimberti et al. represented the mixture components of heterogeneous variables as variable sub-vectors and described how the components in the variables affect model-based clustering; they explained that each sub-vector in the variables carries information on at least one cluster [10]. Akogul and Erisoglu proposed a model-based clustering method that uses the Analytic Hierarchy Process (AHP) to reveal the cluster structure of a data set; the AHP was used to determine the best model among the candidate models based on several criteria [11]. A variable/feature selection approach based on Bayes factors has been used to select the best model among the subsets that arise in model-based clustering; the most appropriate model is chosen among the candidate models according to a criterion based on the Bayesian Information Criterion (BIC) difference [12].
In cluster analysis, it is very important to prevent the loss of information in the variables that are removed during variable selection. Subspace learning feature selection methods improve learning performance by using the local discriminant information and the geometric information in the original data [13][14]. In multivariate data, mixture models have been created based on the number and volume of the components in the variables [15]. Fop and Murphy reviewed variable selection methods in model-based clustering in terms of relevant and irrelevant variables. They called the variables that do not affect cluster formation and carry no useful information about group membership "redundant variables". In their paper, the mclust-based methods mclust, clustvarsel, varselLCM, selvarMix and vscc were applied to synthetic data and their results were compared [16].
The choice of the assignment algorithm, and in particular of the distance function, is important when assigning observations to the components of the variables; therefore, adapted similarity measures are used in cluster analysis [17][18][19][20]. Observations are assigned to the components of the heterogeneous variables based on the component means. Since the volume of each component differs, the k-means algorithm assigns a different number of observations to each component according to its mean. These differing numbers of observations give suitable solutions for the EVI-VVV types of parsimonious models, which have different covariance matrix structures. The covariance matrices obtained from components of different number and size significantly affect the number and location of the clusters in mixture model clustering [21]. Finite mixture models in a grid structure, based on the numbers of components, are obtained from multivariate Gaussian mixture distributions, and the best models among them are selected based on information criteria [22].
In this study, a new model-based approach is proposed for estimating the number of clusters in multivariate data based on Gaussian mixture models (GMMs). The algorithmic method, developed based on soft computing, consists of variable/feature selection, creation of mixture models and selection of the best model. The proposed method was applied to two synthetic data sets and two real data sets, namely Iris (UCI) and the Landsat Satellite Image (LSI) data set. The results were compared with those of the well-known methods mclust, clustvarsel, varselLCM, selvarMix and vscc, and show that the proposed clustering algorithm performs as well as or better than these existing approaches.
The contributions of this paper are as follows:
(1) A variable/feature selection method was developed using univariate mixture models of the data set.
(2) By defining grid-structured mixture models based on the numbers of components in the variables, the number of models in the search space was obtained.
(3) Appropriate GMMs (AGMMs) were obtained according to the numbers of components in the variables of the reduced data. Vector representations were defined for the AGMMs, and the parameters of the models were calculated from linear models.
(4) Information criteria for the finite mixture models are calculated, and the best model is selected based on these criteria.
The study is organized as follows. In Section 2, the mixture model soft computing method (MMSCM) and the model-based clustering stages of the proposed cluster number estimation approach are explained. In Section 3.1, all steps are explained on the simple synthetic-1 data set and the more comprehensive fifteen-variable synthetic-2 data set, respectively, to make the optimum cluster number estimation with MMSCM easy to follow. In Section 3.2, the proposed method (MMSCM) is applied to two real data sets and the results are compared with well-known GMM-based clustering methods. In Section 4, the results of the study are discussed and compared, and the performance of the method is presented.

A. THE MODEL-BASED CLUSTERING
Grid-structured models are created from the components of each variable in multivariate data. The number of appropriate GMMs (AGMMs) among the grid-structured mixture models is determined by MMSCM model-based clustering. The number and volume of the components of the variables in the mixture model reveal the number and structure of the clusters in multivariate data. An algorithmic clustering method consisting of five steps is proposed for estimating the number of clusters and clustering the data. Model-based clustering assumes that a data set consists of several clusters with different distributions, and all variables in the data set are modelled by a mixture of these distributions. It considers a set of $n$ observations in $p$ dimensions, so that an observed random sample is expressed as $\mathbf{y} = (\mathbf{y}_1, \dots, \mathbf{y}_n)$ [23]. The probability density function of a finite mixture distribution is

$$f(\mathbf{y}_j; \Psi) = \sum_{i=1}^{g} \pi_i f_i(\mathbf{y}_j; \theta_i), \qquad (1)$$

where $f_i(\mathbf{y}_j; \theta_i)$ are the probability density functions of the components and $\pi_i$ denotes the mixing weight (the volume of the cluster in the mixture model), with $0 < \pi_i < 1$ and $\sum_{i=1}^{g} \pi_i = 1$ $(i = 1, \dots, g)$. The parameter vector $\Psi = (\pi, \theta)$ contains all of the parameters of the mixture model. Here, $\theta = (\theta_1, \dots, \theta_g)$, where $\theta_i$ denotes the unknown parameters of the probability density function of the $i$th component. In Equation (1), the number of components or clusters is represented by $g$.
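The mixture density function of the multivariate normal distribution is given as

$$f(\mathbf{y}_j; \Psi) = \sum_{i=1}^{g} \pi_i \, \phi(\mathbf{y}_j; \boldsymbol{\mu}_i, \Sigma_i), \qquad (2)$$

where $\phi(\mathbf{y}_j; \boldsymbol{\mu}_i, \Sigma_i)$ are multivariate Gaussian densities of the form

$$\phi(\mathbf{y}_j; \boldsymbol{\mu}_i, \Sigma_i) = (2\pi)^{-p/2} |\Sigma_i|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{y}_j - \boldsymbol{\mu}_i)^\top \Sigma_i^{-1} (\mathbf{y}_j - \boldsymbol{\mu}_i) \right\}, \qquad (3)$$

and $\boldsymbol{\mu}_i$ and $\Sigma_i$, for $i = 1, \dots, g$, denote the mean vector (location of the cluster in the mixture model) and the covariance matrix (shape of the cluster in the mixture model), respectively [24]. All unknown parameters of the model are collected in $\Psi = (\pi_1, \dots, \pi_{g-1}, \theta)^\top$, where $\theta$ consists of the elements of $\boldsymbol{\mu} = (\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_g)$ and $\Sigma = (\Sigma_1, \dots, \Sigma_g)$.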
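The mixture density in Equations (2) and (3) can be made concrete with the minimal sketch below. It uses only the Python standard library to evaluate a Gaussian mixture density at one point. It is an illustration, not the authors' implementation. The helper names `cholesky`, `mvn_pdf` and `mixture_pdf` are hypothetical, and every covariance matrix is assumed to be symmetric positive definite.

```python
import math


def cholesky(sigma):
    """Lower-triangular L with sigma = L L^T (sigma symmetric positive definite)."""
    p = len(sigma)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def mvn_pdf(y, mu, sigma):
    """Multivariate Gaussian density phi(y; mu, Sigma), Eq. (3)."""
    p = len(y)
    L = cholesky(sigma)
    diff = [yi - mi for yi, mi in zip(y, mu)]
    # Forward substitution L z = (y - mu); then (y - mu)^T Sigma^{-1} (y - mu) = z^T z.
    z = []
    for i in range(p):
        z.append((diff[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    mahalanobis = sum(zi * zi for zi in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(p))  # log |Sigma|
    return math.exp(-0.5 * (p * math.log(2 * math.pi) + log_det + mahalanobis))


def mixture_pdf(y, weights, means, covs):
    """Finite Gaussian mixture density f(y; Psi) = sum_i pi_i phi(y; mu_i, Sigma_i), Eq. (2)."""
    return sum(w * mvn_pdf(y, m, s) for w, m, s in zip(weights, means, covs))
```

Working through a Cholesky factor avoids forming $\Sigma_i^{-1}$ explicitly. The same factor also supplies $\log|\Sigma_i|$.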

B. DETERMINATION OF NUMBER OF COMPONENTS IN VARIABLES
In finite mixture models, correctly determining the number of components in each variable allows the number of clusters of the mixture model to be calculated correctly [25]. Well-known algorithms such as GMM, k-means, k-nearest neighbours (k-NN), support vector machines (SVM) and decision trees (DT) are used to determine the number of components in mixture models. In the proposed method, univariate GMMs (U-GMMs) are used as the unsupervised clustering method for determining the number of components in the variables.
Each component of a U-GMM corresponds to a component (sub-group) of the variable. For a variable $x$, the U-GMM is

$$f(x; \psi) = \sum_{i=1}^{g} \pi_i \, \phi(x; \mu_i, \sigma_i), \qquad (4)$$

where $f(x; \psi)$ denotes the density function of the univariate Gaussian mixture distribution, $g$ denotes the number of components, $\pi_i$ denotes the mixing weights, and $\phi(x; \mu_i, \sigma_i)$ denotes the component probability density function,

$$\phi(x; \mu_i, \sigma_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left\{ -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right\}, \qquad (5)$$

where $\mu_i$ denotes the mean and $\sigma_i$ denotes the standard deviation of the Gaussian distribution. The log-likelihood (logL) and BIC [26] values obtained from the U-GMMs are used to determine the number of components in each variable. The parameters $\pi_i$, $\mu_i$ and $\sigma_i$ of the U-GMMs are estimated with the Expectation-Maximization (EM) algorithm. The likelihood is then calculated from the estimated parameters, and the BIC value is calculated from the likelihood. The number of components in each variable is determined according to these information criteria. The mixing weights and covariance matrices of the mixture model are indirectly affected by the number of observations in the components. For the multivariate GMMs whose clusters correspond to the components in the grid structure, an unconstrained (full) covariance matrix is used:

$$\Sigma_i = \begin{pmatrix} \sigma_{i,11} & \cdots & \sigma_{i,1p} \\ \vdots & \ddots & \vdots \\ \sigma_{i,p1} & \cdots & \sigma_{i,pp} \end{pmatrix}. \qquad (6)$$

This type of covariance matrix is used because the components in the variables have different sizes.
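The component-number step (the U-GMM of Equations (4) and (5) followed by BIC) can be sketched as follows. A small EM loop fits the U-GMM for $g = 1, \dots, g_{\max}$. The $g$ with the largest BIC value is kept, where BIC $= 2\log L - (3g - 1)\log n$ in the "larger is better" convention. The count $3g - 1$ covers $g - 1$ free weights plus $g$ means and $g$ standard deviations. This is a hypothetical pure-Python sketch, not the R code used in the study. The quantile-based initialisation, the fixed iteration count and the standard-deviation floor `min_sd` are assumptions, not settings reported in the paper.

```python
import math
import random


def norm_pdf(x, mu, sd):
    """Univariate Gaussian density, Eq. (5)."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (math.sqrt(2 * math.pi) * sd)


def fit_ugmm(x, g, n_iter=100, min_sd=1e-3):
    """EM for a g-component univariate Gaussian mixture, Eq. (4).

    Returns (weights, means, sds, logL)."""
    n = len(x)
    xs = sorted(x)
    # Deterministic start: means at evenly spaced order statistics, common sd.
    means = [xs[int((i + 0.5) * n / g)] for i in range(g)]
    m_all = sum(x) / n
    sd_all = max(math.sqrt(sum((v - m_all) ** 2 for v in x) / n), min_sd)
    sds = [sd_all] * g
    weights = [1.0 / g] * g
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each component.
        tau = []
        for v in x:
            dens = [w * norm_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) or 1e-300
            tau.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for i in range(g):
            ni = max(sum(t[i] for t in tau), 1e-12)
            weights[i] = ni / n
            means[i] = sum(t[i] * v for t, v in zip(tau, x)) / ni
            var = sum(t[i] * (v - means[i]) ** 2 for t, v in zip(tau, x)) / ni
            sds[i] = max(math.sqrt(var), min_sd)
    log_lik = sum(
        math.log(sum(w * norm_pdf(v, m, s) for w, m, s in zip(weights, means, sds)) or 1e-300)
        for v in x
    )
    return weights, means, sds, log_lik


def select_components(x, g_max=4):
    """Number of components g that maximises BIC = 2 logL - (3g - 1) log n."""
    n = len(x)
    best_g, best_bic = None, -math.inf
    for g in range(1, g_max + 1):
        log_lik = fit_ugmm(x, g)[3]
        bic = 2 * log_lik - (3 * g - 1) * math.log(n)
        if bic > best_bic:
            best_g, best_bic = g, bic
    return best_g


if __name__ == "__main__":
    random.seed(1)
    two_groups = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(8, 1) for _ in range(100)]
    print(select_components(two_groups, g_max=3))  # two well-separated groups
```

Any variable for which `select_components` returns 1 would be treated as homogeneous in the next step.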
The geometric (standard spectral) decomposition of a covariance matrix can be interpreted as follows:

$$\Sigma_i = \lambda_i D_i A_i D_i^\top, \qquad (7)$$

where the scalar constant $\lambda_i$ denotes the volume, the orthogonal matrix of eigenvectors $D_i$ denotes the orientation, and the diagonal matrix $A_i$ denotes the shape of the covariance matrix, with the form

$$A_i = \operatorname{diag}(a_{i1}, \dots, a_{ip}), \qquad |A_i| = 1. \qquad (8)$$

Using this decomposition of $\Sigma_i$, geometric characteristics of the distributions can be imposed and a suitable model can be generated for $i = 1, \dots, g$, where $g$ and $p$ denote the number of components and the dimension of the mixture model, respectively. For more detailed descriptions of the family of parsimonious covariance matrices and of mixture model types, see [27].
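The decomposition in Equations (7) and (8) can be checked numerically in the bivariate case. The sketch below covers $p = 2$ only. It uses the closed-form eigen-decomposition of a symmetric $2 \times 2$ matrix and the usual normalisation $\lambda = |\Sigma|^{1/p}$, which makes $|A| = 1$. The helper names `decompose_2x2` and `recompose` are hypothetical, and $\Sigma$ is assumed to be positive definite.

```python
import math


def decompose_2x2(sigma):
    """Return (lam, D, A) with sigma = lam * D diag(A) D^T and A[0] * A[1] = 1 (p = 2)."""
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    centre = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    e1, e2 = centre + radius, centre - radius
    # Angle of the principal axis; the columns of D are the unit eigenvectors (orientation).
    theta = 0.5 * math.atan2(2 * b, a - c)
    D = [[math.cos(theta), -math.sin(theta)],
         [math.sin(theta), math.cos(theta)]]
    lam = math.sqrt(e1 * e2)   # volume: |Sigma|^(1/p) with p = 2
    A = [e1 / lam, e2 / lam]   # shape: diagonal of A, with |A| = 1
    return lam, D, A


def recompose(lam, D, A):
    """Rebuild lam * D diag(A) D^T to check the decomposition."""
    return [[lam * sum(D[i][k] * A[k] * D[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

In this parameterisation, VVV lets $\lambda_i$, $D_i$ and $A_i$ all vary across clusters. Constrained types such as EVI fix some of these terms.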
The mixing weights ($\pi_i$) of the models obtained from the covariance matrix in (7) are calculated from the number of observations in each component, $\pi_i = n_i / n$. The mixing weights are the most important parameters for determining the number and structure of the clusters. Unsupervised clustering algorithms such as GMM and k-means can be used to assign observations to the components obtained from the U-GMMs. The means of the observations are used as the cluster centres of the mixture models, and the component to which an observation belongs is determined by its distance from the cluster centres. The k-means assignment algorithm therefore assigns different numbers of observations to the components according to their distances from the cluster centres. In this study, the R packages mclust [28], clustvarsel [29], varselLCM [30], selvarMix [31] and vscc [32] were used to determine the number of components by model-based clustering in the synthetic-1, synthetic-2, Iris and LSI data sets. MATLAB® 2018b was used for the model selection method. After the models are fitted to the data set by maximum likelihood estimation, the best model can be obtained using statistical information criteria for model selection. Variable selection and the assignment of observations to components are shown in Algorithm 1.

Algorithm 1 Variable selection and assignment of observations to components
1. The U-GMM in (4) is applied to each variable $x_k$.
2. In the U-GMMs, the number of components ($g_k$) is determined based on the logL and BIC values.
3. Homogeneous variables ($g_k = 1$) are eliminated, and the algorithm continues with the heterogeneous variables ($g_k \ge 2$).
4. The numbers of components in the variables are used as prior information for the k-means algorithm: with the known $k = g_k$, the observations are assigned to the components they belong to.
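Steps 3 and 4 of Algorithm 1 can be sketched in Python as below. The numbers of components $g_k$ are taken as given, for example from a BIC step. Homogeneous variables are dropped, and a one-dimensional Lloyd-type k-means with $k = g_k$ assigns each observation of a heterogeneous variable to a component. The function names, the dictionary layout and the order-statistic initial centres are illustrative assumptions, not details taken from the study.

```python
def kmeans_1d(x, k, n_iter=100):
    """Lloyd's k-means on a single variable; returns (labels, centres)."""
    xs = sorted(x)
    n = len(xs)
    centres = [xs[int((i + 0.5) * n / k)] for i in range(k)]  # evenly spaced order statistics
    labels = [0] * len(x)
    for _ in range(n_iter):
        # Assign each observation to its nearest centre.
        labels = [min(range(k), key=lambda i: abs(v - centres[i])) for v in x]
        # Move each centre to the mean of its observations (keep it if the cluster is empty).
        new = []
        for i in range(k):
            members = [v for v, lab in zip(x, labels) if lab == i]
            new.append(sum(members) / len(members) if members else centres[i])
        if new == centres:
            break
        centres = new
    return labels, centres


def select_and_assign(data, g):
    """data: variable name -> values; g: variable name -> number of components g_k."""
    # Step 3: drop homogeneous variables (g_k = 1).
    selected = {name: vals for name, vals in data.items() if g[name] >= 2}
    # Step 4: use g_k as k and assign observations to components with k-means.
    assignments = {name: kmeans_1d(vals, g[name])[0] for name, vals in selected.items()}
    return selected, assignments
```

Because the components differ in volume, the resulting label counts differ per component. The mixing weights $\pi_i = n_i / n$ are computed from those counts.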

C. MIXTURE MODEL SOFT COMPUTING METHOD BASED ON THE COMPONENTS OF VARIABLE
The minimum and maximum numbers of clusters in the mixture model are denoted by $C_{\min}$ and $C_{\max}$ and are defined as

$$C_{\min} = \max_{k} \{g_k\}, \qquad C_{\max} = \prod_{k=1}^{p} g_k, \qquad (9)$$

where $p$ represents the dimension of the data and $g_k$ represents the number of components of the $k$th heterogeneous variable. The number of GMMs based on the components in the heterogeneous variables, denoted by $M$, is

$$M = 2^{C_{\max}} - 1, \qquad (10)$$

where the term "$-1$" in Equation (10) removes the null model. Theorem 1. The number of ways to form clusters in variables with $g_k$ components is given by Equation (11). Proof. See Odell and Duran (1974, p. 26) [33]. Definition 1. A function $f$ is defined between the cluster centres corresponding to the components of the variables and the AGMMs obtained from the orientations of these cluster centres. This function defines a "one-to-one and onto" relationship between the components of the variables and the number of AGMMs. The domain of $f$ corresponds to the numbers of components in the variables, and the number of AGMMs is obtained as the range of the function, Equation (12), where $g_k$ and $c$ correspond to the components of the $k$th dimension of the multivariate data and to the clusters in the mixture model, respectively. The number of clusters $c$ lies between the minimum and maximum numbers of clusters that can occur in the mixture models, $C_{\min} \le c \le C_{\max}$.
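Equations (9) and (10) reduce to simple arithmetic on the component counts. The minimal sketch below assumes $C_{\min} = \max_k g_k$ and $C_{\max} = \prod_k g_k$ for Equation (9). For the synthetic-1 components ($g_2 = 2$, $g_3 = 3$) it reproduces $C_{\min} = 3$, $C_{\max} = 6$ and $M = 63$. The function name `cluster_range` is hypothetical.

```python
from math import prod


def cluster_range(components):
    """C_min, C_max and M from the component counts g_k of the heterogeneous variables."""
    c_min = max(components)      # C_min = max_k g_k
    c_max = prod(components)     # C_max = prod_k g_k (number of grid cells)
    m = 2 ** c_max - 1           # M = 2^C_max - 1 (the null model is excluded)
    return c_min, c_max, m
```

$M$ grows exponentially in $C_{\max}$. This growth is why removing homogeneous variables before building the grid matters.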

D. STRUCTURE OF GRID-BASED POSSIBLE MIXTURE MODELS
In this study, a novel clustering method is proposed to calculate, with the soft computing method, the number of mixture models in the grid structure based on the components in the heterogeneous variables. The mean, covariance matrix and mixing weight were calculated from the observations in each component of the variables that make up the mixture models in multivariate data. Each appropriate model is defined as

$$f(\mathbf{y}; \Psi) = \sum_{i=1}^{c} \pi_i \, \phi(\mathbf{y}; \boldsymbol{\mu}_i, \Sigma_i), \qquad (13)$$

where $\Sigma_i$ are the variance-covariance matrices of the component Gaussian density functions for $i = 1, \dots, c$. Each possible mixture model corresponds to a vector representation. For example, the vector "10110100" corresponds to one mixture model in the grid structure determined by the soft computing method. The number of clusters in a model is given by the structure block $d(\mathbf{v})$, called the degree of the subset:

$$d(\mathbf{v}) = \sum_{r=1}^{C_{\max}} v_r, \qquad (14)$$

where $v_r \in \{0, 1\}$ are the elements of the vector array $\mathbf{v}$.
Another structure block, corresponding to the orientation of the clusters in the GMM, is the length of the subset, denoted by $l(\mathbf{v})$. It gives the positions of the clusters in the subset, from the first cluster to the last:

$$l(\mathbf{v}) = \{ r : v_r = 1 \}, \qquad (15)$$

where the first and last elements of $l(\mathbf{v})$ give the positions of the first and last cluster centres of the vector.
The number of clusters in the vector representation of a mixture model is equal to the degree of the subset, $d(\mathbf{v})$. In addition, the locations of the "1" digits in the vector sequence of the mixture model are given by the length of the subset, $l(\mathbf{v})$, in the block structure of the GMMs.
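The two structure blocks can be read directly off a vector representation. In the hypothetical sketch below, `degree` implements Equation (14). `locations` returns the 1-based positions of the "1" digits, which matches the form $\{1, 3, 4, 6\}$ that the paper reports for synthetic-1. `span` returns the distance between the first and last "1"; this is only one possible reading of the "length of the subset", included as an assumption.

```python
def degree(v):
    """d(v): number of clusters, i.e. the number of '1' digits (Eq. 14)."""
    return v.count("1")


def locations(v):
    """1-based positions of the '1' digits in the vector representation."""
    return [r for r, bit in enumerate(v, start=1) if bit == "1"]


def span(v):
    """Distance between the first and last '1' (one reading of the 'length of the subset')."""
    locs = locations(v)
    return locs[-1] - locs[0] if locs else 0
```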
In the proposed vector representation, the structure blocks represent the clusters of the GMM: the cluster corresponding to each combination of components is represented by "1", and a null cluster is indicated by "0". A vector representation corresponds to each appropriate model obtained from the mixture models. The creation of the grid-structured mixture models and their vector representations is shown in Algorithm 2.
Algorithm 2 Grid structure mixture models and vector representations
Input: The variables and their components ($g_k$) that form the dimensions of the models in the grid structure.
Output: Grid structure mixture models and their vector representations.
1. For every selected variable $x_k$, the selected variables and their components are placed in the grid-based model.
2. With $C_{\min} = \max_k \{g_k\}$ and $C_{\max} = \prod_{k=1}^{p} g_k$, the minimum and maximum numbers of clusters in the grid-structured mixture models are obtained.
3. The number of mixture models in the grid structure is calculated as $M = 2^{\prod_{k=1}^{p} g_k} - 1$.
4. Vector representations consisting of "0" and "1" digits are created for the mixture models in the grid structure.
5. The number of components in each mixture model is obtained as $d(\mathbf{v}) = \sum_{r=1}^{C_{\max}} v_r$.
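Algorithm 2 can be prototyped by enumerating all $2^{C_{\max}} - 1$ non-empty binary vectors over the grid cells. The sketch below also keeps only the vectors whose clusters cover every component of every selected variable. This filter is an assumed reading of the "appropriate" condition, based on the statement that each component corresponds to at least one cluster [7]. The cell numbering, with the first variable varying fastest, is also an assumption. Under these assumptions the $2 \times 3$ grid of synthetic-1 gives 25 appropriate models, matching the count reported for that data set. The $2 \times 2$ grid of synthetic-2 gives 7. The function names are hypothetical.

```python
from itertools import product


def grid_cells(components):
    """Grid cells (component combinations), numbered with the first variable varying fastest."""
    # itertools.product varies its last argument fastest, so reverse twice.
    return [tuple(reversed(t)) for t in product(*[range(1, g + 1) for g in reversed(components)])]


def appropriate_models(components):
    """Vector representations of the non-empty grid subsets that cover every component."""
    cells = grid_cells(components)
    c_max = len(cells)
    models = []
    for mask in range(1, 2 ** c_max):  # mask = 0 is the null model and is skipped
        bits = [(mask >> (c_max - 1 - r)) & 1 for r in range(c_max)]
        chosen = [cell for cell, bit in zip(cells, bits) if bit]
        covers = all({cell[k] for cell in chosen} == set(range(1, g + 1))
                     for k, g in enumerate(components))
        if covers:
            models.append("".join(str(bit) for bit in bits))
    return models
```

Each returned string is a candidate AGMM. Its degree gives the number of clusters to fit in the next stage.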

E. INFORMATION CRITERIA FOR APPROPRIATE MIXTURE MODELS IN GRID STRUCTURE
The logL function of a GMM in the grid structure is calculated as

$$\log L(\boldsymbol{\pi}, \boldsymbol{\mu}, \Sigma) = \sum_{j=1}^{n} \log\left( \sum_{i=1}^{c} \pi_i \, \phi(\mathbf{y}_j; \boldsymbol{\mu}_i, \Sigma_i) \right). \qquad (16)$$

BIC is calculated from the logL function, the number of independent parameters $d$ and the number of observations $n$ as

$$\mathrm{BIC} = 2 \log L - d \log n, \qquad (17)$$

where $n$ and $d$ represent the number of observations and the number of free parameters in the model, respectively. The model that maximises BIC is selected.
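For the BIC, the free-parameter count $d$ depends on the covariance structure. The minimal sketch below assumes the unconstrained (VVV) covariance that MMSCM uses. Under that assumption, $d = (c - 1) + cp + c\,p(p+1)/2$ for $c$ clusters in $p$ dimensions. The function names are hypothetical.

```python
import math


def n_free_params(c, p):
    """Free parameters of a c-cluster, p-variate Gaussian mixture with full (VVV) covariances."""
    weights = c - 1                  # mixing weights sum to one
    means = c * p
    covariances = c * p * (p + 1) // 2
    return weights + means + covariances


def bic(log_lik, c, p, n):
    """BIC in the 'larger is better' form: 2 logL - d log n."""
    return 2.0 * log_lik - n_free_params(c, p) * math.log(n)
```

Among the candidate AGMMs, the one with the largest `bic` value would be kept. For the bivariate four-cluster model of synthetic-1, for example, $d = 3 + 8 + 12 = 23$.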
Based on the variables of the data set, the mixture clustering algorithm was developed by applying the methods described in the sections above step by step. From the components, it determines the number of clusters appropriate for the data structure and the structure of the clusters. The determination of appropriate mixture models by MMSCM and the selection of the best model are shown in Algorithm 3.
3. The logL and BIC values in (16) and (17) are calculated for the AGMMs, using their vector representations and $f(\mathbf{y}; \Psi) = \sum_{i=1}^{c} \pi_i \, \phi(\mathbf{y}; \boldsymbol{\mu}_i, \Sigma_i)$.
4. The best model is selected from the AGMMs based on the information criteria.

The algorithm terminates.
In summary, U-GMMs were used to determine the number of components in the variables, and the k-means algorithm was used to assign observations to those components. The mixture models produced by the soft computing method were then fitted as GMMs. In the last step of the clustering algorithm, the best model was selected using the vector representations of the GMMs for model-based clustering. Figure 1 describes the proposed mixture model soft computing-based approach for determining the number of clusters in a data set. The proposed method is applied to the synthetic-1, synthetic-2, Iris [34] and 3D LSI [35] data sets.

F. MIXTURE MODEL SOFT COMPUTING METHODS FOR OPTIMISATION
In this section, an optimization algorithm for MMSCM is introduced by defining its objective function. Its convergence is then analysed, and its time complexity is expressed in terms of information complexity. The objective function consists of two parts: variable selection with univariate normal mixture models, and estimation of the number of clusters with the soft computing method.
All computations were performed on a computer with an Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, 8 GB RAM and Intel UHD Graphics 630, running 64-bit Windows 10 with R XX. Each step of the study was carried out in the R software development environment.

1) OBJECTIVE FUNCTION
The univariate mixture models and the variable selection method described above remove the variables that have no effect on clustering. The method therefore preserves the information in the variables of the data set, as well as the geometric structure and volume of the clusters to be formed. In the reduced data, the information complexity is minimal, since no iterative process is needed to calculate the number of clusters and the grid-structured mixture models from the variable components.
The objective of the proposed method is to determine the best mixture model among the reduced set of mixture models using the minimum number of variables. To this end, the proposed method minimises both the number of variables and the total number of mixture models.

2) CONVERGENCE ANALYSIS
This section discusses the convergence of GMM clustering with MMSCM. The purpose of the proposed clustering is to partition a data set in $\Re^{p}$ into $c$ mutually distinct clusters, $2 \le c \le C_{\max}$.
The problem can be solved with a two-stage deterministic method that identifies the components of a data set with multivariate normal mixture distributions. In the first stage, dimension reduction is performed by selecting the heterogeneous variables ($g_k \ge 2$). In the second stage, the mixture models obtained from the selected variables are evaluated, and the problem converges to the selection of the best model.

3) TIME COMPLEXITY
In this section, the time complexity of the MMSCM algorithm is analysed to show its efficiency. In MMSCM, the run time is mainly spent on U-GMM-based variable selection, on determining the vector representations of the AGMMs, and on calculating the information criteria for each grid-based mixture model.
Let $n$ be the number of observations, $d$ the number of variables (dimension) and $m$ the number of components (clusters) in the model. The information complexity of the EM algorithm in multivariate models is $O(nmd^2)$ [36], while for variable selection with univariate models it is $O(\ldots)$. The information complexity of calculating the number of models to obtain the vector representations of the AGMMs is $O(\ldots)$. Finally, the information complexity of obtaining the information criteria for each AGMM is $O(\ldots)$.

CLUSTERING ON THE SYNTHETIC DATA SETS
In this section, the proposed method for estimating the number of clusters is applied to the synthetic-1 data set, which was generated to illustrate the steps of the study simply and clearly. To measure the performance of the proposed method, it was also applied to the synthetic-2 data set, which has more variables (15 variables). The results obtained for the synthetic-1 and synthetic-2 data sets were compared with those of the mclust, clustvarsel, varselLCM, selvarMix and vscc methods. Table 1 gives the number of variables/features, the number of observations and the number of components/clusters of the data sets used.

1) APPLICATION OF THE PROPOSED CLUSTER ESTIMATION METHOD ON SYNTHETIC-1 DATASET
In this section, the principles of the proposed MMSCM are explained on the synthetic-1 data set. To determine the number of clusters with univariate approaches, a multivariate synthetic-1 data set was generated by simulation.
The synthetic-1 data set was generated from a mixture of Gaussian distributions with given mean vectors and covariance matrices, with three variables and four clusters. It was designed to have 1, 2 and 3 components in the variables, respectively, to demonstrate different numbers of components in the variables and different numbers of observations in each component. The mean and standard deviation values of the components were set as the generating parameters of each variable. According to the information criteria obtained from the U-GMMs, there are 1, 2 and 3 components in the variables $x_1$, $x_2$ and $x_3$, respectively. The logL and BIC values obtained from the U-GMMs for the variables $x_1$, $x_2$ and $x_3$ are given in Table 2, and the components of the synthetic-1 data set and the number of observations per component are given in Table 3. The values $C_{\min}$ and $C_{\max}$ of the mixture model to be created in the grid structure were calculated from the components in the variables; thus, the minimum and maximum numbers of clusters in the synthetic-1 data set are 3 and 6, respectively. The variable components and cluster centres are illustrated in Figure 2. In the synthetic-1 data set, the variable $x_1$ is called a "redundant variable" because it has a homogeneous structure, and variable selection reduces the data set to the variables $x_2$ and $x_3$. The total number of mixture models for the clusters obtained from the variable components of the synthetic-1 data set was computed as $M = 2^6 - 1 = 63$ for $C_{\max} = 6$. The numbers of clusters, the total number of models and the number of appropriate models for the synthetic-1 data are shown in Table 4. For the three-dimensional synthetic-1 data set, the mean vectors and covariance matrices have the form $\boldsymbol{\mu}_i = (\mu_{i1}, \mu_{i2}, \mu_{i3})^\top$ and $\Sigma_i = (\sigma_{i,kl})_{3 \times 3}$. After dimension reduction, the cluster centres corresponding to the components in the variables $x_2$ and $x_3$ and the grid structures of the appropriate models were obtained.
The component density function is the bivariate Gaussian probability density function.
The logL and BIC values of the best model, obtained from the parameters calculated from the components in the variables of the synthetic-1 data set, are shown in Table 4.
Among the 25 appropriate models, the best mixture model for the 3D synthetic-1 data set, determined by model-based clustering, was the four-component mixture model. The information criteria and vector representation of this model, obtained from the mixture density functions, are shown in Table 5. According to the centres in Figure 2, the structure blocks of the best model were obtained as $d(\mathbf{v}) = 4$ and $l(\mathbf{v}) = \{1, 3, 4, 6\}$.
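Each position in $l(\mathbf{v})$ can be decoded into the pair of components (of $x_2$ and $x_3$) that the cluster combines. The sketch assumes the cells are numbered with the first variable varying fastest. The paper does not state the ordering, so the decoded pairs are illustrative only, and the function name `decode` is hypothetical. Under that assumption, "101101" selects four centres that together cover both components of $x_2$ and all three components of $x_3$.

```python
from itertools import product


def decode(v, components):
    """Map each '1' in the vector representation to its tuple of component indices."""
    # Assumed numbering: first variable varies fastest (itertools.product varies the last one fastest).
    cells = [tuple(reversed(t)) for t in product(*[range(1, g + 1) for g in reversed(components)])]
    return [cells[r] for r, bit in enumerate(v) if bit == "1"]
```

For example, `decode("101101", [2, 3])` gives `[(1, 1), (1, 2), (2, 2), (2, 3)]`.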
The results obtained with MMSCM are compared with those of mclust and the mclust-based model selection methods for the synthetic-1 data set in Table 6. The BIC values against the number of components for the synthetic-1 data set are shown in Figure 3. The GMM results were compared with the proposed MMSCM in terms of the estimated number of clusters and the correct classification rate (CCR) on the synthetic-1 data set. According to the CCR values in Table 6, the proposed soft computing clustering method achieved on average approximately 15% higher success than the mclust-based methods. For the full (unreduced) synthetic-1 data set, the scatter plot obtained from mclust is shown in Figure 4. The 3D surface graph of the model with the vector representation "101101" obtained by MMSCM is illustrated in Figure 5. The number-of-components graph in Figure 3 shows that the best covariance model was the full type.
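Computing the CCR requires matching cluster labels to the true classes, since cluster numbering is arbitrary. One common way to do this is to take the best one-to-one relabelling, found by brute force over permutations. This is sketched below as an assumption, not as the exact procedure used in the study; brute force is practical only for small numbers of clusters like those considered here. The function name `ccr` is hypothetical.

```python
from itertools import permutations


def ccr(true_labels, pred_labels):
    """Correct classification rate under the best one-to-one matching of clusters to classes."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    if len(clusters) <= len(classes):
        # Map each cluster to a distinct class.
        candidates = (dict(zip(clusters, perm)) for perm in permutations(classes, len(clusters)))
    else:
        # More clusters than classes: only len(classes) clusters can be matched; the rest count as errors.
        candidates = (dict(zip(perm, classes)) for perm in permutations(clusters, len(classes)))
    best = max(sum(m.get(p) == t for t, p in zip(true_labels, pred_labels)) for m in candidates)
    return best / len(true_labels)
```

Overestimating the number of clusters is penalised automatically, because surplus clusters cannot be matched to any class.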

MODEL SELECTION FOR SYNTHETIC-2 DATA SET WITH MMSCM
The synthetic-2 data set was generated by simulation to evaluate dimension reduction and best-model selection with the proposed MMSCM. In this 15-variable data set, the 1st and 2nd variables were generated with two components each, and the other variables were homogeneous. The number of components of each variable was determined with the U-GMMs according to the logL and BIC values (Table 8), and dimension reduction was performed by eliminating the homogeneous variables. The mixture models obtained from the components of the variables $x_1$ and $x_2$ in the reduced data set, together with their logL and BIC values and vector representations, are shown in Table 9. According to the logL and BIC values, the three-component model with the vector representation "1110" was selected as the best mixture model among the AGMMs.
The results obtained with MMSCM are compared with those of mclust and the mclust-based model selection methods for the synthetic-2 data set in Table 10. Although the classification success rates are the same, the mclust and vscc methods could not identify the redundant variables.
The number of components and the BIC values obtained with mclust for the different covariance types are shown in Figure 7. The correct classification rate of GMM on the synthetic-2 data set is compared with that obtained from the clusters and locations given by MMSCM in Table 9. The surface plot of the best mixture model obtained by MMSCM is shown in Figure 8. According to Table 10, the proposed method and the mclust-based methods achieve the same success on the synthetic-2 data set. While MMSCM used the VVV (full) covariance type for estimating the number of components, the other methods used the EVI covariance type, as seen in Figure 7.

CLUSTERING FOR THE REAL DATA SETS
The previous section explained the principles of the proposed method for grid-structured model-based clustering on the synthetic-1 and synthetic-2 data sets. In this section, the clustering algorithm of the proposed method is applied to the Iris and LSI real data sets in turn.

METHOD FOR THE IRIS DATA SET
The steps of the proposed method (MMSCM) is explained on the Iris data set (UCI Machine Learning Repository), which is widely used for clustering and classification analysis. Iris data set, presented by Fisher in 1936 and widely used in clustering, is a multivariate data set with three clusters (Setosa, Versicolor, and Virginica), four variables (Sepal length, Sepal width, petal length, and petal width), and 150 observations.
In the finite mixture models, it has always been a difficult problem to correctly determine the number of components in the variables. In the proposed method, the number of components of the heterogeneous variables in the Iris data set is determined with the U-GMM. logL and BIC values were obtained from U-GMMs to determine the components in variables 1 , 2 , 3 4 (Sepal length, Sepal width, petal length, and petal width) and are given in Table 11. Applying U-GMMs to the 4 variables in the Iris data set show that there was no (homogeneous) component in the Sepal width variable ( 2 ) according to the findings obtained from the values in Table 6. Thus, the variable 2 was determined as a "redundant variable" and a variable selection was made. While determining the number of clusters is based on the number of components in the variables, according to MMSCM assumptions, homogeneous variables do not affect the number of clusters and clustering. Variable selection was made by removing homogeneous variables, so dimension reduction was applied to the data. Sepal length ( 1 ), Petal length ( 3 ) and Petal width ( 4 ) variables is used to determine the clusters and also the number of clusters in the Iris data set.
The components of variables $x_1$, $x_2$, $x_3$ and $x_4$ of the Iris data set and the observations assigned to them are given in Table 12. The lower and upper bounds for the number of clusters are determined from the components in the heterogeneous variables (Equation (9)). With component numbers $k_1 = 2$, $k_2 = 1$, $k_3 = 2$ and $k_4 = 2$ for variables $x_1$, $x_2$, $x_3$ and $x_4$, the cluster interval $K_{\min} \le K \le K_{\max}$ of the Iris data set is $2 \le K \le 8$. The grid structure model formed by the components in variables $x_1$, $x_3$ and $x_4$, the clusters in the model, and the components forming each cluster are shown in Figure 9. In this way, the multivariate data set is converted into mixture models in a grid structure according to the number of components in the variables and the cluster centres that may occur; Figure 9 also shows which components of the variables fit each cluster centre.
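The bounds and the grid of candidate centres can be illustrated as follows. This sketch assumes, consistent with the LSI example later in this section, that the lower bound is the largest component count and the upper bound is the product of the component counts, and it enumerates candidate centres as every combination of one component per heterogeneous variable. The helper names `cluster_bounds` and `candidate_centres` and the lexicographic ordering of the centres are hypothetical and need not match the numbering used in Figure 9.

```python
from itertools import product
from math import prod


def cluster_bounds(component_counts):
    """(K_min, K_max): largest component count and product of component counts."""
    return max(component_counts), prod(component_counts)


def candidate_centres(component_counts):
    """Every combination of one component per heterogeneous variable (k_j > 1)."""
    ranges = [range(1, k + 1) for k in component_counts if k > 1]
    return list(product(*ranges))


# Iris: k = 2, 1, 2, 2 for x1..x4; x2 is homogeneous and drops out of the grid
iris_k = [2, 1, 2, 2]
print(cluster_bounds(iris_k))  # (2, 8)
centres = candidate_centres(iris_k)
print(len(centres), centres[:3])  # 8 [(1, 1, 1), (1, 1, 2), (1, 2, 1)]
print(cluster_bounds([3, 3, 3]))  # (3, 27), the LSI bounds
```

Each tuple lists the component index taken from $x_1$, $x_3$ and $x_4$ for one candidate centre.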

SOFT COMPUTING FOR APPROPRIATE MIXTURE MODELS AMONG TOTAL MIXTURE MODELS
According to Equation (9), the maximum number of clusters obtained from the components in the variables is $K_{\max} = 8$, so Equation (10) gives the total number of mixture models as $M = 2^{8} - 1 = 255$. The $-1$ term excludes the null model, which contains no cluster centre.
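The count can be checked directly. Each candidate mixture model corresponds to a non-empty subset of the $K_{\max}$ candidate centres (the maximum number of clusters), i.e. a binary vector of length $K_{\max}$, so there are $2^{K_{\max}} - 1$ models, and $\binom{K_{\max}}{K}$ of them have exactly $K$ clusters. The sketch below also counts the vectors whose number of clusters lies between the lower and upper bounds. It does not reproduce the further assumptions of Equation (12), so its per-$K$ counts are presumably an upper bound on the appropriate mixture models rather than the values of Table 13; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def total_models(k_max):
    """All non-empty subsets of k_max candidate centres (the null model is excluded)."""
    return 2 ** k_max - 1


def models_with_k_clusters(k_max, k):
    """Binary vectors of length k_max with exactly k ones (k selected centres)."""
    for chosen in combinations(range(k_max), k):
        yield "".join("1" if i in chosen else "0" for i in range(k_max))


k_min, k_max = 2, 8  # Iris bounds
print(total_models(k_max))  # 255
print(total_models(27))  # 134217727 (LSI)
per_k = {k: comb(k_max, k) for k in range(k_min, k_max + 1)}
print(per_k)  # {2: 28, 3: 56, 4: 70, 5: 56, 6: 28, 7: 8, 8: 1}
print(next(models_with_k_clusters(k_max, 3)))  # 11100000
```

Enumerating the vectors explicitly is what makes the search space grow so quickly: every extra candidate centre doubles the number of models.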
Under the assumptions of the proposed clustering algorithm, MMSCM calculates, as shown in Equation (12), the number of models that can occur given the components of each variable and, among these, the number of mixture models that satisfy the assumptions.
In the soft computing method, $k_1 = 2$, $k_3 = 2$ and $k_4 = 2$ are the numbers of components of variables $x_1$, $x_3$ and $x_4$, and the number of clusters $K$ in a mixture model ranges from 2 to 8.
The number of clusters, the number of models, and the number of possible mixture models are given in Table 13. This section thus demonstrates that the soft computing method described in Section 2 can recover the number and locations of the clusters in the Iris data set.
According to the information criteria, the best model was chosen from the appropriate mixtures of multivariate Gaussian densities. The logL and BIC values of the best mixture models for the Iris data set are given in Table 14. Among the mixture models for the Iris data set, the model with three cluster centres fits the data best. For the Iris data set, the mclust plots of the number of components for the complete and reduced data are shown in Figure 10 and Figure 11, respectively.
Author Name: Preparation of Papers for IEEE Access (February 2017)
As can be seen in Figures 10 and 11, the mclust-based methods find 2 components for the complete data: because they cannot reduce the number of variables, they do not recover the true number of components. In contrast, when the variables selected by the proposed MMSCM are used in the mclust-based methods, the true number of components, 3, is obtained.
The 3D scatter plot of the best mixture model obtained with the proposed clustering method is shown in Figure 12. The vector representation of the best model among the mixture models from the AGMMs is "10010010". The cluster centres in the best mixture model are the 1st (Setosa), 4th (Versicolor) and 7th (Virginica) centres shown in Figure 9. According to the centres in Figure 9 The results obtained from MMSCM are compared with those of mclust and the mclust-based model selection methods for the Iris data set in Table 15.
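Reading a vector representation is straightforward: position $i$ of the binary string is 1 when the $i$-th candidate centre belongs to the model, and the number of ones is the number of clusters. A small sketch (the function names `decode` and `encode` are illustrative) that decodes "10010010" into the centres used by the best Iris model and encodes it back:

```python
def decode(vector):
    """1-based positions of the selected cluster centres."""
    return [i for i, bit in enumerate(vector, start=1) if bit == "1"]


def encode(centres, k_max):
    """Inverse of decode for a given number of candidate centres."""
    return "".join("1" if i in centres else "0" for i in range(1, k_max + 1))


best = "10010010"
print(decode(best))  # [1, 4, 7]
print(len(decode(best)))  # 3 clusters
print(encode([1, 4, 7], 8) == best)  # True
```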
For the Iris data set, MMSCM's variable selection and estimation of the number of components are 22% more successful than those of the other methods. Because the mclust-based methods could not select variables, they estimated the number of clusters incorrectly and achieved a low success rate. However, when the variable selection determined by MMSCM was applied to the compared methods, they estimated the number of clusters correctly. According to the CCRs of the mclust-based methods and MMSCM shown in Table 14, the components determined by MMSCM together with the dimension reduction achieved a 32% higher success rate, in other words a better model fit, whereas the mclust-based methods estimated the number of components incorrectly as 2.
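The section does not spell out how the CCR is computed. A common definition, assumed in the sketch below, matches clusters to classes one-to-one so that the number of correctly assigned observations is maximal and divides that number by the number of observations; unmatched clusters or classes count as errors, which is what penalises a method that finds 2 clusters where there are 3 classes. The label vectors are hypothetical, and the brute-force search over permutations is only practical for a handful of clusters (the Hungarian algorithm is the usual replacement).

```python
from collections import Counter
from itertools import permutations


def ccr(clusters, classes):
    """Correct classification rate under the best one-to-one cluster-class matching."""
    table = Counter(zip(clusters, classes))
    cl, cs = sorted(set(clusters)), sorted(set(classes))
    best = 0
    if len(cl) <= len(cs):
        for perm in permutations(cs, len(cl)):
            best = max(best, sum(table[(c, s)] for c, s in zip(cl, perm)))
    else:
        for perm in permutations(cl, len(cs)):
            best = max(best, sum(table[(c, s)] for c, s in zip(perm, cs)))
    return best / len(clusters)


truth = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
three = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]  # hypothetical 3-cluster labels
two = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # hypothetical 2-cluster labels
print(round(ccr(three, truth), 3), round(ccr(two, truth), 3))  # 0.917 0.667
```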

METHOD FOR THE LSI DATA SET
The proposed clustering method was applied to the remotely sensed LSI data. Of the seven variables of the LSI data, the pixel values in the 3rd, 4th and 5th bands were used. There are 5 components in the three-variable LSI data set: Wheat, Potato, Vegetable Garden, Citrus and Bare Soil [24]. The components of each variable in the LSI data set, as obtained from the U-GMMs, are given in Table 16. Each of the three variables has three components according to the information criteria obtained from the U-GMMs. The components in the variables of the LSI data set and the number of observations per component are given in Table 17. The bounds on the number of clusters of the mixture model to be created in the grid structure were calculated from the components in the variables as $K_{\min} = \max\{3, 3, 3\} = 3$ and $K_{\max} = k_1 k_2 k_3 = 3 \cdot 3 \cdot 3 = 27$. Thus, the minimum and maximum numbers of clusters in the LSI data set are 3 and 27, respectively. The variable components and cluster centres are illustrated in Figure 13. The total set of mixture models for the clusters was obtained from the variable components of the LSI data set.
The total number of mixture models can then be computed as $M = 2^{27} - 1 = 134{,}217{,}727$ for $K_{\max} = 27$. The clusters, the grid-based total mixture models, and the appropriate mixture models for the components are given in Table 18. The results obtained from MMSCM are compared with those of mclust and the mclust-based model selection methods for the LSI data set in Table 19. The runtimes of MMSCM on the four data sets mentioned above are listed in Table 21. Table 21 shows that the processing time increases with the size of the data set. Table XX also shows the speed-up and the high-quality clustering results of MMSCM.
According to the values in Table 20, the mclust-based methods determine 9 components for the VVV covariance type. Moreover, the low CCR percentages of the mclust-based methods indicate that the number of components is determined incorrectly.
The BIC values and the number of components for the different covariance types are presented in Figure 14. Figure 14 shows that the optimum number of components for the VVV covariance type on the LSI data set is 5. The results of the 5-component model obtained from mclust are as follows: BIC = 786830.9, CCR = 62.9%, and ARI = 0.44. The scatter plot of the components and their positions is shown in Figure 15. The correct classification table for the LSI data set is given in Table 22. The 5 clusters obtained by the proposed clustering method on the LSI data set, and their colours, are shown as image data in Figure 16. For the LSI data set, MMSCM was more than 50% more successful than the mclust-based methods. Existing methods for estimating the number of clusters perform poorly on big data with few variables but many observations. With its univariate grid-structure mixture model approach, MMSCM performed very well not only in the number of clusters and CCR but also in time complexity.
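The ARI reported above is the adjusted Rand index, which compares two partitions by counting pairs of observations and corrects the count for chance agreement: it is 1 for identical partitions (up to relabelling) and close to 0 for random ones. A stdlib-only sketch of the usual contingency-table formula (the function name is illustrative):

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index from the contingency table of two labelings."""
    n = len(labels_a)
    pairs_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings use one cluster
        return 1.0
    return (pairs_ij - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (labels only renamed)
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

Unlike the CCR, the ARI needs no matching between clusters and classes, so it can be computed directly when the estimated and true numbers of clusters differ.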

IV. CONCLUSION
In this study, a novel method for determining the clustering in GMMs was proposed, based on the components of the heterogeneous variables in the data and the mixture model soft computing method. The developed clustering algorithm was applied to the synthetic-1, synthetic-2, Iris and LSI data sets, and it was observed that the selected variables and the estimated number of clusters yielded more accurate clustering than the methods in the literature.
The dimension reduction performed by MMSCM provided important preliminary information for accurately estimating the number of components with the mclust-based methods.
In conclusion, the proposed MMSCM yields a 15% better CCR for the synthetic-1 data set. For the synthetic-2 data set, MMSCM performs better than the mclust and vscc methods in variable selection. For the Iris data set, the proposed method gives better estimates of the number of clusters and better variable selection, as well as a higher CCR. Finally, for the LSI data set, MMSCM not only estimates the number of clusters better but also yields a higher CCR. These results indicate that the proposed MMSCM performs better than the mclust-based models in variable selection and in estimating the number of clusters.
The disadvantage of the proposed algorithm is that the number of elements in the search space increases exponentially as the number of variables and the number of components in each variable grow. Determining the AGMMs and obtaining the vector representations of the mixture models in the search space therefore increases the time complexity. To overcome this problem, parallel computation could be investigated for the proposed algorithm.
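To make the growth concrete, assume for illustration only $p$ heterogeneous variables with $k$ components each. With $K_{\max}$ equal to the product of the component counts, as in the Iris and LSI examples, $K_{\max} = k^{p}$ and the number of mixture models grows doubly exponentially in $p$:

```latex
M = 2^{K_{\max}} - 1 = 2^{k^{p}} - 1, \qquad
\begin{aligned}
p = 3,\ k = 2 &: \quad M = 2^{8} - 1 = 255,\\
p = 3,\ k = 3 &: \quad M = 2^{27} - 1 = 134\,217\,727,\\
p = 4,\ k = 3 &: \quad M = 2^{81} - 1 \approx 2.4 \times 10^{24}.
\end{aligned}
```

Adding a single three-component variable to the LSI setting thus multiplies the exponent by three, which is why exhaustive enumeration quickly becomes infeasible without pruning or parallelism.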
Nevertheless, the proposed method gives faster and more accurate results than existing methods in terms of variable selection and estimation of the number of clusters in big data.
Future studies will aim to combine the mixture model soft computing method with deep learning methods to estimate the number of clusters in big data within the framework of variable selection. The proposed method is also expected to be combined with multi-criteria decision-making methods and tested in different application areas. In addition, the variable selection, cluster number estimation and classification approaches of the proposed method should be developed as an R package.