A Robust Spatial Information-Theoretic GMM Algorithm for Bias Field Estimation and Brain MRI Segmentation

,


I. INTRODUCTION
Brain magnetic resonance (MR) images have been widely used for diagnosis of brain diseases because of high contrast, high spatial resolution and multi-spectral characteristics. Accurate segmentation of major brain tissues into the gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF) is a key technology and important link in the study of many brain disorders. Hence brain MRI segmentation has attracted extensive interests. However, MR images generally contain the noise and a smoothly varying bias field, The associate editor coordinating the review of this manuscript and approving it for publication was Eduardo Rosa-Molinar .
which is also referred to as the intensity inhomogeneity [1]. Most of existing intensity-based segmentation methods cannot achieve accurate segmentation results because of the noise and the intensity inhomogeneity. In recent years, several methods were proposed and developed for accurate segmentation [2]- [6], [30]- [33]. These methods involve algorithms with respect to thresholding process, region based, clustering methods, soft segmentation, the matting.
The statistical model-based techniques are successfully applied in image segmentation because of the simplicity and efficiency. In the model-based algorithms, standard Gaussian mixture model (GMM) [8] is well known and has been widely utilized in image segmentation. The GMM is also generally VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ utilized and modified to correct intensity inhomogeneity [9]- [11] and its parameters can be estimated by the expectation maximization (EM) algorithm [7]. Wells et al. [9] proposed an adaptive segmentation algorithm for MRI data (AS-EM), which uses the EM algorithm to simultaneously correct bias field and segment brain MR images. Van Leemput et al. [10] further developed an automated bias field correction method (AM-EM) based on EM algorithm and utilized a digital brain atlas to provide the prior probability maps of brain tissues. However, the traditional GMM supposes that the pixels of an image are independent with each other and do not take any spatial information into account, hence the GMM lacks enough robustness to noise and outliers. In order to overcome the disadvantages of GMM, the seminal Markov random field (MRF) model has been proposed to introduce the prior information including spatial information of pixels [12]- [15]. It makes segmentation results smoother and less sensitive to noise. Blekas et al. [12] proposed a spatially constrained mixture model for image segmentation (SVFMM), which imposes spatial smoothness constraints between neighboring pixels. Diplaros et al. [13] proposed a novel generative GMM and an EM algorithm (SCGM-EM) for image segmentation. The SCGM-EM assumes that the hidden class labels of the pixels are generated by prior distributions, which share similar parameters within neighbor pixels and can reduce the effect of noise. To retain more image details, Nikou et al. [14] proposed a class-adaptive spatially variant mixture model (CA-SVFMM) which can adaptively determine the spatial directions. CA-SVFMM uses MRF to model the joint prior distribution of each pixel label. However, the M-step of EM algorithm cannot be applied directly to estimate the model parameters from the observations. In order to deal with this problem, Nguyen and Wu [15] proposed a fast and robust spatially constrained GMM (FRSCGMM). It introduced a novel spatial factor into the prior distribution and can directly apply the EM algorithm to estimate parameters. Although these models work well for noisy image, most MRF-based algorithms still have some drawbacks that high computational complexity and difficulty to preserve detail information.
Retaining image details and reducing the influence of noise simultaneously are contradictory. On one hand, the smoothing parameter should be chosen small enough to prevent the image from losing much of its sharpness and details. On the other hand, the value of the smoothing parameter has to be chosen large enough to restore images corrupted by noise and outliers. So, if we want to overcome the effect of the noise and preserve more detail information simultaneously, we can choose the small smoothing parameter to retain more image details and use some methods to identify/eliminate outliers or high-level noises which are not smoothed out. Song [16] proposed a robust information clustering algorithm, which used the mutual information (MI) maximization method to evaluate the reliability of input data and identify outliers. However, it only gave an intuitive discussion without a detailed theoretical demonstration. Wang et al. [17], [18] developed the MI maximization method into a complete theorem, called the information-theoretic approach for high level outlier identification. The experimental results have shown that the MI maximization is indeed a powerful tool to identify outliers.
Motivated by the aforementioned observations, we proposed a robust spatial information-theoretic GMM algorithm for bias field estimation and brain MRI segmentation. To reduce the effect of noise and preserve more details, the non-local spatial information is introduced. The smoothing parameter can be chosen small enough to retain more image spatial structure information. In the end of experiment part, we analyze the effect of the smoothing parameter. The selection of the smoothing parameter is usually dictated by the considered application of the model. To identify and eliminate outliers, we extend the MI maximization method to the mixture model and achieve more smooth and accurate results. By combining these two approaches, the model can retain more image details while reducing the influence of noise. In order to reduce the influence of intensity inhomogeneity, we use a linear combination of a set of orthogonal polynomials to estimate the bias field. The objective function is then integrated with the bias field estimation to make the model estimate the bias field, meanwhile segmenting images. The experiment results on synthetic and clinical brain MR images illustrate the superior performance of the proposed model compared with other state-of-the-art segmentation methods.

A. BIAS FIELD ESTIMATION
The observed MRI signal I can be modeled as where I is the observed MR image, B is an unknown bias field, J is the true image to be restored, and n is an additive noise [22]. Within the truth signal J , the intensity of pixels in the same tissue have a specific value, reflecting the physical property being measured. The aim of bias field estimation is to eliminate the bias field B from the observed image and correct the intensity inhomogeneity. Classically, the bias field B is assumed to be smooth and slowly varying in small local region. We usually use a set of orthogonal polynomials to approximate the bias field. In this paper, we use two-dimensional Legendre polynomials to approximate the bias field B and the number of polynomials is set to 3.

B. THE CLASSICAL GAUSSIAN MIXTURE MODEL
We set image I = {x i } i = (1, 2, 3 . . . N ), where x i with dimension D is the ith pixel of image I and N is the number of total pixels. Let the neighborhood of the ith pixel be presented by ∂ i . Labels are denoted by ( 1 , 2 , . . . , K ). GMM [19] assumes that each observation x i is considered independent of the label k , k = 1, 2, . . . , K . So we can segment the whole image consisting of N pixels into K labels. The density function of GMM is given by where = {π ik } , i = (1, 2, . . . , N ) , k = (1, 2, . . . , K ) is the set of prior probabilities for the pixel x i in label k , which satisfies the constrain 0 ≤ π ik ≤ 1 and K k=1 π ik = 1.
Each Gaussian distribution called a component of the mixture can be written in the form: where µ k with D dimension is the mean vector, the D × D matrix k is the covariance, and k denotes the determinant of k . Since each data point x i is modeled as statistically independent, the joint probability density of the image data However, the classical Gaussian mixture model only considers the intensity distribution, which makes it sensitive to noise without any spatial information.

C. GAUSSIAN MIXTURE MODELS BASED ON MRF
To reduce the effect of noise, the Markov random field (MRF) has been widely utilized into the GMM. The spatial information is generally incorporated among prior values by utilizing the MRF distribution: where Z is a normalizing constant, T is a temperature constant, and U ( ) is the smoothing prior. Based on Bayes's rules, the posterior probability density function can be written as So the log-likelihood function can be written as We can obtain various models by choosing different energy U ( ). By adopt different kinds of U ( ), many MRF-based mixture models have been successfully applied to image segmentation. In SVFMM [17], Z and T in (7) are set to one, and the spatial information is taken into account by setting U ( ) as where β is a constant value. In CA-SVFMM [14], U ( ) is given by where S is the total number of the considered directions and β ks is a variable parameter. In general, S is equal to four (S = 4: horizontal, vertical, and two diagonal directions). However, introducing spatial information in the above way adds computational complexity. Due to the complexity of the log-likelihood function, the M-step of EM algorithm cannot be applied directly to the prior distribution π ik . Thus, the resulting algorithms are computationally complex and utilize large amounts of computational power to solve the constrained optimization problem of the prior distribution π ik . In order to overcome the disadvantage of these algorithms, Nguyen and Wu [15] introduced a novel factor G ik defined as where β is the temperature value that controls the smoothing prior and z ik is the posterior probability. t indicates the iteration step. Then the new smoothing prior U ( ) is defined as The main advantage of G ik is the ease of implementation, and incorporation of the spatial relationships among neighboring pixels in a simpler metric. Actually, the factor G ik plays a role as a linear filter for smoothing and restoring images corrupted by noise. However, the drawback of this energy U ( ) is that it may cause over smoothing for the segmentation and lose the image details especially for the CSF segmentation of brain MR images.

D. MI MAXIMIZATION FOR OUTLIER FURTHER ELIMINATION
In order to eliminate the high-level outlier, Wang et al. [17], [18] proposed a novel adaptive spatial informationtheoretic fuzzy clustering algorithm (ASIFC). The ASIFC algorithm is a two-step image segmentation algorithm and its second step is the high level outlier elimination by MI maximization. The MI maximization method in information theory provides a way to reassess the reliability of input data. The objective function of ASIFC algorithm is where e i is the estimated input data distribution p(x i ) (with constraint N i=1 e i = 1). u k|i = p(w k |x i ) is the membership of assigning observation x i to cluster center w k and d s (x i , w k ) is an improved similarity measure. After generating the estimated cluster center w k and membership value u k|i , we can evaluate the input data reliability. Highly unreliable data points need to be labeled as outliers and are going to be clustered again with a bigger neighboring window. We can obtain a set of new robust input data reliability values e i for every input data points by using the information-theoretic method. e i can be used to label/eliminate the remaining unreliable data points (high level outliers) and thus improve the clustering/segmentation results of the algorithm.
To derive the robust estimation of input data distribution e i , ASIFC defines the mutual information between the input data distribution and the optimal membership distribution as follows: Considering N i=1 e i = 1 limitation, Lagrange's method is used. The robust estimation of e i is given by where u k|i = p(w k |x i ) is the membership function. e i will be iteratively updated by (14). It has been proved that when e i is near to zero, the input data x i can be regard as an outlier point [18]. In this paper, we extend this method to the mixture model by replacing the membership u k|i with posterior probability z ik . Both theory and experiments have proved that the MI maximization is a powerful tool to identify outliers as the highly unreliable data point (pixel).

III. PROPOSED MODEL A. MODIFYING SPATIAL INFORMATION
Motivated by the constructions of factor G ik in FRSCGMM and the information-theoretic method in ASIFC, we proposed a robust spatial information-theoretic GMM algorithm for brain MR images segmentation. In order to circumvent these shortcomings in above Gaussian mixture models based on MRF, we introduce a modified spatial factor F ik considering the non-local information because the image patches contain more structure information. The novel spatial factor takes the non-local spatial information into account based on the posterior probabilities and prior probabilities, which can preserve more details for the final segmentation and overcome the over-smoothing disadvantage of factor G ik . The modified spatial factor F ik based on the posterior probabilities and prior probabilities is defined as (15) where β is the temperature value that controls the smoothing prior, N i is the neighborhood of pixel x i , z mk is the posterior probability and π mk is the prior probabilities for the pixel x m in label k. w im is the non-local weight, which is defined as h is the smoothing parameter, which controls the smoothness of noise. The smaller h is, the more image details will be retained. In the end of experiment part, we analyze the effect of the smoothing parameter h. Defining the neighbor of pixel x i is a neighbor patch P i , then the similarity of pixel x i and x m is the gray-similarity of image patch P i and P m . The distance between patch P i and P m is calculated as following i is the nth pixel value of the image patch P i . The non-local weight w im are used to characterize the similarity between image patch P i and P m . If the similarity of P i and P m is high, the corresponding weight w im will be great and the neighboring pixel x m will have more influence on the current pixel x i . Compare with (10), we find (15) can adjust the influence of the neighboring pixels automatically by the weight parameter w im . If the neighbor of pixel x i and x m is similar, they may belong to a same tissue and the weight w im will increase. Otherwise, if two pixels are far different, the pixel x m will have small effect on current pixel x i . Therefore, the introduced non-local spatial information can make the segmentation result retain more image details.
Different from the spatial factor G ik in FRSCGMM [15], the proposed factor F ik not only takes into account the local neighborhood information, but also considers the spatial structure information of pixels. The proposed algorithm introduces the non-local information and integrates the non-local weight into model directly. Therefore, our algorithm can retain more image details while reducing the influence of noise. Now, a novel factor F ik is proposed to incorporate the spatial information into the smoothing prior. The new smoothing prior U ( ) is given by where t indicates the iteration step. The MRF distribution p ( ) in (5) is given by Given the MRF distribution p ( ), we can obtain the loglikelihood function: Application of the complete data condition in [19], maximizing the log-likelihood function L ( , θ | X ) in (20) will be equivalent to maximizing the objective function J( , θ | X ): The posterior probability z ik can be computed as follows: In order to estimate the bias field B in brain MR images, we assume that the bias field is modeled as a linear combination of a set of smooth basis functions where g l , l = (1, 2, . . . , S) is a set of basis functions, S is the number of g l , and w l ∈ R, l = (1, 2, . . . , S) are the realvalued combination coefficients. In this study, we use orthogonal polynomials to represent g l and S = (P + 1)(P + 2)/2, where P is the degree of these polynomials. Theoretically, any function can be approximated by a linear combination of a set of basis functions up to arbitrary accuracy [20]. Now, the log-likelihood function of the proposed model can be written as where (x i |θ k , b i ) is the Gaussian distribution with parameter θ k and the bias field b i , which can be defined as Therefore, the objective function of the proposed model is given by where The next step is to maximize the objective function J ( , θ, B | X ). Similar to the MRF-based methods [12], [14], Z and T in (26) are set to one. Then, we have the new objective function ).
To obtain the estimations of µ Considering the prior distribution follows the constraint K k=1 π ik = 1, we use Lagrange multiplier method to get the estimation of π (t+1) ik . We have Finally, we should obtain the combination coefficients w = {w 1 , w 2 , . . . , w S } in order to estimate the bias field B. By solving ∂J w (t+1) = 0, we can get where Iterating through these necessary conditions (29)-(33) will lead to the maximum of objective function J ( , θ, B | X ).

B. OUTLIER ELIMINATION BY MI MAXIMIZATION
To identify and eliminate outliers, we extend the MI maximization method to the mixture model by replacing membership value u k|i in Eq. (14) with posterior probability z ik shown in Eq. (27). After generating the estimated posterior probability z ik , we can evaluate the input data reliability by the information-theoretic method. Highly unreliable data points will be viewed as outliers and need to be eliminated. We can VOLUME 8, 2020 obtain a set of robust input data reliability values p (x i ) for every input data points by using the mutual information (MI) maximization method. The reliability values p (x i ) of highly unreliable data points(outliers) are zero. So, we can use it to label/eliminate the outliers. Similar with [18], in order to get the robust estimation of input data distribution p(x i ), we define the mutual information between the optimal parameter distribution p(θ) and the input data distribution p(X ) as follows: The next objective is to maximize the mutual information I (X , θ). Considering N i=1 p (x i ) = 1 limitation, Lagrange's method is used. Finally, we can get the robust estimate for p (x i ) by the following function In this paper, we introduce a novel spatial factor based on the non-local spatial information to reduce the effect of noise and preserve more image details. The spatial factor contains not only posterior information but also prior information, which can preserve more details for the final segmentation and overcome the over-smoothing disadvantage of factor G ik . It plays a role as linear filters for smoothing and restoring images corrupted by noise. The spatial factor is also easy to implement and incorporates the spatial relationships among neighboring pixels in a simpler metric. In addition, in order to reduce the influence of bias field on segmentation, we integrate the bias field estimation into the objective function, which makes the method can estimate the bias field meanwhile segmenting the image. Finally, we use the mutual information (MI) maximization method to identify outliers in order to further improve the accuracy of segmentation. The corresponding algorithm is a two-step segmentation algorithm. In the first step, it uses the improved GMM algorithm based on MRF to segment images and estimate the bias field at the same time. Afterwards, the information-theoretic method is used to eliminate outliers and further improve segmentation results. The flow step of the proposed algorithm can be summarized as follows: Step 1) Initialize the parameters { , θ, B}.
Step 2) E step: a) Evaluate the posterior probability z ik in (27) by using the current parameter values. b) Update the factor F ik by using (15).
Step 6) Update the input data distribution p (x i ) by using Step 7) If then go to Step 8; otherwise, return to Step 6.
Step 8) Label unreliable data points. Create a 3 × 3 window centered at the outlier, get the clustering labels for all neighboring pixels within this window.
Step 9) Store all label counts for each cluster into ξ k (k = 1,2,. . . ,K). Find the maximum value of ξ k and its corresponding cluster index m (ξ m = max(ξ k )).
Step 10) If ξ 1 = ξ 2 = . . . = ξ K , increase the current window edge by two more pixels, go to Step 8. Otherwise, update the outlier label with m.
Step 11) Iterate for all outliers until there is no further change, then stop.
In the next section, experimental results will be given to demonstrate the robustness and accuracy of the proposed algorithm, as compared with other GMM-based models.

IV. EXPERIMENTAL RESULTS
In this section, we conduct the proposed algorithm on a set of synthetic and clinical 3T brain MR images to illustrate the performance of our model. These images are segmented to grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF). In the comparative study, some models are chosen to compare with our proposed algorithm. These models are standard GMM [8], DCA-SVFMM [14], FRSCGMM [15], BMMSI [24], ASIFC [18], and PFCM [34].
Without special instructions, each algorithm in our experiments uses the following parameters. In DCA-SVFMM, the total number of considered directions is set to be 4 and the size of neighborhood window is 5 × 5. For FRSCGMM, temperature value is obtained by searching the interval [0, 10] in terms of the optimal JS value and the size of neighborhood window is 5 × 5. The initial value of the smoothing parameter β in BMMSI is set to 65 and the learning rate l is set 89622 VOLUME 8, 2020 to 10 −6 . The size of neighborhood window in BMMSI is also 3 × 3. The maximum iteration number for each algorithm is 100. For PFCM, the values of a, b, m, η are set to 1, 5, 7, and 1.5, respectively. Finally, the temperature value β and the smoothing parameter h in the proposed algorithm are obtained by searching the interval [0,10] and [0,200] respectively in terms of the optimal JS value. All methods are initialized by using K-means method.
To quantify the segmentation results and verify the performance of the proposed model, Jaccard similarity (JS) [1] is used as a criterion. It is defined as the ratio between intersection and union of the segmentation result S 1 and the ground truth S 2 : The JS value is between 0 and 1 and a more accurate segmentation result should correspond to a higher JS value. In order to show the robustness to noise, we test our algorithm on a synthetic image. Fig. 1 (a) is the original image and (b) is the original image corrupted by Gaussian noise (parameters: mean 0 and variance 0.007). Fig. 1 (c)-(g) are the segmentation results of the standard GMM, BMMSI, the proposed algorithm, ASIFC and PFCM, respectively. Table. 1 shows the JS values comparison of segmentation results on the synthetic image with different noise level. We can see from the results that our algorithm is much more robust to the presence of noise as compared to other algorithms because of the incorporation of the non-local spatial information and the outliers elimination by MI maximization.

A. SEGMENTATION OF SYNTHETIC BRAIN MR IMAGES
In this section, the synthetic brain MR images will be used for segmentation to show the performance of our algorithm. The first experiment in this section are carried out in 3T-weighted brain MR images. Fig. 2 shows the segmentation result of our algorithm on 3T-weighted brain MR images with 5% noise and 80% intensity inhomogeneity. The first and second row show the initial images with intensity inhomogeneity, estimated bias field, bias corrected images, and the segmentation results, respectively. It can be seen from the results that the intensities within each brain tissue in the bias-corrected images become quite homogeneous. Fig. 2 demonstrate that the results of our method are consistent with the expected tissue regions. We can evaluate the performance of bias field correction by quantifying the intensity inhomogeneities of the bias field corrected images using the coefficient of joint variation(CJV). For two tissues T1 and T2 (GM, WM or CSF), the CJV is defined as where σ (T) and µ(T) are the standard deviation and the mean of the intensities in the tissue T. The smaller CJV values indicate better bias field correction results. The CJV values of the three tested methods for the 21 images (N5F80) are plotted in Fig. 3, which shows better performance of our method than two other methods. In the second experiment, we compared our method with above four existing segmentation algorithms on a simulated brain MR images with different noise level to show the robustness of our method to noise. All images in this experiment are selected from BrainWeb (http://brainweb. bic.mni.mcgill.ca/). The BrainWeb provides full three-D synthetic brain MRI data with three modalities (T1, T2 and PD) and it can also provide brain data sets with various slice thickness, noise levels and levels of intensity inhomogeneity. In this experiment, we use the T1-weighted 1mm synthetic brain MR images with no intensity inhomogeneity and a noise level ranging from 3% to 7%. Fig. 4 and Fig. 5 respectively show the segmentation results on the simulated brain MR images with noise level 5% and 7%. Fig. 4 (a) and Fig. 5 (a) show the original images corrupted by noise and outliers. Fig. 4 (b) and Fig. 5 (b) are the ground truth. We can see that the standard GMM (Fig. 4 (c) and Fig. 5 (c)) performs poor for the image with heavy noise. Due to the lack of spatial constrains, the GMM is very sensitive to noise and outliers. The DCA-FCM (Fig. 4 (d) and Fig. 5 (d)) is a directionally adaptive algorithm and has more complexity on the log-likelihood function. The incorporated local spatial information makes the M-step of EM algorithm cannot be directly applied to the prior distribution. In order to update the contextual mixing proportions,  the DCA-FCM needs to solve a quadratic equation for each pixel belong to each cluster. It leads the segmentation results unstable. It can be seen from Fig. 4 (d) and Fig. 5 (d) that the DCA-FCM cannot achieve good segmentation results in high noise environment. The segmentation result of the FRSCGMM (Fig. 4 (e) and Fig. 5 (e)) is much smoother. The FRSCGMM constructs the spatial information based on the prior and posterior probability, but it only utilizes a smoothing filter over the whole image. Therefore, it may get over-smoothed segmentation results. To improve the segmentation accuracy, the BMMSI (Fig. 4 (f) and Fig. 5 (f)) choose the Student's t-distribution for its component function and the representation of contextual mixing proportion is closely related to the Gaussian distribution of its neighborhood. However, only image intensity is adopted as feature to apply to segmentation for the BMMSI. It do not consider the region or boundary information of image, hence it cannot achieve the satisfied results. Comparing with these four algorithms, our proposed method can visually produce better segmentation results by using the non-local spatial information and MI maximization method.
For quantitative comparison, we apply above five algorithms to the segmentation of 30 MR images with a noise level ranging from 3% to 7%. The segmentation accuracy is measured by using means and standard deviations of JS values on GM, WM, and CSF. The results are listed in Table 2. This comparison demonstrates that the proposed algorithm produces the most accuracy segmentation (with higher means of JS values) and has the best ability and robustness of denoising (with lower standard deviations of JS values). In addition, we can also find that the proposed algorithm has the highest mean JS values for CSF tissue, which shows that our method can retain more image details.
In the third experiment, we evaluate the proposed algorithm and compare it to five existing brain MRI segmentation algorithms, including AS-EM [9], AM-EM [10], BCFCM [2], RSCFCM [22], and MICO [23], to show the robustness of our method to intensity inhomogeneity. The parameters for each method are set with the default values specified in the papers. Please refer to the corresponding references for more details.
In order to show the impact of the intensity inhomogeneity, the segmentation results obtained by applying above six methods to the simulated brain MR images from BrainWeb are displayed in Fig. 6. Fig. 6 (a) is the initial image with 80% intensity inhomogeneity and 3% noise and Fig. 6 (b) is the ground truth. The intensity inhomogeneity level in the original image is very high, which makes the segmentation algorithms hard to obtain accurate results. AS-EM (Fig. 6 (f)) is an iterative algorithm which interleaves classification with bias field estimation. However, AS-EM is constructed by manually selecting representative points of each cluster, which may cause the segmentation results that are not fully objective and reproducible. The AM-EM (Fig. 6 (e)) is an improvement of AS-EM and it uses a digital brain atlas to provide the prior probability maps of brain tissues. However, the parameters initialization of the AM-EM remains very significant, which may result in unsatisfied segmentation results under K-means initializations. Fig. 6 (g) shows the segmentation result of the RSCFCM. The RSCFCM only uses pixels information on one of the four directions in the neighborhood to decrease the influence of noise, hence it cannot retain more image details. MICO (Fig. 6 (d)) can estimate the bias field well and effectively reduce the influence of severe intensity inhomogeneity, but it is sensitive to noise and outliers. The BCFCM (Fig. 6 (c)) can adaptively VOLUME 8, 2020 segment brain MR images and correct bias field. But it also cannot preserve image details well. Comparing with segmentation results of other five algorithms and the ground truth, our proposed method can effectively overcome the effect of intensity inhomogeneity and noise and retain more image details, which makes the segmentation results more accurate.
Similarly, we also apply these algorithms on 21 brain MR images selected from BrainWeb (60th∼80th), which contain 3% noise and the intensity inhomogeneity ranging from 20% to 80%. The mean and standard deviation of JS values for the segmentation of GM, WM, and CSF are listed in Table 3. Both visual and quantitative comparisons demonstrate that our proposed algorithm has the superior performance in dealing with the intensity inhomogeneity.

B. SEGMENTATION OF CLINICAL BRAIN MR IMAGES
In this section, we will segment the clinical brain MR images which are generated from Internet Brain Segmentation Repository (IBSR, http://www.cma.mgh.harvard.edu/ ibsr/). Above six algorithms (AS-EM, AM-EM, BCFCM, RSCFCM, and the proposed method) will be applied to the segmentation of these clinical brain MR images in this experiment. Fig. 7 and Fig. 8 respectively show the segmentation results on the clinical brain MR image data sets from IBSR (205_3# and 15_3#). Each data set has intensity inhomogeneity. Fig. 7 (a) is the initial image (21th of 205_3#) with the low-level intensity inhomogeneity and Fig. 7 (b) is the ground truth. Fig. 7 (c-h) are the segmentation results of above six algorithms. Fig. 8 shows the segmentation results on the 34th image of 15_3# from IBSR. Fig. 8 (a-b) are the initial image with severe intensity inhomogeneity and the ground truth, respectively. And Fig. 8 (c-h) are the segmentation results. It can be seen from the Fig. 7 and Fig. 8 that the proposed method can effectively overcome the effect of intensity inhomogeneity and noise.
The segmentation accuracy (mean ± standard deviation of JS values) of these algorithms on the data sets (205_3# and 15_3#) from IBSR is listed in Table 4. We only list the JS values of GM and WM because the data sets only contain the few pixels belonging to CSF. Table 4. demonstrates that our proposed algorithm can obtain more accurate results.
The proposed algorithm has two main constant parameters: the temperature value β and the smoothing parameter h. We will test the effect of them to show the robustness of our method. The parameters are chosen small enough to prevent the image from losing much of its sharpness and details. But the much smaller β and h will make the influence of neighbor pixels much smaller than the center pixel, which will decrease  the robustness to noise and outliers. However, if we choose the much bigger β and h, it will cause over-smoothing and losing image details. The selection of β and h is usually dictated by the considered application of the model.
The effect of the parameter β and h in our algorithm is tested on the synthetic MR brain images (3%, 5%, 7% noise and 80% intensity inhomogeneity) from BrainWeb. For the constant parameter, we experiment on the synthetic brain MR  FIGURE 9. Illustration of the effect of β on a synthetic brain MR image data with parameters: noise level 3%, 5%, 7% and intensity inhomogeneity level 80%. images of slice 60th∼80th. The average JS values of GM, WM, and CSF are shown in Fig. 9 and Fig. 10. It can be seen from the Fig. 9, when we set β as 2, our algorithm can get a better result for segmentation in different noise level. In our paper, the temperature value β is manually set to 2. And Fig. 10 shows that we should choose a smaller h in low noise level images to retain details and choose a bigger h in high noise level images to tolerate the noise. In our paper, the smoothing parameter h is set to 50 in 3% noise level images and 75 in 5% and 7% noise level images.

V. CONCLUSIONS
This paper proposed a robust spatial information-theoretic GMM algorithm for the bias field estimation and brain MRI segmentation. This method efficiently incorporates non-local spatial information and integrates the bias field estimation model into a GMM framework to simultaneously estimate the intensity inhomogeneity and segment brain MR images. By incorporating the non-local spatial information, the proposed algorithm can reduce the effect of noise and preserve more details. Meanwhile, it also uses the MI maximization method to improve the robustness to outliers. Therefore, the proposed method overcomes the drawbacks of existing GMM-based clustering algorithms, including limited robustness to outliers, over-smoothness for segmentation and limited accuracy for image details. The experimental results on both synthetic and clinical brain MR images demonstrate that the proposed algorithm can simultaneously overcome the influence of noise and intensity inhomogeneity, and outperforms other several state-of-the-art algorithms. CHENG NING is currently pursuing the M.S. degree in mathematics from the Nanjing University of Information Science and Technology, Nanjing, China. Her current research interest is in image segmentation with statistical model. CHUNZHENG CAO received the Ph.D. degree in applied mathematics from Southeast University, Nanjing, China, in 2012. He is currently a Professor of statistics with the School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing. His research interests include statistical learning, functional/longitudinal data analysis, and Bayesian inference.
JIANWEI YANG received the M.S. degree in mathematics from Xi'an Jiaotong University, in 1996, and the Ph.D. degree in computer science from Hong Kong Baptist University, Hong Kong, in 2005. He is currently a Professor with the School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China. His current research interests include pattern recognition and computer vision. VOLUME 8, 2020