Analysis and Identification of Dermatological Diseases Using Gaussian Mixture Modeling

Skin diseases are common and are mainly caused by viruses, bacteria, fungi, or chemical disturbances. Timely analysis and identification are of utmost importance for controlling the further spread of these diseases. Control is even more difficult in rural and resource-poor environments because of a lack of expertise in primary health centers. Hence, there is a need for self-assisting and innovative measures for the appropriate diagnosis of skin diseases. Mobile applications may provide inexpensive, simple, and efficient solutions for early diagnosis and treatment. This paper investigates the application of Gaussian mixture model (GMM) based analysis and classification of skin diseases from their visual images using a Mahalanobis distance measure. The GMM has been preferred over the convolutional neural network (CNN) because of the limited resources available within a mobile device. The gray-level co-occurrence matrix (GLCM) parameters contrast, correlation, energy, and homogeneity, derived from skin images, have been used as the input data for the GMM. The analysis of the results showed that the proposed method is able to predict the classification of skin diseases with satisfactory efficiency. It was also observed that different diseases occupy distinct spatial positions in the multidimensional space clustered using the Mahalanobis distance measure.


I. INTRODUCTION
Human skin performs various functions such as vitamin D synthesis, protection of internal organs, control of water loss, and shielding the body from environmental hazards. Human skin consists of three layers: epidermis, dermis, and hypodermis, as shown in Fig. 1. The external layer is called the epidermis. It is the thinnest layer, with a thickness varying from 0.05 mm to 0.15 mm. It provides mechanical resistance and acts as a barrier against bacteria, harmful chemicals, and ultraviolet (UV) radiation [1]. The dermis is the middle layer, whose thickness varies from 1.5 mm to 4 mm. Its primary function is to protect the body from mechanical stress and strain. It is divided into two strata, the papillary dermis and the reticular dermis. The papillary dermis consists of loose fiber bundles connecting it to the epidermis. The reticular dermis, in contrast, is much thicker than the papillary dermis and contains dense networks of elastin, collagen, reticular fibers, capillary vessels, sensory receptors, and hair follicles [2]. Beneath the dermis is a layer of fat and loose fibers known as the hypodermis. It stores fat and provides thermal insulation. Skin diseases are generally caused by viruses, bacteria, fungi, and chemical disorders. Uncontrolled spread of skin diseases may be dangerous; hence, timely treatment is important. Contagious skin diseases may prove to be even more dangerous [3]. Various methods are in use for automatic identification, classification, and prediction of the necessary precautions to be taken [4]. The problem is more severe for visually similar skin diseases.
In resource-poor environments, where health workers have even less expertise, especially in dermatology, more convenient and innovative measures are needed for the proper diagnosis of skin diseases. In these areas, dermatological services are commonly provided by medical staff, and expertise in dermatology cannot be expected. Therefore, queries are generally sent to specialists, and the response takes several days to arrive. Studies conducted in rural areas of countries like Colombia showed that the average waiting time for a dermatologist was more than three weeks [5]-[6]. Thus, the use of mobile applications to enable real-time dermatological diagnosis in these areas has great applicability. Innovative technical solutions can help in bridging the gap in resource-poor environments [7]-[8]. Several dermatology-related mobile phone applications are available, contributing to a rise in the share of teledermatology from 11.0% in 2014 to 20.1% in 2017 [9]. Most of these applications provide only consultation for self-diagnosis using the visual, audio, and data services of mobile communication [10]-[15]. Teledermoscopy reduces costs, avoids unnecessary biopsies, and decreases the time to initial therapy [16]-[17].
Dermatology deals with the identification of problems of the skin, nails, and hair. Sometimes, allergies, irritants, genetic makeup, and immune system disorders are also responsible for dermatitis, hives, and other skin problems such as acne, cold sores, blisters, actinic keratosis, carbuncles, eczema, psoriasis, and measles. Human skin is affected by many types of cancer, such as melanoma, basal cell carcinoma, and squamous cell carcinoma. Dermoscopy is a non-invasive method, based on visual symptoms, for the identification of skin abnormalities. Identification carried out by the naked eye is generally limited in accuracy; therefore, computer-assisted techniques are more effective [18].
Recently, artificial intelligence has also been used in dermoscopy [19]. CNNs have shown encouraging results in the accurate diagnosis of skin diseases. However, these applications underperform on images taken in poor lighting conditions, which generally leads to wrong diagnoses [20]. The problem is more severe for diseases with similar symptoms. The main limitations of CNNs are the code complexity and the requirement of large amounts of input data for training [20]-[21].
Some of the important algorithms used in image processing for the estimation and approximation of images include the adaptive image equalization algorithm [22], which automatically enhances the contrast of an image using GMM. The contrast-equalized image is generated from the dominant Gaussian component and the cumulative distribution functions of the input intervals. Nicholas et al. [23] introduced sub-region histogram equalization, which partitions the image based on its smoothed intensity values, obtained by convolving the input image with a Gaussian filter. A multilayer feed-forward neural network [24] has been used for precise and computationally efficient segmentation of components from dermoscopic images, utilizing a genetically optimized fuzzy grouping approach. The literature reviewed on melanoma skin cancer [25] highlights that various approaches, like artificial neural networks (ANN) and data mining, can be used for classifying skin cancer images. The accuracies obtained by these algorithms are 95-98% and 85%, respectively. Beyond the above literature, various skin diseases can be detected and classified with a range of approaches. Some of the important ones include the wavelet transformation and fuzzy inference system [26], the support vector machine (SVM) [27] with 65.56% accuracy, k-means clustering and fuzzy c-means clustering [28], a rule-based and forward-chaining inference engine [29], and case-based reasoning [30], with reported accuracies of 70%, 66.6%, and 90%, respectively. For human skin color detection, various methods have been used, including statistical modeling (GMM) [31]-[33] and the genetic algorithm [34].
GMM has been used in various applications. In this context, GMM for the classification of Alzheimer's disease was introduced in [35]-[36], where the approach used in [36] was reported to perform better than statistical hypothesis testing. A combination of GMM with various generative models [37], like k-nearest neighbors, naive Bayes, and multilayer perceptron, and discriminative models (SVM, decision trees) has been reported for emotion recognition.
A GMM classifier has also been used for the identification of normal and abnormal retinal images of patients suffering from diabetes, attaining an accuracy of 97.78%. GMM has been used for multiple limb motion classification using continuous myoelectric signals [38]-[39]. GMM along with a genetic algorithm has also been used in [40] for the automatic segmentation of lesions in magnetic resonance images (MRI). A Gaussian mixture model and logarithmic linearization algorithms [41] have been used for the pattern classification of ECG signals, achieving an accuracy of 99.21%. In [42], an equal-variance GMM has been used to model the characteristics of images, where an equal variance is shared by all the GMM variables. GMM has also been used in the identification of cancer chemosensitivity from the heterogeneous cellular response to perturbations in fluorescent sphingolipid metabolism [43], by extracting texture and intensity from the cellular images of the flow cytometry assay. A GMM-based approach has also been used in [44] to multiparametrically characterize prostate tissue on transrectal dynamic contrast-enhanced ultrasonography, giving an accuracy of 81%.
GMM has also found application in the detection of falling positions in human beings [45]. In that work, the authors extracted six postures of human physical movement, namely lying, sitting, standing, getting up, walking, and falling from height, captured by a video camera. A mixture of Gaussians model combined with average filter models was used in this approach. Although GMM has been widely used for several classification-based applications, its use for the analysis and classification of skin diseases has not yet gained much momentum.
Euclidean distance is mostly used for multidimensional classification, which leads to limited accuracy for search spaces with differently weighted coordinate axes. Zhang et al. [46] developed a low-rank and sparse matrix decomposition-based Mahalanobis distance (MD) method for anomaly detection. Their method used MD for the detection of probable anomalies in images analysed through sparse matrix decomposition. MD has also been used in ribonucleic acid (RNA) sequencing to analyse molecules for the prediction of the breast cancer survival rate [47]. In [48], it was reported that MD solved the clustering problems associated with the traditional Euclidean distance (ED) in clustering ECG features by reducing the number of iterations to 50%.
A melanoma detection method based on Mahalanobis distance learning and constrained graph regularized nonnegative matrix factorization has been successfully applied in [49] by incorporating the global along with the local geometry in supervised-learning-based training for dimensionality reduction.
In [50], an extreme learning machine method for multiclassification with a Mahalanobis distance approach has also been investigated. MD was used for inter-class and ED for intra-class distance measurement, resulting in about 1% improvement. The same approach has been adopted in the present investigations for distance measurement in the four dimensions of the feature vectors (C, Cr, E, and H), using 72 values (8 priors, 8×4 centers, and 8×4 covariances) for GMM-based classification of skin diseases.
The objective of the paper is to investigate the discriminative capabilities of a Gaussian mixture model based algorithm for the diagnosis of skin diseases from their visual images using a Mahalanobis distance measure, targeting mobile platforms where implementation using CNN is difficult because of the limited resources available within the device. GMM is computationally affordable, tractable, and efficient for small datasets in comparison to CNN [51]-[53]. Eleven different types of skin diseases (Molluscum Contagiosum, Milia, Discoid Lupus Erythematosus, Tinea Corporis, Warts, Acne Blackhead, Psoriasis, Discoid Eczema, Chromoblastomycosis, Athlete's Foot, Melanoma), along with their variants, were taken for the investigations. The methodology for the estimation of GMM parameters is discussed in Section II. The results and discussions are presented in Section III. Conclusions and future work are discussed in Section IV.

II. METHODOLOGY
This section describes the material and the algorithm used for investigating the discriminative capabilities of the GMM-based algorithm for skin diseases from their visual images. A multivariate Gaussian density over two variables Y_1 and Y_2 is shown in Fig. 2. The proposed method is shown in Fig. 3. Images of skin patches having different diseases were taken from the DermNet NZ database [54]. For the investigation, an image of normal human skin is taken as the reference. Images are resized to 256 × 256 pixels, and the RGB components are separated for each image. Each RGB component is segmented into 8×8 blocks. For each block, the GLCM parameters (contrast, energy, correlation, and homogeneity) are calculated. The distribution of the GLCM parameters is approximated using GMM, which is a parametric density estimation approach assuming that the input data is generated by more than one Gaussian process [55]. GMM may be written as a weighted sum of m Gaussian component densities:

p(x) = \sum_{k=1}^{m} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{m} w_k = 1

where x is a D-dimensional feature vector and \mathcal{N}(x \mid \mu_k, \Sigma_k) is the k-th Gaussian component density. Clustering can be improved by using the GMM algorithm, which can be used to estimate the GMM parameters, i.e., the mean (\mu_k), weight (w_k), and covariance (\Sigma_k) of each component [56].
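The block-wise feature extraction described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the horizontal neighbour offset, the symmetric GLCM, the number of gray levels, the MATLAB-style feature definitions, and the reading of "8×8 blocks" as blocks of 8×8 pixels are all assumptions, and the names `glcm_features` and `block_features` are hypothetical. A random array stands in for one resized skin image.

```python
import numpy as np


def glcm_features(block, levels=8):
    """Return (contrast, correlation, energy, homogeneity) for one block.

    Assumptions (not stated in the paper): horizontal offset (0, 1),
    symmetric GLCM, `levels` gray levels, 8-bit input intensities.
    """
    # Quantize 8-bit intensities into `levels` gray levels.
    q = (block.astype(np.int64) * levels) // 256
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (left, right), 1.0)   # count horizontal neighbour pairs
    glcm += glcm.T                        # make the matrix symmetric
    p = glcm / glcm.sum()                 # normalize to joint probabilities

    i, j = np.indices((levels, levels))
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt((((i - mu_i) ** 2) * p).sum())
    sd_j = np.sqrt((((j - mu_j) ** 2) * p).sum())

    contrast = (((i - j) ** 2) * p).sum()
    denom = sd_i * sd_j
    # Correlation is undefined for a constant block; 1.0 is used here
    # purely as a convention (an assumption of this sketch).
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / denom if denom > 0 else 1.0
    energy = (p ** 2).sum()
    homogeneity = (p / (1.0 + np.abs(i - j))).sum()
    return contrast, correlation, energy, homogeneity


def block_features(channel, block_size=8, levels=8):
    """Compute the four GLCM features for every non-overlapping block."""
    h, w = channel.shape
    feats = [
        glcm_features(channel[r:r + block_size, c:c + block_size], levels)
        for r in range(0, h - block_size + 1, block_size)
        for c in range(0, w - block_size + 1, block_size)
    ]
    return np.array(feats)


# Synthetic stand-in for a 256 x 256 RGB skin image (illustrative only).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
features = {name: block_features(image[:, :, k]) for k, name in enumerate("RGB")}
print({name: f.shape for name, f in features.items()})
```

Each colour channel yields one four-dimensional feature vector per block, and it is the distribution of these vectors that the GMM then approximates. If the intended segmentation is instead an 8×8 grid of 32×32-pixel blocks, only `block_size` would change.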
The advantages of the GMM-based algorithm are its low complexity and its scalability. It computes the probabilities of cluster memberships by maximizing the log-likelihood of the generated data. GMM fitting is an iterative method: in each step, the posterior probability P_{ik}^{t} at iteration t is computed as given in [57]-[58], and the parameters are then updated on the basis of the probabilities from the previous step. Most clustering algorithms use the Euclidean distance for classification, assuming the data to be isotropically Gaussian [59]. In multivariate modeling, the feature vectors do not satisfy this condition and, hence, the clustering leads to wrong classification. A solution to this problem may be the use of the Mahalanobis distance (MD), which takes the covariance matrix Σ into account. It is scale-invariant [61]. It is based on the correlations between variables, leading to efficient identification and analysis of the different patterns available in the input feature vectors. MD measures the relative distance between two variables with respect to the centroid [62]. It is a data-driven measure that can ease the distance distortion caused by a linear combination of the attributes [63].
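The contrast between the two distance measures can be illustrated with a short sketch. It uses the standard definition of the Mahalanobis distance, sqrt((x - c)^T Σ^{-1} (x - c)), on synthetic correlated data; the function names `mahalanobis` and `euclidean`, the data, and the chosen test points are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def mahalanobis(x, center, cov):
    """Mahalanobis distance sqrt((x - c)^T Sigma^{-1} (x - c))."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    # Solve Sigma z = diff instead of forming the explicit inverse.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))


def euclidean(x, center):
    """Ordinary Euclidean distance, for comparison."""
    return float(np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(center, dtype=float)))


# Synthetic, strongly correlated 2-D data (illustrative only).
rng = np.random.default_rng(0)
cov_true = np.array([[4.0, 1.8], [1.8, 1.0]])
data = rng.multivariate_normal([0.0, 0.0], cov_true, size=2000)
center, cov = data.mean(axis=0), np.cov(data, rowvar=False)

# Two points at the same Euclidean distance from the centroid:
# one follows the correlation of the data, the other runs against it.
x_along = center + np.array([2.0, 1.0])
x_across = center + np.array([-1.0, 2.0])
d_along = mahalanobis(x_along, center, cov)
d_across = mahalanobis(x_across, center, cov)
print("Euclidean:  ", euclidean(x_along, center), euclidean(x_across, center))
print("Mahalanobis:", d_along, d_across)

# Scale invariance: rescaling one axis leaves the Mahalanobis distance
# unchanged, while the Euclidean distance changes.
scale = np.array([100.0, 1.0])
data_s = data * scale
center_s, cov_s = data_s.mean(axis=0), np.cov(data_s, rowvar=False)
d_along_scaled = mahalanobis(x_along * scale, center_s, cov_s)
print("Mahalanobis after rescaling:", d_along_scaled)
```

The two test points are indistinguishable under the Euclidean distance, whereas the Mahalanobis distance marks the point that runs against the correlation structure as much farther from the centroid.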

III. RESULTS AND DISCUSSIONS
The distributions of contrast, correlation, energy, and homogeneity of the red, green, and blue components of the chosen dermatological diseases were modeled using GMM. About 100 iterations were needed for the convergence of the GMM. Its approximation of the feature vectors for each type of skin disease provided 8 priors, 8×4 centers, and 8×4 covariances for each RGB component, giving feature vectors with a total of 72 values. For the classification of the diseases, the Euclidean and Mahalanobis distances amongst the diseases were also estimated with respect to the normal skin. Fig. 4 shows the output of the GMM modeling of these feature vectors for normal human skin, whereas Fig. 5 to Fig. 15 show the corresponding outputs for the skin diseases considered. In all the diseases and their variants, the widest peak is observed for correlation and the narrowest peak for energy in the RGB components. Similar results were also observed for normal skin.
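The parameter count above (8 + 8×4 + 8×4 = 72) can be reproduced with the following sketch. It assumes that the model is fitted by expectation-maximization (EM) with diagonal covariances, which is inferred here from the 8×4 covariance count and is not stated explicitly in the text. The function name `fit_diag_gmm`, the initialization, the stopping tolerance, and the synthetic four-dimensional data are likewise assumptions.

```python
import numpy as np


def fit_diag_gmm(x, m=8, max_iter=100, tol=1e-6, seed=0):
    """EM for an m-component GMM with diagonal covariances (assumed form)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    weights = np.full(m, 1.0 / m)
    means = x[rng.choice(n, size=m, replace=False)].copy()
    variances = np.tile(x.var(axis=0), (m, 1))
    history = []  # log-likelihood at each iteration
    for _ in range(max_iter):
        # E-step: log(w_k * N(x_i | mu_k, diag(var_k))) for every i and k.
        diff = x[:, None, :] - means[None, :, :]
        log_pdf = -0.5 * (np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
                          + (diff ** 2 / variances[None, :, :]).sum(axis=2))
        log_joint = np.log(weights)[None, :] + log_pdf
        top = log_joint.max(axis=1, keepdims=True)
        log_norm = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)   # posterior of component k for sample i
        history.append(float(log_norm.sum()))
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: re-estimate priors, centers, and variances from the posteriors.
        nk = resp.sum(axis=0) + 1e-12         # small constant avoids division by zero
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = (resp[:, :, None] * diff ** 2).sum(axis=0) / nk[:, None]
        variances = np.maximum(variances, 1e-6)  # variance floor for stability
    return weights, means, variances, history


# Synthetic 4-D features standing in for (contrast, correlation, energy,
# homogeneity) of one colour component; illustrative only.
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(loc=c, scale=0.1, size=(256, 4))
                      for c in (0.2, 0.5, 0.8, 1.1)])

weights, means, variances, history = fit_diag_gmm(features, m=8, max_iter=100)
# 8 priors + 8x4 centers + 8x4 variances = 72 values.
packed = np.concatenate([weights, means.ravel(), variances.ravel()])
print(packed.shape, len(history))
```

The flattened vector `packed` plays the role of the 72-valued description of one colour component; with full covariance matrices the count would differ, which is why the diagonal form is assumed here.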
Mathematical and visual analysis of the GMM-modeled feature vectors of the different diseases shows that the peak structure is disease-dependent and may be very useful for predicting dermatological diseases from their visual images. For example, the Mahalanobis-based scatter plots (Fig. 17) show better results than the Euclidean-based scatter plots (Fig. 16), as dissimilar diseases are relatively more scattered. Further, instances of the same disease (e.g., melanoma in Fig. 19) are grouped more closely than in the Euclidean-based scatter plots in Fig. 18.

IV. CONCLUSION AND FUTURE WORK
Investigations using GMM-based modeling of the GLCM parameters (contrast, correlation, energy, and homogeneity) showed that different types of dermatological diseases have a unique peak structure and, hence, can be easily predicted using only their color images. It was also observed that different diseases occupy distinct positions in Mahalanobis-based classification. Extension of the work to other skin diseases on larger data sets is planned as future work.

NO CONFLICT STATEMENT
On behalf of all authors, the corresponding author states that there is no conflict of interest.