Image Clustering Algorithms to Identify Complicated Cerebral Diseases. Description and Comparison

This article presents two algorithms based on two techniques from clustering theory, namely k-means clustering and Fuzzy C-means clustering. The study offers a sustained comparison of the two algorithms so that one of them can be chosen according to the image to be analyzed and the solution that is desired. The algorithms are applied to image processing, specifically to the analysis of brain computed tomography images. We also compare the results obtained by running the algorithms with different numbers of centroids, as well as the execution time of each algorithm. The image processing and the results presented in this document were obtained in the MATLAB R2018b environment. The analysis is possible because some components of the brain, such as the blood vessel network or the neural network, have a fractal arrangement, which makes their structure easy to analyze in order to provide predictions or treatments that are as accurate as possible for patients afflicted with a serious brain disease.


I. INTRODUCTION
Because image processing techniques are becoming more and more efficient, today we can identify diseases such as stroke or Alzheimer's disease by using techniques such as k-means clustering or Fuzzy C-means (FCM). These methods group pixels of the same type into clusters which, after processing, yield an image in which the areas affected by the disorder can be easily identified. Although the k-means and Fuzzy C-means algorithms are well known, the methods have been considerably improved, as shown by the running time (several seconds), which is considered short for the analysis of a computed tomography scan. The novelty of this article is the multi-level comparison between the two methods, a quantitative comparison that is among the first in the specialized literature.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Today, we need such ways to facilitate the diagnosis process because, according to statistics, this type of illness is one of the three main causes of disability or death worldwide, and rapid identification is becoming an emerging requirement [1]-[3].
A stroke manifests as an interruption of the blood circulation through a portion of the brain. As a result, the cells deprived of blood begin to die, and the final effect is paralysis or numbness of the body parts controlled by these cells.
Two types of stroke can be identified: ischemic and hemorrhagic. An ischemic stroke occurs when arteries in the brain are narrowed by a clot formed either because of fat deposits or because of other diseases specific to the arteries. A hemorrhagic stroke results from the perforation or rupture of a blood vessel in the brain.
For the results to be trustworthy, the input data set of the presented system, in this case the tomography image to be analyzed, must be acquired at high quality, since the filtering methods that can be applied to it may introduce values that alter the final result. The image processing and the results presented in this document were obtained in the MATLAB R2018b environment.
This study contains five chapters: the introduction, three further chapters, and the conclusions in the fifth and last chapter.
The second chapter describes the k-means algorithm and the possibilities of its application in medical imaging. It also contains the set of results obtained from the analysis of a tomography image.
The third chapter presents aspects of the FCM algorithm, as well as the results obtained through its use in the medical field.
The fourth chapter compares the two algorithms, including the execution time of each on both color and black and white tomography images, so that the appropriate one can be chosen according to the image to be analyzed and the solution that is desired.
Because some components of the brain, such as the blood vessel network or the neural network, have a fractal arrangement, we can readily analyze their structure in order to provide the patients concerned with predictions or treatments that are as accurate as possible. In effect, we perform what is called, in specialized language, a fractal analysis of the evaluated brain images [3].

II. K-MEANS ALGORITHM
In a multi-dimensional space, we can define a number of groups (clusters) that bring together points with similar characteristics.
As an application, we can cite the cholera analysis in London, a project through which the outbreaks of the disease were located and eliminated. Another successful project was Skycat, through which celestial objects were classified. Moreover, documents can today be organized by their content: a comprehensive analysis of the text, in which one dimension is defined for each word, places the entire document in a category according to the number of occurrences of the words of interest [2].
Whether the distances between the points of a set are suitable for forming clusters depends on the method used to measure them. Thus, let D(x, y) be a distance measure that satisfies the following properties [3], [4]:
• D(x, x) = 0. The distance from a point to itself is 0.
• D(x, y) = D(y, x). This equality expresses symmetry.
• D(x, y) ≤ D(x, z) + D(z, y). This is the triangle inequality.
In most cases the points are defined in a space of k dimensions, the distance between two points x = [x_1, x_2, ..., x_k] and y = [y_1, y_2, ..., y_k] being given by one of the following formulas [2]:
• Euclidean distance (L_2 norm): $D(x, y) = \sqrt{\sum_{i=1}^{k} (x_i - y_i)^2}$ (1)
• L_∞ norm: $D(x, y) = \max_{1 \le i \le k} |x_i - y_i|$
When defining a Euclidean space is not possible, grouping points becomes a delicate matter. For example, a web page can define a space with 10^8 dimensions, each word designating one dimension, an environment in which calculating distances is not so easy. In such cases, we can choose a technique in which the distance D(x, y) is closely related to the scalar product of the vectors corresponding to x and y, since the result is determined from the values of both x and y [2].
The length of each vector is the square root of the sum of the squared occurrence counts of its words. The sum of the products of the occurrence counts for each word is then divided by the product of the two lengths, which gives a normalized scalar product; subtracting this value from 1 gives the distance between x and y [2].
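The distance measures discussed above can be illustrated with a minimal Python sketch. The function names and the three toy occurrence-count vectors are illustrative assumptions and are not taken from the MATLAB programs used in this study.

```python
import math

def euclidean(x, y):
    """L2 norm of the difference: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def l_infinity(x, y):
    """L-infinity norm of the difference: the largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    """1 minus the normalized scalar product of two occurrence-count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    len_x = math.sqrt(sum(a * a for a in x))
    len_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (len_x * len_y)

# Hypothetical word-occurrence counts for three documents (one dimension per word).
doc_a = [3, 0, 1]
doc_b = [6, 0, 2]   # same word proportions as doc_a, twice as long
doc_c = [0, 4, 0]   # shares no word with doc_a

print(euclidean(doc_a, doc_b), l_infinity(doc_a, doc_b))
print(cosine_distance(doc_a, doc_b), cosine_distance(doc_a, doc_c))
```

In this sketch, doc_a and doc_b are far apart in the Euclidean sense but have a cosine distance of practically zero, because the normalized scalar product ignores document length. Vectors of zero length are not handled.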
Algorithms for grouping the points of a set can be classified as follows:
• Hierarchical type. The method initially treats every point in the set as a cluster and, as it proceeds, merges nearby clusters.
• Centroid type. The technique randomly selects the centroids and allocates the remaining points according to their distance from each centroid. The k-means algorithm keeps the data in main memory, defines a number of k centroids and assigns each point to its nearest centroid. A centroid can migrate as points are assigned to it.
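The centroid-type procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions (squared Euclidean distance, random initial centroids drawn from the data, a hypothetical function name kmeans); it is not the MATLAB kmeans function used for the results in this article.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid migrates to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:                        # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
        if new_labels == labels:               # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical gray-level intensities forming two obvious groups.
pixels = [(1.0,), (1.2,), (0.8,), (9.0,), (9.5,), (8.5,)]
labels, centroids = kmeans(pixels, k=2)
print(labels, centroids)
```

The two alternating steps, assignment and centroid update, correspond to the allocation of points and the migration of the centroids described above.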
To obtain the results presented in this article, we used the kmeans function (a partitioning function) from the Statistics and Machine Learning Toolbox of MATLAB R2018b. After dividing the data into k clusters, this function returns the index of the cluster to which each point has been assigned. Each point in the data is treated as an object with a well-defined location in space. The number of desired clusters must be specified when the function is called.
Unlike hierarchical clustering, k-means operates on the actual observations rather than on every pair of observations, which makes it more suitable for processing large amounts of data.
A cluster obtained with the kmeans function consists of its centroid (calculated differently for each supported distance measure) and its member objects, whose distance to the centroid the algorithm minimizes as far as possible. The number of iterations of the algorithm can also be controlled by the user [3].
When using this function it is recommended to:
• Compare the results obtained by running the algorithm with different numbers of centroids. This operation is performed to determine a suitable value of k.
• Evaluate the graph corresponding to the figures of interest.
• Rerun the algorithm and select the solution with the smallest total sum of distances.
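The last recommendation, repeating the run and keeping the solution with the smallest total sum of distances, can be sketched as follows. The sketch works on scalar gray levels and uses squared distances with mean centroids; the names kmeans_1d and best_of and the sample values are hypothetical, and the code only mirrors the idea, not MATLAB's implementation.

```python
import random

def kmeans_1d(values, k, rng, iterations=50):
    """One k-means run on scalar values; returns (total_distance, sorted centroids)."""
    centroids = rng.sample(values, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty group keeps its previous centroid.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    total = sum(min(abs(v - c) for c in centroids) ** 2 for v in values)
    return total, sorted(centroids)

def best_of(values, k, replicates=10, seed=0):
    """Repeat the run from different random centroids and keep the lowest total."""
    rng = random.Random(seed)
    return min(kmeans_1d(values, k, rng) for _ in range(replicates))

# Hypothetical gray levels; compare several values of k on the same data.
gray = [10, 12, 11, 100, 102, 101, 200, 198, 199]
for k in (2, 3, 4):
    total, centroids = best_of(gray, k)
    print(k, total, centroids)
```

Comparing the totals printed for different values of k is one simple way to follow the first recommendation as well, although the total alone always favors a larger k and should be read together with the silhouette chart.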
Considering the advantages of the k-means algorithm, we find that it can readily be used in tomography analysis to help determine whether a patient has had a stroke or has Alzheimer's disease [5]-[8].
Thus, we developed a program in the MATLAB R2018b programming environment that works as follows [3]:
• The image containing the brain to be analyzed is loaded (Figure 1).
• The image is converted to grayscale (using the rgb2gray function).
• Clusters are created by partitioning the data.
• It is advisable to display a silhouette chart to verify how close the points in each group are to the neighboring groups. The silhouette value ranges from 1 (points that are far removed from neighboring groups) through 0 (points that do not belong distinctly to one cluster or another) to −1 (points that are probably allocated to the wrong cluster). The 'cityblock' parameter specifies that the k-means algorithm measures distance as the sum of absolute differences.
• The graph in Figure 2 shows that the points in cluster 2 are well separated from the other groups, while clusters 1 and 3 contain some low or negative values, which indicates a less successful separation.
• By increasing the number of centroids to 4, we observe a better separation of the groups (Figure 3).
• Five centroids can be defined to test the optimal number of initial groups, but we notice that the result starts to degrade, several points having low or negative values (Figure 4).
• By default, the algorithm starts by randomly choosing the locations of the centroids and converges to a solution that is a local (not necessarily global) minimum; that is, the technique divides the data in such a way that moving any single point to another cluster would increase the total sum of distances. The starting configuration is therefore essential in determining the result. Using the 'Replicates' parameter, various solutions can be tested by repeating the clustering process from different initial centroids and returning the result with the smallest total sum of distances among all the runs.
The k-means algorithm can also be applied in the MATLAB environment through the imsegkmeans function, which segments the image into k clusters set by the user. The function labels the clusters, and then another function, labeloverlay, displays each cluster in a single color according to its label. This method helps to quickly identify the areas of interest in the image. Thus, in Figure 5, it is easier to identify a disease such as Alzheimer's using the program described [9]-[11].
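The silhouette value used in the steps above can be computed by hand for small examples. The following Python sketch assumes the usual definition (b − a) / max(a, b), where a is the mean distance to the other members of the point's own cluster and b is the smallest mean distance to another cluster, with the city-block distance; the function name and sample points are hypothetical, and at least two clusters are assumed.

```python
def silhouette_values(points, labels):
    """Silhouette value of every point, using the city-block (L1) distance."""
    def dist(p, q):
        return sum(abs(a - b) for a, b in zip(p, q))

    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        mean_to = {}
        for c in clusters:
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_to[c] = sum(dist(p, q) for q in others) / len(others)
        a = mean_to.get(labels[i])                      # cohesion; None for a singleton
        b = min(m for c, m in mean_to.items() if c != labels[i])   # separation
        if a is None or max(a, b) == 0:
            values.append(0.0)                          # convention for degenerate cases
        else:
            values.append((b - a) / max(a, b))
    return values

points = [(0,), (1,), (10,), (11,)]
print(silhouette_values(points, [0, 0, 1, 1]))   # well-separated grouping
print(silhouette_values(points, [0, 1, 1, 1]))   # point (1,) placed in the wrong group
```

In the second call the misplaced point receives a negative value, which is the situation interpreted above as a point probably allocated to the wrong cluster.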
The same techniques were applied to the images in Figures 6 through 10. It is worth noting that, for the second tomography image analyzed, five clusters were more appropriate. The graphs for 3 and 4 clusters showed a significant number of points with negative values, which indicates that those points were placed in an inappropriate group (probably because of noise in the initial image).

III. FCM ALGORITHM
The Fuzzy C-means technique is one of the most popular image clustering methods [5], [12]. Being easy to implement, this process can provide good results if the system running it has generous hardware resources. For 2D data sets, the runtime and memory needed to reach good output quality are not very high. For 3D sets, on the other hand, significant resources are recommended in order to achieve an efficient implementation.
Computational efficiency is obtained by using the histogram of the image intensities, instead of the raw image data, during the clustering process [13], [14].
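The benefit of working on the histogram can be seen in a small sketch: an 8-bit image has at most 256 distinct gray levels, so each iteration loops over the levels and their counts rather than over every pixel. For brevity the sketch below uses hard (k-means style) assignments instead of fuzzy weights; the function name, the initial centroids and the pixel values are illustrative assumptions.

```python
from collections import Counter

def histogram_kmeans(pixels, centroids, iterations=50):
    """Hard clustering over distinct gray levels, each weighted by its pixel count."""
    hist = Counter(pixels)                      # gray level -> number of pixels
    centroids = list(centroids)
    for _ in range(iterations):
        sums = [0.0] * len(centroids)
        counts = [0] * len(centroids)
        for level, count in hist.items():       # one pass over levels, not pixels
            j = min(range(len(centroids)), key=lambda c: abs(level - centroids[c]))
            sums[j] += level * count
            counts[j] += count
        # A cluster that received no level keeps its previous centroid.
        updated = [s / n if n else c for s, n, c in zip(sums, counts, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids

# Hypothetical flattened grayscale image with repeated intensities.
pixels = [10] * 5 + [12] * 3 + [200] * 4 + [210] * 2
print(histogram_kmeans(pixels, centroids=(0, 255)))
```

Because each level is weighted by its count, the centroids are the same as those obtained by looping over all pixels, while the cost per iteration no longer depends on the image size.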
The elements of the FCM algorithm include the points x_i, the clusters c_j, and w_{i,j}, the weight that gives the degree of belonging of object x_i to cluster c_j. The restrictions of the algorithm, for n points and k clusters, are given by the following statements:
• the weights for a point x_i must sum to 1: $\sum_{j=1}^{k} w_{i,j} = 1$.
• each cluster c_j contains at least one point with non-zero weight, but does not contain all points with weight 1: $0 < \sum_{i=1}^{n} w_{i,j} < n$.
The centroid of a cluster c_j is calculated using the following relation:
$c_j = \frac{\sum_{i=1}^{n} w_{i,j}^{p} x_i}{\sum_{i=1}^{n} w_{i,j}^{p}}$ (6)
Equation (6) is a fuzzy counterpart of the relation that defines a centroid in the k-means algorithm. The difference is that all the points are considered and the contribution of each is weighted by its coefficient of belonging.
The weights are updated so as to reduce the sum of the errors, as follows:
$w_{i,j} = \frac{\left(1 / D(x_i, c_j)^2\right)^{\frac{1}{p-1}}}{\sum_{q=1}^{k} \left(1 / D(x_i, c_q)^2\right)^{\frac{1}{p-1}}}$
Therefore, w_{i,j} takes a large value when x_i is located close to the centroid c_j.
Another parameter of interest is p, the index that characterizes the influence of the weight, p ∈ (1, ∞), which can fall into one of the following cases:
• p > 2. This case implies a decrease in the weight assigned to the cluster nearest to the point.
• p → ∞. In this situation the exponent tends to 0 and the weights tend to 1/k.
• p → 1. The exponent grows, which increases the weight of belonging to the nearest cluster.
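The three cases can be checked numerically. The sketch below assumes the standard FCM weight update, in which the weight of a point for a cluster is proportional to the inverse squared distance raised to the exponent 1/(p − 1); the function name and the two example distances are hypothetical, and all distances are assumed to be strictly positive.

```python
def memberships(distances, p):
    """FCM weights of one point, given its distances to the k centroids."""
    inv = [(1.0 / d ** 2) ** (1.0 / (p - 1.0)) for d in distances]
    total = sum(inv)
    return [v / total for v in inv]

# A point at distance 1 from the first centroid and 2 from the second (k = 2).
for p in (1.1, 2.0, 10.0, 100.0):
    print(p, [round(w, 3) for w in memberships([1.0, 2.0], p)])
```

For p = 2 the weights are 0.8 and 0.2. As p approaches 1 the weight of the nearest cluster approaches 1, and as p grows both weights approach 1/k = 0.5.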
The stages of application of the algorithm are:
• Select an initial fuzzy pseudo-partition (assign values to all the weights w_{i,j}).
• Calculate the centroid of each cluster using the fuzzy partition.
• Update the fuzzy partition, that is, recompute the weights w_{i,j}.
• Repeat the previous two steps until the centroids no longer change.
To obtain the images in Figures 11 and 12, noise suppression was required, through a morphological reconstruction of the image to be analyzed [13]-[15].
Then, by using the histogram, the complexity of the computational process is greatly reduced [16].
Finally, applying the median filter will clarify the cluster membership of the points.
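The stages of the FCM algorithm described in this chapter can be sketched in Python as follows. The sketch assumes the standard update rules (a centroid is the mean of the points weighted by w^p, and a weight is proportional to the inverse squared distance raised to 1/(p − 1)), works on scalar gray levels, and omits the morphological reconstruction and median filtering steps; the function name, tolerance and sample values are hypothetical.

```python
import random

def fuzzy_c_means(values, k, p=2.0, iterations=100, tol=1e-6, seed=0):
    """Fuzzy C-means on scalar values; returns (centroids, weights)."""
    rng = random.Random(seed)
    # Stage 1: initial fuzzy pseudo-partition, each row normalized to sum to 1.
    weights = []
    for _ in range(len(values)):
        row = [rng.random() + 1e-9 for _j in range(k)]
        s = sum(row)
        weights.append([w / s for w in row])
    centroids = [0.0] * k
    for _ in range(iterations):
        # Stage 2: centroid of each cluster as a mean weighted by w ** p.
        new_centroids = []
        for j in range(k):
            num = sum((w[j] ** p) * x for w, x in zip(weights, values))
            den = sum(w[j] ** p for w in weights)
            new_centroids.append(num / den)
        # Stage 3: update the fuzzy partition from the inverse squared distances.
        for i, x in enumerate(values):
            inv = [(1.0 / max((x - c) ** 2, 1e-12)) ** (1.0 / (p - 1.0))
                   for c in new_centroids]
            s = sum(inv)
            weights[i] = [v / s for v in inv]
        # Stage 4: stop once the centroids no longer change noticeably.
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, weights

gray = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]          # hypothetical intensities
centroids, weights = fuzzy_c_means(gray, k=2)
print(sorted(centroids))
```

Unlike k-means, every point contributes to every centroid; the small floor on the squared distance only guards against division by zero when a point coincides with a centroid.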

IV. COMPARISON OF THE TWO ALGORITHMS
The specialized literature characterizes the FCM algorithm as similar to the k-means method, calling it Soft K-Means.
In principle, the two methods work identically, the only difference being the use of a value that defines the degree of belonging of a point to each of the clusters. This coefficient is closely related to an exponent, called "rigidity", whose role is to take into account the stronger connections between points. It is noteworthy that when the value of rigidity tends to infinity, the membership matrix becomes binary, so that the FCM and k-means algorithms coincide.
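Mathematically, the definitions of the functions that characterize the two methods show a similarity in terms of minimizing the sum of the average error (SAE):
• k-means: $SAE = \sum_{j=1}^{k} \sum_{x_i \in c_j} D(x_i, c_j)^2$
• FCM: $SAE = \sum_{j=1}^{k} \sum_{i=1}^{n} w_{i,j}^{p} D(x_i, c_j)^2$
where n is the number of points, k is the number of clusters, w_{i,j} is the weight of point x_i in cluster c_j, and p represents an index that characterizes the influence of the weight, p ∈ (1, ∞).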
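The similarity between the two objective functions can be verified on a toy example. The sketch assumes that both errors are sums of squared distances to the centroids, the FCM one weighted by w^p; the function names and the data are hypothetical. When every weight is 0 or 1, w^p equals w and the FCM value coincides with the k-means value.

```python
def kmeans_sae(values, labels, centroids):
    """Sum of squared distances of each value to the centroid of its own cluster."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(values, labels))

def fcm_sae(values, weights, centroids, p=2.0):
    """Sum over all points and clusters of w ** p times the squared distance."""
    return sum(
        (w ** p) * (x - c) ** 2
        for x, row in zip(values, weights)
        for w, c in zip(row, centroids)
    )

values = [1.0, 2.0, 9.0, 10.0]
centroids = [1.5, 9.5]
labels = [0, 0, 1, 1]
binary = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]   # hard memberships
soft = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]     # fuzzy memberships

print(kmeans_sae(values, labels, centroids))
print(fcm_sae(values, binary, centroids))
print(fcm_sae(values, soft, centroids))
```

With the binary weights the two functions return the same number, which is the sense in which the two algorithms coincide when the memberships become binary.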
In terms of computational performance, the FCM technique must perform k multiplications for each point and for each dimension, which makes it slower than the k-means method. On the other hand, when the points of a cluster are scattered along one or two dimensions, the use of FCM [17], [18] instead of k-means is recommended [19], [20].
Image partitioning (clustering) is performed to identify, extract from context, and characterize the distinct anatomical structures highlighted in the examined tomographic image.
Observation. Today, through image segmentation, software is even capable of detecting the perivascular spaces, which are much finer in size, and of quantifying and identifying all of them at the same time [21]. Together with the specific anatomy observed, this allows us to make predictions about brain pathology and physiology.
We then apply two algorithms widely used in tumor detection: k-means clustering and Fuzzy C-means clustering. The clustering algorithms are compared to estimate their efficiency, by evaluating the execution time and the accuracy of each algorithm. The results show that the execution time of k-means is almost 10 times shorter than that of the Fuzzy C-means clustering technique, because k-means needs fewer iterations than Fuzzy C-means clustering. This holds for color images (color tomography) as well as for grayscale images (black and white tomography), as seen in Figures 13 and 14.
Figures 13 and 14 show two histograms, for k-means clustering and Fuzzy C-means (FCM) clustering respectively, with the execution time (in seconds) on the y-axis.
As an indicator of accuracy, we use the so-called recall, a term well known in the literature. Recall is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents, while precision is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search.
Figures 15 and 16 show two histograms, for k-means clustering and Fuzzy C-means (FCM) clustering respectively, with recall, as a percentage, on the y-axis.
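As an illustration of these two definitions, the sketch below treats pixels as the "documents": a pixel is relevant if a reference mask marks it as belonging to the region of interest, and retrieved if the clustering marks it. The existence of such a reference mask, the function name and the sample masks are assumptions made for the example; the article does not state how recall was computed for the tomography images.

```python
def recall_precision(predicted, reference):
    """Recall and precision of a binary segmentation against a reference mask."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)        # relevant and retrieved
    fn = sum(1 for p, r in pairs if not p and r)    # relevant but missed
    fp = sum(1 for p, r in pairs if p and not r)    # retrieved but not relevant
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Hypothetical flattened masks: 1 marks a pixel assigned to the region of interest.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
print(recall_precision(predicted, reference))
```

Here two of the three relevant pixels are retrieved, and two of the three retrieved pixels are relevant, so both recall and precision equal 2/3.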
The two algorithms, k-means and Fuzzy C-means, were run on a 10-iteration sample set, and the results were tabulated as the execution time and recall of the clustering techniques for color tomography images and for black and white tomography images respectively. The data distribution and the resulting values are represented by histograms.
Identifying malignant nodules or cerebral cancer constituents from computed tomography exploration is [...] To facilitate this service immediately, the techniques presented above have been suggested and the methods highlighted in this article have been refined. During the last decade, approaches in this category have demonstrated admirable outcomes, superior to those obtained by standard methods in different areas [22], [23].
Today, specialized investigators are trying new techniques for in-depth analysis to enhance the performance of detection systems in neoplasm screening by computed tomography, including both the identification and the localization of abnormalities, called mini-tumors in medical language (only a few pixels in size), which escape the observation of some inexperienced radiologists.

V. CONCLUSION
This study presents a comparison of two algorithms that cluster points with similar values, in order to analyze irregularities of the surfaces contained in the images.
Currently, according to studies conducted in the medical field, brain diseases are a major cause of death worldwide (the most important being stroke, followed by Alzheimer's disease). The need for image processing techniques that complement the experience of medical doctors in making decisions about the diagnosis and treatment of certain diseases therefore becomes critical.
The images obtained from cranial tomography are a good starting point for this study because, if they are acquired at high quality, they can provide crucial information for establishing a diagnosis. The equipment in specialized laboratories can capture images of the brain or other organs with impressive resolution, but it must be improved by implementing disease detection and signaling functionalities. This process involves capturing an image, processing it using dedicated algorithms and suggesting a verdict/diagnosis.
Firstly, because of shortcomings in the construction of the equipment used to acquire the medical image, areas with various types of noise appear in the final image. A reconstruction of the image or heavy filtering is therefore necessary so that the input data of the detection algorithm is of the highest possible quality. Otherwise, the system will provide an altered result, inconsistent with the medical reality represented.
The presented algorithms, k-means and Fuzzy C-means, use the theory of clustering the data of a set in order to reveal the points of interest.
This report indicates that the results obtained by the k-means clustering method are better than those obtained by the Fuzzy C-means method, because Fuzzy C-means is a semi-supervised method. Accordingly, it requires a pre-processing procedure, whereas k-means clustering does not, since it is an unsupervised method and needs fewer iterations. Likewise, maximum lossless compression is achieved by k-means clustering, which also provides accurate results with a minimal amount of data. Consequently, k-means clustering is more efficient than the Fuzzy C-means method and, at the same time, less error-sensitive.
The number of groups to which points can be assigned is set by the user before running the programs, depending on the component parts of the image (physically speaking, corresponding to the different types of tissue). With the help of the MATLAB R2018b programming environment, two programs were implemented that processed the cranial tomography images (a series of filters was applied to eliminate the noise and adjust the contrast) and analyzed them to detect the above-mentioned diseases. Thus, the system presented in this article brings a novel, innovative concept to the decision-making process of diagnosing brain disorders, by using elements of set theory in the processing of characteristic medical images.