
A Novel Hybrid K-Means and GMM Machine Learning Model for Breast Cancer Detection


Graphical abstract of hybrid K-means and GMM Machine Learning model for Breast Cancer Detection.

Abstract:

Breast cancer is the second leading cause of death among women worldwide. Diagnosing and treating breast cancer can be challenging for radiologists. Consequently, primary care improves disease prevention and reduces deaths. Early detection increases treatment options and saves lives, which is the major target of this research. This research demonstrates the versatility of the methodology by integrating contemporary segmentation approaches with machine learning methods, both of which are developing areas of research. In the pre-processing stage, an adaptive median filter is used for noise removal, image-quality enhancement, edge preservation, and smoothing. This research makes a significant contribution by proposing a new parameter for evaluating the performance of K-means and a Gaussian mixture model (GMM). A hybrid combination of segmentation and detection was applied to breast cancer. The proposed technique is significant for classifying benign and malignant tumors. The simulated results are discussed and evaluated to determine the competence of this method for the early diagnosis of breast cancer. This method allows medical experts to recognize breast cancer faster and with higher accuracy. An ANOVA test was used for the multivariate analysis and to determine the prediction rate of the proposed method.
Published in: IEEE Access ( Volume: 9)
Page(s): 146153 - 146162
Date of Publication: 27 October 2021
Electronic ISSN: 2169-3536

CC BY: IEEE is not the copyright holder of this material. The article is distributed under the license at https://creativecommons.org/licenses/by/4.0/.
SECTION I.

Introduction

Experts in modern medicine are increasingly turning to technical approaches for a variety of chronic diseases. Many diseases, such as cancer, stroke, heart attack, chronic liver disease, viral hepatitis, and coronary artery disease, remain incurable, and the death rate from breast cancer is increasing every year. According to a statistical report on medical health, cancer is a genetic disease caused by changes in the genes that control the functioning of human body cells. Such genetic variations may affect human organs and can be passed on to future generations. DNA structure may also be damaged by environmental exposures such as UV radiation and smoking, and these and other factors are significant in the development of breast cancer [1]. Moreover, 60% of women affected by breast cancer are diagnosed at the last stage, which often leads to death.

The main contribution of the proposed method is to segment the abnormal regions of cancerous cells in the breast image. The novel idea in this work is a hybrid technique for breast cancer detection, together with a multivariate analysis that is performed to improve the prediction rate of the proposed system.

Research on breast cancer has increased over the past decade; breast cancer develops in women when abnormal, uncontrolled growth in breast cell tissue becomes serious [2]. Its types include angiosarcoma, ductal carcinoma in situ (DCIS), and lobular carcinoma in situ (LCIS). As a result, it is critical to track the number of deaths caused by breast cancer before treatment. Figure 1 (a) and (b) show example cancerous and non-cancerous images. Medical imaging is a non-invasive method of examining the inside of the human body that can help doctors detect and treat breast cancer at an early stage.

FIGURE 1. (a) Cancerous and (b) non-cancerous breast images [3].

Breast cancer is more controllable when it is detected at an early stage. Microcalcifications and masses are the most common abnormalities associated with breast cancer; they occur in the connective tissue and epithelium of the breast region [4]. Breast tumors vary in size and shape and are classified as benign or malignant depending on their severity. Benign breast lumps are non-aggressive and non-cancerous, but they can expand and press on adjacent organs, causing additional complications [5]. Malignant breast tumors are aggressive and cancerous and must be treated as soon as possible to avoid mortality. Benign masses are oval or circular with confined, smooth borders, whereas malignant tumors are irregular in shape and appear as lumps with fuzzy, rough, or ill-defined margins. Furthermore, a cancerous tumor appears whiter than the surrounding tissue. The challenges and benefits of previous breast tumor classification and detection methods have led to the development of automatic techniques that assist professional radiologists in achieving better interpretation and accuracy.

A diagnostic mammogram is typically pre-processed to remove the pectoral muscle before the detection process. Removing the pectoral muscle and background areas from a mammographic image allows the breast profile to be segmented accurately [6]. Cancerous tissues whose pixel intensities are higher than those of the surrounding breast area are detected more easily. In dense breasts, however, the intensities of normal tissue are similar to those of cancerous areas, which makes tumor regions difficult to separate. Manual inspection by radiologists can fail because microcalcifications and breast masses look similar. Finding the tumor mass by segmenting the region of interest is therefore a challenging research task [7]. As a result, early detection technologies combined with automated systems are needed to help radiologists diagnose breast tumors accurately.

Screening methods for breast cancer include clinical and self-breast examination, magnetic resonance imaging (MRI), mammography, and ultrasound. Mammography is an efficient and reliable radiographic procedure for detecting breast masses [8]. During screening, images of the breast are acquired from various angles. High-quality, high-resolution images are used in subsequent image processing steps, including feature extraction and segmentation. Early identification of breast cancer, which helps reduce the death rate, is therefore the focus of this research [9]. The proposed research uses a hybrid K-means and GMM machine learning model to increase classification accuracy, reduce the error rate, and achieve a high signal-to-noise ratio.

This study is organized as follows. Section II reviews related work on breast cancer classification and detection. Section III presents the materials and methods used in the proposed work. Section IV discusses the experimental results in detail. The final section concludes by summarizing the novelty of this research.

SECTION II.

Related Works

An existing technique in the literature presents a computer-aided detection (CAD) method based on feature extraction and classification with machine learning (ML) models, which helps radiologists identify breast tumor lesions in X-ray images. The first stage uses a pre-trained deep convolutional neural network (DCNN), and deep features are extracted in the second stage [10]. These features are then fed to a support vector machine (SVM) classifier with various kernel functions. The third stage applies deep feature fusion, which increases the accuracy of the SVM classifier compared to other methods.

Various computer-aided detection approaches for breast cancer based on ML techniques have been investigated [11]. Their inputs are histopathological images, which contain a variety of visual patterns that make it difficult to extract quality features for recognizing cancer. The authors investigated various pre-trained CNNs for extracting features from histopathology images taken from the publicly available BreakHis dataset [12].

Several approaches emphasize feature extraction, histopathological imaging, and segmentation. One such method combines pre-processing, adaptive learning based on a Gaussian mixture model, and region-of-interest localization based on connected-component analysis, followed by feature extraction. This approach operates in conjunction with an SVM to detect breast cancer [13].

Full-field digital mammography (FFDM) is widely used to screen for breast cancer [14]. Contrast-enhanced digital mammography is an emerging technology that provides low-energy images comparable to FFDM together with recombined images that highlight cancer neo-angiogenesis, similar to breast MRI.

Advanced artificial intelligence (AI) techniques and natural-image classification methods have been investigated for breast image classification tasks. The authors compared the performance of neural network (NN), support vector machine (SVM), Bayesian, and random forest (RF) algorithms for breast image classification [15].

Advanced soft computing techniques are used to pre-process the images and achieve the best classification. In a hybrid approach combining photoacoustic images with machine learning, the area under the curve, specificity, and sensitivity of an SVM were compared, indicating its potential for a significant impact on diagnostics [16].

A novel classification technique is based on the fuzzy Gaussian mixture model (FGMM), which combines the strengths of a fuzzy logic system and a Gaussian mixture model in a CAD method. This approach is used to distinguish between normal and malignant mammography images [17]. Performance metrics derived from the confusion matrix showed that the FGMM improved diagnostic accuracy and reliability in breast cancer diagnosis.

Breast cancer can be detected earlier using mammography. One model segments mammograms using an enhanced thresholding technique [18], and the final segmented image derived from the original makes breast cancer easier to identify. In general, enhanced segmentation is widely used in biomedical imaging for better detection, feature extraction, and visualization, which improves diagnostic accuracy.

Fuzzy multi-layer support vector machine (FMSVM) classification was used to classify the extracted features, and their effects were determined [19]. This method uses a combined image set taken from the publicly available mini-MIAS database [20] and demonstrates effective detection of benign, normal, and malignant tumors. Another method detects the tumor area and determines where the tumor is mainly concentrated [21], focusing on identifying the best algorithms for locating tumors in the breast. The most effective strategy identified for tumor diagnosis is a hybrid combination of K-means, dilation, and Canny edge detection.

An automated breast segmentation process that uses a morphological watershed transform to find the hottest region in thermograms helps experts locate tumors using infrared thermography [22]. A key consideration in thermogram assessment is the time required to achieve thermal stabilization. Automated image analysis for distinguishing low and intermediate breast cancer grades in digitized histopathology has also been examined [23]. Multiple levels of features, namely pixel-level, object-level, and semantic-level features, were extracted from hematoxylin and eosin (H&E)-stained breast biopsy tissue from 106 patients. In that study, a hybrid active segmentation method was used to segment nuclei in the images. A cascaded approach was used to construct multiple SVM classifiers for abnormal mammogram classes [30].

A segmentation model based on various machine learning approaches is presented in [26]. This model was trained using standard backpropagation to improve the neural network's convergence rate and segmentation. Typical segmentation techniques for breast cancer are discussed within an advanced soft computing paradigm in [27]. Pixel-level classification and segmentation are used effectively to detect abnormalities in mammograms, and such models, trained with advanced machine learning approaches, achieve better accuracy [28]. The work in [29] addresses gaps in existing soft computing strategies, covering the numerous tools and datasets they employ. Breast cancer can also be diagnosed at an earlier stage from histological images, where hyper-parameter tuning was used to improve the efficiency of the trained model [31].

A residual neural network model for breast cancer segmentation was fine-tuned for different magnification factors, and its classification accuracy was calculated [32], [33]. Diagnostic tools are used to detect abnormalities in the breast using breast ultrasound (BUS) imaging, where three classifiers, K-nearest neighbors (KNN), random forest, and decision tree, are employed to increase classification accuracy [34], [35]. Other approaches use SVM and decision tree classifiers to categorize malignant and non-malignant cases [36], [37].

SECTION III.

Materials and Methods

This study presents a hybrid segmentation model that combines K-means and a Gaussian mixture model (GMM) to distinguish cancerous from non-cancerous breasts. An adaptive median filter is applied during image pre-processing, before K-means and the GMM are used for classification. Cancer is the uncontrolled accumulation of cells at a specific body location, and breast cancer is the second most common cause of death in women worldwide. The condition can be treated when it is correctly recognized in its early stages. Several studies have been performed to detect cancer; however, no fully accurate technique has been developed to date. Hence, a novel approach is used to identify tumor regions accurately. The proposed model is used to visually detect tumors and determine their location. This work mainly focuses on the detection of tumors in the breast and segments benign and malignant images using the K-means and GMM algorithms.

Digital mammographic images of normal, benign, and malignant cases were obtained from the source [20]. Pre-processing improves image quality for further processing by reducing or removing surplus or unrelated elements in the mammogram background.

A. Dataset and Data Preparation

The Mammographic Image Analysis Society (MIAS) is a consortium of UK research groups working to better understand mammograms, which has produced a digital mammography database [20]. It consists of normal and abnormal breast images. The database contains 322 open-access digitized films and is available on a 2.3 GB 8 mm (ExaByte) tape. It includes radiologists' "truth" markings on the locations of any abnormalities. The images were reduced to a 200-micron pixel edge and padded/clipped so that every image is 1024 × 1024 pixels. The dataset is publicly accessible, and the mammography images can be obtained from [20].

Pre-processing is a central problem in low-level image processing. It enhances the intensity contrast between the background and objects, resulting in more accurate representations of breast tissue structure. During digitization, screen-film mammograms (SFM) are not always accurately positioned in the scanner, so the breast region must be separated from background objects such as artifacts, scanning labels, and breast-position markers. The image is smoothed and segmented by eliminating the uneven background of the breast tissue; removing the boundary and background in this way allows the breast region to be extracted accurately. Pre-processing is therefore essential to improve image quality, and it also prepares the mammogram for the subsequent segmentation and feature-extraction stages. High-frequency components and noise were removed using an adaptive median filter.

The adaptive median filter operates on a rectangular window R_xy centered at pixel (x, y) and varies the size of R_xy during filtering using the quantities defined below. Filtering starts with the 3 × 3 neighborhood around each pixel, and each output pixel is derived from the median of that neighborhood; pixels beyond the image edges are treated as zeros. The filter output is a single value that replaces the current pixel value at (x, y), the point on which R_xy is centered at a given time. The notation used is:

  • S_min = minimum pixel value in R_xy

  • S_max = maximum pixel value in R_xy

  • S_med = median pixel value in R_xy

  • S_xy = value of the pixel at coordinates (x, y)

  • R_max = maximum allowed size of R_xy

Adaptive median filtering thus removes impulse noise and smooths other (non-impulse) noise in 2-D signals without blurring edges, preserving image detail. The pre-processing model handles orientation, labeling, artifact removal, enhancement, and segmentation of the mammogram. It is also used to create masks of high-intensity pixels at reduced resolution to segment the breast. Whereas a standard median filter tends to make the entire image fuzzier, the adaptive filter keeps object boundaries crisp, fine, and straight.
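The filtering rule can be sketched as follows. This is a minimal sketch assuming the standard two-stage adaptive median filter described by Gonzalez and Woods: the window R_xy grows from 3 × 3 up to R_max while its median S_med looks like an impulse (stage A), and the center pixel S_xy is then kept only if it is not itself an impulse (stage B). The function name, the default R_max of 7, and the choice to truncate windows at the image border instead of zero-padding are assumptions for illustration, not the authors' MATLAB implementation.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]


def adaptive_median_filter(img: Image, r_max: int = 7) -> Image:
    """Two-stage adaptive median filter; r_max is the largest (odd) window side R_max."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]          # never modify the input image
    for x in range(h):
        for y in range(w):
            size = 3                        # start with a 3 x 3 window R_xy
            while True:
                half = size // 2
                # Window truncated at the borders (assumption: no zero padding)
                window = [img[i][j]
                          for i in range(max(0, x - half), min(h, x + half + 1))
                          for j in range(max(0, y - half), min(w, y + half + 1))]
                s_min, s_max, s_med = min(window), max(window), median_low(window)
                s_xy = img[x][y]
                if s_min < s_med < s_max:
                    # Stage B: the median is not an impulse; keep S_xy unless it is one
                    out[x][y] = s_xy if s_min < s_xy < s_max else s_med
                    break
                size += 2                   # Stage A failed: enlarge R_xy
                if size > r_max:
                    out[x][y] = s_med       # R_max reached: output the median
                    break
    return out
```

For instance, a single 255-valued salt pixel placed in a smooth intensity ramp is replaced by the median of its 3 × 3 neighborhood. `median_low` is used so that even-sized border windows still return an existing integer intensity.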

Pre-processing

Pre-processing was performed using an adaptive median filter. This is the most significant step in medical image processing for detecting breast cancer in mammography images, and its noise-free output was used for classification. Figure 2 shows example normal, benign, and malignant input images that are considered for further processing. The boundaries between microcalcifications and breast tissue were enhanced in the initial view of the images. The adaptive median filter restores grayscale images better and reduces the noise level more than other multilevel median filter types.

FIGURE 2. (a) Normal, (b) benign, and (c) malignant images [20].

B. Proposed Model and Algorithm

The proposed model consists of an input breast database, image pre-processing, background elimination, filtering, and segmentation, as shown in Figure 3. Mammography images are extracted from the publicly accessible Mammographic Image Analysis Society (MIAS) dataset. Low-level image processing is used in pre-processing to increase the contrast level, which enhances the intensity difference between the background and the breast tissue and yields a reliable representation of the tissue. Background elimination creates a foreground mask to separate a component from the background; this method is used to detect objects in static images. An adaptive median filter is used to remove impulse noise and speckle from the images. In the proposed hybrid approach, the cluster labels produced by K-means and GMM are used to partition the regions or seed points into sub-regions.
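The text does not say how the foreground mask is computed. As one possible sketch, assuming a global Otsu threshold on 8-bit intensities (a swapped-in technique, not necessarily the authors' choice), background pixels can be separated from breast tissue as follows; `otsu_threshold` and `foreground_mask` are hypothetical helper names.

```python
from typing import List


def otsu_threshold(pixels: List[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_b = sum_b = 0                          # background pixel count and intensity sum
    for t in range(levels):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b                    # foreground pixel count
        if w_b == 0 or w_f == 0:
            continue
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def foreground_mask(img: List[List[int]]) -> List[List[int]]:
    """1 marks foreground (breast tissue), 0 marks background."""
    t = otsu_threshold([p for row in img for p in row])
    return [[1 if p > t else 0 for p in row] for row in img]
```

Pixels at or below the threshold are treated as background. In practice the mask would typically be cleaned further (for example, keeping only the largest connected region) before it is applied.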

FIGURE 3. Proposed model for breast cancer segmentation.

K-means sets the number of clusters and the initial mean values. The Euclidean distance between each instance and each cluster center is computed, and the instance is assigned to the cluster with the minimum distance, so that every image point is labeled with a cluster. Once all instances have been assigned, each center is updated by averaging the points that belong to its cluster. The algorithm terminates when the assignments settle, that is, when no instance moves from one cluster to another.

GMM is a versatile segmentation approach that allows a component distribution to be selected, estimates the density of each group, and produces soft cluster boundaries. GMM uses the expectation-maximization (EM) algorithm to compute its parameters. EM is an iterative procedure for maximum-likelihood estimation when the observed data are considered incomplete. Each EM iteration consists of two main steps: the E-step (expectation) and the M-step (maximization). In the E-step, the observed data and the current estimates of the model parameters are used to estimate the missing data as their conditional expectation. In the M-step, the likelihood function is maximized under the assumption that the missing data are known. Each EM iteration is guaranteed not to decrease the likelihood, which ensures convergence to a local maximum.

GMM fitting maximizes the likelihood with respect to the model parameters, namely the means, variances, and mixing coefficients, and these parameters are estimated using the EM algorithm. In the initial stage, the number of classes and the initial means, variances, and mixing coefficients are set. In the expectation step, the posterior probabilities are computed from the current parameter values using (1):
\begin{equation*} \gamma_{m}(x) = \frac{\pi_{m}\, G(x \mid \mu_{m}, \sigma_{m})}{\sum_{n=1}^{N} \pi_{n}\, G(x \mid \mu_{n}, \sigma_{n})} \tag{1}\end{equation*}

where G(x | μ, σ) denotes a Gaussian component density with mean μ and variance σ, and N is the number of mixture components. In the maximization step, the mean, variance, and mixing coefficient of each component are recomputed from the current posterior probabilities using (2), (3), and (4), respectively, where the sums run over the P pixels x_k:
\begin{align*} \text{Mean}~\mu_{m} =& \frac{\sum_{k=1}^{P} \gamma_{m}(x_{k})\, x_{k}}{\sum_{k=1}^{P} \gamma_{m}(x_{k})} \tag{2}\\ \text{Variance}~\sigma_{m} =& \frac{\sum_{k=1}^{P} \gamma_{m}(x_{k})\,(x_{k}-\mu_{m})(x_{k}-\mu_{m})^{T}}{\sum_{k=1}^{P} \gamma_{m}(x_{k})} \tag{3}\\ \text{Mixing coefficient}~\pi_{m} =& \frac{1}{P}\sum_{k=1}^{P} \gamma_{m}(x_{k}) \tag{4}\end{align*}

The log-likelihood is evaluated by (5):
\begin{equation*} \ln L(X \mid \mu,\sigma,\pi) = \sum_{k=1}^{P} \ln \sum_{n=1}^{N} \pi_{n}\, G(x_{k} \mid \mu_{n}, \sigma_{n}) \tag{5}\end{equation*}

where X = \{x_{1}, \ldots, x_{P}\} is the set of P pixel intensities.
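Equations (1) to (5) can be illustrated for the 1-D case of grayscale pixel intensities. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it treats each σ as a scalar variance, applies a small variance floor (an added safeguard, not in the equations), and evaluates densities directly, so a production version would work in log space to avoid underflow. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Params = Tuple[List[float], List[float], List[float]]   # (pis, mus, variances)


def gaussian(x: float, mu: float, var: float) -> float:
    """Univariate Gaussian density G(x | mu, sigma), with sigma taken as a variance."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def e_step(xs: List[float], pis: List[float], mus: List[float],
           vars_: List[float]) -> List[List[float]]:
    """Eq. (1): responsibility gamma_m(x_k) of every component m for every pixel x_k."""
    gammas = []
    for x in xs:
        weighted = [p * gaussian(x, m, v) for p, m, v in zip(pis, mus, vars_)]
        total = sum(weighted)
        gammas.append([w / total for w in weighted])
    return gammas


def m_step(xs: List[float], gammas: List[List[float]], var_floor: float = 1e-6) -> Params:
    """Eqs. (2)-(4): re-estimate means, variances, and mixing coefficients."""
    pis, mus, vars_ = [], [], []
    for m in range(len(gammas[0])):
        resp = [g[m] for g in gammas]
        n_m = sum(resp)                                                  # effective pixel count
        mu = sum(r * x for r, x in zip(resp, xs)) / n_m                  # Eq. (2)
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / n_m     # Eq. (3)
        pis.append(n_m / len(xs))                                        # Eq. (4)
        mus.append(mu)
        vars_.append(max(var, var_floor))                                # guard against collapse
    return pis, mus, vars_


def log_likelihood(xs: List[float], pis: List[float], mus: List[float],
                   vars_: List[float]) -> float:
    """Eq. (5): sum over pixels of the log of the mixture density."""
    return sum(math.log(sum(p * gaussian(x, m, v) for p, m, v in zip(pis, mus, vars_)))
               for x in xs)
```

Calling `e_step` followed by `m_step` repeatedly while recording `log_likelihood` should give a non-decreasing sequence of values, which is the convergence property of EM described above.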

Based on the density estimate, the number of clusters k in the GMM segmentation model is computed automatically for each image using a thresholding technique. After the GMM parameters have been estimated with EM, each mammogram is segmented into k cluster regions, with every pixel assigned to one cluster. As a result, the image is segmented into benign, normal, and malignant tissue classes using K-means and GMM. Finally, the accuracy of the segmentation method is expressed as a percentage, as in (6):
\begin{equation*} \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \tag{6}\end{equation*}

where TP, TN, FN, and FP are the numbers of true positives, true negatives, false negatives, and false positives, respectively. Equation (6) thus gives the percentage of correctly classified cases. The error rate between two monochrome images of size n × m is calculated using (7):
\begin{equation*} \text{Error Rate} = \frac{1}{nm}\sum_{a=0}^{n-1}\sum_{b=0}^{m-1}\left[K(a,b) - I(a,b)\right]^{2} \tag{7}\end{equation*}
where K and I are the two images, one of which is a noisy approximation of the other. The signal-to-noise ratio (SNR) is the ratio of signal power to noise power, expressed in decibels as in (8):
\begin{equation*} \text{SNR}_{\text{decibel}} = 10\log_{10}\left(\frac{R_{\text{signal}}}{R_{\text{noise}}}\right) \tag{8}\end{equation*}

where R_signal and R_noise denote the signal power and the noise power.
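Equations (6) to (8) translate directly into small helpers, sketched below with hypothetical function names. Note that (7), as written, is a mean squared difference between the two images.

```python
import math
from typing import List


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Eq. (6): percentage of correctly classified cases."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


def error_rate(k: List[List[float]], i: List[List[float]]) -> float:
    """Eq. (7): mean squared difference between two n x m monochrome images."""
    n, m = len(k), len(k[0])
    return sum((k[a][b] - i[a][b]) ** 2 for a in range(n) for b in range(m)) / (n * m)


def snr_db(signal_power: float, noise_power: float) -> float:
    """Eq. (8): signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)
```

For example, 45 true positives, 45 true negatives, 5 false positives, and 5 false negatives give an accuracy of 90%, and a signal 100 times stronger than the noise gives an SNR of 20 dB.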

An SNR greater than 1:1 (i.e., above 0 dB) indicates that the signal is stronger than the noise. The steps of the K-means and GMM algorithms are as follows.

K-Means Algorithm

Input:

Normal, Benign, and Malignant image

Output:

Segmented image

Start

Step 1

Set the number of clusters k to assign in the given data.

Step 2

Randomly select k initial centroids from the data.

Step 3

Repeat

Step 4

Expectation: Assign each point to its closest centroid.

Step 5

Maximization: Recompute each centroid as the mean of the points assigned to its cluster.

Step 6

Until the centroid positions do not change

End
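A runnable version of the K-means listing, applied to 1-D pixel intensities, might look like the following. This is a sketch assuming grayscale intensities as the only clustering feature (so the Euclidean distance reduces to an absolute difference) and a fixed random seed for reproducibility; `kmeans_1d` is a hypothetical name. The stopping test compares successive label assignments, which is equivalent to checking that the centroids no longer move.

```python
import random
from typing import List, Tuple


def kmeans_1d(pixels: List[float], k: int, max_iter: int = 100,
              seed: int = 0) -> Tuple[List[float], List[int]]:
    """Cluster 1-D intensities into k groups; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct pixel values at random as initial centroids
    centroids = rng.sample(sorted(set(pixels)), k)
    labels: List[int] = []
    for _ in range(max_iter):                        # Step 3: repeat
        # Step 4 (expectation): assign each point to its closest centroid
        new_labels = [min(range(k), key=lambda c: abs(p - centroids[c]))
                      for p in pixels]
        if new_labels == labels:                     # Step 6: nothing moved, so stop
            break
        labels = new_labels
        # Step 5 (maximization): move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:                              # an empty cluster keeps its centroid
                centroids[c] = sum(members) / len(members)
    return centroids, labels
```

On an image, the pixel list would be the flattened intensities, and the returned labels can be reshaped back to the image grid to form the cluster map.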

GMM Algorithm

Input:

Normal, Benign, and Malignant image

Output:

Segmented image

Start

Step. 1

Initialize j Gaussian classes with random \mu and \sigma.

Step. 2

Calculate posterior probability of each pixel for each class

Step. 3

Assign pixel to class with highest probability

Step. 4

Update \mu and \sigma for each class

Step. 5

Evaluate the log-likelihood of the current estimates

Step. 6

Repeat Steps 2 to 5 until the log-likelihood converges.

End
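Step 3 of the GMM listing (assigning each pixel to the class with the highest posterior probability) can be sketched as follows, assuming the class parameters have already been estimated and each σ is a scalar variance. The shared denominator of (1) does not change which class wins, so the sketch compares unnormalized log-posteriors, which also avoids numerical underflow; the function names are hypothetical.

```python
import math
from typing import List


def log_weighted_density(x: float, pi: float, mu: float, var: float) -> float:
    """log(pi * G(x | mu, sigma)) for a univariate Gaussian with variance var."""
    return math.log(pi) - 0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def assign_to_classes(pixels: List[float], pis: List[float], mus: List[float],
                      vars_: List[float]) -> List[int]:
    """Step 3: label each pixel with the class of highest posterior probability."""
    labels = []
    for x in pixels:
        # Unnormalized log-posterior of each class; the denominator of Eq. (1)
        # is the same for every class, so it cannot change the argmax.
        scores = [log_weighted_density(x, p, m, v) for p, m, v in zip(pis, mus, vars_)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


def segment_image(img: List[List[float]], pis: List[float], mus: List[float],
                  vars_: List[float]) -> List[List[int]]:
    """Apply the per-pixel assignment row by row to obtain a label map."""
    return [assign_to_classes(row, pis, mus, vars_) for row in img]
```

Because the mixing coefficients enter the score, a pixel lying exactly between two equally wide classes is assigned to the class with the larger prior weight.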

The proposed K-means and GMM models detect breast tumors and segment the images into benign, normal, and malignant categories. Greater accuracy was obtained with a lower error rate. The pseudo-code for the proposed method is as follows:

Proposed Algorithm

Input:

Normal, Benign, and Malignant image

Output:

Segmented image

Start

Step. 1

Selecting a mammographic image from the image collection database

Step. 2

The pre-processing technique is applied to improve image quality

Step. 3

Eliminating breast region boundary and uneven background

Step. 4

Removal of noise and high frequency through an adaptive median filter

Step. 5

Segment the data into k clusters using K-means and GMM

Step. 6

Perform the expectation step using Eqn. (1)

Step. 7

Calculate mean, variance, and mixing coefficient in maximization step using Eqn. (2), (3), and (4)

Step. 8

Evaluate the log-likelihood in the GMM model using Eqn. (5)

Step. 9

Estimate the accuracy values using Eqn. (6)

Step. 10

Classify the segmented image as normal, benign, or malignant

End
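The listing does not spell out how the K-means result feeds the GMM. A common way to hybridize the two, assumed here (and the default behavior of scikit-learn's `GaussianMixture`, whose `init_params` defaults to `'kmeans'`), is to turn the hard K-means partition into the initial GMM parameters before running EM. The helper below is a hypothetical sketch of that hand-off for 1-D intensities.

```python
from typing import List, Tuple


def gmm_init_from_kmeans(pixels: List[float], labels: List[int], k: int,
                         var_floor: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Turn a hard K-means partition into initial GMM parameters (pi, mu, sigma)."""
    pis, mus, vars_ = [], [], []
    for c in range(k):
        members = [p for p, lab in zip(pixels, labels) if lab == c]
        if not members:
            raise ValueError(f"cluster {c} is empty")
        mu = sum(members) / len(members)                          # cluster mean
        var = sum((p - mu) ** 2 for p in members) / len(members)  # cluster variance
        pis.append(len(members) / len(pixels))                    # cluster share of pixels
        mus.append(mu)
        vars_.append(max(var, var_floor))                         # avoid zero variance
    return pis, mus, vars_
```

Starting EM from these values means the first expectation step already works with a reasonable partition, which is one plausible reason why a K-means initializer speeds up GMM convergence.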

The hyper-parameters of the K-means, GMM, and hybrid methods are discussed in Section IV. Using this algorithm, the model was trained on all data in the breast image repository, and cross-validation was used to evaluate the proposed model and identify the best-performing breast cancer model.

SECTION IV.

Experiment, Results and Discussion

A. MIAS Dataset

Initially, the input data were imported from a breast data repository [20]. The original 322 images (161 pairs) at 50-micron resolution in “Portable Gray Map” (PGM) format and accompanying truth data description are included in the Mammographic Image Analysis Society (MIAS) dataset of digital mammograms (v1.21), as shown in Table 1.

TABLE 1. Dataset Descriptions
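Since the MIAS images are distributed in PGM format, a minimal reader is enough to load them without an imaging library. The sketch below assumes 8-bit binary (P5) files and handles header comments; `read_pgm` is a hypothetical name, and libraries such as Pillow can also read PGM files.

```python
from typing import List


def read_pgm(data: bytes) -> List[List[int]]:
    """Parse an 8-bit binary (P5) PGM image into rows of pixel values."""
    tokens: List[bytes] = []
    pos = 0
    while len(tokens) < 4:                  # magic number, width, height, maxval
        if pos >= len(data):
            raise ValueError("truncated PGM header")
        ch = data[pos:pos + 1]
        if ch.isspace():
            pos += 1
        elif ch == b"#":                    # comments run to the end of the line
            while pos < len(data) and data[pos:pos + 1] not in (b"\n", b"\r"):
                pos += 1
        else:
            start = pos
            while pos < len(data) and not data[pos:pos + 1].isspace():
                pos += 1
            tokens.append(data[start:pos])
    pos += 1                                # one whitespace byte precedes the raster
    magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P5" or maxval > 255:
        raise ValueError("only 8-bit binary PGM (P5) is supported")
    raster = data[pos:pos + width * height]
    if len(raster) < width * height:
        raise ValueError("truncated PGM raster")
    return [list(raster[r * width:(r + 1) * width]) for r in range(height)]
```

For a 1024 × 1024 image, `read_pgm(f.read())` on a file opened in binary mode would return 1024 rows of 1024 intensities each.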

The Digital Database for Screening Mammography (DDSM) was obtained from the University of South Florida. In image pre-processing, artifacts are a limitation of some images because of the marking of additional lesion spots. In addition, MIAS datasets were used to increase the size of the data collection for further processing. Pre-processing and classification techniques were used to evaluate the accuracy of the proposed method on 322 images (64 benign, 51 malignant, and 207 normal breast images).

Subsequently, the images must be pre-processed to increase the intensity difference between the background and objects and to produce reliable representations of breast tissue structure. An adaptive median filter was then used to eliminate noise and high frequencies. Finally, hybrid K-means and GMM models were applied to segment the clusters using different sets of parameters.

Input images are classified into three types, namely normal, benign, and malignant, and include physicians' markings of the abnormality location. The database includes four types of abnormalities: suspicious lesions, architectural distortions, circumscribed calcifications, and masses. The proposed method was evaluated on the mammography image collection, and the results for each class are presented separately. The image set was divided into classes based on size.

B. Segmentation

Mammographic images of three different cases, namely normal, benign, and malignant, were segmented using an implementation in MATLAB R2019a. When a mammographic image contains microcalcifications, the proposed method produces binary outcomes that indicate whether the tissue is benign, normal, or malignant. The experiments were run on an Intel® Core™ i5-8265U processor (up to 3.9 GHz) with 8 GB of DDR4 memory, running the 64-bit Windows® 10 operating system.

Figures 4, 5, and 6 depict the segmentation of normal, benign, and malignant tissue in mammogram images. Each figure illustrates the methodology step by step through its essential stages, such as removal of the pectoral muscle, filtering, and segmentation.

FIGURE 4. Normal image: segmentation process flow using the hybrid segmentation model.

FIGURE 5. Benign image: segmentation process flow using the hybrid segmentation model.

FIGURE 6. Malignant image: segmentation process flow using the hybrid segmentation model.

C. Comparative Analysis

An extensive analysis of the proposed segmentation model was performed by comparing the hybrid model with three other methods: GMM, K-means, and thresholding. Figure 7 plots the true-positive rate (TPR) against the false-positive rate (FPR).

FIGURE 7. Performance comparison: TPR vs. FPR.

Figure 8 shows that K-means is slower than GMM with a K-means initializer. The hybrid GMM and K-means algorithm converges after the third epoch. The EM procedure reaches a local maximum by the 10th iteration, at which point the optimized K-means and GMM have both converged. GMM consumes less computation time than the other existing techniques; this can occur when it settles at a local optimum that is not close to the global optimum.

FIGURE 8. Computation time (s) for different numbers of iterations.

In k-fold cross-validation, k is replaced by the chosen number of folds; for example, k = 10 gives 10-fold cross-validation. This method is commonly used in applied machine learning to estimate how well a model performs on unseen data. The region of interest (ROI) images were subjected to a 10-fold cross-validation procedure. With 322 ROI images, the dataset was partitioned into 70% training and 30% testing.
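One reading of this setup, assumed here, is that 30% of the 322 images are held out for testing and 10-fold cross-validation is run on the remaining 70%. The index-splitting sketch below uses hypothetical helper names and a fixed seed; it shows only how the partitions are formed, not the model training.

```python
import random
from typing import List, Tuple


def train_test_split(n: int, test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and hold out a test fraction; returns (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices: List[int], k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices into k folds; each fold serves once as the validation set."""
    folds = [indices[f::k] for f in range(k)]
    return [([i for g, fold in enumerate(folds) if g != f for i in fold], folds[f])
            for f in range(k)]
```

With n = 322, this yields 225 training and 97 test indices, and each of the 10 folds then validates on 22 or 23 of the training images.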

The learning rate $\epsilon$ was selected by cross-validation together with the other hyper-parameters, giving $\epsilon = 0.001$. We observed that initializing the cut-off value $D_{\max}$ with high precision and initializing $\tau_{i}$ uniformly is advantageous. The centroids were set to random initial values. Larger values cause $\sigma(t)$ to decline faster, which may impair convergence; smaller values are always acceptable but take longer to converge.

Various segmentation approaches were compared with the proposed method to validate the performance measures. K-means and GMM achieved accuracies of 93.8% and 65%, respectively, with high error rates of 29.47% and 24.35% and low SNR. Thresholding had 86% accuracy and error rates of 32.58% and 10.17%, respectively. Three SVM variants with different kernel functions achieved accuracies of 56.93%, 72.28%, and 84.33%. Region growing with hand selection and FCM-GA selection achieved accuracies of 63% and 71%, respectively. The proposed hybrid model (K-means and GMM) achieved a better accuracy of 95.50%, a low error rate of 18.64%, and a high SNR of 13.05. Table 2 compares the classification accuracy, error rate, and SNR for benign, malignant, and normal images after 10 epochs; the average execution time was 0.068 s.

TABLE 2 Comparative Analysis of Proposed Model With Existing Techniques

Hybrid K-means and GMM segmentation can assist physicians in making early diagnoses by improving the qualitative identification of breast cancer in mammography images. As Table 2 shows, the proposed hybrid model achieves a segmentation classification accuracy of 95.5%, an error rate of 18.64%, and a signal-to-noise ratio of 13.05, which makes it more reliable than the existing techniques. Furthermore, the proposed technique reduces the error rate.
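The text reports error-rate and SNR values but does not give the formulas behind them. As a hedged illustration only, the sketch below uses one common definition of SNR in decibels, 10 log10 of the reference signal power divided by the power of the difference between the reference and processed images, together with pixel-wise classification accuracy. Both definitions, the function names `snr_db` and `pixel_accuracy`, and the toy pixel values are assumptions, not the authors' formulas.

```python
import math


def snr_db(reference, processed):
    """SNR in decibels: 10 * log10(signal power / residual-noise power).

    reference and processed are equally long sequences of pixel intensities.
    """
    signal = sum(r * r for r in reference)
    noise = sum((r - p) ** 2 for r, p in zip(reference, processed))
    if noise == 0:
        return math.inf  # identical images: no residual noise at all
    return 10 * math.log10(signal / noise)


def pixel_accuracy(predicted_labels, true_labels):
    """Fraction of pixels whose predicted class matches the ground truth."""
    matches = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return matches / len(true_labels)


reference = [100, 100, 100, 100]   # toy pixel intensities
processed = [110, 90, 100, 100]    # toy output with two perturbed pixels
print(round(snr_db(reference, processed), 2))        # -> 23.01
print(pixel_accuracy([0, 1, 1, 0], [0, 1, 0, 0]))    # -> 0.75
```

Under this definition a higher SNR means the processed image stays closer to the reference; other SNR conventions (for example, mean over standard deviation within a region) give different numbers, so values are only comparable under the same definition.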

These results demonstrate the efficacy of the proposed method for diagnosing breast cancer and its reliability in distinguishing malignant from benign tumors. With this method, medical experts can identify breast cancer faster and with greater precision.

The results are analyzed in detail with multiple metrics: accuracy, error rate, and signal-to-noise ratio. Analysis of variance (ANOVA) is a statistical method for determining whether the means of two or more groups differ significantly from one another. It tests the impact of one or more factors by comparing the means of different samples; the results are shown in Table 3. The standard deviation of a sample is computed as

\begin{equation*}\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}}\tag{9}\end{equation*}

where $\sigma$ = standard deviation,

  • $x_{i}$ = the $i$-th sample value

  • $\bar{x}$ = arithmetic mean of the data

  • $N$ = number of samples

  • $\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}$ = sum of the squared deviations of the sample points from the mean
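Eq. (9) divides by $N$, so it is the population standard deviation rather than the sample form that divides by $N-1$. The minimal sketch below evaluates Eq. (9) directly and cross-checks it against Python's `statistics.pstdev`, which uses the same population form; the function name and the toy values are illustrative and are not the paper's data.

```python
import math
import statistics


def population_std(samples):
    """Standard deviation per Eq. (9): squared deviations averaged over N."""
    n = len(samples)
    mean = sum(samples) / n                               # x-bar
    squared_dev = sum((x - mean) ** 2 for x in samples)   # sum of (x_i - x-bar)^2
    return math.sqrt(squared_dev / n)


values = [2, 4, 4, 4, 5, 5, 7, 9]  # toy values: mean 5, squared deviations sum to 32
print(population_std(values))       # -> 2.0
print(statistics.pstdev(values))    # same population form from the standard library
```

Using `statistics.stdev` instead would divide by $N-1$ and give a slightly larger value, so the two forms should not be mixed when comparing spreads across methods.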

TABLE 3 Multi-Variant Analysis of Performance Measures With ANOVA Test

The F-ratio was 1.0638 and the p-value was < 0.0001, so the result is significant at p < 0.05.
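For readers reproducing the test, the sketch below computes the one-way ANOVA F-ratio: the between-group mean square divided by the within-group mean square. The paper does not state which ANOVA design it used, so a one-way layout is an assumption, and the toy groups are illustrative rather than the paper's data. SciPy's `scipy.stats.f_oneway` returns the same F statistic together with its p-value, which the standard library alone does not provide.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F-ratio for a list of sample groups.

    F = MS_between / MS_within, with k - 1 and N - k degrees of freedom.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # hypothetical metric values per method
print(one_way_anova_f(toy_groups))  # -> 27.0
```

An F-ratio near 1 means the between-group spread is about the same as the within-group spread; larger values indicate that the group means differ more than the within-group noise would explain.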

The ANOVA test indicated an improved prediction rate for the proposed method across the breast cancer performance metrics. The hybrid model showed improved detection of malignant breast cancer. Overall, the analysis indicates higher accuracy, a lower error rate, and a higher SNR for the proposed model.

Table 4 shows that, among the statistical tests considered, ANOVA gives the best result for the proposed work.

TABLE 4 Various Statistical Tests

SECTION V.

Conclusion and Future Work

In this research, two segmentation approaches, K-means and the Gaussian mixture model (GMM), are combined to segment three categories of breast images: normal, benign, and malignant. The results show that the hybrid approach outperforms other existing techniques, with an accuracy of 95.5%, an error rate of 18.64%, and a signal-to-noise ratio of 13.05. The ANOVA test, which evaluates the impact of one or more factors by comparing the means, variances, and standard deviations of different samples, shows a high prediction rate for the hybrid segmentation technique in breast cancer detection.

The hybrid GMM and K-means model is a novel method for detecting breast cancer with good accuracy. First, the breast images from the data repository were preprocessed; removing speckle noise and special markings from the images improves segmentation quality. The results show that the hybrid GMM and K-means model performs better than existing techniques. Future work will aim at better precision, with particular emphasis on the segmentation models. This intelligent healthcare model has the potential to make a significant contribution to medicine, especially to the early detection of breast cancer in women.

ACKNOWLEDGMENT

The authors would like to thank their universities for supporting this study.
