Feature Extraction of White Blood Cells Using CMYK-Moment Localization and Deep Learning in Acute Myeloid Leukemia Blood Smear Microscopic Images

Artificial intelligence has revolutionized medical diagnosis, particularly for cancers. Acute myeloid leukemia (AML) diagnosis is a tedious protocol that is prone to human and machine errors. In several instances, it is difficult to make an accurate final decision even after careful examination by an experienced pathologist. However, computer-aided diagnosis (CAD) can help reduce the errors and time associated with AML diagnosis. White blood cell (WBC) detection is a critical step in AML diagnosis, and deep learning is considered a state-of-the-art approach for WBC detection. However, the accuracy of WBC detection is strongly associated with the quality of the extracted features used to train pixel-wise classification models. CAD relies on studying the different patterns of change in WBC counts and features. In this study, a new hybrid feature extraction method was developed using image processing and deep learning methods. The proposed method consists of two steps: 1) a region of interest (ROI) is extracted using the CMYK-moment localization method, and 2) deep learning-based features are extracted using a CNN-based feature fusion method. Several classification algorithms were used to evaluate the significance of the extracted features. The proposed feature extraction method was evaluated using an external dataset and benchmarked against other feature extraction methods. The proposed method achieved excellent performance, generalization, and stability with all classifiers, with overall classification accuracies of 97.57% and 96.41% on the primary and secondary datasets, respectively. This method offers a new alternative for improving WBC detection, which could lead to a better diagnosis of AML.


I. INTRODUCTION
Features are data descriptors that characterize data elements and are used in tasks such as classification and clustering. A comprehensive understanding of WBC features is critical for differentiating between various types and subtypes of leukemia. Current methods for WBC detection, segmentation, and classification face several challenges, whether they are performed using automatic or manual approaches [1]. Manual detection of WBCs is performed by pathologists and is subject to human error, which can produce inaccurate results. This process is tedious, time-consuming, and subject to inter- and intra-observer variability among pathologists; only 76.6% of cases showed agreement between pathologists during leukemia diagnosis [2]. Other challenges stem from the complex nature of WBCs, including irregular boundaries and textural similarities between WBCs and other blood components, which make it difficult to separate WBCs from other blood components [1], [3]. WBCs are also complex in terms of shape, texture, color, and intensity diversity [4], [5]. WBCs are heterogeneous and have different subtypes, including normal and abnormal cells. In addition, staining and illumination variations make WBC recognition more difficult [6], [7]. Moreover, current automated WBC detection methods used in laboratories focus primarily on quantitative rather than qualitative image processing and pattern recognition methods [1], [8], [9]. Therefore, new computer-aided systems for accurate WBC detection can support the development of stable and generalized learning systems. In this study, a new feature extraction method for WBC detection is proposed: a hybrid of CMYK-moment localization, based on image processing, and a CNN-based deep feature fusion method.
The proposed method can also be used to build a semantic segmentation model that helps pathologists detect and localize WBCs and thereby improves diagnostic accuracy.
The associate editor coordinating the review of this manuscript and approving it for publication was Aasia Khanum.

II. RELATED WORKS
WBC recognition is a challenging task because of the complex nature of cell images, which makes the identification of significant WBC features more difficult. Researchers have attempted to extract and identify significant WBC features to discriminate between WBCs and other blood components. WBC features can be categorized into two types: handcrafted features and deep-learning-based features. Handcrafted features are obtained using image processing techniques and are used with traditional machine-learning (ML) algorithms. Conversely, deep-learning-based features are extracted automatically by deep learning models and can be fed either to fully connected layers (part of a deep learning model) or to an external ML classifier. Many researchers have used handcrafted features for WBC recognition and segmentation and have reported good performance [10]-[16]. However, these methods exhibit marked limitations in efficiency and generalization when solving complex problems [17], [18]. Therefore, several researchers have investigated deep-learning-based features [6], [19]-[26]. Lu et al. [27] extracted and fused multiscale features using a feature encoder with residual blocks to develop a WBC segmentation system, and used a convolution-deconvolution decoder to refine the WBC segmentation mask. Their method was evaluated using four datasets containing five types of normal WBCs (neutrophils, eosinophils, basophils, monocytes, and lymphocytes) and outperformed the benchmark methods.
Roy et al. [7] proposed a WBC feature extraction method using the DeepLabv3+ architecture with a ResNet-50 feature extractor. The extracted features were then used to build a WBC segmentation system, which was evaluated on three public datasets and achieved 96.1% segmentation accuracy. Abdurrazzaq et al. [28] used a singular value decomposition approach to localize WBCs based on the similarity of features between WBCs. Their results showed improved detection of WBC nuclei and whole WBCs compared with other methods, particularly for WBCs with light color intensity, with an average segmentation accuracy of 63%. Khomairoh et al. [29] used a Haar cascade model for WBC feature extraction to extract ROIs for WBC segmentation. The model was built using a dataset of M4, M5, and M7 AML subtypes, and features were extracted using different convolution kernels, including edge, line, and four-rectangle kernels. A color-based method was then used for nucleus and cytoplasm segmentation. The overall nucleus segmentation accuracies were 87.5%, 90.4%, and 84.6% for M4, M5, and M7, respectively, whereas the cytoplasm segmentation accuracies were 75%, 71.4%, and 80.76%, respectively. Hegde et al. [21] compared handcrafted features with deep-learning-based features obtained from a CNN. The handcrafted features included shape, color, and texture, and the deep-learning-based features were extracted from the fc6, fc7, and fc8 layers of AlexNet. The two approaches were compared using a neural network (NN) classifier and achieved comparable results, with an overall accuracy of 99%. Saleem et al. [20] used feature fusion with DarkNet-53 and ShuffleNet to extract WBC features for both segmentation and classification and achieved 98.6% segmentation accuracy.
Ramya et al. [30] extracted a set of image-level and statistical features from segmented WBCs to classify them as AML or normal. The extracted features included color, shape, and gray-level co-occurrence matrix (GLCM) features. Rad et al. [31] used statistical and morphological features to develop a new object detection technique that overcomes the initial-contour problem of the level-set segmentation method. They used statistical and morphological features inside and outside the contour to generate an automatic region-based initial contour. Their method achieved an overall accuracy of 96% and performed well when evaluated on two external datasets. Puigdollers et al. [16] used a bag-of-words approach to extract local image descriptors for WBC detection. Their method achieved an overall accuracy of 80% and did not require carefully crafted features to localize WBCs, making it simpler and more generalizable.
Loddo et al. [32] extracted color and statistical features to train a multiclass classification system based on SVM and KNN. Color features were calculated pixel-wise by averaging the color values over the 3 × 3 neighborhood of each pixel. The model achieved 99% accuracy and was extended into a WBC counting system using a circular Hough transform, which achieved 99.7% accuracy [33].
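The pixel-wise color averaging described for Loddo et al. [32] can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code of [32]; the (H, W, C) array layout, the function name, and the edge-replication border handling are assumptions.

```python
import numpy as np


def neighborhood_mean_colors(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Per-pixel color features: the mean of each channel over a size x size window.

    image: (H, W, C) array. Borders are handled by edge replication (an assumption).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    r = size // 2
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Sum the size * size shifted copies of the image, then divide by the window area.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (size * size)


demo = np.zeros((5, 5, 3))
demo[2, 2] = 9.0                      # a single bright pixel
features = neighborhood_mean_colors(demo)
print(features[2, 2], features[0, 0])  # [1. 1. 1.] [0. 0. 0.]
```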
The literature shows that WBC feature extraction has focused primarily on normal WBCs and acute lymphoid leukemia (ALL). However, owing to several challenges, limited research has been conducted on recognizing WBCs from AML patients. Therefore, this study focuses on feature extraction using several types of WBCs, including normal WBCs and WBCs from AML microscopic images.
This study makes the following contributions to the literature:
1. We developed a new WBC localization method based on CMYK color space transformation and image moments, named CMYK-moment localization.
2. We proposed a new CNN-based feature extraction method based on the fusion of pointwise and localized features, combining a shallow layer with a deep stacked layer to extract generalized features without losing the original pixel information.
3. We proposed a hybrid WBC feature extraction framework that combines CMYK-moment localization with CNN-based feature fusion.

A. DATASET
This study used a single-cell morphological dataset of leukocytes from patients with AML and non-malignant controls (AML_Cytomorphology_LMU). The dataset consists of 18,365 expert-labeled single-cell images obtained from peripheral blood smears of 100 AML patients and 100 controls at Munich University Hospital between 2014 and 2017. The images are classified into 15 single-cell types: four are leukemic cells and the other 11 are normal blood cells, of which seven are mature leukocytes and four are immature. Cancerous and noncancerous WBCs were classified by expert pathologists based on standard morphological classification [26] (Figure 1).
A secondary dataset of 17,092 images of individual normal peripheral blood cells was used to evaluate model performance. The images were acquired using CellaVision DM96 and are RGB JPEG images of 360 × 363 pixels, labeled by an expert pathologist. The dataset consists of eight classes of blood cells: neutrophils (segmented), eosinophils, basophils, lymphocytes, monocytes, erythroblasts, immature granulocytes (metamyelocytes, myelocytes, and promyelocytes), and platelets [34].

B. PROPOSED MODEL
The proposed feature extraction method is a hybrid of CMYK-moment localization and CNN feature fusion. In this study, the CMYK-moment localization method was used to extract the ROI by combining a color transformation (CMYK) with image moments. The ROI reduces the amount of irrelevant information so that context-free features depending only on the WBCs themselves can be extracted; this helps identify more generalized WBC features that can be used to detect different types of WBCs [29]. Eliminating unnecessary information also mitigates overfitting and decreases computation time. Extracting features that can identify different types of WBCs and discriminate between WBCs and other blood components is challenging because of the complex biological nature of WBCs, including variations in shape, texture, color, and density [4], [5], irregular boundaries, and textural similarities between WBCs and other blood components. Deep CNN convolutional filters can extract more complex textural patterns than conventional texture feature extraction methods such as Gabor filters [1], [3]. In general, CNN convolutional filters have been shown to perform better on images than on other types of data [35]. The proposed CNN feature fusion model consists of four layers: an RGB input layer followed by two parallel convolutional branches, namely a single pointwise layer and a stacked spatial branch of two layers. The pointwise branch helps extract simple features such as edges, whereas the stacked branch extracts more complex patterns such as texture features. The proposed feature extraction method was divided into three phases: Phase I (ROI localization), Phase II (feature extraction), and Phase III (method evaluation). The proposed method is illustrated in Figure 2.

1) PHASE I: ROI LOCALIZATION
In this phase, an ROI was extracted using the CMYK-moment localization method. RGB images were first converted into the CMYK color space. The C channel was then extracted, and Otsu thresholding was applied to generate a binary mask of the WBC nucleus. Postprocessing using morphological opening and the maximum connected region (MCR) removed isolated components and yielded a refined binary mask. The nucleus centroid was then calculated from the image moments using the following equations:
$$M_{pq} = \sum_{x}\sum_{y} x^{p}\, y^{q}\, I(x, y) \tag{1}$$
$$C_X = \frac{M_{10}}{M_{00}}, \qquad C_Y = \frac{M_{01}}{M_{00}} \tag{2}$$
where I(x, y) is the image intensity at pixel (x, y), M_pq is the raw moment of order (p + q), and C_X and C_Y are the x and y coordinates of the centroid, respectively. The maximum diameter of the WBC was determined by drawing a circle around the centroid, and a square was then drawn around the circle to crop the ROI (Figure 3).
VOLUME 10, 2022
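The Phase I pipeline can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the common naive RGB-to-CMYK formula, a 3 × 3 square structuring element for the opening, 4-connectivity for the MCR step, and the binary nucleus mask as I(x, y) in the moment computation. Because the circle-based diameter estimate is not specified in detail, a fixed, hypothetical `half_size` parameter sets the size of the square ROI; all function names are likewise hypothetical.

```python
from collections import deque

import numpy as np


def cyan_channel(rgb):
    """C channel of the naive RGB -> CMYK conversion for an (H, W, 3) image."""
    rgb = np.asarray(rgb, dtype=float)
    if rgb.max() > 1.0:                       # assume 8-bit input if any value exceeds 1
        rgb = rgb / 255.0
    k = 1.0 - rgb.max(axis=-1)                # black (K) channel
    denom = np.where(k < 1.0, 1.0 - k, 1.0)   # avoid 0/0 on pure-black pixels
    cyan = np.where(k < 1.0, (1.0 - rgb[..., 0] - k) / denom, 0.0)
    return np.clip(cyan, 0.0, 1.0)


def otsu_threshold(values, bins=256):
    """Otsu threshold for values in [0, 1]; values >= threshold are foreground."""
    hist, edges = np.histogram(values.ravel(), bins=bins, range=(0.0, 1.0))
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(float)        # pixel count in bins 0..k
    w1 = w0[-1] - w0                          # pixel count in bins k+1..end
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1.0)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2      # unnormalized between-class variance
    return float(edges[np.argmax(between) + 1])


def _shifted(mask):
    """The nine 3 x 3-neighborhood shifts of a zero-padded boolean mask."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return [p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]


def binary_opening_3x3(mask):
    """Erosion followed by dilation with a 3 x 3 square structuring element."""
    eroded = np.logical_and.reduce(_shifted(mask))
    return np.logical_or.reduce(_shifted(eroded))


def largest_component(mask):
    """Keep the largest 4-connected region (the maximum connected region)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, current = 0, 0, 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        current += 1
        labels[sy, sx] = current
        queue, size = deque([(sy, sx)]), 0
        while queue:                          # breadth-first flood fill
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = current, size
    return (labels == best_label) & (best_size > 0)


def centroid(mask):
    """Centroid (C_X, C_Y) = (M10 / M00, M01 / M00) from raw image moments."""
    ys, xs = np.indices(mask.shape)
    m = mask.astype(float)
    m00 = m.sum()
    return (xs * m).sum() / m00, (ys * m).sum() / m00


def localize_roi(rgb, half_size=48):
    """Crop a square ROI centred on the nucleus centroid (half_size is hypothetical)."""
    cyan = cyan_channel(rgb)
    nucleus = largest_component(binary_opening_3x3(cyan >= otsu_threshold(cyan)))
    if not nucleus.any():
        raise ValueError("no nucleus candidate found")
    cx, cy = (int(round(c)) for c in centroid(nucleus))
    h, w = nucleus.shape
    y0, y1 = max(cy - half_size, 0), min(cy + half_size, h)
    x0, x1 = max(cx - half_size, 0), min(cx + half_size, w)
    return np.asarray(rgb)[y0:y1, x0:x1], (cx, cy)
```

On a synthetic image with a purple disk centered at (x, y) = (40, 30) and one isolated purple pixel elsewhere, the opening removes the isolated pixel and `localize_roi(img, half_size=16)` returns the centroid (40, 30) and a 32 × 32 crop.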

2) PHASE II: FEATURE EXTRACTION
In this phase, the ROI images were annotated into a foreground representing WBCs and a background representing other blood components. Subsequently, 2D CNN convolutional layers were used to extract features through the fusion of pointwise and localized features. Convolutional layers are key components of a CNN architecture and consist of a set of kernels that convolve the input image and forward the result to the next layers:
$$h_j(X) = \sum_{i} g_{ij}(X) \otimes f_i(X)$$
where h_j(X) is the jth feature map obtained by the convolution of the input image at the spatial location X = (x, y); g_ij is the kernel defined between the ith input channel f_i and the feature map h_j; and ⊗ is the convolution operation [36], [37], defined as
$$(g \otimes f)(x, y) = \sum_{u}\sum_{v} g(u, v)\, f(x - u, y - v)$$
In this study, features were extracted using 2D separable convolution, in which features are extracted from each channel separately and then combined across the three channels using pointwise convolution. The model consists of two parallel branches. The first is a single pointwise convolution layer of 64 filters with a 1 × 1 kernel and a LeakyReLU activation function (alpha = 0.3), followed by an average pooling layer of size two. The LeakyReLU activation is defined as
$$P_c^{v}(y) = \begin{cases} z_c^{v}(y), & z_c^{v}(y) \ge 0 \\ \alpha\, z_c^{v}(y), & z_c^{v}(y) < 0 \end{cases}$$
where z_c^v(y) is the convolution output and P_c^v(y) is the pixel value at location y of the cth channel and vth convolution layer after applying the LeakyReLU activation function with α = 0.3. Average pooling can be represented by
$$A_c^{v}(y) = \frac{1}{p_h\, p_w} \sum_{x \in \mathcal{R}(y)} P_c^{v}(x)$$
where A_c^v(y) is the pixel at location y after average pooling; R(y) is the pooling window of height p_h and width p_w associated with y; and P_c^v(x) is the pixel at location x after applying the LeakyReLU activation function [24].
The pointwise layer was used to extract low-level features without the loss of pixel information caused by multiple convolutional operations (Huang et al., 2017). The second branch is a stacked layer consisting of two convolutional layers of 64 filters with a 3 × 3 kernel and a LeakyReLU activation function with an alpha coefficient of 0.3. The convolutional layers were followed by average pooling and a dropout layer with a rate of 20% to avoid overfitting [38]. Zero padding was applied to the input images so that the output feature maps have the same spatial size as the input images. Feature fusion combines the features obtained from the two parallel branches to construct the final feature set. The algorithm for the proposed feature extraction method is illustrated in Figure 4. The extracted features were then mapped to the corresponding binary labels, after removing unlabeled data, to obtain the final dataset for model training. Figure 5 shows the network configuration of the proposed method.
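The two-branch extractor of Phase II can be sketched with the Keras functional API. Several details are assumptions rather than facts from the text: the fusion is implemented as channel concatenation (consistent with the 128 = 64 + 64 features reported in Section IV), the 3 × 3 layers are read as depthwise-separable convolutions (`SeparableConv2D`), and average pooling uses `strides=1` with `padding="same"` so that every pixel keeps its own feature vector for pixel-wise labeling. The function name is hypothetical. Keras's `LeakyReLU` uses a default negative slope of 0.3, which matches the alpha given in the text.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_fusion_extractor(channels: int = 3, filters: int = 64,
                           dropout_rate: float = 0.2) -> keras.Model:
    """Hypothetical two-branch per-pixel feature extractor."""
    inputs = keras.Input(shape=(None, None, channels))

    # Branch 1: single pointwise (1 x 1) convolution, no spatial mixing.
    point = layers.Conv2D(filters, kernel_size=1, padding="same")(inputs)
    point = layers.LeakyReLU()(point)          # default negative slope = 0.3
    point = layers.AveragePooling2D(pool_size=2, strides=1, padding="same")(point)

    # Branch 2: two stacked 3 x 3 depthwise-separable convolutions.
    local = layers.SeparableConv2D(filters, kernel_size=3, padding="same")(inputs)
    local = layers.LeakyReLU()(local)
    local = layers.SeparableConv2D(filters, kernel_size=3, padding="same")(local)
    local = layers.LeakyReLU()(local)
    local = layers.AveragePooling2D(pool_size=2, strides=1, padding="same")(local)
    local = layers.Dropout(dropout_rate)(local)

    # Fusion: concatenate along channels -> 2 * filters features per pixel.
    fused = layers.Concatenate(axis=-1)([point, local])
    return keras.Model(inputs, fused, name="cnn_feature_fusion")


model = build_fusion_extractor()
roi = np.random.rand(1, 64, 64, 3).astype("float32")
features = model.predict(roi, verbose=0)                    # (1, 64, 64, 128)
pixel_features = features.reshape(-1, features.shape[-1])   # one row per pixel
print(pixel_features.shape)  # (4096, 128)
```

Because every layer preserves the spatial size, reshaping the output to (pixels, 128) yields one feature row per pixel, which can then be paired with that pixel's binary label.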

3) PHASE III: METHOD EVALUATION
The proposed feature extraction method was evaluated using several classification algorithms on two datasets: primary and secondary. The primary dataset was used for model training and validation, whereas the secondary dataset was used only for testing. The classification algorithms included fully connected layers (FCL) [39], random forest (RF) [40], support vector machine (SVM) [41], and XGBoost [42]. Classification performance was measured using seven metrics: overall accuracy, sensitivity, precision, specificity, F-score, area under the receiver operating characteristic (ROC) curve (AUC), and intersection over union (IoU) [43]-[45]. Overall accuracy is the rate of correctly classified pixels; sensitivity, also known as the true positive rate (TPR), is the rate of correctly classified WBC pixels; and specificity, also known as the true negative rate (TNR), is the rate of correctly classified non-WBC pixels. Precision measures the positive predictive value (PPV) of the model, and the F-score is the harmonic mean of precision and sensitivity. The F-score therefore provides a single measure of both precision and sensitivity, which is particularly useful for imbalanced classification problems. The AUC measures model performance across all decision thresholds. The similarity between a predicted object and its corresponding ground truth is measured using IoU, also known as the Jaccard index [43], which is commonly used in object detection. Equations (8)-(14) were used to calculate the performance measures:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{9}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{10}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{11}$$
$$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{12}$$
$$\text{AUC} = \int_{0}^{1} \text{TPR}\; d(\text{FPR}) \tag{13}$$
$$\text{IoU} = \frac{TP}{TP + FP + FN} \tag{14}$$
where TP, TN, FP, and FN are the numbers of true positive, true negative, false positive, and false negative pixels, respectively, and TPR and FPR are the true and false positive rates as the decision threshold varies.
The model was developed and implemented on an Intel® Core™ i7-9750H CPU @ 2.60 GHz with a 64-bit operating system, an x64-based processor, 16 GB of RAM, and an NVIDIA GeForce RTX 2070 with Max-Q Design. The algorithm was written in Python using the Keras deep learning package, with other image-processing packages used to extract handcrafted features.
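The metric definitions in Phase III can be computed directly from binary pixel masks. The following NumPy sketch is a hedged illustration, not the authors' evaluation code: the function names are hypothetical, and the AUC is obtained through the equivalent Mann-Whitney rank formulation (with average ranks for ties) rather than by integrating an explicit ROC curve.

```python
import numpy as np


def _ratio(num, den):
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def pixel_metrics(y_true, y_pred):
    """Count-based metrics from binary per-pixel labels (1 = WBC, 0 = background)."""
    t = np.asarray(y_true).astype(bool).ravel()
    p = np.asarray(y_pred).astype(bool).ravel()
    tp = int(np.sum(t & p))
    tn = int(np.sum(~t & ~p))
    fp = int(np.sum(~t & p))
    fn = int(np.sum(t & ~p))
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "precision": precision,
        "specificity": _ratio(tn, tn + fp),
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "iou": _ratio(tp, tp + fp + fn),
    }


def roc_auc(y_true, scores):
    """AUC via the Mann-Whitney rank statistic, with average ranks for tied scores."""
    t = np.asarray(y_true).astype(bool).ravel()
    s = np.asarray(scores, dtype=float).ravel()
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)                     # 1-based rank of the last member of each tie group
    ranks = ((ends - counts + 1 + ends) / 2.0)[inverse]
    n_pos, n_neg = int(t.sum()), int((~t).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative pixels")
    return float((ranks[t].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))


y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 1, 0])
print(pixel_metrics(y_true, y_pred))  # accuracy, sensitivity, precision, specificity, f_score 0.75; iou 0.6
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```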

IV. RESULTS AND DISCUSSION
A set of 128 features was obtained for each pixel, and each pixel was labeled as foreground (WBC) or background (other blood components). The dataset consisted of 3,192,550 pixels: 1,795,988 (56.3%) labeled as foreground and 1,396,562 (43.7%) labeled as background. The dataset was then divided into 80% for training and 20% for testing. The results of the four classifiers described in Section III are presented below.
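The construction of the pixel-level dataset (flattening the per-pixel features, dropping unlabeled pixels, and splitting 80/20) can be sketched as follows. The -1 marker for unlabeled pixels, the random pixel-level (non-stratified) split, and the function name are assumptions; the text does not specify how unlabeled pixels were encoded or how the split was drawn. Applied to the 3,192,550 labeled pixels reported above, an 80/20 split corresponds to 2,554,040 training and 638,510 test pixels.

```python
import numpy as np

UNLABELED = -1  # hypothetical code for pixels without a foreground/background label


def build_pixel_dataset(features, labels, test_fraction=0.2, seed=0):
    """Flatten per-pixel features and split them into training and test sets.

    features: (N, H, W, F) feature maps, e.g. F = 128 from the fusion extractor.
    labels:   (N, H, W) masks with 1 = WBC, 0 = background, UNLABELED = no label.
    """
    features = np.asarray(features)
    X = features.reshape(-1, features.shape[-1])
    y = np.asarray(labels).reshape(-1)
    keep = y != UNLABELED                  # drop unlabeled pixels
    X, y = X[keep], y[keep]
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))        # random pixel-level split (assumption)
    n_test = int(round(test_fraction * len(y)))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


rng = np.random.default_rng(1)
feats = rng.random((2, 4, 5, 128))
masks = rng.integers(0, 2, size=(2, 4, 5))
masks[0, 0, :] = UNLABELED                 # five unlabeled pixels
X_train, X_test, y_train, y_test = build_pixel_dataset(feats, masks)
print(X_train.shape, X_test.shape)         # (28, 128) (7, 128)
```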

A. PRIMARY DATASET
The proposed feature extraction method achieved overall accuracies of 97.34%, 97.57%, 97.15%, and 97.39% using FCL, RF, SVM, and XGBoost, respectively. Table 1 shows that the proposed method achieved comparable results for all classifiers. However, SVM was the most efficient in terms of computation time (38.7 s on the same hardware) while achieving comparable overall accuracy (Figure 6). Table 2 shows that the proposed method was stable and produced comparable results across the classifiers. However, SVM outperformed the other classifiers and mitigated overfitting better than they did (Figure 7). Figures 8 and 9 show the results of applying the proposed feature extraction method to the primary and secondary datasets, respectively. The proposed method detected all types of blood cells present in the datasets and accurately detected platelets, which were not present in the primary dataset used for training.

V. BENCHMARKING THE PROPOSED FEATURE EXTRACTION METHOD WITH OTHER METHODS
The benchmark methods were selected after extensive experimentation ranging from conventional methods to advanced deep learning methods; among these, CNN-based deep learning features achieved the highest accuracy. The proposed feature extraction method was benchmarked against several other feature extraction methods based on a feature bank of Gabor texture filters, local binary patterns (LBP), edge detection filters, K-means clusters, and Gaussian filters. The first method used a feature bank of Gabor texture filters (M1). The second method used Gabor filters and an LBP filter (M2). The third method used Gabor filters, LBP features, and edge-detection features (M3). The fourth method combined Gabor filters, LBP features, edge-detection features, and K-means clusters (M4). The fifth method used Gabor filters, LBP, edge-detection features, K-means clusters, and a Gaussian filter (M5).
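A minimal NumPy sketch of an M2-style pixel-wise feature bank (Gabor responses at several orientations plus an 8-neighbor LBP code) is shown below. It is illustrative only: the kernel size, wavelength, sigma, aspect ratio, number of orientations, and function names are hypothetical choices, since the benchmark parameters are not reported here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gabor_kernel(theta, wavelength=4.0, sigma=2.0, gamma=0.5, size=9):
    """Real (cosine) part of a Gabor kernel oriented at angle theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)


def filter2d(gray, kernel):
    """Same-size 2D filtering with reflected borders.

    Computes a correlation; for the point-symmetric cosine Gabor kernel above,
    this equals a convolution.
    """
    kh, kw = kernel.shape
    padded = np.pad(gray, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="reflect")
    windows = sliding_window_view(padded, (kh, kw))       # (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, kernel)


def lbp(gray):
    """Basic 8-neighbor local binary pattern code (0-255) per pixel."""
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    center = padded[1:h + 1, 1:w + 1]
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = np.zeros((h, w), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = padded[dy:dy + h, dx:dx + w]
        code |= (neighbor >= center).astype(np.uint8) << np.uint8(bit)
    return code


def feature_bank(gray, n_orientations=4):
    """Per-pixel features: Gabor responses at n orientations plus the LBP code."""
    gray = np.asarray(gray, dtype=float)
    maps = [filter2d(gray, gabor_kernel(np.pi * i / n_orientations))
            for i in range(n_orientations)]
    maps.append(lbp(gray).astype(float))
    return np.stack(maps, axis=-1)                        # (H, W, n_orientations + 1)
```

Each pixel then contributes one row of `n_orientations + 1` handcrafted features, analogous to the 128 learned features per pixel of the proposed method.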
The benchmark feature extraction methods were evaluated using the FCL, RF, SVM, and XGBoost classifiers, and each classifier was assessed using overall accuracy, sensitivity, specificity, precision, F-score, AUC, and the Jaccard index (Table 3).
As shown in Table 3 (results listed in the order FCL, RF, SVM, and XGBoost), using only Gabor filters, the model achieved overall accuracies of 89.02%, 91.58%, 84.79%, and 89.78%, respectively. Adding the LBP texture feature did not markedly improve performance, yielding overall accuracies of 90.18%, 91.81%, 85.88%, and 89.92%. Similarly, adding edge detection filters (Canny, Roberts, Sobel, Scharr, and Prewitt) did not markedly improve performance, with overall accuracies of 91.25%, 92.62%, 90.21%, and 91.83%. However, adding K-means clusters as a feature improved performance to 93.99%, 95.55%, 92.18%, and 93.20%, and adding a Gaussian filter further increased the overall accuracy to 94.93%, 96.33%, 93.18%, and 93.7%. Finally, the results of the benchmark methods were compared with those of the proposed feature extraction method, which achieved the best results among all feature extraction methods, with overall accuracies of 97.34%, 97.15%, 97.39%, and 97.57% using FCL, RF, SVM, and XGBoost, respectively. Figure 10 (A-D) compares the different feature extraction methods with the proposed method using FCL, RF, SVM, and XGBoost, and shows that the proposed method consistently outperforms the benchmark methods. Figure 11 summarizes the performance of the proposed feature extraction method compared with the benchmark methods using FCL, RF, SVM, and XGBoost. Figure 12 shows the results of applying the benchmark feature extraction methods for WBC recognition compared with those of the proposed method.
Figure 12 shows that the feature bank of Gabor features (M1) achieved good overall performance but performed poorly in detecting monoblasts (MOB) and band neutrophils (NGB). Adding the LBP texture feature (M2) did not improve performance and performed worse in NGB recognition. Adding edge detection features (M3) marginally improved NGB detection but worsened MOB detection. Adding K-means clusters (M4) to the feature bank improved both NGB and MOB detection. Adding a Gaussian filter (M5) produced inferior results for both MOB and NGB. In contrast, the proposed feature extraction method performed best in recognizing the different types of WBCs, including MOBs and NGBs.
This study highlights one of the primary limitations of handcrafted features: their performance depends strongly on the choice of features, which requires extensive domain knowledge and experience [17]. Handcrafted features are also difficult to automate, require more training time (see Table 1), generalize poorly, and cannot be used to detect blood cell types that were not used during training. Figures 10 and 11 show that the handcrafted features (M1 to M5) produced inferior results compared with the proposed method, which might be because only one type of WBC was used for model training. Using handcrafted features, Abdurrazzaq et al. segmented the entire WBC only when the WBCs had high color intensity, such as basophils and monocytes; when the WBCs had light color intensity, their method segmented the WBC nucleus but failed to segment the cytoplasm [28].
The proposed feature extraction method produced the best results among the compared feature extraction methods. As shown in Table 3, despite the complexity and heterogeneity of the WBCs in this study, combining CMYK-moment localization with deep-learning-based feature fusion improved the detection of WBCs with different shapes, structures, colors, illumination, and staining variability. In this study, features were extracted and the model was trained using only one morphological class (BAS). Nevertheless, the model classified all 14 other WBC morphological classes with a high level of accuracy. The proposed method also performed well when identifying different types of blood cells in an external dataset of normal WBCs and platelets. These results indicate that the proposed method generalizes well and is robust to shape, structure, color, and other differential factors. These characteristics may be attributed to the use of an ROI, which establishes a context-free learning environment by extracting only WBC-related features while eliminating irrelevant ones. 2D convolution layers improve learning by extracting more complex features than conventional methods such as Gabor filters, which extract only simple features. Fusing features from different kernel sizes and from stacked layers of different depths helps increase the generalizability of the proposed model. Figure 11 shows that most models fail to detect WBCs in some situations, particularly cells with low-contrast, light color intensity [28] and cells whose cytoplasm color is similar to the background. NGB and MOB are the most difficult WBCs to detect, possibly because their light cytoplasm color and smooth cytoplasm texture make it difficult to discriminate between the cytoplasm and the image background.
However, despite these difficulties, the proposed feature extraction method achieved excellent performance when detecting all types of WBCs, including MOB and NGB.
In this study, a hybrid of CMYK-moment localization and a deep CNN-based feature fusion model was used for feature extraction. The model consists of two parallel branches: 1) a shallow convolution layer that extracts pointwise WBC features without losing the original pixel-wise information, and 2) a deep stacked convolution of two layers that extracts localized WBC features. The first branch extracts simple features such as edges, and the second extracts more complex WBC patterns. We hypothesized that multiple convolutional operations might lose original pixel information; therefore, the first branch was implemented as a shallow layer with a kernel size of one.
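The hypothesis behind the shallow branch can be illustrated numerically: a 1 × 1 convolution is a per-pixel linear map across channels, so perturbing one pixel changes only that pixel's feature vector, whereas a 3 × 3 spatial filter also changes its eight neighbors. The NumPy sketch below uses random, hypothetical weights and a uniform 3 × 3 filter as a stand-in for any spatial 3 × 3 convolution; it is not the trained model.

```python
import numpy as np


def pointwise_conv(image, weights, bias):
    """1 x 1 convolution: a per-pixel linear map (H, W, C_in) -> (H, W, C_out)."""
    return np.einsum("hwc,co->hwo", image, weights) + bias


def leaky_relu(x, alpha=0.3):
    return np.where(x >= 0, x, alpha * x)


def mean_3x3(image):
    """Uniform 3 x 3 filter, standing in for a spatial 3 x 3 convolution."""
    h, w = image.shape[:2]
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="edge")
    return sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)) / 9.0


def changed_pixels(a, b, tol=1e-12):
    """Coordinates of pixels whose feature vectors differ between a and b."""
    return np.argwhere(np.any(np.abs(a - b) > tol, axis=-1))


rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
weights = rng.normal(size=(3, 64))      # hypothetical, untrained 1 x 1 weights
bias = np.zeros(64)

perturbed = image.copy()
perturbed[4, 4] += 1.0                  # change a single pixel

point_diff = changed_pixels(leaky_relu(pointwise_conv(image, weights, bias)),
                            leaky_relu(pointwise_conv(perturbed, weights, bias)))
local_diff = changed_pixels(mean_3x3(image), mean_3x3(perturbed))
print(point_diff.tolist())   # [[4, 4]]
print(len(local_diff))       # 9
```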

VI. CONCLUSION
Using a hybrid of conventional image processing and deep learning with feature fusion, we developed a generalized WBC feature extraction method. Using ROIs instead of the entire image helped reduce noise and omit irrelevant features, thereby creating a stable, context-free learning environment for extracting WBC features. 2D convolution layers also provide strong WBC feature extraction because they are robust to variations in morphological structure, color, and staining. Fusing single and stacked convolutional layers of different kernel sizes and depths helped establish a generalized feature extraction framework. The proposed method detected 15 types of WBCs, including both normal and AML cancer cells. It also harnesses the benefits of conventional image processing techniques for extracting ROIs and the strengths of deep learning methods for feature extraction. In the future, the proposed model could be further improved through hybridization and parallelization to reduce computation time and increase accuracy.

MOHD SHAFRY MOHD RAHIM was born in Malaysia. He received the bachelor's degree in computer science and the M.S. and Ph.D. degrees in computing from UTM, Malaysia. He is currently a Professor at UTM, where he leads a research group comprising faculty members and graduate students. He is also the Chair of undergraduate students at UTM. He has published research articles in several renowned international journals and conference proceedings and has been a keynote speaker at many international conferences.
TAN TIAN SWEE received the M.S. and Ph.D. degrees from UTM in 2004 and 2008, respectively. He has spearheaded numerous projects and acquired prestigious grants from various sources. One of his notable milestones is the collaboration between UTM and IJN. He is currently a Senior Lecturer with the Faculty of Biomedical Engineering, UTM, where he also works as a Program Coordinator. He has published numerous articles in high-impact-factor journals, establishing his expertise in this domain. His research interests include digital signal processing. He is also a member of the Medical Device and Technology Group and Frontier Materials Research Alliances.