Automated Diagnosis of Leukemia: A Comprehensive Review

Leukemia is characterized by the rapid production of abnormal white blood cells, which affects the blood and damages the bone marrow. The overproduction of abnormal and immature white blood cells damages the immune system and is accompanied by reduced production of red blood cells and platelets by the bone marrow. This hematological malignancy is generally diagnosed by manual methods such as complete blood count (CBC), bone marrow aspiration, or microscopic examination of the blood smear. Although these manual methods are economical, they are less reliable, time-consuming, and tedious. Technological advances in the medical field have helped to address these issues: the problems of manual leukemia diagnosis have been tackled by automated methods based on computer-aided diagnostic (CAD) systems, which aim at efficient and reliable diagnosis. Over the last decade, multiple approaches have been proposed for the pre-processing, segmentation, feature extraction, and feature selection stages of CAD systems, and for improving their classification accuracy in leukemia detection. This paper presents a comprehensive review of CAD systems for the detection of the various types of leukemia. The review details these systems and analyses their methodologies in terms of their efficiency in pre-processing, segmentation, feature extraction and selection, and overall classification accuracy.


I. INTRODUCTION
Leukocytes, commonly known as white blood cells (WBCs), are a prime component of the immune system, which protects the human body from viruses, bacteria, infections and other foreign bodies. These nucleated cells are formed in the bone marrow and circulate through the bloodstream, so they can be found throughout the lymphatic system [1]. Leukocytes differ from the other blood cells, such as red blood cells (RBCs) and platelets, and are broadly classified in two ways: by cell lineage, as lymphoid or myeloid, and by cell structure, as agranulocytes or granulocytes. These categories are further divided into five subcategories, i.e., neutrophils, basophils, eosinophils, lymphocytes and monocytes, as given in Figure 1. Lymphocytes are further divided into three cell types: thymus (T) cells, bone marrow (B) cells and natural killer (NK or K) cells [2]. However, the cancerous production of abnormal and immature leukocytes affects the immune system and reduces the ability of the bone marrow to produce RBCs and platelets [3]. The disease characterized by this proliferation of immature leukocytes, i.e., blasts, in the bone marrow and blood is commonly referred to as leukemia. The abnormal production of malignant immature leukocytes, termed blast cells, reduces the production of normal blood cells [4]. Because it is crucial to maintain a required level of leukocytes in the blood, the complete blood count (CBC) is often an indicator of the disease. The causes of leukemia are not precisely known, although inherited and environmental factors are thought to play a role. The diagnosis of leukemia at an early stage is essential, as the abnormal leukocytes spread rapidly through the bloodstream and affect other body organs [5]. Manual diagnosis of leukemia is less reliable, time-consuming, tedious and subject to inter-observer variation.
Therefore, automated computerized methods were introduced for leukemia diagnosis in the early 1990s and quickly became popular owing to their low cost, high accuracy and little or no manual involvement. In the subsequent part of this section, we shed some light on the origin and evolution of the automated diagnosis of leukemia. Before that, however, we first describe the types or classes of leukemia.
The associate editor coordinating the review of this manuscript and approving it for publication was Ravibabu Mulaveesala.

A. CLASSES OF LEUKEMIA
The types or classes of leukemia depend upon the type of affected leukocytes. If the immature leukocytes are lymphocytes, the leukemia is classified as lymphocytic leukemia; if they are monocytes and granulocytes, it is considered myelogenous leukemia [6]. In general, leukemia can be categorised as acute (when affected leukocytes are unable to perform as normal leukocytes) or chronic (when affected leukocytes can perform as normal cells), making acute leukemia the more severe form [7]. An illustration of the common leukemia types and their diagnoses is presented in Figure 2. A detailed description of each of these types is given below.

1) ACUTE LYMPHOBLASTIC LEUKEMIA (ALL)
Acute lymphoblastic (or lymphocytic) leukemia is a type of cancer that occurs in the bone marrow and blood. A lymphoblast is an immature blast cell with altered morphology; it is activated by an antigen, which transforms it into a mature lymphocyte [8]. The French-American-British (FAB) classification divides ALL into three classes, i.e., L1, L2 and L3, according to their morphological differences. L1 cells are homogeneous, have a round nucleus, and contain cytoplasm without vacuoles. L2 cells are larger than L1 cells; they have an irregular nucleus and varying cytoplasm. L3 cells are homogeneous in shape, with a round nucleus and cytoplasm containing vacuoles [9]. Figure 3 depicts image samples of healthy leukocytes and four leukemia types. Although numerous studies have proposed methods for the automated diagnosis of acute lymphoblastic leukemia, only a few have proposed methods to diagnose the ALL subtypes (L1, L2 and L3) [10].

2) ACUTE MYELOID LEUKEMIA (AML)
AML is the most common type of acute leukemia, particularly among elderly people, and is also known as acute myelocytic leukemia, acute myelogenous leukemia, acute non-lymphocytic leukemia and acute granulocytic leukemia [12]. It is characterized by the formation of immature blast cells other than lymphocytes [13]. In AML, the bone marrow may also make abnormal red blood cells and platelets [14]. The symptoms of AML include anaemia, fever, ulcers of the mucous membranes and granulocytic insufficiency [15], [16]. AML differs from other types of leukemia in that it arises from the cells that form myeloid cells. FAB has classified the blast cells into two types, Type I and Type II. Both blast types have prominent central nucleoli, usually three to five in number, with well-defined and uncondensed chromatin. Compared to Type II blasts, Type I blasts have fewer cytoplasmic granules and more primary granules. A major factor distinguishing AML from other leukemia types is its eight subtypes (M0 to M7) under the FAB classification. These subtypes are shown in Table 1 along with their characteristics. For the diagnosis of M1 to M6 AML, a blast proportion of more than 30% in the bone marrow smear is typically needed. The M0 to M5 subtypes start in primitive WBCs, whereas M6 and M7 start in primitive RBCs and platelets, respectively [15].

3) CHRONIC LYMPHOCYTIC LEUKEMIA (CLL)
CLL is common in adults, particularly in people over the age of 60, and is more common in men than in women. This type of leukemia differs from the others in its symptoms: inflamed lymph nodes, anorexia, weight loss, fatigue, and weakness. It is characterized by the accumulation of B-cell lymphocytes [12]. In CLL, almost 90% of leukocytes appear similar to normal cells. However, compared to normal lymphocytes, the parachromatin is condensed and clearly separated from the nuclear chromatin; occasionally, it is less condensed than in normal lymphocytes. This is why the diagnosis of CLL is a difficult task. In many lymphocytes, the nucleoli are clearly visible with moderate cytoplasm [16].

4) CHRONIC MYELOID LEUKEMIA (CML)
CML is predominant in young people and middle-aged adults. In CML, there is an excessive production of immature granulocytes that spill out of the bone marrow and travel in the bloodstream. Moreover, the percentage of myeloblasts is less than 10%, with excessive production of neutrophils, monocytes, myelocytes, and eosinophil myelocytes. In most cases, myelocytes and neutrophils outnumber the normal leukocytes. The symptoms of CML include anaemia, weight loss, progressive enlargement of the spleen, fever and night sweats. Over time, the patient experiences a decline in the quantity of RBCs and platelets, which is associated with anaemia, bruising and blood loss [15].
FIGURE 3. Image sample of (a) Healthy leukocytes [11]; (b) acute lymphoblastic leukemia (ALL); (c) Chronic lymphocytic leukemia (CLL); (d) Chronic myeloid leukemia (CML); and (e) Acute myeloid leukemia (AML) [7].
This paper presents a detailed review of CAD systems for the various types of leukemia. It reviews the datasets and approaches used for the diagnosis of various leukemia types, and analyses and compares the studies published in the recent past.
The organization of the paper is as follows. The subsequent subsection presents the authors' take on the origin, progress, challenges and developments in the automated diagnosis of leukemia. A detailed discussion of the materials and methods available in the literature for automated leukemia diagnostic systems is provided in Section II, which also presents detailed notes on image acquisition, publicly available datasets and the stages involved in computer-aided leukemia detection, i.e., pre-processing, segmentation, feature extraction, and leukemia classification. Section III describes the performance measures used in the literature, while Section IV discusses the state of the art in the automated diagnosis of leukemia, covering not only traditional methods but also modern machine learning and deep learning methods. Section V analyses and discusses the reported work, and Section VI concludes the paper and outlines future directions.

B. BACKGROUND OF AUTOMATED DIAGNOSIS OF LEUKEMIA
This section gives a brief overview of the origin and evolution of the automated diagnosis of leukemia. The major challenges faced by researchers and scientists, along with their solutions, are also discussed in separate subsections.

1) ORIGIN
One of the conventional methods of the diagnosis of leukemia is to analyse the medical history of a person and his family. However, that requires a series of appointments and blood test examinations. Cytogenetic analysis is another type of leukemia diagnosis method where abnormality in individual chromosomes is observed [17]. Complete blood count (CBC) is the conventional method of leukemia diagnosis which is performed by doctors as they check the complete count of WBCs, RBCs and platelets. The CBC identifies the leukemia cells but is not reliable for the confirmation of leukemia in the patient. For this reason, bone marrow aspiration and microscopic examination of blood smear are the conventional manual methods used for leukemia diagnosis [18], [19].
Generally, leukemia is diagnosed by the manual examination of microscopic blood smear images by physicians and experts. Immunohistochemistry is another method of analysing blood samples, in which the antigens of the cells in a tissue are analysed for the diagnosis of leukemia. Other methods of leukemia diagnosis include interventional radiology techniques such as biopsy, catheter drainage, and percutaneous aspiration, as well as Long Distance Inverse Polymerase Chain Reaction (LDI-PCR), molecular cytogenetics, and array-based Comparative Genomic Hybridization (aCGH). However, radiological methods are limited by the resolution and sensitivity of the radiological images and require extensive labour to diagnose the various leukemia types [20].
The problem with these manual methods of leukemia diagnosis is their labour-intensive and time-consuming nature, which lowers their efficiency. Moreover, the accuracy of diagnosis is subjective, as it requires highly trained medical professionals to perform sophisticated examinations. These practical challenges paved the way for automated diagnostic systems for accurate and effective leukemia diagnosis. Apart from overcoming these problems of manual diagnosis, computerized diagnosis systems are reported to be more reliable, efficient and accurate than manual detection techniques [10].

2) PROGRESS
Technological advancements have led to progressively improved methods of leukemia detection. Over the years, automated methods have been increasingly used in the medical field for the diagnosis of cancer, malaria, skin diseases, and many other conditions [21]. Over the last decade, numerous automated methods have been proposed for the diagnosis of various types of leukemia, offering improved efficiency and classification accuracy. Dozens of studies have been presented in this field on WBC segmentation, blood counting, blood smear analysis and leukemia diagnosis using image processing techniques [22]. In recent years, various studies have proposed computer vision, machine learning (ML) and deep learning (DL) frameworks that classify leukemia blood cells with better accuracy. One of the biggest advantages of computer vision-based leukemia detection is the ease of diagnosis: instead of physically analysing blood and bone marrow smears, images of these samples can be transmitted remotely [23]. These automated methods of leukemia detection from microscopic blood smear images have utilized a variety of efficient and effective techniques. This can help hematologists and pathologists to diagnose leukemia effectively and efficiently and to provide the patient with appropriate treatment in less time.

3) MAJOR CHALLENGES AND KEY DEVELOPMENTS
Although the automated diagnosis of leukemia using microscopic images has grown over the past decades, there are still multiple open problems for researchers to address or improve upon. Initially, noise, artifacts and the poor quality of the acquired images were prominent challenges that reduced the efficiency of computer-aided diagnosis (CAD) methods for leukemia. These issues were progressively addressed by introducing various pre-processing techniques in the subsequent literature. Various pre-processing methods, e.g., histogram equalization and contrast stretching, were developed to enhance the quality of the acquired medical image. However, the drawback of these methods was their inability to distinguish between noise and image pixels, leading to poor image quality. Another technique used in the literature is unsharp masking, but it is vulnerable to noise, which degrades the image and makes leukemia cells difficult to diagnose. That prompted researchers to employ variants of mean and median filters to filter out noise prior to image enhancement. However, these classical noise removal methods are known to remove important image details, which further reduces image quality. Hence, the development of reliable and efficient pre-processing techniques is still a major challenge in the automated diagnosis of leukemia.
The techniques used for detecting leukemia cells in an image are generally based on histogram-based thresholding for segmenting the region of interest. Despite their ease of implementation, these techniques require the selection of a suitable threshold, which is a challenging task. To overcome this issue, researchers have used watershed segmentation, despite its tendency to over-segment.
A key issue with these classical techniques is that they may yield good results for one specific problem but lower accuracy for another.
Once the region of interest is segmented, a classifier is used to determine the presence or absence of leukemia cells. For this purpose, support vector machines (SVMs) are the most commonly used classifiers. However, their efficacy is limited because the SVM is, in its basic form, a binary classifier, and hence it is not directly suitable for classifying the subtypes of leukemia; it is mainly used to distinguish healthy cells from leukemia blast cells. Lately, machine learning and deep learning methods have been actively utilized for the segmentation and classification of healthy and malignant cells owing to their increased accuracy. Nevertheless, this superior performance comes with challenges of its own, which include the computational cost and the need for large datasets to train properly and support efficient diagnosis. Key results in the automated diagnosis of leukemia are presented pictorially in Figure 2.

4) PRESENT DAY METHODS
Automated diagnosis of leukemia has seen various innovative solutions involving classical image processing methods and ML/DL methods. These techniques follow a pipeline of specially designed stages, generally termed pre-processing, segmentation, feature extraction and classification of the disease. Here, the success of every stage depends on the success of the previous stage. Hence, high classification accuracy depends upon a favourable outcome of all the stages, and is problem dependent. A key development in the automated diagnosis of leukemia is the use of innovative DL networks, which have achieved higher accuracy in segmentation and classification. A recent innovation in this regard is transfer learning, which fine-tunes a DL network that performs well on another problem so that it can detect leukemia. Such networks include AlexNet and ResNet. With the development of ML and DL techniques, the accuracy of the automated diagnosis of leukemia has increased, but at the cost of vast resources spent on computing power and dataset collection.

II. MATERIALS AND METHODS
An automated procedure for detecting lymphoblasts from colored microscopic images comprises four successive steps: segmentation, leucocyte identification, lymphocyte identification, and identification of potential lymphoblasts. The process of automated diagnosis of leukemia starts with the acquisition of images from leukemia patients, which may be subject to noise, artifacts and poor contrast due to the imaging modality. Consequently, the acquired datasets are pre-processed to enhance their quality. Next, segmentation techniques are utilized to extract the foreground from the background pixels based on their distinguishable characteristics, e.g., color, texture and shape [24]-[26]. Lastly, WBCs are distinguished from RBCs through classifiers, whereby the presence of a nucleus is first determined from the color information of the cells. In the next phase, lymphocytes are discriminated from the other WBCs by examining the nucleus structure and the presence of cytoplasm. The final phase of classification examines the morphological deformation in the lymphocytes to determine potential lymphoblasts. This general methodology of an automated CAD system using microscopic images and computer vision methods for leukemia detection and classification is depicted in Figure 4. It consists of five distinct stages [27]; the figure not only depicts all the steps involved in the automated diagnosis pipeline but also lists key results for each stage. Here, we aim to present a graphical overview of the literature on leukemia diagnosis, in which we divide the listed techniques into their respective categories to give the reader a better understanding.
In the following subsections, we present a thorough discussion on each stage which involves discussion and critical analysis of the key pre-processing, segmentation, feature extraction and classification algorithms.

A. IMAGE ACQUISITION
In recent years, many studies have proposed a variety of techniques for leukemia diagnosis. For the purpose of investigation, a variety of public and private leukemia datasets have been built from blood samples collected from healthy individuals and leukemia patients. These blood sample images are acquired and digitized using different types of cameras and microscopes. Figure 4 depicts the types of cameras and microscopes used for image acquisition. We also give a brief overview of the publicly available datasets that can be used for further research, a topic that is normally ignored in the literature. The publicly available datasets are used by many researchers to validate their proposed frameworks, as given in Figure 2. However, many researchers have collected their own datasets from local hospitals and laboratories; these are kept private and can be requested on demand. Furthermore, since DL-based approaches require a large data repository for training, various data augmentation methods have been presented in the literature to make the data large enough for training classification models. These data augmentation techniques include image transformations such as rotation, translation, shearing, blurring, mirroring, grayscale transformation and histogram equalization [28]-[30].
The ASH image bank is a web-based image library containing images taken with various microscopes at various resolutions. It is a benchmark dataset that includes normal and leukemic (AML and ALL) blood images labelled by specialist pathologists. The database consists of 240 images, including 100 images of healthy patients, 80 images of AML and 60 images of ALL. Around 420 sub-images were extracted from the 100 healthy patients' images, 80 AML patients' images, and 60 ALL patients' images [32].
The majority of the data was obtained between 2003 and 2015 from a variety of patients ranging in age from 4 to 73 years. Figure 6 shows the image samples obtained from ASH dataset.

1) PUBLICLY AVAILABLE DATASETS
BloodSeg image dataset contains 367 blood smear images that have ground truth segmentation done by an expert for the nucleus of the cell and captured at 640 × 480 pixels resolution. All images are acquired with an optical microscope coupled to a color CCD camera where the magnification of the microscope is 100 times the objective of the lens.
CellaVision dataset is made up of 100 colour images of 300 × 300 pixels that were obtained from [34]. The cell images are usually purple, and red blood cells may surround the white blood cells, as given in Figure 7.
JTSC dataset was obtained from Jiangxi Tecom Science Corporation in China [36]. It comprises 300 WBC images with a colour depth of 24 bits and a resolution of 120 × 120 pixels. The blood smears in the dataset were treated with a newly developed haematology substance for quick staining of WBCs, and the images were taken by a Motic Moticam Pro 252A optical microscope camera with an N800-D motorised auto-focus microscope. The majority of the images have a yellow background, as shown in Figure 8.
FIGURE 8. JTSC dataset images: colored images (first row), ground truth images (second row) [35].
VOLUME 9, 2021
SMC-IDB dataset was proposed in [37]. It contains peripheral blood smear images obtained from Giemsa-stained slides. The images in this dataset are distinct, as they were captured by different cameras and microscopes. The dataset contains 367 images of peripheral blood at 640 × 480 pixels and is accessible at [38].
IUMS-IDB dataset of the Iran University of Medical Science [39], available at [40], contains 195 microscopic images taken from the peripheral blood of around 10 patients. The WBC images were taken by a Canon V1 camera attached to a Canon optical microscope. The images are stored in JPG format at 300 dpi with a resolution of 3872 × 2592 pixels, in RGB color format. Visually, they differ from the ALL-IDB and SMC-IDB images. Figure 9 shows the differences between image samples from ALL-IDB1, SMC-IDB and IUMS-IDB. Despite the availability of public datasets, some studies have also used blood smear images from Google or the internet [28], [41]-[43]. Table 2 summarizes the publicly available leukemia datasets used in the automated detection of leukemia, along with their specifications.

2) PRIVATE DATASETS
Various studies have used the ASH and ALL-IDB public leukemia datasets. However, the ALL-IDB dataset does not contain blood cell images of other leukemia subtypes. Furthermore, ML- and DL-based frameworks need a considerable amount of data to train the model [30]. For this reason, various researchers have collected their own datasets of blood smear images from pathologists and local leukemia hospitals [27], [46]-[48]. The details of the datasets used by past studies are given in the next sections.

B. COLOR/STAIN NORMALIZATION
Many cancer types can be diagnosed by analysing histological image samples [49]. Histopathological diagnosis is based entirely on visual analysis of tissue images under a microscope. However, the appearance of histological tissue may vary in color intensity depending on the skill of the operator, the scanner specifications and the staining process. This stain variability affects the diagnosis and also decreases the accuracy of CAD systems. For this reason, color/stain normalization methods have been introduced. This technique has proved to be an effective tool for normalizing the color/stain appearance of an original image with respect to a reference image. Many researchers have proposed techniques for stain/color normalization. In [50], a novel adaptive color deconvolution (ACD) algorithm is proposed for stain separation and color normalization of hematoxylin-eosin-stained whole slide images (WSIs). Four WSI datasets, covering lung, cervix, and breast cancers, were used to evaluate this technique, and it was compared against six state-of-the-art techniques. The proposed technique attained the most reliable color normalization performance according to the quantitative metrics. In [51], a novel automated stain normalization and separation technique for H&E-stained histological slides is proposed. The suggested technique, Stain Color Adaptive Normalization (SCAN), can enhance the contrast between the background and histological cells/tissue while preserving local structures. The method was validated both quantitatively and qualitatively on multiple datasets, providing highly satisfactory outcomes at low computational cost. In [52], the proposed technique takes H&E (hematoxylin and eosin) stained color digital images as input to recognize lymph cells.
The method includes segmentation of cells from the extracellular matrix, feature extraction, categorization and overlap resolution. Since 2008, an increasing number of techniques have been proposed for the normalization of H&E-stained histological images. Yet, the weaknesses of these methods and the open research possibilities encourage the analysis of new methods that may be applicable in medical settings and promise better performance of computer-aided diagnosis systems.
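The idea of normalizing an image "with respect to a reference image" can be illustrated with a very simple sketch. The code below matches the per-channel mean and standard deviation of an image to those of a reference, working directly in RGB. This is a simplified, Reinhard-style statistic matching chosen for illustration only; it is not the ACD algorithm of [50] or the SCAN method of [51], and the function name `match_color_stats` and the synthetic images are assumptions.

```python
import numpy as np

def match_color_stats(image, reference):
    """Shift and scale each RGB channel of `image` so that its mean and
    standard deviation match those of `reference` (illustrative only)."""
    src = image.astype(np.float64)
    ref = reference.astype(np.float64)
    out = np.empty_like(src)
    for ch in range(3):
        s_mean, s_std = src[..., ch].mean(), src[..., ch].std()
        r_mean, r_std = ref[..., ch].mean(), ref[..., ch].std()
        scale = r_std / s_std if s_std > 0 else 1.0  # avoid division by zero
        out[..., ch] = (src[..., ch] - s_mean) * scale + r_mean
    return np.clip(out, 0, 255).round().astype(np.uint8)

# Synthetic stand-ins for a reference slide and a darker, flatter slide.
rng = np.random.default_rng(0)
reference = rng.integers(80, 180, size=(16, 16, 3)).astype(np.uint8)
image = rng.integers(40, 100, size=(16, 16, 3)).astype(np.uint8)
normalized = match_color_stats(image, reference)
```

After the call, each channel of `normalized` has roughly the same brightness and spread as the reference. Stain-specific methods go further by separating the individual stain components before normalizing them.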

C. PRE-PROCESSING
Images are often degraded by a variety of factors that reduce image quality [53]-[57]. These include the light source, poor contrast, background clutter, blurriness, or any other type of noise [58]-[61]. Pre-processing is the step intended to remove such unwanted entities from the images [62]-[65]. It is a crucial step in a variety of domains, particularly the medical domain [66]-[69]. Thus, it is important to pre-process medical images using a variety of simple and complex image processing and computer vision techniques so as to improve the accuracy of diagnostic systems [70]-[72]. A variety of image pre-processing methodologies have been presented in the literature that pre-process blood images to enhance the region of interest (ROI) used further for the diagnosis of leukemia [73]. These techniques fall into two main categories: noise and artifact removal, and image enhancement. The techniques commonly used in the literature for each category are listed in Figure 4. The most common are histogram equalization, linear contrast stretching, unsharp masking, and filters such as the Gaussian, median, minimum, and Wiener filters. Occasionally, the results of filtering or contrast adjustment on blood images are not satisfactory, and it is better to convert the image to another color space, e.g., from Red, Green, Blue (RGB) to Hue, Saturation, Value (HSV) or Hue, Saturation, Lightness (HSL), for better identification and efficient detection of the region of interest [74]-[76]. An overview of some of the widely used pre-processing techniques is as follows.

1) HISTOGRAM EQUALIZATION
It is among the simplest image processing techniques and enhances the contrast of the image by normalizing its histogram [61], [77]. It is useful for images in which the background and the foreground are both dark or both bright. Thus, it is suitable for improving the contrast between the background and the blood cells [78]. However, it is not a robust technique, since it cannot distinguish noise from true image content and may therefore amplify noise and reduce the image quality.
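As a minimal sketch of this idea, assuming an 8-bit grayscale image stored as a NumPy array, the code below builds a lookup table from the cumulative histogram and applies it to every pixel. The function name `equalize_histogram` and the toy image are illustrative assumptions, not taken from the reviewed studies.

```python
import numpy as np

def equalize_histogram(gray):
    """Equalize an 8-bit grayscale image using its cumulative histogram."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest occupied level
    total = gray.size
    if total == cdf_min:           # constant image: nothing to spread out
        return gray.copy()
    # Map each grey level so that the output CDF is approximately linear.
    lut = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[gray]

# Toy low-contrast image whose intensities are squeezed into 100..131.
low_contrast = (100 + np.arange(64).reshape(8, 8) // 2).astype(np.uint8)
equalized = equalize_histogram(low_contrast)
```

The mapping depends only on how often each grey level occurs, so noisy pixels are stretched together with genuine cell structure, which is the lack of robustness noted above.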

2) LINEAR CONTRAST STRETCHING
It enhances the contrast of a blood image by spreading its intensity values over a wider dynamic range. Thus, it normalizes the intensity values present in an image. It is a simple and useful technique suitable for low contrast images [79]. However, it is highly vulnerable to noise and its effectiveness is reduced if the image contains outliers.
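The following sketch, assuming an 8-bit grayscale NumPy array, shows linear contrast stretching and why outliers matter. The percentile-based limits (`low_pct`, `high_pct`) are an illustrative choice added here to show one common way of reducing the effect of outliers; they are not a method prescribed by the reviewed literature.

```python
import numpy as np

def stretch_contrast(gray, low_pct=0.0, high_pct=100.0):
    """Linearly map the [low_pct, high_pct] percentile range onto 0..255."""
    g = gray.astype(np.float64)
    lo, hi = np.percentile(g, [low_pct, high_pct])
    if hi <= lo:                   # flat image: nothing to stretch
        return gray.copy()
    out = (g - lo) / (hi - lo) * 255.0
    return np.clip(out, 0, 255).round().astype(np.uint8)

# Low-contrast image (100..131) with a single bright outlier pixel.
img = (100 + np.arange(64).reshape(8, 8) // 2).astype(np.uint8)
img[0, 0] = 255
plain = stretch_contrast(img)             # min-max stretch, dominated by the outlier
robust = stretch_contrast(img, 2.0, 98.0) # percentile limits ignore the outlier
```

With plain min-max limits, the single outlier uses up most of the output range and the cell intensities remain compressed, whereas the percentile-based limits restore the full range for the bulk of the pixels.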

3) UNSHARP MASKING
It is a simple method to produce sharp images and remove blurriness from the image. It can be applied to a whole image or onto selective parts depending upon the need and the required adjustment in the image. Yet, this technique is not robust as it is highly sensitive to noise.
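A minimal sketch of unsharp masking is given below, assuming a grayscale NumPy array. A blurred copy is subtracted from the image to isolate fine detail, and a scaled version of that detail is added back. The 3 × 3 binomial blur, the `amount` parameter and the helper names are illustrative assumptions.

```python
import numpy as np

def blur3(gray):
    """3x3 binomial blur: [1, 2, 1] / 4 applied along rows, then columns."""
    p = np.pad(gray.astype(np.float64), 1, mode="edge")
    rows = (p[:, :-2] + 2.0 * p[:, 1:-1] + p[:, 2:]) / 4.0
    return (rows[:-2, :] + 2.0 * rows[1:-1, :] + rows[2:, :]) / 4.0

def unsharp_mask(gray, amount=1.0):
    """Sharpen by adding back the detail removed by the blur."""
    g = gray.astype(np.float64)
    detail = g - blur3(g)          # high-frequency content (edges, but also noise)
    return np.clip(g + amount * detail, 0, 255).round().astype(np.uint8)

# Vertical step edge: left half 100, right half 150.
step = np.full((5, 6), 100, dtype=np.uint8)
step[:, 3:] = 150
sharp = unsharp_mask(step)
```

The edge is exaggerated by an undershoot on its dark side and an overshoot on its bright side, while flat areas are unchanged. Because noise is also high-frequency content, it is amplified in the same way, which explains the sensitivity mentioned above.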

4) GAUSSIAN FILTERING
It is a simple smoothing filter for removing noise from images [80]. However, it is of limited benefit when used alone for noise removal, as it often blurs fine details such as edges and may affect the contrast of the image.
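A compact sketch of Gaussian filtering is shown below, assuming a grayscale NumPy array and a separable implementation (rows first, then columns). The kernel radius of about three standard deviations and the function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about three sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_filter(gray, sigma=1.0):
    """Separable Gaussian smoothing with reflected borders."""
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    p = np.pad(gray.astype(np.float64), r, mode="reflect")
    conv = lambda v: np.convolve(v, k, mode="valid")
    tmp = np.apply_along_axis(conv, 1, p)     # filter along rows
    return np.apply_along_axis(conv, 0, tmp)  # then along columns

# Flat image corrupted by additive Gaussian noise.
rng = np.random.default_rng(0)
noisy = 128.0 + 20.0 * rng.normal(size=(16, 16))
smooth = gaussian_filter(noisy, sigma=1.0)
```

The smoothing reduces the noise variance, but the same averaging also spreads sharp transitions over several pixels, which is why edges appear blurred after filtering.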

5) MEDIAN FILTERING
It is one of the well-known nonlinear filters for removing salt-and-pepper noise. Unlike the Gaussian filter, it preserves the fine details and sharp edges in the image. However, its efficacy is limited to images with a low density of salt-and-pepper noise, and it does not perform well on images with a high percentage of such noise.
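The behaviour described above can be sketched as follows, assuming a grayscale NumPy array and a 3 × 3 window. The function name `median_filter3` and the toy image are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter3(gray):
    """Replace every pixel by the median of its 3x3 neighbourhood."""
    p = np.pad(gray, 1, mode="edge")
    windows = sliding_window_view(p, (3, 3))   # shape (H, W, 3, 3)
    return np.median(windows, axis=(2, 3)).astype(gray.dtype)

# Flat image with one "salt" and one "pepper" pixel.
img = np.full((8, 8), 120, dtype=np.uint8)
img[2, 2] = 255   # salt
img[5, 5] = 0     # pepper
restored = median_filter3(img)
```

Each isolated outlier is outvoted by its eight neighbours and disappears. When more than half of a window is corrupted, the median itself becomes a noise value, which is why the filter degrades at high noise densities.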

6) MINIMUM FILTERING
It is useful for removing salt or the positive outlier noise. It highlights the lighter objects to be easily recognized after the process of segmentation [81], [82]. However, it is not robust enough for highly noisy blood images as it can remove details present in an image.
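A minimal sketch of a 3 × 3 minimum filter is given below, assuming a grayscale NumPy array. The function name and the two toy images are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def minimum_filter3(gray):
    """Replace every pixel by the minimum of its 3x3 neighbourhood."""
    p = np.pad(gray, 1, mode="edge")
    return sliding_window_view(p, (3, 3)).min(axis=(2, 3))

# One image with an isolated bright ("salt") pixel ...
salt = np.full((8, 8), 120, dtype=np.uint8)
salt[2, 2] = 255
# ... and one with an isolated dark pixel.
pepper = np.full((8, 8), 120, dtype=np.uint8)
pepper[5, 5] = 0

without_salt = minimum_filter3(salt)
spread_dark = minimum_filter3(pepper)
```

The bright outlier is removed completely, whereas the single dark pixel grows into a 3 × 3 patch. The second case illustrates how the filter can remove or distort small details in a noisy image.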

7) WIENER FILTERING
It produces a linear estimate of the original image by minimising the mean square error between the estimated and the ideal image. It removes blurriness and additive noise from the image, bringing it closer to the original image.
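The noise-suppression side of this filter can be sketched with a local (adaptive) variant, in which each pixel is pulled towards its neighbourhood mean by an amount that depends on the local variance relative to the noise power. This sketch, assuming a grayscale NumPy array, covers additive noise only and does not perform deblurring; the function names, the window size and the fallback estimate of the noise variance are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_mean(a, size=3):
    """Mean over a size x size neighbourhood with reflected borders."""
    r = size // 2
    p = np.pad(a, r, mode="reflect")
    return sliding_window_view(p, (size, size)).mean(axis=(2, 3))

def adaptive_wiener(gray, size=3, noise_var=None):
    """Pixel-wise Wiener-type estimate for additive noise (no deblurring)."""
    g = gray.astype(np.float64)
    mean = local_mean(g, size)
    var = np.maximum(local_mean(g * g, size) - mean ** 2, 0.0)
    if noise_var is None:
        noise_var = float(var.mean())   # assumption: average local variance ~ noise power
    # Gain is near 0 in flat areas (output ~ local mean) and near 1 at strong edges.
    denom = np.maximum(np.maximum(var, noise_var), 1e-12)
    gain = np.maximum(var - noise_var, 0.0) / denom
    return mean + gain * (g - mean)

rng = np.random.default_rng(0)
clean = np.full((32, 32), 100.0)
noisy = clean + rng.normal(0.0, 10.0, size=clean.shape)
denoised = adaptive_wiener(noisy, size=3, noise_var=100.0)
```

In this toy case the mean square error with respect to the clean image drops after filtering. The result depends on how well the noise variance is known, which is rarely the case for real blood smear images.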
Despite the benefits of the aforementioned techniques, each has shortcomings in one aspect or another and in some cases leads to the loss of image details. Furthermore, they are feasible mainly for low-contrast blood images that do not have a high density of noise. Thus, researchers have emphasized the need for efficient and reliable pre-processing techniques to enhance the image quality for the better diagnosis of leukemia [83].

D. SEGMENTATION
In automated disease diagnosis systems, image segmentation is a main step in analysing bone marrow and blood smear images [84]-[86]. In this step, an image is subdivided into its constituent objects or regions [87], [88]. Leukocyte segmentation separates the cell from its background. This is frequently done by distinguishing the nucleus from the cytoplasm of the cell [89]. It can be conveniently done through a variety of image processing techniques, which are also available in various medical software packages. A variety of image processing techniques have been proposed in the literature that produce a binary image of the leukocytes, i.e., a mask of the original color image [42]. Reference [17] proposed a framework for blood image segmentation that segments the nucleus of the leukocyte and produces good results. Segmentation techniques again fall into two main categories: traditional methods, which were initially used for the automated diagnosis of leukemia, and deep learning methods, the latest trend, which are utilized nowadays to provide better accuracy with less computational time, as shown in Figure 4. These categories are further divided into well-known techniques such as clustering, morphological filtering, contrast stretching, thresholding, color space conversion and watershed segmentation, and various deep learning networks [90], [91]. Reference [14] reviewed various segmentation methods that have been used on bone marrow and peripheral blood smear images for classifying various types of leukemia. It is observed that, during the years 2016-2020, leucocyte segmentation was mostly performed using clustering, particularly K-means clustering, and neural networks. Contrast stretching is found to be the best method for enhancing the nuclei of the WBCs, followed by the application of a morphological filter that averages the diameters of the WBCs.
Various studies have looked for the morphological alterations in the grayscale microscopic blood smear image of the WBC [92]. Some studies have performed segmentation of the peripheral blood smear and bone marrow images based on Neural Networks along with the classification of leukemia [7], [36], [93]- [97]. The common methods of segmentation of peripheral blood smear and bone marrow images that are used in literature for automated detection of leukemia are discussed as follows.

1) THRESHOLDING
Thresholding is a simple yet effective method for image segmentation that is used in a variety of medical studies, particularly in leukemia detection. It converts a grayscale image into a binary image by separating foreground from background regions. If the intensity of an image region is less than the predefined threshold value, the corresponding region of the resultant image is set to zero (black); if the intensity is greater than the threshold value, it is set to one (white). The resultant image is thus a binary image made up of black and white pixels. Foreground and background are not fixed and depend upon the region that is to be extracted. Thresholding is effective for high contrast images and gives good segmentation results [98]. Global thresholding with Otsu's method is extensively used in leukemia studies, but it blurs the edges of the blood cells, so edge-preserving filters are applied after Otsu thresholding [99].
Lymphocyte detection has been extensively performed with Zack's algorithm. This algorithm segments the blood smear image using a threshold value obtained from a line drawn between the lowest and the highest histogram values [32]. This method is therefore called the triangle thresholding method [100].
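The thresholding rule described above can be sketched as follows. This is a minimal illustration, not the implementation of any cited study: the function names `otsu_threshold` and `binarize` are hypothetical, the image is assumed to be an 8-bit grayscale image stored as nested lists, and the threshold is chosen by the standard Otsu criterion of maximizing the between-class variance.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(levels):
        w_bg += hist[t]          # number of pixels with value <= t
        sum_bg += t * hist[t]
        w_fg = total - w_bg      # number of pixels with value > t
        if w_bg == 0 or w_fg == 0:
            continue
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image, threshold):
    """Pixels above the threshold become 1 (white), the rest 0 (black)."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# Toy example: a dark region (top row) and a bright region (bottom row).
image = [[10, 12, 11], [200, 210, 205]]
flat = [p for row in image for p in row]
t = otsu_threshold(flat)
mask = binarize(image, t)
```

Whether the bright or the dark class is treated as foreground depends on the stain and on the region to be extracted, so the comparison in `binarize` may need to be inverted.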

2) REGION GROWING
Region growing is a type of region-based segmentation that selects the region of interest according to predefined criteria. Such criteria can be fine details such as edges, or pixels used as seed points [101]. It is in effect a pixel-based segmentation in which a single pixel is selected initially and, as per the criteria, neighbouring pixels are selected and connected to the region of interest of the initial pixel. It is a widely used method in cancer detection [102].
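A minimal sketch of seeded region growing is given below. The similarity criterion (absolute intensity difference from the seed within a tolerance `tol`), the 4-connectivity, and the name `region_grow` are assumptions chosen for illustration; published methods use a range of other criteria.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a region from `seed` over 4-connected pixels whose intensity
    differs from the seed intensity by at most `tol`."""
    rows, cols = len(image), len(image[0])
    seed_val = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                if abs(image[nr][nc] - seed_val) <= tol:
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask


# Toy example: a dark 2x2 region in the top-left corner of a bright image.
image = [[50, 52, 200],
         [51, 49, 210],
         [205, 198, 202]]
mask = region_grow(image, seed=(0, 0), tol=10)
```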

3) WATERSHED SEGMENTATION
Watershed segmentation is a form of region growing that starts from a marker (initial pixel) and includes the neighbouring pixels based on predefined criteria [103]. It is extensively used for medical images because it extracts closed and connected regions, whereas other segmentation techniques can distort boundaries and produce disconnected regions that then require morphological operations. Since local minima can produce over-segmentation of the blood smear images, variants of watershed segmentation are used. One such variant is marker-controlled watershed segmentation, which uses external and internal markers to segment the region of interest [104].

4) MORPHOLOGICAL SEGMENTATION
Morphological segmentation extracts different components from the blood smear image based on mathematical set theory. The basic idea is to move a predefined structuring element over the image like a sliding window; a specific region is extracted from the image depending on the morphological operation performed. The common morphological operations used in leukemia detection are erosion, dilation, opening, closing, and hole filling [105]. This type of segmentation is effective for region and shape-based feature descriptors, and for the same reason it is used in the segmentation of microscopic blood images [106], [107].
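The sliding-window idea can be illustrated with binary erosion, dilation and opening. The sketch below assumes a 3x3 square structuring element and simply clips the window at the image border; both choices, and the function names, are illustrative assumptions.

```python
def _slide(image, op):
    """Apply `op` (min or max) over a 3x3 window clipped to the image."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(r - 1, r + 2)
                      for cc in range(c - 1, c + 2)
                      if 0 <= rr < rows and 0 <= cc < cols]
            out[r][c] = op(window)
    return out


def erode(image):
    """A pixel stays 1 only if every pixel under the window is 1."""
    return _slide(image, min)


def dilate(image):
    """A pixel becomes 1 if any pixel under the window is 1."""
    return _slide(image, max)


def opening(image):
    """Erosion followed by dilation; removes small isolated components."""
    return dilate(erode(image))


# Toy mask: a 3x3 cell region plus one isolated noise pixel at (0, 4).
mask = [[0, 0, 0, 0, 1],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0]]
cleaned = opening(mask)
```

In this toy case the opening removes the isolated pixel while the larger region is restored by the dilation step.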

5) K-MEANS CLUSTERING
K-means clustering segments the image regions into clusters of similar objects. It is an unsupervised learning technique for unlabelled data. For segmentation, it partitions the pixels of the blood smear image into K clusters so that pixels that are similar are grouped into one cluster. It is an extensively used method for blood image segmentation that extracts the WBCs and blasts from the image [108].
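As a sketch of how K-means can partition pixels, the example below clusters grayscale intensities in one dimension. It assumes K >= 2, initializes the centers evenly between the minimum and maximum intensity so that the result is deterministic, and uses the hypothetical name `kmeans_1d`; studies on color smears typically cluster in a color space instead.

```python
def kmeans_1d(values, k=2, iters=20):
    """Cluster scalar intensities into k groups (k >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: centers spread evenly over the range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


# Toy example: three dark (nucleus-like) and three bright pixels.
pixels = [10, 12, 11, 200, 210, 205]
labels, centers = kmeans_1d(pixels, k=2)
```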

6) FUZZY C-MEANS
Fuzzy C-means is a generalization of K-means clustering that allows each data point to belong to more than one cluster, with a degree of membership in each. This technique can effectively handle outliers and works well for noisy medical images, unlike segmentation by K-means clustering. It gives better results for images that have high illumination or a high percentage of noise, but it is a time-consuming method because of the long calculations [109].
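The soft assignment that distinguishes fuzzy C-means from K-means can be sketched with the standard membership update, in which the membership of a point in cluster i is 1 / sum_k (d_i / d_k)^(2/(m-1)), where d_i is the distance to center i and m > 1 is the fuzzifier. The function name and the scalar (intensity-only) setting are assumptions for illustration.

```python
def fcm_memberships(value, centers, m=2.0):
    """Return the degree of membership of `value` in each cluster."""
    dists = [abs(value - c) for c in centers]
    zeros = [d == 0 for d in dists]
    if any(zeros):
        # The point coincides with one or more centers: full membership there.
        n = sum(zeros)
        return [1.0 / n if z else 0.0 for z in zeros]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists)
            for d_i in dists]


# A pixel of intensity 20 between cluster centers at 10 and 40.
u = fcm_memberships(20, [10, 40])
```

The memberships sum to one, and the pixel belongs mostly, but not exclusively, to the nearer cluster.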
Over the last decade, many studies have performed artificial intelligence (AI) based segmentation of the WBCs. These studies utilize some sort of neural network or other learning model for segmentation, especially for medical images. Some of the extensively used ones are the Convolutional Neural Network (CNN), Support Vector Machine (SVM), Functional Link Artificial Neural Network (FLANN), and Artificial Bee Colony Back-Propagation Neural Network (ABCBPNN). These methods segment the microscopic peripheral blood and bone marrow images and are also used for classification of leukemia cells.

E. FEATURE EXTRACTION
Feature extraction is a phase of prime importance in image processing, pattern recognition and machine learning. It is intended for dimensionality reduction, where large datasets are represented by a distinct set of values that are non-redundant and significant enough to be used for recognition and classification. In the context of leukemia detection, the types of features used in many studies include shape, color, texture, fractal dimension, statistical features, and many others [110], [111]. The key features commonly extracted for automated leukemia diagnosis fall mainly into two categories: intensity and texture features, and morphological features. The main techniques used to extract both feature categories are given in Figure 4. The features extracted at this phase are used in the next phase for leukemia detection and classification. The accuracy of leukemia detection depends strongly upon the features used for classification. Thus, selecting the best and non-redundant features is of utmost importance for the accurate diagnosis of leukemia. The detection of acute lymphoblastic leukemia is highly dependent on the feature extraction phase, as it relies on the characteristics of the nucleus and cytoplasm of the blast cells [110]. A brief overview of the features that have been used in the literature for leukemia detection is presented as follows.

1) INTENSITY AND TEXTURE FEATURES
Texture plays an important role in medical images because it eases the selection of the region of interest (ROI). Color and intensity also play a significant role in ROI selection. In particular, for the identification and classification of blast cells, the texture and intensity features of the blood smear are significant for leukemia detection. The Gray Level Co-occurrence Matrix (GLCM) is a statistical method for feature extraction that examines texture by analysing the spatial relationship between pixels. In the context of leukemia detection, it is a useful method to extract features from the blood smear image based on its texture and intensity and thus classify the blast cells [110]. The Gabor texture feature is another important means of extracting texture features [99]. Local Binary Pattern (LBP) is a fast and efficient texture feature that is widely used for leukemia detection, particularly for the detection of blasts through the variance in illumination [112], [113]. Reference [114] proposed a variant of LBP that is promising for the detection of leukemia cells. Entropy is a texture feature that can be used to measure the randomness of the nucleus in the blood smear image, making it a reliable feature for the detection of acute leukemia [115]. Fractal dimension is an extensively used feature for measuring quantitative information from blood and bone marrow smear images. The fractal geometry of the nucleus of the WBC is measured to analyse the roughness of the cell and to identify it as either blast or normal [116]. The Hausdorff dimension is used with the fractal dimension as a feature to measure the roughness of the nucleus and is widely used for the analysis of microscopic blood smear images [117]. First-order statistical features are important features that are computed from the original pixel values of the blood smear images.
These features are based on the histogram and include mean, standard deviation, energy, entropy, skewness and kurtosis [118]. Since blood cells are darker than the background, color features are extensively used for the classification of blasts. Usually the mean values of different color models are used as features for the classifiers [119], and color features are commonly extracted in the HSV domain.
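To make the GLCM concrete, the sketch below counts how often a pixel of gray level i has a neighbour of gray level j at a fixed offset, normalizes the counts to probabilities, and derives two common descriptors (contrast and energy). The offset of one pixel to the right, the non-symmetric matrix, and the function names are assumptions for illustration; practical pipelines usually combine several offsets and more descriptors.

```python
def glcm(image, levels, dr=0, dc=1):
    """Normalized co-occurrence matrix for the offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast and energy of a normalized co-occurrence matrix."""
    n = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))
    energy = sum(p[i][j] ** 2 for i in range(n) for j in range(n))
    return {"contrast": contrast, "energy": energy}


# Toy image quantized to three gray levels (0, 1, 2).
image = [[0, 0, 1],
         [0, 0, 1],
         [0, 2, 2]]
features = glcm_features(glcm(image, levels=3))
```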

2) MORPHOLOGICAL FEATURES
Morphology is an important factor to be considered in medical image processing. In leukemia detection and classification, the morphology of the blast cells is an important feature since their area, perimeter, and circularity differ from those of normal cells, thus easing the classification of blasts [120].
Shape-based features such as circularity, solidity, eccentricity, area, perimeter, and many others can help in the classification of the blast cells. These features have been used extensively for acute leukemia detection [121]. Bending energy is another shape-based feature that is used for acute leukemia detection, particularly acute lymphoblastic leukemia, through the boundary of the cell and its curvature [122]. Roundness ratio is also an important feature that is extensively used for leukemia detection and WBC counting. It is an efficient feature for the improved classification of leukemia cells and their subtypes because the variance in the circular shape of the blasts makes it distinctive among other features [123]. Chain code features are widely used in acute leukemia detection. These features trace out the nucleus and cytoplasm boundary, thus separating the nucleus and cytoplasm of blasts [96]. Some studies have used morphological and texture features in conjunction, while some have utilized NN based features [105]. Reference [27] used a variety of features for lymphoblast classification, such as Hausdorff dimension, signature contour, shape features that include eccentricity, perimeter, compactness, elongation, form factor, area, and solidity, and color features such as homogeneity, energy, correlation, and entropy. The combination of features produces a better accuracy of lymphoblast classification. Reference [73] utilized the basophilia intensity texture features to characterize the cytoplasmic profile of the leukemia cell. This type of texture feature is obtained by applying thresholding segmentation to the green component of the RGB blood smear image and then counting the pixels of the resulting region. Reference [74] obtained features through the histogram and used the local directional pattern (LDP) to extract features from the nucleus of the leukemia cell. Reference [124] proposed a method for efficient leukemia detection.
Their method utilized Fisher's Discrimination Ratio (FDR) followed by Exhaustive Search to obtain three distinctive features, which are the diameter, the cluster prominence of the nucleus, and the minor axis of the bounding ellipse.
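Several of the shape features named above follow directly from a binary cell mask. The sketch below assumes a simple pixel-based estimate: area as the number of foreground pixels, perimeter as the number of pixel edges that face the background, and circularity as 4*pi*area / perimeter^2 (equal to 1 for an ideal circle). The function names are hypothetical, and contour-based perimeter estimates used in practice give somewhat different values.

```python
import math


def area_perimeter(mask):
    """Area (pixel count) and perimeter (exposed pixel edges) of a mask."""
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if not inside or not mask[nr][nc]:
                    perimeter += 1
    return area, perimeter


def circularity(area, perimeter):
    """4*pi*A / P^2; lower values indicate a less circular shape."""
    return 4 * math.pi * area / perimeter ** 2


# Toy mask: a 4x4 square object inside a 6x6 image.
mask = [[1 if 1 <= r <= 4 and 1 <= c <= 4 else 0 for c in range(6)]
        for r in range(6)]
area, perimeter = area_perimeter(mask)
circ = circularity(area, perimeter)
```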
Since a lot of information is present in the blast cells, a larger feature set for classification might not produce good results. Thus, feature selection should be performed prior to the classification phase to reduce the dimensionality of the feature set and to select non-redundant features. The common algorithms used for feature reduction are Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). Other feature selection methods that have been used in the literature include Fisher's Discrimination Ratio followed by Exhaustive Search [4]. Reference [125] used PCA to effectively reduce the redundancy in the feature set for leukemia classification [110]. Probabilistic Principal Component Analysis (PPCA) and Genetic Algorithm (GA) are also efficient methods for feature reduction [110], [126].
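As an illustration of filter-style feature selection, the sketch below ranks features by the two-class Fisher's Discrimination Ratio, taken here in its common form (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2). The exact variant used in the cited studies is not stated, so this form, the function names, and the toy data are assumptions; the subsequent exhaustive search over feature subsets is not shown.

```python
from statistics import mean, pvariance


def fdr(a, b):
    """Fisher's Discrimination Ratio of one feature for two classes."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


def rank_features(class_a, class_b):
    """Rank feature indices from most to least discriminative."""
    n_features = len(class_a[0])
    scores = [fdr([s[i] for s in class_a], [s[i] for s in class_b])
              for i in range(n_features)]
    order = sorted(range(n_features), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy data: feature 0 separates the classes well, feature 1 does not.
normal = [[1.0, 5.0], [2.0, 6.0], [3.0, 4.0]]
blast = [[11.0, 5.5], [12.0, 4.5], [13.0, 6.5]]
order, scores = rank_features(normal, blast)
```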

F. FEATURE CLASSIFICATION
Classification of leukemia is normally performed through supervised learning, where a model is trained on labelled data and tested on new data that is different from the training data [89]. A classification model for leukemia can be partially or fully automated. The key classifier types that have been used for classification in the literature are also listed in Figure 4. Both machine learning and deep learning models are used for the classification of leukemia cells. The common classifiers used in the literature are Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN), Probabilistic Neural Network (PNN), Naive Bayes (NB), and Adaptive Neurofuzzy Inference System (ANFIS). A brief overview of the different classifiers used for leukemia detection is given as follows.

1) SUPPORT VECTOR MACHINE (SVM)
SVM is one of the most extensively used algorithms for blood cell analysis, leukemia detection and classification. It is mostly used for binary classification, such as normal vs. blast or normal vs. acute leukemia. For subtype classification of any type of leukemia, however, a hybrid approach is used [127], [128].

2) MULTILAYER PERCEPTRON (MLP)
MLP is an artificial neural network model that is simple and has powerful computational capability. Its design is layered, and each layer is connected to the subsequent layer through a network of nodes, which are neurons that map inputs to outputs through an activation function [129].

3) PROBABILISTIC NEURAL NETWORK (PNN)
PNN is an extensively used classifier for leukemia detection and classification [130]. It approximates the probability density function (PDF) of each class of leukemia cell and applies Bayes' theorem to assign the class.

4) K-NEAREST NEIGHBOR (KNN)
KNN is a simple and efficient method that is extensively used for classification and regression. It is a lazy-learning, nonparametric method and has been used to classify acute leukemia [131].
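The lazy, nonparametric nature of KNN is visible in a minimal sketch: there is no training step, and a query is labelled by a majority vote among its k nearest stored samples. Euclidean distance, k = 3, the function name, and the two-dimensional toy features are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (for example, two shape or texture features per cell).
train_x = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]]
train_y = ["normal"] * 3 + ["blast"] * 3

label_a = knn_predict(train_x, train_y, [2, 2])
label_b = knn_predict(train_x, train_y, [8.5, 8.5])
```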

5) RANDOM FOREST (RF)
RF is a classification model that uses ensemble learning to classify an object from the input feature vector. It is an efficient classifier that is less prone to overfitting; it contains a diverse group of trees that vote on the output class, and the class with the maximum number of votes is selected. In the context of leukemia detection and classification, RF efficiently distinguishes between normal and blast cells [132], [133].

6) NAIVE BAYES (NB)
NB is a probabilistic classifier that is efficient and simple. It is based on Bayes' theorem and assumes independence among the feature values. It is extensively used for general leukemia classification and for acute lymphoblastic leukemia detection from microscopic blood images [134], [135].

7) ADAPTIVE NEUROFUZZY INFERENCE SYSTEM (ANFIS)
ANFIS is a hybrid method that combines fuzzy logic and ANN. It is a robust classifier that is extensively used for medical images and is reported to perform better than the RF algorithm even if the feature set is not strong. Thus, it is extensively used for the classification of blast and normal cells from microscopic blood smear images [136], [137].

III. PERFORMANCE MEASURES
This part explains the metrics that a number of researchers use to evaluate the efficiency of the proposed approaches. A system that detects the presence of lymphoblast cells in given images can operate with a variety of module structures, and its classification is judged by whether each cell is correctly classified as blast or non-blast. Accordingly, True Positives (TP) is the number of cells that the test correctly classifies as positive; True Negatives (TN) is the number of cells that the test correctly classifies as negative; False Positives (FP) is the number of cells that the test classifies as positive but that are not; and False Negatives (FN) is the number of cells that the test classifies as negative but that are not. These measures can be calculated as shown in Eq. 1, 2, 3 and 4, respectively, where Seg_img is the segmented image and GT_img is the ground-truth image. These expressions are used to describe how well the system performs. True positive rate (TPR) is the probability that elements with leukemia are correctly classified and can be calculated as shown in Eq. 5. False positive rate (FPR) is the probability that elements without leukemia are incorrectly classified as positive and can be determined as shown in Eq. 6.
Sensitivity (Sn), also called recall, is the fraction of true positive results among all results that should be positive, as shown in Eq. 7. Specificity (Sp) represents the ability to identify background pixels, as shown in Eq. 8. Accuracy reflects the model's ability to make correct predictions and can be calculated using Eq. 9. Precision, also termed PPV (positive predictive value), is the fraction of true positive results among all positive results, as given in Eq. 10.
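In their standard confusion-matrix form, the rates described above can be written as follows. This is a sketch of the textbook definitions, which the numbered equations are assumed to match; sensitivity, recall and TPR are the same quantity.

```latex
\begin{align*}
\mathrm{TPR} = \mathrm{Sn} &= \frac{TP}{TP + FN}, &
\mathrm{FPR} &= \frac{FP}{FP + TN}, \\
\mathrm{Sp} &= \frac{TN}{TN + FP}, &
\mathrm{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}.
\end{align*}
```

With these definitions, FPR = 1 - Sp, which is why the ROC curve can be drawn either against FPR or against 1 - specificity.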
$$\text{Precision/PPV} = \frac{TP}{TP + FP} \qquad (10)$$
The F1-score combines precision and recall to account for the incorrectly classified cases, as given in Eq. 11. The area under the curve (AUC) is used to estimate the performance of a binary classification model, as in [138]. It is the area underneath the receiver operating characteristic curve, which plots sensitivity against 1-specificity, as given in Eq. 12. Classification Error (CE) is the total error of an analyser, calculated as given in Eq. 13. The Jaccard similarity is the ratio of the intersection to the union of the ground truth and the segmented image, as presented in Eq. 14, and the overlapping error can be calculated using Eq. 15, where S_img and G_img represent the segmented region and the ground truth, respectively. A receiver operating characteristic (ROC) curve illustrates the relationship between clinical specificity and sensitivity for every possible cut-off. It is a graph with FPR on the x-axis and TPR on the y-axis. The Matthews Correlation Coefficient (MCC), introduced by the biochemist Brian W. Matthews [139], is used to measure the quality of binary classifiers and can be calculated as depicted in Eq. 16.
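The confusion-matrix measures discussed in this section can be computed from a pair of binary label sequences as sketched below. The formulas are the standard textbook definitions (assumed, not confirmed, to match the numbered equations), the function names are hypothetical, and all denominators are assumed to be non-zero.

```python
import math


def confusion_counts(pred, truth):
    """Count TP, TN, FP and FN for binary predictions (1 = blast)."""
    pairs = list(zip(pred, truth))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Standard measures; assumes every denominator is non-zero."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity / TPR
    return {
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * precision * recall / (precision + recall),
        "mcc": (tp * tn - fp * fn) / math.sqrt(
            (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
        # For segmentation masks: |S and G| / |S or G|.
        "jaccard": tp / (tp + fp + fn),
    }


pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(pred, truth)
m = metrics(*counts)
```

The same functions apply to segmentation if `pred` and `truth` are the flattened segmented and ground-truth masks.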
Pearson Correlation Coefficient (PCC) is an image quality index used to measure the linear correlation between the source image and the normalized image. Its value here lies between 0 and 1, where a result greater than 0 shows that the two images are correlated and a value of 0 indicates that they are dissimilar. PCC can be calculated using Eq. 17, where x_i and y_i denote the source image and the normalized image, and µ_x and µ_y are the means of the source image and the normalized image.
Quaternion Structure Similarity Index Metric (QSSIM) is another image quality metric. Its range lies between 0 and 1, where a value of 0 indicates poor color normalization and a value tending towards 1 indicates that the color normalization method is better. QSSIM can be calculated using Eq. 18, where µ_qref and µ_qnorm denote the means of the sample image and the normalized image, and the standard deviations are denoted by σ_qref and σ_qnorm.
Structural Similarity Index Metric (SSIM) is a perceptual measure of the degradation in image quality. Its value lies between 0 and 1, where a value near zero indicates a poor normalization method and a value near 1 indicates that the color normalization method is better. Three factors are involved in calculating SSIM, i.e., luminance, contrast and structure. The formula for SSIM is given in Eq. 19, where µ_x and µ_y are the means of the original and normalized images, σ_x and σ_y are their standard deviations, and c_1 and c_2 are constants. Reference [140] used the PCC, QSSIM and SSIM image quality metrics to evaluate the performance of its proposed stain normalization method. Absolute error is the error between the true and the normalized image. This error can be evaluated using the formula given in Eq. 20.
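For reference, the commonly used forms of PCC and SSIM are sketched below in the notation of this section. They are the standard definitions and are assumed, not confirmed, to correspond to Eq. 17 and Eq. 19; the covariance term σ_xy between the two images is part of the standard SSIM formula although it is not named in the text above.

```latex
\begin{align*}
\mathrm{PCC} &= \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}
                     {\sqrt{\sum_i (x_i - \mu_x)^2}\,\sqrt{\sum_i (y_i - \mu_y)^2}}, \\[4pt]
\mathrm{SSIM}(x, y) &= \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}
                            {(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}.
\end{align*}
```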

IV. REVIEW OF THE LEUKEMIA DETECTION METHODS FROM PERIPHERAL BONE MARROW AND BLOOD SMEAR IMAGES
This section discusses the research performed on blood diseases, especially leukemia identification and classification.
In this regard, we first discuss traditional image processing methods for automated diagnosis of Leukemia which are followed by a detailed discussion on recent state of the art ML and DL-based frameworks.

A. TRADITIONAL METHODS
These methods employ conventional image processing algorithms for segmentation, feature extraction and classification of Leukemia. A class of methods for ALL and AML leukemia diagnosis employ classical thresholding techniques for segmentation and extraction of morphological and texture features which are subsequently fed to support vector machines (SVM) for classification [32], [141]- [143]. SVM classifies between the normal and blast cells while further classifying them into its sub-types: L1, L2 and L3. A framework for automated diagnosis of ALL employs geometry and KNN for acute lymphocyte classification in peripheral blood microscopic images [144]. Similarly, in [145] geometric, color, statistical and textural features were used to classify between malignant and benign cell using Naive Bayes and K-NN. Here, 60 image samples of blood were used for evaluation that provided accuracy of 92.8% for classification. A system for classification of two acute leukemia types, i.e., AML and ALL, was presented in [146] which employed twelve manually obtained features from images.
Here again, classification was done using K-NN; experiments were run on a dataset of 1500 images, providing an accuracy of 86%. A mobile-cloud-assisted framework for the segmentation and classification of leukocytes into sub-classes was proposed in [147]. This method used the k-means clustering algorithm for segmentation, with morphological operations to remove irrelevant components, and extracted a number of features, i.e., statistical, texture and geometric. Subsequently, an ensemble multi-class SVM was used for classification. The dataset used for evaluation consisted of 1030 images of WBCs, and 98.6% accuracy was obtained.
A system for classification of AML and its subtypes M4, M5 and M7 was introduced by Setiawan et al. [148]. Initially, cell segmentation was performed using a color k-means algorithm. Classification was done on six statistical features using a multi-class SVM. This led to 87% accuracy for segmentation and 92.9% accuracy for classification. A three-layered framework of feature extraction, coding and classification [149] uses blood smear images to identify leukemia and its type. For this purpose, feature extraction was done using the DSI feature transform and classification was performed using a multi-class SVM. Experimental evaluation was done on 400 image samples, providing an accuracy of 79.38% for classification. Another approach, which used CD marker, texture, shape and color as features, K-means clustering for segmentation, and SVM and deep learning for feature extraction and classification, achieved an accuracy of more than 99% [142].

B. DEEP LEARNING BASED METHODS
Many studies have used deep learning techniques for WBC and leukemia segmentation and classification. Reference [95] suggested a pipeline that employs deep learning for WBC classification. A number of pretrained networks such as Overfeat-Net, AlexNet and VGG were evaluated for feature extraction; additionally, a novel architecture was trained from scratch and tested. The maximum accuracy obtained was 96.1%, using the new architecture. The dataset used contains 2551 images. Reference [150] suggested a CAD system comprising two major steps, i.e., recognition and classification. For recognition, SSD [151] is applied, whereas for classification multiple pretrained networks were analysed. The network architectures applied were VGG, AlexNet [152], ResNet and GoogleNet. The maximum average accuracy attained was 97%, for AlexNet, when evaluated on a dataset of 7500 images. A DCLNN architecture is recommended in [153]. This architecture is quite simple in comparison to the available pretrained architectures and is also trained from scratch. It attained an average classification accuracy of 88% when tested on a dataset of 13,000 images. In a recent study [154], a CAD system is recommended for WBC classification using capsule networks. This network architecture mainly consists of an encoder and a decoder. The encoder encodes the image into a 16-dimensional vector containing the information required to depict the image, and the decoder learns to decode the instantiation parameters into an image of the item it identifies. Euclidean distance is used in the decoder as the loss function to measure the resemblance between the reconstructed features and the trained features. The key advantage of capsule networks is that they produce reconstructed images that can be inspected visually. With this network architecture, the average accuracy obtained was 92.5% for 263 images.
Reference [155] proposed a deep learning network architecture for WBC and RBC segmentation. The recommended architecture is SegNet [156], which employs an encoder and decoder pair to create feature maps for segmentation. Using this architecture, segmentation of 42 images was performed, and the mean intersection over union achieved was 79%. Yu et al. [157] offered a WBC classification method employing multiple deep learning networks. The proposed system was assessed on a dataset of 2000 microscopic images of 7 WBC types and compared to various conventional techniques. The average accuracy obtained was 88.5%, which supports the advantage of employing a CNN. Shafique and Tehsin [30] proposed a CNN based framework for the diagnosis of ALL subtypes and achieved 99% accuracy for binary classification (healthy vs. ALL cells) and 96% accuracy for the diagnosis of ALL subtypes. Another CNN based framework for ALL subtype classification was proposed by Thanh et al. [158] and achieved 96.6% accuracy. Nizar et al. have proposed a CNN based framework that is capable of diagnosing all subtypes of leukemia and performs better than other ML based binary leukemia classification approaches [7].
References [159] and [160] have proposed a DL based framework for the classification of ALL and its subtypes using bone marrow imagery. Their proposed framework achieves a reliable accuracy of 97.78% and can be a viable technique for pathologists. An efficient NN based method has been proposed by Kumar et al. for the identification of ALL and its subtypes [22]. The method uses the Artificial Bee Colony (ABC) algorithm to train a Back Propagation Neural Network (BPNN). Initially, Principal Component Analysis (PCA) is used for dimensionality reduction of the leukemia dataset. Then the optimum feature set is obtained by the ABC algorithm, followed by the classification phase. Their results suggest that the ABC-BPNN system based on PCA is more accurate than the genetic algorithm based BPNN (GA-BPNN), with an achieved accuracy of 98.72%. However, their proposed framework for ALL and subtype classification was tested on a small dataset.
Vogado et al. [161] presented a diagnosis system for leukemia that was evaluated on 377 images. The proposed system uses a CNN and transfer learning to extract discriminant features. Feature selection then employed information gain. Lastly, classification was performed by an SVM classifier. A classification accuracy of approximately 99% was obtained with three heterogeneous datasets. Zhao et al. [162] offered a detection and classification system for WBCs. A CNN was used to extract features, and RF and SVM classifiers were used together to classify the WBCs. The suggested system was assessed on datasets, and the average classification accuracy obtained was 92.8%. Habibzadeh et al. [163] proposed a system for WBC classification based on both deep learning and transfer learning. The method starts with a pre-processing step and then applies transfer learning to extract features. Lastly, Inception and ResNet were used for classification. A total of 1244 images of WBCs were used to evaluate the performance of the proposed method. An accuracy of 99.84% was achieved in the best case. Lin et al. [164] proposed a system for the classification of leukocytes. Initially, an advanced technique based on the k-means algorithm was used to extract complex leukocytes. A CNN model then performed classification. The method was assessed on a dataset of 368 images, and the classification accuracy achieved was 98.96%.
Rehman et al. [105] suggested a system for the classification of ALL and its types. Initially, the ROI of lymphoblasts from bone marrow aspirates was segmented with a thresholding method. For classification, a CNN, i.e., AlexNet, was used. A dataset of 330 images was used for assessment. The classification accuracy achieved was 97.78%. Wang et al. [165] offered a technique for the detection and classification of WBCs using an ensemble classifier in which the outputs of multiple CNNs are fused. The average classification accuracy produced was 99.37%, using a dataset of 3000 images for each category.
The advancement in deep neural networks impacted in evolution of another type of automated leukemia diagnosis methods labelled as end-to-end learning-based methods. The main concept behind this type is to construct and develop a DNN which gets input microscopic blood smear image and returns the output category of the image instead of undergoing the multiple challenging steps followed normally in traditional methods like ROI (region of interest) for segmentation and feature extraction. Table 3 gives an overview of the methods that have performed WBC segmentation. It summarizes the major work in leukocytes segregation, counting, blood smear analysis and highlights the studies that have performed automated detection of leukemia and have utilized some kind of image processing, computer vision, machine learning or deep learning framework. Table 4 provides survey of the studies that are carried out on ALL and its sub-types classification using image processing and Table 5 depicts a survey about work carried out for WBCs and ALL diagnosis using Deep Learning methods. Table 6 summarizes the studies that have proposed      methods for ALL and AML classification from bone marrow and peripheral blood smear images. Table 7 summarizes the studies that have progressively performed AML diagnosis and classification using computer vision, image processing, ML and DL frameworks. It is clearly observed that not many studies have performed myeloblast detection and VOLUME 9, 2021 classification. References [166] and [79] have the highest accuracy of myeloid classification using Hausdorff dimension, color, texture, shape and spatial features, K-means clustering, and Hidden Markov random field based segmentation. It is also observed that either SVM, or NN are used for classification of the myeloid leukemia. 
Table 8 summarizes the studies that have performed leukemia detection in general or any other type of leukemia using the image processing, computer vision, machine learning or deep learning approach. Reference [167] proposed framework for diagnosing types of leukemia whereas [42], [46], [168]- [173], did general leukemia classification. Table 9 compare and analysis the accuracies of different methods for Segmentation and classification of WBCs and types of Leukemia (ALL, AML and CLL) using image processing, ML, DL approaches along with their Computational time. For WBC segmentation [174] provides 97.4% accuracy with 2.4 to 2.6 msec computational time per leukocyte, which is good. For ALL segmentation and Classification [110] technique provides better accuracy of almost 99% with computational time of just 0.346 sec (per Lymphocyte) but [118] gives less accuracy i.e. 97% with almost same computational time. For AML computational time with full feature set is 104 sec and with reduced feature it is reduced to 98 sec and providing 95% accuracy [175].As compared to acute Leukemias little work has been done in past for Chronic Leukemia.
Although various studies have performed WBC analysis, segmentation, counting, blood smear analysis, and acute leukemia identification and classification, only a handful of studies have addressed chronic leukemia diagnosis. In the literature, Alférez et al. [176] classified CLL and hairy cell leukemia (HCL) using texture, geometry and granulometric features with Fuzzy C-means (FCM).

V. DISCUSSION
The number of automated methods for leukemia diagnosis using microscopic blood smear images has clearly increased over the last decade. Many studies have proposed automated detection of the acute types of leukemia (ALL and AML), but just a few research studies have proposed frameworks for the diagnosis of the chronic types of leukemia (CML and CLL) using image processing methods [14]. Although numerous studies have been presented for the automatic diagnosis of acute lymphoblastic leukemia, fewer studies have proposed methods to diagnose the subtypes of ALL. In the case of AML classification, little work has been performed [10], while no study was found for CML detection. Classifying leukemia into subtypes is a difficult task, and many studies have only performed binary classification, such as any subtype of leukemia versus healthy leukocytes [21], [28], [29], [158], [177]-[179]. Only a handful of studies have proposed a framework that diagnoses all types of leukemia with acceptable accuracy. After reviewing the literature, it has been found that 79% of the studies have utilized private data obtained from either nearby hospitals or pathologists at local laboratories. 17% of the studies have utilized publicly available datasets such as ASH and ALL-IDB in their methodologies. 14% of the studies did not mention the source of the dataset, and 32% of the studies have used SVM for classification of leukemia cells [14]. Since automated frameworks require a large repository of data to train the model, various data augmentation methods have been suggested in the literature to make the data large enough for training. Most of them include image rotation, translation, shearing, blurring, mirroring, gray scale image transformation and histogram equalization [30].
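Three of the augmentation operations listed above (mirroring, rotation and histogram equalization) can be illustrated with a minimal sketch. It assumes a grayscale image stored as a nested list of 8-bit intensities; the function names are hypothetical and are not taken from any of the reviewed studies, which typically rely on dedicated image processing libraries.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 intensities (assumed layout)


def mirror(img: Image) -> Image:
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def equalize(img: Image, levels: int = 256) -> Image:
    """Histogram equalization: map each intensity through the normalized CDF."""
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in img]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the original plus mirrored, rotated (90/180/270) and equalized variants."""
    variants = [img, mirror(img)]
    rotated = img
    for _ in range(3):
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(equalize(img))
    return variants
```

Each input image yields six training samples in this sketch. Translation, shearing and blurring are omitted for brevity; they follow the same pattern of producing a label-preserving variant of the original image.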
Patel and Mishra [177] proposed an automated approach for the diagnosis of acute leukemia from microscopic blood smear images and suggested methods such as K-state grouping, Zack algorithm, and histogram alignment for filtering the blood smear images. Their proposed method achieves an accuracy of 93.57% using SVM and 97.78% using a deep convolutional neural network. Rehman et al. [105] performed classification of ALL into its subtypes from peripheral bone marrow smear images. Their proposed framework converts the smear image to the HSV domain and operates on the saturation component to obtain lymphoblasts, followed by thresholding and hole filling. They used a convolutional neural network to extract features and classify ALL into subtypes. Their proposed framework works better than ML based approaches and achieves an accuracy of 97.78%. An efficient neural network based method has been presented by Kumar et al. for the classification of ALL into its subtypes [22]. Leukemia is detected using the Artificial Bee Colony (ABC) algorithm to train a back-propagation neural network (BPNN). Initially, Principal Component Analysis (PCA) is used for dimensionality reduction of the leukemia dataset. Then the optimum feature set is obtained by the ABC algorithm, followed by the classification phase. Their results suggest that the ABC based BPNN (ABC-BPNN) system with PCA is more accurate than the genetic algorithm based BPNN (GA-BPNN), with an achieved accuracy of 98.72%. However, their proposed framework for ALL and subtype classification was tested on a small dataset. Zhang et al. [180] suggested an NN based framework for leukocyte classification that achieves an accuracy of 94.57% using CNN and HOG features, tested on a larger dataset of 5000 images obtained from a local hospital. They also tested CNN features classified with an SVM classifier and achieved an accuracy of 94.23%, which is not substantially different from 94.57%.
Thus, their results suggest that the hybrid neural network framework achieves greater accuracy of leukocyte classification than utilizing only HOG features, which achieved only 85% accuracy. However, their outcomes were not compared with other state of the art techniques. Many studies have proposed methods to identify and segment blast cells.
Of the various segmentation methods, the marker-based watershed technique is found to be the best, segmenting overlapping blast cells with an accuracy of 96.29% [110]. Some studies have utilized neural networks (NN) for the segmentation of peripheral blood smear and bone marrow images along with the classification of leukemia [7], [36], [93]-[97]. Additionally, many other segmentation techniques such as HSV color based, thresholding, watershed, and clustering based methods have also been used in the past, but these methods lack efficiency due to variation in contrast, light and noise. Thus, more accurate methods are needed to segment blast cells from peripheral blood smear images [83]. Rawat et al. [41] improved the accuracy of blast cell classification to 99.517% using the histogram of the green component of the RGB image for pre-processing of blood smear images and K-means clustering for segmentation of blast cells, followed by the extraction of geometric, statistical and texture features and classification by ANN. In the classification phase, researchers have applied different techniques to obtain accurate results, of which ANN and SVM were the most accurate. Moreover, the detection phase depends on the previous stages (segmentation and feature extraction). The studies that used Support Vector Machine (SVM) have reported higher accuracy compared to other classifiers. However, other studies found that a Fuzzy logic-based classifier outperformed SVM in terms of accuracy. Therefore, it is indispensable to develop an effective classifier to detect blast cells from peripheral blood smear images. Makem and Tiedeu [208] proposed an algorithm based on adaptive fusion that fuses color components obtained from the CMYK and HSV color spaces to extract the nuclei of WBCs without applying morphological operators.
Their method addresses the problem of contrast, color and brightness variation and does not require the application of morphological operations. Srisukkham et al. [200] proposed a framework for ALL classification that achieved an accuracy of 94.94%. Their proposed method subdivides the blood smear images and pre-processes them, followed by marker-controlled watershed segmentation to segment the blast cells. Their proposed framework obtains features using Particle Swarm Optimization (PSO), Cuckoo Search (CS) and Dragonfly Algorithm (DA), and classification is performed using SVM for detection and classification of blast cells.
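Several of the methods discussed above start from a color-space conversion, for example the saturation component of HSV or the components of CMYK. The following sketch shows these conversions for a single RGB pixel using Python's standard `colorsys` module and the common naive RGB-to-CMYK formula. The fixed-weight `nucleus_score` is a hypothetical illustration only and is not the adaptive fusion rule of [208]; it assumes that stained nuclei show higher saturation and magenta content than the background.

```python
import colorsys
from typing import Tuple


def rgb_to_cmyk(r: int, g: int, b: int) -> Tuple[float, float, float, float]:
    """Convert 8-bit RGB to CMYK in [0, 1] with the naive (profile-free) formula."""
    rf, gf, bf = r / 255.0, g / 255.0, b / 255.0
    k = 1.0 - max(rf, gf, bf)
    if k == 1.0:  # pure black: C, M, Y are undefined, use 0 by convention
        return 0.0, 0.0, 0.0, 1.0
    c = (1.0 - rf - k) / (1.0 - k)
    m = (1.0 - gf - k) / (1.0 - k)
    y = (1.0 - bf - k) / (1.0 - k)
    return c, m, y, k


def saturation(r: int, g: int, b: int) -> float:
    """Return the S component of the HSV representation of an 8-bit RGB pixel."""
    _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return s


def nucleus_score(r: int, g: int, b: int, weight: float = 0.5) -> float:
    """Hypothetical fixed-weight blend of HSV saturation and CMYK magenta.

    Assumption: higher values indicate a pixel more likely to belong to a
    stained nucleus. The weight is illustrative, not taken from the literature.
    """
    _, m, _, _ = rgb_to_cmyk(r, g, b)
    return weight * saturation(r, g, b) + (1.0 - weight) * m
```

Applying `nucleus_score` to every pixel gives a single-channel map that can then be thresholded or clustered; an adaptive scheme would choose the weighting per image rather than fixing it in advance.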
A review of the literature shows that in studies related to blood smear analysis, such as anaemia analysis, thresholding is the most common method of segmentation. In a number of research studies that have performed leukemia diagnosis and classification, clustering-based segmentation, i.e. K-means and Fuzzy C-means clustering, is the most extensively used method for the segmentation of blast cells, achieving an accuracy of more than 95% [4]. Watershed segmentation and thresholding are two other widely used methods in the literature. Thresholding, watershed segmentation and clustering are the methods extensively used in studies related to WBC analysis, i.e. identification, segmentation, and counting. References [32], [74], [105], [141], [144], [177], [197], [209], and [210] have detected acute leukemia using thresholding-based segmentation methods. Otsu's thresholding is found to be a better segmentation algorithm than Zack's algorithm for acute leukemia classification [92].
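To make the thresholding step concrete, the following is a minimal sketch of Otsu's method, which selects the gray level that maximizes the between-class variance of the intensity histogram. It assumes 8-bit grayscale input; the helper `binarize` additionally assumes that stained nuclei are darker than the background, which depends on the staining and on the channel being thresholded.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance.

    Pixels with intensity <= t form one class, the remaining pixels the other.
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the lower class
    sum0 = 0  # sum of intensities in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(img: List[List[int]], t: int) -> List[List[int]]:
    """Mark dark pixels (<= t) as foreground, assuming nuclei are darker than background."""
    return [[1 if v <= t else 0 for v in row] for row in img]
```

A typical use is to flatten the image, compute `t = otsu_threshold(flat_pixels)`, and pass the result to `binarize` to obtain a nucleus mask for subsequent feature extraction.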
For leukemia detection, morphological features, which take into account the shape of the blast cell, have been extensively utilized. Other extensively used features are fractal dimension, Hausdorff dimension, and color and texture based features. Since a lot of information is present in the blast cells, using a larger feature set for classification might not produce good results. Thus, feature selection should be performed prior to the classification phase to reduce the dimensionality of the feature set and to select non-redundant features. The common algorithms used for feature reduction are Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA). Other feature selection methods used in the literature include Fisher's Discrimination Ratio followed by Exhaustive Search to select the best feature set [4]. For the classification of leukemia cells, various ML and DL based frameworks have been proposed, but among them ANN and SVM are found to be the most accurate. The most extensively used classifier for blast cell classification is SVM, followed by MLP and ensemble methods. However, the identification and classification phase is highly dependent on the segmentation outcomes and the features extracted from the peripheral blood smear images [89]. The studies that have analysed multiple classifiers have found SVM to be highly accurate for ALL classification, whereas other studies have found that the accuracy obtained from a Fuzzy logic-based classifier is greater than that of SVM. However, in a few cases, MLP has performed better than the other algorithms for leukemia classification [4]. A study by Jakkrich et al. achieved an accuracy of 99% using SVM and deep learning algorithms [142]. Moreover, neural network based frameworks have also performed better for ALL classification, achieving an accuracy of 97.78%, which is higher than that of the ML algorithms [105].
Therefore, there is still a need to develop an effective classifier that is not prone to overfitting or underfitting, can detect blast cells from peripheral blood smears, and addresses the challenges and complexities of blast classification.
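The feature ranking step based on Fisher's Discrimination Ratio mentioned above can be sketched as follows. For a single feature and two classes, the ratio is commonly defined as the squared difference of the class means divided by the sum of the class variances. The sketch assumes a two-class problem (for example, blast versus normal cells) with each sample given as a list of feature values; the function names are hypothetical, and the subsequent exhaustive search over feature subsets is not shown.

```python
from statistics import mean, pvariance
from typing import List, Sequence


def fisher_ratio(class_a: Sequence[float], class_b: Sequence[float]) -> float:
    """Fisher's ratio for one feature: (mu_a - mu_b)^2 / (var_a + var_b)."""
    num = (mean(class_a) - mean(class_b)) ** 2
    den = pvariance(class_a) + pvariance(class_b)
    if den == 0:
        # Both classes are constant: perfectly separable if the means differ.
        return float("inf") if num > 0 else 0.0
    return num / den


def rank_features(samples_a: Sequence[Sequence[float]],
                  samples_b: Sequence[Sequence[float]]) -> List[int]:
    """Return feature indices ordered from most to least discriminative."""
    n_features = len(samples_a[0])
    scores = []
    for j in range(n_features):
        col_a = [row[j] for row in samples_a]
        col_b = [row[j] for row in samples_b]
        scores.append(fisher_ratio(col_a, col_b))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Keeping only the top-ranked indices from `rank_features` reduces the feature set before classification. Because the ratio scores each feature independently, it does not by itself detect redundancy between features, which is one reason a subset search is applied afterwards.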
In future work, two or more segmentation techniques can be combined to segment acute leukemia cells. Feature selection and dimensionality reduction techniques should be applied to achieve greater efficiency in leukemia classification [94]. The available studies that performed microscopic blood smear image analysis using various image processing and computer vision techniques have carried out their analysis using the Image Processing Toolbox in MATLAB, which is not familiar software for many medical professionals and pathologists. Thus, there is a need to develop specialized and user friendly medical software that can be readily used by medical professionals.

VI. CONCLUSION AND FUTURE WORK
Leukemia is a fatal disease that weakens the immune system by affecting the WBCs and bone marrow. This review paper has critically analysed dozens of studies conducted in the past. It has presented a survey of techniques used in the automatic diagnosis of leukemia. Image processing, computer vision, ML and DL based frameworks have been surveyed, analysed and compared in terms of datasets, pre-processing, segmentation, features, classification algorithms and classification accuracy. CBC, blood smear examination under a microscope and bone marrow aspiration have been the conventional manual methods for the detection of leukemia. However, these methods have shortcomings, and because of the ease provided by technology, dozens of automated methods for CAD systems have been proposed for leukemia diagnosis. These automated image processing and computer vision-based methods can diagnose leukemia readily and accurately. Thus, the patient can be given proper treatment in due time and the patient's life can be saved. Some of the studies presented in the past report a very high accuracy of leukemia classification. Thus, CAD systems have the potential to replace the manual diagnosis of leukemia. However, the reviewed literature shows that many studies have proposed frameworks for detecting acute leukemia and its subtypes, but just a few studies have proposed methodologies for detecting chronic leukemia or all leukemia subtypes. Hence, more studies should be conducted to develop methodologies that can detect all subtypes of leukemia, thus helping pathologists to diagnose leukemia efficiently and accurately. Furthermore, pre-processing and segmentation are important stages in leukemia detection and categorization, and the accuracy of classifiers depends heavily upon them.
Thus, researchers should propose more efficient pre-processing and segmentation algorithms that work better for high-density noisy images and segment the blast cells more efficiently, thereby helping to achieve a higher accuracy of leukemia classification. Deep learning frameworks have been used by researchers for the automated diagnosis of leukemia, but a common challenge with these frameworks is obtaining large datasets, which are difficult to get. So, large datasets must be created to make optimal use of ML and DL frameworks. If the available data is limited, then methods must be proposed that give promising results with less data while avoiding overfitting. Lastly, to implement meaningful ML and DL algorithms, ML and DL tools need to be available, perform well technically, and be transparent and explainable to both providers and patients.
The review presented here is reliable and comprehensive and has attempted to cover all aspects of the automated detection of leukemia. However, the timeliness of this study should be regarded as one of its limitations, as research in this domain is increasing rapidly and a variety of techniques have been proposed across its many aspects.