Exploiting the Multi-Scale Information Fusion Capabilities for Aiding the Leukemia Diagnosis Through White Blood Cells Segmentation

Leukemia is one of the most lethal types of blood cancer, and many people suffer from it every year. White blood cells (WBCs) have a significant association with leukemia diagnosis. Research studies have reported that leukemia brings changes in WBC count and morphology. Accurate WBC segmentation enables the detection of morphology and WBC count, which consequently helps in the diagnosis and prognosis of leukemia. Manual WBC assessment methods are tedious, subjective, and less accurate. To overcome these problems, we propose a multi-scale information fusion network (MIF-Net) for WBC segmentation. MIF-Net is a shallow architecture with internal and external spatial information fusion mechanisms. In WBC images, the cytoplasm has low contrast with the background, whereas the nucleus shape can be complex with an indistinctive boundary in some cases; therefore, accurate segmentation becomes challenging. Spatial features in the initial layers of the network include fine boundary information, and MIF-Net splits and propagates this boundary information at multiple scales for external information fusion. Multi-scale information fusion in our network helps in preserving boundary information and contributes to segmentation performance improvement. MIF-Net also uses internal information fusion after intervals for feature empowerment in different stages of the network. We evaluated our network on four publicly available datasets and achieved state-of-the-art segmentation performance. In addition, the proposed architecture exhibits superior computational efficiency by using only 2.67 million trainable parameters.

Manual assessment of WBCs is a tedious, time-consuming, error-prone, and subjective procedure. Hence, manual procedures need to be replaced by, or assisted with, artificial intelligence-based automated systems. Advances in computer-assisted diagnosis have a potential impact on the diagnostic industry [4]. Specifically, deep learning has played a vital role in various disease detection systems [5]. Computer-assisted diagnosis is a necessity for modern diagnostic systems. Leukemia diagnosis is significantly associated with WBCs. Patients with leukemia exhibit changes in WBC count and morphology. Traditional diagnosis is based on manual assessment and analysis of WBCs. There is an improvement gap for accurate and early detection of leukemia, which can best be filled with deep learning-based automatic and robust diagnosis frameworks. WBCs are normally classified into five main types: eosinophils, neutrophils, lymphocytes, basophils, and monocytes [4]. WBC segmentation is challenging because the cytoplasm is in low contrast with the background, while the nucleus in some cases has an indistinctive boundary with a complex shape. Most of the existing WBC segmentation methods do not consider joint segmentation of the cytoplasm and nucleus. Moreover, some methods perform joint segmentation, but their segmentation performance is not up to the mark. Lastly, many existing networks do not have good computational efficiency and therefore require a large number of trainable parameters. In WBC segmentation, along with problems from the medical point of view, we also discuss the challenges from a computer vision point of view in the discussion section. To address these problems, we developed a WBC segmentation network capable of performing joint segmentation of the cytoplasm and nucleus with good computational efficiency. We used multi-scale information fusion for preserving and propagating object boundary information.
We also used image information fusion for improved feature extraction and enhanced learning. Information fusion from different stages of the network helps in thorough learning of image features and results in accurate predictions. We improved segmentation performance using novel techniques without compromising computational efficiency. We evaluated our architecture on four publicly available datasets and exhibited state-of-the-art segmentation performance. In Figure 1, sample images with their corresponding ground-truth images are shown. Leukemia brings changes to WBC count and morphology [6], whereas morphological detection is mainly based on accurate boundary predictions for the desired classes. In addition, there are a few statistical analyses based on the area of the cytoplasm and nuclei, which consequently aid in leukemia diagnosis. Our main contributions in this study can be summarized as follows.
-We developed a novel network, namely the multi-scale information fusion network (MIF-Net), for joint segmentation of the cytoplasm and nucleus to highlight WBC morphology and count for aiding leukemia diagnosis. MIF-Net possesses a shallow architecture with internal information fusion after intervals, which improves the segmentation performance through feature empowerment.
-MIF-Net splits low-level multi-scale fine boundary information from the initial layer and propagates it to deeper stages of the network for external information fusion. This helps in accurate boundary predictions for both the cytoplasm and nucleus.
-The proposed architecture is evaluated on four publicly available WBC datasets and achieves state-of-the-art segmentation performance. MIF-Net uses only 2.67 million trainable parameters, which confirms its promising computational efficiency.
The remainder of this paper is organized as follows. Previous studies related to WBC segmentation are discussed in Section II. The proposed method is explained in Section III. The experimental results along with implementation details are provided in Section IV. Finally, the conclusion of this study is presented in Section V.

II. RELATED WORKS
WBC analysis is a topic of vast interest among medical experts because of its clinical significance in the diagnosis of many critical diseases. WBC segmentation methods can be mainly divided into two categories: WBC segmentation based on handcrafted features and WBC segmentation based on deep features.

A. WBC segmentation based on handcrafted features.
Handcrafted-feature-based methods are generally based on conventional image processing schemes. In one of the methods, a color-band-thresholding scheme was employed for WBC segmentation, counting, and analysis [7]. This study concluded that color space components enable a computer-aided system to achieve the highest segmentation performance [7]. In addition, it also reported that nuclei-based detection exhibits better performance than cytoplasm-based detection for counting purposes [7]. However, the proposed assessment is limited to a single resolution and magnification factor, and tuning of a constant multiplier is required for other resolutions and magnification factors [7]. Another study used hue, saturation, and value (HSV) components to improve the counting results in blood smear images [8]. According to this study [8], HSV components help in improving the accuracies using area features and eccentricity. The presented work eliminates the need for unnecessary preprocessing and provides a simple and fast method for blood smear image analysis [8]. However, this method was evaluated using only a few images [8]. Similarly, another method proposed adaptive histogram thresholding for leukocyte localization [9]. First, the nuclei are extracted, followed by background removal using a combination of image components and adaptive histogram thresholding. Later, the complete cell is extracted to obtain the cytoplasm region through a subtraction operation [9]. This method was evaluated on a single dataset [9]. In [10], a clustering-based prediction method is proposed for discriminating infected cells from noninfected cells. This method refers to a hybrid clustering approach in which rough k-means clustering is combined with rough soft covering clusters [10]. The method proposed in this study is capable of handling uncertainties by applying soft covering rough approximations [10].
The proposed method requires preprocessing, and its processing time is expected to increase while working with multiple color images [10]. Subsequently, geometry and sparsity constraints were used to perform segmentation of the nuclei and cytoplasm in leukocytes [11]. In this approach, accurate cell detection is achieved with the help of a fitting technique, and effective information is preserved for better results [11]. This method was evaluated on datasets with single-cell leukocyte images [11]. In another method, WBC assessment and segmentation were carried out using entropy-based procedures [12]. To improve the segmentation performance, a bi-level threshold is used in this approach [12]. Moreover, experiments were performed on a single well-known leukocyte dataset for evaluation of the method [12]. In another study, the nucleus and cytoplasm of WBCs were segmented using image texture and color-based enhancements [13]. In this method, cytoplasm highlighting and elimination of undesired information are attained using enhancements and the discrete wavelet transform [13]. This method requires contrast stretching as preprocessing to produce the desired results [13]. In another method, leukocyte segmentation is performed using K-means clustering [14]. Initially, the RGB image is converted and fed to K-means clustering for extraction of the nuclei. Finally, the nuclei and background differences are taken from the main image to find the cytoplasm [14]. The proposed method was evaluated on a single dataset [14].

B. WBC segmentation based on deep features.
Deep learning has brought the conventional diagnostic industry to the verge of fast, automatic, and intelligent diagnostics. Deep learning-based segmentation approaches are popular because of their robustness and high segmentation performance. Deep-feature-based methods are widely used in many computer-assisted diagnosis applications [4]. The study in [4] refers to CNN-based algorithms that use transfer learning for WBC segmentation. However, this transfer learning-based approach had performance limitations, especially for small cells [4]. In [15], some well-known encoder-decoder-based segmentation architectures were employed for the segmentation of WBCs. The proposed method employs VGG16 as an encoder, and its features are used to improve the segmentation performance [15]. This method was evaluated on a single dataset for segmentation [15]. In another study, a convolutional neural network (CNN) based on an encoder-decoder structure was used for WBC segmentation [16]. Multi-scale features were fed to the encoder, whereas at the decoder end, features were attained with the help of a context-aware feature map decoder to reconstruct the segmentation mask for predictions [16]. The proposed architecture is based on the famous U-Net architecture [16]. Similarly, WBC multi-scale features were extracted with the help of a context-aware encoder in a CNN [17]. A residual architecture performed feature refinement, and predictions were made with segmentation mask generation [17]. The network in this method is based on the famous U-Net architecture with the addition of a feature refinement module [17]. Subsequently, in another method, initial segmentation was performed using an unsupervised technique [1]. In the first part, the foreground region of the image is extracted using K-means clustering. In the second part, a support vector machine (SVM) is trained using the first-part segmentation as labels.
Finally, the SVM provides pixel-wise classification for further improvement in performance [1]. However, this method was tested on two datasets with no small-sized cells [1]. In another study, leukemia is diagnosed by initially performing segmentation and finally classifying the diseased and nondiseased cells [18]. In this method, the fuzzy C-means algorithm is combined with active contour output using a hybrid mutual information model for segmentation [18]. This framework needs preprocessing for predictions [18]. Subsequently, WBC segmentation was performed using multi-spectral imaging techniques [19]. An SVM is directly applied to each pixel for pixel-wise segmentation, whereas feature selection was conducted through sequential minimal optimization [19]. The feature vector is formed using the intensity of each pixel [19]. Similarly, in [20], an algorithm for the recognition of the five main types of WBCs is presented. Cytoplasm and nucleus segmentation is performed using the snake algorithm and Gram-Schmidt orthogonalization. After feature extraction, an SVM and an artificial neural network are employed for classification [20]. The proposed method has a limitation of generating multiple contours in the case of multiple separate nuclei regions in the same cell [20].

III. PROPOSED METHOD

A. Proposed method overview
The count and morphology of WBCs are considered key biomarkers for the diagnosis of leukemia. Therefore, we developed a novel architecture for the segmentation of the cytoplasm and nucleus in WBC microscopic images. The low contrast of the cytoplasm and the indistinctive boundary of the nucleus in a few images can make WBC segmentation challenging. We developed a multi-scale information fusion-based architecture to deal with these challenges. Features extracted in the initial layers of any network carry potentially useful low-level spatial information. An overview of the proposed framework is shown in Figure 2. WBC images are provided to the spatial information splitter as input. This spatial information includes fine boundary information, which is split into multiple scales using the spatial information splitter. The fine boundary information is propagated to the deeper layers of the network for information fusion. In the external information fusion section, boundary information is fused with downsampled spatial information. External information fusion significantly helps in boundary predictions for the cytoplasm and nucleus. Moreover, the splitter also transfers spatial information for internal fusion. Internal information fusion ensures feature empowerment across the network. The internally fused information is periodically downsampled and, finally, after a few layers of operations, it is also fused in the external information fusion blocks. In the end, prediction masks are generated in accordance with the fused features. Our network provides the segmented cytoplasm and nucleus along with the WBC count as output, which can aid in the diagnosis and prognosis of leukemia.

B. Cytoplasm and nucleus segmentation using MIF-Net
WBC microscopic images usually differ in shape, stain condition, and size, which makes their segmentation challenging. We developed a multi-scale information fusion-based network that is capable of delivering better segmentation performance even in challenging conditions. The network architecture of the proposed MIF-Net is shown in Figure 3. The spatial information of the input image is extracted using the image input layer. This spatial information is split by the spatial information splitter, which is based on a convolution layer. The spatial information splitter splits the spatial information into three different scales using strided convolution layers. These multi-scale image features significantly help in segmentation performance improvement. U-Net [21] and SegNet [22] are famous encoder-decoder-based segmentation architectures. In encoder-decoder structures, pooling layers and un-pooling layers are used for downsampling and upsampling of spatial information, respectively. Pooling and un-pooling layers cause information loss along with their normal operation [23]. This information loss is more crucial at the decoder end because prediction masks are generated after decoding. To address this problem, MIF-Net uses a shallow architecture, and we employ transpose convolution in place of un-pooling layers. Similarly, we use strided convolution in place of pooling layers for multi-scale information downsampling and propagation. Both transpose and strided convolution layers are learnable layers and help the network in optimal learning. In addition, MIF-Net presents a unique internal and external information fusion concept. The initial layers of the network possess fine boundary information of the objects, and we propagate this boundary information in the boundary information propagation (BIP) section. BIP uses strided convolution for multi-scale propagation of spatial information.
Three strided convolution layers with the corresponding strides (S-1, S-2, and S-4) take part in multi-scale information dissemination, and later this information is fused with spatial information in the external information fusion blocks. As shown in Figure 3, we use two categories of fusion: external information fusion (EIF) and internal information fusion (IIF). In EIF, all three skip connections originate from the encoder's first layer (the splitter), pass through BIP while bypassing the remaining encoder layers, and terminate in the decoder. Therefore, all the fusions in the decoder that receive external inputs from the encoder are named EIF. In the proposed architecture, three EIFs are employed for multi-scale external information fusion. In all IIFs, skip connections originate from a preceding convolution layer of the encoder and terminate in a subsequent convolution layer of the encoder. Hence, all the fusions in the encoder that receive internal inputs from the encoder are named IIF. In the proposed network, four IIFs are employed for spatial feature empowerment after intervals. Three EIF blocks, namely EIF-1, EIF-2, and EIF-3, are used for external information fusion between boundary information and downsampled spatial information. Generally, all network layers cause some feature degradation along with their normal operation. MIF-Net uses an internal information fusion scheme to compensate for this feature degradation problem. The MIF-Net design contains four IIF blocks, namely IIF-1, IIF-2, IIF-3, and IIF-4, which are employed for feature empowerment. S-1 denotes the strided convolution with stride 1; it takes its input from the splitter and provides its output to EIF-3. Similarly, S-2 refers to the strided convolution with stride 2; it also takes its input from the splitter, reduces the feature map size by a factor of 2 (half), and its output is fed to EIF-2 for information fusion. S-4 denotes the strided convolution with stride 4; it also takes its input from the splitter, reduces the feature map size by a factor of 4 (quarter), and its output is fused at EIF-1.
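The splitter and the three BIP branches described above can be sketched in a few lines. The following is a minimal, hypothetical PyTorch sketch: kernel sizes and layer names are assumptions (the exact configuration is given in Table 1), while the strides (1, 2, 4) and output channel counts (32, 64, 128) follow the feature map sizes reported for Figure 5.

```python
import torch
import torch.nn as nn

class SpatialInfoSplitter(nn.Module):
    """Hypothetical sketch of the splitter + BIP: one convolution extracts
    low-level spatial features, then three strided convolutions (S-1, S-2,
    S-4) propagate fine boundary information at three scales."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.splitter = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1),
            nn.BatchNorm2d(feat_ch),
            nn.ReLU(inplace=True),
        )
        # BIP: strides 1, 2, and 4 yield full-, half-, and quarter-size maps
        self.s1 = nn.Conv2d(feat_ch, 32, 3, stride=1, padding=1)   # -> EIF-3
        self.s2 = nn.Conv2d(feat_ch, 64, 3, stride=2, padding=1)   # -> EIF-2
        self.s4 = nn.Conv2d(feat_ch, 128, 3, stride=4, padding=1)  # -> EIF-1

    def forward(self, x):
        f = self.splitter(x)
        return f, self.s1(f), self.s2(f), self.s4(f)

x = torch.randn(1, 3, 300, 300)
f, b1, b2, b4 = SpatialInfoSplitter()(x)
# b1: 300x300x32, b2: 150x150x64, b4: 75x75x128 (matching Figure 5)
```

Because all three branches are (learnable) convolutions rather than pooling layers, the multi-scale boundary maps are produced without the information loss discussed above.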
In Figure 4, we present an interpretation of both the external and internal fusion processes for a detailed explanation. As shown in Figure 4, S_conv refers to the strided convolution, which is itself part of BIP. The features input to S_conv (BIP) originate from the spatial information splitter. Two convolution layers, conv_i and conv_j, provide the spatial features F_i and F_j, respectively. The addition symbol on the left side of Figure 4 represents IIF, and both spatial features, F_i and F_j, are fused, producing the internally fused feature F_if at the output:

F_if = F_i + F_j

The transpose convolution (T_conv) upsamples the internally fused feature and provides F'_if at its output. The split spatial boundary information is fed to the strided convolution (S_conv) for information propagation at the designated scale, and it provides the spatial boundary information F_b at its output. F_b and F'_if are fused to provide the externally fused spatial boundary information F_ef, as given in the mathematical expression below:

F_ef = F'_if + F_b

This externally fused boundary information is further processed through batch normalization (BN) and a rectified linear unit (ReLU), and the resulting change in spatial information is represented by ΔF_ef. The final output is fed to the next external information fusion block. The loss function in any deep learning-based convolutional neural network guides the network toward optimal learning. In the proposed architecture, binary cross-entropy loss is employed to lead the training process. The layer-level configuration details of MIF-Net are presented in Table 1.
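The fusion step of Figure 4 can be illustrated with a short, hypothetical PyTorch sketch. The channel counts, kernel sizes, and the class name are assumptions rather than the exact Table 1 configuration; the sketch only mirrors the order of operations: IIF by addition, transpose convolution upsampling, EIF by addition, then BN and ReLU.

```python
import torch
import torch.nn as nn

class FusionStep(nn.Module):
    """Hypothetical sketch of one IIF + EIF step: two encoder feature maps
    are fused by addition (IIF), upsampled with a learnable transpose
    convolution, fused with the propagated boundary features (EIF), and
    passed through batch normalization and ReLU."""
    def __init__(self, ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)  # learnable upsampling
        self.bn = nn.BatchNorm2d(ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_i, f_j, f_boundary):
        f_if = f_i + f_j             # internal information fusion (IIF)
        f_if_up = self.up(f_if)      # transpose convolution upsampling
        f_ef = f_if_up + f_boundary  # external information fusion (EIF)
        return self.relu(self.bn(f_ef))

f_i = torch.randn(1, 64, 75, 75)
f_j = torch.randn(1, 64, 75, 75)
f_b = torch.randn(1, 64, 150, 150)   # boundary features at the target scale
out = FusionStep(64)(f_i, f_j, f_b)
```

Both fusions are element-wise additions, i.e., residual connections, so the boundary features reach the decoder without passing through any lossy pooling operation.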

C. MIF-Net architecture comparative details.
The MIF-Net architecture is developed from scratch, and it is not based on any other network. The initial layers of the network contain low-level spatial information. This low-level spatial information from the initial layer is split into four paths for external and internal information fusion. The splitter is based on a convolution layer followed by BN and ReLU layers. The splitting of spatial information is shown in Figure 5. Three strided convolution layers with strides 1, 2, and 4 change the features to multi-scale according to their stride values. The feature map size at the splitter (Conv1_1) is 300×300×64, and after strided-conv (S-2) the feature map size becomes 150×150×64 because of the stride value of 2. Similarly, after strided-conv (S-4) the feature map size is downsampled to 75×75×128 with an increased number of channels. Lastly, strided-conv (S-1) retains the feature map size (300×300×32) with a decreased number of channels and applies the convolution operation for the last external information fusion. All this multi-scale information from the strided convolution layers subsequently undergoes EIF. In MIF-Net, IIF ensures feature empowerment after intervals, whereas EIF helps in improved boundary predictions for the cytoplasm and nucleus. Multi-scale information fusion is widely used in different deep learning-based applications [45], [46]. In [45], a scale estimation network (SENet) with a head detection network is presented for person localization. SENet employs ResNet-101 as the backbone to address the problem of vanishing gradients. Similarly, in [46] a multi-scale object proposal method followed by multi-class object classification is proposed. In this method, a feature pyramid network (FPN) is employed to convert multi-scale feature maps into object proposals. The FPN is used with a ResNet backbone. MIF-Net is an entirely different architecture compared with [45] and [46]; some of the major differences are highlighted in Table 2.
In MIF-Net, a maximum of 256 channels are used, which results in a small number of required parameters. Subsequently, only three downsampling operations are performed in MIF-Net to keep the final feature map size sufficiently large and thereby avoid information loss. Unlike [45], [46], the proposed method uses a total of 7 fusion operations, including EIF and IIF. In the proposed method, spatial information is split from the initial layer. Multi-scale information in MIF-Net is attained using strided convolution with different stride values. MIF-Net is a computationally efficient architecture, and it requires only 2.67 million trainable parameters.
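The effect of capping the channel count at 256 can be checked with a simple parameter count. The sketch below assumes standard 3×3 convolutions with bias; the 2.67 million total itself depends on the full Table 1 configuration, which is not reproduced here.

```python
def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    """Trainable parameters of a k x k convolution layer: one k x k kernel
    per (input channel, output channel) pair, plus one bias per output."""
    return in_ch * out_ch * k * k + out_ch

# Capping channels at 256 keeps even the largest layer under 0.6 M parameters:
largest = conv2d_params(256, 256, 3)   # 590,080
# whereas a 512-channel layer, common in deeper encoders, needs roughly 4x more:
wide = conv2d_params(512, 512, 3)      # 2,359,808
```

A single 512-channel layer would thus cost almost as much as the entire 2.67 M-parameter MIF-Net, which is why the channel cap matters for the overall budget.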

IV. EXPERIMENTS

A. Description of experimental setup and databases
We performed experiments on four publicly available datasets to evaluate the proposed framework.

B. MIF-Net training
Neural networks require extensive annotated data for optimal training. Annotated medical image data are usually insufficient for training. Therefore, we performed augmentation of the training data to artificially enlarge the training database. In augmentation, we logically perform different morphological, geometric, and arithmetic operations. In our data preparation, we used vertical and horizontal flipping along with different translation and cropping schemes for data augmentation. We used the original image size for the Dataset-2 experiments, whereas Dataset-1 and Dataset-3 were resized to 300 × 300 and 712 × 568, respectively. The same data splitting criteria are followed as given in [16]. Hence, 70% of the images are randomly selected for training and 30% for testing, whereas 10% of the training split is used for validation to avoid overfitting. We trained our network from scratch; we did not use any pre-trained network or weight migration for our model training. Adaptive moment estimation (Adam) is employed as the optimizer for the adaptive learning process [24]. The Adam optimizer is famous for its fast convergence with limited memory requirements using first-order gradients [24]. Class label identifiers based on the corresponding ground-truth images are assigned before starting the training process. An initial learning rate of 0.001, L2 regularization of 0.0005, and a squared gradient decay factor of 0.95 were set as the initial training hyperparameters. Overall optimization is attained using the Adam optimizer. Validation is used to ensure that the network is not overfitted to the training data. A stopping criterion based on validation accuracy with a stopping threshold value of 20 is adopted: if the validation accuracy does not increase for 20 consecutive iterations, training stops and the network with the optimized hyperparameters is saved.
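A hedged sketch of the training configuration above, expressed in PyTorch: the original implementation framework is not stated, so mapping the squared gradient decay factor to Adam's beta2, the L2 regularization to weight decay, and the stopping threshold to an early-stopping patience are all assumptions.

```python
import torch
import torch.nn as nn

# Hyperparameters from the text: initial LR 0.001, L2 regularization 0.0005,
# squared gradient decay factor 0.95, binary cross-entropy loss.
model = nn.Conv2d(3, 2, 3, padding=1)  # placeholder standing in for MIF-Net
optimizer = torch.optim.Adam(
    model.parameters(), lr=0.001, betas=(0.9, 0.95), weight_decay=0.0005)
criterion = nn.BCEWithLogitsLoss()     # binary cross-entropy on raw logits

# Early stopping on validation accuracy with a patience of 20 checks.
best_acc, patience, wait = 0.0, 20, 0

def early_stop_update(val_acc: float) -> bool:
    """Return True once validation accuracy has not improved for `patience`
    consecutive checks; on improvement, the best weights would be saved."""
    global best_acc, wait
    if val_acc > best_acc:
        best_acc, wait = val_acc, 0
        return False
    wait += 1
    return wait >= patience
```

In a training loop, `early_stop_update` would be called after each validation pass, and the loop would break as soon as it returns True.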
The training loss and accuracy plots shown in Figures 6(a), 6(b), and 6(c) represent that the training accuracy approaches approximately 100% with a decaying loss for all datasets. Similarly, the validation accuracy and loss plots are also shown in Figure 6.

C. Evaluation metrics
We evaluated the proposed network on four publicly available datasets and compared our results with existing state-of-the-art results. We present both visual and numeric results for comparative evaluation. We performed segmentation of the cytoplasm and nucleus for Dataset-1 and Dataset-2, whereas Dataset-3 has only cytoplasm annotations (for basophils); therefore, we performed segmentation of the cytoplasm only. The evaluation metrics used for MIF-Net include precision (PRE) [25], misclassification error (ME) [26], the dice coefficient (DI) [27], mean intersection over union (mIoU) [28], false-positive rate (FPR) [29], and false-negative rate (FNR) [11]. Mathematical expressions of the evaluation measures are shown in equations (4)-(9).
In the WBC ground-truth image, G_d represents the desired region, whereas G_u represents the undesired region. Likewise, P_d refers to the desired predicted segmentation result, whereas P_u refers to the undesired predicted segmentation result. The same evaluations are also presented in terms of true positives (TP), false positives (FP), and false negatives (FN) using different colors for visual interpretation of the qualitative results. Higher DI and mIoU scores refer to better segmentation performance; likewise, lower ME, FPR, and FNR scores also represent better segmentation performance.
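The listed metrics follow standard pixel-wise definitions. The NumPy sketch below assumes those standard forms (the exact expressions in equations (4)-(9) are not reproduced here); mIoU in the paper additionally averages the per-class IoU over the classes.

```python
import numpy as np

def segmentation_metrics(gt: np.ndarray, pred: np.ndarray) -> dict:
    """Per-class pixel-wise metrics from binary masks (1 = desired region),
    using the usual TP/FP/FN/TN counts."""
    gt, pred = gt.astype(bool), pred.astype(bool)
    tp = np.logical_and(gt, pred).sum()
    fp = np.logical_and(~gt, pred).sum()
    fn = np.logical_and(gt, ~pred).sum()
    tn = np.logical_and(~gt, ~pred).sum()
    return {
        "PRE": tp / (tp + fp),
        "ME": (fp + fn) / (tp + fp + fn + tn),  # misclassification error
        "DI": 2 * tp / (2 * tp + fp + fn),      # dice coefficient
        "IoU": tp / (tp + fp + fn),             # per-class IoU (averaged -> mIoU)
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }

gt = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 0]])
m = segmentation_metrics(gt, pred)   # DI = 0.5, IoU = 1/3
```

For the toy masks above, one pixel each of TP, FP, FN, and TN gives DI = 0.5 and IoU = 1/3, matching the expected ranking behavior described in the text.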

V. EXPERIMENTAL RESULTS
We present both numeric and visual results of MIF-Net evaluation for a comprehensive assessment.

A. MIF-Net segmentation results on Dataset-1
Experiments were performed on Dataset-1 for the joint segmentation of the cytoplasm and nucleus. Dataset-1 possesses a rapid stain condition; consequently, the contrast of the cytoplasm with the background is low, which makes the segmentation challenging. As shown in Figure 8, the nucleus boundary is indistinctive in a few cases; hence, accurate boundary prediction becomes difficult for such WBC nuclei. The multi-scale information fusion mechanism in MIF-Net helps the network in preserving boundary information. Despite these challenges, MIF-Net manages to exhibit state-of-the-art segmentation performance using its effective design.
Good segmentation results for Dataset-1 are shown in Figure 7, whereas some poor segmentation images are provided in Figure 8. The poor segmentation results are caused by staining conditions, contrast limitations, and indistinctive object boundaries. A numerical results comparison of Dataset-1 with state-of-the-art methods for the cytoplasm and nucleus is presented in Tables 3 and 4, respectively. The numerical results also confirm the superior segmentation performance of the proposed network.

B. MIF-Net segmentation results on Dataset-2
Experiments were performed to evaluate the proposed method on Dataset-2 for the joint segmentation of the cytoplasm and nucleus. Dataset-2 images also have low contrast between the cytoplasm and the background, which makes the learning process challenging for the network. However, MIF-Net still manages outperforming results using its information fusion-based advanced design. Visual results for good and poor segmentation of the cytoplasm and nucleus are shown in Figures 9 and 10, respectively. The poor segmentation results are caused by indistinctive boundaries in the cells. In the second row of Figure 10, the poor segmentation results can be attributed to granules in the nucleus. A numerical results comparison with state-of-the-art methods is also presented in Tables 5 and 6 for the cytoplasm and nucleus, respectively. MIF-Net exhibits better segmentation performance compared with the other methods.

C. MIF-Net segmentation results on Dataset-3.
The proposed architecture is further evaluated on Dataset-3 to check its effectiveness. Dataset-3 contains images with small-sized cells, and some images contain multiple cells in the same image. MIF-Net still delivers outperforming segmentation results for pixel-wise predictions. Good and poor segmentation visual results are shown in Figures 11 and 12, respectively. The poor segmentation results are caused by the small-sized WBCs in the dataset. Subsequently, a numerical results comparison with state-of-the-art segmentation methods is given in Table 7. This comparison also validates the effectiveness of MIF-Net for WBC segmentation.

D. MIF-Net segmentation results on Dataset-4.
MIF-Net is further evaluated on Dataset-4 to confirm the effectiveness of the proposed method. This dataset has a few cases with multiple cells in the same image. Some of the cells are quite close to each other. The qualitative results presented in Figure 13 confirm that the proposed method exhibits good segmentation performance. More specifically, in Figure 13 (rows 4 and 5), some cells in the same image are quite close to each other; nonetheless, MIF-Net provides good segmentation for such challenging cases as well.
A quantitative results comparison with state-of-the-art methods is presented in Table 8. It also confirms the outperforming segmentation accuracies of the proposed method, which requires only 2.67 million trainable parameters.

E. Discussion
This study proposes a comprehensive framework for the joint segmentation of the cytoplasm and nucleus from WBC images. It is evident from the clinical literature that leukemia brings changes in the morphology and count of WBCs. Accurate segmentation of WBCs reveals the exact morphology and provides the exact areas of the nucleus and cytoplasm. Therefore, the proposed method can aid in the diagnosis and prognosis of leukemia and other related diseases. WBCs differ in shape, size, and morphology, which makes the segmentation task challenging. MIF-Net still manages to exhibit good performance by transferring fine boundary information and using multi-scale information fusion in its effective architecture. To confirm this, we performed cross-dataset evaluation between the multiclass datasets, Dataset-1 and Dataset-2. The results of the cross-dataset segmentation evaluation for Dataset-1 and Dataset-2 are provided in Tables 9 and 10, respectively. Dataset-1 is evaluated using a network trained on Dataset-2, whereas the Dataset-2 evaluation is carried out using a network trained on Dataset-1. The main reasons for underperformance in cross-dataset evaluation are possibly the differences in stain and RBCs. Dataset-1 uses a rapid staining condition, whereas Dataset-2 has a standard staining process. Moreover, Dataset-2 has several adjacent solid-shaped RBCs, which influence the network predictions during cross-dataset evaluation. Despite dealing with these major changes, MIF-Net still manages to provide better cross-dataset performance, which confirms the generalizability of the proposed method.

Ablation study
In CNNs, pooling layers are traditionally employed for downsampling purposes. However, many studies have reported that pooling layers also cause information loss and feature degradation [23]. Therefore, we replaced pooling layers with strided convolution layers to minimize spatial information loss and thereby achieve enhanced performance. As presented in Table 1, the convolution layers use a maximum of 256 channels; therefore, the number of trainable parameters required by each convolution layer is not very high, and the proposed network is able to outperform using only 2.67 million parameters. In the trade-off between segmentation performance and trainable parameters, we preferred segmentation performance, as the total number of trainable parameters was not too high. A large stride is preferred for better performance in CNNs [44], and MIF-Net also uses large stride values for multi-scale feature propagation, as presented in Figure 3. Ablation studies are presented to compare the segmentation results of the proposed MIF-Net using strided convolution (proposed) with MIF-Net using pooling layers. The segmentation performance differences for the cytoplasm and nucleus are presented in Tables 11 and 12, respectively. The ablation studies confirm that the replacement of pooling layers with strided convolution layers is useful for achieving better segmentation performance.
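The replacement examined in the ablation can be illustrated with a minimal PyTorch sketch comparing the two downsampling choices; the channel count and kernel size here are illustrative, not the Table 1 values.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 150, 150)

# Conventional downsampling: max pooling has no trainable parameters and
# keeps only 1 of every 4 activations in each 2x2 window.
pool = nn.MaxPool2d(2)

# MIF-Net-style downsampling: a strided convolution learns how to combine
# neighboring activations while halving the spatial size.
strided = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)

y_pool, y_conv = pool(x), strided(x)           # same output size either way
n_params = sum(p.numel() for p in strided.parameters())  # 64*64*9 + 64 = 36,928
```

Both operations halve the feature map, but only the strided convolution contributes learnable weights, which is the learnability argument behind the ablation in Tables 11 and 12.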

MIF-Net technical contribution
MIF-Net is developed for the joint segmentation of cytoplasm and nuclei from WBC microscopic images. The initial layers of a CNN contain valuable fine boundary information about the objects. As shown in Figure 3, we split this boundary information across multiple scales using strided convolution layers with different strides. The BIP in MIF-Net employs strided convolution in place of pooling layers to increase its learnability. As evident from the ablation study presented in Tables 11 and 12, replacing pooling layers with strided convolution enables the network to achieve a significant performance gain. This multi-scale information is fused at different stages of the network for improved boundary predictions. Subsequently, we employed IIF at intervals to reduce spatial loss and thereby ensure feature empowerment. The information fusions used in our architecture are based on residual connectivity. To the best of our knowledge, this is the first residual-connectivity-based architecture using both internal and external fusion simultaneously for WBC joint segmentation. The proposed method exhibited superior results with promising computational efficiency.
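The external fusion scheme described above can be sketched under simplifying assumptions: plain subsampling stands in for the learnable strided convolutions of the BIP, element-wise addition stands in for residual fusion, and feature maps are single-channel for brevity. None of the names below come from the paper's implementation:

```python
import numpy as np

def downsample(x, stride):
    """Stand-in for a strided convolution: subsample spatial positions.
    In MIF-Net this step would be learnable."""
    return x[::stride, ::stride]

def residual_fuse(deep_features, boundary_branch):
    """External fusion via residual connectivity: add the propagated
    boundary information to the deep feature map of matching scale."""
    return deep_features + boundary_branch

# Hypothetical early-layer feature map carrying fine boundary information
early = np.random.rand(32, 32)

# Split the boundary information across multiple scales (strides 2, 4, 8)
branches = {s: downsample(early, s) for s in (2, 4, 8)}

# Fuse each branch with a same-resolution deep feature map later in the network
deep_16 = np.random.rand(16, 16)
fused_16 = residual_fuse(deep_16, branches[2])  # shapes match: 16x16
```

The key point the sketch captures is that each branch is downsampled to match the resolution of a later stage, so boundary information from the initial layers reaches every scale of the network through additive (residual) connections.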

Challenges in WBC segmentation
In WBC segmentation, along with problems from the medical point of view, there are a number of issues and complexities from the computer vision perspective as well. Some of the challenges associated with WBC segmentation are exhibited in Figure 14. WBCs can have sizes equal to or even smaller than adjacent erythrocytes, which can make segmentation challenging (row 2, Figure 14). In some cases, an indistinctive nucleus boundary and irregular shape can also hinder accurate segmentation (row 5, Figure 14). Similarly, predicting the cytoplasmic boundary is challenging when several RBCs are adjacent to the cell (row 4, Figure 14). Some of the images have adjacent cut-cells, which can also mislead the network into false predictions (row 3, Figure 14). In our case, cut-cells did not harm the accuracy of the network for WBC segmentation or count. In the last row of Figure 14, a sample WBC image with a nucleated RBC (NRBC) from Dataset-2 is shown. Since an NRBC also has a nucleus, its segmentation becomes challenging. As evident from Figure 14 (row 6), the proposed method managed to deliver promising segmentation performance even with NRBCs. However, NRBC cases are rare in all four datasets of this study. Therefore, we intend to work on WBC segmentation with a majority of NRBC cases in the future. Many images have WBCs adjacent to RBCs, and interestingly this also did not mislead the proposed network (row 4, Figure 14). Likewise, some images contain multiple cells, yet MIF-Net provides high segmentation performance with an accurate WBC count (row 1, Figure 14). However, a segmentation-based method is likely to have the limitation of considering adjacent cells as a single cell for WBC counting. To address this, we intend to work on WBC segmentation for adjacent-cell cases in the future.

Leukemia diagnosis
Leukemia is one of the critical and common types of blood cancer, which occurs due to the replication of anomalous WBCs. Acute myelogenous leukemia (AML), acute lymphoblastic leukemia (ALL), and chronic lymphocytic leukemia (CLL) are categorized as the common types of leukemia [31]. WBC count is regarded as one of the core biomarkers for the clinical diagnosis of leukemia [32]. Medical specialists normally perform manual assessments of the size, shape, position, and nuclear-cytoplasmic ratio (NCR) of WBCs for leukemia diagnosis. Manual assessment is time-consuming, less accurate, and tedious. Therefore, we propose a framework that can assist leukemia diagnosis by providing automatic WBC segmentation and counting. AML can be detected by identifying nuclei of large size and irregular shape [6], [33]. Likewise, another study states that size and shape assessment of the cytoplasm and nucleus aids in distinguishing ALL from AML [31]. Smudged cells also play a key role in the detection of CLL. Accordingly, MIF-Net is proposed to reveal all these anomalies associated with morphology and count through WBC segmentation and computational assessment.

Diagnosis of COVID-19 and other infectious diseases
WBC count and morphology have a vital role in the clinical diagnosis of many diseases such as coronavirus disease 2019 (COVID-19), blood cancer, and other infections. COVID-19 was declared a pandemic by the World Health Organization in 2020 because of its high transmissibility and fatality rate. It is evident from many studies that COVID-19 also brings significant changes in WBC count and morphology [34]. These changes and anomalies associated with COVID-19 also vary with disease progression and intensity. COVID-19 patients are observed to have pyknosis, which refers to shrinkage of the nucleus [35]. Similarly, some patients exhibit karyorrhexis, in which some area of the cytoplasm is covered by the nuclear membrane because of its rupture [35]. Reactive lymphocytes with larger cytoplasm are also commonly reported in COVID-19 patients [35], [36]. Many studies have also reported distorted neutrophils and smudged cells in COVID-19 patients [37]. Likewise, the pseudo-Pelger-Huët anomaly is also commonly associated with COVID-19 [38]. Detection of all these anomalies depends on the segmentation performance for the nucleus and cytoplasm. Generally, WBC counting and morphological analysis are carried out manually, which is an inaccurate and time-consuming process [7]. Therefore, the proposed automatic WBC segmentation-based method can be used to aid existing COVID-19 detection systems.

NCR
NCR is the ratio between the nucleus and cytoplasm areas in a WBC. NCR is a key measure that provides computational analysis of a WBC to assess maturity, malignancy, and morphology for the diagnosis of leukemia and other related diseases [38]. NCR is also associated with cell maturity, since the nucleus area decreases over the course of cell maturation [38]. Similarly, leukemia patients exhibit anomalies that change the shape and area of the cytoplasm. Therefore, pixel-wise predictions provide accurate areas for the nucleus and cytoplasm, which directly helps in precise NCR computation. In Figure 15, a sample segmented image from Dataset-1 is taken for calculating NCR. The mathematical calculation of NCR for the same image is given in Equation (10). Along with many merits, MIF-Net has a few limitations as well. Since all four datasets of this study contain only rare NRBC cases, we could evaluate only a few sample NRBC cases. We intend to work on the segmentation of WBC images with several NRBC cases in the future.
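As a minimal sketch of this computation, assuming the network outputs a pixel-wise label map with 0 = background, 1 = cytoplasm, and 2 = nucleus (this label convention is illustrative, not taken from the paper):

```python
import numpy as np

def compute_ncr(label_map, nucleus_id=2, cytoplasm_id=1):
    """Nuclear-cytoplasmic ratio from a pixel-wise segmentation map.
    Pixel counts serve as area estimates."""
    nucleus_area = int(np.sum(label_map == nucleus_id))
    cytoplasm_area = int(np.sum(label_map == cytoplasm_id))
    if cytoplasm_area == 0:
        raise ValueError("no cytoplasm pixels found")
    return nucleus_area / cytoplasm_area

# Toy 4x4 prediction: 0 = background, 1 = cytoplasm, 2 = nucleus
pred = np.array([[0, 1, 1, 0],
                 [1, 2, 2, 1],
                 [1, 2, 2, 1],
                 [0, 1, 1, 0]])
ncr = compute_ncr(pred)  # 4 nucleus pixels / 8 cytoplasm pixels = 0.5
```

Because the ratio is computed directly from pixel counts, any boundary error in the segmentation propagates into the NCR, which is why accurate pixel-wise prediction matters for this measure.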

VI. CONCLUSION
Leukemia is a fatal disease whose traditional diagnosis is based on manual assessment, which is a subjective, error-prone, and tedious process. Leukemia brings changes in the count and morphology of WBCs. To fill this gap, we developed MIF-Net for the joint segmentation of cytoplasm and nucleus in WBC images. MIF-Net is a shallow architecture that applies internal and external fusion to provide an accurate WBC count and morphological predictions. The initial layers of a CNN carry fine boundary information, and MIF-Net fuses this boundary information with spatial features to enhance segmentation performance. MIF-Net was evaluated on four publicly available datasets and outperformed the existing state-of-the-art methods with superior computational efficiency. The proposed method can reliably assist health experts and contribute to reducing the burden on the diagnostic sector.
In the future, we will work on the segmentation of adjacent cells. In addition, we will also consider other types of cancer for computer-assisted diagnosis.