A Novel Stacked CNN for Malarial Parasite Detection in Thin Blood Smear Images

Malaria is an infectious mosquito-borne disease caused by parasites of the genus Plasmodium and transmitted by female Anopheles mosquitoes. When an infected mosquito bites a person, the parasite multiplies in the host’s liver and starts destroying red blood cells. The disease is diagnosed by visually examining blood under a microscope for infected red blood cells. This diagnosis depends on the expertise and experience of pathologists, and reports may vary between laboratories that perform manual examination. Alternatively, many machine learning techniques have been applied to the automatic analysis of blood smears. However, feature engineering is a challenging task that requires expertise to adjust positional and morphological features. Therefore, this study proposes a novel Stacked Convolutional Neural Network architecture that improves the automatic detection of malaria without relying on hand-crafted features. Under 5-fold cross-validation on 27,558 cell images with equal instances of parasitized and uninfected cells, taken from a publicly available dataset of the National Institutes of Health, the accuracy of our proposed model is 99.98%. Furthermore, the statistical results reveal that the proposed model is superior to state-of-the-art models, with 100% precision, 99.9% recall, and 99% F1-measure.


I. INTRODUCTION
Malaria is an infectious and life-threatening disease caused by the protozoan Plasmodium, with an incubation period of at least seven days. The disease is transmitted through the bite of female Anopheles mosquitoes, which are also known as malaria vectors. Among 400 species of Anopheles mosquitoes, only 30 species are malaria vectors. P. falciparum and P. vivax are the most common single-cell Plasmodium species that cause malaria and can be toxic. Initial symptoms such as headache, vomiting, fever, and chills can be mild and difficult to recognize as malaria; if left untreated, the disease can cause severe illness and may lead to death [1]. According to the World Malaria Report 2018, 435,000 malaria deaths were reported [2].
The associate editor coordinating the review of this manuscript and approving it for publication was Shiping Wen.
Malaria transmission depends on climate conditions; it intensifies especially after rain and when temperatures favor a longer mosquito life span. This is why 90% of the world's malaria cases occur in Africa, and why the disease is also common in other tropical regions such as Latin America and Asia [3]. Early detection can prevent harmful consequences and allows a patient to be treated with proper medicine in time.
Numerous techniques have been proposed to identify the malarial parasite, and microscopic examination of Giemsa-stained blood smears is the most prominent [4]. Other techniques include polymerase chain reaction and rapid diagnostic tests that detect the antigen in the blood. Although other tests can outperform microscopy in malaria detection, microscopy remains widespread due to its low cost and low complexity; its efficacy, however, depends on the pathologist's expertise [5]. A false diagnosis may lead to more severe malaria or to the use of unnecessary antimalarial drugs [6]. Improving the treatment of individual patients through automatic detection of malarial parasites is therefore a very appealing area of research. It has two advantages: first, it improves diagnosis even with limited resources, and second, it is cost-effective.
Automatic parasite detection in thin blood smears under a microscope also makes it possible to differentiate parasite species. The first step is to segment Red Blood Cells (RBCs); these segments are then classified as infected or uninfected [7]. However, when machine learning is applied to medical image analysis, feature engineering makes it challenging to obtain the desired results, because hand-crafted features are used to make decisions [7], [8]. Furthermore, experienced individuals are required to adjust the size, angle, position, and region of interest (ROI) of the image. To cope with these issues, Deep Learning (DL) is used to extract high-level features, which results in end-to-end feature extraction and classification [9], [10].
For traditional image classification and analysis, the spatial correlation of neighboring pixels contains important information [11]. The Convolutional Neural Network (CNN) is designed to extract such information, i.e., to perform end-to-end feature extraction and classification through weights and pooling [10]. However, the size of the training data greatly affects the classification performance of a CNN [12], as opposed to traditional machine learning models [13]. To cope with this issue, transfer learning has been proposed, in which features are extracted by a pre-trained network, including but not limited to GoogLeNet [14], VGGNet [15], and ResNet [16]. Transfer learning has been used as a shortcut in which training time is saved at the cost of some performance [17].
The DenseNet architecture is a variant of CNN composed of dense layers, in which each layer is connected to every later layer and each layer serves as a feature extractor [18]. DenseNet significantly improves performance on medical images without requiring a large number of parameters [19]. Therefore, this work proposes a stacked CNN architecture that learns different levels of abstraction of the complex representation of malaria parasites, in order to classify parasitized and uninfected cells for disease screening. The pipeline of our proposed architecture can be seen in Fig. 1.
The rest of the paper is structured as follows. Section 2 describes the most relevant state-of-the-art research related to our proposed work. Section 3 gives a summary of the dataset, the preprocessing, and the related steps performed on the dataset. Section 4 presents a brief explanation of the deep learning model proposed in this work, the experiment details, and the machine specifications used for the experiment. Section 5 discusses the results, and finally, Section 6 concludes the work with possible future directions.

II. RELATED WORK
Traditional machine learning approaches have been used for automatic blood smear classification. Diaz et al. classified blood smear images using a Support Vector Machine (SVM) to detect infected erythrocytes and their infection stage. This approach performed well, with 94.0% sensitivity on a dataset containing 450 images [20]. Machine learning also plays a vital role in the structural characterization of blood cells; for example, computer-aided pattern recognition techniques have been used by many researchers to identify malaria parasitemia. Das et al. [21] extracted features from erythrocyte textures, applied feature selection techniques to reduce them to 96 features, and then applied statistical techniques such as a Bayesian network and SVM for classification. The highest accuracy, 84.0%, was achieved by the Bayesian network with the 19 most important features. Shen et al. [22] used a stacked autoencoder to learn features automatically from images of infected and uninfected cells.
Computer vision-based malarial parasite detection studies have also been proposed in the literature. Tek et al. used a modified K-nearest neighbor (KNN) classifier after applying normalization and color correction to input images of 9 blood films for binary classification [23]. Automatic detection and staging of RBCs infected by the malarial parasite P. falciparum, using quantitative phase analysis of unstained images, is performed in [24]. Mustafa et al. [25] [36] and Fuzzy C-mean clustering method [37] for malarial parasite detection. Mustafa et al. also proposed thresholding as an important preprocessing step. Fuzzy C-mean outperformed the other five methods. Although the outcomes reported for machine learning are reasonable, these techniques still need to prove their ability on large datasets, as all of them have been evaluated on small sets of images. Therefore, there is a need for a deep learning approach such as the Convolutional Neural Network (CNN), which has proved robust on large datasets.
Deep neural networks such as Generative Adversarial Networks (GANs) [38], Discrete-time Recurrent Neural Networks (DRNNs) [39] and Memristive Neural Networks (MNNs) [40] have been widely used for various tasks. CNN models have been extensively used for traditional image analysis, phoneme recognition [41], document recognition [42], visual document analysis [43], face labeling [44] and object recognition [45], [46]. A 16-layer CNN model for malarial parasite detection was proposed in [26]; it only classifies blood cells as infected or uninfected. The model was trained with approximately 27000 images and achieved 97.0% accuracy, specificity and sensitivity, which is higher than transfer learning. To compensate for limited resources, images were resampled to 44×44 pixels. The first application based on a deep belief network was proposed by Bibin et al. [27] and tested on 4100 peripheral blood smear images, achieving an 89.66% F1-score.
A customized CNN architecture for the detection of plasmodium in blood smears, using Leishman-stained focus-stacked images, was proposed by Gopakumar et al. and showed 97.0% sensitivity and 98.0% specificity [28]. Automatic identification of the malarial parasite was proposed in [29], which uses patient-level evaluation and thumbnails to improve user confidence in the system's findings, with overall 89.7% precision, 94.1% specificity and 89.7% sensitivity. Rajaraman et al. [30] evaluated pre-trained end-to-end (i.e., feature extraction and classification) CNN models on single-cell images. They observed that a pre-trained ResNet-50 model is an outstanding tool for diagnosis, with an accuracy of 98.6%, 98.1% sensitivity, 99.2% specificity, and 95.7% F1-score. A detailed summary of existing works is shown in Table 1.
Although existing state-of-the-art deep learning approaches have shown promising results in malarial parasite detection, there is still room for improvement. Uninfected blood samples do not contain plasmodium but sometimes contain other types of remnants, which a classifier wrongly labels as infected. Therefore, color normalization techniques are needed before classification. In this work, we evaluate a customized CNN model that extracts features and then classifies images as infected or uninfected cells. Rajaraman et al. use 3 CNN layers (5 in our case). In addition, they apply the same filter size with the same number of kernels (3×3, @32) in all three layers, while we vary the kernel size (4 × 4, 3 × 3, 2 × 2) with the number of kernels ranging from 32 to 256. Sequentially reducing the kernel size helps the model learn to detect small features in malarial cell images.

III. DATASET & PREPROCESSING

A. MALARIAL DATASET
The dataset used in our study contains images of Giemsa-stained thin blood smear slides obtained from the Malaria Screener research activity, covering 50 healthy patients and 150 P. falciparum-infected patients. It is taken from the National Institutes of Health (NIH). Images in the dataset were manually annotated by expert slide readers at the Mahidol Oxford Tropical Medicine Research Unit in Bangkok, Thailand, and collected at the National Library of Medicine (NLM). The dataset contains 27,558 images with equal numbers of infected and uninfected red blood cell images, as shown in Table 2. Infected blood cell image samples contain plasmodium, as shown in Fig. 2(a), and uninfected blood cell image samples do not, as shown in Fig. 2(b). The colored patches of red blood cells vary in size (110 − 150 pixels) and are resampled to 120×120 during preprocessing to match the input requirement of the classifier.

B. PREPROCESSING
The original images of the malaria dataset were captured by a mobile device and therefore come in different sizes. Thus, before any training and testing, we resampled the images to a unified size. In the first preprocessing step, we convert all images to a fixed size of 120 × 120 pixels. Second, we apply a kernel to the image to obtain the edges. Third, we convert BGR to YUV to obtain one luma component (Y) and two chrominance components, called U (blue projection) and V (red projection). Color variations exist in blood cell images due to the staining chemicals, which can introduce errors. This problem can be solved by normalizing the image. Ciompi et al. [47] applied stain normalization to colorectal tissue classification and showed that it improves performance. We also applied normalization, in the fourth preprocessing step, to equalize the intensity values. The last step converts the YUV image back into RGB. These preprocessing steps are important for reducing noise and improving image quality. Fig. 3(a) shows the original image before preprocessing, and Fig. 3(b) shows the edges obtained after applying the kernel. Fig. 3(c) displays the image in YUV color space to obtain the (Y) component, Fig. 3(d) shows intensity equalization, and Fig. 3(e) shows the image after converting back to BGR color space.
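The color-space part of these steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text does not say which YUV variant or which equalization method was used, so the BT.601 coefficients, the histogram equalization of the Y channel, and the function names `equalize` and `normalize_stain` are all assumptions. Resizing and edge extraction are omitted.

```python
import numpy as np

# BT.601 RGB -> YUV coefficients (assumed; the paper does not name the variant).
RGB_TO_YUV = np.array([[0.299, 0.587, 0.114],
                       [-0.14713, -0.28886, 0.436],
                       [0.615, -0.51499, -0.10001]])
YUV_TO_RGB = np.linalg.inv(RGB_TO_YUV)


def equalize(channel):
    """Histogram-equalize a uint8 channel (assumed normalization method)."""
    hist = np.bincount(channel.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    denom = max(cdf[-1] - cdf_min, 1)
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255)
    return lut.astype(np.uint8)[channel]


def normalize_stain(bgr):
    """BGR -> YUV, equalize the luma (Y) channel, convert back to BGR."""
    rgb = bgr[..., ::-1].astype(np.float64)
    yuv = rgb @ RGB_TO_YUV.T
    luma = np.clip(np.round(yuv[..., 0]), 0, 255).astype(np.uint8)
    yuv[..., 0] = equalize(luma)          # chrominance (U, V) is left untouched
    rgb_out = yuv @ YUV_TO_RGB.T
    return np.clip(np.round(rgb_out), 0, 255).astype(np.uint8)[..., ::-1]
```

Equalizing only the luma channel stretches the intensity range while leaving the chrominance untouched, which is one plausible reading of "equalize the intensity values" in the fourth step.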

C. DATA SPLITTING AND CROSS VALIDATION
The malaria dataset is split into train/test sets with a ratio of 70:30. To check the robustness of the model, we applied 5-fold cross-validation; five folds is a moderate value that causes neither high bias nor high variance. We randomly partitioned our dataset into 5 equal subsets; one subset is used for validation and the rest are used to train the model. This process is repeated five times, once with each subset as the validation set. The results of the five folds are then averaged and used for model evaluation.
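The fold construction described above can be sketched in a few lines of plain Python. The helper names `k_fold_indices` and `cross_validated_score`, the fixed seed, and the `evaluate` callback are hypothetical; they stand in for whatever training and scoring routine is actually used.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k random, near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # fold sizes differ by at most 1
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validated_score(n_samples, evaluate, k=5, seed=0):
    """Average a per-fold score; `evaluate(train, val)` is a placeholder."""
    scores = [evaluate(train, val)
              for train, val in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)
```

Each sample appears in exactly one validation fold, so the averaged score uses every image once for validation and four times for training.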

IV. PROPOSED METHODOLOGY

A. OVERVIEW
We use a stacked CNN to overcome the shortcomings of manual feature extraction. We apply re-sampling to feed more information to the CNN with a fixed sampling pattern. In addition, we apply stain normalization to preserve image characteristics. The pipeline of the proposed approach is composed of the following steps. First, we pre-process the input images by re-sampling and normalizing them. Then we apply the stacked CNN, fine-tuning it (filters, kernels, and strides) along with max-pooling and dropout layers. To test different design strategies, we also created and tested models with 1, 3, and 5 convolution layers, pooling layers, and a dense classification layer. We progressively increased the number of convolution, dropout, and pooling layers to see whether the performance of the model increased. We used the 1-layer and 3-layer models as baselines and compared our results against them. We varied the filter sizes and the number of layers until we achieved the best result. We could not find any other CNN architecture that performed better than our proposed model. Hence, the 5-convolution-layer stacked CNN model performed best among the configurations we tested for malarial parasite classification.
VOLUME 8, 2020

B. CNN
A CNN is a type of deep neural network that learns a complex hierarchy of features through convolution, nonlinear activation, and pooling layers [12]. CNNs are designed for image recognition and image classification tasks, and are now also commonly used in image segmentation. The traditional sliding-window approach processes regions independently, which results in low efficiency. An alternative is the fully convolutional network, which is trained end to end and makes computation more efficient. Fully connected layers are used at the end of the network to encode semantic information. We used a stacked CNN, as shown in Fig. 4, to detect parasites in RGB images of infected cells. A CNN is a multi-layered feedforward network inspired by biology. The filters or kernels in a layer are applied to the input of the first layer or to the output of the previous layer and produce a feature map. The output of all convolutional layers is concatenated as a feature map and fed into the fully connected layers. CNNs have become a de facto standard by providing robust results in medical classification tasks. They have been applied to the classification of lung disease [48], brain tumor segmentation [49], chest x-rays [50], chest radiographs [51] and kidney disease [52]. In recent studies, CNNs have been explored for classifying Giemsa-stained malarial parasite images as parasitized or uninfected [27], [28], [30].
The main components of a CNN are convolutional layers, the Rectified Linear Unit (ReLU), and pooling (subsampling) layers. Features are extracted by the convolutional layers. ReLU is simple: it sets any negative element of the matrix to 0 and leaves the positive elements unchanged. For the activation function, we applied ReLU, as given in equation 1.
y = max(0, i)    (1)
where y is the output activation and i is the given input. During training, kernel weights are applied to the input image to extract local features at the convolution layer, and subsequent layers extract high-level features from these local features. For multichannel images, CNNs open up new possibilities in malaria diagnosis. The cross-entropy error is used as the loss function because the task is binary classification. It is calculated as shown in equation 2.
where i is the binary indicator of the class label (0 or 1), log is the natural logarithm, and p is the predicted probability. A CNN is trained with a variant of backpropagation, and we used a sigmoid output layer with this error function. Here N is the total number of classes in the sigmoid layer, and one neuron corresponds to each class in the output layer. In our case, there are two classes, parasitized and uninfected. The CNN architecture produces its output at two neurons in every case of binary classification. In the ideal case of a parasitized cell, the outputs of the first and second neurons are 1 and 0, respectively. For uninfected images, the outputs are 0 and 1, the reverse of the previous case. The term inside the log function computes the chance of output 1, and i j is the output for the true class c of the input. In our case, c is 1 for parasitized and 2 for uninfected. At testing time, labels are assigned by excluding the softmax loss to maximize the response.
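As a small numerical illustration of the two functions discussed in this subsection, the sketch below implements ReLU and a cross-entropy loss in plain Python. The loss is written in the standard binary form, -(i log p + (1 - i) log(1 - p)), which matches the description of i and p above but is an assumption about the exact form of equation 2; the function names and the clamping constant `eps` are likewise illustrative.

```python
import math


def relu(i):
    """Rectified Linear Unit: negative inputs become 0, others pass through."""
    return max(0.0, i)


def binary_cross_entropy(label, p, eps=1e-12):
    """Standard binary cross-entropy for one sample (assumed form of eq. 2).

    label: binary class indicator i (0 or 1)
    p:     predicted probability of class 1
    eps:   clamp to avoid log(0); an implementation detail, not from the paper
    """
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

The loss approaches 0 as the predicted probability approaches the true label and grows without bound as the prediction moves toward the wrong class, which is why confident misclassifications dominate the training signal.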

C. EXPERIMENTAL DETAILS
The design of the stacked CNN architecture used in our experiment is shown in Fig. 4. The proposed architecture has a total of 22 layers: 5 convolutional layers, 2 max-pooling layers, 4 dense layers, 1 average pooling layer, 1 flatten layer, 8 layers with 20% dropout, and 1 fully connected layer. The Rectified Linear Unit (ReLU) activation function is used in this setup.
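The layer budget above can be checked with a short tally. The dictionary keys and the helper `expected_active_units` are hypothetical names used only for this illustration; the counts and the 20% dropout rate come from the text.

```python
# Layer counts as listed in the text; the key names are illustrative only.
LAYER_COUNTS = {
    "convolutional": 5,
    "max_pooling": 2,
    "dense": 4,
    "average_pooling": 1,
    "flatten": 1,
    "dropout": 8,
    "fully_connected": 1,
}

total_layers = sum(LAYER_COUNTS.values())


def expected_active_units(n_units, dropout_rate=0.2):
    """Expected number of units kept by a dropout layer during training."""
    return n_units * (1.0 - dropout_rate)
```

The counts sum to the stated 22 layers, and with a 20% dropout rate a 256-unit layer keeps about 205 units on average in each training step.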
Here, an input image of size 120 × 120 is resampled from 200 × 200 pixels, which is enough to hold the neighborhood details needed for a final decision. In the Conv2D layers, filter sizes of (2 × 2), (3 × 3) and (4 × 4) are applied. The kernel size used at every convolutional layer is shown in the subscript in Fig. 4. In the MaxPooling2D layers, a 2 × 2 pool size is used, and in the average pooling layer a (3 × 3) pool size is used. Every convolutional layer is followed by a dropout layer that discards 20% of the neurons. The output of the final Conv2D layer, with 256 output neurons, is followed by an average pooling layer. This is followed by 4 dense layers that feed into further activation functions. We place the max-pooling layers before the average pooling layer because we did not want to average away details at an early stage. As the model is designed for a binary classification problem, the cross-entropy function is used to calculate the error between the actual and predicted output. Because the task is binary classification, the number of outputs is set to 2. With the input and output set reasonably, the CNN can classify fairly. We chose a deep CNN architecture that is neither too deep nor too shallow, with 5 convolutional layers, for our task. We applied max pooling to deal with the nonlinearity of features. The Adam optimizer is used for training. Biases are initialized to 0 and weights are initialized randomly. The batch size is set to 32 samples and training continues for 13 epochs. We applied shallow to deep CNN models, from 1 convolutional layer to 5 convolutional layers, and used them as baseline methods as shown in

V. RESULTS & DISCUSSIONS
All experiments were carried out on a Dell PowerEdge T430 machine with a 2 GB graphical processing unit, 2x Intel Xeon 8-core 2.4 GHz processors, and 32 GB of DDR4 Random Access Memory (RAM). Training takes 3.5 hours to give the final result on the 'Malarial dataset'.
We evaluated our proposed architecture on Accuracy, Precision, Recall, and F1-score. The results for all these metrics are shown in Table 4. From Table 4, one can conclude that our proposed architecture with 5-fold cross-validation performed best, with 99.964% accuracy, 100.0% precision, 99.928% recall, and 99.964% F1-score. Table 4 also lists the results achieved with different numbers of convolutional layers. Stacked CNN-1, Stacked CNN-3 and Stacked CNN-5 show accuracies of 50.145%, 61.412% and 99.879%, respectively. As the number of CNN layers increases up to 5, the accuracy, precision, and F1-score also increase. The results obtained by the 5-layer CNN with 5-fold cross-validation are the best among all. Details of the filter sizes and parameters used for tuning can be seen in Table 6. A graphical representation of the results can be seen in Figure 6. Figure 5 shows the training accuracy as well as the loss, precision, and recall for our stacked CNN architecture. The model is trained for 13 epochs with accuracy > 99%.
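For reference, the four metrics can be computed from confusion-matrix counts as in the sketch below. The function names are illustrative, and no confusion-matrix counts are taken from the paper; only the reported precision and recall are used in the consistency check.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = f1_from(precision, recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check using only the reported precision and recall.
reported_f1 = round(f1_from(1.0, 0.99928), 5)
```

With the reported 100% precision and 99.928% recall, the harmonic mean evaluates to 0.99964, which agrees with the reported 99.964% F1-score.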
We found that it is difficult for a classifier to differentiate stains from plasmodium or from any other artifact in the blood. This is why we applied stain normalization in the preprocessing steps. We also reduced noise during preprocessing to improve image quality. The results show that these steps improved the overall outcome. The results of our proposed 5-layer stacked CNN model with and without the preprocessing steps are compared in Table 5. There is a drastic difference in accuracy, precision, recall, and F1-score after performing the preprocessing steps. Our stacked CNN model achieved its best metric values by using extensive five-phase pre-processing, hyper-parameter optimization, different filter sizes, and dropout layers. In the literature, many studies extracted features using a pre-trained CNN before classification [53], and others used a customized CNN [30]. The stacked approach outperformed these in classifying parasitized and uninfected blood cell images. In our case, we identified the optimal number of CNN layers for feature extraction before classification for malarial parasite detection. The model accurately identifies infected cells in terms of accuracy, precision, recall, and F1-score. We further compared the results of the proposed stacked CNN model with state-of-the-art deep learning models proposed in the literature. The deep CNN models of [26], [30] and [31] were chosen as baseline methods for comparison with the proposed stacked CNN, as these models have recently achieved the best results for malarial parasite detection. The models selected for comparison were tested on a dataset of Giemsa-stained thin blood smear image slides. In [26], the researchers proposed a 16-layer CNN architecture containing 6 convolutional layers and obtained a good accuracy of 97.37% and an F1-score of 97.36%. The setting used in that work is as follows: filter sizes of (5 × 5, 4 × 4, 3 × 3) are used, and the (5 × 5) filter size is used in 4 of the 6 convolutional layers.
It can be observed that shallow models give reasonable accuracy with a large filter size. Hence, filter size plays an important role in CNN architecture. We used small filter sizes and reduced them sequentially from (4 × 4) to (2 × 2), which helps the model train on the small spots in parasitized blood cell images.
The ResNet model used in [30] outperformed the other four pre-trained deep learning models evaluated there, with 95.70% accuracy, and their 3-layer customized CNN model achieved 94.00% accuracy. Rajaraman et al. [30] also stated that color normalization techniques are needed to improve accuracy. The proposed model shows a good accuracy improvement over the baseline [30] by applying color normalization at the pre-processing level. The comparison of the proposed model with other state-of-the-art models is shown in Table 7. To reduce bias and overfitting, we applied 5-fold cross-validation during the development of the stacked CNN architecture. The results of 5-fold cross-validation are presented in Table 4, with 99.964% accuracy, 100% precision, 99.928% recall and 99.964% F1-score. We present the results of all 5 folds in Table 8 and also compare the estimated accuracy after 5-fold cross-validation with other best-performing models in the literature, as shown in Table 9. Evidently, our model shows no variance across folds, which supports its robustness and generality.

Importance of Stain Normalization:
We also evaluated the performance of our stacked CNN architecture by training it on the blood smear images without applying stain normalization. Applying the CNN model directly to the dataset images gave a poor accuracy of 49.61%. It is evident from Table 5 that stain normalization remarkably improved the performance of our proposed model, by about 50 percentage points, to 99.96%. Based on the results presented in Table 5, we investigated the importance of stain normalization in our classification task and found it to be an important step to include in the training and evaluation of the proposed model.

VI. CONCLUSION
Traditional machine learning methods have shown limited accuracy for malarial parasite detection. Therefore, this work proposed a stacked CNN model, an end-to-end artificial neural network, to improve malaria classification from thin blood smear images. The achieved results show that, with varying filter sizes and depth, convolutional layers can extract features at different levels of abstraction for classification. This study shows that features extracted by a CNN are better than hand-crafted features. The stacked CNN model with stain normalization outperformed state-of-the-art deep learning methods. The experimental results of 5-fold cross-validation confirm the superiority of the proposed Stacked CNN model, with 99.96% accuracy. Our future work is to further refine the model to improve within-subject and cross-subject classification accuracy.