Diagnosis of Malaria Using Double Hidden Layer Extreme Learning Machine Algorithm With CNN Feature Extraction and Parasite Inflator

Malaria, a life-threatening disease worldwide, can be diagnosed using antigen tests and microscopy tests. However, both are error-prone and time-consuming. Therefore, a trustworthy and fast infrastructure for early malaria diagnosis is required. In this age of machine learning (ML), several ML-based methods exist for the task. This paper proposes an unconventional method for malaria diagnosis based on the extreme learning machine (ELM) algorithm. In this regard, a convolutional neural network (CNN), ELM, and double hidden layer ELM (DELM) have been used as classifiers. A CNN model has been used as a feature extractor and also as a classifier to perform a comparative study. The derived features have been used to train ELM and DELM. Two versions of the malaria image dataset have been used: the original dataset and a modified dataset from which ambiguous samples have been removed. The parasite inflator enlarges the small, darker malaria parasites in the red blood cell (RBC) images so that malaria is easier to detect. CNN-DELM has achieved better results on every performance measure than CNN and CNN-ELM. The proposed CNN-DELM method has achieved 97.79% and 99.66% accuracy for the original and modified versions, respectively. The proposed CNN-DELM model has also produced comparable or better results than other methods proposed in the literature, showing its robustness in detecting malaria.


I. INTRODUCTION
In 2015, the World Health Organization (WHO) reported that around 438 thousand people died due to malaria parasites, 620 thousand people died in 2017, and approximately 300-500 million people were affected by this disease yearly [1]. South-East Asia, the Eastern Mediterranean, the Western Pacific, and the Americas have all been recognized as high-risk regions by the WHO. Malaria is a dangerous and deadly disease that is transmitted by the bite of female Anopheles mosquitoes that host Plasmodium parasites. There are around 400 types of Anopheles; among them, 30 types mainly act as parasite carriers. To become a parasite carrier, the female Anopheles mosquito has to bite a malaria-infected person. The life cycle of the Anopheles mosquito consists of four stages: egg, larva, pupa, and adult. To develop their eggs, female mosquitoes have to feed on blood. In this feeding process, malaria spreads. There are five Plasmodium parasite species: Plasmodium falciparum, P. vivax, P. ovale, P. malariae, and P. knowlesi. Among them, P. falciparum and P. vivax are the most dangerous. The parasite remains dormant for 10 to 15 days or even more after a carrier mosquito bite. After this incubation period, it invades the red blood cells (RBC) and reduces their number, producing different symptoms of malaria, such as fever, chills, nausea, and vomiting [2]. Malaria should be diagnosed as soon as possible because it can rapidly turn into a severe stage and is life-threatening. It cannot be passed directly from one person to another; however, malaria can be transmitted from mother to fetus or contracted through blood transfusions or shared injections [3], [4]. The disease spreads in hot, humid climates near natural water sources where Anopheles mosquitoes transmit the disease [5].
The associate editor coordinating the review of this manuscript and approving it for publication was Yu-Huei Cheng.
In general, there are two traditional ways to diagnose malaria. The first is thin blood smear microscopy, which is a prolonged process: usually, a microscopist must manually examine a minimum of 5000 cells to confirm the condition. The second is an antigen diagnostic examination; despite being significantly faster than the former, antigen-based rapid diagnostic tests are error-prone and costly. In underdeveloped countries, patients often cannot get prompt care or cannot afford antigen-based rapid diagnostic tests. For this reason, it is crucial to develop a system that diagnoses malaria quickly, accurately, and cost-effectively.
In the last decade, researchers have proposed several machine learning (ML) and deep learning (DL) based automatic diagnosis systems to detect various life-threatening diseases [6], [7], [8]. In this study, the advantages of both ML and DL have been merged to detect malaria from RBC images more efficiently.
Some RBC samples in the original dataset were mislabeled. This problem has been resolved in the updated dataset. In this study, both datasets have been used. Several preprocessing steps have been applied, which are collectively referred to as the ''parasite inflator''. A lightweight CNN has been used as the feature extractor, while DELM has been used as the final classifier.
The main contributions of this paper are as follows:
• Two datasets, the original and the modified malaria dataset, have been used in this study.
• Small, darker parasite spots have been enlarged using the proposed ''Parasite Inflator'' so that malaria is easier to detect.
• The proposed CNN-DELM model shows the best performance for both the original and the updated dataset.
The rest of the paper is organized as follows. Section II reviews recent works on malaria classification. Section III describes the different steps of the proposed method. Section IV presents the results of the different approaches and compares the outcomes with the findings of other recent studies. The key conclusions are presented in Section V.

II. RELATED WORK
Malaria, as a life-threatening disease, has attracted the attention of researchers all around the world. Malaria was formerly diagnosed mainly in the laboratory, demanding a great deal of human expertise. Nowadays, most researchers use automatic systems based on ML and DL to detect malaria from RBC images. In this section, some of these recent and well-known studies are described. Rajaraman et al. developed a customized deep CNN model to extract features from parasitized and uninfected cell images [9]. They used various transfer learning (TL) models, for instance, AlexNet [10], VGG-16 [11], ResNet-50 [12], Xception [13], etc., for the detection of malaria and achieved a high accuracy of 0.959 using VGG-16. Masud et al. proposed leveraging a deep CNN for real-time detection of malaria from RBC images [14]. They developed a custom CNN using cyclical stochastic gradient descent (SGD) as the optimizer and achieved an accuracy of 97.30%. Rosado et al. used an SVM to identify malaria parasites and white blood cells using smartphones [15].
Maqsood et al. used various TL models for the detection of malaria from RBC images [16]. Further, they developed a customized deep CNN model that outperforms the other TL models. Before training their customized model, they applied bilateral filtering and image augmentation techniques to emphasize features of RBC. Sriporn et al. used the Xception TL model with various types of activation functions and optimizers for the detection of malaria [17]. They achieved a high accuracy of 99.28% when combining the Nadam optimizer with the Mish activation function. Jain et al. proposed a low-cost, simple CNN that requires no preprocessing of cell images and achieved an accuracy of 97% [18]. Dong et al. proposed DL methods for automatic identification of malaria-infected cells [19]. They used three transfer learning models, namely LeNet, AlexNet, and GoogLeNet, and achieved an accuracy of over 95%. Yang et al. customized a CNN for the classification of malaria parasites in thick blood smear images [20]. To select parasite candidates, they used an intensity-based Iterative Global Minimum Screening (IGMS) on a thick smear image and achieved an accuracy of 93.46% ± 0.32% and AUC of 98.39% ± 0.18%. Shah et al. proposed a deep CNN for the detection of malaria from RBC images and achieved an accuracy of 95% [21].
Khan et al. derived aggregated features from RBC images using the aggregated Laplacian coefficient [22]. After that, they used a random forest classifier to classify malaria from these derived features and achieved a recall of 86%. Olugboja et al. used a support vector machine (SVM) and a CNN to detect malaria from a private dataset of 2565 RBC images collected from the University of Alabama at Birmingham [23]. They achieved an accuracy of 95% and 91.66% using CNN and SVM, respectively. Fuhad et al. derived features from RBC images using a CNN for automatic diagnosis of malaria [24]. They also applied various techniques, for instance, knowledge distillation and data augmentation. For the classification of malaria, they used SVM and k-nearest neighbors (KNN) and achieved an accuracy of 99.23%. Anggraini et al. introduced a system that uses image segmentation techniques to separate blood cells from their background [25]. They performed segmentation by global thresholding to retrieve erythrocytes and other blood cell components in each image. Gopakumar et al. focused on a stack of images and developed a custom CNN [26]. The cell counting problem was recast as a segmentation problem, and a two-level segmentation technique was proposed.
Liang et al. proposed an ML technique based on a CNN to automatically categorize single cells in thin films [27]. Using 27,578 single-cell images, ten-fold cross-validation was performed, achieving an average accuracy of 97.37% with a new 16-layer CNN model. Tomari et al. applied global thresholding to the green channel of the color image to extract RBCs from the background [28]. Then, a morphological filter and connected component labeling were used to remove noise and holes in the RBCs. Following that, geometrical features were extracted from the RBCs. They used an Artificial Neural Network (ANN) to detect malaria from this derived information. Diaz et al. used an SVM to classify preprocessed blood smear images and detect infected erythrocytes [29]. They used a dataset of 450 malaria images, and their model performed well in terms of specificity and sensitivity.
Pattanaik et al. proposed a computer-aided diagnosis (CAD) model for the detection of malaria from cell images [30]. They used an artificial neural network with a functional connection, with sparse stacking used to pre-train the parameters of the system, and achieved a classification accuracy of 89.10% and sensitivity of 93.90%. Bibin et al. proposed a novel deep belief network (DBN) for the classification of malaria from RBC images [31]. For the classification of 4100 images of peripheral blood smears into the parasite or non-parasite class, they used a trained model based on a DBN and achieved an f1-score and sensitivity of 89.66% and 97.60%, respectively.

III. PROPOSED DELM ARCHITECTURE AND CONSTITUENTS
For the past few decades, researchers have concentrated on developing various computer-aided systems that can help detect many life-threatening disorders from medical images. Many diseases, such as malaria, heart disease, breast cancer, and brain tumors, demand early detection to save lives. It is critical to create a system that identifies these diseases early to save money and lives. This research introduces a new architecture for diagnosing malaria parasites. The RBC images have been obtained from Kaggle. For pre-processing of the cell images, this study used established image processing techniques such as morphological processing, which are explained later. Once pre-processing has been done, a lightweight CNN extracts the most informative features. Additionally, a newly structured DELM method has been presented for identifying malaria-infected cells. Figure 1 shows the proposed approach.

A. DESCRIPTION OF DATASET
The malaria dataset has been collected from Kaggle [32]. It contains 27,558 images of RBC. The dataset contains equal numbers of malaria-infected and uninfected samples; hence, it is a balanced dataset.
During analysis of the dataset, some of the samples caused confusion, which indicates that some samples may be mislabeled. Fuhad et al. [24] encountered the same problem and consulted a medical expert to solve it. They removed 647 falsely labeled parasitized images, referred to as false parasitized, and 750 falsely labeled uninfected images, referred to as false uninfected, so that the remaining samples are labeled correctly. They have shared the updated samples through Google Drive [33].
The updated dataset contains 13,132 infected and 13,029 uninfected samples. This study has been carried out using both the original and updated datasets. Figure 2 shows some samples with labeling issues.
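As a quick consistency check, the sample counts quoted above can be reproduced with a few lines of arithmetic. The per-class count of 13,779 is an inference from the statement that the original dataset is balanced, and all variable names are illustrative only.

```python
# Sample counts reported for the original and updated malaria datasets.
ORIGINAL_TOTAL = 27_558
per_class = ORIGINAL_TOTAL // 2      # balanced dataset (inferred): 13,779 per class

false_parasitized = 647              # removed from the infected class
false_uninfected = 750               # removed from the uninfected class

updated_infected = per_class - false_parasitized
updated_uninfected = per_class - false_uninfected
updated_total = updated_infected + updated_uninfected

print(updated_infected, updated_uninfected, updated_total)  # 13132 13029 26161
```

The result agrees with the 13,132 infected and 13,029 uninfected samples stated for the updated dataset.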

B. PRE-PROCESSING
Image pre-processing is a crucial step for this type of study because the model outcome depends heavily on the pre-processing techniques applied. It makes the learning process smoother.

1) PARASITE INFLATOR
To detect malaria, a thin or thick blood smear is stained with chemicals and then a microscope is used to recognize the parasite. Due to the staining, the parasite can be identified as a darker spot in the blood smear.
In this study, any small darker spot that represents a parasite has been enlarged so that it is easier to classify. As the background of the malaria images is black, the background has first been converted to white. After that, the darker parasite spots have been inflated using erosion, which enlarges dark regions on a light background. The erosion uses a 4 × 4 rectangular kernel and has been performed 15 times to inflate the spots sufficiently. Figure 3 shows some samples before and after preprocessing. After preprocessing, the small parasites are larger and more easily recognizable.
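The inflation step described above can be sketched with plain NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes a single-channel (grayscale) image, treats pixels with value 0 as background, and pads the border with white; the function names `whiten_background`, `erode`, and `parasite_inflator` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def whiten_background(gray, threshold=0):
    """Turn the black background (pixels <= threshold, an assumed rule) white."""
    out = gray.copy()
    out[gray <= threshold] = 255
    return out


def erode(gray, ksize=4):
    """Grayscale erosion: each pixel becomes the minimum of its ksize x ksize window."""
    before = ksize // 2
    after = ksize - 1 - before
    # Pad with white so the image border never introduces artificial dark pixels.
    padded = np.pad(gray, ((before, after), (before, after)), constant_values=255)
    windows = sliding_window_view(padded, (ksize, ksize))
    return windows.min(axis=(-2, -1))


def parasite_inflator(gray, ksize=4, iterations=15):
    """Whiten the background, then erode repeatedly so dark spots grow."""
    out = whiten_background(gray)
    for _ in range(iterations):
        out = erode(out, ksize)
    return out


# Toy example: a 2 x 2 dark spot on a lighter cell, inflated with 2 iterations.
cell = np.full((64, 64), 200, dtype=np.uint8)
cell[30:32, 30:32] = 50
inflated = parasite_inflator(cell, iterations=2)
print((cell == 50).sum(), (inflated == 50).sum())  # 4 64
```

Because erosion takes the local minimum, each pass with a 4 × 4 kernel widens a dark spot by three pixels per axis, which is why a small spot becomes clearly visible after repeated passes.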

2) RESIZING
The malaria dataset contains images of different sizes, but images of the same size are required to conduct the analysis. To make the analysis faster, the images have been resized to 32 × 32 pixels. Images have been resized after applying the parasite inflator.

3) NORMALIZATION
The pixel intensity of an image varies from 0 to 255. Each pixel value acts as a feature in the CNN model. Normalization is used to convert all the pixel values into the range of 0 to 1, which reduces the learning complexity. It is done by dividing the image pixel values by 255.
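The resizing and normalization steps can be sketched as below. The interpolation method is not stated in the text, so nearest-neighbour sampling is used here purely as an assumption; `resize_nearest`, `normalize`, and `preprocess` are hypothetical helper names.

```python
import numpy as np


def resize_nearest(img, size=32):
    """Resize an H x W (x C) image to size x size by nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]


def normalize(img):
    """Map 8-bit pixel values from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0


def preprocess(img, size=32):
    """Resize first, then normalize."""
    return normalize(resize_nearest(img, size))


# Example with a random stand-in for a cell image of arbitrary size.
rng = np.random.default_rng(0)
cell = rng.integers(0, 256, size=(130, 121, 3), dtype=np.uint8)
x = preprocess(cell)
print(x.shape, x.dtype)
```

Dividing by 255 keeps the relative contrast of the image unchanged while placing every feature on the same 0 to 1 scale.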
The preprocessing can be represented in this way:  [34]. It will help handle more data and get better data management.
Because of the intricate imaging attributes of cell images, a new CNN model has been designed to find 512 of the most significant features for malaria identification. Figure 4 shows the proposed shallow CNN for feature extraction. The proposed CNN model uses three convolutional layers and two fully connected layers. Each convolutional layer is followed by batch normalization and max-pooling. Batch normalization normalizes the inputs of the layers to speed up training and increase stability [35]. Max-pooling with 2 × 2 filters has been used, which extracts the most significant parts of the images by taking the greatest value in each window of the convolutional feature maps [36], [37], [38]. ''SAME'' padding has been used in the first convolution layer so that the filters are applied to all positions of the image; border elements are therefore examined, since they may contain useful features. In contrast, ''VALID'' padding adds no extra padding and ignores the border components. To avoid vanishing gradients, ReLU has been used as the activation function [18]. After the final dense layer, a sigmoid has been applied.
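To make the effect of padding and 2 × 2 max-pooling concrete, the following sketch traces the spatial size of a 32 × 32 input through three convolution and pooling stages. The 3 × 3 kernel, the stride of 1, and the use of one padding mode for all three layers are assumptions made for illustration (the layer details are given in Table 1, which is not reproduced here); the helper names are hypothetical.

```python
import math
import numpy as np


def conv_output_size(n, kernel, stride=1, padding="SAME"):
    """Spatial output size of a convolution for the two common padding modes."""
    if padding == "SAME":
        return math.ceil(n / stride)                 # borders are zero-padded
    if padding == "VALID":
        return math.ceil((n - kernel + 1) / stride)  # borders are dropped
    raise ValueError(f"unknown padding: {padding}")


def max_pool_2x2(x):
    """2 x 2 max-pooling with stride 2 on a 2-D feature map."""
    h, w = x.shape
    x = x[: h - h % 2, : w - w % 2]                  # drop an odd trailing row/column
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def trace_sizes(n, padding, kernel=3, n_layers=3):
    """Feature-map size after each (convolution -> 2 x 2 max-pool) stage."""
    sizes = []
    for _ in range(n_layers):
        n = conv_output_size(n, kernel, padding=padding) // 2
        sizes.append(n)
    return sizes


print(trace_sizes(32, "SAME"))    # [16, 8, 4]
print(trace_sizes(32, "VALID"))   # [15, 6, 2]
print(max_pool_2x2(np.array([[1, 2, 5, 6], [3, 4, 7, 8]])))  # [[4 8]]
```

With ''SAME'' padding only the pooling layers shrink the feature map, whereas ''VALID'' padding also trims the border at every convolution, which matters for inputs as small as 32 × 32.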
Dropout regularization has been used to overcome the overfitting problem: throughout the training phase of a neural network, randomly selected nodes are dropped during the weight update phases [39]. The Adam optimizer has been used because it is well suited to CNNs and improves training on vast volumes of data, and binary cross-entropy has been used to calculate the loss [40]. The model has been trained with a learning rate of 0.001 for 100 epochs with a batch size of 1024. Finally, 512 prominent features have been derived from the last dense layer. Table 1 summarizes the lightweight CNN model.
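For reference, the sigmoid output and the binary cross-entropy loss mentioned above can be written out in a few lines of NumPy. This is a generic sketch of the standard definitions; the clipping constant `eps` is an assumption added only to avoid taking the logarithm of zero.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1.0 - eps)
    losses = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return float(-np.mean(losses))


# Labels: 1 = infected, 0 = uninfected (illustrative values only).
labels = [1, 0, 1, 0]
confident = sigmoid(np.array([3.0, -3.0, 2.0, -2.0]))
uncertain = sigmoid(np.zeros(4))
print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, uncertain))
```

The loss is small when confident predictions agree with the labels and equals ln 2 when every prediction is 0.5, which is why it is a natural training objective for a two-class sigmoid output.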

D. EXTREME LEARNING MACHINE
To reduce the training time caused by the repeated tuning of model parameters, ELM was developed by Huang et al. [41]. It is a feedforward neural network (NN) consisting of an input layer, a single hidden layer, and an output layer. It can attain the lowest training error. For a regular NN, network parameters such as weights and biases are initialized arbitrarily and the back-propagation (BP) algorithm is used to optimize them. In contrast, in ELM the weights between the input and hidden layers are chosen arbitrarily and are not optimized, while the weights between the hidden and output layers are calculated analytically using the Moore-Penrose pseudoinverse. The ELM architecture is simple and has no iterative parameter tuning, which makes the training process faster while achieving adequate performance in disease classification [42], [43]. Figure 5 shows the proposed DELM model for malaria classification. In ELM, 500 hidden nodes have been used. In this study, DELM has also been used, with 500 hidden nodes in each hidden layer. ELM has no vanishing gradient issue; nevertheless, the ReLU activation function has been employed in the hidden nodes to provide non-linearity. The number of hidden nodes and the ReLU activation function have been chosen through trial and error.
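The single-hidden-layer ELM described above can be sketched in NumPy as follows. This is a minimal illustration of the general ELM idea (random, untrained hidden weights and output weights obtained with the Moore-Penrose pseudoinverse), not the authors' code and not the DELM procedure of [44], [45]. The Gaussian weight initialization, the 0.5 decision threshold, and the names `train_elm` and `predict_elm` are assumptions.

```python
import numpy as np


def train_elm(X, y, n_hidden=500, seed=0):
    """Fit a single-hidden-layer ELM with ReLU hidden units."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never tuned
    b = rng.normal(size=n_hidden)                 # random hidden biases, never tuned
    H = np.maximum(X @ W + b, 0.0)                # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                  # analytic output weights
    return W, b, beta


def predict_elm(X, W, b, beta, threshold=0.5):
    """Return 0/1 class labels by thresholding the ELM output."""
    H = np.maximum(X @ W + b, 0.0)
    return (H @ beta >= threshold).astype(int)


# Toy stand-in for the standardized CNN features (illustrative data only).
rng = np.random.default_rng(1)
features = rng.normal(size=(40, 5))
labels = (features[:, 0] > 0).astype(float)
W, b, beta = train_elm(features, labels, n_hidden=200)
train_accuracy = (predict_elm(features, W, b, beta) == labels).mean()
print(train_accuracy)
```

Training involves a single pseudoinverse rather than many gradient steps, which is the source of the speed advantage discussed in the text.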
ELM can be represented using the following steps:
DELM has been proposed by Ding [44]. DELM can be implemented in a similar manner. The required steps are given below [45]:

A. EVALUATION METRICS
To determine the effectiveness of the models, several evaluation measures have been used, namely accuracy, precision, recall, f1-score, and the receiver operating characteristic (ROC) curve. All of these criteria have been calculated using a confusion matrix.
Accuracy is defined as the percentage of accurately detected cases among all cases. It demonstrates how good the classification system is at detecting patterns [46], [47].
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where true positive, true negative, false positive, and false negative are represented as TP, TN, FP, and FN. In this study, precision has been measured as the percentage of patients who actually have malaria out of all those identified as having the disease [48].
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall has been measured as the percentage of malaria patients correctly identified as having malaria out of all patients who have the disease [48].
$$\text{Recall} = \frac{TP}{TP + FN}$$
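The criteria above can be computed directly from the four confusion-matrix counts, as in the short sketch below. The f1-score is included using its standard definition as the harmonic mean of precision and recall; the counts in the example are made up for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and f1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # correct positives among predicted positives
    recall = tp / (tp + fn)             # correct positives among actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, used only to show the calculation.
scores = classification_metrics(tp=90, tn=85, fp=10, fn=15)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Precision and recall differ only in their denominators (predicted positives versus actual positives), which is why both are reported alongside accuracy.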

B. ENVIRONMENTAL SETUP
The Python code has been run with TPUs on the online platform Kaggle. Keras has been used to implement the algorithms, with TensorFlow as the framework. MS Excel has been used to prepare the graphical representations of this study. A PC with a 64-bit Windows 10 operating system has been used.

C. ORIGINAL DATASET
This section discusses the experiments that have been conducted and the results obtained using various performance measures. First, the RBC images have been collected from Kaggle and pre-processed. Then, a shallow CNN has been developed to extract the most informative features from these processed images and has also been used for the classification of malaria. Finally, ELM and DELM have been proposed to detect malaria more accurately and efficiently than CNN. The algorithms have been trained using 24802 processed images, of which 12387 are uninfected and 12415 are from patients infected with malaria. To evaluate the performance and effectiveness of the models, 2756 images have been used for testing, of which 1392 are uninfected and 1364 are infected cell images. A confusion matrix (CM) has been used to calculate the accuracy, precision, recall, f1-score, and AUC to examine the reliability of the proposed models. Figures 6, 7, and 8 show the CM for CNN, CNN-ELM, and CNN-DELM, respectively. Figures 9 and 10 show the accuracy and loss curves for the CNN. The training and testing accuracy for the detection of malaria are 97.45% and 96.59%, respectively, whereas the losses are 0.1126% and 0.1472%, respectively. All the scores of the CNN are presented in Table 2.
The area under the receiver operating characteristic (ROC) curve of the CNN model is 98.98%, as shown in Figure 11. It serves as a means of determining the consistency of a learning model [50]. After extracting the most prominent features using CNN, standardization has been performed on these features. Then ELM has been applied for the detection.
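The text does not specify which form of standardization is applied to the extracted features. A common choice, shown here only as an assumption, is z-score standardization with the mean and standard deviation estimated on the training features and reused for the test features; the helper names are hypothetical.

```python
import numpy as np


def fit_standardizer(train_features, eps=1e-8):
    """Estimate per-feature mean and standard deviation on the training set."""
    mean = train_features.mean(axis=0)
    std = np.maximum(train_features.std(axis=0), eps)  # guard against constant features
    return mean, std


def standardize(features, mean, std):
    """Shift and scale each feature to zero mean and unit variance."""
    return (features - mean) / std


# Stand-in for 512-dimensional CNN features (random values for illustration).
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=3.0, size=(100, 512))
test = rng.normal(loc=5.0, scale=3.0, size=(20, 512))

mean, std = fit_standardizer(train)
train_z = standardize(train, mean, std)
test_z = standardize(test, mean, std)   # test set reuses the training statistics
print(train_z.mean().round(6), train_z.std().round(6))
```

Putting all features on a comparable scale keeps the randomly weighted hidden layer of an ELM from being dominated by a few large-valued features.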
First, CNN-ELM has been used and achieved promising accuracy, precision, and recall of 97.68%, 98.09%, and 97.23%, respectively, as shown in Table 3. Then, using CNN-DELM, the model's performance increased, with accuracy and recall of 97.79% and 97.67%, as shown in Table 4. The ROC of the CNN-ELM and
to train the models; 11759 have been from healthy people, while 11784 have been from malaria-infected people. The performance and efficacy of the suggested models have been evaluated using 2616 images, including 1269 images of uninfected cells and 1347 images of infected cells. The accuracy, precision, recall, f1-score, and AUC of the proposed models have been calculated using a confusion matrix (CM). The CM for CNN, CNN-ELM, and CNN-DELM are presented in Figures 15, 16, and 17, respectively. The accuracy and loss curves for the CNN are shown in Figures 18 and 19. For the identification of malaria, the training and testing accuracy are 99.59% and 99.50%, with losses of 0.0065% and 0.0372%, respectively. All the scores of the CNN with the updated images are presented in Table 5.
Figure 20 shows that the CNN model has an area under the ROC curve of 99.99%. Following the extraction of the most significant features using CNN, these features have been standardized. Then ELM and DELM have been used to detect malaria. First, CNN-ELM has been employed, with accuracy, precision, and recall of 99.62%, 99.61%, and 99.62%, respectively, as indicated in Table 6. Then the CNN-DELM model has been applied and achieved an improved accuracy and recall of 99.66%, as presented in Table 7. According to Figures 21 and 22, the ROC of the CNN-ELM and CNN-DELM is 99.49% and 99.52%, respectively.
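The ROC figures quoted in this section are areas under the ROC curve. As a reminder of what that number measures, the sketch below computes it as the probability that a randomly chosen infected sample receives a higher score than a randomly chosen uninfected one, with ties counted as one half. This pairwise formulation is a standard equivalent of the area under the ROC curve; the function name `roc_auc` and the example scores are illustrative only.

```python
import numpy as np


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]          # every positive-negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / diff.size)


# Small illustrative example: 3 of the 4 positive-negative pairs are ranked correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

A value of 1.0 means every infected sample is ranked above every uninfected sample, while 0.5 corresponds to random ranking.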

E. COMPARISON OF PERFORMANCE TO OTHER WORKS
The performance of the proposed DELM with a shallow CNN for malaria classification from processed RBC images has been evaluated and compared with that of other current models. Comparisons have been performed for both original and modified images. The current state-of-the-art models have been summarized in Section II. Table 8 presents the comparative analysis. Existing methods have used different train-test ratios, so the proposed method has also been evaluated with different train-test ratios to conduct the comparison. Based on the ratios, the table is divided into four parts: the first three parts are for the original dataset, and the last part is for the modified dataset.
Rajaraman et al. used the Kaggle RBC images for the detection of malaria and achieved the highest accuracy of 95.90% using ResNet-50 [9]. Fatima et al. proposed a computer-aided system for the prediction of malaria [51]. They used four filters (average, Gaussian lowpass, median, and bilateral) to remove noise and enhance the quality of cell images. They achieved high accuracy and f1-score of 91.80% and 91.53% using the bilateral filter. Maqsood et al. trained their customized CNN model using 17370 cell images and used 8272 images for testing [16]. They achieved accuracy and f1-score of 96.82%. The proposed CNN-DELM has been trained and tested using the same split of cell images from the original dataset and achieved accuracy and recall of 97.59% and 97.58%, respectively, as shown in Table 8.
Jain et al. used the Kaggle RBC images to train their various models [18]. They achieved the highest accuracy of 97.00% using 9926 cell images for testing their CNN model. The proposed CNN-DELM model has been trained and tested using the same dataset and achieved an accuracy of 97.26%.
Khan et al. proposed three ML classifiers for the classification of malaria from cell images [22]. They tested their models using 5511 images from Kaggle and achieved the highest recall and precision of 0.86 and 0.82 using a random forest classifier. The same testing dataset has been used for CNN-DELM, which achieved recall and precision of 98.63%, considerably higher than their results.
Fuhad et al. corrected the RBC images with the help of a medical expert because some of the images were mislabeled [24]. They found 647 false parasitized and 750 false uninfected images, removed these falsely labeled images, and thereby reduced the data from 27558 to 26161 images. They trained their CNN-SVM and CNN-KNN models using these modified cell images and attained an accuracy of 99.23% and a recall of 99.54%. The same modified images have been used for testing the proposed CNN-DELM, which has obtained the highest accuracy and recall of 99.66% and 99.55%, respectively. Figure 24 shows the graphical comparison of the proposed method with CNN [24].
From the comparative performance analysis presented in Table 8, the proposed model shows its robustness. It outperforms the existing methods on different performance benchmarks. Table 9 shows different state-of-the-art models and the proposed models with brief structural descriptions. From Table 8 and Table 9, it can be said that the comparison is fair: the same number of test samples from the same dataset (original or modified) has been used for each comparison. Data preprocessing (using the proposed parasite inflator) and feature extraction techniques have been used, similar to the existing methods. Further, ELM has been used as a strong classifier. The combination of the feature extraction capability of CNN and the excellent classification capability of DELM leads to promising outcomes. Hence, the unconventional structure of the proposed model can claim novelty.
The datasets used in this study, both original and updated, are balanced and large enough to support the robustness of the model. There is color variation in the RBC images of this dataset. As chemicals have been used to recognize malaria parasites in RBC, the color may change with the amount and
In this study, images are good enough. The effectiveness of this framework with low-quality images is excluded from the scope of this study.

V. CONCLUSION
In this paper, a novel CNN-DELM has been presented for the automated diagnosis of malaria from RBC images. The results show that the detection accuracy improves when falsely labeled images are removed. The malaria parasites are highlighted using several morphological techniques. A lightweight CNN has been trained on preprocessed cell images to extract the most informative features. DELM then successfully differentiates between infected and uninfected samples using these features. The proposed parasite inflator, the CNN's capability to extract features, and DELM's ability to generalize make the proposed approach more effective. The efficacy of the proposed framework is clear from its outcomes: the CNN-DELM has obtained 97.79% accuracy and 97.88% recall for the original dataset, and 99.66% accuracy and 99.55% recall for the updated dataset. Since the proposed framework outperformed the state-of-the-art models, the results of this study are expected to help medical professionals detect malaria and diagnose malaria-affected individuals more quickly and efficiently.
We have a plan to work on the multiple hidden layer ELM algorithm with CNN feature extractor and parasite inflator for malaria detection.
Research and Development Institute, and an Assistant Professor of electrical and electronics engineering at the University of Dhaka. He is currently a Senior Lecturer in computer science with the University of Huddersfield, U.K. He has authored more than 100 publications in peer-reviewed international journals. His research interests include applied AI, machine learning, data science, and the IoT.
MD. NAHIDUZZAMAN received the B.Sc. degree in computer science and engineering from the Rajshahi University of Engineering and Technology (RUET), Kazla, Rajshahi, Bangladesh, in 2018. He is currently a Lecturer with the Department of Electrical and Computer Engineering, RUET. Apart from this, he also works as a Research Assistant with the Qatar University Machine Learning Research Group. He has several peer-reviewed journal publications. His research interests include machine learning and its applications in disease detection, power sector, and agriculture. He is a Reviewer of BioMedical Engineering OnLine.
MD. ROBIUL ISLAM received the B.Sc. degree in computer science and engineering from the Rajshahi University of Engineering and Technology (RUET), Kazla, Rajshahi, Bangladesh, in 2017, where he is currently pursuing the M.Sc. degree. He is also an Assistant Professor with the Department of Electrical and Computer Engineering, RUET. As a data scientist, he is fascinated by topics like segmentation, machine learning, and pattern recognition.
MD. SHAMIM ANOWER (Member, IEEE) received the Ph.D. degree (Hons.) in electrical engineering from the University of New South Wales, in 2012. He is currently a Professor with the Department of Electrical and Electronic Engineering, Rajshahi University of Engineering and Technology. He has more than 150 peer-reviewed publications to his credit, including journal (18 Q1 Scimago ranked) and conference articles with H-indices of 20, i10-indices of 33, and 1205 citations. Energy and mining technology, quantum information, advanced digital, data science, and ICT, and cyber security are the three focus areas covered by these reports. He is also an IEB Fellow.