Development and Validation of an Early Scoring System for Prediction of Disease Severity in COVID-19 Using Complete Blood Count Parameters

The coronavirus disease 2019 (COVID-19), after breaking out in Wuhan, spread rapidly throughout the world. A fast, reliable, and easily accessible clinical assessment of disease severity can help in allocating and prioritizing resources to reduce mortality. The objective of this study was to develop and validate an early scoring tool to stratify the risk of death using readily available complete blood count (CBC) biomarkers. A retrospective study was conducted on twenty-three CBC blood biomarkers for predicting disease mortality in 375 COVID-19 patients admitted to Tongji Hospital, China from January 10 to February 18, 2020. Machine learning was used to identify the key biomarkers among the CBC parameters that predict mortality. A multivariate logistic regression-based nomogram and scoring system were developed to categorize patients into three risk groups (low, moderate, and high) for predicting mortality risk among COVID-19 patients. Lymphocyte count, neutrophil count, age, white blood cell count, monocytes (%), platelet count, and red blood cell distribution width, collected at hospital admission, were selected as the important biomarkers for death prediction using a random forest feature selection technique. A CBC score was devised for calculating the death probability of the patients and was used to categorize them into three risk groups: low (<=5%), moderate (>5% and <=50%), and high (>50%). The areas under the curve (AUC) of the model for the development and internal validation cohorts were 0.961 and 0.88, respectively. The proposed model was further validated with an external cohort of 103 patients from Dhaka Medical College, Bangladesh, yielding an AUC of 0.963. The proposed CBC parameter-based prognostic model and the associated web application can help medical doctors improve patient management through early prediction of the mortality risk of COVID-19 patients in low-resource countries.


I. INTRODUCTION
COVID-19, first recorded in Wuhan, China, in December 2019, has quickly spread throughout the world, and some parts of the world are even suffering from second and third waves of the pandemic. As of July 12, 2021, there were 187 million confirmed cases worldwide in more than 206 countries, with 4.03 million deaths caused by COVID-19 [1]. COVID-19 was declared a pandemic by the World Health Organization (WHO) on March 11, 2020 [2]. The coronavirus mostly affects the lungs of patients and leads to pneumonia [3]. The majority of patients were mildly affected by the disease, with common respiratory symptoms [4]; fever and cough are the most common clinical symptoms. In around 20% of cases, the radiographic chest images did not show any abnormalities in the initial stages of infection [5]. According to the sixth edition of the Novel Coronavirus Pneumonia Diagnosis and Treatment Plan, severe cases should meet one or more of the following criteria: 1) shortness of breath (respiratory rate of 30 breaths per minute or more), 2) oxygen saturation of 93 percent or lower at rest, or 3) arterial partial pressure of oxygen/fraction of inspired oxygen of 300 mm Hg or lower [6]. Roughly 10-15% of patients associated with severe outcomes showed extreme conditions, such as severe pneumonia, acute respiratory distress syndrome (ARDS), or multiple organ failure, before or during hospitalization [7]-[9]. A large cohort study of 2,449 patients showed that high hospitalization (20-31%) and intensive care unit (ICU) admission rates (4.9-11.5%) have overwhelmed the healthcare system [8]. This can be prevented by prioritizing hospital care for patients who are at high risk of deterioration and death while treating low-risk patients in ambulatory settings or at home-based self-quarantine facilities. As a result, specific predictive methods for estimating the risk of severe COVID-19 infection are urgently needed [7].
Several studies have shown that biomarkers can assist in the classification of COVID-19 patients at increased risk of serious disease and mortality by providing vital information about their health status. Clinical machine-learning-based nomograms have been developed and proposed by several groups [10], [11]; these allow parameter-based risk estimation, thus easing the decision-making process for patient management. Zheng et al. [12] showed, from 141 patients in Zhejiang, China, that white blood cell count, neutrophil count, and platelet count were in the normal range for 87.9%, 85.1%, and 88.7% of the patients, respectively. Among the severe patients, 82.8% had lymphopenia, which became more pronounced with disease progression. A scoring system called NLP, based on neutrophil, lymphocyte, and platelet counts, has been shown to be useful in patient stratification. However, this model was developed on a small dataset and was not validated on any external dataset. Al Youha et al. [13] proposed the Kuwaiti Progression Indicator (KPI) Score as a prognostic model for predicting COVID-19 severity progression. Unlike other self-reported symptoms and arbitrary parameter-driven scoring schemes, the KPI model was based on quantifiable laboratory readings. The KPI score categorizes patients as low risk if their score is below −7 and high risk if their score is above 16, but the authors consider the risk of advancement in the intermediate category (patients with scores between −6 and 15) to be unknown. Many prognostic systems, however, fall into this intermediate category. Weng et al. [14] used 301 adult patients to build an early prediction score called ANDC to predict mortality risk for COVID-19 patients.
Age, neutrophil-to-lymphocyte ratio (NLR), D-dimer, and C-reactive protein reported during admission were identified as mortality predictors for COVID-19 patients using least absolute shrinkage and selection operator (LASSO) regression [14]. A nomogram with an integrated score, ANDC, was proposed to ascertain the death probability, which demonstrated a good association between the true and predicted outputs. Two cut-off values of the ANDC score were used to divide COVID-19 patients into three risk categories: low, moderate, and high. In the low-risk, moderate-risk, and high-risk groups, the death likelihood was less than 5%, 5% to 50%, and more than 50%, respectively. Ramachandran et al. [15] showed that elevated red blood cell distribution width (RDW) in hospitalized COVID-19 patients is associated with a substantially increased risk of mortality and septic shock. However, other blood count parameters, which were not mentioned in that article, should be investigated in relation to RDW. Based on 372 COVID-19 patients from China, Gong et al. [16] showed that one demographic and six serological markers (serum lactate dehydrogenase, C-reactive protein, the coefficient of variation of red blood cell distribution width (RDW), blood urea nitrogen, albumin, and direct bilirubin) were linked to extreme COVID-19. However, the performance of the reported models degraded on the validation cohort. Elevated RDW at both admission and diagnosis was found to be related to an increased mortality risk based on 1,198 adult patients diagnosed with COVID-19 at four hospitals between March 4, 2020, and April 28, 2020 [17]. Jianfeng et al. [18] proposed a prognostic model using lactate dehydrogenase, lymphocyte count, age, and oxygen saturation (SpO2) as primary predictors of COVID-19-related death based on a cohort of 444 patients. Internal and external validation showed strong discrimination, with C-statistics of 0.89 and 0.98, respectively.
However, external validation revealed over- and under-prediction for low-risk and high-risk patients, respectively, even though the model was promising in internal validation. Yan et al. [19] used a machine learning method to identify three biomarkers (lactate dehydrogenase (LDH), lymphocytes, and high-sensitivity C-reactive protein (hs-CRP)) and used them to predict individual patients' mortality 10 days in advance with over 90% precision. High levels of LDH, in particular, have been shown to be important in distinguishing the majority of patients who need immediate medical attention. However, no scoring system was introduced in that study that could aid clinicians in quantitatively stratifying patients at risk.
Using a cohort of 1,590 patients from 575 medical centers, Liang et al. [20] proposed a deep learning model to develop an online calculator for patient triage at admission by identifying the severity of illness. This ensures that the patients at the highest risk receive adequate treatment as soon as possible, thereby maximizing healthcare resource utilization. This model is a very useful technique for patient stratification; however, it depends on demography, radiography, and other clinical criteria as well as comorbidity data to make a decision, which is not always accessible in low-resource countries. Wang et al. [21] found that the combination of the neutrophil-to-lymphocyte ratio (NLR) and red cell distribution width standard deviation (RDW-SD) is the best hematology index for predicting the severity of COVID-19 patients. However, only 45 COVID-19 patients were included in that study. Huang et al. [22] used nine independent risk factors at hospital admission to quantify a risk score and stratify the patients into various risk groups in a retrospective, multicenter analysis of 336 confirmed COVID-19 patients and 139 control patients. That research did not use any external validation. The independent relationship between the baseline level of four indicators (NLR, LDH, D-dimer, and computed tomography (CT) score) on admission and the severity of COVID-19 was assessed using logistic regression. The presence of high levels of NLR and LDH in serum could help in the early detection of COVID-19 patients who are at high risk, and it was shown that using LDH and NLR together increased the detection sensitivity [23]. This model, however, is based on a CT image-based ranking, which is not available for all patients. In a limited number (84) of hospitalized patients with COVID-19 pneumonia, Liu et al. [24] suggested combining the NLR and CRP to predict 7-day disease severity.
A retrospective cohort of 80 COVID-19 patients treated at Beijing You'an Hospital was analyzed to identify risk factors for serious and even fatal pneumonia and to establish a scoring system for prediction, which was later validated in a group of 22 COVID-19 patients [25]. Age, diabetes, coronary heart disease (CHD), percentage of lymphocytes (%LYM), procalcitonin (PCT), serum urea, CRP, and D-dimer were found to be correlated with mortality by LASSO binary logistic regression in a cohort of 2,529 COVID-19 patients. The researchers then used multivariable analysis to determine that age, CHD, %LYM, PCT, and D-dimer were independent risk factors for mortality. A COVID-19 scoring system (CSS) was developed based on these variables to classify patients into low-risk and high-risk categories, with discrimination of AUC = 0.919 and calibration of P = 0.64 [26]. Another study on 82 COVID-19 patients found that respiratory, cardiac, hemorrhage, hepatic, and renal damage were responsible for 100%, 89%, 80.5%, 78%, and 31.7% of deaths, respectively. The majority of the patients had elevated CRP (100%) and D-dimer (97.1%) [25]. D-dimer is a prognostic factor that has also been shown to substantially increase the chance of death when it is greater than 1 µg/mL at the time of admission [27], [28]. While several predictive prognostic models for early detection of individuals at high risk of COVID-19 mortality have been proposed, there is still a significant gap in prediction models based on complete blood count (CBC) parameters built on interpretable machine learning models and a quantitative scoring framework. Measurement of multiple biomarkers for a large number of patients is difficult in many countries and healthcare facilities.
This is a critical problem for low-resource countries (LRCs); thus, it was interesting to see how well a model based on CBC parameters could stratify the risk of COVID-19 patients compared to a standard model based on all of the parameters recorded in the literature. To the best of our knowledge, no previous studies have evaluated the important biomarkers among CBC parameters as early warning models for predicting the risk of severe COVID-19.
Important CBC biomarkers were identified using machine learning algorithms in order to develop an early prediction based scoring technique, which can stratify the patients into risk groups. This can assist in better patient care based on easily accessible CBC biomarkers. The top-ranked CBC features with the best classification performance were used to construct a multivariable logistic regression-based nomogram to predict the risk of death. The results of this study include a quick, easy-to-use, and accurate algorithm for predicting high-risk individuals and can help in the efficient utilization of healthcare resources.

II. METHODOLOGY
The authors have used a publicly available clinical dataset from China to develop the machine learning model and scoring technique in this study, details of which are provided later. Moreover, the authors have collected a dataset in collaboration with medical doctors from different COVID-19 care centers in Bangladesh for external validation. Firstly, the Chinese raw dataset was pre-processed before experimenting with various popular feature ranking techniques and machine learning models. The pre-processing includes filling the missing data using data imputation techniques and then normalizing the imputed data for feature ranking and classification. The best performing combination of features (identified with the help of popular feature ranking techniques) and machine learning classifiers was investigated. The best performing classifier, logistic regression, was used to develop a multivariate nomogram-based scoring technique to detect the risk of mortality due to COVID-19. The developed nomogram was then further validated with the completely unseen external dataset of the Bangladeshi population to confirm its robust performance. The details of the complete methodology are shown in Figure 1.

A. STUDY DESIGN AND PATIENTS
Firstly, this retrospective study was performed in the COVID-19 healthcare center for confirmed patients in Wuhan, at the center of the outbreak in China. Blood samples and medical records were collected from 375 patients from 10 January 2020 to 18 February 2020. Epidemiological, demographic, clinical, laboratory, and mortality outcomes were recorded from electronic health records. This dataset of 375 COVID-19 patients was made public by Yan et al. [19], and the study was approved by the Tongji Hospital Ethics Committee.
Secondly, a retrospective study was performed between 12 April and 31 August 2020 at Dhaka Medical College Hospital, Bangladesh, which was approved by the Hospital Ethical Committee. Clinical parameters together with hospital admission and discharge/death outcomes were collected for 103 patients (survived: 61 (59.22%); died: 42 (40.78%)), and these data were used for external validation. The Bangladeshi dataset has been made publicly available by the authors and can be found in [29].

B. STATISTICAL CHARACTERISTICS
Statistical analysis of the patients' demographic, clinical, and outcome data was carried out with Stata/MP 13.0 software. Gender, age, and twenty-three complete blood count (CBC) parameters were identified from the Chinese database, which contains seventy-six biomarkers. Gender differences in the data were described using percentages, while the other variables were characterized by their missing-data counts and by the mean and standard deviation for the survival and death outcomes. Univariate analysis was conducted on gender, while Wilcoxon rank tests were performed on the rest of the variables. P-values were calculated at a 95% statistical significance threshold; therefore, a P-value below 0.05 was considered significant. Each patient has multiple blood samples; however, different patients have different parameters missing. The patient data at admission were used to identify the key predictors of disease severity. Missing data can be dealt with differently depending on the type and size of the data. In the simplest technique, patients with incomplete parameters could be removed from the study if the number of subjects is very large; however, this can lead to the loss of very useful information [30]. It is good practice to identify and replace missing data, i.e., to carry out data imputation prior to modeling for the prediction task. A popular approach to missing data imputation is to use a machine learning model to predict the missing values; this requires a model to be created for each input variable that has missing values. Several models are popularly used for this purpose, such as k-nearest neighbor (KNN), random forest, and multiple imputation using chained equations (MICE). The KNN imputation technique has proven to be generally effective for clinical data imputation [31]; therefore, the KNN technique was used in this study. The hyper-parameters of the KNN algorithm are the distance measure (e.g., Euclidean distance) and the number of contributing neighbors for each prediction. The KNN parameters were set to: number of neighbors = 5, weights = 'uniform', and distance = Euclidean.
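The imputation step described above can be sketched with scikit-learn's `KNNImputer`, using the stated hyper-parameters (5 neighbors, uniform weights, Euclidean distance). The toy matrix below is an illustrative stand-in, not the study data.

```python
# Sketch of the KNN imputation step: each missing entry is filled with the
# mean of that feature over the k nearest rows (nan-aware Euclidean distance).
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix of CBC-style readings with missing values (np.nan).
X = np.array([
    [1.2, 210.0, np.nan],
    [0.8, np.nan, 13.5],
    [1.0, 190.0, 12.9],
    [0.5, 150.0, 14.8],
    [0.6, 160.0, np.nan],
    [1.1, 205.0, 13.1],
])

# nan_euclidean is the default metric, so distances ignore missing entries.
imputer = KNNImputer(n_neighbors=5, weights="uniform")
X_imputed = imputer.fit_transform(X)

assert not np.isnan(X_imputed).any()  # every gap has been filled
```

Observed values are left untouched; only the `np.nan` entries are replaced, which is why KNN imputation is preferred over dropping incomplete patients.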
Several data normalization techniques can be used to transform the features onto a similar scale, which improves the performance and training stability of a model. Commonly used techniques are scaling to a range, clipping, log scaling, and z-score normalization. When the dataset does not contain extreme outliers, the z-score technique is suitable for normalization; therefore, the z-score technique was used for data normalization. Since the numbers of patients with death and survival outcomes were not equal, i.e., the dataset was imbalanced, a very popular clinical data augmentation technique called the Synthetic Minority Over-sampling Technique (SMOTE) was used to balance the dataset.
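A minimal numpy sketch of the two steps just described follows. Z-score normalization rescales each feature to zero mean and unit variance; the SMOTE step is reduced here to its core idea only, namely a synthetic minority sample interpolated between a minority point and one of its nearest minority neighbors (the full algorithm, as implemented in e.g. imbalanced-learn, also handles neighbor search and sampling ratios).

```python
import numpy as np

def zscore(X):
    """Column-wise z-score normalization: (x - mean) / std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def smote_sample(x_i, x_neighbor, rng):
    """One SMOTE-style synthetic point on the segment between two
    minority-class samples; the full algorithm picks the neighbor
    from the k nearest minority points."""
    gap = rng.uniform(0.0, 1.0)
    return x_i + gap * (x_neighbor - x_i)

rng = np.random.default_rng(0)
X = np.array([[50.0, 1.2], [70.0, 0.6], [60.0, 0.9], [80.0, 0.4]])
Xz = zscore(X)
assert np.allclose(Xz.mean(axis=0), 0.0)  # centered
assert np.allclose(Xz.std(axis=0), 1.0)   # unit variance

synthetic = smote_sample(X[0], X[2], rng)
# The synthetic point lies on the line segment between its two parents.
assert np.all(synthetic >= np.minimum(X[0], X[2]))
assert np.all(synthetic <= np.maximum(X[0], X[2]))
```

Because synthetic points interpolate rather than duplicate, SMOTE tends to produce a smoother decision boundary than naive oversampling of the minority class.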

C. TOP-RANKED FEATURES IDENTIFICATION
A feature selection technique automatically selects the features that contribute most to predicting the output. This reduces overfitting, improves accuracy, and reduces training time. Several different feature selection techniques are used in the literature, such as univariate selection, recursive feature elimination (RFE), principal component analysis (PCA), bagged decision trees such as random forest and extra trees, and boosted trees such as Extreme Gradient Boosting (XGBoost). In this study, the authors investigated the random forest, extra trees, and XGBoost techniques. Random forest provided the highest accuracy in selecting the top-10 features for mortality prediction among the 25 features comprising age, gender, and the CBC parameters. According to the literature, this technique is better suited to datasets with many predictor variables [32].
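The random-forest ranking step can be sketched as follows: fit a forest on labelled data and sort the features by impurity-based importance. The cohort below is a synthetic stand-in (one informative feature plus noise), not the study data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 200
# Synthetic cohort: the first feature drives the outcome, the rest are noise.
lymphocyte = rng.normal(1.0, 0.4, n)
noise = rng.normal(0.0, 1.0, (n, 3))
X = np.column_stack([lymphocyte, noise])
y = (lymphocyte < 0.9).astype(int)  # "death" when the lymphocyte proxy is low

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Indices of features sorted from most to least important.
ranking = np.argsort(forest.feature_importances_)[::-1]
# The informative feature (index 0) should come out on top.
```

In the study's setting, the same `feature_importances_` vector over the 25 candidate parameters would yield the top-10 list used in the rest of the pipeline.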

D. SELECTION OF CLASSIFICATION MODEL
In this study, several supervised machine learning (ML) classification models, such as linear discriminant analysis [33], random forest [34], support vector machine (SVM) [35], XGBoost [36], logistic regression [37], and multilayer perceptron (MLP) [38], were compared for classification. Linear discriminant analysis (LDA) finds the probability of an input belonging to the various classes and predicts based on the highest probability. SVM is a very popular ML algorithm for non-linear classification in different applications using high-dimensional feature spaces. XGBoost is a supervised ML algorithm that can be used for training data with multiple features. Logistic regression is a commonly used medical-statistics-based supervised ML model dedicated to classification tasks; the logistic function is a sigmoid function that maps real continuous values into a probability in [0, 1] [37]. A multilayer perceptron (MLP) is a type of feedforward artificial neural network (ANN) made up of at least three layers of nodes: an input layer, a hidden layer, and an output layer. The MLP utilizes a supervised learning technique called backpropagation for training [38].
Different classification models were compared using the top-10 ranked features on the test data to calculate the performance metrics for classifying death and survival cases. The best performing classifier among the aforementioned classifiers was then evaluated with different combinations of features as model input by calculating the receiver operating characteristic (ROC) area under the curve (AUC) and performance metrics such as precision, sensitivity, specificity, accuracy, and F1-score. Since the model development dataset was balanced using the SMOTE technique [39], the threshold for the ROC calculation was 0.5 [40]. The different classification algorithms, and the different feature combinations of the best performing algorithm, were validated using 5-fold cross-validation, where training and testing were done on 80% and 20% of the data, respectively, and this process was repeated 5 times to test the entire dataset. Since some CBC parameters are present in both count and percentage form, the top-ranked 10 features from the 25-feature set were identified and investigated both with count parameters and with percentage parameters. Weighted averages with 95% confidence intervals were calculated for sensitivity, specificity, precision, F1-score, and overall accuracy from the confusion matrix that accumulates all test (unseen) fold results of the 5-fold cross-validation.
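The evaluation protocol above can be sketched compactly: stratified 5-fold cross-validation of a logistic-regression classifier, with out-of-fold probabilities pooled before computing the AUC and accuracy at the 0.5 threshold. The data here are synthetic placeholders standing in for the seven CBC features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 7))  # stand-in for 7 top-ranked features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

pooled_prob = np.zeros(n)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    pooled_prob[test_idx] = model.predict_proba(X[test_idx])[:, 1]

auc = roc_auc_score(y, pooled_prob)
pred = (pooled_prob >= 0.5).astype(int)  # 0.5 threshold on balanced data
accuracy = (pred == y).mean()
```

Pooling the unseen-fold predictions before computing metrics matches the paper's approach of accumulating one confusion matrix over all five test folds.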
Here, the correct mortality prediction of a dead patient is a True Positive (TP), and the correct prediction of a survived patient is a True Negative (TN). The incorrect prediction of a dead patient as survived is a False Negative (FN), and the incorrect prediction of a survived patient as dead is a False Positive (FP).
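These four outcomes map onto the reported metrics as follows; a plain-Python helper makes the definitions explicit (the example counts are illustrative, not the paper's tallies).

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # recall of death cases
    specificity = tn / (tn + fp)                 # recall of survival cases
    precision = tp / (tp + fp)                   # reliability of a "death" call
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall correctness
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, precision, accuracy, f1

# Hypothetical fold result: 46 deaths caught, 5 missed, 36 survivors
# correctly cleared, 2 wrongly flagged.
sens, spec, prec, acc, f1 = metrics_from_confusion(tp=46, tn=36, fp=2, fn=5)
```

Weighted averages of these per-fold values, as in the paper, simply weight each fold's metric by its class support.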

E. LOGISTIC REGRESSION-BASED NOMOGRAM
A nomogram is a two-dimensional graphical tool that consists of several scaled lines arranged in such a way that they can be used to predict an outcome probability. It is an important component of modern medical decision-making. In this work, a multivariate logistic regression-based nomogram technique was used, originally developed by Alexander Zlotnik in Stata/MPv13.0 [41]. The parameters are drawn as numbered horizontal axis scales, and the patient's values are located on these scales. A vertical line is drawn down from each horizontal line to a score axis. All the scores on the score axis are added to form a total score, which is linked to a death probability. It can be noted that a higher score corresponds to a higher death probability.
Logistic regression is a statistical model that in its basic form uses a logistic function to model a binary dependent variable. Logistic regression combines the input values (x) linearly using weights or coefficient values to predict an output value (y). In a logistic regression model, the outcome variable is modeled as a binary value (0 or 1), and the odds are defined as the ratio of the probability (P) of an event occurring to the probability of it not occurring (1 − P). Therefore, the probability varies between 0 and 1, but the odds vary between 0 and infinity. The natural logarithm of the odds is the linear prediction (LP), which is a linear combination of binary (e.g., gender) or continuous (e.g., age) predictors: LP = ln(P/(1 − P)) = β0 + β1x1 + · · · + βkxk. The linear prediction can therefore be used to calculate the death probability as P = 1/(1 + e^(−LP)). Different investigations were carried out to identify the best feature combination for creating the nomogram. The best feature combination was selected based on the performance metrics and the AUC calculated from the ROC curve. To develop the nomogram and validate its performance, the entire Chinese dataset was divided into two subsets: training (70%) and internal validation (30%). The Dhaka Medical College Hospital patient cohort was used for external validation. Calibration curves were plotted using the internal and external validation sets to compare the predicted outcomes with the actual outcomes for patients with COVID-19. Decision curve analysis (DCA) was performed to obtain the threshold values of each CBC parameter, individually and in combination, to evaluate the model performance using Stata/MPv13.0.
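The inverse-logit mapping from linear prediction to death probability can be written in a few lines. The coefficients below are made-up illustrations, not the fitted values behind the paper's nomogram.

```python
import math

def death_probability(linear_prediction):
    """Inverse logit: maps LP in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-linear_prediction))

# LP = intercept + sum(coefficient * predictor value); hypothetical values
# for a 70-year-old patient with a low lymphocyte count.
intercept, coef_age, coef_lymph = -8.0, 0.09, -1.5
lp = intercept + coef_age * 70 + coef_lymph * 0.6
p = death_probability(lp)
assert 0.0 < p < 1.0
```

By construction, LP = 0 corresponds to even odds (P = 0.5), and each unit increase in LP multiplies the odds by e.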

F. EARLY WARNING CBC SCORE
In the development of the prognostic model, CBC parameters derived from an initial blood sample of the patients at their admission were used. However, these patients have multiple blood samples recorded during their hospital stay, which can be used for longitudinal model evaluation as an early predictor of the patients' outcome.
The corresponding probability of death for a given CBC score was determined from the model, and two cut-off values were identified based on 5% and 50% death probability and the associated CBC scores, grouping the patients into low-, moderate-, and high-risk groups. A death probability of 5% or less is considered low risk, a probability between 5% and 50% is considered moderate risk, and a probability above 50% is considered high risk.
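The three-way stratification just described can be expressed as a small helper; the 5% and 50% probability cut-offs come straight from the text, while the function name is illustrative.

```python
def risk_group(death_probability):
    """Map a predicted death probability to the paper's three risk bands:
    low (<=5%), moderate (>5% and <=50%), high (>50%)."""
    if death_probability <= 0.05:
        return "low"
    if death_probability <= 0.50:
        return "moderate"
    return "high"

assert risk_group(0.03) == "low"
assert risk_group(0.20) == "moderate"
assert risk_group(0.80) == "high"
```

In deployment, the same banding would be applied to the probability derived from a patient's total nomogram score at admission.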

III. RESULTS

A. PATIENTS' CHARACTERISTICS AND OUTCOME
Two sets of data were used in this study: one comprising 375 patients from the Wuhan hospital, China, and the other comprising 103 patients from Dhaka Medical College Hospital, Bangladesh. The 375 hospital-admitted COVID-19-positive patients were used for model development and internal validation; 46.4% (174) of them died and 53.6% (201) were discharged from hospital after recovery. The model was externally validated on 103 COVID-19-positive patients, of whom 59.2% (61) survived and 40.8% (42) died. For the 375 patients from the Wuhan hospital, the minimum, maximum, and median hospital stays before the outcome (death or discharge) were 0, 35, and 12 days, respectively. For the 103 patients of the external validation set, the minimum, maximum, and median hospital stays before death or discharge were 5, 25, and 9 days, respectively.
Data for the 375 patients, grouped by outcome, are summarized in Table 1(A). 59.7% (224) of the patients were male and 40.3% (151) were female, with a mean age of 58.83 ± 16.46 years. Seventy-six demographic and laboratory parameters are available in the development dataset; however, only 23 CBC parameters and two demographic parameters were used in this study.
Missing variables in the dataset were imputed using the KNN algorithm. Detailed characteristics of the 25 parameters are listed in Table 1, and it is evident from the chi-square and rank-sum tests that some parameters are statistically insignificant (p > 0.05) while others are statistically significant (p < 0.05) in predicting the death outcomes of the patients. It was found that age, gender, neutrophils (%), lymphocytes (%), eosinophils (%), monocytes (%), platelet count, red blood cell distribution width, white blood cell count, mean platelet volume, basophils (%), platelet large cell ratio, platelet distribution width, eosinophil count, neutrophil count, mean corpuscular hemoglobin, ESR, basophil count, and lymphocyte count showed a statistically significant difference between the death and survival groups, while hemoglobin (%), mean corpuscular volume, red blood cell count, mean corpuscular hemoglobin concentration, and monocyte count were statistically insignificant between the groups.

B. SELECTION OF CLASSIFIER AND FEATURE RANKING AND TUNING
A random forest feature ranking algorithm was used to identify the top-ranked 10 features among the 16 statistically significant features (Figure 2). These top-ranked 10 features were investigated with five different classifiers to identify the best performing classification model, and the results are reported in Table 2. Logistic regression outperformed the other classifiers in this binary classification problem using the top-ranked 10 features, providing an overall accuracy and weighted precision, sensitivity, specificity, and F1-score of 88%, 88%, 87%, and 90%, respectively. Therefore, logistic regression was used as the classifier in the rest of the study. It was also important to identify the most useful variables for the early prediction of death among the top-10 features.
To determine the association of the independent variables with the outcomes, classification using logistic regression was performed with the Top-1 to Top-10 features. Figure 3 clearly shows that the top-ranked 9 features produce the highest AUC (0.95). Table 3 shows the overall accuracies and weighted average performance on the other metrics for the different models using the Top-1 to Top-10 features under 5-fold cross-validation with the logistic regression classifier. The Top-9 features produce the best performance, with AUC = 0.95 and overall accuracy and weighted precision, sensitivity, specificity, and F1-score of 90%, 90%, 91%, 90%, and 90%, respectively (Table 3). However, both neutrophils and lymphocytes were present as both percentage and count in those top-9 features. Therefore, it was necessary to investigate the performance of those features with and without the percentage forms of neutrophils and lymphocytes. Figure 4(A) shows the ROC curves for the best 8 features considering neutrophils and lymphocytes as percentages only (excluding the counts), while Figure 4(B) shows the ROC curves for the best 8 features considering neutrophils and lymphocytes as counts only (excluding the percentages).
It is clear from Figure 4B that the Top-7 features with neutrophil and lymphocyte counts, but without their percentages, provide the same AUC (0.95) as was obtained from the Top-9 features in which those parameters were present as both percentage and count (as shown in Figure 2). The percentage forms of neutrophils and lymphocytes did not outperform their count forms. This is further verified in Table 4, where the feature set with neutrophil and lymphocyte counts performed better using 7 features. The top-ranked 7 features were: neutrophil count, lymphocyte count, age, monocytes (%), platelet count, red blood cell distribution width, and white blood cell count. The performance of the best feature combination for both experiments can also be seen in the form of confusion matrices in Figure 5. These features were used for the nomogram creation and for the development and validation of the scoring technique.

C. LOGISTIC REGRESSION BASED NOMOGRAM
Stata/MPv13.0 was used to derive a multivariate logistic regression-based nomogram using a bootstrapping technique (1,000 repetitions). The logistic regression coefficients, standard errors, the ratio of each regression coefficient to its standard error, the significance of z, and the 95% confidence interval (CI) of z are reported in Table 5. The z-value, which is the regression coefficient divided by its standard error, indicates the strength of a predictor: a large positive or negative z-value represents a strong predictor, while a value near zero represents a weak predictor. Table 5 shows that, of the 7 parameters, white blood cell count is a very weak predictor while the other six variables are good predictors. However, the logistic regression classifier's performance shown in Figure 4 and Table 4 demonstrated that 7 variables outperform 5 variables; therefore, no variable was discarded from these 7 variables in developing the nomogram. Figure 6 shows the calibration curves for internal (A) and external (B) validation. External validation was done using the dataset collected at Dhaka Medical College and confirms the reliability of the developed model with an AUC of 0.963. The decision curve analysis (DCA) can be seen in Figure 7, which demonstrates the clinical utility of the model. It is evident from Figure 7 that the performance is best using the model compared to the performance using all the features or the individual features. This indicates that all seven predictors contributed to the prediction of outcomes and confirms the need to combine them in the model.
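The Wald z-statistic used in Table 5 is simply the coefficient divided by its standard error, and a two-sided p-value follows from the standard normal CDF. The values below are illustrative, not the table's entries.

```python
import math

def wald_z(coef, se):
    """Wald z-statistic: regression coefficient over its standard error."""
    return coef / se

def two_sided_p(z):
    """p = 2 * (1 - Phi(|z|)), with Phi computed via the error function."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

z = wald_z(coef=-1.2, se=0.3)  # hypothetical strong (negative) predictor
assert abs(z - (-4.0)) < 1e-9
assert two_sided_p(z) < 0.001  # clearly significant at the 5% level
```

A predictor like white blood cell count, with a z-value near zero, would yield a p-value close to 1 under this test, which is why it reads as weak in Table 5 despite being retained in the 7-variable model.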
As shown in Figure 8, the nomogram comprises nine rows, where rows 1-7 represent the independent variables. For each variable, the assigned score is obtained by drawing a downward vertical line from the patient's value on the variable axis to the ''Score'' axis. The points of the seven variables are then summed into the total score (row 8). Finally, a line drawn from the ''Total Score'' axis to the ''Prob'' axis (row 9) gives the death probability of the COVID-19 patient. It is also useful to derive the mathematical expressions for the total score, the linear predictor, and the death probability on which the score is based. The probability of death corresponding to a given risk score was determined from the model and is listed in Table 6. In particular, risk score cut-off values of 10.8 and 10.96 correspond to death probabilities of 5% and 50%, respectively; thus, these values can be used to stratify COVID-19 patients into three groups: low, moderate, and high risk. The death probability was less than 5%, between 5% and 50%, and more than 50% for the low-risk group (Score < 10.8), moderate-risk group (10.8 ≤ Score ≤ 10.96), and high-risk group (Score > 10.96), respectively.
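The mapping from linear predictor to death probability, and from probability to risk group, can be sketched as below. The logistic transform is the standard one for a logistic regression model; the fitted coefficients that produce the linear predictor are not reproduced here, and the 5%/50% cut-offs are those stated in the text:

```python
import math

def death_probability(linear_predictor: float) -> float:
    """Standard logistic transform; the coefficients feeding the linear
    predictor come from the fitted model and are not shown here."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def risk_group(prob: float) -> str:
    """Three-way stratification using the 5% and 50% probability cut-offs,
    which correspond to scores of 10.8 and 10.96 in the paper."""
    if prob <= 0.05:
        return "low"
    if prob <= 0.50:
        return "moderate"
    return "high"
```

For example, a patient whose linear predictor yields a death probability of 0.30 falls in the moderate-risk group.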

D. PERFORMANCE EVALUATION OF THE MODEL
The authors categorized the patients from the internal cohort (training and testing subsets) as well as the external cohort into three subgroups (low, moderate, and high risk) by associating the actual outcome with the outcome predicted using the score. For the internal training set (Table 7A), the proportions of death were 1.2% (1/183) for the low-risk group, 23.33% (14/60) for the moderate-risk group, and 90.75% (108/119) for the high-risk group, while for the internal test set (Table 7B), the proportions of death were 0% (0/36) for the low-risk group, 21.74% (5/23) for the moderate-risk group, and 85.19% (46/54) for the high-risk group. For the external test set (Table 7C), the proportions of death were 0% (0/42) for the low-risk group, 26.32% (5/19) for the moderate-risk group, and 88.1% (37/42) for the high-risk group. The true death rates were significantly different (p < 0.001) among the three subgroups. Therefore, this nomogram-based scoring technique can be used for early prediction of patients' outcomes, categorizing them into low-, moderate-, and high-risk groups as shown in Table 6 and prioritizing the moderate- and high-risk patients. Figure 9 shows an example of the nomogram-based scoring system for a COVID-19 patient using the variable values at admission. Individual scores for each predictor were calculated and summed to produce the total score, and the death probability was calculated to be 99%. This prediction was made as early as 5 days before the death of the patient.
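The significance check on the subgroup death rates can be reproduced from the counts quoted above for the internal training set, using a standard chi-square test of independence (the paper does not name the exact test, so the chi-square test is an assumption here):

```python
from scipy.stats import chi2_contingency

# Deaths and group sizes for the internal training set, as quoted above
# (low-, moderate-, and high-risk groups).
deaths = [1, 14, 108]
totals = [183, 60, 119]
table = [deaths, [t - d for t, d in zip(totals, deaths)]]  # deaths vs. survivors

chi2, p, dof, _ = chi2_contingency(table)
```

The resulting p-value is far below 0.001, consistent with the reported significance of the difference in death rates across the three groups.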

IV. DISCUSSION
The current study looked into the correlation between disease severity and clinical data from the complete blood count (CBC) test. Based on the data collected at the time of hospital admission, the random forest algorithm ranked ten parameters as predictors of death probability. To find the best-performing classification model, these top-ranked 10 features were investigated with 6 different classifiers. Classification using logistic regression with the Top-1 to Top-10 features was used to evaluate the independent variables' relationship with the outcomes. Figure 2 clearly illustrates that the Top-9 features deliver the highest AUC (0.95). Table 2 reports the overall accuracies, the weighted average results for the other metrics, and the confusion matrices for the models using the Top-1 to Top-10 features under 5-fold cross-validation with the logistic regression classifier. The Top-9 features-based model shows the best result, with an AUC of 0.95 and overall accuracy, weighted precision, sensitivity, specificity, and F1-score of 0.9, 0.9, 0.9, 0.91, and 0.90, respectively. Neutrophils and lymphocytes, however, were both present in those Top-9 features as both percentage and count. As a result, it was important to investigate whether those features, with or without the percentages of neutrophils and lymphocytes, should be used in the model development.
FIGURE 7. Decision curve analysis comparing different models to predict the death probability of patients with COVID-19. The net benefit balances the mortality risk and the potential harm from unnecessary over-intervention for patients with COVID-19.
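The Top-1 to Top-10 sweep with logistic regression under 5-fold cross-validation can be sketched as follows. The data are synthetic stand-ins for the study cohort, and the column order is assumed to follow the random-forest ranking, so the resulting AUC values are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))   # 10 ranked features (synthetic stand-ins)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(int)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
aucs = {}
for k in range(1, 11):           # evaluate the Top-1 ... Top-10 feature subsets
    model = LogisticRegression(max_iter=1000)
    aucs[k] = cross_val_score(model, X[:, :k], y, cv=cv,
                              scoring="roc_auc").mean()
best_k = max(aucs, key=aucs.get)
```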
Previous research on the coronavirus family, such as SARS [42], Middle East respiratory syndrome (MERS) [43], and COVID-19 [44], showed that age is a primary predictor of mortality. This research came to similar conclusions, since as people get older, immunosenescence and/or multiple medical problems make them more susceptible to severe COVID-19 illness [14]. Both neutrophils and lymphocytes are essential components of the immune system, as they aid in host defense against infection. They can be represented as a count, a percentage, or a ratio (the neutrophil-lymphocyte ratio, NLR). Lymphopenia, a medical disorder characterized by a reduction in the number of lymphocytes in the blood, is a common finding in COVID-19 patients and may be a major factor in disease severity and mortality [45]. In our investigation, we found the percentages of neutrophils and lymphocytes to be impactful, confirming previous results that lower percentages of these two cell types were correlated with severe COVID-19 [46]. Patients with community-acquired pneumonia show substantial immune system activation and/or immune dysfunction, leading to changes in these quantities [45]. Furthermore, as particular anti-inflammatory cytokines induce immunosuppression and lymphocyte apoptosis, the bone marrow releases neutrophils into the circulation, resulting in a rise in NLR [47]. However, in comparison to other models, both parameters were found to be small for high-risk patients in this sample.
Lu et al. [6] proposed a model to predict short-term mortality in confirmed or suspected COVID-19 patients.
Hepatocytes produce CRP in response to leukocyte-derived cytokines induced by infection, inflammation, or tissue damage [48]-[50]. In this study, increased CRP levels at admission were found in COVID-19 patients with a high mortality risk. This suggests that these patients had developed severe lung inflammation or possibly a secondary bacterial infection requiring clinical antibiotic treatment [50].
Non-survivors had lower lymphocyte and neutrophil percentages, as well as older age, than survivors [14]. COVID-19 severity was significantly related to the inflammatory response to the infection, in addition to dysregulation of the coagulation system and/or immune system. This can result in serious complications such as ARDS, septic shock, and coagulopathy, among others. Consequently, this type of prognostic model can help in the creation of a fair and customized treatment plan for critically ill patients.
In this study, seven key predictors acquired at admission were chosen using random forest feature selection to construct a nomogram-based prognostic model with excellent calibration and discrimination in predicting COVID-19 patients' death probability. The model was also tested on an external validation cohort. Furthermore, it was validated using blood sample data obtained from patients at various points during their hospital stay, and it was found to be accurate in those cases as well. The AUC values for the development set, internal validation, and external validation cohorts were 0.954, 0.883, and 0.96, respectively. Furthermore, this nomogram-derived risk score provided a clear, easy-to-understand, and interpretable early-warning method for stratifying high-risk COVID-19 patients at admission and assisting clinical management. Using this risk score, assessed and determined at admission, COVID-19 patients were divided into three risk categories, each with a different risk of death. Low-risk cases could be separated and handled in an isolation unit, while moderate-risk patients could be treated in the hospital's isolation ward. Patients in the high-risk group, on the other hand, can be closely monitored and, if possible, referred to essential medical facilities or the intensive care unit (ICU) for immediate care.
The study suggests that research on COVID-19 clinical data may aid in early mortality prediction. In this study, we developed the model and confirmed its performance using five-fold cross-validation. Furthermore, the model was verified on completely unseen data from a different country, and the performance remained very reliable, as can be seen from the results in Figure 6B.

V. CONCLUSION
In conclusion, the nomogram-based scoring technique can predict the mortality risk of COVID-19 patients with good discrimination and calibration based on multiple CBC predictors (neutrophil count, age, platelet count, monocytes (%), white blood cell count, lymphocyte count, and red blood cell distribution width). The model predicts the patient's outcome with a high degree of precision much earlier than the actual clinical outcome. The model, developed on a Chinese dataset, was tested on a completely unseen external dataset, i.e., the dataset collected from Dhaka Medical College, Bangladesh. The authors explored various combinations of feature selection techniques, features, and machine learning classifiers in this study, achieved state-of-the-art performance, and deployed the result as a web application for clinical use (Figure 10). A mobile or web application deployment is well suited to clinical parameters, compared to a deep learning approach, on a smaller dataset [52]. The proposed scoring technique would assist clinicians in creating an effective and optimized patient stratification and management strategy without overburdening healthcare resources, as well as in minimizing mortality by providing support to severe patients earlier.
The authors are collecting a larger multi-country, multi-center dataset to further improve the model's performance and robustness.

DISCLOSURE OF POTENTIAL CONFLICTS OF INTEREST
The authors declare that they have no conflict of interest.

ETHICAL APPROVAL
This study used two clinical datasets: model development and internal validation were carried out on the publicly available dataset of Yan et al. [19], and external validation was performed on COVID-19 clinical data collected from Dhaka Medical College Hospital, Bangladesh, with the ethical approval of the hospital ethics committee. COVID-19 patients' identifying information was removed before the data were shared with the researchers in Qatar, and medical doctors in Bangladesh were involved in data acquisition and de-identification. The COVID-19 patient data collected from Wuhan Hospital by Yan et al. [19] were approved by the Tongji Hospital Ethics Committee.

SOMAYA AL-MAADEED received the Ph.D. degree in computer science from Nottingham University. She supervised students through research projects related to computer vision, AI, and biomedical image applications. She is currently a Professor with the Computer Science Department, Qatar University, and the Coordinator of the Computer Vision Research Group, Qatar University. She enjoys excellent collaboration with national and international institutions and industry. She is the principal investigator of several funded research projects related to medical imaging. She has published extensively in computer vision and pattern recognition and delivered workshops on teaching programming. In 2015, she was elected as the IEEE Chair of the Qatar Section. She has filed IPs related to cancer detection from images and light.

He has been awarded a certificate of excellence in his training program for two consecutive years, 2018 and 2019, and was nominated for the Rising Star Certificate as the best-performing plastic surgery trainee in 2020. He has attended several training clerkships and courses in several countries, such as the USA, Germany, Lebanon, Jordan, and India.
He has over 15 published articles in the fields of plastic and reconstructive surgery, clinical epidemiology, and COVID-19, with many national presentations and several international conference presentations.
SUHAIL A. R. DOI is currently an Associate Professor and the Head of the Department of Population Medicine, College of Medicine, Qatar University. He previously worked at the Australian National University (ANU). Prior to joining the ANU, he was the Head of the Clinical Epidemiology Unit, the Chair of the Population Health Domain, School of Medicine, the Director of Epidemiology Programs, and an Associate Professor of Clinical Epidemiology at The University of Queensland. He has also worked as a Specialist in endocrinology abroad and continued to do so part-time after coming to Australia. He has published widely, with more than 160 publications listed on PubMed, has written a core text on clinical epidemiology, and has edited a book on the methods of clinical epidemiology as part of the Springer series in epidemiology and public health. His research interests include research that addresses unanswered questions in patient care as well as questions related to methods of research design and analysis used in medicine. Thus, his research focuses on patient care topics, such as epidemiology, prognosis, and treatments of disease, as well as methodology, especially that related to meta-analysis. He is an Associate Editor of the journal Clinical Medicine & Research and a reviewer for several journals in endocrinology, medicine, and public health.
MUHAMMAD E. H. CHOWDHURY (Senior Member, IEEE) received the Ph.D. degree from the University of Nottingham, U.K., in 2014. He worked as a Postdoctoral Research Fellow and a Hermes Fellow with Sir Peter Mansfield Imaging Centre, University of Nottingham. He is currently working as an Assistant Professor with the Department of Electrical Engineering, Qatar University. Before joining Qatar University, he worked in several universities in Bangladesh. He has two patents and published around 80 peer-reviewed journal articles, conference papers, and four book chapters. His current research interests include biomedical instrumentation, signal processing, wearable sensors, medical image analysis, machine learning, embedded system design, and simultaneous EEG/fMRI. He is currently running several NPRP and UREP grants from QNRF and internal grants from Qatar University along with academic and government projects. He has been involved in EPSRC, ISIF, and EPSRC-ACC grants along with different national and international projects. He has worked as a Consultant for the projects entitled ''Driver Distraction Management Using Sensor Data Cloud'' (2013-2014, Information Society Innovation Fund (ISIF) Asia). He is an Active Member of British Radiology, the Institute of Physics, ISMRM, and HBM. He received the ISIF Asia Community Choice Award 2013 for a project entitled Design and Development of Precision Agriculture Information System for Bangladesh. He has recently won the COVID-19 Dataset Award for his contribution to the fight against COVID-19. He is serving as an Associate Editor for IEEE ACCESS and a Topic Editor and a Review Editor for Frontiers in Neuroscience.