Artificial Intelligence-Based Cyber-Physical System for Severity Classification of Chikungunya Disease

Background: Artificial intelligence techniques are widely used in solving medical problems. Recently, researchers have used various deep learning techniques for the severity classification of Chikungunya disease. But these techniques suffer from overfitting and hyper-parameters tuning problems. Methods: In this paper, an artificial intelligence-based cyber-physical system (CPS) is proposed for the severity classification of Chikungunya disease. In CPS system, the physical components are integrated with computational algorithms to provide better results. Random forest (RF) is used to design the severity classification model for Chikungunya disease. However, RF suffers from overfitting and poor computational speed problems due to complex architectures and large amounts of connection weights. Therefore, an evolving RF model is proposed using the adaptive crossover-based genetic algorithm (ACGA). Results: ACGA can efficiently optimize the architecture of RF to achieve better results with better computational speed. Extensive experiments are performed by utilizing the Chikungunya disease dataset. Conclusion: Performance analysis demonstrates that ACGA-RF achieves higher performance as compared to the competitive models in terms of F-measure, accuracy, sensitivity, and specificity. The proposed CPS system can prevent users from visiting hospitals and can render services to patients living far away from hospitals.


I. INTRODUCTION
The Healthcare system plays a vital role in the development of any nation.Government is responsible to design proper healthcare policies to protect their citizens from any outbreak of diseases [1].Hence, the outbreak of any new disease such as coronavirus is a major challenge for the healthcare system.Several viruses exist that can affect both animals and human beings.Chikungunya is one of the viruses that can be spread very rapidly and may create a big problem for the health system.Two basic types of infected mosquitoes i.e., Aedes albopictis and Aedes agypti transmit this virus in the human body [2].The symptoms of Chikungunya are joint pain, sudden high fever, and rash.Some infected persons have headaches, fatigue, digestive complaints, and conjunctivitis [3].The symptoms of Chikungunya are very similar to dengue because the same mosquito carries both viruses [4].But in Chikungunya, joint pain is more severe as well as redness of the eyes.The symptoms of sore throat are different from dengue infection.Chikungunya may not cause death.As per literature, it is found that the patient VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License.For more information, see https://creativecommons.org/licenses/by/4.0/recovers within a week of this disease [5].But, the joint pain may last for a few months.The doctors start the treatment by perceiving the symptoms of patients.However, the exact measurement of these symptoms is not possible.Therefore, the treatment of patients may not be effective.
Instead of symptoms found in the patients, reverse transcription-polymerase chain reaction (RT-PCR) and serological tests are used to diagnose Chikungunya.Both tests require blood samples of patients [6].However, these tests are unable to provide reliable performance for this disease.Supervised learning techniques such as machine learning and deep learning can be used to evaluate the severity of this disease by considering the symptoms of patients and laboratory tests [7].The severity classification of Chikungunya infected persons is still an ill-posed problem.
In [8], a fog-based framework for Chikungunya disease diagnosis was designed.J48 was utilized to classify Chikungunya infected patients.In [9], wearable internet of things (IoT) and fog-based framework for classification and controlling the Chikungunya disease was proposed.Fuzzy-C means (FCM) classifier was utilized for Chikungunya classification.But J48 [8] and FCM [9] suffer from overfitting and hyper-parameters tuning problems.In [10], a particle swarm optimization-based ANFIS (PANFIS) model was implemented for the diagnosis of Chikungunya disease.Initially, an adaptive neuro-fuzzy inference system (ANFIS) classifier was used to classify the infected patients.Thereafter, particle swarm optimization (PSO) was utilized to overcome the parameter tuning problem with ANFIS.It achieved remarkable results compared to artificial neural networks (ANN).But, PANFIS [10], [11] suffers from the over-fitting problem.Also, sometimes PSO may be stuck in local optima and suffers from premature convergence problems [12], [13].
Therefore, to overcome the over-fitting and hyperparameters tuning problems, an efficient evolving Random forest (RF) model is proposed for the severity classification of Chikungunya disease.The main contributions of this paper are as follows: 1) A cyber-physical system (CPS) based severity classification model is proposed for Chikungunya disease.
In CPS system, the physical components are integrated with computational algorithms to attain better results.2) Evolving RF model is proposed for severity classification of Chikungunya disease.An adaptive crossoverbased genetic algorithm (ACGA) is utilized to evolve RF model.3) Deep learning model is also implemented and compared with ACGA-RF for severity classification of Chikungunya disease.The remainder of this paper is organized as follows.Section II presents the related work.The proposed ACGA-RF model is described in Section III.Performance analyses are presented in Section IV.The concluding remarks are discussed in Section V.

II. RELATED WORK
Artificial intelligence techniques are widely used in solving medical problems.Recently, researchers used machine learning techniques for the classification of Chikungunya disease.Hossain et al. [5] utilized the different symptoms of patients for the accurate assessment of Chikungunya disease.Their proposed framework collected the data from the interviews of patients.They used a belief-based rule system for predicting the level of Chikungunya.Their model attained an accuracy of 92%.Yang [14] developed a decision system for the diagnosis of Chikungunya disease.The neural network was used for classification by considering the uncertainty of the disease's symptoms.However, the uncertainty of some symptoms is not considered in this approach.Ganesan et al. [15] presented three different models to diagnose the Chikungunya disease.However, these models require human intervention for the assessment of this disease.Caicedo-Torres et al. [16] proposed a machine learning-based classifier for differentiating the dengue and Chikungunya patients.Their classifier was tested on 447 patients.The logistic regression model outperformed the other models.The accuracy obtained from logistic regression was 87%.
Ibrahim et al. [17] presented the backpropagation method for predicting the epidemic disease.They used epidemic disease factors for prediction.Thereafter, these factors were applied to the clustering technique.Their method is capable to identify the epidemic disease using feature classification.Coelho et al. [18] used a transfer learning model for predicting mosquito-borne diseases.They used time-series data from two Brazilian cities.Both the long short-term memory neural network model and random quantile forest model provided the same prediction performance.Caicedo-Torres et al. [19] utilized the machine learning techniques for envisaging the morbidity of Chikungunya in Colombia.Kernel ridge regression was used for forecasting the Chikungunya cases.Crossvalidation and mean absolute error were used.
Sippy et al. [20] developed two prediction models on the Machala dataset.The first model namely the severity index for suspected arbovirus model (SISA) that utilized demographic data.Another model namely the severity index for suspected arbovirus (SISA1) with laboratory utilized the laboratory data.The accuracies obtained from SISA and SISAl were 91% and 95%, respectively.Both models are capable to envisage arbovirus hospitalization.
Shimpi et al. [21] used a backpropagation algorithm to predict the Chikungunya disease.Five gradient-based optimization techniques were used.The pre-processed features were applied to the backpropagation algorithm for classification.The accuracy obtained from this model was 95%.However, a small dataset was used for validation purposes.Eng et al. [22] used machine learning techniques for predicting the binding affinity of T-cell epitopes of Chikungunya.They built prediction models for identifying binders and nonbinder.This model will be helpful for vaccine development.From the existing literature, it is found that the majority of the existing models suffer from hyper-parameters tuning, (i.e., optimization of initial control parameters), over-fitting and data insensitivity problems [23], [24].Hence, there is a need to develop an efficient model for severity classification for Chikungunya disease.

III. PROPOSED FRAMEWORK
Motivated from [8], [10], an efficient model is proposed in this paper for diagnosis of Chikungunya disease.Figure 1 shows the proposed artificial intelligencebased cyber-physical system (CPS) for the diagnosis of Chikungunya disease.There are two main components in the proposed model, i.e., physical space and cyberspace.In physical space.the data related to users' health is collected and forwarded to cyberspace for predicting the severity of the Chikungunya virus [25], [26].In cyberspace, there are two sub-layers, i.e., the cloud subsystem layer and data classification layer [27], [28].In the cloud subsystem layer, data and the proposed trained model are stored.At the data classification layer, the severity prediction of the Chikungunya virus is done in real-time by applying the proposed ACGA-RF model after extracting the potential features.Finally, doctors are involved for further assessment of the results.

A. PHYSICAL SPACE
In physical space, there is a data acquisition layer that collects users' personal data, symptoms related to Chikungunya, and records of initial screening with the help of sensors.
The Chikungunya dataset is obtained from [8] and [9].Sood and Mahajan [8] and [9] built the dataset by taking the symptoms-based dataset from [29] that contains eleven health features, i.e., abdominal pain, muscle pain, bleeding disorder, fatigue, eyes pain, itching, nausea, sore throat, joint pain, skin rash, and fever, of 2367 patients.Dataset with 5032 cases comprising environmental variables was taken from [30] and [31].Monthly climate variables, i.e., rainfall, temperature, and humidity, were taken from [32]- [34].The symptoms-based data was combined with climate and environmental features to validate ACGA-RF (for further details refer to [8] and [9]).Table 1 shows the common features of Chikungunya virus.It contains several sensing devices to observe the health of patients.The observed information is then transferred to the data classification layer to classify the health of a particular client.The description of the dataset is mentioned in Table 1 (refer [8], [10]).Chikungunya symptoms such as headache, exposure to the risky area, nausea, fever, rash, vomiting, etc.

B. CYBER SPACE
Cyberspace is comprised of two sub-layers, i.e., the cloud subsystem layer and the data classification layer.The detail of registered users and the proposed trained model is stored in cloud subsystem layers.Whereas, at the data classification layer, diagnosis of the Chikungunya virus is done by applying ACGA-RF to the potential features.Also, RF-based model is used as it has the abilities of generalization, learning, fault tolerance, and adoption.However, RF model is sensitive to its control parameters.An efficient tuning of these control parameters can improve the performance of RF.Therefore, to automatically optimize control parameters of RF, an ACGA-RF model is proposed.Finally, the predicted severity report of a patient is transmitted to doctors for further treatment.

IV. PROPOSED EVOLVING RANDOM FOREST MODEL
This section discusses the proposed evolving random forest model.Initially, RF model is briefly discussed to understand its basic notations and hyper-parameters.Random forest (RF) is an ensemble of classification/regression trees [35], where every tree shows a mapping from feature space to the response.Trees can be obtained either using a subsampled data set of actual data or bootstrapped.Every tree is conditionally independent of one another.However, RF model is sensitive to its architecture and hyper-parameters.Thus, an adaptive crossover genetic algorithm (ACGA) is utilized to obtain the optimized architecture and hyperparameters of RF.
In literature, genetic algorithm is widely accepted to optimize various classifiers such as deep learning models [23], [36].Genetic algorithms utilize crossover and mutation operators during evolution phases to obtain the final solution.It has been found that the selection of efficient crossover and mutation operators is a challenging problem.To efficiently select crossover operator(s), Xue et al. [37] proposed an adaptive crossover genetic algorithm (ACGA).A group of crossover operators was utilized during the evolution process.Based on the performance of crossover operators, roulette wheel selection was utilized to select a specific crossover.In this paper, ACGA [37]  init_CA(): Dominated offsprings are then evaluated by using the actual and child offsprings.The respective outcomes are saved in rW and pL.P δ is computed during M /2 th step.
Penalty and reward of offsprings is saved in nP IT ×C and nR IT ×C , respectively.After IT number of phases, A S is updated by considering nP IT ×C and nR IT ×C .All steps are repeated until the termination criterion (i.e., FE) is satisfied.
The succeeding subsections discuss the steps of ACGA.

A. FITNESS FUNCTION
Fitness function is designed to optimize RF by using sensitivity and specificity.It is defined as: Here, X is an offspring.f 1 and f 2 represent the sensitivity and specificity parameters, respectively.

26:
Optimized values ← T P 27: end while 28: return Optimized values computational speed compared to the existing crossover operators [37].
Reduced surrogate [41], [42] can avoid unnecessary crossover operations when parents have similar offspring.Initially, it evaluates parents and forms a group of crossover points where both parents have different genes.In the absence of such a crossover point, no crossover operator is implemented.Chaotic crossover [40] can obtain a better converged and distributed group of Pareto-optimal offspring.

C. PENALTY AND REWARD
Penalties and rewards are allocated to offsprings by utilizing two matrices namely pL and rW as: Pareto optima among the offsprings is used to modify rW and pL.

1) PARENT IS NON-DOMINATED
Pareto optima is evaluated between child and respective parent offsprings.If child offsprings are dominated by parents, then append pL q by 1, otherwise append rW q by 1.The pseudocode of updation of penalty and reward is depicted in Algorithm 2.

2) DOMINATED PARENT
If parent 1 (ρ 1 ) is dominated by parent 2 (ρ 2 ), then pareto optima of each child offspring is compared with ρ 2 .If child offspring is dominated by ρ 2 , then append pL q by 1. Otherwise, update rW q by 1.

D. UPDATION OF ADAPTIVE CROSSOVER SELECTOR
A S is utilized to update the crossover selection probabilities.During evolution process, it is implemented after every IT steps (refer [43]).Two matrices, i.e., nP IT ×C and nR IT ×C are used to hold the values of pL and rW , respectively.Recently updated IT 's pL and n R values are utilized to modify A S .To evaluate the probability for q th (q = 1, 2, . . ., C) crossover, addition of q th column is utilized (refer [43]).

V. PERFORMANCE ANALYSIS
To analyze the efficiency of proposed model, health-related attributes are collected from [8], [34], [44], [45].It mainly consists of attributes such as age, sex, location, fever, skin  rashes, and joint pain.It is collected for approximately 10000 users.Around 1805 users are at no risk while 3890 are at normal risk and 4305 are at high risk.The acquired data is stored in cloud storage and is used for recognition by an optimized RF model.ACGA-RF is compared with J48, SVM, ANN, RF, adaptive neuro-fuzzy inference system (ANFIS), PANFIS [10], and deep learning (DL).
To implement DL model, various layers (i.e., feature input layer, fully connected layer, batch normalization layer [46], [47], ReLU layer [46], [47], softmax layer [46], [47], and classification layer) are utilized.For normalizing the input data, Z-score normalization [48] is used.Minibatch size is set to 8. Adam [49] a stochastic gradient descent optimizer is used to achieve the better convergence of DL model.The performance of RF achieves the best training and validation accuracy values are 99.5% and 97.6%, respectively.Although it shows remarkable results, the performance of the proposed approach is still far from the optimal results.It shows an over-fitting problem as there is a high difference between training and validation accuracy values.The performance of ACGA-RF model achieves the best training and validation accuracy values are 100.0%and 99.6%, respectively.Thus, ACGA-RF model achieves remarkable results than both deep learning and RF models.Also, ACGA-RF is least affected by the over-fitting issue there is only a 0.4% difference between the training and validation accuracy which are 1.7 and 1.9 for deep learning and RF, respectively.Also, ACGA-RF shows an enhancement in validation accuracy over deep learning and RF as 1.3% and 2.0%, respectively.

B. CONFUSION MATRIX ANALYSIS
To evaluate the performance of ACGA-RF, confusion matrix analyses are also achieved.Table 4 depicts the confusion matrix analysis proposed ACGA-RF, RF, and deep learning (DL) models.It is found that RF achieves the testing accuracy of 97.54%.It is found that the deep learning (DL) model achieves the testing accuracy of 98.31%.DL shows an average improvement of 0.77% over RF model.It is found that ACGA-RF achieves an accuracy of 99.43%.Thus, the proposed ACGA-RF achieved an average improvement of 1.89% and 1.12% over RF and deep learning-based severity classification models, respectively.

C. COMPARATIVE ANALYSIS
Boxplot and ANOVA are used for statistical analysis.The hypotheses for every performance measure can be defined as:   Figures 3 -6 show that H A is accepted for all the considered performance metrics as the evaluated p − values are lower than 0.01.Thus, the performance of different models is significantly different from each other.But, it is not possible   to find which technique outperforms the others.Thus, the boxplots are obtained to evaluate which technique performs significantly better than the others (see Figures 7 to 10).
Figure 7 demonstrates the accuracy analysis among ACGA-RF and the existing recognition Chikungunya disease recognition models.It is clearly shown that ACGA-RF achieves remarkably significant and consistent accuracy values.Compared to the existing Chikungunya disease recognition models, ACGA-RF achieves 1.3822% improvement in terms of accuracy.
Figure 8 shows F-measure analysis among ACGA-RF and the existing Chikungunya disease recognition models.ACGA-RF outperforms the existing Chikungunya disease recognition models by an average improvement of 1.4145%.Figure 9 shows the specificity analysis among the existing models and the proposed Chikungunya disease recognition model.ACGA-RF achieves a 99.21% (median) value, which is significantly better than the competitive models by an average improvement of 1.3972%.Therefore, ACGA-RF can efficiently diagnose normal risk Chikungunya disease-infected patients.Figure 10 shows the sensitivity analysis among the existing models and proposed Chikungunya disease recognition model.ACGA-RF achieves 99.24% (median) value, which is significantly better than the competitive models by 1.4172%.Therefore, ACGA-RF can efficiently diagnose the high risk Chikungunya disease patients.
It is found that ACGA-RF takes on an average of 1 hour 36 minutes during the training process.During the testing process, it takes only 2.746 minutes to achieve the results.Additionally, ACGA-RF provides the testing results on an average of 1.674 seconds for a single patient.Therefore, ACGA-RF can be used for real-time applications.

D. DISCUSSION
Table 5 shows the comparison among the proposed model with state-of-the-art machine learning and deep learning models.It is found that in [8], J48 was utilized for severity classification of Chikungunya disease.It achieved the sensitivity, specificity, F-measure, and accuracy values as 0.935, 0.965, 0.827, and 92.7865, respectively.Fuzzy-C means (FCM) [9] achieved the sensitivity, specificity, and accuracy values as 0.867, 0.888, and 0.934, respectively.PANFIS [10] achieved sensitivity, specificity, F-measure, and accuracy values as 0.9578, 0.9787, 0.9431, and 0.9871, respectively.DL based Chikungunya diagnosis model achieved sensitivity, specificity, F-measure, and accuracy values as 0.9831, 0.9763, 0.9874, and 0.9818, respectively.Compared to these models, ACGA-RF achieved a sensitivity, specificity, F-measure, and accuracy values of 0.9943, 0.9905, 0.9968, and 0.9936, respectively.Therefore, the proposed ACGA-RF achieved significantly better results than the existing models in terms of accuracy, specificity, sensitivity, and F-measure by 1.3822%, 1.3972%, 1.4172%, and 1.4145%, respectively.

VI. CONCLUSION
In this paper, a cloud-based CPS is designed and implemented for the recognition of Chikungunya disease.The proposed system is divided into two main categories, i.e., physical space and cyberspace.Once, the data related to user-health are collected, it is stored in the cloud sub-system layer.An evolving RF model was proposed for the severity classification of Chikungunya disease by using ACGA.ACGA can efficiently optimize RF architecture to achieve better results with better computational speed.The comparative analysis demonstrates that ACGA-RF achieves significantly better testing performance than the existing models in terms of accuracy, specificity, sensitivity, and F-measure by 1.3822%, 1.3972%, 1.4172%, and 1.4145%, respectively.Thus, the proposed Chikungunya disease recognition model is beneficial for real-time medical applications.
In near future, the deep transfer learning models can be used to obtain more efficient results.Further novel metaheuristic techniques can be designed to efficiently tune the deep learning architectures.Also, the proposed model can be applied to other kinds of datasets.

1 )
User Personal Data: This comprises the user's personal data.GPS sensors are used to collect data such as UID, name, age, address, and mobile number.2) Health-related Data: This set consists of vital signs.It provides information related to rashes on the skin, muscle pain, appetite loss, joint pain, fever, redness in the eyes, headache, nausea, sore throat, fatigue, and vomiting.Biosensors are used to collect this information.3) Environmental and location Information: It provides the position of patients, susceptible users, uninfected users, and positions of mosquito dense areas, water quality, humidity, and mosquito breeding sites.These positions are evaluated by Global Positioning System (GPS) sensors to obtain the travel history of every patient.Radio Frequency Identification (RFID) tags and mosquito sensors are also utilized to store the proximity interactions between infected/uninfected/ susceptible users, mosquito densities, and breeding sites.
is utilized to form evolving RF model.Three different crossover operators are used.The working of the proposed ACGA-RF is depicted in Algorithm 1. init_P(): Population P is created using normal distribution by creating M vectors.Each vector represents the architecture and hyper-parameters-related values of RF. init_A(): Adaptive crossover selector (A S ) is utilized by assigning probabilities to each crossover as 1/C.C shows a number of utilized crossovers.init_R(): Given crossover is selected according to roulette wheel selection and probabilities obtained from A S .init_C(): Apply selected crossover on parents' offspring to form child offsprings.init_M(): Mutation operator is utilized to obtain child offspring.Compute child offsprings are saved in the offspring population(P δ ).

FIGURE 2 .
FIGURE 2. Accuracy and loss analysis of the deep learning-based severity classification of Chikungunya disease.

Figure 2
Figure 2 shows the accuracy and loss analysis of the deep learning-based severity classification model for Chikungunya disease.It clearly shows that the deep learning achieves significantly better convergence speed.It achieves 98.3% validation accuracy, therefore, least affected by the impact of over-fitting as training accuracy is 100%.The performance of RF achieves the best training and validation accuracy values are 99.5% and 97.6%, respectively.Although it shows remarkable results, the performance of the proposed approach is still far from the optimal results.It shows an over-fitting problem as there is a high difference between training and validation accuracy values.The performance of ACGA-RF model achieves the best training and

TABLE 4 .
Confusion matrix analysis among the proposed ACGA-RF, DL, GA-RF, PSO-RF, and RF models.

FIGURE 3 .
FIGURE 3. ANOVA analysis of severity classification models for Chikungunya disease in terms of testing accuracy.

FIGURE 4 .FIGURE 5 .
FIGURE 4. ANOVA analysis of severity classification models for Chikungunya disease in terms of testing sensitivity.

FIGURE 6 .
FIGURE 6. ANOVA analysis of severity classification models for Chikungunya disease in terms of testing F-measure.

TABLE 1 .
Data attributes related to Chikungunya virus.

Table 2
[10]cts the personal-parameters of registered users.It presents a brief description of the user's personal parameters such as name, age, gender, address, mobile number, and contact details of guardians (refer[8],[10]).Table3showsVOLUME 10, 2022
µM 7 , H A Means are not equal.(4) where µM i shows various severity classification models for Chikungunya disease.M 7 shows the proposed ACGA-RF.H 0 and H A define the null and alternate hypotheses, respectively.ANOVA table consists of various attributes such as a sum of squares (SS), degrees of freedom (df ), the mean sum of squares (MS), F-statistics (F), and P-value (P).If value of F is lesser than the level of significance, then we can reject H 0 ) and conclude that the models are significantly different from each other.

TABLE 5 .
Comparative analysis among the proposed ACGA-RF, DL, and state-of-the art models.