An IoMT-Based Incremental Learning Framework With a Novel Feature Selection Algorithm for Intelligent Diagnosis in Smart Healthcare

Several recent research papers in the Internet of Medical Things (IoMT) domain employ machine learning techniques to detect data patterns and trends, identify anomalies, predict and prevent adverse events, and develop personalized patient treatment plans. Despite the potential of machine learning techniques in IoMT to revolutionize healthcare, several challenges remain. Conventional machine learning models in the IoMT domain are static: they are trained once on a dataset and then used to infer on real-time data. This approach does not consider the patient's recent health-related data. In the conventional paradigm, the models must be re-trained from scratch even to incorporate a few additional samples. Also, since the training of conventional machine learning models generally happens on cloud platforms, there are risks to security and privacy. Addressing these issues, we propose an edge-based incremental learning framework with a novel feature selection algorithm for intelligent diagnosis of patients. The approach aims to improve the accuracy and efficiency of medical diagnosis by continuously learning from new patient data and adapting to patient conditions over time, while reducing privacy and security risks. To address the issue of excessive features, which increases the computational burden on incremental models, we propose a novel feature selection algorithm based on bijective soft sets, Shannon entropy, and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution). We propose two incremental algorithms, inspired by Aggregated Mondrian Forests and Half-Space Trees, for classification and anomaly detection, respectively. The proposed classification model achieves an accuracy of 87.63%, which is 13.61% better than the best-performing batch learning-based model.
Similarly, the proposed anomaly detection model achieves an accuracy of 97.22%, which is 1.76% better than the best-performing batch-based model. The proposed incremental algorithms for classification and anomaly detection are 9x and 16x faster than their corresponding best-performing batch learning-based models.


I. INTRODUCTION
In recent years, the adoption of the Internet of Things (IoT) in healthcare has grown exponentially. The IoT in healthcare is called the Internet of Medical Things (IoMT). IoMT models comprise a network of medical devices and applications connected through the internet. IoMT enables the provision of real-time patient data to healthcare professionals, which facilitates remote monitoring. The availability of real-time data helps in making informed decisions quickly. Thus, IoMT is revolutionizing the healthcare domain by improving patient-doctor communication, reducing healthcare expenses, and increasing the efficiency of medical processes [1].
According to a MarketsandMarkets report [2], the worldwide market for medical devices employing the Internet of Things (IoT) is predicted to grow from USD 26.5 billion in 2021 to USD 94.2 billion by 2026, at a compound annual growth rate (CAGR) of 28.9%. Reduced healthcare costs and active patient monitoring are driving this growth. The global IoT healthcare market [3] is expected to grow at a CAGR of 17.8%, from USD 127 billion in 2023 to USD 289 billion in 2028. Advancements in medical devices and healthcare infrastructure, along with digital transformation across the industry, are the primary drivers.
In IoMT, various machine learning algorithms [4], [5] are used to analyze and interpret large amounts of data from medical devices like wearables, sensors, and monitoring systems. These algorithms make use of statistical models that help in detecting patterns and trends in the data, identifying anomalies and unusual readings, making predictions based on the available data, predicting and preventing adverse events, and developing personalized treatment plans for patients. For instance, Iwendi et al. [6] developed an IoMT-based diet recommendation system, Sayeed et al. [7] developed an IoMT-based seizure detection model, Khan et al. [8] developed an IoMT model for elderly monitoring, Siddiqui et al. [9] developed an IoMT-based cancer detection model, and Datta Gupta et al. [10] developed a histopathological classification model using machine learning techniques.
Despite the potential of machine learning techniques in IoMT to revolutionize healthcare, several challenges remain. Conventional machine learning models in the IoMT domain are static in that they are trained on some dataset and then used to infer on real-time data. This approach does not consider the patient's recent health-related data, i.e., the data that the IoMT devices have provided since the models were trained; the models are therefore unaware of the patient's latest health updates. Further, in the conventional paradigm, the models must be re-trained from scratch even to incorporate a few additional samples. Since the training of conventional machine learning models generally happens on cloud platforms, there are also risks to security and privacy.
In this work, we propose an IoMT-based, edge-enabled incremental learning framework for intelligent diagnosis in smart healthcare. Incremental learning [11] is a machine learning paradigm that allows a model to learn continuously from new data. Incremental learning algorithms are adaptive, flexible, and efficient: the model can learn from new data, adapt to new patterns, and improve its accuracy over time. It is ideal for use in a smart healthcare environment where patient data is constantly generated. The proposed framework can handle data in real time efficiently and can learn from data samples one by one. It requires less computational resources and storage space than traditional models because it processes and stores only relevant and new data; thus, the proposed IoMT framework is more cost-effective than conventional machine learning-based frameworks. Conventional models, on the other hand, require significant computational resources and storage space to handle large datasets and complex models. Another significant benefit of incorporating incremental learning is privacy: the proposed incremental models are trained locally on edge devices, which helps preserve the confidentiality of sensitive data.
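As a minimal illustration of the one-sample-at-a-time paradigm described above, the following sketch trains a toy nearest-class-mean classifier online. The class and data are our own illustrative choices, not part of the proposed framework: each sample updates running per-class statistics and is then discarded, so no dataset is stored and no retraining is needed.

```python
# Toy online learner: per-class feature sums are updated one sample at a
# time; prediction picks the class whose running mean is nearest.
from collections import defaultdict

class IncrementalNearestMean:
    def __init__(self):
        self.sums = defaultdict(lambda: None)   # per-class feature sums
        self.counts = defaultdict(int)          # per-class sample counts

    def learn_one(self, x, y):
        if self.sums[y] is None:
            self.sums[y] = [0.0] * len(x)
        self.counts[y] += 1
        for j, v in enumerate(x):
            self.sums[y][j] += v

    def predict_one(self, x):
        best, best_d = None, float("inf")
        for y, s in self.sums.items():
            mean = [v / self.counts[y] for v in s]
            d = sum((a - b) ** 2 for a, b in zip(x, mean))
            if d < best_d:
                best, best_d = y, d
        return best

model = IncrementalNearestMean()
stream = [([1.0, 1.0], 0), ([1.2, 0.8], 0), ([5.0, 5.0], 1), ([4.8, 5.2], 1)]
for x, y in stream:          # each sample is learned, then discarded
    model.learn_one(x, y)
print(model.predict_one([0.9, 1.1]))   # → 0
```

The same pattern (a `learn_one`/`predict_one` loop over a stream) underlies the incremental classification and anomaly detection algorithms presented later.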
We propose a three-layer edge-based incremental learning framework. The first and last layers are used for data collection and application deployment, respectively, in line with existing works [7], [8]. The novel second layer (edge layer) allows incremental training and inferencing at the edge. To this end, we propose two algorithms for incremental classification and anomaly detection on the preprocessed IoMT data, inspired by the Aggregated Mondrian Forests (AMF) and Half-Space Trees (HST) algorithms, respectively. Addressing the issue of excessive features, which might increase the computational burden on incremental models, we propose a novel feature selection algorithm based on the concepts of bijective soft sets, Shannon entropy, and TOPSIS. Detailed experimentation shows that the proposed algorithms perform better than other incremental algorithms by a good margin. More importantly, the proposed algorithms achieve a significant improvement (>13%) in accuracy and reduce computational time by more than 80% compared to conventional batch-based machine learning algorithms.

A. CONTRIBUTIONS
The main contributions of the paper are highlighted below: i) We propose a novel edge-enabled, IoMT-based incremental learning framework that addresses several issues of conventional machine learning models. ii) We propose an advanced feature selection algorithm based on bijective soft sets, Shannon entropy, and TOPSIS. iii) We propose two incremental algorithms for classification and anomaly detection, inspired by the Aggregated Mondrian Forests (AMF) and Half-Space Trees (HST) algorithms. iv) We extensively evaluate the proposed algorithms, comparing their performance and computational time with those of batch learning-based algorithms.

B. ORGANIZATION
The rest of the paper is arranged as follows. Section II reviews the works related to the proposed framework. Section III describes the overall workflow of the proposed IoMT framework, the proposed feature selection algorithm, and the proposed incremental algorithms for classification and anomaly detection. Section IV discusses the performance of the proposed algorithms in terms of different metrics. Finally, Section V concludes the work.

II. RELATED WORK
This section describes the works related to intelligent diagnosis in smart healthcare systems and identifies the research gaps.

A. IoMT FOR INTELLIGENT DIAGNOSIS IN HEALTHCARE
The article [12] proposes the MSSO-ANFIS framework, which combines an Adaptive Neuro-Fuzzy Inference System (ANFIS) with the Modified Salp Swarm Optimization (MSSO) algorithm to investigate and identify the key characteristics of heart disease using IoMT devices. The framework collects patient data, which is stored and transmitted to the cloud constantly and in real time. After the ML framework makes a prediction, the data is passed to the hospital administration to identify the disease. The Levy flight-based crow search algorithm [13] is used for feature selection in this framework.
The authors report an accuracy of 99.45% and a precision of 96.54%.

B. INCREMENTAL LEARNING-BASED ANOMALY DETECTION
In the paper [14], the authors addressed the problem of mediocre monitoring performance and feature extraction caused by conventional batch processes ignoring the mutual correlation and autocorrelation among process variables. The issue is addressed by proposing a Dynamic Hidden Variable Fuzzy Broad Neural network (DHVFBN). In detail, Slow Feature Analysis (SFA) addresses non-linearity and captures dynamic features, while a Fuzzy Broad Learning System (FBLS) performs quick reconstruction and updating of the model, reducing computation time since the model need not be retrained after new faults are added. In the paper [15], the authors proposed the Online Incremental Learning Algorithm (OILA), which uses a regression-based approach to predict health parameters and incorporates a feedback mechanism to increase accuracy. Alerts are generated whenever an anomaly is seen in the patient's health parameters. It aims at solving a key challenge of existing anomaly detection algorithms: their inability to process data incrementally and hence to detect and predict anomalies at the correct instant. The proposed algorithm is compared with the Kalman Filter and validated on real-time health parameter datasets.

C. INCREMENTAL LEARNING-BASED CLASSIFICATION
In [16], a pattern classification system is presented in which classifier learning and feature extraction are done simultaneously, online and in one pass, by combining classifier models with incremental principal component analysis (IPCA). The authors recognized a limitation of this approach stemming from IPCA: training samples had to be learned individually. They address this problem with chunk IPCA, which processes many samples at a time. To study the scalability of IPCA in one-pass incremental learning situations, the classification performance on large-scale datasets was evaluated; the results show that chunk IPCA reduces training time in comparison to IPCA. The authors also studied the influence of the chunk size and the initial training data size on learning time and classification accuracy. In [17], the authors focus on the forgetting aspect of incremental learning, which leads to a significant performance drop. The paper surveys class-incremental learning for image classification and performs substantial experiments on thirteen class-incremental methods, including a comparison of class-incremental methods on multiple large-scale image classification datasets.

III. PROPOSED METHODOLOGY
This section describes the proposed IoMT-based incremental learning framework in detail. Firstly, we will explain the overall workflow of the framework, followed by a clear description of its components.

A. OVERALL WORKFLOW
Figure 1 represents the proposed framework, which broadly consists of three layers: the device, edge, and application layers. The device layer consists of various IoMT devices, like SpO2 sensors, smart watches measuring blood pressure and heart rate, EEG devices, etc., which monitor the patient and send the signals to the edge layer. The edge layer is the most essential in the proposed framework. In traditional frameworks for smart diagnosis in healthcare, the machine learning models are deployed on the cloud [18]. However, such deployments have latency and security issues. Addressing this issue, several researchers have tried to train machine learning models on edge devices [19], [20]. Although edge-based deployment addresses the concerns with cloud-based deployment, it has a notable drawback: the machine learning models deployed at the edge must have a lower memory footprint and computational requirements, which is often not the case, given the model size and performance tradeoff. In this work, we propose deploying incremental machine learning models on edge devices. Given the lower memory footprint and lesser computational requirements of incremental machine learning models, the issues in conventional edge-based deployment do not arise here. Besides the lower memory footprint, edge-based deployment has additional benefits in the proposed framework. i) The proposed framework has an anomaly detection machine learning model to identify anomalies in the IoMT data of patients. Given the critical nature of anomaly detection tasks, detecting and responding to potential anomalies quickly is important; edge deployment reduces the latency and is hence helpful. ii) Given the concern of data security in healthcare, edge-based deployment allows sensitive data to be stored on the local network, reducing the risks of cyber-attacks and data breaches. iii) The proposed system also includes an incremental learning-based inference system; edge deployment reduces the need to transmit data to the cloud while improving the speed and accuracy of the model.
The edge layer consists of four broad components: edge-based data preprocessing and analytics, the feature selection mechanism, the edge-based incremental learning model, and edge-based incremental inference. The data from the patient's IoMT sensors is preprocessed and analyzed for insights in the edge preprocessing component. Then, after feature selection, the data is passed to the edge incremental learning component, which consists of two online machine learning models for anomaly detection and disease diagnosis. The proposed algorithms for these models are presented in detail later. After the models are trained, the weights and other parameters are stored on the cloud for retraining or inference. In the proposed system, the anomalies in the data and the disease diagnosis are inferred at intervals; anomaly detection inference happens more frequently than disease diagnosis, given its time-critical nature. The IoMT data is preprocessed and passed to the inference component, which infers and alerts the patient whenever required. The inferences are also stored on the cloud for use by physicians.
The final application layer consists of a secure cloud storage platform. The cloud also holds the patient's electronic medical records (EMRs), which organize the information regarding the patient systematically. Apart from the EMRs, the parameters and hyper-parameters of the edge-trained machine learning models are also stored on the cloud. With access permissions from the patient, a physician can access the patient's EMR information, which aids in the patient's treatment.

B. FEATURE SELECTION EMPLOYING BIJECTIVE SOFT SETS, SHANNON ENTROPY AND TOPSIS
In the conventional IoMT setting, the number of features can surge as more IoMT sensor readings from the patient become available. Not all of these features are helpful for classification and anomaly detection, which raises the need for a feature selection algorithm. Feature selection is particularly important in the proposed edge-based incremental setting, where one of the main goals is to reduce the computational burden. In this section, we describe the components of the proposed feature selection algorithm, followed by the implementation details.

1) SOFT SETS
A soft set is a mathematical concept used to represent imprecise or incomplete information in a set. It extends the concept of a traditional set by introducing a degree of uncertainty in the membership of the elements.
Definition 1: If U represents the universe set, E represents the parameter set chosen for analysis, and P(U) represents the power set of U, then the pair (F, E) is said to be a soft set over U, where F is a function that maps elements from E to subsets of U, i.e., F : E → P(U).
Definition 2: A soft set (F, E) over the universal set U is considered a bijective soft set if it meets the following criteria [21]:
1) The images of all parameter values together cover the universe, i.e., ∪_{e∈E} F(e) = U.
2) For any two parameter values e_i, e_j ∈ E with e_i ≠ e_j, F(e_i) ∩ F(e_j) = ∅.
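The two conditions of Definition 2 can be checked mechanically. The sketch below does so for a toy universe and mapping; the parameter names and element values are illustrative, not from the paper.

```python
# Check the bijective-soft-set conditions: (1) the images F(e) jointly
# cover the universe U, and (2) images of distinct parameter values are
# pairwise disjoint (i.e., the images form a partition of U).
from itertools import combinations

def is_bijective_soft_set(F, U):
    """F: dict mapping each parameter value e to a subset of U."""
    covers = set().union(*F.values()) == set(U)           # condition 1
    disjoint = all(F[a].isdisjoint(F[b])                  # condition 2
                   for a, b in combinations(F, 2))
    return covers and disjoint

U = {1, 2, 3, 4, 5, 6}
F_good = {"low": {1, 2}, "medium": {3, 4}, "high": {5, 6}}  # a partition
F_bad  = {"low": {1, 2}, "medium": {2, 3}, "high": {5, 6}}  # overlaps, misses 4
print(is_bijective_soft_set(F_good, U))  # → True
print(is_bijective_soft_set(F_bad, U))   # → False
```

In other words, a bijective soft set partitions the universe by parameter value, which is what lets each feature value index a disjoint group of instances in the tables that follow.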

2) SHANNON ENTROPY
The Shannon entropy quantifies the level of uncertainty in information, which can be expressed using probability theory. The function H, proposed by Shannon, fulfills the following conditions for the values p_i in the estimated joint probability distribution P:
1) H is a continuous function that is always positive.
2) When all p_i are equal, i.e., p_i = 1/n, H is a monotonically increasing function of n.
In this work, Shannon entropy is used to capture the preferences of network security (NS) experts. It is helpful for calculating the weights of parameter values [22], which express their relative importance: a projection value is calculated for each parameter value, from which the entropy, then the divergence, and finally the weight are derived.
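Assuming the standard entropy-weight method outlined above (projection, then entropy, then divergence, then weight), a minimal sketch is given below. The preference matrix is an illustrative stand-in for Table 1, with rows as experts and columns as parameter values.

```python
# Entropy-weight method: columns with more disagreement among experts
# (lower entropy of the normalized column) receive higher weight.
import math

def entropy_weights(matrix):
    m, n = len(matrix), len(matrix[0])
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    # projection values: p_ij = x_ij / sum_i x_ij
    p = [[row[j] / col_sums[j] for j in range(n)] for row in matrix]
    # entropy: e_j = -(1/ln m) * sum_i p_ij * ln p_ij
    e = [-sum(p[i][j] * math.log(p[i][j]) for i in range(m) if p[i][j] > 0)
         / math.log(m) for j in range(n)]
    d = [1 - ej for ej in e]              # degree of divergence
    total = sum(d)
    return [dj / total for dj in d]       # weights W_j, summing to 1

prefs = [[0.2, 0.9, 0.5],     # expert 1: Low, Very high, Medium
         [0.5, 0.7, 0.5],     # expert 2
         [0.2, 0.9, 0.7]]     # expert 3
w = entropy_weights(prefs)
print([round(x, 3) for x in w])
```

The resulting weights W_j play the role of the column weights used later in the weighted utility values.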

3) TOPSIS (TECHNIQUE FOR ORDER OF PREFERENCE BY SIMILARITY TO IDEAL SOLUTION)
TOPSIS is a multi-criteria decision analysis technique that compares various alternatives and ranks them based on their distance from the ideal and non-ideal solutions. The optimal choice is the alternative that minimizes the distance from the ideal solution while maximizing the distance from the non-ideal solution. The TOPSIS procedure includes several key steps. Firstly, the alternatives and criteria are identified and arranged into a decision matrix. Then, the matrix is normalized and the weighted normalized decision matrix is calculated. After this, the ideal and non-ideal solutions are determined and the relative closeness of each alternative to the ideal solution is calculated. The alternative with the maximum relative closeness is taken as the optimal choice.
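The steps above can be sketched as follows; the decision matrix, criterion weights, and benefit/cost directions are illustrative assumptions, not values from the paper.

```python
# TOPSIS: normalize, weight, find ideal/non-ideal solutions, compute
# separation measures, and rank by relative closeness.
import numpy as np

def topsis(matrix, weights, benefit):
    X = np.asarray(matrix, dtype=float)
    # vector-normalize each criterion column, then apply the weights
    V = X / np.linalg.norm(X, axis=0) * np.asarray(weights)
    # ideal = best value per criterion, non-ideal = worst
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    nonideal = np.where(benefit, V.min(axis=0), V.max(axis=0))
    s_plus = np.linalg.norm(V - ideal, axis=1)      # distance to ideal
    s_minus = np.linalg.norm(V - nonideal, axis=1)  # distance to non-ideal
    closeness = s_minus / (s_plus + s_minus)        # relative closeness
    return closeness, closeness.argsort()[::-1]     # scores, ranking

matrix = [[7, 9, 9], [8, 7, 8], [9, 6, 8], [6, 7, 8]]  # 4 alternatives
weights = [0.4, 0.3, 0.3]
benefit = [True, True, True]          # all criteria are benefit-type
scores, ranking = topsis(matrix, weights, benefit)
print(ranking[0], [round(s, 3) for s in scores])
```

The first element of `ranking` is the alternative with the maximum relative closeness, i.e., the optimal choice in the TOPSIS sense.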

4) IMPLEMENTATION
This section applies the proposed methodology to the breast cancer dataset.
1) A set of effective features EF is selected. For each feature, a soft set (F_i, EF_i) is constructed, where F_i is a function that maps elements from EF_i to subsets of U. All these soft sets fulfill the conditions required for being bijective soft sets. As an example, in the soft set (F_1, EF_1), the union of the images of all parameter values forms the common universe U. For the second condition, taking any two parameter values x_11, x_12 ∈ EF_1 with x_11 ≠ x_12, it holds that F_1(x_11) ∩ F_1(x_12) = ∅, indicating that there is no intersection between the soft sets associated with different parameter values.
4) The network security (NS) experts were required to give preferences for the different features' values; these preferences are used in the Shannon entropy computation. The available preference options are Low (0.2), Medium (0.5), High (0.7), and Very high (0.9). The preference decision matrix collecting all preferences given by the NS experts is shown in Table 1.
5) We then calculate the projection value p_ij for each parameter value x_ij by normalizing the preferences: p_ij = x_ij / Σ_i x_ij. The entropy value e_j for each parameter value is calculated as e_j = −(1/ln m) Σ_i p_ij ln p_ij, where m is the number of experts. The degree of divergence is d_j = 1 − e_j, and the weight of each parameter value is W_j = d_j / Σ_j d_j. All these values are tabulated in Table 2.
6) Tables 3, 4 and 5 represent these sets as soft sets. The Weighted Utility Value (WUV), which expresses the effectiveness of each functional concept for a given NS_i, is also computed in these tables. WUV_{i,k}, the weighted utility value for the requirement set of NS_i and functional concept FC_k, is computed as WUV_{i,k} = Σ_j W_j · h_kj, where j iterates over the columns, W_j is the weight of each column, and h_kj accesses the values in the soft set table.
7) The ideal solution x*_i and the non-ideal solution x−_i are calculated for each requirement set NS_i as the maximum and minimum weighted utility values over the functional concepts, respectively. These are shown in Table 6.
8) For each functional concept FC_k, the separation measures from the ideal and non-ideal solutions are calculated as S*_k = sqrt(Σ_i (WUV_{i,k} − x*_i)²) and S−_k = sqrt(Σ_i (WUV_{i,k} − x−_i)²). These are shown in Table 7, and the combined separation measures in Table 8.
9) The relative closeness between the concept FC_k and the ideal solution is calculated as C*_k = S−_k / (S*_k + S−_k). These values are shown in Table 9. The concepts are then ranked according to their C*_k values in descending order; the best concept is the one with the maximum C*_k, since it exhibits the least distance from the ideal solution and the greatest distance from the non-ideal solution. Based on the above requirements, the functional concept FC_1 emerges as the optimal choice, with the highest relative closeness of 0.822. FC_1 is composed of x_12, x_21, x_33, x_42 and x_53, which can be interpreted as medium, poor, very good, medium and high, respectively.

C. PROPOSED INCREMENTAL ALGORITHM FOR CLASSIFICATION
We propose an incremental learning classification algorithm inspired by the AMF algorithm for real-time IoMT data classification [23]. The proposed algorithm is an ensemble learning method that combines the Mondrian Forest algorithm [24] with an aggregation scheme; this combination improves the performance of the algorithm on online learning tasks. Algorithm 1 presents the proposed algorithm. The algorithm maintains a set of Mondrian trees over a data stream. Each tree in the ensemble is constructed on a randomly selected subset of the data with a fraction f and has a weight w_t. The weights of the trees are updated iteratively based on the performance of each tree. The goal is to learn a set of Mondrian trees that minimizes the loss function value on the data stream. The algorithm starts by initializing the weights of all trees to 1. Then, for each tree T_t from 1 to n_estimators, the algorithm randomly selects a subset of the data with fraction f to build the tree. The tree is constructed using the MondrianTree function, which recursively partitions the data into smaller subsets based on a selected feature and a split value.
If aggregation is enabled, the weight w_t of the current tree is updated according to the performance of the trees in the ensemble. The update rule is an exponential weighting that penalizes trees with high loss values: w_t = exp(−step × Σ_{i=1}^{t} L(T_i, D_i)), where L(T_i, D_i) is the estimated loss of tree T_i on data D_i, and step is the step-size for the aggregation weights. The MondrianTree function (in the second part of Algorithm 1) builds a decision tree, which partitions the data into a hierarchical structure that preserves the temporal order of the stream. The tree is constructed recursively by selecting a feature and then a split value that maximizes the reduction in the expected loss. The expected loss is a weighted sum of the losses of the child nodes, where the weights are the fractions of instances that belong to each child node. The process continues until a stopping criterion is met: all instances in a node belong to the same class, or the node contains only one instance. The resulting tree is either a leaf node with the class of the single instance in the subset or an internal node with a split feature, a split value, child nodes, and class probabilities.
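The exponential down-weighting of poorly performing trees can be illustrated with a small sketch. The loss values and step size below are illustrative, and a full AMF implementation aggregates predictions across the whole ensemble rather than just updating weights.

```python
# Exponentially weighted aggregation: each tree's weight is multiplied
# by exp(-step * loss), so trees with high loss on the stream are
# down-weighted; the weights are then renormalized to sum to 1.
import math

def update_weights(weights, losses, step=1.0):
    new_w = [w * math.exp(-step * L) for w, L in zip(weights, losses)]
    total = sum(new_w)
    return [w / total for w in new_w]

weights = [1.0, 1.0, 1.0]          # all trees start with weight 1
losses = [0.1, 0.5, 1.2]           # per-tree loss on the current data
weights = update_weights(weights, losses, step=1.0)
print([round(w, 3) for w in weights])   # the lowest-loss tree dominates
```

Repeating this update on each batch of the stream makes the ensemble's prediction track its best-performing trees without ever retraining them.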

D. PROPOSED INCREMENTAL LEARNING FOR ANOMALY DETECTION
We propose an incremental learning anomaly detection algorithm inspired by the Streaming Half-Space Trees (HS-Trees) algorithm [25] for real-time anomaly detection in IoMT data. Streaming input data is generally of high velocity and volume; therefore, traditional batch-based learning methods can become computationally infeasible due to their high computational and memory requirements. Our proposed model particularly addresses streaming data.
Algorithm 2 presents the proposed incremental algorithm for anomaly detection. It operates on a data stream D with a given window size γ and a number of HS-Trees n. The aim is to compute an anomaly score scr for a streaming instance x. The algorithm begins by creating the specified number of HS-Trees. The MassUpdate function is invoked for each instance x in the first γ instances of the stream; it updates the mass of the corresponding nodes in the trees. If the instance belongs to the reference window, the algorithm increments the reference mass (Node.r); otherwise, it increments the latest mass (Node.l). If the current node's depth (Node.k) is less than the maximum depth, the function recursively calls itself for the next-level node that x traverses. This mass update for the first γ instances creates an initial reference mass profile for the HS-Trees, which is used to infer anomaly scores for new data arriving in the latest window.
The Score function computes the anomaly score for an instance x in an HS-Tree T. It traverses x from the root of T to a terminal node Node_t and returns Node_t.r × 2^{Node_t.k}, where Node_t.k is the depth level of the terminal node and Node_t.r is its reference mass.
In the algorithm's main loop, a streaming point x is received and scr is set to 0. For each tree T in the HS-Trees, the Score function is called to compute the anomaly score for x in T , and the MassUpdate function is invoked with the reference window set to false.This updates the latest mass in the corresponding nodes.After accumulating the tree scores, the algorithm reports scr as the anomaly score for x.
After scoring a batch of data points, the algorithm updates the model: it transfers the latest mass values from the nodes in the latest window to the corresponding reference masses. This ensures that the reference window always contains the most up-to-date mass profile for scoring the next batch of data points. The count variable c keeps track of the number of instances processed. When c reaches γ, the model is updated by transferring each node's non-zero latest mass l to its reference mass r; the latest mass l of those nodes is then reset to 0, and c is reset to 0.
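The per-node mass bookkeeping (reference mass r, latest mass l, window swap, and the r × 2^depth score) can be sketched for a single one-dimensional tree. Note that a real HS-Tree draws random split points within a randomly chosen work space, whereas this toy version uses deterministic midpoints for reproducibility.

```python
# One-dimensional half-space tree sketch: each node holds a reference
# mass r (previous window, used for scoring) and a latest mass l
# (current window); window_swap moves l into r at a window boundary.
class HSNode:
    def __init__(self, depth, max_depth, lo, hi):
        self.r = 0                       # reference mass
        self.l = 0                       # latest-window mass
        self.k = depth
        self.left = self.right = None
        if depth < max_depth:
            self.split = (lo + hi) / 2.0
            self.left = HSNode(depth + 1, max_depth, lo, self.split)
            self.right = HSNode(depth + 1, max_depth, self.split, hi)

    def mass_update(self, x, reference):
        if reference:
            self.r += 1
        else:
            self.l += 1
        if self.left is not None:
            child = self.left if x < self.split else self.right
            child.mass_update(x, reference)

    def score(self, x):
        if self.left is None:            # terminal node: r * 2^depth
            return self.r * (2 ** self.k)
        child = self.left if x < self.split else self.right
        return child.score(x)

    def window_swap(self):               # r <- l, l <- 0, recursively
        if self.l or self.r:
            self.r, self.l = self.l, 0
        for c in (self.left, self.right):
            if c:
                c.window_swap()

tree = HSNode(0, max_depth=3, lo=0.0, hi=1.0)
for x in [0.1, 0.15, 0.12, 0.9]:         # build the reference profile
    tree.mass_update(x, reference=True)
print(tree.score(0.11), tree.score(0.95))  # dense region scores higher
```

Because anomalies land in sparsely populated leaves, they receive low scores here; depending on the convention, the score can be negated or thresholded so that low-mass leaves are flagged.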

IV. EXPERIMENTATION AND PERFORMANCE EVALUATION
This section describes the conducted experiments, along with comparisons against baseline and existing state-of-the-art models, beginning with a brief description of the datasets and evaluation metrics used.

A. DATASETS AND EXPERIMENTAL SETUP
The Diabetes dataset [30] is used for classification. This dataset encompasses critical attributes, including pregnancies, glucose levels, and the diabetes pedigree function. It has 768 instances, eight features, and two classes (whether or not the patient has diabetes). All features are numerically valued. We use the Breast Cancer dataset [31] for anomaly detection. It consists of 683 instances, nine features, and two classes, benign and malignant. The features are cytological characteristics like clump thickness, marginal adhesion, infrequent mitoses, uniformity of epithelial cell size, epithelial cell diameter, bare nuclei, normal nucleoli, uniformity of cell shape, and blandness of nuclear chromatin.
In our research, we implemented the machine learning models in Python, using open-source libraries including NumPy, Pandas, Scikit-learn, and Matplotlib. Experiments were conducted on a system with an Intel Core i5-9300H 2.40 GHz processor, 8 GB of RAM, and an NVIDIA GeForce GTX 1650 graphics card. We use accuracy, ROCAUC, log loss, and weighted F1-score as metrics for evaluating the models. Apart from these metrics, we also measure the time taken by the models using the time module in Python.
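The four reported metrics can be computed with scikit-learn as below; the labels and scores are toy values for illustration, not the paper's experimental results.

```python
# Computing accuracy, ROC AUC, log loss, and weighted F1 with sklearn.
from sklearn.metrics import accuracy_score, roc_auc_score, log_loss, f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_prob = [0.2, 0.8, 0.7, 0.4, 0.9, 0.3, 0.6, 0.55]  # P(class = 1)
y_pred = [int(p >= 0.5) for p in y_prob]            # hard labels at 0.5

print("accuracy   :", accuracy_score(y_true, y_pred))
print("ROC AUC    :", roc_auc_score(y_true, y_prob))
print("log loss   :", round(log_loss(y_true, y_prob), 4))
print("weighted F1:", round(f1_score(y_true, y_pred, average="weighted"), 4))
```

Note that ROC AUC and log loss take the predicted probabilities, while accuracy and F1 take the thresholded labels; in the incremental setting these metrics are updated after each processed instance.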

B. INCREMENTAL CLASSIFICATION
Table 11 compares the performance of the proposed model with basic incremental algorithms and state-of-the-art incremental algorithms [26], [27], [28]. The proposed model obtains the best accuracy, precision, and F1-score values, and its recall score is quite close to the best recall score. Table 10 evaluates the performance and training duration of the proposed model against the batch learning algorithms.
As can be seen from the table, the proposed model has the best accuracy on the Diabetes dataset and takes the least training time compared to the batch learning algorithms. SVM takes considerably longer to train, and the accuracy results of the Decision Tree and Naive Bayes algorithms are similar. Overall, our model gives the best results in terms of both accuracy and time taken in comparison to the batch models.
Figure 3 presents the plots of ROCAUC, log loss, accuracy, and weighted F1-score for the proposed incremental classification algorithm. From Figure 3a, it can be seen that the curve starts to saturate at around 100 data points and then increases slowly, suggesting that as the dataset grows, the ROCAUC value will keep increasing, which establishes the model's good performance. From Figure 3b, it can be observed that the curve starts to saturate after around 100 data points and then slowly decreases, suggesting that as the dataset increases, the log loss decreases as the model becomes better at predicting. From Figure 3d, it can be inferred that the model's weighted F1-score stabilizes around 200 data points and increases only slightly afterward. Figure 3c shows that accuracy is low in the model's initial stage, because the model has not yet been trained on much data; as the dataset grows, the model learns to predict more precisely and the accuracy starts to increase.
Figure 4 compares the proposed incremental algorithm with the batch learning algorithms for classification in terms of accuracy and time. As can be seen from Figure 4a, the proposed model performs best in terms of accuracy, followed by the ensemble learning model of Naive Bayes, Support Vector Machine, and Decision Tree. The proposed model has low accuracy initially, but as it learns with each iteration, its accuracy improves and it outperforms the other models.
From Figure 4b, it can be seen that our proposed model is more computationally efficient than the batch processing methods.

C. INCREMENTAL ANOMALY DETECTION
Table 13 presents the performance comparison of the proposed model with a baseline incremental algorithm and state-of-the-art incremental algorithms [27], [32]. Our proposed model outperforms every other model in all the metrics, with an accuracy of 97.22%. The second-best model is the One-Class SVM with a Quantile filter [32], with an accuracy of 95.90%, 1.32% lower than the proposed model. A threshold filter classifies the anomaly score based on a fixed threshold value: any instance with a score above the threshold is classified as an anomaly. The quantile filter, on the other hand, classifies the anomaly score based on a running quantile of the observed anomaly scores, flagging any score above the current quantile as an anomaly. The quantile is updated with each newly observed score, allowing the filter to adapt to changes in the data over time.
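The running-quantile filter contrasted with the fixed threshold above can be sketched as follows; the implementation details (storing every score and re-sorting on each step) are our simplification, chosen for clarity over efficiency.

```python
# Running-quantile anomaly filter: a score is flagged when it exceeds
# the q-quantile of the scores observed so far, so the cut-off adapts
# to drift in the score distribution instead of staying fixed.
class QuantileFilter:
    def __init__(self, q=0.95):
        self.q = q
        self.scores = []              # all scores observed so far

    def classify(self, score):
        if self.scores:
            s = sorted(self.scores)
            idx = min(int(self.q * len(s)), len(s) - 1)
            is_anomaly = score > s[idx]   # above the running quantile?
        else:
            is_anomaly = False            # no history yet
        self.scores.append(score)
        return is_anomaly

f = QuantileFilter(q=0.9)
stream = [0.2, 0.1, 0.15, 0.12, 0.18, 0.14, 0.16, 0.13, 0.17, 0.95]
flags = [f.classify(s) for s in stream]
print(flags)   # only the final, clearly high score is flagged
```

A production version would keep a bounded window or a streaming quantile estimator instead of the full score history, so memory stays constant as the stream grows.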
Table 12 shows the accuracy and training time of different batch learning models and the proposed model. The proposed model outperforms the other models in terms of both accuracy and training time. The accuracy of the model is 97.22%, which is 1.76% higher than that of the isolation forest model, the best-performing batch learning model. The One-Class SVM model used here is also a batch learning-based model, different from the one used above for incremental anomaly detection, which is a stochastic variant of the one-class SVM algorithm.
Figure 5 presents the plots of ROCAUC, log loss, accuracy and weighted F1-score of the proposed incremental algorithm for anomaly detection. Figure 5a shows the variation of the ROCAUC score with each data point incremented to the model. Overall, the ROCAUC score increases with each update of the model until the number of train observations reaches 70; it peaks around 70 train observations, after which it stays nearly constant. Figure 5b is the log loss curve. The curve starts at a high value and then drops, fluctuating around a log loss of 1; the fluctuation gradually weakens and the curve settles around 1. Figure 5c, the accuracy curve, first dips at the initial train observations and then increases gradually until the number of train observations reaches 50, after which the accuracy stays nearly constant throughout the domain. The weighted F1-score curve in Figure 5d follows a similar trend: it first dips at the initial train observations, more deeply than the accuracy curve, then increases until the number of train observations reaches 50, after which it fluctuates slightly and slowly becomes constant.
Figure 6 compares the proposed incremental algorithm with the batch learning algorithms for anomaly detection.
From Figure 6a, it can be inferred that the proposed model outperforms the other three models in all analyses. The accuracy of the proposed model remained nearly the same after every training iteration. Figure 6b compares the time the models take to train for different numbers of train observations; the training-data size equals the number of train observations here. The proposed model takes the shortest time for almost all training-data sizes. Among the batch learning algorithms, the elliptic envelope performs best in terms of training time, but our proposed model outperforms it as well, especially for large numbers of train observations, as seen in the figure.

V. CONCLUSION
In this paper, we have presented an edge-based IoMT framework with an incremental machine learning approach, incorporating a novel feature selection algorithm based on bijective soft sets, TOPSIS, and Shannon entropy. We proposed two incremental learning algorithms, one for classification and one for anomaly detection. Our proposed models for both tasks achieved good results and performed better than the other incremental and traditional batch learning-based models. We performed a detailed experimental evaluation comparing the proposed models with existing incremental learning algorithms and batch learning-based algorithms, using metrics such as accuracy, precision, recall and F1-score. The Diabetes dataset and the Breast Cancer dataset were used to evaluate the effectiveness of the proposed system, the baseline system, and the existing systems from related studies. The proposed model for classification gives an accuracy of 87.63%, which is better by 13.61% than the best-performing batch learning-based model. Similarly, the proposed model for anomaly detection gives an accuracy of 97.22%, which is better by 1.76% than the best-performing batch-based model. The proposed incremental algorithms for classification and anomaly detection are 9X and 16X faster than their corresponding best-performing batch learning-based models.

FIGURE 2. Flow chart of the proposed algorithm for classification.

Algorithm 2
Proposed Incremental Learning Algorithm for Anomaly Detection
Input: Data stream D, window size γ, number of HS-Trees n
Output: scr — anomaly score for each streaming instance x
Construct n HS-Trees
for each tree T do
    for each instance x in the first γ instances of the stream do
        MassUpdate(x, T.root, true)
    end
end
Save this initial reference mass profile for each HS-tree T.
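To make the mass-profile mechanics concrete, here is a minimal single-tree sketch of a half-space tree over [0, 1]-normalized features: random midpoint splits, a reference mass profile built from the first γ instances, and the Node.r × 2^Node.k scoring rule. Names such as `mass_update` and the fixed tree depth are illustrative simplifications, not the paper's exact implementation.

```python
import random

class HSNode:
    """One node of a half-space tree built over [0, 1]-normalized features."""

    def __init__(self, depth, max_depth, mins, maxs, rng):
        self.r = 0        # mass recorded in the reference window
        self.k = depth    # depth level of this node
        self.left = self.right = None
        if depth < max_depth:
            j = rng.randrange(len(mins))                 # random split dimension
            self.split_dim = j
            self.split_val = (mins[j] + maxs[j]) / 2.0   # half-space split
            left_maxs, right_mins = list(maxs), list(mins)
            left_maxs[j] = right_mins[j] = self.split_val
            self.left = HSNode(depth + 1, max_depth, mins, left_maxs, rng)
            self.right = HSNode(depth + 1, max_depth, right_mins, maxs, rng)

def mass_update(x, node):
    """Increment the mass of every node on x's root-to-leaf path."""
    while node is not None:
        node.r += 1
        if node.left is None:
            return
        node = node.left if x[node.split_dim] <= node.split_val else node.right

def score(x, node):
    """Traverse x to a terminal node and return Node.r * 2^Node.k."""
    while node.left is not None:
        node = node.left if x[node.split_dim] <= node.split_val else node.right
    return node.r * (2 ** node.k)
```

Instances falling in densely populated regions reach high-mass leaves and receive high scores, while anomalies land in sparse leaves and score near zero. In the full algorithm, n such trees are built, the per-tree scores are summed, and a second mass profile collected over the latest window replaces the reference profile every γ instances.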

FIGURE 3. Plots of ROCAUC, log loss, accuracy and weighted F1-score of the proposed incremental algorithm for classification.

FIGURE 4. Comparison of the proposed incremental algorithm with the batch learning algorithms for classification.

FIGURE 5. Plots of ROCAUC, log loss, accuracy and weighted F1-score of the proposed incremental algorithm for anomaly detection.

FIGURE 6. Comparison of the proposed incremental algorithm with the batch learning algorithms for anomaly detection.
= 1/n for all i, H exhibits a monotonic increase with respect to n.
3) When considering any value of n greater than or equal to 2, the function H(p_1, . . ., p_n) can be expressed as the sum of two terms. The first term is H(p_1 + p_2, p_3, . . ., p_n), and the second term is (p_1 + p_2) H(p_1/(p_1 + p_2), p_2/(p_1 + p_2)).
1) The useful features selected are EF_1, EF_2, EF_3, EF_4 and EF_5, where
EF_1 = marginal adhesion
EF_2 = epithelial cell diameter
EF_3 = blandness of nuclear chromatin
EF_4 = infrequent mitoses
EF_5 = uniformity of epithelial cell size
These selected features can also be termed parameters or attributes. The following values are assigned to the selected features; they represent the target values for the requirements.
EF_1 = {x_11, x_12, x_13} = {Low, Medium, High}
EF_2 = {x_21, x_22, x_23} = {Poor, Good, Very Good}
EF_3 = {x_31, x_32, x_33} = {Poor, Good, Very Good}
EF_4 = {x_41, x_42, x_43} = {Low, Medium, High}
EF_5 = {x_51, x_52, x_53} = {Low, Medium, High}
2) Five combinations are formed by combining parameter values from each useful feature. In this context, a suitable combination refers to selecting parameter values from the useful features that represent a specific case (an alternative). These combinations are termed functional concepts. The union of these concepts forms the universe set U.
U = {EF_1, EF_2, EF_3, EF_4, EF_5}
The functional concepts are given as follows:
FC_1 = {x_12, x_21, x_33, x_42, x_53}
FC_2 = {x_11, x_23, x_31, x_42, x_52}
FC_3 = {x_13, x_22, x_32, x_43, x_51}
FC_4 = {x_11, x_22, x_31, x_41, x_51}
FC_5 = {x_12, x_23, x_31, x_41, x_51}
3) The useful features can be expressed as soft sets, which assist the expert in determining which functional concept fulfills a specific target value.
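The entropy-plus-TOPSIS ranking that underlies the feature selection algorithm can be sketched numerically. Given a decision matrix whose rows are alternatives (e.g., functional concepts) and whose columns are criteria, Shannon entropy yields the criterion weights and TOPSIS yields each alternative's relative closeness to the ideal solution. This is a generic sketch of the two standard computations, not the paper's bijective-soft-set construction, and the matrix values are illustrative.

```python
import math

def entropy_weights(matrix):
    """Shannon-entropy criterion weights for an (alternatives x criteria) matrix."""
    n, m = len(matrix), len(matrix[0])
    k = 1.0 / math.log(n)
    divergence = []
    for j in range(m):
        col = [row[j] for row in matrix]
        total = sum(col)
        p = [v / total for v in col]
        e = -k * sum(pi * math.log(pi) for pi in p if pi > 0)  # entropy of column j
        divergence.append(1 - e)  # high divergence = informative criterion
    s = sum(divergence)
    return [d / s for d in divergence]

def topsis_closeness(matrix, weights):
    """Relative closeness of each alternative to the ideal solution (IS)."""
    n, m = len(matrix), len(matrix[0])
    # vector-normalize each column, then apply the criterion weights
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    v = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    ideal = [max(v[i][j] for i in range(n)) for j in range(m)]      # IS
    neg_ideal = [min(v[i][j] for i in range(n)) for j in range(m)]  # NIS
    closeness = []
    for row in v:
        d_pos = math.sqrt(sum((row[j] - ideal[j]) ** 2 for j in range(m)))
        d_neg = math.sqrt(sum((row[j] - neg_ideal[j]) ** 2 for j in range(m)))
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness
```

The alternative with the highest relative closeness is the one that best fulfills the weighted criteria, mirroring the roles of the separation measures (Tables 7-8) and the relative closeness values (Table 9) in the worked example.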

TABLE 7. Separation measure of each NS from NIS and IS.
TABLE 8. Combined separation.

TABLE 9. Relative closeness of FC.
6) The experts select different parameter values, which form the requirement sets. The best functional concept will be chosen based on these requirement sets. The sets are as follows:
NS_1 = {x_12, x_22, x_31, x_43, x_52}
NS_2 = {x_11, x_23, x_32, x_41, x_53}
NS_3 = {x_13, x_21, x_33, x_42, x_51}
Tables

Algorithm 1
Proposed Incremental Learning Algorithm for Classification
Input: Data stream D, number of trees n_estimators, step-size for the aggregation weights step, use of aggregation in trees use_aggregation, regularization level of class frequencies dirichlet, use of pure node splitting split_pure, subsampling fraction f and random seed seed.
Output: Ensemble of n_estimators Mondrian trees T = {T_1, . . ., T_n_estimators} with aggregation weights w_1:T.
Initialize w_1:T to 1
for t = 1 to n_estimators do
    D_t ← random subset of D with fraction f
    T_t ← MondrianTree(D_t, w_t, dirichlet, split_pure)
    D_L ← {x ∈ D : x_j ≤ v}
    D_R ← {x ∈ D : x_j > v}
    T_L ← MondrianTree(D_L, w, dirichlet, split_pure)
    T_R ← MondrianTree(D_R, w, dirichlet, split_pure)
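The role of the aggregation weights and the step-size in the algorithm above can be illustrated with a simplified exponentially weighted aggregation step: trees that incur high loss on recent instances are down-weighted, and predictions are a weighted average over the ensemble. The function names and the per-tree loss interface are illustrative assumptions, not the paper's exact aggregation scheme.

```python
import math

def update_aggregation_weights(weights, losses, step=1.0):
    """Exponentially weighted aggregation: down-weight trees with high loss.

    weights -- current aggregation weight of each tree
    losses  -- per-tree loss on the latest streaming instance
    step    -- step-size controlling how aggressively weights adapt
    """
    updated = [w * math.exp(-step * l) for w, l in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]

def aggregate_prediction(weights, tree_probs):
    """Weighted average of the trees' class-probability estimates."""
    n_classes = len(tree_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, tree_probs))
            for c in range(n_classes)]
```

With step = 0 every tree keeps equal weight (a plain average, i.e., aggregation disabled), while larger step values let the ensemble shift weight quickly toward trees that track the current data distribution.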

TABLE 10 . Comparison of proposed algorithm with batch algorithms for classification.
Node* ← the subsequent node of Node that x visits
MassUpdate(x, Node*, referenceWindow)
end
Function Score(x, T):
Traverse x from the root of T to a terminal node Node_t
return Node_t.r × 2^(Node_t.k), where Node_t.k represents the depth level of the terminal node having mass Node_t.r