IoT-Based Patient Health Data Using Improved Context-Aware Data Fusion and Enhanced Recursive Feature Elimination Model

The Internet of Things (IoT) in the healthcare market is propelled forward by the implementation of digital systems for monitoring and analysing health problems. IoT and smart devices can contribute to a highly smart environment. Smart medical devices interconnected with smartphone apps can collect medical and other required health data. “Data Fusion (DF)” refers to integrating data and knowledge from multiple sources; these techniques are also applied to other domains, including text processing. Using data from multiple distributed sources, the objective of DF in multisensory contexts is to reduce the chance of detection errors and increase reliability. The broader aims are improved scalability, performance efficiency, and identification. A medical device’s ability to scale up or down demonstrates its capacity to respond to environmental factors: a more scalable system performs as expected, without interruptions, and makes the best use of the resources it manages. To ensure that these tracking devices all work the same way, it is essential to form a specialised group to develop uniformity in areas such as communication channels, data aggregation, and smart interfaces. The main contributions of this research are pre-processing, DF using the Improved Context-aware Data Fusion (ICDF) algorithm, feature extraction via Improved Principal Component Analysis (IPCA), feature selection through the Enhanced Recursive Feature Elimination (ERFE) algorithm, and classification using an ensemble-based Machine Learning (ML) model. The Improved Dynamic Bayesian Network (IDBN) offers a good trade-off between expressiveness and tractability, making it a suitable tool for ICDF operations. The simulation results show that the proposed ICDF model achieves 97% accuracy, 96% precision, 97% recall, and a 97% F1 score in the healthcare system.


I. INTRODUCTION
Health care is a crucial area where ubiquitous applications may be found. Pervasive computing takes place everywhere without users' input [1], [2]. In ubiquitous systems, a collection of devices is linked together using wired and wireless technologies to create an invisible, Artificial Intelligence (AI)-enabled system that can operate with or without human involvement. These interconnected devices can work independently of one another. Each device is equipped with embedded chips that allow it to connect to a network of other devices, ensuring that a connection is constantly accessible [3], [4], [5]. Because of this, users can apply pervasive computing in different domains, such as healthcare, homecare, and transportation.
Universal healthcare's primary goal is to employ pervasive computing technology to offer healthcare to humans everywhere, at any time. It is an alternative to the conventional healthcare system, in which a patient notices symptoms, contacts a medical expert, reports the symptoms, and receives treatment [6]. Pervasive healthcare, on the other hand, delivers healthcare to humans wherever they are and at any time. It is based on using sensing and communication systems to frequently monitor a patient's health by collecting information through sensors. This enables reliable health data to be sent to a doctor or medical expert, allowing for prompt diagnosis and treatment of a patient's health issues anywhere [7], [8]. Significant advances in cloud-based communication and sensing technologies have led to intelligent portable and wearable devices such as Personal Digital Assistants (PDAs), mobile phones, and smartwatches, which have enabled a wide range of ubiquitous medical system innovations.
IoT is a novel Information and Communication Technology (ICT) method that sends and receives information over communication networks [9]. IoT means connecting many things and enabling their communication with one another over the Internet. The IoT plays a primary role in the development and advancement of smart systems. IoT design is divided into five layers; each layer has its own functions and, at the same time, provides services to the layers above or below it. IoT is a platform built on a network of physical components, devices, vehicles, buildings, and so on, all of which are formed by electronics, software, and embedded sensor systems [10], [11]. Radio Frequency Identification (RFID) tags, sensors, and smart objects link physical devices using the internet. This research uses a Context-aware Data Fusion (CDF) method to improve the accuracy of the data prediction process.
Such data are structured and maintained so that physicians may access them in order to provide improved treatment for their patients. In existing work, CDF-EMLM was introduced for improving health data treatment; however, classifier accuracy is reduced by irrelevant data in the database. To mitigate this problem, this research work focused on developing the ICDF and an Efficient Feature Selection Algorithm (EFSA) for improving the classification process for predicting healthcare data. The main contributions of this paper are pre-processing, DF using the ICDF algorithm, Feature Extraction (FE) using IPCA, Feature Selection (FS) using the ERFE algorithm, and classification using an Ensemble Machine Learning (EML) model. The proposed method provides more accurate results using hands-on algorithms for the assumed dataset [12].
In this paper, the authors introduce an ICDF method that proactively determines the operating context and uses it to sequentially fuse sensor data from different models operating at distinct levels. This work focuses on the challenge of classifying healthcare data in perceptual and cognitive systems and assumes that the proposed ICDF-EMLM method for sensor fusion can be used in many settings, from object prediction and monitoring to epidemiological research and clinical practice [14]. Finally, these data are learnt by the EMLM for performance checking. ICDF can improve the robustness of healthcare data prediction compared with a single sensor-DF method. Our research is the first to investigate a context-aware DF method that can dynamically fine-tune when and how DF is carried out. EML methods are meta-algorithms that integrate many ML techniques into a single prediction model; compared with a single model, this technique provides higher predictive performance. Here, the Enhanced Neural Network (ENN), Modified Extreme Gradient Boost Classifier (MXGB), and Logistic Regression (LR) are combined into an EML model for predicting healthcare data [15].
Through this study, the authors aim to address the research question: ''How can IoT-generated data be combined with an ML-based ERFE algorithm and classifier to develop a better healthcare diagnosis, prediction, and monitoring system?'' Accordingly, the goal of this paper is to provide a useful guide for researchers who want to study, experiment with, and develop data-driven, computerized AI systems based on ML techniques.
This paper identifies the following areas where further study is needed: • The analysis and features of different types of real-world data and the functionality of distinct learning methodologies will assist us in defining the scope of our research project.
• To present a deep understanding of ML algorithms that can be used to improve the knowledge and capabilities of a data-driven application.
• The goal of this discussion is to study the viability of ML-based methods in a range of practical settings.
• This paper's goal is to review the landscape of possible future research scenarios for smart data analysis and services within the scope of our ongoing investigation. The remainder of the paper is organized as follows: Section 2 reviews recent data fusion methods for healthcare applications. Section 3 elucidates the introduced technique. Section 4 demonstrates the outcomes along with their discussion. Section 5 presents the conclusion and directions for further enhancement.

II. LITERATURE REVIEW
Here, some recent techniques for IoT-based healthcare applications relevant to the suggested model are reviewed; this section motivates the improvements the prediction model in this field requires. There are probably millions of things connected to the internet. The higher-level ones can store essential data, process information efficiently, and make appropriate decisions because of their complex and sophisticated frameworks (e.g., smart devices). Contrarily, some of them have low-level models, very limited storage, and severely restricted computing power (e.g., body sensors). The IoT is inherently complicated due to the interdependencies between these devices. Massive amounts of device-collected data can be gathered, modelled, and made sense of in order to implement knowledge extraction within the IoT set-up [16], [17]. This is referred to as IoT intelligence. Sensor-based DF, context-based modelling, and context-aware reasoning add further context awareness to the picture. Spatial considerations, including time and location, are another important feature of the IoT. For IoT processing, the location, routine, and timescale of data are critical factors to consider when extracting and analysing context-aware information from sensor data. Tracing these devices has become progressively more difficult as the number of IoT devices continues to grow. The temporal transformation is represented in Fig. 1. The context life cycle describes sensor data collection, modelling, processing, and FE. It is therefore worthwhile to work on IoT-related solutions, communications infrastructure, and paradigms. Context acquisition, modelling, reasoning, and distribution are the four primary characteristics shared by these life cycles. Data from multiple physical and virtual sensors are aggregated during the context information collection phase. During the context modelling phase, the data must be modelled according to their meaning.
Processing raw data is a prerequisite for FE in the context reasoning phase. In the last subcategory, context distribution, the collected information is distributed via multiple methods, such as servers, scripting languages, and frameworks [18], [19], [20]. Fig. 2 demonstrates the context life span in the IoT. Reference [21] discussed how an improved and smart healthcare system reflects a well-designed and stable society. Doctors can monitor patients' health conditions remotely using the IoT, which has been integrated into the digital healthcare system. That research work presented automated and smart IoT systems that monitor patients' health conditions, store and display the data over the internet, and immediately inform physicians of critical conditions. To make the system more accessible and user-friendly, the research also focuses on making it more affordable. With this system, doctors will always know their patients' current health status. In the event of an emergency, the system notifies the patient's doctor and family members about any injury. Because of remote monitoring, a significant number of lives can be saved, and a doctor remains in charge of providing health care.
Using a method proposed by [22], it was possible to determine the similarity measure for only the variables in the study. The technique offers a solution which can be applied with standard statistical methods and software for either discrete or continuous data. The common factors in most marketing applications are psychographic or demographic; the factors to be merged are media viewing and product purchase. In such a scenario, the method is able to accurately predict the joint distribution of media usage and online ordering in a combined application, which is then used to make marketing decisions. The fusion of discrete variables is essential in marketing applications, and for this situation the researchers devised a system for relaxing the requirement of conditional independence. The researchers back up their method with data from extensive surveys of British consumers about what they buy and how they use the media.
Reference [23] suggested utilizing a Deep Recurrent Neural Network (DRNN), a powerful DL method founded on sequential information, as a body-sensor-based method for behaviour detection. They combine data from body sensors, including electrocardiography (ECG), accelerometers, and magnetometers. Kernel Principal Component Analysis (PCA) is used to improve the FE. After that, the robust features are utilized for training a Recurrent Neural Network (RNN), which is subsequently utilized to recognize behaviours. Using three publicly available standard databases, the method was compared with traditional techniques. The experimental results demonstrate that the suggested method performs better than existing methods.
The method suggested by [24] was implemented utilizing complex event processing techniques that natively support a hierarchical processing method and concentrate on managing streaming data ''on the fly,'' which is a significant necessity for storage-constrained IoT devices and time-critical application areas. Preliminary results show that the suggested technique facilitates fine-grained decision-making at various DF stages, improving overall performance as well as the reaction time of public healthcare services and therefore encouraging the use of IoT technology in the healthcare industry.
Reference [25] used sensor signal fusion and case-based reasoning to categorize physiological sensor signals. The suggested method was tested utilizing sensor DF to identify people as Stressed or Relaxed. During the data collection phase, physiological sensor signals such as Heart Rate (HR), Finger Temperature (FT), Respiration Rate (RR), Carbon dioxide (CO2), and Oxygen Saturation (SpO2) are gathered. Sensor fusion is accomplished in two ways: (a) at the decision level, using features extracted with traditional methods, and (b) at the data level, using FE based on Multivariate Multi-scale Entropy (MMSE). The classification of the signals is done using Case-Based Reasoning (CBR). Compared with a subject-matter expert, the developed method could correctly diagnose Stressed or Relaxed individuals 87.5% of the time. Consequently, it showed potential in the physiological area, and it may be feasible to apply the technique to other important healthcare methods in the future.
In a Fog Computing (FC) environment, [26] suggested a DF-supported ensemble model for working with clinical information collected from Body Sensor Networks (BSNs). Daily activity information is gathered from multiple sensors and combined to produce high-quality activity information. The merged information is then fed into an ensemble classification model to forecast early cardiac disease. The ensembles are hosted on an FC platform, and the prediction calculations are distributed; the findings from the FC environment's different nodes are integrated to generate a single result. A new Kernel Random Forest Ensemble (KRFE) is utilised for classification, providing considerably better outcomes than a Random Forest (RF). A comprehensive research study backs up the solution's applicability, and the findings are encouraging: 98% accuracy is obtained when the tree depth equals 15, the number of estimators is 40, and the prediction is based on 8 features.
Andò et al. [27] developed smart systems for categorising Activities of Daily Living (ADL) that rely on data from inertial sensors fixed in a user device. A unique multi-sensor DF technique is described, which combines data from an accelerometer as well as a gyroscope. Aside from alarm management, the data given by such a system may be used to track the progression of a user's disease, as well as during rehabilitative efforts. Overall, user testing findings show that the adopted paradigm performs well regarding sensitivity and specificity while executing falls and ADL categorization activities. The sensitivity index for categories of falls and ADL studied in this article is 0.81, whereas the specificity index is 0.98.
Medjahed et al. [28] proposed a telemonitoring method based on a multi-modal platform with multiple sensors placed at home, allowing access to a complete and carefully regulated universe of data collections. It combines physiological and behavioural data from the elderly, their acoustic surroundings, and environmental factors with medical expertise. Specific methods process and analyse each modality. To fuse the different subsystem results, a DF technique based on Fuzzy Logic (FL), with a rule collection driven by clinical rules, is utilised. By identifying multiple distress scenarios, this multimodal fusion improves the overall system's dependability. In reality, this fusion technique accounts for temporary sensor failure and improves model reliability as well as robustness in the face of environmental disruptions or material limitations (battery, Radio Frequency (RF) range). FL fusion approaches give the telemonitoring platform much flexibility, particularly when merging modalities or introducing additional sensors. The suggested telemonitoring system would provide elderly individuals with continuous in-home health monitoring.
In the context of the Narrow Band-IoT (NB-IoT), [29] proposed a multi-source heterogeneous DF method based on perceptual semantics. They began by examining the effects and core technologies of NB-IoT, including the fundamental methods of the physical and media access control layers.
Furthermore, to reduce data redundancy and increase network lifespan, they investigated the centralized and distributed modes of the NB-IoT network and suggested a multi-source heterogeneous DF method based on semantic perception to create a consistent system. Ultimately, to carry out DF and obtain the final fusion outcome, an enhanced D-S evidence theory is utilized. According to the study, the suggested method has a faster rate of convergence, is more stable, and is much better at judging fusion results in real-world situations.
Based on the Dempster-Shafer (D-S) theory and an Adaptive Weighted Fusion Method (AWFM), [30] introduced a Data Fusion-IoT (DF-IoT) approach. The DF stage analyses the dependability of every device in the network as well as any conflicts among devices, while taking into account the information lifespan, the distance between sensors and entities, and the need for low computation. To describe uncertain information or measure the resemblance between two bodies of evidence, the suggested technique employs principles founded on the Basic Probability Assignment (BPA). A detailed study using benchmark data simulation and real datasets from a smart-building test bed evaluates the effectiveness of the suggested technique in contrast to D-S and Murphy's method. Regarding reliability, accuracy, and conflict management, DF-IoT performs better than the above approaches: its accuracy reached up to 99.18% on artificial datasets and 98.87% on real datasets, with a conflict measure of 0.58%.
Amrouche et al. [31] proposed an application-independent approach to the segmentation of task executions in a semi-manual industrial assembly setup by exploiting the expressive features of the distribution-based gaze feature Nearest Neighbor Index (NNI) to build a Dynamic Activity Segmentation Algorithm (DASA). The proposed approach is enriched with an ML model acting as a feedback loop to classify segment qualities. The approach is evaluated in an alpine ski assembly scenario with real-world data, reaching an overall 91% detection accuracy.
To improve classification accuracy, [32] presented the Recursive Feature Elimination with Cross-Validation (RFECV) approach for Type-II diabetes prediction. Dealing with overfitting issues and improving accuracy without deleting unnecessary records is the primary challenge of this approach. To predict a diabetes diagnosis, it used other preprocessing techniques and the following classical ML algorithms: Logistic Regression (LR), Artificial Neural Network (ANN), Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). A personal healthcare system that is adaptable and scalable was presented by [33]. With the help of embedded sensors, the system continuously monitors the health of the wearer. The collected data are sent to a Raspberry Pi, which acts as the processor and processes and analyses the data. The data are stored in the cloud for reasons of scalability and adaptability. Digital health care is one of the most important applications of the IoT, and the healthcare sector is given a new lease on life thanks to the IoT. With the help of IoT, physicians can confidently and swiftly use pertinent patient data to take appropriate action, which is one of the best methods. The quality of medical information and patient care will be greatly enhanced as a result of this development. The IoT provides a concrete platform for connecting all resources and raising standards of living. When a critical condition occurs, the results of the analysis are sent automatically to the doctor.
From the above discussion, some methods show good prediction performance in their experimental results. However, those methods have issues in the DF process, so this research focuses on improving the prediction rate for healthcare data using an improved CDF technique with an efficient FS model to reduce the problems in the fusion process. The details of this method are described in the following section (Tab. 1). A DF method based on DL was addressed by [34] for fusion in the digital era. The application of DL in this scenario is designed to help adaptive fuzzy systems learn the hidden mappings connected with these data. Applications of this framework include smart monitoring and user-friendly graphical analytics, and the authors showed how to retrieve data from IoT devices linked to human subjects and accurately tag them on the objects captured by a digital camera.
The study in [35] examines the use of DF to assess travel time in wireless sensor networks for Intelligent Transportation Systems, as well as its economic benefits and limitations. Specifically, probability-based, AI-based, and evidence-theory-based methods were evaluated for this purpose. By examining both the pros and cons of DF techniques, researchers can learn more about how to predict travel times.
Motivated by [36], a consortium blockchain-based Personal Health Records (PHR) sharing scheme is proposed that is security-aware and privacy-preserving. For the Medical IoT (MIoT)'s PHR symmetric-encryption storage, the Interplanetary File System (IPFS) is used. Zero-knowledge verification can then provide proof for validating blockchain keyword-index authentication. Modified attribute-based cryptographic primitives and custom smart contracts enable secure search, privacy preservation, and personalized access control in MIoT scenarios. The security analysis aims to illustrate that the designed protocols meet the desired design objectives.
This research [37] set out to develop a multi-sensor platform that would include a conductive fabric-based wearable device for monitoring plant growth and a system for assessing local microclimates (i.e., temperature and relative humidity). In a real-world scenario, the platform was tested on a tobacco plant. The findings were optimistic regarding the platform's implications for tracking plant growth and its ability to measure temperature and relative humidity.
Combining Emotional Intelligence (EI) and a sensor network based on IoT devices, [38] provided a human-centred method in the health sector. The IoT device is a Raspberry Pi wired with sensors and a camera. These monitors collect biometric data from the body and EI derived from facial expressions. For reliability, security, and tamper-proof data sharing and storage, the system is hosted on an Ethereum-permissioned blockchain. Physical Unclonable Functions (PUF) are used to verify the identities of connected devices in the network. PUF-based authentication is 330% faster than traditional methods, according to a comparative analysis. A low latency of 20 ms is achievable with this system. The suggested method uses smart contracts to control role-based access and to help build scalable and harmonious digital healthcare platforms.

A. LIMITATION OF RELATED WORKS
• Instead of manual monitoring, ML can be used to analyse ECG data. The process can proceed without human involvement, which contributes to reducing costs. IoT is highly vulnerable to cyberattacks, so security must be enhanced.
• Using smart devices and a microphone increases scalability even more. The standard IoT method doesn't involve handling much data.
• ML-based signal monitoring involves reducing errors and improving accuracy. Increased execution time is required when more features are extracted from the data.
• It is necessary to improve the adaptability as well as the scalability.
• For this method to effectively process data in the real-time system, an unsupervised model needs to be developed.
• It is required that wearable devices have their batteries replaced regularly. It is necessary to reduce the overall size of the data.

III. PROPOSED METHOD
Using IoT devices, healthcare systems may gather patient data over a long period of time. This work focused on developing an ICDF and an efficient FS algorithm for improving the classification process for predicting healthcare data.
ML algorithms such as dimensionality reduction via feature engineering, dimensionality expansion via clustering, dimensionality reduction via association rule learning, and DL are discussed here. The term ''context awareness'' refers to a system's or component's ability to take in data about its immediate surroundings and adjust its actions accordingly. Automatic data collection and analysis to direct actions are at the heart of what is known as ''contextual'' or ''context-aware'' computing. We suggest a guideline for incorporating context into ML models [39]. A conditional probability distribution is divided into two parts: context-free and context-sensitive. Put another way, context-aware systems are at least partially autonomous because they can adjust their offerings to meet the user's requirements without any intervention on the user's part. Context-aware systems must first make context information accessible to the computer. EML methods increase complexity, which reduces the model's interpretability and makes it challenging to draw important business insights. A lengthy period of time is required for the computation and design of the EMLM [40].
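As an illustration of the context-free/context-sensitive split mentioned above, the following minimal Python sketch factors a posterior into a context-free term and a context likelihood, in a naive-Bayes-style decomposition. The labels, context, and probability values are hypothetical and are not taken from the paper's model.

```python
# Hedged sketch: factoring a conditional distribution into a
# context-free term P(y | x) and a context-sensitive term P(c | y).
# All names and numbers are illustrative assumptions.

def context_aware_posterior(p_y_given_x, p_c_given_y):
    """Combine a context-free posterior with a context likelihood.

    p_y_given_x: dict label -> P(label | features)   (context-free part)
    p_c_given_y: dict label -> P(context | label)    (context-sensitive part)
    Returns the renormalized posterior P(label | features, context).
    """
    unnormalized = {y: p_y_given_x[y] * p_c_given_y[y] for y in p_y_given_x}
    z = sum(unnormalized.values())
    return {y: v / z for y, v in unnormalized.items()}

# Toy example: a "resting" context makes the 'normal' label more likely.
posterior = context_aware_posterior(
    p_y_given_x={"normal": 0.6, "abnormal": 0.4},
    p_c_given_y={"normal": 0.9, "abnormal": 0.3},
)
print(posterior)  # the 'normal' probability rises above 0.6
```

The design point is that the context-free classifier can be trained once, while the context-sensitive factor can be swapped or updated as the environment changes.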
In this article, the authors mainly focus on ML algorithms that are commonly used in IoT-related domains like sensor networks and context-aware systems. In the following sections, we investigate Supervised Learning (SL), Unsupervised Learning (UL), and Reinforcement Learning (RL) techniques. We only discuss IoT-related and context-aware ML algorithms and techniques here because ML has such a broad range of applications [41]. The classifications of IoT-specific and general ML algorithms are depicted in Fig. 3. The IoT-DF middleware is divided into four submodules: data receiving and computation, knowledge inference, user-related acquisition and adding new features, and service decomposition and performance [42]. The architecture of the system is depicted in Fig. 4. Using the smart library's IoT and other sensing devices, the DF middleware's receiving and processing module can collect and process data in real time. In order to convert the raw data from the bottom layer into meaningful events, the knowledge reasoning module must first apply the rules to the data. Whenever an event is sent from the knowledge reasoning module, it is uniformly processed by the event description module into a simple form [43], [44]. In order to get the lower-level hardware devices to carry out the demands from the top layers, the event decomposition module is used to translate those requirements into commands the system can accept [45]. Fig. 5 depicts the system's functional decomposition diagram, which represents the proposed methodology's flow process step-by-step.
• Data Fusion: The data from IoT devices will be gathered and preprocessed to clarify the fusion processing.
In this work, a Dual Filtering Method (DFM) for data preprocessing is introduced, which attempts to label the unlabelled attributes in the data gathered so that DF can be done accurately.
• Improved Dynamic Bayesian Network: IDBN is a good trade-off for tractability and becoming a tool for ICDF operations. Here, the inference problem is handled using the Hidden Markov Model (HMM) in the Deep Belief Network (DBN) model.
• Improved Principal Component Analysis: IPCA is used to pull out features and reduce the number of dimensions.
• Feature Selection Process: The FS process uses the ERFE method to eliminate irrelevant data in a dataset.
• Ensemble-Based Machine Learning Model: These data are learned using the EMLM for performance checking. The ENN, MXGB, and LR are combined to make an ensemble model for predicting healthcare data. A detailed explanation of the proposed method is elaborated in the next part. Fig. 6 shows the overall process of the suggested methodology.
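As a rough illustration of how the five steps above chain together, the following sketch uses standard scikit-learn components as stand-ins: plain PCA for IPCA, plain RFE for ERFE, and a soft-voting ensemble of an MLP, gradient boosting, and logistic regression for the ENN/MXGB/LR ensemble. This is not the authors' implementation, and the dataset is synthetic.

```python
# Hedged pipeline sketch with scikit-learn stand-ins for the proposed
# components; component choices and hyperparameters are illustrative.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in for fused patient-sensor data (20 features, 2 classes).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("enn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                              random_state=0)),
        ("xgb", GradientBoostingClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)

pipeline = Pipeline([
    ("extract", PCA(n_components=12)),                  # feature extraction
    ("select", RFE(LogisticRegression(max_iter=1000),   # feature selection
                   n_features_to_select=8)),
    ("classify", ensemble),                             # ensemble classifier
])

pipeline.fit(X_tr, y_tr)
print(f"hold-out accuracy: {pipeline.score(X_te, y_te):.2f}")
```

Soft voting averages the three models' predicted probabilities, which is one common way to realize the "single prediction model from many learners" idea described in the text.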

A. PREPROCESSING
Data collection from physical devices is the initial stage in context acquisition. Next, to eliminate noise and other measurement outliers, measurement preprocessing is performed using filtering and estimation methods. To correct incorrectly labelled data, classification models that operate on clean data sets, which can be single classifiers or ensemble models, are used after the filtering process. 'Cleaned data sets' refers to the data elements that remain after cleaning or filtering. The preliminary filtering performed in this work removes noise from data instances before the label filters are applied. In this research work, a DFM is introduced to process the data. The Kalman Filter (KF) is a statistical method to estimate a system's state, while the Particle Filter (PF) is a stochastic, sampling-based way to estimate a system's moments.

B. KALMAN FILTER
The KF is a widely utilized Statistical State Estimation Technique (SSET) for fusing dynamic signal-level information. A system's state estimates are calculated using a recursively implemented prediction-and-update method, which assumes that the present state of a system depends on the state at the preceding time interval. One of its many benefits is its computational efficiency. For example, KF has frequently been used to combine accelerometer and gyroscope information to provide good estimates [46]; KF could be used to identify postural sway during quiet standing (standing in one place while doing no other activity, without leaning on anything).
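A minimal sketch of the recursive predict/update cycle described above, for a scalar (1-D) state; the constant-state model and the noise variances are illustrative assumptions, not tied to any particular sensor.

```python
# Minimal 1-D Kalman filter sketch (assumed constant-state model) showing
# the recursive predict/update cycle; variances are illustrative.
def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Estimate a scalar state from a sequence of noisy measurements."""
    x, p = measurements[0], 1.0   # initial state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + process_var       # predict: uncertainty grows over time
        k = p / (p + meas_var)    # update: Kalman gain
        x = x + k * (z - x)       # correct the estimate with the innovation
        p = (1.0 - k) * p         # shrink the posterior variance
        estimates.append(x)
    return estimates

# Noisy readings around a true value of 5.0 (e.g. a tilt angle in degrees).
readings = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]
print(f"final estimate: {kalman_1d(readings)[-1]:.2f}")
```

The gain `k` falls as the posterior variance shrinks, so later measurements move the estimate less, which is exactly the "present state depends on the preceding state" recursion.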

C. PARTICLE FILTERING (PF)
Whenever target probability density moments cannot be calculated analytically, PF offers a stochastic method to approximate them. Its idea is to create random samples, named particles, from a readily sampled ''importance'' distribution. Next, each particle is assigned a weight that compensates for the difference between the target and the importance distribution.
PFs are frequently employed in Bayesian contexts to determine the posterior density mean. They have the advantage of estimating the whole target distribution without making any assumptions, making them especially helpful for nonlinear and non-Gaussian systems. PF may be used to assess biomechanical conditions based on accelerometer and gyroscope information.
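The importance-sampling idea above can be sketched in a few lines: particles drawn from a broad importance distribution are reweighted by the observation likelihood, and the posterior mean is the weight-normalized average. The uniform importance range, Gaussian likelihood, and bandwidth are illustrative assumptions, not the paper's configuration:

```python
import math
import random

# Minimal particle-filter sketch: approximate the posterior mean of a
# scalar state from one noisy observation.

def pf_posterior_mean(observation, n_particles=5000, sigma=0.2, seed=1):
    rng = random.Random(seed)
    # Importance distribution: uniform over a plausible range of the state.
    particles = [rng.uniform(-2.0, 2.0) for _ in range(n_particles)]
    # Weight each particle by the observation likelihood; a non-Gaussian
    # system would simply swap in a different likelihood here.
    weights = [math.exp(-0.5 * ((observation - p) / sigma) ** 2)
               for p in particles]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, particles)) / total

mean = pf_posterior_mean(0.5)
```

Because no closed-form posterior is ever required, the same loop works unchanged for nonlinear, non-Gaussian models.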
These two filtering steps make a significant contribution to producing noise-free, correctly labelled clean data for further analysis. Assume each data instance x_j carries a multi-label set l_j = {l_jk}, k = 1, ..., R. When dealing with Binary Classification (BC), let C(+) represent the positive labels and C(−) the negative labels in the set l_i. Equation (1) and Equation (2) give the positive and negative label probabilities pb(+) and pb(−) in the set l_i.
A Laplace correction is applied. The margin between classes |pr(+)−pr(−)| is small if C(+) is very close to C(−). This can happen in two ways when items are mislabelled; the second case occurs when labelling complex instances. As a result, if |pr(+)−pr(−)| is small for an instance x_i, inference algorithms may not be able to integrate the instance, and it must be filtered. Algorithm 1, listed below, is used in the proposed work. Steps 1 to 7 use |pr(+)−pr(−)| to perform the preliminary filtering, steps 8 and 9 contain the second level of filtering, steps 9 to 13 return the corrected labels, and steps 14 to 16 return the noiseless data.
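The margin-based preliminary filter can be sketched as follows. This is our own illustrative reconstruction, not the paper's Algorithm 1: for each instance, the positive/negative label probabilities are estimated over its label votes with a Laplace correction, and the instance is filtered out when the margin falls below a threshold (the threshold value and toy data are assumptions):

```python
# Margin-based preliminary filtering of ambiguously labelled instances.

def label_margin(labels):
    """labels: list of +1/-1 label votes for one instance."""
    pos = sum(1 for l in labels if l > 0)
    neg = len(labels) - pos
    # Laplace correction: add one pseudo-count to each class.
    pb_pos = (pos + 1) / (len(labels) + 2)
    pb_neg = (neg + 1) / (len(labels) + 2)
    return abs(pb_pos - pb_neg)

def filter_instances(dataset, threshold=0.3):
    """Keep only instances whose label margin reaches the threshold."""
    return [x for x in dataset if label_margin(x["labels"]) >= threshold]

data = [
    {"id": 0, "labels": [1, 1, 1, 1, -1]},   # clear positive: kept
    {"id": 1, "labels": [1, -1, 1, -1]},     # ambiguous: filtered out
    {"id": 2, "labels": [-1, -1, -1, -1]},   # clear negative: kept
]
clean = filter_instances(data)
```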

D. IMPROVED CONTEXT-AWARE DATA FUSION (ICDF)
The DBN is a good trade-off for tractability in this work, and it becomes a tool for DF operations. DBNs are used in this study to uncover the effects of context variables on environments without being bound by fixed probability distributions. They use HMMs to discover the observable symptoms of an instance by dividing data into time slices that represent the states of the instance. The hidden variable V_t of the DBN is primarily used to infer the states of a known feature of interest. Sensory readings and their contexts are used to update the system. S_t = (S_t^1, ..., S_t^n) is the set of sensor readings active in time slice t, and the context set is represented by Cn_t = (Cn_t^1, ..., Cn_t^n), based on the application's environment. The size of the Conditional Probability Table (CPT) and the learning effort in the training phase are controlled by limiting the number of context variables.
Sensor and state-transition models must be defined in the DBN. Pb(S_t | V_t), the sensor model, represents how the sensor information is affected by the system's current state, whereas Pb(V_t | V_{t−1}, Cn_t) represents the likelihood that a state variable takes a specific value given its previous value and the current context. The utilised DBN is a first-order Markov model, so the system state v_t within time slice t can be defined accordingly. For the practical formulation of the belief Bl, a Bayes-filter-like procedure is applied using the Bayes rule, and Equation (4) can be stated, where a normalizing constant is used. Under the Markov assumption, the sensor nodes in S_t do not depend on the Cn_t context variables given the state variable V_t; assuming the sensor measurements are mutually independent given their parent node V_t, Pb(s_t | v_t) can be computed using Equation (5):

Pb(s_t | v_t) = Π_{i=1}^{n} Pb(s_t^i | v_t)

where s_t^i is the specific value of the i-th sensor in time slice t (Equation (6)).
where α is the normalizing constant. Cn_t can be sensibly neglected from the last term, as V_{t−1} does not depend on the following context Cn_t when the next state V_t is not considered. Therefore, using the Markov model stated in Equation (7), the belief is described with the recursive Equation (8):

Bl(v_t) = η_t Pb(s_t | v_t) Σ_{v_{t−1}} Pb(v_t | v_{t−1}, cn_t) Bl(v_{t−1})

where α is integrated into the normalization constant η_t. Inference is executed by storing the DBN in two time slices, so that updating the network's beliefs depends on neither time nor space growing with the length of the sequence. The computational complexity of an update is O(n + m), where n is the number of sensors and m is the number of possible values of V_t, and the overall complexity of computing Bl(v_t) for all V_t is O(m² + m·n). However, the inference problem in DBNs is identical to that in BNs, in which the desired quantity is the posterior marginal distribution of a collection of hidden variables given an ordered sequence of observations (up to the date of the belief), with the observed and hidden variables specified correspondingly. Time-series inference is classified as filtering (τ = 1), smoothing (τ > 1), or forecasting (τ < 1) based on the time frame of observation used in the calculations.
Building a large static BN for required time section numbers and then using standard methods of inference for static BNs provides a direct method to imply probabilities in a DBN. Nevertheless, this necessitates knowing the end of a time interval in advance.
Furthermore, the data-processing complexity of this technique might necessitate significant time (and memory). As a result, DBN inference is often performed via recursive operators that revise the DBN's belief state as new observations become accessible. A message-passing mechanism for static BNs is identical in principle. So, in this work, the Hidden Markov Model is used to mitigate this problem. The aim is to use messages specified on the Markov blanket of the variables to d-separate the past from the future, and to use a forward-backward technique to distribute the entire evidence along the DBN, as previously discussed.
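The recursive belief update described above can be sketched with a toy two-state model. The states, context values, and all CPT numbers below are illustrative assumptions; the structure follows the recursion Bl(v_t) ∝ Pb(s_t | v_t) · Σ_{v_{t−1}} Pb(v_t | v_{t−1}, cn_t) · Bl(v_{t−1}):

```python
# Context-aware recursive belief update over a two-state DBN.

STATES = ["resting", "active"]

# Sensor model Pb(s | v): probability of a "high" accelerometer reading.
SENSOR = {"resting": 0.1, "active": 0.8}

# Transition model Pb(v_t | v_{t-1}, cn): the context ("day"/"night")
# shifts how likely the subject is to become active.
TRANS = {
    ("resting", "day"):   {"resting": 0.6, "active": 0.4},
    ("resting", "night"): {"resting": 0.9, "active": 0.1},
    ("active", "day"):    {"resting": 0.3, "active": 0.7},
    ("active", "night"):  {"resting": 0.7, "active": 0.3},
}

def belief_update(belief, reading_high, context):
    new = {}
    for v in STATES:
        # Prior: sum over previous states, weighted by the transition CPT.
        prior = sum(TRANS[(v_prev, context)][v] * belief[v_prev]
                    for v_prev in STATES)
        likelihood = SENSOR[v] if reading_high else 1 - SENSOR[v]
        new[v] = likelihood * prior
    eta = sum(new.values())            # normalizing constant
    return {v: p / eta for v, p in new.items()}

bl = {"resting": 0.5, "active": 0.5}
bl = belief_update(bl, reading_high=True, context="day")
```

Only the previous belief and the two-slice CPTs are needed, which is why the update cost stays constant regardless of sequence length.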

E. HIDDEN MARKOV MODEL (HMM)
Each HMM consists of a finite number of states (N), each with its own probability distribution. Changes between states are controlled by a set of probabilities called ''transition probabilities.'' The following processes must be completed to build a word-recognition model based on HMMs: (a) choose the states and observations, (b) choose an HMM topology, (c) choose training and testing samples, (d) train the system using the training data, and (e) test it using the testing data. Fig. 7 depicts an example of a 7-state HMM that only allows transitions to the same state, the next state, and the subsequent state [47]. During training and testing, the order in which the model's states change is set by the FE described in the previous section.
1) TRANSITION PROBABILITIES (A) MATRIX
Here, a_mn represents the probability of transitioning to state S_n given the current state S_m (Equation (9)). It is computed as the ratio of the expected number of transitions from state S_m to state S_n to the expected number of transitions out of state S_m (Equations (10) and (11)).

2) EMISSION PROBABILITIES (B) MATRIX
Here, b_n(p) is the likelihood of the present observation O_p given the present state S_n (Equation (12)). It is computed as the ratio of the expected number of times O_p is observed in state S_n to the expected number of times the model is in state S_n (Equations (13) and (14)).
Once the parameters converge to particular values, they describe a model that best matches the training observation sequences. Because the streams (DBN/HMM) are synchronised at each time slice over all observation sequences, the proposed ICDF model should work better than the current DF process.
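The roles of the A and B matrices can be seen in a minimal forward pass, which computes the likelihood of an observation sequence under the model. The matrices below are toy numbers, not the paper's trained parameters:

```python
# HMM forward algorithm over a two-state model with two observation symbols.

A = [[0.7, 0.3],    # a_mn: P(next state = n | current state = m)
     [0.4, 0.6]]
B = [[0.9, 0.1],    # b_n(p): P(observation symbol p | state n)
     [0.2, 0.8]]
PI = [0.5, 0.5]     # initial state distribution

def forward_likelihood(obs):
    # Initialize with the first observation's emission probabilities.
    alpha = [PI[s] * B[s][obs[0]] for s in range(2)]
    # Propagate: weight each state by transitions in, then by the emission.
    for o in obs[1:]:
        alpha = [B[s][o] * sum(alpha[m] * A[m][s] for m in range(2))
                 for s in range(2)]
    return sum(alpha)

lik = forward_likelihood([0, 0, 1])
```

Training (e.g. Baum-Welch) iterates exactly the expected-count ratios of Equations (10)-(11) and (13)-(14) until these parameters converge.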

F. FEATURE EXTRACTION
To conduct self-optimization, FE chooses important context information about the environment and the internal state of the system. These features may be deduced from the DF module's output and through a meta-analysis of the system's internal activity. The objective of FS is to drive the continuously self-configuring component's behaviour in response to context changes. For dynamic circumstances, static criteria such as the frequency of reconfiguration (or) even the system's overall goals are unsuitable. This level is split into two halves. The first performs measurements in the time (or) frequency domains; the signal (or) preliminary information is required to determine the features used to make these measurements. The second gathers parameters from every measurement and combines them with data from the categorization problem. Fig. 8 depicts the FE procedure utilizing the suggested method.

G. IMPROVED PRINCIPAL COMPONENT ANALYSIS (IPCA)
PCA is primarily utilized for exploratory data analysis and the creation of prediction models. It is frequently used to show genetic distance and relatedness among populations. PCA is usually performed after normalising the raw data, using eigenvalue decomposition of the data covariance (or) correlation matrix, or singular value decomposition of the data matrix. Normalizing each attribute entails subtracting the variable's measured mean from all its data values to arrive at a zero empirical mean (average); optionally, each variable's variance is normalized to 1, yielding z-scores. The outcomes of a PCA are commonly described using component scores, otherwise called factor scores (the transformed variable values relating to a specific data point), and loadings (the weight by which every original standardized variable has to be multiplied to obtain a component score). When component scores are normalized to unit variance, the data variance must be included in the loadings (i.e., the eigenvalue magnitudes). When component scores are not standardized (and so include the data variance), the loadings should be unit-scaled (''normalized''). These weights are termed eigenvectors; they are the cosines of the orthogonal rotation of the variables onto their principal components (or) back [48].
Its function may be regarded as disclosing the internal structure of the data in a way that best illustrates the data's variation. Whenever a multivariate database is represented as a collection of coordinates in a high-dimensional data space, PCA can provide the viewer with a lower-dimensional image or projection of the object from its most relevant viewpoint. It accomplishes this by reducing the dimension of the transformed data, utilizing only the first few principal components. Nevertheless, whenever the data set contains outliers, PCA suffers. In a setting with a massive amount of data, separating outliers is difficult. This problem is solved by constructing the Adaptive Gaussian Kernel Matrix (AGKM).
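The centre/decompose/project pipeline of standard PCA (not the paper's IPCA variant) can be sketched on a tiny 2-D data set; for a 2x2 covariance matrix the leading eigenpair has a closed form, so no linear-algebra library is needed. The data values are illustrative:

```python
import math

# Standard PCA on a toy 2-D data set: centre, form the covariance matrix,
# take the leading eigenvector as the first principal component, project.

data = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
        (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1),
        (1.5, 1.6), (1.1, 0.9)]

n = len(data)
mx = sum(x for x, _ in data) / n
my = sum(y for _, y in data) / n
centered = [(x - mx, y - my) for x, y in data]

# Sample covariance matrix entries.
cxx = sum(x * x for x, _ in centered) / (n - 1)
cyy = sum(y * y for _, y in centered) / (n - 1)
cxy = sum(x * y for x, y in centered) / (n - 1)

# Leading eigenvalue/eigenvector of the symmetric 2x2 covariance matrix.
tr, det = cxx + cyy, cxx * cyy - cxy * cxy
lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
vx, vy = cxy, lam - cxx          # unnormalized first principal axis
norm = math.hypot(vx, vy)
pc1 = (vx / norm, vy / norm)

# Component scores: project each point onto the first component (2-D -> 1-D).
scores = [x * pc1[0] + y * pc1[1] for x, y in centered]
```

The variance of the scores equals the leading eigenvalue, which is the "loadings carry the data variance" statement above in concrete form.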

H. THE ADAPTIVE GAUSSIAN KERNEL MATRIX CONSTRUCTION
Assume a collection of s nodes V = {v_i, 1 ≤ i ≤ s}, each of which may interact with the central coordinator v_0 in a distributed setup. A Local Data Matrix (LDM) P_i ∈ R^{n_i×d} with n_i data points in d dimensions (n_i > d) exists on every node v_i. The Global Data Matrix (GDM) P ∈ R^{n×d} is then formed by concatenating the LDMs, that is, P^T = [P_1^T, P_2^T, ..., P_s^T] and n = Σ_{i=1}^{s} n_i. The i-th row of P is denoted by p_i. Assume the data points are centred so that the mean is zero, that is, Σ_i p_i = 0. Uncentred data necessitates a rank-one change to the methods, whose communication and computation expenses are dominated by the costs of the other stages.
Assume a nonlinear transformation φ(x) from the original D-dimensional feature space to an M-dimensional feature space, where generally M ≫ D. Every data point x_i is then projected to a point φ(x_i), and conventional PCA is performed in the new feature space. Be aware, however, that this can be both expensive and inefficient; the calculation is made cleaner by using kernel techniques. The basic notation and Equations (19), (20), and (21) are described below.
The benefit of using kernel techniques is that φ(x_i) never has to be calculated explicitly. The kernel matrix can be generated directly from the training data set x_i, using the polynomial kernel k(x, y) = (x^T y)^d (Equation (19)) or the Gaussian kernel k(x, y) = exp(−‖x − y‖² / 2σ²), where σ is the AGKM parameter.
When k = 1 and the centre is an r-dimensional subspace, PCA is a special case. The top r right singular vectors of P, defined as the principal components, span this optimal r-dimensional subspace, which may be determined via the Singular Value Decomposition (SVD).
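Building the Gaussian kernel matrix itself is straightforward; the sketch below uses a fixed assumed bandwidth σ (the adaptive choice of σ that makes the AGKM "adaptive" is not reproduced here):

```python
import math

# Gaussian (RBF) kernel matrix: K[i][j] = exp(-||x_i - x_j||^2 / (2*sigma^2)).

def gaussian_kernel_matrix(points, sigma=1.0):
    def k(a, b):
        d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))
    return [[k(a, b) for b in points] for a in points]

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
K = gaussian_kernel_matrix(pts)
```

The matrix is symmetric with unit diagonal, and entries decay with distance, which is why distant outliers contribute little and become easier to separate.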

I. FEATURE SELECTION USING ENHANCED RECURSIVE FEATURE ELIMINATION (ERFE)
FS and dimensionality-reduction techniques are linked: both approaches look for fewer input variables for predictive systems. The difference is that FS selects which features to preserve (or) remove from a database, while dimensionality reduction produces new input features by projecting the information. Hence, dimensionality reduction can be considered a form of FS rather than a substitute for it.
Furthermore, a smaller set of features may provide a better understanding of the system to be trained as well as a computation speedup. Other advantages might include cost-effectiveness, such as in biological applications where a smaller subset of attributes must be assessed to diagnose a disease with similar accuracy. The RFE technique uses ML's generalization capabilities, making it ideal for minor sample issues.
Feature Selection Wrapper Technique (FSWT): FSWT removes redundant and weak features that have the most negligible impact on the training error while keeping the most independent and robust features to enhance the model's generalization performance. It employs an iterative feature-ranking process, an example of backward feature removal. This method begins by creating a model based on the complete collection of features and then ranking each feature by its relevance. The model is rebuilt, and the feature importances are recalculated, after removing the least significant feature. To save the feature rating, assume T is a sequence number; T_i contains the top-ranked features on which the model refit and performance are assessed throughout every iterative round of backward feature reduction. The best-performing T_i value is calculated, and the top-performing features are matched to the final model. Algorithm 2 depicts the ERFE procedure for removing unnecessary information. Some missing values in the selected features were substituted with the mean. Following the selection of the above features, testing was performed using the entire dataset. For efficient learning, the features are scaled to unit variance. However, traditional RFE has a limitation: it does not consider the next state of the feature process. Therefore, this work mostly concerns ERFE, a new method that changes the rules for removing features at each state.

1) ADAPTIVE LEARNING FUNCTION-BASED RFE
Here, the AL function φ is introduced for ordering the features according to their importance (a high value means highly vital). The primary difference from the old RFE is that the original RFE has no regard for the future state, whereas ERFE keeps redundant (or) weak features that can be merged with other features. The suggested ERFE therefore improves generalization accuracy, especially when there are few features. In Equation (22), x_ij is the j-th element of the i-th feature vector. Following the calculation of Equation (22) for each of the P features, they may be ranked in order of significance (a high value means more significant). A certain number of features from the bottom of the sorted list is eliminated in the next phase, and this removal of uninformative features is continued until no features remain. To determine the best number of features, a performance measure must be computed for each model trained on a certain number of features. The AL function φ combines the weights, which are specified as follows:

rank(r_i) = |{r_l : r_l ≥ r_i}|

where W_i is the RFE weight generated by Equation (23), and r_i is the outcome of the attribute rank for the i-th feature in Equation (24); the rank function transforms the actual ranks provided by the findings into a rank-based weighting. This implies that the feature with the top rank receives a weight of 1, the feature with the 2nd top rank receives a weight of 2, the feature with the 3rd top rank a weight of 3, and so on, up to P. This modification avoids single attributes having a weight significantly greater than the others, which could occur if those attributes had a considerably higher degree (interconnection with other attributes) than the rest.
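The backward-elimination loop itself can be sketched as below. This is a simplified illustration, not the paper's ERFE: the importance score here is plain |correlation with the target| standing in for the adaptive-learning function φ, and the toy data is our own. Each round drops the weakest remaining feature until the desired count is reached:

```python
import math

# Recursive feature elimination with a correlation-based importance score.

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rfe(X, y, n_keep):
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        # Score every remaining feature, then drop the least significant.
        scores = {f: abs(correlation([row[f] for row in X], y))
                  for f in features}
        features.remove(min(features, key=scores.get))
    return sorted(features)

# Toy data: the target depends on features 0 and 2; feature 1 is noise.
X = [[1, 0.3, 2], [2, -0.1, 1], [3, 0.2, 3], [4, -0.3, 2], [5, 0.1, 4]]
y = [row[0] * 2.0 + row[2] for row in X]
kept = rfe(X, y, n_keep=2)
```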

J. ENSEMBLE-BASED MACHINE LEARNING MODEL (EMLM)
EML methods are meta-algorithms that integrate many ML approaches into a single prediction model. Compared with a single model, this technique provides higher predictive performance. As a result, EML techniques have achieved first place in many significant ML competitions and have had much success in breaking records on challenging datasets. Three ML-based classification algorithms are employed to classify data in this study.
Here, the Enhanced Neural Network (ENN), Modified Extreme Gradient Boost Classifier (MXGB), and LR model are combined to make an EML model. Finally, the classification result is output based on the voting method.
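The voting combination step can be sketched as follows; the three base learners are stubbed out with fixed toy predictions (in the paper they would be the trained ENN, MXGB, and LR models):

```python
from collections import Counter

# Majority-vote fusion of per-model predictions.

def majority_vote(predictions_per_model):
    """predictions_per_model: list of per-model label lists, one per sample."""
    n_samples = len(predictions_per_model[0])
    fused = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Stub predictions standing in for the three trained base learners.
enn_preds  = [1, 0, 1, 1]
mxgb_preds = [1, 0, 0, 1]
lr_preds   = [0, 0, 1, 1]
ensemble = majority_vote([enn_preds, mxgb_preds, lr_preds])
```

With an odd number of diverse base learners, a binary vote never ties, and a single model's error is outvoted whenever the other two agree.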

IV. ENHANCED DEEP NEURAL NETWORK (EDNN)
DL is a successful approach for producing highly accurate predictions from complicated data sets. In this research work, an improved NN model is designed. Combining a fuzzy inference system with a DNN is the basis for this work; a Fuzzy Neural Network (FNN) with several hidden layers underlies this model. An FNN is a learning method that incorporates Neural Network (NN) methods to apply the attributes of FL systems [49]. In this approach, a DNN with n hidden layers is employed, and the value of n may be fixed during training. Increasing the number of hidden layers in the network improves the accuracy of the outcome, but it also increases the system's complexity and decreases training performance. As a result, choosing the number of hidden layers is also part of the training procedure. Based on the objectives of an analyst, a DL framework may be designed to handle raw data such as images or 1-D signals. The weights may be learnt iteratively in an unsupervised way without the requirement to label the data. After these networks have been trained, every neuron's activity may be evaluated for its response to positive ('+ve') and negative ('−ve') stimuli. A specific input may contain multiple labelled features. This validation process can also be used to fine-tune the neurons in the network using backpropagation.
To summarize the goal of utilising DNN, it should be noted that choosing the parameters of the provided data is a big challenge, and since many of these parameters are chosen based on user experiences or trial-and-error, an overfitting problem is introduced in the process. A computational method can be used to automate this procedure while minimising errors. As a result of this study, a Fuzzy Inference Method (FIM) for reducing this issue was developed. The neurons that respond can be fed into a FIM to help figure out which features are good.

• Integration with FIM
Complex non-linear situations can be modelled using if-then statements employing FIM algorithms. The benefit of structured rule-based algorithms is that they may be influenced by subjective data. This allows an analyst to give expert knowledge to the system, perhaps enhancing the categorization findings (or) altering the system's behaviour [50]. This feedback bias may also help autonomous systems learn faster while preserving stability. The FIM further analyses the features derived from the data using DL, causing the system to replicate human reasoning while also providing a way of biasing the system with feedback from an analyst. Both the feature-vector inputs and the correctly labelled output are needed to train the system. The extension concept is used to expand DL models to include FL. Fuzzy aggregation, which accounts for the granularity of the data collection and serves as a way to manage incomplete data from every modality, is enabled by the inclusion of FL. These processes also allow for a subjective assessment of every modality's value. Linguistic statements given by the user are converted into symbolic notations that may be used to initialize the NN. These may be seen in metrics such as the number of neurons per layer and the overall number of layers in the NN model. The fuzzy inference block is then driven by the NN. Fuzzy rules, for example, can be modified to integrate specialized knowledge about the input data. This system has the advantage of using fuzzy rules to describe the behaviour of a fuzzy system. The FIM block's input pattern is analysed by the learning-algorithm block. Adjusted system weights are backpropagated through the system, allowing it to adapt over time by automatically altering the behaviour of the predictors.
Assume the rule base comprises three Takagi-Sugeno-style fuzzy If-Then rules.
The following expressions define the fuzzy-based NN mathematical operations. Layer I: an adaptive node with a node function is included in Equations (25), (26), and (27).
where µ_Ai(x), µ_Bi(y), and µ_Ci(z) are any acceptable parameterized membership functions (MFs), and O_{1,i} is the membership grade of a fuzzy set A = (A_1, A_2, B_1, B_2 (or) C_1, C_2), which shows the degree to which the supplied input (x, y, z) satisfies the quantifier. The parameters of this layer are named ''premise parameters''. Furthermore, any suitably parameterized MF, such as the generalized ''bell function'' of Equation (28), can be used as the membership function:

µ(x) = 1 / (1 + |(x − c_i)/a_i|^{2b_i})

where a_i, b_i, c_i are the parameter set. The bell function changes shape as the parameter values vary, yielding the different forms of MF for the fuzzy sets. This type of membership function is also employed in the latest research.
Layer II: each node in this layer is a fixed node whose outcome is the product of all incoming signals (Equation (29)).
Every output node indicates a rule's ''firing strength''. Layer III: each node is a fixed node labelled N that performs the normalization function of Equation (30); the results are generally known as ''normalized firing strengths''.
Layer IV: adaptive nodes are included, as in Equation (31). Because each node in this layer multiplies the 3rd layer's normalized firing strength by the outcome of the DNN, its parameters are termed ''consequent parameters.'' Layer V: it contains a single fixed node designated S with a summing function that calculates the DNN network's total output as the sum of all incoming signals (Equation (32)).
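One forward pass through these layers can be sketched for two rules over a single input. All premise and consequent parameters are toy values, and the consequents are simple linear functions standing in for the DNN outputs:

```python
# Forward pass through Takagi-Sugeno fuzzy layers: membership (Layer I),
# firing strength (Layer II), normalization (Layer III), weighted
# consequents (Layer IV), and the final sum (Layer V).

def bell_mf(x, a, b, c):
    """Generalized bell membership function with parameters (a, b, c)."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def ts_forward(x):
    # Layer I: membership grades (premise parameters are toy values).
    w1 = bell_mf(x, a=1.0, b=2.0, c=0.0)   # rule 1: "x is low"
    w2 = bell_mf(x, a=1.0, b=2.0, c=2.0)   # rule 2: "x is high"
    # Layers II-III: firing strengths and their normalization.
    wn1, wn2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer IV: rule consequents f_i = p_i*x + r_i (toy coefficients in
    # place of the DNN outcome).
    f1 = 0.5 * x + 1.0
    f2 = 2.0 * x - 1.0
    # Layer V: weighted sum of the consequent signals.
    return wn1 * f1 + wn2 * f2

out = ts_forward(1.0)
```

At x = 1.0 both rules fire equally, so the output is the plain average of the two consequents; away from that point the bell functions shift the blend toward one rule.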
Finally, the enhanced DNN model provides the best results.

V. MODIFIED EXTREME GRADIENT BOOST (MXGB) CLASSIFIER
MXGB is an ML model for classification and regression problems based on the Gradient Boosting Decision Tree (GBDT) [51]. The inner nodes of each regression tree contain attribute test values, whereas the leaf nodes carry scores that indicate a judgment. Fitting the model requires both the 1st-order and 2nd-order derivatives of the prediction loss function, because XGBoost uses an AL method with a 2nd-order approximation.
To provide a clear process, this work first develops the 2nd-order approximation of additive tree boosting. The volume of data is referred to as m, while the number of features is n. The 'raw prediction' before the sigmoid function is represented as z_i, and the probabilistic prediction is ŷ_i = σ(z_i), in which σ represents the sigmoid function. It is crucial to remember that the notations differ: the y_i of their analysis is indicated here as z. The true label is denoted by y_i, while the parameters for the two loss functions are denoted by (α, γ) correspondingly. The expressions for the gradients and Hessians are recorded in a merged form independent of the value of y_i, as this simplifies programme implementation and facilitates vectorization in other programs.
In practice, the AL goal is Equation (33), where t denotes the t-th iteration of the training process; in the equation, notice that the notations have been replaced. When Equation (34) is expanded with the 2nd-order Taylor expansion, the following results are found; the last line holds because the l(y_i, z_i^{(t−1)}) term may be omitted from the learning goal, as it does not affect the model fitting in the t-th iteration (Equation (35)).
Hand-derived derivatives are necessary because XGBoost does not enable automatic differentiation; furthermore, the obtained expressions can be reused in various ML applications. The activation for both loss functions is the sigmoid, and the following fundamental properties of the sigmoid are applied consistently in the derivatives (Equations (36), (37), and (38)):

σ(−z) = 1 − σ(z),   σ′(z) = σ(z)(1 − σ(z))

A. REGULARIZATION-BASED ADAPTIVE FACTOR (RA)
Next, methods are utilized to reduce overfitting in addition to the regularised objective. After every stage of tree boosting, the regularization-based AL function scales the new weights by a factor RA w_j*. This adaptive scaling reduces the effect of each tree and leaves room for future trees to improve the model, similar to how a learning rate is used in optimization. The optimum weight RA w_j* of leaf j for a fixed structure q(x) is calculated by Equation (39), where ∈ is an estimation factor, implying that there are approximately 1/∈ chosen candidate points. Every data point is weighted by h_i; hence, Equation (34) may be rewritten as Equation (40). Define I_j = {i | q(x_i) = j} as the set of instances in leaf j. Then, by expanding the objective, it may be rewritten as Equations (41) and (42). With labels g_i/h_i and weights h_i, this is precisely a weighted loss. Finding candidate splits that meet the requirements is difficult in large datasets.
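The second-order quantities above can be sketched for the logistic loss. This is an illustrative reconstruction of the standard XGBoost-style leaf-weight formula w* = −Σg / (Σh + λ), not the paper's RA factor; λ and the toy leaf contents are assumptions:

```python
import math

# Gradients/Hessians of the logistic loss and the optimal leaf weight.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def grad_hess(y, z):
    # g_i = sigma(z) - y and h_i = sigma(z) * (1 - sigma(z)), using the
    # sigmoid derivative property sigma'(z) = sigma(z)(1 - sigma(z)).
    p = sigmoid(z)
    return p - y, p * (1.0 - p)

def optimal_leaf_weight(ys, zs, lam=1.0):
    """w* = -sum(g) / (sum(h) + lambda) over the instances in one leaf."""
    gs, hs = zip(*(grad_hess(y, z) for y, z in zip(ys, zs)))
    return -sum(gs) / (sum(hs) + lam)

# A leaf holding three positives and one negative, all at raw score 0:
w_star = optimal_leaf_weight([1, 1, 1, 0], [0.0, 0.0, 0.0, 0.0])
```

The weight is positive (the leaf pushes its instances toward the positive class), and a larger λ shrinks it, which is the regularizing effect described above.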

VI. LOGISTIC REGRESSION (LR)
The LR model is a popular ML algorithm frequently used in real-world applications such as data mining [52]. In this study, for example, the risk factors for the provided data were examined, and the likelihood of occurrence was predicted depending on the risk factors. The most common use of logistic regression is for classification, mainly two-category problems (i.e., there are only two types of outcome, each representing one category), which may determine the probability of each classification event occurring. The LR model is represented by Equation (43):

P(Y = 1) = 1 / (1 + e^{−Z})

where Y is a binary dependent variable (Y = 1 if an event happens; Y = 0 otherwise), e is the base of the natural logarithms, and Z = β_0 + β_1X_1 + β_2X_2 + ... + β_pX_p, with constant β_0, coefficients β_j, and predictors X_j for p predictors (j = 1, 2, 3, ..., p).
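Equation (43) is directly computable; the sketch below uses toy coefficients (not fitted values from the paper's data):

```python
import math

# LR prediction: P(Y=1) = 1 / (1 + e^(-Z)) with Z = beta_0 + sum(beta_j * x_j).

def predict_proba(x, beta0, betas):
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy coefficients: intercept -1.0, one risk-increasing and one protective
# predictor.
p = predict_proba([2.0, 1.0], beta0=-1.0, betas=[0.8, -0.4])
label = 1 if p >= 0.5 else 0
```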

VII. RESULT AND DISCUSSION
This section discusses the proposed authentication systems for healthcare applications. The proposed method was developed based on patient data and implemented using MATLAB. This work uses two datasets to predict healthcare data: the first is the Mobile HEALTH (MHEALTH) dataset, and the second, collected using a battery-less wearable sensor, enabled researchers to recognise activity in healthy older individuals. The suggested methodology's performance is evaluated using these data sets.
With the help of IoT tracking devices, the authors could watch a patient receiving long-term treatment for heart disease, the patient's daily activities during treatment, and the effects of the treatment on the patient's body, and then train based on the results. As a result, the trained dataset comprised around 7358 instances and 258 columns; the main attributes that were closely monitored were the patient's acceleration, movement, and angle (Tab. 2).

A. MHEALTH (MOBILE HEALTH) DATABASE
The MHEALTH database (http://archive.ics.uci.edu/ml/datasets/mhealth+dataset) contains recordings of physical movement and vital signs from 10 participants with substantial variations who were involved in various fitness activities. In order to monitor the movement of various parts of the body, sensors are attached to the chest, right wrist, and left ankle of each person. ECG readings from the chest sensor can also be used for essential heart monitoring, testing for various arrhythmias, or examining the effects of exercise on the ECG.

B. RECOGNITION OF ACTIVITY IN HEALTHY OLDER PEOPLE USING A BATTERY-LESS WEARABLE SENSOR DATA SET
The database comprises the motion data of 14 healthy older people aged 66 to 86 who conducted broadly scripted tasks while wearing a battery-less, sternum-level wearable sensor. Owing to the use of a passive sensor, the data is sparse and noisy. Participants were randomly assigned to one of two clinical rooms (S1, S2). Room setting S1 (Room_1) collects data using four RFID reader antennas (one on the ceiling and three on the walls), while room setting S2 (Room_2) collects motion data using three RFID reader antennas (two on the ceiling and one at wall level). The following activities were carried out: walking to a chair, sitting in it, getting out of it, walking to a bed, lying down on the bed, getting out of the bed, and walking to the front door. As a result, the following class labels may be applied to each sensor observation: sitting on the bed, sitting on a chair, lying on the bed, and ambulating, which comprises standing and walking around the room.
Numerous criteria commonly used in binary classification were utilized in the tests to examine the effectiveness of the various approaches in predicting side effects. To compute the various criteria, first measure the True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN) counts. The first performance measure was precision, defined as the proportion of retrieved instances that are relevant. The second was recall, defined as the proportion of relevant instances that were retrieved. The metrics of precision and recall are both crucial in assessing the performance of a prediction technique, despite their frequently contradictory nature; as a result, these two metrics may be merged with equal weights to produce a single measure, the F1-score. The final performance measure was accuracy, the ratio of correctly predicted events to all predicted events.
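The four measures just described can be computed directly from the confusion-matrix counts; a quick sketch with toy counts (not the paper's reported results):

```python
# Precision, recall, F1-score, and accuracy from confusion-matrix counts.

def metrics(tp, tn, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy

precision, recall, f1, accuracy = metrics(tp=90, tn=85, fp=10, fn=15)
```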
Precision is the proportion of accurately predicted positive observations to all predicted positive observations (Equation (44)):

Precision = TP / (TP + FP)

Sensitivity/recall is the proportion of correctly detected positive observations to the total number of actual positive observations (Equation (45)):

Recall = TP / (TP + FN)

The F1-score is the weighted average of precision and recall, so both FP and FN are taken into account (Equation (46)):

F1-score = 2 × (Precision × Recall) / (Precision + Recall)
Accuracy is computed over both positives and negatives as Equation (47):

Accuracy = (TP + TN) / (TP + TN + FP + FN)

The resulting graphs compare the performance of the suggested and current techniques. Fig. 9 illustrates the precision comparison results of the proposed ICDFT-EMLM for the healthcare dataset; based on the results, the ICDFT-EMLM technique is highly precise. Fig. 10 illustrates that the proposed ICDFT-EMLM achieves high recall results. Fig. 11 shows the F1-score comparison outcomes of the proposed ICDFT-EMLM for the healthcare data; from the results, it is concluded that the proposed ICDFT-EMLM has high F1-score results. Fig. 12 shows the precision analysis of the proposed ICDFT-EMLM on the healthcare data. Tab. 4 tabulates the performance comparison results for both datasets; the table shows that the introduced approach performs better on Dataset-I than on Dataset-II. Fig. 13 illustrates the accuracy comparison results of the proposed ICDFT-EMLM model for the healthcare dataset; from the results, it is concluded that the proposed ICDFT-EMLM technique has high accuracy results. Fig. 14 and Fig. 15 show the accuracy and precision comparison outcomes for both datasets, identifying that the proposed method achieves higher results than the existing methods.
The graphs above state the accuracy comparison results of the proposed ICDFT-EMLM model for the healthcare data. The final metrics were obtained from the performance comparison results for the proposed and existing methods on Dataset-I and Dataset-II (Fig. 18).
Comparing the trained Dataset-I and Dataset-II evidences the results: the HMM method improves the fusion process, which increases the prediction performance. After that, IPCA is utilized for FE as well as dimension reduction. The FS process uses the ERFE method to eliminate the irrelevant data in a dataset. Finally, this data is learnt using the EMLM for performance checking. Here, the ENN, MXGB, and LR models are combined to make an ensemble model for predicting the healthcare data. Thus, the results indicate that the proposed ICDFT-EMLM model improves the prediction performance on the healthcare data (Fig. 19).
Tab. 5 compares the performance of the proposed approach with the existing methods. The DFA method achieved 86.06% accuracy, 83.29% precision, 88.19% recall, and an F1-score of 85.67%. On the same metrics, CDFT achieved 90.95% accuracy, 88.92% precision, 88.19% recall, and an F1-score of 89.88%. The third method, CDFT-HLCM, achieved 94.40% accuracy, 93.01% precision, 94.67% recall, and an F1-score of 93.83%. The proposed ICDFT-EMLM achieved 97.90% accuracy, 96.97% precision, 97.77% recall, and an F1-score of 97.37%, the highest prediction performance among all these methods. Tab. 6 gives the comparative results for the two datasets: Dataset-I yields 97.90% accuracy against 95.80% for Dataset-II, a precision of 96.97% against 93.42%, a recall of 97.77% against 96.15%, and an F1-score of 97.37% against 94.76%. These comparisons show that the proposed method performs best across both datasets.

VIII. CONCLUSION AND FUTURE WORKS
This research work focused on developing the Improved Context-aware Data Fusion (ICDF) algorithm and an efficient feature-selection algorithm to improve the classification process for predicting healthcare data. To achieve accurate data fusion, this study applies a Dual Filtering Method to preprocess the data, which attempts to label unlabeled data features. The Improved Dynamic Bayesian Network (IDBN) offers a good trade-off for tractability, becoming a tool for ICDF operations; here the inference problem is handled using the Hidden Markov Model (HMM) within the Dynamic Bayesian Network (DBN) model. The proposed HMM method thus improves the fusion process, which increases the prediction performance. Improved Principal Component Analysis (IPCA) is then utilised for feature extraction and dimension reduction, and Enhanced Recursive Feature Elimination (ERFE) performs feature selection by eliminating irrelevant data from the dataset. Finally, the data are learnt using the Ensemble-based Machine Learning Model (EMLM) for performance evaluation, in which the Enhanced Neural Network (ENN), Modified Extreme Gradient Boost classifier (MXGB), and Logistic Regression (LR) models are combined into a predictive ensemble model for healthcare data. The results indicate that the proposed ICDFT-EMLM model improves the prediction performance on healthcare data. In the experiments on the two datasets, Dataset-I achieves 97.90% accuracy against 95.80% for Dataset-II, a precision of 96.97% against 93.42%, a recall of 97.77% against 96.15%, and an F1-score of 97.37% against 94.76%.
These comparisons confirm that the proposed method performs best across the two different datasets.
Nevertheless, overreliance on technology, rising rates of medical mistakes and depression, shifting power dynamics, and other human factors all play a role. By applying these ideas in the healthcare industry, the Internet of Things has enabled remote monitoring and diagnosis of various health problems, the measurement of a wide range of health parameters, and the provision of testing facilities. As a result, healthcare has shifted from being hospital-centred to being centred on the individual receiving treatment.