Information Fusion for Assistance Systems in Production Assessment

We propose a novel methodology to define assistance systems that rely on information fusion to combine different sources of information while providing an assessment. The main contribution of this paper is a general framework for fusing n information sources using evidence theory. The fusion yields a more robust prediction and an associated uncertainty that can be used to assess the prediction's reliability. Moreover, we provide a methodology for the information fusion of two primary sources: an ensemble classifier based on machine data and an expert-centered model. We demonstrate the information fusion approach using data from an industrial setup, which rounds out the application part of this research. Furthermore, we address the problem of data drift by proposing a methodology to update the data-based models using an evidence theory approach. We validate the approach on the Tennessee Eastman benchmark while performing an ablation study of the model update parameters.


I. INTRODUCTION
Assistance systems accompany operators during machinery operation by providing assessment during decision-making. These systems support the operators with (real-time) information on the process in terms of production, machine condition, and recommendations to handle faults or to improve the machine's performance. Assistance systems have typical components such as a (real-time) data collection system, a (fault) detection system, a knowledge base, a computing engine, and an (interactive) user interface [1] [2] [3]. Due to their high performance, data-based models are a popular choice when selecting a detection system, with reported applications in medicine [4], industry [1] [3], road infrastructure [5], and agriculture [2]. Usually, data-based models are trained using a specific dataset and present good results. However, not all data-based models can handle new upcoming faults in the data. Hence, an anomaly detection system must have a mechanism to recognize an upcoming anomaly and the capability to learn from upcoming data that differs from the original training data. Equally important is the system's capability to adapt or retrain the data-based models automatically. The retraining or automatic update of the models must consider a minimum size of training data that ensures the models capture the essential patterns to be learned.
Systems composed of a combination or fusion of several individual models often present better results and robustness than individual models (e.g., bagging and boosting). Though data-based models attain high performance, expert-centered knowledge-based models alternatively provide versatile features, namely production context and expert domain knowledge. The challenge lies in how to combine a data-based model and a knowledge-based model. Thus, a common framework is required to perform a fusion of both systems. Such a framework must provide not only a way to combine the models' outputs but also a way to quantify the uncertainty. The uncertainty provides information regarding how reliable the combined system output is.
We propose a novel methodology for assistance systems that rely on information fusion in production assessment, in which several information sources can be combined into a more robust system output. The novelty of this paper is a common framework that allows the fusion of several information sources at the decision level using evidence theory. Besides, we quantify the uncertainty of the system output to provide a better assessment of its reliability. An essential contribution of this paper is the ability of the data-based model to handle unknown fault cases in the data, which allows the system to update its models automatically.
The individual contributions of this paper are:
• A methodology for the automatic model update of ensemble classifiers (ECs) while feeding in data with unknown fault cases. The methodology includes an uncertainty monitoring strategy that improves the anomaly detection of the EC, stores the data of the unknown condition, and retrains the pool of classifiers of the EC. We present the parameters of the automatic update module: threshold size, window size, and detection patience. The automatic update methodology is rounded out with experiments using the benchmark dataset Tennessee Eastman. The EC is tested using different fault class scenarios, in which we test the impact of a window during anomaly detection. Moreover, we present a detailed analysis of the automatic update parameters regarding retrained EC performance.
• A general framework to combine n information sources at the decision level to generate a robust system prediction. The framework uses the Dempster-Shafer evidence theory. Besides, the framework quantifies the uncertainty of the prediction, which can be used to assess the reliability of the system prediction.
• A methodology to combine a multiclass EC with an expert-centered knowledge-based model, in which we apply the general framework of information fusion. The system architecture shows the components of each model, namely the inference model and the model update module. The application of the information fusion system is tested with data from an industrial setup using a small-scale bulk good system. The performance of the individual models (EC and knowledge-based) is compared with the combined system.
This paper is structured as follows: Section II presents a literature survey on the main topics of this paper. The theoretical background is described in section III. Our proposed approach is detailed in section IV. Section IV-C and section IV-D present the methodology for information fusion and model update, respectively. Section V portrays a use case for retraining the EC using the benchmark Tennessee Eastman, whereas section VI presents a use case for information fusion using the data of a bulk good system laboratory plant. Finally, section VII summarizes the conclusions and future work.

II. RELATED WORK
This section reviews the literature related to information fusion, update of data-based models, and assistance systems.

A. Assistance Systems
Assistance systems provide valuable information for their users. They can be either non-invasive or in direct control of the process. The assistance can range from recommendation systems [6] [7] to interactive systems [8], or even systems that prevent actions from the user. Architectures of assistance systems commonly comprise the following modules: data collection, a condition detection engine, a knowledge base, and an (interactive) user interface [9]. The (fault) condition detection engine is vital to identify the current state of the machinery or process. The engine is usually powered either by a knowledge-centered model [9] or a data-based model [10]. The knowledge base plays a crucial role in the assistance system because it provides the information that supports the user when a (faulty) condition is active [9]. There are different ways to build a knowledge base, namely using ontologies [9] [11], knowledge graphs [8] [12], or static databases. The proposed architectures of assistance systems contain the primary modules to support the users. However, there are factors to be considered, such as the update of the condition detection engine and the knowledge base, and the quantification of the system uncertainty. The challenge lies in a holistic architecture that addresses these factors and defines the interactions of the primary systems. This research differs from the state of the art in that we propose a holistic methodology using information fusion for assistance systems with a special focus on production assessment. In this sense, the methodology addresses the major components of the assistance system architecture. We propose a novel architecture based on evidence theory that can combine n information sources while quantifying the uncertainty of the resulting system prediction. For this purpose, we provide a detailed description of the architecture in terms of components and their relationships, with a special focus on the role of uncertainty.

B. Information Fusion
Information fusion is a popular approach for combining several sources of information because the combined system often yields better performance and robustness. Information fusion at the decision level is a common practice using data-based models (e.g., supervised classifiers in the case of bagging) [13]. The use of information fusion with data-based models is reported in [14] [15], in which evidence theory combines models at the decision level. Information fusion using evidence theory provides an additional feature: uncertainty quantification [10]. The uncertainty serves to assess the output reliability of the combined system [16].
Alternatively, knowledge-based models are expert-centered approaches containing valuable expert domain knowledge and environment context [17]. Different knowledge-based approaches can be found in the literature using case-based reasoning (CBR) and natural language processing (NLP) [3], ontologies, and assistance systems [9]. Though combining the strengths of data-based and knowledge-based models might be considered a logical step, finding a common framework to perform the fusion is challenging. Besides, knowledge-based models often have a low number of input features in comparison with data-based models. The latter aspect requires special attention when performing an inference of the primary systems before the information fusion. Current research methodologies cover the information fusion of data-based models [14] [15]. However, the existing literature does not report the fusion of data-based and knowledge-based models, though the heterogeneity of the sources could improve the overall result. We propose a methodology for the information fusion of a data-based model with an expert-centered model, in which we use the Dempster-Shafer evidence theory as a general framework for the fusion. Besides, we test the feasibility of the methodology using data from an industrial setup.

C. Update of Data-based Models
The ability of data-based models to handle data with unknown fault cases has attracted growing interest in the research community [18] [19]. A primary step is identifying the unknown fault case or anomaly from the upcoming data. Different approaches are reported in the literature to detect anomalies, which propose the use of evidence theory [20] and unsupervised learning [21] [22]. After identifying the anomaly from the data, the next step is updating the model. In this sense, some methodologies focus on concept drift detection [23] [24], incremental learning [25] [26], emerging classes or labels [27] [28] [29], and incremental class learning [28]. Thus, detecting an anomaly is followed by an update or retraining of the data-based model. However, there are challenges associated with the retraining or updating of models, such as choosing a training data size sufficient to capture the essence of the upcoming fault. An essential factor to consider is the performance evaluation of the retrained models. A careful study of the parameters is required because not all upcoming faults can be handled with the same set of retraining parameters. The existing literature addresses anomaly detection [20] [21] [22] and even the identification of emerging classes (or unknown conditions) [27] [28] [29]. However, the model update using uncertainty remains unexplored. To this end, we propose a methodology for updating data-based models using DSET, in which we monitor the uncertainty of the fusion to trigger a model update. We focus on the model update of data-based models, specifically for ensemble classification using evidence theory. Besides, we perform an ablation study of the retraining parameters while showing their impact on the model performance. We demonstrate the robustness of the model update using the benchmark Tennessee Eastman.

III. THEORETICAL BACKGROUND
This section presents the basic theory for performing information fusion and the transformation of model predictions using an evidential treatment. The equations of this section are applied during the development of sections IV-C and IV-D.

A. Evidence Theory
Dempster-Shafer [30] defined a frame of discernment Θ = {A, B} for the focal elements A and B. The power set 2^Θ is defined by 2^Θ = {∅, {A}, {B}, Θ}. A basic probability assignment (BPA) is a function m: 2^Θ → [0, 1] that must comply with m(∅) = 0 and the sum of BPAs:

$$\sum_{A \subseteq \Theta} m(A) = 1.$$

The focal elements of Θ are mutually exclusive. The Dempster-Shafer rule of combination (DSRC) defines how to perform the fusion of two mass functions (e.g., sources of information) using the equation:

$$m_{DS}(A) = \frac{1}{1 - b_k} \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset,$$

where m_DS(A) is the fusion of the mass functions m_1 and m_2. The conflicting evidence b_k is defined by:

$$b_k = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$

It is important to remark that, when using the DSRC, the conflicting evidence is redistributed over the focal elements. Yager [31] defined an alternative rule of combination, which, in contrast to the DSRC, assigns the conflicting evidence to the focal element Θ. The Yager rule of combination (YRC) is defined by the equation:

$$m_Y(A) = q(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C), \quad A \neq \emptyset, \Theta,$$

where m_Y(A) is the fusion of the mass functions m_1(B) and m_2(C). The focal element Θ of the mass function m_Y is defined by m_Y(Θ) = q(Θ) + q(∅), where q(∅) represents the conflicting evidence. As with the DSRC, the conflicting evidence q(∅) is given by:

$$q(\emptyset) = \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C).$$

In the case of multiple fusion operations, the mass functions are combined pairwise using the following equation:

$$m(A) = (m_1 \oplus m_2 \oplus \dots \oplus m_N)(A),$$

where m(A) is the fusion of the N mass functions, and N ∈ N.
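The two combination rules can be sketched in a few lines of code. The following is an illustrative implementation (the data structures and names are our own, not from the paper) that represents a mass function as a dictionary mapping focal elements, encoded as frozensets, to masses:

```python
def dsrc(m1, m2):
    """Dempster-Shafer rule: the conflict b_k is redistributed by normalization."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc  # b_k: products of disjoint focal elements
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def yrc(m1, m2, theta):
    """Yager rule: the conflict q(phi) is assigned to the whole frame Theta."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    fused[theta] = fused.get(theta, 0.0) + conflict  # m_Y(Theta) = q(Theta) + q(phi)
    return fused

theta = frozenset({"A", "B"})
m1 = {frozenset({"A"}): 0.8, frozenset({"B"}): 0.2}
m2 = {frozenset({"A"}): 0.6, frozenset({"B"}): 0.4}
print(dsrc(m1, m2))
print(yrc(m1, m2, theta))
```

Both rules produce the same unnormalized products; they differ only in where the conflicting mass goes, which is why the YRC leaves mass on Θ that can later be read as uncertainty.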

B. Evidential Treatment of Model Predictions
We consider models with a common frame of discernment Θ = {L_1, L_2, ..., L_N}, where N represents the number of labels or classes, and N ∈ N. The power set is represented by 2^Θ = {∅, {L_1}, ..., {L_N}, ..., Θ}; the last term represents the overall uncertainty U. Each model (e.g., a classifier or a rule-based system) provides a prediction in the form of a unique label p = L_1 or as an array p = [L_1, L_2, ..., L_N]. In section III-A, the sum of BPAs is defined as Σ_{A⊆Θ} m(A) = 1. In [10], we proposed a strategy to transform a prediction into a mass function. This operation plays an essential role in the fusion of different information sources. We presented a sum of BPAs that considers the weights w_m of each focal element and the quantification of the overall uncertainty U:

$$S_{wbpa} = \sum_{j=1}^{N} m_j \cdot w_{m_j} + U = 1,$$

where w_{m_j} is the weight of the evidence m_j. The following conditions must be fulfilled: ∀ m_j: m_j > 0 and w_{m_j} ∈ [0, 1]. The overall uncertainty is defined as

$$U = 1 - \sum_{j=1}^{N} m_j \cdot w_{m_j},$$

in which a high value of U represents a high uncertainty in the body of evidence (e.g., a lack of evidence). We consider that the focal elements are mutually exclusive, which means that only one label is active at a time, transforming S_wbpa into S_wbpa = m_{R_j} · w_{m_{R_j}} + U = 1. However, the condition m_j > 0 must hold for every focal element, so we adapted the sensitivity-to-zero approach of Cheng et al. [32], using the equation [33]:

$$k = 1 - 10^{-F},$$

where k ∈ R, F ∈ N, and F ≫ 1. Thus, S_wbpa is transformed into:

$$S_{wbpa} = \sum_{j=1}^{N} m'_{p_j} \cdot w_{p_j} + U = 1,$$

where m'_{p_j} represents the j-th focal element and is defined using the approximation factor k and the number N of focal elements of Θ, with N ∈ N: the active prediction receives the mass k, while the remaining mass is distributed among the other elements. The active prediction p can be transformed into a mass function m using m = m'_p · w_p. The mass function can be represented as a row vector:

$$m = [\,m(L_1), m(L_2), \dots, m(L_N), U\,],$$

and the uncertainty U is defined as the last element of this vector.
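This transformation can be sketched as follows. The uniform spread of the residual mass 1−k over the inactive elements is our assumption, and the function name is illustrative:

```python
def prediction_to_mass(p, labels, w, F=6):
    """Transform a crisp prediction p into a weighted BPA row vector
    [m(L1), ..., m(LN), U] using the approximation factor k = 1 - 10^-F."""
    k = 1.0 - 10.0 ** (-F)
    N = len(labels)
    # Sensitivity-to-zero: the active label receives ~k, the inactive
    # labels share the small remainder so that every m_j > 0 holds.
    m_prime = [k if L == p else (1.0 - k) / (N - 1) for L in labels]
    m = [mp * w for mp in m_prime]  # m = m'_p * w_p
    U = 1.0 - sum(m)                # overall uncertainty
    return m + [U]

row = prediction_to_mass("L2", ["L1", "L2", "L3"], w=0.9)
```

A confidence weight w < 1 directly inflates U, which later surfaces as mass on Θ during fusion.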

IV. INFUSION: INFORMATION FUSION FOR ASSISTANCE SYSTEMS IN PRODUCTION ASSESSMENT
This research proposes an INformation FUsion approach for asSIstance systems in productiON assessment (INFUSION). This section covers the following topics: theoretical background, prediction systems, information fusion, model update of the prediction system, and the assistance system.
As a first insight into this approach, we present a general system overview in Fig. 1.
The general system is composed of n systems used as information sources. The motivation behind this is the creation of a more robust system. The general system overview is composed of the following blocks:
• The batch data is the numerical representation of the physical behavior of a machine. The data is split into three categories: training data D_Tr, validation data D_Va, and testing data D_Te. The data is used during the training and inference processes of the models.
• The modules that form the production assessment system:
- The assessment module matches each ensemble prediction with its corresponding assessment.
- The knowledge base holds the assessment for each ensemble prediction.
• The assessment is presented to the user (operator) through a user interface.
A primary motivation of this paper is the integration of data-based and knowledge-based models because the combined outcome profits from the strengths of both models. Therefore, the n systems of Fig. 1 are transformed into two major systems: an ensemble classifier (EC) that groups different data-based models and a knowledge-based model. Section IV-A details both systems.

A. Prediction Systems
As presented in Fig. 1, a (prediction) system consists of an inference model and an update module. The trained model represents the physical system and is used to predict the system's response while feeding data to it. The inference model can be data-based (e.g., a supervised classifier), an ensemble classifier (EC) formed by several models, a model built on equations representing the physical system, an ontology, or a knowledge-based model. The model update module adapts the system when the initial conditions have changed (or unknown events occur). The update is performed automatically or manually, depending on the module strategy.
A model M_i is trained using a training dataset D_Tr (in the case of data-based models), or is modeled using the relationships between the process variables and thresholds (in the case of a knowledge-based model). A training dataset D_Tr contains N_o^Tr observations, N_f^Tr features, and N_c^Tr classes. A frame of discernment Θ is formed by all the labels (or classes) that the model can predict: Θ = {C_1, ..., C_N}, where N ∈ N.
A prediction ŷ_i is obtained by feeding data to the model:

$$\hat{y}_i = M_i(D_{Te}),$$

where ŷ_i ∈ Θ. The prediction ŷ_i is transformed into the mass function m_i using equations (6)-(9):

$$m_i = f_m(\hat{y}_i, w_{M_i}),$$

where w_{M_i} represents the (confidence) weights for each class predicted by the model M_i.
We focus this research on a prediction system using an EC and rule-based knowledge models. Previous research addressed these two topics separately [20] [10]. Fig. 2 details the INFUSION system, where the prediction systems are instantiated as a data-based and a knowledge-based model. Thus, the data-based model is represented by the EC using the ensemble classification and evidence theory (ECET) approach [20], and the knowledge-based model is built using the knowledge transfer framework and evidence theory (KLAFATE) methodology [10]. It is important to remark that each system has an inference model and a model update module. Note that ECET is an EC formed by n systems, specifically the n supervised classifiers. ECET has a structure similar to that of Fig. 1 for the system's prediction, except for the model update module.
The model update module of KLAFATE is manual because it relies on the expertise of the expert team. The methodology is explained in detail in [10]. The automatic model update module of ECET is introduced in this research and is explored in detail in section IV-D. The main blocks of this module are:
• The pool of classifiers and the list of hyperparameters reported in [20].
• The anomaly detection module, which monitors the ensemble uncertainty U_E and the anomaly prediction ŷ_AN of ECET, as well as the system uncertainty U_Sys and the system prediction ŷ_Sys.
1) ECET Prediction System: In [20], we presented an approach to ensemble classification using evidence theory (ECET), in which we propose the use of information fusion to combine the predictions of N classifiers. In this paper, we extend the contribution of [20] by formalizing the approach theoretically. This theoretical formalization plays a crucial role in section IV-C and section IV-D, which correspond to the methodologies of information fusion and model update, respectively. Thus, given n classifiers, each classifier produces an output ŷ_i using equation (10), where ŷ_i ∈ Θ. The output is subsequently transformed into a mass function m_i using equations (6)-(8). The ensemble classifier (EC) is obtained by combining all the classifiers, specifically by applying the DSRC to the mass function of each classifier prediction. As described in equation (5), the DSRC can be used for multiple fusion operations. However, the fusion is performed in pairs. For instance, in the case of three classifiers, the fusion of m_1 (corresponding to the output ŷ_1 of model C_1) and m_2 is performed first; the result of this fusion, m_1 ⊕ m_2, is then combined with m_3. The fusion of the pair of mass functions m_i and m_{D_{i−1}} is represented as:

$$m_{D_i} = m_i \oplus m_{D_{i-1}},$$

where i ∈ N, m_i is the mass function of the current classifier, and m_{D_{i−1}} is the fusion of the previous mass functions.
The last element of the fusion F_{D_i}, which is a row vector, corresponds to the uncertainty U_{D_i} = F_{D_i}[N + 1], where N is the cardinality of the frame of discernment Θ, and N ∈ N. After performing the last fusion, the system prediction ŷ_EC is calculated using:

$$\hat{y}_{EC} = \arg\max_{\Theta} F_{D_i},$$

where ŷ_EC ∈ Θ. The system uncertainty is calculated from the last element of the last fusion. A similar procedure is performed when using the YRC to calculate the fusion F_{Y_i} from the previous mass function m_{D_{i−1}} and the uncertainty U_{Y_i}. It is important to remark that the current mass function m_i is used for both the DSRC and the YRC.
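The pairwise left-to-right fusion of the classifier mass functions can be sketched as follows (an illustrative implementation on dictionary-based mass functions; the helper names are ours):

```python
from functools import reduce

def dsrc(m1, m2):
    """Dempster-Shafer combination of two mass functions over frozensets."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    return {a: v / (1.0 - conflict) for a, v in fused.items()}

def ensemble_predict(masses, theta):
    """Fold the classifier masses in pairs (m_Di = m_i (+) m_D(i-1)), then take
    y_EC as the argmax over singletons and U as the remaining mass on Theta."""
    fused = reduce(dsrc, masses)
    singletons = {next(iter(a)): v for a, v in fused.items() if len(a) == 1}
    y_ec = max(singletons, key=singletons.get)
    u_ec = fused.get(theta, 0.0)
    return y_ec, u_ec

theta = frozenset({"C1", "C2"})
masses = [
    {frozenset({"C1"}): 0.7, theta: 0.3},  # classifier 1
    {frozenset({"C2"}): 0.6, theta: 0.4},  # classifier 2
    {frozenset({"C1"}): 0.8, theta: 0.2},  # classifier 3
]
y_ec, u_ec = ensemble_predict(masses, theta)
```

Note how the dissenting second classifier raises the conflict, yet the fold still converges on the majority class while the mass left on Θ shrinks with every agreeing source.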
2) KLAFATE Prediction System: In [10], we presented a knowledge-based model using the knowledge transfer framework and evidence theory (KLAFATE). The knowledge was extracted from a failure mode and effects analysis (FMEA) and modeled as rules. Thus, a knowledge rule R_i is defined as a function of process variables and thresholds:

$$R_i = f(V_1, \dots, V_{N_V}, T_1, \dots, T_{N_T}),$$

where V_1 represents a process variable, T_1 is a threshold or limit value of the process variable, N_V is the number of process variables, N_T is the number of thresholds, and N_V, N_T ∈ N. The knowledge rules are mutually exclusive: R_i ∩ R_{i+1} = ∅. The knowledge model is represented as a set of rules [10]:

$$M_{KE} = \{L_{TR_1}, \dots, L_{TR_m}\},$$

where L_{TR_i} represents the approximated rule R_i, m is the number of knowledge rules, m ∈ N, and L_{TR_i}, R_i ∈ Θ. The active rule is obtained using equations (6)-(9), where k is the approximation factor, N is the cardinality of Θ, k ∈ R, and N ∈ N. Thus, the mass function is defined using equation (8):

$$m_{R_1} = m'_{R_1} \cdot w_{R_1},$$

where w_{R_1} is the (confidence) weight of the rule R_1, and U is the overall uncertainty. The uncertainty U is calculated using equation (9). The (confidence) weight w_{R_j} is defined as in [10]. The mass function m_{R_i} is transformed into the prediction ŷ_KE using:

$$\hat{y}_{KE} = \arg\max_{\Theta} m_{R_i},$$

where ŷ_KE ∈ Θ.
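A toy rule base illustrates the idea; the process variables, thresholds, labels, and the weight below are hypothetical and not taken from the FMEA in [10]:

```python
def rule_model(x):
    """Mutually exclusive rules over process variables (hypothetical thresholds)."""
    if x["pressure"] > 2.8:       # V1 > T1
        return "overpressure"
    if x["temperature"] > 95.0:   # V2 > T2
        return "overheating"
    return "normal"

def rule_to_mass(y_ke, theta, w_r=0.9):
    """BPA for the active rule: mass w_r on the rule, the remainder on Theta as U."""
    return {frozenset({y_ke}): w_r, frozenset(theta): 1.0 - w_r}

x = {"pressure": 3.1, "temperature": 80.0}
y_ke = rule_model(x)
m_ke = rule_to_mass(y_ke, {"normal", "overpressure", "overheating"})
```

Because the rules are mutually exclusive, only one label is active at a time, which matches the single-active-prediction assumption of section III-B.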

B. Assistance System
The assistance system provides an interactive source of assessment for the user while receiving the process data.It provides the current status of the system (e.g., system prediction and uncertainty), the assessment (e.g., troubleshooting through the FMEA knowledge base) in the case of a fault case, and notifies in case of an unknown condition for the consequent model update.
The knowledge of the FMEA is stored as a knowledge tuple TU_i [10]:

$$TU_i = (FM, P, SP, C, E, RE),$$

where FM represents a failure mode, P a process, SP a subprocess, C a set of causes, E a set of effects, RE a set of recommendations, and i ∈ N. A set of recommendations is represented as:

$$RE = \{RE_1, \dots, RE_{N_{RE}}\},$$

where N_RE ∈ N. The same representation applies to the sets of effects and causes.
In the assessment context, the rule R corresponds to the system prediction ŷ_Sys, and the confidence weight w_R to the system weight w_{ŷ_Sys}, where R, ŷ_Sys ∈ Θ_Sys and w_Sys = 1. It is important to remark that each system prediction ŷ_Sys is linked to a knowledge tuple TU_i, a failure mode FM, and a weight w_{ŷ_Sys}: ŷ_Sys ⇔ TU_i, ŷ_Sys ⇔ FM, and ŷ_Sys ⇔ w_{ŷ_Sys}. Likewise, a system prediction ŷ_Sys can be associated with a set of causes C, effects E, and recommendations RE. The assessment module is modeled through a matching function that associates a system prediction ŷ_Sys with the rest of the knowledge of the tuple TU_i:

$$f_{Ma}(\hat{y}_{Sys}) = (P, SP, C, E, RE),$$

where i ∈ N. The matching function f_Ma provides the assessment when fed the system prediction ŷ_Sys, specifically returning the troubleshooting information associated with the failure mode: the process P, the subprocess SP, the set of causes C, the set of effects E, and the set of recommendations RE. The assistance system was described in detail in a previous work [10].
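The matching function amounts to a lookup from a prediction to its knowledge tuple. A minimal sketch with an invented FMEA entry (the failure mode, causes, effects, and recommendations are placeholders):

```python
from collections import namedtuple

# Knowledge tuple TU_i = (FM, P, SP, C, E, RE)
KnowledgeTuple = namedtuple("KnowledgeTuple", ["FM", "P", "SP", "C", "E", "RE"])

KNOWLEDGE_BASE = {
    "overheating": KnowledgeTuple(
        FM="overheating", P="reactor", SP="cooling loop",
        C=["fouled heat exchanger"], E=["product degradation"],
        RE=["inspect coolant flow", "clean heat exchanger"]),
}

def f_match(y_sys):
    """f_Ma: map the system prediction to its troubleshooting knowledge."""
    tu = KNOWLEDGE_BASE[y_sys]
    return tu.P, tu.SP, tu.C, tu.E, tu.RE
```

In practice the dictionary would hold one tuple per failure mode of the FMEA, so every reachable ŷ_Sys has an associated assessment.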

C. Information fusion
Information fusion attracts growing research interest because it improves robustness by combining different models. To this end, we propose a novel framework for combining n models using DSET. Moreover, this framework is used for the fusion of a data-based model and a knowledge-based model.
Thus, as presented in Fig. 1, the system is formed by n subsystems. The system mass function m_Sys is obtained after applying the information fusion to all subsystems:

$$m_{Sys}(A) = (m_1 \oplus m_2 \oplus \dots \oplus m_n)(A),$$

where n ∈ N and m_Sys(A) ∈ Θ_Sys. The system mass function m_Sys is also referred to as F_Sys. It is important to remark that all the systems share the same frame of discernment, Θ_KE = Θ_EC = Θ_Sys, with

$$\Theta_{Sys} = \{C_1, \dots, C_{N_{Sys}}\},$$

where C_1 represents the first class (or fault case), N_Sys is the number of classes (or fault cases), and N_Sys ∈ N.
Equation (22) can also be represented as:

$$m_{Sys}(A) = \Big(\bigoplus_{i=1}^{N_{Sys}} m_i\Big)(A),$$

where i, N_Sys ∈ N. This paper adapts the system to two main subsystems: a data-based model M_EC and a knowledge-based model M_KE.
As a first step, we obtain the outputs ŷ_KE and ŷ_EC by feeding data to the models M_KE and M_EC:

$$\hat{y}_{KE} = M_{KE}(D_{Te})$$

and

$$\hat{y}_{EC} = M_{EC}(D_{Te}),$$

where D_Te is the testing data.
The predictions ŷ_EC and ŷ_KE are transformed into the mass functions m_EC and m_KE, respectively, using equations (6)-(9):

$$m_{EC} = f_m(\hat{y}_{EC}, w_{M_{EC}})$$

and

$$m_{KE} = f_m(\hat{y}_{KE}, w_{M_{KE}}),$$

where w_{M_i} = 1 ∀i, and i ∈ N. The next step is to obtain the system fusion F_Sys by applying either the DSRC or the YRC. Thus, the system fusion F_{D_{Sys}} is calculated using the DSRC by applying equations (1), (2), (22), and (24). Likewise, the system fusion F_{Y_{Sys}} is calculated using the YRC by applying equations (3), (4), (22), and (24). The system uncertainty U_D is calculated from the last DSRC fusion F_{D_i}:

$$U_D = F_{D_i}[\,|\Theta_{Sys}|\,],$$

where F_{D_i}[|Θ_Sys|] corresponds to the overall uncertainty of the system fusion F_{D_i}. Likewise, the system uncertainty U_Y is calculated from the last YRC fusion F_{Y_i}:

$$U_Y = F_{Y_i}[\,|\Theta_{Sys}|\,],$$

where F_{Y_i}[|Θ_Sys|] corresponds to the overall uncertainty of the system fusion F_{Y_i}. The last step is the calculation of the system mass function m_Sys and the system uncertainties using the DSRC (U_D) and the YRC (U_Y). The system mass function m_Sys is obtained from the last DSRC system fusion F_{D_{Sys}}: m_Sys = F_{D_i}. The mass function m_Sys is then transformed into the prediction ŷ_Sys using:

$$\hat{y}_{Sys} = \arg\max_{\Theta_{Sys}} m_{Sys},$$

where ŷ_Sys ∈ Θ_Sys. Algorithm 1 describes the steps for the information fusion of N_Sys subsystems while feeding the testing data D_Te, where N_Sys ∈ N. Algorithm 1 is an updated version of the algorithm presented in [20].
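For the two-subsystem case, the fusion and the extraction of ŷ_Sys and U_D can be sketched as follows (illustrative mass values and labels; `fuse` implements both combination rules on dictionary mass functions):

```python
def fuse(m1, m2, theta, rule="dsrc"):
    """Combine two mass functions with the DSRC (normalize away the conflict)
    or the YRC (assign the conflict q(phi) to Theta)."""
    fused, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                fused[inter] = fused.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc
    if rule == "dsrc":
        return {a: v / (1.0 - conflict) for a, v in fused.items()}
    fused[theta] = fused.get(theta, 0.0) + conflict
    return fused

theta = frozenset({"normal", "fault_1"})
m_ec = {frozenset({"fault_1"}): 0.7, theta: 0.3}  # ensemble classifier
m_ke = {frozenset({"fault_1"}): 0.9, theta: 0.1}  # knowledge-based model
f_d = fuse(m_ec, m_ke, theta, "dsrc")
y_sys = max((a for a in f_d if len(a) == 1), key=f_d.get)  # argmax over classes
u_d = f_d.get(theta, 0.0)                                  # U_D: mass left on Theta
```

When both subsystems agree, the fused mass on the shared class exceeds either input mass and the residual uncertainty shrinks, which is the practical benefit of the fusion.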

D. Model Update
The anomaly detection functionality is crucial for the model update because it identifies when an unknown condition is present. We present an (automatic) model update for ECET based on uncertainty monitoring. The (manual) model update of KLAFATE was proposed in [10]. The model update is a sequence of five steps: anomaly detection, collection of unknown data, data isolation using a window, retraining, and inference.
1) Model Update for ECET: Well-performing ECs are usually the result of a suitable dataset that fits the patterns of the existing data. However, the occurrence of new unknown fault cases might undermine the performance of the ECs, leading to a retraining procedure of the models. To this end, our methodology provides the theoretical basis for updating the data-based models using DSET, in which we monitor the uncertainty of the fusion to trigger a model update.

Algorithm 1 Information Fusion of N_Sys Systems [20]
1: procedure INFORMATION FUSION
2:   for j = 1 to N_Sys do              ▷ N_Sys subsystems
3:     ŷ_j ← M_j(S_i)                   ▷ by Eq. (25)
4:     m_j ← f_m(ŷ_j, w_Mj)             ▷ by Eq. (6)-(9)
5:     if j = 1 then
6:       m_Sys ← m_j
7:     else
8:       m_Sys ← m_j ⊕ m_Sys
9:   ŷ_Sys ← arg max_Θ m_Sys            ▷ by Eq. (33)
10:  return ŷ_Sys, U_D, U_Y

The model update of ECET is performed automatically using an anomaly detection strategy in which the uncertainty is monitored. However, the model update can be set as semi-automatic (e.g., the user receives a notification before executing the model update module) in case the unknown condition needs to be analyzed in detail first. Algorithm 2 describes the sequence of the model update.
Algorithm 2 Model Update of ECET.
1: procedure MODEL UPDATE
2:   ŷ_EC ← M_EC(S_j)
3:   m_EC ← f_m(ŷ_EC, w_EC)            ▷ by Eq. (6)-(9)
4:   if C_A = True then                 ▷ by Eq. (36)
5:     D_Temp_j ← collect_data(X_A, ŷ_A)
6:   if C_S = True then                 ▷ by Eq. (39)
7:     D_A ← D_Temp
8:     D'_Tr ← D_Tr ∪ D_Tr_A            ▷ integrate old and new data
9:     M̂_Tr ← retrain(M, D'_Tr)
10:    M_Tr ← M̂_Tr                      ▷ Replace old models

We proposed an anomaly detection strategy using ECET in [20], in which an unknown condition A_K is flagged through a prediction ŷ_A = A_K, where ŷ_A is a prediction parallel to the EC prediction ŷ_EC, A_K ∈ Z, and K ∈ N. The condition for anomalies C_A is defined in terms of the conflicting evidence, where the terms b_k and q(∅) are calculated using equations (1)-(2) and (3)-(4), respectively.
In this paper, we propose monitoring the EC uncertainties U_{D_EC} and U_{Y_EC}, as well as the system uncertainties U_{D_Sys} and U_{Y_Sys}. The condition for anomalies from equation (35) is transformed into a joint condition over C_{A_EC} and C_{A_Sys}, which represent the condition for anomalies of the EC and of the system, respectively. Thus, the anomaly detection of the system is defined over both conditions. The data collection of (unknown) conditions needs to satisfy the condition C_D, where C_S is the condition that requires a minimum number of consecutive data samples. The condition C_S is defined as:

$$C_S: \; i_A \geq S_{Mn},$$

where i_A is the number of consecutive data samples, S_Mn is the minimum number of consecutive data samples, and i_A, S_Mn ∈ N.
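The consecutive-sample condition C_S can be sketched as a simple counter over the stream of monitored uncertainties (the threshold value and function name are illustrative assumptions):

```python
def anomaly_trigger(u_stream, u_threshold, s_min):
    """Return the index at which s_min consecutive samples exceed the
    uncertainty threshold (condition C_S: i_A >= S_Mn), or None."""
    run = 0  # i_A: current count of consecutive anomalous samples
    for i, u in enumerate(u_stream):
        run = run + 1 if u > u_threshold else 0
        if run >= s_min:
            return i
    return None

# A single spike does not trigger the update; a sustained rise does.
idx = anomaly_trigger([0.1, 0.9, 0.1, 0.9, 0.9, 0.9], u_threshold=0.5, s_min=3)
```

Requiring S_Mn consecutive samples filters out isolated uncertainty spikes, so the retraining is only triggered by a sustained unknown condition.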
The collected data of the unknown condition D_A has the same features f_Tr as the (old) original data D, i.e., f_A = f_Tr. In contrast, the number of observations o_A might differ from that of the original data o_Tr. Thus, the data D_A is represented by N_{o_A} observations, in which each observation is composed of the features X_A = f_A and the associated label (or class) ŷ_A.
The data D_A comprises at least S_Mn observations, where S_Mn is the minimum number of consecutive samples of the unknown condition, and N_{f_A} features. The data D_A is split into training data D_{Tr_A} and testing data D_{Te_A}. The next step is to integrate the existing data D with the collected data D_A:

$$D'_{Tr} = D_{Tr} \cup D_{Tr_A}, \qquad D'_{Te} = D_{Te} \cup D_{Te_A}.$$

The EC prediction ŷ_EC usually does not have a constant, steady value because of the diversity of the classifiers' predictions. For this reason, we propose a window on the EC prediction ŷ_EC that can ease the data isolation of the unknown condition. The window smooths the EC output because it considers the last N_w samples for the calculation of the windowed EC output ŷ_{EC_w}, where ŷ_{EC_w} ∈ Θ_Sys. A graphical representation of the window procedure is exemplified in Fig. 3. Having the data and the frame of discernment updated, we can proceed with the retraining of the pool of classifiers. The retraining is performed using the training methodology presented in [20].
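One plausible realization of the window is a majority vote over the last N_w ensemble predictions; the paper does not fix the aggregation, so the vote is our assumption:

```python
from collections import Counter, deque

def windowed_prediction(predictions, n_w):
    """Smooth the raw EC predictions with a sliding window of size N_w:
    each output is the majority label among the last N_w samples."""
    window, smoothed = deque(maxlen=n_w), []
    for y in predictions:
        window.append(y)
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed
```

The smoothing suppresses isolated flips in ŷ_EC, which makes it easier to isolate a contiguous block of data belonging to the unknown condition.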
The last step is to test the EC using the testing data D'_Te. For this purpose, we first update the frame of discernment Θ_Sys:

$$\Theta_{Sys} = \Theta_{Sys_{Old}} \cup \{A_K\},$$

where Θ_Sys_Old is the old frame of discernment, A_K is the new focal element, and N, K ∈ N. Thus, the updated Θ_Sys becomes:

$$\Theta_{Sys} = \{C_1, \dots, C_N, A_K\}.$$

2) Model Update for KLAFATE: Though knowledge-based models contain valuable expert-domain knowledge, the modeling process is time-consuming and requires frequent updates to avoid knowledge obsolescence. To this end, our methodology provides the theoretical framework for uncertainty monitoring using DSET, which can be used to trigger the update of the knowledge model by the team of experts. The model update of KLAFATE is triggered by a rise in uncertainty, either in the system or in the knowledge model. Thus, the expert team gathers to analyze the possibility of an unknown condition. Consequently, the expert team recommends adding information sources by including signals, process variables, or hardware to capture new physical signals. The purpose is to ease the identification of unknown conditions in order to create new knowledge rules in the FMEA. Once the expert team analyzes the acquired knowledge, the knowledge rules are validated using key performance indicators (KPIs) in the short and long term. The process to create a rule-based system is described in [10].

V. USE CASE: MODEL UPDATE FOR ENSEMBLE CLASSIFICATION USING TENNESSEE EASTMAN DATASET
As described in Section IV-D, the approach's novelty is a methodology for updating data-based models while injecting unknown fault cases into the data. The methodology primarily uses an uncertainty monitoring approach based on DSET. This section presents the results of the improved anomaly detection approach and the model update methodology. The robustness of the approaches is tested using the benchmark Tennessee Eastman. We first describe the dataset. We then describe the experiment design, explaining the defined scenarios and the performance metrics. The results subsection provides the performance of the experiments. A discussion subsection closes this section by presenting the findings and limitations of the approach. The model updates for the data-based model (ECET) and the knowledge-based model (KLAFATE) are highlighted in green in Fig. 2.

A. Description of the Tennessee Eastman Dataset
The benchmark Tennessee Eastman (TE) was created by Downs and Vogel with the motivation of providing an industrial-like dataset based on the Tennessee Eastman chemical plant [34]. The TE chemical plant has five principal process components: condenser, reactor, compressor, separator, and stripper. The dataset is widely used in the literature to compare the performance of data-based models. It models a chemical process considering 21 fault cases and a normal operation case. The dataset is divided into training sets and testing sets. The training set consists of 480 rows of data containing 52 features for each fault, whereas the training set of the normal condition contains 500 rows of data. The testing set consists of 960 rows of data, in which the first 160 rows belong to the normal condition and the remaining 800 rows belong to the fault case. Given the prediction difficulty, the fault cases are usually grouped into three categories: easy cases (1, 2, 4, 5, 6, 7, 12, 14, 18), medium cases (8, 10, 11, 13, 16, 17, 19, 20), and hard cases (3, 9, 15, 21) [35]. A detailed dataset description can be found in [34] [20].
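As a small illustration of the testing-set layout described above, the ground-truth label vector of one TE test set can be constructed as follows (the function name and list representation are illustrative, not part of the benchmark):

```python
def te_test_labels(fault_case, n_normal=160, n_fault=800):
    """Ground-truth labels for one Tennessee Eastman test set:
    the first 160 rows are normal operation (label 0) and the
    remaining 800 rows carry the fault-case label."""
    return [0] * n_normal + [fault_case] * n_fault

labels = te_test_labels(7)
print(len(labels), labels[159], labels[160])  # 960 0 7
```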

B. Experiment Design
We followed the procedure proposed in [20], in which we used the benchmark TE to test the performance of the proposed approaches. We considered a pool of ten classifiers (five NN-based models and five non-NN-based models) as the basis of the ECs. We considered only experiments using ML-based ECs and Hybrid ECs (a combination of non-NN-based and NN-based classifiers). The procedure is documented in detail in [20]. We trained the classifiers of the ECs using the fault cases (0, 1, 2, 6, 12) as the basis of the experiments. We defined two experiment scenarios: data isolation using a window and an update of ECs. We developed the approach using the Anaconda IDE and the libraries Scikit-learn and PyTorch [36] [37] [38]. We performed the experiments on an Ubuntu 20.04.3 LTS environment using an i7-7700 CPU @ 3.60 GHz × 8, 32 GB RAM, and an NVIDIA GeForce GTX 1660 SUPER GPU.
1) Data isolation using a window: We selected the MC ECs M3 and H5-2 from the previous work [20] with the best performance criteria. The EC M3 consists of non-NN classifiers, whereas the EC H5-2 is hybrid. We compared the results obtained by varying the window size. The base classifiers' and ECs' hyperparameters are detailed in [20].
2) Update of ECs: We selected the ML-based ECs M3, M4, and M5 to perform the experiments and comparisons. Given the constraint of limited retraining data, we discarded NN-based and Hybrid ECs. The procedure uses two data batches for each experiment. The first batch contains the known fault cases (0, 1, 2, 6, 12) and one anomaly case (e.g., fault case 7). The EC identifies the anomaly through uncertainty monitoring, collects the anomalous data, and retrains the EC if the data is sufficient. We assign the anomaly data the arbitrary label 30. The second batch contains testing data of the fault cases (0, 1, 2, 6, 12) and the anomaly (e.g., fault case 7). For comparison purposes, the original label 7 is replaced with the new label 30. We defined three main experiments: the retraining of the ECs using all the fault cases (1, ..., 21), the study of the retraining parameters (threshold size, window size, and detection patience) using the fault cases (7, 8, 15), and the fine-tuned retrained ECs using all the faults (1, ..., 21). We selected the fault cases (7, 8, 15) as anomalies to have a case for each primary data group (easy, medium, and hard).
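The anomaly collection step of this procedure can be sketched as follows. The trigger logic (consecutive high-uncertainty samples up to a detection patience, then buffering until a minimum threshold of samples is collected) mirrors the retraining parameters studied below, but the exact rule and the default values are assumptions:

```python
def collect_anomaly_data(stream, unc_threshold=0.5, patience=15,
                         min_samples=250, anomaly_label=30):
    """Sketch of the retraining trigger: once the DSET uncertainty
    exceeds `unc_threshold` for `patience` consecutive samples,
    start buffering the anomalous samples under a new arbitrary
    label; retraining is allowed once `min_samples` are collected.
    All parameter values here are illustrative."""
    consecutive, buffer = 0, []
    collecting = False
    for sample, uncertainty in stream:
        if uncertainty > unc_threshold:
            consecutive += 1
            if consecutive >= patience:
                collecting = True  # anomaly confirmed; start isolation
        else:
            consecutive = 0
        if collecting:
            buffer.append((sample, anomaly_label))
    ready = len(buffer) >= min_samples  # enough data to retrain the EC
    return buffer, ready
```

The stream is assumed to yield `(sample, uncertainty)` pairs from the EC; in the paper's setup the uncertainty would come from the DSET quantification of the ensemble.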
3) Performance Metrics: We use the performance metrics F1-score (F1) and fault detection rate (FDR, also known as recall). F1 and FDR are detailed in [39].
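For reference, both metrics can be computed per fault class from the confusion counts; a minimal sketch (in practice scikit-learn's `f1_score` and `recall_score` do the same job):

```python
def fdr(y_true, y_pred, fault_label):
    """Fault detection rate (recall) for one fault class:
    TP / (TP + FN)."""
    tp = sum(t == fault_label and p == fault_label
             for t, p in zip(y_true, y_pred))
    fn = sum(t == fault_label and p != fault_label
             for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0

def f1(y_true, y_pred, fault_label):
    """F1-score for one fault class: the harmonic mean of
    precision and recall."""
    tp = sum(t == fault_label and p == fault_label
             for t, p in zip(y_true, y_pred))
    fp = sum(t != fault_label and p == fault_label
             for t, p in zip(y_true, y_pred))
    fn = sum(t == fault_label and p != fault_label
             for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```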

C. Results
This subsection presents the experiment results of the model update approach. For this purpose, the experiments are divided into two parts: data isolation using a window and a model update of the EC.
1) Data Isolation using a Window: We perform experiments using different window sizes to study their impact on the EC performance. We compare the effects of using no window (w = 0) and a window (w = 20, w = 50).
Table II presents the F1-scores of the BIN EC M5 and the MC EC H5-2. The hyperparameters of the base classifiers and ECs are reported in detail in [20]. The BIN EC M5 presents comparable results while varying the window size, with average F1-scores of 0.60, 0.64, and 0.65 for the window sizes (0, 20, 50), respectively. In contrast, the MC EC H5-2 presented higher results using a window (w = 20, w = 50) compared to no window (w = 0), with average F1-scores of 0.63, 0.81, and 0.88 for the window sizes (0, 20, 50), respectively. Fig. 4 presents the plots of the MC EC H5-2 trained with the fault cases (0, 1, 2, 6, 12) and using the anomaly fault case (7) while varying the window size (0, 20, 50). Figures 4a, 4b, and 4c show the confusion matrices for the window sizes w = 0, w = 20, and w = 50, respectively. The confusion matrices for the window sizes w = 20 and w = 50 present better results than the confusion matrix for window size w = 0. The prediction plots of Figures 4e and 4f confirm the results of the confusion matrices, in which the predictions (blue) are closer to the ground truth (red) for the window sizes w = 20 and w = 50. The anomaly case (7) is represented by the label (-1) in the prediction plots. It is important to remark that the window smooths the EC predictions.
2) Model update of the EC: We perform three different experiments in this subsection: the model update of the EC (retraining), the study of the variation of the retraining parameters, and finally, the selection of a fine-tuned retrained EC.
We test the model update of the EC using all the fault cases of the TE dataset. For this purpose, we selected the MC ECs M3, M4, and M5. The hyperparameters of the base classifiers and ECs are reported in detail in [20]. Table III presents the F1-scores of the MC ECs M3, M4, and M5 trained with the fault cases (0, 1, 2, 6, 12). The MC ECs M3, M4, and M5 present comparable results, with average F1-scores of 0.39, 0.36, and 0.37, respectively. The MC EC M3 detected the anomalies (7, 17) with F1-scores greater than or equal to 0.43 and the anomalies (13, 14) with F1-scores between 0.33 and 0.43. The MC EC M4 detected the anomalies (8, 14, 17) with F1-scores greater than or equal to 0.67 and the anomalies (7, 10, 11, 15) with F1-scores between 0.38 and 0.54. Alternatively, the EC M5 detected the anomalies (14, 18, 20) with F1-scores greater than or equal to 0.54 and the anomalies (8, 17) with F1-scores between 0.43 and 0.54. Fig. 5 presents the plots of the MC ECs M3, M4, and M5 trained with the fault cases (0, 1, 2, 6, 12) and using the anomaly fault case (7).
Figures 5a, 5b, and 5c show the confusion matrices for the ECs M3, M4, and M5, respectively. The confusion matrix of the MC EC M5 presents better results than those of the other ECs. Alternatively, the prediction plots of Figures 5d, 5e, and 5f present mixed results: M3 identifies the anomaly better, but the case (12) is confused with the anomaly. In addition, M5 presents a better prediction of the known fault cases but a lower anomaly detection. The uncertainty quantification (UQ) using DSET is presented in Figures 5g, 5h, and 5i for the MC ECs M3, M4, and M5, respectively. The MC EC M5 presents steadier values than the MC ECs M3 and M4, which confirms the prediction pattern. This can be summarized as: the lower the uncertainty, the better the classification performance (likeliness).

The next step is the study of the retraining parameters. For this purpose, we test the effects of the threshold size, window size, and detection patience. We chose the MC EC M3 to perform the experiments and selected the threshold sizes (150, 250, 350) and the anomalies (7, 8, 15).

a) Effects of the threshold size: Table IV presents the F1-scores of the MC EC M3 trained with the fault cases (0, 1, 2, 6, 12). The retraining parameters window size and detection patience are fixed at ws = 20 and pt = 15, respectively. The MC EC M3 presented higher results using a threshold size th = 150 with an average F1-score of 0.81 for the anomaly (7), compared with the values of 0.57 and 0.50 corresponding to the threshold sizes (250, 350). The MC EC M3 presents comparable results for the anomaly (8), with average F1-scores of 0.81, 0.82, and 0.82 for the threshold sizes (150, 250, 350), respectively. In contrast, the MC EC M3 presented higher results using a threshold size th = 350 with an average F1-score of 0.74 for the anomaly (15), compared with the values of 0.54 and 0.55, which correspond to the threshold sizes (150, 250), respectively. Fig. 6 displays the EC M3 performance for each class while varying the threshold size (150, 250, 350) for the anomalies (7, 8, 15). The best performance corresponds to the anomaly (8), in which the EC M3 detects the fault cases (0, 1, 2, 6, 12) mostly correctly but has a limited anomaly detection. In contrast, the EC M3 presents a lower performance for the anomalies (7, 15).

b) Effects of the window size: Table V presents the F1-scores of the MC EC M3 trained with the fault cases (0, 1, 2, 6, 12). The retraining parameters threshold size and detection patience are fixed at th = 250 and pt = 15, respectively. The MC EC M3 presented average F1-scores higher than 0.84 using the window sizes (10, 50) for the anomaly (7). Alternatively, the MC EC M3 presented average F1-scores higher than 0.72 for the anomaly (8) using the window sizes (20, 50). In contrast, the MC EC M3 presented higher results using a window size ws = 50 with an average F1-score of 0.74 for the anomaly (15), compared with the values of 0.50 and 0.55, which correspond to the window sizes (10, 20), respectively. Fig. 7 displays the EC M3 performance for each class while varying the window size (10, 20, 50) for the anomalies (7, 8, 15). The best performance corresponds to the anomaly (8) using a window size ws = 20, in which the EC M3 detects the fault cases (0, 1, 2, 6, 12) mostly correctly but has a limited anomaly detection. In contrast, the EC M3 presents a lower performance for the anomalies (7, 15).

c) Effects of the detection patience: Table VI presents the F1-scores of the MC EC M3 trained with the fault cases (0, 1, 2, 6, 12). The retraining parameters threshold size and window size are fixed at th = 250 and ws = 20, respectively. For the anomaly (7), the MC EC M3 presented an average F1-score of 0.84 using detection patience pt = 5 and pt = 30, compared to the average F1-score of 0.57 for pt = 15. In the case of the anomaly (8), the MC EC M3 presented higher results using detection patience pt = 15 with an average F1-score of 0.82, compared with the values of 0.78 and 0.58, which correspond to the detection patience (5, 30), respectively. For the anomaly (15), the MC EC M3 presented average F1-scores higher than 0.73 for the detection patience (5, 30), while an average F1-score of 0.55 is obtained with the detection patience pt = 15.

Table IV: Anomaly detection results of MC EC M3 using the fault cases (0, 1, 2, 6, 12), the anomalies (7, 8, 15), threshold variations (150, 250, 350), window size (20), patience (15), and F1-score.

Fig. 8 displays the EC M3 performance for each class while varying the detection patience (5, 15, 30) for the anomalies (7, 8, 15). The best performance corresponds to the anomaly (8) using detection patience pt = 15, in which the EC M3 detects the fault cases (0, 1, 2, 6, 12) mostly correctly and has a limited anomaly detection. In contrast, the EC M3 presents a lower performance for the anomalies (7, 15).

Finally, we present the performance of the ECs with the tuned retraining parameters. Table VII presents the F1-scores of the MC ECs M3, M4, and M5 retrained with the fault cases (0, 1, 2, 6, 12) and the respective anomaly. In this case, the anomaly cases are all fault cases except for the original training cases. The retraining dataset contains the original fault cases and the detected data from the anomaly (the unknown fault case from the data). The retraining parameters are threshold size th = 250, window size ws = 20, and detection patience pt = 15. The MC EC M3 detected the anomalies (7, 11) with F1-scores greater than or equal to 0.55 and the anomalies (9, 13, 17) with F1-scores between 0.34 and 0.42. The MC EC M4 detected the anomalies (8, 14, 17) with F1-scores greater than or equal to 0.67 and the anomalies (7, 10, 11, 15) with F1-scores between 0.38 and 0.54. Alternatively, the EC M5 detected the anomalies (14, 18) with F1-scores greater than or equal to 0.68 and the anomalies (7, 11, 15, 17, 20) with F1-scores between 0.31 and 0.54.

D. Comparison with Literature
Though the current approach can automatically update the models while detecting unknown fault cases from the data, the stored data used to retrain the models might be insufficient for some fault cases. Thus, the stored data for some fault cases might not capture the essential patterns needed to identify the condition. In contrast, the literature contributions included in the comparison consider the full extent of the testing data.

Table VI: Anomaly detection results of MC EC M3 using the fault cases (0, 1, 2, 6, 12), the anomalies (7, 8, 15), patience variations (5, 15, 30), threshold (250), memory size (20), and F1-score.
Table VIII compares the anomaly detection results of the proposed approach and the literature. The multiclass ECs M3, M4, and M5 are originally trained using the fault cases (0, 1, 2, 6, 12). The testing data consists of the fault cases (3, 9, 15, 21), which represent unknown conditions for the ECs. For this purpose, each EC is retrained with one fault case at a time. We use the F1-score as the performance metric to compare the proposed approach with other literature contributions. It is essential to mention that the MC EC H5-2 from previous work [20] uses the full extent of the testing data, as does Top-K DCCA [21]. The ECs M3, M4, and M5 present lower results, with average F1-scores of 20.36%, 3.50%, and 2.59%, respectively. H5-2 and Top-K DCCA present general scores of 63.69% and 50.04%, respectively. Only M3 presents a score of 31.07% for the fault case 21, which still lies below the better-performing H5-2 and Top-K DCCA with scores of 63.1% and 50.05%, respectively.

E. Discussion
The ECs improved their anomaly detection capability after implementing the window. In the case of the EC M5, the general F1-score improved from 0.60 (no window) to 0.65 (w = 50). In the case of H5-2, the results are remarkable: the general F1-score improved from 0.63 (no window) to 0.88 (w = 50). However, a side effect of the window is a delay in the ensemble prediction, which is reflected when comparing Fig. 4d and Fig. 4f.
There are remarkable effects on the EC M3 performance when varying the retraining parameters, namely, the threshold size, window size, and detection patience. The results are mixed, and the average performance depends on the studied anomaly. However, from the results, it is possible to identify that a threshold of th = 150 presented the best average results for anomaly 7, whereas a threshold of th = 350 presented the best results for anomaly 15. Alternatively, the plots of Fig. 6 visualize the performance of each class while varying the threshold. The MC EC M3 presents an overall good performance when applying anomaly 8, in which the EC classifies the known cases mostly correctly and has a limited detection of the anomaly. In contrast, the anomaly detection feature decreases the performance on some of the known fault cases, which is visually represented in Fig. 6a when applying anomaly 7. Varying the window size reported favorable average performance results for a window of ws = 50 when considering all the anomalies (7, 8, 15). In contrast, the plots of Fig. 7 show that the best results correspond to the window size ws = 20 when applying anomaly 8, in which the EC classifies the known cases properly and has a limited detection of the anomaly. As in the threshold experiments, a similar effect of decreasing classification performance on the known cases is observed. Generally, a patience of pt = 5 presented the best average results across the anomalies (7, 8, 15). In contrast, the plots of Fig. 8b show that the best results correspond to the patience pt = 15 when applying anomaly 8, in which, as in the window size experiment, the EC classifies the known cases mostly correctly and has a limited detection of the anomaly. As in the threshold and window size experiments, the performance of the EC on some known faults is affected by the anomaly detection approach.
The performance of the retrained MC ECs presented mixed results. For instance, the EC M3 detected the anomaly cases (4, 5, 7, 11, 13) with FDR scores higher than 77% and the anomalies (10, 20, 21) with FDR scores higher than 53%. However, the retrained ECs presented a lower performance than other literature contributions: the average FDR scores of M3, M4, and M5 are 50.18%, 43.60%, and 51.44%, respectively. It is important to remark that the retrained models use only 250 samples as training data (only 52% of the available data), in which other fault cases might be included as a side effect of the patience parameter.

VI. USE CASE: PRODUCTION ASSESSMENT USING INFUSION ON A BULK GOOD SYSTEM
As described in Section IV-C, the approach's novelty is a methodology for the information fusion of data-based and knowledge-based models. The methodology primarily uses a novel framework for combining n models using DSET.
This section presents the results of the information fusion approach and an ablation study considering the different system configurations. The system configurations consist of the detection system using the data-based model, the knowledge model, or a hybrid model (a data-based model together with a knowledge model) using information fusion. We test the approach using a dataset from an industrial setup, namely a bulk good system laboratory plant. We describe the testbed and the dataset, and we then present the results and a discussion of the findings. Fig. 2 displays the main blocks of this section: the data-based model (ECET), the knowledge-based model (KLAFATE), and the outer module for the information fusion of both models.
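The DSET fusion at the core of this section can be illustrated with Dempster's rule of combination. The sketch below is a minimal two-source version over a binary frame; the mass values and the frame are illustrative, not the paper's actual models:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions over the
    same frame of discernment. Focal elements are frozensets and each
    mass function sums to 1. A minimal two-source sketch, not the
    paper's full n-source framework."""
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources fully disagree")
    k = 1.0 - conflict  # normalization constant
    return {s: v / k for s, v in combined.items()}

# Illustrative fusion of a data-based and an expert-centered verdict
# over a binary frame {fault, ok}; the mass values are made up.
F, O = frozenset({"fault"}), frozenset({"ok"})
T = F | O  # the whole frame carries the residual ignorance mass
m_data = {F: 0.7, T: 0.3}
m_expert = {F: 0.5, O: 0.2, T: 0.3}
fused = dempster_combine(m_data, m_expert)
# mass on {fault} grows, while the mass left on T quantifies uncertainty
```

The mass remaining on the whole frame T after fusion is what the paper monitors as uncertainty, so a rise in that mass can serve as the trigger for model updates.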

A. Description of the Bulk Good System Laboratory Plant and Dataset
The bulk good system (BGS) laboratory plant is an industrial setup used for production and fault detection experiments. The BGS consists of four stations that represent, on a small scale, standard modules of a bulk good handling system: loading, storing, filling, and weighing stations. A detailed description of the BGS and its applications can be found in previous work.

B. Experiment Design
This subsection presents the methodology followed for the ECET and INFUSION experiments using the BGS dataset. Besides, we describe the performance metric used to compare the experiments.
1) ECET using the BGS Data: We followed the methodology of [20] for the creation of MC ECs using the BGS data, which includes the pool of base classifiers, the grid of hyperparameters for each classifier, and the grid of hyperparameters for each EC. We used the data-based models decision tree (DTR), K-nearest neighbors (KNN), AdaBoost (ADB), support vector machine (SVM), and naive Bayes (NBY). For this purpose, we first trained the pool of classifiers using only ML models, which implies searching for the proper hyperparameters of each model. The second step is creating the ECs using the EC hyperparameters. The last step is running inference with the ECs on the BGS data.
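The base-classifier tuning step can be sketched with scikit-learn's grid search. The hyperparameter grids below are illustrative placeholders (the actual grids are documented in [20]), and only three of the five model families are shown for brevity:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def tune_base_classifiers(X_train, y_train):
    """Grid-search a subset of the base-classifier pool and return
    the best estimator per model family."""
    grids = {
        "DTR": (DecisionTreeClassifier(random_state=0),
                {"max_depth": [5, 10, None]}),
        "KNN": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7]}),
        "SVM": (SVC(), {"C": [0.1, 1.0, 10.0]}),
    }
    best = {}
    for name, (model, grid) in grids.items():
        # cross-validated search over the (illustrative) grid
        search = GridSearchCV(model, grid, cv=3, scoring="f1_macro")
        search.fit(X_train, y_train)
        best[name] = search.best_estimator_
    return best
```

The tuned estimators would then be combined into the MC ECs following the ECET methodology of [20].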
2) INFUSION using the BGS data: The knowledge-based model KEXT was presented in [10], in which we describe the knowledge rules. We use only the failure modes fm1, fm2, and fm3 for the INFUSION experiments. We present a comparison table covering knowledge models, data fusion models, and knowledge-and-data fusion models. The KEXT model represents the knowledge model. The data fusion models are represented by the ECET EC models and by a fusion of two data-based models. Lastly, the knowledge-and-data fusion models are represented by the combination of the SVM-KNN-KEXT models and by the INFUSION models composed of an MC EC and the KEXT model.
3) Performance Metrics: We use the F1-score as the main performance metric to compare the different experiments. Panda et al. [39] present a detailed description of the F1-score calculation.

C. Results
This subsection presents the results using the BGS data for the ECET and INFUSION architectures. For this purpose, we present the F1-score results of the models and ECs. Besides, we display the confusion matrix, classification predictions, and uncertainty for the different architectures.
1) ECET using the BGS Data: The first step is to train the pool of base classifiers, which we performed using the grid search module of scikit-learn. Table X presents the hyperparameters of the base classifiers trained with the cases (1, 2, 3), which correspond to the failure modes (fm1, fm2, fm3), respectively. The next step is applying the ECET methodology to find the best-performing MC ECs. We obtained the ML-based MC ECs shown in Table XI. The hyperparameters expert (Exp), diversity (Div), version of diversity (Ver), and pre-cut (PC) are set to False. The MC ECs achieve an F1-score of 1.00, whereas the base classifiers DTR, KNN, and ADB have values of 1.0, 1.0, and 0.96, respectively. Fig. 9 presents the plots of the MC ECs M3, M4, and M5 trained using the cases (1, 2, 3), which correspond to the failure modes (fm1, fm2, fm3), respectively. Figures 9a, 9b, and 9c show the confusion matrices for the MC ECs M3, M4, and M5, respectively; the confusion matrices present the same performance for all three ECs. Figures 9d, 9e, and 9f display the predictions (blue) compared with the ground truth (red) for the MC ECs M3, M4, and M5, respectively. As in the previous case, the prediction plots are identical for the MC ECs M3, M4, and M5. Figures 9g, 9h, and 9i present the DSET UQ for the MC ECs M3, M4, and M5, respectively. In contrast to the previous plots, the uncertainty is reduced as the ensemble size increases. The MC EC M5 presents the clearest plot, except for fm3, which shows a noisy behavior.
2) INFUSION using the BGS Data: Table XIII presents the F1-scores of the knowledge-based model, the fusion of data-based models, and the fusion of data-based and knowledge-based models. The knowledge-based model is represented by the model using the KEXT methodology. The fusion of data-based models is represented by the models using the ECET methodology (M3, M4, M5) and by an additional case performing a DSET fusion of the data-based models KNN and SVM (without the ECET methodology). The fusion of data-based and knowledge-based models is represented by the models using the INFUSION methodology (IFS3, IFS4, IFS5) and by an additional case performing a fusion of the models KNN, SVM, and KEXT. The KEXT model presents an average F1-score of 0.75, whereas the individual cases (1, 2, 3) presented values of 0.95, 0.79, and 0.52, respectively. The ECET and INFUSION models (IFS3, IFS4, IFS5) present the best average F1-score with a value of 1.00. The fusion of SVM and KNN presents an average F1-score of 0.96, whereas the fusion of KEXT, SVM, and KNN presents an improved average F1-score of 0.98. It is important to remark on INFUSION's robustness: we fuse a high-performing ECET with a low-performing KEXT, and the low performance of KEXT for some fault cases did not affect INFUSION's performance. INFUSION presents a steady high performance, as shown in Table XIII and the confusion matrix of Fig. 10c. Alternatively, a detailed examination of the uncertainty provides an additional perspective on INFUSION's performance, in which the uncertainty presents areas with high values. Thus, uncertainty monitoring can be used to evaluate ECET and KEXT to determine the causes of low performance.

D. Discussion
The knowledge-based model KEXT presented mixed results, in which some faults are well identified or predicted. The strength of this approach relies on how well a rule represents a machine condition. Representing knowledge rules is a challenging and often time-demanding task. An additional positive characteristic of the knowledge-based model is its explainability: an expert user can directly observe the logic and transform the rules.
Alternatively, the data-based models using ECET outperformed the knowledge-based model, which is clearly reflected in the F1-scores of Table XIII. However, the relationships between the features and outputs are often hidden (except for data-based models such as DTR, where the rules can be observed). It is important to remark on the number of features the models use: the knowledge-based models are built using fewer than ten features, whereas the ECET models are built using 133 features.
The fusion of data-based and knowledge-based models slightly improved the overall system's performance. The fusion model SVM-KNN-KEXT presented an improvement on fault 3 over the fusion model SVM-KNN, with scores of 0.95 and 0.92, respectively. In the case of INFUSION, the ECET results were already outstanding, resulting in a predominant effect on the fusion. The poor performance of KEXT on some fault cases did not affect the system performance.
The INFUSION methodology performed a fusion of the KEXT knowledge-based model and the ECET data-based models. No performance changes were reported, since the ECET data-based models (M3, M4, and M5) already presented the best possible performance.
• The (re)training pool of classifiers module, formed by the blocks: model training, using either the prior training data D^Tr or the retraining data D^Tr′; model validation, using either the prior validation data D^Va or the new validation data D^Va′; and uncertainty quantification.

Figure 4: Anomaly detection using different window sizes for the MC EC H5-2 trained with the known cases (0, 1, 2, 6, 12) and using the fault case (7) as an anomaly. The confusion matrices of H5-2 are displayed in (a)-(c), and the predictions in (d)-(f).

(a) Bar chart for M3 using A7; (b) bar chart for M3 using A8; (c) bar chart for M3 using A15.

Fig. 10 presents the plots of the main models: the KEXT knowledge-based model, the ECET data-based model (M3), and the INFUSION model (fusion of KEXT and ECET).
Figures 10a, 10b, and 10c show the confusion matrices for the models KEXT, ECET (M3), and INFUSION (IFS3), respectively. The confusion matrices with the best performance correspond to the ECET and INFUSION models. In contrast, KEXT presents a poor performance in detecting fm3. Figures 10d, 10e, and 10f display the predictions (blue) compared with the ground truth (red) for the models KEXT, ECET, and INFUSION, respectively. The clearest plots correspond to the ECET and INFUSION models, whereas the KEXT model presents a noisy plot. Figures 10g, 10h, and 10i present the DSET UQ for the models KEXT, ECET, and INFUSION, respectively. In the case of KEXT, the plot presents a continuous line, since only the expert team can change the uncertainty's value. In contrast, ECET presents an extremely noisy plot for fm3. In the case of INFUSION, the plot presents a steadier uncertainty.

Table I: List of symbols and abbreviations.

Table II: Anomaly detection results of selected ensemble multiclass classifiers using all the fault cases, and F1-score.

Table III: Classification results of the ECs after retraining using all the fault cases, and F1-score. The retraining parameters are threshold size th = 100, window size ws = 20, and detection patience pt = 15.

Table VII: Classification results of the RT ECs after retraining using all the fault cases, and F1-score. The retraining parameters are threshold size th = 250, window size ws = 20, and detection patience pt = 15.

Table IX compares the anomaly detection results of our approach and the literature. We use the FDR to compare our results with the literature results. The retrained MC ECs M3, M4, and M5 present lower results, with average FDR scores of 53.02%, 41.68%, and 35.04%, respectively. The MC ECs M3 and H3-4 present FDR scores of 87.97% and 73.76%, respectively. The approaches DPCA-DR, AAE, and MOD-PLS have FDR scores of 83.51%, 78.55%, and 83.83%, respectively.

Table VIII: Classification results of the ECs after retraining using all the fault cases, and F1-score. The retraining parameters are threshold size th = 250, window size ws = 20, and detection patience pt = 15.

Table IX: Classification results of the ECs after retraining using all the fault cases, and FDR. The retraining parameters are threshold size th = 250, window size ws = 20, and detection patience pt = 15.