System Condition Monitoring Based on a Standardized Latent Space and the Nataf Transform

This work introduces a new condition monitoring approach for complex systems based on a standardized latent space representation. Latent variable models such as variational autoencoders are widely used to analyze systems described by a high-dimensional physical space. The encoding of such a space defines a low-dimensional and physically representative latent space. However, the latent space obtained for complex systems operating under multiple conditions is often difficult to exploit in defining an efficient Health Index, owing to the non-deterministic and hyperparameter-dependent nature of the latent space. In addition, the distribution of the healthy cluster is not known a priori. The original contribution of this paper is to use the Nataf isoprobabilistic transform to map the latent space into a standardized space. This normalizes the spatial structure of the latent space and relaxes the model's sensitivity to hyperparameters during the learning process. Moreover, the characterization of the healthy condition in the standard Nataf space leads to the definition of two complementary health indices suitable for complex systems. An implementation in two case studies demonstrates the potential of the proposed approach. First, the approach was successfully applied to NASA's Commercial Modular Aero-Propulsion System Simulation dataset. The second case study consisted of analyzing multiple degradation modes in operating wind turbines. Encouraging results emerge from both case studies, with critical conditions being detected significantly earlier than in competing approaches. The proposed approach can be generalized to complex systems equipped with multiple sensors, and overcomes difficulties related to latent space analysis of multiple condition systems.


I. INTRODUCTION
Modern engineering systems are increasingly complex and are expected to conform to rising standards of efficiency and reliability [1]. These challenging trends are well illustrated by the technological evolution of aeronautical systems, energy conversion systems such as wind turbines (WTs), and large industrial machines, among others [2], [3], [4]. The costs and strategic importance of these assets justify the need for advanced Condition Monitoring (CM) approaches. Such approaches aim at continuously monitoring the condition of assets, which in turn allows for optimized Operation and Maintenance (O&M) planning and enhanced reliability, in addition to providing economic benefits. Modern engineering systems are equipped with a large variety of sensors. The measures from such sensors provide a high-dimensional multi-physics description of the condition of these systems. The ever-increasing databases resulting from modern systems create new opportunities for CM. Data-driven approaches based on artificial intelligence models such as Deep Neural Networks (DNNs) receive a lot of attention from industry and academia [5], [6].
Latent space representations are low-dimensional projections of high-dimensional physical spaces achieved through an appropriate inference function. The Variational Autoencoder (VAE) model [7] is probably the most popular Latent Variable Model (LVM) combining artificial neural networks and Bayesian inference methods. Once this generative model is trained, the encoder projects the input physical space into a low-dimensional Latent Variable (LV) space. Most importantly, the latent space reveals relations between data points that are usually not evident in the original high-dimensional physical space. Owing to these capabilities, the VAE model has proven effective in a wide range of applications. Furthermore, multiple variations of the VAE model have been introduced to deal with specific data and modeling characteristics [8], [9], [10], [11], [12].
The case of systems operating under multiple distinct conditions is of particular interest for latent space representations. Indeed, variational encoding can isolate different operating conditions, and will ultimately reveal the relations existing between these conditions [13], [14], [15]. Nevertheless, LVs corresponding to complex multiple condition systems are unsuitable for robust CM models due to several restrictions. First, the latent space representation of complex multiple condition systems is often intricate and non-deterministic. Indeed, the spatial representation of LVs is very sensitive to the model's hyperparameters and changes randomly for independent training instances. Finally, the distribution of the healthy cluster is not known a priori.
32638 VOLUME 12, 2024. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Variations of the classical unsupervised VAE model in the literature partially address the limitations of the classical VAE in analyzing complex systems [16]. In particular, supervised implementations of the VAE model exploit labels such as binary classification (healthy or abnormal condition), multiple-class categorization (healthy condition plus multiple abnormal conditions), or regression (condition indicated by a scalar) [8], [9], [11]. In [15], the authors show that the embedding of the VAE model with a classification DNN allows disentangling clusters in the latent space. In [17], the VAE is embedded with a regression DNN. These supervised learning models allow for disentangled clusters in the LV representation, but the latent space still varies between different instances of training.
The present paper aims to define a robust detection approach for complex systems, and to that end, introduces a new framework to achieve a latent space representation for high-dimensional complex engineering systems. The solution investigated herein consists in combining the LVs obtained by the VAE with the Nataf Transform (NT) [18]. The NT has been widely used in reliability applications, such as [19], [20], and [21]. The features of the complex system are then projected from the original and intricate latent space into the standard Nataf space. The latter representation overcomes the limitations of classical latent space representations, which makes it possible to define a robust Health Index (HI).
The following are the original contributions of the present paper:
• The NT is used to map an intricate latent space into the standard Nataf space. It is shown that the standard Nataf space overcomes the limitations of VAE-based latent space representations of complex multiple condition systems.
• A pair of complementary HIs based on the standard Nataf space is introduced. The detection performance of the proposed HIs is demonstrated in two application case studies.
• This work introduces two new cases supporting the use of latent spaces as built-in visualization tools to enhance the interpretability of DNN models.
The remainder of this paper is organized as follows: Section II reviews latent space representations and anomaly detection based on such spaces. Section III describes the Nataf Transform. The proposed approach is introduced in Section IV. Two case studies demonstrate the proposed approach: the degradation of engines from the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) FD001 dataset, in Section V, and the degradation of WTs from an operating North American wind farm, in Section VI. Section VII summarizes and discusses the results from the case studies. Finally, Section VIII concludes the paper.

II. PROBLEM FORMULATION

A. LATENT VARIABLE MODELS
Let $X = \{x^{(i)}\}_{i=1}^{N_X}$ be a dataset of $N_X$ observed variables collected from the same system $S$, whose unchanging underlying true distribution $p^*(x)$ is unknown. The observations $x$ are assumed to be independently and identically distributed (i.i.d.) [22], and can be considered as random samples obtained from the unknown underlying process $x \sim p^*(x)$. The objective of an LVM is to describe the data space distribution through a set of unobservable variables $Z = \{z^{(i)}\}_{i=1}^{N}$ whose prior distribution $p(z)$ is assumed to be known [22], [23]. This latent space or manifold has no physical meaning, but is very suitable for revealing and disentangling the main relevant features from the entangled space of data $X$. An important condition for this feature extraction process and disentanglement is that the latent space must have a lower dimension than the original space, i.e., $\dim(Z) = N < N_X = \dim(X)$.
The objective of the LVMs is to find the posterior probability distribution function $p(x)$ that gives the best approximation of the true distribution of the data, $p(x) \approx p^*(x)$. The posterior data distribution $p(x)$ is then given through the conditional generative distribution $p_\theta(x|z)$ by $p(x) = \int p_\theta(x|z)\,p(z)\,dz$, where $p(z)$ is not conditioned on any observations, and is called the prior distribution over $z$. The latent posterior distribution $p_\phi(z|x)$ can be obtained using the Bayes theorem for model inference through $p_\phi(z|x) = p_\theta(x|z)\,p(z)/p(x)$. To this end, two parametric functions $f_\phi$ and $g_\theta$ are defined to map from the data space $X$ to the latent space $Z$, and conversely from $Z$ to $X$. The inference function $f_\phi$, called the encoder, is used to map the data space to the latent space, $z = f_\phi(x)$, and is employed to parameterize the inference distribution $p_\phi(z|x)$. The mapping of the data space from the latent space is obtained by the generative function $x = g_\theta(z)$, called the decoder, which is used to parameterize the generative distribution $p_\theta(x|z)$.
The key idea of this process is to force the latent prior distribution to follow a known distribution such as the Gaussian distribution $p(z) \approx \mathcal{N}(z; 0, I)$. The latent Kullback-Leibler divergence loss $D_{KL}(p_\phi(z|x) \,\|\, p(z))$ is used to regularize the inference function in order to enforce the encoding of the data into the prior Gaussian latent space. Then, the mapping of the data space obtained by the generative function from the known prior distribution is forced to follow the true data distribution $p^*(x)$. Since the latter is unknown, $p^*(x)$ can explicitly be estimated by forcing the decoder to reconstruct the data, $\hat{x} = g_\theta(f_\phi(x))$. A data reconstruction loss, such as the $L_2$ norm or the mean-squared error $L_{\theta,\phi} = \frac{1}{2}\|x - \hat{x}\|_2^2$, is then used to guide this reconstruction process.
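As a minimal numerical sketch of the two loss terms described above, the following assumes a Gaussian encoder with diagonal covariance, parameterized by a mean `mu` and a log-variance `log_var`. This is a common choice, but it is an assumption here and is not stated in this section. The function names and the weight `beta` are hypothetical and serve the illustration only.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per sample.

    Assumes a diagonal Gaussian encoder (an assumption of this sketch).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def reconstruction_loss(x, x_hat):
    """0.5 * squared L2 norm of the residual x - x_hat, one value per sample."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 0.5 * np.sum((x - x_hat) ** 2, axis=-1)

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Batch average of reconstruction loss plus beta-weighted KL term.

    `beta` is a hypothetical weighting factor; beta = 1 gives an unweighted sum.
    """
    per_sample = reconstruction_loss(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)
    return float(np.mean(per_sample))
```

With `mu = 0` and `log_var = 0` the encoder output coincides with the prior, so the KL term vanishes and only the reconstruction term remains.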

B. LATENT VARIABLE MODEL FOR FAULT DETECTION
It is important to note that when the two parametric functions $(g_\theta, f_\phi)$ are learned on a training dataset $X_{train}$ and the hyperparameters $(\theta, \phi)$ are set, the posterior latent distribution obtained by mapping a new data sample $X_{new}$ will belong to the prior manifold $p(z)$ even if $p(x_{new}) \not\approx p(x_{train})$. The reconstructed data $\hat{x}_{new} = g_\theta(f_\phi(x_{new}))$ obtained by both transition functions will have a posterior probability data distribution that somewhat approximates the training probability distribution, $p(\hat{x}_{new}) \approx p(x_{train})$. See Fig. 1(a). This reconstruction process is very suitable for fault detection if the LVMs learn the mapping of the healthy probability distribution only.
For instance, let $p(x_{healthy})$ be the healthy data distribution and $p(x_{faulty})$ the faulty distribution for a given anomaly such that $p(x_{healthy}) \not\approx p(x_{faulty})$. If the LVM has learned to map only the healthy data space, $\hat{x}_{healthy} \sim p(x_{healthy})$, the reconstruction process obtained for any faulty data will give $\hat{x}_{faulty} \sim p(x_{healthy})$. The divergence between the original data distribution $p(x)$ and the reconstructed data distribution $p(\hat{x})$ is then used as a suitable HI, $HI = L(x, \hat{x})$. This concept is illustrated in Fig. 1(a), where the distribution of the reconstructed faulty samples is shifted from the original faulty data. Thus, exploiting the reconstruction error is a simple and effective way of detecting any shift in the probability distribution of the monitoring data space caused by an anomaly.
The simplicity of reconstruction-error-based detection stems from the fact that the model captures the characteristics of a healthy state. Furthermore, it does not require analyzing complex failure data, or any expert-dependent labeling. This concept, which involves calculating the difference of the residual outputs between monitoring and normal states, has been used as a system-wide HI in several papers [24], [25], [26], [27], [28], [29], [30], [31], [32], [33].

C. LIMITATIONS OF THE LATENT MANIFOLD FOR FAULT DETECTION
In the case where only healthy data are inferred on the latent manifold, the inference transition function is less sensitive to anomalous data than is the generative transition function. Indeed, in most papers in which the latent manifold is used for fault detection, the main HI is a combination of the data HI and the latent HI [34], [35], [36], [37], [38], [39], [40].
The main advantage of the inference process over the generative process is that the latent manifold improves the interpretability of the fault detection model. Indeed, the disentangled latent manifold provides analysts and O&M practitioners with a visual tool. The representation of the LVs over time gives a graphical description of the evolution of the system's physical condition. Improving the sensitivity of the latent manifold for fault detection is thus of foremost importance.
Several latent manifold conceptual indicators could be incorporated to improve the detection of any deviation from the learned healthy state in the latent space [41]. These latent conceptual indicators are the likelihood-based indicator, the discriminative indicator, the statistical distance-based indicator, the prototypical approaches, and the multivariate signal processing approaches [41]. In [42], Balshaw et al. conducted a comparative study evaluating several latent manifold conceptual indicators. The authors concluded that data space HIs outperform latent space HIs. It was also highlighted that the temporal structure of the data must be considered to increase the sensitivity of the latent space to the anomalous data. This temporal preservation of the latent manifold involves the use of a sliding time window $W_t$ in the latent space from which the latent HI is developed, i.e., the Latent HI (LHI) can be written as $LHI = F(z_t)$, where $F(\cdot)$ is a latent manifold conceptual indicator and $t \in [t_1, t_2]$. The temporal preservation of the latent variables demonstrated better performance in detecting anomalies in vibration signals than the conventional static case [43].
Furthermore, to improve the interpretability of the latent manifold, it would be more appropriate to include some failure data during the learning process. This can be done in a semi-supervised way, with the expert analyzing some failure data before the learning process, or in an unsupervised way, where the expert analyzes the clusters obtained by the inference function (after the training).
One of the restrictions on the LVM is that the inference process is unique to a given asset and very sensitive to the hyperparameters set during training. Moreover, the posterior latent distribution of the healthy data is unknown. Because of these limitations of the LVM, a universal LHI suitable for several assets cannot be defined. As shown in Fig. 1(b), all the samples of the training set (faulty and healthy) lie within the same manifold, while the healthy distribution is not known a priori.
In addition, the posterior latent distribution could be unsuitable for some of the conceptual indicators of the manifold, as depicted in Fig. 2. Indeed, in this figure, it can be noted that point $P_1$ is equidistant to the centroids of the clusters $A$ and $B$, but $P_1 \in A$. Also, the Euclidean distance is the same between the cluster $C$ centroid and the points $P_1$ and $P_2$, even if $P_2 \in C$ and $P_1 \notin C$. Both cases suggest that the relations between points and clusters in the latent space cannot be described using solely the Euclidean distance.
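The first situation can be reproduced with synthetic data. The sketch below uses two hypothetical Gaussian clusters, one elongated and one compact, and a point placed midway between their centroids. The Mahalanobis distance is used only as a familiar covariance-aware comparison. It is not the method proposed in this paper, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical clusters: A is elongated along the first axis, B is compact.
cluster_a = rng.normal(size=(2000, 2)) * np.array([3.0, 0.3])
cluster_b = rng.normal(size=(2000, 2)) * 0.3 + np.array([6.0, 0.0])
p1 = np.array([3.0, 0.0])  # roughly midway between the two centroids

def euclidean_to_centroid(point, cluster):
    """Euclidean distance from a point to the cluster centroid."""
    return float(np.linalg.norm(point - cluster.mean(axis=0)))

def mahalanobis_to_cluster(point, cluster):
    """Distance to the centroid scaled by the cluster's sample covariance."""
    diff = point - cluster.mean(axis=0)
    cov = np.cov(cluster, rowvar=False)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

d_euclid_a = euclidean_to_centroid(p1, cluster_a)
d_euclid_b = euclidean_to_centroid(p1, cluster_b)
d_mahal_a = mahalanobis_to_cluster(p1, cluster_a)
d_mahal_b = mahalanobis_to_cluster(p1, cluster_b)
```

The two Euclidean distances are nearly identical, whereas the covariance-aware distances differ by roughly an order of magnitude, which is consistent with the point belonging to the elongated cluster only.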

D. TOWARD AN ENHANCED LATENT MANIFOLD FOR FAULT DETECTION
We introduce here an approach that leads to a standard and known posterior healthy distribution. As described in this section and illustrated further in this paper, our proposition overcomes the limitations of classical latent space representations. Ultimately, the enhanced representation space is used in the definition of HIs for the detection of abnormal conditions in complex systems.
To achieve a standardized representation space, a second-step inference transition function is introduced to map the latent manifold into a new manifold, as shown in Fig. 1(c). The inference function is the NT [18], [44], which is presented in detail in Section III.
The NT takes the healthy dataset from the latent space as a reference. Let $Z_{healthy}$ be the set of healthy data points in the latent manifold. The NT maps $Z_{healthy}$ into $S_{healthy}$ such that $S_{healthy} \sim \mathcal{N}(0, I)$. Given a data point $s^{(i)} \in S_{healthy}$, let $d^{(i)} = \|s^{(i)}\|_2$, where $\|\cdot\|_2$ is the Euclidean norm, and let $D$ be the corresponding random variable. Since $S_{healthy} \sim \mathcal{N}(0, I)$, the random variable $D$ follows the $\chi$ distribution with $N$ degrees of freedom, $\chi_N$. Specifically, the norm of points in healthy condition in the standard Nataf space is characterized by the density function $f_{\chi_N}(d)$ and the cumulative distribution function (cdf) $F_{\chi_N}(d)$ [45]. This property makes it possible to define HIs based on the comparison between the Nataf space projection of datasets and the healthy reference condition. The latter is the same for any asset. Therefore, the NT provides a standard reference for defining a universal HI. Moreover, standard and universal thresholds can be set to quantify the degradation over time. Section IV presents a condition monitoring approach based on the standard Nataf space manifold.
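This property can be checked numerically. The sketch below draws synthetic points from $\mathcal{N}(0, I)$ as a stand-in for $S_{healthy}$ (an assumption, since no real data are used) and compares their norms with the $\chi_N$ distribution, available in SciPy as `scipy.stats.chi`. The sample size and the 99% level are illustrative choices.

```python
import numpy as np
from scipy.stats import chi

N = 2  # dimension of the standard Nataf space in this illustration
rng = np.random.default_rng(0)

# Synthetic stand-in for healthy points already mapped to N(0, I).
s_healthy = rng.standard_normal(size=(20000, N))
d = np.linalg.norm(s_healthy, axis=1)  # norms d^(i) of the healthy points

# 99th percentile of the chi distribution with N degrees of freedom.
radius_99 = chi.ppf(0.99, df=N)

# Fraction of healthy norms that fall inside that radius.
frac_inside = float(np.mean(d <= radius_99))
```

For $N = 2$ the percentile is close to 3, and about 99% of the synthetic healthy points lie inside the corresponding circle.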

III. NATAF ISOPROBABILISTIC TRANSFORM
The NT was introduced in [18] and is an important tool in reliability analysis [46]. Its goal is to transform an intricate probabilistic description into a more tractable one [47]. This section briefly describes the NT.
Let the random vector $Z = [Z_1, \ldots, Z_N]^T$ and the correlation matrix $R_Z$ be the probabilistic model describing the complex system, where each component corresponds to one specific feature or measure. The NT maps $Z$ into the target description $S = [S_1, \ldots, S_N]^T \sim \mathcal{N}(0, I)$, where $0$ is the null vector (with length $N$), and $I$ is the identity matrix (with size $N \times N$). Fig. 3 depicts the NT for a two-dimensional latent space.
The NT can be written as the composition of two transformations $T = T_1 \circ T_2$ [47]. First, $T_1 : Z \to U$ is given by (1), where $\Phi$ is the cdf of the normal distribution $\mathcal{N}(0, 1)$, and $F_i : \mathbb{R} \to [0, 1]$ is the marginal cdf of $Z_i$. The function $F_i$ is known a priori, or otherwise, can be estimated from samples with a kernel density estimator.
The random vector $U$ resulting from (1) is assumed to be a Gaussian vector with marginal distributions $\mathcal{N}(0, 1)$. The components $U_1, \ldots, U_N$ are characterized by the correlation matrix $R_U \in \mathbb{R}^{N \times N}$, which is referred to as the fictive correlation matrix. The transformation $T_2$ maps $U$ into the random vector $S$ with uncorrelated components. For this, let the matrix $L \in \mathbb{R}^{N \times N}$ be the Cholesky decomposition of the inverse of $R_U$, as in (2). The transformation $T_2 : U \to S$ is given by (3), and the space of the variable $S$ is referred to as the standard Nataf space. Estimating the fictive correlation matrix components $r_{U_{ij}} = \{R_U\}_{ij}$, $i, j \in \{1, \ldots, N\}$, is a key step in the definition of $T_2$. The Appendix presents the numerical method used to estimate $r_{U_{ij}}$ from the correlation coefficients $r_{Z_{ij}} = \{R_Z\}_{ij}$, $i, j \in \{1, \ldots, N\}$. Also, as discussed in the Appendix, the computation of $r_{U_{ij}}$ does not converge when the components of the healthy cluster distribution are highly correlated, i.e., $r_{Z_{ij}} \approx 1$. In such a case, the NT fails to map the initial distribution into a multivariate distribution with noncorrelated components [44].
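The two-step structure can be sketched in a few lines. This is a simplified, sample-based stand-in rather than the authors' implementation, and it rests on several assumptions. The marginal step uses the usual Nataf form $u_i = \Phi^{-1}(F_i(z_i))$ with an empirical cdf in place of a kernel density estimator. The fictive correlation matrix is approximated by the sample correlation of $U$ instead of the algorithm in the Appendix. The decorrelation step uses the Cholesky factor of $R_U$ followed by a linear solve, which is one common whitening convention. The class name `EmpiricalNataf` is hypothetical, and the sketch assumes $N \geq 2$.

```python
import numpy as np
from scipy.stats import norm

class EmpiricalNataf:
    """Sample-based sketch of a Nataf-type map fitted on healthy latent points."""

    def fit(self, z_healthy):
        z = np.asarray(z_healthy, dtype=float)
        self.sorted_ = np.sort(z, axis=0)   # per-dimension sorted reference samples
        self.n_ = z.shape[0]
        u = self._to_gaussian(z)
        # Assumption: R_U is approximated by the sample correlation of U.
        r_u = np.corrcoef(u, rowvar=False)
        self.chol_ = np.linalg.cholesky(r_u)  # lower-triangular factor of R_U
        return self

    def _to_gaussian(self, z):
        """Marginal step: empirical cdf followed by the standard normal quantile."""
        z = np.asarray(z, dtype=float)
        u = np.empty_like(z)
        for j in range(z.shape[1]):
            rank = np.searchsorted(self.sorted_[:, j], z[:, j], side="right")
            # Keep cdf values strictly inside (0, 1) so the quantile is finite.
            cdf = np.clip(rank, 0.5, self.n_) / (self.n_ + 1.0)
            u[:, j] = norm.ppf(cdf)
        return u

    def transform(self, z):
        """Map latent points to approximately uncorrelated standard normal points."""
        u = self._to_gaussian(z)
        return np.linalg.solve(self.chol_, u.T).T
```

One limitation of the empirical cdf is that it saturates outside the range of the healthy samples, so points far from the healthy cluster are all mapped to the same extreme quantile. A kernel density estimate or a parametric tail model, as mentioned in the text, avoids this.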

IV. PROPOSED CONDITION MONITORING APPROACH
This section introduces a new CM approach aimed at the early detection of abnormal conditions in complex systems. The proposed approach is based on HIs defined from the standard Nataf space representation. The implementation of the approach comprises two phases: online and offline. The offline phase defines the building blocks that later constitute the online phase's pipeline. Fig. 4 depicts these two phases of the implementation with their respective steps, which are described in detail in the following sections.

A. DATABASE DEFINITION AND LATENT SPACE REPRESENTATION
We assume that the complex system is described by multiple measures. Although LVMs such as the VAEs are capable of processing many features and suit feature learning [7], a feature engineering analysis is recommended to select the pertinent measures. In particular, recent literature suggests that the latent space representation is enhanced by restricting the features to non-correlated measures with high informative power [48]. The offline phase of the proposed approach requires labeled data in two steps: first, training the VAE model (or similar) and then defining the NT. In the training process, the need for labels depends on the complexity of the system and on the available measures describing the state of the complex system. The final latent space representation must be such that the healthy cluster is as disentangled as possible from other condition clusters. For simpler systems or highly informative features, the unsupervised VAE can eventually project the different conditions into disentangled clusters. However, the VAE is usually not sufficient to disentangle categorical clusters for complex real-life systems or relatively noisy measures. To ensure the disentanglement of clusters in the latent representation space for these cases, supervised (or semi-supervised) implementations of the VAE are recommended.
The latent space representation is strongly influenced by the model hyperparameters, and therefore, these hyperparameters can be used to adjust the distribution of clusters in that representation. Choosing suitable hyperparameters leads to disentangled clusters in the latent space. Additionally, to ensure the convergence of the NT, the healthy cluster coordinates must be weakly correlated, i.e., have small correlation coefficients $r_{U_{ij}}$, $i, j \in \{1, \ldots, N\}$. This requirement is further discussed in the Appendix.
In summary, to apply our approach, the database and its latent space representation must verify three conditions to ensure that the NT is successfully defined.
• The training database must be labeled, or at least, the healthy data must be identified from the degraded conditions.
• In the latent representation space, the cluster of healthy points must be disjoint with respect to any other cluster.
• In the latent representation space, the healthy cluster latent variable coordinates must be weakly correlated.
Overall, these requirements are easily met when the appropriate model is chosen for the latent space projection. As discussed later in this text, using the NT broadens the range of suitable hyperparameters. In other words, the use of the NT mapping relaxes the requirements on the model defining the latent space representation.

B. STANDARD NATAF SPACE REPRESENTATION
The NT is defined with respect to the healthy condition cluster, as depicted in Fig. 4(c). It then maps any point in the latent space into the standard Nataf space. Fig. 5 illustrates a typical standard Nataf space for a multiple condition system.
From the latent space representation, the definition of the NT comprises the following steps:
1) Characterize the healthy cluster in the latent space database. Specifically, estimate the empirical cdfs corresponding to the LVs $Z_1, \ldots, Z_N$, and the correlation matrix $R_Z$ between these LVs.
2) Estimate the fictive correlation matrix $R_U$ from $R_Z$ using the algorithm presented in the Appendix.
3) Validate the NT. In the standard Nataf space, verify that the healthy cluster approximates the multivariate normal distribution and that the norm of all healthy points approximates the $\chi_N$ distribution.
The Euclidean norm in the standard Nataf space is highly representative of the relation between data points and the reference dataset. For instance, points distributed similarly to the reference condition in the physical space are mapped by the NT within the reference cluster in the standard Nataf space. More importantly, the projection of a healthy dataset into the standard Nataf space gives a cluster distributed as the healthy reference (see Dataset A in Fig. 6). For reference, the healthy cluster radius is defined as the percentile $P_{99}$ of the $\chi_N$ distribution. In a 2D standard Nataf space, for example, this radius is $r_{HY} = P_{99}[\chi_2] \approx 3$.
Conversely, abnormal datasets projected in the standard Nataf space are inconsistent with the normal multivariate distribution of healthy conditions. Indeed, degraded datasets can manifest in the standard Nataf space in two different patterns: a sudden and rapid degradation, or a continuous and slow degradation. In the first type, illustrated by Dataset B in Fig. 6, the complex system reaches failure within a few time steps. In such a case, the norm of successive data points goes rapidly from typical values to above the radius $r_{HY}$. The second degradation type is illustrated by Dataset C in Fig. 6. This degradation pattern manifests as a shift in the distribution of points in the standard Nataf space. In this case, the norm of the projected points does not necessarily go beyond the reference radius $r_{HY}$.
Accordingly, we introduce below two complementary HIs using the Euclidean norm of points in the standard Nataf space: $I_M$ and $I_N$. The HI $I_M$ is an outlier detector that is more sensitive to rapid degradation and requires only a few monitoring samples, whereas the HI $I_N$ is an out-of-distribution detector that is more sensitive to slow degradation but needs more samples within sliding time windows.

C. HEALTH INDEX $I_M$ FOR OUTLIER DETECTION
The HI $I_M$ is the Euclidean norm of the point $s(t)$ in the standard Nataf space, as given by (4).
The base alarm criterion for $I_M$ is defined by the threshold rule $I_M(t) > DT_M$, where $DT_M$ is a detection threshold to be set. Since the distribution of the norm statistics for healthy datasets is known to be $\chi_N$, a general $DT_M$ value can be set. For example, one can set $DT_M = P_{99}[\chi_N]$, which is valid for any kind of complex system. Also, for any system whose data is projected in a 2D Nataf space ($N = 2$), $DT_M = P_{99}[\chi_2] \approx 3.0$ is a suitable value. An alternative method for defining the value of $DT_M$ consists in using historical data from healthy systems. First, $I_M$ is estimated for systems known to be in healthy condition. Then, $DT_M$ can be set as a percentile of the distribution of the estimated $I_M$. This work analyzes these two techniques to define $DT_M$.
It is recommended to associate the base detection criterion with a persistence criterion to limit the frequency of false alarms. For example, (5) establishes that an alarm is triggered when the base criterion is met four consecutive times. In practice, the alarm base and persistence criteria should be adapted to match the characteristics of the system of interest.
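A minimal sketch of the index and of the alarm logic described above is given below. It assumes the points have already been mapped to the standard Nataf space. The function names are hypothetical, and the persistence rule simply encodes "four consecutive exceedances" as stated in the text, without reproducing the exact form of (5).

```python
import numpy as np
from scipy.stats import chi

def health_index_im(s):
    """Point-wise index: Euclidean norm of each point s(t); s has shape (n_steps, N)."""
    return np.linalg.norm(np.atleast_2d(np.asarray(s, dtype=float)), axis=1)

def default_threshold(n_dim, level=0.99):
    """Detection threshold taken as a percentile of the chi distribution with n_dim dof."""
    return float(chi.ppf(level, df=n_dim))

def persistent_alarm(index, threshold, n_consecutive=4):
    """Flag time steps where the base criterion has held n_consecutive times in a row."""
    above = np.asarray(index) > threshold
    alarm = np.zeros(above.shape, dtype=bool)
    run = 0  # length of the current run of consecutive exceedances
    for k, flag in enumerate(above):
        run = run + 1 if flag else 0
        alarm[k] = run >= n_consecutive
    return alarm
```

For instance, `persistent_alarm(health_index_im(s), default_threshold(s.shape[1]))` returns a boolean array aligned with the time steps of `s`. The value of `n_consecutive` is a tuning choice that should reflect the sampling rate and noise level of the system.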
By definition, $I_M$ detects degradation modes evolving within a few time steps. Depending on the domain of the complex system under analysis, the detection anticipation might not be sufficient to allow planning and action by O&M analysts and practitioners. The pertinence of $I_M$ should therefore be assessed case by case. Conversely, the HI $I_N$ introduced in the next section is expected to detect the onset of degradation modes earlier than competing approaches.

D. HEALTH INDEX $I_N$ FOR OUT-OF-DISTRIBUTION DETECTION
Many HIs in the literature are based on out-of-distribution detection [15], [49], [50], [51]. To our knowledge, no previous work has exploited out-of-distribution detection in the standard Nataf space. The HI $I_N$ introduced in this section was designed precisely to detect shifts in the probability distribution of datasets in the standard Nataf space.
Let $\tau$ be a sliding period in which the complex system of interest operates. Let $S_\tau$ be a random variable describing the distribution of the projection of the dataset of interest in the standard Nataf space, within the sliding period $\tau$. Finally, let $F_\tau(d)$ be the corresponding empirical cdf of $D_\tau = d(S_\tau)$. The HI $I_N$ is defined as the area metric between the analytical cdf $F_{\chi_N}(d)$ and the empirical cdf $F_\tau(d)$, as given by (6) and illustrated in Fig. 7. In practice, the area metric can be estimated from samples of both distributions by using the Wasserstein metric (or Kantorovich-Rubinstein metric) [52], [53]. The definition of $I_N$ as the area metric between two cdfs was motivated by the following considerations: (i) the reference normal multivariate distribution exhibits a spherical symmetry, and therefore, no pertinent information is lost when the analysis is based on the norm of points; (ii) the reference distribution $F_{\chi_N}$ is well known; (iii) the proposed $I_N$ does not depend on the dimension $N$ of the latent space; and (iv) the area metric is an interpretable HI.
The base alarm criterion for $I_N$ is given by $I_N(\tau) > DT_N$, where $DT_N$ is a detection threshold to be defined. The value of $DT_N$ can be defined by considering the distribution of $I_N$ estimated from systems known to be in healthy condition. This distribution of $I_N$ depends on the space dimension $N$ as well as on the length of the period $\tau$.
Combining this base criterion with a persistence criterion is important to prevent false alarms. The alarm criterion retained for the HI $I_N$ is given in (7). Again, this criterion might be modified according to the characteristics of the system under analysis.
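The area between two one-dimensional cdfs equals the first-order Wasserstein distance, so $I_N$ can be sketched with `scipy.stats.wasserstein_distance`. The example below is an illustration under assumptions. The analytical reference is represented by a deterministic grid of $\chi_N$ quantiles, and the function name and the grid size `n_ref` are hypothetical choices.

```python
import numpy as np
from scipy.stats import chi, wasserstein_distance

def health_index_in(s_window, n_ref=2000):
    """Area metric between the chi_N cdf and the empirical cdf of the window norms.

    s_window has shape (n_points, N) and holds the standard Nataf points of one
    sliding period. The reference distribution is discretized on a quantile grid
    (an assumption of this sketch).
    """
    s_window = np.atleast_2d(np.asarray(s_window, dtype=float))
    n_dim = s_window.shape[1]
    d = np.linalg.norm(s_window, axis=1)       # norms of the window points
    q = (np.arange(n_ref) + 0.5) / n_ref       # mid-point probability grid in (0, 1)
    d_ref = chi.ppf(q, df=n_dim)               # reference quantiles of chi_N
    return float(wasserstein_distance(d, d_ref))
```

A window drawn from $\mathcal{N}(0, I)$ yields a value close to zero, while a window whose points are shifted away from the origin yields a clearly larger value. A threshold $DT_N$ would then be calibrated from healthy windows of the same length, as described above.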

E. ONLINE CONDITION MONITORING
The online phase is depicted in Fig. 4(d). The steps of the online phase are as follows:
1) VAE embedded with Classification (VAEC) encoding of the normalized SCADA dataset $x_\tau$ into the set of latent space points $z_\tau$.
2) NT-mapping of $z_\tau$ into the standard Nataf points $s_\tau$.
3) Estimation of the two HIs from $s_\tau$.
• The HI $I_M$ is a point-wise index. It is estimated at each time step $t_k \in \tau$.
• The HI $I_N$ takes the whole set $s_\tau$ as input. It is estimated for each $\tau$.
4) Application of alarm criteria on the estimated HIs $I_M$ and $I_N$. Combine both statuses with the "OR" logical relation, i.e., an alarm is triggered when any of the alarm criteria is met.
The online pipeline calculations are repeated periodically over periods defined by partitioning the timeline. We assume that the system data is defined over the time steps $t_0, t_1, \ldots$ A period of operation $\tau$ can be identified by using its starting and final time steps: $\tau(k, k+n) = \{t_k, t_{k+1}, \ldots, t_{k+n}\}$. The corresponding normalized dataset is then $x_{\tau(k,k+n)} = \{x(t_k), \ldots, x(t_{k+n})\}$. The partition of the timeline into datasets $\tau(k, k+n)$ must meet two opposing requirements. On the one hand, the datasets must be big enough to describe the probability distribution of clusters in the latent and the Nataf spaces. On the other hand, the promptness of the detection depends on how rapidly new data points are fed into the online pipeline. Overlapping datasets can be used to satisfy both conditions. The partition of the timeline is then defined by the length of the datasets and the lag time between the beginning of two successive datasets. A sensitivity analysis is recommended for an appropriate choice.
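The partition into overlapping periods can be sketched as follows. The helper name `sliding_windows` and its arguments are hypothetical. Here `length` is the number of time steps per period (that is, $n + 1$ in the notation above) and `lag` is the number of steps between the starts of two successive periods.

```python
def sliding_windows(n_steps, length, lag):
    """Return (start, end) index pairs, inclusive, of overlapping periods.

    Each pair (k, k + length - 1) identifies one period tau(k, k + n) with
    n = length - 1. Windows overlap whenever lag < length.
    """
    if length <= 0 or lag <= 0:
        raise ValueError("length and lag must be positive")
    return [(k, k + length - 1) for k in range(0, n_steps - length + 1, lag)]


# Example: 10 time steps, periods of 4 steps, a new period every 2 steps.
windows = sliding_windows(n_steps=10, length=4, lag=2)
```

A longer `length` gives a better description of the distribution within each period, whereas a shorter `lag` refreshes the indices more often, which reflects the trade-off described above.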

F. PERFORMANCE OF THE CONDITION MONITORING APPROACH
To evaluate the performance of a detection approach, it is customary to implement it on a selection of reference case studies for which the ground-truth detection instants are known. If multiple reference case studies are available, statistics such as accuracy, precision, recall, and F1 score can be estimated (see [54]).
However, very often, only a few case studies are available. In such cases, one can prioritize case-specific evaluations. One primary performance metric for detection approaches is the anticipation of failure, i.e., the time between the instant when the detection approach triggers an alarm and the instant of failure [55]. Since the instant of failure is rarely available, the anticipation is often estimated with a reference approach. The anticipation interval of an alarm is important for the viability of the detection approach. Indeed, the alarm anticipation must be significant enough to allow in-situ interventions, be it by shutting the system down, inspecting it, or scheduling necessary repairs.

V. CASE STUDY I: ENGINE DEGRADATION FROM C-MAPSS DATABASE
This section demonstrates the proposed detection approach on the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) FD001 dataset [56]. This dataset comprises run-to-failure time series of commercial aircraft turbofan engines in cruise condition. C-MAPSS datasets are widely used by the PHM community and have become benchmarking datasets for multiple CM and prognosis approaches, notably Remaining Useful Life (RUL) estimation [57]. In this case study, we apply the proposed HIs to detect the onset of abnormal conditions defined as a function of the ground-truth RUL values. The RUL estimation itself is beyond the scope of the present paper.

A. THE C-MAPSS FD001 DATASET
The C-MAPSS database comes from high-fidelity computational simulations of the operation of a large commercial turbofan. This case study considers the FD001 dataset, which comprises run-to-failure time series corresponding to the degradation of the turbofan in cruise condition [58]. Sensors provide measurements at the main turbofan sub-components, namely the fan, the Low-Pressure Compressor (LPC), the High-Pressure Compressor (HPC), the combustor, the High-Pressure Turbine (HPT), the Low-Pressure Turbine (LPT), and the nozzle. These components are depicted in Fig. 8 [58].
The dataset comprises three kinds of data:
• Scenario descriptors: include major flight data such as altitude, flight Mach number, and throttle-resolver angle.
• Sensors: include 14 measurements covering temperature, total pressure, mass flux, and pressure ratio at multiple sections of the turbofan.
• Cycle counting: from the 1st cycle at the beginning of the simulation until the failure cycle n failure .
The RUL function is set as a piecewise linear degradation curve. For each run-to-failure time series, it is defined from the cycle counting data. Fig. 9 depicts the typical RUL function RUL(n) = min{n failure − n, RUL 0 }, where the constant initial value is set as RUL 0 = 125.
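The piecewise linear RUL function translates directly into code. The sketch below follows the expression RUL(n) = min{n failure − n, RUL 0 } with RUL 0 = 125; the function and argument names are chosen for illustration.

```python
RUL_0 = 125  # constant initial RUL value stated in the text


def rul(n, n_failure, rul_0=RUL_0):
    """Piecewise linear RUL: constant at rul_0, then decreasing linearly to 0 at failure."""
    return min(n_failure - n, rul_0)


# An engine failing at cycle 200: the plateau lasts until cycle 75.
print([rul(n, n_failure=200) for n in (1, 75, 100, 200)])
# [125, 125, 100, 0]
```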

B. LATENT SPACE PROJECTION
The latent space representation of the C-MAPSS datasets has been analyzed in works such as [17], [60], and [61]. The latent space projection introduced by [17] is suitable for demonstrating the approach we propose in the present work. Costa and Sanchez [17] introduced a DNN model consisting of an LSTM-VAE embedded with a regression DNN. The latent space is set with dimension N = 2. The resulting latent space projection is such that the distribution of points is an indicator of the RUL value. We refer the reader to the original paper for details on the LSTM-VAE and the regression DNN, as well as architecture and training parameters [17]. However, the original model led to a cluster with highly correlated latent variable components z 1 and z 2 , thus hindering the definition of the NT. To overcome this limitation, the loss function was modified by increasing the weight of the Kullback-Leibler loss component. Fig. 10 depicts the resulting latent space projection of the training database.

C. STANDARD NATAF SPACE AND DETECTION CRITERIA
The reference healthy operational condition dataset comprises the training points for which the threshold relation RUL > RUL c is met. This dataset is used as a reference for the definition of the NT. Fig. 11 depicts the mapping of the healthy dataset into the standard Nataf space.
The distribution of the norm of all points from the reference healthy cluster in the standard Nataf space is depicted in Fig. 12. It confirms a good agreement between the empirical cdf and the χ 2 cdf F χ 2 .
To define the detection thresholds for the two HIs, I M and I N , the distributions of these indices are estimated for the healthy data points, and the detection thresholds are set as the 99th percentile P 99 of each distribution. Fig. 13 depicts the distributions for the two HIs obtained for the FD001 dataset. According to these experimental results, the alarm thresholds are respectively set as DT M = 3.1 and DT N = 1.1.
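Setting a threshold as an empirical percentile of the healthy HI values can be sketched as follows. The nearest-rank rule used here is one common convention and is an assumption, since the text does not state which percentile estimator was used; the function name is hypothetical.

```python
import math


def detection_threshold(healthy_hi, q=99):
    """Empirical q-th percentile of healthy HI values (nearest-rank rule, assumed)."""
    if not healthy_hi:
        raise ValueError("at least one healthy HI value is required")
    ordered = sorted(healthy_hi)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative data: 100 HI values 0.01, 0.02, ..., 1.00.
values = [k / 100 for k in range(1, 101)]
print(detection_threshold(values, q=99))
# 0.99
```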
In this case, the standardization of the healthy condition in the Nataf space eases the definition of DT M .For instance, since the I M is strictly equal to the norm of points, the HI I M evaluated on a healthy dataset follows the χ N

D. EARLY DETECTION OF ABNORMAL CONDITIONS
Once the LSTM-VAER model is trained and the NT defined, new datasets can be projected into the standard Nataf space. The estimation of the two HIs I M and I N for engine 2 is given in Fig. 14. The reference onset of degradation for this engine occurred at the 143rd cycle. The alarm criteria led to the I M alarm at the 140th cycle, and the I N alarm was triggered at the 130th cycle. Fig. 15 depicts the curves for I M and I N for engine 50. The reference onset of degradation for this engine occurred at the 56th cycle. The HI I M detected the abnormal condition at the 53rd cycle and I N led to an alarm at the 38th cycle.
Fig. 16 depicts the I M detection time versus ground-truth degradation onset time for all the engines of the FD001 dataset. The HI I M led to the same detection instants as the target values (ground-truth degradation initiation) for most cases, with no significant anticipation. This result is attributable to the characteristics of the degradation function. As specified, we assumed that RUL < RUL c indicated the onset of degradation, and that the ground-truth RUL function is a linear function of the cycle counting.
From Fig. 17, it can be seen that the detection by I N anticipates the reference detection instants for almost all engines. Considering all the engines of the C-MAPSS FD001 dataset and the degradation defined above, the HI I N led to the early detection of degraded conditions, anticipating the reference degradation instant by 12 cycles on average.

VI. CASE STUDY II: WIND TURBINE CONDITION MONITORING
This second case study demonstrates the implementation of the proposed approach on a real-life database from a North American wind farm. Modern WTs are by default equipped with the Supervisory Control and Data Acquisition (SCADA) system, which generates multiple data describing the WT operation. Analyzing the high-dimensional and nonlinear physical spaces resulting from the SCADA sensors is a challenging task. Latent space representations are widely used in the WT CM literature [13], [15], [62], [63].
In particular, the authors of the present paper analyzed the case of WT CM in [15]. Our previous proposal used the Mahalanobis distance to define an HI from a latent space representation. This definition allowed a good detection performance, but required extra effort in the definition of a convenient latent space. The selection of suitable hyperparameters can be tiresome and time-consuming for complex systems such as WTs. Also, the lack of comparability between independent training instances was pointed out as a limitation. We show in the case study below that the approach proposed in the present paper outperforms the previous one and achieves a standardized low-dimensional representation for WTs by using the NT. This second case study demonstrates the suitability of our approach to analyze real-life industrial cases.

A. WIND TURBINE SCADA DATABASE
This case study uses a proprietary database from a North American wind farm. This database contains data on over 100 onshore WTs and covers operations spanning two years and four months. The following are the specifications of the WTs: horizontal axis; upwind; three-blade rotors; pitch-controlled; rated power of order 2 MW; cut-in wind speed V in = 3.5 m/s; rated wind speed of V r = 13 m/s, and cut-off wind speed V out = 20 m/s.
The database comprises three kinds of information: measures from the built-in SCADA system, SCADA log files, and O&M reports. The SCADA system is composed of sensors covering measures of geometrical, kinematic, thermal, and electrical characteristics. For each measure, the available data correspond to the average value over 10-minute periods, which is particularly appropriate for performance monitoring [64]. It is worth mentioning that vibration and acoustic measurements are not available for the WTs under analysis. The log files indicate warnings and alarms generated by the SCADA system, usually based on simple threshold rules. The O&M reports are completed by the O&M practitioners based on in-field observations during inspections and repairs.
Temperature-related abnormal conditions such as the overtemperature of critical components were reported in the database and are commonly investigated in the literature using temperature measures from the SCADA system ([65], [66]). Table 1 lists the temperature-related conditions analyzed in the present paper. The color codes are defined for subsequent use in this text. The reader is referred to [55] and [63] for illustrations of some of these degradation modes.
In all, 35 SCADA measures were made available in the database. The statistical analysis of the time series corresponding to these measures revealed some variables with little or no informative power, e.g., time series with mostly non-numerical values, constant-value time series, as well as sets of highly correlated time series. This work retained the 11 measures in Table 2. The selection includes multiple temperature measures that provide key information on the characterization of the abnormal behaviors listed in Table 1. The preprocessing of the SCADA measures consists of clipping and min-max normalization of data using the lower and upper bounds indicated in Table 2. These values follow from the statistical analysis of measures from all the WTs taken together. Data points with measures falling outside the interval [LB, UB] are removed from the database. The features are normalized into the [0, 1] interval.
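The preprocessing step can be sketched as follows. The bounds used in the example are made up for illustration and are not the values of Table 2; the function name `preprocess` is likewise hypothetical.

```python
def preprocess(rows, lower, upper):
    """Remove data points outside [LB, UB], then min-max scale each feature to [0, 1].

    rows:  list of data points, each a list of measures
    lower: per-feature lower bounds LB
    upper: per-feature upper bounds UB
    """
    kept = [
        row for row in rows
        if all(lb <= x <= ub for x, lb, ub in zip(row, lower, upper))
    ]
    return [
        [(x - lb) / (ub - lb) for x, lb, ub in zip(row, lower, upper)]
        for row in kept
    ]


# Two features with hypothetical bounds [0, 25] and [0, 40].
data = [[5.0, 20.0], [30.0, 10.0], [0.0, 40.0]]
print(preprocess(data, lower=[0.0, 0.0], upper=[25.0, 40.0]))
# [[0.2, 0.5], [0.0, 1.0]]
```

The second data point is removed because its first measure exceeds the upper bound, which mirrors the removal rule stated above.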
A labeled dataset is required for the supervised learning of the retained LVM, the VAEC model. The labeling step consists of a semi-manual selection of data points based on the indications from the SCADA log files. This step was guided by industry experts and used considerations of the physical nature of each temperature-related condition. For example, to build the dataset representing the GBX (Gearbox oil overtemperature) condition, we analyzed the evolution of the key measures WS, n ROTOR , P, T GBX-BEA , T GBX-OIL , T AMB , and T NAC . The starting point to select data points corresponding to the GBX condition is the time period of gearbox overtemperature reported in the SCADA log files. Often, T GBX-BEA and/or T GBX-OIL become increasingly high even before the starting instant indicated in the SCADA log files. Thus, the selected intervals were usually larger than the intervals of abnormal conditions indicated in the SCADA log files.
Given the large dispersion of the SCADA measures, many data points are required to build statistically representative datasets. Moreover, given an operating WT, only a few degradation cases, if any, are observed. Thus, it is not possible to build datasets for all conditions using data from only one unit. To overcome this limitation, we build datasets by gathering data points from multiple WTs. These WTs are assumed to have similar degradation modes since they are of the same model, are subject to the same manufacturing standards, and operate under similar wind and environmental conditions. In practice, the CM approach can be built for a subset of units from a large wind farm. Dataset augmentation techniques can be used if the number of data points for any dataset is still insufficient.
Fig. 18 displays the datasets of interest in the normalized power curve. The ICE cluster is well separated from the other conditions and, therefore, can be detected directly from the power curve [67]. Nevertheless, the datasets corresponding to the other critical conditions are mostly superposed, and their detection is not as straightforward as for the ICE condition. The ICE condition is included in this case study for validation of the proposed approach.

B. VAEC MODEL ARCHITECTURE AND TRAINING
The proposed approach was implemented in Python (ver. 3.10). The VAEC model is built using the TensorFlow library and the Keras API [68]. The supervised VAEC model was adopted since the classical (and unsupervised) VAE is not sufficient to obtain a latent space representation with disentangled clusters [15]. In accordance with the data, the input (and output) dimension is N X = 11 and the classifier DNN has its output dimension N C = 5. The latent space dimension is set to N = 2. This choice makes it easy to display the representation spaces in plots, thus defining a built-in visualization. The retained architecture is described as follows:
• Encoder: three hidden layers and one dropout layer. The number of nodes per layer is, successively, 11 (input layer), 32, 16, 8, and 2 (output layer). The input layer is set with the ReLU activation function, and tanh is used in the remaining layers of the encoder. Moreover, a 10% dropout layer is placed after the 32-node layer to prevent overfitting.
• Decoder: 2, 8, 16, and 32 nodes per layer. The decoder output layer is set with the linear activation function. The other decoder layers are set with the tanh activation function.
• Classification DNN: the input of the classification DNN is the latent space with dimension 2. The successive hidden layers have a decreasing number of nodes: 128, 64, 32, and 16. The tanh activation function is used in the input and in the hidden layers. Finally, the classification output is a five-node layer using the Softmax activation function.
The supervised training uses the Adam algorithm with a learning rate of 0.0001, clip value of 0.3, number of epochs 1024, and batch size set to 128 [69].
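To give a sense of the model size implied by these layer widths, the sketch below counts the weights and biases of fully connected stacks. It is only a back-of-the-envelope illustration under stated assumptions: all layers are treated as plain dense layers, the decoder is assumed to end with an 11-node output (N X ), and the encoder is counted with a single 2-node output, whereas a VAE encoder usually has separate mean and log-variance heads. The constant and function names are hypothetical.

```python
# Layer widths taken from the description above (input first, output last).
ENCODER = [11, 32, 16, 8, 2]
DECODER = [2, 8, 16, 32, 11]          # final 11 = N_X is an assumption
CLASSIFIER = [2, 128, 64, 32, 16, 5]


def dense_parameters(widths):
    """Number of weights plus biases in a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


for name, widths in (("encoder", ENCODER), ("decoder", DECODER), ("classifier", CLASSIFIER)):
    print(name, dense_parameters(widths))
```

Dropout layers do not add trainable parameters, so they are left out of the count.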

C. SUITABLE MODEL HYPERPARAMETERS AND NATAF TRANSFORM
One of the advantages of the proposed approach is that it relaxes the requirements for the training hyperparameters. To demonstrate this claim, this section describes the influence of the loss function hyperparameters. The VAEC loss function is given by (8), where L RE is the reconstruction error loss, L KL is the Kullback-Leibler loss, and L CL is the classifier loss. The coefficients β kl ≥ 0 and β cl ≥ 0 adjust the weight of each loss component in the VAEC training. The loss function coefficients are hyperparameters that strongly affect the latent space distribution. For instance, Fig. 19 depicts latent spaces and the respective standard Nataf spaces, defined taking the healthy cluster HY as the reference distribution. Note in Fig. 19(a), 19(c), and 19(e) that the latent space representations are highly variable, and this is true even when the coefficients are the same. In contrast, the standard Nataf space gives a standardized representation of the WT healthy condition.
As a corollary, the range of suitable training hyperparameters is broader when using the NT, as compared to the latent representation based solely on the variational encoding of the data. It was indeed shown that different loss function coefficient choices lead to very different latent space outcomes, whereas in the Nataf space, the reference data invariably follows the multivariate normal distribution. The coefficients retained for the VAEC model training are β kl = 0.01 and β cl = 0.1. Hereafter, we use the reference standard Nataf space depicted in Fig. 19(e-f).
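The role of the two coefficients can be illustrated with a short sketch. It assumes that the three components are combined as a weighted sum with an unweighted reconstruction term, which is consistent with the description of β kl and β cl but is not confirmed here by the expression (8) itself. The Kullback-Leibler term uses the standard closed form for a diagonal Gaussian posterior against a standard normal prior. Function names are hypothetical.

```python
import math


def kl_divergence(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def vaec_loss(l_re, l_kl, l_cl, beta_kl=0.01, beta_cl=0.1):
    """Weighted sum of the loss components (assumed form of the VAEC loss)."""
    if beta_kl < 0 or beta_cl < 0:
        raise ValueError("loss coefficients must be non-negative")
    return l_re + beta_kl * l_kl + beta_cl * l_cl


# A latent vector equal to the prior contributes no KL penalty.
print(kl_divergence(mu=[0.0, 0.0], log_var=[0.0, 0.0]))
# Default coefficients are the values retained in the text.
print(vaec_loss(l_re=1.0, l_kl=2.0, l_cl=3.0))
```

Increasing `beta_kl` pulls the latent distribution toward the standard normal prior, which is the lever used in case study I to reduce the correlation between latent components.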

D. DETECTION THRESHOLDS
The point-wise estimation of d(s(t)) gives a time series with a 10-minute time step that is highly dispersed. In the WT industry, it is common to use daily averages as a means to regularize HIs [55]. The definition of I M is adapted accordingly, as given by (9).
Regarding the estimation of the out-of-distribution HI I N , a sliding period τ = 4 days is used with a lag time lag = 1 day (overlap of three days between two successive periods). The detection thresholds DT M and DT N were set from a selection of ten datasets covering the operation of WTs known to be in healthy condition. Fig. 20 depicts the distribution for both HIs I M and I N estimated from the selection of healthy datasets. As for the C-MAPSS experiments, the detection thresholds were set as the percentile P 99 of each distribution, which corresponds to DT M = 3.0 and DT N = 1.2, respectively. Fig. 21(a) and 21(b) depict the HIs I M and I N for one of the ten datasets considered in the estimation of the healthy distribution. Both HI time series remain below the respective detection thresholds. Thus, as expected, no alarm is triggered for this healthy WT.
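The daily regularization of the point-wise distance can be sketched as a block average. This is a simplified illustration: it assumes a plain arithmetic mean over complete days of 144 ten-minute values, with no missing points, and the function name is hypothetical.

```python
def daily_average(distances, samples_per_day=144):
    """Average consecutive blocks of 10-minute values into one value per day.

    A trailing incomplete day is dropped.
    """
    days = []
    for start in range(0, len(distances) - samples_per_day + 1, samples_per_day):
        block = distances[start:start + samples_per_day]
        days.append(sum(block) / len(block))
    return days


# Two full days of constant distance values.
series = [1.0] * 144 + [3.0] * 144
print(daily_average(series))
# [1.0, 3.0]
```

In practice, data points removed during preprocessing leave gaps, so a production version would group values by calendar date rather than by fixed block size.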

E. CONDITION MONITORING OF A WIND TURBINE IMPACTED BY MULTIPLE ABNORMAL CONDITIONS
This case study covers the operation of one WT within the period [t 0 = 2019 Oct 1, t f = 2020 Aug 3]. The specific WT was selected from the wind farm under analysis because it had been reported to be suffering from multiple critical abnormal conditions; in particular, blade ice accretion (ICE) and main bearing overtemperature (BEA). Therefore, this WT makes it possible to demonstrate the capacity of the proposed approach to detect multiple types of abnormal conditions. The timeline of events was reconstructed with industry experts based on information from the SCADA log files and O&M reports. The following are the main events in the timeline:
• t 0 = 2019 Oct 1: Beginning of the period of analysis.
• t * ICE = 2020 Feb 11: SCADA alarm ICE condition.
• t Down1 = 2020 Feb 24: Shutdown labeled as "Grid voltage fault", and then as "Communication Loss". Three days later, an unsuccessful start-up attempt is labeled by the SCADA system with multiple failure modes, including the BEA condition. The unit is then kept down with "Repair" status for more than 40 days.
• t f = 2020 Aug 3: End of the period of analysis.
Fig. 21(c) depicts the I M time series estimated from the SCADA data using (9). The alarm criteria (given by (5) with DT M = 3.0) lead to a first alarm at t = 2020 Feb 11. This alarm coincides with the ice accretion on the blades indicated by the SCADA system. An I M alarm is then triggered days later, at t = 2020 Feb 24, suggesting the WT was in abnormal condition just before it was shut down for repair. The SCADA system log files report multiple possible causes for the degradation. This unit remained shut down for 45 days with "Repair" status. The actions undertaken during this period were not disclosed. The WT starts operating again at t StartUp = 2020 Apr 09. After this date, the I M values go above the threshold DT M erratically. It is only after t = 2020 Jun 01 that the exceedance of DT M becomes persistent and, thus, triggers an alarm. The I M alarm at t = 2020 Jun 01 is 28 days before the SCADA alarm at t * BEA .
The estimation of the HI I N over the period of interest gives the I N (τ k ) time series depicted in Fig. 21(d). As previously mentioned, each τ k covers four consecutive days of operation, and the lag time between two successive datasets is one day (τ = 4 days, lag = 1 day). The alarm criteria (given by (7) with DT N = 1.2) led to an alarm at t = 2020 May 22, which is 31 days before t * BEA and more than 2 months before t Down2 . Regarding the ICE episode in February 2020, although the base criterion I N > DT N was met multiple times in the freezing days around t * ICE , the persistence criterion filtered out any alarm during this month. If needed, one could adjust the alarm criteria or use specific control charts to enable alarms for rapidly evolving degradation modes. Nevertheless, in this work, the I N alarm criteria were set to prioritize the detection of degradation evolving over the long term, i.e., within multiple days, in contrast to the HI I M alarm, which focuses on degradation modes evolving over the short term. The asset condition status results from the combination of the two HIs with the OR logical relation. Fig. 22 indicates the resulting alarm time series raised by the proposed approach for the WT under analysis.
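The interplay between a base criterion, a persistence criterion, and the OR combination can be sketched as below. The exact criteria (5) and (7) are not reproduced in this section, so the persistence rule used here (the HI must exceed its threshold for a given number of consecutive evaluations) is a hypothetical stand-in, as are the function names and the persistence count. The two alarm series are also assumed to be aligned on a common daily grid.

```python
def persistent_alarm(hi_values, threshold, persistence):
    """Flag each evaluation where the HI has exceeded the threshold
    for at least `persistence` consecutive evaluations (assumed rule)."""
    alarms = []
    run = 0
    for value in hi_values:
        run = run + 1 if value > threshold else 0
        alarms.append(run >= persistence)
    return alarms


def combine_or(alarms_m, alarms_n):
    """Asset status: an alarm is raised when either HI raises one."""
    return [a or b for a, b in zip(alarms_m, alarms_n)]


# Isolated exceedances are filtered out; a sustained one triggers the alarm.
i_n = [1.0, 1.3, 1.4, 1.0, 1.5, 1.6, 1.7]
print(persistent_alarm(i_n, threshold=1.2, persistence=3))
# [False, False, False, False, False, False, True]
```

With such a rule, short bursts above the threshold, like those observed around the ICE episode, do not raise an I N alarm.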
To illustrate the rationale of the HI I N in a real-life case, the estimation of this HI is visually represented in Fig. 23. The steps for the estimation of I N are depicted for the periods highlighted in Fig. 23(a), where the periods τ k , k ∈ {A, B, C, D}, are:
• τ A = τ (2019 Oct 31, 2019 Nov 04).
• τ D = τ (2020 Jun 07, 2020 Jun 11).
Each row of Fig. 23 shows the evolution of the abnormal condition over the four selected datasets. In the latent space representation in Fig. 23(b), it is remarkable that the dataset distribution evolves from the healthy cluster toward the main bearing overtemperature cluster (BEA). This progression is also visually evidenced in the standard Nataf space in Fig. 23(c), and even more so in the area metric between the empirical cdf and the χ 2 cdf in Fig. 23(d).
Finally, Fig. 24(a) and 24(b) depict the time evolution of the WT projection into the latent and Nataf spaces, respectively. Each point in these plots is the centroid of the projection of a four-day dataset. The timeline is indicated with the color map, which gives the visualization of the WT condition as a trajectory in the representation spaces. The blue cluster corresponds to the healthy HY condition and the red cluster to the BEA condition. Note that the transition from the HY cluster toward the BEA cluster is evidenced in both plots.

F. COMPARATIVE ANALYSIS OF THE PERFORMANCE FOR DETECTING MAIN BEARING OVERTEMPERATURE
This section focuses on the detection of the BEA condition that raised the SCADA alarm at t * BEA and led to the WT shutdown at t Down2 . To evaluate the performance, Table 3 summarizes competing detection approaches and gives the respective detection dates for the dataset analyzed above. The approaches from [55] and [70] detect main bearing degradation exclusively. In [71], a classical unsupervised VAE model is used. From Table 3, the anticipation obtained with the proposed approach exceeds that obtained with previous works [15], [55], [70]. The gain in performance by I M is mostly due to the use of the labeled database to train the VAEC model. This supervised training yielded a disentangled LVM latent space that was then mapped into the standard Nataf space. The performance of the proposed approach is further discussed in the next section.

VII. SUMMARY AND DISCUSSION
This section summarizes and discusses the results from case studies I and II.
The success of the proposed approach in obtaining a standardized LVM representation for complex systems was evidenced in the two case studies analyzed in this paper, which used latent space representations based on supervised LVMs to achieve disentangled latent representations. The LSTM-VAE-Regression model was chosen to project the C-MAPSS FD001 dataset into a 2D latent space. Turbofan engines in this case study were considered as either healthy or degraded. For the second case study, WT SCADA data were projected into a 2D latent space using the VAEC model. Its supervised training used a labeled database comprising five conditions: the healthy condition plus four abnormal conditions.
It is worth noting that the VAE model alone might be sufficient to obtain a disentangled LV representation for simpler systems or systems with highly informative features. As long as the healthy condition is identified in the latent space and the LVs are not highly correlated, the NT can be defined. In most real-life complex systems, however, supervised implementations of LVMs might be necessary to obtain a suitable LV representation, i.e., with disentangled clusters. The complexity of the systems analyzed in the case studies and the nature of the available measures guided the choice of supervised LVMs. Fortunately, the labels are required only in the offline phase. Once the LVM and the NT are set, the online phase relies only on the updated features to give the condition output for the complex system being monitored.
The analysis in this paper used 2D representation spaces (N = 2). This setting facilitates plotting and visualization. It was shown that the resulting standard Nataf spaces exhibited the expected properties in both case studies. Indeed, in both cases, the healthy cluster follows the 2D normal distribution and the norm of its points follows the χ 2 distribution. The HIs I M and I N were defined from the norm in the 2D standard Nataf space. One could consider setting higher latent space dimensions if needed. For higher N values, the properties of the standard Nataf space and the definition of the two HIs are essentially the same, except for the number of degrees of freedom N of the χ N distribution.
As expected, the HIs I M and I N gave complementary outcomes. The HI I M made it possible to detect outliers of the normal distribution. It proved to be particularly suitable for detecting abnormal conditions evolving within a few time steps, such as the ICE condition in case study 2. In practice, the margin of action for rapidly evolving degradation modes is limited, and the detection of such events is of limited interest. Nevertheless, the HI I M is retained in the proposed approach to ensure the robustness of the combined HI I M OR I N .
The two case studies revealed that the HI I N is suitable for degradation modes evolving in the long term, and was able to detect the onset of abnormal conditions long before these conditions became critical. In the C-MAPSS FD001 engines database, I N anticipated the reference detection by 12 cycles, on average, whereas I M gave results similar to the ground-truth times of degradation initiation. In the WT case study, I N anticipated the SCADA alarm for the BEA condition by 31 days, outperforming the I M detection and multiple competing approaches. The earlier detection of abnormal conditions by I N is a promising result given that longer anticipations give more time for O&M planning and interventions.
The choice of detection criteria might be adapted according to the application of interest and the accuracy and false alarm rate requirements. Moreover, a sensitivity analysis is recommended for optimal adjustments of settings such as τ and lag . The persistence criteria chosen for I N in this paper were such that the HI focused on degradation modes evolving in the long term. That explains why no I N alarm was raised for the rapidly evolving ICE condition in February 2020. The association of the two HIs made it possible to keep the base and persistence criteria relatively simple. Regarding the detection thresholds, the results from the two case studies suggest that the definition of the threshold DT M from the theoretical distribution χ N is pertinent. Indeed, the theoretical value DT M = P 99 [χ 2 ] = 3.03 is very close to the values obtained from the empirical probability distribution for healthy engines (DT M = 3.1) and for healthy WTs (DT M = 3.0). Analogously, similar DT N values were estimated for the two case studies: DT N = 1.1 for the healthy engines and DT N = 1.2 for the healthy WTs.
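The theoretical value quoted above can be checked with a closed form. If S follows the 2D standard normal distribution, its norm follows the chi distribution with two degrees of freedom, whose cdf is F(d) = 1 − exp(−d²/2), so the quantile at probability p is sqrt(−2 ln(1 − p)). The function name below is chosen for illustration.

```python
import math


def chi2dof_quantile(p):
    """Quantile of the chi distribution with 2 degrees of freedom
    (norm of a 2D standard normal vector)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return math.sqrt(-2.0 * math.log(1.0 - p))


# 99th percentile, to be compared with the empirical thresholds 3.1 and 3.0.
print(round(chi2dof_quantile(0.99), 4))
# 3.0349
```

For N > 2 no such elementary closed form exists in general, and a statistical library would be needed to evaluate the χ N quantile.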
The enhanced performance of the detection based on the standard Nataf space can be associated with two main aspects of the proposed approach. First, supervised learning was used in both case studies as a strategy to obtain disentangled clusters in the LVM latent spaces. In this regard, the proposed approach requires more information than the competing approaches analyzed in section VI-F. Second, the NT can be interpreted as a regularized zoom-in on the reference healthy condition. As a result, shifts in the original physical space are amplified in the standard Nataf space, leading to stronger indications of the onset of degradation modes.
Plots of the 2D LVM latent space and of the standard Nataf space supported the arguments and analysis presented in this paper. It was shown that, for the C-MAPSS FD001 data, the position in the latent space is related to the level of degradation of the engine. For the VAEC latent space trained to reproduce the WT SCADA data, the position in the latent space suggested the condition to which a data point belongs. The case studies showed that the standard Nataf space preserves the capacity to visually represent the evolution of the condition of complex systems. Finally, in Fig. 24(a) and 24(b), it was shown that the evolution of the system condition can be represented as a trajectory in the LV and Nataf spaces. Where the LVM latent space identifies multiple conditions, this trajectory can potentially provide diagnosis information since abnormal points move toward the corresponding abnormal condition clusters. Exploring this aspect, as well as a quantitative assessment of the improvements in interpretability, represents potential avenues for future research.

VIII. CONCLUSION
This work addressed the LVM-based representation of complex systems. The main contribution of this paper is the introduction of a standardized latent space representation for complex systems by using the Nataf isoprobabilistic transform. It was shown that, contrary to the highly variable LVM representation, the standard Nataf space preserves the probability distribution of the healthy condition, namely, the multivariate normal distribution. The proposed CM framework can be applied to a variety of complex systems. This versatility was demonstrated in two case studies covering distinct technological domains.
The standardization of the LVM representation makes it possible to compare representation spaces corresponding to multiple assets. Most importantly, two HIs, I M and I N , were defined from the characterization of the standard Nataf space. The proposed approach combines the two HIs to achieve robust detection. I N detects the onset of abnormal conditions significantly earlier than competing approaches. Complementarily, I M is a point-wise HI that captures degradation modes evolving within a few time steps. For example, in case study 2, I M detected ice accretion on blades, which was not detected by I N . The two case studies demonstrated the performance and potential of the proposed detection approach.
As a secondary contribution, this paper introduced two examples of applications for the emergent field of interpretability of DNN models. For instance, the two case studies illustrate how the LVM latent spaces and the standard Nataf space can be used to visually represent the evolution of complex systems. These visualizations complement the condition status information and can, therefore, enhance the interpretability of the outcomes and the trust placed in them by O&M analysts and practitioners.

APPENDIX. NATAF TRANSFORM
This Appendix presents the numerical method used to estimate the fictive correlation coefficients r Uij = {R U } ij of the NT.

A. ITERATIVE COMPUTATION OF THE NATAF CORRELATION COEFFICIENTS
Let r Zij = {R Z } ij be the correlation coefficient between the physical space variables Z i and Z j , i, j ∈ {1, . . ., N }. The relation between the correlation coefficients r Uij and r Zij for given i, j ∈ {1, . . ., N } is given by (10). In this integral equation, µ Zi and σ Zi are the average and standard deviation of Z i , respectively, and ϕ 2 is the probability density for a bi-variate Gaussian distribution with correlation r Uij , as given by (11) [44].
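For reference, the relation and the bivariate density described above are commonly written in the following standard form in the Nataf transform literature. The notation is chosen to match the symbols of this Appendix, with F_i the marginal cdf of Z_i and Φ the standard normal cdf; it is a sketch and may differ in presentation from the expressions (10) and (11).

```latex
r_{Z_{ij}} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}
  \left(\frac{F_i^{-1}\!\left(\Phi(u_i)\right) - \mu_{Z_i}}{\sigma_{Z_i}}\right)
  \left(\frac{F_j^{-1}\!\left(\Phi(u_j)\right) - \mu_{Z_j}}{\sigma_{Z_j}}\right)
  \varphi_2\!\left(u_i, u_j, r_{U_{ij}}\right)\, \mathrm{d}u_i\, \mathrm{d}u_j

\varphi_2\!\left(u_i, u_j, r_{U_{ij}}\right) =
  \frac{1}{2\pi\sqrt{1 - r_{U_{ij}}^{2}}}
  \exp\!\left(-\frac{u_i^{2} - 2\, r_{U_{ij}}\, u_i u_j + u_j^{2}}{2\left(1 - r_{U_{ij}}^{2}\right)}\right)
```

The factor 1 − r² in the denominator makes explicit why the density is undefined when |r Uij | = 1.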
The m-point Gauss-Hermite quadrature is employed for the approximation of (10), which is given by (12). The integral equation can then be solved iteratively by using Algorithm 1. It takes r Zij as input to estimate r Uij . The settings are the error tolerance ϵ tol > 0, the maximum number of iterations N max ≫ 1, and the Gauss-Hermite quadrature order m with the respective roots and weights. For example, Table 4 gives the Gauss-Hermite quadrature roots and weights for the order m = 7 [44].
The joint probability density function (pdf) ϕ 2 (u i , u j , r Uij ) is not defined when |r Uij | = 1, i.e., when Z i and Z j are perfectly correlated variables (see (11)). More generally, the integral equation given by (10) is not guaranteed to have a solution when the correlation coefficient is close to one [47]. In practice, Algorithm 1 does not converge in such cases, which means that there is no uncorrelated set of variables that is equivalent to the original set in terms of probability description.
Setting the representation space dimension as N = 2 makes it easy to plot these spaces, therefore defining a built-in visualization tool. Moreover, the 2D NT requires the estimation of only one correlation coefficient r U 12 . In this paper, the NT is defined with respect to the healthy cluster projected into a 2D LVM space. Let Z HY 1 and Z HY 2 be the two coordinates associated with the latent space distribution of the healthy cluster. From Z HY 1 and Z HY 2 , one can estimate the empirical cdfs F 1 (z 1 ) and F 2 (z 2 ), and the Pearson correlation coefficient r HY Z 12 . Then, r HY U 12 is estimated using Algorithm 1. For notation simplicity, let ρ = r HY U 12 . Then, provided that ρ ≠ ±1, the correlation matrix [1, ρ; ρ, 1] is invertible. The NT is then completely defined by (13) and (14), where Φ is the cdf of the normal distribution N (0, 1).
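A simplified 2D version of the transform can be sketched with the Python standard library. Two deliberate simplifications are made and should be read as assumptions: the fictive correlation ρ is estimated directly as the Pearson correlation of the normal scores instead of being obtained from Algorithm 1, and the decorrelation uses the Cholesky factor of [1, ρ; ρ, 1], which is one common choice and not necessarily the one behind (13) and (14). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def normal_scores(values):
    """u = Phi^{-1}(F(z)) with F the empirical cdf, using rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    std_normal = NormalDist()
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def nataf_2d(z1, z2):
    """Map a 2D healthy cluster to an approximately standard normal space."""
    u1, u2 = normal_scores(z1), normal_scores(z2)
    rho = pearson(u1, u2)  # shortcut replacing the iterative Algorithm 1
    if abs(rho) >= 1.0:
        raise ValueError("perfectly correlated components: transform undefined")
    scale = math.sqrt(1.0 - rho * rho)
    s1 = u1
    s2 = [(b - rho * a) / scale for a, b in zip(u1, u2)]  # inverse Cholesky
    return s1, s2, rho


# Synthetic, non-Gaussian and correlated "latent" coordinates.
rng = random.Random(0)
g1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
g2 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
z1 = [math.exp(a) for a in g1]
z2 = [(0.6 * a + 0.8 * b) ** 3 for a, b in zip(g1, g2)]
s1, s2, rho = nataf_2d(z1, z2)
print("estimated rho:", round(rho, 3))
print("correlation after transform:", round(pearson(s1, s2), 6))
```

Projecting a new point would additionally require evaluating the stored empirical cdfs of the healthy cluster at that point, which is omitted here for brevity.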

FIGURE 1. Latent variable models: (a) The reconstruction process as a suitable HI; (b) Training datasets include healthy and faulty samples; (c) The proposed Nataf inference process is defined with respect to healthy samples.

FIGURE 2. Illustration of a 2D LVM-latent space distribution for a multiple condition database.⊗: Cluster centroid.

FIGURE 3. The NT maps the latent space distribution $Z = [Z_1, Z_2]^T$ in (a) into $S = [S_1, S_2]^T \sim \mathcal{N}(0, I)$ in (b). The cdf $F(d)$ of the distance metric $d$ is plotted (continuous orange line) against the $\chi^2$ cdf (dashed blue line) for each space.

FIGURE 4. Framework of the proposed condition monitoring approach for a system operating under multiple conditions. The offline phase comprises three steps: (a) definition and labeling of the database, (b) training of the VAE embedded with Classification (VAEC) model, and (c) definition of the NT. The online phase (d) estimates the condition status from the normalized dataset $x_\tau = \{x(t), t \in \tau\}$.

FIGURE 5. Illustration of a 2D standard Nataf space projecting a multiple condition dataset. $s = [s_1, s_2]^T$ is a point and d(s) = ||s|| 2. The random vector $S^{HY}$ describes the distribution of the reference healthy cluster. If the computation of the fictive correlation matrix does not converge, one should re-train the VAE-like model to obtain non-correlated latent space variables.

FIGURE 6. Projection of datasets in the standard Nataf space. Dataset A represents a healthy dataset, whereas B and C correspond to two patterns of degradation.

FIGURE 7. Estimation of $I_N$ from the pdf of $d(s)$ for the datasets (a) A and (b) C from Fig. 6.

FIGURE 10. Latent space representation of the C-MAPSS FD001 dataset. This 2D latent space corresponds to the LSTM-VAE-Regression model.

FIGURE 11. Standard Nataf space projection of the healthy condition data ($RUL > RUL_c$).

FIGURE 12. Distribution of the distance metric for the healthy condition cluster in the standard Nataf space.

FIGURE 13. Distribution of the HIs calculated on the healthy dataset: (a) $I_M$ and (b) $I_N$.

FIGURE 14. HI $I_M$, $I_N$ and the corresponding alarm status for turbofan 2 of the C-MAPSS FD001 dataset.

FIGURE 15. HI $I_M$, $I_N$ and the corresponding alarm status for turbofan 50 of the C-MAPSS FD001 dataset.

FIGURE 16. Comparison between the reference degradation initiation instant and the HI $I_M$ instants of detection.

FIGURE 17. Comparison between the reference degradation initiation instant and the HI $I_N$ instants of detection.

FIGURE 18. Distribution of the datasets of interest in two plots: (a) normalized power curve, (b) normalized gearbox temperature versus normalized rotor rotation speed.
depicts three training instances of the VAEC latent

FIGURE 20. Distribution of the HIs calculated on a selection of ten WTs in healthy condition: (a) $I_M$ and (b) $I_N$.

FIGURE 21. For a healthy WT: estimation of (a) $I_M$ and (b) $I_N$. For a WT with reported main bearing overtemperature: (c) $I_M$ and (d) $I_N$, with the respective alarms.

FIGURE 22. Timeline indicating dates of events and alarms in the case study. Gray fill indicates WT shut down. The proposed approach detected abnormal conditions starting at $t$ = 2020 Feb 11 (coincides with $t^*_{ICE}$), $t$ = 2020 Feb 24 (which equals $t_{Down1}$), and $t$ = 2020 May 22 (which is 31 days before $t^*_{BEA}$).

FIGURE 23. Estimation of the HI $I_N(\tau_k)$ for 4-day periods $\tau_k$, $k \in \{A, B, C, D\}$, indicated (a) in the $T_{BEA}$ time series; (b) Encoding of the datasets of interest $\tau_k$ (in magenta) on the latent space; (c) NT of the datasets of interest $\tau_k$ on the standard Nataf space; (d) Estimation of the HI $I_N$ from the empirical cdf.

FIGURE 24. Projection of the case study main bearing overtemperature as a trajectory in (a) the latent space and (b) the standard Nataf space. (c) Timeline color-map.

TABLE 1. Temperature-related WT conditions with the respective color codes.

TABLE 2. SCADA measures with the respective lower (LB) and upper (UB) bounds.