Data-Driven Virtual Sensing for Probabilistic Condition Monitoring of Solenoid Valves

There is an emerging industrial demand for predictive maintenance algorithms that exhibit high levels of predictive accuracy. Such condition monitoring tools must estimate dynamic quantities, such as Remaining Useful Lifetime (RUL) and the State of Health (SOH), based on a, typically, restricted set of measurements that can be obtained in an operational setting. These quantities exhibit inherent stochasticity and can only be approximately determined a posteriori to system failure. This paper proposes a generic prognostic tool for probabilistic condition monitoring of mechatronic systems, with the aim to improve the probabilistic prediction of condition metrics, specifically RUL and SOH. Therefore we propose to identify a Hidden Markov Model (HMM) from a fully instrumented measurement set, that is only available for a restricted set of run-to-failure experiments, typically gathered in an R&D setting. Although being artificial and retrospectively constructed metrics, we interpret RUL and SOH as physical measurements with the purpose to identify accurate degradation dynamics. Once the degradation model is identified, we practice the mathematical flexibility of the HMM framework to estimate several of the no longer available dynamic quantities of interest in real-time, from the limited set of measurements that are available in an operational setting. This modelling paradigm is known as virtual sensing. Predictive performance and computational efficiency are further improved by domain knowledge based pre-processing of the measurements. We apply our methodology to solenoid valves (SV), a widely used and often critical component in many industrial systems, which display a large variation in useful lifetime. Benchmark results show that the predictive capabilities of the presented methodology compares with prognostic techniques that are more computationally and memory demanding. Note to Practitioners—The motivation for this research is twofold. First there is a pending industrial need for improved diagnostic and prognostic tools. Second there is the observation that lifetime tests usually take place in an R&D setting and that expert labelling of Remaining Useful Lifetime (RUL) or State of Health (SOH) of a component or system is often based on measurement data that is not available in the industrial setting where the prognostic tools are to be deployed in the end. These two observations suggest that there is large potential in methods that can correlate the expert labelling, in particular RUL & SOH signals, with measurement data that is available in the industrial setting. Our approach has been tested in detail on the case of Solenoid Valves, which are widely used in industry and that are often safety critical. Our experiments demonstrate that the method compares with brute force approaches that overpower ours both in terms of computational as well as memory requirements. The method is furthermore generic and there is no reason to assume it would not work for other applications.

expert labelling of Remaining Useful Lifetime (RUL) or State of Health (SOH) of a component or system is often based on measurement data that is not available in the industrial setting where the prognostic tools are to be deployed in the end.These two observations suggest that there is large potential in methods that can correlate the expert labelling, in particular RUL & SOH signals, with measurement data that is available in the industrial setting.Our approach has been tested in detail on the case of Solenoid Valves, which are widely used in industry and that are often safety critical.Our experiments demonstrate that the method compares with brute force approaches that overpower ours both in terms of computational as well as memory requirements.The method is furthermore generic and there is no reason to assume it would not work for other applications.

Notations
• t Given sequence until time t.

I. INTRODUCTION
T HE operational efficiency of industrial processes can drastically benefit from the integration and optimization of maintenance strategies in the logistic decision making process.The practice of maintenance strategies allows for maximized operation time of equipment, avoidance of unexpected machine failure and reduction of overall operational costs [1].In today's industry, it is already common practice to rely on domain knowledge based approaches, such as Fault Tree Analysis (FTA) [2] and Failure Modes and Effect Analysis (FMEA) [3], for reliability engineering.Though, such rule-based methods require a priori knowledge about the degradation dynamics, they cannot cope with temporal dependencies and effects, and rely heavily on the actual expertise of practitioners [4].
Meanwhile, industry is undergoing the Fourth Industrial Revolution, which is characterised by the increased integration of digital systems in physical production environments.This evolution, in combination with the steady price drop of sensory equipment, is accompanied by increased instrumentation and data logging.Subsequently, measurement data has become abundantly available for (real-time) statistical analysis and (post-)processing [5].In turn this has increased the expectations and requirements that industrial practitioners have with regard to monitoring and estimation procedures in the context of maintenance, better known as Predictive Maintenance [6].
A central objective of predictive maintenance algorithms is to predict the Remaining Useful Lifetime (RUL) of a machine (component) [7] or to estimate its State of Health (SOH).The RUL of a machine is the expected usage time remaining before the machine requires repair or replacement.The SOH on the other hand is an artificial metric related to the condition of the system compared to ideal conditions, often the conditions at the time of manufacturing or when it was first commissioned.Both physics-based and data-driven approaches can be pursued to estimate RUL and SOH. 1lassical physics-based approaches rely on a comprehensive understanding of the system's behavior and potential failure mechanisms to predict the time evolution of critical states, which can be used to indirectly deduce the RUL [8].However, such physics-based approaches rely on intimate knowledge of the process as well as how systems can degrade or fail.There is furthermore a limit on the extent that long horizon forecasts can be trusted [19].So instead of relying on exact physical models and probabilistic simulations, rather one emulates the degradation dynamics directly using a discrete HMMs (dHMMs) where the latent state-space occupies a discrete set.For the purpose of RUL and SOH estimation, one imposes a so called left-to-right structure on the dHMM where the latent states represent degrading states of the system with the left most state(s) representing healthier and the right most state(s) faultier states.The RUL is estimated by calculating the remaining time of absorption conditioned onto the observation sequence up to the present time instant t.Whilst the computational procedure resembles that of a physics based approach, no true physical meaning can be attributed to the latent space.Useful references in this line of work are [20], [21], [22], [23], [24], [25], [26], [27], and [28].In conclusion we may note that lifetime tests are required to fit the imposed model structure making the distinction with data driven approaches more ambiguous.
Data-driven approaches, on the other hand, typically rely on a set of lifetime tests and associated condition monitoring data to estimate the RUL directly.These rely exclusively on a (limited) set of (accelerated or historical) lifetime tests and associated condition monitoring data.This library is then used to detect and compare trends in the real-time measurements, and, process the time series data into a direct estimate of the RUL.Similarity and survival models are two families of methods that rely on historical lifetime tests to predict statistical estimates with associated uncertainty [9], [10].These methods however do not involve an offline pre-processing step and generate an online RUL estimate based on a similarity assessment of the present time signals with recorded signals.More recently, the use of Deep Learning (DL) techniques has also been proposed [29].It is argued that DL approaches are well positioned to automate the pre-processing and extraction of useful features to a high degree [11].The information from the library is captured in a single purpose model, mapping the input into a deterministic RUL estimate.Extensions have been made that capture uncertainties, mostly by relying on Bayesian Neural Networks (BNNs), and thus are able to express their confidence in the estimate [12].Convolutional (BCNNs) architectures can be used to treat high-dimensional time series data, capturing both spatial and temporal features, representing the sensor signals as images.These approaches however yield no insight in the complex mechanisms underlying the process; and, once learned, the model cannot be repurposed.Important references are [12], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], and [41].
To overcome some of the deficiencies of strictly DL approaches whilst exploiting the principles and well understood theory of physics-based modelling, in this work we propose a data-driven virtual sensing 2 approach tailored to RUL and SOH estimation.Our data-driven virtual sensing approach assumes access to a highly instrumented time series data set from dedicated run-to-failure experiments, which may only be obtained in an R&D setting.In agreement with the virtual sensing paradigm, we aim to estimate several of the measurements that are impractical or impossible in an industrial setting, from a limited set of measurements that can be obtained.We therefore rely on Hidden Markov Model (HMM) theory as an explanatory causal framework, interpreting the data as (artificial) measurements from a dynamic latent state.A HMM is identified using a regularized implementation of the Expectation-Maximization algorithm.The learned HMM is then deployed in a model-based state estimation framework.By reinterpretation of the RUL and SOH as physical measurements -which are only available once the system has failed-this strategy can be repurposed to produce probabilistic estimates of both these quantities of interest.Moreover, we obtain a figure of the uncertainty as a silent feature originating from the HMM framework.To the best of our knowledge this is the first publication to address predictive maintenance by repurposing the virtual sensing paradigm.We consider the SOH and RUL as metrics that are a function of latent variables.This enables closer correspondence with the physical reality of the machine or component degradation mechanism that demands monitoring.This to ultimately reach for higher levels of predictive accuracies.
We validate our approach to predict the RUL and SOH with uncertainty quantification of Solenoid Valves (SVs).SVs are 2 Sometimes also referred to as "soft sensing".commonly used components in many industrial processes and often perform safety critical tasks.Accelerated lifetime tests show that there is a large variation in their lifetime.We aim to base the RUL prediction solely on the measurement of the AC current of the SVs, as to come up with a noninvasive prognostics tool whilst only relying on industrially available sensory equipment.Given the arbitrary linear character of the RUL and SOH over time, in this preliminary study we used a linear state-space model to model the deterioration dynamics of the SVs and make prediction about their lifetime and health.Furthermore, expert domain knowledge is leveraged to pre-process the time-series data in the form of useful feature extraction.Our goal with this case study is to show that by leveraging a limited amount of domain knowledge to replace the automatic pre-processing and feature extraction a much simpler linear time-invariant model is able to perform at similar level or above the BCNN approach from [12].
The contributions of this paper are: (1) repurposing Hidden Gaussian Markov Models (HGMM) in the context of condition monitoring by interpreting posterior RUL estimates of a historical dataset as measurement variables and treating the problem as a probabilistic system identification problem, (2) demonstrating the advantage of using multiple measurement variables during training which need not be reconstructed during real-time monitoring, apart from those that are of interest at that time, (3) validating and analyzing our proposed method in terms of predictive capabilities about the remaining useful lifetime of solenoid valves.

II. METHODOLOGY A. Mathematical Problem Formulation
First let us specify the distinction between the two types of measurements mentioned in the introduction.Measurements in general will be denoted with the symbol y ∈ R n y or z ∈ R n z , when a time label is available, a subscript is added, for example z t .For brevity we introduce the format y t = {y 1 , . . ., y t } for leading subsequences of temporal data.We silently assume that all sequences start at discrete time t = 1.Finally, we emphasize that the concept of a measurement is used in a broad sense, meaning that measurements can represent unprocessed sensory outputs but can also represent processed information or so called feature variables.
The first type of measurements, y, are assumed to be available always, meaning that whatever the circumstances and time of the experiment are, a value for the measurement is available.For the second type of measurements, z, this assumption is no longer valid, meaning that the corresponding measurement values are only available for a restricted set of experiments, for example the type of experiments that can be (and were) conducted in a research and development (R&D) facility but that cannot be replicated in an industrial environment for one reason or the other.The combination of these two measurements determine the complete measurement, {y, z}.It is further assumed that the complete measurement is available in a restricted set of experiments so that correlations can be drawn between y and z.
As stated before, for many applications of practical interest it would be considered highly purposeful if we could somehow estimate the measurements that are inaccessible in an industrial experimental setting; equivalently, that are only available in the R&D setting, based on the measurements values that are available in an industrial setting.More specifically, if we were able to construct a function that maps, y, into, z.
Provided the stochastic nature of measurements and system behaviour per se (aleatoric) and the uncertain reasoning inherent to modelling from data (epistemic), we adopt a probabilistic framework, and replace the deterministic function described above by a probabilistic model, providing a description of the probability density of z rather than a point estimate; all in all, yielding a more informative description of z.Formally such an estimate could be expressed as the probability of measuring, z, conditioned on the known measurement value, y, that would be p(z|y).Now, when a measurement sequence is available, meaning we have multiple measurements of the first category, e.g.y t , taken at different time instants in the (near) past preceding our current estimate about z t we can extend our probabilistic model accordingly.Of course, we could store all of these measurements, notation-wise, in a single measurement, y; though making the dynamic nature explicit will turn out to be very useful for computational purposes.In summary, the goal of this work is to compute the following probability from data.

B. Definition of RUL and SOH Signals
In this work we focus on the case where z represents some abstract monitoring metrics, in particular Remaining Useful Lifetime (RUL) and State of Health (SOH) signals.We aim to estimate RUL or SOH from measurements, y, that are easily accessed on the system when deployed in practice.Such are highly application specific and are not specified further here.
The RUL is a subjective estimate of the remaining time that an item, component, or system is able to function in accordance with its intended purpose before warranting replacement.In the ideal case the RUL can be associated to some physical quantity that exceeds some limit.One could then estimate a value for that quantity and calculate the probability that it exceeds the limit or even calculate the probability of that estimate exceeding the prescribed numerical limit within a certain period of time, the latter being the closest to the objective definition of RUL.
Here x t represents a physical variable describing the full state of the system, f denotes a function of that state that if exceeding a value ϵ f implies the system has failed and finally P( f (x t+T ) < ϵ f |y t ) denotes the probability of the predicted value f (x t+T ) exceeding ϵ f conditioned on the measurements y t and on its turn must exceed a threshold value ϵ P .
Unfortunately such intimate knowledge of the system is rare.In practice such an explicit physical definition of the end of life is usually absent.Therefore we may exploit the benefit of hindsight.For experiments that we know to have failed, we can calculate an RUL signal retrospectively, defining its value simply as the remaining time between its present evaluation and the moment that the system has failed. 3

RUL(
where the time instant T f refers to the first time failure is observed.Because this implies large variability in the initial RUL value, RUL(0), instead we use here the normalized RUL (NRUL).Consequently, any prognostics model should only predict the slope of degradation and no longer the initial bias.
From the NRUL we can still compute the RUL The SOH is a subjective estimate of the health index or condition of the system compared to the ideal situation.As opposed to the RUL it is difficult to determine a physical definition for the SOH other than the observation that the system finally broke down.To highlight correspondence with the (N)RUL signals we adopt a similar approach, yet recognize a healthy section in the system's lifetime before degradation and ultimately failure kick in.In retrospect, the NRUL predictor in (4) thus assumes the system instantly starts degrading from the onset of first commissioning.
Here T d refers to the time instant where deterioration kicks in.Now assume that we have access to a limited set of lifetime tests consisting of a set of measurements y(t) that can also be accessed in practice and a set of auxiliary measurements z(t).Using the approach described above we can then extend the auxiliary measurement set with signals RUL(t) and/or SOH(t).Corresponding to the formulation in section II-A we are interested in identifying a probabilistic model that can estimate the RUL/SOH signal from on-site measurements, y t .This approach circumvents the need for a physical cause of failure or a definition of the health of the system elegantly.

C. Virtual Sensing Using HMMs as an Explanatory Framework
The question that remains unanswered is how to compute this relation efficiently?Probabilistically we are trying to model the correlation between two (a primary and a secondary) measurement sequences, y and z respectively.The computational complexity of such a model is heavily influenced by the conditional independencies that are imposed on the variables that constitute the model.To foster further insight, it helps to visualize the conditional dependencies that govern the model 3 Remark that, in the absence of uncertainty, if f , in the example above, is a monotonically increasing function of time -which is a reasonable assumption provided that degradation is a progressive process that cannot reverse it action -the RUL in definition ( 2) coincides with the definition provided in Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.graphically [13].In Fig. 1a a graphical representation of the most straightforward probabilistic model implied by ( 1) is illustrated.Here arrows indicate a conditional dependency between the child and parent variables.Hence, the secondary measurement z t is thought to be conditionally dependent on the primary measurement sequence of the (exemplary) length t.This model seems to suggest a causal relation between the variables in y t and the variable z t .Under the assumption that the relation is stationary and that a sequence of length t contains all the required information to predict z t from the secondary measurement sequence, indeed the most straightforward way is to model the function p(z t |y t ) directly and shift the argument one time instant to compute z t+1 and so on [12].Though, clearly this is not the only way of modelling the correlation between to two measurement sequences y and z and certainly not the one with the most computationally rich structure.As an example, assume that we become interested in computing the probability p(z t+1 |y t ) or p(z t+ |y t ), ∈ N for that matter?Extending the previous modelling approach would then imply learning separate forward models for each of these distributions, not being able to lean on conditional independencies and associated factorization properties to simplify the calculations [13].
Therefore we propose another way of modelling the correlation between the two measurement sequences.We assume that there exists a single underlying process characterised by the latent state, x t ∈ R n x , that spawns both measurement sequences independently.The probabilistic graph corresponding with this point of view is represented in Fig. 1b.This model is known as a Hidden Markov Model (HMM) [14], [15].As suggested by the arrows, characterisation of the HMM requires specification of the following local probability models.
• initial probability p(x 1 ) • transition model p(x t+1 |x t ) • measurement models p(y t |x t ) and p(z t |x t ) Relying on the HMM imposed on the dynamics of the experiments, our estimate, p(z t |y t ), can now be decomposed by means of the latent state, x t , which circumvents many of the conditional dependencies that made the direct modelling approach complicated and computationally unattractive.Instead, the HMM offers a practical computational procedure.Specifically, the prediction of z t+ , ∈ N by conditioning on y t , then becomes trivial once we have calculated p(x t+ |y t ).
So why bother with estimating the second class of measurements, z, if we can also access a state estimate?The answer is related to the means by which we shall arrive at a model to practice the probabilistic inferences described above.Within this work, we assume that such a model is not existent and cannot be constructed in any meaningful sense by a team of experts.Especially this is the case when the measurements represent processed sensory output or features.The causality of the process and hence the temporal causality of the sensory output will be transferred to the features but the underlying physical phenomena will be obscured by the operations wielded to construct those features.Thus, instead we assume only to have access to a set of experimental data, {Y, Z}, existing out of K measurements sequences, y T and z T , i.e.Y = {y k T } k and Z = {z k T } k .From those measurements we shall identify a HMM.As we will come to explain later, in this case the state may be uninterpretable and therefore useless for any practical purpose other than calculation.In summary, we exploit the structure of the HMM only to have a more insightful model of the correlation between y and z, rather than we hope to extract any useful information from x itself.

D. Filtering & Predicting of Latent Variables
In the previous subsection, we boasted the computational flexibility of the HMM, demonstrating how p(z t+ |y t ) could be computed efficiently once we had calculated p(x t+ |y t ).
Here we explain how to compute the probability p(x t+ |y t ).
In HMM theory, the probability p(x t |y t ) is known as the filtering distribution and can be calculated recursively [16] p(x t+1 |y t+1 ) ∝ p(y t+1 |x t+1 ) p(x t+1 |x t ) p(x t |y t )dx t (8) The probability p(x t+ |y t ), > 0 is known as the predictive distribution and can be calculated recursively once the filtering distribution has been calculated.

E. Identification of H(G)MMs
Before we can practice the inferences described above, we require a probabilistic model, say M, capturing the conditional probabilities that govern the HMM.Such is usually obtained by parametrising the probabilities with a parameter θ .Formally, we write p(•|•, θ ), more in particular, we search quantitative local probabilistic models, p(x t+1 |x t , θ ), p(y t |x t , θ ) and p(z t |x t , θ ).In this section we further treat the question of how to characterize a HMM given the data, D = {Y, Z}.
A fully Bayesian approach would imply calculation of the a posterior probability over the parameters θ conditioned on the data D, i.e. p(θ |D).Assuming that we can evaluate the probability p(θ |D), the model is then determined by expressing the joint probability p(θ, and marginalizing out the parametric dependency, rendering the model p(x t+1 |x t , D), p(y t |x t , D) and p(z t |x t , D).This approach captures both aleatoric uncertainty about the system itself (as governed by p(•|•, θ )) and epistemic uncertainty about the true parameters.Unfortunately the marginalization and hence the approach in general is intractable but for a handful of toy problems.Alternatively, a Maximum A Posteriori (MAP) estimate, θ * , can be made by maximizing the probability of the parameters conditioned on the data, i.e. θ * = arg max θ log p(θ |D).The MAP estimate can then be substituted into the model p(•|•, θ * ) so to obtain an estimate of the true Bayesian model.This approach still requires that we evaluate p(θ |D) which, using Bayes' rule, can be rewritten as p(D|θ ) p(θ ).Application of the MAP modelling approach, thus requires us to define a prior probability for θ , which, in most cases, cannot be established in any meaningful way so that the the uniform is chosen.The MAP modelling approach with uniform prior collapses onto a Maximum Likelihood Estimation (MLE).The MLE parameter is estimated as θ * = arg max θ log p(D|θ ).This objective expresses the likelihood of the measurements as a function of the parameters.
Applying the MLE approach to the model in Fig. 1b, thus requires to solve the following optimization problem max where It is cumbersome to treat this optimization problem directly.Instead, the Expectation-Maximization (EM) algorithm estimates the parameters iteratively by considering a surrogate objective Q(θ, θ ′ ), updating the parameters as follows The objective, Q(θ, θ ′ ), is often referred to as the Evidence Lower Bound (ELBO).A brief overview of the EM algorithm is given in and Appendix B. For details we refer to [14], [15].
It can be shown that the series in (12) converges monotonically to the solution, θ * , of problem (10).The surrogate ELBO objective and the original MLE objective are then related as 1) Hidden Gauss-Markov Models: Solving (12) for arbitrary probabilistic state-space models remains intractable or requires the introduction of additional approximations.For linear state-space models however, ( 12) can be solved explicitly.A linear continuous HMM is referred to as a Hidden Gauss-Markov Model (HGMM).Characterisation of a time invariant HGMM requires specification of the following local probabilities.
Solution of ( 12) for given θ i then yields the required parameter updates.The solution of this problem for HGMMs is well studied in for example [16], [17], and [18].In particular the parameters {µ 1 , 1 , A, Q, C, R} can be expressed as a function of the smoothing distribution parameters which are also Gaussian, p(x t |{y, ).Similar to the filtering distributions (see sec.II-D), the smoothing distributions can be computed efficiently.We refer to Appendix C for computational details regarding the parameter updates.
2) Regularization: The measurement likelihood, p(Y, Z|θ ), of a time invariant HGMM is subject to an important parameter invariance structure that may impact convergence [17].It is well known that the latent space can be determined up to a similarity transform, x ′ = T x, where T is a non-singular square matrix.One verifies that the parameter set θ can be transformed in an equivalent parameter set θ ′ without affecting the value of (11).Therefore, the EM algorithm may converge to any member of this manifold, depending on the initial values for the parameter estimates.Theoretically one cannot express any preference to any of these members, however, for numerical reasons it is advised to constrain the model family in (12) by imposing restrictions on θ .
In this work we imposed the following constraints.
• The covariance matrices Q and R are constrained to the diagonal set.The update is performed as per usual but only the diagonal elements are maintained in Q ′ and R ′ .
• After each standard update, the transformation T is chosen so that C ′ = CT −1 = I.In case that n x > n y +n z , the leading n y + n z latent states are selected out.The case where n x < n y + n z was not considered since this would imply that n y + n z − n x measurements are linearly depend of the other n x .Finally, it is worth mentioning that determining the transformation matrix every update, generally yielded better results than fixing C = I.If these constraints are satisfied, the local probability models, p(y t |x t ), and, p(z t |x t ), are in fact uncorrelated, so that We list here also our solution initialization procedures.
• We initialize the elements of A and C uniformly in [0, 1).
• The covariance matrix 1 is determined as 1 = PP ⊤ where P is initialized similar to A and C.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
• The covariance matrices Q and R are initialised with diagonal elements uniformly in [0, 10 −2 ).
First we may note that the expressiveness of a discrete HMM is limited.In a left-to-right dHMM it is assumed that each state relates to some level of degradation of the associated object.Hence this implies that the associated observations can only depend on the degradation level of the object, i.e. the hidden state, which s by virtue of the HMM modelling assumptions.As a result either the range of measurement associated to a degradation state is limited or the variance of the emission model is large.Secondly, the brute force data-driven approaches aim to recognise patterns between the data and the new experiment excluding the possibility to exploit any physical insight nor allow any physical interpretation.
Our approach is hence situated in between either lines of work discussed above.Strictly speaking, the HMM structure simply offers a computational structure to calculate the probability (1) efficiently.Therewith our approach closely resembles the strictly data driven approaches.On the other hand, the HMM structure can also be compared with a state-space model that governs the physical dynamics of the system.So although it is difficult to attribute a physical interpretation to the identified latent state space, it does cover the dynamics of the latent physical system dynamics.In that sense our approach is closest related with the first line of work.With the exception that in our approach the internal state of the degrading object is represented by n continuous state variables, significantly improving the expressiveness of the model.The observations need not to depend exclusively on the degradation state but can rely on any combination of the latent state variables.In conclusion we emphasize that we know of no other works that can and do consider the use of other measurements signals in the vector z to stimulate the discovery of a physical latent space; which, as we will show with our experiments, turns out to be highly purposeful.

III. SOLENOID VALVE DATASET
We tested the approach from sec.II on a data set containing accelerated lifetime tests of Solenoid Valves (SVs).SVs are commonly used components in many industrial processes, often performing safety critical tasks.These tests show large variation in SV lifetime.Development of predictive maintenance algorithms for SVs could significantly benefit SV reliant applications.In this section we discuss earlier work on diagnostics and prognostics of SVs.Further we provide details regarding the set-up and data that we used in sec.IV.

A. Related Work on SV Diagnostics & Prognostics
Model-based approaches have been proposed by [42], [43], and [44].These works focus on deterministic fault detection and diagnosis.Data-driven approaches have been proposed for SOH diagnostics of SVs and were successful in anomaly detection and failure clustering.All of these methods, however, fail to give an indication of the End of (useful) Life (EOL).Tod et al. [45] presents a hybrid physics-based data-driven tool that is capable of diagnosing failure modes using a physically interpretable model.Mazeav et al. [11] proposed a Deep Learning method using an ensemble of Convolutional Neural Networks (CNNs) to construct a health index of SVs, which can be extrapolated to a deterministic RUL prediction, based on current measurements in the power electronics circuit feeding the SVs.Later this work was continued and expanded [12] by combining Bayesian CNNs with physical features as described in [45], as to be the first and to our knowledge the only, to obtain accurate RUL predictions with uncertainty quantification (UQ).These approaches obtain accurate results but tend to rely on computationally demanding learning methods and memory demanding models.

B. Set-up
An experimental test set-up, as shown in Fig. 2, was used to perform accelerated lifetime tests of 48 solenoid valves under realistic conditions.The tested valves are direct acting 3/2 way normally closed AC solenoid valves.The SVs are switched on and off at a rate of 1Hz by an AC voltage of 110V at 50Hz.During the open or "ON" state, fluidum can flow through the valve for approximately 0.5 seconds, followed by a transition to the closed or "OFF" state, during which fluidum is blocked for another 0.5 seconds during each cycle.The time-series data of one on/off cycle was captured every hour over the span of 6 weeks (∼ 1150h).All solenoids were supplied by a line pressure of 8bar and the tests were conducted at an ambient temperature of ∼ 25 • C.
These tests were performed in an R&D setting so that apart from easily accessible measurements, such as SV currents and ambient temperature.The currents are easily extracted from the power electronics circuit.Yet, the data set also contains less convenient measurement data that would not be available during normal operation, such as air supply temperature and pressure, valve surface temperatures and air flows at the outlet ports and ventholes (blow-off holes).The ground truth of the lifetime is measured in number of on/off cycles.

C. Solenoid Valve Degradation & Detection
There is a clear physical difference between a new valve, that is before operation, and a used valve at the end of its useful lifetime.The used valve shows signs of wear at its plunger and deformation of its valve seal, as can be seen on Fig. 3a and Fig. 3b respectively.Although, clearly, SV degradation is physically detectable we aim to develop a method that is practical for SVs during normal deployment.Therefore, our prognostics model will, in the end, solely rely on the current measurements that were mentioned above.Any prognostics  Three stages in the lifetime of a SV can be identified.The SVs are given a label, l ∈ {healthy, degrading, faulty}, by an expert based on all measurements mentioned above.There is no one clear criterion to determine these labels.Fig. 4 shows the measured signal of the outlet flow and current signal during the switch-on and the outlet flow during the switch-off of an on/off cycle for each of these three stages.Luckily, pronounced differences emerge from the visual comparison of these signals from the three different lifetime stages.
The 'outlet flow -ON' plot shows the outlet flow of the fluidum through the valve starting at the beginning of the open state, i.e. from the moment the plunger is being opened.The plunger is opened by applying the AC current signal as depicted by the 'current -ON' plot.The 'outlet flow -OFF' plot shows the outlet flow from the moment the plunger is being closed.The spring loaded plunger is closed from the moment the AC current signal is no longer applied and typically occurs faster than the opening.
In the healthy operating condition, there is no leakage when the valve is closed and throughflow steadily starts rising from the moment the valve is opened.When the valve is closed, the throughflow steadily decreases back to zero. 4 As mentioned before, faulty behaviour can express itself in various forms (fauling, other blockages, plunger blocked in end position . . .).The valve in its faulty operating condition used for Fig. 4 for example shows no sign of leaking air when it is closed, but on the contrary completely blocks throughflow of air when it should be opened.The opposite behaviour or an intermediate form would be classified as equally faulty behavior, demonstrating the non-trivial character of the problem.
A solid opening of a healthy SV corresponds to a single hit of the plunger, which translates to a single notch in the current measurement.A degrading plunger has a more bumpy movement of the plunger during opening, resulting a multitude of these notches during opening.Finally, the plunger at its EOL is stuck in this case and therefore displays none of these notches.Apart from that the sinusoidal-like waveform of the current signals clearly varies over the three lifetime stages.
Although described measurement sets are highly indicative for the SV degradation and general SOH, these are usually not available in an industrial setting, necessitating the development or indentification of alternative features.

D. Domain Knowledge Based Features
These time-series data carry information that can be more conveniently summarized in a number of features.From the discussion above, two crucial features related to the flow measurements can be determined • the measured flow when the valve is still closed just before opening, hence the first point in the 'outlet flow -ON' measurements q leak , • the measured flow when the valve is still open just before closing, hence the first value in the 'outlet flow -OFF' measurements q out .The first one indicates whether the supposed closed valve is leaking fluidum, the second one indicates whether the supposed open valve is hindered in supplying fluidum.
From this, three current features were extracted • The position of the (first) characteristic notch in the current signal (corresponding to the first hit), Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
• The impedance spectrum magnitude at the fundamental AC-source frequency I m,fund , • The inductance spectrum magnitude at the fundamental AC-source frequency I d,fund .Finally, The valve surface temperature can be reduced to the average value over one on/off cycle, as it merely varies over this short timespan.A strong increase of the valve surface temperature over a longer time period also indicates deterioration.

A. Mass-Spring-Damper System
In order to verify the proposed methodology in sec.II, we treat a mechanical example first.The system, as schematically depicted in Fig. 5, consist of two connected masses of which the first mass is also connected to a fixed reference.Both connections are established by a parallel spring and damper.The corresponding parameter values used in the simulation are listed in Table II.Its state is characterized by the positions x 1 and x 2 of both masses and their respective velocities v 1 and v 2 .The system is described by a linear time-invariant model which exactly fits the aforementioned HGMM framework and can thus be used to validate our method.The mode and model parameters used for simulation is detailed in Appendix A.
A HGMM was learned from K = 20 experiments assuming full yet noisy state observations and according the approach detailed in sec.II-E.To validate the virtual sensing framework we then aim to estimate the position and velocity of both masses whilst only having access to measurements for the first mass.More formally, First, we assess the performance of the model identification procedure.For this simple mechanical example, we know that the state that completely describes the systems' dynamics is 4 dimensional.However, for a practical use case, the dimensionality of the latent state is often unknown.Therefore, it is required to do a search over a range of latent state dimensions, as depicted in Fig. 6.Given that the 4 observed variables are independent, it is senseless to assess lower dimensions of the latent state, as these could never yield all the information contained in the complete observation (recall the discussion in sec.II-E2).We consider a range between 4 and 15.Clearly, a latent state dimension of 4 is most consistent and on average obtains the best maximization of the objective function (13).Second, we validate the filtering framework tailored to virtual sensing.Fig. 7 depicts filtered and predicted measurement sequences for both masses.Clearly, the state measurement of the second mass are successfully reconstructed from the state measurements of the first mass.

B. Solenoid Valves
Here we will deploy the virtual sensing strategy on the industrial problem of RUL and SOH prediction for SVs.
It is recommended to comment on the following issue before we address the estimation problem itself.It is anticipated that the supporting linear theory will produce artefacts that defy the definition of the RUL and SOH (recall sec.II-B).Specifically, the estimated values, and in particular the associated coinfidence interval, may adopt values in the set (−∞, 0)∪(1, +∞).Comparison of the estimated trajectory to the true underlying trajectory for the mass-spring-damper system based only on the measurement of the position and velocity of first mass.This is an insurmountable byproduct of the linear theory used to embed the degradation or failure dynamics.Resolving this issue by intrinsically adapting the emission distribution would destroy the linear context.The only workable resolution would be to post-process the output distribution by means of e.g.clipping or truncating the distribution.However then this would be an aesthetic intervention rather than an intrinsic property of the approach.So rather we accept these non-physical artefacts as a compromise to the computational efficiency.
With this final issue clarified, we can present out results.Recall that we aim to obtain a noninvasive RUL/SOH prognostics tool for condition monitoring of these valves.As discussed, therefore we only want to rely on current measurements as inputs for the operational model.However, for the system identification, multiple other features from the R&D data can be used.More formally, y = {I d,fund , I m,fund , notch} and z can be any combination of the following features RUL Remaining Useful Lifetime SOH State of Health  q out outlet flow during on-phase q leak leakage flow during off-phase t h time of first hit during valve opening T surf plunger surface temperature All results discussed below are obtained with 5-fold cross validation [15].
1) Latent State Dimensionality Study: As for the springmass-damper system, in order to guarantee that we will eventually obtain the best possible model feasible within our proposed framework, we first have to perform a search for the optimal latent state dimensionality.Fig. 8 shows the complete-data loglikelihood for different latent state dimensionalities in case z = {RUL}.All dimensionalities were assessed 50 times.Again, we see that increasing the latent state dimension above the number of observations during training, 4 in this case, results in decreased performance on average.Although some outliers seem to be able to exploit the increased model complexity.However, if we consider Fig. 9 and Fig. 10, which respectively show the loglikelihood of observing Y = y and RUL during operation, given only y, the model only seems to be able to properly fulfill its function as a filter for the measured observations and fails at estimating the RUL.The ability to increase the model complexity could have its benefits in case the presented set of features are a lower dimensional embedding a higher dimensional physical manifold.The fact that the RUL/SOH features lack physical interpretability could explain why this opportunity is not exploited.
2) Feature Analysis: A key idea of our framework that we can include information during the system identification procedure that is not observable during operation.This has the Fig. 10.
Comparison of the RUL estimate, log p(RUL|y), for different latent state dimensions on the SV data set for the case where y = {I d,fund , I m,fund , notch} and z = {RUL}.Fig. 11.Comparison of RUL models for including additional features in z apart from RUL and SOH.The optimal latent state dimension was determined using the procedure described in sec.IV-B1.Fig. 12.Comparison of MAE for RUL predictions averaged over the entire lifetime per solenoid valve.The optimal latent state dimension was determined using the procedure described sec.IV-B1.
advantage that a more accurate dynamics model can be learned and deployed with fewer inputs during the filtering phase.Fig. 11 shows the comparison of models deployed on test SVs, that were trained with different features combinations.The base case is always y = {I d,fund , I m,fund , notch}, z contains at least RUL/SOH.For each of these combinations of model inputs we first performed a latent state dimensionality study as discussed in the previous section.However, a general tendency could be observed to choose the dimension equal to the input dimensionality.Each combination was assessed 50 times.
Clearly, including either one of the other four available features from the R&D dataset has an impact on the model performance.The optimal features combination turns out to be {q out , q leak , T surf }.Fig. 12 shows that including additional Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.features during model learning outperforms the base case.We hypothesize that a superior underlying dynamic model can be identified, including the temporal dependencies of the various features included, whilst, on the other hand, the 3 current features remain sufficient to estimate the underlying state and thus the quantities of interest such as the RUL and SOH.
3) RUL Prediction: Some examples of both RUL and SOH predictions and the corresponding the 95% confidence intervals are shown in Fig. 13, for three different SVs.These predictions are obtained with the best possible model, resulting from the analysis in the previous section.The RUL predictions are relative values, with the ground truth scaled between 0 and 1.On top of the figure, RUL predictions, given by a model that was trained without the SOH labels, are shown.In the middle, SOH predictions, given by a model trained with SOH and without the RUL labels, are shown and the bottom row shows the results for a model where both labels were available during the model identification phase.
Even in case the model has never seen SOH labels, the behavior of the RUL estimate typically consists out of 3 phases: a rather constant RUL estimate around the value of 1 at the beginning of the valves' lifetime, followed by a phase of accelerated detorioration and finally a correct estimation of no remaining useful lifetime when the SV has failed.Naturally, the RUL predictions clearly tends towards the shape of the SOH characteristic.The behavior of these RUL predictions could be summarised as an underestimation of the SOH characteristic up to the point of EOL, forced by the learned monotonically decreasing RUL character.This learned monotonically decreasing character also result in predictions below 0 for the last solenoid, to only correct to zero once the model is confident that the system has failed.
For the second case, the model is clearly much more capable of correctly identifying the SOH dynamics, conforming our hypothesis.Although the variance on the prediction is not significantly smaller, the model seems much more robust as the prediction is less noisy compared to the previous case.
Finally, in case both labels are available during training, the RUL and SOH estimates are clearly highly correlated.Again, the RUL predictions tend to better approximate the SOH than the actual RUL characteristic.However, towards the end of the useful lifetime, the RUL predictions are accurate and more stable than in the first case.
One could argue that our model fails to capture the behavior of the RUL characteristic.However, as stated before, these empirical RUL signals are artificial constructs and are supported by little physical insights other than the observation that the system finally broke down.Reconstructing the corresponding RUL value as a signal that decays linearly with time is an arbitrary choice and any other decaying carrier signal would have been just as valid.Despite the forced bias of having to put forth a monotonically decreasing estimate of the RUL over the lifetime of a SV, the model persists in predicting a relatively constant value of the RUL, and thus deviating from the true RUL characteristic, in the early stage of the lifetime until it observes actual signs of deterioration.It can thus be stated that the model successfully identifies the true underlying degradation dynamics, to a certain extent, even given a poor indication of those dynamics.
Table I summarizes and compares our methodology to the results of closest related state-of-the-art methods, in terms of the mean absolute error (MAE) between the predicted RUL and the true RUL of the solenoid valves until failure.We compare the mean and standard deviation of the average MAE, the MAE at 70% and at 90% degradation per solenoid valve.This degradation corresponds to the respective 70% and 90% of the useful lifetime of the solenoid valve, which has been retrospectively determined for each solenoid valve.The raw current signals refers to the high frequency current measurements before extraction of any features as depicted in Fig. 4, current features refers to the extracted current features based on expert knowledge as described in III-C, the optimal feature set refers to the optimal features combination is described in IV-B2.The implementation of the similarity model is based on [9] and [47].The results from the frequentist CNN (FCNN) and Bayesian CNN (BCNN) are by [12] Mazaev et al.It is important to mention that the FCNN and BCNN automatically extract features from the raw current measurement, while the similarity model and the HGMM model use preprocessed current features.Furthermore, it is important to mention that additional improved results are obtained by [12] as they include two physical forces as input to the model, identified by Tod et al. [45] by leveraging a detailed physical model of the solenoid valves.
We can observe that our HGMM approach in combination with the current features outperforms the similarity model and performs in terms of predictive capabilities at a similar level as the end-to-end CNN models that deduce RUL estimates directly from the raw current signals.The possibility of including additional features during the identification of the degradation model clearly enables the HGMM to outperform all other models during deployment.

V. CONCLUSION
In this article we proposed a data-driven framework for probabilistic condition monitoring.Our aim is to model the correlations between two measurement sequences based on time series data from dedicated run-to-failure experiments, that are obtained in a R&D setting.This means the system is fully instrumented and measurements are logged that are impractical or even unachievable in a practical and/or industrial setting.Based on this model it is then possible to estimate the unfeasible measurements from the limited set of available measurements, a concept known as virtual sensing.This allows for real-time estimation of the unfeasible measurements in an industrial setting.Specifically, our approach identifies a Hidden Gauss-Markov Model from the complete measurement set and uses Bayes filtering techniques to reconstruct the unfeasible measurements in practice.
We showed that, although the RUL/SOH signals are not objectively physical quantities, our framework is capable of estimating these artificial signals with high precision.By default a measure of uncertainty is obtained as an inherited property of the Hidden Markov Model theory.The uncertainty on the RUL/SOH measurements can be interpreted as epistemic uncertainty on the prediction, which represents, according to the Bayesian interpretation, our belief about the RUL/SOH.Two practical tendencies were observed.First it is useful to include additional features during training other than the available measurements and the features that we want to predict.This is unsurprising given that the more information that we grant to the training process about the underlying dynamics, the better the final identified model.Though it was also observed that certain features destabilize the training procedure, as measured in terms of prediction accuracy, possibly on account of the assumed linearity of the model.Second, we observed that for most but a few feature combinations, the optimal latent state dimension equals the number of features.This implies that the optimal underlying dynamics should Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE II THE SPRING-MASS-DAMPER'S PARAMETERS
be as complex as the measurements and adding additional complexity does not result in improved predictive capabilities.
Future work may include extension to forecasting rather than filtering.

APPENDIX A MASS-SPRING-DAMPER MODEL
The equations describing the mass-spring-damper system dynamics, with the variables defined as in Fig. 5, are given by the following continuous-time state space model ẋ(t) = Ax(t) + w(t) where To be able to calculate the filtering and smoothing densities, this model is discretized as described in [46].The simulations with the discretized linear state-space model where performed with a sample time T s = 0.01s.

APPENDIX B EXPECTATION-MAXIMIZATION
Consider a parametric probabilistic model existing of the two variable sets, Z , and, X .The model is described by the local probabilities, p(X |θ ), and, p(Z |X, θ ).Then the marginal distribution, p(Z |θ ), can be decomposed as Ad hoc, we can express the logarithm of p(Z |θ ) by introducing an auxiliary inference distribution, q(X ).log p(Z |θ) = q(X ) log p(X, Z |θ ) q(X ) dX + q(X ) log q(X ) p(X |Z , θ ) dX (20) The second term equals the non-negative relative entropy between the inference distribution, q(X ), and, the a posteriori distribution, p(X |Z , θ ).As a result log p(Z |θ ) is always greater than the first term.This observation implies that if we were to maximize the first term with respect to θ rather than log p(Z |θ ), the associated parameter update is also an ascent direction for the true objective.Consequently, the first term, often referred to as the Evidence Lower Bound (ELBO), can be used as a surrogate objective, and, the inference distribution can be chosen so to minimize the approximation error, equivalently, to minimize the relative entropy.The relative entropy is zero only between equivalent distributions, so that q * (X ) = p(X |Z , θ ) Substituting the optimal inference distribution in the ELBO, omitting the denominator, yields the surrogate objective from sec.II-E.Note that the inference distribution does not depend on the parameter.Thus, the former parameter estimate is used to determine the inference distribution, the present parameter estimate follows from optimizing the ELBO; hence, suggesting an iterative procedure which is known as the EM algorithm.
It is well know that the EM algorithm converges to the solution of the reciprocal MLE problem.Once the paremeter has reached its final value, therefore we have that APPENDIX C PARAMETER UPDATES FOR HGMMS From [16,Theorem 12.4] and [17], we adopt the following expression for Q(θ, θ ′ ), which applies to time invariant Hidden Gauss-Markov Models where {y, z} k t {y, z} k,⊤ t Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where µ k t and k t denote the smoothing distribution parameters.The smoothing distributions are defined as [16] p(x t |{y, z} k T ) = N (x t ; µ k t , The parameters are governed by efficient recursive procedures.We refer to [16] for details.
From the expressions above one easily verifies that the surrogate Q is maximized for the following parameter updates

Fig. 1 .
Fig.1.Graphical representation of probabilistic models for virtual sensing applications.Dotted lines represent an undefined number of repititions of the same graphical structure.

Fig. 2 .
Fig. 2. Test set-up for the accelerated lifetime tests of 48 solenoid valves.The figure on the rights depicts the CAD model of a SV.

Fig. 3 .
Fig. 3. New (left) versus used (right) solenoid valve.The used valve shows signs of wear at its plunger and valve seals.can be realised in a noninvasive manner and without the need for inconvenient sensors.Temperature and flow measurement are required to determine the exact moment of EOL of the SVs in our R&D data set.Three stages in the lifetime of a SV can be identified.The SVs are given a label, l ∈ {healthy, degrading, faulty}, by an expert based on all measurements mentioned above.There is no one clear criterion to determine these labels.Fig.4shows the measured signal of the outlet flow and current signal during the switch-on and the outlet flow during the switch-off of an on/off cycle for each of these three stages.Luckily, pronounced differences emerge from the visual comparison of these signals from the three different lifetime stages.The 'outlet flow -ON' plot shows the outlet flow of the fluidum through the valve starting at the beginning of the open state, i.e. from the moment the plunger is being opened.The plunger is opened by applying the AC current signal as depicted by the 'current -ON' plot.The 'outlet flow -OFF' plot shows the outlet flow from the moment the plunger is being closed.The spring loaded plunger is closed from the moment the AC current signal is no longer applied and typically occurs faster than the opening.In the healthy operating condition, there is no leakage when the valve is closed and throughflow steadily starts rising from the moment the valve is opened.When the valve is closed, the throughflow steadily decreases back to zero.4As mentioned

Fig. 4 .
Fig. 4. Evolution of a SV outlet flow during both on-and off-phase and the current signal during on-phase for three cycles over the course of a lifetime.The first cycle (blue) is during healthy operation, the second cycle (green) is during the degrading phase of the solenoid and the third cycle (red) is at the end of useful life of the SV.

Fig. 7 .
Fig. 7.Comparison of the estimated trajectory to the true underlying trajectory for the mass-spring-damper system based only on the measurement of the position and velocity of first mass.

Fig. 8 .
Fig. 8.Comparison of the RUL estimate, log p(RUL|y), for different latent state dimensions on the SV data set for the case where y = {I d,fund , I m,fund , notch} and z = {RUL}.

Fig. 9 .
Fig. 9. Comparison of the loglikelihood, log p(Y = y|y), for different latent state dimensions on the SV data set for the case where y = {I d,fund , I m,fund , notch} and z = {RUL}.

Fig. 13 .
Fig. 13.Comparison of SOH and/or RUL predictions for three different SVs.The upper figures show the RUL predictions of a model trained without observability of the SOH labels, the middle figures show SOH predictions in the opposite case and the lower figures show both RUL and SOH predictions in case both labels are available during the model learning phase.

TABLE I MAE
AVERAGED PER VALVE, AT 70% AND AT 90% DEGRADATION