Semi-supervised detection of structural damage using Variational Autoencoder and a One-Class Support Vector Machine

In recent years, Artificial Neural Networks (ANNs) have been introduced in Structural Health Monitoring (SHM) systems. A semi-supervised, data-driven method allows an ANN to be trained on data acquired from the undamaged structural condition in order to detect structural damage. In standard approaches, after the training stage, a decision rule is manually defined to detect anomalous data. However, this process can be automated using machine learning methods, whose performance is maximised using hyperparameter optimization techniques. This paper proposes a semi-supervised, data-driven method to detect structural anomalies. The methodology consists of: (i) a Variational Autoencoder (VAE) to approximate the undamaged data distribution and (ii) a One-Class Support Vector Machine (OC-SVM) to discriminate between different health conditions using damage-sensitive features extracted from the VAE's signal reconstruction. The method is applied to a scale steel structure that was tested in nine damage scenarios by the IASC-ASCE Structural Health Monitoring Task Group.

several Machine Learning (ML), and in particular Deep Learning (DL), techniques [5][6][7]. However, a substantial part of anomaly detection approaches is based on Autoencoder (AE) architectures [4][8][9][10][11][12][13]. AEs are neural networks composed of at least one hidden layer and logically divided into two components, an encoder and a decoder. From a functional point of view, an AE can be seen as the composition of two functions E and D: E is an encoding function (the encoder), which maps the input space onto a feature space (or latent encoding space), and D is a decoding function (the decoder), which maps the feature space back onto the input space. A meaningful aspect is that AEs provide data representations in terms of fixed latent encodings $\vec{h}$. In a nutshell, in anomaly detection tasks AEs are trained to minimize the reconstruction error only on normal data instances, so that anomalous data yield a high reconstruction error. The reconstruction error is then used as an anomaly score to classify the input data as anomalous or not, using a user-defined decision rule [14]. AE architectures have been presented in several variations, such as Denoising Autoencoders (DAE) [15], which are meant to remove noise from input data, Sparse Autoencoders (SAE) [16], where a sparsity constraint is introduced on the hidden layer in order to emphasize meaningful features, and Variational Autoencoders (VAE) [17], which are generative models where the latent encoding is a probability distribution rather than a fixed vector.
In recent decades, attention to procedures for detecting anomalies caused by damage in civil constructions and infrastructures has been steadily growing. Indeed, (i) safety standards for new constructions have increased, so existing constructions may fail to comply with these standards even after minor degradation, and (ii) both new and existing structures are becoming increasingly smart through the use of embedded sensors that provide real-time information. For this reason, a large body of research aims at finding procedures that allow the set-up of a Structural Health Monitoring (SHM) system for structures and infrastructures, i.e., for both buildings and bridges. Bridges are strategic structures that require extensive and expensive management and maintenance activities, because they are particularly exposed to environmental phenomena and variations in use conditions (loading-unloading cycles, temperature, etc.). Moreover, they lack the reserves of resistance capacity that are characteristic of other structural types such as buildings. On the one hand, building an accurate model of the physical behavior of this type of structure in operational conditions is not easy. This stimulates the use of automatic monitoring systems that can continuously and rapidly detect anomalous conditions due to damage, to ensure a quick response from the infrastructure manager.
On the other hand, it is necessary to consider that (i) the high variability of the boundary conditions under which the bridge operates can alter the estimate of the anomaly (e.g., variable vibrations induced by wind, highly variable traffic load, highly non-linear mechanical behavior of the materials that constitute the bridge) and (ii) any algorithm implemented for a structural monitoring system can hardly detect damage conditions if it is trained on an extensive database of measurements performed mainly in the operating conditions of the structure, namely in the absence of structural damage. This second aspect is crucial: damage conditions are difficult to measure because of the intrinsic assumption made in structural design, which prescribes high safety factors to ensure that operational conditions remain far from the structural limit condition. Therefore, investigating damage detection algorithms that provide accurate warnings for structural monitoring is particularly challenging and interesting, regardless of the subsequent need for damage quantification and structural prognostics. Monitoring strategies are mainly characterized by (i) type of monitoring (static or dynamic), (ii) analysis methodology (i.e., input-output, with known forces, or output-only, with unknown forces) and (iii) analysis approach (i.e., data-driven or model-based, depending on whether a model is required to support the method). Static monitoring techniques usually consist of discrete, rather than continuous, detection of gradual and slow variations of some parameters over rather long periods. By contrast, dynamic monitoring methodologies, which can identify dynamic parameters in the frequency domain [18] (e.g., peak picking, frequency domain decomposition, enhanced frequency domain decomposition) or in the time domain [19] (e.g., auto-regressive moving average models), generally need a large amount of data. The records of accelerations, velocities and displacements can be post-processed through techniques operating in the time or frequency domain, and this choice affects the damage-sensitive feature. In the frequency domain, the features can be curvature, strain energy, flexibility and interpolation error [20,21], while in the time domain the feature is generally an error parameter [22].
In this work, we propose a semi-supervised, data-driven, DL-based framework to detect damage in an SHM system. Our proposal consists in using a VAE, trained on undamaged raw data, to represent input data through damage-sensitive features (typically involved in structural damage detection [23][24][25]) and a One-Class Support Vector Machine (OC-SVM) [26] to classify data as undamaged or not, thus avoiding any user-defined decision rule. Damage-sensitive features are extracted from the input data and their reconstruction computed through the VAE. Unlike other works based on standard AEs, our proposal leverages the probabilistic aspects of a VAE for the extraction of damage-sensitive features from raw input data, which captures more data variability in the latent encoding space than a standard AE and thus avoids several weaknesses of AE-based anomaly detection [14]. Moreover, since the probabilistic encoder of a VAE approximates the generative distribution of input data through their latent representation (unlike an AE, where a deterministic mapping from the input to the latent representation is learnt [14]), we expect that learning the distribution of undamaged data leads the encoder to model damaged data with different distributions, thus improving the robustness of the damage detection system. Finally, to the best of our knowledge, among the various anomaly diagnosis studies in SHM based on machine learning methods, this paper is the first to analyze the VAE latent representations in modeling damaged and undamaged data distributions, and their impact on damage detection, through a KL divergence analysis on the various damage cases. This paper is organized as follows. Section 2 briefly reviews the related literature; Section 3 describes the proposed architecture; Section 4 introduces the experimental assessment together with the discussion of the results, while Section 5 provides an analysis of the VAE's functioning. The concluding Section 8 is left to final remarks.

Related works
In recent years, owing to the great success achieved in solving several kinds of problems and to the increasing accessibility of computing hardware, interest in using DL-based approaches to process the massive data coming from SHM systems has been rising, moving researchers to design SHM damage detection methodologies as autonomous data-driven systems. One of the main advantages of introducing DL methods in SHM systems is the automation of feature extraction from raw input data through learnable non-linear transformations modeled as layers of a Deep Neural Network (DNN). This eliminates the need for human-designed features and for specific feature knowledge, resulting in an end-to-end DL-based SHM system [27]. The use of DNNs has made it possible to process large datasets acquired from different types of sensors in data-driven SHM systems [28,29].
Yan et al. in [30] presented a multiscale cascading deep belief network, named MCDBN, for automatic fault identification of rotating machinery. The same authors in [31] proposed a novel hybrid deep learning model for multistep forecasting of diurnal wind speed, called ISSD-LSTM-GOASVM. In [32], Xu et al. provided a summary of the state-of-the-art progress of AI applications in civil engineering for the entire life cycle of civil infrastructures. Li et al. in [33] compared the performance of a Convolutional Neural Network (CNN) with other methods, such as Support Vector Machine, Random Forest, k-Nearest Neighbor, and Decision Trees, for damage detection in an experimental cable bridge model. The results demonstrated that the accuracy score improved by at least 15% when using a CNN. In [34], Li et al. presented an approach that integrates the electromechanical admittance (EMA) technique with CNNs to quantify structural damage severity under varied temperatures. Ai et al. in [35] proposed a novel approach based on CNNs integrated with EMA to identify compressive stress and load-induced damage in concrete cubic structures subjected to loading. The same authors, in [36], presented an EMA-based damage detection approach based on Principal Component Analysis (PCA) combined with ANNs. In [37], a new approach that utilizes a 1-D CNN was introduced for detecting the general condition of a structure. This approach only requires two damage states during the training stage, specifically the undamaged and fully damaged cases. The advantages of using 1-D CNNs in detecting structural damage had already been investigated by the same authors in [38,39], where the real-time capabilities of CNNs in detecting damage emerged. Shao et al. in [40] introduced a framework that utilizes Transfer Learning in a DL-based system for fault diagnosis. This approach enables and speeds up the training process of DNNs. Ai et al. in [41] proposed a novel approach based on 2D-CNNs for rapid damage quantification on structures from raw EMA data. Tian et al. in [42] used Bidirectional Long Short-Term Memory (LSTM) models to correlate girder vertical deflection and cable tension for condition assessment in SHM.
In [43], the authors proposed a DL framework that utilizes cloud computing to achieve efficient real-time monitoring and proactive maintenance of civil infrastructures. Cheng et al. in [11] introduced a data-driven method for machine health monitoring based on Adaptive Kernel Spectral Clustering (AKSC) and LSTM. In [44], the authors proposed a supervised anomaly detection method that utilizes a cluster of DNNs trained on time series signals transformed into grayscale images using computer vision techniques. In particular, in [44], the clusters of DNNs are composed of stacked AEs trained by greedy layer-wise training [45]. In [46], the authors presented an anomaly detection method that utilizes a Deep Coupling Autoencoder (DCAE) for handling multimodal sensory signals. The proposed method also integrates feature extraction of multimodal data into data fusion for fault diagnosis.
Following the growing interest in using AEs to solve general anomaly detection problems, several AE-based methods for SHM damage-detection systems have been proposed in the literature. In [47], a monitoring method based on Conditional Convolutional AEs for identifying wind turbine blade breakages is proposed. Pathirage et al. in [48][49][50] proposed several AE-based frameworks to learn the relationship between the physical properties of a structure and its vibration characteristics. To enable damage detection, the frameworks take modal properties as input data and produce elemental stiffness reduction parameters of the structure as output. In [51], a method based on DAE is proposed to extract damage features from data of undamaged structures affected by noise and temperature uncertainties. Mao et al. in [52] combine Generative Adversarial Networks (GAN) with AEs to perform unsupervised damage classification on time series data that are transformed into images through Gramian Angular Field imaging. In [53], stacked AEs were used to extract damage-sensitive features from modal parameters of raw vibration data. Rastin et al. in [54] proposed a convolutional AE to perform unsupervised damage detection on benchmark datasets, leveraging the reconstruction error of the AE. In [23], an unsupervised method based on acceleration signals was proposed. The method involved preprocessing the raw signals through Continuous Wavelet Transformation (CWT) and Fast Fourier Transformation (FFT), before feeding the data from each sensor into an AE to extract features. The extracted features were then classified as damaged or undamaged using an OC-SVM. The same authors in [55] proposed a novel method to detect, in an unsupervised manner, structural damage directly from raw acceleration responses (thus avoiding the use of CWT and FFT) using an OC-SVM fitted on damage-sensitive features extracted from the original signals and their reconstruction made by the AE. Li et al. in [56] proposed a novel approach, the New Generalized Autoencoder (NGAE), which incorporates a statistical-pattern-recognition-based approach that uses power cepstral coefficients of structural acceleration responses as damage-sensitive features to assess structural damage. In [57], Yan et al. presented a multi-domain indicator-based optimized stacked DAE to perform fault identification of rolling bearings.
However, a standard AE performs a deterministic mapping from the input data to its reconstruction, implying a lack in modeling data variability in latent representations [14]. This aspect involves several weaknesses in using an AE for anomaly detection tasks rather than a VAE, whose probabilistic encoder models the distribution parameters of the latent variables rather than the latent variables themselves [14], thus capturing more data variability and resulting in a more homogeneous latent space than a standard AE. The authors of [58] propose a novel anomaly detection approach that utilizes a combination of VAE and Support Vector Data Description (SVDD) [59]. In this approach, the SVDD decision boundary is learned simultaneously with the latent representations of data and fitted on them. This is done to prevent the problem of hypersphere collapse, which occurs when all the data points are mapped to a single point in the latent space [60]. Ma et al. presented a method based on VAEs in [61] to detect structural damages in the time-domain for SHM applications. The approach utilizes the latent representation obtained from the VAE's encoder to generate a time series of damage indexes during testing, which allows for the clear visualization of sudden changes in damage location. A method proposed in [62] employs a Convolutional VAE to extract features and performs anomaly detection using OC-SVM and Elliptic Envelope [63] on the learned latent representations. The authors of [64] proposed a damage detection approach that utilizes a VAE ensemble to calculate damage statistics based on Evidence Variational Lower Bound (ELBO) values. The ELBO values are then used to classify each input as damaged or undamaged using a decision rule defined by the user as a fixed threshold value. The authors of [65] proposed an unsupervised method for detecting tunnel damages from vibration data. 
The method uses a Convolutional VAE as a feature extractor and Wavelet Packet Decomposition (WPD) [66] to process the data and produce a damage index. The damage index is then compared to a fixed threshold value to classify the input data as damaged or undamaged. In [67] the authors proposed the Deep Order-Wavelet Convolutional Variational Autoencoder (DOWCVAE), a novel method for the identification of faults under fluctuating speed conditions. Xu et al. in [68] proposed a method based on VAE and GAN to assess the conditions of cable-stayed bridges. Yan et al. in [69] presented DRVAE, a novel DL model based on VAE for fault diagnosis of rotor-bearing system.
The approach presented in this work leverages the advantages of using a VAE for anomaly detection [14] to perform damage detection in an SHM system. Unlike other methods, our proposal takes advantage of the VAE's probabilistic aspects to enhance the damage-sensitive feature extraction, rather than using the latent representations modeled by the VAE to detect damage. In particular, our proposal exploits the VAE's capability to model the undamaged data distribution through its probabilistic encoder during the training stage, in order to emphasize damaged data with different distributions. In this way, the difference in distributions is captured by the VAE's probabilistic decoder, which reconstructs the data less accurately as the damage increases. Finally, an OC-SVM is fitted on damage-sensitive features extracted from the input data and their reconstruction in order to classify data as damaged or not.

Proposed architecture
In this work we propose a framework to perform semi-supervised damage detection using a VAE followed by an OC-SVM. The main aim of our proposal is to identify the presence of damage regardless of its intensity, so that the outcome of the framework can be interpreted as a binary classification response.
A supervised method for identifying structural damage requires labeled data during the training phase, which means data must be recorded both in the undamaged and damaged states of the structure. However, in a real case study, the available data is assumed to be undamaged during the training phase. Therefore, the use of data on the damaged structure is subordinated to the adoption of Finite Element (FE) numerical models of the structure, which can simulate potential damage conditions. It should be noted that, for existing structures, the FE model is based on simplifying assumptions that may not fully match the experimental behavior of the structure. Updating the FE model can improve the accuracy of the simulation (e.g. by calibrating the matrix of masses and stiffnesses of the structure), but this process is time-consuming and requires extensive analysis. The described procedure, which uses a semi-supervised approach, circumvents this issue by relying solely on undamaged data during the training stage to detect structural decay without utilizing FE numerical models.
According to its definition, training a VAE on undamaged data involves the approximation of their intractable true posterior through their latent representation. In [70], an anomaly is defined as an observation that differs so much from regular data that it is considered to be generated by a different mechanism. This definition suggests that undamaged and damaged data have distinct true posteriors. Leveraging this aspect, the probabilistic encoder generates different latent distributions if data are heterogeneous (i.e., including both undamaged and damaged data), which leads the probabilistic decoder to an erroneous reconstruction whenever the latent distribution differs from that of the undamaged data. Then, after a feature extraction stage, data are fed into an OC-SVM in order to learn a decision boundary that separates undamaged data from damaged data, and thus to classify new input datapoints as damaged or not. A representation of the framework is shown in Figure 1. In the following subsections, the VAE and OC-SVM models are explained.

Fig. 1: Graphical representation of the proposed architecture. Data are first fed into a VAE. Then, using the original and reconstructed signals, after a feature extraction stage, data are fed into an OC-SVM to be classified as damaged or not.

Variational Autoencoder
Considering $x$ as data and $z$ as its latent representation involved in the data generation process, a Variational Autoencoder (VAE) is a probabilistic generative model consisting of two main components: a probabilistic decoder, defined by a likelihood function $p_\theta(x|z)$ with parameters $\theta$, that generates new data from a latent variable $z$, and a probabilistic encoder, defined by a posterior distribution $q_\phi(z|x)$ with parameters $\phi$, that approximates the intractable true posterior $p_\theta(z|x)$.
To admit inference, VAE training simultaneously optimizes both the parameters $\theta$ and $\phi$ while learning the marginal likelihood of the data in the following generative process:

$$\max_{\theta,\phi} \; \log p_\theta(x)$$

where the marginal log-likelihood $\log p_\theta(x)$ can be decomposed as:

$$\log p_\theta(x) = D_{KL}\big(q_\phi(z|x) \,\|\, p_\theta(z|x)\big) + \mathcal{L}(\theta, \phi; x, z)$$

where $D_{KL}(\cdot)$ stands for the Kullback-Leibler (KL) divergence, which quantifies the difference between two probability distributions $q$ and $p$. Due to the non-negativity of the KL divergence, the term $\mathcal{L}(\theta, \phi; x, z)$ is a lower bound on the marginal log-likelihood, called the Evidence Variational Lower Bound (ELBO), and it can be written as:

$$\mathcal{L}(\theta, \phi; x, z) = -D_{KL}\big(q_\phi(z|x) \,\|\, p(z)\big) + \mathbb{E}_{q_\phi(z|x)}\big[\log p_\theta(x|z)\big]$$

where $p(z)$ is the prior distribution over the latent variables $z$ [71] and the second term is an expected negative reconstruction error between the input data and the data generated as output.
Leveraging this formulation, VAE training can be performed by maximizing the ELBO [58]. However, the expected reconstruction error requires sampling random latent variables $z$ from the approximate posterior $q_\phi(z|x)$, which makes training intractable in practice, since the gradient of the ELBO with respect to the parameters $\phi$ cannot be estimated by backpropagating through the sampling step. This problem can be avoided using the reparametrization trick: assuming the prior $p(z)$ and the posterior $q_\phi(z|x)$ to be Gaussian distributions with a diagonal covariance matrix, with the prior $p(z)$ set to the isotropic unit Gaussian $\mathcal{N}(0, I)$, each random variable $z_i \sim q_\phi(z_i|x) = \mathcal{N}(\mu_i, \sigma_i^2)$ is reparametrized as a differentiable transformation of a noise variable $\epsilon_i \sim \mathcal{N}(0, 1)$ as follows [71]:

$$z_i = \mu_i + \sigma_i \, \epsilon_i$$

Under the framework above, the ELBO can be differentiated and optimized with respect to both the parameters $\phi$ and $\theta$ [17]. In particular, the ELBO can be maximized via gradient-based optimization (in practice, gradient descent on the negative ELBO); this leaves considerable flexibility in modeling both the probabilistic encoder and the probabilistic decoder. A typical choice is the Multi-Layer Perceptron (MLP) Neural Network [72]. In this case, the probabilistic encoder network takes the data $x$ as input and computes the mean and the standard deviation of the approximate posterior $q_\phi(z|x)$ in order to sample the latent variable $z$. Then, the latent variable $z$ is given as input to the decoder network, which generates the reconstruction $\hat{x}$ of the data. The architecture is shown in Figure 2.
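The reparametrization step and the two ELBO terms for a diagonal Gaussian posterior can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this work: the function names, the `decode` callable and the unit-variance Gaussian decoder (which makes the log-likelihood equal to minus half the squared error, up to an additive constant) are assumptions introduced here.

```python
import math
import random


def reparameterize(mu, sigma, rng):
    """Sample z_i = mu_i + sigma_i * eps_i with eps_i ~ N(0, 1), element-wise."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ) for sigma_i > 0."""
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))


def elbo_estimate(x, mu, sigma, decode, rng, n_samples=1):
    """Monte Carlo ELBO estimate: expected log-likelihood minus the KL term.

    Assumption: unit-variance Gaussian decoder, so the log-likelihood is
    -0.5 * squared reconstruction error (additive constant dropped).
    """
    rec = 0.0
    for _ in range(n_samples):
        z = reparameterize(mu, sigma, rng)
        x_hat = decode(z)  # hypothetical decoder: any callable z -> x_hat
        rec += -0.5 * sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return rec / n_samples - kl_to_standard_normal(mu, sigma)
```

Because the noise is drawn independently of `mu` and `sigma`, the sample `z` is a deterministic, differentiable function of the encoder outputs, which is what allows gradients to reach the encoder parameters in an automatic-differentiation framework.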

One-Class Support Vector Machine
Considering input data as points defined in a vector space, a Support Vector Machine (SVM) [73] is a two-class method that classifies data according to a decision hyper-plane that maximizes the separation between the two classes. SHM researchers have been attracted by SVMs because of their robust generalization capabilities [74][75][76]. However, in order to detect damage in a monitored structure, the use of an SVM implies that both undamaged and damaged data of the structure must be available during the training stage.
A One-Class Support Vector Machine (OC-SVM), instead, is a method that requires only data from one class to train the model. The fundamental objective of the training stage in an OC-SVM is to determine a hyper-plane that accurately delimits the region including the training samples [77]. This is achieved by solving the following optimization problem:

$$\min_{w, \xi_i, \rho} \; \frac{1}{2}\|w\|^2 + \frac{1}{vN}\sum_{i=1}^{N}\xi_i - \rho \quad \text{subject to} \quad w \cdot \Phi(x_i) \ge \rho - \xi_i, \;\; \xi_i \ge 0, \;\; i = 1, \dots, N$$

where $N$ refers to the number of training samples, $w$ refers to the decision hyper-plane weights, $\rho$ is the offset of the hyper-plane, $x_i$ is the $i$-th training sample, $\Phi(\cdot)$ is a function that maps data from the original space $X \subseteq \mathbb{R}^d$ into a new feature space $F \subseteq \mathbb{R}^{d'}$, allowing the kernel trick $\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j)$, $\xi_i$ is a slack variable controlling how much error is allowed during the training stage, and $v \in (0, 1]$ controls the proportion of outliers (i.e., training data lying outside the estimated region) as well as the number of support vectors. Using quadratic programming and Lagrange multipliers, the optimization problem above can be transformed into the following dual form:

$$\min_{\alpha} \; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i \alpha_j K(x_i, x_j) \quad \text{subject to} \quad 0 \le \alpha_i \le \frac{1}{vN}, \;\; \sum_{i=1}^{N}\alpha_i = 1$$

where $\alpha_i$ is the Lagrange coefficient of the $i$-th training sample $x_i$. The nonzero coefficients $\alpha_i$ determine the support vectors required to evaluate the decision function for a new test point $x$:

$$f(x) = \operatorname{sgn}\Big(\sum_{i=1}^{N}\alpha_i K(x_i, x) - \rho\Big)$$

The test point $x$ is outside the estimated region when the decision function $f(x)$ returns a negative value; otherwise it is inside [26,55,77]. In this work, we use the Radial Basis Function (RBF) as the kernel function $K$. In this way, the optimization problem involves the search for a hyper-sphere, rather than a hyper-plane, to estimate the region of the data. Moreover, we have set the parameter $v \approx 0$, since we are interested in capturing as many training samples as possible in the region of interest fitted by the OC-SVM. A graphical representation of an OC-SVM hyper-sphere is shown in Figure 3.
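How a fitted OC-SVM classifies a new point can be illustrated by evaluating the kernel expansion directly. The sketch below is hypothetical: the support vectors, the coefficients `alphas`, the offset `rho` and the RBF width `gamma` (using the common parameterization K(a, b) = exp(-gamma * ||a - b||^2)) are made-up illustrative values, not quantities fitted in this work, and the sign function is omitted so that the signed score itself is visible.

```python
import math


def rbf_kernel(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def decision_function(x, support_vectors, alphas, rho, gamma):
    """Signed score sum_i alpha_i K(x_i, x) - rho; negative means outside."""
    score = sum(a * rbf_kernel(sv, x, gamma)
                for sv, a in zip(support_vectors, alphas))
    return score - rho


# Hypothetical fitted quantities (illustrative values only).
support_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
alphas = [0.4, 0.3, 0.3]  # non-negative and summing to 1, as in the dual problem
rho = 0.5
gamma = 1.0

inside = decision_function([0.3, 0.3], support_vectors, alphas, rho, gamma)
outside = decision_function([5.0, 5.0], support_vectors, alphas, rho, gamma)
print(inside > 0, outside < 0)
```

A point close to the support vectors gets a positive score (classified as undamaged), while a point far from all of them has kernel values near zero and its score falls to roughly `-rho` (classified as damaged).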

Experimental assessment
The architecture proposed in this work was evaluated on the benchmark dataset from the case study related to the steel frame tested in Phase II of the SHM benchmark problem [78], whose results were published in 2003 by the International Association for Structural Control (IASC) -American Society of Civil Engineers (ASCE) Structural Health Monitoring Task Group. The results of the experimental assessment are compared with the performances obtained by the method proposed in [37] on the same dataset and with the performances obtained by substituting VAE with a standard AE, thus following the approach proposed in [55]. In this Section, firstly details on the benchmark dataset are provided. Then, details regarding how data were arranged and specifics about the model selection stage involved in the experimental phase are described. Finally, results are shown and discussed.

Case study: Experimental phase II of the SHM benchmark data
The frame is a four-story steel structure built at the University of British Columbia (Figure 4). The dimensions are 2.5 m × 2.5 m in plan, and the total height is 3.6 m. The structural elements are made of hot-rolled, grade 300W steel. The columns are B100x9 sections and the beams are S75x11. In each span, the bracing system is composed of two threaded steel bars with a diameter of 12.7 mm, inserted along the diagonal. To make the mass distribution reasonably realistic, four slabs of 1000 kg were placed on the first, second and third floors, while slabs of 750 kg were used on the fourth. Further information can be found in [78]. Twelve accelerometers were placed on the structure as shown in Figure 5. On each floor, 3 accelerometers were installed, on the west (in black), east (in red) and central (in blue) columns. All sensors are monoaxial: the accelerometers located on the west and east columns are oriented along the +X direction, while those on the central column are oriented along the +Y direction. In this paper, the signals are caused by shaker excitation, i.e., a band-limited white noise with components in the 5-50 Hz range.
Accelerations were recorded in the absence (Case 1) and in the presence of structural damage. Eight cases of damage were simulated. Table 1 and Figure 6 summarize the various damage scenarios, in which the intensity gradually increases from Case 2 to Case 9. The simulated structural damage consists in the removal of diagonal stiffening elements in Cases 2 to 7, while the loosening of the connecting bolts is added in Cases 8 and 9. Figure 7 shows data distributions for each sensor and for each case.

Table 1: Damage scenarios.

Case  Description
1     Undamaged
2     On the first floor, diagonal element is removed in one bay
3     On the first and the fourth floors, diagonal elements are removed in one bay
4     On all floors, diagonal elements are removed in one bay
5     All braces are removed in the east face
6     On east face all braces are removed, while on north face of the second floor, braces are removed
7     All braces are removed
8     Case 7 + loosening of the connecting bolts for two beams
9     Case 7 + loosening of the connecting bolts for all beams in the east face

Data arrangement
Data from Experimental Phase II were preprocessed following the setup proposed in [37]. In particular, each damage case $S_i$, with $1 \le i \le 9$, was considered as a set of signals collected by $n$ sensors, with each signal divided into $n_{ij} = \lfloor d_j / s \rfloor$ frames of length $s$, where $d_j$ is the number of samples in the signal. Then, data were shuffled and normalized between 0 and 1, unlike in [37], where data were normalized between -1 and 1. The normalization stage used the minimum and maximum values computed over the whole training dataset for each sensor. Before starting the training stage, 20% of the samples from Case 1 were set aside in order to evaluate the framework on unseen undamaged data as well. Following the experimental setup in [37], accelerations measured on the structure during the random shaker excitation in the 5-50 Hz range were used. Acceleration measurements were sampled at 200 Hz. Data were measured for 120 s in Cases 1-5, for 300 s in Case 6 and for 360 s in the remaining cases. As explained above, a separate architecture was trained for each accelerometer using only undamaged data (Case 1). A length of $s = 128$ was used to divide each signal into frames, thus obtaining 187 frames for Cases 1-5, 468 frames for Case 6 and 562 frames for Cases 7-9.
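The frame counts reported above follow directly from the sampling rate, the recording durations and the frame length. The sketch below reproduces them and shows a per-sensor min-max scaling; it is a minimal illustration in which the function names are hypothetical and the frames are assumed to be non-overlapping (which is consistent with the reported counts).

```python
def split_into_frames(signal, s):
    """Split a 1-D signal into floor(len(signal) / s) non-overlapping frames."""
    n_frames = len(signal) // s
    return [signal[k * s:(k + 1) * s] for k in range(n_frames)]


def min_max_normalize(frames, lo, hi):
    """Scale values to [0, 1] using the min/max of one sensor's training data."""
    span = hi - lo
    return [[(v - lo) / span for v in frame] for frame in frames]


sampling_rate_hz = 200
frame_length = 128

# Recording durations in seconds: Cases 1-5, Case 6, Cases 7-9.
frame_counts = [
    len(split_into_frames([0.0] * (duration_s * sampling_rate_hz), frame_length))
    for duration_s in (120, 300, 360)
]
print(frame_counts)  # [187, 468, 562]
```

Since the minimum and maximum come from the training data only, normalized values from damaged cases may fall slightly outside [0, 1].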

Model selection
A fundamental phase in using machine learning algorithms consists in finding the best set of hyperparameters, i.e., the set of parameters of both the ML model and the learning algorithm that remain unchanged during the learning phase and whose values influence the final ML model performance on a given dataset [79]. This stage is often referred to as model selection. Examples of hyperparameters related to our proposal are the number of layers of the probabilistic encoder and the dimensionality of the latent space z of the VAE. Different approaches are known in the literature for evaluating an ML model on some data during the hyperparameter search, such as the holdout method [80]. In our work, since only data related to the undamaged structure are involved in the training process, and since this set of data has a not-too-small number of samples, we chose k-fold Cross-Validation, which is commonly used for its statistical significance [79]. In particular, in our experiments we set k = 10 to determine the data partitioning. In order to explore and evaluate different sets of hyperparameters, we relied on hyperparameter optimization algorithms since, due to the high number of hyperparameters of the overall architecture, manual tuning would have been too time-consuming. Among the different algorithms proposed in the literature, we chose Bayesian optimization [81].

Fig. 7: Graphical representation illustrating the data distributions for each sensor in every damage case of the benchmark dataset. Box plots were utilized to represent these distributions, and it can be observed that the distributions are not only significantly overlapping but also similar in the majority of the cases, suggesting that distinguishing between the damages may require additional analysis beyond examining the data alone.
In this work, the VAE model selection stage was performed separately for each sensor, using 100 trials of Bayesian optimization to minimize the reconstruction error averaged over the validation sets produced by the k-fold Cross-Validation. MLP Neural Networks were adopted as the architecture for both the probabilistic encoder and the probabilistic decoder. Search spaces for the hyperparameters were established during a preliminary manual analysis, with the aim of minimizing the computational time needed for the overall model selection stage. The specific details of these search spaces can be found in Table 2. For each fold, 20% of the data were extracted from the training set and used as validation set. The number of epochs was set to 1000, and early stopping with a patience of 50 epochs was used as convergence criterion. As a result of the model selection stages, Shallow Neural Networks (i.e., MLP Neural Networks with 1 hidden layer) with the Sigmoid activation function turned out to be the best architecture for the VAE's probabilistic decoders and probabilistic encoders. Since the selected number of neurons in the hidden layer and the selected latent dimension fell in neighborhoods of 40 and 20, respectively, with similar performance, we fixed the final configuration of each network to 40 neurons in the hidden layer and 20 neurons for the latent representation. The VAE training stages were performed using the Adam optimizer [82] with a learning rate of 0.001. The OC-SVM parameter ν was fixed to 0.001 and the RBF was used as kernel function. An example of the undamaged region fitted by the OC-SVM is shown in Figure 8.
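The early stopping criterion with a patience of 50 epochs can be illustrated with a short sketch. The exact rule used in the experiments is not specified, so the definition below (stop once the validation loss has not improved for `patience` consecutive epochs) is an assumption, as are the function name and the synthetic loss curve.

```python
def early_stopping_epoch(val_losses, patience=50):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Assumed rule: stop at the first epoch that lies `patience` epochs after the
    last improvement; otherwise run until the last available epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Synthetic curve: the loss improves for 100 epochs, then stays flat up to epoch 1000.
val_losses = [1.0 / (e + 1) for e in range(100)] + [0.5] * 900
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=50)
print(stop_epoch, best_epoch)
```

With such a rule, the weights from `best_epoch` would typically be the ones kept for evaluation.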

Results
In this subsection, the experimental results related to the application of our proposal to the benchmark problem are reported. As in [55], the following damage-sensitive features were considered:

1. Mean Squared Error (MSE), which measures the reconstruction error between the input acceleration signals and their reconstruction as follows:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \hat{x}_i\right)^2$$

where n is the number of the signal features, x_i is the i-th feature in the original signal and x̂_i is the i-th feature in the reconstructed signal;

2. Original-to-Reconstructed-Signal Ratio (ORSR), which represents the ratio in decibels between the magnitudes of the original signal and its reconstruction.
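A minimal sketch of the two damage-sensitive features follows. The MSE is the standard mean of squared differences. The exact ORSR expression is not reproduced in the text above, so the form used here, 20·log10 of the ratio between the Euclidean norms of the original and reconstructed signals, is an assumption and may differ from the definition in [55]; the function names are hypothetical.

```python
import math


def mse(x, x_hat):
    """Mean squared error between a signal x and its reconstruction x_hat."""
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / n


def orsr_db(x, x_hat):
    """ASSUMED form of ORSR: 20 * log10(||x||_2 / ||x_hat||_2), in decibels."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_x_hat = math.sqrt(sum(b * b for b in x_hat))
    return 20.0 * math.log10(norm_x / norm_x_hat)


# Toy frame and a reconstruction with half the amplitude.
x = [2.0, 0.0, -2.0, 0.0]
x_hat = [1.0, 0.0, -1.0, 0.0]
features = (mse(x, x_hat), orsr_db(x, x_hat))
print(features)
```

Each frame would then be represented by the pair (MSE, ORSR) before being passed to the OC-SVM.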
The performance of the method was evaluated with the score used in [37], so that the results can be compared. For each set S_ij, the probability of damage (PoD) was computed as follows:

$$\mathrm{PoD}_{ij} = \frac{c_{ij}}{n_{ij}} \cdot 100$$

where c_ij is the number of samples classified as damaged by the OC-SVM and n_ij is the number of samples in S_ij. Finally, the overall structure score for each case S_i was computed by averaging the PoD values of each sensor:

$$\mathrm{PoD}_{avg,i} = \frac{\mathrm{PoD}_{i1} + \mathrm{PoD}_{i2} + \dots + \mathrm{PoD}_{in}}{n} \quad (10)$$

As described in [37], a low value of PoD_ij indicates a high probability that the signals of case i recorded by the j-th sensor belong to an undamaged state. On the other hand, a high value indicates a high probability of belonging to a damaged state. The same observations hold for the PoD_avg,i value.
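The score computation can be sketched in a few lines. The sketch assumes that PoD is expressed as a percentage of the frames flagged as damaged, and that the classifier labels damaged frames with -1 and undamaged frames with +1 (the convention of common one-class SVM implementations); the function names and the toy predictions are hypothetical.

```python
def pod(predictions):
    """Percentage of frames classified as damaged (label -1, assumed convention)."""
    c = sum(1 for p in predictions if p == -1)
    return 100.0 * c / len(predictions)


def pod_avg(pod_values):
    """Average of the per-sensor PoD values of one structural case."""
    return sum(pod_values) / len(pod_values)


# Toy predictions for three hypothetical sensors of one case.
sensor_predictions = [
    [-1, -1, -1, -1],  # all frames flagged as damaged
    [-1, -1, 1, 1],    # half of the frames flagged
    [1, 1, 1, 1],      # no frame flagged
]
per_sensor = [pod(p) for p in sensor_predictions]
print(per_sensor, pod_avg(per_sensor))
```

The toy case shows how an average of 50% can hide per-sensor values of 0% and 100%, which is why the spread across sensors matters when reading PoD_avg.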
Experimental results are reported in Table 3. We remark that the main aim of our proposal is to perform damage detection from data. The PoD values of each sensor are interpreted as the probability of belonging to the damaged state, with a PoD value of 0% corresponding to an undamaged structure, 100% to a damaged structure and 50% to chance level. The PoD_avg values reflect the a priori known damage conditions of the structure: the damage probability is low for Case 1 (i.e., the undamaged case), while it is high for all the remaining cases (i.e., the damaged cases). PoD_avg values higher than ∼89% are always reached, except for Case 2 and Case 6, where PoD_avg values of ∼70% were obtained. In Case 2, the PoD_avg is lowered by the PoD values of the central sensors. For each damaged case, the PoD values of the individual sensors are not correlated with the relative position of sensor and damage. Therefore, calibrating the framework separately for each sensor does not allow damage localization. Nevertheless, the proposed approach can suggest which sensors are the most effective for monitoring a structure (such as sensors 3, 4 and 12). For instance, Figure 9 shows that the damage is better detected by sensor 12 (lateral sensor) than by sensor 2 (central sensor).
In [37], the PoD_avg values for Case 2 and Case 6 are estimated to be ∼22% and ∼50%, respectively, while in our case they are estimated to be ∼70% and ∼71%. From a probability perspective, the results reported in [37] are close to chance level for Case 6 and close to the undamaged condition for Case 2, while in our case the presence of structural damage is suggested in both cases. Similar observations can be made for the remaining cases shown in [37], such as Cases 3, 4 and 5, where the PoD_avg values do not suggest the presence of damage, even though it is present. Moreover, the PoD_avg values in [37] hide PoD values close to 0 and 100, thus giving a not-too-reliable estimate of the overall structural conditions in some cases: for example, Case 4 is reported to have a PoD_avg value of 39.77 ± 36.24, with min = 0 and max = 100, suggesting, respectively, a fully undamaged and a fully damaged condition of the structure; in our case instead, Case 4 has a PoD_avg value of 99.96 ± 0.16, with min = 96.47 and max = 100, thus providing a more reliable summary of the structural condition. It is also important to point out that, unlike [37], where a supervised damage detection method was proposed, we propose a semi-supervised methodology for damage detection, in which only undamaged data are needed for the training stage.

Table 3: Results on the nine structural cases. For each sensor (rows 1-12), after a description of the sensor position (columns 1-3), PoD values are reported for all the Cases (columns 4-12). The last row reports the PoD values averaged for each Case. In parentheses, the difference from the results obtained using a standard AE is reported.

Analysis on the impact of the VAE
Unlike [55], where damage detection is performed using an architecture composed of an AE followed by an OC-SVM, in our proposal anomaly detection is performed using a VAE followed by an OC-SVM. As in [55], before being fed to the OC-SVM, data are transformed using damage-sensitive features extracted from the original signals and their reconstructions produced by the VAE. As described above, a VAE can learn to produce distributions of data through latent representations generated by its probabilistic encoder. Moreover, unlike standard AEs, VAEs do not learn a deterministic mapping from inputs to their reconstructions, and thus model data variability in the latent representations [14]. In order to verify the advantages of using a VAE instead of an AE in the proposed method, an experimental assessment was made by replacing the VAE with a standard AE while keeping the same architectures. Results are shown in parentheses in Table 3 as the difference from the results obtained with the VAE. The PoD_avg value for the undamaged case (Case 1) is higher than the one reached by our proposal, indicating a lower capability of recognizing undamaged data. Moreover, the PoD_avg values for almost all the remaining cases are lower than those reached by our proposal, meaning that damage is detected with lower probability. This implies that using a VAE entails a more robust damage probability estimation than using a standard AE (4.65% improvement on average). A graphical representation of the PoD_avg obtained with the VAE and the AE is reported in Figure 10.

Assuming that the generating distributions of damaged data differ from that of undamaged data, our proposal aims to learn the latent distribution of undamaged data in order to induce the probabilistic encoder to encode damaged data with different generating distributions. As a consequence, the probabilistic decoder will hardly decode data coming from distributions different from those learned during the training stage, thus resulting in a high reconstruction error.
In order to verify how much the generating distributions of damaged data diverge from that of undamaged data, KL divergences were computed for each sensor and reported in Table 4. Recall that the KL divergence quantifies the difference between two probability distributions q and p. The averaged KL values, reported as KL_avg in Table 4, show that the latent distributions of damaged data diverge more and more as the damage increases, thus confirming the assumption made above. This suggests that latent representations become harder to decode for the probabilistic decoder of the VAE as the damage increases (Figure 11). Moreover, the fact that increasing damage is captured by the VAE's approximation of the generating distributions implies that the amount of damage is implicitly suggested in the damage identification process of our architecture. Using t-SNE [83], latent representations of each case for a randomly chosen sensor are shown in Figure 12.

A traditional method for damage identification in structures is the Frequency Domain Decomposition (FDD) [18]. The method identifies the frequencies associated with the vibration modes of a structure from the analysis of the accelerations recorded on the structure, due to natural vibration or shaking. A change in frequency indicates a change in stiffness: if the frequency decreases, the structure is more deformable, and this could indicate that the structure is experiencing damage. Table 6 shows the frequencies of the first two vibration modes of the healthy structure (Case 1) and of the eight damaged structures (Cases 2-9), obtained by FDD. The percentage variation of each damaged case with respect to the undamaged case is shown in brackets. The traditional FDD technique is scarcely able to detect damage in Case 2 due to the low damage intensity, while it is able to detect damage in Cases 7, 8, and 9, where the frequency values decrease significantly (by more than 60%) because these cases are characterized by the presence of several "damaged" elements. On the contrary, our method identifies all the different structural conditions.
Finally, by comparing the percentage variations shown in Table 5 with the KL_avg values listed in Table 4, we can notice a correspondence between the KL values obtained through the DL-based method and the frequency variations obtained through the traditional FDD method: the higher the frequency variation, the higher the KL value. Thus, the KL value could be considered as a parameter suggesting a quantification of the damage, unlike in [37], where the PoD values were used to quantify the damage.
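As an illustration of how a KL divergence between latent distributions can be evaluated, the sketch below uses the closed form for two Gaussians with diagonal covariance, KL(q‖p) = ½ Σ_d [ln(σ²_p,d / σ²_q,d) + (σ²_q,d + (μ_q,d − μ_p,d)²) / σ²_p,d − 1]. Treating the latent distributions as diagonal Gaussians, as well as the function name and the toy parameters, are assumptions; the text does not state which distributions were compared to produce Table 4.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two Gaussians with diagonal covariance, summed over dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


# Toy 2-D latent statistics: a reference distribution and two increasingly shifted ones.
reference = ([0.0, 0.0], [1.0, 1.0])
slightly_shifted = ([1.0, 0.0], [1.0, 1.0])
strongly_shifted = ([2.0, 0.0], [1.0, 1.0])

kl_small = kl_diag_gaussians(*slightly_shifted, *reference)
kl_large = kl_diag_gaussians(*strongly_shifted, *reference)
print(kl_small, kl_large)
```

The value is zero for identical distributions and grows as the mean shifts away from the reference, which is the behavior one would expect if larger damage moves the latent statistics further from the undamaged ones.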

Noise impact analysis
A series of experiments was conducted to assess the performance of the proposed method across various simulated noise scenarios. Gaussian noise with different sigma levels was added to the data to simulate the noise conditions. Since the magnitude of the input signal was on the order of 10^-3, the sigma level was gradually increased until it reached this threshold. Figure 13 shows the effect of increasing noise levels on the data in two different scenarios, i.e. when noise is already present during the training stage (a) and when noise emerges over time after the training stage has been completed (b). The presence of noise alters the performance of the proposed pipeline only when its level reaches a magnitude comparable to that of the signal data (i.e., 10^-3), which shows that the pipeline is robust to noise both when it is already present during the training stage and when it arises over time.
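The noise injection can be sketched as follows. The function name, the seed, the synthetic signal and the specific sigma grid are assumptions; only the use of zero-mean Gaussian noise and the upper sigma level of 10^-3 come from the text.

```python
import math
import random


def add_gaussian_noise(signal, sigma, seed=0):
    """Return a copy of the signal with zero-mean Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic signal with magnitude on the order of 1e-3.
signal = [1e-3 * math.sin(2.0 * math.pi * 10.0 * t / 200.0) for t in range(10000)]

# Hypothetical sigma grid, increased up to the signal magnitude.
for sigma in (1e-6, 1e-5, 1e-4, 1e-3):
    noisy = add_gaussian_noise(signal, sigma)
    noise_rms = rms([n - s for n, s in zip(noisy, signal)])
    print(f"sigma={sigma:.0e}  noise RMS={noise_rms:.2e}  signal RMS={rms(signal):.2e}")
```

Applying the function to the training frames reproduces the scenario in which noise is present from the start, while applying it only to the test frames reproduces noise that appears after training.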
The traditional technique based on dynamic identification is not effective when the data are affected by noise. In particular, the representation of the first singular value of the power spectrum is strongly distorted by noise when sigma is between 10^-6 and 10^-3. Indeed, the resonance peaks (from which the vibration eigenfrequencies of the structure can be read) are not detected. Conversely, when the noise is reduced, the frequencies are uniquely determined. Figure 14 shows the representation of the first singular value of the decomposed spectrum. The curve for the case without noise (i.e., when data are filtered) is presented in black. The other colors represent the curves obtained from raw data with added noise. Therefore, the frequency variation used as a damage-sensitive feature, and consequently the traditional method, is ineffective in the presence of noise, because noise prevents the identification of the frequencies themselves.

Remarks
In this work, we proposed a framework to perform semi-supervised damage detection in an SHM system based on a VAE and an OC-SVM, in order to minimize human intervention during the data classification process. It is important to note that, even though we focused our study on MLPs, VAEs can be implemented using various other architectures, such as CNNs and RNNs. While we acknowledge that different implementations of VAEs can affect the overall performance of the pipeline, our study primarily focused on examining the functionality of the entire framework to gain insights into its operation. Moreover, there exist alternative generative methods for anomaly detection that could also be explored, e.g. GANs. Additionally, among other ML approaches that may also provide valuable insights, such as SVDD or clustering algorithms, we focused on the OC-SVM since it defines a decision boundary and offers good control over that boundary through several hyperparameters.
Moreover, we set s = 128 in accordance with the setup proposed in [37], as stated above. However, the dimensionality of the sample can affect the outcome. Figure 15 shows that a sample size smaller than ours may reduce the information contained in each sample, leading to a lower PoD_avg despite the larger number of samples. Conversely, incorporating more context (such as s = 256) can improve accuracy, even with fewer samples. Nevertheless, s = 128 appears to be a favorable compromise, as its performance closely matches that of s = 256. Thus, it is plausible that the same result could be achieved with a larger sample size.
Finally, for Case 6, certain sensors (specifically sensors 6, 7, and 8) fail to detect the presence of damage, whereas the remaining sensors exhibit high PoD values. Although the PoD_avg value is reasonably high (approximately 70%), this outcome highlights two aspects. Firstly, there is room for improvement in the algorithm to better identify minor anomalies in the measurements obtained from less damage-sensitive sensors. Secondly, relying solely on the PoD_avg value derived from the networks trained for each sensor could lead to inaccuracies when numerous sensors lack sensitivity to damage.

Conclusions
In this work, we proposed a framework that makes it possible to automate the entire damage identification process (from the training stage to the testing stage), requiring less time than a traditional SHM technique. In particular, if we consider a typical SHM technique (i.e. FDD) that compares the vibration frequencies of the structural system in different conditions to identify anomalies, we have to highlight that (i) the frequency identification is not always unique and (ii) the threshold used to decide whether there is an anomaly is completely arbitrary. The probabilistic nature of a VAE makes it possible to model data heterogeneity with different generating distributions. In the case of undamaged/damaged data, the probabilistic encoder models different data distributions, thus implicitly capturing the damaged states of a structure and resulting in a more robust damage-detection system than one based on a standard AE. Moreover, the KL divergence, which is generally involved in the VAE's training stage, could be evaluated for the cases in which damage is detected in order to quantify it.
Currently, as we have seen in the discussion of the experimental assessment, our framework does not make it possible to localize damage from the scores obtained by the individual sensors. Recently, several methods have been proposed to interpret the decisions of anomaly detection methods using XAI techniques [84]. For this reason, in future work, we would like to extend our framework so that it can not only detect general damage of the structure, but also reliably identify where the damage is located. Moreover, we aim to extend the application of our methodology to more complex structures associated with real-life case studies. This will enable us to evaluate the efficacy and robustness of our approach in practical real-world scenarios. In this context, we intend to tackle situations in which the normal condition of a structure deviates from the normal state described by the training data and settles into a new normal condition. A novel normal state could be determined by several causes, such as changing loads. In this case, we would explore possibilities for adapting the existing normal state to accommodate the new conditions through a refined learning process, such as Transfer Learning techniques.