Genetic Algorithm-Wavelet Transform Feature Extraction for Data-Driven Acoustic Resonance Spectroscopy

Acoustic resonance spectroscopy (ARS) enables highly accurate measurement of the properties (geometry/material) of a structure based on the structure's natural vibrational resonances. In general, measuring a specific property in multibody structures presents a significant challenge due to the complex overlapping peaks within the resonance spectrum. We present a technique for extracting useful features from a complex spectrum by isolating resonance peaks that are sensitive to the measured property and insensitive to other properties (noise peaks). We isolate specific peaks by selecting frequency regions of interest and performing wavelet transformation, where the frequency regions and wavelet scales are tuned via a genetic algorithm. This contrasts greatly with traditional wavelet transformation/decomposition techniques, which use a large number of wavelets at different scales to represent the signal, including the noise peaks, resulting in a large feature size and thus decreased machine learning (ML) generalizability. We provide a detailed description of the technique and demonstrate the feature extraction on example regression and classification problems. We observe reductions of 95% and 40% in regression and classification errors, respectively, when using the genetic algorithm/wavelet transform feature extraction, compared to using no feature extraction or using wavelet decomposition, which is common in optical spectroscopy. The feature extraction has the potential to significantly increase the accuracy of spectroscopy measurements based on a wide range of ML techniques. This has significant implications for ARS, as well as for data-driven methods in other types of spectroscopy, e.g., optical.


I. INTRODUCTION
Acoustic resonance spectroscopy (ARS) is a group of techniques that quantify characteristics of a system based on the natural vibrational modes of the structure. ARS techniques typically use two transducers: a transmitter to excite the structure of interest and a receiver to measure the amplitude and phase of the resulting vibrations in the structure. By sweeping through different excitation frequencies, we measure the acoustic resonance spectrum of the structure, which consists of a series of peaks corresponding to the natural resonances in the system. The locations of the resonance peaks are highly sensitive to the system geometry, materials, and boundary conditions. As a result, the spectrum contains a significant amount of information about the structure. For this reason, frequency-domain techniques have found use in a wide range of applications, including bulk elastic property characterization [1], geometry measurement [2], microscale mass measurement [3], chemical weapons identification [4], and nondestructive damage detection [5], to name a few.
However, determining specific properties and boundary conditions of a structure based on the acoustic resonance spectrum is nontrivial. There are several existing techniques for performing this calculation, each of which is typically specialized to the application. Resonant ultrasound spectroscopy (RUS) enables calculating the elastic properties of a structure by fitting the peaks to a model via optimization [6]. In the case of simple geometries such as a sphere, cylinder, or parallelepiped, an analytical model can be used to rapidly solve for the elastic properties. For components with more complex geometries, numerical models can be used to solve for the elastic properties [7]. However, RUS typically relies on identifying which peaks correspond to each resonant mode and/or requires an accurate initial estimate of the material properties. Thus, RUS is typically limited to structures with well-defined configuration, simple geometry, and homogeneous material properties, which have relatively simple acoustic resonance spectra with well-spaced peaks.
In contrast, structures with multiple structural components tend to have complex acoustic resonance spectra with overlapping sets of peaks. This complex spectrum originates from resonances within individual components or even coupled resonances between components, which typically makes it infeasible to identify which peaks correspond to specific resonant modes. In the case of a 1-D system consisting of a fluid enclosed by thin walls, this coupled resonance spectrum problem was solved using swept-frequency acoustic interferometry (SFAI). Here, SFAI uses knowledge about the acoustic propagation path and time-windowing to isolate the frequency response of the fluid from that of the coupled system [4]. However, the technique is limited to the specific transducer-wall-fluid-wall-transducer configuration, and it may require manual tuning of the frequency region that is considered. Alternatively, data-driven techniques are commonly employed in optical spectroscopy [8]. These techniques are attractive because they can be implemented in situations where spectra contain noise or when intimate knowledge of spectral behavior is not available, as long as a sufficient number of data samples is available. These techniques estimate system properties via regression techniques such as multivariate linear regression [8] and partial least-squares (PLS) regression (PLSR) [9], and supervised machine learning (ML) techniques such as random forests [10] and artificial neural networks [11]. In data-driven techniques, the critical issue is representing the features (spectra) used to estimate the labels (system property of interest) such that variation in the features correlates with variation in the labels. To this end, there are many feature extraction techniques that transform the spectra into features that are useful for regression/learning. These techniques aim to represent the useful aspects of the spectra with the smallest number of features (M < N, for original spectrum length N) to reduce training time and increase the learning accuracy and generalizability. In chemometrics, the wavelet decomposition (WD) is a popular feature extraction technique [12], [13], [14]. Here, a wavelet is convolved over the spectrum, resulting in high values at frequencies where the wavelet is similar to the spectrum. The WD is computed in levels, where in each level, we convolve the wavelet over the spectrum to produce a low-pass spectrum, subtract the low-pass spectrum from the original spectrum to obtain a high-pass spectrum, and then downsample the low-pass spectrum by a factor of two and pass it along to the next level of the WD algorithm. The WD signal comprises the final low-pass spectrum and the high-pass spectra from each level. If the maximum number of levels is used, the extracted features have the same length as the original spectrum, and they can be used to reconstruct the original spectrum exactly. Alternative feature extraction techniques include principal component analysis [8], autoencoder neural networks [15], and polynomial fitting [16]. In each technique, increasing the number of features M → N reduces the error between the original spectra and the transformed features, but it also increases the number of samples needed to train the model to prevent overfitting, where the model learns noise in the spectra rather than the trends that correlate with the desired system property. While these techniques are popular in optical spectroscopy, they are not widely used in ARS.
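To make the level-by-level WD procedure concrete, the following is a minimal Python sketch of the scheme described above; the smoothing kernel and array sizes are illustrative assumptions, not the implementation used in the cited chemometrics work.

```python
import numpy as np

def wavelet_decomposition(x, kernel, n_levels):
    """Level-by-level WD as described above: convolve a low-pass kernel,
    subtract to obtain the high-pass detail, then downsample by two."""
    features, low = [], np.asarray(x, dtype=float)
    for _ in range(n_levels):
        low_pass = np.convolve(low, kernel, mode="same")  # smoothed spectrum
        features.append(low - low_pass)                   # high-pass detail
        low = low_pass[::2]                               # downsample by 2
    features.append(low)                                  # final low-pass term
    return np.concatenate(features)

# Illustrative 5-tap smoothing kernel standing in for the wavelet filter.
kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
coeffs = wavelet_decomposition(np.random.rand(2001), kernel, n_levels=7)
```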
This is at least partially due to the different types of spectral interference in optical spectroscopy versus ARS. In optical spectroscopy, spectra can often be considered to be additive, and thus changing the ratio of species changes the amplitude of the peaks but does not change the peak locations significantly [17]. In contrast, acoustic resonance spectra for coupled structures tend to be more complex, with additive, multiplicative, frequency-shift, and coupled resonance interactions. Additionally, peak amplitudes are highly sensitive to the coupling and environment and typically are not reliable for noninvasive measurements where transducers cannot be permanently attached. Because of the high sensitivity of the peak amplitudes to small changes in the system and/or transducer placement, many chemometric techniques are not effective.
Thus, to overcome the limitations of existing spectroscopy methods, the goal of this article is to present a new feature extraction technique to facilitate applying regression or ML techniques to acoustic resonance spectra. The technique is based on continuous wavelet transformation (CWT) using Ricker wavelets. In contrast to traditional wavelet transform techniques, where a range of wavelet scales is selected to encompass all frequency components, we use a genetic algorithm to determine the optimal combination of wavelet scales to detect and isolate peaks that are robust to noise and sensitive to the desired property. Additionally, the technique identifies the frequency region(s) that provide the most benefit to measuring the desired property. We then employ a peak normalization scheme to emphasize the peak locations, rather than the peak amplitudes. In this article, we detail the genetic algorithm-wavelet transform (GWT) feature extraction method (Section II), and we demonstrate the technique (Section III) on two sets of data reflecting common ML problems: using regression to estimate a parameter from a simulated set of spectra corresponding to a multicomponent system comprising four stacked disks (Section III-A), and using classification to detect cracks within thermoelectric wafers, based on an experimental set of spectra (Section III-B). The key contributions of this article are as follows.
• Feature extraction is used in combination with PLS for performing regression and classification on example multicomponent systems, and it is shown to provide significant benefits in accuracy compared to PLS regression or classification with no feature extraction or with traditional wavelet decomposition feature extraction.
• The feature extraction technique can be applied to any machine learning or regression/classification technique, and it provides a significant reduction in feature complexity for multicomponent acoustic resonance spectra or any other type of spectra.

II. GWT FEATURE EXTRACTION

A. Coupled Acoustic Resonances
As an illustrative example throughout this article, we consider axisymmetric coupled vibrations within stacked disks, as shown in Fig. 1(a). The system consists of four disks of diameter D = 10 mm, variable thicknesses, and materials consisting of: 1) lead zirconate titanate (PZT-5J, red); 2) aluminum 6061-T6 (Al, green); 3) 310 stainless steel (SS, blue); and 4) another PZT-5J. The disks are coupled to one another via a distributed elastic layer with stiffness 10^4 N/m^3, which approximates the coupling behavior of the layers in light contact. Additionally, in some cases, we represent debonding between the PZT and the Al disks by including a 2-mm-diameter, 25-µm-height triangular gap between the disks. We simulate the system in COMSOL, using the solid mechanics module to model the vibrations in each component and the built-in piezoelectric model to account for the electrical-mechanical interactions within the PZT disks. To expedite the simulations, we model the system as axisymmetric, using a quadratic mesh with element sizes of <100 µm to ensure that the elements are at most one-tenth of the size of the smallest simulated wavelength. Fig. 1(b) shows the simulated spectrum for each isolated component (color) with component resonances (black dots) and for the full system (black). For the isolated resonances, we apply a periodic displacement to the lower face and measure the resulting displacement on the upper face. To extract useful information from the complex acoustic resonance spectra, we must address the following challenges.
Challenge 1: Transducer dominance. When using unbacked transducers [PZT-5J layers in Fig. 1(a)], the resonances within the transducers can dominate the spectra and obscure the resonances associated with the components of interest. This is apparent in Fig. 1(b), where the coupled system resonances near the PZT resonances (near 5.2 MHz) have much higher amplitudes than those further from the PZT resonance.
Solution 1: We propose a normalization scheme to suppress the effects of the transducers and accentuate the resonances within the other layers (see Section II-B for details).
Challenge 2: Overlapping spectra. From Fig. 1(b), we observe that the system response consists of overlapping spectra with interactions that are additive, e.g., peaks in each individual spectrum appear in the combined spectrum; multiplicative, i.e., peaks from one component can be emphasized if they are near resonances in another component or suppressed if they are near antiresonances in another component; and coupled, wherein the resonance peaks in individual components shift, merge, or disappear when the components are coupled. As a result, when measuring the properties of one component, the coupled system peaks can be sensitive to small perturbations in the other components or the coupling between components. This is apparent in Fig. 1(c).
Solution 2: We seek features that are sensitive to the peaks that correlate with the measured property and robust to changes in the peaks that are sensitive to "noise," i.e., changes in parameters that we are not measuring. In many cases, peaks attributed to different components have different widths. We propose a method of discriminating useful versus noisy peaks based on peak width so that we can emphasize useful peaks and suppress noisy peaks (see Section II-C for details). Additionally, we can remove sections that do not contain useful peaks so that the ML algorithms can focus on the regions containing useful information (see Section II-D for details).
Challenge 3: Peak amplitude sensitivity. From Fig. 1(c), we observe that, while some peak positions are robust to "noise" parameters, the peak amplitudes can still vary significantly.
Solution 3: To remove the effects of peak amplitude, we can use barcode normalization so that the peaks have uniform amplitude (see Section II-D for details).
Challenge 4: ML model complexity. Larger numbers of frequency samples in each spectrum require more complex ML models. This higher complexity tends to increase the time required for training and decreases the model accuracy [18].
Solution 4: The proposed Solution 2 reduces the number of frequency samples by focusing on smaller frequency windows. Solution 3 further reduces the number of frequency samples by creating a barcoded spectrum that can be downsampled without losing information about the peak locations.
We implement the four proposed solutions using the steps outlined in Sections II-B through II-E.

B. Preprocessing Peak Enhancement
Frequently in ARS, a spectrum will include less dominant (lower amplitude) peaks that contain important information about the system but are obscured by more dominant peaks. We perform a two-step process to enhance the less dominant peaks, as illustrated in Fig. 2. First, we adjust the amplitudes of the peaks using a variable scale adjustment factor. Fig. 2(a) shows the initial raw spectrum X_0 (blue) and the adjustment factor a (green), which is calculated by applying a Gaussian filter to the raw spectrum

a(f) = (X_0 * G)(f)    (1)

where G is the Gaussian filter with empirically set width σ_a > 0, and the notation (X_0 * G)(f) is the convolution of G over X_0. We then calculate the amplitude-adjusted spectrum X_a, shown in Fig. 2(b)

X_a(f) = X_0(f) / (a(f) + γ_a)    (2)

where γ_a > 0 is a constant used to remove any singularities from the denominator. Empirically, we set γ_a = median(X_0)/20. We observe that, compared to the raw spectrum peaks, the adjusted spectrum peaks now have similar amplitudes. Next, we emphasize peaks in X_a that may be obscured by neighboring resonances based on the second derivative X''_a. Here, peaks in X_a correspond to local minima in X''_a. Fig. 2(c) shows an example of the preconditioned signal X_p (black) with emphasized peaks, which is calculated as follows:

X_p(f) = X_a(f) − γ_p X''_a(f)    (3)

where γ_p is a positive, scalar weight factor that is selected empirically. In the inset image in Fig. 2(c), we observe that some peaks in the adjusted spectrum were sufficiently close that they overlapped, obscuring the smaller peak. In contrast, the peaks of the preconditioned spectrum are easily distinguished from one another.
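A minimal Python sketch of this preprocessing follows, assuming the reconstructed forms of (1)-(3) above; σ_a is expressed here in frequency samples rather than Hz, and the function name is illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def precondition(x0, sigma_a_samples, gamma_p):
    """Peak enhancement per (1)-(3): amplitude adjustment by a
    Gaussian-filtered envelope, then second-derivative peak emphasis."""
    a = gaussian_filter1d(x0, sigma_a_samples)   # adjustment factor, (1)
    gamma_a = np.median(x0) / 20.0               # empirical singularity guard
    x_a = x0 / (a + gamma_a)                     # amplitude-adjusted, (2)
    x_a2 = np.gradient(np.gradient(x_a))         # second derivative X_a''
    return x_a - gamma_p * x_a2                  # emphasized peaks, (3)
```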

C. Continuous Wavelet Transform Peak Isolation
Within the coupled system, some resonances are more sensitive to the desired property being measured, while others are more sensitive to noise from, e.g., changes in other system properties or environmental conditions. Thus, we seek a method of isolating peaks from different components. Based on the observation from Solution 2 in Section II-A, we can isolate some peaks based on differences in the peak shape. To isolate peaks of differing shape, we calculate the CWT X_CWT(f; σ_CWT), given as [19]

X_CWT(f; σ_CWT) = (X_p * ψ)(f; σ_CWT)    (4)

To facilitate isolating individual peaks, it is desirable to select a wavelet with a single central peak. Additionally, a wavelet with a single parameter is desirable to simplify parameter optimization. Thus, we select the Ricker wavelet (also known as the Mexican hat wavelet), which is defined as

ψ(f; σ_CWT) = (2 / (√(3σ_CWT) π^(1/4))) (1 − (f/σ_CWT)^2) exp(−f^2/(2σ_CWT^2))    (5)

where σ_CWT is the scale of wavelet ψ.
The result of (4) and (5) is a transformed spectrum X_CWT that resembles the peaks of spectrum X_p with widths of approximately σ_CWT. Fig. 3 shows a spectrum X_p and the transformed spectra X_CWT for several wavelet scales [Fig. 3(a)-(c)].

[Fig. 3. Preconditioned spectra X_p and the continuous wavelet transformed spectra X_CWT using Ricker wavelets (insets) with various wavelet scales σ_CWT to isolate (a) narrow, (b) medium-width, and (c) wide peaks.]

We observe that larger wavelet scales isolate wider peaks, while neglecting the finer peaks, and smaller wavelet scales isolate narrower peaks. The critical challenge in isolating the peaks of a spectrum is identifying the optimal number of wavelets and combination of wavelet scales to implement, as will be discussed in Section II-E. We note that this peak isolation scheme depends on differences in peak widths, where the widths of the peaks depend on the geometry, materials, and boundary conditions of the coupled components. It is possible for all components to have identical resonance peak widths, which would inhibit the peak isolation process. However, in most real-world applications, this is highly unlikely. Thus, the peak isolation process is applicable for most ARS applications.
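As a concrete illustration of (4) and (5), the Python sketch below convolves Ricker wavelets of several scales over a preconditioned spectrum; the scales, spectrum, and ±5σ wavelet support are assumptions for illustration.

```python
import numpy as np

def ricker(t, sigma):
    """Ricker (Mexican hat) wavelet with scale sigma, per (5)."""
    norm = 2.0 / (np.sqrt(3.0 * sigma) * np.pi**0.25)
    return norm * (1.0 - (t / sigma) ** 2) * np.exp(-t**2 / (2.0 * sigma**2))

def cwt_single_scale(x_p, sigma_samples):
    """Convolve one Ricker wavelet over the spectrum, per (4)."""
    half = int(5 * sigma_samples)                # truncate support at +/- 5 sigma
    t = np.arange(-half, half + 1, dtype=float)
    return np.convolve(x_p, ricker(t, sigma_samples), mode="same")

# Transform a spectrum at several scales to isolate narrow vs. wide peaks.
x_p = np.random.rand(2001)                               # placeholder spectrum
x_cwt = [cwt_single_scale(x_p, s) for s in (3, 16, 75)]  # scales in samples
```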

D. Barcode Normalization
ML and regression techniques utilize the amplitudes of one or more features to estimate a parameter of interest. In the case of spectroscopy, the most basic set of features is the spectrum amplitudes at each frequency. In the case of ARS, the peak amplitudes are highly sensitive to "noise" arising from small variations of the contact between components, environmental conditions, etc. In contrast, the peak positions are more robust to noise and provide a more accurate representation of the system geometry, material properties, and boundary conditions. Thus, the features used in the ML or regression algorithms would ideally represent the resonance peak frequencies, rather than the peak amplitudes. In RUS, the features perfectly represent the peak frequencies and ignore the peak amplitudes by selecting the list of peak frequencies as the features [1]. However, this process requires specific peaks to be stored in specific elements of the feature list. This becomes infeasible with multicomponent systems, where noise and small perturbations in the system can cause peaks to be deleted, inserted, or even swapped in position.
Alternatively, we can remove the effect of peak amplitude through barcode normalization, which is common in spectroscopy [20]. Fig. 4 shows an example of this process. Given a spectrum X (blue in Fig. 4), we first locate all of the peaks, which are the regions where X has a local maximum. We then create a binary signal X_bin (black lines in Fig. 4) with amplitude equal to 1 at the peak frequencies of X and 0 everywhere else. Finally, we convolve a Gaussian filter with standard deviation σ_BC over the binary signal to compute the barcoded spectrum X_BC (red in Fig. 4). Because the barcoded signal consists of multiple Gaussian peaks, each with width σ_BC, there is an upper limit on the frequency content in X_BC. Thus, we can downsample X_BC to a certain extent without losing important information, i.e., we can still identify the peak locations. We qualitatively determine that the spectrum retains sufficient detail if we downsample by a factor of DS = round(σ_BC/(3Δf)), for frequency spacing Δf, as shown by the downsampled X_BC (black dotted line in Fig. 4). The Gaussian filtering process is critical for two reasons.
1) The binary spectrum is highly sensitive to noise, where any shift in the peak frequency results in a 100% change in the binary spectrum amplitude. The Gaussian filter replaces the abrupt peaks with gradual peaks so that small perturbations in the peak frequency are less significant.
2) Broadening the binary peaks into wider Gaussian peaks enables downsampling the spectrum. This reduces the number of features used for ML or regression, which decreases the required training time and increases the algorithm accuracy.
For each CWT spectrum X_CWT(f; σ_CWT), we apply the barcode normalization process using the barcode width σ_BC = σ_CWT. To extract useful features from a raw signal X_0, we perform preprocessing to calculate X_p (Section II-B), apply the CWT to calculate the transformed spectra X_CWT for N_CWT scales σ_CWT (Section II-C), and then perform barcode normalization to obtain N_CWT barcoded spectra X_BC (Section II-D). To further reduce the length of the feature vector, we select N_f frequency regions from the spectrum, each defined by a center frequency f_c and bandwidth f_bw. We then concatenate the N_CWT × N_f barcoded spectra into a single feature vector X.
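The Python sketch below illustrates the barcode normalization and downsampling steps, with σ_BC expressed in frequency samples; the peak finder and Gaussian filter are stand-ins for whatever the authors' MATLAB implementation uses.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.ndimage import gaussian_filter1d

def barcode(x, sigma_bc_samples):
    """Barcode normalization: unit impulses at peak locations, Gaussian
    broadening, then downsampling by DS = round(sigma_BC / (3 * df))."""
    peaks, _ = find_peaks(x)                  # local maxima of the spectrum
    x_bin = np.zeros_like(x, dtype=float)
    x_bin[peaks] = 1.0                        # binary peak-location signal
    x_bc = gaussian_filter1d(x_bin, sigma_bc_samples)
    ds = max(1, round(sigma_bc_samples / 3))  # sigma_BC already in samples
    return x_bc[::ds]                         # downsampled barcoded spectrum
```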

E. Feature Extraction Parameter Selection
Within the feature extraction steps, there are several parameters, as listed in Table I, that must be tuned to achieve optimal estimation performance. We collect these parameters into a single vector

θ = [σ_a, γ_a, γ_p, σ^(1)_CWT, ..., σ^(N_CWT)_CWT, f^(1)_c, f^(1)_bw, ..., f^(N_f)_c, f^(N_f)_bw, θ_ml]

Here, the superscripts indicate each of the N_CWT wavelet widths and the N_f frequency regions with center f_c and bandwidth f_bw. Additionally, we include any regression/ML hyperparameters θ_ml, which depend on the selected algorithm. Note that, in this study, we select the PLS algorithm to estimate the system property of interest. Thus, θ_ml contains a single integer coefficient that reflects the number of factors included in the PLS estimate.
We tune the parameters to achieve the highest property estimation accuracy using a genetic algorithm, which is a global optimization technique that minimizes an objective function based on the principles of natural selection [21]. Fig. 5 shows the process used to tune the parameters [Fig. 5(a)] and an example convergence plot [Fig. 5(b)] with the best (black) and mean (blue) error at each generation. The algorithm is trained based on input data consisting of multiple raw spectra X_0 and corresponding labels Y, along with a population of parameters θ. Within each iteration of the genetic algorithm, we perform the following steps.
Step 1: We calculate X_p by preprocessing the raw spectra to adjust the amplitudes and emphasize obscured peaks, using values of σ_a, γ_a, and γ_p from θ (Section II-B).
Step 2: We compute the CWT spectra {X^(j)_CWT} for the N_CWT wavelet widths σ^(j)_CWT in θ (Section II-C).
Step 3: We apply barcode normalization to each CWT spectrum, using barcode width σ_BC = σ^(j)_CWT, to obtain the barcoded spectra X_BC (Section II-D).
Step 4: For each wavelet width j, we extract N_f frequency regions, where frequency region k has center frequency f^(k)_c and bandwidth f^(k)_bw (Section II-E). As a result, for each of the original spectra in X_0, there are now N_CWT × N_f spectra with reduced length, compared to the original spectra. We then concatenate the N_CWT × N_f spectra into a single vector

X = [X^(1,1)_BC, X^(2,1)_BC, ..., X^(N_CWT,1)_BC, X^(1,2)_BC, X^(2,2)_BC, ..., X^(N_CWT,N_f)_BC]    (6)

where the superscript indices (i, j) correspond to the ith wavelet width and the jth frequency region. Note that the specific order in which the X^(i,j)_BC barcoded and windowed spectra are concatenated into X is arbitrary.
Step 5: Perform regression/ML to estimate labels Y from features X. To ensure that we do not overfit the data, we perform k-fold cross-validation, wherein we randomly separate the samples into five groups, known as "folds." We iterate over the folds, where for each fold k, we define a validation set (X^(k)_val, Y^(k)_val) as the kth fold, and we define a training set (X^(k)_train, Y^(k)_train) by combining the remaining folds. We then train the regression/ML model on the training set and use the trained model to estimate the validation labels Ŷ^(k)_val, resulting in a validation error ε^(k)_val, which is averaged over all of the samples in the kth fold. The exact error function used can vary depending on the type of problem (see Section III). After iterating over the k folds, we compute the average validation error ε̄_val as the mean of ε^(k)_val over the folds. The objective of the genetic algorithm is to iterate through different combinations of parameters θ until ε̄_val converges to a global minimum. The genetic algorithm then outputs the optimal parameters θ*.
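A skeletal version of this tuning loop is sketched below in Python. The feature extraction stages of Steps 1-4 are collapsed into a placeholder extract_features, and SciPy's differential evolution stands in for the genetic algorithm (both are population-based global optimizers); this illustrates the cross-validated objective, not the authors' MATLAB code.

```python
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold

def extract_features(x0, theta):
    """Placeholder for Steps 1-4 (preconditioning, CWT, barcoding, and
    windowing per Sections II-B to II-D); returns the feature vector X."""
    return x0  # stub: a real implementation applies the GWT pipeline

def cv_error(theta, X0, Y, n_folds=5):
    """Step 5: average k-fold cross-validation error for parameters theta."""
    X = np.vstack([extract_features(x, theta) for x in X0])
    errs = []
    for tr, va in KFold(n_folds, shuffle=True, random_state=0).split(X):
        pls = PLSRegression(n_components=int(round(theta[-1])))
        pls.fit(X[tr], Y[tr])
        errs.append(np.mean(np.abs(pls.predict(X[va]).ravel() - Y[va])))
    return float(np.mean(errs))

# Differential evolution stands in for the genetic algorithm; `bounds`
# holds (min, max) pairs for each entry of theta, ending with the number
# of PLS factors. The call is left commented since data are not defined:
# result = differential_evolution(cv_error, bounds, args=(X0, Y))
```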

F. Notes on Algorithm Implementation
GWT feature extraction provides a technique for extracting useful features from complex acoustic resonance spectra of multibody structures. However, there are some limitations of the technique.
First, the peak isolation method (Section II-C) relies on differences in the widths of peaks corresponding to different components. However, for multiple components with similar geometry and material, the peak widths may be too similar to isolate them from one another. As an example, if the goal were to measure the thickness of the lower PZT layer, it would be difficult to distinguish its peaks from those of the upper PZT layer. This can be alleviated by using higher frequencies; however, the spectra also become more sensitive to noise sources at these higher frequencies.
Another limitation is the implementation of the genetic algorithm, which solves a nonconvex optimization problem. To accomplish this, the user must define maximum and minimum values for the hyperparameters, including the frequency window centers and widths, the number of frequency windows, and the number and widths of wavelets. As the number and ranges of variables increase, the probability of finding the globally optimal set of hyperparameters decreases. To alleviate this problem, one can reduce the hyperparameters based on knowledge of the system, or employ an adaptive range reduction scheme and/or scaled crossover probabilities [22].

III. EXAMPLE IMPLEMENTATIONS
In this section, we examine the performance of the GWT feature extraction technique for two problems that are common to ARS: regression, which is used to estimate system parameters (Section III-A), and classification, which is used to detect anomalies or distinguish between different categories of structures (Section III-B). To demonstrate GWT for a regression problem, we use COMSOL to simulate the spectra of the multicomponent stacked-disk system shown in Fig. 1(a), and we then estimate the thickness of the Al disk h_Al via PLSR. To demonstrate GWT for a classification problem, we experimentally measure the vibrational responses of bismuth telluride (BiTe) thermoelectric wafers, and we identify the presence of microcracks using PLS classification. PLS enables estimating parameters Ŷ from the measured spectra X according to either

Ŷ_r = [1, X^T] w, for regression, or    (7)

Ŷ_c = ([1, X^T] w > 0.5), for classification    (8)

where 1 is a vector of ones of length N_s for N_s samples, and w is the PLS weight vector (see [8] for details). The key difference between (7) and (8) is that the estimated regression parameters Ŷ_r can take on any real value, while the estimated classification parameters Ŷ_c are limited to binary values (0 or 1). We note that any ML technique can easily be substituted for the PLS method. To quantify the performance of the regression algorithms, we compute the regression errors as

ε = Ŷ_r − Y    (9)

Alternatively, for the classification error, if the numbers of samples N_0 and N_1 belonging to each class are unbalanced (N_0 ≉ N_1), we weigh the errors to ensure that the algorithm is not biased toward one class. We compute this weighted error as

ε_c = (1/2)(ε_FP/N_0 + ε_FN/N_1)    (10)

where ε_FP is the number of false positives, i.e., samples where Y = 0 and Ŷ = 1, and ε_FN is the number of false negatives, i.e., Y = 1 and Ŷ = 0.
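As an illustration of (7)-(10), the sketch below uses scikit-learn's PLSRegression as a stand-in for the paper's MATLAB PLS, thresholding the real-valued estimate for classification and computing the reconstructed balanced error; the function names are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def pls_estimate(X_train, y_train, X_test, n_factors, classify=False):
    """PLS estimate per (7); thresholded at 0.5 per (8) for classification."""
    pls = PLSRegression(n_components=n_factors)
    pls.fit(X_train, y_train)                  # fits weights w and intercept
    y_hat = pls.predict(X_test).ravel()        # real-valued estimate, (7)
    return (y_hat > 0.5).astype(int) if classify else y_hat  # binary, (8)

def balanced_error(y_true, y_pred):
    """Weighted classification error per the reconstructed (10)."""
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    n0, n1 = np.sum(y_true == 0), np.sum(y_true == 1)
    return 0.5 * (fp / n0 + fn / n1)
```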
For both examples, we benchmark the performance of the GWT feature extraction with PLS (denoted GWT-PLS) against the PLS technique without feature extraction (denoted PLS) and against PLS with WD feature extraction (denoted WD-PLS), which is common for optical spectroscopy [12], [13]. We note that, for the benchmark techniques, we perform a grid search to optimize the hyperparameters, such as the number of PLS coefficients and the WD wavelet type (Coiflets, Daubechies, and Symlets [23]), order, and number of levels. This ensures that we are comparing the GWT-PLS performance to the best-achievable performance of the WD-PLS and PLS algorithms.
To prevent overfitting of the models, we randomly select 10% of the samples to set aside as test data and then train the models on the remaining data samples. We use fivefold cross-validation: we randomly divide the remaining data samples into five sets, use four of the sets to train the model, and then calculate the estimation error for the remaining validation set. We repeat this process for each combination of training/validation sets. We then analyze the performance of each algorithm in terms of the validation error as well as the test set error, where, for the test set error, the algorithm has been trained using all five folds.
We implement the PLS, WD-PLS, and GWT-PLS algorithms in MATLAB [24]. In MATLAB, the genetic algorithm works with a fixed number of variables. As a result, we select a fixed number of frequency regions and wavelet widths. In the future, we will implement a genetic algorithm that can accommodate any number of frequency regions and wavelet widths. For the purposes of this demonstration, we select two frequency regions and four wavelet widths. Using these numbers of parameters and the simulated spectra, the GWT-PLS algorithm was trained until the minimum error in the generation population remained unchanged for 50 generations (achieved after 186 generations). This training required approximately 36 min to complete on a 12-core Intel Xeon Gold 5118 processor (2.30 GHz) with 32-GB RAM available. We note that the training time could be reduced by training populations in parallel, vectorizing the genetic algorithm over the entire population in each generation, or performing the preprocessing (Step 1) outside of the genetic algorithm.

A. Regression Example: Estimating the Thickness of a Disk Within a Simulated Stack of Disks
To demonstrate the GWT feature extraction algorithm for regression, we simulate the spectra of the multicomponent stacked-disk system shown in Fig. 1(a). In this simulated example, the goal is to use the simulated spectrum to estimate the Al disk thickness h Al , for which we simulate values 1.0 mm ≤ h Al ≤ 2.0 mm, in steps of 0.10 mm. We introduce "noise" to the measurements by varying the thicknesses of the steel (9.5 mm ≤ h SS ≤ 10.5 mm in steps of 0.25 mm) and PZT (0.4 mm ≤ h PZT ≤ 0.6 mm in steps of 0.05 mm) layers, and we either include or exclude a debonded region of diameter 2 mm between the Al and lower PZT disk. We simulate all permutations of the variables, which results in 550 spectra, each with 2001 values covering the frequency range 1-5 MHz, which was selected to ensure a sufficiently large number of peaks in a reasonable amount of simulation time.
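For reference, the simulated parameter grid can be enumerated as below; the 11 × 5 × 5 × 2 permutations reproduce the 550 spectra quoted above (variable names are illustrative).

```python
import itertools
import numpy as np

# Parameter permutations for the simulated dataset; 11 x 5 x 5 x 2 = 550.
h_al = np.round(np.arange(1.0, 2.0 + 1e-9, 0.10), 2)     # Al thickness, 11 values
h_ss = np.round(np.arange(9.5, 10.5 + 1e-9, 0.25), 2)    # SS thickness, 5 values
h_pzt = np.round(np.arange(0.40, 0.60 + 1e-9, 0.05), 2)  # PZT thickness, 5 values
debond = (False, True)                                    # debonded region, 2 cases
grid = list(itertools.product(h_al, h_ss, h_pzt, debond))
assert len(grid) == 550
```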
From the hyperparameter grid search for the PLS algorithm, we identify the optimal number of coefficients N*_PLS = 25. For the WD-PLS technique, we identify the optimal number of coefficients N*_PLS = 26, and optimal performance is found using seven levels of a third-order Daubechies wavelet, which results in a slight increase in the feature length to 2034 terms.
After training the GWT-PLS algorithm, we observed optimal parameters θ* that included N*_PLS = 25 PLS coefficients, an amplitude adjustment filter width σ_a = 0.51 MHz, a peak emphasis weight factor γ_p = 0.94, frequency regions (f_c = 1.34 MHz, f_bw = 0.74 MHz) and (f_c = 3.00 MHz, f_bw = 0.93 MHz), and wavelet widths σ_CWT of 5, 32, 62, 150, 183, and 199 kHz. This results in a reduction of the feature length from 2001 values (original spectrum length) to 819 values, which reduces the required ML model complexity. As a result, the model requires fewer training samples and experiences reduced training time and increased testing accuracy.
Additionally, we repeated the hyperparameter training ten times to determine how unique the hyperparameter combinations were. To quantify the consistency, we measured the normalized deviation, defined as the standard deviation, divided by the mean of each hyperparameter, dev(hp) = std(hp)/mean(hp). We observed normalized deviations of 2% and 8% for the frequency window centers and bandwidths, respectively, 30% for the wavelet widths, 0% for the number of PLSR coefficients, and 57% and 49% for the preconditioning terms σ a and γ p .
The low deviations in the frequency regions indicate that the algorithm can consistently identify the regions. There were larger deviations in the wavelet widths. However, we note that some of the optimal wavelet widths would appear in pairs, e.g., in the first run, the first and second wavelets had similar widths, and in the second run, the second and third wavelets had similar widths. This resulted in the high deviation and indicates that fewer wavelets may be necessary to extract similar features. Finally, because there was a large deviation in the preconditioning coefficients, but consistently low errors, we conclude that the feature extraction is less sensitive to those parameters than to the others.

Fig. 6 shows the parity plots (estimated ĥ_Al versus the true h_Al) for the PLS [Fig. 6(a)], WD-PLS [Fig. 6(b)], and GWT-PLS [Fig. 6(c)] techniques. We plot the 95% confidence intervals (CIs) for the validation samples (blue area), the test samples (red dots), and the ideal zero-error line (black). We measure the average validation |ε_val| and test |ε_test| error amplitudes over all five of the data folds, normalized by the range of label values Δh_Al = 1.0 mm, to be |ε_val|/Δh_Al = 5.1% and |ε_test|/Δh_Al = 4.2% for PLS [Fig. 6(a)], |ε_val|/Δh_Al = 5.0% and |ε_test|/Δh_Al = 4.2% for WD-PLS [Fig. 6(b)], and |ε_val|/Δh_Al = 1.7% and |ε_test|/Δh_Al = 1.3% for GWT-PLS [Fig. 6(c)].
We observe that the addition of the WD feature extraction does not improve the test error significantly. Alternatively, the GWT feature extraction results in test errors that are 70.4% lower than those of PLS with no feature extraction or with WD feature extraction. We note that, for the PLS and WD-PLS techniques, the training errors are on the same order as the validation and test errors. This indicates that the algorithm is not overfitting the data significantly and thus implies that the PLS and WD-PLS accuracies cannot be improved significantly by tuning the model parameters.
Alternatively, for the GWT-PLS, we observe that the mean errors for the training and test sets are an order of magnitude lower than the validation error. This indicates that the GWT-PLS model is sufficiently complex to properly represent the system, and that the global optimization algorithm may not have encountered the true globally optimal parameters. These trends are likely due to the GWT technique providing a feature set that better represents the changes in h Al using significantly fewer features than the PLS and WD-PLS techniques. Additionally, the validation and testing error could potentially be improved further by, e.g., increasing the number of genetic algorithm mutations used before completion or adjusting the global optimization hyperparameters.
To further understand the performance of the GWT-PLS technique, we plot the discrete probability distribution functions (pdfs) of the validation errors ε_val and testing errors ε_test, calculated by normalizing the histograms such that they integrate to 100%. The error pdfs are presented in Fig. 7(a) for PLS, in Fig. 7(c) for WD-PLS, and in Fig. 7(e) for GWT-PLS. Here, the colors represent the validation (blue) and test (orange) errors. We also plot Gaussian pdfs based on the means and standard deviations of the errors. Using the chi-squared goodness-of-fit test at a significance level of 0.05, we determine that the validation errors for PLS and GWT-PLS follow a normal distribution, but the validation errors for WD-PLS and the test errors for all techniques do not.
The most prominent observation from Fig. 7(b), (d), and (f) is that many more of the validation and test samples have very low errors (<1%) for GWT-PLS, compared to those of PLS and WD-PLS. As an example, we can quantify the 95% CIs, i.e., quantify the error threshold ε_thr such that 95% of the samples fall within −ε_thr < ε < ε_thr. For the validation sets, we observe 95% confidence levels of 19.2% for PLS, 17.4% for WD-PLS, and 11.2% for GWT-PLS. For the test sets, we observe 95% confidence levels of 20.2% for PLS, 18.7% for WD-PLS, and 0.9% for GWT-PLS. Thus, the GWT-PLS results in sample errors that are much more tightly clustered around ε = 0 than those of the baseline spectroscopy techniques.
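The 95% CI quoted here can be computed as the empirical 95th percentile of the absolute sample errors, e.g.:

```python
import numpy as np

def ci_threshold(errors, level=0.95):
    """Smallest eps_thr such that `level` of samples satisfy |eps| < eps_thr."""
    return np.quantile(np.abs(errors), level)
```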

B. Classification Example: Detection of Microcracks in Thermoelectric Wafers
To demonstrate the GWT feature extraction algorithm for classification problems, we utilize experimental acoustic resonance spectra used to detect microcracks in BiTe wafers, as introduced by Greenhall et al. [5]. Fig. 8(a) shows the experimental setup used to measure the acoustic resonance spectra. The BiTe wafer rests on two foam pads and a ruby hemisphere mounted to the center of a piezoelectric transducer. The piezoelectric transducer vibrates the BiTe wafer, while a laser Doppler vibrometer (LDV) measures the response at a single point on the BiTe wafer. We excited the BiTe wafer with a low-frequency "pump" signal that swept from f_L = 8-12 kHz and a high-frequency "probe" signal that was fixed at f_H = 100 kHz. We sampled the frequency spectra at 9175 frequencies spanning 0-140 kHz. The response signals consist of peaks corresponding to resonances in the wafer within the pump and probe regions. Small differences in the wafer dimensions lead to shifts in the exact peak resonances. The amplitudes of the peaks are highly sensitive to small changes in wafer dimensions, placement on the transducer, and measurement location, thus indicating that peak amplitude is not as reliable as peak location.
If there are cracks within the BiTe wafer, acoustic crack breathing results in a nonlinear response. Here, the basis of the nonlinear crack detection is that the pump frequency is excited at high amplitudes to cause crack "breathing," wherein the crack closes when the pressure is high and opens when the pressure is low. As a result, the low-pressure wave cannot pass through the crack, rectifying the received signal. This nonlinear response appears in the acoustic spectrum as harmonics of the pump excitation at 2 × f_L, 3 × f_L, etc. Additionally, we excite the wafer with a high-frequency probe signal, which is periodically propagated or reflected as the crack is periodically closed or opened by the pump signal, respectively. This periodic propagation and reflection appears in the frequency response as sideband modulations at f_H ± f_L, f_H ± 2 × f_L, f_H ± 3 × f_L, etc. [5], [25], [26]. Here, we note that the amplitude of the probe excitation is typically lower than that of the pump excitation because the probe signal is not used to fully open/close the crack. The lower probe excitation amplitudes and higher attenuation at high frequencies mean that harmonics of the probe frequency (2 × f_H, 3 × f_H, etc.) are typically not observed.
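As a worked illustration of where crack-induced nonlinearity appears in the spectrum, the following hypothetical helper enumerates the pump harmonics and probe sidebands for given f_L and f_H:

```python
def nonlinear_lines(f_l, f_h, n_max=3):
    """Frequencies where crack-induced nonlinearity appears: pump harmonics
    n*f_L (n >= 2) and probe sidebands f_H +/- n*f_L."""
    harmonics = [n * f_l for n in range(2, n_max + 1)]
    sidebands = sorted(f_h + s * n * f_l
                       for n in range(1, n_max + 1) for s in (-1, 1))
    return harmonics, sidebands

# e.g., for f_L = 10 kHz, f_H = 100 kHz:
# harmonics -> [20, 30] kHz; sidebands -> [70, 80, 90, 110, 120, 130] kHz
print(nonlinear_lines(10.0, 100.0))
```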
In addition to nonlinear responses from cracks, there may be some harmonics and sideband modulation in undamaged wafers too, albeit to a smaller degree, as shown in Fig. 8(b). This is due to "jumping" of the BiTe wafer on top of the ruby hemisphere when the normal vibrational forces exceed those of gravity, which also results in acoustic crack breathing. As a result, identifying the cracked BiTe wafers from the resonance spectrum is nontrivial.
The experimental dataset consists of 104 BiTe wafers, which includes 77 crack-free wafers and 27 cracked wafers, each with a single crack of length 13-515 µm based on optical microscope measurement. For each BiTe wafer, the acoustic resonance spectrum was measured on the front and back of the wafer at 25 locations defined as a 5 × 5 grid with 1-mm spacing at the center of the wafer. This resulted in 5200 data samples for training/validation.
For the baseline techniques, we perform hyperparameter optimization via grid search. For PLS, we identify the optimal number of PLS coefficients N*_PLS = 25. For the WD-PLS technique, we identify the optimal number of coefficients N*_PLS = 26, and optimal performance is found using nine levels of a fourth-order Daubechies wavelet, which results in a slight increase in the feature length to 9234 terms.
After training the GWT-PLS algorithm, we observed optimal parameters θ* that included N*_PLS = 25 PLS coefficients, an amplitude adjustment filter width σ_a = 5.3 kHz and factor γ_a = 1, and a peak emphasis weight factor γ_p = 0.03. Through ten runs of the hyperparameter training, where we specified two frequency windows, the average center frequencies for the windows were 9.7 and 51.9 kHz, with standard deviations of 2.7 and 47.4 kHz, respectively. We observe a high standard deviation in the second window because some training runs resulted in a second window that covered the harmonics region (16-24 kHz) and others that covered the probe region (∼100 kHz). However, in every hyperparameter training run, the algorithm identified 0-24 kHz as the optimal frequency span, using either one or two frequency windows. This indicates that the pump response and harmonics were better predictors of damage than the modulated sidebands. Additionally, the average wavelet widths σ_CWT selected through multiple hyperparameter training runs were 60, 136, 244, and 553 Hz. Here, we calculated standard deviations that were 2.6%, 21.9%, 73.8%, and 60.8% of the mean wavelet widths, respectively. Thus, the first and second wavelet widths were consistently selected by the hyperparameter training, while there was much more variation in the third and fourth wavelets. This indicates that the first two wavelets were likely more significant than the others.
Using the GWT-PLS algorithm, the feature length was reduced from 9175 values (original spectrum length) to 1580 values, which will reduce the required ML model complexity. As a result, the model requires fewer training samples and experiences reduced training time and increased testing accuracy.
We observe that the two frequency bands identified by the algorithm cover: 1) the pump frequency and the first harmonics and 2) the probe frequency (but no sideband regions). This indicates that the harmonic nonlinearities were better predictors of damage than the modulated sidebands, based on PLS classification.

Fig. 9 shows the classification results in terms of the classification error for the cracked wafers (left) and the undamaged wafers (right). We calculate the errors for the validation sets (blue) and the test set (orange) for each classification technique. We observe similar error values for the PLS and WD-PLS techniques: approximately (24% validation and 15% test) for the cracked wafers and (4% validation and 3% test) for the uncracked wafers. This indicates that the WD does not provide a significant benefit to the classification. This is likely because the WD extracts features for the entire spectrum (0 kHz < f < 140 kHz), while most of the valuable information is present in small windows. This results in a significant increase in the overall feature length, which can reduce generalizability. Alternatively, we observe significantly smaller errors for GWT-PLS: (9.9% validation and 8.1% test) for the cracked wafers and (1.7% validation and 1.6% test) for the uncracked wafers. This indicates that GWT extracted features that reflected the changes in the spectra due to cracking, while remaining robust to changes due to BiTe wafer geometry, placement, and measurement location. Thus, the GWT algorithm can significantly improve the results of classification algorithms using spectral data.

IV. CONCLUSION
We present a novel feature extraction technique to facilitate applying regression and ML techniques to ARS applications with multibody systems with complex spectra. The technique is based on selecting specific frequency ranges and convolving Ricker wavelets with specific widths over the spectra to isolate resonance peaks that are robust to noise and sensitive to the system property being measured. We select the frequency ranges and wavelet widths via a genetic algorithm.
We demonstrate the algorithm for two common ML problems: regression for parameter estimation in a simulated multibody system consisting of multiple stacked disks, and classification for detecting cracks in experimental thermoelectric wafers. We observe that the feature extraction technique results in a reduction in test set estimation error of approximately 95% for the regression problem and a reduction in test set classification error of 40%, compared to baseline techniques based on PLS with no feature extraction or PLS with WD feature extraction. Thus, for ARS measurements, the GWT feature extraction technique provides features that are more sensitive to the system property being measured, more robust to noise, and more compact, compared to common techniques in optical spectroscopy.
In addition to the PLSR and classification used in this article, GWT feature extraction can be applied to a wide range of regression, classification, and supervised or unsupervised ML algorithms for the purpose of estimating system properties or performing classification. GWT can also be applied to other spectroscopy applications beyond ARS, such as infrared and Raman spectroscopy, X-ray spectroscopy, and nuclear magnetic resonance spectroscopy.