Development and Evaluation of Artificial Neural Networks for Real-World Data-Driven Virtual Sensors in Vehicle Suspension

Vehicle comfort, handling, and stability can be improved by a semi-active suspension with advanced control algorithms using the vertical velocities of the sprung mass (SM) and the unsprung masses (UMs) as inputs. Displacement and acceleration sensors are often used to estimate vertical velocities of UMs. However, these sensors are expensive and are susceptible to degradation. Virtual sensors (VS) have been proposed as a solution, and previous research using simulation data has shown that artificial neural networks (ANNs) can provide usable UM vertical velocity estimates. This study aims at finding ANNs structure and input sample window size to achieve best performance on real-world data. Novel dataset was created and used to test VSs based on eight structures of ANNs combining multilayer perceptron, convolutional neural network, long-term short-term memory (LSTM), and bidirectional LSTM layers. This article presents the results of 104 combinations of ANN structure and sample window size, which required 6240 training sessions. A Bayesian search was used to tune the hyperparameters of ANNs’ layers minimizing the root-mean-square error (RMSE) of estimations on the validation data, while a grid search was used to select the sample window size that minimizes RMSE of estimation on test data, ensuring that selected combinations are well generalized. The VS based on ANN with convolutional layers, achieved the lowest RMSE of 0.0210 m/s, and processing time of 0.421 ms for a window size of 23 samples while estimating vertical velocities of vehicle UMs from real-world data.


I. INTRODUCTION
Nowadays, the automotive industry is strongly focusing on automated driving technologies [1], so the ride comfort is becoming increasingly important.In particular, fatigue and motion sickness may be more critical in automated vehicles as the driver is less involved in vehicle control [2].Occupants may also engage in various activities while driving, such as The associate editor coordinating the review of this manuscript and approving it for publication was Junho Hong .
working on a laptop or reading, leading to higher demands on ride comfort [3].Some studies show that it would be beneficial to achieve a comfort level in a car similar to that of trains [3], [4].Nevertheless, various control algorithms are developed for driver assistance systems that can be used in conventional vehicles to improve comfort.These algorithms involve adaptive, semi-active [5], [6], [7], and proactive suspension with road surface preview [8].
The ongoing challenge to improve comfort, stability, and handling without increasing hardware costs requires continuous enhancement of chassis control performance [9].For example, semi-active and active actuators must replace passive vehicle suspension components to provide satisfactory comfort and contribute to better vehicle handling [10].Implementation of active components requires advanced suspension control and state measurement algorithms.Active and semi-active actuators can change the system characteristics by changing the damping force.Commonly, industrial control solutions such as Skyhook or hybrid and other state of the art algorithms use vertical velocities of the unsprung masses (UMs) and the sprung mass (SM) as inputs, and they are estimated instead of direct measurement [11], [12].No robust sensor is available at a reasonable price for vertical velocity measurements of SM and UM.Vertical velocities of SM are estimated using IMU's measurements and Kalman filters.IMU sensor placed on SM is used by other systems such as dynamic stability control (DSC).Displacement [13] and acceleration sensors [14] are commonly used and velocity of UM is estimated by fusing measurements of these sensors.This requires installation of additional sensors at UMs, leading to higher overall costs, packaging issues, and robustness problems.UM displacement sensors are sensitive to environmental influences and wear with use.Acceleration sensors measurements are prone to integration bias, and installation of such sensors onto each wheel hub is complex.A possible solution to this problem is the use of UM virtual sensors (VSs) [15].Such sensor could estimate UM vertical velocity using only data from sensors installed on SM.
A VS is a software algorithm that generates signals by combining and processing data received from physical sensors and estimators [16].The generated data is fed into complex functions or applications [17].The VS can be model-based and data-driven [18].In model-based approaches, mathematical models are used to define the relationship between input and output variables.Automotive applications often use variations of the Kalman filter (KF) for this type of VS [19].For example, the KF in [20] was used to estimate the vehicle sideslip angle, and the tire forces were estimated in [21].
The study [22] used the KF for suspension state estimation.However, due to the nonlinearity of vehicle components, it can be challenging to use mathematical equations to create an accurate VS model [23].On the other hand, the data-driven VS relies solely on recorded data obtained from observation of system operation [24].Practical implementations of data-driven VS include multivariate statistical methods and artificial intelligence methods.
Data-driven VS developed for vehicle suspensions has been investigated in previous studies [15], [23], [24], [25].In [24], the authors presented a data-driven approach based on deep learning (DL) for estimating the road profile height and state variables of vertical displacement and velocity of vehicle UMs using onboard sensors.The proposed VS was compared with the extended KF and the static nonlinear autoregressive exogenous model.The performance results were in favor of the proposed VS model.However, the algorithm was only tested in a simulation environment with a narrow range of driving conditions.Therefore, it is unclear how such sensor would perform with real-world data.
Previous studies [23], [25] provided promising results using data-driven VS based on artificial neural networks (ANNs) and specifically deep neural networks (DNNs) for estimation of UMs' vertical velocities.
The motivation behind this research rises from the observed deficiency in the development and testing of the data-driven VS for UM vertical velocity using realworld datasets.Furthermore, we aim to compare and select ANN structure and determine the most suitable input sample window size to achieve the highest estimation accuracy.
Thus, the objective of this study is to develop and compare data-driven ANN-based VSs (described in Section II) on a real-world dataset.This algorithm is part of vehicle state estimation, which is required as input for feedback control algorithms, that are not part of this article.The research presented in this article includes determining the optimal sampling window length and hyperparameter combinations for each of the selected ANN structures.In order to deal with the lack of datasets, a new dataset was created using raw data collected at the proving ground and under urban driving conditions from a vehicle demonstrator equipped with a semi-active suspension.The ANN training process was repeated 60 times for each combination of structure and sample window size, to select the best hyperparameters using a Bayesian search.Overfitting of the ANN is prevented by selecting the model iteration with the best root mean squared error (RMSE) on validation data from 60 training iterations.
The contributions of this study include: • Preparation of a real-world dataset for the development of a VS for the vertical velocity of UMs: A major contribution lies in the careful compilation of a real-world dataset tailored for the development and rigorous testing of estimators for the vertical velocity of UM.
• Systematization of tasks and methodologies for the creation of data-driven VSs: This provides a clarification of tasks and outlines a comprehensive methodology for the creation of data-driven VSs for vehicle suspension and sheds light on new approaches in this domain.
• Development and selection of best type and optimal structure of ANN for VS of UMs vertical velocity: Through extensive experimentation, this research identifies the most appropriate type and structures of ANNs that are best suited for the prediction of UM vertical velocity.
• Determination of the optimal input window size for real-time VS performance: The critical dimension of input sampling window size for the tested ANN structures is introduced, which significantly affects the estimation accuracy.
• Analysis of the influence of the input signals on the RMSE of output: In this study, the influence of input signals on the resulting RMSE of the estimations is rigorously investigated and quantified.
• Testing the impact of input signals' means compensation on output RMSE: The significance of input signal means compensation is evaluated, providing valuable insight into the impact of such compensation on the performance and estimation accuracy of VS.These contributions collectively advance the state of knowledge in the field of UMs' vertical velocities estimation, offering novel perspectives and methodologies that significantly enhance task understanding and solving capabilities.
The rest of this paper consists of four sections, excluding the introduction.Section II describes experimental setup and experiments for the real-world data collection, collected signals and dataset preparation, and development of VS, including testing the importance of inputs, the eight ANN structures, and hyperparameter search methods.In Section III, the experimental results are presented and analyzed, and the best solution is found.In Section IV, the main results of the research are discussed and the conclusions are drawn.

II. DATA COLLECTION AND NEURAL NETWORK STRUCTURES A. EXPERIMENTAL SETUP FOR DATA COLLECTION
This research aims to create VS that would replace UM displacement and acceleration sensors (see Figure 1) currently used for estimation of UMs' vertical velocities that are used in suspension control algorithms.There is a lack of a real-world dataset suitable for supervised learning of ANN and performance evaluation.Therefore, the original data needs to be collected using vehicle demonstrator and dataset prepared.
Audi A6 (2019 model year) with a semi-active suspension prototype developed by Tenneco was used in this research.This demonstrator vehicle was equipped with a dSPACE real-time target machine (RTTM) that runs a suspension controller that uses UM vertical velocity as input.RTTM is also capable of recording sensor, estimated, and algorithms' output data.Therefore, it is used for experimental data acquisition.The data is fed to the connected computer that records it into files.The data is logged at a fixed sample rate of 100 Hz into the recording file.
In order to enable the collecting of required data, additional sensors were installed on the vehicle; they included an IMU placed in the center of gravity of the vehicle, optical flow sensor (OFSs) mounted on the rear left side door for sideslip angle measurement, and UM vertical displacement sensors were mounted between lower suspension links and vehicle body in parallel to dampers.Furthermore, access to the vehicle's controller area network (CAN) channels was granted, this provided access to all standard invehicle sensors, which were in their standard locations in the vehicle.
An overview of the system is given in Figure 1.There the sensors are connected to the RTTM which implements suspension controller, signal recording, and proposed UMs' vertical velocities VS.It shows how front-left (FL), frontright (FR), rear-left (RL), rear-right (RR) UM acceleration and displacement sensors can be replaced or duplicated by VS for FL, RF, RL, RR.
Proposed UM vertical velocity VS is meant to use in RTTM instead of the currently used algorithm for estimation from measured UM vertical displacement.This would allow to remove displacement sensors if estimation using VS on real-world data proves to be good enough compared to current estimation.

B. EXPERIMENTS FOR DATA COLLECTION
The dataset was created based on test data collected on the Lommel proving ground using vehicle demonstrator.The procedures below describe the tests performed to collect the data.These tests are commonly used for vehicle handling, comfort, and stability studies, therefore recorded in-vehicle sensor signal data should provide enough information for training, validation and testing datasets.The data was recorded for different types of tests: acceleration and braking; skid pad, step steer, double-step steer, obstacle avoidance, sine with dwell, sinusoidal steer, and comfort tests.Also, data was collected under urban driving conditions.
In the acceleration and braking test, maximum acceleration is applied from a standstill in a straight line, and at 100 km/h, the vehicle is brought to a standstill with maximum braking force.
In the skid test (ISO 7975:2019), the driver controls the vehicle at a constant radius of turn.The tests started from velocity of 10 km/h.The vehicle velocity gradually increased.This test was performed up to the velocity level at which the driver could no longer keep the vehicle on the target trajectory.
In the step steering test (ISO 7401:2011), the driver accelerated the vehicle in a straight line until a velocity of 100 km/h was reached.After that, the accelerator pedal was held constant, and the driver performed a stepped steering wheel input (counterclockwise or clockwise) of 100 • with a rate of 400 • /s.The steering wheel angle was held at 100 • for at least 2 s after the first actuation.
In the double-step steering test (ISO 17288-1:2011), the driver accelerated the vehicle in a straight line until a velocity of 100 km/h was reached.Afterwards, the accelerator pedal was held constant, and the driver performed a stepped steering wheel input (counterclockwise) of 100 • at a rate of 400 • /s.The steering wheel angle was held at 100 • for 2 s after the initial actuation.After 2 s, the steering wheel was rotated to -100 • and maintained for 2 s.After that, the driver set the steering wheel back to 0 • .
The obstacle avoidance test (ISO 3888-2:2011) is a dynamic maneuver in which a vehicle moves rapidly from its original lane to adjacent road lane and returns to the original lane without exceeding lane limits.The goal was to ensure that the vehicle achieves a specific sequence of alternating high lateral acceleration values.During the test, the driver holds the accelerator pedal constantly.The initial longitudinal velocity of the vehicle was measured to ensure the repeatability of the test procedure.
In the sine-with-dwell test (ISO 19365:2016), the vehicle was accelerated to a velocity of just over 80 km/h.Afterwards, a constant accelerator pedal position was held.The steering wheel input has a waveform of a sine wave with a frequency of 0.7 Hz that pauses for 0.5 seconds after reaching the second peak.
The sinusoidal steering test procedure includes driving conditions where the vehicle reaches a lateral acceleration of about 6 m/s 2 , which is achieved with a steering wheel amplitude of about 50 • at a vehicle velocity of about 80 km/h.Stronger inputs can also be made up to and beyond the handling limit, e.g., steering wheel angle of 100 • for a more aggressive maneuver, but not at the handling limit, steering wheel angle of 150 • for a maneuver at handling limit.
The driver accelerates the vehicle in a straight line until a constant velocity of 80 km/h is reached.The accelerator pedal is held at a constant value.After that, the driver gives a sinusoidal, wave-like steering wheel input with a predefined magnitude.The frequency of the sine wave is approximately 1 Hz.The steering input lasts for two cycles.
The comfort tests included driving on a road with Belgian pavement, driving on a road with bumps at different velocities, and driving on a road with high-class (D-F) pavement irregularities.Driving is performed at a constant velocity in range of 25-70 km/h.These tests represent good and bad driving conditions.
The collected data serves as a comprehensive basis for building DL dataset due to the wide range of real-world driving scenarios.The diversity of these tests, provide valuable data, that is suitable for development of data-driven VS not only for UM vertical velocity but also for other signals that are recorded.

C. SIGNALS AND DATASET PREPARATION
In this paper, estimation of UM vertical velocity using VS is determined as a time series regression task with multiple inputs and multiple outputs (for each wheel simultaneously) when a current and a certain number of previous samples are available.
The main characteristics of each input signal must be described to understand the relationship between input and output and their usability.The values of the signals have different characteristics in terms of continuity, scale, and mean value.Discontinuous signals can be useful as switching signals for changing the behavior of the ANN; however, most ANNs that produce continuous output signals will benefit more from continuous input signals that can be used directly to form the output signal.The mean values of the signals are used to offset that signal towards 0, because it is the point of nonlinear part of selected activation functions.The standard deviation shows the extent to which the signal varies and is a good metric of the signal scale.The scales of the signals must be similar to speed up the DL process; therefore, the signals are normalized by dividing them by the standard deviation.Next, the signals are described and main characteristics are provided in Table 1.
Driver torque demand is a signal available from the driveby-wire gas pedal.The value of this signal is important to engine and DSC systems.Although not directly, it does affect the longitudinal acceleration that is measured by IMU.The acceleration induces change of pitch angle, thus unloads and loads the suspension and affects the vertical displacement and velocity of the UM.
The DSC regulation signal indicates the system's engagement in braking or engine torque reduction.Active DSC may change the load on opposite sides and ends of suspension, thus influencing the UM vertical velocity and displacement.
The master cylinder pressure signal indicates the applied braking pressure that directly correlates with deceleration.This deceleration modulates the vertical load distribution, thereby affecting the UM's vertical displacement and velocity.The steering angle and direction provide information about the steering input of the driver.This information is also used for the lateral acceleration and the distribution of the load on the left and right sides of the vehicle suspension.
The optimized steering angle is the target steering angle calculated by the vehicle taking into account user input and the requirements of other onboard systems.
Vehicle velocity information is important for ride control when the vehicle overcomes surface irregularities, bends, potholes, and turns.It provides information on the extent to which lateral and vertical acceleration can be expected and how quickly the road impact on the front suspension reaches the rear suspension.
The wheels' velocities are provided by the anti-lock braking system over CAN.These signals partly duplicate vehicle velocity, but also adds information about friction.
The X, Y, and Z axes accelerations, roll, and yaw rates obtained from IMU provide information about the state of the vehicle: orientation, acceleration, and rotation due to driver input and external factors.
The body sideslip angle provides information about the difference between the vehicle's heading and its actual direction of travel, usually is used in DSC.
These parameters, along with the vertical velocities of SM and UM, are used in advanced control algorithms for semi-active suspensions.However, not all inputs are equally important in estimating the desired output values.Therefore, it is necessary to evaluate their significance when processed by ANN-based VS.This was done using value replacement methods and evaluation of the RMSE change on the test dataset (see Section III (b)).
The output signals are the UM's vertical velocities for each wheel.The ground truth UM's vertical velocities were estimated from the vertical wheel displacement in reference to full suspension extension point on the experimental vehicle.At first, the signals from the vertical wheel displacement sensors were filtered at 50 Hz using a MATLAB Butterworth low-pass filter.Afterwards, the difference between the current and the last sample was calculated.Finally, the resulting difference between the samples is divided by the sampling time to obtain the vertical velocity of the wheel.This is the same filter used in the real-time processing of vehicle suspension control units.As it does use current and last samples, a delay of about half of sample period is introduced into the filtered signal.
Multiple pasts values of the input signals are needed to estimate the UM velocity at a given time as they provide information about the change in the signal.Therefore, the window length of the input samples is an important parameter as it affects the amount of signal memory available to the ANNs.The presence of past samples allows the learning of signal features at lower frequencies.At a fixed sampling rate of 100 Hz, the tested sample window sizes of 3 to 51 samples provide a signal buffer of 30-510 ms.This allows real-time signal processing using first-in-first-out buffers, with the newest data point added and the oldest removed at each sampling period.The results in the Section III show that the processing time is considerably shorter than sampling period.Compared to low-pass filters, the proposed ANNs avoid problems related to delays that directly depend on the sample window size, as ANNs predict current state from last and current input signals based on learned model.
The total number of samples included in the dataset is about 393 thousand.The samples are randomly divided into 272 thousand (70%) in the training set, 61 thousand (15%) in the validation set, and 60 thousand (15%) in the test set at the experimental record level.Therefore, all samples belonging to one recording were assigned to one of the parts and not seen in the other part, which prevented very similar samples occurring in all parts in due to slow signal changes compared to the sampling frequency.Also, this is only possible way to split data, as most driving action happens in the middle of the recording.The validation samples were used for RMSE minimization during training and Bayesian search, and the testing samples were used to compare ANN structures and sample window size combinations.
A grid search was applied to select the sample window size for each ANN structure to systematically explore the different combinations and determine the best configuration.Grid search is a widely used method for implementing structure and parameter tuning in machine learning.The grid values for the sample window size were selected based on performance variation to ensure that a larger window length was near the minimum RMSE point.Window sizes of 3,5,9,11,13,15,17,19,21,23,25,27,35 and 51 samples were selected because 3 samples are the least number of samples that can be used to predict the change of UM movement direction and 51 samples provide information of the lowest frequency that is important for suspension control.More than 51 sample window size was not tested as it would require even computational time while providing a diminishing increase in accuracy.

D. STRUCTURES OF ARTIFICIAL NEURAL NETWORKS
The selection of an appropriate network type, application domain, hardware imposes several requirements.First, an algorithm of ANN-based VS should be implemented using MATLAB and Simulink.Second, it should be compiled for a specific RTTM.Third, the compiled algorithm should run one cycle in less than 10 ms, because input signals are sampled at 100 Hz.The requirements limit the types of neural layers and scale of the ANN (including number of layers and number of neurons in them).Therefore, small task-specific ANNs were developed instead of using well-known backbone networks such as ResNet [26].For the development of VS, ANNs with multilayer perceptron (MLP), convolutional neural network (CNN), long short-term memory (LSTM), bidirectional LSTM (BiLSTM) layers, and MLP-LSTM combination were implemented and compared.
Structures of the networks are shown in Figure 2.There hatched blocks are part of a variant of the same type of network with different number of layers.In Figure 2, only fixed properties of the layers are shown, other important parameters (neurons/units, convolutional kernel's size and stride) are provided in Tables 2-9 and 11.The window size and network hyperparameters were selected using Bayesian and grid searches.This process is described further in Section II (e).Next, each ANN type is explained.
The MLP structure is the simplest and most flexible (see Figure 2. a).It includes one or more hidden layers of neurons.Each neuron has a weighted connection to all neurons in the previous layer.The connection weights were changed during the learning process.The disadvantage of this flexibility is the high computational complexity since each connection requires an additional multiplication.Many multiplications and additions are performed in the planned task; therefore, using a matrix form of the equations is more efficient, which can be accelerated by a multiple data instruction operation in modern processors, especially in graphical processing units (GPUs).In this context, an MLP layer with an arbitrary number of neurons can be expressed by the following equation: where y is the output vector; W is the learnable input connection weight matrix of shape (k, n), where k is the number of neurons, n is the length of the input vector; x is the input vector of shape (n, 1), where n is the length of the input vector; b is the bias vector; and σ is the nonlinear activation function.The hyperbolic tangent (tanh) function was used as an activation function for all the MLP layers, excluding the last function that did not use activation.This equation can be simplified by concatenating the bias vector with the weight matrix and adding one to the input vector x.This simplifies Equation ( 1) into a single matrix multiplication as follows: where, W is the learnable input connections and bias weights' matrix of shape (k, n), where k is the number of neurons, n is the length of the input vector + 1; x is the input vector of shape (n + 1, 1) because of the added value of 1 instead of a bias variable, where n is the length of the input vector.This equation can be processed faster because only dot product and activation operations are required.Most machine learning frameworks abstract related mathematical operations into high-level functions or ANN building blocks and use optimized code for processing.The described combination of weighting matrix and bias vector is done within functions, Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
and an input vector is a single input of such functions.An MLP layer in a complex ANN is often referred to as a fully connected layer (FC) because neurons in the layer have weighted connections with neurons in the previous layer.MLPs with up to 3 FC layers and different numbers of neuron units were tested, and the best combinations were selected for comparison with other architectures.
CNNs have emerged as a powerful class of DL models, particularly due to their exceptional performance in image analysis and recognition [28].In recent years, their applicability has been extended to various domains, including signal processing and VS, due to their ability to extract relevant features from input data [29].They are particularly effective in capturing local patterns and hierarchical representations from such data [30].Therefore, it was decided to test a CNN in VS.
In a CNN, each convolutional layer consists of a set of adaptive filters that are convolved with the input data to extract relevant features.These filters capture local patterns in the data by performing element-wise multiplication and aggregating the results [31].Pooling layers can be then used to shrink the feature maps and reduce their spatial extent while preserving the most relevant information [32].The extracted features are then passed to one or more fully linked layers, which perform classification or regression tasks based on the learned representations [33].
A CNN layer with any number of units can be described by the following equation [34]: where X n,i,j is the value of the output feature map matrix of the convolutional layer (before activation function) at index n, i, j (n -index of the convolutional kernel; i -index of the row in feature map matrix; j -index of the column in feature map matrix); W n,h,w -value of kernel filter at index n, h, w (n -index of the convolutional kernel; h -index of the row in the convolutional kernel; w -index of the column in the convolutional kernel; A si+w,sj+h,c value of input array with index si + w, sj + h, c (s stride of kernel; c channel of input array); B n -bias of convolutional neuron with index n; N -number of convolutional neurons in the layer; H number of kernel rows; W number of kernel columns; C -number of channels in the filter that is the same as channels count of the input array.Each signal has own channels.In this study, a CNN-based DNN architecture tailored for this application is proposed to meet the real-time data processing requirements of a VS.The DNN consists of 2 or 3 convolutional layers, each followed by a nonlinear activation function of a leaky rectified linear unit (LReLU) to extract the essential features of signals in the time domain and the relationships between signals as images in 2D space (see Figure 2. b)).The LReLU can be described by the following equation: where a is the settable coefficient of steepness of the activation function in the negative region of the input values (a was set to 0.1).
LReLU avoids the problem of ANNs not learning when using rectified linear unit [35].Zero input values cause the zero-gradient problem for negative X t values.The output of the CNN layers is converted to a vector and given to FC layers, allowing the network to learn and make predictions at a higher level, similar to an MLP.
In the case of VS, the inputs to MLP and CNN are provided as W×M arrays, where W is the sample window size and M is the number of sensor inputs.These inputs are passed to the ANN in each sampling step and provide the output corresponding to the last sample.In this way, the output delay is minimized to the ANN processing time.
Unlike MLP or CNN, the LSTM stores its last state and output and uses them as inputs for decision making when processing the next sample in the sequence (see Figure 2. c)).The LSTM inputs were supplied as a W-length sequence of vectors, the length of which corresponds to the number of M-sensor inputs.The LSTM consists of a forget, input, output, and cell input gate (see Figure 3).The following equations can describe an LSTM layer with any number of units where The LSTM can be unrolled in time to observe the interfaces between successive input samples.Each time, the same LSTM unit processes the next data item in a sequence of length W and an input vector of length N. In such cases, t varies from 0 to W-1.
BiLSTM is similar to LSTM in that it uses the same structure (see Figure 2. d)).Unlike the LSTM unit, a BiLSTM unit uses two connected LSTM structures to obtain the last sample and outputs the current sample, while the other unit obtains the next sample and outputs the current sample.The outputs of the two LSTM structures for the same sample are linked.The output at each step depends not only on the past samples in the sequence, but also on the next data sample.
Such processing is possible only if all samples of the input sequence are available.The last structure type tested was MLP-LSTM, a combination of MLP and LSTM (see Figure 2. e)).In this model, the sequence data is first convolved and preprocessed with MLP before being deconvolved and processed with LSTM.This structure was tested to determine if an additional feature extraction layer could be better combined with the memory capability of LSTM.

E. ANN TYPE AND STRUCTURE SELECTION
Eight described above ANN structures with various types and numbers of layers were selected for investigation.
After determining all possible combinations of the selected sample window sizes and structures, a Bayesian search is applied to fine-tune the hyperparameters of the convolutional and fully connected layers.Bayesian search is a probabilistic optimization method that iteratively updates a probability distribution over hyperparameters based on the observed performance of the model.By leveraging prior knowledge and iteratively refining the distribution, a Bayesian search aims to efficiently explore the hyperparameter space and identify the most promising hyperparameter values.
The advantage of the Bayesian search for hyperparameters lies in its ability to handle a limited amount of data and efficiently use computational resources.It adapts to the information gained during the search process and dynamically focuses on promising regions of the hyperparameter space.This adaptive nature allows the Bayesian search to converge to optimal hyperparameters with fewer evaluations compared to other methods, such as grid search.
However, it is important to note that Bayesian search typically requires more computational resources and time than grid search because of its iterative nature and the need to update the probability distributions [27].Additionally, the performance of the Bayesian search depends heavily on the choice of the prior distribution and acquisition function used to guide the search process.
The training process was repeated at least 60 times for each ANN and window-size combination structure using ADAM optimization algorithm to select connection weights in neural networks and settings of LSTM and BiLSTM memories, while performing the Bayesian search for selection of hyperparameters.Each training session lasted for 30 epochs.
One epoch is a training cycle for all the training dataset samples.The selected initial learning rate was 0.001 with one learning rate drop by a factor of 0.2 after 20 epochs.The learning rate reduction enables more accurate minimization in the selected minimum point of the cost function.Also, the gradient rate threshold of 1 was set, to reduce the effect of high error during the start of the learning process.The data samples were shuffled before each epoch to improve the stability of the learning process.This article presents the results of 104 ANN structure and sample window size combinations, which required 6240 training sessions.The testing of RMSE and processing duration were carried out for each pair of ANN type and sample window size after training.The test set is being processed using a central processing unit (CPU) only and a batch size of 1, which includes sensor data that fills the selected sample window size.One Nvidia Geforce RTX 2080 Ti GPU was used for training using mini-batch sizes up to 10922 samples, and was limited by the amount of GPU memory, therefore mini-batch size was calculated dividing 32 768 by the selected sample window size.One computing core of the Intel i7-8700K central processor was used for processing duration testing using single sample mini-baches as in real-time processing.One core of the central processor was used as it reassembles well the processing on RTTM if no multi-threading is supported.The processor of RTTM provides lower core clocks, different CPU architecture and instruction sets, therefore the results are comparable between the test only.Simulink implementation on the same CPU would further reduce the computing duration, while implementation on RTTM would increase the processing duration.The average processing duration is calculated by dividing the test set processing time by the number of samples in it.
Overfitting is possible when training ANNs.Overfitting the training data usually results in the RMSE increase for the validation data, whereas the RMSE of the training data decreases.Several measures were adopted to prevent overfitting.First, the checkpoint with the best RMSE for the validation data is recorded for each Bayesian iteration.Second, dropout layers were introduced before the output FC layer in each ANN, and earlier in the MLP-LSTM structure.Thus, the risk of overfitting is reduced to the maximum possible extent for the current dataset.
The MATLAB code for VS training and testing is available at: https://github.com/eldux/UMvelocityVSresearch.

A. ANN TYPE AND SAMPLE WINDOW SIZE TESTING RESULTS
The performance metrics achieved by the simplest MLP with one hidden FC layer after hyperparameter search and learning with window sizes ranging from 3 to 51 samples are shown in Table 2.For these networks The RMSE ranged from 0.0235 to 0.0261 m/s, and the processing time ranged from 0.339 to 0.411 ms/sample.The processing duration is not strongly correlated with the number of neuron units in FC or 13190 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.the window size.This is due to hardware limitations caused by the latency in data transfer between the memory and the computational units.The best performance for an MLP with one hidden layer was achieved using window size of 51 samples -RMSE of 0.0235 m/s and processing duration of 0.391 ms/sample.This model contained 752 hidden neurons.Next, 2 hidden layer MLP was tested.The performance metrics of the MLP with two hidden layers are listed in Table 3.The RMSE ranges from 0.0218 to 0.0256 m/s and processing duration from 0.404 to 0.507 ms/sample.The highest accuracy of 0.0218 m/s RMSE was achieved using a window size of 51 input samples with a processing time of 0.404 ms/sample.The model contained 1118 hidden neurons.Next, 3 hidden FC layer MLP was tested.The performance of the MLP with three hidden FC layers is shown in Table 4.The RMSE ranges from 0.0218 to 0.0246 m/s, while the duration from 0.350 to 0.6 ms/sample.The best model achieved an RMSE of 0.0218 m/s and a processing duration of 0.350 ms/sample with a window size of 51 samples.This model has 1222 hidden neurons and is faster than MLP with 2 hidden layers while providing same RMSE.The number of neurons on the second layer (FC2) is 50, which is noticeably lower compared to the first (FC1) and third (FC3) layers.This hourglass-like shape leads to a reduction of connections between layers and faster performance.Next, a DNN with 2 CNN layers was tested.
It should be pointed out that for all tested structures RMSE difference with input window sizes from 3 to 51 samples was in a range 18% from minimum RMSE.The performance of DNN with 2 CNN layers was tested and is shown in Table 5.The RMSE ranges from 0.0210 to 0.0234 m/s and the processing duration from 0.365 to 0.620 ms/sample.The best RMSE of 0.0210 m/s was achieved for window sizes of 21, 23, and 35 samples.Among them, one with input sample window size of 23 has shortest processing duration among them -0.421 ms/sample.This model achieved a 3.67% reduction in RMSE but was 20.3% slower compared to MLP with 3 hidden FC layers.This model has 195 convolutional units and 94 FC neuron units.Processing time was within acceptable limits, while performance was improved.Next, the 3-layer CNN was tested.The performance of the DNN with LSTM is shown in Table 7.The RMSE ranges from 0.0226 to 0.0244 m/s and the processing duration from 0.512 to 1.079 ms/sample.The best accuracy of 0.0226 was achieved for 35 samples window with a processing duration of 1.079 ms/sample.The RMSE is 0.0005 m/s higher compared to the best CNN based DNN.This model has 189 LSTM and 222 FC neuron units.The overall performance is worse than the best DNNs with 2 or 3 convolutional layers, and 2 or 3 hidden FC layers but better than ANN with one hidden FC layer.Next, BiLSTM was tested.The performance of the BiLSTM is shown in Table 8.The RMSE ranged from 0.0225 to 0.0244 m/s and the duration from 0.564 to 1.019 ms/sample.The best accu-racy of 0.0225 was achieved for sample window size of 35 with a processing duration of 0.986 ms/sample.This model has 98 BiLSTM and 238 FC neuron units.The RMSE is 0.0001 m/s lower than the best LSTM but lower by 0.0015 m/s than the best DNN with 2 convolutional layers.It estimates more accurately than an ANN with one hidden FC layer.It appears that bidirectional signal propagation provides a small RMSE advantage of 0.0001 m/s in the case of VS.BiLSTM was not investigated further.The performance by DNN based on MLP-LSTM combination is shown in Table 8.The RMSE ranges from 0.0229 to 0.0240 m/s and the duration from 0.478 to 1.151 ms/sample.The best accuracy of 0.0229 was obtained for 21 and 23 sample windows sizes with the shortest processing duration of 0.568 ms/sample for 21 sample window size.This model has 511 FC neuron units and 68 LSTM units.The processing time of MLP-LSTM seems to be slower than that of the DNN with 2 hidden FC layers and the same RMSE.Bayesian search resulted in lower LSTM unit counts for all investigated window sizes compared to LSTM and BiLSTM, and on average shorter processing times.The accuracy was worse than the best ANNs with LSTM/BiLSTM, CNN, and 2 or 3 hidden FC layers, but better than an ANN with one hidden FC layer.Developed VS for UM is using data from SM, the high frequency oscillations are not reconstructed, as they are not transmitted to SM (due tire and suspension damping).Lower frequency signals are more important in suspension control, thereby the performance of developed VS is sufficient.
In summary, the best-performing VS was based on DNN with 2 convolutional layers using input window size of 23 samples, and the second-best was based on DNN with 3 convolutional layers using input window size of 19 samples.

B. RESULTS OF INPUT IMPORTANCE TESTING
The importance of the inputs was tested by setting each input one by one to 0 or 1 and recording the difference of the RMSE on the test set.0 is used as it would mean no input, and 1 is used as it is the standard deviation after normalization.The results are shown in Table 10.These results show that the IMU roll rate is the most important input signal for estimating the UM vertical velocity.Other important signals include the IMU yaw rate, accelerations on X, Y, Z axes, steering angle optimized, and front left wheel velocity.Slip angle, longitudinal and transversal velocity, FR, RL, RR wheels velocities, driver torque requirement, and master cylinder pressure signals are less important.DSC regulation signal could be omitted, as it is not having impact.However, there is no signal that, when set to zero or one, improves performance.Redundant velocity readings from the wheel velocity sensors and transversal velocity readings from the OFS sensor reduce the effects of signal and measurement noise, as removing these signals increases RMSE.Yet if it is not possible to use OFS, it can be discarded as the impact on performance would be manageable, and the model could rely more on the IMU and CAN data instead.

C. RESULTS OF NO INPUT MEAN COMPENSATION
To better understand how mean value compensation of the input signals affects the performance, additional tests were performed where only the standard deviation was normalized each signal.This is important because the mean and standard deviation were calculated using only the training data.The mean was subtracted from the validation and test set data, and the results were divided by the standard deviation to bring all signals to approximately the same level.
Table 11 shows the performance results obtained with various sample window sizes, which are compared with those in Table 5.The lowest RMSE was obtained with window sizes of 21 and 27.The processing duration for window size of 27 samples was lower -0.425ms/sample.The RMSE range of 0.0210-0.0233m/s was similar to the one achieved by DNN model with 2 convolutional layers trained on data with mean subtraction.The computation time was not affected by not using the mean compensation.These results show that there is no significant difference between using or not using mean compensation.

IV. DISCUSSION AND CONCLUSION
The real-world dataset developed for this study provides a valuable resource for future research and industrial implementation of ANN-based VSs.It addresses the previous limitation of the lack of such datasets for supervised DL.The study of the optimal sampling window length for input signals and ANN hyperparameters contributes to understanding how ANN models can be tuned to achieve maximum VS efficiency and accuracy in estimation of UM vertical velocity.The results of the study suggest that DNNs with convolutional layers have the greatest potential to achieve this goal and outperform other types of DNNs by providing the lowest RMSE of 0.0210 m/s and a sufficiently short processing time of 0.421 ms/sample with a window size of 23 samples.A closer examination of the ground truth and estimated signals showed that CNN-based VS rejects higher frequency oscillations together with process and measurement noise while preserving average values.This is because SM parameters are used as inputs, which are already damped by the suspensions system.In addition, it was found that VS relies primarily on the roll rate provided by the IMU.However, none of the signals can be removed to achieve better performance, and DSC regulation signal can be removed without loss of accuracy.Other tests showed that the input signal mean subtraction has little impact on performance.
In summary, our results represent a significant advance in data-driven VS for vehicle suspensions control, especially for UM vertical velocity estimation.As one of the most important inputs for suspension control, accurate estimation of vertical velocities of UM and SM can significantly improve vehicle comfort, handling, and stability without the need for costly physical sensors.The findings of this research can speed up research of VS for UM vertical velocity development.
Future research should further optimize the ANN with convolutional layers to achieve even better accuracy and implementation on in-vehicle RTTM, such as the dSPACE MicroAutoBox, while maintaining a processing time of less than 10 ms.Such integration would allow real-time testing of the created algorithms and analysis of the impact on comfort when using VS instead of physical sensors.For this purpose, also a quarter-car test setup or demonstrator vehicle can be used.

FIGURE 3 .
FIGURE 3. The internal structure of one LSTM unit.
35 samples window.The RMSE difference between LSTM and BiLSTM was small, and the dependence on window size was very similar.

FIGURE 4 .
FIGURE 4. UM vertical velocity estimation RMSE using various sample window sizes with all tested ANNs.Estimated FL UM vertical velocity and the ground truth signals are shown in Figure5.Estimation was made using VS based on DNN with 2 convolutional layers.The estimate of the lower-frequency high-amplitude UM vertical velocity is shown in Figure5(a), and the estimate of the higher-frequency UM vertical velocity is shown in Figure5(b).It can be seen that the VS estimation tracks more closely the changes at lower frequencies and higher amplitudes, while rejecting high frequency oscillations.

FIGURE 5 .
FIGURE 5. Virtual Sensor and Ground truth comparison: a) lower frequency signal example; b) higher frequency signal example.

TABLE 1 .
Input signals, sources, and statistical characteristics.

TABLE 2 .
Metrics of 1 hidden layer MLP with various sample window sizes.

TABLE 3 .
Metrics of 2 hidden layers MLP with various sample window sizes.

TABLE 4 .
Metrics of 3 hidden layers MLP with various sample window sizes.

TABLE 5 .
Metrics of 2-layer CNN with various sample window sizes.

Table 6 .
The RMSE ranges from 0.0215 to 0.0231 m/s, and the duration from 0.429 to 0.604 ms/sample.The best accuracy of 0.0215 m/s was achieved for 19, 21, 27, and 35 samples windows sizes with a best processing duration of 0.525 ms/sample for window size of 19 samples.That model has 324 convolutional neurons and 200 FC neuron units.The overall performance was worse compared to DNN with 2 CNN layers.Therefore, DNNs with more than 3 CNN layers were not tested.Next, DNN with LSTM was tested.

TABLE 6 .
Metrics of 3-layer CNN with various sample window sizes.

TABLE 7 .
Metrics of LSTM with various sample window sizes.

TABLE 8 .
Metrics of BiLSTM with various sample window sizes.

TABLE 9 .
Metrics of MLP-LSTM with various sample window sizes.Figure 4 compares for all window sizes in a graph.There the most promising window sizes are 19 to 35.MLP, LSTM, and BiLSTM show decreasing RMSE up to Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE 10 .
Input signal importance by change of test RMSE.

TABLE 11 .
Metrics of 2-layer CNN with various sample window sizes and no input mean subtraction.