Robust and Sample-Efficient Estimation of Vehicle Lateral Velocity Using Neural Networks With Explainable Structure Informed by Kinematic Principles

This paper presents kinematics-structured neural networks (KS-NN) for the lateral speed estimation of vehicles. The internal structure of the networks is designed to incorporate the kinematic principles, enhancing the physical explainability and generalization capacity. Both the internal structure and training method are devised for better generalization performance. Various linear and nonlinear variants of our estimator are assessed for accuracy and robustness. The approach is validated using an openly accessible dataset with two race cars. The performance of the novel networks is evaluated against Luenberger, neural network, factor graph and Kalman filter estimators from the literature. In comparison with a Luenberger kinematic observer, our networks improve noise rejection, and overcome the well-known observability problem for low yaw rates. Compared to existing neural network estimators, our networks exhibit better generalization capacity, are more sample-efficient, require fewer learnable parameters, and their structure is physically explainable.

Index Terms: Lateral speed estimation, explainable neural network, neural observer, sample-efficient neural network.
The authors are with the Department of Industrial Engineering, University of Trento, 38123 Trento, Italy (e-mail: mauro.dalio@unitn.it; mattia.piccinini@unitn.it; francesco.biral@unitn.it).
Digital Object Identifier 10.1109/TITS.2023.3303776

I. INTRODUCTION
Since the direct measurement of the lateral velocity is feasible only with expensive optical sensors or dual-antenna Global Positioning System (GPS) receivers, which are not affordable for most commercial vehicles, many estimators have been proposed.
Among the estimation techniques, model-based approaches take a prominent role. The literature (see Section I-B) presents two main types of model-based observers: a) those using a model of the vehicle dynamics and b) those based on universal kinematic principles. The latter type does not depend on dynamical characteristics that may vary among vehicles and environmental conditions. However, kinematic estimation suffers from observability problems at low yaw rates (e.g., on straights) and from significant noise sensitivity.
Data-driven methods, especially neural networks, are alternatives to model-based approaches. They are trained on specific vehicles and environments, for which they generally perform well. However, they are unexplainable black boxes. It is difficult to assess how vehicle- and environment-specific they are, what they have learned, and whether they learned dynamical characteristics that may vary in operation.

A. Contribution of This Paper
The paper presents a specialized neural architecture for the physical problem at hand, offering several advantages:
• Sample efficiency, by requiring fewer parameters to model the physical phenomenon accurately;
• Ability to generalize, resulting in robustness;
• Explainability, as it is an interpretable linear parameter varying discrete-time model;
• Ease of implementation on automotive-grade hardware, since it is a linear parameter varying model.
The study evaluates 28 variants of our specialized model in 4 groups, 10 variants of benchmark general-purpose recurrent neural networks in 2 groups (GRU and LSTM, from [9] and [10]), the benchmark Luenberger kinematic observer [11], and the benchmark factor graph and Kalman filter estimators [12].
The evaluation of the above candidates for accuracy and robustness uses an openly accessible dataset of telemetries from two real race cars, with challenging top speeds of 240 km/h and lateral accelerations up to 12 m/s².

B. Related Work
The design of lateral speed estimators has been an active research topic for more than 30 years. The authors of [13] review the methods published until 2018, finding two broad categories: one based on vehicle models and the other purely data-driven. The former class can be split into three subgroups: dynamical vehicle models, kinematic models, or combinations of the two.
1) Dynamic Model-Based Observers: Observers adopting dynamical models typically depend on several vehicle parameters (including tire submodels), which makes them vehicle- and context-specific. For example, [14] and [15] used Kalman filters (KF) with single-track vehicle models. Using a dynamical vehicle model, a sliding mode estimator was adopted in [16], in combination with a KF. Linear, nonlinear, and sliding mode observers were compared in [17], using dynamic vehicle models restricted to moderate accelerations. The authors of [12] presented a factor graph approach based on a vehicle dynamic model.
However, the use of dynamic vehicle models usually results in a high sensitivity to variations in the environment, operative conditions and/or vehicle parameters.
2) Kinematic Model-Based Observers: Kinematic observers exploit the principles of kinematics to estimate the vehicle velocity components, using the measured accelerations and angular rate. One seminal paper is [11], which developed a parameter-varying Luenberger observer such that the estimation error vanishes for a non-zero yaw rate. The approach of [11] is quite robust to variations in the physical parameters of the vehicle, which do not appear in the formulation. The authors of [18] extended the Luenberger estimator with an adaptive dead zone in the feedback terms, which permits larger gains to compensate for lateral acceleration biases. Starting from [11], a heuristic correction term was added in [19] to bring the estimated lateral speed to zero when the vehicle is driving straight. The lateral speed estimation in [20] and [21] was improved by utilizing the measurement bias estimates of an inertial measurement unit (IMU). Extended KFs were implemented in [22], [23], and [24] using two- and three-dimensional kinematic models.
However, using basic kinematic estimators, the lateral speed becomes unobservable for low yaw rates. In addition, the estimation degrades in the presence of biases on the measured accelerations (e.g., due to gravity when the chassis rolls or pitches and when the road bank and grade angles are not negligible).
3) Mixed Kinematic-Dynamic Model-Based Observers: Mixed kinematic-dynamic methods were introduced in [25], [26], [27], and [28] to overcome the limitations of purely kinematic estimators. In these papers, lateral vehicle dynamic models improve the lateral speed observability for low yaw rates. The authors of [29] proposed a fuzzified weighted mean of the estimates obtained with a kinematic observer and with a dynamic model-based KF.
Compared with pure kinematic estimators, mixed kinematic-dynamic methods inevitably increase the sensitivity to changes in operative conditions and vehicle parameters.
4) Data-Driven Observers: Some authors used neural networks (NNs) in pure data-driven black-box state estimation methods. For example, NNs using gated recurrent units (GRUs) were employed in [9] and [30] to estimate the lateral speed with end-to-end approaches. Similarly, [10] used a long short-term memory (LSTM) network. A simple single-hidden-layer NN was presented in [31] to estimate the side slip using only simulated tests. A similar NN was adopted in [32] to predict the lateral tire forces, which were then fed to an extended KF for state estimation. The authors in [33] designed a single-hidden-layer neural observer using the set membership theory. To deal with variable road friction, [34] combined three nonlinear auto-regressive networks (ARX), to learn the side slip in three different adherence conditions.
However, pure data-driven neural network estimators tend to suffer from overfitting issues, since they implicitly learn vehicle dynamic characteristics that depend on the operating conditions used during training.
To the best of the authors' knowledge, the lateral speed estimators in the literature are limited by at least one of the following aspects:
1) The neural networks in data-driven estimation (e.g., GRU, LSTM) [9], [10], [30] usually overfit the vehicle and environment conditions used for training.
2) The neural networks' generic internal structures do not follow kinematic laws, and the learnable parameters cannot be physically explained.
3) All the neural and many model-based kinematic estimators rely on measuring the steering wheel angle. However, this can decrease the robustness of the estimator: the relation between the steering angle and the lateral velocity depends on vehicle characteristics (e.g., tires and steering system), which can vary with different tires, road surfaces, and vehicle types.
4) Some kinematic estimators require an IMU sensor, e.g., to estimate the roll angle [19]. However, an IMU is not affordable for most commercial cars.

C. Structure of the Paper
Section II discusses the principles of traditional kinematic estimation and its limitations. Section III presents the kinematics-driven internal structure of the proposed neural networks. The interpretation of the structure of the neural network is explained in more detail in Section IV. Section V discusses the literature benchmarks used for comparison. Section VI introduces the open experimental race car dataset used to validate the approach, and the training method. Section VII analyzes the measurement noise and dynamic characteristics of the two race cars used in the dataset. Section VIII discusses the main findings, highlighting the improved robustness of the new networks. Finally, Section IX concludes the study and suggests areas for future research.

II. PRINCIPLES AND LIMITATIONS OF TRADITIONAL KINEMATIC ESTIMATION
This section provides an overview of the principles of traditional kinematic lateral speed estimation, along with a discussion of their limitations. Understanding these principles is crucial to comprehend the proposed neural network architecture and its advantages over conventional methods.

A. Measurement Principle
The direct measurement of a vehicle's lateral velocity ($v$) is not easily affordable, but estimators can give indirect measurements. Kinematic estimators exploit measurements of the angular velocity ($\boldsymbol{\omega}$), acceleration ($\mathbf{a}$), and forward speed ($u$) with the following principle. Let:
$$\mathbf{v} = u\,\mathbf{i} + v\,\mathbf{j} + w\,\mathbf{k} \tag{1}$$
be the velocity vector expressed in a moving reference frame $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$, where $u, v, w$ are the components of $\mathbf{v}$ in that frame. The time derivative of (1) reads as:
$$\mathbf{a} = \frac{d\mathbf{v}}{dt} = \dot{u}\,\mathbf{i} + \dot{v}\,\mathbf{j} + \dot{w}\,\mathbf{k} + \boldsymbol{\omega} \times \mathbf{v} \tag{2}$$
where Poisson's formulas are used to express the derivatives of $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$ (e.g., $d\mathbf{i}/dt = \boldsymbol{\omega} \times \mathbf{i}$). After computing the cross products, the projection of (2) on the axes $\{\mathbf{i}, \mathbf{j}\}$ is:
$$\dot{v} = a_y - \omega_z u + \omega_x w \tag{3a}$$
$$\dot{u} = a_x + \omega_z v - \omega_y w \tag{3b}$$
where $\{\omega_x, \omega_y, \omega_z\}$ are the components of $\boldsymbol{\omega}$ in $\{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$.
If the three-dimensional motion of the vehicle is approximated in two dimensions ($\omega_x = \omega_y = 0$, $w = 0$ and $\omega_z = \omega$), with $\mathbf{i}$ being the vehicle longitudinal axis on a flat ground and $\mathbf{j}$ pointing leftwards ($\mathbf{k}$ upwards), the planar kinematic model is finally obtained:
$$\dot{v} = a_y - \omega u \tag{4a}$$
$$\dot{u} = a_x + \omega v \tag{4b}$$
Since $a_y$, $u$ and $\omega$ can be measured by on-board sensors, (4a) provides a means to compute $\dot{v}$, and hence, in principle, an estimate $\hat{v}$ of $v$ could be obtained via integration. However, besides the approximation from (3a) to (4a), $a_y$, $u$ and $\omega$ are noisy, and their integration according to (4a) will diverge, following a partially random walk.
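The drift can be made concrete with a minimal sketch, which integrates (4a) in open loop with explicit Euler steps while the lateral accelerometer carries a small constant bias. The bias value, the duration, the forward speed and the function name are illustrative assumptions, not values taken from the dataset; the vehicle drives straight, so the true lateral speed is zero.

```python
def integrate_lateral_speed(a_y, omega, u, tau, v0=0.0):
    """Open-loop explicit Euler integration of v_dot = a_y - omega * u, as in (4a)."""
    v_hat = [v0]
    for ay_k, om_k, u_k in zip(a_y, omega, u):
        v_hat.append(v_hat[-1] + tau * (ay_k - om_k * u_k))
    return v_hat


tau = 0.05           # sampling time [s]
n = 1200             # 60 s of straight driving (assumed)
bias = 0.1           # assumed accelerometer bias [m/s^2]
a_y = [bias] * n     # the true a_y is zero, so only the bias is measured
omega = [0.0] * n    # no yaw rate on a straight
u = [30.0] * n       # assumed constant forward speed [m/s]

v_hat = integrate_lateral_speed(a_y, omega, u, tau)
drift = v_hat[-1]    # grows linearly in time: bias * n * tau
```

With these assumed numbers the estimate reaches about 6 m/s after one minute, although the true lateral speed never left zero. Zero-mean noise would add a random-walk component on top of this linear trend.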
The drift can be attenuated by using both equations (4a) and (4b): the velocity $\hat{v}$ integrated by the former is used in the latter, to compute the expected longitudinal velocity $\hat{u}$. Since the actual longitudinal velocity can be measured ($u_m$), the quantity $\hat{u} - u_m$ becomes an indicator of the accumulated drift, which can be injected in (4a) as follows:
$$\dot{\hat{v}} = a_y - \omega u_m + f_1(\omega, \hat{u} - u_m) \tag{5a}$$
$$\dot{\hat{u}} = a_x + \omega \hat{v} + f_2(\omega, \hat{u} - u_m) \tag{5b}$$
where $f_1(\omega, \hat{u} - u_m)$ and $f_2(\omega, \hat{u} - u_m)$ are suitable feedback corrections, which can be designed as functions of the yaw rate $\omega$ and of $\hat{u} - u_m$. Equations (5a) and (5b) are the basic prototype of kinematic observers.
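As a hedged illustration of this prototype, the sketch below performs explicit Euler steps of a kinematic observer with a simple linear feedback. The feedback form and the gains `k1`, `k2` are assumptions chosen for illustration only (they are not the gains of the Luenberger observer of [11]), and the steady cornering scenario is synthetic.

```python
def observer_step(v_hat, u_hat, omega, u_m, a_y, a_x, tau, k1=4.0, k2=2.0):
    """One explicit Euler step of a kinematic observer prototype.

    Assumed linear feedback (illustrative, not the gains of [11]):
        f1 = -k1 * omega * (u_hat - u_m)
        f2 = -k2 * (u_hat - u_m)
    """
    err = u_hat - u_m
    f1 = -k1 * omega * err
    f2 = -k2 * err
    v_next = v_hat + tau * (a_y - omega * u_m + f1)
    u_next = u_hat + tau * (a_x + omega * v_hat + f2)
    return v_next, u_next


tau = 0.05
omega, u_true, v_true = 0.5, 20.0, -0.5  # steady cornering (assumed values)
a_y = omega * u_true                     # planar kinematics with v_dot = 0
a_x = -omega * v_true                    # planar kinematics with u_dot = 0

v_hat, u_hat = 0.0, u_true               # start from v_hat = 0 and the measured speed
for _ in range(200):                     # 10 s of data
    v_hat, u_hat = observer_step(v_hat, u_hat, omega, u_true, a_y, a_x, tau)
```

In this noise-free scenario the lateral speed estimate converges to the true value because the yaw rate is non-zero. Setting `omega = 0.0` removes the coupling between the two equations, so `v_hat` would receive no correction at all.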

B. Unobservability Issues of Pure Kinematic Estimation
Using the basic kinematic model (4a,4b) and measuring the forward speed $u_m$, the lateral speed $v$ becomes unobservable as the yaw rate $\omega$ approaches zero (a proof can be found in [11]). As a consequence, the estimators based on pure kinematic models, like the Luenberger observer of [11], need to change their estimation model or be reset for low yaw rate values.
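An intuitive, non-rigorous illustration of this issue (the proof is in [11]) is sketched below: the forward speed is integrated from the planar relation u_dot = a_x + omega * v for two different lateral speeds. The numerical values and the function name are hypothetical.

```python
def forward_speed_trace(v, omega, a_x, u0, tau, steps):
    """Euler integration of u_dot = a_x + omega * v for a constant lateral speed v."""
    u = [u0]
    for _ in range(steps):
        u.append(u[-1] + tau * (a_x + omega * v))
    return u


common = dict(a_x=0.5, u0=20.0, tau=0.05, steps=30)

# Straight driving: two different lateral speeds give the same forward speed.
straight_a = forward_speed_trace(v=0.0, omega=0.0, **common)
straight_b = forward_speed_trace(v=1.0, omega=0.0, **common)

# Cornering: the lateral speed leaves a trace in the forward speed.
turning_a = forward_speed_trace(v=0.0, omega=0.3, **common)
turning_b = forward_speed_trace(v=1.0, omega=0.3, **common)

same_on_straight = straight_a == straight_b
differ_in_turn = turning_a != turning_b
```

When the yaw rate is zero, the measured forward speed carries no information about the lateral speed, so a feedback based on the forward speed error cannot correct the lateral speed estimate.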
The structure of the novel neural model of this paper is designed, among other goals, to overcome the observability issue.
Fig. 1. The proposed KS-NN model is a memoryless recurrent neural network (RNN). The input of KS-NN is a window of $N$ past measurements $\{r_{k-N+1}, \dots, r_k\}$, with $r = \{\omega, u_m, a_y, a_x\}$. The recurrence to compute $\{\hat{v}_{k+1}, \hat{u}_{k+1}\}$ starting from $\{\hat{v}_{k-N+1}, \hat{u}_{k-N+1}\} = \{0, u_{m,k-N+1}\}$ is unrolled, for an easier graphical representation. The inner structure of the C blocks is depicted in Fig. 2(a).

III. NEURAL NETWORK ESTIMATOR WITH KINEMATICS-DRIVEN STRUCTURE
This section outlines the design of the proposed lateral velocity neural network estimator.Our estimator is labeled KS-NN (kinematics-structured neural network), and its internal structure is designed to generalize the kinematic laws.
Let us begin with an overview of the design and operating principle of KS-NN.
We collect in the vector $r = \{\omega, u_m, a_y, a_x\}$ the input measurements required by KS-NN, namely the yaw rate $\omega$, the forward speed $u_m$, and the accelerations $\{a_y, a_x\}$.
KS-NN operates as a memoryless recurrent neural network (RNN), i.e., as a finite impulse response (FIR) system. As shown in Fig. 1, at each discrete time step $k$, a window of $N$ past measurements $\{r_{k-N+1}, \dots, r_k\}$ is used to compute the estimate $\hat{v}_{k+1}$, starting the recurrence from $\{\hat{v}_{k-N+1}, \hat{u}_{k-N+1}\} = \{0, u_{m,k-N+1}\}$. In the following implementations, the sampling time of the KS-NN model is $\tau = 0.05$ s, and $N = 30$, so that the input windows of past measurements span $N\tau = 1.5$ s, which means that we ask the estimator to converge within 1.5 s.
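The windowing of the measurements can be sketched as follows. The helper name and the dummy signal are illustrative assumptions; only the sampling time (0.05 s) and the window length (30 samples) come from the text.

```python
def sliding_windows(samples, n):
    """Return every window of n consecutive samples, oldest sample first."""
    return [samples[i:i + n] for i in range(len(samples) - n + 1)]


TAU = 0.05  # sampling time [s]
N = 30      # window length [samples]

# Each sample is r_k = (omega, u_m, a_y, a_x); dummy constant values here.
signal = [(0.0, 20.0, 0.0, 0.0)] * 100
windows = sliding_windows(signal, N)
window_span_s = N * TAU  # time span covered by one window [s]
```

Because each estimate depends only on the last `N` samples, the estimator has a finite memory of 1.5 s and carries no state from one window to the next.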
Fig. 2(a) shows the internal architecture of the recurrent network (the C block appearing in Fig. 1). It is a bank of Luenberger-like modules ($M_p$) that are combined with the channel coding technique [36, Section III.A-1]. Each module $M_p$ is a local model, activated by a function $\phi_p(|\omega|)$ with local support. Hence, each module learns to estimate $v$ in a specific range of yaw rates ($\omega$). There can be many local models $M_p$, $p \in \{1, \dots, P\}$, to improve the descriptive capacity of the network at varying yaw rates.
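One possible realization of activation functions with local support is sketched below. The triangular shape, the saturation outside the first and last channel, and the channel centers are assumptions made for illustration; this section does not specify the shape of the activation functions.

```python
def triangular_activations(omega_abs, centers):
    """Activations of triangular channels centered at `centers` (sorted, ascending).

    Assumed shape: linear interpolation weights between neighboring centers,
    saturating outside the first and last center. The activations are
    non-negative, at most two are non-zero, and they sum to 1.
    """
    phi = [0.0] * len(centers)
    if omega_abs <= centers[0]:
        phi[0] = 1.0
    elif omega_abs >= centers[-1]:
        phi[-1] = 1.0
    else:
        for p in range(len(centers) - 1):
            lo, hi = centers[p], centers[p + 1]
            if lo <= omega_abs <= hi:
                w = (omega_abs - lo) / (hi - lo)
                phi[p], phi[p + 1] = 1.0 - w, w
                break
    return phi


centers = [0.0, 0.2, 0.4, 0.6]  # hypothetical channel centers [rad/s]
phi = triangular_activations(0.25, centers)
```

At a yaw rate magnitude of 0.25 rad/s only the second and third channels are active in this sketch, so only the corresponding local models would contribute to the estimate.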
The structure of each module is similar to the Luenberger estimator [11], but with several learnable weights and a possibly nonlinear feedback function, with Q hidden neurons.
Let us now examine in more detail the structure of the models and how they are combined with channel coding.

A. Custom Neural Network Module M
The design of the local models $\{M_1, \dots, M_P\}$, depicted in Fig. 2(b), is based on a neural module, named $M$, which we now illustrate. Let us start from the explicit Euler discretization of the basic kinematic observer (5a,5b):
$$\hat{v}_{k+1} = \hat{v}_k + \tau \left[ a_{y,k} - \omega_k u_{m,k} + f_1(\omega_k, \hat{u}_k - u_{m,k}) \right] \tag{6a}$$
$$\hat{u}_{k+1} = \hat{u}_k + \tau \left[ a_{x,k} + \omega_k \hat{v}_k + f_2(\omega_k, \hat{u}_k - u_{m,k}) \right] \tag{6b}$$
where $\{\hat{v}_k, \hat{u}_k\}$ are the estimates of $\{v, u\}$ at the time step $k$, and $\tau$ is the integration time step. We now cast the basic kinematic model (6a,6b) into a neural network module $M$, using the recursion scheme (7a,7b). In (7a,7b), $\{\hat{v}^M_k, \hat{u}^M_k\}$ are the estimates obtained with the module $M$ at the time step $k$. In comparison with the basic observation model (6a,6b), the proposed module (7a,7b) has additional learnable quantities, namely the parameters $\{\gamma_i, \beta_i\}$, $i \in \{1, 2\}$, and the weights and biases contained in the neural feedback functions $\{f_1(\cdot), f_2(\cdot)\}$, which will be described next. Equations (7a,7b) can therefore be seen as a generalized explicit Euler discretization of (5a,5b), in which the time step is $\tau$. The $\{G_1(\cdot), G_2(\cdot)\}$ functions in (7a,7b) are used to facilitate a graphical interpretation of the module $M$, provided in Fig. 2(b).
As will be described in Section IV, all the learnable parameters in (7a,7b) have a physically explainable role, improve the noise rejection capabilities of the observer, and overcome the unobservability problem of the basic kinematic model during straight driving. The learnable feedback functions $f_i(\cdot)$ of (7a,7b), $i \in \{1, 2\}$, are designed with a two-branch shallow neural architecture:
$$f_i(\hat{u}_k - u_{m,k}) = w^1_i \, (\hat{u}_k - u_{m,k}) + b^1_i + \sum_{j=1}^{Q} w^3_{ji} \tanh\!\left( w^2_{ji} \, (\hat{u}_k - u_{m,k}) + b^2_{ji} \right) \tag{8}$$
where $i \in \{1, 2\}$. As shown in Fig. 2(c), the $f_i(\cdot)$ functions in (8) have a linear and a nonlinear branch. The linear branch computes $w^1_i \, (\hat{u}_k - u_{m,k}) + b^1_i$. The nonlinear branch combines two fully-connected layers: the first layer ("Lin2$_i$" in Fig. 2(c)) has $Q$ neurons and a Tanh activation function, while the second layer ("Lin3$_i$") has 1 neuron and a linear activation function. Each of the $f_i(\cdot)$ functions in (8), $i \in \{1, 2\}$, has a maximum of $3Q + 2$ learnable parameters, namely the weights $\{w^1_i, w^2_{ji}, w^3_{ji}\}$ and the biases $\{b^1_i, b^2_{ji}\}$, $j \in \{1, \dots, Q\}$. The number $Q$ of neurons might be varied, together with the number of hidden layers, to change the complexity of the $f_i(\cdot)$ networks.
The maximum number of learnable parameters in the neural module $M$ is therefore $N_p = 2(3Q + 2) + 4 = 6Q + 8$, where the last four parameters are $\{\gamma_1, \gamma_2, \beta_1, \beta_2\}$.
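A minimal sketch of a two-branch feedback function, together with the resulting parameter counts, is given below. The function and argument names are hypothetical, and the absence of a bias in the output layer is an inference from the stated count of 3Q + 2 parameters rather than an explicit statement of the text.

```python
import math


def feedback(x, w1, b1, w2, b2, w3):
    """Two-branch feedback f_i(x): a linear branch plus one tanh hidden layer.

    w2, b2, w3 are lists of length Q (hidden weights, hidden biases, output
    weights). The output layer is assumed to have no bias, which gives
    3Q + 2 parameters in total.
    """
    linear = w1 * x + b1
    nonlinear = sum(w3_j * math.tanh(w2_j * x + b2_j)
                    for w2_j, b2_j, w3_j in zip(w2, b2, w3))
    return linear + nonlinear


def n_params_feedback(Q):
    """Weights {w1, w2_j, w3_j} and biases {b1, b2_j}, with j = 1..Q."""
    return 3 * Q + 2


def n_params_module(Q):
    """Two feedback functions plus {gamma_1, gamma_2, beta_1, beta_2}."""
    return 2 * n_params_feedback(Q) + 4


y_linear_only = feedback(0.5, w1=2.0, b1=0.1, w2=[], b2=[], w3=[])
y_with_hidden = feedback(0.0, w1=2.0, b1=0.1, w2=[1.0], b2=[0.0], w3=[3.0])
```

With `Q = 0` the feedback reduces to its linear branch, and a module then has 8 parameters in this sketch; each additional hidden neuron adds 6 parameters to a module.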

B. Combining Multiple Parallel Modules
Within the neural module $M$ (7a,7b), the learnable parameters are $\{\gamma_i, \beta_i\}$ and the weights and biases of the functions $\{f_1(\cdot), f_2(\cdot)\}$. The theory of vehicle dynamics [37] suggests that the behavior of the lateral velocity could vary with the magnitude of the yaw rate $|\omega|$. Hence, $P$ local modules $M_p$ are combined, each activated by a function $\phi_p(|\omega|)$ with local support. Notably, the functions $\phi_p(|\omega|)$ are also called receptive fields, validity functions, or membership functions [38] in different contexts. Interested readers can find more details about our implementation in [36] and [39, Section III.A-1], where we discuss the relationship with similar approaches in biology and other engineering applications.
The complete local model approximation C is written as follows, and implemented in the network shown in Fig. 2(a):
$$\{\hat{v}_{k+1}, \hat{u}_{k+1}\} = \sum_{p=1}^{P} \phi_p(|\omega_k|) \, \{\hat{v}^{M_p}_{k+1}, \hat{u}^{M_p}_{k+1}\} \tag{9}$$
where $\{\hat{v}^{M_p}_{k+1}, \hat{u}^{M_p}_{k+1}\}$ are the estimates computed with the module $M_p$, $p \in \{1, \dots, P\}$, using (7a,7b). The model C in (9) defines the interpolation of the estimates of the local models. The maximum number of trainable parameters for KS-NN is $N_p P = (6Q + 8)P$. Note that the number of neurons $Q$ and the number of local models $P$ are adjustable hyperparameters, through which one can change the complexity and learning capability of the model. The role of $Q$ and $P$ is conceptually similar to the number of hidden layers and states of traditional recurrent neural networks [38].
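The interpolation of the local estimates and the total parameter count can be sketched as follows. The sketch assumes activation values that sum to one; the helper names and the numerical values are hypothetical.

```python
def blend_local_estimates(phi, local_estimates):
    """Interpolate the local-model outputs (v_hat, u_hat) with weights phi_p.

    Assumes the activations phi_p are non-negative and sum to 1.
    """
    v_hat = sum(p * v for p, (v, _) in zip(phi, local_estimates))
    u_hat = sum(p * u for p, (_, u) in zip(phi, local_estimates))
    return v_hat, u_hat


def max_trainable_parameters(Q, P):
    """Maximum number of trainable parameters, (6Q + 8) * P."""
    return (6 * Q + 8) * P


phi = [0.75, 0.25]                  # activations of two local models
local = [(0.2, 20.0), (0.6, 20.4)]  # (v_hat, u_hat) from each module
v_hat, u_hat = blend_local_estimates(phi, local)
n_max = max_trainable_parameters(Q=2, P=3)
```

In this example the blended lateral speed is 0.3 m/s, three quarters of the way toward the first module, and a network with two hidden neurons and three local models has at most 60 trainable parameters.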

IV. INTERPRETATION OF THE NEURAL NETWORK STRUCTURE
In this section, we show how the proposed neural network KS-NN can be interpreted in various ways.

A. Interpreting the Extended Kinematic Laws
In ideal conditions, with no measurement noise and on a perfectly planar surface, the evolution of the lateral speed could in principle be predicted through the integration of the exact kinematics, represented by equations (4a,4b). However, the existence of sensor noise and the variation in chassis roll and pitch angles make the integration of (4a,4b) diverge. The feedback functions $f_1(\cdot)$ and $f_2(\cdot)$ in (5a,5b) are used to correct the divergence.
The model presented in equations (7a,7b) adapts the kinematic laws to the noise levels in the measured channels: compared to the discrete-time kinematic estimator (6a,6b), it has additional learnable parameters $\{\gamma_1, \gamma_2, \beta_1, \beta_2\}$, and feedback functions $\{f_1(\cdot), f_2(\cdot)\}$ that are more general than those used in the Luenberger observer [11] (Appendix A). The additional parameters and extended feedback functions improve noise rejection and overcome the unobservability issue.
1) Interpreting the Output of KS-NN: We can decompose the lateral velocity estimate produced by KS-NN into three independent and fused estimates, each of which has a physical interpretation. To simplify the analysis, let us focus on a single neural module $M$, represented by equations (7a,7b), and consider the case where the feedback functions $\{f_1(\cdot), f_2(\cdot)\}$ are linear (10). Since the resulting neural module $M$ (7a,7b) is linear, it is possible to apply the Z transform, obtaining (11), where the quantities $\{A(z), B(z), C(z), D(z)\}$ are given by (12). Equation (11) shows that $\hat{v}$ is determined by merging three sub-models:
• The first sub-model estimates $v$ given the measurements of the slip rate $a_y - \omega u_m$, which is the right-hand side of the basic kinematic model (4a).
• The third sub-model estimates the lateral velocity $v$ by learning the relationship between $v$ and $\omega$, for a certain forward speed $u_m$ contained in $C(z)$.
The learnable transfer functions $A(z)$, $B(z)$, $C(z)$, $D(z)$ weight the three sub-models. It is worth noting that equations (11) and (12) apply to one neural module $M$. However, KS-NN employs $P$ modules $M$, which activate in specific $\omega$ intervals through channel coding. This approach allows the transfer function's pole-zero maps and gains to adapt to varying $\omega$, enhancing the model's learning ability.
The role of the main learnable parameters in (11,12) will now be explained in detail.
2) Explaining the Role of $\gamma_1$: We now focus on (7a), assuming a fixed $\beta_1 = 1$ and a learnable $\gamma_1 \in (0, 1]$. The term $\gamma_1 \hat{v}_k + \tau (a_{y,k} - u_{m,k}\,\omega_k)$ in (7a), with $\gamma_1 \in (0, 1]$, operates like a leaky integrator, accumulating an estimate of the lateral velocity, while forgetting some less recent input noise. During straight driving ($\omega \approx 0$), the parameter $\gamma_1 < 1$ helps to let $\hat{v}$ converge to zero, therefore overcoming the observability problem of the basic kinematic model for low yaw rates. The measured yaw rate $\omega_k$ is introduced in the $\omega_k f_1(\cdot)$ term of (7a), to force (7a) to be decoupled from (7b) during straight driving, so that $\hat{v} \to 0$.
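The leaky-integrator behavior can be illustrated with a minimal sketch that iterates only the term discussed above, with the feedback term omitted. The initial value and the value of the leak parameter are illustrative assumptions, not learned values from the paper.

```python
def leaky_integrator(v0, inputs, gamma1, tau):
    """Iterate v_{k+1} = gamma1 * v_k + tau * s_k, with s_k the slip rate a_y - omega * u_m."""
    v = v0
    for s_k in inputs:
        v = gamma1 * v + tau * s_k
    return v


N, tau = 30, 0.05
zero_input = [0.0] * N  # straight driving with ideal sensors

# Pure integrator: an initial error of 1 m/s is never forgotten.
v_pure = leaky_integrator(1.0, zero_input, gamma1=1.0, tau=tau)

# Leaky integrator: the same error decays as gamma1 ** N within one window.
v_leaky = leaky_integrator(1.0, zero_input, gamma1=0.9, tau=tau)
```

With a leak parameter of 0.9, an initial error shrinks to about 4% of its value within one 30-sample window. For a constant input `s`, the same iteration settles at `tau * s / (1 - gamma1)` instead of growing without bound, which is how the leak limits the effect of a constant measurement bias.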
In the proposed approach, each of the $M_1, \dots, M_P$ local models can learn different values of $\gamma_1$. However, to preserve the kinematic relations, the learned $\gamma_1$ values are close to 1 in the local models at higher yaw rates, where there are no observability issues.
3) Explaining the Role of $\gamma_2$ and $\beta_2$: The learnable parameters $\gamma_2$ and $\beta_2$ in (7b) modify the interpretation of the state $\hat{u}$, which is no longer necessarily the true forward speed, but rather a transformed version of it that is optimized for improved estimation accuracy of $v$.
The parameters $\gamma_2$ and $\beta_2$ modify the weighting transfer function $C(z)$ in (12b), by adding a term proportional to $u_m$ when they are not equal to 1. This modification enables the third sub-model in (11) to learn a dynamic relationship among $v$, $\omega$, and $u_m$ that cannot be captured by the basic kinematic estimator (6a,6b) or the Luenberger observer of [11].
The parameters $\gamma_2$ and $\beta_2$ also represent additional degrees of freedom to learn the transfer functions $\{A(z), B(z), C(z), D(z)\}$ in (11), which further improves the observer performance.
4) Explaining the Role of $\beta_1$: The learnable parameter $\beta_1$ in (7a) generalizes the kinematic relation between $v$ and the slip rate $a_y - \omega u_m$, which improves noise rejection. The optimized $\beta_1$ values in the neural modules $M_1, \dots, M_P$ are typically lower than 1. As a result, (7a) still integrates $a_y - \omega u_m$, but the integrand has a reduced weight ($\beta_1$), which leads to improved noise rejection. The missing contribution is provided by the term $\omega_k f_1(\hat{u}_k - u_{m,k})$ in (7a), which represents a lateral velocity dynamic model that is learned with the state $\hat{u}$.
5) Explaining the Role of $b^1_1$ and $b^1_2$: The biases $\{b^1_1, b^1_2\}$ in (10a) have the capacity to learn the measurement biases specific to a particular vehicle and sensor setup. However, this specialization can reduce the estimator's robustness, as discussed in Section VIII.

B. Interpreting KS-NN as an LPV Model
This section shows that the proposed KS-NN model can be interpreted as a linear parameter varying (LPV) system.
The combination of the local neural modules $\{M_1, \dots, M_P\}$ in (9) can be expanded using (7a,7b), which yields (13a,13b). If the feedback functions $\{f_{1p}(\cdot), f_{2p}(\cdot)\}$ are linear, then the dynamics can be represented as a linear parameter varying (LPV) discrete-time system. This is because, when performing the products among $\phi_p(|\omega_k|)$ and the $M_p$ models in (13a,13b), the resulting equations can be written in a linear form, with parameters that vary with the value of $|\omega_k|$.

V. LITERATURE BENCHMARKS
The neural network KS-NN is compared with the following literature benchmarks:
• Kinematic Luenberger observer [11], with 2 trainable parameters. The implementation is detailed in Appendix A.
• Gated recurrent network (GRU). The implementation follows the E1 estimator proposed by [9], using the same input signals as KS-NN.
• Long short-term memory network (LSTM). The implementation follows [10], using the same input signals as KS-NN.
• Factor graph (FG) and Kalman filter (KF) estimators of [12]. The authors used the same race car dataset as our paper; hence their results are directly comparable with ours.

VI. EXPERIMENTAL DATASET AND MODEL PARAMETERIZATION

A. Experimental Dataset
We use real telemetries of race cars from the Revs Vehicle Dynamics database (Stanford University), which is openly accessible [40]. It contains data from two cars, with challenging top speeds of 240 km/h and lateral accelerations up to 12 m/s². They are:
• Corvette Grand Sport 1963, hereafter named "Corvette";
• Ferrari 250 LM GT 1965, hereafter named "Ferrari".
We focus on the Palm Beach sessions listed in Table I. For each vehicle, the available data are divided into two subsets, named {C-1, C-2} for the Corvette and {F-1, F-2} for the Ferrari: the -1 and -2 subsets will be used for training and testing, respectively, as explained below. Table I shows the names of the corresponding files in the Revs database and some lap statistics.
To comply with the design sampling time of the neural estimators, the original signals, stored at 1 kHz, are down-sampled to $\tau^{-1} = 20$ Hz.
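The rate conversion from 1 kHz to 20 Hz corresponds to a reduction factor of 50. The text does not state which down-sampling method was used; the sketch below assumes simple block averaging, which also acts as a crude low-pass filter, and uses a dummy signal.

```python
def downsample_mean(signal, factor):
    """Average consecutive blocks of `factor` samples (a trailing remainder is dropped)."""
    n_blocks = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor
            for i in range(n_blocks)]


F_IN, F_OUT = 1000, 20                # original and target rates [Hz]
factor = F_IN // F_OUT                # 50 input samples per output sample

raw = [float(i) for i in range(200)]  # 0.2 s of dummy data at 1 kHz
decimated = downsample_mean(raw, factor)
```

Each output sample then represents one sampling interval of 0.05 s. A properly designed anti-aliasing filter followed by decimation would be a common alternative to block averaging.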

B. Training of the Estimators
We now describe how the datasets of Table I are used to train, validate and test the novel and benchmark neural estimators.
As mentioned in the introduction, observers can either be tailored to specific vehicles, or designed to be insensitive to variations in parameters such as operating conditions, vehicle types, and aging.Our objective is to develop observers with the latter characteristic, which can be achieved by designing and training neural networks with strong generalization abilities.This means that a network trained on data from one vehicle should be able to function effectively on another vehicle.
Networks with this level of generalization tend to be robust.Instead, networks specialized for a specific vehicle design may suffer from unpredictable degradation when vehicle parameters vary.
The benchmark Luenberger observer is constrained by design to use only kinematics.To a slightly lesser extent, the KS-NN networks introduced in this paper are also constrained to adhere to kinematics (Sections III and IV).However, the generic LSTM and GRU networks used for benchmarking are not ideal for our purposes.These networks are designed to fit the training data by identifying any hidden correlations in the input-output examples.With hundreds of parameters, these generic networks can easily learn characteristics that depend on fewer parameters, that differentiate one vehicle from another.
1) Controlling LSTM and GRU Overfitting: To force the benchmark GRU and LSTM networks to learn universal input-output relations as much as possible, we adopt the early stopping technique. The networks are trained with the training set of one vehicle and, as a validation set, with the training set of the other vehicle: for example, the Corvette set C-1 for training and the Ferrari set F-1 for validation, or vice versa, as shown in Table II. For convenience, this technique is referred to here as cross-vehicle regularization. To have a uniform training procedure, the same training method with cross-vehicle regularization is adopted for the novel KS-NN networks as well.
With early stopping, the training process terminates as soon as the validation loss stops improving.This mitigates the risk of updating the network parameters in ways that are not general (at least not for the given validation examples), and yields the most robust network.
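A generic early-stopping loop with cross-vehicle validation can be sketched as follows. This is not the Mathematica implementation used in the paper: the callables, the `patience` parameter and the synthetic loss sequence are assumptions for illustration.

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=1000, patience=10):
    """Stop when the validation loss has not improved for `patience` epochs.

    train_one_epoch(): updates the model on the training vehicle (e.g., C-1).
    validation_loss(): returns the loss on the other vehicle (e.g., F-1).
    Returns (best_epoch, best_loss).
    """
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        loss = validation_loss()
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_loss


# Synthetic validation curve: it improves, then degrades as overfitting sets in.
losses = iter([5.0, 3.0, 2.0, 2.5, 2.6, 2.7, 2.8])
best_epoch, best_loss = train_with_early_stopping(
    lambda: None, lambda: next(losses), patience=3)
```

A complete implementation would also store the network parameters at the best epoch and restore them after stopping, so that the returned network is the one with the lowest cross-vehicle validation loss.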
With early stopping, the trained network may vary with the initial parameter seeds (because of premature stopping). Hence, we repeated the training several times with randomized initial guesses, and evaluated the distributions of the resulting quality metrics, taking the best performing networks of the random search.
2) Training Procedure: The neural networks are developed and trained in Wolfram Mathematica 12.3.1, which uses the MXNet Deep Learning framework.
The training set for KS-NN consists of the input → output data $\{r_{k-N+1}, \dots, r_{k-1}, r_k\} \to v_{k+1}$, with $v_{k+1}$ being the ground truth lateral speed measurements and $r_k = \{\omega_k, u_{m,k}, a_{y,k}, a_{x,k}\}$. The model estimates $\hat{v}_{k+1}$ using the provided windows (with size $N$) of past measurements $r$, according to the recursion scheme of Fig. 1. Using supervised learning, the network is trained to minimize the mean square error (MSE) of $\hat{v}_{k+1} - v_{k+1}$. The Adam method [41] is employed for numerical optimization, with a maximum of 1000 training epochs (but early stopping typically occurs at about 200 epochs) and a small batch size of 64 (which helps generalization abilities).

C. Model Hyperparameters and Model Instantiations
Except for the benchmark Luenberger observer, all models have hyperparameters that control model complexity and the number of trainable parameters.For the benchmark LSTM and GRU networks, it is the dimension of the internal state that controls the model complexity.For the novel KS-NN networks of this article, the number of local models P and the number of hidden neurons in the feedback functions Q play a similar role.
Table III lists the models that were instantiated for each category.In total, there are 39 different candidates, with the number of parameters spanning the range 12-271 (except for the Luenberger, which has only two parameters).

A. Measurement Noise Analysis of the Experimental Vehicles
Equation (4a) states that $\dot{v}$ equals $a_y - \omega u_m$. However, if we obtain $\dot{v}$ from the derivative of the ground truth lateral velocity $v$, it may not match $a_y - \omega u_m$. This is because $a_y$, $\omega$ and $u_m$ are noisy and contain measurement biases, such as errors in the mounting position of the accelerometer and gyroscope, chassis roll and pitch angles, and sensor offsets. Assuming that the ground truth lateral velocity $v$ is more precise than the other signals, the quantity $\dot{v} - (a_y - \omega u_m)$ is an estimate of the overall noise on the right-hand side of (4a).
Fig. 4 compares the periodograms [42] of $\dot{v} - (a_y - \omega u_m)$, for the Ferrari and the Corvette, to the periodograms of the measured $\dot{v}$, to visualize the signal-to-noise ratios, shown with shaded areas. The signal-to-noise ratio is more favorable in the Ferrari, where the power of $\dot{v}$ is relatively high compared to the power of $\dot{v} - (a_y - \omega u_m)$. In the Corvette, the power of $\dot{v}$ is lower, resulting in a higher noise-to-signal ratio. In particular, in the Corvette, below 0.1 Hz, the signal $\dot{v}$ is below the noise level.
1) Mean Bias: The mean value of $\dot{v} - (a_y - \omega u_m)$ is 0.10 m/s² for the Corvette, and a smaller (in magnitude) −0.03 m/s² for the Ferrari. We can interpret this figure as a mean bias that originates somewhere on the right-hand side of (4a), which is the primary signal to estimate $v$ kinematically.
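The residual and its mean bias can be computed along the following lines. This is a sketch under the assumption that the ground truth lateral velocity is differentiated numerically with uniform sampling time `dt`; the function names are hypothetical, and the authors' differentiation scheme is not stated in this section.

```python
import numpy as np

def kinematic_residual(v, a_y, omega, u_m, dt):
    """Return v_dot - (a_y - omega * u_m), the noise estimate on (4a).

    v_dot is approximated by finite differences of the ground truth v
    (an assumption; any smoother differentiator could be used instead).
    """
    v_dot = np.gradient(np.asarray(v, dtype=float), dt)
    rhs = np.asarray(a_y, dtype=float) - np.asarray(omega, dtype=float) * np.asarray(u_m, dtype=float)
    return v_dot - rhs

def mean_bias(residual):
    """Mean value of the residual, in m/s^2."""
    return float(np.mean(residual))
```

A nonzero `mean_bias` indicates a systematic offset in the measured accelerations, yaw rate or forward speed, rather than zero-mean noise.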

B. Lateral Velocity Dynamics of the Experimental Vehicles
Fig. 5 shows that the measured lateral velocity $v$ and lateral acceleration $a_y$ exhibit different dynamics for the Ferrari and the Corvette. The plot suggests that the Ferrari achieves higher lateral velocities for the same lateral acceleration, compared to the Corvette. These differences make it important to check the observer for overfitting and to test its ability to generalize, which is done by evaluating it on the vehicle not used for training.

VIII. RESULTS AND DISCUSSION
Fig. 6 gives a visual representation of the performance of the 39 candidate models in Table III.
The left column concerns models trained according to scheme 1 of Table II, i.e., with the Corvette set C-1 for training, and the Ferrari set F-1 for validation. The right column concerns models trained with scheme 2, i.e., using F-1 for training, and C-1 for validation. The performance metrics are evaluated on both test sets C-2 and F-2, and are the coefficient of determination $R^2$ in the top row and the root mean square error (RMSE) of the predicted lateral velocity in the bottom row.
More precisely, $R^2 = 1 - \mathrm{var}(\hat{v} - v)/\mathrm{var}(v)$ (with $\hat{v}$ the estimated lateral velocity, $v$ the ground truth lateral velocity, and $\mathrm{var}(\cdot)$ the variance operator) is a non-dimensional index showing how much of the variation of $v$ is successfully accounted for by a model. Its complement to 1, $\mathrm{FVU} = 1 - R^2 = \mathrm{var}(\hat{v} - v)/\mathrm{var}(v)$, is known as the fraction of variance unexplained by a model.
The root mean square error of the lateral velocity, on the bottom charts, complements the above information with the expected magnitude of the estimation error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{k=1}^{n} (\hat{v}_k - v_k)^2},$$
where $n$ is the number of samples in the test set.
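The three metrics can be sketched in a few lines. The helper names are hypothetical; the definitions follow the variance-based form of R² and FVU given above and the standard RMSE definition.

```python
import numpy as np

def r2_score(v_hat, v):
    """Coefficient of determination, R^2 = 1 - var(v_hat - v) / var(v)."""
    v_hat = np.asarray(v_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.var(v_hat - v) / np.var(v)

def fvu(v_hat, v):
    """Fraction of variance unexplained, FVU = 1 - R^2."""
    return 1.0 - r2_score(v_hat, v)

def rmse(v_hat, v):
    """Root mean square error of the lateral velocity estimate."""
    v_hat = np.asarray(v_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.sqrt(np.mean((v_hat - v) ** 2)))
```

Note that, with this variance-based definition, a constant offset in the estimate leaves R² unchanged but increases the RMSE, which is one reason the two metrics are complementary.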

A. Model Quality in Terms of Accounted Variance
Let us first study the top row of Fig. 6. The dots show the coefficient of determination $R^2$ of the 39 models. The benchmark Luenberger model is shown in gray. The benchmark LSTM and GRU models have similar behavior and are shown together in purple. The KS-NN models with linear feedback (Q = 0) are colored blue. They form two clusters: models without biases ($b^1_i = 0$ in (10a)), and models with biases ($b^1_i \neq 0$ in (10a)). The KS-NN models with nonlinear feedback (Q > 0) are colored in orange. They also form two clusters: without biases ($b^2_{ji} = 0$ in (8)) and with biases ($b^2_{ji} \neq 0$ in (8)).

1) Training Scheme 1:
Let us now focus on training scheme 1 (Corvette C-1 as training set and Ferrari F-1 as validation set), shown in the top left sub-plot of Fig. 6. By training on C-1 with F-1 as the validation set, the models seek to optimize the performance of the Corvette, subject to not worsening the performance of the Ferrari. So, they seek to best model one vehicle while maintaining the maximum generalization capacity measured on the other. The graph plots $R^2$, evaluated in test sets C-2 (Corvette) and F-2 (Ferrari), which were not used for training or selection. C-2 is on the y-axis.
All dots are above the main diagonal, meaning that the performance on C-2 (the Corvette test set) is better than the performance on F-2 (the Ferrari test set). This is expected, since we trained with Corvette data, and used Ferrari data only to stop the training process. The difference between C-2 and F-2, i.e., the distance from the main diagonal, can be seen as a measure of model specialization for one vehicle, and thus, the generalization loss.
The benchmark Luenberger observer performs similarly in C-2 and F-2. However, it only accounts for 76.5% and 74.7% of the variance of $v$, respectively, indicating that about 25% of the variance is not predicted by the Luenberger.
The benchmark LSTM and GRU models outperform the Luenberger on C-2, exceeding 90% of explained variance. However, their performance worsens on F-2, indicating specialization for one vehicle at the expense of the other.
The KS-NN linear and nonlinear models with biases perform even better than the benchmarks in the C-2 test set, and marginally better than the Luenberger observer in F-2. This means that they are able to improve both vehicles, with the greatest gains for the vehicle that provided the training set.
The KS-NN models without biases are the most robust, as they remain close to the main diagonal and exhibit equal improvements in both vehicles. The cost of such a greater generalization capacity is a decrease of $R^2$ to 86.8% in C-2 (from 92.9% of the models with biases), which, however, is balanced with an increase to 85.5% in F-2 (from 75.8% of the models with biases). We argue that models with biases (and, among others, the benchmarks GRU and LSTM) learn to compensate for the measurement biases discussed in Section VII. However, since the measurement biases vary in time and across different sensors and cars, specializing for one vehicle reduces performance for others with different biases.
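As a worked example of how the quoted R² values translate into a relative reduction of the unexplained variance, consider the F-2 test set in training scheme 1, with R² = 74.7% for the Luenberger and R² = 85.5% for the KS-NN models without biases. The helper name below is hypothetical.

```python
def fvu_reduction(r2_baseline, r2_model):
    """Relative reduction of FVU = 1 - R^2 with respect to a baseline."""
    fvu_baseline = 1.0 - r2_baseline
    fvu_model = 1.0 - r2_model
    return (fvu_baseline - fvu_model) / fvu_baseline

# Luenberger vs. KS-NN without biases on F-2 (training scheme 1)
print(round(100 * fvu_reduction(0.747, 0.855), 1))  # 42.7
```

The unexplained variance drops from 0.253 to 0.145, i.e., by about 43%. This appears consistent with the upper end of the 25-43% improvement range reported in the paper, if that range is read as a reduction of the FVU (an interpretation, not a statement from the source).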

2) Training Scheme 2:
The above considerations hold also for training scheme 2, shown in the top right sub-plot of Fig. 6, in which the Ferrari F-1 set is used for training, and the Corvette C-1 set for validation. However, the signal-to-noise ratio of the Ferrari is better than that of the Corvette (Section VII-A). Therefore, while performance in the Ferrari test set F-2 (y-axis) tends to improve, it comes at the cost of reduced generalization capacity for most models, except for the KS-NN nonlinear model without biases, which is the most robust.
In the top right chart, the most general models and the most specialized models are indicated.

B. Model Quality in Terms of Expected Error
The bottom row of Fig. 6 displays the root mean square error (RMSE) of the predicted velocity. The KS-NN nonlinear model without biases exhibits the best generalization capacity: compared to the benchmark Luenberger, it improves the RMSE by 16-24% on the vehicle not used for training; compared to the benchmarks GRU and LSTM, it improves by 27-38%.

C. Models With the Best Generalization Capacity
Tables IV and V list the most general models per category. The general models significantly improve the observer's ability to predict the velocity of the testing vehicle, with only a minor loss in accuracy for the training vehicle. The model with the best generalization capacity is highlighted: it is the KS-NN network, with nonlinear feedback functions and no biases. Compared to the benchmark Luenberger, it improves the FVU by 25-43% on the test vehicle; compared to the benchmarks GRU and LSTM, it improves by 47-65%. From Tables IV and V, one may observe that robust solutions tend to have few parameters.
The nonlinear feedback functions $\{f_1(\cdot), f_2(\cdot)\}$ (8) in the best general models realize variable feedback gains, as already found in [18].

D. Comparison With the Factor Graph Benchmark
This section presents a comparison between the KS-NN network and the factor graph (FG) and Kalman filter (KF) estimators proposed by [12]. The comparison is based on the same Ferrari telemetries used in the open race car dataset [40] presented in this paper. To directly compare our results with [12], we compute the RMSE of the side slip angle estimate ($\beta = \operatorname{atan}(v/u)$).
Table VI shows the results obtained with our KS-NN, the benchmarks Luenberger, GRU, LSTM, and the FG and KF of [12, Section 4].
Our KS-NN model shows superior performance compared to the FG and KF benchmarks, with an RMSE of 0.40°, which is 30% and 54% lower, respectively. It is worth noting that the FG and KF benchmarks were not cross-evaluated on the Corvette in [12]. On the other hand, as shown in Table V, our KS-NN also exhibits good generalization capacity on the Corvette dataset, which was not used for training.
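The conversion from lateral velocity estimates to a side slip angle RMSE in degrees can be sketched as follows. It is assumed here, for illustration only, that the same forward speed signal `u` is used for both the estimated and the reference angle; the function name is hypothetical.

```python
import numpy as np

def sideslip_rmse_deg(v_hat, v, u):
    """RMSE (in degrees) of beta = atan(v / u).

    v_hat: estimated lateral velocity, v: ground truth lateral velocity,
    u: forward speed (assumed nonzero and shared by both angles).
    """
    u = np.asarray(u, dtype=float)
    beta_hat = np.arctan(np.asarray(v_hat, dtype=float) / u)
    beta = np.arctan(np.asarray(v, dtype=float) / u)
    err_deg = np.degrees(beta_hat - beta)
    return float(np.sqrt(np.mean(err_deg ** 2)))
```

Because β depends on the ratio v/u, the same lateral velocity error yields a smaller angle error at higher forward speed, so this metric is not a simple rescaling of the lateral velocity RMSE.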

E. Noise Sensitivities
Assuming that most of the noise is contained in the $a_y$ and $a_x$ signals, one can study the propagation of noise in the KS-NN networks and in the benchmark Luenberger observer [11] as follows.
Starting from the characteristic equations of each model ((7a,7b) for KS-NN), the quantities $\{a_{y_k}, a_{x_k}\}$ are replaced with $\{a_{y_k} + \delta a_{y_k}, a_{x_k} + \delta a_{x_k}\}$, where $\{\delta a_{y_k}, \delta a_{x_k}\}$ stand for the acceleration noise. Applying the Z transform, the variation $\delta\hat{v}(z)$ for given $\{\delta a_y(z), \delta a_x(z)\}$ is:
$$\delta\hat{v}(z) = H^Y_M(z)\,\delta a_y(z) + H^X_M(z)\,\delta a_x(z),$$
with $H^Y_M(z)$ and $H^X_M(z)$ being the noise sensitivities of model $M$. The sensitivities to lateral ($a_y$) and longitudinal ($a_x$) acceleration for the two vehicles are not equal, as they have different levels of noise in their respective signals. During training, the sensitivities are adjusted to balance the trade-off between accuracy in estimating the states and robustness to noise.
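To illustrate how the magnitude of such noise sensitivities can be evaluated over frequency, the sketch below computes $|H(e^{j 2\pi f \Delta t})|$ for a generic linear discrete-time estimator in state-space form, $x_{k+1} = A x_k + B a_k$, $\hat{v}_k = C x_k$, for which $H(z) = C (zI - A)^{-1} B$. The matrices are placeholders chosen by the user: the actual expressions for KS-NN and the Luenberger observer are those of Appendix B, which are not reproduced here.

```python
import numpy as np

def sensitivity_magnitude(A, B, C, freqs_hz, dt):
    """|H(z)| on the unit circle for H(z) = C (zI - A)^{-1} B.

    A, B, C: hypothetical state-space matrices of a linear estimator;
    the columns of B correspond to the noisy inputs (e.g. a_y, a_x).
    Returns an array of shape (len(freqs_hz), n_outputs, n_inputs).
    """
    A, B, C = (np.atleast_2d(np.asarray(M, dtype=float)) for M in (A, B, C))
    n = A.shape[0]
    mags = []
    for f in freqs_hz:
        z = np.exp(2j * np.pi * f * dt)  # point on the unit circle
        H = C @ np.linalg.solve(z * np.eye(n) - A, B)
        mags.append(np.abs(H))
    return np.array(mags)
```

For an estimator whose gains depend on the yaw rate, the matrices would be re-evaluated for each value of ω of interest, giving one magnitude curve per yaw rate.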

F. Power Spectral Densities of the Residuals
Figure 8 displays the power spectral densities (PSDs) [42] of the lateral speed estimation errors, obtained by the KS-NN network and the benchmarks Luenberger and LSTM. The top plot illustrates the results of the Ferrari test set, using the most robust models trained with the Corvette, while the bottom plot depicts the results of the Corvette test set, using the most robust models trained with the Ferrari.
The PSD of the measured lateral velocity signal is plotted as well (in black), to appreciate the signal-to-noise ratios.
The benchmark Luenberger residuals have a higher PSD than the KS-NN model, across most frequencies. Beyond 1-2 Hz, the PSD of the Luenberger exceeds even the signal level, indicating noise introduced by the observer.
The benchmark LSTM residuals have greater power than the KS-NN network, with a 7 dB difference at 0.1 Hz and 8-10 dB difference at 1 Hz, on the Corvette. In contrast, the PSD of the residuals of the KS-NN model is consistently below the signal level, and has the lowest values for all frequencies in both cars, confirming its improved estimation robustness.
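A basic one-sided periodogram of the residuals, and the gap in dB between two PSD values, can be sketched as below. This is a plain FFT-based estimate without windowing or averaging; the paper's exact PSD estimator settings are not given in this section, so the details here are assumptions.

```python
import numpy as np

def periodogram(x, fs):
    """One-sided periodogram of a real signal sampled at fs (Hz)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.rfft(x - x.mean())        # remove the mean before the FFT
    psd = np.abs(X) ** 2 / (fs * n)
    # Double all bins except DC (and Nyquist, when n is even)
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

def db_gap(psd_a, psd_b):
    """Difference in dB between two power spectral density values."""
    return 10.0 * np.log10(np.asarray(psd_a, dtype=float) / np.asarray(psd_b, dtype=float))
```

Applying `periodogram` to the residuals of two estimators and `db_gap` at a given frequency bin gives the kind of comparison quoted above (e.g., a gap at 0.1 Hz or 1 Hz).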

G. Computational Efficiency of Implementations
As noted, the KS-NN estimator is made of parallel models, whose outputs are weighted by activation functions. The discrete-time equations of a module (7a,7b), feedback function (8), and the activation functions ($\varphi_p(|\omega|)$) are not difficult to code. Weights can be obtained from the trained network. The total number of mathematical operations is small and easily supported by low-cost automotive-grade hardware.
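The weighting of the parallel local models can be illustrated with the sketch below. The shape of the activation functions is an assumption: normalized Gaussian bumps with equally spaced centers over the yaw rate range are used purely for illustration, and the `width` parameter and function names are hypothetical. The module equations (7a,7b) and feedback functions (8) are not reproduced.

```python
import numpy as np

def activation_weights(omega_abs, P, omega_max, width=None):
    """Normalized weights of P local models for a given |omega|.

    Assumption: Gaussian bumps centered at P equally spaced points in
    [0, omega_max]; the actual phi_p of the paper may differ in shape.
    """
    centers = np.linspace(0.0, omega_max, P)
    if width is None:
        width = omega_max / max(P - 1, 1)  # hypothetical default spread
    phi = np.exp(-0.5 * ((omega_abs - centers) / width) ** 2)
    return phi / phi.sum()

def blend(local_outputs, weights):
    """Weighted combination of the outputs of the P local models."""
    return float(np.dot(weights, np.asarray(local_outputs, dtype=float)))
```

Per time step, the blending costs P activation evaluations and one dot product of length P, in addition to the evaluation of the P local models.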

H. What Happens Without Cross-Vehicle Regularization
In Section VI-B.1, the cross-vehicle regularization technique was introduced to prevent overfitting of the training data, using early stopping with a validation set of another vehicle. In this section, we investigate the effects of removing cross-vehicle regularization.
Let us consider the findings in Table V, and focus on the benchmark LSTM model and the novel "KS-NN nonlinear no bias (best generalization)". Let us re-train these two architectures (LSTM with 2 states and KS-NN with P = 2 and Q = 2) as follows. Instead of using the Ferrari F-1 set for training and the Corvette C-1 set for validation, let us use the first two files of F-1 for training and the last file of F-1 for validation. This means that we train on Ferrari data, and the validation set is from the same vehicle. Using both training and validation sets from the same vehicle permits exploiting characteristics that are specific to that vehicle: on the one hand, the trained network improves specialization, but on the other hand, it loses generality. We show that the loss of generality is negligible for KS-NN, but is severe for LSTM.
The results are shown in Table VII. When cross-vehicle regularization is not used, the benchmark LSTM performs better on the Ferrari F-2 test set, with an FVU of 0.042, which is more than twice as good as the regularized version's FVU of 0.098. However, in the absence of regularization, the LSTM model fails to accurately predict the sideslip on the Corvette (C-2) test set, with an FVU of 0.84. In Fig. 6, the point of this case would fall outside the left edge of the charts (and still below the most specialized KS-NN model with nonlinear feedback, at $R^2 = 0.967$). In contrast, the KS-NN model maintains its performance on the Ferrari F-2 test set, and exhibits only a slight decline in accuracy on the Corvette C-2 test set.
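The early stopping logic behind cross-vehicle regularization can be sketched generically as follows. The callables `train_epoch` and `val_loss`, and the `patience` value, are hypothetical placeholders: the only difference between the regularized and non-regularized cases is whether `val_loss` is evaluated on data from the other vehicle or from the training vehicle.

```python
def train_with_early_stopping(train_epoch, val_loss, max_epochs=1000, patience=20):
    """Generic early stopping loop.

    train_epoch: callable running one pass over the training vehicle data.
    val_loss: callable returning the loss on the validation set; for
        cross-vehicle regularization this set comes from another vehicle.
    patience: hypothetical number of epochs without improvement tolerated.
    Returns (best_epoch, best_loss).
    """
    best_loss, best_epoch, wait = float("inf"), -1, 0
    for epoch in range(max_epochs):
        train_epoch()
        loss = val_loss()
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best_loss
```

In practice one would also store the model weights at `best_epoch` and restore them after the loop; that bookkeeping is omitted here for brevity.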

IX. CONCLUSION AND FUTURE WORK
This paper introduces novel kinematics-structured neural network (KS-NN) models for lateral speed estimation in vehicles. The KS-NN model is designed to incorporate and generalize the underlying kinematic laws.
The kinematics-driven internal structure of the KS-NN models makes them physically explainable and enhances their generalization capacity. This reduces the risk of the models learning vehicle- and environment-specific characteristics, as they are designed to embed and generalize the underlying kinematic laws.
We propose a physical interpretation of the neural network's internal structure, explaining the role of its learnable parameters, and discussing the advantages over existing unexplainable neural estimators.
We evaluate KS-NN against existing Luenberger [11], GRU and LSTM neural networks (similar to [9] and [10]), and factor graph [12] estimators, using an openly accessible dataset with the telemetries of two race cars. The cars have significantly different lateral velocity dynamics, noise levels, and measurement biases. The KS-NN variants that exhibit the highest generalization capacity are those with nonlinear feedback functions and no biases. When tested on a vehicle that was not used during training, these models outperform the Luenberger observer by 25-43% in terms of explained variance $R^2$, as well as the GRU and LSTM networks by 47-65%. Furthermore, KS-NN outperforms the factor graph proposed in [12] by 30%, in terms of RMSE. Although the best linear variants of KS-NN have slightly lower accuracy (up to 2% RMSE) than their nonlinear counterparts, they are simpler and physically explainable LPV systems with only 12 parameters. Despite their simplicity, they still exhibit superior generalization performance compared to all the benchmarks.
The KS-NN models have low computational complexity and require only a few parameters, with the best nonlinear variant having up to 36 parameters and the linear variant having only 12. They rely on standard odometers, gyroscopes, and bi-axial accelerometers to measure forward speed, yaw rate, and longitudinal/lateral accelerations. Unlike other recurrent neural network estimators, which may require specialized hardware for real-time operation, the KS-NN models can be easily deployed using standard sensors and automotive-grade hardware.

A. Limitations and Future Work
This study used an open dataset with the telemetries of two different race cars, indicating how well the models may generalize. Future work might aim to study a broader set of experimental cars.
In this study, roll and pitch biases were not compensated for, as using a 6-degree-of-freedom inertial measurement unit for commercial vehicles is not affordable. Hence, estimators that do not compensate for acceleration biases were studied. It was found that models that can learn biases perform better on the training vehicle, but perform worse on others, as different vehicles have different biases. Therefore, it is expected that removing biases from the measurements before feeding the networks, e.g., via bias estimation and removal, would achieve better performance without sacrificing generalization capacity.

PATENT
The method described in this paper is patented (PCT/IT2023/050058).

APPENDIX A BENCHMARK LUENBERGER OBSERVER
This section discusses the implementation of the kinematic Luenberger observer of [11], which is used as a benchmark. The observer uses the basic kinematic estimation model (5a,5b), where the feedback functions $f_1(\cdot)$ and $f_2(\cdot)$ are designed according to (17). Following [11], the feedback gain $L(\omega)$ in (17) is designed to place both the eigenvalues of the resulting state matrix in $-\alpha|\omega|$, where $\alpha \in \mathbb{R}_{>0}$ is a design parameter. Large values of $\alpha$ produce faster estimation convergence dynamics, at the price of a higher sensitivity to the measurement noise.
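As a hedged numerical illustration of the pole placement described above, the sketch below assumes the common planar kinematic model with states $[u, v]$, dynamics $\dot{u} = a_x + \omega v$, $\dot{v} = a_y - \omega u$, and measured forward speed $u$. Under these assumptions, matching the characteristic polynomial $\lambda^2 + l_1\lambda + \omega(\omega + l_2)$ to $(\lambda + \alpha|\omega|)^2$ gives $l_1 = 2\alpha|\omega|$ and $l_2 = (\alpha^2 - 1)\omega$. This may differ in form or sign convention from equation (17) of the paper, which is not reproduced in this section; the function names are hypothetical.

```python
import numpy as np

def luenberger_gain(omega, alpha):
    """Gain [l1, l2] placing both observer eigenvalues at -alpha * |omega|.

    Derived for the assumed model with states [u, v] and measured u.
    """
    return np.array([2.0 * alpha * abs(omega), (alpha ** 2 - 1.0) * omega])

def observer_matrix(omega, alpha):
    """Error-dynamics matrix A - L C for the assumed kinematic model."""
    A = np.array([[0.0, omega], [-omega, 0.0]])  # assumed states: [u, v]
    C = np.array([[1.0, 0.0]])                   # only u is measured
    L = luenberger_gain(omega, alpha).reshape(2, 1)
    return A - L @ C
```

Both gains vanish as ω approaches zero, which reflects the loss of observability of the lateral velocity at low yaw rates mentioned in the abstract.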

Manuscript received 27 October 2021; revised 15 April 2023; accepted 3 August 2023. Date of publication 21 August 2023; date of current version 29 November 2023. The Associate Editor for this article was A. Amditis. (Corresponding author: Mattia Piccinini.)

Fig. 2. Internal architecture of the recurrent block C of Fig. 1: (a) combination of P parallel modules of type M, to define the overall neural model C (9); (b) internal structure of the neural network module M (7a,7b); (c) internal structure of the $\{f_1(\cdot), f_2(\cdot)\}$ feedback sub-networks (8) composing the module M.

Fig. 3. This figure shows the activation functions $\varphi_p(|\omega|)$ of the local neural models $M_p$, where $p \in \{1, \ldots, P\}$. The centers of the $\varphi_p(|\omega|)$ functions are equally spaced over the range of recorded yaw rate $\omega$ values, which is denoted as $[0, |\omega_{max}|]$. The plot is an example with three local models, P = 3.

Fig. 4. The periodograms compare the spectral densities of the measured signal $\dot{v}$ and the process noise $\dot{v} - (a_y - \omega u_m)$, for the Ferrari (top) and the Corvette (bottom). They show that the noise-to-signal ratio is more favorable in the Ferrari than in the Corvette, especially at low frequencies.

Fig. 5. The plot shows the measured lateral velocity $v$ versus the lateral acceleration $a_y$, for the Ferrari and the Corvette. It can be observed that the Ferrari achieves larger absolute values of $v$ than the Corvette for the same $a_y$, indicating different dynamic characteristics between the two cars. These differences support the robustness analyses presented in the paper, which aim to evaluate the observer's ability to generalize by testing it on a car that was not used for training.

Fig. 6. The charts illustrate the performance of the candidate observers listed in Table III. The top row shows the explained variance $R^2$ on test datasets of the Ferrari (F-2) and the Corvette (C-2), while the bottom row shows the root mean square error (RMSE) on the same datasets. The first column corresponds to training scheme 1 (Table II), in which the C-1 (Corvette) dataset is used for training and the F-1 (Ferrari) is employed for early stopping. The second column is for the opposite training scheme 2. The "KS-NN nonlinear no bias" model exhibits the best generalization capacity, outperforming all other models on the vehicle not used for training. Compared to the benchmark Luenberger, the increase in $R^2$ is 25-43%, while the improvement over the benchmarks GRU and LSTM is 47-65%.
Figure 7 compares the sensitivities of the KS-NN model and the Luenberger benchmark for two different values of $\omega$: 0.2 rad/s (which is the median value of $|\omega|$ for both the Corvette and the Ferrari) and 0.4 rad/s. The plot shows that the KS-NN model has significantly lower sensitivities compared to the Luenberger, for both yaw rate values.

Fig. 7. Magnitudes of the transfer function sensitivities to acceleration noise, for the KS-NN model and the benchmark Luenberger, trained with the Corvette and the Ferrari training sets. The left column corresponds to the Corvette, while the right column corresponds to the Ferrari. For both vehicles, the KS-NN model has significantly lower sensitivities than the Luenberger benchmark, indicating that the KS-NN is more robust to acceleration noise. Note that the sensitivities to $a_y$ and $a_x$ are not equal, due to the different levels of noise in the signals. During training, sensitivities are balanced to maximize the estimation accuracy.

Fig. 8. This figure compares the power spectral densities of the $v$ estimation errors, for the KS-NN network and the benchmarks Luenberger and LSTM. The top and bottom plots present results on the Ferrari and Corvette test sets, using the most robust models trained on the opposite car (Tables IV and V). The KS-NN network outperforms the benchmarks, as its residuals have the lowest power.
Robust and Sample-Efficient Estimation of Vehicle Lateral Velocity Using Neural Networks With Explainable Structure Informed by Kinematic Principles
Mauro Da Lio, Member, IEEE, Mattia Piccinini, Member, IEEE, and Francesco Biral

TABLE I
STATISTICS OF THE EXPERIMENTAL RACE CAR DATASETS. (* 0.99 QUANTILES)

TABLE III
MODELS HYPERPARAMETERS AND INSTANTIATIONS

TABLE IV
MOST GENERAL MODELS PER CATEGORY, TRAINING SCHEME 1: TRAINING ON CORVETTE, VALIDATION ON FERRARI, TEST ON CORVETTE (C-2) AND FERRARI (F-2). THE KS-NN NONLINEAR MODEL WITHOUT BIASES IS THE MOST ACCURATE ON F-2, I.E., HAS THE BEST GENERALIZATION CAPACITY

TABLE V
MOST GENERAL MODELS PER CATEGORY, TRAINING SCHEME 2: TRAINING ON FERRARI, VALIDATION ON CORVETTE, TEST ON CORVETTE (C-2) AND FERRARI (F-2). THE KS-NN NONLINEAR MODEL WITHOUT BIASES IS THE MOST ACCURATE ON C-2, I.E., HAS THE BEST GENERALIZATION CAPACITY

Compared to the benchmark Luenberger, the KS-NN nonlinear model without biases improves the FVU by 25-43% on the test vehicle; compared to the benchmarks GRU and LSTM, it improves by 47-65%.

TABLE VI
RMSE OF THE SIDESLIP ANGLE (β) ESTIMATE ON THE FERRARI F-2 TEST SET: THE KS-NN MODEL OUTPERFORMS THE BENCHMARKS, INCLUDING THE FG AND KF OF [12]

The expressions of the noise sensitivities $\{H^Y_M(z), H^X_M(z)\}$, with $M = \{\text{KS-NN}, \text{Luenberger}\}$ indicating the model type, are reported in Appendix B for KS-NN and the benchmark Luenberger. $H^Y_M(z)$ and $H^X_M(z)$ depend on the yaw rate $\omega$.

TABLE VII
MODELS TRAINED WITH AND WITHOUT CROSS-VEHICLE REGULARIZATION (TRAINING SCHEME 2). IN THE ABSENCE OF REGULARIZATION, THE BENCHMARK LSTM CONSIDERABLY WORSENS THE ACCURACY ON THE CORVETTE (C-2), NOT USED FOR TRAINING. IN CONTRAST, THE KS-NN MODEL PRESERVES GOOD GENERALIZATION CAPACITY