A Trainable Approach to Zero-Delay Smoothing Spline Interpolation

The task of reconstructing smooth signals from streamed data in the form of signal samples arises in various applications. This work addresses such a task subject to a zero-delay response; that is, the smooth signal must be reconstructed sequentially as soon as a data sample is available and without having access to subsequent data. State-of-the-art approaches solve this problem by interpolating consecutive data samples using splines. Here, each interpolation step yields a piece that ensures a smooth signal reconstruction while minimizing a cost metric, typically a weighted sum between the squared residual and a derivative-based measure of smoothness. As a result, a zero-delay interpolation is achieved in exchange for an almost certainly higher cumulative cost as compared to interpolating all data samples together. This paper presents a novel approach to further reduce this cumulative cost on average. First, we formulate a zero-delay smoothing spline interpolation problem from a sequential decision-making perspective, allowing us to model the future impact of each interpolated piece on the average cumulative cost. Then, an interpolation method is proposed to exploit the temporal dependencies between the streamed data samples. Our method is assisted by a recurrent neural network and accordingly trained to reduce the accumulated cost on average over a set of example data samples collected from the same signal source generating the signal to be reconstructed. Finally, we present extensive experimental results for synthetic and real data showing how our approach outperforms the abovementioned state-of-the-art.


I. INTRODUCTION
Online learning has been studied and applied in a broad range of research fields, including optimization theory [5]–[7], signal processing [8], and machine learning [9]–[11]. Within these fields, online methods generating a series of estimates from sequentially streamed data are especially useful to reduce complexity in large-scale problems [12], to dynamically adapt to new patterns in the data [13], and to enable acting under real-time requirements [14].
This work addresses the last of the previous use cases in the context of signal reconstruction. Specifically, it investigates the use of online methods with a zero-delay response for smooth signal reconstruction. First, most physical signals are bounded and smooth due to energy conservation [15]; hence, it is beneficial to maintain smoothness as a property during signal reconstruction. Second, the zero-delay requirement demands new portions of the smooth signal to be reconstructed as ...

Fig. 1: A signal reconstructed from streamed data points by different methods and models (legend: data sample, reconstruction, future). The symbols x_t and f_t represent the current time and signal estimate, respectively. a) Recursive least squares [1], [2] with a linear model. b) NORMA [3] with Gaussian kernels. c) Smoothing myopic interpolation with cubic Hermite splines [4].
soon as a new data sample is available. Consequently, a reduced constant complexity per iteration [16] is required so that the online method is executed at a higher speed than the transmission rate at which the data samples are received. These requisites are well motivated since they appear in many practical problems, such as online trajectory planning [17], [18], real-time control systems [19], [20], and high-speed digital-to-analog conversion [21], among others. Although the tasks of estimating smooth signals or delivering a zero-delay response are separately managed by most online methods, handling them together becomes challenging, as we expound next.

Some popular online methods that can be used for smooth signal estimation are online kernel methods [22], [23] and online Gaussian processes [24], [25]. They aim at yielding a sequence of signal estimates with convergence guarantees or sublinear regret [26]. To this end, they initially propose a signal estimate which is updated (or modified), possibly globally, as new data samples arrive. Their goal is to refine the signal estimate rather than reconstruct new portions of the smooth signal. Therefore, neither smoothness nor even continuity of the sequentially reconstructed signal is guaranteed. In fact, any online method not ensuring the smoothness of the reconstruction during the signal estimate update (even when the signal estimate is modeled by smooth functions) suffers from this issue, as illustrated in Fig. 1a) and 1b).

On the other hand, online interpolation methods can be suitable candidates for the task of smooth signal reconstruction with a zero-delay response. These methods use piecewise-defined functions to model a sequence of local signal estimates. Some of these functions allow shaping piecewise-modeled signal estimates that can be updated by assembling a new section (or piece) while guaranteeing the smoothness of the overall sequentially reconstructed signal, as shown in Fig. 1c). Among them, piecewise polynomial functions, also known as splines, are arguably the most representative ones [27], [28]. Actually, splines have been used since ancient times [29], long before their mathematical foundations were even established [30], presumably because of their approximation capabilities over functions of arbitrary complexity and their ease of use.
It should be noted that most recursive signal estimation methods modeling function estimates by splines as a basis expansion [31]–[33] suffer from the same issues exposed before. This is mainly because the smoothness of their signal estimate is directly incorporated into the basis representation and not treated as a set of continuity constraints. On the contrary, some works [34], [35] have explored the task of interpolating sequentially streamed data under real-time requirements by means of splines subject to continuity constraints. However, the online methods they use involve a multi-step lookahead or shifting-window mechanism, which introduces a delay. Indeed, most online methods for spline interpolation work with local information, for instance, a subset constituted by the last sequentially received data samples. In this case, a delayed response allows them to use a larger subset of sequentially received data samples and correct the signal estimates as long as they are updated within the delay limits. In brief, they can expand the extent of available information at the expense of some delay. On the other hand, and to the best of our knowledge, the only zero-delay spline interpolation method in the literature is the myopic approach, referred to as the "classical greedy approach" in [35], which reduces the delay response to zero by totally ignoring any source of forthcoming information, i.e., a purely local method. Clearly, there is a research gap on zero-delay spline interpolation methods exploiting additional nonlocal information to achieve a better reconstruction. This motivates us to work on the research question of whether it is possible to maintain the zero-delay requirement while efficiently using more information than the myopic approach.
In this paper, we answer the above research question affirmatively by introducing a novel method for zero-delay smoothing spline interpolation that incorporates a priori information about the dynamics of the signal being reconstructed. To this end, we identify the elements of a state-space-based sequential decision-making process [36] in the context of zero-delay smoothing spline interpolation. The proposed method relies on a policy, i.e., a strategy, that yields a section of the spline (action) as a function of the current condition of the so-far reconstructed signal and the last received data sample (state). Such a policy consists of a differentiable convex optimization layer (DCOL) [37] on top of a recurrent neural network (RNN) [38], [39]. The DCOL allows managing continuity constraints (for any differentiability class) at each interpolation step, thus guaranteeing the smoothness of the signal reconstruction. The RNN assists the signal estimate update mechanism when appending a new spline section by taking into account the effect of each interpolated section on future interpolation steps. This aid comes in the form of global data-driven knowledge, and it is tailored to minimize the global cost of the smoothing interpolation problem on average. The cost is, in this case, the residual sum of squares plus a weighted derivative-based measure of smoothness. Lastly, our method is trainable in the sense that it uses example time series, i.e., time series sampled from the same signal source generating the signal to be reconstructed, to customize the policy to the temporal dependencies (dynamics) of the signal at hand.
The main contributions of this paper can be summarized as follows:
• We rigorously formulate the problem of smoothing spline interpolation from sequentially streamed data, where each spline section has to be determined as soon as a data sample is available and without having access to subsequent data (zero-delay requirement). Due to its nature, it is formulated as a sequential decision-making problem.
• As opposed to previously proposed (myopic and not trainable) zero-delay methods, our method trains a policy that aims at minimizing the smoothing interpolation cost metric on average. In order to capture the temporal, possibly long-term, dependencies between the streamed data samples and exploit them to further reduce the average cost metric, an RNN able to capture the signal dynamics is incorporated.
• The proposed policy guarantees that the reconstructed signal is smooth (a certain number of derivatives are continuous over the interior of the signal domain). This is achieved by adding a DCOL at the output of the RNN and imposing a set of continuity constraints at each interpolation step. In addition, such a layer admits a closed-form evaluation, resulting in improved computational efficiency with respect to off-the-shelf DCOL libraries.
• We present extensive experimental results that validate our approach over synthetic and real data. Additionally, we show how our approach outperforms the state-of-the-art (namely, myopic) zero-delay methods in terms of the smoothing interpolation average cost metric.
The rest of the paper is structured as follows: Sec. II introduces the notation and presents some basic concepts and definitions. Then, in Sec. III, we provide our problem formulation. Next, in Sec. IV and Sec. V, we respectively provide a solution, a benchmark, and a baseline. Thereafter, we experimentally validate our solution in Sec. VI. Finally, Sec. VII concludes the paper.

II. PRELIMINARIES
In this section, we present the notation and introduce the type of data used in the paper. Afterward, we address the description of spline-based signal estimates as well as related concepts recurrently appearing in this work. Finally, we formally describe the smoothing spline interpolation problem, which will be used as a starting point for the formulation of our problem.

A. Notation
Vectors and matrices are denoted by bold lowercase and capital letters, respectively. Given a vector v ∈ R^N, its ith element is indicated as [v]_i. Similarly, given a matrix M ∈ R^{R×C}, the element in the rth row and cth column is indicated as [M]_{r,c}. The notation [v]_{i:j} refers to the sliced vector [v_i, ..., v_j]^⊤ ∈ R^{j−i+1}. We use Euler's notation for the derivative operator; thus, D^k x denotes the kth derivative with respect to the variable x.

B. Problem data
The data considered in this paper consist of discrete time series, or series for short, of T terms each. We interchangeably refer to the terms of the series as observations. Each tth observation o_t ∈ R² is described by its time stamp x_t ∈ R and its value y_t ∈ R, i.e., o_t = [x_t, y_t]^⊤. The observation-associated time stamps are set in strictly monotonically increasing order, i.e., x_{t−1} < x_t for all terms in the series. Any two consecutive time stamps define a time section T_t = (x_{t−1}, x_t]. Finally, the initial time stamp x_0 is set by the user.

C. Spline-based signal estimates
A spline is defined as a piecewise polynomial function. We denote any spline composed of T piecewise-defined functions, or function sections, as

f_T(x) = g_t(x), x ∈ T_t, t ∈ {1, ..., T},   (1)

where every tth function section g_t : T_t → R is a linear combination of polynomials of the form

g_t(x) = a_t^⊤ p_t(x),   (2)

with combination coefficients a_t ∈ R^{d+1} and basis vector function p_t : T_t → R^{d+1} defined as

p_t(x) = [1, (x − x_{t−1}), ..., (x − x_{t−1})^d]^⊤.   (3)

The integer d denotes the order of the spline. A spline f_T is said to have a degree of smoothness φ if it has φ continuous derivatives over the interior of its domain dom(f_T) = ∪_{t=1}^{T} T_t. Next, Proposition 1 shows how to enforce continuity up to degree φ ≤ d in a spline-based signal estimate of order d.
Proposition 1. Given a spline expressed as in (1), we can enforce its degree of smoothness to be φ ≤ d by imposing the following equality constraint for every t ∈ N_{[1,T]}:

D^k g_t |_{x=x_{t−1}} = [e_{t−1}]_{k+1}, k ∈ {0, ..., φ},   (4)

where e_t ∈ R^{φ+1} is a vector such that each of its elements is computed as

[e_t]_{k+1} = D^k g_t |_{x=x_t} = Σ_{j=k}^{d} (j! / (j − k)!) u_t^{j−k} [a_t]_{j+1}, k ∈ {0, ..., φ},   (5)

with u_t ≜ x_t − x_{t−1}, and with the exception of e_0, which determines the initial conditions of the reconstruction and can be either calculated or set by the user.

Proof: see Appendix A.
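To make the bookkeeping behind Proposition 1 concrete, the following sketch (our own illustrative Python, assuming the local monomial basis p_t(x) = [1, (x − x_{t−1}), ..., (x − x_{t−1})^d]^⊤; the helper names are not from the paper) computes the end-of-section derivative vector e_t and the coefficients that a continuity constraint of degree φ pins down in the incoming section.

```python
import math

def end_derivatives(a, u, phi):
    # e_t[k] = D^k g_t(x_t) for g_t(x) = sum_j a[j] * (x - x_{t-1})^j,
    # evaluated at the right endpoint of the section, i.e., at offset
    # u = x_t - x_{t-1}.
    d = len(a) - 1
    return [sum(a[j] * (math.factorial(j) / math.factorial(j - k)) * u ** (j - k)
                for j in range(k, d + 1))
            for k in range(phi + 1)]

def pinned_coefficients(e_prev, phi):
    # With the local basis, D^k g_t(x_{t-1}) = k! * a_t[k], so continuity up to
    # degree phi fixes the first phi + 1 coefficients of the new section.
    return [e_prev[k] / math.factorial(k) for k in range(phi + 1)]
```

For instance, a cubic section g_t(x) = x − x_{t−1} over a unit-length section (u = 1) with φ = 1 ends with e_t = [1, 1], so the next section must start with constant and linear coefficients [1, 1].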

D. Smoothing spline interpolation
Consider the space W^ρ of functions defined over the domain (x_0, x_T] ⊆ R with ρ − 1 absolutely continuous derivatives and with the ρth derivative square integrable. Then, given a whole series of observations {o_t}_{t=1}^{T} with T ≥ ρ and a positive hyperparameter η, we can formulate the following batch optimization problem

min_{f ∈ W^ρ} Σ_{t=1}^{T} (y_t − f(x_t))² + η ∫_{x_0}^{x_T} (D^ρ f(x))² dx,   (6)

known as smoothing spline interpolation [40], [41]. The name is due to the unique solution to the optimization problem (6) being a spline conformed of T function sections, as in (1). More specifically, the solution of (6) is a spline of order 2ρ − 1 with 2ρ − 2 continuous derivatives and natural boundary conditions [31]. The hyperparameters η and ρ control the smoothness of such a solution. Particularly, the integer ρ dictates the minimum required degree of smoothness of the search function space W^ρ and the type of regularization (second term in (6)). Regarding η, it controls the trade-off between the squared sum of vertical deviations of the signal estimate from the data and the regularization term. Notice that as η → 0, the solution of (6) approaches the interpolation spline, while as η → ∞, it tends to the polynomial of order ρ − 1 that best fits the observations in the least-squares sense.
On the other hand, note that the structure of the solution of the problem (6), being a natural spline, arises organically rather than being imposed in advance. This is a direct consequence of its batch formulation, allowing us to delimit the search space W^ρ to splines of order d and degree of smoothness φ satisfying 2ρ − 1 ≤ d and ρ − 1 ≤ φ ≤ 2ρ − 2 without loss of optimality. From a practical perspective, it is sufficient to choose the minimum required order and degree of smoothness, thus reducing the model's complexity. However, this trait is not necessarily present in online settings. That is, the smoothness of the solution does not arise naturally using online methods, and it has to be enforced. So here, the choice of the spline order and degree of smoothness is rather user-defined or task-oriented.
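The η trade-off can be probed numerically on a toy global polynomial basis (a simplified stand-in for the spline search space; the function name, basis choice, and parameter values below are our own illustration, not the paper's method). The penalized least-squares coefficients are (Φ^⊤Φ + ηΩ)^{−1}Φ^⊤y, where Ω collects the integrals over [0, 1] of products of ρth derivatives of the monomial basis functions.

```python
import numpy as np

def penalized_poly_fit(x, y, d, rho, eta):
    # Monomial design matrix Phi[t, k] = x_t^k and roughness penalty
    # Omega[i, j] = integral_0^1 D^rho x^i * D^rho x^j dx.
    Phi = np.vander(x, d + 1, increasing=True)
    Omega = np.zeros((d + 1, d + 1))
    for i in range(rho, d + 1):
        for j in range(rho, d + 1):
            ci = np.prod(np.arange(i, i - rho, -1))  # i(i-1)...(i-rho+1)
            cj = np.prod(np.arange(j, j - rho, -1))
            Omega[i, j] = ci * cj / (i + j - 2 * rho + 1)
    # Closed-form penalized least squares: (Phi^T Phi + eta * Omega)^{-1} Phi^T y
    return np.linalg.solve(Phi.T @ Phi + eta * Omega, Phi.T @ y)
```

With η → 0 the fit interpolates the data as closely as the basis allows, while a very large η (with ρ = 2) drives the quadratic and cubic coefficients to zero, leaving the best-fitting line, mirroring the limits discussed above.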

III. PROBLEM FORMULATION
Once the problem data, the description of spline-based signal estimates, and the smoothing spline interpolation problem have been introduced in Sec. II, we are ready to formalize the main task of this paper, namely the trainable zero-delay smoothing spline interpolation problem. This section fully describes the aforementioned task from a data-driven sequential decision-making perspective by introducing a suitable dynamic programming (DP) [43] framework. To this end, we first model the environment, define the state space and action space, and delimit a suitable family of candidate policies. Then we introduce the total cost and formulate the above task as the problem of finding the policy incurring the lowest total cost on average.

A. Characterization of the problem data
In our problem, the data described in Sec. II-B are observed sequentially. Before every tth time step, the observation about to be received, o_t, remains undetermined but is still governed by the dynamics of the environment. In this work, we model the dynamics of the environment as a random process Y(ω, x), where ω is a sample point from a sample space Ω, and x is a value within an index set X ⊆ R, in this case, time. At time x_t, all possible outcomes form a random variable Y(ω, x_t), or y_t for short. If the mth sample is considered at time x_t, the outcome has a value denoted by Y(ω_m, x_t), or simply y_{m,t}. Consequently, if a discrete set of time stamps is chosen, i.e., X = {x_1, ..., x_T}, T random variables can be formed, and all the information about the discrete random process Y_X is contained in the joint probability density function P_{Y_X}.

B. State space
At every time step t, we encode a snapshot of the observable environment and the condition of the so-far reconstructed signal in a vector-valued variable called the state. With S denoting the state space, each tth state s_t ∈ S is constituted by the corresponding observation o_t, and the condition at which the reconstruction was left, which is specified by the vector e_{t−1}, whose components are given as in (5), and the time instant x_{t−1}. Formally, every tth state is expressed as s_t = [o_t^⊤, e_{t−1}^⊤, x_{t−1}]^⊤. Since every state s_t is uniquely determined once the spline coefficients a_{t−1} are fixed, we can explicitly describe the state update mechanism by means of a deterministic mapping from the pair (s_t, a_t) and the incoming observation o_{t+1} to the next state s_{t+1} (7). Formalizing the state update mechanism in (7) allows us to identify all visitable states seamlessly.

C. Action space
Immediately after receiving the tth observation, we propose a function section as in (2), and we implicitly select the spline coefficients a_t. This is because the function section is determined as soon as a_t is chosen (the basis vector defined in (3) is given). From this point of view, selecting the spline coefficients of a function section can be understood as an action. Any valid action generates a function section of the same order d as the spline reconstruction. Formally, a_t ∈ A ⊆ R^{d+1} for all the tth terms, where A denotes the action space. However, if we want a reconstructed spline that is continuous up to the φth derivative, not all valid actions are appropriate. In our context, for any tth action to be deemed admissible (or feasible), it must satisfy the constraint in (4). Notice that the set of admissible actions depends on the current state. Therefore, we accordingly denote the admissible action space as A(s_t).

D. Policy space
A policy π = {µ_t : S → A}_{t ∈ {1, 2, ...}} consists of a sequence of functions that map states into actions. Policies are more general than actions because they incorporate the knowledge of the state. However, notice that not all policies return admissible actions. Only the policies that satisfy π(s_t) ∈ A(s_t) for all time steps are termed admissible policies. Separately, stationary policies are policies that do not change over time, i.e., µ ≡ µ_t = µ_{t+1} for all time steps. Hence, a stationary policy is unequivocally defined by the mapping µ. Stationary policies are suitable for making decisions in problems with a varying horizon (varying number of time steps), usually assuming stationary environments.
These arguments motivate the use of admissible stationary policies. However, the space of admissible stationary policies is huge, and therefore, the problem of finding the most adequate policy within it can be overwhelmingly complex. Policy approximation techniques help reduce the pool of candidate policies by restricting them to a certain family of policies. These techniques tend to work best (in the sense of providing an adequate policy) when the problem has a clear structure that can be accommodated into the policy. In our case, we aim to incorporate the temporal dependencies across the observations into the policy, as well as the notion of smoothness discussed in Sec. II-D. To this end, we resort to parametric policy approximation [10], denoting any approximated stationary policy as µ_θ, where the vector θ ∈ R^P contains the P parameters constituting the aforenamed policy. The set Π of parametric stationary policies that return admissible actions is, therefore, the space of policies of interest to this work.

E. Total expected cost
The following Proposition 2 shows that the smoothing spline interpolation objective introduced in Sec. II-D, equation (6), can be expressed as a summation where each term depends on a single action, resembling the sequence of instantaneous costs in a typical DP formulation.
Proposition 2. The objective of the optimization problem (6) can be equivalently computed additively as

Σ_{t=1}^{T} [ (y_t − a_t^⊤ p_t(x_t))² + η a_t^⊤ M_t a_t ],   (8)

where M_t ∈ S_+^{d+1} with elements given by

[M_t]_{r,c} = ∫_{T_t} D^ρ [p_t(x)]_r D^ρ [p_t(x)]_c dx.   (9)

Based on Proposition 2, we can express the objective of the smoothing spline interpolation problem (6) as the total cost Σ_{t=1}^{T} κ(s_t, a_t), with cost κ : S × A → R given by

κ(s_t, a_t) = (y_t − a_t^⊤ p_t(x_t))² + η a_t^⊤ M_t a_t,   (10)

where M_t is constructed as in (9). This is because each tth state-action pair contains all necessary information. From here and under a given policy of interest µ_θ, as described in Sec. III-D, the metric

J(s_0; µ_θ) = E [ Σ_{t=1}^{T} κ(s_t, µ_θ(s_t)) ]   (11)

denotes the total expected cost incurred by following such a policy from a given initial state s_0 and traveling all the remaining states s_t ∈ S via (7). The expectation in (11) is performed over the random process modeling the dynamics of the environment through the observations within the states.
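As an illustration of the per-section cost, the sketch below (our own code, assuming a local monomial basis p_t and a cost of the form (y_t − a^⊤p_t(x_t))² + η a^⊤M_t a, with M_t the integral of products of ρth basis derivatives over the section) evaluates both the roughness matrix and the instantaneous cost in closed form.

```python
import math
import numpy as np

def roughness_matrix(u, d, rho):
    # [M]_{i,j} = integral_0^u D^rho s^i * D^rho s^j ds, with the local
    # variable s = x - x_{t-1} and section length u.
    M = np.zeros((d + 1, d + 1))
    for i in range(rho, d + 1):
        for j in range(rho, d + 1):
            ci = math.factorial(i) / math.factorial(i - rho)
            cj = math.factorial(j) / math.factorial(j - rho)
            M[i, j] = ci * cj * u ** (i + j - 2 * rho + 1) / (i + j - 2 * rho + 1)
    return M

def instantaneous_cost(a, y, u, eta, rho):
    # Squared residual at the section's right endpoint plus the weighted
    # roughness contribution of the section.
    p = np.array([u ** k for k in range(len(a))])
    return (y - a @ p) ** 2 + eta * a @ roughness_matrix(u, len(a) - 1, rho) @ a
```

For example, the pure cubic section g(s) = s³ over a unit section with ρ = 2 contributes ∫₀¹ (6s)² ds = 12 to the roughness term, so with a zero residual and η = 0.5 the cost is 6.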

F. Policy search by cost optimization
Computing the expectation in (11) is computationally expensive or even intractable when the underlying random process generating the series of observations is unknown. Instead, we can rely on a sample average approximation over example series collected from past realizations of the process. The sample average approaches the expectation as the number of examples grows. In this way, we can determine a data-driven policy by solving the following optimization problem

arg min_{µ_θ} (1/M) Σ_{m=1}^{M} Σ_{t=1}^{T} κ(s_{m,t}, µ_θ(s_{m,t}))   (12a)
s. to: µ_θ ∈ Π,   (12b)

where the integer M denotes the number of example series, indexed by m, and where all the initial states s_{m,0} as well as all observations o_{m,t} are given.

IV. PROPOSED SOLUTION
The previous Sec. III has provided the necessary definitions and considerations to arrive at a rigorous problem formulation. An exact solution to the problem (12) is probably impossible to obtain in practice, mainly due to the complexity of the search space Π. There are multiple possibilities regarding the policy approximation and optimization techniques that can be taken towards obtaining a near-optimal solution to (12). This section presents a specific set of design choices based on the current state-of-the-art. In particular, we rely on a policy parametrization through a cost parametrization technique, borrowed from the DP literature, in synergy with an RNN architecture. Then we make use of backpropagation through time (BPTT) [44], a gradient computation technique borrowed from the deep learning literature [45]. Our proposed solution can effectively solve the problem formulated in Sec. III-F for ρ ≤ 2. The remaining configurations manifest instability issues, and even though they may be solvable, they lie outside of the scope of the current paper, as further discussed in the following Sec. VI-B.

Fig. 2: Visual representation of our RNN architecture satisfying the relation in (15). The elements on the gray shaded area constitute the mapping R_θ′. The CAT cell stands for a concatenation operation.
Future developments in the DP or deep learning areas, such as new policy approximation approaches, neural architectures, or optimizers, can possibly render the techniques proposed in this section obsolete but will not affect the validity of the problem formulated in Sec.III-F.

A. Policy form
Parametric policy approximation via parametric cost function approximation (CFA) [10] is a method that seeks through the policy space, in our case Π, among those policies defined as an optimization problem with parametrized objectives. In this work, we are interested in CFA-based policies of the form

µ_θ(s_{m,t}) = arg min_{a ∈ A(s_{m,t})} {κ(s_{m,t}, a) + J_θ(s_{m,t}, a; h_{m,t})},   (13)

where the map κ denotes the cost described in (10), and the mapping J_θ : S × A × R^H → R is a parametric cost-to-go approximation involving P parameters contained in the vector θ. Regarding the vector h_{m,t} ∈ R^H, it represents a latent state value at the tth time step of an mth example series. The latent state may encode relevant information from past observations and can be viewed as a policy memory [46]–[48].
We aim for a cost-to-go approximation J_θ that penalizes those actions that are distant from the output of a certain RNN. The main reason behind this approach is that an RNN that successfully captures the temporal dynamics of the environment has the potential to pull towards actions that yield a low expected total cost. So, it is constructed as follows:

J_θ(s_{m,t}, a; h_{m,t}) = λ ‖[a]_{φ+2:d+1} − r_{m,t}‖²,   (14)

where λ ∈ R_+. The vectors r_{m,t} ∈ R^{d−φ} and h_{m,t} ∈ R^H represent the outputs and latent state of an RNN R_θ′, with θ = [λ, θ′^⊤]^⊤ and θ′ ∈ R^{P−1}, exemplified in Fig. 2.
From now on, we refer to the policy in (13) with cost-to-go as in (14) as the RNN-based policy. Finally, notice that besides being parametric, the RNN-based policy is admissible and stationary by design.

B. Policy evaluation
Evaluating the proposed policy involves solving the optimization problem stated in (13). Notice that both the cost κ, built as in (10), and the cost-to-go approximation J_θ, described in (14), are convex with respect to the actions; hence, the objective in (13) is convex too. Moreover, the admissible action set, described in Sec. III-C, is convex. Therefore, the optimization problem in (13) is convex; thus, any locally optimal action is globally optimal [49].
Additionally, the optimization problem in (13) has been designed to admit a closed-form solution. Closed-form evaluations can usually be computed faster and more precisely than solutions obtained from numerical methods, and thus, they are more suitable under zero-delay requirements. See Appendix C for the derivation of the closed-form evaluation.
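While the paper's Appendix C derives the specific closed form, the underlying mechanism is that of an equality-constrained quadratic program, whose minimizer follows from a single linear (KKT) solve. The sketch below is a generic illustration under that assumption, not the paper's exact expression.

```python
import numpy as np

def solve_eq_qp(Q, q, C, b):
    # Closed-form minimizer of 0.5 a^T Q a + q^T a subject to C a = b, via the
    # KKT system [[Q, C^T], [C, 0]] [a; nu] = [-q; b]. A unique minimizer is
    # guaranteed when Q is positive definite on the null space of C.
    n, m = Q.shape[0], C.shape[0]
    K = np.block([[Q, C.T], [C, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-q, b]))
    return sol[:n]
```

For instance, the closest point to the origin on the line a_1 + a_2 = 2 is a = [1, 1], obtained with a single `solve` call rather than an iterative optimizer, which is what makes such evaluations attractive under zero-delay constraints.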

C. Policy training
As explained in Sec. IV-A, we have reduced the search space of problem (12) by restricting the policy space to a family of policies of the form given in (13). Specifically, from searching for a function µ_θ in the function space Π, we have narrowed the problem down to that of finding a vector θ in the vector space R^P. In fact, tuning the proposed policy parameters by solving the optimization problem (12) is commonly referred to as policy training. Unfortunately, the objective (12a) is nonconvex with respect to the parameters in θ. As a reasonable solution, we rely on a gradient-based optimizer aiming to converge to a high-performance local minimum.
From a deep learning perspective, the policy evaluation presented in Sec. IV-B can be understood as a forward pass of a DCOL on top of an RNN, and hence, it is trained using BPTT via automatic differentiation [50]. This point of view is schematized in Fig. 3, where traveling the given mth series by following a policy µ_θ allows us to construct the cumulative objective in (12a) used for training. Additionally, and thanks to the closed-form policy evaluation discussed in Sec. IV-B, computing and propagating the gradient of the tth action a_t with respect to the parameters contained in θ is done without unrolling numerical optimizers [51] or using specific numerical tools for DCOLs such as CVXPY Layers [37].
V. BENCHMARK AND BASELINE METHODS

Recall from Sec. II-D that the batch formulation provides the optimal reconstruction with hindsight. The batch solution can be found by solving the optimization problem (6), but only once all time-series data are available. Thus, it cannot be used for zero-delay interpolation. Conceptually, online methods achieve a zero-delay response at the expense of incurring a higher or equal loss than the batch solution. For this reason, the batch solution is used here as a baseline.
On the other hand, as stated in the Introduction and to the best of our knowledge, there is no work in the literature related to our trainable zero-delay smoothing interpolation approach. One could consider the closest approach to be the interpolation method known as myopic. This is a local method in the sense that it only focuses on the last received data sample while completely ignoring the distribution of future arriving data. For this reason, the myopic method is used here as a benchmark. In this sense, our proposed method must outperform the myopic method to be deemed acceptable.

A. Myopic benchmark
A policy that chooses the action that minimizes the current or instantaneous cost is commonly referred to as myopic. It can be constructed as

µ_myopic(s_t) = arg min_{a ∈ A(s_t)} κ(s_t, a),   (16)

with cost κ as in (10) and the admissible action set as described in Sec. III-C. Notice that since the myopic policy does not contain trainable parameters, it does not need to be trained. Moreover, the myopic approach is carried out as a parameterless CFA-based policy, hence becoming a particular case of (13). For this reason, it also admits a unique and closed-form evaluation. See Appendix C for more details.
VI. EXPERIMENTS

In this section, we experimentally validate the effectiveness of the proposed RNN-based policy, introduced in Sec. IV. To this end, we first describe the time-series datasets used. Then, we outline the possible policy configurations, i.e., the possible types of splines as well as the RNN architecture. Afterward, we report how the experiments have been carried out. Finally, we present and comment on the experimental results.

A. Problem data description
For these experiments, we use a synthetic dataset and five real datasets.Each dataset consists of a time series of 28800 signal samples which has been split into 288 series of 100 samples each, except for the first real dataset which contains 57600 samples split into 576 series.
The synthetic dataset is first generated as a uniformly arranged realization of a given autoregressive process AR(2) with white Gaussian noise N(0, 0.1). Then, the resulting series is compressed via PI [52] with CompDev = 0.1, CompMax = ∞, and CompMin = 0. As a result, the series time stamps are no longer uniformly distributed.
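A source of this kind can be mimicked as follows (a sketch only: the paper does not report the AR coefficients, so `phi1`, `phi2`, and the seed below are placeholders, and the PI compression step is omitted).

```python
import random

def ar2_series(phi1, phi2, sigma, T, seed=0):
    # Realization of an AR(2) process y_t = phi1*y_{t-1} + phi2*y_{t-2} + w_t,
    # with w_t ~ N(0, sigma^2) and zero initial conditions.
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(T):
        y.append(phi1 * y[-1] + phi2 * y[-2] + rng.gauss(0.0, sigma))
    return y[2:]
```

Fixing the seed makes the example series reproducible, which is convenient when comparing policies on the same realizations.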
The first real dataset (R1) consists of a series of household minute-averaged active power consumption (in kilowatts) [53]. The second real dataset (R2) is a quantized and PI-compressed (and hence not uniformly sampled) time series measuring the pressure of an oil separation deposit (in bar); these data were collected from Lundin's offshore oil and gas platform Edvard-Grieg. For the third real dataset (R3) [54], a cooling fan with weights on its blades is used to generate vibrations, which are recorded by an attached accelerometer. The vibration samples are recorded every 20 milliseconds. We use the accelerometer-recorded x-values (which are standardized) for rotation speeds ranging from 5 to 40 rpm. The fourth real dataset (R4) [55] monitors the skin temperature (in degrees Celsius) of a volunteer subject through a wearable device every 4 minutes. The fifth and last real dataset (R5) [56] consists of a sensor within a sensor network deployed in a lab, collecting the temperature-corrected relative humidity in percentage. The sampling rate is non-uniform and ranges from deciseconds to tens of seconds. Finally, it is worth mentioning that the datasets R4 and R5 contain gaps (several orders of magnitude wider than the average sampling period) of missing data that we have shortened to avoid instability in the reconstruction. In similar cases where the available raw data is of low quality, thorough and task-specific data preprocessing techniques are assumed. This can improve the performance results, as described in the ensuing Sec. VI-D.

B. Policy configuration
We experimentally observe that the myopic policy described in Sec. V-A is not stable for values of ρ > 2. Recall that the value of ρ affects the policy cost, set as in (10), and delimits the order and degree of smoothness of the spline signal estimate, as explained in Sec. II-C. We also observe instability under the myopic policy for ρ = 2 with a spline signal estimate of order d = 3 and degree of smoothness φ = 2. Consequently, our proposed RNN-based policy is unstable for the same ρ values and spline configurations, since it implicitly uses the myopic policy as a guided starting point. This can be seen by comparing (16) and (13) with a near-zero initial value of λ. Although further theoretical instability studies, alternative policy architectures, or low-delay approaches can contribute to solving the instability issue, they lie outside of the scope of this paper. Nonetheless, we have maintained the general problem formulation as a starting point for future works to take over. On the other hand, the interpolation problem with ρ = 1 is not interesting since it leads to linear interpolation. Therefore, in the present work, we focus on the smoothing interpolation problem with ρ = 2 and with the remaining stable spline configurations, within the search function space described in Sec. II-D, which can lead to optimal reconstructions. Those spline configurations, hereinafter specified by the shorthand notation (d, φ) of the order and degree of smoothness of the spline, correspond to (3, 1) and (4, 2). Accordingly, the notation Myopic(d, φ) or RNN(d, φ) refers to the type of policy besides the spline configuration.
Regarding the RNN architecture shaping the approximated cost-to-go within the RNN-based policy, introduced in Sec. IV-A and illustrated in Fig. 2, we set a preprocessing step that forwards the time length, i.e., u_t = x_t − x_{t−1}, of the tth time section T_t, instead of directly using the time stamps. This preprocessing step makes the architecture invariant to time shifts in the set of time stamps. In our experiments, the recurrent unit consists of two stacked gated recurrent unit (GRU) layers [57], with a latent (hidden) state of size 16 and an input of size 16. The input and output layers are set as linear layers to match the required dimensionality, i.e., to match the input size after the preprocessing step and to match the order of the spline minus the number of constrained coefficients.
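A minimal sketch of this preprocessing step and its shift invariance (the helper name is ours, not from the paper's implementation):

```python
def time_lengths(stamps):
    """Map absolute time stamps to section lengths u_t = x_t - x_{t-1}."""
    return [t1 - t0 for t0, t1 in zip(stamps, stamps[1:])]

# Shifting every stamp by a constant leaves the network input unchanged,
# so the policy cannot latch onto absolute time values.
stamps = [0.0, 0.5, 1.0, 1.25]
shifted = [s + 4.0 for s in stamps]
assert time_lengths(stamps) == time_lengths(shifted)
```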

C. Experimental setup
The datasets are randomly divided into 192 series for training, 64 for validation, and 32 for testing, except for the R1 dataset, which has been divided in the same proportions relative to its data size. The benefits of this train-validation-test partition are twofold: i) the policy becomes more robust against unknown initial conditions, and ii) we can validate the reconstruction against an optimal batch solution (shorter sequences are computationally tractable using batch optimization). All series within a dataset are standardized for implementation convenience. To avoid data leakage, the mean and standard deviation of the respective training partition are used for the standardization. In other words, we compute the mean and variance of the training partition and assume them to be the moments of the true data distribution. The standardization of series is useful to encourage the RNN unit to focus on the fluctuations of the signal values rather than on their magnitude. Finally, the RNN-based policy has been trained using the adaptive moments (Adam) optimizer [58], with β1 = 0.9, β2 = 0.999, without weight decay, and a learning rate of 0.001 over mini-batches of 32 time series each (double the mini-batch size in the case of R1).
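The leakage-free standardization can be sketched as follows (a minimal illustration; the function name is ours):

```python
import math

def standardize_partitions(train, others):
    """Standardize every partition with the mean/std of the training
    partition only, so no validation/test statistics leak into the model."""
    mean = sum(train) / len(train)
    std = math.sqrt(sum((v - mean) ** 2 for v in train) / len(train))
    scale = lambda xs: [(v - mean) / std for v in xs]
    return scale(train), [scale(xs) for xs in others]

# The training partition ends up with (approximately) zero mean; the other
# partitions are shifted and scaled by the *training* moments.
train_std, (val_std, test_std) = standardize_partitions(
    [1.0, 2.0, 3.0], [[2.0, 2.0], [4.0]])
assert abs(sum(train_std)) < 1e-12
```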

D. Results and discussion
Some of the training-validation curves are presented in Fig. 4. As expected, we observe that randomly initialized RNN-based policies (except for the parameter λ, which controls the length of the initial performance gap, as discussed in Sec. VI-B, and is manually initialized) only outperform the myopic policy after training. We also observe wider (in relative terms) standard deviations in those datasets with more abrupt changes, either from the nature of the data, as in R1, or due to missing data and posterior preprocessing, as in R4 and R5. This phenomenon also appears to be caused by highly non-uniform sampling rates, as in R5. In that case, however, the width seems to decrease as the policy yields more accurate estimates. This suggests that the RNN-based policy is able to learn how to adapt properly under non-uniform sampling rates.
Once the RNN-based policy has been trained, we measure its performance with respect to the myopic policy (as the benchmark) and the batch reconstruction (as the baseline) through the improvement metric defined in (17), where ℓ_M, ℓ_R, and ℓ_B denote the loss metric displayed in Fig. 4, but over the test partition, for the myopic policy, the RNN-based policy, and the batch solution, respectively. Acceptable performances yield improvement values in (0, 1], with I = 1 being the best possible value, whereas nonpositive improvement values indicate a deficient performance. The standard deviation of the improvement metric is then estimated through error propagation from σ_M, σ_R, and σ_B, the standard deviations of the respective loss metrics over the test partition. The improvement results are summarized in Table I. From Fig. 4 and Table I, it can be observed that the policy configurations with the highest improvement scores over each of the considered dataset test partitions are in agreement with their corresponding validation curves. Table I also shows standard performance descriptors such as the mean squared error (MSE) and mean absolute error (MAE); see Appendix D for their computation. Note that in most of the experiments that we have carried out, the RNN-based policy outperforms the myopic policy in terms of the MSE and MAE metrics, while it falls behind the batch policy. This observation experimentally justifies the smoothness assumption in our formulation. Regarding the parameter λ ∈ R+ introduced in (14), it can be understood as the confidence of the RNN-based policy in its ability to foresee incoming data samples. In this way, it also quantifies the importance of the RNN architecture (detailed in Fig. 2) in the reconstruction task. As an illustration, Fig. 5 shows the training curves corresponding to the parameter λ for the policy configurations presented in Fig. 4.
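Since equation (17) is not reproduced in this excerpt, the sketch below uses a hypothetical normalized-gain form that is merely consistent with the stated properties (I = 1 when the RNN-based loss matches the batch loss, I ≤ 0 when there is no gain over the myopic benchmark); it should not be read as the paper's exact definition:

```python
def improvement(loss_myopic, loss_rnn, loss_batch):
    """Hypothetical improvement metric: normalized gain of the RNN-based
    policy over the myopic benchmark, relative to the batch baseline.
    Assumes loss_myopic > loss_batch (benchmark worse than baseline)."""
    return (loss_myopic - loss_rnn) / (loss_myopic - loss_batch)

assert improvement(2.0, 1.0, 1.0) == 1.0   # matches the batch baseline
assert improvement(2.0, 2.0, 1.0) == 0.0   # no gain over the benchmark
assert improvement(2.0, 3.0, 1.0) < 0.0    # deficient performance
```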
On the other hand, we observe a competitive performance in terms of the execution time of the RNN-based policy evaluation (forward pass) as compared to its myopic counterpart. Our evaluation-time results are summarized in Fig. 6, where the policies are implemented in Python 3.8.8 and the experiment is run on a 2018 laptop with a 2.7 GHz Quad-Core Intel Core i7 processor and 16 GB of 2133 MHz LPDDR3 memory. Regarding memory complexity, the myopic policy is parameterless (see Sec. V-A), and our configuration of the RNN-based policy (see Sec. VI-B) contains approximately 3400 trainable parameters, which is arguably a reduced model size for most tasks.
Finally, and for the sake of completeness, Fig. 7 shows a snapshot of a zero-delay smooth signal reconstruction alongside its two first derivatives using our proposed method.

VII. CONCLUSION
In this paper, we propose a method for zero-delay smoothing spline interpolation. Our method relies on a parametric policy, named the RNN-based policy, specifically engineered for the zero-delay interpolation task. As new data samples arrive, this policy yields piecewise polynomial functions used for smooth signal reconstruction. Our experiments show that the RNN-based policy can learn the dynamics of the target signal and efficiently incorporate them (in terms of improved accuracy and reduced response time) into the reconstruction task.
This work can be seen as a proof of concept with several immediate follow-ups. The flexibility in our policy design allows extending this work to multivariate time series with a moderate increase in complexity. It is also possible to generalize the problem data, e.g., quantization intervals instead of data points, as well as to accommodate additional constraints, as long as the convexity of the policy evaluation problem is preserved. Lastly, we note that our work provides a foundation that can be tailored to reconstructing non-stationary signals by borrowing reinforcement learning techniques.

A. Proof of Proposition 1
Recall from Sec. II-C that every spline f_T, as in (1), is composed of T function sections and T − 1 contact points. We say that two consecutive function sections have a contact of order φ if they have φ equal derivatives at the contact point. Then, guaranteeing a degree of smoothness φ for a given spline f_T is equivalent to ensuring that all its contact points are at least of order φ, since every tth function section g_t, as in (2), is already smooth over the interior of its domain T_t. In practice, this can be ensured by imposing the equality constraints lim_{x→x_{t−1}⁻} g_{t−1}^{(k)}(x) = lim_{x→x_{t−1}⁺} g_t^{(k)}(x) (19) for every k ∈ N_[0,φ] and t ∈ N_[2,T]. From here, the kth derivative of every tth function section g_t follows from the kth derivatives of the components of the tth basis vector function p_t defined in (3), which can be computed straightforwardly for all i ∈ N_[1,d+1]. Combining these relations with the one-sided limits of the basis functions at the contact points, i.e., the relations in (20) and (23), the right-hand term in (19) can be equivalently computed as in (24). Separately, we can define a vector e_t ∈ R^{φ+1} whose components are constructed, for every k ∈ N_[0,φ] and t ∈ N_[1,T] with u_t ≜ x_t − x_{t−1}, as in (25), where the step (25b) uses the relations described in (20) and (22). On the other hand, e_0 encodes the initial boundary conditions of the reconstruction and can be set by the user in advance or calculated. Finally, by dividing both sides of the equality constraint in (19) by k!, using the relations derived in (24) and (25), and appropriately renaming the indices, we obtain Proposition 1.
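As a numerical aside (our own helper, not part of the paper's derivation), a contact of order φ between two polynomial sections can be verified with NumPy's legacy polynomial routines:

```python
import numpy as np

def has_contact_of_order(g_prev, g_next, x, phi):
    """True if the two sections (coefficient lists, highest degree first)
    have equal derivatives of orders 0..phi at the contact point x."""
    return all(
        np.isclose(np.polyval(np.polyder(g_prev, k), x),
                   np.polyval(np.polyder(g_next, k), x))
        for k in range(phi + 1))

g1 = [1.0, 0.0, 0.0, 0.0]   # x^3:    value 1, slope 3 at x = 1
g2 = [3.0, -2.0]            # 3x - 2: value 1, slope 3 at x = 1
assert has_contact_of_order(g1, g2, 1.0, 1)       # C^1 contact
assert not has_contact_of_order(g1, g2, 1.0, 2)   # second derivatives differ
```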

B. Proof of Proposition 2
Recall from Sec. II-D that the solution to the optimization problem stated in (6) is a spline function in W^ρ. This fact allows us to reduce the search function space without loss of optimality. In fact, we can incorporate the spline form of the solution into the objective functional as long as we ensure the required minimum degree of smoothness of the solution, for example, via (19). From here, we can equivalently compute the regularization term in the objective in (6) (second term) as in (26). Separately, and making use of the definition of a function section in (2), we obtain the relation in (27) with (28). From the relation in (22), it is clear that the first ρ rows and columns of the matrix defined in (28) are zero valued. Then, we can compute the remaining elements of the matrix M_t ∈ S_+^{d+1} as in (29). On the other hand, the sum of squared residuals in the objective in (6) (first term) can be equivalently computed as in (30), from the definition of a spline; see relation (1). Summing up, the result stated in Proposition 2 can be reached starting from the objective in (6) and then following the relations in (26) and (27), alongside (29) and (30).
C. Closed-form policy evaluation
Notice that both the proposed policy in (13) and the myopic policy in (16) can be equivalently evaluated by solving the quadratic convex problem in (31), µ(s_t) = arg min_{a∈A(s_t)} (...), where the terms A_t ∈ S_+^{d+1} and b_t ∈ R^{d+1} take different values for the different policy variations, as described in Table II. The form in (31) is displayed as an intermediate step for the sake of clarity, and the dependencies on the example time series (indexed by m) and the policy parameters (contained in θ) have been omitted for notational convenience. Then, we relocate the equality constraints (presented in (4) and satisfied by the actions in the admissible set A(s_t)) into the objective of (31) by setting a = B_1 e_{t−1} + B_2 α, where the components of e_{t−1} are described in (5), the matrices B_1 ≜ [I_{φ+1}, 0_{(φ+1)×(d−φ)}]^⊤ ∈ R^{(d+1)×(φ+1)} and B_2 ≜ [0_{(d−φ)×(φ+1)}, I_{d−φ}]^⊤ ∈ R^{(d+1)×(d−φ)} are defined for the sake of notation, and α ∈ R^{d−φ}. After some algebraic steps, both policies can be equivalently evaluated through the vector α_t ∈ R^{d−φ}, which admits a closed-form solution since B_2^⊤ A_t B_2 ∈ S_{++}^{d−φ}.
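As an illustration of this constraint-elimination step, the sketch below solves a small instance numerically. The objective form 0.5 aᵀA a + bᵀa and all function names are our own assumptions; only the substitution a = B_1 e + B_2 α follows the definitions above:

```python
import numpy as np

def solve_constrained_quadratic(A, b, e, d, phi):
    """Minimize 0.5 a^T A a + b^T a (assumed objective form) over
    a in R^(d+1) whose first phi+1 components are fixed to e, via
    the substitution a = B1 e + B2 alpha."""
    B1 = np.vstack([np.eye(phi + 1), np.zeros((d - phi, phi + 1))])
    B2 = np.vstack([np.zeros((phi + 1, d - phi)), np.eye(d - phi)])
    H = B2.T @ A @ B2                     # positive definite when A is
    alpha = -np.linalg.solve(H, B2.T @ (A @ (B1 @ e) + b))
    return B1 @ e + B2 @ alpha

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)             # random SPD matrix, d = 3
b = rng.standard_normal(4)
e = np.array([1.0, -2.0])                 # phi = 1: two constrained components
a = solve_constrained_quadratic(A, b, e, d=3, phi=1)
assert np.allclose(a[:2], e)              # smoothness constraints hold
assert np.allclose((A @ a + b)[2:], 0.0)  # reduced gradient vanishes
```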

D. Bootstrap method for estimating the MSE and MAE
When the data distribution, or in our case the underlying (assumed) smooth process, is unknown, we cannot follow the standard MSE and MAE computation procedure because the original function ψ ∈ W^ρ is also unknown. Instead, we only have access to a certain dataset of test samples, e.g., D = {(x_t, ψ(x_t))}_{t=1}^{T}. We therefore estimate the metrics from the residuals on a held-out subset, where f_B is the signal estimate constructed from the test data subset B. This procedure is illustrated in Fig. 8. Notice that it is important to partition the test data because any test data sample used for the signal reconstruction cannot be used to compute the performance metrics; otherwise, this results in data leakage. On the other hand, due to the lack of data samples, the performance metrics estimated in this way may not be as accurate as if we had larger test sets, or more specifically, large test sets with higher temporal resolution. Thus, to reduce the variance of the MSE and MAE estimators, we repeat the procedure for several randomly chosen partitions (B, B̄), drawn with replacement (i.e., they may repeat), and average the results. In particular, we perform 10 repetitions.
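A minimal sketch of this bootstrap procedure, with a generic least-squares line fit standing in for the actual policy-based reconstruction (all function names are ours):

```python
import random

def bootstrap_mse(test_data, fit, reps=10, holdout=0.1, seed=0):
    """Average squared residual on held-out test points: per repetition,
    reconstruct from one random subset B and evaluate on the disjoint
    remainder; partitions may repeat across repetitions."""
    rng = random.Random(seed)
    mses = []
    for _ in range(reps):
        data = test_data[:]
        rng.shuffle(data)
        k = max(1, int(holdout * len(data)))
        held, rest = data[:k], data[k:]
        f = fit(rest)                     # signal estimate from subset B
        mses.append(sum((y - f(x)) ** 2 for x, y in held) / len(held))
    return sum(mses) / reps

def fit_line(pairs):
    """Least-squares line fit, used here as a stand-in reconstructor."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

# Noiseless linear data is recovered (almost) exactly on held-out points.
data = [(float(t), 2.0 * t + 1.0) for t in range(20)]
assert bootstrap_mse(data, fit_line) < 1e-12
```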
This work was supported by the SFI Offshore Mechatronics grant 237896/O30 and the IKTPLUSS INDURB grant 270730/O70.

Fig. 3:
Fig. 3: Rolled representation of the environment-agent system. The sampler cell samples the random process modeling the dynamics of the environment. The interpolator cell performs the reconstruction, evaluates the cost, and updates the states. The agent cell contains the policy and the RNN. The blue-shaded area encompasses the environment. The cost (in gray) is used during the training phase but not for evaluation.

Fig. 4:
Fig. 4: Some of the training-validation curves for the considered RNN-based policy configurations (d, φ) and η values for each dataset. The legends (T) and (V) refer to the training and validation partitions, respectively. The loss metric is the average total cost per function section (see Sec. III-E). The shaded areas represent one standard deviation.

Fig. 6:
Fig. 6: Box plot of the policy execution time per interpolation step over the test partitions and η values {0.1, 1, 10}. It illustrates (excluding outliers) the minimum, first quartile, median, third quartile, and maximum.

Fig. 7:
Fig. 7: Snapshot of a reconstructed time series from the test partition of the synthetic dataset. Both of the presented policies have been trained over the training partition before reconstruction.

Fig. 8:
Fig. 8: Illustration of the residuals used to estimate the MSE via bootstrapping. The test partition comes from the synthetic data and has been reconstructed with a Myopic(3,1) policy with η = 1 and ρ = 2. B contains 90% of the test partition and B̄ the remaining 10%.

TABLE I:
Performance metrics averaged over the test partitions, for different η values and policy configurations with ρ = 2. Recall that the improvement metric (17) reports the gain of any RNN(d, φ) configuration over its corresponding Myopic(d, φ) (as a benchmark) and batch(d, φ) (as the baseline) configurations.