Asynchrony Increases Efficiency: Time Encoding of Videos and Low-Rank Signals

In event-based sensing, many sensors independently and asynchronously emit events when there is a change in their input. Event-based sensing can present significant improvements in power efficiency when compared to traditional sampling, because (1) the output is a stream of events where the important information lies in the timing of the events, and (2) the sensor can easily be controlled to output information only when interesting activity occurs at the input. Moreover, event-based sampling can often provide better resolution than standard uniform sampling. This is not only because individual event-based sensors have higher temporal resolution, but also because the asynchrony of events allows for a less redundant and more informative encoding. We would like to explain how such curious results come about. To do so, we use ideal time encoding machines as a proxy for event-based sensors. We explore time encoding of signals with low-rank structure, and apply the resulting theory to video. We then show how the asynchronous firing times of the time encoding machines allow for better reconstruction than in the standard sampling case, provided we have a high spatial density of time encoding machines that fire less frequently.


I. INTRODUCTION
Many aspects of our lives are governed by routine and rhythm: our work days, circadian rhythms, breathing patterns, or even music. However, metronomic schedules are not necessarily resource-efficient for all applications. We generally say "hello" when we see someone we know rather than saying it at regular intervals.
Many engineered systems, such as traditional sampling devices, rely almost exclusively on clocked behavior. These sampling schemes are powerful (they govern how we record music, take images, and transfer information), but they fail to adapt their activity to the varying complexity of the input.
This drawback leads to inefficiencies which are apparent when comparing the power consumption of human-engineered technologies to biological equivalents.
As demands for increased storage and processing, coupled with smaller devices, have brought efficiency into the spotlight, researchers have been turning to biology for inspiration. Inspired by neurons, event-based sensing is growing in popularity [2]-[4]. The output of such a sensor is a series of spikes which are characterized by their timing rather than by their amplitude, as is the case with traditional sampling [5]. In addition, since spike times depend on the input, the activity of the output is correlated with the activity of the input.
While efforts have recently been invested to better understand event-based sensing and reconstruction of bandlimited or finite-rate-of-innovation signals [6]-[8], comparisons between event-based sensing and standard sampling have mostly treated the timing-based output of event-based sensing as a pesky necessity that requires some work-around rather than as a blessing in disguise. Yet it is precisely this staggered asynchrony in the outputs that can allow for better resolution and more flexibility when using event-based sampling.
We show this in a series of steps. First, we review time encoding machines (TEMs), which are the ideal event-based sensor [9,10]. The input-output relationship of an integrate-and-fire TEM follows rules similar to those of nonuniform sampling [11,12], which leads to interesting consequences for single-signal multi-channel time encoding [13,14].
The natural extension to multi-signal multi-channel time encoding then offers optimal sampling efficiency for low-dimensional signals with known structure. We find a Nyquist-like criterion on the number of spikes needed for reconstruction, requiring as many linearly independent constraints as degrees of freedom.
Using this formulation, the video recording problem with TEMs turns into a parametric estimation problem. We study the setup in Fig. 1: multiple TEMs are used to encode multiple locations in a scene and each TEM outputs a series of asynchronous spikes. We find that this setup offers interesting tradeoffs in terms of time and space resolution: for once, increasing spatial resolution can actually increase temporal resolution as well, precisely thanks to our blessing in disguise: the staggered, asynchronous outputs of the TEMs.
As a result, time encoding or event-based sensing encourages increasing the number of sensors rather than the spiking rate of sensors as a means to improve resolution in both time and space. This, in turn, means that (1) there are fewer hardware requirements on the sensors themselves, and (2) sensors can integrate information over more time, thus avoiding issues with low photon count that occur at higher shutter speeds with standard cameras.
This particular result does not usually hold in the equivalent scenario of standard frame-based video recording. There, the resolutions in time and space are almost independent of one another. The former depends on the frame rate of the camera and the latter depends on the sampling pattern of the camera, which of course includes the number of pixels used for sampling. Increasing the number of pixels used to record a frame does not improve the resolution in time, because all pixels of a frame are taken at the same moment. To fix this, different pixels would need to record frames at different times. This is possible but complicates matters: reconstruction either requires that the time shifts between pixel clocks be known, or becomes a difficult problem with no uniqueness guarantees. In contrast, when using time encoding, spike times are, by design, almost surely different, and this difference comes at no extra cost.

Fig. 1: Vision setup: we assume that we have an array of spiking devices, such as photoreceptors or TEMs, each of which observes a scene at a particular location. The input to the receptor at this location is a time-varying signal, and the receptor outputs a stream of spikes whose timing depends on the input. On the left, we show the projection of the scene being observed, with an overlay of event-based sensors shown in yellow. To its right, we zoom in to view the spiking output of some of the sensors.

II. BACKGROUND
As first presented in [15], time encoding machines (TEMs) encode inputs using times that are dependent on the input itself. TEMs can therefore be used to model neurons or sensory receptors such as photoreceptors. In fact, the simpler neuron models often encode their input currents using action potentials with fixed amplitude and varying timing, where the timing holds the information about the input [16].
In this paper, we will consider one model for time encoding machines which resembles an integrate-and-fire neuron with no leak [11,16]. Such TEMs can provide perfect encodings of signals using one or many channels. The circuit of a TEM is depicted in Fig. 2.
Definition 1. A time encoding machine (TEM) with parameters κ, δ, and β takes an input signal x(t), adds a bias β to it and integrates the result, scaled by 1/κ, until a threshold δ is reached. Once this threshold is reached, the time $t_k$ at which it is reached is recorded, the value of the integrator resets to −δ and the mechanism restarts. We say that the machine spikes at the integrator reset and call the recorded time $t_k$ a spike time.
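To make Definition 1 concrete, here is a minimal Python sketch of an integrate-and-fire TEM without leak. It is an illustration under our own assumptions, not a reference implementation: the input is given on a dense uniform time grid standing in for continuous time, the integral is approximated with the trapezoid rule, the threshold crossing is linearly interpolated within a grid step, and the function name `tem_encode` and its arguments are hypothetical.

```python
import numpy as np

def tem_encode(x, t, kappa, delta, beta, zeta0=0.0):
    """Approximate spike times of the TEM of Definition 1 (no leak).

    x     : input samples on the uniform grid t (dense grid ~ continuous time)
    zeta0 : initial integrator value, assumed to satisfy zeta0 < delta
    Assumes x + beta > 0, so the integrator is strictly increasing.
    """
    dt = t[1] - t[0]
    zeta = zeta0
    spikes = []
    for k in range(len(t) - 1):
        # Trapezoid-rule increment of (1/kappa) * integral of (x + beta) over one step.
        increment = 0.5 * (x[k] + x[k + 1] + 2.0 * beta) * dt / kappa
        if zeta + increment >= delta:
            # Linearly interpolate the threshold crossing inside this step.
            frac = (delta - zeta) / increment
            spikes.append(t[k] + frac * dt)
            # Reset to -delta and keep integrating over the rest of the step.
            zeta = -delta + (1.0 - frac) * increment
        else:
            zeta += increment
    return np.array(spikes)

# Usage: constant zero input, started from the reset value -delta.
t = np.linspace(0.0, 10.0, 10001)
spikes = tem_encode(np.zeros_like(t), t, kappa=1.0, delta=0.5, beta=1.0, zeta0=-0.5)
```

With a zero input, κ = 1, δ = 0.5 and β = 1, the integrator climbs from −δ to δ in 2κδ/β = 1 time unit, so the spikes should land close to t = 1, 2, 3, and so on.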
The first results on time encoding machines that resemble this model were, to the authors' knowledge, established by Lazar and Tóth [9].
The results operate under the following assumptions.
Under these assumptions, the input x(t) can be reconstructed from the emitted spike times if the parameters of the machine satisfy β > c and the bandwidth satisfies

$$\Omega < \frac{\pi(\beta - c)}{2\kappa\delta}. \quad (1)$$

The reconstruction scheme and the proof of convergence are based on two key elements: 1) the time encoding scheme is tightly related to the scheme of sampling averages, so the results developed for reconstruction from averages can be used for time encoding and reconstruction [12,17], and 2) when performing time encoding, the maximal delay between two consecutive spike times is dictated by the parameters of the machine:

$$t_{k+1} - t_k \leq \frac{2\kappa\delta}{\beta - c}.$$

Given these two observations, and under Condition (1), the input signal can be perfectly determined by the spike times using algorithms based on alternating projections onto convex sets [6,9,18].

Fig. 2: Circuit of a Time Encoding Machine (with spike-triggered reset), with input x(t), threshold δ, integrator constant κ and bias β.
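The bound on the delay between consecutive spikes can be derived directly from Definition 1, assuming (as the condition β > c suggests) that the input is bounded as |x(t)| ≤ c. Between two consecutive spikes, the integrator travels from −δ to δ, so:

```latex
\begin{aligned}
\frac{1}{\kappa}\int_{t_k}^{t_{k+1}} \big(x(u) + \beta\big)\,du &= \delta - (-\delta) = 2\delta, \\
\frac{(\beta - c)\,(t_{k+1} - t_k)}{\kappa} \;\le\; 2\delta \;&\le\; \frac{(\beta + c)\,(t_{k+1} - t_k)}{\kappa}, \\
\frac{2\kappa\delta}{\beta + c} \;\le\; t_{k+1} - t_k \;&\le\; \frac{2\kappa\delta}{\beta - c}.
\end{aligned}
```

The upper bound is what ties the spiking density to the bandwidth condition, while the lower bound shows that the spiking rate of a single machine is capped at (β + c)/(2κδ).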
Later, the theory was extended to multi-channel time encoding of a signal. On one hand, Lazar suggested a scheme for bandlimited signal sampling and reconstruction using many time encoding machines coupled with filter banks [13]. On the other hand, we suggested a scheme for sampling and reconstructing a bandlimited signal using many time encoding machines that are similar and have no pre-filters [19]. In the latter scenario, we showed that if one TEM can encode a signal with bandwidth Ω, then M TEMs can encode a signal with bandwidth MΩ, assuming that the TEMs have unknown non-zero shifts $\alpha_1, \dots, \alpha_M$ between their integrators [19]. In other words, M time encoding machines with parameters κ, δ, and β can encode a signal which satisfies assumptions (A1), (A2), (A3) if

$$\Omega < \frac{M\pi(\beta - c)}{2\kappa\delta}.$$

As a result, instead of using one TEM with a certain spiking rate to encode a signal, one can use many TEMs with lower spiking rates to encode the same signal. This is useful if time encoding machines or neurons have an upper limit on their spiking rate.
The result generalizes to multi-signal, multi-channel time encoding, as partly studied in [20] and as we will see in the next sections.

III. PROBLEM SETUP
For the remainder of this paper, we consider many time-varying signals $y^{(i)}(t)$, $i = 1, \dots, I$, that are correlated with each other. These signals are encoded using time encoding machines, and we assume that each signal follows a known parametric model.
Correlated signals arise in many applications, such as meteorological data, biomarkers in human patients, regional economic data, audio, and video. The last example will be given particular attention later on.
In our setup, we let y(t) denote the vector signal composed of the $y^{(i)}(t)$'s and let y(t) be such that
(A4) each $y^{(i)}(t)$ has a finite parametric representation:
$$y^{(i)}(t) = \sum_{k=1}^{K} c_{i,k}(y)\, f_k(t),$$
where the $c_{i,k}(y)$ are fixed coefficients that are unknown a priori and the $f_k(t)$'s, $k = 1, \dots, K$, are known functions,
(A5) each $y^{(i)}(t)$ can be written as a linear combination of signals $x^{(j)}(t)$, $j = 1, \dots, J$, where J < I:
$$y(t) = A\,x(t)$$
for a matrix $A \in \mathbb{R}^{I \times J}$, and
(A6) each $y^{(i)}(t)$ is sampled using a time encoding machine TEM$^{(i)}$ with parameters $\kappa^{(i)}$, $\delta^{(i)}$ and $\beta^{(i)}$, which are known and can vary between machines. The outputs of the machines are denoted $t^{(i)}_\ell$, $\ell = 1, \dots, n^{(i)}_{\text{spikes}}$.
The sampling setup we described is depicted in Fig. 3.
We will consider two options for the functions $f_k(t)$:
(A7.a) each $f_k(t)$ is a sinc function of bandwidth Ω centered at $\tau_k$, for Ω and $\tau_k$ known, so that the $y^{(i)}(t)$'s are finite sums of sincs, or
(A7.b) the $f_k(t)$'s are complex exponentials with a common period, so that the $y^{(i)}(t)$'s are bandlimited periodic functions.
Fortunately, the two families resemble each other enough for both to be treated at the same time. For both of them, we consider the reconstruction conditions with A satisfying either of the following two assumptions:
(A8.a) the linear map $A \in \mathbb{R}^{I \times J}$ from the low-dimensional space is known;
(A8.b) the linear map $A \in \mathbb{R}^{I \times J}$ from the low-dimensional space is unknown, but the dimension J of the low-dimensional space is known.
We first consider the case where A is known and provide conditions for perfect reconstruction in Section IV-A and a reconstruction algorithm in Section IV-B. We later provide applications for this scenario in Sections V and VI, where we deal with time encoding of video.
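As an illustration of (A4) and (A5), the following Python sketch builds I correlated signals from J latent ones. It assumes a real Fourier basis of period T as a stand-in for the bandlimited periodic case (A7.b), and random Gaussian A and C(x); all sizes are arbitrary illustrative choices. The point it shows is that the I × K coefficient matrix C(y) = A C(x) has rank J, so that once A is known only JK numbers need to be estimated instead of IK.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, M = 6, 2, 2            # channels, latent signals, highest harmonic (illustrative sizes)
K = 2 * M + 1                # number of basis functions f_k
T = 1.0                      # assumed period of the basis

def basis(t):
    """Real Fourier basis 1, cos, sin up to harmonic M at times t; shape (len(t), K)."""
    cols = [np.ones_like(t)]
    for m in range(1, M + 1):
        cols += [np.cos(2 * np.pi * m * t / T), np.sin(2 * np.pi * m * t / T)]
    return np.stack(cols, axis=-1)

A = rng.standard_normal((I, J))      # mixing matrix of (A5)
C_x = rng.standard_normal((J, K))    # coefficients c_{j,k}(x) of the latent signals
C_y = A @ C_x                        # coefficients c_{i,k}(y) of the observed signals

t = np.linspace(0.0, T, 200)
y = basis(t) @ C_y.T                 # y[:, i] = sum_k c_{i,k}(y) f_k(t)

print(np.linalg.matrix_rank(C_y))    # J, not min(I, K)
```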
Later, we will consider the case where A is unknown and provide a reconstruction algorithm based on singular value projection for low-rank matrix recovery in Section VII. We then follow with simulations to show results and with example applications for time encoding time-varying scenes.

IV. KNOWN LOW-RANK FACTORIZATION: TIME ENCODING AND RECONSTRUCTION
A. Conditions for perfect reconstruction
We can establish the following sufficient conditions to ensure that a series of inputs $y^{(i)}(t)$ are reconstructible from their time encoding using machines TEM$^{(i)}$.

Fig. 3: Sampling setup: J input signals $x^{(j)}(t)$, $j = 1, \dots, J$, are mixed using a matrix A and produce signals $y^{(i)}(t)$, $i = 1, \dots, I$. Each $y^{(i)}(t)$ is then sampled using a time encoding machine TEM$^{(i)}$, which produces spike times $t^{(i)}_\ell$, $\ell = 1, \dots, n^{(i)}_{\text{spikes}}$.

Theorem 1. Let I signals $y^{(i)}(t)$, $i = 1, \dots, I$, satisfy assumptions (A4), (A5) and (A6), and let their functions $f_k(t)$ satisfy either (A7.a) or (A7.b), with the corresponding coefficients $c_{i,k}$ drawn from a Lipschitz continuous probability distribution. Now assume $A \in \mathbb{R}^{I \times J}$ as defined in (A5) is known and has every J rows linearly independent. Then the inputs $y^{(i)}(t)$, $i = 1, \dots, I$, are exactly determined by the spike times if the time encoding machines start sampling at $t_0$ with a known integrator value $\zeta^{(i)}_0$ and
$$\sum_{i=1}^{I} \min\left(n^{(i)}_{\text{spikes}}, K\right) \geq JK. \quad (8)$$
An intuitive explanation of this result follows in Section IV-C. The theorem just stated can be generalized to include scenarios where the initial integrator value is not known:

Corollary 2. Under the same assumptions as Theorem 1, but when the time encoding machines have an unknown integrator value $\zeta^{(i)}_0$, the inputs $y^{(i)}(t)$, $i = 1, \dots, I$, are exactly determined by the spike times $t^{(i)}_\ell$, $\ell = 1, \dots, n^{(i)}_{\text{spikes}}$, if
$$\sum_{i=1}^{I} \min\left(n^{(i)}_{\text{spikes}} - 1, K\right) \geq JK.$$

We can prove the above theorem and corollary by writing the problem in terms of rank-one measurements, also called bilinear measurements in [21]. The full proof is provided in Appendix A.

B. Reconstruction Algorithm
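The spike times $t^{(i)}_\ell$, $\ell = 1, \dots, n^{(i)}_{\text{spikes}}$, output by the machines provide constraints on the integrals of the input signals. Following Definition 1, for $\ell \geq 2$,
$$\int_{t^{(i)}_{\ell-1}}^{t^{(i)}_\ell} y^{(i)}(u)\,du = 2\kappa^{(i)}\delta^{(i)} - \beta^{(i)}\left(t^{(i)}_\ell - t^{(i)}_{\ell-1}\right),$$
while for $\ell = 1$ the right-hand side is $\kappa^{(i)}\left(\delta^{(i)} - \zeta^{(i)}_0\right) - \beta^{(i)}\left(t^{(i)}_1 - t^{(i)}_0\right)$. These measurements can be rewritten to fit the rank-one measurement formulation [21]. Letting C(x) denote the matrix of coefficients $c_{j,k}(x)$ of the underlying signals $x^{(j)}(t)$, and $\mathbf{a}_i^\top$ the i-th row of A, we can reconstruct C(x) (and therefore y(t)) by solving
$$\mathbf{a}_i^\top C(x)\, \mathbf{f}^{(i)}_\ell = b^{(i)}_\ell, \quad i = 1, \dots, I, \ \ell = 1, \dots, n^{(i)}_{\text{spikes}}, \quad (11)$$
where $b^{(i)}_\ell$ is known and denotes the integral
$$b^{(i)}_\ell = \int_{t^{(i)}_{\ell-1}}^{t^{(i)}_\ell} y^{(i)}(u)\,du,$$
with $t^{(i)}_0$ denoting the time at which TEM$^{(i)}$ starts integrating, and
$$\left[\mathbf{f}^{(i)}_\ell\right]_k = \int_{t^{(i)}_{\ell-1}}^{t^{(i)}_\ell} f_k(u)\,du, \quad k = 1, \dots, K.$$
Under the conditions of Theorem 1, the linear system in (11) is full rank and C(x) can be recovered perfectly.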
Once the matrix C(x) has been recovered, one can recover the coefficients $c_{i,k}(y)$ of the $y^{(i)}(t)$'s by setting C(y) = AC(x), and can therefore recover the original sampled signals.
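The linear system (11) can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own: a real Fourier basis with period 1 (standing in for (A7.b)) whose integrals are computed in closed form, random Gaussian A and C(x), and randomly drawn, channel-specific interval endpoints in place of actual TEM spike times. The right-hand sides are computed from the true coefficients instead of from 2κδ − βΔt. Each constraint contributes one row a_i ⊗ f to the system, which is the rank-one structure mentioned above.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, M = 6, 2, 2
K = 2 * M + 1                     # JK = 10 unknowns in C(x)

def basis_integral(t0, t1):
    """[f]_k = integral over [t0, t1] of the real Fourier basis 1, cos, sin (period 1)."""
    out = [t1 - t0]
    for m in range(1, M + 1):
        w = 2 * np.pi * m
        out += [(np.sin(w * t1) - np.sin(w * t0)) / w,
                (np.cos(w * t0) - np.cos(w * t1)) / w]
    return np.array(out)

A = rng.standard_normal((I, J))          # known mixing matrix
C_x_true = rng.standard_normal((J, K))   # unknown coefficients to recover

rows, b = [], []
for i in range(I):
    # Hypothetical channel-specific "spike times": 3 times give 2 integration intervals.
    times = np.sort(rng.uniform(0.0, 1.0, 3))
    for t0, t1 in zip(times[:-1], times[1:]):
        f = basis_integral(t0, t1)
        rows.append(np.kron(A[i], f))          # a_i^T C f == (a_i kron f) . vec(C)
        b.append(A[i] @ C_x_true @ f)          # b_l^(i): integral of y^(i) over the interval
rows, b = np.array(rows), np.array(b)

vec_C, *_ = np.linalg.lstsq(rows, b, rcond=None)
C_x_hat = vec_C.reshape(J, K)            # row-major vec matches the np.kron ordering
C_y_hat = A @ C_x_hat                    # coefficients of the y^(i)
```

Here each of the I = 6 channels provides only 2 constraints, fewer than the K = 5 it would need on its own, yet the 12 staggered constraints are enough for the JK = 10 unknowns.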

C. Interpretation
The results in Theorem 1 and Corollary 2 establish a Nyquist-like criterion for recovery. They specify how to count the number of linearly independent constraints in the multi-channel TEM setup and require as many of these constraints as there are degrees of freedom to recover the sampled signals. The results can be summarized by a few key points: 1) When sampling a collection of signals with a known linear mapping to or from a lower-dimensional representation, what matters is the number of degrees of freedom in the low-dimensional space, rather than in the high-dimensional space. More practically, to ensure perfect reconstruction, we need the number of linearly independent constraints to be at least the number of degrees of freedom in the low-dimensional space, JK. When J ≪ I, this can be a major reduction in the required spiking rate. 2) When multiple correlated signals are sampled using different time encoding machines, a lower spiking rate of one machine can be compensated for by higher spiking rates from others. This can be seen from the summation in (8): the total spiking rate of the machines matters more than the individual spiking rates. 3) One machine can only compensate for another machine's low spiking rate up to a certain degree. This can be seen from the min term in (8), which implies that every machine has a maximal "useful" spiking rate depending on the signal, and that going above this spiking rate does not add further information. This has a series of implications. First, signals that have lower-dimensional representations can be sampled at lower rates, increasing sampling efficiency. Second, if TEMs have limited capacity in terms of spiking rates (for example, they have a refractory period), this can be compensated for by adding more TEMs.
This would still ensure reconstruction of the input since the reconstruction condition in (8) is only linked to the number of degrees of freedom in the low dimensional space. Third, we will see in Section VI how the results help us solve time encoding of time-varying spatial signals which have certain structure in space.
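The counting rule of Theorem 1 and Corollary 2 can be written as a small Python helper. It is a sketch of our reading of the condition in (8): each machine contributes at most K useful constraints, and with an unknown initial integrator value one spike per machine is spent on that unknown. The helper name `enough_spikes` is hypothetical, and it does not check the separate requirement that every J rows of A be linearly independent. The usage lines borrow the sizes of the video example of Section VI-B, read as J = 81 spatial coefficient pairs and K = 9 temporal coefficients.

```python
def enough_spikes(n_spikes, J, K, integrator_known=True):
    """Return (condition met?, number of useful constraints).

    n_spikes: number of spikes emitted by each of the I machines.
    Each machine contributes at most K constraints (its input has K degrees
    of freedom); with an unknown initial integrator, one spike per machine
    is spent on the unknown offset.
    """
    offset = 0 if integrator_known else 1
    useful = sum(min(max(n - offset, 0), K) for n in n_spikes)
    return useful >= J * K, useful

print(enough_spikes([9] * 81, J=81, K=9))    # (True, 729)
print(enough_spikes([15] * 81, J=81, K=9))   # (True, 729): extra spikes add nothing
print(enough_spikes([5] * 81, J=81, K=9))    # (False, 405)
print(enough_spikes([5] * 146, J=81, K=9))   # (True, 730): more TEMs compensate
```

The second and fourth lines mirror points 3) and 2) above: spikes beyond K per machine are wasted, while a low per-machine rate can be offset by more machines.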
Note that these results provide a stark improvement over sampling high-dimensional but low-complexity signals using regular clock-based sampling. In fact, Theorem 1 holds because of one key element: different dimensions of the signal are sampled at different times with continuous probability distributions. Clock-based sampling does not have this property; indeed, it only takes a short mental exercise to see that recovering y(t) takes IK samples if the $y^{(i)}(t)$'s are all sampled at the same sampling times.
To be fair, one could ensure that different y (i) (t)'s are sampled at different times (minus the continuous probability condition), but this condition is much more elegantly ensured in the time encoding scenario. Moreover, using different clocks in the classical sampling setup poses difficulties because it is hard to align different clocks. Clock alignment is not an issue in time encoding because the time reference of different time encoding machines can always be aligned by simply adding the spike trains of two machines and registering the time differences.
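The role of asynchrony can be checked numerically. The Python sketch below is our own illustration, again assuming a real Fourier basis with period 1 and a random A. It compares the rank of the constraint matrix of (11) when all channels share the same integration intervals (as with a common clock) against channel-specific random intervals. With shared intervals, the matrix is a Kronecker product of A and the matrix of interval integrals, so its rank is at most J·min(L, K) for L intervals per channel; with staggered intervals, the same number of constraints reaches the full rank JK.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J, M = 8, 2, 2
K = 2 * M + 1                 # number of basis functions
L = 2                         # intervals per channel: I * L = 16 >= J * K = 10

def basis_integral(t0, t1):
    """Integrals over [t0, t1] of the real Fourier basis 1, cos, sin (period 1)."""
    out = [t1 - t0]
    for m in range(1, M + 1):
        w = 2 * np.pi * m
        out += [(np.sin(w * t1) - np.sin(w * t0)) / w,
                (np.cos(w * t0) - np.cos(w * t1)) / w]
    return np.array(out)

A = rng.standard_normal((I, J))

def constraint_matrix(times_per_channel):
    """Stack the rank-one rows a_i kron f for every interval of every channel."""
    rows = []
    for i, times in enumerate(times_per_channel):
        for t0, t1 in zip(times[:-1], times[1:]):
            rows.append(np.kron(A[i], basis_integral(t0, t1)))
    return np.array(rows)

shared = np.sort(rng.uniform(0, 1, L + 1))
synchronous = constraint_matrix([shared] * I)
staggered = constraint_matrix([np.sort(rng.uniform(0, 1, L + 1)) for _ in range(I)])

print(np.linalg.matrix_rank(synchronous))   # J * min(L, K) = 4
print(np.linalg.matrix_rank(staggered))     # J * K = 10
```

To reach rank JK with shared intervals, each channel needs L ≥ K intervals, that is, IK constraints in total, which is the count given above for synchronous sampling.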
V. REPRESENTING 2D SIGNALS WITH SPIKES: VIDEOS WITH 1 SPATIAL DIMENSION
The results obtained in previous sections provide considerable improvements in sample requirements for multi-signal reconstruction when these signals have a low-dimensional structure.
However, one cannot help but wonder how restrictive the conditions we have set are and which existing situations actually satisfy the given restrictions.
To answer these questions, we study how bandlimited videos fit into our framework. We start with a simpler case and consider a two-dimensional (2D) signal y(d, t) that is bandlimited in both components:
$$y(d, t) = \sum_{k_0=-K_0}^{K_0} \sum_{k_1=-K_1}^{K_1} c_{k_0,k_1}\, e^{j2\pi k_0 t/T}\, e^{j2\pi k_1 d/D},$$
where the $c_{k_0,k_1}$ denote the 2D Fourier series coefficients of y(d, t). Note that we assume that y(d, t) has $(2K_0 + 1) \times (2K_1 + 1)$ of these coefficients, with periods T and D in the time and space components, respectively. Also note that, only when using the terms "2D" and "3D", "dimension" refers to the spatial and time dimensions, i.e., the components along which the signal varies. It does not refer to the complexity of the signal (as it relates to I, J and K), as it did before.
The results here will concern any such signal but, to make the treatment more intuitive, we will assume that we are dealing with a visual scene that has one continuous spatial component d and is varying along time t. To be clearer, taking a picture of this scene at time t provides the light intensity along one direction, which we assume to be the horizontal direction, without loss of generality. See Fig. 4 for illustration.
Now assume that we sample this time- and horizontally-varying scene using I TEMs. Each TEM$^{(i)}$ is associated with a location in "space", i.e., a position $d^{(i)}$ on the horizontal axis, such that the sampled signal $y^{(i)}(t)$ satisfies:
$$y^{(i)}(t) = y\left(d^{(i)}, t\right).$$
To make the connection to the theory in Section IV, we first need to define an auxiliary vector signal x(t) with $2K_1 + 1$ components, such that
$$x^{(k_1)}(t) = \sum_{k_0=-K_0}^{K_0} c_{k_0,k_1}\, e^{j2\pi k_0 t/T}, \quad k_1 = -K_1, \dots, K_1. \quad (13)$$
We now notice that we can rewrite
$$y^{(i)}(t) = \sum_{k_1=-K_1}^{K_1} e^{j2\pi k_1 d^{(i)}/D}\, x^{(k_1)}(t).$$
This directly brings back the structure we saw earlier: we have y(t) = Ax(t), where x(t) is as defined in (13) and $A_{i,k_1} = e^{j2\pi k_1 d^{(i)}/D}$, where i denotes the sampled channel.
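The factorization y(t) = Ax(t) for the 2D case can be verified numerically. In this sketch (ours, with arbitrary sizes, unit periods, random complex coefficients and random hypothetical TEM locations; a real-valued scene would additionally need conjugate-symmetric coefficients), we evaluate the double Fourier series directly at the TEM locations and compare it with A x(t).

```python
import numpy as np

rng = np.random.default_rng(3)
K0, K1 = 2, 3                          # temporal and spatial bandwidths (illustrative)
T, D = 1.0, 1.0                        # periods in time and space (assumed)
k0 = np.arange(-K0, K0 + 1)
k1 = np.arange(-K1, K1 + 1)
shape = (2 * K0 + 1, 2 * K1 + 1)
c = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)   # c[k0, k1], offset indices

def y(d, t):
    """Scene y(d, t) as a double Fourier series with coefficients c."""
    return np.exp(2j * np.pi * k0 * t / T) @ c @ np.exp(2j * np.pi * k1 * d / D)

def x(t):
    """Auxiliary signal: x^(k1)(t) = sum_k0 c[k0, k1] exp(j 2 pi k0 t / T)."""
    return np.exp(2j * np.pi * k0 * t / T) @ c

d_tem = rng.uniform(0.0, D, 10)                      # hypothetical TEM locations d^(i)
A = np.exp(2j * np.pi * np.outer(d_tem, k1) / D)     # A[i, k1] = exp(j 2 pi k1 d^(i) / D)

t0 = 0.37
direct = np.array([y(d, t0) for d in d_tem])         # y^(i)(t0) = y(d^(i), t0)
print(np.max(np.abs(direct - A @ x(t0))))            # round-off level
```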

VI. REPRESENTING 3D SIGNALS WITH SPIKES: VIDEOS WITH 2 SPATIAL DIMENSIONS
A. Theory
We can use a similar treatment to understand how to time encode and reconstruct 3D signals y(d 1 , d 2 , t). These signals can be interpreted as scenes that have 2 spatial components d 1 and d 2 (horizontal and vertical) and one time component t, as in videos.
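We again assume that such a signal $y(d_1, d_2, t)$ is bandlimited along all components:
$$y(d_1, d_2, t) = \sum_{k_0=-K_0}^{K_0} \sum_{k_1=-K_1}^{K_1} \sum_{k_2=-K_2}^{K_2} c_{k_0,k_1,k_2}\, e^{j2\pi k_0 t/T}\, e^{j2\pi k_1 d_1/D_1}\, e^{j2\pi k_2 d_2/D_2}, \quad (15)$$
where $D_1$ and $D_2$ are the spatial periods. Once again, we assume that $y(d_1, d_2, t)$ is sampled in space, where sample i is taken at spatial location $d^{(i)} = \left(d^{(i)}_1, d^{(i)}_2\right)$. We define $x^{(k_1,k_2)}(t)$ in a similar fashion to (13):
$$x^{(k_1,k_2)}(t) = \sum_{k_0=-K_0}^{K_0} c_{k_0,k_1,k_2}\, e^{j2\pi k_0 t/T}, \quad (16)$$
and we obtain the input signals to the TEMs:
$$y^{(i)}(t) = \sum_{k_1=-K_1}^{K_1} \sum_{k_2=-K_2}^{K_2} e^{j2\pi\left(k_1 d^{(i)}_1/D_1 + k_2 d^{(i)}_2/D_2\right)}\, x^{(k_1,k_2)}(t). \quad (17)$$
Once more, we are time encoding y(t) = Ax(t), with the entries of x(t) satisfying (16) and a matrix A which is known if we know the locations $d^{(i)}$ of the time encoding machines. If these locations are such that A has every $(2K_1 + 1)(2K_2 + 1)$ rows linearly independent, then all coefficients $c_{k_0,k_1,k_2}$ can be recovered using $\prod_{n=0}^{2}(2K_n + 1)$ appropriate measurements. This means that the continuous scene can also be recovered, so we can interpolate the scene between spike times in both space and time components.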
One example of a matrix A that satisfies the above constraint arises when one follows sufficient uniform gridding.
Definition 2. Sufficient Uniform Gridding defines the sampling locations $d^{(i)}$ to follow a uniform grid over a spatial period, with $2K_1 + 1$ positions in the $d_1$ direction and $2K_2 + 1$ positions in the $d_2$ direction. More formally, i ranges from 0 to $(2K_1 + 1)(2K_2 + 1) - 1$ and, writing $i = i_1 + (2K_1 + 1)\, i_2$ with $0 \leq i_1 \leq 2K_1$ and $0 \leq i_2 \leq 2K_2$,
$$d^{(i)} = \left(\frac{i_1 D_1}{2K_1 + 1}, \frac{i_2 D_2}{2K_2 + 1}\right),$$
where $D_1$ and $D_2$ denote the spatial periods along $d_1$ and $d_2$.

Lemma 1. The matrix A obtained from using sufficient uniform gridding, with entries as defined in (17), has every $(2K_1 + 1)(2K_2 + 1)$ rows linearly independent.

Proof: The proof relies on calculating the Gram matrix of A and noticing that it is diagonal and therefore full rank. Consequently, A also has full rank $(2K_1 + 1)(2K_2 + 1)$; since A is square under sufficient uniform gridding, all of its $(2K_1 + 1)(2K_2 + 1)$ rows are linearly independent. This is not the only case in which A satisfies our assumptions; more general configurations of the spatial sampling also seem to work, provided the samples cover the space.
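Lemma 1 is easy to check numerically. The Python sketch below is an illustration only: it assumes unit periods D1 = D2 = 1, takes the entries of A to be the complex exponentials of (17), and uses K1 = K2 = 4, i.e., the 9 × 9 grid used in the simulations. It builds A for a sufficient uniform grid and verifies that its Gram matrix is (2K1 + 1)(2K2 + 1) times the identity. The function names are hypothetical.

```python
import numpy as np

def sufficient_uniform_grid(K1, K2, D1=1.0, D2=1.0):
    """Locations d^(i) on a (2K1+1) x (2K2+1) uniform grid over one spatial period."""
    n1, n2 = 2 * K1 + 1, 2 * K2 + 1
    i1, i2 = np.meshgrid(np.arange(n1), np.arange(n2), indexing='ij')
    return np.stack([i1.ravel() * D1 / n1, i2.ravel() * D2 / n2], axis=1)

def mixing_matrix(locs, K1, K2, D1=1.0, D2=1.0):
    """A[i, (k1, k2)] = exp(j 2 pi (k1 d1^(i) / D1 + k2 d2^(i) / D2)), as in (17)."""
    k1, k2 = np.meshgrid(np.arange(-K1, K1 + 1), np.arange(-K2, K2 + 1), indexing='ij')
    phase = np.outer(locs[:, 0], k1.ravel()) / D1 + np.outer(locs[:, 1], k2.ravel()) / D2
    return np.exp(2j * np.pi * phase)

K1, K2 = 4, 4
A = mixing_matrix(sufficient_uniform_grid(K1, K2), K1, K2)
n = (2 * K1 + 1) * (2 * K2 + 1)
G = A.conj().T @ A                      # Gram matrix (row order of A does not matter)
print(np.allclose(G, n * np.eye(n)))    # True: diagonal, hence A is full rank
```

The check works because, on a uniform grid, each sum over grid points of $e^{j2\pi (k - k') i/(2K+1)}$ vanishes unless $k = k'$, exactly as for the discrete Fourier transform.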
As was the case in Section IV, admitting that TEMs are receiving input signals that have a low dimensional structure allows one to manipulate the number of time encoding machines while keeping the same total spiking rate, and without compromising on reconstructibility.
In other words, no individual TEM needs to be able to perfectly reconstruct its own input for the entire scene to be reconstructed. On the contrary, the spikes emitted by all machines are used collaboratively to reconstruct the scene, which has a parametric representation.
Therefore, if we have TEM-like receptors or sensors that have a limited spiking rate, spatial and temporal resolution can be regained by adding more sensors at new locations.

B. Simulations
We would like to illustrate the theoretical results obtained in the previous section on an actual video, to show the relationship between spatial and temporal sampling density. First, we choose a video recorded with a standard frame-based camera [22] and examine a patch of this video as shown in Fig. 5. This patch has $H \times W \times N_f$ samples, where H is the height of the patch in pixels, W is its width in pixels and $N_f$ is the number of frames.
We assume the underlying scene has a periodic bandlimited structure (which is also the assumption that allows for finite uniform sampling). This allows us to (1) express the scene as in (15) and (2) fix the corresponding number of Fourier series coefficients $(2K_0 + 1) \times (2K_1 + 1) \times (2K_2 + 1)$ to match the number of samples, for example $2K_2 + 1 = H$. The patch we consider is therefore a smooth function with a fixed number of parameters we are interested in, and which can be sampled anywhere in time and space.

Fig. 5: A time-varying scene is sampled at different spatial locations using TEMs. On the left, we see the scene with varying spatial and time components, taken from the Need for Speed dataset [22]. On the right, we see a time-varying patch which we will record using time encoding machines placed at the yellow dots. Originally, the video data we use was captured using a standard frame-based camera. We smoothly interpolate the video by assuming that the underlying structure is bandlimited and periodic, and we aim to estimate the corresponding Fourier series coefficients using the spikes emitted by the TEMs. In this case, we use a 9×9 grid of TEMs on an interpolated version of a 9×9×$N_f$ video patch, where $N_f$ is the number of frames used for the interpolation and time encoding. Therefore, the number of Fourier series coefficients to obtain is 9×9×$N_f$.

Given the smoothly varying patch, we place TEMs, for example, at the yellow dots in Fig. 5. In this case, we have a patch which is 9 pixels high and 9 pixels wide, and we place a 9×9 grid of time encoding machines, according to the definition of sufficient uniform gridding. We will show in our experiments that this is the minimum number of TEMs required to achieve perfect reconstruction.
We will also show how we can use more TEMs in the spatial components to obtain better resolution in the time component. This will not necessarily be the case the other way around: more sampling in time does not always provide improved spatial frequency resolution.
The interpolated patch from Fig. 5 is sampled using a fixed number of TEMs, and we vary the number of spikes per TEM to see how this affects the reconstruction error in Fig. 6.
We consider three scenarios, from top to bottom: a 9 × 15 uniformly spaced grid of TEMs, a 9 × 9 uniformly spaced grid of TEMs (as in sufficient uniform gridding), and a 9 × 5 uniformly spaced grid of TEMs.
We examine the evolution of the reconstruction error as the number of spikes per TEM increases. The point at which the spikes provide as many constraints as there are unknowns (dashed green lines in the figure) does not always match the point at which the reconstruction error significantly decreases. Rather, the results match the predictions of Theorem 1. Note that Theorem 1 cannot provide any guarantees on reconstruction when there are fewer TEMs than spatial components (i.e., with a 9 × 5 grid of TEMs). Indeed, perfect reconstruction is then never possible: the system is always underdetermined because there are too few sensors in the spatial domain.
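The rank deficiency of the 9 × 5 grid can be seen directly from A. In the Python sketch below (our illustration, assuming unit spatial periods, K1 = K2 = 4 as for the 9 × 9 patch, and uniform grids covering one period; `grid_matrix` is a hypothetical name), a uniform grid yields a Kronecker-structured A, so its rank is the product of the ranks along each direction. With only 5 positions along one direction, the 9 spatial frequencies alias onto 5, which caps the rank at 45 < 81 regardless of how many spikes each TEM emits.

```python
import numpy as np

def grid_matrix(n1, n2, K1=4, K2=4):
    """Mixing matrix A for an n1 x n2 uniform grid of TEMs over one period (D1 = D2 = 1)."""
    i1, i2 = np.meshgrid(np.arange(n1) / n1, np.arange(n2) / n2, indexing='ij')
    k1, k2 = np.meshgrid(np.arange(-K1, K1 + 1), np.arange(-K2, K2 + 1), indexing='ij')
    phase = np.outer(i1.ravel(), k1.ravel()) + np.outer(i2.ravel(), k2.ravel())
    return np.exp(2j * np.pi * phase)

for n1, n2 in [(9, 15), (9, 9), (9, 5)]:
    A = grid_matrix(n1, n2)
    print(f"{n1}x{n2} grid: {A.shape[0]} TEMs, rank(A) = {np.linalg.matrix_rank(A)} of {A.shape[1]}")
# 9x15 grid: 135 TEMs, rank(A) = 81 of 81
# 9x9 grid: 81 TEMs, rank(A) = 81 of 81
# 9x5 grid: 45 TEMs, rank(A) = 45 of 81
```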
On the other hand, we examine the scenario where we vary the number of TEMs for a fixed spiking rate per machine in Fig. 7. We set the spiking rate to different levels from top to bottom: 5, 9 and 15 spikes per TEM. Here, $2K_0 + 1 = 9$ spikes per machine is the sufficient number that allows each machine to perfectly resolve its own input. We notice that the reconstruction error decreases significantly once the number of TEMs is such that the condition of Theorem 1 is satisfied. As was the case for Fig. 6, the threshold at which this decrease occurs does not depend on the total number of constraints (in green) but rather on the number of linearly independent constraints.
We can draw a similar conclusion to that drawn for Fig. 6: increasing the number of spikes per TEM beyond a certain point is not helpful and it is generally more beneficial to have more TEMs or sensors that spike less frequently.

C. Coupling of Spatial and Temporal Resolution: Intuition and Consequences
In a nutshell, the theory developed and experiments conducted all indicate that, if one would like to increase resolution, whether spatial or temporal, it is better to increase spatial sampling density. Increasing spatial sampling density is always useful, unlike increasing the number of spikes per machine.
In fact, a TEM can only output as much information as it receives, so if a TEM perfectly characterizes its own input using 15 spikes, there is no point in generating 20, 30 or 40 spikes.
On the other hand, increased spatial sampling can aid both spatial and temporal resolution, because TEMs at different locations will almost surely spike at different times, since they have different inputs, different initial conditions [19], or both.

Fig. 6: We assume the video has 9 × 9 × 9 Fourier series coefficients that we wish to recover. The first row shows the evolution of the error as the number of spikes increases for 9 × 15 uniformly spaced TEMs. The second row shows the evolution of the error for 9 × 9 uniformly spaced TEMs placed according to sufficient uniform gridding. The third row shows the evolution of the error for 9 × 5 uniformly spaced TEMs. For each plot, the dashed green lines mark the number of spikes per machine beyond which we have more constraints than unknowns, not accounting for linear independence. The vertical orange line marks the threshold provided by Theorem 1, i.e., the number of spikes per TEM beyond which we have more linearly independent constraints than unknowns.

This particular characteristic is not shared by standard frame-based video recordings, where all pixels record information at the same time. When all information is recorded at the same time, any information obtained from oversampling in the spatial domain is redundant, rather than contributing to better resolution in the time domain as it does when sampling times are asynchronous.
In practice, this means two things: (1) the limited spiking rate of TEMs or event-based sensors can be compensated for by simply having more sensors in space, and (2) it is better to increase sampling capacity in the spatial domain when performing time encoding, because this can improve both spatial and temporal resolution.

Fig. 7: We assume the video has 9×9×9 Fourier series coefficients that we wish to recover. The first row shows the evolution of the error as the number of TEMs increases for 5 spikes emitted per machine. The second row shows the evolution of the error for 9 spikes emitted per TEM, which matches the sufficient rate beyond which each TEM can perfectly reconstruct its input. The third row shows the evolution of the error for 15 spikes per TEM. For each plot, the dashed green line marks the number of TEMs beyond which we have more constraints than unknowns, not accounting for linear independence. The vertical orange line instead marks the threshold provided by Theorem 1, i.e., the number of TEMs beyond which we have more linearly independent constraints than unknowns.

A. Problem Formulation and Algorithm
We revisit the setup presented in Section III. So far, we have assumed that we are given the time encodings of a collection of signals $y^{(i)}(t)$ with a low-dimensional structure, reachable by a known linear transformation $A \in \mathbb{R}^{I \times J}$, and that we are asked to reconstruct the inputs $y^{(i)}(t)$. While this is a useful model in itself, we are also interested in studying the case where the linear transform A is unknown.
Once again, we assume we have the time encodings of a collection of signals $y^{(i)}(t)$ which satisfy assumptions (A4), (A5) and (A6). Furthermore, we assume the functions $f_k(t)$ of the $y^{(i)}(t)$ satisfy either (A7.a) or (A7.b), and that the linear transformation A is unknown, as in (A8.b).
We wish to recover the signals $y^{(i)}(t)$, $i = 1, \dots, I$, from their time encoding, with as few samples as possible.
To do so, we aim to reconstruct the coefficients $c_{k_0,k_1,k_2}$ of the parametric representation of $y(t)$, as defined in (A4). These coefficients are placed in the matrix $C(y)$, with row $i$ containing the coefficients of signal $y^{(i)}(t)$. We note once more that $C(y)$ can be written
$$C(y) = A\,C(x),$$
where $A \in \mathbb{R}^{I\times J}$, $C(x) \in \mathbb{R}^{J\times K}$, $J < I$ and $J$ is known.
In words, $C(y)$ admits a low-rank matrix decomposition of known rank $J$.
The matrix $C(y)$ is probed using a sensing operator which we will call $\mathcal{S}$. The sensing operator performs the measurements in (11), i.e.
$$\big[\mathcal{S}(C(y))\big]_n = [C(y)]_i\,\mathbf{F}\big(t_\ell^{(i)}\big) = b_\ell^{(i)},$$
where we index a pair $(i, \ell)$ by $n$.
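To make the measurement model concrete, here is a minimal NumPy sketch of a sensing operator of this form and its adjoint. It assumes each measurement $n$ is stored as a machine index `rows[n]` and a vector `H[n]` standing in for $\mathbf{F}(t_\ell^{(i)})$; the function names `sense` and `sense_adjoint`, the sizes and the random data are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def sense(C, rows, H):
    """Hypothetical sensing operator S: b_n = [C]_{rows[n]} . H[n].

    C    : (I, K) coefficient matrix, playing the role of C(y)
    rows : (N,) integer array, machine index i of measurement n
    H    : (N, K) array, row n stands in for F(t_l^{(i)}) of pair n = (i, l)
    """
    return np.einsum("nk,nk->n", C[rows], H)

def sense_adjoint(r, rows, H, shape):
    """Adjoint S^*: accumulate r_n * e_{rows[n]} H[n]^T into an (I, K) matrix."""
    out = np.zeros(shape)
    np.add.at(out, rows, r[:, None] * H)
    return out

# Illustrative sizes: I signals, rank J, K coefficients per signal, N measurements.
rng = np.random.default_rng(0)
I, J, K, N = 20, 2, 25, 300
A = rng.standard_normal((I, J))
Cx = rng.standard_normal((J, K))
Cy = A @ Cx                            # C(y) = A C(x), rank J
rows = rng.integers(0, I, size=N)
H = rng.standard_normal((N, K))        # random stand-in for the integrated basis F(t)
b = sense(Cy, rows, H)

# Each measurement is an inner product with the rank-one matrix e_i h_n^T.
E0 = np.zeros((I, K))
E0[rows[0]] = H[0]
print(np.isclose(b[0], np.sum(E0 * Cy)))   # True
```

Writing each measurement as an inner product with $e_i h_n^T$ is what makes this a rank-one measurement problem; the adjoint is the piece a gradient-based solver needs.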
Given this measurement setup, we can adopt the Singular Value Projection approach to recover the matrix C(y) from few measurements [23].
The Singular Value Projection (SVP) algorithm alternately applies the low-rank constraint and the measurement constraint on the matrix of interest $C(y)$. In Algorithm 1 we let $X_t$ be the estimate at iteration $t$ of the target matrix to reconstruct (in our case this is $C(y)$) and $Y_t$ be a proxy matrix to perform the iterations.
Algorithm 1 (Singular Value Projection):
1: $X_0 = 0$ and $t = 0$
2: repeat
3: …

The SVP algorithm is based on projected gradient descent. Reconstruction guarantees for this algorithm were initially established in cases where the sensing operator satisfies the Restricted Isometry Property [23]-[25]. This property does not hold in our case, given that our measurement operators have rank one. The rank-one scenario has been treated in [26], where Gaussianity assumptions are made on the measurement operators. Again, these assumptions do not hold in our case, and we leave the theoretical analysis of convergence for future work. We do, however, illustrate the utility of our approach with simulations in the next section.
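Because the listing above is truncated, the following is only a sketch of the standard SVP iteration (a gradient step on the least-squares measurement misfit, followed by a projection onto matrices of rank at most $J$ through a truncated SVD); it is not necessarily identical to the paper's Algorithm 1. The dense operator matrix, the step size $1/\sigma_{\max}^2$, the iteration count and the random stand-ins for $\mathbf{F}(t)$ are all assumptions made for illustration, and `build_operator` and `svp` are hypothetical names.

```python
import numpy as np

def build_operator(rows, H, I):
    """Dense matrix of the sensing operator acting on row-major vec(X), X of shape (I, K).

    Row n equals vec(e_i h_n^T): H[n] is written into the block of machine i = rows[n].
    """
    N, K = H.shape
    M = np.zeros((N, I * K))
    for n, i in enumerate(rows):
        M[n, i * K:(i + 1) * K] = H[n]
    return M

def svp(M, b, shape, rank, n_iter=500):
    """SVP sketch: gradient step on 0.5 * ||M vec(X) - b||^2,
    then projection onto matrices of rank <= `rank` with a truncated SVD."""
    eta = 1.0 / np.linalg.norm(M, 2) ** 2        # step 1/L with L = sigma_max(M)^2
    X = np.zeros(shape)                           # X_0 = 0
    misfit = [0.5 * np.sum(b ** 2)]
    for _ in range(n_iter):
        grad = (M.T @ (M @ X.ravel() - b)).reshape(shape)
        Y = X - eta * grad                        # proxy matrix Y_{t+1}
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-`rank` approximation of Y
        misfit.append(0.5 * np.sum((M @ X.ravel() - b) ** 2))
    return X, np.array(misfit)

# Illustrative problem: 20 signals of rank 2 with 25 coefficients, 200 < 20 * 25 measurements.
rng = np.random.default_rng(1)
I, J, K, N = 20, 2, 25, 200
Cy_true = rng.standard_normal((I, J)) @ rng.standard_normal((J, K))
rows = np.repeat(np.arange(I), N // I)            # 10 measurements per machine
H = rng.standard_normal((N, K))                   # random stand-in for F(t_l^{(i)})
M = build_operator(rows, H, I)
b = M @ Cy_true.ravel()

X_hat, misfit = svp(M, b, (I, K), rank=J)
rel_err = np.linalg.norm(X_hat - Cy_true) / np.linalg.norm(Cy_true)
print(f"final misfit {misfit[-1]:.2e}, relative error {rel_err:.2e}")
```

With this step size each iteration cannot increase the misfit, because the truncated SVD is an exact projection and the current iterate is itself feasible. Whether the iterates reach the true matrix depends on the measurement operator, which is consistent with the convergence question being left open above.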

B. Simulations
We provide simulation results to evaluate the reconstruction performance in different regimes. We consider the scenario where we time encode and reconstruct twenty signals that are each composed of 25 sinc functions at known locations and that can be written as linear combinations of two such signals.
We evaluate how the reconstruction performance varies as the number of spikes of all machines increases uniformly. We do this in the following cases:
(S1) when assuming the signals have no underlying low-dimensional structure,
(S2) when assuming the signals have an underlying low-dimensional representation which we can reach through a known linear transform $A$, and
(S3) when the signals have an underlying low-dimensional representation with an unknown mapping $A$.
For each of these cases, we time encode and reconstruct all twenty signals and compute the normalized mean-squared error obtained for the first of the twenty signals, assuming a random mapping $A$ to the low-dimensional space. We then plot the median and quartiles of the mean-squared error on a log plot to compare performance. Results are included in Fig. 8.

Fig. 8: Median and quartiles of the reconstruction error over 25 random trials, when assuming the signals have no low-dimensional structure, when assuming they have a low-dimensional structure with a known linear mapping, and when they have a low-dimensional structure with an unknown linear mapping. The red dashed line marks the perfect reconstruction condition assuming the transform to the low-dimensional space is known, and the purple dashed line marks the perfect reconstruction condition assuming there is no lower-dimensional representation of the signals.
Note that, if we assume no underlying low-dimensional structure (S1), the signals should be reconstructible provided there are $I \times K$ linearly independent constraints. In this case, since the number of spikes of all machines increases uniformly, we will need $I \times K = 500$ spikes. As for scenario (S2), according to Theorem 1, the signals should be reconstructible provided there are $J \times K$ linearly independent constraints. As before, this means we would need $J \times K = 50$ spikes.
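For the simulation sizes above ($I = 20$ signals, $J = 2$ underlying signals, $K = 25$ coefficients per signal), the two thresholds work out as follows; the per-machine figures are simple averages, since the spikes are spread uniformly over the twenty machines.

$$
\text{(S1)}:\; I \times K = 20 \times 25 = 500 \ \text{spikes} \ \big(= 25 \text{ per machine}\big),
\qquad
\text{(S2)}:\; J \times K = 2 \times 25 = 50 \ \text{spikes} \ \big(= 2.5 \text{ per machine}\big).
$$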
We draw each of these conditions in Fig. 8 to check whether the performance is consistent with our expectations.
Notice that knowing a transformation $A$ to a low-dimensional space (S2) greatly improves reconstructibility compared to assuming that the input has no low-rank structure (S1): the error decays with far fewer spikes in the first case than in the second.
Assuming such a transformation exists but that we do not know it (S3) also offers a benefit. While the reconstruction algorithm can be quite unstable in regimes where the number of spikes is insufficient, it can yield a very good reconstruction for a higher number of spikes, where scenario (S1) fails entirely.

VIII. CONCLUSION
We have shown how time encoding can be used to encode and reconstruct multiple signals that have lower-dimensional representations.
The general case can be treated by reformulating our problem as a rank-one matrix measurement problem: we have shown that signals that have a known lower dimensional representation require fewer spikes for perfect reconstruction than if this lower dimensional representation did not exist.
Time encoding videos can then be rewritten as a special case of low-rank signal estimation. As a consequence, we show through theory and experiments that, if one wishes to increase spatial or temporal resolution, it is better to sample densely in space than to have TEMs emit more spikes. More practically, in the case of an event-based camera, it is better to have more pixels that fire asynchronously than to have pixels that fire more often.
Finally, we have also examined the case where the signals of interest are low rank but the transformation to the low-rank space is unknown. We applied low-rank factorization algorithms and found significant experimental improvements compared to the case where no low-rank structure is assumed.
In future work, we would like to further investigate low rank factorization within the time encoding setup and understand how it can be used to encode multi-dimensional data with a different structure to that presented in the paper.

APPENDIX A
KNOWN LOW DIMENSIONAL MAPPING: ELABORATION AND PROOF OF THEOREM 1
To prove Theorem 1, we will use results about rank-one matrix measurements [21]. The work in [21] assumes that one is attempting to reconstruct a matrix $C$ using measurements of the form
$$b_n = g_n^T C\, h_n,$$
and rewrites the measurements as $b_n = \operatorname{vec}(g_n h_n^T)^T \operatorname{vec}(C)$.
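The rewriting relies on the identity $\operatorname{vec}(g h^T)^T \operatorname{vec}(C) = g^T C h$, which holds for any vectorization applied consistently to both matrices. A quick NumPy check, assuming the usual column-stacking $\operatorname{vec}$ (NumPy's `order="F"`), with arbitrary sizes and random data:

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 5                                    # arbitrary illustrative sizes
C = rng.standard_normal((J, K))
g = rng.standard_normal(J)
h = rng.standard_normal(K)

bilinear = g @ C @ h                           # g^T C h
vec_gh = np.outer(g, h).ravel(order="F")       # column-stacking vec(g h^T)
vec_form = vec_gh @ C.ravel(order="F")         # vec(g h^T)^T vec(C)

print(np.isclose(bilinear, vec_form))          # True
print(np.allclose(vec_gh, np.kron(h, g)))      # True: vec(g h^T) = h ⊗ g
```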
Note that we adopted a change of notation with respect to [21] to avoid confusion.
The results of [21] then hold under two further assumptions.
(A9) $h_n$ can be parametrized by one variable $t \in \mathbb{R}$. More precisely, we assume the $k$-th entry of $h_n$ has the form $[h_n]_k = h_k(t_n)$, where $h_k : \mathcal{I} \to \mathbb{R}$, $k = 0, \dots, K-1$, are linearly independent functions from a linear space of functions $\mathcal{F}$, $\mathcal{I} \subseteq \mathbb{R}$ is an interval or the whole real line, and $t_n \in \mathcal{I}$, $n = 0, \dots, N-1$, are sampling times. Moreover, we assume that the sampling times $(t_0, \dots, t_{N-1})$ follow a continuous probability distribution on $\mathcal{I}^N$ and that, for every non-zero element $h \in \mathcal{F}$, the set of zeros of $h$ has Lebesgue measure ($\lambda$) equal to zero: $\lambda(\{t \mid h(t) = 0\}) = 0$.
(A10) The vectors $g_n$ are taken from a set $\mathcal{A}$, where every $J$ elements of $\mathcal{A}$ are linearly independent.
As a result, a uniqueness condition can be obtained.
Theorem 3 (Pacholska '20). Consider the set of $KJ$ vectors of the form $\operatorname{vec}(g_n h_n^T)$. It is a basis of $\mathbb{R}^{KJ}$ if and only if no more than $K$ vectors $g_n$ are equal.
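A small numerical illustration of Theorem 3, under hypothetical choices that satisfy (A9) and (A10): monomials $h_k(t) = t^k$, uniformly random sampling times, and a set $\mathcal{A}$ of $J$ generic random vectors. When each vector $g$ is used exactly $K$ times, the $KJ$ vectors $\operatorname{vec}(g_n h_n^T)$ have full rank; when one $g$ is used $K+1$ times, its $K+1$ vectors all lie in the $K$-dimensional subspace $\{\operatorname{vec}(g\,x^T) : x \in \mathbb{R}^K\}$, so the rank drops.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 4                                  # illustrative sizes
G = rng.standard_normal((J, J))              # the set A: J generic vectors g_j (rows), independent

def h(t):
    """Hypothetical (A9) parametrization: h_k(t) = t**k, k = 0, ..., K-1."""
    return t ** np.arange(K)

def measurement_vectors(counts):
    """Stack vec(g_n h_n^T) using counts[j] copies of g_j, each with its own random time t_n."""
    vecs = []
    for j, c in enumerate(counts):
        for t in rng.uniform(0.0, 1.0, size=c):
            vecs.append(np.outer(G[j], h(t)).ravel())
    return np.array(vecs)

balanced = measurement_vectors([K] * J)                # every g_j used exactly K times
unbalanced = measurement_vectors([K + 1, K - 1, K])    # g_0 used K + 1 times (written for J = 3)
print(np.linalg.matrix_rank(balanced), np.linalg.matrix_rank(unbalanced), K * J)
```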
We are able to rewrite our problem as a rank-one measurement problem by letting each index $n$ denote a pair $(\ell, i)$, letting $b_n$ denote the integral $\int_{t_0}^{t_\ell^{(i)}} y^{(i)}(u)\,du$, letting $g_n$ denote the rows $[A]_i$ of the matrix $A$, and letting $[h_n]_k$ denote the integral $\int_{t_0}^{t_\ell^{(i)}} f_k(u)\,du$. We will use the following lemmas to prove Theorem 1.
Proof: This closely follows the proof in [21].

Proof: This follows by construction of the $f_k$'s, which are linearly independent, so that their integrals are also linearly independent. The second part of the lemma follows from the properties of bandlimited functions.
We can now prove Theorem 1 and Corollary 2.
Proof of Theorem 1: We will assume that we operate under the assumptions set out in Theorem 1.
We start by showing that the different constraints imposed by the time encoding machines can be written as in (21). In fact, two consecutive spike times $t_\ell^{(i)}$ and $t_{\ell+1}^{(i)}$ from a machine TEM$^{(i)}$ impose a constraint on the integral of the concerned signal:

We define $Y^{(i)}(t) = \int_{t_0}^{t} y^{(i)}(u)\,du$ to be the integral of the signal $y^{(i)}(t)$ between $t_0$ and any later time $t$. Given (23), we can compute $Y^{(i)}(t_\ell^{(i)})$ for any spike time $t_\ell^{(i)}$:

We define this quantity to be $b_\ell^{(i)} := Y^{(i)}(t_\ell^{(i)})$ and denote $F_k(t) = \int_{t_0}^{t} f_k(u)\,du$. We then rewrite the right-hand side of (24) in terms of the parametrization of $y^{(i)}(u)$:
$$b_\ell^{(i)} = \int_{t_0}^{t_\ell^{(i)}} \sum_{k} c_{i,k}(y)\, f_k(u)\,du = \sum_{k} c_{i,k}(y)\, F_k\big(t_\ell^{(i)}\big) = [C(y)]_i\, \mathbf{F}\big(t_\ell^{(i)}\big),$$
where we defined $\mathbf{F}(t_\ell^{(i)})$ to be the vector of integrals $F_k(t_\ell^{(i)})$ for $k = 1, \dots, K$. We also defined $C(y)$ to be the matrix of coefficients $c_{i,k}(y)$ as defined in (A4), and $[C(y)]_i$ is the $i$-th row, containing the coefficients for signal $y^{(i)}(t)$.
We further rewrite $C(y) = A\,C(x)$ (from (A5)) and obtain $[C(y)]_i = [A]_i\, C(x)$. We thus obtain
$$b_\ell^{(i)} = [A]_i\, C(x)\, \mathbf{F}\big(t_\ell^{(i)}\big).$$
We can thus reindex the above equations: we let every $n$ correspond to a single pair $(\ell, i)$ and let $b_n = b_\ell^{(i)}$, $g_n = [A]_i$ and $h_n = \mathbf{F}(t_\ell^{(i)})$.
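The chain of rewritings can be checked numerically. The sketch below assumes a hypothetical cosine basis $f_k(u) = \cos(2\pi k u)$ with $t_0 = 0$ (chosen only so that $F_k$ has a closed form; it is not the basis used in the paper), a random $A$ and $C(x)$, and an arbitrary spike time. It compares a midpoint-rule value of $\int_{t_0}^{t_\ell^{(i)}} y^{(i)}(u)\,du$ with $[A]_i C(x)\mathbf{F}(t_\ell^{(i)})$ and with the vectorized form $\operatorname{vec}(g_n h_n^T)^T \operatorname{vec}(C(x))$.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 4, 2, 5                               # illustrative sizes
A = rng.standard_normal((I, J))
Cx = rng.standard_normal((J, K))
k = np.arange(K)

def f(u):
    """Hypothetical basis f_k(u) = cos(2*pi*k*u); returns an array of shape (len(u), K)."""
    return np.cos(2 * np.pi * np.outer(u, k))

def F(t):
    """F_k(t) = integral_0^t f_k(u) du: t for k = 0, sin(2*pi*k*t) / (2*pi*k) otherwise."""
    out = np.empty(K)
    out[0] = t
    out[1:] = np.sin(2 * np.pi * k[1:] * t) / (2 * np.pi * k[1:])
    return out

def integral_of_y(i, t, n=200_000):
    """Midpoint-rule value of Y^{(i)}(t), with y^{(i)}(u) = [A]_i C(x) f(u)."""
    du = t / n
    u = (np.arange(n) + 0.5) * du
    return np.sum(f(u) @ (A[i] @ Cx)) * du

i, t_l = 2, 0.37                                # hypothetical machine index and spike time
b = integral_of_y(i, t_l)                       # b_l^{(i)} from the signal itself
g, h_vec = A[i], F(t_l)                         # g_n = [A]_i, h_n = F(t_l^{(i)})
b_model = g @ Cx @ h_vec                        # [A]_i C(x) F(t_l^{(i)})
b_vec = np.outer(g, h_vec).ravel() @ Cx.ravel() # vec(g h^T)^T vec(C(x)), row-major vec
print(b, b_model, b_vec)
```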
We can now see that the vectors $h_n$ can be parametrized by one variable $t \in \mathbb{R}$ using a set of functions $h_k = F_k$ which satisfy assumption (A9), as stated by Lemma 3. Moreover, according to Lemma 2, the spike times follow a continuous probability distribution, as required in assumption (A9).
We can also see that the vectors $g_n$ just defined satisfy assumption (A10) by construction, since this is a condition in Theorem 1.
Then, we note that under the conditions of Theorem 1, one can extract $KJ$ constraints that satisfy the conditions of Theorem 3, thus ensuring perfect reconstruction of the matrix of parameters $C(x)$.
Proof of Corollary 2: Using the same notation as in the proof of Theorem 1, we note that the value $b_\ell^{(i)}$ is not known when the initial integrator values $\zeta_0^{(i)}$ are not known; instead, we know the value of

Continuing with the same logic as before, we let $\tilde{b}_n = \tilde{b}_\ell^{(i)}$, $g_n = [A]_i$ and $h_n = \mathbf{F}(t_\ell^{(i)})$.
We then obtain
$$\tilde{b}_n = \operatorname{vec}(g_n h_n^T)^T \operatorname{vec}(C(x)) + \zeta_0^{(i)},$$
where $i$ is the machine that generated measurement $n$. To keep things in matrix form, we first denote $G_n = \operatorname{vec}(g_n h_n^T)^T$ and then denote the row vector $\tilde{G}_n = [G_n, e_{i_n}]$, where $e_{i_n}$ is a length-$J$ row vector with a 1 in the column corresponding to the machine that generated measurement $n$ and zeros otherwise. We also denote the column vector $\tilde{C}(x) = [\operatorname{vec}(C(x))^T, \zeta_0^{(1)}, \zeta_0^{(2)}, \dots, \zeta_0^{(J)}]^T$. The measurement therefore satisfies
$$\tilde{b}_n = \tilde{G}_n \tilde{C}(x). \tag{29}$$
For the system to be invertible, we need $J \times (K+1)$ of the vectors $\tilde{G}_n$ to be linearly independent.
If we satisfy the condition set by Corollary 2 in (9), we also satisfy the condition set by Theorem 1 in (8). This means that there are $JK$ rows $G_n$ of $G$ that are linearly independent. Now consider the extension $\tilde{G}$: the corresponding $JK$ rows of $\tilde{G}$ are still linearly independent, since appending entries to linearly independent vectors cannot create a linear dependence.
According to the assumptions of the corollary, $\tilde{G}$ has a $(K+1)$-th set of $J$ rows, one coming from each TEM$^{(j)}$. Let $g_j$ be the $(K+1)$-th row coming from TEM$^{(j)}$. The right part of this vector is $e_j$, as is the case for all the other vectors coming from machine TEM$^{(j)}$. Therefore, this row is linearly independent of all rows coming from all other machines.
The only remaining question is whether $g_j$ is independent of the first $K$ rows coming from the same TEM$^{(j)}$. Because the right parts $e_j$ of these vectors are all the same, the $(K+1)$-th row can only be a linear combination of the other $K$ rows if the linear combination coefficients sum to 1. This case occurs with probability 0, which shows that the $(K+1)$-th constraint of every TEM is linearly independent of the other $K$ constraints of that machine, and that one can find $J \times (K+1)$ linearly independent rows in $\tilde{G}$, thus concluding our proof.
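The counting argument can be illustrated numerically. The sketch below takes as many machines as the rank ($I = J$, so that the $J$ unknown initial values match the count $J(K+1)$ used above), a random square $A$, and a hypothetical stand-in $\mathbf{F}(t) = (t, t^2/2, \dots, t^K/K)$, i.e., integrals from $t_0 = 0$ of the monomials $1, u, \dots, u^{K-1}$. With $K+1$ constraints per machine the rows $\tilde{G}_n$ reach rank $J(K+1)$; moving one constraint from one machine to another, with the total unchanged, leaves a machine with only $K$ constraints and the rank drops.

```python
import numpy as np

rng = np.random.default_rng(0)
J, K = 3, 4                                   # J machines (A square, I = J), K coefficients
A = rng.standard_normal((J, J))               # generic rows g_j = [A]_j

def F(t):
    """Hypothetical stand-in for F(t): integrals from t_0 = 0 of 1, u, ..., u^(K-1)."""
    p = np.arange(1, K + 1)
    return t ** p / p

def augmented_rows(counts):
    """Rows [vec(g_n h_n^T)^T, e_{i_n}] with counts[j] constraints from machine j."""
    out = []
    for j, c in enumerate(counts):
        e = np.zeros(J)
        e[j] = 1.0                            # picks the unknown initial value zeta_0^{(j)}
        for t in rng.uniform(0.1, 1.0, size=c):
            out.append(np.concatenate([np.outer(A[j], F(t)).ravel(), e]))
    return np.array(out)

G_ok = augmented_rows([K + 1] * J)            # K + 1 constraints from every machine
G_bad = augmented_rows([K + 2, K, K + 1])     # same total, but machine 1 has only K (J = 3)
print(np.linalg.matrix_rank(G_ok), np.linalg.matrix_rank(G_bad), J * (K + 1))
```

Because every $F_k$ here vanishes at $t_0 = 0$, no combination of the $F_k$ reproduces the constant 1 carried by the $e_j$ entries, which mirrors the "coefficients sum to 1" step in the proof.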