Maximum Likelihood Estimation of Time-Varying Sparsity Level for Dynamic Sparse Signals

In the field of Compressed Sensing, estimating the sparsity level is essential, as the sparsity level determines the minimum number of (i) measurements to be obtained from a sparse signal during acquisition and (ii) iterations to be performed by many of the greedy techniques for the perfect recovery of the sparse signal from the obtained measurements. In this paper, we propose a Maximum Likelihood (ML) estimator of the instantaneous sparsity level during acquisition and an ML sequence (MLS) estimator of sparsity levels during recovery. As the sparsity level varies in time due to the continuous birth of new supporting components and the death of existing ones, this paper models the sparsity level variation as a stochastic birth-death process. Real-world applications of the proposed estimators are presented for the compression of aircraft vibration signals and the estimation of wireless channels. Simulation results on real-world and model-generated data demonstrate the performance merits of the proposed estimators compared to other existing methods.


I. INTRODUCTION
In recent years, Compressed Sensing (CS) [1]-[5] has emerged as a powerful technique for acquiring high-dimensional sparse signals with fewer measurements than what is dictated by classical Shannon-Nyquist theory. An N−dimensional signal is said to be k−sparse when it is composed of k ≪ N active (or supporting, or significant) components belonging to a certain class of orthogonal basis such as the Fourier, cosine, or wavelet bases. The collection of indices of those active components forms the support S, and the cardinality of the set S is the sparsity level k.
In the CS acquisition process, m samples or measurements are obtained using an m × N−dimensional random or deterministic sensing basis, where m < N and m ≥ ck log(N/k) for some constant c [4], [5]. In the CS recovery process, the N−dimensional sparse signal is reconstructed from those m CS measurements using either convex relaxation based algorithms [6]-[8] or greedy techniques [9], [10] with the prior knowledge of the sensing basis involved during acquisition. The basic theoretical concepts underlying CS can be found in [11].
(The associate editor coordinating the review of this manuscript and approving it for publication was Qingchun Chen.)

A. NEED FOR SPARSITY LEVEL ESTIMATION
The objective of CS is to minimize the number of measurements during acquisition and the reconstruction error during recovery. As the sparsity level dictates the number of measurements during acquisition and the number of iterations for many of the greedy techniques during recovery, the efficiency of acquisition and recovery hinges on the knowledge of the sparsity level. Researchers have thus far assumed that the sparsity level is known beforehand [12]-[15] and is time-invariant [16]. In many practical scenarios, sparse signals are characterized by a time-varying sparsity level whose value is seldom known a priori [17]-[20]. The wireless channel with time-varying tap coefficients and distribution (the tap distribution is the support, whereas the number of tap coefficients is the sparsity level) is a typical example of a real-life sparse system. While applying CS to such systems, it is essential to know the time-varying sparsity level during acquisition and recovery.
A practical CS acquisition system obtains each measurement using a separate Integrate and Dump (I&D) circuit, which multiplies the input sparse signal with an independent sensing basis (typically either a Gaussian or a Bernoulli signal) and integrates it for a fixed duration. There are m such separate I&D circuits working in parallel to obtain m discrete measurements at every time instant. Minimizing the number of measurements amounts to minimizing the number of I&D circuits. Hence, instantaneous estimation of the sparsity level becomes essential to optimally utilize the hardware resources during acquisition.
(VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
The discrete CS measurements are generally stored either for off-line CS recovery, as in the case of Magnetic Resonance Imaging (MRI), or for near real-time CS recovery, as in the case of wireless channel estimation. As the mandate of the CS recovery hardware is to minimize the recovery error, the instantaneous sparsity levels estimated from the obtained measurements need to be refined.

B. RELATED LITERATURE
Recent approaches to sparsity level estimation [21]-[23] require additional measurements to estimate the sparsity level, and those measurements are not useful for the recovery of the sparse signal. Sparse Gaussian Sensing Matrix (GSM) based sparsity level estimation is given in [24], [25], where knowledge of the signal statistics, which is seldom available a priori, is required for constructing the sparse GSM. In the context of cognitive radio spectrum sensing, sparsity level estimation using Monte Carlo simulations is given in [26], and a computationally intensive eigenvalue method is presented in [27], assuming that the sparsity level is time-invariant. Estimation based on the trace of the covariance matrix of the measurements is given in [28], assuming that the signal statistics are known a priori.
There exists literature on the tracking of dynamic sparse signals [17], [20], which assumes that sparse signals are slowly varying and that the amplitude of each supporting component varies according to a predefined Gauss-Markov model exhibiting temporal correlation. These methods recover sparse signals by estimating the support and do not involve sparsity level estimation. Kalman filtering based CS recovery is proposed in [17], where the support is estimated using convex optimization based CS recovery. Dynamic Compressed Sensing Approximate Message Passing (DCS-AMP) based tracking is proposed in [20], where, in addition to amplitude modeling, the support is modeled using a discrete Markov process. These support estimation techniques require a longer execution time and are suitable only during recovery and not during CS acquisition.
Recently, Deterministic Binary Block Diagonal (DBBD) matrix based sensing [29] and Kronecker-based recovery [30] have been developed for acquiring sparse signals. Though the sensing matrix has deterministic binary entries and reduces the hardware complexity, the recovery performance is degraded under noisy settings due to the structure of the DBBD matrix. As each row of the DBBD matrix has no ones overlapping with those of any other row and accumulates the energy of the neighboring significant components, support estimation of the underlying sparse signals becomes difficult.
Considering the above, there remain open issues in sparsity level estimation, such as (i) doing away with or minimizing the additional measurements [21]-[23], (ii) overcoming the assumption that knowledge of the sparse signal statistics [24], [25], [28] is available a priori, and (iii) adapting to time-varying signal statistics. In addition, the practical implementation aspects of sparsity level estimators in CS acquisition and recovery systems have not been addressed in the literature thus far, to the best of our knowledge. We present practically implementable sparsity level estimators that track the time-varying sparsity level using the same set of measurements acquired for use during recovery. We estimate the signal statistics on the fly from the obtained measurements and adapt our CS acquisition system according to the instantaneous signal statistics and sparsity level. The proposed CS recovery system performs better by exploiting the temporal characteristics of the sparsity level variation.

C. KEY CONTRIBUTIONS
As the sparse signal varies in time, there are inclusions of newer basis functions and deletions of existing ones. These inclusions/deletions of basis functions, i.e., births/deaths of the supporting components, result in variation of the sparsity level, which lends itself to modeling as a birth-death process. Hence the time-varying sparsity level is modeled as a discrete-valued Markov birth-death process and is used for sparsity level estimation.
We use a composite sensing basis to acquire sparse signals, where the first few measurements are obtained using a deterministic sparse Binary basis and the rest using a random Gaussian basis. All the acquired measurements are used for recovering the sparse signal. Here the Binary basis is used for estimating the statistics of the sparse signal, and the number of measurements obtained using the Binary basis is limited owing to its weaker isometry bounds [1]. The knowledge of statistics thus obtained is used to estimate the instantaneous sparsity level with the help of the measurements obtained using the random Gaussian basis. Thus no additional measurements are required for estimating the sparsity level.
During CS recovery, real-time recovery of sparse signals is not the mandate; minimizing the recovery error is. Hence we propose a sequence estimation method that exploits the birth-death model mentioned above to output the most probable sequence of sparsity level estimates for use during near real-time recovery. The following is a summary of our contributions:
1) Modeling of the time-varying sparsity level using a generalized discrete Markov birth-death process and estimating the model parameters.
2) Statistical characterization of the sparsity level variation using the concept of survival time.
3) Obtaining the Maximum Likelihood (ML) estimator of the instantaneous sparsity level, estimating the signal and noise statistics required for the ML estimator, and deriving the performance bounds for the ML estimator.
4) Refining the sequence of ML estimates by exploiting its Markovian property to obtain the most probable sequence of sparsity level estimates.
5) Implementation aspects of the sparsity level estimator in CS acquisition and recovery systems.
The proposed method is shown to have improved performance compared to other existing methods in terms of the standard measures through both analysis and simulations. In essence, the paper obtains the ML estimate of time-varying sparsity level during acquisition and refines it using the Viterbi algorithm during recovery. Since the proposed technique makes no assumptions regarding the signal or noise statistics, it morphs into a joint estimation problem, as detailed in the subsequent sections.
The rest of the paper is structured as follows. Section II presents the CS acquisition model and introduces the Markov birth-death framework for modeling the time-varying sparsity level. Section III derives the sparsity level estimator along with the estimators for signal and noise statistics. Further refinement of the ML estimates using sequence estimation procedure for use during CS recovery is presented in Section IV. The practical implementation aspects of the proposed CS acquisition system aiding the ML estimation and Viterbi algorithm for MLS estimation in the recovery system are given in Section V. Simulation results comparing the recovery performance of the proposed ML-MLS algorithm against the existing methods are presented in Section VI. Real-world applications of the proposed estimators on compression of aircraft vibration signal and estimation of a wireless channel are also presented in Section VI.
The operators commonly used in this paper are ‖·‖_p, Pr[·], E{·}, and VAR{·}, referring to the ℓ_p norm, probability, expectation, and variance, respectively. The operators ⌊·⌉, ⌊·⌋, and ⌈·⌉ round the argument to the nearest integer, the greatest preceding integer, and the least succeeding integer, respectively. The notations R and Z indicate the domain of reals and integers, respectively. The notation O represents the order of complexity. The notation x ∼ N(µ, σ²) indicates that x is a Gaussian random variable having mean µ and variance σ².

II. MODELING OF CS ACQUISITION SYSTEM AND TIME-VARYING SPARSITY LEVEL
Consider a continuous-time dynamic sparse signal x(t) which has an N−dimensional sparse vector representation s(t) = {s_j(t)}_{j=1}^N = {s_1(t), s_2(t), . . . , s_N(t)} using an orthogonal sparsifying basis set Ψ = {ψ_j(t)}_{j=1}^N at time t, as given below:
x(t) = Σ_{j=1}^N s_j(t) ψ_j(t),        (1)
where s_j(t) is the component or coefficient of the j-th basis ψ_j(t). The sparse representation vector s(t) has very few active components with time-varying support and amplitude and is given as
s_j(t) = κ_j(t) ś_j(t),        (2)
where κ(t) = {κ_j(t)}_{j=1}^N is a Bernoulli vector, so that s_j(t) = ś_j(t) when κ_j(t) = 1 and s_j(t) = 0 when κ_j(t) = 0.
A. CONTINUOUS CS ACQUISITION MODEL
Each CS measurement is obtained by correlating the sparse signal with a sensing waveform over a period of T seconds:
y_i(nT) = ∫_{(n−1)T}^{nT} φ_i(t) x(t) dt + ϑ_i(nT),  i = 1, 2, . . . , m,        (3)
where
1) φ_i(t) is either a Gaussian signal or distributed impulses for a duration of T seconds, and it repeats after T seconds.
2) ϑ_i(nT) ∼ N(0, σ²_ϑ) is the i-th component of the measurement noise vector ϑ(t).
Without loss of generality, the CS acquisition model is represented in the discrete domain from now on for better understanding.

B. DISCRETE CS ACQUISITION MODEL
The discrete versions of Equations (1), (2), and (3) are given as
x(n) = Ψ s(n),        (4)
s_j(n) = κ_j(n) ś_j(n),        (5)
y(n) = Φ x(n) + ϑ(n),        (6)
where the continuous time t is substituted with the discrete time step n. The discrete sparse signal x(n) = {x_j(n)}_{j=1}^N has N samples in every time step of duration T seconds and has a sparse representation s(n) = {s_j(n)}_{j=1}^N : s_j(n) ∼ N(µ_{s_j}, σ²_{s_j}) when projected onto an N × N orthogonal signal basis matrix Ψ. The discrete version of the sensing basis is the m × N−dimensional matrix Φ, with φ_ij ∼ N(0, 1/m) when it is Gaussian distributed and φ_ij ∈ {0, 1} when it is binomial distributed. Substituting Equation (4) in Equation (6), the discrete CS acquisition model becomes
y(n) = Φ Ψ s(n) + ϑ(n).        (7)
We devise Φ as a composite matrix such that
Φ = [Φ_BSM ; Φ_GSM] Ψ^{−1},        (8)
where Φ_BSM is the deterministic sparse Binary Sensing sub-Matrix (BSM) and Φ_GSM is the random Gaussian Sensing sub-Matrix (GSM). Their multiplication with the inverse of the sparsifying basis, i.e., Ψ^{−1}, results in direct acquisition of the sparse representation s(n). Thus Equation (7) becomes
y(n) = [Φ_BSM ; Φ_GSM] s(n) + ϑ(n).        (9)
The reason for devising Φ as a composite matrix is to maximize the sparsity level estimation performance using the BSM and the recovery performance using the GSM, as explained in Section III.
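The composite acquisition in Equation (9) can be sketched numerically. The following is a minimal NumPy sketch; the dimensions, noise level, and BSM row weight k_b are illustrative values chosen here, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m_b, m_g = 1024, 64, 256      # dimension, BSM and GSM measurement counts (illustrative)
k, k_b = 50, 4                   # sparsity level and ones per BSM row (illustrative)

# Sparse representation s with k active Gaussian components.
s = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
s[support] = rng.normal(1.0, 0.5, size=k)

# Composite sensing: deterministic sparse binary rows stacked over random Gaussian rows.
bsm = np.hstack([np.eye(N // k_b)] * k_b)[:m_b]              # k_b ones per row
gsm = rng.normal(0.0, 1.0 / np.sqrt(m_g), size=(m_g, N))     # phi_ij ~ N(0, 1/m)
phi_c = np.vstack([bsm, gsm])

# Measurements y(n) = [BSM; GSM] s(n) + noise, as in Equation (9).
y = phi_c @ s + rng.normal(0.0, 0.05, size=m_b + m_g)
print(y.shape)
```

The BSM rows are kept first so that the signal statistics can be estimated before the Gaussian measurements are processed.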

C. DISCRETE MARKOV BIRTH-DEATH MODELING OF SPARSITY LEVEL
Any real-world sparse signal has time-varying support due to inclusions (births) and deletions (deaths) of the signal basis functions, as shown in Fig. 1. In this figure, the sparse nature of the support spectrum of a vibration signal of an aircraft [31] using the Discrete Cosine Transform (DCT) basis is shown. The figure captures the time evolution of the active cosine harmonics using different colored tracks. Here it may be noted that there are (i) clusters of active basis functions centered about certain resonant frequencies, as shown in the zoomed portion of the plot, (ii) large spectral gaps between these clusters which reveal sparsity in the DCT domain, and (iii) inclusion and deletion of the DCT basis functions with time, as revealed by the discontinuous lines. Thus the support and its cardinality, i.e., the sparsity level, vary with time and can be modeled as a birth-death process.
The variation in the support is modeled by appropriately changing the entries of the Bernoulli vector κ(n) given in Equation (2). The value of the difference (κ_j(n) − κ_j(n − 1)) represents the birth, death, or survival of the j-th component depending on whether it is +1, −1, or 0, respectively. Due to this difference, the sparsity level k at the n-th time step is a generalized discrete Markov birth-death process and is given as
k(n) = Σ_{j=1}^N κ_j(n) = k(n − 1) + Σ_{j=1}^N (κ_j(n) − κ_j(n − 1)).        (10)
The probabilities Pr[(k(n) − k(n − 1)) = d] = p_d for d > 0, d < 0, and d = 0 are associated with the birth, death, and survival of the supporting components, respectively. In particular, the probability that the sparsity level remains unchanged between two time steps, p_(d=0) = p_0, determines the degree of sparsity level variation.
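As a hedged illustration, the birth-death model above can be simulated by drawing the increment d from the probabilities p_d at every time step; the increment set, probabilities, and clipping rule below are our own example choices:

```python
import numpy as np

def simulate_sparsity_levels(k0, p_d, n_steps, k_min=1, seed=0):
    """Simulate k(n) as a discrete Markov birth-death process.

    p_d maps an increment d (birth d > 0, death d < 0, survival d = 0)
    to its probability; k(n) is clipped below at k_min.
    """
    rng = np.random.default_rng(seed)
    deltas = np.array(sorted(p_d))
    probs = np.array([p_d[d] for d in deltas])
    k = [k0]
    for _ in range(n_steps - 1):
        d = int(rng.choice(deltas, p=probs))
        k.append(max(k_min, k[-1] + d))
    return np.array(k)

# Slow variation: survival probability p0 = 0.9 with unit births/deaths.
k_seq = simulate_sparsity_levels(100, {-1: 0.05, 0: 0.9, +1: 0.05}, 500)
print(k_seq.min(), k_seq.max())
```

About 90% of the simulated increments are zero, matching the chosen survival probability p_0.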

D. STATISTICAL CHARACTERIZATION OF THE SPARSITY LEVEL VARIATION
The sparsity level variation can be slow, moderate, fast, or rapid. Sparse signals exhibit slow and gradual variations in the sparsity level under steady-state conditions and rapid variations during transients and bursty behavior. The idea of survival time L is introduced here to statistically quantify the degree and duration of invariance of the sparsity level in time, akin to similar concepts used in modeling time-varying wireless channels in digital communications. As the probability that the sparsity level remains unchanged for l time steps is p_0^l (1 − p_0), the average number of time steps for which the sparsity level remains unchanged is p_0/(1 − p_0). Thus the survival time L is
L = p_0 / (1 − p_0).        (11)

Definition 1: Survival time L is the statistical average number of time steps for which the sparsity level is practically time-invariant and is given by Equation (11).
Intuitively, a larger survival time indicates slow variation and a smaller survival time indicates rapid variation of the sparsity level. To exemplify, here we classify the sparsity level variation as (i) slow when 0.9 ≤ p_0 < 1 (L ≥ 9), (ii) moderate when 0.8 ≤ p_0 < 0.9 (4 ≤ L < 9), (iii) fast when 0.6 ≤ p_0 < 0.8 (1.5 ≤ L < 4), and (iv) rapid when p_0 < 0.6 (L < 1.5).
An example simulation of the time-varying sparsity level k(n) for four different values of p_0 is shown in Fig. 2. It is observed that, as p_0 decreases, the survival time L becomes smaller and the rate of change of the sparsity level increases; in particular, case (iv) with p_0 = 0.5 (L = 1) exhibits rapid variation, while the larger values of p_0 in cases (i)-(iii) yield slow, moderate, and fast variations, respectively.
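The survival-time relation L = p_0/(1 − p_0) of Equation (11) is a one-liner; the p_0 values below are the class boundaries discussed in the text:

```python
def survival_time(p0):
    """Average number of time steps the sparsity level stays unchanged."""
    return p0 / (1.0 - p0)

for p0 in (0.9, 0.8, 0.6, 0.5):
    print(p0, survival_time(p0))
```

The printed values (9, 4, 1.5, and 1) match the slow, moderate, fast, and rapid regimes defined above.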

III. MAXIMUM LIKELIHOOD (ML) SPARSITY LEVEL ESTIMATION
We derive an ML estimator of the instantaneous sparsity level when the measurements are obtained using the composite matrix. For the sake of brevity, we ignore the time index n in this section. Each GSM measurement y_i is given by
y_i = Σ_{j∈S} φ_ij s_j + ϑ_i.        (12)
Since each φ_ij ∼ N(0, 1/m) is Gaussian and is independent of s_j, ∀j, the mean and variance of y_i are given as
E{y_i} = 0,   VAR{y_i} = (k/m) σ²_s + σ²_ϑ,
where σ²_s = (1/k) Σ_{j∈S} (µ²_{s_j} + σ²_{s_j}) is the mean energy of the supporting components. Since each y_i is a sum of k random components, the probability density function (pdf) of y_i follows a Normal distribution according to the central limit theorem, i.e., y_i ∼ N(0, kσ²_s/m + σ²_ϑ). Then the joint pdf of the m independent random variables parameterized by k is given as
p_Y(y; k) = Π_{i=1}^m (2π(kσ²_s/m + σ²_ϑ))^{−1/2} exp(−y_i² / (2(kσ²_s/m + σ²_ϑ))).
Using the above statistical properties of the sparse representation and measurement vector, the following theorem defines the ML estimator for the sparsity level.
Theorem 1: Let y be the vector of m GSM measurements with y_i ∼ N(0, kσ²_s/m + σ²_ϑ). Then a real-valued ML estimator K_r of the sparsity level k, which maximizes the joint pdf of the measurements parameterized by k and provides the estimate k̂_r, is given by
K_r = (‖y‖²₂ − m σ²_ϑ) / σ²_s.        (13)
Since the sparsity level k is an integer, an integer-valued ML estimate k̂_ML of the sparsity level k can be obtained from K_r by choosing, among the two integer-valued candidates ⌊k̂_r⌋ and ⌈k̂_r⌉, the one which maximizes the pdf p_Y(y; k), i.e.,
k̂_ML = arg max_{k ∈ {⌊k̂_r⌋, ⌈k̂_r⌉}} p_Y(y; k).        (14)
Equation (13) reveals that the ML estimator requires the knowledge of σ²_s and σ²_ϑ. The estimate of σ²_ϑ, i.e., σ²_v, is obtained from the measurements as given in [32]. The mean energy σ²_s is not available a priori and has to be computed from the measurements. However, it cannot be calculated from the statistics of the GSM based measurements alone, as the variance of each measurement is the same and is a product of both k and σ²_s, neither of which is available a priori. This difficulty in estimating σ²_s can be overcome by obtaining a few measurements initially using a sparse BSM, as explained below.
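A minimal sketch of the estimator in Equations (13)-(14), assuming GSM measurements with y_i ∼ N(0, kσ_s²/m + σ_ϑ²) as derived above; the variable names and demo parameters are ours:

```python
import numpy as np

def ml_sparsity_estimate(y, sigma_s2, sigma_v2):
    """Integer ML estimate of k from m GSM measurements (Equations (13)-(14))."""
    m = len(y)
    energy = float(np.sum(y**2))
    k_r = max((energy - m * sigma_v2) / sigma_s2, 0.0)   # real-valued estimate (13)

    def loglik(k):
        v = k * sigma_s2 / m + sigma_v2                  # per-measurement variance
        return -0.5 * m * np.log(2 * np.pi * v) - energy / (2 * v)

    lo, hi = int(np.floor(k_r)), int(np.ceil(k_r))       # integer refinement (14)
    return lo if loglik(lo) >= loglik(hi) else hi

rng = np.random.default_rng(1)
m, k, sigma_s2, sigma_v2 = 200, 100, 1.0, 0.1
y = rng.normal(0.0, np.sqrt(k * sigma_s2 / m + sigma_v2), size=m)
print(ml_sparsity_estimate(y, sigma_s2, sigma_v2))
```

A single draw is noisy; averaging over trials concentrates the estimate around the true k, consistent with the variance analysis later in this section.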

A. BSM BASED CS ACQUISITION AND ESTIMATION OF STATISTICS OF SUPPORTING COMPONENTS
The sparse BSM Φ_BSM has a fixed number k_b ≪ k of ones in each row. Thus the probability that any randomly chosen element φ_ij = 1 in any row of Φ_BSM is k_b/N. Using Equation (9), each sparse BSM measurement y_i is a random sum of supporting components corrupted by measurement noise, i.e.,
y_i = Σ_{j ∈ {S_i ∩ S}} s_j + ϑ_i,        (16)
where S_i is the support set of the i-th row of Φ_BSM and {S_i ∩ S} contains the set of indices common to both S_i and S. Since each s_j ∼ N(µ_{s_j}, σ²_{s_j}), every y_i is a random sum of Gaussians. Consider the special case when none of the ones in the i-th row of Φ_BSM multiplies with a supporting component in s, resulting in the measurement having a contribution from the noise alone, i.e., y_i = ϑ_i. There exists a finite probability of obtaining such measurements, which can be used to estimate the statistics of the supporting components.
Since there are k_b ones in each row of the BSM at predetermined positions and k supporting components randomly distributed over N possible locations, the random sum in Equation (16) consists of an average of τ_s = k_b k/N supporting components. The mean of the random sum is µ_y = E{y_i} = τ_s µ_s, where µ_s = (1/k) Σ_{j∈S} µ_{s_j} is the mean amplitude of a supporting component. Similarly, the variance of the random sum is σ²_y = VAR{y_i} = τ_s σ²_s + σ²_ϑ. As every y_i is a random sum of at most k_b supporting components, its probability density function (pdf) p_Y(y_i) can be approximated as the finite Gaussian mixture
p_Y(y_i) ≈ Σ_{α=0}^{k_b} P_α N(y_i; α µ_s, α σ²_s + σ²_ϑ),        (17)
where P_α is the probability that α ones of a row multiply with supporting components. Thus the estimates of the statistics µ_s and σ²_s of the supporting components can be obtained from the BSM measurements, as given below.
µ̂_s = µ_y / τ_s,   σ̂²_s = (σ²_y − σ²_v) / τ_s,        (18)
where µ_y and σ²_y are computed from the m_b BSM measurements as
µ_y = (1/m_b) Σ_{i=1}^{m_b} y_i,   σ²_y = (1/m_b) Σ_{i=1}^{m_b} (y_i − µ_y)².
Note that the estimate of σ²_s depends on τ_s = k_b k/N, which in turn depends on the value of the sparsity level k that is to be estimated. However, τ_s can be estimated using the following method.
Suppose P_α is the probability that α ones in a row of the BSM multiply with supporting components of s. Then P_α follows a Poisson distribution for very small k_b, and it is given as
P_α = e^{−τ_s} τ_s^α / α!,        (19)
so that τ_s = −ln(P_0). The probability P_0 can be estimated by finding the proportion of measurements y_i such that |y_i| ≤ 3σ_v, covering the 99% confidence interval. However, there exists a finite probability Q that, for some measurements, the random sum of two or more supporting components resembles the measurement noise. Thus the estimate of P_0 is
P̂_0 = (1/m_b) |{y_i : |y_i| ≤ 3σ_v}| − Q,        (20)
where m_b is the total number of BSM measurements. Therefore, on computing Q, the required probability P̂_0 is estimated, τ̂_s = −ln(P̂_0) is computed, and substituting it in Equation (18) gives the estimates of the statistics as
µ̂_s = µ_y / τ̂_s,        (21)
σ̂²_s = (σ²_y − σ²_v) / τ̂_s.        (22)
It may be noted that Q depends on p_Y(y_i), which is a function of µ_s and σ²_s. Thus Equations (19)-(22) are interdependent. Therefore, they need to be updated recursively, starting with the initial value Q = 0: the updated values of τ̂_s, µ̂_s, and σ̂²_s are used to compute Q, and the iteration proceeds through Equations (19)-(22) till τ̂_s converges, as given in Algorithm 1.
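A simplified sketch of this statistics estimation: it takes P̂_0 as the fraction of BSM measurements indistinguishable from noise and omits the recursive Q correction of Algorithm 1 (i.e., it corresponds to the first iteration with Q = 0). The demo data are synthetic, and σ_s² here denotes the mean energy E{s_j²} as defined in this section:

```python
import numpy as np

def estimate_bsm_statistics(y_b, sigma_v):
    """One-shot estimates of tau_s, mu_s, sigma_s^2 (mean energy) from BSM
    measurements: Equations (19)-(22) with the Q correction omitted (Q = 0)."""
    m_b = len(y_b)
    p0_hat = np.count_nonzero(np.abs(y_b) <= 3 * sigma_v) / m_b
    p0_hat = min(max(p0_hat, 1e-6), 1 - 1e-6)     # guard the logarithm
    tau_s = -np.log(p0_hat)                       # from P0 = exp(-tau_s)
    mu_y, var_y = y_b.mean(), y_b.var()
    mu_s = mu_y / tau_s                           # Equation (21)
    sigma_s2 = max(var_y - sigma_v**2, 0.0) / tau_s   # Equation (22)
    return tau_s, mu_s, sigma_s2

# Synthetic BSM measurements: Poisson(tau_s) random sums of N(1, 0.3^2) components.
rng = np.random.default_rng(2)
m_b, tau_true, sigma_v = 2000, 0.5, 0.05
counts = rng.poisson(tau_true, size=m_b)
y_b = np.array([rng.normal(1.0, 0.3, c).sum() for c in counts])
y_b += rng.normal(0.0, sigma_v, size=m_b)
tau_hat, mu_hat, s2_hat = estimate_bsm_statistics(y_b, sigma_v)
print(round(tau_hat, 2), round(mu_hat, 2))
```

For this synthetic model the recovered mean energy is close to µ_s² + VAR{s_j} = 1.09, as the compound-Poisson variance relation in the text predicts.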

Begin iteration:
4.1 Compute the probability Q as a function of τ̂_s, µ̂_s, and σ̂²_s, where µ_{P̂_0} and σ²_{P̂_0} are the mean and variance of P̂_0, respectively.
The optimum value of k_b that minimizes VAR{τ̂_s} satisfies ∂VAR{τ̂_s}/∂k_b = 0, which, upon simplification using the fact that k_b/N ≪ 1, yields an optimum k_b that depends on k (Equation (26)). From Equation (26), it is observed that the optimum value of k_b is different for different k. As the sparsity level k(n) for the current time step is not known, k_b(n) is computed using the previous estimate k̂_ML(n − 1).
The size m_b is determined such that the BSM spans all the supporting components, which are distributed across the N positions, for the purpose of estimating the statistic σ²_s. Hence the deterministic BSM matrix is designed by stacking k_b Identity matrices of size (N/k_b) × (N/k_b) horizontally. Thus each row of the deterministic BSM has k_b ones, and there are N/k_b rows, which together span all the components of the sparse signal.
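The identity-stacking construction can be written directly; a small sketch with illustrative dimensions:

```python
import numpy as np

def build_bsm(N, k_b):
    """Deterministic BSM: k_b identity blocks of size (N/k_b) stacked horizontally.

    Each of the N/k_b rows has exactly k_b ones, no two rows place a one in
    the same column, and together the rows span all N components. Needs k_b | N.
    """
    assert N % k_b == 0, "k_b must divide N"
    return np.hstack([np.eye(N // k_b, dtype=int)] * k_b)

bsm = build_bsm(1024, 8)
print(bsm.shape)   # (128, 1024)
```

Because every column carries exactly one 1, the N/k_b rows cover all N signal positions, which is what the statistics estimation requires.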
As the structure of the BSM (i.e., k_b) changes according to k(n), the BSM would have to be transmitted to the recovery process in every time step along with the acquired measurements. However, the deterministic construction of the BSM avoids transmitting such overhead, as the recovery process also performs instantaneous sparsity level estimation from the obtained measurements, just as done during acquisition.
C. PROPERTIES OF THE k̂_ML ESTIMATOR
1) It is an unbiased estimator.
2) The variance of the ML estimator k̂_ML is
VAR{k̂_ML} = (2/m_ML) (k + m_ML σ²_ϑ/σ²_s)².        (27)
3) The Cramer-Rao Lower Bound (CRLB) for the integer parameter k is given by
VAR{k̂} ≥ (2/m_ML) (k + m_ML σ²_ϑ/σ²_s)².        (28)
It is observed from Equation (28) that, in the absence of measurement noise, the variance of the ML estimator approaches the lower bound 2k²/m_ML, which determines the spread in the ML estimate. The spread in the ML estimate is reduced in the acquisition process when the measurement size m_ML is increased; however, as the lower bound is directly proportional to k², the measurement size m_ML required to reduce the spread increases quadratically with the sparsity level k. For efficient CS acquisition, the measurement size cannot be increased beyond the minimum number required for perfect CS recovery. Hence, with a limited number of measurements acquired for CS recovery, the ML estimator provides an ML estimate at every time step.
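The noise-free bound 2k²/m_ML can be checked by Monte Carlo simulation of the real-valued estimator of Equation (13); the trial count and parameters are our own choices:

```python
import numpy as np

rng = np.random.default_rng(3)
trials, m, k, sigma_s2 = 20000, 400, 100, 1.0

# Noise-free GSM measurements: y_i ~ N(0, k*sigma_s2/m).
y = rng.normal(0.0, np.sqrt(k * sigma_s2 / m), size=(trials, m))
k_hat = np.sum(y**2, axis=1) / sigma_s2      # Equation (13) with sigma_v^2 = 0

print(float(np.var(k_hat)), 2 * k**2 / m)    # empirical variance vs. 2*k^2/m = 50
```

The empirical variance lands close to the bound, illustrating the quadratic growth of the required measurement size with k discussed above.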
The number of measurements m(n) required for the perfect CS recovery of x(n) is given in [33] as m(n) ≥ 1.69 k(n) log(N/k(n)). Since k(n) is unknown, k̂(n − 1) is used in its place for determining m(n). To overcome the degradation in recovery performance due to the randomness of k̂(n − 1), we consider a number of GSM measurements (after excluding the BSM measurements) slightly larger than this bound. Thus this estimator obtains sufficient measurements using the BSM and GSM without any compromise on the recovery performance, unlike the estimators in [21], [22], where separate measurements are obtained solely for sparsity level estimation, are not useful in the recovery process, and entail additional costs during acquisition.

D. EXPERIMENTAL EVALUATION
This section presents the simulation results on ML estimation. A sparse signal of dimension N = 1024 with sparsity level k = 100 is generated. The SNR values considered are −5, 0, and 5 dB, where the SNR is computed as 10 log(kσ²_s/(mσ²_ϑ)). For each SNR value, we considered 100 trials, and the results are averaged. In every trial, though the sparsity level k is maintained at 100, the support is randomly chosen. Also, in every trial, 100 measurements are initially obtained using the BSM for estimating the mean energy of the supporting components, i.e., m_b = 100. Figure 3 shows the ML estimates obtained under different SNR conditions. For the SNR values of 0 and 5 dB, the averaged sparsity level estimate converges to 100 ± 1 with nearly m_ML = 125 measurements, whereas for the SNR value of −5 dB, the effect of measurement noise is significant, and the estimator requires nearly m_ML = 150 measurements for convergence. Figure 4 shows the comparison of recovery performance for different numbers of measurements in a single trial under different SNR values. The Normalized Recovery Error (NRE) = ‖x − x̂‖²₂ / ‖x‖²₂, i.e., the ratio of the energy of the difference between the recovered and original signals to the energy of the original signal, is shown. It is observed that the NRE is initially higher due to the weaker isometry conditions of the BSM based measurements; the NRE then reduces drastically as the number of GSM based measurements increases.

1) PERFORMANCE COMPARISON OF SPARSITY LEVEL ESTIMATORS
The performance of the ML estimator is evaluated and compared with the existing Lopes [22] and 2-GMM [24] methods in terms of the number of measurements required. Observations: (i) (m_b + m_ML) is less than the number of measurements required by the Lopes and 2-GMM methods for a given SNR, and (ii) (m_b + m_ML) < m = 1.69 k log(N/k).

IV. IMPROVING SPARSITY LEVEL ESTIMATES DURING RECOVERY
During recovery, the ML estimates can be refined further by exploiting the Markovian property of sparsity level variation. This property fits into the framework of MLS estimation that is implemented using the classical Viterbi algorithm. Thus a two-step estimation, i.e., ML, as discussed in Section III followed by MLS (ML-MLS) estimation of the sequence of sparsity levels can be used during recovery.
The MLS estimates {k̂_MLS(n)}_{n=1}^L are obtained by identifying an optimal length-L sequence of sparsity levels which has the maximum probability of occurrence given that the ML estimates {k̂_ML(n)}_{n=1}^L are available, i.e.,
{k̂_MLS(n)}_{n=1}^L = arg max_{{k(n)}_{n=1}^L} Pr[{k(n)}_{n=1}^L | {k̂_ML(n)}_{n=1}^L].        (29)
Before implementing Equation (29), it is required to filter the ML estimates to remove outliers, if any, and to reduce the spread of values in the sequence of ML estimates {k̂_ML(n)}_{n=1}^L. As the sparsity level remains constant for an average of L survival time steps, a simple L-tap moving average (MA) filter is considered for prefiltering the ML estimates, and the filtered estimate is then rounded to the nearest integer. The effect of filtering the ML estimates using the MA filter is shown in Fig. 6a and Fig. 6b for two sequences of sparsity levels with slow and fast variations, which are filtered using L = 10 and L = 3 taps, respectively. The filtered ML estimates are then refined using the Viterbi algorithm. For brevity of notation, the filtered estimates are also denoted k̂_ML in this and subsequent sections.
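The prefiltering step is a plain moving average followed by rounding; a small sketch in which the example sequence and tap count are illustrative:

```python
import numpy as np

def prefilter_ml_estimates(k_ml, taps):
    """L-tap moving average of ML estimates, rounded to the nearest integer."""
    kernel = np.ones(taps) / taps
    smoothed = np.convolve(np.asarray(k_ml, dtype=float), kernel, mode="same")
    return np.rint(smoothed).astype(int)

# One outlier at index 2 is attenuated toward its neighbours.
k_ml = [100, 100, 140, 100, 101, 101, 100, 100]
print(prefilter_ml_estimates(k_ml, 3))
```

Note that `mode="same"` zero-pads at the boundaries, so the first and last outputs are biased; a sliding-window implementation would handle the edges explicitly.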

A. VITERBI ALGORITHM
The MLS estimation using the Viterbi algorithm is similar to estimating the most probable sequence of L hidden states from the sequence of L observations emitted by those hidden states in the hidden Markov problem. Here, the MLS and filtered ML estimates are the hidden states and the observations, respectively. Since hidden states are the refined versions of observations in this hidden Markov problem, the transition probability values of hidden states and the emission probability values of observations from those hidden states are the same.
The MLS estimate k̂_MLS(L) at the L-th instant is obtained by maximizing the probability q_L^v, defined as the probability of the most likely sequence {k̂_MLS(n)}_{n=1}^{L−1} accounting for the first L − 1 refined estimates of {k̂_ML(n)}_{n=1}^L and ending in the state k̂_MLS(L) = v at the L-th instant, i.e.,
q_L^v = max_u [ q_{L−1}^u p_d ] Pr[k̂_ML(L) | v],  with d = v − u.        (30)
Defining the trace-back function Ξ_L(v) = arg max_u q_{L−1}^u p_d, the MLS estimates at the time steps n = L − 1, L − 2, . . . , 1 can be computed using the trace-back technique, i.e., k̂_MLS(n) = Ξ_{n+1}(k̂_MLS(n + 1)).

B. ML ESTIMATION OF MARKOV MODEL PARAMETERS
As the probability p_d is not available a priori for the MLS estimation, it is estimated using the ML principle [34] from the filtered ML estimates. Considering the sequence {k̂_ML(n)}_{n=1}^L, the ML estimate of the transition probability, p̂_d, is given as
p̂_d = (1/(L − 1)) Σ_{n=2}^L 1[ k̂_ML(n) − k̂_ML(n − 1) = d ],        (32)
where 1[·] is the indicator function.
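Equation (32) amounts to counting the relative frequency of each increment; a short sketch with an illustrative sequence:

```python
import numpy as np
from collections import Counter

def estimate_transition_probs(k_ml):
    """ML estimate of p_d as the relative frequency of each increment d
    in the filtered ML-estimate sequence (Equation (32))."""
    diffs = np.diff(np.asarray(k_ml))
    counts = Counter(int(d) for d in diffs)
    return {d: c / len(diffs) for d, c in counts.items()}

p_hat = estimate_transition_probs([100, 100, 101, 101, 101, 100, 100])
print(p_hat[0])     # survival probability estimate, 4 of the 6 increments are zero
```

The estimates sum to one by construction, and the survival probability p̂_0 feeds directly into the survival-time and Viterbi machinery described earlier.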

C. SENSITIVITY OF MLS ESTIMATION
The errors in the input sequence {k̂_ML(n)}_{n=1}^L to the MLS estimator affect the estimate of p_d, leading to model mismatch. To demonstrate the sensitivity of the MLS estimator, consider a simulation of the discrete Markov model where the birth and death transition probabilities are equal and kept constant throughout the simulation. For each p_0, a sequence of 128 sparsity levels is generated and perturbed by introducing errors at random locations to mimic the errors in the ML estimates after filtering. The probabilities p_{<0} = Σ_{d<0} p_d, p_0, and p_{>0} = Σ_{d>0} p_d are estimated using Equation (32), and the Viterbi estimator is applied to the perturbed sequence. For each p_0, the number of errors introduced is 10, 20, 30, 40, and 50. The estimates of the transition probabilities for each p_0 after the introduction of errors are shown in Fig. 7. The robustness of the MLS estimation method, in terms of the number of sparsity level errors before and after MLS estimation, is shown in Fig. 8 for different values of the model parameters.

1) OBSERVATIONS
1) For the slow varying case (p 0 = 0.98, p <0 = 0.01, and p >0 = 0.01), when the errors in ML estimates are less, the errors in the estimated transition probabilities (p <0 ,p >0 , andp 0 ) are also less, as shown in Fig. 7a, and the MLS estimation is robust enough to correct most of the errors, as shown in Fig. 8. However, when the errors in ML estimates are large and bursty in nature, the errors in the estimated transition probabilities are large, as shown in Fig. 7a, and the performance of Viterbi algorithm deteriorates and exhibits poor performance as shown in Fig. 8.
2) The worst case is the rapidly varying scenario, i.e., $p_0 = 0.33$, $p_{<0} = 0.33$, and $p_{>0} = 0.34$. Here the errors in the ML estimates of the sparsity level are indistinguishable from the natural variation of the sparsity level, and the ML-estimated transition probabilities are independent of the number of errors introduced, as shown in Fig. 7b. Therefore, the Viterbi estimator cannot recognize those errors, and the correction achieved is reduced, as shown in Fig. 8.
Thus the error in the estimate of the transition probabilities harms the MLS output only if many ML estimates are in error or if the rate of change of the sparsity level is very high. It may be observed from Figs. 7a-7b that when no errors are introduced in the sparsity level, the estimates of the transition probabilities converge to their true values.
The proposed two-step ML-MLS estimation technique is summarized in Algorithm 2.

V. PRACTICAL IMPLEMENTATION ASPECTS
This section covers the implementation aspects of the proposed CS acquisition system for the ML estimation of the sparsity level, and of the Viterbi algorithm based MLS estimation in the CS recovery system.

A. IMPLEMENTATION OF CS ACQUISITION SYSTEM
The practical real-time CS acquisition hardware for the proposed estimation method is shown in Fig. 9. There are $m$ identical and independent modulator circuits working in parallel. The hardware components of a modulator circuit are (i) a multiplexer to select between the Gaussian basis $g(t)$ (continuous-time version of the rows of the GSM) and the Bernoulli basis $b(t)$ (continuous-time version of the rows of the BSM), and (ii) a multiplier and an integrate-and-dump (I&D) circuit to multiply the sparse signal with the measurement basis for a duration of $T$ seconds and output a measurement. The select signal $\beta_i(t)$ of the multiplexer takes the value 0 for $i \le m_b$ to select the Bernoulli basis, and 1 for $m_b < i \le m$ to select the Gaussian basis.
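In simulation, the modulator bank reduces to inner products with random sensing rows, since the multiply-and-integrate over $T$ seconds is a discrete-time inner product per row. A hedged sketch, assuming $\pm 1$ Bernoulli rows and standard Gaussian rows (function names are illustrative):

```python
import numpy as np

def acquire(x, m_b, m, rng=None):
    """Emulate the m parallel modulators: the first m_b rows are Bernoulli
    (+/-1) rows for mean-energy estimation, the remaining m - m_b rows are
    Gaussian rows for ML sparsity estimation. Each measurement is the
    integrate-and-dump output, i.e. an inner product of a sensing row with x."""
    rng = np.random.default_rng(rng)
    N = x.size
    B = rng.choice([-1.0, 1.0], size=(m_b, N))    # Bernoulli sensing rows
    G = rng.standard_normal((m - m_b, N))         # Gaussian sensing rows
    Phi = np.vstack([B, G])                       # full sensing matrix
    return Phi @ x, Phi
```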

B. IMPLEMENTATION OF VITERBI ALGORITHM IN CS RECOVERY SYSTEM
The main component of the sparsity level estimator in the CS recovery system is the Viterbi algorithm; here, a sliding-window based Viterbi algorithm is considered, with two parallel sliding windows, each requiring its own Viterbi computation. It may be noted that for fast and rapidly varying sparsity levels, the Viterbi algorithm performs poorly in correcting the errors in the filtered estimates $\hat{k}_{\text{ML}}$, as shown in Fig. 8. Thus the proposed Viterbi algorithm based MLS estimation technique is suitable for slowly and moderately varying sparsity levels, where it provides better MLS estimates with low computational complexity. If the rate of sparsity level variation is higher than the estimation speed, the filtered ML estimates are retained as the best estimates, without executing the Viterbi algorithm.

VI. PERFORMANCE EVALUATION FOR THE PROPOSED ML-MLS ESTIMATION
The performance of the proposed estimator is evaluated by applying it to model-generated data and real-world signals. Performance measures such as the Mean Recovery Error (MRE) $= |(\hat{k} - k)/k|$, the NRE, and the computational complexity are compared between the proposed estimator and the sparsity level estimators described in [22], [24], and [28], as well as support estimators (the size of the estimated support gives the sparsity level) such as [20], [30], and [17], and an oracle CS estimator that is aware of the signal and noise statistics for support recovery.

A. COMPARISON ON COMPUTATIONAL COMPLEXITY
The computational complexity of the ML-MLS estimation technique is compared with that of the Lopes method [22], the 2-GMM method [24], and the Trace based method [28], where the sparsity level is estimated directly, and with the DCS-AMP method [20], DBBD-Kronecker [30], and the KFCS method [17], where the support is estimated (which in turn gives the sparsity level) for a given number of measurements $m$, as follows.
The proposed estimator requires $m = m_b + m_{\text{ML}}$ measurements per time step, where the $m_b$ measurements are for estimating the mean energy of the supporting components, and the $m_{\text{ML}}$ measurements are for estimating the sparsity level. The computational complexity of estimating the instantaneous sparsity level from the $m$ measurements is $O(m)$.
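Assuming the measurement variance model used in the Appendix, $\mathrm{Var}\{y_i\} = ak + b$ with $a = \sigma_\phi^2 \sigma_s^2$ and $b = \sigma_\vartheta^2$, the instantaneous ML estimate reduces to energy matching in a single $O(m)$ pass over the measurements. This is a sketch under that assumption, not the paper's exact Equation (14), which is not reproduced in this excerpt:

```python
import numpy as np

def ml_sparsity_estimate(y, a, b, k_min=0, k_max=None):
    """ML estimate of the sparsity level from Gaussian measurements,
    assuming y_i ~ N(0, a*k + b): the unrounded estimate solves
    mean(y^2) = a*k + b, and is then rounded and clipped to [k_min, k_max].
    Cost is O(m): one pass over the m measurements."""
    k_raw = (np.mean(y ** 2) - b) / a
    k_max = np.inf if k_max is None else k_max
    return int(np.clip(np.round(k_raw), k_min, k_max))
```

The rounding step here is the source of the round-off error $k_\delta$ analyzed in the Appendix.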
At any time step, the Viterbi algorithm performs the MLS estimation over a block of length $L$, whose computational complexity is $O(D_R^2 L)$, where $D_R = k_{\max} - k_{\min} + 1$ is the dynamic range of the sparsity level, and $k_{\min}$ and $k_{\max}$ are the minimum and maximum sparsity levels, respectively, in the block of length $L$. Thus the overall computational complexity of the proposed ML-MLS method is $O(m + D_R^2 L)$ per time step. The computational complexities of the other methods are given in Table 1, where the remarks explain the reason for each. The proposed ML-MLS estimation technique demands an additional computational complexity of $O(D_R^2 L)$, which is less than the $O(mk^3)$ of KFCS and the $O(mN)$ of DCS-AMP. Except for estimating the mean energy of the supporting components, the ML-MLS technique does not involve any iterations. Although the DBBD-Kronecker method has a lower support-estimation complexity of $O(N - m)$ than the proposed ML-MLS, its recovery performance is severely affected by the structure of DBBD.

B. PERFORMANCE EVALUATION ON MODEL-GENERATED DATA
A time-varying Gaussian sparse signal $x(n) \in \mathbb{R}^{1024 \times 1}$ with $k(0)$ such that $64 \le k(0) \le 128$ and supporting components $s_i(n) \sim \mathcal{N}(0, 2)$ is generated using the model parameters $p_{<0} = 0.05$, $p_0 = 0.9$, and $p_{>0} = 0.05$, for 128 time steps. The input to the Viterbi algorithm is filtered using an $L = 9$-tap MA filter. With the estimated sparsity level, the Orthogonal Matching Pursuit (OMP) [9] algorithm is run to find the supporting components and recover the sparse signal $\hat{x}(n)$. In every trial of the simulation, the number of obtained measurements is $m = 4k$. The measurements are obtained for different SNR values. We considered 500 trials, and the averaged MRE and NRE results are plotted in Fig. 11 and Fig. 12, respectively. It is observed that the proposed two-step estimation algorithm significantly outperforms the other methods in terms of MRE and NRE. Improved, near-oracle performance is natural for the ML-MLS estimator, because the instantaneous ML estimate is refined by exploiting the temporal correlation defined by the Markov model.
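The sparsity level trajectory for such model-generated data can be produced as a discrete Markov birth-death chain; a sketch in which the unit step size and the clipping policy at the boundaries are assumptions:

```python
import numpy as np

def generate_sparsity_sequence(k0, steps, p_neg, p_zero, p_pos,
                               k_min=1, k_max=None, rng=None):
    """Evolve the sparsity level as a discrete Markov birth-death process:
    at each step the level decreases (death), holds, or increases (birth)
    with probabilities p_neg, p_zero, p_pos, clipped to [k_min, k_max]."""
    rng = np.random.default_rng(rng)
    k_max = np.inf if k_max is None else k_max
    k = [k0]
    for _ in range(steps - 1):
        d = rng.choice([-1, 0, 1], p=[p_neg, p_zero, p_pos])
        k.append(int(np.clip(k[-1] + d, k_min, k_max)))
    return k
```

With $p_{<0} = p_{>0} = 0.05$ and $p_0 = 0.9$, this yields the slowly varying trajectories used in the simulation above.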
It is observed that the performance of the Lopes method [22] is inferior and largely invariant to the SNR. This is because the Lopes method uses a random Cauchy sensing matrix, whose entries have infinite variance. As the Trace-based method [28] is more suitable for estimating a time-invariant sparsity level, its performance is poor. Although both the KFCS and DCS-AMP estimators (which exploit the temporal correlation of the sparse signal) perform on par with the proposed ML-MLS estimator and the oracle estimator, their computational complexity makes practical implementation harder.

C. PERFORMANCE EVALUATION ON REAL-WORLD DATA
The performance of the ML-MLS estimation method is evaluated here for the real-world applications on compression and recovery of sparse vibration signals and estimation of wireless channels.

1) APPLICATION ON COMPRESSION AND RECOVERY OF VIBRATION SIGNAL DATA OF AN AIRCRAFT
Real-world vibration signals were obtained from Mide Technologies [31]. The vibration signal acquired on the aircraft skin at a sampling rate of 2500 sps for 500 seconds during the aircraft's climb is analyzed here. The time-frequency spectrogram of that vibration signal is shown in Fig. 13, which reveals that (i) the vibration signal is compressible in the frequency domain, as only very few frequency components are significant at any given time step, and (ii) the number of significant frequency components varies in time.
In our analysis, the duration of 500 seconds is divided into 500 time steps of one second each, and the DCT is applied to the vibration signal in every time step. The sparsity level $k(n)$ is computed such that 95% of the energy of the $N = 2500$-dimensional vibration signal is contained in the $k(n)$ most significant DCT coefficients, and it is plotted in Fig. 14. The sparsity level variation is moderate and follows a discrete Markov model with transition probabilities $\sum_{d>0} p_d = 0.08$, $\sum_{d<0} p_d = 0.11$, and $p_0 = 0.81$. After obtaining the mean energy of the frequency components using the BSM as discussed in Algorithm 1, the ML estimate of $k(n)$ is computed at each time step using Equation (14), and the number of acquired measurements $m(n)$ gives the compression ratio $CR = N/m(n)$. On the recovery side, the MA filter length is kept at $L = 5$. The transition probabilities are estimated as $\sum_{d>0} \hat{p}_d = 0.10$, $\sum_{d<0} \hat{p}_d = 0.14$, and $\hat{p}_0 = 0.76$, and the MLS estimation is performed using the Viterbi algorithm. The ML and MLS estimates are plotted in Fig. 14 along with the actual sparsity level, and it is verified that the MLS estimation refines the ML estimates.
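The 95%-energy sparsity level per one-second segment can be computed as described; a sketch using an explicit orthonormal DCT-II matrix (adequate for illustration; a fast DCT routine would be used in practice for $N = 2500$):

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II via its defining matrix (fine for modest N)."""
    N = x.size
    n = np.arange(N)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    C *= np.sqrt(2.0 / N)
    C[0] /= np.sqrt(2.0)
    return C @ x

def sparsity_level_95(segment, energy_frac=0.95):
    """Number of largest-magnitude DCT coefficients needed to capture
    the given fraction of the segment's energy."""
    c = dct_ii(np.asarray(segment, dtype=float))
    e = np.sort(c ** 2)[::-1]          # coefficient energies, descending
    cum = np.cumsum(e)
    # smallest k with cumulative energy >= energy_frac of the total
    return int(np.searchsorted(cum, energy_frac * cum[-1]) + 1)
```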
After the ML-MLS estimation, the vibration signal is recovered using the OMP recovery algorithm, which iterates $\hat{k}_{\text{MLS}}(n)$ times at each time step. The recovery performance of the proposed estimation followed by the OMP algorithm (ML-MLS-OMP) over the total duration of 500 seconds is shown in Fig. 15. In each time step, the recovery performance is computed in terms of the NRE. A snapshot of the ML-MLS-OMP based CS recovered signal along with the original vibration signal is shown in Fig. 15a. The NRE performance of the proposed ML-MLS-OMP method, together with the DCS-AMP and KFCS based CS methods and the traditional DCT based compression method, is presented in Fig. 15b. It is observed that the NRE performance of the ML-MLS-OMP method is comparable with that of the traditional DCT based compression method and on par with the DCS-AMP and KFCS methods. For the DCT based compression method, the most significant coefficients and their locations are known exactly a priori, resulting in a minimum NRE of 5%. For the ML-MLS-OMP based CS method, in contrast, the most significant components and their locations are not available a priori or directly, and each measurement is affected by the insignificant DCT coefficients. This causes the NRE performance of the ML-MLS-OMP based CS method to degrade slightly with respect to the DCT based compression method.
It may be noted that not all vibration signals are sparse. For example, the vibration signal of a rocket during the transonic regime is almost a random signal, rich in significant components, making it non-sparse. Such non-sparse signals are not considered here.
The run-time complexity of the proposed ML-MLS method is evaluated and compared with other existing Sparsity Level Estimation (SLE) methods on a Windows 7 OS based PC having a processor running with 1.66 GHz clock speed. The average elapsed time for the SLE on analyzing each segment of the vibration signal is shown in Table 2.
The time elapsed by the proposed ML-MLS method is significantly less than that of other existing SLE methods.

2) APPLICATION ON ESTIMATION OF SPARSE IMPULSE RESPONSE OF WIRELESS CHANNELS
Consider the problem of estimating the wireless channel impulse response (CIR) from the model $y = Xh + \vartheta$, where $y$, $X$, $h$, and $\vartheta$ are the received measurements, the Toeplitz matrix of known pilots, the CIR, and the measurement noise, respectively. This becomes a CS acquisition model, because the CIR $h$ is sparse and $X$ is carefully chosen as a sensing matrix. As the minimum number of measurements depends on the sparsity level of the CIR, sparsity level estimation is essential to reduce the number of pilots. The performance of the proposed sparsity level estimator in terms of estimation accuracy is therefore evaluated on real-world measured CIRs obtained from the CRAWDAD data set [35]. The CIR was recorded using a 44-node wireless network in a standard office environment, giving $44 \times 43 = 1892$ pairwise links. The transmitter and receiver were moved, and CIR measurements were taken multiple times between the nodes. Sample real and imaginary realizations of the CIR between nodes 1 and 2 are shown in Fig. 16. The time-varying sparsity level, obtained by stacking several realizations of the CIR measurements, is shown in Fig. 17. It is observed that the sparsity level variation follows a discrete Markov process with transition probabilities $\sum_{d>0} p_d = 0.46$, $\sum_{d<0} p_d = 0.43$, and $p_0 = 0.11$, resulting in fast sparsity level variation. After obtaining the mean energy of the multipath components using Algorithm 1, the ML estimate of $k(n)$ is computed for each realization using Equation (14). The MA filter length is kept at $L = 3$. As the sparsity level varies fast, the ML estimates of the transition probabilities are computed as nearly equally distributed, i.e., $\sum_{d>0} \hat{p}_d = 0.33$, $\sum_{d<0} \hat{p}_d = 0.33$, and $\hat{p}_0 = 0.34$, from the sequence of filtered ML estimates. The MLS estimation is performed using the Viterbi algorithm. The unfiltered ML estimates and the refined MLS estimates are plotted in Fig. 17 along with the actual sparsity level, and it is verified that the MLS estimation refines the unfiltered ML estimates.
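The pilot model can be set up by stacking shifted copies of the pilot sequence into a Toeplitz convolution matrix, so that $Xh$ is the linear convolution of the pilots with the CIR; a sketch with illustrative names:

```python
import numpy as np

def pilot_matrix(pilots, channel_len):
    """Toeplitz matrix X of known pilots so that y = X @ h + noise models
    linear convolution of the pilot sequence with a CIR of length channel_len."""
    p = np.asarray(pilots, dtype=float)
    N = p.size + channel_len - 1
    X = np.zeros((N, channel_len))
    for j in range(channel_len):
        X[j:j + p.size, j] = p          # shifted copies of the pilot sequence
    return X

def received(pilots, h, noise_std=0.0, rng=None):
    """Simulate the received measurements y = X h + measurement noise."""
    rng = np.random.default_rng(rng)
    X = pilot_matrix(pilots, len(h))
    return X @ np.asarray(h, dtype=float) + noise_std * rng.standard_normal(X.shape[0])
```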

VII. CONCLUSION
In this paper, novel sparsity level estimation techniques for the acquisition and recovery of time-varying sparse signals have been proposed using the ML principle. In the acquisition process, the ML estimator estimates the sparsity level with a minimum number of measurements and without any additional cost or resources compared to other estimators. In the recovery process, a two-step sparsity level estimation is proposed, in which the first step is the ML estimation and the second step is the MLS estimation. The MLS estimation technique filters the ML estimates and refines the sequence of filtered ML estimates using the Viterbi algorithm. The performance of the proposed two-step ML-MLS technique is comparable to that of an oracle estimator and improves upon other published methods, even under high levels of measurement noise.

VIII. FUTURE WORK
The present work focuses only on sparsity level estimation for time-varying sparse signals. However, the problem of estimating and tracking time-varying sparse signals is three-dimensional: the time-varying (i) sparsity level, (ii) support, and (iii) amplitudes have to be estimated simultaneously. Hence, as future work, the joint characterization of the time-varying sparsity level and support as a two-dimensional discrete Markov birth-death model, and the derivation of the corresponding ML algorithms, can be considered.

APPENDIX: PROOF FOR THE PROPERTIES OF ML ESTIMATOR
1) The ML estimator is unbiased. Since $\hat{k}_r = \hat{k}_{\text{ML}} + k_\delta$, where $k_\delta \sim \mathcal{U}[-0.5, 0.5]$ is the uniformly distributed round-off error with $E\{k_\delta\} = 0$, we have $E\{\hat{k}_r\} = E\{\hat{k}_{\text{ML}}\} + E\{k_\delta\} = E\{\hat{k}_{\text{ML}}\}$. Since $\hat{k}_r$ is unbiased, the estimator $\hat{k}_{\text{ML}}$ is also unbiased.
2) The variance of the ML estimator $\hat{k}_{\text{ML}}$ is derived from the variance of $\hat{k}_r$ and is given by $\text{VAR}\{\hat{k}_{\text{ML}}\} = \text{VAR}\{\hat{k}_r\} + \text{VAR}\{k_\delta\} - 2\,\text{COV}\{\hat{k}_r, k_\delta\}$.
3) For the integer parameter $k$, the CRLB can be derived by computing either the second forward or the second backward difference of the log-likelihood of Equation (12). Let $a = \sigma_\phi^2 \sigma_s^2$ and $b = \sigma_\vartheta^2$. Then the joint probability distribution of the $m_{\text{ML}}$ independent random variables $\{y_i\}_{i=1}^{m_{\text{ML}}}$, parameterized by $k$, is given as

$$p_Y(y; k) = \prod_{i=1}^{m_{\text{ML}}} \frac{1}{\sqrt{2\pi(ak+b)}} \exp\!\left(-\frac{y_i^2}{2(ak+b)}\right).$$

The second forward difference of the log-likelihood is given by

$$\ln p_Y(y; k+2) - 2\ln p_Y(y; k+1) + \ln p_Y(y; k) = -\frac{m_{\text{ML}}}{2} \ln\frac{(ak+b)(a(k+2)+b)}{(a(k+1)+b)^2} - \left(\sum_{i=1}^{m_{\text{ML}}} y_i^2\right) \frac{a^2}{(ak+b)(a(k+1)+b)(a(k+2)+b)}.$$
Using the second forward difference of the log-likelihood, and noting that $E\{\sum_i y_i^2\} = m_{\text{ML}}(ak+b)$, the expectation of the second difference evaluates to approximately $-\frac{m_{\text{ML}}}{2}\cdot\frac{a^2}{(a(k+1)+b)^2}$, so that

$$\text{CRLB} = \frac{-1}{E\{\ln p_Y(y; k+2) - 2\ln p_Y(y; k+1) + \ln p_Y(y; k)\}} \approx \frac{2\,(a(k+1)+b)^2}{m_{\text{ML}}\, a^2}.$$

VOLUME 9, 2021
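The simplification of the data-dependent term in the second forward difference follows by combining the three fractions over a common denominator (writing $c = ak + b$):

```latex
\frac{1}{c+2a} - \frac{2}{c+a} + \frac{1}{c}
= \frac{c(c+a) - 2c(c+2a) + (c+a)(c+2a)}{c\,(c+a)\,(c+2a)}
= \frac{2a^2}{(ak+b)\,(a(k+1)+b)\,(a(k+2)+b)},
```

since the numerator expands to $c^2 + ac - 2c^2 - 4ac + c^2 + 3ac + 2a^2 = 2a^2$.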