1-Bit Compressed Sensing via an L1-TV Regularization Method

This paper considers the 1-bit compressed sensing (1-bit CS) of signals that are simultaneously element sparse and gradient sparse. Since the L1-norm and total variation (TV) penalties are beneficial for reconstructing element-sparse and gradient-sparse signals, respectively, we combine them to propose an L1-TV regularization model for 1-bit CS. We show that the proposed model has a closed-form solution that can be computed easily. Despite this apparent simplicity, our theoretical analysis reveals that the solution provides an approximate estimate of the underlying signal. Moreover, when a dither vector is introduced, we develop an adaptive algorithm that accelerates the decay rate of the recovery error. The key idea is to generate the dither for each iteration based on the previous estimate. In addition to the theoretical analysis, we conduct a series of experiments on both synthetic and real-world data to demonstrate the superiority of our algorithms.


I. INTRODUCTION
In the past decade, 1-bit CS [1], [2], [3], [4], [5], [6], [7] has drawn extensive attention as a new paradigm in classic compressed sensing [8]. Instead of assuming infinite-precision real-valued measurements as in classic compressed sensing, 1-bit CS retains only the signs of the real-valued measurements, leading to low-cost hardware implementations and fast sampling speeds. For these reasons, it has broad application prospects in massive multiple-input multiple-output systems [9], wireless sensor networks [10], synthetic aperture radar systems [11], and so on, where large-scale sparse data is usually involved.
Mathematically, 1-bit CS aims to recover the direction of the underlying sparse signal x ∈ R^n (i.e., x/‖x‖_2) from the binary measurements

y = sign(Ax), (1)

(The associate editor coordinating the review of this manuscript and approving it for publication was Frederico Guimarães.)
where A ∈ R^{m×n} denotes the sensing matrix and sign(·) applies componentwise to Ax, with sign(z) = 1 if z ≥ 0 and sign(z) = −1 otherwise. To this end, Jacques et al. [2] show that the solution to an L0-minimization problem provides a robust estimate of the original signal, where the L0-norm ‖u‖_0 counts the number of non-zero entries in u ∈ R^n. They also design a binary iterative hard thresholding (BIHT) algorithm to solve the problem approximately. Plan and Vershynin [3] employ the L1-norm to promote sparsity in a convex programming approach and provide a theoretical guarantee for it. After that, in an early work [7], we proposed an Lp (0 < p < 1) minimization method and proved that it requires fewer measurements than the L1-norm based counterpart. Unfortunately, the above-mentioned methods are computationally complex. They either involve iterative procedures or rely on toolboxes for convex problems, e.g., CVX [12], resulting in low efficiency when processing high-dimensional signals. To enhance the solving efficiency, Zhang et al. [4] suggest the following optimization model

û := arg min_{‖u‖_2 ≤ 1} −⟨A^T y, u⟩ + λ‖u‖_1.

It has the closed-form solution P_λ(A^T y)/‖P_λ(A^T y)‖_2, where

P_λ(u) := sign(u) ⊙ max{|u| − λ, 0} (3)

is the soft-thresholding operator, with ⊙ denoting the Hadamard (componentwise) product.
(VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
On the other hand, the 1-bit measurements (1) lose all information about the magnitude of x (i.e., ‖x‖_2). For full recovery (of both direction and magnitude), we shall consider the thresholded binary measurements [5]

y = sign(Ax + τ), (4)

where τ ∈ R^m denotes a known dither vector. With this setting, References [5], [6], and [13] provide different selection strategies for τ. In particular, Baraniuk et al. [6] show that choosing the quantization thresholds adaptively can significantly accelerate the decay of the recovery error.
The above results rely on the sparsity prior of the underlying signal, and they can be extended to signals synthesized by a linear combination of atoms in a (redundant) dictionary with incoherent atoms [14]. However, numerous signals possess other structural information that does not fall within the scope of the aforementioned theoretical work. One representative example is a signal consisting of piecewise constants (i.e., its gradient is sparse), which arises widely in image restoration [15] and image reconstruction [16], [17]. Let Dx ∈ R^{n−1} denote the finite difference of x ∈ R^n defined by [Dx]_i = x_{i+1} − x_i for i = 1, · · · , n − 1, where D ∈ R^{(n−1)×n} is a differential matrix. In fact, D is extracted from the first n − 1 rows of the row circulant matrix of (−1, 1, 0, · · · , 0) ∈ R^n. The piecewise constant property of x means that Dx is sparse. In this situation, to reconstruct x, one usually borrows the idea of the element-sparse case and solves min_x ‖Dx‖_1 subject to y = Ax, where A ∈ R^{m×n} is a sensing matrix and ‖Dx‖_1 is called the total variation of x, denoted ‖x‖_TV. TV minimization has been analyzed in recent years [18], [19], [20] for compressed sensing of gradient-sparse signals, and several optimization algorithms have been designed to solve it, such as the proximal gradient descent and projected subgradient algorithms [21].
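To make the difference operator concrete, here is a minimal Python sketch (the helper names are ours, not from the paper):

```python
def finite_difference(x):
    """Apply D to x: [Dx]_i = x_{i+1} - x_i, mapping R^n to R^(n-1)."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def total_variation(x):
    """The total variation ||x||_TV = ||Dx||_1."""
    return sum(abs(d) for d in finite_difference(x))

# A piecewise-constant signal: its gradient Dx has only two nonzero jumps
# (ignoring sign), so x is gradient sparse even though x itself has many
# nonzero entries.
x = [1, 1, 1, 4, 4, 0, 0, 0]
```

Here `finite_difference(x)` returns `[0, 0, 3, 0, -4, 0, 0]`, so `total_variation(x)` is 7.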
Besides separately element-sparse or gradient-sparse signals, there also exist signals possessing both properties simultaneously, such as electrocardiogram (ECG) signals [22] and some slowly time-varying signals [23]. In this situation, neither the L1 penalty nor the TV penalty alone can comprehensively characterize both properties, which inspires us to combine them. An intuitive combination is additive, namely solving the following problem

min_x (1/2)‖y − Ax‖_2^2 + λ_1‖x‖_1 + λ_2‖Dx‖_1,

where λ_1, λ_2 > 0 denote penalty parameters. The model relates to the fused LASSO model in statistics and has found applications in biological research, such as gene classification [24] and coefficient selection for the colon tumor data set [25]. It has also been used for the compressed sensing of some slowly time-varying signals [23], exhibiting great potential.
In this paper, we introduce the L1-TV penalty model into the 1-bit CS framework to solve the reconstruction problem for signals possessing both the element-sparse and gradient-sparse properties. Following the suggestion of [3], we employ −⟨(1/m)A^T y, x⟩ as the loss function and combine it with the L1 penalty term and the TV penalty term as the regularizer, leading to the following optimization problem

min_{‖x‖_2 ≤ 1} −⟨(1/m)A^T y, x⟩ + λ_1‖x‖_1 + λ_2‖Dx‖_1, (6)

where λ_1 and λ_2 are regularization parameters whose values will be discussed later. We will provide the closed-form solution to (6) and prove that it is an approximate direction estimate of an underlying signal that is both element sparse and gradient sparse.
To our knowledge, there are few related works in the 1-bit CS literature. Though References [26] and [27] have employed an L1-TV penalty in their methods, they aim at the compressed sensing of block-sparse signals, so their models differ from ours. In addition, their methods are intuition-driven, without theoretical analysis. The main contributions of this paper are listed below.
• This paper is the first to propose an L1-TV method for 1-bit CS and shows that a signal that is simultaneously element sparse and gradient sparse can be approximately estimated by solving the proposed optimization problem.
• The proposed model has a closed-form solution, so the estimate can be obtained easily. Furthermore, we introduce a quantization scheme that chooses the dither vector adaptively, thereby exponentially accelerating the decay of the recovery error.
• We conduct experiments on both synthetic and real-world datasets; the experimental results show that the proposed algorithms outperform their counterparts.
The rest of this paper is organized as follows. In Section II, we define some notation and give several lemmas that are utilized in the proofs of our main results. In Section III, we present the proposed methods with the corresponding theoretical guarantees. Comparisons between different algorithms on synthetic data, ECG signals, and gray images are conducted in Section IV. Section V concludes the paper and presents some future research problems.
Appendix provides detailed proofs for the proposed lemmas and theorems.

II. NOTATIONS AND PRELIMINARY A. NOTATIONS
We begin with some necessary notation used throughout the paper. Scalars are denoted by lowercase or capital letters (e.g., x or X), vectors by boldface lowercase letters (e.g., x), and matrices by boldface capital letters (e.g., X). We denote by x_i or (x)_i the i-th element of the vector x. The field of real numbers is denoted by R. S^{n−1} = {u ∈ R^n : ‖u‖_2 = 1} denotes the unit hypersphere in R^n and B^n = {u ∈ R^n : ‖u‖_2 ≤ 1} denotes the unit ball in R^n. [n] denotes the index set {1, 2, · · · , n}. Throughout the paper, we denote by A ∈ R^{m×n} the sensing matrix and by a_i its i-th row, where i ∈ [m]. We define the Lp norm of x ∈ R^n as ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}.

B. PRELIMINARIES
In this part, we define some operations and provide some lemmas used in the following sections. First, consider the optimization problem

min_{u ∈ R^n} (1/2)‖u − x‖_2^2 + λ‖u‖_1. (7)

As is well known, (7) has the closed-form solution û = P_λ(x), where P_λ(·) is defined in (3). For the TV-penalized counterpart

T_λ(x) := arg min_{u ∈ R^n} (1/2)‖u − x‖_2^2 + λ‖Du‖_1, (8)

Condat [28] shows that T_λ(x) can be computed exactly (numerically) by dynamic programming approaches, such as the taut-string algorithm [28]. Throughout the paper, P_λ(·) is defined as in (3) and T_λ(·) as in (8).
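To make these two proximal operators concrete, here is a small Python sketch. The function names are ours, and `tv_prox` uses a simple projected-gradient method on the dual problem rather than the exact taut-string algorithm of Condat [28], so it is an illustrative approximation, not the paper's solver:

```python
def soft_threshold(x, lam):
    """P_lam(x) from (3): sign(x) ⊙ max(|x| - lam, 0), componentwise."""
    return [(1.0 if v >= 0 else -1.0) * max(abs(v) - lam, 0.0) for v in x]

def tv_prox(x, lam, iters=3000):
    """Approximate T_lam(x) from (8) via projected gradient on the dual
    problem min_{||z||_inf <= lam} 0.5*||x - D^T z||_2^2, where
    (D^T z)_j = z_{j-1} - z_j (with out-of-range z taken as 0)."""
    n = len(x)
    z = [0.0] * (n - 1)                      # dual variable
    for _ in range(iters):
        # primal iterate u = x - D^T z
        u = [x[j] - ((z[j - 1] if j > 0 else 0.0)
                     - (z[j] if j < n - 1 else 0.0)) for j in range(n)]
        # gradient step z <- z + (1/4) D u, then clip to [-lam, lam];
        # step 1/4 is safe since ||D||^2 <= 4
        z = [min(lam, max(-lam, z[i] + 0.25 * (u[i + 1] - u[i])))
             for i in range(n - 1)]
    return [x[j] - ((z[j - 1] if j > 0 else 0.0)
                    - (z[j] if j < n - 1 else 0.0)) for j in range(n)]
```

For instance, `tv_prox` flattens small oscillations while preserving large jumps, and with a very large λ it returns a constant vector at the mean of the input.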
The operator P λ 1 (T λ 2 (·)) has the following property, which is proved in Appendix A.
Lemma 2: For any y ∈ R^n, the vector x* := P_{λ_1}(T_{λ_2}(y)) satisfies the stated property.
In this paper, we need to solve an optimization problem of the form

arg min_{‖u‖_2 ≤ 1} −⟨v, u⟩ + λ_1‖u‖_1 + λ_2‖Du‖_1, (10)

where v ∈ R^n is a known vector. The following lemma shows that (10) has a closed-form solution; it is proved in Appendix A.
Lemma 3: For any λ_1, λ_2 ≥ 0, the solution u* to (10) is given by u* = P_{λ_1}(T_{λ_2}(v))/‖P_{λ_1}(T_{λ_2}(v))‖_2.
For the underlying signal x ∈ R^n and each measurement vector a_i (i ∈ [m]), following [3], we assume each sign measurement y_i ∈ {−1, 1} is drawn independently at random satisfying

P(y_i = 1 | a_i) = θ(⟨a_i, x⟩), (11)

where the function θ(·) may be unknown or unspecified and automatically satisfies 0 ≤ θ(z) ≤ 1. As suggested by [3], we define

γ := E(θ(g)g), g ∼ N(0, 1),

to capture the correlation between x^T a_i and y_i. The standard 1-bit measurement model becomes a special case of (11) with θ(·) = sign(·). In this case, we have γ = E(|g|) = √(2/π), achieving the maximum correlation.

III. MAIN RESULTS
Since the method for direction recovery is the foundation of the method for full recovery, we first consider direction recovery from the standard binary measurements given by model (1) and then extend it to full recovery from the thresholded binary measurements given by model (4). Furthermore, we introduce an adaptive quantization scheme to improve the recovery accuracy and reduce the number of 1-bit measurements.

A. DIRECTION RECOVERY
As indicated by Lemma 3, the problem in (6) has the closed-form solution

x̂ = P_{λ_1}(T_{λ_2}((1/m)A^T y))/‖P_{λ_1}(T_{λ_2}((1/m)A^T y))‖_2.

The following result implies that any signal in the set K_{s_1,s_2} ∩ S^{n−1} can be approximately recovered by solving the optimization problem (6).
Theorem 4: Under the stated assumptions on λ_1 and λ_2, with probability at least 1 − e^{1−t}, the solution x̂ to the optimization problem (6) satisfies the stated error bound. The proof of Theorem 4 is provided in Appendix B.
Remark 5: For fixed t = log(n), Theorem 4 yields a recovery accuracy ε ∈ (0, 1) provided m = O((s_1 + s_2) log(n)/ε^4). This is the first theoretical result for the L1-TV solver in 1-bit CS. When λ_2 = 0, the optimization problem (6) reduces to the L1-norm based model proposed in [4], and Theorem 4 is consistent with the theoretical result in [4]. When λ_1 = 0, the optimization problem (6) reduces to a TV-seminorm based model suited to signals with gradient sparsity.
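As an end-to-end illustration of the closed-form estimator, the following self-contained Python sketch builds the direction estimate P_{λ_1}(T_{λ_2}((1/m)A^T y)) normalized to the unit sphere. All helper names and parameter values are ours for illustration; `tv_prox` is an approximate dual projected-gradient stand-in for the exact taut-string solver used in the paper:

```python
import math
import random

def soft_threshold(x, lam):
    """P_lam from (3), componentwise soft thresholding."""
    return [(1.0 if v >= 0 else -1.0) * max(abs(v) - lam, 0.0) for v in x]

def tv_prox(x, lam, iters=2000):
    """Approximate T_lam from (8) via projected gradient on the dual."""
    n = len(x)
    z = [0.0] * (n - 1)
    for _ in range(iters):
        u = [x[j] - ((z[j - 1] if j > 0 else 0.0)
                     - (z[j] if j < n - 1 else 0.0)) for j in range(n)]
        z = [min(lam, max(-lam, z[i] + 0.25 * (u[i + 1] - u[i])))
             for i in range(n - 1)]
    return [x[j] - ((z[j - 1] if j > 0 else 0.0)
                    - (z[j] if j < n - 1 else 0.0)) for j in range(n)]

def l1_tv_estimate(A, y, lam1, lam2):
    """Closed-form direction estimate: normalize P_lam1(T_lam2((1/m)A^T y))."""
    m, n = len(A), len(A[0])
    v = [sum(A[i][j] * y[i] for i in range(m)) / m for j in range(n)]
    u = soft_threshold(tv_prox(v, lam2), lam1)
    nrm = math.sqrt(sum(t * t for t in u))
    return [t / nrm for t in u] if nrm > 0 else u

# Toy instance: x is element sparse AND piecewise constant (gradient sparse).
random.seed(0)
n, m = 20, 2000
x = [0.0] * 5 + [1.0] * 5 + [0.0] * 10
nx = math.sqrt(sum(t * t for t in x))
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
y = [1 if sum(A[i][j] * x[j] for j in range(n)) >= 0 else -1 for i in range(m)]
lam1 = 0.05 * math.sqrt(math.log(n) / m)   # illustrative parameter choice
xhat = l1_tv_estimate(A, y, lam1, 3 * lam1)
corr = sum(a * b / nx for a, b in zip(x, xhat))   # cosine similarity
```

With this toy setup the estimate aligns closely with x/‖x‖_2, matching the qualitative message of Theorem 4 that more measurements yield a better direction estimate.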

B. FULL RECOVERY
In this part, we consider full recovery based on the established direction recovery strategy. To facilitate the discussion, we first define the sets U_{s_1,s_2} and V_{s_1,s_2}(R) := U_{s_1,s_2} ∩ R·B^n and always assume x ∈ V_{s_1,s_2}(R), i.e., the underlying signal satisfies the prior ‖x‖_2 ≤ R. To achieve full recovery of x, we allow a well-chosen threshold vector in the 1-bit measurements, i.e.,

y = sign(Ax + τ), (14)

where τ ∈ R^m denotes the dither vector whose entries are generated from N(0, R^2). Define Ã ∈ R^{m×(n+1)} by Ã_{i,:} = [A_{i,:}; τ_i/R] and D̃ ∈ R^{n×(n+1)} as the corresponding differential matrix. Let x̃ = [x; R] ∈ R^{n+1}. Then the measurements (14) can be reshaped as

y = sign(Ã x̃). (15)

Intuitively, both ‖x̃‖_1/‖x̃‖_2 and ‖D̃x̃‖_1/‖x̃‖_2 are bounded, and thus we can recover x̃ from (15) by solving the established optimization problem (6). Indeed, the desired upper bounds are guaranteed by the following lemma, proved in Appendix A.
We then solve problem (16), i.e., problem (6) applied to the lifted measurements (15), to reconstruct the lifted signal x̃. Full recovery of x relies on the following lemma, which relates the direction recovery error to the full recovery error.
Lemma 7 (See [14]): For f, g ∈ R^n, let f̂, ĝ denote the vectors lifted by one dimension, taking the form f̂ = [f; f̂_{n+1}] and ĝ = [g; ĝ_{n+1}]. Lemma 7 indicates that f̂_{n+1} g/ĝ_{n+1} is an approximate estimate of f provided the direction recovery error ‖f̂/‖f̂‖_2 − ĝ/‖ĝ‖_2‖_2 is small enough and ‖f̂‖_2 and ‖ĝ‖_2/|ĝ_{n+1}| are bounded above. Inspired by this fact, the following theorem provides a method for the full recovery of signals with a theoretical guarantee.
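The lifting step can be checked numerically: since Ã_{i,:} x̃ = A_{i,:} x + (τ_i/R)·R, the dithered measurements (14) and the lifted measurements (15) produce identical sign bits. A tiny Python check, with illustrative sizes and R chosen as a power of two so the identity is exact in floating point:

```python
import random

random.seed(1)
n, m, R = 4, 6, 2.0
x = [0.5, -1.0, 0.0, 0.8]                 # assume ||x||_2 <= R
A = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]
tau = [random.gauss(0, R) for _ in range(m)]   # tau_i ~ N(0, R^2)

# Lifted system: row i of A~ is [A_i ; tau_i / R], and x~ = [x ; R].
A_aug = [A[i] + [tau[i] / R] for i in range(m)]
x_aug = x + [R]

sign = lambda t: 1 if t >= 0 else -1
y1 = [sign(sum(A[i][j] * x[j] for j in range(n)) + tau[i]) for i in range(m)]
y2 = [sign(sum(A_aug[i][j] * x_aug[j] for j in range(n + 1))) for i in range(m)]
assert y1 == y2   # (14) and (15) yield the same bits
```

Recovering the direction of x̃ therefore recovers both the direction of x and, through its last coordinate, the magnitude information.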
Theorem 8: Consider R > 0 and x ∈ V_{s_1,s_2}(R). Let λ_1 and λ_2 be defined as in Theorem 4; then the stated recovery guarantee holds. The proof of Theorem 8 is provided in Appendix B.
Remark 9: Theorem 8 implies that the rescaled estimate x^# := R·x̂_{1:n}/x̂_{n+1} is an approximate estimate of the underlying signal x. For fixed t = log(n), the method inherits the sample complexity O((s_1 + s_2) log(n)/ε^4), with the full recovery error depending on the prior magnitude upper bound R. Dividing Eq. (17) by R, and noting that R is usually of the same order as ‖x‖_2, we conclude that the relative recovery error ‖x^# − x‖_2/‖x‖_2 is bounded above by a quantity independent of R.
From Theorem 8, the proposed method exhibits polynomial error decay in the oversampling factor µ = m/[(s_1 + s_2) log(n)], which may sometimes be unsatisfactory. Next, we aim to obtain an exponential decay. To this end, borrowing the idea from [6], we provide a new 1-bit quantization scheme that adaptively chooses the dither vector at each iteration. Before explaining this idea, we clarify some notation. We define the projection operator P_K(·) onto a set K, and let Δ(y) denote the reconstruction operator that recovers the underlying signal from thresholded 1-bit measurements y by solving problem (16). Moreover, we set x^0 = 0 and, for each k ∈ {1, · · · , K} where K denotes the maximum number of iterations, define the iterates accordingly. For any x ∈ V_{s_1,s_2}(R) and 0 < η ≤ 1/4, we may generate a standard Gaussian measurement matrix A^(1) ∈ R^{q_1×n} and observe y^(1) via y^(1) = sign(A^(1) x + τ^(1)), where (τ^(1))_i ∼ R·N(0, 1) for each i ∈ [q_1]. Then Theorem 8 implies that the first-round error bound holds with probability at least 1 − e^{1−t}. Since x ∈ V_{s_1,s_2}(R) and x^1 is the best approximation to x − x^0 in the set V_{s_1,s_2}(R), the triangle inequality indicates that ‖x − x^1‖_2 ≤ 2ηR. We then regard the signal x − x^1 ∈ U_{s_1,s_2} ∩ 2ηR·B^n =: V_{s_1,s_2}(2ηR) as the target signal to be reconstructed and take another q_2 measurements via y^(2) = sign(A^(2)(x − x^1) + τ^(2)), where (τ^(2))_i ∼ 2ηR·N(0, 1) for each i ∈ [q_2]. Applying Theorem 8 again indicates that the corresponding bound holds with high probability, provided q_2 ≥ Cη^{−4}γ^{−2}(s_1 + k^2 s_2)(t + log(n)). A discussion similar to the proof of Eq. (19) then shows that ‖x − x^2‖_2 ≤ (2η)^2 R. Repeating this process, in the l-th iteration we have that ‖x − x^l‖_2 ≤ (2η)^l R holds with high probability. From the above discussion, it is not hard to see that the recovery error converges to 0 at an exponential rate as the number of iterations increases, provided each q_l is chosen large enough. We summarize the quantization process in Algorithm 1 and the recovery process in Algorithm 2 to fit the common paradigm of compressed sensing.
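The adaptive loop can be sketched as follows. This is our schematic rendering of Algorithms 1-2, not the paper's code: the 1-bit sampling and decoding step Δ is replaced by a stand-in decoder that simply attains the per-round error guarantee, so the sketch isolates how rescaling the dither by 2η each round yields geometric error decay.

```python
import math

def adaptive_recover(x, R, eta, L, decode):
    """Sketch of the adaptive scheme: at step k the residual x - x^{k-1}
    lies in a ball of radius (2*eta)^{k-1} * R, so it is measured with a
    dither of matching scale, decoded, and added back to the estimate."""
    xk = [0.0] * len(x)
    scale = R
    for _ in range(L):
        residual = [a - b for a, b in zip(x, xk)]
        est = decode(residual, scale)     # assumed error <= 2*eta*scale
        xk = [a + b for a, b in zip(xk, est)]
        scale *= 2 * eta                  # radius bound for the next round
    return xk

# Stand-in decoder: returns the residual up to an error of norm eta*scale,
# i.e., it attains the per-round guarantee of Theorem 8 without sampling.
def mock_decode(residual, scale, eta=0.25):
    n = len(residual)
    return [r - eta * scale / math.sqrt(n) for r in residual]

x = [0.3, -0.7, 0.0, 0.5]
R, eta, L = 1.0, 0.25, 5
xL = adaptive_recover(x, R, eta, L, mock_decode)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xL)))
```

With these values the final error is η(2η)^{L−1}R = 0.015625, comfortably inside the (2η)^L R = 0.03125 bound, illustrating the geometric contraction.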
It should be noted that computing the projection P_{V_{s_1,s_2}(R)}(·) is not easy. In practice, we may consider the convex set K_{s_1,s_2} and replace P_{V_{s_1,s_2}(R)}(·) with P_{K_{s_1,s_2}}(·), so that it can be computed by existing convex optimization software, such as the CVX toolbox. Since R is of the same order as ‖x‖_2, this modification is acceptable, though it relaxes the approximate sparsity constraint to some extent. Moreover, for ease of application, we usually fix q_i = q (i ∈ [L]) with a well-chosen q, so that m = Lq in practice.
The proposed adaptive quantization/recovery scheme is guaranteed by the following theorem which is proved in Appendix B.
Theorem 10: For any x ∈ V_{s_1,s_2}(R) and 0 < η ≤ 1/4, let α = min{|x_i| : i ∈ supp(x)} and let {q_1, q_2, · · · , q_L} be defined as in Algorithm 1, where L denotes the maximum number of iterations. Suppose we have m := Σ_{i=1}^L q_i independent measurement vectors a_i ∈ R^n (i ∈ [m]) populated by independent standard Gaussian entries. Then with probability at least 1 − Le/n, the output x^L of Algorithm 2 satisfies the stated error bound.

IV. EXPERIMENTS
In this section, we conduct a series of experiments on synthetic and real-world data to test the validity of the proposed algorithms and the correctness of the theoretical results. For simplicity, we use the same names for models and algorithms, calling the proposed nonadaptive algorithm L1-TV and the adaptive algorithm Adap L1-TV (i.e., Algorithms 1 and 2). To show the superiority of the proposed model, we compare L1-TV with the algorithm solving only the L1-penalty based optimization model and the algorithm solving only the TV-penalty based optimization model. Specifically, we employ the algorithm proposed in [4] to solve the L1-penalty based model and call it L1 for simplicity. For the TV-penalty based model there is no existing algorithm, so we set the L1 penalty parameter in L1-TV to zero and name the resulting algorithm TV.
In Section IV-A, we conduct simulations to explain how to select proper parameters for L1-TV and Adap L1-TV. In Section IV-B and Section IV-C, we apply the proposed algorithms to ECG signal recovery and gray image recovery, respectively. For the underlying signal x and its estimate x̂, we define the direction error as DirError := ‖x/‖x‖_2 − x̂/‖x̂‖_2‖_2 and the relative error as RelError := ‖x − x̂‖_2/‖x‖_2. Note that in 1-bit quantization, it is meaningful to acquire sign measurements at high, super-Nyquist rates since the measurements are cheap and fast. Thus, we conduct experiments at oversampling rates m/n greater than 1. We perform all experiments on a PC with 32 GB of RAM and an Intel Core i7-8700M (3.2 GHz) and always fix t = 10 in assumption (13).
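The two error metrics are straightforward to implement; a small sketch (the helper names are ours):

```python
import math

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

def dir_error(x, xhat):
    """DirError = ||x/||x||_2 - xhat/||xhat||_2||_2: compares directions only."""
    nx, nh = norm2(x), norm2(xhat)
    return norm2([a / nx - b / nh for a, b in zip(x, xhat)])

def rel_error(x, xhat):
    """RelError = ||x - xhat||_2 / ||x||_2: compares direction and magnitude."""
    return norm2([a - b for a, b in zip(x, xhat)]) / norm2(x)
```

For example, with x = [3, 4] and x̂ = [6, 8] the direction error is 0 (same direction) while the relative error is 1 (the magnitude is off by a factor of two), which is why DirError is used for the dither-free model (1) and RelError for the thresholded model (14).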

A. SYNTHETIC EXPERIMENTS
This part presents synthetic experiments to clarify the parameter settings and verify our theoretical results. In the first experiment, three signals are generated at random, as shown in FIGURE 1 (a)-(c). Next, we select the value of the parameter λ_2 over synthetic experiments. First, we generate four signals of different lengths (n = 500, n = 1000, n = 1500, and n = 2000) at random, termed signal 1, signal 2, signal 3, and signal 4, respectively (they are generated in a similar way to FIGURE 1 (a)-(c); due to limited space, we do not plot them here). We fix the constant c = 0.075 and the parameters λ_1 = 2c√((t + log(n))/m) and λ_2 = kλ_1, and test the performance of the proposed method for different values of k. With the above parameter setting, we conduct an experiment to compare the proposed L1-TV algorithm with L1 and TV by calculating the direction recovery errors at different sampling rates. FIGURE 3 (a) plots the original signal and FIGURE 3 (b) reports the average direction errors over 50 trials. As expected, L1-TV performs better than the compared algorithms since it penalizes both sparsity and gradient sparsity. The above experiments only consider recovering the directions of signals. In the following, we further test the performance of the proposed algorithms in recovering the full signal (both direction and magnitude). First, we give a convenient choice for the standard deviation σ of the dither vector τ in the measurement model (14). Though Theorem 8 provides a theoretical guarantee for the case where σ equals the known upper bound R on ‖x‖_2, this setting may not be optimal in practice. In fact, σ should be smaller than the true value of ‖x‖_2. To see this, we generate three signals of different lengths (n = 500, n = 1000, n = 1500) at random, termed signal 1, signal 2, and signal 3, respectively (not plotted due to limited space).
Then we fix m = 20n and test the performance of L1-TV for different values of σ/‖x‖_2. The relative errors over 100 trials are plotted in FIGURE 4. It can be seen that either too large or too small values of σ tend to result in poor performance. We find that L1-TV performs best when σ is about 0.2‖x‖_2 and behaves acceptably in the range [0.1‖x‖_2, ‖x‖_2]. This observation inspires us to choose the value of σ as follows: (i) For synthetic experiments, we first generate 100 random vectors with the same scale and sparsity as the underlying signals and compute the average value R̂ of their magnitudes; we then empirically set σ = 0.2R̂. (ii) For experiments on image processing, since the maximum value of the underlying signal is known, e.g., 255, we may assume R̂ = 255√n/2 and simply set σ = 255√n/10, where n denotes the number of pixels. Besides, other methods such as cross-validation can also be employed to tune σ and may also obtain good performance.
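The σ-selection heuristic above can be written down directly; a sketch with our own function names, following rules (i) and (ii) from the text:

```python
import math

def sigma_from_samples(sample_signals):
    """Rule (i): average magnitude R_hat of sample vectors, sigma = 0.2 * R_hat."""
    mags = [math.sqrt(sum(t * t for t in s)) for s in sample_signals]
    r_hat = sum(mags) / len(mags)
    return 0.2 * r_hat

def sigma_for_image(n, max_pixel=255):
    """Rule (ii): assume R_hat = max_pixel * sqrt(n) / 2 for an n-pixel patch,
    so sigma = 0.2 * R_hat = max_pixel * sqrt(n) / 10."""
    return max_pixel * math.sqrt(n) / 10
```

For a 32 × 32 image patch (n = 1024), rule (ii) gives σ = 255·32/10 = 816.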
With the above parameter settings, we compare the proposed algorithms, L1-TV and Adap L1-TV, with the competing algorithms, L1 and TV. For Adap L1-TV, we simply set q_1 = q_2 = · · · = q_10 = 3n as the sampling complexity of each iteration. For a randomly generated signal x of dimension 1000, we calculate the average relative errors over 100 random trials at each sampling rate and plot the results in FIGURE 5. One can easily observe that L1-TV outperforms both L1 and TV at all sampling rates. Besides, the adaptive quantization/recovery scheme (i.e., Adap L1-TV) significantly improves the recovery performance over L1-TV, which coincides with the conclusion of Theorem 10.

B. APPLICATION TO ECG SIGNALS RECOVERY
In this part, we employ the proposed method for ECG signal recovery. The ECG signals were collected from the MIT-BIH Arrhythmia Database [30] and are windowed to 1000 samples in the experiment. FIGURE 6 (a) displays a segment of such an ECG signal, which is used as the underlying signal. The upper bound R on the magnitude is selected by cross-validation on the test set. The parameter λ_1 is set to λ_1 = 0.15√((t + log(n))/m), as suggested by the experimental results in Section IV-A. It remains to determine the ratio k = λ_2/λ_1. To this end, we fix m = 20n and calculate the average relative errors for different values of k. The results are plotted in FIGURE 6 (b), which indicates that k = 3 is an appropriate choice. For TV, a similar exhaustive test suggests choosing the regularization parameter 0.30√((t + log(n))/m). For L1, we follow the suggestion of [4], i.e., λ = √((t + log(n))/m). Under this parameter setting, we compare the proposed L1-TV and Adap L1-TV with L1 and TV at the sampling rate m/n = 30 and report the results in FIGURE 7. We observe that Adap L1-TV performs best, with a clear advantage. L1-TV also achieves better performance than either L1 or TV, which we attribute to its accounting for both signal sparsity and sparsity in the differences between consecutive components of the signal.

C. APPLICATION TO GRAY IMAGES RECOVERY
Next, we test the recovery performance of the proposed algorithms on gray images. Images of size 256 × 256, named Barbara, Lenna, Pepper, and House, are considered as examples (see FIGURE 8 (a)). Note that the recovery process is not performed directly on the whole image but on small disjoint segments. Specifically, we first decompose the original image into 64 blocks of size 32 × 32 and then restore these segments one by one. These segments are sparse in some bases, such as the discrete cosine transform (DCT) or wavelet basis. In this experiment, we use the MATLAB command dct2 to generate the desired sparse basis and then use different algorithms to reconstruct the induced sparse coefficients from thresholded binary measurements at sampling rates m/n = 10, m/n = 20, and m/n = 30. We set the maximum number of iterations K = 5 for Adap L1-TV. TABLE 2 reports the PSNR and SSIM values of the tested images obtained by the different algorithms. FIGURE 8 (b) to FIGURE 8 (e) show the recovered visual results of the different algorithms at the sampling rate m/n = 30. From both visual and quantitative perspectives, our algorithms possess more powerful recovery ability than the classic sparsity and gradient-sparsity based methods. Indeed, the superiority of L1-TV benefits from the combined use of the sparse and local-smooth properties of image data, while the outstanding performance of Adap L1-TV comes from the integrated use of an L1-TV regularizer and an adaptive measurement scheme.
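The block-wise processing (decompose a 256 × 256 image into 64 tiles of size 32 × 32, restore each, and reassemble) can be sketched as follows; the helper names are ours, and the per-tile reconstruction itself is omitted:

```python
def split_blocks(img, b):
    """Split an N x N image (list of rows) into (N/b)^2 disjoint b x b tiles,
    scanned row-major, as done before per-segment reconstruction."""
    N = len(img)
    return [[row[c:c + b] for row in img[r:r + b]]
            for r in range(0, N, b) for c in range(0, N, b)]

def merge_blocks(tiles, N, b):
    """Reassemble an N x N image from tiles produced by split_blocks."""
    img = [[0] * N for _ in range(N)]
    per_row = N // b
    for k, tile in enumerate(tiles):
        r0, c0 = (k // per_row) * b, (k % per_row) * b
        for i in range(b):
            for j in range(b):
                img[r0 + i][c0 + j] = tile[i][j]
    return img

# 256 x 256 synthetic image -> 64 tiles of 32 x 32, then a lossless round trip.
img = [[(i * 256 + j) % 251 for j in range(256)] for i in range(256)]
tiles = split_blocks(img, 32)
```

In the actual pipeline, each 32 × 32 tile would be reconstructed from its own thresholded binary measurements (in a DCT sparsifying basis) before `merge_blocks` reassembles the image.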

V. CONCLUSION AND FUTURE WORK
This paper proposed a recovery method for 1-bit compressed sensing based on the L1-TV regularizer, together with an efficient solving algorithm. We also provided theoretical guarantees for the proposed method through detailed analysis. Compared with existing methods, our algorithms better process signals that are both element sparse and gradient sparse. In addition, the proposed adaptive algorithm achieves a significantly faster decay rate of the recovery error.
It is worth mentioning that some signals are sparse under particular linear transformations rather than in the canonical basis. In future work, we may consider general linear transforms in the L1-norm penalty to fit more data in applications. It would be interesting to provide a corresponding theoretical analysis and to design an efficient algorithm for this case.
For any u ∈ ∂g(u*), using the definition of SGN(·), we can easily check the stated relation. From Lemma 1, we know u* = arg min_{u ∈ R^n} g(u). Since the objective function g(u) is strictly convex, it admits u* as its unique minimizer. The optimality condition implies that 0 ∈ ∂g(u*).
Therefore, using Lemma 1, the optimal solution follows, and the proof is completed.
Proof of Lemma 6: Since ‖x‖_1/‖x‖_2 ≤ √s_1, we have the first bound. Besides, since x_n ≤ ‖x‖_2 ≤ R, we have the second bound. The proof is completed.

B. PROOFS OF THE PROPOSED THEOREMS
Proof of Theorem 4: The proof is fundamentally built upon the following observation relating (1/m)A^T y and γx, which is shown in [29].
In particular, looking at the last coordinate, this inequality yields the desired bound on ‖x‖_2^2.