ADMM–Net: A Deep Learning Approach for Parameter Estimation of Chirp Signals Under Sub-Nyquist Sampling

Parameter estimation of chirp signals plays an important role in the field of radar countermeasures. Compressed sensing (CS) based sub-Nyquist sampling and parameter estimation methods alleviate the pressure on hardware systems to acquire and process chirp signals with large time-bandwidths. In this paper, a framework based on the fractional Fourier transform (FrFT) and alternating direction method of multipliers network (ADMM-Net) is proposed to realize chirp signal parameter estimation under sub-Nyquist sampling. The whole framework is composed of multiple parallel ADMM-Nets, where each ADMM-Net is defined over a data flow graph, which is derived from the iterative procedures of the ADMM algorithm for optimizing a CS-based $p$-order FrFT spectral estimation model. The chirp rate and central frequency of chirp signals are obtained through a two-dimensional search on the spectrum image output by the network group. Experiments demonstrate that the proposed ADMM-Net-based method can achieve higher estimation accuracy and computational efficiency at lower signal-to-noise ratios and sampling ratios than traditional CS methods. We also demonstrate that the proposed ADMM-Net-based framework has strong generalization ability for multi-component chirp signals. Furthermore, we generalize ADMM-Net to GADMM-Net, in which the activation function is data-driven instead of model-driven. Experiments demonstrate that GADMM-Net significantly improves on the basic ADMM-Net and achieves higher spectral resolution with faster computation speed.


I. INTRODUCTION
Linear frequency-modulated (LFM) signals, which are also called chirp signals, are widely employed in various applications, such as radar [1], sonar [2], ultrasonics [3], and telecommunication [4]. Accurate estimation of the parameters of chirp signals, i.e., central frequency $f_0$ and chirp rate $k$, is essential in passive detection technology [5]. However, in the field of radar countermeasures, chirp signals usually have a large time-bandwidth, and the estimation of their modulation parameters requires an extremely high sampling rate, which puts tremendous strain on the hardware systems used for signal acquisition, transmission, and processing. Although some scholars have proposed the method of under-sampling combined with dechirping [6], which reduces the sampling frequency requirement, this method suffers from frequency ambiguity [7]. Therefore, finding a new signal parameter estimation algorithm that reduces the pressure caused by a large time-bandwidth is an urgent problem to be solved.

The associate editor coordinating the review of this manuscript and approving it for publication was Pietro Savazzi.
Compressed sensing (CS) has brought new ideas to chirp signal processing. Most applications of CS theory for signal processing are for accurate reconstruction of signals [8]-[11]. However, in the application of CS theory for signal parameter estimation, with the help of a specific parameter matching dictionary, the estimation task can be achieved without complete reconstruction of signals [12]-[17].
CS has been a popular technique to acquire a decomposition expression of the signal from sampled measurements by solving an under-determined linear system [18].

CS incorporates data compression by a measurement matrix $\Phi \in \mathbb{C}^{M \times N}$ ($M \ll N$) and the solution of the coefficient vector $\theta \in \mathbb{C}^{N}$ of signal $x \in \mathbb{C}^{N}$ on the sparse basis $\Psi \in \mathbb{C}^{N \times N}$ from measurements $y \in \mathbb{C}^{M}$ by a non-linear mapping $f$ with lowest calculation error:

$$y = \Phi x = \Phi \Psi \theta, \qquad \hat{\theta} = f(y).$$

The measurement matrix $\Phi$, sparse basis $\Psi$, and non-linear mapping $f(\cdot)$ are three key factors in CS. In this work, we focus on learning a sparse basis and non-linear mapping.

A. MEASUREMENT MATRIX
The measurement matrix is commonly required to satisfy some conditions, such as small autocorrelation coefficients [19] or small restricted isometry constant (RIC) [20], [21], to avoid aliasing effects by under-sampling. Random Gaussian matrices, partial Fourier matrices, randomly permuted coded diffraction, and Walsh-Hadamard operators are popular measurement matrices for different practical tasks.
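The compression step can be sketched as follows. This is a minimal illustration, assuming a complex random Gaussian measurement matrix with entries of variance $1/M$ and a unitary inverse DFT matrix as a stand-in sparse basis; the function names, the sizes, and the choice of basis are assumptions made for the example, not the settings used in this work.

```python
import numpy as np

def gaussian_measurement_matrix(m, n, rng):
    """Complex random Gaussian matrix with i.i.d. entries of variance 1/m."""
    real = rng.standard_normal((m, n))
    imag = rng.standard_normal((m, n))
    return (real + 1j * imag) / np.sqrt(2.0 * m)

def compress(phi, psi, theta):
    """Sub-Nyquist measurements y = Phi @ Psi @ theta."""
    return phi @ (psi @ theta)

rng = np.random.default_rng(0)
n, m = 64, 16                                      # signal length and number of measurements (M << N)
psi = np.fft.ifft(np.eye(n), axis=0) * np.sqrt(n)  # unitary inverse DFT as a stand-in sparse basis
theta = np.zeros(n, dtype=complex)
theta[5] = 1.0                                     # one active coefficient (sparse vector)
phi = gaussian_measurement_matrix(m, n, rng)
y = compress(phi, psi, theta)                      # M compressed measurements
```

Only `y`, `phi`, and `psi` are available to the reconstruction stage; the sparse vector `theta` is what the non-linear mapping has to recover.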

B. SPARSE BASIS
The traditional sparse representation of chirp signals decomposes the signal in a specific transformation domain and expresses it with known basis functions in the form of an atom dictionary or a transform matrix. Dictionary-based bases [12]-[15] usually have local sparsity, which can represent the local time-frequency information of chirp signals. However, because the super-resolution estimation of parameters greatly increases the number of atoms in the dictionary, a high computational complexity is required. Alternatively, because single-frequency signals and narrow-band chirp signals are sparse in the frequency domain, the inverse Fourier transform matrix [16] can be selected as the sparse basis, but it is not suitable for wide-band chirp signals. Based on the energy concentration property of chirp signals in the fractional Fourier transform (FrFT) domain of the proper order [22], a sparse dictionary is proposed in Ref. [17], where the discrete inverse FrFT (DIFRFT) matrices of different orders are adopted as the atoms. The solution of $\theta$ under each atom is essentially a $p$-order FrFT spectrum estimation model, which provides access to the detection and parameter estimation of chirp signals based on the FrFT spectrum [23]. Besides, the scale of the DIFRFT dictionary is much smaller than that of the time-frequency dictionary.

C. NON-LINEAR RECONSTRUCTION MAPPING
The constrained standard CS model [18] obtains $\hat{\theta} = f(y)$ as the solution of an $\arg\min$ problem in which $P(\theta)$ is a penalty term enforcing the transform sparsity of the signal, i.e., a regularization function derived from the data prior, e.g., the $\ell_q$-norm ($0 \le q \le 1$) for a sparse prior. A wide variety of algorithms have been developed to optimize the model, such as convex relaxation algorithms based on the $\ell_1$-norm [24], [25], greedy algorithms based on the $\ell_0$-norm [26], [27], and non-convex optimization algorithms [28], [29]. Although they are effective and come with many theoretical guarantees, most of them lack computational efficiency, and determining the numerous uncertainties in the model, such as regularization parameters and the parameters involved in the optimization algorithm, is still a challenging task.
Recently, deep neural network (DNN)-based supervised learning methods have shown success in signal parameter estimation, such as direction of arrival estimation [30], [31] and channel estimation [32], [33], where a DNN reconstructs a line spectral estimation model based on measurement datasets. The basic idea of these works is to use a DNN to directly model the non-linear mapping by learning network parameters from massive training data. However, although conventional data-driven DNNs are computationally efficient, they have restrictive receptive fields. They use conventional network architectures as black boxes without explicitly modeling or investigating the domain knowledge, such as the transformation model or signal sparsity. Therefore, it is difficult for a conventional black box network to automatically learn the transform sparsity.
The unrolling method is a discriminative learning method that unfolds an iterative optimization algorithm into a hierarchical architecture. This category of methods links the conventional model-based approach to the deep learning approach. Model-based networks focus on learning a deep cascade of multiple sub-networks by introducing a data consistency (DC) layer. Each sub-network generates a prediction using the predicted result of the preceding sub-network. These unrolling networks are mainly used in the field of image reconstruction, e.g., [34] unfolded an AMP algorithm [11] for CS image recovery, where the denoising operator in each AMP iteration was replaced with a very deep CNN. Sun et al. [36] and Yang et al. [37] presented an ADMM-Net derived from the iterative procedure of the alternating direction method of multipliers (ADMM) algorithm [37] to solve a CS-MRI problem. Gregor and LeCun [38] proposed the learned iterative soft thresholding algorithm (LISTA) to approximate the sparse codes of the $\ell_1$ sparse regularization problem in a fixed number of steps. Wang et al. [39] and Xin et al. [40] further unfolded the iterative hard thresholding algorithm into feed-forward neural networks for an $\ell_0$-regularized sparse coding model to obtain a sparser solution.
Inspired by the excellent performance of unrolling methods in image reconstruction, we propose an effective model-based deep learning method for parameter estimation of chirp signals under sub-Nyquist sampling. The proposed method is based on an $\ell_1$-norm CS model with the DIFRFT dictionary as the sparse basis. We then design an ADMM iterative algorithm to solve this CS model. By unrolling the ADMM iterative procedures, we derive a corresponding version of ADMM-Net, and the FrFT of the signal under each $p$-order inverse FrFT matrix is obtained. Finally, the optimal transformation order is obtained through a two-dimensional search on the FrFT spectrum image, which yields the estimates of the chirp rate and central frequency of the chirp signal.
Contributions: The main contributions of this work can be summarized as follows:
1) We propose a novel deep learning-based framework for chirp signal parameter estimation under sub-Nyquist sampling. The whole framework is composed of multiple parallel ADMM-Nets, where each ADMM-Net is defined over a data flow graph, which is derived from the iterative procedures in the ADMM algorithm for optimizing a CS-based $p$-order FrFT spectral estimation model.
2) Extensive experiments demonstrate that the proposed ADMM-Net-based method can achieve higher estimation accuracy and computational efficiency at lower signal-to-noise ratios (SNRs) and sampling ratios compared with traditional CS methods. We also demonstrate that the proposed ADMM-Net-based framework has strong generalization ability for multi-component chirp signals.
3) We further extend the ADMM-Net to a generalized one (GADMM-Net), where the activation function is data-driven instead of model-driven. Experiments demonstrate that GADMM-Net significantly improves on the basic ADMM-Net and achieves higher spectral resolution with higher computational speed.

The remainder of the paper consists of five parts: Section II introduces the sparse representation based on FrFT. Section III presents the details of unrolling ADMM. Section IV presents an application of the proposed framework in parameter estimation under sub-Nyquist sampling. Section V presents the results of simulations to evaluate the performance of the proposed framework with respect to conventional ones, and then, a GADMM-Net structure is proposed to improve the performance of the algorithm. Finally, Section VI concludes the paper.

II. SPARSE REPRESENTATION BASED ON FrFT
In this section, we first study the sparsity of chirp signals in the FrFT domain and then introduce the DIFRFT dictionary.

A. FrFT SPARSITY
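A mono-component chirp signal $s(t)$ with amplitude 1 and duty ratio 1 can be expressed as

$$s(t) = \exp\left(j2\pi f_0 t + j\pi k t^2\right), \quad -T/2 \le t \le T/2,$$

where $T$ is the pulse width. Referring to the Fourier transform expression, the FrFT expression of $s(t)$ can be defined as

$$S_p(\mu) = \mathrm{FrFT}[s(t)] = \int_{-\infty}^{\infty} s(t) K_\alpha(t, \mu)\, dt,$$

where $p$ is the transformation order of the FrFT, $\alpha$ is the rotation angle of the signal on the time-frequency plane, $\alpha = p\pi/2$, and $\mathrm{FrFT}[\cdot]$ is the FrFT operator; $K_\alpha(t, \mu)$ represents the kernel function of the FrFT, which for $\alpha \ne n\pi$ is expressed as

$$K_\alpha(t, \mu) = A_\alpha \exp\left(j\pi\left(t^2 \cot\alpha - 2t\mu \csc\alpha + \mu^2 \cot\alpha\right)\right),$$

where $A_\alpha = \sqrt{1 - j\cot\alpha}$. By substituting the chirp signal into the FrFT definition, the FrFT expression of the signal can be written as

$$S_p(\mu) = A_\alpha \exp\left(j\pi\mu^2\cot\alpha\right) \int_{-T/2}^{T/2} \exp\left(j\pi(k + \cot\alpha)t^2 + j2\pi(f_0 - \mu\csc\alpha)t\right) dt.$$

Let rotation angle $\alpha = \alpha_0$, $\cot\alpha_0 = -k$, and $\alpha_0 = \pi p_0/2$; then, the quadratic phase term vanishes and the FrFT expression of a chirp signal at rotation angle $\alpha_0$ can be calculated as

$$S_{p_0}(\mu) = A_{\alpha_0} T \exp\left(j\pi\mu^2\cot\alpha_0\right) \mathrm{sinc}\left((f_0 - \mu\csc\alpha_0)T\right), \quad \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$

Therefore, the FrFT of a chirp signal behaves as an impulse function when rotated at an angle of $\alpha = \alpha_0 = -\mathrm{arccot}(k)$, and the impulse peak position is $\mu = \mu_0 = f_0 \sin\alpha_0$. Accordingly, the estimation process of chirp rate $k$ and central frequency $f_0$ of the chirp signal can be described as

$$\{\hat{p}_0, \hat{\mu}_0\} = \arg\max_{p, \mu} |S_p(\mu)|^2, \qquad \hat{k} = -\cot\left(\hat{p}_0\pi/2\right), \qquad \hat{f}_0 = \hat{\mu}_0 \csc\left(\hat{p}_0\pi/2\right). \tag{8}$$

In discrete form, the FrFT is computed as

$$S_p[m] = \sum_{n} K_\alpha[m, n]\, s[n], \tag{9}$$

where the default length of $S_p[m]$ is $N$, and $K_\alpha[m, n]$ is the discrete kernel defined in Eqn. (10), in which $B_\alpha = \sqrt{S(\sin\alpha - j\cos\alpha)/N}$ with $S = \mathrm{sgn}(\sin\alpha)$. Eqn. (9) can be written in the form of a transformation matrix

$$S_p = K_\alpha s, \tag{11}$$

where $S_p \in \mathbb{C}^{N}$ and $s \in \mathbb{C}^{N}$ are the vectors of spectrum and signal samples, and $K_\alpha \in \mathbb{C}^{N \times N}$ is the matrix with entries $K_\alpha[m, n]$. The kernel functions of the FrFT and inverse FrFT (iFrFT) have the following characteristics: the inverse kernel is the conjugate of the forward kernel, $K_{-\alpha}(t, \mu) = K_\alpha^{*}(t, \mu)$, and the kernels are orthogonal. Therefore, the iFrFT of a chirp signal can be understood as the linear expansion of chirp signal $s(t)$ in the space with the inverse transformation kernel $K_{-\alpha}(t, \mu)$ as the basis function:

$$s(t) = \int_{-\infty}^{\infty} S_p(\mu) K_{-\alpha}(t, \mu)\, d\mu, \tag{15}$$

where $K_{-\alpha}$ is still an orthogonal basis for the $\mu$ domain. At this point, we can construct a dictionary based on $K_\alpha^{H}$, and obviously, when $N$ is fixed, the complexity of the dictionary, $O(n)$, is uniquely determined by the density of $p$, which is much smaller than that of the dictionary with parameter matching, $O(n^r)$, $r \ge 2$.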
In this work, we use the DIFRFT dictionary as the sparse basis for CS methods.
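The mapping between the chirp parameters and the location of the FrFT peak described in this section can be sketched as follows. This is a minimal illustration in normalized (dimensionless) units, assuming only the relations $\cot\alpha_0 = -k$, $\alpha_0 = \pi p_0/2$, and $\mu_0 = f_0\sin\alpha_0$ stated above; the function names are hypothetical, and any scaling by the sampling frequency or signal length is deliberately left out.

```python
import math

def order_and_peak_from_params(k, f0):
    """Forward map (k, f0) -> (p0, mu0) using cot(alpha0) = -k and mu0 = f0 * sin(alpha0)."""
    alpha0 = math.atan2(1.0, -k)   # angle in (0, pi) whose cotangent equals -k
    p0 = 2.0 * alpha0 / math.pi    # transformation order, alpha0 = p0 * pi / 2
    mu0 = f0 * math.sin(alpha0)    # position of the impulse peak in the FrFT domain
    return p0, mu0

def params_from_order_and_peak(p0, mu0):
    """Inverse map (p0, mu0) -> (k, f0), i.e., the parameter read-out after the peak search."""
    alpha0 = p0 * math.pi / 2.0
    k = -math.cos(alpha0) / math.sin(alpha0)   # k = -cot(alpha0)
    f0 = mu0 / math.sin(alpha0)                # f0 = mu0 * csc(alpha0)
    return k, f0

p0, mu0 = order_and_peak_from_params(0.5, 0.25)
k_hat, f0_hat = params_from_order_and_peak(p0, mu0)
```

With $k = 0$ the forward map returns $p_0 = 1$, which is the ordinary Fourier transform, as expected for a single-frequency signal.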

III. ADMM-NET FOR CS MODEL
Alternating direction method of multipliers (ADMM) [37] is a widely utilized variable splitting algorithm for solving the energy minimization problem. It adopts the augmented Lagrangian function of the given model and splits variables into sub-groups, which can be alternately optimized by solving a few simple subproblems.

A. ADMM SOLVER
Tibshirani [42] proposed the least absolute shrinkage and selection operator (LASSO) model in 1996, adding an $\ell_1$-norm regularization term to the original fidelity term, which is more conducive to selecting the ideal solution. The LASSO model is

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{2}\|A\theta - y\|_2^2 + \lambda\|\theta\|_1, \tag{16}$$

where $A = \Phi\Psi$ and $\lambda > 0$ usually; by adjusting $\lambda$, the sparsity of the sparse solution can be adjusted, and good noise suppression performance can be achieved. In 2012, Boyd et al. [37] systematically proposed the operation steps to solve the LASSO problem based on the ADMM method, combining the dual decomposition and augmented Lagrange multiplier methods to solve the optimization problem step by step, which improved the computational efficiency and yielded robust convergence performance. The advantage of the ADMM method is that a complex optimization problem is decomposed into several sub-problems, and the global performance can be harmonized while distributed optimization is realized to get closer to the optimal solution. According to ADMM algorithm theory, LASSO can be decomposed into two subproblems, one for the fidelity term and one for the regularization term. The optimization problem can be solved efficiently by introducing auxiliary variables $z = [z[1], z[2], \ldots, z[N]]^T$ in the transform domain, making Eqn. (16)

$$\hat{\theta} = \arg\min_{\theta, z} \frac{1}{2}\|A\theta - y\|_2^2 + \lambda\|z\|_1 \quad \text{s.t.} \quad z = \theta.$$

Hence, an optimization equation with augmented Lagrange terms is established:

$$L_\rho(\theta, z, \alpha) = \frac{1}{2}\|A\theta - y\|_2^2 + \lambda\|z\|_1 + \frac{\rho}{2}\|\theta - z + \alpha\|_2^2 - \frac{\rho}{2}\|\alpha\|_2^2,$$

where $\alpha$ represents the (scaled) Lagrangian multipliers and $\rho$ is the penalty parameter. The introduced Lagrange multiplier term adds an equality constraint to the optimization, and the objective solution is limited to the corresponding feasible region, which improves the solution efficiency and the robustness of the algorithm. In this case, the ADMM algorithm alternately optimizes $\{\theta, z, \alpha\}$ by solving the following subproblems:

$$\begin{cases} \theta^{(k)} = \arg\min_{\theta} \frac{1}{2}\|A\theta - y\|_2^2 + \frac{\rho}{2}\|\theta - z^{(k-1)} + \alpha^{(k-1)}\|_2^2, \\ z^{(k)} = \arg\min_{z} \lambda\|z\|_1 + \frac{\rho}{2}\|\theta^{(k)} - z + \alpha^{(k-1)}\|_2^2, \\ \alpha^{(k)} = \alpha^{(k-1)} + \theta^{(k)} - z^{(k)}, \end{cases} \tag{19}$$

where $k$ denotes the $k$-th iteration. $\theta$ and $z$ are iteratively updated in alternate directions to achieve joint minimization, while the dual variable $\alpha$ is updated by the iteration values of $\theta$ and $z$. In the solution process, the Lagrange multiplier acts on each of $\theta$ and $z$ and then indirectly acts on $\alpha$ through the iteration of $\theta$ and $z$, thus improving the convergence rate and robustness. The three steps of Eqn. (19) are explained as follows:

1) RIDGE REGRESSION
According to the first equation in Eqn. (19), the optimization problem of variable $\theta$ is equivalent to the joint minimization problem of the fidelity term and Lagrangian term, which can be understood as a ridge regression problem, i.e., Eqn. (20), whose objective is written with the trace operator $\mathrm{Tr}[\cdot]$. It can be seen from Eqn. (20) that the minimized objective function is a convex function, so the optimal value can be obtained by setting the derivative of the objective function to zero:

$$\theta^{(k)} = \left(A^{H}A + \rho I\right)^{-1}\left(A^{H}y + \rho\left(z^{(k-1)} - \alpha^{(k-1)}\right)\right). \tag{21}$$

VOLUME 8, 2020

2) $\ell_1$ NORM REGULARIZATION
According to the second equation in Eqn. (19), the optimization problem of variable $z$ is equivalent to an $\ell_1$-norm regularization problem, which differs from the LASSO problem of Eqn. (16) in that the regularization is oriented to the augmented Lagrange term. The optimization problem for $z$ is solved in the same way as for $\theta$:

$$z^{(k)} = S_{\lambda/\rho}\left(\theta^{(k)} + \alpha^{(k-1)}\right),$$

where $S_{\lambda/\rho}(x_i)$ represents the soft thresholding of $x_i$, which is any element of $x$. When $x \in \mathbb{R}^{N}$, we can obtain

$$S_{\lambda/\rho}(x_i) = \mathrm{sign}(x_i)\max\left(|x_i| - \lambda/\rho,\, 0\right), \tag{23}$$

where $\mathrm{sign}(\cdot)$ denotes the sign function. A schematic diagram of the soft threshold operator $S_{\lambda/\rho}(x_R)$ for a real value $x_R$ is shown in Fig. 1(a). The real soft threshold can be regarded as a piecewise linear filter that zeroizes values with $|x_R| \le \lambda/\rho$ and linearly shrinks $x_R$ when $|x_R| > \lambda/\rho$. Based on the complex value characteristics of chirp signals in the time domain and transformation domain, we improve the existing real-valued soft threshold operator to its complex form [43], where any element $z_i$ in the complex vector $z$ is decomposed into a real part $\Re(z_i)$ and imaginary part $\Im(z_i)$, and the same complex decomposition is performed for $\theta$ and $\alpha$. The complex form of Eqn. (23) is obtained by taking derivatives with respect to $\Re(z)$ and $\Im(z)$:

$$S_{\lambda/\rho}(x_i) = \frac{x_i}{|x_i|}\max\left(|x_i| - \lambda/\rho,\, 0\right). \tag{24}$$

According to Eqn. (24), the complex soft threshold has the same phase retention as the real soft threshold. For convenience, the polar coordinate scheme of the soft threshold operator $S_{\lambda/\rho}(x_C)$ for a complex value $x_C$ is shown in Fig. 1(b). In Fig. 1(b), the magnitude of $x_C$ is $|x_C|$, its phase angle is $\angle x_C \in [0, 2\pi)$, and the phase angle of the complex $S_{\lambda/\rho}(x_C)$ is also $\angle x_C$. $|x|$ and $\angle x$ are sufficient to represent all the values of $x_C$. The real-valued soft threshold can also be considered as a special case of the complex-valued soft threshold at $\angle x \in \{0, \pi\}$.
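The complex-valued soft threshold described above can be sketched in a few lines. This is a minimal scalar illustration, assuming the operator shrinks the magnitude by the threshold and keeps the phase; the name `soft_threshold` and the argument `tau` (standing for $\lambda/\rho$) are chosen for the example.

```python
import cmath

def soft_threshold(x, tau):
    """Complex soft threshold: shrink |x| by tau and keep the phase of x."""
    mag = abs(x)
    if mag <= tau:
        return 0j                    # values inside the threshold are zeroed
    return x * (1.0 - tau / mag)     # same phase, magnitude reduced to |x| - tau

shrunk = soft_threshold(3 + 4j, 1.0)   # |x| = 5, so the magnitude becomes 4
zeroed = soft_threshold(0.3 + 0.4j, 1.0)
real_case = soft_threshold(-2.0, 0.5)  # a real input behaves like the real-valued operator
```

For a real input the function reduces to the real soft threshold, which matches the remark that the real operator is a special case of the complex one.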

3) DUAL VARIABLE UPDATE
The updating of $\theta$ and $z$ is the process of minimizing their combined values. In the complex-valued ADMM algorithm, the iteration of $\alpha$ is also the update process of the joint dual variables of $\theta$ and $z$, i.e.,

$$\alpha^{(k)} = \alpha^{(k-1)} + \theta^{(k)} - z^{(k)}. \tag{25}$$

The process is iterated in a Gauss-Seidel manner until the stop criterion is met, i.e., the set number of iterations is reached or the accuracy condition $\|r^{(k)}\|_2 \le \varepsilon$ holds, where $r^{(k)} = \theta^{(k)} - z^{(k)}$ denotes the primal residual and $\varepsilon$ is the precision threshold set based on experience.
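The three update steps of this subsection can be put together in a short solver. This is a sketch under stated assumptions rather than the implementation used in this work: it uses the scaled form of ADMM for LASSO, the names `admm_lasso`, `lam`, `rho`, and `eps` are illustrative, and the additional check on the dual residual is borrowed from standard ADMM practice because the primal residual alone can reach zero before the iterates have settled.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise complex soft threshold that keeps the phase of x."""
    mag = np.abs(x)
    scale = np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)
    return x * scale

def admm_lasso(A, y, lam, rho=1.0, max_iter=200, eps=1e-10):
    """Scaled-form ADMM for min 0.5*||A theta - y||^2 + lam*||theta||_1."""
    n = A.shape[1]
    z = np.zeros(n, dtype=complex)
    alpha = np.zeros(n, dtype=complex)
    AH = A.conj().T
    inv = np.linalg.inv(AH @ A + rho * np.eye(n))   # constant over iterations
    AHy = AH @ y
    for _ in range(max_iter):
        theta = inv @ (AHy + rho * (z - alpha))              # ridge regression step
        z_new = soft_threshold(theta + alpha, lam / rho)     # l1 regularization step
        alpha = alpha + theta - z_new                        # dual variable update
        primal = np.linalg.norm(theta - z_new)               # primal residual
        dual = rho * np.linalg.norm(z_new - z)               # dual residual (assumed extra check)
        z = z_new
        if primal <= eps and dual <= eps:
            break
    return z

# With A = I the LASSO solution is the soft threshold of y, which gives a simple check.
solution = admm_lasso(np.eye(3), np.array([3.0, 0.5, -2.0]), lam=1.0)
```

In the identity-matrix case the iterates approach $[2, 0, -1]$, i.e., the soft threshold of $y$ with threshold $\lambda = 1$.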

B. ADMM-NET
To design our deep ADMM-Net, we first map the ADMM iterative procedures in Eqn. (19) to a data flow graph. As shown in Fig. 2, this graph comprises nodes corresponding to different operations in ADMM and directed edges corresponding to the data flows between operations. In this case, the $k$-th stage of the data flow graph corresponds to the $k$-th iteration of the ADMM algorithm ($k = 1, \ldots, N_s$), where $N_s$ is the number of stages. The whole data flow graph consists of multiple repetitions of the above stage, corresponding to successive iterations in ADMM. In the $k$-th stage of the graph, there are three types of nodes mapped from three types of operations in ADMM, i.e., the coefficient solving layer ($X^{(k)}$), nonlinear transform layer ($Z^{(k)}$), and multiplier update layer ($M^{(k)}$) in Eqn. (19). Given an under-sampled signal, it flows over the graph, and the graph finally outputs a spectrum.

1) COEFFICIENT SOLVING LAYER $X^{(k)}$
This layer solves the sparse vector $\theta^{(k)}$ according to Eqn. (21). Given $\alpha^{(k-1)}$ and $z^{(k-1)}$, which are outputs of previous layers in stage $k-1$, the output of this layer is defined as

$$\theta^{(k)} = \left(A^{H}A + \rho^{(k)} I\right)^{-1}\left(A^{H}y + \rho^{(k)}\left(z^{(k-1)} - \alpha^{(k-1)}\right)\right), \tag{26}$$

where $y$ represents the input measurements and $\rho^{(k)}$ is a learnable penalty parameter in the $k$-th stage. To increase the network capacity, we do not constrain the parameters to be the same in different stages. The output $\theta^{(k)}$ of this layer is the input for the subsequent multiplier update layer ($\alpha^{(k)}$) and nonlinear transform layer ($z^{(k)}$) in stage $k$. In the first stage (i.e., $k = 1$), $\theta^{(1)} = \left(A^{H}A + \rho^{(1)} I\right)^{-1} A^{H}y$.

2) NONLINEAR TRANSFORM LAYER $Z^{(k)}$
This layer performs a nonlinear transform inspired by the complex-valued soft threshold operator $S_\tau(\cdot)$. Given $\theta^{(k)}$ and $\alpha^{(k-1)}$, the output of this layer is defined as

$$z^{(k)} = S_{\tau^{(k)}}\left(\theta^{(k)} + \alpha^{(k-1)}\right), \tag{27}$$

where $\tau^{(k)}$ is a learnable threshold parameter in the $k$-th stage.

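3) MULTIPLIER UPDATE LAYER $M^{(k)}$
This layer is defined by the Lagrangian multiplier updating procedure for $\alpha^{(k)}$ in Eqn. (25). Given the three inputs $\alpha^{(k-1)}$, $\theta^{(k)}$, and $z^{(k)}$, the output of this layer in stage $k$ is defined as

$$\alpha^{(k)} = \alpha^{(k-1)} + \eta^{(k)}\left(\theta^{(k)} - z^{(k)}\right),$$

where $\eta^{(k)}$ is a learnable parameter in the $k$-th stage.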

4) NETWORK PARAMETERS
These layers are organized in the data flow graph shown in Fig. 2. In the deep architecture, we aim to learn the following parameters: $\rho^{(k)}$ in the coefficient solving layer, $\tau^{(k)}$ in the nonlinear transform layer, and $\eta^{(k)}$ in the multiplier update layer. All of these parameters are taken as the network parameters to be learned. Obviously, the data set size required by the CS model-based network is much smaller than that of the data-driven network.
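The forward pass through the three layer types can be sketched as an unrolled loop with stage-wise parameters. This is an illustration under assumptions, not the trained network of this work: the function name `admm_net_forward` is hypothetical, the initial values $z^{(0)} = \alpha^{(0)} = 0$ are inferred from the first-stage expression of the coefficient solving layer, and returning the last nonlinear transform output as the spectrum is a choice made for the example.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise complex soft threshold that keeps the phase of x."""
    mag = np.abs(x)
    scale = np.maximum(1.0 - tau / np.maximum(mag, 1e-12), 0.0)
    return x * scale

def admm_net_forward(A, y, rho, tau, eta):
    """Run one stage per entry of (rho, tau, eta); each stage has its own parameters."""
    n = A.shape[1]
    AH = A.conj().T
    AHy = AH @ y
    z = np.zeros(n, dtype=complex)       # z^(0), assumed zero
    alpha = np.zeros(n, dtype=complex)   # alpha^(0), assumed zero
    for rho_k, tau_k, eta_k in zip(rho, tau, eta):
        # Coefficient solving layer X^(k)
        theta = np.linalg.solve(AH @ A + rho_k * np.eye(n), AHy + rho_k * (z - alpha))
        # Nonlinear transform layer Z^(k); uses alpha^(k-1)
        z = soft_threshold(theta + alpha, tau_k)
        # Multiplier update layer M^(k)
        alpha = alpha + eta_k * (theta - z)
    return z

A = np.eye(2)
y = np.array([4.0, 0.2])
one_stage = admm_net_forward(A, y, rho=[1.0], tau=[0.5], eta=[1.0])
two_stage = admm_net_forward(A, y, rho=[1.0, 1.0], tau=[0.5, 0.5], eta=[1.0, 1.0])
```

In a training setting the lists `rho`, `tau`, and `eta` would hold the learnable values; here they are fixed numbers so that the data flow is easy to follow.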

C. NETWORK TRAINING

1) LOSS FUNCTION
The sparse vectors from fully sampled data are taken as the ground-truth labels $\theta_{gt}$ and the under-sampled measurement data $y$ as the input. Then, a training set $\Gamma$ is constructed containing pairs of under-sampled data and ground-truth labels. We choose the averaged normalized root mean square error (NRMSE) as the loss function to train the networks. Given pairs of training data, the loss between the network output and ground-truth is defined as

$$E(\Theta) = \frac{1}{|\Gamma|} \sum_{(y, \theta_{gt}) \in \Gamma} \frac{\left\|\hat{\theta}(y, \Theta) - \theta_{gt}\right\|_2}{\left\|\theta_{gt}\right\|_2},$$

where $\hat{\theta}(y, \Theta)$ is the network output based on network parameters $\Theta$ and under-sampled measurement data $y$. We learn the network parameters $\Theta$ (the stage-wise parameters together with $\rho^{(N_s)}$) by minimizing the loss w.r.t. them using L-BFGS [44], and compute the gradients of the loss function $E(\Theta)$ w.r.t. the parameters using backpropagation over the network.
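The averaged NRMSE loss can be sketched as follows. This is a minimal illustration, assuming the loss is the mean over training pairs of the error norm divided by the label norm; the function names are hypothetical and plain Python lists stand in for the network outputs and labels.

```python
import math

def nrmse(theta_hat, theta_gt):
    """Normalized root mean square error between one output and its label."""
    err = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(theta_hat, theta_gt)))
    ref = math.sqrt(sum(abs(b) ** 2 for b in theta_gt))
    return err / ref

def averaged_nrmse(outputs, labels):
    """Average of the per-sample NRMSE over the training set."""
    return sum(nrmse(o, g) for o, g in zip(outputs, labels)) / len(labels)

labels = [[3, 4j], [3, 4j]]
outputs = [[3, 4j], [0, 0]]            # first sample exact, second sample all zeros
loss = averaged_nrmse(outputs, labels)
```

An all-zero output gives a per-sample NRMSE of 1, so this toy training set of one exact and one empty prediction has a loss of 0.5.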

2) BACKWARD PROPAGATION
In the forward pass, the data of the $k$-th stage are processed in the order of $X^{(k)}$, $Z^{(k)}$, and $M^{(k)}$. In the backward pass, the gradients are computed in the inverse order. For stage $k$, Fig. 3 shows three types of nodes (i.e., network layers) and the data flow over them. Each node has multiple inputs and (or) outputs. We next introduce the gradient computation for each layer in a typical stage $k$ ($k < N_s$).

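b: NONLINEAR TRANSFORM LAYER $Z^{(k)}$
As shown in Fig. 3(b), this layer has two sets of inputs: $\alpha^{(k-1)}$ and $\theta^{(k)}$. Its output $z^{(k)}$ is the input for computing $\alpha^{(k)}$ and $\theta^{(k+1)}$ in the next stage. The parameter of this layer is $\tau^{(k)}$. The gradient of the loss w.r.t. the parameter can be computed as

$$\frac{\partial E}{\partial \tau^{(k)}} = \frac{\partial E}{\partial z^{(k)}} \frac{\partial z^{(k)}}{\partial \tau^{(k)}},$$

where

$$\frac{\partial E}{\partial z^{(k)}} = \frac{\partial E}{\partial \alpha^{(k)}} \frac{\partial \alpha^{(k)}}{\partial z^{(k)}} + \frac{\partial E}{\partial \theta^{(k+1)}} \frac{\partial \theta^{(k+1)}}{\partial z^{(k)}}.$$

Similarly, we can also compute the gradients of the layer output w.r.t. its inputs, $\partial z^{(k)}/\partial \alpha^{(k-1)}$ and $\partial z^{(k)}/\partial \theta^{(k)}$, and its parameter, $\partial z^{(k)}/\partial \tau^{(k)}$, from Eqn. (27).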

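c: COEFFICIENT SOLVING LAYER $X^{(k)}$
As shown in Fig. 3(c), this layer has two sets of inputs: $z^{(k-1)}$ and $\alpha^{(k-1)}$, and the output $\theta^{(k)}$ is the input to compute $z^{(k)}$ and $\alpha^{(k)}$ in the subsequent layers. The parameter of this layer is $\rho^{(k)}$. The gradient of the loss w.r.t. the parameter is computed as

$$\frac{\partial E}{\partial \rho^{(k)}} = \frac{\partial E}{\partial \theta^{(k)}} \frac{\partial \theta^{(k)}}{\partial \rho^{(k)}},$$

where

$$\frac{\partial E}{\partial \theta^{(k)}} = \frac{\partial E}{\partial z^{(k)}} \frac{\partial z^{(k)}}{\partial \theta^{(k)}} + \frac{\partial E}{\partial \alpha^{(k)}} \frac{\partial \alpha^{(k)}}{\partial \theta^{(k)}}.$$

Similarly, we can also compute the gradients of the layer output w.r.t. its inputs, $\partial \theta^{(k)}/\partial \alpha^{(k-1)}$ and $\partial \theta^{(k)}/\partial z^{(k-1)}$, and its parameter, $\partial \theta^{(k)}/\partial \rho^{(k)}$, from Eqn. (26).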

IV. ADMM-NET APPLIED TO CHIRP SIGNAL PARAMETER ESTIMATION
In this section, we use ADMM-Net to output the FrFT spectrum of chirp signals under sub-Nyquist sampling. First, the data flow diagram of the whole framework is introduced, and then the training details of ADMM-Net are introduced.

A. DATA FLOW
Eqn. (15) shows the decomposition form of a chirp signal on the $p$-order DIFRFT matrix $K_\alpha^{H}$, written as $K_p^{H}$ in the following, i.e., $s = K_p^{H} S_p$. Therefore, we can obtain a $p$-order FrFT spectrum estimation model based on CS:

$$y = \Phi s = \Phi K_p^{H} S_p, \qquad \hat{S}_p = \arg\min_{S_p} \frac{1}{2}\left\|\Phi K_p^{H} S_p - y\right\|_2^2 + \lambda\left\|S_p\right\|_1, \tag{36}$$

where $\Phi$ is a random Gaussian matrix. Obviously, the essence of the above CS model based on the $p$-order DIFRFT matrix is to obtain the $p$-order FrFT spectrum of the fully Nyquist-sampled sequence from the measurement sequence under sub-Nyquist sampling. Therefore, each $\hat{S}_p$ under $K_p^{H}$ must be solved by a $p$-order ADMM-Net, $p = p_1, p_2, \ldots, p_L$. Then, a network framework consisting of $L$ parallel ADMM-Nets sharing input $y$ is obtained. The data flow diagram based on ADMM-Nets is shown in Fig. 4. All $\hat{S}_p$ constitute the spectrum image on the $(p, \mu)$ plane, $G = [\hat{S}_{p_1}, \ldots, \hat{S}_{p_L}]^T$, $G \in \mathbb{C}^{L \times N}$, and then, a two-dimensional extreme value search is conducted on $\mathrm{abs}(G)$. The coordinate information of the peak can be substituted into Eqn. (8) to obtain the parameters of the reference chirp signal.
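The two-dimensional extreme value search on the spectrum image can be sketched as follows. This is a minimal illustration, assuming the image is given as a list of $L$ rows (one per order) of $N$ complex coefficients; the function name `peak_search` is hypothetical, and the conversion of the column index into a value of $\mu$ depends on the discretization of the FrFT, which is not covered here.

```python
def peak_search(G, orders):
    """Return (order, column index, magnitude) of the largest |G[i][j]|."""
    best_row, best_col, best_mag = 0, 0, -1.0
    for i, row in enumerate(G):
        for j, value in enumerate(row):
            mag = abs(value)
            if mag > best_mag:
                best_row, best_col, best_mag = i, j, mag
    return orders[best_row], best_col, best_mag

# Toy 2 x 3 spectrum image for two candidate orders.
orders = [1.19, 1.20]
G = [[0.1, 0.2j, 0.0],
     [0.3, 3 + 4j, 0.1]]
p_hat, col_hat, peak = peak_search(G, orders)
```

The returned order and peak position are the quantities that are substituted into the parameter read-out relations to obtain the chirp rate and central frequency.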
Different from the ADMM-Net [36] used for MRI image recovery, the ADMM-Net in our work is used to solve $\theta$ rather than restore $x$. In [36], ADMM-Net is further generalized so that the sparse bases are learned by neural networks from data. This is feasible because the contour of the reconstructed image is known; in parameter estimation, however, it is difficult to highlight the parameter characteristics of the signal without a predefined, known transformation domain. Instead, we use the predefined FrFT spectrum estimation model (Eqn. (36)) to complete the task of parameter estimation. Besides, although the output of ADMM-Net in our work formally looks like an image, it is a matrix of coefficient vectors $G = [\hat{S}_{p_1}, \ldots, \hat{S}_{p_L}]^T$, which is completely different from an RGB image. Therefore, what we propose is in fact a "network group" structure as shown in Fig. 4, in which each sub ADMM-Net is used to solve one $\hat{S}_p$.
The structure of the network framework is determined by parameters $\{N, M, N_s, L\}$. Obviously, $\{N, L\}$ affect the resolution of the output spectrum image, $M$ represents the sampling ratio, and $N_s$ reflects the depth of the network. Because the resolution of the spectral image is determined by the properties of the FrFT, which represents the upper limit of the estimation accuracy, we fix $\{N, L\}$ as constants and only set $\{M, N_s\}$ as variables in the following sections.

B. NETWORK TRAINING
In this subsection, we train a basic ADMM-Net to realize parameter estimation of chirp signals. The templates of reference parameters are shown in Fig. 5.
A sequence of measurement data y[m] is generated under each training template, which constitutes the final training set, containing all 1,904 sets of reference parameters. Accordingly, the testing set consists of 1,954 templates, of which 10% are used as the validation set.
Note that all data sets are noiseless. Considering that Gaussian white noise does not show energy accumulation in any FrFT domain, we can obtain the output with excellent noise resistance without adding additional noise resistance processing to the data sets or network.

2) TRAINING
We learn the parameters $\Theta^{(p_1)}, \ldots, \Theta^{(p_L)}$ of the parallel ADMM-Nets group by minimizing the loss $E^{(p)}$ using the gradient-based algorithm L-BFGS. We implement the ADMM-Nets using TensorFlow, and all experiments were conducted in Python (3.7) on a MacBook with a 2.6 GHz Intel Core i7 CPU and a GTX1080 GPU. It should be noted that $M$ and $N_s$ are predefined and determine the structure of the parallel ADMM-Nets group. Once $M$ and $N_s$ are determined, these networks have similar loss curves. Fig. 6 shows the loss curve over L-BFGS iterations of an ADMM-Net corresponding to $p = 1.20$ under the settings of 30% sampling ratio, i.e., $M = 153$, and $N_s = 7$. Satisfactorily, $E^{(p)}$ converges to the ideal value after 22 L-BFGS iterations.

V. SIMULATION AND RESULTS
In this section, we report several quantitative experiments on the estimation performance of the proposed framework. Based on the DIFRFT dictionary, we first compare the sensitivity of our deep ADMM-Net with conventional CS methods in terms of sampling ratio and SNR. Then, we further study the generalization ability of the framework on multi-component chirp signals. Finally, we observe the influence of different network structures on the results and propose a more generalized network model.

A. PERFORMANCE COMPARISON
In this subsection, we compare the sensitivity of our deep ADMM-Net with conventional CS methods in terms of sampling ratio and SNR. The conventional methods include convex relaxation algorithms based on the $\ell_1$-norm, namely basis pursuit denoising (BPDN) [24] and fast iterative shrinkage thresholding (FIST) [25]; greedy algorithms, namely orthogonal matching pursuit (OMP) [26] and iterative hard thresholding (IHT) [27]; and non-convex optimization algorithms, namely iterative reweighted least squares (IRLS) [28] and Bayesian compressive sensing (BCS) [29]. In the following experiment, we continue with the ADMM-Nets group trained in Section IV-B. The DIFRFT dictionary is used as the sparse basis $\Psi \in \mathbb{C}^{N \times N}$, and the random Gaussian matrix is used as the measurement matrix $\Phi \in \mathbb{C}^{M \times N}$ for all CS methods in this work. The experimental sample in Experiment 1 is a discrete complex-valued mono-component chirp sequence with white Gaussian noise with signal length $N = 512$, sampling frequency $F_s = 2$ MHz, chirp rate $k = 1.17$ GHz/s, and central frequency $f_0 = 0.505$ MHz. Under different sampling ratios and SNRs, the above CS methods are used to solve the coefficient vectors $\theta^{(p)} \in \mathbb{C}^{N}$ of the signal under each $K_p^{H} \in \mathbb{C}^{N \times N}$. Then, the coordinates of the peak can be obtained by a two-dimensional peak search on $G = [\theta^{(p_1)}, \ldots, \theta^{(p_L)}]^T$, $L = 200$, $p = 0.01 : 0.01 : 2$, and $\hat{k}$ and $\hat{f}_0$ are obtained through Eqn. (8). The evaluation criterion is the mean absolute percentage error as follows:

$$\delta_k = \frac{1}{M_t} \sum_{i=1}^{M_t} \frac{|\hat{k}_i - k|}{|k|} \times 100\%, \qquad \delta_{f_0} = \frac{1}{M_t} \sum_{i=1}^{M_t} \frac{|\hat{f}_{0,i} - f_0|}{|f_0|} \times 100\%,$$

where $M_t$ denotes the number of trials of a Monte Carlo simulation (MCS).

Experiment 1: We first compare the sensitivity of our deep ADMM-Nets group with conventional CS methods in terms of sampling ratio under SNR = 10 dB. The compression sampling ratio was changed from 5% to 30% in steps of 5%. An MCS with 200 trials is conducted at each sampling ratio. Because parameter estimation based on the DIFRFT dictionary is a spectral peak estimation, there is a synchronous detection threshold before the peak is overwhelmed by clutter. Thus, the estimation errors of $\hat{k}$ and $\hat{f}_0$ of a mono-component chirp signal have the same variable threshold. Therefore, we can compare the performance of these CS methods only through the performance of $\delta_k$. Fig. 7(a) shows the quantitative results of the CS methods under different sampling ratios with SNR = 10 dB, and Tab. 1 shows the average time taken by these CS methods to complete a trial.
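The mean absolute percentage error used as the evaluation criterion can be sketched as follows. This is a minimal illustration, assuming the error is averaged over the Monte Carlo trials and normalized by the magnitude of the true parameter; the function name `mape` and the toy estimates are chosen for the example and are not experimental results.

```python
def mape(estimates, true_value):
    """Mean absolute percentage error (in percent) over Monte Carlo trials."""
    total = sum(abs(e - true_value) for e in estimates)
    return 100.0 * total / (len(estimates) * abs(true_value))

# Two hypothetical trials that are off by +10% and -10%.
delta = mape([1.1, 0.9], 1.0)
```

Because the same peak determines both estimates, the criterion can be evaluated for the chirp rate alone, as is done for the comparison in this experiment.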
The results show that when the sampling ratio is greater than or equal to 20%, all CS methods used in this work can achieve the estimation accuracy of the fully sampled FrFT. The main difference between these error curves is that they have different sampling ratio thresholds, which directly reflect the resolution of the output FrFT spectrum image. Compared with the conventional methods, our proposed ADMM-Net produces the highest estimation accuracy under all sampling ratios. On average, ADMM-Net achieves comparable estimation accuracy (FrFT baseline) using sampling ratios that are 2.5% to 7.5% lower than those of conventional methods. Further, it achieves an approximately 20-fold speedup in running time (GPU time) compared with IST and IHT. Under a 15% sampling ratio, ADMM-Net outperforms BCS by 1% and runs more than 36 times faster.
Further, we also compare the sensitivity of our deep ADMM-Net with conventional CS methods in terms of SNR. Similarly, the above sample is still used for the experiment. SNR is increased from −18 dB to 20 dB in steps of 2 dB, and an MCS with 200 trials is performed for each SNR. Fig. 7(b) shows the quantitative results of the CS methods under different SNRs, where the sampling ratio is 30%.
The results show that when SNR $\ge$ 6 dB, all CS methods used in this work can achieve the estimation accuracy of the fully sampled FrFT. ADMM-Net has the best noise resistance compared with other CS methods, which is embodied in its lowest SNR threshold. At comparable estimation accuracy (FrFT baseline), ADMM-Net is 6 dB to 12 dB better than BPDN and other conventional CS methods. We conclude that although the data sets are noiseless in the training phase of the network, this does not affect the noise resistance of the network, which is determined by the properties of the FrFT. This gives CS model-driven networks a natural advantage over data-driven networks. The visual comparisons of the reconstructed $\hat{G} = [\hat{S}_{p_1}, \ldots, \hat{S}_{p_L}]^T$ using a 30% sampling ratio and SNR = 10 dB are shown in Fig. 8. Peak SNR (PSNR) is introduced to represent the quality of $\hat{G}$:

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_{\hat{G}}^{2}}{\mathrm{MSE}}\right), \tag{39}$$

where $\mathrm{MAX}_{\hat{G}}$ indicates the maximum magnitude of the elements in $\hat{G}$ and MSE is the mean squared error of $\hat{G}$.
The visualization results show that ADMM-Net has the highest FrFT spectrum image resolution.
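The PSNR metric used above can be sketched as follows. The sketch assumes the common definition $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, which may differ in detail from Eqn. (39), and it assumes that the reference is the fully sampled FrFT spectrum image. The function name `psnr_db` is hypothetical.

```python
import numpy as np

def psnr_db(reference, estimate):
    """PSNR in dB, assuming 10*log10(MAX^2 / MSE).

    MAX is taken as the peak magnitude of the reconstructed image,
    following the description of MAX in the text.
    """
    reference = np.asarray(reference, dtype=complex)
    estimate = np.asarray(estimate, dtype=complex)
    mse = np.mean(np.abs(reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    peak = np.max(np.abs(estimate))
    return float(10.0 * np.log10(peak ** 2 / mse))
```

A higher value indicates that the reconstructed spectrum image is closer to the reference relative to its own peak.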

B. DISCUSSION
In the above experiment, ADMM-Net produced the best results for various SNRs and sampling ratios. In this subsection, we further discuss the generalization ability of ADMM-Net to multi-component chirp signals. Then, the effects of different network architectures are evaluated, and a generalized ADMM-Net (GADMM-Net) with improved performance is proposed.

1) GENERALIZATION ABILITY
One advantage of model-driven networks is their ability to generalize. In the previous experiments, the proposed ADMM-Net was fed a mono-component chirp signal dataset. We therefore test the generalization ability of ADMM-Net by applying the network group learned from mono-component data to multi-component data. ADMM-Net shows strong generalization ability for the three-component chirp signal, and the spectral peak of each chirp component in the $(p, \mu)$ plane can be accurately detected. The difference in estimation accuracy between components is caused by the difference in their modulation parameters, which is attributed to Eqn. (8). The threshold value is nearly the same as that of the mono-component chirp signal in Experiment 1, which indicates that ADMM-Net maintains high noise resistance and information recovery ability for multi-component signals. Fig. 9 shows the visualization results of the three-component chirp signal processed by BPDN and ADMM-Net when the sampling ratio is 30% and the SNR is 10 dB. The results clearly show that in the $(p, \mu)$ plane based on BPDN there is strong pseudo-spectral interference between components, while the spectral peaks in the $(p, \mu)$ plane based on ADMM-Net are cleaner, indicating that ADMM-Net has better spectral resolution than BPDN.
Experiment 3: To verify the effectiveness of the method, an experiment on the sensitivity to modulation parameters was conducted to observe the performance of ADMM-Net for other modulation parameters. Under each test template (a red dot in Fig. 5), a mono-component chirp signal with SNR = 10 dB was generated as a sample. All the samples made up the sample set. Then, each sample was processed in an MCS with 200 trials using the FrFT under full sampling and an ADMM-Net with a 30% sampling ratio. We continue to use the mean absolute percentage error as the degree of deviation. Fig. 10 shows the error matrix obtained by FrFT and ADMM-Net.
The triangular region represents the parameter distribution of the sample set, which is the same as the testing set's parameter distribution in Fig. 5. Different colors represent the magnitude of $\delta_\varepsilon$, whose upper bound shown in the figure is 5%; i.e., for each point with $\delta_\varepsilon = 5\%$ in the figure, the true $\delta_\varepsilon \geq 5\%$. Figs. 10(a) and 10(c) show the error matrix amplitude diagrams of FrFT and ADMM-Net, respectively, for estimating $f_0$ on the sample set, whereas Figs. 10(b) and 10(d) show the results for $k$.
The simulation shows that the sensitivities of ADMM-Net and FrFT to the chirp signal parameters are nearly the same. The estimation accuracy oscillates as the parameters change, which confirms the reliability of the results in Tab. 2. ADMM-Net with the iFrF matrix as the sparse basis inherits the spectrum characteristics of the FrFT, which supports the effectiveness of the proposed method for chirp signal parameter estimation.
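The error measure and the 5% display cap used in Experiment 3 can be sketched as follows. The sketch assumes the standard form of the mean absolute percentage error over the Monte Carlo trials; the exact expression used in the paper is not shown in this section. The function names are hypothetical.

```python
import numpy as np

def mape_percent(estimates, true_value):
    """Mean absolute percentage error (in %) of repeated estimates of one parameter."""
    estimates = np.asarray(estimates, dtype=float)
    return float(100.0 * np.mean(np.abs(estimates - true_value) / abs(true_value)))

def clip_for_display(error_matrix, upper_percent=5.0):
    """Saturate errors at the plotted upper bound.

    A displayed value equal to upper_percent therefore means that the
    true error is greater than or equal to that bound.
    """
    return np.minimum(np.asarray(error_matrix, dtype=float), upper_percent)
```

Each cell of the error matrix would hold `mape_percent` computed from the 200 trials of one test template, once for $f_0$ and once for $k$.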

2) EVALUATION FOR DIFFERENT NETWORK ARCHITECTURES
Generally, a common problem faced by neural network-based methods is how to determine the network structure containing the optimal solution. We next evaluate the performance of our proposed ADMM-Net with varying network architectures.
Experiment 4: The depth of an ADMM-Net depends on the number of stages (i.e., $N_s$), which correspond to the iterations of the ADMM algorithm. To test the effect of the number of stages, we train deeper networks by adding one stage at a time with the other parameters fixed (the sampling ratio is 30%). Then, we select a mono-component chirp signal testing set at SNR = 10 dB to conduct the experiment. Fig. 11(a) shows the average testing PSNR values of ADMM-Nets using an increasing number of stages.
The PSNR increases quickly when $N_s \leq 7$ and only marginally when the number of stages is increased further. Therefore, in this work, $N_s = 7$ is chosen as the best trade-off between computational efficiency and accuracy.
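The correspondence between stages and ADMM iterations can be illustrated with a minimal sketch. It runs a fixed number of ADMM iterations on a generic real-valued $\ell_1$-regularized least-squares problem. This is an assumption made for illustration: the paper's model uses a FrFT sparse basis, complex data, and parameters learned per stage, none of which is reproduced here. The names `admm_l1`, `lam`, and `rho` are hypothetical.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_l1(A, y, lam=0.1, rho=1.0, n_stages=7):
    """Run n_stages ADMM iterations for min_x 0.5*||A x - y||^2 + lam*||x||_1."""
    n = A.shape[1]
    z = np.zeros(n)  # auxiliary (sparse) variable
    u = np.zeros(n)  # scaled dual variable
    # These quantities stay fixed across iterations in plain ADMM; an
    # unrolled network would instead learn separate parameters per stage.
    lhs = A.T @ A + rho * np.eye(n)
    Aty = A.T @ y
    for _ in range(n_stages):
        x = np.linalg.solve(lhs, Aty + rho * (z - u))  # reconstruction step
        z = soft_threshold(x + u, lam / rho)           # nonlinear transform step
        u = u + x - z                                  # multiplier update step
    return z
```

Each pass through the loop plays the role of one stage, so `n_stages` is the analogue of $N_s$. Truncating the loop at a small number of stages trades accuracy for speed, which is the trade-off examined in Experiment 4.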
Experiment 5: In the nonlinear transformation layer, ADMM-Net performs a nonlinear transform inspired by the shrinkage function $S(\cdot)$ that defines $z^{(k)}$ in Eqn. (24). Like ReLU, when $|x| \leq |\lambda/\rho|$, the gradient is zero, which may lead to dead neurons. Therefore, instead of fixing it as a predetermined shrinkage function, we aim to learn a more general function in the form of a piecewise linear function. Given $\theta^{(k)}$ and $\alpha^{(k-1)}$, the output of this layer is defined through $S_{PLF}(\cdot)$, a piecewise linear function determined by a set of control points $\{p_i, q_i^{(k)}\}_{i=1}^{N_c}$, where $r = \frac{a - p_1}{p_2 - p_1}$, $\{p_i\}_{i=1}^{N_c}$ are the uniform positions located within $[-\tau^{(k)}, \tau^{(k)}]$, and $\{q_i^{(k)}\}_{i=1}^{N_c}$ are the values at these positions in the $k$-th stage. Fig. 11(b) presents an illustrative example function. Thus, we get a more generalized ADMM-Net (GADMM-Net). Because a piecewise linear function can approximate any continuous function on a bounded interval, we can learn a flexible nonlinear transform function from data beyond the off-the-shelf hard or soft thresholding. Both the real and imaginary parts share the same piecewise linear function. Fig. 11(c) shows the average testing PSNR values of ADMM-Nets using an increasing number of control points $N_c$. The PSNR increases quickly when $N_c \leq 10$ and only marginally when the number of control points is increased further.
To compare the estimation performance of GADMM-Net against that of ADMM-Net, we train and test both networks with the data sets in Experiment 1, with the network parameters set as $N_s = 7$ and $N_c = 10$. The sample signal in Experiment 1 is again used to test the networks. Because the estimation accuracies of $\hat{k}$ and $\hat{f}_0$ of a mono-component chirp signal have the same detection threshold, we compare the networks through $\delta_{f_0}$. Fig. 12(a) shows the quantitative results of the two networks under different sampling ratios with SNR = 10 dB, and Fig. 12(b) shows the quantitative results under different SNRs with a sampling ratio of 30%. The computational efficiencies of the two networks are similar; the average testing times (GPU) of one trial are 0.47 s and 0.45 s, respectively. The results in Fig. 12 show that GADMM-Net has better estimation performance than ADMM-Net, with a 2.5% reduction in the required sampling ratio and better noise resistance. The SNR detection threshold is improved to approximately 4 dB, which is close to the estimation performance of the FrFT with full sampling.
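The piecewise linear activation described in Experiment 5 can be sketched as follows. The sketch makes several assumptions that are not confirmed by the text: values between control points are obtained by linear interpolation, the function continues with unit slope outside the control range, and the values $q_i$ are initialized from a soft threshold. The threshold 0.4 and $\tau = 1$ are hypothetical, and in a trained network the $q_i$ would be learned rather than set by hand.

```python
import numpy as np

def soft_threshold(x, tau):
    """Element-wise shrinkage: sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def s_plf(a, p, q):
    """Piecewise linear function through the control points (p[i], q[i]).

    Assumption: outside [p[0], p[-1]] the function continues with unit slope.
    """
    a = np.asarray(a, dtype=float)
    inside = np.interp(a, p, q)  # linear interpolation between control points
    out = np.where(a < p[0], a + q[0] - p[0], inside)
    out = np.where(a > p[-1], a + q[-1] - p[-1], out)
    return out

def s_plf_complex(a, p, q):
    """Apply the same piecewise linear function to real and imaginary parts."""
    a = np.asarray(a, dtype=complex)
    return s_plf(a.real, p, q) + 1j * s_plf(a.imag, p, q)

tau = 1.0   # hypothetical half-width of the control range
n_c = 10    # number of control points, matching N_c = 10 in the text
p = np.linspace(-tau, tau, n_c)   # uniform positions in [-tau, tau]
q = soft_threshold(p, 0.4)        # hypothetical initial values
```

With this initialization the function reproduces the soft threshold at every control point and outside the control range. Adjusting the $q_i$ afterwards lets the activation take a nonzero slope where the soft threshold is flat.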

VI. CONCLUSION
In this paper, a deep learning-based framework was proposed for chirp signal parameter estimation under sub-Nyquist sampling. The framework unrolls a series of parallel ADMM algorithms into a neural network group whose forward propagation implements an FrFT spectrum estimation model. The framework converts measurement data acquired under sub-Nyquist sampling into a fully sampled FrFT spectrum image, which can be used for chirp signal detection and parameter estimation. The simulation results show that the FrFT spectrum image obtained with ADMM-Net has higher spectral resolution and is computed faster than those obtained by other CS methods, which yields more accurate chirp signal parameter estimates at low SNR and low sampling ratio.
The framework proposed in this paper applies the CS unrolling method to chirp signal parameter estimation. Although it has better robustness and generalization capabilities than traditional data-driven neural networks, it is limited by the selection of sparse bases. For example, the FrFT basis used in this paper cannot directly estimate the parameters of chirp signals with duty ratio η < 1. In much of the literature on CS methods for image enhancement, the unrolling method is further generalized so that sparse bases are learned from data by neural networks. This is possible because the contour of the reconstructed image is known; in parameter estimation, however, it is difficult to highlight the parameter characteristics of the signal without a known transformation domain to serve as a standard. Although we further generalized ADMM-Net in the activation layer in this work to improve its performance, we are still unable to deal with η < 1 adaptively, which needs to be considered in future work.
VOLUME 8, 2020
QINGLONG BAO received the B.S. and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 2003 and 2010, respectively. He is currently an Associate Professor with the National University of Defense Technology. His current research interests include radar data acquisition and signal processing.
ZENGPING CHEN received the B.S. and Ph.D. degrees from the National University of Defense Technology, Changsha, China, in 1987 and 1994, respectively. He is currently a Professor and a Ph.D. Supervisor with the Sun Yat-sen University. His current research interests include signal processing, radar systems, and automatic target recognition.