Solving Complex Quadratic Systems With Full-Rank Random Matrices

We tackle the problem of recovering a complex signal <inline-formula><tex-math notation="LaTeX">${\boldsymbol{x}}\in \mathbb{C}^n$</tex-math></inline-formula> from quadratic measurements of the form <inline-formula><tex-math notation="LaTeX">$y_i={\boldsymbol{x}}^*{\boldsymbol{A}}_i{\boldsymbol{x}}$</tex-math></inline-formula>, where <inline-formula><tex-math notation="LaTeX">${\boldsymbol{A}}_i$</tex-math></inline-formula> is a full-rank, complex random measurement matrix whose entries are generated from a rotation-invariant sub-Gaussian distribution. We formulate it as the minimization of a nonconvex loss. This problem is related to the well understood phase retrieval problem where the measurement matrix is a rank-1 positive semidefinite matrix. Here we study the general full-rank case which models a number of key applications such as molecular geometry recovery from distance distributions and compound measurements in phaseless diffractive imaging. Most prior works either address the rank-1 case or focus on real measurements. The several papers that address the full-rank complex case adopt the computationally-demanding semidefinite relaxation approach. In this paper we prove that the general class of problems with rotation-invariant sub-Gaussian measurement models can be efficiently solved with high probability via the standard framework comprising a spectral initialization followed by iterative Wirtinger flow updates on a nonconvex loss. Numerical experiments on simulated data corroborate our theoretical analysis.


Introduction
Systems of quadratic equations model many problems in applied science, including phase retrieval [1][2][3], unlabeled distance geometry problem (uDGP) [4,5], and blind channel estimation [6,7].Phase retrieval, in particular, has motivated considerable recent research on quadratic equations.In phase retrieval, the measurements are y i = |a * i x| 2 = x * a i a * i x, where the measurement matrices a i a * i are rank-1 positive semidefinite matrices.We are interested in a different measurement model with highrank measurement matrices.Such a measurement model arises in a number of applications including combinatorial optimization problems such as uDGP [5], the turnpike problem [8], and in phaseless diffractive imaging where, either because of experimental or algorithmic design, we work with linear combinations of pixel values on the detector: y i = r w r |a * r x| 2 .These problems can be modeled as systems of quadratic equations where the measurement matrices are not necessarily rank-1 or real.Different methods are needed to study such high-rank complex quadratic measurement model.
Recovering the signal x from its complex quadratic measurements is a nonconvex problem, and solving for the the globally optimal solution is in general intractable.Recent works on nonconvex quadratic problems such as phase retrieval [9,10], phase synchronization [11,12], and low-rank matrix recovery [13] have shown that a globally optimal solution z can be recovered from sufficient measurements with high probability when iid Gaussian measurement vectors or matrices are used.This motivates the question of what other types of measurement matrices also enjoy such favorable properties.In this paper we show that the class of measurement matrices with rotation invariant entries is one such type, and propose to recover a globally optimal solution via the standard framework comprising a spectral initialization followed by iterative gradient descent updates.We can prove that when the number of measurements exceeds the signal's length by some constant factor, the complex signal can be recovered up to a certain phase difference with high probability.

Rotation invariant measurement model
We address the problem of recovering a complex signal x ∈ C n from its complex quadratic measurements y i ∈ C of the following form: where A i ∈ C n×n is the i-th complex measurement matrix.Let r i ∈ R 2n 2 denote the real and imaginary coefficients of the entries of A i : The m coefficients vectors {r i |i = [m]} are independently and identically distributed and follow a multivariate rotation invariant distribution [14]-the distribution of r i does not change under any unitary transform.In other words, r i has a spherically symmetric distribution and so its probability density function, p (r i ), depends only on the norm r i 2 .Using [14, Proposition 4.1.1],we have: It is easy to verify that E [r i,k ] = E [r i,l ] = 0 and E [r i,k • r i,l ] = 0. We then have: Hence the entries of r i are pairwise uncorrelated.Without loss of generality, we assume the variance µ 2 = Var(r i,k ) = E[r 2 i,k ] = 1, and the following variances exist and are bounded.In this paper we shall refer to the model given by (1) as the rotation invariant measurement model.

Problem formulation
We minimize the following loss function f (z) to obtain the recovered signal z: Suppose we are given a good initialization point z (0) (finding such a point is discussed in Section 2).The solution is then updated iteratively via gradient descent: where η > 0 is some suitable step size.The gradient ∇f (z) can be computed as follows: If x is a global minimum of f (z), then xe jφ is also a global minimum for all φ ∈ [0, 2π).Consequently, it is standard to define the squared distance between the recovered solution z and the true solution x as where z * x = |z * x|e jφ(z * x) and φ z = −φ(z * x).

Prior art
Similar quadratic equation problems have been studied in other contexts.Candès et al. [10] cast the phase retrieval problem as a system of structured quadratic equations and solve it via vanilla gradient descent with a guaranteed linear convergence rate.As this is a non-convex problem, they use a suitably constructed spectral initializer z (0) [9] for the Gaussian measurement model: z (0) can be proven to be close to a globally optimal solution when sufficient measurements are available.The work of Ghods et al. [15] on phase retrieval demonstrates that the signal can also be recovered with guarantees in a deterministic measurement model.Here we complement these results by showing that both the initialization and convergence proofs can be derived using Chebyshev's inequality for the full-rank rotation invariant measurement model in (1).Lu and Li [16] study how the spectral initialization can be generalized in the real recovery case for measurements that are independently drawn from y i ∼ p(y|a T i x), where a T i x serves as the parameter of the pdf p(y|a T i x).They focus on the asymptotic behavior of the initializer w.r.t. the sampling ratio m/n in the high-dimensional limit.To accelerate the phase retrieval process for large-scale problems, [17] computes a optimal step size for the gradient descent method at every iteration, and [18] proposes several coordinate descent algorithms with faster convergence speeds.The works of Wang, Xu, and Huang [19][20][21] solve a generalized phase retrieval problem where A i is a Hermitian matrix.They use algebraic methods [22] to find the number of measurements needed for a successful recovery.
Solving the systems of quadratic equations is closely related to low-rank matrix recovery-it is equivalent to recovering a rank-1 positive semidefinite matrix X = xx * with y i = A i , X [23].Previous works [24][25][26] on low-rank matrix recovery focus on relaxing the non-convex low rank constraint to the convex minimum nuclear norm constraint, and then establishing sufficient conditions that warrant such a convex relaxation.However, recovering the relaxed X is computationally demanding even for moderate sized problems.To address this issue, [13,[27][28][29] took the low-rank matrix decomposition approach X = U V T , where U , V ∈ R n×r and X ∈ R n×n are all real matrices, and the optimization is performed directly over U and V .In this work we aim to recover a complex signal x directly up to a certain phase difference.

Paper outline
The rest of the paper is organized as follows.In Section 2, we extend the derivations from [9] to the general rotation invariant measurement model.We show that the spectral initialization concentrates around a global optimum with high probability and compute the associated concentration bounds.
In Section 3, we analyze the regularity condition and derive new results for the rotation invariant measurement model.The two results are then combined to give the main theorem of this paper.In Section 3.2.1 we derive tighter results for the specific case of the complex Gaussian measurement model.In Section 4 we show the phase transition plots for different distributions in the rotation invariant exponential family (defined in Section 3.2) via simulated experiments.The detailed proofs are given in the Appendix.

Spectral Initialization
Spectral initialization is widely used to obtain an initialization that is close to a global optimum, x.Similar to [9,10], we show that the spectral initializer z (0) obtained with the rotation invariant measurement model also concentrates around x with high probability and can be used to initialize the gradient descent update in (9).This follows from the observation that in the the spectral initializer analysis, Gaussian concentration can be replaced by the weaker Chebyshev inequality.

Initialization for signals with known norms
We first consider the case where the norm of x is known and fixed.Without loss of generality, we shall assume x 2 = 1.The rationale behind the spectral initialization strategy is that we can get a good estimate S of xx * using {y i , A i |i = [m]} for some type of A i .The spectral initializer z (0) can then be obtained by performing a singular value decompositon of S and choosing a leading singular vector so that z (0) is highly correlated with x.
Unlike in the phase retrieval problem, which uses the leading eigenvector of the Hermitian matrix Ŝ = 1 m m i=1 y i a i a * i as the initializer, we have two possible choices here: the leading left or right singular vector v 0 of can both be chosen as the spectral initializer z (0) = v 0 , where y i is the complex conjugate of y i .
To see why, note that the expectation of S is The real and imaginary coefficients of entries in A i are rotation invariant, pairwise uncorrelated, and have unit variance i,rc and A (I) i,rc denote the real and imaginary parts of A i,rc .Using (5), it is easy to check that: where r = k or c = l.Hence E 1 m m i=1 A i,rc A i is a matrix with the (r, c)-th entry being 2, while the rest of the entries are all 0s.We then have: We prove that for sufficiently large m, the matrix S concentrates around E [S] in the spectral norm: S − E [S] ≤ δ.As a consequence the leading singular vectors of S are both highly correlated with a global optimizer x [16].The concentration proof hinges on the rotation invariance of the measurement matrices A i [14]: If we define B = RA with R ∈ C n×n being a complex unitary matrix, the real and imaginary coefficients, r i ∈ R 2n 2 , of entries in B still follow the same multivariate distribution as those of A.

Initialization for signals with unknown norms
When the norm of the signal is unknown, we can estimate it from the quadratic measurements.Using (16), we can compute: When m is sufficiently large, we prove that 1 2m m i=1 y i y i is close to its expectation x 4 2 with high probability.Based on this result, we can scale one of the leading singular vectors v 0 of S to get our spectral initializer: We have the following lemma stating that the distance between the spectral initializer z (0) and a global optimizer x is small with high probability when m is sufficiently large.Lemma 1.Under the rotation invariant measurement model given by (1), when the number of complex quadratic measurements satisfies m > Cn for some universal constant C, the distance between the spectral initializer z (0) and a global optimizer x is upper-bounded by with probability at least 1 where δ > 0 is some pre-specified constant and C 1 (σ, ρ, δ) is some constant depending on the variances {σ, ρ}.
Given the measurement model, the signal length n and the constant C 1 (σ, ρ) are fixed.To get a good initialization, we would like a small δ and ensure (19) holds with high probability.Hence C < m n needs to be large so that 1 Cn • C 1 (σ,ρ) 1. Experiments in Section 4.1, show that we can achieve this goal by increasing the number of measurements m, i.e. by raising the upper bound m n on C.

Convergence Analysis
For the objective function f (z) introduced in (8), there is a special neighborhood E( ) around every global optimizer x: The objective function f (z) is said to satisfy the regularity condition RC(α, β, ) if the following holds for all z ∈ E( ) [10]: where α > 0, β > 0, > 0 are some carefully chosen constants.The regularity condition RC(α, β, ) ensures that the gradient descent updates ( 9) with a step size η ∈ 0, 2 β converge linearly to a global optimizer x when initialized within the neighborhood E( ) [10, Lemma 7.10]: For the system of complex quadratic equations given by (1), we show that the objective function f (z) (8) satisfies the regularity condition RC(α, β, ) (20), and give a set of (α, β, )-values such that spectral initialization followed by gradient descent succeeds with high probability.We shall make use of the following lemma.Lemma 2. Under the rotation invariant measurement model given by (1), when the number of complex quadratic measurements satisfies m > Cn for some universal constant C, the following two inequalities hold with probability at least where ν is some pre-specified constant, and C 2 (σ, ρ) is some constant depending on the variances {σ, ρ}.
Following derivations similar to those of ( 13)-( 16), we can see that This allows us to interpret Lemma 2. It says that the matrix 1 m m i=1 p * A * i qA i concentrates around its expectation with high probability in the spectral norm.
Let h = ze −jφz − x, we can verify that: To prove that our objective function (1.2) satisfies the regularity condition, we have three steps.
We first find a lower bound on Re ( h, ∇f (z) ) for all h ∈ C n satisfying h 2 ≤ x 2 .
We proceed by showing that E [Re ( h, ∇f (z) )] > 0, and that Re ( Thus a positive lower bound can be established with high probability when m is sufficiently large.We break this problem down by lower-bounding every term in the following expansion: Using ( 23) and the Cauchy-Schwartz inequality, we have: where = h 2 x 2 , and ν is some pre-specified constant.
2. Finding an upper bound on ∇f (z) 2 2 .We next compute an upper bound on ∇f (z) 2  2 .This is equivalent to finding an upper bound of |u * ∇f (z)| 2 over all u ∈ C N with u 2 = 1.We have: We can then use (22) to obtain an upper bound on |u * ∇f (z)| for all u ∈ C n satisfying u 2 = 1: We then have: 3. Choosing a suitable set of (α, β, , ν)-values.
Main result Using Lemma 1 to provide an upper bound on the distance between the spectral initializer z (0) and a global optimizer x, we can finally state our main theorem: Theorem 1.Under the rotation invariant measurement model given by (1), when the number of complex quadratic measurements m > Cn for some universal constant C, and the step size 0 < η < 2 β , the gradient descent update (9) initialized with the spectral initializer z (0) obeying (19) converges linearly to a global optimizer x: with probability at least where δ > 0, ν > 0 are some pre-specified constants to ensure Re ( h, ∇f (z) ) > 0, {α, β} are constants depending on {δ, ν} and {C 1 (σ, ρ), C 2 (σ, ρ)} are two constants depending on the variances {σ, ρ}.
As discussed in the proof of Theorem 1, we first start by choosing the {δ > 0, ν > 0} so that the lower bound in ( 27) is positive.The values of {α, β} should satisfy 4  αβ ≤ 1 so that the gradient descent updates bring the solution z (t) closer towards x in each iteration.

Choosing the {α, β, }-values for the regularity condition
There are multiple choices for {α, β, }-values to make sure the initializer z (0) stays in the neighborhood of x and the objective function f (x) satisfies the regularity condition (20).For example, if we choose δ = 0.01, using Lemma 1, with high probability we can get that: Now, if we choose ≤ 0.2 and ν = 0.01, it is easy to verify that We can then choose When m is sufficiently large, the regularity condition holds for all h satisfying = h 2 x 2 ≤ 0.2 with high probability.We can show that the following update converges linearly to a global optimizer: where 0 < η < 2 β .

Rotation invariant exponential family
We next give examples of rotation invariant distributions and introduce the rotation invariant exponential family E. The multivariate random variable r ∈ R 2n 2 can be generated from a multivariate standard Gaussian distribution s ∈ R 2n 2 ∼ N (0, I) by • κ > 0 is a scaling constant chosen so that the variance µ 2 = Var(r i ) = E r 2 i = 1.
• q = −1.In fact if q = −1, the variable r is uniformly distributed on the sphere r 2 = κ.
As derived in Appendix B, the pdf of r is: Given q, κ, we can see that the probability distribution of r only depends on the norm r 2 , and is therefore invariant under unitary transforms.
• When q = 0, the variable r = s and its entries are iid standard Gaussian.We obtain tighter bounds for the complex Gaussian measurement model in Section 3.2.1.
• When q = 0 and q = −1, the entries of r are pairwise uncorrelated.

Complex Gaussian Measurement Model
The standard multivariate Gaussian distribution is a special case of a rotation invariant distribution: the individual entries are not just pairwise uncorrelated, but also pairwise independent.In this case we have We are able to obtain tighter probability bounds for complex iid Gaussian measurement matrices.For this measurement model, we derive lemmas similar to Lemmas 1 and 2: Lemma 3.Under the complex Gaussian measurement model, when the number of complex quadratic measurements satisfies m > Cn for some universal constant C, the distance between the spectral initializer z (0) and a global optimizer x is upper-bounded as with probability at least 1 − 6 exp (−Cn • C 1 (δ)), where δ > 0 is a pre-specified constant, and C 1 (δ) is some constant depending on {δ}.
Lemma 4.Under the complex Gaussian measurement model, when the number of complex quadratic measurements satisfies m > Cn for some universal constant C, the following two inequalities hold with probability at least 1 where ν > 0 is a pre-specified constant, and C 2 (ν) is some constant depending on {ν}.
We can show that the regularity condition holds with high probability for the complex Gaussian measurement model as before.We then have the following corollary: Corollary 1.Under the complex Gaussian measurement model given by (1), when the number of complex quadratic measurements m > Cn for some universal constant C, and the step size 0 < η ≤ 2 β , the gradient descent update (9) initialized with the spectral initializer z (0) obeying (38) converges linearly to a global optimizer x: with probability at least where δ > 0, ν > 0 are some pre-specified constants chosen to ensure Re ( h, ∇f (z) ) > 0, {α, β} are constants depending on {δ, ν} and C 3 (δ, ν) is some constant depending on {δ, ν}.
We obtain these tighter bounds for the Gaussian measurement model by replacing Chebyshev's inequality with the Bernstein-type inequality [30,Proposition 5.16] in the proofs.

Experimental Results
We perform numerical experiments to empirically evaluate the performance of our approach.Theorem 1 and Corollary 1 state that the step size is upper-bounded by 2  β where β is one of the regularity condition parameters in (20).In Section 3.1 β is proportional to the squared norm of the signal to recover.Hence, in all experiments the step size is chosen to be 0.1 where the signal norm, x 2  2 , is estimated using (17).Average relative initialization distance q = -100 q = 100 q = -75 q = 75 q = -50 q = 50 q = -25 q = 25 q = -10 q = 10 q = -1 q = 1 q = 0 Figure 1: Closeness of spectral initialization with varying number of measurements where the complex random measurement matrices are from the rotation invariant exponential family (37).

Closeness of spectral initializer
In this experiment we monitor how the distance between the initialization and the true solution varies with the number of measurements.We fix n = 100 and try different values of m with m n uniformly sampled between 1 and 10.We run 100 random trials for each m n value and calculate the average distance between the initialization and a global optimizer.In each trial we generate a random signal x ∈ C n and m complex random rotation invariant matrices from the same distribution (37) to produce m complex quadratic measurements.We repeat this experiment with multiple distributions from the rotation invariant exponential family by varying the q parameter in (37).Recall that q = 0 gives us the complex Gaussian measurement model (Section 3.2.1).
Distance between complex signals is defined in (11).The iterative gradient descent reconstruction is terminated if the relative distance between successive iterations is less than 10 −6 or if 2500 iterations are completed.We define relative distance as dist(x, x) x where x is the recovered image and x is the original image.As the true signal and its norm is unknown during gradient descent updates, ( 17) is used to estimate the norm.In Figure 1 we can see that the spectral initializer becomes increasingly closer to a global optimizer as m n increases.Furthermore, increasing q increases the average initialization distance.

Phase transition behavior
In this experiment we evaluate how the proposed approach transits from a failure phase to a success phase as we increase the number of measurements.We fix n = 100 and try different values of m with m n sampled uniformly sampled between 1.5 and 5.5.We again run 100 random trials for each m n value and calculate the success rate.Success is declared if the distance (11) between the recovered Success rate q = -75 q = 75 q = -50 q = 50 q = -1 q = 1 q = 0 and true signal is less than 10 −5 .Again, in each trial a random signal x ∈ C n is reconstructed and multiple distributions from the rotation invariant exponential family (37) are used to generate the measurement matrices.
As in Section 4.1, the iterative gradient descent reconstruction is terminated if the relative distance between successive iterations is less than 10 −6 or if 2500 iterations are completed.Figure 2 shows that approximately 4n measurements are needed to successfully recover the signal for the complex Gaussian measurement model (q = 0) and for when the entries of the measurement matrices are uniformly distributed on the sphere (q = 1). 2 Increasing q reduces the success rate which is expected as increasing q increases the initialization distance as in Figure 1.

Reconstruction of an image
In this experiment we reconstruct an image via its complex quadratic measurements given by (1).For images the iterative gradient descent reconstruction is terminated if the distance (11) between successive iterations is less than 10 −6 or if 2500 iterations are completed.The measurement matrices are complex random Gaussian matrices (Section 3.2.1 and q = 0 in (37)).We reconstruct the three color channels of an image separately.We choose m n = 4. Figure 3 shows the absolute value of the spectral initialization and the corresponding reconstruction.We define the relative error as | x|−x x , where | x| is the absolute value of the recovered image and x is the original image.The spectral initialization relative error is 0.34.The relative distances between the three channels of the original and their spectral initializations are 0.53, 0.49 and 0.51.The reconstruction relative error is 4.78 × 10 −7 .The relative distances between the three channels of the original and their reconstructions are 5.45 × 10 −7 , 7.73 × 10 −7 and 8.66 × 10 −7 .Additionally, we also reconstruct the same image with varying number of measurements.Figure 4 shows the relative distances of the recovered images for each channel.

Conclusion and future work
We addressed the problem of recovering a signal x ∈ C n from a system of complex random quadratic equations y i = x * A i x, where {A i } m i=1 are independent rotation invariant matrices.When the number of complex measurements m > Cn for some universal constant C, we can prove that with high probability: 1) the spectral initializer z (0) is close to a global optimizer, 2) the gradient descent update initialized with z (0) converges linearly to a global optimizer.Numerical experiments corroborate the theoretical analysis and show that a global optimum can be successfully recovered when m is sufficiently large.Additionally, our analysis complements the existing results for real measurements and rank-1 positive semidefinite measurement matrices.
Recent phase retrieval works show that a regularized spectral initialization and gradient descent update can improve the robustness and performance of the recovery algorithm [31,32].Chen, et al., [33] further proved that vanilla gradient descent with random initialization enjoys favorable convergence guarantees in solving the phase retrieval problem.Our future work involves analyzing similar algorithmic enhancements for the random complex Gaussian measurement model as well as extending our approach to recovering structured high-rank complex matrices that arise in key applications.

A Proofs for Rotation Invariant Measurement Model A.1 Proof of Lemma 1
Proof.The spectral initializer z (0) can be the left or right leading singular vector of the matrix S in (12) scaled by x 2 .By the rotational invariance of the random matrix A i , we can simply choose x = x 2 • e 1 .For the complex entries of A, we have: where (r, c) = (k, l).
We first show that S concentrates around E[S] = 2xx * in terms of the spectral norm: S − E[S] ≤ δ holds with high probability.Specifically, we have: where A i is the matrix A i with its (1, 1)-th entry replaced by 0.
• For the first term of (46), we have: Using Chebyshev's inequality, if m ≥ Cn for some constant C, we have: • Finding an upper bound on the second term in (46) is equivalent to finding an upper bound on the following (50) for all u, v ∈ C n satisfying u 2 = 1, v 2 = 1: where We then have: Take the first term of (52) for example, let B rc = u r v c we have: Since the coefficients of A i follow a spherically symmetric distribution, we can get For the first term of (52), we can compute that: Using Chebyshev's inequality, if m ≥ Cn, we have: Similar result can be obtained for the second term of (52): • Combining (48), ( 49), ( 58),(59), if m ≥ Cn, we have: (60) • So far we have shown that S − 2xx * ≤ δ x 2 2 holds with high probability when m ≥ Cn.Let {u 0 , v 0 } be the leading left and right singular vectors of S and τ 0 be the largest singular value of S. With high probability we have: Hence With high probability we also have: 1) When the signal norm x 2 is known, using Cauchy-Schwartz inequality, we have |u * 0 x| ≤ x 2 .If x 2 v 0 is chosen as the spectral initializer, we can get: Using (11), the squared distance between the spectral initializer z (0) = x 2 v 0 and x is then upper-bounded by 2δ x 2 2 : with probability at least 1 2) When the signal norm x 2 is unknown, we need to estimate it using R = 1 2m m i=1 y i y i .By rotational invariance property of the random Gaussian matrix A i , we can simply assume x = x 2 e 1 .We have: Using ( 48) and ( 49), when m ≥ Cn we know that the following equality holds with probability at least 1 − 1 Cn 32σ 2 δ 2 .The following inequalities then hold with high probability: The spectral initializer is chosen to be z (0) = 4 √ Rv 0 .Assuming δ < 4, we have the following holds with probability at least 1

A.2 Proof of Lemma 2
Proof.Let p = p p, q = q q, u = u û, and v = v v, we first prove the inequality (22).We need to show the following: By rotational invariance, let p = e 1 , q = r 1 e jφ 1 e 1 + r 2 e jφ 2 e 2 .Let b i = r 1 e jφ 1 A i,11 + r 2 e jφ 2 A i,21 , and A i denote the matrix A i with the (1, 1)-th and (2, 1)-th entries replaced by 0s, we have: • For the first term of (70), we have: It is also easy to verify that A Using Chebyshev's inequality, if m ≥ Cn, we can get: (74) • For the second term of (70), we have: We have the following probabilities: (78) • For the third term of (70), we have: We have the following probabilities: (82) • For the fourth term of (70), we have: We have the following probabilities: (86) • To make sure the fifth term of (70) where G i = kl g k w l • A i,kl .Let B kl = g k w l , we have: The variable b i can be expressed as: We then have: holds with probability at least 1 − 1 Cn • C 1 (σ,ρ) δ 2 .Pluging (97) into ( 27), we have: where δ > 0, ν > 0 should be chosen so that c 1 (δ, ν) > 0. Plugging (97) into (30), we have: Both ( 98) and (99) hold with probability at least 1 according to Lemma 2. We next discuss how to select the {α, β}-values.In order to achieve a linear convergence rate for (21), the α and β values of (20) should satisfy In order to satisfy the regularity condition (20), we require: In other words, we require: Given some δ > 0, ν > 0 satisfying c 1 (δ, ν) > 0, there are multiple choices of α, β that satisfy the regularity condition (20).In order to speed up the convergence towards x, we would like to find the a, b that maximize 4 αβ subject to (100) and (102).

B Rotation invariant exponential family
Since the random variable r = κ • s q 2 • s, where q = −1, and s ∈ R d ∼ N (0, I), we then have: where κ = κ(q) ensures that E[r 2 i ] = 1.Let J be the Jacobian matrix of s w.r.t.r, its entries are given by: The Jacobian matrix J is: with the determinant given by det(J ) = κ We obtain the expression for the pdf as C Proofs for Complex Gaussian Measurement Model C.1 Proof of Lemma 3 Proof.As done in Section A.1, we first establish an upper bound on S − 2xx * .We have: • The real and imaginary coefficients of A i are iid standard Gaussian.The first term of ( 113) is a summation of 2m iid sub-exponential random variables.Using the Bernstein-type inequality [30,Proposition 5.16], if m > Cn, we have: where c > 0 is some absolute constant that is different from C, and K is the sub-exponential norm of A • As done in Section A.1, finding an upper bound on the second term of (113) is equivalent to finding an upper bound on the following (115) for all u, v ∈ C n satisfying u 2 = 1, v 2 = 1: where G i = (r,c) =(1,1) u r v c • A i,rc .Let B rc = u r v c , we have: Take the first term of (117) for example, we have: (53 revisited) We further have: It is easy to verify that (125) • We can derive probability bounds for the second, third and fourth term of (123) similarly.
• As done in Section A.2, finding an upper bound on the fifth term of ( 123) is equivalent to finding an upper bound of 1 m m i=1 b i g * A i w for all g, w ∈ C n satisfying g 2 = 1, w 2 = 1.We have: From ( 91) and (118), we can see that the first term of (126) is a summation of iid sub-exponential random variables.Using the Bernstein inequality, if m ≥ Cn, we then have: Similarly we can get that: • Combining all the probability bounds, we can get:

Figure 2 :
Figure 2: Success transition plot showing the empirical probability of success based on 100 trials with varying number of measurements where the complex random measurement matrices are from the rotation invariant exponential family (37).

Figure 3 :
Figure 3: Spectral initialization and reconstruction of the University of Illinois at Urbana-Champaign logo from its complex random quadratic Gaussian measurements.

Figure 4 :
Figure 4: Reconstruction performance of the image from Figure 3 with varying number of measurements.
are uncorrelated.Similarly, we can see that (53) is a summation of pairwise uncorrelated terms.

2 − 1
are both the centered Chi-square random variable with 1 degree of freedom.
u, v, the real coefficient of G i is the summation of n 2 − 1 iid Gaussian N (0, |u r v c | 2 ).Hence Re(G i ) follows N 0, 1 − |u 1 v 1 | 2 ,its imaginary coefficient Im(G i ) also independently follows the same distribution.We then have:

C. 3
Proof of Corollary 1 Proof.The proofs are basically the same as done in Appendix A.3.Using Lemmas 3 and 4, we can show that (41) holds with probability at least 1 − 6 exp −Cn • min cδ 2 exp −Cn • C 3 (δ, ν) .
[30, (118), we can see that the two terms of (123) are both summations of iid centered subexponential random variables.Using the Bernstein inequality[30, Proposition 5.16], if m > Cn, we have: √ 2are both standard Gaussian variables, and they are also uncorrelated.Since uncorrelated standard Gaussian variables are independent, A