One-Bit Phase Retrieval: More Samples Means Less Complexity?

The classical problem of phase retrieval has found a wide array of applications in optics, imaging and signal processing. In this paper, we consider the phase retrieval problem in a one-bit setting, where the signals are sampled using one-bit analog-to-digital converters (ADCs). A significant advantage of deploying one-bit ADCs in signal processing systems is their superior sampling rates as compared to their high-resolution counterparts. This leads to an enormous amount of one-bit samples gathered at the output of the ADC in a short period of time. We demonstrate that this advantage pays extraordinary dividends when it comes to convex phase retrieval formulations, namely that the often encountered matrix semi-definiteness constraints as well as rank constraints (that are computationally prohibitive to enforce), become redundant for phase retrieval in the face of a growing sample size. Several numerical results are presented to illustrate the effectiveness of the proposed methodologies.


I. INTRODUCTION
Phase retrieval has gained significant interest in the applied physics and statistical signal processing communities over the past decades [1]-[21]. This classical problem manifests as the recovery of an unknown signal solely from phaseless measurements that depend on the signal through a linear observation model. Due to the intrinsic difficulties of the recovery task [22], there have recently been many efforts to propose approximate or relaxed versions of the phase retrieval problem in a convex optimization language, particularly via semi-definite programming [11], [23].
Quantization of the signals of interest through analog-to-digital converters (ADCs) is an important task in digital signal processing applications. A very large number of quantization levels is necessary in order to represent the original continuous signal in high-resolution scenarios. The large number of quantization bits, however, can cause a considerable increase in the overall power consumption and the manufacturing cost of ADCs, as well as a reduction in sampling rate [24]. Such disadvantages have motivated researchers to investigate the idea of utilizing fewer bits for sampling. One-bit quantization is an extreme quantization scenario, in which the signals are compared with given threshold levels at the ADCs, producing sign (±1) outputs. This enables signal processing equipment to sample at a very high rate, with a considerably lower cost and energy consumption, compared to counterparts that employ multi-bit ADCs [24]-[27]. We further note that one-bit quantization with a fixed threshold (usually zero) can lead to difficulties in the estimation of the signal amplitude. Employing time-varying thresholds, however, has been shown to result in enhanced signal recovery performance in some recent works [24], [28]-[33].
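As a simple illustration of the sampling model discussed above, the following sketch (with arbitrary toy values) shows how a one-bit ADC with time-varying thresholds turns real-valued measurements into sign data:

```python
import numpy as np

def one_bit_sample(y, tau):
    """Compare each measurement against its own threshold; keep only the sign."""
    return np.where(y >= tau, 1, -1)

# toy example: because the thresholds vary over time, the sign pattern
# still carries information about the amplitude of y
y = np.array([0.2, 1.5, 0.9])
tau = np.array([0.5, 1.0, 1.0])
r = one_bit_sample(y, tau)
print(r)  # [-1  1 -1]
```

With a fixed zero threshold, every nonnegative amplitude would map to +1, which is why time-varying thresholds are needed to retain amplitude information.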

A. Contributions of the Paper
While convex formulations of the phase retrieval problem promise a global solution, some of the introduced constraints are computationally costly; including the matrix rank and the positive semi-definite (PSD) constraints. However, we show that if more samples are available, the sheer number of samples can constrain the solution in a less costly manner and make such constraints redundant. Note that, as mentioned earlier, by employing the one-bit quantization, sampling can be done at significantly higher rates. As a result, the emergence of one-bit sampling techniques paves the way for an investigation on the role of an increased sample size in the phase retrieval problem.
In this paper, we show that the phase retrieval problem can be tackled by taking advantage of the large number of linear observation inequalities that emerge naturally in the one-bit quantization regime.
Instead of considering the often-formulated trace relaxation problem, our approach to one-bit phase retrieval is presented as a randomized Kaczmarz algorithm-based recovery. We present our results on a proper selection of a sufficient number of samples. Furthermore, an algorithm is proposed based on our model to adaptively evaluate the time-varying sampling thresholds. The performance of our approach with an increased sample size is also investigated when noisy measurements are utilized.

March 18, 2022 DRAFT

B. Organization of the Paper
Since our approach takes root in convex phase retrieval, Section II is dedicated to a survey of such formulations. In Section III, we will discuss the appearance of one-bit sampling with time-varying thresholds in the phase retrieval context through linear inequality constraints (defining a polyhedron feasible region), as well as the randomized Kaczmarz algorithm (RKA) that can be utilized to recover our desired signal. To investigate the recovery error of the proposed algorithm, a theorem, which may be useful to select the number of measurements, is presented in Section IV. Section V is devoted to comparing our method with PhaseLift and its one-bit version in terms of their computational burden.
Based on our proposed polyhedron formulation, an algorithm is proposed in Section VI to obtain adaptive time-varying thresholds that help recover the signal of interest with higher accuracy and lower computational cost. In Section VII, the noisy measurement scenario is studied owing to its importance in practical applications. Finally, Section IX concludes the paper.
Notation: We use bold lowercase letters for vectors and bold uppercase letters for matrices. C and R represent the sets of complex and real numbers, respectively. (·)^⊤ and (·)^H denote the vector/matrix transpose and the Hermitian transpose, respectively. I_N ∈ R^{N×N} is the identity matrix of size N.
Tr(·) denotes the trace of the matrix argument. The spectral radius ρ(B) of a matrix B is defined as the maximum absolute value of its eigenvalues [34]. The Frobenius norm of a matrix B ∈ C^{M×N} is ‖B‖_F = ( Σ_{r=1}^{M} Σ_{s=1}^{N} |b_{rs}|² )^{1/2}, where {b_{rs}} are the entries of B. The Hadamard (element-wise) product of two matrices B_1 and B_2 is denoted as B_1 ⊙ B_2. Additionally, the Kronecker product is denoted as B_1 ⊗ B_2. The vectorized form of a matrix B is written as vec(B). 1_s is an s-dimensional all-one vector. Given a scalar x, we define (x)^+ as max{x, 0}. For an event E, I(E) is the indicator function for that event, meaning that I(E) is 1 if E occurs, and 0 otherwise. f ≍ g means f and g are asymptotically equal. The cumulative distribution function (CDF) of the zero-mean Gaussian variable z ∼ N(0, ζ) is given by Φ(x) = (1/√(2πζ)) ∫_{−∞}^{x} e^{−t²/(2ζ)} dt. To compare two different CDFs, the Hellinger distance may be utilized [35], which is defined as H(p, q) = (1/√2) [ (√p − √q)² + (√(1−p) − √(1−q))² ]^{1/2}, with p, q ∈ [0, 1].
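As a concrete instance of the last definition, the Hellinger distance between two Bernoulli distributions with parameters p and q can be evaluated directly (a minimal sketch using the standard closed form):

```python
import math

def hellinger(p, q):
    """Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    return math.sqrt(((math.sqrt(p) - math.sqrt(q)) ** 2
                      + (math.sqrt(1 - p) - math.sqrt(1 - q)) ** 2) / 2.0)

print(hellinger(0.5, 0.5))  # 0.0 for identical distributions
print(hellinger(0.0, 1.0))  # 1.0 for distributions with disjoint support
```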

II. CONVEX PHASE RETRIEVAL: OPPORTUNITIES AND CHALLENGES
To tackle the phase retrieval problem, many non-convex and local optimization algorithms have been developed over the years. Recently, however, convex programming formulations have come to the fore to yield global solutions. As a case in point, the PhaseLift method in [11] adopts a convex optimization mathematical machinery to tackle the phase retrieval problem, ensuring a near exact recovery of the unknown signal. To do so, PhaseLift relies on a trace-norm relaxation that is used in lieu of the original non-convex rank minimization problem-more on this below. Due to the imposition of the positive semi-definite (PSD) constraint, the PhaseLift problem formulation joins the class of semi-definite programs (SDPs).
Suppose x ∈ C^n is the discrete signal of interest that is observed linearly through the lens of sensing vectors a_j, with a_j^H constituting the rows of the sensing matrix A ∈ C^{m×n}. Our goal in phase retrieval is to recover the signal x from the phaseless measurements [11], [23]:

y_j = |a_j^H x|, j ∈ {1, ..., m}.   (3)

To ease the mathematical manipulation, one can use the squared version of (3), i.e.,

y_j² = |a_j^H x|² = Tr(V_j X),   (4)

where X = x x^H and V_j = a_j a_j^H. Based on (4), the phase retrieval problem can be defined as

find X
subject to Tr(V_j X) = y_j², j ∈ {1, ..., m},
X ⪰ 0, rank(X) = 1.   (5)

To have a convex program, the problem (5) is then relaxed as [11]

minimize Tr(X)
subject to Tr(V_j X) = y_j², j ∈ {1, ..., m},
X ⪰ 0.   (6)

The linear objective and constraints, along with the PSD constraint, turn (6) into a semi-definite program, which is convex [36]. Due to its convexity, there exists a wide array of numerical solvers, including the popular Nesterov accelerated first-order method, to tackle the problem above [23], [37].
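The lifting step behind the squared measurement model can be checked numerically: the squared phaseless measurement |a^H x|² equals the linear functional Tr(V X) of the lifted variable (a small self-contained sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a = rng.standard_normal(n) + 1j * rng.standard_normal(n)

y2 = np.abs(a.conj() @ x) ** 2     # squared phaseless measurement |a^H x|^2
X = np.outer(x, x.conj())          # lifted variable X = x x^H (rank one, PSD)
V = np.outer(a, a.conj())          # sensing matrix V = a a^H
lifted = np.trace(V @ X).real      # the linear functional Tr(V X)
print(np.isclose(y2, lifted))      # True
```

This identity is exactly what makes the quadratic constraints linear in the lifted variable X.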
Although the rank-one and PSD constraints are deemed necessary in the phase retrieval formulation, they lead to an increased computational cost even when we deal with a convex optimization landscape. To enforce the PSD constraint, a projected gradient method is used in [23], where the approximate solution must be projected onto the PSD cone at each iteration by computing all eigenvalues and setting the negative ones to zero, which is quite expensive [23].
An interesting alternative to enforcing the PSD constraint in (6) emerges when one increases the number of samples m, and solves the overdetermined linear system of equations with m ≥ n. By collecting a large number of samples, the linear constraints Tr(V_j X) = y_j² may actually yield the optimum inside the PSD region X ⪰ 0. As the number of samples increases, it is possible that the intersection of these hyperplanes shrinks to the optimal point without the need to enforce the PSD constraint.
However, this idea may face practical limitations in the case of multi-bit quantization systems since ADCs capable of ultra-high rate sampling are difficult and expensive to produce. Moreover, one cannot necessarily expect these constraints to intersect with the PSD cone in such a way to form a finite-volume space before the optimum is obtained [11].
As we will show in the next section, by defining phase retrieval in the one-bit sampling regime, the linear equality constraints are superseded by linear inequalities. Therefore, by increasing the number of samples, we may create a finite-volume space inside the cone X ⪰ 0, making the PSD constraint no longer informative or required. From a practical point of view, one-bit sampling is performed efficiently at a very high rate and at a significantly lower cost compared to its high-resolution counterpart. Thus, by employing one-bit ADCs, it is practical, indeed natural, to study the game-changing opportunities that emerge in the context of phase retrieval due to the availability of a large number of samples.

III. ONE-BIT PHASE RETRIEVAL WITH SAMPLE ABUNDANCE
As indicated earlier, employing one-bit quantization provides a practical opportunity to address an important question: can more samples mean less complexity in the context of the phase retrieval problem? We begin by defining a linear system of inequalities representing the phase retrieval problem in a one-bit quantization system that deploys time-varying thresholds, leading to the one-bit phase retrieval formulation. To recover the desired symmetric positive semi-definite matrix X⋆, we propose an algorithm that exploits the large number of one-bit samples and solves the associated linear system of inequalities by taking advantage of the randomized Kaczmarz algorithm (RKA).

A. Problem Formulation
In the one-bit sampling scenario, we only observe the sign data r, given as

r_j = sgn(y_j − τ_j), j ∈ {1, ..., m},   (7)

where τ is a time-varying threshold vector. Let e denote the phase vector to be recovered. The one-bit phase retrieval problem can be formulated as

find y, x, e
subject to y ⊙ e = A x, y ∈ Ψ,   (8)

where Ψ is the feasible region created by the one-bit constraints

r_j (y_j − τ_j) ≥ 0, j ∈ {1, ..., m},   (9)

or equivalently,

Ω (y − τ) ⪰ 0,   (10)

with the matrix Ω defined as Ω = diag{r}. Inspired by (8), in the following, we present a reformulation of the one-bit phase retrieval problem. Since y_j ≥ 0 based on (3), assuming τ_j ≥ 0, the following relation holds:

r_j = sgn(y_j − τ_j) = sgn(y_j² − τ_j²).   (11)

Therefore, the set of inequalities in (9) can be rewritten as

r_j (Tr(V_j X) − τ_j²) ≥ 0, j ∈ {1, ..., m}.   (12)

Consequently, one can recast (8) in the same spirit as (5):

find X
subject to r_j (Tr(V_j X) − τ_j²) ≥ 0, j ∈ {1, ..., m},
X ⪰ 0, rank(X) = 1.   (13)

Moreover, based on (6), the one-bit version of the PhaseLift formulation may be written as

minimize Tr(X)
subject to r_j (Tr(V_j X) − τ_j²) ≥ 0, j ∈ {1, ..., m},
X ⪰ 0,   (14)

which we refer to as one-bit PhaseLift in this paper. It is worth noting that the problem in (14) also belongs to the class of semi-definite programs (SDPs). As discussed earlier, in the asymptotic case of the one-bit phase retrieval problem, the PSD constraint may not be required. Moreover, the linear system of inequalities in (14) can be reformulated as

(R ⊙ V) vec(X) ⪰ r ⊙ τ ⊙ τ,   (15)

where R = r 1_{n²}^⊤, the j-th row of V is vec(V_j^⊤)^⊤, and we use the matrix identity [38]

Tr(H D) = vec(H^⊤)^⊤ vec(D),   (16)

with H and D being two arbitrary square matrices. As a result, the constraints imposed in the optimization problem (14) can be simplified as

minimize Tr(X)
subject to (R ⊙ V) vec(X) ⪰ r ⊙ τ ⊙ τ.   (17)

Note that dropping the SDP constraint is not the only advantage of having access to a large number of one-bit sampled data in the context of the phase retrieval problem. In fact, we claim that with our approach the rank-one constraint, or its relaxed versions potentially manifested as a trace minimization, also becomes redundant.
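The sign relation invoked above for nonnegative measurements and thresholds, sgn(y_j − τ_j) = sgn(y_j² − τ_j²), can be verified numerically (a minimal sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 1000
y = np.abs(rng.standard_normal(m))      # phaseless measurements, y_j >= 0
tau = rng.lognormal(0.0, 1.0, size=m)   # nonnegative time-varying thresholds

# for nonnegative y_j and tau_j, comparing amplitudes or squared amplitudes
# against the threshold produces exactly the same sign data, since
# y_j^2 - tau_j^2 = (y_j - tau_j)(y_j + tau_j) and y_j + tau_j > 0
r_linear = np.sign(y - tau)
r_squared = np.sign(y**2 - tau**2)
print(np.array_equal(r_linear, r_squared))  # True
```

This is why each one-bit sample translates directly into a linear inequality on the lifted variable X.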
To see why, observe that in the asymptotic case of one-bit phase retrieval, the space constrained by the inequalities in (12), which is a polyhedron, shrinks to become contained inside the feasible region associated with the PSD constraint. However, this shrinking space always contains the globally optimal rank-one solution, with a volume that decreases as the number of samples grows. Thus, instead of the optimization problems in (14) and (17), we formally define the said polyhedron, i.e.,

P = { X : r_j (Tr(V_j X) − τ_j²) ≥ 0, j ∈ {1, ..., m} },   (18)

equivalently restated based on (15) as

P = { X : (R ⊙ V) vec(X) ⪰ r ⊙ τ ⊙ τ }.   (19)

A numerical investigation of (19) reveals that, by increasing the number of samples m, the space formed by the intersection of half-spaces (inequality constraints) can fully shrink to the optimal point inside the PSD constraint; see Fig. 1 for an illustrative example of this phenomenon. As can be seen in this figure, the black lines representing the linear inequalities form a finite-volume space around the optimal point, displayed by the purple circle inside the PSD cone (the elliptical region), as the number of one-bit samples grows. In (a)/(d), the constraints are not enough to create a finite-volume space, whereas in (b)/(e) such constraints can create the desired finite-volume polyhedron, which, however, is not fully inside the PSD cone. Lastly, in (c)/(f), the created finite-volume space shrinks to be fully inside the PSD cone.
To find the signal of interest in the polyhedron (19), we use the RKA without enforcing other costly constraints. This is due to the fact that the solution may be efficiently approached by solving the linear system of inequalities presented in (19).
Note that two signal models for the phase retrieval problem were introduced in [11]: (1) The real-valued model: the unknown signal x and {a j } are real. (2) The complex-valued model: the unknown signal x and {a j } are complex [11]. Both settings will be considered in the following proposed algorithm.

B. One-Bit Phase Retrieval Algorithm (OPeRA)
To recover the desired signal in the one-bit phase retrieval problem, we aim to find a point in the polyhedron (19) instead of solving the SDP in (6). As discussed in Section III-A, when we exploit a large number of samples, the solution of (19) is increasingly likely to capture the desired point, i.e., the signal of interest, inside the PSD cone. The proposed signal recovery relies on the RKA, which is a powerful tool for solving real- or complex-valued linear systems of equations or inequalities through projections [39], [40].
Accordingly, we propose an algorithm to find the desired matrix X⋆ in (13) by (i) using abundant measurements, in order to create the finite-volume space inside the PSD conical region (discussed further in Section IV), and (ii) solving (19) via the RKA. We name our algorithm the One-bit Phase Retrieval Algorithm (OPeRA).
The RKA is a sub-conjugate-gradient method to solve overdetermined linear systems, i.e., C x ⪯ b, where C is an m × n matrix with m > n [40], [41]. Such methods turn the mentioned inequality into an equality of the following form via the positive-part operator:

(c_j^⊤ x − b_j)^+ = 0, j ∈ I_≤.   (20)

Without loss of generality, consider (20) to describe a polyhedron

{ x : c_j^⊤ x ≤ b_j for j ∈ I_≤, c_j^⊤ x = b_j for j ∈ I_= },   (21)

where the disjoint index sets I_≤ and I_= partition our sample index set J, and {c_j} denote the rows of C. Based on this problem, the projection coefficient β_i of the RKA is defined as [39], [40], [42]

β_i = (c_j^⊤ x_i − b_j)^+ for j ∈ I_≤, and β_i = c_j^⊤ x_i − b_j for j ∈ I_=.   (22)

Also, the unknown column vector x is iteratively updated as

x_{i+1} = x_i − (β_i / ‖c_j‖₂²) c_j,   (23)

where, at each iteration i, the index j is chosen independently at random from the set J, following the distribution

Pr{j} = ‖c_j‖₂² / ‖C‖_F².   (24)

To ensure a limited error, the feasible region in (19) cannot be an infinite space in an asymptotic sense.
Fortunately, by introducing more samples, the problem can form a polyhedron with a bounded volume containing the desired point. More interestingly, adding more inequality constraints to (19) causes the said polyhedron to shrink, putting downward pressure on the error between the desired and recovered points (each informative sample shrinks this space). We will show that, by increasing the number of constraints and with effective sampling, this error approaches zero. Moreover, as a result of using an overdetermined linear system of inequalities, the convergence of the RKA is guaranteed [39], [40].
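A minimal, self-contained sketch of the randomized Kaczmarz iteration for a system of linear inequalities is given below; the solver and its toy test problem are illustrative stand-ins, not the exact implementation used in the paper:

```python
import numpy as np

def rka_inequalities(C, b, n_iter=5000, seed=0):
    """Randomized Kaczmarz sweep for C x <= b.
    Rows are drawn with probability ||c_j||^2 / ||C||_F^2, and the residual
    of an inequality row is clipped at zero so that satisfied constraints
    do not move the iterate."""
    rng = np.random.default_rng(seed)
    m, n = C.shape
    row_norms = np.sum(C**2, axis=1)
    idx = rng.choice(m, size=n_iter, p=row_norms / row_norms.sum())
    x = np.zeros(n)
    for j in idx:
        beta = max(C[j] @ x - b[j], 0.0)     # (c_j^T x - b_j)^+
        x -= (beta / row_norms[j]) * C[j]    # project onto the violated half-space
    return x

# toy consistent overdetermined system: x_star satisfies C x <= b with margin
rng = np.random.default_rng(3)
x_star = rng.standard_normal(5)
C = rng.standard_normal((2000, 5))
b = C @ x_star + 1e-3
x_hat = rka_inequalities(C, b)
print(np.linalg.norm(x_hat - x_star))  # small: the polytope hugs x_star
```

Because every half-space contains x_star, each projection is non-expansive with respect to x_star, so the iterate's distance to the solution never increases.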
It is worth noting that, in our problem, we only have the inequality partition I_≤. Herein, the row vectors {c_j} and the scalars {b_j} used in the RKA (21)-(24) are c_j = −r_j vec(V_j^⊤)^⊤ and b_j = −r_j τ_j², respectively.
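To connect the general RKA notation to the phase retrieval polyhedron, the sketch below assembles the rows c_j and offsets b_j from one-bit data (real-valued model, illustrative dimensions) and checks that the lifted ground truth is feasible:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 3, 500
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
tau = rng.lognormal(0.0, 1.0, size=m)

y = np.abs(A @ x)                       # phaseless measurements
r = np.where(y >= tau, 1.0, -1.0)       # one-bit samples

# c_j = -r_j vec(V_j^T)^T and b_j = -r_j tau_j^2, with V_j = a_j a_j^T
# (in the real case V_j is symmetric, so vec(V_j^T) = vec(V_j))
V = np.einsum('mi,mj->mij', A, A).reshape(m, n * n)   # row j is vec(V_j)
C = -r[:, None] * V
b = -r * tau**2

# every inequality c_j^T vec(X*) <= b_j holds for the lifted truth X* = x x^T
X_star_vec = np.outer(x, x).reshape(-1)
print(np.all(C @ X_star_vec <= b + 1e-9))  # True
```

Feeding this C and b to a Kaczmarz-type inequality solver is, in essence, the recovery step of OPeRA.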

C. Numerical Illustrations for OPeRA
We numerically examine the effect of growing the sample size on the recovery of the desired matrix X⋆ in OPeRA. We will use the spectral radius metric, which is particularly informative in the recovery of rank-one matrices. Note that for a rank-one positive semi-definite matrix such as our desired matrix X⋆, the spectral radius is equal to the Frobenius norm of the matrix [38]. In all experiments, the input signals were generated as x ∼ N(0, I_n) + jN(0, I_n) for the complex-valued model, and x ∼ N(0, I_n) for the real-valued model. For both models, the rows of the sensing matrix A were generated as a_j ∼ N(0, I_n). Accordingly, we made use of the time-varying thresholds τ ∼ Lognormal(0, 1). We define the experimental mean square error (MSE) between the true spectral radius ρ(X) and its estimate ρ(X̂) as

MSE = (1/E) Σ_{e=1}^{E} ( ρ(X) − ρ(X̂_e) )²,   (25)

where E is the number of experiments. Each presented data point is averaged over 10 experiments. The results are obtained for the number of samples m ∈ {1000, 5000, 10000, 50000, 100000}. An important part of our work is to show that our method can recover an X̂ that is "increasingly" rank-one and PSD as the number of one-bit samples grows. To do so, we first show in Fig. 2 that the maximum eigenvalue is accurately recovered; deploying a large number of samples leads to an X̂ that is "increasingly" rank-one and PSD. Next, we show that all eigenvalues {ℓ_i} of the recovered matrix X̂, except the maximum eigenvalue, approach zero as the number of one-bit samples increases. Fig. 3 appears to confirm this claim for both real-valued and complex-valued models. The presented results are averaged over 10 experiments and the eigenvalues are arranged in descending order.

Figure 4: Average NMSE for the Frobenius norm error between the desired matrix X⋆ and its recovered version X̂ for different one-bit sample sizes, with the RKA applied to (19).
To further investigate the effectiveness of OPeRA in both real-valued and complex-valued models, Fig. 4 illustrates the squared Frobenius norm of the error normalized by the squared Frobenius norm of the desired matrix X⋆, defined as

NMSE = ‖X̂ − X⋆‖_F² / ‖X⋆‖_F²,   (26)

where the presented results are averaged over 10 experiments. Fig. 4 appears to confirm that the performance of the recovery is enhanced by increasing the number of one-bit samples.
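The normalized error metric used in this figure can be written as a one-line helper (a trivial sketch; the function and variable names are ours):

```python
import numpy as np

def nmse(X_hat, X_star):
    """Squared Frobenius error normalized by the energy of the target matrix."""
    return (np.linalg.norm(X_hat - X_star, 'fro') ** 2
            / np.linalg.norm(X_star, 'fro') ** 2)

X_star = np.outer([1.0, 2.0], [1.0, 2.0])
print(nmse(X_star, X_star))             # 0.0 for a perfect recovery
print(nmse(np.zeros((2, 2)), X_star))   # 1.0 for the all-zero estimate
```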

IV. BOUNDING THE RECOVERY ERROR
In this section, we derive the convergence rate of our proposed algorithm in its search for the optimal point in the PSD cone. Moreover, an upper bound on the recovery error E[‖X_i − X⋆‖_F²] will be introduced. This bound will be leveraged to find a lower bound on the number of measurements m, whose critical role was discussed in Section III.

A. Chernoff Bound Analysis for OPeRA
We first investigate the convergence of OPeRA through a probabilistic lens. Define the distance between the optimal point X⋆ and the j-th hyperplane associated with (19) as

H_j(X⋆) = |c_j^⊤ vec(X⋆) − b_j| / ‖c_j‖₂,   (27)

where {c_j} and {b_j} are the rows and offsets of the linear system solved by the RKA iterations. From an intuitive point of view, it is easy to observe that by generally reducing the distances between X⋆ and the constraint-associated hyperplanes, the possibility of capturing the optimal point is increased.
Suppose K is the cardinality of the set of hyperplanes presented in (19), a portion of the whole sample size (K ≤ m), which will effectively form a polyhedron inside the PSD cone around the optimal point X⋆ if the number of samples is sufficient. The average of the distances {H_j} around the optimal point is obtained as

E(X⋆) = (1/K) Σ_{j=1}^{K} H_j(X⋆).   (28)

For a specific sample size m, when the area of the finite-volume space around the optimal point is reduced, E(X⋆) is diminished as well. Define the overall average distance T_ave as

T_ave = (1/m) Σ_{j=1}^{m} H_j(X⋆).   (29)

Increasing the number of samples leads to smaller values of E(X⋆) and T_ave. Therefore, the possibility of creating the finite-volume space around the desired point increases in the asymptotic sample-size scenario, with m ≥ m⋆, where m⋆ is the minimal measurement size. The Chernoff bound [43], [44] can shed light on this phenomenon, as illustrated below.
Theorem 1. Consider the distances {H_j} between the desired point X⋆ and the hyperplanes of the polyhedron defined in (19) to be i.i.d. random variables.
• The Chernoff bound of T_ave in (29) is given by

Pr{T_ave ≤ a} ≥ 1 − inf_{t>0} e^{−a t} M_T(t),   (30)

where M_T is the moment generating function (MGF) of T_ave, given as

M_T(t) = ( 1 + (t/m) μ^{(1)}_{H_j} + (t²/(2m²)) μ^{(2)}_{H_j} + R )^m,   (31)

with μ^{(κ)}_{H_j} = E[H_j^κ], and R denoting a bounded remainder associated with truncating the Taylor series expansion of M_T.
• M_T is decreasing with an increasing sample size in the sample-abundance scenario, leading to an increasing lower bound in (30).
Proof. The MGF of T_ave is given by

M_T(t) = E[ e^{t T_ave} ] = Π_{j=1}^{m} E[ e^{(t/m) H_j} ].   (32)

By using the Taylor series expansion, one can write

E[ e^{(t/m) H_j} ] = Σ_{κ=0}^{∞} (t/m)^κ μ^{(κ)}_{H_j} / κ!,   (33)

which leads to the formulation (31). Note that since the distances {H_j} can be considered to be finite, their moments always exist.
It is straightforward to verify that M_T is an analytic function and that |R(m⋆)| is bounded. Let t₀ denote the value of t attaining the infimum of the bound in (30). To prove that M_T is a decreasing function in the asymptotic sample-size case (m > m⋆), we use the Padé approximation (PA), which can asymptotically approximate M_T with a rational function of a given order through the moment-matching technique as follows [24], [28], [29]:

M_T(t₀) ≍ (a₀ + a₁ m) / (b₀ + b₁ m),   (34)

where {a₀, a₁, b₀, b₁} are the PA coefficients as given in [24]. The above rational approximation is a decreasing function of m, a fact that can be verified by taking its first derivative with respect to m. The negativity of the derivative follows by observing that

d/dm [ (a₀ + a₁ m)/(b₀ + b₁ m) ] = (a₁ b₀ − a₀ b₁)/(b₀ + b₁ m)², with a₁ b₀ − a₀ b₁ ∝ −σ²_{H_j} ≤ 0,   (35)

where σ²_{H_j} is the non-negative variance of the random variable H_j.
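The mechanics of a Chernoff-type tail bound for the average of i.i.d. nonnegative distances can be checked by Monte Carlo; the sketch below uses Exponential(1) distances as a stand-in model for the {H_j}, chosen only because its MGF is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(6)
m, trials = 200, 20000
a = 1.2  # tail threshold for the average distance

# model the distances H_j as i.i.d. Exponential(1); T_ave is their sample mean
H = rng.exponential(1.0, size=(trials, m))
T_ave = H.mean(axis=1)
empirical = np.mean(T_ave >= a)

# Chernoff: Pr(T_ave >= a) <= inf_t e^{-a t} M_T(t), where for this model
# M_T(t) = (E exp((t/m) H))^m = (1 - t/m)^(-m) for t < m
ts = np.linspace(0.01, 0.9 * m, 2000)
bound = np.min(np.exp(-a * ts) * (1.0 - ts / m) ** (-m))
print(empirical <= bound)  # True: the bound dominates the empirical tail
```

The same mechanism underlies the theorem: smaller moments of the H_j shrink the MGF, which tightens the tail bound on T_ave.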

B. Recovery Error Upper Bound for OPeRA
In order to find the signal of interest in the polyhedron (19), we utilize the RKA, which admits the following convergence bound [39]-[41], [45]:

E[ ‖x_i − x⋆‖₂² ] ≤ q^i ‖x₀ − x⋆‖₂²,   (36)

where x⋆ is the desired point, q ∈ (0, 1) is a function of the condition number of the matrix C, and i is the number of RKA iterations. In our problem, x_i = vec(X_i), x⋆ = vec(X⋆), and C = R ⊙ V. Therefore, the right-hand side of (36) may be recast as

q^i ‖vec(X₀) − vec(X⋆)‖₂² = q^i ‖X₀ − X⋆‖_F².   (37)

It is clear from (36) that, by using a well-chosen initial point x₀ or by increasing the number of iterations i, the recovery error can be further contained. Nevertheless, in the proposed recovery approach, a sufficient number of samples (inequalities) is deemed necessary in order to guarantee a finite-volume feasible region and a bounded recovery error. Once our search area is located inside the PSD cone, we may effectively employ (36) for the convergence rate. The convergence rate of the RKA is applicable when we have a linear system of inequalities. On the other hand, in one-bit phase retrieval, the main constraints, i.e., the rank-one and PSD constraints, are non-linear, and they may be considered redundant only by deploying a sufficient number of samples. Thus, (36) is insufficient to present the convergence rate of OPeRA. We can make (36) relevant to the one-bit phase retrieval problem by taking a penalty function into consideration:

E[ ‖X_i − X⋆‖_F² ] ≤ q^i ‖X₀ − X⋆‖_F² + Ψ(m),   (38)

where Ψ(·) is a decreasing function in the sample-abundance regime, such that if the number of samples is large enough to satisfy the PSD constraint, the penalty function approaches zero, Ψ(m) → 0.
Tails of decreasing functions may be asymptotically approximated by an exponential function [44]. Mathematically, this may be expressed as

Ψ(m) ≍ γ₁ e^{−(ǫ₀ + ε) m},   (39)

which, for large m, we write as

Ψ(m) ≤ γ₁ e^{−ǫ₀ m},   (40)

where ε is an arbitrarily small positive number. Therefore, the bound (38) can be rewritten as

E[ ‖X_i − X⋆‖_F² ] ≤ q^i ‖X₀ − X⋆‖_F² + γ₁ e^{−ǫ₀ m}.   (41)

C. Lower Bound on the Number of Measurements
The algorithm termination criterion is considered to be

E[ ‖X_i − X⋆‖_F² ] ≤ ǫ₁.   (42)

Based on this criterion and (41), the following theorem is presented in order to find a lower bound on the number of required measurements in OPeRA.

Theorem 2.
To recover the desired PSD matrix X⋆ in accordance with (42) in one-bit phase retrieval by OPeRA, with a probability of at least 1 − inf_{t>0} M_T(t) e^{−at}, the number of measurements m must obey

m ≥ (1/ǫ₀) ln( γ₁ / (ǫ₁ − q^i ω₀) ),   (43)

where ǫ₀ and γ₁ are determined via (40), ω₀ = ‖X₀ − X⋆‖_F² is the initial squared error, and i is the number of iterations.
Proof. To satisfy (42), the inequality in (41) is adjusted to capture the upper bound ǫ₁. As a result, the following inequality is obtained:

q^i ω₀ + γ₁ e^{−ǫ₀ m} ≤ ǫ₁.   (44)

Note that ω₀ is a constant scalar that only depends on the initial and optimal signals. According to (44), we can write

e^{−ǫ₀ m} ≤ (ǫ₁ − q^i ω₀) / γ₁,   (45)

or equivalently,

m ≥ (1/ǫ₀) ln( γ₁ / (ǫ₁ − q^i ω₀) ),   (46)

which proves the theorem.
The sufficient number of measurements m is sought to create a finite volume in the PSD cone X ⪰ 0 for an input signal x of size n. It is easy to verify that the dimension of the PSD cone is equal to (n² + n)/2 in both real-valued and complex-valued cases. The infimum number of hyperplanes creating a finite-volume space in an n′-dimensional region is n′ + 1. Consequently, we can write

inf K = (n² + n)/2 + 1.   (47)

Therefore, to form the finite-volume space, the sample size m must be lower bounded by inf K:

m ≥ (n² + n)/2 + 1.   (48)

The above bound helps us establish a clear connection between the sample size and the problem dimension. However, since the bound in (43) is concerned with containing the error after a finite volume is formed, it is asymptotically tighter than (48). Therefore, by satisfying (43), the lower bound in (48) is typically met as well.
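The two lower bounds on the sample size derived in this section can be evaluated with a few lines; the constants fed to the error-based bound are hypothetical placeholders, chosen only to show the computation:

```python
import math

def min_samples_error(eps0, gamma1, eps1, q, i, omega0):
    """Smallest m with q**i * omega0 + gamma1 * exp(-eps0 * m) <= eps1.
    All constants are problem dependent; valid only when eps1 > q**i * omega0."""
    slack = eps1 - (q ** i) * omega0
    if slack <= 0:
        raise ValueError("iteration budget too small for this error target")
    return math.log(gamma1 / slack) / eps0

def min_samples_geometry(n):
    """A finite volume in the (n^2 + n)/2-dimensional PSD cone needs at least
    dimension + 1 bounding hyperplanes."""
    return (n * n + n) // 2 + 1

print(min_samples_geometry(10))  # 56 hyperplanes at the very least for n = 10
```

As the text notes, the error-based bound is asymptotically the binding one; the geometric bound serves only as a sanity floor.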

V. COMPLEXITY INVESTIGATION
We examine the computational cost associated with the use of more samples by comparing our approach (OPeRA) with the PhaseLift method and its one-bit extension (one-bit PhaseLift) as defined in (14). This comparison is based on the required computational time for different sample sizes.

A. Comparing PhaseLift and OPeRA
To compare the computational time of the SDP-based PhaseLift approach and our proposed method, the iterative algorithms are terminated according to (42) with ǫ₁ = 10^{−2}. The unknown signal x lies in R^10. The number of samples m is set to 100, 1000, 3000, 4000, 5000, 10000, 30000, 50000, 80000, and 100000. Each CPU time is obtained by averaging over 5 experiments. PhaseLift is applied to the high-resolution samples, whereas OPeRA is applied to their one-bit sampled counterparts, meaning that only partial information is made available to OPeRA.
The resulting CPU times are reported in Fig. 5.

B. Comparing One-bit PhaseLift and OPeRA
Next, we compare the CPU time of the one-bit PhaseLift defined in (14) and that of OPeRA. As mentioned before, the one-bit PhaseLift problem is a SDP similar to the problem (6).
To compare the relaxation-based formulation of one-bit PhaseLift with our approach, we aim at the recovery of a signal x ∈ R^10 with sample sizes m ∈ {1000, 3000, 5000, 8000, 10000}. The termination criterion of both algorithms is the same as in Section V-A.
As can be seen in Fig. 6, by growing the number of one-bit samples, the CPU time of one-bit PhaseLift increases, while the CPU time of OPeRA decreases. One may conclude that in the large sample-size regime, the proposed one-bit phase retrieval approach has a lower computational burden than one-bit PhaseLift. This is due to the advantages of using more samples to make both the rank-one constraint and the PSD constraint redundant. We hypothesize that the rise in computation time due to unhelpful extra samples (those that do not contribute to achieving the error bound) occurs well beyond m = 10000, which is presumably why such an increase is not observed in our experiment.

VI. ADAPTIVE TIME-VARYING THRESHOLDING
Hereafter, we propose an adaptive threshold design strategy for the task of one-bit phase retrieval. In the spirit of the iterative RKA, a suitable time-varying threshold can be chosen in order to find the optimal solution with enhanced accuracy. As discussed earlier, with sample abundance we have an overdetermined linear system of inequalities creating a finite-volume space. To capture the desired signal matrix X⋆ more efficiently, the right-hand side of the inequalities in (19), i.e., r_j τ_j², must be determined in such a way that each associated hyperplane passes through the desired feasible region within the PSD cone.
Therefore, an algorithm is proposed to ensure that this occurs. To give an illustration, we suppose the solution is the yellow point in Fig. 7. Geometrically, with an adaptive time-varying threshold algorithm, our goal is to generate an informative sampling threshold creating the inequality constraint corresponding to the hyperplane shown by the blue line.
Unlike the other two inequality constraints (hyperplanes illustrated by green and purple lines in Fig. 7), the blue hyperplane will further shrink the feasible region for signal recovery. From this viewpoint, the other hyperplanes (green and purple lines) constitute extra inequality constraints that are not informative.
As discussed in Section V-A, such extra samples (the extra inequality constraints) only increase the computational burden of the phase retrieval task.

A. Adaptive Threshold Design for OPeRA
In light of the previous discussion, to achieve a better recovery accuracy for a specific sample size m, one may use the idea of shrinking the space imposed by the set of inequalities in (19) around the optimal solution X⋆. To make this happen, we propose an iterative algorithm that generates an adaptive threshold to accurately obtain the desired solution. To diminish the area of the finite-volume space around the optimal point, we update the time-varying threshold as in (49), where ǫ_j^{(k)} is the j-th element of a positive vector ǫ^{(k)} at the k-th iteration. This updating process is based on the fact that when r_j = +1, we have Tr(V_j X) ≥ τ_j², and Tr(V_j X) ≤ τ_j² otherwise.
The reason behind updating the one-bit measurements r_j^{(k)} is to ensure that the optimal solution X⋆ satisfies (19) at iteration k, i.e., that the inequalities r_j^{(k)} (Tr(V_j X⋆) − (τ_j^{(k)})²) ≥ 0 continue to hold.
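Since the update rule is described above only at the level of its guiding principle, the following is a hypothetical single round of the adaptive scheme: given a current estimate of the lifted matrix, each squared threshold is pulled toward Tr(V_j X̂) so that the new hyperplane cuts close to the current solution, after which fresh one-bit data are acquired. All names and the specific update formula here are our illustrative choices, not the paper's exact rule:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 1000
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)                       # ADC input (never observed directly)

# stand-in for the current RKA estimate of the lifted matrix
X_hat = np.outer(x, x) + 0.1 * rng.standard_normal((n, n))

# hypothetical update: set each new squared threshold just below Tr(V_j X_hat),
# so the associated hyperplane passes near the current feasible region
est = np.einsum('mi,ij,mj->m', A, X_hat, A)     # Tr(V_j X_hat) for every j
delta = 1e-3
tau_new = np.sqrt(np.maximum(est - delta, 0.0)) # tightened, still nonnegative
r_new = np.where(y >= tau_new, 1.0, -1.0)       # re-acquired one-bit samples
print(tau_new.shape, r_new.shape)
```

In the full algorithm this round repeats until the thresholds stabilize, with the small positive number δ playing the role of the tolerance in Algorithm 1.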

B. Numerical Study of Adaptive Thresholding
To demonstrate the efficacy of the adaptive time-varying threshold algorithm in comparison with a non-negative random threshold, the unknown signal x and the sensing matrix A are generated using the same settings as in Section III-C. As can be seen in Fig. 8, the performance of our proposed algorithm, evaluated by the NMSE defined in (26), is considerably enhanced in comparison with adopting a non-negative random threshold drawn from Lognormal(0, 1). The results are obtained for the sample sizes m ∈ {1000, 5000, 10000, 50000, 100000} and δ = 10^{−3}. Each presented data point is averaged over 5 experiments.
In the following, we compare the number of measurements m required to recover the rank-one and PSD matrix X⋆ using OPeRA with (i) our proposed adaptive threshold algorithm, as well as (ii) a random non-negative threshold. The result can be summarized as follows.

Theorem 3. OPeRA with the adaptive sampling thresholds of Algorithm 1 requires a smaller number of measurements m than OPeRA with random thresholds to achieve the same recovery performance.

Proof. Let m_f denote the number of inequalities in (19) creating a finite-volume space around the desired matrix X⋆, which is not necessarily inside the PSD cone. As discussed in Section VI-A, our proposed adaptive thresholding algorithm shrinks the finite-volume space around the optimal solution X⋆ more strongly than random thresholds do. As a result, the distances {H_j} defined in (27), as well as their moments μ^{(κ)}_{H_j}, will further diminish, which leads to a smaller value of M_T. Therefore, according to Theorem 1 and Theorem 2, a similar recovery performance can be expected with a smaller sample size when the adaptive sampling threshold proposed in Algorithm 1 is utilized.

Algorithm 1: Adaptive Algorithm for Sampling Threshold Selection
Input: One-bit data r, sensing matrix A, initial time-varying thresholds τ^{(0)} ∼ Lognormal(0, 1) (with the same length as r), small positive number δ.
- Calculate the matrix V with the rows {vec(a_j a_j^H)^⊤}.
- Initiate the following loop by setting k = 0.
- Increase k by one.
end
To numerically scrutinize our claim in Theorem 3, Table I illustrates that the sufficient number of samples m required for OPeRA to recover a PSD matrix with the adaptive sampling thresholds proposed in Algorithm 1 is much smaller than that of OPeRA with a random threshold. The result is obtained for the input signal x, a random time-varying threshold τ, and the sensing matrix A originating from the same settings as presented in Section III-C. The numbers of samples examined in our experiments are the same as in Section V-A, plus m = 500.
To further investigate the effectiveness of Algorithm 1, we show that the average of all eigenvalues {ℓ_i} of X̂, except the maximum eigenvalue (the spectral radius), approaches zero as the number of measurements increases; see Fig. 9. Interestingly, Fig. 9 reaffirms that the number of measurements used to recover the rank-one and PSD matrix X⋆ by OPeRA with adaptive thresholding is much smaller than that reported in Fig. 3, where OPeRA with a random threshold is adopted. The results are obtained for the sample sizes m ∈ {300, 500, 1000, 2000} and averaged over 5 experiments, with the non-dominant eigenvalues arranged in decreasing order.
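The quantity tracked in Fig. 9 is easy to compute: for a near-rank-one PSD matrix, all eigenvalues except the largest should be near zero. A minimal sketch (the helper name is ours):

```python
import numpy as np

def nondominant_eig_mean(X_hat):
    """Average of all eigenvalues except the largest; near zero iff X_hat is
    (numerically) rank-one and PSD."""
    w = np.linalg.eigvalsh(X_hat)   # eigenvalues of a Hermitian matrix, ascending
    return float(np.mean(w[:-1]))   # drop the largest (the spectral radius)

# an exactly rank-one matrix has zero non-dominant eigenvalues
x = np.array([1.0, -2.0, 0.5])
X = np.outer(x, x)
assert abs(nondominant_eig_mean(X)) < 1e-10
```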

VII. ONE-BIT PHASE RETRIEVAL WITH NOISY MEASUREMENTS
In this section, we extend our study to signal recovery from noisy one-bit data in the phase retrieval problem. In most practical applications, we must rely on noisy measurements [23], [35], [46]. In particular, we will examine whether the computational advantages provided by sample abundance in the noiseless scenario carry over to the noisy setting.
TABLE I: Sufficient number of samples m for recovery.
OPeRA with a random threshold: 30000
OPeRA with the adaptive threshold: 500

Let λ and z denote the time-varying threshold vector and the noise vector, respectively. The noisy one-bit samples are generated as
r_j = sgn(µ_j + z_j − λ_j), j ∈ {1, …, m}, (50)
where µ_j = Tr(V_j X⋆). The occurrence probability vector p for the noisy one-bit measurement r is given as [46]
p_j = Φ(µ_j − λ_j) if r_j = +1, and p_j = 1 − Φ(µ_j − λ_j) if r_j = −1,
where Φ(·) is the CDF of −z. Since {µ_j} are linear functions of X, the CDF terms {Φ(µ_j − λ_j)} can be collectively written as Φ(X). The log-likelihood function of the sign data r is given by
L_r(µ, X) = Σ_{j: r_j = +1} log Φ(µ_j − λ_j) + Σ_{j: r_j = −1} log(1 − Φ(µ_j − λ_j)). (53)
Interestingly, by solving the maximum likelihood estimation (MLE) problem associated with (53), our desired matrix X⋆ can be immediately approximated. The proposed algorithm is called Noisy OPeRA.
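Assuming zero-mean Gaussian noise, the log-likelihood in (53) can be evaluated directly; the following short sketch does so (the function name and interface are ours, and a clipping guard is added against log(0)):

```python
import numpy as np
from math import erf, log, sqrt

def log_likelihood(r, mu, lam, sigma_z):
    """Log-likelihood (53) of sign data r in {+1, -1}, where
    P(r_j = +1) = Phi(mu_j - lam_j) and Phi is the zero-mean Gaussian CDF
    with standard deviation sigma_z."""
    t = (np.asarray(mu) - np.asarray(lam)) / (sigma_z * sqrt(2.0))
    phi = 0.5 * (1.0 + np.vectorize(erf)(t))        # Gaussian CDF at mu - lam
    p_plus = np.clip(phi, 1e-12, 1.0 - 1e-12)       # guard against log(0)
    return float(np.sum(np.where(np.asarray(r) > 0,
                                 np.log(p_plus), np.log1p(-p_plus))))

# sanity check: when mu_j = lam_j each sign is a fair coin, so the
# log-likelihood equals m * log(1/2)
m = 4
ll = log_likelihood(np.ones(m), np.zeros(m), np.zeros(m), sigma_z=1.0)
assert abs(ll - m * log(0.5)) < 1e-9
```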

B. Noisy One-Bit Phase Retrieval via Convex Programming
A preliminary formulation of our optimization problem based on the MLE may be cast as
maximize_X L_r(µ, X) subject to X ⪰ 0, rank(X) = 1. (55)
This problem is the one-bit version of its counterpart formulated in [23]. However, as discussed in previous sections, because one-bit sampling is employed, a large number of samples can be adopted, which makes available a large number of sign data {r_j} and the corresponding inequality constraints: when r_j = +1, we have Tr(V_j X) ≥ τ_j², and Tr(V_j X) ≤ τ_j² otherwise. These inequalities are collected to form the polyhedron (19). However, these constraints may be equivalently absorbed into the objective function to facilitate the one-bit phase retrieval formulation in the noisy case.
Therefore, the problem (55) can be reformulated as the unconstrained program
maximize_X L_r(µ, X).
In many cases, L_r(µ, X) is a concave function, and thus the above program becomes convex. One can readily verify this in the case of Gaussian noise [35]. In the rest of our paper, −z is assumed to be an i.i.d. zero-mean Gaussian process, z ∼ N(0, σ_z² I_m), for which Φ(·) is given in (1).
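A quick numerical sanity check of the Gaussian case: each log-likelihood term is log Φ(±(µ_j − λ_j)), and the standard normal CDF is log-concave, so every term (and hence L_r) is concave in µ. The sketch below verifies this via second differences on a grid.

```python
import numpy as np
from math import erf, sqrt

def log_phi(t):
    """log of the standard normal CDF."""
    return np.log(0.5 * (1.0 + erf(t / sqrt(2.0))))

# second differences of log Phi on a grid are all non-positive,
# consistent with the log-concavity of the Gaussian CDF
ts = np.linspace(-4.0, 4.0, 401)
vals = np.array([log_phi(t) for t in ts])
second_diff = vals[:-2] - 2.0 * vals[1:-1] + vals[2:]
assert np.all(second_diff <= 1e-12)
```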

C. Numerical Investigation of Noisy OPeRA
To examine the performance of Noisy OPeRA in practice, and to validate the theoretical results described in this section, we consider signal recovery with different values of σ_z ∈ {0.1, 0.2, 0.4, 0.5, 0.7, 1}, where the unknown signal x was generated in a similar manner as in Section III-C. The stochastic threshold λ was generated according to λ ∼ N(0, I_m), and the signal-to-noise ratio (SNR) is evaluated accordingly. In Fig. 10, the recovery performance is illustrated using the NMSE defined in (26), with the results averaged over 10 experiments. We report both SNR and NMSE in dB (10 log(·)). As expected, increasing the SNR improves the performance of our method. Furthermore, the performance of the estimation problem formulated in (55) is enhanced by increasing the number of one-bit samples m ∈ {5000, 10000}. In this approach, since the desired matrix X⋆ is recovered statistically via MLE, we compare Φ(X) and Φ(X̂) by resorting to a widely used statistical distance, the Hellinger distance, defined in (2). The vector entry-wise formula of the Hellinger distance is given by
d_j = (1/√2) [ (√Φ(µ_j − λ_j) − √Φ(μ̂_j − λ_j))² + (√(1 − Φ(µ_j − λ_j)) − √(1 − Φ(μ̂_j − λ_j)))² ]^(1/2),
where {μ̂_j} is the estimated version of {µ_j} obtained from (55). As was previously observed, increasing the SNR improves the NMSE of Noisy OPeRA; a similar behavior occurs with the Hellinger distance shown in Fig. 11. The Hellinger distance is very small for all SNR values in this experiment, and it decreases as the SNR increases, which appears to confirm the consistency of the MLE-based recovery. For comparison, we also consider the noisy PhaseLift formulation, which solves a trace-regularized MLE of the form
minimize_{X ⪰ 0} −Υ_µ(X) + α Tr(X),
where Υ_µ(X) = log(f(s|µ)), with the noisy measurement vector {s_j} sampled from a probability distribution f(·|µ), and α a positive scalar. For our numerical examinations, we assume the measurement noise is distributed as z ∼ N(0, 0.25 I_m), and the termination criterion is ‖X_i − X⋆‖²_F ≤ 5×10⁻³ ‖X⋆‖²_F.
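Since each one-bit measurement is a Bernoulli variable with parameter Φ(µ_j − λ_j), the entry-wise Hellinger distance reduces to the standard Bernoulli form, sketched below (assumed to match the paper's definition (2); the function name is ours):

```python
import numpy as np

def hellinger_bernoulli(p, q):
    """Entry-wise Hellinger distance between Bernoulli(p_j) and Bernoulli(q_j):
    d_j = sqrt( ((sqrt(p_j)-sqrt(q_j))^2 + (sqrt(1-p_j)-sqrt(1-q_j))^2) / 2 )."""
    return np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2
                          + (np.sqrt(1.0 - p) - np.sqrt(1.0 - q)) ** 2))

p = np.array([0.2, 0.5, 0.9])
assert np.allclose(hellinger_bernoulli(p, p), 0.0)  # identical laws: zero distance
assert np.all(hellinger_bernoulli(p, 1.0 - p) >= 0.0)
```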
Table II shows that, by using a large number of samples (thereby making the rank-one and PSD constraints redundant) in the noisy one-bit sampling scenario, Noisy OPeRA can recover the signal with lower CPU time for sample sizes m ∈ {5000, 10000, 20000} compared to the noisy PhaseLift method. This parallels our discussion in the noiseless scenario; see Section V. Interestingly, as the number of samples grows, the NMSE improves more significantly for Noisy OPeRA than for the noisy PhaseLift method. The results are averaged over 5 experiments. The settings of the input signal, time-varying thresholds, and the sensing matrix are chosen in the same way as in Section III-C.

VIII. CONCLUSION
We showed that the abundance of samples that naturally occurs in one-bit sampling scenarios has significant implications for lowering the computational cost of phase retrieval by making costly constraints redundant. The problem then boils down to a set of linear inequalities that may be solved by the RKA within the proposed OPeRA signal recovery framework. The numerical results showcased the effectiveness of the proposed approaches for phase retrieval.