Phase Retrieval of Quaternion Signal via Wirtinger Flow

The main aim of this paper is to study quaternion phase retrieval (QPR), i.e., the recovery of quaternion signal from the magnitude of quaternion linear measurements. We show that all <inline-formula><tex-math notation="LaTeX">$\boldsymbol{d}$</tex-math></inline-formula>-dimensional quaternion signals can be reconstructed up to a global right quaternion phase factor from <inline-formula><tex-math notation="LaTeX">$\boldsymbol{O(d)}$</tex-math></inline-formula> phaseless measurements. We also develop the scalable algorithm quaternion Wirtinger flow (QWF) for solving QPR, and establish its linear convergence guarantee. Compared with the analysis of complex Wirtinger flow, a series of different treatments are employed to overcome the difficulties of the non-commutativity of quaternion multiplication. Moreover, we develop a variant of QWF that can effectively utilize a pure quaternion priori (e.g., for color images) by incorporating a quaternion phase factor estimate into QWF iterations. The estimate can be computed efficiently as it amounts to finding a singular vector of a <inline-formula><tex-math notation="LaTeX">$\mathbf{4}\boldsymbol{\times}\mathbf{4}$</tex-math></inline-formula> real matrix. Motivated by the variants of Wirtinger flow in prior work, we further propose quaternion truncated Wirtinger flow (QTWF), quaternion truncated amplitude flow (QTAF) and their pure quaternion versions. Experimental results on synthetic data and color images are presented to validate our theoretical results. In particular, for pure quaternion signal recovery, our quaternion method often succeeds with notably fewer measurements compared to real-valued methods based on monochromatic model or concatenation model.


I. INTRODUCTION
A S an expansion of the complex field C, an element in the non-commutative field Q = {q 0 + q 1 i + q 2 j + q 3 k : q 0 , q 1 , q 2 , q 3 ∈ R} is called a quaternion number, which contains one real part (q 0 ) and three imaginary parts (q 1 , q 2 , q 3 ). Although signals or images are traditionally processed in R or C, the quaternion algebra has been noted to be a suitable platform for certain signal processing tasks. Consequently, many signal processing tools have been developed for quaternion setting over the past decades, including Fourier transform [1], [2], wavelet transform [3], [4], principal component analysis [5], moment analysis [6], [7], compressed sensing [8], matrix completion [9], [10], deep neural network [11], [12], adaptive filtering [13], [14], [15], quaternion derivative [16], [17], [18], [19], and many others. Color image processing is an important application of quaternion. To process color images in R, one may use the monochromatic model that deals with each channel separately, or the concatenation model that concatenates three channels as a real matrix of triple size, whereas these two methods often fail to utilize the high correlations among channels. To fully utilize these correlations and process the color image as a whole, it was proposed to use tensor, quaternion or their integration for color image processing, see [20], [21], [22], [23] for instance. The quaternion-based approach encodes three channels (i.e., Red, Green and Blue in the RGB color space) into three imaginary parts of pure quaternion, with the real part set to zero, hence the color image is modeled as pure quaternion signal. The advantage of quaternion-based approach is that the correlations among the three channels can be well preserved, and color images can be processed in a holistic manner. This approach was proposed in [1], [2], [24], and now has been extensively developed in various color imaging problems or methods, including denoising [23], [25], [26], inpainting [9], [10], [27], segmentation [28], [29], convolution neural network [30], watermarking [31], [32], sparse representation of color image [33], [34]. By taking advantage of holistic processing of color images, quaternion-based approach usually outperforms processing methods in R (e.g., the aforementioned monochromatic model and concatenation model), see for instance [9], [23], [25], [26], [27].
Departing momentarily from the quaternion methods in signal processing, phase retrieval concerning signal reconstruction from phaseless measurements has attracted considerable research interest. It is motivated by a frequently encountered setting where it would be expensive, difficult, or even impossible to capture the measurement phase, to name a few, X-ray crystallography [35], quantum mechanics [36], speech recognition [37]. We further emphasize the crucial role played by phase retrieval in many imaging problems like diffraction imaging [38], astronomical imaging [39], optics and microscopy [40], [41]. For detailed discussion, we refer to the survey paper [42].
Mathematically, the goal of phase retrieval is to recover a signal x ∈ R d /C d based on the given knowledge of measurement matrix A ∈ R n×d /C n×d and the corresponding phaseless measurements |Ax| 2 (| · | 2 here applies elementwisely). From the theoretical side, it is possible to reconstruct all signals up to a global phase factor (i.e., a sign ±1 in R or a unit complex scalar exp( iθ) in C) under optimal sample complexity O(d), see [43], [44], [45], [46] for instance. While most early algorithms for phase retrieval lack theoretical support [47], [48], [49], a series of guaranteed algorithms have been developed in the past decade, which can be divided into a convex optimization approach [50] and a non-convex optimization approach [51], [52], [53], [54]. Among them, the seminal work Wirtinger flow (WF) [51] provides a framework for algorithmic design under a non-convex optimization setting, i.e., via a careful initialization that can well approximate the solution, followed by gradient descent refinement. In many cases, this is essentially more practical and scalable than a convex lifting approach [50].
Although quaternion is widely used to represent and process color images, and phase retrieval is a crucial technique in imaging science problems, quaternion phase retrieval (QPR) has not yet been studied before. More precisely, given a quaternion measurement matrix A ∈ Q n×d , QPR is concerned with the recovery of x ∈ Q d from the phaseless measurements |Ax| 2 . To our best knowledge, the only related result is presented in [55], but it is restricted to A ∈ R n×d . Under such real measurement matrix, the model fails to utilize the quaternion multiplication but simply identifies Q as R 4 , thereby reducing to a special case of phase retrieval of real vector-valued signal. More prominently, compared with A ∈ Q n×d studied in this work, A ∈ R n×d leads to essentially more trivial ambiguities that can probably limit applications of QPR in signal processing (Remark 1).
The main aim of this paper is to close the research gap between phase retrieval and quaternion signal processing. We initiate the study of QPR, by first identifying the unavoidable trivial ambiguity, and then proposing and studying a practical algorithm of quaternion Wirtinger flow (QWF) with a linear convergence guarantee. This work is built upon many previous developments of quaternion, for instance, the HR calculus of quaternion derivative [16], [19], and results for quaternion matrices [56]. As color image processing is to deal with pure quaternion, we also develop an algorithm that can incorporate the priori of a pure quaternion signal (i.e., (x) = 0) into QWF. Our main contributions are summarized as follows: • (Trivial Ambiguity  To utilize the pure quaternion priori, PQWF  embeds an efficient quaternion phase factor estimate  into the iteration and enjoys similar theoretical guarantee  (Theorem 3). The earlier phase transition is presented to confirm the efficacy of PQWF (Fig. 2). • (Refinements and Experiments). Motivated by existing Wirtinger flow refinements for real/complex phase retrieval, we propose their counterparts in QPR (Algorithms 4-5) and numerically show their improvements over QWF (Fig. 3). We further specialize them to pure quaternion signal (Algorithms 6-7) and then use them in color image recovery. Compared to real-valued phase retrieval based on monochromatic model or concatenation model, the proposed quaternion method succeeds with notably fewer phaseless measurements (Figs. 4-6). As part of our technical contributions, many essentially different treatments take place in the proof of Theorem 2 to overcome the challenges arising in quaternion setting. An evident example is the concentration of 1 n n k=1 where α * k is the k-th row of A. In the proof, we avoid the Hessian matrix employed in [51], and instead calculate 2 by using a real matrix representation T (·) of a quaternion (Remark 3). The formal definition of T (·) can be found in Section II. In our analysis, such real matrix representations are recurring, e.g., (16), (27). Actually, with a great deal of quaternion-based ingredients involved in the theoretical analysis, we believe this work can technically provide an example for quaternion study and hence of some pedagogical value. This paper is organized as follows. In Section II we state the notation and provide the preliminaries. Several useful techniques for quaternion study are included. In Section III we first identify the trivial ambiguity in QPR, then propose QWF, and present the proof for its linear convergence. In Section IV, we propose PQWF for QPR of pure quaternion signal. Experimental results on both synthetic data and color image data are presented in Section VI. We give some remarks to conclude the paper in Section VII. To improve the readability, some auxiliary results for the main proof are provided in Appendix A.

II. NOTATIONS AND PRELIMINARIES
Some notations are needed for mathematical analysis. We denote probability and expectation respectively by P(·) and E(·), note that E(·) separately operates on one real part and three imaginary parts of a quaternion random variable. Besides, C, c, C i , c i represent absolute constants whose value may vary from line to line. Both T 1 T 2 and T 1 = O(T 2 ) mean T 1 ≤ CT 2 for some absolute constant C. Conversely, T 1 ≥ CT 2 is denoted by T 1 T 2 or T 1 = Ω(T 2 ). We use capital boldface letters, lowercase boldface letters to denote matrix, vector, respectively. We denote real or complex scalar by regular letter (e.g., a, b), while quaternion scalar by the typewriter style letter (e.g., a, b). We conventionally write [m] = {1, · · · , m}.

A. Basics of Quaternions and Quaternion Matrices
Let Q = {q = q 0 + q 1 i + q 2 j + q 3 k : q 0 , q 1 , q 2 , q 3 ∈ R} be the set of quaternion numbers, then Q d (resp. Q d1×d2 ) denotes the set of d-dimensional quaternion vectors (resp. n 1 × n 2 quaternion matrices). The addition and subtraction of two quaternion numbers are defined component-wisely, e.g., and with distributive law, associative law imposed, the multiplication between quaternion numbers is defined.
Given q = q 0 + q 1 i + q 2 j + q 3 k, besides the real part (q) = q 0 and the vector part (q) = q 1 i + q 2 j + q 3 k, we further use P ϑ (ϑ = i, j, k) to extract three imaginary components, i.e., P i (q) = q 1 , P j (q) = q 2 , P k (q) = q 3 . Note that q = q 0 − q 1 i − q 2 j − q 3 k is the conjugate of q, |q| = ( 3 k=0 q 2 k ) 1/2 is the absolute value. We allow these operations for quaternion number element-wisely apply to quaternion vectors and matrices. For nonzero q, q −1 = q/|q| 2 is its inverse. As pure quaternions with zero real part is of particular interest, we collect them in Q p = {q ∈ Q : (q) = 0}. The phase of a non-zero quaternion q is defined as sign(q) = q |q| . As the phase belongs to T Q = {q ∈ Q : |q| = 1}, we sometimes call quaternion in T Q the quaternion phase factor, e.g., when we describe the trivial ambiguity.
Given the vector , is defined to be the maximum number of right linearly independent columns of A. Note that quaternion vectors α 1 , · · · , α N are said to be right linearly independent if N k=1 α k q k = 0 (q k ∈ Q) can imply q k = 0 for all k. Let I d be the identity matrix, the matrix A ∈ Q d×d is invertible if there exists B such that AB = BA = I d . Parallel to complex matrices, A is invertible if and only if it is full rank (rank(A) = d) [56]. We say A ∈ Q d×d is Hermitian if Note that quaternion multiplication is non-commutative (e.g., i j = − j i), which is often a key technical challenge in the extension from R/C to quaternion setting. We note that, taking the real part is an effective technique to circumvent this issue, since we have (ab) = (ba) for any a, b ∈ Q, or more generally, for A ∈ Q d1×d2 , B ∈ Q d2×d1 it holds that Tr(AB) = Tr(BA) . (1)

B. Real Representation and Quaternion SVD
We further introduce two key techniques to study quaternion matrices. The first one is the complex or real representation of quaternion matrices based on the maps T C (·), T (·). Note that any A ∈ Q d1×d2 can be uniquely written as A = B + C j for some B, C ∈ C d1×d1 , then T C (·) maps A to its complex adjoint matrix belonging to C 2d1×2d2 For quaternion matrices A, A 1 , A 2 , several useful relations are in order: [56]. Furthermore, A ∈ C d1×d2 can be written as A = B + C i for some B, C ∈ R d1×d2 , which can then be reduced to real matrix by T R : and for complex matrices A, Naturally, a composition of these two maps, i.e., More precisely, The following relations hold due to the properties of T C and T R : . We would also define T i (A) (i = 1, 2, 3, 4) to be the i-th column of blocks in T (A).
For instance, In addition, quaternion singular value decomposition (QSVD) is another powerful tool. For A ∈ Q d1×d2 , there exist unitary matrices U ∈ Q d1×d1 , V ∈ Q d2×d2 , diagonal matrix Σ ∈ R d1×d2 with non-negative diagonal entries σ 1 , · · · , σ min{d1,d2} , such that A = U ΣV * . We refer readers to [56,Theorem 7.2] for its derivation. In QSVD, {σ k : 1 ≤ k ≤ min{d 1 , d 2 }} are the singular values of A. It is straightforward to show several facts coincident with R d1×d2 or C d1×d2 , e.g., the maximum singular value equals A , and A F = ( k σ 2 k ) 1/2 . Moreover, the SVD for T (A) is given by T (A) = T (U )T (Σ)T (V ) , which leads to T (A) = A , T (A) F = 2 A F . We will also work with the matrix nuclear norm defined to be the sum of singular values, i.e., A nu = k σ k .

C. (Standard) Eigenvalue and Eigenvector
For simplicity, in this paper we restrict the eigenvalue to be the right one. 1 In particular, given A ∈ Q d×d , if Ax = xλ for some nonzero x ∈ Q d , we refer λ, x to as the eigenvalue, eigenvector of A. Since Ax = xλ is equal to A(xv * ) = (xv * )(vλv * ) for any v ∈ T Q , A with eigenvalue λ indeed possesses a set of eigenvalues {vλv * : v ∈ T Q }, among which we can pick a unique "standard eigenvalue" in the form of

D. Derivative of Real Function With Quaternion Variable
We need to calculate the derivative of real function with quaternion variable. The framework that best meets such optimization need is the HR calculus, see [16] and the more complete theory in [17], [18], [19]. More precisely, given a function Generally, this is called left derivative and is different from the right derivative ∂f Considering only the derivative of real function f (f ∈ R) will be involved in this work, we simply adopt the left one (2). For f ∈ R with quaternion variable q, the gradient represents the direction in which f changes in a maximum rate [16], [19]. Some calculation rules in [19] would also be used later.

A. The Trivial Ambiguity
Given a quaternion signal x ∈ Q d and a measurement matrix Observe that for any unit quaternion q (i.e., q ∈ T Q ), |A(xq)| 2 = |Ax| 2 , so one can never distinguish two signals only differentiated by a global right quaternion phase factor. 2 However, a global left quaternion phase factor q is not necessarily an ambiguity since q and A may not be commutative, hence |A(qx)| 2 = |Ax| 2 is possible, see more discussions below Theorem 1. 3 This is in stark contrast to the real or complex phase retrieval.
Throughout this paper we consider the Gaussian measurement ensemble where the entries of A are i.i.d. drawn from Evidently, E(α k α * k ) = I d . 2 For nonzero quaternion q we call q |q| its phase. Hence, we will refer to q ∈ T Q as a quaternion phase factor. 3 However, when a real measurement matrix is used (as in [55]), this becomes a trivial ambiguity due to |A(qx)| 2 = |qAx| 2 = |Ax| 2 when A ∈ R n×d . This is the essential difference between this work and the QPR result in [55].
Generally speaking, the goal in any signal reconstruction task is to recover the signal up to trivial ambiguity. Thus, a question of fundamental importance is whether there exist other unavoidable ambiguities in QPR (besides the aforementioned right phase factor). In the next theorem, we show that the global right quaternion phase factor is the only trivial ambiguity by proving a stronger uniform recovery guarantee: all signals in Q d can be reconstructed up to right quaternion phase factor from O(d) phaseless measurements.
Theorem 1: When n ≥ Cd for some absolute constant C, with probability at least } up to a global right quaternion phase factor. Proof: The proof can be found in supplementary material. In the proof of Theorem 1, we use Mendelson's small ball method to show that, with high probability, |α * k x| 2 = |α * k y| 2 (∀k ∈ [n]) implies xx * = yy * (while the converse statement is evidently true). As shown in Lemma 1, xx * = yy * is equivalent to x = yq for some q ∈ T Q .
Note that a global left phase factor is not trivial ambiguity, as for q ∈ T Q , (qx)(qx) * = qxx * q does not equal to xx * in general. Indeed, even it happens that (qx)(qx) * = xx * , by Lemma 1, such ambiguity can be expressed via a right quaternion phase factor (i.e., qx = xq 1 for some q 1 ∈ T Q ).
Proof: If x = yq for some unit quaternion q, then xx * = yqqy * = yy * . Thus, it remains to prove . Let i = j, we obtain |x i | = |y i |, so we can assume x i = y i q i for some q i ∈ T Q . For i = j, we thus have y i q i q j y j = y i y j . Assuming y i , y j are both non-zero, this implies q i = q j = q for some common q. When y i = 0, then x i = 0, we also have x i = y i q. Therefore, there exists a common q ∈ T Q such that x = yq.
Note that using Mendelson's small ball method with a little bit more work one can prove uniform stable recovery guarantee [57].
We further give a remark comparing our model and QPR with real measurement matrix studied in [55].
Remark 1: Some results on recovering x ∈ Q d from |Ax| 2 with A ∈ R n×d were presented in [55]. Using a real measurement matrix, such model does not really utilize the special quaternion multiplication because for can be viewed as a d-dimensional R 4 -valued vector. Thus, it simply identifies Q with R 4 . Compared to our QPR model, the downside of such model is that it suffers from much more trivial ambiguities, e.g., the left quaternion phase factor as |Aqx| 2 = |Ax| 2 holds for q ∈ T Q , the conjugate as |Ax| 2 = |Ax| 2 , and moreover a 4 × 4 orthogonal matrix operating on the real part and three imaginary parts because α x = α x O holds for any 4 × 4 orthogonal matrix O (interested readers can verify that this ambiguity is already more severe than the right quaternion phase factor in our model). Indeed, we will show that using our QPR model, most pure quaternion signals can be reconstructed up to a sign (Lemma 6), but this is not possible under real measurement matrix because of the trivial ambiguity

Algorithm 1 Spectral Initialization
Input: data (α k , y k ) n k=1 1: Construct the Hermitian data matrix and find its normalized eigenvector regarding the largest standard eigenvalue ν in . 2: We compute λ 0 := 1 n n k=1 y k 1/2 and obtain the spectral of a 3 × 3 orthogonal matrix operating on three imaginary parts (as an example, let a, b ∈ R d we cannot distinguish a i + b j and b i + a j using the phaseless measurements produced by a real measurement matrix).

B. The Quaternion Wirtinger Flow Algorithm
Recall that WF for solving real/complex phase retrieval problem contains spectral initialization and WF update as two steps [51], and we will first present the QWF algorithm in analogy. We will exclusively use x to denote the underlying quaternion signal and assume x = 1. For succinctness we focus on noiseless case where the k-th measurement is y k = |α * k x| 2 . The QWF algorithm is based on minimizing the 2 loss We first describe the careful initialization by spectral method in Algorithm 1. Intuitively, ν in can well approximate the direction of x due to ES in = I d + 1 2 xx * (Lemma 8(d)). Because Ey k = x 2 , it is natural to estimate the signal norm x as λ 0 = 1 n n k=1 y k 1/2 .
Then, QWF refines z 0 by a quaternion kind of gradient descent. Here, the gradient is calculated under the framework of (generalized) HR calculus [16], [19], but we still follow the convention in [51] and term the algorithm as (quaternion) Wirtinger flow. Given f (z) with quaternion variable z, it would be cumbersome to rewrite it as f ( (z), P i (z), P j (z), P k (z)) and then follow the definition (2). Instead, we apply some rules derived in [19], specifically the product rule ∂(fg) Table IV]. Therefore, the quaternion derivative can be calculated as Hence, taking a suitable step size η, the update rule is Algorithm 2 Quaternion Wirtinger Flow (QWF) Input: (α k , y k ) n k=1 , step size η, iteration number T 1: for i = 0, 1, ..., T − 1: Compute ∇f (z i ) as in (8), then update z i to z i+1 as in (7). end for Output: z T where we let ∇f (z) := ∂f (z) ∂z * to keep notation light, i.e., Overall, we summarize the QWF update in Algorithm 2.

C. Linear Convergence
For z, x ∈ Q d , due to the trivial ambiguity of right quaternion phase factor, we characterize the distance between z and x by Define the phase of nonzero w ∈ Q to be sign(w) = w |w| , and let sign(0) = 1, some algebra shows that the minimum of (9) is attained at w = sign(x * z), and hence dist(z, to denote the reconstruction error of z. Naturally, a small neighborhood of x should be given as E (x) = {z ∈ Q d : dist(z, x) ≤ }. We present our first main result that guarantees the linear convergence of QWF.
Theorem 2: We consider a fixed signal x satisfying x = 1, a measurement matrix A ∼ N n×d Q , and the observations y k = |α * k x| 2 where α k is the k-th row of A. Suppose that we run Algorithm 2 with the step size η in (7) . Then under the sample size n ≥ C 1 d log n for some absolute constant C 1 , with probability at least 1 − C 2 n −9 − C 3 n exp(−C 4 d), the sequence {z t } produced by QWF satisfies for some c 1 .

Remark 2:
We assume x = 1 to facilitate theoretical analysis with no loss of generality. To be adaptive to an unknown signal norm, as in [51] we suggest a step size η = η1 z0 2 where z 0 is the spectral initialization from Algorithm 1. In this case, the linear convergence still holds as long as η 1 The proof of Theorem 2 can be divided into several ingredients below, specifically Lemmas 2-5, and then we will arrive at the desired linear convergence in the end of this section. The theoretical analysis will be provided in a reverse order. We first show that as long as z 0 ∈ E (x) for some sufficiently small , the sequence produced by QWF update (7) linearly converges to x. Then, we complete the proof by showing z 0 ∈ E (x) holds with high probability. To analyze the behaviour of {z t } in E (x), we define several conditions to characterize the landscape of f (z) when z ∈ E (x).
Condition 1: (Regularity Condition) The regularity condition holds with positive parameters τ, β, , abbreviated as With sufficiently small step size, linear convergence of {z t } can be implied by RC(τ, β, ) of f (z). This observation bears resemblance to a classical result in convex optimization (see [58,Theorem 2.1.15]), but its proof requires proper modification (see [51,Lemma 7.10]). Fortunately, it remains true in the quaternion setting without essential technical changes.
Lemma 2: Under Condition 1, if z 0 ∈ E (x) and the step size 0 < η ≤ 2 β , then the sequence {z t } produced by (7) satisfies Proof: The proof can be found in supplementary material. Therefore, it suffices to establish the regularity condition, and similar to [51] we divide it into two properties.
Proof: We aim to show (13) for some τ, β specified later.
Then we deal with (13) by using ∇f (z) = 1 n n k=1 |α * k z| 2 − |α * k x| 2 α k α * k z and z = (h 0 + x)φ(z), it gives a sufficient condition for (13) We only need to consider nonzero h 0 and we further let h 0 = s · h with s = h 0 ∈ [0, ], h = 1, (h * x) = 0. Hence, the above sufficient condition can be implied by We define t β := 9 We 2 , and we need to work out the concentration of 1 n n k=1 Letting Note that entries of T 1 (α k ) are independent copies of 1 2 N (0, 1), by rotational invariance, without changing distribution we can assume Hence, we can invoke Lemma 9 and obtain that when n = Ω(δ −2 d log n), the right-hand side of (18) is bounded by δ with probability at least 1 − 32n −9 − 32 exp(−Ω(d)). This gives rise to Our strategy is to first consider fixed (h, s) and then apply a covering argument. Note that where we use Lemma 8(b)-(d). Assuming s ≤ ≤ 1, the last inequality follows as long as t β = 9 Now we can invoke [51, Lemma 7.13] (or the original derivation [59]) and obtain ∀ t > 0 On the other hand, we can use the first line of (20), then for fixed h, s, (19) becomes Setting (hence |s|), δ, and 1 τ to be sufficiently small, t β = O(1), then the right-hand side of (22) We can assume sup [51], for some C 4 , max k∈[n] α k ≤ C 4 √ d holds with probability at least 1 − n exp(−C 5 d) (One may also see this by a direction application of Theorem 3.1.1, [60]). We proceed on this assumption, and start from We use the first line in (20), it is direct to show R 1 ≤ C 6 (δ h + δ s ). For estimate of R 2 , we let Thus, we can take δ h , δ s = 1 C7d 2 with sufficiently large C 7 so that R 1 + R 2 ≤ 1 8 holds. In this case we can assume . Plug these into (23), when n = Ω(d log d) for sufficiently large hidden constant, with probability at least Recall that the only additional scaling we assume in the proof is t β = O(1), while this can be guaranteed by sufficiently small  [51]). However, this becomes infeasible in quaternion setting: Firstly, the Hessian matrix now contains 16 blocks and can be exhausting in calculations (see [18], Equation (33)]); Perhaps more prominently, the relation between R and h * ∇f (x) h heavily relies on commutativity and hence is likely to fail due to non-commutativity of quaternion. Instead, we calculate (x * α k α * k h) 2 via the map T (·) (17). Having reduced to the real case, we directly work on the desired concentration ingredient in Lemma 9.
We let h := zφ(z) − x, then h ≤ , (h * x) = 0, and z = (h + x)φ(z). Substituting z with h, the right-hand side of (25) becomes 1 τ h 2 + 1 β 1 n n k=1 |α * k h| 4 . We further define w = uφ(z), and plug in ∇f (z), some algebra gives . Thus we can further estimate the left-hand side of (25) Similar to the proof of Lemma 3, we can assume max k∈[n] α k ≤ C 1 √ d with probability at least 1 − n exp(−cd). Since h ≤ ≤ 1, by Cauchy-Schwarz we have where in the last inequality we use A * A ≤ A 2 and a standard estimate for operator norm of (sub-)Gaussian matrix that holds with probability at least 1 − 2 exp(−n) (e.g., Theorem 4.4.5, [60]). We similarly use Cauchy-Schwarz to deal with I 2 , it yields Note that in the last inequality, we can assume x = e 1 by rotational invariance, then write α k = [α ki ] and calculate Note that (α k1 ), P ϑ (α k1 ) is just one entry of T i (α k ), we can invoke Lemma 9 to establish the concentration of each summand in (27) around its mean with δ = 1, while evidently for the mean of each summand is O(1). This leads to 1 n n k=1 |α * k x| 2 α k α * k = O(1) with probability at least 1 − C 2 n −9 − C 3 exp(−Ω(d)). We use this again to deal with I 3 Putting pieces together, we have shown that for all w = 1, h ≤ , | (u * ∇f (z))| 2 ≤ C h 2 + Cd 1 n n k=1 |α * k h| 4 . Thus, LSC(τ, β, ) holds with sufficiently small τ , and β = c1 d with sufficiently small c 1 . The result follows.
Lemma 5: Assume x is the fixed underlying signal. Given δ ∈ (0, 1]. If n ≥ C 0 δ −2 d log n for sufficiently large hidden constant, then with probability at least Proof: The proof can be found in supplementary material. In Lemma 5 we take δ = 1 16 , then under the assumptions of Theorem 2 z 0 ∈ E 1/8 (x) with high probability. Then, applying Lemma 3, 4 shows RSC(c, dc, 1 8 ) (where c is sufficiently large). Thus, if η = O( 1 d ), Lemma 2 delivers the linear convergence claimed in Theorem 2.

IV. PURE QUATERNION WIRTINGER FLOW
Recall that Q p is the set of pure quaternions, and naturally, Q d p represents the space of d-dimensional pure quaternion signals. This section is intended to propose a variant of QWF called pure quaternion Wirtinger flow (PQWF) that can effectively utilize the priori of x ∈ Q d p . This is motivated, for example, by quaternion methods in color image processing where the color channels are encoded in three imaginary components, and hence the desired signal is pure quaternion [9], [23]. While many works choose to remove the real part of the quaternion signal after the reconstruction (e.g., [9], [23], [61]), it is obviously more sensible to incorporate the pure quaternion priori into the recovery procedure and gain some benefits [10], [62].
For a ∈ Q d we define the real counterpart V(a) := [ (a), P i (a), P j (a), P k (a)] ∈ R d×4 . For instance, we The next lemma shows that, if the pure quaternion signal x satisfies rank V(x) = 3, then the trivial ambiguity in phase retrieval reduces to a sign.
Lemma 6: Assume x ∈ Q d p . In the phase-less measurement setting described in Theorem 1, all x satisfying rank V(x) = 3 can be reconstructed from {|α * k x| 2 : k ∈ [n]} up to a sign ±1. Proof: By Theorem 1, one can exactly reconstruct A x := {xq : q ∈ T Q }. Due to the assumption of x ∈ Q d p , we choose x 0 ∈ A x and can further pick It is not hard to verify that rank V(x 0 ) = rank V(x) = 3, for example, one can identify V(x 0 ) with the first row of T (x 0 ) up to permutation and then use T (x 0 q) = T (x 0 )T (q). Thus, [q 0 , q 1 , q 2 , q 3 ] lives in a one-dimensional subspace of R 4 . Combining with q ∈ T Q , there are only two feasible q in the form of {q, −q}, and ±x 0q obviously corresponds to ±x.
Remark 4: We remark that for color images with red, green, blue channels, each pixel has non-negative imaginary parts in its pure quaternion representation. Thus, the ambiguity of the sign (±1) can be further removed, meaning that the image can be exactly reconstructed.
Note that rank V(x) = 3 is often very minor in application, we thus impose this assumption and define to measure the reconstruction error. Note that the convergence guarantee for QWF is under the error metric dist(z t , x) = z t − xφ(z t ) , or equivalently, {z t φ(z t )} linearly converges to x, but the issue is that φ(z t ) = sign(z * t x) can never be determined (as it involves the unknown signal x). Thus, additional efforts are needed to design an algorithm for phase retrieval of x ∈ Q d p with convergence guarantee regarding dist p (z t , x). Our idea here is to estimate it up to a sign based on the pure quaternion priori. More precisely, for some z our strategy is to find a quaternion phase factor q ∈ T Q such that zq is closest to pure quaternion signal, i.e., and then we map z to (zq). Specialized to the current iteration point z t , we find q t as follows and then map z t to (z t q t ). A simple observation is (30) is equal to finding the eigenvector with respect to the smallest eigenvalue of V(z t ) V(z t ) ∈ R 4×4 , which can be implemented efficiently without incurring computational complexity higher than QWF.
The following condition on x assumes a scaling slightly stronger than rank V(x) = 3, i.e., the third singular value of V(x) is bounded away from 0. We restrict our algorithmic analysis to the set of pure quaternion signals with Condition 4 for some absolute constant κ 0 .
Condition 4: The pure quaternion signal x has unit 2 norm, and V(x) has three positive singular values bounded below by (i.e., larger than) some absolute constant κ 0 (κ 0 > 0). The following Lemma shows q t found by (30) can transfer the error metric from dist(z t , x) to dist p (z t q t , x), with the error preserved up to a multiplicative constant only related to κ 0 in Condition 4.
Lemma 7: Under Condition 4 we assume dist(z t , x) ≤ δ for δ ∈ (0, 1 2 ). If the quaternion phase factor q t is found by (30), it Moreover, we have where we use the optimality of w t in the second inequality. On the other hand, by Condition 4 (z t , x). Substitute this into (31) completes the proof. Now we are at a position to propose the PQWF algorithm. The core spirit is to pick some positive integer T p and then invoke the pure quaternion prior every T p QWF iterations.
The next Theorem presents similar linear convergence for PQWF.
Theorem 3: We consider a fixed signal x satisfying Condition 4. Suppose x satisfies Condition 4, and by using A ∼ N n×d Q and y k = |α * k x| 2 we run Algorithm 3 with step size If n ≥ C 0 d log n for some C 0 , T p ≥ −2 log(c2) log(1 − c1/d) , then with high probability as in Theorem 2, for any k ≥ 0 we have

C. The Pure Quaternion Versions
The developed techniques for utilizing a pure quaternion priori can be similarly incorporated into QTWF, QTAF -by mapping z t to (z t q t ) (q t is defined in (30)) every T p iterations. For clarity, we present Pure QTWF (PQTWF) and Pure QTAF (PQTAF) in the following.

VI. EXPERIMENTAL RESULTS
We present experimental results in this section, specifically Sections VI-A, VI-B, and VI-C for synthetic data, and Section VI-D for color images. 5

A. Synthetic Data
In each single trial of QWF, we use Gaussian measurement ensemble A ∼ N n×d Q . The entries of quaternion signal x ∈ Q d (resp. pure quaternion signal x ∈ Q d p ) are i.i.d. copies of N (0, 1) + ϑ= i, j, k N (0, 1)ϑ (resp. ϑ= i, j, k N (0, 1)ϑ), then x will be normalized so that x = 1. We apply 100 power iterations to approximately find the ν in as the leading eigenvector of the Hermitian S in . Here, the power method for quaternion Hermitian matrix is parallel to the complex case [63]. By default, we set η = 0.2n n k=1 y k and run 1500 QWF updates to obtain the reconstructed signalẑ. We first report the success rate of QWF under different sample sizes (Fig. 1a). Specifically, we test d = 100 and the sample sizes n d = 3 : 0.5 : 13. For each n we conduct 100 independent trials, with a trial claimed to be success if dist(ẑ, x) < 10 −5 [51], [53]. One can see that the phase transition starts from n/d = 6.5, and the success rate reaches 1 at n/d = 9. Under n ≥ 9d the success rate remains 1. In Fig. 1(b), we also plot the curve of log ( dist(z t , x)) versus t in a single trial with m/n = 10 and 3500 QWF updates. The curve decreases and shapes like a straight line, and then reaches a plateau, which is consistent with our linear convergence guarantee in Theorem 2.
Then we go into pure quaternion signal reconstruction via PQWF, and we test x ∈ Q 50 p . Recall that the error metric becomes dist p (z, x) = min{ z + x , z − x } since the synthetic signal admits Condition 4. Our main goal is to show the proposed PQWF in Section IV can effectively incorporate the pure quaternion priori and gain notable benefits from it, e.g., earlier phase transition. We also try to reveal the significant role played by the phase factor estimate (30). For this purpose, we invite the following algorithms to compete with PQWF: • Algorithm I (Alg. I): We run QWF and finally turn its output to pure quaternion via phase factor estimate. That is, it first runs Algorithm 2 to get z T , and then takes (z T q T ) as solution, where q T is the solution to (30) with z t = z T . • Algorithm II (Alg. II): This algorithm is a variant of PQWF -it incorporates the pure quaternion prior every T p iterations by directly removing the real part without phase factor estimate. That is, it runs Algorithm 3 but with (33) substituted by z (i+1)Tp = (z (i+1)Tp ). • Algorithm III (Alg. III): We run QWF and finally turn its output to pure quaternion by directly removing the real part without phase factor estimate. That is, it first runs Algorithm 2 to get z T and takes (z T ) as solution. The experimental results are reported in Fig. 2. We provide the success rate of PQWF and Alg. I-III under m n = 3 : 0.5 : 13 in Fig. 2(a) (for T p = 5) and Fig. 2(b) (for T p = 10). Clearly, the proposed PQWF outperforms Alg. I-III. For instance, it embraces a phase transition earlier than Alg I, thus confirming the advantage of PQWF in utilizing the pure quaternion priori to reduce the measurement number. In stark contrast, imposing the pure quaternion constraint by removing the real part, Alg. II and Alg. III have worse performances, which demonstrates the crucial role played by the phase factor estimate (30). Sinceẑ returned by QWF only recovers x up to the unknown right quaternion phase factor, (ẑ) obviously does not approximate x up to a sign. This explains why the success rate of Alg. III remains zero. On the other hand, it would be more interesting to take a closer look at the curve of Alg. II. In particular, Alg. II also enjoys an phase transition earlier than Alg. I (6 ≤ n d ≤ 7). However, under relatively sufficient measurements (e.g., 8.5 ≤ n d ≤ 10, Fig. 2b), it can be even worse than Alg. I. For illustration, we comment that the QWF update is based on the data, while removing the real part is based on the priori, so their effects on the iteration are likely to somehow neutralize, which possibly explains the unsatisfactory performance of Alg. II. Therefore, removing the real part is not a sensible strategy for utilizing the pure quaternion priori. Beyond that, we also track log( dist p (z 5k , x)) in a single trial of PQWF (with T p = 5, m/n = 8) and plot the error decreasing curve in Fig. 2(c). This corroborates our linear convergence guarantee.

C. Reduced Measurement Number in Pure Quaternion Signal Recovery
In this part, we compare QPR with the real-valued methods of monochromatic model and concatenation model in the regime of pure quaternion signal recovery. It will be shown that our quaternion model can succeed using notably fewer phaseless measurements.
• The monochromatic model conducts phase retrieval for P i (x), P j (x), P k (x) ∈ R 50 separately. Letx i ,x j ,x k be the corresponding reconstructed signals, the error metric is The concatenation model conducts phase retrieval for x con := [(P i (x)) , (P j (x)) , (P k (x)) ] ∈ R 150 . Let x be the reconstructed signal, the error metric is dist p (x con ,x). For the above two real-valued models, we use measurement matrix with i.i.d. standard Gaussian entries and test WF, TWF and TAF using parameters recommended by the original papers [51], [53], [54]. Accordingly, we test QPR under PQWF (step size η = 0.15, other parameters remain the same), PQTWF (same parameters as QTWF in Section VI-B, T p = 5), PQTAF (same parameters as QTAF in Section VI-B, T p = 5). The pure quaternion signal is drawn as before.
We test the success rate of QPR under our PQWF, PQTWF, PQTAF (Algorithms 3, 6, 7), and then compare with the above two real-valued models under WF [51], TWF [53], TAF [54]. The results are displayed in Fig. 4(a)-(c) (to be fair, each subfigure compares the considered models under comparable algorithms, see the caption of Fig. 4 for details). Observe that QPR enjoys a full success rate when m ≥ 8d under PQWF, when m ≥ 7.5d under PQTWF, or when m ≥ 6d under PQTAF. By contrast, the real-valued models require at least m = 9d to achieve full success rate, i.e., the concatenation model under TAF, whereas all other cases require much larger sample size. These results unveil an advantage of our QPR model over the two real-valued models -while each measurement (in the three models) is commonly a positive scalar, when comparable algorithms are used, QPR can succeed with notably fewer measurements.

D. Color Image
We test our algorithms in the real data of color image. The tested 256 × 256 images "Female" and "Mandrill" are available online. 6 To implement phase retrieval with Gaussian measurement matrix, the image is divided into 256 blocks of size 6 https://sipi.usc.edu/database/database.php?volume=misc#top 16 × 16. 7 In our QPR model, each block is modeled as pure quaternion signal in Q 256 p , and we will test algorithms PQWF, PQTWF and PQTAF with parameters as before. For comparison, we also test the real-valued methods of monochromatic model and concatenation model using TAF (from Fig. 4, TAF performs better than TWF and WF for the real-valued models). Recall that a color image can be exactly reconstructed without any ambiguity (Remark 4). Using the same measurement number of m = 7.5 × 256 to deal with each block separately, we show the original image and reconstructed images (with PSNR) using different models/algorithms in Figs. 5 and 6. Clearly, using our quaternion model and an oversampling rate of 7.5, PQWF only fails in one block, while the more refined algorithms PQTWF and PQTAF exactly recover the whole image. By contrast, the two real-valued based methods fail in much more blocks and deliver much lower PSNR. These results agree with the conclusion of Section VI-C.
Additionally, we further test QPR (using PQTAF) and the real-valued methods of monochromatic model and concatenation model (both using TAF) over 24 color images from the Ko-dak24 image dataset. 8 Specifically, we downsample the images to "192 × 128"/"128 × 192" and divide them into 96 blocks of size 16 × 16, then we perform (quaternion) phase retrieval in each block using m = 7.5 × 256 measurements independently. The results are reported as follows: (1) The monochromatic method does not exactly recover any image, with the mean and standard deviation of the 24 PSNR values being 23.54 and 2.10, respectively; (2) The concatenation model achieves exact reconstruction over 12 images, with the mean and standard deviation of the remaining 12 PSNR values being 27.20 and 3.45, respectively; (3) Our quaternion method delivers exact recovery over 22 images, while the PSNR values of the remaining two images are 26.32 and 13.34. Note that QPR exactly recovers most images, which again validates our advantage that QPR succeeds with fewer measurements. However, QPR does not perform well in two images. Taking a closer look at these two images, we find that the blocks where QPR fails, do not satisfy 7 To handle the whole image without dividing into blocks (as in [51], [53], [54]), we need to develop more practical measurement matrix for QPR, e.g., coded diffraction pattern. See more discussions in Section VII. 8 https://www.kaggle.com/datasets/sherylmehta/kodak-dataset Original image "Female", reconstructed images of QPR using PQWF, PQTWF, PQTAF, and the monochromatic model, concatenation model using TAF. Note that PSNR = ∞ appears because the signals returned by the algorithms have relative error (i.e., Î − I F / I F withÎ, I being the reconstructed image, original image (resp.)) less than 10 −9 , then after changing to the "uint8" format the reconstructed image exactly equals to the original image. Fig. 6. This is an experiment parallel to Fig. 5. All simulations use 1920 phaseless measurements for each block with size 16 × 16. QPR using PQTWF or PQTAF achieves exact reconstruction, QPR using PQWF fails in two blocks, the two real-valued based models using TAF fail in more blocks. Condition 4,9 in the sense that their three channels are (nearly or exactly) real linearly dependent. Note that Lemma 6 no longer stands for pure quaternion signal whose three imaginary parts are (real) linearly dependent, and in general we cannot identify such signal up to a sign, so QPR unavoidably fails in these blocks due to identifiability issue. Thus, extra cautiousness is needed to ensure Condition 4 when one applies QPR to pure quaternion signal recovery. Furthermore, we conduct the experiment again with the measurement number for each block reduced to m = 6.5 × 256 (i.e., under the oversampling rate 6.5), then the performances of the two real-valued based methods deteriorate significantly, while QPR using PQTAF still exactly recovers 22 color images and fails in the remaining two due to the signal nature. Detailed results are reported in the supplementary material.

VII. CONCLUDING REMARKS
In this paper, we initiate the study of quaternion phase retrieval (QPR) problem, which is formulated as the 9 See supplementary material for the two images in which QPR fails.
reconstruction of x ∈ Q d from |Ax| 2 with known A ∈ Q n×d . As the theoretical foundation, we first confirm that the global right quaternion phase factor is the only trivial ambiguity. Then, we propose quaternion Wirtinger flow (QWF) as a scalable and practical algorithm for solving QPR. The linear convergence of QWF has been proved and presented as a major theoretical result. The technical ingredients involve the HR calculus and some techniques for handling quaternion matrices. While our proof strategy is adjusted from the complex Wirtinger flow [51], a series of different treatments are employed with some new machinery (Remark 3 and other remarks in supplementary material).
Special attention is paid to QPR of pure quaternion signal. With an additional minor assumption, one can reconstruct the signal up to the trivial ambiguity of a sign (Lemma 6), which can be further removed in color image recovery (Remark 4). By using a crucial phase factor estimate, we develop the PQWF algorithm that can effectively utilize the pure quaternion priori. Note that PQWF enjoys guaranteed linear convergence. Motivated by existing refinements of WF, specifically TWF and TAF, we further propose QTWF, QTAF and their pure quaternion versions PQTWF, PQTAF. Their advantages are numerically demonstrated. We provide experimental results on synthetic data and color images that corroborate our theories. A surprising finding is that, for pure quaternion signal recovery, our quaternion method often succeeds with measurements notably fewer than real-valued methods based on monochromatic model or concatenation model. This is claimed as the advantage of the QPR model and makes our quaternion method preferable in a situation where one can only obtain a very limited number of phaseless measurements.
We note that this work only provides a starting point for the research of QPR, and there are undoubtedly many questions worth further exploration. We point out several directions to close this paper. Theoretically, although O(d) measurements have been shown to be sufficient for QPR, it would be of theoretical interest to investigate the precise measurement number needed for recovery of all signals in Q d . This is known as the minimal measurement number in phase retrieval (e.g., [43], [44], [45], [46]). Besides, compared to A ∼ N n×d Q , coded diffraction pattern (CDP) that applies Fourier transform to the masked signal would be more practical for some applications [64]. It is appealing to develop a similar measurement scheme for quaternion signal, ideally accompanied by a guaranteed algorithm. Moreover, while our simulations unveil an advantage of QPR (i.e., success using fewer measurements), it is interesting to explore more privileges so that one knows in what regime the quaternion model is preferable. By making use of the (approximately) low-rank structure of color images, the advantages of using quaternion-based method were demonstrated in image restoration or inpainting [9], [10], [23]. On the other hand, recent works have explored how to incorporate the low-rank priori into phase retrieval (referred to as low-rank phase retrieval, LRPR) [65], [66], [67]. Taken collectively, we conjecture that it may be fruitful to explore the benefit of using quaternion method in LRPR.

APPENDIX A AUXILIARY RESULTS
We provide some expectation results to support our analysis. Their proofs and some technical remarks are provided in the supplementary material.
Lemma 8: Assume entries of α ∈ Q d are i.i.d. drawn from N Q , and u, v ∈ Q d satisfy u = v = 1. Then we have the following: (a) (rotational invariance) For any unitary matrix P ∈ Q d×d , P α and α have the same distribution.