On the Convergence of Orthogonal/Vector AMP: Long-Memory Message-Passing Strategy

Orthogonal/vector approximate message-passing (AMP) is a powerful message-passing (MP) algorithm for signal reconstruction in compressed sensing. This paper proves the convergence of Bayes-optimal orthogonal/vector AMP in the large system limit. The proof strategy is based on a novel long-memory (LM) MP approach: A first step is a construction of LM-MP that is guaranteed to converge systematically. A second step is a large-system analysis of LM-MP via an existing framework of state evolution. A third step is to prove the convergence of state evolution recursions for Bayes-optimal LM-MP via a new statistical interpretation of existing LM damping. The last step is an exact reduction of the state evolution recursions for Bayes-optimal LM-MP to those for Bayes-optimal orthogonal/vector AMP. The convergence of the state evolution recursions for Bayes-optimal LM-MP implies that of those for Bayes-optimal orthogonal/vector AMP. Numerical simulations are presented to verify the state evolution results for damped orthogonal/vector AMP and to show a negative aspect of LM-MP in finite-sized systems.


I. INTRODUCTION
A. Motivation

Consider the reconstruction of an N-dimensional sparse signal vector x ∈ R^N with N^{-1}E[∥x∥²] = 1 from compressed, noisy, and linear measurements y ∈ R^M [1], [2] with M ≤ N, given by

y = Ax + w. (1)

In (1), A ∈ R^{M×N} is a known sensing matrix. The vector w ∈ R^M denotes an additive noise vector with σ² = M^{-1}E[∥w∥²]. The goal of compressed sensing is to reconstruct the signal vector x from knowledge of the sensing matrix A and the measurement vector y. Compressed sensing has been applied to several practical problems, such as magnetic resonance imaging [3], radar [4], image super-resolution [5], channel estimation [6], spectrum sensing [7], error-correcting codes [8], and multiuser detection [9]. The prior distribution of the sparse signal vector is known in specific instances, such as error-correcting codes or multiuser detection, while it is unknown in many signal processing instances. This paper is more relevant to such specific instances, in which the prior distribution is known, since knowledge of the prior distribution is utilized in the theoretical analysis of iterative reconstruction algorithms.

Approximate message-passing (AMP) [10], [11] is a powerful message-passing (MP) algorithm for signal reconstruction. AMP with a Bayes-optimal denoiser, called Bayes-optimal AMP, can be regarded as an exact approximation of loopy belief propagation in the large system limit [12], where both M and N tend to infinity while the compression ratio δ = M/N ∈ (0, 1] is kept constant. Furthermore, Bayes-optimal AMP was proved to achieve the Bayes-optimal performance [13]–[16] for zero-mean, independent and identically distributed (i.i.d.), and sub-Gaussian sensing matrices [17], [18] if the fixed-point (FP) equations describing the Bayes-optimal performance have a unique solution. Spatial coupling [19] is required when the FP equations have multiple solutions. See [20]–[23] for the details.
The main weakness of AMP is that it does not converge for sensing matrices beyond zero-mean i.i.d. matrices, such as non-zero-mean matrices [24] or ill-conditioned matrices [25]. To tackle this convergence issue, several modified MP algorithms [26]–[31] were proposed. In particular, orthogonal AMP (OAMP) [30], or equivalently vector AMP (VAMP) [31], is a promising algorithm to overcome this weakness of AMP. Since the two algorithms are equivalent to each other for the linear measurement system (1), they are both referred to as OAMP in this paper.
The main feature of OAMP is the use of extrinsic messages to eliminate intractable dependencies between the current message and all preceding messages. Extrinsic messages were originally used in a single-loop algorithm to find a FP of the expectation consistency (EC) free energy [32], inspired by expectation propagation (EP) [33]. In fact, Bayes-optimal OAMP can be regarded as an exact approximation of EP in the large system limit [34], [35].
The asymptotic performance of OAMP can be characterized via rigorous state evolution [31], [35]. Bayes-optimal OAMP was proved to achieve the Bayes-optimal performance [36]- [38] for all orthogonally invariant sensing matrices if the FP equations describing the Bayes-optimal performance have a unique solution. The class of orthogonally invariant matrices is a general class beyond zero-mean i.i.d. sensing matrices.
Strictly speaking, state evolution [17], [18], [31], [35] does not guarantee the convergence of MP algorithms such as AMP or OAMP while it describes the asymptotic dynamics of these MP algorithms rigorously. As a result, we need to investigate the convergence of state evolution recursions to a FP for individual problems, e.g. [39]–[41].
For finite-sized systems, damping [25] is used to improve the convergence properties of MP algorithms. Conventional state evolution cannot describe the dynamics of damped AMP or OAMP anymore. To the best of the author's knowledge, theoretical analyses of damped MP algorithms are very limited [25], [42].
This paper addresses these two issues by proposing a novel damped MP algorithm that realizes convergence in principle and admits an exact description of its dynamics. The damping issue has already been solved in [43] via the framework of long-memory (LM) MP [44]. LM-MP utilizes all preceding messages in computing the current message, while conventional MP uses only messages in the latest iteration to update the current message. LM-MP was originally used to solve the Thouless-Anderson-Palmer (TAP) equation for Ising models with orthogonally invariant spin interaction via non-rigorous dynamical functional theory [45]. In this direction, Fan [46] proposed an LM-MP algorithm and analyzed the dynamics of the LM-MP algorithm rigorously. See [47] for a generalization of [46].
The state evolution framework for LM-MP in [44] is fully general, so that rigorous state evolution results for LM-MP can be obtained simply by confirming that the model of estimation errors for the LM-MP is included in a general error model proposed in [44]. In this direction, convolutional AMP (CAMP) [44], [48] was proposed as an LM generalization of AMP. The framework was also used to construct VAMP with warm-started conjugate gradient (WS-CG) in [49]. To solve a convergence issue in CAMP for sensing matrices with high condition numbers [50], memory AMP (MAMP) [43] was proposed. The state evolution results in [43] can describe the exact dynamics of MAMP even if damping is used. Since the main purpose of these LM-MP algorithms is to reduce the complexity of OAMP, such LM-MP algorithms are out of the scope of this paper.
The "convergence in principle" is the main idea to prove the convergence of OAMP. It means that the convergence of an iterative algorithm is intuitively trivial while the convergence of conventional MP algorithms is non-trivial and therefore needs to be proved for individual problems. The current message in LM-MP can be regarded as an additional measurement for the signal vector that is correlated with all preceding messages. When all measurements are utilized in a Bayes-optimal manner to estimate the signal vector, using the additional measurements never degrades the estimation performance. As a result, the performance of the obtained Bayes-optimal LM-MP is monotonically convergent in principle as the iteration proceeds.
To design the Bayes-optimal estimation of the signal vector in each iteration, we start with an LM generalization of OAMP-called LM-OAMP. The estimation errors in LM-OAMP are proved to be jointly Gaussian-distributed for all iterations in the large system limit via the unified framework of state evolution [44]. Thus, the design issue of Bayes-optimal LM-OAMP reduces to a solvable issue, i.e. the Bayes-optimal estimation of the signal vector from correlated additive white Gaussian noise (AWGN) measurements. The convergence of Bayes-optimal OAMP is guaranteed by proving the equivalence between Bayes-optimal LM-OAMP and OAMP.

B. Contributions
The main contributions of this paper are threefold. In particular, the first two contributions were presented in [51].
A first contribution is a statistical interpretation for the optimization of damping in [43]. This paper derives the optimized damping in terms of a sufficient statistic, without using state evolution recursions, while it was obtained via an optimization problem for state evolution recursions [43]. This statistical interpretation is a key technical tool to prove the convergence of Bayes-optimal LM-OAMP rigorously.
A second contribution is a derivation of state evolution recursions for LM-OAMP (Theorems 1 and 2) and a rigorous convergence analysis of the state evolution recursions for Bayes-optimal LM-OAMP (Theorem 3), as well as an exact reduction of Bayes-optimal LM-OAMP to conventional Bayes-optimal OAMP [30]. These results provide a rigorous proof for the convergence of conventional Bayes-optimal OAMP in the large system limit.
The last contribution is a negative aspect of LM-OAMP. Numerical simulations show that LM-OAMP requires larger systems than conventional OAMP for state evolution to provide a good approximation for finite-sized systems. As a result, damped LM-OAMP is slightly inferior to OAMP with heuristic damping for small-to-moderate system sizes while it is consistent with state evolution for large systems. This observation explains why MAMP with the optimized LM damping [43] needs large systems to realize good convergence properties.

C. Organization
After summarizing the notation used in this paper, Section II presents the statistical interpretation and technical results in the first contribution as preliminaries to propose LM-OAMP in Section III. Section IV presents state evolution results for LM-OAMP. The convergence of state evolution recursions for Bayes-optimal LM-OAMP is analyzed in Section V. Numerical simulations for damped LM-OAMP are presented in Section VI. Section VII concludes this paper. Several theorems and lemmas are proved in the appendices.

D. Notation
Throughout this paper, M^T, Tr(M), and det M denote the transpose, trace, and determinant of a matrix M, respectively.
For a vector v_t with a subscript t, the nth element of v_t is written as v_{n,t}. The notation ∥·∥ denotes the Euclidean norm. The vector 1 represents a vector whose elements are all one, while e_n is the nth column of the identity matrix I. The notations ⊗ and δ_{i,j} denote the Kronecker product and the Kronecker delta, respectively.
The Gaussian distribution with mean µ and covariance Σ is written as N(µ, Σ). The notations a.s.→ and a.s.= represent almost sure convergence and almost sure equivalence, respectively.
For a function f : R^t → R of t variables, the notation f(x_1, . . . , x_t) means the element-wise application of f, i.e. the nth element of f(x_1, . . . , x_t) is f(x_{n,1}, . . . , x_{n,t}). For a subset T ⊂ {0, . . . , t}, the notation I_T(τ) denotes the position of an index τ ∈ T within the ordered subset T.

II. PRELIMINARIES

A. Measurement Model
The purpose of Section II is to present the technical background of LM-OAMP proposed in Section III. This section is separated from the other sections in terms of the notation. Consider the estimation of a scalar signal X from noisy measurements {Y_t = X + W_t}. For a subset T_τ ⊂ {0, . . . , τ}, we reconstruct the signal X from partial measurements Y_τ ∈ R^{1×|T_τ|} that consist of {Y_t : t ∈ T_τ}, given by

Y_τ = X 1^T + W_τ, (2)

where W_τ = {W_t : t ∈ T_τ} ∼ N(0, Σ_τ) denotes a zero-mean Gaussian row vector with covariance E[W_τ^T W_τ] = Σ_τ ∈ R^{|T_τ|×|T_τ|}. The covariance matrix Σ_τ is assumed to be positive definite. When all measurements are used, i.e. T_τ = {0, . . . , τ}, Σ_τ is the (τ + 1) × (τ + 1) upper-left submatrix of Σ_t for all τ < t. The goal is to estimate the signal X from knowledge of the measurement vector Y_τ.
We know that the posterior mean estimator E[X|Y_τ] is the best among all possible estimators f(Y_τ) in terms of the mean-square error (MSE) E[{f(Y_τ) − X}²]. In computing the posterior mean estimator, we use a two-step approach: The first step is the derivation of a sufficient statistic S_τ for estimation of X based on the measurements Y_τ. The second step is to compute the conditional mean of X given the sufficient statistic S_τ, instead of the original measurements Y_τ. This two-step approach is useful in proving technical lemmas to establish the main results, while it is equivalent to direct computation of the posterior mean estimator, i.e. E[X|S_τ] = E[X|Y_τ].

B. Sufficient Statistic
We evaluate a sufficient statistic for estimation of X based on the measurements Y_τ. By definition, the probability density function (pdf) of Y_τ given X is

p(Y_τ|X) = exp{−(Y_τ − X1^T)Σ_τ^{−1}(Y_τ − X1^T)^T/2} / {(2π)^{|T_τ|/2}(det Σ_τ)^{1/2}}. (3)

Evaluating the exponent in the pdf yields

−(Y_τ − X1^T)Σ_τ^{−1}(Y_τ − X1^T)^T/2 = −Y_τΣ_τ^{−1}Y_τ^T/2 + X Y_τΣ_τ^{−1}1 − X² 1^TΣ_τ^{−1}1/2, (4)

which implies that Y_τ Σ_τ^{−1} 1 is a sufficient statistic for estimation of X given Y_τ. Normalizing this sufficient statistic, we arrive at

S_τ = Y_τ Σ_τ^{−1} 1 / (1^T Σ_τ^{−1} 1) = X + W̄_τ, (5)

W̄_τ = W_τ Σ_τ^{−1} 1 / (1^T Σ_τ^{−1} 1). (6)

It is straightforward to confirm that W̄_τ is a zero-mean Gaussian random variable with variance E[W̄_τ²], given by

E[W̄_τ²] = (1^T Σ_τ^{−1} 1)^{−1}. (7)

Furthermore, the correlation E[W̄_{τ'} W̄_τ] is given by

E[W̄_{τ'} W̄_τ] = 1^T Σ_{τ'}^{−1} E[W_{τ'}^T W_τ] Σ_τ^{−1} 1 / {(1^T Σ_{τ'}^{−1} 1)(1^T Σ_τ^{−1} 1)}. (8)

When all measurements are used, i.e. T_τ = {0, . . . , τ}, the correlation (8) reduces to

E[W̄_{τ'} W̄_τ] = E[W̄_τ²] (9)

for all τ' ≤ τ, since E[W_{τ'}^T W_τ] = (I_{τ'+1}, O)Σ_τ holds for W_τ = (W_0, . . . , W_τ). This property is used to prove the convergence of the error covariance in Bayes-optimal LM-OAMP via that of its diagonal elements, which are monotonically decreasing MSEs when we use the posterior mean estimator of the signal vector given all preceding messages.
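A minimal numerical check of the sufficient statistic (5)-(7) is sketched below, assuming the scalar model Y_τ = X1^T + W_τ with W_τ ∼ N(0, Σ_τ); the helper names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 4                                    # number of measurements |T_tau|
B = rng.standard_normal((T, T))
Sigma = B @ B.T + 0.1 * np.eye(T)        # an arbitrary positive definite covariance

def suff_stat(Y, Sigma):
    """S = Y Sigma^{-1} 1 / (1^T Sigma^{-1} 1), as in (5)."""
    w = np.linalg.solve(Sigma, np.ones(Sigma.shape[0]))  # Sigma^{-1} 1
    return Y @ w / np.sum(w)

n = 200_000
X = rng.standard_normal(n)               # signal samples with unit variance
L = np.linalg.cholesky(Sigma)
W = rng.standard_normal((n, T)) @ L.T    # rows distributed as N(0, Sigma)
Y = X[:, None] + W                       # measurements Y = X 1^T + W

S = suff_stat(Y, Sigma)
print(np.var(S - X))                                           # empirical Var[W_bar]
print(1.0 / (np.ones(T) @ np.linalg.solve(Sigma, np.ones(T)))) # (1^T Sigma^{-1} 1)^{-1}, (7)
```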
Finally, we investigate the impact of small eigenvalues of Σ_τ on the variance E[W̄_τ²].

Proposition 1: Suppose that the variance (7) is bounded from below as E[W̄_τ²] ≥ σ_0² for some σ_0² > 0. Then, any eigenvalue λ_τ > 0 of Σ_τ and the associated unit eigenvector u_τ satisfy (1^T u_τ)² ≤ λ_τ/σ_0².

Proof: Since Σ_τ^{−1} − λ_τ^{−1}u_τu_τ^T is positive semidefinite, we have 1/σ_0² ≥ 1^TΣ_τ^{−1}1 ≥ (1^Tu_τ)²/λ_τ.
The quantity σ_0² in Proposition 1 may be obtained via the Bayes-optimal performance for the original compressed sensing problem (1). Such a σ_0² > 0 exists as long as noisy measurements are considered.
Proposition 1 implies that the inner product 1^T u_τ must tend to zero as some eigenvalue λ_τ > 0 goes to zero. In other words, all eigenvectors associated with small eigenvalues tend to be orthogonal to the vector 1. Unless this asymptotic orthogonality held, error-free estimation of X would be possible via the matched filter (MF) Y_τ u_τ/(1^T u_τ) → X in probability as λ_τ ↓ 0.

C. Extrinsic Denoiser
The Bayes-optimal estimator is defined as the posterior mean estimator f_opt(S_t) = E[X|S_t]. To realize asymptotic Gaussianity in LM-OAMP, we define the extrinsic denoiser f_ext as

f_ext(Y_t) = {f_opt(S_t) − Σ_{τ∈T_t} ξ_{τ,t} Y_τ} / (1 − Σ_{τ∈T_t} ξ_{τ,t}), (12)

where ξ_{τ,t} for τ ∈ T_t denotes the averaged partial derivative of f_opt(S_t) with respect to Y_τ, given by

ξ_{τ,t} = E[∂f_opt(S_t)/∂Y_τ] = E[f_opt'(S_t)] [Σ_t^{−1}1]_{I_{T_t}(τ)} / (1^T Σ_t^{−1} 1). (13)

See Section I-D for the definition of the mapping I_{T_t}(·).
The numerator in the extrinsic denoiser (12) is the Onsager correction of the Bayes-optimal estimator f_opt(S_t), which originates from a general error model in [44]. The significance of the denominator is presented in the following proposition:

Proposition 2: Among all possible Lipschitz-continuous extrinsic denoisers ψ_t satisfying E[∂ψ_t/∂Y_τ] = 0 for all τ ∈ T_t, the extrinsic denoiser (12) minimizes the MSE E[{ψ_t(Y_t) − X}²].

Proof: See Appendix A-A.

Proposition 2 implies that the denominator in (12) is the best option in terms of the MSE, as long as the Bayes-optimal denoiser is used.

D. Estimator for Error Covariance
For denoisers f_{t'} and f_t, LM-OAMP needs a consistent estimator for the error covariance E[{f_{t'}(S_{t'}) − X}{f_t(S_t) − X}]. When the Bayes-optimal estimators f_{t'} = f_opt(S_{t'}) and f_t = f_opt(S_t) are used, we can use the posterior covariance C(S_{t'}, S_t) in (14) as a consistent estimator. In particular, C(S_t, S_t) reduces to the posterior variance. However, it is not straightforward to construct a consistent estimator for general denoisers.
We derive a consistent estimator of the error covariance for general f_{t'} and f_t. For any t' and t, let {S_{t',n}, S_{t,n}}_{n=1}^N denote independent samples that follow the same joint distribution as the two sufficient statistics (S_{t'}, S_t), i.e. S_{t',n} = X_n + W̄_{t',n} and S_{t,n} = X_n + W̄_{t,n}, where {X_n} are independent random variables satisfying X_n ∼ X, while {W̄_{t',n}, W̄_{t,n}}_{n=1}^N are independent zero-mean Gaussian variables distributed as (W̄_{t'}, W̄_t).

Lemma 1: Suppose that f_{t'} and f_t are Lipschitz-continuous and define Ĉ(S_{t'}, S_t) as in (16). Then, the sample average of {Ĉ(S_{t',n}, S_{t,n})}_{n=1}^N is a consistent estimator of the error covariance E[{f_{t'}(S_{t'}) − X}{f_t(S_t) − X}].

Proof: See Appendix A-B.

Corollary 1: Suppose that f_t is a Lipschitz-continuous denoiser and define Ĉ(S_t) as in (17). Then, the sample average of {Ĉ(S_{t,n})}_{n=1}^N is a consistent estimator.

Proof: Use Lemma 1 for S_{t'} = S_t and f_{t'} = 0.

E. Properties of Bayes-Optimal Estimator
We investigate properties of the Bayes-optimal estimator f_opt(S_t) = E[X|S_t]. We have the following technical tools to analyze Bayes-optimal LM-OAMP:

Lemma 2: Consider T_τ = {0, . . . , τ} for all τ and suppose that the covariance Σ_t for the noise vector W_t in (2) satisfies (9). Then, the MSE E[{f_opt(S_t) − X}²] is monotonically non-increasing in t. Furthermore, the posterior covariance satisfies C(S_{t'}, S_t) = C(S_t, S_t) for all t' < t.

Proof: See Appendix A-C.

Lemma 3: Suppose that [Σ_t]_{τ',τ} = [Σ_t]_{τ,τ} holds for all τ' < τ and that {Σ_τ}_{τ=1}^t are positive definite. Then, the following properties hold: ΔΣ_{τ,t} = [Σ_t]_{τ,τ} − [Σ_t]_{τ+1,τ+1} > 0 for all τ < t; the variance (7) reduces to E[W̄_t²] = [Σ_t]_{t,t}; and S_t = Y_t holds with ξ_{τ,t} = 0 in (13) for all τ < t.

Proof: See Appendix A-D.

The former property in Lemma 2 is used to prove the convergence of state evolution recursions for Bayes-optimal LM-OAMP. The latter property and Lemma 3 are useful in proving an exact reduction of Bayes-optimal LM-OAMP to conventional Bayes-optimal OAMP.

III. LM-OAMP

A. Outline
LM-OAMP is a generalization of conventional OAMP [30], [31]. OAMP consists of two modules, called modules A and B in this paper. Module A uses a linear filter to compute a posterior message of the signal vector x, while module B utilizes a separable denoiser to refine the posterior message for each signal element. A crucial step in each module is the so-called Onsager correction to obtain extrinsic messages of x. This step realizes asymptotic Gaussianity for the estimation errors in the two modules [31], [35]. The obtained messages may be damped in a heuristic manner to improve the convergence property of OAMP for finite-sized systems.
LM-OAMP consists of the two modules shown in Fig. 1, each of which has four steps. In the first two steps, each module utilizes all preceding messages to compute a posterior message of x, while OAMP only uses those in the latest iteration. The first step in computing the posterior message is the computation of a sufficient statistic for estimation of x given all preceding messages, as presented in Section II. The second step computes a posterior message of x based on the sufficient statistic. This step corresponds to the computation of the posterior messages in conventional OAMP. While the first two steps are equivalent to direct computation of the posterior message given all preceding messages, the two-step approach simplifies the formulation of the Onsager correction in the third step.
To realize asymptotic Gaussianity in LM-OAMP, each module computes the Onsager correction of each posterior message. This third step is heavily based on a general framework of state evolution [44], which can evaluate asymptotic correlations between the current message and all preceding messages rigorously.
The last step is optional: LM damping [43] is employed to improve the convergence property of LM-OAMP for finite-sized systems. As long as the Onsager-corrected messages in the third step are damped, LM damping does not break asymptotic Gaussianity for the estimation errors while heuristic damping of posterior messages breaks asymptotic Gaussianity [44].

B. Notation
The notations used in LM-OAMP are summarized. Let x_{A→B,t} ∈ R^N and {v_{A→B,t',t} ∈ R}_{t'=0}^t denote mean and covariance messages for the signal vector x, respectively, passed from module A to module B in iteration t. The covariance v_{A→B,τ',τ} corresponds to an estimate of the true error covariance N^{−1}E[(x_{A→B,τ'} − x)^T(x_{A→B,τ} − x)]. The messages passed in the opposite direction are written as x_{B→A,t} and v_{B→A,t',t}. The two modules utilize part of the preceding messages in each iteration t. Let T_{A,t} ⊂ {0, . . . , t} and T_{B,t} ⊂ {0, . . . , t} denote subsets of indices that represent the preceding messages used in iteration t by modules A and B, respectively. We assume that both subsets T_{A,t} and T_{B,t} contain the current index t.
We write the message matrix obtained by arranging the preceding messages {x_{A→B,t'} : t' ∈ T_{B,t}} as columns as X_{A→B,t}, and define the covariance matrix V_{A→B,t',t} ∈ R^{|T_{B,t'}|×|T_{B,t}|} via the covariance messages accordingly. Similarly, we define the message matrices X_{B→A,t} and V_{B→A,t',t} ∈ R^{|T_{A,t'}|×|T_{A,t}|} passed in the opposite direction via the preceding messages {x_{B→A,t'} : t' ∈ T_{A,t}} and v_{B→A,t',t}. In particular, LM-OAMP reduces to conventional OAMP [30] when the subsets T_{A,t} = T_{B,t} = {t} are used, while using all preceding messages T_{A,t} = T_{B,t} = {0, . . . , t} is intuitively the best in terms of the estimation performance. Interestingly, this paper proves that Bayes-optimal LM-OAMP reduces to Bayes-optimal OAMP for T_{A,t} = T_{B,t} = {0, . . . , t}.
Update rules in LM-OAMP are defined so as to realize asymptotic Gaussianity for {x_{A→B,t}}. While a mathematically rigorous definition of asymptotic Gaussianity is presented shortly, a rough intuition is to regard the estimation errors {x_{A→B,t'} − x} and {x_{B→A,t'} − x} as zero-mean Gaussian random vectors with covariances v_{A→B,τ',τ}I_N and v_{B→A,τ',τ}I_N, respectively. These properties help us understand LM-OAMP intuitively, while they are too strong to justify. Asymptotic Gaussianity proved in Theorem 1 is strong enough to prove the Bayes-optimality of LM-OAMP while it is weaker than the properties mentioned above.

C. Module A (Linear Estimation)
Module A consists of four steps in each iteration t. In the first step, module A uses the preceding message matrix X_{B→A,t} and the covariance messages {v_{B→A,τ',τ} : τ', τ ∈ {0, . . . , t}} to compute a sufficient statistic x^suf_{B→A,t} ∈ R^N for estimation of x and the corresponding covariance {v^suf_{B→A,t',t}}_{t'=0}^t, given by

x^suf_{B→A,t} = X_{B→A,t} V^{−1}_{B→A,t,t} 1 / (1^T V^{−1}_{B→A,t,t} 1), (18)

v^suf_{B→A,t',t} = 1^T V^{−1}_{B→A,t',t'} V_{B→A,t',t} V^{−1}_{B→A,t,t} 1 / {(1^T V^{−1}_{B→A,t',t'} 1)(1^T V^{−1}_{B→A,t,t} 1)}. (19)

For the initial iteration t = 0, we use x_{B→A,0} = 0 and v_{B→A,0,0} = E[∥x∥²]/N. These update rules are obtained by regarding the estimation errors {x_{B→A,t'} − x} as zero-mean Gaussian vectors; see Section II for the derivation. The covariance matrix V_{B→A,t,t} is guaranteed to be asymptotically positive definite via state evolution as long as t is finite. However, V_{B→A,t,t} has small positive eigenvalues that converge to zero as t → ∞. Nonetheless, such small eigenvalues cause no numerical issues in computing V^{−1}_{B→A,t,t} 1, since the eigenvectors associated with such small eigenvalues should be orthogonal to the vector 1, as proved in Proposition 1.
Remark 1: The sufficient statistic (18) is a normalized linear combination of {x B→A,t } t t =0 while it might not be a convex combination. Thus, (18) can be regarded as a damped estimate in module B. In fact, (18) was originally obtained via the optimization of damping [43]. Nonetheless, the interpretation in terms of a sufficient statistic is important in analyzing the convergence of LM-OAMP as a technical tool.
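A minimal sketch of this first step is given below, assuming the preceding messages are stacked as the columns of an N × (t+1) matrix and the covariance estimate is positive definite; the helper name sufficient_statistic is illustrative, not from the paper.

```python
import numpy as np

def sufficient_statistic(X_prev, V):
    """Compute (18) and the diagonal part of (19):
    x_suf = X_prev V^{-1} 1 / (1^T V^{-1} 1) and v_suf = (1^T V^{-1} 1)^{-1}."""
    ones = np.ones(V.shape[0])
    w = np.linalg.solve(V, ones)     # V^{-1} 1, avoiding an explicit matrix inverse
    norm = ones @ w                  # 1^T V^{-1} 1
    return X_prev @ (w / norm), 1.0 / norm
```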
The second step is computation of the posterior mean x^post_{A,t} ∈ R^N and covariance {v^post_{A,t',t} ∈ R}_{t'=0}^t. A linear filter W_t ∈ R^{M×N} is applied to the residual after interference subtraction based on the sufficient statistic (18):

x^post_{A,t} = x^suf_{B→A,t} + W_t^T(y − A x^suf_{B→A,t}), (20)

with the posterior covariance v^post_{A,t',t} given in (21). The formula (21) for the posterior covariance is justified in the large system limit via state evolution, while the posterior mean (20) is equivalent to that in conventional OAMP [30].

The third step is the Onsager correction to realize asymptotic Gaussianity. Module A computes the extrinsic mean x^ext_{A,t} ∈ R^N and covariance {v^ext_{A,t',t}}_{t'=0}^t, given by

x^ext_{A,t} = {x^post_{A,t} − Σ_{t'∈T_{A,t}} ξ_{A,t',t} x_{B→A,t'}} / (1 − ξ_{A,t}), (23)

where ξ_{A,t',t} ∈ R is given by (25) for t' ∈ T_{A,t}, with ξ_{A,t} = Σ_{t'∈T_{A,t}} ξ_{A,t',t} in (26). The factor ξ_{A,t',t} is the empirical average of the partial derivatives of the posterior mean x^post_{A,t} with respect to the elements of x_{B→A,t'}. The formula (24) for the covariance is justified via state evolution.
The numerator in (23) is the Onsager correction of the posterior message x^post_{A,t} to realize asymptotic Gaussianity, and originates from a general error model in [44]. The denominator in (23) has been selected to simplify state evolution, as considered in VAMP [31]. In particular, this selection is the best when the true posterior mean estimator is used; see Section II-C for the details.

The last step is LM damping [43] of the extrinsic messages. Module A feeds the damped messages x_{A→B,t} and {v_{A→B,t',t}}_{t'=0}^t forward to module B:

x_{A→B,t} = Σ_{τ=0}^t θ_{A,τ,t} x^ext_{A,τ}, (27)

v_{A→B,t',t} = Σ_{τ'=0}^{t'} Σ_{τ=0}^t θ_{A,τ',t'} θ_{A,τ,t} v^ext_{A,τ',τ}, (28)
where the damping factors {θ_{A,τ,t}} satisfy the following normalization condition:

Σ_{τ=0}^t θ_{A,τ,t} = 1 (29)

for all t, and where the condition θ_{A,t,t} ≠ 0 circumvents a rank-deficient V_{A→B,t,t}. Note that the positivity of the damping factors is not assumed, since it is not required in state evolution. The main novelty of module A in LM-OAMP lies in the sufficient statistic (18), the correction (23) for the posterior message to realize asymptotic Gaussianity, and the update rules (19), (21), (24), and (28) for the covariance messages. In particular, the sufficient statistic is a key technical tool to analyze the convergence of state evolution recursions for LM-OAMP rigorously.
Example 1 (LMMSE): Consider the linear minimum mean-square error (LMMSE) filter

W_t = v^suf_{B→A,t,t} (σ² I_M + v^suf_{B→A,t,t} A A^T)^{−1} A. (30)

We know that the LMMSE filter minimizes the posterior variance v^post_{A,t,t} in (21). Let η(x) denote the η-transform [52] of the empirical eigenvalue distribution of A^T A, given by η(x) = E[(1 + xλ)^{−1}], where λ follows the empirical eigenvalue distribution of A^T A. It is straightforward to confirm that the posterior variance v^post_{A,t,t}, the factor ξ_{A,t} in (26), and γ_{t',t} in (22) can be expressed in closed form via the η-transform.
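The sketch below illustrates the LMMSE filtering step under the conventions above (W_t ∈ R^{M×N} applied through its transpose, so that the MF corresponds to W_t = A). It is a minimal illustration under these assumptions, not the paper's implementation.

```python
import numpy as np

def lmmse_posterior_mean(A, y, x_suf, v_suf, sigma2):
    """Posterior mean (20) with the LMMSE filter (30):
    W = v_suf (sigma^2 I + v_suf A A^T)^{-1} A, applied as x_suf + W^T (y - A x_suf)."""
    M = A.shape[0]
    G = sigma2 * np.eye(M) + v_suf * (A @ A.T)
    W = v_suf * np.linalg.solve(G, A)     # solve instead of forming G^{-1} explicitly
    return x_suf + W.T @ (y - A @ x_suf)
```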

D. Module B (Nonlinear Estimation)
Module B is formulated in the same manner as module A. The only exception is the use of element-wise nonlinear denoising, while module A has used the linear filter to mitigate the interference. Thus, explanations that parallel those for module A are omitted.
In the first step, module B uses the preceding message matrix X_{A→B,t} and covariance {v_{A→B,τ',τ} : τ', τ ∈ {0, . . . , t}} in iteration t to compute a sufficient statistic x^suf_{A→B,t} and the corresponding covariance {v^suf_{A→B,t',t}}_{t'=0}^t, given by (36) and (37) in the same manner as in (18) and (19). The second step is computation of the posterior mean

x^post_{B,t+1} = f_t(x^suf_{A→B,t}), (38)

with a separable denoiser f_t and the posterior covariance (39). The posterior mean message x^post_{B,t+1} is used as an estimator of the signal vector x. Since the sufficient statistic x^suf_{A→B,t} depends on the preceding messages X_{A→B,t}, the posterior message x^post_{B,t+1} is a function of the preceding message matrix X_{A→B,t}. In this sense, module B utilizes LM denoising.
The posterior covariance (39) is a consistent estimator of the covariance for the estimation errors. See Section II-D for the details.
The third step is the Onsager correction to realize asymptotic Gaussianity. Module B computes the extrinsic mean x^ext_{B,t+1} ∈ R^N and covariance {v^ext_{B,t'+1,t+1}}_{t'=0}^t, given by

x^ext_{B,t+1} = {x^post_{B,t+1} − Σ_{t'∈T_{B,t}} ξ_{B,t',t} x_{A→B,t'}} / (1 − ξ_{B,t}), (41)

with ξ_{B,t',t} defined in (43) for t' ∈ T_{B,t}, where ξ_{B,t} is given by (40). If the true prior distribution of x is available, (39) should be replaced with the posterior covariance C(x^suf_{A→B,t'}, x^suf_{A→B,t}) in (14).

The last step is LM damping of the extrinsic messages. Module B feeds the damped messages x_{B→A,t+1} ∈ R^N and {v_{B→A,t',t+1}}_{t'=0}^{t+1} back to module A, computed via (44) and (45) for t', t ≥ 0 in the same manner as in (27) and (28), with the normalization condition (46).

IV. STATE EVOLUTION ANALYSIS

A. State Evolution
We analyze the dynamics of LM-OAMP in the large system limit via state evolution [44]. A first step is a proof of asymptotic Gaussianity for the estimation errors in LM-OAMP. Asymptotic Gaussianity is proved by confirming that the error model for LM-OAMP is included in a general error model proposed in [44], which guarantees asymptotic Gaussianity. A second step is a derivation of state evolution recursions via asymptotic Gaussianity.
We first postulate technical assumptions to justify asymptotic Gaussianity.
Assumption 1: The signal vector x has i.i.d. elements with zero mean, unit variance, and bounded (2 + ε)th moment for some ε > 0.
Assumption 1 is made to simplify the state evolution analysis. We would need non-separable denoising if we postulated dependent signal elements. See [53]–[55] for non-separable denoising.
Assumption 2: The sensing matrix A is right-orthogonally invariant: For any orthogonal matrix Φ independent of A, the equivalence in distribution AΦ ∼ A holds. More precisely, for the SVD A = UΣV^T, the orthogonal matrix V is independent of UΣ and Haar-distributed [52], [56]. Furthermore, the empirical eigenvalue distribution of A^T A converges almost surely to a compactly supported deterministic distribution with unit first moment in the large system limit.
The unit-first-moment assumption is equivalent to the almost sure convergence N^{−1}Tr(A^T A) a.s.→ 1 in the large system limit. The right-orthogonal invariance is an important assumption to justify asymptotic Gaussianity, for which a rough intuition is as follows: Let m ∈ R^N denote a vector independent of V. Since V is Haar-distributed, the orthogonal transform V m is distributed as (∥m∥/∥z∥)z for a standard Gaussian vector z. If the ratio ∥m∥/∥z∥ converges in probability to a constant a > 0 as N → ∞, V m is distributed as if it followed N(0, a²I_N).
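This intuition is easy to check numerically. The sketch below draws an (approximately) Haar orthogonal matrix via the QR decomposition of a Gaussian matrix, sign conventions aside, and compares the empirical entry variance of V m with ∥m∥²/N; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024
m = np.linspace(-1.0, 1.0, N)                       # any fixed vector independent of V
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))    # approximately Haar orthogonal
x = Q @ m
print(np.linalg.norm(x), np.linalg.norm(m))         # the norm is preserved exactly
print(np.var(x), np.linalg.norm(m) ** 2 / N)        # entries look i.i.d. N(0, ||m||^2 / N)
```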
It is a challenging issue in random matrix theory to weaken the right-orthogonal invariance to more practical assumptions, including discrete cosine transform (DCT) or Hadamard matrices with random permutation [57]. See [58]- [60] for theoretical progress in this direction.
Assumption 3: The noise vector w is orthogonally invariant: For any orthogonal matrix Φ independent of w, the equivalence in distribution Φw ∼ w holds. Furthermore, the almost sure convergence M^{−1}∥w∥² a.s.→ σ² holds for the variance σ² > 0. The vector w has bounded (2 + ε)th moments for some ε > 0.
The AWGN vector w ∼ N(0, σ²I_M) with variance σ² satisfies Assumption 3. When the sensing matrix is left-orthogonally invariant, the orthogonal invariance of w can be induced from A.
Assumption 4: The linear filter matrix W_t in module A has the same SVD structure as the sensing matrix A: for the SVD A = UΣV^T, the linear filter can be represented as W_t = U W̃_t V^T for some rectangular diagonal matrix W̃_t ∈ R^{M×N}.

Assumption 4 contains practically important linear filters, such as the MF W t = A and the LMMSE filter (30).
Assumption 5: The scalar denoiser f t in module B is Lipschitz-continuous and nonlinear.
The Lipschitz-continuity in Assumption 5 is a standard assumption in state evolution analysis [17], [31], [35], [44], while the nonlinearity is required for the positive definiteness of V_{B→A,t,t}. In module A, Assumption 4 and the compact support in Assumption 2 substitute for Assumption 5.
We are ready to present asymptotic Gaussianity for the estimation errors in LM-OAMP.
Theorem 1: Suppose that Assumptions 1–5 are satisfied. Then, the following properties hold:
• For all τ' ≤ τ ≤ t, the covariance message v_{B→A,τ',τ} converges almost surely to some constant v̄_{B→A,τ',τ} in the large system limit. Furthermore, the (t + 1) × (t + 1) upper-left block of the matrix V̄_{B→A} with [V̄_{B→A}]_{τ',τ} = v̄_{B→A,τ',τ} is positive definite as long as t is finite.
• Let V̄_{B→A,t',t} ∈ R^{|T_{A,t'}|×|T_{A,t}|} denote the matrix obtained by extracting the elements v̄_{B→A,τ',τ} for all τ' ∈ T_{A,t'} and τ ∈ T_{A,t} from the covariance matrix V̄_{B→A} defined in the first property. Then, the covariance v^post_{A,τ',τ} converges almost surely to some constant v̄^post_{A,τ',τ} in (49) in the large system limit, where γ_{τ',τ} is given by (22), with v̄^suf_{B→A,τ',τ} defined in (50).
• For all τ' ≤ τ ≤ t, the covariance message v_{A→B,τ',τ} converges almost surely to some constant v̄_{A→B,τ',τ} in the large system limit. Furthermore, the (t + 1) × (t + 1) upper-left block of the matrix V̄_{A→B} with [V̄_{A→B}]_{τ',τ} = v̄_{A→B,τ',τ} is positive definite as long as t is finite.
• Let V̄_{A→B,t',t} ∈ R^{|T_{B,t'}|×|T_{B,t}|} denote the matrix defined by extracting the elements v̄_{A→B,τ',τ} for all τ' ∈ T_{B,t'} and τ ∈ T_{B,t} from the covariance matrix V̄_{A→B} defined in the third property. Then, the covariance v^post_{B,τ'+1,τ+1} converges almost surely to some constant v̄^post_{B,τ'+1,τ+1} in (51) in the large system limit, where {z_τ', z_τ} are zero-mean Gaussian random variables that are independent of the signal element x_1 and have the covariance (52). Furthermore, ξ_{B,τ} in (40) converges almost surely to ξ̄_{B,τ} in (53) in the large system limit.

Proof: See Appendix B.

The second and fourth properties in Theorem 1 give the precise meaning of asymptotic Gaussianity for LM-OAMP, while the asymptotic Gaussianity in the second property is not explicit because of the linear filter in module A. We do not claim joint Gaussianity for the estimation errors, i.e. the convergence of the joint distribution of {x^suf_{A→B,τ} − x}_{τ=0}^t toward a joint Gaussian distribution with covariance E[(x^suf_{A→B,τ'} − x)(x^suf_{A→B,τ} − x)^T] = v̄^suf_{A→B,τ',τ}I. We only claim the almost sure convergence of (51), given via the zero-mean Gaussian random variables z_τ' and z_τ with covariance E[z_τ' z_τ] = v̄^suf_{A→B,τ',τ}.

We next derive state evolution recursions by evaluating the covariances v̄_{A→B,τ',τ} and v̄_{B→A,τ',τ} in Theorem 1.
Theorem 2 (State Evolution): Suppose that Assumptions 1–5 are satisfied. Then, the state evolution recursions for module A are given by (54)–(57) with v̄_{B→A,0,0} = 1, where v̄^post_{A,t',t} is defined in (49). In these expressions, v^suf_{B→A,t,t} in W_t is replaced with v̄^suf_{B→A,t,t}. On the other hand, the state evolution recursions for module B are given by (58)–(60) for t', t ≥ 0, where v̄^post_{B,t'+1,t+1} and ξ̄_{B,t} are defined as (51) and (53), respectively, and initialized at t = −1, with z_τ independent of x_1 and a zero-mean Gaussian random variable with variance E[z_τ²] = v̄^suf_{A→B,τ,τ}.

Proof: See Appendix C.

Theorem 2 presents the state evolution recursions for LM-OAMP, which justify the update rules for the covariance messages in LM-OAMP. In particular, the MSE N^{−1}∥x^post_{B,t+1} − x∥² converges almost surely to v̄^post_{B,t+1,t+1} in the large system limit.
Example 2: As an example of Theorem 2, we derive state evolution recursions for damped OAMP. Let T_{A,t} = T_{B,t} = {t} and consider conventional damping with factors θ_A ∈ [0, 1] in (27) and θ_B ∈ [0, 1] in (44) [31]. From (27) we have x_{A→B,0} = x^ext_{A,0} and x_{A→B,t} = θ_A x^ext_{A,t} + (1 − θ_A)x_{A→B,t−1} for t > 0. Similarly, from (44) we have x_{B→A,1} = x^ext_{B,1} and x_{B→A,t+1} = θ_B x^ext_{B,t+1} + (1 − θ_B)x_{B→A,t} for t > 0.

The state evolution recursions for module A in damped OAMP follow from Theorem 2, where v̄^post_{A,t',t} is defined in (49) with v̄^suf_{B→A,t',t} = v̄_{B→A,t',t}. Furthermore, we have v̄_{A→B,0,0} = v̄^ext_{A,0,0}, while the covariance messages v̄_{A→B,t',t} for all t', t > 0 follow from the bilinearity of the covariance applied to the damping recursion above. The state evolution recursions for module B reduce to v̄^ext_{B,0,t+1} = v̄^post_{B,0,t+1} for t ≥ 0, where v̄^post_{B,t'+1,t+1} is given by (51) with v̄^suf_{A→B,τ',τ} = v̄_{A→B,τ',τ}. Furthermore, we have v̄_{B→A,0,0} = 1, v̄_{B→A,0,1} = v̄^ext_{B,0,1}, v̄_{B→A,1,1} = v̄^ext_{B,1,1}, and analogous damped recursions for larger t. These results imply that the correct state evolution recursions for damped OAMP are two-dimensional discrete systems with respect to the covariance messages. Equivalent results were derived for the MF W_t = A and zero-mean i.i.d. Gaussian sensing matrices [42].
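A minimal sketch of this conventional damping, under the convention that θ = 1 corresponds to no damping (consistent with θ_A = 1 in Section VI); the function name is hypothetical.

```python
def damp(x_ext_t, x_prev, theta):
    """x_t = theta * x_ext_t + (1 - theta) * x_{t-1}; theta = 1 disables damping.
    Tracking the error variance of x_t requires the cross-covariances v_ext_{t',t},
    which is why the correct state evolution becomes a two-dimensional system."""
    return theta * x_ext_t + (1.0 - theta) * x_prev
```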
The following heuristic damping for the variance messages has also been used in the literature: damping in the precision domain [31] or in the variance domain. The heuristic damping cannot provide accurate estimates of the MSE in each module, while it results in a one-dimensional discrete system describing the variance messages in OAMP.

V. CONVERGENCE ANALYSIS

A. Damping
LM-OAMP has several design parameters such as the subsets of indices T A,t , T B,t , the linear filter W t , damping factors {θ A,τ,t }, {θ B,τ,t }, and the denoiser f t . Since this paper focuses on the Bayes-optimal LM-OAMP, the LMMSE filter (30) and the Bayes-optimal denoiser are used. Thus, the remaining design parameters are T A,t , T B,t , {θ A,τ,t }, and {θ B,τ,t }.
We use Theorem 2 to optimize the damping factors {θ_{A,τ,t}} and {θ_{B,τ,t}}. A reasonable criterion is the minimization of the variances v̄^suf_{B→A,t,t} and v̄^suf_{A→B,t,t} of the sufficient statistics given in (54) and (58). This criterion results in greedy minimization of the MSEs v̄^post_{A,t,t} and v̄^post_{B,t+1,t+1} of the posterior estimators.
When all preceding messages are used, i.e. T_{A,t} = T_{B,t} = {0, . . . , t}, the damping factors can be optimized via the following lemma:

Lemma 4: Suppose that the damping factors {θ_{τ,t}}_{τ=0}^t satisfy the normalization condition Σ_{τ=0}^t θ_{τ,t} = 1 for all t. Then, the variance of the sufficient statistic computed from the damped messages is invariant to the damping factors.

To see this, define the upper triangular matrix Θ_T ∈ R^{(T+1)×(T+1)} via [Θ_T]_{τ,t} = θ_{τ,t} for τ ≤ t and [Θ_T]_{τ,t} = 0 otherwise. If Θ_T has full rank, then the identity 1^T(Θ_T^T V̄ Θ_T)^{−1}1 = 1^T V̄^{−1}1 holds, since the normalization Θ_T^T 1 = 1 implies Θ_T^{−T}1 = 1. Combining these results, we arrive at Lemma 4.

Using Lemma 4 for (57) and (60), we find that the variance (79) of the sufficient statistics is independent of the damping factors {θ_{A,τ,t}} and {θ_{B,τ,t}} if T_{A,t} = T_{B,t} = {0, . . . , t} are used. This observation is intuitively trivial since x^suf_{B→A,t} and x^suf_{A→B,t} in (18) and (36) are sufficient statistics. Thus, we assume no damping, θ_{A,τ,t} = θ_{B,τ,t} = δ_{τ,t}, when all preceding information T_{A,t} = T_{B,t} = {0, . . . , t} is used.
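The statistical interpretation of optimized LM damping can be phrased as a tiny computation: minimizing the damped variance θ^T V θ subject to 1^T θ = 1 yields θ* = V^{−1}1/(1^T V^{−1}1), i.e. exactly the sufficient-statistic weights in (18). A sketch with hypothetical names, assuming a positive definite covariance estimate V:

```python
import numpy as np

def optimal_lm_damping(V):
    """Minimizer of theta^T V theta subject to 1^T theta = 1 (Lagrangian argument):
    theta* = V^{-1} 1 / (1^T V^{-1} 1), the sufficient-statistic weights in (18)."""
    ones = np.ones(V.shape[0])
    w = np.linalg.solve(V, ones)
    return w / (ones @ w)
```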

B. Bayes-Optimal LM-OAMP
We consider Bayes-optimal LM-OAMP. In general, the state evolution recursions in Theorem 2 are two-dimensional discrete systems. Interestingly, the systems reduce to one-dimensional systems when all preceding information T_{A,t} = T_{B,t} = {0, . . . , t}, the LMMSE filter (30), and the Bayes-optimal denoiser are used in each iteration.
The Bayes-optimal denoiser is defined as the denoiser f_τ that minimizes the MSE v̄^post_{B,τ+1,τ+1} in (51). By definition, the Bayes-optimal denoiser is equal to the posterior mean estimator f_τ = E[x_1|x_1 + z_τ] with z_τ defined in Theorem 1.
To present the state evolution recursions for Bayes-optimal LM-OAMP, we review the state evolution recursions for conventional Bayes-optimal OAMP [31], [35], which are one-dimensional systems (84) and (85) with respect to the two variance parameters v̄_{A→B,t} and v̄_{B→A,t}, with v̄_{B→A,0} = 1. In (84), ξ̄_{A,t} is given by (86). The variance v̄_{B→A,t+1} in (85) corresponds to the MSE for Bayes-optimal OAMP in the large system limit, with z_t denoting a zero-mean Gaussian random variable with variance v̄_{A→B,t} that is independent of x_1.
The state evolution recursions for Bayes-optimal LM-OAMP are essentially one-dimensional, as stated in the following theorem:

Theorem 3: Suppose that Assumptions 1–5 are satisfied, and consider T_{A,t} = T_{B,t} = {0, . . . , t}, θ_{A,τ,t} = θ_{B,τ,t} = δ_{τ,t}, the LMMSE filter (30), and the Bayes-optimal denoiser. Then, the identities v̄_{A→B,t',t} = v̄_{A→B,t} and v̄_{B→A,t'+1,t+1} = v̄_{B→A,t+1} hold for all t' ≤ t, where v̄_{A→B,t} and v̄_{B→A,t+1} are well defined via the one-dimensional recursions (84) and (85) for Bayes-optimal OAMP [31], [35]. Furthermore, v̄_{A→B,t} and v̄_{B→A,t} converge to a FP as t → ∞.

Proof: See Appendix D.

As a by-product, we arrive at the convergence of conventional Bayes-optimal OAMP.

Corollary 2 (Bayes-Optimal OAMP): Suppose that Assumptions 1–5 are satisfied. Then, the state evolution recursions (84) and (85) for conventional Bayes-optimal OAMP converge to a FP. Furthermore, the FP characterizes the Bayes-optimal performance if it is unique.
Proof: Since the Bayes-optimality of the unique FP is due to [30], we only prove the convergence of the state evolution recursions for Bayes-optimal OAMP. The latter property in Theorem 3 implies the convergence of v̄_{A→B,t',t} and v̄_{B→A,t',t} for Bayes-optimal LM-OAMP to a FP. Furthermore, the former property in Theorem 3 claims that the diagonal elements v̄_{A→B,t,t} and v̄_{B→A,t,t} are respectively equal to v̄_{A→B,t} and v̄_{B→A,t} in the state evolution recursions (84) and (85) for Bayes-optimal OAMP [31], [35]. These observations imply the convergence of the state evolution recursions for Bayes-optimal OAMP to a FP.
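The one-dimensional structure can be summarized by the following schematic loop, assuming callables se_A and mmse that stand in for the maps (84) and (85); both are placeholders, since the explicit expressions depend on the η-transform and the prior.

```python
def run_state_evolution(se_A, mmse, num_iter=100, v_B0=1.0):
    """Iterate the one-dimensional recursions: v_A = (84)(v_B), v_B = (85)(v_A).
    By Theorem 3, the trajectory for Bayes-optimal OAMP is convergent."""
    v_B, trajectory = v_B0, []
    for _ in range(num_iter):
        v_A = se_A(v_B)        # module A update (LMMSE filter)
        v_B = mmse(v_A)        # module B update (MSE of the Bayes-optimal denoiser)
        trajectory.append((v_A, v_B))
    return trajectory
```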
Theorem 3 implies that the dynamics of the MSE performance of Bayes-optimal LM-OAMP is equal to that of Bayes-optimal OAMP. Interestingly, we can confirm an equivalence between Bayes-optimal LM-OAMP and Bayes-optimal OAMP.

Proposition 3: Consider T_{A,t} = T_{B,t} = {0, . . . , t}, θ_{A,τ,t} = θ_{B,τ,t} = δ_{τ,t}, the LMMSE filter (30), and the Bayes-optimal denoiser. If the covariance matrices V_{A→B,t,t} and V_{B→A,t,t} given via (28) and (45) are positive definite for all t, then Bayes-optimal LM-OAMP is equivalent to Bayes-optimal OAMP.
Proof: Repeating the proof of the former property in Theorem 3, we can prove the identities v_{A→B,t',t} = v_{A→B,t,t} and v_{B→A,t'+1,t+1} = v_{B→A,t+1,t+1} for all t' ≤ t. From the positive definiteness of V_{A→B,t,t} and V_{B→A,t,t}, we can use the last property in Lemma 3 to find that (18) and (36) reduce to x^suf_{B→A,t} = x_{B→A,t} and x^suf_{A→B,t} = x_{A→B,t}, respectively, and that ξ_{A,t',t} = ξ_{B,t',t} = 0 holds in (25) and (43) for all t' ≠ t.
The former observation implies that one can skip the computation of the sufficient statistics in the first step of Bayes-optimal LM-OAMP. The latter observation indicates that the extrinsic mean messages in (23) and (41) are equivalent to those in Bayes-optimal OAMP [30]. Thus, Bayes-optimal LM-OAMP is equivalent to Bayes-optimal OAMP.

Theorem 1 justifies the positive-definiteness assumption in Proposition 3 in the large system limit. Intuitively, Proposition 3 holds since x_{A→B,t} and x_{B→A,t} are sufficient statistics for estimation of the signal vector given all preceding messages in each module. Bayes-optimal LM-OAMP may be regarded as a theoretical tool to prove that x_{A→B,t} and x_{B→A,t} are sufficient statistics. These observations imply that conventional Bayes-optimal OAMP is the best option among all possible LM-MP algorithms in terms of the reconstruction performance.

VI. NUMERICAL SIMULATIONS

A. Simulation Conditions
In all numerical results, we assume an i.i.d. Bernoulli-Gaussian (BG) signal vector x and artificial sensing matrices A. Each signal element x n is independently sampled from the Gaussian distribution N (0, ρ −1 ) with probability ρ ∈ [0, 1].
Otherwise, x_n is set to zero. See [44, Appendix F] for properties of the BG prior.
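For reference, the Bayes-optimal denoiser for this BG prior admits the following closed form under the AWGN model s = x + z with z ∼ N(0, v); the sketch follows from Bayes' rule for the two-component mixture, and the helper name is illustrative.

```python
import numpy as np

def bg_posterior_mean(s, rho, v):
    """E[x | s] for x ~ rho * N(0, 1/rho) + (1 - rho) * delta_0 and s = x + z, z ~ N(0, v)."""
    var1 = 1.0 / rho + v                              # variance of s when x is nonzero
    # unnormalized responsibilities (the common 1/sqrt(2 pi) factor cancels)
    p1 = rho * np.exp(-0.5 * s ** 2 / var1) / np.sqrt(var1)
    p0 = (1.0 - rho) * np.exp(-0.5 * s ** 2 / v) / np.sqrt(v)
    # P(x nonzero | s) times the conditional posterior mean ((1/rho)/(1/rho + v)) * s
    return p1 / (p0 + p1) * (1.0 / rho) / var1 * s
```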
To reduce the computational complexity, we assume the SVD structure A = ΣV^T, in which V denotes a Hadamard matrix with random row permutation. Such orthogonal matrices V may be regarded as a practical alternative to Haar orthogonal matrices [58]–[60].
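With this structure, products with V^T never require storing V explicitly: a fast Walsh-Hadamard transform applies the (normalized) Hadamard matrix in O(N log N) after undoing the row permutation. A minimal sketch, assuming N is a power of two; perm is a hypothetical permutation array.

```python
import numpy as np

def fwht(x):
    """Normalized fast Walsh-Hadamard transform (N must be a power of two)."""
    x = x.astype(float).copy()
    N, h = x.shape[0], 1
    while h < N:
        for i in range(0, N, 2 * h):
            a, b = x[i:i + h].copy(), x[i + h:i + 2 * h].copy()
            x[i:i + h], x[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return x / np.sqrt(N)

# With V = P H for a row permutation P (row i of V is row perm[i] of H),
# V^T x = H (P^T x), i.e. undo the permutation and then transform:
# result = fwht(x[np.argsort(perm)])
```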

B. Implementation
We consider the LMMSE filter (30) in module A and the Bayes-optimal denoiser f t (s t ) = E[x 1 |s t ] with s t = x 1 + z t in module B, in which {z τ } are defined in Theorem 1.
In computing the covariance message v^post_{B,t'+1,t+1}, we use the posterior covariance C(x^suf_{A→B,t'}, x^suf_{A→B,t}) in (14) instead of (39). See [44, Appendix F] for the details.
Let V^ext_{B,t,t} ∈ R^{|T_{B,t}|×|T_{B,t}|} denote the covariance matrix with [V^ext_{B,t,t}]_{τ',τ} = v^ext_{B,τ',τ} given in (42). The matrix V^ext_{B,t,t} is guaranteed to be positive definite in the large system limit. For finite-sized systems, however, it might not be positive definite for some t. To circumvent this issue, it is recommended to replace v^ext_{B,τ',τ} and v^ext_{B,τ',τ'} with v^ext_{B,τ,τ} if v^ext_{B,τ',τ'} v^ext_{B,τ,τ} − (v^ext_{B,τ',τ})² < ε holds for small ε > 0, which was set to ε = 10^{−6} in the numerical simulations. As discussed in Section II, this replacement implies that the AWGN observation s_τ = x_1 + z_τ in Theorem 1 becomes a sufficient statistic for estimation of x_1 based on both s_{τ'} and s_τ for τ' < τ. In other words, s_{τ'} is not used since it is strongly correlated with s_τ.
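A sketch of this safeguard, assuming V_ext stores the covariance messages v^ext_{B,τ',τ}; the names are illustrative, and the in-place overwriting mirrors the replacement rule described above.

```python
import numpy as np

def safeguard_covariance(V_ext, eps=1e-6):
    """Overwrite v[t', t'] and v[t', t] with v[t, t] when the 2x2 minor
    v[t', t'] v[t, t] - v[t', t]^2 drops below eps (near-perfect correlation)."""
    V = V_ext.copy()
    T = V.shape[0]
    for t in range(T):
        for tp in range(t):
            if V[tp, tp] * V[t, t] - V[tp, t] ** 2 < eps:
                V[tp, tp] = V[t, t]
                V[tp, t] = V[t, tp] = V[t, t]
    return V
```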
We first verify the correctness of the state evolution results via numerical simulations for large systems with N = 2^{13}. Figure 2 shows the MSEs versus the number of iterations for LM-OAMP with damping only in module B, i.e. θ_A = 1 and θ_B ∈ (0, 1). The state evolution results predict the MSEs of LM-OAMP accurately, while OAMP is not in agreement with its state evolution results for small θ_B.
We next compare LM-OAMP with OAMP for smaller systems than in Fig. 2. As shown in Fig. 3, LM-OAMP is slightly inferior to OAMP, especially for small systems. This is surprising since the damping in LM-OAMP is consistent with state evolution in the large system limit.
This result is intuitively understood as follows: LM-OAMP uses the covariance messages in all preceding iterations to compute the state evolution prediction in the current iteration. As a result, all prediction errors in the preceding iterations accumulate for finite-sized systems, so that LM-OAMP should have larger prediction errors than conventional OAMP without LM processing. Figure 3 shows that these prediction errors due to finite-size effects are more dominant than the inconsistency of the heuristic damping in the large system limit.

VII. CONCLUSION
This paper has proposed LM-MP, constructed by adding computation of sufficient statistics for estimation of the signal vector given all preceding messages to the beginning of each module in conventional MP. The proposed LM-MP modifies conventional MP in two respects.
First, Bayes-optimal LM-MP provides a proof strategy to guarantee the convergence of state evolution recursions for conventional Bayes-optimal MP systematically. In this proof strategy, the one-dimensional state evolution recursions for Bayes-optimal MP are embedded into the two-dimensional state evolution recursions for Bayes-optimal LM-MP, which are systematically guaranteed to converge. As a result, one can prove the convergence of the state evolution recursions for Bayes-optimal MP. While the strategy has been applied to Bayes-optimal OAMP [30], [31] in this paper, the proposed LM-MP strategy is general and applicable to the convergence analysis of other Bayes-optimal MP algorithms.
Second, LM-MP may improve the convergence properties of conventional MP. If the state evolution recursions for MP with a Bayes-optimal denoiser are not embedded into those for the corresponding LM-MP, the LM-MP strategy indicates that the conventional MP has room for improvement: the constructed LM-MP improves the convergence properties of the conventional MP for large systems. In this sense, MAMP [43] can be regarded as a modification of an MP algorithm without LM damping.
Possible directions for future work are summarized to conclude this paper. A first issue is another criterion for optimality, such as the maximum a posteriori (MAP) scenario, while this paper has focused on the minimum MSE (MMSE) scenario. It is an interesting direction to investigate whether the LM-MP strategy can be generalized to other criteria for optimality.
The other issue is an improvement of the state evolution prediction for LM-MP in finite-sized systems. State evolution recursions for LM-MP are more sensitive to finite-size effects than those for conventional MP, while they are consistent in the large system limit. It is important to construct covariance estimators that are consistent in the large system limit and accurate for finite-sized systems. Such covariance estimators should increase the practical value of LM-MP for finite-sized systems, given that the numerical simulations in this paper have shown the inferiority of damped LM-OAMP to OAMP with heuristic damping for finite-sized systems.

APPENDIX A

A. Proof of Proposition 2
We follow [30, Appendix B] to prove Proposition 2. We use the decomposition E[{ψ_t(Y_t) − X}²] = E[(ψ_t − f_opt)²] + E[(f_opt − X)²], which follows from the orthogonality principle for the posterior mean estimator f_opt. To evaluate E[(ψ_t − f_opt)²], we repeat the same argument for the extrinsic denoiser f^ext_t to obtain E[(ψ_t − f_opt)²] = E[(ψ_t − f^ext_t)²] + E[(f^ext_t − f_opt)²] ≥ E[(f^ext_t − f_opt)²] if the second term E[ψ_t(f^ext_t − f_opt)] is equal to zero for all extrinsic denoisers ψ_t.
To complete the proof of Proposition 2, it is sufficient to prove E[ψ_t(f^ext_t − f_opt)] = 0 for any Lipschitz-continuous extrinsic denoiser ψ_t with E[∂ψ_t/∂Y_τ] = 0 for all τ ∈ T_t. From the definition of f^ext_t in (12), f^ext_t − f_opt is a linear combination of f_opt and {Y_τ : τ ∈ T_t}. Since {W_τ : τ ∈ T_t} are zero-mean Gaussian, we can use [44, Lemma 2] for Lipschitz-continuous ψ_t to evaluate the expectations E[ψ_t Y_τ] and E[ψ_t f_opt], in which the terms associated with E[∂ψ_t/∂Y_τ] vanish by assumption. Thus, we have E[ψ_t(f^ext_t − f_opt)] = 0, where the last equality follows from the definition of ξ_{τ,t} in (13). Thus, Proposition 2 holds.

B. Proof of Lemma 1
Since {S_{t',n}, S_{t,n}}_{n=1}^N are independent samples, we can use the strong law of large numbers to reduce the claim to the corresponding statement for expectations. Thus, it is sufficient to prove the unbiasedness of Ĉ, i.e. E[Ĉ(S_{t'}, S_t)] = E[{f_{t'}(S_{t'}) − X}{f_t(S_t) − X}].
We first evaluate the expectation E[S_t f_t(S_t)]. Substituting S_t = X + W̄_t and using Stein's lemma [61] for Lipschitz-continuous f_t, we have

E[S_t f_t(S_t)] = E[X f_t(S_t)] + E[W̄_t²] E[f_t'(S_t)].

Applying this result to (16), we arrive at the unbiasedness of Ĉ. Thus, Lemma 1 holds.
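The Stein identity used above is easy to verify by Monte Carlo simulation; the sketch below uses tanh as a stand-in Lipschitz-continuous denoiser.

```python
import numpy as np

rng = np.random.default_rng(2)
n, v = 1_000_000, 0.3
X = rng.standard_normal(n)                    # X with unit variance
S = X + np.sqrt(v) * rng.standard_normal(n)   # S = X + W, W ~ N(0, v) independent of X
f, fp = np.tanh, lambda s: 1.0 - np.tanh(s) ** 2
print(np.mean(S * f(S)))                      # E[S f(S)]
print(np.mean(X * f(S)) + v * np.mean(fp(S))) # E[X f(S)] + E[W^2] E[f'(S)]
```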

C. Proof of Lemma 2
We first prove the former property. Since E[W̄_{t'}²] ≥ E[W̄_t²] holds for t' < t, the optimality of the posterior mean estimator implies that the MSE E[{f_opt(S_t) − X}²] is monotonically non-increasing in t, because of the representation

S_{t'} = X + W̄_t + Z_{t'}, (97)

where Z_{t'} is a zero-mean Gaussian random variable with variance E[W̄_{t'}²] − E[W̄_t²] ≥ 0 that is independent of X and W̄_t. The representation (97) can be verified from the correlation property (9). Thus, the former property holds.

The representation (97) implies that S_t is a sufficient statistic for estimation of X given both S_{t'} and S_t. Thus, E[X|S_{t'}, S_t] = E[X|S_t] holds. Applying this identity to the definition of C(S_{t'}, S_t) in (14), we arrive at C(S_{t'}, S_t) = C(S_t, S_t). Thus, the latter property holds.

D. Proof of Lemma 3
We first prove the first property. It is straightforward to confirm that

det Σ_t = [Σ_t]_{t,t} Π_{τ=0}^{t−1} ΔΣ_{τ,t}, ΔΣ_{τ,t} = [Σ_t]_{τ,τ} − [Σ_t]_{τ+1,τ+1}. (100)

Since {Σ_τ}_{τ=1}^t have been assumed to be positive definite, we have the positivity {det Σ_τ > 0}_{τ=1}^t. Using (100) yields the first property ΔΣ_{τ,t} > 0 for all τ.

Let us prove (100). Since [Σ_t]_{τ',τ} = [Σ_t]_{τ,τ} holds for all τ' < τ, we subtract the (τ + 1)th column of Σ_t from the τth column for all τ ∈ {0, . . . , t − 1} to obtain an upper triangular expression for the determinant, which implies (100). Thus, the first property holds.

We next prove the second property. Using the assumption [Σ_t]_{τ',τ} = [Σ_t]_{τ,τ} and the first property [Σ_t]_{τ',τ'} > [Σ_t]_{τ,τ} for all τ' < τ, we have a representation in which each Y_{τ'} is statistically equivalent to Y_t plus Gaussian noise independent of (X, Y_t). Finally, the last property follows immediately from the definitions of S_t and ξ_{τ,t} in (5) and (13), and the identity Σ_t^{−1}1 = ([Σ_t]_{t,t})^{−1} e_t. The identity S_t = Y_t provides an alternative proof of the second property. The identity Σ_t^{−1}1 = ([Σ_t]_{t,t})^{−1} e_t can be confirmed as follows: From the assumption in Lemma 3, Σ_t has the last column [Σ_t]_{t,t}1, so that Σ_t^{−1}([Σ_t]_{t,t}1) = e_t holds.

APPENDIX B PROOF OF THEOREM 1
Asymptotic Gaussianity has been proved for a general error model proposed in [44]. Thus, we first prove that the error model for LM-OAMP is included in the general error model.
Lemma 5: Suppose that Assumption 4 holds. Let h_t = x_{A→B,t} − x denote the estimation error before denoising in iteration t. Define the error vector q_{t+1} associated with the estimation error after denoising via (103). Then, the error model for LM-OAMP is included in the general error model in [44] and given by (104)–(109). In particular, q̃_t = x_{B→A,t} − x holds.

Proof: We first prove (109), which is equivalent to the definition in [44, Eq. (9)]. From the definition of q_{t+1} in (103), we have (110). We define the error matrix H_t = X_{A→B,t} − 1^T ⊗ x for the preceding messages before denoising, so that H_t consists of the columns {h_{t'} : t' ∈ T_{B,t}}. Using the definition of x^post_{B,t+1} in (38) with (36), we find that x^post_{B,t+1} is a function of {h_{t'} : t' ∈ T_{B,t}}, since the sufficient statistic (36) is a function of X_{A→B,t} = 1^T ⊗ x + H_t. Thus, we obtain (112) for t' ∈ T_{B,τ}, with ξ_{B,t',τ} defined in (43), where [·]_n denotes the nth row. Substituting these results and the definition of q_{t+1} in (103) into (110), we arrive at the equivalence between (109) and (110).

We next prove q̃_{t+1} = x_{B→A,t+1} − x. Using the definition of x^ext_{B,t+1} in (41) and h_t = x_{A→B,t} − x, where we have used Σ_{t'∈T_{B,t}} ξ_{B,t',t} = ξ_{B,t}, and applying this expression to (109), we obtain an expression for q̃_{t+1}. Using the definition of x_{B→A,t+1} in (44) and the normalization (46), we arrive at q̃_{t+1} = x_{B→A,t+1} − x.
We shall derive (107). Applying q̃_{t'} = x_{B→A,t'} − x to the sufficient statistic (18) yields (115) with Q̃_t = X_{B→A,t} − (1^T ⊗ x). Substituting this result and the measurement model (1) into the definition of x^post_{A,t} in (20), we have (116). For the SVD A = UΣV^T, applying the SVD W_t = U W̃_t V^T in Assumption 4 to (116) and using the definition of b_t in (104), we arrive at (107) with B_t = V^T Q̃_t.

Let us prove (108), which is equivalent to the definition in [44, Eq. (11)]. Using the expression for m_t in (105) with (107), we repeat the derivation of (112) to obtain the corresponding expression for t' ∈ T_{A,t}, with ξ_{A,t',t} defined in (25).

We have so far proved that the error model (104)–(109) is included in the general error model in [44]. To complete the proof of Lemma 5, we prove h_t = V m̃_t. From the definition of m̃_t in (106), the definition of x^ext_{A,t} in (23) with q̃_{t'} = x_{B→A,t'} − x, and the identity Σ_{t'∈T_{A,t}} ξ_{A,t',t} = ξ_{A,t}, we arrive at h_t = V m̃_t, where the last equality follows from the definition of x_{A→B,t} in (27).
We next prove the second property. The general framework in [44] implies the almost sure convergence of the posterior covariance in the large system limit, with (122), where {B_τ} follow zero-mean Gaussian random matrices with covariance E[b_{τ'} b_τ^T] = v̄_{B→A,τ',τ} I_N, while U^T w follows N(0, σ²I_M). Evaluating the expectation in (122), we find that the right-hand side (RHS) is equal to (49) in the large system limit. Thus, the second property in Theorem 1 holds.
Finally, we prove the last property. Repeating the derivation of (115) for the sufficient statistic (36) yields the corresponding error expression. We then use the definition of x^post_{B,τ+1} in (38) together with (124), where {H_τ} follow zero-mean Gaussian random matrices with covariance E[h_{τ'} h_τ^T] = v̄_{A→B,τ',τ} I_N. Evaluating the correlation E[z_{n,τ'} z_{n,τ}] with (8), we arrive at (51).
APPENDIX C PROOF OF THEOREM 2

To complete the proof of Theorem 2, we derive (54) and (58). Since we have defined covariance messages that are consistent with the state evolution recursions in the large system limit, we have V_{A→B,τ,τ} a.s.→ V̄_{A→B,τ,τ} and V_{B→A,τ,τ} a.s.→ V̄_{B→A,τ,τ}. Thus, (50) and (52) reduce to (54) and (58), respectively. This justifies the replacement of v^suf_{B→A,t,t} in W_t with v̄^suf_{B→A,t,t}.

APPENDIX D PROOF OF THEOREM 3
We first prove the identity v̄_{A→B,t',t} = v̄_{A→B,t} for all t' ≤ t. When all preceding information T_{A,t} = T_{B,t} = {0, . . . , t} is used for all t, v̄^suf_{B→A,t',t} in (54) reduces to v̄^suf_{B→A,t,t}, as shown in (9). Using this result, the definition of γ_{t',t} in (22), and the LMMSE filter (30) with v^suf_{B→A,t,t} replaced by v̄^suf_{B→A,t,t}, we find that v̄^post_{A,t',t} in (49) for t' ≤ t reduces to (135), where the last equality follows from (55). This expression is independent of t'. Thus, we have v̄^post_{A,t',t} = v̄^post_{A,t,t}. Substituting ξ̄_{A,t} = v̄^post_{A,t,t}/v̄^suf_{B→A,t,t} into (56), we find that, for all t' ≤ t, v̄_{A→B,t',t} in (57) is equal to v̄_{A→B,t} given in (84) with v̄_{B→A,t,t} replaced by v̄^suf_{B→A,t,t}.

We next prove the identity v̄_{B→A,t'+1,t+1} = v̄_{B→A,t+1} for all t' ≤ t. Repeating the same derivation for (58) as in (135), we obtain v̄^suf_{A→B,t',t} = v̄^suf_{A→B,t,t} for all t' ≤ t. Using the second property in Lemma 2 yields v̄^post_{B,t',t} = v̄^post_{B,t,t} for all t' ≤ t. Furthermore, we have the general relationship ξ̄_{B,t} = v̄^post_{B,t+1,t+1}/v̄^suf_{A→B,t,t} [35, Lemma 2] between the Bayes-optimal denoiser and the MMSE. Applying these results to (59), we find that, for all t' ≤ t, v̄_{B→A,t'+1,t+1} in (60) is equal to v̄_{B→A,t+1} given in (85) with v̄_{A→B,t,t} replaced by v̄^suf_{A→B,t,t}.

To complete the proof of the former property in Theorem 3, we need to prove the identities v̄^suf_{A→B,t,t} = v̄_{A→B,t,t} and v̄^suf_{B→A,t,t} = v̄_{B→A,t,t}. Without loss of generality, we focus on the former, v̄^suf_{A→B,t,t} = v̄_{A→B,t,t}. We have already proved v̄_{A→B,t',t} = v̄_{A→B,t,t} for all t' ≤ t. Furthermore, Theorem 1 implies that {V̄_{A→B,τ,τ}}_{τ=1}^t are positive definite. Thus, we can use the second property in Lemma 3 to arrive at v̄^suf_{A→B,t,t} = v̄_{A→B,t,t}, so that the former property in Theorem 3 holds.
Let us prove the latter property in Theorem 3. Since v̄^suf_{A→B,t,t} = v̄_{A→B,t,t} and v̄^suf_{B→A,t,t} = v̄_{B→A,t,t} hold, the first property in Lemma 2 implies that {v̄_{A→B,t,t} ≥ 0} and {v̄_{B→A,t,t} ≥ 0} are monotonically non-increasing sequences with respect to t. Moreover, it is possible to prove that they are strictly decreasing sequences by using the first property in Lemma 3. Thus, the limits lim_{t→∞} v̄_{A→B,t,t} = v̄_{A→B} and lim_{t→∞} v̄_{B→A,t,t} = v̄_{B→A} exist, which imply the convergence of v̄^post_{B,t+1,t+1} in (51) to some constant v̄^post_B. From (53) and (86), ξ̄_{B,t} and ξ̄_{A,t} converge to some constants ξ̄_B and ξ̄_A as t → ∞. These observations imply that v̄_{A→B,t',t} = v̄_{A→B,t} and v̄_{B→A,t',t} = v̄_{B→A,t} given in (84) and (85) converge to a FP as t tends to infinity for all t' ≤ t.

ACKNOWLEDGMENT
The author thanks the anonymous reviewers for their suggestions that have improved the quality of the manuscript greatly.