Training-Based Channel Estimation Algorithms for Dual Hop MIMO OFDM Relay Systems

In this paper, we consider minimum-mean-square error (MMSE) training-based channel estimation for two-hop multiple-input multiple-output (MIMO) orthogonal frequency division multiplexing (OFDM) relaying systems. The channel estimation process is divided into two main phases. The relay-destination channel is estimated in the first phase and can be obtained using well-known point-to-point MIMO OFDM estimation methods. In the second phase, the source-relay channel is estimated at the destination with the use of a known training sequence that is transmitted from the source and forwarded to the destination by a nonregenerative relay. To obtain an estimate of the source-relay channel, the source training sequence, relay precoder, and destination processor, require to be optimized. To solve this problem, we first derive an iterative algorithm that involves sequentially solving a number of convex optimization problems to update the source, relay, and destination design variables. Since the iterative algorithm may be too computationally expensive for practical implementation, we then derive simplified solutions that have reduced computational complexity. Simulation results demonstrate the effectiveness of the proposed algorithms.

non-regenerative relays (also known as amplify-forward relays) are expected to have simpler functionality and lower implementation cost than decode-forward relays.As such they have limited signal processing capabilities [17]- [23] and are therefore not expected to perform complex tasks such as channel estimation.In fact it is desireable to keep the complexity of non-regenerative relays to a minimum.For these reasons the estimation of the source-relay channel at the relay in such networks is considered impractical, and alternative estimation methods have therefore been suggested.
For channel estimation problems in non-regenerative MIMO relaying, one possible approach is to estimate the compound MIMO channel through observations at the destination device (as in e.g.[15], [24]).Although techniques that estimate the compound source-destination MIMO channel allow reliable detection at the destination device, they cannot fully exploit the potential benefits (such as improved MSE and BER performance etc.) that are made available by sophisticated transceiver designs such as those developed in [1]- [9], which require knowledge of both the source-relay and relay-destination channels.Alternative methods that are capable of estimating both these channels have therefore been developed [18], [19], [21]- [23], [25], [26], where all channel estimation is conducted at the destination.
Least squares (LS) and weighted least squares algorithms have been developed in [25] and [26], respectively, where the source-relay and relay-destination channels are estimated from the observed composite MIMO channel at the destination.Necessary and sufficient conditions for obtaining the source-relay and relay-destination channels from the composite channel observed at the destination are studied in [19].
Two-phase channel estimation procedures have been developed in [18], [21]- [23].In these works, the relay-destination channel is estimated in the first phase using conventional techniques.In the second phase, the source sends training symbols to the relay, which precodes the received symbols and forwards them to the destination.The destination can then estimate the source-relay channel using the knowledge of the relay-destination channel obtained in the first phase of channel estimation.In [18] the authors derive the optimal source training symbols and relay precoder under the assumption that the relay-destination channel is perfectly estimated during the first phase.Imperfect estimation of the relay-destination channel adversely affects the source-relay estimate obtained by the algorithm in [18].The works of [21]- [23] therefore develop robust estimation algorithms that account for such imperfect relay-destination CSI.
In this paper we consider the channel estimation problem for non-regenerative MIMO OFDM relay networks for transmission over frequency selective channels.Similar to [18], [21]- [23] we adopt a two-phase channel estimation approach.In the proposed algorithms, the relay-destination channel is equivalent to a point-to-point MIMO OFDM channel estimation problem and the algorithms in e.g.[11]- [13] can be used to obtain the channel estimate at the destination.The main contribution of this paper lies in the estimation of the sourcerelay channel, which is conducted at the destination in the second phase of channel estimation.We consider the design of the source training matrix, relay precoder, and destination processor in order to estimate the source-relay channel at the destination.An iterative algorithm is firstly considered, before simplified approaches with reduced complexity are presented.
Notation: In our notation we denote scalars, vectors, and matrices by lower case normal font, lower case bold font, and upper case bold font respectively.The quantities I N and 0 N ×M denote the N × N identity matrix and N × M zero matrix.The element in the ith row and jth column of matrix A is denoted [ A] i j .The operators E{.}, tr{.}, (.) T , (.) H , {.} † , and .F denote expectation, trace, transpose, Hermitian transpose, pseudo inverse, and Frobenius norm, respectively.Matrix rank is denoted rank{.}and diag[{ A n } N n=1 ] produces a block diagonal matrix with the nth diagonal block given by A n .The operators min(.) and max(.)return the minimum and maximum, and we define [x] + max(x, 0).The floor operator .returns the maximum integer not exceeding the argument.The Kronecker product is denoted by ⊗ and the vectorisation operator is denoted vec [.].The notation A B signifies that A is positive semi-definite w.r.t.B.

A. OFDM Channel Model
We consider a half duplex two-hop non-regenerative MIMO relaying system where the source, relay, and destination devices are equipped with N s , N r , and N d antennas, respectively.The source-relay and relay-destination channels are considered to be frequency selective with L + 1 distinguishable delay paths between each transmit-receive antenna pair.We denote the lth MIMO taps of the source-relay and relay-destination channels by respectively, and assume each delay path to be spatially correlated on both the transmit and receive sides.We adopt the Kronecker spatial correlation model and can thus decompose the source-relay and relay-destination channel taps as [27], [28] H where ϒ s [l] ∈ C N r ×N r and s [l] ∈ C N s ×N s are the receive side and transmit side spatial correlation matrices, respectively, for the lth delay path of the source-relay channel.Similarly, are the spatial correlation matrices for the lth delay path of the relay-destination channel.We assume that different delay paths are uncorrelated and the elements of H s [l] and H r [l] are circularly symmetric complex Gaussian random variables drawn from CN(0, σ 2 h s [l]) and CN(0, σ 2 h r [l]), respectively.To deal with the frequency selective nature of the wireless channels, OFDM is employed with K subcarriers for both the source-relay and relay-destination transmission stages.Assuming that the OFDM cyclic prefix on each antenna branch is of length L, the frequency selective channels are decoupled into the K orthogonal narrowband subcarrier channel matrices where H s,k and H r,k are the kth subcarriers for the source-relay and relay-destination channels, respectively, and we define W kl e − j2π(k−1)l/K .Channel estimation can be conducted to either directly estimate the OFDM subcarrier channels in ( 3) and ( 4), or to estimate the underlying time domain channels characterised by ( 1) and (2).In this paper we develop channel estimation algorithms for the latter case since it generally requires fewer parameters to be estimated.
In our channel model we assume that both the source-relay and relay-destination channels remain constant over the channel estimation process.Furthermore, it is also assumed that the coherence time of both channels is long enough such that the subsequent transmission of data symbols may be conducted without the overhead spent on channel estimation and feedback becoming prohibitive and thereby limiting the system spectral efficiency.

B. Channel Estimation Procedure
Similar to the works in [18] and [21], we assume that the estimation of the source-relay and relay-destination channels is conducted in two training phases.In the first phase the source remains silent and the relay sends a known training sequence to the destination.The estimation of the relay-destination channel in this case is a standard point-to-point MIMO OFDM estimation problem and the algorithms developed in [11]- [13] can be used to estimate the channel at the destination.Since the estimation of the relay-destination channel is a standard MIMO OFDM channel estimation problem it shall not be discussed further.The main focus of this paper is on the estimation of the source-relay channel, which is obtained at the destination during the second phase of channel estimation.
The second phase of channel estimation is dedicated to estimating the source-relay channel matrices in (1).In this phase the source transmits a training sequence to the relay whilst the relay remains silent.For each OFDM subcarrier we can write the received signal r k ∈ C N r at the relay as where s k ∈ C N s is the training signal for the kth subcarrier, and v s,k ∈ C N r is an additive white Gaussian noise (AWGN) vector, which contains independent identically distributed (i.i.d.) zero mean complex Gaussian random variables, and has covari-

), and using vec[ AX B] = (B T ⊗ A)vec[X], we have
where for notational convenience we define In (8) we define L h s N s N r (L + 1) as being the number of source-relay channel coefficients that require to be estimated.Collecting the received signals from (6) into a single column vector defined as r vec[r 1 , . . ., r K ] ∈ C K N r , it can be straightforwardly shown that where is the collection of source-relay AWGN vectors over all subcarriers, and we also define the matrices The matrix S in (12) is the training matrix associated with the source-relay channel estimation problem, which requires to be optimised.It is worth noting here that if the relay device has the ability to perform channel estimation then, based on the received signal in (9), the estimate of h s can be computed at the relay using the point-to-point channel estimation algorithms in e.g.[12].However, a non-regenerative relay as considered here may be limited in its processing capability and it is desireable to keep the computational expense at the relay as low as possible.
Similar to the works in [18] and [21], we consider that the relay only performs a simple precode-and-forward operation and channel estimation is then performed at the destination.The relay device thus precodes the received signal in (9) to produce the transmit signal where G ∈ C K N r ×K N r is the linear relay precoder that operates over all subcarriers and also requires to be optimised.After performing the precoding operation the relay transmits the signal in (13) to the destination node, resulting in the signal y ∈ C K N d received at the destination being given by where v r ∈ C K N d is the relay-destination AWGN vector that contains i.i.d.zero mean complex Gaussian random variables and has covariance E{v r v H r } = σ 2 v r I K N d .In (14) the matrix H r ∈ C K N d ×K N r describes the relay-destination OFDM channel over all subcarriers and is defined as with the subcarrier channels H r,k being given in (4).
From the received signal in (14) the task is to compute an estimate of the source-relay channel taps given in (1).This is equivalent to the estimation of the channel vector h s in (8).In the following we shall denote the estimate of h s by with Ĥs [l] ∈ C N r ×N s signifying the estimate of the lth sourcerelay MIMO channel tap in (1).In order to facilitate the optimisation of the channel estimate in (16) we introduce a linear processor W ∈ C L hs ×K N d at the destination and let ĥs = W y. Using the received signal in ( 14) the source-relay channel estimate is then given by To obtain a useful channel estimate in (17) we require to optimise the destination processor W , the linear relay precoder G, and the source training matrix S (note that M is a function of S through (10)).Since both the source-relay and relay-destination channels are estimated at the destination device, it is convenient that the optimisation problem is solved at the destination.The source training matrix and relay precoder may then be communicated back to the source and relay, respectively, through low rate feedback channels.

III. PROBLEM FORMULATION
In this section we formulate the optimisation problem for deriving the source training matrix, the relay precoder and the destination processer to minimise the source-relay channel estimation MSE.We firstly denote the error between the estimated channel vector ĥs and the actual channel vector h s as e = ĥs − h s ∈ C L hs .Using (17) we can express the source-relay estimation error as The channel estimation MSE cost function is then given by (S, G, W ) = tr{E{ee H }}, where the expectation is taken w.r.t. the random noise compenents v s and v r , as well as the unknown channel vector h s .Using (18) we can expand (S, G, W ) as where we define as the source-relay channel covariance matrix.Unfortunately, we find from (19) that minimising (S, G, W ) can only be conducted if the relay-destination channel matrix H r in ( 15) is precisely known.
In order to formulate an analytically tractable optimisation problem we make the simplifying assumption that the estimate of the relay-destination OFDM channel is sufficiently accurate.This is a valid assumption when the signal to noise ratio (SNR) during the first phase of channel estimation is sufficiently high such that relay-destination channel estimation errors are small enough to be negligible [21].We therefore consider the cost function ˆ (S, G, W ) given by which is obtained from (19) simply by replacing H r with the corresponding channel estimate Ĥr .
As well as minimising the cost function ˆ (S, G, W ), the source and relay transmit powers should be constrained since these devices will have limited power budgets.The source transmit power constraint is given by tr SS H ≤ P s , (22) with P s being the power budget available to the source device.
The relay transmit signal in (13) should also abide by a power constraint.Since (13) depends on the unknown noise signal v s as well as the channel vector h s we shall enforce an expected relay transmit power constraint given by tr{E{r r H }} ≤ P r , where the expectation is taken w.r.t.v s and h s , and P r is the power budget available to the relay.The relay power constraint can thus be written as where T E{r r H } ∈ C K N r ×K N r and is given by We can now construct the optimisation problem for finding the source-relay channel estimate as min tr SS H ≤ P s (26) tr The optimisation problem in ( 25)-( 27) is non-convex and obtaining the optimal solution in closed form is intractable.
In the following sections we propose several algorithms for overcoming this problem.

IV. PROPOSED ITERATIVE ALGORITHM
It can be shown that with any two of the variables fixed the problem in ( 25)-( 27) is convex w.r.t. the remaining variable.In fact, the individual design problems for the source training matrix, the relay precoder, and the destination processor can be formulated as (possibly) constrained quadratic matrix problems with matrix variables.With this observation we therefore suggest an iterative algorithm to solve ( 25)-( 27) by sequentially updating each variable.Since each subproblem is a convex optimisation problem, iteratively updating each variable in a sequential fashion allows the algorithm to converge to (at least) a locally optimal solution.It is interesting to note that the proposed iterative algorithm is not too dissimilar in nature to iterative linear MMSE transceiver designs for MIMO relay systems (see [29]), and similar tools can be utilised to solve these problems.

A. Training Matrix Update
We focus firstly on solving (25)-( 27) for updating the source training matrix S when both G and W are fixed.To this end we note that ˆ (S, G, W ) can be equivalently written as Similarly, we can equivalently express the source and relay power constraints in ( 26) and ( 27) as where to obtain (30) we have used the definition of T in (24).Using ( 28)-( 30) we can rewrite the optimisation problem in ( 25)-( 27) for finding the training matrix S as We note that the contribution of the second term in (28) has been eliminated from the objective function in (31) since it is not a function of the training matrix S. Introducing an auxilliary variable t that satisfies we can solve ( 31)-( 33) from the following second order conic program (SOCP) [30] min The SOCP in ( 35)-( 38) is a standard convex optimisatin problem and therefore the optimal solution can be efficiently found using interior point algorithms [30].

B. Relay Precoder Update
We now focus on updating the relay precoder by solving (25)- (27) for G when both the source training matrix S and the destination processor W are fixed.Since the power constraint in (26) does not depend on G we can find the relay precoder by solving This optimisation problem is a quadratic matrix problem with a single constraint and is a standard convex optimisation problem.
It is worthwhile mentioning that the optimisation problem for the relay precoder in (39)-(40) can be reformulated as a SOCP and solved using interior point methods.However, since there is only one constraint, a simpler solution can be derived from the KKT conditions [30].The following KKT conditions are sufficient for optimality: where μ r is the Lagrangian multiplier associated with the relay power constraint in (40), or equivalently that in (42).Solving (41) results in the optimal relay precoder given by The Lagrangian multiplier μ r must now be calculated to ensure the power constraint in (40) is satisfied.Substituting (45) into (42) and (43) we require to find μ r that satisfies tr where for convenience we define The condition in (47) can be satisfied with either μ r = 0 or tr{( If μ r = 0 satisfies the condition in (46) then, since the left hand side of ( 46) is a monotonically decreasing function of μ r , it is the only solution to satisfy both (46) and (47).On the other hand, if μ r = 0 does not satisfy (46) then we must compute a positive μ r to satisfy both (46) and (47).To satisfy (46) and (47) in this case we require to compute μ r such that tr{( Ĥ H r W H W Ĥr + μ r I K N r ) −2 X} = P r .Based on the fact that the left hand side of this expression is a monotonically decreasing function of μ r , the method of bisection can be used to find μ r in this case.

C. Destination Processor Update
We finally focus on updating the destination processor by solving ( 25)- (27) for W when both the relay precoder G and the source training matrix S are fixed.Since the source and relay power constraints in ( 26) and ( 27) are independent of W , the destination processor can be found by solving the unconstrained optimisation problem From ˆ (S, G, W ) given in (21) it is straightforward to show that ˆ (S, G, W ) is convex in W .The optimal solution to the unconstrained problem in (49) can therefore be found by setting the derivative of ˆ (S, G, W ) w.r.t.W * to zero and solving the result for W .This leads to the optimal destination processor being given by

D. Summary of Iterative Algorithm
We now briefly summarise the proposed iterative algorithm.For the ith iteration, let us denote S i , G i , and W i , as the updates of the source training matrix, relay precoder, and destination processor, respectively.The main steps of the proposed iterative algorithm are shown in Algorithm 1: Algorithm 1: Iterative algorithm for source-relay channel estimation.
As highlighted in Algorithm 1, the variables S i , G i , and | falls below some threshold ∈ R + , or until a maximum number of prespecified iterations are reached.Since the design variables are found by solving convex subproblems, the channel estimation MSE can only decrease or maintain after each update and convergence is guaranteed.

V. PROPOSED SIMPLIFIED ALGORITHMS
As has been previously established, for any given S and G, the optimal destination processing matrix W can be found by solving the unconstrained problem in (49) and the solution is given by (50).Substituting (50) into (21), and using the matrix inversion lemma, we can express the source-relay channel estimation MSE as in (51), shown at the bottom of the page.To obtain the matrix E 1 in (51) we have also used the facts that The optimisation problem in (52)-( 54) is not jointly convex in the design variables S and G and thus the optimal solution is difficult to obtain.In the following we therefore focus on simplified approaches to solving (52)-( 54), which have reduced computational complexity compared to the iterative algorithm proposed in the previous section.

A. Optimal Relay Precoder Structure
We begin by deriving the optimal structure of the relay precoder G as the solution to (52)-(54).Since E 1 and the source power constraint in (53) are both independent of G, we can calculate the optimal relay precoder structure by solving min Theorem 1: The optimal structure of the relay precoder G as the solution to the problem in (55)-( 56) is given by where L ∈ C K N r ×L hs is a matrix yet to be determined.Proof: See Appendix A. We see that the optimal relay precoder given in (57) can be decomposed into two main components.Specifically, it consists of a transmit/precoding matrix L and a receiver matrix given by R h s (M H ⊗ I N r )T −1 .Interestingly, given the relay received signal in (9), it can be straightforwardly shown that this receiver matrix is in fact the optimal linear matrix for producing the MMSE channel estimate at the relay.The action of the relay precoder is therefore to firstly produce a MMSE sourcerelay channel estimate, before precoding this estimate by L and forwarding the result to the destination.
Substituting the relay precoder structure of (57) into tr{E 2 } in (51) and using the matrix inversion lemma we can write where for convenience we define For high SNR of the source-relay link, i.e. when we have , it is straightforward to show from (59) that N approaches R h s .In such a high SNR environment the source training matrix S does not impact tr{E 2 } in (58).The optimisation of S in this case can then be computed independently of the relay precoder and we can therefore approximate and decompose the original optimisation problem in (52)-(54) into two separate problems.Specifically, using the optimal relay precoder structure in (57) as well as the matrix E 1 defined in (51), we can approximate and decompose the problem (52)-(54) into the following source training matrix optimisation problem min and relay precoder matrix optimisation problem min The decomposition of the original problem in (52)-(54) using the high SNR approximation greatly simplifies the optimisation procedure since the training matrix S can firstly be found by solving (60)-( 61) and the matrix L can subsequently be found by solving (62)-(63).Unlike the algorithm proposed in Section IV there is no need to iteratively update the variables in this case, which results in the proposed simplified algorithms having substantially reduced computational complexity compared to the iterative algorithm.

B. Proposed Simplified Algorithm 1
The source training matrix optimisation problem in (60)-( 61) is equivalent to the point-to-point MIMO OFDM channel estimation problem considered in [12] and the optimal solution to (60)-(61) can be found using similar arguments to those made in [12].Specifically, the optimal structure of the source training matrix is given by where S ∈ C N s ×N s is a matrix yet to be determined and Q ∈ C K ×N s is a semi-unitary matrix that satisfies where F l were defined in (11).The semi-unitary matrix Q that satisfies the properties in ( 65)-(67) can be constructed as follows: Partition the matrix Q as Q = [q 1 , . . ., q N s ], where q i ∈ C K denotes the ith column of Q.Let the first column of Q be given by q 1 = √ 1/K 1 K , where 1 K is a K dimensional column vector with all elements equal to one, which satisfies q 1 = 1.The kth element of the remaining columns of Q can then be constructed as q i k = e − j2π K /N r (i−1)(k−1)/K .Substituting ( 20) and ( 64) into (60)-( 61), then after introducing the variable change the optimisation problem in (60)-( 61) is equivalent to the following problem in To obtain the objective function in (69) we have utilised the fact that the matrix Q in (64) satisfies the conditions in (65)-(67), as well as the properties then by utilising the Schur complement lemma the problem in (69)-(71) can be solved by the following SDP [30] min where 76) is a standard convex optimisation problem and thus the optimal solution can be efficiently found using interior point algorithms [30].After solving (73)-(76) for Ŝ, the matrix S can be computed from (68) as S = Ŝ1/2 , and finally the optimal training matrix S is given by (64).
Having solved (60)-(61) for the optimal training matrix S, we now turn our attention to deriving the optimal matrix L as the solution to the optimisation problem in (62)-(63).We firstly restate the problem in (62)-(63) as Let us now consider the singular value decomposition (SVD) of Ĥr and eigenvalue decomposition (EVD) of N given by where is a K N d × K N r diagonal matrix containing the non-zero singular values {δ i } R r i=1 ∈ R ++ , and is an L h s × L h s diagonal matrix that contains the non-zero eigenvalues Here we define R r rank{ Ĥr } and R n rank{N}, and assume w.l.o.g. that the diagonal entries in and are arranged in descending order.Theorem 2: The optimal relay precoding matrix L as the solution to (77)-( 78) is given by where is a K N r × L h s diagonal matrix with non-negative diagonal elements {φ i } R i=1 ∈ R + .Here we define the variable R min(R r , R n ).
Proof: See Appendix B. With the matrix L given in (81) the optimal relay precoder is finally given from (57) by Substituting ( 79)-( 81) into ( 62)-( 63), the original matrix valued optimisation problem reduces to the scalar problem min This problem has a standard waterfilling solution, which can be obtained from the KKT conditions [30], and is given by Substituting ( 86) into (84) we now require to calculate the waterlevel μ r to satisfy the non-linear equation which can be obtained using the waterfilling algorithm in [31].We now briefly summarise the main steps for computing the source-relay channel estimate using the simplified algorithm.The matrix Ŝ is firstly computed by solving the SDP (73)-(76) from which we then obtain S = Ŝ1/2 .The source training matrix S is then given by (64).Having calculated S, we then compute the relay precoding matrix G from (82), where the elements of are computed according to (86).Finally, the destination processor W is given by (50) and the source-relay channel estimate is obtained from (17).

C. Proposed Simplified Algorithm 2
Although the previously discussed simplified algorithm allows the source training matrix to be computed independently of the relay precoder, and is therefore more computationally efficient than the iterative algorithm in Section IV, it requires solving the SDP in (73)-(76).The channel estimation algorithm can be simplified further by deriving a suboptimal solution to the training matrix design problem in (60)-(61).
Substituting (64) into ( 60)-( 61) the training matrix design problem can be stated as where . We now consider a suboptimal solution to (88)-( 89) that can be obtained in closed form.To this end it is shown in Appendix C that an upper bound to the training matrix objective function in (88) is where we define the matrices Replacing the objective function in (88) with the upper bound in (90), we can obtain a suboptimal solution to the optimisation problem in (88)-( 89) by solving min Before solving (93)-( 94), let us introduce the EVD where is an N s × N s diagonal matrix that contains the positive eigenvalues {ξ i } N s i=1 ∈ R ++ .Theorem 3: The optimal solution to the optimisation problem in (93)-( 94) is given by where is a diagonal matrix of dimension N s × N s with nonnegative diagonal entries Substituting (96) into (64) the suboptimal structure for the training matrix S as the solution to the original problem in (60)-( 61) is given by We would like to mention that this result coincides with the suboptimal training matrix solution derived in [12], although a different method of derivation was employed in [12].Substituting ( 96) and ( 95) into ( 93)-( 94) we can restate the matrix valued optimisation problem as the following scalar problem min The optimal solution to (98)-( 100) is still difficult to obtain in closed form and we therefore consider a suboptimal solution to this problem.It is straightforward to show that the objective function in ( 98) is concave in υ j .We can therefore obtain an upper bound to the objective function by using Jensen's inequality, and a suboptimal solution to ( 98)-( 100) can be obtained by minimising this upper bound.This suboptimal solution to ( 98)-( 100) can be found by solving min The problem ( 101)-( 103) can be solved from the KKT conditions of optimality [30] and is given by where μ s is the KKT multiplier required to be computed to satisfy the constraint in (102) and can be found using the waterfilling algorithm in [31].In summary, the proposed simplified source-relay channel estimate can be obtained as follows.The source training matrix S is firstly given by (97), after which the relay precoder G can then be computed according to (82).Finally, the destination processor W is calculated from (50) and the source-relay channel estimate is then obtained from (17).

A. Simulation Setup
We consider a two-hop MIMO OFDM relaying system equipped with N s , N r , and N d antennas at the source, relay, and destination devices.The frequency selective paths between each transmit and receive antenna pair are considered to be of length L = 4 with the lth MIMO channel taps being modelled according to (1) and ( 2).The transmit spatial correlation matrices s [l] and r [l], and the receive spatial correlation matrices ϒ s [l] and ϒ r [l], are modelled using the exponential model (see e.g.[32], [33]) and are given by  2) are drawn from i.i.d.complex Gaussian distributions with zero mean and variances σ 2 h s [l] and σ 2 h r [l], and in all simulations we set σ 2 h s [l] = σ 2 h r [l] = 1/(L + 1), ∀l.We consider an OFDM system that utilises K = 32 subcarriers.For channel estimation, the relay-destination channel is estimated in the first phase where the SNR of the relaydestination channel during the first phase of channel estimation is SNR r = Pr /(K σ 2 v r ), where Pr is the power afforded to the relay for estimation in the first phase.The source-relay channel is estimated during the second phase.During the second phase of channel estimation, the SNRs of the source-relay and relay-destination channels are given by SNR s = P s /(K σ 2 v s ) and SNR r = P r /(K σ 2 v r ), respectively.

B. Performance of the Proposed Algorithms
In our first set of simulation examples we assume a system with N s = N r = N d = 3 and that the relay-destination channel estimate obtained during the first phase of channel estimation is accurate such that Ĥr = H r can be assumed.This is a practical assumption when the relay-destination SNR during the first training phase is sufficiently high.
The convergence of the iterative algorithm proposed in Section IV is firstly investigated when the source training matrix and relay precoder are initialised as random matrices scaled to satisfy the power constraints, as well as when it is initialised using the solutions for the simplified algorithms presented in Section V-B and Section V-C. Figure 1 shows the convergence of the algorithm for SNR s = {5, 10, 30} dB, SNR r = 20 dB, and It can be seen that the algorithm converges after around 5-6 iterations when initialised with random matrices.Interestingly, when the iterative algorithm is initialised using the simplified solutions presented in sections V-B and V-C, only a very small performance improvement can be seen from the initial estimates.This result suggests that the simplified solutions must be close to a local optima (if not the global optima).It is evident that a judicious initialisation of the iterative algorithm is to initialise it using one of the proposed simplified solutions.In all simulation examples henceforth, the iterative algorithm is initialised using the proposed simplified algorithm in Section V-B.Furthermore, in all subsequent simulations, the iterative algorithm is terminated after the difference in MSE betweeen consecutive iterations falls below a threshold of 1 × 10 −6 or a maximum of 5 iterations is reached.
We now compare the performance of the proposed solutions to various benchmark LS and MMSE source-relay channel estimation algorithms.The benchmark LS algorithms utilise the optimal least squares destination processor W = (H r G(M ⊗ I N r )) † whilst the MMSE benchmark algorithms employ the optimal MMSE destination processor in (50).In these algorithms the relay precoder is selected as a naive amplify forward (NAF) matrix given by G = α I K N r , where α = } ensures the relay power constraint is met with equality.The benchmark algorithms either utilise random training symbols (RTS) or an equal power allocation (EPA) matrix.The RTS matrix S contains randomly generated QPSK symbols, whereas the EPA matrix is given by S = √ tr{P s /N s } Q where Q is the semiunitary matrix satisfying (65)-(67).The proposed algorithms are also compared to the optimal point-to-point MIMO OFDM channel estimation algorithm developed in [12].It is important to note that the utilisation of this algorithm assumes the sourcerelay channel can be estimated at the relay device.In cases where it is practical to perform channel estimation at the relay, the algorithm in [12] provides the optimal solution.However, for cases where the processing cost and power consumption at the relay should be minimised, the proposed algorithms are more appropriate.
Figure 2 shows the results of the proposed and benchmark algorithms against varying SNR s with the spatial correlation coefficients ρ s It is observed that the proposed algorithms show improved performance compared to the benchmark LS NAF and MMSE NAF designs across the whole SNR region.Interestingly, the proposed simplified solutions have very similar performance to the proposed iterative algorithm, which is guaranteed to converge to at least a locally optimal solution.The close performance of the simplified solutions to the iterative approach (which was initialised using Simplified Algorithm 1) suggests that the simplified solutions are close to a locally optimal (if not the globally optimal) solution.We also note that, in the high SNR s region, the proposed algorithms suffer a slight loss in performance compared to the optimal MMSE algorithm in [12].This is due to the fact that proposed algorithms suffer from noise added by the relay-destination channel whereas the algorithm in [12] is not affected by such noise.
Figure 3 show the performance of the various algorithms, with all simulation parameters set as in the previous algorithm but with the relay-destination spatial correlation co-efficients now set as ρ r [l] = r [l] = 0.8, ∀l.It is observed that the performance of the proposed algorithms, as well as the LS NAF and MMSE NAF algorithms suffer a loss in performance when the relay-destination channel is highly correlated.The algorithm in [12] does not suffer such a performance since as previously noted it is independent of any relay-destination channel parameters.Figure 4 shows the performance of the different algorithms with the source-relay channel spatial coefficients now increased to ρ s [l] = s [l] = 0.8, ∀l, and the relay-destination channel spatial co-efficients set as ρ r [l] = r [l] = 0.2, ∀l.Comparing the results in Figure 4 to those in Figure 2 it is seen that the proposed algorithms have improved MSE when the source-relay channel is highly correlated.The effect that the relay-destination channel noise has on the performance of the proposed algorithms is highlighted in Figure 5, where the channel estimation MSE is assessed for different SNR r values.It is clear the all proposed algorithms have poorer performance for lower SNR of the relay-destination link, whereas if the point-to-point estimation algorithm of [12] is used to estimate the source-relay channel at the relay device then the resulting channel estimate is independent of SNR r .
Figure 6 demonstrates the performance of the proposed source-relay channel estimation algorithms for various antenna configurations.Specifically, the performance of the proposed algorithms are assessed with N s = N d = {1, 2, 4} and with N r = {3, 6}.It is observed that, for all source and destination antenna configurations, the performance of the proposed algorithms decreases with an increasing number of relay antennas.This results due to the fact that an increased number of relay antennas results in an increased number of channel co-efficients to estimate and a poorer channel estimation MSE consequently results.We now investigate the performance of the proposed algorithms when the relay-destination channel is estimated imperfectly during the first phase of channel estimation i.e. when Ĥr = H r .We consider that the relay-destination channel is estimated using the optimal algorithm in [12], with the SNR during this channel estimation phase being SNR r .The quality of the relay-destination channel is obviously highly dependent on SNR r , with low SNR r signifying a poor relay-destination channel estimate and vice versa.As has been noted in the previous sections, the performance of the proposed source-relay channel estimation algorithms are dependent on the quality of the relay-destination channel estimate.
Figure 7 shows the performance of the proposed algorithms compared to that achieved by estimating the source-relay channel at the relay using the optimal MMSE algorithm in [12].Results are shown for a system with N s = N r = N d = 3 and with the spatial correlation co-efficients set as ρ s The effect of any relay-destination channel estimation error from the first phase of channel estimation is highlighted by different SNR r values.It can clearly be seen from these results that the proposed designs (as indeed do any designs where the source-relay channel is estimated at the destination) suffer a degradation in performance when the relaydestination channel is not sufficiently accurate.Estimation of the source-relay channel at the relay node does not suffer this drawback since it does not require knowledge of the relaydestination channel.However, it is again stressed that channel estimation at the relay may be impractical in many cases since the relay device may be a low cost, low power unit, and may not be capable of performing the task of channel estimation.

VII. CONCLUSIONS
In this paper we have studied the problem of channel estimation of spatially correlated frequency selective channels in dual hop MIMO OFDM relay networks.The estimation of the source-relay and relay-destination channels was conducted in two phases.In the first phase the relay-destination channel was estimated using convention point-to-point MIMO OFDM techniques.The source-relay channel was then estimated at the destination in the second phase of channel estimation.
To obtain the MMSE source-relay channel estimate, algorithms were developed to design the source training matrix, the relay precoder, and the destination processor.An iterative algorithm, which has guaranteed convergence, was firstly proposed where each variable was iteratively updated through convex programming.Due to its iterative nature, the proposed iterative algorithm would be too computationally expensive for practical implementation.Two suboptimal algorithms were therefore derived using a high SNR approximation.The suboptimal algorithms have comparable performance to the iterative algorithm but at reduced computational cost, making them more suitable for practical implementation.
Simulation results demonstrated that the proposed algorithms could achieve a better source-relay channel estimate than various benchmarks that also estimated the source-relay channel at the destination.The proposed algorithms were also compared to the optimal point-to-point MIMO OFDM channel estimation design where it was assumed that the source-relay channel could be estimated at the relay device (instead of at the destination).It was shown that if the relay-destination channel was sufficiently accurate and the relay-destination SNR was also sufficiently high, then the proposed algorithms had similar performance to this optimal algorithm.However, the performance of the proposed algorithms was degraded with poorer quality relay-destination channel estimates and/or lower relaydestination channel noise.It is concluded that if the relay device has the capability of performing channel estimation then it can achieve better performance than estimating the channel at the destination.However, when the relay is unable to perform this task (e.g due to computational cost and power constraints) then the proposed algorithms should be used to estimate the source-relay channel at the destination.

APPENDIX A
In this appendix we prove the optimal relay precoder structure given in (57) of Theorem 1.To do so we shall require the following lemma: Lemma 1: [34] For Hermitian positive semi-definite matrices A ∈ C N ×N and B ∈ C N ×N , with eigenvalues {λ a,i } N i=1 ∈ R + and {λ b,i } N i=1 ∈ R + arranged in descending order, we have the inequality where equality holds when A is diagonal with diagonal elements arranged in descending order and B is diagonal with elements arranged in ascending order.
To derive the optimal relay precoder let us introduce the following singular value decompositions (SVD's) where , , and are K N r × L h s , K N d × K N r , and K N d × K N r diagonal matrices, respectively, and contain the positive singular values Here we define R y rank{Y }, R z rank{Z}, and R r rank{ Ĥr }.The singular values in , , and are assumed w.l.o.g. to be arranged in descending order.Substituting (108) into (107) and solving the resulting equation for G we can parameterise the relay precoder as We wish to find the specific relay precoder from the parameterised set (109) that minimises the objective function in (55) and satisfies the power constraint in (56).We focus firstly on identifying the relay precoder structure that minimises (55).To this end we note that by substituting (106) and (107) into tr{E 2 } in (51), we can write where the lower bound in (112) is obtained by applying Lemma 1 to (111).It is clear that the lower bound in (112) holds with equality when V z = U y .We therefore find that any precoder given by (109) achieves the lower bound in (112), and therefore minimises (55), provided that V z = U y .To identify a unique precoder we shall select the one that consumes the least transmit power.Substituting (109) into tr{GT G H } the power consumed by the relay is where a i [U H z X −1 U z ] ii ∈ R + and {λ z,i } N i=1 ∈ R + are the eigenvalues in z .We note here that the matrices Z and U H z X −1 U z are Hermitian positive semi-definite and we therefore have {a i } N i=1 ≥ 0 and {λ z,i } N i=1 ≥ 0. With these observations it is straightforward to show from (134) that where the lower bound is obtained by applying Lemma 1 from Appendix A to (143).We note that (143) is invariant to the unitary matrix U s and that the lower bound in (144) holds with equality when V s = V ¯ s .Therefore, matrices given by (141) minimise the objective function in (93) when V s = V ¯ s .We now note that by substituting (95) into (141), and the resulting equation into tr{ S SH }, we have where the lower bound results from the use of Lemma 1 and holds with equality when V s = V ¯ s .Since both (143) and (145) are invariant to U s we can w.l.o.g.select U s = I N s .We therefore find that, by substituting U s = I N s and V s = V ¯ s into (141), the optimal structure of S is which, after defining the diagonal matrix −1/2 , concludes the proof of Theorem 3.
, and [ϒ r [l]] mn = r [l] |m−n| .where the correlation co-efficients ρ s [l], ρ r [l], s [l], and r [l], define the level of spatial correlation.The elements of H sw [l] and H r w [l] in (1) and (

1 = 1 × 1 =
the inequalities in (137) and (139) yields the desired result in (90) after multiplication by L + 1.APPENDIX DIn this appendix we prove the structure of S given in (96) of Theorem 3. Let us firstly introduce the SVDS S ¯ 1/2 s = U s V H s , (140)where is a square N s × N s diagonal matrix and contains the non-negative singular values {π i } N s i=1 ∈ R + .Substituting (95) into (140) and solving for S we find the general family of matrices S given by S = U s V H s that, through some straightforward deductions, we can write the objective function in (93) astr ¯ −1 s ⊗ Ῡ−1 s + σ −2 v s (L + 1) −1 SH S ⊗ I N r −tr ¯ s ⊗ Ῡs I L hs + σ −2 v s (L + 1) −where we have used the fact that, for matrices of commensurate dimensions,( A ⊗ B)(C ⊗ D) = ( AC ⊗ B D). Substituting (95) and (140) into the right hand side of (142), and again making use of the previous Kronecker product rule, we can write tr ¯ −1 s ⊗ Ῡ−1 s + σ −2 v s (L + 1) −1 SH S ⊗ I N r −tr ⊗ Ῡs I L hs + σ −2 v s (L + 1) −1 for matrices of commensurate dimensions.We can now formulate the joint training matrix and relay precoder design problem as min which proves the convexity of f (t).Since f (t) is convex w.r.t.t we find that q( A) in (127) is concave w.r.t. A. was defined in(91).With this result it can then be shown from (137) that s [l] s [l] ⊗ ϒ s [l].(138)From the definition of Ῡs in (92) it is straightforward to see that Ῡs ϒ s [l], ∀l, and consequently from (138) we have Rh s ¯ s ⊗ Ῡs (and equivalently R−1 s ), where ¯ s