Semi-Blind Receivers for Multi-User Massive MIMO Relay Systems Based on Block Tucker2-PARAFAC Tensor Model

Massive multiple-input multiple-output (MIMO) relay can significantly improve the capacity and throughput of wireless networks, thus has been a sought-after technique for future communication systems. However, the development of massive MIMO relay systems faces several major challenges. For example, the knowledge of instantaneous channel state information (CSI) is needed to estimate signals and optimize systems. Traditional estimation schemes need to transmit pilot sequences, which occupy the spectrum resources. In this paper, we propose a tensor-based method for joint signal and channel estimation for multi-user massive MIMO relay systems without using pilot sequences, and develop two tensor-based semi-blind receivers. Through multidimensional signaling scheme, the signals received by each user are formulated as the block Tucker2-PARAFAC (TP) tensor model. Then, two semi-blind receivers are proposed to jointly estimate the information signals and channel matrices. One is based on the tensor-based closed-form receiver, the other is based on the tensor-based iterative receiver. The proposed closed-form approach can also be used to initialize the iterative receiver for improving the convergence speed. In particular, the proposed schemes are practicable for both time division duplexing (TDD) and frequency division duplexing (FDD) modes. Uniqueness, identifiability and complexity are analyzed for our receivers. Compared with existing receivers, our receivers offer superior bit error rate (BER) and normalized mean square error (NMSE) performance. Numerical examples are shown to demonstrate the effectiveness of the proposed tensor-based receivers.


I. INTRODUCTION
Multiple-input multiple-output (MIMO) [1], [2] technique makes full use of the diversity gain and multiplexing gain of communication systems and improves the flexibility of transmission waveform design. It is one of the most important techniques in the 4th-generation (4G) wireless communication system. MIMO technique can also extend the spatial domain within a limited time-frequency resource using spatial multipath propagation characteristics. Due to this advantage, the spectral efficiency and channel capacity of The associate editor coordinating the review of this manuscript and approving it for publication was Luyu Zhao . communication systems have been significantly improved. With the coming of the next generation wireless communication, the number of users and the rate for information transmission have increased significantly, which makes the further expansion of spatial domain more urgent. Massive MIMO [3], [4] technique can meet the needs for future wireless communication services and effectively improve link reliability and data transmission rate, thus has been a sought-after technique for the 5th-generation (5G) communication system.
Massive MIMO channel vectors are asymptotically orthogonal in pairwise due to very large antenna arrays used at the base station. Therefore, the intra-cellular interference, small-scale fading and uncorrelated noise can be restrained by using simple linear processing methods [5]. For instance, zero-forcing (ZF) and maximum-ratio (MR) processing. The only factor limiting system performance is the inter-cell pilot contamination [6] generated by reusing pilot sequences, which can be solved by some existed techniques [7], [8]. In addition, the large-scale antenna array can concentrate the energy in a much narrower beam, thereby enhancing the received signals strength and the robustness of massive MIMO systems. Due to these advantages, massive MIMO has gained significant popularity in recent years. Cooperative communication [9]- [11] technique makes full use of the broadcast characteristics of wireless signals. It not only reduces the power consumption and saves bandwidth, but also improves the spectral efficiency, capacity and throughput of wireless networks without increasing the complexity of the system. In addition, multiple users exchange information through an intermediate relay, which can reduce the impact of path loss. Therefore, the combination of MIMO or massive MIMO technique and cooperative communication can further improve the performance of communication systems. It has become increasingly popular [12]- [14].
In wireless communication systems, tensor modeling [15]- [19] benefits from multiple diversity. This feature is conductive to achieving multi-user signal separation/equalization and channel estimation under the identifiability conditions which are more relaxed than that based on the traditional matrix method. It is quite mature to solve the joint estimation problem in MIMO cooperative communication systems using tensor-based approaches. A PARAFAC [17]- [19] based receiver is developed in [20] for joint symbol and channel estimation in two-hop amplifyand-forward (AF) MIMO relay communication systems. It makes use of multiple Khatri-Rao space-time (KRST) [21] coding schemes. And then, the authors formulate the received signals at the destination node as two PARAFAC models. While in [22], two independent KRST coding schemes are adopted at the source and relay nodes, which leads to a nested PARAFAC model for the signals received at destination. The work [23] considers source-destination and sourcerelay-destination links. By employing the linear constellation precoding and time-domain Vandermonde spreading for transmitted symbols, the received signals are also formulated as a nested PARAFAC model. In addition, additional cooperative diversity considered in [23] further improves the bit error rate (BER) performance. The work [24] considers a two-way AF MIMO relay system. It employs a third-order tensor space-time coding (TSTC) [25] scheme for transmitted signals at the source and relay nodes. The received signals at the relay are then modeled as a block Tucker2 model and that at the sources are modeled as a Tucker2 model [16], [17]. Two semi-blind receivers based on this tensor modeling are proposed for the channels with and without reciprocity, respectively. In addition, the two receivers avoid sending training sequences. The work [26] presents a robust semi-blind receiver based on the Tucker2 model for joint symbol and channel estimation. Notably, the semi-blind receiver in [26] can be developed into the multi-user massive MIMO system.
There are many tensor-based joint channel parameter estimation algorithms that have been successfully applied to massive MIMO systems. In [27], the PARAFAC decomposition is developed to jointly estimate channel parameters for multi-cell massive MIMO systems. Its performance comes near to the Cramér-Rao bound (CRB). The work [28] applies the PARAFAC model to millimeter wave (mmWave) massive MIMO systems, which achieves joint channel estimation of multiple users. Based on [28], PARAFAC decomposition is extended to mmWave MIMO orthogonal frequency division multiplexing (MIMO-OFDM) systems in [29] for multi-parameters estimation, and the associated CRB is also derived in order to show the performance conveniently.
Recently, the research on massive MIMO cooperative communication systems has become more and more popular due to their excellent performance. The works [30]- [32] investigate the performance of massive MIMO relay networks in different system models. In [33] and [34], some power allocation schemes are presented for massive MIMO relay systems to maximize the sum spectral efficiency. The ergodic rate has also been analyzed for massive MIMO relay systems such as in [13]. For the aforementioned massive MIMO relay communication systems, the channel state information (CSI) is needed to recover the transmitted information and carry out the optimization process. However, the CSI is unknown and required to be estimated in practical massive MIMO relay communication systems. Traditional estimation schemes use the pilot observation. But due to the fact that the number of pilot sequences increases as the number of transmit antennas increases, overhead of pilot sequences is overwhelming in massive MIMO relay systems.
As far as we know, tensor-based joint signal and channel estimation algorithms for massive MIMO cooperative communication systems are scarce. This paper keeps up with these current research hotspots and focuses on the tensor-based method for joint signal and channel estimation in massive MIMO relay systems. In cooperative communication systems, two-way relay has been widely studied, in which two users exchange their message with the assistance of the shared relay to reduce the impact of path loss. Multi-way relay [35], [36] is the natural generalization of two-way relay, in which many geographically separated users communicate with each other through an intermediate relay at the same time-frequency resource. Multi-way relay networks outperform two-way or one-way relay networks in terms of spectral efficiency because of the multiplexing gain, thus have attracted widespread attention in recent years. In this paper, we consider a multi-user (or multi-way) massive MIMO relay system. The relay node equipped with a large number of antennas serves multiple users at the same time, each of which is equipped with multiple antennas. The transmitted symbols are encoded by a three-dimensional coding matrix at the relevant user. The relay then retransmits signals to users following the AF relay protocol. We firstly formulate the signals received by users as the block Tucker2-PARAFAC (TP) model. Then, closed-form and iterative receivers using singular value decomposition (SVD) and alternating least squares (ALS) algorithms, respectively, are used to jointly estimate information signals and channel matrices. The main contributions of this paper are summarized as follows: 1) In our proposed TP model, the number of data streams sent by each user is not limited to be equal to that of transmit antennas. Moreover, the data streams can be allocated to any set of transmit antennas. Even if the number of data streams is greater than that of user's antennas, the proposed receivers still have good estimation performance. 2) Assuming that CSI is unknown at any node, the two proposed tensor-based semi-blind receivers can effectively estimate signals and channels without transmitting pilot signals, thus have higher spectral efficiency. Besides, both BER and normalized mean square error (NMSE) performance of the two receivers are superior to existing pilot-assisted receivers, and the BER performance of them is close to the ZF receiver with perfect CSI. 3) Unlike the coding scheme based on the PARAFAC model, our proposed receivers utilize the Tucker2-based coding at each user node. Only one priori symbol needs to be known for eliminating the scaling ambiguity to recover the information symbols. Therefore, the two proposed receivers have higher spectral efficiency than PARAFAC-based receivers. 4) The proposed closed-form SVD receiver shows a faster estimation speed, while the ALS receiver shows better estimation performance. Consequently, we can flexibly choose an appropriate receiver to jointly estimate signals and channels according to specific performance requirements.
The remainder of this paper is organized as follows. In Section II, we introduce the multi-user massive MIMO relay system model in detail. In addition, the TP model for received signals is also deduced in this section. The proposed closed-form and iterative receivers are shown in Section III. In Section IV, uniqueness issues, identifiability conditions and the computational complexity of these two receivers are discussed. Simulation results are demonstrated in Section V, and conclusions are drawn in Section VI. Notation: Scalars, vectors, matrices, and tensors are denoted by letters (a, b, · · · , A, B, · · ·), boldface lower-case letters (a, b, · · ·), boldface capitals (A, B, · · ·), and underlined boldface capitals A − , B − , · · · , respectively. A T , A * , A H , A † , A i· , A ·j , and A F represent transpose, conjugate, conjugate transpose, Moore-Penrose pseudo-inverse, the i-th row, the j-th column, and the Frobenius norm of the matrix A ∈ C I ×J , respectively. D i (A) and diag (a) stand for diagonal matrices formed by A i· and a, respectively. vec (A) denotes the vector obtained by stacking A ·j for j = 1, . . . , J , while unvec (·) denotes the inverse operation of vectorization. bd [B 1 , . . . , B K ] is a KM × KN block diagonal matrix, in which B k ∈ C M ×N for all k = 1, . . . , K . The identity matrix of N × N and the all-ones column vector of N × 1 are represented by I N and 1 N , respectively. The ceiling function x gives the smallest integer not less than x.
The Kronecker product and the Khatri-Rao product (or column-wise Kronecker product) are denoted by notations ⊗ and , respectively. The Khatri-Rao product of A ∈ C I ×N and B ∈ C J ×N is . . .
where A B ∈ C IJ ×N . |⊗| represents partial Kronecker product. ForÃ ∈ C I ×JK andB ∈ C M ×NK , we havẽ whereÃ In addition, the Kronecker product has the following property:

II. SYSTEM MODEL
We consider a multi-user massive MIMO relay system as illustrated in Fig. 1, which consists of Q users and one relay node. Each user is configured with M S antennas and the relay node with M R antennas. All nodes in this cooperative communication system are supposed to operate in the half-duplex mode, i.e., simultaneous transmission and reception of signals cannot be achieved. It is also assumed that due to the effects such as shadowing and multipath fading, there is no direct-path link between each user. So the information can only be exchanged with the assistance of the relay. The purpose of each user is to estimate the information signals sent by other users and the channel matrices. All channels in this paper are considered to follow independent and identically distributed (i.i.d.) Rayleigh fading [30]- [34]. The channel matrix from the user q to the relay is denoted by H The entire transmission protocol is divided into P transmission sub-processes and each of them is further divided into two phases: the multiple access (MAC) phase and the broadcast (BC) phase. During the MAC phase, all Q users transmit their signals to the relay node simultaneously. In the BC phase, the relay amplifies the received signals and broadcasts them to all Q users. S (q) ∈ C N ×R denotes the information symbol matrix transmitted by the user q, where R is the number of independent signal streams and each of them consists of N information symbols.
For the MAC phase of p-th transmission sub-process with p = 1, . . . , P, the user q first encodes S (q) with a three-dimensional coding matrix C (q) ··p ∈ C M S ×R and then sends them to the relay node via the channel H (q) SR . All signals received by the relay node are stored in the third-order tensor where S = S (1) , . . . , S (Q) ∈ C N ×QR is the information symbol matrix containing signals transmitted by all Q users, complex Gaussian noise with zero mean and unit variance at the relay node.
In the BC phase of p-th transmission sub-process, the relay node amplifies the received signals from all users with K different AF matrices and then broadcasts them to Q users. The received signals at the l-th user for the k-th time-block is given by where G ∈ C K ×M R is the amplification matrix containing the AF factors from all K time blocks, V Gaussian noise with zero mean and unit variance at the user l and l = 1, . . . , Q. Stacking signals of K time blocks received by l-th user and using (1), we have where p is the combined noise originating from the MAC phase of p-th transmission sub-process. V (l) p is the compound noise composed of noise from all K time blocks, which is generated by the BC phase of p-th transmission sub-process.
By defining and using (2), X (l) 1 can be expressed as (10), as shown at the bottom of the next page, where H (l) is the combined channel matrix, and H is the compound channel matrix consisting in channels from all users to the relay node. Our proposed receivers firstly estimate H (l) , and then estimate H (l)

RS and H (q)
SR for all l, q = 1, . . . , Q. From (6), we have By defining the following expression is obtained: where F (q) Moreover, by applying the operator vec (·) to Y (l) p and using the property (3), we have where We also define the following matrix unfolding and obtain where F (q) Equations (10), (18) and (24) correspond to the TP model. Joint estimation of symbol and channel matrices can be achieved by the essential uniqueness of block Tucker2 and PARAFAC decompositions. In this work, it is assumed that the amplification coefficient matrix and all three-dimensional coding matrices are chosen beforehand and are known at the users, i.e., G, F 1 , F 2 , and F 3 are chosen beforehand and are known.

III. TENSOR-BASED RECEIVERS
Based on the TP model introduced in the previous section, two following tensor-based receivers are proposed. The first one is closed-form solution, which consists in TP model based on the closed-form SVD (TP-SVD) algorithm. The second receiver is iterative solution, which consists in TP model based on the ALS (TP-ALS) algorithm. Table 1 and Table 2 summarize the joint signal and channel estimation processes of the proposed TP-SVD and TP-ALS receivers, respectively.

A. TP-SVD RECEIVER
For the TP-SVD receiver, the closed-form Tucker2-SVD (T-SVD) method for (24) is firstly used to jointly estimate the information symbol matrix S and the combined channel H (l) . After that, the closed-form PARAFAC-SVD (P-SVD) approach will be exploited to estimate the channels

H (l)
RS from the relay node to the l-th user and H (q) SR from the q-th user to the relay node for all l, q = 1, . . . , Q.

1) T-SVD ALGORITHM
According to (24), if F 3 is full row-rank, then whereŜ andĤ (l) stand for the estimation of S and H (l) , respectively. For the convenience of presentation, the following symbolic representations are defined: ThusĤ (l) and Z (l) can also be rewritten aŝ Each sub-block T (q,l) of Z (l) can be solved by the closed-form T-SVD algorithm to estimate S (q) and H (q,l) for q = 1, . . . , Q.
By rearranging the elements in T (q,l) , the following vector as (33), shown at the bottom of this page, can be obtained, n,r =ŝ n,rĤ (q,l) ∈ C K M S ×M S . Applying the unvec (·) operator to (q,l) , the following rank-1 matrix as (34), shown at the bottom of this page, can be obtained, where unvec (q,l) ∈ C K M S 2 ×NR . Due to the norm equivalent property as (35), shown at the bottom of this page, then unvec (q,l) instead of T (q,l) is used for estimation.
Applying SVD to the rank-1 matrix unvec (q,l) , (34) becomes are the first column of U (q,l) and V (q,l) , respectively. Since the scaling ambiguity exists, we have vec , where α q,l is the scalar factor. The detailed explanation of this scaling ambiguity is stated in Section IV from (57) to (61). In order to eliminate this ambiguity, we can set s (q) 1,1 = 1 in practical communication systems and further obtain α q,l = 1 v (q,l) * 1,1 . Therefore, vec

2) P-SVD ALGORITHM
According to the property of the PARAFAC decomposition [17]- [19], (11) is one of the compact forms of the PARAFAC model. The corresponding scalar form can be written as Since the estimated combined channel matrixĤ (l) is obtained by the above T-SVD algorithm, H RS and H can be further estimated by the closed-form P-SVD algorithm from (40).
Assuming that G is full column-rank, then Ĥ (l) RS Ĥ T is obtained by post-multiplying (40) with G T † . For the convenience of presentation, the following symbolic representation is defined: Applying the unvec (·) operator to P ·m R ∈ C QM S 2 ×1 , we obtain where Q (m R ) ∈ C QM S ×M S is a rank-1 matrix. Applying SVD to the rank-1 matrix Q (m R ) , (42) becomes where (m R ) ∈ C QM S ×M S is a diagonal matrix and σ are the first column of U (m R ) and V (m R ) , respectively. Since the scaling ambiguity exists, we have Ĥ (l) is the scalar factor. The detailed explanation of this scaling ambiguity is stated in Section IV from (62) to (64). In order to eliminate this ambiguity, we can set H (l) RS 1· = 1 T M R in practical communication systems and further obtain α (m R ) = 1 v (m R ) * 1,1 . Therefore, we have T-SVD and P-SVD approaches which have been developed above are closed-form solutions. Since they do not need any iterative procedures, the TP-SVD receiver works very fast. However, it has extreme identifiability conditions, which will be analyzed in Section IV. Moreover, the signal and channel estimation performance can be further improved by an iterative procedure, i.e., the proposed TP-ALS receiver.

B. TP-ALS RECEIVER
In the TP-ALS receiver, an iterative Tucker2-ALS (T-ALS) algorithm for (10) and (18) is firstly utilized to jointly estimate the information symbol matrix S and channel matrix H (l) . Afterwards, an iterative PARAFAC-ALS (P-ALS) algorithm is used to estimate the channels H (l)

1) T-ALS ALGORITHM
According to (10) and (18), the corresponding least squares (LS) estimates can be expressed as follows: Because the conditional LS estimates will be improved or maintained but not be worsen in the fitting process, the estimates of S and H (l) are updated iteratively by using formulas (46)

2) P-ALS ALGORITHM
After obtainingĤ (l) as described above, the property of PARAFAC decomposition is utilized to estimate H (l)

RS and H.
By reshaping H (l) , another compact form of (39) can be expressed as According to (11) and (50), the corresponding LS estimates can be represented as follows: The estimates of H and H (l) RS are updated iteratively by using formulas (51) and (52). The iterative P-ALS algorithm also converges monotonically to a local optimal solution [18].
After convergence, the LS estimatesĤ

C. DISSCUSSION
Both TP-SVD and TP-ALS receivers can be used for precise joint estimation. Generally speaking, the TP-ALS receiver outperforms the TP-SVD receiver in terms of estimation accuracy, but it has higher computational complexity than that of the TP-SVD receiver. Moreover, the convergence speed of iterative algorithms in TP-ALS receiver depends on initial values. Notably, the identifiability of the TP-SVD receiver is stricter than that of the TP-ALS receiver. However, the estimates in TP-SVD receiver are still valuable even in the case that the identifiability is not satisfied, because those estimates contain some information related to true values. In order to improve the convergence speed of TP-ALS receiver, T-SVD and P-SVD algorithms can be used to initialize T-ALS and P-ALS algorithms, respectively. More specifically,Ĥ (l) in the T-ALS algorithm andĤ (l) RS in the P-ALS algorithm are initialized by the estimated matrices of T-SVD and P-SVD algorithms, respectively. For convenience of description, it is named SVD-TP-ALS receiver.

IV. UNIQUENESS, IDENTIFIABILITY AND COMPLEXITY
In this section, uniqueness issues, identifiability conditions, and the computational complexity of these two receivers are discussed.

A. UNIQUENESS ISSUES
From the equation (24), X (l) 3 can be rewritten as Let us define then W (q,l) can be rewritten as (57), as shown at the bottom of this page, in which where A (q,l) ∈ C R×R and B (q,l) ∈ C M S ×M S are nonsingular matrices.Ŝ (q) andĤ (q,l) are alternative solutions with the rotational ambiguity for S (q) and H (q,l) , respectively. In order to recover S (q) and H (q,l) , it is necessary to eliminate the rotational freedom caused by the nonsingular transformation matrices A (q,l) and B (q,l) . Assuming that S (q) and H (q,l) are full column-rank and according to if F

(q)
3 is full row-rank, then Equation S (q) ⊗ H (q,l) = Ŝ (q) ⊗Ĥ (q,l) only holds if A (q,l) and B (q,l) are identity matrices with respect to complementary scale factors, i.e. A (q,l) = β (q,l) I R , B (q,l) = 1 β (q,l) I M S . It means that S (q) and H (q,l) are unique in the presence of the scalar factor β (q,l) , i.e., Equation (39) is a PARAFAC model. Applying the uniqueness theorem of PARAFAC decomposition [37], if 3 .
(57) VOLUME 8, 2020 for the uniqueness of the proposed receivers. In this paper, the amplification coefficient matrix G is known and can be chosen as a full rank matrix. Assuming that G is full columnrank, a more relaxed uniqueness condition can be derived as follows. From the equation (40), H (l) can be rewritten as Similar to the deduction from (57) to (61), it can be seen that H and H m R · are unique in the presence of scalar factors for all m R = 1, . . . , M R , i.e., where γ m R is the scalar factor, Ĥ (l)

B. IDENTIFIABILITY CONDITIONS
Since system parameter settings are subject to identifiability conditions, the identifiability of the proposed receivers is very important for signal detection and combined channel matrix estimation. It is directly related to the uniqueness in the LS sense of the minimized cost function.
For the TP-SVD receiver, its identifiability in the LS sense is deduced from (28) and (41). The following full row-rank conditions should be met For the TP-ALS receiver, its identifiability in the LS sense is deduced from (46), (47), (51), and (52). The following full column-rank conditions should be met Equations (65)-(66) and (67)-(70) are necessary and sufficient conditions of system identifiability for TP-SVD and TP-ALS receivers, respectively. For the proposed TP-SVD receiver, (65) and (66) imply for F 3 ∈ C QRM S ×P and G ∈ C K ×M R . Therefore, (71) is a necessary condition of identifiability for the TP-SVD receiver. For the proposed TP-ALS receiver, (67)-(70) imply for |⊗| H (q) F 1 ∈ C PK M S ×QR , ( |⊗| S) F 2 ∈ C PN ×QM S , G H (l) RS ∈ C K M S ×M R , and H T G ∈ C QK M S ×M R . Therefore, according to (72), the necessary condition of identifiability for the TP-ALS receiver is By comparing (71) and (73), it can be seen that the necessary condition of identifiability for the TP-ALS receiver is more relaxed than that for the TP-SVD receiver.

C. COMPLEXITY
The computational complexity of both receivers is demonstrated in Table 3  For the TP-ALS receiver, there are four full column-rank matrices (i.e., |⊗|Ĥ (l) F 1 , |⊗|Ŝ F 2 , G Ĥ (l) RS , and Ĥ T G ) need to be pseudo-inversed for each iteration. Correspondingly, four complex multiplications (i.e., RS †Ĥ (l) , and Ĥ T G †Ĥ (l) ) are calculated for each iteration. Therefore, the computational complexity of the TP-ALS receiver for each iteration is

V. SIMULATION RESULTS
In this section, Monte Carlo simulations are utilized to verify the performance of the proposed tensor-based receivers, which carry out joint symbol and channel estimation. Consequently, the BER and NMSE of transmitted information symbol and channel matrices are used to characterize the estimation performance. NMSE of channel matrices H (l)  RS and H at the m-th run, respectively.

RS and
The entries in three-dimensional coding matrices at users are given by exp j2πu √ M S R , where u is a random variable varying from 0 to 1. The amplification coefficient matrix G is drawn from a truncated discrete Fourier transform (DFT) matrix. In addition, 64QAM is adopted to modulate information symbols. The channel matrices H = 1 T M R in order to eliminate scaling ambiguities as in [26] and [38]. δ th = 1 × 10 −4 is chosen for the proposed TP-ALS receiver. SR for q = 1, . . . , Q need to be estimated. It is obvious that the estimation algorithms in TDD mode are simpler than that in FDD mode.
Since one priori symbol of signal matrix transmitted by each user needs to be known for eliminating the scaling ambiguity, the number of information symbols transmitted by Q users is Q (NR − 1). The symbol periods of two transmitting phases is 2KPN . Therefore, the transmission rate of our proposed semi-blind receivers is r a = Q(NR−1) 2KPN . For different simulation parameters of our proposed receivers, the transmission rate can be calculated according to this formula.

A. COMPARISON OF TP-ALS AND SVD-TP-ALS RECEIVERS
In the first example, we compare the performance of the proposed TP-ALS and the proposed SVD-TP-ALS receivers. System parameters are set as: N = 10, K = 32, P = 10. It can be seen from Fig. 2 that the BER and NMSE performance of the two proposed receivers are very close. In Table 4, Iter1_Average and Iter1_Max represent the average and maximum iteration number of the T-ALS algorithm, respectively. Similarly, Iter2_Average and Iter2_Max represent the average and maximum iteration number of the P-ALS algorithm, respectively. It can be seen from Table 4 that compared with the TP-ALS receiver, the average iteration number and the mean processing time of the SVD-TP-ALS receiver are significantly reduced. It is particularly noteworthy that the P-ALS algorithm of the SVD-TP-ALS receiver converges with only two iterations. The same situation is also shown in Table 5 at simulation B and Table 6 at simulation C.   It implies that the computational complexity of SVD-TP-ALS receiver is greatly reduced.
Since the SVD-TP-ALS receiver converges faster, takes less time and has the same performance compared with the TP-ALS receiver, we use SVD-TP-ALS for performance demonstration in subsequent simulations.

B. IMPACT OF N, K , P, AND R
In the second example, we study the impact of N , K , P, and R on the BER, NMSE and iteration number for the proposed SVD-TP-ALS receiver. Unlike traditional KRST coding schemes [21]- [23], the number of data streams of the proposed receivers can not be limited to be equal to that of user's antennas. It can be seen from Fig. 3 and Fig. 4 that the BER and NMSE performance increase gradually with the increase of N , K , and P. The SVD-TP-ALS receiver can effectively detect signals and estimate channels. Furthermore, the BER and NMSE performance decrease as R increases. However, the proposed SVD-TP-ALS receiver still has superior BER and NMSE performance even if the number of data streams is greater than that of user's antennas (R = 3 > M S = 2). At a BER of 1 × 10 −4 , the gap between R = 2 and R = 3 is only around 2.2dB. The gap of NMSE is much smaller than that of BER. Table 5 shows the number of iterations as N , K , P, and R increase with different SNR. It is expected that the iteration number decreases with the increase of N , K , and P while increases with the increase of R. The data transmission rate does not decrease as N increases, while the coding gain increases as K and P increase. Therefore, we can flexibly choose the appropriate system parameters according to specific performance requirements.

C. COMPARISON OF DIFFERENT NUMBER OF USERS
In the third example, we analyze the performance for different number of users with    The BER and NMSE performance versus SNR are demonstrated by Fig. 5 and Fig. 6. Since the inter-user interference and the channel information need to be estimated increase with the increase of the number of users, the BER and  NMSE performance decrease as shown in Fig. 5 and Fig. 6. Table 6 demonstrates the iteration number and mean processing time for different number of users. As shown in Table 6, the number of users mainly affects the iteration number of the T-ALS algorithm and further affects the mean processing time. The computational complexity is significantly reduced as the number of users decreases.
The estimation performance of the proposed SVD-TP-ALS receiver decreases and the complexity increases with the increase of user number. However, even when the number of users is large (Q = 8), the proposed receiver still has great estimation performance with low complexity. Since the channel matrices have been estimated, the LS estimate of the information symbol matrix S is further obtained by (46). For the traditional PA-LS receiver, it should be noted that the relay need to estimate channel matrices, which increases the burden on the relay.
The computational cost for calculating the pseudo-inverse of S (S−R) ∈ C QM S ×L S , S (R−q) ∈ C M R ×L R , and and O KPM S Q 2 R 2 , respectively. Besides, the complex multiplication cost for Y (R) S (S−R) † , Y (q) S (R−q) † , and are QL S M S M R , L R M S M R , and QNKPRM S , respectively. Therefore, the computational complexity of the PA-LS receiver is

2) PILOT-ASSISTED BILINEAR ALS (PA-BALS) RECEIVER
For the PA-BALS receiver, the signal matrix received by the relay is the same as the PA-LS receiver. But the difference is that the PA-BALS receiver do not carry out channel estimation at the relay. The relay node amplifies received signals with K different AF matrices as in [38] and then broadcasts them to all users. The received signals at the user q are denoted by where RS H ∈ C K M S ×QM S , and L S has the same meaning and assumption as the PA-LS receiver. We then obtainĤ (q) = Y (q) S (S−R) † = G Ĥ (q)

RS
Ĥ , which is one of the compact forms of the PARAFAC model. H (q) = Ĥ T G Ĥ (q)T RS is obtained by reshapingĤ (q) . After that, the BALS fitting algorithm [38] is utilized to estimate channel matrices H The first term is from the pseudo-inverse of G Ĥ (q)

RS
and Ĥ T G . The second term is from the complex multiplication of G Ĥ (q) RS †Ĥ (q) and Ĥ T G †Ĥ (q) .
In the fourth example, the performance of PA-LS, PA-BALS and the proposed SVD-TP-ALS receivers in terms of BER and NMSE is contrasted in Fig. 7 and Fig. 8. In particular, the BER curve using ZF receiver with perfect CSI is plotted in Fig. 7    compared with the PA-LS and PA-BALS receivers, respectively. At the same situation, the loss using the proposed SVD-TP-ALS receiver is around 3.3dB compared with the ZF receiver with perfect CSI. The gap of the NMSE performance between the SVD-TP-ALS receiver and the two pilot-assisted receivers is greater than that of the BER performance.
The competitive PA-LS and PA-BALS receivers have lower complexity compared with the proposed SVD-TP-ALS receiver. However, the proposed receiver has higher spectral efficiency without transmitting pilot sequences. In addition, the proposed SVD-TP-ALS receiver can carry out joint symbol and channel estimation, and has better BER and NMSE performance than PA-LS and PA-BALS receivers.

E. PERFORMANCE OF THE PROPOSED TP-SVD RECEIVER
In the final example, we study the performance of the proposed TP-SVD receiver. Besides, the performance of the PA-LS, PA-BALS, and the proposed SVD-TP-ALS receivers are analyzed for comparison. System parameters are set as: N = 24, K = 32, P = 64, L S = Q * M S = 16, L R = M R = 32. In this communication scenario, identifiability conditions in (71) are satisfied for the TP-SVD receiver. As shown in Fig. 9 and Fig. 10, both the proposed TP-SVD and SVD-TP-ALS receivers outperform the two competitive pilot-assisted receivers. We also see that compared with the proposed SVD-TP-ALS receiver, there are around 3dB loss about NMSE H (l) RS and 2.6dB loss about NMSE (H) by using the proposed TP-SVD receiver. However, it can be seen from Fig. 11 that the proposed TP-SVD receiver has lower complexity compared with two competitive pilot-assisted receivers. In addition, the complexity of the TP-SVD receiver is even lower than that of the ZF receiver. Therefore, the proposed TP-SVD receiver is also a good choice when the system requires a compromise between performance and complexity.

VI. CONCLUSION
A TP tensor model for the multi-user massive MIMO relay system has been presented in this paper. Based on this tensor model, the TP-SVD and TP-ALS receivers are developed to jointly estimate the information symbol and the channel matrices. The TP-ALS receiver can be initialized by the TP-SVD receiver, i.e., the SVD-TP-ALS receiver. Moreover, we analyze the uniqueness and identifiability conditions in detail. As shown in simulations, both of the proposed TP-SVD and SVD-TP-ALS receivers have better estimation performance than PA-LS and PA-BALS receivers. The mean processing time of the proposed TP-SVD receiver is even shorter than that of the ZF receiver with perfect CSI. In addition, the proposed receivers estimate signals and channels without transmitting pilot sequences, thus have higher spectral efficiency. Perspective of our work contains an extension to mmWave MIMO relay systems [39], [40] for joint symbol and channel parameter estimation, which includes information symbols, angles and fading coefficients. Effective tensor fitting algorithms will also be developed in that work.