Fronthaul Compression for Uplink Massive MIMO Using Matrix Decomposition

Massive multiple-input multiple-output (MIMO) is a key enabler for achieving higher data rates in next-generation wireless technology. While it has the potential to transform cellular communication through spatial diversity and multiplexing, a bottleneck that is often overlooked is the fronthaul capacity. The fronthaul link that connects a massive MIMO Remote Radio Head (RRH) to the Baseband Unit (BBU) of the base station, carrying in-phase and quadrature (IQ) samples, can throttle the network capacity if appropriate data compression techniques are not applied, particularly in the uplink. This paper proposes an iterative technique for uplink fronthaul load reduction in massive MIMO systems that utilizes the convolution structure of the received signals. The proposed algorithm provides compression ratios of about 30–50×. This work provides extensive analysis of the performance of the proposed method for a plethora of practical scenarios and constraints, such as different channel parameters and models, receive antenna correlation, and imperfect channel information. It also discusses the numerical convergence and complexity of the proposed algorithm and compares its performance against existing compression techniques.


I. INTRODUCTION
Massive MIMO is slated to transform the wireless communication experience with unprecedented speeds, network capacity and coverage [2]. The 5G New Radio (NR) wireless standards support massive MIMO for a wide variety of applications and deployment scenarios [3]. Next-generation wireless systems will likely capitalise on this framework and use massive MIMO to enhance user experience and improve system performance with large-scale antenna arrays in a variety of configurations [2]. Large-scale antenna arrays can provide huge diversity gain and beamforming opportunities for millimeter-wave applications, as well as spatial multiplexing to support a large number of users or to offer massive data rates with several MIMO layers for a few users [4]. However, one major roadblock to achieving these goals is the fronthaul data rate [5].
The fronthaul connects the radio frequency (RF) components of the base station, termed the Remote Radio Head (RRH), to its baseband processing unit (BBU), typically using optical fibre cables (Fig. 1) [6]. The separation between the RRH and the BBU can range from a few metres to several kilometres depending on the deployment scenario or network architecture. For example, massively distributed MIMO deployments offer enhanced coverage for edge users by locating antennas over a wide area and connecting them to a central server [2]. Such an architecture not only reduces the capital expenditure for network operators, since antenna installations are inexpensive and easy to maintain, but also allows for cooperative interference management strategies in the network [6]. Since the fronthaul load scales with the number of RRH antennas, laying high-speed optical fibres for an increasing number of antenna elements significantly increases the cost for network operators. For instance, sending raw IQ samples from the RRH to the BBU requires a data rate of up to 236 Gbps for 100 MHz bandwidth at a 64-antenna base station [7]. Given that the economical optical fibre cables and optical electronics used for fronthaul data transport have capacities in the range of only 10-40 Gbps, approximately 10× compression is required to make such MIMO architectures economically feasible for the operators.
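The arithmetic behind these figures can be checked with a short script. The sampling rate and I/Q word width below are assumptions chosen to be consistent with the cited 236 Gbps figure, not values taken from [7]:

```python
# Back-of-envelope fronthaul rate for raw IQ transport (all parameters assumed).
sample_rate = 122.88e6   # samples/s for 100 MHz bandwidth (assumed)
bits_per_iq = 2 * 15     # one I word and one Q word, 15 bits each (assumed)
n_antennas = 64

raw_rate_gbps = sample_rate * bits_per_iq * n_antennas / 1e9
print(f"raw fronthaul rate: {raw_rate_gbps:.0f} Gbps")

# Compression needed to fit a 25 Gbps link (mid-range of the 10-40 Gbps figure)
print(f"required compression: {raw_rate_gbps / 25:.1f}x")
```

With these assumed parameters the raw rate works out to roughly 236 Gbps, so a link in the tens of Gbps needs close to 10× compression, matching the estimate in the text.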
Numerous compression techniques have been proposed to reduce fronthaul load. In the downlink, data is to be compressed at the BBU and transmitted to the RRH via the fronthaul. The BBU has complete knowledge of the data being transmitted and its parameters, such as the modulation order, the pilot symbols, the subcarrier allocation for data and pilots, the number of users, the number of MIMO layers, etc., which can be effectively utilized to achieve high compression rates. However, in the uplink direction, such knowledge is not available for compression at the RRH unless the entire receiver processing chain is implemented at the RRH. This makes achieving high compression rates for uplink fronthaul a challenging task.
An important factor that affects the fronthaul data rate is the functional split between the RRH and the BBU [7]. The Open Radio Access Network (O-RAN) Alliance outlines multiple ways in which the transmitter/receiver processing chain can be split between the BBU and the RRH and how each of these splits can be implemented/managed for different use cases. Fig. 2 outlines the different functional splits possible in the uplink [8]. The higher level splits are simpler to implement since the RRH does not require expensive computational resources for these splits. However, the further one moves down the receiver processing chain, the higher the compression ratios that can be obtained as one gets closer to decoding the received data. The highest compression ratio is obtained if the receive chain is fully implemented at the RRH, but this implies one cannot take advantage of cooperative interference management strategies such as mitigating pilot contamination that a centralized BBU can offer. The most popular split is the 7.2 split as it offers a good tradeoff between cost, compression ratio and centralization, and is used in [9]. But further increase in antenna elements as envisioned for 6G renders even 5× compression [9] insufficient. The ideal compression method would be one that offers both the high compression ratio provided by near completion of the receive chain at the RRH, but can also take advantage of centralized processing. This is precisely the aim of this work.
Most fronthaul load reduction techniques for the uplink exploit either the redundancies in the received waveform [10], [11], a priori knowledge of the characteristics of the received signal [12] or the correlation between the base station antennas [9], [13]. A technique that achieves a compression rate of 1/2 by removing redundant spectrum and reducing quantization bit-width is proposed in [10]. A lossy compression method is discussed in [11] which applies Fast Fourier Transform (FFT) and Discrete Cosine Transform (DCT) to the received signals and discards low power frequency coefficients. Though these methods have the advantage of low implementation complexity, they do not offer high compression ratios. On the other hand, compressive sensing techniques that make use of the sparsity of uplink signals offer slightly better but varying compression ratios depending upon the frequency of bursty transmissions from the user equipments (UEs) [12]. Antenna selection, wherein a subset of antennas is selected for reception at a time [14], can be applied to reduce the number of RF chains and consequently the fronthaul bandwidth required in the uplink. Although this can offer significant compression ratios, it can lead to loss in diversity or multiplexing capacity since the antenna array is not fully utilized. The principal component analysis (PCA) compression performed on the matrix of received signals in [13] utilizes the spatio-temporal correlation of the signals. Similarly, low-rank approximation of the received signal matrix via QR decomposition is used in [9] to reduce the fronthaul load. Although the methods in [9], [13] achieve better compression ratios than those proposed in [10], [11] or [12] at the cost of increased computations, the highest compression ratio they achieve for 64 antennas is 5×.
The goal of this work is to reduce the number of samples that have to be transmitted from the RRH to the BBU via the fronthaul. For this, a block of N complex time-domain samples received at the N_r antennas of the RRH is considered jointly to exploit its joint characteristics for compression. These samples can be arranged in the form of an N × N_r matrix. For example, N can be the number of samples received in the duration of one orthogonal frequency division multiplexing (OFDM) symbol, which is the FFT size used for the OFDM. In this paper, a massive MIMO system in which N_r is higher than the number of users simultaneously served (in a transmission time interval) is considered [15]. The columns of the received signal matrix at the RRH are the result of the user data sequence convolved in the time domain with a different multi-path channel response corresponding to each receive antenna. In the frequency domain, this received signal matrix can be expressed as the product of an N-dimensional diagonal matrix of user data and the Fourier transform of the L × N_r channel matrix, where L is the number of significant multi-paths. If L ≪ N_r, as is typical for a well-designed massive MIMO system, representing the received signal matrix in terms of the N-length user data sequence and the L × N_r channel matrix allows its recovery at the BBU using far fewer samples than N N_r. Towards this end, a blind deconvolution of the received signal matrix needs to be performed at the RRH. Noting that the N-point Fourier transform of the L × N_r channel matrix is low rank, given L < min{N, N_r}, an iterative algorithm consisting of alternating minimization, similar to the one in [16] used for low-rank matrix sensing, is presented in this work. However, the proposed algorithm is constructed differently from [16] to reflect the fact that only one of the deconvolved matrices (the channel matrix) is low rank while the other (the user data matrix) is a full-rank diagonal matrix.
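The low-rank structure described above can be sketched numerically. The sizes below (N = 64, N_r = 16, L = 4) are illustrative assumptions, not the paper's settings; the noiseless received matrix built from the diagonal data matrix and the L-tap channel has rank at most L:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Nr, L = 64, 16, 4                        # illustrative sizes (assumed)

# F_L: first L columns of the N x N DFT matrix
n = np.arange(N)
F_L = np.exp(-2j * np.pi * np.outer(n, np.arange(L)) / N)

# QPSK-like user data on the diagonal, and an L-tap channel per antenna
x = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=N)
H_t = (rng.standard_normal((L, Nr)) + 1j * rng.standard_normal((L, Nr))) / np.sqrt(2)

Y_f = np.diag(x) @ F_L @ H_t                # noiseless received matrix, N x Nr

print(np.linalg.matrix_rank(Y_f))           # at most L
print(N * Nr, "raw samples vs", N + L * Nr, "for the decomposed form")
```

Here N·N_r = 1024 raw samples collapse to N + L·N_r = 128 values once the decomposition is known, which is the sample-count saving the proposed method targets.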
Deconvolution using non-convex minimization has been considered in [17]. However, a different approach is adopted here to solve the objective function since a matrix of multiple sequences has to be deconvolved rather than a single sequence as in [17].
The problem of fronthaul compression has also been studied in the context of a Cloud Radio Access Network (C-RAN), where several RRHs are connected to a central BBU. A distributed spatial filtering scheme and joint optimization of users' power allocation, RRHs' spatial filter design and quantization bits allocation, as well as BBU's receive beamforming to maximise the minimum signal-to-interference-plus-noise ratio (SINR) of all the users in the uplink in a C-RAN is proposed in [18].
Reference [19] investigates the joint design of fronthaul compression and precoding, and solves the optimization problem of maximising the ergodic capacity for two downlink functional splits in C-RAN, compression-after-precoding (CAP) and compression-before-precoding (CBP). In [20], the optimal design for the multiplex-and-forward (MF) and the decompress-process-and-recompress (DPR) backhaul schemes is investigated, with the aim of maximising the sum-rate under the backhaul capacity constraints for a multihop topology in a C-RAN. Reference [21] studies the joint design of precoding and backhaul compression strategies for downlink C-RAN. It aims to maximise the weighted sum-rate with respect to both the precoding matrix and the joint correlation matrix of the quantization noises, subject to power and backhaul capacity constraints. These works ([18], [19], [20], [21]) look at the fronthaul compression problem with the aim of maximising the information-theoretic capacity of multiplexed fronthaul links, subject to power/capacity constraints within a C-RAN framework. In contrast, the framework in this paper considers uplink data compression from a signal processing perspective rather than an information-theoretic one. It aims to reduce the total number of digital samples that need to be sent in a single BBU-RRH fronthaul link. Moreover, the compression scheme presented here can be used at each RRH independent of the BBU or any number of other RRHs connected to the BBU. Therefore, the algorithm proposed in this paper can be applied to any type of network deployment/architecture.
The main contribution of this paper is a novel fronthaul compression method for uplink massive MIMO that exploits the underlying convolution structure of the received signals. Although previous works have identified and utilized the low-rank nature of the signals for their compression [9], [12], [13], these methods are agnostic to the convolution structure that gives rise to this low-rank nature. The proposed method takes this signal structure into account to decompose the received signal into an approximate user signal matrix and a channel matrix. As a result, it offers a compression rate comparable to that achieved if the receive chain were fully implemented at the RRH, while still retaining the ability to leverage the advantages of centralization. Thus, it provides both compression ratios that are orders of magnitude higher than those offered by existing methods and a flexible functional split between the RRH and BBU, since the iterative technique proposed in this paper can also be used for the blind demodulation of massive MIMO OFDM signals.
This paper is organized as follows: Section II outlines the system model, Section III presents the compression algorithm for single-user and multi-user cases, and Sections IV-VI present the results of link-level simulations of the algorithm for various scenarios. Section IV analyzes the robustness of the proposed method under different channel conditions, examines the impact of receive antenna correlation on the proposed method, and evaluates the performance of the method for realistic channel models. Section V discusses the convergence and complexity of the proposed compression algorithm. Section VI compares its performance vis-à-vis the PCA compression in [13] and the uncompressed system. Finally, Section VII concludes the work.

II. SYSTEM MODEL
This paper considers a massive MIMO 5G base station RRH that has N_r antennas and receives signals from N_u users in the uplink in a given time slot. Since the uplink multiple access scheme in 5G is orthogonal frequency division multiple access (OFDMA) [22], the bit-stream from each user is mapped to an M-QAM (Quadrature Amplitude Modulation) symbol constellation, followed by subcarrier mapping, inverse Fourier transform, and cyclic prefix (CP) addition. Let N denote the size of the FFT at the receiver. It is assumed that the channel between each user and each antenna has a maximum of L significant multi-paths in the time domain, and this L is known at the RRH. After sampling and removal of the CP, the signal received at antenna r at sampling instant n is

y_r[n] = Σ_{u=1}^{N_u} x_u[n] ⊛ h_{r,u}[n] + w_r[n],

where x_u[n] is the OFDM symbol of user u, h_{r,u}[n] is the multi-path channel response between user u and antenna r, ⊛ represents the circular convolution resulting from the cyclic prefix in OFDM, and w_r[n] is the additive white circularly symmetric complex Gaussian noise (AWGN) at antenna r with variance σ².
Consider a block of N samples in the time-domain received at the RRH after CP removal. The channel is assumed to be constant for the duration of these N samples. Let N be the size of FFT (or the size of one OFDM symbol).
The signal received at the RRH across all the antennas is collected into the N × N_r matrix Y = [y_1 y_2 · · · y_{N_r}]. Here, each column of Y represents the received signal at one antenna over the N sampling instants. The data Y needs to be sent from the RRH to the BBU via the fronthaul link for its faithful reconstruction at the BBU.

III. COMPRESSION USING MATRIX DECOMPOSITION
This paper considers the convolution structure inherent in the received signal matrix Y for its compression at the RRH and reconstruction at the BBU, as outlined in Fig. 3. This structure depends on the manner in which the N subcarriers are allocated to different users. Users can be allocated subcarriers in one of the following three ways:
1) A single user is allocated all the N subcarriers.
2) Multiple users share the same set of N subcarriers (overlapping allocation).
3) Multiple users are allocated subsets of the N subcarriers in a non-overlapping manner.
In the frequency domain, the matrix Y can be decomposed in different ways corresponding to the above three cases. This decomposition reveals the low-rank nature of Y when the number of significant multi-paths L < min{N, N_r}, and forms the crux of the compression proposed in this work. For ease of analysis, compression in the above cases for users with a single antenna is discussed first. Later in the section, it is shown how these results extend to users with multiple antennas.

A. SINGLE USER CASE
The matrix Y can be expressed in the frequency domain by applying the FFT to Y and using the subscript f to denote quantities in the frequency domain, as

Y_f = X_f H_f + W_f, (1)

where X_f is the N × N diagonal matrix with the N-length M-QAM user data as its diagonal, H_f is the N × N_r multi-path channel matrix in the frequency domain, and W_f is the noise in the frequency domain.
In a massive MIMO setting, the phenomenon of channel hardening decreases the delay spread of the channel [15]. This means that the channel has very few resolvable multi-path components in the time domain. If the channel is assumed to have L resolvable taps in the time domain, and H_t denotes the L × N_r time-domain channel response for the N_r antennas, then the frequency-domain channel H_f can be expressed as the product F_L H_t, where F_L denotes the first L columns of the N × N Discrete Fourier Transform (DFT) matrix. Thus, the frequency-domain channel H_f in (1) is the result of oversampling in the frequency domain due to the large FFT/OFDM symbol size used in 5G. Equation (1) can then be expressed as

Y_f = X_f F_L H_t + W_f. (2)

If Y_f is transmitted on the fronthaul without any compression, then N N_r complex samples are required to be sent via the fronthaul. However, (2) reveals that the signal component of Y_f can be reconstructed from the N samples corresponding to X_f and the L N_r samples corresponding to H_t. Note that L ≪ N due to the channel hardening effect. Therefore, if Y_f can be decomposed into its signal component X_f and the time-domain channel H_t, only N + L N_r samples need to be sent from the RRH to the BBU to recover Y_f. This gives the compression ratio for a single user (CR_SU) as

CR_SU = N N_r / (N + L N_r). (3)

An obvious way of obtaining X_f and H_t from Y_f is coherent demodulation of Y_f. This would require the pilots and the modulation order to be known and the entire receive chain to be present at the RRH [23]. However, in general, this information is not available at the RRH [8]. The idea in this paper is to decompose Y_f into matrices X̂ and Ĥ (not necessarily equal to X_f and H_t) so that Y_f ≈ X̂ F_L Ĥ, without any knowledge of the pilots or modulation order. Thus, for compressing Y_f, a diagonal matrix X̂ and an L × N_r matrix Ĥ that minimise ‖Y_f − X̂ F_L Ĥ‖²_F have to be found.
Minimising ‖Y_f − X̂ F_L Ĥ‖²_F, where both X̂ and Ĥ are unknown, is a non-convex optimization problem. However, fixing one of the unknowns reduces solving for the other to a simpler, convex problem. Therefore, this paper proposes an iterative alternating minimization approach to find X̂ and Ĥ. The algorithm begins with an initial guess for X̂, denoted by X̂_0. Then, in iteration k, the estimate of H is found by solving

Ĥ_k = argmin_H ‖Y_f − X̂_{k−1} F_L H‖²_F, (4)

where k denotes the iteration number. This reduces to finding H such that Y_f = X̂_{k−1} F_L H, which can be solved using linear least squares as

Ĥ_k = (X̂_{k−1} F_L)† Y_f, (5)

where † denotes the Moore-Penrose inverse. This step can be interpreted as channel estimation. In the next step, this Ĥ_k is used to update the estimate of X by solving

X̂_k = argmin_{X diagonal} ‖Y_f − X F_L Ĥ_k‖²_F. (6)

The norm in (6) can be simplified by denoting B = F_L Ĥ_k for ease of notation and using the fact that X̂_k needs to be diagonal. This leads to

‖Y_f − X B‖²_F = Σ_{r=1}^{N_r} ‖y_r − X b_r‖² = Σ_{n=1}^{N} Σ_{r=1}^{N_r} |y(n, r) − x(n) b(n, r)|², (7)

where y_r and b_r denote the r-th columns of Y_f and B, respectively, y(n, r) and b(n, r) denote the elements at the n-th row and r-th column of Y_f and B, respectively, and x(n) denotes the n-th element on the diagonal of the matrix X.
The inner sum Σ_{r=1}^{N_r} |y(n, r) − x(n) b(n, r)|² can be optimized independently for each n ∈ {1, 2, . . . , N} to obtain the n-th diagonal element of X̂_k as

x̂_k(n) = (Σ_{r=1}^{N_r} b*(n, r) y(n, r)) / (Σ_{r=1}^{N_r} |b(n, r)|²), (8)

where b*(n, r) denotes the complex conjugate of b(n, r). The expression in (8) is equivalent to maximal-ratio combining (MRC). At each iteration, the optimization problems in (4) and (6) are solved alternately using the closed-form solutions given in (5) and (8) until Y_f is faithfully reconstructed within a pre-defined error tolerance. The above steps are summarized as Algorithm 1. Now, only the N samples corresponding to X̂ and the L N_r samples corresponding to Ĥ are sent to the BBU to reconstruct Y_f, leading to the compression ratio given in (3).
Algorithm 1: Single-User Compression via Alternating Minimization
1: Initialize X̂_0 = diag(y_1)
2: for k ← 1, 2, . . . until the error tolerance is met do
3:   Ĥ_k = (X̂_{k−1} F_L)† Y_f
4:   B = F_L Ĥ_k
5:   for n ← 1 to N do
6:     x̂_k(n) = (Σ_{r=1}^{N_r} b*(n, r) y(n, r)) / (Σ_{r=1}^{N_r} |b(n, r)|²)
7:   end for
8: end for
Here, y_1 denotes the first column of matrix Y_f, † denotes the matrix pseudo-inverse, * denotes complex conjugation, a(n, r) denotes the element at row n and column r of matrix A, and x̂_k(n) denotes the n-th diagonal element of X̂_k.
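The two closed-form updates (5) and (8) can be prototyped in a few lines of NumPy. This is a minimal noiseless sketch under assumed sizes, not the paper's simulation setup; since each step is an exact minimizer of the Frobenius objective over one variable, the reconstruction error is non-increasing by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
N, Nr, L, iters = 64, 32, 4, 20             # illustrative sizes (assumed)

n = np.arange(N)
F_L = np.exp(-2j * np.pi * np.outer(n, np.arange(L)) / N)
x_true = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=N)
H_true = (rng.standard_normal((L, Nr)) + 1j * rng.standard_normal((L, Nr))) / np.sqrt(2)
Y_f = np.diag(x_true) @ F_L @ H_true        # noiseless received matrix

x_hat = Y_f[:, 0].copy()                    # blind init: diagonal of X_0 = first column of Y_f
errs = []
for _ in range(iters):
    # Channel step (5): least-squares solve for H with X fixed
    H_hat = np.linalg.pinv(x_hat[:, None] * F_L) @ Y_f
    # Data step (8): per-subcarrier MRC against B = F_L H_k
    B = F_L @ H_hat
    x_hat = (B.conj() * Y_f).sum(axis=1) / (np.abs(B) ** 2).sum(axis=1)
    errs.append(np.linalg.norm(Y_f - x_hat[:, None] * (F_L @ H_hat), "fro"))
```

In the noiseless case the error trace decreases monotonically; note that the recovered pair matches (X_f, H_t) only up to a scalar, as the paper discusses in the Appendix.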

B. MULTI-USER CASE -OVERLAPPING ALLOCATION
In this section, the signal model and fronthaul compression for multiple users who share the same set of N subcarriers (overlapping allocation) are discussed. When there are N_u users scheduled in the same set of N subcarriers, Y can be expressed in the frequency domain as

Y_f = Σ_{u=1}^{N_u} X_f(u) H_f(u) + W_f, (9)
where X_f(u) is the N × N diagonal matrix with the N-length M-QAM data of user u as its diagonal, and H_f(u) is the N × N_r multi-path channel matrix in the frequency domain for user u. Similar to the single-user case, each H_f(u) can be expressed as the matrix product F_L H_t(u), where H_t(u) is the L × N_r time-domain channel of user u.
Equation (9) can be written using the above as

Y_f = Σ_{u=1}^{N_u} X_f(u) F_L H_t(u) + W_f. (10)

The same iterative approach as in the single-user case is used to find the X̂ and Ĥ that form the signal component of Y_f.
An initial point X̂_0(u), for u = 1, 2, . . . , N_u, is used in the first iteration, k = 1, of the minimization problem

Ĥ_k = argmin_H ‖Y_f − [X̂_{k−1}(1) F_L · · · X̂_{k−1}(N_u) F_L] H‖²_F, (11)

whose linear least squares solution is given by

Ĥ_k = ([X̂_{k−1}(1) F_L · · · X̂_{k−1}(N_u) F_L])† Y_f, (12)

where Ĥ_k is the L N_u × N_r matrix obtained by stacking the per-user channel estimates Ĥ_k(u) vertically. This Ĥ_k is then used to find

X̂_k = argmin_X ‖Y_f − Σ_{u=1}^{N_u} X(u) B(u)‖²_F, with B(u) = F_L Ĥ_k(u). (13)

Since X has to be in the form of N_u diagonal N × N matrices stacked horizontally, the norm in (13) can be simplified as

‖Y_f − Σ_{u=1}^{N_u} X(u) B(u)‖²_F = Σ_{n=1}^{N} ‖y_n^T − x^T(n) B_n‖², (14)

where y_n^T is the n-th row of Y_f, x^T(n) = [x(n, 1) x(n, 2) · · · x(n, N_u)], where x(n, u) is the n-th diagonal element of X(u), and B_n is the N_u × N_r matrix obtained by stacking the n-th rows of B(u), for u = 1, 2, . . . , N_u, vertically. Here, y_n^T is the received signal at the N_r antennas on the n-th subcarrier, x^T(n) represents the estimated transmitted signal on the n-th subcarrier for all the users, and B_n is the estimated frequency-domain channel matrix for the n-th subcarrier for all the users. Thus, for each subcarrier n ∈ {1, 2, . . . , N}, this leads to the linear least squares solution

x̂_k^T(n) = y_n^T (B_n)†. (15)

In each iteration, the minimization problems in (11) and (13) are solved alternately using the expressions in (12) and (15), respectively, until the solution {Ĥ, X̂} that faithfully reconstructs Y_f within a pre-defined error tolerance is obtained. Then the N N_u samples corresponding to X̂ and the L N_r N_u samples corresponding to Ĥ are sent from the RRH to the BBU, leading to the compression ratio for the multi-user system (CR_MU) as

CR_MU = N N_r / (N N_u + L N_r N_u). (16)
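The per-subcarrier update in (15) reduces to N small least-squares problems. The sketch below uses assumed sizes and, for clarity, the true per-user frequency responses in place of B(u) (in the algorithm these come from the previous iteration's Ĥ_k); with N_u ≤ N_r, each B_n has full row rank almost surely and the solve is exact in the noiseless case:

```python
import numpy as np

rng = np.random.default_rng(2)
N, Nr, Nu = 16, 8, 3                        # illustrative sizes (assumed)

# B[u] plays the role of F_L @ H_hat(u): per-user frequency response, N x Nr
B = rng.standard_normal((Nu, N, Nr)) + 1j * rng.standard_normal((Nu, N, Nr))
x_true = rng.choice([1+1j, -1-1j], size=(N, Nu))

# Overlapping allocation, noiseless: Y_f = sum_u diag(x(u)) B(u)
Y_f = sum(x_true[:, u, None] * B[u] for u in range(Nu))

# Per-subcarrier solve (15): x^T(n) = y_n^T (B_n)^dagger
x_hat = np.empty((N, Nu), dtype=complex)
for n in range(N):
    B_n = B[:, n, :]                        # Nu x Nr: n-th row of each B(u)
    x_hat[n] = Y_f[n] @ np.linalg.pinv(B_n)

print(np.allclose(x_hat, x_true))
```

Each solve touches only an N_u × N_r sub-matrix, so the data step costs N independent small pseudo-inverses rather than one large one.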

C. MULTI-USER CASE -NON-OVERLAPPING ALLOCATION
The signal model for the scenario where multiple users are allocated subsets of the N subcarriers in one OFDM symbol in a non-overlapping manner can be expressed as a special case of the model for overlapping allocation. In (9) for Y in the frequency domain, each user data matrix X_f(u), for u = 1, 2, . . . , N_u, has non-zero diagonal elements only on the subcarriers allocated to that user, and zeros elsewhere. With this signal model, the solution for the optimization problem in (11) in the alternating minimization algorithm remains the same as in (12). However, since only one user is scheduled on each subcarrier, the expression for the norm in (13) is modified to

Σ_{n=1}^{N} Σ_{r=1}^{N_r} |y(n, r) − x(n) b(n, r)|², (17)

where b(n, r) is the r-th element of b_n^T, the n-th row of the estimated frequency-domain channel matrix B(u) of the user allocated the n-th subcarrier. This leads to the same solution as in the single-user case in (8):

x̂_k(n) = (Σ_{r=1}^{N_r} b*(n, r) y(n, r)) / (Σ_{r=1}^{N_r} |b(n, r)|²),

where x̂_k(n) is the estimated data symbol of the user allocated the n-th subcarrier. The alternating minimization algorithm solves (11) and (13) using (12) and (8), respectively, until a solution {Ĥ, X̂} meeting the error tolerance is found. Since there are only N non-zero elements in X̂, these N samples and the L N_r N_u samples corresponding to Ĥ are transmitted on the fronthaul, resulting in a compression ratio of

CR = N N_r / (N + L N_r N_u). (18)

Algorithm 2: Multi-User Compression via Alternating Minimization
1: Initialize X̂_0(u) for u = 1, 2, . . . , N_u
2: for k ← 1, 2, . . . until the error tolerance is met do
3:   Ĥ_k = ([X̂_{k−1}(1) F_L · · · X̂_{k−1}(N_u) F_L])† Y_f
4:   B(u) = F_L Ĥ_k(u), for u = 1, 2, . . . , N_u
5:   for n ← 1 to N do
6:     x̂_k^T(n) = y_n^T (B_n)†
7:   end for
8: end for
Here, x(n, u) is the n-th diagonal element of X̂_k(u), y_n^T is the n-th row of Y_f, and B_n is the N_u × N_r sub-matrix of B for the n-th subcarrier for the N_u users.
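The compression-ratio expressions for the three allocation cases can be compared numerically. The parameter values below (N = 4096, N_r = 64, L = 4, N_u = 4) are illustrative assumptions, not the paper's simulation settings:

```python
N, Nr, L, Nu = 4096, 64, 4, 4               # illustrative values (assumed)

cr_su = N * Nr / (N + L * Nr)               # (3): single user
cr_mu = N * Nr / (N * Nu + L * Nr * Nu)     # (16): overlapping allocation
cr_no = N * Nr / (N + L * Nr * Nu)          # (18): non-overlapping allocation

print(f"single user:     {cr_su:.1f}x")
print(f"overlapping:     {cr_mu:.1f}x")
print(f"non-overlapping: {cr_no:.1f}x")
```

With these values the single-user and non-overlapping cases land in the tens, while the overlapping case drops fastest with N_u because both the data samples and the channel samples scale with the number of users.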

D. MULTI-ANTENNA USERS
Multiple transmit antennas can be used either for diversity to improve the error rate or for spatial multiplexing to improve the data rate. Suppose a user in the uplink has N_t antennas. If the N_t antennas are used to transmit the same data (diversity), then the received signal matrix can be expressed as

Y_f = X_f F_L (H_t^1 + H_t^2 + · · · + H_t^{N_t}) + W_f, (19)

where the superscript denotes the user antenna number, i.e., H_t^i is the time-domain channel for transmit antenna i of the user. Letting H_t^1 + H_t^2 + · · · + H_t^{N_t} = H_t allows (19) to take the same form as (2) for the single user with a single antenna. Therefore, Algorithm 1 can be used for compression and the compression ratio remains the same as in (3).
When the multiple antennas are employed by the user to transmit different data streams (spatial multiplexing), then

Y_f = Σ_{i=1}^{N_t} X_f^i F_L H_t^i + W_f, (20)

where X_f^i is the N × N diagonal matrix of data symbols transmitted from antenna i of the user. This is similar to (10) for the multi-user case. Thus, the multiple antennas of a user can be treated as additional virtual users when they transmit different data streams. Therefore, the compression ratio for a single user employing N_t antennas for multiplexing is the same as given in (16), with N_t replacing N_u, and Algorithm 2 can be used for compression.
The above can be similarly extended to multiple users, each with N t antennas, by replacing N u by the total number of data streams, N s = N u N t in (10) and (16).

E. RECOVERY AT BBU
Upon receiving the samples corresponding to X̂ and Ĥ at the BBU, the matrix Y_f is reconstructed as

Ŷ_f = X̂ F_L Ĥ (21)

for the single-user case, and as

Ŷ_f = Σ_{u=1}^{N_u} X̂(u) F_L Ĥ(u) (22)

for the multi-user case. Once Y_f is reconstructed, standard pilot-based channel estimation can be used at the BBU for data recovery. The estimated channel coefficients are used to apply either MRC in the single-user case or zero-forcing (ZF) equalization in the multi-user case to Ŷ_f in order to recover the user data X_f.
The X̂ received from the RRH is not directly used as the estimate of the user data X_f for the following two reasons: 1) The solution {X̂, Ĥ} obtained in Algorithms 1 and 2 is unique only up to a scalar constant, and the product in (21) or (22) can converge to Y_f even when X̂ and Ĥ do not individually converge to the actual X_f and H_t. A detailed discussion on this is given in the Appendix. 2) In the case of network architectures where several MIMO RRHs share a single BBU pool, the BBU can have knowledge of interferers, which allows it to obtain a better estimate of the channel than the Ĥ received from the RRH.
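The scalar ambiguity in reason 1) is easy to see numerically: scaling X̂ by any λ ≠ 0 and Ĥ by 1/λ leaves the product (21) unchanged, and a single known pilot symbol suffices to estimate λ. A minimal sketch, with assumed sizes and pilot position:

```python
import numpy as np

rng = np.random.default_rng(3)
N, Nr, L = 32, 8, 3                          # illustrative sizes (assumed)

n = np.arange(N)
F_L = np.exp(-2j * np.pi * np.outer(n, np.arange(L)) / N)
x = rng.choice([1+1j, 1-1j, -1+1j, -1-1j], size=N)
H = rng.standard_normal((L, Nr)) + 1j * rng.standard_normal((L, Nr))

lam = 0.7 - 1.3j                             # arbitrary non-zero scalar
x_hat, H_hat = lam * x, H / lam              # a scaled solution pair

# The reconstructed product (21) is identical for both pairs
Y1 = np.diag(x) @ F_L @ H
Y2 = np.diag(x_hat) @ F_L @ H_hat
print(np.allclose(Y1, Y2))

# One known pilot resolves the scalar (pilot position assumed known)
pilot_idx, pilot_val = 0, x[0]
lam_est = x_hat[pilot_idx] / pilot_val
print(np.allclose(x_hat / lam_est, x))
```

This is why the text's later remark about almost-blind demodulation needs only a single pilot: the ambiguity is one complex scalar, not a full matrix.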

IV. PERFORMANCE ANALYSIS UNDER DIFFERENT PROPAGATION CONDITIONS
In this section, the robustness of the proposed method is assessed under different channel conditions by analyzing how various channel parameters impact its error performance, comparing it against an uncompressed system for reference. The analysis is confined to the single-user system since the proposed method uses the same iterative minimization approach for both single-user and multi-user systems. The multi-path channel can be modeled as a finite impulse response (FIR) filter characterized by the following three parameters:
• Number of channel taps
• Power of the channel taps
• Delay of the channel taps
Another factor that affects error performance is the correlation between the receive antennas. Monte Carlo simulations are used to evaluate the effect of the above parameters on the uncoded symbol error rates (SER) when the fronthaul data is compressed using the proposed method. The performance of the method under realistic channel models is also assessed in this section. The common simulation parameters are summarized in Table 1. After reconstruction of Y_f, two different interpolation methods are used for channel estimation at the BBU: (a) linear interpolation, where the frequency-domain channel estimates obtained at the pilot subcarriers are linearly interpolated to the rest of the subcarriers, and (b) FFT interpolation, where the pilot-based frequency-domain channel estimates are first converted to the time domain and then interpolated to all the subcarriers using an FFT operation.
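The two interpolation methods can be sketched for a noiseless channel with uniformly spaced pilots. The FFT size, tap count, and pilot spacing below are assumptions; with L taps at integer delays and N/D pilot positions satisfying L ≤ N/D, the FFT method recovers the full response exactly:

```python
import numpy as np

rng = np.random.default_rng(4)
N, L, D = 64, 4, 8                          # FFT size, taps, pilot spacing (assumed)

h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
H_true = np.fft.fft(h, N)                   # frequency response on all N subcarriers

pilots = np.arange(0, N, D)                 # pilot subcarrier indices
H_p = H_true[pilots]                        # noiseless pilot estimates

# (a) Linear interpolation, real and imaginary parts separately
# (subcarriers beyond the last pilot are held at the last pilot value)
H_lin = np.interp(np.arange(N), pilots, H_p.real) \
        + 1j * np.interp(np.arange(N), pilots, H_p.imag)

# (b) FFT interpolation: pilot estimates -> time domain -> all subcarriers
h_est = np.fft.ifft(H_p)                    # length N/D time-domain response
H_fft = np.fft.fft(h_est[:L], N)            # keep L taps, re-expand to N points

print(np.max(np.abs(H_fft - H_true)))       # round-off level in this ideal case
```

The exactness of (b) in this idealized setting mirrors the section's finding that FFT interpolation outperforms linear interpolation when the tap-count estimate is adequate.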

A. NUMBER OF TAPS
The number of taps determines the dimension L of the multi-path channel matrix H_t in the decomposition Y_f = X_f F_L H_t + W_f. In the 3rd Generation Partnership Project (3GPP) 5G standards, the number of taps in the channel can be estimated from the Physical Random Access Channel (PRACH). This estimate is used to fix the dimension of Ĥ produced by the proposed method, which affects the achieved compression ratio (CR): the greater this dimension, the lower the achieved CR as per (3), (16), and (18). The scenario of interest is when this estimate is not exact.
The error performance of the proposed method is examined when the estimate of the number of channel taps available at the RRH, L_est, is not equal to its true value, L. This affects the dimension of Ĥ output by the proposed fronthaul compression algorithm at the RRH. It can also affect the channel estimation at the BBU when the pilot-based estimates are converted to the time domain in the FFT interpolation method.
Two cases are considered here: (i) L_est < L and (ii) L_est ≥ L. Fig. 4(a) compares the uncoded SER of the proposed matrix decomposition method for these two cases against the uncompressed system when FFT interpolation is used at the BBU for pilot-based channel estimation. It is observed that when L_est ≥ L, the proposed method performs slightly better than the uncompressed system, whereas when L_est < L, it performs worse. Fig. 4(b) shows the SER for both the proposed method and the uncompressed system with linear interpolation at the BBU for channel estimation after reconstruction of Y_f. There is only one curve for the uncompressed system as it is not affected by the error in L_est. Here, it is observed that matrix decomposition has nearly 4 dB gain over the uncompressed system. The uncompressed system suffers a 4 dB loss in performance with linearly interpolated channel estimates versus the FFT-interpolated estimates in Fig. 4(a), whereas matrix decomposition suffers only a 2 dB loss. It can also be concluded from both these figures that as long as L_est ≥ L, matrix decomposition offers a small denoising gain similar to that observed in other low-rank approximation methods such as PCA. However, when L_est < L, some of the diversity provided by multi-path propagation is lost.

B. POWER DELAY PROFILE
The effect of the power of the channel taps on the proposed method is examined next. The tap powers of the channel filter are modeled as a decaying exponential, and the error performance for different decay rates is studied. Fig. 5(a) shows the uncoded SER for 10 iterations of Algorithm 1 for two decay rates, 1 dB/tap and 10 dB/tap. It is observed that the method performs well if the difference in tap powers is large; it performs poorly at higher SNR if the tap powers are too close together. Fig. 5(b) shows the SER for 50 iterations of the algorithm, and it can be observed that the 1 dB decay curve now matches the 10 dB decay curve, and both provide a small denoising gain over the uncompressed system. This implies that it is harder for the algorithm to distinguish the different multi-paths for the estimate Ĥ when their powers are similar; however, given enough iterations, the algorithm manages to identify them correctly.

C. TAP DELAYS
The matrix F_L in the signal model Y_f = X_f F_L H_t + W_f denotes the L columns of the N × N DFT matrix corresponding to the channel tap delays in units of the sampling duration. This should match the F_L used in (4) of the proposed alternating minimization algorithm for best results. However, in practice, the tap delays are not known at the receiver. It was noted in Section IV-A that as long as L_est ≥ L, the method performs well. Therefore, a value of L_est is chosen that is large enough to cover the channel tap with the largest delay, and F_L_est, the first L_est columns of the N × N DFT matrix, is used in place of F_L in (4). This changes the compression ratio (CR) in (3) to

CR = N N_r / (N + L_est N_r). (23)

This works well in practice since multi-paths that are extremely delayed have negligible power and can be ignored while choosing the value of L_est. This circumvents the lack of information on the tap delays at the cost of only a slightly lower CR. Fig. 6 shows the SER performance of the proposed method and the uncompressed system for a channel with four taps and a delay of four samples between each tap. F_L_est with L_est = 16 (corresponding to the delay of the last tap) is used in place of F_L in Algorithm 1. The proposed method performs better than the uncompressed system while still providing a CR of 32.
It can be concluded from this section that the proposed method can provide good error performance even when none of the channel parameters are known, as long as some of the compression ratio is sacrificed by fixing a large enough value of L_est. This method can also be used to perform an almost blind demodulation of MIMO OFDM symbols by using just a single pilot to estimate the scalar λ in Lemma 1 in the Appendix.

D. IMPACT OF RECEIVE ANTENNA CORRELATION
Massive MIMO antenna arrays have antenna elements spaced close together, making correlation between the antenna elements unavoidable in practical channels. Spatial correlation decreases the MIMO gain and capacity. The most widely used antenna correlation model is the exponential correlation model [24], which is also adopted by the 3GPP standards for 5G NR [25]. This model makes physical sense in most scenarios, as the correlation between antenna elements varies inversely with the distance between them. In this section, the impact of spatial correlation between the RRH antenna elements on the SER of the proposed method is evaluated. A uniform linear array (ULA) is assumed at the RRH with correlation matrix R given by

R_ij = ρ^|i−j|, 1 ≤ i, j ≤ N_r,

where ρ is the correlation coefficient with 0 ≤ ρ ≤ 1. Fig. 7 shows the SER for three levels of correlation, ρ = 0 (no correlation), ρ = 0.5 (moderate correlation) and ρ = 0.95 (very high correlation), for the proposed method and the uncompressed system. Fig. 7(a) shows the SER for 10 iterations of Algorithm 1. It is observed that the proposed matrix decomposition method matches the uncompressed system performance when the correlation is not high, but performs worse at very high correlation. This is because when the antennas are highly correlated, the algorithm struggles to distinguish between the different antennas, similar to the trend observed in Section IV-B for tap powers. However, when the algorithm is given sufficient time (iterations), it again manages to distinguish between the antennas. This is demonstrated in Fig. 7(b), where the SER of matrix decomposition for very high correlation matches that of the uncompressed curve when the number of iterations is increased to 50.
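The exponential correlation matrix, and one common way of imposing it on an i.i.d. channel (right-multiplying by R^{1/2}; this square-root construction is a standard choice, not something the text specifies), can be sketched as:

```python
import numpy as np

def exp_correlation(n_rx, rho):
    # Exponential model for a ULA: R[i, j] = rho ** |i - j|.
    idx = np.arange(n_rx)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def apply_rx_correlation(H, rho):
    # Impose receive-side correlation as H @ R^{1/2}.
    R = exp_correlation(H.shape[1], rho)
    w, V = np.linalg.eigh(R)  # R is real symmetric PSD
    R_half = (V * np.sqrt(np.maximum(w, 0.0))) @ V.T
    return H @ R_half

R = exp_correlation(4, 0.5)
# First row of R: 1, 0.5, 0.25, 0.125 -- correlation decays with spacing.
```

At ρ = 0.95 adjacent columns of the effective channel become nearly parallel, which is the numerical reason the factorization needs more iterations to separate the antennas.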

E. PERFORMANCE IN PRACTICAL CHANNELS
In this section, the performance of the proposed matrix decomposition method in realistic multi-path fading propagation environments is examined. The 3GPP technical specifications for 5G NR provide channel models developed from real channel measurements for use in link level simulations [22], [26]. The tapped delay line (TDL) models for sub-6 GHz given therein are used here to test the performance of the proposed compression method in low, medium and high delay spread environments. Linear interpolation is used at the BBU for channel estimation in the simulation of all three cases. Fig. 8 shows the SER performance of the proposed method and the uncompressed system for the NR channel model TDLA30, which represents a low delay spread environment. SERs for two values of L_est, 2 and 6, are plotted, which provide compression ratios (CRs) of 57 and 51, respectively. It can be observed that the method performs well and provides about 3 dB of denoising gain compared to the uncompressed system when the delay spread is low. In medium and high delay spread channels, the channel has a larger number of taps due to the richer multi-path propagation conditions. Therefore, L_est has to be increased to obtain comparable SERs. This is demonstrated in Fig. 9 and Fig. 10. Fig. 9 shows the SER for the medium delay spread channel model, TDLB100. With L_est = 6 and 12, matrix decomposition offers CRs of 51 and 36.6, respectively. The denoising gain is reduced for the lower value of L_est at higher SNR. In Fig. 10, L_est = 10 and 20 provide CRs of 39.4 and 28.4, respectively, for the high delay spread channel model, TDLC300. It can be observed that the denoising gain is further reduced at higher SNRs despite the increase in L_est.
The algorithm is also tested with the E-UTRA channel models, the Extended Pedestrian A (EPA) and Extended Vehicular A (EVA) models given in [22]. The values of L_est are chosen as 5 and 30, giving compression ratios of 48.8 and 22.3 for a single user for the EPA and EVA models, respectively. Fig. 11 shows that the method provides a very good denoising gain for both models, especially at low SNR, but the performance under the EVA model deteriorates at high SNR even after 50 iterations of Algorithm 1, because its delay spread is larger, in line with the observations from Section IV-A.

V. ALGORITHM CONVERGENCE AND COMPLEXITY
The numerical convergence of the algorithm is evaluated for the different channel models discussed above. The normalized error between the received data matrix Y_f and the matrix reconstructed at the BBU, Ŷ_f, is plotted against the iteration number at 0 dB SNR in Fig. 12(a) and at 20 dB SNR in Fig. 12(b). It can be observed that the algorithm converges quickly (in 2 iterations) for the low delay spread channel, TDLA30. It takes about 7 iterations for the TDLB100 model and converges the slowest (20 iterations) for the high delay spread model, TDLC300. For reference, the noise variance is equal to one in the 0 dB case, and the proposed method converges to an error lower than this, demonstrating the denoising gain provided by the algorithm, as observed in the previous section.
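The convergence metric can be implemented as a per-sample reconstruction error, which is directly comparable to the noise variance; the exact normalization below is an assumption about what is plotted:

```python
import numpy as np

def per_sample_error(Y_f, Y_hat):
    # Mean squared reconstruction error per sample. At 0 dB SNR the noise
    # variance is 1, so converging below 1 indicates a denoising gain.
    return np.linalg.norm(Y_f - Y_hat, 'fro') ** 2 / Y_f.size
```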
The above results suggest that the proposed method performs well in low and medium delay spread environments. However, for high delay spread channels, L_est needs to be very high and more iterations are required to achieve good performance. When L_est is high, even though the method still offers higher compression ratios than the existing methods [9], [13], algorithm complexity considerations come into play; these are discussed in detail next.
The computational complexity of the proposed compression method can be analyzed as a function of the dimensions N, L and N_r. It can be observed from Algorithm 1 that the most computationally intensive operation is step 5, the calculation of

Ĥ = (X̂_k F_L)^† Y_f. (24)

The matrix pseudo-inverse is usually calculated via the singular value decomposition (SVD), which has a time complexity of the order of LN^2 computations, i.e., O(LN^2), since X̂_k F_L has dimension N × L and N > L. However, considering that typically N ≫ L and noting that the matrix X̂_k is diagonal, expanding (24) as

Ĥ = (F_L^H X̂_k^H X̂_k F_L)^{-1} F_L^H X̂_k^H Y_f (25)

reveals that replacing the SVD-based pseudo-inverse calculation with the expanded expression in (25) can lead to significant savings in computation. This arises from the fact that F_L^H X̂_k^H X̂_k F_L is only an L × L matrix, whose inverse can be calculated with significantly fewer computations than the SVD of an N × L matrix when N ≫ L. Considering that N ≫ N_r ≥ L, the approximate complexity of one iteration of Algorithm 1 is O(LNN_r), which is comparable to the complexity of O(N_r^3) of the SVD step in PCA compression [13]. However, multiple iterations are required for Algorithm 1 to converge to Y_f, which makes it computationally more intensive than the PCA compression in [13].
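The saving from (25) can be checked numerically; the sketch below, with illustrative dimensions, verifies that the L × L solve reproduces the SVD-based pseudo-inverse result:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, N_r = 256, 4, 8

# Illustrative diagonal data estimate X_k, partial DFT F_L and data Y_f.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X_k = np.diag(x)
F_L = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
Y_f = rng.standard_normal((N, N_r)) + 1j * rng.standard_normal((N, N_r))

A = X_k @ F_L

# Step 5 via SVD-based pseudo-inverse, as in (24): O(LN^2).
H_svd = np.linalg.pinv(A) @ Y_f

# Expanded form (25): only an L x L system is inverted, O(LNN_r) overall.
# X_k is diagonal, so X^H X is just the element-wise powers |x_i|^2.
G = F_L.conj().T @ (np.abs(x)[:, None] ** 2 * F_L)  # F^H X^H X F, L x L
b = F_L.conj().T @ (x.conj()[:, None] * Y_f)        # F^H X^H Y_f, L x N_r
H_fast = np.linalg.solve(G, b)

assert np.allclose(H_svd, H_fast)
```

The equivalence holds because A has full column rank, in which case A^† = (A^H A)^{-1} A^H.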
The breakdown of (25) into steps 1-5 above shows that the algorithm is only memory-intensive, not processor-intensive, since all the steps except Step 4 involve only multiplications and additions. Step 4 calculates the inverse of an L × L matrix; however, since the size L is small (typically L < 10) because of channel hardening in the massive MIMO setting [15], it can even be implemented on a field programmable gate array (FPGA) [27]. In contrast, the PCA compression in [13] needs to compute the SVD of a large N × N_r matrix, which involves the calculation of square roots, a much more processor-intensive operation compared to multiplications and additions.

VI. PERFORMANCE EVALUATION AGAINST OTHER COMPRESSION METHODS
The QR decomposition-based compression method [9] and the PCA-based compression method [13] offer the best compression ratios and performance in the existing literature. Therefore, these methods are used to benchmark the proposed compression method on two criteria: compression ratio (CR) and symbol error rate (SER).

A. COMPARISON OF ACHIEVED COMPRESSION RATIOS
For a single user in the system, the CR of the proposed method is given by (3). The CR for the PCA/QR-based compression in [9], [13] is given by

CR_PCA = N N_r / (L(N + N_r)). (26)

It can be observed from (3) and (26) that for large values of N, which is the case when N is the OFDM symbol length for large bandwidths, having N ≫ max{L, N_r} implies

CR ≈ N_r and CR_PCA ≈ N_r / L, (27)

which gives

CR / CR_PCA ≈ L. (28)

For the case when multiple users share the same set of N subcarriers (overlapping allocation), the CR of PCA/QR-based compression becomes N N_r / (L N_u (N + N_r)). Comparing this with (16) when N ≫ max{L, N_r, N_u} again gives a gain of approximately L in CR for the proposed method. The CRs for different values of N, N_r, L and N_u for both methods are given in Tables 2 and 3. The TDLA30 channel with L = 12 taps [22] is used for the calculation of the CRs. Since L ≪ N, the FFT size, the decomposition of Y into the approximate factors X̂ of length N and Ĥ of size L × N_r using the proposed method results in a compression of 30-50×. It is noted that the proposed compression method can give nearly an order of magnitude higher CRs than the PCA compression in [9], [13], as illustrated by equations (27) and (28). This means that for a 64-antenna, 100 MHz system, the proposed compression method reduces the required fronthaul data rate from 236 Gbps to approximately 5 Gbps for a single-user system, and 20 Gbps for four users in a multi-user MIMO system.
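The single-user CR expressions can be evaluated directly; the sketch below reproduces the CR values quoted in Section VI-B (the closed forms are reconstructed from those quoted values, so treat them as assumptions rather than the paper's exact equations):

```python
def cr_proposed(N, N_r, L_est):
    # Proposed method: N data samples plus an L_est x N_r channel estimate,
    # in place of N * N_r raw frequency-domain samples.
    return N * N_r / (N + L_est * N_r)

def cr_pca(N, N_r, L):
    # PCA/QR compression: L retained components, each needing a length-N
    # score vector and a length-N_r loading vector.
    return N * N_r / (L * (N + N_r))

# 4096-point FFT, 64 antennas, L = 12 taps (TDLA30):
print(round(cr_proposed(4096, 64, 12), 1))  # 53.9
print(round(cr_pca(4096, 64, 12), 2))       # 5.25
```

The ratio of the two CRs is close to L for large N, which is the factor-of-L advantage in (28).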

B. SYMBOL ERROR RATE PERFORMANCE
Monte Carlo simulations are used to evaluate the uncoded SER of the proposed method, PCA compression and the uncompressed system. The simulation parameters are summarized in Table 4. The exponential correlation model [24] with correlation coefficient 0.7 is used to model the antenna correlation for a uniform linear array at the RRH. For the multi-user case, N_u = 4 users are assumed to share the same set of N subcarriers (overlapping allocation). Fig. 13 compares the uncoded SERs of the proposed method and PCA compression for the TDLA30 channel model. SERs for the uncompressed system are also plotted as a baseline. After quantizing the channel tap delays to the resolution given by the 4096-point FFT and 30 kHz subcarrier spacing, the value L_est = 12 was chosen. For a single-user system, this gives compression ratios of 53.9 and 5.25 for the proposed matrix decomposition method and the PCA method, respectively. Fig. 13(a) shows the SER for a single user in the system. It is observed that the proposed method performs as well as PCA while providing more than 10× its compression. Both the proposed method and PCA also provide a denoising gain of approximately 2 dB over the uncompressed system. Fig. 13(b) shows the SERs for N_u = 4 users in the system (multi-user MIMO) with overlapping subcarrier allocation. The proposed method provides a compression ratio of 13.13 and matches the SER of the uncompressed system, while the SER of PCA degrades at high SNR even though it provides only a tenth of the compression provided by the proposed method.

VII. CONCLUSION
The capacity of the fronthaul optical link is a major bottleneck in the implementation of OFDM massive MIMO networks with the split architecture specified by O-RAN. In this work, a method of fronthaul compression for the uplink of massive MIMO that exploits the convolution structure of the data received at the RRH was proposed. An iterative alternating minimization approach was used at the RRH to approximate the received signal as the product of a diagonal user data matrix and a low rank channel response matrix, allowing the received signal to be reconstructed at the BBU using fewer samples. The performance of this method was analyzed for different channel parameters, and its robustness was tested in several realistic simulated channel models. The method can be tailored to both single-user and multi-user MIMO systems, and link level simulations show that it provides the same or even better symbol error rates than an uncompressed system. It can offer nearly an order of magnitude higher compression ratios than the existing compression methods. The proposed method can also be used for almost blind demodulation of MIMO OFDM symbols.

UNIQUENESS OF THE SOLUTION
The optimal solutions {Ĥ, X̂} to (4) and (6), respectively, obtained via Algorithm 1, are unique up to a scalar constant. This is proved in the following lemma.

Lemma 1: If {Ĥ, X̂} and {H̃, X̃} are both solutions to (4) and (6), then H̃ = λĤ and X̃ = (1/λ)X̂, where λ ∈ C, the set of complex numbers.

Proof: If {Ĥ, X̂} and {H̃, X̃} are both solutions to (4) and (6), then

X̂ F_L Ĥ = X̃ F_L H̃. (30)

Let X̃ = X̂ X* and H̃ = H* Ĥ, where X* is an N × N matrix and H* is an L × L matrix. Then,

X* F_L H* = F_L (31)

is needed to satisfy (30). If X* is a rank-N diagonal matrix, then F_L H* = (X*)^{-1} F_L. If f_m denotes the m-th row of F_L and λ_m denotes the m-th diagonal element of (X*)^{-1}, then the above can be written as

f_m H* = λ_m f_m, m = 1, 2, . . . , N. (32)

Thus, the rows of F_L form the left eigenvectors of the matrix H*. F_L is a rank-L matrix; therefore, f_{L+1} can be expressed as a linear combination of the first L rows:

f_{L+1} = Σ_{i=1}^{L} c_i f_i. (33)

Combining (32) and (33) leads to

λ_{L+1} Σ_{i=1}^{L} c_i f_i = Σ_{i=1}^{L} c_i λ_i f_i, (34)

which can hold true only when all the λ_i are equal. This gives X* = (1/λ) I_N and H* = λ I_L, where λ_i = λ for i = 1, 2, . . . , N and I_K denotes the identity matrix of dimension K. Note that {Ĥ H*, X̂ X*} is also a solution to (4) and (6), respectively, where H* = λ I_{N_r}. To see this, use A = F_L Ĥ in (30). Then the condition to be satisfied, X* A H* = A, is of the form in (31) and the result follows.
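The scalar ambiguity of Lemma 1 is easy to verify numerically: scaling X̂ by 1/λ and Ĥ by λ leaves the product, and hence the reconstructed received matrix, unchanged. A minimal sketch with illustrative dimensions:

```python
import numpy as np

rng = np.random.default_rng(1)
N, L, N_r = 64, 4, 8

# Illustrative diagonal data matrix X, partial DFT F_L and channel H.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X = np.diag(x)
F_L = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(L)) / N)
H = rng.standard_normal((L, N_r)) + 1j * rng.standard_normal((L, N_r))

lam = 2.0 - 1.5j
Y = X @ F_L @ H
# The scaled pair {(1/lam) * X, lam * H} yields the same received matrix,
# so the factorization is identifiable only up to the scalar lam.
Y_scaled = (X / lam) @ F_L @ (lam * H)
assert np.allclose(Y, Y_scaled)
```

This is why a single pilot suffices for near-blind demodulation: it pins down the one free scalar λ.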