Unsupervised Learning for Reference Signals Overhead Reduction in 3GPP MIMO Systems

The unprecedented increase in the number of wireless-connected devices requires novel solutions to improve the data rate at low latency. Reducing reference signal overhead is a powerful way to increase data rates. However, excessive reduction in the number of reference signals degrades the channel estimation performance, with potential negative impacts on the data rates. To this end, this paper proposes a machine learning-based approach that enables reference signal-free data channel demodulation. The new approach repeats part of the data channel symbols across the slot. Invoking canonical correlation analysis on the repeated data at the user side yields high-quality combiners that are used to recover both the repeated data blocks and the rest of the data symbols in the slot, without the need for traditional channel estimation. This paper also proposes two effective and principled strategies: one for selecting the repetition pattern as a function of the channel parameters, and the other for improving performance in highly frequency-selective channels. The proposed approach offers considerable gains in throughput and a reduction in complexity. Simulation results using a 3GPP NR link-level test bench reveal the effectiveness of the proposed approach relative to state-of-the-art methods.


I. INTRODUCTION
THE ever-growing number of consumer wireless devices, together with data-hungry applications, has introduced unprecedented spectrum usage challenges [1]. While the current 5G New Radio (NR) uses massive Multiple Input Multiple Output (MIMO) [2] and millimeter wave (mmWave) communications [3] to address these challenges [4], [5], 5G Advanced and beyond need further improvements in end-user data rates, cost efficiency, and communication latency.
The 3rd Generation Partnership Project (3GPP) 5G NR defines Reference Signals (RS) for various purposes: for example, Channel State Information RS (CSI-RS) is used for channel state information reporting, while Demodulation RS (DM-RS) is used for estimating the effective (precoded) channel response for equalization and demodulation of the data and control channels [6].
Traditional approaches for equalization require a two-step solution: channel estimation based on the received RS, followed by equalization. Once the channel is estimated at the RS locations, interpolation (and possibly extrapolation) is typically needed to estimate the channel at the neighboring locations in the time/frequency grid. An equalizer is then used to recover the data signal. Several equalization approaches, ranging from the optimal maximum likelihood detector [7], zero-forcing, Minimum Mean Squared Error (MMSE) [8], and the sphere decoder [9] to semi-definite relaxation-based techniques [10], can be used to decode the desired signal. These methods, however, exhibit performance-complexity trade-offs, and they all require accurate CSI to guarantee reliable detection performance. For Orthogonal Frequency-Division Multiplexing (OFDM) signals, as used in Long-Term Evolution (LTE) and NR, channel estimation may be performed using either the Least Squares (LS) or the MMSE estimator. While MMSE-based estimation yields better performance than the LS approach, it incurs high computational complexity and requires some prior knowledge of the channel statistics.
RS-based channel estimators require a sufficient number of RS to ensure acceptable channel estimation performance. Moreover, for multi-layer transmissions, whether Single-User (SU) or Multi-User (MU) MIMO, the DM-RS associated with different antenna ports need to be orthogonal. As the number of layers/antenna ports increases for massive MIMO, the DM-RS overhead increases, which may negatively impact the data rates. Any loss of orthogonality of the received DM-RS may further degrade the performance.
Considering the aforementioned challenges (complexity and RS overhead), it becomes apparent that overhead reduction is a promising direction that may enhance the system's performance. This motivates the following question: is it possible to bypass all the channel estimation procedures and design the transmitted data in a way that enables simple and unsupervised (pilot-free) detection at the receiver, at considerably lower complexity relative to state-of-the-art methods?
The answer is, surprisingly, affirmative. This paper proposes a new DM-RS-free OFDM frame structure that employs a simple repetition protocol on a part of the user data in the time-frequency grid. The repetition structure is utilized at the receiver to invoke Canonical Correlation Analysis (CCA) and derive the weights (combiners) needed to determine the signals of interest. While one can deem the proposed framework as repetition coding (an error control coding mechanism), it is fundamentally different since the decoding capitalizes on CCA.
CCA is a widely known machine learning tool that aims at finding linear combinations of two random vectors such that the resulting pair of random variables is maximally correlated [11]. In a recent work [12], the authors presented a powerful and broadly useful interpretation of CCA as a subspace intersection method that can identify a shared common subspace between two multi-antenna signal views, even under strong interference from individual components. CCA has found a plethora of applications in signal processing and wireless communications, including but not limited to direction-of-arrival estimation [13], equalization [14], spectrum sharing [15], radar [16], [17], blind source separation [18], [19], cell-edge user detection [12], [20], and multi-view learning [21], [22].
Our contributions can be summarized as follows:
• We propose a new OFDM frame structure that is free from DM-RS. The new frame structure involves a simple repetition of part of the Physical Downlink Shared Channel (PDSCH) data. By exploiting the repetition structure at the User Equipment (UE) side, coupled with CCA-based processing, the signal of interest can be identified under relatively mild conditions, even at significantly low Signal-to-Interference-plus-Noise Ratio (SINR). From a computational perspective, the proposed end-to-end solution requires solving a series of principal eigenvalue problems that can all be solved in parallel using the power method. This renders our approach much more computationally favorable in practice compared to existing methods.
• We show that choosing the repetition pattern, e.g., time versus frequency repetition, is critical for the proposed method to work appropriately. In particular, we propose a strategy that ties the best repetition pattern to the channel parameters (e.g., delay spread and Doppler). We then support the proposed strategy with numerical experiments that show how strongly the choice of pattern affects the CCA performance.
• To deal with frequency selectivity, we propose an effective solution, referred to as sub-gridding, that can boost the CCA performance in severely frequency-selective scenarios. Further, we derive theoretical conditions on the relation between the sub-grid size and the sub-band size defined in 3GPP.
• To demonstrate the effectiveness of our proposed method, we provide a comprehensive suite of simulations on a 3GPP link-level test bench using 3GPP channel models from the 38.901 specifications [23]. We compare our results with DM-RS-based channel estimation using LS followed by MMSE-based equalization. The simulation results demonstrate the superiority of the proposed approach compared to the baseline in terms of performance, overhead, and computational complexity. In particular, we show that our method achieves more than 40% gain in throughput over the state of the art.
A preliminary version of part of the results in this paper has been accepted at the IEEE Global Communications Conference (GLOBECOM) 2023 [24]. Relative to the conference paper, this journal version includes i) a comprehensive CCA solution for highly frequency-selective channels with new necessary conditions on the sub-grid size and its relationship with the sub-band size defined in 3GPP, ii) a section on system considerations for employing the proposed method, iii) extensive simulation results in terms of Symbol Error Rate (SER) to assess the performance of CCA on a wide set of parameters (including phase ambiguity resolution and higher-order modulation), and iv) link-level simulation results incorporating the NR Forward Error Correction (FEC) processing chain, thereby showing the benefits of the proposed method in terms of overhead reduction and system performance in an end-to-end setup.
The rest of this paper is organized as follows. After briefly reviewing CCA in Section II, Section III presents the system and signal models and describes the limitations of the legacy method for the considered problem. The proposed CCA solution, together with the pattern design and the sub-gridding solution for frequency-selective scenarios, is described in Section IV. Experimental results are provided in Section V, and conclusions are drawn in Section VI.

II. CANONICAL CORRELATION ANALYSIS
In statistics, CCA is a method for interpreting associations between two sets of variables. It determines a set of variates, which are linear combinations of the variables in each set, that best explain the commonality between the two sets. More specifically, assume that $T$ samples of the random vectors $y_1 \in \mathbb{R}^{M_1}$ and $y_2 \in \mathbb{R}^{M_2}$, which represent two views of the same entity, are available. For example, $y_1$ could be a stress measure of an individual and $y_2$ their tendency to smoke, with data available for $T$ individuals. CCA aims to find a common latent representation that relates, for instance, stress level with smoking tendency, by deriving meta-variables, one for each view, that are strongly correlated with each other.
Let $y_\ell^{(t)}$, $\ell \in \{1, 2\}$, be the $t$-th sample of view $\ell$, where $t \in \{1, \ldots, T\}$, and let $Y_\ell := [y_\ell^{(1)}, \ldots, y_\ell^{(T)}] \in \mathbb{R}^{M_\ell \times T}$ collect the samples of view $\ell$. CCA aims to find a pair of linear combinations of the two views that are maximally correlated with each other. More precisely, let $R_{\ell m} := \frac{1}{T} Y_\ell Y_m^\top$ represent the autocorrelation matrix, if $\ell = m$, or the cross-correlation matrix, if $\ell \neq m$, of the random vectors $y_\ell$ and $y_m$. The vectors $q_1$ and $q_2$ (also known as canonical vectors) can then be obtained by solving the following optimization problem [11], [25],
$$\max_{q_1, q_2} \; \frac{q_1^\top R_{12} q_2}{\sqrt{q_1^\top R_{11} q_1}\,\sqrt{q_2^\top R_{22} q_2}}, \tag{1}$$
where, without loss of generality, the random vectors $y_1$ and $y_2$ are assumed to have zero mean. Since the solution of the CCA problem in (1) is not affected by re-scaling $q_1$ or $q_2$, together or independently, and the choice of the re-scaling factor is arbitrary, the problem in (1) is equivalent to
$$\max_{q_1, q_2} \; q_1^\top R_{12} q_2 \tag{2a}$$
$$\text{s.t.} \quad q_1^\top R_{11} q_1 = 1, \tag{2b}$$
$$\phantom{\text{s.t.}} \quad q_2^\top R_{22} q_2 = 1. \tag{2c}$$
We denote by $\lambda_1$ and $\lambda_2$ the Lagrangian multipliers associated with (2b) and (2c), respectively. The CCA problem in (2) can be transformed into a distance minimization equivalent of the form
$$\min_{q_1, q_2} \; \frac{1}{T} \left\| Y_1^\top q_1 - Y_2^\top q_2 \right\|_2^2 \quad \text{s.t. (2b) and (2c)}. \tag{3}$$
Unfolding the objective in (3) shows the equivalence to the problem in (2): under the constraints (2b) and (2c), the objective equals $2 - 2\, q_1^\top R_{12} q_2$. Problem (2) is non-convex, primarily because of its quadratic equality constraints.
It is essential to note, however, that our primary objective is not to derive a globally optimal solution for this problem. Instead, we follow a solution approach akin to that presented in [25].
By applying the Lagrange duality theorem, the optimal $q_1^*$ for (2) can be found by solving the generalized eigenvalue problem
$$R_{12} R_{22}^{-1} R_{21}\, q_1 = \lambda^2 R_{11}\, q_1, \tag{4}$$
with $\lambda = \lambda_1 = \lambda_2$, which then yields $q_2^*$ as
$$q_2^* = \frac{1}{\lambda} R_{22}^{-1} R_{21}\, q_1^*. \tag{5}$$
A detailed explanation of the solution of this optimization problem can be found in [25].
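To make the generalized eigenvalue route concrete, the following is a minimal NumPy sketch for real-valued views. The function name `cca_first_pair`, the symbol `R` for the sample correlation matrices, and the small diagonal regularizer `reg` are illustrative assumptions and not part of the paper; the sketch forms the sample correlation matrices, takes the principal eigenvector of $R_{11}^{-1} R_{12} R_{22}^{-1} R_{21}$, and recovers the second canonical vector by back-substitution.

```python
import numpy as np

def cca_first_pair(Y1, Y2, reg=1e-9):
    """Leading canonical pair of two real views (hypothetical helper).

    Y1: (M1, T) and Y2: (M2, T), one sample per column.
    Returns (q1, q2, lam), where lam is the leading canonical correlation.
    """
    # Remove the sample mean, since the formulation assumes zero-mean views.
    Y1 = Y1 - Y1.mean(axis=1, keepdims=True)
    Y2 = Y2 - Y2.mean(axis=1, keepdims=True)
    T = Y1.shape[1]
    # Sample auto- and cross-correlation matrices (reg is an assumption
    # added only to keep the inverses well defined).
    R11 = Y1 @ Y1.T / T + reg * np.eye(Y1.shape[0])
    R22 = Y2 @ Y2.T / T + reg * np.eye(Y2.shape[0])
    R12 = Y1 @ Y2.T / T
    # Generalized eigenproblem written as an ordinary one:
    # R11^{-1} R12 R22^{-1} R21 q1 = lam^2 q1.
    M = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R12.T))
    evals, evecs = np.linalg.eig(M)
    k = np.argmax(evals.real)
    lam = np.sqrt(max(evals[k].real, 0.0))
    q1 = evecs[:, k].real
    q1 = q1 / np.sqrt(q1 @ R11 @ q1)            # enforce q1' R11 q1 = 1
    q2 = np.linalg.solve(R22, R12.T @ q1) / lam  # back-substitution for q2
    return q1, q2, lam
```

With this scaling, `q2` automatically satisfies the unit-variance constraint, and `lam` equals the correlation between the two projected views.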

III. PROBLEM DESCRIPTION
A. SYSTEM AND SIGNAL MODELS
Consider a Down-Link (DL) transmission in a 5G NR network where a single base station (gNB) transmits data to a single User Equipment (UE) through a Physical DL Shared Channel (PDSCH). The gNB is equipped with $N_t$ antennas and serves a UE with $N_r$ antennas. For sub-band $k$, the gNB transmits $L$ data streams, each of length $N_{\text{data}}$, represented by the matrix $X^{(k)} \in \mathbb{C}^{L \times N_{\text{data}}}$, $k \in \{1, \ldots, N\}$, where $N$ represents the number of sub-bands. The received baseband signal for sub-band $k$ can then be described as
$$Y^{(k)} = H^{(k)} F^{(k)} X^{(k)} + W^{(k)}, \tag{6}$$
where $Y^{(k)} \in \mathbb{C}^{N_r \times N_{\text{data}}}$ is the group of received symbols over the $N_r$ antennas and $H^{(k)} \in \mathbb{C}^{N_r \times N_t}$ is the DL channel response matrix associated with the $k$-th sub-band. In order to support multi-stream transmission, the gNB precodes the data symbol matrix $X^{(k)}$ using the precoder $F^{(k)} \in \mathbb{C}^{N_t \times L}$. The gNB calculates the precoder based on the Channel State Information (CSI) reports from the UE; the CSI reports include the Channel Quality Indicator (CQI), Rank Indicator (RI), and Precoder-Matrix Indicator (PMI). The term $W^{(k)} \in \mathbb{C}^{N_r \times N_{\text{data}}}$ contains independent and identically distributed (i.i.d.) complex Gaussian noise entries of zero mean and variance $\sigma^2$ each. Throughout this work, the DL effective channel matrix $H^{(k)}_{\text{eff}} = H^{(k)} F^{(k)}$ is assumed to be unknown at the UE. We assume Wide-Band (WB) precoding of the entire Resource Grid (RG). This yields a single precoder, $F \in \mathbb{C}^{N_t \times L}$, and hence we suppress the superscript of $F^{(k)}$ (sub-band-based precoding is considered later in the simulation section).
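As an illustration of the signal model, the sketch below simulates one sub-band with a precoder built from the dominant right singular vectors of the channel. The i.i.d. Rayleigh channel, the QPSK alphabet, the default dimensions, and the name `simulate_subband` are assumptions chosen for the example, not settings taken from the paper.

```python
import numpy as np

def simulate_subband(Nt=8, Nr=4, L=2, N_data=156, sigma2=0.01, seed=0):
    """Simulate Y = H F X + W for one sub-band (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    # Assumed i.i.d. Rayleigh channel with unit-variance entries.
    H = (rng.standard_normal((Nr, Nt))
         + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
    # Precoder: the L dominant right singular vectors of H.
    _, _, Vh = np.linalg.svd(H)
    F = Vh.conj().T[:, :L]
    # Unit-power QPSK symbols, one row per layer.
    X = (rng.choice([-1.0, 1.0], (L, N_data))
         + 1j * rng.choice([-1.0, 1.0], (L, N_data))) / np.sqrt(2)
    # Complex Gaussian noise with variance sigma2 per entry.
    W = np.sqrt(sigma2 / 2) * (rng.standard_normal((Nr, N_data))
                               + 1j * rng.standard_normal((Nr, N_data)))
    H_eff = H @ F            # effective channel, unknown at the UE
    Y = H_eff @ X + W
    return Y, H_eff, X
```

With this precoder, the columns of the effective channel are mutually orthogonal, which is the property invoked later for the full-rank condition.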

B. DEMODULATION REFERENCE SIGNALS (DM-RS)
In 3GPP, data transmission is accompanied by a set of reference signals, such as DM-RS. DM-RS for PDSCH are intended for the estimation of $H_{\text{eff}}$ as part of coherent demodulation. They are present in the Resource Blocks (RBs) used for PDSCH transmission. DM-RS take different structures on the RG depending on their purpose. For instance, mapping type-A is mainly intended for the case where the data occupy most of the slot, while mapping type-B is used with low-latency applications [6]. Figure 1 provides an example of DM-RS of mapping type-A.
Legacy DM-RS-based channel estimation and equalization tend to have high computational complexity. Data recovery using DM-RS is a two-block process, where the first block creates an estimate of $H_{\text{eff}}$ at the DM-RS locations, usually followed by interpolation and extrapolation to all the remaining Resource Elements (REs), and the second block equalizes the data. This increases the complexity, especially when the number of allocated RBs is large. Moreover, the accuracy of channel estimation is influenced by factors such as the Doppler frequency, noise estimation, and the DM-RS density within the transmitted RBs. While increasing the DM-RS density may improve the channel estimation accuracy, it could hurt the data rate, as fewer REs are then used for data transmission. Finally, for the Multi-User (MU) transmission case, the DM-RS sequences associated with the co-scheduled UEs need to be orthogonal to guarantee acceptable estimation performance, a constraint that is difficult to achieve in multi-cell networks.
We aim to overcome the DM-RS shortcomings mentioned above by designing a CCA-based detection method that recovers the transmitted data symbols in an unsupervised manner, at a much lower complexity relative to the DM-RS approach. Our problem statement can be formally stated as follows.
Problem 1: Given the received signal $Y$ at the UE, it is required to design a transmission scheme at the gNB that enables unsupervised and DM-RS-free detection at the UE.

IV. PROPOSED SOLUTION
In this section, we explain how CCA can be exploited to solve the problem described in Section III. Instead of using the frame structure described earlier, which contains reserved REs for DM-RS symbols (colored in yellow), we propose a new DM-RS-free OFDM frame structure. The new frame structure requires repeating a few data symbols in the time-frequency grid, as depicted in Fig. 2. The repetition may be employed either in time or in frequency, and different repetition patterns can yield different performance, as will be explained in Section IV-A. The repetition structure is then utilized at the UE side to derive the combiners that are used to decode the PDSCH data in the repeated locations and in the neighborhood of the repeated symbols, as will be shown later. Unlike the DM-RS approach, which requires sending QPSK symbols in the reserved REs, our approach is agnostic to the PDSCH modulation scheme and can even work with analog signals. As we will see later, the proposed approach can identify the desired signal without imposing any structure on the transmitted waveform.
To explain how the proposed method works, let $x_{c\ell} \in \mathbb{C}^{N}$ denote the common/repeated signal associated with the $\ell$-th layer within a part of the sub-band, where $N$ represents the length of the common signal in one region and $N \ll N_{\text{data}}$. While the DM-RS framework requires the DM-RS sequences to be orthogonal in the multi-layer transmission case, our approach merely requires the repetition patterns associated with the different layers to not fully overlap, a very mild assumption that can be easily satisfied, as we will see in Section IV-A. The baseband equivalent model of the received signal at each of the two separate regions, using the repetition pattern of the $\ell$-th layer, can be expressed as
$$Y_{i\ell} = h^{(i\ell)}_{\text{eff}}\, x_{c\ell}^\top + \sum_{j \neq \ell} h^{(ij)}_{\text{eff}}\, x_{ij}^\top + W_{i\ell}, \tag{7}$$
where $i = 1, 2$ refers to the region/view index, $x_{ij}$ denotes the signal in region $i$ from the interfering layer $j$, and $h^{(i\ell)}_{\text{eff}}$ represents the channel response vector for region $i$ and layer $\ell$.
It should be noted that the two views in (7) share one common signal, associated with the layer whose repetition pattern is used; all the other layers are randomly permuted and hence have different signals in the two views. Note that the symbols in each region are assumed to have a constant effective channel (we will explain in Subsection IV-A how the patterns are chosen to satisfy this assumption).
In a recent work [12], the authors provided a new and widely useful algebraic interpretation of CCA. That is, under a linear generative model and in the noise-free regime, if two signal views have a common/shared component in addition to individual components in each view, then applying CCA to those two views will recover the shared component regardless of how strong the individual components are. It can be easily seen that the model in (7) fits this interpretation, as the two data views in (7) share the signal of the layer whose repetition pattern is used. Considering the minimum distance formulation of CCA in (3) [25], the results of [12] showed that the projections $Y_{1\ell}^\top q^\star_{1\ell}$ and $Y_{2\ell}^\top q^\star_{2\ell}$, using the optimal solutions $q^\star_{1\ell}$ and $q^\star_{2\ell}$ of problem (8), represent the signal of interest (common signal) up to a phase ambiguity. The optimal canonical vectors (referred to as combiners in the considered problem), $q^\star_{1\ell}$ and $q^\star_{2\ell}$, can be obtained by solving the eigen-decomposition problem in (4), followed by direct substitution of $q^\star_{1\ell}$ in (5) to find $q^\star_{2\ell}$. To clarify how the results of [12] can be mapped to the considered problem, let us define the matrix $X_{i\ell} := [x_{c\ell}, X_i] \in \mathbb{C}^{N \times L}$, where the columns of $X_i \in \mathbb{C}^{N \times (L-1)}$ are the signals associated with the other/interfering $L - 1$ layers. Similarly, the columns of the matrix $H_i \in \mathbb{C}^{N_r \times L}$ are the corresponding effective channel vectors. With these notations, we have the following result.
Proof: The proof follows from Theorem 1 in [12].
Remark 1: The full rank condition on the matrix $X_{i\ell}$ requires the CCA view length $N$ to be greater than or equal to the number of layers $L$ and the columns of $X_{i\ell}$ to be linearly independent. Since the transmissions associated with the different layers are independent, such requirements can be easily satisfied with modest $N$. On the other hand, the full rank condition on the effective channel matrix $H_i$ can be satisfied if i) the number of receive antennas is greater than or equal to the number of layers and ii) the columns of $H_i$ are linearly independent. Both conditions are satisfied with probability one since 1) the number of transmitted streams $L$ is always upper-bounded by $N_r$ and 2) the columns of $H_i$ are in fact orthogonal (because of the SVD-based precoding).
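The recovery property discussed above can be checked numerically with a small experiment. The sketch below is only an illustration under assumed conditions: two layers, i.i.d. Rayleigh effective channels that differ between the two views, QPSK symbols, an interfering layer that is 20 dB stronger than the common one, and a very small noise level that keeps the sample correlation matrices invertible. All function names are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    """Unit-variance circular complex Gaussian samples."""
    return (rng.standard_normal(shape)
            + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def qpsk(rng, n):
    """Unit-power QPSK symbols."""
    return (rng.choice([-1.0, 1.0], n)
            + 1j * rng.choice([-1.0, 1.0], n)) / np.sqrt(2)

def cca_combiners(Y1, Y2):
    """Complex CCA combiners via the principal eigenvector."""
    N = Y1.shape[1]
    R11 = Y1 @ Y1.conj().T / N
    R22 = Y2 @ Y2.conj().T / N
    R12 = Y1 @ Y2.conj().T / N
    M = np.linalg.solve(R11, R12 @ np.linalg.solve(R22, R12.conj().T))
    evals, evecs = np.linalg.eig(M)
    k = np.argmax(evals.real)
    lam = np.sqrt(evals[k].real)
    q1 = evecs[:, k]
    q1 = q1 / np.sqrt((q1.conj() @ R11 @ q1).real)
    q2 = np.linalg.solve(R22, R12.conj().T @ q1) / lam
    return q1, q2, lam

def demo(Nr=4, N=128, interferer_gain=10.0, sigma=0.01, seed=1):
    """Return (match, lam): similarity of the CCA output to the common signal."""
    rng = np.random.default_rng(seed)
    x_c = qpsk(rng, N)                      # common (repeated) signal
    views = []
    for _ in range(2):
        h_c, h_int = crandn(rng, Nr), crandn(rng, Nr)
        x_int = interferer_gain * qpsk(rng, N)   # differs in each view
        views.append(np.outer(h_c, x_c) + np.outer(h_int, x_int)
                     + sigma * crandn(rng, Nr, N))
    q1, q2, lam = cca_combiners(*views)
    s1 = q1.conj() @ views[0]               # combined output of view 1
    # Magnitude of the normalized inner product ignores the phase ambiguity.
    match = abs(np.vdot(x_c, s1)) / (np.linalg.norm(x_c) * np.linalg.norm(s1))
    return match, lam
```

A `match` value close to one indicates that the combined output equals the common signal up to a complex scaling, despite the much stronger interfering layer.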

A. CCA PATTERNS
For the $\ell$-th layer, we consider an RG of $N = 1$ sub-band and $N_{RB}$ RBs. Each RB is composed of REs that are divided among 12 sub-carriers vertically and 14 OFDM symbols horizontally. $N_{\text{data}}$ REs in the RG are loaded with data symbols from the transmit vector $x_\ell \in \mathbb{C}^{N_{\text{data}}}$, such that each symbol corresponds to an RE, while $N_{\text{res}}$ REs are reserved for control signals. Figure 1 provides an example of an RG with $N_{RB} = 1$, $N_{\text{data}} = 156$ (blue REs), and $N_{\text{res}} = 12$ (yellow REs).
REs in the $\ell$-th layer RG are located using matrix linear indexing. Assuming that the (sub-carrier, OFDM symbol) location of an RE in the RG is $(m, n)$ and both $m$ and $n$ start from 0, the corresponding index can be found as $i_{m,n} = 12 n N_{RB} + m + 1$. We define $I_{RE}$ as the set of all indices in an RG, where for $\ell \in [L]$, $I_{RE} = I^\ell_{\text{data}} \cup I^\ell_{\text{res}}$, such that $I^\ell_{\text{data}}$ and $I^\ell_{\text{res}}$ are two disjoint sets that correspond to the indices of data and reserved REs for layer $\ell$, respectively. Let the index set $I^\ell_s \subset I^\ell_{\text{data}}$ hold the source REs of the repeated symbols, with $|I^\ell_s| = N_\ell$, where $|\cdot|$ is the cardinality of a set. For ease of notation, we assume equal view length for all layers, i.e., $N_\ell = N$, $\forall \ell$. In this case, the common signal $x_{c\ell} \in \mathbb{C}^{N}$ is the vector of symbols formed by the data in the REs located with the index set $I^\ell_s$. We also let the index set $I^\ell_d$ hold the destination REs in which these symbols are repeated, so that $P^\ell = (I^\ell_s, I^\ell_d)$ is the CCA pattern for layer $\ell$, and the noise matrix $W_{i\ell} \in \mathbb{C}^{N_r \times N}$ of view $i$ contains i.i.d. complex Gaussian entries of zero mean and variance $\sigma^2$. Given the two views, $Y_{i\ell}$, $i \in \{1, 2\}$, the UE then solves the problem in (2) to find the combiners $q^*_{i\ell} \in \mathbb{C}^{N_r}$, $i \in \{1, 2\}$. The combiner $q^*_{i\ell}$ is then used to combine TBC block $B^{(i)}_\ell$ and the REs in its vicinity.
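The linear indexing rule $i_{m,n} = 12 n N_{RB} + m + 1$ can be sketched in a few lines of Python. The helper names `re_index` and `re_location` are hypothetical and are introduced only for this illustration.

```python
def re_index(m, n, n_rb):
    """1-based linear index of the RE at sub-carrier m and OFDM symbol n.

    Both m and n are 0-based; the grid has 12 * n_rb sub-carriers.
    """
    n_sc = 12 * n_rb
    if not 0 <= m < n_sc:
        raise ValueError("sub-carrier index out of range")
    if n < 0:
        raise ValueError("OFDM symbol index must be non-negative")
    return n_sc * n + m + 1

def re_location(i, n_rb):
    """Inverse mapping: 1-based linear index -> (m, n), both 0-based."""
    n, m = divmod(i - 1, 12 * n_rb)
    return m, n
```

For a single RB, the first RE of the slot maps to index 1 and the last RE (sub-carrier 11, OFDM symbol 13) maps to index 168, which matches $N_{\text{data}} + N_{\text{res}} = 156 + 12$ in the example of Figure 1.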
A TBC block's shape depends on the channel impulse response. For flat-fading and time-varying channels, TBC blocks exhibit a tall matrix shape, where the blocks are repeated in time to capture the possible time variation. Conversely, for frequency-selective and time-invariant channels, the blocks take a fat matrix shape, where the repetition is done over the frequency band. Figure 2 provides two examples of time and frequency repetition schemes for an RG composed of one RB. The red line defines the vicinity regions for both $B^{(1)}_\ell$ and $B^{(2)}_\ell$. For time repetition, $q^*_{1\ell}$ is used to combine the REs in OFDM symbols 0 → 6 within layer $\ell$, while $q^*_{2\ell}$ is used to combine the REs in OFDM symbols 7 → 13, as shown in Figure 2 (left). For frequency repetition, the entire bandwidth (BW) is divided into two parts, where $q^*_{i\ell}$ is used to combine the REs of part $i$, $i \in \{1, 2\}$, as shown in Figure 2 (right).
Algorithm 1 (Transmitter Side). Steps include: map the data vector $x_\ell \in \mathbb{C}^{N_{\text{data}}}$ on $G_\ell$ using $I^\ell_{\text{data}}$; repeat the common symbols.
Algorithms 1 and 2 summarize the transmitter and receiver procedures. In Algorithm 2, a generic combiner selection subroutine (comb_select) selects the combiner to be used ($q^*_{1\ell}$ or $q^*_{2\ell}$) for each RE and each layer. For the remainder of this paper, we adopt a selection procedure based on slicing the RG either vertically or horizontally, as mentioned previously. The design of an optimal combiner selection rule can improve the overall performance; however, due to space limitations, this is left for future study. It is also important to highlight that a unit cost of the problem in (2) is not guaranteed in noisy cases. We therefore set $\hat{x}_\ell(I^\ell_s)$ to the average of $Y_{1\ell}^\top q^*_{1\ell}$ and $Y_{2\ell}^\top q^*_{2\ell}$, which enhances the performance, as we will see later in the simulations section.
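The slicing-based selection rule can be sketched as follows. This is a hedged illustration only: the function signature, the string labels for the repetition type, and the assignment of the lower half of the bandwidth to the first combiner are assumptions, while the split of OFDM symbols 0 to 6 and 7 to 13 follows the description of Figure 2.

```python
def comb_select(m, n, n_rb, repetition):
    """Return 1 or 2: the combiner index used for the RE at (m, n).

    m: 0-based sub-carrier index, n: 0-based OFDM symbol index (0..13).
    repetition: "time" or "frequency" (labels assumed for this sketch).
    """
    if repetition == "time":
        # Vertical slicing: symbols 0..6 use q1, symbols 7..13 use q2.
        return 1 if n <= 6 else 2
    if repetition == "frequency":
        # Horizontal slicing: the bandwidth is split into two halves
        # (which half maps to which combiner is an assumption).
        return 1 if m < (12 * n_rb) // 2 else 2
    raise ValueError("repetition must be 'time' or 'frequency'")
```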
Remark 3: The overall complexity of the proposed approach depends on solving the problem in (2). This admits a relatively simple algebraic solution via the generalized eigenvalue problem in (4), which can be efficiently solved using the power method.
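A minimal sketch of the power method referred to in the remark above is given below. It assumes the generalized eigenvalue problem has been rewritten as an ordinary one for a square matrix `M` (for CCA, the product of the inverse auto-correlation matrices and the cross-correlation matrices) and that the dominant eigenvalue is unique; the function name, the fixed iteration count, and the random start are illustrative choices.

```python
import numpy as np

def power_method(M, iters=200, seed=0):
    """Principal eigenpair of a square matrix M by power iteration.

    Assumes the dominant eigenvalue is real and strictly largest in
    magnitude, as is the case for the CCA matrix with a single
    dominant canonical correlation.
    """
    rng = np.random.default_rng(seed)
    dim = M.shape[0]
    q = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    q = q / np.linalg.norm(q)
    for _ in range(iters):
        z = M @ q
        norm_z = np.linalg.norm(z)
        if norm_z == 0.0:          # q lies in the null space of M
            break
        q = z / norm_z
    lam = np.vdot(q, M @ q).real   # Rayleigh quotient at convergence
    return lam, q
```

Each iteration costs one matrix-vector product, and independent problems (one per layer or per sub-grid) can be run in parallel.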

C. PHASE AMBIGUITY RESOLUTION
The estimated signal from the CCA combiners is always subject to a global phase ambiguity. This means that the original signal $x_\ell$ is identified up to a complex scaling factor $\alpha_\ell \in \mathbb{C}$. Let
$$\hat{x}^{(i)}_{c\ell} = Y_{i\ell}^\top q^*_{i\ell} = \alpha_\ell\, x_{c\ell} + \tilde{w}_{i\ell}$$
be the scaled/rotated version of $x_{c\ell}$ obtained using $q^*_{i\ell}$ and $Y_{i\ell}$, with the noise random vector $\tilde{w}_{i\ell} = W_{i\ell}^\top q^*_{i\ell}$ following a complex Gaussian distribution with zero mean and covariance matrix $C_{i\ell} = \sigma^2_{i\ell} I_N$, where $I_N$ is the identity matrix of size $N$. In order to eliminate the phase ambiguity, we first define $\bar{x}_\ell$ and $\tilde{x}^{(i)}_\ell \in \mathbb{C}^{N_p}$ as the vectors of the first $N_p$ elements of $x_{c\ell}$ and $\hat{x}^{(i)}_{c\ell}$, respectively. Assuming that $\bar{x}_\ell$, which we refer to as the vector of pilot symbols for layer $\ell$, is known at the receiver a priori, an estimate $\hat{\alpha}_{i\ell}$ of $\alpha_\ell$ can be calculated from each view, which takes the form $\hat{\alpha}_{i\ell} = \alpha_\ell + \hat{w}_{i\ell}$, where $\hat{w}_{i\ell}$ is a complex Gaussian random variable with zero mean. $\hat{\alpha}_\ell$ is then chosen to be the average of $\hat{\alpha}_{i\ell}$ for $i \in \{1, 2\}$. In the noiseless case, a single pilot symbol ($N_p = 1$) suffices for phase correction. In the simulations section, we show that two or three pilots provide a good estimate of the scaling factor $\alpha_\ell$ in the noisy case.
Algorithm 2 (Receiver Side). Given: $\bar{Y} \in \mathbb{C}^{N_r \times N_{RE}}$ and a combiner selection rule comb_select. Initialize: $\hat{x}_\ell \in \mathbb{C}^{N_{RE}}$ with zeros.
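The pilot-based correction can be illustrated with the sketch below. It uses a least-squares estimate of the scaling factor from the pilots, which is one natural choice and is an assumption here rather than a statement of the estimator used in the paper; the function names are hypothetical. The per-view estimates are averaged, and the averaged CCA output is divided by the resulting factor.

```python
import numpy as np

def estimate_scaling(pilots, received):
    """Least-squares estimate of alpha in received ~ alpha * pilots + noise."""
    pilots = np.asarray(pilots, dtype=complex)
    received = np.asarray(received, dtype=complex)
    # vdot conjugates its first argument: pilots^H received / ||pilots||^2.
    return np.vdot(pilots, received) / np.vdot(pilots, pilots).real

def correct_phase(x_hat_views, pilots):
    """Undo the common complex scaling of the CCA outputs.

    x_hat_views: the two combined outputs, one per view.
    pilots: known first N_p symbols of the common signal.
    Returns (corrected signal, averaged scaling estimate).
    """
    n_p = len(pilots)
    alphas = [estimate_scaling(pilots, x[:n_p]) for x in x_hat_views]
    alpha = np.mean(alphas)                 # average over the two views
    x_avg = np.mean(x_hat_views, axis=0)    # average of the two outputs
    return x_avg / alpha, alpha
```

In the noiseless case a single pilot already yields the exact factor, which is consistent with the discussion above; in noise, more pilots reduce the variance of the estimate.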

D. SUB-GRIDDING
As mentioned in Section IV-B, the combiner $q^*_{i\ell}$, after phase correction, is used to combine the REs in the vicinity region of $B^{(i)}_\ell$. However, this only serves as an approximation of the optimal combiners required for those REs. In order to reduce this approximation error, we introduce the RG Sub-Grid (SG) solution. Sub-gridding is motivated by the idea of RB bundling to mitigate the effects of frequency-selective channels. The RG is divided into equal-sized non-overlapping SGs, where SG $j$ contains its own CCA pattern $P^\ell_j = (I^\ell_{s,j}, I^\ell_{d,j})$, signal $x_{c\ell,j}$ of length $N_j$, and pilots for phase correction. The pattern $P^\ell_j$ is then used to construct the two views, $Y_{i\ell,j}$, $i \in \{1, 2\}$, of the CCA signal $x_{c\ell,j}$ within SG $j$. Given the per-SG views, the CCA problem is solved to find the optimal combiners $q^*_{i\ell,j}$, $i \in \{1, 2\}$. Combiner $q^*_{i\ell,j}$ is then used to combine the REs in its vicinity in SG $j$. Figure 3 provides an example of an RG that is divided into $N_{SG}$ SGs of size $N_{BSG} = 2$ RBs each. CCA is applied to the constructed views within each SG to find the corresponding optimal combiner.
Remark 5: The $N_{SG}$ CCA problems introduced by sub-gridding can be solved independently and in parallel, and hence the computational complexity introduced by sub-gridding can be neglected.
We define the set $I^{\ell,m}_{s,j}$ ($I^{\ell,m}_{d,j}$) to be the set of source (destination) REs in RB $m$ of SG $j$, such that $\bigcup_{m \in [N_{BSG}]} I^{\ell,m}_{x,j} = I^{\ell}_{x,j}$ and $|I^{\ell,m}_{x,j}| = N^m_j$ with $x \in \{s, d\}$. In order to reduce the signaling overhead between the UE and the gNB, we assume that the sets $I^{\ell,m}_{s,j}$ and $I^{\ell,m}_{d,j}$ are symmetric across all RBs in the entire RG, which implies
$$I^{\ell,m}_{x,j} = I^{\ell,r}_{x,n} + 12\left[(m - r) + N_{BSG}\,(j - n)\right], \tag{13}$$
for $m, r \in [N_{BSG}]$, $\ell \in [L]$, and $j, n \in [N_{SG}]$. The addition in (13) is element-wise, i.e., the offset is added to every entry in the set; REs placed in the same locations of two consecutive RBs have an offset of 12, which is the number of sub-carriers in an RB. Hence, for layer $\ell$, the UE signals only a single per-RB pattern $(I^{\ell,m}_{s,j}, I^{\ell,m}_{d,j})$ to the gNB, which carries the information required by the gNB's transmission scheme. From (13), we drop the RB and SG dependence, i.e., the indices $j$ and $m$, in the Per-Layer RB (PL-RB) pattern $P^{\ell,m}_j = (I^{\ell,m}_{s,j}, I^{\ell,m}_{d,j})$ and refer to it as $P^\ell$. When an RG is divided into SGs, the size of the TBC blocks per SG, $N$, referred to as the view length, decreases. For example, assuming $N_{RB} = 4$ RBs in the RG of Figure 3, $N = 16$ (number of light blue or yellow REs) if $N_{SG} = 2$ SGs, whereas $N = 32$ when $N_{SG} = 1$ SG (sub-gridding is disabled). Hence, although sub-gridding decreases the combining approximation error (by reducing the number of REs equalized with the same combiner), it also shortens the views, which motivates investigating the effect of the view length on the performance. In Sections V-A.1 and V-A.2, we provide a detailed study of the effect of the view length $N$ and the SG size $N_{BSG}$ on the system performance.
So far, we have assumed WB precoding, where the RG is composed of only one sub-band, i.e., $N = 1$. More realistically, the RG could be divided into $N > 1$ sub-bands, where the symbols within a sub-band are precoded using the same precoder. Since CCA equalization introduces another form of RG division through sub-gridding, it is important to study the relation between the sub-band size, which we denote by $N_{SBS}$, and the SG size $N_{BSG}$, where both are measured in number of RBs. As discussed earlier, for CCA to work properly, the CCA symbols within a view must lie within the same TBC block. Hence, symbols within a view must be precoded using the same precoder, i.e., lie within the same sub-band. However, different CCA views can be precoded with different precoders, i.e., lie in different sub-bands. This motivates that the SG size $N_{BSG}$ should satisfy the following relations:
$$\operatorname{mod}(N_{SBS}, N_{BSG}) = 0, \quad \text{for time repetition}, \tag{14}$$
$$\operatorname{mod}(N_{SBS}, N_{BSG}/2) = 0, \quad \text{for frequency repetition}, \tag{15}$$
where $\operatorname{mod}(\cdot)$ denotes the modulus operator. In (15), it is assumed that $N_{BSG}$ is an even integer and that a CCA view spans half the SG, as will be shown later in pattern 3 of Figure 12c.
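Conditions (14) and (15) are simple divisibility checks, as the following sketch shows. The function names and the string labels for the repetition type are hypothetical, and the additional requirement that the SG size divides the number of RBs reflects the equal-sized, non-overlapping SGs described earlier.

```python
def sg_size_is_valid(n_sbs, n_bsg, repetition):
    """Check the sub-grid size against the sub-band size (both in RBs)."""
    if n_sbs <= 0 or n_bsg <= 0:
        raise ValueError("sizes must be positive")
    if repetition == "time":
        return n_sbs % n_bsg == 0                 # condition (14)
    if repetition == "frequency":
        if n_bsg % 2 != 0:                        # (15) assumes an even SG size
            return False
        return n_sbs % (n_bsg // 2) == 0          # condition (15)
    raise ValueError("repetition must be 'time' or 'frequency'")

def valid_sg_sizes(n_sbs, n_rb, repetition):
    """All SG sizes that tile n_rb RBs evenly and satisfy (14) or (15)."""
    return [b for b in range(1, n_rb + 1)
            if n_rb % b == 0 and sg_size_is_valid(n_sbs, b, repetition)]
```

For instance, with a sub-band size of 4 RBs and an RG of 8 RBs, time repetition admits SG sizes of 1, 2, and 4 RBs, whereas frequency repetition admits 2, 4, and 8 RBs.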

E. SYSTEM CONSIDERATIONS
In this subsection, we will discuss some important system considerations for employing the proposed CCA approach.
As discussed in Section IV-A, the repetition pattern design is critical to ensure that the CCA approach works efficiently. Determining the pattern at the UE side and then sending it back to the gNB may incur some uplink overhead. One workaround is to design a set of patterns, known at both the gNB and the UE, that includes repetitions in time and in frequency with different densities (in our case, the density value is related to the CCA view length $N$). The UE may then select one of the pre-configured patterns and feed back the selected pattern index to the serving gNB. As mentioned earlier, and as will also be shown in the simulations, the CCA pattern can be selected based on channel measurements. These include either the channel parameters (e.g., delay spread and Doppler) or even the complete channel state information, $\{H^{(k)}\}_{k=1}^{N}$, which can be estimated using CSI-RS. Recall that our proposed solution only supports DM-RS-free operation; the CSI-RS used for estimating the full channel matrix are still transmitted, and this is in fact key information that the UE can utilize to determine the best pattern.
Even the pattern density itself can be determined at the UE side. For instance, one of the key parameters associated with the proposed approach is the CCA correlation coefficient (the objective of problem (1)). This parameter gives an indication of the detection quality prior to applying the combiners to the data symbols (e.g., a correlation coefficient equal to one guarantees error-free detection). We observed that increasing $N$ yields a higher correlation coefficient but also reduces the transmission rate, so the pattern density can be adapted based on the CCA correlation coefficient.
It is important to emphasize that our end-to-end approach necessitates the solution of a number of CCA problems, for each layer, in order to derive the combiners required to recover the necessary signals.Solving each CCA problem is tantamount to solving a maximum eigenvalue problem, which can be cheaply solved using the power method.On the other hand, the state-of-the-art methods require channel estimation, interpolation and extrapolation procedures, noise estimation, and equalizer design (e.g., MMSE) for decoding the desired signals.As a result, from a computational standpoint, our proposed approach has lower complexity compared to the state of the art.A detailed complexity analysis is out of the scope of this paper and will be considered in future work.
Finally, it is worth emphasizing that the proposed CCA framework can be used together with the DM-RS approach.While we will show that the proposed method offers considerable gains in performance, overhead, and complexity reduction, it may still be beneficial to consider a fallback strategy to the DM-RS approach as needed.

V. SIMULATION RESULTS
In this section, we evaluate the performance of Algorithms 1 and 2, which we refer to as CCA transmission, for solving the problem described in Section III. We consider the DL transmission of a 5G link, shown in Figure 4, as defined by the 3GPP NR standards [4], [5]. We assume optimal wideband SVD-based precoding, where the columns of the precoder matrix $F$ hold the $L$ dominant right singular vectors of the average channel matrix $H$. While the SVD precoding is used for the proposed methods and the considered baselines, the proposed framework can work with any precoding technique, and the same gains can be achieved as long as the same precoding is used for all methods.
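The wideband SVD precoder described above can be sketched as follows. The function name `svd_precoder` is hypothetical, and how the average channel matrix is formed (for example, averaging per-RE channel matrices over the band) is an assumption, since the text does not specify it.

```python
import numpy as np

def svd_precoder(H_avg, num_layers):
    """Precoder whose columns are the dominant right singular vectors.

    H_avg: (N_r, N_t) average channel matrix (how the average is taken is
    an assumption left to the caller). Returns F with shape (N_t, num_layers).
    """
    # numpy returns H = U @ diag(s) @ Vh with singular values in descending
    # order, so the first rows of Vh correspond to the dominant directions.
    _, _, Vh = np.linalg.svd(H_avg, full_matrices=False)
    return Vh.conj().T[:, :num_layers]
```

With this choice the columns of `F` are orthonormal, and the effective channel `H_avg @ F` has column norms equal to the `num_layers` largest singular values.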
The simulation results are composed of two main parts: 1) SER results, where the DL-SCH encoder/decoder blocks are not used. The main purpose of this part is to investigate the effect of different parameters that affect the performance of the CCA-based equalizer, e.g., the view length and the PL-RB pattern. 2) Link-level (end-to-end, or E2E) results, where the full transmission chain is considered and the throughput is compared to the legacy DM-RS-based approach.
Table 1 summarizes the parameters that are used throughout the simulation. Other parameters, e.g., $N$ and $N_{BSG}$, are not listed in the table as they vary depending on the simulation setup and are mentioned in the corresponding sections. In V-A and V-B, we assume a single-layer transmission, and hence we drop the dependence of the corresponding variables on the layer index $\ell$ in these sections. Throughout the simulations, light blue REs represent the TBC block $B_1$, while yellow REs represent the TBC block $B_2$. The pilot symbols are selected as the initial $N_p = 5$ symbols within each view. In the case of time repetition, these pilot symbols are positioned vertically. For frequency repetition, the pilot symbols are selected horizontally.

A. CCA CONFIGURATION PARAMETERS
In this section, we simulate the SER performance versus the CCA view length $N$, the SG size $N_{BSG}$, and the PL-RB pattern $P$. We use repetition in time for the first two experiments and repetition in frequency for the pattern design simulations.

1) CCA VIEW LENGTH
Assuming no sub-gridding, we study the effect of the CCA view length, $N$, on the SER performance. From (16), and the symmetry assumption across patterns in different RBs in (13), we use $N$ (and its corresponding $P$) for the experiment setup. For a given value of $N$, 100 different random realizations of $P$ are generated.
Figure 5 provides examples of a realization of $P$ for $N = 2$ and $N = 4$. As implied by the figure, we assume that TBC block 1 always lies within slots 0 → 6, while block 2 lies within slots 7 → 13. We perform the simulation for different signal-to-noise ratio (SNR) points, where for each point the SER is calculated for every time slot over all realizations of $P$, and the results are averaged over all frames and realizations.
In Figure 6, we provide a simulation result for an RG with $N_{RB} = 1$. It can be seen that increasing the view length improves the average SER performance up to some value of $N$. Significantly large values of $N$ can potentially hurt the overall transmission rate, as the number of non-data REs increases, without boosting the detection accuracy. Figure 7 shows simulation results for $N_{RB} = 20$ RBs, $N \in \{2, 4, 8\}$, and various channel models, i.e., CDL-A, CDL-B, and CDL-C. Consistent with the previous findings, large values of $N$ do not enhance the performance significantly; therefore, the trade-off between the view length and the transmission rate should be taken into consideration when designing the PL-RB pattern $P$ and the sub-grid size $N_{BSG}$.

2) SUB-GRIDS
In this part, we study the impact of the SG size $N_{BSG}$ on the SER performance. We consider an RG of $N_{RB} = 50$ RBs with SCS equal to 30 kHz and the PL-RB pattern $P$ shown in Figure 8. These choices ensure that the channel's coherence BW is less than the transmission BW, leading to a frequency-selective channel. In Figure 9, we simulate the average SER for different values of $N_{BSG}$. The values of $N$ in the legend can be found by applying (12) with $N = 8$ from the PL-RB pattern described in Figure 8. It can be seen that when the SG size is large, $N_{BSG} \in \{25, 50\}$, CCA transmission provides a degraded average SER. This is because, due to the frequency-selective nature of the channel, the symbols within a CCA view experience significant variations in channel responses, which violates the identifiability requirements for the method to work. Upon decreasing the SG size, $N_{BSG} \in \{10, 5, 2\}$, the SER decreases as the channel affecting the CCA signal within each SG tends to be flatter. For a very small SG size, $N_{BSG} = 1$, the average SER degrades again because the view length, $N$, per SG becomes very small.
Figure 10 highlights the relation between the average SER and the number of CCA pilots, $N_p$, per SG for different SNR values. It can be seen that increasing $N_p$ does not provide a significant improvement in the average SER. Hence, we pick $N_p = 5$ symbols per SG for the remainder of the simulations.
In Figure 11, we simulate the average SER for an RG of 48 RBs that is divided into $N = 8$ sub-bands with $N_{SBS} = 6$ RBs each. We consider the default CCA PL-RB pattern in Figure 8, which employs a repetition-in-time scheme. It can be seen that when condition (14) is not satisfied, CCA equalization fails to recover the transmitted symbols.

3) PATTERN DESIGN
As mentioned in IV-B, the shape of the TBC blocks, from which the CCA symbols are chosen and to which they are repeated, depends on the channel conditions. This section shows simulation results for different CCA patterns. Figure 12 provides different CCA patterns for SGs of $N_{BSG} = 2$ RBs each. On one side, pattern 1 in Figure 12a is concentrated in one OFDM symbol per view, spans the entire frequency domain, and employs time repetition. On the other side, pattern 3 in Figure 12c spans the entire time domain, but is located in only two sub-carriers per view and is repeated in frequency. Pattern 2 in Figure 12b is a time-repetition compromise between patterns 1 and 3. We simulate the average SER of the three patterns for a CDL-C channel with SCS = 30 kHz and an RG of 50 RBs that is divided into SGs of size $N_{BSG} = 2$ RBs each. We consider two different channel setups: 1) A CDL-C channel with 30 ns DS and 60 km/h speed. From Figure 13, it can be seen that pattern 1 outperforms patterns 2 and 3, with pattern 3 providing the worst average SER. 2) A CDL-C channel with 300 ns DS and 1 km/h speed. In this opposite setup, pattern 1 completely fails to detect the transmitted symbols while pattern 3 performs the best. The simulation results in Figure 13 indicate that a CCA pattern must be designed such that the symbols within a pattern experience approximately the same channel, i.e., the symbols must lie within the same TBC block.

B. END-TO-END TRANSMISSION
In this section, we compare the E2E performance of the CCA transmission with the legacy DM-RS approach. For DM-RS, LS-based channel estimation is employed to calculate the effective channel at the DM-RS symbol locations, followed by time and frequency interpolation and extrapolation to derive the channel estimate at the remaining REs. Given the effective channel estimate and the received signal, Minimum Mean Square Error (MMSE) equalization is performed at each RE. For both CCA and DM-RS, the received PDSCH data is then demodulated, and the DL Shared Channel (DL-SCH) is decoded to calculate the block error, which is further used in the throughput calculations. Perfect Channel (PCHAN) knowledge results are also incorporated as an upper bound on the throughput performance.
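The DM-RS baseline chain described above (LS estimation at pilots, interpolation, then MMSE equalization) can be sketched in simplified form as follows. This is a single-layer, single-OFDM-symbol illustration with frequency-only linear interpolation, not the test bench used in the paper; the function names and the unit-power symbol assumption in the MMSE combiner are ours.

```python
import numpy as np

def ls_estimate(y_pilot, x_pilot):
    """LS channel estimate at pilot REs: h = y * conj(x) / |x|^2.

    y_pilot: (N_r, P) received pilots, x_pilot: (P,) known pilot symbols.
    """
    return y_pilot * np.conj(x_pilot) / np.abs(x_pilot) ** 2

def interpolate_freq(h_pilot, pilot_idx, num_sc):
    """Linear interpolation across subcarriers (real and imaginary parts).

    np.interp holds the edge values outside the pilot range, which acts as
    a crude extrapolation. pilot_idx must be increasing.
    """
    k = np.arange(num_sc)
    h_full = np.empty((h_pilot.shape[0], num_sc), dtype=complex)
    for r in range(h_pilot.shape[0]):
        h_full[r] = (np.interp(k, pilot_idx, h_pilot[r].real)
                     + 1j * np.interp(k, pilot_idx, h_pilot[r].imag))
    return h_full

def mmse_equalize(y, h, noise_var):
    """Single-layer MMSE combining per RE, assuming unit-power symbols:
    x_hat = h^H y / (||h||^2 + noise_var). y and h have shape (N_r, K)."""
    num = np.sum(np.conj(h) * y, axis=0)
    den = np.sum(np.abs(h) ** 2, axis=0) + noise_var
    return num / den
```

Even in this reduced form, the baseline needs three separate stages plus a noise variance estimate, which is the processing that the CCA combiners avoid.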

1) LOW DS & HIGH SPEED
We consider a CDL-C channel model with a DS of 30 ns and a UE speed of 60 km/h. We compare the CCA PL-RB patterns in Figure 14 with the DM-RS patterns in Table 2, which are defined in the 3GPP standards [4], [5]. In Figure 15, we simulate the average system throughput in Megabits per second (Mbps). It can be seen that CCA provides, on average, 40% more throughput than the legacy DM-RS approach. DM-RS pattern 3 provides better throughput than its counterparts in the DM-RS group. This is because the number of reserved REs in DM-RS pattern 3 (24 REs/RB) is larger than that of patterns 1 (12 REs/RB) and 2 (18 REs/RB). Moreover, all CCA patterns outperform the DM-RS ones, even with only 3 reserved REs/RB for CCA PL-RB pattern 3. The maximum achievable throughput of all CCA patterns is higher than that of DM-RS patterns 2 and 3. This is because the number of reserved REs/RB for the CCA patterns is smaller than that of the DM-RS ones, which in turn provides a larger transport block size and an increased maximum throughput.
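The overhead figures quoted above can be put side by side with a short calculation. The sketch assumes that an RB spans 12 subcarriers and that all 14 OFDM symbols of the slot are available to the data channel; the second point is an assumption, since other channels may occupy part of the slot in practice. The helper names are hypothetical.

```python
# REs per RB per slot: 12 subcarriers x 14 OFDM symbols (assumed all usable).
RE_PER_RB = 12 * 14

def data_res_per_rb(reserved):
    """REs left for data in one RB after removing the reserved REs."""
    return RE_PER_RB - reserved

def overhead_fraction(reserved):
    """Share of the RB consumed by reserved REs."""
    return reserved / RE_PER_RB

# Reserved REs/RB as quoted in the text.
patterns = {
    "DM-RS pattern 1": 12,
    "DM-RS pattern 2": 18,
    "DM-RS pattern 3": 24,
    "CCA PL-RB pattern 3": 3,
}
for name, reserved in patterns.items():
    print(f"{name}: {data_res_per_rb(reserved)} data REs/RB, "
          f"overhead {100 * overhead_fraction(reserved):.1f}%")
```

Under these assumptions, the CCA pattern with 3 reserved REs/RB leaves 165 data REs per RB versus 144 for DM-RS pattern 3, which is the mechanism behind the larger transport block size mentioned above.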

2) HIGH DS & LOW SPEED
We consider a CDL-C channel model with a DS of 300 ns and a UE speed of 1 km/h. We compare DM-RS pattern 1 in Table 2 with CCA pattern 3 in Figure 14c while varying the SG size such that $N_{BSG} \in \{2, 4\}$. Similar to the low DS and high-speed case in V-B.1, Figure 16 shows that CCA provides more throughput than its DM-RS counterpart.

C. MULTI-LAYER SIMULATION
In this section, we simulate a multi-layer transmission scheme with $L = 2$ layers, where two scenarios are considered: 1) the second layer acts as interference and we only aim to decode the data from the first layer, and 2) two data layers are transmitted simultaneously to increase the data rate.


1) MULTI-LAYER INTERFERENCE
We consider a two-layer transmission scenario where the second layer functions as interference. This can be used to model a multi-user (MU) MIMO network with two gNBs, each transmitting data to an assigned UE. For either of the UEs in that network, the interference is then due to the transmission of the secondary gNB to its intended UE. To this end, the baseband equivalent model of the received signal at each of the two CCA views can be expressed as a special case of the general model defined in (7) with $\ell = 1$ and $L = 2$. The Signal-to-Interference Ratio (SIR) can easily be found to be $\mathrm{SIR} = \frac{1-\alpha}{\alpha}$. We vary the SIR value in our simulation while satisfying a total power budget constraint. In order to ensure that the SNR remains fixed for the first layer, the noise power is scaled by a factor of $1 - \alpha$. In Figure 17, we simulate the average SER versus the SIR for the default CCA pattern in Figure 8 and compare it with DM-RS pattern 2 in Table 2. On one side, CCA equalization is not affected by low SIR values. This is because CCA aims to recover the common signal regardless of the interference power, which is, in fact, one of the striking and unique strengths of CCA. On the other side, DM-RS is highly influenced by the interference power. It fails to calculate an accurate effective channel estimate to be used for equalization and therefore provides a high SER in the interference-limited (low SIR) range. As the SIR increases, the channel becomes noise-limited, and the performance converges to the average SER achieved for the simulated noise level, i.e., it converges to zero for the noiseless case and to an average SER of $2 \times 10^{-3}$ for 2 dB noise in the DM-RS case.
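The power split behind the SIR sweep can be made concrete with a few helper functions. This sketch assumes a unit total power budget with $1 - \alpha$ allocated to the desired layer and $\alpha$ to the interfering layer, which is consistent with $\mathrm{SIR} = (1-\alpha)/\alpha$; the function names are hypothetical.

```python
import math

def sir_db(alpha):
    """SIR in dB for a power split of (1 - alpha) desired, alpha interference."""
    return 10 * math.log10((1 - alpha) / alpha)

def alpha_from_sir_db(sir_db_value):
    """Invert SIR = (1 - alpha) / alpha to get alpha for a target SIR in dB."""
    sir = 10 ** (sir_db_value / 10)
    return 1 / (1 + sir)

def scaled_noise_power(noise_power, alpha):
    """Scale the noise power by (1 - alpha) so the first-layer SNR stays fixed."""
    return (1 - alpha) * noise_power
```

For example, an equal split $\alpha = 0.5$ corresponds to an SIR of 0 dB, and sweeping the target SIR upward drives $\alpha$ toward zero, i.e., toward the noise-limited regime.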

2) MULTI-LAYER TRANSMISSION
We consider a DL transmission scenario where a gNB transmits two data layers, i.e., $L = 2$, to the UE. The default CCA pattern in Figure 8 is used in the first layer. In order to ensure that CCA recovers one common signal at each layer, we use for the second layer a shifted version of the pattern in Figure 8 such that all light blue and yellow REs are shifted one symbol to the right. It is important to highlight that the same CCA PL-RB pattern can be used across different layers, but the way the views are constructed should vary among layers, as shown in the example in IV-A, i.e., using different permutation codes.
In Figure 18, we present the average SER for each of the layers separately. PCHAN results, as in Section V-B, are incorporated as a lower bound on the SER. It can be seen that the considered CCA pattern outperforms the DM-RS one in the simulated channel conditions. Moreover, the gap between CCA and DM-RS in layer 2 is greater than that in layer 1. This is because the interference power on layer 2 is larger than that on layer 1, assuming equal power allocation across layers, i.e., $\alpha = 0.5$, which leads to a degraded SER for DM-RS as an inaccurate channel estimate is used for equalization. However, as mentioned in V-C.1, CCA is not affected, as it tends to recover the common signal regardless of the interference power, in addition to the fact that CCA works well in the low SNR region [26].
As shown in the multi-layer interference example in Figure 17 and the multi-layer transmission example in Figure 18, the use of CCA transmission can enable the decoding of a targeted layer in the case of multi-user transmission, or the decoding of all transmitted layers. It is important to note that multiple users and layers employ different patterns. Remark 2 emphasizes that even with a moderate value of $N$, an adequate number of CCA repetition patterns can be created, making it easier to accommodate multi-user and multi-layer transmission scenarios.

D. MODULATION ORDER IMPACT
In this section, we study the effect of higher-order modulation on CCA performance. In Figure 19, we simulate the average SER for CCA pattern 1 in Figure 12c and DM-RS pattern 1 in Table 2. From Figure 19, CCA outperforms DM-RS with 16QAM modulation, while DM-RS performs slightly better when 64QAM is used. It can also be seen that the average SER of CCA equalization increases with a higher modulation order, whereas it decreases when the number of pilots, $N_p$, per sub-grid increases. This is because, from (11), the variance of $\alpha_{i\ell}$ decreases when either $N_p$ or the transmit power of the pilot symbols increases. Hence, with a fixed number of CCA pilots, this motivates choosing those pilots from symbols lying on the constellation boundaries when high modulation orders are used; alternatively, one could use QPSK pilots to estimate the phase and then update the combiners accordingly.

VI. CONCLUSION
In this paper, we introduced a machine learning-based approach that enables reference signal-free data channel demodulation. Instead of using DM-RS symbols with the traditional data channels, the new structure requires the repetition of a subset of data symbols across the allocated resources in the slot. Utilizing the repetition structure at the UE side, CCA is applied to the constructed views in each sub-band, resulting in high-quality combiners that can be used to recover the signal of interest without the need for traditional channel estimation. To further boost the CCA performance, we presented two effective and principled strategies: one to select the repetition type and one to deal with highly frequency-selective channels. We discussed how the proposed method reduces the complexity compared to traditional DM-RS channel estimation and equalization. Lastly, we provided extensive simulations on a 3GPP link-level test bench, which showed that the proposed approach outperforms the state-of-the-art methods in most of the considered scenarios.

FIGURE 2. Proposed data structure with data REs colored in blue and light blue and reserved REs colored in yellow. Left (pattern 1) illustrates a repetition in time, while right (pattern 2) illustrates a repetition in frequency.

B. DATA REPETITION
At time slot $t$, the UE generates the source indices set for the $\ell$-th layer, $I^{\ell}_s$, such that the symbols of the corresponding CCA signal $x_{c\ell}$ lie within the Time-Bandwidth Coherence (TBC) block $B^{(1)}_{\ell}$. Similarly, the destination indices set $I^{\ell}_d$ is generated such that $|I^{\ell}_d| = |I^{\ell}_s| = N$, with REs in the TBC block $B^{(2)}_{\ell}$. The CCA pattern $P_{\ell} = (I^{\ell}_s, I^{\ell}_d)$ is then signaled to the gNB to be used in the next slot transmission. At time slot $t + 1$, the gNB repeats the CCA signal $x_{c\ell}$, constructed from the indices in $I^{\ell}_s$, in the locations defined by $I^{\ell}_d$. The UE constructs the two views $Y_{i\ell} \in \mathbb{C}^{N_r \times N}$, $i \in \{1, 2\}$, of the CCA signal $x_{c\ell}$ defined in (7), where $h^{(i\ell)}_{\mathrm{eff}}$ is the DL channel impulse response matrix that corresponds to TBC block $B^{(i)}$

FIGURE 3. A sub-grid example.Light blue REs in the SG represent the first TBC block, while yellow REs are for the second block.

FIGURE 5. Realizations for different values of N.

FIGURE 6. SER vs SNR for CDL-C channel model with an RG of 1 RB.

FIGURE 7. SER vs SNR for an SB of 20 RBs.

FIGURE 9. SER vs SNR for an RG of 50 RBs, SCS = 30 kHz and $N_p = 1$ symbols per SG.

FIGURE 10. SER vs $N_p$ per SG in an RG of 50 RBs, SCS = 30 kHz and $N_{BSG} = 5$.

FIGURE 19. SER vs SNR for CCA and DM-RS with 16 & 64 QAM modulation. RG of 52 RBs with SCS of 15 kHz and $N_{BSG} = 4$ RBs.
The ordering of the indices in the sets $I^{\ell}_s$ and $I^{\ell}_d$ is critical, i.e., different orderings of the indices lead to different CCA patterns. For instance, assume that the REs with indices 1 and 2 are reserved for $I^{\ell}_s$ while those with indices 3 and 4 are reserved for $I^{\ell}_d$, i.e., $N = 2$. This leads to 2 different CCA patterns, namely $(\{1, 2\}, \{3, 4\})$ and $(\{1, 2\}, \{4, 3\})$. In general, if we have $N$ REs reserved for each of $I^{\ell}_s$ and $I^{\ell}_d$, the number of distinct CCA patterns one can pick from is $N!$, where $!$ is the factorial operator. This implies that a modest value of $N$ can produce numerous CCA repetition patterns, allowing for the support of multi-user and multi-layer transmission scenarios. Recall that the $N!$ patterns are obtained for a fixed time-frequency location, so shifting those patterns in time and/or frequency will yield entirely different pattern sets.
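The counting argument above can be checked by enumerating the patterns directly. In this sketch the source ordering is held fixed and the destination indices are permuted, which reproduces the $N = 2$ example; the function name `cca_patterns` and the tuple representation of a pattern are illustrative assumptions.

```python
from itertools import permutations

def cca_patterns(source, destination):
    """All ordered (source, destination) pairings for a fixed source order.

    Each pattern is a pair of tuples; permuting the destination indices
    yields N! distinct patterns for N reserved REs per set.
    """
    src = tuple(source)
    return [(src, perm) for perm in permutations(destination)]

# The N = 2 example from the text: REs 1, 2 as sources and 3, 4 as destinations.
for pattern in cca_patterns([1, 2], [3, 4]):
    print(pattern)
```

Since the count grows factorially, even a small $N$ provides far more patterns than a typical number of co-scheduled users or layers, before any shift in time or frequency is considered.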