Grant-Free Access via Bilinear Inference for Cell-Free MIMO with Low-Coherent Pilots

We propose a novel joint activity, channel, and data estimation (JACDE) scheme for cell-free multiple-input multiple-output (MIMO) systems compliant with fifth-generation (5G) new radio (NR) orthogonal frequency-division multiplexing (OFDM) signaling. The contribution aims to allow significant overhead reduction of cell-free MIMO systems by enabling grant-free access, while maintaining moderate throughput per user. To that end, we extend the conventional MIMO OFDM protocol so as to incorporate activity detection capability without resorting to spreading informative data symbols, in contrast with related work which typically relies on signal spreading. Our method leverages a Bayesian message passing scheme based on Gaussian approximation, which jointly performs active user detection (AUD), channel estimation (CE), and multi-user detection (MUD), incorporating also a well-structured low-coherent pilot design based on frame theory, which mitigates pilot contamination, and finally complemented with a detector empowered by bilinear message passing. The efficacy of the resulting JACDE-based grant-free access scheme without spreading data sequences is demonstrated by simulation results, which are shown to significantly outperform the current state-of-the-art and approach the performance of an idealized (genie-aided) scheme in which user activity and channel coefficients are perfectly known.


I. INTRODUCTION
Multiple-antenna architectures, in particular massive multiple-input multiple-output (MIMO) and its extensions, will continue to be one of essential technologies in fifth generation (5G) and future sixth generation (6G) networks, in order to satisfy the heterogeneous requirements raised by massive machine type communications (mMTC), enhanced mobile broadband (eMBB), ultra reliable low latency communications (URLLC), their various combinations, and the ever-growing demand for higher data rates and user capacities [1]- [10].Besides, massive MIMO technology is considered an enabler of not only high throughput communications but also massive connectivity due to the significant amount of the spatial degrees of freedoms (DoFs) it provides, which can be exploited to solve inherent problems such as multi-user detection (MUD), channel estimation (CE), and active user detection (AUD) in uplink scenarios, among others.
This article aims to tackle the challenging task of noncoherent joint activity, channel and data estimation (JACDE) in a cell-free MIMO architecture without spreading the data symbols unlike most of existing works.A key component of the proposed approach is an appropriately designed bilinear message passing algorithm.
As compared to the conventional coherent MIMO communication mechanism, where AUD and CE are sequentially performed based on predetermined reference signals (e.g., pilot sequences) followed by MUD relying on the estimated channel state information (CSI), a main challenge of massive uplink access is the communication overhead for CSI acquisition, which scales with the number of potential uplink users in the system due to the need of orthogonal pilot sequences so as to maintain accurate CSI knowledge.It is also worth mentioning that utilizing non-orthogonal pilot sequences for channel estimation, while contributing to reduce overhead, leads to severe MUD performance deterioration due to the rank-deficient (i.e., underdetermined) conditions typically faced, even under the assumption that perfect AUD is available at the receiver.In addition, in massive MIMO settings, excessive piloting might exceed channel coherence time, particular in the case of fast fading environments, which makes non-orthogonal pilots necessity.
One of the emerging approaches aimed at tackling this issue is joint channel and data estimation (JCDE) which takes advantage of estimated data symbols as soft pilot sequences, while exploiting their pseudo-orthogonality to improve system performance and efficiency.In particular, the bilinear generalized approximate message passing (BiGAMP) scheme proposed in [11] has been considered a key ingredient to solve such a detection problem in wireless systems.In that scheme, Onsager correction is employed to decouple the selffeedback of messages across iterations as is the case with the approximate message passing (AMP), leading to stable convergence behavior as shown, for instance, in [12], [13].
It has, however, been recently shown in the literature [14] that the estimation performance of BiGAMP severely deteriorates when non-orthogonal pilot sequences are exploited, even if adaptive damping is employed, due to the fact that the derivation of BiGAMP relies heavily on the assumption of very large systems, although shortening the pilot sequence is the very aim of the method itself.In order to circumvent this issue, the authors in [15] proposed a novel bilinear message passing algorithm, referred to as bilinear Gaussian belief propagation (BiGaBP), with the aim of generalizing BiGAMP on the basis of belief propagation (BP) [16] for robust recovery subject to non-orthogonal piloting.Despite the aforementioned progresses, many existing works including the ones mentioned above, focus only on joint CE and MUD while assuming that perfect AUD is available at the receiver.
One of the solutions to the AUD problem is grant-free random access [17], [18], which has been intensively investigated in the last few years and can be categorized as a variation of joint activity and channel estimation (JACE) or JACDE, with Bayesian receiver design components.In context of Bayesian approaches, Bayesian JACE can be seen as a non-orthogonal pilot-based random access protocol in which active users simultaneously transmit their unique spreading signatures to their base stations (BSs), and the BS employs a massage passing algorithm -e.g.multiple measurement vector approximate message passing (MMV-AMP) -as receiver, with the aim of detecting user activity patterns and their corresponding channel responses, while taking advantage of the time-sparsity resulting from the activity patterns.As for Bayesian JACDE schemes, most of the existing works on that approach, e.g.[19]- [23], are extensions of the aforementioned Bayesian JACE in which spreading data sequences generated by multiplying data symbols with their unique spreading signatures are transmitted, while leveraging a similar receiver design as that of Bayesian JACE methods.
There is also another approach to AUD that takes advantage of the sample covariance matrix constructed from the large number of antennas at the receiver.This covariance-based method has also attracted attention due to its applicability to unsourced random access (URA), where JACDE can be achieved by letting active users transmit a codeword sequence selected from a common predetermined codebook over a given time slot.To elaborate, it has been shown in [24] that the covariance-based approach is able to accommodate a larger number of active users, while using limited per-user wireless resources1 due to the nature of index-type modulation based on spread codewords, which is therefore suited to low-rate super mMTC scenarios.
Besides the above, a fundamental challenge from a system point of view is the spatial correlation of the massive MIMO channel, which has been argued in [26] to be a limiting factor of centralized massive MIMO, although most of the existing work in the area, including [14], [15], [19], [20], [27]- [29], makes use of the assumption that channels are subjected to ideal (uncorrelated) Rayleigh fading.In order to iron out this issue, the cell-free massive MIMO concept -studied e.g. in [30], [31], which virtually configures a massive MIMO setup by spatially distributing access points (APs) connected through wired fronthaul links to a common central computing unit (CCU) -has recently emerged, offering an architecture capable of decorrelating the spatial dependence between APs, leading to an ideal independently distributed channel structure.
In light of the above, the main focus of this article is to incorporate virtues of the aforementioned approaches, proposing a novel grant-free JACDE algorithm for non-orthogonal massive random access in cell-free MIMO systems with nonspread data streams, which, to the best of our knowledge, has not been presented yet in the literature.

A. Related Work
As described above, there is a variety of approaches to solve MUD, CE, and AUD in uplink MIMO systems, which for the sake of readability are categorized into two distinct approaches detailed as follows.
First, there are the approaches exploiting bilinear inference, well-known methods including the convex relaxation methods [32], the non-convex successive over-relaxation methods [33], the variational-Bayes methods [34], and the AMP methods, among others.In [11], [12], a unified AMP-based approach to matrix completion, robust principle component analysis (PRCA), and dictionary learning was proposed, and the resulting BiGAMP was empirically shown in [12], [13] to be competitive in terms of phase transition and computation complexity.In the context of self-calibration and matrix compressed sensing, parametric BiGAMP (PBiGAMP) was proposed in [35] and found to yield improved phase transitions in comparison with the aforementioned counterparts.
In the attempt to overcome AMP's vulnerability against measurement correlation [36], [37], the vector AMP (VAMP) [38], orthogonal AMP (OAMP) [39] and other iterative detectors based on the expectation propagation (EP) framework [40], [41] have been proposed, which handle a class of unitaryinvariant measurement matrices.These algorithms require, however, matrix inversion operations unlike the original AMP approach.A rigorous analysis of the convergence property of this approach was presented in [38], [41], and potential connection among different methods was investigated in [42]- [44], with the extension to the bilinear inference method proposed in [45], [46].Also, it is worth noting that the Gaussian belief propagation (GaBP) [16] approach can be interpreted as the origin of the aforementioned message passing rules.
In the context of grant-free massive random access, it has been shown theoretically in [47] that non-orthogonal access schemes with random coding not only outperform classical multiple-access channel (MAC) protocols such as coded ALOHA, but also have the potential to nearly achieve theoretical limits in terms of the user capacity.Motivated by the above, various authors [24], [25], [48]- [51] studied random coding schemes and/or its covariance-based receiver designs for URA in grant-free MIMO systems with massive numbers of potential users.In this line of work, the massive MIMO JACE problem was considered in [52], [53] with a similar receiver design based on the sample covariance approach.Finally, as for the Bayesian approach, the authors in [28], [54]- [56] have investigated JACE for massive random access with Bayesian compressed sensing receivers, which was also extended to JACDE in [19]- [21], [57], where informative bits are embedded into spreading codewords.And since many existing works, including the ones mentioned above, considered the conventional centralized massive MIMO architecture, which in practice might suffer from spatial correlation, a more recent contribution [58] has tackled this issue by studying a structured massive access scheme for JCDE in a cell-free massive MIMO setting, while assuming perfect activity detection and a grantbased architecture.
In summary, it can be said that the majority of contributions addressing JACDE in MIMO uplink channels can be categorized as either covariance-based receivers for the grant-free URA, or Bayesian receivers with informative data symbols embedded into spreading sequences.

B. Contributions
Given the above, the contributions of the article are summarized as follows.
• Feasibility of non-spread JACDE: In contrast to most of related literature [19]- [22], which addresses AUD, CE, and MUD jointly in a grant-free fashion but which on the other hand require spreading data sequences, we demonstrate the feasibility of grant-free JACDE without spreading data sequences drawn from a predetermined constellation as is the case for the conventional coherent MIMO-OFDM systems.• Cell-free assisted grant-free access: A potential advantage of the cell-free architecture is that helps resolve spatial correlation problems of massive MIMO.To the best of our knowledge, however, no grant-free design for cell-free massive MIMO scheme without spreading data has been proposed yet.In this article, we extend previous works such as [27], [58], [59] and offer a cell-free assisted grant-free access scheme.• Frame-theoretic pilot design: Although much of existing literature relies on Gaussian pilot designs, its mutual coherence is limited unless the large-system limit condition is satisfied.In other words, such a design policy is not suited for a non-orthogonal short pilot aided system, which is the very aim of the article.To this end, a frametheoretic non-orthogonal pilot design is introduced, which asymptotically approaches the theoretical lower bound.• Bilinear inference for JACDE: In order to enable the above, a novel JACDE algorithm, dubbed activity-aware BiGaBP, is presented here, in which bilinear inference, message passing rules, Gaussian approximation, and a new belief scaling technique that forges resilience of the derived messages, are combined.Simulation results are offered to support the claims above, which is based on the IEEE 5G NR configurations.We emphasize that, to the best of our knowledge, a JACDE mechanism for cell-free grant-free MIMO systems without spreading data symbols, which is the key contribution of this article, has not yet been proposed.
Notation: In the remainder of the article, the following notation will be employed.Sets of numbers in the real, complex, and Hilbert spaces will be denoted by R, C, and H, respectively.The transpose, transpose conjugate, and hermitian adjoint operators, correspondingly in the real, complex, and (a) A model illustration of cell-free MIMO systems with distributed single-antenna APs serving uplink users which access the system in a grant-free basis.Frame < l a t e x i t s h a 1 _ b a s e 6 4 = " X d

Subframe
< l a t e x i t s h a 1 _ b a s e 6 4 = " 7 z j e v T / 0 4 l 4  Hilbert spaces will be respectively expressed as • T , • H , and • H .The multivariate circular symmetric complex Gaussian distribution with mean µ and covariance Σ will be denoted by CN (µ, Σ), while N (µ, Σ) will denote its real-domain counterpart.Finally, • p will be used to denote the p-norm with p ∈ {2, ∞}, while •, • is the inner product operator.

II. SYSTEM MODEL
Consider a cell-free large MIMO system composed of N spatially distributed single-antenna APs connected by wired fronthaul links to a common high-performance CCU, serving M synchronized single-antenna users in a grant-free fashion, as depicted in Figure 1a.Due to the dynamic nature of grantfree systems, it is assumed that a fraction of the total M singleantenna users become active within a given 5G NR OFDM symbol frame [60], whereas the rest of uplink users remain silence during that period.Recalling the frame structure as 5G NR as shown in Figure 1b, in which an OFDM frame is composed of 10 subframes and each subframe consists of Q slots, each comprising 14 OFDM symbols, one may take advantage of a part of the radio frame structure as reference signals and the rest as a data stream.We also remark that the number of slots within a certain subframe (i.e., Q) depends on the subcarrier spacing employed in a system.
Given the above, let us define K being the total number of discrete time indexes within an OFDM frame and C {c 1 , c 2 , . . ., c 2 b } representing a set of constellation points with b denoting the number of bits per symbol.Introducing the user index set M {1, 2, . . ., M } and a set of active users A , the received signal vector y k ∈ C N ×1 at k-th time index with k ∈ {1, 2, . . ., K} can be written as (1) In the system model equation above, x k ∈ C M ×1 denotes the transmitted signal vector with non-zero element at index a ∈ A and zero otherwise; w k ∈ C N ×1 denotes the additive white Gaussian noise (AWGN) vector distributed as w k ∼ CN N (0, N 0 I N ); and H ∈ C N ×M denotes the flat fading communication channel matrix, whose element at n-th row and m-th column follows an independent and identically distributed (i.i.d.) circularly symmetric complex Gaussian distribution with zero mean and variance γ nm 10 − βnm 10 ; and β nm follows the 3GPP urban microcell model [61], namely, with d nm denoting the distance between the n-th AP and m-th user, while assuming that the height of the AP and the user are 10 [m] and 1.65 [m], respectively.
Assuming that the user activity remains consistent within an OFDM frame, we can concatenate K consecutive symbol transmissions into the transmitted signal matrix X [x 1 , x 2 , . . ., x K ].In turn, the row sparse nature of X translates to a column-sparsity in the channel matrix H.To be more specific, the m-th column of the channel matrix (i.e., h m ) can be modeled as a multivariate random variable following the Bernoulli-Gaussian distribution, that is, where λ is the activity factor, δ(h m ) denotes the Dirac delta function that takes value 1 if and only if h m = 0 and 0 otherwise, and Γ m diag ([γ 1m , γ 2m , . . ., γ N m ]) is the covariance matrix of the channel.
In light of the above, the received signal matrix Y ∈ C N ×K concatenating the received signal vectors given in equation ( 1) over K successive time indices, can be readily expressed as where Y [y 1 , y 2 , . . ., y K ] and W [w 1 , w 2 , . . ., w K ].Without loss of generality, in order to explicitly express the the pilot and data sequences within the transmit and received signal, let us define, with • p and • d corresponding to pilot and data signal blocks, respectively, K p + K d = K with K p K d < K, and each element of X d assumed to be drawn from the constellation set C in a similar way to conventional coherent OFDM systems.
We remark that unlike most grant free systems found in literature, e.g.[19]- [21], [55], [62], [63], in which informative data is transmitted in the form of a spread sequence, equation ( 4) can be seen as an activity-aware variant of the conventional MIMO-OFDM systems, which aims at both grant-free random access and pilot length reduction enabled by jointly extrapolating user activity, channel state, and data streams.These features make our approach suited to grant-free cell-free systems, which better meet the demand of higher spectrum efficiency per user than methods previously proposed, including those aforementioned.

A. Pilot Sequence Design
In this section we apply a frame-theoretic approach to effectively design a structured pilot matrix for non-orthogonal transmission, aiming at efficiently reducing the pilot length K p while preserving the linear independence between vectors in the pilot matrix X p as much as possible.To this end, assuming a non-orthogonal representation of the pilot matrix (i.e., K p < M ), let us first define the mutual coherence in the following as a measure of the similarity between nonorthogonal bases.
H J×L be a frame matrix over a Hilbert space H J×L , comprising of frame vectors f ∈ H J×1 , with ∈ {1, 2 . . ., L} and J < L. The mutual coherence of F is given by which in case of an equal-norm frame, resumes to Given the above, one may readily notice that the mutual coherence of a frame matrix F is equivalent to the maximum non-diagonal element of its gram matrix G F H F , which is known to be lower-bounded by the Welch bound for L ≤ J 2 [64], that is, indicating that the Welch bound is a lower-bound on the mutual coherence of a frame.Furthermore, equally spreading non-orthogonal bases over a certain Hilbert space is necessary for a design of the pilot sequences especially in grant-free systems so as to preserve an analogous tight energy representation in terms of the fairness among sporadic uplink users.To this end, another key property of a frame is introduced in the following.

Definition 2 (Tightness).
By means of the Rayleigh-Ritz Theorem, a frame matrix F possesses the following inequalities. where In light of the above, our goal is to find a frame that minimizes the mutual coherence in order to distinguish each pilot sequence even under severe non-orthogonal scenarios (i.e., K p M ) while maintaining the tightness for the sake of fair transmission in terms of energy consumption.Although it has been recently shown in [65] that a group-theoretic tight frame construction approach can indeed achieve near-Welch-bound performance, the algorithm proposed thereby has limitations on the size of a frame matrix constructable via the cyclic-group based method, e.g., the number of frame vectors (i.e., L) needs to be a prime number.Since it is desirable to design a frame with any L and J for system-and-user centric communication architectures, we hereafter consider a convex optimization based frame construction method that is capable of dealing with any dimensional pair while minimizing the mutual coherence from an optimization perspective.
Such a low-coherent unit-norm tight frame with arbitrary dimensions can be obtained by taking advantage of an iterative method, referred to as sequential iterative decorrelation via convex optimization (SIDCO), which sequentially separates non-orthogonal bases from any equal-norm frame as an initial starting point.Since the original SIDCO proposed in [66] is limited to frames only in the real space R, its extension to the complex space C, dubbed as complex SIDCO (CSIDCO), has been studied in [67], [68], where the strategy to minimize the mutual coherence is to iteratively solve a series of the following convex optimization problem for a given F ∈ C J×L .
where F ∈ C J×L−1 is equivalent to F with its i-th column pruned and the search region is limited to a multidimensional Euclidean ball with radius T given by Although the CSIDCO reformulation given in equation ( 10) already follows the disciplined convex programming conventions such that this problem can be easily solved via interior point methods through CVX available in high-level numerical computing programming languages such as MATLAB and Fig. 2: Mutual coherence comparison as a function of OFDM symbols utilized for pilot lengths K p with M = 100 uplink users.As benchmarks, we adopt the popular random Gaussian pilot sequence considered in, e.g.,, [25], [28], and a randomly truncated discrete Fourier transform matrix.
Python, the abstraction penalty of such high-level programming languages are too high for real-world communication systems.Aiming at real-time processing of convex optimization problems, the authors in [69] have developed an automatic low-level code generator for conic programming problems, which solves convex problems with moderate size on the order of microseconds or milliseconds, although it is limited to quadratic program (QP)-representable convex problems.In light of the above, for the sake of completeness, equation ( 10) is transformed into the following quadratic relaxation form.

Introducing x
[ {f } ; {f } ; t ,R ; t ,I ] ∈ R 2J+2×1 with slack variables t ,R ∈ R + and t ,I ∈ R + for all , the CSIDCO formulation given in equation (10) for a unit-norm low-coherent frame construction can be represented as where f T 0 0 T , and the other coefficient matrices are listed in the Appendix.

Proof. See Appendix A
One may argue that the frame matrix constructed via the quadratic CSIDCO method described above is not strictly tight due to the fact that tightness is not enforced during the optimization process.However, the tightening approach proposed in [70] based on the polar decomposition can be applied to the output of the quadratic CSIDCO, yielding an arbitrarily-sized low-coherent equal-norm tight frame sufficiently close to the ideal equiangular tight frames which are not suited to practice as they exist only for particular dimensions, which is therefore adopted as pilot sequences in this article.
To illustrate the performance of the quadratic CSIDCO method described above especially in 5G NR setups, mutual coherence comparisons of the method against popular pilot construction approaches (i.e., the random Gaussian and truncated discrete Fourier transform matrices) as a function of the number of OFDM symbols leveraged for pilot lengths are shown in Figure 2, which demonstrates that in fact the quadratic CSIDCO approaches the Welch bound while reducing the correlation between pilot sequences in comparison with the other two approaches, implying capability of sufficiently mitigating the inter-user interference even in highly nonorthogonal scenarios and efficiently decreasing the number of pilot lengths simultaneously.

III. JOINT DETECTION VIA BILINEAR GAUSSIAN BELIEF PROPAGATION
In this section, we describe a joint activity, channel, and data estimation mechanism via bilinear GaBP for large cell-free MIMO architectures with 5G NR signaling, whose belief propagation can be modeled as the graph schematized in Figure 3.In order to further clarify the process of the proposed method, a work flow chart of the proposed detection mechanism is also illustrated in Figure 4, where the proposed detection is split into two algorithms, namely, Algorithm 1): Belief consensus and Algorithm 2): hard decision.
As shown in Figure 4, the work flow starts with an initialization so as to obtain an initial guess of the channel estimate by leveraging only the pilot sequences, the result of which is however limited in accuracy due to the severely non-orthogonal structure of the pilot matrix as K p M .Aiming at improving the channel estimation accuracy by jointly detecting the data as well as the activity, the initial channel guess obtained by the initialization process is fed to Algorithm 1, which as illustrated in Figure 3, is composed of two different stages; 1) soft interference cancellation and beliefs generation based on tentative soft estimates and 2) combining beliefs and soft estimates generation, described in the next two subsections, respectively.
Followed by Algorithm 1, we proceed with Algorithm 2, where the final hard decision of the data and activity is curried out, which will be described in Section III-C.

A. Factor nodes
Focusing on the received signal element y nk at the n-th row and k-th column of Y with the aim of detecting x mk at the m-th row and k-th column of X, the received signal after soft interference cancellation using initial estimates is given by ỹm,nk y nk − Inter-user interference cancellation with soft-replicas where xn,ik and ĥk,ni with i ∈ {1, 2, . . ., M } are tentative estimates of x ik and h ni , respectively, generated at variable nodes in the previous iteration, and w nk is the noise element at the n-th row and k-th column of W .
Owing to the central limit theorem, the interference-plusnoise component can be approximated as a complex Gaussian random variable under large-system conditions, resulting in the fact that the conditional probability density function (PDF) of equation ( 13) for given x mk and h nm can be respectively expressed as with where ψ x n,ik and ψ h k,ni denote expected error variances corresponding to xn,ik and ĥk,ni , respectively, which will be explained in detail in the next subsection, and we remark that

B. Variable nodes
Given the soft interference cancellation and belief generation above, one can combine the Gaussian beliefs given in equation ( 14), yielding the PDF of an extrinsic belief l x n,mk with In turn, since the activity can be expressed as columnsparsity in the channel matrix H, combining beliefs of the m-th column of the channel matrix (i.e., h m ) needs to be jointly performed over n ∈ {1, 2, . . ., N }.Thus, the PDF of the extrinsic belief l h k,m for given h m is given by Notice that the expression in equation (18c) is in fact a complex multi-variate Gaussian distribution, that is where with qk,nm ψ q k,nm Taking the expectation over the PDFs of the extrinsic beliefs given in equations ( 16) and (19), respectively, soft estimates of x mk and h m can be respectively obtained as xn,mk = xq∈C x q • p l x n,mk |x mk (l x n,mk |x q )p x mk (x q ) x q ∈C p l x n,mk |x mk (l x n,mk |x q )p x mk (x q ) , (21a) where the denominators are introduced to ensure that the integral of the posterior PDFs is equivalent to 1.
Although a general closed-form expression of equation (21a) is not known for arbitrary discrete constellations, in the case of Gray-coded quadrature phase-shift keying (QPSK) it can be written as [71] xn,mk = 1 and its error variance estimate is given by In order to derive a closed-form expression for equation (21b), we first simplify the effective PDF such that Also, the normalizing factor in the denominator of equation (21b) can be calculated by solving the integral over the entire N -dimensional complex field, which yields where For the sake of readability, we have moved the detailed derivations of equations ( 24) and ( 25) to Appendix B.
In light of the above, the soft replica for h nm for a given effective distribution and for the corresponding error variance, introducing .

C. Algorithm Description
In this subsection we summarize the belief propagation and consensus mechanisms described above and schematized in Figure 4, offering also detailed discussion on the algorithmic flow.For starters, the schemes are concisely described in the fork of pseudo-codes in Algorithm 1 and Algorithm 2, respectively.For the sake of brevity, we define two sets of integers (i.e., K p and K d ), which are respectively given by K p {1, 2 . . ., K p } and K d {K p + 1, K p + 2 . . ., K}.
As shown in the pseudo-codes, Algorithm 1 requires five different inputs: the received signal matrix Y , the pilot sequence X p , an initial guess of the channel matrix Ĥ, an initial guess of the estimation error variance Ψh corresponding to Ĥ, and the maximum number of iterations t max ; while Algorithm 2 is fed with the beliefs obtained from Algorithm 1.We point out that rough estimates Ĥ and Ψh can be obtained through state-of-the-art estimation algorithms proposed for grant-free systems [25], [27], [28], although such mechanisms suffer from estimation inaccuracy in case of severely non-orthogonal pilot sequence (K p M ), which is however desired from a system-level perspective in terms of time resource efficiency.In Section III-D, the initialization process adopted in this article will be discussed in details.
10: ∀k, ∀m, ∀n: 11: ∀k, ∀m: µ h k,m (t) = [q k,1m (t), . . ., qk,Nm (t)] T 12: ∀k, ∀m: Σ h k,m (t) = diag(ψ q k,1m (t), . . ., ψ q k,N m (t)) 13: ∀k, ∀m: In turn, the outputs of Algorithm 2 are the following three quantities: an estimated symbol matrix X, an estimated channel matrix Ĥ, and estimates of active-user indexes Â .As for X and Ĥ, they are simply obtained by combining all the beliefs (i.e., consensus), while Â is determined by following a certain activity detection policy based on the estimated channel Ĥ.The activity detection policy considered in this article will be described in Section III-E in further details.
Although most of the algorithmic flow in Algorithm 1 and 2 follows the belief exchange policy described in Section III-A and III-B above, we have brought two belief manipulation techniques (i.e., damping [72] and scaling [71]) to Algorithm 1 in order to further improve the estimation accuracy and escape from local optima.This is due to the fact that when the Gaussian approximation assumed in equation ( 14) does not sufficiently describe the actual stochastic behavior of the effective noise, the accuracy of soft-replicas is degraded by resultant belief outliers caused by the aforementioned approximation gap, leading to unignorable estimation performance deterioration.This error propagation effect becomes predominant when the pilot length decreases, which is exactly the scenario the article aims at.
Following [11], [15], we have applied damping to line 15, 17 and 19-20 of Algorithm 1 with the damping factor η ∈ [0, 1], which tends to prevent the algorithm from converging to local minima by forcing a slow update of soft-replicas, whereas belief scaling is adopted in line 18 of Algorithm 1 with parameter γ(t), which in turn adjust the reliability of beliefs (i.e., harnessing harmful outliers).The dynamics is designed to be a linear function of the number of iterations, that is, Besides the above, as shown in Section III-B, soft estimates of x mk are obtained without considering the user activity (i.e., row-sparsity) by imposing user activity detection upon the channel estimation process as described in equation (21b), indicating that such row-sparsity of X needs to be incorporated so as to avoid inconsistency with column-sparsity of H.
Ironing out this issue, we leverage the sparsity factor τ k,m given in equation ( 26) in line 18 and 20 of Algorithm 1, which tends to be 1 when active and to be ∞ otherwise, such that rows of X corresponding to non-active columns of H are suppressed, maintaining the consistency.

D. Initialization
Due to the fact that bilinear inference problems are sensitively affected by the initial values of the solution variables, a reasonable initialization method is required so that the algorithm accurately estimates the channel, the informative data and the activity pattern simultaneously.However, one may also notice that due to the severe non-orthogonality of the pilot sequence (i.e., K p |A | M ) for overhead reduction, the accuracy of such an initial guess is not reliable enough.
In light of the above, although several approaches developed for grant-free access such as covariance-based methods [25], [52] can be considered to produce initial Ĥ and Ψh , in this article we have leveraged MMV-AMP [28] from a computational complexity perspective2 , which can be simply applied to equation ( 4) by regarding the first K p columns of Y and the pilot matrix as the effective received signal matrix and its measurement matrix, respectively.

E. Activity Detection Policy
In this section, we describe how to identify user activity patterns based on the belief consensus performed in Algorithm 2. Due to the fact that miss-detection and false alarms are in a trade-off relationship as shown in the grant-free literature, such an activity detection policy is affected by system and user requirements, indicating that one needs to adopt a suitable criterion depending on the situation in practical implementations.
With that in mind, we simply consider the log likelihood ratio method based on estimated channel quantities, which thanks to uncorrelated Gaussianity of the channel and residual estimation error, can be written as where LLR m is the log likelihood ratio corresponding to the m-th column and N n=1 is introduced to perform consensus over the receive antenna dimension.
After some basic manipulations, the log likelihood criterion can be simplified to where

IV. SIMULATION RESULTS
In this section, we evaluate via software simulation the proposed method in terms of bit error rate (BER), effective throughput, normalized mean square error (NMSE), and AUD performance.

A. Simulation Setup
Throughout this performance assessment section, we consider the following simulation setup unless otherwise specified.The number of receive antennas is set to N = 100, which are distributed over a square of side of 1000 [m] in a square mesh fashion, where M = 100 potential users are accommodated in each subcarrier.It is assumed that 50% of the M users become active during each OFDM frame, i.e., |A | = M/2 = 50, while K and K p are considered to be K ∈ {140, 280} and K p = 14 depending on the employed subcarrier spacing 3 .It is further worth-noting that since we accommodate M = 100 users with overhead length K p = 14, a significant amount of overhead reduction can be achieved.Although one might concern about performance degradation due to the resultant severe non-orthogonality, we dispel such concerns throughout this section by demonstrating that the bilinear inference method employed here is able to handle the non-orthogonality.
The transmit power range at each uplink user is determined based on the experimental assessments studied in [73], where the transmit power of each uplink user is limited by 16 [dBm].Furthermore, Gray-coded QPSK modulation is assumed to be employed at each user, whereas the noise floor N 0 at each AP is assumed to be modeled as σ 2 u = 10 log 10 (1000κT ) + NF + 10 log 10 (W ) [dBm], (33) where κ is the Boltzmann's constant, T = 293.15denotes the physical temperature at each AP in kelvins, the noise figure NF is assumed to be 5 [dB] and W expresses the subcarrier bandwidth.
As described in Section II-A, the pilot structure is designed via quadratic CSIDCO in order to mitigate pilot contamination effects as much as possible even in case of severely nonorthogonal scenarios such as one considered in this section.Regarding the effective throughput performance, we adopt the definition proposed in [74,Def. 1], which is given by where P e denotes the block (packet) error rate and b is the number of bits per OFDM symbol.
Regarding algorithmic parameters, the maximum number of iterations in Algorithm 1 is set to t max = 32 and the damping factor η is 0.5, while the belief scaling factor γ(t) is defined in equation (29).

B. MUD
The MUD performance of the proposed algorithm is studied in terms of uncoded BER as a function of transmit power.In order to take into account miss-detection (MD) effects on data detection, we count not only bits received in error but also the number of lost bits due to missing user activity, that is, Total number of bits (35) where P 1 e denotes the number of errors due to failure of symbol detection and P 2 e is the number of bits that have been lost due to failure of user detection.
Since there are no existing cell-free scheme with grantfree access which do not rely on spreading data sequences as described in equation ( 4), we consider the MMV-AMP algorithm as a state-of-the-art method to carry out JACE with basis of non-orthogonal pilot sequences X p , remarking that this receiver is widely employed in related literature [19]- [22].For the same reason (of lack of a direct equivalent competitor), we also compare the performance of our method against an idealized system in which perfect CE and AUD is assumed, with signal detection performed by the GaBP algorithm.
Our assessment starts with Figure 5, where the BER performance of the proposed method is compared not only to the state-of-the-art but also to the ideal performance, for different subcarrier spacing scenarios (i.e., K ∈ {140, 280}).
The state-of-the-art methods compared are the linear zeroforcing (ZF) MIMO detector and the GaBP message passing MIMO detector, followed by MMV-AMP-based CE with the aid of the non-orthogonal pilot sequence.As for the ideal performance, the GaBP MIMO detector with perfect knowledge of CE and AUD at the receiver is employed, which can be thus considered extreme lower bound to the proposed method.
As can be seen from both figures, state-of-the-art methods suffer from high error floors, which stem from the poor CE and AUD performances by MMV-AMP, caused by the severely overloaded condition.In fact the aspect ratio (number of users over number of pilot symbols) of the pilot matrix is M Kp = 100 14 ≈ 7.1428, indicating a highly non-orthogonal condition.
Although only m = 50 users out of M = 100 are assumed to be active in each coherent frame, the overloading ratio is still sufficiently high to hinder CE and AUD.In contrast, the proposed method enjoys a water-falling curve in terms of BER for the both situations, which can be achieved by taking advantage of the pseudo-orthogonality of the data structure.This advantage can be confirmed from the fact that increasing the total symbol length from K = 140 to K = 280 while fixing the pilot length to be K p = 14 can indeed enhance the detection performance as shown in Figure 5a and 5b.One may readily notice from the above that the corresponding CE performance can be also improved due to the same logic, which is offered in Section IV-D below.

C. Effective Throughput
In light of the definition given in equation (34), we next investigate the effective throughput performance per each OFDM frame of the proposed method.Please note that since the OFDM frame corresponds to 10 [ms], the effective throughput implies the number of successfully delivered bits within a resource block of 10 [ms] times an OFDM subcarrier.Simulation results showing the effective throughput achieved with the proposed scheme and compared alternatives are offered in Figure 6 for different subcarrier spacing setups.Note that the unit of the axis is set to kilobits per frame for the sake of readability.Furthermore, we also implicitly measure the packet (block) error performance of the methods as shown in equation (34), which is often used for practical performance assessment.Finally, in addition to the three counterparts considered in the previous section, we also offer in both figures the system-level achievable maximum data rate as reference, which is determined by Maximum Capacity K d As expected from the discussion of the previous section, it is found that the two state-of-the-art alternatives are incapable of successfully delivering bits transmitted by the active users even with sufficiently high transmit power.
In contrast, the proposed method dynamically follows the same improvement in performance with transmit power as the idealized receiver, approaching the achievable capacity as the transmit power increases.It is furthermore seen that, as inferred in Section IV-B, the throughput gap from the idealized scheme narrows as the subcarrier spacing increases, which is again due to the pseudo-orthogonality of the data sequence.

D. CE
In addition to the above, the CE performance of the proposed method is assessed in this section as a function of transmit power for different data lengths, so that one may observe that the performance improvement described in the above sections is, at least in part, induced by the resultant CE.
To this end, the NMSE performance of the proposed method for K = 140 and K = 280 is offered in Figure 7a and 7b, respectively, where the NMSE is defined as assuming a fixed pilot length K p = 14.As for methods to compare, we have adopted not only MMV-AMP but also minimum norm solution (MNS) that is known to be a method to seek a closed-form unique CE solution in case of a non-orthogonal pilot sequence [15], while employing the minimum mean square error (MMSE) performance with perfect knowledge of AUD and MUD at the receiver as reference.Please note that since the non-Bayesian approach, which takes advantage of the sample covariance of the received signals in order to detect user activity patterns, aims at only AUD, the resultant performance in terms of CE can be lower-bounded by MNS with perfect AUD.
With that in mind, it can be observed from Figure 7a and 7b that the proposed method can indeed improve the CE performance and approach the unachievable MMSE performance with perfect AUD and MUD, maintaining a similar gradient with that of the MMSE, whereas MMV-AMP and MNS suffer from a relatively high error floor due to the non-orthogonality of the pilot, although MMV-AMP appears to offer moderate performance in comparison with MNS.
Thanks to the pseudo-orthogonality of the data structure, the proposed method with 30 kHZ subcarrier spacing again outperforms its own NMSE with 15 kHZ subcarrier spacing.Furthermore, it can be mentioned that due to the sufficiently high CE accuracy of the proposed method (i.e., NMSE ∈ [10 −3 , 10 −4 ]), the considered non-coherent transmission architecture is comparable to the CE performance of the conventional coherent MIMO-OFDM systems.

E. AUD
In this section, we evaluate the AUD performance of the proposed BiGaBP method.Although the AUD performance may be examined in terms of either false alarm (FA), MD, or both, FA can be removed at higher layers by leveraging cyclic redundancy check codes [20], which are widely employed in practice.In light of the above, in this article, we adopt the occurrence of MDs as an AUD performance index.
In Figure 8, MD probabilities of the proposed BiGaBP and MMV-AMP algorithms are illustrated for different symbol lengths K as a function of transmit power at each uplink user, while assuming K p = 14 for both scenarios.It is perceived from Figure 8 that the proposed method can exponentially reduce the occurrence of MDs as transmit power increases, the reason of which can be explained from the discussions given in the preceding sections as follow.As we observed from Figure 5-7, the proposed BiGaBP algorithm starts to gradually recover the data and the channel from the observations Y as transmit power increases, which stems from the fact that the residual noise variances given in equation ( 23) and ( 28) are also accordingly reduced.Consequently, the resultant LLR given in (30) intends to be positive when | ĥnm | is not sufficiently close to 0 and negative when N n=1 CN 0, γ nm + ψ h nm | ĥnm ≈ 0 in comparison with N n=1 CN 0, ψ h nm | ĥnm for a small ψ h nm .Furthermore, the reason why the MD performance of MMV-AMP deteriorates in high transmit power regions can be explained as follows.Besides the insufficient observations due to a nonorthogonal pilot structure, MMV-AMP suffers from the fact that in such a high signal-to-noise ratio (SNR) region, its estimation error noise variance becomes indistinguishable from the AWGN noise level at the receiver, leading to a tendency to regard inactive users as active and vice versa.In contrast, the proposed method mitigates this bottleneck by taking advantage of DoFs in the time domain.

F. MSE Performance Prediction and Its Accuracy
Finally, in this subsection we evaluate the accuracy of mean square error (MSE) tracking via the state evolution of the proposed BiGaBP algorithm 4 , where the predicted MSE performances of the data and channel are obtained by equation ( 23) and (28), respectively.In Figure 9a and 9b, the predicted MSE for X and Ĥ is, respectively, compared with its corresponding simulated counterpart, where the solid line and the dashed line with circle markers corresponds to the simulated and predicted performance, respectively.Without loss of generality, the simulation setup is set to K = 280 and K p = 14 due to the space limitation.It can be observed from the figures that the state evolution can track the error performance of the proposed BiGaBP for both data and channel estimates, since the predicted MSEs follow approximately the same trajectory of its simulated counterpart.

V. CONCLUSION
In this article, we proposed a novel JACDE mechanism based on bilinear inference for OFDM cell-free grant-free MIMO systems without spreading data sequences, while designing a severely non-orthogonal pilot sequence via frame theory with the aim of an efficient overhead reduction.
In order to sustain moderate throughput per user, in contrast with most of the grant-free literature, the proposed method is developed based on the conventional MIMO OFDM protocol, whereas employing activity detection capability without resorting to spreading informative data symbols, which is enabled by bilinear inference and pseudo-orthogonality of the independently-generated data symbols.To this end, we derived new Bayesian message passing rules based on Gaussian approximation, which enables JACDE.Feasibility of JACDE grant-free access with non-spread data sequences has been established via simulation-based performance assessment, which is expected to pose a new angle to related research topics.

APPENDIX A PROOF OF THEOREM 1
Leveraging slack variables t ,R and t ,I , the real and imaginary parts of F H f can be bounded as which readily yields Furthermore, equation (38) can also be rewritten as (40a) where the inequality is applied in an element-by-element manner, leading to A ,I,2 x ≤ 0 (41d) where x is defined in Theorem 1, equation (12f) can be readily obtained from equation (10b), and this completes the proof.
APPENDIX B DERIVATION OF EQUATION ( 24) AND (25) Given equation ( 19) and (21b), the effective PDF can be readily expressed as where by the Woodbury inverse lemma we utilized −1 A for inversible A and B, one may readily obtain equation ( 24) from (42).This completes the derivation of (24).Similarly, the normalizing factor C k,m is given by = , which completes the derivation of equation (25).

(a) 15 Fig. 5 :
Fig. 5: BER comparisons as a function of transmit power

Fig. 6 :
Fig. 6: Effective throughput comparisons as a function of transmit power

Fig. 8 :
Fig. 8: MD comparisons as a function of transmit power

Fig. 9 :
Fig.9: MSE performance prediction via the state evolution of Algorithm 1 in comparison with the actual numerical evaluation as a function of transmit power.In Figure9aand 9b, the predicted MSE for X and Ĥ is, respectively, compared with its corresponding simulated counterpart, where the solid line and the dashed line with circle markers corresponds to the simulated and predicted performance, respectively.Without loss of generality, the simulation setup is set to K = 280 and K p = 14 due to the space limitation.It can be observed from the figures that the state evolution can track the error performance of the proposed BiGaBP for both data and channel estimates, since the predicted MSEs follow approximately the same trajectory of its simulated counterpart.