Spatial Correlation Aware Compressed Sensing for User Activity Detection and Channel Estimation in Massive MTC

Grant-free access is considered a key enabler for massive machine-type communications (mMTC) as it promotes energy efficiency and low signalling overhead. Due to the sporadic user activity in mMTC, joint user identification and channel estimation (JUICE) is a main challenge. This paper addresses the JUICE in single-cell mMTC with single-antenna users and a multi-antenna base station (BS) under spatially correlated fading channels. In particular, by leveraging the sporadic user activity, we solve the JUICE in a multiple measurement vector compressed sensing (CS) framework under two different cases, with and without the knowledge of prior channel distribution information (CDI) at the BS. First, for the case without prior information, we formulate the JUICE as an iterative reweighted $\ell_{2,1}$-norm minimization problem. Second, when the CDI is known to the BS, we exploit the available information and formulate the JUICE from a Bayesian estimation perspective as a maximum \emph{a posteriori} probability (MAP) estimation problem. For both JUICE formulations, we derive efficient iterative solutions based on the alternating direction method of multipliers (ADMM). The numerical experiments show that the proposed solutions achieve higher channel estimation quality and activity detection accuracy with shorter pilot sequences compared to existing algorithms.


I. INTRODUCTION
Massive machine-type communications (mMTC) aim to provide wireless connectivity to billions of low-cost, energy-constrained internet of things (IoT) devices [1]. mMTC promote three main features. First, sporadic transmissions, i.e., only an unknown subset of the IoT devices are active at a given transmission instant. Second, short-packet communications dominated by uplink traffic. Third, energy-efficient communication protocols to ensure a long lifespan for the IoT devices, here referred to as user equipments (UEs). As the base station (BS) aims to serve a massive number of energy-constrained devices, channel access management is considered one of the main challenges in mMTC [2]. The conventional channel access protocols, where each UE is assigned a dedicated transmission resource block, are inefficient because many resource blocks are frequently wasted by being pre-assigned to inactive UEs. Subsequently, alternative schemes have been proposed to provide more efficient channel access protocols. In particular, grant-free multiple-access has been identified as a key enabler for mMTC [3].
In the conventional grant-based channel access protocols, the active UEs first request access to the channel, and then the BS allocates a dedicated transmission block to each active UE in a multi-step handover process [2]. Differently, in grant-free access, the UEs transmit data as per their needs without going through the grant-based access protocols. The main advantages of grant-free access compared to conventional random access are the reduced signalling overhead and the improved energy efficiency of the UEs. However, a paramount challenge in grant-free access is to identify the set of active UEs and to estimate their channel state information for coherent data detection. We refer to this problem as joint user identification and channel estimation (JUICE).
The sparse user activity pattern induced by the sporadic transmissions in mMTC motivates the formulation of the JUICE as a compressed sensing (CS) [4]–[6] problem. Furthermore, as the BS antennas sense the same sparse user activity, the JUICE problem extends to the multiple measurement vector (MMV) CS framework. Sparse support and signal recovery from an MMV setup has been studied extensively in the literature. In a nutshell, the proposed MMV sparse recovery algorithms can be categorized into the following classes: 1) greedy algorithms such as simultaneous orthogonal matching pursuit (SOMP) [7], 2) mixed-norm optimization approaches [8] (and the references therein), 3) iterative methods such as approximate message passing (AMP) [9], and 4) sparse Bayesian learning (SBL) [10].
In sparse support and signal recovery algorithms, prior knowledge on the distributions and the structure of the signals has a profound effect on the recovery performance. For instance, when the signal distribution is known, algorithms like SBL have shown superior performance compared to mixed-norm minimization [11]. However, if the signal distribution is unknown and signal statistics are not available, algorithms based on mixed-norm minimization, such as $\ell_{2,1}$-norm minimization, present a good choice, since they are invariant to the signal distribution. Nevertheless, the $\ell_{2,1}$-norm suffers from a bias toward large coefficients in the recovery. Therefore, formulating the sparse recovery as an iterative reweighted $\ell_1$-norm [12] or $\ell_{2,1}$-norm [13] problem provides a significant improvement compared to the non-reweighted counterparts.

A. Related Work
A rich line of research has been presented for grant-free access in mMTC. In [14], Chen et al. addressed the user activity detection problem in grant-free mMTC using AMP and derived an analytical performance characterization of the proposed AMP algorithm in both single measurement vector and MMV setups. Liu et al. [15], [16] extended the analysis of [14] and conducted an asymptotic performance analysis for activity detection, channel estimation, and achievable rate. Senel and Larsson [17] designed a "non-coherent" detection scheme for very short packet transmission by jointly detecting the active users and the transmitted information bits. Ke et al. [18] addressed the JUICE problem in an enhanced mobile broadband system and proposed a generalized AMP algorithm that exploits the channel sparsity present in both the spatial and the angular domains.
Yuan et al. [19] addressed the JUICE problem in a distributed mMTC system with mixed analog-to-digital converters under two different user traffic classes. An SBL approach has been adopted in [20], and a maximum likelihood estimation approach using the measurement covariance matrix has been considered in [21]. Recently, Ying et al. [22] presented a model-driven framework for the JUICE by utilizing CS techniques in a deep learning framework to jointly design the pilot sequences and detect the active UEs.
In addition to the sparsity of the activity pattern of the UEs, the aforementioned algorithms require different degrees of prior information on the (sparse) signal distribution. For instance, in the AMP-based approaches [14]–[17], [21], the BS is assumed to know the distributions and the large-scale fading coefficients of the channels. The work in [18] relies similarly on known channel distributions but assumes unknown large-scale fading coefficients, which are estimated via an expectation-maximization approach.

B. Main Contribution
This paper considers the JUICE problem in single-cell mMTC with single-antenna UEs under spatially correlated multiple-input multiple-output (MIMO) channels. In particular, we address the JUICE under two different cases, with and without the availability of the channel distribution information (CDI) at the BS. First, under unknown CDI, the JUICE is formulated as an iterative reweighted $\ell_{2,1}$-norm minimization with a deterministic regularization penalty that accounts for the sparsity in the user activity. Second, when the CDI is available to the BS, we formulate the JUICE problem from the Bayesian perspective. By using the available knowledge on the CDI and imposing a sparsity-inducing prior on the sporadic activity of the UEs, we formulate the JUICE under a maximum a posteriori probability (MAP) estimation framework. For both JUICE formulations, we derive computationally efficient iterative solutions based on the alternating direction method of multipliers (ADMM).
The vast majority of JUICE works assume that the communication channels are spatially uncorrelated and often also independent Rayleigh fading. Although this assumption may lead to analytically tractable solutions, it is not always practical, as MIMO channels are almost always spatially correlated [23]. Our paper aims to bridge this gap by addressing spatially correlated channels, which have not been widely studied in the context of JUICE in mMTC. In fact, incorporating the spatial correlation structure in the design of a JUICE solution is crucial, because the performance of JUICE solutions designed for uncorrelated channels may be sensitive to the correlation structures faced in practical scenarios [22]. Recently, Chen et al. [24] presented an orthogonal AMP algorithm that exploits the spatial channel correlation in mMTC systems.
The main contributions of our paper can be summarized as follows: • We address the JUICE problem in spatially correlated MIMO channels to provide a realistic assessment of the performance of the proposed JUICE solutions. Although precise knowledge of the CDI may be challenging to acquire in some practical applications, the results provide a channel estimation performance benchmark for system design.
• When the BS has limited knowledge on the data structure, i.e., only the sparse behaviour of the user activity is taken into consideration, we exploit the benefits of reweighting strategies in CS and formulate the JUICE as an iterative reweighted $\ell_{2,1}$-norm optimization problem. Reweighted $\ell_{2,1}$-norm minimization has not been used for JUICE problems earlier.
• When the CDI is known, we fully exploit the available information and propose a novel JUICE formulation from the Bayesian perspective. The proposed formulation relaxes non-convex Bayesian MAP estimation to convex regularization-based optimization. In particular, the CDI knowledge is incorporated via the Mahalanobis distance measure.
• For each JUICE formulation, we use a specific variable splitting strategy that allows deriving an exact ADMM solution. The proposed approach decouples the JUICE problem into a set of convex sub-problems, each admitting a closed-form solution that can be computed efficiently via a simple analytical formula.
• We show empirically that the proposed algorithms enhance the accuracy of user activity detection and the quality of channel estimation. In particular, for predefined performance requirements, the proposed approaches achieve the same performance as baseline MMV JUICE solutions while using significantly smaller signalling overhead.
This paper is in line with our recent work [25], [26], where we addressed the JUICE under spatially correlated highly directive channels. In [25], [26], the JUICE was formulated as a mixed-norm minimization problem, augmented by a deterministic penalty that exploits the second-order statistics of the channels. In this paper, we further leverage the available knowledge on the entire CDI and treat the JUICE problem under a more rigorous, Bayesian framework.
Organization: The rest of the paper is organized as follows. Section II presents the system

II. SYSTEM MODEL AND PROBLEM FORMULATION

A. System Model
We consider a single-cell uplink mMTC system, as depicted in Fig. 1, where a set of $N$ single-antenna UEs, denoted as $\mathcal{N} = \{1, \ldots, N\}$, communicate with a BS equipped with a uniform linear array (ULA) of $M$ antennas. We consider a block fading channel response over each coherence period $T_c$. Furthermore, to model the propagation channels between the UEs and the BS, we consider a local scattering model, which is suitable for multi-antenna channel modelling as it can capture some key characteristics of typical MIMO channels [27, Sect. 2.6]. In the local scattering channel model, the BS is considered to be located in an elevated position and thus has no scatterers in its near-field, whereas the UEs are surrounded by a rich scattering environment. The channel response vector from each UE $i \in \mathcal{N}$ is modelled as the superposition of $P_i$ physical signal paths, each reaching the BS as a plane wave. Accordingly, the channel response vector between the $i$th UE and the BS, denoted as $\mathbf{h}_i \in \mathbb{C}^M$, is modelled as $\mathbf{h}_i = \sum_{p=1}^{P_i} g_{i,p}\,\mathbf{a}(\psi_{i,p})$ (1), where $g_{i,p} \in \mathbb{C}$ accounts for the gain and the phase-rotation of the $p$th propagation path, $\psi_{i,p}$ is the angle of arrival (AoA) of the $p$th path, and $\mathbf{a}(\psi_{i,p}) \in \mathbb{C}^M$ is the steering vector of the ULA, defined as $\mathbf{a}(\psi_{i,p}) = [1, e^{-j2\pi\Delta_r \cos(\psi_{i,p})}, \ldots, e^{-j2\pi(M-1)\Delta_r \cos(\psi_{i,p})}]^{\mathsf{T}}$, where $\Delta_r$ denotes the normalized spacing between adjacent BS antennas. We consider that $\psi_{i,p} = \bar{\psi}_i + \zeta_{i,p}$, where $\bar{\psi}_i \in [-\pi/2, \pi/2]$ represents the (deterministic) incident angle between the $i$th user and the BS, and $\zeta_{i,p}$ denotes a (random) deviation from the incident angle with angular standard deviation $\sigma_\psi$. We assume that each $\zeta_{i,p}$ follows a Gaussian distribution, $\zeta_{i,p} \sim \mathcal{N}(0, \sigma_\psi^2)$ [27, Sect. 2.6]. The propagation channel between each UE and the BS is often considered to follow a complex Gaussian distribution. More specifically, by utilizing the valid assumption that the number of scatterers around each UE is very large in practice and invoking the central limit theorem, the channel vector $\mathbf{h}_i$ in (1) can be modelled as a complex Gaussian random variable with zero mean and covariance matrix $\mathbf{R}_i = \mathbb{E}\{\mathbf{h}_i \mathbf{h}_i^{\mathsf{H}}\}$, i.e., $\mathbf{h}_i \sim \mathcal{CN}(\mathbf{0}, \mathbf{R}_i)$ (2). The channel realizations $\mathbf{h}_i$ are independent between different coherence intervals $T_c$. We consider UEs with low mobility, which is justified in the context of mMTC [28]. Hence, we adopt the common assumption that the channels are wide-sense stationary [29]. Thus, the set of channel covariance matrices $\{\mathbf{R}_i\}_{i=1}^N$ varies on a slower timescale than the channel realizations [30]. Accordingly, $\{\mathbf{R}_i\}_{i=1}^N$ are assumed to remain fixed for $\tau_s$ coherence intervals, where $\tau_s$ can be on the order of thousands [23]. We note that this assumption can be challenging in mMTC, where some UEs may be inactive for a long period. Therefore, we elaborate further on this issue in Section IV-C.
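For concreteness, the local scattering model above can be simulated in a few lines. The parameter values below ($M = 16$ antennas, $P = 200$ paths, a 10° angular spread, half-wavelength spacing) are illustrative assumptions, not parameters taken from this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def steering_vector(psi, M, delta_r=0.5):
    """ULA steering vector a(psi) for normalized antenna spacing delta_r."""
    m = np.arange(M)
    return np.exp(-2j * np.pi * m * delta_r * np.cos(psi))

def local_scattering_channel(M, psi_bar, sigma_psi, P=200, delta_r=0.5, rng=rng):
    """One realization of h_i as a superposition of P plane-wave paths."""
    zeta = rng.normal(0.0, sigma_psi, P)                 # random AoA deviations
    # Path gains scaled so that E{||h||^2} = M (unit average channel gain per antenna).
    g = (rng.normal(size=P) + 1j * rng.normal(size=P)) / np.sqrt(2 * P)
    A = np.stack([steering_vector(psi_bar + z, M, delta_r) for z in zeta], axis=1)
    return A @ g

# Empirical covariance over many coherence blocks approximates R_i in (2).
M, psi_bar, sigma_psi = 16, np.pi / 6, np.deg2rad(10)
H = np.stack([local_scattering_channel(M, psi_bar, sigma_psi) for _ in range(5000)],
             axis=1)
R_hat = H @ H.conj().T / H.shape[1]
```

With a small angular spread, most of the energy of `R_hat` concentrates in a few eigendirections, which is exactly the spatial correlation structure the proposed solutions exploit.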
Due to the sporadic activity pattern of mMTC, only $K \ll N$ UEs are active at each coherence interval $T_c$, whereas the remaining $N - K$ are inactive. This is depicted in Fig. 1(b). In order to deploy a grant-free multiple access scheme, we assume that all the UEs and the BS are synchronized. In addition, each coherence interval $T_c$ permits transmitting $\tau_c$ symbols and is divided into two phases, as shown in Fig. 1(c). In the first phase, each active UE transmits its $\tau_p$-length pilot sequence to the BS. In the second phase, using the remaining $\tau_c - \tau_p$ symbols, the active UEs send their information data to the BS. During each $T_c$, the BS uses the pilot sequences transmitted in the first phase to identify the set of active UEs and estimate their corresponding channels, in order to decode the information data transmitted in the second phase.
Regarding the channel estimation phase, the BS assigns to each UE $i \in \mathcal{N}$ a unique unit-norm pilot sequence $\boldsymbol{\phi}_i \in \mathbb{C}^{\tau_p}$. Due to the potentially large number of UEs, the UEs cannot be assigned orthogonal pilot sequences, because that would require a pilot length of the same order as the number of UEs. Therefore, the BS assigns a set of non-orthogonal pilots which can be generated, for instance, from an independent identically distributed (i.i.d.) Gaussian or i.i.d. Bernoulli distribution. Herein, we consider pilot sequences generated from a complex symmetric Bernoulli distribution. This approach renders the probability of pilot collision, i.e., assigning the same pilot to two distinct UEs, negligible [17].
Furthermore, to mitigate the channel gain differences between the UEs, a power control policy is deployed such that each UE $i \in \mathcal{N}$ transmits with a power $p_i^{\mathrm{UL}}$ that is inversely proportional to its average channel gain [17], [23]. Accordingly, the received signal associated with the transmitted pilots at the BS, denoted by $\mathbf{Y} \in \mathbb{C}^{\tau_p \times M}$, is given by $\mathbf{Y} = \sum_{i=1}^{N} \gamma_i \sqrt{p_i^{\mathrm{UL}}}\, \boldsymbol{\phi}_i \mathbf{h}_i^{\mathsf{T}} + \mathbf{W}$ (3), where $\mathbf{W} \in \mathbb{C}^{\tau_p \times M}$ is additive white Gaussian noise with i.i.d. elements distributed as $\mathcal{CN}(0, \sigma^2)$, and $\gamma_i \in \mathbb{B}$ is the $i$th element of the binary user activity indicator vector $\boldsymbol{\gamma} = [\gamma_1, \gamma_2, \ldots, \gamma_N]^{\mathsf{T}}$, defined as $\gamma_i = 1$ if $i \in \mathcal{S}$ and $\gamma_i = 0$ otherwise (4), where $\mathcal{S} \subseteq \{1, \ldots, N\}$, $|\mathcal{S}| = K$, is the set of active users. We assume that, besides not knowing which users are active at a given time, the BS does not know the activity level $K/N$ either. Let us define the effective channel of user $i \in \mathcal{N}$ as $\mathbf{x}_i = \gamma_i \sqrt{p_i^{\mathrm{UL}}}\, \mathbf{h}_i$, and subsequently, the effective channel matrix as $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N] \in \mathbb{C}^{M \times N}$. The pilot sequence matrix is defined as $\boldsymbol{\Phi} = [\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_N] \in \mathbb{C}^{\tau_p \times N}$. Accordingly, we can rewrite the received signal associated with the pilots in (3) as $\mathbf{Y} = \boldsymbol{\Phi} \mathbf{X}^{\mathsf{T}} + \mathbf{W}$ (5).
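The pilot phase described above can be mimicked with a short toy simulation. The dimensions and the i.i.d. unit-power channels below are illustrative assumptions (the paper's model instead draws each channel from the correlated distribution in (2)):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M, tau_p = 100, 8, 16, 32

# Complex symmetric Bernoulli pilots with unit-norm columns: Phi in C^{tau_p x N}.
Phi = (rng.choice([-1, 1], (tau_p, N))
       + 1j * rng.choice([-1, 1], (tau_p, N))) / np.sqrt(2 * tau_p)

# Sporadic activity: K active users out of N.
gamma = np.zeros(N)
gamma[rng.choice(N, K, replace=False)] = 1.0

# Effective channels x_i = gamma_i * h_i (unit power control assumed here).
H = (rng.normal(size=(M, N)) + 1j * rng.normal(size=(M, N))) / np.sqrt(2)
X = H * gamma                                   # inactive users' columns are zero

sigma = 0.1
W = sigma * (rng.normal(size=(tau_p, M)) + 1j * rng.normal(size=(tau_p, M))) / np.sqrt(2)
Y = Phi @ X.T + W                               # received pilot signal, tau_p x M
```

Note that `X.T` has exactly `K` non-zero rows, which is the row-sparsity that the MMV formulation in Section II-B recovers.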

B. Problem Formulation
The columns of the effective channel matrix $\mathbf{X}$ corresponding to the inactive users are zeros; thus, $\mathbf{X}^{\mathsf{T}}$ is a row-sparse matrix containing only $K$ non-zero rows. The objective of the JUICE is to jointly identify and estimate the non-zero elements of the effective channel matrix $\mathbf{X}$. Thus, the JUICE can be modelled as joint support and signal recovery from an MMV setup. Subsequently, the canonical form of the JUICE can be presented as $\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \mathcal{R}(\mathbf{X})$ (6), where $\mathcal{R}(\mathbf{X}) = \|\mathbf{X}\|_{2,0}$, i.e., the number of non-zero rows of $\mathbf{X}^{\mathsf{T}}$, is the sparsifying regularizer, and $\beta_1$ controls the trade-off between the emphasis on the measurement consistency term and the sparsity-promoting term. However, the $\ell_{2,0}$-"norm" (with slight abuse of terminology regarding a norm) minimization is an intractable combinatorial NP-hard problem. Therefore, several algorithms have been presented in the literature to relax the optimization problem (6). The existing algorithms can be categorized based on their required prior information on the signal. For instance, while AMP and SBL require prior information on the distributions of the sparse signal, mixed-norm minimization and most of the greedy algorithms operate based on the mere fact that the signal has a sparse structure.
In this paper, we cover both cases, i.e., the JUICE with and without prior knowledge on the CDI. First, when there is no prior knowledge on the channel, we formulate the JUICE as an iterative reweighted $\ell_{2,1}$-norm optimization problem in Section III. Second, in Section IV, we assume that the BS has prior knowledge on the CDI, and we formulate the JUICE as a MAP estimation problem. For both of these JUICE frameworks, we derive a computationally efficient ADMM method to solve the formulated optimization problem. Each ADMM algorithm solves a relaxed version of the involved problem iteratively and, in particular, provides a closed-form solution to each sub-problem included in the optimization process.

III. JUICE VIA REWEIGHTED $\ell_{2,1}$-NORM MINIMIZATION
Without the CDI, the $\ell_{2,1}$-norm penalty is commonly used to relax the $\ell_{2,0}$-norm penalty in the JUICE formulation in (6) as $\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N}\|\mathbf{x}_i\|_2$ (7). Nevertheless, unlike the democratic $\ell_{2,0}$-norm, which penalizes the non-zero coefficients equally, the $\ell_{2,1}$-norm is biased toward larger magnitudes, i.e., coefficients with a large magnitude are penalized more heavily than smaller ones. Therefore, striving for enhanced sparsity recovery, we use the log-sum penalty [12] to relax the $\ell_{2,0}$-norm in (6) as $\min_{\mathbf{X}, \mathbf{u}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N}\log(u_i + \epsilon_0)$, subject to $\|\mathbf{x}_i\|_2 \le u_i$, $i = 1, \ldots, N$ (8), where $\mathbf{u} = [u_1, u_2, \ldots, u_N]^{\mathsf{T}}$ is a vector of auxiliary optimization variables and $\epsilon_0$ is a small positive stability parameter. The log-sum penalty resembles the $\ell_{2,0}$-norm penalty most closely as $\epsilon_0 \to 0$. However, a practical, numerically robust choice is to set $\epsilon_0$ slightly less than the expected norm of the non-zero rows of $\mathbf{X}^{\mathsf{T}}$ [12].
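The bias of the $\ell_{2,1}$-norm toward large rows, and the way the log-sum penalty mitigates it, is easy to verify numerically. The test matrices and the value of $\epsilon_0$ below are illustrative:

```python
import numpy as np

def l21_penalty(X):
    """Mixed l_{2,1} norm: sum of the l2 norms of the columns x_i (rows of X^T)."""
    return np.sum(np.linalg.norm(X, axis=0))

def log_sum_penalty(X, eps0):
    """Log-sum relaxation of the l_{2,0} row-sparsity penalty."""
    return np.sum(np.log(np.linalg.norm(X, axis=0) + eps0))

# One non-zero column with small vs. large magnitude (all other columns zero).
small = np.zeros((4, 3)); small[:, 0] = 0.1    # column norm 0.2
large = np.zeros((4, 3)); large[:, 0] = 10.0   # column norm 20.0
```

The $\ell_{2,1}$ penalty grows linearly with the coefficient magnitude (here by a factor of 100), while the log-sum penalty saturates, so large rows are penalized far less aggressively relative to small ones.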
As the objective function in (8) is the sum of a convex and a concave function, it is not convex in general. Therefore, by applying a majorization-minimization (MM) approximation, (8) can be solved as the following iterative reweighted $\ell_{2,1}$-norm minimization problem: $\mathbf{X}^{(l+1)} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2$ (9), with the weights set at iteration $(l)$ as $g_i^{(l)} = 1/(\|\mathbf{x}_i^{(l)}\|_2 + \epsilon_0)$, $i = 1, \ldots, N$ (10).

A. IRW-ADMM Solution
The optimization problem (9) is convex and can be solved optimally using standard convex optimization techniques. However, as mMTC systems may grow large, the standard techniques can suffer from high computational complexity. As a remedy, we utilize ADMM [31] to solve (9) iteratively in a computationally efficient manner at each MM iteration $(l)$.
ADMM has been widely used to provide computationally efficient solutions to sparse signal recovery problems [32]. Apart from signal reconstruction, ADMM has also been utilized in the context of activity detection in mMTC [22], [33]. Cirik et al. [33] proposed an ADMM-based solution to multi-user support and signal detection in an SMV model, where they incorporate prior knowledge on the signal recovered from previous transmission instants. In addition, Ying et al. [22] proposed an approximation step in ADMM similar to [32], but they solved the sub-problems through a model-driven deep learning decoder.
In contrast to the approximate solutions to problem (9) provided in [22], [32], we solve (9) exactly by adopting a variable splitting strategy different from those of [22], [32]. More precisely, the proposed splitting technique decomposes the objective function in (9) into two separable convex functions that can be minimized efficiently via simple analytical formulas. In particular, we derive a set of update rules to solve (9) iteratively in a sequential fashion over multiple convex sub-problems, where each sub-problem admits a closed-form solution, as we show next.
By introducing a splitting variable $\mathbf{Z} \in \mathbb{C}^{M \times N}$, i.e., a copy of the optimization variable $\mathbf{X}$, we decompose the objective function in (9) into two separate functions: a quadratic function on the measurement fidelity over $\mathbf{Z}$ and a reweighted $\ell_{2,1}$-norm penalty over $\mathbf{X}$. Subsequently, we rewrite the optimization problem (9) as $\min_{\mathbf{X}, \mathbf{Z}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{Z}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2$, subject to $\mathbf{Z} = \mathbf{X}$ (11). Next, we write the augmented Lagrangian of (11) as $\mathcal{L}(\mathbf{X}, \mathbf{Z}, \boldsymbol{\Lambda}) = \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{Z}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2 + \mathrm{Re}\{\mathrm{tr}(\boldsymbol{\Lambda}^{\mathsf{H}}(\mathbf{Z} - \mathbf{X}))\} + \frac{\rho}{2}\|\mathbf{Z} - \mathbf{X}\|_F^2$ (12), where $\boldsymbol{\Lambda} = [\boldsymbol{\lambda}_1, \ldots, \boldsymbol{\lambda}_N] \in \mathbb{C}^{M \times N}$ denotes the dual variable matrix containing the ADMM dual variables $\{\boldsymbol{\lambda}_i\}_{i=1}^N$, and $\rho$ is a positive parameter for adjusting the convergence of the ADMM. The ADMM solves an optimization problem through sequential phases over the primal variables, followed by the method of multipliers to update the dual variables [31]. Therefore, by applying the ADMM to the optimization problem (9), we first minimize (12) over the primal variable $\mathbf{Z}$ with $(\mathbf{X}, \boldsymbol{\Lambda})$ fixed, followed by minimization over the primal variable $\mathbf{X}$ with $(\mathbf{Z}, \boldsymbol{\Lambda})$ fixed. Finally, the ADMM updates the dual variable matrix $\boldsymbol{\Lambda}$ using the most recent updates of $(\mathbf{X}, \mathbf{Z})$. Thus, the ADMM for (9) consists of the following three steps: $\mathbf{Z}^{(k+1)} = \arg\min_{\mathbf{Z}} \mathcal{L}(\mathbf{X}^{(k)}, \mathbf{Z}, \boldsymbol{\Lambda}^{(k)})$ (13), $\mathbf{X}^{(k+1)} = \arg\min_{\mathbf{X}} \mathcal{L}(\mathbf{X}, \mathbf{Z}^{(k+1)}, \boldsymbol{\Lambda}^{(k)})$ (14), and $\boldsymbol{\Lambda}^{(k+1)} = \boldsymbol{\Lambda}^{(k)} + \rho(\mathbf{Z}^{(k+1)} - \mathbf{X}^{(k+1)})$ (15), where the superscript $(k)$ denotes the ADMM iteration index. The derivations of the ADMM steps (13) and (14) are detailed below.
a) $\mathbf{Z}$-update: ADMM updates the primal variable $\mathbf{Z}$ by solving the convex optimization problem (13). Thus, $\mathbf{Z}^{(k+1)}$ is obtained by setting the gradient of the objective function in (13) with respect to $\mathbf{Z}$ to zero, resulting in $\mathbf{Z}^{(k+1)} = (\mathbf{Y}^{\mathsf{T}}\boldsymbol{\Phi}^* + \rho\mathbf{X}^{(k)} - \boldsymbol{\Lambda}^{(k)})(\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}^* + \rho\mathbf{I}_N)^{-1}$ (16). Note that the matrix inverse $(\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}^* + \rho\mathbf{I}_N)^{-1}$ and the product $\mathbf{Y}^{\mathsf{T}}\boldsymbol{\Phi}^*$ need to be computed only once; thus, they can be stored, reducing the overall algorithm complexity.
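A minimal numpy sketch of this step, written in the equivalent scaled-dual form ($\mathbf{U} = \boldsymbol{\Lambda}/\rho$) and solving for $\mathbf{Z}^{\mathsf{T}}$ through the $N \times N$ linear system; the caching mirrors the remark above:

```python
import numpy as np

def z_update(Y, Phi, X, U, rho, cached=None):
    """Closed-form minimizer of 0.5*||Y - Phi Z^T||_F^2 + (rho/2)*||Z - X + U||_F^2.
    `cached` holds the inverse and Phi^H Y so they are computed only once."""
    if cached is None:
        N = Phi.shape[1]
        inv = np.linalg.inv(Phi.conj().T @ Phi + rho * np.eye(N))
        cached = (inv, Phi.conj().T @ Y)
    inv, PhiHY = cached
    Zt = inv @ (PhiHY + rho * (X - U).T)       # normal equations in Z^T
    return Zt.T, cached
```

The optimality of the returned `Z` can be checked by verifying that the gradient of the subproblem objective vanishes.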
b) $\mathbf{X}$-update: The optimization problem (14) can be decomposed into $N$ sub-problems as $\mathbf{x}_i^{(k+1)} = \arg\min_{\mathbf{x}_i} \beta_1 g_i^{(l)}\|\mathbf{x}_i\|_2 + \frac{\rho}{2}\|\mathbf{x}_i - \mathbf{c}_i\|_2^2$, $i = 1, \ldots, N$ (17), where $\mathbf{c}_i = \mathbf{z}_i^{(k+1)} + \boldsymbol{\lambda}_i^{(k)}/\rho$ and $\boldsymbol{\lambda}_i^{(k)}$ is the $i$th column of $\boldsymbol{\Lambda}^{(k)}$. The problem in (17) admits a closed-form solution given by the (group) soft-thresholding operator [34] as $\mathbf{x}_i^{(k+1)} = \max\big(0, 1 - \frac{\beta_1 g_i^{(l)}}{\rho\|\mathbf{c}_i\|_2}\big)\mathbf{c}_i$ (18). Finally, the dual variable update $\boldsymbol{\Lambda}^{(k+1)}$ is performed using (15).
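The group soft-thresholding step can be sketched as follows; it shrinks each column of $\mathbf{Z} + \boldsymbol{\Lambda}/\rho$ toward zero, zeroing any column whose norm falls below the threshold $\beta_1 g_i / \rho$ (a sketch with the scaled dual passed in as `U`):

```python
import numpy as np

def x_update(Z, U, weights, beta1, rho):
    """Column-wise group soft-thresholding: the prox of the weighted l_{2,1} norm
    applied to B = Z + U, with per-column thresholds beta1 * g_i / rho."""
    B = Z + U
    norms = np.maximum(np.linalg.norm(B, axis=0), 1e-12)   # guard against 0/0
    scale = np.maximum(0.0, 1.0 - beta1 * weights / (rho * norms))
    return B * scale
```

Columns with large norms survive (slightly shrunk), which directly yields the estimated active-user support.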

B. Algorithm Implementation
The details of the proposed iterative reweighted ADMM (IRW-ADMM) algorithm for solving problem (9) are summarized in Algorithm 1. As one stopping criterion, Algorithm 1 is run until the $\mathbf{X}$-update has converged, measured as $\|\mathbf{X}^{(k)} - \mathbf{X}^{(k-1)}\|_F^2 < \epsilon$ for a predefined tolerance parameter $\epsilon > 0$, or until a maximum number of iterations $l_{\max} k_{\max}$ is reached, where $l_{\max}$ denotes the maximum number of iterations in the MM loop and $k_{\max}$ denotes the maximum number of iterations in the ADMM loop. Note that if the weight vector is fixed to $\mathbf{g}^{(l)} = \mathbf{1}$, $l = 1, 2, \ldots$, Algorithm 1 provides the ADMM solution for the optimization problem (7), which we term ADMM henceforth.
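Putting the pieces together, a compact sketch of the IRW-ADMM loop (MM outer loop with reweighting, ADMM inner loop with the cached matrices) might look as follows. All hyper-parameter defaults are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def irw_admm(Y, Phi, beta1=0.1, rho=1.0, eps0=1e-2, l_max=5, k_max=50, tol=1e-6):
    """Sketch of IRW-ADMM: reweighted l_{2,1} minimization via MM + ADMM."""
    N = Phi.shape[1]
    M = Y.shape[1]
    X = np.zeros((M, N), dtype=complex)
    inv = np.linalg.inv(Phi.conj().T @ Phi + rho * np.eye(N))   # computed once
    PhiHY = Phi.conj().T @ Y                                     # computed once
    g = np.ones(N)                # all-ones weights -> plain (non-reweighted) ADMM
    for _ in range(l_max):
        U = np.zeros_like(X)      # scaled dual variable Lambda / rho
        for _ in range(k_max):
            X_prev = X.copy()
            Z = (inv @ (PhiHY + rho * (X - U).T)).T     # quadratic Z-step
            B = Z + U                                    # group soft-threshold X-step
            norms = np.maximum(np.linalg.norm(B, axis=0), 1e-12)
            X = B * np.maximum(0.0, 1.0 - beta1 * g / (rho * norms))
            U = U + Z - X                                # dual ascent step
            if np.linalg.norm(X - X_prev) ** 2 < tol:
                break
        g = 1.0 / (np.linalg.norm(X, axis=0) + eps0)     # MM reweighting
    return X
```

On a small synthetic MMV instance, the columns of the returned estimate with the largest norms indicate the detected active users.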
Algorithm 1 (IRW-ADMM): 1) receive $\mathbf{Y}$ at the BS, and compute and store $(\boldsymbol{\Phi}^{\mathsf{T}}\boldsymbol{\Phi}^* + \rho\mathbf{I}_N)^{-1}$ and $\mathbf{Y}^{\mathsf{T}}\boldsymbol{\Phi}^*$; 2) update $\mathbf{Z}^{(k+1)}$ using (16); 3) update $\mathbf{X}^{(k+1)}$ using (18); 4) update $\boldsymbol{\Lambda}^{(k+1)}$ using (15); 5) update the weights $\mathbf{g}^{(l)}$ using (10).

IV. JUICE VIA MAP ESTIMATION

In this section, we propose a Bayesian formulation for the JUICE when the CDI is available at the BS. We formulate the JUICE as MAP estimation and derive a computationally efficient ADMM solution for a relaxed version of the MAP problem.

A. MAP Estimation
The JUICE formulation presented in Section III as an iterative reweighted $\ell_{2,1}$-norm minimization (problem (9)) can be viewed as a joint support and signal recovery problem with a deterministic sparsity regularization. Such a formulation presents a robust approach, as it is invariant to the channel statistics, making it suitable for a broad range of channel distributions. However, the optimization problem (9) omits any available side information on the CDI. Alternatively, if the CDI is available, the JUICE problem can be formulated in a Bayesian framework to account for the fact that each unknown channel to be estimated is a realization of a random variable (vector) with a known distribution. A Bayesian sparse signal recovery framework has great potential in providing certain advantages over deterministic formulations [35].
Developing a JUICE solution from a Bayesian perspective is enabled by: 1) the fact that the propagation channels $\mathbf{h}_i$, $i \in \mathcal{N}$, are modelled by Gaussian distributions, as shown in (2), and 2) the relatively slowly changing covariance matrices $\{\mathbf{R}_i\}_{i=1}^N$, which can be estimated with high accuracy. In the rest of the paper, we consider the common assumption that $\{\mathbf{R}_i\}_{i=1}^N$ are known to the BS [29]. The acquisition of the CDI knowledge is further elaborated in Section IV-C.
Next, we utilize the prior information on the CDI and derive a Bayesian formulation for the JUICE problem. The JUICE performs two tasks in a joint fashion: 1) identification of the support of the user activity indicator vector $\boldsymbol{\gamma}$, and 2) estimation of the effective channel matrix $\mathbf{X}$, relying on the current estimate of $\boldsymbol{\gamma}$. The JUICE formulation in (9) applies a deterministic penalty that accounts for the row-sparsity of $\mathbf{X}^{\mathsf{T}}$, which inherently captures the sparsity in $\boldsymbol{\gamma}$.
However, in the Bayesian modelling, we treat the two variables to be estimated, $\boldsymbol{\gamma}$ and $\mathbf{X}$, as unknown quantities with prior distributions that best model our knowledge of their true distributions, that is: 1) the sparse distribution of the user activity indicator vector $\boldsymbol{\gamma}$, and 2) the effective channel $\mathbf{x}_i$, $\forall i \in \mathcal{N}$, which is a random vector consisting of the multiplication of $\gamma_i$ and the complex Gaussian random vector $\mathbf{h}_i$ (i.e., $\mathbf{x}_i = \sqrt{p_i^{\mathrm{UL}}}\gamma_i \mathbf{h}_i$). We derive joint MAP estimates $\{\hat{\mathbf{X}}, \hat{\boldsymbol{\gamma}}\}$ by making explicit use of the prior knowledge that the propagation channels between the UEs and the BS follow the complex Gaussian distributions given in (2), under the assumption that the BS knows the estimates of the second-order statistics of the channels, i.e., the matrices $\{\hat{\mathbf{R}}_i\}_{i=1}^N$. To this end, the joint MAP estimate $\{\hat{\mathbf{X}}, \hat{\boldsymbol{\gamma}}\}$ with respect to the posterior density given the measurement matrix $\mathbf{Y}$ is given by $\{\hat{\mathbf{X}}, \hat{\boldsymbol{\gamma}}\} = \arg\max_{\mathbf{X}, \boldsymbol{\gamma}} p(\mathbf{X}, \boldsymbol{\gamma}\,|\,\mathbf{Y}) = \arg\max_{\mathbf{X}, \boldsymbol{\gamma}} p(\mathbf{Y}\,|\,\mathbf{X})\,p(\mathbf{X}\,|\,\boldsymbol{\gamma})\,p(\boldsymbol{\gamma})$ (19), where $p(\mathbf{X}\,|\,\boldsymbol{\gamma})$ denotes the conditional probability density function (PDF) of the effective channel $\mathbf{X}$ given the vector $\boldsymbol{\gamma}$, whereas the term $p(\boldsymbol{\gamma})$ represents the prior belief on the distribution of the user activity.
Next, we elaborate in detail the choice of the prior p(γ) and the definition of the conditional PDF p(X|γ).Then, having fixed these quantities, we derive an ADMM algorithm to find an approximate solution to the MAP estimation in (19).
1) Sparse prior $p(\boldsymbol{\gamma})$: By the model assumption on the sporadic UE activity, the user activity indicator vector $\boldsymbol{\gamma}$ exhibits a sparse structure ($\gamma_i = 0$, $\forall i \notin \mathcal{S}$). Thus, in the context of sparse recovery, we impose a sparsity prior $p(\boldsymbol{\gamma})$ on $\boldsymbol{\gamma}$. For instance, given a continuous-magnitude random vector $\boldsymbol{\theta} \in \mathbb{C}^N$, a sparsity-inducing prior can be given by $p(\boldsymbol{\theta}) \propto \exp(-\beta\|\boldsymbol{\theta}\|_p^p)$. Note that setting $p = 1$ results in the $\ell_1$-norm penalty corresponding to the Laplace density function. On the other hand, setting $p = 0$ renders the optimal sparsity-inducing penalty corresponding to the $\ell_0$-norm. Since $\boldsymbol{\gamma}$ is a vector of binary elements, setting $p = 0$ is equivalent to $p = 1$, as it imposes the same sparsity prior $p(\boldsymbol{\gamma})$. Subsequently, we select the prior $p(\boldsymbol{\gamma})$ as the $\ell_0$-norm penalty as $p(\boldsymbol{\gamma}) \propto \exp\big(-\beta_1 \sum_{i=1}^{N} \mathbb{1}(\gamma_i)\big)$ (20), where $\mathbb{1}(\cdot)$ denotes the indicator function. 2) Conditional probability $p(\mathbf{X}\,|\,\boldsymbol{\gamma})$: Since the user activity is controlled by $\boldsymbol{\gamma}$, the conditional probability $p(\mathbf{X}\,|\,\boldsymbol{\gamma})$ is defined as follows. First, we note that the activity patterns of the different users are mutually independent; hence, the conditional PDF factorizes as $p(\mathbf{X}\,|\,\boldsymbol{\gamma}) = \prod_{i=1}^{N} p(\mathbf{x}_i\,|\,\gamma_i)$. In addition, for each user $i \in \mathcal{N}$, we distinguish the two possible cases for $p(\mathbf{x}_i\,|\,\gamma_i)$ as follows: 1) conditioned on $\gamma_i = 1$, the $i$th UE is active and $\mathbf{x}_i$ follows a Gaussian distribution, i.e., $p(\mathbf{x}_i\,|\,\gamma_i = 1) = p_{\mathbf{x}_i}$, where $p_{\mathbf{x}_i} \sim \mathcal{CN}(\mathbf{0}, \tilde{\mathbf{R}}_i)$ and $\tilde{\mathbf{R}}_i$ denotes the scaled covariance matrix defined as $\tilde{\mathbf{R}}_i = p_i^{\mathrm{UL}}\mathbf{R}_i$; 2) conditioned on $\gamma_i = 0$, the $i$th UE is inactive, and $\mathbf{x}_i$ is a deterministic all-zero vector $\mathbf{x}_i = \mathbf{0}$ with probability 1, i.e., $p(\mathbf{x}_i\,|\,\gamma_i = 0) = 1$. Therefore, $p(\mathbf{X}\,|\,\boldsymbol{\gamma}) = \prod_{i:\gamma_i = 1} p_{\mathbf{x}_i}$ (21). By applying the log transformation to $p(\boldsymbol{\gamma})$ in (20) and to $p(\mathbf{X}\,|\,\boldsymbol{\gamma})$ in (21), and by dropping the constant terms that do not depend on $\boldsymbol{\gamma}$ and $\mathbf{X}$, the joint MAP estimation problem (19) can be equivalently written as $\min_{\mathbf{X}, \boldsymbol{\gamma}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} \mathbb{1}(\gamma_i) + \beta_2 \sum_{i:\gamma_i = 1} \mathbf{x}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{x}_i$ (22), where the regularization weights $\beta_1$ and $\beta_2$ balance the emphasis on the priors, both in relation to each other and to the measurement fidelity term. The third term in (22) applies a quadratic Mahalanobis distance measure, $\mathbf{x}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{x}_i$, $i \in \mathcal{N}$, for the active UEs in order to incorporate the knowledge of the spatial correlation matrices of the UEs into the optimization process.
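The role of the Mahalanobis penalty is easy to see numerically: for $\mathbf{R} = \mathbf{I}$ it reduces to the squared Euclidean norm, while for a correlated $\mathbf{R}$ it penalizes directions that are unlikely under $\mathcal{CN}(\mathbf{0}, \mathbf{R})$ more heavily. A small sketch (the covariance below is an illustrative choice):

```python
import numpy as np

def mahalanobis_sq(x, R):
    """Squared Mahalanobis distance x^H R^{-1} x for a zero-mean Gaussian prior."""
    return float(np.real(x.conj() @ np.linalg.solve(R, x)))

M = 4
x = np.ones(M, dtype=complex)

# With an identity covariance, the penalty equals ||x||_2^2.
iso = mahalanobis_sq(x, np.eye(M))

# With a covariance concentrated along the first axis, the components of x in
# the weak (unlikely) directions dominate the penalty.
R = np.diag([10.0, 1.0, 1.0, 1.0]).astype(complex)
corr = mahalanobis_sq(x, R)
```

This is what pulls the channel estimates of the active UEs toward the subspace favoured by their covariance matrices.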

B. MAP-ADMM Solution
The non-convex problem (22) is a mixed-integer programming problem due to the binary optimization variables $\boldsymbol{\gamma}$ and is thus hard to solve for large $N$. In this section, we develop a computationally efficient ADMM algorithm, which is numerically shown to achieve excellent performance in Section VI.
We start by noting that the recovery of the effective channel $\mathbf{X}$ implicitly renders the vector $\boldsymbol{\gamma}$, i.e., finding the index set $\{i \,|\, \gamma_i \neq 0, i \in \mathcal{N}\}$ is equivalent to finding the index set $\{i \,|\, \|\mathbf{x}_i\|_2 > 0, i \in \mathcal{N}\}$. Therefore, we solve a relaxed version of the MAP estimation problem (22) by approximating the penalty terms that depend on $\boldsymbol{\gamma}$ by penalty terms that depend on $\|\mathbf{x}_i\|_2$, $\forall i \in \mathcal{N}$.
Note that the second term $\sum_{i=1}^{N} \mathbb{1}(\gamma_i)$ in (22) is equivalent to an $\|\mathbf{X}\|_{2,0}$ penalty in the sense that it enforces the row-sparsity of the matrix $\mathbf{X}^{\mathsf{T}}$. Therefore, $\sum_{i=1}^{N} \mathbb{1}(\gamma_i)$ can be relaxed by the log-sum penalty $\sum_{i=1}^{N} \log(\|\mathbf{x}_i\|_2 + \epsilon_0)$. Subsequently, we can eliminate $\boldsymbol{\gamma}$ and approximate (22) as $\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} \log(\|\mathbf{x}_i\|_2 + \epsilon_0) + \beta_2 \sum_{i=1}^{N} \mathbf{x}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{x}_i$ (23). Again, we utilize MM and linearize the concave penalty term by its first-order Taylor expansion at the point $\mathbf{u}^{(l)}$. Thus, an approximate solution to (23) is found by iteratively solving the problem $\mathbf{X}^{(l+1)} = \arg\min_{\mathbf{X}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{X}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2 + \beta_2 \sum_{i=1}^{N} \mathbf{x}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{x}_i$ (24), where the weight vector $\mathbf{g}^{(l)} = [g_1^{(l)}, \ldots, g_N^{(l)}]^{\mathsf{T}}$ is given according to (10). (Footnote 3: The Mahalanobis distance between a vector $\boldsymbol{\theta}$ and a Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{R}$ is defined as $\sqrt{(\boldsymbol{\theta} - \boldsymbol{\mu})^{\mathsf{H}}\mathbf{R}^{-1}(\boldsymbol{\theta} - \boldsymbol{\mu})}$. It measures the distance between the vector $\boldsymbol{\theta}$ and the mean $\boldsymbol{\mu}$ of the distribution along the principal component axes determined by the covariance matrix $\mathbf{R}$.) The optimization problem (24) can be seen as an iterative reweighted $\ell_{2,1}$-norm minimization augmented with an additional penalty function that incorporates the spatial correlation knowledge into the optimization process by applying a Mahalanobis distance penalty on the active UEs.
The objective function in (24) is a sum of convex functions; hence, the optimization problem (24) is convex. Thus, aiming to provide a computationally efficient solution, we develop an ADMM framework that solves (24) through a set of sequential update rules, each computed in closed form. In particular, in order to decompose (24) into a set of separable functions, we introduce two splitting variables $\mathbf{Z}, \mathbf{V} \in \mathbb{C}^{M \times N}$ and rewrite the optimization problem as $\min_{\mathbf{X}, \mathbf{Z}, \mathbf{V}} \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{Z}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2 + \beta_2 \sum_{i=1}^{N} \mathbf{v}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{v}_i$, subject to $\mathbf{Z} = \mathbf{X}$, $\mathbf{V} = \mathbf{X}$ (25). The optimization problem (25) is block multi-convex, i.e., the problem is convex in one set of variables while all the other variables are fixed. Since ADMM implicitly exploits the block multi-convexity nature of (25), utilizing ADMM to solve (25) is a reasonable choice. Accordingly, the augmented Lagrangian associated with (25) is given by $\mathcal{L}(\mathbf{X}, \mathbf{Z}, \mathbf{V}, \boldsymbol{\Lambda}_z, \boldsymbol{\Lambda}_v) = \frac{1}{2}\|\mathbf{Y} - \boldsymbol{\Phi}\mathbf{Z}^{\mathsf{T}}\|_F^2 + \beta_1 \sum_{i=1}^{N} g_i^{(l)}\|\mathbf{x}_i\|_2 + \beta_2 \sum_{i=1}^{N} \mathbf{v}_i^{\mathsf{H}} \tilde{\mathbf{R}}_i^{-1} \mathbf{v}_i + \mathrm{Re}\{\mathrm{tr}(\boldsymbol{\Lambda}_z^{\mathsf{H}}(\mathbf{Z} - \mathbf{X}))\} + \frac{\rho}{2}\|\mathbf{Z} - \mathbf{X}\|_F^2 + \mathrm{Re}\{\mathrm{tr}(\boldsymbol{\Lambda}_v^{\mathsf{H}}(\mathbf{V} - \mathbf{X}))\} + \frac{\rho}{2}\|\mathbf{V} - \mathbf{X}\|_F^2$ (26), where $\boldsymbol{\Lambda}_z, \boldsymbol{\Lambda}_v \in \mathbb{C}^{M \times N}$ are the matrices of the ADMM dual variables.
The ADMM solution to the optimization problem (24) at the $(l)$th MM iteration is achieved by sequentially minimizing $\mathcal{L}(\mathbf{X}, \mathbf{Z}, \mathbf{V}, \boldsymbol{\Lambda}_z, \boldsymbol{\Lambda}_v)$ over the primal variables $(\mathbf{Z}, \mathbf{V}, \mathbf{X})$ in steps (27), (28), and (29), respectively, followed by the dual variable updates $\boldsymbol{\Lambda}_z^{(k+1)} = \boldsymbol{\Lambda}_z^{(k)} + \rho(\mathbf{Z}^{(k+1)} - \mathbf{X}^{(k+1)})$ (30) and $\boldsymbol{\Lambda}_v^{(k+1)} = \boldsymbol{\Lambda}_v^{(k)} + \rho(\mathbf{V}^{(k+1)} - \mathbf{X}^{(k+1)})$ (31). We present the derivations of the ADMM steps (27), (28), and (29) in detail below.
a) Z-update: We note that the Z-update in (27) is identical to the convex optimization problem in (13); hence, $Z^{(k+1)}$ is computed using (16).
b) V-update: We can easily show that the V-update in (28) decouples into $N$ convex sub-problems, given by (32). The solution to (32) is obtained by setting the derivative of the objective function with respect to $v_i$ to zero, resulting in (33).
c) X-update: Using the manipulations for (29) shown in Appendix A, the X-update solves the convex optimization problem (34), where the parameters $\alpha_i$ are defined via the manipulations in Appendix A. The problem (34) decouples into $N$ sub-problems, each admitting a closed-form solution (35).
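The per-UE closed-form X-update is an instance of the well-known block (group) soft-thresholding proximal operator. Below is a minimal sketch, with a generic threshold `tau` standing in for the weighted regularization terms of (35), which are not reproduced here:

```python
import numpy as np

def block_soft_threshold(b, tau):
    """Closed-form minimizer of  tau*||x||_2 + 0.5*||x - b||_2^2,
    the prox operator behind the per-UE X-update: either shrink the
    whole block toward zero or zero it out entirely."""
    nb = np.linalg.norm(b)
    if nb <= tau:
        return np.zeros_like(b)
    return (1.0 - tau / nb) * b
```

Because the operator acts on a whole column at once, it zeroes out the entire channel vector of a UE deemed inactive, which is exactly what the $\ell_{2,1}$-type penalty requires.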

C. CDI Knowledge
MAP-ADMM operates on the assumption that the BS knows the CDI of the individual channels, i.e., $\{R_i\}_{i=1}^N$. We note that this assumption is widely accepted in the massive MIMO literature [27], [29]. Furthermore, a similar assumption on the availability of the CDI has been adopted in [24], which addresses the JUICE in mMTC with sporadic user activity.
The acquisition of the CDI at the BS may be challenging, especially for UEs that are inactive for long periods. A possible solution to circumvent this issue is to deploy a training phase to estimate the CDI. The training phase can be implemented over separate channel resource blocks that are solely dedicated to CDI estimation. In practice, the BS would consume a set of available channel resources in order to obtain estimates of all the channel covariance matrices, denoted as $\{\hat{R}_i\}_{i=1}^N$. In particular, at different time intervals, a specific group of UEs transmits pre-assigned orthogonal training pilots to the BS over $T$ coherence intervals; subsequently, the BS employs conventional MIMO channel estimation techniques to obtain $T$ estimates of the channel responses $h_i$, denoted as $\hat{h}_i^1, \ldots, \hat{h}_i^T$. The BS then computes the estimated channel covariance matrix for the $i$th UE as the sample covariance $\hat{R}_i = \frac{1}{T}\sum_{t=1}^{T} \hat{h}_i^t (\hat{h}_i^t)^H$. The frequency of updating the CDI at the BS depends on the mobility and the activity level of the UEs as well as on the changes in the multi-path environment. The estimated $\hat{R}_i$ can therefore be used over several coherence intervals because: 1) the channel covariance matrices vary on a slower timescale than the channel coherence time, and 2) the UEs have very low mobility in many practical mMTC systems. Consequently, learning the CDI does not consume a disproportionate amount of resources. As we will show in the simulation results, the BS does not require a large number of training samples $T$ to estimate $\{R_i\}_{i=1}^N$; in fact, roughly $T = 2M$ samples suffice to provide near-optimal results in terms of the mean square error of the channel estimates, and a similar conclusion has been reported in [27, Sect. 3.3.3].
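The CDI training described above reduces to a simple sample-covariance computation. A minimal sketch, assuming zero-mean channels as in the Gaussian channel model:

```python
import numpy as np

def estimate_cdi(H_hat):
    """Sample covariance from T channel estimates.

    H_hat: M x T matrix whose columns are the estimates h_i^1..h_i^T
    of the i-th UE's channel.  Returns the M x M estimate
    R_hat = (1/T) * sum_t h_t h_t^H  (zero-mean assumption)."""
    M, T = H_hat.shape
    return (H_hat @ H_hat.conj().T) / T
```

With $T \geq 2M$ samples the estimate is typically well conditioned, consistent with the observation above that near-optimal NASE is reached around $T = 2M$.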

D. Algorithm Implementation
The details of the proposed MAP-based JUICE, termed MAP-ADMM, are summarized in Algorithm 2. We note that the Z-update (27) and the V-update (28) are independent of each other; hence, they can be performed fully in parallel. Similarly to Algorithm 1, MAP-ADMM is run until $\|X^{(k)} - X^{(k-1)}\|_F^2 < \epsilon$ or until a maximum number of iterations $l_{\max} k_{\max}$ is reached.

Footnote 4: For a recent review of the different techniques for channel estimation and channel covariance estimation in massive MIMO networks, we refer the reader to [27].

Algorithm 2: MAP-ADMM
1: Receive $Y$ at the BS, and compute and store the quantities that remain fixed across iterations.
2: while $l < l_{\max}$ do
3:   while $k < k_{\max}$ and $\|X^{(k)} - X^{(k-1)}\|_F^2 > \epsilon$ do
4:     Update $Z^{(k+1)}$ using (16) and $V^{(k+1)}$ using (33)
5:     Update $X^{(k+1)}$ using (35)
6:     Update the dual variables $\Lambda_z^{(k+1)}$ and $\Lambda_v^{(k+1)}$
7:   Update $g_i^{(l)}$ using (10)
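The nested MM/ADMM loop structure of Algorithm 2 can be sketched as follows. The update callables are placeholders for Eqs. (16), (33), (35), the dual updates, and (10), which are not reproduced here; only the control flow and stopping criterion follow the algorithm above:

```python
import numpy as np

def map_admm_skeleton(update_z, update_v, update_x, update_duals, update_g,
                      X0, l_max=12, k_max=5, eps=1e-3):
    """Control-flow sketch of MAP-ADMM (Algorithm 2).
    Z- and V-updates are mutually independent and could run in parallel."""
    X = X0
    for l in range(l_max):                 # outer MM (reweighting) loop
        for k in range(k_max):             # inner ADMM loop
            Z = update_z(X)
            V = update_v(X)                # independent of Z -> parallelizable
            X_new = update_x(Z, V)
            update_duals(X_new, Z, V)      # dual ascent steps
            if np.linalg.norm(X_new - X) ** 2 < eps:  # ||dX||_F^2 < eps
                X = X_new
                break
            X = X_new
        update_g(X)                        # refresh weights g^(l) via (10)
    return X
```

Plugging in trivial contraction updates confirms the loop terminates and returns a shrunken iterate, which is all this skeleton is meant to demonstrate.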

V. ALGORITHM COMPUTATIONAL COMPLEXITY
In a typical mMTC scenario, where the number of connected devices is very large, the complexity of the recovery algorithms is an important issue to address. In fact, for the implementation of the proposed algorithms, the computational complexity determines the hardware processing cost. Next, we analyze the complexity of the proposed JUICE algorithms in terms of the number of required complex multiplications per iteration using big-$\mathcal{O}(\cdot)$ notation. The complexity analysis is summarized in Table I, which also shows the exact number of matrix multiplications.
In the Z-update step of IRW-ADMM and MAP-ADMM, for fixed $\rho$, the quantity $(\Phi^T\Phi^* + \rho I_N)^{-1}$ is computed only once at algorithm initialization. Similarly, the term $Y^T\Phi^*$ is computed only once upon receiving the pilot signal $Y$. Therefore, computing $Z^{(k+1)}$ requires $(M+1)N^2$ complex multiplications. For the V-update of MAP-ADMM, the fixed terms in (33) need to be computed only once and can subsequently be used over several coherence intervals; hence, MAP-ADMM requires $N(M^2 + 2M)$ complex multiplications to compute $V^{(k+1)}$. The soft-threshold operators in (18) and (35) for the X-update require $2MN$ complex multiplications. Finally, the weight vector $g^{(l)}$ is computed only at the outer MM iteration level $(l)$ and requires $MN$ complex multiplications. Therefore, the overall complexity of IRW-ADMM and MAP-ADMM is $\mathcal{O}(MN^2)$ and $\mathcal{O}(MN^2 + NM^2)$, respectively.
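The caching strategy described above can be illustrated as follows. The exact form of the Z-update (16) is not reproduced in this section, so the body of `z_update` is an assumed ADMM least-squares step; the point of the sketch is only that the matrix inverse and $Y^T\Phi^*$ are formed once, leaving roughly $(M{+}1)N^2$ multiplications per iteration:

```python
import numpy as np

# Assumed shapes: Phi is the tau_p x N pilot matrix, Y the tau_p x M
# received signal (dimensions chosen arbitrarily for illustration).
rng = np.random.default_rng(0)
tau_p, N, M, rho = 20, 50, 8, 1.0
Phi = rng.standard_normal((tau_p, N)) + 1j * rng.standard_normal((tau_p, N))
Y = rng.standard_normal((tau_p, M)) + 1j * rng.standard_normal((tau_p, M))

# Computed once at initialization / upon pilot reception:
A_inv = np.linalg.inv(Phi.T @ Phi.conj() + rho * np.eye(N))  # N x N
YtPhi = Y.T @ Phi.conj()                                     # M x N

def z_update(B):
    """Per-iteration Z-update using only the cached factors
    (B: an M x N ADMM term combining X and the scaled dual variable)."""
    return (YtPhi + rho * B) @ A_inv
```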
Table I also compares the complexity of IRW-ADMM and MAP-ADMM to the three baseline algorithms that we consider in the numerical experiments: the fast alternating direction method (F-ADM) [32], simultaneous orthogonal matching pursuit (SOMP) [7], and temporal sparse Bayesian learning (T-SBL) [11]. F-ADM solves the problem (7) using an ADMM algorithm and has a computational complexity of $\mathcal{O}(M\tau_p N)$. The greedy SOMP is reported in [18] to exhibit a computational complexity of $\mathcal{O}(M\tau_p N)$. T-SBL has a computational complexity of $\mathcal{O}(N^2 M^3 \tau_p)$.
In summary, incorporating the channel spatial correlation information increases the computational complexity. For instance, MAP-ADMM has a higher computational complexity per iteration than IRW-ADMM because it incorporates the spatial structure information in the V-update step. In addition, since the proposed IRW-ADMM and MAP-ADMM aim at providing an exact solution to the JUICE problem, they are more computationally complex than F-ADM. Nevertheless, the additional cost of the proposed algorithms is compensated for by convergence to a more accurate solution, as we show in the next section.

VI. SIMULATION RESULTS
In this section, we provide simulation results to show the performance of the proposed JUICE algorithms in terms of user activity detection accuracy, channel estimation quality, and convergence rate, and compare them to existing MMV reconstruction algorithms.

A. Simulation Setup
We consider a single cell with a radius of 50 m, where the BS is surrounded by N = 200 uniformly distributed UEs, out of which K = 10 UEs are active in each coherence interval $T_c$.
The propagation channel between the $i$th UE and the BS in (1) consists of $P_i = 200$ paths with angular spread standard deviation $\sigma_\psi = 10°$. Each UE $i = 1, \ldots, N$ is assigned a unique normalized quadrature phase shift keying (QPSK) pilot sequence $\phi_i$, where the QPSK pilot symbols are drawn from an i.i.d. complex Bernoulli distribution. The SNR is defined as $\mathrm{SNR}\,[\mathrm{dB}] = 10\log_{10}$ of the ratio between the average received signal power and the noise power.
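A hypothetical generator for such normalized QPSK pilot sequences is sketched below (unit-modulus symbols, each UE's pilot column normalized to unit $\ell_2$-norm; the normalization convention is an assumption):

```python
import numpy as np

def qpsk_pilots(tau_p, N, seed=0):
    """Draw i.i.d. QPSK pilot symbols and normalize each UE's pilot
    sequence (one column per UE) to unit l2-norm."""
    rng = np.random.default_rng(seed)
    symbols = rng.choice([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j],
                         size=(tau_p, N))
    Phi = symbols / np.sqrt(2)       # unit-modulus QPSK symbols
    return Phi / np.sqrt(tau_p)      # unit-norm pilot columns
```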

B. Performance Metrics
The JUICE performance is quantified in terms of the normalized mean square error (NMSE), the support recovery rate (SRR), and the convergence rate. The NMSE is defined as $\mathbb{E}\big[\|X_S - \hat{X}_S\|_F^2 / \|X_S\|_F^2\big]$, where $X_S$ and $\hat{X}_S$ denote the original and the estimated effective channel matrix, respectively, restricted to the true active support $S$. The expectation in the NMSE is computed via Monte Carlo averaging over the randomness of the effective channel matrix $X$, the pilot sequence matrix $\Phi$, and the noise $W$; thus, the NMSE is presented as the normalized average square error (NASE).
The SRR is defined as $\mathrm{SRR} = \frac{|S \cap \hat{S}|}{|S - \hat{S}| + K}$, where $\hat{S} = \{i \mid \|\hat{x}_i\|_2 > \epsilon_{\mathrm{thr}},\ i \in \mathcal{N}\}$ denotes the detected support for a small pre-defined threshold $\epsilon_{\mathrm{thr}}$. Thus, $|S \cap \hat{S}|$ represents the number of correctly identified active UEs, whereas $|S - \hat{S}|$ accounts for both the number of misdetected active UEs and the number of falsely identified inactive UEs. The SRR approaches 1 when $\hat{S}$ is close to the true support $S$.
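The SRR and the thresholding-based support detection rule can be implemented directly. In the sketch below, $|S - \hat{S}|$ is interpreted as the symmetric difference, consistent with the description above that it counts both misses and false alarms:

```python
import numpy as np

def srr(S_true, S_hat, K):
    """Support recovery rate: correctly identified active UEs divided
    by K plus the total detection errors (misses + false alarms)."""
    S_true, S_hat = set(S_true), set(S_hat)
    errors = len(S_true ^ S_hat)        # symmetric difference
    return len(S_true & S_hat) / (errors + K)

def detect_support(X_hat, thr):
    """Detected active set: columns of X_hat whose l2-norm exceeds thr."""
    norms = np.linalg.norm(X_hat, axis=0)
    return {i for i, n in enumerate(norms) if n > thr}
```

Perfect recovery gives $\mathrm{SRR} = K/(0 + K) = 1$, while every miss or false alarm inflates the denominator.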

C. Baselines
We compare the performance of the proposed algorithms against the following algorithms that solve the general MMV sparse recovery problem: 1) SOMP [7]; 2) the F-ADM algorithm [32], which differs from the proposed ADMM in Algorithm 1 (with $g^{(l)} = \mathbf{1}$) in that, while ADMM provides an exact solution to (7), F-ADM solves (7) approximately by linearizing the X-update sub-problem with a first-order Taylor expansion; 3) SPARROW, which reformulates (7) as a semi-definite programming problem [8, Eq. (22)] that we solve using the CVX toolbox [37]; and 4) T-SBL [11], where both the second-order statistics and the noise variance are known at the BS and the sparse recovery is performed using the update rules given in [11, Eqs. (6), (7), (12)] (i.e., the "B-update" in [11, Eq. (13)] is not performed, because we provide the covariance matrices). In addition, we use both the oracle least squares (LS) estimator and the oracle joint minimum mean square error (MMSE) estimator, shown in Appendix B, where each estimator is provided "oracle" knowledge of the true set of active UEs. While the oracle LS estimator provides a good benchmark for channel estimation when no CDI is available at the BS, the joint MMSE estimator provides a lower bound on the channel estimation performance when both the CDI and the noise variance are available at the BS.

D. Parameter Tuning
The sparse recovery algorithms require fine-tuning of their regularization parameters to yield their best estimates. While the regularization parameters depend on the different system parameters, such as $N$, $M$, $K$, $\tau_p$, and $\sigma^2$, in practice they are selected empirically by cross-validation. For a fair comparison, all the parameters have been empirically tuned in advance and then fixed such that they provide the overall best performance in terms of NASE over the SNR range $[0, 16]$ dB. For instance, $\beta_1$ depends strongly on the ratio $K/N$; however, since $K$ is generally unknown to the BS, we tuned $\beta_1$ based on the noise variance $\sigma^2$ as $\beta_1 = \sigma^2/2$, since this provided the most robust convergence. Furthermore, we set $\beta_2$ to $1\,\%$ and the log-sum stability parameter $\epsilon_0$ to $0.07$--$0.12\,\%$ of the average norm of the effective channels. Moreover, since ADMM typically converges in a few tens of iterations, a maximum number of iterations of $l_{\max} = 12$ and $k_{\max} = 5$ and a stopping criterion $\epsilon = 10^{-3}$ were found to be sufficient for the ADMM-based algorithms to converge to their best performance. All the optimization variables for MAP-ADMM ($X, V, Z, \Lambda_z$, and $\Lambda_v$) and for IRW-ADMM ($X, Z$, and $\Lambda$) are initialized as zero matrices. The results are obtained by averaging over $10^3$ random channel realizations.

E. Results
1) Performance without Side Information: First, we examine the scenario where no CDI is available to the BS. To this end, we compare the performance of the proposed ADMM and IRW-ADMM algorithms (Algorithm 1) with F-ADM, SPARROW, SOMP, and the oracle LS. The obtained results reveal that the proposed IRW-ADMM provides the best performance by achieving the highest user activity detection accuracy. In fact, IRW-ADMM with pilot sequence length $\tau_p = 20$ achieves SRR = 0.95 for SNR > 6 dB. Furthermore, even with a 25 % reduced pilot length, i.e., $\tau_p = 15$, IRW-ADMM still outperforms the other MMV recovery algorithms by a large margin. Fig. 2(a) shows that the proposed ADMM provides similar performance to F-ADM. However, we note that ADMM uses fewer regularization parameters than F-ADM and may thus be more resilient to parameter tuning. Similarly to the SRR performance, the proposed ADMM and F-ADM achieve similar NASE performance. Moreover, since the sparsity regularization parameter of both F-ADM and ADMM is based on knowledge of the noise variance, F-ADM and ADMM outperform the oracle LS for SNR < 4 dB. Finally, while SOMP provides lower SRR performance, it outperforms ADMM, F-ADM, and SPARROW in terms of NASE in the high-SNR regime. The low SRR of SOMP is caused by the high number of falsely identified inactive UEs; however, since the NASE is quantified only over the true active UEs, the NASE performance does not suffer a large degradation. In summary, the results presented in Fig. 2 highlight the remarkable gains obtained by formulating the JUICE as an iterative reweighted $\ell_{2,1}$-norm minimization problem.
2) The Impact of Exploiting Channel Statistics: Having shown the superiority of IRW-ADMM over conventional sparse recovery algorithms when no knowledge of the CDI is used, we next investigate the effect of incorporating the CDI on the JUICE performance.

G. Impact of Imperfect Knowledge of the Channel Covariance Matrix
This section investigates the impact of the training phase used to estimate the second-order statistics of the channels $\{\hat{R}_i\}_{i=1}^N$ on the channel estimation. More precisely, we vary the number of training samples $T$ and quantify the NASE performance of MAP-ADMM and the oracle joint MMSE estimator. Note that once the set of covariance matrices has been generated using a particular number of samples $T$, it is used directly as an input to MAP-ADMM; hence, the BS does not need to update the covariance matrices at each MAP-ADMM iteration. As expected, increasing the number of samples $T$ improves the channel estimation quality of both MAP-ADMM and the joint MMSE estimator, as their NASE asymptotically approaches the lower bounds achieved by their counterparts that rely on perfect knowledge of $\{R_i\}_{i=1}^N$. More interestingly, the results show that MAP-ADMM and the joint MMSE estimator require around $T = 2M$ samples to achieve nearly the same NASE performance as their optimal lower bounds. These results indicate that MAP-ADMM is not highly sensitive to imperfect channel statistics. Finally, we note that a similar conclusion on the required number of samples $T$ to achieve near-optimal performance for the MMSE estimator is reported in [27, Sect. 3.3.3].

VII. CONCLUSIONS AND FUTURE WORK
This paper addressed the JUICE problem in grant-free access in mMTC under spatially correlated fading channels. We presented two JUICE formulations depending on the availability of the CDI. When no CDI is available, we proposed an iterative reweighted $\ell_{2,1}$-norm minimization formulation that relies only on the sparsity of the channel matrix and is robust and invariant to different channel distributions. When the CDI is available at the BS, we approached the JUICE from a Bayesian perspective and proposed a novel JUICE formulation based on MAP estimation. Furthermore, we derived ADMM-based algorithms whose update steps admit computationally efficient closed-form solutions.
The obtained numerical results highlight the following key findings. 1) Formulating the JUICE as an iterative reweighted $\ell_{2,1}$-norm minimization problem provides a substantial performance improvement over conventional $\ell_{2,1}$-norm minimization. 2) While incorporating the spatial correlation of the channels increases the computational complexity of the recovery algorithms, it yields significant gains even with smaller signalling overhead. 3) The JUICE performance improves dramatically when moving from the conventional MIMO regime to the massive MIMO regime. 4) The training phase for estimating the second-order statistics of the channels does not require a substantial amount of resources; furthermore, MAP-ADMM is robust against imperfect knowledge of the channel statistics, which is conducive to practical use cases.
MAP-ADMM relies on the knowledge of the CDI at the BS, which may be challenging to acquire in practice. A potential future work is to design a sparse recovery algorithm that estimates the second-order statistics of the channels within the recovery process. Another interesting future direction would be to extend the JUICE framework to multi-cell and cell-free mMTC systems.

B. Joint MMSE Estimator
The received signal in (3) can be rewritten as (38), where $y = \mathrm{vec}(Y^T) \in \mathbb{C}^{\tau_p M}$, $w = \mathrm{vec}(W^T) \in \mathbb{C}^{\tau_p M}$, and $\Theta_S = \Phi_S \otimes I_M \in \mathbb{C}^{\tau_p M \times KM}$. The vectorization in (38) transforms the matrix estimation into a classical vector estimation problem. Thus, we utilize the MMSE estimator [38] to jointly estimate the channels of the active UEs as
$$\hat{x}_S = \bar{x} + R_{\mathrm{diag}} \Theta_S^H Q \,(y - \Theta_S \bar{x}),$$
where $Q = (\Theta_S R_{\mathrm{diag}} \Theta_S^H + \sigma^2 I_{\tau_p M})^{-1}$, $\bar{x}$ denotes the mean of $x_S$, and $R_{\mathrm{diag}}$ denotes the covariance matrix of $x_S$, given as a block-diagonal matrix whose main-diagonal blocks are the scaled covariance matrices $\bar{R}_i$ corresponding to the active UEs $i \in S$.
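A sketch of the joint MMSE estimator under the standard Gaussian LMMSE form stated above (the explicit inverse is for clarity; a solver-based implementation would be preferred in practice):

```python
import numpy as np

def joint_mmse(y, Theta, x_bar, R_diag, sigma2):
    """LMMSE estimate of the stacked active-UE channels:
        x_hat = x_bar + R Theta^H Q (y - Theta x_bar),
        Q     = (Theta R Theta^H + sigma2 I)^(-1).
    y: received vector, Theta: stacked pilot matrix (Phi_S kron I_M),
    x_bar / R_diag: prior mean and block-diagonal covariance of x_S."""
    n = y.size
    Q = np.linalg.inv(Theta @ R_diag @ Theta.conj().T + sigma2 * np.eye(n))
    return x_bar + R_diag @ Theta.conj().T @ Q @ (y - Theta @ x_bar)
```

In the noiseless identity-pilot case the estimator reduces to $\hat{x}_S = y$, a quick sanity check on the formula.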
The rest of the paper is organized as follows. Section II presents the system model and the canonical JUICE problem formulation. Section III addresses the JUICE with unknown CDI. Section IV derives the Bayesian formulation of the MAP-based JUICE, which exploits prior knowledge of the CDI. Simulation results are provided in Section VI, and Section VII concludes the paper.

Notations: Throughout this paper, we use boldface uppercase letters ($\mathbf{A}$) to denote matrices, boldface lowercase letters ($\mathbf{a}$) to denote vectors, and calligraphic letters ($\mathcal{S}$) to denote sets. The $i$th column of a matrix $\mathbf{X}$ is denoted by $\mathbf{x}_i$. The transpose, Hermitian transpose, and conjugate of a matrix are denoted by $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^*$, respectively. $\mathbf{0}$ and $\mathbf{1}$ are the all-zero and all-one vectors, respectively. The $\ell_2$-norm and the Frobenius norm are denoted by $\|\cdot\|_2$ and $\|\cdot\|_F$, respectively. $\|\mathbf{a}\|_0$ counts the number of non-zero entries of the vector $\mathbf{a}$, and $\mathbb{1}(a)$ is an indicator function that takes the value 1 if $a \neq 0$ and 0 otherwise. $\otimes$ denotes the Kronecker product and $\mathrm{vec}(\cdot)$ denotes column-wise stacking of a matrix.

Fig. 1: Illustration of a typical mMTC scenario: (a) an mMTC uplink system with K active UEs and N − K inactive UEs, (b) sporadic transmission, (c) division of a coherence interval $T_c$.
Here, (a) follows from the Markov chain $\gamma \to X \to Y$ and from the fact that $p(Y)$ does not affect the maximization, and (b) follows from the additive Gaussian noise model in (3).

Fig. 2(a) illustrates the obtained SRR against the SNR for the different sparse recovery algorithms.

Fig. 2: Performance of the JUICE with no side information in terms of SRR and NASE against SNR for N = 200, M = 20, and K = 10.

Fig. 2(b) depicts the channel estimation performance of the different recovery algorithms in terms of NASE against SNR, including the comparison to the genie-aided LS benchmark. It can be readily seen that for $\tau_p = 20$, the performance of the proposed IRW-ADMM nearly matches that of the genie-aided LS. Furthermore, IRW-ADMM with a reduced pilot sequence length of $\tau_p = 15$ still outperforms ADMM, F-ADM, SOMP, and SPARROW for SNR > 8 dB.

Fig. 3 presents the typical convergence behavior of ADMM, IRW-ADMM, and F-ADM for SNR = 16 dB, showing the number of iterations the algorithms require to converge to their optimal performance. The results reveal that IRW-ADMM with $\tau_p = 20$ takes approximately 40 iterations to converge, with slower convergence when the pilot sequence length is reduced to $\tau_p = 15$. On the other hand, in contrast to the SRR and NASE performance, where ADMM and F-ADM are similar, F-ADM converges in about 20 iterations, which is much faster than the proposed ADMM, which takes about 50 iterations to converge.

First, we quantify the activity detection accuracy of the proposed MAP-ADMM and compare it to IRW-ADMM and T-SBL. Fig. 4(a) presents the SRR against the SNR for the proposed algorithms for different pilot lengths. The results clearly show that MAP-ADMM provides superior performance compared to IRW-ADMM. For instance, MAP-ADMM identifies the set of true active UEs $S$ perfectly for SNR > 8 dB using a pilot length of $\tau_p = 20$. Furthermore, reducing the pilot length by 25 % (i.e., $\tau_p = 15$) affects the performance of MAP-ADMM only moderately, and optimal performance is achieved for SNR > 10 dB. More interestingly, the results indicate that even with a 40 % reduction in the pilot sequence length (i.e., $\tau_p = 12$), MAP-ADMM provides a 95 % SRR for SNR > 10 dB. Finally, the results show that while T-SBL suffers from inferior performance in the low-SNR regime, it provides optimal activity detection performance when SNR > 8 dB.

Fig. 4(b) illustrates the channel estimation performance of MAP-ADMM in terms of NASE against SNR for different pilot lengths, comparing it to IRW-ADMM, T-SBL, and the oracle MMSE benchmark. The proposed MAP-ADMM indisputably provides superior performance and a significant improvement over IRW-ADMM. For instance, given the same pilot sequence length $\tau_p = 20$, MAP-ADMM achieves the same performance as IRW-ADMM at up to 6 dB lower SNR. Furthermore, Fig. 4(b) reveals one advantageous feature of utilizing the available CDI.

Fig. 5: Performance of the proposed algorithms in terms of NASE and SRR versus the number of ADMM iterations for N = 200, M = 20, and K = 10.

Fig. 6: JUICE performance in terms of SRR and NASE versus the number of BS antennas M for N = 200, K = 10, $\tau_p = 20$, and SNR = 16 dB.

Fig. 7 depicts the NASE versus the number of samples $T$ used to generate $\{\hat{R}_i\}_{i=1}^N$ for M = 20 and M = 40 at SNR = 16 dB. The regularization parameters of MAP-ADMM and IRW-ADMM are fixed to the values providing the best results when perfect knowledge of $\{R_i\}_{i=1}^N$ is available. First, Fig. 7 indicates that using a low number of training samples $T$ is detrimental to the performance of MAP-ADMM and the joint MMSE estimator, as they require at least $T > M/2$ training samples to achieve the same performance as IRW-ADMM.

TABLE I: Computational complexity of the different recovery algorithms, where $(k)$ is the iteration index and $\hat{K}$ is the estimated number of non-zero elements at any particular iteration.