Coordinated Pilot Transmissions for Detecting the Signal Sparsity Level in Massive IoT Networks

Grant-free protocols exploiting compressed sensing multi-user detection (MUD) are appealing for solving the random access problem in massive Internet of Things (IoT) networks with sporadic device activity. Such protocols would greatly benefit from prior deterministic knowledge of the sparsity level, i.e., the instantaneous number of simultaneously active devices <inline-formula> <tex-math notation="LaTeX">$K$ </tex-math></inline-formula>. Aiming at this, herein we introduce a framework relying on coordinated pilot transmissions (CPTs) for detecting <inline-formula> <tex-math notation="LaTeX">$K$ </tex-math></inline-formula>. Specifically, the proposed CPT mechanism includes a downlink (DL) phase for channel state information acquisition that resolves fading uncertainty in the uplink (UL) transmission phase using shared UL pilot symbols for channel compensation. We propose a signal sparsity level detector and analytically assess its accuracy when network channels are subject to Rayleigh fading. We show that the variance of the estimator increases with <inline-formula> <tex-math notation="LaTeX">$K$ </tex-math></inline-formula>, and its distribution approximates that of the sum of a Student’s <inline-formula> <tex-math notation="LaTeX">$t$ </tex-math></inline-formula> and Gaussian random variable. The numerical results evince the need for carefully configuring the duration of the DL and UL phases. Indeed, we show that relatively short DL phases are preferable in highly sparse networks given the total CPT duration is fixed. Finally, we discuss and exemplify with some early results the potential of the proposed CPT framework for MUD, and highlight relevant research directions.

Grant-free multiple-access protocols are particularly attractive for massive machine-type communication (mMTC) since they [1], [2], [3]: i) promote efficient spectrum utilization, as each device is not assigned a dedicated transmission resource block, ii) reduce signaling overhead, and iii) improve the energy efficiency of the devices. Note that, due to the massiveness of the network, it is impossible to assign orthogonal pilot sequences/preambles to the devices, which motivates the need for grant-free non-orthogonal multiple access protocols. However, a key challenge here lies in efficiently identifying the set of sporadically active, non-orthogonally coexisting devices and their data, for which collision resolution mechanisms are required [3], [4].
We can distinguish two basic types of collisions: hard and soft. The former occurs when exactly the same preamble is used simultaneously by several active devices. In contrast, the latter occurs when active devices use different non-orthogonal preambles, as they interfere to some extent with each other. The probability of hard/soft collisions increases/decreases as the number of available preambles reduces. Since hard collisions are difficult to resolve without relying on sufficiently orthogonal channel subspaces [8], [9], [10] and/or additional communication overhead, increasing the pool of non-orthogonal preambles (thus, favoring the occurrence of soft instead of hard collisions) is usually recommended in practice [2], [3], [4]. A promising class of soft collision resolution methods, known as compressed sensing (CS) techniques, has been considered for multi-user detection (MUD) in mMTC [11], [12]. Note that MUD may include both user activity detection (UAD) and data detection, jointly or separately. However, in the following and to simplify our exposition, we use MUD to refer either to i) UAD alone, in the case that data detection is implemented separately, or ii) both UAD and data detection, in the case that they are implemented jointly.

A. Related Work
CS-MUD is usually based on regularization, greedy, message-passing (MP), and/or artificial intelligence (AI) techniques.
1) Regularized MUD relies on transforming the highly non-convex CS-MUD problem into a convex one via regularization and iterative procedures. For instance, Zhu and Giannakis [13] proposed a ridge detector and a least absolute shrinkage and selection operator detector, which directly regularize the original CS-MUD problem based on the $\ell_2$- and $\ell_1$-norm, respectively. Later, some sparsity-aware successive interference cancellation regularization techniques were proposed in [14] and [15], aiming at lowering the detection complexity by sequentially recovering the transmitted symbols. Meanwhile, Renna and Lamare [16] incorporated an $\ell_1$-norm regularization into an iteratively updated linear minimum mean square error filter and a constellation-list scheme to enable sparse detection. Moreover, joint user identification and channel estimation approaches using the alternating direction method of multipliers (ADMM) were proposed in [17], [18], and [19]. Finally, Gao et al. [20] proposed a low-complexity coordinate descent mechanism for the CS-MUD problem.
2) Greedy MUD has low complexity and often only requires appropriate termination tuning of the transmitted signal/vector reconstruction. Schepker and Dekorsy [21] applied for the first time orthogonal least squares and orthogonal matching pursuit (OMP) greedy algorithms to a sparse mMTC scenario. Since the latter outperforms the former, the latest research on greedy MUD has focused mainly on OMP-based algorithms. For instance, Schepker et al. [22] proposed group OMP leveraging channel decoders for greater performance, while Xiong et al. [23] proposed a detection-based OMP algorithm that, unlike conventional OMP, does not rely on prior knowledge of the signal/device sparsity (the number of active devices). Specifically, it runs a binary hypothesis test on the residual vector of OMP at each iteration and stops when there is no signal component in the residual vector. Meanwhile, a noise-robust greedy algorithm exploiting a posteriori probability ratios for every index of sparse input signals is designed in [24]. Lee and Yu [25] leveraged a priori information on the activation probability of each device to improve the performance of several greedy MUD schemes in mMTC, and showed that they are robust against prior information inaccuracy. Finally, Xiao et al. [26] proposed a MUD mechanism exploiting backward signal sparsity estimation. The latter is implemented by modifying the classical sparsity adaptive matching pursuit algorithm [27] to deal with the data length diversity coming from the exploitation of repeating and spreading sequences.
3) MP-based MUD constitutes a class of algorithms that exploit factor graphs, and thus the a posteriori distribution of the signal to be reconstructed. In practice, due to the large-scale nature of the access problem in mMTC, the usual approach is to adopt/design approximate MP (AMP) algorithms relying on iterative thresholding, which also allows analytic performance characterization via the so-called state evolution. For instance, Chen et al. [28] derived efficient denoisers for AMP depending on whether the large-scale component of the channel fading is known. Senel and Larsson [29] analyzed and proposed algorithmic enhancements for coherent and non-coherent MUD based on AMP. Meanwhile, Ke et al. [30] designed non-orthogonal pseudorandom pilots for massive UL broadband access. They formulated active user detection and channel estimation as a generalized multiple measurement vector CS problem and solved it via a generalized multiple measurement vector AMP algorithm. The suitability of AMP for joint device activity detection and channel estimation of devices coexisting with mobile broadband services is assessed and promoted in [31]. Wang et al. [32] designed an AMP algorithm that exploits the temporal activation correlation of the devices, and showed the achievable performance gains. Renna and Lamare [33] proposed the so-called bilinear message-scheduling generalized AMP, which uses the channel decoder's beliefs to refine activity detection and data decoding. An AMP-aided CSI estimator and MUD are proposed in [34], where the authors use a multi-state Markov chain-based transmission model to characterize the diverse time-varying traffic demands of the users. Finally, Ke et al. [35] proposed an AMP-based unified semi-blind detection framework for grant-free sourced and unsourced random access, aiming to facilitate massive ultra-reliable low-latency communication (URLLC) in massive multiple-input multiple-output (MIMO) systems.
4) AI-based MUD leads to direct detection decisions as the detection parameters are learned and configured on the go, thus avoiding empirical parameter tuning. Deep learning is the most commonly used AI tool for solving the CS-MUD problem [36]. Some examples of deep learning-based MUD can be found in [37], [38], [39], and [40]. Specifically, Bai et al. [37] proposed a fast data-driven algorithm for CS-MUD in mMTC relying on a novel block restrictive activation nonlinear unit that captures the system sparsity. Meanwhile, Cui et al. [38] designed two model-driven approaches, which effectively utilize features of sparsity patterns in designing common measurement matrices and adjusting the state-of-the-art detectors/decoders. Interestingly, the optimum depth, i.e., the number of layers, to be configured in a deep neural network varies according to the sparsity statistics, which motivated the work in [39]. Therein, the authors proposed to autonomously/dynamically update the number of layers in the inference phase by introducing an extra halting score at each layer. Yu et al. [40] proposed a deep learning approach consisting of a preamble detection neural network for a first tentative/rough MUD, followed by a data detection neural network exploiting the information data signals to refine MUD. Finally, AI-based MUD may also leverage Bayesian learning [41], [42], [43], [44]. Indeed, Zhang et al. [41] developed two CS-MUD Bayesian inference algorithms exploiting sparse prior information of the estimated channel vector. Similar approaches, but also exploiting the correlation of user activity over successive access slots, are proposed in [42]. Meanwhile, Marata et al. [43] proposed a unified framework for non-coherent and coherent mMTC and enhanced mobile broadband data transmissions, respectively, including a proper pilot design. MUD for clustered MTC is explored in [44] by utilizing the approximation error method to account for errors in the sensing matrix and likelihood function. In addition to Bayesian learning, the works in [43] and [44] also assessed the system performance under regularized, greedy, and MP-based MUD algorithms.
We consider an mMTC deployment under quasi-static fading, where $K$ random devices become active and aim to communicate with a coordinator. Our main contributions are three-fold:
• We introduce a coordinated pilot transmission (CPT) framework for detecting the signal sparsity level, $K$, in time division duplex (TDD) systems, to be implemented prior to the MUD. Specifically, the CPT mechanism consists of a downlink (DL) broadcast transmission using $N_1$ symbols for the purpose of channel state information (CSI) estimation, and a UL transmission with channel inversion (power and phase) control using $N_2$ shared symbols to resolve the fading uncertainty at the coordinator. Note that the use of shared pilot symbols is a key innovation here. After this, the signal sparsity level, $K$, is detected based on the signal received at the coordinator by performing a relaxed (real-domain) estimation followed by a rounding-to-the-nearest operation.
• We assess the performance of the proposed CPT mechanism and signal sparsity level estimator in Rayleigh fading channel conditions. Specifically, we characterize analytically the permissible maximum power and average power consumption of the devices, and the probability that an active device cannot transmit due to insufficient power to compensate for the channel losses. Moreover, we demonstrate and corroborate numerically that the estimator's variance increases linearly with $K$ and that its distribution approximately matches that of the sum of a Student's $t$ and a Gaussian random variable. We also provide a semi-closed-form approximation for the detection success probability under the proposed signal sparsity level estimator, which is valuable for system design/optimization purposes.
• We show that the attainable accuracy depends on the specific allocation of $N_1$, $N_2$ rather than on the total number of CPT symbols $N = N_1 + N_2$ alone, thus motivating a proper optimization of the DL and UL duration. Specifically, we illustrate that short DL phases are preferable in highly sparse networks (with small realizations of $K$) given a fixed $N$. Moreover, we motivate the proposed CPT + MUD over the conventional standalone MUD implementation by presenting and discussing some preliminary results on their attainable MUD performance. Finally, we discuss several attractive research directions related to CPT to pursue next.

C. Organization
The remainder of this paper is organized as follows.
Section II presents the system model and introduces the proposed CPT mechanism and signal sparsity level estimator. The accuracy of the proposed estimator is assessed in Section III, Section IV presents numerical results, and Section V concludes the paper and discusses future research directions. Notation: [...] and $X \sim \mathcal{T}(\nu)$ are, respectively, a Gaussian, a circularly-symmetric complex Gaussian, a Rayleigh, and a Student's $t$ (with $\nu$ degrees of freedom) random variable. Finally, Table I lists the main symbols used throughout the paper.

II. SYSTEM MODEL & PROPOSED CPT
Consider an mMTC deployment, where a set $\mathcal{Q}$ of devices is served by a single coordinator, e.g., a base station or an aggregator. It is assumed that all devices and the coordinator are equipped with a single antenna. Assume that time is slotted and active devices must wait for the next immediate time slot to start a (synchronous) transmission. Let us denote by $h_i$ the channel coefficient of the link between the coordinator and the $i$-th device, and assume that the channels are subject to quasi-static fading and remain unchanged during each time slot. In addition, DL and UL channels are reciprocal, which is motivated by the use of the same frequency band and a TDD operation [4], [6], [39].
As illustrated in Fig. 1, the proposed CPT mechanism for estimating the sparsity level, $K$, consists of a DL and a UL pilot transmission phase. This is followed by the transmission of training and data symbols in the case of coherent MUD, or only data symbols in the case of non-coherent MUD.

A. DL Phase
At the beginning of each time slot, the coordinator sends a broadcast pilot signal $\mathbf{v} \in \mathbb{C}^{N_1}$ comprising $N_1$ symbols. The signal received by the $i$-th device is given by
$$y_i[n] = \sqrt{p}\, h_i v[n] + w_i[n], \quad n = 1, \ldots, N_1, \tag{1}$$
where $\|\mathbf{v}\|^2 = N_1$, $p$ is the per-symbol average transmit power of the coordinator, and $w_i[n] \sim \mathcal{CN}(0, \sigma_i^2)$ is the $n$-th AWGN sample at the $i$-th device. For simplicity, we assume $\sigma_i^2 = \sigma^2$, $\forall i$.
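This DL broadcast pilot transmission phase is leveraged by the active devices to estimate their corresponding channel coefficient, since the UL and DL channels are reciprocal. Specifically, the minimum variance unbiased estimator of $h_i$ and the corresponding estimation error are respectively given by
$$\hat{h}_i = \frac{\mathbf{v}^H \mathbf{y}_i}{\sqrt{p}\, N_1}, \tag{2}$$
$$\tilde{h}_i = \frac{\mathbf{v}^H \mathbf{w}_i}{\sqrt{p}\, N_1}, \tag{3}$$
with $\hat{h}_i = h_i + \tilde{h}_i$, where $\mathbf{y}_i$ and $\mathbf{w}_i$ stack the $N_1$ received samples and noise samples at the $i$-th device, respectively.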
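As an illustration of the DL estimation step, the following Python sketch simulates the pilot-based estimate and compares the empirical error power with $\sigma^2/(pN_1)$. It assumes the linear pilot model $y_i[n] = \sqrt{p}\,h_i v[n] + w_i[n]$ with an all-ones pilot; the helper names (`cn`, `estimate_channel`, `empirical_error_variance`) and the normalized parameter values are illustrative choices, not taken from the paper.

```python
import math
import random


def cn(rng, var):
    """Draw one circularly-symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def estimate_channel(y, v, p):
    """Least-squares estimate of h from y[n] = sqrt(p) * h * v[n] + w[n]."""
    energy = sum(abs(vn) ** 2 for vn in v)  # equals N1 when |v[n]| = 1
    corr = sum(vn.conjugate() * yn for vn, yn in zip(v, y))
    return corr / (math.sqrt(p) * energy)


def empirical_error_variance(n1, p, sigma2, beta, trials, seed=0):
    """Average |h_hat - h|^2 over independent Rayleigh-fading realizations."""
    rng = random.Random(seed)
    v = [1.0 + 0j] * n1  # assumed pilot: all ones, so ||v||^2 = N1
    acc = 0.0
    for _ in range(trials):
        h = cn(rng, beta)
        y = [math.sqrt(p) * h * vn + cn(rng, sigma2) for vn in v]
        acc += abs(estimate_channel(y, v, p) - h) ** 2
    return acc / trials


if __name__ == "__main__":
    emp = empirical_error_variance(n1=2, p=1.0, sigma2=1e-3, beta=1.0, trials=20000)
    print("empirical:", emp, "theory:", 1e-3 / (1.0 * 2))
```

Under these assumptions the empirical value should sit close to $\sigma^2/(pN_1)$, which is why lengthening the DL phase (larger $N_1$) directly reduces the CSI error seen later in the UL phase.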

B. UL Phase
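Note that the transmission of DL pilots for the acquisition of DL/UL CSI (and corresponding power control, precoding/beamforming design, and other channel-aware resource allocation mechanisms) is widely used in TDD systems [4], [6], [9]. The innovative part of our proposal lies in how this information is exploited for UL pilot transmissions. Specifically, we propose that, after the DL CSI acquisition phase, the active devices exploit the remaining $N_2$ symbols for sending a common/shared pilot sequence $\mathbf{s} \in \mathbb{C}^{N_2}$, with $|s[n]|^2 = 1$ $\forall n$, but phase shifted as $e^{-\imath\angle\hat{h}_i}\mathbf{s} = \hat{h}_i^*\mathbf{s}/|\hat{h}_i|$, thus aiming at a coherent signal combination at the coordinator. We adopt a channel inversion power control such that the $i$-th device transmits with power $\rho/|\hat{h}_i|^2$ given a target receive power $\rho$. The signal received at the coordinator, $\mathbf{y} \in \mathbb{C}^{N_2}$, is given by
$$\mathbf{y} = \sum_{i\in\mathcal{K}} \frac{\sqrt{\rho}}{|\hat{h}_i|}\, h_i\, \frac{\hat{h}_i^*}{|\hat{h}_i|}\,\mathbf{s} + \mathbf{w} = \sqrt{\rho}\,(K + \phi_K)\,\mathbf{s} + \mathbf{w}, \tag{4}$$
where $\mathcal{K}$ is the set of the $K$ transmitting devices and $w[n] \sim \mathcal{CN}(0, \sigma^2)$, $n = 1, \ldots, N_2$, is the AWGN sample at the coordinator. Finally, the last step in (4) follows after using $\hat{h}_i = h_i + \tilde{h}_i$ and setting
$$\phi_K \triangleq -\sum_{i\in\mathcal{K}} \frac{\hat{h}_i^*\,\tilde{h}_i}{|\hat{h}_i|^2}, \tag{5}$$
which denotes the uncertainty in the UL receive signal related to the CSI estimation error.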

C. Signal Sparsity Level Estimator
The signal $\mathbf{y}$ received at the coordinator is used to detect the signal sparsity level, $K$, among all the $Q+1$ possible hypotheses $K \in \{0, 1, \ldots, Q\}$. Here, notice that the distribution and statistics of $\phi_K$ are completely unknown given no prior modeling assumption of the channel. Therefore, a sensible approach is to relax the integer detection problem to a real-domain continuous estimation, as suggested by [51]. Moreover, observe from (4) that $\Re\{\mathbf{s}^H\mathbf{y}\} = N_2\sqrt{\rho}\,(K + \Re\{\phi_K\}) + \Re\{\mathbf{s}^H\mathbf{w}\}$. Thus, we can use the method of moments in estimation theory [50] to obtain the relaxed estimator
$$\hat{K}_r = \frac{\Re\{\mathbf{s}^H\mathbf{y}\}}{N_2\sqrt{\rho}}, \tag{6}$$
and set $\hat{K} = \mathrm{round}(\hat{K}_r)$. Note that $\mathbb{E}[\hat{K}] = K$, i.e., the estimator is unbiased. Good detection accuracy is expected if $\sigma^2 \ll \rho K$, as should occur in practice by design.
Finally, note that our CPT proposal cannot be applied in setups where channel reciprocity does not hold, such as frequency division duplex systems.
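To make the two-phase procedure concrete, the following Python sketch simulates one CPT round end to end: DL channel estimation, UL channel-inversion transmission of a shared pilot, and the rounded estimate of $K$. It is a minimal sketch under stated assumptions rather than the authors' implementation: pilots are all-ones sequences, the relaxed estimate is taken as $\Re\{\mathbf{s}^H\mathbf{y}\}/(N_2\sqrt{\rho})$, deeply faded devices are handled by redrawing the channel until $|\hat{h}_i|^2 \ge \mu$ (so that exactly $K$ devices transmit), and the function names `cn` and `cpt_trial` are hypothetical.

```python
import math
import random


def cn(rng, var):
    """Draw one circularly-symmetric complex Gaussian sample CN(0, var)."""
    s = math.sqrt(var / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def cpt_trial(rng, k, n1, n2, p, rho, beta, sigma2, mu):
    """Simulate one CPT round with k transmitting devices; return round(K_r)."""
    v = [1.0 + 0j] * n1  # assumed DL pilot, ||v||^2 = N1
    s = [1.0 + 0j] * n2  # assumed shared UL pilot, |s[n]| = 1
    y = [cn(rng, sigma2) for _ in range(n2)]  # coordinator noise
    for _ in range(k):
        # Redraw until the device is not deeply faded (|h_hat|^2 >= mu).
        while True:
            h = cn(rng, beta)
            y_dl = [math.sqrt(p) * h * vn + cn(rng, sigma2) for vn in v]
            h_hat = sum(vn.conjugate() * yn for vn, yn in zip(v, y_dl))
            h_hat /= math.sqrt(p) * n1
            if abs(h_hat) ** 2 >= mu:
                break
        # Channel inversion: amplitude sqrt(rho)/|h_hat|, phase -angle(h_hat).
        tx_gain = math.sqrt(rho) * h_hat.conjugate() / abs(h_hat) ** 2
        for n in range(n2):
            y[n] += h * tx_gain * s[n]
    k_relaxed = sum(sn.conjugate() * yn for sn, yn in zip(s, y)).real
    k_relaxed /= n2 * math.sqrt(rho)
    return round(k_relaxed)


if __name__ == "__main__":
    rng = random.Random(0)
    # Linear-scale values in watts: p = 30 dBm, rho = -115 dBm, sigma2 = -120 dBm.
    args = dict(k=5, n1=2, n2=4, p=1.0, rho=3.16e-15, beta=1e-12,
                sigma2=1e-15, mu=1e-14)
    hits = sum(cpt_trial(rng, **args) == 5 for _ in range(2000))
    print("empirical detection success rate:", hits / 2000)
```

Because every active device sends the same pilot with inverted channel, the contributions add coherently and the real part of the matched-filter output scales with the number of transmitters, which is what the rounding step exploits.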

III. ACCURACY OF THE PROPOSED ESTIMATOR
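In the following, we adopt a channel model for the purpose of assessing the detection accuracy of the proposed CPT mechanism and corresponding estimator. Specifically, channels are assumed to be subject to quasi-static Rayleigh fading such that $h_i \sim \mathcal{CN}(0, \beta_i)$, where $\beta_i$ is the average channel power gain. Using this, together with (2) and (3), one obtains $\tilde{h}_i \sim \mathcal{CN}(0, 1/(N_1\varrho))$ and $\hat{h}_i \sim \mathcal{CN}(0, \vartheta_i)$, where $\varrho \triangleq p/\sigma^2$ is the DL transmit SNR, and $\vartheta_i \triangleq \beta_i + 1/(N_1\varrho)$. Moreover, we consider that the transmitting (thus, detectable) devices are only the active devices whose channels are not deeply faded, i.e., those with $\lambda_i \triangleq |\hat{h}_i|^2 \ge \mu$. In practice, $\mu$ must be set based on the permissible maximum power and/or average power consumption, which are given respectively by
$$P_{\max} = \frac{\rho}{\mu}, \tag{9}$$
$$\bar{P}_i = \mathbb{E}\!\left[\frac{\rho}{\lambda_i}\,\Big|\,\lambda_i \ge \mu\right] = \frac{\rho}{\vartheta_i}\, e^{\mu/\vartheta_i}\, \Gamma\!\left(0, \frac{\mu}{\vartheta_i}\right), \tag{10}$$
where the last step comes from exploiting that $\lambda_i$ is an exponential random variable with mean $\vartheta_i$, and using the definition of the upper incomplete gamma function [52, eq. 8.2.2]. Finally, the probability that an active device cannot transmit due to insufficient power to compensate for the channel losses is given by
$$p_{\mathrm{out},i} = \Pr[\lambda_i < \mu] = 1 - e^{-\mu/\vartheta_i}. \tag{11}$$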

A. Variance of the Relaxed Estimator
In the following, we analyze the variance of the relaxed estimator (6). First, let us define $V \triangleq \Re\{\mathbf{s}^H\mathbf{w}/(N_2\sqrt{\rho})\}$ and depart from (6) to obtain (12), where (a) follows from using (5), setting $\gamma_u \triangleq \rho/\sigma^2$, which denotes the target receive SNR in the UL, and using $\mathrm{var}[V] = 1/(2N_2\gamma_u)$, while (b) comes after exploiting the fact that $\hat{h}_i^*\tilde{h}_i/|\hat{h}_i|^2$ is equivalently
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
distributed as $\tilde{h}_i/|\hat{h}_i|$, since $\hat{h}_i^*/|\hat{h}_i|$ is uniformly distributed on the unit circle and independent of $\tilde{h}_i/|\hat{h}_i|$; thus, it does not alter the latter's distribution. Finally, (c) uses the definition of $Z_i$. Observe that $Z_i$ can be written as $Z_i = X_i/Y_i$, where $X_i$ is a standard Gaussian random variable and $Y_i \ge 0$ is distributed according to (13). Then, since $X_i$ and $Y_i$ are independent and $\mathbb{E}[X_i] = 0$, we proceed as in (14), where (a) comes from using $\mathbb{E}[X_i^2] = 1$ and the integral form of $\mathbb{E}[Y_i^{-2}]$, (b) comes from substituting (13), while (c) is attained by applying simple algebraic transformations and using the definition of the exponential integral [52, eq. (6.2.1)]. Now, by substituting (14) into (12), one attains (15) and (16), where $g(x) = xe^xE_1(x)$. Meanwhile, in the last line, we use $\vartheta \triangleq \min_{i\in\mathcal{Q}} \vartheta_i$, motivated by the fact that $\mathrm{var}[\hat{K}_r]$ is a decreasing function of $\vartheta_i$. This can be corroborated by noticing that $g$ is an increasing function of $x$, as both bounds in [52, eq. 6.8.1] increase with $x$. All this shows that the worst-case scenario is where the active devices are the farthest from the coordinator and, thus, are characterized by the smallest $\vartheta_i$. Hereinafter, we focus on the worst-case deployment scenario, i.e., the active devices are at the edge of the service area such that $\vartheta_i = \vartheta$, $\forall i \in \mathcal{K}$. Then, using (15) and (16), one obtains (17). is always smaller than 1/12 as this corresponds to the worst-case scenario, where

B. Distribution of the Estimator
In general, and especially for the considered setup, the estimator variance cannot be directly used to quantify, at least thoroughly, the performance degradation due to detection mismatches.Instead, the distribution of the classification results must be taken into account.
The PMF of $\hat{K}$ can be found as in (18). Notice that the distribution of the relaxed estimator, $\hat{K}_r$, is needed for computing $p_{\hat{K}}(\hat{k})$. Hence, the problem translates to finding $F_{\hat{K}_r}(\hat{k})$, for which we rewrite (6) as (19) and proceed as follows.
The distribution of $Z_i$ is derived as
where (a) comes from differentiating under the integral sign by leveraging the Leibniz rule and $dF_{X_i}(yz)/dz = yf_{X_i}(yz)$, and (b) follows from substituting $f_{X_i}(x) = e^{-x^2/2}/\sqrt{2\pi}$, and $f_{Y_i}(y)$ as given in (13). The indefinite integral is solved in (c) using [54, eq. (2.325.6)], while (d) follows directly after evaluating the definite integral limits. Fig. 2 illustrates the shape of $f_{Z_i}(z)$ for different values of $\mu/\vartheta$. By using (11), one obtains $p_{\mathrm{out},i} = 0.6321$ already for $\mu/\vartheta = 1$; thus, we only considered configurations with $\mu/\vartheta \le 1$, which are required to guarantee a relatively small $p_{\mathrm{out},i}$. Notice that $f_{Z_i}(z)$ is bell-shaped and symmetric around 0, which is expected since $X_i$ is a zero-mean Gaussian random variable and $Y_i \ge 0$.
With (20) at hand, the CDF of $\hat{K}_r$ can be obtained as in (21). Unfortunately, evaluating (21) becomes computationally expensive and often unaffordable, especially when $K \gg 1$, due to the increased number of integration operations.
To address the above issue, herein we exploit the fact that $T \triangleq \sum_{i\in\mathcal{K}} Z_i$ is approximately distributed as a scaled Student's $t$ distribution $\sqrt{\omega_1(1 - 2/\nu)}\,\mathcal{T}(\nu)$, where $\nu$ is the solution to (22), with $\omega_1$ and $\omega_2$ defined in (23) and (24). See the Appendix for the proof and accuracy-related discussions. From (19), this implies that the distribution of the relaxed estimator is symmetric around $K$ and accurately matches the distribution of the sum of a Gaussian and a Student's $t$ random variable. Now, let us denote $T' \sim \mathcal{T}(\nu)$, where $\nu$ is the solution to (22). Then, one attains (25), where (a) comes from using the first line of (21) together with the definition of $T$, while (b) follows from using the distribution of $V$ and $T'$. Notice that by using [55, eq. (3)], one can state the integral operation in (25) as an infinite sum that includes factorials, incomplete gamma, and confluent hypergeometric functions. However, such an approach may not significantly reduce the complexity of numerically computing (25), so we do not adopt it here. Finally, observe that computing (25) is much less computationally demanding than (21) since only two integrals must be evaluated, i.e., (25) and $\omega_2$ in (24), independently of the value of $K$.
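The step from the CDF of the relaxed estimator to the PMF of the rounded one can be sketched in a few lines of Python. The rounding relation $\Pr[\mathrm{round}(X) = k] = F_X(k + 1/2) - F_X(k - 1/2)$ holds for any continuous $X$; the Gaussian CDF used in the example is only a cruder stand-in that ignores the heavier Student's $t$ tails discussed above, and the variance value 0.04 is an arbitrary illustrative number, not a result from the paper.

```python
import math


def gaussian_cdf(x, mean, var):
    """CDF of a real Gaussian random variable with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def rounded_pmf(cdf, k):
    """P(round(X) = k) for a continuous random variable X with CDF `cdf`."""
    return cdf(k + 0.5) - cdf(k - 0.5)


if __name__ == "__main__":
    # Stand-in CDF for the relaxed estimator: Gaussian centered at K = 5.
    def stand_in(x):
        return gaussian_cdf(x, 5.0, 0.04)

    for k in range(3, 8):
        print(k, rounded_pmf(stand_in, k))
```

Any other CDF, for instance a numerical evaluation of the Gaussian-plus-Student's-$t$ approximation, can be passed to `rounded_pmf` in place of the stand-in; evaluating it at $k = K$ gives the detection success probability.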

IV. NUMERICAL RESULTS
In this section, we numerically analyze the performance of the proposed CPT mechanism under Rayleigh fading channel conditions. For this, we resort to Monte Carlo simulations and the analytical approximations derived in Section III, which are shown to match closely. Performance is evaluated in terms of the detection success probability given by $p_{\hat{K}}(K)$, which can be approximately obtained from evaluating (18) using (25), and the variance of the relaxed estimator, which is analytically characterized in (17).
We consider the worst-case deployment scenario, where all devices are at the edge of the service area, so $\beta_i = \beta$ $\forall i$. Unless stated otherwise, we consider a massive deployment of $Q = 1000$ devices, out of which $K = 5$ become active at each time slot, and $\beta = -120$ dB, which may correspond to a link distance in the order of 500 m [29]. Also, we set $N_1 = 2$ and $N_2 = 4$ such that $N = 6$ symbols are dedicated to CPT. This may be a reasonable choice considering that the overall transmission time may comprise many more symbols depending on the data traffic and connectivity solution. Let $p = 30$ dBm, $\mu = -140$ dB, and $\rho = -115$ dBm such that the maximum power allowed (9) and the average power consumption (10) of the devices are 316.2 mW and 12.9 mW, respectively, while the probability that an active device cannot transmit due to insufficient power to compensate for the channel losses is $10^{-2}$. Finally, we set $\sigma^2 = -120$ dBm by assuming a transmission bandwidth of 180 kHz.
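The power figures quoted above can be sanity-checked with a short Python sketch. It assumes that the maximum power is $\rho/\mu$, that the average power is $\mathbb{E}[\rho/\lambda \mid \lambda \ge \mu] = (\rho/\vartheta)e^{\mu/\vartheta}E_1(\mu/\vartheta)$ with $\lambda$ exponential of mean $\vartheta = \beta + 1/(N_1\varrho)$, and that the outage probability is $1 - e^{-\mu/\vartheta}$; these are our reading of (9)-(11). The function names and the series-based `exp1` helper are illustrative, chosen to keep the example free of external dependencies.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp1(x, terms=60):
    """Exponential integral E1(x) via its power series (suitable for 0 < x <~ 5)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0
    for k in range(1, terms + 1):
        term *= -x / k        # term now equals (-x)^k / k!
        total -= term / k
    return total


def db_to_linear(db):
    """Convert a dB (or dBm -> mW) value to linear scale."""
    return 10.0 ** (db / 10.0)


def power_budget(p_dbm, sigma2_dbm, rho_dbm, beta_db, mu_db, n1):
    """Return (max power [mW], average power [mW], outage probability)."""
    rho_mw = db_to_linear(rho_dbm)
    dl_snr = db_to_linear(p_dbm - sigma2_dbm)            # varrho = p / sigma^2
    theta = db_to_linear(beta_db) + 1.0 / (n1 * dl_snr)  # mean of |h_hat|^2
    mu = db_to_linear(mu_db)
    x = mu / theta
    p_max = rho_mw / mu
    p_avg = rho_mw * math.exp(x) * exp1(x) / theta
    p_out = 1.0 - math.exp(-x)
    return p_max, p_avg, p_out


if __name__ == "__main__":
    print(power_budget(p_dbm=30, sigma2_dbm=-120, rho_dbm=-115,
                       beta_db=-120, mu_db=-140, n1=2))
```

With the parameters of this section, the three returned values should land close to the 316.2 mW, 12.9 mW, and $10^{-2}$ figures stated in the text, which also shows how lowering $\mu$ trades a smaller outage probability for a larger peak transmit power.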

A. On the Detection Scalability
Fig. 3 depicts the PMF of $\hat{K}$ considering several values of $K$. The results here corroborate the symmetric shape of the distribution of $\hat{K}_r$ (and $\hat{K}$), which approximately matches that of the sum of a Student's $t$ and a Gaussian random variable. Moreover, the accuracy of the estimation decreases with $K$. The latter phenomenon can be more clearly appreciated in Fig. 4, where both the variance of the relaxed estimator (Fig. 4a) and the corresponding detection success probability (Fig. 4b) are plotted against $K$ for $\beta \in \{-130, -120\}$ dB. Indeed, the variance of the relaxed estimator increases linearly with $K$ and decreases with $\beta$ as predicted by (17), while being lower-bounded by the noise variance level, i.e., $1/(2N_1\gamma_u)$. The quantitative impact of such behavior is captured by the detection success probability metric, which shows, for instance, that the signal sparsity level, $K$, is predicted with an accuracy of 91% and 98% for $K = 5$ when $\beta = -130$ dB and $\beta = -120$ dB, respectively, while such figures decrease to 76% and 97% when $K = 15$.

[Fig. 5: Detection success probability as a function of $N_1$.]

B. How Many CPT Symbols Are Needed?
Fig. 5 shows the detection success probability as a function of $N_1$ for a fixed number of CPT symbols $N = 6$. That is, $N_2 = N - N_1$. Note that the allocation of the DL/UL symbols significantly influences the detection success probability. Indeed, a relatively small/large $\rho$ makes the DL phase less/more performance sensitive, thus motivating the allocation of fewer/more pilots to it for optimum performance. For instance, the optimum pilot allocation is $N_1 = 1$ and $N_1 = 3$ when $\rho = -115$ dBm and $\rho = -110$ dBm, respectively. Observe that the optimum configuration of $N_1$, $N_2$ is also the one that minimizes $\mathrm{var}[\hat{K}_r]$ since the estimator's distribution is symmetric around $K$. Since $N_1$, $N_2$ are positive integers and $N$ is usually small in practical setups, a brute force mechanism suffices to solve $\arg\min_{N_1, N_2} \mathrm{var}[\hat{K}_r]$ subject to $N_1 + N_2 = N$.
Meanwhile, Fig. 6a illustrates the potential performance improvements from increasing the total number of CPT symbols $N$. Herein, we test the performance under all possible combinations of $(N_1, N_2)$ with $N_1 + N_2 = N$ and select the one leading to the best detection success probability (or simply, the minimum estimator's variance), which is depicted in Fig. 6b. Observe that the probability of detection success increases rapidly with $N$, which represents the degrees of freedom to resolve the uncertainties related to fading in the system. Moreover, as $K$ increases, it is more beneficial to allocate more symbols to the DL phase, which is a behavior that can be deduced from (17). Specifically, the first term of (17) increases (decreases) with $K$ ($N_1$); thus, these values must be traded off for best performance. However, the value of $K$ is not known beforehand; therefore, in practice, the optimization of $N_1$, $N_2$ can only be performed based on the statistical expectations of the number of active users $K$.
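The brute-force search over the DL/UL split mentioned above is small enough to write out directly. The Python sketch below enumerates every $(N_1, N_2)$ with $N_1 + N_2 = N$ and keeps the pair minimizing a user-supplied objective. The objective `toy_variance` is a hypothetical two-term stand-in (one term shrinking with $N_1$, one with $N_2$) used only to exercise the search; it is not the paper's expression (17), and its coefficients are illustrative.

```python
def best_split(n_total, objective):
    """Exhaustively search (N1, N2) with N1 + N2 = n_total and N1, N2 >= 1."""
    candidates = [(n1, n_total - n1) for n1 in range(1, n_total)]
    return min(candidates, key=lambda pair: objective(*pair))


def toy_variance(a, b):
    """Hypothetical objective a / N1 + b / N2 (DL-limited plus UL-limited term)."""
    return lambda n1, n2: a / n1 + b / n2


if __name__ == "__main__":
    # A UL-dominated objective favors a short DL phase.
    print(best_split(6, toy_variance(0.0102, 0.158)))
    # Equal weights favor an even split.
    print(best_split(6, toy_variance(1.0, 1.0)))
```

In practice the objective would be replaced by the analytical variance or by a Monte Carlo estimate of the detection success probability (negated, since `best_split` minimizes), evaluated at the expected number of active devices.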

C. A Primer on the Performance of CPT + MUD
Next, we briefly illustrate how the performance of MUD, which comprises only UAD for simplicity, would benefit from incorporating the CPT mechanism proposed in this paper. For this, we consider that the coordinator is equipped with a 64-antenna array, although a single antenna is used for the purpose of signal sparsity level estimation using CPT. We adopt two fundamental CS-MUD algorithms: OMP [21], [23] and AMP [29], [31]. In the CPT + MUD implementation, $K$ is first estimated using $N$ symbols; then, MUD is executed employing $M$ symbols in such a way that only the $\hat{K}$ devices with the strongest estimated channel power, in the case of AMP, or the $\hat{K}$ devices first detected, in the case of OMP, are declared active. We compare our proposed approach with the standard standalone MUD implementations leveraging $N + M$ symbols, where a device is detected if its associated estimated channel power, in the case of AMP, or the residual signal power immediately before detecting the device, in the case of OMP, exceeds a pre-defined threshold $\zeta$. In the following, we assume that the devices use Bernoulli pilots as in [29] and become active with probability $\epsilon = 0.01$; thus, there are $\epsilon Q = 10$ devices active on average in the network. Fig. 7 shows the activity detection error rate as a function of the detection threshold $\zeta$, which is only used in the standalone MUD implementations. Here, a relatively small $\zeta$ tends to decrease the miss-detection probability but at the expense of more false-alarm events. In comparison, a relatively large $\zeta$ tends to decrease the false-alarm probability at the expense of more miss-detection events. Indeed, notice that as $\zeta \to 0$ and $\zeta \to \infty$, the activity detection error converges to $1 - \mathbb{E}[K]/Q = 0.99$ and $\mathbb{E}[K]/Q = 0.01$, respectively. This motivates the need to carefully tune $\zeta$ for optimum performance, as discussed in Section I-A. Meanwhile, the proposed CPT + MUD completely avoids this tuning problem and, according to Fig. 7, can significantly outperform the standalone MUD implementations if the latter are not optimally/ideally configured, as is typical in practice. Note that AMP-based MUD outperforms OMP-based MUD in terms of average error rate for relatively appropriate threshold choices, although at the cost of greater complexity and convergence time.
As one may expect, when the number of available symbols is relatively small, the application of CPT is not advisable, and one may resort to standard standalone MUD approaches. However, as the number of available symbols increases, CPT assistance becomes more appealing. Interestingly, this depends on the sparsity of the network such that the number of symbols dedicated to CPT should be greater (smaller) for a smaller (greater) $\epsilon$. Indeed, considering $\epsilon = 0.01$ and comparing with the ideal standalone OMP-based MUD, 0, 2, 3, 4, and 6 CPT symbols are recommended when the total number of symbols lies in [1, 22], [23, 25], [26, 30], [31, 33], and [34, 36], respectively. Meanwhile, when the system sparsity degrades such that $\epsilon = 0.03$, the MUD phase must be prioritized and CPT symbols may only be needed when the total number of available channels is as large as 33, considering a sub-optimal selection of $\zeta$ for a standalone MUD. Meanwhile, it always holds that $N_2 \ge N_1$ is preferable, as also illustrated in Fig. 6. All in all, the results here evince that a CPT-assisted MUD may provide significant performance gains relative to a standalone implementation, even if the detection threshold for the latter is optimally selected, considering the availability of a relatively large number of detection symbols.
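The difference between the two decision rules compared above can be illustrated with a short Python sketch: CPT-assisted detection keeps the $\hat{K}$ strongest devices, whereas standalone detection compares each estimated power against a threshold $\zeta$. The per-device power values and the function names are hypothetical; the sketch covers only the final decision step, not the OMP or AMP recovery that would produce the power estimates.

```python
def detect_top_k(powers, k_hat):
    """Declare active the k_hat devices with the largest estimated power."""
    order = sorted(range(len(powers)), key=lambda i: powers[i], reverse=True)
    return set(order[:max(k_hat, 0)])


def detect_threshold(powers, zeta):
    """Declare active every device whose estimated power exceeds zeta."""
    return {i for i, pw in enumerate(powers) if pw > zeta}


def error_rate(declared, truly_active, q):
    """Fraction of the q devices whose activity state is misclassified."""
    return len(declared ^ truly_active) / q


if __name__ == "__main__":
    powers = [0.1, 2.0, 0.05, 1.5, 0.2]   # hypothetical estimated powers
    active = {1, 3}                        # ground-truth active set
    q = len(powers)
    print("top-K:", error_rate(detect_top_k(powers, 2), active, q))
    for zeta in (0.0, 1.0, 10.0):
        print("zeta =", zeta, error_rate(detect_threshold(powers, zeta), active, q))
```

In this toy example the extreme thresholds reproduce the limits noted in the text: a vanishing $\zeta$ declares every device active (error $1 - K/Q$), a very large $\zeta$ declares none (error $K/Q$), while the top-$\hat{K}$ rule needs no threshold once $\hat{K}$ is available.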

V. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
In this work, we introduced a framework for detecting the number K of devices that become active, i.e., signal sparsity level, in an mMTC network.Specifically, the proposed CPT mechanism consists of a DL transmission using N 1 symbols for the purpose of CSI estimation, and an UL transmission with channel inversion (power and phase) control using N 2 shared symbols to resolve the fading uncertainty at the coordinator.We presented an efficient estimator for K based on such a UL signal and illustrated with some early results its crucial role for sparse signal recovery algorithms aiming at accurately identifying the specific set of active devices.
Regarding the signal sparsity level estimator, we analytically characterized its variance and distribution when the network channels are subject to Rayleigh fading.We showed that the estimator's variance increases linearly with K and that its distribution approximates that of the sum of a Student's t and a Gaussian random variable.The provided analytical framework allows tractable computation and optimization of the detection success probability, thus, becoming valuable for system design/analysis purposes.The numerical results showed that the attainable accuracy performance depends on the specific allocation of N 1 , N 2 rather than on the total number of CPT symbols N = N 1 +N 2 alone, thus, motivating a proper optimization of the DL and UL phases.Indeed, we revealed that relatively short DL phases are preferable in highly sparse networks (with small realizations of K) given a fixed N .
To conclude, below we enumerate some attractive research directions that we would like to pursue next. Note that they aim to address key limitations of our current work, such as the fact that the proposed CPT and signal sparsity level estimator are i) completely agnostic of the statistics of K, ii) designed only for single-antenna systems, iii) derived assuming perfect network synchronization, and iv) not jointly optimized with MUD.
1) Exploiting Prior Statistical Knowledge of K: We have not assumed any statistical knowledge of K. In practice, the coordinator might have some prior expectations based on traffic history, which can be leveraged for more accurate CPT-based detectors. In future work, our aim is to design CPT-based detectors exploiting traffic history.
2) CPT Optimized for MIMO Systems: MIMO technology is key for successful MUD, especially in mMTC networks with sporadic device activations. Therefore, adapting our proposed CPT framework to MIMO setups is undoubtedly appealing. Due to the overhead introduced by multi-antenna CSI training and the limited number of CPT symbols that may be available, an efficient proposal can rely on a compressed CPT training phase that limits the number of communicating antennas and/or exploits efficiently configured precoders/combiners.
3) CPT for Imperfectly Synchronized Networks: The proposed CPT mechanism requires network synchronization, as assumed throughout the paper. However, IoT devices may lack precise clocks and have heterogeneous hardware capabilities and protocol stacks, making it challenging to achieve accurate network synchronization [1], [2]. Mitigating the impact of timing discrepancies, especially for critical IoT networks, e.g., supporting URLLC [58], typically involves implementing time synchronization protocols and error-handling mechanisms, but perfect synchronization may not be achieved. Therefore, it is interesting to investigate/analyze the performance of CPT under imperfect synchronization conditions and propose synchronization error countermeasures, if applicable.

4) Joint CPT & MUD Optimization: The proposed CPT mechanism spanning over N symbols and aiming to determine the number of active devices is followed by MUD occupying M symbols, where the specific set of active devices is detected. Note that the number of active devices detected by CPT works as a prior for MUD mechanisms. An interesting question that we aim to address in future work is how to efficiently allocate the CPT and MUD symbols given that N + M is constrained. For this, one needs to jointly assess the performance of both CPT and MUD mechanisms, which ultimately reveals the (practical) achievable performance of MUD. Some early insights were provided in the discussions around Fig. 8, but dedicated research and trade-off analysis are still required. Finally, comparisons with state-of-the-art MUD approaches that intrinsically implement signal sparsity level estimation, e.g., [26] and [27], must be conducted.
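One possible way to state this allocation question formally is sketched below. The symbols T (total symbol budget) and P_e (average activity detection error rate of the CPT-assisted MUD) are introduced here for illustration only and are not defined in the paper in this form.

```latex
\begin{equation*}
\begin{aligned}
\min_{N_1,\, N_2,\, M \,\in\, \mathbb{N}} \quad & P_e(N_1, N_2, M) \\
\text{subject to} \quad & N_1 + N_2 + M \le T .
\end{aligned}
\end{equation*}
```

Under this formulation, the standalone MUD corresponds to the special case N_1 = N_2 = 0, so the optimum can never be worse than the standalone configuration evaluated with the same objective.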

Fig. 6. a) Optimum detection success probability (top), and b) optimum N_1 as a function of N; thus, N_2^opt = N − N_1^opt. Here, the markers denote the results from Monte Carlo simulations.

Fig. 7. Average activity detection error rate as a function of the detection threshold ζ. We set N = 6 and M = 18.

Fig. 8 illustrates the performance of standalone and CPT-assisted MUD mechanisms as a function of the number of symbols N + M. In this case, we adopt only OMP-based MUD algorithms for simplicity and illustrate the results corresponding to the optimum selection of N_1, N_2, and M. In the case of the standalone MUD mechanisms, we consider two configurations: one where the detection threshold ζ is optimal, i.e., ζ = ζ⋆, which corresponds to an ideal standalone configuration, and another where ζ can randomly deviate up to 0.25 dB from the optimal, i.e., ζ (dB) ∈ [ζ⋆ (dB) − 0.25 dB, ζ⋆ (dB) + 0.25 dB].
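The non-ideal standalone configuration can be emulated by perturbing the optimal threshold in the dB domain. The sketch below assumes a uniform deviation within ±0.25 dB and a power-like threshold (hence the 10·log10 conversion). Neither assumption is stated in the text, and the function name `perturbed_threshold` is hypothetical.

```python
import numpy as np


def perturbed_threshold(zeta_opt, max_dev_db=0.25, rng=None):
    """Return a linear-scale threshold deviating up to `max_dev_db` dB from zeta_opt."""
    rng = np.random.default_rng() if rng is None else rng
    zeta_opt_db = 10 * np.log10(zeta_opt)          # optimal threshold in dB
    dev_db = rng.uniform(-max_dev_db, max_dev_db)  # assumed uniform deviation
    return 10 ** ((zeta_opt_db + dev_db) / 10)     # back to linear scale
```

A deviation of 0.25 dB corresponds to a factor of 10^0.025 in linear scale, i.e., slightly under 6%.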

Fig. 8. Average activity OMP-based detection error rate as a function of the total number of symbols N + M. The results correspond to the optimum configuration of N_1, N_2, and M.

TABLE I MAIN SYMBOLS USED THROUGHOUT THE PAPER

Section III, while Section IV discusses numerical results. Finally, Section V concludes the article and highlights further research directions. Notation: Boldface lowercase letters denote column vectors. Superscripts (•)* and (•)^H denote the complex conjugate and Hermitian operations, respectively. ||•|| is the Euclidean norm of a vector, |•| is the absolute (or cardinality for sets) operation, and round(•) denotes the round-to-the-nearest operation. Pr[A] denotes the probability of the occurrence of event A, while A|B denotes a random variable A conditioned on B. E[•] and var[•] output the expected value and variance of the argument, respectively. ℜ{•} (ℑ{•}) outputs the real (imaginary) part of the argument. Additionally, E_1(•) is the exponential integral [52, eq. (6.2.1)], erfc(•) is the complementary error function [52, eq. (7.2.2)], and Γ(•) is the complete gamma function [52, eq. (5.2.1)]. C is the set of complex numbers, and ı = √−1 is the imaginary unit. f_X(x) and F_X(x) denote the probability density function (PDF) and cumulative distribution function (CDF), respectively, of a continuous random variable X, while p_Y(y) denotes the probability mass function (PMF) of a discrete random variable