Semi-Grant-Free Orthogonal Multiple Access With Partial-Information for Short Packet Transmissions

Traditional multiple access schemes, as well as more recent preamble-based schemes, cannot achieve the extremely low latency, complexity, and collision probability required by the next-generation Internet-of-Things (IoT) networks to operate. To address such issues and further reduce the latency and packet loss, we introduce a novel semi-grant-free multiple access protocol for short packet transmission, the partial-information multiple access (PIMA) scheme. PIMA transmissions are organized in frames, and in the partial information acquisition (PIA) sub-frame of each frame, the base station (BS) estimates the number of active devices, i.e., the devices having packets waiting for transmission in their queue. Based on this estimate, the BS chooses both the total number of slots to be allocated in the data transmission (DT) sub-frame and the respective user-to-slot assignment. Although collisions may still occur due to multiple users assigned to the same slot, they are drastically reduced with respect to slotted ALOHA-based schemes, while achieving lower latency than both time-division multiple-access (TDMA) and preamble-based protocols, due to the extremely reduced overhead of the PIA sub-frame. We assess the performance of PIMA under various activation statistics, proving the robustness of the proposed solution to the traffic intensity, also with traffic bursts.


I. INTRODUCTION
Machine-type communications (MTC) is a use case of beyond-fifth-generation (B5G) cellular networks with increasing relevance, as the focus of mobile communications shifts from humans to machines. To support this scenario, several new technological solutions should be adopted, including specific multiple access schemes for both flavors of MTC, i.e., massive MTC (mMTC) and ultra-reliable low-latency communications (URLLC). Both show several distinct features and challenges to meet the requirements of emerging Internet-of-Things (IoT) applications and services. For example, URLLC use cases target a maximum latency of 1 ms and reliability of 99.99999% (e.g., mission-critical applications), while mMTC scenarios require supporting devices with density up to 1 million devices per km² (e.g., for industrial IoT). The focus of this paper is the medium access control of uplink transmissions in an MTC scenario, wherein users (or machine-type devices (MTDs)) transmit to a common base station (BS). Although access procedures based on resource requests and grants have been discussed for MTC [2], mMTC [3], and URLLC [4], the sporadic nature of transmissions by a large number of users makes these schemes inefficient, and a grant-free (GF) or semi-GF solution is to be preferred. GF approaches address this issue since users can transmit data immediately, without waiting for approval from the BS. GF approaches include several different techniques, which can be classified into uncoordinated and coordinated random access (RA) [5]. A survey of the literature related to these alternative schemes is provided in Section II. In any case, existing multiple access solutions are either based on the exact knowledge by the BS of which users are going to transmit or on no knowledge of the users' state (i.e., whether they have a packet to transmit or not).

Part of this paper has been presented at the International Conference on Ubiquitous and Future Networks (ICUFN) [1].
In [1], we introduced a novel approach to multiple access, in which the BS first acquires partial information on the state of the users, and then schedules the transmissions. In particular, the BS first estimates the number of users with packets to transmit, without knowing their identities. Then, based only on this information, it allocates users to slots. The allocation is performed in a non-exclusive manner, i.e., a slot is typically allocated to multiple users whose transmissions may collide. Indeed, without knowing the identity of users with packets to transmit (active users), collisions could only be avoided using time-division multiple-access (TDMA), which is however extremely inefficient for sporadic traffic.
The resulting scheme is denoted as partial-information multiple access (PIMA). In PIMA, time is organized in frames of variable length, each split into two sub-frames. The first is the partial information acquisition (PIA) sub-frame, where all active users simultaneously send a signal to the BS. The BS performs a maximum a posteriori probability (MAP) estimation of the number of active users. Based on this knowledge, the BS then assigns one slot to each user in the system for transmission in the data transmission (DT) sub-frame.
However, the PIA sub-frame in the scheme of [1] introduced a significant overhead, as the users had to transmit a long random sequence and the BS estimated the total received power to count the active users. Moreover, an in-depth analysis of the scheme was missing, considering the differentiated scenarios of next-generation cellular networks. In this paper, we partially re-design PIMA and provide an in-depth analysis of its performance in various scenarios. In particular, the main contributions of this paper are the following.
1) We focus on the short packet transmission scenario, with packets consisting of a few complex symbols each, and we improve the user enumeration procedure. By assuming that channel state information (CSI) is perfectly acquired at each user with a downlink beacon, users can precode their transmissions so that they coherently combine at the BS, similarly to a pulse-amplitude modulation (PAM) signal. Therefore, the BS can reliably count the active users with a significantly shorter PIA sub-frame, reducing the overhead.
2) We consider three different packet generation statistics, including a bursty traffic generation in the case of massive random access (MRA), wherein the number of users in the system is arbitrarily large while the number of active users remains finite.
3) We prove that with independent identically distributed (i.i.d.) activation, allocating each user to a single slot maximizes the resulting frame efficiency.
4) We prove that with MRA traffic, the maximum frame efficiency is obtained by allocating a number of slots equal to the number of active users.
5) We significantly extend the numerical evaluation part, comparing PIMA also with state-of-the-art preamble-based approaches.

The remainder of the paper is organized as follows. Section II provides a review of the literature on multiple access schemes. In Section III and Section IV, we first introduce the system model and the packet generation processes, then we describe the frame structure and the PIMA protocol. Section V provides details of the user enumeration task for partial information acquisition. Section VI introduces the frame-efficiency-based scheduling problem, and the optimal scheduler for i.i.d. activations is derived in Section VII. In Section VIII we discuss the numerical results and compare PIMA with the state-of-the-art orthogonal schedulers. Finally, Section IX draws some conclusions.

II. LITERATURE REVIEW
Multiple access schemes can be broadly categorized into uncoordinated RA, coordinated RA, and fast uplink grants.
Uncoordinated RA. In uncoordinated RA, users transmit at random time instants, and specific techniques are adopted at the receiver to mitigate the effects of collisions. Such techniques include traditional orthogonal multiple access (OMA), which is outperformed by non-orthogonal multiple access (NOMA) schemes [6], for which the symbols of each user are not allocated to orthogonal resources. An effort has been made towards the unification of the different NOMA solutions, and the performance gains with respect to their orthogonal counterparts have been highlighted [7], [8]. However, NOMA requires advanced pairing and power allocation techniques, as well as powerful channel coding and interference cancelation mechanisms that only partially mitigate collision effects [9]. Under these conditions, the BS may become prohibitively complex when serving a large number of users. In recent years, unsourced RA has been proposed to manage a massive number of devices [10]: at any time, a fraction of devices transmit simultaneously using the same channel codebook. The receiver decodes the arriving messages without knowing the identities of the transmitters. Although this approach is very effective in managing many users, good performance can be achieved only for very small payloads and with high-complexity massive multiple-input multiple-output (MIMO) receivers [11]-[13]. Moreover, most works assume perfect knowledge of the number of active users, and only recently an information-theoretical analysis of unsourced random access (URA) with random user activity has been proposed [14].
Coordinated RA. These solutions typically divide the time into slots, each with the duration of one packet. Slotted ALOHA (SALOHA) is the simplest and most widely adopted coordinated RA protocol: users transmit at the beginning of the first slot available after packet generation. When collisions occur, users randomly delay the retransmissions of collided packets. A reduction of collisions is achieved by dividing time into frames; each of these is split into slots wherein users transmit at random, in what is known as framed slotted-ALOHA (FSA), widely adopted in radio-frequency identification (RFID) systems [15], [16]. Nevertheless, coordinated RA solutions are particularly useful when user activations are highly correlated, for example, as a result of correlated underlying traffic generation [17]. Also, retransmissions (with the consequent accumulation of packets in user queues) introduce a correlation of transmissions among users: this further increases collisions, while it can also be exploited to indirectly coordinate RA. In the literature, correlation-based schedulers have recently gained attention as a possible breakthrough for multiple access in MTC. Such schemes typically rely on the knowledge of traffic generation statistics [18], or learn the traffic correlation by exploiting the capabilities of hidden Markov models (HMMs) tracking successes and collisions in previous frames [19], or reinforcement learning techniques [20], [21]. In another line of research, preamble-based random access techniques are more suitable in the IoT context. For example, with multichannel ALOHA, active users choose a preamble from the common preamble pool, that is, a codebook of orthogonal preambles. Then, the BS can effectively differentiate between multiple users attempting to access the network simultaneously and schedule the users for data transmission in a second dedicated sub-frame, which typically has a fixed length. However, such schemes require different preamble lengths based on the number of active users to perform optimally. Indeed, while long preambles support more active users, the communication overhead they entail grows rapidly. Non-orthogonal preambles to reduce overhead have been studied in the context of compressive RA (CRA) [22]-[24]; with CRA the BS resorts to compressed sensing (CS) to identify active users [25]. However, multiple measurements are needed to accurately estimate the preambles, therefore requiring MIMO receivers or multiple transmission steps.
Fast Uplink Grant. In fast uplink grant schemes [26] the BS schedules one slot for each user, without any resource request, so access randomness is removed, while coordination remains. Slots are usually shared by multiple users, and collisions may still occur. The exploitation of traffic correlation has also been considered in the design of fast uplink grant protocols, involving multi-armed-bandit-based methods [27] and machine learning tools for traffic prediction [28].
PIMA Protocol Classification. The PIMA protocol proposed in this paper falls into the category of the coordinated RA schemes and, in particular, can be considered a semi-grant-free RA solution. Indeed, with respect to other two-step RA schemes consisting of preamble and data transmission stages, PIMA acquires only partial information on the activation statistics in the PIA sub-frame, avoiding revealing the users' identities. Likewise, PIMA cannot be considered a fast uplink grant approach, due to its partial information acquisition at the BS.
Notation. Scalars are indicated in italic letters; vectors and matrices are indicated in boldface lowercase and uppercase letters, respectively. Sets are denoted by calligraphic uppercase letters and $|\mathcal{A}|$ denotes the cardinality of the set $\mathcal{A}$. $P(\cdot)$ denotes the probability operator, $E[\cdot]$ denotes the statistical expectation, and $\log(\cdot)$ indicates the natural logarithm function.

III. SYSTEM MODEL
We consider the uplink of a multiple access scenario with $N$ single-antenna users transmitting to a common single-antenna BS. We assume that the value of $N$ is known at the BS.

Time Organization. Time is split into frames, each comprising an integer number of slots and an additional short time interval, whose purpose is described in the following. Each slot has a fixed duration of $T_s$ complex symbols, while each frame comprises a different number of slots. Perfect time synchronization at the BS is assumed, thus each user can transmit signals with specific times of arrival at the BS.
Each user may transmit at most one packet per frame, and each packet has the duration of one slot. In the following, $t$ denotes the frame index. We consider a multiple access protocol, where the same slot is in general assigned to multiple users for transmission.
Channel. Due to the scheduling of the same slot to multiple users, collisions between packets may occur. We assume that when two or more users transmit in the same slot, a collision occurs, preventing the decoding of all collided packets by the BS. Successful transmissions are acknowledged by the BS at the beginning of the following frame, and upon collision, collided users retransmit their packets in the following frame. We assume that the BS always correctly decodes the received packets in slots without collision, thus the channel does not introduce other sources of communication errors.
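The collision model above can be illustrated with a minimal Python sketch. The function name `resolve_frame` and the dictionary representation of the slot assignment are illustrative assumptions, not part of the protocol specification; the only behavior taken from the text is that a slot with exactly one transmitting user is decoded and a slot with two or more is lost entirely.

```python
from collections import defaultdict


def resolve_frame(q, active):
    """Outcome of one DT sub-frame under the collision channel model.

    q      : dict mapping user index -> assigned slot index (hypothetical layout)
    active : iterable of user indices that actually transmit in this frame
    Returns (delivered, collided) as two sets of user indices.
    """
    slot_to_users = defaultdict(list)
    for n in active:
        slot_to_users[q[n]].append(n)

    delivered, collided = set(), set()
    for users in slot_to_users.values():
        if len(users) == 1:
            # A single transmitter in the slot is always decoded.
            delivered.update(users)
        else:
            # Two or more transmitters: all packets in the slot are lost.
            collided.update(users)
    return delivered, collided


# Five users share three slots; users 0, 2 and 3 are active.
q = {0: 1, 1: 1, 2: 2, 3: 2, 4: 3}
delivered, collided = resolve_frame(q, active=[0, 2, 3])
```

In this example user 0 is alone in slot 1 (user 1 is inactive) and succeeds, while users 2 and 3 collide in slot 2 and would retransmit in the following frame.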

A. Packet Generation and Buffering
Packets generated in frame $t$ by user $n$ are stored in its buffer and transmitted according to a first in, first out (FIFO) policy. In the following, we consider both the case of finite and infinite buffer capacity. In the case of finite-length queues, to ensure data freshness, whenever a new packet is generated while the buffer is at full capacity, the oldest packet is dropped. Let $B_n(t)$ be the queue length of user $n$ at the beginning of frame $t$. If $B_n(t) > 0$, the buffer of user $n$ is non-empty and $n$ is said to be active; instead, if $B_n(t) = 0$, its queue is empty and user $n$ is considered inactive. The total number of active users at the beginning of frame $t$ is $\nu(t)$.
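The buffering rule can be sketched in a few lines of Python. The class name `PacketBuffer` and its methods are hypothetical; the sketch only assumes the FIFO service order and the drop-oldest policy stated above, with `capacity=None` standing in for the infinite-buffer case.

```python
from collections import deque


class PacketBuffer:
    """FIFO packet queue that drops the oldest packet when full."""

    def __init__(self, capacity=None):
        # capacity=None models an infinite-length queue.
        self._queue = deque(maxlen=capacity)

    def push(self, packet):
        # A bounded deque discards the element at the opposite end
        # (the oldest packet) when a new one is appended at full capacity.
        self._queue.append(packet)

    def pop(self):
        # Serve the oldest packet still in the queue.
        return self._queue.popleft()

    def __len__(self):
        return len(self._queue)

    @property
    def active(self):
        # A user is active when its queue length B_n is positive.
        return len(self._queue) > 0


buf = PacketBuffer(capacity=2)
for packet in ("p1", "p2", "p3"):
    buf.push(packet)
```

After the three insertions the buffer holds `p2` and `p3`: the oldest packet `p1` was replaced to preserve data freshness.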

B. Activations Statistics
We analyze the performance of PIMA in three different scenarios, depending on the users' activation statistics.

I.i.d. Activations. The activation statistics of the users are i.i.d. when a) users have a queue for one packet only, and b) all colliding packets are dropped at the receiver after the first transmission. Such a scenario is typical of monitoring systems requiring frequent updates and data freshness [29] and will be discussed in detail in Section VII.
Correlated Activations. In this scenario, retransmissions of previously collided packets are allowed. Due to the presence of queues and retransmissions, users' activations are generally correlated, since users colliding at frame $t$ will deterministically retransmit in the following frames. In this case, we assume queues of infinite length and that all queues are stable.
For both the i.i.d. and the correlated activations cases, we assume that at each user the traffic generation, also denoted as packet arrival process, follows a Poisson distribution with parameter $\lambda$. The well-known properties of Poisson processes provide a total normalized arrival rate of $\Lambda_T = N\lambda/T_s$, and when $\Lambda_T = 1$, on average one packet is generated during the duration $T_s$ of a slot.
Bursty Activations. In this scenario, we assume a bursty traffic model, in which a finite subset of users activates at the same time, with each user generating a single packet to transmit. Retransmissions of colliding packets are also allowed in this case, while the buffer capacity is unitary for all users. In particular, we assume that at each burst the number of generated packets follows a Poisson distribution with parameter $\Lambda_B$, and we denote by $\tau_B$ the random variable of the interarrival time between consecutive bursts.

IV. PARTIAL INFORMATION-MULTIPLE ACCESS
In this section, we provide a detailed description of the proposed PIMA protocol. Each frame is divided into two sub-frames, namely the PIA sub-frame and the DT sub-frame. The PIA sub-frame is used to estimate (at the BS) the number of currently active users. Based on this information, the BS decides the duration (in slots) of the DT sub-frame and assigns each user to one slot for uplink data transmission.

Fig. 1. Example of the PIMA protocol and its frame structure. In this example, users $n$, $m$, and $f$ are active at the beginning of frame $t$, each with a different number of packets to transmit (purple rectangles). The RB and the SB transmitted by the BS in the downlink are represented by the blue and orange rectangles, respectively, while the yellow arrow represents the AI signal for the user enumeration. For drawing simplicity, in this example no packets are generated after the beginning of frame $t$.

A. Partial Information Acquisition Sub-frame
The beginning of frame $t$, and thus of the PIA sub-frame, is triggered by the RB, which is transmitted in broadcast by the BS to all users. RBs mark the start of the PIA sub-frame and contain the acknowledgments of correctly received packets in the previous frame. Moreover, RBs provide each user with an accurate CSI of its channel to the BS, denoted as $g_n$. In the PIA sub-frame the BS obtains the estimate $\hat{\nu}(t)$ of the number of active users $\nu(t)$, as described in detail in Section V.
At the end of the PIA sub-frame, the BS, knowing $\hat{\nu}(t)$, schedules the transmissions for the next sub-frame. Let $\mathbf{q}(t) = [q_1(t), \ldots, q_N(t)]$ be the slot selection vector, collecting the slot indices assigned to each user; then, the length of the DT sub-frame $L_2(t)$ can be derived from $\mathbf{q}(t)$ as
$$L_2(t) = \max_{n} q_n(t). \tag{1}$$
To end the PIA sub-frame and trigger the beginning of the following DT sub-frame, the BS transmits the SB, which contains the slot selection vector $\mathbf{q}(t)$, encoded as described in Section IV-C. Note that, to maintain synchronization, inactive users could a) wake up and wait for the next downlink RB when generating a packet, or b) always wake up when RBs and SBs are transmitted (this can be achieved by collecting timing information in the beacons).

B. Data Transmission Sub-frame
In the DT sub-frame, users transmit their packets according to the scheduling set by the BS in the SBs. Note that packets generated by user $n$ during the DT sub-frame are delayed and transmitted in the following frame, to reduce collisions, since the DT sub-frame length is derived only based on the number of users active in the PIA sub-frame. Indeed, data transmissions of users who do not request resources in the PIA sub-frame would introduce uncertainty in the optimization of $q_n(t)$, limiting the ability of PIMA to adapt to instantaneous traffic conditions. Since all the derivations in the following are related to each frame separately, to simplify notation, we drop the frame index $t$ from all the variables.
An example of the frame structure and packet transmissions of PIMA is reported in Fig. 1.

C. PIMA Beacons Overhead
We consider two options for the coding of $\mathbf{q}$. The first option provides that all the users in the system receive the explicit indication of the allocated slot in the DT sub-frame. Consider a codebook of $N$ codewords (each of length $\log_2 N$ bits) representing all possible orderings of user indices, where if a user is in position $i_n$, its assigned slot is $k = (i_n \bmod L_2) + 1$. With this codebook, the SB introduces an overhead of $R_{\rm SB} = (N + 1) \log_2 N$ bits, since an additional codeword is needed to indicate the length of the DT sub-frame, $L_2$. Note that for a large number of users, this encoding strategy can significantly increase the length of the SB, deteriorating the performance of PIMA.
Therefore, we consider a second strategy, in which all users know a list of $J$ random sequences (each of $N \log_2 N$ bits), indicating the order in which the users will be served. In this case, the BS transmits in the SB only the index corresponding to the scheduling sequence and the codeword indicating $L_2$, providing an overhead of $\log_2 J + \log_2 N$ bits. In the following, we adopt this second strategy to reduce the SB overhead.
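To give a feel for the difference between the two encodings, the following Python sketch evaluates both overhead expressions and the position-to-slot mapping. The function names are illustrative, rounding each logarithm up to an integer number of bits is an assumption made here (the expressions above are stated without rounding), and positions are assumed to be counted from zero.

```python
import math


def sb_overhead_explicit(N):
    """Bits for the first strategy: (N + 1) codewords of log2(N) bits each."""
    bits_per_codeword = math.ceil(math.log2(N))  # rounding up is an assumption
    return (N + 1) * bits_per_codeword


def sb_overhead_indexed(N, J):
    """Bits for the second strategy: sequence index plus the L2 codeword."""
    return math.ceil(math.log2(J)) + math.ceil(math.log2(N))


def slots_from_order(order, L2):
    """Map an ordering of users to slots via k = (i_n mod L2) + 1.

    order[i] is the user placed at position i (0-based, an assumption).
    """
    return {user: (position % L2) + 1 for position, user in enumerate(order)}


explicit_bits = sb_overhead_explicit(50)
indexed_bits = sb_overhead_indexed(50, 64)
assignment = slots_from_order([3, 1, 2, 0], L2=2)
```

With $N = 50$ and $J = 64$, the explicit encoding costs 306 bits under these rounding assumptions, against 12 bits for the indexed one.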

V. ESTIMATION OF THE NUMBER OF ACTIVE USERS
To obtain an estimate of the number of active users at the BS, each active user transmits an AI signal immediately after receiving the RB. Note that we neglect here the propagation time between the BS and the user, which can be easily accommodated by considering a transition (silent) time between the RB and AI transmissions. The set of users transmitting the AI signals during the PIA sub-frame is
$$\mathcal{N}_a = \{n : B_n > 0\}, \tag{2}$$
with $|\mathcal{N}_a| = \nu$. We stress that the BS does not know the identity of the active users, since the AI signals do not contain such information, to make them shorter. In particular, we assume that each user transmits a single complex symbol $\gamma_n$ in the PIA sub-frame. Assuming perfect synchronization, the received signal at the BS is the superposition of all the symbols transmitted by the users, i.e.,
$$y = \sum_{n \in \mathcal{N}_a} h_n \gamma_n + w, \tag{3}$$
where $h_n$ is the channel coefficient between user $n$ and the BS and $w$ is the additive white Gaussian noise (AWGN) term with zero mean and variance $\sigma_w^2$. Assuming that perfect CSI is obtained by users through the RB downlink transmission, each user $n$ perfectly inverts the channel, setting $\gamma_n = 1/h_n$; therefore, the BS receives
$$y = \nu + w. \tag{4}$$
The MAP estimate of the number of active users is then
$$\hat{\nu} = \arg\max_{b} \; p_{y|\nu}(y|b)\, p_\nu(b), \tag{5}$$
where $p_\nu(b) = P(\nu = b)$ and, for the AWGN channel, the probability density function (PDF) $p_{y|\nu}(y|b)$ of the received signal conditioned on the number of active users is Gaussian with mean $b$ and variance $\sigma_w^2$, as expressed in (6). From (5), the optimal decision regions are described through the offsets $\delta_b$ and satisfy the boundary condition (7), since $\delta_{b+1} = 1 - \delta_b$. Replacing (6) into (7), after some algebraic steps we obtain the expression (8) of $\delta_b$. The error probability is the probability of falling outside the correct decision region; thus, conditioned on the fact that $\nu = b$ users are active, it is given by (9) in terms of $Q(\cdot)$, the tail distribution function of the standard normal distribution.
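A minimal numerical sketch of the MAP counting rule is given below. It evaluates the posterior score of every candidate count directly instead of using closed-form decision thresholds. The function name `map_estimate` is hypothetical, and the sketch assumes circularly symmetric complex Gaussian noise of total variance `noise_var`, which fixes the exponent of the likelihood; a different noise normalization would change the scaling but not the structure of the rule.

```python
import math


def map_estimate(y, prior, noise_var):
    """MAP estimate of the number of active users from y = nu + w.

    y         : received (complex or real) symbol after channel inversion
    prior     : prior[b] = P(nu = b) for b = 0, ..., len(prior) - 1
    noise_var : variance of the noise w (complex Gaussian, by assumption)
    """
    best_b, best_score = 0, -math.inf
    for b, p in enumerate(prior):
        if p <= 0.0:
            continue  # counts with zero prior probability are never selected
        # Log-posterior up to a constant: log prior + Gaussian log-likelihood.
        score = math.log(p) - abs(y - b) ** 2 / noise_var
        if score > best_score:
            best_b, best_score = b, score
    return best_b


uniform = [1 / 6] * 6
estimate_uniform = map_estimate(2.2 + 0.1j, uniform, noise_var=0.1)
estimate_skewed = map_estimate(0.55, [0.9, 0.1], noise_var=1.0)
estimate_flat = map_estimate(0.55, [0.5, 0.5], noise_var=1.0)
```

The last two calls illustrate how the prior shifts the decision boundary: the same observation 0.55 is mapped to one active user under a flat prior, and to zero active users when the prior strongly favors an empty frame.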
I.i.d. Activations. In the case of i.i.d. activations, the activation process coincides with the packet generation process. Assuming that each user generates packets according to a temporal Poisson process with parameter $\lambda_n = \lambda$, $\forall n$, we obtain
$$p_\nu(b) = \binom{N}{b} \left(1 - e^{-\lambda T_a}\right)^{b} e^{-\lambda T_a (N - b)}, \tag{10}$$
where $T_a$ is the time interval considered for the packet generations and $\binom{N}{b}$ counts all the possibilities of having exactly $b$ active users. Note that for $N \to \infty$ we have $p_\nu(b) \approx p_\nu(b+1)$, thus $\delta_b \to \frac{1}{2}$ and all regions have the same size. In this asymptotic scenario, we also have (11).
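The prior used by the MAP estimator under i.i.d. activations can be tabulated as follows. This is a sketch under the assumption that a user is active when it generated at least one packet in the interval $T_a$, so that the number of active users is binomial; the function name `activation_pmf` is illustrative.

```python
import math


def activation_pmf(N, lam, T_a):
    """Probability of having b = 0, ..., N active users among N users.

    Each user is assumed active with probability 1 - exp(-lam * T_a),
    i.e., when its Poisson process produced at least one packet in T_a.
    """
    p_active = 1.0 - math.exp(-lam * T_a)
    return [
        math.comb(N, b) * p_active**b * (1.0 - p_active) ** (N - b)
        for b in range(N + 1)
    ]


pmf = activation_pmf(N=50, lam=0.01, T_a=10.0)
most_likely_count = max(range(len(pmf)), key=pmf.__getitem__)
```

The resulting list can be used directly as a table of $p_\nu(b)$ values in a MAP counting rule.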

VI. FRAME EFFICIENCY-BASED SCHEDULING
In this section, we propose a time-resource scheduling conditioned on the number of active users estimated in the PIA sub-frame.
First, we introduce a performance metric that takes into account both the packet latency and the collision probability, whose optimization aims at finding the right balance between the two. Let $l \in \{1, \ldots, L_2\}$ be the slot index within the frame (in the DT sub-frame). We define the success indicator function in slot $l$ as $c_l = 1$ if a successful transmission occurs in slot $l$ and $c_l = 0$ otherwise. Note that the latter case considers both the collision and non-transmission cases. Then, the conditional frame efficiency is defined as the ratio between the number of successes in the frame and the length of the DT sub-frame, i.e.,
$$\eta = \frac{1}{L_2} \sum_{l=1}^{L_2} c_l. \tag{12}$$
The adaptive maximization of this metric provides the proper balance between the DT sub-frame length and the successful transmission probability.
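As a small worked example of this metric, the sketch below computes the success indicators and the frame efficiency from the number of users that actually transmit in each slot. The function name and the list-based input are illustrative assumptions.

```python
def frame_efficiency(transmitters_per_slot):
    """Ratio between successful slots and the DT sub-frame length.

    transmitters_per_slot[l] is the number of users transmitting in the
    (l + 1)-th slot; a slot is successful only with exactly one transmitter.
    """
    L2 = len(transmitters_per_slot)
    if L2 == 0:
        raise ValueError("the DT sub-frame must contain at least one slot")
    # c_l = 1 for a success, c_l = 0 for both idle and collided slots.
    c = [1 if k == 1 else 0 for k in transmitters_per_slot]
    return sum(c) / L2


# Slots: success, idle, collision, success.
eta = frame_efficiency([1, 0, 2, 1])
```

Here two of the four slots carry a successful transmission, so the frame efficiency is 0.5; both the idle slot and the collided slot count as failures.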
At each frame, immediately after the end of the PIA sub-frame, the BS solves the optimization problem (13), i.e., it selects the slot selection vector $\mathbf{q}$ that maximizes the expected frame efficiency given the estimated number of active users. Problem (13) is a mixed-integer non-linear programming (MINLP) problem, and its solution quickly becomes infeasible with long queues or many users. For these reasons, in the following, we focus on the analysis of the i.i.d. activations scenario, designing the parameters of the PIMA scheduler based on its basic assumptions. Note that, while the i.i.d. activations scenario could substantially differ from the correlated activations one at high traffic, it still represents a good approximation in low traffic conditions, wherein retransmissions due to collisions occur sporadically. Moreover, the analysis of i.i.d. activations is useful when there is no information available on the transmission correlation at the BS or when obtaining such information is too expensive.

VII. SCHEDULING WITH I.I.D. ACTIVATIONS
First, we observe that, since activations of users are i.i.d., we only have to determine how many users are assigned to each slot, as any specific assignment satisfying this constraint will yield the same collision probabilities, thus the same expected frame efficiency.Note that in case of decoding failure of multiple packets, the user scheduling should be randomized to avoid the repetition of the same collisions.
To minimize the number of users assigned to the same slot, given a length $L_2$, we assign to slot $l$ the following number of users
$$u_l = \begin{cases} \lfloor N/L_2 \rfloor + 1, & l \le N \bmod L_2, \\ \lfloor N/L_2 \rfloor, & \text{otherwise}, \end{cases} \tag{14}$$
where we possibly schedule one more user in the first $N \bmod L_2$ slots to minimize the transmission delay. With this scheduling policy, the slot success random variable $c_l$ can be rewritten as a function of $u_l$, as it only depends on the number of users scheduled in slot $l$. Thus, the optimization problem (13) is reduced to the optimization of the DT sub-frame length, $L_2$, and from (12) we have
$$L_2^* = \arg\max_{L_2} \frac{1}{L_2} \sum_{l=1}^{L_2} P(c_l = 1 \mid \nu). \tag{15}$$
Now, given $\nu$, the probability that user $n$ is the one and only active user assigned to slot $l$ is derived by considering all cases of active users, where user $n$ is active and all other users assigned to slot $l$ are, instead, inactive. Consider the $\lceil N/L_2 \rceil \times L_2$ matrix, where column $l$ collects the user indices assigned to slot $l$. For slot $l$, assuming that an active user is assigned to slot $l$, the number of favorable cases is given by all the possibilities to place $\nu - 1$ users (the remaining active users) in all columns of the matrix indexed by $l' \neq l$. Excluding column $l$, the available positions (row and column index couples) are $N - u_l$. Therefore, the favorable cases are given by all the combinations of $\nu - 1$ active users taken from $N - u_l$ users. Instead, the total number of cases is given by all the combinations of $\nu$ active users chosen from the $N$ scheduled users. The probability of having a successful transmission in slot $l$ is therefore
$$P(c_l = 1 \mid \nu) = u_l \frac{\binom{N - u_l}{\nu - 1}}{\binom{N}{\nu}}, \tag{16}$$
where the factor $u_l$ counts the users assigned to slot $l$. Note that, in (16), the numerator counts the number of combinations giving exactly one active user assigned to slot $l$, while the denominator counts the total number of possible combinations of active users. The probability of collision in slot $l$, i.e., the probability that at least two active users are assigned to slot $l$, is therefore
$$1 - \frac{\binom{N - u_l}{\nu}}{\binom{N}{\nu}} - u_l \frac{\binom{N - u_l}{\nu - 1}}{\binom{N}{\nu}}. \tag{17}$$
The problem (15) is a MINLP problem, and its solution strictly depends on the number of users in the system. If the number of active users is comparable with $N$, (15) is not solvable by continuous relaxation of $L_2$, as the rounding functions are not differentiable. However, it is possible to find the optimal frame length $L_2^*$ with complexity $O(\log N)$, using a binary search algorithm, or alternatively using a discrete gradient ascent algorithm. In any case, $L_2^*$ depends only on $\nu$, thus it can be computed offline and then stored in a table. Instead, if $N \to \infty$ and $\nu \ll N$ is finite, the following result holds.

Theorem 1. For $N \to \infty$, under i.i.d. activations, given a finite number of active users $\nu$, the DT sub-frame length $L_2$ maximizing frame efficiency is exactly $\nu$.
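Proof. Since $N \to \infty$ and $1 < \nu \ll N$, $u_l = N/L_2$ for all $l = 1, \ldots, L_2$. From (16) we have
$$P(c_l = 1 \mid \nu) = \frac{N}{L_2} \frac{\binom{N - N/L_2}{\nu - 1}}{\binom{N}{\nu}};$$
then, using the Stirling factorial approximation $\alpha! \approx \sqrt{2\pi\alpha}\left(\frac{\alpha}{e}\right)^{\alpha}$, whose validity is verified with good accuracy even for small values of $\alpha$, with some algebraic steps and taking the limit for $N \to \infty$ we have
$$\lim_{N \to \infty} P(c_l = 1 \mid \nu) = \frac{\nu}{L_2} \left(1 - \frac{1}{L_2}\right)^{\nu - 1}.$$
Since this limit is the same for all slots, the maximum frame efficiency only depends on $\nu$ and $L_2$, and it is given by
$$\eta = \frac{\nu}{L_2} \left(1 - \frac{1}{L_2}\right)^{\nu - 1}.$$
Its stationary points are derived from the first-order derivative with respect to $L_2$ as
$$\frac{\partial \eta}{\partial L_2} = \frac{\nu (\nu - L_2)}{L_2^3} \left(1 - \frac{1}{L_2}\right)^{\nu - 2} = 0;$$
thus, while $L_2 = 1$ is trivially the global minimum for $\nu > 1$, the frame efficiency is maximized for $L_2 = \nu$.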
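The scheduling rule of this section can be checked numerically with the following Python sketch. It implements the balanced user-to-slot split, the combinatorial success probability, and the resulting expected frame efficiency, and selects the frame length by exhaustive search over $L_2 \in \{1, \ldots, N\}$. Exhaustive search is used here only as a simple stand-in for the binary search or discrete gradient ascent mentioned in the text, and all function names are illustrative.

```python
import math


def users_per_slot(N, L2):
    """Balanced split: the first (N mod L2) slots get one extra user."""
    base, extra = divmod(N, L2)
    return [base + 1 if l < extra else base for l in range(L2)]


def success_probability(N, nu, u):
    """P(exactly one of the nu active users sits in a slot with u users)."""
    if nu < 1 or nu > N:
        return 0.0
    return u * math.comb(N - u, nu - 1) / math.comb(N, nu)


def expected_efficiency(N, nu, L2):
    """Expected number of successful slots divided by the frame length."""
    slots = users_per_slot(N, L2)
    return sum(success_probability(N, nu, u) for u in slots) / L2


def best_frame_length(N, nu):
    """Exhaustive search for the frame length maximizing the efficiency."""
    return max(range(1, N + 1), key=lambda L2: expected_efficiency(N, nu, L2))


def asymptotic_efficiency(nu, L2):
    """Large-N limit of the expected frame efficiency."""
    return (nu / L2) * (1.0 - 1.0 / L2) ** (nu - 1)


split = users_per_slot(50, 4)
best_asymptotic = max(range(1, 51), key=lambda L2: asymptotic_efficiency(5, L2))
```

Since the optimal length depends only on the number of active users, a table of `best_frame_length(N, nu)` for every `nu` could be precomputed offline, in line with the remark in the text. For five active users, the asymptotic expression is maximized by a frame of five slots, consistent with Theorem 1.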

A. On the Optimality of the Single Slot Allocation
Throughout the paper, we have assumed that each user $n$ is assigned to a single slot $q_n$ in the frame. However, we may wonder if this scheduling policy is optimal or if it is preferable to assign multiple slots to each user. Focusing on the case of i.i.d. activations, we have the following result.
Theorem 2. Under i.i.d. activations, the assignment of a single slot to each user in the DT sub-frame is optimal, i.e., it maximizes the expected frame efficiency.
Proof. Since the collision probability is the same for all users and depends only on the number of other users transmitting in the same slot, by allocating more slots to each user, we increase the collision probability. Hence, single-slot scheduling is optimal in this case.

VIII. NUMERICAL RESULTS
In this section, we present the numerical results, comparing our PIMA protocol with the state-of-the-art OMA schedulers in a) the i.i.d. activation scenario, b) the correlated activation scenario, both with $N = 50$ users, and c) the bursty activation scenario, with a large $N$. Following the assumption of short packet transmission, the number of symbols in each packet (slot duration) is $T_s = 10$ unit symbol durations (usd).¹ Furthermore, a constant signal-to-noise ratio (SNR) of 10 dB is assumed at the receiver. Note that while the analysis presented in Section VI assumes a perfect estimation of the number of active users, the results shown below are obtained using the estimated $\hat{\nu}$.
For performance comparison, we consider a) the standard TDMA, which provides fixed-duration frames of $N$ slots, with one user assigned per slot deterministically, b) a stabilized version of the SALOHA protocol, and c) the CRA-2 protocol of [24] with preambles of length $M_p = N/2$ and $M_p = N$.
Stabilized Slotted ALOHA. For the SALOHA protocol, we consider Rivest's stabilized SALOHA [30, Chapter 4], [31], where all users generating packets at slot $l$ are backlogged with the same probability of backlog. The backoff probability is computed at each user through a pseudo-Bayesian algorithm based on an estimate $G(l)$ of the number of backlogged nodes, which is updated at every slot starting from $G(0) = 0$ and uses the packet generation probability at slot $l$, $\theta = 1 - e^{-\lambda}$.
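For orientation, the sketch below implements the pseudo-Bayesian stabilization rule in its common textbook form: the backlog estimate decreases by one after an idle or successful slot, increases by $1/(e-2)$ after a collision, and backlogged packets are transmitted with probability $\min(1, 1/G)$. This is an assumption about the general algorithm, not a transcription of the exact expressions used in this paper; in particular, the `arrival_rate` argument (expected new packets per slot) and the function names are hypothetical.

```python
import math

# Increment applied to the backlog estimate after a collision.
COLLISION_INCREMENT = 1.0 / (math.e - 2.0)


def transmit_probability(G):
    """Per-slot transmission probability for a backlogged packet."""
    return 1.0 if G <= 1.0 else 1.0 / G


def update_backlog_estimate(G, feedback, arrival_rate):
    """Update the backlog estimate from the ternary slot feedback.

    feedback     : "idle", "success", or "collision"
    arrival_rate : expected number of new packets per slot (assumption)
    """
    if feedback == "collision":
        return G + arrival_rate + COLLISION_INCREMENT
    if feedback in ("idle", "success"):
        return max(arrival_rate, G + arrival_rate - 1.0)
    raise ValueError(f"unknown feedback: {feedback!r}")


G = 0.0  # initial estimate, matching G(0) = 0
for outcome in ("collision", "collision", "success", "idle"):
    G = update_backlog_estimate(G, outcome, arrival_rate=0.3)
```

The estimate grows quickly after collisions, which lowers the transmission probability, and shrinks again as slots become idle or successful.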
Modified CRA-2 Protocol. The CRA-2 protocol was proposed in [24]. Similarly to PIMA, each frame includes two sub-frames, and in the first sub-frame active users are identified with a temporary identifier, rather than just counted. To this end, each active user randomly chooses and transmits a sequence of complex symbols of length $M_p$ (preamble), from a preamble pool known to both users and BS. The BS receives all preambles simultaneously and detects them. Preambles are used as temporary identifiers for the active users, and the BS schedules the data transmission in the second sub-frame by allocating one slot for each detected preamble. Here, we assume that preambles are orthogonal; therefore, the number of preambles equals their length $M_p$. When $M_p = N$, each user is uniquely assigned to a preamble, while for $M_p = N/2$, active users choose their preamble from a pool of $M_p$ orthogonal preambles uniformly at random. In the latter case, if two or more users transmit the same preamble (and the BS detects it), they are assigned the same slot and collide. For both schemes, we consider a probability of misdetection of the preamble in the first sub-frame of $P_{\rm md} = 0.1$. Note that in [24] preambles are assumed to be non-orthogonal, and a CS-based algorithm is applied. However, preamble detection in the presence of noise is not considered and a fixed misdetection probability is assumed. The impact on system performance of different CS algorithms has been discussed for the case of multiple measurements (e.g., multiple antennas at the BS) in [32]. Although on one hand CS-based detection allows increasing the number of preambles and reducing the probability of collision, on the other hand, it also produces a high probability of misdetection when the BS is equipped with a single antenna and a large number of users are active [33].

¹ The usd is the inverse of the bandwidth if the Nyquist sampling rate is used. All the lengths of the sequences, slots, and beacons are given in usd.
Comparing PIMA with CRA-2, we note two main points: a) the first sub-frame is shorter for PIMA than for CRA-2, and b) the second sub-frame has the same length for both approaches. Thus, on the one hand, the overall PIMA frame is shorter, which reduces the average number of packets generated in each frame and, in turn, the average number of packets accumulated in the buffers before transmission in the next frame. On the other hand, in PIMA users may collide in the second sub-frame due to a non-orthogonal allocation, which increases (with respect to CRA-2) the number of packets to be re-transmitted in the next frame (thus increasing the average number of users with buffered packets). Lastly, a wrong counting of users in the PIA sub-frame or a wrong identification in the first CRA-2 sub-frame increases the collision probability of the respective scheme. The numerical results presented in this section compare the performance of both schemes, taking into account all of these effects.
Overhead Comparison. Since the acknowledgment overhead is neglected, neither TDMA nor SALOHA entails any overhead. For both PIMA and the preamble-based solutions, we assume that a 64-QAM modulation is used to modulate the SB, and the size of the list of scheduling sequences is set to J = 64. Under these assumptions, the total overhead induced by the PIA sub-frame is constant and equal to L_1 = 3 QAM symbols. Instead, for preamble-based approaches, the overhead (in symbols) is given by the preamble with length M_p and the BS feedback, which is typically longer than the SB of PIMA. In particular, assuming that ν preambles are detected by the BS, the identifiers of these preambles are fed back in broadcast, in the order of slot allocation (for the subsequent data transmission), providing a total overhead of M_p + ν.
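The two overhead expressions can be compared directly, as in the short sketch below. It only encodes the quantities stated in this paragraph (a constant L_1 = 3 symbols for PIMA and M_p + ν symbols for the preamble-based schemes); the function and constant names are illustrative assumptions.

```python
PIMA_OVERHEAD_SYMBOLS = 3  # L_1, as stated in the text for J = 64 and 64-QAM


def cra2_overhead(m_p, nu_detected):
    """Preamble length plus one fed-back identifier per detected preamble."""
    return m_p + nu_detected


def overhead_saving(m_p, nu_detected):
    """Symbols saved per frame by PIMA with respect to a preamble-based scheme."""
    return cra2_overhead(m_p, nu_detected) - PIMA_OVERHEAD_SYMBOLS
```

With N = 50 users and ν = 5 detected preambles, the preamble-based overhead is 55 symbols for M_p = N and 30 symbols for M_p = N/2, against 3 symbols for PIMA.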
The overhead of the RB is neglected and does not play any role in the performance comparison, as its length is comparable for all the considered schedulers.
Performance Metrics. For the i.i.d. and correlated activation cases, performance is assessed in terms of both average frame efficiency η and average latency D. The former metric is the average frame efficiency conditioned on ν > 0, obtained by averaging over all the frames with at least one estimated active user. Instead, in the latter metric the average is computed among all successfully delivered packets of user n. Moreover, in the case of i.i.d. activations, we also consider the packet dropping probability P_drop, counting the packets dropped due to both collisions and replacements in the unit-length buffers when generations occur. This probability is identically zero, for all the compared schemes, in the correlated activation case, due to the possible retransmissions and the infinite-length queues.
Finally, for bursty activations, the performance of the system is evaluated in terms of the burst transmission time D_B, i.e., the time needed to transmit all the packets generated in a traffic burst.

A. I.I.D. Activations
We first report and discuss the results obtained for i.i.d. user activity and N = 50 users. In this activation scenario, retransmissions are not allowed. Therefore, SALOHA does not include backlogging, and each user attempts the transmission immediately upon the packet generation, i.e., α(l) = 1, ∀l.
A comparison of the average frame efficiency achieved by each of the schemes is shown in Fig. 2, as a function of the total packet generation rate Λ. While this metric cannot be defined for SALOHA, as it does not divide time into frames, we observe that TDMA, which adopts a constant frame length, provides a very low frame efficiency. Moreover, since only the frames wherein ν > 0 are included in the average, PIMA achieves a low frame efficiency at extremely low traffic due to the noise affecting the estimation of ν. Indeed, with sporadic activity, very few frames see active users, and, in many of these rare cases, we have ν > 0 due to the noise contribution. As the traffic intensity increases, instead, PIMA achieves the highest frame efficiency, outperforming both TDMA and the preamble-based schemes, whose overhead severely affects performance.
Figs. 3 and 4 show the effect of the packet generation rate on the average latency and packet dropping probability, respectively. With TDMA, all packets generated during a frame transmission wait, on average, N/2 slots in low-traffic conditions; therefore, TDMA shows the highest latency. Still, its latency decreases as the traffic increases, since the buffering delay is reduced by the new packets replacing the older ones in the queue. However, the dropping probability increases up to over 0.1 in high-traffic conditions. Instead, the SALOHA scheduler attains the lowest latency in this scenario, transmitting all packets immediately upon generation. It thus provides a lower bound on the latency, as all colliding packets are discarded and do not contribute to the evaluation. However, such dropped packets increase the dropping probability, which approaches 1 for large values of Λ. The CRA-2 scheduler with N preambles, instead, is collision-free, and it drops a reduced number of buffered packets due to its shorter DT sub-frame with respect to TDMA. Indeed, CRA-2 achieves the lowest P_drop among the considered approaches: this improvement comes at the cost of higher latency, due to the longer time needed for the first sub-frame. Lastly, while the already mentioned schemes drop packets due either to collisions (SALOHA) or to new packet generations (TDMA, CRA-2 with M_p = N), PIMA and CRA-2 with M_p = N/2 attempt to merge the advantages of the aforementioned solutions. On the one hand, PIMA yields a higher collision probability than the preamble collision probability of CRA-2. On the other hand, collisions are compensated by a reduced dropping probability of newly generated packets, thanks to a shorter first sub-frame. Thus, while CRA-2 achieves a considerably lower dropping probability than PIMA, its latency is higher than that of PIMA, which guarantees close-to-minimum latency.

B. Correlated Activations
We now assess the performance in the general case of correlated user activations, assuming infinite buffer lengths and an infinite number of (re)transmission attempts for each packet. We consider a total number of N = 50 users, under a traffic intensity guaranteeing queue stability, as obtained by simulations. Performance results are shown as a function of the traffic intensity 0.01 ≤ Λ ≤ 3.5, with Λ = λ_T T_s.
First, Fig. 5 shows the average frame efficiency η as a function of the total packet generation rate. The performance is almost the same as in the i.i.d. case. This is mostly due to the fact that, in stability conditions, few packets are retransmitted multiple times. While all the observations on Fig. 2 still hold, we observe a very slight degradation of PIMA, as it is designed for i.i.d. activation statistics and does not take into account previous collisions.
Second, Fig. 6 shows the average packet latency as a function of the packet generation rate for N = 50 users. Again, the latency of TDMA is much higher than that of the other schemes considered, due to its frame length of N (maximum). We also observe that, at low traffic, the PIMA scheduler achieves extremely low latency, comparable to the minimum latency provided by SALOHA. In higher traffic conditions, instead, the SALOHA backlogging mechanism prevents users from transmitting their buffered packets immediately, thus increasing the average latency. This effect is mitigated in PIMA, whose latency is lower due to its better ability to adapt to the instantaneous traffic load. For the preamble-based approach, instead, we observe that overhead plays a crucial role in the overall latency, and shorter preambles yield better performance, while PIMA still outperforms CRA-2.

C. Bursty Activations
We now investigate the performance in the bursty activations scenario. We first assess the performance obtained for a single burst of intensity Λ_B, and then discuss some constraints on the burst interarrival time τ_B. As discussed in Section III-B, here we assume that a random number of active users ν, out of an arbitrarily large number of users in the system N, generate a single short packet to be transmitted in the following time slots. As we consider an arbitrarily large N, the length of the DT sub-frame in PIMA is here derived according to Theorem 1.
The number of packet generations in a burst follows a Poisson distribution with mean Λ_B in the range [10, 10000]; thus, the average number of packet generations is Λ_B. For comparison purposes, we consider CRA-2 with fixed preamble lengths M_p = 1000 and M_p = 10000. We also consider an ideal solution, wherein the length of the preamble is adapted to the average number of packet generations, that is, M_p = Λ_B. Note that this ideal solution is hardly implementable, as it requires different preamble pools based on the traffic generation rate. Moreover, we consider neither SALOHA, as all the packets generated simultaneously collide with probability 1, nor TDMA, as a large N yields a very long frame. Note that a longer list of random sequences (that is, a longer overhead) for the scheduling vector encoding (Section IV-C) is necessary as the number of users in the system increases. In the following, we still consider a PIA sub-frame overhead of L_1 = 3 symbols, which can be easily accommodated by adopting a higher modulation order at the BS for SB transmission.
Fig. 7 shows the average burst transmission time as a function of the average number of packet generations. First, we observe a constant gap between PIMA and the ideal preamble-based solution with M_p = Λ_B, while fixed-length preambles achieve better performance when the number of active users is comparable to the preamble length. In particular, the CRA-2 solution with M_p = 1000 preambles achieves a lower burst transmission time than PIMA only for 400 ≤ Λ_B ≤ 2000, while at least Λ_B = 4000 is needed when M_p = 10000. For a low average number of packet generations, PIMA achieves the best performance due to its reduced overhead. For a high average number of packet generations, both fixed-length preamble approaches suffer from preamble collisions, which imply a much higher packet transmission time due to retransmissions. Therefore, while PIMA is able to adapt to all traffic conditions without any change in the PIA sub-frame, preamble-based approaches should adopt different preamble pools depending on the traffic intensity in order to achieve good performance.
Finally, the ECCDF of the burst transmission time is shown in Fig. 8 for Λ_B = 600 and Λ_B = 3000. The results confirm the comments on Fig. 7, with the ideal case M_p = Λ_B always attaining the lowest transmission time and the fixed-preamble approaches outperforming PIMA only for a preamble length comparable to the number of packet generations. Note that this last figure gives an indication of the minimum burst interarrival time τ_B. In particular, τ_B must be large enough to minimize the probability of overlap between two traffic bursts, i.e., the probability of having new arrivals while there are still previously generated packets to be transmitted. By setting a threshold on the overlap probability, the minimum τ_B satisfying such a threshold can be easily retrieved from the ECCDF. For example, from Fig. 8, the probability of overlap of PIMA is 10^−2 if τ_B = D_B ≈ 19·10^3 usd for Λ_B = 600, and if τ_B ≈ 85·10^3 usd for Λ_B = 3000. In this comparison, PIMA is shown to be effective in minimizing the burst transmission time, being very close to the CRA-2 solutions with M_p = 1000 and M_p = Λ_B in the low-traffic scenario, while outperforming both fixed-length preamble solutions for Λ_B = 3000.
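The threshold-based choice of τ_B can be sketched as follows: given a set of simulated burst transmission times, the minimum interarrival time is the smallest observed value t whose empirical probability P(D_B > t) does not exceed the target overlap probability. The function names and the sample values used in the usage note are hypothetical and serve only as an illustration; they are not taken from the reported results.

```python
from bisect import bisect_right


def eccdf(sorted_samples, t):
    """Empirical P(D_B > t) from an ascending list of samples."""
    n = len(sorted_samples)
    return (n - bisect_right(sorted_samples, t)) / n


def min_interarrival_time(samples, max_overlap_prob):
    """Smallest observed t with empirical P(D_B > t) <= max_overlap_prob."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 <= max_overlap_prob <= 1.0:
        raise ValueError("max_overlap_prob must lie in [0, 1]")
    ordered = sorted(samples)
    for t in ordered:
        if eccdf(ordered, t) <= max_overlap_prob:
            return t
    # Unreachable for valid thresholds (the largest sample has ECCDF 0);
    # kept as a safeguard.
    return ordered[-1]
```

As a toy usage, for the ten hypothetical samples 10, 20, ..., 100 and a threshold of 0.1, the function returns 90, since exactly one sample out of ten exceeds that value.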

IX. CONCLUSIONS
We have proposed the PIMA protocol, a semi-GF coordinated multiple access scheme for short packet transmission, based on the knowledge of the number of users that have packets to transmit. PIMA organizes time into frames, and each frame includes a preliminary phase (the PIA sub-frame), where the BS estimates the number of active users, and a second phase (the DT sub-frame), wherein the actual data transmissions are carried out. We have derived the optimal scheduling in the case of i.i.d. activations and assessed its performance for different user activation statistics. The numerical results obtained in such scenarios show that PIMA achieves extremely low latency compared with state-of-the-art orthogonal multiple access solutions, due to its low overhead, and adapts to different activation conditions by exploiting the partial knowledge of the instantaneous traffic load.

Figure 1. Example of the PIMA protocol and its frame structure. In this example, users n, m, and f are active at the beginning of frame t, each with a different number of packets to transmit (purple rectangles). The RB and the SB transmitted by the BS in the downlink are represented by the blue and orange rectangles, respectively, while the yellow arrow represents the AI signal for the user enumeration. For drawing simplicity, in this example no packets are generated after the beginning of frame t.
This criterion establishes decision regions on the received signal. Define δ_b as the distance between b (the value obtained without noise) and the region associated with the decision b − 1. Then, when y falls in the region [b − δ_b, b + 1 − δ_{b+1}], the decision on the number of active devices is b. Since the distance between b and b + 1 is 1, the distance between b (the value obtained without noise) and the region associated with the decision b + 1 is 1 − δ_{b+1}. According to the MAP criterion (

Figure 2. Average frame efficiency versus the total packet generation rate for N = 50 and i.i.d. activations.

Figure 4. Average packet dropping probability versus the total packet generation rate for N = 50 and i.i.d. activations.

Figure 5. Average frame efficiency versus the total packet generation rate for N = 50 and correlated activations.

Figure 6. Average packet latency versus the total packet generation rate for N = 50 and correlated activations.