GFDM-Based Asynchronous Grant-Free Multiple-Access

For next-generation Internet-of-Things (IoT) networks, asynchronous instant transmission has attracted increasing research interest with the expectation of achieving near-zero latency without excessive initiation procedures. However, in an asynchronous multiple-access scenario, there exists significant inter-carrier interference between sub-carriers allocated to different users. To suppress the out-of-band emission (OOBE) of each sub-carrier, a new waveform, generalized frequency division multiplexing (GFDM), has been proposed, which has lower OOBE than the conventional orthogonal frequency division multiplexing (OFDM). In this paper, by using GFDM, two types of receivers are proposed with the aim of reducing latency and improving throughput: a GFDM-based minimum mean square error (MMSE) receiver and a GFDM-based MMSE-successive interference cancellation (SIC) receiver. Then, we develop a lightweight scheme using an $\epsilon$-conservative rate control with GFDM-based MMSE receivers and also devise a performance-focused scheme using an advanced rate control with GFDM-based MMSE-SIC receivers. In particular, the latter scheme provides higher throughput with a limited increase in the computational load of user equipments. Numerical results show that, with a successful transmission probability higher than 99 %, the performance-focused scheme and the lightweight scheme achieve up to 85 % and up to 70 % higher throughput, respectively, compared to the conventional OFDM-based asynchronous multiple-access scheme. Furthermore, since our proposal does not require any centralized user scheduling or initiation procedure, it presents a significant reduction in latency compared to the existing low-latency technologies.

In conventional wireless communication standards such as 3GPP LTE and IEEE WLAN, one aspect is emphasized much more than the other.
To minimize the latency of unicast communication, 3GPP has developed ultra reliable low latency communication (URLLC) technologies, striving to achieve an end-to-end delay of 1 ms [4]-[6]. Therefore, the URLLC technologies are suitable for fast and reliable transmissions of short control packets in remote medical care, smart factories, and autonomous vehicles. In this regard, 3GPP has included the pre-scheduling, semi-persistent scheduling (SPS), contention-based physical uplink shared channel (CB-PUSCH), and short transmission time interval (sTTI) techniques to meet the requirements of the URLLC [7]-[10]. However, since all of these candidate technologies are based on synchronous multiple-access, there is a limit to reducing the latency to almost zero.
The Wi-Fi standardization group is seeking to increase throughput and reduce latency to some extent by applying the uplink (UL) multi-user multiple-input multiple-output (MU-MIMO) technology in its latest standard, 802.11ax [11], [12]. However, the MU-MIMO technology adopted for 802.11ax requires a long initialization process to synchronize between users and an AP, and to select users to transmit. Furthermore, the latency increases exponentially as the number of users grows due to increased collision probability.
To dramatically reduce latency, grant-free (GF) multiple-access techniques have been proposed, the key idea of which is to completely eliminate the time required for user selection and resource allocation [13]. Furthermore, in order to serve massive connections at ultra low latency, this technique has been applied in various fields of wireless communication such as the power domain non-orthogonal multiple-access [14], sparse code multiple-access [15], K-repetition with slotted ALOHA [16], and MIMO [17].
However, these previous studies still assume a perfectly synchronous scenario or a scenario where the time offset at each user does not exceed a cyclic prefix (CP) length. Since GF multiple-access methods basically do not include UL control channels, it is infeasible to assume the perfect time synchronization between all user equipments (UEs).
To circumvent the restriction of the synchronization, GF asynchronous multiple-access (GF-AMA) schemes are proposed with the orthogonal frequency division multiplexing (OFDM) waveform [18]-[20], and developed with the code division multiple-access (CDMA) [21]. Indeed, standardization groups such as 5GNOW, METIS, and 3GPP have been conducting feasibility studies to apply the GF-AMA technology to next-generation mobile communication networks [22]-[24].
However, we address an inherent but critical problem in GF-AMA, which we refer to as the rate mismatch problem. This problem occurs because any user can start its transmission to a base station (BS) even if the BS is already receiving a signal from another user. This asynchronous multiple-access breaks the orthogonality between the sub-carriers allocated to different users, which gives rise to significant inter-user interference. Due to this broken orthogonality, even if a packet starts to be transmitted when there is no inter-user interference, part of the packet is severely contaminated if another user starts to transmit a new packet while the first packet is still being transmitted. Because of this contamination, the signal-to-interference-plus-noise ratio (SINR) at the time of the packet's creation differs from the SINR at the time when it is received at the BS. This difference causes the rate mismatch problem, which eventually results in a transmission failure.
In addition, the use of the OFDM waveform optimized for synchronous communication makes it more difficult to realize GF-AMA [25]. Specifically, the OFDM waveform generally generates high out-of-band emission (OOBE) [26], and such high OOBE leads to significant inter-user interference in asynchronous multiple-access scenarios [27], [28]. Hence, to achieve low OOBE, 3GPP has continued studying new waveforms such as filter bank multi-carrier (FBMC), universal filtered multi-carrier (UFMC), and generalized frequency division multiplexing (GFDM) [29]-[31].
GFDM has recently attracted research interest [31]-[35] because of its low OOBE and low peak-to-average power ratio [31]. This waveform can construct symbols flexibly in both the time and frequency domains with higher CP efficiency than OFDM. In [31], for a white-space radio band, the authors analyzed the overall performance of GFDM, such as OOBE and symbol error rate, according to the circular filter used. To overcome the high Doppler effect in vehicle-to-everything communication, the work [36] studied GFDM with orthogonal time frequency space (OTFS). On the other hand, the authors of [33] analyzed a GFDM-based asynchronous multiple-access scheme, yet the analysis is restricted to the power spectral density derivation in a saturated traffic scenario. Some literature proposed linear GFDM receiver filters to mitigate performance degradation due to the time and frequency offset in the single-user [34] and multi-user scenarios [35].
In this paper, we propose two different GFDM-based transceiver schemes with three goals: to i) resolve the rate mismatch problem in the GF-AMA scenario, ii) achieve low latency, and iii) obtain high throughput. Our contributions are summarized as follows:
• We first propose a lightweight scheme that resolves the rate mismatch problem with minimized computational load at each UE. Specifically, a UE employs an $\epsilon$-conservative rate control to tolerate unpredictable interference caused by other UEs' future transmissions. We also develop a low-complexity GFDM-based MMSE receiver at the BS.
• With a limited increase in UEs' computational load and additional side information on the start times of ongoing packets, we develop a performance-focused scheme to improve the throughput performance and increase the successful transmission probability. We derive an estimate of the SINR degradation due to asynchronous GF transmissions, based on which we develop an advanced rate control scheme at each UE. Unlike the lightweight scheme, the performance-focused scheme employs different rate controls for different sub-carriers. To enhance the throughput of the receiver, we also propose a GFDM-based MMSE-SIC receiver at the BS.
Although the work [35] proposed a GFDM-based linear-type receiver in the same GF-AMA scenario, the goal of that scheme is to improve each user's SINR in the presence of time and frequency offsets. On the other hand, the focus of this paper is to resolve the rate mismatch problem, which is also one of the most critical problems in the GF-AMA scenario but has not been considered in the literature. Furthermore, the computational complexity of the linear-type receiver proposed in [35] becomes prohibitive as the total bandwidth increases, whereas the computational complexity of our proposed MMSE receivers is feasible even with the additional SIC process. To the best of the authors' knowledge, this is the first study that explicitly analyzes any GFDM-based GF-AMA transceiver scheme from both the throughput and latency perspectives.
For evaluation of the proposed schemes, we implement various existing synchronous and asynchronous schemes. Extensive simulations show that the proposed schemes allow users to avoid the rate mismatch problem with high successful transmission probability, which leads to high throughput and low latency. Furthermore, even in a massive user scenario composed of hundreds of users but with highly limited available bandwidth, both proposed schemes achieve latency lower than 1 ms and attain higher throughput than the existing schemes.

II. PRELIMINARY
This section introduces recent synchronous multiple-access techniques for low latency and points out their limitations. Then, we explain asynchronous multiple-access as a promising solution and discuss its benefits and challenges.

A. LATENCY OF SYNCHRONOUS COMMUNICATION

1) DRAWBACKS OF THE CONVENTIONAL HANDSHAKE ACCESS
As of the 3GPP release 15 [7], for low latency, 3GPP added an option to use an sTTI lower than 1 ms, which was the lowest TTI value in previous releases. Even with the sTTI, the conventional handshake-based multiple-access still requires three steps between a BS and a UE for the UE to transmit an uplink signal: i) a scheduling request (SR) is transmitted via the physical random access channel (PRACH) or physical uplink control channel (PUCCH) from the UE, ii) a scheduling grant (SG) is awarded by the BS to the UE via the physical downlink control channel (PDCCH), and iii) an uplink signal is transmitted by the UE via the physical uplink shared channel (PUSCH).
Hence, the handshake procedure leads to two drawbacks. First, according to [8], since each step of the handshake procedure has its own delay, the access delay cannot be lower than 3 × TTI even without any additional delay due to signal processing and decoding procedures. Second, as the number of UEs increases, an excessive amount of PRACH or PUCCH resources is needed. As a result, throughput is degraded due to the lack of PUSCH resources.

2) LIMITATION OF CONTENTION-BASED ACCESS
To increase the wireless resource utility of the conventional pre-scheduling method, a contention-based (CB) multiple-access method has been proposed in the 3GPP release 15 [7]. In this method, several UEs are pre-scheduled at the same resource block (RB). However, if two or more UEs concurrently start their transmissions in the same RB, all the transmissions fail. As the number of UEs increases, the collision probability becomes higher, resulting in reduced resource utility.
Fig. 1 illustrates an example of two UEs' asynchronous multiple-access. As seen from the figure, UE2 instantly transmits its UL signal right after the packet to transmit is generated, without the grant from the BS and even without waiting for the start of the next OFDM symbol; that is, UE2 transmits without symbol synchronization. In this way, the access delay can be minimized, and the theoretical minimum delay can be near-zero. However, there are two main challenges in this GF asynchronous transmission approach: i) broken orthogonality between asynchronous sub-carriers and ii) the rate mismatch problem.

1) BROKEN ORTHOGONALITY BETWEEN ASYNCHRONOUS SUB-CARRIERS
Suppose that UE1 and UE2 transmit on neighboring sub-carriers, where UE1 uses sub-carrier 1, and UE2 uses sub-carriers 2 and 3 as in Fig. 1(a). The symbol misalignment between UE1's symbol and UE2's symbol is depicted in the figure. In UE2's receiver window in Fig. 1(a), UE1's signal can act as interference even though the two UEs use different sub-carriers. Fig. 2(a) shows the power spectral density (PSD) of UE1's signal in UE2's receiver window when the symbol misalignment is zero. In this perfectly synchronous case, the OOBE from sub-carrier 1 due to UE1's signal becomes 0 at sub-carrier 2 for both GFDM and OFDM modulations. On the other hand, Fig. 2(b) shows the PSD of UE1's signal in UE2's receiver window, where the PSD is averaged over all possible symbol misalignments. In the case of OFDM, the interference from UE1's signal is severe even at sub-carrier 3, but GFDM lowers the OOBE interference by 11 dB. Owing to this low-OOBE advantage, GFDM is much more suitable for asynchronous multiple-access than OFDM.

2) RATE MISMATCH PROBLEM
In general, a modulation and coding scheme (MCS) at a UE is chosen such that the transmission rate is maximized up to the limit allowed by UE's expected SINR. As seen from Fig. 1(a), at the point when UE1 initiates its transmission, the expected SINR at the BS is calculated without the consideration of the transmission by UE2. However, UE1's signal can be contaminated by UE2's signal due to the aforementioned OOBE, leading to a lowered SINR value when the BS receives UE1's signal. As a result, the BS cannot successfully decode and demodulate UE1's signal due to the unpredictable interference from UE2.
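The rate mismatch can be illustrated with a minimal numerical sketch. It assumes, for illustration only, that the rate is chosen as the Shannon spectral efficiency at the SINR expected when the packet is created; the function names and the SNR and interference values below are hypothetical and are not taken from the paper.

```python
import math

def shannon_rate(sinr_db: float) -> float:
    """Spectral efficiency in bit/s/Hz for an SINR given in dB."""
    return math.log2(1.0 + 10.0 ** (sinr_db / 10.0))

def sinr_with_interference_db(snr_db: float, inr_db: float) -> float:
    """SINR in dB when interference with the given interference-to-noise ratio appears."""
    snr = 10.0 ** (snr_db / 10.0)
    inr = 10.0 ** (inr_db / 10.0)
    return 10.0 * math.log10(snr / (1.0 + inr))

# Hypothetical values: UE1 sees 20 dB SNR when it builds its packet; UE2's later
# transmission leaks interference that is 10 dB above the noise floor.
snr_at_creation_db = 20.0
inr_from_ue2_db = 10.0

chosen_rate = shannon_rate(snr_at_creation_db)
supported_rate = shannon_rate(
    sinr_with_interference_db(snr_at_creation_db, inr_from_ue2_db)
)
# The packet was built for a rate that the channel no longer supports at reception.
rate_mismatch = chosen_rate > supported_rate
```

With these assumed numbers, the rate chosen at packet creation exceeds the rate that the contaminated channel supports, which is exactly the situation in which decoding fails.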

III. SIGNAL AND SYSTEM MODEL
This section introduces a GFDM signal and system model. The notations used frequently throughout the paper are summarized as follows. Uppercase and lowercase letters in bold type denote matrices and vectors, respectively. $\mathbf{x}(n)$ denotes the n-th element of the vector $\mathbf{x}$, $\mathbf{x}(i_1 : i_2)$ denotes the subvector of $\mathbf{x}$ from the $i_1$-th to the $i_2$-th element, $\mathbf{X}(i_1, i_2)$ denotes the element in the $i_1$-th row and $i_2$-th column of the matrix $\mathbf{X}$, $\mathbf{0}_{N \times 1}$ is an all-zeros vector that has N zeros, $\mathbf{0}_{N \times M}$ is an all-zeros matrix of size N × M, $\mathbf{I}_N$ is an identity matrix of size N × N, $\mathbf{F}_N$ is an N-point DFT matrix, and $\mathbf{F}_N^{-1}$ is an N-point IDFT matrix. The expectation, transpose, Hermitian, and inverse operators are denoted as $E\{\cdot\}$, $(\cdot)^T$, $(\cdot)^H$, and $(\cdot)^{-1}$, respectively. Finally, the operator ⊗ represents the Kronecker product.
In Fig. 3, a sub-band is a group of sub-carriers assigned to a UE, and each UE transmits a packet composed of several GFDM symbols through the sub-band. Here, the j-th packet transmitted in the v-th sub-band is denoted as PACKET(v, j), and the w-th GFDM symbol in PACKET(v, j) is referred to as $u^{(v,j)}_w$. In this work, it is assumed that all packets have the same fixed length. Table 1 summarizes the notations frequently used throughout this paper.

A. GFDM-BASED MULTIPLE-ACCESS
GFDM is a block-based filtered multi-carrier scheme where each block contains MK complex-valued data symbols that are carried by K sub-carriers and M sub-symbols. Hence, according to the Nyquist theorem, a GFDM block should contain at least N = MK samples. Let us denote the indices of a sample, GFDM sub-symbol, and sub-carrier as n ∈ [0, N−1], m ∈ [1, M], and k ∈ [1, K], respectively. Accordingly, a GFDM pulse shape filter for the k-th sub-carrier and m-th sub-symbol is denoted as $g_{k,m}[n]$, where $g[n]$ is a prototype filter. Here, $g_{k,m}[n]$ is generated by circularly shifting the prototype filter $g[n]$. Fig. 4 shows a block diagram of a GFDM transmitter with the illustration of the raised cosine prototype filter and the corresponding pulse shape filter for the sub-symbol index m and sub-carrier index k. Note that GFDM is not restricted to the raised cosine prototype filter [31], [37].
The number of available sub-bands is denoted by V. Then, the set of sub-carrier indices in the sub-band v is denoted as $\mathcal{K}^{(v)}$, where $|\mathcal{K}^{(v)}|$ is the number of sub-carriers in the v-th sub-band.
The transmitted signal of the w-th GFDM symbol $u^{(v,j)}_w$ is denoted as the vector $\mathbf{x}^{(v,j)}_w$. Then, each element of $\mathbf{x}^{(v,j)}_w$ is represented as in (2), where $p^{(v,j)}$ denotes the transmission power of PACKET(v, j), and $d$ is a complex-valued data symbol allocated to the k-th sub-carrier and m-th sub-symbol in the w-th GFDM symbol of PACKET(v, j). Moreover, by defining the vector notation $\mathbf{g}_{k,m} = [g_{k,m}[0], \cdots, g_{k,m}[N-1]]^T$, (2) can be rewritten in matrix form. Finally, the transmitted signal of the GFDM symbol $u^{(v,j)}_w$ with a CP is defined as $\hat{\mathbf{x}}^{(v,j)}_w$ in (6), where $\mathbf{C} \in \mathbb{R}^{(N+N_{CP}) \times N}$ is a matrix that appends the CP of size $N_{CP}$ in front of the data part.
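The modulation steps described above can be sketched as follows. This is a minimal sketch rather than the paper's exact formulation: it assumes zero-based indices, a pulse of the common form $g_{k,m}[n] = g[(n - mK) \bmod N]\, e^{j 2\pi k n / K}$, and a rectangular prototype filter chosen only because it makes the pulses orthonormal and easy to check. The function names `pulse`, `gfdm_modulate`, and `add_cp` are hypothetical.

```python
import cmath
import math

def pulse(g, k, m, K):
    """Assumed pulse shape: prototype g circularly shifted by m*K samples and
    modulated onto sub-carrier k (zero-based k and m)."""
    N = len(g)
    return [g[(n - m * K) % N] * cmath.exp(2j * math.pi * k * n / K)
            for n in range(N)]

def gfdm_modulate(data, g, K, power=1.0):
    """Superpose the pulses weighted by the data symbols.
    `data` maps (k, m) -> complex symbol for the sub-carriers of one sub-band."""
    x = [0j] * len(g)
    for (k, m), d in data.items():
        for n, gn in enumerate(pulse(g, k, m, K)):
            x[n] += d * gn
    amplitude = math.sqrt(power)  # transmission power p scales the amplitude by sqrt(p)
    return [amplitude * v for v in x]

def add_cp(x, n_cp):
    """Prepend the last n_cp samples as a cyclic prefix."""
    return x[len(x) - n_cp:] + x

K, M = 4, 3                      # sub-carriers and sub-symbols (illustrative sizes)
N = K * M
g_rect = [1.0 / math.sqrt(K)] * K + [0.0] * (N - K)   # assumed rectangular prototype
# A sub-band made of sub-carriers 1 and 2, carrying +/-1 symbols.
data = {(k, m): complex(1 - 2 * ((k + m) % 2), 0) for k in (1, 2) for m in range(M)}
x = gfdm_modulate(data, g_rect, K, power=4.0)
x_cp = add_cp(x, n_cp=2)
```

With a raised cosine prototype the pulses are no longer orthogonal, which is why the receivers in the following sections rely on equalization rather than simple matched filtering.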

B. GRANT-FREE MAC PROTOCOL
This study focuses on static IoT sensor networks where UEs are nomadic, and hence the coherence time of channels is sufficiently long. In this environment, a BS can estimate UL channel values, and UEs can obtain their channel values, whether through downlink (DL) pilot-based UL/DL channel reciprocity or through feedback from the BS [38]. Since the channel values remain constant for a long coherence time, the time or frequency resource required for obtaining the UL channel values at the BS and UEs becomes negligible compared to that required for the data transmission, similar to the UL channel sounding process of LTE.
For UL and DL, frequency division multiplexing is considered, which can minimize UL access delay. For UL, the BS should inform UEs which sub-band is idle and thus periodically transmits to each UE a bitmap called a DL map, including the sub-band usage status. The concept of a DL map has already been used in LTE, which contains more information such as user allocation and MCS index for each RB.
UEs can identify idle sub-band indices through the DL map. In addition, the sub-bands that become busy between two DL map transmissions can be detected through carrier sensing. A UE performs carrier sensing right before the start of a UL transmission to prevent a collision that may occur when several users start transmitting over a sub-band where there is an ongoing transmission.
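The sub-band selection at a UE can be sketched as follows. The data layout is an assumption made for illustration: the DL map is modeled as a list of busy flags and the carrier-sensing result as a set of sub-band indices, and the names `idle_subbands` and `pick_subband` are hypothetical. The random choice among idle sub-bands follows the random sub-band selection that the paper considers later in its analysis.

```python
import random

def idle_subbands(dl_map, sensed_busy):
    """Sub-bands marked idle in the last DL map and not detected busy since then.
    dl_map[v] is True when sub-band v was busy at the last DL map transmission."""
    return [v for v, busy in enumerate(dl_map) if not busy and v not in sensed_busy]

def pick_subband(dl_map, sensed_busy, rng):
    """Randomly select one idle sub-band, or return None when all are busy."""
    candidates = idle_subbands(dl_map, sensed_busy)
    return rng.choice(candidates) if candidates else None

# Illustrative state: sub-bands 0 and 3 were busy in the DL map, and carrier
# sensing found that sub-band 2 became busy afterwards.
dl_map = [True, False, False, True, False]
sensed_busy = {2}
chosen = pick_subband(dl_map, sensed_busy, random.Random(0))
```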

IV. A LIGHTWEIGHT SCHEME WITH A GFDM-BASED MMSE RECEIVER AND $\epsilon$-CONSERVATIVE RATE CONTROL
In this section, a lightweight scheme is proposed, where each UE employs an $\epsilon$-conservative rate control, and a BS utilizes a GFDM-based MMSE receiver.

A. NETWORK-WIDE $\epsilon$-CONSERVATIVE RATE CONTROL
In general, a UE transmits at the maximum possible rate that the SINR at the time of the transmission allows. However, in a GF-AMA scenario, the actual received SINR at a BS may be lowered due to the inter-user interference, resulting in the rate mismatch problem. To prevent the rate mismatch problem, we propose an $\epsilon$-conservative rate control scheme. By using the scheme, a UE builds PACKET(v, j) at the rate given in (7), where $\mathrm{SNR}^{(v,j)}$ is the SNR of PACKET(v, j) at the BS, and $\epsilon$ is a parameter determined by the BS. Note that the proposed $\epsilon$-conservative rate control scheme is suitable for UEs with very low computational power, as the rate is determined through a simple calculation based on the value of $\epsilon$.

1) NETWORK-WIDE $\epsilon$-ADAPTATION ALGORITHM
The optimal choice of the parameter $\epsilon$ depends on network configurations, such as the number of UEs, SNR distributions, and packet arrival rates. Although a BS could obtain the optimal $\epsilon$ through numerical simulations for some static IoT network scenarios, we propose a network-wide $\epsilon$-adaptation algorithm that efficiently deals with time-varying scenarios. Specifically, a BS determines the value of $\epsilon$ according to the following rules.
• If the successful transmission probability is lower than $\delta_{low}$, $\epsilon$ is increased by β.
• If the successful transmission probability of 1 is maintained for more than $\gamma_{succ}$ times, $\epsilon$ is decreased by β.
In most existing rate adaptation schemes, each UE adjusts its transmission rate according to its own successful transmission probability. On the other hand, in the proposed scheme, the BS adjusts the value of $\epsilon$, which is shared by all the UEs. In this way, we can guarantee a high successful transmission probability across all the UEs more reliably than with per-UE rate control, even though there is frequent unpredictable interference due to other UEs' GF asynchronous transmissions.
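The two adaptation rules can be sketched as a small controller running at the BS. Several details are assumptions made for illustration and are not specified in the text above: the success probability is taken to be measured once per observation window, the streak counter restarts after each decrease, and $\epsilon$ is clamped at a hypothetical floor `epsilon_min`. The class name `EpsilonController` is also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class EpsilonController:
    epsilon: float            # current network-wide parameter
    beta: float               # step size
    delta_low: float          # lower threshold on the success probability
    gamma_succ: int           # number of perfect windows tolerated before decreasing
    epsilon_min: float = 0.0  # assumed floor, not part of the stated rules
    perfect_streak: int = 0   # consecutive windows with success probability 1

    def update(self, success_prob: float) -> float:
        """Apply the two rules to one measured success probability."""
        if success_prob < self.delta_low:
            # Rule 1: too many failures, become more conservative.
            self.epsilon += self.beta
            self.perfect_streak = 0
        elif success_prob >= 1.0:
            # Rule 2: decrease only after more than gamma_succ perfect windows.
            self.perfect_streak += 1
            if self.perfect_streak > self.gamma_succ:
                self.epsilon = max(self.epsilon_min, self.epsilon - self.beta)
                self.perfect_streak = 0  # assumption: restart counting after a decrease
        else:
            self.perfect_streak = 0
        return self.epsilon

controller = EpsilonController(epsilon=1.0, beta=0.5, delta_low=0.99, gamma_succ=2)
history = [controller.update(p) for p in (0.9, 1.0, 1.0, 1.0, 0.995)]
```

Because the BS broadcasts a single value, the UE-side logic stays limited to reading $\epsilon$ and evaluating the rate expression.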

B. ASYNCHRONOUS MULTIPLE-ACCESS SCENARIO
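In general, a received signal at a BS can be represented by using the convolution of a channel tap vector and a transmitted signal. For our analysis, the channel tap vector of PACKET(v, j) normalized by the sampling period is defined as the (L × 1)-dimensional vector $\mathbf{h}^{(v,j)}$, where L is the number of channel taps. By using the transmitted signal $\hat{\mathbf{x}}^{(v,j)}_w$ in (6), the sampled points of the received signal without interference, $\hat{\mathbf{y}}^{(v,j)}_w$, are derived as the convolution of the channel taps with $\hat{\mathbf{x}}^{(v,j)}_w$ plus noise, where $\mathbf{h}^{(v,j)}(a) = 0$ for $a < 0$ and $a \geq L$. Then, the received signal without interference can be rewritten in a vector form as $\hat{\mathbf{y}}^{(v,j)}_w = \tilde{\mathbf{H}}^{(v,j)} \hat{\mathbf{x}}^{(v,j)}_w + \hat{\mathbf{z}}$, where $\tilde{\mathbf{H}}^{(v,j)}$ is a Toeplitz matrix consisting of the channel tap vector $\mathbf{h}^{(v,j)}$, and $\hat{\mathbf{z}}$ is the noise vector.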

1) RECEIVED SIGNAL WITH SYNCHRONOUS ACCESS
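By assuming that the length of the CP is longer than the multi-path delay, i.e., $L < N_{CP}$, the received signal under a synchronous access is a subvector of $\hat{\mathbf{y}}^{(v,j)}_w$, and it can be expressed as $\mathbf{y}^{(v,j)}_w = \hat{\mathbf{y}}^{(v,j)}_w(N_{CP}+1 : N+N_{CP}) = \mathbf{H}^{(v,j)} \mathbf{x}^{(v,j)}_w + \mathbf{z}$, where $\mathbf{H}^{(v,j)}$ is a circulant matrix consisting of $\mathbf{h}^{(v,j)}$, and $\mathbf{z} = \hat{\mathbf{z}}(N_{CP}+1 : N+N_{CP})$.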
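The reason a circulant matrix appears can be checked numerically: when the CP is at least as long as the channel memory, discarding the CP after a linear (Toeplitz) convolution leaves exactly a circular convolution of the data part. The following noiseless sketch uses arbitrary illustrative values; the helper names are hypothetical.

```python
def linear_convolve(h, x):
    """Linear convolution, i.e., multiplication by a Toeplitz channel matrix."""
    y = [0j] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for a, ha in enumerate(h):
            y[n + a] += ha * xn
    return y

def circular_convolve(h, x):
    """Circular convolution, i.e., multiplication by a circulant channel matrix."""
    N = len(x)
    return [sum(h[a] * x[(n - a) % N] for a in range(len(h))) for n in range(N)]

def add_cp(x, n_cp):
    """Prepend the last n_cp samples as a cyclic prefix."""
    return x[len(x) - n_cp:] + x

x = [1 + 0j, 2, -1, 0.5j, 3, -2]   # data part of one symbol (N = 6)
h = [1.0, 0.5, 0.25]               # L = 3 channel taps
n_cp = 4                           # satisfies L < N_CP

received = linear_convolve(h, add_cp(x, n_cp))
y_sync = received[n_cp:n_cp + len(x)]        # drop the CP, keep N samples
y_circ = circular_convolve(h, x)

# Without a CP, the first samples lack the wrapped-around contributions.
y_no_cp = linear_convolve(h, x)[:len(x)]
```

In the asynchronous case the receiver window no longer starts at the end of the interferer's CP, so this equivalence holds only for the target symbol and not for the interfering symbols.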

2) RECEIVED SIGNAL WITH ASYNCHRONOUS ACCESS
As mentioned in Section II-B, to minimize the access delay, each UE instantly transmits its UL signal right after the packet to transmit is generated, without waiting for the start of the next GFDM symbol. This instant transmission causes asynchronous access, resulting in inter-user interference. Therefore, it is necessary to represent the received signal, including this inter-user interference.
Fig. 5 depicts the block diagram of a GFDM-based BS receiver in an asynchronous multiple-access scenario. In the figure, the BS extracts the received signals at different time points according to the symbol to be detected, and we refer to this symbol as a target symbol. The extracted signal is decoded through a demodulator such as zero-forcing (ZF) or MMSE, and SIC can also be applied using decoded data. Therefore, in order to express the received signal with the inter-user interference caused by the asynchronous multiple-access, a BS i) determines the target symbol, ii) identifies the set of GFDM symbols overlapped with the target symbol, and iii) computes the misaligned time between the target symbol and the other overlapped symbols.
Fig. 3 shows an example of asynchronous multiple-access where a BS receives multiple packets transmitted by several UEs on different sub-bands at different transmission start times. Note that the unit of time is defined by the sampling period throughout this paper. By defining $T_{symbol}$ as a GFDM symbol duration without a CP, the sampling period is calculated as $T_{symbol}/N$. For the target symbol $u^{(v,j)}_w$, the set of GFDM symbols overlapped with the data part of the target symbol is defined through (12) and (13). In (12) and (13), W is the total number of GFDM symbols in a packet. Eq. (12) denotes GFDM symbols overlapped with the target symbol and received earlier than the target symbol, while (13) denotes GFDM symbols overlapped with the target symbol and received later than the target symbol.
Note that (12) follows from the fact that the interfering symbol stretches by the number of channel taps, ignoring the case where the interfering signal overlaps with the CP of the target symbol. As shown in Fig. 3, the symbol misalignment between the target symbol $u^{(v,j)}_w$ and the GFDM symbol $u^{(\hat{v},\hat{j})}_{\hat{w}}$ is denoted as $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$. Finally, the received signal of the w-th GFDM symbol $u^{(v,j)}_w$ with the inter-user interference due to the asynchronous multiple-access is represented as in (15), where $\mathbf{i}^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ is the interference from $u^{(\hat{v},\hat{j})}_{\hat{w}}$ to $u^{(v,j)}_w$, and is defined in (16). The expression in (16) contains the zero-padded received interference signal. Since the interfering symbol is also received through the convolution process with channel taps at the BS, the received interference signal can be written with $\tilde{\mathbf{H}}^{(\hat{v},\hat{j})}$, which is also a Toeplitz matrix constructed by the channel tap vector $\mathbf{h}^{(\hat{v},\hat{j})}$ but with a different size. As a result of the zero pad, in the symbol-level window for a target symbol, the part where the target symbol and a symbol in another sub-band do not overlap each other is regarded as zero.
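The effect of the zero pad and the shifted symbol-level window can be sketched on a single interfering symbol. This is only an illustration under assumed conventions: the misalignment is measured in samples, a value of zero places the window right after the interferer's CP, negative values move the window to the left, and positive values move it to the right. The function name `windowed_interference` is hypothetical.

```python
def windowed_interference(rx_interferer, misalignment, n, n_cp):
    """Samples of one received interfering symbol that fall inside the target
    symbol's N-sample window; positions without overlap are left at zero."""
    out = [0j] * n
    for i in range(n):
        idx = n_cp + i + misalignment   # position inside the interferer's samples
        if 0 <= idx < len(rx_interferer):
            out[i] = rx_interferer[idx]
    return out

n, n_cp, taps = 6, 2, 3
# Received interfering symbol: CP + data part + channel tail (N + N_CP + L - 1 samples).
rx = [complex(v) for v in range(1, n + n_cp + taps)]

aligned = windowed_interference(rx, 0, n, n_cp)    # window starts right after the CP
earlier = windowed_interference(rx, -4, n, n_cp)   # window moved to the left
later = windowed_interference(rx, 6, n, n_cp)      # window moved to the right
```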
In (16), the symbol-level windowing matrix $\mathbf{S}_{L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}}$ is defined in (19). For $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}} < 0$, the window moves to the left from the end of the CP, and for $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}} > 0$, the window moves to the right. As a result, for any [...]

C. GFDM-BASED MMSE RECEIVER
Since the received signal $\mathbf{y}^{(v,j)}_w$ in (15) has N = MK sampled points, it actually contains data transmitted over all sub-carriers used in the system. However, in order for the BS to extract data from the UE of interest, it only needs sampled points for the sub-carriers allocated to the UE. In addition, considering all the sampled points increases the computational complexity of an MMSE equalizer, which Section VI will explain in detail. Therefore, in this section, we derive the MMSE equalizer for a signal that has passed through a digital band-pass filter (BPF). The digital BPF is built from the extraction matrix $\mathbf{B}^{(v)}$, a matrix with N columns for extracting row vectors corresponding to sub-carriers in the sub-band v. According to the definition of MMSE, the MMSE equalizer $\mathbf{G}$ is given by (21), where $\mathbf{D}^{(v,j)}$ and $\mathbf{R}^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ are defined in (22) and (23), respectively. In addition, (23) contains a zero-padded identity matrix derived from the zero-padded data $\hat{\mathbf{d}}^{(\hat{v},\hat{j},\hat{w})}$ in (17). The derivation of the closed form for the MMSE equalizer is in Appendix VIII, and according to (4) and (15), the signal passed through the BPF and MMSE equalizer is given in (25). From (25), the SINR of the i-th data in the GFDM symbol $u^{(v,j)}_w$ is given in (26), from which the achievable data rate $\mathrm{rate}_{MMSE}$ of PACKET(v, j) is obtained. Thus, if the rate in (7) is lower than $\mathrm{rate}_{MMSE}$, the data of PACKET(v, j) can be successfully decoded.
The $\epsilon$-conservative rate control scheme has the advantage that a UE only needs its own SNR and $\epsilon$ for determining its transmission data rate, and thus it can operate in a GF-AMA environment with a very low computational load at the UE. However, the scheme attempts to predict the received interference at the BS in an average sense by adjusting the parameter $\epsilon$. Consequently, a few outliers cannot be dealt with; for instance, if the SNR of a specific packet is too high, the interference due to that packet is too high at reception, falling outside the interference range that can be handled by $\epsilon$. Hence, the next section proposes a performance-focused solution to enhance the throughput with a limited increase in the computational load of UEs.
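As a generic illustration of how an MMSE equalizer accounts for interference, the following sketch uses the textbook linear MMSE form $\mathbf{G} = \mathbf{H}^H(\mathbf{H}\mathbf{H}^H + \mathbf{R}_{int} + \sigma^2\mathbf{I})^{-1}$ for unit-power data on a toy 2 × 2 system. It is not the closed form in (21): the effective channel `h`, the interference covariance `r_int`, and all helper names are hypothetical stand-ins for the filtered GFDM quantities.

```python
def hermitian(a):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[a[j][i].conjugate() for j in range(len(a))] for i in range(len(a[0]))]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(len(a[0]))] for i in range(len(a))]

def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def mmse_equalizer(h, r_int, noise_var):
    """Textbook linear MMSE equalizer for unit-power data (2x2 case only)."""
    noise = [[noise_var + 0j, 0j], [0j, noise_var + 0j]]
    cov = add(add(matmul(h, hermitian(h)), r_int), noise)
    return matmul(hermitian(h), inv2(cov))

def post_sinr(g, h, r_int, noise_var, i):
    """SINR of stream i after equalization with g."""
    w = matmul(g, h)
    noise = [[noise_var + 0j, 0j], [0j, noise_var + 0j]]
    disturbance = matmul(matmul(g, add(r_int, noise)), hermitian(g))
    leak = sum(abs(w[i][j]) ** 2 for j in range(2) if j != i)
    return abs(w[i][i]) ** 2 / (leak + disturbance[i][i].real)

h = [[1 + 0j, 0.2j], [0.1 + 0j, 0.8 + 0j]]       # illustrative effective channel
r_zero = [[0j, 0j], [0j, 0j]]                    # no inter-user interference
r_int = [[0.5 + 0j, 0j], [0j, 0.5 + 0j]]         # illustrative interference covariance

sinr_clean = post_sinr(mmse_equalizer(h, r_zero, 0.01), h, r_zero, 0.01, 0)
sinr_interfered = post_sinr(mmse_equalizer(h, r_int, 0.01), h, r_int, 0.01, 0)
```

In this toy setting the SINR at the equalizer output drops once the interference covariance is non-zero, which is the gap between the rate chosen at the UE and the rate that the receiver can actually decode.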

V. A PERFORMANCE-FOCUSED SCHEME WITH A GFDM-BASED MMSE-SIC RECEIVER AND ADVANCED RATE CONTROL
In this section, we propose a performance-focused scheme, where each UE employs an advanced rate control based on a rough interference prediction, and a BS utilizes a GFDM-based MMSE-SIC receiver. This proposed scheme improves throughput and guarantees a higher chance of successful transmissions compared to the lightweight scheme in Section IV. In addition, for the performance-focused scheme, it is also assumed that the information about when busy sub-bands become idle is included in the DL map.

A. GFDM-BASED MMSE-SIC RECEIVER
For further improvement in throughput, a GFDM-based MMSE-SIC receiver is proposed by applying the SIC technique that removes successfully decoded packets from the received signal.
The GFDM-based MMSE-SIC receiver operates in two steps: it applies SIC to the received signal, and then performs the MMSE on the SIC-processed signal. For the SIC-decoding order in which signals are decoded in the SIC, the first-received first-decoded order, not the commonly used SNR descending order, is applied to avoid the inherent delay caused by SIC (footnote 2). For the first step, the SIC-processed signal of the GFDM symbol $u^{(v,j)}_w$ is represented as $\mathbf{r}^{(v,j)}_w$ in (28), where $\mathbf{y}^{(v,j)}_w$ and $\mathbf{i}^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ are defined in (15) and (16), respectively, and the set of decoded symbols is defined in (29). In (29), condition (a) denotes packets transmitted earlier than PACKET(v, j), and condition (b) means decodable packets. In addition, $\mathrm{rate}^{(a,b)}_{MMSE-SIC}$ and $\mathrm{rate}^{(a,b)}_{UE}$ are the maximum decodable data rate with the MMSE-SIC receiver and the transmission data rate at the UE transmitting PACKET(a, b), respectively. Both are defined later in (30) and (42), respectively. Then, for the second step, the MMSE equalizer $\mathbf{G}$ follows the form of (21), meaning that the MMSE equalizer is now applied to the SIC-processed signal.
Hence, with the GFDM-based MMSE-SIC receiver, the SINR of the i-th data for $\mathbf{r}^{(v,j)}_w$ is denoted as $\mathrm{SINR}^{(v,j,w,i)}_{MMSE-SIC}$, and the resulting maximum decodable data rate is given in (30).
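The first-received first-decoded rule can be sketched as follows. This toy model makes simplifying assumptions: each packet carries a precomputed maximum decodable rate (in the receiver this depends on which earlier packets were already cancelled), and the cancellation itself is a plain sample-wise subtraction of reconstructed contributions. The names `Packet`, `decodable_earlier_packets`, and `cancel` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    subband: int
    index: int
    start_time: int      # transmission start time in samples
    rate_ue: float       # rate chosen by the UE
    rate_max: float      # assumed maximum decodable rate at the BS

def decodable_earlier_packets(target, packets):
    """Packets that (a) started earlier than the target and (b) are decodable,
    listed in first-received first-decoded order."""
    earlier = sorted((p for p in packets if p.start_time < target.start_time),
                     key=lambda p: p.start_time)
    return [p for p in earlier if p.rate_ue <= p.rate_max]

def cancel(received, contributions):
    """Subtract the reconstructed contributions of decoded packets."""
    residual = list(received)
    for c in contributions:
        residual = [r - s for r, s in zip(residual, c)]
    return residual

target = Packet(subband=2, index=0, start_time=10, rate_ue=2.0, rate_max=2.4)
others = [
    Packet(subband=0, index=0, start_time=5, rate_ue=3.0, rate_max=2.8),   # not decodable
    Packet(subband=1, index=0, start_time=3, rate_ue=2.0, rate_max=2.5),   # decodable
    Packet(subband=3, index=0, start_time=12, rate_ue=1.0, rate_max=2.0),  # starts later
]
decoded = decodable_earlier_packets(target, others)
residual = cancel([5, 5, 5], [[1, 2, 3]])
```

Because the order depends only on arrival time, the receiver never has to wait for a later, stronger packet before it starts decoding.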

B. ADVANCED RATE CONTROL SCHEME
In the $\epsilon$-conservative rate control scheme, each UE determines its transmission rate without considering the inter-user interference, using only its SNR and the pre-defined parameter $\epsilon$. However, the throughput of this $\epsilon$-conservative scheme is diminished if the inter-user interference overwhelms the noise (e.g., if the SNR of users is high or the number of users is large). Hence, this section proposes an advanced rate control scheme based on SINR, taking into account the inter-user interference.
Since the exact SINR cannot be predicted at the time of transmission, we derive a lower bound of the SINR of a UE.
Footnote 2: In a GF-AMA scenario, the change of the SIC-decoding order can affect the spectral efficiency, but the effect is negligible. Indeed, our simulation has verified that the difference in spectral efficiency between the SNR descending order and the first-received first-decoded order is less than 1 % even at an SNR of 30 dB. Therefore, in order to minimize decoding delay, the first-received first-decoded order is recommended.
Fig. 7 (caption, beginning not recovered): [...] PACKET(v, j). The black-dotted gray box is an ongoing packet before the start of PACKET(v, j)'s transmission, while the red-dotted box is a packet generated after PACKET(v, j) starts to be transmitted.
By using the lower-bounded SINR, a UE decides its transmission data rate such that it is lower than the actual achievable data rate, and thus prevents the rate mismatch problem. Since the SINR in (26) can only be computed by the BS at reception, we need to rethink the SINR from the perspective of a UE at the time of transmission. If all the packets transmitted before a UE starts transmitting PACKET(v, j) are successfully decoded at the BS, by using (4) and (15), we can rewrite the received signal of (28) at the BS as in (31), where $\mathbf{i}^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ is defined in (16). The expression in (31) also involves the variable $e^{(v,j,w)}_{\hat{v}}$ and the sets $\mathcal{S}^{(v,j)}_{\hat{v}}$ and $\mathcal{F}^{(v,j,w)}_{\hat{v}}$, whose formal definitions accompany (31). In Fig. 7, the packets transmitted after PACKET(v, j) cannot be removed, and hence the GFDM symbol $u^{(v,j)}_w$ can be interfered with by sub-band $\hat{v}$. That is, $\mathcal{F}^{(v,j,w)}_{\hat{v}}$ includes the symbols in such interfering packets.
In (31), at the point of starting to send a new packet, a UE cannot know $\tilde{\mathbf{H}}^{(\hat{v},\hat{j})}$ within $\mathbf{i}^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ in (16) with only local CSI. In addition, it cannot know variables such as $e^{(v,j,w)}_{\hat{v}}$ because, from the UE's point of view, these variables relate to the transmission of packets that occur after the UE starts transmitting its packet. Therefore, to deal with such unknown variables in (31), we will obtain an expected SINR, and then derive an approximated lower bound of the expected SINR. Note that a UE can know $\mathcal{S}^{(v,j)}_{\hat{v}}$ via the DL map, which includes the information on when busy sub-bands become idle.

1) LOWER BOUND OF THE EXPECTED SINR
By passing the signal r (v,j) w in (31) through the BPF and ZF equalizer, the resultant signal is represented as (34) where the ZF equalizerG By using (34), for a UE, the SINR of the i-th data in the GFDM symbol u (v,j) w with the ZF equalizer is defined as (36), shown at the bottom of the page. Note that (36) is defined without the consideration of the MMSE equalizer. With only local CSI, a UE cannot obtain the MMSE equalizer. Therefore, a UE has no choice but can only consider the SINR with the ZF equalizer. However, under the asynchronous multipleaccess scenario considered in this work, the SINR with the ZF equalizer is lower than the SINR with the MMSE equalizer. Since considering the ZF equalizer leads to a lower-bounded SINR, the consideration satisfies the purpose of finding the lower bound of the expected SINR.
By averaging out the randomness of the inter-user interference, we can derive a lower bound on the expected SINR. Recall that $S_{L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}}$ and $R^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ are defined in (19) and (23), respectively. In (36), the inter-user interference is random because a UE cannot know exactly when the interference from other UEs arrives during its packet transmission.
Let us define the denominator of $\mathrm{SINR}^{(v,j,w,i)}_{\mathrm{ZF}}$ in (36) as $X$, which is a random variable because of the randomness of $e^{(v,j,w)}_{\hat{v}}$ and $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$. Then, the expected SINR, $\mathrm{E}\{\mathrm{SINR}^{(v,j,w,i)}_{\mathrm{ZF}}\}$, is lower-bounded as in (37), where (a) follows from Jensen's inequality.$^{3}$ Since $e^{(v,j,w)}_{\hat{v}}$ and $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ are uncorrelated, the term $\mathrm{E}\{X\}$ can be rewritten as (38), in which the interference term is a random variable due to $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ within it. Subsequently, the aim is to derive an upper bound of the expected inter-user interference in (38) that is independent of the unknown variables $F^{(v,j,w)}_{\hat{v}}$ and $H^{(\hat{v},\hat{j})}$. The derived upper bound of the expected interference is used to calculate a lower bound of the expected SINR, from which a UE can determine its transmission data rate.
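The Jensen step can be illustrated numerically. The following sketch assumes a fixed signal power `S` and a denominator `X` that takes a few equally likely positive values; these values are hypothetical and only serve to show that $\mathrm{E}\{S/X\} \ge S/\mathrm{E}\{X\}$ because $1/X$ is convex for $X > 0$.

```python
from fractions import Fraction


def expected_sinr_and_bound(signal_power, denominators):
    """Return (E[S/X], S/E[X]) for X uniform over the given positive values."""
    n = len(denominators)
    expected_sinr = sum(Fraction(signal_power) / Fraction(x) for x in denominators) / n
    mean_denominator = sum(Fraction(x) for x in denominators) / n
    return expected_sinr, Fraction(signal_power) / mean_denominator


# Hypothetical values of the denominator X (interference plus noise power).
exact, bound = expected_sinr_and_bound(10, [1, 2, 5])
print(float(exact), float(bound))
```

The second returned value is the quantity a UE can evaluate once an estimate (or an upper bound) of $\mathrm{E}\{X\}$ is available, which is why the subsequent derivation focuses on bounding the expected interference.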
The upper bound of the expected inter-user interference in (38) is derived in (39), as shown at the bottom of the next page. In the sequel, each line of (39) is justified.

a: CHANNEL APPROXIMATION
For (a) of (39), we consider that a UE starts its transmission by randomly selecting a sub-band among the idle sub-bands,$^{4}$ and hence $\mathrm{E}\{e^{(v,j,w)}_{\hat{v}}\}$ becomes constant regardless of $\hat{v}$. That is, we can replace $\mathrm{E}\{e^{(v,j,w)}_{\hat{v}}\}$ with a constant $\kappa$ for all $\hat{v}$. For (b) of (39), the channel matrix $H^{(\hat{v},\hat{j})}$ is approximated by the Toeplitz matrix $g^{(\hat{v},\hat{j})} H_1$ with the same channel gain as $H^{(\hat{v},\hat{j})}$, where $g^{(\hat{v},\hat{j})} = (h^{(\hat{v},\hat{j})})^H h^{(\hat{v},\hat{j})}$, and $H_1 \in \mathbb{C}^{3(N+N_{\mathrm{CP}}) \times 3(N+N_{\mathrm{CP}})}$ is a Toeplitz matrix generated by an $(L \times 1)$-dimensional vector $h_1 = (1, 0, \cdots, 0)^T$. From the triangle inequality, we have … Then, as the number of channel …$^{5}$ In other words, this approximation generally overestimates the power of the inter-user interference.

3 Note that $X > 0$, and thus $1/X$ is a convex function.

4 In the asynchronous multiple-access, since a UE cannot know which idle sub-bands other UEs will access, it is a natural choice for the UE to start its transmission on a sub-band that is randomly selected among the idle sub-bands. Indeed, our simulation has verified that the throughput of the random sub-band selection method is almost the same as that of the max-SNR-based method, in which a UE chooses the sub-band with the highest SNR for transmission among the idle sub-bands.

b: POWER CONTROL SCHEME AND THE WORST INTERFERENCE ASSUMPTION
Step (c) of (39) is derived by applying our power control scheme and the worst interference assumption.
Power control scheme: for (c), $p^{(\hat{v},\hat{j})} g^{(\hat{v},\hat{j})}$ is replaced with $\rho$, which is made possible by carefully designing the power control scheme. When a packet arrives at a UE, the UE selects the sub-band $v$ and accordingly sets the power of PACKET$(v, j)$ as in (40), where $P_{\mathrm{UE}}$ denotes the maximum available power for UEs, and $\rho$ is a parameter obtained by simulation using long-term channel values. By this proposed power control, the effective received power $p^{(\hat{v},\hat{j})} g^{(\hat{v},\hat{j})}$ is limited to at most $\rho$.

The worst interference assumption: by assuming that overlapping symbols are always consecutive, the summation over $(\hat{j},\hat{w}) \in F^{(v,j,w)}_{\hat{v}}$ and $\tilde{I}_{M|K^{(v)}|}$ are combined and replaced by $I_{3M|K^{(v)}|}$. In this regard, $I_{3M|K^{(v)}|}$ represents the case where there are always other GFDM symbols before and after a GFDM symbol, which increases interference, while $\tilde{I}_{M|K^{(v)}|}$ also includes the case where there may not be other GFDM symbols before or after a GFDM symbol.

5 The inequality holds true with almost 100 % probability when the number of channel taps is larger than 2. Since LTE generally assumes six channel taps [40], the inequality holds true in practice.
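The power control rule described above can be sketched as follows. The closed form used here, $p = \min(P_{\mathrm{UE}}, \rho / g)$, is an assumption made for illustration: it is one simple rule that respects the power budget and keeps the received power $p\,g$ at or below $\rho$, and it is not claimed to be the exact expression of the paper. The function and argument names are hypothetical.

```python
def packet_tx_power(channel_gain, rho, p_ue_max):
    """Transmit power that keeps the received power p * g at or below rho.

    Assumed rule: p = min(P_UE, rho / g). A UE with a strong channel backs
    off, while a UE with a weak channel transmits at its maximum power.
    """
    if channel_gain <= 0:
        raise ValueError("channel gain must be positive")
    return min(p_ue_max, rho / channel_gain)


# Hypothetical values: strong channel (power is reduced) and weak channel (power is capped).
strong = packet_tx_power(channel_gain=4.0, rho=2.0, p_ue_max=1.0)
weak = packet_tx_power(channel_gain=0.5, rho=2.0, p_ue_max=1.0)
print(strong, weak)
```

Under this rule the interference that any single packet can cause at the BS is bounded by a known constant, which is what allows $\rho$ to replace the unknown product of power and channel gain in the bound.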
This derivation considers that there are always consecutive interfering GFDM symbols in neighboring sub-bands. Hence, for the GFDM symbol $u^{(v,j)}_{w}$, the time misalignment point always lies within the duration of $u^{(v,j)}_{w}$. Then, the time misalignment can be regarded as a random variable independent of $(\hat{v},\hat{j},\hat{w})$, and we can replace $L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ with an index-independent random variable $L$. Moreover, UEs instantly start their transmissions in the asynchronous multiple-access scenario, and thus the time misalignment should be uniformly distributed within the duration of $u^{(v,j)}_{w}$. That is, $L$ follows the uniform distribution $\mathrm{unif}\{0, N + N_{\mathrm{CP}} - 1\}$, from which (d) of (39) is finally derived.

Since $\kappa = \mathrm{E}\{e^{(v,j,w)}_{\hat{v}}\}$, it represents the average probability that a sub-band $\hat{v}$ is occupied by a packet. Fortunately, a BS can easily measure this probability and broadcast the value to UEs as a system parameter. In Section VII, we analyze the change of throughput with varying $\kappa$. The analysis shows that the throughput is indeed maximized at the point where $\kappa$ is close to the probability of a packet generation in a sub-band.

Finally, by using the upper-bounded inter-user interference of (39), the lower-bounded SINR of the GFDM symbol $u^{(v,j)}_{w}$ for the $i$-th data is defined as in (41). Therefore, when a UE selects the sub-band $v$, the UE can determine the data rate of PACKET$(v, j)$ as in (42).
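The averaging over the time misalignment can be sketched as below. The helper takes any per-offset quantity and averages it over $\mathrm{unif}\{0, N + N_{\mathrm{CP}} - 1\}$; the example metric (the fraction of the symbol overlapped by an interferer) and the sizes `n=8`, `n_cp=2` are hypothetical and are not the interference expression of (39).

```python
from fractions import Fraction


def average_over_misalignment(metric, n, n_cp):
    """Average metric(l) over l drawn uniformly from {0, ..., n + n_cp - 1}."""
    length = n + n_cp
    return sum(Fraction(metric(l)) for l in range(length)) / length


def overlap_fraction(l, n=8, n_cp=2):
    """Hypothetical metric: share of the symbol overlapped when the offset is l."""
    return Fraction(l, n + n_cp)


mean_overlap = average_over_misalignment(overlap_fraction, 8, 2)
print(mean_overlap)
```

Because the offset distribution depends only on $N$ and $N_{\mathrm{CP}}$, such an average can be evaluated without knowing anything about the interfering UE.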
v,ĵ,ŵ and the inversion for the autocorrelation matrix is $O(N^2 \log_2 N)$ flops and $O((M|K^{(v)}|)^3)$ flops, respectively. In general, $N$ is much larger than $M|K^{(v)}|$. Hence, the dominant complexity of the GFDM-based MMSE receiver is $O(N^2 \log_2 N)$ flops, which is much lower than the $O(N^3 M|K^{(v)}|)$ flops in [35]. Note that the computational complexity of are computed only once and do not need to be recalculated until the channel changes.
Compared to the GFDM-based MMSE receiver, the GFDM-based MMSE-SIC receiver additionally needs to calculate $i^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}$ in (28). By using (16) and (17), $S_{L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}}$ simply extracts the rows of the matrix $H^{(\hat{v},\hat{j})} \tilde{A}^{(v)}$, and thus causes negligible computational complexity. Therefore, the product of $S_{L^{(v,j,w)}_{\hat{v},\hat{j},\hat{w}}} H$ produces an additional computational complexity of $O(NM|K^{(v)}|)$ flops. As a result, the computational complexities of the GFDM-based MMSE-SIC receiver and the GFDM-based MMSE receiver are of the same order, since the dominant operation in both is the MMSE equalizer.
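To get a feeling for the gap between the two orders of growth, the leading terms can be evaluated for sample sizes. The sketch below ignores all constant factors, and the sizes `n = 1024` and `mk = 16` are hypothetical values chosen only so that $N \gg M|K^{(v)}|$; the result is therefore indicative, not a measured flop count.

```python
import math


def proposed_mmse_flops(n, mk):
    """Leading-order cost: N^2 log2(N) plus (M|K|)^3 for the matrix inversion."""
    return n * n * math.log2(n) + mk ** 3


def reference_receiver_flops(n, mk):
    """Leading-order cost N^3 * M|K| quoted in the text for the receiver of [35]."""
    return n ** 3 * mk


n, mk = 1024, 16  # hypothetical sizes
ratio = reference_receiver_flops(n, mk) / proposed_mmse_flops(n, mk)
print(f"reference / proposed (leading terms only): {ratio:.0f}")
```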

B. COMPLEXITY AT UEs
The $\epsilon$-conservative rate control scheme in (7) has a negligible computational complexity, while the advanced rate control scheme in (42) needs to calculate $J^{(v,j,w)}_{\hat{v}}$ of (39) to obtain $\mathrm{SINR}^{(v,j,w,i)}_{\mathrm{UE}}$ of (41). Since all the matrices in (e) of (39) depend only on the sub-band indices $v$ and $\hat{v}$, the term (e) only needs to be calculated once during system setup. Hence, the computational complexity of (42) depends on the product of $G^{(v,j)}_{\mathrm{ZF}}$ in (35) and (e) in (39), and the total computational complexity of the advanced rate control scheme is determined by this product. Table 2 summarizes the computational complexities of all considered schemes.

VII. NUMERICAL RESULTS
The asynchronous multiple-access simulations are designed based on the system model illustrated in Fig. 3. Important simulation parameters are listed in Table 3, the values of which are chosen based on [8], [41]. In the simulations, a UE randomly selects a sub-band among idle sub-bands for transmission. It is assumed that the transmission power is the same for all UEs. It is also assumed that each channel is composed of six taps, where the magnitude of each channel tap follows a Rayleigh distribution and decays exponentially according to its delay, as commonly assumed in the literature [40]. A raised cosine prototype filter with the roll-off factor of α = 0.1 is used for the generation of the GFDM pulse shape filter.
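A channel realization of the kind described above can be drawn as in the following sketch. It assumes complex Gaussian taps (so that each magnitude is Rayleigh distributed), an exponential power delay profile normalized to unit total power, and a decay constant of 1 per tap; the decay constant and the function name are assumptions, since the text does not specify them.

```python
import math
import random


def rayleigh_channel_taps(num_taps=6, decay=1.0, rng=None):
    """Draw complex taps with Rayleigh magnitudes and exponentially decaying power."""
    if rng is None:
        rng = random.Random()
    # Exponential power delay profile, normalized so the mean total power is 1.
    profile = [math.exp(-decay * k) for k in range(num_taps)]
    total = sum(profile)
    taps = []
    for power in profile:
        sigma = math.sqrt(power / total / 2.0)  # std of the real and imaginary parts
        taps.append(complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    return taps


taps = rayleigh_channel_taps(rng=random.Random(0))
print([round(abs(t), 3) for t in taps])
```

Passing a seeded `random.Random` instance makes a simulation run reproducible, which is convenient when comparing receivers on identical channel realizations.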
To compare the proposed schemes with the existing technologies, we consider five conventional schemes: (1) an asynchronous GFDM-based ZF scheme, (2) an asynchronous OFDM-based ZF scheme, (3) a synchronous LTE scheme, (4) a synchronous LTE scheme with sTTI, and (5) a synchronous low-latency scheme with CB-PUSCH, which are explained below.
The two asynchronous schemes are considered as follows. In the 'asynchronous GFDM-based ZF scheme', the UEs behave as in the proposed schemes; that is, UEs access idle sub-bands asynchronously as in Fig. 3. Thus, the received signal at the BS is the same as (34). Instead of the MMSE equalizer, however, the BS in the asynchronous GFDM-based ZF scheme applies the ZF equalizer $G^{(v,j)}_{\mathrm{ZF}}$, defined in (35), to its received signal $y^{(v,j)}_{w}$ of (15). In the 'asynchronous OFDM-based ZF scheme', everything is the same as in the 'asynchronous GFDM-based ZF scheme' except that the OFDM pulse shape filter is used instead of the GFDM pulse shape filter. Without a proper power control scheme, an asynchronous scheme encounters severe inter-user interference. For fair comparison, we also apply the $\epsilon$-conservative rate control scheme to the asynchronous GFDM- and OFDM-based ZF schemes.

On the other hand, the existing synchronous schemes discussed in Section II-A are implemented as follows. The typical LTE multiple-access procedure is implemented for the 'synchronous LTE scheme'. In LTE, after a UE wakes up, it makes a radio resource control (RRC) connection to a BS, and then attempts to synchronize with the BS. For an RRC-connected and synchronized UE, a BS assigns one RB in PUCCH. The UE sends an SR via the assigned RB, where up to 12 SRs can be transmitted via one RB [42]. Therefore, to serve a large number of UEs, many RBs should be allocated for PUCCH. We numerically obtained 40 % as the best bandwidth proportion of PUCCH that minimizes the latency of the synchronous LTE scheme.
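The PUCCH dimensioning argument amounts to a simple count. The sketch below assumes that every UE needs one SR opportunity per SR period and ignores SR periodicity and any other PUCCH usage, so it is only a rough illustration of why the PUCCH share grows with the number of UEs; the function name is hypothetical.

```python
import math


def pucch_rbs_needed(num_ues, srs_per_rb=12):
    """Number of PUCCH RBs per SR period if each RB carries up to srs_per_rb SRs."""
    return math.ceil(num_ues / srs_per_rb)


print(pucch_rbs_needed(300))
```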
While the TTI is 1 ms in the 'synchronous LTE scheme', the 'synchronous LTE scheme with sTTI' [8] adopts a TTI value lower than 1 ms, which makes the SR period shorter and thereby reduces the access delay. In the 'synchronous low-latency scheme with CB-PUSCH' [7], multiple users are pre-allocated to the same resources in CB-PUSCH, where 10 users as a group share the same RB, and RBs in CB-PUSCH are assigned to several groups in a round-robin fashion. In addition, the synchronous low-latency scheme with CB-PUSCH adopts the same LTE OFDM symbol parameters. Note that the most recent 3GPP standard [43] includes optional use of an OFDM symbol duration much shorter than the conventional 71.4 µs. However, such a very short OFDM symbol with a short CP is designed for some particular mmWave scenarios where the communication coverage is extremely limited. Since our focus is mainly on the conventional ultra high frequency (UHF) band and a scenario with relatively large cell coverage, we use the conventional OFDM symbol duration of 71.4 µs for the synchronous low-latency scheme with CB-PUSCH.
A collision occurs if two or more symbols overlap in the same sub-band. In addition, a rate mismatch can still occur even when the proposed rate control schemes are used. We employ the conventional collision resolution technique [7] to resolve a collision or a rate mismatch. In this technique, failed transmissions are retransmitted once on dedicated resources. It is known that this process causes an additional delay of 8 ms [7].

A. PERFORMANCE EVALUATION OF THE ASYNCHRONOUS SCHEMES

1) PERFORMANCE COMPARISON BETWEEN LINEAR-TYPE RECEIVERS
Fig. 8 compares the spectral efficiency of the proposed MMSE receiver in Section IV-C and five other linear-type receivers. In the figure, the noise-MMSE receivers represent the MMSE receivers that only consider additive noise at the receiver, without considering any inter-user interference [31]. Since inter-user interference is dominant in asynchronous multiple-access scenarios, the performance of each noise-MMSE receiver is almost the same as that of its ZF counterpart.
In Fig. 8, the proposed MMSE receiver shows up to 40 % and up to 7 % improved spectral efficiency compared to the OFDM-based ZF and GFDM-based ZF, respectively. In addition, it achieves more than 95 % of the linear receiver performance of [35] even with the significant reduction in computational complexity explained in Section VI. Therefore, the proposed receiver can be used more generally than the linear receiver of [35], even in small cell scenarios where the BS needs to provide high throughput but has limited computing power. Fig. 9 shows the uncoded bit error rate (BER) and coded block error rate (BLER) of the linear-type receivers. In the simulation, the modulation is 16-QAM, and the 3GPP turbo code [44] is used with a block length of 3,000 and a code rate of 1/3. In both uncoded and coded scenarios, the ZF-based receivers show poor error rate performance at high SNR because they cannot suppress any inter-user interference, which becomes dominant in the high SNR regime. As a result, the ZF-based receivers cannot achieve 1 % BLER even with an SNR of 30 dB. On the other hand, the GFDM-based MMSE receivers achieve 1 % BLER at an SNR of roughly 22 dB.

2) PERFORMANCE OF THE LIGHTWEIGHT SCHEME WITH THE OPTIMAL $\epsilon$
Fig. 10 shows the throughput and successful transmission probability with varying $\epsilon$ for the asynchronous schemes, in which each UE employs the $\epsilon$-conservative rate control. With the assumption of optimal sensing of idle sub-bands, the probability of packet collision is negligible, because, in the proposed asynchronous schemes, a UE transmits almost immediately after the packet arrival. Thus, the occurrence of a rate mismatch determines the successful transmission probability, that is, the probability that a packet is successfully received at the BS. The simulations are conducted in an environment with 300 UEs and an SNR of 10 dB.
Recall that in the proposed $\epsilon$-conservative rate control scheme, as $\epsilon$ increases, UEs adopt more conservative rate adaptation by further lowering the transmission rate of packets from (7). Thus, increasing $\epsilon$ may lower the throughput because of the lowered packet rates. However, increasing $\epsilon$ may instead raise the throughput, because the lowered packet rates help to alleviate the rate mismatch problem. The throughput is determined as the net consequence of these two effects. As seen from Fig. 10, the successful transmission probability converges to 1 beyond a certain $\epsilon$ value for each scheme, while the rates of packets keep decreasing as $\epsilon$ increases. As a result, there exists an optimal $\epsilon$ value for each scheme that maximizes the throughput.
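The trade-off described above can be made concrete with a toy model. Everything in the sketch below is a stated assumption: the rate back-off form $(1-\epsilon)\log_2(1+\mathrm{SNR})$ stands in for (7), which is not reproduced here, and the success probability is a hypothetical curve that rises linearly and saturates at 1. The only point is that the product of a decreasing rate and a saturating success probability has an interior maximum.

```python
import math


def throughput(eps, snr_linear, p_succ):
    """Toy throughput: backed-off rate times the probability that the packet is decoded."""
    rate = (1.0 - eps) * math.log2(1.0 + snr_linear)  # assumed back-off form, not (7)
    return rate * p_succ(eps)


def best_epsilon(snr_linear, p_succ, grid):
    """Grid search for the epsilon that maximizes the toy throughput."""
    return max(grid, key=lambda e: throughput(e, snr_linear, p_succ))


def toy_success_probability(eps):
    """Hypothetical curve: grows with eps and saturates at 1 from eps = 0.1 onward."""
    return min(1.0, 0.6 + 4.0 * eps)


grid = [k / 100 for k in range(0, 51)]
eps_star = best_epsilon(10.0, toy_success_probability, grid)
print(eps_star)
```

In this toy model the maximizer sits exactly where the success probability first reaches 1, which mirrors the qualitative behavior reported for Fig. 10.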
In Fig. 10, the proposed GFDM-based MMSE scheme achieves a successful transmission probability higher than 98 % at the max-throughput point, where the throughput is maximized. On the other hand, the GFDM- and OFDM-based ZF schemes achieve successful transmission probabilities of 95 % and 87 %, respectively, at their respective max-throughput points. Therefore, at the max-throughput point, the proposed GFDM-based MMSE scheme not only has higher throughput than the GFDM- and OFDM-based ZF schemes, but also shows a higher successful transmission probability, yielding the highest transmission reliability for each transmitted packet. Table 4 summarizes the optimal $\epsilon$ values of the lightweight scheme, which are obtained by computer simulations, for varying numbers of UEs and SNR. The results show that the optimal $\epsilon$ increases with the number of UEs and with the SNR. In particular, the optimal $\epsilon$ increases more when the number of UEs increases from 100 to 200 than when it increases from 200 to 300. This indicates that the inter-user interference tends to saturate as the network becomes highly densified.

3) PERFORMANCE COMPARISON OF THE LIGHTWEIGHT SCHEME WITH THE $\epsilon$-ADAPTATION ALGORITHM
To show that the lightweight scheme can also work for dynamic networks where the number of active UEs changes over time, we evaluate the $\epsilon$-adaptation algorithm described in Section IV-A1. With $\beta = 0.002$, $\delta_{\mathrm{low}} = 0.95$, $\gamma_{\mathrm{succ}} = 3$, an SNR of 20 dB, and an initial $\epsilon = 0.08$, the optimal and adaptive $\epsilon$ values are depicted in Fig. 11(a). For the simulation, we consider a scenario in which the number of UEs varies according to the time index, where one time index represents a duration of 10 seconds. In the figure, the adaptive $\epsilon$ value tracks the optimal $\epsilon$ value well, with an average error rate lower than 1 %. In addition, as shown in Fig. 11(b), which describes the spectral efficiency of a packet for the adaptive and optimal $\epsilon$, the difference in spectral efficiency between the two is less than 1 %. Therefore, with the $\epsilon$-adaptation algorithm, the proposed lightweight scheme can be used even in an environment where the number of UEs changes over time.

4) PERFORMANCE OF THE PERFORMANCE-FOCUSED SCHEME
Fig. 12 shows the throughput and successful transmission probability of the asynchronous performance-focused scheme with various ρ 2 and κ when the BS and each UE employ the GFDM-based MMSE-SIC receiver and the advanced rate control, respectively. In Fig. 12(a), the maximum throughput is obtained at the point (κ = 0.5, ρ 2 = 15 dB), at which the successful transmission probability is higher than 99 % as seen from Fig. 12(b). Hence, the asynchronous performance-focused scheme can simultaneously improve the throughput and successful transmission probability at the cost of slightly higher computational complexity at the UEs.
In Fig. 12(b), with κ = 1, the successful transmission probability is constant at 1 for varying ρ 2 . Hence, the parameters should be set to proper values depending on the main target of an IoT system. If the system should guarantee a successful transmission probability higher than 99.999 %, as required by URLLC, κ should be set to 1. On the other hand, if high throughput is the priority, we can choose κ = 0.5 to maximize the throughput with a compromised reliability of 99 %.

B. PERFORMANCE COMPARISON WITH DIFFERENT SUB-BAND SELECTION METHODS
Fig. 13 presents the throughput of the proposed schemes for two sub-band selection methods: the random sub-band selection method and the max-SNR-based sub-band selection method. In the random sub-band selection method, a UE randomly selects a sub-band for transmission among the idle sub-bands, whereas with the max-SNR-based sub-band selection method, a UE chooses the sub-band with the highest SNR among the idle sub-bands. The optimal $\epsilon$ for the asynchronous lightweight scheme and the optimal (κ, ρ 2 ) for the asynchronous performance-focused scheme were numerically obtained such that the throughput is maximized for both schemes, and those optimal parameters are used for the throughput comparison. As with the previous simulations, the simulations are conducted in an environment with 300 UEs.
Figs. 13(a) and 13(b) show that both the asynchronous lightweight and performance-focused schemes achieve almost the same throughput for the two sub-band selection methods, which can be explained as follows. First, the max-SNR-based sub-band selection method does not guarantee any overall optimality in terms of the total throughput. Specifically, although a UE selects the sub-band with the highest SNR at a certain time point, it could be much better for that sub-band to be used later by other UEs with even higher SNR. Second, the max-SNR-based sub-band selection may be inferior to the random sub-band selection, because high SNR on a sub-band also results in high inter-user interference to neighboring sub-bands. For these reasons, there is little difference in throughput between the random and max-SNR-based sub-band selection methods, as shown in Fig. 13. Therefore, we use the random sub-band selection method in our subsequent simulations, since it requires the minimum implementation cost.
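The two selection rules compared above are simple to state in code. The sketch below is illustrative only; the function names and the dictionary `snr_db`, which maps a sub-band index to the UE's locally measured SNR, are hypothetical.

```python
import random


def select_random(idle_subbands, rng=random):
    """Random selection: pick any idle sub-band with equal probability."""
    return rng.choice(sorted(idle_subbands))


def select_max_snr(idle_subbands, snr_db):
    """Max-SNR selection: pick the idle sub-band on which the UE measures the highest SNR."""
    return max(idle_subbands, key=lambda v: snr_db[v])


# Hypothetical state: sub-bands 1, 3, and 4 are idle.
idle = {1, 3, 4}
snr_db = {1: 5.0, 3: 12.0, 4: 9.0}
print(select_random(idle, random.Random(1)), select_max_snr(idle, snr_db))
```

The random rule needs no per-sub-band SNR measurement, which is the implementation-cost advantage mentioned in the text.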

C. THROUGHPUT AND LATENCY PERFORMANCE COMPARISON
In Fig. 14, the proposed schemes are compared to the conventional schemes in terms of throughput and latency. Here, the latency means the elapsed time from the moment a packet is generated until the transmission of that packet is successful. For the throughput comparison in Fig. 14(a), a scenario with 300 UEs is considered. Recall that the synchronous LTE scheme uses 40 % of its bandwidth as PUCCH to serve massive UEs. As a result, as seen from Fig. 14(a), due to the lack of PUSCH, the throughput of the synchronous LTE scheme is inferior to that of the synchronous low-latency scheme with CB-PUSCH. On the other hand, the synchronous low-latency scheme with CB-PUSCH suffers from collisions. Note that unlike in the asynchronous multiple-access scenario, many collisions may occur among the UEs sharing the same pre-allocated RB in the synchronous multiple-access scenario, because any generated packet should wait for the next pre-allocated RB of CB-PUSCH; therefore, multiple UEs' packets generated during the waiting time will collide with one another when they are simultaneously transmitted at the next pre-allocated RB of CB-PUSCH. As a result, the RBs with collisions are wasted, which degrades the throughput. Hence, since the asynchronous OFDM-based ZF scheme has a very low collision probability due to asynchronous instant transmission, it has better throughput at lower SNR than the synchronous low-latency scheme with CB-PUSCH. However, as the SNR increases, the inter-user interference becomes severe, which significantly degrades the throughput of the asynchronous OFDM-based ZF scheme. On the other hand, with the GFDM pulse shape filter, the asynchronous GFDM-based ZF scheme outperforms the synchronous low-latency scheme with CB-PUSCH even at an SNR of 22 dB by successfully mitigating the inter-user interference.
Even with the low computational complexity at the UEs, the proposed asynchronous lightweight scheme outperforms all the conventional schemes over the entire SNR regime, because it uses the MMSE receiver, which mitigates the inter-user interference better than the ZF receiver. At the cost of slightly increased computational complexity at the UEs, the proposed asynchronous performance-focused scheme significantly improves the throughput over the entire SNR regime. Moreover, the proposed asynchronous performance-focused scheme achieves more than 88 % of the achievable upper bound, where the upper bound is obtained with the assumption that a UE transmits a packet at its maximum achievable rate derived in (30) and that there is no rate mismatch problem.
To show the throughput of the proposed performance-focused scheme in a URLLC scenario, we conduct an additional simulation with κ = 1 for 100 % successful transmission probability, where ρ 2 is chosen to maximize the throughput. The throughput of this scheme is even slightly better than that of the asynchronous lightweight scheme, while guaranteeing 100 % successful transmission, which cannot be guaranteed by the asynchronous lightweight scheme. Therefore, if a UE has sufficient computing power, it is always better to use the asynchronous performance-focused scheme.
In Fig. 14(b), the synchronous LTE scheme shows the worst latency, and the latency increases significantly as the number of UEs increases. The reason is that, due to the lack of PUCCH, a UE has to wait a long time for a dedicated PUCCH resource as the number of UEs increases. With the reduced TTI duration, the synchronous LTE scheme with sTTI reduces the latency to some extent compared to the case with a 1 ms TTI. However, the synchronous LTE scheme with sTTI still suffers from the lack of PUCCH, and hence its latency performance is still not acceptable in a low-latency IoT scenario. On the other hand, the synchronous low-latency scheme with CB-PUSCH does not require a dedication of PUCCH to each UE, and thus the latency increment is limited as the number of UEs increases. However, the scheme suffers significantly from collisions among the UEs sharing the same RB, resulting in relatively high latency compared to the proposed schemes. Fig. 14(c) provides an enlarged view of the black-dotted box in Fig. 14(b). In the figure, UEs of all the asynchronous schemes transmit packets instantly after the packets are generated, and thus all the asynchronous schemes show significantly lower latency than the synchronous schemes. In particular, the asynchronous lightweight scheme has a latency below 1 ms in most cases with low computational complexity at the UEs. Enjoying the highest successful transmission probability, as shown in Section VII-A, the asynchronous performance-focused schemes show the lowest latency among all the schemes compared. In particular, the choice with κ = 1 shows near-zero latency, lower than 0.2 ms, with fewer than 250 UEs.
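The link between successful transmission probability and latency can be illustrated with a back-of-the-envelope model. The sketch assumes that a failed packet is retransmitted exactly once, that the retransmission always succeeds, and that it costs the 8 ms penalty quoted for the collision resolution technique [7]; the first-attempt latency passed in is a hypothetical input, and the model is not the simulator used for Fig. 14.

```python
def expected_latency_ms(first_attempt_ms, success_probability, retry_penalty_ms=8.0):
    """Mean latency when a failed packet is resent once and the resend always succeeds."""
    if not 0.0 <= success_probability <= 1.0:
        raise ValueError("success_probability must lie in [0, 1]")
    return first_attempt_ms + (1.0 - success_probability) * retry_penalty_ms


# Hypothetical first-attempt latency of 0.1 ms at two reliability levels.
print(expected_latency_ms(0.1, 1.0), expected_latency_ms(0.1, 0.99))
```

Even a 1 % failure rate adds a noticeable share to the mean latency under this model, which is consistent with the scheme with the highest success probability also showing the lowest latency.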

VIII. CONCLUSION AND FUTURE WORK
To solve the rate mismatch problem in the GF-AMA scenario, we have developed two schemes: (1) the lightweight scheme with the $\epsilon$-conservative rate control and the GFDM-based MMSE receiver, and (2) the performance-focused scheme with the advanced rate control and the GFDM-based MMSE-SIC receiver. Depending on the computational capability of each UE, a system operator can choose one of these two schemes. The lightweight scheme requires only limited computing power at each UE, which computes only its own SNR values. Even with limited computational complexity at the UEs, this scheme outperforms the existing synchronous and asynchronous technologies in both throughput and latency. In the case where UEs have relatively high computing power, the performance-focused scheme can be used, which outperforms the lightweight scheme at the cost of slightly increased computational complexity at the UEs. In addition, the proposed schemes show extremely low latency, lower than 1 ms even with 400 UEs, whereas the latency of the existing synchronous schemes increases exponentially as the number of UEs increases. Therefore, the proposed asynchronous schemes are suitable for scenarios with massive IoT nodes. For the performance-focused scheme, a system operator has additional options in choosing κ. To guarantee 99.999 % successful transmission probability, as required by URLLC, one can choose κ = 1. On the other hand, one can compromise the packet transmission reliability to further increase the total system throughput. As a result, we expect that the proposed schemes allow system operators to cope flexibly with UEs' performance requirements.
In future work, with the GFDM-based GF-AMA, the scenario can be extended to a multi-cell MIMO scenario where BSs and UEs have multiple antennas. In such a scenario, the interference due to the asynchronous multiple-access and inter-cell interference can be jointly analyzed. In order to mitigate such interference, a multi-antenna beamforming technique can be used both at UEs and BSs. In addition, we will apply various mmWave 5G numerologies to the proposed GF asynchronous schemes for a possible extension to mmWave applications.

APPENDIX GFDM-BASED MMSE EQUALIZER
To derive the MMSE equalizer completely, we need to derive an autocorrelation and a cross-correlation. Based on (15), (17), and the assumption of uncorrelated data with unit variance, the autocorrelation of $P^{(v)} y^{(v,j)}_{w}$ is given by (43), where (a) follows from the fact that $\mathrm{E}\{d_{w} (d_{w})^H\} = I_{M|K^{(v)}|}$ owing to the uncorrelated data, and that $\mathrm{E}\{zz^H\} = \sigma_n^2 I_N$. According to (20), $B^{(v)} (B^{(v)})^H = I_{M|K^{(v)}|}$, so that $P^{(v)} (P^{(v)})^H$ is represented as … Note that $F_N F_N^H = I_N$ and … is expressed as in (46).
In (46), both $x^{(\hat{v},\hat{j})}_{w}$ and $\tilde{A}^{(v)}$ are defined in (17), $\tilde{I}_{M|K^{(v)}|}$ is defined in (24), and (b) can be derived in the same manner as (a) in (43). In a similar way, the cross-correlation is represented as

After one year at IBM Research, San Jose, CA, USA, then he moved to TCSI Inc., Berkeley, CA, USA. He coordinates the 5G Laboratory, Dresden, Germany, and two German Science Foundation (DFG) centers at TU Dresden, namely cfaed and HAEC. He has been a Vodafone Chair Professor at TU Dresden, since 1994, and has been heading the Barkhausen Institute, since 2018. His research interests include wireless transmission and chip design for wireless/IoT platforms, with 20 companies from Asia/Europe/USA sponsoring his research. He is a member of the German Academy of Sciences (Leopoldina), the German Academy of Engineering (acatech), and received multiple IEEE recognitions as well as the VDE ring of honor. In Dresden, his team has spun out 16 start-ups, and set up funded projects in volume of close to EUR 1/2 billion. He co-chairs the IEEE 5G Initiative and has helped organizing