Low-Correlation Superimposed Pilot Grant-Free Massive Access for Satellite Internet of Things

Satellite Internet of Things (S-IoT) with low Earth orbit satellites has become an effective solution for providing global coverage for massive machine type communications (mMTC). Considering that the massive user equipments covered by the S-IoT are periodically activated and dominated by short packet communications, the pilot collision has become a challenging problem due to the limited length and number of pilot sequences. In this paper, we propose a low-correlation superimposed pilot grant-free massive access (LSP-GFMA) scheme, where a low-correlation-zone periodic sequence (LPS) is designed for the superimposed pilot (SP) structure. Our LPS can maintain low cross-correlation with random non-orthogonal shifts compared with the conventional Zadoff-Chu sequence (ZCS), which can alleviate pilot collision while ensuring high spectral efficiency. In addition, we propose an iterative channel estimation based on Kaczmarz algorithm to attain accurate channel state information for the SP structure with low complexity. Then, we derive the theoretical expressions of access failure probability (AFP) and achievable throughput for our LSP-GFMA scheme under the shadowed-Rician fading channel. Simulation results validate the accuracy of our theoretical derivations, and demonstrate that our LSP-GFMA scheme with LPS can achieve lower AFP and higher achievable throughput than that with ZCS, and also outperforms the state-of-the-art schemes.


I. INTRODUCTION
W ITH the rapid development of low Earth orbit (LEO) satellite constellation, the satellite Internet of Things (S-IoT) is regarded as a promising solution to enable ubiquitous connectivity in global coverage [1].The S-IoT can support the massive machine-type communications (mMTC) in the upcoming fifth generation-advanced (5G-A) network, such as intelligent transport systems, aeronautical telecommunication, international multimodal transport, and environmental monitoring, etc [2], [3].However, the massive connectivity for uplink S-IoT and large propagation delay in satellite-ground links make the massive access in S-IoT become a challenging issue [4], [5].
Starting from the initial publication of ALOHA, a vast amount of evolutions of random access protocols can be found in the literatures [6], [7].The Contention Resolution Diversity Slotted Aloha (CRDSA) protocol is proposed in [6], where the terminal is allowed to generate two replicas of the same packet.In the CRDSA scheme, the recovered information from a successful packet is exploited to cancel the interference that its twin may generate on another time slot.This approach is iterated to recover most of the frame packets that were initially lost due to collision.The enhanced spread spectrum Aloha (E-SSA) scheme is another approach to enhance the performance of packet-based spread-spectrum ALOHA (SSA) random access [7].The E-SSA employs successive interference cancelation (SIC) at the packet level with a cyclic redundancy check (CRC) and opportunistic transmission control mechanism (OTCM).
Recently, the grant-free random access scheme is considered for massive access in S-IoT because of reducing handshake phases and control overhead [8], [9].As the traffic of random access is sporadic, the satellite access point (SAP) may have no prior information on the activity of the user equipment (UE) nor the channel state information in each random access slot.In general, the UE in GFMA randomly selects a pilot sequence prior to data and directly transmits the pilot to the SAP along with its data, and the SAP differentiates UEs by the pilot sequences.However, a conventional collision occurs when more than one UEs select the same pilot in GFMA and would deteriorate the UE detection [10], [11].Many research works focus on pilot signals design to mitigate this problem.The Gold codes constructed by a pair of m-sequences, have unique autocorrelation properties and are widely applied in CDMA systems.The Zadoff-Chu (ZC) sequences with different cyclic shifts and root indexes are utilized for pilot design in the 3GPP LTE system [12].To alleviate pilot collision, a non-orthogonal spreading sequence is considered a complement to pilot resources [13], [14].However, although introducing the nonorthogonal spreading sequence can decrease pilot collision probability, it causes interference between non-orthogonal UEs, which has negative effect on channel estimation and access failure probability (AFP) [15].Thus, the design of low correlation pilot sequence is a key part of GFMA scheme.
Moreover, the traffic in GFMA is characterized by short packet communications, where the regular pilot (RP) structure (i.e., the pilot sequence and data are successively transmitted) may deteriorate the spectral efficiency [16].In the superimposed pilot (SP) structure, the pilot sequences are superimposed on data sequences and transmit together, which do not take up additional time-frequency resources [17].However, we need to design a low complexity channel estimation algorithm to attain perfect channel state information (CSI) due to the inherent data interference in SP structure.

A. Related Works
The largest number of available orthogonal pilot sequences with length L is equal to L [18].However, the orthogonal pilot sequence set cannot fully alleviate the pilot collision in GFMA.To enlarge the pilot sequence set, the authors in [19] exploit the potential of ZCS with its non-orthogonal sequences from at most (L − 1) roots.Besides, the m-sequence and all-top sequence also can be employed to multi-root ZCS (mZCS) [21], [22], which increases the number of pilots by (L + 1) times.Note that each pair of non-orthogonal ZCS have the same cross-correlation 1/ √ L, which leads to serious interference between non-orthogonal UEs, i.e., the UEs who select the non-orthogonal ZCS.In [23] and [24], the nonorthogonal interference can be relieved in the receiver, which is designed through modeling sparse signal recovery into a multiple-measurement vector class of compressing sensing problem.To further reduce the non-orthogonal interference, a secondary pilot structure is proposed in [25], where the orthogonal sequences in mZCS set are used as the main pilot for UE detection, and the non-orthogonal sequences in mZCS set are used for alleviating main pilot collision as a secondary pilot.Thus, a UE is only disturbed by the interference from other UEs who select the same orthogonal pilot.However, the secondary pilot structure may deteriorate the spectral efficiency in short packet communication scenario.
The SP structure has the potential to improve the spectral efficiency in short packet communications for mMTC [26].On one hand, the existing channel estimation algorithms for SP structure are mainly based on the grant-based random access protocol [27], [28], [29], On the other hand, the channel estimation algorithms for SP structure in [30], [31] and [32] can be utilized in grant-free random access.In [30] and [31], the non-orthogonal interference can be significantly alleviated by using the subspace projection combined with the independent component analysis.Besides, the authors in [32] propose an estimator by strictly following the definition of the minimum mean square error (MMSE) estimation, which eliminates the inherent data interference in SP structure significantly and achieves a similar MMSE.However, these methods in [30], [31] and [32] depend on the MMSE decoder, and the corresponding complexity is O(K 3 ) due to the large dimensional matrix inversion for each UE, where K denotes the number of activated UEs.
Due to the large propagation delay in high dynamic LEO satellite-ground links and limited onboard resources, an effective channel estimation algorithm is required for the GFMA in S-IoT.An iterative channel estimation based on ridge regression (ICER) algorithm is proposed in [33], which can achieve fast convergence by several iterations.However, the ICER algorithm needs the inversion operation for the reconstruction of signal matrix, which still leads to O(K 3 ) computation complexity.Moreover, the inevitable channel estimation error in SP structure would deteriorate the performance of successive interference cancellation (SIC) for UE detection [34], [35].Then, the effect of imperfect SIC for the GFMA scheme in S-IoT should be further analyzed.

B. Contributions
In this paper, we propose a low-correlation superimposed pilot GFMA (LSP-GFMA) scheme for the uplink mMTC in S-IoT, and design a joint iterative channel estimation based on Kaczmarz (ICEK) and SIC decoding algorithm.The main contributions of the paper are summarized as follows.
• By utilizing the Hadamard matrix of order M = M ′ ×N ′ with elements {−1, 1} and a perfect sequence with length N with elements {−1, 0, 1}, we can generate a M × N ′ × N low-correlation-zone periodic sequence (LPS) set with M root LPS (rLPS) of length L = M × N , and each rLPS can generate N ′ × N shifted LPS (sLPS) by left-shifting each bit of the first N ′ × N − 1 bits in rLPS.Thus, we utilize rLPS set as the pilot set in our LSP-GFMA scheme, where each activated UE randomly selects and shifts a rLPS, and then performs random access to the SAP, which can significantly decrease the pilot collision probability.Further, we prove that the average auto-correlation and cross-correlation of our LPS is O(1/L), which is 1/ √ L times lower than that of mZCS, and can improve the decoding failure probability over the shadowed-Rician fading channel.shadowed-Rician fading channel, and further derive the expression of signal to interference noise ratio (SINR) by taking into account the imperfect SIC at SAP and the channel estimation error after the joint ICEK and SIC decoding algorithm.Thus, we have the closed-form expression of the decoding failure probability.Finally, we derive the closed-form expression of the AFP in LSP-GFMA scheme, which is mainly determined by the practical decoding failure probability, and further derive the achievable throughput from the AFP.Simulations validate the accuracy of the theoretical derivations, and demonstrate that our LSP-GFMA scheme has superior AFP and achievable throughput performances in short packet communication.

C. Paper Organization
The rest of this paper is organized as follows.Section II introduces the system model and LSP-GFMA scheme.Section III describes the ICEK algorithm for SP structure, and details the design methods and characteristics of LPS.In section IV, we derive the theoretical expressions of AFP and achievable throughput over the shadowed-Rician fading channel.Section V shows simulation results, and conclusions are drawn in Section VI.

II. SYSTEM MODEL AND LSP-GFMA SCHEME
In this section, we first outline the system model and analyze the pilot collision.Then, we propose the LSP-GFMA scheme to alleviate the collision.

A. System Model
The uplink GFMA S-IoT scenario is depicted in Fig. 1 (a), where a Ka band SAP with an altitude of 600 km provides mMTC services for UEs of a spot beam coverage [3], [4], [36].The UEs are quasi-stationary and uniformly random distributed in the spot beam coverage area.Since the satelliteground link is larger than several hundred kilometers away, we assume that each UE has approximately the same distance from SAP and arrives at almost the same time.One random access time slot consists of the propagation delay and processing delay at the transceiver.Since the worst two-way propagation delay is expected to be 26 ms for the SAP at 600 km [37], the random access time slot is set as 50 ms [38].In each random access time slot, the covered UEs are activated with activation probability p a to perform access [39], and K activated UEs randomly select the pilot from a size M orthogonal pilot set, and transmit data frame in length B to the SAP.Assume the pilot length is L, then the pilot set can be represented as P = {p 1 , p 2 , . . ., p M } ∈ C L×M .The pilot collision will occur when multiple UEs select the same pilot.For example, three UEs U 1 , U 2 and U 3 randomly select the same pilot p 1 as shown in Fig. 1, where the y-axis in Fig. 1 corresponds to the cross-correlation of a pair of pilots that are selected by UE U 1 and the other UEs.
To alleviate the pilot collision, we construct a novel LPS pilot set with M orthogonal rLPS for this uplink GFMA S-IoT scenario, where both average auto-correlation and crosscorrelation of our LPS are O(1/L), which can significantly decrease the pilot collision probability, and also achieves lower decoding failure probability than the mZCS over the shadowed-Rician fading channel.The construction of LPS is detailed in Section III-B.
In our LSP-GFMA scheme, each activated UE randomly selects and shifts a rLPS to obtain a sLPS as shown in Fig. 1(b) and Fig. 1(c), where U 1 , U 2 and U 3 randomly choose the same rLPS p 1 and left shift τ 1 , τ 2 and τ 3 symbols on p 1 to obtain p 1,τ1 , p 1,τ2 and p 1,τ3 , respectively.Then, these sLPS pilots are transmitted with the data d 1 , d 2 , and d 3 to the SAP.Thanks to the low average cross-correlation of pilot in sLPS set, the SAP can distinguish U 1 , U 2 and U 3 , and then recover the data message.
Moreover, consider the dominant traffic in uplink GFMA is short packet communication, the conventional RP structure in Fig. 1(b) may lead to low spectral efficiency when the ratio of B to L is low [32].Thus, we propose a SP structure as shown in Fig. 1(c), where the sLPS pilots p 1,τ1 , p 1,τ2 and p 1,τ3 are superimposed on the data d 1 , d 2 , and d 3 , respectively, and transmitted to the SAP.The SAP can obtain UE's identification information by signature information embedded in data header [40].Further, we propose an ICEK algorithm to deal with the inevitable channel estimation error caused by SP structure, which is detailed in Section III-A.Then, the received superimposed signal at SAP can be expressed as where d i and p i,τi are data and a sLPS that has τ i symbols left shift on p i of U i , respectively, P t represents total transmit power, α represents power allocation coefficient, ω denotes the additive white Gaussian noise (AWGN) as CN (0, σ 2 ).Further, g i represents the channel gain coefficient between U i and SAP, which is given as follows, where h i denotes the small scale fading coefficient of U i , which follows the PDF of shadowed-Rician fading channel [41], e jϕ represents the Doppler shift, which is caused by the movement of the LEO satellite.Assume that the diameter of a spot beam is about 100 km and the minimum elevation angle is 30 • [42], we can calculate the average Doppler shift for all UEs is less than 1300 kHz and the maximum difference of Doppler shift between UEs is 15 kHz.To handle the Doppler shift on the LEO satellite channel, we can first pre-compensate the Doppler shift roughly by the predictable orbit and movement of satellite [43].Then, a refine pre-compensation of the oscillator offsets can be performed by utilizing a phaselocked loop to track the reference tone [44].In this way, the Doppler shift estimation can be controlled within a small error range.Further, to compensate for the residual frequency offset, specific modulation and demodulation techniques are required.For example, the cyclic prefix (CP) based method can be utilized to estimate the Doppler shift within ±1/2 subcarrier for the OFDM system [45], [46].In this way, the residual frequency offset can be estimated successfully in the most SNR regime.Moreover, as the total bandwidth of a Kaband SAP is usually larger than 400 MHz [36], and the total frequency bandwidth is divided into multiple units with each channel bandwidth of B c , we can set a guard band double than the Doppler shifts, denoted by B g , to relieve the influence of Doppler shifts on the system performance [47], [48].F i represents the free space path loss between SAP and U i , which can be expressed as where f is the signal frequency and d is the distance between SAP and U i .

B. LSP-GFMA Scheme
Our LSP-GFMA scheme is a 2-step grant free random access scheme.The activated UEs randomly shift their selected rLPS to alleviate pilot collision and superimpose their sLPS on data to transmit.Then, the SAP performs ICEK and SIC decoding algorithm to recover the data of UEs.The details of LSP-GFMA scheme are shown as follows: Step 0: The SAP configures uplink resources and broadcasts to the activated UEs, and also with its position and velocity for synchronization.
Step 1: In a certain time slot, the activated U i randomly selects a rLPS p i from the LPS pilot set and randomly shifts p i by τ i bits.Then, the sLPS p i,τi ∈ p i,: = {p i,τ1 , p i,τ2 , . . ., p i,τ K } is superimposed on the data d i at a specific power allocation ratio α.Then the superimposed signal is sent to the SAP [49].We assume that U i is equipped with global navigation satellite system (GNSS) receiver to estimate the timing and carrier frequency for synchronization.
Step 2: The SAP knows the rLPS set and obtains all sLPS by shifting operation for each rLPS, and the SAP utilizes the sLPS {p 1,: , p 2,: , • • • , p K,: } to detect pilot collision and recover the data from y.If a pilot collision occurs on p i,τi , it would be discarded.Then, the SAP performs the joint ICEK and SIC decoding algorithm to recover the data of remaining UEs in ŷ as shown in Fig. 2. In the i-th iteration, the ICEK algorithm outputs the estimated CSI ĝi and ŷi , and then delivers to the SIC decoder.If the algorithm converges, we get the decoded data D = Di .Otherwise, the ICEK algorithm reconstructs the weighting matrix of superimposed signal W i , and updates CSI ĝi+1 with W i for the next iteration, the detailed process is introduced in Section III-A.

III. DESIGN OF JOINT ICEK AND SIC DECODING ALGORITHM AND CONSTRUCTION OF LPS SET
In this section, we propose a joint ICEK and SIC decoding algorithm, and introduce the construction method of LPS set.Moreover, we prove that the average auto-correlation and cross-correlation of LPS are O(1/L).

A. Joint ICEK and SIC Decoding Algorithm
In the SP structure, since the accurate CSI cannot be obtained due to the inherent data interference, and the channel estimation error leads to imperfect SIC in the decoding procedure, we propose a joint ICEK and SIC decoding algorithm.
The original Kaczmarz algorithm is known as algebraic reconstruction technique (ART) [50] in computed tomography, and it provides an iterative method for solving over-determined linear equation Ax = b, where A is a n × m matrix, x is a m × 1 vector to be determined, and b is a measurement vector.In every iterative process of Kaczmarz algorithm, A k is the k-th row of the matrix A. The solution of the last inner iteration is xt,m−1 within the t-th outer iteration, which satisfies A T m , xt,m−1 = b m .Given an initial solution x1,0 , the k-th inner iteration in the t-th outer iteration of Kaczmarz algorithm can be expressed as: where ∥A k ∥ is the second order norm of A k .
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
In general, if we utilize the MMSE estimation to obtain the CSI of activated UEs, Eq. ( 1) can be expressed as: where and g = {g 1 , g 2 , . . ., g K } T is channel gain vector of K activated UEs.Let X = D + P p , and the estimated CSI of activated UEs can be expressed as: where represents the weighting matrix.ŷ = P H p y can be utilized as a matched filter based on sLPS, which is expressed as: Note that P H p P p is much larger than P H p D due to the low auto-correlation and cross-correlation of sLPS.Therefore, we can treat the data matrix D as interference to the pilot matrix P p in our ICEK algorithm.To avoid the matrix inversion operation, we convert Eq. ( 6) to a linear equation W −1 ĝ = ŷ, where ŷ is regarded as a measurement vector, and the solution ĝ can be calculated by the Kaczmarz algorithm with K inner iteration and J 1 outer iteration: where ĝt,k is the solution in k-th inner iteration of t-th outer iteration, w k is the k-th row of W −1 , which depends on the cross-correlation of selected sLPS by U k and that of other UEs.The convergence factor δ ∈ (1, 2) is usually set as (1 + λ min /λ max ), where λ min and λ max represent the maximum and minimum eigenvalues of the matrix W, respectively [51].Consider W is approximately diagonal matrix with the same elements and has λ min ≈ λ max due to the low auto-correlation and cross-correlation of sLPS, we have δ ≈ 2. Furthermore, the initial solution is denoted as ĝ1,0 = 0.Then, we should find the closest vector ĝt,k+1 to the true solution g that satisfies arg min Therefore, we can find a large ∥w k ∥ to obtain w T k , ĝt,k−1 = y k .Consider that ∥w k ∥ is determined by the cross-correlation of selected sLPS by U k and that of other UEs, and ∥w k ∥ increases with Then, Eq. ( 8) can be rewritten as: Note that the estimated CSI from Eq. ( 10) needs further iteration process, because we only use P p to construct weighting matrix W in ICEK algorithm yet.We can jointly utilize ICEK and SIC decoder, and iteratively update the CSI for improving the accuracy.In the first iteration of joint ICEK Algorithm 1 Joint ICEK and SIC Decoding Algorithm Input: y, P p , J 2 , K, ĝ1,0 = 0 Output: and SIC decoding algorithm, let X 0 = P p + D0 and ĝ0 = 0, where D0 = 0.Then, we construct the initial weighting matrix as Ŵ0 = (X H 0 X 0 + σ 2 I K ) −1 , and utilize the ICEK algorithm to solve the linear equation Ŵ−1 0 ĝ0 = ŷ.Note that J 1 is the number of outer iterations when the ICEK algorithm achieves convergence, i.e., w T ψ K , ĝj J1,K−1 = y ψ k , and the number of outer iterations of ICEK in the j-th iteration of joint ICEK and SIC decoding algorithm is J outer iterations, and deliver the estimated CSI ĝ1 to the SIC decoder, denoted by F .Note that the SIC performs in an iterative manner and stops when any UE fails to decode before all non-collision UEs are decoded [9], [56].The first decoding result D1 = F (y, ĝ1 ) is used to construct the superimposed signal matrix X 1 = P p + D1 and Ŵ1 = (X H 1 X 1 + σ 2 I K ), where F (y, ĝ1 ) represents the data matrix decoded by SIC decoder with input y and ĝ1 .Then, Ŵ1 is the weighting matrix in the next iteration.Therefore, in the j-th iteration of joint ICEK and SIC decoding algorithm for j = 1, • • • , J 2 , we can obtain the weighting matrix Ŵj = X H j X j + σ 2 0 I K , where X j = P p + Dj , and Dj = F (y, ĝj ).In the ICEK algorithm, the CSI is updated iteratively to reduce the interference and then improve the estimation accuracy.In addition, the complexity of obtaining the weighting matrix in each iteration is O(K 2 ).Let J = J2 j=1 J (j) 1 denote the total number of iterations required to reach convergence.Thus, the ICEK algorithm has a combined complexity of O(JK 2 ).The detailed joint ICEK and SIC decoding algorithm is summarized in Algorithm 1, and the proof of convergence for line 9 in Algorithm 1 can be found in [52].Besides, the power allocation coefficient α has significant effect on the performance of joint ICEK and SIC decoder algorithm.To quantify the performance of joint ICEK and SIC decoding algorithm, we define the NMSE as follows, where ∆g = |ĝ J2 − g| denotes the channel estimation error.Fig. 3 compares the NMSE of our ICEK and the conventional MMSE algorithms versus J iterations under different α.We can observe that the ICEK algorithm achieves convergence after J = 10 iterations for all the simulated α, and it can achieve about 60 times gain over MMSE with respect to NMSE in our LSP-GFMA scheme.Moreover, the complexity of ICEK algorithm is O JK 2 , while the complexity of MMSE algorithm is O K 3 .
Furthermore, we can observe that the NMSE increases with the increase of α.This is because the more power is allocated to data, the stronger data interference is led to the pilot, and we can improve the performance of ICEK algorithm by allocating more power to pilot.However, if α is low, the SINR of UEs would be degraded and then the decoding failure probability would be deteriorated, which is analyzed in Section IV.Therefore, the selection of α is a tradeoff between the accuracy of ICEK algorithm and SINR of UEs.We simulated the NMSE of our ICEK algorithm versus α with different SNR as shown in Fig. 4. The NMSE increases slowly as α decreases when α ≤ 0.7.

B. Construction of LPS Set
Both pilot detection and joint ICEK and SIC decoding algorithm are based on LPS set, which plays an important role in our LSP-GFMA scheme.
We can construct the LPS set by utilizing the Hadamard matrix H = [h i,j ] of order M with elements {−1, 1}, and a length N perfect sequence a = {a 1 , a 2 , . . ., a N −1 } with elements {−1, 0, 1}.Note that for a zero-correlation-zone periodic sequences set with zero-correlation zone [0, τ z ] [53], each pair of sequences in this set has zero cross-correlation if they left shift no more than τ z bits symbols.We expand the zero-correlation zone to the low-correlation zone to construct our LPS set as follows.
Step I: Let M = M ′ × N ′ and rearrange each row h i,: = {h i,0 , h i,1 , . . ., h i,(M −1) } of H to a M ′ × N ′ matrix, and we have , where h m,: represents a row vector in the m-th row of H i .
Step II: Let s = {s 0 , s 1 , . . ., s i , . . ., s N ′ −1 } be a shift sequence of length N ′ , where s i is an integer and s i ∈ [0, N ).We can obtain the interleaving matrix I with the length N perfect sequence a and s: where the subscript is calculated by modulo N .For simplicity, the interleaving matrix can be expressed as where L s (•) denotes the operator of s symbols left shift.Then, for 0 ≤ i ≤ M − 1, we construct a matrix P (i) with N × M ′ rows and N ′ columns, . . .
where "•" is the column-wise product, which is defined as follows: Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
Step III: By concatenating the successive rows of the matrix P (i) , it can be rearranged to a rLPS sequence p i = {p × i( mod N ) to generate optimal LPS set [53].In this condition, the bound of zero-correlation zone is τ z = N ′ × N N ′ − 1, and the maximum symbols that LPS can left shift is τ m = N × N ′ − 1 to guarantee the low correlation.Thus, the length of LPS is L = N × M .The detail derivation is given in the Section III-C.
A toy example to better understand the LPS is given as follows: For a perfect sequence a = {1, 1, 0, 1, 0, 0, −1} with length N = 7 and a Hadamard matrix H with order M = 4, can be expressed as Let M = M ′ × N ′ , and Step II.Then, the toy example to generate the LPS set according to the above steps can be summarized as follows.

C. Theoretical Analysis of LPS
In this subsection, we prove that if the number of left shift symbols is τ ∈ [0, τ z ] on rLPS, each pair of sLPS in a LPS set are orthogonal to each other.Moreover, if the number of left shift symbols is τ ∈ (τ z , τ m ], any pair of sLPS in this LPS set are non-orthogonal with a low average cross-correlation O(1/L).Let R pi,τ i ,pj,τ j denote the crosscorrelation function between two sLPS p i,τi and p j,τj with their left shift difference is ∆τ = |τ i − τ j | symbols, and we have Thus, we have the following Theorem 1 and Theorem 2 to prove that both of the auto-correlation and cross-correlation are 0 and O(1/L) in ∆τ ∈ [0, τ z ] and ∆τ ∈ (τ z , τ m ], respectively. Theorem 1: If 0 ≤ ∆τ ≤ τ z , we always have R pi,τ i ,pj,τ i (τ ) = 0 for arbitrary two sLPS p i,τi and p j,τj , where the condition i = j and τ i = τ j are not simultaneously true. Proof: According to [54], the cross-correlation between p i,τi and p j,τj can be calculated as where R a (∆τ ) denotes the auto-correlation of perfect sequences a, and R a (∆τ ) = 0 if ∆τ mod N ̸ = 0.Then, we can distinguish the following two conditions: 1) When 0 < ∆τ ≤ τ z , two cases can be further distinguished: we have s t+τr ≥ s t , then we have s t+τr − s t = τ r × N N ′ , thus we have s t+τr − s t + τ q > 0 and s t+τr − s t + τ q ≤ N ′ × N N ′ − 1 < N ; Therefore, we have R pi,τ i ,pj,τ i (∆τ ) = 0.
□ Therefore, the LPS has perfect auto-correlation and crosscorrelation if the number of left shift symbols is in [0, τ z ].Note that the average cross-correlation of LPS affects the decoding failure probability in massive access, and we prove that the average auto-correlation and cross-correlation of LPS are O(1/L) in the following Theorem 2.
Note that the length of LPS is L = N × M , and M = M ′ × N ′ is the order of Hadamard matrix to construct the LPS set.If τ z < ∆τ ≤ τ m , the average auto-correlation and cross-correlation of arbitrary two sLPS p i,τi and p j,τj are Rpi,τ i ,pj,τ i Therefore, the average auto-correlation and cross-correlation of arbitrary two sLPS are O(1/L) when the left symbol shift is in (τ z , τ m ], which is lower than that of mZCS O(1/ √ L). □

IV. PERFORMANCE ANALYSIS OF LSP-GFMA SCHEME WITH LPS UNDER IMPERFECT SIC
In this section, we assume that the pilot collision can be detected with signal power detections [55] or utilize the CRC [56].We derive the close-form expressions of the AFP and achievable throughput of the uplink LSP-GFMA under shadowed-Rician fading channel and imperfect SIC.AFP is defined as the probability that pilot collision occurs or the data of UEs fails to be decoded.The achievable throughput is defined as the number of successfully decoded data bits in each data frame.
Note that for K activated UEs randomly select rLPS pilots, the pilot collision probability β can be calculated as: where M × (τ m + 1) = M × N ′ × N is the total number of sLPS.After the pilot detection, the UEs in pilot collision are discarded.Assume the number of UEs without pilot collision is K s , and the probability distribution of K s can be expressed as: Then, if U i selects p i and shifts τ i , the SAP can recover data di with sLPS p i,τi from K s UEs without pilot collision to obtain d i as follows, Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where the UEs that select orthogonal sLPS to p i,τi are eliminated, and K n is the number of UEs that select lowcorrelation sLPS to p i,τi , which determines the interference of low-correlation UEs to U i .The probability distribution of K n can be expressed as: where Note that we can utilize typical threshold-based algorithms to detect UEs, and define the false alarm probability as the probability of detecting a sequence when it has not been transmitted.This happens when the correlator output is larger than a predetermined threshold value.The threshold is set as the estimation of the interference of low-correlation UEs to U i in Eq. ( 27) and noise power σ 2 , i.e., Pt(Kn−1)

LKn
• Kn i=1 g i +σ 2 .Thus, the false alarm probability is lower than 10 −4 under all simulation parameters, which can be ignored in our analysis.Then, the decoding failure probability of U i can be calculated as: where γ represents SINR of U i .Note that we ignore the specific modulation and channel coding schemes, and utilize the decoding threshold γ th to judge whether a decoding process can be successful, i.e., if γ ≥ γ th , d i can be recovered successfully.γ th is dependent on the modulation and channel coding schemes that decide the data transmission rate [18], [32].Considering imperfect SIC, the SINR of U i is calculated as: where θ = Pt σ 2 represents SNR, and I represents interference from other low-correlation UEs, where K denotes the number of decoded UEs.µ is interference transfer factor of imperfect SIC [35].
is the interference caused by channel estimation error.The definition expression of ∆g i is heuristic as shown in Eq. ( 10), and we set ∆g i as a constant from simulation results.Denote X = θ|g i | 2 , the PDF of X over the shadowed-Rician fading channel can be expressed as: where 2b 0 and m 0 represent the average power of the multipath component and Nakagami-m parameter, respectively.Ω denotes line-of-sight (LoS) component, and denotes the confluent hypergeometric function, which can be expressed as: We select the parameters in [41] as b 0 = 0.063, m 0 = 1, Ω = 0.000897, and the PDF can be simplified as: where

Then, I can be expressed as
Kn−1 j=0 ρ j X j , where The PDF of I can be derived as: The detail derivation can be found in Appendix A. Since X and I are independent, the joint PDF of X and I is where 0 ≤ x < +∞ and 0 ≤ y < +∞.Then, when K n UEs select low-correlation sLPS to p i,τi under γ th , the decoding success probability Pr γ ≥ γ th | K n can be calculated as follows Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.ρ 2 e ϕϵ η(ϕρ j + 1) where ϕ = .
Therefore, we can substitute Eq. ( 26), Eq. ( 28) and Eq. ( 38) into Eq.( 29), and the decoding failure probability can be expressed as, Finally, we can calculate AFP from Eq. ( 25) and Eq. ( 39) as follows, The expression of achievable throughput can be derived by utilizing AFP as follows, V. SIMULATION RESULTS AND DISCUSSION In this section, we simulate the pilot collision probability, decoding failure probability and AFP of LSP-GFMA scheme with LPS, ZCS, and mZCS.Then, we compare AFP and achievable throughput of LSP-GFMA scheme with the existing GFMA schemes, including the modified approximate message passing (M-AMP) [8] and the E-SSA schemes [57] with mZCS and ZCS, respectively.Both the M-AMP and E-SSA schemes adopt the RP structure.Note that the length of ZCS L ZCS is a prime number, and the mZCS is generated from L ZCS − 1 different roots.As the largest number of available orthogonal pilot sequences is equal to the pilot length, we try to construct the LPS with similar length to the ZCS for fair comparison of the pilot collision probability and decoding failure probability.Thus, in the simulation setting, we set the length of ZCS as 173 and 337, and the length of LPS as 168, 336.The simulation parameters are listed in Table I [58], [59].Fig. 5(a) shows the pilot collision probability of three different pilots with respect to K activated UEs with different length of pilots.We can observe that the LPS achieves about M times lower pilot collision probability than that of ZCS, and the mZCS has the lowest pilot collision probability because it has the most number of non-orthogonal sequences from L ZCS − 1 different roots.However, the decoding failure probability of mZCS is significantly deteriorated with the increasing of K due to its high cross-correlation than the LPS.Fig. 5(b) shows the decoding failure probability of LSP-GFMA scheme with LPS, mZCS and ZCS pilots with respect as the gain of LPS over mZCS and ZCS for decoding failure probability ξ, and achieve the largest G ξ =1.625 for γ th = 1.2 when K = 50.Therefore, we set γ th = 1.2 in the following simulations.
Note that the time difference between the UEs at subsatellite point and the edge of spot beam will lead to the pilot unalignment at the satellite receiver, and the decoding threshold γ th also becomes larger.With our simulation setup, there may one sampling point unalignment and we find that γ th = 1.25 can achieve the largest G ξ .Thus, Fig. 5(c) shows the decoding failure probability with and without time difference versus SNR.It can be observed that the decoding failure probability of LPS with time difference is slightly higher than that of LPS without time difference when SNR < 10dB.Moreover, the time difference increases the decoding failure probability by less than 1% when SNR ≥ 10dB.Hence, it is reasonable to assume that the impact of the time difference on the proposed scheme can be ignored when an appropriate SNR is guaranteed.
Fig. 6(a) shows the decoding failure probability with respect to K activated UEs under different γ th and L, which also validates the accuracy of Eq. (39).Note that the uncollision UEs with ZCS are zero cross-correlation to each other, which lead to the decoding failure probability of ZCS is not affected by K.However, both of the UEs with LPS or mZCS may suffer the interference from other uncollision UEs, and the decoding failure probability increases with K. We can observe that LPS has a lower decoding failure probability than mZCS, because the UEs with mZCS suffer higher interference due Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.10 times lower than mZCS with K = 50 when they are stable as SNR ≥ 10dB.
Fig. 7(a) shows the AFP of LSP-GFMA scheme versus α under different SNR.It can be observed that the AFP of LSP-GFMA scheme in all SNR regimes can achieve a minimum value when α=0.7.This is because smaller α leads to higher channel estimation accuracy.However, when α is small, the SINR of UEs would be degraded and then the AFP is deteriorated.
The comparison of AFP of three pilots versus K under different L and γ th is shown in Fig. 7(b), where the simulation results validate the accuracy of Eq. ( 41).We can observe that the AFP of LPS is lower than ZCS in all K and L regimes, because the AFP of ZCS is seriously deteriorated by the pilot collision due to its limited size of pilot set as shown in Fig. 4. On contrast, the AFP of UEs with LPS and mZCS are mainly affected by the decoding failure probability, which are both limited by the interference from other UEs.Therefore, the AFP of UEs with LPS can be lower than that of UEs with mZCS, which validates that our LSP-GFMA scheme with the innovative designed LPS can achieve better tradeoff between the pilot collision probability and decoding failure probability than both ZCS and mZCS pilots.    by the decoding failure probability due to the interference of other UEs.Moreover, the AFP of LPS with L LP S = 336 is about 60 times lower than that of mZCS and 10 times lower than that of ZCS when SNR ≥ 10 dB.
Fig. 8(b) shows the AFP of LSP-GFMA, E-SSA and M-AMP schemes versus K.We can observe that our LSP-GFMA has the lowest AFP among three schemes.In the E-SSA scheme, the activated UEs adopt the same ZCS pilot with different delay, and the SAP can distinguish them by different delays.The number of different delays is equal to the length of ZCS pilot, which leads to the pilot collision probability of E-SSA is higher than our LSP-GFMA, and thereby the AFP of E-SSA is about 2.5 times higher than our LSP-GFMA scheme.In addition, the M-AMP scheme utilizes the sparsity to reduce the correlation between pilots based on compressed sensing algorithm.The shorter pilot leads to higher correlation [8], and the correlation of adopted pilot set in the M-AMP scheme is higher than that of the LSP-GFMA scheme when pilot length is 336 or 168.Consequently, the M-AMP scheme has a slightly higher AFP than our LSP-GFMA scheme.Besides, the computational complexity of M-AMP scheme is O (K/p a • L), where our LSP-GFMA scheme is O JK 2 .Fig. 9(a) compares the achievable throughput of three pilots with different number of active UEs.Recalling that the AFP of ZCS is mainly deteriorated by the pilot collision due to its limited size of pilot set, while the AFP of LPS and mZCS Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
are mainly affected by the decoding failure probability, which are both limited by the interference from other UEs.Then, the achievable throughput of LPS is higher than that of ZCS and mZCS when the pilot sequence length is almost the same.Finally, we compare the achievable throughput of our LSP-GFMA scheme with E-SSA and M-AMP schemes versus K with different length of data B as shown in Fig. 9(b).Similarly, the proposed LSP-GFMA scheme has the largest achievable throughput among three schemes, because the LSP-GFMA scheme adopts superimposed pilot structure, which does not take up additional time-frequency resources.

VI. CONCLUSION
In this paper, we proposed an LSP-GFMA scheme with innovative designed LPS for uplink mMTC in S-IoT.We constructed the LPS set with M orthogonal rLPS by utilizing a perfect sequence and Hadamard matrix, and the activated UEs could randomly select and left shift the rLPS to obtain sLPS and significantly alleviate pilot collision.The pilot collision probability of our LSP-GFMA scheme with LPS is M times lower than that of conventional ZCS.We proved that sLPS has lower average auto-correlation and cross-correlation of O(1/L) than mZCS of O(1/ √ L), and derived the accurate theoretical expression of AFP and achievable throughput of the LSP-GFMA scheme.Moreover, we proposed a joint ICEK and SIC decoding algorithm for obtaining accurate CSI.Simulation results validated the accuracy of our derivations, and demonstrated that the decoding failure probability of LPS is about 60 times lower than that of mZCS when K ≥ 50.Furthermore, the LSP-GFMA scheme with LPS design could achieve lower AFP than ZCS and mZCS with the increasing number of activated UEs K, which indicated that LPS has a great advantage in practical massive access.We also demonstrated that our LSP-GFMA scheme can achieve lower AFP and higher achievable throughput than the existing M-AMP and E-SSA schemes.

APPENDIX A DERIVATION OF PDF OF INTERFERENCE FROM LOW-CORRELATION UES UNDER IMPERFECT SIC
Based on Eq. ( 31), we have ρ = η under the selected parameters.The interference I from low-correlation UEs can be expressed as: where ρ j = µ p T i,τ i p * j,τ j ∥pi,τ i ∥ 2 when 0 ≤ j < K h , and ρ j = p T i,τ i p * j,τ j ∥pi,τ i ∥ 2 when K h < j ≤ K n − 1.
Let Z j = ρ j X j and I = Kn−1 j=0 Z j .We can derive the PDF of Z j as: The Laplace transform of Z j is as follows: Since {Z 0 , Z 1 , . . ., Z Kn−1 } are independent, then we have The function F I (s) is a proper rational function, which can be represented as a sum of partial fractions in the following manner: where the constants c j are given as follows:

Fig. 2 .
Fig.2.In the procedure of joint ICEK and SIC decoding algorithm, after initialization, the decoded data by SIC decoder is reconstructed to re-estimate the CSI.

Fig. 8 (
a) compares the AFP of three pilots with different SNR under different length of pilots with K = 50.The AFP of LPS is the lowest among three pilots under all SNRs when K = 50, which indicates that the LSP-GFMA scheme with LPS has a great advantage in practical massive access.When

TABLE I SIMULATION
PARAMETERSto the γ th .Under the same SNR, ZCS has the lowest decoding failure probability, because the uncollision UEs with ZCS are zero cross-correlation to each other.Due to the low cross-correlation O (1/L) of LPS, it has about 23 times lower decoding failure probability than mZCS.To find the most appropriate γ th , we define G ξ =