Low-Complexity Decoder for Overloaded Uniquely Decodable Synchronous CDMA

We consider the problem of designing a low-complexity decoder for antipodal uniquely decodable (UD)/errorless code sets for overloaded synchronous code-division multiple access (CDMA) systems, where the number of signals K^a_max is the largest known for the given code length L. In our complexity analysis, we show that, compared to the maximum-likelihood (ML) decoder, which has an exponential computational complexity even for moderate code lengths, the proposed decoder has a quasi-quadratic computational complexity. Simulation results in terms of bit-error-rate (BER) demonstrate that the proposed decoder suffers only a 1-2 dB degradation in signal-to-noise ratio (SNR) at a BER of 10^-3 when compared to ML. Moreover, we derive the minimum Manhattan distance of such UD codes and provide the proofs for the propositions; these proofs constitute the foundation of the formal proof of the maximum number of users K^a_max for L = 8.


In the last decade, wireless communication services have experienced explosive growth while communication technologies have progressed generation by generation. In the previous generations, spanning from 1G to 4G, the multiple access schemes were mostly characterized by orthogonal multiple access (OMA) techniques, where users are assigned orthogonal resources in either frequency (frequency-division multiple access (FDMA)), time (time-division multiple access (TDMA)) or code (code-division multiple access (CDMA)). CDMA [1] was the basic technology for 3G and for some 2G (IS-95) networks. High spectral- and power-efficiency, massive connectivity and low latency are among the requirements for next-generation communications, and these requirements are expected to increase in the future as researchers turn their efforts towards sixth generation (6G) wireless communications. Enhanced mobile broadband (eMBB), ultra-reliable low-latency communications (uRLLC) and massive machine-type communication (mMTC) support a suite of compelling applications driving these requirements. Massive multiple-input multiple-output (MIMO), non-orthogonal multiple access (NOMA) and millimeter-wave (mmWave) communications constitute promising techniques for addressing these stringent requirements [2].
Supporting a large number of users communicating over a common channel may not be readily achievable by OMA techniques due to the multiple-access interference (MAI) in rank-deficient systems, where the number of users is greater than that of the resource blocks. To meet the demand for increased bandwidth efficiency in synchronous CDMA, a CDMA concept was introduced in [3], which can support many more users for a given code length compared to traditional CDMA. A number of signature design schemes have been studied for dense spreading in conventional CDMA, where low cross-correlation sequence sets are designed to minimize the overall MAI, which allows more users to simultaneously access the common channel. This in turn results in increased spectral efficiency. Finding suitable spreading codes and decoding schemes for such overloaded systems is a challenging optimization problem. To address these challenges, numerous non-uniquely decodable (non-UD) [3]-[10] and UD [11]-[35] construction-based code sets have been proposed. Examples of such non-UD code sets are pseudo-noise spreading (PN) [4], [5], orthogonal/orthogonal CDMA (O/O) [6], [7], PN/orthogonal CDMA (PN/O) [3], multiple-orthogonal CDMA (MO) [8], and improved O/O CDMA [9]. Those codes employ two or more sets of orthogonal signal waveforms, which allows the system to accommodate more users than the signature length L. As a consequence, a significant level of MAI exists at the output of each user's matched filter due to the non-zero cross-correlation of different signatures.
Low cross-correlation sequence sets might not be the best criterion for highly rank-deficient systems. One important criterion in such rank-deficient systems is for the code set to be UD. By definition, UD codes are those for which the data of different users can be unambiguously decoded in a noiseless channel using linear recursive decoders [31]. Low-complexity linear decoders were introduced for these UD code sets using either binary {0, 1}, antipodal {±1}, or alternatively ternary {0, ±1} chips in [27]-[30], [32]-[35]. On the other hand, Lu et al. [36] proposed M-ary code sets for the multiple-access adder channel. The UD code set framework has been adopted in various applications, such as the multi-way physical-layer network coding conceived in [33], [37].
All of these multiple access concepts were introduced in order to serve a number of excess users beyond the available resources. These multiple access schemes are characterized as NOMA techniques [38] in 5G and beyond wireless communications. Recently, several NOMA solutions have been actively investigated [2], which can be basically divided into two main categories, namely power-domain and code-domain NOMA. Among the strong contenders of code-domain NOMA are low-density spreading aided CDMA (LDS-CDMA) [39], [40] and sparse code multiple access (SCMA) [41], [42]. LDS-CDMA [39], [40] must generally be guaranteed to form a UD code set [43], which implies a nonzero minimum Euclidean distance. The design of LDS-type matrices offers flexible resource allocation, handles the MAI that exists in rank-deficient systems better, and admits lower-complexity receivers compared to conventional CDMA.
Unlike dense UD code sets, the LDS-CDMA structure can be represented by a factor graph, so the classic message-passing algorithm (MPA) can be employed for its detection. Because UD code sets were originally designed for adder channels, generally only a simple noiseless detection scheme is developed for them. In practice, however, the wireless transmission channel exhibits, among other things, selective fading, multipath and the near-far problem, which leads to unequal received power among users. Consequently, if synchronization and channel equalization are compensated for, low-complexity detectors can be applied to wireless channels.
Inspired by these attractive features of UD code sets, this paper investigates UD codes for synchronous uplink NOMA. In the case of dispersive fading channels, we can potentially equalize the channel effect by using channel precoding. Therefore, we consider developing a low-complexity detector for the UD code sets proposed in [31] for overloaded synchronous CDMA systems. It is widely recognized that the complexity of an optimal detector grows exponentially with the number of users, which prohibits its practical implementation. Various suboptimal low-complexity detection techniques have already been proposed. These suboptimal approaches can be classified into two categories: linear and non-linear multiuser detectors. Linear multiuser detectors include, among others, the matched filter (MF), minimum mean-square error (MMSE), and zero-forcing (ZF) detectors. In a non-linear subtractive interference cancellation detector, the interference is first estimated and then subtracted from the received signal before detection. The cancellation process can be carried out either successively (SIC) [44] or in parallel (PIC) [45]-[47]. In non-linear iterative detectors [48]-[52], probabilistic data association (PDA) [53] aims to suppress the MAI in each iteration in order to improve the overall error performance. Suboptimal polynomial-time detectors based on the geometric approach are studied in [54], [55].
In general, linear as well as non-linear detectors cannot separate users in overloaded systems even in the case of asymptotically vanishing noise. Therefore, the spreading codes must have the property that decoding can achieve asymptotically zero probability of error in multiuser detection when the signal-to-noise ratio (SNR) becomes arbitrarily large. The UD class of codes, which guarantees "errorless" communication in an ideal (noiseless) synchronous CDMA/code-division multiplexing (CDM) system, also shows good performance in the presence of noise.
Finding the overloaded UD class of codes for the noiseless channel is directly related to the coin-weighing problem, one of the problems discussed by Erdős and Rényi in [11]. It can be considered a special case of a general problem whose objects the authors in [12], [13], [16], [25], [26] refer to as detecting matrices. Lindström in [15] defines the same problem as finding a detecting set of vectors. Given an integer q ≥ 2 and a finite alphabet M of rational integers, let v_k for 1 ≤ k ≤ K be L-dimensional (column) vectors with all components from M such that the q^K sums Σ_{k=1}^{K} u_k v_k, with u_k ∈ {0, ..., q − 1}, are all distinct; then {v_1, ..., v_K} is a detecting set of vectors. Let F_q(L) be the maximal number of L-dimensional vectors and f_q(K) be the minimal vector length needed to form a detecting matrix for a given length L and number of vectors K. The problem of determining f_q(K) in the special case q = 2, M = {0, 1} can be equivalently expressed as a coin-weighing problem: what is the minimal number of weighings on an accurate scale needed to determine all false coins in a set of K coins, when the choice of coins for a weighing must not depend on the results of previous weighings? This problem was first introduced by Söderberg and Shapiro [12] for K = 5. The minimal number of weighings, L, has been found only for a few values of K in [17]. However, Lindström gives an explicit construction of L × γ(L + 1) binary (alphabet {0, 1}) and L × (γ(L) + 1) antipodal (alphabet {±1}) detecting matrices [14], where γ(L) is the number of ones in the binary expansions of all positive integers less than L; as an example, γ(8) = 12. He also proved that these constructions are asymptotically optimal with respect to the information-theoretic lower bound.
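The function γ(·) is straightforward to compute; a small sketch (ours, for illustration) confirms γ(8) = 12 and, via K^a_max = γ(L) + 1, the value K^a_max = 13 for L = 8 that is proved later in the paper:

```python
def gamma(L):
    # gamma(L): total number of ones in the binary expansions
    # of all positive integers less than L
    return sum(bin(k).count("1") for k in range(1, L))

print(gamma(8))      # -> 12 (ones in the expansions of 1..7)
print(gamma(8) + 1)  # -> 13, the antipodal K^a_max for L = 8
print(gamma(9))      # -> 13, the binary K^b_max for L = 8
```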
Cantor and Mills [16] constructed a class of 2^i × (i + 2)2^{i−1} ternary (alphabet {0, ±1}) detecting matrices for i ∈ Z^+, which provides a corresponding lower bound in the case of M = {0, ±1}. In the literature, most explicit constructions of UD code sets are recursive [18]-[35]. To the best of our knowledge, the maximum numbers of vectors of the explicit constructions of binary, antipodal and ternary code sets are K^b_max = γ(L + 1), K^a_max = γ(L) + 1 and K^t_max = (i + 2)2^{i−1}, as shown in Table 1, Table 2 and Table 3, respectively. Several authors have proposed decoders with linear complexity in noiseless scenarios for the explicit constructions in which the detecting matrix has the largest known number of vectors, K_max.
For noisy channels, a class of antipodal code sequences for overloaded CDM systems with simplified two-stage maximum-likelihood (ML) detection was recently proposed in [28]. In addition, other overloaded matrices over the ternary alphabet are introduced in [29] with a low-complexity decoding algorithm. Similarly, in [30] the authors propose overloaded code sets over the ternary alphabet that have a twin-tree structured cross-correlation hierarchy and can be decoded with a simple multi-stage detector. Yet another construction of ternary codes, which increases the number of columns, K, of UD codes for a given length compared to those proposed in [29] and [30] and admits a low-complexity polynomial-time decoder, is proposed in [32]. The primary reason such low-complexity decoders exist is that the code sets are constructed under certain criteria, which entails lowering the maximum number of users to K < K_max, as shown in Table 3.
Apart from binary, antipodal and ternary UD spreading codes, higher-alphabet k-ary spreading codes were studied by Lu et al. [56] for the multiple-access adder channel.
In this work, for the first time, we consider the problem of designing a low-complexity decoder, with complexity O(LK log_2(K)), for the UD code sets in [31] having the maximum number of users K^a_max. The code sets presented in [31] are also recursive and make use of a linear map between vector spaces and Galois field extensions. These UD code sets are one possible construction among all possible distinct UD code sets, shown in Table 2. Simulation results in terms of bit-error-rate (BER) demonstrate that the proposed decoder exhibits a degradation of only 1-2 dB in SNR compared to the ML decoder at a BER of 10^-3.
Our contributions are summarized as follows: (1) We present the proofs of the important Propositions 1-4 that were presented for the first time in [31]. These propositions are in fact broken-down versions of the unique decodability (UD) property. (2) Moreover, for the first time, based on Propositions 1-4, we formally prove that the maximum number of users for the given L = 8 is K^a_max = 13. (3) The minimum Manhattan distance is proved to be 4 for the recursive UD code sets in [31]. (4) We develop a low-complexity decoder with complexity O(LK log_2(K)) for the UD code sets in [31] having the maximum number of users K^a_max. (5) We compute the complexity of the deterministic noiseless detection algorithm (NDA) presented in [31] and perform a complexity analysis of the proposed fast (low-complexity) detection algorithm (FDA). The rest of the paper is organized as follows. The proofs of Propositions 1-4 and the proof of the maximum number of users K^a_max for the case of L = 8 are presented in Sections II and III, respectively, followed by the minimum Manhattan distance of the code sets in [31] in Section IV. A detailed discussion of the FDA is presented in Section V. The complexity analysis of both the NDA and FDA algorithms is presented in Section VI. After illustrating simulation results in Section VII, conclusions are drawn in Section VIII.
The following notation is used in this paper. Boldface lower-case letters indicate column vectors and boldface upper-case letters indicate matrices; (·)^T denotes the transpose operation; mod denotes the modulo operation; sgn denotes the sign function; and |·| denotes the cardinality of a set.

II. PROOF OF PROPOSITIONS
In order to facilitate the development of the proof, it is beneficial to recall the UD code set construction in [31]. The Sylvester-Hadamard matrix of order 2 is

H_2 = [ 1  1 ; 1  -1 ],

and of order 2^{p+1}, for p = 1, 2, ..., it is

H_{2^{p+1}} = [ H_{2^p}  H_{2^p} ; H_{2^p}  -H_{2^p} ].

Then, for any p = 1, 2, ..., H_{2^p} H_{2^p} = 2^p I_{2^p×2^p}, where I_{N×N} is the N × N identity matrix. The vectors h_0, ..., h_3, a_0, ..., a_3 and their negations, under a suitable binary operation ∗, form a finite group (G, ∗). There exists an isomorphism ϕ, shown in Table 4, from G to the finite additive Abelian group (F_{2^4}, +) of the extended Galois field F_{2^4}; in other words, G is isomorphic to (F_{2^4}, +), i.e., (G, ∗) ≅ (F_{2^4}, +). From linear algebra we know that there are isomorphisms from the finite additive group (F_{p^n}, +) to the vector space (F_p^n, +) and to Z_p^n, that is, (F_{p^n}, +) ≅ (F_p^n, +) ≅ Z_p^n [57]. Table 4 shows the mapping of the vectors h_0, ..., h_3, a_0, ..., a_3 and their negated forms to elements of F_{2^4} with primitive polynomial α^4 + α + 1 = 0, where α is the primitive element of the extended Galois field GF(2^4). Notice carefully that the operation of the finite group G is ∗, whereas that of the finite additive group F_{2^4} is +.
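The recursive Sylvester construction above can be sketched in a few lines (an illustrative sketch of ours, not the code of [31]); it also verifies the property H_{2^p} H_{2^p} = 2^p I:

```python
def sylvester_hadamard(n):
    # build the Sylvester-Hadamard matrix of order n (n a power of 2)
    # via the block recursion H_{2N} = [[H_N, H_N], [H_N, -H_N]]
    H = [[1]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-x for x in row] for row in H]
    return H

H8 = sylvester_hadamard(8)
# since H8 is symmetric, H8 * H8 = H8 * H8^T = 8 * I
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in H8] for r1 in H8]
```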

With the above formulation, the columns of V_L are mapped through F_ϕ to field elements; as an example, for V_8 the mapped vectors can be written as two-dimensional vectors over F_{2^4}. In order to prove Propositions 1-4, first presented in [31], we establish several claims. One property of the Hadamard matrix is the following: if we replace the 1's in the Hadamard matrix with 0's and the -1's with 1's, we obtain L - 1 binary Hadamard channel codes, since the first column results in the zero vector [58].

Let us look at the problem and assume that Null(C) ⊆ Z, where Z ⊂ {0, ±1}^{K×1} excludes the trivial case {0}^{K×1}, and let z ∈ Z. Then the nullspace of C can be formulated as in (4). If we assume z_1 = 0 and (4) holds for some z_2, then (4) can be expressed in terms of V_L only, as in (5). Claim 4. If V_L does not satisfy (4) and (5), then multiplying any column of V_L by -1 will still not satisfy (4) and (5).
The authors in [31] showed by exhaustive search that, when L = 4, the candidate matrices are 4 × N matrices having at most one ±1 entry in each column and 0's elsewhere, except when N = 4. The generalization of Claims 1-6 formalizes Propositions 1-4, and hence the proofs are complete.

III. MAXIMUM NUMBER OF USERS FOR L = 8
For the case when L = 8, we prove that the maximum number of columns we can append to H_8 is K^a_max - L = 5. Note that all proposed UD, in other words one-to-one, matrix constructions in the literature form a subset C' ⊂ C̄, where C̄ is the set of all possible antipodal UD code sets for a given L. In order to prove the maximum possible number of vectors K^a_max, we should look at all possible V and count how many structures of V hit any of the forbidden lattice points H_m z_1 and how many do not. If, for a given number of columns k, the number of V's that hit a forbidden lattice point equals the total number of possible V vector sets, then we know that the maximum number of columns of V that avoid all forbidden lattice points must be smaller than k.
First, we transform antipodal vectors into polynomials with integer coefficients. These polynomials represent the row locations and the numbers of -1's and +1's in any antipodal vector v ∈ {±1}^{m×1} of dimension m. Each vector v_j is mapped to the corresponding polynomials G^j_n(x) and G^j_p(x), representing the -1 and +1 entries of v_j, respectively. As an example, for m = 4, the antipodal vector v_10 = [1, -1, 1, -1]^T is mapped to the polynomials G^10_n(x) = x^1 + x^3 and G^10_p(x) = x^0 + x^2. Observe that for any antipodal vector v_j, the polynomials add up as G^j_n(x) + G^j_p(x) = Σ_{i=0}^{m-1} x^i. Let σ^j_n(m) and σ^j_p(m) denote the evaluations of G^j_n(x) and G^j_p(x) at x = m, where m is the dimension of the vector v_j. By setting the x-axis and y-axis to be σ^j_n(m) and σ^j_p(m), we can build Λ ⊆ Z^2 lattice points, since the evaluations σ^j_n(m) and σ^j_p(m) for each antipodal vector v_j are integers. Taking the above example of antipodal vectors of dimension m = 4, the equivalent integer lattice points are shown in Table 6.
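The mapping from an antipodal vector to its pair of polynomials, and then to an integer lattice point via evaluation at x = m, can be sketched as follows (illustrative code of ours; v_10 is the example vector from the text):

```python
def exponents(v):
    # E_n: (0-indexed) row positions of the -1 entries; E_p: positions of the +1 entries
    E_n = [i for i, x in enumerate(v) if x == -1]
    E_p = [i for i, x in enumerate(v) if x == +1]
    return E_n, E_p

def sigma(E, m):
    # evaluate G(x) = sum_{e in E} x^e at x = m
    return sum(m ** e for e in E)

v10 = [1, -1, 1, -1]                    # example from the text, m = 4
E_n, E_p = exponents(v10)               # E_n = [1, 3], E_p = [0, 2]
point = (sigma(E_n, 4), sigma(E_p, 4))  # integer lattice point in Z^2
```

Since the exponents are smaller than m and the coefficients are 0/1, the evaluation at x = m is injective, so distinct polynomial pairs land on distinct lattice points.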
As a reminder, for the antipodal UD code set the goal is to construct V with the maximum number of columns, K^a_max - L, such that Λ_H ∩ Λ_V = ∅. Fig. 1 demonstrates a hit, in other words a case where the intersection is not empty. It can be shown that every possible combination of two columns of V, of which there are (8 choose 2), hits at least one lattice point of Λ_H, which means that Λ_H ∩ Λ_V ≠ ∅ for k = 2. Hence, |V| = 1 for the case of m = 4. Using this approach, we can construct the sublattices Λ_H and Λ_V for the case of m = 8 and find the upper bound on k such that Λ_H ∩ Λ_V = ∅. This creates a framework for searching for the solution using geometric combinatorics (e.g., Minkowski sums and the Minkowski geometry of numbers [59]) and partitions and decompositions of Λ into equivalence classes formed by sublattices and their cosets, in order to prove the maximum K^a_max we can have for a given m and to determine how to generate the vectors in V. The polynomials G^j_n(x) and G^j_p(x) can also be represented by their exponent multisets; in the above example, E^10_n = {1, 3} and E^10_p = {0, 2}. Therefore, our problem of preventing non-trivial combinations of V from hitting any forbidden lattice point becomes a problem over multisets: the sums, under the multiset union ⊎, of the multisets E^t_p or E^t_n of V, where the integer 1 ≤ t ≤ 2^m - 1 indexes the antipodal vectors, must avoid the forbidden multisets. Obviously, there are 3^m forbidden multisets of H_m, and we do not want any of the 3^k - 1 multisets of V, excluding the trivial case, to hit any of the forbidden multisets. We can use multiset partition theories, study bipolar vectors by their E_p and E_n representations, and prove the maximal number k.
Note that in our matrix construction design the v_i's are distinct and not equal to any of the columns of H_m or -H_m. If, however, some v_i ∈ ±H_m, then the multiset of v_i hits a forbidden multiset of H_m; such v_i vectors can never be included in the vector set of a UD code. Additionally, any v_i can be replaced by -v_i without violating the unique decodability property (4), since all possible combinations of vectors of V, including -v_i, still avoid the forbidden multisets. In other words, if [H_m V] ∈ C̄, so is [H_m V] with any number of columns of V or H_m multiplied by -1.
One way to approach this problem is to classify all bipolar vectors v into groups and then use the inclusion-exclusion principle. There are 2^m vectors v in total; accounting for the sign symmetry and excluding the columns of ±H_m narrows our design to distinct vector sets only, and the total number of such distinct v_i's to be considered in our V design is 2^{m-1} - m. We need to construct V from distinct vectors v ∈ B^+_m such that they do not hit any forbidden multisets.
Therefore, the total possible number of V sets with k columns is C(2^{m-1} - m, k). Out of this total, only some V sets satisfy the UD (one-to-one) condition when appended to H_m. If none of the possible V's satisfies the one-to-one condition, then k exceeds the maximum number of columns that can be added to H_m. Hence, we want to count how many combinations of k vectors v ∈ B^+_m hit the forbidden multisets. If this count equals the total number of V sets, then we know that k is not achievable. Counting this number will help us prove the maximum number of columns k.
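For m = 8 and k = 2 these counts are easy to evaluate (a small sketch of ours using the formula from the text):

```python
from math import comb

m, k = 8, 2
candidates = 2 ** (m - 1) - m       # |B+_m|: distinct candidate vectors
total_sets = comb(candidates, k)    # number of possible V sets with k columns

print(candidates, total_sets)  # -> 120 7140
```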
We classify B + m into different groups, so that any combinations of vectors in similar group hits the forbidden multisets and therefore, such v that belong to the same group must be avoided in our design.
In our example, for m = 8 and k = 2, we classify B^+_8 into groups and count how many V's do not satisfy the UD (one-to-one) condition out of the C(120, 2) = 7140 possible two-column sets. We know that the total number of v's with |E_n| = 1 is C(8, 1) = 8 and the total number with |E_n| = 3 is C(8, 3) = 56. Here is how we divide the 56 vectors with |E_n| = 3 and the 8 vectors with |E_n| = 1 into groups A_i, 1 ≤ i ≤ 8.
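The class sizes, including the 28 vectors with |E_n| = 2 and the 70/2 - 7 = 28 with |E_n| = 4 used below, can be verified by brute force: enumerate all ±1 vectors of length 8, keep one representative per {v, -v} pair under the normalization |E_n| ≤ 4, and discard the columns of ±H_8 (an illustrative sketch of ours):

```python
from itertools import product

def sylvester_hadamard(n):
    # Sylvester-Hadamard matrix via the block recursion
    H = [[1]]
    while len(H) < n:
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

m = 8
H = sylvester_hadamard(m)
h_cols = {tuple(H[r][c] for r in range(m)) for c in range(m)}
forbidden = h_cols | {tuple(-x for x in c) for c in h_cols}  # columns of +-H_8

reps = set()
for v in product([1, -1], repeat=m):
    neg = v.count(-1)
    if neg < m - neg:
        rep = v                         # fewer -1's: keep v itself
    elif neg > m - neg:
        rep = tuple(-x for x in v)      # otherwise keep -v
    else:
        rep = min(v, tuple(-x for x in v))  # canonical choice when |E_n| = 4
    if rep not in forbidden:
        reps.add(rep)

dist = {}
for v in reps:
    dist[v.count(-1)] = dist.get(v.count(-1), 0) + 1
print(len(reps), sorted(dist.items()))  # -> 120 [(1, 8), (2, 28), (3, 56), (4, 28)]
```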
Equivalently, we can write them in multiset form. We can prove that the hitting condition holds only if t_1 = {t_1 | a_{t_1} ∈ A_h} and t_2 = {t_2 | a_{t_2} ∈ A_h} for the same h ∈ {1, ..., 8}, and does not hold if t_1 = {t_1 | a_{t_1} ∈ A_{h_1}} and t_2 = {t_2 | a_{t_2} ∈ A_{h_2}}, where h_1 ≠ h_2. Therefore, we can conclude that we can append two vectors v_1 ∈ A_j and v_2 ∈ A_i to H_m, which do not hit any forbidden multisets, by choosing the two vectors from different groups, 1 ≤ i ≠ j ≤ 8.
Here is how we divide the 28 vectors with |E_n| = 2 into groups D_i, 1 ≤ i ≤ 7.
Equivalently, we can write them in multiset form. It can be proved that the hitting condition holds only if t_1 = {t_1 | d_{t_1} ∈ D_h} and t_2 = {t_2 | d_{t_2} ∈ D_h} for the same h, and does not hold if t_1 = {t_1 | d_{t_1} ∈ D_{h_1}} and t_2 = {t_2 | d_{t_2} ∈ D_{h_2}}, where h_1 ≠ h_2. Therefore, we can append two vectors v_1 ∈ D_j and v_2 ∈ D_i to H_m, which do not hit any forbidden multisets, by choosing the two vectors from different groups, 1 ≤ i ≠ j ≤ 7. Analogously, for the groups F_i the hitting condition holds only if t_1 = {t_1 | f_{t_1} ∈ F_h} and t_2 = {t_2 | f_{t_2} ∈ F_h} for the same h ∈ {1, ..., 8}, and does not hold if t_1 = {t_1 | f_{t_1} ∈ F_{h_1}} and t_2 = {t_2 | f_{t_2} ∈ F_{h_2}}, where h_1 ≠ h_2; therefore, we can append two vectors v_1 ∈ F_j and v_2 ∈ F_i to H_m, which do not hit any forbidden multisets, by choosing the two vectors from different groups. Here is how we divide the 70/2 - 7 = 28 vectors with |E_n| = 4 into groups G_i, 1 ≤ i ≤ 7.
Equivalently, we can write them in multiset form. It can be proved that the hitting condition holds only if t_1 = {t_1 | g_{t_1} ∈ G_h} and t_2 = {t_2 | g_{t_2} ∈ G_h} for the same h ∈ {1, ..., 7}, and does not hold if t_1 = {t_1 | g_{t_1} ∈ G_{h_1}} and t_2 = {t_2 | g_{t_2} ∈ G_{h_2}}, where h_1 ≠ h_2. Therefore, we can append two vectors v_1 ∈ G_j and v_2 ∈ G_i to H_m, which do not hit any forbidden multisets, by choosing the two vectors from different groups, 1 ≤ i ≠ j ≤ 7. Note that the A_i groups can be constructed from combinations of the A_i's and D_i's; likewise, the G_i groups can be constructed from different combinations of the A_i's or D_i's groups. Now, let us count how many vector sets V = [v_{j_1} v_{j_2}] hit forbidden multisets. We can easily count them after classifying B^+_8 into groups as discussed above. Hence, the total number of vector sets that hit forbidden multisets is 308, and there are no other combinations of two vectors in B^+_8 that can hit the forbidden multisets. Since our computed number of combinations that hit the forbidden multisets is less than the total number of two-vector combination sets, 308 < 7140, we can claim that the maximum number of vectors that can be added to H_8 satisfies (K^a_max - L) ≥ 2. This method of classifying B^+_8 into groups not only helps us prove the maximum number of vectors but also shows how to construct vector sets that possess the unique decodability property (4).
A similar computation can be carried out for the cases k = 3, 4, 5, where the number of hitting combinations is still less than the total number of k-vector combination sets. As an example, we present two such combinations, V_1 and V_2, for the case of k = 5. In both construction examples V_1 and V_2, we cannot find any combination of 2 vectors that belong to the same A_i, D_i, F_i or G_i group. Therefore, all possible combinations of 2 vectors avoid all forbidden multisets.
However, once we add any other vector v_6 ∈ B^+_8 to the above sets, some combinations of the resulting vector sets hit one of the forbidden multisets. This means that for k = 6 the computed number of combinations that hit the forbidden multisets is exactly equal to the total number of combination sets; the admissible k = 5 group patterns are of the forms (A, D, D, G, G) and (A, D, G, G, G). We take each combination and, using the rules developed in [31], we show in Appendix B that we cannot add any more columns from any group.

IV. MINIMUM DISTANCE OF CODE SETS
The Manhattan distance [60], equivalently the ℓ1-norm distance, of two L-dimensional vectors y_i and y_j for i ≠ j is defined as d(y_i, y_j) = Σ_{l=1}^{L} |y_{i,l} - y_{j,l}|, where |·| denotes the complex amplitude. Then the general minimum Manhattan distance of received vectors for a given antipodal code set can be formulated as d_min(C) = min_{x_i ≠ x_j} d(C x_i, C x_j). Now that we have proved that δ(C) = 4, we will try to find d_min(C) of our proposed UD code sets C ∈ C' ⊂ C̄, where C̄ ⊂ {±1}^{L×K} is the set of all antipodal UD code sets. Let us follow the option of having all zeros but two non-zero elements in the difference vector x̃. In the case of L = 4, the columns c_1 and c_5 differ in the first element only. If x̃ = [2, 0, 0, 0, -2]^T, then the difference vector is ỹ = [4, 0, 0, 0]^T. Note that even if we substitute c_1 by -c_1, we can still obtain the same ỹ with x̃ = [2, 0, 0, 0, 2]^T. Based on our construction in [31], we look at the case of L = 8. Observe that all the elements of the 9-th and 12-th columns, [α^13, 0]^T and [α^13, α^13]^T, of C are equal except the 5-th element, in which they differ. If we select x_{n,9} = -x_{m,9}, x_{n,12} = -x_{m,12}, and x_{n,i} = x_{m,i} for all i ∉ {9, 12}, then y_{n,5} = 2 and y_{m,5} = -2, or y_{n,5} = -2 and y_{m,5} = 2, which results in d_L(y_n, y_m) = 4. With this specific observation, together with Theorem 1, we conclude that d_min(C) = 4. From this observation, we learn that if any two columns of a UD code set differ in exactly one element, then d_min(C) = 4. (Not only are the columns required to be distinct; we assume that multiplying any column by minus one must also result in distinct columns.) Similarly, for L = 16, the 17-th and 27-th columns, [α^13, 0, α^13, 0]^T and [α^13, 0, 0, 0]^T, differ in one element only. Due to our recursive construction in [31], for L = 2^p, where p = 5, 6, ..., columns p2^{p-2} + 3 and (p - 1)2^{p-1} + 3 differ in one element only. Therefore, all the UD code sets generated in [31] have d_min(C) = 4.
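The L = 4 case can be checked exhaustively. The fifth column below is our assumption: from the statement that c_1 and c_5 differ only in the first element, with c_1 the all-ones column of H_4, we take c_5 = [-1, 1, 1, 1]^T; the brute-force search over all 2^5 data vectors then returns a minimum Manhattan distance of 4:

```python
from itertools import product

# columns of the assumed L = 4, K = 5 code set: the four columns of H_4 plus c_5
cols = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1), (-1, 1, 1, 1)]

def encode(x):
    # y = C x: sum the signature columns weighted by the user bits
    return tuple(sum(c[r] * xi for c, xi in zip(cols, x)) for r in range(4))

def manhattan(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

points = [encode(x) for x in product([1, -1], repeat=5)]
assert len(set(points)) == 32  # all 2^5 sums are distinct: the set is UD
d_min = min(manhattan(p, q) for i, p in enumerate(points) for q in points[i + 1:])
print(d_min)  # -> 4
```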

V. FAST DECODING ALGORITHM IN AWGN
The recursive linear NDA decoder discussed in [31] is not suitable for a noisy transmission channel. Let the received vector in the presence of noise be formulated as

y = A C x + n,   (16)

where A denotes the amplitude, c_k ∈ {±1}^{L×1} are the signature columns of C for 1 ≤ k ≤ K, x ∈ {±1}^{K×1} is the user data and n is the additive white Gaussian noise (AWGN) vector with variance σ^2. The objective of the receiver is as follows: recover the user data x̂ given the received vector y in (16) and C, so that the mean square error E{||x - x̂||^2} is minimized. The ML solution is given by

x̂ = arg min_{x ∈ {±1}^{K×1}} ||y - A C x||^2.   (17)

It is widely recognized that obtaining the ML solution is generally NP-hard [61]. Our detection problem, where the overloaded signature matrix has the UD structure, can be solved efficiently if there is a function that maps y → ȳ ∈ Λ ⊂ N^{L×1}, where Λ is a Z-module of rank L. Therefore, it is equivalent to finding the closest vector point in the lattice Λ, ȳ = arg min_{λ ∈ Λ} ||y - λ||. Given ȳ, one of the points in Λ generated by C, we can obtain x̂ unambiguously (uniquely) by applying the NDA [31], since C satisfies the unique decodability criteria. However, there is no known polynomial-time algorithm to solve for ȳ from a given y. Without loss of generality, we design our low-complexity decoding algorithm for code sets generated by the seed matrix V_8. This does not imply that our proposed decoder cannot be applied to other recursive UD code sets, such as those in [14], [21], [23]; a slight modification may be required depending on the given C matrix. We present our proposed low-complexity FDA for C_L, L = 2^i, where i ∈ {2, 3, ...}, as portrayed in Table 7. To quickly summarize, the FDA iteratively estimates the number and positions of -1's in x̂ from the received values in y. Those estimates are updated as the FDA compares each value of the vector y against quantized levels.
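As a concrete illustration of (16) and (17), an exhaustive ML detector for a toy L = 4, K = 5 system can be written directly (a sketch of ours; the fifth signature column is an assumption consistent with Section IV, and we take A = 1):

```python
from itertools import product

# assumed columns: the four columns of H_4 plus c_5 = [-1, 1, 1, 1]^T
cols = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1), (-1, 1, 1, 1)]

def encode(x):
    # noiseless received vector y = C x (amplitude A = 1)
    return tuple(sum(c[r] * xi for c, xi in zip(cols, x)) for r in range(4))

def ml_decode(y):
    # exhaustive ML search over all 2^K hypotheses, Eq. (17)
    return min(product([1, -1], repeat=5),
               key=lambda x: sum((yr - cr) ** 2 for yr, cr in zip(y, encode(x))))

x = (1, -1, 1, 1, -1)
noise = (0.2, -0.3, 0.1, 0.15)
y = tuple(s + n for s, n in zip(encode(x), noise))
print(ml_decode(y) == x)  # -> True
```

The 2^K-fold search is exactly what makes ML impractical for larger K and motivates the proposed FDA.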
Those quantized levels are computed based on the information from the previous rows and the current values of the vector y.

Merge (meP) function. Inputs: dP, m, n, K, r_c, m_LR, mP; outputs: dPart, m (the complete pseudocode listing is given in Table 7).

The FDA attempts to find the lattice point generated by the matrix C that is closest to the received vector y ∈ R^{L×1}, starting from the estimate x̂. This is achieved by quantizing each row of the vector y to obtain z ∈ N^{L×1} such that z is a valid lattice point. In order to demonstrate how the FDA works, it is beneficial to describe each function in detail. The quantizer Q : R → N, z_1 = Q(y_1, -K, K, 2), maps a received real value to one of the constellation points in {±K, ±(K - 2), ...}, where t_-, t_+ and s are the input parameters for the minimum, maximum and step-size values, respectively, and i ∈ N is the internal value that decides the quantization level. Furthermore, let the integer n and the vector m denote the number of -1's and the locations where the -1's occur in x̂, respectively. For the trivial case when z_1 = K or z_1 = -K, the algorithm outputs the decision vector x̂ without proceeding to the next steps. Otherwise, it initializes the vector m ← 1_K, the index r_c ← 1 and n ← (K - z_1)/2.
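The quantizer described above can be sketched as follows (our illustration of Q, with t_-, t_+ and s as in the text):

```python
import math

def Q(y, t_minus, t_plus, s):
    # map the real value y to the nearest level in {t_minus, t_minus + s, ..., t_plus},
    # clipping to the end levels outside the range
    i = math.floor((y - t_minus) / s + 0.5)      # index of the nearest level
    i = max(0, min(i, (t_plus - t_minus) // s))  # clip the index into range
    return t_minus + i * s

# e.g. with K = 5 the levels for z_1 = Q(y_1, -K, K, 2) are {-5, -3, -1, 1, 3, 5}
print(Q(3.2, -5, 5, 2), Q(-7.5, -5, 5, 2), Q(0.9, -5, 5, 2))  # -> 3 -5 1
```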
Then the value of n is recorded in the table m_LR at row r_c = 1 in the column n_L. The table m_LR keeps track of the values l_L, l_R, n_L and n_R for each row, where l_L and l_R are defined as the numbers of +1's and -1's in the corresponding row of the code set C, and n_L and n_R are the numbers of -1's of the estimate x̂ that correspond to the locations of the +1's and -1's of that row of the code set.
For example, the m_LR table for the code set in Fig. 6 can be constructed row by row, where the last two columns, presented as dots, are filled in during the further steps of the algorithm, whereas mP(r_c) holds the actual column indices of the +1's and -1's in row r_c; the matrix mP for the code set in Fig. 6 is defined accordingly. The algorithm proceeds by partitioning each row r_c and saving in dP(r_c) the estimated number of -1's of the vector x̂, n', the partition size, K', and the numbers of +1's and -1's, l_L and l_R, of the specific partition. In Step 7, the adaptive parameter c_AL, the stopping-iteration flag s_I, and the repetition count c_T are initialized (the algorithm can repeat Steps 10 to 16 up to a maximum of N_c times, since the number of repetitions depends directly on the noise variance). At each row r_c, dP(r_c) is updated by calling the meP() function. The function meP(dP(r_c - 1), m, n, K, r_c, m_LR, mP) scans each partition of row r_c with the updated values, and if it finds that one or more partitions have completely identified the exact locations of the -1's, it skips partitioning them further. A_- and A_+ are the minimum and maximum values calculated for the given partitions at each row.

MaxT function
Input: dP
1: A+ ← 0
2: for i ← 1 to len(dP)
3:   A+ ← A+ + maxF(dP(i, 1), dP(i, 4), dP(i, 2))
Output: A+

where minF(n, L, K) = 2|n − L| − K and maxF(n, R, K) = −2|n − R| + K. In line 13 of the FDA, we define y′ = y(rc) + 2 sgn(y(rc) − z(rc)) cAL(rc), where cAL is an integer vector that is incremented by one in Step 19 if the estimated z is not one of the lattice vertices generated by C. In line 18, we scan the rows from 1 to L to find the first rc where z differs from the estimated lattice vertex. The function uM(m, mLR, rc, mP) updates m with the given updated parameters. For the case of a Rayleigh fading channel instead of the AWGN in (16), the proposed FDA is still applicable to perform detection; however, the channel coefficients for each user k must be known at the receiver side. For frequency-selective fading channels, we can employ the transmitter precoding scheme proposed by Fantuz and D'Amours, detailed in [62], to overcome the multipath channel effect. Briefly, this transmit precoding scheme exploits knowledge of the channel impulse response to transform the multipath channel into a single-path non-dispersive channel, which is equivalent to a non-dispersive Rayleigh fading channel model.
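The minF/maxF bounds and their MinT/MaxT accumulations transcribe directly; in the sketch below we assume each entry of dP carries the tuple (n, K, lL, lR) in that order (the field order is our assumption based on the indices used in the listing):

```python
def min_f(n, l, K):
    # minF(n, L, K) = 2|n - L| - K
    return 2 * abs(n - l) - K

def max_f(n, r, K):
    # maxF(n, R, K) = -2|n - R| + K
    return -2 * abs(n - r) + K

def max_t(partitions):
    """Sum maxF over all partitions; each partition is (n, K, l_L, l_R)."""
    return sum(max_f(n, l_R, K) for (n, K, l_L, l_R) in partitions)

def min_t(partitions):
    """Sum minF over all partitions."""
    return sum(min_f(n, l_L, K) for (n, K, l_L, l_R) in partitions)

# Example with two partitions: A- = MinT(dP), A+ = MaxT(dP)
parts = [(1, 3, 2, 1), (0, 2, 1, 1)]
A_minus, A_plus = min_t(parts), max_t(parts)
```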

VI. COMPLEXITY ANALYSIS
In this section, we analyze the complexity of the proposed NDA and FDA algorithms. The NDA decoder for noiseless transmission channels, discussed in [31], deciphers the data of all users at the receiver in a recursive manner; at each step, it performs additions, comparisons and multiplications to decipher the users' bits. The NDA is deterministic, with an exact number of execution steps. After L/8 execution steps it calls itself recursively on two smaller vectors composed of the upper and lower L/2 elements of the received vector. For that reason, to compute the complexity of the algorithm we first break it into two blocks, B1 (Steps 1 to 8) and B2 (Steps 9 to 12). The complexity when L = 8 is N1 + 2N4, since the NDA does not execute block B2. When L = 16, the complexity of the NDA is 4N1 + N2 + 4N4. It follows that for L > 8 the complexity of the recursive NDA can be represented accordingly, where L = 2^i for i ∈ {4, 5, ...}. Based on this calculation, we conclude that the complexity of the NDA algorithm is O(L log2(L)). As one would expect, the complexity of the decoder in noisy channels is much higher than in noiseless channels. However, the proposed FDA decoder of Section V is no more complex than the NDA in terms of big-O notation. It is important to state that the proposed FDA requires neither matrix inversion nor decomposition, but only additions, comparisons and multiplications. The algorithm goes through each row of the received vector to decode one or more users. The best case for the FDA is to satisfy the condition in Step 2, with a complexity of at most K comparisons, Ncomp = K. We note that, unlike the deterministic NDA, the FDA has an element of randomness in its execution steps; therefore, to compute the overall complexity of the FDA we consider the worst case.
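To make the O(L log2(L)) behaviour concrete, the sketch below unrolls the divide-by-two recursion; the per-level cost model a·L and the base cost at L = 8 are our simplifying assumptions, not the paper's exact operation counts N1, N2, N4:

```python
def nda_cost(L, a=1, base=1):
    """Illustrative NDA cost model: T(8) = base, T(L) = 2*T(L/2) + a*L for L > 8."""
    if L <= 8:
        return base
    # two recursive calls on the upper and lower halves, plus linear work
    return 2 * nda_cost(L // 2, a, base) + a * L

# Doubling L slightly more than doubles the cost, consistent with L*log2(L) growth.
costs = {L: nda_cost(L) for L in (8, 16, 32, 64)}
```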
In Step 11, the complexity of meP can be shown to be 6K additions and (1 + log2(K))6K + 2 comparisons. The complexity of Steps 12 and 13 is 6 additions and K comparisons. Similarly, the complexity of Steps 14 and 15 is 4 additions. The complexity of the uM method is 2K additions and 2K + 6 comparisons. The execution steps from 11 to 16 are repeated L times. Finally, in Step 17 the complexity is (2 + K)L + O(K) additions and LK multiplications. Considering all the components, the total complexity can be shown to be Nadd = 9LK + 12L + O(LK), Ncomp = 6LK log2(K) + 9LK + 8L and Nmult = LK. Since the higher-order term of the comparisons, Ncomp, dominates the additions and multiplications, the overall complexity is O(LK log2(K)). We note that in both the NDA and FDA algorithms we do not count assignments in our complexity computations. The complexity of the FDA is somewhat larger than that of the NDA, but much lower than those of the MMSE-PIC, slab-sphere, PDA and ML decoders, which are O(LK^2), O(LK^2), O(L^2 K^2) and O(2^K), respectively, as shown in Table 8. The exact counts for the cases L = 4, L = 8 and L = 16 are given in Table 9. In Step 18, the FDA verifies whether the L-dimensional lattice point generated by the estimated vector m equals the vector z. If not, the algorithm increases cAL by one and repeats Steps 11 to 16. Clearly, the FDA executes an exact number of steps if no noise vector n is present, and the number of times it repeats Steps 11 to 16 depends only on the noise variance σ^2.
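The leading-order counts above are easy to tabulate; the sketch below uses only the closed-form terms given in the text (dropping the O(·) remainders) and contrasts Ncomp with the 2^K hypothesis count of an exhaustive ML search:

```python
import math

def fda_counts(L, K):
    """Leading-order FDA operation counts from the text (O(.) remainders dropped)."""
    n_add = 9 * L * K + 12 * L
    n_comp = 6 * L * K * math.log2(K) + 9 * L * K + 8 * L
    n_mult = L * K
    return n_add, n_comp, n_mult

# For L = 16, K = 33 the dominant term N_comp is roughly 2 * 10^4 comparisons,
# while an exhaustive ML search would consider 2^33 > 8 * 10^9 hypotheses.
n_add, n_comp, n_mult = fda_counts(16, 33)
```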

VOLUME 5, 2016
Therefore, we show the numbers of additions, comparisons and multiplications in Figs. 2, 3 and 4, varying σ^2 in terms of Eb/N0 in our simulations. For the case of L = 4 in Fig. 2, the numbers of additions, comparisons and multiplications do not depend on the variance, owing to the small overload factor of 5/4. However, for L = 8 and L = 16 they stay high for up to 10 dB in Eb/N0 and then drop to constant values at around 20 dB in Eb/N0, as shown in Figs. 3 and 4, respectively.

VII. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed antipodal UD code sequences generated by the seed matrix (20), which are portrayed in Figs. 6, 7 and 5. In our simulations, we compare the FDA with the MMSE-PIC [40], slab-sphere [63], PDA [53] and ML detectors for binary phase-shift keying (BPSK) modulation; the results are shown in Figs. 8, 9 and 10.
[Figure: UD code set C with L = 4 and K = 5.]

The BER performance of the UD code sets is averaged over the different users for C4×5, C8×13 and C16×33, respectively. The performance of the proposed FDA is comparable to that of ML, as shown in Fig. 8. For larger values of L, the FDA has slightly inferior BER performance compared to ML. However, in practice the Eb/N0 at a BER of 10^−3 is considered the operating threshold, since channel coding can then be applied to reach a BER as low as 10^−6. At a BER of 10^−3, the FDA achieves 1 dB and 4 dB gains compared to the slab-sphere detector, and 4 dB and 15 dB gains compared to the PDA, as shown in Figs. 9 and 10, respectively. As L increases, the overload factor increases exponentially, which in turn degrades the linear separability of the overall system; linear detectors such as the PDA rely on the linear separability criterion for the property of vanishing BER as the channel noise goes to zero. Notice that we omitted the discussion of the MMSE-PIC detector due to its overall poor BER performance in comparison to the other detectors. Table 8 shows the computational complexity of all the detectors. Even though the performance of the proposed FDA is slightly worse than that of the ML detector, it has much lower complexity than ML.

VIII. CONCLUSION
In this paper, we introduced a novel fast (low-complexity) decoder algorithm (FDA) for antipodal uniquely decodable (UD) code sets. The proposed algorithm has a much lower computational complexity than the maximum-likelihood (ML) decoder, whose complexity can be prohibitive for even moderate code lengths. Simulation results show that the performance of the proposed decoder is almost as good as that of the ML decoder, with only a 1-2 dB SNR degradation at a BER of 10^−3. Moreover, we proved the minimum Manhattan distance of the UD codes proposed in [31], as well as a number of propositions that collectively serve as the foundation of the formal proof of the maximum number of users, Kamax, for the case of L = 8. In our future research, we will conceive multiuser detection for higher-order constellations for transmission over dispersive fading channels.

APPENDIX A PROOF OF THE CONVERSION FROM C A TO C B
The proof of the conversion from antipodal overloaded UD code sets to binary UD code sets, coined optical CDMA code sets in [27], is presented next.

Theorem 2.
If there is an antipodal UD code set C^a ∈ {±1}^{L×K}, then there is an equivalent binary UD code set C^b ∈ {0, 1}^{L×K}.

Proof. Suppose there is an antipodal UD code set C^a_{L×K}. By the corollary, multiplying rows or columns by −1 as needed, we can assume that the entries of the first row of C^a_{L×K} are all 1s. Define the conversion to the binary matrix as C^b_{L×K} = (C^a_{L×K} + J)/2, where J is the L × K all-ones matrix. It is clear that C^b_{L×K} ∈ {0, 1}^{L×K}; therefore, we now need to prove that Null(C^b_{L×K}) ∩ {0, ±1}^{K×1} = {0}^{K×1}, where Null(·) denotes the nullspace of a matrix. Assume that C^b_{L×K} z = 0_L, which yields (C^a_{L×K} + J)z = 0_L and thus C^a_{L×K} z = −Jz, where z ∈ {0, ±1}^{K×1}. Since the entries of the first row of C^a_{L×K}, as well as those of the matrix J, are all 1s, the first entry of C^a_{L×K} z must equal the first entry of −Jz; this is only possible if the first entry of −Jz is 0. Thus −Jz = 0_L, and hence C^a_{L×K} z = 0_L. Since the UD code set C^a_{L×K} satisfies the unique decodability condition (4), the expression C^a_{L×K} z = 0_L implies that z = 0_K.
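Theorem 2's construction is easy to check numerically for small code sets. The brute-force verifier below is our illustration (the 2 × 2 toy code is not one of the paper's code sets); it tests the nullspace condition Null(C) ∩ {0, ±1}^K = {0} by enumeration:

```python
import itertools

def to_binary(Ca):
    # C^b = (C^a + J)/2 maps {-1, +1} entries to {0, 1}
    return [[(c + 1) // 2 for c in row] for row in Ca]

def is_uniquely_decodable(C):
    """Brute-force check that Cz = 0 has no nonzero solution z in {0, ±1}^K."""
    K = len(C[0])
    for z in itertools.product((-1, 0, 1), repeat=K):
        if any(z) and all(sum(c * x for c, x in zip(row, z)) == 0 for row in C):
            return False
    return True

# Toy antipodal code whose first row is all ones, and its binary image
Ca = [[1, 1],
      [1, -1]]
Cb = to_binary(Ca)
print(is_uniquely_decodable(Ca), is_uniquely_decodable(Cb))  # True True
```

The enumeration is exponential in K, so this check is only practical for small toy examples, which is all the theorem's illustration requires.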
However, in general it is not necessarily true that a binary (optical) code set C^b of arbitrary size can be converted to a UD code set C^a: for a given binary UD code set there may be no equivalent antipodal UD code set C^a. The reason is that Kamax = γ(L) + 1 and, obviously, γ(L + 1) ≥ γ(L) + 1, which holds with equality.

APPENDIX B PROOF OF ALL POSSIBLE COMBINATIONS

Combination 1 (A, A, A, D, G): Together with the existing G we have seven G's; this completes all seven different G's from the seven groups, hence we cannot add any other G to the existing combination. The three A's with D produce six distinct A's, and the A + A + A = A and A + A + A + D = A rules produce another two distinct A's; therefore, we cannot add any more A to the existing combinations. Now the three A's produce three distinct D's, and the three A's with D, by A + A + D = D, produce another three distinct D's, which together with the D in the combination makes a total of seven D's. Therefore, we cannot add any more D's to the combination. Hence, we have proved that we cannot add any more A, D or G to (A, A, A, D, G).

Combination 2 (A, A, D, D, D): This group combination is such that the distinct (D, D, D) produce C(3,2) = 3 distinct G's by the D + D = G rule, and those G's must differ from the one created by the (A, A) A + A = G rule. Also, each A must differ from the three distinct A's created by the D + A = A rule. Each existing A with the C(3,2) = 3 distinct D + D + A = A sums produces six distinct A's, which together with the two existing A's makes a total of eight A's. This completes all distinct A's, and we cannot add any more A to the combination. From the existing combination there are four distinct D's produced by the A + A + D = D and D + D + D = D rules, which together with the three existing D's makes a total of seven distinct D's; hence, we cannot add any more D to the combination. We have already seen that with the existing combination we create four distinct G's, and with (A, A, D) we can create three more G's by the A + A + D = G rule. This tells us that we cannot add any more G's. Therefore, we have proved that we cannot add any more A, D or G to (A, A, D, D, D).

Combination 4 (A, A, D, G, G): The combination produces distinct A's which, together with the existing A's, make a total of eight A's; therefore, we cannot add any more A to the combination. There are six distinct D's produced by the A + A = D, G + D = D, D + G + G = D and A + A + G = D rules, which together with the existing D makes a total of seven D's; hence, we cannot add any more D to the combination. Similarly, for the case of G, the combination produces five distinct G's by the A + A = G, G + G = G and G + A + A = G rules, which together with the existing G's makes a total of seven G's; hence, we cannot add any more G to the combination. Therefore, we have proved that we cannot add any more A, D or G to (A, A, D, G, G).
Combination 5 (A, D, D, D, D): This group combination is such that all D's are distinct and no D + D = D + D is satisfied. The combination produces seven distinct A's by the A + D = A and A + D + D = A rules, which together with the existing A makes a total of eight A's; hence, we cannot add any more A to the combination. There are three distinct D's produced by the D + D + D = D rules, which together with the existing D's makes a total of seven D's; hence, we cannot add any more D to the combination. Similarly, for the case of G, the combination produces seven distinct G's by the D + D = G and D + D + D + D = G rules, for a total of seven G's; hence, we cannot add any more G to the combination. Therefore, we have proved that we cannot add any more A, D or G to (A, D, D, D, D).

Combination 6 (A, D, D, D, G): This group combination is such that the different D's created by the D + G = D and D + D + D = D rules are distinct from the existing D's, and the different G's created by the D + D = G and A + D + D = G rules are distinct from the existing G's. The combination produces seven distinct A's by the A + D = A, A + G = A and A + D + G = A rules, which together with the existing A makes a total of eight A's; hence, we cannot add any more A to the combination. There are four distinct D's produced by the D + G = D and D + D + D + G = D rules, which together with the three existing D's makes a total of seven D's; hence, we cannot add any more D to the combination. Similarly, for the case of G, the combination produces six distinct G's by the D + D = G and D + D + G = G rules, which together with the existing G makes a total of seven G's; hence, we cannot add any more G to the combination. Therefore, we have proved that we cannot add any more A, D or G to (A, D, D, D, G).

Combination 7 (A, D, D, G, G): This group combination is such that the different D's created by the D + G = D rules are distinct from the existing D's, and the different G's created by the D + D = G rules are distinct from the existing G's. The combination produces seven distinct A's by the A + D = A, A + G = A and A + D + D + G = A rules, which together with the existing A makes a total of eight A's; hence, we cannot add any more A to the combination. There are five distinct D's produced by the D + G = D and D + G + G = D rules, which together with the two existing D's makes a total of seven D's; hence, we cannot add any more D to the combination. Similarly, for the case of G, the combination produces five distinct G's by the D + D = G, G + G = G, D + D + G = G and D + G + G = G rules, which together with the two existing G's makes a total of seven G's; hence, we cannot add any more G to the combination. Therefore, we have proved that we cannot add any more A, D or G to (A, D, D, G, G).

Combination 8 (A, D, G, G, G): This group combination is such that the different D's created by the D + G = D rules are distinct from the existing D's, and the different G's created by the G + G = G rules are distinct from the existing G's. The combination produces seven distinct A's by the A + D = A, A + G = A, A + G + G = A and A + D + G = A rules, which together with the existing A makes a total of eight A's; hence, we cannot add any more A to the combination. There are six distinct D's produced by the D + G = D, D + G + G = D and D + G + G + G = D rules, which together with the existing D makes a total of seven D's; hence, we cannot add any more D to the combination. Similarly, for the case of G, the combination produces four distinct G's by the G + G = G and G + G + G = G rules, which together with the three existing G's makes a total of seven G's; hence, we cannot add any more G to the combination. Therefore, we have proved that we cannot add any more A, D or G to (A, D, G, G, G).