Information-Optimum Approximate Message Passing for Quantized Massive MIMO Detection

We propose an information-optimum approximate message passing (AMP) detector for quantized massive multi-input multi-output (MIMO) signal detection. A well-known strategy for realizing low-complexity and high-accuracy massive multi-user detection (MUD) is AMP-based belief propagation (BP). However, when internal operations are conducted with double-precision arithmetic, large memory occupancy and severe processing delay are inevitable in an actual massive MIMO implementation. To address this issue, we replace all operations with simple look-up table (LUT) searches in which all messages exchanged between iterations are unsigned integers. That is, the proposed signal detection is performed using only simple integer arithmetic. The LUT is designed offline using an information-bottleneck (IB) method, and the probability distribution of messages at each iteration step, which is required for determining the quantization thresholds, is tracked by discrete density evolution (DDE). Computer simulations demonstrate the validity of the IB LUT-based AMP in terms of bit error rate (BER) performance and memory occupancy. The proposed method allows the AMP detector to be quantized with fewer bits while maintaining performance similar to that of a typical AMP with double precision.


I. INTRODUCTION
Compared to typical small-scale multi-antenna systems, massive multi-input multi-output (MIMO), where the base station (BS) is equipped with a massive number of antenna elements, promises significant improvements in spectral efficiency, detection reliability, and energy efficiency [1]- [3]. Increasing the number of BS antenna elements leads to high spatial resolution. This makes it possible to serve a massive amount of wireless links in the same time-frequency resource simultaneously, thereby allowing massive connectivity in uplink multi-user MIMO (MU-MIMO) scenarios. However, high-dimensional signal separation, i.e., large-scale multi-user detection (MUD), raises the computational cost and circuit scale at the receiver considerably [3], [4].
Maximum likelihood detection (MLD) is infeasible in large-scale MUD because of the prohibitive computational burden involved. As low-complexity MUD solutions, linear spatial filters based on least square (LS) and linear minimum mean square error (MMSE) criteria are often utilized while sacrificing optimal detection capability [3].
(The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.)
To improve detection reliability without requiring high computational cost, a message passing (MP) algorithm based on belief propagation (BP) has been investigated [3], [5]-[11]. The BP-based signal detection can take advantage of the law of large numbers to reduce the marginalization burden and simplify computations of the conditional symbol expectation. The most well-known BP-based detector is approximate message passing (AMP) [5], which is systematically derived from a strict approximation of Gaussian belief propagation (GaBP) [10], [11] in the large-system limit that occurs when the input and output dimensions, $M$ and $N$ respectively, increase to infinity for a given compression rate $\rho = N/M$. For AMP, a rigorous proof of convergence to the Bayes-optimal solution was presented under independent and identically distributed (i.i.d.) Gaussian measurement matrices [6]. These algorithms can achieve a computational complexity order of $O(MN)$ for each iteration process because they comprise only scalar-wise tensor product operations. It is difficult to further reduce the computational complexity by simplifying the signal processing. Instead, the present study focuses on an approach to avoid internal detection operations with double-precision arithmetic for practical hardware implementation. All signal detection operations are replaced by simple look-up table (LUT) searches such that iterative detection is performed with only simple integer arithmetic. This makes it possible to reduce memory usage and computational cost significantly, resulting in low-latency signal processing for low-cost receiver design. In this case, the most vital issue is determining how to design the LUT with a limited number of quantization bits without severely sacrificing detection reliability.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The most notable method for this approach is information-optimum discrete signal processing [12]-[16], where LUTs are designed using an information-bottleneck (IB) method [17], which is a more generic framework with roots in machine learning and information theory [18]. The idea is to preserve as much information on the variable of interest as possible. Thus, the maximum possible relevant mutual information (MI) needs to be preserved at each iteration step. References [12]-[14] apply the IB method and the discrete density evolution (DDE) scheme [19]-[21] to log-likelihood ratio (LLR) beliefs in the low-density parity-check (LDPC) decoder. Inspired by these methods, we design a novel LUT-based iterative MUD via AMP.
To apply the IB method used in the quantization of LDPC decoding to the LUT design of AMP, there are some difficulties to overcome. First, unlike LDPC decoding, the beliefs exchanged in the AMP-based MUD fluctuate because of the stochastic behavior of random MIMO channels. Intuitively, the LUT should be constructed adaptively according to the instantaneous channel state information (CSI). However, a CSI-dependent LUT has to be updated every time the CSI changes, which can lead to a catastrophic processing delay. To remove the dependency of LUTs on the CSI, we focus on the statistical stability of beliefs in AMP under large-system conditions. Because of strong channel hardening effects [3] and the well-averaged AMP rule, CSI-independent LUTs can be realized.
Second, it is difficult to track the probability distribution of beliefs using DDE. In the IB method, the discrete numbers are linked to the variable of interest via a joint probability distribution, and its evolution via iterative processing is tracked by DDE. In LDPC decoding, both the check-nodes (CNs) and variable-nodes (VNs) refer to binary values as original referencing values. In this case, the joint probability is simple to track in quantized discrete decoding because the number of possible combinations is well-limited. However, in the AMP-based MUD, the original values referred to in the function-nodes (FNs) change continuously and dynamically because of channel coefficients, which implies that the number of possible combinations is infinite. Under this condition, the joint probability distribution is not calculable, and the DDE loop does not close. To tackle this issue, we extend the typical binary-class DDE to multi-class, where the probability mass function (PMF) of quantized values is computed based on the probability density function (PDF) of the corresponding continuous values, unlike in the original binary DDE. This is possible because the PDF of beliefs in AMP can be approximated by the Gaussian distribution with high accuracy in large-system conditions.
Finally, severe performance degradation is inevitable because of the quantization errors even if IB-based quantization is performed faithfully based on the typical AMP algorithm. However, it is difficult to capture the behavior of such non-linear errors in iterative processing. Therefore, we introduce an adjusting parameter to suppress the negative effect of quantization errors. It is possible to modify the average variance of beliefs in AMP by designing the parameter as an equivalent variance of quantization noise.
Given the above, the contributions of this paper are summarized as follows.
• CSI-independent LUT-based AMP for quantized massive MIMO detection, where the LUTs are designed offline using the IB and DDE methods based on long-term channel statistics, is proposed.
• To overcome the limitation of the DDE caused by the continuity of the channel coefficients, the quantization methods in [12]-[14] are adapted to MUD with the aid of large-system approximations used in the AMP-based signal recovery.
• Adjusting parameters are introduced to suppress the negative effect of inevitable quantization errors and are designed by computer simulations.
We emphasize that, to the best of the authors' knowledge, a LUT-based MUD mechanism based on the IB method and DDE scheme for massive MIMO systems, which is the key contribution of this paper, has not been proposed yet.
A. RELATED WORKS
As described above, the proposed detector is given by the LUT-based finite-precision MP algorithm. One way of dealing with such a coarse quantization of iterative processing is the finite-alphabet iterative decoding (FAID) [22], [23], where the LUTs are hand-optimized to approximate the actual arithmetic operations as accurately as possible. The FAID approach is restricted to regular LDPC codes, but the resulting quantized decoding was shown to achieve performance competitive with typical decoding on a binary symmetric channel (BSC), despite coarse quantization.
The IB method is a fundamentally different way to learn the LUTs in an unsupervised manner [24], where its discrete input-output relationship is designed for maximizing the MI between the quantized message and the corresponding coded bit, and the origin of this approach dates back to [20], [25]. In this line of work, the IB-based decoders for regular LDPC codes were proposed in [12], [16] and found to yield improved decoding capability in comparison with the aforementioned counterparts with coarse quantization. In [26], the implementation results show that a similar LUT-based decoder delivers throughput up to 0.588-Tb/s with high energy and area efficiency. The IB-based decoders for irregular codes were presented in [15], and the extension to protograph-based raptor-like (PBRL) LDPC codes [27] enabling inherent rate-compatible decoding was proposed in [28]. Besides the above, in [29], the information-optimum decoding for polar codes was presented.
While the finite-precision MP algorithm in the context of channel decoding has been actively investigated, to the best of our knowledge, such a quantized MP algorithm for MUD has not been presented yet in the literature. Therefore, the framework of the LUT-based AMP approach is expected to offer a new angle on related research topics.
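The rest of this paper is organized as follows: Section II presents the Bayes-optimal AMP algorithm as an equivalent real-valued representation of a complex-valued MU-MIMO system. Section III presents the LUT structure to realize a significant reduction of the required memory occupancy. Section IV proposes CSI-independent LUT construction using IB LUT. In Section V, computer simulations are conducted to characterize the validity of the proposed methods in terms of bit error rate (BER) and memory occupancy. Finally, Section VI concludes the paper with a brief summary.

II. SYSTEM MODEL
Let $\mathbf{x}^c = [x^c_1, \ldots, x^c_m, \ldots, x^c_{M'}]^T$ and $\mathbf{y}^c = [y^c_1, \ldots, y^c_n, \ldots, y^c_{N'}]^T \in \mathbb{C}^{N' \times 1}$ denote TX and RX symbol vectors, respectively. Assuming flat Rayleigh fading channels, the RX vector $\mathbf{y}^c$ is given by
$$\mathbf{y}^c = \mathbf{H}^c \mathbf{x}^c + \mathbf{z}^c, \tag{1}$$
where $\mathbf{H}^c \in \mathbb{C}^{N' \times M'}$ is an $N' \times M'$ MIMO channel matrix. The element in the $n$-th row and $m$-th column, $h^c_{n,m}$, obeys $\mathcal{CN}(0, \phi_H)$, and $\mathbf{z}^c$ is a complex additive white Gaussian noise (AWGN) vector whose entries obey $\mathcal{CN}(0, N_0)$, where $N_0$ is the noise spectral density; the covariance matrix is given by $\mathbb{E}[\mathbf{z}^c (\mathbf{z}^c)^H] = N_0 \mathbf{I}_{N'}$. The abovementioned complex-valued model can be rewritten using a double-size equivalent real-valued model as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{z}, \tag{2}$$
where we define $M = 2M'$ and $N = 2N'$,
$$\mathbf{y} = \begin{bmatrix} \Re\{\mathbf{y}^c\} \\ \Im\{\mathbf{y}^c\} \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} \Re\{\mathbf{x}^c\} \\ \Im\{\mathbf{x}^c\} \end{bmatrix}, \quad \mathbf{z} = \begin{bmatrix} \Re\{\mathbf{z}^c\} \\ \Im\{\mathbf{z}^c\} \end{bmatrix}, \quad \mathbf{H} = \begin{bmatrix} \Re\{\mathbf{H}^c\} & -\Im\{\mathbf{H}^c\} \\ \Im\{\mathbf{H}^c\} & \Re\{\mathbf{H}^c\} \end{bmatrix}.$$
The $m$-th symbol $x_m$ in $\mathbf{x}$ represents one of $Q$ $(= \sqrt{Q'})$ constellation points $\mathcal{X} = \{\chi_1, \ldots, \chi_q, \ldots, \chi_Q\}$ whose entries are amplitudes of the real and imaginary components of $\mathcal{X}^c$, e.g., $\mathcal{X} = \{\pm\sqrt{E_s/2}\}$ in QPSK signaling. The $n$-th entry of $\mathbf{y}$ is expressed as
$$y_n = \mathbf{h}_n \mathbf{x} + z_n = \sum_{m=1}^{M} h_{n,m} x_m + z_n,$$
where $\mathbf{h}_n = [h_{n,1}, \ldots, h_{n,m}, \ldots, h_{n,M}]$ is the $n$-th row vector of $\mathbf{H}$, and $h_{n,m}$ obeys $\mathcal{N}(0, \phi_H/2)$. The noise vector $\mathbf{z}$ is a real-valued AWGN vector whose entries obey $\mathcal{N}(0, N_0/2)$. Using the equivalent real-valued model in (2), the LUT design can be significantly simplified.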
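The conversion from the complex-valued model to the double-size real-valued model can be checked numerically. The following is a minimal sketch, assuming the usual stacking of real and imaginary parts; the helper names (`to_real_model`, `matvec`, `cgauss`) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import random

def to_real_model(Hc, yc):
    """Stack real/imaginary parts: H = [[Re, -Im], [Im, Re]], y = [Re(yc); Im(yc)]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in Hc]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in Hc]
    y = [v.real for v in yc] + [v.imag for v in yc]
    return top + bottom, y

def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def cgauss(var):
    """Draw one circularly symmetric complex Gaussian sample with variance var."""
    s = (var / 2) ** 0.5
    return complex(random.gauss(0.0, s), random.gauss(0.0, s))

random.seed(0)
N_c, M_c = 3, 2                 # complex dimensions (toy sizes, assumed)
phi_H, Es, N0 = 1.0, 2.0, 0.1   # illustrative parameter values
amp = (Es / 2) ** 0.5           # per-dimension QPSK amplitude

Hc = [[cgauss(phi_H) for _ in range(M_c)] for _ in range(N_c)]
xc = [complex(random.choice([-amp, amp]), random.choice([-amp, amp]))
      for _ in range(M_c)]
zc = [cgauss(N0) for _ in range(N_c)]
yc = [sum(h * x for h, x in zip(row, xc)) + z for row, z in zip(Hc, zc)]

H, y = to_real_model(Hc, yc)
x = [v.real for v in xc] + [v.imag for v in xc]
z = [v.real for v in zc] + [v.imag for v in zc]
# Real-valued model evaluated directly; it should reproduce the stacked y.
y_check = [a + b for a, b in zip(matvec(H, x), z)]
```

Each real-valued channel entry then has variance `phi_H / 2` and each noise entry has variance `N0 / 2`, matching the statistics stated above.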

C. BAYES-OPTIMAL AMP
As a preliminary step, the typical AMP algorithm [5] is presented using (2). The LUT designs, which will be explained in this paper, are conducted based on the mathematical manipulations provided in this subsection. Let $K$ denote the number of iterations in the AMP. For each variable, $\cdot^{(k)}$ indicates the corresponding variable at the $k$-th iteration step. Fig. 1 shows a BP process at each iteration of the AMP [5]. In the FNs, the soft interference cancellation (IC) is performed on each RX symbol using the soft symbols generated in the previous iteration process. The output belief of the soft IC is approximately modeled as an i.i.d. scalar Gaussian signal as a result of the central limit theorem (CLT), and its variance is also computed. In VNs, the beliefs from FNs are combined to compute the joint belief. The soft symbol used in the next iteration process is generated by the Bayes-optimal denoiser; further, its variance is computed.
The pseudo-code is provided in Algorithm 1. For ease of notation, the equation in the $i$-th line of Alg. 1 is denoted by (A1-$i$). In the AMP, the variances of beliefs computed in FNs and VNs, $\psi^{(k)}$ and $\bar{\phi}^{(k)}$, are respectively averaged in the large-system limit. Furthermore, the harmful effect of self-feedback because of the loopy propagation of beliefs is averaged by $\frac{M}{2}\phi_H\bar{\phi}$ in (A1-6); this term is referred to as the Onsager term [5]. Owing to this well-approximated message passing rule, the statistical behavior of beliefs is not largely dependent on instantaneous channel fluctuations; this makes it possible to construct CSI-independent LUTs.
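To make the FN/VN processing concrete, the following is a minimal double-precision sketch of a textbook Bayes-optimal AMP recursion for the real-valued model with two-point symbols $\{\pm a\}$. It is an assumption-laden illustration and is not claimed to match Alg. 1 line by line: the function name `amp_detect`, the normalization by `N * var_h`, and the toy parameter values are all hypothetical.

```python
import math
import random

def amp_detect(H, y, amp, var_h, var_z, iters=16):
    """Textbook AMP sketch for y = Hx + z with x_m in {-amp, +amp}."""
    N, M = len(H), len(H[0])
    x_hat = [0.0] * M        # soft symbol replicas
    phi_bar = amp * amp      # average variance of the soft symbols
    resid_prev, psi_prev = [0.0] * N, None
    gain = N * var_h         # approximate column energy of H
    for _ in range(iters):
        # FN: soft interference cancellation plus the Onsager correction
        onsager = 0.0 if psi_prev is None else M * var_h * phi_bar / psi_prev
        resid = [y[n] - sum(H[n][m] * x_hat[m] for m in range(M))
                 + onsager * resid_prev[n] for n in range(N)]
        psi = M * var_h * phi_bar + var_z   # averaged residual variance
        # VN: belief combining via a normalized matched filter
        belief = [x_hat[m] + sum(H[n][m] * resid[n] for n in range(N)) / gain
                  for m in range(M)]
        tau = psi / gain                    # effective noise variance of a belief
        # Bayes-optimal denoiser for equiprobable +/-amp symbols
        x_hat = [amp * math.tanh(amp * r / tau) for r in belief]
        phi_bar = sum(amp * amp - v * v for v in x_hat) / M
        resid_prev, psi_prev = resid, psi
    return x_hat

random.seed(1)
N, M = 48, 16                       # toy real-valued dimensions (assumed)
var_h, var_z, amp = 0.5, 0.05, 1.0  # phi_H / 2, N_0 / 2, sqrt(E_s / 2)
H = [[random.gauss(0.0, math.sqrt(var_h)) for _ in range(M)] for _ in range(N)]
x = [random.choice([-amp, amp]) for _ in range(M)]
y = [sum(H[n][m] * x[m] for m in range(M)) + random.gauss(0.0, math.sqrt(var_z))
     for n in range(N)]
x_hat = amp_detect(H, y, amp, var_h, var_z)
errors = sum((v >= 0) != (s > 0) for v, s in zip(x_hat, x))
```

Note that the Onsager coefficient and the residual variance depend only on averaged quantities (`phi_bar`, `psi`), not on individual channel entries, which is the property that the CSI-independent LUT design relies on.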

D. QUANTIZATION
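To clarify how quantization is conducted with predetermined thresholds, we show the quantization process of an arbitrary variable $a \in \mathbb{R}$ using Fig. 2. In this case, the quantization function assigns $a$ the quantization level $u$ whose interval contains it, i.e., $\hat{a}[u-1] < a \le \hat{a}[u]$, where $\hat{a}[u]$ is the $u$-th quantization threshold and we define $\hat{a}[0] = -\infty$ as the left-sided threshold for the quantization level $u = 1$. Here, we introduce the representative value $\dot{a}[u]$ of the $u$-th level, for which $\hat{a}[u-1] < \dot{a}[u] \le \hat{a}[u]$ holds. Hereafter, we present two criteria for determining the quantization threshold: the maximal entropy criterion and the IB-based maximal MI criterion.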

1) MAXIMAL ENTROPY CRITERION
In the maximal entropy criterion, a is quantized such that each quantization level holds equal probability mass, which leads to well-known equiprobable quantization (EPQ) [30]. Because of large-system approximation, the probability distribution of beliefs in Alg. 1 can be regarded as a Gaussian distribution. Therefore, the quantization threshold can be easily determined based on the CDF of Gaussian distribution, namely the error function erf(·). This criterion is similar to the FAID approach [22], [23] in that it mimics the results of each arithmetic operation as accurately as possible.
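As an illustration of the equiprobable rule, the thresholds of a zero-mean Gaussian variable can be obtained from the inverse Gaussian CDF. The sketch below uses only the Python standard library; the function names and the 1-based level convention are assumptions made for this example.

```python
from bisect import bisect_left
from statistics import NormalDist

def epq_thresholds(variance, bits):
    """Interior thresholds splitting N(0, variance) into 2**bits equiprobable bins."""
    levels = 2 ** bits
    dist = NormalDist(0.0, variance ** 0.5)
    return [dist.inv_cdf(u / levels) for u in range(1, levels)]

def quantize(a, thresholds):
    """Return the 1-based level u such that threshold[u-1] < a <= threshold[u]."""
    return bisect_left(thresholds, a) + 1

# Example: 2-bit quantization of a unit-variance Gaussian variable.
t = epq_thresholds(1.0, 2)
levels = [quantize(a, t) for a in (-3.0, -0.1, 0.1, 5.0)]
```

With this construction every level carries probability mass $2^{-B}$ under the reference Gaussian, so the outer bins are wide and the bins near zero are narrow.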
For considering the negative effect of quantization noise, the pseudo-Gaussian noise $n_q \sim \mathcal{N}(0, N_q)$ is introduced into the reference PDF at the quantization step. That is, the reference PDF is evaluated with an effective noise density that includes $N_q$.

2) IB-BASED MAXIMAL MI CRITERION
Given a joint probability distribution $p(a, b)$, the IB method proposed in [17], [31] is utilized to determine a compressed representation $\dot{b}$ of $b$ that results in the least loss of the original information $a$. The target mapper $p(\dot{b}|b)$ can be found by minimizing the IB function, which is defined as
$$\mathcal{L}\{p(\dot{b}|b)\} = I(b; \dot{b}) - \eta\, I(a; \dot{b}), \tag{10}$$
where $\eta$ is used to adjust the preference between the compression rate and information loss. During the construction of the CSI-independent LUT, the quantization setting is not optimal because of the use of an average threshold. Thus, a part of the information is lost each time quantization is conducted. Therefore, to suppress the information loss, it is necessary to utilize not only the distribution of the observed value $b$, but also the original information of interest $a$. In analogy with rate-distortion theory [32], the IB function (10) is obtained in the IB scenario by setting the distortion function as the lost mutual information between $\dot{b}$ and $a$. Equation (10) can also be viewed such that the first term indicates the compression rate and the second term indicates the amount of information that $\dot{b}$ contains about $a$. With $\eta \to \infty$, the IB method optimizes the quantization rule so that it causes the minimum MI loss. Several algorithms for solving the optimization problem of (10) are proposed in [31]. In this paper, we utilize the sequential IB (sIB) algorithm modified in [33].
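The role of the relevant MI can be illustrated with a small deterministic quantizer search. The sketch below is not the sIB algorithm of [33]; as a stand-in, it exhaustively searches contiguous cluster boundaries that maximize $I(a; \dot{b})$, which corresponds to the $\eta \to \infty$ case for a toy joint PMF. All names and the example PMF are hypothetical.

```python
from itertools import combinations
from math import log2

def mutual_information(joint):
    """I(A;T) in bits for a joint PMF stored as joint[a][t]."""
    pa = [sum(row) for row in joint]
    pt = [sum(col) for col in zip(*joint)]
    return sum(p * log2(p / (pa[i] * pt[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def merge(joint, cuts):
    """Compress b into contiguous clusters; cuts holds the first index of clusters 2, 3, ..."""
    edges = [0, *cuts, len(joint[0])]
    return [[sum(row[s:e]) for s, e in zip(edges, edges[1:])] for row in joint]

def best_contiguous_quantizer(joint, clusters):
    """Exhaustively pick the boundaries that maximize I(a; b_dot)."""
    n_b = len(joint[0])
    return max(combinations(range(1, n_b), clusters - 1),
               key=lambda cuts: mutual_information(merge(joint, cuts)))

# Toy joint PMF p(a, b): binary a, six ordered observation levels b.
row = [0.20, 0.15, 0.08, 0.04, 0.02, 0.01]
joint = [row, row[::-1]]
best = best_contiguous_quantizer(joint, 2)
mi_full = mutual_information(joint)
mi_compressed = mutual_information(merge(joint, best))
```

For this symmetric example the best two-cluster rule splits the observation levels in the middle, and the compressed MI cannot exceed the MI of the unquantized observation. Exhaustive search scales poorly with the alphabet size, which is one reason iterative algorithms such as sIB are used in practice.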

III. LUT WITH SMALL MEMORY USAGE
The simplest approach to building an LUT-based AMP is converting each process of Alg. 1 into a single large-LUT with multiple inputs and outputs. However, in this case, large memory usage is inevitable. For example, when the RX symbol y n is quantized into B bits, (N + 1)B bits are required to represent all possible combinations in (A1-6), which is prohibitively large for practical implementation.
To avoid this impairment, we further divide each process in Alg. 1 into multiple basic arithmetic operations. Thus, the large LUT with multiple inputs and outputs is divided into successive computations of two-dimensional LUTs with two inputs and one output. This 2-to-1 LUT is treated as the basic unit of the quantized discrete AMP.

A. BASIC UNIT CONSTRUCTION
We define $u_1 \in \mathcal{Q}$, $u_2 \in \mathcal{Q}$, and $w \in \mathcal{Q}$ as quantization indices for the two inputs and one output of the 2-to-1 LUT. Two LUT functions, each mapping an index pair $(u_1, u_2)$ to an output index $w$, provide the quantized sum $c = a + b$ and the quantized product $c = a \cdot b$ operations, respectively. In the quantized sum, the 2-to-1 LUT construction is performed as follows.
• All possible combinations of the representative value are listed according to
$$\dot{c}[(u_1, u_2)] = \dot{a}[u_1] + \dot{b}[u_2],$$
where $(u_1, u_2)$ is a two-dimensional index.

B. MULTI-LAYER LUT STRUCTURE
Fig. 4 shows the multi-layer 2-to-1 LUT structure for representing the summation of $T$ terms, $\sum_{t=1}^{T} a_t$, where the tree depth (= number of layers) is $L$. At the first layer, $T$ terms are divided into $T/2$ nodes. Each node contains two terms as inputs and builds the 2-to-1 LUT. At the subsequent layer, the outputs from the previous layer are considered as new inputs, and the same operations are conducted. When the number of inputs is not divisible by two, an additional 2-to-1 LUT operation for one node output and the remaining input is conducted. The expected sum is obtained after the $L$-th layer processing is completed.
Owing to this multi-layer structure, the LUT size required for the operation with $T$ inputs is reduced from $2^{BT}$ to $2^{2B} \cdot (T - 1)$, which makes a considerable difference, especially in massive MIMO configurations. Furthermore, when LUTs at the same tree depth are approximated to be identical, memory usage is compressed further. In this case, the required LUT size can be reduced to $2^{2B} \cdot \log_2 T$.
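The size comparison and the layer-by-layer reduction can be sketched as follows. This is only an illustration under stated assumptions: `toy_lut` is a hypothetical saturating-addition table rather than an IB- or EPQ-designed LUT, a single table is shared by all layers for brevity, an odd leftover input is simply carried to the next layer, and $\log_2 T$ is rounded up when $T$ is not a power of two.

```python
from math import ceil, log2

def lut_entries(bits, terms):
    """Entry counts for one big LUT, a tree of 2-to-1 LUTs, and one LUT per layer."""
    single = 2 ** (bits * terms)
    tree = 2 ** (2 * bits) * (terms - 1)
    shared = 2 ** (2 * bits) * ceil(log2(terms))
    return single, tree, shared

def tree_reduce(indices, lut):
    """Combine quantization indices pairwise, layer by layer, via a 2-to-1 LUT."""
    layer = list(indices)
    while len(layer) > 1:
        nxt = [lut[layer[i]][layer[i + 1]] for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])  # odd leftover joins the next layer
        layer = nxt
    return layer[0]

# Toy 3-bit table: index u represents the integer u - 4, sums saturate to [-4, 3].
toy_lut = [[min(max((u1 - 4) + (u2 - 4), -4), 3) + 4 for u2 in range(8)]
           for u1 in range(8)]

sizes = lut_entries(5, 64)                       # B = 5 bits, T = 64 terms
result = tree_reduce([5, 4, 3, 4, 5], toy_lut)   # values 1, 0, -1, 0, 1
```

For $B = 5$ and $T = 64$, the tree needs $1024 \cdot 63 = 64512$ entries and the per-layer variant $1024 \cdot 6 = 6144$ entries, whereas a single table would need $2^{320}$ entries.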

IV. IB LUT FOR MAXIMAL MI USING IB AND DDE
With the 2-to-1 LUTs, we quantize each process in Alg. 1 based on the IB-based maximal MI criterion. The quantization rule is designed to depend on only long-term channel statistics, i.e., the stochastic channel (correlation) model, and therefore, the LUTs can be designed in advance. Thus, the LUTs need to be reconstructed only when the propagation environment changes widely.

A. PRELIMINARIES
In the IB method, it is necessary to compute the joint probability of values and their referencing original values. In [20], the DDE is applied to track the transition of the joint PMF of beliefs in each iteration step of LDPC decoding. However, there are some issues when applying the IB and DDE to the LUT construction of AMP-based MUD.
In the LDPC decoder, all operations conducted in the FNs and VNs are LLR belief combining, and they can be directly linked to each other in the LLR domain. The AMP-based MUD requires various node processing in addition to the belief combining of (A1-8) and (A1-5). The FNs conduct soft IC using (A1-6) in the symbol domain, and the VNs generate soft symbol replicas using (A1-9), which plays the role of bridging belief-domain and symbol-domain processing. Therefore, some methods are required to link the entire processing as in LDPC decoding.
The RX symbols $\{y_n, \forall n\}$ and the channel coefficients $\{h_{n,m}, \forall n, m\}$ are respectively quantized into $B$ bits based on the maximal entropy criterion in advance, using $y_n \sim \mathcal{N}(0, \psi^{(1)})$ and $h_{n,m} \sim \mathcal{N}(0, \phi_H/2)$. Using the predefined quantization index $u \in \mathcal{Q}$, we denote the corresponding representative values by $\dot{y}_n[u] \in \mathbb{R}$ and $\dot{h}_{n,m}[u] \in \mathbb{R}$.

B. BELIEF COMBINING IN VNs (A1-8)
Let us consider the belief combining in (A1-8). This processing is similar to the LLR combining in VNs of LDPC decoding because the referencing original value is binary. Thus, the joint probability can be obtained easily because the number of possible combinations is sufficiently limited. Here, we focus on the processing in one 2-to-1 LUT; all LUTs constituting the multi-layer structure work with the same rule.
Consider a 2-to-1 LUT with two inputs $f_1[u_1]$ and $f_2[u_2]$, and one output $F[w]$ for belief combining in VNs. $F[w]$ denotes all possible combinations of two inputs with $2B$ bits. The IB method requires the joint PMF of $F[w]$ and $x_m$. Under sufficient large-system conditions, the PDF of $\alpha^{(1)}_{n,m}$ conditioned on $h_{n,m}$ and $x_m$ can be regarded as Gaussian with variance $\psi^{(1)}$; this stochastic behavior is referred to as scalar Gaussian approximation (SGA) [3].
Similarly, at the second iteration step ($k = 2$), $\alpha_{n,m}$ can be expressed using $\tilde{y}^{(1)}_n = y_n$ as
$$\alpha^{(2)}_{n,m} = h_{n,m}\left(y_n - \sum_{i=1}^{M} h_{n,i}\check{x}^{(2)}_i + \beta^{(2)}\tilde{y}^{(1)}_n\right),$$
where the Onsager term is denoted by $\beta^{(k)}\tilde{y}^{(k-1)}_n$ to simplify mathematical notations.
The conditional PDF $p(\alpha^{(2)}_{n,m} | h_{n,m}, x_m)$ obeys $\mathcal{N}(\mu^{(2)}_\alpha, \psi^{(2)}_\alpha)$ with an equivalent average $\mu^{(2)}_\alpha$ and variance $\psi^{(2)}_\alpha$. Similarly, at the $k$-th iteration step ($k \ge 2$), we formulate the equivalent average and variance of $p(\alpha^{(k)}_{n,m} | h_{n,m}, x_m)$ by recurrence formulas initialized with $\psi^{(1)}_\alpha = \psi^{(1)}$. Based on the above PDF, the conditional PMF given $\dot{h}_{n,m}[u_h]$ and $x_m$ can be computed. Here, we define $\gamma^{(k)}$ as the Onsager coefficient at the $k$-th iteration in the discrete LUT-based AMP. The design of $\gamma^{(k)}$ is presented in Section IV-E. The dependency on the instantaneous channel coefficient can be removed by marginalization over $\dot{h}_{n,m}[u_h]$. Consequently, we can derive the joint PMF required for the IB method to determine the threshold of $\alpha^{(k)}_{n,m}$. After determining the threshold, the resultant $B$-bit quantized values and their PMF (equivalent to $P[f_1[u_1], x_m]$) become the input for the 2-to-1 summation LUT at the next layer. After successively computing the quantized sum based on the multi-layer structure until the first summation term in (A1-8) is quantized, we obtain the $B$-bit quantized belief $\dot{x}^{(k)}_m[w]$ and its joint PMF $P[\dot{x}^{(k)}_m[w], x_m]$, which leads to the multi-layer LUT for (A1-8).
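
C. SOFT SYMBOL REPLICA GENERATION (A1-9,10)
Using $P[\dot{x}^{(k)}_m[w], x_m]$ passed from the LUT for (A1-8), the soft symbol replica can be directly computed by the conditional expectation as
$$\mathbb{E}\left[x_m \,\middle|\, \dot{x}^{(k)}_m[w]\right] = \frac{\sum_{\chi_q \in \mathcal{X}} \chi_q\, P[\dot{x}^{(k)}_m[w], x_m = \chi_q]}{\sum_{\chi_q \in \mathcal{X}} P[\dot{x}^{(k)}_m[w], x_m = \chi_q]}.$$
Assuming that $\dot{x}^{(k)}_m$ conditioned on $x_m$ is Gaussian distributed, its effective variance is defined with the predetermined parameter $\zeta$, which is introduced as an offset to compensate for quantization errors. Using this effective variance, we can compute $P[\dot{x}^{(k)}_m[w] | x_m]$ via the Gaussian CDF, and then the average variance of VNs, $\bar{\phi}^{(k+1)}$, can be computed.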
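The conditional expectation over a tracked joint PMF can be sketched as below. The layout `joint[w][q]` (quantization level by constellation point), the function names, and the toy PMF are assumptions made for illustration; the offset parameter $\zeta$ is not modeled here.

```python
def soft_symbols(joint, constellation):
    """Return E[x | level w] for joint[w][q] = P[level w, x = constellation[q]]."""
    out = []
    for row in joint:
        mass = sum(row)
        mean = sum(p * x for p, x in zip(row, constellation)) / mass if mass > 0 else 0.0
        out.append(mean)
    return out

def average_variance(joint, constellation, replicas):
    """Average squared error between the true symbol and the soft replica."""
    return sum(p * (x - replicas[w]) ** 2
               for w, row in enumerate(joint)
               for p, x in zip(row, constellation))

# Toy two-level belief for symbols {-1, +1}.
constellation = [-1.0, 1.0]
joint = [[0.45, 0.05],
         [0.05, 0.45]]
replicas = soft_symbols(joint, constellation)
phi_bar = average_variance(joint, constellation, replicas)
```

Because the replicas depend only on the PMF, they can be computed offline for every level $w$ and stored as a one-dimensional table indexed by the quantized belief.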

D. BELIEF COMBINING IN FNs (A1-5)
For the reconstruction of the RX symbol replica in (A1-5), the referencing original value for h n,mx (k) m is h n,m x m , i.e., a continuous value. Since channel coefficients are quantized with B bits, the number of possible combinations can be limited to 2 2B ; however, it remains a large number compared to that in the binary case. In addition, the referencing original value gradually varies and increases in successive summation operations of the M symbol replicas, and therefore, the typical DDE is not applicable.
To address this issue, we extend the DDE for the binary class [20] to that for multi-class using the quantized values based on its estimated PDF as referencing values. The discrete joint PMF required for determining the quantization threshold is computed based on the PDF of the corresponding continuous variable under SGA conditions. First, we define $\omega_{n,m} = h_{n,m} x_m$ as the referencing value of the symbol replica $h_{n,m}\check{x}^{(k)}_m$. For subsequent summation, the multi-layer LUT structure is used, starting from the variance of each input at the first layer. Here, we focus on two 2-to-1 LUTs for the quantized sum at the $l$-th layer, where the outputs are defined by $\check{J}^{(k,l)}$ and $J^{(k,l)}$, and the corresponding input pairs are denoted by $(\check{j}^{(k,l-1)}_1, \check{j}^{(k,l-1)}_2)$ and $(j^{(k,l-1)}_1, j^{(k,l-1)}_2)$, respectively. The variance of the node at the $l$-th layer is denoted by $\psi^{(k,l)}$. As the effective noise of two inputs is uncorrelated, the output variance is twice that of the input variance, $\psi^{(k,l)} = 2\psi^{(k,l-1)}$. Further, the referencing value $\omega_{n,m}$ is added up through the multi-layer structure; i.e., it has to be quantized with $B$ bits as $\check{J}^{(k,l)}[u]$, where the maximal entropy criterion is used. Assuming that the corresponding continuous variable is Gaussian distributed, the joint PMF $P[J^{(k)}[w], \check{J}^{(k)}[u]]$ is computed from the PMF of the referencing value and the corresponding conditional probability (28). Based on (28), $J^{(k,l)}[w]$ is quantized using the IB method. Finally, the joint PMF is computed after recursively performing the abovementioned processing based on the multi-layer structure.
Footnote 2: As iterative processing progresses, the symbol replicas approach hard-decision symbols and the probability mass of the first and last quantization levels becomes dominant. If we use all levels, the algorithm does not work well because of the insufficient resolution of the discrete bin.
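The idea of computing a discrete joint PMF from a continuous Gaussian model can be sketched as follows. This is an assumed form, not the paper's exact expression: the observed variable is modeled as Gaussian around each representative referencing value with a common variance, and the bin probabilities are differences of the Gaussian CDF at the thresholds. All names and numbers are hypothetical.

```python
from statistics import NormalDist

def conditional_pmf(thresholds, reference, variance):
    """P[level w | reference value] for a Gaussian centred on the reference value."""
    dist = NormalDist(reference, variance ** 0.5)
    cdf = [0.0] + [dist.cdf(t) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def joint_pmf(thresholds, references, ref_pmf, variance):
    """joint[w][u] = P[reference level u] * P[level w | reference level u]."""
    cond = [conditional_pmf(thresholds, r, variance) for r in references]
    return [[ref_pmf[u] * cond[u][w] for u in range(len(references))]
            for w in range(len(thresholds) + 1)]

# Toy setting: four observation levels, two referencing values.
thresholds = [-1.0, 0.0, 1.0]
references = [-1.0, 1.0]
ref_pmf = [0.5, 0.5]
joint = joint_pmf(thresholds, references, ref_pmf, variance=0.25)
total = sum(sum(row) for row in joint)
```

A joint PMF of this kind is what a threshold-design step based on the relevant MI takes as input, and doubling the variance from layer to layer would enter through the `variance` argument.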

E. SOFT INTERFERENCE CANCELLATION (A1-6,7)
The Onsager coefficient is independent of both the RX antenna index $n$ and the UE index $m$. This term is theoretically derived based on the Bayes-optimal denoiser (A1-9) in the continuous system. However, it is difficult to compute it in terms of the discrete activation function of (24) because of the severe quantization errors. Owing to the statistical stability of beliefs and the index-invariant characteristics, the Onsager coefficient is often given by appropriate parameters based on learning algorithms [34]. Thus, we may find a proper Onsager correction coefficient to imitate the average effect of self-feedback caused by the loopy propagation even in the quantized system.
For representing the accumulated quantization errors via iteration processing, we provide the Onsager coefficient $\gamma^{(k)}$ with $\gamma^{(1)} = 1$. Finally, for closing the loop, the joint PMF required for VN processing as discussed in Section IV-B is computed. After the construction of the 2-to-1 LUT for the quantized sum, $\tilde{y}^{(k)}_n | y_n \sim \mathcal{N}(y_n, \psi^{(k)})$ is assumed under large-system conditions, and the corresponding conditional PMF is computed using the effective variance from (26). The joint PMF can then be computed. After the quantization of $\dot{y}^{(k)}[w]$ with $B$ bits using the IB method, $\dot{y}_n[u]$ used in Section IV-B is obtained and the loop is closed. With its PMF, the DDE loop is successfully linked, and we complete the entire processing in IB LUT-based AMP detection. The flow chart of the DDE for the joint PMF is summarized in Fig. 5.

V. COMPUTER SIMULATIONS
Computer simulations were conducted to validate the performance of the proposed IB LUT-based AMP detector for massive MUD, where the performance metrics were averaged over 10,000 independent channel realizations. The average RX power from each UE is assumed to be identical based on slow TX power control. Simulation conditions are summarized in Table 1. We assume that the massive MIMO channels are uncorrelated flat Rayleigh fading and that channel estimation is conducted perfectly at the receiver side. The modulation scheme is Gray-coded QPSK (see footnote 3), and channel coding is not utilized. The number of AMP iterations is $K = 16$. Time and frequency synchronization between TX and RX are assumed to be perfect. We compare the performances of the IB LUT based on the maximal MI criterion, the EPQ LUT based on the maximal entropy criterion, and a typical AMP with double precision. The PDF required for the EPQ LUT is computed by marginalizing the histogram of each variable in Alg. 1 over all channel realizations, where the distribution is approximated by a scalar Gaussian distribution under large-system conditions. Table 2 shows the predetermined parameters $N_q$ for EPQ LUTs and $\zeta$ for IB LUTs. Both parameters are set to minimize the error floor level of the BER in the present massive MIMO configurations. In computer simulations, $N_q$ is searched from 0 to 10 in steps of 0.5 (in decimal), and $\zeta$ is searched from 0 to 1 in steps of 0.05. In [12]-[14], the IB-based LDPC decoders were shown to operate only 0.1 dB away from the double-precision decoding even though all messages were represented with 4 bits. However, when the number of quantization bits is 4, a high error floor is inevitable in both LUT-based detection schemes, as shown in Fig. 6. This is because, in MUD scenarios, the increase in the dynamic range of beliefs due to the random channel coefficients causes severe non-linear signal distortion due to insufficient resolution.
The dynamic range becomes smaller because of channel hardening effects [3] as the compression rate ρ = N /M increases, and therefore the error floor level in Fig. 6(a) is less than that in Fig. 6(b).

A. BER PERFORMANCE
In the case of 5-bit quantization, we find clear improvements in detection performance due to the improved resolution. However, in both configurations, the EPQ LUT cannot sufficiently suppress the error floor level. This is because, in coarse quantization, the distribution of quantized beliefs deviates significantly from the referencing continuous distribution. In the EPQ LUT that mimics the original arithmetic operations with reference to the continuous distribution, performance degradation due to severe quantization errors is inevitable. In contrast, the IB LUT that maximizes the MI between the quantized beliefs and the corresponding continuous values is less susceptible to the quantization errors caused via each arithmetic operation. Consequently, the IB LUT can sufficiently suppress the error floor level and achieve BER = $10^{-5}$ because of minimal information loss. The performance degradation at a BER = $10^{-4}$ compared to the double-precision AMP is about 4.0 dB and about 3.0 dB in Figs. 6(b) and 6(a), respectively.
Footnote 3: The methodology proposed in [11] can be used to extend the Bayes-optimal AMP in Alg. 1 to higher-order modulation, and the quantization process presented in Section IV can be applied to achieve LUT-based MUD for higher-order modulation. However, with the increase in the modulation level, appropriate changes to the DDE and conditional expectation calculation are required, which, together with the numerical evaluation, is left for future work.
When the number of quantization bits reaches 6 bits, the resolution becomes high enough to directly mimic the arithmetic operations. Thus, the envelopes of the discretized distribution can be captured by the original continuous Gaussian distribution with high accuracy. Both LUT-based discrete detection schemes suppress the error floor sufficiently. The performance difference between the EPQ LUT and IB LUT becomes smaller because the negative effect of quantization noise is mitigated. Remarkably, the IB LUT-based AMP can approach the double-precision AMP, and the performance gap is less than 2.0 dB at a BER = $10^{-4}$.
Let us now consider the BER performances of the EPQ LUT- and IB LUT-based AMP when the number of iterations $K$ is changed. Fig. 7 shows the BER performances of EPQ LUT- and IB LUT-based AMP with different numbers of iterations, where $\xi$ follows Table 2. The transmit $E_s/N_0$ is fixed at 0 dB under the conditions of $(N, M) = (80, 64)$ and $(96, 64)$. In the case of 5-bit quantization, the IB LUT-based AMP can converge with fewer iterations than the EPQ LUT-based AMP because of minimal information loss. In the EPQ LUT, the quantized AMP is subject to ill-convergence behavior of iterative detection due to the accumulated quantization errors, resulting in poor detection capability with a small number of iterations. For $(N, M) = (96, 64)$, the IB LUT-based AMP requires fewer than eight iterations for convergence, and its convergence point is considerably better than that of the EPQ LUT. A similar tendency is observed in the case of 6-bit quantization, and the IB LUT-based AMP can converge in fewer than six iterations in both configurations. Further improvements are expected by optimizing $\xi$ for $K$ and $E_s/N_0$. Such a significant performance gap is attributed to the difference in the concept of the LUT design.
The EPQ is performed to maximize the entropy of quantized values; however, high-accuracy signal detection requires preserving the original information, and therefore MI maximization is more appropriate. In addition, the distribution of the quantization error is assumed to be constant throughout all iterative processing in the EPQ LUT. In contrast, in the IB LUT, the PMF is tracked using the DDE. Consequently, the distortion of the probability distribution can be largely mitigated.

B. MEMORY USAGE
Since CSI-independent LUTs are designed offline based on long-term channel statistics, we do not emphasize the order of the computational complexity required to construct the LUTs; instead, we focus on the memory size required for holding the LUTs. Table 3 lists all LUTs for constructing the EPQ and IB LUT, where $g$ and $G_i$ denote the memory usage in each iteration for the 2-to-1 LUT and the multi-layer 2-to-1 LUT structure with $T$ terms, respectively. The averaged variance and Onsager coefficient can be computed directly using the PMF tracked by DDE in the IB LUT. Therefore, LUTs for this computation are no longer necessary. In this case, the memory usage required for the EPQ LUT and IB LUT follows from the entries of Table 3. In Table 4, the memory usage required for the EPQ and IB LUT is presented. The required memory is very small. With such limited memory usage in practical scenarios, the operations with LUTs can be substituted for arithmetic computations in AMP detection.

VI. CONCLUSION
In this paper, a novel CSI-independent LUT-based AMP for massive MIMO detection was proposed to achieve signal detection in integer arithmetic with an LUT designed offline. The maximal entropy criterion and the maximal MI criterion are utilized as strategies for quantization threshold design in LUTs; these lead to EPQ and IB quantization, respectively. The IB LUT is constructed using the IB method and DDE for minimizing information loss via quantization. It is possible to construct CSI-independent LUTs using the modified IB and DDE-based method by exploiting the statistical stability of beliefs in AMP. The predetermined parameter for suppressing the harmful effect of quantization errors can be designed in addition to the Onsager term. Via computer simulations, we confirmed that the proposed IB LUT achieves a BER performance similar to that of the AMP with double precision. The penalty caused by the quantization error can be well suppressed. Compared to the EPQ LUT, the LUT built according to IB and DDE tracks the transition of the discrete distribution more accurately and thus yields a better quantization threshold design.