Deliberate Clipping and Iterative Distortion Recovery for UCA-Based OAM Multiplexing Systems

Deliberate amplitude clipping is a simple and well-known technique for reducing the peak-to-average power ratio of orthogonal frequency division multiplexing (OFDM) systems. In this paper, we propose a clipping technique for peak power reduction in orbital angular momentum (OAM) multiplexing systems with uniform circular array (UCA) antennas. In the proposed technique, clipping is performed on digital baseband signals prior to OAM beamforming and pulse-shaping filtering, which avoids out-of-band radiation and preserves the orthogonality of the OAM modes. In addition, an iterative distortion recovery algorithm is proposed in order to mitigate the bit error rate (BER) degradation caused by the clipping. The algorithm is derived by unfolding the clipping noise cancellation (CNC) algorithm for OFDM systems into layers and by introducing layer-wise learnable parameters. Simulation results show that for a realistic OAM multiplexing system with 256QAM signaling, the unfolded CNC exhibits excellent BER performance even when the conventional CNC suffers from a high error floor. The combination of the proposed clipping and distortion recovery schemes provides a significant reduction in the peak power of the OAM signals at the cost of only a slight degradation in BER performance.


I. INTRODUCTION
Radio orbital angular momentum (OAM) multiplexing transmission [1], [2] has recently attracted attention for application to high-capacity wireless communication systems, especially to point-to-point (PTP) line-of-sight (LOS) millimeter wave radio systems for mobile fronthaul and backhaul links [3]-[8].
OAM multiplexing is realized with electromagnetic waves of different OAM modes, which are inherently orthogonal to each other. Several methods have been proposed to generate OAM waves by using special antennas, such as helicoidal parabolic antennas [2], spiral phase plates [3], and thin metamaterial plates [9]. Among them, the uniform circular array (UCA) antenna is considered to be especially promising because of its simple structure and low cost. In [10], it was shown that standard antennas arranged in circular arrays can be used to generate OAM and that only local measurements are required for detecting OAM modes. Edfors and Johansson [11] theoretically investigated the UCA-based OAM channel and showed its equivalence in terms of channel capacity to a pre-coded MIMO channel. Moreover, several experimental studies have recently demonstrated the effectiveness and feasibility of OAM multiplexing systems utilizing UCAs [4]-[7].
The associate editor coordinating the review of this manuscript and approving it for publication was Li Zhang.
The UCA-based OAM system presented in [11] is equipped with UCAs with N antenna elements at each of the transmitter (Tx) and receiver (Rx) ends, which face each other on the same beam axis in free space. The pair of Tx and Rx UCAs leads to a circulant channel matrix, and thus N parallel channels are created by employing a discrete Fourier transform (DFT) and its inverse transform as the respective pre- and post-processing. The UCA-based OAM system can thus be thought of as an N × N LOS-MIMO system with OAM beamforming processing. There are two major approaches to implementing the OAM beamforming processing functions: one is to use a Butler matrix [4], [12], [13], which is a passive analog beamforming circuit, and the other is to perform digital signal processing [5] on baseband signal samples. While each of these approaches has its own advantages and disadvantages, we focus in this paper on the peak-power issue of the digital beamforming approach, in which the OAM signals are generated from baseband signals obtained by precoding N independent signals using a DFT matrix. The precoded signal at each antenna element in the UCA exhibits high peak transmission power, which leads to a significant reduction in the power efficiency of the power amplifier (PA). The UCA-based OAM transmitter with the digital beamforming function thus needs to reduce the peak power at each antenna element in order to improve the power efficiency. This high peak power, which is caused by signal multiplexing in the digital domain, is the issue addressed in this paper.
A number of peak power reduction techniques have been presented for orthogonal frequency division multiplexing (OFDM) systems (see [14], [15] and the references therein), and many of them are applicable to UCA-based OAM multiplexing systems. Among the existing techniques for OFDM systems, we focus on clipping and filtering (CAF) [16]-[18]. CAF is one of the simplest and most effective techniques, wherein the peaks of the signal are clipped and the out-of-band radiation caused by the clipping is filtered at the Tx side. The critical issue here is the clipping distortion that remains after the filtering, which leads to a significant degradation in bit error rate (BER) performance. As a way to improve BER performance, iterative distortion recovery algorithms have been proposed [19]-[22], in which the clipping distortion is reconstructed using a decision feedback approach and then subtracted from the received signals observed at the Rx end.
On the basis of the above CAF techniques developed for OFDM systems, we propose a clipping method and its corresponding iterative clipping distortion recovery algorithm for UCA-based OAM systems. To avoid out-of-band radiation and to preserve the orthogonality of the OAM modes, the clipping process is performed on the baseband digital signals prior to the OAM beamforming and the pulse-shaping filtering. The resulting peak power of the signal at each antenna element depends on the clipping ratio, the roll-off factor (ROF) of the pulse-shaping filter, and an internal parameter related to the in-band distortion. The iterative clipping noise cancellation (CNC) algorithm [21] for OFDM systems can be applied straightforwardly at the receiver side, where the number of antenna elements N is the counterpart of the number of subcarriers in OFDM systems. Unfortunately, however, simulations show that it performs poorly in a realistic situation [5] where N = 8 and 256 QAM is used as the modulation scheme. As in the OFDM case [23], the value of N has a significant effect on performance, and in this case N appears to be too small for the CNC to perform well. As a way to improve distortion recovery performance, we propose a learning-based algorithm that is derived by unfolding the CNC algorithm and by introducing learnable parameters.
The set of the learnable parameters contains the counterparts of the Bussgang coefficient that is used in the original CNC [21] and is computed analytically under the assumption that the distribution of the pre-clipped signal is Gaussian. Simulation results show that the CNC performs well with the parameters optimized through training rather than with a single Bussgang coefficient. The main contributions of our study are summarized as follows:
• The statistical distribution is investigated for the instantaneous power of the UCA-based OAM signals. It is shown numerically that the peak power increases rapidly as the number of OAM modes N increases up to around 16 and then saturates. Also shown is the relationship between the peak power and the ROF of the pulse-shaping filter, which provides the optimal ROF in terms of minimizing peak power.
• A clipping method is presented for reducing the peak instantaneous power of the OAM signals, which neither causes bandwidth expansion nor has an impact on the orthogonality of OAM modes. The clipping processing is performed prior to the OAM beamforming processing and the pulse-shaping filtering.
• To compensate for the distortion caused by the clipping, a learning-based clipping noise cancellation algorithm is derived by unfolding the iterations of the conventional algorithm developed for OFDM systems into layers and by replacing the Bussgang coefficient with layer-wise learnable parameters. Simulations indicate that the proposed algorithm performs very well.
The rest of this paper is organized as follows: In Section II, we briefly review the UCA-based OAM transmission model and investigate the statistical distribution of the instantaneous power of the OAM signals. Simulations show that the digital OAM beamforming process significantly increases peak instantaneous power. Section III presents a clipping method for the UCA-based OAM systems. Section IV describes the learning-based algorithm for mitigating the clipping noise distortion. Section V presents the simulated BER performance of UCA-based OAM transmission systems with the proposed clipping and distortion recovery algorithms. Section VI summarizes our results and concludes the paper.
Notation: We denote by $\mathbb{R}$ and $\mathbb{C}$, respectively, the sets of real and complex numbers. Uppercase and lowercase boldface letters denote matrices and (column) vectors, respectively. The superscripts $T$, $*$, and $\dagger$ stand, respectively, for transposition, element-wise conjugation, and Hermitian transpose; and $\Re(\cdot)$ and $\Im(\cdot)$ denote the real and imaginary parts, respectively. Additionally, $\mathbf{I}$ denotes the identity matrix of an appropriate size, and $\bullet$ and $\otimes$ are the Hadamard and Kronecker products, respectively. For a vector $\mathbf{x}$, $\mathrm{diag}(\mathbf{x})$ denotes the diagonal matrix with diagonal entries from $\mathbf{x}$, and $\|\mathbf{x}\|_2$ denotes its Euclidean norm.

II. SYSTEM MODEL AND PEAK SIGNAL POWER

A. OAM MULTIPLEXING TRANSMISSION SYSTEMS
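We consider an OAM multiplexing transmission system equipped with UCA antennas at the Tx and Rx sides. For simplicity, we assume that the UCA consists of $N$ antenna elements equidistantly arranged on a ring, and we refer to it as an $N$-UCA. Let $s^{(k)}[n]$ denote the $n$-th discrete-time signal transmitted from the $k$-th antenna element in the UCA and $x^{(l)}[n]$ denote the $n$-th QAM symbol transmitted through the OAM mode-$l$ channel. Then, $s^{(k)}[n]$ can be written as
$$ s^{(k)}[n] = \frac{1}{\sqrt{N}} \sum_{l=0}^{N-1} x^{(l)}[n]\, e^{j 2\pi k l / N}, \quad k = 0, 1, \ldots, N-1. \tag{1} $$
Eq. (1) can be written in matrix form as
$$ \mathbf{s}[n] = \mathbf{F}_N^{\dagger}\, \mathbf{x}[n], \tag{2} $$
where $\mathbf{s}[n] = (s^{(0)}[n], \ldots, s^{(N-1)}[n])^T$, $\mathbf{x}[n] = (x^{(0)}[n], \ldots, x^{(N-1)}[n])^T$, and $\mathbf{F}_N$ denotes the $N$-point DFT matrix whose $(k, l)$-th entry is $e^{-j 2\pi k l / N} / \sqrt{N}$. Each signal $s^{(k)}[n]$ is upsampled and then filtered using a root raised cosine filter (RRCF) [24]. The signal sequence at each antenna element in the UCA is modulated on a single carrier frequency $f_c$.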
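The IDFT precoding described above can be illustrated with a minimal NumPy sketch. The unitary $1/\sqrt{N}$ normalization, the function names `dft_matrix` and `oam_precode`, and the random 256 QAM test vector are assumptions made for illustration only; upsampling and pulse shaping are omitted.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Unitary n-point DFT matrix (the 1/sqrt(n) scaling is an assumption)."""
    idx = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / n) / np.sqrt(n)

def oam_precode(x: np.ndarray) -> np.ndarray:
    """Map N mode symbols to N antenna samples by applying the inverse DFT."""
    n = x.shape[0]
    return dft_matrix(n).conj().T @ x

rng = np.random.default_rng(0)
levels = np.arange(-15, 16, 2)       # per-axis levels of 256 QAM: +-1, ..., +-15
N = 8
x = rng.choice(levels, N) + 1j * rng.choice(levels, N)

s = oam_precode(x)                   # antenna-domain samples
x_back = dft_matrix(N) @ s           # DFT post-processing recovers the symbols
```

Because the transform is unitary in this sketch, the total power of `s` equals that of `x`; only its distribution over the antenna elements changes, which is what raises the peak power.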
The free space transmission model (e.g., [11]) is used to simulate a UCA-based OAM channel, in which the transfer function between a pair of Tx and Rx antenna elements is given by
$$ h(d) = G\, \frac{\lambda}{4\pi d}\, e^{-j 2\pi d / \lambda}, \tag{3} $$
where $d$ is the transmission distance, $\lambda$ is the wavelength, $\lambda/(4\pi d)$ represents the free space loss, and $G$ contains all relevant constants such as attenuation and phase rotation caused by antennas and their patterns on both sides. We denote by $d^{(k,l)}$ the distance between the $k$-th antenna element in the Rx $N$-UCA and the $l$-th antenna element in the Tx $N$-UCA. The $N \times N$ channel matrix $\mathbf{H}_{\mathrm{UCA}}$ can be expressed as follows:
$$ \mathbf{H}_{\mathrm{UCA}} = \left[ h\big(d^{(k,l)}\big) \right]_{k,l = 0}^{N-1}. \tag{4} $$
Assuming the Tx and Rx UCAs are ideally aligned, $\mathbf{H}_{\mathrm{UCA}}$ can be regarded as a circulant matrix. Note that, in this case, $\mathbf{H} = \mathbf{F}_N \mathbf{H}_{\mathrm{UCA}} \mathbf{F}_N^{\dagger}$ is a diagonal matrix. The channel output signals received at the Rx $N$-UCA are filtered by RRCFs, down-sampled, and then post-processed by performing an $N$-point DFT. Consequently, the resulting post-processed signal $\mathbf{y}[n] = (y^{(0)}[n], y^{(1)}[n], \ldots, y^{(N-1)}[n])^T$ can be written as
$$ \mathbf{y}[n] = \mathbf{F}_N\, \mathbf{r}[n], \tag{5} $$
where $\mathbf{r}[n] = (r^{(0)}[n], \ldots, r^{(N-1)}[n])^T$ and $r^{(k)}[n]$ denotes the $n$-th (downsampled) output signal of the $k$-th RRCF.
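The circulant structure of the aligned channel, and its diagonalization by the DFT, can be checked numerically with the following sketch. The geometry is an assumption: both UCAs are taken to be coaxial with the same radius and no rotational offset, so that the squared element distance is $d^2 + 2r^2(1 - \cos(2\pi(k-l)/N))$, and $G$ is set to 1. The numerical values are the simulation conditions quoted in Section V; the function name `uca_channel` is hypothetical.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def uca_channel(n, dist, radius, fc, gain=1.0):
    """Free-space channel matrix between two coaxial, aligned n-element UCAs."""
    lam = C_LIGHT / fc
    k = np.arange(n)
    # Index difference modulo n, so that entries depend only on (k - l) mod n.
    diff = np.mod(k[:, None] - k[None, :], n)
    d = np.sqrt(dist**2 + 2 * radius**2 * (1 - np.cos(2 * np.pi * diff / n)))
    return gain * lam / (4 * np.pi * d) * np.exp(-2j * np.pi * d / lam)

N = 8
H_uca = uca_channel(N, dist=40.0, radius=0.265, fc=84.5e9)

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # unitary DFT
H = F @ H_uca @ F.conj().T            # mode-domain channel after pre/post-processing
off_diag = H - np.diag(np.diag(H))    # should vanish for a circulant H_uca
```

The diagonal of `H` then holds the complex gains of the N parallel mode channels, which is why a per-mode division suffices for detection in the aligned case.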

B. INSTANTANEOUS SIGNAL POWER OF UCA-BASED OAM SIGNALS
As explained above, the OAM signals are obtained by adding several independent signals via an IDFT, which may considerably increase the peak power of the transmitted signals.
In this section, we evaluate the complementary cumulative distribution function (CCDF) of the instantaneous power for the OAM signals and compare it with that for classical single carrier SISO signals. The instantaneous power considered here is normalized by its average power, i.e., $|s^{(k)}(\tau)|^2 / P^{(k)}$, where for $k = 0, 1, \ldots, N-1$, $s^{(k)}(\tau)$ denotes the output signal of the RRCF associated with the $k$-th Tx antenna, and $P^{(k)}$ denotes the average power. The CCDF of the normalized instantaneous power (NIP) is thus written as
$$ \mathrm{CCDF}(\rho) = \Pr\!\left( \frac{|s^{(k)}(\tau)|^2}{P^{(k)}} > \rho \right). \tag{6} $$
Fig. 1 shows the simulated CCDFs of the NIP for 8-UCA-based OAM signals with 256 QAM when the ROF is 0.1 and 0.4. Shown for comparison are the simulated CCDFs of the NIP for conventional 256 QAM SISO signals. It can be seen that when ROF = 0.4, the NIP increases by more than 3.5 dB at a CCDF of $10^{-5}$ by combining eight independent signals via the IDFT. Also, as the ROF increases from 0.1 to 0.4, the NIP of the OAM signal slightly increases while that of the SISO signal decreases by about 1.5 dB at a CCDF of $10^{-5}$. Fig. 2 shows the relationship between the NIP of the $N$-UCA-based OAM signals observed at a CCDF of $10^{-5}$ and ROF, where 256 QAM was used as the modulation scheme and the number of antenna elements $N$ in the UCA was varied from 1 to 64. When $N = 1$, the OAM signal can be thought of as a classical SISO signal. In this case, the NIP is minimized around an ROF of 0.4. In contrast, it can be seen that the NIP for $N \geq 8$ increases monotonically as ROF increases from 0.1 to 1.0. When the ROF is set to 0.4 as in [5], the NIP increases by more than 3.5 dB by combining eight independent 256 QAM signals using the IDFT. It is known that, due to the nonlinear characteristics of the power amplifier (PA), a high peak normalized instantaneous power leads to signal distortion, which significantly reduces the power efficiency of the PA. Thus, peak instantaneous power reduction techniques become necessary in order to improve PA efficiency for OAM multiplexing.
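As a rough illustration of how an empirical CCDF of the NIP can be estimated, the sketch below compares symbol-rate 256 QAM samples with their 8-mode IDFT combination. It is a simplification and not the setup behind Figs. 1 and 2: no upsampling or RRC pulse shaping is applied, so the numbers differ from the ones reported above. The helper name `nip_ccdf`, the sample count, and the 4.5 dB threshold are arbitrary choices.

```python
import numpy as np

def nip_ccdf(samples: np.ndarray, threshold_db: float) -> float:
    """Empirical Pr(|s|^2 / mean(|s|^2) > threshold), threshold given in dB."""
    power = np.abs(samples) ** 2
    nip = power / power.mean()          # normalize by the empirical average power
    return float(np.mean(nip > 10 ** (threshold_db / 10)))

rng = np.random.default_rng(1)
levels = np.arange(-15, 16, 2)          # per-axis levels of 256 QAM
N, n_sym = 8, 20_000
x = rng.choice(levels, (n_sym, N)) + 1j * rng.choice(levels, (n_sym, N))

# Unitary IDFT across the N modes for every symbol period (no pulse shaping).
s_oam = np.fft.ifft(x, axis=1) * np.sqrt(N)

ccdf_siso = nip_ccdf(x.ravel(), 4.5)     # single-stream QAM samples
ccdf_oam = nip_ccdf(s_oam.ravel(), 4.5)  # IDFT-combined samples
```

At symbol rate the largest 256 QAM constellation point sits about 4.2 dB above the average power, so the single-stream CCDF is zero at 4.5 dB, whereas the IDFT-combined samples still exceed that level a few percent of the time.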

III. DELIBERATE CLIPPING FOR PEAK POWER REDUCTION
To reduce the peak instantaneous power of the OAM signals, we present a deliberate amplitude clipping method.
Let $J \geq 1$ be a scale factor, and define the signal vector $\mathbf{z}[n] = (z^{(0)}[n], \ldots, z^{(JN-1)}[n])^T$ by
$$ z^{(k)}[n] = \frac{1}{\sqrt{JN}} \sum_{l=0}^{N-1} x^{(l)}[n]\, e^{j 2\pi k l / (JN)} \tag{7} $$
for $k = 0, 1, \ldots, JN - 1$. In general, $J$ does not need to be an integer as long as $JN$ is an integer. Note that $\mathbf{z}[n]$ is the IDFT of a vector obtained by zero-padding $\mathbf{x}[n]$ to length $JN$ and it can be regarded as a signal for OAM transmission with $JN$-UCAs. Each signal sample $z^{(k)}[n]$ is then clipped by the following soft-envelope limiter:
$$ \tilde{z}^{(k)}[n] = \begin{cases} z^{(k)}[n], & |z^{(k)}[n]| < A, \\ A\, e^{j \arg(z^{(k)}[n])}, & |z^{(k)}[n]| \geq A, \end{cases} \tag{8} $$
where a tilde mark placed on top of a symbol represents a clipped version of that symbol. The amplitudes of the clipped samples are thus limited to a predetermined threshold value of $A$. In the following, we denote (8) as $\tilde{z}^{(k)}[n] = g(z^{(k)}[n])$ for short. The clipping ratio $\gamma$ of $g$ is defined as
$$ \gamma = \frac{A}{\sqrt{\mathrm{E}\big[|z^{(k)}[n]|^2\big]}}. \tag{9} $$
It is customary to express the clipping ratio $\gamma$ in units of dB as $\gamma_{\mathrm{dB}} = 20 \log_{10}(\gamma)$. The clipped QAM symbol vector $\tilde{\mathbf{x}}[n]$ is then obtained as
$$ \tilde{\mathbf{x}}[n] = \mathbf{F}_{JN,N}\, g\big(\mathbf{F}_{JN,N}^{\dagger}\, \mathbf{x}[n]\big), \tag{10} $$
where $\mathbf{F}_{JN,N}$ denotes an $N \times JN$ matrix consisting of the first $N$ rows of $\mathbf{F}_{JN}$ and $g(\cdot)$ is applied in an element-wise manner. Fig. 3 is a block diagram of the proposed OAM multiplexing transmitter with the above clipping procedure for instantaneous peak power reduction. The original QAM signals are clipped via (10) and then pre-processed by the $N$-IDFT, as described in (1). In Fig. 4, the difference between the simulated CCDF curves of $J = 1$ and $J = 2$ becomes larger when ROF = 0.4, while there is no significant difference between the curves of $J = 2$ and $J = 4$. Fig. 5 shows the relationship between the clipping ratio $\gamma_{\mathrm{dB}}$ and the NIP of the OAM signals observed at a CCDF of $10^{-5}$. Similarly to Fig. 4, there is no significant difference between the curves of $J = 2$ and $J = 4$. Note that when ROF = 0.4, the NIP for $(J, \gamma_{\mathrm{dB}}) = (1, 6.0)$ is almost the same as that for $(J, \gamma_{\mathrm{dB}}) = (2, 4.0)$. The difference in NIP between $J = 1$ and $J = 2$ at the same clipping ratio increases as the clipping ratio decreases. As we will show by simulations in Section V, the in-band distortion of the transmit signal for $J = 1$ is much larger than those for $J = 2$ and $J = 4$, while there is no significant difference between $J = 2$ and $J = 4$.
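The clipping procedure of this section (zero-padded IDFT, soft-envelope limiting, and projection back onto the first N DFT outputs) can be sketched as follows. This is a minimal illustration under stated assumptions: a unitary DFT normalization, a threshold set as $A = \gamma$ times the RMS value of the oversampled signal, and an average symbol power passed in explicitly (170 for 256 QAM on the odd-integer grid). The names `soft_limiter`, `clip_symbols`, and `sym_power` are hypothetical.

```python
import numpy as np

def soft_limiter(z: np.ndarray, a: float) -> np.ndarray:
    """Soft-envelope limiter: keep the phase, cap the magnitude at a."""
    mag = np.abs(z)
    scale = np.where(mag > a, a / np.maximum(mag, np.finfo(float).tiny), 1.0)
    return z * scale

def clip_symbols(x, j_factor, gamma, sym_power):
    """Clip N symbols in the JN-point IDFT domain and map back to N symbols."""
    n = x.shape[0]
    jn = int(round(j_factor * n))
    z = np.fft.ifft(x, n=jn) * np.sqrt(jn)        # IDFT of x zero-padded to length JN
    a = gamma * np.sqrt(sym_power * n / jn)       # threshold A = gamma * rms(z)
    z_clip = soft_limiter(z, a)
    x_clip = (np.fft.fft(z_clip) / np.sqrt(jn))[:n]  # keep the first N DFT outputs
    return x_clip, z_clip, a

rng = np.random.default_rng(3)
levels = np.arange(-15, 16, 2)                    # per-axis levels of 256 QAM
N = 8
x = rng.choice(levels, N) + 1j * rng.choice(levels, N)

gamma = 1.5
gamma_db = 20 * np.log10(gamma)                   # about 3.5 dB
x_clip, z_clip, a = clip_symbols(x, j_factor=2, gamma=gamma, sym_power=170.0)
x_unclipped, _, _ = clip_symbols(x, j_factor=2, gamma=100.0, sym_power=170.0)
```

With a very large clipping ratio nothing is clipped and the symbols pass through unchanged, which is a convenient sanity check of the normalization.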

A. UNFOLDING THE CLIPPING NOISE CANCELLATION ALGORITHM
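In this section, we present a method for estimating the originally transmitted QAM symbol vector $\mathbf{x}[n]$ from the post-processed received signal vector $\mathbf{y}[n]$. For simplicity of notation, we will drop the index $n$ in the following discussion. The received signal vector $\mathbf{y}$ can be written as
$$ \mathbf{y} = \mathbf{H} \tilde{\mathbf{x}} + \mathbf{w}, \tag{11} $$
where $\mathbf{w} = (w^{(0)}, w^{(1)}, \ldots, w^{(N-1)})^T$ is a complex white Gaussian noise vector. Let us consider the linear minimum-mean square error (LMMSE) estimate of $\tilde{z}^{(k)}$ given $z^{(k)}$. Denoting the error in the LMMSE estimate by $\varepsilon^{(k)}$, $\tilde{z}^{(k)}$ can be written as
$$ \tilde{z}^{(k)} = \alpha^{(k)} z^{(k)} + \varepsilon^{(k)}, \tag{12} $$
where $\alpha^{(k)} = \mathrm{E}[\tilde{z}^{(k)} z^{(k)*}] / \mathrm{E}[|z^{(k)}|^2]$ can be considered not to depend on the antenna index $k$. In the following, we refer to it as the Bussgang coefficient and denote it by $\alpha$. It is known that for sufficiently large $N$, the probability distribution of the magnitude of $z^{(k)}$ is well described by a Rayleigh distribution and that the following expression for $\alpha$ is obtained [17], [21], [25]:
$$ \alpha = 1 - e^{-\gamma^2} + \frac{\sqrt{\pi}\, \gamma}{2}\, \mathrm{erfc}(\gamma). \tag{13} $$
The clipped symbol vector $\tilde{\mathbf{x}}$ of (10) can then be rewritten as
$$ \tilde{\mathbf{x}} = \alpha \mathbf{x} + \boldsymbol{\varepsilon}, \tag{14} $$
where $\boldsymbol{\varepsilon} = \mathbf{F}_{JN,N} (\varepsilon^{(0)}, \ldots, \varepsilon^{(JN-1)})^T$. Since $\boldsymbol{\varepsilon}$ is a deterministic function of the originally transmitted QAM symbol vector $\mathbf{x}$, we often denote it by $\boldsymbol{\varepsilon}(\mathbf{x})$, i.e.,
$$ \boldsymbol{\varepsilon}(\mathbf{x}) = \mathbf{F}_{JN,N}\, g\big(\mathbf{F}_{JN,N}^{\dagger} \mathbf{x}\big) - \alpha \mathbf{x}. \tag{15} $$
In accordance with (11) and (14), $\mathbf{x}$ can be estimated by solving the following optimization problem:
$$ \hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathcal{Q}^N} \big\| \mathbf{y}_{\varepsilon} - \alpha \mathbf{H} \mathbf{x} \big\|_2^2, \tag{16} $$
where $\mathbf{y}_{\varepsilon} = \mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}(\mathbf{x})$ and $\mathcal{Q}$ denotes the set of the QAM constellation points. If the receiver can observe $\mathbf{y}_{\varepsilon}$ (i.e., $\boldsymbol{\varepsilon}(\mathbf{x})$ is known at the receiver), the problem can be reduced to the OAM detection problem discussed in [7]. In a realistic scenario, however, $\mathbf{x}$ should be obtained even if the exact value of $\boldsymbol{\varepsilon}$ is not known at the receiver. The clipping noise cancellation (CNC) algorithm presented by Chen and Haimovich [21] for OFDM systems can be used to find a solution to the above problem. Here, the number of subcarriers in the OFDM system is a counterpart of the number of transmission modes (i.e., the number of antenna elements $N$) in the OAM multiplexing system. In the CNC algorithm, the originally transmitted symbol vector $\mathbf{x}$ and the clipping noise vector $\boldsymbol{\varepsilon}$ are alternately and iteratively estimated.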
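The closed-form Bussgang coefficient for a Rayleigh-distributed envelope, $\alpha = 1 - e^{-\gamma^2} + (\sqrt{\pi}\gamma/2)\,\mathrm{erfc}(\gamma)$, can be cross-checked against a Monte Carlo estimate of $\mathrm{E}[\tilde{z} z^*]/\mathrm{E}[|z|^2]$ for a unit-power complex Gaussian input. The sketch below is only such a check under the Gaussian assumption; the function names and the sample size are arbitrary.

```python
import numpy as np
from math import erfc, exp, pi, sqrt

def bussgang_alpha(gamma: float) -> float:
    """Closed-form Bussgang coefficient of the soft limiter for a Gaussian input."""
    return 1.0 - exp(-gamma**2) + sqrt(pi) * gamma / 2.0 * erfc(gamma)

def empirical_alpha(gamma: float, n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[g(z) z*] / E[|z|^2] for unit-power complex Gaussian z."""
    rng = np.random.default_rng(seed)
    z = (rng.standard_normal(n_samples) + 1j * rng.standard_normal(n_samples)) / np.sqrt(2)
    mag = np.maximum(np.abs(z), np.finfo(float).tiny)
    z_clip = z * np.minimum(1.0, gamma / mag)   # soft limiter with threshold A = gamma
    return float(np.vdot(z, z_clip).real / np.vdot(z, z).real)

alpha_theory = bussgang_alpha(1.5)
alpha_mc = empirical_alpha(1.5)
```

For small N the pre-clipped samples are sums of only a few QAM symbols and are not Gaussian, so the closed form is only an approximation there; this is consistent with the motivation given later for replacing the single coefficient by learned parameters.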
The CNC algorithm for the OAM multiplexing system can be described by the following iterations:
$$ \boldsymbol{\varepsilon}_{\ell+1} = \mathbf{F}_{JN,N}\, g\big(\mathbf{F}_{JN,N}^{\dagger} \mathbf{x}_{\ell}\big) - \alpha \mathbf{x}_{\ell}, \tag{17} $$
$$ \mathbf{x}_{\ell+1} = \Pi_{\mathcal{Q}}\big( \alpha^{-1} D(\mathbf{H}, \mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell+1}) \big), \tag{18} $$
where $\ell$ denotes the iteration number, $\Pi_{\mathcal{Q}}(\cdot)$ is the entry-wise projection onto the QAM constellation set $\mathcal{Q}$, and $D(\mathbf{H}, \cdot)$ denotes a linear MIMO detector function. When the QAM constellation is square, $\Pi_{\mathcal{Q}}(z)$ can be written as $\phi(x) + j\phi(y)$ for $z = x + jy$, where
$$ \phi(x) = \sum_{i=-q}^{q} \mathrm{sign}(x - 2i) \tag{19} $$
for some nonnegative integer $q$, and $\mathrm{sign}(x) = +1$ if $x \geq 0$ and $-1$ otherwise. Furthermore, for simplicity, we use conventional zero-forcing (ZF) as the MIMO detector $D(\mathbf{H}, \cdot)$. In a practical system, the channel matrix $\mathbf{H}$ will be estimated using predetermined pilot symbols. Note that this is not affected by the clipping process when it is assumed that the pilot symbol power is at the same level as the average data symbol power. If the Tx and Rx UCAs are ideally aligned, $D(\mathbf{H}, \mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell})$ can be simply written as
$$ D(\mathbf{H}, \mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell}) = \mathbf{H}^{-1} (\mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell}). \tag{20} $$
Note that if the Tx and Rx UCAs are not ideally aligned, the OAM modes are not orthogonal to each other. In this case, (20) does not perform well, and as a result, an inter-mode interference cancellation function must be integrated into the MIMO detector, as discussed in [7]. At the first iteration of (17), $\mathbf{x}_0$ is set to $\mathbf{0}$ as the initial condition, and thus, the transmitted signal vector is estimated as $\mathbf{x}_1 = \Pi_{\mathcal{Q}}((\alpha \mathbf{H})^{-1} \mathbf{y})$. Then at a subsequent iteration $\ell$, the estimate $\boldsymbol{\varepsilon}_{\ell+1}$ of the clipping noise vector is obtained from the given $\mathbf{x}_{\ell}$ by (17). The estimate $\mathbf{x}_{\ell}$ is then updated to $\mathbf{x}_{\ell+1}$ by using $\boldsymbol{\varepsilon}_{\ell+1}$, as described in (18). Note that the iterations (17) and (18) contain the parameter $\alpha$, which must be determined in advance. It was shown in [21], [23] that for the OFDM counterpart with a sufficiently large number of subcarriers, the clipping noise cancellation in [21] performs well with $\alpha$ determined by (13). For UCA-based OAM systems, however, a realistic number of transmission modes $N$ seems to be not large enough for the iterations (17) and (18) to perform well with $\alpha$ of (13).
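The alternating estimation of the symbol vector and the clipping noise can be sketched as below for the aligned (diagonal channel) case with ZF detection. This is an illustrative reading of iterations (17) and (18), not a reference implementation: the unitary DFT normalization, the threshold convention $A = \gamma \cdot \mathrm{rms}$, the noise-free identity channel in the demo, and all function names are assumptions.

```python
import numpy as np
from math import erfc, exp, pi, sqrt

def bussgang_alpha(gamma):
    return 1.0 - exp(-gamma**2) + sqrt(pi) * gamma / 2.0 * erfc(gamma)

def clip_noise(x, alpha, jn, a):
    """Clipping noise of x: clip in the JN-point IDFT domain, map back, subtract alpha*x."""
    n = x.shape[0]
    z = np.fft.ifft(x, n=jn) * np.sqrt(jn)
    mag = np.abs(z)
    z_clip = z * np.where(mag > a, a / np.maximum(mag, np.finfo(float).tiny), 1.0)
    return (np.fft.fft(z_clip) / np.sqrt(jn))[:n] - alpha * x

def qam_project(v, m):
    """Entry-wise projection onto the square QAM grid {+-1, ..., +-(2^m - 1)} per axis."""
    top = 2**m - 1
    axis = lambda u: np.clip(2 * np.floor(u / 2) + 1, -top, top)
    return axis(v.real) + 1j * axis(v.imag)

def cnc(y, h_diag, alpha, jn, a, m, n_iter):
    """Conventional CNC with ZF detection for a diagonal channel, starting from x_0 = 0."""
    x_hat = np.zeros_like(y)
    for _ in range(n_iter):
        eps = clip_noise(x_hat, alpha, jn, a)               # estimate clipping noise
        x_hat = qam_project((y / h_diag - eps) / alpha, m)  # ZF, rescale, hard decision
    return x_hat

rng = np.random.default_rng(2)
m, N, J = 4, 8, 2                                   # 256 QAM, 8 modes, scale factor 2
levels = np.arange(-(2**m - 1), 2**m, 2)
x = rng.choice(levels, N) + 1j * rng.choice(levels, N)
sym_power = 2 * np.mean(levels.astype(float) ** 2)  # 170 for 256 QAM
h_diag = np.ones(N, dtype=complex)                  # idealized noise-free channel

def run(gamma, n_iter):
    a = gamma * np.sqrt(sym_power / J)
    alpha = bussgang_alpha(gamma)
    y = h_diag * (clip_noise(x, alpha, J * N, a) + alpha * x)  # y = H * clipped symbols
    return cnc(y, h_diag, alpha, J * N, a, m, n_iter)

x_hat_noclip = run(gamma=10.0, n_iter=1)   # threshold far above the peaks
x_hat_clip = run(gamma=2.0, n_iter=4)      # actual clipping; recovery is not guaranteed
```

When the threshold lies far above the signal peaks, a single iteration already returns the transmitted symbols; with real clipping the result depends on how well the single coefficient $\alpha$ matches the true distortion.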
Indeed, as we will show in Section V, when it is applied to a realistic system presented in [5], the clipping noise cancellation performs poorly with α given by (13), where the number of transmission modes is 8 and 256 QAM is used as the modulation format.
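To optimize the unknown parameter and improve the clipping noise cancellation performance, we unfold the iterations (17) and (18) into multiple layers and introduce layer-wise parameters that can be trained using a stochastic gradient descent (SGD) method. Fig. 6 is a block diagram of the OAM receiver with the unfolded CNC, which consists of $L$ layers each with the same structure. For $\ell = 0, 1, \ldots, L - 1$, the processing in the $\ell$-th layer is as follows:
$$ \boldsymbol{\varepsilon}_{\ell+1} = \mathbf{F}_{JN,N}\, g\big(\mathbf{F}_{JN,N}^{\dagger} \mathbf{x}_{\ell}\big) - \alpha_{\ell} \mathbf{x}_{\ell}, \tag{21} $$
$$ \mathbf{x}_{\ell+1} = \psi_{\mathcal{Q}, \mathbf{t}_{\ell}}\big( \boldsymbol{\beta}_{\ell} \bullet \mathbf{H}^{-1} (\mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell+1}) \big), \tag{22} $$
where $\alpha_{\ell}$ is a learnable parameter, $\boldsymbol{\beta}_{\ell}$ and $\mathbf{t}_{\ell}$ are learnable parameter vectors, and $\psi_{\mathcal{Q}, \mathbf{t}}(\cdot)$ is a parameterized counterpart of $\Pi_{\mathcal{Q}}(\cdot)$. Similarly to (19), for $z = x + jy$ and a real vector $\mathbf{t} = (t_i)$, $\psi_{\mathcal{Q}, \mathbf{t}}(z)$ is given as $\psi_{\mathbf{t}}(x) + j\psi_{\mathbf{t}}(y)$, where $\psi_{\mathbf{t}}(\cdot)$ is the counterpart of the scalar function in (19) that is constructed from ReLU functions and parameterized by $t_i$, $-q \leq i \leq q$ (23), and $R(x)$ denotes the ReLU function. The following pseudocode summarizes the unfolded CNC algorithm.
Algorithm 1 Unfolded CNC
Input: $\mathbf{y}$, $\mathbf{H}$
Output: $\hat{\mathbf{x}}$
1: $\mathbf{x}_0 \leftarrow \mathbf{0}$
2: for $\ell = 0$ to $L - 1$ do
3: $\boldsymbol{\varepsilon}_{\ell+1} \leftarrow \mathbf{F}_{JN,N}\, g(\mathbf{F}_{JN,N}^{\dagger} \mathbf{x}_{\ell}) - \alpha_{\ell} \mathbf{x}_{\ell}$
4: $\mathbf{x}_{\ell+1} \leftarrow \psi_{\mathcal{Q}, \mathbf{t}_{\ell}}\big( \boldsymbol{\beta}_{\ell} \bullet \mathbf{H}^{-1} (\mathbf{y} - \mathbf{H} \boldsymbol{\varepsilon}_{\ell+1}) \big)$
5: end for
6: $\hat{\mathbf{x}} \leftarrow \mathbf{x}_L$
For each layer $\ell = 0, 1, \ldots, L - 1$ in the above algorithm, the learnable parameters are $\alpha_{\ell} \in \mathbb{R}$, $\boldsymbol{\beta}_{\ell} \in \mathbb{R}^N$, and $\mathbf{t}_{\ell} \in \mathbb{R}^{2q+1}$, where we may set $\alpha_0$ to 1 since $\mathbf{x}_0 = \mathbf{0}$. The loss function we use for training these parameters is given by
$$ E = \| \mathbf{x} - \hat{\mathbf{x}} \|_2^2, \tag{24} $$
where $\mathbf{x} = (x^{(0)}, x^{(1)}, \ldots, x^{(N-1)})^T$ is the originally transmitted symbol vector and $\hat{\mathbf{x}} = (\hat{x}^{(0)}, \hat{x}^{(1)}, \ldots, \hat{x}^{(N-1)})^T$ is its estimate obtained from $\mathbf{y}$ by the unfolded CNC algorithm. Note that the total number of parameters to be learned in the above unfolded CNC algorithm is $(2q + 2 + N)L - 1$. The number can be reduced by imposing restrictions on the parameters, which lowers the training complexity but may lead to a performance degradation. For example, when imposing the restriction $\boldsymbol{\beta}_{\ell} = (\alpha_{\ell}^{-1}, \alpha_{\ell}^{-1}, \ldots, \alpha_{\ell}^{-1})$ for $1 \leq \ell \leq L - 1$, the number is reduced to $(2q + 2)L$. Another restriction is $\psi_{\mathbf{t}}(-x) = -\psi_{\mathbf{t}}(x)$, i.e., $t_{-i} = t_i$. In this case, the number of parameters to be learned is reduced to $(q + N + 2)L - 1$. Furthermore, by imposing $t_0 = t_i$ for all $-q \leq i \leq q$, the number becomes $(N + 2)L - 1$.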
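As a worked example of the parameter counts above, the following sketch evaluates them for the configuration used later in the simulations (N = 8, 256 QAM so that $2q + 1 = 2^m - 1$ with $m = 4$, and L = 16). The function name and the `t_mode` argument are hypothetical; the formulas are the ones stated in the text.

```python
def unfolded_cnc_param_count(n: int, q: int, layers: int, t_mode: str = "full") -> int:
    """Number of learnable parameters of the unfolded CNC.

    Per layer: one alpha, n entries of beta, and the entries of t.
    alpha_0 is fixed to 1, hence the final "- 1".
    t_mode: "full" (2q+1 entries), "symmetric" (t_{-i} = t_i, q+1 entries),
            or "single" (all t_i equal, 1 entry).
    """
    t_entries = {"full": 2 * q + 1, "symmetric": q + 1, "single": 1}[t_mode]
    return (t_entries + 1 + n) * layers - 1

m = 4                       # 2^(2m) = 256 QAM
q = 2 ** (m - 1) - 1        # from 2q + 1 = 2^m - 1, so q = 7
N, L = 8, 16

full = unfolded_cnc_param_count(N, q, L, "full")            # (2q + 2 + N)L - 1
symmetric = unfolded_cnc_param_count(N, q, L, "symmetric")  # (q + N + 2)L - 1
single = unfolded_cnc_param_count(N, q, L, "single")        # (N + 2)L - 1
```

For this configuration the counts are 383, 271, and 159, respectively, so even the unrestricted model is very small compared with generic neural network detectors.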
The computational complexity of each iteration in Algorithm 1 is the sum of the complexity of the DFT/IDFT in (21) and the MIMO detection in (22). The former complexity is $O(JN^2)$, which can be reduced to $O(JN \log_2(JN))$ if $JN$ is a power of 2. The latter complexity is $O(N)$ if the Tx and Rx UCAs are ideally aligned and otherwise $O(N^2)$ as shown in [7]. The total complexity is thus at most $O(L(J + 1)N^2)$.

B. OPTIMIZING PARAMETERS
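We begin this section by introducing notation. For a given complex-valued matrix $\mathbf{A}$, let $\bar{\mathbf{A}}$ denote its real-valued counterpart defined by
$$ \bar{\mathbf{A}} = \begin{pmatrix} \Re(\mathbf{A}) & -\Im(\mathbf{A}) \\ \Im(\mathbf{A}) & \Re(\mathbf{A}) \end{pmatrix}. $$
Similarly, for a given complex-valued column vector $\mathbf{a}$, we denote by $\bar{\mathbf{a}}$ its real-valued counterpart, i.e.,
$$ \bar{\mathbf{a}} = \begin{pmatrix} \Re(\mathbf{a}) \\ \Im(\mathbf{a}) \end{pmatrix}. $$
Lines 3 and 4 of Algorithm 1 can be rewritten using this notation by replacing every matrix and vector with its real-valued counterpart. Furthermore, for a vector $\mathbf{u} = (u_i)$, $\partial E / \partial \mathbf{u}$ denotes the gradient (column) vector of the loss function of (24) w.r.t. $\mathbf{u}$, and for two vectors $\mathbf{v} = (v_i)$ and $\mathbf{w} = (w_i)$, $\partial \mathbf{v} / \partial \mathbf{w}$ denotes a matrix whose $(i, j)$-th entry is $\partial v_j / \partial w_i$. Then, the gradient $\partial E / \partial \alpha_{\ell}$ for $\ell = 0, 1, \ldots, L - 1$, can be computed as follows:
$$ \frac{\partial E}{\partial \alpha_{\ell}} = \bar{\mathbf{x}}_{\ell}^T\, \bar{\mathbf{B}}_{\ell}\, \frac{\partial \psi_{\mathbf{t}_{\ell}}(\bar{\mathbf{u}}_{\ell+1})}{\partial \bar{\mathbf{u}}_{\ell+1}}\, \frac{\partial E}{\partial \bar{\mathbf{x}}_{\ell+1}}, $$
where $\mathbf{B}_{\ell} = \mathrm{diag}(\boldsymbol{\beta}_{\ell})$ and $\bar{\mathbf{u}}_{\ell+1} = \bar{\mathbf{B}}_{\ell} \bar{\mathbf{H}}^{-1} (\bar{\mathbf{y}} - \bar{\mathbf{H}} \bar{\boldsymbol{\varepsilon}}_{\ell+1})$. For $\mathbf{u} = (u_i) \in \mathbb{R}^{2N}$, $\partial \psi_{\mathbf{t}}(\mathbf{u}) / \partial \mathbf{u}$ becomes a diagonal matrix, whose $i$-th diagonal entry is the scalar derivative $\partial \psi_{\mathbf{t}}(u_i) / \partial u_i$. For the parameter vector $\boldsymbol{\beta}_{\ell}$, we have
$$ \frac{\partial E}{\partial \boldsymbol{\beta}_{\ell}} = \frac{\partial \bar{\mathbf{u}}_{\ell+1}}{\partial \boldsymbol{\beta}_{\ell}}\, \frac{\partial \psi_{\mathbf{t}_{\ell}}(\bar{\mathbf{u}}_{\ell+1})}{\partial \bar{\mathbf{u}}_{\ell+1}}\, \frac{\partial E}{\partial \bar{\mathbf{x}}_{\ell+1}}. $$
Note that the $(i, j)$-th entry of $\partial \bar{\mathbf{u}}_{\ell+1} / \partial \boldsymbol{\beta}_{\ell}$ is given by $v_{\ell+1,j}$ if $j = i$ or $j = i + N$ and by 0 otherwise, where $v_{\ell+1,j}$ is the $j$-th entry of $\bar{\mathbf{H}}^{-1} (\bar{\mathbf{y}} - \bar{\mathbf{H}} \bar{\boldsymbol{\varepsilon}}_{\ell+1})$. For the parameter vector $\mathbf{t}_{\ell}$, we have
$$ \frac{\partial E}{\partial \mathbf{t}_{\ell}} = \frac{\partial \psi_{\mathbf{t}_{\ell}}(\bar{\mathbf{u}}_{\ell+1})}{\partial \mathbf{t}_{\ell}}\, \frac{\partial E}{\partial \bar{\mathbf{x}}_{\ell+1}}, $$
where the $(i, j)$-th entry of $\partial \psi_{\mathbf{t}}(\mathbf{u}) / \partial \mathbf{t}$ is $\partial \psi_{\mathbf{t}}(u_j) / \partial t_i$ for $u_j, t_i \in \mathbb{R}$. For $z = x + jy$, let $g_R(x, y)$ and $g_I(x, y)$ denote, respectively, the real and imaginary parts of the soft-envelope limiter function $g(z)$. Let $Dg(z)$ denote the Jacobian matrix of the map $(g_R, g_I)$ at $(x, y)$, i.e.,
$$ Dg(z) = \frac{A}{|z|^3} \begin{pmatrix} y^2 & -xy \\ -xy & x^2 \end{pmatrix} $$
if $|z| \geq A$ and $Dg(z) = \mathbf{I}$, otherwise. For a complex vector $\mathbf{z} = (z_i)$, we denote by $\mathrm{diag}(Dg(\mathbf{z}))$ a real block diagonal matrix whose diagonal entries are of the form $Dg(z_i)$. Then, $\partial E / \partial \bar{\mathbf{x}}_{\ell}$ can be computed inductively over the layers by the chain rule.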
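The Jacobian of the soft-envelope limiter with respect to the real and imaginary parts of its input follows from differentiating $g_R = Ax/|z|$ and $g_I = Ay/|z|$ in the clipped region. The sketch below writes this out and compares it with central finite differences; the function names and the test point are arbitrary, and the closed form is our own derivation from the limiter definition rather than a quotation.

```python
import numpy as np

def limiter(x: float, y: float, a: float) -> np.ndarray:
    """Real and imaginary parts (g_R, g_I) of the soft-envelope limiter at z = x + jy."""
    r = np.hypot(x, y)
    if r < a:
        return np.array([x, y])
    return np.array([a * x / r, a * y / r])

def limiter_jacobian(x: float, y: float, a: float) -> np.ndarray:
    """Jacobian of (g_R, g_I) w.r.t. (x, y): identity below the threshold."""
    r = np.hypot(x, y)
    if r < a:
        return np.eye(2)
    return (a / r**3) * np.array([[y * y, -x * y],
                                  [-x * y, x * x]])

def numeric_jacobian(x, y, a, h=1e-6):
    """Central finite-difference approximation of the same Jacobian."""
    d_dx = (limiter(x + h, y, a) - limiter(x - h, y, a)) / (2 * h)
    d_dy = (limiter(x, y + h, a) - limiter(x, y - h, a)) / (2 * h)
    return np.column_stack([d_dx, d_dy])   # columns: derivative w.r.t. x, then y

jac_clipped = limiter_jacobian(3.0, 4.0, 2.0)   # |z| = 5 > A = 2
jac_numeric = numeric_jacobian(3.0, 4.0, 2.0)
jac_linear = limiter_jacobian(0.3, 0.4, 2.0)    # |z| = 0.5 < A = 2
```

In the clipped region the Jacobian has rank one: radial perturbations are removed by the limiter and only the tangential (phase) component passes through, scaled by $A/|z|$.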

V. SIMULATION RESULTS
This section provides simulation results that demonstrate the performance of the unfolded CNC algorithm. We consider as a typical example the UCA-based OAM transmission system in [5], [7]. In our simulations, unless stated otherwise, the number of antenna elements $N$ was set to 8; the distance separating the Tx and Rx antennas was set to 40 m; the carrier frequency was set to 84.5 GHz; and the radius of the UCA was set to 0.265 m. These conditions were selected with reference to those of the field trial experiment reported in [5]. In addition, for simplicity, the channel matrix $\mathbf{H}$ was assumed to be known at the Rx side. In practice, this assumption may not be true, and the receiver needs to estimate $\mathbf{H}$ via pilot symbols [5]. Note that assuming that the pilot symbol power is at the same level as the average data symbol power, the estimation of $\mathbf{H}$ will not be affected by the clipping process. Furthermore, equal power allocation over all transmission modes was assumed, and Gray-coded $2^{2m}$ QAM with constellation points $\mathcal{Q} = \{x + jy \mid x, y \in \{\pm 1, \pm 3, \ldots, \pm(2^m - 1)\}\}$ was used for all modes. Note that, in this case, the number of entries of the parameter vector $\mathbf{t}$ in (23) is $2q + 1 = 2^m - 1$. As noted in Section IV-A, for each $\ell = 0, 1, \ldots, L - 1$, we imposed a restriction on the parameter vector $\mathbf{t}_{\ell} = (t_{\ell,i})$, i.e., $t_{\ell,-i} = t_{\ell,i}$, $i = 1, 2, \ldots, 2^{m-1} - 1$. Accordingly, the total number of parameters to be learned was $(2^{m-1} + 1 + N)L - 1$, where the number of layers $L$ was set to $L = 2, 4, 8, 16$. The Adam optimizer [27] with a learning rate of $10^{-3}$ was used for training these parameters. The training and testing processes were implemented using TensorFlow [26] on an NVIDIA GeForce RTX 2080 with 8 GPU cores and 2 GPUs each with 8 GB RAM. The batch size and number of training epochs used for the training were set to 4,096 and 40,000, respectively.
Fig. 7 shows the simulated BER performance of the unfolded CNC algorithm with the trained parameters set for $L = 2, 4, 8, 16$, where 256 QAM was used as the modulation scheme, the scale factor $J$ for clipping processing was set to $J = 2$, and the clipping ratio $\gamma$ was set to $\gamma = 1.50$ (left) and $\gamma = 2.00$ (right). In each epoch of the training process, the SNR per transmit symbol $E_s/N_0$ was chosen randomly from a uniform distribution between 23 dB and 27 dB. For purposes of comparison, the figure also shows the simulated BER performance of the unclipped case and that of the conventional CNC algorithm described by (17) and (18) with the Bussgang coefficient $\alpha$ computed by using (13). We can see that a significant improvement in the BER of the unfolded CNC can be gained by increasing the number of layers to $L = 16$, while the BER of the conventional CNC saturates after $L = 2$ iterations and suffers from a high error floor. When $\gamma = 1.50$, there is a gap between the BER performance of the unfolded CNC with $L = 16$ and that of the unclipped case, and the gap at a BER of $10^{-3}$ is about 0.8 dB. When $\gamma = 2.00$, near optimal performance can be achieved by the unfolded CNC after a small number of iterations. At a BER of $10^{-3}$, there is negligible difference in performance between the unfolded CNC with $L \geq 4$ and the unclipped case. Note that the conventional CNC suffers from a high error floor even when $\gamma = 2.00$. It can also be seen that in the low SNR region, the BERs of the CNC algorithms are slightly lower than that of the unclipped case. This is because the clipping of the OAM signals leads to a reduction in the transmit signal power.
Similarly, Fig. 8 shows the simulated BER performance of the unfolded CNC algorithm, where 64 QAM was used as a modulation scheme and the clipping ratio γ was set to γ = 1.26 (left) and γ = 1.50 (right). In each epoch of the training process, the SNR per transmit symbol E s /N 0 was chosen randomly from a uniform distribution between 17 dB and 21 dB. The other simulation conditions were the same as those of Fig. 7. It can be seen that a significant improvement in the BER of the unfolded CNC can be gained by increasing the number of layers while the conventional CNC suffers from a high error floor. When γ = 1.26, there is a gap between the BER performance of the unfolded CNC with L = 16 and that of the unclipped case, and the gap at a BER of 10 −3 is about 1.1 dB. When γ = 1.50, near optimal performance can be achieved by the unfolded CNC, and at a BER of 10 −3 , there is negligible difference in performance between the unfolded CNC with L ≥ 4 and the unclipped case. Fig. 9 plots the E s /N 0 required for the unfolded CNC algorithm to achieve a BER of 10 −3 versus the clipping ratio γ dB for L = 2, 4, 8, 16, where the modulation schemes used were 256 QAM (left) and 64 QAM (right). It can be seen from Fig. 9 (left) that for γ dB ≥ 3.5, the SNR loss relative to the unclipped case can be reduced to less than 1.0 dB after at most 16 iterations. Note that from Figs. 4 and 5, it is clear that applying the proposed clipping process with γ dB = 3.5 and J = 2 can reduce the NIP at a CCDF of 10 −5 by about 2.5 dB when ROF = 0.4. Similarly, it can be seen from Fig. 9 (right) that for γ dB ≥ 2.5, the SNR loss relative to the unclipped case can be reduced to less than 1.0 dB after at most 16 iterations. It can thus be concluded that the proposed clipping and distortion recovery schemes perform effectively with 256 QAM and 64 QAM signaling.
Next, we show the clipping noise cancellation performance of the proposed method for the cases of $J = 1$ and $J = 4$. Fig. 10 (left) plots the simulated BER performance of the unfolded CNC algorithm, where the transmit signals were clipped with $J = 4$; the clipping ratio was set to $\gamma = 1.50$; and the other simulation conditions were the same as those of Fig. 7. For comparison purposes, Fig. 10 (left) also plots the performance for the case of $J = 2$ and $\gamma = 1.50$. We can see that the performance of $J = 4$ is superior to that of $J = 2$, especially when the number of layers $L$ is small. Thus it seems better to set $J$ to a large value if the computational complexity allows. Fig. 10 (right) plots the simulated BER performance for the case of $(J, \gamma) = (1, 2.00)$. The figure also plots the performance for the case of $(J, \gamma) = (2, 1.58)$ for comparison purposes. Note here that, when ROF = 0.4, the NIP of the transmitted OAM signals clipped with $J = 1$ and $\gamma = 2.00$ ($\gamma_{\mathrm{dB}} = 6.0$) is almost the same as that of the OAM signals clipped with $J = 2$ and $\gamma = 1.58$ ($\gamma_{\mathrm{dB}} = 4.0$). As can be seen from the figure, the unfolded CNC algorithm suffers from high error floors when $J = 1$. The OAM signals clipped with $J = 1$ are not restored even by the unfolded CNC, whereas the signals clipped with $J = 2$ are successfully restored. The clipping process with $J = 1$ thus leads to a large amount of distortion compared with that with $J = 2$ even when the resulting NIPs are almost the same.
Next we show the simulated BER performance of N -UCA based 256 QAM OAM receiver with the proposed clipping method when the number of antenna elements N is 16, 32 and 64. In the simulation for the case of N = 16, the distance d between the Tx and Rx antennas, the carrier frequency f c , and the radius r of the UCA were set to (d, f c , r) = (40 m, 157 GHz, 0.271 m), and in the simulations for N = 32 and 64, they were set to (40 m, 157 GHz, 0.413 m) and (15 m, 157 GHz, 0.360 m), respectively. These conditions were selected so that they can maximize the channel capacity under practical constraints [7]. Fig. 11 shows the simulated BER performance of the unfolded and the conventional CNC algorithms for the cases of N = 16, 32, 64, where the clipping ratio and the scale factor J were set to γ = 1.50 and J = 2, respectively, and the number of layers was set to L = 8. Shown for comparison are the simulated performance for the case of N = 8 that were plotted in Fig. 7 (left). We can see that similarly to the case of N = 8, the proposed clipping and distortion recovery schemes can achieve good performance for N = 16, 32, 64.
Finally, while we have so far assumed for simplicity that there is no inter-mode interference (IMI), we show by simulations that the proposed clipping and distortion recovery methods work well even when there is IMI due to antenna misalignment. As discussed in [7], when the Tx and Rx UCAs are not ideally aligned, the channel matrix $\mathbf{H}$ is no longer diagonal. For each OAM mode-$k$, the signal-to-interference ratio (SIR) due to the antenna misalignment can be evaluated by using the entries $H^{(k,l)}$ of the matrix $\mathbf{H}$ as
$$ \mathrm{SIR}(k) = \frac{|H^{(k,k)}|^2}{\sum_{l=0,\, l \neq k}^{N-1} |H^{(k,l)}|^2}. $$
We used the average of $\mathrm{SIR}(k)$ over $k = 0, 1, \ldots, N - 1$ to measure the amount of IMI. Fig. 12 shows the simulated BER performance of the unfolded CNC algorithm, where instead of the ZF-based MIMO detection of (20), we employed an iteration of the learning-based OAM-MIMO detection algorithm presented in [7] in order to cancel the IMI. In the simulations of Fig. 12, the antenna misalignments were set randomly so that the resulting SIR becomes around 6 dB and 10 dB. The clipping ratio was set to $\gamma = 1.78$ ($\gamma_{\mathrm{dB}} = 5.0$) and 1.50 ($\gamma_{\mathrm{dB}} = 3.5$). The number of layers $L$ was set to 16, and the other conditions were the same as those of Fig. 7. From Fig. 12 it can be seen that when $\gamma = 1.78$, the BER performances for the cases of SIR = 6 dB and 10 dB only slightly deteriorate compared with that of the IMI-free case (SIR = $\infty$). When $\gamma = 1.50$, although the slopes of the BER curves for SIR = 6 dB and 10 dB decrease gradually as $E_s/N_0$ increases, the $E_s/N_0$ required to achieve a BER of $10^{-3}$ deteriorates by only 0.5 dB even for SIR = 6 dB. It can thus be concluded that our clipping method works effectively, even in IMI environments.
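The per-mode SIR defined above is straightforward to compute from a mode-domain channel matrix. The sketch below does so for a toy matrix (identity plus a uniform leakage of 0.1 between all modes), which is purely illustrative and not a model of a misaligned UCA; the function name `mode_sir_db` is hypothetical.

```python
import numpy as np

def mode_sir_db(h: np.ndarray) -> np.ndarray:
    """Per-mode SIR in dB: |H[k,k]|^2 divided by the sum over l != k of |H[k,l]|^2."""
    p = np.abs(h) ** 2
    signal = np.diag(p)
    interference = p.sum(axis=1) - signal
    return 10 * np.log10(signal / interference)

N = 8
leak = 0.1
H_toy = np.eye(N, dtype=complex) + leak * (np.ones((N, N)) - np.eye(N))

sir_db = mode_sir_db(H_toy)    # identical for all modes in this toy example
avg_sir_db = sir_db.mean()
```

In this toy case each mode sees seven interferers of power 0.01, giving an SIR of $10\log_{10}(1/0.07) \approx 11.5$ dB. Whether the averaging over modes is done in the linear or the dB domain is not specified in the text; the sketch averages the dB values.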

VI. CONCLUSION
We investigated the statistical distribution of the instantaneous power of UCA-based OAM signals. It has been shown that the OAM signal at each antenna element exhibits high peak power and that a reduction in peak power is needed in order to improve power efficiency. To address this issue, we presented a clipping method and an iterative distortion recovery algorithm for UCA-based OAM systems. The algorithm can be regarded as an unfolded version of the clipping noise cancellation algorithm developed for OFDM systems, in which learnable parameters are introduced in place of the Bussgang coefficient used in the OFDM case.
We performed simulations for UCA-based OAM multiplexing systems with the proposed clipping and distortion recovery schemes. The simulation results showed that the combination of the proposed clipping and distortion recovery schemes provides a significant reduction in peak power of the OAM signals at the cost of only a slight degradation in BER performance. We also showed by simulations that the proposed clipping and distortion recovery methods work well in combination with the OAM-MIMO detection algorithm presented in [7] even when there is inter-mode interference due to antenna misalignment.