Nonlinearity Design With Power-Law Tails for Correlation Detection in Impulsive Noise

Impulsive noise plays an important role in power line communication, among other applications. To improve communication performance, this paper proposes a novel nonlinear processing design that improves the fundamental performance of signal detection in impulsive noise. Power-law tails are introduced into nonlinearity design for the first time to provide adjustable decay factors for different distributions. Four modes of nonlinear functions are developed and analyzed. Taking the exponent and the threshold as two arguments, we formulate the nonlinearity design as the optimization problem of maximizing the efficacy function, which is the fundamental measure for detecting a deterministic signal in impulsive noise. Since the efficacy function is differentiable and unimodal but has no closed-form derivatives, we propose to solve the optimization problem by derivative-free methods, e.g., the Nelder-Mead simplex method. As a proof of concept, our method is applied to three commonly used noise distributions. Results show that our nonlinearity design achieves almost the same efficacy and detection performance as the locally optimal detector, with the advantage of easy-to-apply closed-form expressions.


I. INTRODUCTION
While Gaussian noise is encountered in most systems, impulsive noise requires additional consideration in some scenarios, e.g., long-wave communications and ultrawideband systems [1], [2]. In particular, for power line communication (PLC), a technology that is very attractive for smart grid applications owing to advantages such as no additional installation costs [3], [4], impulsive noise over the PLC channel may severely degrade communication performance [5]. Although multi-carrier modulations, e.g., orthogonal frequency division multiplexing (OFDM), are inherently more resistant to impulsive noise than single-carrier modulations, countermeasures to the performance degradation caused by impulsive noise remain a challenging research area for communication engineers [6].
Impulsive noise has a unimodal probability density function (PDF) that is similar to the Gaussian PDF but has significantly heavier tails. Various impulsive noise models have been developed and used in research and applications, including the symmetric α-stable (SαS) distribution [7], [8], the Middleton Class A/B distribution [9], [10], the Gaussian mixture model, also referred to as the Bernoulli-Gaussian random process [11], [12], and the Poisson distribution of Class A noise and Nakagami-m noise [13]. Recently, a K-component Gaussian mixture model has been used to approximate Class A noise and SαS noise [6], and a hidden Markov Middleton model was adopted to characterize impulsive noise bursts [14].

The associate editor coordinating the review of this manuscript and approving it for publication was Qinghua Guo.
For signal detection in impulsive noise, the maximum likelihood detector (MLD) is optimal; however, it has high computational complexity and requires prior knowledge of the signal amplitude [15]. Therefore, investigators usually consider the locally optimal detector (LOD) for its simple two-step structure, i.e., nonlinear processing followed by linear correlation [16]. At low signal-to-noise ratio (SNR), the LOD is nearly as good as the MLD. However, the LOD for impulsive noise has the drawback that its nonlinear function may be unavailable in closed form. For instance, the SαS model and the Class A model do not admit closed-form PDFs.

VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/

Methods for impulsive noise suppression can be classified into two categories. One is the zero-memory nonlinearity (ZMNL) transformation, which is developed to replace the nonlinearity of the LOD. With a proper design, ZMNL functions can be nearly optimal. Moreover, ZMNL functions are simple and easy to implement in traditional structures. The other is adaptive filtering, which is usually designed under a carefully chosen criterion, such as the least squares criterion [14], the minimum dispersion criterion [17], the maximum correntropy criterion [18], or the logarithmic least mean pth-power criterion [19].
However, because traditional nonlinearity designs emphasize the evaluation of linear-region thresholds, they ignore the design of the tails and preset them instead. The most widely used tails are the blanker and the clipper (also called the soft limiter) [8], [25]; other tails are also used, such as joint blanking/clipping [5], multiple thresholds for blanking or clipping [6], deep clipping [30], the Gaussian tail [20], the algebraic tail [31], and the Cauchy tail [32]. These tails were inspired by the tails of the LOD. Certainly, the LOD function has the optimal ''tail'', which varies with the noise model. However, no effective approach for approximating the tails of different noise models has been proposed so far.
A fixed tail cannot be optimal for every distribution, and the performance of existing tails varies greatly across distributions. For example, the clipper is worse than the blanker for the Class A model but better than the blanker for the SαS model. Moreover, the algebraic-tailed ZMNL (AZMNL) approximates the LOD nonlinearity of the SαS distribution better than the blanker [26]. Our previous work analyzed ZMNL designs based on the algebraic tail [33] and on Gaussianization [34], both of which are only sub-optimal compared with the LOD. These observations show that matching the tail to the distribution is essential for nonlinearity design in impulsive noise.
Tail design provides extra gain on top of threshold optimization. In [35], we improved the Gaussian tail design and obtained nearly optimal performance for SαS noise. Recently, two papers proposed new nonlinear tails. In [6], piecewise attenuation and clipping with multiple thresholds were developed for the K-component Gaussian mixture model; in practical applications, however, the number of thresholds and the performance-complexity tradeoff may be problematic. In [21], the nonlinear function for the SαS distribution is subject to the cumulative distribution function (CDF) set; however, this design is only useful when the signal amplitude is known.
This paper proposes to optimize the tail in the nonlinearity design by means of a power-law function. The decay factor of the tail is varied through the power-law function $x^a$ with $a \le 0$, so as to suit different distributions of impulsive noise. Compared with traditional designs that optimize only the threshold T for a fixed tail, our design optimizes an extra argument, the exponent a. New parametric nonlinear functions are developed in four modes with different properties.
To formulate the nonlinearity design, the objective function is defined as the efficacy function, which is closely related to the output SNR and the detection performance [21], [25]. The design thus becomes an efficacy optimization problem with respect to two arguments $(T, a)$. Analysis and simulations show that the efficacy function is continuous and differentiable, and unimodal with respect to T and a, which makes the problem convenient to solve by numerical derivative-free methods [36], such as Powell's method [37] and the Nelder-Mead simplex (NMS) method [38], [39]. The solution algorithm is provided and evaluated by simulation.
Our work in this paper is summarized as follows:
• First, we introduce the power-law tail for nonlinearity design in impulsive noise. Unlike traditional clipping, blanking, or other fixed tails, the power-law function $x^a$ can vary its decay speed through the exponent a, so that it suits various distributions of impulsive noise.
• Second, we convert the nonlinearity design into the problem of optimizing the power-law parameters to maximize the efficacy. This problem is hard to solve analytically but can easily be solved by a derivative-free method, e.g., the NMS method.
• Third, the power-law nonlinearity is effective for the commonly used models of impulsive noise, e.g., the SαS and Class A models. Although it is given in closed form, the power-law nonlinearity is nearly as good as the LOD, which may incur high computational complexity when the noise PDF has no closed form.

The remainder of this paper is organized as follows. Section II briefly describes the system model and nonlinear processing for impulsive noise. Section III develops the nonlinear functions in four modes. Section IV formulates the nonlinearity design as an optimization problem and analyzes its properties. Section V presents the solution algorithm. Section VI simulates the proposed design in three models of impulsive noise and discusses the results. Section VII presents the applications and advantages of the proposed method. Finally, conclusions are drawn in Section VIII.

II. SIGNAL MODEL AND NONLINEAR PROCESSING
Consider the detection of a deterministic signal in additive white noise. Given M samples available for correlation detection, the received signal model under hypothesis $H_i$ is given by (1).

For signal detection in impulsive noise, the correlation detector mostly uses a two-step structure consisting of nonlinear processing followed by linear correlation. Denoting the nonlinear function by g(x), the correlation output is given by (2). Since the noise is white, for low input SNR the central limit theorem implies that the test statistic $T_i(g)$ in (2) asymptotically follows the Gaussian distribution in (3), where f(x) denotes the noise PDF [25], [31]. The output SNR can be derived from (3).

Among the ZMNL functions for nonlinear processing, the LOD function

$$g_{lo}(x) = -\frac{f'(x)}{f(x)}$$

is theoretically optimal at low SNR. However, $g_{lo}(x)$ is hard to manipulate when the PDF f(x) or its derivative f'(x) is not available in closed form. For instance, the α-stable distribution has no closed-form PDF, while the Class A PDF is given as an infinite series.

Closed-form nonlinearity can be obtained by developing ZMNL functions whose shapes resemble the LOD function. The two most popular limiters are the blanker and the clipper:

$$g_{bl}(x) = \begin{cases} x, & |x| \le T \\ 0, & |x| > T \end{cases} \tag{6}$$

$$g_{cl}(x) = \begin{cases} x, & |x| \le T \\ T\,\mathrm{sgn}(x), & |x| > T \end{cases} \tag{7}$$

where T is the threshold and sgn(·) denotes the signum function. The effectiveness of ZMNL functions in impulsive noise has been demonstrated in the literature and in practical applications. However, most ZMNL functions incur some performance loss compared with the LOD. Moreover, traditional ZMNLs are generally developed for specific models and apply only to a limited class of impulsive noise.
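As a minimal sketch (Python with NumPy is our choice; the document itself contains no code), the blanker (6), the clipper (7), and the LOD $g_{lo} = -f'/f$ can be written as follows. The central-difference derivative in the hypothetical helper `lod_from_pdf` is an assumption for the case where only f(x) can be evaluated numerically; in Gaussian noise the LOD must reduce to $x/\sigma^2$, which provides a quick check.

```python
import numpy as np

def blanker(x, T):
    """Blanker (6): keep samples with |x| <= T, set the rest to zero."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= T, x, 0.0)

def clipper(x, T):
    """Clipper (7): keep samples with |x| <= T, limit the rest to T*sgn(x)."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= T, x, T * np.sign(x))

def lod_from_pdf(pdf, x, h=1e-5):
    """LOD nonlinearity g_lo(x) = -f'(x)/f(x), with f' from a central difference."""
    x = np.asarray(x, dtype=float)
    df = (pdf(x + h) - pdf(x - h)) / (2.0 * h)
    return -df / pdf(x)

def gauss_pdf(x, sigma=2.0):
    """Zero-mean Gaussian PDF, used here only to sanity-check the LOD."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (np.sqrt(2.0 * np.pi) * sigma)

xs = np.linspace(-3.0, 3.0, 7)
# In Gaussian noise the LOD is linear: g_lo(x) = x / sigma^2 = x / 4 here.
print(np.allclose(lod_from_pdf(gauss_pdf, xs), xs / 4.0, atol=1e-6))
print(blanker([0.5, -2.0, 3.0], 1.0), clipper([0.5, -2.0, 3.0], 1.0))
```

For heavy-tailed models such as SαS, f(x) itself must first be computed numerically, which is exactly the difficulty the closed-form ZMNL functions avoid.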

III. NONLINEARITY DESIGN WITH POWER-LAW TAILS
This section introduces the power function for tail design and proposes new nonlinear functions with power-law tails.

A. ADVANTAGES OF A POWER-LAW TAIL
We aim to develop a nonlinear processor that approaches the optimal detection performance for a wide range of heavy-tailed distributions rather than for a specific model. Our design should therefore work for different distributions and approximate the LOD better than traditional ZMNL designs, which focus on placing the threshold of the linear region for a preset tail. However, the tail should vary with the noise distribution, just as the LOD tail always depends on the noise PDF. A mismatch between the designed tail and the LOD tail can cause a significant performance loss.
This paper proposes to optimize the tail function in the nonlinearity design, which adds an extra degree of freedom for approximating the LOD function and improving the detection performance. Since the nonlinear function is generally odd, we shape g(x) only for x ≥ 0; the nonlinear function for x < 0 then follows as g(x) = g(|x|) · sgn(x).
The power function $y = x^a$ has two significant advantages. (i) It is simple and analytical, and therefore much easier to use than the LOD. (ii) It achieves various decay rates as the exponent $a \le 0$ varies. We use it in the nonlinear region to construct a new nonlinearity with a power-law tail whose decay rate can be adjusted to match various distributions.
In addition, traditional ZMNL tails can be viewed as special cases of power-law tails. For example: (i) a = 1 coincides with linear processing, g(x) = x; (ii) a = 0 yields the clipper after multiplication by the constant T; (iii) a → −∞ approximates the blanker (see footnote 2) when the tail is written as $T(x/T)^a$. Therefore, the family of power-law tails includes the traditional tails. By employing the exponent a as an additional degree of freedom, the new nonlinearity design can achieve a significant gain and performs no worse than the traditional blanker and clipper designs.
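The special cases above can be checked numerically. The Python/NumPy sketch below evaluates the scaled tail $T(x/T)^a$ (the limiter $g_{lm}$ of footnote 2) for a = 1, a = 0, and a strongly negative a, and estimates the slope just beyond the breakpoint, which should equal a. The helper `power_tail` and the sample values are hypothetical illustrations.

```python
import numpy as np

def power_tail(x, T, a):
    """Power-law tail T*(x/T)**a, meant for x >= T (the limiter g_lm of footnote 2)."""
    return T * (np.asarray(x, dtype=float) / T) ** a

T = 2.0
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # sample points in the tail region x >= T

linear = power_tail(x, T, 1.0)    # a = 1: tail equals x (linear processing)
clip = power_tail(x, T, 0.0)      # a = 0: tail is the constant T (clipper)
steep = power_tail(x, T, -50.0)   # a << 0: tail collapses towards 0 (blanker-like)

# Slope of the tail just past the breakpoint: g'(T+) = a.
a, eps = -3.0, 1e-6
slope = (power_tail(T + eps, T, a) - power_tail(T, T, a)) / eps
print(linear, clip, steep, slope)
```

Note that the value at x = T is always T, so the steep tail keeps continuity at the breakpoint while behaving like a blanker everywhere else.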

B. PARAMETRIC NONLINEAR FUNCTIONS IN FOUR MODES
After defining the linear region as g(x) = x and the nonlinear region as the power law $x^a$, the next question is how to combine them into a continuous function. We emphasize continuity not only because the LOD $g_{lo}(x)$ is continuous for common distributions, but also because continuity facilitates the analysis of the nonlinearity optimization.
Four modes can be considered for combining the linear and nonlinear regions. In all four modes, the linear region meets the power-law tail at the connecting point, or breakpoint, (T, T). To this end, the power function $x^a$ is transformed in four different ways so that it passes through (T, T): the first is by scaling, and the others are by moving the axes.

Footnote 2: As shown in (6), the blanker drops abruptly to zero when x exceeds the threshold T. Consider a continuous limiter with a power-law tail, as introduced in (8): $g_{lm}(x) = x$ for $0 \le x < T$ and $g_{lm}(x) = T(x/T)^a$ for $x \ge T$. When x increases from T to T + ε, where ε denotes a small positive number, $g_{lm}(x)$ changes at the rate $g'_{lm}(T^+) = a < 0$. Hence, $g_{lm}(x)$ approaches the blanker $g_{bl}(x)$ as a → −∞.
The four modes are listed in detail as follows.

1) SCALE TRANSFORM (SCALING)
Scaling $x^a$ dilates or contracts its graph so that it passes through the breakpoint (T, T). Scaling along the X-axis, the Y-axis, or both axes is equivalent, because $(x/b)^a = b^{-a}x^a$ for any $b > 0$. Fig. 1(a) illustrates the scaling of $x^a$ to pass through (T, T). Thus, the Scaling mode has the ZMNL function

$$g_{sc}(x,T,a) = \begin{cases} x, & 0 \le x < T \\ T^{1-a}x^{a}, & x \ge T \end{cases} \tag{8}$$

2) MOVE A FIXED POINT (P-MOVE)
The power function $x^a$ always passes through the fixed point (1, 1). Thus, $x^a$ can be moved along the 45-deg. axis so that the original point (1, 1) reaches the breakpoint (T, T), as shown in Fig. 1(b). Note that when a < 0 and T < 1, we have $T^a > T$, so the function $x^a$ is translated parallel to the 225-deg. axis to meet the point (T, T); Fig. 1(b) must then be interpreted with the red solid curve below the red dotted line. In this case, the translated function tends to T − 1 < 0 as x → ∞, i.e., it takes negative values for large x. By setting such negative values to zero, as shown in Fig. 7(a), the P-move mode formulates the ZMNL function as

$$g_{pm}(x,T,a) = \begin{cases} x, & 0 \le x < T \\ \max\{(x-T+1)^{a}+T-1,\ 0\}, & x \ge T \end{cases} \tag{9}$$

3) MOVE ALONG Y-AXIS (Y-MOVE)
The breakpoint (T, T) can also be reached by moving the graph of $x^a$ parallel to the Y-axis, which is equivalent to moving $(T, T^a)$ to (T, T), as shown in Fig. 1(c). As in the P-move mode, for T < 1 and a < 0, the downward-translated $x^a$ takes negative values for sufficiently large x; these negative values are set to zero, as shown in Fig. 7(b). Finally, the ZMNL function in the Y-move mode is

$$g_{ym}(x,T,a) = \begin{cases} x, & 0 \le x < T \\ \max\{x^{a}+T-T^{a},\ 0\}, & x \ge T \end{cases} \tag{10}$$

4) MOVE ALONG X-AXIS (X-MOVE)
The breakpoint (T, T) can also be reached by moving the graph of $x^a$ parallel to the X-axis, i.e., by moving the point $(T^{1/a}, T)$ to (T, T), as shown in Fig. 1(d). The ZMNL function in the X-move mode is then

$$g_{xm}(x,T,a) = \begin{cases} x, & 0 \le x < T \\ (x-T+T^{1/a})^{a}, & x \ge T \end{cases} \tag{11}$$
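The four modes (8) to (11) can be sketched in Python/NumPy as below. This is a minimal illustration under our reading of the constructions above: the odd extension g(|x|)sgn(x) from Section III-A, and negative tail values set to zero in the P-move and Y-move modes. Function names are hypothetical, and clamping the tail argument to at least T is an implementation choice that avoids evaluating $x^a$ at x = 0.

```python
import numpy as np

def _split(x):
    """Return |x| and sgn(x) for the odd extension g(x) = g(|x|) sgn(x)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x), np.sign(x)

def g_sc(x, T, a):
    """Scaling mode (8): x on [0, T), T**(1-a) * x**a = T*(x/T)**a on [T, inf)."""
    ax, s = _split(x)
    xt = np.maximum(ax, T)                 # the tail branch only ever sees x >= T
    return s * np.where(ax < T, ax, T * (xt / T) ** a)

def g_pm(x, T, a):
    """P-move mode (9): (1, 1) moved to (T, T) along the 45-deg. axis, negatives set to 0."""
    ax, s = _split(x)
    xt = np.maximum(ax, T)
    return s * np.where(ax < T, ax, np.maximum((xt - T + 1.0) ** a + T - 1.0, 0.0))

def g_ym(x, T, a):
    """Y-move mode (10): x**a shifted along the Y-axis by T - T**a, negatives set to 0."""
    ax, s = _split(x)
    xt = np.maximum(ax, T)
    return s * np.where(ax < T, ax, np.maximum(xt ** a + T - T ** a, 0.0))

def g_xm(x, T, a):
    """X-move mode (11): x**a shifted along the X-axis so (T**(1/a), T) lands on (T, T)."""
    if a == 0:
        raise ValueError("the X-move mode is undefined at a = 0")
    ax, s = _split(x)
    xt = np.maximum(ax, T)
    return s * np.where(ax < T, ax, (xt - T + T ** (1.0 / a)) ** a)

# All four modes meet the linear region at the breakpoint (T, T).
for g in (g_sc, g_pm, g_ym, g_xm):
    print(g.__name__, g(1.5 - 1e-9, 1.5, -0.8), g(1.5 + 1e-9, 1.5, -0.8))
```

With T = 0.5 and a = −2, for instance, the P-move and Y-move tails reach zero at a finite x, which is the extra breakpoint listed for these modes in TABLE 1.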

C. PROPERTIES OF ZMNL FUNCTIONS
The properties of the four ZMNL functions, including continuity, differentiability, and breakpoints along x, are listed in TABLE 1, based on the analysis in Appendix A. Owing to the max(·) functions, the P-move and Y-move modes have breakpoints in addition to the common breakpoint at T, as analyzed in Appendix A. Note that the location X(T, a) of this additional breakpoint is differentiable with respect to T and a.
The X-move nonlinearity is discontinuous (at a = 0), whereas the other modes are piecewise continuously differentiable. The piecewise differentiability of g(x, T, a) is important because it yields the differentiability of the related integrals, as shown in the next section.
The four functions $g_{sc}(x, T, a)$, $g_{xm}(x, T, a)$, $g_{ym}(x, T, a)$, and $g_{pm}(x, T, a)$ share the same arguments T and a to be optimized, and the differentiability established in Theorem 1 (Section IV-B) facilitates the optimization.
As the optimizations of the modes are similar, this paper presents the optimization and its computational solution generically. The performances of the four modes are compared under three commonly used noise models.

IV. OPTIMIZATION OF THE EFFICACY FUNCTION
This section introduces the efficacy function in the nonlinearity design and analyzes its properties.

A. OPTIMIZATION PROBLEM FORMULATION
Fundamentally, nonlinearity design aims to improve detection performance, for example, by increasing the detection probability in radar systems or decreasing the bit error ratio (BER) in communication systems. Rather than analyzing the detection performance directly or measuring the distance to the LOD nonlinearity, researchers have proposed the efficacy function to assess the optimality of nonlinear processors [7].
For the nonlinearity with power-law tails, the efficacy function can be calculated by

$$E(g,T,a) = \frac{\left[\int_{-\infty}^{\infty} g(x,T,a)\,f'(x)\,dx\right]^{2}}{\int_{-\infty}^{\infty} g^{2}(x,T,a)\,f(x)\,dx} \tag{12}$$

As can be seen from (3), the efficacy function measures the asymptotic output SNR for detecting a deterministic signal in white noise [7], [21]. The signal amplitude $\xi_i$, or its distribution characterized by the channel gain, does not affect the optimization of the output SNR. The efficacy function is therefore closely related to the fundamental performance of signal detection. Since the test statistics are approximately Gaussian, the BER can be calculated in the same way as for Gaussian noise channels. For instance, in a minimum shift keying (MSK) system with a flat-fading channel $\xi_0 = \xi_1 = \xi$, a bit-by-bit detector achieves the minimum BER given in (13), where $E_s$ denotes the bit energy and $Q(\cdot)$ is the tail distribution function of the standard normal distribution. Clearly, increasing E(g, T, a) improves the detection performance.
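A minimal numerical sketch of the efficacy (12), assuming the PDF f and its derivative f' are available as Python callables and using SciPy's `quad` for the integrals. The Gaussian and two-component Gaussian-mixture noise, the clipper threshold 1.5, and all helper names are illustrative assumptions. In Gaussian noise the linear correlator is optimal and its efficacy equals $1/\sigma^2$, which gives a simple check.

```python
import numpy as np
from scipy.integrate import quad

def efficacy(g, pdf, dpdf, breaks):
    """Efficacy (12): [int g(x) f'(x) dx]^2 / int g(x)^2 f(x) dx over the real line.

    `breaks` lists the breakpoints of g (e.g. -T and T), so that every quad call
    integrates a smooth piece of the integrand."""
    edges = [-np.inf, *sorted(breaks), np.inf]
    pieces = list(zip(edges[:-1], edges[1:]))
    num = sum(quad(lambda x: g(x) * dpdf(x), lo, hi)[0] for lo, hi in pieces)
    den = sum(quad(lambda x: g(x) ** 2 * pdf(x), lo, hi)[0] for lo, hi in pieces)
    return num ** 2 / den

def gauss(sigma):
    """Zero-mean Gaussian PDF and its derivative."""
    pdf = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    dpdf = lambda x: -x / sigma**2 * pdf(x)
    return pdf, dpdf

def mixture(eps=0.1, s1=1.0, s2=10.0):
    """Two-component Gaussian mixture; the impulsive component has probability eps."""
    p1, d1 = gauss(s1)
    p2, d2 = gauss(s2)
    return (lambda x: (1 - eps) * p1(x) + eps * p2(x),
            lambda x: (1 - eps) * d1(x) + eps * d2(x))

linear = lambda x: x                                            # g(x) = x
clip = lambda x, T=1.5: x if abs(x) <= T else T * np.sign(x)    # clipper (7), T = 1.5

f, df = gauss(2.0)
print(efficacy(linear, f, df, []), efficacy(clip, f, df, [-1.5, 1.5]))
f, df = mixture()
print(efficacy(linear, f, df, []), efficacy(clip, f, df, [-1.5, 1.5]))
```

Splitting the integrals at the breakpoints of g keeps each `quad` call on a smooth integrand, which matters for the kinks of the clipper and of the power-law modes.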
Given the consistent relation between efficacy and detection performance, we take the efficacy as the objective function and formulate the nonlinearity design as the optimization problem

$$(T_o, a_o) = \arg\max_{T>0,\ a\le 1} E(g,T,a) \tag{14}$$

For each mode, g(x, T, a) is substituted into (14), and the arguments (T, a) are optimized for maximum efficacy. Unfortunately, it is hard to obtain an analytical maximizer of (14), since E(g, T, a) has a complicated expression and f(x) is not restricted to a specific PDF. This also makes a rigorous mathematical analysis of the objective function E(g, T, a) very difficult.
A numerical solution is nevertheless a good choice for (14). To seek such a solution, we investigate the properties of E(g, T, a) with respect to (T, a), e.g., continuity, differentiability, and monotonicity.

B. DIFFERENTIABILITY OF THE EFFICACY
First, E(g, T, a) is continuous with respect to T and a for the nonlinear functions of the Scaling, Y-move, and P-move modes. This follows directly from two facts: g(x, T, a) is continuous, and the integrals in (12) preserve this continuity.
Next, E(g, T, a) is differentiable with respect to T and a, which can be proved from the continuity and the piecewise differentiability of g(x, T, a).
Given that g(x, T, a) is odd and f(x) is even, the efficacy function in (12) can be rewritten as

$$E(g,T,a) = \frac{2\left[\int_{0}^{\infty} g(x,T,a)\,f'(x)\,dx\right]^{2}}{\int_{0}^{\infty} g^{2}(x,T,a)\,f(x)\,dx} \tag{15}$$

Since the denominator never vanishes, E(g, T, a) is differentiable as long as the integrals are differentiable with respect to (T, a). The integrals can be simplified by substituting the sub-functions of g(x, T, a), and all of them can then be shown to be differentiable by applying Theorem 1. For instance, the Scaling mode $g_{sc}(x, T, a)$ has two subdomains, and the numerator of (15) contains

$$\int_{0}^{\infty} g_{sc}(x,T,a)\,f'(x)\,dx = \int_{0}^{T} x\,f'(x)\,dx + T^{1-a}\int_{T}^{\infty} x^{a}\,f'(x)\,dx$$

The first term on the right-hand side is clearly differentiable with respect to T and does not depend on a. The nontrivial part is therefore the second term, for which the following holds.

Theorem 1: Given a differentiable function h(x) > 0 and g(x, T, a) as in (8), (9), or (10), the corresponding integral is differentiable with respect to T and a.

Proof: See Appendix B.

For the P-move or Y-move mode, there may be two or three subdomains, according to TABLE 1. The integrals involved in the efficacy calculation are again differentiable by Theorem 1.
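The half-line form (15), with the numerator of the Scaling mode split over its two subdomains as above, can be cross-checked against the full-line definition (12). The Python/SciPy sketch below does this numerically for an assumed Gaussian-mixture noise ($\varepsilon = 0.1$, $\sigma_1 = 1$, $\sigma_2 = 10$); these parameters and the helper names are illustrative only.

```python
import numpy as np
from scipy.integrate import quad

EPS, S1, S2 = 0.1, 1.0, 10.0   # assumed Gaussian-mixture parameters (illustrative)

def pdf(x):
    """Two-component Gaussian mixture PDF (even)."""
    return ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1
            + EPS * np.exp(-x**2 / (2 * S2**2)) / S2) / np.sqrt(2 * np.pi)

def dpdf(x):
    """Derivative f'(x) of the mixture PDF (odd)."""
    return -x * ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1**3
                 + EPS * np.exp(-x**2 / (2 * S2**2)) / S2**3) / np.sqrt(2 * np.pi)

def g_sc(x, T, a):
    """Scaling mode (8) with odd extension g(|x|) sgn(x); scalar input."""
    ax = abs(x)
    return np.sign(x) * (ax if ax < T else T * (ax / T) ** a)

def efficacy_half(T, a):
    """Half-line form (15); the numerator is split at T into the two Scaling subdomains."""
    num = (quad(lambda x: x * dpdf(x), 0, T)[0]
           + T ** (1 - a) * quad(lambda x: x ** a * dpdf(x), T, np.inf)[0])
    den = (quad(lambda x: x**2 * pdf(x), 0, T)[0]
           + quad(lambda x: (T * (x / T) ** a) ** 2 * pdf(x), T, np.inf)[0])
    return 2 * num**2 / den

def efficacy_full(T, a):
    """Full-line definition (12), used as a cross-check."""
    pieces = [(-np.inf, -T), (-T, T), (T, np.inf)]
    num = sum(quad(lambda x: g_sc(x, T, a) * dpdf(x), lo, hi)[0] for lo, hi in pieces)
    den = sum(quad(lambda x: g_sc(x, T, a) ** 2 * pdf(x), lo, hi)[0] for lo, hi in pieces)
    return num**2 / den

for T, a in [(1.0, 0.0), (2.0, -0.5), (3.0, -2.0)]:
    print(T, a, efficacy_half(T, a), efficacy_full(T, a))
```

Only the second (tail) integral depends on a, which mirrors the argument above that the difficulty of the differentiability proof sits entirely in that term.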
In summary, the efficacy E(g, T, a) in (12) is differentiable with respect to (T, a) for T > 0 and a ≤ 1 in the Scaling, P-move, and Y-move modes.

C. UNIMODALITY OF SIMULATED EFFICACY
As for monotonicity, however, it is hard to determine the intervals of T or a on which E(g, T, a) is monotonically increasing or decreasing. This difficulty stems partly from our focus on generic ''impulsive noise'' rather than a specific model or PDF f(x); a ''heavy-tailed'' PDF alone is not enough to prove the monotonicity of E(g, T, a).
Instead, we use numerical simulations to shed light on the monotonicity of the objective function. Extensive simulations show that the various models of impulsive noise share a common property: the efficacy is unimodal.
A typical simulation is illustrated in Fig. 2, which shows E(g, T, a) for the Scaling, P-move, and Y-move modes in SαS noise with α = 1.5 and γ = 1. As can be seen, each efficacy surface has only one local maximum, which is also the global maximum. The surfaces are smooth, which is consistent with the differentiability of E(g, T, a). Fig. 3(a) depicts the level sets of the efficacy $E(g_{sc}, T, a)$ of the Scaling mode; the level sets are not convex except in the neighborhood of the maximum point. Figs. 3(b) and 3(c) plot $E(g_{sc}, T, a)$ versus T or a with the other argument fixed; in both cases, $E(g_{sc}, T, a)$ is unimodal. Based on the above analysis, we conclude that the objective function E(g, T, a) is unimodal in either argument T or a, for T > 0 and a ≤ 1, in the Scaling, P-move, and Y-move modes, although this is hard to prove formally since f(x) is not specified.
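Since the unimodality claim is empirical, one way to probe it for a given noise model is to evaluate one-dimensional cuts of the efficacy surface, as in Figs. 3(b) and 3(c). The Python/SciPy sketch below does this for an assumed Gaussian-mixture noise (not the SαS case of Fig. 2) and counts interior local maxima along a T-cut and an a-cut; it illustrates the procedure and is not a proof. As a sanity check, for a = 1 the Scaling mode reduces to g(x) = x, so its efficacy must equal 1/Var(n) for every T.

```python
import numpy as np
from scipy.integrate import quad

EPS, S1, S2 = 0.1, 1.0, 10.0                 # assumed Gaussian-mixture parameters
VAR = (1 - EPS) * S1**2 + EPS * S2**2        # noise variance

def pdf(x):
    return ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1
            + EPS * np.exp(-x**2 / (2 * S2**2)) / S2) / np.sqrt(2 * np.pi)

def dpdf(x):
    return -x * ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1**3
                 + EPS * np.exp(-x**2 / (2 * S2**2)) / S2**3) / np.sqrt(2 * np.pi)

def efficacy_sc(T, a):
    """E(g_sc, T, a) from the half-line form (15), Scaling mode (8)."""
    tail = lambda x: T * (x / T) ** a        # equals T**(1-a) * x**a for x >= T
    num = quad(lambda x: x * dpdf(x), 0, T)[0] + quad(lambda x: tail(x) * dpdf(x), T, np.inf)[0]
    den = quad(lambda x: x**2 * pdf(x), 0, T)[0] + quad(lambda x: tail(x)**2 * pdf(x), T, np.inf)[0]
    return 2 * num**2 / den

def interior_local_maxima(values):
    """Indices i with values[i-1] < values[i] > values[i+1]."""
    v = np.asarray(values)
    return [i for i in range(1, len(v) - 1) if v[i - 1] < v[i] > v[i + 1]]

T_grid = np.linspace(0.25, 6.0, 24)
E_T = [efficacy_sc(T, -1.0) for T in T_grid]     # cut through the surface at a = -1
a_grid = np.linspace(-4.0, 1.0, 21)
E_a = [efficacy_sc(1.5, a) for a in a_grid]      # cut through the surface at T = 1.5
print("local maxima along T:", interior_local_maxima(E_T))
print("local maxima along a:", interior_local_maxima(E_a))
```

A single reported index per cut is consistent with unimodality on that grid; a coarse grid can of course miss closely spaced extrema.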
As for the X-move mode, since $g_{xm}(x, T, a)$ is discontinuous at a = 0, its efficacy $E(g_{xm}, T, a)$ behaves differently. Discussion of the X-move mode is deferred to Appendix C.

V. SOLUTION TO THE OPTIMIZATION
Based on relevant properties of the efficacy function, we provide solution guidelines as well as a practical algorithm for the optimization problem (14).

A. SOLVABLE BY DIRECT SEARCH METHODS
To summarize the property analysis, E(g, T, a) has two important properties for T > 0 and a ≤ 1:
• E(g, T, a) is continuous and differentiable in (T, a);
• E(g, T, a) is unimodal in both T and a.
These properties indicate that a numerical method is promising for maximizing E(g, T, a).
Because the partial derivatives of E(g, T, a) are unavailable in closed form, derivative-free methods are of interest, as they do not require partial derivatives. In particular, direct search methods are reliable and applicable because they use only function values and neither calculate nor approximate gradients [36].
To solve problem (14) as a two-dimensional optimization, two well-known direct search methods can be considered: Powell's method and the Nelder-Mead simplex (NMS) method. Both methods are formulated for minimization; since maximizing E(g, T, a) is equivalent to minimizing −E(g, T, a), they are applied to the negated objective. Brief introductions to the methods follow.
Powell's method adjusts one argument at a time and finds the minimum in a finite number of steps [37]. Each one-dimensional minimization can be accomplished by golden-section or Fibonacci search, as long as the function is unimodal along the search direction [40]. Powell's method is therefore workable, since E(g, T, a) is unimodal in both T and a.
The Nelder-Mead method is inspired by the simplex-based direct search methods in [41] and was deliberately modified to avoid assuming knowledge of relative step sizes [38]. The NMS method has four coefficients: $\alpha_{NM}$ for reflection, $\gamma_{NM}$ for expansion, $\beta_{NM}$ for contraction, and $\sigma_{NM}$ for shrinkage. With constant coefficients, the NMS method ''adapts itself to the local landscape and contracts on to the final minimum'' [38].
The optimization problem (14) is solvable by the Nelder-Mead method or Powell's method. To use the methods, some details need to be addressed accordingly.
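Both direct search methods are available in SciPy through `scipy.optimize.minimize` with `method="Powell"` or `method="Nelder-Mead"`. The sketch below maximizes a hypothetical smooth, unimodal stand-in for E(g, T, a) (not an actual efficacy) by minimizing its negative from the starting point (1, 0); the tolerances are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def toy_efficacy(p):
    """Hypothetical smooth, unimodal stand-in for E(g, T, a); its maximum is at (2, -1.5)."""
    T, a = p
    dT, da = T - 2.0, a + 1.5
    return 1.0 - (dT**2 + 2.0 * da**2 + 0.5 * dT * da)

neg = lambda p: -toy_efficacy(p)   # both methods minimize, so negate the objective
x0 = np.array([1.0, 0.0])          # starting point (T0, a0) = (1, 0)

res_powell = minimize(neg, x0, method="Powell",
                      options={"xtol": 1e-8, "ftol": 1e-10})
res_nm = minimize(neg, x0, method="Nelder-Mead",
                  options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 2000})
print(res_powell.x, res_nm.x)
```

Powell's method performs line searches along a set of directions, whereas Nelder-Mead moves a simplex; on a unimodal objective both should land on the same maximizer.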

B. SOLUTION BY THE NMS METHOD
This section illustrates the Nelder-Mead method for designing the algorithmic solution to the optimization of efficacy E(g, T , a). For the NMS procedure, some algorithm details are summarized as follows.

1) OBJECTIVE FUNCTION
Considering the constraints on the search domain, the efficacy is set to zero outside it, i.e., for T ≤ 0 or a > 1. The objective function is defined as

$$E_{NM}(g,T,a) = \begin{cases} -E(g,T,a), & T > 0 \text{ and } a \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{17}$$

The optimization of the efficacy function then becomes the unconstrained minimization of the objective function $E_{NM}(g, T, a)$ with respect to (T, a).

2) COEFFICIENTS
The four coefficients are set to their standard values, namely

$$\alpha_{NM} = 1,\quad \beta_{NM} = 0.5,\quad \gamma_{NM} = 2,\quad \sigma_{NM} = 0.5. \tag{18}$$

This is a nearly universal choice in the NMS method; that it suffices here also reflects that the efficacy optimization is a well-behaved problem for the NMS method.
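To make the roles of the four coefficients in (18) concrete, here is a minimal, self-contained Nelder-Mead implementation in Python/NumPy. It follows the standard reflection, expansion, contraction, and shrinkage steps; the initial step size, the tolerance, and the stand-in objective are our own illustrative choices, not values from the paper.

```python
import numpy as np

def nelder_mead(fun, x0, step=0.5, alpha=1.0, gamma=2.0, beta=0.5, sigma=0.5,
                tol=1e-8, max_iter=2000):
    """Minimal Nelder-Mead simplex minimizer with the coefficients of (18).

    alpha: reflection, gamma: expansion, beta: contraction, sigma: shrinkage."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    simplex = np.vstack([x0] + [x0 + step * np.eye(n)[i] for i in range(n)])
    fvals = np.array([fun(v) for v in simplex])
    for _ in range(max_iter):
        order = np.argsort(fvals)                 # best vertex first, worst last
        simplex, fvals = simplex[order], fvals[order]
        if (np.max(np.abs(simplex[1:] - simplex[0])) < tol
                and fvals[-1] - fvals[0] < tol):
            break
        c = simplex[:-1].mean(axis=0)             # centroid of all but the worst vertex
        xr = c + alpha * (c - simplex[-1])        # reflection
        fr = fun(xr)
        if fr < fvals[0]:
            xe = c + gamma * (xr - c)             # expansion
            fe = fun(xe)
            simplex[-1], fvals[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < fvals[-2]:
            simplex[-1], fvals[-1] = xr, fr       # accept the reflected point
        else:
            # outside contraction if xr beats the worst vertex, inside contraction otherwise
            xc = c + beta * ((xr if fr < fvals[-1] else simplex[-1]) - c)
            fc = fun(xc)
            if fc < min(fr, fvals[-1]):
                simplex[-1], fvals[-1] = xc, fc
            else:                                 # shrink all vertices towards the best one
                simplex[1:] = simplex[0] + sigma * (simplex[1:] - simplex[0])
                fvals[1:] = [fun(v) for v in simplex[1:]]
    i = np.argmin(fvals)
    return simplex[i], fvals[i]

# Example: maximize a hypothetical efficacy-like surface by minimizing its negative.
toy = lambda p: -(1.0 - (p[0] - 2.0) ** 2 - 2.0 * (p[1] + 1.5) ** 2)
p_best, f_best = nelder_mead(toy, [1.0, 0.0])
print(p_best, -f_best)
```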

3) STARTING POINT
Without prior knowledge of the noise distribution, we recommend the starting point $(T_0, a_0) = (1, 0)$. The NMS method also converges when the starting point is not in the neighborhood of the maximum point.
In addition, any available information about the noise covariance can be used to choose the starting point and improve the convergence speed.

4) STOPPING CRITERIA
A final point concerns the criterion for halting the procedure. In this paper, the procedure stops when the increment of either the objective function or its arguments falls below the preset value $10^{-4}$. With these settings, the optimal values of T and a in each mode are obtained by substituting the corresponding nonlinear function into the objective function $E_{NM}(g, T, a)$ and minimizing it via the NMS iteration procedure [39].
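Putting the pieces of this subsection together, the design stage for the Scaling mode might look as follows. This is a sketch under stated assumptions: the noise is a two-component Gaussian mixture with illustrative parameters, the efficacy uses the half-line form (15), and SciPy's non-adaptive Nelder-Mead (whose reflection, expansion, contraction, and shrinkage coefficients are 1, 2, 0.5, and 0.5, matching (18)) minimizes $E_{NM}$ from (17). The initial simplex around (1, 0) is our own choice, and SciPy's `xatol`/`fatol` stop only when both tolerances are met, which is slightly stricter than the either/or criterion described above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

EPS, S1, S2 = 0.1, 1.0, 10.0   # assumed Gaussian-mixture noise parameters (illustrative)

def pdf(x):
    return ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1
            + EPS * np.exp(-x**2 / (2 * S2**2)) / S2) / np.sqrt(2 * np.pi)

def dpdf(x):
    return -x * ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / S1**3
                 + EPS * np.exp(-x**2 / (2 * S2**2)) / S2**3) / np.sqrt(2 * np.pi)

def efficacy_sc(T, a):
    """E(g_sc, T, a) via the half-line form (15) for the Scaling mode (8)."""
    tail = lambda x: T * (x / T) ** a
    num = quad(lambda x: x * dpdf(x), 0, T)[0] + quad(lambda x: tail(x) * dpdf(x), T, np.inf)[0]
    den = quad(lambda x: x**2 * pdf(x), 0, T)[0] + quad(lambda x: tail(x)**2 * pdf(x), T, np.inf)[0]
    return 2 * num**2 / den

def objective_nm(p):
    """E_NM from (17): -E inside the search domain (T > 0, a <= 1), 0 outside."""
    T, a = p
    return -efficacy_sc(T, a) if (T > 0 and a <= 1) else 0.0

# Starting point (1, 0) as recommended; the other two simplex vertices are our choice.
simplex0 = np.array([[1.0, 0.0], [1.5, 0.0], [1.0, -0.5]])
res = minimize(objective_nm, x0=np.array([1.0, 0.0]), method="Nelder-Mead",
               options={"initial_simplex": simplex0, "xatol": 1e-4, "fatol": 1e-4})
T_o, a_o = res.x
E_o = -res.fun

# LOD efficacy = Fisher information int f'^2/f dx (even integrand; finite cutoff avoids 0/0).
fisher = 2 * quad(lambda x: dpdf(x) ** 2 / pdf(x), 0, 150, limit=200)[0]
print(f"T_o = {T_o:.3f}, a_o = {a_o:.3f}, E = {E_o:.4f}, "
      f"LOD = {fisher:.4f}, ratio = {E_o / fisher:.2%}")
```

Because the LOD efficacy is an upper bound on the efficacy of any nonlinearity, the printed ratio corresponds to the kind of efficacy percentage used in Section VI.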

VI. PERFORMANCE ANALYSIS IN DISTRIBUTIONS
This section deals with simulations of the proposed nonlinearity design in three commonly-used models of impulsive noise, including the SαS noise, the Class A noise, and the Gaussian mixture noise. Note that these noise models include the background Gaussian noise.
For each noise model, the four modes (8), (9), (10), and (11) are considered, and the NMS method is used to find the optimal thresholds and exponents. For comparison, the traditional Blanker and Clipper are optimized by substituting (6) and (7) for g(·, ·, ·) in the efficacy optimization problem (14), which is also solved by the NMS method.

A. DESIGN IN THE SαS NOISE
The α-stable distribution is assumed to be symmetric about zero, i.e., the so-called SαS distribution. The SαS PDF has no closed form except for the Gaussian (α = 2) and Cauchy (α = 1) cases. The SαS PDF is therefore calculated numerically by

$$f(x) = \mathrm{IFT}\left[\exp\left(-\gamma|\omega|^{\alpha}\right)\right](x) = \frac{1}{\pi}\int_{0}^{\infty}\exp(-\gamma\omega^{\alpha})\cos(\omega x)\,d\omega$$

where 0 < α ≤ 2 is the characteristic exponent, γ is the dispersion, and IFT(·) denotes the inverse Fourier transform. A smaller α means a heavier tail. The nonlinear functions of the four modes are designed in SαS noise for α from 1.0 to 1.9 and γ = 1. The optimal efficacies of the four modes are shown in Fig. 4(a), where the LOD, as expected, achieves the highest efficacy. The designs in all four modes achieve nearly optimal efficacy, above 99% of the LOD efficacy. More precisely, the Scaling mode achieves more than 99.5% of the LOD efficacy, while the Y-move and P-move modes achieve slightly less. The Clipper is sub-optimal, while the Blanker is significantly worse than the others, which shows that the Blanker is not suitable for nonlinear processing in SαS noise. Analysis of the X-move mode is relegated to Appendix C.
The optimal thresholds and exponents of the four modes are shown in Fig. 4(b) and Fig. 4(c), respectively. The Blanker has the largest threshold and the most rapid decay, g(x) = 0 for |x| > T, while the Clipper has the smallest threshold and the slowest decay, g(x) = sgn(x)T for |x| > T. Compared with the Blanker and the Clipper, the proposed designs in four modes have moderate thresholds and decay exponents, both of which change with α. This demonstrates that different distributions require different decay factors in the tails.
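Because the SαS characteristic function $\exp(-\gamma|\omega|^\alpha)$ is real and even, its inverse Fourier transform reduces to the cosine integral above. A minimal SciPy sketch follows; `sas_pdf` is a hypothetical helper. The Cauchy (α = 1) and Gaussian (α = 2, variance 2γ) cases provide closed-form checks. For large |x| the oscillatory integrand makes plain `quad` less accurate, so a dedicated method, or SciPy's `scipy.stats.levy_stable` with its scale converted from the dispersion γ, may be preferable.

```python
import numpy as np
from scipy.integrate import quad

def sas_pdf(x, alpha, gamma=1.0):
    """SαS PDF as (1/pi) * int_0^inf exp(-gamma w**alpha) cos(w x) dw."""
    val, _ = quad(lambda w: np.exp(-gamma * w**alpha) * np.cos(w * x), 0, np.inf, limit=200)
    return val / np.pi

# alpha = 1.5 is the case used in Fig. 2; evaluate a few points of its PDF.
print([round(sas_pdf(x, 1.5), 6) for x in (0.0, 1.0, 3.0)])
```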

B. DESIGN IN THE CLASS A MODEL
The Middleton Class A distribution has the PDF

$$f(x) = \sum_{k=0}^{\infty}\frac{e^{-A}A^{k}}{k!}\,\frac{1}{\sqrt{2\pi\sigma_k^{2}}}\exp\left(-\frac{x^{2}}{2\sigma_k^{2}}\right)$$

where $\sigma_k^2 = \sigma^2(k/A + \Gamma)/(1 + \Gamma)$, σ² is the average power, A is the impulsiveness index, and Γ is the power ratio of the Gaussian component to the non-Gaussian component.
The nonlinearities of the four modes, as well as the Blanker and the Clipper, are designed for various cases of the Class A model, and the optimization results are listed in TABLE 2. Compared with the optimal efficacy of the LOD, three modes, namely the Scaling, X-move, and P-move modes, obtain 99.3% on average over all cases. The Y-move mode is slightly worse, at 98.3%. The Blanker is sub-optimal, at 96.3%, and the Clipper is the worst, at about 70%, which indicates that the Clipper is unsuitable for suppressing Class A noise.
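The Class A PDF is a Poisson-weighted sum of zero-mean Gaussians, so in practice it is truncated after finitely many terms. The Python/SciPy sketch below truncates at K = 40 terms, an illustrative choice that is ample for moderate A; `class_a_pdf` is a hypothetical helper. Because $\sum_k \frac{e^{-A}A^k}{k!}\sigma_k^2 = \sigma^2$, the numerically integrated second moment should return σ².

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import poisson

def class_a_pdf(x, A, Gamma, sigma2=1.0, K=40):
    """Middleton Class A PDF, truncated after K + 1 terms of the Poisson-weighted sum."""
    k = np.arange(K + 1)
    w = poisson.pmf(k, A)                                # e^{-A} A^k / k!
    var_k = sigma2 * (k / A + Gamma) / (1.0 + Gamma)     # sigma_k^2
    x = np.asarray(x, dtype=float)[..., None]            # broadcast x against k
    comps = np.exp(-x**2 / (2.0 * var_k)) / np.sqrt(2.0 * np.pi * var_k)
    return (w * comps).sum(axis=-1)

A, Gamma = 0.35, 0.5                                     # illustrative parameters
mass = quad(lambda t: float(class_a_pdf(t, A, Gamma)), -60, 60, points=[0.0], limit=200)[0]
power = quad(lambda t: t**2 * float(class_a_pdf(t, A, Gamma)), -60, 60, points=[0.0], limit=200)[0]
print(mass, power)   # both should be close to 1 (sigma^2 = 1 here)
```

Smaller A concentrates the weight on fewer, more widely spread components, so K and the integration range should be revisited for strongly impulsive settings.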

C. DESIGN IN THE GAUSSIAN MIXTURE MODEL
Next, we consider a two-component Gaussian mixture distribution with the PDF

$$f(x) = \frac{1-\varepsilon}{\sqrt{2\pi}\,\sigma_1}\exp\left(-\frac{x^{2}}{2\sigma_1^{2}}\right) + \frac{\varepsilon}{\sqrt{2\pi}\,\sigma_2}\exp\left(-\frac{x^{2}}{2\sigma_2^{2}}\right)$$

where ε denotes the probability of occurrence of impulsive noise, and $\sigma_2^2 \gg \sigma_1^2$ are the variances of the impulsive and Gaussian components, respectively.
In simulations, the nonlinear functions of the four modes are designed by the NMS method for various cases of the Gaussian mixture, and the efficacies of the designed nonlinearities are listed in TABLE 3. Among the nonlinearity designs, the Scaling mode is the best, with 99.5% of the LOD efficacy on average. The other three modes obtain over 92.5%, the Blanker 97.2%, and the Clipper 84.5%. Next, we investigate the robustness of each design. In the worst cases, the smallest efficacy percentages of the Scaling, X-move, Y-move, P-move, Blanker, and Clipper designs are 98.2%, 95.6%, 83.3%, 84.6%, 89.2%, and 57.8%, respectively. Thus, the Scaling mode is very robust, while the Clipper is not.
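For the Gaussian mixture the LOD is available in closed form, $g_{lo}(x) = -f'(x)/f(x) = x\,\frac{(1-\varepsilon)\phi_1(x)/\sigma_1^2 + \varepsilon\phi_2(x)/\sigma_2^2}{(1-\varepsilon)\phi_1(x) + \varepsilon\phi_2(x)}$, where $\phi_i$ denotes the $N(0, \sigma_i^2)$ density. The Python/SciPy sketch below evaluates it together with the LOD efficacy, i.e., the Fisher information $\int g_{lo}^2 f\,dx$, which is the 100% reference for efficacy percentages such as those in TABLE 3. Helper names and the integration cutoff are our own choices.

```python
import numpy as np
from scipy.integrate import quad

def _components(x, eps, s1, s2):
    """Weighted mixture components (1 - eps) * phi_1(x) and eps * phi_2(x)."""
    c1 = (1 - eps) * np.exp(-x**2 / (2 * s1**2)) / (np.sqrt(2 * np.pi) * s1)
    c2 = eps * np.exp(-x**2 / (2 * s2**2)) / (np.sqrt(2 * np.pi) * s2)
    return c1, c2

def gm_pdf(x, eps, s1, s2):
    c1, c2 = _components(x, eps, s1, s2)
    return c1 + c2

def gm_lod(x, eps, s1, s2):
    """Closed-form LOD g_lo = -f'/f of the mixture."""
    c1, c2 = _components(x, eps, s1, s2)
    return x * (c1 / s1**2 + c2 / s2**2) / (c1 + c2)

def lod_efficacy(eps, s1, s2):
    """LOD efficacy (Fisher information); even integrand, cut off at 20*max(s1, s2)."""
    xmax = 20.0 * max(s1, s2)
    val = quad(lambda x: gm_lod(x, eps, s1, s2) ** 2 * gm_pdf(x, eps, s1, s2),
               0, xmax, limit=200)[0]
    return 2.0 * val

eps, s1, s2 = 0.1, 1.0, 10.0          # illustrative parameters
print(lod_efficacy(eps, s1, s2), 1 / ((1 - eps) * s1**2 + eps * s2**2))
```

The second printed value is the efficacy of the linear correlator, 1/Var(n); the gap between the two shows how much a well-designed nonlinearity can gain in this noise.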

D. SUMMARY
From the above analysis of efficacy in SαS noise, Class A noise, and Gaussian mixture noise, we conclude that the proposed nonlinearity design is more robust and effective than traditional limiters. Since the exponent of the power-law tail is optimized to provide an adjustable decay factor, the proposed design nearly achieves the optimal efficacy in various noise models. Among the four modes of power-law tails, the Scaling mode (8) performs best and is therefore recommended.

VII. DISCUSSION AND SIMULATION OF APPLICATIONS
This section discusses applications of the proposed method and compares it with other detectors, such as the LOD. The detection performance of various nonlinearities is simulated.

A. TWO STAGES IN PRACTICAL APPLICATION
When the proposed design is used for signal detection in impulsive noise, a practical application consists of two stages.
Stage 1 (Design): Obtain the optimal parameters (T_o, a_o) by solving the optimization problem (14), for example by solving (17) with the NMS method in Section V-B.
Stage 2 (Process): Use the nonlinearity g(x, T_o, a_o) to transform the received data r[m]. The output g(r[m], T_o, a_o) is then used for signal detection.
Fig. 5 depicts the block diagram of designing the nonlinearity based on the noise PDF and applying it to signal detection from the received data. In practical applications, the optimal threshold and exponent can be calculated once, off-line. Unknown noise parameters can be estimated from noise samples before the design stage. The merits of this method can be summarized in three points:
• The nonlinearity design with power-law tails is nearly as efficient as the LOD, as demonstrated in Section VI.
• The nonlinearity design with power-law tails is effective for various distributions of impulsive noise.
• The nonlinearity g(x, T_o, a_o) has a closed-form expression, so it can be computed accurately and efficiently.
The first and second points show that our method incurs almost no performance degradation when it replaces the LOD. The last point shows that our method can outperform the LOD in computational efficiency in the Process stage when the noise PDF is not available in closed form.
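The two stages can be sketched in Python as follows, under several assumptions. The noise is the two-component Gaussian mixture of Section VI with hypothetical parameters; the efficacy is the standard expression (∫ g g_lo f dx)² / ∫ g² f dx, used as a stand-in for (12); and the tail function g_pl, which is linear for |x| ≤ T and equals sgn(x) T^(1−a) |x|^a beyond, is a hypothetical continuous form chosen for illustration, not necessarily the paper's Scaling mode (8). Stage 1 calls SciPy's Nelder-Mead implementation; Stage 2 transforms synthetic received samples r[m] and correlates them with a known template.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

EPS, S1, S2 = 0.1, 1.0, 10.0   # hypothetical Gaussian-mixture noise parameters
X0 = [2.0, -0.5]               # hypothetical starting point (T, a) for the simplex

def f(x):
    return ((1 - EPS) * np.exp(-x**2 / (2 * S1**2)) / (np.sqrt(2 * np.pi) * S1)
            + EPS * np.exp(-x**2 / (2 * S2**2)) / (np.sqrt(2 * np.pi) * S2))

def g_lo(x):
    """LOD -f'/f via posterior component probabilities (numerically stable)."""
    l1 = np.log1p(-EPS) - np.log(S1) - x**2 / (2 * S1**2)
    l2 = np.log(EPS) - np.log(S2) - x**2 / (2 * S2**2)
    lse = np.logaddexp(l1, l2)
    return x * (np.exp(l1 - lse) / S1**2 + np.exp(l2 - lse) / S2**2)

def g_pl(x, T, a):
    """Hypothetical power-law-tail nonlinearity, continuous at |x| = T.
    np.maximum avoids 0**a in the (unused) tail branch when a < 0."""
    x = np.asarray(x, dtype=float)
    tail = np.sign(x) * T**(1 - a) * np.maximum(np.abs(x), T)**a
    return np.where(np.abs(x) <= T, x, tail)

def efficacy(g, T):
    """(int g g_lo f)^2 / int g^2 f for an odd g, integrating x >= 0 split at T."""
    def half(fun):
        return quad(fun, 0.0, T)[0] + quad(fun, T, np.inf)[0]
    num = half(lambda x: float(g(x) * g_lo(x) * f(x)))
    den = half(lambda x: float(g(x) ** 2 * f(x)))
    return 2.0 * num**2 / den

FISHER = efficacy(g_lo, 1.0)   # efficacy of the LOD itself

def neg_ratio(p):
    T, a = p
    if T <= 0 or a > 1:        # outside the design region T > 0, a <= 1
        return 1.0             # worse than any feasible value in [-1, 0)
    return -efficacy(lambda x: g_pl(x, T, a), T) / FISHER

# Stage 1 (Design): Nelder-Mead simplex search over (T, a), done once off-line.
res = minimize(neg_ratio, x0=X0, method="Nelder-Mead",
               options={"xatol": 1e-4, "fatol": 1e-8})
T_o, a_o = res.x
print(f"T_o = {T_o:.3f}, a_o = {a_o:.3f}, efficacy ratio = {-res.fun:.4f}")

# Stage 2 (Process): transform the received samples, then correlate with the template.
rng = np.random.default_rng(0)
M = 1024
s = rng.choice([-1.0, 1.0], size=M)                 # hypothetical known signal
noise = np.where(rng.random(M) < EPS, S2, S1) * rng.standard_normal(M)
r = 0.1 * s + noise                                 # received data r[m]
statistic = np.dot(s, g_pl(r, T_o, a_o))            # correlation detector output
```

The penalty in neg_ratio keeps the simplex inside the design region T > 0, a ≤ 1. Stage 1 runs once off-line, while Stage 2 costs only a few arithmetic operations per sample, which is the practical advantage listed in the third point above.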

B. ADVANTAGES OVER THE LOD IN PROCESSING
As discussed before, the distribution models of impulsive noise, e.g., the SαS and the Class A models, may not provide closed-form PDFs. Thus, the received data r[m] cannot be processed easily or directly by the LOD.
To use the LOD, a workable approach is interpolation, which consists of two steps. The first step is to generate discrete samples g_lo(kΔx), k = −K, −K + 1, ..., −1, 0, +1, ..., K, where Δx is a uniform sampling interval and K controls the range. The second step is the nonlinear transformation of r[m] by interpolating between these samples. In contrast, the proposed nonlinearity is evaluated directly from its closed-form expression, which has lower computational complexity and better accuracy than the interpolated LOD and is therefore more suitable for practical applications.
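The two interpolation steps can be sketched as follows. This Python version assumes the noise PDF is available as a numerically evaluable function, approximates f' by a central finite difference, and uses linear interpolation via np.interp; the function names, the grid spacing dx, and K are illustrative choices. Cauchy noise (SαS with α = 1, γ = 1) is used only because its exact LOD, 2x/(1 + x²), makes the table easy to check.

```python
import numpy as np
from scipy.stats import cauchy

def lod_table(pdf, dx, K):
    """Step 1: samples g_lo(k*dx) = -f'(k*dx)/f(k*dx) for k = -K, ..., K.
    f' is approximated by a central finite difference (an assumption of this sketch)."""
    x = dx * np.arange(-K, K + 1)
    h = 1e-4 * dx
    df = (pdf(x + h) - pdf(x - h)) / (2.0 * h)
    return x, -df / pdf(x)

def lod_process(r, x_grid, g_grid):
    """Step 2: linear interpolation of the table at the received samples r[m].
    Beyond +/-K*dx, np.interp holds the end values, i.e. the tail is clipped."""
    return np.interp(r, x_grid, g_grid)

# Illustration with Cauchy noise, whose exact LOD is 2x / (1 + x^2).
x_grid, g_grid = lod_table(cauchy.pdf, dx=0.01, K=2000)
r = np.array([0.123, -1.0, 3.7])
print(lod_process(r, x_grid, g_grid), 2 * r / (1 + r**2))
```

For the SαS case, the pdf argument could be, for example, lambda x: levy_stable.pdf(x, alpha, 0.0, scale=gamma ** (1 / alpha)) from scipy.stats, assuming the characteristic function exp(−γ|ω|^α); such evaluations are slow, which is why the table is built once. Because np.interp holds the end values outside the tabulated range, K·dx should cover the bulk of r[m].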

C. PERFORMANCE SIMULATION AND COMPARISON
The nonlinearity design and its application are simulated for communication in SαS noise with α = 1.2 and γ = 1. Besides the proposed nonlinearity in four modes, the Blanker, the Clipper, and other nonlinear preprocessors are presented for comparison. The AZMNL uses a 1/x tail with the threshold T_az = α²Γ(1/α)/Γ(3/α) [31]. The adaptive soft limiter (ASL) method [21] uses a clipper threshold T_asl obtained by solving the probability equation Pr(|x| ≤ T_asl) = 1 + 0.7756(α − 2).
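For reference, the ASL threshold equation can be solved numerically. The sketch below assumes SciPy's levy_stable distribution with β = 0 and scale γ^(1/α), which corresponds to the SαS characteristic function exp(−γ|ω|^α); the function name is hypothetical, and Brent's method is simply one convenient root finder. Because the noise is symmetric, Pr(|x| ≤ T) = 2F(T) − 1, where F is the CDF.

```python
from scipy.optimize import brentq
from scipy.stats import levy_stable

def asl_threshold(alpha, gamma=1.0):
    """Solve Pr(|x| <= T) = 1 + 0.7756*(alpha - 2) for SaS noise x."""
    target = 1.0 + 0.7756 * (alpha - 2.0)
    if not 0.0 < target < 1.0:
        raise ValueError("the ASL equation has no solution for this alpha")
    scale = gamma ** (1.0 / alpha)   # assumes char. function exp(-gamma*|w|**alpha)

    def excess(T):
        # For a symmetric law, Pr(|x| <= T) = 2*F(T) - 1.
        return 2.0 * levy_stable.cdf(T, alpha, 0.0, scale=scale) - 1.0 - target

    return brentq(excess, 1e-6, 1e3)

T_asl = asl_threshold(1.2)   # the alpha = 1.2, gamma = 1 case simulated in this section
print(f"T_asl = {T_asl:.4f}")
```

The right-hand side 1 + 0.7756(α − 2) lies in (0, 1) only for roughly 0.711 < α < 2, so the function rejects other values.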
First, the nonlinearities in the four modes are designed and depicted in Fig. 6. The LOD function, which is the optimal nonlinearity, is continuous and differentiable everywhere. The four modes decay more slowly than the LOD and have similar thresholds for their linear regions. Comparing these functions shows that the four modes do not strictly follow the shape of the LOD; rather, each maximizes the efficacy within its own function set.
Second, the other nonlinear functions are compared in Fig. 6(b). The ASL method coincides with the Clipper, which demonstrates that [21] achieves the optimal clipper design. From the efficacy values denoted as ''Eff'' in the legends, the AZMNL and the ASL achieve lower efficacy than our design.
Finally, the BER is simulated for MSK modulation over a flat-fading channel. The SNR is defined as ξ²E_s/γ, with M = 1024 samples per bit. The output of the LOD is calculated by linear interpolation, as discussed in Section VII-B. The BER results of 10^7 Monte Carlo trials are plotted in Fig. 6(c).
As can be seen, the Scaling mode obtains the same BER as the LOD. The Clipper is sub-optimal, and the Blanker is worse. The AZMNL is near-optimal: better than the Clipper but clearly worse than the Scaling mode. Note that the simulated BERs agree with the theoretical BERs given by (13), which are omitted from the figure to avoid clutter.
To quantify the performance gain of the proposed scheme over other nonlinearities, we use the efficacy of a nonlinearity to measure its SNR loss relative to the LOD:

$$ \text{SNR loss}(g) = -10\log_{10}\frac{\text{Efficacy of } g(x)}{\text{Efficacy of } g_{lo}(x)} \ \text{(dB)}. $$

Based on the efficacy values denoted as ''Eff'' in the legends of Fig. 6(b), the Scaling mode incurs a loss of 0.05 dB, the Blanker 1.06 dB, the Clipper 0.33 dB, and the AZMNL 0.22 dB. Hence, the nonlinearity with power-law tails incurs little loss compared with the LOD and outperforms the optimal designs with traditional tails.
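The loss measure can be evaluated directly, assuming it is −10 log10 of the efficacy ratio E(g)/E(g_lo), which is consistent with the losses quoted above. The sketch also inverts the relation to show which efficacy percentages those quoted losses correspond to; the function names are hypothetical.

```python
import math

def snr_loss_db(eff_ratio):
    """SNR loss of g relative to the LOD: -10*log10(E(g) / E(g_lo))."""
    if not 0.0 < eff_ratio <= 1.0:
        raise ValueError("efficacy ratio must lie in (0, 1]")
    return -10.0 * math.log10(eff_ratio)

def eff_ratio_from_loss(loss_db):
    """Inverse map: efficacy ratio implied by an SNR loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)

# Losses quoted in the text for Fig. 6(b).
for name, loss in [("Scaling", 0.05), ("Blanker", 1.06), ("Clipper", 0.33), ("AZMNL", 0.22)]:
    print(f"{name}: {100 * eff_ratio_from_loss(loss):.1f}% of the LOD efficacy")
```

Run as is, it reports about 98.9%, 78.3%, 92.7%, and 95.1% of the LOD efficacy for the Scaling mode, the Blanker, the Clipper, and the AZMNL, respectively.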

VIII. CONCLUSION
This paper has introduced power-law tails into nonlinearity design and provided a nearly optimal solution for correlation detection in various models of impulsive noise. The nonlinear function is defined piecewise, consisting of a linear function and a power-law function, with the threshold T and the exponent a as the two design arguments. The nonlinearity design is then formulated as the problem of optimizing the efficacy with respect to T and a. As the efficacy is differentiable and unimodal, the optimization problem can be solved efficiently by derivative-free methods, such as the Nelder-Mead simplex method.
The proposed design has been tested in SαS noise, Class A noise, and Gaussian mixture noise. Simulation results have shown that the proposed design is significantly more efficient and robust than the traditional blanker and clipper, since the power-law tail provides a suitable decay factor to match the tail of the noise model. Compared with the locally optimal detector, the designed power-law tail achieves almost the same efficacy and has the advantage of a simple closed-form formula for the nonlinear computation. This paper has analyzed the fundamental performance of signal detection with the power-law tail design. Future work will address applications in wireless communication systems with specific modulations and unknown noise distributions. For instance, when a PLC system employs OFDM, the OFDM waveform envelope and the average peak-to-average ratio affect the nonlinearity and the BER [5]. In addition, for real-time processing when the noise model is unknown or time-varying, it would be valuable to develop efficient approaches that design the power-law tails from the received data instead of from a known noise distribution.

APPENDIXES
APPENDIX A PROPERTIES OF NONLINEARITY IN FOUR MODES
The nonlinearity g(x, T, a) in the four modes, i.e., g_sc(x, T, a), g_xm(x, T, a), g_ym(x, T, a), and g_pm(x, T, a), has similar but distinct properties that can affect the optimization. The following analyzes the properties of g(x, T, a) for x > 0, T > 0, and a ≤ 1.
(1). Continuity. The Scaling, P-move, and Y-move modes have continuous nonlinearity functions. In the X-move mode, however, g_xm(x, T, a) is discontinuous at a = 0.
The proof of discontinuity is simple. Let a → 0⁺ and a → 0⁻ denote a approaching 0 from above (right) and from below (left), respectively. For x > T, calculating the one-sided limits of g_xm(x, T, a) as a → 0⁺ and as a → 0⁻ shows that they are not equal. Therefore, g_xm(x, T, a) is not continuous at a = 0.
(2). Differentiability. The nonlinearity functions g_sc(x, T, a), g_pm(x, T, a), and g_ym(x, T, a) are made up of three types of functions, i.e., y = x, y = x^a, and y = 0, all of which are differentiable. Furthermore, all the pieces are joined continuously. Therefore, these nonlinearity functions are continuous and are continuously differentiable except at the breakpoints.
However, in the X-move mode, g_xm(x, T, a) is not continuous at a = 0 and thus is not differentiable with respect to a there.
(3). Breakpoints. Breakpoints along the x-axis affect the efficacy, which involves an integral over x. All the modes share the breakpoint x = T, but the P-move and Y-move modes may have additional breakpoints.
In the Scaling mode, g_sc(x, T, a) always has only one breakpoint, at x = T.
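In the Y-move mode, for 0 < T < 1 and a < 0, g_ym(x, T, a) can be rewritten as

$$ g_{ym}(x,T,a) = \begin{cases} x, & 0 < x \le T,\\ x^{a} - T^{a} + T, & T < x \le X_y(T,a),\\ 0, & x > X_y(T,a), \end{cases} $$

where X_y(T, a) = (T^a − T)^{1/a} is the breakpoint in addition to T.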
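Similarly, in the P-move mode, g_pm(x, T, a) is rewritten as

$$ g_{pm}(x,T,a) = \begin{cases} x, & 0 < x \le T,\\ (x - T + 1)^{a} + T - 1, & T < x \le X_p(T,a),\\ 0, & x > X_p(T,a), \end{cases} $$

where X_p(T, a) = (1 − T)^{1/a} − 1 + T is the breakpoint in addition to T.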
Both X_y(T, a) and X_p(T, a) occur only for 0 < T < 1 and a < 0, when x^a is moved down to keep g(x, T, a) continuous, as depicted in Fig. 7.
Finally, note that all the breakpoints, i.e., T, X_y(T, a), and X_p(T, a), are differentiable with respect to T and a.
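The additional breakpoints can be checked numerically, reading the a√(·) notation of the breakpoint expressions as the a-th root, i.e., X_y(T, a) = (T^a − T)^(1/a) and X_p(T, a) = (1 − T)^(1/a) − 1 + T. The sketch assumes that the Y-move tail is x^a shifted vertically by T − T^a and that the P-move tail is x^a translated so that the point (1, 1) maps to (T, T); these forms are inferred from the breakpoint expressions and should be checked against the mode definitions in the main text. All function names are hypothetical.

```python
import numpy as np

def x_y(T, a):
    """Extra Y-move breakpoint (T**a - T)**(1/a); meaningful for 0 < T < 1, a < 0."""
    return (T**a - T) ** (1.0 / a)

def x_p(T, a):
    """Extra P-move breakpoint (1 - T)**(1/a) - 1 + T; meaningful for 0 < T < 1, a < 0."""
    return (1.0 - T) ** (1.0 / a) - 1.0 + T

def g_ym(x, T, a):
    """Assumed Y-move form for x > 0: x on (0, T], then x**a - T**a + T clamped at 0.
    The clamp is active only when 0 < T < 1 and a < 0."""
    x = np.asarray(x, dtype=float)
    tail = np.maximum(np.maximum(x, T) ** a - T**a + T, 0.0)
    return np.where(x <= T, x, tail)

def g_pm(x, T, a):
    """Assumed P-move form for x > 0: x**a translated so (1, 1) moves to (T, T), clamped at 0."""
    x = np.asarray(x, dtype=float)
    tail = np.maximum((np.maximum(x, T) - T + 1.0) ** a + T - 1.0, 0.0)
    return np.where(x <= T, x, tail)

T, a = 0.5, -1.0
print(x_y(T, a), x_p(T, a))                                    # extra breakpoints
print(float(g_ym(x_y(T, a), T, a)), float(g_pm(x_p(T, a), T, a)))  # values there
```

With T = 0.5 and a = −1, for example, X_y = 2/3 and X_p = 1.5; both functions reach zero at their extra breakpoint and stay continuous at x = T.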

APPENDIX B PROOF OF THEOREM ''DIFFERENTIABILITY OF INTEGRAL''
To prove differentiability, we use the sufficient condition that the partial derivatives ∂E(T, a)/∂T and ∂E(T, a)/∂a exist and are continuous. As the proofs for the two partial derivatives are similar, in the following we prove the continuity of ∂E(T, a)/∂T. First, consider the Scaling mode, as well as the P/Y-move mode for T ≥ 1 or a ≥ 0. In this case, as analyzed in Appendix A, g(x, T, a) is the power-law function for x ∈ [T, ∞), and (28) is obviously continuous. Next, in the P/Y-move mode for 0 < T < 1 and a < 0, g(x, T, a) equals zero for x ∈ [X(T, a), ∞), so E(T, a) can be rewritten as

$$ E(T,a) = \int_{T}^{X(T,a)} g(x,T,a)\,h(x)\,dx, \qquad (29) $$

where g(X(T, a), T, a) = 0. The continuity of the corresponding derivative (30) across the boundary between the two cases is easy to obtain, since the limits of (30) and (28) there are the same. Therefore, ∂E(T, a)/∂T is continuous. Similarly, ∂E(T, a)/∂a is also continuous. Hence, E(T, a) is differentiable.

VOLUME 8, 2020
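To see why the moving upper limit in (29), E(T, a) = ∫_T^{X(T,a)} g(x, T, a) h(x) dx, does not spoil the argument, one can differentiate it with the Leibniz integral rule. This is a sketch assuming that h(x) and ∂g/∂T are continuous on [T, X(T, a)], and using the differentiability of the breakpoint X(T, a) noted in Appendix A:

```latex
\frac{\partial E(T,a)}{\partial T}
  = \int_{T}^{X(T,a)} \frac{\partial g(x,T,a)}{\partial T}\,h(x)\,dx
  \;+\; g\big(X(T,a),T,a\big)\,h\big(X(T,a)\big)\,\frac{\partial X(T,a)}{\partial T}
  \;-\; g(T,T,a)\,h(T)
```

Since g(X(T, a), T, a) = 0, the middle term vanishes, so the derivative has the same form as the derivative of ∫_T^∞ g(x, T, a) h(x) dx: an integral of (∂g/∂T) h plus the lower-limit term.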

APPENDIX C PROPERTIES AND PERFORMANCE OF THE X-MOVE MODE
Unlike the other three modes, the X-move mode is discontinuous with respect to a, not only in the nonlinear function g_xm(x, T, a) in (11) but also in its efficacy function E(g_xm, T, a) in (12). This makes the X-move mode behave differently from the other three modes.
(1) Discontinuity. The efficacy E(g_xm, T, a) is not continuous at a = 0, owing to the discontinuity of g_xm(x, T, a) at a = 0. Fig. 8 shows the efficacy E(g_xm, T, a) for SαS noise with α = 1.5 and γ = 1. As can be seen, the efficacy surface is divided into two regions by a vertical break line at a = 0. In contrast, no such break line exists in the efficacy surface of the Scaling, P-move, or Y-move mode.
(2) Differentiability. The efficacy E(g_xm, T, a) is differentiable in each of the two subdomains a < 0 and a > 0: in each subdomain, g_xm(x, T, a) is differentiable, and thus so is E(g_xm, T, a).
As can be seen in Fig. 8, E(g_xm, T, a) is smooth in each region of the surface, although it is discontinuous at a = 0.
(3) Unimodality. Simulations show that the efficacy function E(g_xm, T, a) is unimodal with respect to a in each subdomain, a > 0 and a < 0.
(4) Optimization results. The NMS method still works for optimizing E(g_xm, T, a). As shown in Fig. 4(a), TABLE 2, and TABLE 3, the optimal efficacy of the X-move mode is close to that of the other modes. In fact, the NMS method succeeds in finding the larger of the two local maxima, one for a > 0 and one for a < 0.
Figs. 4(b) and 4(c) show that the optimal threshold T and exponent a of the X-move mode change dramatically around α = 1.55. In Fig. 4(c), for the X-move mode with α > 1.55, the optimal exponent a_o is a small positive number, and the corresponding nonlinear function and efficacy are close to those of the optimal clipper.
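Because the X-move efficacy jumps at a = 0, a single simplex search could in principle settle in the subdomain with the smaller local maximum. One possible safeguard, not necessarily the procedure behind TABLE 2 and TABLE 3, is to run the NMS method once in each subdomain and keep the larger result. In the sketch below, toy_eff is a hypothetical surface standing in for E(g_xm, T, a), and the reparametrization a = ±exp(b) is an assumption of this sketch that keeps each run on one side of the break line.

```python
import numpy as np
from scipy.optimize import minimize

def maximize_per_subdomain(eff, T0=1.0, b0=-0.5):
    """Run Nelder-Mead separately on a > 0 and a < 0 and keep the larger maximum.
    The exponent is parametrized as a = sign * exp(b), so each run stays in its
    own subdomain; b <= 0 enforces a <= 1 on the positive side."""
    best = None
    for sign in (1.0, -1.0):
        def neg_eff(p, sign=sign):
            T, b = p
            if T <= 0 or (sign > 0 and b > 0):
                return 1e10                      # infeasible point
            return -eff(T, sign * np.exp(b))
        res = minimize(neg_eff, x0=[T0, b0], method="Nelder-Mead",
                       options={"xatol": 1e-8, "fatol": 1e-12})
        cand = (-res.fun, res.x[0], sign * np.exp(res.x[1]))
        if best is None or cand[0] > best[0]:
            best = cand
    return best                                  # (efficacy, T, a)

def toy_eff(T, a):
    """Hypothetical stand-in for E(g_xm, T, a): smooth on each side, with a jump at a = 0."""
    if a > 0:
        return 1.0 - (T - 1.0) ** 2 - (a - 0.5) ** 2
    return 0.8 - (T - 2.0) ** 2 - (a + 0.5) ** 2

print(maximize_per_subdomain(toy_eff))
```

With the real efficacy, toy_eff would be replaced by a numerical evaluation of (12) for the X-move mode; the extra cost is one additional simplex run.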
Finally, because of this discontinuity, the X-move mode is less advisable than the other modes.