A Review of Wavelet Analysis and Its Applications: Challenges and Opportunities

As a general and rigorous mathematical tool, wavelet theory has found many applications and is constantly developing. This article reviews the development history of wavelet theory, from construction methods to the discussion of wavelet properties. It then focuses on the design and extension of the wavelet transform and discusses its main models and algorithms. The construction of the rational wavelet transform (RWT) is illustrated by examples, and a review of the literature emphasizes the advantages of RWT over the traditional wavelet transform. The combination of wavelet theory and neural networks is one of the key points of the review, which covers the evolution of the wavelet neural network (WNN), its system architecture and its algorithm implementation. The literature indicates the advantages of WNN and a clear trend of fast development, since WNN can be combined with existing neural network algorithms. This article also introduces the categories of wavelet-based applications. The advantages of wavelet analysis are summarized in terms of application scenarios with a comparison of results. Through the review, new research challenges and gaps are clarified, which will serve as a guide for potential wavelet-based applications and new system designs.


I. INTRODUCTION
In signal processing, much attention has been paid to multiresolution analysis and data feature extraction. As a powerful mathematical tool for analyzing time-varying non-stationary signals, time-frequency analysis offers information on the joint distribution of a signal in the time and frequency domains, and clearly describes how the signal frequency changes over time. Standard time-frequency distribution functions include the short-time Fourier transform (STFT, including the Gabor transform), the Cohen distribution function (including the Wigner distribution), the improved Wigner distribution, the Gabor-Wigner distribution function and the S transform [1]. The advantage of the STFT is that its physical meaning is clear: it represents the energy contained in each frequency component of a signal over a specified time interval. For many actual test signals it provides a time-frequency structure consistent with intuitive perception, which has made it the most widely used time-frequency analysis method. Nevertheless, the time and frequency resolutions of the STFT are both determined by the width of the window function and cannot be optimized at the same time [2]. Such limitations can be overcome with wavelets.
Similar to the Fourier transform, the wavelet transform can be seen as the projection of a signal onto a set of basis functions that provide localization in the frequency domain. However, in contrast to the windowed (short-time) Fourier transform, which provides constant, equally spaced time-frequency localization, the wavelet transform provides high frequency resolution at low frequencies and high time resolution at high frequencies. Thus, different from the Fourier transform, the wavelet transform utilizes a series of orthogonal bases with different resolutions to represent or approximate a signal through the dilation and translation of the wavelet basis function. The wavelet transform is considered to be a significant breakthrough in mathematical analysis and can be applied to various fields, for example signal processing, image processing, pattern recognition and speech analysis.
Wavelet research developed rapidly in the 1980s. In 1981, Stromberg proved the existence of wavelet functions. From 1984 to 1988, Meyer, Battle and Lemarie designed different wavelet basis functions with fast decay characteristics [3]. Mallat proposed a fast wavelet transform algorithm for signal analysis and reconstruction, namely the Mallat algorithm [4]. Based on the concept of multi-resolution analysis, the Mallat algorithm is expressed as a two-channel filter bank. Whether the input is a two-dimensional image or a one-dimensional signal, it can be approximated by a set of sub-signals with different resolutions. The algorithm is widely applied in signal decomposition and reconstruction. In 1992, Soman and Vaidyanathan proposed wavelet packet theory [5]. Compared with the wavelet transform, the wavelet packet divides the time-frequency plane more finely, and its resolution in the high-frequency part of the signal is better than that of standard wavelet analysis.
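The two-channel filter-bank view of the Mallat algorithm can be illustrated with a minimal sketch. It assumes the closed-form db2 (four-tap Daubechies) filter and periodic signal extension; both are choices made for this illustration and are not prescribed by [4]. The function names are hypothetical.

```python
import math

# Closed-form db2 low-pass (scaling) filter; an assumption of this sketch.
S3 = math.sqrt(3.0)
C = 4.0 * math.sqrt(2.0)
DB2_LO = [(1 + S3) / C, (3 + S3) / C, (3 - S3) / C, (1 - S3) / C]
# High-pass filter from the alternating-flip relation g[k] = (-1)^k h[L-1-k].
DB2_HI = [(-1) ** k * DB2_LO[3 - k] for k in range(4)]


def analyze(x, lo=DB2_LO, hi=DB2_HI):
    """One analysis stage: filter, then keep every second sample (periodic)."""
    n = len(x)
    taps = range(len(lo))
    approx = [sum(lo[k] * x[(2 * i + k) % n] for k in taps) for i in range(n // 2)]
    detail = [sum(hi[k] * x[(2 * i + k) % n] for k in taps) for i in range(n // 2)]
    return approx, detail


def synthesize(approx, detail, lo=DB2_LO, hi=DB2_HI):
    """One synthesis stage: the transpose of `analyze` (orthogonal filter bank)."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i in range(len(approx)):
        for k in range(len(lo)):
            x[(2 * i + k) % n] += approx[i] * lo[k] + detail[i] * hi[k]
    return x


def mallat_decompose(x, levels):
    """Repeatedly split the approximation; details are returned finest first."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = analyze(approx)
        details.append(d)
    return approx, details


def mallat_reconstruct(approx, details):
    """Invert `mallat_decompose`, starting from the coarsest level."""
    for d in reversed(details):
        approx = synthesize(approx, d)
    return approx
```

For an even-length signal whose length stays at least four at every stage, `mallat_reconstruct(*mallat_decompose(x, levels))` should return `x` up to floating-point error, which is the perfect-reconstruction property of the orthogonal two-channel bank.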
In 1992, Zou and Tewfik [6] proposed the M-band wavelet theory, which extended research on the wavelet transform from "two-band" to "multi-band". In 1994, Goodman et al. established a multi-wavelet theoretical framework based on R-order multi-scaling functions and multi-resolution analysis [7]. The multi-resolution space, which is generated by one scaling function in the single-wavelet case, is extended so that it is generated by multiple scaling functions, which provides greater degrees of freedom. Geronimo and his team in 1994 [8] designed a multi-scale wavelet transform in which the wavelet function is constructed from multiple scaling functions. It can have compact support, orthogonality, symmetry and interpolation at the same time. In 1995, Sweldens et al. proposed a new wavelet construction algorithm, the lifting scheme. First, the original discrete sample signal is split into odd and even samples, and then the odd and even sample points are filtered. All first-generation wavelets can be constructed using lifting schemes. The scheme is characterized by fast computing speed, small memory requirements, and the ability to implement integer-to-integer transforms [9]. In recent years, with the development of modern communication systems and image processing, wavelet functions have been included in neural networks, and the resulting network is trained by a step-by-step learning method.
In this review, the historical development of wavelets is traced and reviewed systematically. The contributions of this paper can be summarized as follows:
1) Wavelet theory, including wavelet construction methods and the definition of wavelet properties, is briefly summarized. The development of the wavelet base is discussed, and the research direction, which is rationalization, is pointed out.
2) Signal decomposition methods are discussed, including the widely used discrete wavelet transform (DWT) and its extensions: wavelet packet (WP), complex WT, and primarily, the rational wavelet transform (RWT). RWT is a more powerful signal processing tool with a more satisfactory frequency resolution, which can be applied to a wide range of fields by adjusting the rational factor.
3) The advantages of WNNs that use wavelet analysis as preprocessing are further clarified. A more satisfactory scale-domain resolution is more convenient for signal processing, whether in denoising or feature extraction. Applying the wavelet function to the neural network, combined with some interdisciplinary algorithms, can significantly increase the neural network's performance.
4) The application of wavelets to signal processing in emerging fields, image processing, and optimization algorithms is introduced to broaden the application scenarios of wavelets.
5) Some current challenges and research gaps are discussed, and some future research directions are suggested.
The paper is organized as follows. Section II describes the review method. Wavelet theory is described in Section III, where the properties of different wavelet bases are discussed. The wavelet transform, especially RWT, which can provide finer resolution analysis, is reviewed in Section IV. WNN is introduced in Section V and is divided into two research directions: wavelets as signal preprocessing and wavelets as activation functions. The advantages of wavelets in signal processing (traditional and emerging fields), image processing, and optimization algorithms are reviewed in Section VI. Research challenges and research gaps are clarified in Section VII. Finally, the conclusion and future work are presented in Section VIII.

II. REVIEW METHOD

A. THE ORGANIZATION OF REVIEW
The development of wavelets is reviewed in four main categories, covering wavelet theory, applications of wavelet theory in general signal processing problems, WNN, and practical application scenarios. WNN, one of the fast-developing hot spots, is covered to reflect its theoretical development and rich applications. The organization of the review is visualized in Fig. 1, and the detailed organization of the wavelet applications is given in Section VI. The historical background and the families of wavelet basis functions are first reviewed. Various wavelet decomposition architectures and time-scale analysis tools for general signal processing are then summarized. The review of WNN is divided into two major categories: one preprocesses the signal with wavelets and feeds the result into the neural network, and the other is the deep fusion of the wavelet function and the neural network. The most relevant, recent and representative practical implementations using wavelets are summarized to reflect the current status of wavelet applications. Through the comprehensive review, research gaps and challenges are identified.
VOLUME 10, 2022

B. SELECTION CRITERIA
We first used the filters provided by the database to screen documents by publication date, publication platform, and field. For example, we chose the latest papers, mostly from the past ten years, and divided them into journal articles and conference papers. We then browsed the abstract and keywords of each paper to determine whether it met the requirements of the review. If relevance could not be determined from the abstract and keywords, we read the introduction and located the objective statement of the paper. Through extensive and intensive reading of the literature, the main contribution of each paper was extracted and presented concisely. The main aspects for the evaluation of cited papers are novelty, contribution, relevance to the field and timeliness. The survey includes papers that represent recent advances in the field, as well as relevant contributions to the field, regardless of their publication date. Each paper was classified according to whether it focuses on a theoretical proposal or an application design. In each category, the literature was subdivided according to the collected keywords, and the articles in each subcategory were ranked by publication date. We then read the literature intensively and extracted its novelty and achievements for further comparison and analysis.

III. WAVELET THEORY
Current review papers related to wavelets focus on one narrow field in many cases. For example, Addison's article focused on near-infrared spectroscopy and defined the cross-wavelet transform [17]. The author suggests that wavelets can be used for real-time local transform phase monitoring to obtain valuable new high-resolution views. Moreover, low-oscillation wavelets may be worth considering when the time resolution needs to be increased in the transform domain. Huang et al. [18] pay particular attention to the influence of the wavelets chosen (i.e., wavelet packet and Gabor wavelet) and propose to use the recently introduced empirical wavelets. To capture the texture present in an image, experiments were conducted on an extensive selection of wavelets, and wavelet families in different directions were used to represent texture more effectively. Reference [19] introduced the continuous wavelet transform (CWT) and DWT for electrocardiogram (ECG) signal denoising and data storage reduction. Its analysis indicates that the bionic wavelet transform is the most suitable denoising method, showing high selectivity and sensitivity with high noise reduction.
In comparison with the previous reviews, this paper covers both wavelet theory and a wide range of wavelet applications, as detailed in the following sections.

A. WAVELET CONSTRUCTION METHOD
There are two methods to construct wavelets. The concept of the wavelet was initially proposed in 1981 [20], [21]. After that, multi-resolution analysis (MRA) was developed as a toolbox for constructing standard wavelet bases. Meyer and other researchers perfected the details of the MRA. Mansouri Jam and Sadjedi proposed an orthogonal MRA and designed a matched wavelet to satisfy the orthogonal MRA conditions [22]. Let $\{V_j\}_{j\in\mathbb{Z}}$ be a sequence of subspaces of $L^2(\mathbb{R})$. Then $\{V_j\}_{j\in\mathbb{Z}}$ is an MRA of $L^2(\mathbb{R})$ when [23]:
• $V_{j-1} \subset V_j$ for all $j\in\mathbb{Z}$. The space expands as $j$ increases and the subspaces are strictly nested layer by layer, so the information of $V_{j-1}$ is completely contained in $V_j$, and $V_{j-1}$ contains less information than $V_j$.
• $\bigcap_{j\in\mathbb{Z}} V_j = \{0\}$ and $\overline{\bigcup_{j\in\mathbb{Z}} V_j} = L^2(\mathbb{R})$. The subspaces intersect only in the zero function, and together they form the whole $L^2(\mathbb{R})$ space.
• $f(t)\in V_j \Leftrightarrow f(2t)\in V_{j+1}$. The subspaces have dyadic scalability.
• $f(t)\in V_0 \Rightarrow f(t-k)\in V_0$ for all $k\in\mathbb{Z}$. A function still belongs to the subspace after translation.
The other method is based on the lifting scheme, which was proposed by Sweldens in 1995 [9]. The method has a faster calculation speed and takes up less memory. It retains wavelet characteristics while overcoming the original limitations. In [24], Ansari and Gupta extended the lifting framework from dyadic wavelets to rational wavelets, inheriting all the advantages of the lifting framework. These so-called second-generation wavelets are easy to implement, can process signals of any size in the spatial domain, and show perfect reconstruction [25]. The lifting scheme has two main applications. The first is the acceleration of the fast wavelet transform algorithm, in which the boundary processing is also simplified. The second is the design of wavelets suitable for multidimensional bounded domains and curved surfaces, which cannot be achieved by the Fourier transform [26].
Haouam et al. [27] presented a lifting scheme with a biorthogonal wavelet and applied it to the compression of Magnetic Resonance Imaging (MRI) images. Their method gives better compression results than traditional methods. In [28], a rational wavelet filter bank based on a lifting scheme was also designed and achieved a better sparsifying property than other wavelets. Chen et al. [29] proposed an undecimated lifting scheme to achieve exact shift-invariance and showed its superiority over other methods. A biorthogonal wavelet with shape control is constructed under the lifting scheme in [30], which gives better performance in data compression and noise reduction. Guptha and his team introduced transistor technology based on a lifting wavelet transform architecture, in which the complete lifting step is processed as a continuous stream of samples. Compared with existing architectures, the architecture is optimized by integrating forward and backward lifting schemes [31].
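The split, predict and update steps of the lifting scheme, and its ability to map integers to integers, can be sketched with the simplest case. The example below assumes the integer Haar transform (often called the S-transform) as the lifting pair; this choice and the function names are illustrative and are not taken from [9] or from the architectures reviewed above.

```python
def lifting_forward(x):
    """One integer Haar lifting stage on an even-length list of ints."""
    even, odd = x[0::2], x[1::2]                            # split
    detail = [o - e for o, e in zip(odd, even)]             # predict odd from even
    approx = [e + (d // 2) for e, d in zip(even, detail)]   # update (floor of pair mean)
    return approx, detail


def lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order; exact for integer input."""
    even = [s - (d // 2) for s, d in zip(approx, detail)]   # undo update
    odd = [d + e for d, e in zip(detail, even)]             # undo predict
    x = [0] * (2 * len(approx))
    x[0::2], x[1::2] = even, odd                            # merge
    return x
```

Because every step adds or subtracts a quantity computed only from the other branch, the inverse is obtained by reversing the order and the signs, regardless of the rounding used inside the update. This is why lifting supports lossless integer-to-integer transforms and in-place computation.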

B. WAVELET PROPERTIES
The properties of a wavelet are an important reference when constructing wavelets or choosing suitable wavelets to process various signals. Farge [20] discussed the factors that need to be considered when choosing the mother wavelet, such as orthogonal or non-orthogonal, complex or real values, and the width and shape of the mother wavelet. There are several basic properties or criteria for wavelets: vanishing moments, support length, regularity, symmetry and orthogonality. Orthogonal wavelets have good time-frequency localization characteristics. Selesnick gave the necessary conditions for an orthogonal wavelet system to form a Hilbert transform pair [32] and proposed a construction algorithm based on a delay filter in [33], [34]. It was further pointed out in [35] that the necessary conditions given in [32] are also sufficient.
The vanishing moment is defined as follows [36]. A wavelet has N vanishing moments if it satisfies
$$\int_{-\infty}^{+\infty} t^{p}\,\psi(t)\,dt = 0, \tag{1}$$
where $0 \le p < N$, $\psi(t)$ is the wavelet function, and $t$ is the time variable of the wavelet function. A higher number of vanishing moments is desirable for signal compression, denoising and fast calculation. The larger the vanishing moment is, the more wavelet coefficients are zero. Fig. 2 shows the wavelet coefficients with different vanishing moments. We apply the DWT with Daubechies wavelets to an example sine function. The wavelets are db2, db5, and db10, with vanishing moments of 2, 5 and 10, respectively. The larger the vanishing moment of the wavelet, the smaller the high-frequency coefficients after wavelet decomposition, the more concentrated the signal energy, and the higher the signal compression ratio.
The support length of the wavelet function describes how quickly the function converges from a finite value to 0 as the time or frequency tends to infinity. The longer the support length, the larger the computational cost, and the more high-amplitude wavelet coefficients are generated.
Wavelets are often required to have compact support, which means that the wavelet function is zero except in a small range near 0. Compact support and vanishing moments are contradictory requirements. The support length corresponds to the length of the filter. If the number of vanishing moments increases, the wavelet coefficients of the high-frequency sub-band decrease and a larger number of coefficients are close to zero, but the support length becomes longer. Therefore, a compromise must be made between the support length and the vanishing moment. Compact support and vanishing moments can be better balanced under the multi-wavelet construction [26]. In practice, they can be weighed according to the singularities of the signal. High vanishing moments are more suitable if the signal has few singularities, whereas a shorter support interval is required if the singularities are dense.
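The relationship between vanishing moments, filter length and near-zero detail coefficients described above can be checked numerically. The following minimal sketch uses the closed-form Haar (2 taps, one vanishing moment) and db2 (4 taps, two vanishing moments) filters and evaluates interior detail coefficients of a linear ramp. The helper names are illustrative, and boundary handling is deliberately omitted.

```python
import math

SQRT3 = math.sqrt(3.0)
NORM = 4.0 * math.sqrt(2.0)

# Low-pass (scaling) filters in closed form.
HAAR_LO = [1 / math.sqrt(2), 1 / math.sqrt(2)]
DB2_LO = [(1 + SQRT3) / NORM, (3 + SQRT3) / NORM,
          (3 - SQRT3) / NORM, (1 - SQRT3) / NORM]


def highpass_from_lowpass(lo):
    """Alternating-flip relation g[k] = (-1)^k * h[L-1-k]."""
    length = len(lo)
    return [(-1) ** k * lo[length - 1 - k] for k in range(length)]


def discrete_moment(g, p):
    """p-th discrete moment sum_k k^p g[k]; zero for p < N vanishing moments."""
    return sum((k ** p) * gk for k, gk in enumerate(g))


def detail_coefficients(x, g):
    """Interior detail coefficients d[n] = sum_k g[k] x[2n + k] (no boundary)."""
    length = len(g)
    count = (len(x) - length) // 2 + 1
    return [sum(g[k] * x[2 * n + k] for k in range(length)) for n in range(count)]


if __name__ == "__main__":
    ramp = [0.5 * m + 2.0 for m in range(32)]  # a degree-1 polynomial signal
    for name, lo in (("haar", HAAR_LO), ("db2", DB2_LO)):
        hi = highpass_from_lowpass(lo)
        moments = [round(discrete_moment(hi, p), 6) for p in range(3)]
        largest = max(abs(d) for d in detail_coefficients(ramp, hi))
        print(name, "taps:", len(lo), "moments:", moments, "max |detail|:", largest)
```

In this sketch the db2 high-pass filter annihilates the ramp because its zeroth and first moments vanish, while the shorter Haar filter leaves a constant nonzero detail. The price is the doubled filter length, which is the trade-off between support length and vanishing moments.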
Regularity is generally used to describe the smoothness of a function. The higher the regularity, the smoother the function. The Lipschitz exponent $\alpha$ usually characterizes the regularity of a function. Given a positive integer $n$, if there exist a positive constant $A$ and a polynomial $P_n(t)$ of degree $n$ such that the function $f(t)$ satisfies Equation (2) for $t \in (t_0 - h, t_0 + h)$, where $h$ is a sufficiently small positive number, then $f(t)$ has Lipschitz exponent $\alpha$ at the point $t_0$:
$$|f(t) - P_n(t)| \le A\,|t - t_0|^{\alpha}. \tag{2}$$
The Lipschitz exponent characterizes how well the function is approximated by a local polynomial, which is related to the differentiability of the function. The regularity of the wavelet base affects the stability of the reconstruction from the wavelet coefficients.
A certain regularity (smoothness) is usually required in wavelet analysis to obtain a better reconstructed signal. The wavelet function has the same regularity as the scaling function because the wavelet function is a linear combination of translations of the corresponding scaling function. When quantizing the wavelet coefficients, the smoothness or continuous differentiability of the wavelet should be increased as much as possible to reduce the visual impact of the reconstruction error. Wavelets with good regularity can achieve a better smoothing effect in signal or image reconstruction. However, if the regularity is high, the support length will be longer, and the computational cost will be larger. There is a close relationship between vanishing moments and regularity. For many important wavelets (e.g., spline wavelets, Daubechies wavelets), the regularity of the wavelet becomes larger as the vanishing moment increases [26]. However, this does not hold in general: an increase in the vanishing moment of a wavelet does not necessarily imply an increase in its regularity.
Daubechies has rigorously proved that for compactly supported 2-band orthogonal wavelets, there is no symmetric (or antisymmetric) wavelet except the Haar wavelet [37]. A symmetric wavelet can effectively avoid phase distortion in image processing. Therefore, researchers generalized the 2-band wavelet and obtained several significant branches, such as the biorthogonal wavelet [38], the vector wavelet [39], and the M-band wavelet [40]. The properties of biorthogonal wavelets are similar to those of orthogonal wavelets, but they can be completely symmetric. The properties discussed above are still applicable to biorthogonal wavelets. Generally, a wavelet with a high vanishing moment is used for decomposition, and then another wavelet is used for reconstruction.
In the continuous wavelet transform (CWT), the wavelet family is obtained from the base wavelet through dilation and translation. The CWT is invariant under translation and covariant under dilation, so the CWT coefficients have a certain degree of correlation. In other words, the wavelet transform coefficients corresponding to two adjacent points in the time-scale plane are correlated. The closer the two points are, the stronger the correlation is. As the distance between the two points increases, their correlation weakens rapidly. This means that there is data redundancy in the CWT of the signal, which increases the difficulty of analyzing and interpreting the results of the wavelet transform [26].
Although the discrete wavelet transform (DWT) can effectively capture the singularities of one-dimensional signals, this is not the case in two dimensions. The two-dimensional orthogonal wavelet base is formed by the tensor product of two one-dimensional orthogonal wavelet bases, and its direction selectivity is very limited. With only horizontal, vertical, and diagonal orientations, the two-dimensional DWT cannot effectively represent the contour and edge information of an image; that is, the DWT is not a sparse representation of the contours and edges of the image. Its performance in image denoising, texture classification, and image retrieval is lower than that of multidirectional wavelets [41].
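The limited direction selectivity of the tensor-product construction can be seen in a small experiment. The sketch below assumes a one-level separable Haar transform and a hypothetical naming convention in which the first letter of each sub-band is the filter applied along rows and the second letter is the filter applied along columns. It is an illustration only, not one of the multidirectional methods of [41].

```python
import math

R = 1 / math.sqrt(2)


def haar_1d(v):
    """One-level Haar analysis of an even-length sequence."""
    half = len(v) // 2
    lo = [(v[2 * i] + v[2 * i + 1]) * R for i in range(half)]
    hi = [(v[2 * i] - v[2 * i + 1]) * R for i in range(half)]
    return lo, hi


def transpose(m):
    return [list(col) for col in zip(*m)]


def haar_2d(img):
    """Separable 2-D Haar transform: filter along rows, then along columns."""
    row_lo, row_hi = zip(*(haar_1d(row) for row in img))

    def along_columns(m):
        lo, hi = zip(*(haar_1d(col) for col in transpose(m)))
        return transpose(lo), transpose(hi)

    ll, lh = along_columns(row_lo)
    hl, hh = along_columns(row_hi)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def energy(m):
    return sum(v * v for row in m for v in row)


if __name__ == "__main__":
    size = 8
    vertical = [[1.0 if c >= 3 else 0.0 for c in range(size)] for _ in range(size)]
    diagonal = [[1.0 if c > r else 0.0 for c in range(size)] for r in range(size)]
    for name, img in (("vertical edge", vertical), ("diagonal edge", diagonal)):
        bands = haar_2d(img)
        print(name, {k: round(energy(bands[k]), 3) for k in ("LH", "HL", "HH")})
```

In this toy setting the axis-aligned edge falls into a single detail sub-band, whereas the 45-degree edge spreads its energy over all three, which is one way to see why contours at arbitrary orientations are not represented sparsely.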

C. WAVELET BASES

1) COMMON BASIC WAVELET BASES
The Haar wavelet is a step function and the earliest discovered wavelet with the simplest form. Its expression is shown in Equation (3):
$$\psi(t) = \begin{cases} 1, & 0 \le t < 1/2, \\ -1, & 1/2 \le t < 1, \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$
Numerous wavelet theory books also start from the introduction of the Haar wavelet [42]-[46]. Due to its simple and convenient nature, a large number of works have adopted the Haar wavelet to achieve their goals.
The Daubechies wavelet was constructed in 1987. Its most significant advantage is that it can be realized by finite impulse response conjugate mirror filters. Both Haar and Daubechies wavelets are orthogonal wavelets. Daubechies wavelets are generally abbreviated as dbN, where N represents the order of the wavelet. dbN has no closed-form expression. The Daubechies wavelet has better regularity. As the order N increases, the vanishing moment of the wavelet is larger and the localization ability in the frequency domain is stronger, but at the same time the amount of calculation is larger. In order to obtain better symmetry, Daubechies improved the wavelet system and constructed the Symlet and Coiflet wavelets [25]. The Coiflet has better symmetry than dbN. The Symlet wavelet function is an approximately symmetric wavelet function that improves on the dbN function. Symlets and Coiflets are consistent with dbN wavelets in terms of continuity, support length, and filter length, but they have better symmetry, so they can reduce phase distortion in signal analysis and reconstruction to a certain extent.
The biorthogonal wavelet is conducive to signal reconstruction and can accurately reconstruct the signal through finite impulse response (FIR) filters. Biorthogonal wavelets can have compact support, high vanishing moments and symmetry at the same time. Their construction methods can be roughly separated into two categories: spectral decomposition and lifting schemes. Many researchers have proposed different methods of constructing biorthogonal wavelets. Bhatnagar [47] explained the biorthogonal representation of functions. Biorthogonal wavelets are a generalization of orthogonal wavelets, so there are more degrees of freedom in designing them. Reference [30] introduced the Catmull-Clark subdivision surface and combined it with biorthogonal wavelets. The authors applied this novel wavelet to noise suppression, data compression, and other applications and achieved better results. In [48], researchers used the homotopy method instead of Newton's method to construct a biorthogonal wavelet, which enlarged the range of biorthogonal wavelets that can be selected.
Coffey and Etter introduced internalized MRA and efficiently constructed biorthogonal wavelets on a bounded domain [49]. Tay and Lin proposed a technique for constructing biorthogonal wavelets with rational coefficients. This wavelet has a linear phase and also has very similar properties to quadrature filters [50]. In [51], an algorithm for estimating the rotor displacement of a magnetic bearing motor based on a multi-resolution filter bank of biorthogonal spline wavelets is proposed. The algorithm utilizes biorthogonal spline wavelets with generalized linear phase and compact support, which can accurately demodulate the ripple current in the coil and extract the displacement information. Nagare et al. [52] proposed a new half-band polynomial with rational coefficients using Bernoulli polynomials to design biorthogonal filter banks. Singh and Pathak constructed biorthogonal wavelet packets in the Sobolev space $H^{s}(K)$ on a local field of positive characteristic and derived their biorthogonality at each level by the Fourier transform [53].
The Meyer wavelet is different from the previous wavelets in that it is defined in the frequency domain [26]. Although it has an analytical form, it does not have compact support. The Meyer wavelet, therefore, has no fast discrete wavelet transform algorithm. An FIR filter can be used to construct a filter matrix that approximates the Meyer wavelet transform; this approximation is known as the discrete Meyer wavelet (Dmey). In [54], the potential problem of contaminated data is handled by a regularization scheme based on Meyer wavelets. The regularized solution is recovered by Meyer wavelet projection onto the elements of the Meyer MRA. Lee and Ryu also suggested that the OFDM system using Dmey is the most similar to traditional OFDM but overcomes the disadvantages of the traditional discrete Fourier transform-based OFDM system [55], [56]. A novel fractional Meyer neuroevolution-based intelligent computational solver is proposed in [57] for the numerical treatment of bi-singular multi-fractional Lane-Emden systems using a combination of Meyer WNNs.
Regimanu et al. [58] used a multi-resolution wavelet transform technique to remove dithered signals. The five-level multi-resolution analysis uses various wavelet types such as discrete Meyer wavelets (Dmey) and Daubechies wavelets. The dithered signal is attenuated by 107.0 dB, and the phase characteristic is found to be linear in the passband, with lower computational complexity. In [59], Sabir and his team propose a novel stochastic computational framework based on fractional Meyer wavelet artificial neural networks, designed for nonlinear singular fractional Lane-Emden differential equations. The statistical results verify the model's superiority in solving singular nonlinear fractional-order systems. Fig. 3 shows example wavelet functions for the Haar, Daubechies, biorthogonal, Meyer, Symlet and Coiflet wavelets.

2) THE DEVELOPMENT OF WAVELET BASE
Classical wavelets have the advantages of a multi-resolution analysis structure and time-frequency localization. However, these advantages apply mainly to one-dimensional signal processing and cannot be directly generalized to two or more dimensions. In order to make up for this shortcoming while retaining the advantages of wavelet analysis, Daubechies and Mallat have constructed numerous new wavelet systems based on classic wavelets, each with its own characteristics.
In order to analyze high-quality audio and speech, a non-uniform frequency domain representation is required [28]. Rational wavelets can provide non-uniform frequency partitions of the signal spectrum and further improve flexibility. They can also provide greater flexibility and higher time-frequency analysis accuracy for WNN design. Chertov and Malchykov verified by an example that the perfect reconstruction condition is satisfied when a reducible fraction is used as the dilation factor [60]. Ansari and Gupta used the lifting scheme to design a rational learned wavelet, which also extends dyadic wavelets to rational wavelets [24]. It has all the advantages of the lifting framework and gives better results when applied to the compressed sensing reconstruction of signals.
Another novel wavelet is the fractional wavelet. Fractional wavelets extend classical wavelets and are suitable for higher dimensions due to their low memory requirements. Tausif, Jain, Khan and Hasan designed two types of architecture for the fractional wavelet filter (FrWF) with a 5/3 filter bank, one with multipliers and one without, which required less memory than existing architectures [61], [62]. Tausif et al. [63] proposed a segmented modified FrWF that reduces the time complexity relative to DWT and FrWF by about 16.8% and 53.6%, respectively, and has about 65% lower energy consumption than the traditional FrWF for high-resolution images. In [64], Liu et al. combined fractional wavelets with a scattering network and constructed a fractional scattering network to obtain improved signal and image classification performance. Shi et al. studied the sampling theorem for the fractional wavelet transform and discussed the estimation of sampling and aliasing errors [65]. A hybrid fractal wavelet coder is proposed in [66]. It has the advantages of the wavelet transform and achieves significantly improved image quality without obvious blur. In [67], the authors combined fractional wavelets and biorthogonal wavelets and defined a fractional MRA. They derived the necessary and sufficient conditions for the translates of the wavelet to form a fractional Riesz basis. Reference [68] discussed fractional spline wavelets and verified that they are more effective than traditional wavelets in recognizing the surface texture of machine parts.
When the selected wavelet function is complex-valued rather than real-valued, the wavelet is called a complex wavelet. A complex wavelet has the form $\psi_c(t) = \psi_{\mathrm{real}}(t) + j\,\psi_{\mathrm{imag}}(t)$. The real part $\psi_{\mathrm{real}}(t)$ and imaginary part $\psi_{\mathrm{imag}}(t)$ of most complex wavelets form a Hilbert transform pair. Fernandes et al. [69] combined a mapping filter and an inverse mapping filter with a complex wavelet and constructed a mapping-based complex wavelet transform. This wavelet transform is both directional and non-redundant. Toda, Zhang and other researchers [70], [71] proposed a series of complex wavelet transforms with perfect translation invariance based on different methods: a Hilbert transform pair [70], and a 3-dilation orthogonal basis with perfect translation invariance [71]. They also constructed a tight wavelet frame in the frequency domain based on a designed complex wavelet [72]. In [73], the complex wavelet packet energy moment entropy is defined as a new monitoring index to characterize bearing performance degradation.
There are also some special wavelets. The widely applied non-orthogonal wavelets are the Gaussian wavelet, the Morlet wavelet, and the Mexican hat wavelet [74]. Orthogonal wavelet functions are generally used for the discrete wavelet transform, whereas non-orthogonal wavelet functions can be used for either the discrete or the continuous wavelet transform [75]. The Gaussian wavelet is the first derivative of the Gaussian function; its expression is $\psi(t) = -t\,e^{-0.5t^{2}}$. The Mexican hat wavelet is the second derivative of the Gaussian function (up to sign and normalization), expressed as $\psi(t) = (1 - t^{2})\,e^{-0.5t^{2}}$. According to [76], the Mexican hat wavelet serves as a detector for finding a Gaussian noise. The Mexican hat is real-valued and captures both the positive and negative oscillations of the time series as separate peaks in wavelet power [20]. In addition, to obtain information on both the amplitude and the phase of the time series, it is necessary to choose a complex wavelet, because the imaginary part of a complex wavelet can express the phase well. The Morlet wavelet is simply a complex wave within a Gaussian envelope; its real part is expressed as $\psi(t) = \cos(5t)\,e^{-0.5t^{2}}$. Reference [25] considered several different test signals, such as noise, phase shift, bump and a slight spike, to test the performance of the Morlet wavelet with different parameters. The Morlet wavelet has a good balance between localization in time and in frequency. Fig. 4 shows the waveforms of the three non-orthogonal wavelets. Table 1 shows the properties of several standard wavelet bases introduced above.
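The closed-form expressions above can be evaluated directly. The sketch below is a numerical sanity check under stated assumptions: it uses the unnormalized formulas given in the text, a plain trapezoidal rule, and finite differences. It confirms the derivative relationships to the Gaussian $e^{-0.5t^{2}}$ and shows that the Gaussian and Mexican hat wavelets have zero mean, while the real Morlet expression has a small but nonzero mean of $\sqrt{2\pi}\,e^{-12.5}$. The function names are illustrative.

```python
import math


def gaussian(t):
    return math.exp(-0.5 * t * t)


def gaussian_wavelet(t):
    """First derivative of the Gaussian."""
    return -t * math.exp(-0.5 * t * t)


def mexican_hat(t):
    """Negative second derivative of the Gaussian (unnormalized)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def morlet_real(t):
    """Real part of the Morlet wavelet with centre frequency 5."""
    return math.cos(5.0 * t) * math.exp(-0.5 * t * t)


def integrate(f, a=-10.0, b=10.0, n=20000):
    """Composite trapezoidal rule on [a, b] with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def derivative(f, t, h=1e-5):
    """Central first difference."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


def second_derivative(f, t, h=1e-3):
    """Central second difference."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


if __name__ == "__main__":
    for name, psi in (("gaussian", gaussian_wavelet),
                      ("mexican hat", mexican_hat),
                      ("morlet (real)", morlet_real)):
        print(name, "mean:", integrate(psi))
```

The residual mean of the real Morlet expression is why a small correction term is sometimes subtracted when strict admissibility is needed; with a centre frequency of 5 it is commonly neglected.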
Novel wavelets can also be constructed by combining the properties of different wavelets. Wen et al. [77] studied the decomposition and reconstruction of an orthogonal rational wavelet filter bank with dilation factor M = 3/2. They constructed high-pass filter banks from low-pass filter banks and gave a perfect reconstruction method. In [78], Li also proved the perfect reconstruction condition for orthonormal wavelet bases with rational dilation factor M = p/q and gave two examples of orthogonal wavelet bases to verify the perfect reconstruction. Yu and her team constructed a wavelet that combines the complex, rational and orthogonal wavelets [79]. It achieves better robustness in broadband sonar pulses than the linear frequency modulated pulse-based system [80], [81], and it performs well against the Doppler effect and the inter-symbol interference caused by multipath while reducing channel noise [82].

3) EXPANSION OF WAVELET BASE IN DIMENSIONS AND SCALES
One direction of wavelet extension is higher-dimensional wavelets. At present, two-dimensional wavelet analysis has made significant progress both in theory and in application (mostly in image processing). In [83], Rinoshika designed a three-dimensional orthogonal wavelet based on the Daubechies wavelet and analyzed instantaneous 3-D velocity fields from high-resolution tomographic particle image velocimetry. Different vortexes could be extracted at different wavelet decomposition levels. Reference [84] proposed a three-dimensional discrete wavelet transform for hyperspectral face feature extraction; compared with three existing hyperspectral face recognition methods, it achieved higher accuracy.
The wavelets discussed previously are all single-scaling wavelets. Whether it is a classic wavelet or a newly designed wavelet, the wavelet function is constructed by a single scaling function. In signal processing, whether the wavelet has properties such as compact support, symmetry, orthogonality, and the vanishing moment is essential. However, it is not easy for a single-scaling wavelet to have these properties at the same time. Multi-wavelet means that multiple scaling functions complete the construction of wavelet functions. The construction of multi-wavelets can usually be transformed into the solution of vector filter matrix coefficients. Compared to single-scaling wavelets, multi-wavelets have superior properties such as symmetry, regularity, and vanishing moments in the compactly supported range, so they have received extensive attention in the field of signal processing.
Reference [85] uses an optimized multi-wavelet transform of electroencephalography (EEG) signals to classify human eye movements and achieves higher accuracy in the presence of different movements and blinking. In [86], the multi-wavelet transform is applied to mechanical feature extraction for an on-load tap-changer and achieves better fault detection results. The authors of both [87] and [88] combined multi-wavelets with neural networks and achieved better results in their research fields. In [88], the approximation performance is far better than that of some classical algorithms, including algorithms that use mother wavelets. In [87], experiments show that the method can effectively expand the data set for building a CNN model and is robust to noise, misalignment, and different numbers of training samples of the same type.

IV. CLASSIFICATION OF WAVELET-BASED SIGNAL SPACE DECOMPOSITION

A. DISCRETE WAVELET TRANSFORM
The wavelet transform supports multi-resolution analysis and can characterize the local features of a signal in both the time and frequency domains. It performs multiscale analysis of the signal through operations such as dilation and translation. Compared with the Fourier transform, it provides a "time-frequency" window that changes with frequency, and it can also highlight certain aspects of the signal. DWT is the most basic and most widely used wavelet transform and is implemented by a two-channel filter bank applied over several levels. DWT is obtained by discretizing the scale and translation of the continuous wavelet transform in powers of 2, so it is also called the dyadic wavelet transform. For many signals, the low-frequency component is essential, since in many cases it carries the characteristics of the signal, while the high-frequency component gives the details or differences of the signal. In DWT decomposition, low-frequency information corresponds to the high-scale part of the signal, which is an approximation of the signal; high-frequency information corresponds to the low-scale part of the signal, which is the detail of the signal. Therefore, the original signal passes through two complementary filters to produce two signals. The approximation signal is decomposed again at each subsequent level, so the signal can be decomposed into many lower-resolution components. Theoretically, the decomposition can proceed without limit. In practical applications, the appropriate number of decomposition levels is generally selected according to the characteristics of the signal or appropriate criteria.
The DWT of a signal is not realized directly through inner products between the signal and ψ(t) (the wavelet function) and φ(t) (the scaling function), but by using a high-pass filter h[n] and a low-pass filter g[n]. It regards the wavelet coefficients c_j[k] and d_j[k] of the signal as discrete signals, and h[n] and g[n] as digital filters, thereby establishing, through filter bank theory, the relationship between the wavelet analysis of the signal and the filter bank. Most research involving wavelets introduces wavelet analysis into the design of filter banks, and a particular wavelet filter bank can be designed according to the object being processed. The structure of DWT is shown in Fig. 5(a). Fig. 5 presents the decomposition structures of different wavelet transforms: DWT, stationary wavelet transform (SWT), discrete wavelet packet transform (DWPT) and dual-tree complex WT (DT-CWT). DWT also applies downsampling after the high-pass filter and the low-pass filter. Given the original signal x[n], the ith level coefficients can be calculated as:
$c_i[k] = \sum_{n=0}^{K-1} g[n]\, c_{i-1}[2k-n], \qquad d_i[k] = \sum_{n=0}^{K-1} h[n]\, c_{i-1}[2k-n], \qquad c_0[n] = x[n]$ (4)
where K is the length of the filters, and h[n] and g[n] are the high-pass filter and the low-pass filter, respectively. SWT (stationary wavelet transform) is a DWT with no down-sampling. The structure of SWT is shown in Fig. 5(b). Zheng and his team [89] chose the Haar wavelet as the basic wavelet and applied SWT to heart-rate monitoring. They achieved very high accuracy when estimating heart rate in the driving scenario. In [90], the results of different wavelets and different decomposition levels were compared in terms of the accuracy, sensitivity, and specificity of EEG signal classification, and deeper levels were found to potentially give better accuracy in EEG data classification using SWT. In [91], the undecimated wavelet transform (UWT) is used to ensure the shift insensitivity of the coefficients for time series prediction. The core idea of UWT is to remove the down-sampling of the decimated wavelet transform and replace it with up-sampling of the filters.
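The filter-and-downsample step of DWT and the undecimated variant described above can be sketched as follows. This is a minimal illustration, assuming NumPy, Haar filters, even-length inputs, a particular downsampling phase (odd output samples are kept), and a periodic boundary for the undecimated step; the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)
G_LOW = np.array([1.0, 1.0]) / SQRT2    # assumed Haar low-pass filter g[n]
H_HIGH = np.array([1.0, -1.0]) / SQRT2  # assumed Haar high-pass filter h[n]

def dwt_level(c_prev):
    """One DWT analysis level: filter, then keep every second sample."""
    c_prev = np.asarray(c_prev, dtype=float)
    if c_prev.size % 2:
        raise ValueError("input length must be even")
    # Keeping the odd samples of the full convolution pairs (x0, x1), (x2, x3), ...
    approx = np.convolve(c_prev, G_LOW)[1::2]
    detail = np.convolve(c_prev, H_HIGH)[1::2]
    return approx, detail

def dwt(x, levels):
    """Multi-level DWT: only the approximation is decomposed again."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, detail = dwt_level(approx)
        details.append(detail)
    return approx, details

def swt_level(c_prev, level=1):
    """One undecimated level: no downsampling; the two filter taps are
    spread 2**(level-1) samples apart instead (periodic boundary)."""
    c_prev = np.asarray(c_prev, dtype=float)
    delayed = np.roll(c_prev, 2 ** (level - 1))
    return (c_prev + delayed) / SQRT2, (c_prev - delayed) / SQRT2

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
approx3, details = dwt(x, levels=3)
swt_approx, swt_detail = swt_level(x, level=1)

print("DWT sub-band lengths:", [d.size for d in details], approx3.size)
print("SWT sub-band length :", swt_approx.size)
```

Each DWT level halves the number of coefficients, whereas the undecimated step keeps the input length, which is the source of its redundancy and shift insensitivity.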

B. EXTENSION OF DISCRETE WAVELET TRANSFORM

1) WAVELET PACKET
In the decomposition process, wavelet analysis only decomposes the low-frequency signals again and does not decompose the high-frequency signals. Therefore, its frequency resolution decreases as the frequency increases. Wickerhauser and other researchers proposed the concept of the wavelet packet [92]. The wavelet packet (WP) decomposes the low-pass and high-pass components of the signal frequency band simultaneously, so that any frequency band can be located. Fig. 6 shows the wavelet decomposition trees of DWT and WP. The wavelet packet framework introduces the concept of optimal basis selection based on wavelet analysis theory. Many researchers have designed different wavelet packet bases. Reference [73] combined the complex wavelet and the wavelet packet energy moment entropy and defined the result as a new monitoring index to characterize bearing performance degradation. In [93], the combination of wavelet packets and genetic programming significantly improves prediction accuracy.
Islam et al. [94] introduced a particular wavelet packet called the perceptual wavelet packet to enhance speech signals. This method resulted in better spectrogram output and higher scores in subjective listening tests. In [95], defects are characterized by the power of the low-frequency components and the wavelet packet energy after wavelet decomposition; complex signals are analyzed, and defect features are extracted. Reference [96] combined wavelet packets and CNNs to classify excitation-induced sound signals using extracted wavelet packet decomposition (WPD) features. Liu and his team proposed an improved wavelet packet denoising algorithm, which determines the optimal decomposition level according to the difference in the correlation function values of the wavelet packet coefficients [97]. In addition, the wavelet packet coefficients are divided into an approximate part, a blur part and a detail part. Singular spectrum analysis, fuzzy thresholding and correlation analysis are applied in selecting these three types of coefficients to preserve the dynamic behaviour of chaotic signals to the greatest extent. An energy analysis method based on wavelet packets is proposed in [98]. This method is used to calculate the wavelet packet energy index (WPEI) of the ground-penetrating radar signal of clay samples with water content. The results show a highly correlated linear relationship between WPEI and soil water content. Fig. 5(c) shows the discrete wavelet packet transform (DWPT). 2D-DWT has three preferred directions: horizontal, vertical and diagonal. Owing to the additional decomposition of the high-pass filter outputs, 2D-DWPT has higher directional selectivity [99]. The decorrelation property is closely related to the support width and to the shape of the Fourier transform of the wavelet packet function. Reference [100] addressed the DWPT for continuous-time fractional Brownian motion (fBm) and considered stationarization and asymptotic decorrelation. They also studied the influence of fBm with or without independent white Gaussian noise on selecting the best wavelet packet basis. Khaleel and his team proposed an adaptive neuro-fuzzy method based on a discrete packet wavelet transform-Kalman filter for power quality identification and classification [101]. Reference [102] proposed a flexible architecture that computes generalized wavelet packet trees with the help of boost-based bypass wavelet filters and bit-swapping circuits. An enhanced fault detection method that combines the maximum overlap discrete wavelet packet transform with a Teager energy adaptive spectral kurtosis denoising algorithm to identify weak periodic pulses is proposed in [103].
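The difference between the DWT tree and the full wavelet packet tree can be illustrated with a short sketch. It assumes NumPy and Haar filters, requires the signal length to be divisible by 2 to the power of the number of levels, and lists nodes in natural (filter-bank) order rather than frequency order; the function names are hypothetical.

```python
import numpy as np

def haar_split(node):
    """Split one node into its low-pass and high-pass children."""
    node = np.asarray(node, dtype=float)
    even, odd = node[0::2], node[1::2]
    return (even + odd) / np.sqrt(2.0), (odd - even) / np.sqrt(2.0)

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split at every level.
    tree[L] is the list of the 2**L nodes at level L."""
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    tree = [[x]]
    for _ in range(levels):
        next_level = []
        for node in tree[-1]:
            next_level.extend(haar_split(node))
        tree.append(next_level)
    return tree

signal = np.arange(16.0)
tree = wavelet_packet(signal, levels=3)

for level, nodes in enumerate(tree):
    print(f"level {level}: {len(nodes)} nodes of length {nodes[0].size}")
```

A DWT of the same depth would keep only the first (low-pass) node of each level for further splitting, giving 4 sub-bands instead of the 8 equal-width leaves produced here.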

2) COMPLEX WAVELET TRANSFORM
Complex Wavelet Transform (complex WT) is a complex-valued extension of DWT. Remenyi et al. [104] defined the complex maximal overlap scale-mixing 2D complex WT and applied it to image denoising. Their method achieved excellent visual performance. Fernandes et al. constructed a new framework of complex WT and provided a mapping-based, nonredundant complex WT [69]. The new mapping-based framework overcomes the serious shortcomings of DWT and offers controllable redundancy and flexibility. In [105], Xu et al. applied complex WT to mitigate noise in gas-insulated switchgear signals. They achieved better noise filtering results by extracting two kinds of information: the real part and the imaginary part of the wavelet analysis. One important design of complex WT is the dual-tree complex WT (DT-CWT). Its implementation uses two real-valued DWTs, one giving the real part of the transform coefficients and one giving the imaginary part. Its advantages are better directionality in two or more dimensions, low redundancy, and an efficient, fast computation algorithm. Kingsbury [11] first proposed the DT-CWT structure in 1998. Fig. 5(d) shows the filter bank structure of DT-CWT.
As introduced above, DT-CWT has advantages for signals in two or more dimensions, so it is widely used in image processing. Fahmy et al. [106] used DT-CWT in video magnification and introduced a new and accurate orthogonal filter design method for constructing the DT-CWT system. They modified the phase differences between the wavelet coefficients and achieved better video quality at a lower computational cost. Farhadiani et al. [107] proposed a new method based on an undecimated DT-CWT to reduce speckle in synthetic aperture radar images and achieved better performance. However, this method incurs a higher computational cost. Reference [108] built a neural network dataset from chest X-ray images and the subband images obtained by applying a DT-CWT to those images.
Prashar et al. [109] evaluated in detail the impact of the threshold, the thresholding algorithm and the choice of distribution function on the performance of ECG denoising with DT-CWT. Reference [110] proposed a method using global-based DT-CWT for kinship recognition on similar full-face images. The researchers then proposed novel patch-based kinship recognition methods for DT-CWT: local patch-based DT-CWT and selective patch-based DT-CWT. The former extracts the coefficients of smaller face patches for kinship identification. The latter extends the former by extracting only the coefficients of representative blocks with similarity scores above the normalized accumulation threshold. Most of the references above focus on image processing, that is, on 2-D DT-CWT. In [111], the authors extended DT-CWT to higher dimensions (e.g., 3-D) and studied the power spectral density of the real and imaginary parts of the complex DT-CWT coefficients. They achieved more accurate results in wavelet noise filtering.

C. RATIONAL WAVELET TRANSFORM
The traditional wavelet transforms are dyadic wavelet transforms, whose iterative decomposition process repeatedly divides the input frequency band into two parts of equal bandwidth. The concept of rational multi-resolution analysis (MRA) was first proposed by Auscher [112] and then systematically introduced by Mallat. Auscher [112] proved in 1992 that real rational orthogonal wavelets can be derived under the framework of a rational MRA.
The interpretation is similar to that of the MRA above, but the dilation factor is different: the original MRA is dyadic, whereas the rational MRA uses a rational factor M. The orthogonal basis of V_j is constructed by dilating and translating a function φ(t) ∈ L²(R), which is called the scaling function. The basis function of V_j is given in [113]. In [114], Kovacevic constructed perfect reconstruction filter banks with rational sampling factors. The perfect reconstruction filter bank theory is generalized to the rational case, thereby allowing a non-uniform division of the frequency spectrum. This feature may be helpful in speech and music analysis. Reference [113] reviewed the theory of rational MRA, proposed a pyramid algorithm for calculating the fast orthogonal wavelet transform, and explained the analysis and synthesis parts in detail. It also applies a rational wavelet to signal denoising to show that the scale factor matches the signal information better. Fig. 7 shows the rational part of the filter bank.
The Q factor (the quality factor, defined as the ratio of the filter centre frequency to its bandwidth) of the wavelet transform should be selected according to the oscillatory behaviour of the signal [115]. For example, the wavelet transform should have a relatively high Q factor when wavelets are used to process and analyze oscillatory signals (such as speech and EEG signals). However, apart from continuous wavelet transforms, most wavelet transforms have poor tuning capabilities for the wavelet Q factor. The Q factor is constant and low in the dyadic wavelet transform [37], [115]. In this transform, the bandwidth of the bandpass filters at higher frequencies is wider, resulting in a coarse partition of the higher-frequency region. Therefore, this transform is suitable for signals with little oscillatory behaviour but not for signals with pronounced oscillatory behaviour [115]. Compared with the traditional dyadic and integer wavelet transforms, the Q factor of the rational wavelet transform (RWT) is adjustable, which allows a freer and finer segmentation of the frequency domain [37], [115]. However, its localization in the time domain is relatively weak.
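The dependence of the Q factor on the dilation factor can be illustrated with a deliberately idealized calculation. The sketch below assumes a brick-wall band-pass sub-band spanning [ω₁, a·ω₁] for dilation factor a, with the centre frequency taken as the geometric mean of the band edges; this idealization and the function name are assumptions made here, not formulas from [37] or [115].

```python
import math

def ideal_q_factor(dilation):
    """Q = centre frequency / bandwidth for an ideal band [w1, a*w1].

    Centre (geometric mean) = sqrt(a) * w1, bandwidth = (a - 1) * w1,
    so w1 cancels and Q depends only on the dilation factor a.
    """
    if dilation <= 1.0:
        raise ValueError("dilation factor must be greater than 1")
    return math.sqrt(dilation) / (dilation - 1.0)

# Dyadic case (a = 2) versus rational dilation factors a = 1 + 1/q.
for label, a in [("dyadic a=2", 2.0), ("a=3/2", 1.5), ("a=4/3", 4 / 3), ("a=5/4", 1.25)]:
    print(f"{label:>10}: Q = {ideal_q_factor(a):.3f}")
```

Under this idealization the dyadic transform is fixed at a Q factor of about 1.41, while dilation factors closer to 1 give narrower relative bandwidths and therefore higher Q, in line with the qualitative statement above.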
In [116], Bayram and Selesnick introduced a filter bank with a rational sampling factor q/p based on [117]. Reference [117] designed an orthogonal rational filter bank that is close to a wavelet transform. Fig. 8 shows an example of a two-band rational filter bank, which defines the rational-dilation discrete wavelet transform (RADWT) [118]. In [118], Bayram and Selesnick also designed an overcomplete RWT, which is composed of a self-inverting HIR filter based on the rational sampling factor, so it realizes the reconstruction of the decomposed signal and has translation invariance. The Q factor can be controlled by changing p and q in Fig. 8. Han et al. [119] designed biorthogonal wavelet filters with rational coefficients by following the perfect reconstruction filter idea and adding vanishing moment constraints. By reducing the number of vanishing moments of the wavelet filter, more high-frequency information can be retained in the wavelet transform domain, which is suitable for edge detection. The simulation results show that the approach is effective for image edge detection in a noisy environment. Fig. 9 shows the structure of DT-RADWT, which is based on Fig. 5(d) and Fig. 8. In [120], Canditiis and his team used a complete filter bank (i.e., RADWT) to guarantee the perfect reconstruction property and a tunable Q factor.
To illustrate the difference between the DWT filter bank and the RWT filter bank, the complex rational orthogonal wavelet (CROW) constructed in [79] is taken as an example. The dilation factor is a = 1 + 1/q. The wavelet basis function is defined in the frequency domain by (7), and the rational scaling function is likewise given in the frequency domain, where β(t) is the construction function, which has the form in (10) and is not unique.
Based on the definitions above, the CROW function is $\psi_c(t) = \psi(t) + j\tilde{\psi}(t)$, where $\tilde{\psi}(t)$ is the Hilbert transform of ψ(t), defined in the frequency domain by
$\hat{\tilde{\psi}}(\omega) = -j\,\mathrm{sign}(\omega)\,\hat{\psi}(\omega)$
Fig. 10 shows the time and frequency response of the wavelet ψ(t) with an example dilation factor a = 3/2 (q = 2). Fig. 11 shows the frequency responses of the DWT filter bank and the RWT filter bank. RWT has a better frequency resolution than DWT. As q increases, the dilation factor gradually decreases towards 1, and the frequency resolution of the filter bank improves.
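The frequency-domain definition of the Hilbert transform pair used here can be tried out on sampled data. The following is a minimal sketch, assuming NumPy, a discrete (DFT-based) approximation of the continuous relation, and a Mexican hat wavelet as a stand-in real wavelet; it does not reproduce the CROW construction of [79].

```python
import numpy as np

def hilbert_transform(psi):
    """Discrete Hilbert transform: multiply the DFT by -j * sign(omega)."""
    psi = np.asarray(psi, dtype=float)
    n = psi.size
    sign = np.sign(np.fft.fftfreq(n))
    if n % 2 == 0:
        sign[n // 2] = 0.0  # the Nyquist bin has no well-defined sign
    return np.real(np.fft.ifft(-1j * sign * np.fft.fft(psi)))

# Stand-in real wavelet (assumption): unnormalized Mexican hat.
t = np.linspace(-8.0, 8.0, 1024)
psi = (1.0 - t**2) * np.exp(-0.5 * t**2)

# Complex wavelet built from the Hilbert transform pair.
psi_c = psi + 1j * hilbert_transform(psi)

# The spectrum of the complex wavelet should vanish at negative frequencies.
spectrum = np.fft.fft(psi_c)
freqs = np.fft.fftfreq(psi_c.size)
negative_energy = float(np.sum(np.abs(spectrum[freqs < 0]) ** 2))
total_energy = float(np.sum(np.abs(spectrum) ** 2))
print(f"negative-frequency energy fraction: {negative_energy / total_energy:.2e}")
```

The one-sided spectrum is what allows a complex wavelet to separate amplitude and phase information.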
Reference [121] proposed a high-accuracy general rational approximation model of Gaussian wavelet series in the time domain. The proposed wavelet basis approximation model can be extended to any order and wavelet function without explicit formulation. In 2009, Selesnick and Bayram [115], [118] constructed an over-complete fractional wavelet transform method, which is different from the early critical sampling mode. It allows a small amount of redundancy to improve local performance in both the time and frequency domains. If the signal and noise have strong time-frequency coupling, that is, the distribution of signal and noise overlaps on the time axis or frequency axis, it is challenging to design a reasonable filter.
Fractional wavelet transform (FrWT) has flourished with the further development of wavelet technology. As introduced above, the fractional wavelet is a novel wavelet system. FrWT extends the wavelet transform to the joint time and generalized-frequency domain (fractional Fourier domain), which gives greater flexibility in signal analysis and processing. Mendlovic et al. [122] first introduced FrWT and suggested that it may be used for image compression since it improved the reconstruction performance of the wavelet transform. Reference [61] proposed a fractional wavelet filter (FrWF) and compared it with a state-of-the-art low-memory DWT, showing that it performs better. The proposed FrWF architecture, which uses filter banks to calculate the two-dimensional DWT coefficients of images, requires less memory and fewer hardware components.
In [64], Liu's team introduced FrWT and designed a scattering network based on it. They extended the traditional scattering network with fractional coefficients and achieved higher image classification accuracy. The authors in [123] also considered combining FrWT and neural networks. FrWT is treated as a set of linear translation-variant multiscale filters. They defined a fractional wavelet scattering transform based on it and validated it with computer simulations. In [124], Fan's team detailed the construction of FrWT and applied it to signal denoising. They proposed a two-dimensional search method to determine the optimal order of the fractional wavelet transform and verified the effectiveness and superiority of the method. For example, when denoising a sine signal with FrWT, the SNR can be increased by about 40% and the RMSE reduced by about 50%. Shi et al. [65] extended the sampling theorem based on the FrWT subspace and discussed some applications of the derived results. Kumar and Naik combined compressive sensing and FrWT to secure picture transmission [125]. In [126], the authors proposed the definitions and properties of a newly designed FrWT to overcome the limitations of some existing wavelet transforms and FrWTs.
In recent years, attention to wavelet analysis has grown steadily, and many researchers have published on the topic. A search for wavelet-related topics on the 'Web of Science' website (statistics as of the end of 2021) returned a total of 5972 records. Related keywords in these documents include fractional wavelet (1734 articles), dyadic wavelet (702 articles), orthogonal wavelet (3239 articles) and rational wavelet (297 articles). The records span the period from 1990 to 2021. For the convenience of statistics, this span is divided into five periods: 1990-1996, 1997-2003, 2004-2009, 2010-2015, and 2016 to the present. The statistics may contain duplicate documents, because wavelets designed with multiple characteristics better meet research needs, so one document may match several keywords. Fig. 12 shows the publication statistics. The figure shows that orthogonal wavelets consistently account for the largest number of publications. The reason is that orthogonal wavelets reduce the correlation of sub-band data and reduce redundancy. When wavelet theory was first developed, dyadic wavelets were studied more; however, the proportion of research on rational wavelets and fractional wavelets has increased with the development of wavelet analysis. In recent years, their number of publications has far exceeded that of dyadic wavelets.

V. WAVELET NEURAL NETWORK
Wavelet Neural Network (WNN) integrates the advantages of artificial neural networks and wavelet analysis, which gives the network fast convergence and the ability to perform local time-frequency analysis. A search for WNN-related topics on the 'Web of Science' website (statistics as of the end of 2021) returned a total of 10990 records. The statistics are shown in Fig. 13. In recent years, research on WNN has grown explosively. There are two primary forms of WNN. In the first, wavelet analysis preprocesses the input of the neural network, making the input information easier for the network to process. The decomposed signals obtained from the original signal at different wavelet decomposition levels are used as the input of the neural network. Furthermore, features of the decomposed signals can also be extracted as input. These features could be the maximum, minimum, average, and deviation values of the decomposed signals, as well as the amplitude, slope (or gradient) of amplitude, time of occurrence, mean, standard deviation, and energy of the signals.
The second form is the deep fusion of wavelets and neural networks. There are two ways to integrate them. One is to replace neurons with wavelet elements: the activation function is replaced with a localized wavelet function, and the connection between the wavelet function and the neural network coefficients is established through an affine transformation. The weights from the input layer to the hidden layer and the thresholds of the hidden layer are replaced by the scaling factor and the time shift factor of the wavelet function, respectively [14]. The other, proposed more recently, is to replace the convolution kernel in a CNN with a wavelet, because the kernel of a CNN acts like a filter. It provides a very efficient way to obtain custom filter banks [127].

A. SIGNALS PREPROCESSED BY WAVELET ANALYSIS
When the signals are preprocessed by wavelet analysis, the wavelet space is used as the feature space for pattern recognition. The inner product of the wavelet basis and the signal is weighted to extract the features of the signal, and the extracted feature vector is then sent to the neural network for processing. Reference [128] uses the Coiflet wavelet for envelope extraction and then chooses the mean value, standard deviation, peak value, and RMS (root mean square) value as the features that form the input of a PNN (probabilistic neural network). They also compared it with a traditional back-propagation neural network and achieved better classification accuracy. In [129], wavelet decomposition is first applied to the signal, and the energy and PSD values of the detail sub-signals are then obtained as the extracted features for the input of the neural network. The wavelet decomposition architecture is shown in Fig. 14, which uses DWT as an example. X[n] is the input signal, and G[n] and H[n] are the low-pass filter and the high-pass filter. When the wavelet decomposition level is different, the number of signals after decomposition is also different. These decomposed signals have different energy and other characteristics.
The appropriate number of levels needs to be selected according to the specific situation in actual applications. Reference [90] compared several decomposition levels and found that deeper levels may give better accuracy in EEG data classification using SWT. Sun et al. [130] use the wavelet packet to decompose the signal into seven levels and extract the energy of the wavelet coefficients in the seventh level as the input of the PNN classifier. They tested several different wavelets and found db3 to be the most effective. The energy of the 128 nodes of the seventh-level wavelet coefficients is normalized into lower-dimensional feature vectors to speed up the classification process. Reference [131] uses the wavelet transform to remove noise effects on images and perform feature extraction for recognition. On a limited dataset, the algorithm was still able to identify COVID-19 cases. In [132], the authors introduce the synchrosqueezed wavelet transform to represent the intrinsic properties of AE waves in the time-frequency domain more clearly and find that AE waves caused by different mechanisms exhibit different energy distribution patterns. A convolutional neural network model with two branches is then developed to automatically classify three types of acoustic emission waves by considering their synchrosqueezed wavelet transform maps at different time-frequency scales.
The authors in [133] and [134] calculated the wavelet energy spectrum of signals processed by wavelet decomposition and a separate reconstruction algorithm. Shao et al. [133] applied DWT for wavelet decomposition and calculated the wavelet energy E_j of each set of wavelet coefficients; the wavelet energy spectrum E, which includes all E_j, is used as the feature vector, with Fig. 14(a) as an example. In [134], Zhang et al. used WP for wavelet decomposition and took the energy distribution of the wavelet packet as the feature vector. Fig. 14(b) shows a three-level WP decomposition as an example: the total energy of the third level is the sum of the energies of its eight nodes, and the energy distribution vector is formed from these node energies. Another possible feature vector x, which represents the WNN inputs, is shown in Equation (17). These features come from the original signals and from the decomposed signals at different decomposition levels. The most widely used features are the energies of the decomposed signals at different decomposition levels.
Using the wavelet as the preprocessing method for the input signal of the neural network is similar to most current wavelet analysis methods for feature extraction. The wavelet transform is a local transform in time and frequency. It supports multi-resolution analysis and can characterize the local features of the signal in the time and frequency domains. Since the wavelet transform can concentrate the energy of the original signal in a small number of wavelet coefficients, and the decomposed wavelet coefficients have a high degree of local correlation in the detail components, it provides favourable conditions for feature extraction. Feature extraction with the wavelet transform has been widely used in texture analysis, image compression, and defect detection. The neural network is self-learning, self-adaptive and fault-tolerant. Using the neural network to classify or predict from the extracted features therefore gives better results.
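Sub-band energy features of the kind described above can be sketched in a few lines. The example assumes NumPy, a Haar DWT, an arbitrary two-tone test signal, and a feature vector made of the normalized sub-band energies; the exact feature definitions used in [133] and [134] may differ, and the function names are hypothetical.

```python
import numpy as np

def haar_dwt(x, levels):
    """Haar DWT returning the final approximation and the detail bands."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((odd - even) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def energy_features(subbands):
    """Energy of each sub-band and its share of the total energy.
    Assumes the signal is not identically zero."""
    energies = np.array([np.sum(np.abs(band) ** 2) for band in subbands])
    return energies, energies / energies.sum()

# Assumed test signal: a slow sine plus a weaker fast sine.
n = np.arange(256)
x = np.sin(2 * np.pi * n / 64) + 0.3 * np.sin(2 * np.pi * n / 4)

approx, details = haar_dwt(x, levels=3)
energies, distribution = energy_features(details + [approx])

# `distribution` could serve as the input feature vector of a classifier.
print("sub-band energies  :", np.round(energies, 3))
print("energy distribution:", np.round(distribution, 3))
```

Normalizing by the total energy makes the feature vector insensitive to the overall signal amplitude, which is one common reason for using an energy distribution rather than raw energies.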

B. COMBINATION OF WAVELET FUNCTION AND NEURAL NETWORK

1) WAVELET KERNEL-BASED NEURAL NETWORK
Wavelet kernels (WK) are strong candidates for initializing convolutional neural network kernels because their use produces useful approximations of the signal after convolution operations [135]. The initialization of kernels in a CNN plays a crucial role in network performance: better initialization provides better performance with fewer training iterations/epochs. Because the kernel of a CNN acts like a filter, wavelet kernels may be good candidates for initializing CNN kernels, although this has hardly been reported in the existing literature [136]. Wavelet kernels are usually used in convolutional layers, similar to filters. The feature value h_k^l of the kth kernel of the lth layer before the nonlinear activation can be denoted as [127]:
$h_k^l = w_k^l * x + b_k^l$
where w_k^l is the weight of the kth convolutional kernel of the lth layer, b_k^l denotes the bias, x is the input signal, and * is the convolution operator. The proposed WK performs the convolution operation with a predefined wavelet function ψ_{u,s}(t) that depends only on the translation parameter u and the scale parameter s, so the feature value h can be denoted as:
$h = \psi_{u,s}(t) * x$
In [137], researchers combined CNN, a genetic algorithm and an Extreme Learning Machine with WK to improve classification performance. They investigated several state-of-the-art CNN architectures such as AlexNet and VGG-19 and achieved more than 95% accuracy even with 10 classes. Reference [127] proposed a novel wavelet deep neural network called WKNet, in which a continuous wavelet convolutional layer was designed to replace the first convolutional layer of a standard CNN. This enables the layer to discover more meaningful filters. Furthermore, the scale and translation parameters are learned directly from the raw data. It provides a very efficient way to obtain custom filter banks. Following [127], Mo et al. [138] designed their own variational kernel. They compared it with WK using three different wavelets (the Laplace wavelet, the Morlet wavelet, and the Mexican hat wavelet) and concluded that selecting the wrong type of wavelet kernel (the Mexican hat wavelet kernel in this reference) may even reduce the network performance. In [136], a WK-based CNN is designed for acoustic sensor data analysis. The proposed network needs less training time than other designed CNNs and achieves higher accuracy than a standard CNN. Reference [135] also used the WK-based CNN of [136] for fault identification and classification and achieved better performance than some other designed CNNs.
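The idea of a convolution kernel that is generated from only a scale and a translation parameter can be sketched as follows. This is an illustrative forward pass only, assuming NumPy, a Mexican hat mother wavelet, a fixed 33-tap sampling grid on [−4, 4], and fixed rather than trained parameters; it is not the WKNet layer of [127], and all names are hypothetical.

```python
import numpy as np

def mexican_hat(t):
    """Assumed mother wavelet: psi(t) = (1 - t^2) * exp(-t^2/2)."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)

def wavelet_kernel(scale, shift, length=33):
    """Sample psi((t - u) / s) / sqrt(s) on a fixed grid of `length` taps.
    The whole kernel is determined by two numbers: s (scale) and u (shift)."""
    t = np.linspace(-4.0, 4.0, length)
    return mexican_hat((t - shift) / scale) / np.sqrt(scale)

def wavelet_conv_layer(x, scales, shifts):
    """One feature map per (scale, shift) pair: h = psi_{u,s} * x."""
    x = np.asarray(x, dtype=float)
    return np.stack([
        np.convolve(x, wavelet_kernel(s, u), mode="same")
        for s, u in zip(scales, shifts)
    ])

# Assumed input: a single impulse, so each feature map shows its kernel.
x = np.zeros(256)
x[128] = 1.0

scales = [0.5, 1.0, 2.0]
shifts = [0.0, 0.0, 0.0]
features = wavelet_conv_layer(x, scales, shifts)

print("feature map shape:", features.shape)
print("parameters per kernel: 2 (scale, shift) instead of 33 free taps")
```

In a trainable version, the scale and shift values would be updated by backpropagation in place of the individual kernel taps, which is what keeps the number of learned parameters small.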

2) WAVELET FUNCTION AS ACTIVATION FUNCTION
The basic idea was formally put forward by Zhang et al. [13]: the wavelet function replaces the hidden-layer activation function of a conventional neural network, and the input-to-hidden-layer weights and the hidden-layer thresholds are replaced by the scale parameter and translation parameter of the wavelet basis function, respectively [14]. The basic structure is shown in Fig. 15, where $X_i$ $(i = 1, 2, \ldots, L)$ is the input sample, $\psi_j$ $(j = 1, 2, \ldots, M)$ is the wavelet basis function, $F_k$ $(k = 1, 2, \ldots, N)$ is the output of the network, $U_{i,j}$ represents the connection weight between the $i$th neuron in the input layer and the $j$th neuron in the hidden layer, and $\omega_{j,k}$ represents the connection weight between the $j$th neuron in the hidden layer and the $k$th neuron in the output layer.
According to the continuity of the selected wavelet basis function, WNNs can be divided into two types: WNNs with continuous parameters and WNNs based on a wavelet frame [139].

For a WNN with continuous parameters, the wavelet function is $\psi(t)$, and $a_j$ and $b_j$ are the scale parameter and translation parameter. This type comes from the definition of the continuous wavelet transform. Its characteristics are that the positioning of the basis functions is not limited to finite discrete values, the redundancy is high, the expansion is not unique, and the correspondence between the wavelet parameters and the function is not fixed. It poses a nonlinear optimization problem similar to that of the BP network. However, wavelet analysis theory helps to initialize the network and guides the learning process toward faster convergence. The wavelet function of the hidden layer is:

$$\psi_j(t) = \psi\left(\frac{t - b_j}{a_j}\right)$$

The output of the simple three-layer WNN in Fig. 15 could be written as:

$$F_k = \sum_{j=1}^{M} \omega_{j,k}\, \psi\left(\frac{\sum_{i=1}^{L} U_{i,j} X_i - b_j}{a_j}\right)$$

For a WNN based on a wavelet frame, the theoretical basis is the wavelet frame (detailed information is in [140]). However, the wavelet basis under a tight frame is not necessarily orthogonal and may not be compactly supported, which introduces a certain degree of redundancy in the estimation. Since the wavelet frame can represent both smooth signals and signals with singularities, the wavelet frame method has been widely used in signal processing, image processing, and other fields. The wavelet function in the hidden layer could be written as:

$$\psi_{m,n}(t) = a_0^{-m/2}\, \psi\left(a_0^{-m} t - n b_0\right)$$

where $a_0$ and $b_0$ are the basic units of scaling and translation. So the output of the WNN in Fig. 15 is:

$$F_k = \sum_{j=1}^{M} \omega_{j,k}\, \psi_{m_j,n_j}\left(\sum_{i=1}^{L} U_{i,j} X_i\right)$$

with $(m_j, n_j)$ the integer dilation and translation indices of the $j$th hidden neuron.

The construction of a WNN is a critical issue. Zhang [141] used regression analysis to give a method for constructing wavelet networks. He constructed a feed-forward neural network based on the WNN structure and argued that it is well suited to developing neural network construction methods.
Pati and Krishnaprasad [142] gave two methods of wavelet network synthesis that systematically define the structure of the network and determine some of its weights in advance, thus simplifying the network training problem. Reference [143] also proposed a "decomposition-synthesis" method for designing the structure of wavelet basis function networks, which effectively reduces the number of wavelet primitives required to construct wavelet networks.
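The forward pass of the three-layer, continuous-parameter WNN described above can be sketched as follows. The sketch assumes the common form in which each hidden neuron applies the mother wavelet to the weighted input sum after translation by $b_j$ and scaling by $a_j$, and it assumes a real Morlet mother wavelet $\cos(1.75t)\,e^{-t^2/2}$; both the function name `wnn_forward` and the numerical values are illustrative, not taken from the cited works.

```python
import math

def morlet(t):
    """Real Morlet mother wavelet (assumed form, centre frequency 1.75)."""
    return math.cos(1.75 * t) * math.exp(-0.5 * t * t)

def wnn_forward(x, U, a, b, W):
    """Forward pass of a three-layer WNN.

    x      : list of L inputs X_i
    U[i][j]: weight from input i to hidden neuron j
    a[j]   : scale parameter of hidden neuron j
    b[j]   : translation parameter of hidden neuron j
    W[j][k]: weight from hidden neuron j to output k
    """
    n_hidden = len(a)
    hidden = []
    for j in range(n_hidden):
        s = sum(U[i][j] * x[i] for i in range(len(x)))
        hidden.append(morlet((s - b[j]) / a[j]))
    n_out = len(W[0])
    return [sum(W[j][k] * hidden[j] for j in range(n_hidden))
            for k in range(n_out)]

# Tiny example: 2 inputs, 2 wavelet neurons, 1 output.
x = [1.0, 2.0]
U = [[1.0, 0.0], [0.0, 1.0]]
a = [1.0, 1.0]
b = [1.0, 2.0]
W = [[0.5], [0.25]]
F = wnn_forward(x, U, a, b, W)
```

Training such a network means adjusting `U`, `W`, `a` and `b` together, which is the nonlinear optimization problem the text compares to BP network training.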
For the feed-forward network, Stepanov [74] detailed the construction of the activation functions of a WNN and provided a procedure for choosing proper wavelet models. He concluded that polynomial, neural network and spline wavelet models could be used when constructing the activation function of WNNs. The spline wavelet model guarantees the accuracy of the wavelet approximation to the sample, but the model has a high degree of complexity. In [144], researchers use a single-hidden-layer feed-forward WNN, and the results demonstrate the effectiveness and feasibility of the proposed observer in detecting nonlinear system faults. An example is shown in Fig. 16, where $\omega_{i,j}$ are the weights from the inputs to the wavelet neurons in the hidden layer, $\omega_j^{\psi}$ are the weights from the wavelet hidden layer to the output layer, $\omega_i^{x}$ are the weights of the inputs connected directly to the output, and $\theta$ is used for functions with nonzero mean on finite domains [145]. The output of the feed-forward network could be written as in (28), where the wavelet function is $\psi(x)$ and $a_j$, $b_j$ are the scale parameter and translation parameter.

Banakar and Azeem [146] combined the feed-forward network with wavelet functions, placing the sigmoid activation function (SAF) and the Morlet wavelet activation function (WAF) in parallel in each neuron model. After the wavelet function was introduced, the performance measure adopted in [146] improved by about 20% on different examples. An example structure of the sigmoid-wavelet network is shown in Fig. 17. Unlike in the previous WNN structures, $W_i^{\omega}$ and $W_i^{S}$ in Fig. 17 represent the weight values. To feed the two parallel activation functions, there are two sets of weights from the input layer to the hidden layer. After being summed separately, the weighted inputs pass through the sigmoid function and the wavelet function of the hidden layer. Fig. 18 shows the sigmoid function and the Morlet wavelet function.
Equation (29) is the sigmoid function. Unlike in a traditional WNN, each neuron has two parallel sets of weights. The values calculated with the two sets of weights are passed through the wavelet function and the sigmoid function, respectively, and the two results are multiplied to obtain the output of the neuron. The output of the $k$th neuron is shown in (30).

Some researchers have also combined the recurrent neural network (RNN) with the WNN. A simple recurrent WNN (RWNN) is similar to a traditional RNN: the value of the hidden layer depends not only on the current input but also on the previous value of the hidden layer. A novel Type-2 Fuzzy RWNN is proposed to estimate nonlinear systems in [147]. This structure has been shown to outperform other conventional techniques in nonlinear system modelling, with better convergence, lower error, and faster response. Fig. 19 shows the input and hidden layers of the designed RWNN. Reference [146] also considered the idea of a recurrent neural network. Since there are two parallel paths in one neuron, the outputs of the wavelet function and the sigmoid function can each be fed back to their own path and to the parallel path. Fig. 20 shows the designed RWNNs with different feedback positions. The output of the sigmoid function can be sent to both the wavelet part and the sigmoid part, and the output of the wavelet function can likewise be fed back to both parts. In addition, following the design of the RNN, the neural network's final output can also be sent back to the input layer.
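The sigmoid-wavelet neuron described above, in which two separately weighted sums pass through the sigmoid and the Morlet wavelet and are then multiplied, can be sketched as follows. The sketch omits the recurrent feedback paths, assumes a real Morlet wavelet of the form $\cos(1.75t)\,e^{-t^2/2}$, and uses the hypothetical names `sw_neuron`, `w_sigmoid` and `w_wavelet`; it is an illustration rather than the implementation of [146].

```python
import math

def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def morlet(z):
    """Real Morlet wavelet activation (assumed form, centre frequency 1.75)."""
    return math.cos(1.75 * z) * math.exp(-0.5 * z * z)

def sw_neuron(x, w_sigmoid, w_wavelet):
    """Output of one sigmoid-wavelet neuron.

    The same input vector is weighted by two parallel weight sets; one sum
    goes through the sigmoid, the other through the wavelet, and the two
    activations are multiplied.
    """
    z_s = sum(w * xi for w, xi in zip(w_sigmoid, x))
    z_w = sum(w * xi for w, xi in zip(w_wavelet, x))
    return sigmoid(z_s) * morlet(z_w)

x = [1.0, -1.0]
y = sw_neuron(x, w_sigmoid=[0.5, 0.5], w_wavelet=[1.0, 1.0])
```

Because the sigmoid output lies in (0, 1), it acts as a smooth gate on the localized wavelet response in this product form.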
The authors of [148] designed a four-layer WNN in which the wavelet function is used in the second hidden layer. It is similar to the structure in Fig. 15 but adds a sigmoid layer as the first hidden layer, between the input layer and the wavelet hidden layer. Reference [149] not only used WP as a signal preprocessing tool but also used a three-layer WNN for prediction and achieved high accuracy. References [150]-[152] all used a four-layer WNN in which the two layers between the input and output layers are the mother wavelet layer and the wavelet layer. Fig. 21 shows an example of this four-layer WNN. Another important network structure is the BP network, which is currently one of the most widely used and most successful neural network models.
In [153], a variable translation WNN is proposed and compared with other neural networks, showing a better learning ability. The translation parameter of the mother wavelet in the hidden layer depends on the input variable and is controlled by a nonlinear function. The combination of the wavelet network and fuzzy logic uses membership functions to express the weight values, so that a fuzzy wavelet network model with fuzzy weights and outputs is constructed. The authors of [154] and [155] combined the fuzzy neural network and the WNN and designed a wavelet fuzzy neural network (WFNN). In [155], each node in the fifth layer contains a wavelet function. The WFNN was proved to be a convergent network, and the effectiveness of the proposed control system was verified by computer simulation and experimental results. Huang et al. [154] extended the WFNN to a hybrid WFNN, which is based on PNN. Experimental studies involving three commonly used data sets show better results than those produced by some well-known and commonly used fuzzy neural network models. In terms of RMSE, the best-performing of their proposed methods improves on the previously proposed method by about 65%.

As with artificial neural networks in general, not only the structure design of the network but also the choice of the activation function (mother wavelet) is a hot research topic. As introduced above, the hidden layer of the WNN structure consists of scaled and translated versions of the mother wavelet. Both orthogonal and non-orthogonal wavelet functions can be applied to the hidden layer. The widely applied non-orthogonal wavelets are the Gaussian wavelet, Morlet wavelet, and Mexican hat wavelet [74], which are introduced above.
References [150], [152], [155]-[158] all choose the first derivative of the Gaussian function as the mother wavelet, which is usually called the Gaussian wavelet. References [146], [149], [151], [156], [159], [160] applied the Morlet wavelet function in the hidden layer. In [144], [147], [153], [156], the Mexican hat wavelet is chosen as the mother wavelet. Reference [156] compared different mother wavelet activation functions, including the Gaussian, Mexican hat and Morlet wavelets. The experimental results showed that the Gaussian and Morlet wavelets give better classification accuracy. Fig. 22 shows the time-domain waveforms of the three wavelet activation functions. An orthogonal wavelet network is more effective for function approximation owing to the orthogonality of its basis functions, but the orthogonal basis structure and the network learning algorithm are more complicated, and the network's anti-interference ability is poor.
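The three non-orthogonal wavelet activation functions named above can be written down in a few lines. The sketch below assumes common unnormalized forms: the sign and normalization of the Gaussian and Mexican hat wavelets vary between papers, and the Morlet centre frequency of 1.75 is an assumed value. The helper `riemann_integral` is a hypothetical utility added only to check numerically that the Gaussian and Mexican hat wavelets have zero mean; the real Morlet form used here is not exactly zero mean at this low centre frequency, so it is left out of that check.

```python
import math

def gaussian_wavelet(t):
    """First derivative of the Gaussian (sign convention varies)."""
    return -t * math.exp(-0.5 * t * t)

def mexican_hat(t):
    """Second derivative of the Gaussian, up to sign and normalization."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def morlet(t, omega0=1.75):
    """Real Morlet wavelet with an assumed centre frequency omega0."""
    return math.cos(omega0 * t) * math.exp(-0.5 * t * t)

def scaled(psi, a, b):
    """Return the hidden-unit activation t -> psi((t - b) / a)."""
    return lambda t: psi((t - b) / a)

def riemann_integral(f, lo=-8.0, hi=8.0, n=1600):
    """Simple Riemann-sum estimate of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + k * h) for k in range(n + 1))

# Zero-mean check for the two wavelets that are exact derivatives of a Gaussian.
mean_gauss = riemann_integral(gaussian_wavelet)
mean_mexh = riemann_integral(mexican_hat)

# A hidden unit with scale a = 2 and translation b = 1, evaluated at its centre.
unit = scaled(mexican_hat, a=2.0, b=1.0)
centre_value = unit(1.0)
```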
In [148], Rajankar and Talbar compared several orthogonal wavelets, including Daubechies, Coiflet and Symlet wavelets, as the mother wavelet function and found that db6 has the best MSE (mean square error) performance. In [161], the Lemarié-Meyer wavelet, which is an orthonormal function, is chosen. Their model, a simple three-layer WNN, converges quickly and obtains low RMS errors. Chun and his team [162] used the Meyer scaling function as the activation function. The orthogonal wavelet network training method is used to determine the number of hidden-layer neurons and the weights between the hidden layer and the output layer, and the grey system compensates for the ambiguity problem of the orthogonal wavelet network model. An orthogonal WNN is proposed in [163], in which both orthogonal scaling functions and the corresponding mother wavelets are used and the WNN is extended to multi-dimensional cases. The authors designed the WNN based on wavelet frame theory and verified its better function approximation performance. Table 2 gives a brief summary of the reviewed references in terms of signal preprocessing by wavelet analysis and of the different WNN structures with various WAFs (i.e., different mother wavelets).
WNN was initially used in function approximation and speech recognition and was then gradually extended to prediction, classification, image compression and other areas. WNN is a neural network constructed on the basis of wavelet transform theory, which makes full use of the localization property of the wavelet transform and of the large-scale parallel data processing and self-learning capabilities of neural networks. Therefore, it can accurately identify signals with local singularities, has strong approximation ability, fast convergence, and fault tolerance, and is relatively simple to realize. Usually, in signal approximation and estimation, the choice of wavelet function should match the characteristics of the signal, and the wavelet waveform, support length, and number of vanishing moments should be considered. The system established by WNN identification can approximate the system's dynamic characteristics well on the linear model.
WNN has a strong nonlinear mapping ability. Its low-pass filtering effect is good because the wavelet function has limited support in the time-frequency domain. Therefore, WNN has received more and more attention from experts in function approximation and signal processing. Generally speaking, however, the theoretical research on wavelet networks is still at an initial stage, and many problems remain to be solved. Examples include combining WNNs with existing models, such as emerging neural network models or optimization algorithms; theoretical research on the convergence, robustness, generalization ability and computational complexity of wavelet networks; and the selection and design criteria of the wavelet basis.

VI. REVIEW OF PRACTICAL APPLICATION OF WAVELET
Wavelets can be used in communication, image processing, signal processing and many other areas. BER (bit error rate) and PAPR (peak-to-average power ratio) are two important criteria for judging the performance of wavelet applications such as those in [164]-[167]. How to reduce BER in wavelet applications has been discussed and analyzed in various studies, and methods for reducing PAPR have also been designed and discussed in many ways. A lower PAPR is an important index for ensuring higher efficiency of wavelet applications. Joint methods, pilot symbols, wavelet transforms and approaches based on wavelet networks are valuable ways to reduce PAPR, and recent studies have focused on them [166], [167]. Krishna et al. [166] introduced the DWT into channel estimation for OFDM systems and achieved better PAPR performance than with the DFT. In [167], Anoh and his team investigated several mother wavelets in a wavelet-based OFDM system for PAPR reduction. They found that the performance of their design is often better than that of traditional wavelets, especially when pilot symbols are added. Fig. 23 shows the structure of the wavelet applications reviewed in this section.
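Since PAPR is used as a criterion throughout this section, a short sketch of how it is computed may help. It uses the standard definition, the peak instantaneous power of the transmitted samples divided by their mean power, expressed in dB. The function name `papr_db` and the example signals are illustrative.

```python
import math

def papr_db(samples):
    """Peak-to-average power ratio of a block of (real or complex) samples, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    peak = max(powers)
    mean = sum(powers) / len(powers)
    return 10.0 * math.log10(peak / mean)

# A constant-envelope block has the lowest possible PAPR (0 dB).
constant_envelope = [1, 1j, -1, -1j]

# A block whose energy is concentrated in one sample has a high PAPR.
single_peak = [1.0, 0.0, 0.0, 0.0]

papr_flat = papr_db(constant_envelope)
papr_peaky = papr_db(single_peak)
```

A high PAPR forces the power amplifier to operate with a large back-off, which is why the reviewed works treat its reduction as a design goal.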

A. COMMUNICATION SYSTEM
1) TRADITIONAL SIGNAL PROCESSING
There are many typical applications of wavelets in signal processing. Classification is a significant branch of signal processing, and applying wavelet analysis, mainly for feature extraction, can achieve a better classification effect. The characteristics of the signal can be in the time domain, such as the zero-crossing rate and short-term energy, or in the frequency domain, such as the energy, mean square frequency and frequency variance.
Reference [168] designed a fractional wavelet packet decomposition for energy entropy calculation to obtain more information. The authors investigated several fractional orders to increase the identification accuracy and compared the proposed method with other existing classifiers, showing its superiority. The authors of [169], [170] use orthogonal wavelets to extract features of fault signals in power systems. In [169], orthogonal wavelet filter banks are applied to frequency response analysis signals to detect winding faults. Log energy features extracted after Daubechies wavelet decomposition increased the classification accuracy. In [170], Aggarwal and Saini used the energy-to-Shannon-entropy ratio as the criterion for choosing the best mother wavelet to decompose voltage sag signals. They compared their method with the classic classifier and showed its superiority. Wang and Zhang [171] analyzed wavelet entropy characteristics and designed a new feature extraction method that uses the Shannon entropy of the wavelet coefficients and the correlation dimension of the signal as the feature vector. The classification results showed that the combined features perform better. The authors of [172] focus on the spectral characteristics of the healthy and faulty parts of signals, which are then reconstructed with RADWT for analysis. The energy of the RADWT-processed signal is used to estimate the torque.
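The energy and entropy features mentioned above can be illustrated with a small sketch. It assumes a one-level orthonormal Haar DWT as the decomposition and the commonly used definition of the energy-to-Shannon-entropy ratio, in which the entropy is computed from the normalized coefficient energies; whether [170] uses exactly this form is an assumption. All function names are illustrative.

```python
import math

def haar_dwt(x):
    """One level of the orthonormal Haar DWT (len(x) must be even)."""
    r = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / r for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / r for i in range(0, len(x), 2)]
    return approx, detail

def energy(coeffs):
    """Sum of squared wavelet coefficients."""
    return sum(c * c for c in coeffs)

def shannon_entropy(coeffs):
    """Shannon entropy (bits) of the normalized coefficient energies."""
    total = energy(coeffs)
    probs = [c * c / total for c in coeffs if c != 0.0]
    return -sum(p * math.log2(p) for p in probs)

def energy_to_entropy_ratio(coeffs):
    """Larger values indicate energy concentrated in few coefficients.

    Undefined (division by zero) when only one coefficient is nonzero.
    """
    return energy(coeffs) / shannon_entropy(coeffs)

signal = [1.0, 1.0, 2.0, 0.0]
approx, detail = haar_dwt(signal)
ratio_flat = energy_to_entropy_ratio([1.0, 1.0, 1.0, 1.0])
```

When this ratio is used to pick a mother wavelet, the candidate giving the largest ratio on the signal of interest is usually preferred, since it packs the signal energy into the fewest coefficients.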
For identification, system identification of linear time-invariant systems is studied in [173], where wavelets improve performance. The authors focused on Basis Pursuit identification using a rational wavelet basis and compared it with the existing method of adaptive Fourier decomposition; the performance is comparable. In [174], Ma and his team studied the spectral identification of Fusarium head blight by applying continuous wavelet analysis (CWA) to the reflectance spectra of wheat ears. The model performance suggests that spectral signatures obtained using CWA can potentially reflect Fusarium head blight infestation in winter wheat ears. The researchers in [175] proposed a DTCWT-based method to extract sensor pattern noise from a given image, which achieved better performance in regions around strong edges. The authors of [151], [155] tested the identification ability of differently designed WNNs. Reference [155] designed a wavelet fuzzy neural network for identifying and controlling nonlinear dynamic systems. Computer simulations have verified the effectiveness of the proposed control system. Khan's team proposed a new wavelet-based self-tuned controller for IPM motor drives, implemented it using the MATLAB/Simulink software and the dSPACE digital signal processor hardware, and showed better performance than traditional controllers [151].
Wavelet transform provides an efficient way to suppress or mitigate noise [176]. Huang and his team designed a Gaussian wavelet basis expansion [177] and a pseudo-pilot-aided complex Gaussian wavelet basis expansion [178] and compared their BER performance with that of other phase noise compensation methods. The proposed methods are more efficient than the existing ones. In [124], Fan et al. compared FrWT and DWT denoising performance based on SNR (signal-to-noise ratio) and RMSE (root mean squared error). Reference [30] compared the noise-filtering effects of different wavelet constructions and found that biorthogonal wavelet transforms with shape control have the best performance. Chien and Yu [179] focused on impulse noise mitigation in the wavelet-OFDM system for powerline communication. The BER performance shows that the proposed method mitigates the impulse noise much more effectively, especially with ideal channel estimation. ECG signals can also be denoised using suitable wavelet methods [148], [180].
Arvinti and Costache [180] propose a robust and easy-to-implement algorithm that achieves high SNR and low RMSE and MSE for ECG signals. They also investigated different choices of mother wavelet and found that reverse biorthogonal wavelet 2.4 is the best mother wavelet for ECG signals. Reference [148] applies a WNN to ECG signal denoising and investigates the BER and RMSE performance of different wavelet functions used as activation functions. The authors concluded that WNN is a better alternative to the traditional DWT-based noise mitigation method and that db6 is more suitable for ECG signal denoising. In [94], the authors applied the perceptual wavelet packet transform to speech enhancement. Segmental SNR, Perceptual Evaluation of Speech Quality and Weighted Spectral Slope (WSS) are used to evaluate the efficiency of their method against some state-of-the-art speech enhancement methods. The simulation results show higher segmental SNR, higher output perceptual evaluation of speech quality, and lower WSS values than existing methods.
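The wavelet denoising schemes reviewed above share a common pattern: decompose, shrink the detail coefficients, reconstruct, and score the result with RMSE and SNR. A minimal sketch of that pattern is given below, assuming a one-level Haar DWT and soft thresholding with a hand-picked threshold; the cited works use other wavelets, more levels and data-driven thresholds. The test signal is deliberately contrived so that the noise falls entirely in the detail band, and all names are illustrative.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One-level orthonormal Haar DWT (len(x) must be even)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) / SQRT2)
        x.append((a - d) / SQRT2)
    return x

def soft_threshold(c, thr):
    """Shrink a coefficient toward zero by thr; zero it if |c| <= thr."""
    if abs(c) <= thr:
        return 0.0
    return math.copysign(abs(c) - thr, c)

def denoise(x, thr):
    """Threshold the detail coefficients and reconstruct."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, [soft_threshold(d, thr) for d in detail])

def rmse(ref, est):
    return math.sqrt(sum((r - e) ** 2 for r, e in zip(ref, est)) / len(ref))

def snr_db(ref, est):
    noise = sum((r - e) ** 2 for r, e in zip(ref, est))
    signal = sum(r * r for r in ref)
    return math.inf if noise == 0.0 else 10.0 * math.log10(signal / noise)

# Piecewise-constant clean signal plus alternating +/-0.1 "noise".
clean = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0]
noisy = [c + (0.1 if n % 2 == 0 else -0.1) for n, c in enumerate(clean)]
restored = denoise(noisy, thr=0.2)
```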

2) SIGNAL PROCESSING IN EMERGING FIELDS
Beyond traditional signal processing, the wavelet transform has expanded into many emerging applications, for example electrical signal processing in power systems, biomedical signal processing, IoT (Internet of Things) mobility prediction, and even quantum image processing. Reference [181] replaces the sampling in the DWT with compressed sensing and reconstructs the high-frequency characteristics of the voltage and current to estimate the equivalent series resistance (ESR) of an aluminium electrolytic capacitor. The cost of data sampling, transmission, and storage is reduced, and the method is suitable for various environments. In [182], Gao and his team introduced the empirical wavelet transform (EWT), which has superior time-frequency resolution. They compared it with other feature extraction methods such as WP and verified that EWT is more suitable for extracting high-impedance fault signals. As feature extraction measures, they considered permutation entropy, which denotes similarity; the cross-correlation coefficient, which denotes the ability to track the original signal; and the energy ratio, which denotes the energy loss.
Compared with the traditional empirical mode decomposition method, the three criteria improved by about 2%, 260% and 44%, respectively. Reference [183] uses the complex WT to detect the phase and duration of voltage sags accurately. A comparison with voltage sag detection using the real db4 wavelet verifies the effectiveness of combining the DQ transform method with the complex WT. The wavelet transform can also be applied to the texture feature analysis of microscope images [184]. The authors extracted detailed information from the wavelet decomposition coefficients and analyzed these features to evaluate the changes in artificially aged samples of power transformer winding insulation paper.
Features can also be extracted from biological signals through the wavelet transform. Reference [185] tracks the user's gait phase and identifies relevant biomechanical gait events. The DWT method adapts robustly to different walking speeds and reduces the RMS of the phase reset error by 64% and 21% in assistive mode and transparent mode, respectively. In [186], WP is used for feature extraction from electroencephalography (EEG) signals, and the recognition accuracy reached 68%. Zhang et al. [187] applied the DWT to retinal ganglion cell inner plexiform layer (GCIPL) topographic thickness maps to extract useful features and used three machine learning methods for further analysis. This improves on the ability of traditional thickness analysis to discriminate patients with multiple sclerosis (MS) and a history of optic neuritis (ON). Machine learning methods may therefore facilitate the diagnosis of MS and ON patients. Wang's team used an improved wavelet threshold method to denoise measured surface electromyography (sEMG) signals [188]. Compared with the traditional wavelet threshold denoising algorithm, it improves the SNR and RMS error for sEMG signal denoising by about 5%. Features are then extracted from the denoised sEMG signal and used as the input of a neural network algorithm to achieve accurate fatigue state recognition. In [189], researchers used ECG signals to predict sudden cardiac death with high accuracy: the DWT is used for signal preprocessing, the extracted features serve as the classifier's input, and the method achieves the highest accuracy among the compared studies.
Reference [190] applied DWT decomposition to construct an adaptive mobility sampling algorithm, which reduces the waste of computational resources in IoT network mobility prediction. In [191], the quantum wavelet transform is used to embed watermark information in a quantum image. The simulation results show that, for different images, the watermarked image is not significantly different from the original image. After watermarking, the image distortion is smaller than with the quantum image watermarking algorithm that uses a quantum Fourier transform. The authors of [192] focused on the vibration signals of rolling bearings and selected the Daubechies wavelet for a 3-level wavelet packet decomposition. The proposed method achieved higher classification accuracy than existing classifiers such as the SVM (support vector machine). Bărbulescu et al. modelled the signals produced by ultrasonic waves propagating in diesel [193]. Statistical verification shows that the method combined with the wavelet better describes the experimental data, so it can be used to predict or control the evolution of the cavitation process.
Reference [194] introduced orthogonal wavelet division multiplexing (OWDM) as a more flexible alternative to OFDM. It replaces the fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) blocks of the OFDM structure with the DWT and IDWT. Fig. 24 shows the block diagram of DWT-OFDM. In [195], the BER performance shows the superiority of DWT-OFDM over traditional FFT-OFDM in a hybrid powerline communication (PLC)-visible light communication-based system. Lokesh and his team [196] compared several wavelets in DWT-OFDM and showed that, owing to its characteristics, the biorthogonal wavelet transform provides a lower BER than the other wavelets. Sarowa et al. [197] made a more detailed comparison. They designed a mitigation technique and compared the wavelet-OFDM system based on this technique with traditional OFDM systems based on self-cancellation and maximum likelihood. In [198], a new wavelet-based multi-carrier modulation technique, namely filtered orthogonal wavelet division multiplexing, is proposed as an effective alternative to traditional OFDM for reducing PAPR. In this model, the system does not require a cyclic prefix and therefore exhibits higher bandwidth efficiency. Avcı and his team proposed a new asymmetrically clipped optical-OFDM method based on the lifting wavelet transform to restore spectral efficiency and improve the performance of the system [199]. To improve the spectral efficiency of multi-carrier modulation in sonar image transmission, reference [200] proposes a sparse non-OWDM scheme based on sparse representation. The results show that, compared with OFDM, the proposed scheme requires fewer frequency resources and has higher PSNR and lower PAPR.
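The substitution at the heart of DWT-OFDM, an IDWT at the transmitter in place of the IFFT and a DWT at the receiver in place of the FFT, can be sketched with a one-level Haar transform. This is a simplified illustration under strong assumptions: real BPSK symbols, a single decomposition level, an ideal noiseless channel, and no pulse shaping or equalization. The names `modulate` and `demodulate` are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_idwt(approx, detail):
    """Inverse one-level orthonormal Haar DWT."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def haar_dwt(signal):
    """One-level orthonormal Haar DWT (len(signal) must be even)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def modulate(symbols):
    """Transmitter: treat the symbols as wavelet coefficients and apply the IDWT."""
    half = len(symbols) // 2
    return haar_idwt(symbols[:half], symbols[half:])

def demodulate(received):
    """Receiver: apply the DWT and read the symbols back from the coefficients."""
    approx, detail = haar_dwt(received)
    return approx + detail

symbols = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]  # BPSK symbols
tx = modulate(symbols)   # time-domain block sent over the channel
rx = demodulate(tx)      # ideal channel: rx should equal symbols
```

Because the transform is orthonormal, the transmitted block has the same energy as the symbol block, and the symbols are recovered exactly over an ideal channel.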
The coherent optical OFDM (CO-OFDM) system has unique advantages in optical fibre transmission and utilization and can effectively solve the dispersion and interference problems generated in the system. Reference [201] combined DWT and CO-OFDM and reduced the disadvantages of CO-OFDM. The BER performance of DWT-CO-OFDM is better than that of CO-OFDM with QPSK (quadrature phase shift keying) and 16-QAM (quadrature amplitude modulation). Non-orthogonal multiple access (NOMA) is an emerging multiple access technology adopted by 5G. Bringing wavelet analysis to NOMA could achieve better results. The authors of [164], [202] both studied wavelet-transform-based downlink NOMA with pulse-shaped data. Baig's team [202] compared the noise variance and BER performance of FFT-NOMA and wavelet-NOMA; wavelet-NOMA outperforms FFT-NOMA in all simulation scenarios. In [164], Haar, Daubechies and Coiflet wavelets are applied in the NOMA system and compared with conventional FFT-NOMA. Both BER and PAPR performance showed that wavelet-NOMA is usually superior to traditional FFT-NOMA, and the Haar wavelet has the best PAPR performance. In [165], the wavelet-OFDM system is also applied to precoded NOMA, and its BER and PAPR performance is better than that of OFDM-based precoded NOMA.

B. IMAGE PROCESSING
Wavelets can be used in image processing. The DWT is used in a wide variety of pattern matching and recognition applications [203]. By creating a rational biorthogonal wavelet filter bank, it is possible to optimally extract features of different sizes [204]. The authors of [204] compared the proposed RWT with a biorthogonal wavelet using a standard wavelet filter bank and achieved higher image classification accuracy. Wavelet applications in this area involve feature extraction and texture analysis, where one challenge is to optimally extract features of different sizes. In [64], Liu et al. designed a scattering network based on the FrWT. They extended the traditional scattering network with fractional coefficients and improved image classification accuracy. Reference [205] focused on 2-D palmprint images. The investigation of palmprint images after two-level wavelet decomposition shows that the extracted feature values preserve the uniqueness of each palmprint image and can be used for palmprint image classification.
Furthermore, wavelets are helpful for denoising, enhancement and compression in image processing [107]. Reference [107] combines complex wavelet shrinkage and non-local filtering. Experimental results show that the proposed method effectively reduces speckle in Synthetic Aperture Radar images and ensures detail preservation in uniform areas. However, the proposed method is relatively time-consuming. Norbert Remenyi's team presents an image denoising procedure [104]. Compared with existing image denoising methods, it achieves a better denoising effect in terms of PSNR; the PSNR value is improved by about 5% compared with the Hidden Markov Model. Inspired by the powerful learning ability of the GAN and the structural information extraction ability of the wavelet transform, Su and his team [206] propose extracting structure and noise information through the wavelet transform and generating high-quality images through a GAN. Experimental results show that excellent performance is achieved and that noise can be effectively extracted while texture details are preserved.
Image enhancement here refers to reconstructing a higher-resolution image, and it is widely used to improve the resolution of satellite images. Many researchers have used wavelets to obtain higher-resolution images, for example with the discrete fractional wavelet transform and fractional fast Fourier transform, or by combining the level set method, biorthogonal CDF (Cohen-Daubechies-Feauveau) wavelets based on the lifting scheme, and the complex WT [27], [207]. In [207], Choudhury and Dahake compared traditional DWT and FrWT decomposition, with interpolation performed in the high-frequency bands in order to obtain super-resolution images. In terms of PSNR, MSE and structural similarity (SSIM), FrWT achieves better results. A medical image compression algorithm combining a geometric active contour model and the biorthogonal wavelet transform is proposed in [27]. This algorithm is superior to traditional methods for MRI images and provides better PSNR and mean-SSIM values.
Image compression is also an essential part of image processing, as it reduces data storage and bandwidth requirements. The use of wavelets in image compression is also a research hot spot. One trend focuses on different kinds of designed wavelets, such as Daubechies and biorthogonal wavelets fused with spatial-orientation tree wavelets [30], [208], [209]. In [30], theoretical analysis and numerical experiments are presented for a biorthogonal wavelet transform based on unified Catmull-Clark subdivision with shape control parameters. The proposed wavelet transform achieves a higher compression ratio and a more stable noise filtering effect than the most advanced lifting-based solutions. It improves the PSNR of the reconstructed model and reduces the time cost of encoding and decoding.
Reference [208] proposed a multimedia image compression method based on biorthogonal wavelet packets. The method includes establishing a linear-phase biorthogonal wavelet basis, selecting 3 or 4 levels for the wavelet decomposition and reconstruction stages, and combining these with an improved frequency band division. With PSNR as the quality index for the reconstructed image, the method achieved a better compression effect, an improvement of about 3%. Bharati and his team compared several Daubechies and biorthogonal wavelets at different decomposition levels, using PSNR, MSE and compression ratio to indicate the efficiency of the wavelet-based image compression method [209].
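The three quality measures used in the image compression studies above can be computed as in the following sketch. It uses the standard definitions of MSE and PSNR and assumes 8-bit images (peak value 255) stored as flat lists of pixel values; the compression ratio is taken as the original size divided by the compressed size. The function names and sample values are illustrative.

```python
import math

def mse(original, reconstructed):
    """Mean squared error between two equally sized pixel lists."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)

def psnr_db(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    return math.inf if err == 0.0 else 10.0 * math.log10(peak * peak / err)

def compression_ratio(original_bytes, compressed_bytes):
    """Size of the original data divided by the size of the compressed data."""
    return original_bytes / compressed_bytes

original = [0, 255, 128, 64]
reconstructed = [0, 255, 128, 60]   # one pixel off by 4 grey levels

err = mse(original, reconstructed)
quality = psnr_db(original, reconstructed)
ratio = compression_ratio(1000, 250)
```

Higher PSNR and lower MSE indicate a reconstruction closer to the original, so a compression method is usually judged by the PSNR it retains at a given compression ratio.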

C. OPTIMIZATION PROBLEM
There are many ways to solve optimization problems, but each has shortcomings. On the one hand, traditional mathematical optimization methods take the gradient descent direction as the search direction, so they can easily become trapped in a local minimum and fail to find the global optimum of a highly nonlinear problem [210]. On the other hand, the global optimum of some optimization problems often lies near the pole of the feasible region, where the derivative of the function is discontinuous, which makes traditional mathematical optimization methods invalid there.
Some non-traditional optimization methods developed since the 1970s are inspired by physical or biological phenomena. These methods include Genetic Algorithms, Simulated Annealing, Ant Colony Optimization, Tabu Search and Particle Swarm Optimization (PSO). Although they can in theory obtain the global optimum, the computation time needed to guarantee it is theoretically infinite, which is not conducive to practical applications. Wavelet theory is particularly suited to describing the singularities of functions, and many engineering optimization problems can be approximated by linear objective functions. Wavelet analysis then reduces the optimization of a function to a finite number of singular points in the feasible region, regardless of the optimization content and constraints [211]. Once the singularities are determined, the global optimum is also obtained.
In [212]-[216], researchers combined PSO and wavelet analysis and obtained improved optimization methods. The idea of the particle swarm algorithm originates from the study of the predation behaviour of bird flocks and fish schools [217]. It simulates a flock of birds flying in search of food, where cooperation between the birds lets the group reach the optimal goal. It is an optimization method based on swarm intelligence that searches for the global optimum by following the best value found so far. Compared with other modern optimization methods, PSO has few parameters to adjust, is simple to implement, and converges quickly. As with WNN, the combination of wavelet analysis and optimization methods falls into two directions. On the one hand, [212] and [216] propose new methods that merge PSO and wavelet analysis.
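The basic particle swarm update described above can be sketched as follows. This is a generic global-best PSO and not the algorithm of any cited reference; the function name `pso_minimize` and the parameter values (inertia weight 0.7, acceleration coefficients 1.5) are assumptions chosen for illustration.

```python
import numpy as np


def pso_minimize(objective, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over a box given as a list of (low, high) pairs."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    dim = lo.size

    # Random initial positions inside the box, zero initial velocities.
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros((n_particles, dim))

    # Each particle remembers its own best position (personal best).
    pbest_pos = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])

    # The swarm shares the best position found so far (global best).
    g = int(np.argmin(pbest_val))
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity: inertia + pull towards personal best + pull towards global best.
        vel = (inertia * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, lo, hi)  # keep particles inside the box

        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = vals[improved]

        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]

    return gbest_pos, float(gbest_val)
```

Only the inertia weight and the two acceleration coefficients need tuning, which reflects the small parameter count noted above. For example, `pso_minimize(lambda p: float(np.sum(p ** 2)), [(-5.0, 5.0)] * 2)` searches for the minimum of a two-dimensional sphere function.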
Zhang and Min [212] designed an improved PSO with a wavelet threshold and used a WNN with the Morlet wavelet as the activation function for classification. The improved PSO algorithm achieved higher classification accuracy, and different wavelet functions were applied for better noise filtering. The subjective visual quality, mean square error, peak signal-to-noise ratio, and structural similarity of the images after noise reduction are better than those of traditional noise filtering algorithms. When classifying the data set, the proposed method reduces both the number of features and the classification error rate: the maximum classification error rate is reduced by 21.543% with 12 fewer features, and the minimum classification error rate is reduced by 29.243% with 9 fewer features. In [216], an improved PSO scheme using the lifting wavelet transform is proposed for dynamic range enhancement in optical time-domain reflectometry. This scheme enables the design of custom lifting wavelet filters that increase the signal-to-noise ratio and thus improve the dynamic range.
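As a rough illustration of wavelet threshold denoising in general (not the specific algorithm of [212], whose threshold is tuned by PSO), the sketch below applies soft thresholding to the detail coefficients of a one-level Haar transform. The Haar wavelet, the single decomposition level and all function names are assumptions made to keep the example short.

```python
import numpy as np


def haar_dwt(x):
    """One-level orthonormal Haar transform of an even-length 1-D signal."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # low-frequency part
    detail = (even - odd) / np.sqrt(2.0)  # high-frequency part
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of `haar_dwt` (perfect reconstruction)."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2.0)
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out


def soft_threshold(coeffs, threshold):
    """Shrink coefficients towards zero by `threshold`; small ones become zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - threshold, 0.0)


def denoise(x, threshold):
    """Threshold only the detail coefficients, then reconstruct the signal."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, threshold))
```

In a scheme like the one described above, the threshold value would be the quantity an optimizer such as PSO searches over, with a quality index such as MSE or PSNR as the objective.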
On the other hand, the authors in [213]-[215] first process signals with wavelet analysis and then apply PSO or Enhanced-PSO for optimization. Reference [213] proposed a hybrid prediction model combining the wavelet transform, PSO and a support vector machine (SVM) for short-term power generation prediction of practical microgrid photovoltaic systems. The proposed model was compared with seven other prediction strategies and achieved higher prediction accuracy. In [214], Djaghloul and his team performed segmentation and tracking of deformable structures during intervention through an improved PSO scheme. The reconstructed 3D models are analyzed using wavelet-based methods to perform registration tasks. The system can thus track surgical instruments through updates of the colour model guided by prior anatomical knowledge. The researchers in [215] extracted seven wavelet features for Fusarium head blight detection based on continuous wavelet analysis of wheat spike hyperspectral reflectance. They constructed a Fusarium head blight detection model that takes wavelet features and traditional spectral features as inputs and combines them with the PSO-SVM algorithm. With wavelet features, the accuracy of the random forest (RF), backpropagation neural network, and PSO-SVM detection models is improved by 3.7%, 2.9%, and 8.3%, respectively.
As for other optimization algorithms, Yin et al. [218] proposed wavelet transform subspace-based optimization methods and investigated various wavelet functions for the minor part of the induced current in the inverse problem. The proposed method increased the resolution of a specific area and significantly accelerated the convergence of the algorithm. Temel and his team modified the Cat Swarm Optimization algorithm with the wavelet transform to find the sensor placement that covers a specified area in a 3D environment as effectively as possible [219]. Compared with the random deployment and the Delaunay triangulation based deployment approaches, their method needs the fewest sensors to cover 90% of the specified area. It has the best QoC (Quality Of Conformance) performance with 96 sensors.

VII. CHALLENGES AND RESEARCH GAP
Although wavelet analysis has achieved good results in many application fields, it still faces many problems.
1) Apart from the mature one-dimensional wavelet theory, the theory of high-dimensional wavelets is not well developed, and neither is multiwavelet theory. There is no general construction formula for high-dimensional wavelets or multiwavelets. In practical applications, the two-dimensional and high-dimensional wavelet bases currently used are separable, that is, they are constructed as tensor products of low-dimensional wavelet bases. However, analyzing signals with separable wavelet bases constructed from tensor products may lose the anisotropic properties of the signals. The design of the scaling factor of multiwavelets also needs further study according to actual applications.
2) Selecting the most suitable wavelet basis for a specific application or data source has always been a challenge in wavelet analysis, both in the wavelet transform and in WNN. Although the literature presented in this review includes research on optimal basis selection methods, a systematic approach to optimal wavelet basis selection and performance evaluation is still a significant research gap. The current selection of wavelet bases has the following problems:
- Since some desirable properties of wavelets, such as symmetry and orthogonality, are not easy to obtain at the same time, it is a huge challenge to select or design suitable wavelets for the various problems encountered in practice. At present, there are not many qualitative studies in this area.
- The RWT is more suitable for oscillating signals because of its finer frequency resolution. However, the choice of rational factors is worth studying. The construction of a rational wavelet should preferably take the characteristics of the analyzed signal into account. To make the rational wavelet transform better match the signal characteristics, the parameters must be adjusted to obtain rational wavelet bases with different time-frequency distribution characteristics. Adjusting the parameters changes both the frequency division and the time-domain oscillation properties of the wavelet function. How to adapt the wavelet basis to a given signal still needs to be studied.
- Most of the literature uses simple non-orthogonal wavelets, such as Mexican hat wavelets and Morlet wavelets, as activation functions because they are simple and easy to implement. Similar to multiwavelets, a new wavelet network can use multiple mother wavelets to select the best wavelet to the greatest extent. However, if a complex wavelet function is used, the computation time of the WNN increases significantly. Moreover, if the initial settings of the scale and translation parameters of the wavelet function are unreasonable, the entire network will have difficulty converging. For high-dimensional data processing, there is little research on WNN, which is due to the complicated structure of multi-dimensional wavelet theory. Therefore, the development of wavelet networks also depends on further research in wavelet theory.
3) Traditional signal processing has applied a variety of RWTs, and most of them have achieved better results than DWT. However, in emerging fields, DWT still accounts for the majority of studies. In the review of wavelet signal processing for power system voltage and current signals, biological signals and quantum fields, few applications use RWT. The application scope of RWT could be expanded. Rational wavelets are also rarely used when wavelets are applied in optimization algorithms. The range of optimization algorithms combined with wavelets is also relatively narrow, with most studies considering the PSO algorithm.
4) For WNN, the neural network structure is still under extensive research. Numerous studies in the literature use a WNN based on the Radial Basis Function neural network, which is only a three-layer neural network. Choosing the proper WNN structure for different problems is a great challenge in this field. The wavelet network only uses dilated and translated versions of a mother wavelet to construct the network. It is unrealistic to rely solely on a particular theory and technology. Therefore, attention should be paid to interdisciplinary research combining WNN with fuzzy, fractal and genetic algorithms. The wavelet kernel in the convolution layer is a developing research field. As with using the wavelet function as the activation function, selecting a suitable wavelet kernel is also a direction worthy of further study.
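To make concrete the role of the mother wavelet and of the scale and translation parameters discussed in this section, the following minimal sketch evaluates a single-input hidden layer of wavelet neurons with a Mexican hat or real-valued Morlet activation. It is not the architecture of any cited work; the function names, the unnormalized wavelet forms and the Morlet frequency parameter `omega0` are assumptions for illustration.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat wavelet, up to a normalization constant."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def morlet(t, omega0=1.75):
    """Real-valued Morlet-type wavelet; `omega0` is an illustrative choice."""
    return np.cos(omega0 * t) * np.exp(-0.5 * t ** 2)


def wavelet_layer(x, scales, translations, weights, wavelet=mexican_hat):
    """Output sum_j w_j * psi((x - b_j) / a_j) for each scalar input in `x`.

    scales (a_j), translations (b_j) and weights (w_j) hold one entry per
    hidden wavelet neuron; all three would be trained in a full WNN.
    """
    x = np.asarray(x, dtype=float)[:, None]   # shape (n_samples, 1)
    a = np.asarray(scales, dtype=float)       # shape (n_hidden,)
    b = np.asarray(translations, dtype=float)
    w = np.asarray(weights, dtype=float)
    if np.any(a <= 0):
        raise ValueError("scale parameters must be positive")
    z = (x - b) / a                           # dilate and translate the input
    return wavelet(z) @ w                     # shape (n_samples,)
```

Because both wavelets decay quickly away from zero, a neuron whose translation lies far from the data, or whose scale is too small, outputs almost nothing and receives almost no gradient. This is one way to see why poor initial scale and translation settings make the network hard to train.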

VIII. CONCLUSION
The key highlights and concluding points of this paper are summarized as follows:
1) Wavelet theory is briefly summarised, including the construction methods and properties of different wavelet bases. There are currently two main wavelet design methods: MRA-based schemes and lifting schemes. The second-generation wavelets constructed using the lifting scheme retain the multi-resolution characteristics of the first-generation wavelets. They offer fast computation and low memory consumption. For various practical problems, rational wavelets, high-dimensional wavelets and multiwavelets are worthy of further study.
2) Related algorithms using wavelet analysis were also discussed. For example, wavelet packets and wavelet transforms are constructed from filter banks. DWT is the most basic and most widely used wavelet transform. RWT can achieve finer frequency domain segmentation. It enhances the frequency domain localization of the signal and is a very powerful signal processing tool. RWT is more suitable for oscillating signals, and its application in Doppler analysis and radar or sonar detection is very promising. In terms of denoising, FrWT achieves better results than DWT: SNR can be increased by about 40%, and RMSE can be reduced by about 50% [124]. For in-depth analysis of signals, many special wavelet transforms, such as DT-CWT and RADWT, have also been designed to achieve better results or to adapt to special situations.
3) With the development of neural networks, the combination of neural networks and wavelet analysis has also flourished. The WNN, whose input signal is preprocessed by wavelet analysis, combines the advantages of artificial neural networks and wavelet analysis. After the signal is preprocessed by wavelet analysis, the WNN reduces the prediction error by about 50% in [220]. Some commonly used WNN structures, in which wavelet cells replace neurons, are summarised, and examples of WNN structures combined with RNN are given. Applying WNN to more complex structures or combining it with interdisciplinary algorithms can improve the performance of neural networks, as in [154], where performance increases by about 65%. The wavelet kernel in the convolution layer is a developing research field, and selecting a suitable wavelet kernel is a direction worthy of further study. In summary, WNN can avoid the blindness of traditional neural network design. It has stronger learning ability, higher accuracy, a simple structure and fast convergence. It is also the focus of future research.
4) Wavelet analysis has a wide range of applications in signal processing, and it has advantages over traditional methods in signal enhancement, denoising, compression and classification. In emerging fields such as power systems and biological signals, wavelet signal processing has also achieved better results than the traditional FFT. The RMSE after noise reduction and feature extraction is improved to varying degrees. The applications of wavelet analysis in image processing include image compression, classification and denoising, which operate on the low-frequency and high-frequency parts of the wavelet-decomposed images. In most cases, the wavelet transform achieves better performance in image processing, for example in compression ratio and denoising effect. The PSNR after noise filtering is improved by about 5% in [104]. Wavelet analysis is superior to traditional methods in reconstructed image quality at the same compression ratio: PSNR is about 3% higher than existing methods in [208]. Combining wavelets with optimization algorithms can often yield better optimization results. The combination of the optimization algorithm and the wavelet method reduces the number of features and the classification error rate.
5) The main challenges and research gaps in wavelet research have been discussed. When wavelets are applied to signal processing, the selection or design of the optimal wavelet basis needs to be studied. Multiwavelet and high-dimensional wavelet theories are still under development. Although RWT can flexibly adjust the time-frequency distribution characteristics, it increases the amount of computation and memory consumption. The application areas of RWT also need to be expanded beyond traditional signal processing. The wavelet basis function required by the hidden layer of a WNN is inconsistent with the wavelet basis selection criteria of signal processing, so more advanced wavelet theory needs to be introduced. WNN can be combined with interdisciplinary algorithms such as fuzzy, fractal and genetic algorithms to obtain broad application prospects.