Tutorial on Empirical Mode Decomposition: Basis Decomposition and Frequency Adaptive Graduation in Non-Stationary Time Series

This tutorial explores a class of non-parametric time series basis decomposition methods particularly suited to nonstationary time series, known as Empirical Mode Decomposition (EMD). In outlining a statistical perspective on the EMD method, it will be contrasted and combined (to the benefit of both) with other existing nonstationary basis decomposition methods. Such techniques include functional Independent Component Analysis (ICA); Empirical Fourier Decomposition (EFD), a nonstationary extension of the Short-Time Fourier Transform (STFT); the Empirical Wavelet Transform (EWT), a nonstationary extension of the Morlet Wavelet Transform (MWT); and Singular Spectrum Decomposition (SSD), a nonstationary extension and refinement of Singular Spectrum Analysis (SSA). A detailed review of this time series basis decomposition approach is presented that explores three core aspects for a statistical audience: 1) the representation and estimation of the basis functions (Intrinsic Mode Functions, or IMFs), including robustness and optimal spline representations with smoothing and knot placement; 2) the computational and numerical robustness of various aspects of the iterative algorithmic design for EMD basis extraction, including the careful treatment of boundary effects; and 3) a first attempt at a population-based characterisation of EMD that provides a novel stochastic embedding of the EMD method within a stochastic model framework. Furthermore, the basis representations considered will be connected to local frequency graduation smoothing methods, demonstrating that these can be adapted to a local frequency adaptive framework within the EMD context. This provides new practical insights into the interface between time series basis decomposition and graduation-smoothed representations.

intrinsic mode functions (IMFs), along with a single convexity signal that acts analogously to a trend, known as the tendency. In this method, a connected set of temporal-spectral characterisations is also obtained by converting each IMF basis representation, through the HHT, into an instantaneous frequency (IF) characterisation. The instantaneous frequency components characterise the physical meaning of local phase change in a general way that accommodates non-stationary processes more adequately than non-IMF attempts to capture local or ''instantaneous'' frequency; see [1] and [4]. Furthermore, the EMD decomposition is adaptive and based on the local characteristics of the data, which adds to its appeal in nonstationary and nonlinear time series analysis. However, to date, there has been little work exploring the statistical aspects of EMD regarding: (i) estimation; (ii) computational efficiency and robustness; (iii) a stochastic representation to complement the empirical algorithmic characterisation that exists for the EMD method; and (iv) comparisons and integration with other statistical time-series methods relevant to non-stationary or non-linear time series analysis. This tutorial article uses classical statistical models to make progress on these four components.
In the last two decades, the empirical successes of EMD have led to numerous extensions and refinements. However, little has been written in the statistics literature comparing and contrasting EMD with other time series decomposition methods, and the statistical treatment of the EMD technique has not yet reached the same level of coverage as its empirical successes, where it has been extensively tested and validated in a variety of real-world time series and signal processing applications; see [1] and [4]. For instance, EMD is widely used in geophysics, signal processing, and other areas of engineering and environmental science, yet it has received little attention in the statistical literature. One goal of this tutorial is to introduce an initial modern statistical perspective on EMD, to compare and contrast EMD with other existing time series basis decomposition methods, and in the process to present a treatment of the EMD methodology from the perspective of nonparametric regression and graduation methods applied for adaptive basis decomposition. A second aim of this tutorial is to demonstrate how to integrate EMD representations with existing statistical time-series approaches for basis decomposition, allowing those approaches to inherit many of the useful properties of the EMD framework.
As noted in [5], most time series decomposition data analysis methods are developed to study data assumed to be linear, stationary, or both. Furthermore, the presence of outliers in the data can significantly deteriorate the ability of many time-series decomposition methods to accurately represent structure present in the underlying time series. In many real-world applications, this produces a highly idealised set of modelling assumptions that do not adequately match the data characteristics. As such, the resultant observed time series are often modelled or decomposed into basis representations in ways that may not adequately characterise the true structure of the signal. This has consequences for the accuracy of the representation and the use of such approximations in statistical modelling applications. As will be discussed in this tutorial, one advantage of EMD is that it adapts to properties of the process and so can adequately handle nonstationary and nonlinear time series, as well as provide a robust decomposition framework even in the presence of outliers in the observed time series or heavy-tailed observation noise. Therefore, one contribution of this tutorial will be to explore the settings in which EMD can be combined with classical decomposition methods to enhance their value when applied to nonstationary signals or nonlinear settings.
In general, there are numerous branches of work in the statistics literature related to the extraction or decomposition of particular structures from a time series. Most of these methods output a fixed number of components. In more complex time series, this results in unrelated structures being grouped together and subsequent analysis losing its intended interpretability. For instance, a classical time series decomposition framework may consist of decomposing a signal into an additive or multiplicative decomposition comprised of components including:
• a temporal trend reflecting the long-term progression of the series;
• a cyclical component reflecting repeated but nonperiodic fluctuations;
• a seasonal component reflecting seasonal variation of constant periodicity; and
• a residual or ''noise'' component which attempts to describe the random, irregular influences.
There exists a wide variety of variations on these types of time series decomposition methods, such as the classical X11 decomposition method of [6]. The X11 time series decomposition method involves an iterative process, using appropriate moving averages to decompose a time series into trend-cycle, seasonal, and irregular components. The resulting X11 decomposition creates a trend-cycle for all observations and can accommodate a slowly changing seasonal component. However, one challenge with X11 is that it only handles monthly and quarterly data of a constant frequency, so it does not apply to all frequency resolutions that one may be interested in. This means that such a decomposition method is incapable of working with nonstationary processes in which the underlying process may be comprised of a time-varying frequency structure. Furthermore, the standard form of X11 is not robust to outliers; see discussions in [7] and [8].
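To ground these components, the moving-average logic underlying such classical additive decompositions can be sketched in a few lines. This is a simplified illustration, not the full iterative X11 procedure; the function name, the reflective edge padding, and the restriction to an even period with a 2×period average are our illustrative choices.

```python
import numpy as np

def classical_decompose(y, period):
    """Additive decomposition y = trend + seasonal + residual via moving
    averages; a minimal sketch of the X11-style logic (even period assumed)."""
    n = len(y)
    # centred 2 x (period) moving average estimates the trend-cycle
    pad = period // 2
    yp = np.pad(y, pad, mode="reflect")
    kernel = np.ones(period + 1)
    kernel[0] = kernel[-1] = 0.5          # half-weights at the ends
    kernel /= kernel.sum()
    trend = np.convolve(yp, kernel, mode="valid")[:n]
    # average the detrended series at each position within the cycle
    detrended = y - trend
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal -= seasonal.mean()           # identifiability: seasonal sums to ~0
    seasonal = np.tile(seasonal, n // period + 1)[:n]
    residual = y - trend - seasonal
    return trend, seasonal, residual
```

For a series with a linear trend and a period-12 seasonal component, the centred average annihilates the seasonal term exactly in the interior, so the recovered residual is small away from the boundaries.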
Extensions to X11 named X11ARIMA and X12ARIMA have been proposed in a range of works [9], [10], [11], and [12] that accommodate linear process time series models of the Autoregressive Integrated Moving Average (ARIMA) family. However, these are again only applicable to stationary and linear processes. One example of incorporating EMD with well-established existing time series methods will be where we explore extensions of this type, combining EMD with X11-type decomposition methods in this tutorial to obtain X11EMD. This will be a practically useful approximation approach that allows one to apply the simple classical X11 in a non-stationary setting with reasonable behaviour.

VOLUME 11, 2023 94443 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Many time series decomposition methods can also be significantly influenced by the approaches adopted to treat edge or boundary conditions at the start and end of the window of time over which the decomposition is performed. For instance, X11 and its extensions rely heavily on Henderson symmetric weights by [13], [14], and [15] to isolate the trend-cycle component of the time series. This fails at the boundaries of the time series, where there is insufficient data to estimate the trend-cycle without, often restrictive, assumptions. Classical Henderson asymmetric weights were defined in [16] and [17], with Reproducing Kernel Hilbert Space derivations of the asymmetric weights proposed by [18], [19], and [20] in an attempt to better differentiate the components at the edges of the time series. Since early work such as [21], there has been a growing literature on boundary treatment approaches to resolve edge effects in time series decompositions; those relevant to EMD will be explored here, as they are often ignored by the practitioner community that applies EMD, and yet, when incorporated, they can greatly benefit the accuracy of the extraction of the IMF basis functions. The edges of the time series also cause extensive complications in EMD due to the iterative nature of the algorithm. In this tutorial, coverage will be provided of recent advances that resolve these issues in the EMD context but that have not, to date, been adopted in the popular software packages used to perform EMD time series decompositions. Furthermore, the methods that will be discussed also assist in resolving such issues when EMD is combined with classical decomposition methods.
Another popular classical time series decomposition method is the Seasonal-Trend decomposition procedure based on Loess (termed STL); see [22]. The STL method has certain advantages over the X11 approach in that it can handle any type of seasonality, and there is greater flexibility in the rate of change of the seasonal component that can be studied. Furthermore, it can readily be made robust to outliers; but, like X11, it is limited in that it may only extract a single seasonal component under strict a priori assumptions about the frequency of that component. This makes the method not readily applicable to non-stationary settings. Following the theme of this tutorial, the EMD decomposition method will be combined with STL to show how EMD can facilitate generalisations of this decomposition method, leveraging the properties intrinsic to EMD that allow it to work well for non-stationary time series, and yielding an approximation with reasonable behaviour when adapting this widely used classical technique to nonstationary settings.
Unlike these aforementioned classical methods, the EMD method differs in that the individual component bases are not interpretable in the same manner as in the classical extraction methods described above. Imagine an ideal, purely stationary time series setting in which the signal is comprised simply of a finite set of deterministic components which are constant pure harmonics. In this ideal setting, EMD would produce a finite set of IMF bases that capture exactly the sine and cosine bases that one would otherwise have achieved via a local Fourier expansion of the signal at any time instant. This is a useful starting point for thinking about the type of representation that will be characterised by the EMD decomposition in an ideal setting. EMD accommodates the generalisation of this concept to time-varying non-stationary signals with a finite basis set by ordering the EMD components not directly by period but rather by counts of convexity change in the decompositions over time, something to be made more explicit in the following sections. Using this alternative specification of the decomposition basis has numerous advantages that will be outlined below.
One of the advantages of EMD is that it extends beyond this idealised case to accommodate non-stationarity and nonlinearity. As such, in more general time series settings the finite number of IMF bases extracted via EMD characterise local quasi-oscillatory trends that vary over time according to the degree of non-stationarity locally present in that particular interval of time in the signal. The IMF bases extracted by the EMD method are then no longer simple cosine and sine bases; in fact, we will come to see that EMD is not prescriptive about the parametric form of the bases, instead defining the necessary and sufficient properties such IMF bases must satisfy. In general, one may think of the EMD method as producing a finite, often small, collection of multi-frequency-resolution graduation-smoothed trends, termed IMFs, which are adapted to variation in the local time scale of oscillation/frequency variation of the underlying signal observed at a particular instant of time. The EMD method provides a mathematically precise statement of the properties that each IMF basis should satisfy, but not a parametric specification of the functional form, leaving this open to a variety of non-parametric characterizations that we will discuss in the context of multi-resolution regularized trend filtering or graduation. This is where we believe a statistical treatment of EMD methods should begin: in characterising the suitable properties of the IMF basis functions that epitomise the EMD decomposition, and in explaining that, whilst the requirements of EMD are not constructive, one can develop non-parametric models that will accommodate the requirements of the EMD basis functions. This tutorial will also outline the specification of these IMF basis functions formally according to a set of mathematical characteristic properties.
In presenting the characterisations in this manner it will allow for the further exploration of statistical models that can be integrated within an EMD decomposition.
To connect this with more recent statistical work, one can see that this ties in well with the recent resurgence in the study of time series trend extraction through methods related to regularized trend filtering; see [23], [24], [25], and [26]. These methods are modern extensions and reinventions of classical smoothing or graduation methods pioneered in the ground-breaking time series characterization or decomposition methods introduced in [13], [14], [15], and [27], and later in widely discussed economic graduation and smoothing trend filters such as the Hodrick-Prescott (HP) filter introduced in [28]. Such classes of filters often form components of the X11 method developed in [6]. In dynamic on-line settings, one can also consider latent trend extraction methods such as the Kalman filter ([29]). Such trend extraction filters can be combined with variations of the above time series signal decomposition methods and also within the EMD formulations, as will be outlined in this tutorial. This is one area in which this tutorial is expected to benefit those seeking to utilise the EMD methodology, as the literature on EMD is largely written for a non-statistical and non-econometric audience, and hence considerations such as graduation or smoothing of the IMF estimation or basis representations are often not considered or discussed.
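To make the graduation idea concrete, the HP filter admits a short closed-form implementation: the trend minimises a least-squares fit penalised by squared second differences, so it solves a ridge-type linear system. The following is a minimal numpy sketch; the function name and dense solve are ours (banded or sparse solvers would be preferred in practice), and λ = 1600 is the conventional quarterly-data default.

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott trend: argmin_x ||y - x||^2 + lam * ||D2 x||^2,
    where D2 is the second-difference operator. Dense solve for clarity."""
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2) x n second-difference matrix
    trend = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return trend
```

A perfectly linear series passes through unchanged (its second differences are zero), while superimposed high-frequency oscillation is heavily attenuated in the extracted trend.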
In [30], Kalman filters, already introduced in [29], and Butterworth filters, introduced in [31], are used to explicitly bandpass-filter higher-frequency economic data in a frequency-orientated graduation method. Furthermore, a wide class of spectral analysis time-frequency methods that apply Butterworth and bandwidth filters designed for trend extraction are based on variations of Fourier analysis. However, since Fourier spectra only provide a meaningful interpretation for linear and stationary processes, their application to data from non-linear and non-stationary processes is challenging. EMD may be viewed as a multi-level implicit bandpass filter with numerous frequency spectra isolated via implicit graduation. A thorough review of explicit graduation techniques and methods can be found in [32], with emphasis on the matrix formulation of non-stationary signal extraction in [33]. For a high-level review of numerous time series regression, forecasting, and decomposition techniques, see [34].
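The implicit bandpass view can be illustrated crudely without any filter design at all: subtracting a wide moving average from a narrow one passes an intermediate frequency band. This is a deliberately simple stand-in for a designed Butterworth bandpass, not EMD itself; the function names and window lengths are illustrative.

```python
import numpy as np

def moving_average(x, w):
    # centred moving average (w odd), reflecting the series at its edges
    pad = w // 2
    xp = np.pad(x, pad, mode="reflect")
    return np.convolve(xp, np.ones(w) / w, mode="valid")

def band_component(x, w_narrow, w_wide):
    # difference of two smoothers passes an intermediate frequency band
    return moving_average(x, w_narrow) - moving_average(x, w_wide)
```

On a superposition of a 0.5 Hz and a 5 Hz oscillation sampled at 1 kHz, a 21-point minus 401-point average retains the 5 Hz component while suppressing the slow one.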
Yet another recent time series decomposition technique that shows promise is Singular Spectrum Decomposition (SSD), developed in [35] from Singular Spectrum Analysis. This technique uses autocorrelation structures and a Singular Value Decomposition of the resulting autocorrelation matrix to isolate specific frequency structures, not unlike Principal Component Analysis. In this tutorial, this method will also be reviewed, showing once again the advantages one may obtain when combining such a method jointly within an EMD decomposition framework.
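The SSA machinery that SSD refines is compact enough to sketch: embed the series in a trajectory (Hankel) matrix of lagged copies, take an SVD, truncate to the leading components, and reconstruct by diagonal averaging. The window length and rank below are illustrative choices, not the adaptive selections SSD makes.

```python
import numpy as np

def ssa_reconstruct(x, window, rank):
    """Basic Singular Spectrum Analysis: embed, decompose, reconstruct."""
    n = len(x)
    k = n - window + 1
    # trajectory (Hankel) matrix of lagged copies of the series
    X = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # keep the leading 'rank' components, then average anti-diagonals
    Xr = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(k):
        recon[i:i + window] += Xr[:, i]
        counts[i:i + window] += 1
    return recon / counts
```

A noiseless sinusoid has a rank-2 trajectory matrix, so a rank-2 reconstruction recovers it essentially exactly; this is the sense in which the SVD isolates specific frequency structures.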
Finally, polynomial chaos expansions, or Wiener chaos expansions, are yet another approach to functional basis decomposition; see [36]. In this approach, Hermite polynomials are used to model stochastic processes, with Gaussian random variables modelling the probabilistic uncertainty in the underlying generating process parameters. These methods often require an infinite basis expansion, and since the primary focus of this tutorial is methods that produce finite basis outputs, they will not be of focus in the remainder of this tutorial.
Rather, in this tutorial, we will extend the discussion of EMD basis decomposition methods, which are computed in the literature via an iterative decomposition algorithm called sifting. The sifting procedure, followed by the Hilbert transform performed on the extracted IMF basis components, is referred to collectively as the Hilbert-Huang Transform (HHT); in almost all the cases studied, the HHT gives results that are much sharper than the majority of the traditional analysis methods mentioned above. Additionally, it has demonstrated unprecedented prowess in revealing hidden physical meaning in data that may otherwise be lost through graduation and spectral mixing.
The sifting process, which forms a critical part of the EMD algorithm, has seen many additions and improvements. The two conditions an intrinsic mode function (IMF) must satisfy are often challenging to satisfy exactly for a given representative model one may use to characterise the IMFs, which can lead to a phenomenon known as over-sifting, which in turn can remove meaningful structures from the data. To remedy this, [1] introduced a stopping criterion in addition to the two core conditions that the IMF representations must satisfy. Numerous other stopping conditions have been proposed, six of which are listed and reviewed in [37].
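A minimal sketch of sifting with the standard-deviation stopping criterion may clarify the mechanics. We use linear-interpolation envelopes in place of the usual cubic splines as a simplification, and the function name, tolerance, and iteration cap are our illustrative choices.

```python
import numpy as np

def sift_imf(x, t, sd_tol=0.3, max_iter=20):
    """Extract one IMF candidate by repeated local-mean removal (sifting).
    Envelopes here are linear interpolants through the extrema; stopping
    uses the SD criterion comparing successive sifting iterates."""
    h = x.copy()
    for _ in range(max_iter):
        # interior local maxima and minima
        mx = np.where((h[1:-1] > h[:-2]) & (h[1:-1] > h[2:]))[0] + 1
        mn = np.where((h[1:-1] < h[:-2]) & (h[1:-1] < h[2:]))[0] + 1
        if len(mx) < 2 or len(mn) < 2:
            break  # too few extrema: h is a residual/tendency, not an IMF
        upper = np.interp(t, t[mx], h[mx])
        lower = np.interp(t, t[mn], h[mn])
        mean_env = 0.5 * (upper + lower)
        h_new = h - mean_env
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
        h = h_new
        if sd < sd_tol:
            break
    return h
```

Applied to a fast oscillation riding on a slow trend, the first sifted component recovers the oscillation, while the local mean absorbs the trend.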
Local mean estimation, or more accurately detrended fluctuation analysis, has received numerous contributions. The original extrema interpolation technique presented in [1] is used in this work, but other interesting methods exist, such as the inflection point interpolation technique proposed in [38] and the binomial averaging technique proposed in [39], in which the groundwork was laid for a cubic B-spline EMD method, itself following on from [40]. One aspect of this tutorial is to highlight the opportunities that remain, particularly for a statistical audience, for the careful statistical study of the convergence of many of these techniques.
One will notice in several figures in this tutorial slight deviations of the structures from the underlying trends. This is, unfortunately, a result of a ubiquitous edge-fitting problem. Various remedies have been proposed, such as numerous variations of the symmetric technique in [41], [42], [43], and [44]; the anti-symmetric technique in [43]; different versions of the characteristic wave technique in [1] and [45]; different slope-based approaches in [44] and [46]; the averaging method as in [47]; and a whole family of machine-learning extrema prediction methods, one of which is the single-neuron neural network technique in [48].
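Of these, the symmetric (mirror) technique is the simplest to state: the series is reflected about each endpoint so that the envelope interpolants have extrema beyond the observed window to anchor to. A minimal sketch follows; the function name and pad length are ours.

```python
import numpy as np

def mirror_extend(x, pad):
    """Symmetric (mirror) boundary extension: reflect the series about its
    first and last observations before extrema detection, so envelopes need
    not extrapolate beyond the observed window."""
    left = x[1:pad + 1][::-1]     # reflect about the first observation
    right = x[-pad - 1:-1][::-1]  # reflect about the last observation
    return np.concatenate([left, x, right])
```

After extrema detection and envelope fitting on the extended series, the envelopes are simply cropped back to the original support.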
The core message this tutorial will convey relates to the significant improvement in the representation of the time-frequency resolution of a signal by the EMD basis decomposition approach when compared against traditional techniques such as the short-time Fourier transform (STFT), the Morlet wavelet transform (MWT), and independent component analysis (ICA). Of the potential algorithmic extensions listed above, this tutorial primarily focuses on a baseline characterisation of IMFs via a cubic B-spline EMD with a standard deviation stopping criterion in the sifting routine.

A. TUTORIAL CONTRIBUTIONS AND STRUCTURE
Section II begins by formally introducing the two defining conditions of an IMF, before Section II-A gives a brief overview of the Hilbert transform, instantaneous frequency, and basis representation. The Hilbert transform of these various bases is developed explicitly for B-splines. In Section II-C1, various smoothing options for cubic B-splines, and their properties, are discussed before a computationally efficient IMF stopping criterion along with easily translatable pseudo-code are presented in Section II-D.
In Section III, a novel initial attempt to develop a stochastic embedding of the EMD empirical method into a stochastic model formulation is developed. This involves first developing statistical objectives that such a stochastic embedding should satisfy, followed by developing a connection between the IMF basis function representations widely used in applications of EMD and the proposed stochastic model embedding. Finally, this section is completed with the specification of the EMD stochastic model based on a multi-kernel Gaussian Process formulation that achieves the properties proposed for such a stochastic representation of EMD and is consistent with the IMF representations often adopted in practical applications. This is by no means claimed to be a unique stochastic embedding; it acts as a first stochastic model characterisation that the statistical community can build upon when exploring the EMD methodology.
Section V briefly introduces Serial Bisection knot optimisation, ICA, the Minimum Description Length Principle (MDLP), and X11 in Section V-A, Section V-B, Section V-C, and Section V-D, respectively. These techniques are briefly discussed with some accompanying figures and equations for clarity, whilst citing the defining works on the topics for further reference. Serial Bisection is introduced in [49], the FastICA algorithm in [50], the MDLP in [51], and X11 in [6]. These techniques will not be the focus of this paper, but EMD will be used in conjunction with these methods in Section VII-A1, Section VII-A2, Section VII-B1, and Section VII-B2 to meaningfully isolate different frequency structures.
Section VI provides a brief discussion of the problems facing classical time series analysis techniques, such as the time-frequency trade-off and non-stationary structures obscuring analysis, which are summarised in Figure 5 and Figure 6, respectively. In Section VI-B, the Fourier transform and its variation, the STFT, are briefly discussed along with their shortcomings, before the MWT, its improvements upon the STFT, and its associated shortcomings are discussed in Section VI-C.
In Section VII, common examples of time series are shown, with each time series exhibiting a particular undesirable property that traditional techniques struggle to capture. Frequency modulation and the shortcomings of other traditional techniques are the focus of Section VII-A1. Gaussian Processes are the focus of Section VII-A2, with a periodic compound exponentiated sine squared kernel being used to generate Gaussian Process realisations. These realisations are then used in conjunction with ICA, MDLP, and EMD to infer valuable frequency information from the structures.
In Section VII-B1, a well-known, seemingly simple, data set exhibiting transient properties is shown to pose challenges to STFT and MWT, whereas EMD proceeds to accurately capture the underlying structure. The resolution of EMD is shown to be improved by X11 and vice versa. Finally, EMD and ICA are used in conjunction on real-world globally distributed financial indices to extract implicit market factors related to the work presented in [52] and [53]. This work is concluded with some closing remarks in Section VIII.

II. EMPIRICAL MODE DECOMPOSITION
This tutorial provides new perspectives that complement and extend the work in [1] and the B-spline extension proposed in [39] on the EMD of univariate time series. The characteristically flexible basis conditions are described below.
Condition 1: the number of extrema (maxima and minima) and the number of zero crossings differ by at most one; and

Condition 2: at all points, the local mean of the envelope, defined by a spline through the maxima and a spline through the minima, must be zero.

To state this more formally, some notation needs to be introduced. Let $|\{\cdot\}|$ denote set cardinality, $\gamma_k(t)$ be IMF $k$, $\gamma_k^M(t)$ be the spline through the maxima of IMF $k$, $\gamma_k^m(t)$ be the spline through the minima of IMF $k$, and $\gamma_k^\mu(t) = \tfrac{1}{2}\big(\gamma_k^M(t) + \gamma_k^m(t)\big)$ be the mean of the extrema envelope, with the time series defined for $t \in [0, T]$. Condition 1 and Condition 2 can then be formally stated as follows.

Condition 1: $\Big|\, \big|\{t : \tfrac{d}{dt}\gamma_k(t) = 0\}\big| - \big|\{t : \gamma_k(t) = 0\}\big| \,\Big| \le 1$;

Condition 2: $\gamma_k^\mu(t) = 0$ for all $t \in [0, T]$.
The above definitions follow those first proposed in [54], which formally set out the characteristics of an IMF basis in the EMD decomposition approach. One should note that Condition 2 is not easily satisfied numerically in practice, and that if this is not properly managed, it can result in nonsensical IMFs and IFs. Therefore, from a computational perspective, Condition 2 may be relaxed to produce an approximation with precision controlled by the degree of relaxation $\epsilon$:

Modified Condition 2: $\sup_{t \in [0,T]} \big|\gamma_k^\mu(t)\big| \le \epsilon$.

One can note the non-restrictive characterising nature of the IMF representation. This description, while mathematically precise, is not constructive, in the sense that it does not prescribe the form of the statistical model or basis family to use in representing the IMFs; it allows the IMFs to be constructed using any number of appropriate statistical models, basis representations, or functions. This will be explored in detail in this tutorial, and numerous extensions to practitioner approaches that are more statistically rigorous and computationally robust will be discussed.
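These defining conditions are straightforward to check numerically for a candidate IMF. The sketch below uses linear interpolation through the extrema as a simplification of the spline envelopes, and its `eps` argument plays the role of the relaxation above; the function name is ours.

```python
import numpy as np

def check_imf_conditions(x, t, eps=0.1):
    """Numerically check Condition 1 and the relaxed Condition 2 for a
    candidate IMF sampled on t (linear-interpolation envelopes)."""
    mx = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    mn = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    n_extrema = len(mx) + len(mn)
    # count strict sign changes as zero crossings
    n_zero = int(np.sum(np.sign(x[:-1]) * np.sign(x[1:]) < 0))
    cond1 = abs(n_extrema - n_zero) <= 1
    upper = np.interp(t, t[mx], x[mx])
    lower = np.interp(t, t[mn], x[mn])
    cond2 = np.max(np.abs(0.5 * (upper + lower))) <= eps
    return cond1, cond2
```

A pure sinusoid passes both checks, whereas the same sinusoid with a constant offset fails the (relaxed) zero-mean envelope condition, illustrating why a raw signal must be sifted before it qualifies as an IMF.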

A. BASIS REPRESENTATION AND HILBERT TRANSFORM
One of the important features of any basis decomposition method that satisfies the above EMD characteristic properties is that it will also admit a closed-form Hilbert transform. It is precisely this fact that has seen EMD obtain widespread practitioner use, since it indicates that one may then define simultaneously the time-domain signal decomposition as well as a time-evolving frequency-domain decomposition based on the Instantaneous Frequency (IF) decompositions of the IMFs via the Hilbert-Huang Transform.
The Hilbert transform of a univariate IMF, $\gamma_k(t)$, is developed in order to arrive at the analytic signal extension of the IMF, which is then transformed into an instantaneous frequency time- and frequency-domain decomposition. Since there will be one IF signal per EMD basis IMF, one ends up with a corresponding basis decomposition in the frequency domain over time that complements the time-domain basis decomposition obtained from the IMFs via the EMD decomposition. This is a very practically useful outcome for many applications. The details of this important aspect of the EMD framework are outlined as follows.
The analytic extension of the IMF $\gamma_k(t)$ is obtained via the Hilbert transform:

$\tilde{\gamma}_k(t) = \mathrm{HT}\big[\gamma_k\big](t) = \frac{1}{\pi}\,\mathrm{PV}\!\int_{-\infty}^{\infty} \frac{\gamma_k(\tau)}{t - \tau}\, d\tau, \qquad (2)$

where $\mathrm{HT}[\cdot]$ is the Hilbert transform, PV denotes that the Cauchy principal value of the integral is taken, and $\tilde{\gamma}_k(t)$ is the resulting Hilbert transform of the $k$-th IMF. The principal value assigns values to this improper integral, and it can be approximated using the Discrete-Time Hilbert Transform (DTHT). One then obtains the analytic extension of the IMF signal, denoted $\gamma_k^a(t)$, as

$\gamma_k^a(t) = \gamma_k(t) + i\,\tilde{\gamma}_k(t) = a_k(t)\, e^{i \theta_k(t)}, \qquad (3)$

with $a_k(t)$ representing the instantaneous amplitude, formally defined as

$a_k(t) = \sqrt{\gamma_k(t)^2 + \tilde{\gamma}_k(t)^2}, \qquad (4)$

and instantaneous phase

$\theta_k(t) = \arctan\!\big(\tilde{\gamma}_k(t) / \gamma_k(t)\big). \qquad (5)$

Having obtained the analytic extension, one then considers the rate of change of the phase of $\gamma_k^a(t)$ to obtain the IF, denoted $\omega_k(t)$, for the $k$-th IMF:

$\omega_k(t) = \frac{d \theta_k(t)}{dt}. \qquad (6)$

Finally, one may define the Hilbert spectrum for IMF $k$ as the instantaneous amplitude $a_k(t)$ localised along the instantaneous frequency curve, with $s$ discretising the instantaneous frequency spectrum and $t \in \{t_0, \ldots, t_N\}$ as before. This allows for more accurate frequency resolution compared to other techniques, as well as the separability of the IMF bases into ordered frequency structures. The instantaneous amplitude (instantaneous energy or instantaneous envelope in [1]), $a_k(t)$, of a time series is well understood, but the existence of instantaneous frequency is disputed in the literature, such as in [55]. In [56], the problem of isolating very short-term frequencies is addressed using short-time Fourier transforms, and in [57] the instantaneous frequency is only acknowledged to exist for 'monocomponent' signals, which are poorly defined therein.
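The FFT-based discrete-time Hilbert transform offers a practical route to the analytic signal and instantaneous frequency just described: zero the negative frequencies, double the positive ones, and invert. A minimal numpy sketch (function names are ours; this follows the standard one-sided spectrum convention):

```python
import numpy as np

def analytic_signal(x):
    """FFT-based discrete-time Hilbert transform: keep DC and Nyquist,
    double positive frequencies, zero negative ones."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    h[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        h[n // 2] = 1.0
    return np.fft.ifft(X * h)

def instantaneous_frequency(x, dt):
    # amplitude and phase of the analytic signal; IF is the phase derivative
    z = analytic_signal(x)
    amp = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.gradient(phase) / (2.0 * np.pi * dt)  # in Hz
    return amp, freq
```

For a pure tone aligned with the sampling grid, the recovered amplitude is constant and the instantaneous frequency is flat at the tone's frequency, away from the first and last few samples.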
The reluctance of many authors in the past to acknowledge or formalise the description of instantaneous frequency is two-fold. Firstly, authors have held a Fourier world-view in which, without at least one complete wave cycle, any attempt to define a frequency is seen as axiomatically flawed and nonsensical; this becomes very problematic when discussing non-stationary time series, as reaching consensus on the 'true' wavelength is difficult. Secondly, the difficulty of reliably calculating the instantaneous frequency has hindered its study; with the Hilbert-Huang transform, however, the instantaneous frequency can be reliably defined and calculated.
There are two well-known DTHT methods: the Fast-Fourier Transform (FFT) DTHT which takes advantage of the relationship between the Fourier transform and the Hilbert transform, and what is referred to in the literature as the basic DTHT first presented in [58]. While these discrete finite difference approximations of the Hilbert transform are reasonably accurate, some functional forms have closed-form Hilbert transform solutions. Piece-wise polynomial splines defined using basis functions over a knot sequence to approximate a time series are desirable as the coefficients from the basis fitting can be used to calculate the Hilbert transform directly using the Hilbert transform of the spline basis functions. There are many spline basis choices available of varying complexity and flexibility.
The mean squared error (MSE) optimality of cubic splines with known knot points is shown rather elegantly in [59], as well as more generally in [60] and [61]. One can construct cubic splines in numerous ways, with two common approaches used for EMD being cubic B-splines, as in [39], and cubic Hermite splines, as in [62]. Other cubic spline bases, such as the cubic truncated power series splines in [63], can also be used. Cubic B-splines are preferred here over cubic Hermite splines due to cubic B-splines having second-order continuity. Cubic B-splines are also preferred here over cubic truncated power series splines due to the compact support of cubic B-splines. The non-compact support of cubic truncated power series splines results in high correlations between the bases and numerical instability in estimation of the coefficients.
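Since cubic B-splines play a central role here, the basis itself can be evaluated directly via the Cox-de Boor recursion. A minimal sketch follows; the uniform knot vector in the usage example is an illustrative choice, and a tuned spline library would be used in practice.

```python
import numpy as np

def bspline_basis(i, p, knots, t):
    """Cox-de Boor recursion: value of the i-th degree-p B-spline basis at t."""
    if p == 0:
        return np.where((knots[i] <= t) & (t < knots[i + 1]), 1.0, 0.0)
    out = np.zeros_like(t, dtype=float)
    den_l = knots[i + p] - knots[i]
    den_r = knots[i + p + 1] - knots[i + 1]
    if den_l > 0:
        out += (t - knots[i]) / den_l * bspline_basis(i, p - 1, knots, t)
    if den_r > 0:
        out += (knots[i + p + 1] - t) / den_r * bspline_basis(i + 1, p - 1, knots, t)
    return out
```

Each cubic basis (p = 3) is supported only on [knots[i], knots[i+4]] and is twice continuously differentiable at the knots, which is exactly the compact support and second-order continuity cited above; on a uniform knot vector the bases also form a partition of unity in the interior of the knot span.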

1) HEURISTIC INTERPRETATION OF INSTANTANEOUS FREQUENCY
The instantaneous energy (or envelope) of a function is well-understood and has an accepted physical interpretation, whereas the instantaneous frequency of a function is neither well-understood nor does it have an accepted physical interpretation. Authors and practitioners in the past have divided the community, either denying the existence of an instantaneous frequency ([55]) or accepting the existence of an instantaneous frequency only for certain (non-formalised) 'mono-component' functions ([64], [65], and [66]); these were the prototypical IMFs.
There are two reasons for the hesitancy behind the formal acknowledgement of an instantaneous frequency. The first is the Fourier frequency world-view, in which frequency is only measured over an integer number of full wavelengths, id est a frequency is indeterminable for trivially short (T < λ) sinusoidal functions. The second is the difficulty of calculating the instantaneous frequency.
The Hilbert transform in Equation (2) uniquely defines the complex-conjugate pair so that an analytic function may be defined as in Equation (3). For work focusing on the physical interpretation of an instantaneous frequency, see [67]. For functions to be considered, they would need to be 'monocomponent', as Equation (6) is only a univariate function of time. The definition of 'monocomponent' was never agreed upon, but it only makes sense when defined over a narrow bandwidth (see [68]). Bandwidth (and therefore narrow bandwidth) can itself be defined in a number of ways.
If the function is assumed to be Gaussian and stationary, the bandwidth can be defined in terms of spectral moments, such that the expected number of zero-crossings per unit time is defined as

N_0 = (1/π) (m_2 / m_0)^{1/2},

with the expected number of extrema per unit time being

N_e = (1/π) (m_4 / m_2)^{1/2},

with m_i being the i-th spectral moment. With υ being defined as

υ = (N_e^2 − N_0^2)^{1/2},

it becomes a bandwidth measure. A narrow bandwidth function (υ = 0) is one in which the difference between the expected number of extrema and the expected number of zero-crossings equals zero. In [66], a similar conclusion is reached in that a narrow bandwidth is only achievable if a(t) and θ(t) vary gradually. Both of these definitions are restrictive as they are global definitions. To define a meaningful instantaneous frequency, restrictive conditions need to be imposed on the function, such as in [69] and [70]. For a meaningful instantaneous frequency to exist, the real part of the Fourier transform of the function must only have positive frequencies; this is proven in [71]. This, however, is still a global definition.
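The spectral-moment bandwidth can be sketched numerically from a power spectral density. Conventions for the 1/π factors vary by author, so the constants below should be read as illustrative rather than canonical:

```python
import numpy as np

def bandwidth_measure(freqs, psd):
    """Spectral-moment bandwidth for a stationary Gaussian process.

    Rice-type formulas: N_0 and N_e are expected zero-crossing and
    extrema rates; nu**2 = N_e**2 - N_0**2 is the bandwidth measure."""
    df = freqs[1] - freqs[0]
    m0, m2, m4 = (np.sum(psd * freqs ** i) * df for i in (0, 2, 4))
    N0 = np.sqrt(m2 / m0) / np.pi   # expected zero-crossings per unit time
    Ne = np.sqrt(m4 / m2) / np.pi   # expected extrema per unit time
    nu = np.sqrt(Ne ** 2 - N0 ** 2)
    return N0, Ne, nu

# Narrow-band example: PSD concentrated near f = 10
freqs = np.arange(0.01, 50, 0.01)
psd = np.exp(-0.5 * ((freqs - 10) / 0.1) ** 2)
N0, Ne, nu = bandwidth_measure(freqs, psd)  # nu is small relative to N0
```

For this narrow-band spectrum the extrema and zero-crossing rates nearly coincide, so υ is close to zero, as the text describes.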
As will be elaborated upon further in Section II-D, there exist very few, if any, theoretical results proving the convergence, uniqueness, et cetera, of the algorithm and the resulting IMFs. As a result of Equation (3), every frequency- and amplitude-modulated sinusoid is necessarily an IMF, with the reverse not necessarily being true, as IMFs can be discontinuous whilst still satisfying Condition 1 and Condition 2.
The lack of uniqueness of the IMFs satisfying Condition 1 and Condition 2 motivated the development of stopping criteria which, while assisting in the computational optimisation of the algorithm, also render the resulting IMFs well-defined. In [72], several of these stopping criteria are explored in detail, without formalising the resulting uniqueness.
With γ_k(t) = sin(t), it can be shown that

HT[sin(t)] = −cos(t),

from which it follows that

z(t) = sin(t) − i cos(t) = e^{i(t − π/2)}.

The constant phase offset is not important (the same instantaneous frequency results independent of the phase), but from this it follows that

ω(t) = dθ(t)/dt = 1.
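This closed-form result can be checked numerically: the analytic signal of a pure sinusoid yields a constant instantaneous frequency. A sketch using `scipy.signal.hilbert`, with an arbitrary 1.5 Hz example frequency:

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 1.5 * t)            # example sinusoid at 1.5 Hz

z = hilbert(x)                              # analytic signal x + i HT[x]
phase = np.unwrap(np.angle(z))              # theta(t)
inst_freq = np.diff(phase) * fs / (2 * np.pi)  # d theta / dt, in Hz

# Away from the boundaries the instantaneous frequency is constant
core = inst_freq[500:-500]
```

Boundary samples are discarded because the discrete Hilbert transform suffers edge effects, a point treated carefully later in the tutorial.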

B. CANDIDATE IMF STATISTICAL MODEL REPRESENTATIONS IN EMD DECOMPOSITIONS AND THEIR HILBERT TRANSFORMS
In this section, we will define closed-form expressions for the Hilbert transform of key characterising basis representations often used to construct parametric representations of IMFs. This includes the piece-wise cubic Hermite spline, the cubic truncated power series bases, the Fourier bases, and the cubic B-splines; the relative merits of these spline bases were discussed above.

1) PIECE-WISE CUBIC HERMITE SPLINE
With knot sequence τ = {τ_0, . . . , τ_N}, y(τ_k) and m(τ_k) being the position scalar and tangent scalar at knot τ_k respectively, and y(τ_{k+1}) and m(τ_{k+1}) being the position scalar and tangent scalar at knot τ_{k+1} respectively, the piece-wise cubic Hermite spline is defined over the interval [τ_k, τ_{k+1}]. The Hilbert transform of these bases, with w = τ_k − τ_{k+1}, is obtained in closed form; see further details in [54] and [73], and Appendix A.

2) CUBIC TRUNCATED POWER SERIES BASIS
The necessity of including the w parameter in Equation (17) and Equation (16) will become clear when calculating the Hilbert transforms of the cubic truncated power series bases, as the knot sequences are less obvious given the non-compact domain. The cubic truncated power series basis defined over the same knot sequence τ, with τ_N = T, consists of the global polynomial terms

{1, t, t^2, t^3}

together with the truncated cubic terms

{(t − τ_k)^3_+ : k = 1, . . . , N − 1},

where (·)_+ denotes the positive part. One can then obtain the Hilbert transforms of these bases in closed form.

3) FOURIER BASIS
If one were to characterise the IMFs via a finite Fourier series to approximate a time series that displays cyclical structures, then one would have a representation

x(t) ≈ a_0/2 + Σ_{k=1}^{K} [ a_k cos(2πkt/T) + b_k sin(2πkt/T) ],

which yields the corresponding Hilbert transform

HT[x](t) ≈ Σ_{k=1}^{K} [ a_k sin(2πkt/T) − b_k cos(2πkt/T) ].

A Fourier series basis representation would be advantageous given the simple description, closed-form Hilbert transform, and the extensive literature on these bases. However, should the time series of interest be non-linear, non-stationary, and with unknown step-points, the Fourier representation would be wholly unsuitable due to the rigidity and non-compact support of the bases.
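By linearity of the Hilbert transform, the transform of a finite Fourier series follows term by term from HT[cos] = sin and HT[sin] = −cos. A sketch checking this against the FFT-based transform (coefficients here are arbitrary illustrative values):

```python
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(0)
T = 1.0
t = np.linspace(0, T, 2048, endpoint=False)
K = 5
a, b = rng.normal(size=K), rng.normal(size=K)

# Finite Fourier series and its term-by-term Hilbert transform
x = sum(a[k] * np.cos(2 * np.pi * (k + 1) * t / T)
        + b[k] * np.sin(2 * np.pi * (k + 1) * t / T) for k in range(K))
xh = sum(a[k] * np.sin(2 * np.pi * (k + 1) * t / T)
         - b[k] * np.cos(2 * np.pi * (k + 1) * t / T) for k in range(K))

# Agreement with the FFT-based discrete Hilbert transform
err = np.max(np.abs(np.imag(hilbert(x)) - xh))
```

Because the series is periodic on the sampling grid, the two computations agree to machine precision.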

4) CUBIC B-SPLINES
A recommended basis for a wide array of EMD applications is the cubic B-spline used to represent the IMFs; this choice will be the default in applications in this tutorial. Cubic B-splines are reviewed in [40] and used to approximate EMD bases in [39]. Lemma 1 and Lemma 2 below are proved in [39] and [40], with Equation (27) being proven in Appendix B. For this class of splines a recursive relationship exists, described in Lemma 1.

Lemma 1: With B_{j,k,τ}(t) being the j-th basis function, of order k, for knot sequence τ, and defined over 0 ≤ t ≤ T, with B_{j,1,τ}(t) being defined as

B_{j,1,τ}(t) = 1 if τ_j ≤ t < τ_{j+1}, and 0 otherwise,

the higher-order basis functions are defined as

B_{j,k,τ}(t) = ((t − τ_j)/(τ_{j+k−1} − τ_j)) B_{j,k−1,τ}(t) + ((τ_{j+k} − t)/(τ_{j+k} − τ_{j+1})) B_{j+1,k−1,τ}(t).

The closed-form Hilbert transform may also be defined iteratively, as in Lemma 2.
Lemma 2: With HT[B_{j,k,τ}(t)] being the j-th Hilbert transform basis function, of order k, for knot sequence τ, and defined over 0 ≤ t ≤ T, with HT[B_{j,1,τ}(t)] available in closed form, the higher-order Hilbert transform basis functions are defined iteratively as in Equation (28).

With this notation, a time series can be approximated by a piece-wise cubic B-spline

s̃(t) = Σ_j c_j B_{j,4,τ}(t),

and, by linearity of the Hilbert transform, the corresponding transform (computationally efficient to construct owing to the iterative formula in Equation (28)) is

HT[s̃](t) = Σ_j c_j HT[B_{j,4,τ}(t)].
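The recursion in Lemma 1 is the standard Cox-de Boor relation, which can be sketched directly and checked against `scipy.interpolate.BSpline` (the indexing convention below, order k for degree k − 1, is ours):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(j, k, tau, t):
    """Cox-de Boor recursion: B_{j,k,tau}(t), order k (degree k - 1)."""
    if k == 1:
        return np.where((tau[j] <= t) & (t < tau[j + 1]), 1.0, 0.0)
    out = np.zeros_like(t)
    if tau[j + k - 1] > tau[j]:
        out = out + ((t - tau[j]) / (tau[j + k - 1] - tau[j])
                     * bspline_basis(j, k - 1, tau, t))
    if tau[j + k] > tau[j + 1]:
        out = out + ((tau[j + k] - t) / (tau[j + k] - tau[j + 1])
                     * bspline_basis(j + 1, k - 1, tau, t))
    return out

tau = np.arange(8.0)                        # uniform knot sequence
t = np.linspace(0, 4, 401, endpoint=False)  # support of B_{0,4} is [tau_0, tau_4)
ours = bspline_basis(0, 4, tau, t)          # cubic B-spline: order k = 4
ref = np.nan_to_num(BSpline.basis_element(tau[:5], extrapolate=False)(t))
err = np.max(np.abs(ours - ref))
```

The compact support visible here (each cubic basis spans only four knot intervals) is precisely the property that makes B-spline coefficient estimation numerically stable.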

C. CUBIC B-SPLINE SMOOTHING
Consider a cubic B-spline representation of the IMFs; often it may be beneficial to consider the addition of smoothing to regulate the degree of oscillation, see [74]. Let s ∈ R^{N+1} be the time series defined over [0, T].
For matrix B in Equation (31) and hereafter, the knot sequence subscript will be dropped as the sequence is implied. One may then state the smoothing objective function, as in [74]. To prevent over-fitting, one should impose a penalty on the curvature of the spline, with weight λ. For B-splines, this can be done in two ways, which are shown in Section II-C1 and Section II-C2.

1) SMOOTHING SPLINES
Consider the second derivatives of the cubic B-spline basis matrix B. As introduced in [74] and [75], the spline is smoothed using the integrated squared second derivative, via the penalty matrix

Ω_{j,k} = ∫ B_j''(t) B_k''(t) dt,

and so, with this continuous penalty term, Equation (33) becomes

Q(c) = (s − Bc)^T (s − Bc) + λ c^T Ω c.

As stated in [74], any order penalty is acceptable, but the most common is a second-order penalty.

2) P-SPLINES
The difficulty associated with smoothing splines lies in the calculation of Equation (35), with each element of the matrix being the integral of the product of second-order differentiated cubic B-splines. B-splines have a simple iterative relationship for derivatives, which can be found in [39] and [40], but this is still computationally less efficient than using P-splines (penalised B-splines), which impose the penalty directly on the differences between the coefficients. As in [74] and [76], the discrete smoothing is performed using the second-order difference between coefficients,

(D_2 c)_j = c_j − 2 c_{j+1} + c_{j+2},

so that Equation (33), with a discrete penalty term, becomes

Q(c) = (s − Bc)^T (s − Bc) + λ c^T D_2^T D_2 c.
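The discrete-penalty objective admits the closed-form solution ĉ = (BᵀB + λD₂ᵀD₂)⁻¹Bᵀs. A minimal sketch of P-spline smoothing, with knot count and λ chosen arbitrarily for illustration:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(t, s, n_inner=20, lam=0.1, degree=3):
    """P-spline: cubic B-spline basis with a discrete second-order
    difference penalty on the coefficients, solved in closed form."""
    # Clamped knot sequence over the observation window
    inner = np.linspace(t.min(), t.max(), n_inner)
    tau = np.r_[[t.min()] * degree, inner, [t.max()] * degree]
    n_basis = len(tau) - degree - 1
    # Design matrix B: column j evaluates the j-th B-spline basis
    B = np.empty((len(t), n_basis))
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        B[:, j] = BSpline(tau, c, degree, extrapolate=False)(t)
    B = np.nan_to_num(B)
    D2 = np.diff(np.eye(n_basis), n=2, axis=0)  # second-order differences
    # Penalised least squares: (B'B + lam D2'D2) c = B's
    coef = np.linalg.solve(B.T @ B + lam * D2.T @ D2, B.T @ s)
    return B @ coef

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 200)
s = np.sin(2 * np.pi * t) + 0.1 * rng.normal(size=200)
fit = pspline_fit(t, s)
```

Note the penalty requires no integration at all, which is the computational advantage over the smoothing-spline penalty matrix of Section II-C1.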

D. EMD IMF BASIS EXTRACTION AND ESTIMATION VIA SIFTING
The sifting procedure is a recursive algorithm that iteratively extracts the IMFs in a sequence of nested residual-minimising transforms producing representations consistent with the characterising properties of an IMF under the EMD decomposition, as outlined in Condition 1 and Condition 2. The procedure is based on a sequence of envelope-based defluctuations that recursively refine the current best estimate of the current stage's IMF basis being extracted from the EMD decomposition. The number of steps and number of IMFs, whilst finite, is a priori unknown. Therefore, it is often important in practice to provide algorithmic termination conditions for this sifting procedure at different stages of IMF basis extraction, to ensure that numerical and representational approximations do not propagate residual errors (associated with estimating representations of IMFs consistent with Conditions 1 and 2) that could cause algorithmic divergence as the EMD decomposition proceeds recursively through subsequent IMFs. In this regard, stopping criteria are introduced. A critical evaluation of EMD stopping criteria is presented in [37]. The most common stopping criterion in the literature is the method originally proposed in [1], referred to as the Standard Deviation Stopping Criterion; this method is used in this paper. With x(t) being the time series, m̃(t) being the spline through the envelope mean points m(t_j), and h_{(p,q)}(t) being the q-th iterative candidate for IMF p, the Standard Deviation Stopping Criterion is defined as

SD_q = Σ_t [ |h_{(p,q−1)}(t) − h_{(p,q)}(t)|^2 / h_{(p,q−1)}(t)^2 ] ≤ ϵ,

for some ϵ, with a value of 0.2 − 0.3 recommended in [1], though this is very much dependent upon the length and oscillation/volatility of the time series. The sifting procedure proceeds as outlined in Algorithm 1.
In Algorithm 1, |{·}| is the counting set, h̄_{(p,q)}(t) is the calculated mean of the q-th candidate for the p-th IMF, and r_p(t) is the p-th remainder, with the time series x(t) being both the initial (or 0-th) candidate for IMF 1, x(t) = h_{(1,0)}(t), and the first potential trend or remainder r_0(t). The final output of the sifting algorithm will be an IMF representation, denoted x_IMF(t), of the original signal x(t), expressed as a finite sum of, generically, K IMFs denoted by γ_1(t), . . . , γ_K(t), and a remainder tendency component, denoted r_K(t), constituting a form of trend with at most one convexity change. As mentioned, convergence results do not exist for EMD, and uniqueness results do not exist for the resulting IMFs, with or without the additional stopping criteria. The most relevant result is the proof in [71] that an instantaneous frequency exists if the real part of the Fourier transform of a function is positive. As a result of the empirical and non-constructive nature of the algorithm, it can very easily be integrated with other algorithms, as will be demonstrated in the next section. The theoretical results underpinning the empirical nature of the algorithm remain an open problem in EMD; resolving them would surely assist in the statistical uptake of the algorithm, its algorithmic variations, and the algorithmic chimeras presented herein, along with the many possible others which this paper and these authors highly encourage.
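A minimal sketch of one sifting pass, using natural cubic spline envelopes and a sum-normalised variant of the Standard Deviation Stopping Criterion (Algorithm 1 includes further conditions and boundary treatments not reproduced here; all tuning values below are illustrative):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift(x, t, eps=0.25, max_iter=50):
    """Extract one IMF candidate by envelope-mean defluctuation.
    Stops on a sum-normalised variant of the SD criterion."""
    h = x.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema to build envelopes
        upper = CubicSpline(t[maxima], h[maxima], bc_type="natural")(t)
        lower = CubicSpline(t[minima], h[minima], bc_type="natural")(t)
        m = 0.5 * (upper + lower)     # local mean of the envelopes
        h_new = h - m                 # defluctuation step
        sd = np.sum((h - h_new) ** 2) / np.sum(h ** 2)
        h = h_new
        if sd < eps:
            break
    return h

t = np.linspace(0, 1, 1000)
x = np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 3 * t)
imf1 = sift(x, t)  # approximates the fast component away from the boundaries
```

Subsequent IMFs would be obtained by sifting the remainder x(t) − imf1, mirroring the recursive structure of Algorithm 1; the boundary distortion visible near t = 0 and t = T is exactly the edge-effect issue treated elsewhere in this tutorial.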

E. OPEN STATISTICAL PROBLEMS IN EMD
Very few, if any, formal convergence and uniqueness results exist for EMD, whose name is well chosen owing to the relatively large amount of empirical (rather than theoretical) evidence supporting the usage of the technique in a wide range of fields. One result which is urgently required is the representation of an arbitrary time series by a finite number of amplitude- and frequency-modulated sinusoids, as opposed to an infinite number of simple sinusoids as shown by Fourier in [77]. In addition to this result, one requires proofs of infinite convergence given Condition 1 and Condition 2, of finite convergence given Condition 1 and Modified Condition 2, and of uniqueness of convergence given Condition 1 and Modified Condition 2.
The empirical successes of EMD have been near-exhaustively shown in a wide range of fields such as signal processing, as in [78] and [79]; bio-medical data analysis, as in [62] and [80]; financial data decomposition and forecasting, as in [81] and [82]; and voice and speech analysis, as in [54], [83], and [84]. In [54], it is shown that EMD exceeds Mel Frequency Cepstral Coefficients (MFCC), the state-of-the-art in voice recognition technology. Empirical successes of EMD have been further shown in mechanical fault and failure detection, as in [85] and [86], with the papers originally proposing EMD, [1] and [2], demonstrating the superiority of the method in fields such as hydrology, seismology, anemology, and unanswered questions such as turbulent flow in fluid dynamics. Many papers also assess the different available algorithmic variations, such as [37], or propose new variations to address specific problems, such as [87].

III. EMBEDDING THE EMPIRICAL EMD METHOD WITHIN A STOCHASTIC MODEL
As shown in previous sections, working with cubic splines for the representation of the EMD method is advantageous from many perspectives. Firstly, it is suitable for representing the interpolated signal s̃(t) from the observed time series in an optimal fashion based on minimising mean squared error. Secondly, it allows one to perform the sifting procedure readily when representing the envelope functions, and results in a collection of IMF basis function representations that are also cubic splines. Thirdly, the analytic extension via the Hilbert-Huang transform, used to obtain the instantaneous frequency, admits closed-form solutions for the representations of the IFs {ω_l}_{l=1}^{L}, which are also characterised readily by cubic splines. Lastly, and most importantly, when moving from the pathwise EMD basis extraction for one realised time series trajectory to a stochastic process embedding representation, the representation of IMFs via cubic splines allows one to utilise the established connection between Gaussian processes and B-splines to motivate working with Gaussian process stochastic embeddings.

A. EMD STOCHASTIC EMBEDDING OBJECTIVES
In developing the stochastic embedding of the EMD, we will distinguish between the deterministic (realised) or empirical EMD decomposition for a given signal trajectory, satisfying at any time t ∈ [0, T] the property

s(t) = Σ_{l=1}^{L} γ_l(t) + r(t),

with each IMF γ_l(t) satisfying the mathematical characterisation given in Section II; and the stochastic process embedding of the EMD representation, denoted at any time t ∈ [0, T] by the random variables (upper case for random variables) S(t), {Γ_l(t)}_{l=1}^{L}, and R(t). The challenge with developing a stochastic embedding for the EMD method is that it will be required to satisfy a few core features: 1) sample paths of the embedded EMD stochastic process should be consistent with the basis functions for the IMFs obtained from the empirical, sample-based characteristics that represent the classical EMD method as set up in Condition 1 and Modified Condition 2; 2) since the EMD method satisfies the above decomposition for each realised sample time-series trajectory s(t), one would naturally require such a property to be inherited at the population stochastic process level such that

S(t) = Σ_{l=1}^{L} Γ_l(t) + R(t),

where one can denote the stochastic process R(t) by Γ_{L+1}(t) to reduce notational burden. Ideally the representations of the process S(t) and the IMF stochastic processes {Γ_l(t)}_{l=1}^{L} would satisfy: a) stochastic processes used to model S(t) and the IMF processes {Γ_l(t)}_{l=1}^{L+1} have known finite-dimensional distributions and are from a family of known stochastic process models which are easily parameterised and characterised. This family of models for distributions at time t will be developed. b) Stochastic processes used to model S(t) and the IMF processes {Γ_l(t)}_{l=1}^{L+1} would also ideally be easily calibrated to realised EMD sample-based decompositions via standard estimation methods like maximum likelihood estimation, with closed-form expressions for the likelihood of the model for the stochastic embedding.
c) IMF stochastic processes {Γ_l(t)}_{l=1}^{L} are of the same family of stochastic process model as that which represents the signal stochastic process S(t). In other words, if for each time t one has that the random variable S(t) ∼ F(s; θ_S), with θ_S denoting the parameters of the model that index the family member from F, and furthermore f_{Γ_1,...,Γ_{L+1}} is the joint distribution of the IMF random variables and tendency at time t, then it also holds that for each t ∈ [0, T] and l ∈ {1, . . . , L + 1} the distribution of the IMF random variable is also a member of this family of distribution models, Γ_l(t) ∼ F(s; θ_l), indexed by parameter vectors θ_l. d) Another desirable property for the stochastic embedding representation of EMD would be to have the conditional distributions also be members of the same family of distributions as S(t), such that for each t ∈ [0, T] and any combination of J ≤ L + 1 indexes, denoted by subset K ⊆ {1, . . . , L + 1}, the conditional random variable given {Γ_l(t)}_{l∈K} remains in the family F. Note: in the case one assumes an independence model approximation for the joint distribution of the IMF random variables and tendency at each time t, the EMD method decomposition implies that the stochastic representation of the IMFs is closed under convolution. This means that at each time t, the random variable for the signal S(t) ∼ F(s; θ_S) and the random variables for the IMFs Γ_l(t) ∼ F(s; θ_l) are such that the distribution of their sum remains in the family F.

IV. DEVELOPING A STOCHASTIC EMBEDDING OF EMD
In this section, two approaches will be developed for the stochastic embedding of the EMD method, each consistent with the empirical EMD decomposition whilst concurrently satisfying the properties set out for such a stochastic representation of EMD in Section III-A. To this end, two different system models will be developed, each based on versions of multi-kernel Gaussian process models with specially selected kernel structures. The reference baseline or benchmark model, against which these two novel system models for the EMD stochastic representation will be compared, will be a Gaussian process fit directly to the original signal s(t).
Gaussian Processes (GPs) are a highly expressive family of stochastic models widely adopted in machine learning, see [88]. Formally, a Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution, entirely described by its mean and kernel covariance function as detailed in Definition 1. The positive definite covariance function, often referred to as the kernel, determines the class of functions from which such processes' sample paths take support.
Definition 1 (Gaussian Process (GP)): Denote by f(x) : X → R a stochastic process parametrised by state-space X. Then f is a Gaussian Process if all finite-dimensional distributions are Gaussian, where for any n ∈ N the random vector (f(x_1), f(x_2), . . . , f(x_n)) is jointly normally distributed. We can therefore interpret a GP as formally defining the following class of random functions:

f(·) ∼ GP(m(·), k(·, ·)),

with mean function m and covariance kernel k. The properties of the functions, id est smoothness, periodicity, et cetera, are determined by the sufficient statistic given by the covariance kernel function.
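Definition 1 can be illustrated by sampling paths of a zero-mean GP: any finite set of inputs yields a jointly Gaussian vector whose covariance is the kernel evaluated pairwise. A sketch with a squared-exponential kernel (the lengthscale and variance are arbitrary illustrative values):

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance kernel k(x, x')."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Any finite set of inputs yields a jointly Gaussian random vector
x = np.linspace(0, 1, 200)
K = rbf_kernel(x, x) + 1e-9 * np.eye(200)   # jitter for numerical stability
rng = np.random.default_rng(42)
paths = rng.multivariate_normal(np.zeros(200), K, size=3)  # three sample paths
```

Changing the kernel (for instance, to a periodic kernel) changes the function class the sample paths occupy, which is the sense in which the kernel acts as the sufficient statistic described above.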
Before introducing these GP models, we will motivate theoretically why the class of GP models is suitable for a stochastic embedding that will be shown to be both meaningful for regularised spline representations of IMFs as well as suitable to satisfy the properties outlined for such a stochastic embedding of EMD discussed in Section III-A.

A. SPLINE REPRESENTATIONS OF AN IMF AND REPRODUCING KERNEL HILBERT SPACES
In order to make explicit the connection between using spline models to represent the pathwise empirical EMD decomposition of s(t) and the stochastic embedding via a multi-kernel Gaussian process, we will recall briefly known connections between splines and Gaussian Processes (GPs). Splines may be viewed as limits of interpolations related to stationary Gaussian processes. Hence, we will explore further this connection as follows.
Consider seeking to recover the l-th unknown IMF function γ_l(t) for t ∈ [0, T] based on the current sifting defluctuation step data s̃_l(t) := s̃(t) − Σ_{i=1}^{l−1} γ_i(t) at time points t_1, . . . , t_N, denoted here generically by observations y_i := s̃_l(t_i). That is, one has data {t_i, y_i} ∈ T × R and seeks the function representation for the l-th IMF γ_l(t) : T → R that minimises the objective given generically in Equation 54, for instance the familiar penalised residual sum-of-squares (PRSS), where L is a loss function, λ ≥ 0 is the regularisation strength, and J is a functional imposing smoothness on the IMF representation γ_l. One can connect the regularised spline solution to GPs by considering Reproducing Kernel Hilbert Spaces (RKHS), which provide the unifying framework motivating the GP stochastic embedding model; see details in [89] and more recent works in [61], [90], and [91]. A Hilbert space H is an inner-product space which is complete in the metric induced by its norm. For every Hilbert space of functions on a set T, one may define for each t ∈ T the evaluation functional f ↦ f(t). If every evaluation functional on the Hilbert space is bounded, then one obtains a Reproducing Kernel Hilbert Space (RKHS). Note L_2 is not an RKHS since the Dirac-delta function is not in L_2. In an RKHS, the Riesz representation theorem states that one may find, for each t, a representer k_t ∈ H such that f(t) = ⟨f, k_t⟩ for all f ∈ H. One can then define a function known as the kernel, k : T × T → R, via k(s, t) = ⟨k_s, k_t⟩. This function is unique to a given RKHS H, has the properties of symmetry and nonnegative definiteness, and satisfies the reproducing property ⟨k(·, s), k(·, t)⟩ = k(s, t).
To understand why the RKHS and reproducing kernel k are introduced, consider the space of all finite linear combinations of the functions {k(·, s) | s ∈ T}, with the inner product given by ⟨k_s, k_t⟩ = k(s, t) along with linearity. It is then the case that k is a kernel for this space with the property, according to the Representer Theorem, that solutions to the regularised empirical risk given in Equation 54 take the form

γ_l(t) = Σ_{i=1}^{N} α_i k(t, t_i),

for α_i ∈ R for all i ∈ {1, . . . , N}. The conditions under which such a representer theorem exists are studied in [92]. Given these results, one may then link the estimation problem for representing each IMF to the case of polynomial smoothing splines, used to represent the IMF basis functions under the EMD method proposed. To see this, consider, without loss of generality, T = [0, 1] and penalty function

J(γ_l) = ∫_0^1 (γ_l^{(m)}(t))^2 dt,

which acts to penalise irregularity and induce smoothness in the spline representation of the IMF basis. One can then construct an RKHS whose norm corresponds to this smoothing penalty J. Hence, the kernel needs to be made explicit.
Using Taylor's Theorem in one dimension with an integral remainder term, the IMF function γ_l, which is assumed to have at least m − 1 absolutely continuous derivatives in [0, 1], can be expressed as

γ_l(t) = Σ_{ν=0}^{m−1} (t^ν / ν!) γ_l^{(ν)}(0) + ∫_0^1 ((t − u)_+^{m−1} / (m − 1)!) γ_l^{(m)}(u) du,

where (·)_+ is the positive part only and zero otherwise. If functions with this series representation whose first m − 1 derivatives are 0 at t = 0 are denoted by W_m^0, then for γ_l ∈ W_m^0 one has the inner product

⟨f, g⟩ = ∫_0^1 f^{(m)}(u) g^{(m)}(u) du.

Now observe that one can obtain an RKHS from W_m^0 with this inner product and kernel

k_1(t, s) = ∫_0^1 ((t − u)_+^{m−1} (s − u)_+^{m−1} / ((m − 1)!)^2) du.

As shown in [89], the space W_m of functions with m − 1 absolutely continuous derivatives and m-th derivative can be written as a direct sum H = H_0 ⊕ W_m^0, with kernel k = k_1 + k_0. Furthermore, J(γ_l) will be the square norm of the projection Pγ_l of γ_l onto W_m^0, so the PRSS estimation objective in Equation 54 becomes

Σ_{i=1}^{N} L(y_i, γ_l(t_i)) + λ ∥Pγ_l∥^2,

for γ_l ∈ H. By the Representer Theorem, the solution takes a generalised form comprised of two parts: an unpenalised component from H_0 and a linear combination of the projections onto W_m^0 of the representers of evaluation at the N time points t_1, . . . , t_N. For the squared error loss L(y_i, γ_l(t_i)) = (y_i − γ_l(t_i))^2, the solution corresponds to the natural polynomial spline; see discussion in [93] and [94].
Hence, one is able to motivate the spline representation of the IMF as the solution to a generalised estimation problem in an RKHS-regularised function space. In the following section, the connection between RKHS theory and the Gaussian process embedding will be made.

B. RELATING SPLINE REPRESENTATIONS OF AN IMF AND A GAUSSIAN PROCESSES STOCHASTIC EMBEDDING
The function Γ_l(t) is treated as a random function modelled by a GP, and the mathematical connection between the spline representation of the pathwise EMD decomposition of an IMF and the stochastic embedding developed in this work via GP models will now be made explicit.
For Gaussian process prediction with likelihoods that involve the observed values of the IMF, γ_l, at the N training points extracted by the EMD method's sifting algorithm, the empirical loss L(y_i, γ_l(t_i)) can be expressed as the negative log-likelihood. The analogue of the Representer Theorem, as detailed in [88], is then given as follows.
The predictive distribution of Γ_l(t_*) at test point t_*, given observations y_1, . . . , y_N, is

p(γ_l(t_*) | y_1, . . . , y_N) = ∫ p(γ_l(t_*) | γ_l(t_1), . . . , γ_l(t_N)) p(γ_l(t_1), . . . , γ_l(t_N) | y_1, . . . , y_N) dγ_l(t_1) · · · dγ_l(t_N),

which in the GP case is expressed in terms of the GP covariance kernel k. One then obtains the regularised solution to Equation 54 from a GP perspective by setting the loss function to the negative log-likelihood of the Gaussian noise model, in which σ_N^2 is the noise variance. The solution for the estimated IMF using this regularised estimation produces γ̂_l = argmin_{γ_l} Q(γ_l) which, if one substitutes γ_l(t) = Σ_{i=1}^{N} α_i k(t, t_i) and uses the RKHS property ⟨k(·, t_i), k(·, t_j)⟩_H = k(t_i, t_j), can be re-expressed as an estimation objective explicitly in terms of the GP model:

Q(α) = (1 / 2σ_N^2) ∥y − Kα∥^2 + (1/2) α^T K α.

Rewriting the objective in this manner expresses it as a parameter optimisation problem in terms of the coefficient vector α; this is the advantage of knowing that a Representer Theorem can be applied. If one then minimises Q with respect to the vector of coefficients α, one obtains

α̂ = (K + σ_N^2 I)^{−1} y,

which gives the prediction at test point t_*,

γ̂_l(t_*) = k(t_*)^T (K + σ_N^2 I)^{−1} y,

which is exactly the predictive mean given in Equation 64.
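This equivalence can be checked numerically: minimising Q(α) directly recovers the closed-form coefficients α = (K + σ_N²I)⁻¹y and hence the GP predictive mean. A sketch with an arbitrary RBF kernel and synthetic data (all parameter values illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(3)
t_train = np.sort(rng.uniform(0, 1, 25))
y = np.sin(2 * np.pi * t_train) + 0.1 * rng.normal(size=25)
sigma2_N = 0.01                      # Gaussian noise model variance
K = rbf(t_train, t_train)

# Q(alpha) = |y - K alpha|^2 / (2 sigma_N^2) + alpha' K alpha / 2
def Q(alpha):
    r = y - K @ alpha
    return 0.5 * r @ r / sigma2_N + 0.5 * alpha @ K @ alpha

def gradQ(alpha):
    return K @ ((K @ alpha - y) / sigma2_N + alpha)

# Closed-form minimiser from the text: alpha = (K + sigma_N^2 I)^{-1} y
alpha_closed = np.linalg.solve(K + sigma2_N * np.eye(25), y)
alpha_num = minimize(Q, np.zeros(25), jac=gradQ, method="L-BFGS-B").x

# Both routes give the predictive mean k(t*)' (K + sigma_N^2 I)^{-1} y
t_test = np.linspace(0, 1, 50)
pred_closed = rbf(t_test, t_train) @ alpha_closed
pred_num = rbf(t_test, t_train) @ alpha_num
```

The numerically minimised coefficients and the closed-form solve yield the same predictive mean, illustrating that kernel-ridge estimation and the GP posterior mean coincide for this loss and penalty.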
To explicitly recover the solution to the smooth spline interpolation for the IMF representation obtained via solving Equation (62) with m = 2, along with the regularised GP solution just presented, the result of [95] can be used, which considers a random function representation of the IMF with a polynomial component and a spline kernel component. To complete the example of the regulariser in the cubic spline case, one must remove penalties on polynomial terms in the null space by letting σ_β → ∞. This produces the final predictive mean solution for the GP representation of the cubic spline characterisation of the IMF, with kernel covariance matrix K_y corresponding to elements σ_f^2 k_sp(t_i, t_j) + σ_N^2 δ_ij evaluated at all training points, H being the matrix collecting the vector of polynomial basis terms (1, t) at the training points, and the kernel least squares coefficient estimator given by

β̂ = (H K_y^{−1} H^T)^{−1} H K_y^{−1} y.

From this solution, one can see that the resulting predictive mean function for the GP representation of the IMF γ_l will have a cubic polynomial form.

C. GAUSSIAN PROCESSES BASED STOCHASTIC EMD EMBEDDINGS
Having established how the GP representation is connected mathematically to the empirical pathwise cubic spline representation of an IMF in the EMD method, one can now generalise the stochastic embedding from a single IMF to the entire collection of IMFs in a stochastic model designed to satisfy the properties proposed for the stochastic embedding objectives set out in Section III-A. To achieve the desired embedding, consider first the stochastic process associated with the observed sampled signal, converted from samples {s(t_1), . . . , s(t_N)} to the spline s̃(t), which when considered as realisations of stochastic processes will be denoted by S(t) and S̃(t) respectively. The reference model used for comparison to the stochastic EMD models will involve directly modelling the process S(t), without the EMD method signal decomposition information, via a GP model.

1) MULTI-KERNEL EMD BASED GAUSSIAN PROCESS FOR S(t )
When the EMD is applied to signal s(t) and the set of basis functions is extracted, each IMF γ_l(t) will be considered as the realised path of a stochastic process denoted Γ_l(t), and the residual r(t) that of a process denoted R(t). This produces the stochastic embedding of the EMD given in Equation (73). The SM2 representation of the original stochastic process S(t) is then given by the GP

S(t) = Σ_{l=1}^{L} Γ_l(t) + R(t) + ε(t),

where ε(t) ∼ N(0, σ_ε^2), Γ_l(t) represents the GP for IMF l, for l = 1, . . . , L, and R(t) represents the GP on the residual tendency component. This general structure forms the basic structure for the stochastic embeddings proposed for the EMD method. One can therefore see that the resulting model is still a GP model, but it differs from the application of a GP model directly to the signal in that it is comprised of a multi-kernel structure induced by the EMD decomposition:

k_S(t, t') = Σ_{l=1}^{L} k_{Γ_l}(t, t') + k_R(t, t') + σ_ε^2 δ(t − t').

It is apparent that the proposed GP model for the stochastic embedding of the EMD method differs from a direct GP model on the signal, as detailed in the reference model, in how the sufficient statistics are designed. The key point of the stochastic embedding of the EMD method in the GP framework is that the kernel of the GP is now comprised of a multi-kernel structure, where each kernel can be specifically calibrated to the extracted IMFs' basis functions. Furthermore, it is straightforward to verify that this stochastic embedding of the EMD method satisfies the objectives set out in Section III-A.
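Since a sum of independent GPs is again a GP, the multi-kernel structure can be sketched by summing per-IMF kernels. The kernel choices below (periodic kernels for oscillatory IMFs, a squared-exponential for the tendency, with arbitrary hyperparameters) are illustrative assumptions, not the calibrated kernels of the model:

```python
import numpy as np

def rbf(a, b, ell):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def periodic(a, b, period, ell):
    d = np.abs(a[:, None] - b[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / ell ** 2)

t = np.linspace(0, 1, 150)
# One kernel per IMF process Gamma_l, plus a smooth kernel for the tendency R(t)
k_imf1 = periodic(t, t, period=0.10, ell=0.7)  # fast oscillatory IMF
k_imf2 = periodic(t, t, period=0.33, ell=0.7)  # slower oscillatory IMF
k_tend = rbf(t, t, ell=0.5)                    # residual tendency
sigma2_eps = 0.01

# Additive multi-kernel covariance: S(t) is again a GP
K_S = k_imf1 + k_imf2 + k_tend + sigma2_eps * np.eye(len(t))

rng = np.random.default_rng(7)
sample = rng.multivariate_normal(np.zeros(len(t)), K_S)
```

Each component kernel could in practice be calibrated (e.g. by maximum likelihood) to the corresponding extracted IMF's basis functions, which is precisely the calibration flexibility the multi-kernel structure provides over a single-kernel GP on the raw signal.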

D. TREATMENT OF THE RESIDUAL TENDENCY STOCHASTIC EMBEDDING
As detailed in Section III, the last component extracted by EMD corresponds to the residual or tendency component r(t). By definition, this last component has a maximum of one convexity change within the domain [0, T]. Therefore, it is possible, without loss of generality, to partition it into two subregions ([0, s] and [s, T]) in each of which monotonicity applies locally. Consequently, one could then impose a structure on the GP model for R(t) over each region that enforces a stochastic monotonicity, as discussed in [96], producing an isotonic restriction on the Gaussian process. This is achieved by imposing derivative constraints on the sufficient statistics. Effectively, this utilises the fact that a derivative of a Gaussian process is a Gaussian process ([88]), from which it follows that a convexity constraint results in conditions on the mean. One can then impose these conditions at all out-of-sample points R(t*) in such a manner that, on average, monotonicity is preserved; given the conditional distribution for R(t*) | R(t_1), . . . , R(t_N), the monotonicity conditions are imposed on the predictive distribution. There exists a second option for the treatment of the tendency in the stochastic embedding of EMD, which involves rewriting the model in a conditional form. Under this formulation, the monotonicity of the tendency is obtained using the EMD method's pathwise extracted tendency function r(t). This is equivalent to developing an empirical Bayes formulation of the stochastic EMD embedding; see the discussion in [97].

V. AUGMENTATIONS AND EMD METHOD EXTENSIONS
It is possible to build upon the EMD framework to incorporate features of other time series analysis methods, thereby highlighting the strength of EMD: it not only significantly improves the spectral resolution of time series, but is also easily augmented by other common time series techniques. This has the advantage of producing non-stationary, nonlinear, adaptive basis representations that may also capture attributes of classical time series methods, ultimately to the betterment of both techniques in terms of characterisation and interpretation, as will be discussed below.
In this section, the core features of these algorithmic chimeras are briefly discussed before both synthetic and real-world examples using these techniques are demonstrated in Section VII. In Section V-A, EMD is augmented with a knot optimisation technique that further enhances the robustness of EMD by allowing it to dynamically adjust to the frequency content of the time series. In Sections V-B, V-C, and V-D, three algorithms are combined with EMD to address specific problems, which are expanded upon in Section VII.
EMD-ICA and ICA-EMD (the only difference being the order of application) can both be used to improve the isolation of underlying structures from numerous realisations of some underlying generating process, such as realisations of a Gaussian process with periodic kernels. MDLP is combined with EMD to automatically sort the resulting IMFs based solely on their instantaneous frequencies; this will be shown later to sort structures correctly and efficiently, revealing the true underlying frequencies of the generating process. Finally, EMD-X11 and X11-EMD are presented. When X11 functions as a moving average applied to the IMFs, the result is finer frequency resolution in constant-frequency structures; when EMD is applied to components extracted using X11, the result is finer frequency resolution and noise reduction. Both techniques improve upon the individually applied methods, as will be presented in Section VII-B1.

A. EMD AND SERIAL BISECTION
In this section, we discuss two practical statistical problems arising with spline representations of IMFs in the EMD algorithm: the number of knot points (i.e., the segmentation of the data support) and the location of the knot points. We will explore how to perform dynamic optimal knot selection and placement for the cubic B-spline characterisation of IMFs in the EMD algorithm, using a knot point allocation optimisation technique from [49] known as Serial Bisection, which is specifically designed for B-splines. The method does not rely on a fixed number of knots whose locations it optimises; rather, it takes only a fixed maximum error term, E_max, around which the next knot point location is optimised, and a lower bound distance term, ϵ, which terminates the loop when the distance change falls below this minimum.
With t = {t_0, . . . , t_N} and an accepted sequence of knots τ = {τ_0, . . . , τ_k}, the next iteration of the algorithm is initialised by using the final time point as the next potential knot in the sequence, τ_{k+1}^{(0)} = t_N. As elucidated in Figure 3, with B^{(0)}, c^{(0)}, P^{(0)} as in Equation (31), appropriately resized, the optimisation problem becomes a constrained minimisation of the spline fitting error over the candidate segment, and the potential knot distance is bisected until the error falls below E_max. If this were the standard bisection method, the knot point would then be accepted, τ_{k+1} = τ; if instead the bisected distance change falls below ϵ, the iterations are stopped, the current candidate is rejected, and the previous candidate is accepted, τ_{k+1} = τ_{k+1}^{(2)}. In addition to the flexible and adaptive IMF bases, a knot optimisation technique also allows the IMF to locally adapt to the time series. In Section VII-A1, this technique allows EMD to locally adapt to the increasing frequencies present in the time series.
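The greedy bisection idea above can be sketched as follows. This is a deliberately simplified sketch, not the full constrained minimisation of [49]: a plain cubic polynomial fit stands in for the B-spline sub-problem, and the acceptance rule is reduced to an error threshold with a bisected span.

```python
import numpy as np

def serial_bisection_knots(t, y, e_max=1e-3, eps=1e-3):
    """Greedy knot placement: from each accepted knot, reach as far as possible
    with one cubic fit whose max error stays below e_max, bisecting the
    candidate span; stop shrinking once the span change falls below eps."""
    knots = [t[0]]
    i = 0
    while i < len(t) - 1:
        j = len(t) - 1                       # candidate end: the final point
        while True:
            seg_t, seg_y = t[i:j + 1], y[i:j + 1]
            if len(seg_t) <= 4:              # too few points to refine further
                break
            coeffs = np.polyfit(seg_t, seg_y, 3)
            err = np.max(np.abs(np.polyval(coeffs, seg_t) - seg_y))
            if err <= e_max:                 # fit acceptable: accept this knot
                break
            if (t[j] - t[(i + j) // 2]) < eps:
                break                        # span change below eps: stop
            j = (i + j) // 2                 # bisect the candidate span
        knots.append(t[j])
        i = j
    return np.array(knots)
```

On a chirp, the accepted knots become more closely spaced as the local frequency rises, which is the behaviour reported in Figure 21.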

B. COMBINING EMD WITH INDEPENDENT COMPONENT ANALYSIS: EMD-ICA & ICA-EMD
EMD may be used in combination with ICA to enhance the results obtained from an ICA method, as will be outlined in this section. Unlike ICA, EMD does not require multiple signals from which different structures can be isolated. In this comparison, the focus will be on the ICA implementation given by the algorithm in [50], named FastICA, which is based on [98] and [99]. The work presented in this paper uses the Python defaults for the FastICA algorithm: the parallel version of FastICA is used, and the natural logarithm of the hyperbolic cosine function is used to approximate the negentropy (negative entropy).
Consider the functions g(x) = tanh(x) and g'(x) = 1 − tanh²(x), and let X ∈ R^{N×M} be a matrix of time series data realisations of length N, w ∈ R^{N×1} a randomised weight vector, and 1_M a column vector of ones of length M. The FastICA algorithm, as in [50], proceeds by repeating the iterative update

w⁺ = (1/M) X g(Xᵀw) − (1/M) (g'(Xᵀw)ᵀ 1_M) w,  w ← w⁺ / ∥w⁺∥,

until convergence is achieved. One can then combine EMD with ICA in two different ways. Firstly, in the EMD-ICA method, application of the EMD to each input signal of an ICA method can be beneficial for three reasons:
1) Robustification of the ICA method to outliers and noise: the EMD representation, especially with a p-spline basis for the IMFs, will often produce inputs to the FastICA algorithm that are less affected by noise.
2) Missing data or mixed sampling rates: the ICA method assumes that all input channel time series have the same sampling rate and no missing data. Applying the EMD to the input channel signals first produces input signals to FastICA that can be sampled from x_IMF(t) at a common set of time points.
3) ICA at different frequency components: the ICA method in its standard form cannot be focused on providing independent components within a given frequency band. One way to achieve this would be to first filter (bandpass filter) the signal and then apply FastICA; however, this is sub-optimal for non-linear and non-stationary signals. Instead, one can apply the EMD decomposition to each input channel signal, compute the IFs for each IMF via Hilbert transforms, and then select the representation of x_IMF(t) for each input signal to concentrate only on the IMFs present at a given time point in the selected frequency band.
Secondly, one can adopt the ICA-EMD approach, which applies EMD after the ICA decomposition to the resultant signals, in order to study the basis representations of each extracted independent component series.
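The one-unit update above can be sketched directly in NumPy, following the shapes used in the text (X is N×M with realisations in the columns, w of length N). This is a sketch of the standard FastICA fixed-point iteration with the tanh contrast, not the full parallel scikit-learn implementation the paper uses; the function name and the convergence test are illustrative.

```python
import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA with g = tanh (log-cosh negentropy contrast).
    X is N x M: M (whitened) realisations of length N in the columns."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    w = rng.standard_normal(N)
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        wx = w @ X                                  # projections, length M
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        w_new = (X @ g) / M - g_prime.mean() * w    # fixed-point update
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1.0) < tol  # up to sign
        w = w_new
        if converged:
            break
    return w
```

In the ICA-EMD usage of Section VII-A2, the returned w plays the role of the single consistent extracted structure w_new to which EMD is then applied.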
The ICA-EMD method is illustrated in Section VII-A2, where this technique is applied to a set of realisations of a Gaussian process with an underlying compound periodic kernel. The realisations form the columns of X and a single consistent structure is extracted, w new . One can then apply the EMD to the resultant extracted signal to efficiently isolate the IMFs that accurately quantify the frequency of the underlying generating process.

C. EMD AND THE MINIMUM DESCRIPTION LENGTH PRINCIPLE
The Minimum Description Length Principle (MDLP) ([51]) is an optimal partitioning or quantisation method. It can be used, for instance, to discretise continuous descriptive variables into categories that still optimise their descriptive differences. MDLP can be combined with an EMD decomposition to produce an adaptively filtered signal reconstruction, where the partitions are applied in time and frequency on the IFs obtained from the transformation of the IMFs, in order to adaptively isolate common time-frequency structures in the signal. This approach will be termed the EMD-MDLP filtered representation.
This technique will be used here to automatically sort different frequency structures using cut-points defined in the MDLP algorithm to partition these structures. For a more detailed review of this technique, refer to [51]. The number of discrete variables to isolate is a hyper-parameter that a user can specify to determine the degree of granularity sought in partitioning the time-frequency plane.
An illustration of the EMD-MDLP method with 3 cut points is given as follows. Let S be the set of all IFs at time point t_k (MDLP is performed element-wise), and let S_1^{T_1} and S_2^{T_1} be a partition of S by the cut point T_1, as demonstrated in Figure 4. Let S_1^{T_2}, S_2^{T_2} and S_1^{T_3}, S_2^{T_3} be defined analogously. The optimal cut point T_k is chosen to minimise E(T_k; S), the class information entropy induced by the partition of S by T_k, defined as

E(T_k; S) = (|S_1^{T_k}| / |S|) Ent(S_1^{T_k}) + (|S_2^{T_k}| / |S|) Ent(S_2^{T_k}),

with the entropy of each subset given by

Ent(S_j^{T_k}) = − Σ_i P(C_i, S_j^{T_k}) log₂ P(C_i, S_j^{T_k}),

where P(C_i, S_j^{T_k}) is the proportion of elements of S_j^{T_k} in class C_i, and C_i is the class of objects within S_j^{T_k} that display feature i.
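The entropy-minimising cut-point search described above can be sketched as follows. This assumes the standard class-information-entropy criterion of [51]; the function names and the midpoint candidate grid are illustrative choices.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of the class labels in a subset."""
    _, counts = np.unique(labels, return_counts=True)
    if counts.size == 0:
        return 0.0
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def class_information_entropy(values, labels, cut):
    """E(T; S): weighted entropy of the two subsets induced by cut point T."""
    left, right = labels[values <= cut], labels[values > cut]
    n = len(labels)
    return len(left) / n * entropy(left) + len(right) / n * entropy(right)

def best_cut_point(values, labels):
    """Scan candidate cuts (midpoints of sorted values) for minimal E(T; S)."""
    order = np.argsort(values)
    v, y = values[order], labels[order]
    cuts = (v[1:] + v[:-1]) / 2
    scores = [class_information_entropy(v, y, c) for c in cuts]
    return cuts[int(np.argmin(scores))]
```

Applied to the IFs at a given time point, with IMF membership as the class label, the minimising cut separates the frequency structures exactly as in Figure 4.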

D. EXTENDING X11 VIA ADDITION OF EMD: EMD-X11
The X11 method, introduced in [6] and discussed in [7] and [8], is an iterative component extraction framework; the basic version of X11 has a fixed number of iterations (typically three), compared with EMD's more general iterative sifting approach, which relies on a stopping criterion and has an a priori unknown number of iterations. Unlike an EMD basis decomposition, X11 only allows the extraction of three components, which are subject to very restrictive assumptions about their frequency and general structure. The trend-cycle component is extracted using techniques developed in [13], [14], and [15]; the seasonal component is isolated using seasonal weightings such as S_{3×3,3}, S_{3×5,24}, and S_{3×7,12}; and the irregular component, ϵ(t), is simply the remainder after the trend-cycle component, T(t), and seasonal component, S(t), have been isolated from the time series, x(t). Despite the restrictive nature of X11, it can be used to refine results from EMD and vice versa. The output of the X11 method is:

x(t) = T(t) + S(t) + ϵ(t).

Variations on X11, such as X11ARIMA in [9], [10], and [11], and X12ARIMA in [12], have been developed to refine the extraction of these components by formally extrapolating the edges of the time series using autoregressive techniques.
To better extract the trend component, the edges of the time series require classical asymmetric Henderson weights, as in [16] and [17], which have been formally re-derived in the Reproducing Kernel Hilbert Space setting in [18], [19], and [20]. Despite the algorithmic variations and weightings, the core X11 method proceeds as in Algorithm 2, with T_i*(t) being the i-th iteration estimate of the trend component T(t); S_j*(t) being the j-th iteration estimate of the seasonal component S(t); ma_k(·) being the symmetric moving average weighting of order k; S_{m×n,c}(·) being the seasonal symmetric moving average of order m × n, where c depends upon the sampling rate of the time series and the seasonal effect to be extracted, such as quarterly or seasonal effects; and Hma_l(·) being the Henderson moving average weights of order l as in [13], [14], [15], [18], [19], and [20].
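The two moving-average building blocks of X11 can be sketched as follows. This is a sketch under simplifying assumptions: uniform weights for ma_k (the classical weights vary by order) and the 3×3 seasonal filter obtained by composing two 3-term averages across successive years of the same season; edge treatment (asymmetric Henderson weights) is omitted.

```python
import numpy as np

def ma(x, k):
    """Centered symmetric moving average of odd order k (uniform weights)."""
    w = np.ones(k) / k
    return np.convolve(x, w, mode="same")

def seasonal_ma_3x3(x, period):
    """S_{3x3}: a 3-term MA applied twice to each season's values, i.e.
    weights [1, 2, 3, 2, 1] / 9 across successive years of one season."""
    w = np.convolve([1, 1, 1], [1, 1, 1]) / 9.0   # composed 3x3 weights
    out = np.empty_like(x, dtype=float)
    for s in range(period):
        out[s::period] = np.convolve(x[s::period], w, mode="same")
    return out
```

Composing the two 3-term averages yields the familiar [1, 2, 3, 2, 1]/9 weights; the higher-order S_{3×5} and S_{3×7} filters are built the same way from longer averages.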

Algorithm 2 X11
One could readily extend the X11 procedure to an EMD-X11 procedure, which would allow different time scales to be considered for X11. This could be meaningful in three different contexts, analogous to those outlined in the EMD-ICA context; namely, the EMD-X11 algorithm will allow one to:
1) Robustify the X11 method to outliers and noise.
2) Treat missing data in the time series to which one aims to apply X11.
3) Perform X11 on different frequency components by applying the X11 decomposition to each IMF or to sub-sets of IMFs.
It is the last of these that is most intriguing. For instance, consider the EMD decomposition of a time series or signal x(t), with its representation in terms of IMFs γ_j(t) and a residual. One could apply the X11 decomposition either to each IMF γ_j(t), to obtain a collection of per-IMF trend, seasonal, and irregular components, or to a sub-construction, such as using only the lowest K − p oscillation count IMFs, which may or may not explicitly correspond to the lowest frequency IMFs. This would allow one to explore the ''local'' time-frequency trend and seasonal components of the EMD decomposition in a manner consistent with X11 analysis.

VI. CHALLENGES WITH COMMON TIME-FREQUENCY ALTERNATIVES TO EMD FOR NON-STATIONARY & NON-LINEAR SIGNALS
A time series contains the complete (available) temporal resolution of a process with no frequency resolution. Conversely, the Fourier transform of a function contains the complete (available) frequency resolution of a process with no temporal resolution. The most common time series analysis techniques attempt to find a desirable compromise, with some temporal information and some frequency information. There is a trade-off between time and frequency, with a lower bound that is a version of the uncertainty principle, as illustrated in Figure 5, where temporal windows of widths 2 seconds and 10 seconds are used. The 2 second window is capable of discerning the 0.5Hz structure at each of the 5 possible increments, but it is unable to detect the 0.1Hz structure. This is a gross over-simplification, but it demonstrates the conundrum: this window has good temporal resolution, in that it can detect where the components of the 0.5Hz structures are (in 2 second windows), but it is unable to detect the lower frequency structure. To detect the lower frequency structure, one would need to widen the time sampling window. Also in Figure 5, when the window is wide enough to accommodate the 0.1Hz structure, one would be inclined to think that this is the solution, but note that there is then little temporal resolution: lower frequency structures can be discerned, but the precise location of the structures (and changes in these structures) is lost. This is, in essence, the problem faced in the application of the most common time-frequency transforms.
In addition to the time-frequency trade-off, classical time series analysis methods also struggle with transient structures. In Figure 6, a simple sinusoidal time series with a linear trend, f(t) = (1/10) sin(50πt) + 2t with t ∈ [0, 1], is plotted along with its Fourier approximation using three, five, and seven bases. The exponential decay of the basis coefficients, which attempt to replicate the transient nature of the time series, can also be seen in Figure 6; this will obscure the analysis of any sinusoidal structures present. This shortcoming is demonstrated very clearly using a well-known data set in Section VII-B1. Non-stationarity is typically handled in non-stationary time series analysis techniques by first performing some form of preprocessing or detrending.
Most importantly, EMD, and the related HHT, not only address this problem of non-stationary time series, but also address the issue of coefficient proliferation when a time series exhibits frequency modulation. STFT and MWT were both created to address the issue of modulated (or changing) frequency time series. While these techniques were able to solve this issue, to a point, they suffer from the physical limitation demonstrated in Figure 6. The application of STFT and MWT to a combination of modulated sinusoids is demonstrated rather elegantly in Section VII-A1 and Figure 9. Unfortunately, and rather tragically, non-stationary time series analysis techniques such as X11 ([6]), Singular Spectrum Analysis (SSA) ([100]), and Singular Spectrum Decomposition (SSD) ([35]), while well suited to detrending time series and therefore solving the problem demonstrated in Figure 6, are designed under the assumption that the frequencies of the underlying structures are constant, or in the most flexible of these techniques (SSD), at least confined to narrow bands; this clearly fails when applied to the modulated frequency data in Section VII-A1. SSD is compared against STFT, MWT, EFD, EWT, and EMD in Section VII-B1 to fairly compare EMD with STFT, MWT, and other more modern non-linear and non-stationary decomposition and time-frequency analysis techniques.
The earliest and most robust technique (given the wide range of algorithmic variations available), and the one with the most empirical evidence, that addresses both non-stationarity and frequency modulation is EMD. Two techniques that have developed as a result of the ground-breaking work done in [1], [2], and [3] are the Empirical Wavelet Transform (EWT), [101], and Empirical Fourier Decomposition (EFD), [102], each of which clearly cites EMD as an inspiration; they can unambiguously be interpreted as combining EMD with MWT and STFT, respectively, to address their shortcomings. These methods, had they not already been proposed, would fit perfectly within Section V and Section VII as algorithmic chimeras formed by combining EMD with other algorithms. Figure 6 demonstrates what happens when STFT and MWT are applied to non-stationary time series, and Figure 5 demonstrates that even if trends are extracted before applying STFT and MWT (EFD and EWT, respectively) there are still physical limitations to these methods when compared against EMD.
Despite EFD and EWT improving upon STFT and MWT by allowing accurate frequency resolution of detrended time series, these methods are still constructive in nature, as the bases used to estimate the frequencies present are predefined, whereas EMD is fundamentally non-constructive. Condition 1 and Condition 2 are far less restrictive than fixed-amplitude, fixed-frequency sinusoids. The bases of EMD, the IMFs, are modulated-amplitude and modulated-frequency sinusoids, which are more flexible than fixed sinusoids; additionally, this allows a finite decomposition as well as separable, flexible bases.

A. COVARIANCE AND HIGHER-ORDER NON-STATIONARITY
In [103], the excess kurtosis (Pearson's kurtosis minus 3, for direct comparison with the Normal distribution) of a general GARCH(p, q) process, y_t, is derived, where Z_t is an independent and identically distributed (IID) random variable for all t ∈ {0, 1, . . . , T}. In that work, the GARCH(1, 1) model kurtosis is proven for processes that satisfy the fourth-moment stationarity condition

3α² + 2αβ + β² < 1.    (96)

This theoretical underpinning will be compared against the resulting GARCH(1, 1) noise in Figure 22. A level of GARCH(1, 1) noise will then be added to two chirps, and the models will be assessed in terms of their ability to still resolve the frequency spectra of the underlying chirps. That is, a GARCH(1, 1) process

y_t = σ_t Z_t,  σ_t² = ω + α y_{t−1}² + β σ_{t−1}²,    (97)

with Z_t as described above, has excess kurtosis (K_y)

K_y = 6α² / (1 − β² − 2αβ − 3α²),

which [103] obtains via the ARMA representation of y_t², with B being the lag operator, φ = α + β, and ψ(B) the associated moving average polynomial whose coefficients ψ_k, after multiplying and collecting terms, yield the expression above. Finally, for a GARCH(1, 1) process that satisfies the stationarity condition in Equation (96), the excess kurtosis is finite. A truncated version of this is plotted in Figure 7.
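The GARCH(1, 1) process of Equation (97) and the closed-form excess kurtosis can be sketched as follows, assuming Gaussian innovations Z_t; the function names and the choice of initialising σ² at its unconditional value are illustrative.

```python
import numpy as np

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate y_t = sigma_t * Z_t with sigma_t^2 = omega
    + alpha * y_{t-1}^2 + beta * sigma_{t-1}^2, Gaussian Z_t."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    y = np.zeros(n)
    sigma2 = omega / (1.0 - alpha - beta)   # start at unconditional variance
    for t in range(n):
        if t > 0:
            sigma2 = omega + alpha * y[t - 1] ** 2 + beta * sigma2
        y[t] = np.sqrt(sigma2) * z[t]
    return y

def garch11_excess_kurtosis(alpha, beta):
    """Closed-form excess kurtosis for Gaussian innovations; finite when
    3*alpha**2 + 2*alpha*beta + beta**2 < 1 (Equation (96))."""
    denom = 1.0 - beta ** 2 - 2.0 * alpha * beta - 3.0 * alpha ** 2
    return 6.0 * alpha ** 2 / denom
```

With the parameters used later in Section VII-A1 (α = β = 0.25), the closed form gives an excess kurtosis of 0.6, the value against which the sample kurtosis in Figure 22 can be checked.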

B. SHORT-TIME FOURIER TRANSFORM
The Fourier transform, F[·], of a time series, g(t), is given for frequency f by

F[g](f) = ∫ g(t) e^{−2πi f t} dt.

All temporal information is lost when this transform is directly applied to a time series. To solve this problem, the temporal domain of the time series is partitioned to localise frequency information. The partition increment width is a hyperparameter: a wider increment will allow lower frequency structures to be detected, but the temporal location of all detected frequency structures will be less precise. A narrower window will allow greater temporal resolution, at the cost of being unable to detect lower frequency structures, as demonstrated in Figure 5.
Often, alias-reducing windows are adopted that are not simply hard thresholds but rather taper the power of the signal in certain specified frequency ranges captured by a given window's temporal-frequency resolution. Popular choices for such windows are the Hann window and the Hamming window, and variations thereof; see [56] and [104]. These windows are the simplest members of a family of windows known as the generalised cosine windows. The continuous short-time Fourier transform is shown below in Equation (105):

STFT[g](τ, f) = ∫ g(t) w(t − τ) e^{−2πi f t} dt.    (105)
The Hann window (a_0 = 0.5) and the Hamming window (a_0 = 0.53836) are detailed below in a general form in Equation (106):

w[n] = a_0 − (1 − a_0) cos(2πn / (N − 1)),  n = 0, . . . , N − 1.    (106)

The motivation for the use of these localising windows is the desire to provide a good trade-off between temporal and frequency resolution, but this technique is still plagued by the problems shown in Figure 5 and Figure 6. Also, as already noted in Section II-A, the inflexibility and non-compact nature of the bases prevent them from capturing complex, non-stationary, discontinuous structures.
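The generalised cosine window of Equation (106) and a naive discrete STFT can be sketched as follows; the frame layout (no padding, fixed hop) is an illustrative simplification of the continuous transform in Equation (105).

```python
import numpy as np

def cosine_window(n, a0):
    """Generalised cosine window; a0 = 0.5 gives Hann, a0 = 0.53836 Hamming."""
    m = np.arange(n)
    return a0 - (1.0 - a0) * np.cos(2.0 * np.pi * m / (n - 1))

def stft(x, n_win, hop, a0=0.5):
    """Naive discrete STFT: window each frame, slide by hop, FFT each frame."""
    w = cosine_window(n_win, a0)
    frames = [x[i:i + n_win] * w
              for i in range(0, len(x) - n_win + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])
```

For a pure sinusoid sampled at f_s, the energy in each frame concentrates at bin k ≈ f · n_win / f_s, which makes the window-width trade-off of Figure 5 easy to reproduce: halving n_win halves the number of resolvable low-frequency bins.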

C. MORLET WAVELETS
Morlet wavelet analysis ([105]), based on earlier work by [69], convolves specially constructed wavelet packets with the time series to infer its frequency content. Two common versions of Morlet wavelet analysis are applied in practice: firstly, those that use fixed wavelet packets with no windowed tapering; and secondly, those that employ flexible windows that are more sensitive to the frequency content of lower frequency structures and the temporal content of higher frequency structures. The fixed wavelet packet is a complex exponential of frequency f tapered by a Gaussian envelope of fixed width σ:

ψ(t) = e^{2πi f t} e^{−t²/(2σ²)}.

To make the Morlet wavelet adaptive, one can introduce a function, h(σ) = cσ with c ∈ R, so that the envelope width scales with the analysed frequency. The similarity between this wavelet packet and the time series is determined via convolution. With g(t) as above, one may efficiently perform this convolution in the frequency domain using the Convolution Theorem for Fourier transforms:

(g ∗ ψ)(t) = F^{−1}[F[g] · F[ψ]](t),

with F[·] and F^{−1}[·] being the Fourier transform and inverse Fourier transform respectively. Equation (108) does allow more flexibility than the STFT, but still suffers from the same shortcomings when dealing with more complex structures that exhibit non-stationarity, non-linearity, and discontinuities.
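The frequency-domain convolution described above can be sketched as follows. This assumes the unnormalised fixed-packet form of the wavelet and uses circular FFT convolution as a proxy for the continuous transform; function names are illustrative.

```python
import numpy as np

def morlet_packet(t, f, sigma):
    """Complex Morlet wavelet packet: Gaussian-windowed complex exponential."""
    return np.exp(2j * np.pi * f * t) * np.exp(-t ** 2 / (2.0 * sigma ** 2))

def wavelet_power(x, fs, f, sigma):
    """|x * psi| via the Convolution Theorem: the time-resolved response
    of the signal at analysis frequency f."""
    n = len(x)
    t = (np.arange(n) - n // 2) / fs        # packet centred in the frame
    psi = morlet_packet(t, f, sigma)
    conv = np.fft.ifft(np.fft.fft(x) * np.fft.fft(psi))
    return np.abs(conv)
```

Because the packet's spectrum is a Gaussian centred at f with bandwidth of order 1/(2πσ), the response at the true signal frequency dominates the response at a distant analysis frequency, which is the selectivity the MWT panels in Section VII rely on.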

VII. ILLUSTRATIONS OF EMD ON CHALLENGING TIME SERIES VERSUS ALTERNATIVES
In this section, EMD and its augmentations, discussed in Section V, will be used on synthetic data in Section VII-A and real-world data in Section VII-B. These methods will be compared against the results obtained using the methods discussed in Section VI. In Section VII-A1, frequency-modulated chirps are analysed using STFT and MWT, before knot-point-optimised EMD, as in Section V-A, is used on the same time series. In Section VII-A2, multiple realisations of a Gaussian process are analysed using STFT and MWT, before EMD augmented with ICA and MDLP, as in Section V-B and Section V-C, is applied, providing detailed insight in the stochastic setting.
In Section VII-B1, STFT and MWT are applied to non-stationary real-world data, revealing the problem already alluded to in Figure 6, before EMD augmented with standard X11, and vice versa, as in Section V-D, are applied to the same data, revealing very accurate frequency structures and improved results when each method is applied to the output of the other, compared against the original outputs. Finally, in Section VII-B2, EMD is applied to globally distributed indices to isolate implicit market factors before creating a generalisation of the CAPM-type factor models developed in [52] and [53].

A. SYNTHETIC DATA TIME SERIES CASE STUDIES WITH EMD
In this section, two synthetic examples are used to demonstrate where the utility of EMD exceeds traditional time series analysis techniques. In Section VII-A1, the higher frequency resolution of EMD and explicit structural decomposition will be shown to exceed Short-Time Fourier Transform and Morlet Wavelet Analysis. In addition to this, in the interest of computational efficiency, a serial bisection knot point optimisation algorithm from [49] is used to optimise the B-spline knot point allocation that maintains the frequency resolution while decreasing the number of knots required to capture these features.
In Section VII-A2, a Compound Exponentiated Sine Kernel is used to create seemingly random realisations without any clearly inferable underlying common generating structure. EMD is used to isolate IMFs from each of the realisations. MDLP, as mentioned briefly in Section V-C, and ICA, as mentioned briefly in Section V-B, are then used in conjunction with the IMFs extracted to infer the frequencies of the underlying generating process. MDLP is used to group the IMFs before the Law of Large Numbers and Monte Carlo methods are used to impose confidence bounds on the estimated frequencies of the structures. ICA optimisation methods are used on the realisations to extract a single time series from which IMFs are extracted and the underlying frequency of the generating process is estimated.

1) TIME SERIES WITH MODULATION STRUCTURES AND EMD
In order to illustrate how EMD can naturally detect, in the IMF decomposition, a time series representation constructed from a sequence of modulation transformations, we take the following example from [106], where the observed signal consists of two chirps, g(t) = sin(250πt²) + sin(80πt²) for t ∈ [0, 1]. This illustration demonstrates the frequency resolution power of EMD over classical methods such as the STFT or Morlet wavelet analysis. To mimic the results in [106], a window of 512 time points and a total length of 3840 time points are used. One can compare the results in [106] with the right panel of Figure 9, and note the very high-frequency structures present in Figure 8.
Uniform B-spline knot allocation is used in this paper, following the work done in [39], but there are numerous works, such as [38], [107], and [108], where knot optimisation is discussed. One approach, without some form of knot allocation optimisation, would be to increase the number of knots to capture the highest frequency structures. In the left panel of Figure 10 there is one knot for every ten time points. This is insufficient for capturing the highest frequency components of the chirps; at the lower frequencies, the frequency resolution far exceeds STFT and Morlet wavelet analysis. By doubling the number of knots, the frequency resolution capability of EMD is greatly enhanced, as can be noted from the right panel of Figure 10. To this example from [106], one can add different levels of noise, measured using the Signal-to-Noise Ratio (SNR) in decibels. The SNR of a noisy time series is calculated as

SNR_dB = 10 log₁₀(P_signal / P_noise),

with P_signal and P_noise being the power of the signal and the noise, respectively. In Figures 11-20, the IFs, STFTs, and MWTs are shown with varying levels of noise from −10dB to 10dB to demonstrate the effectiveness of the respective techniques. There are several key inferences from this experiment. The first is that increasing levels of noise cause mode mixing in EMD, which a number of techniques have been developed to solve, such as [87], presenting the masking signal as a technique for improving IMF separation, and [109], presenting Ensemble EMD (EEMD), a Noise-Assisted Data Analysis (NADA) technique that uses noise to assist in the differentiation. The next is that the IF, STFT, and MWT of these noisy signals are almost equivalent in resolution.
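The noise-injection step can be sketched directly from the SNR definition above; the white Gaussian noise model is an assumption for the sketch, and the function name is illustrative.

```python
import numpy as np

def add_noise_at_snr(x, snr_db, seed=0):
    """Add white Gaussian noise scaled so the result has the requested SNR,
    using SNR_dB = 10 * log10(P_signal / P_noise)."""
    rng = np.random.default_rng(seed)
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10.0 ** (snr_db / 10.0)
    noise = rng.standard_normal(len(x)) * np.sqrt(p_noise)
    return x + noise

# the two chirps from [106], on the 3840-point grid used in this section
t = np.linspace(0, 1, 3840)
g = np.sin(250 * np.pi * t ** 2) + np.sin(80 * np.pi * t ** 2)
noisy = add_noise_at_snr(g, snr_db=0.0)
```

Sweeping `snr_db` from −10 to 10 reproduces the noise levels used in Figures 11-20.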
While the clean signal in Figure 10 has higher resolution than is present in Figure 9, the increased resolution is forfeited when noise is increased. This is a result of the Hilbert transform, which is essentially a convolution of the time series x(t) with the inverse of time (1/t); it emphasises minute changes in the time series and as such is very sensitive to inaccurate resolution of the IMFs. This is a shortcoming of EMD compared to non-explicit-decomposition time-frequency analysis techniques such as STFT and MWT.
Having observed the structure of the time series in Figure 8 and the frequency structure of this signal in Figure 9 and Figure 10, one can note that a high number of uniformly spaced knots is wasted on the low-frequency content at the beginning of the signal. It is clear that 384 uniformly spaced knots are insufficient to capture the entire content of the signal, while 768 uniformly spaced knots are most certainly sufficient. Logically, somewhere in this range there exists an optimal number of knots. Using the Serial Bisection method introduced in [49], one can optimally capture the frequency content of the time series using 549 knots. In Figure 21 one can observe the knots initially being spaced further apart, but as the frequency content of the signal increases, the knots become more closely spaced until the spacing between individual knots approaches zero.
To clarify, the left image in Figure 21 is a plot of the distance between successive knots versus the position of the knots. This demonstrates that as the frequency content of the time series increases, the distance between successive knots decreases to be able to capture the high frequency structure. The position of the individual knots is on the x-axis, with the distance between that specific knot and the next knot being on the y-axis -the steep declines indicate an increase in the frequency content via the error bound of the knot optimisation algorithm.
In Figure 21 one can observe that the output of EMD applied using the new optimised knot point allocation method still reveals all of the frequency content. The use of an error bound in the algorithm allows the knot allocation to dynamically adapt to the local frequency structures of the signals. This benefits EMD by allowing more intricate structures with increased localisation to be discerned from the analysis. Should the Serial Bisection method be substituted for a simplified version that merely bisects the range until an error bound is met, the knot distance versus location plot in Figure 21 would resemble the left panel, except that the gradients of the perceived slopes would be smaller.
In Figure 22 the calculated excess kurtosis is plotted for the corresponding 3840 generated GARCH(1, 1) noise points for varied values of α and β for comparison with the theoretical kurtosis in Figure 7.
With α = 0.25, β = 0.25, and ω = 0.1 in Equation (97) one generates the GARCH(1, 1) noise of the appropriate length and resulting from standard normal random variables and adds this noise to the time series to arrive at the GARCH(1, 1) infused chirps in Figure 23.
When the STFT and MWT are applied to the time series in Figure 23, one obtains Figure 24. In Figure 24 the GARCH(1, 1) noise is shown to disturb the analysis by adding high-frequency content to the low-frequency structure, while the high-frequency structure is relatively undisturbed. The smoothing of the time series results in more energy at lower frequencies, which causes the energy concentration seen in Figure 24.
When EMD is applied to the time series in Figure 23, one obtains the resulting IF in Figure 25. The analysis from STFT and MWT is reversed for EMD: the low-frequency structure is isolated (with significantly more noise than in the noiseless example), with the frequency content of the high-frequency structure being more obscured than when using STFT and MWT. Some low-frequency structures (almost transient) are also erroneously isolated, with their energy concentrated at the lower left corners.

2) EMD DECOMPOSITIONS OF NON-STATIONARY GAUSSIAN PROCESS SAMPLES
In this section, we consider the data generating mechanism to be a Gaussian process with a time series kernel (covariance function) characterisation. We will utilise a Compound Exponentiated Sine kernel to generate observations for the time series: this type of kernel has an interesting structure that includes a decaying autocorrelation along with a cyclical sub-structure.
In Figure 26 below, one can see the plots for the two separate kernels before they are added and renormalised, with a cross-section of the kernel indicating the correlation versus lag. The functional form can be seen in Equation (111), with l1 = l2 = 1, p1 = 2π, and p2 = 2π/5. The structure of the correlation seen in Figure 26 and Equation (111) indicates there are no negative correlations; as a result of this ubiquitous positive autocorrelation structure, spurious higher-frequency structures will be created that unfortunately make the resolution of the true lower-frequency structures more difficult.
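Realisations of this kind can be reproduced in spirit with the following sketch. It assumes the standard exponentiated-sine-squared (periodic) kernel form k(τ) = exp(−2 sin²(πτ/p)/l²) for each summand; the exact functional form of Equation (111) may differ, and the decaying-autocorrelation part of the compound kernel is omitted here for brevity.

```python
import numpy as np

def periodic_kernel(tau, l, p):
    # Standard exponentiated-sine-squared kernel (assumed form).
    return np.exp(-2.0 * np.sin(np.pi * tau / p) ** 2 / l ** 2)

t = np.linspace(0.0, 20.0, 300)
tau = np.abs(t[:, None] - t[None, :])

# Add the two kernels and renormalise so the diagonal is exactly one,
# using l1 = l2 = 1, p1 = 2*pi, and p2 = 2*pi/5 as in the text.
K = 0.5 * (periodic_kernel(tau, 1.0, 2 * np.pi)
           + periodic_kernel(tau, 1.0, 2 * np.pi / 5))

# Draw ten realisations via a jittered Cholesky factor.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(t)))
samples = L @ rng.standard_normal((len(t), 10))
```

Note that every entry of K is strictly positive, matching the observation above that this kernel admits no negative correlations.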
In Figure 27 one can see these higher-frequency structures present in some of the realisations, as can be expected from Figure 26 and Equation (111). Across these ten realisations, the underlying frequency of the generating process is obscured, and any method that can shed light on the underlying structure will be valued.
In Figure 28 below, one can discern the structures present at ω = 1 and ω = 5. The higher-order super-harmonics can also be seen. The window width is set to 512 time points to capture the low-frequency structures while still maintaining some temporal resolution. Not only is the resolution rather poor with these methods, but they also do not infer or isolate structures from which further modelling can be performed.
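The trade-off mediated by the 512-point window can be seen with a short STFT sketch using scipy. The sampling rate and signal below are illustrative assumptions, with tones at ω = 1 and ω = 5, i.e. f = 1/(2π) and f = 5/(2π) cycles per unit time.

```python
import numpy as np
from scipy.signal import stft

fs = 10.0                                  # samples per unit time (assumed)
t = np.arange(4096) / fs
x = np.sin(1.0 * t) + np.sin(5.0 * t)      # structures at omega = 1 and omega = 5

# A 512-point window gives a frequency resolution of fs/512 ~ 0.02 cycles
# per unit time, enough to separate f1 = 1/(2*pi) ~ 0.159 from
# f2 = 5/(2*pi) ~ 0.796, at the cost of coarse temporal resolution.
f, _, Zxx = stft(x, fs=fs, nperseg=512)
power = np.mean(np.abs(Zxx) ** 2, axis=1)  # time-averaged spectrum

lo = (f > 0.05) & (f < 0.4)
hi = (f > 0.5) & (f < 1.0)
f1_hat = f[lo][np.argmax(power[lo])]
f2_hat = f[hi][np.argmax(power[hi])]
```

Halving the window to 256 points would double the frequency bin width, blurring the two tones together when they are close; this is precisely the resolution limitation that motivates the EMD-based alternatives below.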
Two methods are used below, both augmented and improved by EMD. The first technique isolates the three highest frequency structures using EMD before they are sorted and grouped by the MDLP algorithm, while the second technique uses ICA to extract a single optimised component from which structures may then be isolated and interpreted using EMD.
In Figure 29 below, the first three IMFs isolated from each Gaussian Process realisation are sorted using the MDLP algorithm, with the cut-points between categories 0 and 1 and between categories 1 and 2 shown in Figure 29; the median values are then used to categorise the IMF structures. The highest frequency structure is not shown, and its removal works well to discard noise in the system, present as either true noise or higher-frequency super-harmonics that are easily seen in Figure 28 as frequency structures at 10 and 15.
The means are calculated, as well as the 95% confidence bounds, using Monte Carlo methods and the Central Limit Theorem. The effect of the edges is seen in the mean of Structure 2; the inability to separate structures there is a direct result of the need for an efficient edge-fitting technique. The symmetric technique, in which the last extrema are reflected, is used in the envelope fitting process in this paper. See [41], [42], [43], and [44] for direct studies on the effect of using the symmetric edge technique.
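A minimal sketch of the symmetric edge idea follows: the signal is mirrored about its end points before the spline envelope is fitted, so that the spline has extrema to anchor to on both sides of each boundary. The pad width and helper name are illustrative; the cited works reflect the last extrema themselves, which mirror padding approximates.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import CubicSpline

def upper_envelope_symmetric(x, w):
    """Mirror w samples about each end, fit a cubic spline through the
    maxima of the padded signal, and return the interior envelope."""
    xp = np.r_[x[w:0:-1], x, x[-2:-w - 2:-1]]   # symmetric reflection
    peaks, _ = find_peaks(xp)
    env = CubicSpline(peaks, xp[peaks])(np.arange(len(xp)))
    return env[w:w + len(x)]                    # crop back to the interior

# For a pure cosine the upper envelope should be flat at one; without the
# reflection, the spline would have no knots beyond the boundary maxima
# and would diverge at the edges.
t = np.linspace(0.0, 6 * np.pi, 1200)
x = np.cos(t)
env = upper_envelope_symmetric(x, w=200)
```

The same padding is applied to −x to obtain the lower envelope, and the mean of the two envelopes drives the sifting iteration.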
In Figure 30 a single component is extracted using the FastICA algorithm from [50]. EMD is then performed on this structure and the resulting Hilbert spectrum is displayed below. The noise and higher-frequency super-harmonics are optimised away, and the resulting IMFs indicate structures present at ω = 1 and ω = 5, or at f = 1/(2π) and f = 5/(2π) respectively, where the energy densities are most concentrated.
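The one-unit fixed-point iteration that underlies FastICA is compact enough to sketch in numpy. This is not the implementation of [50]; the whitening step and the tanh contrast function are the textbook choices, and the two-source mixture below is purely illustrative.

```python
import numpy as np

def whiten(X):
    """Whiten rows-as-signals data X (m, n) to identity covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    C = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(C)
    return (E @ np.diag(d ** -0.5) @ E.T) @ Xc

def fastica_one_unit(X, n_iter=200, seed=0):
    """Extract a single independent component via the tanh fixed point."""
    Z = whiten(X)
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        u = w @ Z
        g, gp = np.tanh(u), 1 - np.tanh(u) ** 2
        w_new = (Z * g).mean(axis=1) - gp.mean() * w
        w = w_new / np.linalg.norm(w_new)
    return w @ Z

# Two non-Gaussian sources, linearly mixed.
t = np.linspace(0, 8 * np.pi, 5000)
rng = np.random.default_rng(1)
s1 = np.sign(np.sin(2 * t))               # square-wave source
s2 = rng.uniform(-1.0, 1.0, 5000)         # uniform-noise source
X = np.array([[0.7, 0.3], [0.4, 0.6]]) @ np.vstack([s1, s2])
y = fastica_one_unit(X)
```

The extracted component matches one of the two sources up to sign and scale, which is the identifiability guarantee ICA provides; EMD is then applied to the extracted component, as above.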
In Figure 31 below, the kernels seen in the top row of Figure 26 are compared against the averaged and floored correlation matrix of the corresponding MDLP components and compared against the floored correlation matrix of the corresponding IMFs extracted from the ICA component. The absolute values of the differences are shown below with minimal deviance from the original compound generating kernel seen in the bottom row of Figure 26.

B. REAL DATA TIME SERIES CASE STUDIES WITH EMD
Having demonstrated EMD on synthetic data in Section VII-A, this section focuses on real-world data and the ability of EMD, augmented with X11 and ICA, to isolate meaningful structures. In Section VII-B1, the Carbon Dioxide concentration in the atmosphere, which demonstrates an easily visible annual cycle as well as a transient trend, is shown to confuse the Short-Time Fourier Transform and Morlet Wavelet Transform. EMD far exceeds both of these methods, and it is also shown to significantly exceed X11. Most significantly, EMD and X11 combined are shown to improve the frequency resolution over either one performed individually.
Finally, in Section VII-B2, one of the amalgamated techniques demonstrated in Section VII-A2 is used on a group of globally distributed indices to extract implicit market factors that may then be used in portfolio construction, in a model such as that in [110], where instead of measuring the return of an asset relative only to the return on the market, one could generalise this to measuring the asset's return relative to these different factors. These factors may also be used as implicit market factor substitutions in the models proposed by [52] and [53].

1) NON-STATIONARITY
The effectiveness of EMD over STFT and MWT is shown here using a well-known non-stationary data set from [111]. Some readers will have seen a version of this graph, but most, if not all, will have heard of the increasing trend of Carbon Dioxide concentration in parts per million in our atmosphere, which is shown in Figure 32. A subsection of this data set is also used in [22] to demonstrate the effectiveness of STL. The annual seasonal component is visible in addition to an upward (possibly exponential) trend.
While seemingly simple, the data set is transient. By directly applying STFT and Morlet wavelets to the data one arrives at Figure 33 below. Despite the visible seasonal component and trend component, the traditional time series techniques struggle: the coefficients are, unfortunately, dominated by the low-frequency structures in an attempt to replicate the transient nature of the signal.
In Figure 34, STFT and MWT are applied to the detrended mean monthly concentration of Carbon Dioxide. The trend removed from the Carbon Dioxide concentration is estimated as the remainder in Equation (40) from the initial application of EMD. It is this application that may be viewed as the stepping stone towards EFD and EWT. Despite the trend being removed, the physical limitation of the technique, illustrated in Figure 5, is still clearly demonstrated in Figure 34. By comparing Figure 34 against Figure 36 one can note that, even after the removal of the trend, EMD is more accurate and the resolution is finer; by applying EMD-X11, it becomes apparent that detrended or preprocessed STFT or MWT (EWT and EFD) cannot reach the accuracy and robustness of EMD and EMD-X11.

The sifting portion of the EMD algorithm performed on the Carbon Dioxide concentration in the atmosphere produces the IMFs in Figure 35. EMD has no difficulty in separating these two components, despite having no a priori knowledge about the annual cycle. This data set may be used to demonstrate the strength of X11, from [6], and STL, from [22], but these time series decomposition techniques rely heavily on a priori knowledge about monthly or quarterly cycles, which most real-world data sets would lack, or would contain alongside numerous other higher-frequency cycles that only serve to corrupt their analysis. Even in this simple case, where these techniques are appropriate, EMD will be shown to exceed X11.

The Hilbert spectrum of the IMF extracted using the sifting procedure displays the annual cycle of Carbon Dioxide concentration in the atmosphere on the left of Figure 36. The Hilbert spectrum is capable of resolving finer structures. In [1] further analysis is suggested by differentiating the trend consisting only of inflection points, performing EMD to extract further components, and integrating the results. This may be performed easily, should a finer study of the trend component be required. On the right of Figure 36, X11 is used to smooth and increase the accuracy of the frequency resolution of the inferred structure of the seasonal component.
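The sifting iteration itself is compact. A stripped-down sketch follows, with a fixed number of sifts and no stopping criterion or edge treatment, so only the interior of the first IMF candidate is reliable; the tone frequencies in the demonstration are illustrative stand-ins for a fast annual-like cycle riding on a slow one.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.interpolate import CubicSpline

def sift(x, n_sifts=8):
    """Extract a first IMF candidate by repeatedly subtracting the mean
    of the cubic-spline envelopes through the maxima and minima."""
    h = x.astype(float).copy()
    idx = np.arange(len(x))
    for _ in range(n_sifts):
        pmax, _ = find_peaks(h)
        pmin, _ = find_peaks(-h)
        if len(pmax) < 4 or len(pmin) < 4:
            break                      # too few extrema to fit envelopes
        upper = CubicSpline(pmax, h[pmax])(idx)
        lower = CubicSpline(pmin, h[pmin])(idx)
        h = h - 0.5 * (upper + lower)  # remove the local mean
    return h

t = np.linspace(0.0, 2 * np.pi, 4000)
x = np.sin(20 * t) + np.sin(2 * t)     # fast cycle riding on a slow cycle
imf1 = sift(x)
residual = x - imf1                    # carries the slow cycle and trend
```

Sifting the residual again yields the next IMF, and so on until only the tendency remains; production implementations add the stopping criteria and edge handling discussed earlier.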
The X11 method, introduced in [6], is effective at isolating and extracting a seasonal component of a time series. This method makes use of work done in [13], [14], and [15] by using the weightings derived therein to smooth away the trend component of the time series. The work done in [18], [19], and [20] rederives this weighting in the Reproducing Kernel Hilbert Space setting, with the added advantage of being easily translatable into an asymmetric weighting at the edges, where a full symmetric weighting is no longer possible. X11, when used to augment EMD, acts as a smoother, or graduator, of the seasonal component to which it is appropriately applied, as it takes advantage of the known monthly frequency of the observations and the known annual cycle present. It has greatly increased the frequency resolution, while also refining the location of the increase in amplitude. The crux of this section is that EMD assists in stabilising non-stationarity for other methods that are less well adapted to transient time series. Components preprocessed with EMD become more locally stationary for easier analysis using X11 and other such algorithms.
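The trend-removal weighting at the heart of X11's first pass is a centred 2×12 moving average, which reproduces linear trends exactly and annihilates any stable monthly seasonal that sums to zero over the year. A sketch of the symmetric interior weights follows; the asymmetric edge weights rederived in [18], [19], and [20] are omitted.

```python
import numpy as np

# Centred 2x12 moving average: the average of two adjacent 12-term means,
# giving 13 symmetric weights that sum to one.
w = np.r_[1.0, np.full(11, 2.0), 1.0] / 24.0

t = np.arange(240)                                    # 20 years, monthly
trend = 300.0 + 0.1 * t                               # linear trend
seasonal = np.tile(np.sin(2 * np.pi * np.arange(12) / 12), 20)
x = trend + seasonal

# The interior trend estimate: the seasonal is annihilated exactly and
# the linear trend passes through unchanged.
est = np.convolve(x, w, mode="valid")
```

Subtracting this trend estimate from the series leaves the seasonal-plus-irregular component, which X11 then smooths with seasonal moving averages; the six lost points at each end are where the asymmetric edge weights take over.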
In the left image of Figure 37, the spread of the instantaneous frequencies is far greater than that seen in Figure 36, demonstrating the superiority of EMD's instantaneous frequency resolution even when there is a single seasonal/cyclical component. One would naturally wonder whether applying X11 and then extracting components from the single seasonal component using EMD would result in a finer resolution. Using the X11 method to extract the seasonal component, and then using EMD to iterate and remove unwanted noise and lower-amplitude components, results in the right image of Figure 37. It is clear that EMD significantly improves the frequency resolution of the X11 technique.
EMD can be used to significantly improve the results derived using X11 exclusively. It is also clear, by comparing Figure 36 and Figure 37, that the preferred order of application would be to decompose the original time series using EMD first, and then (if required) to use X11 as a smoothing and graduation technique that explicitly smooths the component. The X11 smoothing of an IMF may be necessary in certain instances, owing to the non-restrictive and non-constructive nature of the IMF definition leading to potentially jagged (yet accurate) structures.
In Figure 38 it is clear, both visually and numerically, that the trend estimated using EMD is more accurate than the trend estimated using SSD. This is a major shortcoming of the SSD technique: EMD removes frequency structures in descending order, whereas SSD first removes the estimated trend before removing the subsequent structures based on power. As a result, if the trend is inaccurately estimated in the SSD technique, this error perpetuates and the subsequent structures are disturbed. In Figure 39, it is clear that SSD components 3 and 4 contain the vast majority of the residual error from the inaccurately estimated trend. In addition, all of the annual structure is isolated in IMF 1, whereas the annual structure is spread over SSD components 1, 2, and 4. This is a major shortcoming for any further analysis, but the IF may still be estimated owing to the overlaying of the IFs of the various structures in the full Hilbert spectrum, as in Figure 44.
In Figures 40, 41, 42, and 43, the STFT of each of the SSD components is shown individually. Following the analysis of Figure 39, Figures 40, 41, and 43 all show the annual frequency of the components, which clearly indicates that these three SSD components have erroneously decomposed a single structure, while the component in Figure 43 also contains some trend content. This is also perpetuated in the Hilbert transform in Figure 44; this adds credence to the order in which EMD performs the decomposition.
The Hilbert spectrum of all four SSD components overlaid on one another can be seen in Figure 44. The annual structure is isolated more accurately than with EFD and EWT, and possibly more accurately than with EMD in Figure 36. This Hilbert spectrum, however, does show some errors. The trend, erroneously extracted at the outset, has caused a cyclical error structure to perpetuate (visible at the base of the Hilbert spectrum) and has caused the annual structure to be inaccurately decomposed. The relative intensity of the annual structure has also been erroneously estimated, as can be seen by comparing this with Figure 36.

2) MARKET FACTOR PORTFOLIO CONSTRUCTION
There have been few financial applications of EMD; for some examples see [82] and [112]. In this section, EMD-ICA (as explained in Section V-B) will be used to extract estimates of implicit global market factors. Using these implicit market factors and the work done in [110], [113], and [114], one can create a portfolio minimising risk while maximising return relative to each of these implicit factors, in contrast to the work in [110], [113], and [114], where only one factor is used, namely the return above the standard market return. Alternatively, one could use these implicit market factors in conjunction with the work done in [52] and [53] to model the return of the desired asset relative to these implicit market factors. The market factors extracted will be very industry- and region-specific, with obvious, strongly explainable links between certain regions and sectors.
In this section, ten globally distributed indices, or exchange traded funds (ETFs) designed to replicate the desired indices, consisting of the ASHT40 (South Africa), DJI (USA), FCHI (France), GDAXI (Germany), HSI (China), IBX50 (Brazil), ISF.L (UK), N225 (Japan), NSEI (India), and STW.AX (Australia), will be used to infer implicit global market factors. The sample for these ten indices spans all the trading days from 22 February 2016 until 19 February 2021, with the missing trading days in different regions being linearly interpolated. These data were accessed from [115].
Each of the ten indices is decomposed using EMD into at least 5 IMFs. Each of the 10 indices, as well as the correlations between each set of corresponding IMFs, is plotted in Figure 45. For clarity, the correlation plotted in the upper centre image of Figure 45 is the correlation structure that exists between the first IMFs extracted from each of the ten indices. The upper right image is the correlation structure present between the second IMFs extracted from each of the indices, etc. This positive correlation structure present between each set of IMFs gives credence to there being some underlying frequency-band-specific market factor driving each of the IMFs; that is, all of the first IMFs extracted from each index are driven by the same frequency factor, etc. The actual implicit factors extracted by applying ICA to each group of ten IMFs can be seen in Figure 51, as well as their respective instantaneous frequencies.
In the following figures, one of the first attempts to relate these implicit market factors to more-established financially engineered factors, such as those used in the construction of principal portfolios, is presented. A notable exception is the supplementary material of [72] (also found in [116]), where the fluctuations in the prices of popularly traded Carbon ETFs are directly related (through frequency bandwidths) to the annual Carbon Dioxide concentration in the atmosphere through EMD.
Also, following on from this work, [117] uses and groups EMD components in narrow frequency bandwidths to forecast covariance in a lagged regularised covariance regression framework in a study of horizon-specific portfolio optimisation using risk premia parity weighting strategies. See [118] for further examples of this regularised covariance regression forecasting using implicit factors such as EMD.
Before the results are interpreted below, the frequency-based hierarchy of the EMD algorithm should be restated: though not explicit in the non-constructive formulation of the algorithm, the highest frequency component is isolated first, and the procedure continues in a descending, robust, frequency-bandwidth-specific ordering before the final trend is isolated. This has been shown, through this work, to be relatable to the PCA components isolated from the same data, despite the PCA components being isolated in an eigenvalue-based decomposition algorithm where components are orthogonal in higher-dimensional space and capture the most variance in the system whilst remaining orthogonal.

In Figure 46 the standardised (between 0 and 1) first principal component and the standardised fifth EMD-ICA component are plotted with the standardised DJI (Dow Jones Industrial) alongside the corresponding fifth IMF extracted from the DJI. It was an a priori assumption that this dominant factor (out of the ten described) would be the largest proportionally loaded factor and the dominant source of variance, owing to the dominance of the USA stock market. The similarities between the structures are noteworthy, and this EMD-ICA factor can be interpreted (alongside corresponding PCA components) as an implicit market factor. This is consistent with assumptions of a dominant factor in the market in non-stationary eigenvalue-based decomposition and extraction techniques.
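The PCA side of this comparison amounts to extracting the first principal component of the panel of index returns, which can be sketched via an SVD. The panel below is synthetic and driven by one common factor, purely to illustrate the extraction and the [0, 1] standardisation used in the plots.

```python
import numpy as np

def first_principal_component(R):
    """R: (n_obs, n_assets) panel; returns the first PC scores and the
    associated asset loadings."""
    Rc = R - R.mean(axis=0)
    U, S, Vt = np.linalg.svd(Rc, full_matrices=False)
    return U[:, 0] * S[0], Vt[0]

def standardise(x):
    # Rescale to [0, 1], as done for the plotted factors.
    return (x - x.min()) / (x.max() - x.min())

# Synthetic panel: ten indices loading on one common market factor.
rng = np.random.default_rng(0)
factor = rng.standard_normal(500)
betas = rng.uniform(0.5, 1.5, 10)
R = np.outer(factor, betas) + 0.1 * rng.standard_normal((500, 10))

pc, loadings = first_principal_component(R)
```

The first PC recovers the common factor up to sign and scale, and the loadings indicate how strongly each index participates, which is the quantity compared against the EMD-ICA factor loadings in the text.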
Owing to the USA Dollar ($) and Japanese Yen (¥) currency pair being the most traded on the international market (and the relative dominance of the Tokyo Stock Exchange globally), a second a priori assumption was that the second principal component and the fourth EMD-ICA component would load mostly on the N225 index. In Figure 47, the standardised second principal component and the fourth EMD-ICA component are plotted alongside the corresponding N225 index and fourth IMF, with the same non-stationary eigenvalue-based decomposition assumptions as before.
Finally, it is assumed that the larger indices of Western Europe, such as the FCHI (France), GDAXI (Germany), and ISF.L (UK), would be among the most highly loaded on the underlying third implicit market factor, which is plotted in Figure 48 in the form of the third PCA component and third EMD-ICA component. The corresponding plots for the fourth and fifth implicit market factors can be found in Figure 49 and Figure 50, respectively, for completeness and further interpretation and comparison with Figure 51.
Taking note of the strongly positive correlation structure present between the corresponding IMFs extracted from ten globally distributed indices, seen in Figure 45, one can estimate implicit global market factors that underlie these highly positively correlated IMFs. Using these market factors and the work presented in [52] and [53], one can meaningfully estimate the expected return of the desired asset above the risk-free rate in an extension of the multi-factor model to implicit, unobservable, but possibly explainable, structures. The resulting model is

r_j − r_f = α_j + β_j (r_M − r_f) + Σ_{i=1}^{k} γ_(j,Fi)(t) r_Fi, (112)

where r_j is the return of the desired asset, r_f is the risk-free return, α_j, β_j, and γ_(j,Fi)(t) with i ∈ {1, . . . , k} are optimisable parameters, r_M is the market return, r_Fi is implicit market factor i, and k is the desired number of market factors. Plots of the derived implicit market factors, as well as their associated instantaneous frequencies, can be seen in Figure 51.
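Treating the loadings γ_(j,Fi)(t) as constants for the sake of a sketch (the time-varying specification is not detailed here), the model's parameters can be estimated by ordinary least squares on synthetic data; the returns, rates, and coefficient values below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 5
r_f = 0.0001                              # risk-free rate (assumed)
r_M = 0.01 * rng.standard_normal(n)       # market return
F = 0.01 * rng.standard_normal((n, k))    # k implicit market factors

alpha, beta = 0.0005, 1.2
gamma = np.array([0.3, -0.2, 0.1, 0.05, -0.1])

# Asset excess return generated exactly by the factor model.
y = alpha + beta * (r_M - r_f) + F @ gamma

# Design matrix: intercept, market excess return, and the k factors.
X = np.column_stack([np.ones(n), r_M - r_f, F])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With noise-free returns the least-squares estimate recovers α_j, β_j, and the γ loadings exactly; with real data, the fitted loadings quantify each asset's exposure to the implicit factors.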
In the top image of Figure 52 the prevailing strong negative correlation between the highest frequency IMF of each of the individual indices and the highest frequency EMD-ICA component can be seen. These correlation structures persist as the frequency of the structures under observation decreases, progressing down Figure 52. This lends credence to the conjectured extended CAPM-like model for financial security returns in Equation (112).

VIII. CONCLUSION
This tutorial paper seeks to demonstrate the power of the class of non-linear and non-stationary basis decomposition methods for time series data known as Empirical Mode Decomposition (EMD). Since its introduction, many aspects of this class of methods have remained under-explored from a statistical perspective, and its uptake in the time series and statistical literature is not evident. We therefore seek to bring a statistical perspective to, and tutorial on, key aspects of this class of methods in this manuscript. In addition, we contrast and demonstrate how these methods can be combined with other basis decomposition or feature extraction methods to achieve numerous desirable goals. Furthermore, we demonstrate the properties of the EMD approach versus classical time-frequency methods to show how EMD can benefit the statistical analysis of complex time series signals.
In Section VII-A1 and Section VII-B1 the results demonstrated the superior time-frequency resolving power of EMD over STFT and MWT. In Section VII-B1 the underlying concept is demonstrated most clearly, while it is also presented in Section VII-A1. The transient nature of the Carbon Dioxide concentration data causes the first few coefficients to dominate the analysis for both STFT and MWT: the first few trigonometric waves and wavelet packets, respectively, are summed appropriately to create a transient structure, as can be seen in Figure 33. This is, unfortunately, unavoidable without some form of pre-processing of the data. EMD exceeds these techniques by first isolating the structures and then processing the resulting structures using the Hilbert spectrum, which reveals the annual structure in Figure 36. One could apply STFT and MWT techniques to the processed data, but one would still be plagued by the same resolution issues discussed in Section VI. The (linearly) modulated chirp signals present in Section VII-A1 from [106] are resolved in Figure 9, but far superior time-frequency resolution is achieved using EMD, as seen in Figure 10. EMD, in addition to increased resolution and control, is also able to almost perfectly isolate the two underlying chirps for other potential extensions.
In Section VII-A1, in addition to demonstrating the superiority of EMD over STFT and MWT, the Serial Bisection method for knot point location optimisation was shown to significantly reduce the number of knots required to still adequately capture the highest frequency structures as can be seen in Figure 21. In Section VII-B1, after having demonstrated the superior performance of EMD over STFT and MWT as may be seen by comparing Figure 33 and Figure 36, EMD is shown to also exceed the time-frequency resolution of X11 which can be seen by comparing Figure 36 and Figure 37. Most interestingly, one can see that the output of EMD is significantly improved by augmenting EMD with subsequent X11 filtering leading to an even finer time and frequency resolution as can be seen in Figure 36. EMD was also shown to improve the results of X11 when the seasonal component extracted using X11 was further decomposed using EMD and can be seen in Figure 37.
In Section VII-A2 a compound exponentiated sine squared kernel is used to generate realisations of a process that demonstrates cyclical properties. These realisations are then used to infer valuable frequency information about the underlying generating process. This kernel was chosen for its financial applications, as numerous financial processes display cyclical structures. The first analysis technique utilises EMD to sift through the Gaussian Process realisations. The three highest frequency structures are isolated; this accommodates noise in the system, extracting it as the highest frequency IMFs, while the next two IMFs capture the Gaussian Process correlation embedded in the kernel. These IMFs are then grouped according to their instantaneous frequencies using the MDLP from [51]. Once the structures are isolated and the number of each structure is known, Monte Carlo methods and the Central Limit Theorem are used to impose confidence bounds on the location of the frequency. One can see in Figure 29 that the structures are very accurately discerned with reasonable confidence bounds, but still with significant edge effects. The second analysis technique uses the FastICA algorithm proposed in [50] to isolate a single optimised independent component. The component still has noise, but most of it has been optimised out of the structure, owing to the predominant frequency structures present in all the realisations; this is even more visible in Figure 30. The noise is still present as a higher frequency structure, but it has less power than the ω = 1 and ω = 5 structures.
In Section VII-B2 the financial applications of EMD are thoroughly explored. The structures present in the IMFs isolated from globally distributed financial data have shown a surprisingly high positive correlation, implying the presence of some common underlying process. The underlying generating processes are extracted using EMD and ICA from the grouped and sorted IMFs. The structures extracted are referred to as implicit market factors and may be used in a portfolio setting mirroring the work done in [110], [113], and [114], or they may be used in a market factor setting where the explicit market factors used in [52] and [53] are substituted for implicit market factors.
EMD has clearly shown its value when dealing with more challenging features present in time series. The sifting process allows meaningful and informative Hilbert spectra to be discerned from traditionally difficult structures. Additionally, EMD is not plagued by the same resolution limitations as STFT and MWT. In the stochastic setting, EMD used in conjunction with other methods yields very promising information about the underlying generating process. With these results, one can look forward to further extensions, augmentations, and applications of the very promising EMD technique.