ξ - π : a nonparametric model for neural power spectra decomposition

—The power spectra estimated from the iEEG, EEG, MEG recordings, and spike trains are the mixed representation of aperiodic transient activity and periodic oscillations, i.e., aperiodic component (AC) and periodic component (PC). Quantitative neurophysiology requires precise decomposition preceding parameterizing each component. However, the shape, statistical distribution, scale, and mixing mechanism of AC and PCs are unclear, challenging the effectiveness of current popular parametric models such as FOOOF, IRASA, BOSC, etc. Here, ξ - π was proposed to decompose the neural spectra by embedding the nonparametric spectra estimation with penalized Whittle likelihood and the shape language modeling into the expectation maximization framework. ξ - π was validated on the synthesized spectra with loss statistics and on the real spectra from the sleep EEG and the large sample iEEG with evaluation metrics and neurophysiological evidence. Compared to FOOOF, both the simulation presenting shape irregularities and the batch simulation with multiple isolated peaks indicated that ξ - π improved the fit of AC and PCs with less loss and higher F1-score in recognizing the centering frequencies and the number of peaks; the sleep EEG revealed that ξ - π produced more distinguishable AC exponents and improved the sleep state classification accuracy; the iEEG showed that ξ - π approached the clinical findings in peak discovery. Overall, ξ - π offered good performance in the spectra decomposition, which allows flexible parameterization using descriptive statistics or kernel functions. ξ - π may be a promising tool for brain signal decoding in the fields such as cognitive neuroscience, brain-computer interface, neurofeedback, and brain diseases.


I. INTRODUCTION
N EURAL oscillations shape the neuron extracellular field potentials, transmitting the electrical activities between neural masses.Physically, neural oscillation is repetitive or periodic activities centering frequencies.The synchronous oscillations by millions of neurons accumulate the postsynaptic vs. vs.potentials together into the macroscopically detectable activities by the local field potentials (LFP), intracranial Electroencephalography (iEEG), EEG, and Magnetoencephalography (MEG) recordings [1].The studies of neural oscillations can shed light on the nature of the cognitive process and the origin of brain dysfunction [2]- [5].In addition to neural oscillations, attention has recently returned to aperiodic and transient neural activities which are ubiquitously produced by the neural assemblies.The neural underpinnings and interpretation of aperiodic activity and periodic oscillation are different, but the two types of activities constantly mix in brain recording producing the typical shapes of neural power spectra.The 1/flike background spectra are attributed to the aperiodic activity, while the spectral peaks are reflections of periodic oscillations.Studying neural oscillation has to quantitatively characterize an existing spectral peak's center frequency (CF), bandwidth, and amplitude [6].The mixing phenomenon -that the spectral peaks overlay the 1/f-like background spectra and their mutual interference hinders the study of the aperiodic component (AC) and the periodic component (PC).
Several models have been proposed for EEG/MEG spectral decomposition or parameterization, as shown in Table I.They can be distinguished by the scale in which the spectra are fit: natural or logarithmic.ξ-α model adopts the student T curve function to fit the AC and PCs in the natural scale, centering attention on a sole alpha peak [7].BOSC (Better OSCillation detection) detects the PCs based on a chi-squared distribution and linearly fits the AC in the log-log space assuming a 1/f form [8]. IRASA (Irregular Resampling Auto Spectral Analysis) can estimate AC according to the different robustness of fractal and periodic activities to resampling but cannot isolate the peaks [9].As a recent and popular model, FOOOF (Fitting Oscillations & One Over f) employs the Gaussian and the power law functions to fit the PCs and AC in the log scale, respectively [10].Further, adapted from FOOOF, the SPRiNT (Spectral Parameterization Resolved in Time) allows for the time-resolved spectral decomposition [11], and the PAPTO (Periodic/Aperiodic Parametrization of Transient Oscillations) considers the spectral peak as arising from transient events [12].These models may be limited to incomplete decomposition, parametric fitting with hard kernels, or log scale transformation.
Although the shape, statistical distribution, scale, and the mixing mechanism of background and periodic spectral components are not yet clearly understood, opinions can be drawn from practices.Firstly, as shown in Fig. 1, the shape of iEEG spectra may be of greater diversity and present more peaks than the scalp EEG/MEG spectra.The shapes of AC and PC may not strictly follow the power law function and symmetry bell shape, respectively.The left and right petals of a spectral peak may be convex to CF.Two neighboring peaks easily overlap if their CF are close, where the parametric models may mistakenly fit the two as one peak.Prior studies found that the AC may be either scale-free activity or pink noise, complying with the power law distribution [13].Whereas the physical nature of 1/f like AC is not yet determined, which may be modeled from the 1-order autoregressive process, the point process, the sum of Lorentzians, or the avalanches in self-organized criticality [14]- [16].Thus, the shape of AC and PC may be irregular, with the exact statistical distribution remaining unknown.Secondly, the existing models are inconsistent in scale transformation.ξ-α parametrizes the spectra in the linear power vs. frequency space, BOSC detects the AC in the log-power vs. log-frequency space, IRASA estimates the fractal component of power spectra in the linear power vs. frequency space but fits the power law in the log-power vs. log-frequency space, while FOOOF, SPRiNT, and PAPTO decompose the spectra in the log-power vs. frequency space.Inconsistencies of scale transformation imply two fundamentally different mixing mechanisms of AC and PC: additive and multiplicative modulations.One should not apply log or other nonlinear transformations to the power and frequency before decomposition, avoiding under-or overestimating the parameters [17], [18].Overall, any bias to one component introduced in parametric decomposition and nonlinear transformation will contaminate the fit of other components.Misfit spectral components will derive biased neural oscillation representative parameters.This is critical to practical inferences and theoretical interpretations in quantitative neurophysiology.
To address the known issues, we designed a nonparametric decomposition model following the natural scale and additive mechanism, allowing for parameterization as a subsequent step.Since nonparametric models do not need to predetermine a function but learn from data, they help fit the natural shape of spectral curves.The natural shape characteristics of AC and PC are monotonicity and unimodality.Several models may suit this problem, such as the shape regression, the generalized additive models for location, scale, and shape, the unimodal smoothing, and the shape-constrained additive models [19]- [22].Besides, the unmixing step isolates the unknown components from the power spectra.From the view of Bayesian statistics, this step estimates the latent population variances from the observed sample variances, which may be properly resolved in the expectation-maximization framework.
The structure of this paper was organized as follows: the "Methods" introduced the mathematical basis, the algorithm implementation, the simulation, and the validation with neurophysiology evidence; the "Results" described the model effect; the rest of the paper was a detailed discussion with a brief conclusion.

A. Mathematical basis of ξ-π
1) Decompositon: The neural power spectral analysis commonly takes the smoothed periodogram or the multitapers method by segmenting the recordings into quasi-stationary epoches.The epoches may be called segments, windows, or tapers elsewhere.Assuming the FFT size equals N , the sampling rate is f s, the frequency resolution is ∆f = f s/N .Note that the complex normal distribution of Fourier coefficients is asymptotically independent across frequencies due to the stochastic properties of finite Fourier transforms [23], [24].Thus for the segment s = 1, • • • , n s and the frequency where 1 n ∈ R n×1 is a vector of ones, the Fourier coefficients corresponding to multiple brain processes follow a multivariate , the model error is assumed with the constant variance across frequencies, and the diagonals of matrix Estimating the population variance of Eq. 1 derives the mix of variance components, i.e. the mixed power spectrum where σ f,k ∈ R 1 is the power spectrum of the component k at the frequency f .The observed power spectrum p f is the sample variance over finite segments, which approximates to c f The spectral decomposition is to estimate the variance σ f,k of the latent variables b f,s from p f .This is solved by the expectation-maximization (EM) algorithm.The E step and the M step take the effect of 're-decompose' and 'individual fit' accordingly as shown in Fig. 2. The initial conditions of the EM algorithm are where the InitialDecompostion(•) is shown in 2(c), and the ε 1 , ε 2 , ..., ε n f = 0.01.
In the E step, given i the number of iterations, the parameter θ ε ] T and the pseudo mixed power spectrum c (i) f produced in the M step, taking the minimum norm least square solution to estimate the latent variables b f,s which shows that the Fourier coefficient decomposition behaves as the Wiener filtering.
In addition, the pseudo power spectra of each component and model error are And by means of the variance components model, the complete negative log-likelihood is In the M step, given the pseudo power spectr d The solution to this convex optimization problem is the estimation of spectra component σ .Note that the two parts of spectra component and error in this Q function follow an identical structure which is known as the Whittle likelihood [23]- [25].Theoretically, the Whittle likelihood holds statistical consistency to the spectra density estimation [26], [27].The Whittle likelihood [24] of the spectral component is This allows estimating multiple spectral components in a nonparametric way by incorporating the shape constraint, which is detailed in the section II-A.2.The spectra decomposition error for all frequencies is estimated as by taking a part of average reference [28].
2) Nonparametric fitting: The estimation of σ in the M step is transformed to solve the shape constraint Whittle likelihood.The shape regression, the generalized additive models for location, scale, and shape, the unimodal smoothing, and the shape-constrained additive models [19]- [22] may be applied.However, solving the penalized likelihood estimate with constraints for the multiple frequencies across the general broadband becomes time-consuming.One alternative approach is the shape language modeling (SLM) based on the cubic spline and shape priors [29], [30].The priors from the natural shape characteristics are monotonically decreasing for AC and rising first to the CF then falling for PC.The SLM lowers the uncontrollable flexibility of spline fitting by prescribing the shape priors into a set of shape primitives, performing better fit than parametric models and nonlinear regression in a shorter time.Here, the strategy behind SLM is in line with the penalized Whittle likelihood because both are Bayesian approaches.In the M step of spectra decomposition, the prescription is piecewise polynomial Hermite interpolation jointing with different knots.Specifically, for AC fit, the number of knots is set as n f /4, and the shape is set as monotonous decreasing; for PC fit, the number of knots is set as n f /3, and the shape is set as a simple peak.

B. Algorithm implementation
As shown in Fig. 2, the implementation of ξ-π has the following steps: (a) Input Spectra: Dividing the spectral values by their maximum to benefit the generalizability of parameters set for peak detection.The normalization removes the effect of absolute magnitude and judges the peaks by the scale-free shape properties.(b) PC detection: The PC detection is to isolate the prominent peaks forming the concave shapes, which needs to localize the peaks and the valleys in the spectral curve.The MATLAB 'findpeaks' function is efficient in detecting the peaks.The valleys are detected from the up-down inverted spectra, i.e., the negative of the spectra.The parameters were set to:'min peakwidth, 0.9', 'min peakheight, 0.05', and' min peakprominence, 0.025'.(c) Initial decomposition: Based on the leftmost maximum and the first valley right nearby the maximum, the student t-function was used to make an initial fit of AC.Then it sequentially took the next peak-valley pair to make an initial estimate of PCs.(d) Re-decompose: All initialized components were submitted to the EM algorithm, where the E step regenerates the pseudo power spectra of individual components (Eq.6 7 8).(e) Individual fit: The M step fits the individual component using the SLM.The (d-e) loop will be executed iteratively to minimize the incomplete negative log-likelihood.(f) End of fitting: The iteration ends when the EM algorithm iterates to a preset number of times or the incomplete negative log-likelihood is lower than the threshold.

C. Parameterization
It is necessary to parameterize the AC/PC so as to quantify brain rhythms and represent neural dynamics.ξ-π, as a nonparametric model, allowing flexible parameterization using user-preferred models on the actual needs.For the later model comparison and evaluation, the same function as FOOOF was adopted to quantify decomposed AC/PCs by ξ-π, that is )) P C (13) where the a 1 ,a 2 , and a 3 are the offset, the knee parameter, and the exponent for the AC, respectively.The a 2 is set as 0 if the AC has no knee.In addition, a PC is modeled by the Gaussian function in the Eq. 13 where p 1 ,p 2 and p 3 stand for the maximum power in log scale, the CF, and the half of the bandwidth.Note that the scale transformation is made for comparison.

D. Simulation
Prior findings show that AC commonly exists over the global brain regions but PC may merely exist over the local brain regions [31]- [33].The time series or its corresponding spectra of the aperiodic activity or the periodic oscillation with sole spectra peak can be simulated, providing an objective standard to assess how ξ-π and FOOOF behave.The general scheme is synthesizing the AC and the PCs separately and then summing over AC and PCs as the mixed power spectra.The PC simulation follows two different ways: (1)"PCresampling": pruning the band-limited periodic spectra with shape irregularities after the AC was estimated by IRASA and removed from the real spectra; (2)"PC-sine waves": estimated the spectra from the time series composed by a few sine waves that differ in the CFs for batch simulating the spectra with multiple isolated peaks.
1) AC: Either parametric or nonparametric models subjectively used to simulate AC will later show favor to the corresponding decomposition model in the validation process.A theoretically valid approach is learning the prototypes of AC from the real spectra by unsupervised learning.Thus, we empirically and randomly selected a channel from the no-peak set as the ground truth AC as shown in Fig. 3B.The no-peak set consisting of 297 channels was clustered from the open MNI iEEG database detailed in the section II-E.2 [34].
2) PC-Resampling: As shown in Fig. 3D, the simulation of special cases aims to validate the decomposition effects on the spectra that present a peculiar peak.Although it is common that the spectra peak does not strictly follow the symmetric bell shape, there is no straightforward way to simulate the peculiar peak by generating the time series first.Alternatively, the simulation was done by learning from the real iEEG activity, as shown in Fig. 3A.Firstly, IRASA was applied on the iEEG channels in the precentral gyrus to estimate the AC; secondly, the mixed PC was isolated by subtracting the AC from the original spectra; thirdly, the PC with shape irregularity was obtained by pruning the lower and the higher frequencies.Finally, the three special cases of PCs were simulated: (a) overlapping peak: the CFs of two peaks were close so that the left slope of the peak with higher CF rided on the right slope of the peak with lower CF, forming a high amplitude valley between the two peaks.(b) skewed peak: the left and the right slopes of a peak were not symmetric around its CF.The long tail on one side of the peak slopes made the peak skew toward the other side.(c) concave upward peak: the left and right slopes of a peak were concave upward, forming a sharp hat on the top.
3) PC-Sine waves: Large samples of spectra were generated through batch simulation for loss statistics and quantitative comparison, as shown in Fig. 3C.The number of peaks takes the range of 0-5.The CFs are required to distribute in different narrow bands and follow the priority of α(8-13Hz), θ(4-8Hz), β(13-30Hz), δ(1-4Hz), γ(30-42Hz).The priority of narrowband peak comes from the empirical probability of the peak occurrence in a specific band.In addition, two peaks are avoided from overlapping by setting the two CFs at least distant 3Hz.For instance, when simulating a PC with 2 peaks, we randomly set the CFs, one in the α band and the other one in the θ band.Besides, the amplitudes of sine waves increase with the CFs shifting from δ, θ to α and decrease with the CFs shifting from α, β to γ.Finally, the 100 samples by 60 seconds of time series were generated by the repetitive simulation. is expressed as: where p AC f is the simulated ground truth AC and σAC f is the decomposed/fitted AC.The MSE of decomposed PC takes the form: where the p P C f,κ is the simulated ground truth of PC, σf,κ is the decomposed/fitted PC, and κ is indexing the peaks.
Number of peaks.The set number of peaks in batch simulation is taken as the ground truth to check the decomposition effects in PC detection.The number of decomposed PCs was counted and compared with the simulated truth.The overestimation ratio (OER) and the underestimation ratio (UER) are defined as OER = n κ↑ /n bs , U ER = n κ↓ /n bs (16) where n bs denotes the total number of batch simulated samples, and n κ↑ (n κ↓ ) indicates the number of samples that detected more (less) peaks than the set.
Identification of CFs.To quantify the PC detection, the frequency bins of each simulated sample are labeled as 'CF' or 'no CF' based on whether the frequency was set as a CF or not.Thus, being the CFs or not across the frequencies and the repeated samples form the binary discrimination problem, allowing for the use of model evaluation metrics such as accuracy, precision, recall, and the F1-score.

E. Real data and analysis 1) Sleep state classification using CCSHS EEG:
The open Cleveland Children's Sleep and Health Study (CCSHS) [35], [36] database contains the polysomnography recordings from 515 teenagers during the subject was sleeping.Each subject underwent the recording of two electrodes, C3 and C4, lasting for 41220s.The data were segmented into epochs with 30s duration, inspected to manually remove artifacts, and annotated as Wake, NREM-1(N1), NREM-2(N2), NREM-3(N3), and REM states by trained technicians.Here, the N2 was used to represent the NREM state because of its overwhelming dominance compared with the N1 and N3, both of which are with short coverage and low occurrence [37].The CCSHS EEG was 0.5-50Hz bandpass filtered.
For each sleep state, the C3 and C4 epochs from randomly picked two subjects were concatenated into 10 mins recordings by ignoring the differences across electrodes and subjects.Since the whole sleep recording is long enough, we repeated this procedure sequentially in the time order without using the same epoch twice to create multiple data collections.Totally, the EEG recordings of 3 states by 10 mins by 19 collections were prepared for sleep state classification.Hence, we finally conducted 19 times of independent sleep state classification experiments.

Individual components AC fit PC fit
For the recording per state per collection, the time-varying spectra were estimated by applying a 30s sliding window with 50% overlapping, which yielded 39 windows.The time-varying AC parameters, exponent and offset, were obtained after parameterization.Therefore, we obtained 39 pairs of time-varying AC parameters per state per experiment.Later, the linear discriminant analysis (LDA) was performed to predict the sleep state based on the AC exponents [38].For cross-validation, we used the leave-one-out strategy which partitioned the 39 windows as a training set of 38 windows and a test set of 1 window and repeated 39 folds.Three LDA predictors were constructed for differentiating the states of Wake-N2, Wake-REM, and N2-REM, respectively.The distinguishability of AC exponents across states was further evaluated through prediction accuracies 2) Peak discovery using MNI iEEG: The open MNI iEEG dataset was collected from 106 epilepsy patients (33.1±10.8yrs.) over multi-sites and curated by the Montreal Neurological Institute and Hospital (MNI) [34].In addition to the epileptic lesion regions, the healthy brain regions of individual patients underwent the iEEG recordings as well for brain state monitoring and clinical evaluation.Ethical approval was granted at the MNI (REB vote: MUHC-15-950).Although the number of iEEG channels planted into the healthy brain regions was different for each patient, the accumulated 1772 channels from 106 patients broadly covered the whole brain in the MNI space, providing a spectral atlas of the human brain.The sampling rate was 1000Hz for the raw data but down-sampled to 200Hz.A 64s segment was selected from each channel, which contains no epileptic activity and other artifacts.With the 1772 channels of iEEG, the decomposition models were validated from the following metrics: WLS.The weighted least square (WLS) measures the residuals between the real spectra and the reconstructed global fit (GF) summing over the decomposed AC and PCs.The WLS smaller, the model better.The WLS is expressed as where p f is the real spectra, σAC f and σP C f,κ are the decomposed AC and PCs, and g f is a normalized weight vector with relatively large values in the α, β and θ bands.The spectral magnitudes are considerably different between low and high frequencies, resulting from the typical characteristic of exponentially decaying in magnitude across broadband.The g f was designed as an elastic ruler to improve the reliability of the residual calculation.
R 2 .The R 2 is calculated by taking the square of the correlation coefficient between GF and Spt.It indicates the similarity between the GF spectra and the original spectra.
Whittle likelihood.The Whittle likelihood as Eq.11 takes the form of negative log-likelihood which is a statistical measure to indicate how the model is close to the data.Thus, the smaller the Whittle Likelihood, the better the model.
Peak discovery proportion.The spectra decomposition discovered the peaks and allowed counting the proportion of the number of channels presenting a peak within a narrowband to the total channel number in a brain region.The study [34] reported that four brain regions presented the specific peak discovery proportions after inspection.That is, 72% of 36 channels in the hippocampus have a delta peak at ∼1Hz; 68% of 19 channels in the cuneus have an alpha peak at ∼8Hz; 72% of 39 channels in the opercular part of the inferior frontal gyrus (OFG) have a beta peak at 20-24Hz; 64% of 123 channels in the precentral gyrus has a beta peak at 20-24Hz.These findings are taken as the empirical standard to access the decomposition effects on peak detection.For these regions, after spectra decomposition, the peak discovery proportion was counted and visually validated for each channel.

A. Simulation Analysis
1) Special Cases with Shape Irregularity: Fig. 4 is the demonstration of how ξ-π performed on three typical spectra with peculiar shapes.As to "overlapping" in Fig. 4A, although both models found 1 AC and 2 PCs, ξ-π enabled fitting the AC and 2 PCs with good coincidence and alignment as to the original Spt and the CFs, while FOOOF underfitted the AC and the overlapping peak with a major peak and a minor peak which is just the low amplitude tail in the original Spt.The AC and PC fit show clearly that ξ-π can fit the original Spt with high coincidence, but FOOOF fitted the AC much lower than the spectrum at a very low frequency and failed to fit the overlapping peak by mixing two Gaussians.
In terms of "skewed" in the Fig. 4B, ξ-π decomposed Spt into 1 AC with good coincidence and 1 PC with the low amplitude long tail being underfitted, but FOOOF failed to fit the AC at very low frequency and fitted the sole skewed peak with two peaks.The AC fit shows that the AC decomposed by ξ-π may not exponentially decay to zeros at high frequencies, and FOOOF fitted much higher than the spectrum at very low frequencies.The PC fit clearly displays that ξ-π generally coincided with the ground truth PC except for the long tail with low amplitude and high frequencies, but FOOOF fitted higher amplitude than the ground truth peak, shifted the CFs toward higher frequency, and did not fit the right slope of the PC.
Regarding "concave upward" in Fig. 4C, ξ-π enabled to fit the original shape of ground truth AC and PC, but FOOOF overfitted the AC at very low frequencies and decayed fast to zeros, although the ground truth AC are non-zeros, and FOOOF failed to fit the ground truth PC with high peak amplitude and shifted CF toward lower frequency.
Thus, ξ-π was good at keeping the natural shape characteristics, although the shape may be extraordinary, whereas FOOOF may bring about extremely high amplitude in very low frequency, decay exponentially to zeros in AC fit, and fail to fit the peak shape even shifted the CFs in the PC fit.
2) Batch Simulation and Loss Statistics: Fig. 5A displays the distribution of the set CFs in batch simulation.For each row, the grids in color are the set CFs in a simulated sample, and the number of grids is the same as the number of simulated peaks.The number of peaks and the distribution of the CFs were different for each simulated sample.Fig. 5B is the comparison of the peak number between the simulation and the fit.As to columns from left to right, the preset number of peaks is [0, 1,2,3,4,5].The number of simulated samples with a preset peak number was [19,17,17,11,18,18] correspondingly, which equaled the sum of bubble sizes in the same color within a column.The bubbles along, above, and below the diagonals indicate the proper estimation, the overestimation, and the underestimation of the peak number, respectively.The OER (UER) of ξ-π and FOOOF were 0% (7%) and 47% (30%).This means that ξ-π is reliable and not easy to overestimate and underestimate the peak number; FOOOF may not only easily detect more peaks but also neglect more peaks than ξ-π.Table .II is the loss statistics of decomposition models for fitting AC/PCs and CF identification.The mean±deviation of LogMSE in AC (PC) fit was 0.78±1.46(0.89±1.41) for ξ-π and 2.98±1.25 (1.49±1.26)for FOOOF.For both AC and PC, ξ-π generally had much less MSE than FOOOF; especially, the LogMSE increment from ξ-π to FOOOF in fitting AC was more obvious than that in fitting PC; the LogMSE of ξ-π did not show much difference between AC fit and PC fit, whereas the LogMSE of FOOOF in AC fit was higher than that of FOOOF in PC fit.This may indicate that ξ-π is generally better than FOOOF in terms of the fitting error and the stability in AC and PC fit; FOOOF may generally have a larger fitting error than ξ-π and the AC fit using FOOOF usually produced larger error than the PC fit.Besides, ξ-π achieved generally correct identification of CFs with accuracy(Acc) 99.86%, precision(Prec) 100%, recall(Rec) 97.15%, and F1 98.56%.

B. Validation from Real Data
1) Sleep State Classification using CCSHS EEG: Fig. 6A is the decomposed AC and the power law fit to the C4 spectra of subject 1800001 at the wake-N2-REM sleep state.Unlike FOOOF inherently adopted the power law function and obtained the AC exponent simultaneously, ξ-π only obtained the AC after decomposition.The power law function was applied subsequentially to the ξ-π decomposed AC so as to obtain the AC exponent.Hence, the left and middle plots displayed the decomposed AC and further power law fit to the ξ-π decomposed AC correspondingly, whereas the right plot showed the AC directly fit with the power law function by FOOOF.The comparison between the left and the right plots clearly showed that ξ-π could keep the natural shape characteristic of three sleep states by ignoring the low amplitude loss at high frequencies, while FOOOF only fitted the rough trend with a linear decreasing function in the log-log scale.
In addition, the comparison between the middle and the right plots indicated that all the included angles of wake-N2, N2-REM, and wake-REM derived from ξ-π were greater than that from FOOOF, meaning the greater distances of AC exponents within any pair of sleep states.This can be confirmed from that the AC exponents of wake, N2, and REM by ξ-π and FOOOF were 0.784, 2.077, 2.547, and 0.866, 1.813, 2.199, respectively.Thus, Fig. 6A illustrated that ξ-π yielded greater interstate differences of AC exponents, resulting from a better fit than FOOOF Fig. 6B shows the violin plots of AC exponents of three sleep states after the spectral decomposition was applied.The AC exponents generally showed an increasing trend from wakefulness to N2 and then to REM for both two models.This meant that the AC exponent was positively associated with sleep depth, which was in line with the previous findings [37].Considering that the AC exponents derived from both models followed the correct trend, this may prove that both ξ-π and FOOOF were effective.
The independent sample t-test was made to measure the interstate difference of the AC exponent, which followed the way of measuring the interclass distance of features in the classification task, reflecting the interclass distinguishability.Based on ξ-π, the two p values for the pair "wake-N2" and the pair "N2-REM" were less than 0.001 (***), while based on FOOOF, the p values of the pair "wake-N2" and the pair "N2-REM" were greater than 0.001 but still less than 0.01 (**).This proves that the ξ-π helped improve the interstate distinguishability using the AC exponent as a feature.ξ-π may be an effective method in EEG biomarker extraction for sleep staging studies.
Fig. 6C shows the prediction accuracy of the LDA model using AC exponents to discriminate sleep states.In each bar plot, the mean accuracy and the standard deviation were estimated over 19 collections.The mean accuracies of ξπ vs. FOOOF were 83.47±17.22%vs. 79.73±18.28%for Wake-N2, 98.43±2.67% vs. 96.33±6.08% for Wake-REM and 84.46±18.11%vs. 81.68±21.48%for N2-REM.The independent sample t-test was performed on the accuracies of 19 collections between ξ-π and FOOOF.The p values within each pair of "Wake-N2", "Wake-REM" and "N2-REM" were all less than 0.05.This indicated that the predicting accuracies based on ξ-π were significantly higher than that based on FOOOF.Besides, the standard deviations of accuracies over 19 collections based on ξ-π were all less than that based on FOOOF.Hence, Fig. 6C meant that ξ-π was more stable and significantly more accurate than FOOOF in predicting the sleep states.
2) Peak discovery using MNI iEEG: Fig. 7 is an illustration of spectral decomposition on an iEEG channel from the middle temporal gyrus randomly picked from the MNI iEEG dataset.Each subplot displayed the decomposition with individual components and its "log-log" plot which showed the GF but omitted the PC.The raw spectra mainly contained 1 AC and 1 PC that was slightly skewed and concave upward.ξπ decomposed the raw spectra into 1 AC and 1 PC, whereas FOOOF decomposed the raw spectra into 1 AC and 2 PCs.
The AC fit by ξ-π generally followed the monotonically decreasing trend and had well coincidences in the <8Hz low frequencies, while the AC fit by FOOOF only roughly captured the real AC trend but missed the fit in low (<4Hz) frequencies and the local details at 10-30Hz.The PC fit by ξ-π was only one component, all under the raw, and nearly kept the peak shape, while the PC fit by FOOOF were two Gaussian curves.
The major PC by FOOOF showed an even larger bandwidth and a CF slightly shifted toward high frequency, compared to the raw peak, and the minor PC by FOOOF seemed to overfit in the frequency range where no peak existed most probably.
The GF difference between ξ-π and FOOOF can be clearly inspected from the log-log plot.The GF by ξ-π generally  coincided with the Spt and the tiny fit error in the high frequency >20Hz can be neglected.Nevertheless, the GF by FOOOF hardly fitted the Spt.The fitting error of GF by FOOOF would be more apparent after the log scale was backtransformed to the natural scale.Besides, Table .III shows the comparison quantified by the evaluation metrics with the mean (± standard deviation) over 1772 channels of iEEG.The LogWLS of ξ-π with -0.8±1.5 were generally much lower than that FOOOF with 1.9±1.4.The R 2 of ξ-π with 0.9976±0.0079was generally higher than that of FOOOF with 0.9867±0.0109.And the Whittle likelihood of ξ-π with -212.88±182.13 were generally smaller than that of FOOOF with -201.01±117.28.It was found that ξ-π had a lower Whittle likelihood than FOOOF for 97.2% channels, meaning the superiority of ξ-π.
The empirical standard from the resting MNI spectra atlas summarized in the section II-E.2 were filled in the Table .IV together with the reported proportion by decomposition models.For hippocampus (δ peak) and cuneus (α peak), although both models derived lower proportions than the empirical standard, ξ-π was closer to the empirical standard with 5.34% increment on average than FOOOF.For the OFG and the precentral gyrus, ξ-π detected 16.74% less and 3.48% more peaks than the empirical standard, respectively; whereas FOOOF reported the existence of β peak in almost all channels, which was far from the standard.Except that ξ-π found 6 more channels in the OFG (36 channels) presenting δ peak than the empirical standard, the peak discovery by ξ-π nearly approached the clinical finding with a 3.67% deviation on average for the other three brain regions.
Hence, ξ-π was generally much closer than FOOOF to the empirical standard in peak discovery.FOOOF was not good as ξ-π at detecting peaks in the low-frequency range, such as delta and alpha peaks, and much more sensitive than ξπ to detect the beta peaks, even if no beta peak actually existed.The high sensitivity of FOOOF in detecting beta peaks may validate the high OER found in the section III-A.2.Both loss statistics and the validation against the clinical evidence suggest that ξ-π was effective.

IV. DISCUSSION
Here, ξ-π was designed to address the limitations of parametric models such as ξ-α, BOSC, IRASA, FOOOF, and the FOOOF adapted variants PAPTO and SPRiNT.The core strategy behind ξ-π is "nonparametric" by the combination of power independency across frequencies, the iterative Bayesian estimation of the spectra using Whittle likelihood, and the shape priors.Later, the spectra simulation with peculiar shapes and batch simulation found that (1) ξ-π can follow the natural shape characteristics; (2) ξ-π obtained much lower MSE and was more accurate in detecting the peak number and the CFs than FOOOF.Further, the model was validated in the sleep state classification task to assess the effect of AC fit and in the peak discovery to assess the effect of PC fit.We found that (3) ξ-π derived more significantly distinguishable AC exponents within the sleep state pairs and therefore obtained significantly higher prediction accuracies than FOOOF; (4) ξπ yielded much lower MSE, much higher R 2 and generally lower Whittle likelihood than FOOOF in decomposing large sample of iEEG spectra; besides, ξ-π approached the clinical findings in peak discovery.
A seminal idea of ξ-π is to precisely decompose the spectra as the first step and then offer the flexible use of kernel functions to the users for parameterization.Otherwise, based on the individual PCs, the amplitude, bandwidth, and CF are easily measurable with descriptive statistics where kernel functions like Gaussian and student-t functions will be unnecessary.Perhaps, new parameters from AC and PCs may be discovered and serve as novel quantitative indicators.ξπ allowed flexibility to characterize the parameters of neural oscillation and helps deeply investigate the shape, statistical distribution, scale, and mixing mechanism.
The performances of ξ-π and their comparison with FOOOF were summarized in the following.Firstly, ξ-π is robust to shape irregularities and flexible in parameterization [39], and FOOOF may help with interpretability.When the spectra have the multiscale property in the AC part, the AC exponent heavily depends on the frequencies [9], [40], [41], two CFs are close as shown in Fig. 4A [42], or the spectra present other shape irregularities, ξ-π would be much better than FOOOF.Although AC and PC look like the '1/f' and the bell curve, this may not imply that the AC and PC follow the power law function and the Gaussian function, respectively.Based on this view, ξ-π is valid and robust.Secondly, ξπ fits in the natural scale, while FOOOF fits in the log scale.Although the natural scale displays the spectra shape naturally, the log transformation magnifies the high-frequency low-amplitude peaks [43].Compared with ξ-π, FOOOF is thus more sensitive to high-frequency oscillation, which led to detecting peaks that may not exist in the beta band, as shown in Table .IV.In the natural scale, ξ-π fits predominant peaks but may not be sensitive enough to high-frequency lowamplitude peaks.The additive mechanism in the log scale is identical to the multiplicative mechanism in the natural scale.It is the multiplicative mechanism used in FOOOF that will greatly enlarge the fitting error and interfere with the individual fit.
Existing studies suggest that the AC shape may correlate with excitatory-inhibition (E-I) balance, reflecting the brain states [40], [44], [45].Quantifying the E-I ratios by AC exponent helps characterize the brain state [46], [47].As shown in Fig. 6, ξ-π helped reveal that increased exponents were associated with increased inhibition and decreased arousal states during sleep [44].The MNI iEEG dataset, as one of the largest open iEEG repositories, provided the board coverage of the brain and the resting spectra atlas, thus enabling the validation of decomposition models in PC fit.
The questions of shape, statistical distribution, scale, and mixing mechanism of AC and PCs remain open and demand further neurophysiology evidence.The situations where the AC presents a plateau, the PC crosses the frequency range border, the PC with shape irregularity as shown in Fig. 1, or the peak is undistinguishable are all challenging to FOOOF [48].The statistical function used for parameterization may be adjusted by learning the intrinsic properties of spectral components of neural oscillation.Besides, the scale transformation not only matters the visual prominence of highfrequency PCs but also determines the mixing mechanism.The multiplicative mechanism may be only reasonable when the baseline AC and the narrowband PC caused by experimental stimuli are perfectly coupled by the same underlying source in processing event-related brain activities.Otherwise, the multiplicative model should not be used because it involves nonlinear operations and problematic assumptions [17], [49].The additive model is the most parsimonious viable for decoding resting state activity and most event-related activities.These challenges have been partially answered in this work and may be overcome by ξ-π in the future.
ξ-π not only suited for studying the neural oscillations from the scalp EEG/MEG, but would be especially effective for studying the iEEG, sEEG, the local field potentials, and the spike trains the spectra of which may most probably present shape irregularities.In contrast, FOOOF may be merely useful for scalp EEG/MEG spectra parameterization or rough quantification.
The limitations and outlook are: (1) ξ-π can be extended to decomposition with electrophysiology source imaging [50] and to dynamic spectra decomposition by working with the time-varying spectra estimation [11].(2) As FOOOF is recent, popular, and can be viewed as the most typical parametric model, we consider FOOOF as the main competitor.Further, IRASA, BOSC, and FOOOF-adapted models like SPRiNT and PAPTO can be easily applied for comparison.(3) ξ-π is only handled with neural power, which does not account for brain connectivity.The connectivity decomposition model is under development in our group.(4) The parameter such as AC offset, PC bandwidth, and PC exponent between ξ-π and FOOOF were not much involved.The impacts of ξ-π on them demand further studies.More evidence from cognitive function [51], such as development, aging, memory, attention, and the mechanism of brain dysfunction, may help reveal the model value.

V. CONCLUSION
This study presented a nonparametric model ξ-π as a brand-new solution to resolve the spectral mixing problem of multiple neural oscillatory processes.The invention of ξ-π aimed to address the issues of parametric models in dealing with shape irregularities, unknown statistical distribution, and scale transformation.The elaborated simulation demonstrated the advantages of ξ-π.Besides, the sleep state classification and the peak discovery proved the model values through neurophysiological evidence.The applications of ξ-π will be extensively explored in the fields such as cognition neuroscience, mental disorder, brain-machine interface, etc.The MATLAB packages and tutorials are freely available at https://github.com/ShiangHu/Xi-Pi.

Fig. 1 .
Fig. 1.The comparison of spectral shapes between the scalp and the intracranial EEG.Left: the resting EEG power spectra of 10 parietal-occipital electrodes per subject by 13 subjects.Right: the resting and normal iEEG spectra of 123 channels in the precentral gyrus.

Fig. 2 .
Fig. 2. Schematic flowchart of ξ-π.(a).input spectra; (b).peak detection; (c).initial decomposition using the student t function; (d).re-decompose the spectra through the E step; (e).fit the individual component using the SLM through the M step; (f).output the AC and PCs at the end of fitting.Spt: Spectra, GF: global fit.

4 )Fig. 3 .
Fig.3.The spectra simulation pipeline.(A).PC-Resampling: IRASA was applied to remove the AC of a time series; then the skewed, overlapping, or concave upward peak was pruned from the mixed PC; (B).AC was randomly selected from the no-peak set (297 channels); (C).PC-Sine waves: mixing no more than 5 types of sine waves with CFs in different narrow bands; the priority of sine waves follows the order [α, θ, β, δ, γ]; the mixed PC was obtained after estimating the spectra of mixed time series; (D).Synthesized Spt: Special cases -adding the PC from (A) to the AC; Batch simulation: adding the PC from (C) to the AC.

Fig. 4 .
Fig. 4. Special cases with shape irregularity.(A).overlapping peak; (B).skewed peak; (C).concave upward peak.The 1-2 columns display the decomposed individual components.The 3-4 columns are the AC fit to the ground truth AC and the composite PC fit to the ground truth PC by summing over the individual PCs.Spt: simulated spectra; All the plots are the power against the frequency(Hz) in the natural scale.

Fig. 5 .
Fig. 5.The set CFs and the comparison of peaks number.(A).The set CFs in batch simulation; (B).The number of fit peaks vs. the number of simulated peaks; the bubble radius positively correlates with the sample size.