Electromagnetic Source Imaging via Bayesian Modeling With Smoothness in Spatial and Temporal Domains

Accurate reconstruction of cortical activation from electroencephalography and magnetoencephalography (E/MEG) is a long-standing challenge because of the inherently ill-posed inverse problem. In this paper, a novel algorithm under the empirical Bayesian framework, source imaging with smoothness in spatial and temporal domains (SI-SST), is proposed to address this issue. In SI-SST, current sources are decomposed into the product of a spatial smoothing kernel, sparseness-encoding coefficients, and temporal basis functions (TBFs). Further smoothness is integrated in the temporal domain through an underlying autoregressive model. Because the sparseness-encoding coefficients in this model are constructed from overlapping clusters over the cortex, we derive a novel update rule based on a fixed-point criterion instead of the convexity-based approach, which becomes invalid in this scenario. All variables and hyperparameters are updated alternately in the variational inference procedure. SI-SST was assessed by multiple metrics on both simulated and experimental datasets. In practice, SI-SST achieved superior reconstruction performance in both spatial extents and temporal profiles compared to the benchmarks.

E/MEG source analysis involves solving the forward and inverse models successively. The inverse problem aims at estimating current sources on the cortical surface from the E/MEG recordings and the lead field matrix calculated in the forward problem. The wide variety of source imaging algorithms for the inverse problem can be categorized into dipole fitting and distributed source imaging. In dipole fitting, the brain activity is estimated with a few equivalent current dipoles (ECDs) [3], [4]. Because of their preference for sparse results, dipole fitting methods perform poorly when sources are activated with complex configurations.
In distributed source imaging models, the sources are located on voxels discretized over the cortical surface. Unfortunately, this involves solving an inherently ill-posed linear inverse problem, since the source space dimension (∼O(10^4)) is always enormously larger than that of the sensor space (∼O(10^2)). Therefore, prior assumptions are essential to acquire meaningful estimates in the neurophysiological context. In the minimum-norm estimate (MNE) [5], the introduced L2-norm penalty on the current sources is interpreted as the underlying energy, which is expected to be as small as possible. Low resolution brain electromagnetic tomography (LORETA) [6] and standardized LORETA [7] are variants of MNE in which spatial smoothness is integrated through a discrete Laplacian operator. The aforementioned algorithms can be framed in the Bayesian scheme, with the L2-norm penalty interpreted as a negative log prior. Moreover, under the empirical Bayesian framework [8], also referred to as sparse Bayesian learning (SBL), the hyperparameters introduced in the regularization constraints can be updated simultaneously with the other coefficients. Champagne [9] is one of the representative SBL algorithms, constructed with relaxed orientation for each voxel; it outperforms several benchmarks in various source configurations.
A spatial smoothing kernel is widely used for the recovery of extended neural activation; with it, the current sources are converted into a sparse latent space. In Smooth Champagne [10], the smoothing kernel is constructed directly from the distances between voxels. On the contrary, the smoothing kernel in BESITES [11] is formulated implicitly with the Markov Random Field (MRF) model [12]. Additionally, spatial Gaussian functions [13] and cortical patch bases [14] have been proposed under the terminology of spatial basis functions (SBFs), which play an equivalent role to the smoothing kernel. The source imaging algorithms mentioned above emphasize spatial constraints but pay little attention to correlations in the temporal domain. Consideration of these correlations benefits both time course reconstruction and source localization, thanks to the intrinsic properties of neural activation. It is convenient and efficient to integrate time-related information through temporal basis functions (TBFs). Current sources are restricted to a subspace spanned by the TBFs, whose dimension is much smaller than the number of time points. TBFs can be either predefined, such as wavelet dictionaries [15] and Gabor atoms [16], or estimated in data-driven manners such as singular value decomposition (SVD) [17] and stimulus-evoked factor analysis (SEFA) [18]. Furthermore, to obviate the incompatibility between chosen TBFs and subsequent algorithms, source imaging based on spatio-temporal basis functions (SI-STBF) [19] has been proposed, in which the TBFs are updated simultaneously with the other coefficients under the empirical Bayesian framework.
Motivated by these prior studies, we propose a novel source imaging algorithm based on a Bayesian probabilistic model in which smoothness is integrated in both the spatial and temporal domains. All hyperparameters can be updated in a data-driven fashion. The main contributions of this paper are twofold. First, further temporal smoothness is enforced via an underlying autoregressive model: the TBFs are represented as the sum of their own average over previous time points and residual error variables. The number of TBFs can be adaptively determined by pruning irrelevant components under the empirical Bayesian framework. Second, we propose a new update rule based on a fixed-point criterion for the scale parameters of the sparseness-encoding coefficients. The novel update rule remains applicable when the typically used convexity-based approach becomes invalid.
The rest of this paper is organized as follows. In Section II, we derive the probabilistic Bayesian model and illustrate the corresponding algorithm in detail. In Section III, we expound the simulation protocol and performance metrics. Section IV demonstrates the source imaging results with the comparison to benchmarks for both simulated and experimental E/MEG datasets, followed by some brief discussion in Section V and conclusions in Section VI.
For notation, upper and lower case boldface letters denote matrices and vectors respectively. Scalars are represented by roman lowercase variables. ‖·‖_F denotes the Frobenius norm of a matrix. x_i is the ith column of matrix X. I indicates the identity matrix and 1_m denotes an m×1 vector of ones. N(x|μ, Σ) denotes a Gaussian distribution of x with mean μ and covariance Σ.

A. Probabilistic Generative Model
With the help of Maxwell's equations [20], the E/MEG recordings can be approximately expressed as a linear mapping of the current sources,

B = L S + ε, (1)

where B ∈ R^{P×T} is the E/MEG measurements with P sensors and T time points, and b_t represents a snapshot of the measured signal at time point t. S ∈ R^{D×T} denotes the current sources and D is the number of discretized voxels over the cortical surface. L ∈ R^{P×D} is the lead field matrix whose columns reflect the sensor responses induced by a unit current at a certain candidate location. The orientation of each source is restricted to be perpendicular to the cortical surface. ε ∈ R^{P×T} indicates the measurement noise, with independent and identical Gaussian distribution N(ε_t|0, Σ_ε) at each time point. The model can easily be whitened via the SVD of the covariance, Σ_ε = U Λ Uᵀ. Multiplying (1) by Λ^{−1/2} Uᵀ from the left converts it into B̃ = L̃ S + ε̃, where L̃ = Λ^{−1/2} Uᵀ L and ε̃ has identity covariance. Therefore, we assume the model has been whitened, and the tildes are dropped in the remainder of this paper without loss of generality.
1) Temporal Modeling: In SI-STBF [19], the current sources S are constructed as a linear combination of K learnable temporal basis functions (TBFs),

S = W Φ, (2)

where Φ ∈ R^{K×T} is composed of the TBFs, with row φ_k the kth TBF over T time points, and W ∈ R^{D×K} is composed of the weighting coefficient vectors w_k associated with each TBF. Both Φ and W are updated during the optimization procedure in SI-STBF.
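Since the rest of the derivation assumes whitened data, the whitening step can be sketched in a few lines of Python (a hypothetical illustration, not the authors' MATLAB code; `whiten` and its arguments are our own names):

```python
import numpy as np

def whiten(B, L, Sigma_eps):
    # Eigendecompose the noise covariance: Sigma_eps = U @ diag(lam) @ U.T
    lam, U = np.linalg.eigh(Sigma_eps)
    W = np.diag(lam ** -0.5) @ U.T      # whitening operator Lambda^{-1/2} U^T
    return W @ B, W @ L                 # whitened measurements and lead field

# Sanity check: whitening pure noise should yield identity covariance
rng = np.random.default_rng(0)
P, D, T = 4, 6, 5000
Sigma = np.cov(rng.standard_normal((P, 10 * P)))          # random SPD covariance
noise = np.linalg.cholesky(Sigma) @ rng.standard_normal((P, T))
B_w, L_w = whiten(noise, np.zeros((P, D)), Sigma)
print(np.allclose(np.cov(B_w), np.eye(P), atol=0.15))     # True (up to sampling error)
```

After this transform the noise covariance is the identity, so the subsequent likelihood can be written with isotropic unit-variance noise.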
In this paper, we inherit the factorization in (2) and impose further smoothness on the TBFs by exploiting an underlying autoregressive (AR) model. Specifically, the amplitude at time point t in the kth TBF, denoted by φ_k^t, is modeled as a noisy average of the previous amplitudes within a window of width ϒ,

φ_k^t = (1/ϒ) Σ_{τ=1}^{ϒ} φ_k^{t−τ} + ξ_k^t, (3)

where ξ_k^t denotes the residual error variable. (3) can be expressed collectively as

Ξ = Φ M, with M = I − N, (4)

where N_{ij} denotes the (i, j)th element of N, with N_{ij} = 1/ϒ if 0 < j − i ≤ ϒ and N_{ij} = 0 otherwise, so that

Φ = Ξ M^{−1}. (5)

Rather than enforcing a prior distribution on the TBFs directly, we assume

ξ_k ∼ N(ξ_k | 0, α_k^{−1} I), (6)

which implies the constraint on φ_k through the linear relationship between them.
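Under our reading of the collective form, Ξ = Φ M with M = I − N and N_{ij} = 1/ϒ for 0 < j − i ≤ ϒ, the temporal smoothness matrix can be sketched as follows (an assumption of this sketch: early time points simply average fewer samples, still scaled by 1/ϒ):

```python
import numpy as np

def temporal_matrix(T, width):
    # N[i, j] = 1/width when row i lies in the averaging window of column j
    N = np.zeros((T, T))
    for j in range(T):
        for i in range(max(0, j - width), j):
            N[i, j] = 1.0 / width
    return np.eye(T) - N                 # M such that Xi = Phi @ M

T, width = 6, 2
M = temporal_matrix(T, width)
phi = np.arange(1.0, T + 1)              # one TBF row, shape (T,)
xi = phi @ M                             # residuals of the AR relation
print(xi[3], phi[3] - 0.5 * (phi[2] + phi[1]))   # both equal 1.5
```

Multiplying a TBF row by M recovers exactly the AR residuals of (3), which is the linear relationship the prior on ξ_k exploits.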
2) Spatial Modeling: To reconstruct current sources with spatial coherence and distributed activation, the weighting coefficients W are factorized into the product of a smoothing kernel A ∈ R^{D×D} and sparseness-encoding coefficients Θ ∈ R^{D×K},

W = A Θ. (7)

With (7), the weighting coefficients are transformed into a latent space in which sparseness can be readily imposed. Specifically, A is defined as a Green's function [21] based on a graph Laplacian over the cortical surface, and Θ is constructed from overlapping clusters of dipoles in the following manner:

θ_k = Σ_i ζ_{[i],k}, (8)

where ζ_{[i],k} indicates the coefficient vector corresponding to the ith cluster. A zero-mean Gaussian distribution with learnable scale γ_i is imposed on ζ_{[i],k},

ζ_{[i],k} ∼ N(ζ_{[i],k} | 0, γ_i^{−1} I_{[i]}), (9)

where I_{[i]} is a diagonal matrix with 1 in the positions of the ith cluster and 0 otherwise. Hence, the dth component of ζ_{[i],k} is enforced to zero if the dth dipole is not included in the ith cluster. For convenience, G_i is introduced to denote the dipole indexes in the ith cluster. The variables ζ_{[i],k} with distinct cluster indexes i are assumed mutually statistically independent. Because of the linear relationship in (8), θ_k implicitly follows the distribution

θ_k ∼ N(θ_k | 0, Σ_θ), with Σ_θ = Σ_i γ_i^{−1} I_{[i]}, (10)

a diagonal covariance whose dth entry sums γ_i^{−1} over I_d, where I_d denotes the set of cluster indexes i with d ∈ G_i. The detailed construction procedure of the smoothing kernel A and the clusters {G_i} is presented in Section II-C.
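The implicit prior on θ_k is diagonal because the independent cluster covariances γ_i^{−1} I_{[i]} simply add; a small sketch (hypothetical names) makes the accumulation over overlapping clusters explicit:

```python
import numpy as np

def theta_prior_variance(D, clusters, gamma):
    # Independent cluster terms gamma_i^{-1} I_[i] add up, so each dipole's
    # prior variance accumulates 1/gamma_i over every cluster containing it
    var = np.zeros(D)
    for i, G in enumerate(clusters):
        var[G] += 1.0 / gamma[i]
    return var

clusters = [[0, 1, 2], [1, 2, 3], [4]]   # overlapping clusters G_i
gamma = np.array([1.0, 2.0, 4.0])
v = theta_prior_variance(5, clusters, gamma)
print(v[1])   # dipole 1 sits in two clusters: 1/1 + 1/2 = 1.5
```

This overlap is precisely what couples the γ_i in the covariance and breaks the linearity that the convexity-based update relies on.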

B. Full Imaging Algorithm
Substituting (5) and (7) into (2), the current sources are represented as

S = A Θ Ξ M^{−1}. (11)

According to (1), the generative model is converted into the following form:

B = F Θ Ξ M^{−1} + ε, (12)

where F = L A. Both θ_k and ξ_k are assumed statistically independent across different k. Therefore, the full probabilistic model is constructed as

p(B, Θ, Ξ) = p(B | Θ, Ξ) ∏_k p(θ_k) p(ξ_k | α_k). (13)

The full probabilistic generative model is presented graphically in Fig. 1. Estimating the posterior p(Θ, Ξ | B) is adequate to reconstruct the current sources S from the E/MEG measurements B. Nevertheless, it is intractable owing to the coupling between Θ and Ξ in the generative model. Instead, an approximate posterior q(Θ, Ξ) is estimated under the variational Bayesian (VB) framework [22], [23], which can be obtained by maximizing the free energy F defined as

F = ⟨log p(B, Θ, Ξ) − log q(Θ, Ξ)⟩_{q(Θ,Ξ)}, (14)

where ⟨f(x)⟩_{q(x)} denotes the expectation of f(x) under the distribution q(x). The mean field assumption is widely used to establish VB algorithms, in which the variational posterior is decomposed into mutually independent components. Each component is updated alternately until the free energy can hardly be promoted or another stopping criterion is fulfilled.
In this work, we assume that q(Θ, Ξ) is factorized over groups in the following manner:

q(Θ, Ξ) = ∏_{k=1}^{K} q(θ_k) q(ξ_k). (15)

The VB algorithm can be framed in the empirical Bayesian scheme, with its update procedure partitioned into two alternating steps as in the expectation-maximization (EM) algorithm [24], [25].
In the VB-E step, F is promoted w.r.t. the approximate posterior, given the hyperparameters α, γ. In the VB-M step, the free energy is maximized w.r.t. the hyperparameters with fixed q(θ_k) and q(ξ_k) for each k.
1) VB-E Step: In the VB-E step, to derive the update rule for each factor q(θ_k) of the variational posterior, F is maximized with fixed q(θ_j), j ≠ k, fixed q(Ξ), and all hyperparameters, yielding the Gaussian update

q(θ_k) = N(θ_k | θ̄_k, Σ_{θ_k}). (16)

The detailed derivation is presented in the supplemental document. Similarly, the update for q(ξ_k) is the Gaussian

q(ξ_k) = N(ξ_k | ξ̄_k, Σ_{ξ_k}). (17)

2) VB-M Step: In the VB-M step, α and γ are optimized by promoting F with fixed q(Θ, Ξ). Specifically, the EM-based update for α_k is obtained by maximizing ⟨log p(ξ_k | α_k)⟩_{q(ξ_k)} with fixed q(ξ_k),

α_k = T / ⟨‖ξ_k‖²⟩. (18)

To avoid the prohibitively slow convergence of the EM-based update for γ caused by the high-resolution lead field matrix, a convexity-based approach has been widely used to accelerate the optimization [8], [9], [26]. It differs from the EM-based update in considering the influence of the hyperparameters embedded in the variational posterior. To derive the convexity-based update rule, the free energy needs to be expressed with fixed q(Θ) and q(Ξ), giving expression (19) (the detailed derivation is presented in the supplemental material). Unfortunately, because the quadratic term in θ_k in (19) is no longer linear in γ_i, a convexity-based update rule for γ cannot be derived, even though log|Σ| + log|Σ_{θ_k}| is still convex in γ_i^{−1}. To remedy this issue, we propose a novel update rule based on a fixed-point criterion, obtained by taking the gradient of F w.r.t. γ and rearranging terms [8], [27], which yields the update formulas for γ in (20), where i indicates the group index, Σ and I_{[i]} are consistent with the definitions in (9), and F_{[i]} is composed of the columns of F with indexes in G_i. Although the free energy is not guaranteed to increase with this update rule in every iteration, we have seen no exception in practice.
The expectation terms in the above update equations are computed from the moments of the variational posteriors q(θ_k) and q(ξ_k). In summary, the variational posteriors q(θ_k), q(ξ_k) as well as the hyperparameters α, γ are updated alternately with (16), (17), (18), and (20) in each epoch until the relative change of F is less than a specified threshold. The procedure is encapsulated in Algorithm 1, which is referred to as SI-SST (Source Imaging with Smoothness in Spatial and Temporal domains).

Input:
Measurement B, lead field matrix L, spatial smoothing kernel A, temporal smoothness matrix M, cortical clusters {G_i}, and tolerance δ
Output: Current source estimate S
1: Initialize the encoding coefficients Θ, the TBFs Φ, and the hyperparameters γ, α; compute M^{−1}
2: repeat
3: update the variational posterior distributions of θ_k and ξ_k with (16) and (17) respectively
4: update the hyperparameters α and γ using (18) and (20) respectively
5: compute the free energy F based on (19)
6: until the relative change of F < δ
7: S = A Θ̄ Ξ̄ M^{−1}, where Θ̄ and Ξ̄ are the means of the variational posteriors q(Θ) and q(Ξ) respectively

C. Construction of Clusters and Smoothing Kernel
In this work, the spatial smoothing kernel A is derived from the spatial coherence prior [25], and the clusters {G_i} are constructed by data-driven parcellation (DDP) of the cortical surface [28].

1) Spatial Smoothness Constraint:
We construct the smoothing kernel based on a graph Laplacian Ã over the geodesic surface [25], [28]. Ã ∈ R^{D×D} is derived from the adjacency matrix of the cortical mesh, which reflects the face connectivity of the cortical surface (entry (i, j) is 1 if dipoles i and j are neighboring and 0 otherwise). The smoothing kernel is defined as the Green's function [21]

A = exp(σ Ã),

with positive scale σ, and it can be approximated by the first several terms of its Taylor expansion, A ≈ Σ_{n=0}^{N} (σ Ã)^n / n!. Notably, the approximation can be interpreted as a weighted average of different orders of the graph Laplacian Ã. Meanwhile, σ controls the degree of spatial smoothness of the smoothing kernel A. We set σ = 0.6 as recommended by [25].
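Assuming Ã is taken as the negative graph Laplacian (adjacency minus degree, our reading of [25]), the truncated Taylor approximation of the Green's function exp(σÃ) can be sketched as below; the row sums stay equal to one, so smoothing redistributes current rather than creating it:

```python
import numpy as np

def smoothing_kernel(adj, sigma=0.6, order=8):
    # Negative graph Laplacian of the mesh: adjacency minus degree (assumption)
    Lap = adj - np.diag(adj.sum(axis=1))
    # Truncated Taylor expansion of the Green's function exp(sigma * Lap)
    D = len(adj)
    A, term = np.eye(D), np.eye(D)
    for n in range(1, order + 1):
        term = term @ (sigma * Lap) / n   # accumulates (sigma*Lap)^n / n!
        A = A + term
    return A

adj = np.array([[0, 1, 0, 0],            # 4-node path graph as a toy mesh
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], float)
A = smoothing_kernel(adj)
print(A @ np.ones(4))                    # rows sum to 1: mass is preserved
```

Because Ã annihilates the constant vector, every Taylor term beyond the identity contributes zero to the row sums, so the truncation keeps this property exactly.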
2) DDP of the Cortical Surface: DDP consists of a pre-localization procedure called Multivariate Source Prelocalization (MSP) [29] and a subsequent region growing process. MSP estimates a probability-like coefficient which measures the contribution of each dipole to the E/MEG measurements. The dipole with the highest MSP score is selected as the seed and eliminated from the candidate set. A region is grown around this seed by clustering the remaining neighboring dipoles within a specific spatial scale s. Non-overlapping parcels P(s) are generated over the cortical surface by repeating this process until no dipole is left. The detailed procedures of MSP and DDP are presented in the appendix of [28].
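A simplified sketch of the region-growing step (with precomputed MSP-like scores standing in for the actual MSP procedure, and a toy chain graph in place of the cortical mesh) might look like:

```python
def region_grow(scores, neighbors, scale=1):
    # Seed at the highest remaining score, absorb neighbors within `scale`
    # hops, and repeat until every dipole belongs to a parcel
    remaining = set(range(len(scores)))
    parcels = []
    while remaining:
        seed = max(remaining, key=lambda d: scores[d])
        parcel = {seed}
        frontier = {seed}
        for _ in range(scale):
            frontier = {n for d in frontier for n in neighbors[d]} & (remaining - parcel)
            parcel |= frontier
        remaining -= parcel
        parcels.append(sorted(parcel))
    return parcels

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}   # 5-dipole chain
scores = [0.1, 0.9, 0.3, 0.2, 0.8]                              # MSP-like scores
print(region_grow(scores, neighbors))   # [[0, 1, 2], [3, 4]]
```

Each pass over the surface yields non-overlapping parcels; overlap only arises later, when parcellations from different MSP scores are combined.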
To account for the spatio-temporal behavior of brain activity, MSP is conducted based on the SVD of the E/MEG measurements in [28]. The average of the MSP coefficients derived from distinct principal components of the observations is used in the subsequent region growing process. In this work, two different types of MSP score are evaluated, and P_1(s), P_2(s) are derived from the corresponding MSP coefficients. Concretely, they are obtained as follows: 1) the average of the MSP scores estimated from each principal component extracted from the E/MEG recordings; 2) the MSP scores estimated from the original observations. The spatial scale is chosen as s = 1, and the eventual clusters of dipoles are organized as the union {G_i} = P_1(s) ∪ P_2(s), which yields overlapping clusters.

D. Implementational Details
In this section, we discuss some implementational details concerning the initialization and computational cost of SI-SST in practice.
1) Initialization and Hyperparameters: Because an iterative update is adopted in SI-SST, the initial values of the parameters and hyperparameters are essential to the entire algorithm. In our simulations, the encoding coefficients Θ are initialized to 0, while the TBFs Φ are initialized based on the SVD of the E/MEG measurements B. Specifically, the decomposition B = U Λ Vᵀ is conducted by SVD, and the K right singular vectors with the largest singular values are chosen as the initial value of Φ. We set K = 7 in this work. The initial hyperparameters are chosen as γ = 1_{d_γ}, α = 1_K, where d_γ is the number of clusters. It is worth noting that the final numbers of clusters {G_i} and TBFs are automatically determined by the hyperparameters γ and α after convergence. More concretely, if γ_i is large enough (e.g., γ_i / min(γ) > 10^6), the corresponding cluster G_i is pruned during the optimization, because it has negligible influence on the computation and can be interpreted as irrelevant to the measurements. Similarly, if α_k is large enough (e.g., α_k / min(α) > 10^1), the associated components θ_k and ξ_k are eliminated from the VB-EM procedure.
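The SVD-based initialization of Φ and the ratio-based pruning rule can be sketched as follows (the thresholds mirror the values quoted above; the function names are ours):

```python
import numpy as np

def init_tbfs(B, K=7):
    # Top-K right singular vectors of B serve as the initial TBFs Phi (K x T)
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:K]

def surviving(scales, ratio):
    # Keep component i only while scale_i / min(scales) stays below `ratio`
    scales = np.asarray(scales, float)
    return np.where(scales / scales.min() <= ratio)[0]

rng = np.random.default_rng(1)
B = rng.standard_normal((62, 501))
print(init_tbfs(B).shape)                 # (7, 501)
print(surviving([1.0, 2e6, 3.0], 1e6))    # [0 2]: the huge-scale component is pruned
```

Since large γ_i or α_k means near-zero prior variance, pruning those components simply removes dimensions that contribute nothing to explaining the measurements.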

2) Stopping Criterion and Computation Requirements:
Throughout our implementation, the iterative update is terminated when the relative change in the free energy F is smaller than a specific tolerance value δ. In this work, δ is chosen as 10^−8.
For P = 62, T = 501, D = 6001, the number of parcels {G_i} derived by DDP is typically around 1600, which equals the dimension of γ. Using a standard PC (Core i7-10700 CPU at 2.9 GHz and 16 GB RAM), the convergence time is approximately 40 seconds. To facilitate reproducibility, the MATLAB implementation of SI-SST is available at the following URL: https://github.com/LiangJiawen1994/Source-Imaging.git.

A. Benchmark Source Imaging Algorithms
In this paper, SI-SST is tested against SI-STBF [19] for its similar modeling in both spatial and temporal domains. Additionally, SI-SST is compared with the widely used L2-norm based method, LORETA [6], as well as the sparsity-constrained algorithm based on empirical Bayesian learning, SBL [8]. Furthermore, VB-GLM [15] and STTONNICA [30] are chosen as benchmarks for their usage of both spatial and temporal constraints. Detailed descriptions of the benchmark methods are presented in the supplementary document.

B. Validation Dataset
The cortical surface used for numerical simulations is obtained by downsampling the default surface of the MNI/ICBM152 anatomy distributed with Brainstorm [31]. The downsampled cortical mesh consists of 6001 dipoles. The lead field matrix is acquired via the OpenMEEG [32] software package embedded in Brainstorm, based on the sensor configuration of a 64-channel Neuroscan Quik-cap system and a three-layer head model obtained from the Boundary Element Method (BEM).
To generate spatially extended sources for Monte-Carlo experiments, we first select seed dipoles on the cortical mesh randomly and then absorb neighboring elements into the patches iteratively until the desired area of the activated region is achieved. To reduce the variance of the assessments caused by diverse patch positions, 100 Monte-Carlo experiments were carried out such that the majority of the cortical surface is covered by the generated extended patches. Dipoles are activated with Gaussian-damped sinusoidal time courses parameterized by the frequency f, phase τ, and damping rate ω. The time courses are partitioned into pre- and post-stimulus periods with 250 time points per phase. Sources are considered to be active during the post-stimulus period. Simulated sources are transformed into simulated measurements with the lead field matrix and Gaussian noise, in line with (1). The noise is injected at a specific signal-to-noise ratio (SNR), defined as SNR = 20 log_10 (‖L S‖_F / ‖ε‖_F). In order to investigate the influence of multiple factors on the proposed algorithm, the following scenarios are considered: 1) Single patch with various extents. SI-SST is tested against the compared methods under the configuration of one patch source with the following extents: 0.5 cm², 2 cm², 6 cm², 10 cm², 18 cm², and 32 cm². The SNR is set to 5 dB. 2) Simulation with different SNRs. Two patches with fixed 6 cm² extents are simulated with varying SNRs: −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, and 20 dB.
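The noise-injection step implied by the SNR definition above can be sketched by scaling white Gaussian noise to hit a target SNR exactly (the helper name is ours):

```python
import numpy as np

def add_noise(clean, snr_db, rng):
    # Scale white Gaussian noise so that 20*log10(||clean||_F / ||eps||_F) = snr_db
    eps = rng.standard_normal(clean.shape)
    eps *= np.linalg.norm(clean) / (np.linalg.norm(eps) * 10 ** (snr_db / 20))
    return clean + eps, eps

rng = np.random.default_rng(0)
clean = rng.standard_normal((62, 501))       # stands in for L @ S
noisy, eps = add_noise(clean, 5.0, rng)
snr = 20 * np.log10(np.linalg.norm(clean) / np.linalg.norm(eps))
print(round(snr, 6))                         # 5.0
```

For a matrix argument, `np.linalg.norm` defaults to the Frobenius norm, matching the definition in the text.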

C. Validation Metrics
In this work, five validation metrics are utilized to acquire a quantitative and comprehensive evaluation of source imaging performance. The first metric is the area under the receiver operating characteristic curve (AUC) [23], which is an effective assessment of the sensitivity and specificity of the detection. The next metric is the spatial dispersion (SD), which measures the spatial blurriness of the estimated sources [33], [34]. To evaluate the correctness of the localization, the distance of localization error (DLE) [33], [34] is introduced. Moreover, the shape error (SE), the squared error between the normalized estimated and simulated sources, is utilized to evaluate the shape fit of the reconstructed time courses. Finally, the mean correlation coefficient (MCC) measures the temporal fit from another perspective: it is the averaged correlation coefficient between the mean simulated and reconstructed time courses among the active regions. For good source imaging performance, large AUC and MCC values and small SD, DLE, and SE values are desired. The detailed computations of these metrics are presented in the supplemental document. The significance of all results is assessed by the Kruskal-Wallis test, which serves as the non-parametric counterpart of one-way analysis of variance. If a Kruskal-Wallis test statistic is significant, Wilcoxon rank sum tests are subsequently performed between SI-SST and each of the compared methods to determine whether SI-SST yields significantly better estimates.
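As an illustration of the temporal metric, a minimal sketch of the MCC (our own implementation of the verbal definition above, not the supplemental one) is:

```python
import numpy as np

def mcc(sim_regions, est_regions):
    # Mean Pearson correlation between the mean simulated and reconstructed
    # time courses, averaged over the active regions
    rs = [np.corrcoef(s.mean(axis=0), e.mean(axis=0))[0, 1]
          for s, e in zip(sim_regions, est_regions)]
    return float(np.mean(rs))

t = np.linspace(0, 1, 100)
sim = [np.sin(2 * np.pi * 5 * t)[None, :]]        # one region, one dipole
est = [2.0 * np.sin(2 * np.pi * 5 * t)[None, :]]  # amplitude-rescaled estimate
print(round(mcc(sim, est), 6))                    # 1.0
```

Note that, unlike SE, the correlation coefficient is invariant to amplitude rescaling of the reconstruction, which is why the two temporal metrics are complementary.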

IV. RESULTS
For clear visualization of the source imaging results at a specific time point, the thresholded absolute value of the current sources, normalized to unit maximum, is shown on the cortical surface in this paper. The thresholds are determined from the level of background activity using Otsu's method [28], [35], [36].
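A generic implementation of Otsu's threshold selection (a sketch of the standard method, not necessarily the exact pipeline of [28], [35], [36]) is:

```python
import numpy as np

def otsu_threshold(values, bins=256):
    # Otsu's method: pick the cut that maximizes the between-class variance
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w = hist / hist.sum()
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = w[:k].sum(), w[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (w[:k] * centers[:k]).sum() / w0
        mu1 = (w[k:] * centers[k:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, centers[k]
    return best_t

rng = np.random.default_rng(0)
# Bimodal amplitudes: weak background plus a strong "active" mode
amps = np.abs(np.r_[rng.normal(0.05, 0.02, 5000), rng.normal(0.8, 0.05, 500)])
print(0.1 < otsu_threshold(amps) < 0.7)   # True: the cut falls in the gap
```

Applied to the normalized source amplitudes, the cut separates background activity from the activation to be displayed.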
A. Results of Simulated Data

1) Influence of Spatial Extent: Fig. 2 depicts the performance metrics of the algorithms with varying extents. As shown in the plots, LORETA and VB-GLM acquire improved reconstruction performance, with increasing AUCs (p < 0.01) and decreasing DLEs (p < 0.01), SDs (p < 0.01), and SEs (p < 0.01), as the spatial extents increase. On the contrary, SBL degrades with increasing extents. STTONNICA fails to attain stable AUC values but achieves clearly descending values of DLE (p < 0.01), SD (p < 0.01), and SE (p < 0.01). Notably, instead of being monotonic as the active patches enlarge, the AUC values of SI-SST increase first and then descend; meanwhile, its SD, DLE, and SE values follow the converse tendency. In summary, SI-SST is best suited to reconstruct sources within a specific range of extents. Additionally, SI-STBF performs similarly to SI-SST owing to its analogous spatial modeling. Regarding the MCC, SI-SST and VB-GLM acquire stable temporal fits and outperform the compared methods across the varying spatial extents. LORETA and SBL, both of which are modeled without TBFs, result in relatively low MCCs with wild oscillations in the statistics. Moreover, SI-SST outperforms all benchmarks except in some extreme scenarios. Specifically, it attains the highest AUC (p < 0.01) and MCC (p < 0.01 except VB-GLM) and the lowest DLE (p < 0.01), SD (p < 0.01), and SE (p < 0.05) values among all compared methods for extents from 2 cm² to 18 cm².
To illustrate more concretely how the source imaging capabilities of the different algorithms are influenced by spatial extents, one realization of the simulations with its imaging results is depicted in Fig. 3. Two active sources are located on the postcentral gyrus and the middle temporal area respectively, with individual time courses as shown in Fig. 4(a). LORETA and VB-GLM always reconstruct the sources with diffuse extents, whereas SBL invariably acquires focal imaging results. Additionally, STTONNICA struggles to produce stable and reliable imaging results as the source extents vary. In line with Fig. 2, SI-SST fails to obtain perfect source imaging for extremely focal or diffuse activations, but it reconstructs the sources with the most plausible extents and accurate localizations among the compared methods. SI-STBF performs analogously to SI-SST but obtains slightly worse imaging results. Fig. 4(b) demonstrates the time course reconstruction of each algorithm when the sources are activated with an area of 10 cm². The correlation coefficients between the simulated and estimated time courses are depicted in each plot. SI-SST produces the most accurate temporal profiles in both the postcentral gyrus and the middle temporal area among all compared methods.
2) Influence of SNR: The performance metrics of the different algorithms at various SNRs are presented in Fig. 5. The source imaging performance of all the algorithms except STTONNICA is significantly influenced by the SNR. As the SNR increases, they acquire increasing AUCs (p < 0.01) and MCCs (p < 0.01), and descending values of SD (p < 0.01), DLE (p < 0.01), and SE (p < 0.01). SI-SST is relatively robust to the SNR and outperforms all the benchmarks. When the noise intensity is high (i.e., SNR below 0 dB), SI-SST acquires significantly superior performance on multiple criteria, while this advantage gradually vanishes as the SNR grows.
To further illustrate the variation in source reconstruction capability caused by varying SNRs, one simulation with its imaging results is presented in Fig. 6. Two sources are located on the postcentral gyrus and the middle temporal area, resembling the configuration in Fig. 3. In this realization, however, the sources are activated with identical temporal profiles. Consistent with Fig. 5, all the methods except STTONNICA reconstruct the sources with more accurate extents and locations as the noise intensity decreases. Owing to the identical time courses of the two active sources in this example, SI-SST is empirically shown to handle situations where different sources are activated with highly correlated temporal patterns.

B. Results of Real Data

1) Results With Median Nerve Stimulation MEG Data:
The MEG data were acquired at the Massachusetts General Hospital in 2005 with a Neuromag Vectorview 306 system (102 magnetometers and 204 planar gradiometers) in a median nerve stimulation experiment. The dataset comprises MEG measurements of one healthy subject, recorded at 1793 Hz during stimulation of both arms, and is available from the Brainstorm download page. For our analysis, we used only the recordings during left arm stimulation. The pre-stimulus window is −100 ms to 0 ms and the post-stimulus window was chosen as 0 ms to 300 ms, where 0 ms indicates the stimulus onset. The response averaged across 301 trials was used in the subsequent procedure. The noise covariance was estimated from the pre-stimulus recordings of both arms. The overlapping spheres model [37], which is efficient for MEG data without loss of accuracy, was used to acquire the lead field matrix from the subject's individual anatomy, with the cortical surface downsampled to 6001 voxels.
As expounded in prior studies [38], a sophisticated set of regions on the cortical surface is activated under median nerve stimulation, including the contralateral primary somatosensory cortex (cSI), bilateral secondary somatosensory cortices (SII), and the posterior parietal cortex (PPC). Fig. 7 depicts the average time courses for the four aforementioned cortical regions reconstructed by the distinct algorithms. As shown in the plots, SI-SST, SI-STBF, and VB-GLM yield smoother time course reconstructions than LORETA and SBL. Furthermore, SI-SST yields a significantly smoother profile of reconstructed time courses than SI-STBF, even though they share a similar spatial model. Notably, the peak for cSI at around 21 ms, which appears in the reconstructed time courses of most methods, disappears in the reconstruction of SI-SST due to the imposed temporal smoothness: the peak does not gain enough support from the measurements to withstand the additional temporal constraint introduced by the underlying AR model. Overall, the reconstructed activation of SI-SST has a temporal pattern in agreement with prior studies [17].
The source imaging results of all the algorithms at 35 ms and 85 ms are depicted in Fig. 8. As reflected in the plots, all methods are capable of locating the sources in cSI at 35 ms. LORETA and VB-GLM produce overly diffuse estimates; on the contrary, SBL reconstructs the activation with extremely focal extents. Locating the activation in the iSII area at 85 ms is challenging, and only SI-SST, SI-STBF, and STTONNICA produce source images with correct locations and plausible extents. Compared to SI-STBF, SI-SST presents a more compact imaging result without outlier patches, in agreement with prior studies [38], [39].
2) Results With Face Processing Experiment MEG Data: These data are obtained from the OpenNeuro database under accession number ds000117. The dataset comprises simultaneous MEG/EEG recordings from 16 participants performing a simple visual recognition task with presentations of famous, unfamiliar, and scrambled faces. The experimental design and multi-modal data acquisition are expounded in [40]. Only the MEG data were considered in our analysis; they were likewise recorded with an Elekta-Neuromag VectorView 306 system, at 1100 Hz. The averaged MEG response of subject 01 during the famous face presentation epochs was used for the subsequent analysis. The pre-stimulus window was chosen as −100 ms to 0 ms and the post-stimulus window was 0 ms to 300 ms (0 ms denotes the stimulus onset). The lead field matrix was acquired with Brainstorm based on the overlapping spheres model, with the cortical surface composed of 6000 vertices. The noise covariance was estimated from empty room recordings.
The plots in Fig. 9 show the averaged time courses of sources in bilateral occipital and fusiform areas reconstructed by the different methods. For SI-SST, the estimated sources are activated with peaks in both occipital and fusiform areas at around 100 ms. Additionally, the activations near the right fusiform gyrus have another peak at around 170 ms. With the underlying AR model, SI-SST achieves smoother reconstructed time courses than the benchmarks. Fig. 10 presents the source imaging results of all the algorithms at 100 ms and 170 ms under the famous faces stimulus. As shown in the plots, SI-SST successfully locates the neural activation in the occipital area at 100 ms and reconstructs the sources in bilateral fusiform areas at 170 ms. Most methods are capable of reconstructing the occipital activation and the right fusiform gyrus sources. However, SBL and SI-STBF fail to locate the source on the left ventral surface of the fusiform gyrus, while LORETA and VB-GLM produce diffuse reconstructions covering many functionally irrelevant regions. In summary, SI-SST produces results consistent with previous reports [15], [41] for both the time courses and the activation regions.

V. DISCUSSION
Because of the inherent ill-posedness of the E/MEG source imaging problem, prior constraints are of great importance for acquiring a meaningful solution in the context of neurophysiology. It has been widely accepted by previous studies [42], [43] that cortical activations exhibit spatial coherence and local homogeneity. Multiple L2-norm based methods have been proposed according to these properties. Specifically, current sources are assumed to be activated with similar amplitudes in MNE [5]. Spatial smoothness is imposed in LORETA [6] through the discrete Laplacian operator, which represents the local correlation between dipoles. L2-norm based methods have a preference for diffuse estimates [28], [34], in line with our numerical experiments. In contrast, algorithms that enforce sparseness are appropriate for reconstructing focal sources, but they achieve poor imaging performance with extended activations. Champagne [9] is a representative algorithm in which spatial sparseness is induced through individual covariance components for the dipoles.
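The diffuse character of L2-norm estimates follows directly from the closed form of the minimum-norm solution. The sketch below uses a random toy lead field (an assumption for illustration, not a real forward model) to show that the MNE estimate assigns nonzero amplitude to essentially every source.

```python
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources = 64, 1000  # toy dimensions with sensors << sources
L = rng.standard_normal((n_sensors, n_sources))  # toy lead field
y = rng.standard_normal(n_sensors)               # one time sample

# MNE: argmin_j ||y - L j||^2 + lam * ||j||^2 has the closed form
#   j = L^T (L L^T + lam * I)^{-1} y
lam = 1.0
j_hat = L.T @ np.linalg.solve(L @ L.T + lam * np.eye(n_sensors), y)

# The L2 penalty spreads energy over many sources: virtually no
# coefficient is exactly zero, which yields diffuse source maps.
print(np.count_nonzero(np.abs(j_hat) > 1e-8))
```

Sparseness-enforcing methods such as Champagne replace the single global penalty with per-dipole covariance components, which drives most coefficients to zero instead.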
A smoothing kernel is a typical constraint in the spatial domain, through which the brain sources are transformed into a latent space with sparseness. In Smooth Champagne [10], the kernel is defined based on the distances between voxels. Additionally, to represent the local spatial correlation over the cortical surface, the discrete Laplacian operator is employed in STTONNICA [30], VB-GLM [15], and BESTIES [11]. Furthermore, in SI-SST, the smoothing kernel is derived from a Green's function [21] based on the graph Laplacian operator. Notably, SI-STBF [19] has a spatial configuration analogous to that of SI-SST, but its smoothing kernel is constructed indirectly through the covariance components.
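One common realization of a Green's-function kernel on a mesh graph is the heat kernel, i.e. the matrix exponential of the negative scaled graph Laplacian. The sketch below uses this form on a toy ring graph; the heat-kernel choice and the smoothness parameter are assumptions for illustration, not necessarily the exact construction of [21].

```python
import numpy as np
from scipy.linalg import expm

# Toy mesh: a ring of 8 vertices standing in for a cortical patch.
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0  # ring adjacency

D = np.diag(A.sum(axis=1))
L = D - A                      # graph Laplacian

# Green's-function-style smoothing kernel (assumed heat-kernel form);
# rho controls the smoothness degree.
rho = 0.5
K = expm(-rho * L)

# Each row is a smoothing profile: peaked at the vertex itself and
# decaying with graph distance; rows sum to one because the constant
# vector lies in the Laplacian's null space.
print(np.allclose(K.sum(axis=1), 1.0))
```

Multiplying sparse latent coefficients by such a kernel produces spatially extended, locally coherent source patches.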
Parameter sharing in the sparseness encoding coefficients also makes a great difference to the source imaging results. For example, a global scale is employed in MNE and LORETA, with the implicit assumption that all current sources have similar amplitudes. In contrast, dipoles are equipped with individual parameters in SBL, which readily yields focal source reconstructions. Additionally, SI-SST derives its sparseness encoding coefficients from a DDP [28] over the cortical surface. Moreover, because the DDP generates overlapping clusters, a novel update rule is proposed in SI-SST, as the convexity-based approach becomes invalid. It should be emphasized that the smoothing kernel and parameter sharing need to be regarded as a unified system rather than independent ingredients; they collaboratively determine the source imaging performance of the entire model. As shown in Fig. 2, distinct algorithms are suited to reconstructing sources with different extents. Specifically, SBL excels at focal source reconstruction, while LORETA is suitable for dealing with extremely extended activations. In contrast, SI-SST achieves the best reconstruction within a specific range of extents and shows degraded performance with extremely focal or extremely extended sources. Besides, SI-STBF has a spatial modeling pattern similar to that of SI-SST, but it obtains slightly worse recovery results owing to the difference in parameter sharing. Furthermore, in Smooth Champagne, the spatial smoothing kernel is defined with a high degree of smoothness and parameter sharing is constructed from a small number of non-overlapping clusters of voxels. As shown in the supplemental material, SI-SST acquired superior performance in source localization and time-course reconstruction compared to Smooth Champagne, except when sources were activated with very large spatial extents.
Therefore, the smoothing kernel and the parameter-sharing configuration should be tuned jointly to achieve satisfactory source imaging performance over as wide a range of extents as possible.
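The parameter-sharing schemes contrasted above can all be viewed as different parameterizations of the prior source covariance. The schematic sketch below contrasts a global scale, individual per-source scales, and overlapping clusters; the cluster memberships and scale values are hypothetical, and the clustered case only gestures at the DDP construction rather than reproducing it.

```python
import numpy as np

n = 6  # toy number of sources

# Global scale (MNE/LORETA-style): one hyperparameter for all sources.
gamma_global = 2.0
C_global = gamma_global * np.eye(n)

# Individual scales (SBL-style): one hyperparameter per source; driving
# most of them to zero yields focal (sparse) estimates.
gammas = np.array([0.0, 3.0, 0.0, 0.0, 1.5, 0.0])
C_sbl = np.diag(gammas)

# Overlapping clusters (schematic, SI-SST-style): each cluster has a
# scale and an indicator over the sources it covers; sources belonging
# to several clusters accumulate contributions, coupling neighbors.
clusters = [np.array([1, 1, 1, 0, 0, 0]),
            np.array([0, 0, 1, 1, 1, 0]),
            np.array([0, 0, 0, 0, 1, 1])]
cluster_gammas = [1.0, 0.5, 2.0]
C_clustered = sum(g * np.diag(m) for g, m in zip(cluster_gammas, clusters))

print(np.diag(C_clustered))  # sources 2 and 4 pool two clusters each
```

The overlap is exactly what invalidates the standard convexity-based update, since one source's variance is no longer governed by a single hyperparameter.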
In addition to manually fine-tuning the smoothing kernel, learnable parameters can be introduced so that the algorithm becomes adaptive to source imaging tasks with a wide range of extents. Specifically, in BESTIES, a parameter integrated into the MRF model, which controls the smoothness degree, is updated together with the other coefficients under the empirical Bayesian framework. Analogously, in STTONNICA [30], a smoothness-related parameter is integrated into the smoothing kernel. However, both optimizations are computationally inefficient: the former involves inverting a high-dimensional matrix in every iteration, while the latter relies on a grid search for the hyperparameter update. How to construct a flexible smoothing kernel with learnable parameters that can be updated efficiently remains a challenge.
Apart from prior constraints in the spatial domain, exploiting the underlying temporal correlation is effective for improving localization accuracy and robustness against noise. Temporal basis functions (TBFs) have been proven effective in source reconstruction by prior studies [18]. With TBFs, SI-SST, SI-STBF, and VB-GLM are more likely to reconstruct smoother time courses than LORETA and SBL, as depicted in Fig. 4. In both SI-STBF and SI-SST, the TBFs are updated by variational Bayesian inference, and the irrelevant components with little explanatory power for the measurements are pruned automatically thanks to the automatic relevance determination (ARD) mechanism [44]. Additionally, because of the further temporal smoothness imposed by the underlying autoregressive model, SI-SST not only yields smoother time-course reconstructions than SI-STBF, as reflected in Fig. 7 and Fig. 9, but is also more robust against noise, as shown in Fig. 5. As discussed in the supplemental document with three different example scenarios, SI-SST is capable of maintaining the essential high-frequency information of the sources from severely corrupted measurements. Moreover, SI-SST acquires the smoothest PSD curve among the compared methods, such that redundant frequency components are pruned.
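The smoothing effect of an autoregressive prior on a TBF can be illustrated with a first-order AR model; the order, coefficient, and innovation variance below are illustrative assumptions, not necessarily those used in SI-SST.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 300        # number of time samples for one TBF
a = 0.95       # AR(1) coefficient; closer to 1 means smoother draws
sigma = 1.0    # innovation standard deviation

# Draw one TBF sample from the AR(1) prior: f[t] = a * f[t-1] + e[t].
f = np.zeros(T)
for t in range(1, T):
    f[t] = a * f[t - 1] + sigma * rng.standard_normal()

# White-noise draw of matched scale for comparison: the AR sample has
# much smaller sample-to-sample increments, i.e. a smoother trajectory.
w = rng.standard_normal(T) * f.std()
print(np.abs(np.diff(f)).mean() < np.abs(np.diff(w)).mean())
```

Placing such a prior on the TBFs penalizes abrupt jumps between consecutive samples while still permitting genuine high-frequency structure when the data demand it.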
The proposed algorithm can be further improved in multiple respects. First, learnable parameters could be integrated into the smoothing kernel, as in STTONNICA [30] and BESTIES [11], so that the smoothness degree can be adapted automatically. However, an effective update rule should be designed jointly to avoid the expensive computational cost. Furthermore, cross-cluster effects can be taken into account: the generative model in (8) can be generalized by introducing cross-cluster terms, or, alternatively, these terms can be integrated directly into the covariance components in (10). Additionally, temporal constraints can be constructed in other manners. For instance, Gaussian distributions can be imposed on the second derivatives of distinct TBFs, which can be represented as the variation model in [12]; the redundant components are then pruned automatically with the help of the ARD mechanism.

VI. CONCLUSION
The current study presented an E/MEG source imaging algorithm termed SI-SST, in which smoothness is imposed in both the spatial and temporal domains via a spatial smoothing kernel and an underlying autoregressive model, respectively. The sparseness encoding coefficients are constructed based on a data-driven parcellation of the cortex with overlapping clusters. Because the widely used convex approximation becomes invalid in this situation, we proposed a novel update rule derived from the fixed-point criterion. All variables and parameters in the model are updated simultaneously in a data-driven fashion under the empirical Bayesian framework. SI-SST was assessed by multiple validation metrics with both simulated and experimental datasets. Numerical simulations showed that, compared to the benchmark methods, SI-SST achieved superior imaging performance over a wide range of extents and was more robust against diverse noise intensities.