SECTION I

## INTRODUCTION

Electroencephalography (EEG) is the recording of electric potentials produced by the local collective partial synchrony of electrical field activity in cortical neuropile, today most commonly measured by an array of electrodes attached to the scalp using water-based gel [1], [2]. EEG is the most widely known and studied portable noninvasive brain imaging modality; another, less developed and not considered here, is functional near-infrared spectroscopy (fNIR). The first report of signals originating in the human brain and recorded noninvasively from the scalp was that of Berger in 1924 [3]. Half a century later both engineers and artists began to seriously consider the possible use of EEG for active information exchange between humans and machines [4]. It is now generally accepted that the spatio–temporal EEG activity patterns correlate with changes in cognitive arousal, attention, intention, evaluation, and the like, thereby providing a potential “window on the mind.” However, the biological mechanisms that link EEG patterns to these or other aspects of cognition are not understood in much detail [2].

A companion paper [5] describes how, unlike most other forms of functional brain imaging available today [functional magnetic resonance imaging (fMRI), magnetoencephalography (MEG), positron emission tomography (PET)], EEG sensor systems can be made comfortably wearable and thus potentially usable in a wide range of settings. Another companion paper [6] further explores how advances in brain signal processing and deeper understanding of the underlying neural mechanisms may make important contributions to enhancing human performance and learning. The main focus of this paper is the description of the current state and foreseeable trends in the evolution of signal processing approaches that support design of successful brain–computer interface (BCI) systems that deliver interactive cognitive and mental assessment and/or user feedback or brain-actuated control based on noninvasive brain and behavioral measures. Brain–computer interactions using invasive brain measures, while also of intense current research interest and demonstrated utility for some applications [7], [8], [9], will here be discussed only briefly.

We believe that in the coming decades adequate real-time signal processing for feature extraction and state prediction or recognition combined with new, noninvasive, and even wearable electrophysiological sensing technologies can produce meaningful BCI applications in a wide range of directions. Here, we begin with a brief primer on the neuroscientific basis of cognitive state assessment, i.e., the nature of the EEG itself, followed by a review of the history and current state of the use of signal processing in the relatively young BCI design field and then consider avenues for its short-term and medium-term technical advancement. We conclude with some thoughts on potential longer term developments and perspectives.

### A. What is EEG?

Electrical activity among the estimated twenty billion neurons and equal or larger number of nonneural cells that make up the human neocortex (the outer layer of the brain) would have nearly no net projection to the scalp without the spontaneous appearance of sufficiently robust and/or sizable areas of at least partial local field synchrony [1], [2], [10]. Within such areas, the local fields surrounding pyramidal cells, aligned radially to the cortical surface, sum to produce far-field potentials projecting by passive volume conduction to nearly all the scalp EEG sensors. These effective cortical EEG sources also have vertical organization (one or more net field sources and sinks within the six anatomic layers), though currently recovery of their exact depth configuration may not be possible from scalp data alone. At a sufficient electrical distance from the cortex, e.g., on the scalp surface, the projection of a single cortical patch source strongly resembles the projection of a single cortical dipole termed its equivalent dipole [11], [12].

The very broad EEG spatial point-spread function, simulated in Fig. 1 using a realistic electrical forward head model [13], [14], [15], means that locally synchronous activities emerging within relatively small cortical areas are projected and summed at nearly all the widely distributed scalp electrodes in an EEG recording [16]. Unfortunately, the naive viewpoint that EEG potential differences between each scalp channel electrode and a reference electrode represent a single EEG signal originating directly beneath the active scalp electrode continues to color much EEG analysis and BCI design.

Fig. 1. (Left) Simulation of a cm²-scale cortical EEG source representing an area of locally synchronized cortical surface-negative field activity, and (right) its broad projection to the scalp. From an animation by Zeynep Akalin Acar using a biologically realistic Boundary Element Method (BEM) electrical head model built from an individual subject MR head image using the Neuroelectromagnetic Forward Head Modeling Toolbox (NFT) [13] and Freesurfer [14].

Fig. 2. A conceptual schematic overview of evolving BCI design principles. Data obtained from sensors and devices within, on, and around a human subject (bottom left) are transformed into informative representations via domain-specific signal preprocessing (middle left). The resulting signals are combined to produce psychomotor state representations (upper left) using general-purpose inference methods, producing timely estimates of the subject's cognitive, affective, and sensorimotor state (including their cognitive and affective responses to events and cognitive/behavioral intent). These estimates may be made available to the systems the subject is interacting with. Similar cognitive state information derived from (potentially many) other subjects (lower right), intelligently combined, can shape statistical constraints or priors (middle right), thereby enhancing individual state models. Mappings between data representations and psychomotor states (left) allow exploratory data modeling (top right), providing new hypotheses that can guide the confirmatory scientific process (middle right).

When working with EEG it is also important to bear in mind that the circumstances in which local cortical field synchronies appear are not yet well understood, nor are the many biological factors and influences that determine the strongly varying time courses and spectral properties of the EEG source signals. Our relative ignorance regarding the neurobiology of EEG signals is, in part, a side effect of the 50-year focus of the field of animal electrophysiology on neural spike events in single-cell neural recordings [17]. During much of this period, studies of the concurrent lower frequency spatio–temporal field dynamics of the cortical neuropile were rare. Freeman, however, observed and modeled emergent, locally near-synchronous field patterns [18] that he termed “phase cones” [19], and more recently Beggs and Plenz have modeled similar “avalanche” events [20]; both descriptions are consistent with production of far-field potentials that might reach scalp electrodes.

#### 1) Nonbrain EEG Artifacts

In addition to a mixture of cortical EEG source signals, each scalp EEG recording channel also sums potentials from nonbrain sources (artifacts) and channel noise. Fortunately, in favorable recording circumstances (e.g., using modern recording equipment well connected to a quiet subject in an electrically quiet laboratory) EEG sensor noise is relatively small. However, the strength of contributions from nonbrain sources (eye movements, scalp muscles, line noise, scalp and cable movements, etc.) may be larger than the contributions of the cortical sources. EEG recorded outside the laboratory using new wearable EEG systems with variable conductance between the electrodes and scalp must also take into account and handle possible large, nonstationary increases in EEG sensor and/or system noise relative to laboratory recordings. Thus, for robust and maximally efficient cognitive state assessment or other BCI applications, explicit or implicit identification and separation of brain source signals of interest from nonbrain and other, less relevant brain signals is important [21].

#### 2) Multiscale Recording and Analysis

A major obstacle to understanding how the brain supports our behavior and experience is that brain dynamics are inherently multiscale. Thus, their more complete understanding will likely require the development of extremely high-density, multiresolution electrical imaging methods [22]. Unfortunately, cortical field recordings sufficiently dense to fully reveal the spatio–temporal dynamics of local cortical fields across spatial scales are not yet available. We believe that the most effective real-world applications using EEG signals will depend on (but may also contribute to) better understanding of the biological relationships between neural electrical field dynamics and cognitive/behavioral state. This knowledge is currently still largely inferred from observed correlations between EEG measures and subject behavior or experience, although efforts are underway both to observe the underlying biological phenomena with higher resolution [23], [24] and to model the underlying biological processes mathematically [25], [26], [27] in more detail.

#### 3) The EEG Inverse Problem

Recovery of the cognitive state changes that give rise to changes in observed EEG (or other) measures fundamentally amounts to an inverse problem, and although at least the broad mixing of source signals at the scalp is linear, recovery of the (latent) source signals from given scalp data without additional geometric constraints on the form of the source distributions is a highly underdetermined problem [28]. Even when given an accurate electric forward head model [15] and a near-exact cortical source domain model constructed from the subject's magnetic resonance (MR) head image, finding the sources of an observed EEG scalp pattern remains challenging. However, finding the source of a “simple” EEG scalp map representing the projection of a single compact cortical source domain allows for favorable assumptions (as discussed below) and is thereby more tractable.
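The linear mixing and the resulting underdetermination can be written compactly. The notation below is assumed for illustration and is not taken from the cited work: $\mathbf{x}(t)$ collects the potentials at $M$ scalp channels, $\mathbf{s}(t)$ the activities at $N$ candidate cortical source locations, $\mathbf{A}$ is the lead-field matrix supplied by a forward head model, $\mathbf{n}(t)$ is noise, and $\lambda > 0$ is a regularization weight. Because $N \gg M$, infinitely many source configurations reproduce the same scalp data. The second line shows one classical way to select among them, the Tikhonov-regularized minimum-norm estimate, given here only as an example of an added constraint.

```latex
\begin{aligned}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t),
  \qquad \mathbf{A} \in \mathbb{R}^{M \times N},\quad N \gg M,\\
\hat{\mathbf{s}}(t) &= \arg\min_{\mathbf{s}} \;
  \lVert \mathbf{x}(t) - \mathbf{A}\mathbf{s} \rVert_2^2 + \lambda \lVert \mathbf{s} \rVert_2^2
  \;=\; \mathbf{A}^{\top}\left(\mathbf{A}\mathbf{A}^{\top} + \lambda \mathbf{I}_{M}\right)^{-1}\mathbf{x}(t).
\end{aligned}
```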

#### 4) Response Averaging

Most recent approaches to estimating EEG source spatial locations or distributions have begun by averaging EEG data epochs time locked to some class of sensory or behavioral events posited to produce a single mean transient scalp-projected potential pattern. This average event-related potential (ERP) [29] sums projections of the (typically small) portions of source activities in relevant brain areas that are both partially time locked and phase locked (e.g., most often positive or negative) at some fixed latencies relative to the events of interest. Average ERPs were arguably the first form of functional human brain imaging, and the study of scalp channel ERP waveforms has long dominated cognitive EEG research.
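The averaging step can be sketched in a few lines of Python. This is a minimal illustration on synthetic data, not a description of any particular published pipeline: the function name `average_erp`, the sampling rate, the noise level, and the shape of the simulated transient are all assumptions made for the example.

```python
import numpy as np

def average_erp(eeg, event_samples, pre, post):
    """Average baseline-corrected epochs cut around each event.

    eeg: array (n_channels, n_samples); event_samples: event onsets in samples.
    Returns an array (n_channels, pre + post).
    """
    epochs = []
    for s in event_samples:
        if s - pre < 0 or s + post > eeg.shape[1]:
            continue  # skip events too close to the recording edges
        epoch = eeg[:, s - pre:s + post]
        baseline = epoch[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(epoch - baseline)
    return np.mean(epochs, axis=0)

# Synthetic recording (all values are illustrative assumptions).
rng = np.random.default_rng(0)
fs = 250                                           # sampling rate in Hz
eeg = rng.normal(0.0, 10.0, (4, 240 * fs))         # ongoing activity, SD 10
events = np.arange(fs, eeg.shape[1] - fs, fs // 2)  # one event every 0.5 s
bump = 5.0 * np.sin(np.linspace(0.0, np.pi, 25))    # 100-ms transient, peak 5
for s in events:
    # Time- and phase-locked response on channel 0, starting 100 ms after the event.
    eeg[0, s + 25:s + 50] += bump

erp = average_erp(eeg, events, pre=50, post=100)
peak_latency_ms = 1000.0 * (np.argmax(erp[0]) - 50) / fs
print(f"ERP peak on channel 0 at about {peak_latency_ms:.0f} ms after the event")
```

In this simulation the transient is half the size of the standard deviation of the ongoing activity, so it is hard to see in single epochs. Averaging several hundred epochs suppresses the activity that is not time locked and leaves the transient on channel 0.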

ERP models have been the basis of many BCI designs as well. Unfortunately, ERP averaging is not an efficient method for finding scalp projections of individual EEG source areas other than those associated with the earliest sensory processing. Also, average ERPs capture only one aspect of the EEG activity transformation following meaningful events [30]. BCI designs based on an ERP model therefore ignore other information contained in EEG data about subjects' cognitive responses to events, and also require knowing the times of occurrence of such events. In contrast, other BCI methods continuously monitor the EEG data for signal changes in the power spectrum and other higher order statistics, often using data features derived from latent source representations of the collected signals.

#### 5) Blind Source Separation

In the last 20 years, methods have been developed for estimating the latent time courses and spatial projections of sources of spontaneous or evoked EEG activity. Independent component analysis (ICA) and other blind source separation (BSS) methods use statistical information contained in the whole data to learn simple maps representing the projections of individual EEG source areas to the scalp channels [31], [32]. These can also aid inverse source localization methods in spatially localizing the sources of both ongoing and evoked EEG activity [33], [34]. Recently, we have demonstrated that measures of source signals unmixed from the continuous EEG by ICA may also be used as features in BCI signal processing pipelines, with two possible advantages. First, they allow more direct use of signals from cortical areas supporting the brain processes of interest, unmixed from other brain and nonbrain activities [35]. Second, source-resolved BCI models allow for examination of the anatomically distinct features contributing most information, and thereby can inform neuroscientific inquiry into the brain processes that support the cognitive process of interest [36].

To date, most BCI signal processing research has not concentrated on neurophysiological interpretation. We argue, however, that treating the EEG and other data used to design and refine a successful BCI as unknown signals from a biological “black box” is unlikely to produce as efficient algorithms as those operating on better neuroscientifically informed and interpretable data models; in particular, informed models may have less susceptibility to overfitting their training data by incorporating biologically relevant constraints. BCI research should remain, therefore, an enterprise requiring, prompting, and benefiting from continuing advances in both signal processing and neuroscience.

SECTION II

## EARLY BCI DESIGNS

BCI design is still a relatively young discipline whose first scientific formulation was in the early 1970s [4]. In its original definition, the term referred to systems that provide voluntary control over external devices (or prostheses) using brain signals, bypassing the need for muscular effectors [37], originally aimed at restoring communication for cognitively intact but completely paralyzed (locked in) persons. This definition is somewhat restrictive, and various extensions have been proposed in recent years as the field has grown. These include “hybrid BCIs” [38] that relax the restriction of input signals to brain activity measures to possibly include other biosignals and/or system state parameters, and “passive BCIs” [39], [40] that produce passive readout of cognitive state variables for use in human–computer applications without requiring the user to perform voluntary control that may restrict performance of and attention to concurrent tasks. Over the last 3–5 years, these developments have opened a steadily widening field of BCI research and development with a broad range of possible applications [41].

Since BCI systems (under any definition) transduce brain signals into some form of control or communication signals, they are fundamentally brain (or multimodal) signal processing systems. Indeed, the earliest tested BCI systems were essentially built from single-channel bandpower filters and other standard signal processing components such as the surface Laplacian defined a priori [42], [43]. These primitives were found to detect some features of brain signals relatively well, such as the circa 11-Hz central mu rhythm associated with motor stasis [44] over which many (but not all) subjects can gain voluntary control, or some wavelet-like ERP peak complexes found to indicate enhanced cognitive evaluation of an event by the subject, such as the “P300” complex following anticipated events [30], [45].
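A single-channel bandpower feature of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function `band_power`, the 8 to 13 Hz band limits, and the two simulated conditions (an 11-Hz rhythm that is strong in one and attenuated in the other) are chosen for the example and are not taken from the cited systems.

```python
import numpy as np

def band_power(x, fs, f_lo, f_hi, win_sec=1.0):
    """Mean spectral power of x within [f_lo, f_hi] Hz.

    The signal is cut into non-overlapping Hann-windowed segments and the
    squared FFT magnitudes inside the band are averaged (arbitrary units).
    """
    n = int(win_sec * fs)
    n_seg = len(x) // n
    segs = x[:n_seg * n].reshape(n_seg, n) * np.hanning(n)
    spec = np.abs(np.fft.rfft(segs, axis=1)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    return spec[:, in_band].mean()

# Two simulated 10-s recordings from one channel (illustrative values).
rng = np.random.default_rng(0)
fs = 250
t = np.arange(10 * fs) / fs
strong_mu = 2.0 * np.sin(2 * np.pi * 11 * t) + rng.normal(0, 1, t.size)
weak_mu = 0.5 * np.sin(2 * np.pi * 11 * t) + rng.normal(0, 1, t.size)

mu_strong = band_power(strong_mu, fs, 8, 13)
mu_weak = band_power(weak_mu, fs, 8, 13)
print(f"band power ratio (strong / weak): {mu_strong / mu_weak:.1f}")
```

A fixed threshold on such a feature is the simplest possible decoder. Its band limits and threshold are exactly the kind of preselected parameters that early designs adapted only slightly to individual users.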

The purpose of applying these filtering methods was to emphasize relevant combinations of cortical source activities associated with the subject's movement intent or imagination. These original designs typically had preselected parameters, for example, frequency band(s) that were at best only slightly adapted to individual users. Weeks to months of practice were typically required for a user to acquire the skill of controlling a device (for example, a cursor) using these early BCI systems, as subjects learned to adapt their brain waves to match the expectations of the BCI designers [46]. Not surprisingly, such systems were widely considered to be of foreseeable practical use only to a relatively few cognitively intact “locked in” users suffering near-complete loss of muscular control [47], [48].

### A. Introduction of Machine Learning Approaches

In the early 1990s, the BCI field saw a paradigm shift with the influx of adaptive signal processing and adaptive learning ideas. One such thrust was inspired by the understanding that neural networks are capable of adapting to the information structure of a very wide range of source signals “blindly” without foreknowledge of the specific nature of the transformations needed to produce more informative representations. This resulted in the first BCI research applications of ICA [21], [49], anatomically focused beamforming [50], and other neural network learning methods (unsupervised and supervised), which have produced a series of novel insights and successful applications [51], [52], [53].

A concurrent second approach to subject adaptivity introduced classical statistical learning into the BCI field, one of the simplest examples being Fisher's discriminant analysis (FDA) and the related linear discriminant analysis (LDA) [54], and its later regularized extensions [55], [56], all of which have been applied to EEG and other forms of electrophysiological data with distinct success. Today, these are among the most frequently used statistical methods for BCI design [51], [57], [58]. Under some conditions linear models like these can be shown to discover optimal statistical models linking input patterns to output signals [58]. In themselves, however, off-the-shelf machine learning (ML) tools cannot solve the statistical problems arising from rarely (if ever) having access to enough model training data to completely avoid overfitting. This results in a lack of generality and robustness to changes in the many aspects of the recorded signals that do not contribute directly to the parameters of interest. In part, this is because these methods require information from both ends of the brain signal decoding pipeline to find the desired model parameters: input data in some appropriate representation, and corresponding desired output values. Information about the desired output is often irregularly and sparsely available, and must usually be extracted from dedicated calibration measurements—not unlike using contemporary voice recognition software.
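The role of regularization when calibration data are scarce can be illustrated with a small two-class LDA. The sketch below is an assumption-laden example rather than any specific published method: the function names, the hand-set shrinkage weight of 0.5, and the synthetic Gaussian features are all chosen for illustration (in practice the shrinkage weight is often chosen analytically or by cross-validation).

```python
import numpy as np

def fit_shrinkage_lda(X0, X1, shrink):
    """Two-class LDA with the pooled covariance shrunk toward a scaled identity.

    X0, X1: arrays (n_trials, n_features); shrink in [0, 1] is hand-set here.
    Returns weights w and bias b; class 1 is predicted when X @ w + b > 0.
    """
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    Xc = np.vstack([X0 - mu0, X1 - mu1])
    S = Xc.T @ Xc / (len(Xc) - 2)                  # pooled sample covariance
    d = S.shape[0]
    S_reg = (1 - shrink) * S + shrink * (np.trace(S) / d) * np.eye(d)
    w = np.linalg.solve(S_reg, mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2                       # equal class priors assumed
    return w, b

def accuracy(w, b, X0, X1):
    correct0 = np.sum(X0 @ w + b <= 0)
    correct1 = np.sum(X1 @ w + b > 0)
    return (correct0 + correct1) / (len(X0) + len(X1))

# Synthetic features: 40 dimensions, only the first 5 carry class information.
rng = np.random.default_rng(0)
n_feat = 40
delta = np.zeros(n_feat)
delta[:5] = 1.5

def sample(n, mean):
    return rng.normal(size=(n, n_feat)) + mean

train0, train1 = sample(20, 0.0), sample(20, delta)     # short calibration session
test0, test1 = sample(2000, 0.0), sample(2000, delta)

w, b = fit_shrinkage_lda(train0, train1, shrink=0.5)
acc = accuracy(w, b, test0, test1)
print(f"held-out accuracy with shrinkage: {acc:.2f}")
```

With 40 calibration trials and 40 features the unregularized pooled covariance is rank deficient, so plain LDA has no unique solution here. Shrinkage makes the estimate invertible at the cost of some bias.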

ML is not yet a consolidated field, but rather a broad assortment of techniques and algorithms from a variety of schools or conceptual frameworks such as neural networks [59], statistical learning theory [60], [61], decision theory [62], or graphical models [63]. Yet today ML plays a fundamental role in BCI design because the functional role of any given brain source or the precise configuration of a source network may be specific to the individual and, as a consequence, not identifiable in advance [64]. This is the case both at the near cm²-scale of locally synchronous cortical EEG source patches and at finer spatial scales [65]. For this reason, the modeling is usually framed as a “supervised” ML problem in which the task is to learn a mapping from some input (feature) space onto an output (category) space from a set of (input, output) training data examples extracted from a dedicated calibration recording [66]. A noteworthy complementary approach is “unsupervised” learning that captures structure latent in the input space under certain assumptions without use of “ground truth” target values [60].

#### 1) Gaussian Assumptions

Several well-known assumptions underlie the majority of popular ML approaches in BCI research. One that has strongly influenced the design of adaptive BCI systems is Gaussianity of input-space distributions. This assumption tends to make a vast range of statistical problems analytically tractable, including those modeling brain processes and their functional associations via methods such as linear and quadratic discriminant analysis [58], linear and ridge regression [67], Gaussian mixture models [68], kernel formulations such as Gaussian process regression [69] as well as most BCI approaches built on signal covariance, like common spatial patterns [70] or the dual-augmented Lagrangian (DAL) approach [71]. However, the BCI field is increasingly running into the limits of this not quite justified assumption.

For input features based on scalp EEG measurements, a Gaussian assumption might be defended by application of the central limit theorem to the multitude of stochastic processes that contribute to the signal. However, measurable EEG signals of interest are typically generated by field activities of highly dependent neural processes. Scalp-recorded brain signals are also contaminated by a variety of often large, sporadic rare nonbrain source artifacts [72], [73]. Both these factors can render the probability density functions of the observed signal distributions heavy-tailed, strongly distorting estimates made using Gaussian assumptions. Improving on these assumptions, however, requires additional computational and theoretical machinery.
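The effect of rare, large artifacts on Gaussian-based estimates can be shown with a short simulation. The numbers below (0.5% contaminated samples, artifact standard deviation 30 times that of the background) are assumptions chosen for illustration, and the median absolute deviation is used only as one familiar example of a robust scale estimate.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis; close to 0 for Gaussian data."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

def mad_scale(x):
    """Median absolute deviation, scaled to match the SD of Gaussian data."""
    return 1.4826 * np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 10.0, 10000)             # background signal, SD 10
contaminated = clean.copy()
idx = rng.choice(clean.size, size=50, replace=False)
contaminated[idx] += rng.normal(0.0, 300.0, 50)  # sporadic large artifacts

for name, x in [("clean", clean), ("contaminated", contaminated)]:
    print(f"{name:>12}: SD = {x.std():6.1f}, "
          f"MAD scale = {mad_scale(x):5.1f}, "
          f"excess kurtosis = {excess_kurtosis(x):7.1f}")
```

In this simulation a handful of artifact samples inflates the sample standard deviation substantially and produces a strongly heavy-tailed distribution, while the median-based estimate stays close to the background value. Any method that relies on sample variances or covariances inherits the same sensitivity.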

SECTION III

## CURRENT BCI DESIGN DIRECTIONS AND OPPORTUNITIES

Below we discuss a variety of emerging or foreseeable near-term directions and avenues for improvement in developing models for online cognitive state assessment. We point out a variety of possible advantages derivable from explicit or implicit source representations, such as the ability to compute informative source network properties. Source representations also allow coregistering large pools of empirical data whose shared statistical strength may improve estimation accuracy, robustness, and specificity under real-world conditions. We then discuss the important multimodal data integration problem.

### A. ICA and Related Latent Source Models

Advances in signal processing and ML affect all aspects of EEG analysis [74]. BSS, in particular ICA [31], [32], [75], [76], [77], while still far from being universally adopted, has had a large effect on the EEG field in the past decade, playing a significant role in removal of artifacts from EEG data [78], in analysis of EEG dynamics [32], [79], [80], [81], for BCI design [36], [82], [83], [84], and in clinical research applications [33], [85], [86]. The basic ICA model assumes the multichannel sensor data are noiseless linear mixtures of a number of latent spatially stationary, maximally independent, and non-Gaussian distributed sources or source subspaces. The objective is to learn an “unmixing” matrix that separates the contributions of these sources (or source subspaces) from the observed channel data based on minimizing some measure of their temporal dependency.
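The basic model can be made concrete with a toy decomposition. The sketch below implements a symmetric fixed-point ICA update with a tanh nonlinearity (in the style of FastICA) in plain NumPy. It is an illustrative stand-in rather than the specific algorithms used in the cited studies, and the three synthetic sources and the mixing matrix `A` are assumptions of the example.

```python
import numpy as np

def whiten(X):
    """Center X (channels x samples) and return whitened data and whitening matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, E = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    K = E @ np.diag(d ** -0.5) @ E.T
    return K @ Xc, K

def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W so that the rows of W stay orthonormal."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W

def fastica(X, n_iter=200, tol=1e-6, seed=0):
    """Estimate an unmixing matrix for X (channels x samples)."""
    Z, K = whiten(X)
    n = Z.shape[0]
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.normal(size=(n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[z g(w^T z)] - E[g'(w^T z)] w, for all rows at once.
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ K  # unmixing matrix in the original channel space

# Three independent, non-Gaussian synthetic sources mixed into three channels.
rng = np.random.default_rng(0)
n = 5000
t = np.arange(n) / 250.0
S = np.vstack([
    np.sin(2 * np.pi * 10 * t),      # oscillation
    rng.laplace(size=n),             # sparse, heavy-tailed activity
    rng.uniform(-1, 1, size=n),      # sub-Gaussian noise
])
A = np.array([[1.0, 0.5, 0.3],
              [0.4, 1.0, 0.6],
              [0.2, 0.7, 1.0]])      # hypothetical mixing matrix
X = A @ S

unmix = fastica(X)
Y = unmix @ X
# |correlation| between each true source (rows) and each recovered component (columns).
match = np.abs(np.corrcoef(np.vstack([S, Y]))[:3, 3:])
print(np.round(match, 2))
```

The recovered components match the sources only up to order, sign, and scale, which is inherent to the ICA model. The columns of `np.linalg.inv(unmix)` play the role of the component scalp maps discussed in the text.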

While linear propagation to and summation of EEG signals at the scalp channels is a safe assumption [1], the maximal independence and spatial stationarity assumptions used in temporal ICA may hold less strictly in some cases. Thus, future directions in BCI research based on ICA may exploit related multiple mixture [87], [88], convolutive mixture [89], and adaptive mixture [88] models that have been introduced to model spatio–temporal nonstationarity [90], [91], or independence within specific frequency bands [92] and other subspaces [93], [94], or to integrate other tractable assumptions [95]. Although the details of cortical geometry and hence, source scalp projections, as well as source temporal dynamics vary across individuals, accumulating some source model information through simultaneous processing of data from multiple subjects might prove beneficial [96].

ICA does not use an explicit biophysical model of source and channel locations in an electrical forward head model. While this might be seen as an insufficiency of the approach, it may also avoid confounding effects of head and conductance modeling errors while making efficient use of statistical information in the data. Despite the demonstrated utility of advanced ICA and related algorithms, because of the lack of “ground truth” in typical EEG data sets, their real-world estimation errors are not easily quantified. Continued work on statistical modeling and validation [97] is needed to assess the reliability of ICA separation [35], [98], [99], [100] and to minimize propagation of estimation errors through data modeling that follows ICA decomposition.

Comparing brain processes across individuals is another important problem both for EEG analysis and for building BCI models using data from more than one subject. The variability of folding of the cortical surface across individuals means it is not sufficient to simply identify component processes common to two or more individuals by their scalp projection patterns. Promising avenues for future methods development here include joint diagonalization [101] and extracting equivalent component clusters or brain domains using relevant constraints including their coregistered 3-D equivalent dipole positions [35].

### B. Unsupervised Learning and Adaptive Filtering

Unsupervised learning [60] and adaptive signal processing [102] generally both perform adaptive modeling and transformation of data samples. Among the original examples of their use for cognitive state assessment are ICA [49], adaptive noise canceling [103], and variants of the Kalman filter [104]. More recently, work has expanded into the promising areas of dictionary learning [105], unsupervised deep learning [106], and entirely new directions such as stationary subspace analysis (SSA) [107]. One of the currently most popular BCI algorithms, common spatial patterns (and its 20 or more extensions) [70], [108], [109], [110], [111], can also be viewed as producing adaptive spatial filters, although adapted using a supervised cost function involving a categorical target or label variable. Many of these techniques serve either of two purposes. The first is to generate better (e.g., more informative, interpretable, or statistically better behaved) signal features or latent variables based on information readily available in the signal itself, ideally features that make subsequent processing tractable (or trivial). A second goal is to alleviate (or account for, as in coadaptive calibration) the effects of nonstationarity in the underlying brain and/or nonbrain processes, an important avenue of development that could affect almost all BCI methodology [112], [113].
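The sense in which common spatial patterns yields adaptive spatial filters driven by a class label can be sketched as a generalized eigenvalue problem on the two class covariance matrices. The code below is a basic, unregularized variant written for illustration; the function names, the synthetic mixing matrix, and the source variances are assumptions of the example.

```python
import numpy as np

def csp_filters(trials_a, trials_b, n_pairs=1):
    """Basic CSP. trials_*: arrays (n_trials, n_channels, n_samples), zero mean.

    Returns 2 * n_pairs spatial filters (rows). The first n_pairs minimize
    and the last n_pairs maximize the variance of class A relative to class B.
    """
    def mean_cov(trials):
        return np.mean([x @ x.T / x.shape[1] for x in trials], axis=0)

    Ca, Cb = mean_cov(trials_a), mean_cov(trials_b)
    d, E = np.linalg.eigh(Ca + Cb)
    P = np.diag(d ** -0.5) @ E.T             # whitens the composite covariance
    lam, U = np.linalg.eigh(P @ Ca @ P.T)    # eigenvalues in ascending order
    W = U.T @ P                              # W Ca W^T = diag(lam), W (Ca + Cb) W^T = I
    keep = np.r_[np.arange(n_pairs), np.arange(len(lam) - n_pairs, len(lam))]
    return W[keep]

def log_var(trials, W):
    """Log-variance of each spatially filtered trial, a common CSP feature."""
    return np.array([np.log(np.var(W @ x, axis=1)) for x in trials])

# Synthetic data: 4 latent sources mixed into 4 channels (illustrative values).
rng = np.random.default_rng(0)
A = 0.5 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))  # hypothetical mixing

def simulate(source_sd, n_trials=30, n_samples=500):
    sd = np.asarray(source_sd)[:, None]
    return np.array([A @ (rng.normal(size=(4, n_samples)) * sd)
                     for _ in range(n_trials)])

trials_a = simulate([3.0, 1.0, 1.0, 1.0])    # source 0 is stronger in class A
trials_b = simulate([1.0, 3.0, 1.0, 1.0])    # source 1 is stronger in class B

W = csp_filters(trials_a, trials_b)
fa, fb = log_var(trials_a, W), log_var(trials_b, W)
print("mean log-variance, class A:", np.round(fa.mean(axis=0), 2))
print("mean log-variance, class B:", np.round(fb.mean(axis=0), 2))
```

Because the filters are computed from sample covariances, they are sensitive to the artifact and nonstationarity issues discussed in this section, which is one motivation for the many CSP extensions.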

### C. Sparsity Assumptions

Signal processing exploiting data or parameter sparsity is now emerging as a central tool in BCI design as in many other disciplines and can serve to express assumptions of compactness, nonredundancy, or mutual exclusivity across alternative representations. When applied to suitable data, sparse signal processing and modeling approaches can achieve dramatically better statistical power than methods that ignore sparsity, particularly when applied to sparse but very high-dimension data [114], [115], [116], [117]. Sparse representations may also be regarded as a numerical application of Occam's razor (“among equally likely models the simplest should be favored”). For example, because of functional segregation in cortex, constellations of brain EEG sources linked to a specific aspect of cognitive state of interest (for example, imagining an action) may be assumed to be a sparse subset of the entire source activity [118], [119].

A useful application of sparse signal processing is to precisely estimate EEG source distributions [120], [121], [122], [123]. Potential real-time EEG applications could include online scalp and intracranial EEG source imaging to guide neurosurgeons [50], [124]. Sparse Bayesian learning (SBL) [125] is a particularly promising framework for source localization and modeling of spatio–temporal correlations among sources [16], [126], [127]. Some other popular source localization algorithms are special cases of SBL and can be strengthened within the SBL framework [128]. While not designed for real-time implementation, SBL speed can be enhanced [63], [129].

Spatio–temporal (e.g., groupwise) sparsity has been applied successfully to source signal connectivity [130], [131], [132] (discussed below), where it can lead to substantial reductions in the number of observed data samples required to accurately model a high-dimensional, sparsely structured system. Well-established sparse regression methods such as Lasso [60] provide improved estimates of high-dimensional multivariate connectivity over both unconstrained and regularization approaches [130], [133], [134]. Sparse modeling may also use graph theoretic metrics to extract simple topological features from complex brain networks represented as directed or undirected graphs [132], [135], [136].
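The reduction in required data samples can be illustrated with a small sparse regression problem that has fewer observations than unknown coefficients. The sketch below solves the Lasso by iterative soft thresholding (ISTA), one simple solver among many. The problem sizes, the penalty weight `lam`, and the comparison against the minimum-norm least-squares solution are assumptions chosen for the example, and the generic regression stands in for, e.g., the coefficients of a multivariate connectivity model.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=1000):
    """Minimize (1 / (2 n)) * ||y - X b||^2 + lam * ||b||_1 by ISTA."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n      # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n
        b = soft_threshold(b - grad / L, lam / L)
    return b

# 80 observations, 100 unknown coefficients, only 5 of them nonzero.
rng = np.random.default_rng(0)
n, p = 80, 100
support = np.array([3, 17, 42, 64, 88])
b_true = np.zeros(p)
b_true[support] = [2.0, -2.0, 2.0, -2.0, 2.0]
X = rng.normal(size=(n, p))
y = X @ b_true + 0.1 * rng.normal(size=n)

b_lasso = lasso_ista(X, y, lam=0.1)
b_minnorm = np.linalg.pinv(X) @ y          # unconstrained minimum-norm solution

err_lasso = np.linalg.norm(b_lasso - b_true)
err_minnorm = np.linalg.norm(b_minnorm - b_true)
print(f"estimation error: Lasso {err_lasso:.2f}, minimum-norm {err_minnorm:.2f}")
```

The minimum-norm solution fits the data exactly but spreads energy over all coefficients, whereas the L1 penalty concentrates it on a few. The benefit depends on the sparsity assumption actually holding for the data at hand.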

A complementary and often reasonable assumption is that the biosignal data are smooth across closely related parameters [137], [138]. Several recent further extensions of the sparsity concept, such as low-rank structure [71] or structured sparsity [139], can also be formulated as tractable convex and Bayesian estimation problems.

### D. Exploiting Dynamic Brain Connectivity

Historically, nearly all BCI systems have been based on composed univariate signal features. However, as the primary function of our brains is to organize our behavior (or more particularly, its outcome), modeling brain activity as a set of disjoint cortical processes clearly may ignore information in the EEG data about the complex, precisely timed interactions that may be required to fulfill the brain's primary role. Transient patterns of cortical source synchrony (or other dynamics) that modulate information transmission among noncontiguous brain areas are posited to play critical roles in cognitive state maintenance, information processing, and motor control [140], [141], [142], [143]. Therefore, an ability of BCI systems to monitor dynamic interactions between cortical source processes could provide key information about unobserved cognitive states and responses that might not be obtainable from composed univariate signal analyses [142].

#### 1) Effective Versus Functional Connectivity

Functional connectivity refers to symmetric, undirected correlations among the activities of cortical sources [144]. The earliest functional connectivity studies examined linear cross correlation and coherence between measured EEG scalp signals [145], [146]. These techniques alone carry a serious risk of misidentification in systems involving (closed-loop) feedback, subject to correlated noise, or having strong process autocorrelation [147], [148], [149]. Although neural systems typically exhibit one or more of these characteristics [150], cross correlation and coherence are still among the most commonly used methods for connectivity analysis in the neurosciences [151], [152].

A general deficit of functional connectivity methods is that, being correlative in nature, they cannot be used to identify asymmetric information transfer or causal dependencies between cortical sources. Thus, they cannot distinguish, for instance, between “bottom–up” (sensory $\rightarrow$ cognitive) and “top–down” (cognitive $\rightarrow$ sensory) interactions between a set of sources. In contrast, effective connectivity denotes directed or causal dependencies between brain regions [144]. Currently popular effective connectivity methods include dynamic causal modeling (DCM), structural equation modeling (SEM), transfer entropy (TE), and Wiener–Granger causal (WGC) methods, plus related multivariate methods including the directed transfer function (DTF) (reviewed in [151], [152], [153], [154]). Because of the potentially better fidelity of source-level multivariate effective connectivity models to the underlying cortical dynamics, we foresee a shift in BCI design research in these directions.
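The difference between a symmetric correlation and a directed measure can be seen in a minimal two-signal simulation. The sketch below computes a time-domain Wiener–Granger statistic from least-squares autoregressive fits. The function names, the model order `p=2`, and the simulated coupling (x drives y with a one-sample delay) are assumptions of the example, and no significance test is included.

```python
import numpy as np

def lagged(x, p):
    """Matrix whose columns are x[t-1], ..., x[t-p] for t = p, ..., T-1."""
    T = len(x)
    return np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])

def residual_variance(target, predictors, p):
    """Residual variance when predicting target[t] from p lags of each predictor."""
    Z = np.hstack([lagged(s, p) for s in predictors])
    Z = np.column_stack([np.ones(len(Z)), Z])
    y = target[p:]
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return np.var(y - Z @ coef)

def granger(source, target, p=2):
    """Log ratio of residual variances without vs. with the source's past."""
    restricted = residual_variance(target, [target], p)
    full = residual_variance(target, [target, source], p)
    return np.log(restricted / full)

# Simulated pair: x influences y with a one-sample delay, not vice versa.
rng = np.random.default_rng(0)
T = 3000
x, y = np.zeros(T), np.zeros(T)
ex, ey = rng.normal(size=T), rng.normal(size=T)
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + ex[t]
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + ey[t]

r_xy = np.corrcoef(x, y)[0, 1]   # symmetric: identical for (x, y) and (y, x)
g_xy = granger(x, y)
g_yx = granger(y, x)
print(f"correlation = {r_xy:.2f}, x->y = {g_xy:.3f}, y->x = {g_yx:.3f}")
```

The correlation coefficient reports that the two signals are related but cannot say which drives which. The directed statistic is large only in the simulated direction of influence.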

#### 2) Confirmatory Versus Exploratory Modeling

Methods for effective connectivity analysis generally fall into two categories: confirmatory and exploratory [155]. Confirmatory methods are hypothesis and model driven, seeking to identify the most plausible model among a finite (generally small) set of valid candidates. Conversely, exploratory methods are data driven and capable of searching a large model space without requiring a set of well-formed hypotheses. Confirmatory methods, such as DCM, have demonstrated utility in neurobiological system identification, and may be preferable for confirming a specific hypothesis [156]. However, due to the current paucity of accurate neurobiological models of networks underlying complex cognitive states, and the computational complexity of exploring very large model spaces using DCM, fast exploratory methods such as WGC [157], [158] and extensions thereof may be of greater utility for exploratory BCI research in the near future. As distributed neurobiological interactions are better understood, it will be fruitful to incorporate this understanding explicitly into BCI designs via model constraints or confirmatory model selection.

#### 3) Bivariate Versus Multivariate Connectivity

Recent preliminary BCI designs exploiting connectivity have utilized bivariate functional connectivity estimates such as spectral coherence and phase synchronization measures applied to scalp channel pairs [159], [160], [161] with mixed performance benefits [136], [162], [163]. While these studies have primarily focused on simple motor imagery tasks, the most significant gains from dynamic connectivity modeling seem likely to be achieved when the objective is to identify, in higher density data, a more complex cognitive state or event linked to a specific pattern of multisource network dynamics. However, for even moderately complex networks, bivariate connectivity methods suffer from a high false positive rate due to a higher likelihood of excluding relevant causal variables [164], [165], [166]. This leads to a higher likelihood of incorrectly linking the same connectivity structure to two or more fundamentally different cognitive states, potentially limiting BCI performance. As such, the use of multivariate methods is an important consideration in efficient BCI design.
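The false-positive problem of bivariate analysis can be illustrated with a small simulation. This is a sketch under stated assumptions, not a method from the cited studies: a hidden driver `z` feeds `x` at lag 1 and `y` at lag 2, with no direct link between `x` and `y`. The helper names (`lagged`, `residual_variance`, `granger`) and all parameter values are hypothetical, and the index used is the log ratio of least-squares residual variances from nested linear autoregressive models.

```python
import numpy as np

def lagged(series, order):
    """Stack lags 1..order of each column of an (n, k) array into (n - order, k * order)."""
    n = series.shape[0]
    return np.hstack([series[order - lag : n - lag] for lag in range(1, order + 1)])

def residual_variance(target, predictors):
    """Residual variance of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target)), predictors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def granger(y, x, order, conditioning=None):
    """Log variance ratio for x -> y, optionally conditioned on further series."""
    base = np.column_stack([y] if conditioning is None else [y, *conditioning])
    target = y[order:]
    restricted = lagged(base, order)                       # past of y (and conditioning set)
    full = np.hstack([restricted, lagged(x[:, None], order)])  # ... plus past of x
    return np.log(residual_variance(target, restricted) / residual_variance(target, full))

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)                 # common driver
x = np.zeros(n)
y = np.zeros(n)
x[1:] = 0.9 * z[:-1] + 0.5 * rng.standard_normal(n - 1)   # z drives x at lag 1
y[2:] = 0.9 * z[:-2] + 0.5 * rng.standard_normal(n - 2)   # z drives y at lag 2

bivariate = granger(y, x, order=2)                     # z ignored
conditional = granger(y, x, order=2, conditioning=[z]) # z included
print(f"x -> y ignoring z: {bivariate:.3f}; conditioned on z: {conditional:.4f}")
```

In this construction the bivariate index is large even though `x` has no influence on `y`, while conditioning on the driver brings it close to zero. The same reasoning applies when the excluded variable is an unmodeled cortical source.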

#### 4) Source Versus Sensor Connectivity

Recent and current advances in source separation and localization of electrophysiological signals greatly expand possibilities for explicit modeling of cortical dynamics including interactions between cortical processes themselves. Assessing connectivity in the cortical source domain rather than between surface EEG channel signals has the advantage of greatly reducing the risk of misidentifying network events because of brain and nonbrain source mixing by volume conduction [167], [168]. Shifting to the source domain furthermore allows accumulating knowledge from functional neuroimaging and neuroanatomy to be used to constrain dynamic connectivity models. In particular, noninvasive diffusion-based MR imaging methods are providing increasingly more accurate in vivo estimates of brain anatomical connectivity that might also be used to constrain dynamic connectivity models based on localized EEG source signals.
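The effect of volume conduction on channel-level connectivity can be sketched with two independent simulated sources and a hypothetical 2 x 2 mixing matrix (the matrix values are an assumption chosen for illustration; real forward models are far larger and must be estimated).

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 10_000

# Two statistically independent "cortical sources" (white-noise stand-ins).
sources = rng.standard_normal((2, n_samples))

# Hypothetical mixing: each sensor picks up both sources, as under volume conduction.
mixing = np.array([[1.0, 0.6],
                   [0.6, 1.0]])
sensors = mixing @ sources

source_corr = np.corrcoef(sources)[0, 1]
sensor_corr = np.corrcoef(sensors)[0, 1]

# With the mixing known (in practice it is only estimated, e.g., by ICA or an
# inverse model), unmixing restores the original, uncorrelated source signals.
recovered = np.linalg.solve(mixing, sensors)
recovered_corr = np.corrcoef(recovered)[0, 1]

print(f"source corr {source_corr:.3f}, sensor corr {sensor_corr:.3f}, "
      f"recovered corr {recovered_corr:.3f}")
```

For unit-variance independent sources and this mixing matrix, the expected sensor correlation is $2(0.6)/(1 + 0.6^2) \approx 0.88$, even though the sources themselves do not interact. Any connectivity measure applied directly to the two sensor signals would therefore report a strong coupling that is purely an artifact of mixing.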

#### 5) Adapting to Nonlinear and Nonstationary Dynamics

Electrophysiological data exhibit significant spatio–temporal nonstationarity and nonlinear dynamics [142], [150]. Some adaptive filtering approaches that have been proposed to incorporate nonstationarity include segmentation-based approaches [169], [170] and factorization of spectral matrices obtained from wavelet transforms [171]. However, these techniques typically rely on multiple realizations to function effectively, hindering their application in BCI settings. Among the most promising alternatives are state–space representations (SSR) that assume the observed signals are generated by a partially observed (or even fully hidden) dynamical system that can be nonstationary and/or nonlinear [172], [173], [174]. A class of methods for identifying such systems includes the long-established Kalman filter [175] and its extensions including the cubature Kalman filter [176], which exhibits excellent performance in modeling high-dimensional nonstationary and/or nonlinear systems [177]. These methods have led in turn to extensions of the multivariate Granger-causality concept that allow for nonlinearity and/or nonstationarity while (in part) controlling for exogenous or unobserved variables [174], [178], [179]. SSRs may also flexibly incorporate structural constraints [180], sparsity assumptions [181], and non-Gaussian, e.g., sparse (heavy-tailed) process distribution priors [182]. A final advantage of the state–space framework is the potential to jointly perform source separation and/or localization (as in ICA) together with identification of source dynamics and their causal interactions, all within a single unified state–space model [134], [183].
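The state-space idea can be sketched in its simplest form: a scalar linear Kalman filter that tracks a time-varying autoregressive coefficient from a single realization. This is deliberately the plain Kalman filter, not the cubature variant or any of the cited multivariate methods. The random-walk state model, the function name `track_ar1`, and the noise variances `q` and `r` are illustrative assumptions.

```python
import numpy as np

def track_ar1(x, q=1e-5, r=1.0):
    """Track a(t) in x[t] = a(t) * x[t-1] + noise, assuming a(t) follows a random walk.

    q: assumed variance of the random-walk step of a(t)
    r: assumed variance of the observation noise
    """
    a_hat, p = 0.0, 1.0                 # initial state estimate and its variance
    estimates = np.zeros(len(x))
    for t in range(1, len(x)):
        p = p + q                       # predict: the state may have drifted
        h = x[t - 1]                    # time-varying observation coefficient
        gain = p * h / (h * h * p + r)  # Kalman gain
        a_hat = a_hat + gain * (x[t] - h * a_hat)   # correct with the innovation
        p = (1.0 - gain * h) * p
        estimates[t] = a_hat
    return estimates

# Simulate a nonstationary AR(1) process whose coefficient flips sign halfway.
rng = np.random.default_rng(0)
n = 6000
true_a = np.where(np.arange(n) < n // 2, 0.5, -0.5)
x = np.zeros(n)
for t in range(1, n):
    x[t] = true_a[t] * x[t - 1] + rng.standard_normal()

estimates = track_ar1(x)
before_switch = estimates[2500:3000].mean()
after_switch = estimates[-500:].mean()
print(f"estimate before switch {before_switch:.2f}, after switch {after_switch:.2f}")
```

The filter adapts sample by sample and needs no repeated trials, which is the property that makes state-space methods attractive for online BCI use. The choice of `q` trades tracking speed against estimate variance.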

The developments we briefly describe above suggest that robust and efficient exploratory causal identification in high-dimensional, partially observed, noisy, nonstationary, and nonlinearly generated EEG and other electrophysiological signal sets may become a reality in the coming decades. Leveraging the benefits of such approaches to maximize the range and robustness of brain-based prosthetic control and cognitive state assessment capabilities has great potential to become a key area of BCI research and development.

### E. Unified Modeling Approaches

Sparsity and Gaussianity are principles integral to both machine learning and signal processing, exemplifying the wide-ranging low-level connections between these disciplines despite differences in their problem formulations. Other well-known links are convex optimization and graphical models. Deep connections like these tend to seed or enable the development of new methods and frameworks in interdisciplinary domains such as BCI design [71], [125], [184] and will likely be anchors for future BCI methodology. For instance, while a common operating procedure in current BCI system design is to extract and pass domain-specific features through a processing pipeline built of standard signal processing and ML blocks, the resulting approach may be neither a principled nor an optimal solution to the overall problem. For example, it has been shown that several of the most commonly used multistage BCI approaches (including CSP followed by LDA) can be replaced by a single joint optimization solution (dual-spectral regularized logistic regression) that is provably optimal under principled assumptions [71]. A similar unification can also be naturally realized in hierarchical Bayesian formulations [184]. Unified domain-specific approaches like these may however require mathematically sophisticated problem formulations and custom implementations that cut across theoretical frameworks.

### F. General Purpose Tools

However, it is now easier than ever for application-oriented scientists to design, verify, and prototype new classes of methods, thanks to powerful tools like CVX for convex optimization [185], BNT [186], BUGS [187], or Infer.NET [188] for graphical modeling, and more specialized but fast and still generalizable numerical solvers like DAL [189], glm-ie [190], or ADMM [191]. These and other state-of-the-art tools are transforming what would have been major research projects only 3–5 years ago into Matlab (The Mathworks, Inc.) three-liners that are already finding their way into graduate student homework in statistical estimation courses. As unified modeling and estimation/inference frameworks become easier to use, more powerful, and more pervasive, developing principled and more close-to-optimal solutions will require far less heavy lifting than today, leading to our expectation that they will soon become the norm rather than the exception.

### G. Mining Large Data Resources

Overcoming the moderate performance ceiling of the current generation of BCI systems can be viewed as one of the greatest challenges for BCI technology development in the 21st century. One possible answer may lie in “scaling up” the problem significantly beyond currently routine data collection and signal processing limits. In the future, more and more intensive computations will likely be performed to calibrate brain activity models for individual users, and these may take advantage of increasing volumes of stored and/or online data. The potential advantages of such dual approaches to difficult problems are common concepts in the current era of “big data” but have not yet been much explored in BCI research.

For example, current recording channel numbers are orders of magnitude smaller than what will soon be or are already possible using ever-advancing sensing, signal processing, and signal transmission technology (see [5]), though it is not known when diminishing returns appear in the amount of useful information about brain processes that can be extracted from EEG data as channel numbers increase. Also, adaptive signal processing methods might continue to adapt and refine the model of brain processes used in working BCIs during prolonged use, thereby enhancing their performance beyond current time-limited laboratory training and use scenarios. In the majority of contemporary BCI systems, the amount of training data used for estimation of optimal model parameters amounts to not more than a one-hour single-subject calibration session, though for stable ML using much more data would be preferable.

Another major limiting factor in BCI performance lies in the tradeoff between adapting a model to the complex context in which calibration data are initially acquired (for example, subject fatigue or stress level, noise environment, subject intent, etc.) and the desirable ability of the model to continue to perform well in new operational conditions. Although most BCI training methods attempt to maximize generalization of their performance across the available range of training and testing data, the lack of truly adequate data that span a sufficiently rich range of situations that could be potentially or likely encountered in practice severely limits the currently achievable generalization of BCI performance.

#### 1) Collaborative Filtering

For some applications, “zero-training” [192] or “cross-subject” BCI designs, which are usable without any need for individual calibration, are highly desirable or even necessary. However, pure zero-training BCI methods sacrifice performance compared to BCI designs using individualized models. A promising solution is to combine some training data or information from a targeted subject with stored data and information from a large collection of training sets collected from similar subjects [67], [111], [193]. While incorporating relatively small amounts of such data might give only marginal improvement, the availability of massive amounts of data can make learning tractable in parameter spaces of previously unthinkable size, thereby gaining enough statistical power to tune predictive models to more complex and highly specific brain and behavioral models.

The continued improvement in both the availability and the cost of computational and memory resources makes possible the use of methods for production work that were infeasible only a few years ago, particularly when it comes to mining massive data sets (for example, brain and behavioral recordings from hundreds to many thousands of people) or solving optimization problems that involve hundreds of thousands of parameters (for example, large-scale functional connectivity estimates or full joint time/space/frequency modeling of brain dynamics).

As a consequence, it has become possible to replace usual ad hoc simplifying assumptions (including the need for massive data dimensionality reduction) [194], [195] by data-relevant assumptions (such as source or model sparsity or smoothness)—or to propose to entirely displace those assumptions by processing vast samples of routine (or perhaps even high-quality) data from a large group of subjects—and thereby achieve asymptotic optimality under quite mild conditions. Only after the first such projects are carefully explored will it be possible to empirically estimate the ultimate BCI system performance bounds attainable for a given goal and data type, thereby informing further future roadmaps for sensor and processing technology development.

The potential effectiveness of such approaches is rooted in the genetic and social commonalities across subjects that make it possible to find statistical support in the form of empirical priors or constraints across populations (or subpopulations) of users. In practice, this direction requires further generalization of the problem formulation and some dedicated assumptions as to the commonalities that are to be exploited (for example, via coregistration or alignment of individual signal representations [196]). This further increases the need for approaches that are both highly adapted to the particular data domain yet principled and (ideally) provably optimal under reasonable assumptions.
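One simple way to picture an empirical prior across a population is to shrink a new user's model toward the average of previously fitted models instead of toward zero. The following sketch assumes a linear model, simulated weights for 20 earlier subjects, and a new subject with only 8 calibration trials. All names and values, including `ridge_toward`, are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_features, n_prior_subjects, n_trials = 10, 20, 8

# Simulated population: each subject's weights are a small deviation from a shared pattern.
population_w = rng.standard_normal(n_features)
prior_ws = population_w + 0.1 * rng.standard_normal((n_prior_subjects, n_features))
prior_mean = prior_ws.mean(axis=0)          # empirical prior from stored subjects

# New subject with fewer calibration trials than features.
true_w = population_w + 0.1 * rng.standard_normal(n_features)
X = rng.standard_normal((n_trials, n_features))
y = X @ true_w + rng.standard_normal(n_trials)

def ridge_toward(X, y, center, lam):
    """Closed-form minimizer of ||y - X w||^2 + lam * ||w - center||^2."""
    gram = X.T @ X + lam * np.eye(X.shape[1])
    return center + np.linalg.solve(gram, X.T @ (y - X @ center))

w_zero = ridge_toward(X, y, np.zeros(n_features), lam=10.0)   # conventional ridge
w_pop = ridge_toward(X, y, prior_mean, lam=10.0)              # population-informed

err_zero = np.linalg.norm(w_zero - true_w)
err_pop = np.linalg.norm(w_pop - true_w)
print(f"error with zero prior {err_zero:.2f}, with population prior {err_pop:.2f}")
```

The benefit shown here depends entirely on the simulated assumption that subjects resemble one another in the chosen representation, which is why coregistration or alignment of individual signal representations matters in practice.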

#### 2) Transfer Learning

A theoretical framework that extends ML in these directions has recently been termed transfer learning [197]. Transfer learning approaches [198] have been successfully applied in a variety of fields including computer vision [199] and natural language processing. The first such approaches have recently been demonstrated for BCI training [67], [200]. Only algorithms that are designed to exploit commonalities across similar users (for example, users of similar age, gender, expertise, etc.), tasks (for example, detecting either self-induced or machine errors), and operational context (e.g., while composing a letter or a piece of music) will be able to leverage this soon-available auxiliary data to maximum effect.

Collaborative filtering and transfer learning methods [201] designed to take advantage of such databases have been in development for several years in other areas, mostly fueled by needs of companies with ever-growing data resources such as Amazon [202], Netflix [203], and Google [204]. These methods try to estimate some information about each user (e.g., their movie preferences, or for BCI their brain activity patterns) from small amounts of information from that user (text query or brain signal) combined with incomplete stored information from many other users (even concurrently arriving information as in crowd sourcing). These techniques have the potential to elevate the performance and robustness of BCI systems in everyday environments, likely bringing about a paradigm shift in prevailing attitudes toward BCI capabilities and potential applications.

### H. Multimodal BCI Systems

Proliferation of inexpensive dry and wireless EEG acquisition hardware, coupled with advances in wearable electronics and smartphone processing capabilities, may soon result in a surge of available data from multimodal recordings of many thousands of “untethered” users of personal electronics. In addition to information about brain activity, multimodal BCI systems may incorporate other concurrently collected physiological measures (respiration, heart and muscle activities, skin conductance, biochemistry), measures of user behavior (body and eye movements), and/or ongoing machine classification of user environmental events and changes in subject task or challenge from audio, visual, and even thermal scene recording. We have recently termed brain research using concurrent EEG, behavioral, and contextual measures collected during ordinary motivated behavior (including social interactions) mobile brain/body imaging (MoBI) [205]. Behavioral information may include information regarding human body kinematics obtained from motion capture [206], facial expression changes from video cameras or EMG sensors [207], and eye tracking [208]. This information will likely be partially tagged with high-level contextual information from computer classification.

Progress in modeling each data domain, then capturing orderly mappings between each set of data domains, and mapping from these to the target cognitive state or response presents significant challenges. However, the potential benefits of effective multimodal integration for a much wider range of BCI applications to the general population might be large, possibly even transformative [6]. The most effective approaches for comparing and combining disparate brain and behavioral data are not yet known. Some progress has been made in integrating brain activity with subject behavioral data in hybrid BCI systems [38] and MoBI [209], [210]. Joint recordings of EEG and functional near-infrared brain imaging data are possible and could be near-equally lightweight [211].

Through efforts to develop more accurate automated affect recognition systems, the field of affective computing has shifted toward incorporating multimodal data—multiple behavioral (posture, facial expression, etc.) as well as physiological measurements (galvanic skin response, cardiac rhythmicity, etc.) [212]. EEG dynamics have recently been shown to contain information about the emotional state of the subject [213] and even the affective quality of the music the subject is listening to [214] or imagining [215], with different patterns of neck and scalp muscle activity, recorded along with the EEG, contributing additional nonbrain but also potentially useful information.

To overcome the significant challenge of learning mappings between high-dimensional, sometimes noisy, and disparate brain and behavior modalities, it should be fruitful to draw on successful applications of multimodal analysis in established fields such as affective and context-aware computing [216], as well as more general advances in ML such as nonlinear dimensionality reduction, nonnegative matrix and tensor factorization [95], multiview clustering and canonical correlation analysis [217], [218], [219], meta-classification approaches [220], [221], and hierarchical Bayesian models and deep learning networks [184], [222]. Further incorporation of natural constraints and priors derived from cognitive and systems neuroscience and neuroanatomy, as well as those obtained from multisubject transfer learning on large data sets, might allow design and demonstration of a new generation of powerful and robust BCI systems for online cognitive state assessment and other applications. A particular type of constraint or assumption that is still underrepresented in BCI and cognitive state assessment algorithms today is probabilistic cognitive modeling, which allows the incorporation of finer grained and potentially very complex knowledge about statistical dependencies between cognitive state variables as well as their relationship to behavioral dynamics.
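Of the techniques listed above, canonical correlation analysis is perhaps the easiest to sketch: it finds linear combinations of two feature sets that are maximally correlated. The example below is a minimal illustration on simulated data, in which a single latent state drives both an "EEG" and a "behavior" feature set. The feature dimensions, the gain of 2.0, and the helper name `canonical_correlations` are assumptions, and the QR/SVD route is one standard way to compute the correlations.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X (n, p) and Y (n, q).

    Uses orthonormal bases of the centered data (via QR); the singular values
    of Qx^T Qy are the canonical correlations, in descending order.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    return np.linalg.svd(Qx.T @ Qy, compute_uv=False)

rng = np.random.default_rng(3)
n = 2000
latent = rng.standard_normal(n)             # shared (unobserved) state

# Each modality observes the latent state through noisy features.
eeg_features = 2.0 * latent[:, None] + rng.standard_normal((n, 5))
behavior_features = 2.0 * latent[:, None] + rng.standard_normal((n, 3))

rho = canonical_correlations(eeg_features, behavior_features)
print("canonical correlations:", np.round(rho, 3))
```

With one shared latent variable, only the first canonical correlation should be large and the remaining ones should stay near the chance level set by the sample size. Real multimodal recordings would additionally require handling of differing sampling rates, delays, and regularization for high-dimensional features.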

SECTION IV

## THE FUTURE EVOLUTION OF BCI METHODS

### A. BCI Technology for Cognitive and Mental State Assessment

For BCI technology to exploit information provided from expert systems modeling brain and behavioral data (such as EEG and motion capture and/or scene recording), in combination with rich if imperfect information about subject environment and intent, some modality-independent representations of high-level concepts including human cognitive and affective processing states could be highly useful. These representations could act as common nodes connecting different aspects of each “state concept.” As a simple example, an emotional tone might simultaneously affect EEG dynamics, gestural dynamics (captured by body motion capture), and facial expression (captured by video and/or EMG recording). Development of intermediate representations of cognitive state could not only facilitate the performance and flexibility of BCI systems, but also make their results more useful for other expert systems, again considering that both BCI systems and intelligent computer systems will doubtlessly grow far beyond their current levels of detail and complexity. Possibly such representations might not map simply onto current psychological terminology.

Availability of such representations and appropriate functional links to different types of physiological and behavioral data would make it possible to combine and exploit information from a variety of source modalities within a common computational framework. Cognitive state and response assessment, evolved to this stage, may lead to the development of more efficient and robust human–machine interfaces in a wide range of settings, from personal electronics and communication to training, rehabilitation, and entertainment programs and within large-scale commercial and military systems in which the human may increasingly be both the guiding intelligence and the most unreliable link.

### B. BCI Technology and the Scientific Process

Currently, we are witnessing massive growth in the amount of data being recorded from the human brain and body, as well as from computational systems with which humans typically interface. In future years and decades, mobile brain/body recording could easily become near ubiquitous [5]. As the amount of such data continues to increase, semiautomated data mining approaches will become increasingly relevant both for mundane BCI purposes such as making information or entertainment resources available to the user when they are most useful and also for identifying and testing novel hypotheses regarding brain support for human individual or social cognition and behavior. BCIs, considered as a form of data mining, will increasingly provide a means for identifying neural features that predict or account for some observable behavior or cognitive state of interest. As such, these methods can be of significant use for taking inductive/exploratory steps in scientific research. In recent years, this concept has been considered by a growing number of researchers [223], [224], [225], [226]. The meaningful interpretability of the features or feature structures learned by such methods is required to allow them to make a useful contribution to scientific reasoning.

Scientifically accepted hypotheses about brain and behavior may be incorporated into subsequent generations of BCI technology, improving the robustness, accuracy, and applicability of such systems—both for BCI applications with immediate real-world utility as well as for data exploration for scientific purposes. Thus, machine intelligence may increasingly be used “in the loop” of the scientific process, not just for testing scientific hypotheses, but also for suggesting them—with, we believe, potential for generally accelerating the pace of scientific progress in many disciplines. Bayesian approaches are particularly well suited to hypothesis refinement (e.g., taking current hypotheses as refinable priors) and thus may represent a particularly appropriate framework for future generations of integrative BCI technology as continually expanding computational power makes more and more complex Bayesian analyses feasible.

### C. BCI Systems: The Present and the Future

Today, the BCI field is clearly observing an asymptotic trend in the accuracy of EEG-based BCI systems for cognitive state or intent estimation, a performance trend that does not appear to be converging to near-perfect estimation performance but rather to a still significant error rate (5%–20% depending on the targeted cognitive variable). For BCI technology based on wearable or even epidermal EEG sensor systems [227] to become as useful for everyday activity as computer mice and touch screens are today, technological and methodological breakthroughs will be required that are not likely to represent marginal improvements to current information processing approaches. Some such breakthroughs may be enabled by Moore's (still healthy) law, which should continue to allow scaling up, likely also by orders of magnitude, of both the amount of information integrated and the amount of offline and online computation performed. However, these processing capabilities almost certainly will need to be leveraged by new computational approaches that are not considered under today's resource constraints, and that are thus quite possibly beyond those outlined or imagined here.

Continued advances in electrophysiological sensor technology also have enormous potential to allow BCI performance breakthroughs, possibly via extremely high-channel count (thousands) and high-signal-to-noise ratio (SNR; near physical limits) noninvasive electromagnetic sensing systems that, combined with sufficient computational resources, could conceivably allow modeling of brain activity and its nonstationarity at a range of spatio–temporal scales. Alternatively, to reach this level of information density, safe, relatively high-acceptance medical procedures might possibly be developed and employed to allow closer-to-brain source measurements (see [5]).

Continuing advances in BCI technology may also increase imaginative interest, at least, in future development of bidirectional, high-bandwidth modes of communication that bypass natural human sensory pathways, with potential to affect—positively and/or negatively—many aspects of human individual and social life and society. Meanwhile, the expanding exploration of potential means and uses for BCI systems should continue to generate excitement in the scientific and engineering communities as well as in popular culture.

### Acknowledgment

The authors would like to thank Zeynep Akalin Acar for use of her EEG simulation (Fig. 1) and Jason Palmer and Clemens Brunner for useful discussions.

## Footnotes

This work was supported by the Army Research Laboratory and was accomplished under Cooperative Agreement Number W911NF-10-2-0022, as well as by a gift from The Swartz Foundation, Old Field, NY. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

S. Makeig is with the Swartz Center for Computational Neuroscience, Institute for Neural Computation and the Department of Neurosciences, University of California San Diego (UCSD), La Jolla, CA 92093-0559 USA (e-mail: smakeig@ucsd.edu).

C. Kothe is with the Swartz Center for Computational Neuroscience, Institute for Neural Computation, University of California San Diego (UCSD), La Jolla, CA 92093-0559 USA.

T. Mullen is with the Swartz Center for Computational Neuroscience, Institute for Neural Computation and the Department of Cognitive Science, University of California San Diego (UCSD), La Jolla, CA 92093-0559 USA.

N. Bigdely-Shamlo, Z. Zhang, and K. Kreutz-Delgado are with the Swartz Center for Computational Neuroscience, Institute for Neural Computation and the Department of Electrical and Computer Engineering, University of California San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093-0559 USA.
