Alpha Oscillations During Effortful Continuous Speech: From Scalp EEG to Ear-EEG

Objective: The purpose of this study was to investigate alpha power as an objective measure of effortful listening in continuous speech with scalp and ear-EEG. Methods: Scalp and ear-EEG were recorded simultaneously during presentation of a 33-s news clip in the presence of 16-talker babble noise. Four different signal-to-noise ratios (SNRs) were used to manipulate task demand. The effects of changes in SNR on alpha event-related synchronization (ERS) and desynchronization (ERD) were investigated. Alpha activity was extracted from scalp EEG using different referencing methods (common average and symmetrical bi-polar) in different regions of the brain (parietal and temporal), and from ear-EEG. Results: Alpha ERS decreased with decreasing SNR (i.e., increasing task demand) in both scalp and ear-EEG. Alpha ERS was also positively correlated with behavioural performance, which was based on questions regarding the contents of the speech. Conclusion: Alpha ERS/ERD is better suited to tracking performance on a continuous speech task than to tracking listening effort. Significance: EEG alpha power in continuous speech may indicate how well the speech was perceived, and it can be measured with both scalp and ear-EEG.


I. INTRODUCTION
LISTENING effort is defined as "the deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a [listening] task" [1]. Listening effort may lead to unwanted consequences, such as fatigue or difficulty in comprehending and remembering speech [2]. Physiological measurements including pupillometry [3], [4], [5] and electroencephalography (EEG) [6], [7] have shown an inverted U-shaped pattern of objective listening effort as a function of task demand. This inverted U-shape demonstrates that increased difficulty of a task leads to increased effort as long as there are enough cognitive resources available to benefit performance of the task. If the task becomes too difficult, the listener disengages from the listening task, which leads to a drop in the objective measure of effort [1].
EEG has been widely used in the literature to measure effort objectively, whether in auditory [7], [8], [9], [10], [11] or nonauditory [12], [13], [14] studies. Specifically, alpha oscillations have been the key measure in most of these studies concerning effort. How alpha power plays a role in the brain during an effortful task, however, is not completely understood. It has been suggested that an increase in alpha oscillations inhibits task-irrelevant areas [15] and can be used as a neural correlate of listening effort (see review [16]). The problem with alpha power is that it is not merely associated with effort. Alpha power also correlates with optimal task performance [17] or it might be used to predict speech intelligibility [18]. Given that effort and performance could vary independently of each other [19], [20], [21] this raises an important question of when alpha power reflects both or either of these concepts.
Traditional EEG, which is also known as scalp EEG, has enjoyed wide attention from brain researchers, due to its noninvasiveness and excellent temporal resolution, along with several other advantages [22]. However, despite high quality recordings by scalp EEG, it is not yet suitable as a wearable technology, instead restricting the wearer's movement and limiting measurement to laboratory experiments [23]. However, EEG signals can now be picked up by electrodes placed inside the ears, which is known as ear-EEG, allowing ambulatory measurement of brain activity both in the laboratory and at home [23], [24]. Despite the attraction of ear-EEG as a wearable technology, it still suffers from poorer spatial resolution compared to scalp EEG, as the signals are limited to the cortical areas close to the ears [25]. Nonetheless, researchers have shown that traditional analyses, such as P300 detection [24], frequency-domain representation [26], steady-state evoked potential [27], or decoding selective attention [28], [29], [30], commonly studied via scalp EEG, are also evident in ear-EEG. However, listening effort has not yet been explored using ear-EEG.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the present study, we investigated the effects of changing task demand by manipulating the signal-to-noise ratio (SNR) of a continuous speech-in-noise task on EEG data. We used continuous speech as it represents a more realistic listening scenario than traditional short-sentence paradigms. Both scalp and ear-EEG were recorded simultaneously during this task. We previously conducted two experiments with similar long, continuous speech to investigate the effects of varying SNR on alpha power (see [31], [32]), using only scalp EEG. In both studies we observed that more demanding conditions led to decreased alpha power. In this study, we aimed to use a wider range of SNR compared to the two aforementioned studies to capture a bigger picture of alpha changes across different task demands. Therefore, we implemented four different SNRs (−16, −8, −4, +8 dB), creating very low to very high demand conditions for a normal-hearing person listening to continuous speech. Based on the results of our two previous experiments, we expected that alpha power would increase with increasing SNR in scalp EEG. Using ear-EEG for the first time in this paradigm, we then investigated how the measured alpha power in scalp EEG and ear-EEG compared to each other.

II. METHODS
A. Participants
Sixteen (4 females) normal-hearing Danish-speaking adults (average age of 42.4 ± 11.4 years) participated in this study. One male participant was excluded due to a problem in sending triggers to the EEG device. All participants signed a written consent form before the experiment. Ethical approval of this study was obtained from the Research Ethics Committees of the Capital Region of Denmark. No participant suffered from neurological or hearing disorders. The pure-tone average of air conduction thresholds at 0.5, 1, 2 and 4 kHz (PTA4) was measured to assess hearing ability and confirmed to be below 25 dB HL.

B. Experimental Setup
The experimental setup consisted of five loudspeakers positioned around the participant at 1.2 m distance. The target loudspeaker was positioned at 0° azimuth in front of the listener. The background noise, consisting of 4-talker babble, was presented from four loudspeakers located at ±90° and ±150° azimuth (16-talker babble noise in total). The spatial setup of the test is illustrated in Fig. 1.
Stimuli were routed through a sound card (RME Hammerfall DSP Multiface II, Audio AG, Germany) and were played via KEF Q300 loudspeakers (KEF, United Kingdom).
A BioSemi ActiveTwo amplifier system (Biosemi, Netherlands) was used for EEG recording with a 64-channel cap mounted according to the international extended 10-20 system and a sampling frequency of 1024 Hz. The cap included DRL and CMS electrodes as references for all other recording electrodes. Conductive gel was applied to the electrodes to obtain stable offset voltages below 50 mV.
Fig. 1. Background noise was presented from the four loudspeakers (in red) for 38 s. 5 s after the onset of the background, the target was presented from the front loudspeaker (in blue) for 33 s. At the end of each trial, the participants answered a two-choice question regarding the contents of the audio clip.
The ear-EEG recordings were acquired at a sampling rate of 1 kHz with a 32-channel portable TMSi MOBITA EEG amplifier (TMSi, Netherlands). In addition, the amplifier enabled active shielding (guarding) of the ear-electrodes all the way to the backside of each of the 12 electrodes. For each participant, earmould impressions were taken in a session before the test in order to make a personalized ear-EEG [33].

C. Design
Non-dramatic Danish news clips of neutral content were used for the target speech (33 s), as well as the background noise (38 s). Background noise consisted of 4-talker babble presented from each of the four loudspeakers in the back, resulting in a 16-talker babble noise.
The A-weighted sound pressure level produced by the babble was fixed at 70 dB overall (64 dB per loudspeaker) at the center of the loudspeaker array, where the participants were seated. The level of the target was varied across trials from 54 to 78 dB to generate four different SNRs: −16, −8, −4 and +8 dB. In this study, SNR was defined as the long-term average sound level of the target signal (with pauses longer than 200 ms being cut out) compared to the background noise.
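The level arithmetic in this paragraph can be checked with a short sketch. It assumes that the four babble sources add incoherently (power summation), which the text does not state explicitly, and that the target level is simply the nominal babble level plus the SNR. All function and variable names are illustrative.

```python
import math


def sum_levels_db(levels_db):
    """Combine sources by summing their powers (assumes incoherent sources)."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


# Four babble loudspeakers at 64 dB each, as described in the text.
babble_total_db = sum_levels_db([64.0] * 4)

# Target level for each SNR, relative to the nominal 70 dB babble level.
NOISE_LEVEL_DB = 70.0
snrs_db = [-16, -8, -4, 8]
target_levels_db = [NOISE_LEVEL_DB + snr for snr in snrs_db]

print(f"babble total: {babble_total_db:.2f} dB")
print("target levels (dB):", target_levels_db)
```

Under these assumptions the combined babble level comes out close to the stated 70 dB, and the target levels span the stated 54 to 78 dB range.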
There were 21 trials for each SNR, randomly distributed across 84 trials (with a rest period every 28 trials), and an additional 4 training trials at the beginning of the test. Each trial (Fig. 1) lasted 38 s while the 16-talker babble played in the background. The target speech was presented 5 s after the onset of the babble (i.e., after the baseline period) and then continued for 33 s, followed by a two-choice question about the content of the attended target audio clip. The percentage of correct answers was taken as performance accuracy.
Fig. 2. [...] (30) were used for further analysis (28 Ks and 2 Fs). The unusable electrodes below the "No Connection" or above "> 6 x Normalized SD" had no output or were too noisy, respectively.

D. Scalp EEG Pre-Processing
For the analysis of the scalp EEG, power line noise was first rejected with a 50-Hz notch filter with a quality factor of 25. Then a 3rd-order zero-phase Butterworth bandpass filter with cutoff frequencies of 1 and 40 Hz was applied to the data, and the resulting signals were down-sampled to 256 Hz. Bad channels and trials were removed by visual inspection. On average, 1.8 bad channels were detected per participant and interpolated using spline interpolation [34]. In addition, 9.2% of the trials across all participants were rejected. No participant had more than 23.8% of trials rejected. The remaining trials were denoised using the joint decorrelation method, as described in [35], [36].
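To illustrate what a 50-Hz notch with a quality factor of 25 does at the 1024 Hz sampling rate, the sketch below builds a standard second-order IIR notch (biquad) and evaluates its magnitude response at a few frequencies. The specific filter design used by the authors is not stated, so this design and all names in the code are assumptions; the Butterworth bandpass and the down-sampling step are not covered.

```python
import cmath
import math


def notch_coefficients(f0_hz, q, fs_hz):
    """Second-order IIR notch whose -3 dB bandwidth is roughly f0 / Q."""
    w0 = 2 * math.pi * f0_hz / fs_hz   # notch frequency in rad/sample
    bandwidth = w0 / q                 # -3 dB bandwidth in rad/sample
    beta = math.tan(bandwidth / 2)
    gain = 1 / (1 + beta)
    b = [gain, -2 * gain * math.cos(w0), gain]
    a = [1.0, -2 * gain * math.cos(w0), 2 * gain - 1]
    return b, a


def magnitude_response(b, a, f_hz, fs_hz):
    """Evaluate |H| of the biquad at a single frequency."""
    z_inv = cmath.exp(-2j * math.pi * f_hz / fs_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(num / den)


FS_HZ = 1024  # scalp EEG sampling rate from the text
b, a = notch_coefficients(50.0, 25.0, FS_HZ)

for f in (10, 49, 50, 51):
    print(f"{f:>3} Hz: |H| = {magnitude_response(b, a, f, FS_HZ):.3f}")
```

With Q = 25 the stop band is only about 2 Hz wide (50/25), so in this sketch the alpha band is left essentially untouched while the 50-Hz component is removed.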
Two different referencing methods were applied to the resulting signals. The first method was common average referencing, in which the average of all channels is subtracted from every single channel [22]. The second method was symmetrical bi-polar referencing, in which two channels mirrored to each other in opposite hemispheres are subtracted from one another (e.g., T7 - T8). The latter referencing approach allowed a more direct comparison of the results of scalp and ear-EEG. The resulting signals were used for power extraction.
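The two referencing schemes can be sketched in a few lines. The channel values below are arbitrary toy numbers, and the function names are illustrative rather than taken from any toolbox.

```python
def common_average_reference(data):
    """Subtract the across-channel mean from every channel.

    data: dict mapping channel name -> list of samples (equal lengths).
    """
    channels = list(data)
    n_samples = len(data[channels[0]])
    mean = [
        sum(data[ch][i] for ch in channels) / len(channels)
        for i in range(n_samples)
    ]
    return {ch: [x - m for x, m in zip(data[ch], mean)] for ch in channels}


def symmetric_bipolar(data, left, right):
    """Subtract the right-hemisphere channel from its left-hemisphere mirror."""
    return [l - r for l, r in zip(data[left], data[right])]


# Toy three-sample recording; the values are made up for illustration.
eeg = {
    "T7": [1.0, 2.0, 3.0],
    "T8": [0.5, 1.0, 1.5],
    "Pz": [3.0, 3.0, 3.0],
}

car = common_average_reference(eeg)
t7_minus_t8 = symmetric_bipolar(eeg, "T7", "T8")

print(car)
print(t7_minus_t8)
```

After common average referencing the channels sum to zero at every sample, whereas the bi-polar derivation keeps only the difference between the two mirrored sites, which is the property shared with the left-minus-right ear-EEG derivation.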

E. Ear-EEG Pre-Processing
For the analysis of the ear-EEG, power line noise was first rejected with a 50-Hz notch filter with a quality factor of 25. Then a 3rd-order zero-phase Butterworth bandpass filter with cutoff frequencies of 1 and 40 Hz was applied to the data. The resulting signals were down-sampled to 256 Hz to match the sampling frequency of the scalp EEG for further comparison. In total, 5 trials were missing from the ear-EEG data due to trigger issues. In addition to those, bad trials were removed by visual inspection, which summed to 10.8% of the trials across all participants being rejected. No participant had more than 27.2% of trials in the ear-EEG data rejected.
The first step in using ear-EEG was to select the best pair of electrodes, one in each ear. Given that the signal quality of each electrode varied from participant to participant, we chose the most consistent electrodes based on their normalized standard deviation (SD). For this purpose, the SDs of all electrodes within each participant were normalized to the smallest SD in that participant (except when there was no connection for an electrode). Fig. 2 summarizes the normalized SD across all electrodes and participants (in both left and right ears). The further the normalized SD of a channel was from 1, the more likely the channel was to contain noise. Based on this, the K electrode had the most consistent normalized SD and was thus chosen as the main electrode for further analysis. However, based on the same metric, the quality of the K electrode was poor in two participants, and the F electrode was used to replace it. The F electrode is located close to the K electrode, as both are placed close to the ear canal (Fig. 2, top panel).
After finding the best electrode pair for each participant (13 right-ear K-electrode paired with left-ear K-electrode, and 2 right-ear K-electrode paired with left-ear F-electrode), the right electrode was subtracted from the left one (similar to symmetrical referencing in the scalp EEG) and the resulting signal was used for power extraction in ear-EEG.
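The selection and derivation steps can be sketched as follows. The fallback rule (use F whenever K is missing or exceeds the "> 6 x Normalized SD" limit from Fig. 2) is a hypothetical simplification, since the selection in the study was made by inspecting the normalized SDs; the data values and all names are made up for illustration.

```python
import statistics

NOISE_LIMIT = 6.0  # "> 6 x Normalized SD" marks an electrode as too noisy (Fig. 2)


def normalized_sd(signals):
    """Normalize each electrode's SD to the smallest SD in that ear.

    signals: dict electrode -> list of samples, or None when not connected.
    """
    sds = {
        name: statistics.pstdev(samples)
        for name, samples in signals.items()
        if samples is not None
    }
    smallest = min(sds.values())
    return {name: sd / smallest for name, sd in sds.items()}


def pick_electrode(signals, preferred="K", fallback="F"):
    """Hypothetical rule: keep the preferred electrode unless it is unusable."""
    norm = normalized_sd(signals)
    if preferred in norm and norm[preferred] <= NOISE_LIMIT:
        return preferred
    return fallback


def left_minus_right(left, right):
    """Subtract the right-ear signal from the left-ear signal."""
    return [l - r for l, r in zip(left, right)]


# Toy data: the left K electrode is very noisy, electrode A has no connection.
left_ear = {"K": [0.0, 50.0, 0.0, -50.0], "F": [0.0, 3.0, 0.0, -3.0], "A": None}
right_ear = {"K": [0.0, 1.0, 0.0, -1.0], "F": [0.0, 2.0, 0.0, -2.0]}

left_choice = pick_electrode(left_ear)
right_choice = pick_electrode(right_ear)
ear_signal = left_minus_right(left_ear[left_choice], right_ear[right_choice])

print(left_choice, right_choice, ear_signal)
```

In this toy case the left ear falls back to F while the right ear keeps K, mirroring the mixed K/F pairing used for two participants.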

F. Alpha Power Extraction
To obtain a time-frequency representation of the EEG data, Morlet wavelets (width of 7 cycles) within the frequency range of 2 to 35 Hz, centered in 500-ms steps, were applied (using the Fieldtrip toolbox [37]). Event-related spectral perturbation (ERSP) was then used to investigate how the time-frequency data changed relative to baseline [38]. Positive ERSP is referred to as event-related synchronization (ERS) and negative ERSP is referred to as event-related desynchronization (ERD). ERSP was calculated as in (1), where A is the absolute power of the post-stimulus signal at time t and frequency f, and R is the absolute power of the baseline signal averaged in time (−4 to 0 s) at the same frequency f. The first second was removed to avoid event-related potentials (ERP) due to the onset of the sound. Based on visual inspection of the averaged spectrograms across participants and SNRs in scalp EEG (common and symmetrical referencing) and ear-EEG, we defined alpha power as 8-14 Hz averaged in the parietal region (common referencing; CP1, CP2, CP3, CP4, CP5, CP6, CPz, P1, P2, P3, P4, P5, P6, P7, P8, Pz, PO3, PO4, PO7, PO8, POz) and the temporal region (symmetrical referencing; FT7, FT8, T7, T8, TP7, TP8).
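A baseline-normalized ERSP and the alpha-band average can be sketched as below. As an assumption, the sketch uses the relative-change form (A − R)/R, one common convention that matches the definitions of A and R given here; the exact normalization in (1) may differ (for example a percentage or a decibel ratio), although the sign convention for ERS and ERD would be the same. The power values are toy numbers and all names are illustrative.

```python
ALPHA_BAND_HZ = (8.0, 14.0)


def ersp(power, times_s, baseline_s=(-4.0, 0.0)):
    """Relative change of power against the time-averaged baseline power.

    Assumed form: (A(t, f) - R(f)) / R(f) for a single frequency f.
    """
    baseline_values = [
        p for p, t in zip(power, times_s) if baseline_s[0] <= t < baseline_s[1]
    ]
    r = sum(baseline_values) / len(baseline_values)
    return [(a - r) / r for a in power]


def band_average(ersp_by_freq, band_hz=ALPHA_BAND_HZ):
    """Average ERSP across all frequencies inside the band (inclusive)."""
    rows = [
        row for f, row in ersp_by_freq.items() if band_hz[0] <= f <= band_hz[1]
    ]
    return [sum(column) / len(column) for column in zip(*rows)]


# Time axis in 500-ms steps: 8 baseline samples, then 4 post-stimulus samples.
times_s = [-4.0 + 0.5 * i for i in range(12)]


def toy_power(post_value):
    """Constant baseline power of 2.0, then a step to post_value at t = 0."""
    return [2.0 if t < 0 else post_value for t in times_s]


power_by_freq = {
    8.0: toy_power(3.0),
    10.0: toy_power(3.0),
    12.0: toy_power(3.0),
    14.0: toy_power(3.0),
    20.0: toy_power(1.0),  # outside the alpha band
}

ersp_by_freq = {f: ersp(p, times_s) for f, p in power_by_freq.items()}
alpha_ersp = band_average(ersp_by_freq)

print(alpha_ersp)
```

Positive values of `alpha_ersp` correspond to ERS and negative values to ERD; the 20-Hz row shows an ERD in this toy data but is excluded from the alpha average.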

G. Linear Mixed Models
Linear mixed models (LMM) were implemented to analyze the effects of SNR on alpha modulation and performance. The models applied for statistical evaluation were (2) and (3):
$$\text{Alpha} = \beta_0 + \beta_1\,\text{SNR} + \beta_2\,\text{SNR}^2 + \beta_3\,\text{SNR}^3 + \mu_1 + \varepsilon \tag{2}$$
$$\text{Performance} = \beta_0 + \beta_1\,\text{SNR} + \beta_2\,\text{SNR}^2 + \beta_3\,\text{SNR}^3 + \mu_1 + \varepsilon \tag{3}$$
in which the β values are the weights of the predictor (SNR) and μ1 is the random effect of each participant, added to the residual error (ε). Alpha power was extracted from scalp EEG (common and symmetrical referencing) and from ear-EEG. SNR values were centered around 0 (i.e., −11, −3, +1, +13 dB) in the model to avoid correlations between the linear, quadratic, and cubic effects [39]. The degree of the model was chosen based on the sigmoid pattern of the grand average results of both alpha power and performance. The MATLAB syntax for (2) was Alpha ∼ 1 + SNR^3 + (1|Subject) and for (3) Performance ∼ 1 + SNR^3 + (1|Subject). The next model was used to predict alpha power from performance, with SNR as a random factor. This way, the model outcome is generalizable to all SNRs and is driven only by performance, as in (4):
$$\text{Alpha} = \beta_0 + \beta_1\,\text{Performance} + \mu_1 + \mu_2 + \varepsilon \tag{4}$$
in which μ1 is the random effect of each participant and μ2 is the random effect of SNR. The MATLAB syntax for (4) was Alpha ∼ Performance + (1|Subject) + (1|SNR). In the Results section, the estimates (β) of the predictors, the corresponding t-values with their degrees of freedom, t(DF), and the P-values are reported for both models.
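The centering step and the polynomial predictors can be illustrated with a small sketch. It only builds the fixed-effect terms for the four SNR levels and compares the correlation between the linear and quadratic terms before and after centering; it does not fit a mixed model, and the helper names are illustrative.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


snrs_db = [-16.0, -8.0, -4.0, 8.0]

# Center the SNR levels around zero by removing their mean.
mean_snr = sum(snrs_db) / len(snrs_db)
centered = [s - mean_snr for s in snrs_db]  # [-11.0, -3.0, 1.0, 13.0]

# Fixed-effect design rows: intercept, linear, quadratic, cubic.
design = [[1.0, s, s ** 2, s ** 3] for s in centered]

# Correlation between the linear and quadratic terms, raw versus centered.
r_raw = pearson_r(snrs_db, [s ** 2 for s in snrs_db])
r_centered = pearson_r(centered, [s ** 2 for s in centered])

print("centered SNRs:", centered)
print(f"r(SNR, SNR^2) raw: {r_raw:.2f}, centered: {r_centered:.2f}")
```

For these four levels, centering reduces the magnitude of the correlation between the linear and quadratic terms rather than removing it entirely, because the levels are not symmetric around their mean.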

H. Changes Over Time
One of the advantages of using continuous speech was the possibility of investigating changes of alpha power over time. For this purpose, the slope of alpha power within 5-s time windows (no overlap) was investigated. To do so, a first-degree polynomial was fitted to the time-series data in each window, and an LMM was used to evaluate the significance of the first-order effect of SNR on alpha slope in the different time windows. This analysis was done on scalp EEG (parietal with common referencing and temporal with symmetrical referencing) and ear-EEG.
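The windowed slope estimate can be sketched with an ordinary least-squares line fit per window. The handling of the incomplete final window (dropped here) is an assumption, as is the synthetic alpha time course; all names are illustrative.

```python
def fit_slope(t, y):
    """Least-squares slope of a first-degree polynomial y = slope * t + offset."""
    n = len(t)
    mean_t, mean_y = sum(t) / n, sum(y) / n
    stt = sum((ti - mean_t) ** 2 for ti in t)
    sty = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    return sty / stt


def window_slopes(times_s, values, window_s=5.0):
    """Slope in consecutive non-overlapping windows; a partial last window is dropped."""
    slopes = []
    start = times_s[0]
    while start + window_s <= times_s[-1] + 1e-9:
        idx = [i for i, t in enumerate(times_s) if start <= t < start + window_s]
        slopes.append(
            fit_slope([times_s[i] for i in idx], [values[i] for i in idx])
        )
        start += window_s
    return slopes


# Synthetic alpha time course in 500-ms steps over 33 s:
# a rise of 0.2 units per second for 5 s, then a plateau.
times_s = [0.5 * i for i in range(66)]
alpha = [0.2 * t if t < 5.0 else 1.0 for t in times_s]

slopes = window_slopes(times_s, alpha)
print([round(s, 3) for s in slopes])
```

In the study, one such slope per participant, SNR, and window would then enter the LMM as the dependent variable.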

I. Correlation
To investigate whether performance (averaged for each condition) and alpha power (averaged for each condition) were correlated, a robust skipped Pearson correlation was used. This method eliminates outliers by considering the structure of the data and mitigates the correlation bias arising from the inter-dependency of samples within an individual [40]. The skipped correlation yielded bootstrapped estimates (1000 repetitions) with their 95% percentile confidence interval (CI). If the 95% CI did not contain zero, the Pearson coefficient r was considered significant. This analysis was done on scalp EEG (parietal with common referencing and temporal with symmetrical referencing) and ear-EEG.
The same method was used to look for the correlation between the temporal alpha in scalp EEG (symmetrical referencing) and the ear-EEG alpha power. This analysis was done to make sure that signals picked up by ear-EEG were closely correlated to the ones picked up by scalp EEG, around the ears.
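The significance criterion based on the bootstrapped 95% percentile CI can be sketched as below. This is a deliberately reduced illustration: it omits the outlier-skipping step of the skipped correlation and ignores the dependency of samples within a participant, so it is not the method of [40]. The data are invented and the names are illustrative.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation; returns None for a degenerate (constant) sample."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(x, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of r, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:
            estimates.append(r)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented example values: performance accuracy (%) and alpha ERSP.
performance = [55, 60, 62, 70, 75, 80, 85, 90, 92, 95]
alpha_power = [-0.2, -0.1, 0.0, 0.05, 0.1, 0.2, 0.25, 0.3, 0.35, 0.4]

ci_low, ci_high = bootstrap_ci(performance, alpha_power)
significant = not (ci_low <= 0.0 <= ci_high)

print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}], significant: {significant}")
```

A full skipped correlation would first flag bivariate outliers with a robust estimator of the data cloud and exclude them before estimating r, which is what the gray ellipses in the correlation figures represent.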

III. RESULTS
B. Scalp EEG: Parietal - Common Reference
Applying LMM with common referencing for the scalp EEG in the parietal region showed a significant linear effect (β ...). The graph of modulated alpha power, the grand averaged spectrograms and the corresponding topographic map are shown in Fig. 4.

E. Alpha Changes Over Time
The evaluation of alpha slope within 5-s time windows revealed a significant linear effect of SNR in the first 5 s of the stimuli in scalp EEG in the parietal (common referencing; β = 0.41, t(58) = 4.98, p < 0.001) and temporal (symmetrical referencing; β = 0.41, t(58) = 4.56, p < 0.001) regions, and also in ear-EEG (β = 0.29, t(58) = 2.94, p = 0.004). No other significant effect of SNR on slope was observed in any other time window in any of the measures. Fig. 7 shows the changes of alpha power over time and the slopes corresponding to each 5-s time window.

IV. DISCUSSION
A. Overview
Using four different SNRs, from very low to very high task demand, alpha ERS decreased with increasing demand. This was present in both scalp EEG (using common or symmetrical referencing) and ear-EEG. Investigating the slope of alpha power over the 33 s of stimulus presentation in the same measures showed a significant linear effect of SNR in the first 5 s.
Alpha ERS in scalp EEG and ear-EEG was also positively correlated with performance accuracy, which was based on answers to questions related to the content of the speech.

B. Alpha Modulation: A Marker for Effort or Performance?
Based on the literature, an increase in alpha power is often related to an increase in task demand of listening when short-sentence paradigms are used to evaluate effort [7], [8], [41], [42], [43]. In these paradigms, when the task is exceedingly difficult for individuals, an inverted U-shaped pattern of alpha power is observed, as participants disengage from the task [6], [41]. Studies with longer stimuli, however, often have reported the inverse effect (i.e., a decrease in alpha relating to an increase in demand) [11], [31], [32], [44], [45], [46].
Irrespective of referencing method (common or symmetrical), brain region (parietal or temporal), or EEG device (scalp or ear-EEG), the results of alpha power in this study showed a significant linear decrease with increasing demand (Figs. 4, 5, and 6). These results were in line with two of our previous studies, which showed less alpha ERS in more demanding continuous speech in hearing-impaired participants [31], [32]. One of the hypotheses on the functional role of alpha power is "gating by inhibition" [15], [17]. This theory postulates that alpha ERS shuts down task-irrelevant regions, which helps route information to the task-relevant areas. Based on this theory, if alpha power is a measure of listening effort, it should increase with increasing task demand (i.e., lower SNR). Additionally, since the task demand varied from very easy to very difficult in this study (24 dB SNR span), an inverted U-shaped pattern of alpha power over SNR was expected. However, the results showed no inverted U-shaped alpha power. Instead, alpha power increased in a sigmoidal fashion with increasing SNR (i.e., opposite to the prediction of the "gating by inhibition" theory). This may indicate that alpha power reflects something other than listening effort.
Fig. 8. Skipped Pearson correlation (r) between performance accuracy and alpha power in scalp parietal using common referencing (top plot), temporal using symmetrical referencing (middle plot) and ear-EEG (bottom plot). The gray ellipses are the estimated area by the skipped correlation based on the structure of the data to determine if a data sample is an outlier or not. Data samples outside the gray ellipses (red dots) were considered as outliers and left out for r estimation. The gray area shows 95% CIs of the bootstrapped data which were positively significant for all three measurements.
In a recent study [46] on vocoded continuous speech, increased demand (i.e., greater speech degradation) led to decreased alpha power as well. The authors suggested that during a continuous speech task, which might require complex linguistic processing in the brain, increased ability to track and understand speech is accompanied by increased alpha [46]. In fact, our results also showed that alpha during continuous speech might be a better indicator of speech intelligibility than listening effort. Better performance was accompanied by increased alpha ERS during the task. Performance accuracy could also predict alpha power, without the influence of SNR on performance, in the statistical model. While performance accuracy in this study was a simplistic measure of speech intelligibility, it still significantly benefitted from increasing SNR. This provides further evidence against the idea that alpha reflects listening effort during continuous speech. While listening effort has been shown to change in an inverted U-shaped pattern (e.g., [5], [6], [47]), speech recognition is best described as sigmoidal with changes of task demand [48], [49]. Therefore, it is unlikely that one measure can reflect both at the same time, at least when such a wide range of task demand (from −16 dB to +8 dB SNR) is used.
Fig. 9. Skipped Pearson correlation (r) between alpha power extracted from temporal scalp EEG (symmetrical referencing) and ear-EEG. The gray ellipses are the estimated area by the skipped correlation based on the structure of the data to determine if a data sample is an outlier or not. Data samples outside the gray ellipses (red dots) were considered as outliers and left out for r estimation. The gray area shows 95% CIs of the bootstrapped data which were positively significant.
However, the problem in interpreting the correlation between performance and alpha ERS is that the direction of causality between them is unknown. It is unclear whether increased alpha activity led to better speech intelligibility ("top-down" attention), or increased SNR led to better speech intelligibility as well as entrainment of alpha power in the brain ("bottom-up" attention). Therefore, based on these results, the role of alpha power in speech processing is unclear. We can only conclude that alpha power is correlated with performance, which is an indirect measure of speech intelligibility.

C. Changes of Alpha Over Continuous Speech
One of the advantages of using long, continuous speech is to explore changes of alpha activity over an extended period of listening. Reasonably, it cannot be expected that a listener invests effort continuously at a constant level during the whole presentation of continuous speech [50]. A person can adapt to specific listening difficulties, get fatigued, or lose/gain motivation over time.
Using continuous speech that lasted for 33 s, the slope of alpha power in the first 5 s showed a linear trend with SNR in both scalp (common or symmetrical referencing) and ear-EEG. In other words, higher SNRs showed a larger leap from alpha ERD to ERS in the first 5 s of the stimuli. The early transition from alpha ERD to ERS might be an early indicator of whether a listening situation is easily intelligible or not.
The importance of this finding is that in such a continuous-speech paradigm, the first 5 s of the stimulus presentation alone can already be informative about the intelligibility manipulations. However, it is important to note that while the averaged alpha power over 33 s showed a cubic effect as well, the alpha slope within the first 5 s only showed a linear effect of SNR. Therefore, alpha averaged over the whole stimulus may correspond better to the performance, which also followed a cubic pattern with SNR.

D. Feasibility of Ear-EEG to Measure Alpha Power
Ear-EEG is a feasible method for ambulatory measurement of EEG. While ear-EEG is limited to picking up the electrical brain signals around the ears, it has been shown that it can be successfully used in auditory studies to decode the attended talker in the presence of distracting talkers [28], [29], [30]. However, to our knowledge no studies have investigated alpha oscillations in continuous speech using ear-EEG.
The first issue of using ear-EEG in a realistic listening scenario (i.e., continuous speech) is the quality of recorded data compared to the scalp EEG. In this study, for scalp EEG, conductive gel was applied for better contact to the skin, but for ear-EEG the electrodes were used dry. While using conductive gel leads to better-quality signals in general, it is important to implement dry electrodes for any further application of ear-EEGs in real life. Even with the differences in impedance of electrodes, ear-EEG signals had good quality compared to scalp-EEG signals (on average 1.3 more trials were rejected in ear-EEG compared to scalp EEG), especially using the selected electrodes of the ear-EEG (e.g., K and F, refer to Fig. 2). The high correlation in alpha power between the ear-EEG (using dry electrodes) and the temporal region of scalp EEG (using wet electrodes) was another indication that the quality of recorded data in ear-EEG was close to signals from scalp EEG (refer to Fig. 9).
The second aspect of the ear-EEG results was how the changes in alpha power with task demand compared between scalp EEG and ear-EEG. Alpha power changed in a sigmoidal pattern in both scalp EEG (temporal, symmetrical referencing) and ear-EEG when SNR was manipulated in continuous speech, and was correlated with performance. Similar to scalp EEG, we speculate that alpha power elicited by such stimuli is less a reflection of listening effort and more a measure of performance. The slope of alpha power within the first 5 s of the stimuli also showed a similar pattern in ear-EEG and scalp EEG (all showed a linear effect of SNR on the alpha slope in the first 5 s).
More research is required to investigate the feasibility of measuring performance or speech intelligibility using ear-EEG, a question that has potential for real-world impact if implemented in hearing aids. To this aim, using real-life scenarios similar to the conversation-like stimuli used in this study is encouraged.

V. CONCLUSION
Wearable technology such as ear-EEG can give us insight into brain activity in everyday environments. To our knowledge, this is the first study that investigates alpha oscillations in ear-EEG during effortful continuous speech. To do this, we recorded brain signals with a 64-channel scalp EEG simultaneously with dry ear-EEG electrodes during a continuous speech-in-noise task. Four different SNRs were generated to manipulate task demand. The results showed that decreasing SNR (i.e., increasing task demand) led to decreasing alpha power in both scalp EEG and ear-EEG. Also, the slope of alpha power within the first 5 s of the speech showed a similar linear pattern in scalp EEG and ear-EEG with increasing task demand. We report that ear-EEG can measure changes in performance using alpha activity as a marker, providing an indirect measure of speech intelligibility.