Decoding Bilingual EEG Signals With Complex Semantics Using Adaptive Graph Attention Convolutional Network

Decoding neural signals of silent reading with Brain-Computer Interface (BCI) techniques presents a fast and intuitive communication method for patients with severe aphasia. Electroencephalogram (EEG) acquisition is convenient and easily wearable, with high temporal resolution. However, because of the low signal-to-noise ratio of EEG, existing EEG-based decoding primarily concentrates on individual words, which is insufficient for daily communication. Decoding at the word level is less efficient than decoding at the phrase or sentence level. Furthermore, with the popularity of multilingualism, decoding EEG signals with complex semantics in multiple languages is urgently needed. To the best of our knowledge, there is currently no research on decoding EEG signals during silent reading of complex semantics, let alone for bilingual silent reading. Moreover, the feasibility of decoding such signals remains to be investigated. In this work, we collect silent reading EEG signals of 9 English Phrases (EP), 7 English Sentences (ES), 10 Chinese Phrases (CP), and 7 Chinese Sentences (CS) from the subject within 26 days. We propose a novel Adaptive Graph Attention Convolution Network (AGACN) for classification. Experimental results demonstrate that our proposed method outperforms state-of-the-art methods, achieving the highest classification accuracy of 54.70%, 62.26%, 44.55%, and 57.14% for silent reading EEG signals of EP, ES, CP, and CS, respectively. Moreover, our results demonstrate the feasibility of decoding EEG signals with complex semantics. This work will aid aphasic patients in achieving regular communication while providing novel ideas for neural signal decoding research.


I. INTRODUCTION
The aphasia caused by nerve damage is irreversible.
Fortunately, Brain-Computer Interface (BCI) technology for decoding neural signals offers an efficient communication method for aphasic patients [1], [2], [3]. Numerous studies [4], [5], [6], [7], [8], [9] have successfully decoded Electrocorticogram (ECoG) signals of silently read words, phrases, and sentences. Compared with ECoG, Electroencephalogram (EEG) [10] is non-invasive, more user-friendly, has a high temporal resolution, and is widely used for decoding neural signals. Due to the low signal-to-noise ratio of EEG, early studies on decoding silent reading EEG signals primarily focused on syllables and phonemes [11], [12], [13], [14]. Research [12] also used non-English words to replace the imagined pronunciation of English words and decoded the corresponding EEG signals to help aphasic patients achieve simple communication. However, decoding limited sets of syllables, phonemes, and non-English words is inadequate for patients with language disorders to communicate daily. With the advancement of deep learning techniques and the enhancement of acquisition equipment [15], [16], researchers [14], [17] have attempted to decode silent reading EEG signals for multiple categories of words and improved classification accuracy. For example, Vorontsova et al. [17] decoded silent reading EEG signals with 85% accuracy using 8 meaningful words. Although the results were not as promising for out-of-sample subjects, good performance was achieved on a single subject. These findings provide an encouraging incentive for investigating the decoding of silent reading EEG signals. However, decoding EEG signals word by word is time-consuming and requires more cognitive effort. A more efficient and natural approach is to decode entire phrases and sentences directly from neural signals. This method will significantly enhance the communication efficiency of aphasic patients. Additionally, in a multilingual social environment, it is necessary to decode EEG signals with complex semantics in multiple languages. Decoding silent reading EEG signals in multiple languages provides a more realistic and smooth communication experience. It avoids the delays, misinterpretation, and information leakage caused by translation software, which relies on external input and lacks communication depth. Furthermore, decoding bilingual silent reading EEG signals can help deepen the understanding of the neural mechanisms in the brain related to multilingual processing. It can thus promote neuroscience research in bilingual interaction, memory, and language processing [18].
Compared with words, phrases and sentences contain more complex information. Therefore, the corresponding silent reading EEG signals are more intricate. Decoding neural signals with complex semantics under bilingualism requires consideration not only of the differences in brain use under different languages but also of the complexity of semantics and changes in neural signals over time [19]. To effectively extract meaningful information from EEG signals with complex semantics and decode them, the proposed algorithm must be capable of selecting the channels that contain the most useful information for each language. Additionally, it should capture fluctuations in both temporal and spatial relationships between the involved channels. Therefore, decoding bilingual silent reading EEG signals with complex semantics is methodologically more demanding. Furthermore, the feasibility of decoding them directly has yet to be studied.
EEG signals are typically non-Euclidean structured data [20], [21]. Representing EEG signals with graphs provides a more comprehensive representation of the spatial connections between channels than using only the temporal sampling points of each channel [22]. On this basis, we use a two-dimensional matrix and its adaptive graph to represent each EEG signal jointly. The two-dimensional matrix represents the sampling points of each channel in time, and the graph provides a richer representation of the spatial connections between channels. A fixed graph shared by all EEG signals cannot fully represent the variability of EEG signals [23], because the EEG signals of the same subject doing the same task differ at different times. Therefore, we construct an adaptive graph structure for each EEG signal based on its characteristics to capture real-time changes in both temporal and spatial domains.
This study collects silent reading EEG signals of 9 English Phrases (EP), 7 English Sentences (ES), 10 Chinese Phrases (CP), and 7 Chinese Sentences (CS) from the subject over 26 days. We propose a novel Adaptive Graph Attention Convolution Network (AGACN) for classifying our datasets. Our proposed method achieves accuracies of 51.05±3.62%, 57.85±4.41%, 40.00±4.55%, and 52.20±4.94% for silent reading EEG signals of EP, ES, CP, and CS, respectively. Our work demonstrates the feasibility of decoding silent reading EEG signals of phrases and sentences, opening up a novel representation method for developing neural signal language decoding techniques. This study is expected to provide more communication options and improve the quality of life for aphasic individuals. Our main contributions are summarized as follows:
• We collect comprehensive datasets that contain silent reading EEG signals of 9 English phrases, 7 English sentences, 10 Chinese phrases, and 7 Chinese sentences from the subject over 26 days. These datasets fill the gap in research on decoding silent reading EEG signals with complex semantics in multiple languages.
• We propose a novel Adaptive Graph Attention Convolution Network (AGACN) for classifying our datasets. Our method uses each feature matrix and its adaptive graph structure as inputs. AGACN applies the attention mechanism to the feature matrix according to the weights on the edges of the graph to effectively capture the temporal and spatial features of EEG signals.
• Our method outperforms state-of-the-art methods on the four collected datasets. Extensive experiments demonstrate the feasibility of decoding bilingual silent reading EEG signals with complex semantics.

A. Preliminary Works
We use a 64-channel wet-electrode wireless EEG device with a sampling frequency of 1000 Hz. The 64 channels include 59 brain channels and 5 body functional signal channels. The device is equipped with a wireless amplifier, which amplifies the signals at the same point in time by an equal factor of $2 \times 10^4$. In our study, we select channels corresponding to distinct brain regions for the comparative analysis in Section V-E. Therefore, we refer to [31] and categorize the 59 channels into six clusters. These clusters represent the frontal, central, temporal, parietal, and occipital lobe regions, as detailed in [32]. Notably, the temporal lobe is further divided into the left and right temporal lobe regions. The positions of the 59 brain channels and their regions, following the 10-20 system electrode placement method [33], are shown in Fig. 1.
The subject is 21 years old, male, right-handed, and has no neurological or other diseases. His native language is Chinese and his second language is English. He began learning English at 10, has a high English level, and can communicate skillfully. We collect EEG signals on 16 of the 26 days, specifically on days 1, 4-8, 13-16, 18-22, and 26, with 3 to 5 trials conducted daily in each category. Collecting data over multiple days enriches the data diversity, because EEG signals change somewhat when the same subject performs the same task on different days.
To ensure the selected phrases and sentences are appropriate, we refer to the corpus used in [24] and consult several individuals with language disorders. After consulting with numerous aphasia patients, we summarize the phrases and sentences they most frequently wish to express in daily life. Subsequently, psychologists and linguists evaluate the scientific validity of the phrases and sentences employed in the experiment. Table I shows the corpus used in each category.
Before the experiment, the experimental monitor helps the subject wear the EEG cap and fills conductive paste into the channels to reduce the impedance to within an acceptable tolerance. During the experiment, the subject wears headphones and faces a white wall to minimize the effects of noise and visual stimulation. Two computers are used in the experiment. The experimental design aims to collect EEG signals during silent reading. Depending on the characteristics of the device used, we set up 'listening' before silent reading and 'speaking' after silent reading. The 'listening' stage reminds the subject what to read silently, and the 'speaking' stage verifies that the subject silently read the audio he heard. This design ensures the subject has sufficient reception and reaction time to read phrases and sentences silently. Each phrase and sentence is listened to three times, silently read once, and spoken once, and EEG signals are collected throughout the entire process. The operator monitors the experiment according to the instructions on the screen and marks the unqualified data. For example, the data are marked if the subject does not speak or if the 'speaking' is inconsistent with the 'listening'.
The experimental process is illustrated with the collection of EEG signals for the 9 English phrases, as shown in Fig. 2. A 'beep' and an automatic mark occur at the beginning and end of 'listening', 'silent reading', and 'speaking'. One trial consists of listening to (3 times), silently reading (once), and speaking (once) the 9 phrases in sequence. The duration of 'listening', 'silent reading', and 'speaking' is set at 4 seconds each. Marks 1-9 represent the beginning of 'listening' for each of the 9 phrases. Marks 21 and 22 represent the beginning and end of silent reading, and marks 30 and 31 represent the beginning and end of speaking. For example, marks 21, 22, 30, and 31 after mark 4 represent the beginning and end of silently reading and speaking the fourth phrase.
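The marking scheme above can be turned into segment boundaries with a short routine. The following is a minimal sketch, assuming a hypothetical event format of `(mark, sample_index)` pairs in chronological order; the function name and the sample indices are illustrative and are not taken from the recording software used in the study.

```python
def silent_reading_segments(events):
    """Map each phrase mark (1-9) to the (start, end) sample indices of its
    silent-reading interval, which is delimited by marks 21 and 22."""
    segments = {}
    current_phrase = None
    start = None
    for mark, sample in events:
        if 1 <= mark <= 9:
            # A phrase mark opens a new block; forget any pending start.
            current_phrase = mark
            start = None
        elif mark == 21 and current_phrase is not None:
            start = sample
        elif mark == 22 and current_phrase is not None and start is not None:
            segments[current_phrase] = (start, sample)
            start = None
        # Marks 30 and 31 (speaking) are ignored for silent-reading extraction.
    return segments


# Hypothetical event stream: (mark, sample index at 1000 Hz).
events = [
    (4, 0),        # 'listening' to phrase 4 begins
    (21, 12000),   # silent reading begins
    (22, 16000),   # silent reading ends (4 s later)
    (30, 16500),   # speaking begins
    (31, 20500),   # speaking ends
    (5, 21000),    # 'listening' to phrase 5 begins
    (21, 33000),
    (22, 37000),
]
segments = silent_reading_segments(events)
```

Each returned interval spans 4000 samples here, matching a 4-second silent-reading window at 1000 Hz.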
The process of collecting EEG data for sentences is identical to that for phrases. The duration of 'listening', 'silent reading', and 'speaking' for sentences is 5 seconds each. Before the experiment, we introduce the process to the subject, and the subject signs an informed consent form. The experimental procedure is conducted in accordance with the ethical standards of the latest Declaration of Helsinki [34]. This research received approval

B. Datasets
We segment the 'silent reading' EEG signals of each phrase or sentence according to the marks in each trial. To reduce computation, we downsample the data with a sliding window of size 2 and step 2: the average of the values in each window is used as a new sampling point. Thus, the downsampled silent reading EEG signals contain 2000 sampling points for phrases and 2500 for sentences. Eventually, there are four datasets: Silent Reading English Phrases (SREP), Silent Reading English Sentences (SRES), Silent Reading Chinese Phrases (SRCP), and Silent Reading Chinese Sentences (SRCS). We remove the unqualified data marked in the experiment and randomly divide the qualified data into training and test sets in a ratio of 8:2. The size of each dataset is shown in Table II. The dataset will be publicly available at: https://github.com/cfli20/EEG.
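The window averaging and the 8:2 split can be illustrated as follows. This is a sketch on synthetic data, assuming signals stored as NumPy arrays of shape (channels, samples); the function names and the random seed are illustrative choices, not the authors' code.

```python
import numpy as np


def downsample_mean(signal, window=2):
    """Average non-overlapping windows along time (window size == step)."""
    n_channels, n_samples = signal.shape
    usable = n_samples - n_samples % window  # drop a trailing partial window
    blocks = signal[:, :usable].reshape(n_channels, usable // window, window)
    return blocks.mean(axis=2)


def split_indices(n_items, train_ratio=0.8, seed=0):
    """Randomly split item indices into training and test index arrays."""
    order = np.random.default_rng(seed).permutation(n_items)
    cut = int(round(n_items * train_ratio))
    return order[:cut], order[cut:]


rng = np.random.default_rng(0)
phrase = rng.standard_normal((59, 4000))    # 4 s at 1000 Hz
sentence = rng.standard_normal((59, 5000))  # 5 s at 1000 Hz

phrase_ds = downsample_mean(phrase)      # shape (59, 2000)
sentence_ds = downsample_mean(sentence)  # shape (59, 2500)
train_idx, test_idx = split_indices(100)  # 80 training items, 20 test items
```

Averaging pairs of samples halves the effective sampling rate to 500 Hz while acting as a mild low-pass filter, which is why 4-second and 5-second recordings yield 2000 and 2500 points.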

IV. METHOD

A. Graph Construction
Cumulative changes in neural activity can lead to differences in the EEG signals recorded for the same task [35], [36], even when the same subject performs the task at different times. In this work, we propose a novel method for representing EEG signals to fully capture the temporal and spatial connections between channels in real time. Specifically, we use a feature matrix and its corresponding graph structure to represent each EEG signal at different times. The feature matrix captures the temporal dynamics of EEG signals, while its graph structure represents the spatial correlation between channels. The feature matrix ($F \in \mathbb{R}^{N \times T}$) is a two-dimensional matrix of the sampling points of each channel in time. The adjacency matrix ($G \in \mathbb{R}^{N \times N}$) represents the graph structure, where $N$ is the number of channels and $T$ is the number of data points. In the graph structure $G(D, E)$, $D$ is the set of nodes and $E$ is the set of edges, $E = \{E_{ij} \mid (i, j) \in N\}$, where $E_{ij}$ is the weight of the edge between nodes $i$ and $j$. We hypothesize that if channels exhibit strong correlations with each other, their corresponding neural activities are relatively consistent. Thus, the more strongly correlated channels there are among the key channels, the more consistent the neural activity of their counterparts will be. Therefore, we calculate the Pearson Correlation Coefficient (PCC) [37] of any two channels and use it as the weight between the corresponding nodes. PCC is calculated as follows:

$$\rho_{X,Y} = \frac{E\left[(X - \mu_X)(Y - \mu_Y)\right]}{\sigma_X \sigma_Y}$$

where $\rho$ is the PCC value corresponding to two nodes, $\mu_X$ and $\mu_Y$ are the mean values of channels $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ are the standard deviations of the two channels. We treat each channel as a graph node and use $\rho$ as the weight on the edge between two nodes to enhance the channel relationship. The weight $E(i, j)$ on the edge between nodes $i$ and $j$ is selected according to the following equation.
Therefore, we construct an adaptive graph structure for each EEG signal. Extensive experiments show that choosing the top 12 key channels selected using Analysis of Variance (ANOVA) [38], referred to as AK-channels, yields the highest accuracy. The specific selection process for the 12 AK-channels is detailed in Section V-D. Each silent reading task takes 5 seconds, and each sample has 2500 points after downsampling, so its corresponding feature matrix is $F \in \mathbb{R}^{12 \times 2500}$. The adjacency matrix is $G \in \mathbb{R}^{12 \times 12}$, and the graph structures differ in the weights of their edges. In Fig. 3, we present the temporal and spatial variations of EEG signals between channels during the same silent reading task on SREP. The visualization of the adjacency matrix further illustrates that the graph structure varies with the EEG signal, mainly through differences in the weights between individual channels. Because EEG signals fluctuate, the constructed graph structure changes accordingly, and so do the correlations between any two channels. Overall, the feature matrix and adjacency matrix preserve the spatial and temporal connections between channels and help explore the changes in the temporal and spatial dynamics of EEG signals during different tasks and over different periods. By incorporating the graph structure together with the feature matrix, models can better capture the spatial correlation between EEG channels, improving classification accuracy.
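A per-signal graph of this kind can be sketched in a few lines. The example below uses synthetic data and `numpy.corrcoef` for the PCC between channels. The exact edge-weight rule is not reproduced in this text, so taking the absolute value of the PCC (to land in [0, 1]) and zeroing the diagonal are assumptions made only for illustration.

```python
import numpy as np


def adaptive_adjacency(feature_matrix):
    """Build an (N, N) weighted adjacency matrix from an (N, T) feature matrix."""
    rho = np.corrcoef(feature_matrix)  # Pearson correlation between channel rows
    weights = np.abs(rho)              # ASSUMPTION: map PCC to [0, 1] via |rho|
    np.fill_diagonal(weights, 0.0)     # zero principal diagonal, no self-loops
    return weights


rng = np.random.default_rng(0)
trial_a = rng.standard_normal((12, 2500))  # 12 key channels, 2500 points
trial_b = rng.standard_normal((12, 2500))  # same task, different recording

G_a = adaptive_adjacency(trial_a)
G_b = adaptive_adjacency(trial_b)
```

Because the graph is recomputed for every signal, `G_a` and `G_b` differ even though both trials share the same channel set, which is the sense in which the graph is adaptive.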

B. Adaptive Graph Attention Convolution Network
The adjacency matrix ($G$) of an undirected weighted graph is symmetric, and its principal diagonal is zero. Therefore, we introduce the unit matrix so that the characteristics of each node itself are adequately represented. The adjacency matrix ($\tilde{G}$) is as follows:

$$\tilde{G} = G + I$$

where $I$ is the unit matrix.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

The range of the weights on edges is [0, 1]. Therefore, even without normalizing the adjacency matrix, different input data have similar value ranges, and gradient descent can find the optimal solution faster [39]. In this work, each feature matrix ($F$) and its adaptive graph structure ($\tilde{G}$) are the inputs of AGACN. AGACN uses $\tilde{G}$ as attention: $F$ is convolved with $\tilde{G}$, which acts as a convolution kernel whose receptive field covers the whole of $F$. According to the different weights on the edges of $\tilde{G}$, different attention is given to the corresponding channels. This preserves the important temporal features of each channel more richly and effectively captures and strengthens the spatial connections between channels. Then, a hidden layer retains useful information and applies an activation function for nonlinear expression. Therefore, the AGACN cell layer is: where $H$ is the output of the current layer, $\sigma$ is the activation function, $W$ is the trainable weight matrix, $i$ is the input dimension, and $o$ is the output dimension. The AGACN cell layer is shown in Fig. 4.
To pass on the features captured by the previous network layer, we take the output of the previous layer as the feature matrix of the current layer. The transfer relationship between the AGACN cell layers is as follows: We use three AGACN cell layers to classify complex semantics. The output of the second layer is transposed ($(H^{(2)})^T$) and, serving as attention in the Attention to Cross Fusion (CFA) layer, is convolved with the output of the third layer to fully capture the connections of dynamic features between channels. Using $(H^{(2)})^T$ as the convolution kernel, with a receptive field covering all the data in the output of the third AGACN cell layer ($H^{(3)}$), ensures comprehensive integration of the valuable information captured in both $(H^{(2)})^T$ and $H^{(3)}$. Therefore, the CFA layer is as follows: After the CFA layer, a Fully Connected (FC) layer and a softmax layer are added. The structure diagram of the AGACN network is shown in Fig. 5.
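The shapes flowing through the three cell layers and the CFA layer can be checked with a small forward-pass sketch. The layer equations are not reproduced in this text, so the forms used here are assumptions: each cell is taken to be `activation(G_tilde @ F @ W)` and the CFA layer to be the matrix product of the transposed second-layer output with the third-layer output. The output sizes (126, 64, 132) and activations (ReLU, Tanh, ReLU) follow the experiment setup described in this paper; the random weights, the self-loop construction, and the 9-class head are illustrative only.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def agacn_cell(G_tilde, F, W, activation):
    # ASSUMED cell form: graph-weighted mixing of channels, then projection.
    return activation(G_tilde @ F @ W)


rng = np.random.default_rng(0)
N, T, n_classes = 12, 2500, 9
F = rng.standard_normal((N, T))

# Adaptive graph with self-loops (absolute PCC is an assumption).
G = np.abs(np.corrcoef(F))
np.fill_diagonal(G, 0.0)
G_tilde = G + np.eye(N)

dims = [T, 126, 64, 132]
weights = [rng.standard_normal((dims[k], dims[k + 1])) * 0.01 for k in range(3)]
activations = [relu, np.tanh, relu]

H = F
outputs = []
for W, act in zip(weights, activations):
    H = agacn_cell(G_tilde, H, W, act)  # previous output feeds the next layer
    outputs.append(H)
H1, H2, H3 = outputs

cfa = H2.T @ H3  # ASSUMED CFA form: (64, 12) @ (12, 132) -> (64, 132)
W_fc = rng.standard_normal((cfa.size, n_classes)) * 0.01
probs = softmax(cfa.reshape(-1) @ W_fc)  # fully connected layer + softmax
```

Under these assumptions the channel dimension (12) is contracted only in the CFA product, so every earlier layer keeps one row per channel.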

A. Experiment Setup
This study implements the AGACN using the PyTorch framework [40], [41]. Parameters are kept the same for classifying all datasets, demonstrating the effectiveness and robustness of our proposed method. Specifically, the output dimensions of the three AGACN cell layers are 126, 64, and 132, respectively. The activation functions of the three AGACN cell layers are ReLU, Tanh, and ReLU [42]. We use a learning rate of 0.0001 and a dropout rate of 0.05. We use 5-fold cross-validation [43] to evaluate the performance of the model comprehensively. Namely, the training set is divided into 5 equal parts, 4 of which are used in turn as the training set and the remaining one as the validation set. The Adam optimizer is used with a decay of 0.0005, a batch size of 80, and 500 epochs. The loss function is the cross-entropy [44] given in Eq. (7), which measures the discrepancy between predicted and ground truth values.
$$H(p, q) = -\sum_{i} p(x_i) \log q(x_i) \tag{7}$$

where $p(x_i)$ represents the true distribution of the ground truth sample, and $q(x_i)$ represents the distribution predicted by the model.
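A small numeric example shows how the cross-entropy loss rewards confident correct predictions. This is a sketch using the natural logarithm and a tiny `eps` guard against `log(0)`; both choices and the three-class distributions are illustrative assumptions.

```python
import math


def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy between a true distribution p and a predicted distribution q."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))


p = [0.0, 1.0, 0.0]       # one-hot ground truth: class 1
q_good = [0.1, 0.8, 0.1]  # confident, correct prediction
q_bad = [0.6, 0.2, 0.2]   # prediction favouring the wrong class

loss_good = cross_entropy(p, q_good)  # equals -log(0.8), about 0.223
loss_bad = cross_entropy(p, q_bad)    # equals -log(0.2), about 1.609
```

With a one-hot target, only the predicted probability of the true class contributes, so the loss reduces to its negative logarithm.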
To illustrate the data flow changes during the attention convolution process, we show a feature capture schematic in Fig. 6. The input feature matrix is $F \in \mathbb{R}^{12 \times 2500}$. The size of the graph structure in each AGACN cell layer is $G \in \mathbb{R}^{12 \times 12}$. The feature matrices from the first to the third layers are denoted as $F_1 \in \mathbb{R}^{12 \times 126}$, $F_2 \in \mathbb{R}^{12 \times 64}$, and $F_3 \in \mathbb{R}^{12 \times 132}$, respectively. Each AGACN cell layer uses an adaptive graph as attention. Layer by layer, the feature matrix containing richer features is gradually selected. Then, the feature matrix selected by the second AGACN cell layer is transposed ($F_2^T \in \mathbb{R}^{64 \times 12}$) and further convolved with the output of the third AGACN cell layer. The CFA layer fully fuses features from the second and third AGACN cell layers, capturing both temporal and spatial aspects. It aligns the signals captured by the last two layers temporally and integrates features spatially, effectively highlighting useful characteristics.

Fig. 6. Data flow changes on SREP using AGACN.

B. Key Channels Selection
Zheng and Lu [45] observed higher classification accuracy when using key channels selected based on calculated features than when using all channels. Therefore, it is essential to identify the most informative channels and achieve the highest classification accuracy with relatively few channels. We select key channels from global features on the training set in the following steps. Firstly, we apply a 7th-order Butterworth [46] filter to eliminate noise above 50 Hz, ensuring that the selected key channels contain more neural signal. Secondly, we normalize the filtered data to reduce the impact of different impedances on channel signals. For each one-dimensional feature $x_i$, the normalized value of $x_{ik}$, the value of the feature for the $k$-th sample, is given in Eq. (8). Finally, we use the SelectKBest [47] function in Keras [48] to select the channels. The SelectKBest function scores the features of each channel on the training set, with ANOVA employed as the criterion for feature scoring in this study.
$$x'_{ik} = \frac{x_{ik} - \min(x_i)}{\max(x_i) - \min(x_i)} \tag{8}$$

where $\min(x_i)$ and $\max(x_i)$ represent the minimum and maximum values across all data samples $x_{ik}$. We sort the 59 channels in descending order of their scores using the SelectKBest function. The selection process for the 12 channels is as follows. Firstly, we extract EEG data from the top-ranked channels in the training and test sets, with the number of channels ranging from 2 to 59, forming feature matrices. Subsequently, we construct an adaptive graph for each feature matrix. Then, for each number of channels from 2 to 59, we divide the training set into new training and validation sets in a ratio of 7:3. Finally, we use the AGACN to classify the new training set and record the highest decoding results on the validation set. We also evaluate the validation accuracy using N-fold cross-validation, with N values of 5, 8, and 10. The experimental results show that the validation accuracy is relatively higher on the four datasets when using 5-fold cross-validation with the top 12 AK-channels. Therefore, we choose the top 12 AK-channels and 5-fold cross-validation for classification. The top 12 AK-channels are shown in Table III. The classification accuracy on each validation set is shown in Fig. 7.
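The normalization and ANOVA-based ranking steps can be sketched without any feature-selection library. The example below hand-computes the one-way ANOVA F statistic as a stand-in for what `SelectKBest` with an ANOVA scoring function would produce. It assumes, for illustration only, one scalar feature per channel per sample; the synthetic data, the two artificially informative channels, and the function names are all hypothetical.

```python
import numpy as np


def min_max_normalize(X):
    """Scale each column of X (samples x features) to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (X - lo) / span


def anova_f_scores(X, y):
    """One-way ANOVA F statistic of each column of X across the classes in y."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)


def top_k_channels(scores, k):
    """Indices of the k highest-scoring channels, best first."""
    return np.argsort(scores)[::-1][:k]


rng = np.random.default_rng(0)
y = np.repeat(np.arange(9), 10)       # 9 classes, 10 samples each
X = rng.standard_normal((len(y), 59))  # one feature per channel (assumption)
X[:, 7] += y                           # make channel 7 strongly class-dependent
X[:, 21] += 0.5 * y                    # and channel 21 moderately so

X_norm = min_max_normalize(X)
scores = anova_f_scores(X_norm, y)
selected = top_k_channels(scores, 12)
```

The F statistic is unchanged by per-column rescaling, so the normalization affects downstream training rather than the channel ranking itself.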
As can be seen from Fig. 7, the validation accuracy is relatively high when using 12 AK-channels on our four datasets. When the number of channels is less than 12, the included features are not comprehensive, resulting in relatively lower accuracy. EEG signals have a low signal-to-noise ratio. More channels bring more noise, and thus accuracy will not always increase with more channels. If a newly added channel contains more noise than neural activity signal, the noise interferes with useful features, causing a decrease in accuracy. Conversely, if a newly added channel contains more useful neural signal features than noise, additional valuable features are introduced, increasing accuracy.

C. Classification Results Using AK-Channels
For comparison, we also use EEGNet [49], DeepConvNet (DCN) [50], ShallowConvNet (SCN) [50], and Support Vector Machine (SVM) [51] to classify our datasets. We also use Power Spectral Density (PSD) [52] and Discrete Wavelet Transform (DWT) [53] to extract features from EEG signals and then classify each case with SVM (PSD-SVM and DWT-SVM). We further use Multivariate Fast and Adaptive Empirical Mode Decomposition (MFAEMD) to extract features and employ the Light Gradient Boosting Machine (LGBM) algorithm for classification (MFAEMD-LGBM) [54]. These methods are widely used for EEG classification and have proven highly effective across various EEG classification tasks. In this study, SVM is trained using a Gaussian kernel function [55]. In the PSD-SVM experiments, the sampling frequency is set to 100, the length of each data segment is 300, and the overlap between two adjacent data segments is 10. The discrete wavelet transform uses a fifth-level decomposition with Db2 [56]. Classification results of our datasets using 12 AK-channels are shown in Table IV. The confusion matrix, which consists of True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) counts, is widely used to evaluate the performance of classification networks. The number on the diagonal of the confusion matrix indicates the probability that the category corresponding to the true label is correctly classified. In Fig. 8, we show the confusion matrices obtained using the AGACN on the four datasets. For example, in the confusion matrix of SREP, the classification accuracy of silent reading EEG signals for 2 ('good morning') and 3 ('go out') reaches 85% and 77%, respectively. However, the classification accuracy of 1 ('close the window') is lower, with 38% being incorrectly classified as 2 ('good morning'). After discussions with the subject, we conclude that the imbalance in accuracy across categories may be attributed to inconsistent attention during silent reading across collection times within the same low-accuracy category. The subject had difficulty concentrating and associated different elements with the same low-accuracy category at each collection time, which increased noise and inconsistencies, obscuring category features. The limited data also contributes to reduced feature distinctiveness within the same low-accuracy class.
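Reading per-class accuracy off a confusion matrix can be made concrete with a toy example. The sketch below builds a count matrix from hypothetical labels and row-normalizes it so that each diagonal entry is the fraction of a class classified correctly; the labels and predictions are invented for illustration and are not results from the paper.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def row_normalize(cm):
    """Convert counts to per-true-class rates; each non-empty row sums to 1."""
    totals = cm.sum(axis=1, keepdims=True)
    return cm / np.maximum(totals, 1)


# Hypothetical labels for a 3-class problem.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 1, 1, 0, 1, 1, 1, 0, 2, 2, 2, 2]

cm = confusion_matrix(y_true, y_pred, 3)
rates = row_normalize(cm)
accuracy = np.trace(cm) / cm.sum()  # overall accuracy: 9 of 12 correct
```

Here `rates[0, 1]` is 0.5, meaning half of the class-0 samples were mistaken for class 1, the same kind of off-diagonal reading used for 'close the window' above.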
In Table IV, the classification accuracy on the four datasets using our proposed method is higher than that of the other networks. The main reason is that, in the representation, we use a feature matrix and its adaptive graph structure to represent each EEG signal jointly, fully preserving the temporal and spatial connections of the original data. In the convolution process, the graph is used as the convolution kernel to add attention to each channel according to the magnitude of the edge weights on the graph, capturing the temporal and spatial connections between channels. The network effectively captures the spatial continuity between channels by using the output of the previous layer as the feature matrix of the current layer. The second and third layers of the network perform cross-fusion convolution of their outputs, further fusing the features between channels in both the temporal and spatial domains.

Fig. 9. Classification results using CTK-channels.

D. Classification Results Using CTK-Channels and 59 Brain Channels
ANOVA and the chi-square test [57] are statistical methods widely used across various data types. We use ANOVA as the criterion to select AK-channels since it is suitable for continuous data with categorical independent variables and can compare means and detect differences between groups. The chi-square test is appropriate for analyzing categorical data and assessing the association, or independence, between categorical variables. We also use the chi-square test as a feature selection criterion, employing the SelectKBest function to choose key channels for comparison. These selected channels are referred to as Chi-square Test Key channels (CTK-channels). The 12 CTK-channels selected in each dataset are shown in Table V. We construct an adaptive graph structure for each feature matrix on each dataset and then use AGACN, EEGNet [49], DeepConvNet [50], ShallowConvNet [50], and Support Vector Machine [51] for classification. The classification results are presented in Fig. 9. The results demonstrate that, when using CTK-channels, our proposed method achieves higher decoding accuracy than the comparison networks, but lower accuracy than when using AK-channels. This further demonstrates the effectiveness and robustness of our proposed method.
We also use all 59 brain channels to decode the four datasets. The decoding accuracy of AGACN for EP, ES, CP, and CS is in every case lower than that obtained using AK-channels. We compute the correlation between any two of the 12 AK-channels, the 12 CTK-channels, and the 59 channels for EEG signals of the same silent reading task. The statistical results show that while the correlation between any two channels among the CTK-channels is strong in every dataset, the correlation among AK-channels is stronger. This suggests that all AK-channels contain rich information on neural activity. Consequently, using AK-channels holds the potential for achieving higher decoding accuracy. For randomly chosen signals of the same silent reading tasks, we visualize the magnitude of the PCC between any two channels in the 59-brain-channel, AK-channel, and CTK-channel datasets, as shown in Fig. 10. In Fig. 10, the strength of the correlations between individual channels can be visually assessed, indicating differences in the consistency of the neural activity corresponding to each channel. Furthermore, it demonstrates that the key channels selected using ANOVA are more suitable for AGACN than those selected using the chi-square test.

E. Classification Results Using Different Functional Area Channels
Using channels from different functional areas to decode EEG signals is a mainstream approach [58], [59], [60]. To verify the validity of the AK-channels, as a comparison, we select channels corresponding to the auditory brain center and auditory language center (Broca and Wernicke areas) [61], namely AF3, F3, F5, FC3, FC5, T7, C5, TP7, CP5, and P5, for classification. We refer to the channels selected from the Broca and Wernicke areas as BW-channels. We also use the channels from the frontal, central, temporal, parietal, and occipital lobe regions for classification, respectively. The channels corresponding to each lobe are illustrated in Fig. 1. All the classification results are shown in Table VI.
The classification results show that accuracy using AK-channels is higher than that using channels from different functional areas on the four datasets. This is because language processing involves multiple brain regions, and the distribution of EEG signals on the scalp is complex, with possible cross-talk. Furthermore, the signals collected from a functional brain area may contain noise caused by the quality of the collection equipment, as well as hair and skin interference. The 12 AK-channels with the strongest features contain more useful information than the channels of individual functional areas. This also illustrates the scientific validity and effectiveness of AK-channels. As a result of these factors, the classification accuracy using channels from different functional language areas is lower than that of AK-channels.

F. Ablation Study
The AGACN achieves higher classification accuracy than state-of-the-art methods on four datasets due to the role of each layer. We conduct ablation experiments to explore the role of each layer in decoding. Table VII shows the ablation experiment results, with '✓' indicating that the corresponding network layer is used.
Ablation experiment results demonstrate that classification results using three AGACN cells are higher than one or two AGACN cells on four datasets.Three AGACN layers can more effectively capture temporal and spatial characteristics than one or two AGACN cells.Using less than three AGACN cells does not fully capture the characteristics within the signals.The classification results using three AGACN cells with a CFA layer are superior to using three AGACN cells, indicating the significant role of the CFA layer in further capturing signal features.The accuracy of classification using four AGACN cells is not necessarily higher than that using three AGACN cells on our four datasets.The primary reason is excessive network layers can lead to over-capturing features between signals, potentially misidentifying noise features as actual signal features.This also leads to the reason that the accuracy of using four AGACN cells with the CFA layer for the classifications is lower than that of three AGACN cell layers followed by the CFA layer.The results show that each layer of AGACN is important for decoding complex semantics silent reading EEG signals.

G. K-Fold Cross-Validation Experiments
K-fold cross-validation divides the training dataset into K equally sized subsets; each subset is used once as the validation set while the remaining K-1 subsets are used for training. It evaluates the model comprehensively and helps prevent over-fitting on small datasets. The choice of K plays a significant role in the performance of the model.
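The splitting procedure can be sketched as follows. This is a generic illustration of K-fold index generation, not the authors' pipeline; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


# Hypothetical example: 23 trials split into 5 folds.
for train, val in k_fold_indices(23, 5):
    print(len(train), len(val))
```

Each sample appears in exactly one validation fold, so averaging the K validation scores uses every trial once for evaluation.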
Experiments show that the AGACN performs best using 5-fold cross-validation. The Receiver Operating Characteristic (ROC) curve is a visual representation used to assess the quality of a classification model. TP and TN are the numbers of correctly predicted positive and negative samples, and FP and FN are the numbers of incorrectly predicted positive and negative samples. The ROC curve illustrates the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) at various classification thresholds. The formulas of TPR and FPR are as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.$$
Therefore, the higher the value of the Area Under the Curve (AUC), the better the decoding ability of the model [62]. Fig. 11 shows the ROC curves of 5-fold, 8-fold, and 10-fold cross-validation classification using AGACN on the four datasets. The results show that AGACN performs better using 5-fold cross-validation than using 8-fold or 10-fold cross-validation on the four datasets.
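A minimal sketch of how ROC points and the AUC can be computed for one class treated as positive against the rest is given below. The one-vs-rest treatment, the function names, and the toy scores are assumptions for illustration; how the multi-class ROC curves in Fig. 11 were produced is not specified here.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_points(labels, scores):
    """Return (FPR, TPR) points, from the strictest to the loosest threshold."""
    if 0 not in labels or 1 not in labels:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp, fp, tn, fn = confusion_counts(labels, scores, t)
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: 1 marks the class treated as positive, 0 marks all others.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc(roc_points(labels, scores)))
```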

VI. CONCLUSION AND FUTURE WORKS
In this study, we collect the SREP, SRES, SRCP, and SRCS datasets and fill the gap in decoding bilingual silent reading EEG signals with complex semantics. A novel AGACN is proposed for classifying EEG signals with complex semantics. We represent each EEG signal by its feature matrix and adaptive graph structure, which are then used as inputs to the AGACN to capture the relationships among channels in both the temporal and spatial domains. Extensive experiments demonstrate that our proposed method achieves superior classification accuracy compared to state-of-the-art methods.
It is crucial to note that, while the current study demonstrates the feasibility of decoding silent reading EEG signals with complex semantics, it is not a fully developed, clinically applicable system. It should also be acknowledged that the reported results are the best performance among the three subjects examined, and that the AGACN experiments are limited to single-subject prediction. Further research is needed to develop a more robust method that overcomes individual differences in decoding silent neural signals, which is the focus of our future work. We will also expand the number of phrases and sentences and propose methods for decoding silent reading EEG signals in more languages.

Fig. 2. Experiment procedure. The operator monitors the experiment process on two screens according to the instructions. The subject wears headphones and faces a white wall. The 9 phrases are presented in turn; each involves listening (3 times, 4 seconds each, 12 seconds in total), silent reading (once, 4 seconds), and speaking (once, 4 seconds). Marks 1-9 denote listening to the 9 English phrases in turn. Marks 21 and 22 denote the beginning and end of silently reading the corresponding phrase. Marks 30 and 31 denote the beginning and end of speaking the corresponding phrase.

Fig. 3. Visualization of the data from the same silent reading task on the SREP. Each silent reading task lasts 5 seconds, resulting in 2500 downsampled data points. An adaptive graph structure is constructed for each 5-second EEG signal, and its corresponding adjacency matrix is visualized.

Fig. 5. Structure of the AGACN. The inputs of the AGACN are each feature matrix and its adaptive adjacency matrix. The outputs of the second and third AGACN cell layers are fused by cross-fusion convolution, followed by an FC layer and a softmax layer.

where $H^{(l)}$ is the output of the previous layer, which serves as the feature matrix of the current layer. The feature matrix of the first AGACN cell layer is $F$. $G(d)$ is the $d$-th adjacency matrix, corresponding to the $d$-th feature matrix. In this work, $G(d)$ is equal to $G$. We use three AGACN cell layers to classify complex semantics. The output of the second layer is transposed, giving $(H^{(2)})^{T}$, and, as the Attention to Cross Fusion (CFA) layer, is convolved with the output of the third layer to fully capture the connections of dynamic features between channels. Using $(H^{(2)})^{T}$ as the convolutional kernel, with a receptive field covering all the data in the output of the third AGACN cell layer, $H^{(3)}$, ensures comprehensive integration of the valuable information captured in both $(H^{(2)})^{T}$ and $H^{(3)}$. Therefore, the CFA layer is as follows:

Fig. 8. Confusion matrices of the classification of the four datasets using AGACN. The numbers on the horizontal and vertical axes denote, in order, the EEG signal categories of the silent reading tasks in each dataset.

Fig. 10. PCC values between any two channels. (a) PCC values between any two of the 59 channels. (b) PCC values between any two of the AK-channels. (c) PCC values between any two of the CTK-channels.

TABLE I CORPUS OF EACH TASK CATEGORY

the experiment. According to the set program, computer A passes the marks wirelessly to computer B. Computer B is used to record EEG signals and corresponding marks.

TABLE III THE TOP 12 AK-CHANNELS ON EACH DATASET

Fig. 7. The validation accuracy for the new training set using 2 to 59 channels on different datasets, respectively.

TABLE V THE TOP 12 CTK-CHANNELS ON EACH DATASET

TABLE VII THE RESULTS (%) OF ABLATION STUDY ON EACH DATASET