Time-Series Data Classification and Analysis Associated With Machine Learning Algorithms for Cognitive Perception and Phenomenon

Collecting and analyzing time-series data is a major application of machine learning and has become increasingly important in cognitive science. Because cognitive mechanisms such as human sensation and perception are fast responses, ranging from a few milliseconds to hundreds of milliseconds, the corresponding brain signals require pattern recognition and analysis to yield useful information. In this paper, we investigate multi-channel time-series data of cognitive brain function, obtained with non-invasive techniques, through signal classification and analysis grounded in a cognitive science approach and experiments. The test dataset was collected on 19 channels using functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG) under multiple rest and working conditions from eight subjects. The main contributions of this paper are the collection and analysis of cognitive-science time-series data and scientific implications that extend to other domains such as energy, manufacturing, bioinformatics, and finance. Applying Shapelet and Dynamic Time Warping (DTW) classification techniques to brain-signal time series shows the potential to identify neurobiological phenomena that can signal a disease or disorder in advance. EEG band-specific data were also classified with machine learning algorithms and showed accurate patterns and trends for measuring cognitive functions of scientific, biological, and academic importance.


I. INTRODUCTION
There are many ways to analyze and reclassify data with machine learning algorithms. Among them, the most effective methodology for handling the large volumes of data found in bioinformatics must be considered. At the same time, scientific results can be produced if changes in human cognition, such as perception and emotion, can be measured and analyzed. Real-time analysis of brain function has important implications both medically and in bioinformatics. For example, epilepsy needs to be monitored regularly to determine when and where seizures occur [1]. Therefore, if the signal processing and machine learning methods discussed in this paper can be applied to predict the onset of epilepsy, preemptive treatment and countermeasures become possible.
The associate editor coordinating the review of this manuscript and approving it for publication was Min Xia.
In response to cognitive tasks, an individual's cognitive workload can be compared with that of other individuals using machine learning techniques, and real-time data patterns can be linked to specific physical tasks [2]. For conditions such as seizures, epilepsy, and fits, in which brain cognitive function decreases, it is important to detect changes in the brain signals as quickly as possible. This can be achieved by acquiring brain signal time series, classifying them, and extracting important patterns and features [3], [4]. The acquired patterns and features can be stored as patient information and compared with the patient's own earlier data and with data from other people. In this way, for brain-related diseases that change the patient's cognitive function, such as epilepsy or depression, brain signal data can help in diagnosing the stage and tendency of the disease.
Two popular techniques for acquiring neurophysiological data are electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS) [5], [6]. fNIRS detects brain activity through local hemodynamics, in a way similar to the activation of brain cognitive neural networks measured by functional magnetic resonance imaging (fMRI) [7]. In addition, fNIRS reliably reflects an individual's cognitive load and can be used in real-world settings. Consequently, fNIRS has also been found suitable for tracking changes in brain connections and for applying data processing algorithms and classification-based machine learning techniques.
In this paper, the brain signals are acquired using both EEG and fNIRS, which is a hybrid approach. After pre-processing [8], features are extracted from the brain signal time series as shown in Figure 1. The data classification and pre-processing technique, based on machine learning, is then applied to the extracted features to understand brain functionality and signal correlation. Storing and analyzing a patient's complete time-series data takes considerable storage space and time. Therefore, it is important to extract the important features and store these reduced datasets instead. These datasets can then be compared with the patient's own history and with other patients. This comparison and analysis helps in understanding brain-related illnesses and how they affect the brain signals, and can lead to an early diagnosis.
Different data classification approaches have previously been investigated for EEG and fNIRS [7]-[9]. The simplest time-series classifier is the nearest neighbor (k-NN) algorithm [10]. A relatively new classification technique is time-series Shapelets, which are data mining primitives that require no storing or searching of the entire dataset and can give more insight into the data [11]. Meanwhile, the dynamic time warping (DTW) technique is used to find an optimal alignment between two time series under certain restrictions [12]. DTW can be used to learn a single Shapelet and to compare it with the brain cognitive data of the other channels to obtain similar time-series subsequences [13].
In this paper, the human subjects were given different activities and their brain activity was recorded using fNIRS and EEG (the hybrid approach), as shown in Figure 1. Mental states during rest and task were discriminated by extracting the respective time series. The features in the time series were then classified by applying time-series Shapelets. These Shapelets were compared with the data of different brain channels using the dynamic time warping (DTW) classification technique. Furthermore, when these Shapelets were compared among the human subjects, a brain signal pattern was observed that can help medical experts diagnose mental-health-related diseases and disorders.
As shown in Figure 1, brain signal analysis is very useful for understanding cognitive mechanisms in real time. The procedure of analyzing the hybrid signals in the first time domain, and then analyzing and diagnosing them again in the n-th time domain, is effective when applied from the first patient to the n-th patient. The analysis of multi-channel signals is dealt with separately because it is a problem specific to the multi-channel setting [14]. For psychiatric problems, brain signal time-series analysis can be an important tool for psychiatrists in the treatment or diagnosis of depressive disorder [15], [16]. This paper explains the scientific value and importance of brain signal measurement, data storage, analysis, and classification for the early detection and diagnosis of brain-related diseases.
In biological science, the early detection of abnormal patterns in brain signals can be used for diagnostic purposes, because neurovascular coupling is important for brain cognitive function and brain abnormalities can be detected using non-invasive techniques [17]. In addition, by observing changes in brain cognitive function with cognitive science techniques, brain diseases and psychiatric illnesses (especially epilepsy or severe depressive disorder) are expected to be detectable even when symptoms are mild. The stored data can serve as scientific evidence for an accurate "diagnosis" in a hospital that can measure real-time, or at least cumulative, signals of neurological change, for example for patients entering the emergency room.
The cognitive workload is classified and analyzed using two machine learning algorithms, Shapelet and dynamic time warping (DTW) [18]. This work contributes towards developing a software tool for quantitative, real-time assessment of cognitive workload in time-series data [19], [20].
The following are the key contributions of this paper:
• We investigated cognitive mechanisms using brain signals acquired with both EEG and fNIRS (a hybrid approach), followed by pre-processing and classification.
• Experimental tests were set up with 120-second sequences on eight healthy adult volunteers and validated through software tools.
• The mathematical correlations and the impact of whole-brain analysis were examined in terms of abnormal time-series pattern extraction and machine learning.
• For each subject and each of the 19 EEG channels, entropy and a threshold on the Euclidean distance were established and verified with the vector distance (VD) for the frequency domain.
The remainder of the paper is organized as follows: Section II describes the experimental setup, protocol, and data pre-processing for the machine learning algorithm. Section III presents data classification and how patterns can be extracted from time series using shapelets. Section IV describes the experimental results, Section V discusses the comparison of time series using shapelets and DTW, and Section VI concludes the paper.

II. EXPERIMENTAL SETUP

A. EXPERIMENTAL SETUP AND DATA INJECTION
In general, fNIRS is a non-invasive method of measuring changes in local oxygenated and deoxygenated hemoglobin (oxy-Hb and deoxy-Hb; hereinafter HbO and HbR) concentrations. The characteristic absorption spectra of hemoglobin in the near-infrared range provide information on cerebral hemodynamic changes in a variety of clinical situations [21], [22]. In this study, fNIRS data were collected using the NIRScout extended system (NIRx Medical Technologies, New York) [23]. Sensors and detectors were placed over the head, 3 cm apart, and secured with gels and holders as shown in Figure 2. fNIRS works by transmitting and receiving near-infrared light (here at the dual wavelengths of 760 nm and 830 nm) and poses no risk other than the discomfort of having sensors and detectors placed over the head for the duration of the study. The system recorded 19 channels at a sampling rate of 6.25 Hz. The NIRStar software by NIRx was used to verify the signal quality before each recording. The fNIRS channels were co-located at the 19 standard sites of the international 10-20 system (FP1, FP2, F7, F3, FZ, F4, F8, C3, CZ, C4, T3, T4, T5, T6, P3, P4, PZ, O1, and O2), covering all areas of the scalp.
To measure the EEG signals, probes for 8 × 2 channels were used. The co-located EEG and fNIRS channels at the 19 standard international sites, covering all areas of the scalp, are shown in Figure 2. The EEG reference electrode was placed on the ears and the ground electrode on the forehead. The measured potential difference is a voltage drop. A triplet holder was used to manage the EEG sensor and the fNIRS source-detector pair at each position [24].

B. EXPERIMENTAL PROTOCOL
Each block of the experiment was performed as a REST→TASK→REST→CONTROL sequence lasting 120 seconds. In the 30-second REST period the subject relaxed without moving, thinking of anything in particular, or falling asleep. This was followed by a 30-second TASK, another 30 seconds of REST, and finally a 30-second CONTROL condition. We included the CONTROL condition since it was less likely to be contaminated by thought than the REST period. Each recording contained three repetitions of the block. Henceforth we use the term episode to refer to a TASK, REST, or CONTROL time segment, as shown in Figure 3. Eight healthy adult volunteers participated in the study (8 males; mean age 26 years, range 24 to 28 years).
VOLUME 8, 2020

C. DATA PRE-PROCESSING
The fNIRS data analysis was carried out using the open-source software HOMER2, developed by Dr. David Boas's group at the Martinos Center for Biomedical Imaging, and Matlab (MathWorks, Natick, MA) [25]. The intensity signals were converted into optical density (OD) signals for each source-detector pair. The signal was visually inspected for the effects of motion and other artifacts. Motion artifacts were detected and removed using HOMER2.
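The intensity-to-OD conversion can be illustrated with a minimal sketch. It assumes the common convention of referencing each sample to the mean intensity of its channel; the function name `intensity_to_od` and the sample values are illustrative and are not taken from HOMER2 or from the recordings.

```python
import math

def intensity_to_od(intensity):
    """Convert raw intensities of one source-detector pair to optical density changes.

    Assumption: the OD change is -ln(I / mean(I)), i.e. the mean intensity
    of the channel serves as the reference level.
    """
    if not intensity:
        raise ValueError("intensity series is empty")
    if any(v <= 0 for v in intensity):
        raise ValueError("intensities must be positive")
    reference = sum(intensity) / len(intensity)
    return [-math.log(v / reference) for v in intensity]

# Illustrative values only (not measured data).
od = intensity_to_od([1.00, 1.02, 0.98, 1.00])
```

A sample brighter than the reference gives a negative OD change, and a dimmer one gives a positive change.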
Motion artifacts were defined as signal variations larger than 5% of the standard deviation of the signal within a time period of 1 second. A channel-based cubic spline interpolation was used for motion artifact removal. The fNIRS signals were band-pass filtered (0.01 to 0.5 Hz) to reduce slow drifts and some physiological noise (e.g., heart rate). Hemoglobin (Hb) concentration changes were computed with the Modified Beer-Lambert law [26]. The fNIRS power spectral density (PSD) of all channels was also computed and illustrated. The fNIRS PSDs typically have a peak at 0.1 Hz (Mayer waves), which is especially pronounced for oxygenated hemoglobin (HbO). For each channel, ten time series were extracted: eight for EEG and one each for HbO and HbR, as shown in Figure 4.
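The Modified Beer-Lambert step can be sketched as a two-wavelength linear solve. This is a hedged illustration, not the HOMER2 routine: the function name `mbll_two_wavelengths` is hypothetical, the extinction coefficients and the differential pathlength factor (DPF) are treated as inputs, and the numbers in the demo are arbitrary placeholders rather than physiological constants.

```python
def mbll_two_wavelengths(dod_1, dod_2, eps_1, eps_2, distance_cm, dpf_1, dpf_2):
    """Solve the Modified Beer-Lambert law at two wavelengths for (dHbO, dHbR).

    dod_1, dod_2 : optical density changes at wavelength 1 and wavelength 2
    eps_1, eps_2 : (eps_HbO, eps_HbR) extinction coefficients at each wavelength
    distance_cm  : source-detector separation
    dpf_1, dpf_2 : differential pathlength factors (assumed to be known)
    """
    # Effective optical path length at each wavelength.
    l1 = distance_cm * dpf_1
    l2 = distance_cm * dpf_2
    # Linear system: dod_i = (eps_HbO_i * dHbO + eps_HbR_i * dHbR) * l_i
    a11, a12 = eps_1[0] * l1, eps_1[1] * l1
    a21, a22 = eps_2[0] * l2, eps_2[1] * l2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("extinction coefficients do not separate HbO and HbR")
    # Cramer's rule for the 2 x 2 system.
    d_hbo = (dod_1 * a22 - a12 * dod_2) / det
    d_hbr = (a11 * dod_2 - a21 * dod_1) / det
    return d_hbo, d_hbr

# Round-trip check with arbitrary placeholder coefficients (hypothetical values).
eps_760 = (1.0, 3.0)   # hypothetical (eps_HbO, eps_HbR) at 760 nm
eps_830 = (2.5, 1.5)   # hypothetical (eps_HbO, eps_HbR) at 830 nm
path = 3.0 * 6.0       # 3 cm separation times an assumed DPF of 6
dod_760 = (eps_760[0] * 0.2 + eps_760[1] * -0.1) * path
dod_830 = (eps_830[0] * 0.2 + eps_830[1] * -0.1) * path
d_hbo, d_hbr = mbll_two_wavelengths(dod_760, dod_830, eps_760, eps_830, 3.0, 6.0, 6.0)
```

The demo builds the OD changes from known concentration changes and recovers them, which checks only the algebra; real use requires tabulated extinction coefficients and a DPF suited to the subject.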
Eight EEG time series were extracted for each channel, and a band-pass filter with a different frequency range was applied to each of them [27], as shown in Figure 4. Signals from the low-frequency band (0∼4 Hz, δ) to the high-frequency band (28∼32 Hz, γ) show a significantly high HbO and enable analysis of the activated EEG signal even though it is measured through the FP1 channel. That is, the red HbO line was successfully recorded for each channel.
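The eight band-pass ranges can be enumerated with a short sketch. The text states only the first band (0∼4 Hz) and the last band (28∼32 Hz); the assumption here is that the eight bands are contiguous and 4 Hz wide. The names `EEG_BANDS` and `band_index` are illustrative.

```python
BAND_WIDTH_HZ = 4   # assumed width of each band
N_BANDS = 8         # eight EEG time series per channel

# (low, high) band edges in Hz: (0, 4), (4, 8), ..., (28, 32)
EEG_BANDS = [(i * BAND_WIDTH_HZ, (i + 1) * BAND_WIDTH_HZ) for i in range(N_BANDS)]

def band_index(freq_hz):
    """Return the index of the band containing freq_hz, or None outside 0-32 Hz."""
    for i, (low, high) in enumerate(EEG_BANDS):
        if low <= freq_hz < high:
            return i
    return None
```

Under this assumption the third band is 8 to 12 Hz, which matches the alpha band used in the results.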

III. METHOD
To increase its scientific value, this paper focuses on time-series data analysis rather than on the recording of human neurophysiological activity with the two modalities, EEG and fNIRS. Accordingly, the time-series data classification used in artificial intelligence is analyzed from the perspective of cognitive-science perception and phenomena.

A. DATA CLASSIFICATION
The time series contains data from three states: REST, TASK, and CONTROL. Because we want to classify the REST and TASK states, we extract only these states from the time-series data to avoid confusion. This also reduces the overall time-series size and complexity. As shown in Figure 5, we now have six REST states (R1∼R6) and three TASK states (T1∼T3), each lasting 30 seconds. The x-axis represents the time for all 12 states (REST, TASK, and CONTROL), for a total of 12 × 30 = 360 seconds. This leads to a total of 1,710 EEG and fNIRS time series from the 19 brain locations/channels, as shown in Figure 5.
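The episode bookkeeping can be checked with a small sketch. The labels and helper names are illustrative, and reading the total of 1,710 as 19 channels × 10 series per channel × 9 retained states is an interpretation that is consistent with the stated numbers, not something the text spells out.

```python
EPISODE_SECONDS = 30
BLOCK = ["REST", "TASK", "REST", "CONTROL"]   # one 120 s block
REPETITIONS = 3                               # blocks per recording
N_CHANNELS = 19
SERIES_PER_CHANNEL = 10                       # 8 EEG bands + HbO + HbR

def build_schedule():
    """Return (label, start_s, end_s) for every episode in one recording."""
    schedule, counts, t = [], {}, 0
    for _ in range(REPETITIONS):
        for kind in BLOCK:
            counts[kind] = counts.get(kind, 0) + 1
            schedule.append((f"{kind}{counts[kind]}", t, t + EPISODE_SECONDS))
            t += EPISODE_SECONDS
    return schedule

schedule = build_schedule()
total_seconds = schedule[-1][2]

# Keep only the REST and TASK episodes for classification.
kept = [ep for ep in schedule if not ep[0].startswith("CONTROL")]
n_series = N_CHANNELS * SERIES_PER_CHANNEL * len(kept)
```

With these settings the schedule holds 12 episodes over 360 seconds, of which six REST and three TASK episodes are kept.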
After splitting the time series and keeping REST and TASK only, we have 30-second sub-time-series for each subject: 9 states (R1∼R6, T1∼T3) at 19 different locations. Figure 6 shows the whole-brain signal analysis using the EEG and fNIRS data for time-series classification and analysis. It should be noted that plotting signals by frequency band (0∼4 Hz to 28∼32 Hz), state (REST1 to REST6), brain location (all 19 sites), and signal type all at once is not an easy task. Moreover, it is not easy to analyze the HbO and HbR time series simultaneously.

B. PATTERN EXTRACTION FROM TIME-SERIES USING SHAPELETS
Extracting patterns from the classified time-series data (shown in Figure 6) can help medical domain experts gain insight into the meaning and purpose of the brain signals. Therefore, we apply the Shapelet technique to extract useful patterns from the already classified time series. The Shapelet pattern is thus a sub-portion of the already classified time series [12].
We can therefore search for these patterns among the very large number of candidate shapelets in the brain signal data. Shapelets identify the maximally discriminative segments of the time-series data [28], [31]. Shapelets offer high prediction accuracy and have the advantage of finding the subsequences that best differentiate between classes [29], [32].
Meanwhile, our simulation results indicate that this sequence is related to the distance between the objects, measured by the Euclidean distance (ED) [30], [33]. Because finding the most similar sequences or subsequences at the same time is very difficult, our contribution is to find similarities or make predictions in sequences using machine learning techniques.
Predicting data with this machine learning algorithm, the Shapelet learning method, is key to extracting patterns. The top-performing segment was selected after computing the distance between the time series and the different candidate shapelets within a single time series. In other words, identifying the right subsequence is the most effective and useful machine learning technique applied here.
In this paper, the first goal of pattern extraction using Shapelets is to find the appropriate channel among the 19 scanned channels, that is, to select the physical EEG + fNIRS channel position related to the brain area activated by the TASK performed. With this channel as a reference, the shapelet extracted from its time-series data can then be compared and matched with the signals at the same physical location (8 EEG + 2 fNIRS) and with the data acquired at the other brain locations.
As each brain location has 8 EEG time series, we use the Euclidean distance to map the points in these eight time series according to their timestamps. The Euclidean distance is used for time series that share the same timestamps. If we choose a reference EEG shapelet, we can measure and compare its distance to the other seven shapelet time series, as shown in Figure 7. By comparing these distances, we can find the optimum Shapelet that represents the brain signal in each REST and TASK.
For example, we first find a reference Shapelet from R4 as shown in Figure 7. Then we can measure the Euclidean distance between the R4 shapelet and R1 as

$$d(S, R_1) = \sqrt{\sum_{t=1}^{L} \bigl(S(t) - R_1(t)\bigr)^2} \tag{1}$$

where S(t) denotes the reference Shapelet, R1(t) denotes the first REST state, and L denotes the length of the Shapelet. The fNIRS time series was taken from another instrument, and its response tends to be slightly delayed overall relative to the EEG signal. The shapelets were first found in the two segmented HbO and HbR time series; then, using the Euclidean distance, the shapelets were compared to find the optimum shapelet representing the work done by the brain in the TASK state. The distance between the TASK and REST shapelets is used to distinguish the TASK shapelet.
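The Euclidean comparison between a reference shapelet and the time series of a state can be sketched as follows. It assumes the usual shapelet convention of sliding the shapelet along the series and keeping the smallest distance; the text does not state whether this sliding minimum was used, and the function names and sample values are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def shapelet_distance(shapelet, series):
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    n, m = len(series), len(shapelet)
    if m == 0 or m > n:
        raise ValueError("shapelet must be non-empty and no longer than the series")
    return min(euclidean(shapelet, series[i:i + m]) for i in range(n - m + 1))

# Illustrative values only: the shapelet occurs exactly inside the first series.
d_match = shapelet_distance([0, 1, 0], [5, 5, 0, 1, 0, 5])
d_other = shapelet_distance([0, 1, 0], [5, 5, 5, 5])
```

A state that contains the reference shape yields a distance near zero, while a state without it yields a larger value, which is what the distance list d captures.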
Thus, each brain channel (location) gives us two Shapelets, one for EEG and one for fNIRS data. Each Shapelet is accompanied by its Euclidean distance data, which indicates the quality of the Shapelet. Therefore, each of the 19 locations in the brain is represented by a unique Shapelet set (EEG + fNIRS Shapelet).

IV. RESULTS
In all experiments, pattern extraction is performed with cognitive factors taken into account when collecting and analyzing the time-series data. The ultimate purpose of pattern extraction is to enable early detection and prediction through shapelets. The results are reported for the EEG and fNIRS cases.

A. MEASUREMENT OF THE TIME-SERIES
EEG Case: For Patient-A2, data was acquired in the EEG alpha (α) band (8∼12 Hz). The EEG data for the 19 brain locations is shown in Figure 8.
In the randomly selected P3 time-series data, Shapelets were found for the REST states (R1∼R6) and the TASK states (T1∼T3), as shown in Figure 9, which also illustrates the shapelets for Patient-A2. Choosing the REST6 shapelet as the reference, the Euclidean distances to all nine states were computed and a distance list (d) was generated. The TASKs have a larger distance than the RESTs when compared to the REST Shapelet. In Figure 9, similar shapes can be seen in REST1, REST2, and REST3 at P3 for Patient-A2. The same pattern can also be seen in TASK1 and TASK3, even though the channel was randomly selected. This indicates that the predictions from the machine learning technique are correct.
The Euclidean distance calculation was also performed, and it was confirmed that the REST states and the TASK states each lie on the same plane. In the case of REST6, which has a zero value because it contains the reference Shapelet, the length is set to 10 seconds. The difference d is the Euclidean distance between the Shapelet and each of the 9 states (REST1∼REST6, TASK1∼TASK3). For example, d(REST1) is approximately 8.776; all the RESTs have similar values, while the TASKs have larger distances than the RESTs.
To find a pattern, the value of d must be compared across the states, so the distance from a threshold value can be recalculated, as in the case of Patient-A2. With a threshold value of 12.038296791, the REST curve drops sharply at the last point because d is zero there. As can be seen in Figure 11, for Patient-A2 all REST states are below the threshold, whereas all TASK states are above it.
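One way to place such a threshold is the entropy-based split that is common in the shapelet literature. The text does not say how the threshold above was obtained, so the following is a sketch under that assumption; the function names and the distance values are hypothetical and are not the measured d values.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result

def best_threshold(distances, labels):
    """Return (threshold, weighted_entropy) of the best split on the distance axis.

    Candidate thresholds are midpoints between consecutive sorted distances.
    Returns None when all distances are equal.
    """
    pairs = sorted(zip(distances, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no gap to split between equal distances
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for d, lab in pairs if d < threshold]
        right = [lab for d, lab in pairs if d >= threshold]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical distances for REST1..REST6 and TASK1..TASK3 (REST6 is the reference).
d_list = [8.0, 9.0, 7.5, 8.5, 9.5, 0.0, 15.0, 14.0, 16.0]
states = ["REST"] * 6 + ["TASK"] * 3
threshold, split_entropy = best_threshold(d_list, states)
```

A weighted entropy of zero means the threshold separates every REST state from every TASK state, which corresponds to the situation described for Patient-A2.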
The vector distance (VD) is based on the gap between the two nearest points on either side of the threshold and is calculated using Equation (2). These two points (TASK1 and REST5) are marked with green circles in the plot in Figure 12.

$$\mathrm{VD} = \frac{G(t)}{m(d)} \tag{2}$$

where G(t) is the gap in the Y direction between the two points, and m(d) is the mean of all nine d.

fNIRS Case: Similar to EEG, the fNIRS (HbO and HbR) time series were classified and the Shapelet technique was applied to them. The 19-channel HbO data for all eight subjects were acquired and are shown in Figure 13. The x-axis length of each time series is 30 seconds and the Shapelet length is 15 seconds. Each plot shows the state, among the 9 states (RESTs and TASKs), that contains the shapelet. A blue line represents a Shapelet found in a REST state and a green line one found in a TASK state. Dotted lines represent the Shapelets that are considered good, having entropy > 0 or VD (Vector Distance) < 0.1. A solid shapelet means that it correctly classifies all 9 states and has a good margin.
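The vector distance of Equation (2) can be sketched in a few lines. The function name `vector_distance` and the distance values are hypothetical; the sketch only assumes that the gap is measured between the largest distance below the threshold and the smallest distance at or above it.

```python
def vector_distance(distances, threshold):
    """VD = gap between the nearest points on each side of the threshold / mean of all d."""
    below = [d for d in distances if d < threshold]
    above = [d for d in distances if d >= threshold]
    if not below or not above:
        raise ValueError("threshold must split the distances into two groups")
    gap = min(above) - max(below)
    mean_d = sum(distances) / len(distances)
    if mean_d == 0:
        raise ValueError("mean distance is zero")
    return gap / mean_d

# Hypothetical distances for nine states and a hypothetical threshold.
d_list = [8.0, 9.0, 7.5, 8.5, 9.5, 0.0, 15.0, 14.0, 16.0]
vd = vector_distance(d_list, threshold=11.75)
```

A larger VD indicates a wider margin between the two groups relative to the typical distance.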
Having found the shapelets in REST or TASK states, we will in future restrict the search to only one of the REST and TASK states for easier comparison across subjects and channels. Figure 14 shows the scaled shapelets for the 19 channels and all 9 states (R1∼R6, T1∼T3). For example, row T3 shows all 9 states with the shapelet in REST1 (blue line). In row PZ, the shapelet is in TASK3 (green line). Machine learning thus serves as a convenient tool for viewing the HbO Shapelets for all nine states and nineteen channels at a glance, as shown for the single Patient-A2.
Repeating this process for all 8 subjects gives the shapelet information for all channels and all subjects. The HbO shapelet information for channel T3 is shown in Figure 15. For Patient-S1, the row shows all 9 states and the matching Shapelet is in REST3 (blue line). For Patient-K1, the shapelet is in TASK2 (green line). Entropy and VD calculations were performed on each channel to verify the accuracy of the shapelet classification, and the classification technique was found to be 93% accurate.

V. COMPARISON
Based on the experimental results, we have the two best Shapelets from each of the 19 brain locations for the EEG and fNIRS data. We must therefore compare the data from the 19 locations to find the Shapelet that best represents the brain activity in the TASK state. Because the 19 channels correspond to different brain locations, the data at a given location might not be directly related to the function exercised by the task, and the response may differ between locations.

A. CLASSIFICATION OF TIME-SERIES USING DTW
We utilize the dynamic time warping (DTW) technique, which measures the similarity between two time-series segments that may vary in speed or timing.
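A minimal sketch of DTW, next to the plain Euclidean distance, illustrates why the two can differ for time-shifted signals. It uses the textbook dynamic-programming recurrence with a squared pointwise cost and no warping-window constraint; these choices, the function names, and the sample values are assumptions for illustration and do not describe the configuration used in the experiments.

```python
import math

def dtw_distance(a, b):
    """DTW distance with squared pointwise cost; returns sqrt of the accumulated cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j],      # repeat b[j-1]
                                    cost[i][j - 1],      # repeat a[i-1]
                                    cost[i - 1][j - 1])  # advance both
    return math.sqrt(cost[n][m])

def euclidean(a, b):
    """Point-by-point Euclidean distance between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative values: the same bump, shifted by one sample.
series = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0]
d_dtw = dtw_distance(series, shifted)
d_euc = euclidean(series, shifted)
```

Because the warping path absorbs the one-sample shift, the DTW distance here is smaller than the Euclidean distance. The full cost table needs time and memory proportional to the product of the two lengths, which is why shorter shapelets reduce the computation.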
fNIRS Case: As explained in the previous section, the VD and entropy calculations were carried out on the HbO and HbR Shapelet data.
The DTW and non-DTW shapelets are compared in Figure 16 for Patient-T1 and channel O2. In the upper plots, Figure 16(a), the plots are shortened and VD is enlarged, but no striking enhancement is seen. In the lower plots, Figure 16(b), the two Shapelets have almost the same shape (they are found in the same state, TASK2), and the shortest-distance positions are alike in the other states.
For Patient-T3, the Shapelet and DTW data are compared in Figure 17, where the blue line represents the Shapelet and the green line represents the time series of each state. In the legend, DTW denotes the DTW distance and EUC denotes the Euclidean distance.
We can see that the DTW distances are considerably shorter than the Euclidean distances in the TASK states.
Here, the difference between the shapelet (Euclidean) and DTW distances is what we want to observe relative to REST, since the difference between the values emerges when the TASK is given. When returning to the REST state, Patient-T3 also returned to the original state in channel O2. However, when the Euclidean distance value is small, the return to the original state tends to be slow.
Similarly, DTW was applied to all 8 subjects and each of the 19 channels, and entropy and VD were calculated for all 152 cases (8 subjects × 19 channels). In 17 of these cases the entropy was zero, as indicated at the bottom right of Figure 17. The DTW and non-DTW data were therefore compared.

VI. CONCLUSION
Finding anomalies in time-series data from the perspective of cognitive science is necessary in many convergence fields. Machine learning techniques are becoming increasingly popular as an early diagnosis method. When looking for abnormalities in brain cognitive function from a natural-science viewpoint, examples can readily be found in neurovascular coupling or neurobiological phenomena.
In addition, changes in brain cognitive function, examined with cognitive science and the theory of bioinformatics, can be used to study brain diseases (especially epilepsy or severe depression). In particular, as discussed in this paper, brain signal acquisition and analysis is relevant to many domains (e.g., manufacturing, networks, finance, and social science). In the bioinformatics domain, for example for psychiatric disorders, even when a patient shows normal symptoms, special neurological changes (e.g., in patients entering narrow or flashing spaces) can be observed in real time, or signals can at least be accumulated. Such data can serve as evidence of sound scientific inquiry for correct "diagnosis" and "analysis" in hospitals equipped for such measurements.
The hybrid approach of Shapelet and DTW algorithms applied to fNIRS data shows good potential for distinguishing different levels of mental state. This is the first step towards the development of interactive human-computer interfaces and cognitive science research. Future work will be directed at finding more efficient and systematic ways to choose the right instances (periods of activation) from which the best classifications can be obtained with machine learning algorithms.
Comparing the shapelets before and after the application of DTW showed that DTW brings only a limited performance benefit when comparing the shapelets. Moreover, the computing power required to apply the DTW technique to Shapelets was very high; this can be reduced by shortening the Shapelets. However, the DTW technique will be more significant when comparing Shapelets that have a phase transformation or shift.