PredictEYE: Personalized Time Series Model for Mental State Prediction Using Eye Tracking

Mental health is vital for emotional, psychological, and social well-being. Mental illness can affect thoughts, feelings, and behaviors, and early intervention and specialized care can help manage major mental illnesses. Predicting mental state accurately can facilitate behavioral changes and promote overall well-being. This paper proposes PredictEYE, a novel personalized time series model that predicts a person's mental state and identifies the specific scene responsible for that mental state. The model analyzes each individual's eye-tracking time series data recorded while watching calm and stressful videos. It uses a deep learning univariate time series regression model based on Long Short-Term Memory (LSTM) to predict the future sequence of each feature, and a machine learning-based Random Forest algorithm to predict the mental state. The model's performance was compared with state-of-the-art approaches reported in the literature. PredictEYE achieved an accuracy of 86.4% in predicting mental state. Given the unique, idiosyncratic nature of eye tracking data, eye tracking models tailored to individual differences are more effective in comprehending mental states than models that make comparisons across multiple participants. Eye tracking features play a crucial role in predicting mental state; the model is adaptable to webcam-based eye tracking and is relevant to applications that require continuous, non-invasive monitoring.


I. INTRODUCTION
The growth, development, and productivity of society depend on health, which is essential for a happy and healthy life anywhere in the world. Mental health is an integral and essential component of health. Mental disorders can affect anyone, irrespective of age and gender. Brain- and mind-related health conditions are increasing daily, making regular monitoring of mental health necessary. Everyone in society therefore requires accurate measures for mental health monitoring [1], [2]. Mental health and physical health are closely linked: mental illnesses such as depression and anxiety affect our ability to participate in healthy behavior, so mental health plays a vital role in maintaining good overall health [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Claudia Raibulet.
In many countries around the world, primary healthcare systems typically consist of a combination of community health centers, primary health clinics, district or regional hospitals, and other specialized institutions. Integrating mental health care into primary care programs is one way to treat mental health conditions within the primary care system [5], and it can involve treating the whole person, including any existing health problems. However, like many medical conditions, mental illness carries a stigma, and people with mental illness can become even less effective in the workplace. Growing evidence indicates that people face much greater obstacles in receiving mental health care than in receiving physical care [6]. A shortage of mental health professionals makes needed care unavailable, and financial constraints prevent many people from accessing necessary therapy. Therefore, low-cost solutions that indicate mental illness early in its development are essential.
One of the primary causes of mental illness is stress. Stress activates the sympathetic nervous system and releases hormones that cause physiological changes throughout the body, including increased blood pressure, heart rate, sweating, and pupil size, quicker breathing, muscle tension, and dry mouth [7]. A person's stress level is measured based on their mental workload, cognitive ability, and emotional or mental state, and a person's mental state can vary with physiological and cognitive conditions. Mental states include calm, stress, drowsiness, distraction, cognitive load, and mind wandering [8].
A person's mental health can be assessed subjectively or objectively [9], [10], [11]. Subjective measurements are based on personal opinions, feelings, and perceptions and are typically obtained with well-tested, standardized questionnaires that anyone can complete. These measurements have two serious shortcomings: they cannot be taken in real time, and they are subject to psychological bias. Objective measurements are based on observable, measurable facts and are not influenced by personal biases or opinions. They can be obtained from physiological and behavioral measures; audio and video recordings are used to collect behavioral data. Physiological measurements are frequently used to monitor mental health because they allow unobtrusive and involuntary data collection [12]. A patient's physiological signals, including eye movements, brain activity, heart rate, electrodermal activity, facial expressions, voice modulations, temperature, respiratory rate (breaths per minute, BPM), and blood pressure (BP), are sensed by various technologies to help track mental health [13].
Remote [14], [15] and personalized [16], [17], [18] monitoring are two approaches to tracking health and wellness data that have gained significant attention recently. Remote monitoring uses technology to monitor an individual's health and wellness from a distance. In healthcare, remote monitoring could empower physicians to deliver high-quality care while keeping patients safe and healthy; with multiple sensing systems, physicians can track patients' health effectively, monitor them remotely, and provide immediate care [15].
Personalized monitoring involves tracking health and wellness data specific to an individual, such as dietary intake, sleep patterns, and other vital parameters that can affect health [18], [19]. By keeping track of these data, individuals can learn more about their health and wellness and make informed decisions about lifestyle changes that enhance their general welfare. A non-personalized model, on the other hand, is trained on a larger dataset that is not specific to any individual or group and aims to provide generic recommendations or predictions applicable to a larger population.
Developing a personalized model involves data collection, feature engineering, model selection, training, validation, deployment, and monitoring. Relevant features are identified, and an appropriate model is chosen for the research question. The model is then trained, validated, deployed, and continuously monitored and updated to maintain optimal performance in a real-world setting.
Many researchers have built systems based on machine learning [20], [21], [22], [23], [24], [25] and deep learning [16], [20], [26], [27], [28], [29] approaches. Deep learning-based personalized time series models can capture patterns and changes over time, leading to more accurate and personalized predictions. Here, we propose PredictEYE, a novel model that uses a deep learning-based time series model to predict a person's mental state and identify the specific scene in a video that triggers that mental state [30]. PredictEYE forecasts the mental state from eye-tracking data obtained while the person watches calm and stressful videos. Beyond detecting the mental state, the model can also pinpoint which video content induces it. This feature distinguishes our model from other physiological measure-based models typically used to assess mental states. These findings are presented in this paper.
This paper makes a significant contribution to the fields of time series prediction and mental state assessment. We have developed a novel framework, PredictEYE, which combines a univariate Long Short-Term Memory (LSTM) network with the Random Forest algorithm to forecast future eye gaze data sequences and to predict an individual's mental state from these forecasts [31]. It also aims to identify the scene responsible for that mental state. By using time series eye gaze data, our model predicts future sequences, offering insights into how an individual's visual attention might evolve over time. To validate PredictEYE, we use the participant's Galvanic Skin Response (GSR) data as a benchmark for assessing the model's performance. The comparison with GSR data demonstrates the efficacy of our approach in capturing and predicting mental states.
The paper is organized as follows. Section II reviews related work. Section III describes the methodology adopted in PredictEYE. The implementation specifics for forecasting

II. RELATED WORKS
Mental well-being is the ability to cope with life and its various pressures and challenges. Along with the social determinants of health, other factors can increase our stress levels and harm our sense of well-being [32]. Developing a personalized model for monitoring mental health requires careful consideration of data collection, feature engineering, model selection, model deployment, and monitoring, as shown in Figure 1. The process can be complex, but it can improve mental health outcomes and provide personalized support to individuals in need.
To develop a personalized model for monitoring mental health, researchers and practitioners need to explore the types of data that can inform mental health status, such as physiological signals, behavioral data, self-reports, social media [33], [34], or environmental factors. They also need to consider the models or approaches that can leverage these data to predict, classify, or diagnose mental health issues, such as machine learning, deep learning, signal processing, or natural language processing. Accordingly, this literature survey covers the types of data used and the models available for building such a model.

A. DATA AND FEATURE ENGINEERING
Data collected for mental health monitoring can be categorized as subjective or objective: subjective data are based on personal experiences and perceptions, whereas objective data are based on observable, measurable factors. Subjective data are mostly collected using questionnaires, mock interviews, polls, and checklists to assess mental wellness. Standardized, well-tested instruments such as the NASA Task Load Index (TLX) [9], [12], [35], the Trier Social Stress Test [10], the State-Trait Anxiety Inventory (STAI) [11], [16], the Karolinska Sleepiness Scale (KSS) [9], [11], the Short Stress State Questionnaire (SSSQ) [11], the Stanford Sleepiness Scale (SSS) [9], and the Warwick-Edinburgh Mental Wellbeing Scale (WEMWBS) [16] are used for the assessment. The main limitations of subjective data are that they cannot be collected in real time and can be affected by psychological biases; carelessness and memory lapses can also lead to wrong assessments [36].
Objective data are usually collected using non-invasive wearable devices or audio-video recordings, which makes data collection easier. Objective data include behavioral and physiological data. Behavioral data, such as facial expressions [37], audio signals [38], gestures [39], head [40], [41], hand [42], and leg [43] movements, eye movements [44], and eye contact [45], [46], are collected using audio and video recordings or wearable devices and can be captured in real time with non-invasive devices. Microphones or acoustic sensors are used to collect audio signals, and speech and voice modulations are analyzed to detect mental illness [47], [48]. Digital cameras and webcams can detect people's movements and extract mental health-related data. Typing speed, mouse usage speed, the text of messages and emails, body posture, and time spent on mobile phones and computers are also considered behavioral data for the analysis [49]. A limitation of behavioral data is that gestures can be voluntarily controlled and expressions vary across cultures and languages, so assessments can be wrong.
Here, we mainly focus on physiological data obtained from the eyes. Eye tracking is a sensor technology that determines where the eyes are focused, revealing where people look and what they see. Based on the "eye-mind hypothesis" [57], a strong connection exists between attention and eye movements. Eye tracking technology helps tap into subconscious processing and identify the elements that attract immediate attention: the aspects that attract above-average attention, the parts that are ignored, and the order in which details are noticed [58]. It provides insights into participants' eye movements while they engage in various activities and can illuminate the unconscious mechanisms that underlie human behavior. Eye tracking helps measure cognitive load, concentration, focus, drowsiness, consciousness, and other mental states [50], [59]. Eye tracking technologies have become efficient, cheap, and compact and are increasingly used in many fields, including gaming, driver safety, the military, education [60], product recommendation systems, psychology research, cognitive studies [51], market analysis, medical research [52], [61], [62], advertising [63], and healthcare [64], [65], [66], [67]. Because eye tracking can provide valuable information about emotional responses, it is an important tool in mental health monitoring.
Eye tracking data obtained for each person can be considered idiosyncratic, which means that it is unique and specific to that individual [68].The way a person gazes, fixates, and responds to different stimuli or mental states can vary significantly from one person to another [69].This uniqueness arises due to a combination of factors, such as individual differences in cognitive processing, attentional focus, emotional responses, and eye movement patterns.
Galvanic skin response (GSR) is a physiological marker that provides information about emotions, cognitive processes, and behavior [70]. GSR could potentially be used in neuro-rehabilitation for people with mental disorders [71], and it can serve as a stress sensor to detect emotional states with a high success rate [72]. Adding GSR to eye tracking studies can help reveal which moments of media content are more emotionally engaging for participants, and GSR can also be used as a measure of workload or stress level. Together with pupil dilation, GSR reflects sympathetic nervous system activity, providing insights into a participant's arousal level or cognitive load during a task. Overall, GSR is a useful tool for validating eye tracking and providing ground truth for cognitive and emotional processing. Therefore, GSR data were used to validate the PredictEYE model, which is based on eye tracking data.
Feature engineering in eye tracking involves selecting and transforming relevant eye movement data attributes to improve the accuracy and effectiveness of the model in predicting mental states or other relevant outcomes.Many features can be extracted based on detecting events such as fixations, saccades, blinks, and features related to pupil size, gaze position, and areas of interest (AOIs).Effective eye-tracking feature engineering is essential for obtaining accurate and meaningful insights into mental states and other cognitive processes.
Statistical methods are prevalent in mental health monitoring research. Techniques such as correlation analysis, linear regression models, t-tests, and analysis of variance (ANOVA) are commonly used to find associations and differences in the data, and multiple regression and correlation analysis help researchers analyze complex relationships between multiple variables [74]. However, many traditional statistical methods assume linear relationships between variables and can be sensitive to outliers and non-normal data. To overcome these limitations, researchers have introduced robust statistical methods such as robust regression, trimmed means, and bootstrapping [73], which can provide more accurate and reliable results and improve the validity of the analyses. Machine learning models such as neural networks and decision trees, in contrast, can handle non-linear relationships and outliers more effectively.
Statistical methods are also used to analyze time series data. Traditional time series models can be linear or nonlinear; commonly used linear models include the autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models [76]. Autocorrelation function (ACF) and partial autocorrelation function (PACF) analyses help determine the appropriate model for a given time series by showing how strongly each observation is correlated with its own past values at different lags. Nonlinear models such as autoregressive conditional heteroskedasticity (ARCH), generalized ARCH (GARCH), exponential GARCH (EGARCH), threshold autoregressive (TAR), and nonlinear autoregressive (NAR) models are also available for improved analysis and prediction.
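To make the ACF concrete, the sketch below computes the standard sample autocorrelation r_k, the lag-k covariance divided by the lag-0 sum of squares, for an arbitrary series. It is a generic textbook estimator written for illustration; it is not taken from the paper, and the function name is our own.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_k for lags k = 0..max_lag (textbook estimator)."""
    n = len(series)
    if n < 2 or max_lag >= n:
        raise ValueError("need at least max_lag + 1 observations")
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    if denom == 0:
        raise ValueError("a constant series has undefined autocorrelation")
    # r_k = sum_t (x_t - mean)(x_{t+k} - mean) / sum_t (x_t - mean)^2
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# An alternating series is strongly negatively correlated at lag 1.
print(acf([1, -1, 1, -1], 2))  # [1.0, -0.75, 0.5]
```

A slowly decaying ACF suggests an autoregressive structure, while a sharp cut-off after a few lags points toward a moving-average term; the PACF is read the same way for the AR order.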
The use of machine learning and statistical analysis applied to physiological data has shown potential in monitoring mental health and assessing human mental states [20].Machine learning models can be trained on labeled eye-tracking data to classify the user's mental state [21], [22].However, more investigation is required to develop the ideal eye-tracking paradigms and machine-learning algorithms for correctly diagnosing people with Autism Spectrum Disorder (ASD) and other neurological and neuropsychiatric illnesses.
Machine learning based on eye-tracking data classified individuals with ASD versus typically developing (TD) individuals with a pooled accuracy of 81% [23]. A study of young children found that fixation times on the lips and body distinguished ASD from TD with an accuracy of 85.1% [24]. A video-based eye-tracking method that measures brain function using readily available webcams has potential for the early detection, diagnosis, and remote or serial monitoring of neurological and neuropsychiatric disorders [25].
Machine learning models, including Decision Trees, Naive Bayes, Support Vector Machines, and Random Forests, can analyze vast amounts of data and detect patterns [13] that may not be visible to human observers, enabling efficient health monitoring systems. Such studies showed that machine learning algorithms can analyze data from these sources to classify and predict psychological conditions, with classification accuracy ranging from 66% to 90% depending on the dataset and features used.
Recent technological advancements have enabled deep learning algorithms to detect mental states automatically from various physiological signals such as visual metrics, EEG, and eye tracking [16], [20]. One study used a CNN-LSTM algorithm to analyze time series of visual metrics and classified individuals' mental health metric levels with high accuracy, highlighting the potential benefits of home-based mental health monitoring for patients after oncologic surgery [16]. Another study proposed the EYE-CNN-DLSTM algorithm for psychological testing based on eye movement tracking data, which uses a fusion strategy combining a CNN and a DLSTM to evaluate patients with mental disorders [26].
A recent study proposed a deep learning-based technique that combines EEG and eye-tracking signals to improve emotion recognition accuracy [27]. The study demonstrated the advantages of using physiological signals to reflect a person's emotional state and highlighted the importance of eye-tracking signals in improving emotion recognition. The proposed approach uses a fusion model that combines a Gaussian mixture model with signal filters, feature extraction techniques, and normalization methods to achieve precise emotion classification.
Deep learning algorithms such as LSTM can be optimized through hyperparameter tuning to better predict trends and fluctuations [28]. Additionally, deep learning models can learn the mapping between inputs and outputs for both linear and nonlinear relationships, supporting multivariate forecasting [29].
While deep learning and machine learning models have shown promising results in monitoring mental states, they are often limited by the lack of personalization.This means the models are trained on a general population and may not account for individual differences in behavior, preferences, and emotional responses.Personalized time series models can address this limitation by incorporating individual-level data and tailoring the model to each person's unique characteristics.This approach can improve the accuracy and effectiveness of mental health monitoring, especially in cases where individual differences are crucial, such as in psychiatric disorders.Additionally, deep learning models often require large amounts of data to train effectively, which can be challenging in mental health monitoring, where data collection is often limited.Personalized time series models can work with smaller datasets by incorporating prior knowledge and individual-level characteristics, leading to more efficient and cost-effective mental health monitoring [80].
Time series modeling is a popular machine learning and deep learning technique that relies on historical data to predict future values of a target variable. As suggested in [31], the accuracy of attentional state classifiers can be increased by applying time series analysis to eye tracking data followed by classification with convolutional neural networks, which could improve mental health monitoring and diagnosis. Another study [77] compares deep learning models for time series forecasting of COVID-19 cases and highlights the importance of historical data points and geographical location; these findings can inform policy development during the pandemic.
Time series data analysis has shown the potential to improve mental health outcomes and enhance online learning.Acikmese and Alptekin [78] proposed a method for predicting stress levels using mobile sensors and LSTM networks, while Zheng et al. [79] developed an approach for estimating engagement in online learning using webcams.These methods can help individuals manage their stress levels and improve the effectiveness of online learning.

1) SUMMARY
Time series analysis can help identify patterns and changes in data over time, allowing for predictive analytics and forecasting.Time series models are designed to capture temporal dynamics and dependencies, providing more accurate predictions of data that fluctuates over time [31].Time series models can also be personalized to the individual, leading to more accurate predictions [16], [17], [18].
Eye tracking technology provides insights into participants' eye movements while engaging in various activities, including mental health monitoring [30].Eye tracking can measure cognitive load, concentration, focus, drowsiness, consciousness, and other mental states important in identifying and managing mental health issues.Eye tracking is becoming increasingly important in mental health research, providing valuable information about emotional responses and mechanisms underlying human behavior [50], [52], [58], [59], [61], [64], [65].
Eye tracking with time series and personalized models offers several advantages in predicting a person's mental state.Firstly, time series analysis allows for identifying patterns and changes in attentional states over time, providing insights into attention and decision-making-related cognitive processes.Secondly, personalized models consider each individual's unique eye movement patterns and mental states, leading to more accurate predictions of their mental state than generic models.Thirdly, time series models capture temporal dynamics, account for individual differences, and handle missing data more effectively, resulting in more accurate and interpretable mental state predictions.Overall, eye tracking with time series and personalized models can provide a powerful tool for developing accurate and personalized predictions of a person's mental state.
Validation with GSR is important because GSR provides insights into a participant's cognitive and emotional processing, allowing for a more accurate interpretation of eye tracking data. The non-invasive nature of GSR measurement and its relative immunity to motion artifacts make it a valuable physiological measure for validating PredictEYE, which predicts mental states [70], [71], [72].
Eye tracking-based research is a rapidly evolving field, and relatively little work has applied time series models in this context.

III. SYSTEM MODEL
PredictEYE is a personalized time series model designed to forecast future sequences by capturing the trends present in the past eye tracking data and to predict the mental state of an individual and the scene responsible for the mental state [30], as illustrated in Figure 2. PredictEYE follows a pipeline that involves collecting and processing raw eye gaze data, extracting features, using an LSTM module to predict future feature sequences, training a Random Forest model on labeled data, and finally using the LSTM predictions as input to the Random Forest model to make predictions on mental state.
Similarly, GSR data is collected and processed using the same pipeline, including LSTM prediction and Random Forest model training.The predicted mental state from the Random Forest model based on GSR is used to validate the predictions made by the PredictEYE system based on eye tracking measures.

A. DATA COLLECTION
A 10-minute video consisting of 5 minutes of calm and 5 minutes of stressful content was used as the stimulus [81]. Before the actual stimulus, a short introductory animated movie was played for 2 minutes and 30 seconds, after which the calm [82] and stressful [83] video scenes were presented. The calm segment was selected for its ability to induce relaxation within 5 minutes, as recommended by specialists in stress and anxiety therapy. The introductory video was not included in the analysis but served to acquaint the participants with the experimental setup. The calm video period was considered the baseline phase, with the expectation that participants would be relaxed during this time. The data collection and feature extraction procedures are explained in Figure 3. The sampling frequency of 60 Hz yielded 60 raw eye gaze samples per second, so the 10-minute (600-second) recording produced 36,000 raw eye gaze samples per participant. The raw eye gaze data include the (X, Y) gaze coordinates, pupil diameter, and timestamp.
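The sample arithmetic and the calm/stress split can be sketched as below. This is an illustration under stated assumptions, not the authors' code: we assume each raw sample carries a timestamp in seconds measured from the start of the calm segment (so the introductory movie has negative or absent timestamps), and the `GazeSample` record and helper names are hypothetical.

```python
from typing import NamedTuple

SAMPLING_HZ = 60       # sampling frequency stated in the text
CALM_SECONDS = 300     # 5-minute calm segment (baseline phase)
STRESS_SECONDS = 300   # 5-minute stressful segment


class GazeSample(NamedTuple):
    """Hypothetical raw sample layout: timestamp, gaze coordinates, pupil size."""
    t: float       # seconds since the calm segment started (assumption)
    x: float       # gaze X coordinate in pixels
    y: float       # gaze Y coordinate in pixels
    pupil: float   # pupil diameter in mm


def expected_sample_count(hz=SAMPLING_HZ, seconds=CALM_SECONDS + STRESS_SECONDS):
    """60 samples/s over 600 s gives 36,000 samples per participant."""
    return hz * seconds


def split_by_phase(samples):
    """Split raw samples into calm (baseline) and stress phases by timestamp."""
    calm, stress = [], []
    for s in samples:
        if 0 <= s.t < CALM_SECONDS:
            calm.append(s)
        elif CALM_SECONDS <= s.t < CALM_SECONDS + STRESS_SECONDS:
            stress.append(s)
        # Anything outside the 600 s stimulus window (e.g. the intro) is dropped.
    return calm, stress


samples = [GazeSample(i / SAMPLING_HZ, 0.0, 0.0, 3.0) for i in range(36000)]
calm, stress = split_by_phase(samples)
print(expected_sample_count(), len(calm), len(stress))  # 36000 18000 18000
```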

B. FEATURE EXTRACTION AND LABELING OF THE DATA
The features were extracted from the raw eye gaze data using BeGaze 3.7 software, which employs the dispersion-based I-DT algorithm to detect events such as fixations and blinks [84]. The identification by dispersion threshold (I-DT) algorithm detects fixations in eye-tracking data using two critical parameters: a dispersion threshold and a duration threshold. It considers the X and Y coordinates of the raw gaze data and identifies fixations as clusters of gaze points whose dispersion stays below the threshold for at least the minimum duration. By applying this process iteratively, the algorithm identifies and locates fixations throughout the eye-tracking data. A minimum duration of 80 ms and a maximum dispersion of 100 pixels were used in the I-DT algorithm for fixation detection, and a minimum duration of 70 ms was used for blink detection. After fixations and blinks were identified, the corresponding features were extracted: fixation duration, fixation dispersion along the X-axis and Y-axis, pupil diameter, and blink duration. In eye movement data analysis, fixations and blinks are mutually exclusive events: because the eyes are closed during a blink, no fixation can occur, so when fixation-related features are present at a given time t, the blink-related features are zero, and vice versa.
128388 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
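The I-DT procedure described above can be sketched as follows, using the 80 ms minimum duration and 100-pixel maximum dispersion from the text. This follows the classic formulation (dispersion = (max x − min x) + (max y − min y)); BeGaze's internal implementation may differ in details, and blink detection is omitted. Samples are assumed to be `(t_ms, x_px, y_px)` tuples in time order.

```python
def dispersion(points):
    """Classic I-DT dispersion: horizontal extent plus vertical extent."""
    xs = [p[1] for p in points]
    ys = [p[2] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples, max_dispersion=100.0, min_duration_ms=80.0):
    """Detect fixations in time-ordered (t_ms, x_px, y_px) samples."""
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow an initial window until it spans at least the minimum duration.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left to form a fixation
        if dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the dispersion stays within the threshold.
            while j + 1 < n and dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            fixations.append({
                "start_ms": window[0][0],
                # First-to-last timestamp; excludes the final sample interval.
                "duration_ms": window[-1][0] - window[0][0],
                "x": sum(p[1] for p in window) / len(window),  # centroid
                "y": sum(p[2] for p in window) / len(window),
            })
            i = j + 1  # continue after the fixation
        else:
            i += 1  # drop the first point and slide the window forward
    return fixations


# 0.5 s looking at (500, 500), then a 400 px jump and 0.5 s at (900, 500), at 60 Hz.
gaze = [(k * 1000 / 60, 500.0, 500.0) for k in range(30)]
gaze += [(k * 1000 / 60, 900.0, 500.0) for k in range(30, 60)]
for fix in idt_fixations(gaze):
    print(fix["start_ms"], round(fix["duration_ms"], 1), fix["x"])
```

The window-growing step is quadratic in the worst case, which is fine for a 36,000-sample recording in a sketch but could be optimized with running min/max trackers.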
Fixation Duration: Fixation duration is the amount of time a person's eyes remain relatively still while fixating on a particular object. It is typically recorded in milliseconds (ms) and indicates how long a person attends to a particular object or area of interest.
Fixation Dispersion X-Axis and Y-Axis: Fixation dispersion measures how widely dispersed a person's eye movements are while they are fixating on a particular object or scene.It is typically measured in terms of the X-axis (horizontal) and the Y-axis (vertical) in pixels.Fixation dispersion in the X and Y axes provides insights into how participants distribute their attention within an image or scene during fixation.It quantifies the spatial variability in gaze patterns, helping researchers understand how individuals allocate their focus during different mental states, whether calm or stressful, within visual stimuli.
Pupil Diameter: Pupil diameter measures the size of a person's pupils and is often used to indicate cognitive load or arousal.The size of the pupils can change in response to changes in visual stimuli and cognitive processing demands, and it is typically measured in millimeters (mm).
Blink Duration: Blink duration is the amount of time a person's eyes are closed during a blink.This measurement is typically recorded in milliseconds and can provide information about a person's cognitive workload, fatigue, or attentional focus.
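Because fixations and blinks are mutually exclusive, one convenient layout for the five features is one row per event, with the features of the other event type set to zero. The sketch below assumes that fixation samples are `(t_ms, x_px, y_px, pupil_mm)` tuples and that blinks are given as `(start_ms, end_ms)` pairs; it also assumes pupil diameter is zero during blinks because the eyes are closed. The row type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EventRow:
    """One eye event; features of the other event type stay at zero."""
    start_ms: float
    fixation_duration_ms: float = 0.0
    dispersion_x_px: float = 0.0
    dispersion_y_px: float = 0.0
    pupil_diameter_mm: float = 0.0
    blink_duration_ms: float = 0.0


def fixation_row(samples):
    """Features for one fixation from its (t_ms, x_px, y_px, pupil_mm) samples."""
    ts, xs, ys, ps = zip(*samples)
    return EventRow(
        start_ms=ts[0],
        fixation_duration_ms=ts[-1] - ts[0],
        dispersion_x_px=max(xs) - min(xs),
        dispersion_y_px=max(ys) - min(ys),
        pupil_diameter_mm=sum(ps) / len(ps),  # mean pupil size during the fixation
    )


def blink_row(start_ms, end_ms):
    # Eyes are closed, so fixation features (and pupil size, by assumption) are zero.
    return EventRow(start_ms=start_ms, blink_duration_ms=end_ms - start_ms)


def event_table(fixations, blinks):
    """Merge fixation and blink rows into one time-ordered feature table."""
    rows = [fixation_row(f) for f in fixations] + [blink_row(s, e) for s, e in blinks]
    return sorted(rows, key=lambda r: r.start_ms)


fix = [(0, 100, 200, 3.0), (50, 110, 205, 3.2), (100, 105, 210, 3.4)]
for row in event_table([fix], [(150, 250)]):
    print(row)
```

Each column of this table, read in time order, is one of the univariate series passed to the LSTM.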
Fixation- and blink-based features and pupil diameter were input to the LSTM model. The eye tracking data obtained from each participant during the calm and stress videos are labeled and provided as input to the LSTM, followed by a Random Forest algorithm that predicts the mental state.
Figure 2 shows, as part of feature extraction, the features extracted from the eye tracking data of Participant 1 while watching the stress video. Each feature is plotted on the Y-axis against the start time of the corresponding event.
The data are labeled using a threshold derived from the calm video period, which is considered the baseline phase.
Once the threshold value is determined, any value above it is labeled as 'stressful' and any value below it as 'calm'. This approach assumes that the baseline phase represents a state of relaxation and that any deviation from this state can be considered an indicator of stress.
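The text does not specify how the threshold is computed from the baseline. The sketch below uses one common, hypothetical rule, the baseline mean plus k standard deviations with k = 2, applied to a single per-event feature such as pupil diameter, chosen only for illustration. Values exactly at the threshold are labeled 'calm' here, a tie-breaking choice the text leaves open.

```python
from statistics import mean, stdev


def baseline_threshold(calm_values, k=2.0):
    """Hypothetical rule: baseline mean + k * baseline sample standard deviation."""
    return mean(calm_values) + k * stdev(calm_values)


def label(values, threshold):
    """Above the threshold -> 'stressful'; otherwise -> 'calm'."""
    return ["stressful" if v > threshold else "calm" for v in values]


calm_pupil_mm = [3.0, 3.2, 3.1, 2.9, 3.0]   # illustrative baseline values
thr = baseline_threshold(calm_pupil_mm)     # about 3.27 mm
print(label([3.0, 3.5, 4.1], thr))          # ['calm', 'stressful', 'stressful']
```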

C. TIME SERIES PREDICTION USING LSTM
The input data consist of time series eye tracking features, which were preprocessed and then input into a univariate Long Short-Term Memory (LSTM) model. LSTM is a popular time series forecasting method based on recurrent neural networks (RNNs) that uses memory cells to retain information from previous time steps [85]. Each memory cell has input, forget, and output gates: the input gate controls the flow of input activation into the cell, the forget gate decides how long a value should remain in the cell, and the output gate controls the flow of cell activation to the rest of the network. The LSTM model learns the patterns and dependencies present in the input data; by processing the time series, it captures the temporal dynamics of the data and uses them to predict future values of the features.
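The gate mechanics can be made concrete with a single forward pass. The sketch below shows (a) how one univariate feature series might be cut into (lookback window, next value) training pairs and (b) one LSTM cell applying the standard input, forget, output, and candidate computations. The weights are random and untrained; a real forecaster would add a dense output layer and be fit by backpropagation through time, typically with a framework such as Keras or PyTorch. The lookback length, hidden size, and all names are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_windows(series, lookback):
    """Univariate series -> list of (input window, next value) training pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]


class LSTMCell:
    """One-input LSTM cell with random (untrained) weights, for illustration."""

    def __init__(self, hidden_size, seed=0):
        rng = random.Random(seed)
        self.h_size = hidden_size

        def gate():
            # One row per hidden unit: [w_x, w_h_1 .. w_h_H, bias]
            return [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size + 2)]
                    for _ in range(hidden_size)]

        self.W = {g: gate() for g in ("input", "forget", "output", "candidate")}

    def _affine(self, rows, x, h):
        return [r[0] * x + sum(w * hj for w, hj in zip(r[1:-1], h)) + r[-1]
                for r in rows]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._affine(self.W["input"], x, h)]    # input gate
        f = [sigmoid(z) for z in self._affine(self.W["forget"], x, h)]   # forget gate
        o = [sigmoid(z) for z in self._affine(self.W["output"], x, h)]   # output gate
        g = [math.tanh(z) for z in self._affine(self.W["candidate"], x, h)]
        # Keep part of the old memory and add part of the new candidate.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        # Expose a gated, squashed view of the memory as the hidden state.
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def encode(self, window):
        """Run the cell over one lookback window and return the final hidden state."""
        h = [0.0] * self.h_size
        c = [0.0] * self.h_size
        for x in window:
            h, c = self.step(x, h, c)
        return h


pairs = make_windows([1, 2, 3, 4, 5], lookback=3)  # [([1, 2, 3], 4), ([2, 3, 4], 5)]
cell = LSTMCell(hidden_size=4)
print(cell.encode(pairs[0][0]))
```

Because h = o · tanh(c) with o in (0, 1), every hidden-state component stays strictly between −1 and 1.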
The performance of time series models can be evaluated by comparing the predicted values with the actual values of the target variable. The metrics used in this study are the mean absolute error (MAE), residual sum of squares (RSS), mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE), mean error (ME), and mean percentage error (MPE), defined in (1) to (7). For example, MAE is the average of the absolute differences between the predicted and actual values; because it uses absolute values, MAE is always non-negative and equals zero only for a perfect prediction.
In these formulas, y_i denotes the actual values and ŷ_i the predicted values. Comparing models on these metrics shows which model performs better for a specific task.
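The formulas referenced as (1) to (7) are not reproduced in this section. The sketch below computes the seven metrics using their standard definitions, assuming the error convention e_i = y_i - ŷ_i (under the opposite convention, ME and MPE change sign) and no zero actual values in the percentage metrics. The function name `error_metrics` is illustrative.

```python
import math

def error_metrics(actual, predicted):
    """Standard forecast-error metrics; percentage metrics assume no actual value is zero."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]  # e_i = y_i - y_hat_i
    rss = sum(e * e for e in errors)
    mse = rss / n
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100 * sum(e / y for e, y in zip(errors, actual)) / n,
        "MAPE": 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / n,
        "RSS": rss,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

# Hypothetical actual vs. predicted values of one eye tracking feature.
m = error_metrics([2.0, 4.0, 5.0, 8.0], [2.5, 3.5, 5.0, 7.0])
```

ME and MPE let positive and negative errors cancel, so they reveal systematic bias, while MAE, MSE, and RMSE measure error magnitude regardless of sign.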

D. MENTAL STATE PREDICTION WITH RANDOM FOREST ALGORITHM
The Random Forest algorithm is trained on the labeled eye tracking data. The sequences predicted by the LSTM model are then given as input to the Random Forest algorithm, which classifies the mental state from the predicted future data sequence. Random Forest is an ensemble learning method that combines multiple decision trees to generate a prediction. Each decision tree is trained on a different subset of the data, and the predictions of the individual trees are then aggregated (for classification, typically by majority vote) to obtain the final prediction.
By analyzing eye-tracking data, PredictEYE can also identify the areas of the video that an individual was looking at during different stages of the video and correlate these areas with the individual's reported mental state. This information can provide valuable insights into factors contributing to an individual's mental state, allowing for more targeted interventions or treatments.
Classification model performance is assessed using several criteria, including accuracy, precision, recall, F1 score, and the area under the receiver operating characteristic (ROC) curve. These metrics measure how well the model assigns instances to the correct classes. Accuracy is the ratio of correctly predicted instances to the total number of instances. Precision measures the proportion of true positives (i.e., correctly detected cases) among all instances classified as positive. Recall, in contrast, measures the proportion of actual positive instances that the model correctly identifies. The F1 score combines precision and recall into a single measure of the model's performance. The ROC curve graphically represents the trade-off between a classifier's true positive rate (TPR) and false positive rate (FPR). A high area under the curve (AUC) indicates that the classifier discriminates well between positive and negative examples, which makes it a useful metric for many applications.
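The accuracy, precision, recall, F1 score, TPR, and FPR of the PredictEYE model are calculated using (8) to (13), respectively:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{8}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{10}$$

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{11}$$

$$\mathrm{TPR} = \frac{TP}{TP + FN} \tag{12}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN} \tag{13}$$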
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. These metrics matter because each evaluates a different aspect of a classification model's performance. Accuracy gives an overall measure of performance, whereas precision and recall describe how reliably the model identifies instances of a specific class. The F1 score combines precision and recall into a single, balanced evaluation.
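The definitions of accuracy, precision, recall, F1 score, TPR, and FPR can be sketched directly from confusion-matrix counts. The code below also estimates the ROC AUC as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one (ties count as one half), which equals the area under the empirical ROC curve. Treating 'stressful' as the positive class, the function names, and the example labels and scores are assumptions for illustration; scikit-learn's `sklearn.metrics` module provides equivalent functions such as `accuracy_score` and `roc_auc_score`.

```python
def confusion_counts(y_true, y_pred, positive="stressful"):
    """Return (TP, TN, FP, FN) for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1, TPR and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall and TPR are the same quantity
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "tpr": recall,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

def roc_auc(y_true, scores, positive="stressful"):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = ["stressful", "stressful", "calm", "calm", "stressful", "calm"]
y_pred = ["stressful", "calm", "calm", "stressful", "stressful", "calm"]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]  # hypothetical classifier scores for 'stressful'
counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
auc = roc_auc(y_true, scores)
```

The AUC depends only on how the scores rank positives against negatives, which is why it summarizes performance across all classification thresholds rather than at the single threshold used for `y_pred`.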

E. VALIDATION WITH GSR
Galvanic Skin Response (GSR) is a non-invasive physiological measure that reflects the activity of the sympathetic nervous system and has been widely used in research on emotional and cognitive processing [71]. GSR is easy to measure and useful for studying mental states in various settings. Compared with other physiological measures, such as heart rate or electroencephalography (EEG), GSR is less affected by motion artifacts, although it has a slower response time [70]. GSR can therefore be useful for validating models that predict mental states [86].
The performance of PredictEYE was validated by collecting GSR data from participants as they watched the calm and stressful videos while their eye movements were tracked. An LSTM model was trained on these data to predict future sequences of GSR data. In parallel, a Random Forest model was trained to predict the participants' mental state from the predicted GSR sequence. The Random Forest predictions based on the GSR data were used to validate the mental state predictions made from the eye tracking data.

IV. IMPLEMENTATION
The experimental study was performed on hospital employees (n = 6, 3 male, mean age = 33.5, SD = 5.6, age range = 26 to 42) [30]. The procedures followed in the experimental research are shown in Figure 2. Each participant was informed about the data collection procedures, and written consent was obtained. The eye tracking data were collected using the SMI REDn Professional eye tracker (SensoMotoric Instruments, Germany) at a sampling frequency of 60 Hz, together with Experiment Center 3.7 software, which provides tools for stimulus presentation and precise data collection. The eye tracker was connected to a laptop, and the participant was asked to sit comfortably in front of the system at an appropriate distance; the distance between the participant's eyes and the eye tracker was kept within 50 cm. The experiment was conducted in a controlled environment, with a standardized lighting setup maintained consistently across all participants and video stimuli. The display brightness, color, and contrast were predefined and kept uniform for all participants throughout the study. The eye tracker was calibrated before each experiment. Data were collected for 10 minutes per participant, and across all participants the eye tracker recorded 216,000 eye movement samples (6 participants × 10 min × 60 s × 60 Hz).
GSR data were collected using a Grove GSR sensor [87], with two electrodes attached to two fingers of one hand. The sensor measures the electrical resistance of the subject's skin and converts it into an output voltage, typically measured in millivolts (mV). The participant was given the wearable band and instructed to wear it according to the guidelines. The Arduino integrated development environment (IDE) was used to collect the GSR data. Eye tracking and GSR data were collected on the same machine, so both streams shared a single system clock. To synchronize the recordings, a software trigger initiated the eye tracking and GSR measurements simultaneously, so that the timestamps of the two datasets are accurately aligned.
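Because both streams share one system clock, aligning them reduces to a timestamp join. The sketch below pairs each 60 Hz eye tracking sample with the most recent GSR reading at or before it (an "as-of" join). The 50 ms GSR sampling interval and the readings themselves are hypothetical, since the section does not report them.

```python
from bisect import bisect_right

def align_asof(eye_times, gsr_times, gsr_values):
    """For each eye tracking timestamp, return the latest GSR value recorded at or before it.

    Both timestamp lists are assumed sorted and expressed on the same system clock.
    Returns None where no GSR sample precedes the eye sample.
    """
    aligned = []
    for t in eye_times:
        idx = bisect_right(gsr_times, t) - 1  # index of last GSR timestamp <= t
        aligned.append(gsr_values[idx] if idx >= 0 else None)
    return aligned

# Hypothetical timestamps in ms since the shared software trigger.
eye_t = [round(i * 1000 / 60, 1) for i in range(8)]  # 60 Hz, about 16.7 ms apart
gsr_t = [0.0, 50.0, 100.0]                           # assumed 50 ms GSR interval
gsr_v = [412, 409, 405]                              # hypothetical raw sensor readings
aligned = align_asof(eye_t, gsr_t, gsr_v)
```

An as-of join never uses a GSR value from the future of an eye sample, which keeps the aligned data causal for forecasting.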
The extracted eye tracking time series features were fed into the LSTM model to predict their future values. A sequence-to-sequence regression LSTM network was used for this purpose. The training data were first standardized to zero mean and unit variance, after which the predictors and responses were prepared for training. The predictors are the training sequence without its final time step, and the responses are the same sequence shifted forward by one time step. In this way, at each time step of the input sequence, the LSTM network learns to predict the value at the next time step.
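The standardization and one-step shift described above can be sketched in a few lines. This assumes the population standard deviation is used for scaling (a sample standard deviation would differ slightly); the function names and the example series are illustrative, and the returned mean and standard deviation are what would be needed to map predictions back to the original units.

```python
from statistics import mean, pstdev

def standardize(series):
    """Scale to zero mean and unit variance; also return (mu, sigma) to undo the scaling."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series], (mu, sigma)

def predictors_and_responses(series):
    """Predictors drop the final step; responses are the same sequence shifted one step ahead."""
    return series[:-1], series[1:]

z, (mu, sigma) = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
X, Y = predictors_and_responses(z)
```

Because the responses are the predictors shifted by one step, `X[1:]` and `Y[:-1]` are identical: the target at each position is simply the next input.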
The LSTM model, as shown in Figure 2, has three LSTM layers followed by a Dense output layer. Each LSTM layer has 50 hidden units that store and transform information over time. The output of the third LSTM layer is fed to a fully connected Dense layer with a single output unit, which predicts the next value in the time series from an input sequence of window-size time steps.
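As a worked example of the model size implied by this architecture, the sketch below counts trainable parameters, assuming a univariate input (one feature per time step), standard LSTM layers with bias terms, and a Dense layer with one output. Each LSTM layer has four gate blocks, and each block has input weights, recurrent weights, and a bias.

```python
def lstm_params(input_dim, units):
    """Trainable parameters of one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units

def stacked_lstm_params(n_features=1, units=50, n_layers=3):
    """Per-layer and total parameter counts for stacked LSTM layers plus a Dense(1) output."""
    per_layer, in_dim = [], n_features
    for _ in range(n_layers):
        per_layer.append(lstm_params(in_dim, units))
        in_dim = units  # later layers receive the previous layer's 50-dimensional hidden state
    per_layer.append(dense_params(units, 1))
    return per_layer, sum(per_layer)

layers, total = stacked_lstm_params()
```

Under these assumptions, the three LSTM layers have 10,400, 20,200, and 20,200 parameters and the Dense layer has 51, for 50,851 in total; the first layer is smaller because its input has one dimension rather than 50.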
The model takes as input a sequence of window-size time steps of one extracted eye gaze feature at a time. The window size is a hyperparameter that defines the number of consecutive time steps (data points) included in each input sample to the LSTM model. The SMI eye tracker used for data collection had a sampling frequency of 60 Hz. A window size of 60 was initially chosen to match the sampling frequency (one second of data), but it caused a loss of detail during rapid eye movements. The window size was therefore reduced to 10, which at 60 Hz corresponds to 1/6 of a second (about 167 ms). This allowed finer changes in gaze behavior to be captured and improved the model's prediction accuracy for future eye movements and reactions [88]. A window that is too small (fewer than 10 samples) can cause significant problems, including loss of information, increased noise, and overfitting. In experiments with different window sizes, a value of 10 gave the best performance, balancing information capture against overfitting. The first input sample to the LSTM model therefore contains the feature values from time steps 1 to 10, the second contains the values from time steps 2 to 11, and so on, as shown in Figure 4. The model is trained to predict the next value from the previous ten values in each sequence, and a sliding window approach is used to create these input-output pairs. The LSTM layers use the ReLU (rectified linear unit) activation function, which is widely used in deep learning models. The network parameters were optimized using the Adam optimizer [89], which is known for fast and efficient convergence.
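The sliding window construction can be sketched as follows, using a window of 10 samples (10 / 60 Hz = 1/6 s). The toy series is hypothetical, and feeding the resulting inputs to a Keras LSTM would additionally require reshaping them to (samples, window, 1).

```python
def sliding_windows(series, window=10):
    """Build (input, target) pairs: each input holds `window` consecutive values,
    and the target is the value that immediately follows them."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

# Hypothetical feature trace whose values equal their time step (1..15).
trace = list(range(1, 16))
X, y = sliding_windows(trace, window=10)
```

Here the first input covers time steps 1 to 10 with target 11, the second covers 2 to 11 with target 12, and a series of length N yields N - 10 training pairs.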
The model was trained with the mean squared error (MSE) loss function for 100 epochs, with a batch size of 16 and a gradient threshold of 1 to prevent exploding gradients. The initial learning rate was set to 0.005, with a drop factor of 0.2 applied every 125 epochs. These settings were intended to help the network converge smoothly and avoid getting stuck in local minima.
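The training schedule can be sketched as a step-decay function plus L2-norm gradient clipping at the stated threshold of 1. This is an illustrative reconstruction: in Keras, a schedule function of this shape could be passed to the `LearningRateScheduler` callback, and norm clipping is available through the optimizer's `clipnorm` argument. Whether the authors' gradient threshold used the L2 norm is an assumption.

```python
import math

def step_decay_lr(epoch, initial_lr=0.005, drop_factor=0.2, drop_period=125):
    """Learning rate for a 0-based epoch index: multiply by drop_factor every drop_period epochs."""
    return initial_lr * drop_factor ** (epoch // drop_period)

def clip_by_l2_norm(gradient, threshold=1.0):
    """Rescale a gradient vector so that its L2 norm does not exceed the threshold."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm <= threshold:
        return list(gradient)
    return [g * threshold / norm for g in gradient]

rates = [step_decay_lr(epoch) for epoch in range(100)]  # the 100 training epochs
clipped = clip_by_l2_norm([3.0, 4.0])                   # norm 5, rescaled to norm 1
```

With epochs counted from 0, `epoch // 125` is 0 for every one of the 100 training epochs, so under the settings as stated the learning rate stays at 0.005 for the whole run and the drop factor never takes effect.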
During training, the model was fed input sequences of window-size time steps for a single feature and trained to predict the next value in the time series. After training, the model was evaluated on a test set using the same window size, and the MSE loss was computed. Finally, the model generated a future sequence by taking the last window-size time steps of the training data as input and iteratively predicting the next value, with each prediction appended to the input window. In this way, the LSTM network captures the dependencies and patterns in the input time series, which are then used to predict future eye movements.
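The iterative generation of the future sequence can be sketched as a recursive one-step-ahead forecast. The trained LSTM is replaced here by a hypothetical stand-in, `linear_extrapolation`, so that the example runs without a model; with Keras, `predict_next` would wrap a call to the trained model's `predict` on the reshaped window.

```python
from collections import deque

def recursive_forecast(history, predict_next, window=10, steps=5):
    """Forecast `steps` values by repeatedly predicting one step ahead and
    feeding each prediction back into the input window."""
    buffer = deque(history[-window:], maxlen=window)
    forecast = []
    for _ in range(steps):
        next_value = predict_next(list(buffer))
        forecast.append(next_value)
        buffer.append(next_value)  # the oldest value drops out automatically (maxlen)
    return forecast

def linear_extrapolation(window_values):
    """Hypothetical stand-in for the trained LSTM: continue the last observed step."""
    return window_values[-1] + (window_values[-1] - window_values[-2])

future = recursive_forecast(list(range(20)), linear_extrapolation, window=10, steps=3)
```

Because each prediction becomes part of the next input, forecast errors can accumulate as the horizon grows.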
The Random Forest algorithm has several key components. First, at each split point the algorithm considers only a randomly selected subset of the input features, which reduces the correlation between trees and helps limit overfitting. Second, it builds many decision trees, each from a different random (bootstrap) sample of the data, so that the ensemble captures different aspects of the data and its overall accuracy improves. Finally, it aggregates the predictions of all the trees into a single final prediction.
The Random Forest algorithm is trained on labeled data and, once training is complete, is applied to the predicted time series data to predict each participant's mental state. Because physiological data are unique to each person, a separate Random Forest is trained for each individual. The algorithm uses the predicted data sequence to classify the person's mental state as calm or stressed.
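The components above can be illustrated with a small from-scratch Random Forest, trained separately for each participant. This is a simplified sketch, not the authors' scikit-learn implementation (`sklearn.ensemble.RandomForestClassifier` would be the library route): trees are depth-limited, splits minimize Gini impurity over midpoints between observed values, one randomly chosen feature is considered per split (`max_features=1`, a hypothetical setting; the square root of the number of features is a common default), and the final label is a majority vote. The two-feature data (pupil diameter in mm, fixation duration in ms) and participant IDs are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Return (impurity, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in X})
        for thr in ((a + b) / 2 for a, b in zip(values, values[1:])):  # midpoints
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best

def build_tree(X, y, rng, max_features, depth=0, max_depth=4):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    feature_ids = rng.sample(range(len(X[0])), max_features)  # random feature subset
    split = best_split(X, y, feature_ids)
    if split is None:  # the chosen features are constant in this node
        return majority
    _, f, thr = split
    go_left = [row[f] <= thr for row in X]
    left = build_tree([r for r, g in zip(X, go_left) if g], [l for l, g in zip(y, go_left) if g],
                      rng, max_features, depth + 1, max_depth)
    right = build_tree([r for r, g in zip(X, go_left) if not g], [l for l, g in zip(y, go_left) if not g],
                       rng, max_features, depth + 1, max_depth)
    return (f, thr, left, right)

def predict_tree(node, row):
    """Walk from the root to a leaf label."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagging: each tree is grown on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return forest

def predict_forest(forest, row):
    """Aggregate the trees' predictions by majority vote."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]

# Hypothetical per-participant data: [pupil diameter (mm), fixation duration (ms)].
labels = ["calm"] * 4 + ["stressful"] * 4
data = {
    "P1": ([[3.0, 310], [3.1, 290], [2.9, 305], [3.0, 295],
            [3.8, 210], [3.9, 190], [3.7, 205], [3.8, 195]], labels),
    "P2": ([[4.0, 350], [4.1, 340], [3.9, 360], [4.0, 345],
            [4.6, 260], [4.7, 250], [4.5, 270], [4.6, 255]], labels),
}
forests = {pid: fit_forest(X, y, seed=1) for pid, (X, y) in data.items()}  # one forest each
```

Keeping one forest per participant mirrors the personalized design: a participant whose calm pupil diameter is larger, like the hypothetical P2, is judged against their own data rather than another person's.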
The total video-watching time was divided into ten groups. After each group, the future sequence was predicted with the LSTM. Thus, the mental state was predicted every minute to forecast the sequence of mental states expected in the subsequent video-watching period. This approach allows patterns and trends in mental state changes over time to be identified, which could inform the development of more effective interventions or personalized treatments for individuals with mental health concerns.
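The grouping can be made concrete with the sampling arithmetic reported in this section: 10 minutes at 60 Hz gives 36,000 samples per participant, so ten equal groups hold 3,600 samples (one minute) each, and six participants give 216,000 samples in total. The sketch below splits a session into such groups; the sample indices stand in for real feature values, and the forecasting and classification that would follow each group are omitted.

```python
SAMPLING_HZ = 60        # SMI eye tracker sampling frequency
SESSION_MINUTES = 10    # calm (5 min) + stressful (5 min) videos
N_GROUPS = 10           # one group per minute
N_PARTICIPANTS = 6

def split_into_groups(samples, n_groups=N_GROUPS):
    """Split a session into n_groups contiguous, equal-length segments (any remainder is dropped)."""
    size = len(samples) // n_groups
    return [samples[i * size:(i + 1) * size] for i in range(n_groups)]

samples_per_participant = SESSION_MINUTES * 60 * SAMPLING_HZ  # 36,000
total_samples = samples_per_participant * N_PARTICIPANTS      # 216,000
session = list(range(samples_per_participant))  # placeholder indices instead of real features
groups = split_into_groups(session)             # 10 groups of 3,600 samples (1 minute each)
```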
The code implements the LSTM architecture using Keras in Python and uses libraries including NumPy, pandas, Matplotlib, and scikit-learn. The input data are normalized using scikit-learn's MinMaxScaler. The Random Forest algorithm is also implemented using scikit-learn.

V. RESULT ANALYSIS
PredictEYE is a personalized time series regression model designed to forecast an individual's mental state from eye movements. It also identifies the scene responsible for that mental state. PredictEYE was analyzed in three stages: data exploration and statistical analysis, performance evaluation of the time series forecasting, and performance evaluation of the model in estimating the mental state.
Data exploration and statistical analysis formed the first stage. This stage involved analyzing the collected data to understand the distribution and variability of the dataset and to identify patterns, trends, and relationships between variables, which is crucial for developing accurate and reliable predictive models. The second stage evaluated the PredictEYE model's performance in forecasting the future sequence, which measures how accurately the model predicts an individual's eye gaze sequence. The third stage evaluated the classification model's performance in estimating the mental state, where the model is trained to classify an individual's mental state from their eye gaze sequence.

A. DATA EXPLORATION AND STATISTICAL ANALYSIS
Figure 5 shows the mean of each eye tracking measure for every participant while watching the calm and stressful videos. 'P1-C' represents Participant 1's data during the calm video, and 'P1-S' represents Participant 1's data during the stressful video. The mean pupil diameter and the mean fixation dispersion along the X and Y axes increased during the stressful video compared with the calm video, whereas the mean fixation duration and blink duration decreased. These observations suggest that eye tracking features are potentially useful for detecting changes in mental state, particularly during stress-inducing tasks. In addition, sweat glands become more active during stress or emotional upset and secrete more moisture onto the skin, which increases skin conductance and decreases skin resistance. Because the sensor measures resistance, a lower GSR value is expected when an individual is stressed, and this pattern was observed for most participants. GSR provides a more direct measure of arousal, reflecting physiological responses, whereas eye tracking data capture visual attention and gaze behavior, which reflect arousal only indirectly. Even subtle changes in eye tracking data can be valuable, as they may reveal how individuals respond emotionally to different stimuli.
Participant 3 showed a markedly different pattern in the measured variables compared with the other participants. Notably, this participant's skin resistance did not differ significantly between the stressful and calm video sessions, suggesting, according to the ground truth, that they did not experience a significant increase in stress while watching the stressful video. The eye tracking measures supported this finding, with several features behaving differently for this participant than for the others.
To determine whether each participant's eye tracking data differed significantly between the calm and stressful videos, a Welch two-sample t-test was performed on all features. This test compares the means of two independent groups without assuming that their variances are equal. We aimed to investigate the mental state of individuals at the end of the calm and stressful videos: each video lasted 5 minutes, but the analysis focused on the last minute of each. The test therefore evaluates potential differences in eye movement patterns between the two types of video, providing insight into how the video content influences the participants' mental state.
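Welch's test statistic and its approximate degrees of freedom can be computed as sketched below, where the two samples would be, for example, one feature's last-minute values from the calm and stressful videos. The sketch stops at the t statistic and the Welch-Satterthwaite degrees of freedom; the p-value then follows from the t distribution with those degrees of freedom, for instance via `scipy.stats.ttest_ind(a, b, equal_var=False)`. The sample values are hypothetical.

```python
from statistics import mean, variance

def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    se2_a = variance(a) / len(a)  # squared standard error of mean(a); variance() divides by n - 1
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df

calm = [1.0, 2.0, 3.0, 4.0, 5.0]        # hypothetical feature values, calm video
stressful = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical feature values, stressful video
t, df = welch_t_test(calm, stressful)
```

Unlike Student's t-test, each group contributes its own variance estimate, and the degrees of freedom (about 5.88 here) shrink toward those of the noisier group.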
The null hypothesis was that there is no significant difference between the calm and stressful data, and the alternative hypothesis was that there is a significant difference. The test thus evaluates whether the observed differences in the features between the two types of video are likely to have occurred by chance or instead indicate a genuine difference between the two states. Table 1 shows the p-values obtained from the Welch two-sample t-test for all features. For most features, the calm and stressful video data differ significantly, so the null hypothesis can be rejected in favor of the alternative hypothesis. The exception is Participant 3, for whom most of the analyzed features showed no substantial difference between the stressful and calm videos.
Because the p-values for most participants were below 0.05 for most features, the differences between the calm and stressful video data are statistically significant and unlikely to be due to chance alone. For Participant 3, however, the p-values for fixation duration, fixation dispersion along the X-axis, and blink duration at the end of the calm and stressful videos were not below 0.05, indicating no significant difference in these features between the end of this participant's calm and stressful sessions. Likewise, the ground truth GSR values for this participant at the end of the two videos had a p-value not below 0.05, indicating no significant difference in GSR between the calm and stressful sessions.

B. PERFORMANCE EVALUATION OF PREDICTEYE
During data collection, a total of 216,000 data samples were gathered from all participants, each recorded for 10 minutes. Each participant contributed on average 36,000 samples across the calm and stressful videos. These samples were used to generate predictions of the participants' responses, which were then analyzed. The correctness of the predicted data sequences was evaluated with standard performance measures: mean error, mean absolute error, mean percentage error, mean absolute percentage error, mean squared error, and root mean squared error.
The PredictEYE model, a combination of LSTM and Random Forest, was compared with various other model combinations, as depicted in Figure 6. First, the time series predictions of LSTM were compared with those of ARIMA. Then, the mental state predictions of Random Forest were compared across six machine learning algorithms: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naive Bayes (NB). The following sections present a detailed comparison of the PredictEYE model with these alternatives, highlighting its performance in predicting mental states.

1) PERFORMANCE EVALUATION OF LSTM WITH ARIMA BASED ON ERROR STATISTICS
The ARIMA model is a time series analysis algorithm that aims to capture trends, seasonality, and cyclic behavior in the data [90]. Exploratory data analysis is performed before fitting the model, which involves checking the autocorrelation between current and past values at different lags. This information is used to select an appropriate ARIMA model for the data: the autocorrelation function (ACF) and partial autocorrelation function (PACF) help determine the model's autoregressive, integrated, and moving average components. The best-fit ARIMA model can then be used to forecast the subsequent values of the eye tracking features.
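The autocorrelation check described above can be sketched with the sample ACF, the quantity shown in an ACF plot. This uses the common biased estimator, which normalizes every lag by the lag-0 sum of squares; the PACF is not implemented here, and in practice `statsmodels.tsa.stattools.acf` and `pacf` provide both. The short series are hypothetical.

```python
def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator, as used in ACF plots)."""
    n = len(series)
    mu = sum(series) / n
    centered = [x - mu for x in series]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    return [sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

r = acf([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)   # trending series
r_alt = acf([1.0, -1.0, 1.0, -1.0], max_lag=1)  # alternating series
```

A slowly decaying ACF, as in a trending series, usually points to differencing (the integrated component), whereas a sharp cut-off in the ACF or PACF after a few lags suggests the order of the moving average or autoregressive part.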
To assess prediction accuracy, the sequences predicted by the LSTM and ARIMA models at different intervals of the calm and stressful videos were compared with the actual values. Figure 7 shows the sequences predicted by the LSTM and ARIMA models for the stressful video, with 80% of the stressful-video data used for training and the remaining 20% used for prediction, illustrating the predictive capabilities of the two models on this dataset. Figure 8 compares the actual and predicted values of the features 'blink duration' and 'fixation duration' from the LSTM and ARIMA models over a short time interval. The plot shows that fixations occur only in the absence of blinks. Figure 8 also shows a lower RMSE for the LSTM model, indicating that its predictions are closer to the actual values. This suggests that the LSTM model captures subtle variations in eye movement patterns well, making it a useful tool for analyzing rapid changes in gaze behavior.
The error statistics of the LSTM and ARIMA models are compared using the mean absolute error (MAE), residual sum of squares (RSS), mean squared error (MSE), root mean squared error (RMSE), mean absolute percentage error (MAPE), mean error (ME), and mean percentage error (MPE); the results are summarized in Tables 2 and 3. These tables provide a comprehensive comparison of the models and indicate which is better suited to the data.
For each feature and each performance metric, the consolidated error statistics of the LSTM model were lower than those of the ARIMA model. These consistently lower errors indicate that the LSTM model captures the patterns in the data better and provides more accurate predictions, so it outperforms ARIMA in accuracy and predictive power. The LSTM model is well suited to capturing complex patterns and long-term dependencies in time series data, which could explain its better performance.
In PredictEYE, the LSTM model was chosen for its ability to handle sequential data and capture long-term dependencies. Although ARIMA is a powerful tool for time series forecasting, it may not suit this task because eye gaze sequence data are non-linear and complex. The LSTM model is therefore expected to provide better accuracy than ARIMA for predicting eye gaze sequences in PredictEYE [91].

2) PERFORMANCE EVALUATION OF PREDICTEYE IN ESTIMATING THE MENTAL STATE
This study evaluated six algorithms, namely Logistic Regression, Decision Tree, Random Forest, SVM, KNN, and Naive Bayes, on predicting the participants' mental states from the predictions of the ARIMA and LSTM models. A thorough comparison showed that the machine learning algorithms performed better with the LSTM predictions than with the ARIMA predictions.
The performance of the classification algorithms was evaluated using several metrics, including accuracy, precision, recall, F1 score, and the ROC curve, to determine the best-performing algorithm. The accuracy of the PredictEYE model is the percentage of correctly predicted mental states (calm or stressful) in the dataset. Precision is the percentage of correctly predicted instances of a specific mental state out of all instances predicted as that mental state. Recall is the percentage of correctly predicted instances of a specific mental state out of all actual instances of that mental state in the dataset. The F1 score is the harmonic mean of precision and recall for each mental state.
Figures 9 and 10 compare the machine learning algorithms when applied to the time series predictions of the LSTM and ARIMA models. All the classification algorithms achieved better performance metrics, including accuracy, with the LSTM-based predictions than with the ARIMA-based predictions.
Table 4 summarizes the performance metrics of the various models compared with PredictEYE. Among these models, PredictEYE, the combination of LSTM and Random Forest, achieved the highest accuracy of 86.4% and the highest F1 score of 86.3%, for Participant 4. Although it did not achieve the best precision and recall scores, the Random Forest model demonstrated the best overall performance across all performance measures compared with the other models.
This finding highlights the importance of selecting an appropriate classification algorithm and of evaluating the model with multiple metrics to understand its effectiveness comprehensively. Random Forest showed the strongest overall ability to classify mental states from the LSTM time series predictions, making it the most suitable choice for the PredictEYE model.
Figures 11 and 12 present the ROC curves and corresponding area under the curve (AUC) values obtained by applying the classification algorithms to the predictions of the LSTM and ARIMA models, respectively, for each participant. The ROC AUC values for the LSTM-based predictions were consistently higher than those for the ARIMA-based predictions, indicating that the LSTM predictions allowed better discrimination between positive and negative instances across classification thresholds. Among the classification algorithms, Random Forest achieved higher AUC values for most participants. Together, these results indicate that the LSTM model captures patterns that are meaningful for classification in this context.
We also examined the confidence intervals of the ROC curves for all participants (Figures 11 and 12). These intervals indicate how precisely model performance is estimated. For most participants, the Random Forest algorithm consistently showed narrow confidence intervals, indicating precise performance estimates. As a detailed illustration, Table 5 shows the confidence intervals for participants P3 and P4. Participant P4 stood out, as the Random Forest algorithm achieved its highest classification accuracy for this individual; together with the narrow confidence intervals, this gives high confidence in the model's ability to predict outcomes for P4 accurately. Conversely, the Random Forest algorithm achieved its lowest classification accuracy for Participant P3. Even so, the narrow confidence intervals for P3 indicate that this lower performance is itself estimated precisely.

C. VALIDATION WITH GSR
Figure 13 presents the classification of mental states as either 'calm' or 'stressful' using Random Forest, based on the predicted sequences of eye tracking data at the end of the calm and stressful videos. The figure shows the states determined by the Galvanic Skin Response (GSR) alongside the mental state predictions based on the LSTM and ARIMA predicted sequences. Green indicates the calm state and red the stressful state.
Compared against GSR, the Random Forest algorithm correctly predicted the mental states of all participants from the eye tracking sequences predicted by the LSTM model. However, it failed to predict the mental states of participants 3 and 5 correctly when using the sequences predicted by the ARIMA model. These results suggest that the accuracy of the Random Forest algorithm in predicting mental states from eye tracking features may depend on the quality of the forecast sequence it receives.
Further analysis showed that, according to the ground truth GSR, Participant 3 remained calm during both videos. Random Forest correctly predicted this participant's mental state at the end of the stressful video from the LSTM-predicted sequence but misclassified it from the ARIMA-predicted sequence. In addition, a Welch two-sample t-test was performed on the data samples obtained at the end of the calm and stressful video-watching periods (Table 1). Notably, several features showed no significant difference at the end of the stressful video, consistent with the participant remaining calm throughout. These findings indicate that Random Forest predicts mental states better from LSTM-predicted sequences than from ARIMA-predicted sequences.
For Participant 6, Random Forest correctly predicted the mental state for both videos from the LSTM-forecast sequence. From the ARIMA-predicted sequence, however, it classified the state as stressful for both videos, which is inconsistent with the GSR. These results suggest that the performance of the PredictEYE model in predicting the mental state may depend on the specific features used and on the algorithm used for forecasting.

VI. DISCUSSION
PredictEYE demonstrates that participants' mental states can be predicted effectively from their eye tracking data. The model uses LSTM to predict the future sequences of various eye tracking measures, such as pupil diameter, blink rate, and fixation, while participants watch specific content on a screen. These measures indicate whether a participant is moving toward a calm or stressful condition. For instance, when a participant watches a stressful video, PredictEYE examines how their eye tracking metrics change over time and whether these changes lead toward a calm or stressful state. The LSTM model plays a crucial role by identifying patterns within each participant's eye tracking time series and predicting their future eye movements and reactions. The Random Forest algorithm then interprets the predicted future sequences of all eye tracking features jointly, indicating the mental state toward which the participant may be heading. The insights gained from PredictEYE can be used to dynamically reorganize or skip the content shown to participants, providing a more personalized and engaging experience based on their predicted mental states. This approach can be applied in domains such as mental health and stress management to monitor and predict individuals' mental states in real time from their physiological data.
Applying time series analysis to eye tracking data with a high sampling frequency offers several benefits for predicting mental states with personalized models. First, it allows the identification of unique patterns of gaze behavior associated with different mental states or disorders. Second, it captures temporal dynamics and changes in mental state over time. By tracking how gaze behavior evolves, personalized models can provide more accurate and timely predictions, allowing for more effective interventions and treatments.
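One common way to expose these temporal dynamics to a sequence model such as an LSTM is to frame each feature's time series as sliding windows of past samples paired with the samples that follow. The sketch below shows that framing for a single univariate series; the `lookback` and `horizon` values are placeholders, since this section does not state the window lengths used in PredictEYE.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int, horizon: int = 1
) -> List[Tuple[List[float], List[float]]]:
    """Frame a univariate time series as supervised (input, target) pairs.

    Each input holds `lookback` consecutive samples and each target holds the
    next `horizon` samples, so a sequence model can learn how the recent past
    of one eye tracking feature maps onto its near future.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        split = start + lookback
        pairs.append((list(series[start:split]), list(series[split:split + horizon])))
    return pairs


if __name__ == "__main__":
    # Illustrative series; real input would be, e.g., one participant's pupil diameter.
    print(make_windows([1, 2, 3, 4, 5], lookback=2))
```

For the series `[1, 2, 3, 4, 5]` with `lookback=2`, this yields the pairs `([1, 2], [3])`, `([2, 3], [4])`, and `([3, 4], [5])`.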
In the PredictEYE model, eye tracking features play a significant role in predicting mental states. The model's LSTM-based time series analysis of eye tracking data enables it to capture the unique gaze patterns, fixations, and eye movements associated with different mental states, such as calm or stressful. By leveraging these features, the PredictEYE model can distinguish individual behavioral patterns, making its predictions more accurate and tailored to each individual.
Compared with Galvanic Skin Response (GSR), which serves as the ground truth for mental state, the eye tracking features used by PredictEYE offer additional information. While GSR provides valuable physiological data related to mental state, eye tracking data goes further by revealing which elements of the visual scene draw immediate attention and potentially influence the mental state. The PredictEYE model therefore not only detects an individual's mental state but also attributes it to specific scenes using the information obtained from eye tracking. This allows a more comprehensive understanding of the factors contributing to a person's mental state during video viewing. By combining eye tracking with mental state prediction, the PredictEYE model provides insights into both conscious and subconscious responses, enhancing the accuracy of its predictions.
The PredictEYE model has been compared with recent personalized and non-personalized models [9], [24], [25], [26], [27], [93] in terms of stimulus, the participant population studied, features, algorithms, achieved results, performance metrics, and model type, as shown in Table 6. In eye tracking research, various stimuli, such as images, videos, tasks, and games, have been used to observe and analyze eye tracking measures. These measures typically include fixation, blink, saccade, and pupil diameter, and they have been employed in numerous studies to gain insight into different mental states. The PredictEYE model uses video stimuli as input and extracts features based on fixation, blink, and pupil diameter to classify mental states as calm or stressful. Rather than comparing multiple users to identify the parameters responsible for mental state prediction, our approach observes and interprets individual patterns, given the idiosyncratic nature of eye tracking data.
Numerous models have adopted machine learning and deep learning algorithms to classify mental, attentional, and emotional states, to identify mental disorders, and to detect perceived workload. Among these, PredictEYE stands out by combining LSTM-based time series prediction with a Random Forest algorithm to predict mental states from the recorded eye tracking data. The PredictEYE model falls into the personalized model category, aiming to understand individual behavioral patterns while watching calm and stressful videos. By learning from these personalized patterns, it predicts a person's mental state from their unique eye tracking responses. PredictEYE is distinctive in that it analyzes time series eye tracking data, learns the eye tracking characteristics of each person, and predicts both their mental state and the specific scene responsible for it.
The performance of the PredictEYE model is compared with that of other models in terms of accuracy. The Random Forest model used for mental state prediction achieved a maximum accuracy of 86.4%. Although it did not reach such high accuracy for every participant, its approach to detecting mental states is distinctive. Collecting more data over a longer period could help capture the unique patterns of an individual's mental state, leading to more accurate predictions and improved mental health outcomes.
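For reference, the metrics reported for PredictEYE (accuracy, precision, recall, and F1 score) follow the standard binary definitions and can be computed from confusion-matrix counts. The sketch below treats "stressful" as the positive class, which is an assumption. As a consistency check, the F1 score computed from the precision (83.9%) and recall (88.8%) reported in the conclusion reproduces the reported F1 of 86.3%.

```python
from typing import Dict


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion-matrix counts.

    Assumes 'stressful' is the positive class (tp = stressful predicted as stressful).
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Consistency check using the precision and recall reported for PredictEYE.
    print(round(100 * f1_score(0.839, 0.888), 1))  # → 86.3
```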
A stressful mental state for Participant 1 might be attributed to scene 4, while for Participant 2 a different scene could be responsible, as shown in Figure 14. The figure illustrates the mental state predictions of Participants 1, 2, and 3 while viewing a series of stressful scenes in a video. The depicted time span ranges from T1 to T18, with the corresponding scenes labeled S1 to S18. Participant 1 experienced a state of stress that was attributed to scene S4, which had a noticeable impact on their mental state. However, the same scene, S4, did not induce any change in the mental states of Participants 2 and 3. Participant 2, initially calm, transitioned into a state of stress due to scene S8. In contrast, Participant 3 remained calm throughout the entire time span, with no scene altering their mental state. These findings highlight the individual variability in how participants respond to stressful stimuli and the distinct triggers of their mental states. The ability to capture and differentiate such individual responses is a distinctive characteristic of PredictEYE, showcasing the diverse ways in which people perceive and react to stressful situations.
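The scene attribution in the Figure 14 example can be expressed as finding the first time step at which a participant's predicted state switches from calm to stressful and reporting the scene shown at that step. The sketch below is an illustration of that idea, not the authors' code; the function name is hypothetical, treating the state before T1 as calm is an assumption, and the per-participant state sequences in the example are hypothetical sequences constructed to mirror the description above, not actual model output.

```python
from typing import List, Optional


def attribute_stress_scene(states: List[str], scenes: List[str]) -> Optional[str]:
    """Return the scene at which the predicted state first switches to 'stressful'.

    states[i] is the predicted mental state at time step T(i+1) and scenes[i]
    is the scene shown at that step. The state before the first step is
    assumed to be calm. Returns None if the participant never becomes stressed.
    """
    if len(states) != len(scenes):
        raise ValueError("states and scenes must be aligned")
    previous = "calm"
    for state, scene in zip(states, scenes):
        if state == "stressful" and previous != "stressful":
            return scene
        previous = state
    return None


if __name__ == "__main__":
    scenes = [f"S{i}" for i in range(1, 19)]  # S1..S18 over T1..T18
    # Hypothetical state sequences mirroring the Figure 14 description.
    participants = {
        "P1": ["calm"] * 3 + ["stressful"] * 15,
        "P2": ["calm"] * 7 + ["stressful"] * 11,
        "P3": ["calm"] * 18,
    }
    for pid, states in participants.items():
        print(pid, attribute_stress_scene(states, scenes))
```

With these hypothetical sequences, the function attributes stress to S4 for Participant 1 and S8 for Participant 2, and returns None for Participant 3, matching the narrative above.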
PredictEYE focuses solely on healthy individuals and aims to understand changes in their mental state by establishing a baseline period. During this baseline period, eye tracking measures are observed, and the model learns the trends and patterns of those measures to predict future mental states. This approach allows for the development of personalized, data-driven interventions to support individuals' mental well-being. By using PredictEYE, individuals can gain insight into their mental state and make informed decisions about their mental health care.
One of the key advantages of time series analysis in the PredictEYE model is the ability to develop a personalized model for each participant. By analyzing the eye tracking data recorded during both calm and stressful video viewing, the model identified the underlying patterns in each participant's data and produced a personalized model that could efficiently predict that participant's mental state.
The PredictEYE model customizes its analysis of eye tracking data by using LSTM-based time series models, which adapt to individual differences in a personalized manner. This approach involves training the model on each person's specific eye tracking data, capturing their idiosyncratic patterns and responses. Instead of comparing data across multiple participants and treating them as a homogeneous group, this approach recognizes the individuality of each person's cognitive and emotional processes.
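The difference between personalized and pooled modeling can be illustrated with a toy example: a personalized setup fits one model per participant using only that participant's data, whereas a pooled setup fits a single model for the whole group. In the sketch below, the "model" is a hypothetical one-parameter drift estimate (the average step of a feature) standing in for a per-participant LSTM, and the participant data are illustrative values, not study data.

```python
from statistics import mean
from typing import Dict, Sequence


def fit_drift(series: Sequence[float]) -> float:
    """Hypothetical per-participant 'model': the average step of one feature."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return (series[-1] - series[0]) / (len(series) - 1)


def fit_personalized(data: Dict[str, Sequence[float]]) -> Dict[str, float]:
    # One model per participant, trained only on that participant's own data.
    return {pid: fit_drift(series) for pid, series in data.items()}


def fit_pooled(data: Dict[str, Sequence[float]]) -> float:
    # Contrast: a single model averaged across participants as one homogeneous group.
    return mean(fit_drift(series) for series in data.values())


if __name__ == "__main__":
    # Illustrative pupil diameter trends for two participants.
    data = {"P1": [3.0, 3.2, 3.4], "P2": [4.0, 3.9, 3.8]}
    print(fit_personalized(data))
    print(fit_pooled(data))
```

Here the personalized models capture a rising trend for P1 (about 0.2 per step) and a falling trend for P2 (about -0.1 per step), while the pooled estimate (about 0.05) describes neither participant well, which is the idiosyncrasy argument in miniature.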
The results demonstrated that the PredictEYE model could accurately predict each participant's mental state based on their eye tracking data. The use of time series analysis and personalized modeling in the PredictEYE model could be applied to larger datasets in future studies to improve the model's accuracy and reliability for mental state classification. By continuously monitoring individuals, the PredictEYE model has the potential to identify patterns in an individual's mental state over time, providing valuable insights into the factors that contribute to stress or other mental health conditions. This could enable early interventions or treatments to be implemented before a condition becomes more severe.
During the development of the PredictEYE system, the main challenge was building a personalized model. An extensive literature survey identified time series analysis methods suitable for eye tracking data, which enhanced personalization and improved mental state prediction. Selecting the best combination of predictive and classification models is crucial for developing accurate and efficient personalized models such as PredictEYE, which predicts participants' mental states from their eye tracking data.
By selecting LSTM and Random Forest as the best-performing forecasting and classification models, respectively, the PredictEYE model could capture the trends in the time series data and classify participants' mental states as calm or stressed with better performance than the alternatives evaluated (ARIMA for forecasting and the other classification algorithms).

VII. CONCLUSION
The PredictEYE system predicts a person's mental state, and the scene responsible for it, from eye tracking data obtained while watching calm and stressful videos. The system employs an LSTM-based time series regression model to forecast future data sequences and a Random Forest classification model to predict the mental state from those forecast sequences. The LSTM model was compared with an ARIMA model using error statistics and was found to perform better. The Random Forest model was evaluated using various performance metrics and compared with other algorithms, and it performed better than them in classifying individuals' mental states from the forecast sequences. The PredictEYE model achieved a maximum accuracy of 86.4%, precision of 83.9%, recall of 88.8%, and F1 score of 86.3% in predicting the mental state of a participant. The eye tracking features were found to play a significant role in predicting mental state, and the predictions based on these features were similar to the GSR ground truth. To ensure coordination between the GSR and eye tracking data, both were collected on the same machine, both recordings were started simultaneously, their timestamps were synchronized, and appropriate statistical analysis was applied. These measures helped minimize any misalignment and ensure the accuracy of the analysis. The PredictEYE model can incorporate various physiological signals to further improve the accuracy of mental state prediction.
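The conclusion states that GSR and eye tracking timestamps were synchronized but does not describe the procedure. One common approach, shown here only as a sketch under assumptions, is to pair each eye tracking sample with the nearest-in-time GSR sample, which is reasonable when both streams are recorded on the same machine and therefore share a clock. The nearest-neighbor pairing, the `tolerance` parameter, and the function name are all hypothetical choices, not the authors' method.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple


def align_nearest(
    eye_times: Sequence[float],
    gsr_times: Sequence[float],
    gsr_values: Sequence[float],
    tolerance: float,
) -> List[Tuple[float, Optional[float]]]:
    """Pair each eye tracking timestamp with the nearest GSR value in time.

    Both timestamp lists must be sorted and on the same clock. Eye samples
    with no GSR sample within `tolerance` seconds are paired with None.
    """
    aligned = []
    for t in eye_times:
        i = bisect_left(gsr_times, t)
        # Candidates: the GSR sample just before t and the one at or after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(gsr_times)]
        best = min(candidates, key=lambda j: abs(gsr_times[j] - t), default=None)
        if best is not None and abs(gsr_times[best] - t) <= tolerance:
            aligned.append((t, gsr_values[best]))
        else:
            aligned.append((t, None))
    return aligned


if __name__ == "__main__":
    # Illustrative timestamps (seconds) and GSR readings.
    print(align_nearest([0.0, 0.5, 1.0, 3.0], [0.0, 0.4, 1.2], [1.0, 2.0, 3.0], 0.3))
```

Because eye trackers usually sample much faster than GSR sensors, pairing each eye sample with the nearest GSR reading (and flagging gaps with None) keeps the higher-rate stream intact while making mismatches visible.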
The PredictEYE model is a promising approach for predicting human mental states from eye tracking data. Its use of time series analysis and personalized modeling could provide a more comprehensive understanding of the underlying patterns in eye tracking data and enable more accurate predictions of participants' mental states. PredictEYE's ability to identify the specific scene responsible for an individual's mental state makes it a valuable tool for understanding and predicting individuals' responses to different stimuli.
The PredictEYE model could be used as a screening tool for mental health disorders, such as anxiety and depression, by analyzing eye tracking data to predict a person's mental state. It could also help monitor the effectiveness of treatment plans for mental health disorders. The model's adaptability and ability to integrate multiple physiological signals suggest potential applications in various domains, including healthcare and education.
The PredictEYE model can be adapted for webcam-based eye tracking, enabling continuous and non-invasive monitoring of individuals' mental states and providing insights into stress levels, anxiety, or other emotional states over time as they work with the system. The accuracy of the LSTM models could be improved by tuning their parameters and applying multivariate data analysis. Incorporating reinforcement learning into PredictEYE could improve the accuracy and personalization of mental state prediction by optimizing decision-making and adapting to changing mental state patterns over time, which could lead to better treatment and outcomes for individuals with mental health concerns. The adaptable and non-invasive nature of the PredictEYE model makes it a promising tool for continuously monitoring individuals' mental states in various applications, including healthcare, education, and employment.

FIGURE 2. Architecture of PredictEYE, personalized time series model for mental state prediction.

FIGURE 5. Boxplot of the eye tracking measures.

FIGURE 6. Comparison of PredictEYE with other models.

FIGURE 7. Forecast data sequences of Participant 3 after watching the stressful video, using the LSTM and ARIMA models in PredictEYE. Fixation Disp-X is fixation dispersion X and Fixation Disp-Y is fixation dispersion Y.

FIGURE 8. Comparison between actual and predicted values of LSTM and ARIMA for a short interval.

FIGURE 9. Analysis of classification model based on prediction with LSTM.

FIGURE 10. Analysis of classification model based on prediction with ARIMA.

FIGURE 11. ROC curves based on classification algorithms applied after the prediction with LSTM for participants P1 to P6.

FIGURE 12. ROC curves based on classification algorithms applied after the prediction with ARIMA for participants from P1 to P6.

FIGURE 13. Forecasting of mental state based on eye tracking features, with GSR as ground truth.

FIGURE 14. Sample scenes and gaze responsible for the mental state of participants.
… means the model is closer to the true values. RMSE is a popular measure of accuracy because it is interpretable and easy to compare across models.
5) Mean Absolute Percentage Error (MAPE): a metric used to evaluate the accuracy of a predictive model. It calculates the average absolute percentage difference between the predicted and actual values, providing a relative measure of accuracy that is easy to interpret. In time series forecasting, MAPE is popular because it provides a simple way to assess a model's accuracy. A lower MAPE indicates better accuracy, meaning that the predictions are closer to the actual values.
6) Mean Error (ME): the average difference between actual and predicted values. A value of zero indicates that the model is unbiased. ME is less commonly used than other accuracy measures, but it provides useful information about the direction of errors.
7) Mean Percentage Error (MPE): the average percentage by which the actual values deviate from the predictions. A value of zero indicates that the model is unbiased. It provides useful information about the direction and magnitude of errors.
These error statistics are calculated based on the formula (…)
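The four error statistics described above (RMSE, MAPE, ME, and MPE) can be computed as in the sketch below. Because the formula reference in this section is truncated, the sketch assumes the common convention that the error is actual minus predicted, with MAPE and MPE expressed in percent; the function name is hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence


def error_statistics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAPE, ME, and MPE with errors defined as actual - predicted.

    MAPE and MPE are in percent and are undefined when an actual value is zero.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE/MPE are undefined for zero actual values")
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    pct = [100 * e / a for e, a in zip(errors, actual)]
    return {
        "RMSE": sqrt(sum(e * e for e in errors) / n),  # penalizes large errors
        "MAPE": sum(abs(x) for x in pct) / n,          # relative accuracy
        "ME": sum(errors) / n,                         # signed bias
        "MPE": sum(pct) / n,                           # signed relative bias
    }


if __name__ == "__main__":
    print(error_statistics([2.0, 4.0], [1.0, 5.0]))
```

For actual values `[2.0, 4.0]` and predictions `[1.0, 5.0]`, ME is 0 even though every prediction is off by 1 (RMSE is 1.0, MAPE 37.5%, MPE 12.5%). This illustrates why ME and MPE indicate the direction of errors rather than overall accuracy.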

TABLE 1. Results of Welch two-sample t-test.

TABLE 2. Error statistics based on the prediction while watching the calm video.

TABLE 3. Error statistics based on the prediction while watching the stressful video.

TABLE 4. Performance evaluation of PredictEYE with other models.

TABLE 5. Confidence intervals associated with ROC curves.

TABLE 6. Comparison of PredictEYE with existing models.