
Estimating Self-Confidence in Video-Based Learning Using Eye-Tracking and Deep Neural Networks


Graphical Abstract of Self-Confidence Estimation Using Eye-tracking and Deep Neural Network.

Abstract:

Self-confidence is a crucial trait that significantly influences performance across various life domains, leading to positive outcomes by enabling quick decision-making and prompt action. Estimating self-confidence in video-based learning is essential as it provides personalized feedback, thereby enhancing learners’ experiences and confidence levels. This study addresses the challenge of self-confidence estimation by comparing traditional machine-learning techniques with advanced deep-learning models. Our study involved a diverse group of thirteen participants (N=13), each of whom viewed and provided responses to seven distinct videos, generating eye-tracking data that was subsequently analyzed to gain insights into their visual attention and behavior. To assess the collected data, we compare three different algorithms: a Long Short-Term Memory (LSTM), a Support Vector Machine (SVM), and a Random Forest (RF), thereby providing a comprehensive evaluation of the data. The achieved outcomes demonstrated that the LSTM model outperformed conventional hand-crafted feature-based methods, achieving the highest accuracy of 76.9% with Leave-One-Category-Out Cross-Validation (LOCOCV) and 70.3% with Leave-One-Participant-Out Cross-Validation (LOPOCV). Our results underscore the superior performance of the deep-learning model in estimating self-confidence in video-based learning contexts compared to hand-crafted feature-based methods. The outcomes of this research pave the way for more personalized and effective educational interventions, ultimately contributing to improved learning experiences and outcomes.
Published in: IEEE Access ( Volume: 12)
Page(s): 192219 - 192229
Date of Publication: 11 December 2024
Electronic ISSN: 2169-3536

License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). IEEE is not the copyright holder of this material.
SECTION I.

Introduction

The COVID-19 pandemic significantly disrupted traditional learning methods [1], highlighting the need for effective online teaching alternatives and approaches. Enhancing existing e-learning platforms [2] is essential to meet academic requirements. This transition raises awareness of censorship and surveillance [3], but it also makes online lectures accessible to anyone. However, without the feedback and non-verbal cues of traditional classrooms, students may lose self-confidence, eventually inhibiting their desire to learn and improve their grades.

Quantified learning has incredible potential in the era of digital education. It allows us to monitor learning behaviors and provide specific feedback to both teachers and learners. Smart sensors in devices like computers, tablets, smartphones [4], chairs [5], and even eyeglasses [6] give access to students’ physical and cognitive states while learning. Physical states include various nonverbal cues, like utterance rate [7], [8], nodding [9], [10], [11], and smiling [12], [13], which provide insights into an individual’s behavioral expressions. Cognitive states, on the other hand, comprise complex mental processes, such as engagement [14], [15], [16], boredom [17], [18], [19], and self-confidence [20], [21], [22], which are essential for understanding an individual’s mental and emotional states.

Previous research has consistently shown that self-confidence and learning are interrelated [23], [24]. Studies have shown that a significant boost in self-confidence among students can lead to substantial improvements in their learning outcomes and overall academic performance [25]. Quantified learning aims to contribute here by bridging the uncertainty that may arise from the shift to online learning: it would provide valuable information about learners’ inner experiences, allowing for personalized interventions and tailored advice and encouragement when it matters. Although the relationship between eye movements and levels of self-confidence in this mode of education is not fully understood, the growing trend of telepresence indicates the need for further studies to clarify this relationship.

Building on the work of Ishimaru et al., our study extends their concept of a confidence-aware learning assistant, which uses an eye-tracker to detect self-confidence while students answer multiple-choice questions and adapts the review process based on the estimated confidence levels [21]. Our contribution extends their work by comparing two methods for estimating self-confidence from eye-tracking data in video-based learning. Figure 1 presents an overview of the proposed method. Initially, we utilize a feature-based approach leveraging traditional classification algorithms, including Support Vector Machines (SVM) and Random Forests (RF). Subsequently, we propose a deep-learning approach based on the Long Short-Term Memory (LSTM) network, which offers a more effective way to classify self-confidence levels and demonstrates the superiority of a deep-learning-based approach over traditional hand-crafted feature-based methods.

FIGURE 1. An overview of the pipeline from data collection to analysis.

The main motivation behind this study is to address the growing need for personalized learning experiences that cater to the diverse needs and abilities of learners. Traditional learning systems often rely on explicit feedback mechanisms, such as self-reported confidence levels or multiple-choice question scores, which may not accurately reflect a learner’s true understanding or confidence. By leveraging eye-tracking data, our study aims to develop a more nuanced and objective understanding of learner confidence, enabling the creation of more effective and adaptive learning systems.

Furthermore, this study bridges a research gap in the field of affective computing and learning analytics, where there is a need for more comprehensive and comparative studies on the use of sensor-based approaches for estimating learner confidence. By comparing the efficacy of conventional machine-learning techniques with deep-learning approaches, our study provides a complete understanding of the strengths and limitations of each method, ultimately contributing to the development of more sophisticated and effective learning systems. The research questions are as follows:

  1. What are the most effective methods for incorporating eye-tracking data into video-based learning platforms to improve learning outcomes?

  2. Can machine-learning-based approaches be developed to accurately assess and predict individuals’ confidence levels in various educational settings?

  3. How do the predictive performances of conventional feature extraction methods and deep-learning techniques compare in estimating self-confidence levels, and what are the implications for educational research and practice?

The remainder of this paper is structured as follows. Section II provides a detailed explanation of the technical background and other research that has been done on the subject of estimating confidence through sensor-based approaches. Section III describes the methodology used for gathering data. Section IV explains the methods involving eye tracking and analyzing data used to estimate learner confidence. Section V presents the results of our experiment. Section VI discusses the results in the context of our research questions and future work in this area. Finally, Section VII summarizes the main contributions of this paper and provides a conclusion.

SECTION II.

Related Work

In this section, we explore previous research about confidence and neurocognitive states, the use of eye-tracking in education, and the relationship between eye gaze and self-confidence.

A. Confidence and Neurocognitive States

Studies across academic disciplines have examined the role of confidence as a factor affecting different neurocognitive states, from standardized learning [26] to cognitive tests [27] and culinary skills [28]. Forbes-Riley and Litman found that adding confidence awareness to tutoring systems improves learning pace and overall satisfaction [29]. Relatedly, learners who advocate for themselves tend to succeed in their endeavors, gaining further confidence through recognition of their performance.

According to Sun and Yeh, boosting confidence can help students identify misconceptions, i.e., cases where learning has become distorted and students believe they have the correct answer when, in reality, they are mistaken [30]. Roderer and Roebers’ findings of age-related discrepancies hint at a possible gap in self-confidence, with younger people showing a greater sense of self-confidence than older people [31]. Neuroscientists have long explored the link between physiological signals, such as EEG (electroencephalography), and self-efficacy (a person’s level of confidence in performing a particular task) [30], [32].

Nevertheless, EEG devices can be tedious for participants because they must remain attached throughout a session. Eye trackers, in contrast, offer a more convenient and unobtrusive way to monitor users as they interact with display screens. In this context, Maruichi et al. proposed a novel method to estimate a user’s self-confidence based on their stroke-level handwriting behavior. This method enhances learning by enabling users to review areas of unacquired knowledge more efficiently through feedback tailored to their self-confidence [33].

Complementing this work, Bruhin et al. presented a laboratory experiment involving a team task in which effort and ability are complementary and synergies exist between teammates’ efforts, revealing the impact of self-confidence on teamwork [34]. The study finds that when subjects’ self-confidence in their ability is explicitly manipulated through easy and hard general-knowledge quizzes, overconfidence leads to increased effort, reduced free-riding, and higher team revenue.

B. Eye-Tracking in Education

Although mobile eye-trackers have the potential to reveal learning patterns in particular settings, the gulf between the tightly controlled conditions of research labs and unpredictable real-life situations remains large. This discrepancy is a significant obstacle to building accurate models of learning and information interaction in the real world.

To close this gap, researchers increasingly combine studies in the lab with studies in people’s natural environments. For example, as highlighted by [35], some have undertaken intensive long-term studies capturing more than 80 hours of mobile eye-tracking data, and previous research used commercial electrooculography glasses to record employees over 27 months [36].

In line with this trend, [37] provides a comprehensive review of recent eye-tracking studies in educational settings, particularly focusing on children and adolescents. It analyzes 68 empirical studies comprising 78 experiments, emphasizing the use of eye-tracking to monitor engagement, learning interactions, and cognitive activities. The review identifies common practices in data analysis and interpretation, stressing the importance of cross-validation with other data sources. Our study aligns with this shift toward practical experimentation by using eye-tracking features to explore real-world learning behavior.

Eye-tracking’s potential in this field is further supported by well-established links between eye movements and both language proficiency [38] and self-confidence. Research has found that when students get stuck on given content, their reading speed slows and they return to earlier sections more often [39]. Moreover, evidence presented by Tsai et al. shows that students’ eye movements, whether re-reading a question several times or dwelling on an explanation they understand, can be indicative of their comprehension of the concepts [40].

C. Eyes and Self-Confidence Estimation

Eye behaviors and body language contribute significantly to self-confidence. Those with low self-confidence tend to spend longer revising and re-evaluating every question or choice [41]. Self-assessment is only one of the many benefits of this method; eye-tracking also sheds light on how to enhance the learning experience. Okoso et al. bring this idea to the fore by providing readers with information on which aspects of a text cause the greatest difficulty in comprehension [42].

On the other hand, Lee et al. showed a positive correlation between eye contact with virtual tutors and accelerated learning [43]. Augereau et al. estimated English language proficiency from eye movements during tests and achieved high accuracy with slim error margins [38]. Yamada et al. represent one of the leading explorations toward automatically identifying self-confidence levels during problem-solving using eye-movement analysis [22]. These attempts point to the potential of eye-tracking to characterize the learner’s process and to tailor feedback and learning reinforcements for students.

By bridging the gap in ecological validity between controlled settings and the real world, and by drawing on the valuable insights that eye movements provide, researchers are opening the door to a deeper understanding of the learning process. This understanding is a springboard for individualized learning and a step toward the eventual development of personalized, accessible educational frameworks for all.

SECTION III.

Data Collection

This section provides a detailed description of the gaze data collection process. An overview of the experimental settings and data collection workflow is presented in Figure 2. The following subsections provide further information on the participant demographics and the data collection protocol.

FIGURE 2. Experimental setting. A participant works on the laptop with an eye-tracker mounted.

A. Participants

The participant pool for our experiment consisted of 13 university students, comprising eight males and five females. The students were recruited from various academic backgrounds, including applied and theoretical science, computer science, cognitive linguistics, and mechanical engineering, thereby providing a diverse and representative sample for testing our framework.

B. Protocol

The experiment was conducted using a Tobii 4C remote eye-tracker with a pro license key in a laboratory setting, isolated from potential environmental distractions. The video stimuli used for recording gaze data covered various topics, including logic, literature, computer science, and medicine, to ensure broad thematic coverage and representative eye movement and attention patterns from the participants. A series of videos were carefully selected to achieve this goal, and the data collection procedure is described in detail below.

  1. The experimenter gave each participant a precise description of the data collection process and what was expected of them.

  2. Each participant was required to read the consent form carefully and sign it only if they fully understood and agreed to the study.

  3. Participants completed a multi-stage calibration procedure to ensure accurate and reliable measurements, as shown in Figure 2. This verified the eye-tracking equipment and established a baseline for each participant’s individual eye-tracking and gaze patterns.

  4. Participants sat in front of the computer screen and watched each video, which lasted approximately one to two minutes. As part of the experimental rules, they were instructed to watch the videos attentively.

  5. While participants viewed a video, gaze data was recorded with millisecond-level timestamps; eye data collection for that segment concluded immediately upon completion of the video.

  6. Immediately after watching each video, participants completed a questionnaire testing their memory and understanding of the presentation. Each question offered four choices, only one of which was correct. Participants responded by clicking one of the choices.

  7. As soon as participants began answering, we resumed eye data acquisition; their eye movements were recorded until they submitted their final answer.

  8. After answering each question, participants indicated (Yes/No) whether they were confident in their answer. These self-reported confidence levels served as the reference, or ground truth, for measuring the system’s performance in estimating self-confidence.

  9. After participants indicated their confidence level by clicking the corresponding button, the next video played automatically.

  10. Finally, participants repeated Steps 4-9 for each remaining video, answering the corresponding questions after each one.

Each session lasted approximately 30 minutes. The desktop used for the experiment was kept fixed and stable throughout, and the room was arranged so that neither clutter nor interference from other devices could disturb the gaze recording. Moreover, the system volume was standardized for all participants to ensure consistency, and screen brightness was held constant across all experiments. These measures established an environment that was as optimal and standardized as possible, so that outside influences could not hamper the data collection.

SECTION IV.

Methodology

In this section, we describe the dataset preparation process, followed by the methodology for feature extraction. Subsequently, we outline the machine-learning and deep-learning models utilized. Finally, we describe the approach used to compare model accuracy.

A. Data Pre-Processing

The accuracy of eye-tracking data can be affected by various noise sources, such as blinks and head movements. To mitigate this, noise-reducing techniques are employed to eliminate distorting components and obtain more reliable eye movement indicators. This would assist in precisely examining eye movement rates, including fixations (prolonged gazes at a single location) and saccades (rapid eye movements between fixations).

In this context, a fixation refers to maintaining the gaze at a specific location for a brief duration (typically less than one second), while a saccade is a rapid eye movement from one fixation to the next. To detect fixations and saccades, we employed the technique developed by Buscher et al. [44]. Instead of exporting the absolute coordinates of fixations, we exported differential coordinates, which capture the change in position between consecutive fixations. The differences in eye gaze patterns while solving questions with and without confidence are illustrated in Figure 3.

FIGURE 3. Example of eye gaze when answering questions with confidence and with no confidence.
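As a minimal illustration of this export step, the Python sketch below converts an already-detected fixation sequence into differential coordinates. The (x, y, duration) fixation format and the function name are assumptions made for illustration; the Buscher et al. detection step itself is not reproduced here.

```python
# A minimal sketch of the differential-coordinate export described above.
# `fixations` is assumed to be a list of (x, y, duration_ms) tuples produced
# by a prior fixation-detection step (the paper uses Buscher et al.'s
# method [44], which is not reproduced here).

def to_differential_coordinates(fixations):
    """Convert absolute fixation positions to deltas between consecutive fixations."""
    diffs = []
    for (x0, y0, _), (x1, y1, dur) in zip(fixations, fixations[1:]):
        diffs.append((x1 - x0, y1 - y0, dur))
    return diffs

# Example: three fixations on a screen, in pixel coordinates.
fixations = [(400.0, 300.0, 220), (650.0, 320.0, 180), (640.0, 500.0, 310)]
print(to_differential_coordinates(fixations))
# [(250.0, 20.0, 180), (-10.0, 180.0, 310)]
```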

B. Feature Extraction

Our approach combines traditional feature extraction techniques with advanced deep-learning methods to provide a comprehensive evaluation of self-confidence estimation. We extracted a set of hand-crafted features from the eye-tracking data, including fixation duration, saccade length, saccade angle, and saccade speed, as detailed in Table 1; these features capture the essential information in the data. By condensing the data into these meaningful dimensions, we aim to provide a more accurate and relevant representation of how the audiovisual materials influence participants’ eye movements and understanding.

TABLE 1. The list of features.
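The sketch below illustrates, under assumptions, how such features could be computed from a fixation sequence. The field names, the derivation of saccades from consecutive fixations, and the aggregation (means only) are illustrative rather than the paper’s precise Table 1 formulation.

```python
import math

# A hedged sketch of the hand-crafted feature extraction. Each fixation is
# assumed to be a dict with "x", "y", "duration_ms", and "timestamp_ms";
# saccades are derived from consecutive fixations.

def extract_features(fixations):
    """Compute fixation-duration and saccade statistics for one recording."""
    durations = [f["duration_ms"] for f in fixations]
    lengths, angles, speeds = [], [], []
    for a, b in zip(fixations, fixations[1:]):
        dx, dy = b["x"] - a["x"], b["y"] - a["y"]
        length = math.hypot(dx, dy)                    # saccade length (px)
        dt = b["timestamp_ms"] - (a["timestamp_ms"] + a["duration_ms"])
        lengths.append(length)
        angles.append(math.atan2(dy, dx))              # saccade angle (rad)
        speeds.append(length / dt if dt > 0 else 0.0)  # saccade speed (px/ms)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {
        "mean_fixation_duration": mean(durations),
        "mean_saccade_length": mean(lengths),
        "mean_saccade_angle": mean(angles),
        "mean_saccade_speed": mean(speeds),
    }
```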

C. Model Architecture

This study explores conventional hand-crafted feature-based approaches and deep neural networks to determine their effectiveness in predicting self-confidence in video-based learning environments. We employed a two-fold approach: Firstly, we leveraged the power of deep learning by utilizing Long Short-Term Memory (LSTM) neural networks, which are well-suited for modeling complex temporal relationships in data. In parallel, we adopted a more traditional approach, combining hand-crafted features with machine-learning algorithms such as Support Vector Machines (SVM) and Random Forest (RF).

1) Hand-Crafted Feature-Based Model

Our hand-crafted approach employs two established machine-learning algorithms for self-confidence estimation: SVM and Random Forest. For the SVM, we used the features listed in Table 1 and selected the Radial Basis Function (RBF) kernel, as it is well-suited to non-linear relationships between features. Using a grid search, we identified the optimal hyperparameter values as C = 1 and γ = 0.125, which yielded the highest performance. We then applied the same feature set to the Random Forest algorithm (n_estimators = 100, criterion = “gini”), an ensemble learning method that combines the predictions of multiple decision trees to produce a more accurate and robust prediction.
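A hedged scikit-learn sketch of this pipeline follows. The data arrays are placeholders for the Table 1 feature matrix, and the search grid is an assumption, though it contains the values reported as optimal (C = 1, γ = 0.125).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Placeholder arrays standing in for the Table 1 feature matrix and the
# binary confidence labels; shapes and values are illustrative only.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(80, 8)), rng.integers(0, 2, size=80)
X_test, y_test = rng.normal(size=(20, 8)), rng.integers(0, 2, size=20)

# Grid search over the RBF-SVM hyperparameters; the grid itself is an
# assumption, but it includes the reported optimum (C=1, gamma=0.125).
param_grid = {"C": [0.1, 1, 10], "gamma": [0.03125, 0.125, 0.5]}
svm = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
svm.fit(X_train, y_train)

# Random Forest with the stated settings.
rf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
rf.fit(X_train, y_train)

print("best SVM params:", svm.best_params_)
print("SVM accuracy:", svm.score(X_test, y_test))
print("RF accuracy:", rf.score(X_test, y_test))
```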

2) Deep-Learning Model

Our deep-learning module was based on a Long Short-Term Memory (LSTM) architecture, as depicted in Figure 4. To prepare the input data for the network, we padded the data by duplicating data sequences, ensuring a consistent input length with a maximum of 573 data points. The LSTM network consisted of a single layer with 64 hidden units, followed by a fully connected layer. The model was trained using the Adam optimizer with an initial learning rate of 0.001, with Binary Cross-Entropy as the loss function.

FIGURE 4. Base model architecture of the LSTM.
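The following PyTorch sketch reflects the described architecture under stated assumptions: the number of input channels per time step (input_dim) is not specified in the paper and is assumed here, and BCEWithLogitsLoss is used as a numerically stable form of binary cross-entropy.

```python
import torch
import torch.nn as nn

# A minimal PyTorch sketch of the described architecture: one LSTM layer with
# 64 hidden units followed by a fully connected layer, trained with Adam
# (lr = 0.001) and binary cross-entropy. Sequences are assumed to be padded
# to the maximum length of 573 time steps; input_dim (gaze channels per time
# step) is an assumption.

class ConfidenceLSTM(nn.Module):
    def __init__(self, input_dim=4, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):                    # x: (batch, 573, input_dim)
        _, (h_n, _) = self.lstm(x)           # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1]).squeeze(-1)  # logits: (batch,)

model = ConfidenceLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCEWithLogitsLoss()  # BCE computed on logits for stability

x = torch.randn(8, 573, 4)                # dummy padded batch
y = torch.randint(0, 2, (8,)).float()     # binary confidence labels
loss = criterion(model(x), y)
loss.backward()
optimizer.step()
```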

D. Evaluation Protocol

In our study, we took advantage of two cross-validation methods, Leave-One-Participant-Out (LOPO) and Leave-One-Category-Out (LOCO), to effectively compare hand-crafted feature-based techniques and deep-learning methods. These techniques thoroughly assessed our model’s performance across participants and video content.

1) Leave-One-Participant-Out Cross-Validation (LOPOCV)

The LOPOCV approach excluded one participant from the training set during each iteration and used their data for testing. This process was repeated until every participant had been excluded and tested once. This method allowed us to evaluate the model’s ability to generalize and accurately predict self-confidence for new, unseen participants. The final accuracy was calculated as the average of all accuracies obtained from each iteration, providing a comprehensive measure of the model’s overall performance across different participants.

2) Leave-One-Category-Out Cross-Validation (LOCOCV)

The LOCOCV technique grouped the watching and solving samples for each video into a single category. In each iteration, one category was removed from the training set and used as the test set, and the model was trained on the remaining categories. This procedure was repeated until each category had been excluded and tested once. This method allowed us to assess the model’s performance in predicting self-confidence across different video content. As with LOPOCV, the final accuracy was determined by averaging the accuracies from each iteration, giving an overall performance metric across categories.

By implementing LOPOCV and LOCOCV cross-validation methods, we ensured that our technique was rigorously tested and performed well across participants and video samples. These cross-validation techniques were crucial for identifying the specific features associated with each participant and training video, thereby enhancing the reliability and generalizability of our model.
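Both schemes can be expressed with scikit-learn’s LeaveOneGroupOut, as in the sketch below: passing participant IDs as the groups yields LOPOCV, while passing video IDs yields LOCOCV. The data and the classifier are placeholders, not the study’s actual pipeline.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

# Placeholder data: 13 participants x 7 videos = 91 samples of 8 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(91, 8))
y = rng.integers(0, 2, size=91)              # binary confidence labels
participants = np.repeat(np.arange(13), 7)   # LOPOCV groups
videos = np.tile(np.arange(7), 13)           # LOCOCV groups

def grouped_cv_accuracy(groups):
    """Hold out one group per fold and average the fold accuracies."""
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
        clf = SVC(kernel="rbf", C=1, gamma=0.125).fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return np.mean(scores)

print("LOPOCV accuracy:", grouped_cv_accuracy(participants))
print("LOCOCV accuracy:", grouped_cv_accuracy(videos))
```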

SECTION V.

Result

In this section, we present the results of our comprehensive comparison between hand-crafted feature-based methods and a deep-learning-based approach. Our goal is to provide a thorough understanding of each methodology’s strengths and weaknesses in the context of our research.

Table 2 presents a comparison of model results using LOPOCV and LOCOCV under two distinct approaches: a deep-learning-based method utilizing Long Short-Term Memory (LSTM) networks and hand-crafted feature-based models employing Support Vector Machines (SVM) and Random Forest (RF) algorithms.

TABLE 2. Comparison of Model Results Using LOPOCV and LOCOCV Cross-Validation

A thorough examination of the results reveals that the deep-learning approach, leveraging LSTM networks, outperforms the hand-crafted feature-based methods in accuracy. The LSTM approach achieves accuracies of 70.3% and 76.9% for LOPOCV and LOCOCV, respectively, significantly higher than the SVM and RF models. The SVM model achieved accuracies of 54.0% (LOPOCV) and 52.0% (LOCOCV), while the RF model achieved 52.0% (LOPOCV) and 46.0% (LOCOCV). These results demonstrate the efficacy of deep-learning approaches, particularly LSTM networks, in our research domain.

The significant performance gap between the deep-learning approach and the traditional machine-learning methods highlights the strength of deep learning in handling complex data patterns and relationships. Its ability to learn and represent complex features and patterns in the data allows it to achieve higher accuracy and better generalization than traditional machine-learning methods. Under the LOCOCV approach, the deep-learning model achieved an accuracy of 76.9%, a 47.5% relative increase over the SVM model’s 52.0% and a 67.4% relative increase over the RF model’s 46.0%. These results highlight the substantial improvement in accuracy achieved by the deep-learning approach compared to the machine-learning methods.

Table 3 presents each participant’s Leave-One-Participant-Out Cross-Validation (LOPOCV) results. The model’s accuracy varies considerably across participants, ranging from 57.1% to 100.0%. Participant 6 achieved the highest accuracy of 100.0%, indicating that the model predicted their behavior perfectly. In contrast, Participants 1, 9, 12, and 13 achieved lower accuracies of 57.1%, suggesting that the model struggled to predict their behavior. This inconsistency across participants highlights the need for further research to improve the model’s performance and generalizability; individual differences in behavior may explain why the model predicts some participants accurately but not others.

TABLE 3. Results of Leave-One-Participant-Out Cross-Validation (LOPOCV) With LSTM Model

Leave-One-Category-Out Cross-Validation (LOCOCV) results using the LSTM model are presented in Table 4. The model’s accuracy varies across categories (video samples), ranging from 61.5% to 92.3%. The model achieved its highest accuracy of 92.3% for Video 4, indicating that it predicted behavior for this sample accurately. In contrast, it achieved lower accuracies for Videos 1 and 6, at 61.5% and 69.2%, respectively.

TABLE 4. Results of Leave-One-Category-Out Cross-Validation (LOCOCV) With LSTM Model

Figure 5 presents the confusion matrices for the LOPOCV of the SVM, RF, and LSTM models. The matrices illustrate each binary classifier’s performance in predicting confidence levels during question-solving and video-watching activities. Figure 5a, Figure 5b, and Figure 5c depict the results when inferences are made in a user-dependent manner for the SVM, RF, and LSTM models, respectively.

FIGURE 5. Confusion matrices of SVM, RF, and LSTM models for LOPOCV.

Figure 6 presents the confusion matrices for LOCOCV of the SVM, RF, and LSTM models. Figure 6a, Figure 6b, and Figure 6c show the results when inferences are made in a video-dependent manner for the same models. These confusion matrices provide a comprehensive visual representation of the model’s performance, allowing for a detailed comparison of their predictive capabilities across different validation approaches and inference contexts.

FIGURE 6. Confusion matrices of SVM, RF, and LSTM models for LOCOCV.

SECTION VI.

Discussion

This section provides a discussion and interpretation of the results presented in Section V. The discussion examines the implications of the findings and addresses the research questions posed in Section I.

A. Incorporating Eye-Tracking Data Into Video-Based Learning Platforms

We have introduced a data collection procedure incorporating eye-tracking data from video-based learning scenarios. This comprehensive dataset includes valuable information such as confidence levels, raw gaze data, and correctness of responses. It provides a robust foundation for analyzing and estimating self-confidence in educational contexts.

Our data collection approach is notable for its generalizability and applicability across educational contexts. To ensure generalizability, all videos were carefully selected to cover a wide variety of topics at multiple levels of difficulty. We further ensured diversity through an extensive recruitment process that drew participants of varying ages, backgrounds, and learning preferences. This breadth strengthens both the generalizability and the reliability of the data, rendering our dataset a useful asset for future research in educational psychology and technology-supported learning.

B. Conventional Machine-Learning Techniques for Self-Confidence Estimation

Our approach involved an extensive examination of individuals’ confidence levels using conventional machine-learning methods. Specifically, we utilized Support Vector Machines (SVM) and Random Forests (RF) to estimate confidence levels from the collected data. These methods provided a baseline for evaluating the effectiveness of traditional machine-learning techniques in predicting self-confidence based on eye-tracking data.

We selected SVM and RF for several reasons. Firstly, SVM is efficient in high-dimensional spaces and constructs hyperplanes that optimally separate classes in a feature-rich dataset, which suits eye-tracking data. Furthermore, when the number of samples is small relative to the number of dimensions, as in our dataset, SVM generalizes better than many other methods. RF is an ensemble learning method that constructs a multitude of decision trees at training time and outputs the mode of the per-tree class predictions (classification) or their mean (regression). Combining multiple models in this way improves accuracy and mitigates overfitting. In addition, RF places no practical restrictions on the number of features it can handle, which matches the complex and varied nature of our eye-tracking data.

SVM and RF require carefully engineered features to perform well, so considerable manual effort and domain expertise are needed to determine which aspects of the raw eye-tracking data are likely to predict self-confidence. Important as it is, feature engineering can be very time-consuming and can limit the portability of such models between data sources. In addition, traditional models such as SVM and RF cannot readily capture the temporal dynamics of eye-tracking data. Eye movements are inherently spatiotemporal, and the order and timing of gazes can provide crucial information about cognitive and emotional states. Because traditional machine-learning models are not inherently suited to such temporal dependencies, they tend to underperform in this setting.

C. Comparison of Conventional Hand-Crafted Feature-Based Approach to Deep-Learning-Based Approach

We conducted a comparative analysis of manually designed feature extraction methods versus deep-learning techniques for predicting self-confidence levels. We employed LSTM neural networks for deep learning, which demonstrated superior performance compared to traditional machine-learning methods. The LSTM model outperformed SVM and RF, achieving higher accuracy in both user-dependent and video-dependent scenarios. This finding underscores the potential of deep-learning models to uncover latent patterns in eye-tracking data, leading to more accurate and reliable predictions of self-confidence levels.

D. Limitations and Future Work

While this study on self-confidence estimation using deep learning provides valuable insights, several limitations and challenges must be acknowledged. These include the potential variability in model accuracy and the complexity of interpreting eye-tracking data.

Firstly, the reliance on eye-tracking devices, such as the Tobii 4C, underlines a dependency on specific hardware for data collection, which may limit the applicability of the models in environments where such equipment is unavailable. Additionally, variations in the accuracy and calibration of eye trackers can affect the reliability of the collected data.

The assessment of self-confidence is limited to responses related to video content, which may not be representative of all aspects of self-confidence, as it is influenced by a wide range of factors and contexts. Self-confidence is a multifaceted trait influenced by numerous internal and external factors, and focusing on a specific scenario may oversimplify the broader concept and its determinants.

The study has been conducted with a limited or homogeneous sample population, which may affect the generalizability of the findings. A diverse sample is crucial to ensure that the models can accurately estimate self-confidence across different demographics and cultural backgrounds.

For future work, it is important to consider the variety of online lecture formats, such as those involving teacher-led discussions or document-based materials. Different types of learning materials can impact learners’ self-assessments. Further exploration could provide insight into the influences of factors such as age, prior knowledge, learning styles, and personality on the estimation models.

Integrating eye-tracking with additional sensors remains an important task for future research. Physiological measures, such as EEG and biofeedback, can provide deeper insights into cognitive and emotional states. Additionally, incorporating vocal intonations, facial expressions, and eye movements could lead to a more precise understanding of participants’ confidence levels.

Future work could focus on developing more complex models that help explain the factors behind confidence estimates, enhancing our understanding of learner behavior and model transparency. These models could be applied in personalized learning platforms that offer tailored feedback, targeted interventions, and customized learning paths that adapt to individual needs. It will be challenging to evaluate their performance in authentic educational settings with many students, but it is essential for widespread adoption.

SECTION VII.

Conclusion

We conducted an experiment in which deep-learning models estimated self-confidence from eye-tracking data. Participants watched videos and answered questions while their gaze movements were captured. In a comparison of different models, the LSTM achieved an average prediction accuracy of 73.6% for the degree of self-confidence, outperforming Support Vector Machines and Random Forests, which had average accuracies of around 53.0% and 49.0%, respectively. This finding suggests that LSTM models can provide feedback and support in domains such as education, where strengthening confidence and skills is vital. Further research should test the models’ generalizability, address multimodal data integration, and develop interventions based on these models’ insights. This research shows that deep learning can be used to understand and foster self-confidence.
