Towards Collaborative and Intelligent Learning Environments Based on Eye Tracking Data and Learning Analytics: A Survey

The current pandemic has significantly impacted educational practices, modifying many aspects of how and when we learn. In particular, remote learning and the use of digital platforms have greatly increased in importance. Online teaching and e-learning provide many benefits for information retention and schedule flexibility in our on-demand world while breaking down barriers caused by geographic location, physical facilities, transportation issues, or physical impediments. However, educators and researchers have noticed that students face a learning and performance decline as a result of this sudden shift to online teaching and e-learning from classrooms around the world. In this paper, we focus on reviewing eye-tracking techniques and systems, data collection and management methods, datasets, and multi-modal learning data analytics for promoting pervasive and proactive learning in educational environments. We then describe and discuss the crucial challenges and open issues of current learning environments and data learning methods. The review and discussion show the potential of transforming traditional ways of teaching and learning in the classroom, and the feasibility of adaptively driving learning processes using eye-tracking, data science, multimodal learning analytics, and artificial intelligence. These findings call for further attention and research on collaborative and intelligent learning systems, plug-and-play devices and software modules, data science, and learning analytics methods for promoting the evolution of face-to-face learning and e-learning environments and enhancing student collaboration, engagement, and success.


I. INTRODUCTION
The COVID-19 pandemic is an ongoing disruptive event across our world. It has changed education entirely, with many primary and higher education institutions switching from in-person to online classes in 2020 and Spring 2021. More than 1.2 billion children in 186 countries have been out of the classroom, and about two-thirds of colleges and universities have remained fully or primarily online [1]. Both teachers and researchers have noticed that students face a learning and performance decline as a result of this urgent and mandatory shift in instruction format [1], [2]. Online teaching and distance e-learning, however, are not new teaching methods.
The associate editor coordinating the review of this manuscript and approving it for publication was Laxmisha Rai .
With advances in networking and edge computing, learning no longer has to take place in traditional classrooms at schools, sitting with other classmates. Remote learning was already in use and growing in popularity before the current events, but the pandemic has greatly accelerated its adoption. Learning activities in online settings can have many advantages. Remote learning can be self-initiated, can happen both in and outside of classrooms, and can be more convenient for nontraditional students and adult learners. It is a common experience that if we do not pay attention to learning materials or miss central parts of lectures, the quality of learning, performance, and even retention and application of the learned knowledge can be greatly degraded. To some extent, how well learners engage during learning has emerged as a key indicator of whether or not they are mastering the learning materials in order to achieve personal goals, accomplishment, satisfaction, and self-improvement. We argue that, apart from learning management system software such as D2L Brightspace, Blackboard, or Moodle, emerging technologies that promote proactive and collaborative teaching and learning of complex STEM topics such as computer science have become an urgent need to ensure continuous quality learning and to transform teaching and learning both in the classroom and online. Learning environments typically offer a wide variety of hardware and software for conducting instruction and learning activities, such as projectors, cameras, microphones, computers, and eye trackers. Eye trackers are devices, hardware platforms, or systems that capture and track the movement of students' eyes to determine where and for how long they gaze. The collected eye-tracking data is therefore a potentially rich resource that can be mined to discover when students are learning well or poorly and how to craft teaching sessions that improve the learning experience. Learning and data analytics can then be applied in these environments to better understand, analyze, and improve the learning experience. Over the past few years, extensive research efforts [3]-[7], learning systems with eye-tracking devices [8]-[12], and learning analytics [13]-[16] have been devoted to understanding and assisting student learning in real-world learning environments by capturing, monitoring, estimating, and evaluating student engagement in a variety of learning tasks.
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
FIGURE 1. A graphical summary of eye-tracking enabled learning systems and analytics with essential components and techniques.
Meanwhile, recent studies and a few surveys [17]-[22] have reviewed eye-tracking methodologies [17], major factors in the decision-making process [18], calibration methods, accuracy, and evaluation [22], and the use of popular commercial eye trackers and eye-tracking applications [19]. In particular, Alemdag and Cagiltay [17] investigated the use of eye-tracking technology in multimedia learning and found that eye-tracking assessments of cognitive processes in multimedia learning cover selecting, organizing, and integrating information during the learning process. Orquin and Loose [18] identified saliency, surface size, visual clutter, and position as four major factors throughout the decision-making process. They also reviewed four decision theories: rational models, constrained rationality, evidence accumulation, and parallel constraint satisfaction models. Obaidellah et al. [20] reviewed the application of eye tracking to programming tasks, describing the most prevalent eye-tracking metrics and commercial product characteristics such as sampling rate, resolution, duration, and dependent variables, and reported that the Tobii eye tracker is the most commonly used device. In [19], Al-Rahayfeh and Faezipour examined prominent commercial products and eye-tracking tools. Methods for eye tracking and head movement detection were reviewed in [19] and [21]; we describe them together with multi-sensor eye-tracking systems and summarize them in Table 1. The authors of [22] studied the accuracy, in degrees of visual angle and in pixels, achieved on personal computer, TV panel, head-mounted, automotive, and handheld device platforms. These reviews are excellent resources for technical information, calibration methods and experimental settings, accuracy, evaluations, product specifications, and applications. Surveys on educational data classification, data fusion, and learning analytics were performed in [23] and [24].
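Because accuracy is reported both in degrees of visual angle and in pixels, it helps to see how the two units relate. The following is a minimal sketch, assuming a flat screen viewed head-on from a known distance with the error centered on the line of sight; the function names and the example screen geometry are hypothetical.

```python
import math

def pixels_to_degrees(error_px, screen_width_mm, screen_width_px, distance_mm):
    """Convert an on-screen gaze error (pixels) to degrees of visual angle.

    Assumes the screen is perpendicular to the line of sight and the error is
    centered on it, so a size s at distance d subtends 2 * atan(s / (2 * d)).
    """
    size_mm = error_px * screen_width_mm / screen_width_px  # error times pixel pitch
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))

def degrees_to_pixels(error_deg, screen_width_mm, screen_width_px, distance_mm):
    """Inverse conversion: degrees of visual angle to on-screen pixels."""
    size_mm = 2.0 * distance_mm * math.tan(math.radians(error_deg) / 2.0)
    return size_mm * screen_width_px / screen_width_mm

# Hypothetical setup: a 531 mm wide, 1920 px display viewed from 600 mm.
px_per_degree = degrees_to_pixels(1.0, 531.0, 1920, 600.0)
print(f"1 degree is about {px_per_degree:.1f} px")
```

At a typical desktop viewing distance of about 60 cm, one degree corresponds to roughly 1 cm on screen, which is why the same tracker can look more or less accurate in pixels depending on display size and resolution.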
Although there has been extensive research on eye tracking and learning analytics for improving student learning, it is fragmented and spread broadly across the field with different focuses. No existing review covers fundamental resources such as open projects, accessible sensing data, libraries and tools, or frameworks that are compatible with existing sensors, devices, and tools while solving scientific problems. The variety of eye-tracking and learning analytics approaches, the boundaries between hardware platforms, and the dynamic nature of teaching and learning further complicate a review of existing studies in the field. To fill these research gaps and reduce this complexity, in this paper we review the literature on promoting pervasive learning, student collaboration, engagement, and success, building on our previous work presented in [25] and [26]. In [25] and [26], we studied and classified eye-tracking devices and platforms into four categories with a detailed comparison. Based on these findings, the main contributions of this paper are summarized as follows:
• We present a comprehensive survey that emphasizes existing eye-tracking enabled learning systems and analytics and combines them with data science, multimodal learning, and artificial intelligence.
• We provide a graphical summary of the current eye-tracking-enabled learning systems and analytics with essential components and techniques to guide this survey and simplify the understanding of the systems and methodologies in the literature, as shown in Figure 1.
• We investigate the learning analytics and methodologies to model student learning, capture and study fluctuations, detect gaps between teaching and learning, create personalized curriculum and content, provide and deliver learning resources and learning activities to make education more accessible for students and teachers.
• We identify and discuss the challenges, open issues, and opportunities of enabling collaborative and intelligent teaching and learning in real-world learning environments.
The remainder of this paper is organized as follows. Section II presents the methodology used to conduct this literature review and the graphical summary. Section III provides an in-depth review of eye-tracking learning data collection and management, including the most recent available datasets and tools, in an attempt to give current and future studies more insightful and precise guidance. Section IV reviews measures, learning analytics, and intelligence. Section V discusses challenges and open issues. Finally, Section VI concludes this survey and suggests future research directions based on current research gaps.

II. METHODOLOGY
A set of major terms, such as eye-tracking devices and platforms, gaze tracking, student attention, mind wandering, intentions, cognitive ability, interaction, learning outcomes, learning dataset, learning data mining, and intelligence, was derived to conduct a comprehensive literature review. Search expressions for these terms were built using Boolean operators and interchangeable search strings. The terms and expressions were applied to well-established and well-maintained digital databases in computing, sensing, networking, and education: IEEE (Institute of Electrical and Electronics Engineers), ACM (Association for Computing Machinery), Wiley Online Library, ScienceDirect, SpringerLink, JSTOR, Scopus, and Web of Science. Additional publications relevant to our research questions were retrieved using Google Scholar and Google. The references of the systematic reviews [17], [18], [23], [36], [42]-[50] were also examined to ensure that no relevant papers were neglected. Only English-language articles from conference proceedings, transactions, magazines, books, essays, technical reports, white papers, and manufacturers' technical guides were studied.
A preliminary literature search yielded 295 results. Of these, 67 duplicate articles found in multiple databases were eliminated. The remaining 228 items that appeared acceptable for consideration were carefully reviewed and filtered using the following criteria: C1. The publication covered cutting-edge technology and use cases for encouraging student learning and engagement; C2. The publication covered eye tracking and learning analytics technologies and related applications for enhancing student learning success; C3. The paper used experimental trials based on either laboratory data or open-source datasets to explain, analyze, demonstrate, or assess the technology. A total of 98 papers remained at the conclusion of the study selection process.
Data extraction and synthesis were then performed. For ease of explanation, we provide a graphical summary of the current eye-tracking enabled learning systems and analytics with essential components and techniques. It can serve as a generic foundation and strong support for using different types of eye-tracking hardware, analytic modules, algorithms, and tools for future research. A general learning system has five main components, namely (1) face-to-face (F2F) and online learning with eye-tracking systems, (2) data collection and management with tools, (3) feature extraction, (4) measurement, learning analytics, and intelligence, and (5) open issues and challenges, as shown in Figure 1.
F2F and online learning with eye-tracking systems: The use of eye-tracking sensors and hardware platforms has been transforming the traditional way of teaching and learning in physical and virtual classrooms. Both sensors and platforms can assist researchers and educators in data collection, learning analytics, and various application scenarios. Based on the hardware setup, in our previous work [25], we classified the existing eye-tracking systems into four main types: tower-mounted eye tracker [8], [11], [27]- [31], screen-based eye tracker [9], [11], [32], [33], [35], head-mounted/wearable eye tracker [10], [30], [36]- [38], and portable/mobile eye tracker [12], [39]- [41], as listed in Table 1. A detailed discussion on those four categories of eye-tracking devices and platforms can be found in [25].
The details of components (2)-(5) and their corresponding modules or technologies are described in the rest of this survey. Specifically, the studies on (2) data collection and management with tools and (3) feature extraction are reviewed in Section III. The technologies for (4) measures, learning analytics, and intelligence are examined in Section IV, followed by a discussion of (5) open issues and challenges.

III. DATA COLLECTION AND MANAGEMENT WITH TOOLS
Data is the most important element of all data science, analytics, machine learning, and artificial intelligence. Without data, there is nothing to train models on, and all related collaboration and intelligence efforts remain futile. Data can be any fact or value, including eye, face, scene, and saliency images, sound, and videos collected during teaching and learning in F2F classrooms and online learning with different eye-tracking hardware and systems. To record and promote data-driven scientific research in eye tracking, a set of public eye-tracking and gaze datasets has been published in recent years. Although the existing work [51]-[55] that we analyzed has summarized eye-tracking datasets, these summaries only compare datasets in terms of dataset size, number of subjects, gaze targets, head poses, and collection duration, without considering hardware features and environmental setup. As a result, they offer little guidance for testing algorithms and applications with low-cost webcams or mobile devices in classrooms or online learning.
These observations motivate us to re-examine and take a closer look at the publicly available datasets with their real-world setting and applications, with an emphasis on unique resource requirements for enabling face-to-face or online learning to better benefit from technologies and student learning datasets. We concentrate mostly on the increasing use of eye tracking and eye gaze monitoring systems for learning analytics in real learning environments including classrooms and online learning. Table 2 presents a summary of publicly available student eye tracking datasets and saliency detection benchmarks during student learning.
Saliency refers to what stands out and attracts human visual attention in a scene (such as a photo, website, slide, or video). In essence, our brain and eyes unconsciously and automatically focus on the most salient (important) regions during viewing. To detect saliency and predict users' visual attention, eye-tracking data are often collected and used as objective ground truth in different studies.
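One common way to use eye-tracking data as ground truth is to score a predicted saliency map at the locations observers actually fixated. The sketch below computes Normalized Scanpath Saliency (NSS), one of the metrics reported by saliency benchmarks such as the MIT benchmark. Representing the map as a nested list and fixations as (row, column) pixel indices is a simplifying assumption for illustration.

```python
import statistics

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency.

    saliency_map: 2D list of predicted saliency values.
    fixations: list of (row, col) pixel positions fixated by observers.
    Returns the mean z-scored saliency at the fixated pixels; values near 0
    correspond to chance, higher values to better predictions.
    """
    values = [v for row in saliency_map for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return statistics.fmean((saliency_map[r][c] - mean) / std for r, c in fixations)

# A toy 3x3 prediction with a single salient center pixel.
prediction = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(nss(prediction, [(1, 1)]))  # fixation on the predicted peak: positive
print(nss(prediction, [(0, 0)]))  # fixation elsewhere: negative
```

Because each map is z-scored first, NSS is insensitive to the overall scale of a model's output, which makes scores comparable across saliency models.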
The MIT saliency benchmark [55] hosts a collection of saliency benchmarks and saliency-related datasets with eye-tracking data captured by different commercial off-the-shelf eye trackers. For example, EyeTrackUAV [56] collects raw gaze data and fixation and saccade events from fourteen (14) participants using an EyeLink 1000 Plus while they watch UAV (unmanned aerial vehicle) videos as visual stimuli, in order to study their visual behavior. EMOd (EMotional attention dataset) [57] was created to help investigate how image sentiment influences human attention and visual perception. TurkerGaze [58] builds a webcam-based system for crowdsourced eye-tracking data collection from 200 participants across the United States via Amazon Mechanical Turk. Wandering eyes [59] collects and studies 77 students' eye-tracking data, recorded with an Eye Tribe tracker in a laboratory environment, to examine the eye movement patterns of mind wandering and investigate how mind wandering affects attention during video lectures. A detailed overview of these datasets is available in [54] and [55]. Due to space constraints, we focus here only on datasets collected with mobile or cost-effective eye trackers that can be easily deployed in daily life for face-to-face, online, and mobile learning.
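Datasets such as EyeTrackUAV distinguish fixation and saccade events in the raw gaze stream. The sketch below shows one widely used way of deriving such events, velocity-threshold identification (I-VT). It is not the EyeLink event parser, and the 30 deg/s threshold, the (time, x, y) sample format in degrees, and the function names are assumptions for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Fixation:
    start: float  # time of the first sample in the fixation (s)
    end: float    # time of the last sample in the fixation (s)
    x: float      # centroid, degrees of visual angle
    y: float

def _to_fixation(group):
    times, xs, ys = zip(*group)
    return Fixation(times[0], times[-1], sum(xs) / len(xs), sum(ys) / len(ys))

def ivt(samples, threshold_deg_s=30.0):
    """Velocity-threshold identification (I-VT).

    samples: list of (t_seconds, x_deg, y_deg) with strictly increasing times.
    Samples whose point-to-point velocity exceeds the threshold are treated as
    saccadic; consecutive sub-threshold samples are merged into fixations.
    """
    if len(samples) < 2:
        return []
    velocities = [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:])
    ]
    velocities.insert(0, velocities[0])  # first sample inherits the first velocity
    fixations, group = [], []
    for sample, velocity in zip(samples, velocities):
        if velocity < threshold_deg_s:
            group.append(sample)
        elif group:
            fixations.append(_to_fixation(group))
            group = []
    if group:
        fixations.append(_to_fixation(group))
    return fixations

# Toy 100 Hz recording: a fixation at x=0, a 10 degree saccade, a fixation at x=10.
gaze = [(i / 100, 0.0 if i < 5 else 10.0, 0.0) for i in range(10)]
for fixation in ivt(gaze):
    print(fixation)
```

Practical pipelines usually add a minimum fixation duration and some smoothing of noisy samples; both are omitted here to keep the core idea visible.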
The PoG [60] dataset comprises the 3D coordinates of the display and the Mobile-eye tracker, predefined target pixel locations, and eye-tracking videos containing the eye movements of 20 subjects sitting in front of the display and looking at the target pixels. It is an early eye-tracking dataset for evaluating point-of-gaze detection algorithms, but it is severely limited with respect to variability in body and head pose, illumination, and visual stimulus. In [61], eight (8) color cameras are fastened on a display to capture images from different views in a synchronized manner. The images are then manually annotated with facial landmarks to define head pose, reconstruct the 3D shape of eye regions, and synthesize more eye images for person- and pose-independent 3D gaze estimation. The UT multi-view gaze dataset contains all of these data, including raw multi-view eye images, 3D eye shape models with annotations, and the synthesized eye images based on the 3D models.
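3D gaze-estimation datasets of this kind are usually evaluated by the angle between the estimated and the ground-truth gaze directions. The following sketch computes that angular error from 3D vectors, assuming the eye center and the target (for example, a target pixel mapped to 3D display coordinates) are expressed in the same coordinate frame; the function names and geometry are illustrative.

```python
import math

def angular_error_deg(estimated, ground_truth):
    """Angle in degrees between two 3D gaze direction vectors."""
    dot = sum(a * b for a, b in zip(estimated, ground_truth))
    norms = math.hypot(*estimated) * math.hypot(*ground_truth)
    cosine = max(-1.0, min(1.0, dot / norms))  # clamp rounding noise before acos
    return math.degrees(math.acos(cosine))

def gaze_direction(eye_center, target):
    """Direction from an eye center to a 3D target point (same coordinate frame)."""
    return tuple(t - e for t, e in zip(target, eye_center))

# Hypothetical geometry in millimetres: eye at the origin, target 600 mm ahead.
true_dir = gaze_direction((0.0, 0.0, 0.0), (0.0, 0.0, 600.0))
estimated_dir = (0.0, 10.0, 600.0)
print(f"{angular_error_deg(estimated_dir, true_dir):.2f} degrees")
```

Reporting error as an angle rather than as an on-screen distance keeps results comparable across displays and viewing distances.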
Recent years have seen an explosion in the use of self-developed mobile eye trackers and smart portable devices as hardware for pervasive eye tracking in daily-life settings. The main characteristics of this type of hardware are its image acquisition, data processing, and computing capabilities. These characteristics facilitate the rapid prototyping of new eye-tracking configurations, algorithms, and applications in classrooms without the need to purchase (or repurchase) eye-tracking systems or to constrain user movement and illumination. MPIIEgoFixation [62], LPW (Labelled Pupils in the Wild) [63], GazeSim [64], and MPIIMobileAttention [66] were collected using Pupil, a mobile eye tracker [38] whose design offers freedom in selecting cameras and additional parts. MPIIEgoFixation [62] collects data while participants walk around, in the presence of head movements and scene dynamics. Five (5) participants took part in the experiments, yielding over 2,300 fixations and 40,000 frames with fine-grained annotations for evaluating the proposed fixation detection method, which is based on the visual similarity of gaze targets.
LPW [63] contains 66 eye-region videos from 22 participants, recorded with a head-mounted eye tracker in indoor and outdoor environments. Participants were allowed to wear glasses, contact lenses, and make-up, which makes LPW useful for evaluating robust pupil detection, as PuRe [67] demonstrates. GazeSim [68] is provided for studying 2D-to-3D gaze estimation. Fourteen participants took part in its data collection and stood at five different distances from the display; each distance was recorded as depth information for 2D-to-3D mapping. MPIIMobileAttention [66] contains annotated videos recorded from 20 participants engaged in common activities (such as reading books, using mobile phones, working on computers, and walking) in indoor and outdoor environments. InvisibleEye [65] builds a mobile eye tracker with three low-resolution cameras embedded in a normal eyeglass frame. The basic idea is to use low-resolution eye images captured from different views as input to train a neural network for learning-based gaze estimation. The dataset contains 200,000 synthesized eye images, 280,000 real eye images from 17 participants collected in a laboratory setting during calibration, and 240,000 eye images from 4 participants looking at targets from different angles in a mobile setting.
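To illustrate how a recorded viewing distance can support 2D-to-3D mapping, as in GazeSim, the sketch below lifts a 2D on-screen target into a 3D point in a viewer-centered frame. It assumes a flat display perpendicular to the viewer's line of sight, centered on the viewer, with a known pixel pitch. These are simplifying assumptions for illustration, not the dataset's documented procedure, and the function names are hypothetical.

```python
import math

def screen_point_to_3d(u_px, v_px, distance_mm, width_px, height_px, pitch_mm):
    """Map a display pixel to a 3D point (mm) in a viewer-centred frame.

    Origin at the viewer's eye, +z toward the display centre, +x to the right,
    +y downward (matching pixel coordinates). The recorded viewing distance
    supplies the depth that a 2D pixel location lacks.
    """
    x = (u_px - width_px / 2.0) * pitch_mm
    y = (v_px - height_px / 2.0) * pitch_mm
    return (x, y, distance_mm)

def unit(vector):
    """Normalize a 3D vector to unit length to obtain a gaze direction."""
    length = math.hypot(*vector)
    return tuple(c / length for c in vector)

# Hypothetical 1920x1080 display with 0.25 mm pixels, viewed from 1,000 mm and 2,000 mm.
for distance in (1000.0, 2000.0):
    target = screen_point_to_3d(1920, 540, distance, 1920, 1080, 0.25)
    print(distance, target, unit(target))
```

The same pixel yields a different 3D gaze direction at each distance, which is why recording depth matters when 2D annotations are used to supervise 3D gaze estimation.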
Several other datasets [40], [41], [51] are recorded and published using modern portable commodity hardware such as laptops, tablets, and smartphones in real-world settings. MPIIGaze [51] includes 213,000 images from 15 participants who used laptops with a front-facing camera in their daily lives over several months, and offers 37,667 images with manually annotated facial landmarks. The data in MPIIGaze cover a wider range of backgrounds, illumination, eye appearance, and poses. GazeCapture [41] is a large-scale public dataset containing data from a total of 1,474 participants using mobile phones and tablets, collected via crowdsourcing. It was used to train iTracker, which we analyzed in this study, for real-time robust eye tracking and gaze prediction on mobile devices. TabletGaze [40] contains 816 videos, timestamps, and the locations of 5 × 7 predefined gaze points. During data collection, 51 participants adopted four kinds of body posture (standing, sitting, slouching, and lying) while holding a tablet in an unconstrained mobile environment. They could wear glasses if necessary and had different head poses and eye appearances while looking at predefined locations on the tablet screen.
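For screen-based datasets with a grid of predefined gaze points, such as TabletGaze, errors can be expressed as on-screen distances. The sketch below lays out an evenly spaced target grid and computes the mean Euclidean error of a predictor. The screen size, margin, row/column orientation of the 5 × 7 grid, and function names are assumptions for illustration.

```python
import math

def target_grid(width_cm, height_cm, rows=5, cols=7, margin_cm=1.0):
    """Evenly spaced rows x cols gaze targets (cm from the top-left corner)."""
    xs = [margin_cm + i * (width_cm - 2 * margin_cm) / (cols - 1) for i in range(cols)]
    ys = [margin_cm + j * (height_cm - 2 * margin_cm) / (rows - 1) for j in range(rows)]
    return [(x, y) for y in ys for x in xs]

def mean_error_cm(predictions, targets):
    """Mean Euclidean distance between predicted and true on-screen gaze points."""
    if len(predictions) != len(targets):
        raise ValueError("one prediction is needed per target")
    return sum(math.dist(p, t) for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical 24 x 17 cm tablet screen and a predictor always off by (0.3, 0.4) cm.
targets = target_grid(24.0, 17.0)
predictions = [(x + 0.3, y + 0.4) for x, y in targets]
print(len(targets), round(mean_error_cm(predictions, targets), 3))
```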
We argue that open-source projects and datasets are uniquely able to support both commercial and low-cost eye trackers, let researchers easily adapt experiments to specific scenarios, quickly prototype ideas, and enable research and applications without major restrictions. Developing and maintaining open-source projects and datasets can help build and support a community of students, educators, researchers, and developers. We include the most recent available datasets in this study in an attempt to assist future studies with more insightful and precise guidance, adding a new dimension beyond the depth and breadth of the review. However, a number of open-source projects and datasets (such as OGAMA, OpenEyes, EyeTab, and EyeRecToo) are no longer maintained by their original developers. More effort and contributions from the community are greatly needed to label, format, and pre-process data so that datasets can be easily downloaded and used by other studies.
Moreover, data mining strategies and associated tools also need substantial research attention. The goal is to facilitate the integration and extension of existing research studies and to promote collaboration, idea sharing, and innovation. As there is still no widely accepted open-source library or toolkit for convenient data processing and mining, current strategies [16], [17], [69]-[75] mainly deal with disparate data sources and very specific situations. This makes it very difficult to understand student learning and develop practical teaching and learning strategies, especially for interested novices and interdisciplinary researchers. We expect that teaching and learning hardware, software, and datasets will be pervasively used in daily learning activities and incorporated into the next generation of education and human-computer interaction.

IV. MEASURES, LEARNING ANALYTICS, AND INTELLIGENCE
In this section, we review studies that have used the collected data to characterize, study, and improve the learning process. We organize the discussion by the general measures or approaches that researchers have used in applying data science, learning analytics, and machine learning (see Table 3).

A. MEASURING ATTENTION AND MIND WANDERING
When monitoring a learning task, how well students are paying attention to the materials is often a good indication of whether or not they are comprehending the materials. Several researchers have attempted to gather information about the attention of students during a learning task, and apply data analytics to build models that can automate the classification and detection of attention and inattention while a learning task is being conducted. Faber et al. [78], [86] have applied the idea to classifying and modeling mind wandering during comprehension tasks where the goal is to study and learn a body of materials.
Participants could self-report whenever they became aware that they had not comprehended a previous part of what they had been reading. Explicit probes were also given at times, in which the student had to state either that they were currently paying attention or that their mind was wandering. This type of self-report and explicit probing can alter the attention vs. inattention labels themselves, simply because the act makes participants more meta-cognitively aware of their focus of attention. However, it does provide a set of labeled eye gaze data in which time periods or states of inattention can be associated with eye gaze behavior. The authors applied several standard machine learning classifiers, such as simple logistic regression, Bayes nets, SVM, and decision trees, to build a binary classifier that takes summary eye gaze data and predicts whether the student was attentive or mind wandering during a particular segment. They showed that it was possible to build a model of attention that could potentially be applied to an automated system capable of detecting and classifying when students' minds are wandering during a learning task.
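The classification step can be illustrated with a minimal logistic-regression sketch, logistic regression being one of the several classifiers named above. The two summary features, the toy values, and the labels are invented for illustration; the original studies used their own feature sets and toolkits.

```python
import math
import statistics

def fit_scaler(rows):
    """Per-feature mean and standard deviation for z-score standardization."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant features
    return means, stds

def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.5, epochs=1000):
    """Binary logistic regression fitted by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, x):
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)

# Made-up segment features: [mean fixation duration (s), fixations per second].
# Labels: 0 = attentive, 1 = mind wandering (self-caught or probe-caught).
X_raw = [[0.20, 3.5], [0.25, 3.2], [0.22, 3.8], [0.45, 1.8], [0.50, 1.5], [0.48, 2.0]]
y = [0, 0, 0, 1, 1, 1]
means, stds = fit_scaler(X_raw)
w, b = train_logistic([scale(r, means, stds) for r in X_raw], y)
print(predict(w, b, scale([0.47, 1.7], means, stds)))
```

Standardizing features before gradient descent keeps features on different scales from dominating the updates; a real evaluation would also use held-out segments or participants rather than the training data.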
In another study, Gomes et al. [73], [87] captured eye-tracking data from high school students working on a task designed to teach basic engineering skills. Unsupervised clustering was used to group the students' eye data based on their gaze patterns. When compared with post-task evaluation metrics, the highest-performing students tended to cluster together with similar gaze patterns, whereas lower-performing students fell into different clusters of gaze behavior identifiable from the eye-tracking data.
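Gaze-pattern clustering of this kind is often done with k-means. The sketch below uses plain Lloyd iterations with a deterministic farthest-point initialization. The per-student features (share of gaze time on the instructions and on the task workspace) are hypothetical, and features on different scales should be standardized first.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            # Keep the old centroid if a cluster becomes empty.
            new_centroids.append(
                tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[j]
            )
        if new_centroids == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical per-student gaze profiles: (share on instructions, share on workspace).
gaze_profiles = [(0.10, 0.80), (0.15, 0.75), (0.12, 0.78), (0.60, 0.30), (0.65, 0.25), (0.62, 0.28)]
labels, centroids = kmeans(gaze_profiles, 2)
print(labels)
```

The resulting cluster labels can then be cross-tabulated with post-task scores, which is the kind of comparison described in the study above.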
Raca & Dillenbourg [71] demonstrated a more general system for detecting the attention of a whole group of students in a classroom. In this system, regular videos were captured, and video processing techniques were used to estimate the focus of attention of individuals detected within the cameras' view. A rough estimate of gaze direction could be calculated for each individual detected in the camera's field of view. Other measures extracted from the camera data included slide changes, slide durations, and the times when students asked questions or held discussions, among other properties. In addition, some self-report data on attention levels were collected from the students, sometimes during class and sometimes in questionnaires afterward. These researchers later refined their system with more sensors and capabilities [47]. In general, this work demonstrated the potential of non-intrusive attention measures for building learning management systems that give instructors real-time classroom feedback in more traditional lecture settings.

B. PREDICTING COGNITIVE ABILITIES
Another useful approach is to use eye-tracking data to directly predict the cognitive ability or expertise level of the student or user of the learning system [16], [17], [70]. This is related to the concept of user modeling in user interface design, but here researchers are interested in modeling more fundamental cognitive abilities, such as working memory capacity. In general, such a model of users' cognitive abilities might be useful for providing different content or materials that are known to be more effective for a user with, for example, a stronger preference for visual or verbal materials.
For example, [70] reports using eye-tracking data to predict basic cognitive abilities such as perceptual speed and visual and spatial working memory capacity. Several well-established instruments are available in psychology for measuring individuals along these dimensions, so the goal is again to see whether standard machine learning and modeling techniques can classify subjects into the correct level of a particular cognitive ability based on gathered eye-tracking data. In the study, Conati et al. gathered eye-tracking data from people using an interface designed to learn about and make a decision in a target problem domain. After the task, they also applied standard tests to obtain each subject's scores on five separate cognitive measures. Basic machine learning classifiers were again tested, and it was shown that they can potentially predict these cognitive measures automatically from eye-tracking data alone, with enough accuracy to serve as user models that help design better decision-making systems.
As another example, some researchers are interested in predicting the performance levels of subjects from sensor data. Blikstein [16] used eye-tracking data to perform a classification task on labels generated from behavioral codes based on think-aloud protocols. In this research, a simple puzzle (an 8-tile puzzle game) was used as the task. Another measure used to build machine learning predictors was a post-hoc assessment of each solver's skill level based on their time to solve the puzzle. The researchers tried several approaches to modeling subject skill levels. In one approach, subjects were divided into high, medium, and low performers based on their skill at the task, and a classifier was trained to predict the skill level from the eye-tracking data gathered while solving the puzzle. The model worked well enough to roughly identify skill level from eye-tracking data on this puzzle task.
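Turning a continuous performance measure into the high, medium, and low labels described above can be done by splitting subjects into terciles of solving time. The sketch assumes that faster solving means higher skill; the function name and solving times are made up.

```python
def tercile_labels(solve_times):
    """Label subjects high/medium/low by tercile of solving time (faster = higher)."""
    n = len(solve_times)
    order = sorted(range(n), key=lambda i: solve_times[i])
    labels = [None] * n
    for rank, subject in enumerate(order):
        labels[subject] = ("high", "medium", "low")[rank * 3 // n]
    return labels

# Made-up solving times in seconds for six puzzle solvers.
print(tercile_labels([40, 95, 60, 120, 30, 75]))
# ['high', 'low', 'medium', 'low', 'high', 'medium']
```

These labels would then serve as the targets for a classifier trained on eye-tracking features gathered during the task.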
In [69], the authors used gathered eye-tracking data to model behavioral differences between novice and expert map users. They showed that there were statistical differences in various derived measures of where and how novice and expert map readers used a program to search for and answer questions about map information. Real-time prediction of user skill level from task-based eye-tracking or other sensor data could be a very powerful tool. Open questions remain as to whether general cognitive abilities are sufficient for building learning systems that are more responsive to learners' abilities, or whether more specific models need to be developed for the particular task or subject domain. In other words, could a general eye-tracking model be developed that is applicable to many learning domains to detect and classify user skill level while learning, or do models need to be built for each individual learning domain and learning system?
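The specific statistical tests used in [69] are not described here. As a generic illustration of testing a novice-expert difference in one derived gaze measure, the sketch below runs an exact two-sided permutation test on a difference in means. The measure and values are made up, and exact enumeration is only practical for small groups.

```python
from itertools import combinations
from statistics import fmean

def permutation_test(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Enumerates every relabelling of the pooled data, so it is only practical
    for small samples; larger studies would use random resampling instead.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n = len(group_a), len(pooled)
    observed = abs(fmean(group_a) - fmean(group_b))
    total = sum(pooled)
    extreme = count = 0
    for idx in combinations(range(n), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / (n - n_a))
        count += 1
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return observed, extreme / count

# Made-up mean fixation durations (ms) on the map for novice and expert users.
novices = [310, 290, 335, 320]
experts = [240, 255, 230, 260]
print(permutation_test(novices, experts))
```

With four users per group there are 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70, a reminder that very small samples limit how strong such evidence can be.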

C. ANALYZING GROUP PERFORMANCE AND DYNAMICS
Another promising use of eye tracking and multimodal analytics is studying the group dynamics and performance of teams in active learning and hands-on group training [71], [72], [74], [75]. One measure that has often been found useful is the degree of synchronization of activity and behavior among team members performing the learning task. The overall approach is to analyze the synchronization of two or more sets of eye-tracking data and compute the number of times a pair or team achieves joint visual attention (JVA). JVA has been studied in many contexts and disciplines and is known to be a strong predictor of the quality of a group's interaction and success. For example, [71] has shown that in pair programming, good programmers tend to have higher JVA than less proficient programmers. There is also some evidence that JVA can be deliberately encouraged in groups to improve their performance [74].
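Counting JVA episodes can be sketched as follows, assuming both gaze streams are already time-aligned, sampled at the same rate, and mapped to a shared screen or scene coordinate frame. The function name, radius, and minimum episode length are hypothetical parameters; many JVA analyses also tolerate a short lag between partners' gazes, which this simplified version omits.

```python
import math

def count_jva_episodes(gaze_a, gaze_b, radius=70.0, min_samples=3):
    """Count episodes of joint visual attention (JVA) between two partners.

    gaze_a, gaze_b: equal-length lists of (x, y) gaze points, one per shared
    timestamp, in a common coordinate frame. An episode is a run of at least
    min_samples consecutive samples in which the partners' gaze points lie
    within `radius` of each other.
    """
    if len(gaze_a) != len(gaze_b):
        raise ValueError("gaze streams must be time-aligned and of equal length")
    episodes = run = 0
    for point_a, point_b in zip(gaze_a, gaze_b):
        if math.dist(point_a, point_b) <= radius:
            run += 1
            continue
        if run >= min_samples:
            episodes += 1
        run = 0
    if run >= min_samples:  # close an episode that lasts until the end
        episodes += 1
    return episodes

# Hypothetical pixel coordinates for two partners looking at a shared screen.
partner_a = [(100, 100)] * 8
partner_b = [(110, 100), (120, 105), (100, 90), (500, 500), (105, 100), (600, 0), (100, 100), (101, 101)]
print(count_jva_episodes(partner_a, partner_b, radius=70.0, min_samples=2))
```

Raising `min_samples` filters out brief coincidental overlaps, so the episode count depends on both parameters and they should be reported alongside any JVA result.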
Spikol et al. [75] focused on project-based learning in a group setting, asking which features of a student group are good predictors of team success. A truly multimodal set of sensor data was gathered from teams working on engineering training tasks, including data from cameras and microphones, sensors embedded in components used in the learning task, and mobile computing devices. One of the key predictors came from image processing that detected team members' focus of attention on each other, or their shared focus of attention on learning artifacts. Shared focus and the amount of interaction between team members turned out to be important features that machine learning classifiers could use to build predictive models of team performance from the gathered multimodal sensory data. Huang et al. [88] proposed collecting eye-tracking, body-posture, and motion data and using k-means clustering to recognize collaborative learning states while students programmed robots to solve maze problems. The results indicated that learning was significantly and negatively correlated with the probability of remaining in the non-collaborative state; in other words, more collaboration among students was associated with greater learning gains.
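The clustering step and the "probability of remaining in a state" can be illustrated with a minimal sketch: plain Lloyd's k-means over per-time-window feature vectors, followed by a first-order count of how often a window in a given cluster is followed by another window in the same cluster. This is an assumption-laden illustration, not the pipeline of [88]; the feature vectors are hypothetical, and deciding which cluster corresponds to the "non-collaborative" state still requires human interpretation.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]

    def nearest(p: Vector) -> int:
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [nearest(p) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:
            break  # converged
        centroids = updated
    return labels


def self_transition_prob(labels: Sequence[int], state: int) -> float:
    """Estimate P(next window is `state` | current window is `state`)."""
    stays = leaves = 0
    for cur, nxt in zip(labels, labels[1:]):
        if cur == state:
            if nxt == state:
                stays += 1
            else:
                leaves += 1
    total = stays + leaves
    return stays / total if total else 0.0
```

Each point might, for example, hold a window's shared-gaze fraction and a motion-energy score; per-group self-transition probabilities could then be correlated with learning gains.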

D. PREDICTING USER INTENTIONS AND GOALS
In many types of systems, especially collaborative ones, predicting user intention helps mediate better interactions. For example, in a collaborative customer-worker system, [89] showed that support vector machine (SVM) classifiers can accurately predict users' requests before they make them. [85] presented a study with 40 university students watching massive open online course (MOOC) lecture videos, in which Gaussian process models (GPM), SVMs, and generalized additive models (GAM) were used to predict student motivation and learning gain from content coverage, AOI, and perceptual and conceptual attention measures.
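To show what an SVM intention classifier optimizes, the sketch below trains a linear SVM with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss. It is a teaching sketch, not the classifier of [89] or [85]: the gaze features (for instance, dwell time on a candidate target and fixation count) are hypothetical, labels are assumed to be +1 (intends the request) or -1, and in practice one would more likely use an established library such as scikit-learn.

```python
import random
from typing import List, Sequence


def _augment(x: Sequence[float]) -> List[float]:
    return list(x) + [1.0]  # constant feature so the last weight acts as the bias


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 100, seed: int = 0) -> List[float]:
    """Pegasos-style SGD on the L2-regularized hinge loss; labels must be +1/-1."""
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, _augment(x))) >= 0.0 else -1
```

Because the constant feature is regularized together with the other weights, the bias is slightly shrunk; this is a common simplification in Pegasos-style training.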
In general, predicting user intentions and goals is very valuable in the context of usability and user interface (UI) design. The general paradigm is to capture and study eye-gaze behavior while users complete a task with a user interface, and to measure how easily or with what difficulty they complete it. For usability testing, the resulting model of gaze behavior can then be used to distinguish good, usable interfaces from poor ones [90].
For learning analytics, the opposite application is often more useful: we would like to infer the learner's current goal or intention from the eye-gaze or other sensor data currently being gathered. This information would allow a system to determine whether students are on task with respect to what the lesson currently expects of them, or have drifted to a tangential task that is unlikely to help them complete the current goal.
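A simple proxy for "on task" is the share of fixation time that falls inside task-relevant areas of interest (AOIs). The sketch below assumes rectangular AOIs that an instructor has labelled as task-relevant or not, and fixations given as (duration in ms, x, y); the `AOI` class and `on_task_ratio` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class AOI:
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    task_relevant: bool

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def on_task_ratio(fixations: Iterable[Tuple[float, float, float]],
                  aois: Sequence[AOI]) -> float:
    """Share of total fixation time spent inside task-relevant AOIs."""
    total = relevant = 0.0
    for duration, x, y in fixations:
        total += duration
        if any(a.task_relevant and a.contains(x, y) for a in aois):
            relevant += duration
    return relevant / total if total else 0.0
```

For example, with a task-relevant "problem" region and an irrelevant "ad" region, 300 ms on the problem, 100 ms on the ad, and 100 ms elsewhere yields a ratio of 0.6. Inferring richer goals would require models beyond this rule.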

E. EVALUATING USER DEFICIENCIES AND USER MODELS OF SUBJECT CONTENT
If we can more accurately model and predict students' skill levels from measures gathered during their learning sessions, this information can be used to directly influence learning activities and steer students towards tasks or concepts appropriate to their current level of mastery. Studies of this type focus on domain knowledge in a subject area. Building a model of the content area (such as a concept map) and estimating which concepts students have mastered, and which they are deficient in, is therefore useful in planning appropriate study activities.
In the field of intelligent tutoring systems, learner modeling is the area in which researchers attempt to model the domain knowledge (and deficiencies) of a learner in order to better guide the learning session [91]. For example, [92] describes work on leveraging eye-tracking data to improve student modeling. Most of these efforts build classifiers of eye-gaze patterns from data in which students clearly were deficient in, or did not comprehend, a needed concept at a given moment. While much of this research has demonstrated the potential advantages of modeling high-level student understanding of concepts, it is still very difficult to integrate this type of modeling into working learning systems.
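As a concrete reference point for learner modeling, the sketch below implements Bayesian Knowledge Tracing (BKT), a widely used learner model from the intelligent-tutoring literature that tracks the probability that a student has mastered one skill. It is named here as an illustration; the survey does not state that [91] or [92] use BKT, and the parameter values below are hypothetical defaults, not fitted values.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class BKTParams:
    p_init: float = 0.2    # P(L0): skill already known before practice
    p_learn: float = 0.15  # P(T): unknown -> known after one practice opportunity
    p_slip: float = 0.1    # P(S): wrong answer although the skill is known
    p_guess: float = 0.2   # P(G): right answer although the skill is unknown


def bkt_update(p_known: float, correct: bool, params: BKTParams) -> float:
    """Bayesian posterior given one response, followed by the learning transition."""
    if correct:
        num = p_known * (1 - params.p_slip)
        den = num + (1 - p_known) * params.p_guess
    else:
        num = p_known * params.p_slip
        den = num + (1 - p_known) * (1 - params.p_guess)
    posterior = num / den
    return posterior + (1 - posterior) * params.p_learn


def trace(responses: Sequence[bool], params: Optional[BKTParams] = None) -> List[float]:
    """Mastery estimate after each observed response."""
    params = params or BKTParams()
    p = params.p_init
    history = []
    for correct in responses:
        p = bkt_update(p, correct, params)
        history.append(p)
    return history
```

With these defaults, one correct answer raises the mastery estimate from 0.2 to 0.6. Gaze-based evidence, as explored in [92], could in principle supplement the correctness signal, but how to do so is left open here.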

V. OPEN CHALLENGES AND FUTURE WORK
This section presents a range of challenging open issues and future research directions. As mentioned in the previous sections, data gathered during learning can be used, with data science and machine learning, to predict mind wandering, build profiles of users' cognitive processing, and assess the dynamics of group collaboration in problem solving. Despite these advantages, new and emerging challenges must be solved to enable pervasive and proactive learning with more interactive, personalized, and engaging learning systems. The following discussion is intended to serve as a guide for future studies.

A. COLLABORATIVE SENSING AND PERCEPTION
Instead of single-subject eye tracking, a collaborative sensing and networked learning system is expected to become a fundamental basis for facilitating face-to-face teaching and learning, making eye tracking, attention measurement, intention prediction, and group cooperation and performance analysis easy tasks for instructors. Such a system can comprise many multimodal sensory elements, including eye-tracking devices (e.g., wearable eye trackers), desktops and laptops, microphones, tablets and digital pens, world cameras, a control terminal/station, and a server farm. Most elements are deployed in the classroom and connected to the server farm via a wired or wireless network. During lectures, the server gathers and analyzes real-time data streams from the eye trackers, microphones, tablets, and world cameras, and delivers learning-analytics feedback, active monitoring, and intervention support to the instructor at the control terminal/station. The more devices cooperate, the more information can be collected to enable collaborative sensing and perception. However, the system also becomes more complex, with higher construction costs and maintenance overhead.
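One core task of such a server is to interleave the per-device streams in time order and summarize them over short windows. The sketch below assumes each device emits events as (timestamp in seconds, device id, payload), already sorted by time and stamped against a shared, synchronized clock; the device ids and the one-second window are hypothetical. It uses the standard library's `heapq.merge` with a `key` function.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[float, str, Dict[str, Any]]  # (timestamp_s, device_id, payload)


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily interleave per-device streams (each already time-sorted) by timestamp."""
    return heapq.merge(*streams, key=lambda event: event[0])


def windowed_counts(events: Iterable[Event], window_s: float = 1.0) -> Dict[int, Dict[str, int]]:
    """Count events per device in fixed, non-overlapping time windows."""
    counts: Dict[int, Dict[str, int]] = {}
    for ts, device, _payload in events:
        window = int(ts // window_s)
        per_device = counts.setdefault(window, {})
        per_device[device] = per_device.get(device, 0) + 1
    return counts
```

Because `heapq.merge` is lazy, the same code works on live generators as well as on recorded lists; a production system would additionally need buffering for late-arriving events.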

B. DATA SHARING AND MINING
One of the main barriers to the widespread adoption of eye tracking and MMLA technology is data collection and sharing. In essence, the multimodal data collected during teaching and learning from different media and sensors (camera, computer, tablet, digital pen) should come with ground truth and detailed annotations against which eye-tracking devices, models, and techniques can be evaluated. With such data available, any interested educator or researcher in the community can readily start studying how to improve students' learning, refine day-to-day teaching activities, and promote teachers' professional development in classrooms or online learning. Nevertheless, collecting and curating such data is costly. While there is a wealth of open-access datasets [40], [41], [51], [56]-[66], [93], [94], the situation remains unsatisfactory, mainly because most of these datasets are model- or application-oriented, asynchronous, or uncorrelated across modalities (audio, video, eye tracking, gestures, and digital interaction logs). Some datasets, such as PoG [60], the UT Multi-view Gaze dataset [61], TurkerGaze [58], and MPIIEgoFixation [41], have even become inadequate due to a lack of high-fidelity annotations and of periodic follow-up management and maintenance. Building well-annotated, synchronized, and maintained multimodal datasets is therefore one of the promising paths for future work.
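When modalities are recorded asynchronously, a common first step toward a correlated dataset is nearest-timestamp alignment of each modality onto a reference clock. The sketch below assumes all timestamps share one clock and that each modality's samples are sorted by time; the function name and the tolerance value are hypothetical. Samples with no neighbor within the tolerance are marked as missing (`None`) rather than silently filled.

```python
import bisect
from typing import Any, List, Optional, Sequence, Tuple


def align_nearest(ref_times: Sequence[float], other: Sequence[Tuple[float, Any]],
                  tolerance: float) -> List[Optional[Any]]:
    """For each reference timestamp, pick the value of the closest `other` sample
    (sorted by time) if it lies within `tolerance` seconds; otherwise None."""
    times = [t for t, _ in other]
    aligned: List[Optional[Any]] = []
    for t in ref_times:
        i = bisect.bisect_left(times, t)  # first sample at or after t
        best, best_dt = None, tolerance
        for j in (i - 1, i):              # neighbors just before and at/after t
            if 0 <= j < len(times) and abs(times[j] - t) <= best_dt:
                best, best_dt = other[j][1], abs(times[j] - t)
        aligned.append(best)
    return aligned
```

For instance, aligning audio-transcript segments onto eye-tracker frames in this way makes the missing spans explicit, which is the kind of cross-modal annotation the paragraph above finds lacking.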

C. REAL-TIME EYE TRACKING ANALYTICS
Over the past five years, as instructors, we have observed that students, particularly beginners, often give up prematurely after encountering initial, sometimes even minor, problems on a learning task, wrongly concluding that they lack aptitude for the task area. Advances in real-time eye-tracking analytics make it possible to detect when students are not progressing sufficiently on a problem, are sidetracked, or focus on irrelevant details. Every student also has a unique educational background, learning capability, and set of objectives, and deserves the opportunity to learn in their own way. Although experienced teachers may readily notice and manage student distractions in the classroom, this does not mean that they understand every student and his or her individual learning difficulties. Inappropriate instruction can cause students considerable anxiety and frustration. There is therefore a pressing need for real-time eye-tracking analytics that help teachers learn more about each student and discover effective ways to engage both individual students and the entire class in active learning in real time. In addition, appropriate learning support should be provided so that students can make progress and gain confidence along their own learning pathways.
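A real-time detector of a sidetracked student can be as simple as a rolling window over a per-sample on-task flag (for example, whether the current fixation falls in a task-relevant AOI). The sketch below is hypothetical: the class name, the 300-sample window, and the 0.5 threshold are illustrative assumptions, and each update runs in constant time so that it can keep up with a live stream.

```python
from collections import deque
from typing import Deque


class OffTaskMonitor:
    """Alerts when the share of on-task samples in the last `window` samples
    drops below `threshold`. Window size and threshold are hypothetical."""

    def __init__(self, window: int = 300, threshold: float = 0.5) -> None:
        self.window = window
        self.threshold = threshold
        self.buffer: Deque[bool] = deque(maxlen=window)
        self.on_count = 0

    def update(self, on_task: bool) -> bool:
        """Add one sample; return True if an off-task alert should be raised."""
        if len(self.buffer) == self.window:
            self.on_count -= self.buffer[0]  # this sample is about to be evicted
        self.buffer.append(on_task)          # deque(maxlen=...) drops the oldest item
        self.on_count += on_task
        full = len(self.buffer) == self.window
        return full and self.on_count / self.window < self.threshold
```

Alerts are suppressed until the window is full, which avoids spurious warnings at the start of a session; a deployed system would likely also add hysteresis so that alerts do not flicker.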

D. TEACHING PRACTICE EVALUATION AND ENHANCEMENT
It is very important to conduct appropriate teaching activities at the right time to enable effective learning. Assessing and evaluating teaching practices (activities) to measure teaching efficacy and student success, however, is intensely difficult in face-to-face classrooms, where each session is unique and filled with various activities and open-ended tasks [95], [96]. The activities may include lectures, writing, discussion, group work, demos, exercises, games, case studies, and problem-based learning. Teachers may also adapt their teaching activities when different types of multimodal sensory elements, such as computers, laptops, tablets, PDAs, smartphones, and digital pens, are available for assistance. The open-ended tasks could include data formatting, problem definition, solution design, program implementation and debugging, data analysis, and design optimization. Even a slight change in environment or session can lead to significantly different teaching and learning results. Worse still, the number of possible combinations of activities and tasks increases the complexity of evaluating and enhancing teaching practice. This inherent complexity, together with the physical limitations of current sensing and data collection systems, has made it infeasible to evaluate teaching practice in real time and to capture instant formative feedback from students.
Addressing this requires novel approaches, based on fine-grained data collection and analysis, to characterize teaching practices and discover patterns of effective teaching in face-to-face classrooms. Both traditional and innovative teaching practices now rely more than ever on highly complex, interactive teaching-learning environments that operate beyond the reach of current learning analytics. With rapid advances in the multimodal sensory elements available in these environments, fine-grained data can be collected with little effort, providing new opportunities to develop efficient approaches for characterizing teaching practices and discovering patterns. In particular, students' learning trajectories could be examined at a fine grain throughout specific teaching activities. This would allow teachers to design better teaching activities and learning materials that improve student learning experiences and promote student success.

E. PRIVACY AND UTILITY
Students' privacy is one of the biggest concerns in collecting eye-tracking data and conducting learning data analytics during face-to-face (F2F) learning in a system similar to that shown in Figure 1. Students' interests, the content they read, write, or browse, and the videos they watch on screen can easily expose personal information (e.g., gender, name, address, age, and health status) or reveal political, sexual, cultural, or other personal preferences. Studies [22], [97], [98] have also shown that eye-movement and gaze patterns are distinctive enough to be used for user authentication, much like an iris or fingerprint. A key challenge is to find the right trade-off between privacy and utility. For example, a calibrated amount of random noise can be introduced to hide individual subjects while limiting the loss in the utility of the eye-tracking data and in system performance. We expect that a series of advanced privacy-preserving mechanisms for eye-tracking data could be developed in the near future, for erasing sensitive data from raw datasets, aggregating data, enabling differential privacy, and providing AOI (Areas of Interest) metrics and summary data of eye-movement events.
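Combining AOI summary data with calibrated noise can be sketched with the Laplace mechanism from differential privacy, applied to class-level dwell-time totals per AOI. The key assumption, labelled in the code, is that `l1_sensitivity_ms` bounds how much any single student can change the whole vector of totals (for instance, each student's total dwell time is clipped to that value before summing); the function names are hypothetical. Laplace noise is drawn as the difference of two exponential variates, which uses only the standard library.

```python
import random
from typing import Dict, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_aoi_dwell(dwell_ms: Dict[str, float], l1_sensitivity_ms: float,
                      epsilon: float, seed: Optional[int] = None) -> Dict[str, float]:
    """Release per-AOI dwell-time totals with the Laplace mechanism.

    Assumes `l1_sensitivity_ms` bounds one student's total contribution
    (L1 norm) to the whole vector, e.g. by clipping before aggregation.
    """
    if epsilon <= 0 or l1_sensitivity_ms <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    rng = random.Random(seed)
    scale = l1_sensitivity_ms / epsilon
    # Clamping at zero is post-processing, so it does not weaken the guarantee.
    return {aoi: max(0.0, total + laplace_noise(scale, rng))
            for aoi, total in dwell_ms.items()}
```

Smaller values of `epsilon` give stronger privacy but noisier totals; this is the privacy-utility trade-off described above in concrete form.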

VI. CONCLUSION
In this study, motivated by the common problems and technical constraints faced by both students and teachers in daily learning and teaching activities, we have examined current trends in hardware and research that use collected data for classroom and online learning analytics and prediction. Eye-tracking data has been shown to be an attractive source of information in F2F and group settings, as it provides a window into student attention and allows us to monitor important measures of learners and group dynamics. We have also reviewed different types of eye-tracking and learning systems, software, datasets, and related studies, with particular emphasis on their respective advantages and limitations. Furthermore, we have identified and discussed a series of open issues and challenges that need to be addressed to enable and promote pervasive and proactive learning. These open issues and challenges by no means cover all the problems in previous research and ongoing studies, but we highlight them in this study as promising future research directions.
We expect to see extensive research in this field in the future. Specifically, our findings call for further attention and research on addressing one or more of the open issues and challenges discussed here by drawing on existing pedagogical and human-computer interaction (HCI) theories. Additionally, future research should focus on developing new multi-sensory elements, systems, and tools, as well as learning analytics methods and machine learning and artificial intelligence algorithms, that can be added or removed as functional modules as necessary. Such modules can facilitate F2F and online teaching and learning and make eye tracking, attention measurement, intention prediction, and group cooperation analysis easy tasks. As the field matures, we expect collaborative sensing and networked learning with positive motivation to become increasingly common in pervasive learning.
YUEHUA WANG (Member, IEEE) is currently an Assistant Professor in computer science with Texas A&M University-Commerce. Her research interests include artificial intelligence, human-machine interaction, software programming and engineering, vehicular sensing and control, vehicle communication systems, resource virtualization, cyber-physical systems, autonomous driving, the Internet of Things, and ubiquitous and mobile computing.
SHULAN LU received the Ph.D. degree in cognitive science from The University of Memphis. She is currently a Professor in psychology with Texas A&M University-Commerce. Her research interests include embodied cognition, learning and instruction, human centered computing, and event perception.
DEREK HARTER (Member, IEEE) is currently an Associate Professor of Computer Science at Texas A&M University-Commerce. His primary research interests are in machine learning and deep learning applied to human cognition and instruction. He also has research interests in HCI, virtual reality, and augmented reality.