Developing an Improved ACT-R Model for Pilot Situation Awareness Measurement

Building on a previously developed situation-awareness (SA) model based on attention allocation, and using the ACT-R (Adaptive Control of Thought-Rational) theory to analyze pilot cognitive processes for situation elements, an improved SA mathematical model was developed to predict pilot SA under different display interfaces and missions. An indicator-display monitoring task was performed under different experimental conditions to verify the SA model, and the SA global assessment technique (SAGAT), performance measures, the 10-dimensional SA rating technique (10-D SART), and eye-movement measures were adopted to comprehensively assess the operator's SA. The experimental results revealed that the theoretical prediction values calculated using the improved SA model were strongly correlated with the operation performance, thus supporting the validity of the model. The SAGAT was shown to be a more effective method than SART in this research, and the overall SAGAT accuracy rate, as well as the accuracy response time, are effective indices for SA measurement. Eye-movement indices such as the fixation/saccade ratio, which corresponds to the mode of information perception and extraction, were found to be sensitive to changes in operator SA. The improved SA model advances the prediction and indication of SA from human behaviors, including operation performance, SAGAT response behaviors, and visual behaviors. It thereby provides a new auxiliary tool for the quantitative characterization of pilot SA during cockpit interface design optimization and ergonomic evaluation.


I. INTRODUCTION
The pilot-cockpit system is a typical example of a complex human-in-the-loop system. With the development of integrated information and intelligence in modern aircraft, the information flow carried by the flight display and control system has become increasingly diverse and complex. Whether for routine tasks or under special circumstances, the pilot must perform dynamic and repetitive processes of information perception, comprehensive judgment, and decision-making. While performing these tasks, the pilot must constantly maintain situation awareness (SA), which is one of the most important abilities for ensuring flight safety. Aviation accident investigations have indicated that 51.6% of fatal accidents and 35.1% of non-fatal accidents can be attributed to the failure of decision-making. However, most of these are a result not of decision errors but of SA errors [1]. For example, SA may be lost because of a pilot's overreliance on the automation system, which can cause an aircraft with no mechanical faults to be piloted toward the ground, resulting in a fatal controlled flight into terrain. SA reflects the pilot's cognitive status regarding the various factors and conditions that affect the aircraft and crew members within a specific situation and time period. Research has indicated that a higher pilot SA level enables more rapid and effective operation, which helps ensure flight safety [2], [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Eunil Park.
Among the various SA definitions, Endsley's information process-based three-level model is the most popular. This model describes SA as the perception of task-relevant elements in the environment (the first level of SA, SA1), the comprehension of their meaning in relation to task goals (the second level of SA, SA2), and the projection of their status in the near future (the third level of SA, SA3) [4], [5]. The SA process can be perceived as cumulative, implying that it is difficult for pilots to cognitively reach SA3 before SA1 and SA2 are achieved. Without good interface design, pilots may stop or be interrupted during information acquisition before they reach SA3 [6]. The three-level SA model is important not only because it distinguishes different cognitive processes but also because it suggests that different system designs and training modes are required for dealing with problems at different SA levels. For example, better alarm mechanism design may be needed when there is a problem with SA1, and better preview display design is more suited to address a problem with SA3. Nowadays, SA is an important index for assessing design concepts and the technical effectiveness of complex technology systems, and it can provide valuable diagnostic data for interactive design [7]-[11]. In recent years, numerous studies have been conducted on SA modeling and experimental measurement in the fields of cognitive psychology and ergonomics [12]-[17]. Early research mainly focused on analyzing the formation process of the SA cognitive mechanism and establishing SA qualitative models. Accordingly, the development of SA quantitative prediction models has become an important engineering application target for research.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Compared with the qualitative model, the quantitative model driven by theory can be used for quantitative analysis and calculation, providing SA prediction and ergonomics evaluation functions [18]- [20]. For example, Entin et al. proposed the performance sensitivity model, which suggested that SA insufficiency leads to task performance deterioration, and the total performance deterioration is related to the reduction in the cognitive accuracy of each situation element (SE) [21]. Hooney et al. developed a man-machine integration design and analysis human performance model for the improvement of multi-operator SA prediction. This model computes SA as a ratio of the actual SA (the number of SEs that are detected or comprehended) to the optimal SA (the number of SEs that are required or desired to complete the task). High-fidelity flight simulation experiments under different task conditions (aviate, separate, and navigate) were performed for validating the model, and the results indicated that the pilot SA level is sensitive to the display interface design [22]. Wickens et al. proposed the attention-situation awareness (A-SA) model, which can be divided into two modules. One governs the allocation of attention to events and channels in the environment (corresponding to SA1), and the second draws an inference or understanding of the current (corresponding to SA2) and future (corresponding to SA3) states of the aircraft within this environment. The A-SA model emphasizes the SA errors related to attention resource allocation. According to a review of SA in flight accidents, there is a close relationship between SA errors and inadequate attention [23]. Additionally, Wickens et al. proposed the display formatting and situation awareness model for predicting which type of aviation display interface features are conducive to improving pilot SA. 
This model was validated against data from a high-fidelity synthetic vision simulation, and an analysis revealed that the prediction results were strongly correlated with the multitask flight control performance, as well as the traffic awareness response time and accuracy [24]. Adaptive Control of Thought-Rational (ACT-R), which was originally developed by Anderson, is a theory of human cognition and performance as well as a framework for developing computational models of human behavior [25]- [27]. For example, Byrne et al. constructed an ACT-R-based computational model of a pilot-aircraft-visual scene-taxiway system, which was used for the diagnosis of possible sources of taxi-driver error. To evaluate the impact of Synthetic Vision System technology on pilot performance in commercial aviation, Byrne et al. also developed a pilot-vehicle-airport model combined with ACT-R [28], [29]. By analyzing the theories and research on ACT-R, the driver behavior modeling method in ACT-R was employed by Liu et al. to present a highway overtaking behavior model as an example. Driver behavior prediction and model verification both indicated that driver behavior modeling is a flexible and effective method for ensuring traffic safety [30].
Usually, SA models must be verified by combining task scenario simulation and several measurement methods. According to reviews of SA measurement techniques, subjective measures (e.g., three-dimensional SA rating technique, SA-subjective workload dominance, crew SA, SA supervisory rating form), memory probe measures (e.g., SA global assessment technique), performance measures (e.g., external task performance measures, embedded task performance measures), and physiological measures (e.g., eye-movement tracking, electroencephalograms, event-related potentials, brain mapping, heart rate, electrodermal activity) are all commonly applied [6], [31]-[34]. Using these SA measurements, a large amount of valuable research has been conducted, e.g., comparing the advantages and disadvantages of various SA measurements in different application environments, proposing and testing methods for improving operator SA from different perspectives, using SA as an index for display interface or system design evaluation, and revealing the complex relationships between SA and other factors such as attention allocation, cognitive load, and automation level. For example, Salmon et al. compared two different SA measures (a freeze probe recall approach and a post-trial subjective rating approach) during a military planning task, and the results indicated that only the participant SA scores derived via the freeze probe recall method were significantly correlated with performance [4]. By employing the widely used SA global assessment technique (SAGAT), Endsley et al. investigated design methods for improving SA in complex systems, e.g., a display model with target orientation, highlighting of key features, and parallel processing of multi-display models [35], [36].
Using a high-fidelity flight simulator with a focus on the primary flight display (PFD) and its route guidance design, Wickens et al. studied the effects of tunnel characteristics on the pilot route tracking performance, mental workload, and SA (including traffic and terrain awareness). Traffic awareness was measured by observing the operator accuracy rate and the response time for detecting aircraft symbols, and terrain awareness was measured by asking the operator to point out important terrain directions during the flight simulation freezing period [37]. Chudy et al. studied pilots' simultaneous data management ability under stress. An intuitive flight display was designed to improve the pilot's SA and decision-making processes, and its effectiveness was tested via a subjective evaluation based on a flight simulation experiment [38]. To study SA in the pilot-aircraft system, Wei et al. performed a human-in-the-loop experiment in a flight simulation environment and assessed SA using the SAGAT method and physiological measurement. Participant SA was then analyzed under different cockpit display interface designs [2]. To develop an event-based method for measuring air traffic controller SA, Yang et al. simulated radar control scenarios. SA data based on the event-based method and SART, mental-workload ratings based on the air traffic workload input technique and NASA-TLX, the number of flight conflicts, the actual task load, etc. were applied comprehensively to investigate the sensitivity and validity of the event-based SA assessment technique [39].
The aforementioned studies provide valuable reference for SA modeling and measurement methods. However, most of the previous research focused on proposing a relatively independent concept model from the viewpoint of cognitive psychology, and few of the models have been transformed into their corresponding quantitative calculation models, which limits their applicability in engineering. Additionally, few studies have been performed on SA quantitative prediction models-particularly those covering high-level SA and those directed toward the evaluation and optimization of the cockpit display interface design. In a previous study, the forming process of pilot SA was analyzed in combination with ACT-R cognitive theory, and an SA model covering three levels was proposed on the basis of attention allocation. The theoretical model was validated by several experimental measurements [40], [41]. However, the experimental results indicated that the SA prediction value from the theoretical model was not correlated with the operator behavior performance. As high SA is usually necessary for good operation performance, SA should be partly predicted using the operation performance [42]. In this regard, the previous SA model was improved in the present study, and its correlation with the operation performance of an indicator-monitoring simulation task was analyzed to validate the improved model. Furthermore, the changing trends of several typical evaluation indices were investigated with variations in the SA level.

II. IMPROVED SA MODEL BASED ON ACT-R THEORY
Human performance modeling is useful for understanding new systems and their effects on human task performance, without resorting to the more expensive methods of human-in-the-loop experiments or simulations [43]. In the present study, the improved SA model was constructed on the basis of a hierarchical analysis of the SA forming process combined with the ACT-R theory. FIGURE 1 shows the relationship between the three levels of SA and the ACT-R system [30].
In a certain operation environment, the situation related to the current task can be divided into n parts labeled SE i . Each SE i is necessary for supporting an operator task with high performance. It is assumed that the operator's attention resource A is shared by the n SEs, as expressed in (1). Generally, information is processed through top-down and bottom-up channels simultaneously. The former is a type of automatic search (or selective control of the visual system), while the latter is driven by the visual features of SE i . The attention resource A i allocated to a certain SE i can then be expressed by (2) [44], [45], where β i indicates the operator expectation for information, which can be manifested by SE i 's occurrence frequency; V i indicates SE i 's information value or importance for accomplishing the current task; Sa i indicates SE i 's salience, which can be manifested by its visual coding; and E i indicates the effort the operator expends to obtain the information. Considering the fuzziness and randomness of the human cognitive processing mechanism, V i can be further expressed as ∂ i u i , where ∂ i represents the possibility of the operator's potential cognitive availability status for SE i , and u i is the membership degree normalized into [0,1] based on the cognitive evaluation of SE i 's priority [46]. Thus, the attention allocation proportion f i of SE i can be calculated using (3). In the ACT-R system, the SE i of the environment is selectively noticed by the visual module. Assuming that the event of paying attention to SE i , which is defined as a i , occurs at the current time, the preliminary cognitive processing of SE i is performed.
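The displayed equations (1)-(3) are not reproduced in this text. As a minimal sketch, the allocation proportion can be computed under the assumption that (2) has the multiplicative form A_i = β_i · V_i · Sa_i / E_i with V_i = ∂_i · u_i, so that f_i = A_i / Σ_j A_j. This form is inferred from the attribute definitions above and is not a confirmed transcription of (2); the function name and the example attribute values are hypothetical.

```python
from typing import List, Sequence


def attention_proportions(
    expectancy: Sequence[float],    # beta_i: expectation for information
    availability: Sequence[float],  # possibility of cognitive availability
    membership: Sequence[float],    # u_i: membership degree in [0, 1]
    salience: Sequence[float],      # Sa_i: salience from visual coding
    effort: Sequence[float],        # E_i: effort, must be positive
) -> List[float]:
    """Return f_i = A_i / sum_j A_j under the ASSUMED multiplicative form."""
    resources = []
    for beta, avail, u, sa, e in zip(
        expectancy, availability, membership, salience, effort
    ):
        if e <= 0:
            raise ValueError("effort must be positive")
        value = avail * u  # V_i expressed as availability times membership
        resources.append(beta * value * sa / e)
    total = sum(resources)
    if total <= 0:
        raise ValueError("total attention resource must be positive")
    return [a / total for a in resources]


# Hypothetical attribute values for four SEs (not taken from the experiment).
f = attention_proportions(
    expectancy=[0.4, 0.3, 0.2, 0.1],
    availability=[1.0, 1.0, 1.0, 1.0],
    membership=[0.9, 0.7, 0.5, 0.3],
    salience=[0.8, 0.6, 0.6, 0.4],
    effort=[1.0, 1.2, 1.5, 2.0],
)
```

Under this assumed form, an SE that is expected more often, valued more highly, or displayed more saliently receives a larger share, while an SE that costs more effort to reach receives a smaller one.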
The occurrence probability of a i can be taken as equal to SE i 's attention allocation proportion f i , as expressed in (4). When event a i occurs, declarative knowledge is extracted from declarative memory by the buffers to activate the operator's cognition, and the cognitive activation amount AC i can be defined as the sum of the base-level activation amount AC 0i and the relevant activation amount Σ j W j S ji (summed over the n SEs). The base-level activation amount reflects the general usefulness of SE i according to the operator's past experience. If the fact of recognizing SE i has occurred t times previously, AC 0i ≈ c + 0.5 ln t, where c is usually set as 0. The relevant activation amount reflects SE i 's relevance to the current context, where W j represents the attention weighting of SE i , which can be equivalent to f i , and S ji represents the association intensity between the current SE i and the other relational SE j . Additionally, S ji = S − ln(fan j ), where fan j represents the number of facts associated with SE j , and S is usually set as 2.
Only when the cognitive activation amount reaches a certain threshold τ can the perception of SE i occur (corresponding to SA1). The event of perception of SE i is denoted as b i , and its probability is given by (5). In (5), s controls the noise at the activation level and is typically set as 0.4, and τ is set as 1.0 [25], [46].
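The activation terms and the perception step can be sketched numerically. The sketch assumes that (5), which is not reproduced in this text, takes the standard ACT-R retrieval-probability form 1 / (1 + exp(-(AC_i - τ) / s)). The function names and example inputs are hypothetical.

```python
import math
from typing import Sequence


def base_level_activation(t: int, c: float = 0.0) -> float:
    """AC_0i ~ c + 0.5 ln t, where t is the number of past recognitions."""
    if t < 1:
        raise ValueError("t must be at least 1")
    return c + 0.5 * math.log(t)


def association_strength(fan: int, s_max: float = 2.0) -> float:
    """S_ji = S - ln(fan_j), with S usually set to 2."""
    if fan < 1:
        raise ValueError("fan must be at least 1")
    return s_max - math.log(fan)


def total_activation(t: int, weights: Sequence[float], fans: Sequence[int]) -> float:
    """AC_i = AC_0i + sum_j W_j * S_ji."""
    relevant = sum(w * association_strength(fan) for w, fan in zip(weights, fans))
    return base_level_activation(t) + relevant


def perception_probability(activation: float, tau: float = 1.0, s: float = 0.4) -> float:
    """ASSUMED logistic form of (5): probability that activation exceeds tau."""
    return 1.0 / (1.0 + math.exp(-(activation - tau) / s))


# Hypothetical example: 20 past recognitions, three associated SEs.
ac = total_activation(t=20, weights=[0.3, 0.2, 0.1], fans=[2, 3, 4])
p_perceive = perception_probability(ac)
```

With these defaults, an activation exactly at the threshold yields a probability of 0.5, and the noise parameter s controls how sharply the probability rises around τ.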
According to ACT-R theory, the essence of the cognitive process is the firing of a series of production rules. Multiple production rules can be applied for the "IF-THEN" pattern matching at any point in time, but for recognizing SE i , only the one production rule with the highest utility U i can be selected. After the procedural knowledge is successfully extracted from the procedural memory and optimal pattern matching is performed by the operator cognitive mechanism, the significance of SE i can be fully comprehended, which may correspond to SA2 or SA3. In dynamic systems, there is a fuzzy boundary between SA2 and SA3 because the understanding of the present usually has direct implications for the future, and both are equally relevant for the task [23], [41]. If the event of SE i comprehension is denoted as c i , then its probability is given by (6). According to the definition of the operator cognitive level P i for a certain SE i [22], there are three situations: the cognitive activation amount is lower than the threshold, the SE i is not perceived, and P i takes the value of 0; the cognitive activation amount is higher than the threshold, the declarative knowledge is extracted successfully to form perception, SA1 is achieved, and P i takes the value of 0.5; the production rule with the highest utility is matched successfully to form comprehension of the current state or future state, SA2 or SA3 is achieved, and P i takes the value of 1. See (7)-(9). Therefore, the mathematical expectation of the cognitive level P i for SE i can be calculated using (10) [41]. If the operator attempts to obtain and hold a relatively high SA level, the cognitive levels for the critical SEs, which have important impacts on the current task's performance, should be kept high. Simultaneously, the display attributes of the SEs should contribute to the maintenance of operator SA.
Therefore, e i is used to represent SE i 's sensitivity coefficient, and its value is related to not only the importance u i of SE i but also the salience Sa i of SE i . The relationship between the operator's overall SA level and the cognitive levels of the SEs can then be expressed by (11). The SA modeling in our study is focused on the formation process of SA1, SA2, and SA3 on the basis of ACT-R theory. In addition, the SA prediction model is built on a quantitative representation of the pilot's cognition process rather than on simulation calculation. This is conducive to the applicability of the model to flight scenarios, as the model parameters are determined by referring to the literature focusing on pilot cognition and behavior [25], [47]-[50]. However, as the operator SA is affected by multiple factors, the prediction results of our improved SA model cannot represent the absolute-truth value; rather, they represent a value of comparative significance.
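Equations (6)-(11) are not reproduced in this text. The sketch below assumes that the three cases for P_i are exhaustive, so that the expected cognitive level is 0.5 · P(perceived but not comprehended) + 1 · P(comprehended), and that the overall SA is a sensitivity-weighted sum of these expectations with the coefficients e_i normalized to sum to 1. Both assumptions and all names are illustrative rather than the authors' exact formulation.

```python
from typing import Sequence


def expected_cognitive_level(
    p_attend: float, p_perceive: float, p_comprehend: float
) -> float:
    """Expected P_i over the outcomes {0, 0.5, 1} (ASSUMED chaining of events).

    p_attend:      probability of attending to SE_i (event a_i)
    p_perceive:    probability of perception given attention (event b_i)
    p_comprehend:  probability of comprehension given perception (event c_i)
    """
    p_sa1_only = p_attend * p_perceive * (1.0 - p_comprehend)  # P_i = 0.5
    p_sa2_or_3 = p_attend * p_perceive * p_comprehend          # P_i = 1
    return 0.5 * p_sa1_only + 1.0 * p_sa2_or_3


def overall_sa(levels: Sequence[float], sensitivities: Sequence[float]) -> float:
    """Sensitivity-weighted sum of expected cognitive levels (ASSUMED form)."""
    total = sum(sensitivities)
    if total <= 0:
        raise ValueError("sensitivities must sum to a positive value")
    return sum(e / total * p for e, p in zip(sensitivities, levels))


# Hypothetical probabilities for two SEs.
levels = [
    expected_cognitive_level(0.6, 0.9, 0.8),
    expected_cognitive_level(0.4, 0.7, 0.5),
]
sa = overall_sa(levels, sensitivities=[0.7, 0.3])
```

Because the weights are normalized, the result stays in [0, 1], which matches the comparative (rather than absolute) interpretation of the predicted SA value.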

III. EXPERIMENT METHOD
A. PARTICIPANTS
Twenty-eight highly trained, healthy participants (18 males, 10 females) were included in the present study, all from the School of Aeronautic Science and Engineering in Beihang University. All the subjects (aged from 22 to 28 years with a mean age of 23.95 years) were right-handed and possessed normal or corrected-to-normal vision. Written informed consent was obtained from the participants before the experiment.

B. INTERFACE SIMULATION MODEL DESIGN
The interface simulation models used in the experiment were designed by referring to two typical interfaces for the PFD, as shown in FIGURE 2. Information layout and color coding differed in the two interface simulation models. GL Studio was adopted as the development platform for graphical modeling of the instruments and control panels, and the C++ programming language and network communication technology were used to realize the virtual instrument simulation programming. The experimental procedure was generated on the Microsoft Visual Studio platform. The interface simulation model was presented on a 17-inch IBM monitor with a resolution of 1280 × 1024, and the average illumination in the experiment environment was set as 600 lx.

C. EXPERIMENT DESIGN
A two-factor, completely within-subjects design was adopted in the experiment. Factor 1 was the probability distribution of indicators' abnormal display, with two levels (level 1: uniform probability distribution, UPD; level 2: non-uniform probability distribution, NUPD) set by the frequency with which SEs were questioned, and factor 2 was the interface simulation model, with two levels (level 1: interface simulation model A; level 2: interface simulation model B). An indicator-monitoring task was performed in the experiment. The rolling angle (SE.1), indicated airspeed (SE.2), barometric altitude (SE.3), and heading angle (SE.4) presented on the interface simulation model were set as the monitoring targets, referring to the optimal and effective numbers of targets for human attention allocation [40]. Each participant was asked to perform four trials with different combinations of the interface simulation model and probability distributions for indicators' abnormal display. The task orders were counterbalanced across the subjects according to the Latin square design.
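Counterbalancing the four trial conditions with a Latin square can be sketched as follows. The cyclic construction and the condition labels are illustrative assumptions; the specific square used in the experiment is not stated.

```python
from typing import List


def cyclic_latin_square(conditions: List[str]) -> List[List[str]]:
    """Build an n x n Latin square by cyclically shifting the condition list."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels: interface (A or B) crossed with probability distribution.
CONDITIONS = ["A-UPD", "A-NUPD", "B-UPD", "B-NUPD"]
square = cyclic_latin_square(CONDITIONS)

# Assign the 28 participants to the four task orders in rotation,
# so that each order is used by 7 participants.
orders = [square[participant % len(square)] for participant in range(28)]
```

Each condition appears once in every serial position across the four orders, which balances simple order effects. A cyclic square does not balance first-order carryover, so a balanced (Williams) design would be an alternative if carryover were a concern.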

D. EXPERIMENT PROCEDURE
The SAGAT method was employed in the experiment. For one trial, a total of 32 multiple-choice questions (with 4 options for each question) covering the three levels of SA were presented randomly on the selected interface simulation model according to the chosen probability distribution, so that the participants' cognitive status for all the indicators (SEs) under different task conditions could be obtained. During each random freeze, only one question aimed at a certain indicator was presented, and the participant responded by selecting one option as soon as possible within the given time limit. The overall accuracy rates for the three levels and the accuracy response time were chosen as the evaluation indices for the SAGAT method. Additionally, the participants' performance was recorded simultaneously with the SAGAT measurement. Before the experiment, the participants were required to become familiar with the relative importance of each indicator, manifested by the score that could be obtained by answering the corresponding questions correctly, as well as the probability distributions of the indicators' abnormal display, manifested by the probability of each indicator being questioned. The participants were required to monitor the interface simulation model on the screen and to capture abnormal information as quickly as possible. During each trial, the screen was frozen randomly, and the interface simulation model was immediately replaced by the SAGAT question interface on the screen. The SAGAT questions were designed to examine the participants' capability in flight information acquisition, flight status understanding, and near-future trend prediction, which correspond to the three levels of SA (SA1, SA2, and SA3). The participants were asked to click on the correct one of the four options with a mouse within a limited time; a click on a wrong option or a timeout was recorded as an error.
Afterwards, the screen freeze was released and the experimental task continued with the previous interface simulation model. All participants were encouraged to obtain scores as high as possible by optimizing their attention allocation strategies. No score was given for an incorrect or missing answer, and the participants' final total scores were used for their performance evaluation. Immediately after each trial, the participants were asked to rate their SA using the 10D-SART (Situation Awareness Rating Technique) self-rating scale [6]. The ratings on the 10 individual dimensions were then combined to form a rating for each of the three major factors: the demand of attention, supply of attention, and understanding of the situation. To objectively record eye-tracking data, the Smart Eye Pro noncontact eye tracker system was employed for real-time tracking during the experiment. The calibration accuracies of all the subjects were always better than 1°, and the sampling rate was 60 Hz.
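The two SAGAT indices used in this procedure can be computed from per-question records as sketched below. The record layout is hypothetical, and the "accuracy response time" is interpreted here as the mean response time over correctly answered questions, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    sa_level: int                     # 1, 2, or 3
    correct: bool                     # False for a wrong option or a timeout
    response_time_s: Optional[float]  # None when the question timed out


def accuracy_rate(probes: List[Probe], sa_level: Optional[int] = None) -> float:
    """Share of correct answers, overall or restricted to one SA level."""
    selected = [p for p in probes if sa_level is None or p.sa_level == sa_level]
    if not selected:
        raise ValueError("no probes for the requested SA level")
    return sum(1 for p in selected if p.correct) / len(selected)


def mean_correct_response_time(probes: List[Probe]) -> float:
    """Mean response time over correct answers (ASSUMED definition)."""
    times = [
        p.response_time_s
        for p in probes
        if p.correct and p.response_time_s is not None
    ]
    if not times:
        raise ValueError("no correct answers with a recorded time")
    return sum(times) / len(times)


# Hypothetical records for one participant (not experimental data).
probes = [
    Probe(sa_level=1, correct=True, response_time_s=2.0),
    Probe(sa_level=1, correct=False, response_time_s=None),
    Probe(sa_level=2, correct=True, response_time_s=3.0),
    Probe(sa_level=3, correct=True, response_time_s=4.0),
]
```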

A. MODEL PREDICTION AND EXPERIMENTAL MEASUREMENT RESULTS
The attribute values of 'Effort' and 'Salience' for the four SEs displayed for the two types of interface simulation models were calculated. To be precise, the attribute values of 'Effort' were determined by the relative normalized distances between the SEs, and the attribute values of 'Salience' for each SE were determined by its color matching, indicator size, and indicator type. Furthermore, the color-matching value was set according to the legibility degree of the displayed characters, the indicator size value was set according to the displayed areas occupied within the experiment interface, and the indicator type value was set according to the difficulty of reading and understanding the indicator [41]. The prediction results for SA calculated via the theoretical model as well as the statistical results for SA obtained from different measuring methods under the four tasks are presented in Table 1.

B. EXPERIMENTAL ANOVA RESULTS
An analysis of variance (ANOVA) was performed using IBM SPSS 23.0 to investigate the effects of the two factors (the probability distribution of the indicators' abnormal display and the interface simulation model) on the participants' SA, operation performance, and eye movement. The level of significance was set at α = 0.05, but p-values lower than 0.1 were reported as marginally significant in consideration of their potential research value for the connection between the SA prediction model and the operator's cognition and behavior. The ANOVA results are presented in Table 2.
The main effect of the interface simulation model on the operation performance was significant (F(1, 27) = 5.311, p = 0.029, η² = 0.164), as indicated by the higher operation score for interface B compared with interface A. The main effect of the interface simulation model on the SAGAT accuracy rates of level 1 (F(1, 27) = 3.309, p = 0.080, η² = 0.109) and level 3 (F(1, 27) = 3.050, p = 0.092, η² = 0.102) was marginally significant. The SAGAT accuracy rates of the two levels in interface B were both higher than those in interface A. In contrast, the main effect of the probability distribution of the indicators' abnormal display, rather than the interface simulation model, on the understanding score in the 10D-SART self-rating scale (F(1, 27) = 6.096, p = 0.020, η² = 0.184) was significant. The understanding score under the condition of NUPD was higher than that under UPD.
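The reported effect sizes are consistent with partial eta squared recovered from the F statistic and its degrees of freedom, η² = F · df1 / (F · df1 + df2). Treating the reported η² as partial eta squared is an assumption, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect < 1 or df_error < 1:
        raise ValueError("invalid F statistic or degrees of freedom")
    return f_value * df_effect / (f_value * df_effect + df_error)


# Reported F(1, 27) values from the text.
eta_performance = partial_eta_squared(5.311, 1, 27)
eta_sagat_level1 = partial_eta_squared(3.309, 1, 27)
eta_understanding = partial_eta_squared(6.096, 1, 27)
```

Rounded to three decimals, these reproduce the reported values of 0.164, 0.109, and 0.184. The level 3 value agrees with the reported 0.102 to within 0.001.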

C. CORRELATION ANALYSIS AND MULTILINEAR REGRESSION ANALYSIS RESULTS
To validate the improved SA prediction model based on ACT-R theory, a correlation analysis between the results of the improved SA model prediction and the experimental measurement indices, as well as a multilinear regression analysis, were performed, referring to the method of "modeling the average pilot" [23], [24].
The following conclusions were drawn from the correlation analysis, as shown in Tables 3 and 4. For performance measurements, the improved SA model was highly and significantly correlated with the operation score (|r| = 0.972, p = 0.028). For the SAGAT measurements, the improved SA model was highly and close to significantly correlated with
To investigate the consistency between the model prediction and the experimental measurement indices, we analyzed the correlation among the experimental variables with significant effects in the ANOVA, as well as the correlation between the significant variables and the predicted SA in the previous and improved models. The results are shown in FIGURE 4. The operation score was correlated with the predicted SA for the improved model but not for the previous model. The dimension of understanding in 10D-SART was correlated with the predicted SA for both the previous and improved models. The correlations among the operation score, understanding dimension of 10D-SART, and SAGAT accuracy of level 3 (projection) were significant. The correlation between the SAGAT accuracy of level 1 (perception) and the mean saccade duration was negative, and the correlation between the SAGAT accuracy of level 3 and the fixation/saccade ratio was positive.
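The reported pair |r| = 0.972, p = 0.028 is consistent with a Pearson correlation computed over the four task conditions (n = 4), which is an assumption in line with the condition-level "average pilot" approach. For n = 4 the two-tailed p-value has a closed form, because the t distribution with 2 degrees of freedom has the CDF 1/2 + t / (2 · sqrt(t² + 2)), which reduces to p = 1 − |r|. The helper names and example series are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def two_tailed_p_for_n4(r: float) -> float:
    """Two-tailed p-value of a Pearson r; valid ONLY for n = 4 (df = 2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    return 1.0 - abs(r)


p_reported = two_tailed_p_for_n4(0.972)
```

With only four condition-level points, a correlation must exceed 0.95 in magnitude to reach p < 0.05, which explains why several high correlations in this analysis are only close to significance.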
Compared with the previous SA model (adjusted R² = 0.035), the results of the multilinear regression analysis indicated a 14.3% increase in the interpretation rate (adjusted R²) for the improved SA model (adjusted R² = 0.040). The linear regression equations of the previous and improved SA models are as follows:
$$\mathrm{SA}_{\text{previous model}} = 0.331 + 0.008 \cdot \text{Understanding} \tag{12}$$
$$\mathrm{SA}_{\text{improved model}} = 0.326 + 0.008 \cdot \text{Understanding} + 0.001 \cdot \text{Performance} \tag{13}$$
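The 14.3% figure is the relative change in adjusted R², and (12) and (13) can be evaluated directly. A minimal sketch follows; the function names are hypothetical, and any inputs passed to the two regression functions would be illustrative rather than experimental data.

```python
def relative_increase(old: float, new: float) -> float:
    """Relative change from old to new, as a fraction of old."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old


def sa_previous_model(understanding: float) -> float:
    """Regression equation (12)."""
    return 0.331 + 0.008 * understanding


def sa_improved_model(understanding: float, performance: float) -> float:
    """Regression equation (13)."""
    return 0.326 + 0.008 * understanding + 0.001 * performance


# Relative gain in adjusted R^2 from 0.035 to 0.040, in percent.
gain_percent = relative_increase(0.035, 0.040) * 100
```

Note that the gain is relative: in absolute terms the adjusted R² rises by 0.005, so both regressions explain only a small share of the variance in predicted SA.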

V. DISCUSSIONS
An improved SA prediction model based on the ACT-R theory was proposed by extending the previous one. The model-predicted SA was verified under different experimental conditions by several SA evaluation measurements, including SAGAT, operation performance, 10D-SART, and eye-movement measures. The SA value predicted by the improved model showed better consistency with, and sensitivity to, performance behavior and visual behavior. The SAGAT is a computerized freeze probe technique. Typically, a simulation task is randomly frozen, all displays are temporarily blanked out, and a series of questions about the situation at the time of the freeze are administered. The operators are required to answer the questions according to their knowledge and judgment. Their situational views are compared with the operational scenario, and an overall SA score is calculated at the end [4]. The SAGAT method is widely used to provide an objective and effective measure of operator SA. In the present study, the SAGAT method was adopted for the assessment of operator SA for different SEs at three levels. As high correlations existed between the theoretical prediction values and the accuracy rate, as well as the accuracy response time, according to the experimental results, the improved SA model was validated. Although the accuracy rate of the three levels in the SAGAT method has always been the most widely used and effective index, the accuracy response time has been used less often for SA assessment in previous studies. In the present study, the accuracy response time was highly and negatively correlated with the theoretical prediction value, probably because SA enhancement can improve the operator's cognitive processing speed and thereby reduce the reaction time. Therefore, the accuracy response time can be considered a good indicator of the changing trend of SA.
There are many studies across different domains in which a high SA has been shown to support good task performance [51], [52]. For example, it is common for researchers to propose an interface alteration intended to improve SA and to test it by determining whether the alteration improved the performance [53]. The maintenance of a high SA usually indicates successful operator attention resource allocation strategies [54], which was manifested by the higher scores obtained by operators in this research. According to the analysis of the experimental results, the theoretical prediction value based on the improved SA model was significantly and positively correlated with the operator performance; thus, the SA model in the present study was confirmed to be an improvement over the previous model [41].
The 10D-SART self-rating scale was employed in our research to evaluate the operator's "demand of attention," "supply of attention," and "understanding of the situation" across 10 dimensions. However, the experimental results indicated that the correlation between the theoretical prediction value and the 10D-SART evaluation score was low, probably owing to the operators' inadequate understanding of the scale and to memory decay associated with the post-trial form of the questionnaire. Additionally, operators' overestimation or underestimation of their subjective feelings may affect the final evaluation results [55], [56]. Notably, the significant correlations between the operation score and the understanding dimension of the 10D-SART, and between the operation score and the SAGAT accuracy at level 3 (projection), implied positive effects of comprehension and projection capacity (the two higher levels of SA) on behavioral performance.
In our research, seven eye-movement indices were adopted for SA measurement: the pupil diameter, eyelid opening, blink frequency, fixation frequency, mean fixation time, mean saccade rate, and fixation/saccade ratio. Among them, the pupil diameter was highly and positively correlated with the theoretical prediction value of SA. This is possibly because, when the experimental conditions changed, operators needed to monitor the display interface closely and invest more cognitive effort (manifested as an enlarged pupil diameter) to obtain more information [57], which enhanced their comprehension and prediction of the dynamically updated information flow and, accordingly, their SA. Additionally, the analysis of the experimental results suggested a high negative correlation between the fixation frequency and the theoretical prediction value of SA. As the fixation frequency increased, the time spent on information extraction per unit time increased, which might indicate low efficiency in information processing and a poor current cognitive level, corresponding to relatively low SA. Another eye-movement index that was negatively and significantly correlated with the theoretical prediction value of SA in the present study was the fixation/saccade ratio. The fixation/saccade ratio is defined as the ratio of the time spent processing information (fixations) to the time spent searching for information (saccades) [58]. Research has indicated that expert pilots exhibit more frequent fixations (interpreted as an increase in saccade behavior) with a shorter dwell time (interpreted as a decrease in fixation behavior) than novice pilots when performing flight simulation tasks [59]. Therefore, a low fixation/saccade ratio may suggest a highly efficient mode of information perception and extraction, which enhances the operator's cognitive state and SA.
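The definition of the fixation/saccade ratio given above (time spent in fixations divided by time spent in saccades) can be sketched as follows. The event format, a sequence of `(event type, duration in seconds)` pairs with the labels `"fixation"` and `"saccade"`, and the function name `fixation_saccade_ratio` are assumptions for illustration; real eye trackers export events in their own formats.

```python
from typing import Iterable, Tuple

# Hypothetical event format: (event type, duration in seconds).
Event = Tuple[str, float]


def fixation_saccade_ratio(events: Iterable[Event]) -> float:
    """Total fixation time divided by total saccade time.

    Events of any other type (e.g., blinks) are ignored.
    """
    fixation_time = 0.0
    saccade_time = 0.0
    for kind, duration in events:
        if kind == "fixation":
            fixation_time += duration
        elif kind == "saccade":
            saccade_time += duration
    if saccade_time == 0.0:
        raise ValueError("no saccade time recorded; the ratio is undefined")
    return fixation_time / saccade_time
```

With this definition, shorter dwell times or more search activity both lower the ratio, which matches the interpretation of a low ratio as a more efficient mode of information perception and extraction.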
Unfortunately, the correlations between the other four eye-movement indices (eyelid opening, blink frequency, mean fixation time, and mean saccade rate) and the theoretical prediction value of SA were relatively low. These complex relationships need to be further investigated in future studies.
Compared with the previous SA prediction model [41], the positive correlation coefficient between the predicted SA value and the operation performance was significantly larger for the improved model. This revealed that human operation performance can be directly indicated by the SA value predicted by the improved model, and vice versa. Similarly, the response behaviors (accuracy rate and response time) in the SAGAT were more closely correlated with the predicted SA in the improved model than in the previous model, confirming the progress of the improved model in measuring SA according to behaviors. Additionally, the negative correlation between the predicted SA value and the fixation/saccade ratio was far stronger for the improved model than for the previous model, suggesting that a more efficient mode of information perception and extraction (decreased fixation and increased saccade) is indicated by a higher predicted SA value in the improved model.
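The model comparison above rests on correlation coefficients between predicted SA values and behavioral measures. A minimal sketch of such a coefficient is given below, assuming the Pearson product-moment correlation; this section does not state which coefficient was used, so that choice and the function name `pearson_r` are assumptions.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

Applied once with the predictions of the improved model and once with those of the previous model against the same performance scores, the two coefficients can be compared directly; for the fixation/saccade ratio, a stronger relationship would appear as a coefficient closer to -1.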
In summary, the improved SA prediction model has made progress in indicating SA through operator behaviors, including operation performance and information processing. The enhanced consistency between the model-predicted SA and the actual performance behavior can contribute to the computability and interpretability of the ACT-R model in analyzing operator cognition and behavior. Additionally, considering the three levels of SA (i.e., perception, comprehension, and projection) helps provide more feasible and targeted solutions in aviation engineering (e.g., interface display design, task flow design, and ergonomic evaluation). However, the present study had several limitations. The participants recruited for our experiment were not pilots but graduate students from an aeronautical science and engineering department with experience in operating the simulated flight platform. Considering the participants' ability and experience, we derived the simulated flight scenarios from actual flight tasks, displaying the flight task characteristics appropriately and simplifying the operational procedures. In addition, the SA prediction model focused mainly on visual information and insufficiently considered auditory information and communication among stakeholders, which may restrict the model's utility in naturalistic scenarios, such as air route management in air traffic control. Therefore, the reliability and robustness of our conclusions can be further improved by recruiting experienced pilots, and the effects of various typical naturalistic scenarios on the utility of the ACT-R model for predicting human cognition and behavior will be considered in future research.

VI. CONCLUSION
We improved a previous three-level SA prediction model based on the ACT-R cognitive theory and achieved advances in indicating SA through human behaviors, such as operation performance, SAGAT response behavior, and visual behavior. The improved model substantiates the support that SA provides for operation performance, given the consistent variation of SA and performance; thus, operation performance can be an effective indicator for monitoring and predicting SA through the improved model. The behavioral measurements of SA are validated by the strengthened correlation between the SAGAT response behaviors and the predicted SA in the improved model. Visual behaviors such as the fixation/saccade ratio can also serve as indicators for predicting SA, according to the verified correlation between SA and the efficiency of the information-processing mode in the improved model. Owing to the substantiated consistency between the model-predicted SA and the performance behavior, as well as the comprehensive consideration of the three levels of SA, the improved SA prediction model provides a new auxiliary tool for the quantitative characterization of pilot SA during cockpit display design optimization and ergonomic evaluation.