Human Heart-Related Indexes Behavior Study for Aircraft Pilots Allowable Workload Level Assessment

This study aimed to evaluate workload by detecting Heart Rate Variability (HRV) indexes in a sample of 34 pilots (with a mean age of 33 years) while performing simulated flight exercises. A one-way ANOVA with repeated measures was performed to assess the changes of the physiological measures in five standard maneuvers associated with different workload levels. The results show that all the indexes, but the Low Frequency to High Frequency ratio index (LF/HF), have a well-defined trend between the baseline and the en-route phase and with the three phases takeoff, steady turn, and landing. This study, as main findings, provides evidence of a differentiation among low, medium, and high workload levels using the time, frequencies, and non-linear HRV domains of analysis. These findings support the relevance of HRV indexes for workload evaluation, suggesting the development of non-invasive instruments capable of assessing workload in real-time. Further studies may be conducted to investigate whether the same findings could also be applied to more challenging maneuvers in real working conditions.


I. INTRODUCTION
The handling qualities of the aircraft are related to the maneuvers workload level experienced by the pilot while performing a specific task during the different flight phases [1]. A well-known consistent method to quantitatively define handling qualities is the Cooper-Harper pilot rating scale [2]. Using a decision-making process, the pilot can assign a rate of the aircraft's controllability and accuracy in performing a given task. Regardless of the aircraft attitude, the value of the rating scale achieved depends on the skill of the pilot and on subjective qualitative judgments or perceptions. Different ratings can be assigned to the same aircraft under the same flight conditions based on different pilot's abilities and backgrounds. However, recently, the idea of assessing both workload and the handling qualities using only classical The associate editor coordinating the review of this manuscript and approving it for publication was Eunil Park .
questionnaire-based methods has been revised by also taking into account measurements of a physiological nature with the aim of quantifying pilots' workload response. To do this, flight simulators for the analysis of the pilots' response to workload levels and tasks have been increasingly used in recent years [3], [4]. Motivation is found in the economic and safety advantages of training activities compared to a real flight, especially in the case of military pilots. For example, the studies carried out by Mansikka et al. [5] are focused on data collected from military pilot samples in training with particular reference during instrumental approaches. Lehmann et al. [6] analyzed the impact of turbulence and degraded visual environment on the pilot workload. Mohanavelu et al. [7] analyzed missions carried out on a simulator with different workload levels in terms of performance and additional tasks. Fuentes-García et al. [8] examined pilots performing a real flight with an F5 aircraft and a simulated flight with an operational F-5M flight simulator by analyzing the variations in objective parameters related to heart rate variability. Klyde et al. [9] ran tests with a flight simulator and used electroencephalogram (EEG) and electrocardiogram (ECG), as physiologically based techniques, to estimate handling qualities. Their results also demonstrated the feasibility of artificial neural networks in using different physiological measurements.
Another field where the pilot mental workload analysis and modeling can be beneficial regards the pilot response models related to the so-called pilot-in-the-loop investigation. Thus, the studies on pilot workload behaviors can improve the accuracy of the models used to analyze human assessment within complex systems, not only in the case of fixed wings [10] but also for missions with helicopters or for space missions. Another advantage is the possibility of adapting the levels of automation of remotely piloted aircraft systems during flight phases charactherized by disturbances or systems faults. According to Ji et al. [11], it is crucial to investigate the pilot workload to improve flight safety and performance during different operations. The pilot model usually takes into account the neuromuscular behavior associated with the pilot actions along with the time delay to account for the pilot's cognitive responsiveness, while the pilot's mental workload due to physiological aspect could improve the pilot-modelin-the-loop [12].
However, it should be noted that an important issue to be tackled concerns the relationship between subjective and objective (physiologically based) measurements because a useful pilot model is not achievable without an adequate correlation on workload levels. Many authors have debated this issue in recent years. In a large number of studies [13], [14] workload level was evaluated using subjective measures such as the NASA-TLX and the Cooper-Harper scale. In some cases, it is possible to associate the pilots' perception of workload levels with an increase or decrease in the cardiac variability indexes. However, there is no consistency among the different authors. For example, comparing the Cooper-Harper scale with the NASA-TLX subjective test and HRV, Mansikka et al. [15] showed that HRV indexes fail to sufficiently capture variations between low and medium workload levels. Moreover, Alaimo et al. [16] have found that the pilot's subjective judgment cannot properly be associated with the human body's physiological results, since the pilot perception subjective may jeopardize the judgment in the performed operations.
In this framework, the objective measurements aim to assess the experienced workload level using only physiological parameters. Thus, they represent useful techniques for workload evaluation, and they are widely applied in several studies and researches [17]- [21].
Physiological measurements in the aviation field have some advantages. First, these measurements are widely tested and used in the medical sector, and their validity and reliability are not a debatable issues. Second, they offer an objective assessment, independent of individuals' own subjective perception, by providing information regarding their physical state. In addition, the detection of physiological parameters can allow a workload assessment in real-time, leading to a continuous monitoring of changes in the amount of physical and mental effort experienced during the performed activities [22]. By doing so, it is easier to identify which specific task induces an increase or decrease in workload. An additional strength of these techniques lies in the use of wearable or contactless sensors with a low degree of intrusiveness with the performance [23], [24]. However, some drawbacks characterize this type of measurement: it is difficult to use standardized pilot groups to compare the obtained values; their variability among individuals could be a limitation for low-size samples; some of them are obtrusive, requiring special equipment and training, and there is not a single measure that can be universally valid in assessing workload across various scenarios because physiological responses caused by workload are highly scenario-dependent [25]- [27]. However, these limitations motivate further research in these directions.
In particular, the current study first examines the effects of demographic variables on HRV indexes. Then, by assuming that different maneuvers imply different workload levels for pilots, a relationship between flight phases and HRV indexes is examined by employing the ANOVA approach. Furthermore, we also assume that the HRV indexes show statistical differences among low, medium, and high levels of workload. To this purpose, a multivariate analysis is then performed.

II. HRV MEASUREMENTS
In healthy and not stressful conditions, the intervals between consecutive heartbeats are not constant but vary randomly from one beat to another. Heart Rate Variability (HRV) is the natural variation in the time between one beat and the next one. It is also known as RR variability, where RR is the distance between two R peaks of ECG.
HRV analysis is a method used to assess the mechanisms of regulation of the human body's physiological functions. These mechanisms refer to the autonomic nervous system and the neuroendocrine system. The balance of these systems determines the capacity and type of adaptation to an external stimulus, which is commonly called a reaction. The adaptation, whether positive or negative, depends on the degree of irregularity of these mechanisms.
The HRV indexes are closely related to the autonomic nervous system, they can reflect the state of psychological stress in people, and are sensitive to changes related to the demands of the current activity and the effort required; therefore, they are widely used as indicators of workload [4], [8], [28]- [32].
It should be noted that although the study of HRV originates from a purely medical context and it is still analyzed and evaluated in clinical studies, its use in aviation is valid and demonstrated by several studies [10], [15], [16], [28], [30], [32]- [41].
HRV can be evaluated using different techniques. The most common are: a) the time domain analysis, b) the frequency domain (or power spectral density) analysis, and c) the non-linear indexes analysis. In the following paragraphs, the measures extracted from the signals acquired on the sample under examination are described in detail.

A. TIME-DOMAIN INDEXES
The time-domain analysis provides indexes that quantify the heart rate variability by using statistical approaches and geometric measurements (related to the density of distribution of RR values). Since these indexes are calculated with procedures that do not depend on particular cases of application, provided that the duration of the acquisitions is the same, it is possible to consider that the data are comparable with each other even if collected in research carried out in different times and contexts.
The main time-domain measurements include the computation of Mean RR, the standard deviation of the ''normal to normal'' series of intervals (SDNN), the square root of the mean squared differences of successive RR series (RMSSD), the number of successive differences that are greater than x milliseconds (NNx), and the percentage of total intervals that differ successively by more than x milliseconds (pNNx). For more details about the definition of such indexes the interested reader is referred to [42]. In this study, the RMSSD and the SDNN were considered. In particular, RMSSD is an estimate of the short-term components of HRV; SDNN, measured in milliseconds, reflects the influence of all nervous system-related factors contributing to HRV. It is worth noting that SDNN values depend on age and sex, and these factors should be considered whenever possible; of course it depends on the available population. A reduction in SDNN suggests activation of sympathetic regulation that inhibits the autonomic cycle. A strong reduction in SDNN results from a critical strain on regulatory systems and the involvement of higher levels of control and can be associated with an increase in workload level [42], [43].
An additional parameter considered in the present study is the one introduced by Baevsky in the space field for research on ECG signals of astronauts before and after space missions [44], [45]. The Baevsky stress index, hereafter referred to as SI, is an index that reflects the degree of centralization of heart rate control and essentially characterizes the activity of the sympathetic part of the autonomic nervous system. In this work, the square root of the Baevsky stress index was computed according to [46]. Of course, it should be noted that SI depends on the action accomplished. It makes a difference if you are sitting quietly or exercising. However, at least in general terms, the lower the SI value, the greater the number of difficult tasks a pilot can execute. The less variable the RR intervals are, the higher the SI scores will be.
Additional measurements of the geometrical characteristics are associated with the time domain RR variability. Indeed, starting from the RR histogram, it is possible to extract two measurements [42], the HRV triangular index (HRV ti ) and the triangular interpolation of the NN interval histogram (TINN). The HRV triangular index measurement is the integral of the density distribution divided by the maximum value of the density distribution. TINN is the base width of the distribution by approximating the NN interval distribution with a triangle.

B. FREQUENCY-DOMAIN INDEXES
Power spectral analysis allows obtaining information regarding the frequency and amplitude of each specific rhythm present in the ECG. The power spectrum can be divided into four frequency bands [42]: i) HF (High Frequency) frequencies between 0.15 and 0.4 Hz; ii) LF (Low Frequency) frequencies between 0.04 and 0.15 Hz; iii) VLF (Very Low Frequency) frequencies between 0.003 and 0.04 Hz; iv) ULF (Ultra Low Frequency) the band falls below 0.003 Hz. The HF band is mainly considered an expression of the activity of the Parasympathetic Nervous System (PNS) (Sztajzel, 2004). The LF band is considered mainly due to the activity of the Sympathetic Nervous System (SNS) [47]. The VLF band is only partly due to SNS activity [48]. The oscillations in heart rhythms within the ULF band are mainly due to very slow regulatory processes, such as regulating body temperature, metabolism, and circadian oscillations. Thus, given the very low frequency of the oscillations, the ULF band's contribution can only be appreciated in 24-hour acquisitions [49].
Considering that observation time intervals equal to 5 minutes are used in the present work, and according to the bands' classification of the cardiology task force [42], it is decided to use only the three bands VLF, LF, and HF. Moreover, the total power (TOT pow ), which is the sum of the energy in the VLF, LF, and HF bands for short-term recordings, is considered.
The LF/HF ratio, which is the power in the LF band divided by the power in the HF band [50] is considered an index of the global sympathovagal balance [47]. This ratio is very complex, since both sympathetic and parasympathetic activities are present in the LF band; therefore, it cannot be seen, in an absolute way, as a balance between the activities of the two categories of the autonomous system. This complicates the estimation of workload conditions; in fact, in some works, LF/HF increases as the workload increases [51]- [53]; whereas it decreases for Hsu et al. [54] it decreases. Considerations based on this relationship must be made knowing the LF band generation mechanism and the context in which the acquisitions are made [55].

C. NON LINEAR INDEXES
Nonlinear indexes used in HRV analysis include Poincaré plot, detrended fluctuation analysis, approximate entropy, and sample entropy calculation [56]. In the present work, the parameters extracted from Poincaré's graphical method are used. In the Poincaré diagram, the scattering of RRn (the time interval between two consecutive R peaks) over RRn+1 (the time between the two successive R peaks) is plotted [57]. An ellipse with a semi-major axis along the bisector of the axes plot can be reproduced on the graph. The standard deviations along the bisector identity line (SD2) and perpendicular to the bisector (SD1) represent the amplitudes of the major and minor axes of the ellipse, respectively. SD1 represents the standard deviation of the instantaneous beat-to-beat variability or short-term variability. SD2 represents the standard deviation of continuous or long-term variability [57]- [59]. The ellipse is primarily a visual aid and the numerical values of the SD1 and SD2 standard deviations contain the important data. Low values of SD1 indicate high levels of workload; in fact, this parameter is closely related to the SDNN baseline statistical measures. Table 1 provides a summary list of the indexes considered in the subsequent analyses.

A. PARTICIPANTS AND EXPERIMENTAL SET-UP
In this study, different flight phases, classified as Category B and C according to the taxonomy described in the Military Specification 8785C [60] are compared by using ECG indexes based on HRV measurement that are acquired during exercises carried out by using a ground-based full flight simulator. The aim is to evaluate how the trend of the physiological indexes change in the different flight phases. An EcgMove 3 sensor manufactured by Movisens GmbH with a sampling frequency of 1024Hz was used for the ECG data acquisition. 34 participants (29 males and 5 females) aged between 23 and 52 years (M = 33, SD = 8.37) are considered for this experiment. Their BMI was in a healthy average range for both males and females (Mmales = 25.18, SDmales = 2.71; Mfemales = 22.96, SDfemales = 1.96). None of them declared to have any diseases, illness, or particular conditions that could alter the responses to physiological measures. It is worth noting that a total of 37 pilots partecipated in the experiments, however, data beloging to three of them were found to be outliers and thus were excluded from the analysis, resulting in a population of 34 pilots. Each pilot signed an informed consent form before the start of the exercise. The experimentation was carried out by means of a ground based full flight simulator that replicates the business jet Cessna Citation 560 XLS. Figure 1 shows an example of a flight path considered in the analysis. The flight phases performed by each pilot under the same instructor directives are shown in blue, whereas the red dashed lines are connections between flight phases not evaluated in the present study. The authors chose the acronyms of different segments according to the ICAO ADREP Taxonomy. STD stands for the standing phase, the aircraft is on the runway head, and the pilot is staying for five minutes in a rest time seated position; TOF stands for Take-off; MNV is the flight phase associated with the turn maneuver, all the pilots performed two steady turns, the second one was in the opposite direction to the first; ENR was a segment of five minutes associated with level flight attitude; finally, the LDG is the landing flight phase. The segments length was selected based on a time interval of five minutes in order to compare the phases according to the minimum time recommended for HRV analyses [42]. It is worth noting that, concerning the handling quality classification [60], the terminal flight phases TOF and LDG are classified as Category C (since usually require accurate flight-path control), whereas ENR belongs to Category B nonterminal flight phases (which are normally accomplished using gradual maneuvers and without precision tracking). This gives an indication of the difficulty associated with the flight phases, and thus it can provide insight into the pilots' workload levels.

B. DATA ANALYSES
Before running the main statistical analyses, the first step is to make sure that our sample is in a resting state during rest time. To this purpose, a large set of physiological indexes during rest time was measured and compared with data obtained from other sources [61], [62], where these indexes were assessed in a quiet situation among healthy individuals. Data were obtained from open access Physionet database, an on-line collection of physiological and clinical data [63].
Mean scores on physiological indexes obtained from the total sample of 33 individuals were then considered as reference standards for the present study. To accomplish this goal, t-test independent samples is performed. To assess whether scores on the physiological indexes could be affected by demographic variables (sex and age), analyses of variances were conducted, taking into account mean scores obtained by the two subgroups (males vs. females, ≤ 30 years vs. > 30 years, respectively).
To evaluate the differences among scores on physiological measures at the five flight phases, a ONE-way ANOVA with repeated measures was conducted. A mixed designed ANOVA with repeated measures was then performed to assess whether changes in phases were invariant across the two different age groups. The Greenhouse-Geisser correction is applied for the degrees-of-freedom adjustment. Descriptives, F values, and effect size (η 2 ) were reported. A 5% significance level was adopted for all tests.
All the indexes were computed with the Kubios HRV -Heart rate variability analysis software [46], the data are processed using IBM SPSS Statistics (Version 20.0).

IV. RESULTS
A. REST TIME ANALYSIS Table 2 shows descriptives and the results from t-test analyses for all the selected HRV indexes. Group 1 is the sample of pilots, whereas group 2 is the sample of Physionet data. Results indicate that the two samples do not differ each other. As a matter of fact, by applying the t-test, all the comparisons show a p value greater than.05, suggesting that no statistical differences are found between the two groups. This means that participants of the current sample are actually in a rest condition during rest time. In addition, we can also assume that our sample is made up by healthy individuals. Table 3 lists the results of the ANOVA test for grouping based on gender. The results show that only scores on the Stress Index (SI) differed between males and females. This means that, except for this index, gender does not affect scores on physiological measurements, at least, considering the present sample.

C. RELATIONSHIPS BETWEEN AGE AND PHYSIOLOGICAL INDEXES
Regarding the associations between age and scores on physiological measurements, the sample was divided into two groups composed of individuals older than 30 years and equal or less than 30 years, respectively. The results in Table 4 show that only HF and LF/HF report significant differences between the two groups. The former is higher in younger individuals, whereas the latter increases with aging. Besides, LF/HF seems to be the most sensitive index to the age effect since it reports the highest effect size.

D. ONE-WAY ANOVA WITH REPEATED MEASURES ON DIFFERENT FLIGHT PHASES
In this section, the results ot the one-way ANOVA with repeated measures on different flight phases are reported. The findings of the analyses are shown in Table 5. After checking the univariate tests, all but the LF/HF report a significant effect (p value lower than 0.001), suggesting that there is a statistically significant difference between the mean scores on the five flight phases.
The flight phases pairwise comparisons considering each physiological index were inspected, using Bonferroni correction, in order to identify which mean scores were statistically different from each other.
A one-way ANOVA with repeated measures showed a significant main effect of flight phases on SI (F (4,132) = 25.261, p < 0.001). Significant differences in mean SI were found between STD and all the other flight phases (all differences were significant at p < 0.001), between ENR and MNV (p = 0.007), and between ENR and LDG (p = 0.001).
The results of one-way ANOVA with repeated measures revealed a significant main effect of flight phases on SDNN (F (2.804,92.524) = 41.995, p < 0.001). A statistical difference for mean SDNN scores was obtained between STD and all the other flight phases (all differences were significant at p < 0.001), between ENR and TOF (p = .002), between ENR and MNV as well as between ENR and LDG (p < 0.001). Regarding RMSSD, the findings indicated a significant main effect (F (2.899,92.695) = 12.803, p < 0.001). Statistical differences were found between STD and ENR (p = 0.007), between STD and TOF (p < 0.001), between STD and MNV (p < 0.001), and between STD and LDG (p = 0.001). Another significant difference in RMSSD mean scores was estimated between ENR and MNV (p = 0.033).
The findings showed a significant main effect for HRV triangular index (F (2.307,76.147) = 60.160, p < 0.001). Significant differences were estimated between STD and all the other flight phases, between ENR and TOF, between ENR and MNV, between ENR and LDG (all differences were significant at p < 0.001).
The results of the one way ANOVA reported a significant main effect of flight phases for mean TINN (F (2.853,94.148) = 25.443, p < 0.001). Significant differences were found between STD and all the other flight phases (all differences were significant at p < 0.001). In addition, a significant difference was observed between ENR and MNV (p = 0.002) and between ENR and LDG (p = 0.003).  VLF power resulted in a (F (2.803,92.504) = 9.192, p = .000), statistical differences were estimated between STD and TOF (p = 0.003), between STD and MND (p = 0.004), and between STD and LDG (p < 0.001).
Regarding the LF power, the results of one-way ANOVA with repeated measurements showed a mean significant effect (F (2.300,79.909) = 30.820, p < 0.001). The pairwise STD and ENR reported statistical differences (p = .007), whereas the pairwise STD with TOF or MNV, or LDG reported values of significance p < 0.001. Other significant differences were found for the LF index between ENR and TOF with (p < 0.001), and between ENR and MNV or LDG (p = 0.003).
Concerning SD1, the results of one way ANOVA with repeated measures report a significant mean effect (F (2.989,95.645) = 12.813,pp < 0.001). Significant differences are shown between STD and ENR (p = 0.007), between STD and TOF or MNV results (p < 0.001), and between STD and LDG (p = 0.001). Another significant difference was also estimated in the comparison between ENR and MNV (p = 0.032), whereas no statistical differences were found with TOF or LDG.
Lastly, the results of one way ANOVA with repeated measures indicated a significant mean effect for SD2 (F (2.849,94.028) = 53.041, p < 0.001). Statistical differences were reported between STD and all the other flight phases, and between ENR and TOF, ENR and MNV, and between ENR and LDG (all differences are significant at p < 0.001). Table 6 shows a synthetic resume scheme of comparisons between each flight phase for each index, in which statistically significant differences are indicated by the boolean marker ''1''.
Lastly, the results of the mixed designed ANOVA test reported no significant interaction between age and flight phases, Wilk's λ = 0.670, F (48,452.735) = 1.035, p = 0.413, partial η 2 = 0.095. This means that changes in mean scores on the physiological measures over the different flight phases were equivalent across the two age groups. Figure 2 shows the notched boxplots normalized to the reference baseline of the STD to produce a visual interpretation of the values of the HRV indexes obtained over the different flight phases. The results described in the previous paragraph have highlighted the significant differences obtained between the various phases; however, to identify whether the workload levels were higher or lower, the trend of each index was analyzed over the flight phases. In Figure 2, a red line between the values of the medians is shown. Analyzing the indexes' tendencies, it is possible to view how all indexes have a well-defined trend between the baseline and the ENR phase and with the three phases of TOF, MNV, and LDG, which do not exhibit noticeable variations between them. This suggests that the workload levels are comparable for the Category C terminal phases of TOF and LDG and a constrained maneuver such as the steady turn.

E. FLIGHT PHASES COMPARISON
This holds for all the indexes with the exception of LF/HF because in the pairwise comparisons no significant differences have emerged using one-way ANOVA with repeated measures. In Figure 2 this effect is noticeable as the fluctuation of the index between ENR and MNV compared to TOF and LDG. Concerning SI, which shows an increasing trend compared to the other indexes, it should be recognized that an increase in SI is associated with an increase in workload levels; hence, the results agree with the other indexes.
Considering the first five indexes in Table 6, namely SDNN, SD2, LF, TOT pow , and HRV ti , Figure 2 presents a clear trend with respect to the baseline. For these indexes, the whiskers of dispersion are more contained than those for the other indexes that present at least one flight phase with a much greater variance than the other or with skewed data. Concerning LF and TOT pow , the ENR whiskers have a greater extension than the other phases; this effect is also found in all the other frequency bands.

F. WORKLOAD ASSESSMENT
A preliminary method of workload levels estimation, based on specific considerations for the first five indexes listed in Table 6, is detailed in this section. These indexes show similar results in the pairwise comparison of the flight phases.
In particular, based on what was mentioned in the previous section, the TOF, LDG, and MNV flight phases can be associated with a high workload level. With the purpose of using three different workload levels (low, medium and high), the data of the three high-workload phases were merged at a high level. See Figure 3 that shows the trends of the previously selected indexes. The y-axis is named AWL-Allowable WorkLoad, meant by authors as the percentage amount of workload that the subjects are still able to use. Thus, a value equal to one (i.e., 100%) is associated with the lowest level of workload. The deviation with respect to the mean value is computed as 0.5 times the standard deviation of the acquired sample. Three different bands are drawn with color gradation, and the highest workload assessment zone is colored in red. Except for the green and yellow bands in the HRV ti index, all others bands show an overlapped transition band from lower to higher workload level.
A sixth further index is introduced in Figure 3, built by combining the previous five selected indexes; this index is named PPM -Pilot Performance Monitoring. The PPM is calculated starting from the normalized values for the baseline rest time of the five indexes considered and summing the contributions of each index for the corresponding difficulty level. The overlap thresholds shown in Figure 3 for PPM were obtained considering a standard deviation of 0.5, which results in central overlapping bands that differentiate the AWL thresholds more systematically than the results obtained by the individual indexes.
A multivariate analysis of variance was conducted to test the overallsignificance of the three workload levels among the five selected measures. Setting a level of α = 0.05, a significant main effect was estimated, Wilks' λ = 0.105, F (10,24) = 20.446, p < 0.001, partial η 2 = 0.895, meaning that the combination of the five selected physiological parameters provides a useful discrimination of the three different workload levels. In addition, the results showed that there were statistical diferrences among the three different workload levels for each measure (Table 7).
Wilks' λ value in a multivariate test does not provide explicit information on how the indexes contribute to differentiating the workload levels for the analyzed case study. Wilks' λ give only the knowledge thet there is a significant effect on at least one of the dependent variables taken into consideration.
Thus, repeated contrasts were chosen; each contrast compared the index's value at one level to the index's value at the following level, and significance values were computed to verify that all the indexes can recognize the differences.
In the low-medium comparison (L-M), the two contrasts listed in Table 8, TOT pow and LF, had significance values greater than or equal to 0.001. The cause of reduced significant effect in these two comparisons is detectable from the analysis of boxplot data described in the previous paragraph flight phases comparison, in which TOT pow and LF reported higher whiskers due to the variability of acquired values in the ENR flight phase.
Nevertheless, by identifying possible outliers in the sample for each level using the multivariate Mahalanobis distance test, four pilots were found in at least one of the levels with remarkably high distance values. Excluding these pilots and reanalyzing the sample, based on 30 pilots, all indexes exhibited significant differences in the comparison between levels. The results for the sample of 30 pilots are shown in Table 9. Descriptives, F values, and effect size (Partial η 2 ) were reported for the analyzed groups.

V. DISCUSSION
Concerning the influence of sex (data reported in Table 3), except for the SI, none of the investigated physiological indexes showed statistical differences between males and females. However, a recent meta-analysis [64] systematically reported the relationship between HRV indexes and sex. For instance, analysis of time-domain HRV indexes showed a more decreased mean score on SDNN in females than in males, whereas analysis of RMSSD yielded no  significant or inconsistent findings. Furthermore, regarding the frequency-domain HRV indexes, females compared to males reported lower scores on VLF, LF, LF/HF ratio, TOT pow , and higher HF [64]. These differences between the present results and the literature may be partly due to the participants' characteristics of the examined studies, which may have affected HRV indexes scores between the two sexes. To include these effects in the workload assessment, it is important to investigate a sample of pilots that includes more females than those involved in the present study. The authors suggest considering these findings with some caution since they may be influenced by the high disproportion of the number of male and female participants. However, it is worth noting that the percentage of females involved in the experiment is in line with the current civil pilots population [65], [66].
Regarding the influence of age, in the present study, only HF and LF/HF showed significant differences between the two age groups, see Table 4. HF was negatively associated with age, suggesting a reduced autonomic tone of the heart with an increase in age. On the other hand, positive associations were estimated with LF/HF, indicating an increment of sympathetic activity with an increase in age. These results agree with previous empirical works [67], [68], which found associations characterized by similar trend. Nevertheless, the results of the current study do not support the existence of any influence of age on the other physiological indexes. There is a possible explanation for these findings. For instance, in some studies, age is negatively related to SDNN, to RMSSD, and TOT pow [67]- [69], but it is important to outline that in the aforementioned studies, the participants' age groups were very larger and distinguished (aged from 16 to 60, from 40 to 100, and from 10 to 80, respectively). In contrast, the present study sample was restricted to only adult individuals (min age = 23 years, max age = 52 years). It is likely that, considering different age groups, significant results would have been found. Nevertheless, it is worth noting that the age of pilots involved in the present study denotes a close proneness to a realistic training group.
Concerning the trend of the HRV indexes over the flight phases, some general aspects can be highlighted. In particular, it is possible to note how an increase in the workload affected each index. Figure 2 showed for the SI index a different response from the other indexes; in fact, the value for the in-flight phases was greater than the baseline. This result agrees with the pilots' physiological response since, according to Baevsky et al. [45] SI increases as the workload increases. By analyzing the time domain indexes SDNN and RMSSD, it can be observed that they had a decrement related to an increase in workload level; comparable results were found by Cinaz et al. [70] and Orsila et al. [43]. The differences in the mean values were estimated in the present study with a significant difference for SDNN, but even to the decrease in RMSSD according to Cinaz et al. [71] and Fallahi et al. [52] should be related to an increase in mental workload. Similarly, a decrease in the geometric index is related to an increase in workload [70], [71]. More specifically, a decrease in the HRV ti index is associated with an increase in the workload levels. This resulted in a significant difference in the pairwise comparisons of flight phases according to Mansikka et al. [4], [5]; the TINN index resulted in a weak significant difference in the pairwise comparisons of flight phases, which is in accordance with Cinaz et al. [71]. Regarding the frequency domain, the results are largely in agreement with the reference literature. In particular, the obtained results showed that a decrease in both LF and HF bands reflects an increase in the workload. Regarding these two frequency bands also Bonner and Wilson [28] found a significant decrease as the workload increased.
Separate considerations must be considered for the LF/HF ratio. The results of the present work show that all the flight phases compared with the baseline have an average level that is higher than the baseline itself. This means that the workload level during rest time is lower than during flight phases; however, for the pilots and flight phases analyzed, the LF/HF ratio is inadequate to differentiate between medium and high workload levels. It is worth mentioning here that the results reported in the scientific literature about the relationships between the LF/HF ratio and workload levels are not always in agreement. In some previous works [30], [51]- [53] it was reported that an increase in the LF/HF ratio is associated with a workload increase. Conversely, the opposite trend was reported by Hsu et al. [54]. The present study as well does not reveal a distinct relationships between WL and flight phases based on the examined sample.
The other two indexes, VLF and TOT pow , examined in the frequency domain, reported a decrease associated with an increase in workload, and similar results were found by VOLUME 10, 2022 Heine et al. [72] and Miyake et al. [73]. More in particular, the decrease in VLF results is sensitive enough to differentiate the maneuvers associated with high mental workloads, this is in accordance with Hsu et al. [54]; TOT pow presents a narrower variance and a significant difference in pairwise comparison of flight phases, and for this reason, it is an appropriate index for the workload level inspection.
Finally, the non linear indexes computed on Poincarè plot, SD1, and SD2 showed a reduction in the median values that should be related to a workload increase as found by Acharya et al. [74]. Both SD1 and SD2 reduced as the workload increased, so the Poincarè ellipse contraction can be related to variations in the workload levels.
As a further result, all the considered HRV indexes' analysis proved that takeoff and landing maneuvers, belonging to Category C handling qualities classification, have similar workload levels to a double steady turn maneuver; therefore, it is possible to consider these three maneuvers as a group to high demand workload.
Finally, with the aim of differentiating among low, medium and high workload levels, the indexes SDNN, SD2, LF, TOT pow , and HRV ti were selected in the present work because they showed a statistically significant decrease with workload increase and because they allowed the estimation of the three different pilots AWL bands. In addition to these findings, an index named PPM -Pilot Performance Monitoring was introduced as the sum of the five HRV indexes. For the considered case study, the PPM seems to asses more consistently the allowable workload level. These results can be considered a step forward in the identification of the workload level that pilots can experience during flight or training since, to the best knowledge of the authors, this is the first study that produces consistency in the differentiation among low, medium, and high workload levels using the time, frequencies, and non-linear HRV domains of analysis.

VI. LIMITATIONS AND FUTURE DIRECTIONS
The main limitation of the study could be seen in the low number of participants which could limit the generalization of the findings beyond the study sample. Nevertheless, it is worth noting that the sample size is similar to the ones reported in other previous works about flight simulation activities and that the results are in line with the literature [4], [8], [52], [54].
A second limitation concerns the inclusion of standard maneuvers, excluding from analysis any non-standard maneuvers, which may be likely associated with higher workload levels. From this perspective, it will be necessary to verify what occurs with much more challenging maneuvers, such as the final approach with low visibility or one engine inoperative.
Another limitation may be that the experiments were carried out using a ground-based simulator instead of real flying aircraft. However, it is clear that using a ground-based simulator is consistent with first steps research studies to lay down the basis of future experiments to be carried out using real aircraft simulator systems such as the Calspan's Learjet Simulators.
Additional studies should also be addressed at taking into account shorter time intervals than five minutes to investigate whether the same conclusions can be drawn also in a different time frame.
For future testing, the optimal and recommendable choice is the detection of multiple HRV parameters, since a single indicator may not be able to fully describe the workload. From this point of view, the information coming from different indexes can converge in the same direction, providing a more detailed description of the investigated workload assessment.

VII. CONCLUSION
This study can be considered a contribution to workload assessment using physiological measures. Specifically, considering time-domain, frequency-domain, and non-linear HRV indexes in a unique experimental design, these findings offer a more comprehensive overview of the different HRV indexes for workload evaluation. These results support the importance of parameters based on HRV measurement for assessing changes in workload, also suggesting the development of non-obtrusive devices that can assess workload in real-time.