Towards Estimation of Emotions From Eye Pupillometry With Low-Cost Devices

Emotional care is important for some patients and their caregivers. Within a clinical or home care situation, technology can be employed to remotely monitor the emotional response of such people. This paper considers pupillometry as a non-invasive way of classifying an individual’s emotions. Standardized audio signals were used to emotionally stimulate the test subjects. Eye pupil images of up to 32 subjects of different genders were captured as video images by low-cost, infrared, Raspberry Pi board cameras. By processing the images, a dataset of pupil diameters according to gender and age characteristics was established. Appropriate statistical tests for inference of the emotional state were applied to that dataset to establish the subjects’ emotional states in response to the audio stimuli. Results showed agreement between the test subjects’ opinions of their emotional state and the classification of emotions according to the range of pupil diameters found using the described method.


I. INTRODUCTION
Humans commonly express their emotions using facial, gestural, verbal, and written communication. However, all of these methods can be controlled and, therefore, can hide the true feelings of the subject. Hence, the focus of research has been diverted to extracting emotions from involuntary responses, such as heart rate variability (HRV), skin temperature (SKT), electroencephalography (EEG) and, herein, from eye pupil responses. Even if involuntary responses are utilized, the hardware needed to identify such responses may inhibit a user's free movements when expressing emotions [1], which clearly impedes that hardware's use in a clinical setting, which could include end-of-life care. Systems based on involuntary responses are nonlinear systems, and various methods have been investigated for estimating parameters that depend upon such nonlinear data [2]. Accordingly, in the presented study based on pupillometry, calculated pupil diameters and other statistical features have been used as parameters.
However, pupillometry, as described in this study, represents a non-invasive method of monitoring patients and their caregivers, so that their emotional state can be judged remotely. Thus, this study demonstrates automated classification of emotional states to aid remotely observing clinicians.
The associate editor coordinating the review of this manuscript and approving it for publication was Derek Abbott.
Over the past few decades, many researchers, especially psychologists, have explored the reasons behind changes in pupil diameter and have found that those changes could be an index of cognitive function. The pupil within the human eye is the opening of the iris that permits light to enter the eye and reach the retina, allowing humans to see. The diameter of the pupil is controlled by two sets of smooth muscles in the iris. Constrictor muscles decrease its diameter, which is called pupil constriction, while dilator muscles increase the diameter, which is termed pupil dilation [3]. An example of different dilated pupils is shown in Fig. 1. The constriction and dilation of a pupil are entirely controlled by motor sensors linked to the brain, meaning that they operate autonomously and, hence, cannot be consciously controlled by humans [3].
Broadly, the pupil dilates when people are more attentive, emotional, have a greater cognitive load [4], or are feeling aroused. Humans themselves are commonly unaware of these changes happening to their eyes. Pupils dilate and constrict continuously when humans are thinking hard, feeling pain or happiness, taking drugs, watching advertisements and for many more reasons, see [5]-[7] and [8] respectively. Pupillometry is the study of such changes in pupil diameter as a function of different neurological activities in a human brain. In medical terms, a pupil diameter can vary from 1.5 mm to 9 mm and reacts to stimulation in approximately 200 ms. A normal pupil is approximately 3 mm in size during standard light conditions [9].
In this study, to examine the emotional responses of test participants, audio stimuli were applied. Naturally, within a clinical setting, emotional responses would normally come from elsewhere, but for the purpose of this study's test, they represented convenient and standardized stimuli. Audio/video stimuli given to a person of any gender, whatever their age and the activity level of their brain, may change their thinking ability. Moreover, such stimuli may cause many different physiological changes in a subject's mood and body. Given that, a test subject experiences many emotional responses. In the current study, six emotions were considered for the identification of human emotional behaviour by means of the eye's behaviour, i.e., happiness, sadness, anger, irritation, disgust, and fear.
The two primary research contributions of this current study are as follows:
• Identification of a range of different emotions based on pupil diameter;
• Extraction of pupil diameter by means of a simple camera so that the diameter can be used for medical, or other, needs.
More specifically, with respect to the first contribution mentioned above, this study comprises a proof of concept that capturing eye pupil responses using a tiny Raspberry Pi board camera is feasible. Such a camera could easily be fixed in goggles, for example, in a future wireless sensor network application. In addition, in terms of implementation, the study contains a demonstration of video/image processing, allowing accurate pupil-diameter calculation. In terms of practical methodology, the study contains a comparison of voluntary and involuntary responses of a subject, in this case using an audio stimulus; identification of the underlying data characteristics required for statistical analysis of pupil responses; and application of parametric and non-parametric statistical tests to perform statistical validity inference.
The remainder of the study is organised as follows. Previous research and findings are discussed in Section II. Data collection, video images of eye pupils, and extraction of features, i.e., pupil diameters, are then described in Section III.
After that, in Section IV, pupil diameters are analysed to determine the required output, while Section V contains a detailed discussion of the results. Concluding remarks, including future research directions, are given in Section VII.

A. PUPILLOMETRY
Pupillometry involves measurement of a pupil's diameter and can be applied in security applications [10], traffic safety [11], a clinical setting [12], psycho-physiology [13], and psychology [14]. Here, we examined recognition of a human's emotional state by means of pupillometry. In that respect, this study is most directly related to research on psychopathy [15], as well as helping visually and mentally impaired patients. However, unlike that research, which employs costly eye trackers or other such mechanisms [16], [17], the current research was designed to obtain quality results, while at the same time operating with simple cameras at low cost. In fact, the Raspberry Pi board can be extended via a wireless sensor network [18], which in turn might form part of an Internet of Multimedia Things (IoMT).

B. APPLICATIONS OF PUPILLOMETRY
Initial research findings have indicated that pupil dilation is a measure of emotion, cognitive load [19] and memory recognition [20]. Greater arousal is indicated by increased pupil dilation, which in turn indicates decisions and choices are being made by the subject, see [21] and [22]. Researchers have investigated the size of human pupils, comparing pupil dilation to constriction [23]. A subject's pupils tend to dilate more during recognition of relatively older memories [23]. In other words, pupil dilation is greater when a subject is presented with older or more well-established memory traces rather than newer stimuli. Indeed, a prominent difference in pupil behaviour was observed in many early studies, for example, as part of gender-based comparisons [24]. In the same era, pupil sensitivity was identified as different from one person to another during various tense situations, such as during muscle tension when lifting weights or in response to the threat of a gunshot [25]. More recently, an important application of pupillometry has been the detection of drug use [6].

C. PUPILLOMETRY FOR RECOGNITION OF EMOTIONS
A strong trend in pupillometry research and eye behaviour in general has been emotion recognition, which has been enhanced by the ability to classify more categories of emotion and by the use of different analytic techniques. A summary of such investigations is provided in Table 1.
An important application of the investigations of Table 1 is to provide a communication medium for those who otherwise experience difficulties in expressing their feelings, especially during clinical screening and therapy sessions. For example, one such investigation [37] analysed experimental results to provide an interface for health care. An exploratory experiment was conducted for the development of a screening program for people with such disabilities. The interactive computer-based screening program was comprised of sets of stimuli designed to elicit emotional responses from subjects.
VOLUME 9, 2021
Many eye trackers have also been developed [45]-[47] to provide assisted living. Eye-tracking applications are also relevant to multiple other domains, such as psychology, engineering, advertising, and computer science [48].
Recent literature has shown that the human eye pupil has a certain 'language', which can be translated into different emotions. The translation of those emotions using pupillometry has taken place for several applications using multiple analytic techniques, but the translation mechanism is not completely reliable. In addition, a good number of those techniques require expensive and customized sensors, whereas in this study, passive sensing by visual means is employed using a low-cost sensor board, which includes wireless communication capability. A demonstration of the feasibility of using the low-cost board is now given by way of experiments on a total of 32 subjects (27 for 12 audio clips and 32 for 2 audio clips).

III. EXPERIMENTAL PROCEDURE
The experimental procedure comprised the acquisition/collection of data and its subsequent processing, which consisted of feature extraction from the captured video followed by statistical analysis of the entire dataset, as described below.

A. EXPERIMENTAL SETUP
The tests described in this study were constrained by the need to operate on typical video sensor devices, as are used in wireless sensor networks feeding into an Internet-of-Things (IoT) [49] (or, more precisely, an IoMT [50]). For the purposes of the calibration, test subjects were carefully positioned. The entire experimental setup is shown in Fig. 2.
The experimental setup was built to carefully record the pupil response of each subject. Subject heads were positioned at an angle of 93 degrees and at a distance of 17 cm from a camera. Note that 17 cm was a comfortable position in the test scenario. Since the prototype can eventually be mounted on goggles or glasses to make the participant comfortable, the separation distance will become irrelevant. A 5 megapixel (MP) camera with a wide-angle fisheye lens (A) was used in combination with a Raspberry Pi-3 board (B) [51] for capturing videos of subjects' pupils. The pupil images were captured in a room with open windows in normal daylight. Furthermore, emotional responses were captured for audio stimuli, and therefore, subjects were not affected by the lighting when responding to the stimuli.
Note in Fig. 2 that an IR camera was chosen. An IR camera generally leads to a clearer image of an eye pupil, either in comparison to the lower resolution of an RGB 5 MP camera or the higher resolution of an RGB 16 MP camera, whatever the illumination, as illustrated in Fig. 3. Using a camera with an IR sensor in the experiments made the colour of the iris irrelevant, as different coloured irises appeared light grey. Therefore, this made it easier to identify pupils and calculate their diameters. (Note that this research utilizes cost-effective equipment. Consequently, cameras with higher than 16 MP resolution would violate this requirement.) Recently, IR digital cameras have become more widely available, and their usage for pupillometry has been established [52]. Aside from medical usage, where portability is important, they also seem suitable for IoMT deployment. On the other hand, a camera with 15-20 MP resolution (infrared or otherwise) might be used for this purpose but may prove difficult to handle for the given type of experimentation. Last, note that in part B of Fig. 2, alongside the central IR camera connected to the main board by a ribbon cable, two circular objects can be identified. These are, in fact, IR LED flashlights. Their main purpose is to shine a light on a subject to convey a clear image in any lighting condition.
After taking videos with the IR camera, the recorded videos were then fed into the data processing module, where image processing methods were employed to accurately calculate pupil diameter in every video frame. Since we had raw pupil readings, pre-processing was performed to remove various defects, including eye blink, missing values and saccades. Linear interpolation and baseline correction were also performed to obtain smooth pupil responses for analysis.
Within an IoMT, nodes are commonly based upon Raspberry Pi [51] boards and CMOS circuitry with restricted computing and communication, and consequently, are not powerful enough for complex computations. In fact, these nodes, for ease of placement, may be battery-powered, which is an additional constraint upon complex computations. In addition, if feature extraction, noise removal and subsequent data processing (see Section III) were to occur remotely, then the response may well need to be in near real-time to allow clinicians to respond to some form of emotional upset, or even crisis. However, some deep-learning techniques are not without their disadvantages [53] for real-time responses, such as a need for parallel computing, possibly on graphical processing units (GPUs), especially during training, and a need for large amounts of labelled data. On the other hand, it is important to note that deep learning models (based on Convolutional Neural Networks) can also be trained using limited ground-truth data (through transfer learning, data augmentation, and other recent regularization approaches), and, in those circumstances, can make inferences very quickly in hardware-constrained execution environments.
In fact, this study aimed to develop an automated statistical mechanism suitable for IoT to estimate emotions by understanding the average pupil size variation, this being the primary motivating factor behind the research. We hypothesized that different subjects' pupils respond similarly, that is, they constrict or dilate during an emotional response. Various second and third order statistical tests were tried out. Thus, during data processing, a Kolmogorov-Smirnov test was applied to identify normality; for moments, the kurtosis and skewness were found; a check on the homogeneity of variance was followed by repeated measures analysis of variance (that is, via an rmANOVA test); and Kruskal-Wallis and Friedman tests were applied to infer whether the results were statistically valid. Last, to infer emotional behaviour based on gender and age, a Wilcoxon rank test was used. Within an IoT, it is possible that pre-processing using image-processing techniques on video frames might take place locally, while once features were extracted, images with reduced data could be sent over a wireless network. The authors have previously investigated [54] lightweight encryption for privacy protection of video communicated over IoMT networks. Ultimately, that can take place once the feasibility of pupillometry and the scale and structure of data processing are established.

B. DATA ACQUISITION

1) SUBJECTS
Subjects were instructed to avoid unnecessary eye and head movement during the data collection phase. Some subjects had difficulty with eye fixation (a lack of maintaining focus on the camera) and teary eyes. Therefore, data for three subjects out of twenty-four was removed due to those limitations. The entire data collection session lasted approximately 10 to 14 minutes per subject.
Thus, twenty-seven mentally and physically healthy subjects initially took part in this experiment (later increased to 32 for experiments on the fear and disgust audio stimuli), consisting of eleven males (mean age = 25.64 years, with Standard Deviation (SD) = 6.87) and sixteen females (mean age = 25.20 years, SD = 7.07) with a minimum of 12 and a maximum of 39 years of age (the overall mean age = 25 years, with an SD = 5.9). All subjects voluntarily participated in this research, among which 22.2% had either myopia (near-sightedness) or hyperopia (far-sightedness) of up to 0.75 m and were, therefore, using glasses. However, the rest of the subjects had normal vision. As the experiments did not involve any video stimulus, vision problems did not affect data characteristics.

2) STIMULI
Fourteen audio stimuli were used, including twelve sounds taken from the International Affective Digitized Sounds (IADS) [55]. Selected sounds were chosen based on the mean values of pleasure and arousal related to their respective emotion categories, with high pleasure and high arousal depicting the happiness emotion, low pleasure and low arousal depicting the sadness emotion, and low pleasure and high arousal representing the emotions of anger, irritation, disgust and fear. Importantly, irritation and disgust were categorized as secondary emotions of anger [56], and horror was categorized as a secondary emotion of fear [57], [58].
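The selection rule described above can be sketched as a simple lookup on the mean pleasure and arousal ratings. This is only an illustration and not the authors' selection procedure: the midpoint of 5.0 assumes a 9-point rating scale, which is not stated in this paper, and the function name and example values are hypothetical.

```python
def emotion_quadrant(pleasure, arousal, midpoint=5.0):
    """Map mean pleasure/arousal ratings to the emotion group named in the text.

    The midpoint is an assumption (centre of a 1-9 scale), not a value
    taken from the paper.
    """
    high_pleasure = pleasure > midpoint
    high_arousal = arousal > midpoint
    if high_pleasure and high_arousal:
        return "happiness"
    if not high_pleasure and not high_arousal:
        return "sadness"
    if not high_pleasure and high_arousal:
        # The text groups these four emotions in the same quadrant.
        return "anger / irritation / disgust / fear"
    # High pleasure with low arousal is not assigned to any category in the text.
    return "not used in this study"


# Hypothetical ratings, for illustration only.
example = emotion_quadrant(pleasure=7.5, arousal=6.8)
```

Because four of the six emotions share one quadrant, pleasure and arousal alone cannot separate them; in the text, that separation comes from the predefined emotion category of each sound.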
The categories used in this research were happiness, sadness, anger and irritation, whereas emotional sounds for disgust and fear were created in-house and validated by the process of Content Validity Index (CVI) [59]. CVI is a process of taking more than 10 experts' opinions on a relevant rating scale and then checking that data for their validity [60].
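As an illustration of how an item-level CVI is commonly computed, the sketch below takes the proportion of experts who rate an item as relevant. The rating scale and cut-off are not given in this paper; a 4-point relevance scale with ratings of 3 or 4 counted as relevant is a widespread convention and is assumed here, and the example ratings are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Item-level content validity index: share of experts rating the item relevant.

    `relevant_from` is an assumed cut-off on an assumed 4-point scale.
    """
    if not ratings:
        raise ValueError("at least one expert rating is required")
    relevant = sum(1 for rating in ratings if rating >= relevant_from)
    return relevant / len(ratings)


# Hypothetical ratings from eleven experts for one in-house sound.
fear_clip_ratings = [4, 4, 3, 4, 3, 4, 4, 2, 4, 3, 4]
fear_clip_cvi = item_cvi(fear_clip_ratings)
```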
In total, fourteen stimuli were used, with a total of six different emotion categories, as shown in Table 2. The stimuli were input via ear pieces to both ears at a constant volume level of 60 dB so that the sound could be heard clearly, without any distortion or interruption.

3) VIDEO RECORDING
Rating proforma to rank all the audio stimuli were also given out at this stage. A cue was given to each subject before starting to play each audio stimulus. Each subject, after listening to an audio recording, ranked that particular sound on the given rating proforma. The next sound was played only when a subject said they were ready to listen. An average time of 6 s was taken by all subjects to rank each sound. This time also allowed subjects to move their eyes and head to prevent fatigue. The lighting of the room was kept constant for all subjects.
During this process, videos of the subjects were captured and compressed with an H.264/Advanced Video Coding (AVC) standardized codec [61]. H.264/AVC as a codec is a good compromise between the greater compression efficiency of the more recent High Efficiency Video Coding (HEVC) standard [62] and the need to restrict the computational overhead and latency so that it is within reach of constrained IoMT devices [63]. The H.264/AVC compressed bitstream was encapsulated in MP4 containers to form video files, using the well-known FFmpeg software tool. The MP4 container offers minimal overhead, while being widely compatible with a variety of media players. The relative lack of header overhead also makes it suitable for transport over lower bandwidth wireless sensor networks.

C. IMAGE PROCESSING

1) FEATURE EXTRACTION
Video data were captured using a Raspberry Pi camera V2, having a resolution of 1080 × 1920 pixels/frame and utilizing the maximum frame rate of 25 frames/s (fps). All captured videos were stored separately for each subject's eyes per stimulus, awaiting extraction of the subjects' pupils from the eye images. The latter was accomplished using MATLAB's image processing toolbox, which subsequently served to extract a pupil's diameter.
Notice that, though a video of both eyes was captured using the IR camera, only the left eye's pupil diameter was utilized for this research. In fact, the diameters of both eyes were measured, but it was found and confirmed that there was no difference between the two diameters. Therefore, to avoid replication of the results, only one eye was chosen. Furthermore, the left eye was chosen (see Fig. 1) because the selection of that side's eye had already been established in the literature, see [37].

2) STEPS IN IMAGE PROCESSING
Algorithm 1 summarizes the steps taken during image processing.
In more detail, processing proceeded as follows, according to Fig. 4's numbering.
1) Initially, for a particular stimulation, RGB images from all frames were processed. The colour contrast was enhanced.
2) RGB to grey scale conversion was performed to maintain the luminance but to remove hue and saturation from each coloured image.
3) The image was converted into binary or monochrome form (black and white) to aid identification of the circular region containing the pupil. The iris surrounding the pupil was reduced to black because the conversion threshold was set close to black. In fact, for each subject, from the initial frames for that subject, five pixels were selected whose intensity value was less than 20, which was already known to approximately represent a pupil's area. The arithmetic mean of these five pixels was used as a threshold to aid in segmenting the pupil in the image. Therefore, currently the threshold value is relative to each participant, though future work will investigate how to completely automate this part of the processing. After conversion to monochrome format, the image was then inverted so that white became black and black became white.
4) If the inverted image still contained some black and white 'spots', including reflections within a pupil, then those spots were filled with their background tone if their size was less than 20 pixels, a value found heuristically.
5) The roundness of shapes identified within the inverted image was identified by means of a roundness function. If the roundness value was near to one, then a shape was declared to be a circle. Otherwise, it was designated to be a random shape. At this stage, the boundaries of circles were identified. In part 5 of Fig. 4, the value of the roundness is marked against the pupil shape identified by MATLAB. In the example, the value is 0.94. Any shape with roundness values close to one is identified by the roundness function as a circle.
6) To determine the diameter of a circle, the built-in MATLAB function regionprops was applied. In general, this function helps in measuring the diameter, area, centroid, bounding-box, perimeter and many other properties of any shape. However, for this experiment, it was the diameter that was used. An example of a diameter measurement calculated through MATLAB is provided in part 6 of Fig. 4. All of the properties found with this function resulted in measurements given in units of pixels.
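Steps 3, 5 and 6 above can be sketched on a synthetic frame as follows. This is a simplified, standard-library illustration and not the MATLAB implementation used in the study. The function names and the synthetic image are hypothetical, and thresholding and inversion are merged into one mask. The roundness formula 4πA/P² and the equal-area (equivalent) diameter are common choices that are assumed here rather than taken from the paper, and the spot filling of step 4 is omitted.

```python
import math


def pupil_threshold(dark_samples):
    """Arithmetic mean of sampled dark pixel intensities (five samples below 20 in the text)."""
    return sum(dark_samples) / len(dark_samples)


def binarise(gray, threshold):
    """Return a mask in which 1 marks pixels at or below the threshold (candidate pupil)."""
    return [[1 if pixel <= threshold else 0 for pixel in row] for row in gray]


def roundness(area, perimeter):
    """Assumed roundness metric 4*pi*area / perimeter**2; it is 1.0 for a perfect circle."""
    return 4.0 * math.pi * area / perimeter ** 2


def equivalent_diameter(area):
    """Diameter, in pixels, of the circle whose area equals the region's area."""
    return math.sqrt(4.0 * area / math.pi)


def synthetic_eye(size=101, radius=20, pupil_value=10, iris_value=120):
    """Hypothetical grey-scale frame: a dark disk (pupil) on a lighter background (iris)."""
    centre = size // 2
    return [
        [pupil_value if (x - centre) ** 2 + (y - centre) ** 2 <= radius ** 2 else iris_value
         for x in range(size)]
        for y in range(size)
    ]


gray = synthetic_eye()
threshold = pupil_threshold([8, 10, 12, 9, 11])   # five hypothetical dark samples
mask = binarise(gray, threshold)
area = sum(sum(row) for row in mask)              # pupil area in pixels
diameter_px = equivalent_diameter(area)           # close to 2 * radius for a clean disk
```

On real frames the perimeter passed to `roundness` would come from a boundary-tracing step, which is outside the scope of this sketch.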
Last, the calculated diameter in pixels was converted into the desired unit of measurement, millimetres (mm), to perform further processing. In fact, one pixel is equal to 0.26458333 mm; for example, a diameter of 20 pixels corresponds to 20 × 0.26458333 = 5.29 mm.
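The conversion can be written as a one-line helper. The factor is the one quoted in the text; the function and constant names are hypothetical.

```python
MM_PER_PIXEL = 0.26458333  # conversion factor quoted in the text


def pixels_to_mm(diameter_px, mm_per_pixel=MM_PER_PIXEL):
    """Convert a pupil diameter from pixels to millimetres."""
    return diameter_px * mm_per_pixel


diameter_mm = pixels_to_mm(20)  # the worked example from the text, about 5.29 mm
```

Note that the quoted factor is numerically equal to 25.4/96, that is, the width of one pixel at 96 pixels per inch.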

3) COMPUTATIONAL OVERHEAD
The theoretical time complexity of the algorithm is O(n) for a video per session. Segmentation of the pupil is facilitated by binarisation of the coloured video frame and, therefore, the complete segmentation and diameter calculation step takes O(width × height), which is linear with respect to the number of pixels of a video frame. Moreover, Fig. 5 shows the time required to process each video frame to locate an eye pupil and then calculate its diameter. As mentioned previously, in Section III.C part 1, the frame rate was 25 fps. Hence, each frame was captured in 0.04 s while pupil segmentation took 0.08 s, which includes noise removal, greyscale conversion, and binarisation of the image. Last, the diameter calculation step took 0.02 s to mathematically calculate a pupil circle's diameter.
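The timing figures above can be combined into a per-frame budget. The sketch assumes that segmentation and diameter calculation run one after the other for every frame, so that their times add; that assumption, the function names and the 150-frame example are not from the paper.

```python
FPS = 25               # frame rate from Section III.C
SEGMENTATION_S = 0.08  # per-frame segmentation time reported in the text
DIAMETER_S = 0.02      # per-frame diameter calculation time reported in the text


def per_frame_budget(fps=FPS, segmentation_s=SEGMENTATION_S, diameter_s=DIAMETER_S):
    """Return (capture interval, processing time, processing/capture ratio) per frame."""
    frame_interval = 1.0 / fps
    processing = segmentation_s + diameter_s
    return frame_interval, processing, processing / frame_interval


def session_processing_time(n_frames, segmentation_s=SEGMENTATION_S, diameter_s=DIAMETER_S):
    """Total processing time for n_frames; linear in the number of frames, i.e. O(n)."""
    return n_frames * (segmentation_s + diameter_s)


interval, processing, ratio = per_frame_budget()
six_seconds_of_video = session_processing_time(150)  # 150 frames at 25 fps
```

Under these assumptions, processing one frame takes about 0.10 s against a capture interval of 0.04 s, so offline processing runs at roughly 2.5 times the duration of the recording.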

4) NOISE REMOVAL
When capturing a video of a human eye, it is natural to acquire some video frames with a closed eye due to blinking or rapid movement of an eye between fixation points (saccade). In the current study, this limitation was considered to be noise. For accurate response measurement, the data were therefore adjusted before analysis. Thus, pupil size values for eye blinks and saccades were removed by means of methods described in [64]:
• Blink Extraction: Blinking is the state of the eye in which the pupil diameter cannot be measured correctly, as shown in Fig. 6. When blinks were detected by the pupil extraction algorithm, the measured pupil size value was higher or lower than that prior to the eye blink. To remove such noisy data, a linear interpolation was performed. Subsequently, the last measured pupil size before the eye blink was averaged with the first measured pupil size after the blink to allow removal of the blink size values from the data.
• Saccade Extraction: Saccades are the rapid eye movements that occur when a person's focus changes quickly from one object to another, resulting in high jumps in the values of pupil size, as shown in Fig. 7. Such values were also removed, using the same linear interpolation method as for blink extraction, by averaging the first and last values of pupil saccades.
• Baseline Correction: Minor fluctuations in pupil size affect statistical results. To reduce this type of noise, baseline correction was needed to identify mean noise. Thus, subtractive baseline correction was applied to remove identified noise in the data, resulting in enhanced statistical power, as was also performed in [65].
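The three corrections above can be sketched as follows, under stated assumptions. Samples already flagged as blinks or saccades are represented by `None`; how they are flagged is not shown. The baseline is taken as the mean of the first few samples of a trace, which is an assumed window and not one specified in the paper, and the function names and example values are hypothetical.

```python
def interpolate_gaps(samples):
    """Fill runs of None (flagged blink/saccade samples) by linear interpolation
    between the last valid value before the run and the first valid value after it."""
    out = list(samples)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        left = out[start - 1] if start > 0 else None
        right = out[i] if i < n else None
        if left is None and right is None:
            raise ValueError("no valid samples to interpolate from")
        # At the edges of the trace, hold the nearest valid value.
        if left is None:
            left = right
        if right is None:
            right = left
        steps = i - start + 1
        for k in range(start, i):
            out[k] = left + (right - left) * (k - start + 1) / steps
    return out


def subtract_baseline(samples, n_baseline):
    """Subtractive baseline correction using the mean of the first n_baseline samples."""
    baseline = sum(samples[:n_baseline]) / n_baseline
    return [value - baseline for value in samples]


# Hypothetical pupil diameters in mm, with a two-frame blink.
raw = [4.0, None, None, 5.5, 5.0]
clean = interpolate_gaps(raw)
corrected = subtract_baseline(clean, n_baseline=1)
```

For a gap of a single frame, the interpolated value is the average of the two neighbouring measurements, which matches the averaging described in the text.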

IV. RESULTS AND ANALYSIS
In this section, we used statistical tests and assessment proforma techniques to arrive at an emotional content classification for involuntary and voluntary findings, respectively. Subject gender and age were also the subject of analysis, with results presented herein.

A. PARAMETRIC VS NON-PARAMETRIC STATISTICAL ANALYSIS
In this study, the analysis was performed with a null hypothesis approach by applying appropriate statistical tests using the RStudio software tool [66]. Statistical analysis assumes a statistical distribution of data characteristic before applying a particular test. For normally distributed data, parametric tests are recommended, and if data is not normally distributed, then non-parametric tests better approximate the acceptance or rejection of the null hypothesis. A complete statistical analysis was performed, as indicated in Fig. 8. Note that some tests mentioned below are omitted from Fig. 8 for simplicity of understanding. After checking for normality, moment tests were performed followed by a homogeneity of variance test for the entire dataset. For checking the normal distribution of data, Kolmogorov-Smirnov (KS), Cramér-von Mises (CVM), and Anderson-Darling (AD) tests were chosen [67]. Then, moments of data were checked using the data's skewness and kurtosis measures [67].
Therefore, when data were symmetrical or moderately skewed and did not have any outliers, parametric tests were used. If data were highly skewed and did exhibit outliers, non-parametric tests were applied, depending on the size of the data set. If the size of the dataset were large, then in any case, parametric tests can also be helpful in identifying more meaningful statistical results. The homogeneity of variance is an assumption when applying an Analysis of Variance (ANOVA) test for parametric data, which also includes a paired Student's t-test for pairwise analysis. For non-parametric tests, Friedman and Kruskal-Wallis tests were applied for data analysis. Gender- and age-based analysis was also performed using the Wilcoxon test.
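The moment checks described above can be illustrated with the following sketch. It uses population (biased) moment estimators, whereas statistical packages may apply bias corrections, so values can differ slightly from those reported by R. The cut-offs of 0.5 and 1.0 for describing skewness are a common rule of thumb assumed here, not thresholds stated in the paper, and the function names are hypothetical.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moment estimators."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("moments are undefined for constant data")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def describe_skew(skew):
    """Label skewness with assumed rule-of-thumb cut-offs (0.5 and 1.0)."""
    magnitude = abs(skew)
    if magnitude < 0.5:
        return "approximately symmetrical"
    if magnitude <= 1.0:
        return "moderately skewed"
    return "highly skewed"


# Hypothetical, perfectly symmetrical sample.
skew, excess_kurtosis = skewness_and_kurtosis([1.0, 2.0, 3.0, 4.0, 5.0])
label = describe_skew(skew)
```

A negative excess kurtosis, as in this example, indicates lighter tails than a normal distribution, while a positive value points to more outliers.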
Subjects' reactions to the emotional content were determined in two different forms. One form was involuntary, when an involuntary organ (an eye pupil) was involved, to identify or classify content. The second form was voluntary, in which each subject wrote down their perceived thoughts with respect to each stimulus, as shown in Fig. 9.

B. INVOLUNTARY BEHAVIOR ANALYSIS
Pupillometric data for the abovementioned audio stimuli (Table 2) were analysed to better estimate the predicted emotion.
Initially, applied normality tests showed that data were not normally distributed, as their p value < 0.0001. Normality was also checked using skewness and kurtosis tests, which revealed that the data were moderately skewed but eventually became non-normal. This was reflected in the kurtosis results for subsets of the data for the disgust emotion only. In this case, the statistical indicators for the data were less than would be expected for a normal distribution. In addition, the irritation and fear emotion kurtosis values showed that there were more outliers, and happiness, sadness, and anger emotion moment tests presented moderately skewed and symmetrical data, as shown in Table 3.
Furthermore, to identify the existence of homogeneity of variance, two tests were performed: the Fligner-Killeen (FK) and Levene tests, which resulted in p < 0.0001. These p values indicated that the data do not uphold an assumption of equal variance for each emotion. As a result, non-parametric tests were applied, as shown in Table 4.
A Kruskal-Wallis (KW) test with Holm p-adjustment method was performed for all emotions. Notably, for the happiness emotion, the sounds' null hypothesis was rejected because p < 0.05. The pupil responses of each subject for all three sounds were not the same. That is, there were differences among the sounds. The Student's t-test as a post hoc test was applied to identify similarity and dissimilarity of sounds with each other. Results showed that audio clips no. 226 and 365 belonged to the same group, and clip no. 813 turned out to be significant among the groups. Hence, all subjects had the same pupil responses for two groups and different pupil responses for the third group of emotional stimuli (no. 813) compared to clips no. 226 and 365.
In the case of the sadness emotion category, the null hypothesis was also rejected as p = 0, and the Student's post hoc t-test revealed that the pupil responses of all subjects for audio clips no. 292, 293, and 296 were different. Hence, this category of emotion did not find any degree of agreement between subjects' pupil behaviour. However, the Kruskal-Wallis test for the anger emotion exhibited p < 0.05, but the post hoc test group (278, 420, 422) comparison showed that among all three categories of the anger emotion sound, no. 278 and 422 belonged to the same group and 420 was in the group for which the subjects' pupil response had less agreement with the predefined label of the emotion category. Similarly, the irritation emotion had p < 0.05, and post hoc comparison of groups showed that clips no. 116 and 252 were not significant compared to clip no. 115.
The disgust and fear emotions had only one sound for each category. Therefore, a within-subjects comparison was performed for analysis rather than a within groups analysis. The number of subjects was also increased to 32 to obtain more promising results. Results of the Kruskal-Wallis analysis showed that p = 0 and for the Student's t-test = 3.9 for the within subject comparison analysis for both the disgust and fear emotions. Non-significant results were shown in the disgust emotion category for some subjects, e.g., subjects no. S013, S018, S027, and S028. Similarly, for the fear emotion, some subjects, S009, S010, S013, S014, S021, S025, and S030, gave non-significant results, as is evident from Table 4.
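For reference, the Kruskal-Wallis statistic used in the comparisons above can be computed as in the sketch below. This is a textbook formulation written with the standard library only, not the R routine used in the study; the Holm adjustment and the post hoc tests are not shown. The p-value helper relies on the chi-square approximation and is valid only for exactly three groups (two degrees of freedom), which matches the three sounds per emotion. All names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties."""
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, position = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[position:position + len(group)])
        total += rank_sum ** 2 / len(group)
        position += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Chi-square approximation with 2 degrees of freedom: p = exp(-H / 2)."""
    return math.exp(-h / 2.0)


# Hypothetical pupil diameters (mm) for three sounds of one emotion category.
h_stat = kruskal_wallis_h([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0])
p_approx = p_value_three_groups(h_stat)
```

With samples this small, the chi-square approximation is rough, so the value is shown only to make the calculation concrete.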
A non-parametric test was the preferred choice for analysing the data; however, because of the large dataset and to strengthen the present research, a parametric test was also applied in the form of a repeated measures ANOVA (rmANOVA). For the happiness, sadness, anger and irritation emotions, rmANOVA showed p > 0.05, suggesting that the pupil behaviour of the subjects did not differ significantly for these emotions. Conversely, a within-subjects comparison of the disgust and fear emotions had p < 0.0001. This was followed by a pairwise Student's t-test (as a post hoc analysis) to identify subjects whose pupil behaviour did not differ significantly, as shown in Table 5. Moreover, to compare the pupil responses of males and females, a Wilcoxon test for independent groups was performed. The ratio of females to males participating in this study was 16:11 for the happiness, sadness, anger, and irritation emotions, and 22:10 for the disgust and fear emotions. Each emotion was analysed for both males and females, with the significance level being p < 0.05. Results revealed that for the happy, sad, and irritation emotional stimuli, there was a difference in the mean pupil response between male and female subjects. On the other hand, the mean pupil response was almost the same for both genders for the angry, disgust, and fear emotional stimuli, as shown in Fig. 10. Thus, whether males and females respond differently depends on the emotional stimulus, which is consistent with observations in earlier investigations (refer to Section II.C). Similarly, an age-specific analysis was performed with three age levels for the available subjects. Each participant age level was mapped onto pupil responses using boxplots with a jitter effect for pupil diameter values, as shown in Fig. 11.
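The gender comparison described above relies on a Wilcoxon test for independent groups (also known as the rank-sum or Mann-Whitney U test). The following Python sketch shows how such a test can be computed with a normal approximation; it is an illustrative assumption rather than the implementation used in this study, it applies a tie correction but no continuity correction, and the function name and all diameter values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Return (U, p) for a two-sided Wilcoxon rank-sum test.

    U is computed for the first sample; p uses the normal approximation
    with a tie-corrected variance and no continuity correction.
    """
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        t = j - i + 1  # size of the current group of tied values
        tie_term += t ** 3 - t
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return u, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical mean pupil diameters (mm) for one emotion.
female = [3.9, 4.2, 4.0, 4.4, 4.1, 4.3]
male = [3.5, 3.8, 3.6, 3.7, 4.0]
u_stat, p_value = rank_sum_test(female, male)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

For group sizes as small as those in the example, an exact test would be more appropriate than the normal approximation.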

C. VOLUNTARY BEHAVIOR ANALYSIS
Voluntary behaviour was assessed through a subjective rating of the predefined emotional category of each sound. Each subject was given approximately 6 s for each question. In total, approximately 84 s were needed to complete the rating scale proforma. The rating scale ranged from 1 to 3 and was applied to a total of fourteen sounds with pre-categorized emotions. For example, sound number 226 was pre-categorized as the happiness emotion by just listening to the sound, a judgement that may differ from person to person. Subjects rated the sound 1 if they felt it denoted happiness, 2 for neutral and 3 for unhappiness as the felt emotion. A similar rating scheme was followed for all other sounds, and all subjects' responses from the rating proforma are shown in Table 6.
Analysis of the subjects' rating data established that the Shapiro-Wilk normality test gave a p-value below the significance level, i.e., p < 0.05, for all emotions. Therefore, the data were not considered to be normally distributed. Thus, a non-parametric Friedman test was applied due to the different data types. The analysis concluded that all subjects had the same perceptions with respect to the happiness, sadness, anger and irritation emotional stimuli because all these emotions had p > 0.05. Since there was only one group within the disgust and fear emotions, a within-subjects Kruskal-Wallis comparison was performed, showing p > 0.05. Importantly, most subjects gave their feedback according to the predefined emotions, as shown in Table 7.
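To make the Friedman test on the rating data concrete, the sketch below computes the tie-corrected Friedman statistic for three related conditions, for example one subject's ratings of the three sounds of an emotion in each row. It is a minimal sketch under stated assumptions, not the analysis code of this study: the function name and the ratings are hypothetical, and the p-value uses the chi-square approximation, which for three conditions (two degrees of freedom) reduces to exp(-Q/2).

```python
import math


def friedman_three_conditions(rows):
    """Return (Q, p); each row holds one subject's three ratings."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        if len(row) != k:
            raise ValueError("each row must contain exactly three ratings")
        for j, value in enumerate(row):
            smaller = sum(1 for other in row if other < value)
            equal = sum(1 for other in row if other == value)
            # Average rank of this rating within the subject's row.
            rank_sums[j] += smaller + (equal + 1) / 2
        for value in set(row):
            t = list(row).count(value)
            tie_term += t ** 3 - t
    q = 12.0 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k ** 2 - 1))
    if correction == 0:  # every subject rated all three sounds identically
        return 0.0, 1.0
    q /= correction
    # Chi-square tail probability for 2 degrees of freedom.
    return q, min(1.0, math.exp(-q / 2))


# Hypothetical ratings (1 = felt the emotion, 2 = neutral, 3 = opposite)
# given by five subjects to the three sounds of one emotion.
ratings = [
    [1, 1, 2],
    [1, 2, 1],
    [1, 1, 1],
    [2, 1, 1],
    [1, 1, 3],
]
q_stat, p_value = friedman_three_conditions(ratings)
print(f"Q = {q_stat:.2f}, p = {p_value:.4f}")
```

Because a three-point scale produces many ties within a row, the tie correction matters here more than it would for continuous measurements.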

V. DISCUSSION
The code designed in this study could help to automatically identify eye pupils, as well as to calculate the pupil diameter of each detected pupil. The purpose of designing this system is to make processing more robust, as discussed next.

A. INVOLUNTARY CLASSIFICATION OF EMOTIONS
Various testing methods were avoided in this study, including those involving long video clips/films [68] or those involving hard mental activity on the part of the subjects [69], [70]. Films have to be quite lengthy to create a sufficient emotional impact on a human mind, and tests involving mental activity increase the chances of more noisy data, leading to a less reliable system. These decisions were confirmed in [71]. We analysed all data by means of ten different statistical tests, including tests that, as far as we are aware, have not been used for this purpose before. Some statistical tests were performed in [72]. However, the authors of that study reported contradicting results because, it seems, they worked only with positive and negative emotions, whereas we worked on six different emotions.
FIGURE 11. Age based analysis: Eye pupil behaviour estimation in three participant age groups (less than or equal to 21 years, from 22 years to 30 years, and greater than or equal to 31 years).
Parametric statistical tests for involuntary organ data showed a strong degree of agreement for the happiness, sadness, anger and irritation emotions, see Table 8. Contrary to this, non-parametric tests classified clips no. 226 and 365 as the explicitly defined happiness emotion, clips no. 278 and 422 as the anger emotion and clips no. 116 and 252 as the irritation emotion. This is the reason why involuntary organ data has to pass through several types of statistical tests rather than only one. Considering both types of statistical tests, we correctly classified six audio stimuli, as shown in Table 8. Based on these, two newly added stimuli were used, and parametric tests exhibited a strong degree of agreement from subjects of 90.6% and 84% for the disgust and fear emotions, respectively. Similarly, in non-parametric tests, values of 81% and 75% occurred for the disgust and fear emotions, respectively. These results could be helpful for future research and further development of a robust system for emotion classification. Pupil behaviour was also affected by gender and age. Analysis showed that both genders exhibited similar behaviour for the angry, disgust and fear emotions, and that subjects aged 31 years or older presented reduced pupil dilation and more pupil constriction compared to the other two age groups.

B. VOLUNTARY CLASSIFICATION OF EMOTIONS
We employed voluntary behaviour analysis in which all subjects completed a survey proforma for all audio clips. The findings of the voluntary behaviour analysis revealed that explicit definition of the emotion category may lead to bias in the categorization. This can easily be observed in audio clips no. 292, 293 and 296, which were categorized as the sadness emotion in contradiction with the involuntary organ results.
The voluntary organ results also led us to conclude that the majority of subjects felt happiness for the happiness emotion clips and perceived other emotions correctly, as indicated on the rating proforma for all other audio clips, i.e., for the anger, irritation, disgust and fear emotions. As the voluntary organ results were scaled data, distinct from the pupil data, different statistical testing was applied to verify all results. The same type of scale-based data analysis was also performed in [72] but with a Positive and Negative Affect Schedule - Expanded (PANAS-X) model.

C. ROUNDUP OF STATISTICAL FINDINGS
To summarize the findings of this study with respect to the statistical analysis, Fig. 12 presents estimated ranges of a human's pupil for the given stimuli. Fig. 12 indicates similarity between subject responses according to one or other of the types of test, i.e., the parametric or non-parametric tests. However, though the ranges of constriction or dilation may be consistent, within those ranges, no degree of agreement was found between individual subjects in the expression of any particular emotion. That is, the ranges coincided but not the precise value of constriction or dilation of pupil diameter. Notice also that one way to use ranges to estimate the emotion, used by the authors, was to take the median of each range.
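The idea of summarizing each range by its median can be sketched as follows. This is a hedged illustration only: the diameter values are invented, the function names are hypothetical, and assigning a new measurement to the emotion with the nearest median is an assumption made for the example, not a rule stated in this study.

```python
from statistics import median


def emotion_medians(diameters_by_emotion):
    """Summarize each emotion's observed pupil diameters by their median."""
    return {
        emotion: median(values)
        for emotion, values in diameters_by_emotion.items()
    }


def nearest_emotion(diameter, medians):
    """Return the emotion whose median diameter is closest to the input."""
    return min(medians, key=lambda emotion: abs(medians[emotion] - diameter))


# Invented pupil diameters (mm); real ranges would come from the dataset.
observed = {
    "happiness": [5.0, 5.5, 6.0, 5.8],
    "irritation": [2.5, 3.0, 3.5, 2.8],
    "fear": [4.0, 4.2, 4.6, 4.4],
}
medians = emotion_medians(observed)
print(medians)
print(nearest_emotion(5.2, medians))
```

Because the ranges for different emotions can overlap, a single median per emotion loses information, which is one reason such an estimate should be treated as indicative rather than conclusive.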
Although the investigations cannot be termed conclusive, statistically significant patterns were identified. The lack of conclusiveness may have been because of overlaps in the pupil constriction or dilation ranges for particular emotions, which may also be why the analysis in [72] was confined to positive and negative emotions. However, the method of selecting an average from the subjects to indicate an emotion, as previously mentioned, may overcome the restriction in pupil diameter ranges. It can also be concluded that pupils constrict to their maximum level when a person feels irritated and when age is greater than or equal to 31 years. Similarly, the maximum dilation occurs when a person feels happy or when age is less than or equal to 30 years. For other emotions, such as fear, disgust and anger, the pupils dilate but not to their maximum extent, regardless of gender. The authors believe that this restraint on dilation could well be due to the influence of another emotional response, which may be curiosity. However, identification of that restraint is beyond the scope of the current study, though it is under investigation by the authors.

D. OVERALL COMPARISON
A comparison was made between the proposed methodology of pupillometry for emotion recognition/classification and papers that constitute the state of the art in this field. In the comparison of Table 9, additional methodological features have been included relative to the earlier Table 1. Such analysis reveals that previous systems focused on high-cost devices, mostly eye trackers, biosensors to get EEG signals, SKT, Blood Volume Pulse (BVP) and many other techniques to attain improved accuracy. However, the current study was aimed at developing a cost-effective system. Instead of relying on such devices, data were analysed by means of ten statistical tests, which helped to achieve more accurate results. In addition, previous research only classified general categories of emotions, whereas this research sought to identify six basic emotions.

VI. LIMITATIONS OF THE STUDY
Consistent with the needs of this scientific community, the device used in this study is cost effective. It has never before been used to carry out such promising research, in part because of the technical limitation that the camera will heat up if used for a long duration. Consequently, unlike with other, more costly sensors/devices, we needed to periodically stop the elicitation process for a few minutes to allow the sensor to cool down. Second, it is common practice to perform data normalization to remove irregularities from the data; however, we tried to avoid this to preserve the ability to exploit hidden correlations within the data. Furthermore, this study focused on a statistical analysis of the dilation and constriction of human eye pupils in response to audio stimuli related to different emotions; therefore, a large sample size was required for statistically valid inference. Although the tests performed in this study are also suitable for small data sets, increasing the sample size will make the results more reliable.
Another limitation of this study is the light reactivity of the pupil, which is a limitation that is common in all pupillometry studies, as discussed in a systematic literature review of pupillometry [81]. The data recorded for this study were captured in a room with a constant light source; however, as proposed, a possible IoMT system would need to address this limitation.
Finally, for statistical evaluation or machine learning based emotion classification using pupillometry, accurate measurement of the pupil diameter is crucial; however, the data from previous studies have not been shared for research purposes, and the lack of any reference data for comparison is a major challenge in establishing standards. Therefore, one aim of this study is to share the acquired data with the research community.

VII. CONCLUSION
We investigated whether recognition of pupil dilation and constriction, captured using a cost-effective camera, leads to robust and meaningful results in terms of categorizing human emotions. If this technological solution works, it can be employed as part of a wireless sensor network, which in turn can feed into an IoMT, where statistical processing takes place.
Currently, the image-processing toolbox in MATLAB was used to form a robust algorithm specifically designed to identify these eye activities. The images captured by the camera were used to determine the pupil diameter, which in turn allowed identification of any pupil dilation or constriction. Fourteen audio stimuli of emotions were used for the categorization of six basic emotions, i.e., happiness, sadness, anger, irritation, disgust and fear.
This study also focused on identifying suitable statistical tests for this purpose, and based on this study, ten such tests were shortlisted. The low-cost system correctly classified audio clips no. 226, 365, 278, 422, 116 and 252 taken from IADS. Newly created audio clips no. 000 and 222 were verified by experts using CVI. Parametric and non-parametric tests for the response to these clips, representing disgust and fear emotional stimuli, respectively, gained agreement of opinion from 26 of 32 subjects and 22 of 32 subjects, respectively. Gender and age-based analysis was also performed, which confirmed the varying behaviour for different emotional content with respect to gender and age. Subjective assessment also acted as a voluntary organ, and its statistical analyses showed a strong degree of agreement between subjects' opinions and the already explicitly defined emotion category for each stimulus.
Human emotion recognition has recently become popular for its application in work on the Human-Computer Interface (HCI) and for its applications in psychological studies. Thus, this study also has a role in those fields, especially if the whole can be made part of an automated system that delivers results via an IoMT. It will also have the advantage that the basic sensor nodes can be deployed in a flexible manner through a wireless sensor network. Beyond that, a deep-learning system based on input from both pupil diameters and pupil images is under development by the authors, provided that methods for overcoming the possible disadvantages of deep-learning in real-time environments mentioned in Section III-A can be applied.