You Can’t Hide Behind Your Headset: User Profiling in Augmented and Virtual Reality

Augmented and Virtual Reality (AR and VR), collectively known as Extended Reality (XR), are increasingly gaining traction thanks to their technical advancement and the need for remote connections, recently accentuated by the pandemic. Remote surgery, telerobotics, and virtual offices are only some examples of their successes. As users interact with XR, they generate extensive behavioral data usually leveraged for measuring human activity, which could be used for profiling users' identities or personal information (e.g., gender). However, several factors affect the efficiency of profiling, such as the technology employed, the action taken, the mental workload, the presence of bias, and the sensors available. To date, no study has considered all of these factors together and in their entirety, limiting the current understanding of XR profiling. In this work, we provide a comprehensive study on user profiling in virtual technologies (i.e., AR, VR). Specifically, we employ machine learning on behavioral data (i.e., head, controllers, and eye data) to identify users and infer their individual attributes (i.e., age, gender). Toward this end, we propose a general framework that can potentially infer any personal information from any virtual scenario. We test our framework on eleven generic actions (e.g., walking, searching, pointing) involving low and high mental loads, derived from two distinct use cases: an AR everyday application (34 participants) and VR robot teleoperation (35 participants). Our framework limits the burden of creating technology- and action-dependent algorithms, and reduces the experimental bias evidenced in previous work, providing a simple (yet effective) baseline for future works. We identified users with up to a 97% F1-score in VR and 80% in AR. Gender and age inference was also more effective in VR, reaching up to 82% and 90% F1-score, respectively.
Through an in-depth analysis of the sensors' impact, we found VR profiling to be more effective than AR profiling, mainly because of the presence of eye sensors.


Introduction
In recent years, the pandemic has increased the need for remote connections, and we have witnessed a mass adoption of virtual technologies, particularly for teamwork. This has opened new perspectives for different platforms that allow virtual interactions with others, and fostered the already ascending development of the Metaverse. The Metaverse has been recently defined as a "post-reality universe, a perceptual and persistent multiuser environment merging physical reality with digital virtuality" [1]. While the Metaverse is designed around the human, who constitutes the physical reality of this interplay, the digital virtuality relies on immersive technologies that allow spatial and interactive features, namely AR and VR. These devices have thus become the core of the fourth wave of computing innovation [2].
Currently, there is an ongoing discussion on the potential protocols that will govern the Metaverse, with a particular focus on the controversial interplay between openness and privacy [1]. The latest virtual devices allow tracking a large number of behavioral metrics, such as the headset's and controllers' position and rotation (which reflect the users' physical actions), all the interactions between the user and any virtual object present in the scene, and also eye movements. All these data can be a source of personal information, and can even reveal the user's identity (e.g., [3] [4] [5]). Although private, this information could also help restrict the use of the headset to specific individuals. For example, it would be possible to allow authentication only to those who hold the appropriate rights, thus increasing the security of such technologies.

Contributions.
In this study, we assessed the feasibility of profiling and identifying users by leveraging behavioral data generated while using an AR and a VR headset. First, we propose a general profiling framework that can be applied to different virtual devices (i.e., VR, AR), different applied fields (i.e., an everyday use case of a smart technology, and a work scenario), and different types of user behavior (i.e., walking, searching for landmarks, pointing, performing controller-based operations and physical actions). Second, we deeply study user profiling at different levels (i.e., identification, age, gender), introducing, to the best of our knowledge, the novelty of profiling users' personal information (specifically, gender and age) in virtual contexts. Third, we additionally explore the impact of each sensor on user profiling in both AR and VR, specifically assessing the relevance of the position and rotation of the headset and controller hardware, and of the eye-tracking technology embedded in the VR device. More importantly, we fill a gap in the literature on user profiling in AR scenarios: while it is true that AR technology is still immature, it is also true that it is largely understudied compared to VR. We summarize our contributions as follows: • we propose a general profiling framework for AR and VR technologies; • we study user profiling with respect to identification, age and gender inference in virtual contexts, which is novel in the AR context; • we conduct extensive studies to assess sensors' importance in our profiling tasks.
In Section 2, we provide background and review literature on user profiling. Section 3 presents the general profiling framework we adopted in our experiments. The dataset and experimental settings are described in Section 4 and Section 5, respectively. We report our results in Section 6, and conclude with a discussion in Section 7.

Background & Related Work
This section describes the importance of security and privacy in virtual technologies such as AR and VR. Section 2.1 summarizes the applications of virtual technologies in different fields, from industry to medicine. Section 2.2 introduces the threats to AR and VR applications from a cyber-security perspective. Section 2.3 reviews the literature on user profiling in virtual technologies.
2.1 AR / VR use-cases in daily and work scenarios

Industry and remote work
With the advent of Industry 4.0, the benefits of virtual devices have been repeatedly shown in many domains: in the design cycle of products and manufacturing systems [6], for programming machines [7], in the teleoperation industry [8,9] and also for training novices [10,11]. In all of these applications, virtual technologies allow the operator to perform work tasks while being immersed in a virtual environment that faithfully emulates the physical one. This is particularly important also for architecture, engineering and construction experts: virtual technologies in this sector are helpful for stakeholder engagement, design support and review, construction planning and monitoring, management, and training [12].
Taken together, the reasons to employ virtual technologies in industry are numerous. This will potentially lead to a large-scale adoption of VR and AR devices in industrial fields, opening new questions about how to ensure individuals' security in the workplace. An effective algorithm for automatically identifying workers wearing a headset might help in this direction. For example, it would be possible to enable authentication only for those who hold the appropriate rights in the workplace (e.g., the site manager). Further, assuming that older workers might prefer a different design of the virtual environment [13], accurate user profiling might help customize virtual features based on the user's age.

Education
As AR and VR have the potential to bridge the limitations of 2D e-learning environments, online education is one of the fundamental pivots of the Metaverse [1]. The literature has extensively examined the characteristics that lead to the successful integration of immersive virtual technologies in education, as well as their positive influence on learning outcomes. For example, 17 positive effects of VR were identified for education, such as improving skills, living more realistic experiences, and enhancing the intrinsic motivation and the level of interest in learning [14]; however, these effects were subject-specific. Additionally, [15] reviewed VR application studies by focusing on immersive VR environments for higher education. They showed that, even though most of the literature reports that VR for education is still in its experimental stages, there is a strong general interest in the use of immersive VR, particularly in engineering, medicine and computer science education, and that this technology is mature enough for teaching procedural, practical and declarative knowledge.
In view of a large-scale adoption of VR/AR for education, a viable profiling or identification algorithm certainly comes in handy. For example, automatic authentication can be efficient when VR/AR technologies are adopted in numerous classes. Similarly, it would be helpful to automatically detect a student's age for adapting lessons and virtual contents.

Gaming and entertainment
Virtual technologies play an essential role in the gaming market too. While VR games have been widespread since the 1990s (e.g., Virtual Reality Gear [16]), from 2018 AR also reached a large entertainment crowd with the popularization of Pokémon Go, Snapchat, Apple's ARKit, and Google's ARCore [17]. This sector is expected to grow exponentially, as it also embraces entertainment areas that go far beyond gaming and arcade: the film and music industries, live show sectors and sports are just a few examples [18]. Particularly after the expensive losses caused by the pandemic in these sectors, the development of immersive virtual platforms can help support the cinema, music and live-show industries [16]. For instance, VR cinema was deployed for movies, theatre and art exhibitions [19], providing users with a 360° leisure experience. Last, the recent explosion of the virtual influencer phenomenon [20] confirms the crucial role of virtual technologies at both the entertainment and marketing levels.
It is clear how user profiling/identification could be used for marketing strategies in this sector (e.g., delivering customized advertising). Further, particularly on gaming platforms, user identification might help detect banned individuals and prevent their access to virtual games.

Medicine
Virtual technologies have proven to be reliable medical tools both for doctors and patients. For instance, as it allows simulating surgeries, VR can be beneficial for medical education and training [21]. Interestingly, AR has the potential to superimpose salient clinical records or visual aids over the patient's body to support a surgery [22].
Research on virtual control systems for remote robotic surgery operations is also growing [23]. Further, from the patient's point of view, VR can help improve cognitive abilities after a traumatic brain injury [24], or it can help increase engagement in Parkinson's motor training via gamification [25].
Under this view, detecting whether a user is the chief of surgery rather than a student can help restrict access rights during a surgical operation involving AR/VR. Similarly, profiling patients using a virtual headset could allow training customization and automatic recording of clinical improvements.

AR as a smart wearable technology
The latest AR smart glasses are fully wearable devices with computational functions. They allow users to download applications from a mobile operating system and provide various functionalities while freeing the user's hands [26]. Notably, most AR glasses currently on the market do not offer an integrated experience of social networks and streaming content. For instance, Vuzix developed AR smart glasses specifically designed to be used with drones or for navigation in unknown areas. AR as an assistive navigation device has also been tested in applied research [27]. However, Facebook has already partnered with Ray-Ban and launched their Ray-Ban Stories, which have raised important questions about ethical and privacy issues [28]. Even though these glasses currently do not allow projecting holograms into the field of view, they are an essential hint at possible connections between AR technology and social networks.
In the foreseeable future, the next generation of smart glasses will likely allow projecting e-mails and notifications from social networks onto the user's field of view. In this perspective, accurate automatic identification of the user during everyday activities could help restrict the visualization of personal messages to the owner of the glasses only.

Privacy in Emerging Technologies
The increasing popularity of big data [29], coupled with the rapid adoption of various "smart" devices, has resulted in a parallel increase in privacy concerns. In today's society, most people consider data collection incessant and believe that the risks outweigh any benefits [30]. To prevent (or at least reduce) the exposure of personal data, current and emerging technologies should support privacy by default [31], in accordance with recent legislation such as the GDPR [32]. Fortunately, researchers are actively focusing on studying and adding a security and privacy layer to emerging technologies. For instance, Di Pietro and Cresci [33] deeply discussed security and privacy issues arising in the Metaverse, allowing a better understanding and a consequent improvement of the technology with respect to its users. Similarly, Nair et al. [34] proposed a system to browse the Metaverse incognito, protecting users' privacy from companies, surveillance agencies, or data brokers. Researchers have also focused on incorporating privacy-preserving measures into everyday systems, such as authentication [35] and, more recently, de-authentication techniques [36].
Besides protecting users' data from unwanted usage or sharing, past literature shows how attackers can use public data in unconventional ways to profile users or to infer users' private data (e.g., gender, age, personality traits). For instance, Conti and Tricomi [37] studied user profiling in video games, showing how public gaming data can be exploited to track gamers for malicious activities, e.g., harassment or cyberbullying. Kosinski et al. [38] leveraged Facebook data to infer users' gender, age, personality, or sexual orientation. Jurgens et al. [39] predicted people's physical locations from their tweets, while Zhang et al. [40] leveraged sharing platforms' reviews to predict users' gender. The results of such studies highlight the high risks connected with data availability and point to the need for further research to better protect users' privacy.

User Profiling in AR and VR Applications
Privacy risks in AR and VR technologies are not deeply discussed in the current literature. Rogers et al. [5] investigated the task of user identification, i.e., identifying a given user among a group of known people. This study was conducted in an AR environment through Google Glass among 20 participants. Behavioral features include head movement (i.e., accelerometer and gyroscope) and eye blinking patterns. The best performing model, a Random Forest, achieved 94% accuracy in the task. Li et al. [41] proposed Headbanger, an authentication system for wearable devices. The authentication task differs from the identification one: in the former, users can be unknown, while in the latter, the algorithm aims to identify a user within a group of given users. This study was conducted in an AR environment through Google Glass among 95 participants. The proposed system relies on motion sensors (mainly the headset accelerometer) and authenticates users by leveraging three distance metrics: cosine distance, correlation distance, and dynamic time warping distance. Headbanger achieves 95% accuracy in the task. Mustafa et al. [42] proposed an authentication system for VR, highlighting the importance of such a security mechanism, especially when a user is completely immersed in the virtual environment, which can lead to the dangerous lunchtime attack [43]. This study was conducted through a Google Cardboard VR with a Samsung Galaxy S5 mounted and involved 23 participants. Behavioral features involve sensors like the headset's accelerometer and gyroscope, from which the authors extracted features such as summary statistics (e.g., mean, variance) and frequency-domain features (e.g., energy). The best performing model, a Logistic Regression, achieves 93% accuracy in the task. Pfeuffer et al. [4] studied the problem of user identification in VR. The experiment was conducted with an HTC Vive, involving 22 participants. The authors consider a broad spectrum of features that capture head, hand and eye motions. The best performing model, a Random Forest, achieves up to 40% accuracy in the task. Miller et al. [3] explored the identification task in VR in further depth, similarly to Pfeuffer et al. [4]. The experiment was conducted with an HTC Vive, involving 511 participants. Behavioral features include summary statistics (e.g., maximum, minimum, average, standard deviation) of the position and rotation of the headset and controllers (both right and left hand). The best performing model, a Random Forest, achieves up to 95% accuracy in the task.
The reader can notice that existing works mainly focus on inferring the person's identity through authentication or identification tasks, while there is a lack of understanding of whether behavioral data can be leveraged to infer other users' private information, such as age and gender. Similarly, prior works consider only AR or only VR technologies in their experiments. This is a limitation, since the level of virtual immersion allowed by the two technologies is substantially different [45], and this significantly affects users' behavior. Our paper thus aims to fill the current literature gap by considering different privacy inference tasks (i.e., age, gender, identification) explored in both AR and VR environments.

Profiling Framework
This section describes the methodology we propose to execute inference tasks with virtual technologies. Section 3.1 motivates the reasons for our investigation. Section 3.2 presents our proposed framework.

Scope of the work
Augmented Reality (AR) and Virtual Reality (VR) devices contain several sensors (e.g., accelerometer, gyroscope, eye tracking) essential for interacting with virtual environments. Sensor data describing human behaviour can be used to build biometric applications, opening several opportunities to enhance and tailor users' experience. However, such data might pose risks to users' privacy and security. In this study, we aim to understand whether it is possible to profile users by leveraging their interaction with AR and VR applications. In particular, we conduct our study by considering two categories of profiling: 1. User identification, where we aim to identify a given user within a known population; 2. Private information inference, where we aim to infer users' gender and age.
We thus propose a general framework to accomplish both tasks, which can be extended to infer additional users' information.

Overview
Our goal is to define a generic pipeline that can be adapted and applied to any virtual technology (e.g., AR, VR) context to infer users' private information. As shown in Figure 1, the pipeline consists of four steps, starting from the user whose behaviors we record and ending with their actual profiling:
1. Raw Data Acquisition. In this phase, users' behavioral data are acquired. Virtual technology devices continuously generate data from users' interactions with the virtual environment (i.e., time series). From these data, we can describe users' behavior. The amount and type of information depends on the virtual technology and its devices. For instance, data might come from users' input (e.g., pressing joystick buttons) and users' movements.
2. Bias Removal. This phase aims to remove potential biases from the time series that might lead to training erroneous machine learning models.
3. Time Series Engineering. This phase aims to extract insightful information from the time series.
4. Machine Learning Prediction. This phase aims to infer users' private information from the data elaborated in the previous phases by leveraging machine learning algorithms.

Figure 1: The proposed profiling framework: User → Raw Data Acquisition → Bias Removal → Time Series Engineering → Machine Learning Prediction → Profiled User.

Raw Data Acquisition
Users interact with AR and VR applications through devices such as headsets and joysticks. Such devices embed several functional sensors to offer users an immersive experience. For example, users move and explore the virtual environment through sensors like the accelerometer and gyroscope embedded in the headset. Thus, by combining the information retrievable from each sensor s^i of the equipment, we can trace a user's activity a at a given time t:

a_t = (s^1_t, s^2_t, ..., s^n_t),

where the subscript denotes the timestamp, and the superscript the sensor involved. We call this process the acquisition phase. The acquisition phase can be repeated over time, resulting in a temporal description of the user's behavior. Thus, by acquiring data over ∆t = t − t_0, we obtain a behavioral time series, described as follows:

B_∆t = (a_t0, a_t1, ..., a_t).

B_∆t represents an atomic sample of a user action (or task) of duration ∆t that we will use in the next phases to infer their private information.

Bias Removal
The acquisition phase might produce an enormous quantity of raw data. Such data might not only describe users' behaviour, but also environmental information strongly correlated with the experimental sessions. For example, using the raw headset height to identify users might be erroneous since such information might not be persistent over time (e.g., different shoes, different body position) [3]. The problem of spurious correlations in cybersecurity applications is well known [46]. We thus need to be extra careful in understanding whether sensors might lead to erroneous and inconsistent machine learning performance. The process of bias removal depends on the sensors' nature and requires an ad-hoc analysis. We explain our implementation in detail in Section 5.3. The de-biasing phase results in a new vector of de-biased actions:

D_∆t = (d_t0, d_t1, ..., d_t),

where d_ti is the de-biased version of the feature a_ti.
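One plausible de-biasing step, sketched here under the assumption of per-session mean centering (the procedure actually adopted is described in Section 5.3), removes the session-specific offset from the headset height so that only relative head movement remains:

```python
import numpy as np

def debias(series):
    """Remove the per-session offset (e.g., the user's stature plus
    shoe height and posture) and keep only relative movement."""
    series = np.asarray(series, dtype=float)
    return series - series.mean()

raw_height = [1.72, 1.74, 1.71, 1.73]  # headset y (meters), one session
d = debias(raw_height)
print(np.allclose(d.mean(), 0.0))  # True: session offset removed
```

The same idea generalizes to other sensors: any component that is constant within a session but varies across sessions is a candidate bias.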
Time Series Engineering
Raw temporal data should be properly elaborated to extract meaningful information. Moreover, given the huge amount of data, such sequences should be aggregated (i.e., compressed) to limit the computational cost of their analyses. The aggregation strategy can consider the whole sequence of a specific feature, or just a subpart of it. For example, given a sensor s^i and its de-biased values d^i_t0, ..., d^i_t over ∆t, the aggregation of the whole sequence results in a unique number x^i, while the partial aggregation (e.g., a transformation every q time steps) results in a vector of numbers (x^i_1, ..., x^i_m), where m = t/q. Note that the subscript no longer denotes the temporal axis. Popular features derived from the aggregation phase are the mean, standard deviation, min, and max [3]. At the end of the process, we obtain, for each participant action or task, an aggregated datapoint that will be used by the machine learning models.
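The whole-sequence versus partial aggregation described above can be sketched as follows; the choice of statistics (mean, standard deviation, min, max) follows [3], while the window size q is an illustrative parameter:

```python
import numpy as np

def aggregate(series, q=None):
    """Aggregate a de-biased sensor sequence into summary features.
    q=None -> whole-sequence aggregation (one value per statistic);
    q=k    -> partial aggregation over non-overlapping windows of k steps."""
    stats = lambda x: [x.mean(), x.std(), x.min(), x.max()]
    series = np.asarray(series, dtype=float)
    if q is None:
        return np.array(stats(series))
    windows = [series[i:i + q] for i in range(0, len(series), q)]
    return np.array([stats(w) for w in windows]).ravel()

s = np.arange(12, dtype=float)      # toy de-biased sensor sequence
print(aggregate(s).shape)           # (4,)  mean/std/min/max of whole sequence
print(aggregate(s, q=4).shape)      # (12,) three windows x four statistics
```

Concatenating such vectors across all sensors yields the aggregated datapoint fed to the models.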

Machine Learning
The last phase of the pipeline involves machine learning approaches like Logistic Regression (LR), Decision Tree (DT) and Random Forest (RF). Training a well-performing model requires validation strategies that consider the nature of the inference. For instance, if the aim is to identify a user within a known population, the training, validation and testing splits should all contain samples of that population. However, to avoid trial (or session) bias, the three splits should contain samples belonging to different collection trials. On the contrary, when inferring information like age and gender, the three splits should contain disjoint sets of users. Regarding the type of machine learning algorithm, we suggest the use of inherently interpretable models (e.g., LR, DT) to better understand the models' decisions. Moreover, interpretable models allow a transparent debugging phase to identify the presence of spurious features [47]. Finally, given the unbalanced nature of the problem (i.e., not all classes are distributed equally), we suggest using performance metrics like the F1-score with macro averaging.
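A minimal sketch of the validation strategy for attribute inference, using scikit-learn; the synthetic features and labels below are placeholders for the aggregated datapoints, and the point illustrated is that train and test must hold disjoint sets of users:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))          # aggregated datapoints (placeholder)
gender = rng.integers(0, 2, size=50)   # one label per user (placeholder)
users = np.repeat(np.arange(50), 4)    # 4 samples per user
y = gender[users]

# For attribute inference (age, gender), train and test must contain
# different users, otherwise the model can memorize identities.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=0)
train, test = next(splitter.split(X, y, groups=users))

model = RandomForestClassifier(random_state=0).fit(X[train], y[train])
# Macro-averaged F1 weighs every class equally despite class imbalance.
f1 = f1_score(y[test], model.predict(X[test]), average="macro")
print(round(f1, 2))
```

For user identification, the grouping variable would instead be the trial (or session) identifier, so that all splits contain every user but from different trials.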

Dataset overview
For the present investigation, we chose two use-case scenarios of virtual technologies, one involving AR and one VR. For both of them, we asked the authors [48], [49] and [50] for permission to share their data with our team and conduct the present study. The first dataset, described in Section 4.1, comes from a study on multitasking effects when using AR while walking outdoors [48]. Specifically, the authors took an experimental paradigm typically used in behavioral and cognitive research outside the lab, into a real dynamic scenario, and measured dual-task walking effects in young users responding to augmented stimuli during navigation. The second dataset, described in Section 4.2, comes from a use-case scenario introducing VR into the robotics and manufacturing industry [50] [49]. In this context, the authors tested users guiding an industrial robotic arm via different control systems in VR. Even though the allowed behaviors were kept as simple as possible to ensure experimental control, both scenarios give an important glimpse into practical applications of virtual technologies in the field. Furthermore, in both cases, the dual-task methodology was deployed for testing users' behavior under different levels of workload. This traditional paradigm is extensively used in human factors and applied research and represents an ecologically valid but still controlled method for imposing mental strain on a user [48] [49].

AR experiment
The AR experiment investigated multitasking effects in participants using AR while walking outdoors [48]. For this case study, 45 young adults wore the Microsoft HoloLens 1st generation smart glasses (OS Windows 10, CPU Intel 32-bit 1GHz, memory 2GB RAM and 1GB HPU RAM, 2.3-megapixel widescreen head-mounted display, field of view 30 × 17, mass 579g) and performed: i) a visual task, in which they discriminated between different augmented targets presented in their peripheral view, ii) a navigation task, in which they reached a series of augmented landmarks via physical walking outdoors, and iii) the combination of these tasks, which the authors called dual-task. The virtual environment, shown in Figure 2, was programmed in Unity (2017.4.18f1), and participants interacted with the augmented targets both via a wireless Xbox One controller and via physical collision with the virtual objects (e.g., walking through an augmented target).
Each participant performed 80 trials of the visual task, 50 trials of the navigation task and 50 trials of the dual-task. Specifically, in each trial of the visual task, a green or red object appeared lateralized on the left or on the right side of the visual field for 300ms; the participant was asked to press a specific button on the joystick based on the color of the target, independently of the hemifield where it appeared. Differently, in the navigation task, a series of augmented landmarks appeared one after the other at -90°, 0° or 90° with respect to the participant's position and at a distance of 3m from each other. Participants were thus instructed to first inspect the surroundings to find the landmark, and then walk through it. Finally, in the dual-task, participants walked through the series of landmarks while concurrently responding to the lateralized augmented stimuli. These tasks were specifically designed for measuring the effects of multitasking outside the lab. Therefore, they offer good insights into the potential impact of AR during outdoor walking.
The dataset is composed of 21 females (age mean = 24.28, SD = 2.22) and 24 males (age mean = 24, SD = 2.62) and comprises the following continuous measures: position (in meters) of the AR headset along the three axes (x, y, z), and rotation of the AR headset in Euler angles. Furthermore, timestamps of any button press on the joystick and of any collision with virtual objects presented in the scene were also registered, even if they were not considered for the present work. Note that, since the datasets of the first 11 participants did not include data on the headset position, we ran our investigation on 34 participants out of 45.

VR experiment
The VR experiment deployed a virtual reproduction of an industrial robotic arm (Universal Robots UR5) developed in Unity (version 2020.2.1f1) [49] [50]. The virtual environment was designed to test the performance and eye parameters of users during a simulated teleoperation task. All participants wore an HTC VIVE Pro Eye VR device (resolution 1440x1600 pixels per eye, refresh rate 90Hz, field of view 110°, weight 555g) and were provided with both VR controllers.
The dataset included 21 young adults (10 females, 11 males) and 14 participants who reported being more than 50 years old (8 females, 6 males). Overall, therefore, 18 females (age mean = 39.33, SD = 14.21) and 17 males (age mean = 37.75, SD = 16.32) participated in the experiment. All participants guided the robotic arm shown in Figure 3 through a pick-and-place task via two different control systems (controller buttons and physical actions) and under two levels of workload (single-task and dual-task). For the pick-and-place task, they had to pick a bolt from the workstation and place it into a box. When using the controller buttons system, they performed the task by only using the pad buttons on the VR controllers. With the physical actions system, instead, they still used the VR controllers, but they were allowed to physically approach the robot with their hand, grasp it and then move it over the worktable by physically moving their arm. Furthermore, in contrast with the single-task condition, in the dual-task participants performed the pick-and-place task while also performing simple arithmetic sums. A series of numbers ranging between 1 and 10 was randomly presented on a virtual screen in front of the participant for the whole duration of the pick-and-place actions. 2.5s elapsed between each number presentation, with a random jitter of 0.3s. After the place action, participants reported the result of the arithmetic operation by pointing at a virtual keyboard with the controller, and then moved to the next trial. In each condition, the young participants performed 40 trials, while the older participants performed 20 trials.
The following continuous measures were registered: position (in meters) along the three axes (x, y, z) and rotation in Euler angles of both the VR headset and its controllers. As the VR device employed for these investigations additionally provides an integrated eye-tracker, the following continuous eye parameters were also recorded: pupil size (in millimeters) and eye openness (expressed from 0 to 1). Finally, timestamps of any button press on the controllers and of any collision with virtual objects in the scene were registered too, but they were not used for the present investigation.

Tasks Identified from the AR and VR Experiments: Task-level
For the present investigation, we isolated specific macro tasks on which we performed users' profiling/identification.

Augmented Reality
Specifically, in AR we considered the same navigation task as identified by the authors [48], and we called mental task what the authors called the visual discrimination task. In the latter, participants discriminated between differently colored and lateralized augmented objects while standing still. As this task was mentally demanding, we consider it a mental task. Differently, in the navigation task, participants looked for augmented targets in their surroundings and then walked through them. The navigation task was performed both as a single-task (low workload) and concurrently with the mental task (high workload). To recap, in the AR environment, we identified the following tasks: • Mental Task (MT); • Navigation Task - Low workload (NT-Low); • Navigation Task - High workload (NT-High).

Virtual Reality
In VR, instead, we followed the same categorization used by the authors [50]. Therefore, we considered two different pick-and-place tasks according to the type of interaction allowed between the user and the virtual robot: controller buttons and physical actions. The CB-based task corresponds to the pick-and-place performed via controller buttons, while the PA-based task includes the same pick-and-place executed via physical actions. Both tasks were executed under low and high workload: compared to the low workload condition, in the high workload condition participants executed the pick-and-place task simultaneously with the arithmetic task. Overall, the following tasks were identified from the VR scenario: the controller buttons task under low and high workload (CT-Low, CT-High), and the physical actions task under low and high workload (AT-Low, AT-High).

Actions Identified from the AR and VR Tasks: Action-level
Further, from each of the tasks discussed above, we identified a series of different actions in both the AR and VR experiments. The analysis performed on these actions is at a micro level, and it is based on the type of interactions performed and on the range of motion involved.

Augmented Reality
Specifically, from the tasks performed in AR, we extracted the following operations: button interaction, search, and walk. In the button interaction, we included the task sections in which participants were standing still while discriminating between the lateralized colored targets. Specifically, they pressed specific buttons on the joystick according to the hemifield in which the virtual object was displayed. In the search operation, participants visually inspected the surroundings to find a virtual landmark; this operation was performed while participants stood still and only rotated their head to inspect the surroundings. Finally, in the walk operation, participants physically walked to the identified virtual landmark. Both the search and walk operations were performed as a single task and concurrently with the secondary mental task (namely, the visual discrimination task). As argued by the authors [48], participants perceived a lower workload when performing only the navigation task rather than performing the same task concurrently with the mental task. In other words, the secondary mental task put a strain on the users' mental resources. Therefore, we here refer to the dual-task as the high workload condition, while the single-task is considered the low workload condition. Table 2 reports the actions isolated in the AR environment.

Virtual Reality
From the VR tasks, we extracted idle, pointing, button, and physical interactions. Specifically, we extracted time intervals in which participants were only looking at the robot while it was executing either a pick or a place automation. Those time frames were considered idle actions, as participants were only looking at the scene without interacting with any of the virtual content. Idle intervals in which participants were mentally summing the numbers for the arithmetic task were considered high workload idles, while those cases in which participants were engaged neither in the arithmetic task nor in any interaction with virtual objects were considered low workload idles. The pointing action was identified by selecting time periods in which participants used the VR controller to point at the numbers on the virtual keyboard shown in figure X. In that experimental phase, they were reporting the sum of the previously performed arithmetic task.
In the button interaction, participants guided the virtual robot through the pick-and-place task by only pressing specific buttons on the VR controller. In the physical interactions, instead, participants physically touched the virtual robot and moved their own arm to relocate it over the worktable. In line with what was demonstrated by the authors [49], both button and physical interactions were categorized according to the level of workload involved. Specifically, when the pick-and-place was performed concurrently with the arithmetic task, participants were under a higher workload compared to performing the pick-and-place task alone. Table 3 reports the actions isolated in the VR environment.

De-biasing and Feature Extraction
The AR and VR datasets contain different types of raw features acquired from the sensors. For each category of sensors, we now describe the features and the de-biasing techniques we applied.
• Head Position (AR and VR), represented as a 3D coordinate (x, y, z) measuring the relative distance (in meters) of the user from a center point in the virtual environment. This feature might contain both session-specific and static user traits (e.g., height). We thus derived different variants of this information, such as the movement, computed as the norm between two points 5 timestamps apart, and the vertical oscillation, computed as the difference between two height values 5 timestamps apart.
• Head Rotation (AR and VR), represented as a 3D value. For each axis, we compute its angular speed by considering points 5 timestamps apart. This transformation can remove information related to trials (e.g., the specific positioning of objects with respect to the participant).
• Eyes (VR), including data on pupil size (in millimeters) and eye openness (0-1) for both the left and right eye. Note that, to overcome possible confounding variables [51, 52], it is usually appropriate to preprocess raw eye data so as to flatten individual differences. However, as the aim of the present work was specifically to capture individual traits and behaviors to enable identification/profiling, we opted not to preprocess the eye-tracking data. On the contrary, we leveraged the individual differences in pupil size and eye openness [53, 54, 55] to better identify and profile users. Further, we enhanced this set of features by additionally computing the symmetry between the eyes for both pupil dilation and eye openness. On an applied level, using the raw output of the HTC Vive Pro Eye device speeds up the identification/profiling process and allows higher generalizability to multiple VR devices.
• Controller Position (VR), represented as 3D coordinates (x, y, z) relative to the virtual environment's center point. Similarly to the head position, this feature might contain both session and user traits. We thus transform it into the movement, computed as the norm between two points 5 timestamps apart.
• Controller Rotation (VR), represented as a 3D value. We apply the same process as for head rotation.
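The de-biasing transformations above can be sketched as follows. This is a minimal NumPy illustration; the array layouts, the sampling rate, and the assumption that the second coordinate is the vertical axis are ours, not taken from the paper's implementation:

```python
import numpy as np

STRIDE = 5  # "5 timestamps of distance", as in the de-biasing step above

def movement(pos, stride=STRIDE):
    # Norm between two 3D points `stride` samples apart: removes absolute
    # position (session- and user-specific traits such as standing spot).
    return np.linalg.norm(pos[stride:] - pos[:-stride], axis=1)

def vertical_oscillation(pos, stride=STRIDE, up_axis=1):
    # Difference between two height values `stride` samples apart:
    # removes static height while keeping gait-induced bobbing.
    return pos[stride:, up_axis] - pos[:-stride, up_axis]

def angular_speed(rot, dt, stride=STRIDE):
    # Per-axis angular speed over `stride` samples: removes trial-specific
    # absolute orientation (e.g., where objects happened to be placed).
    return (rot[stride:] - rot[:-stride]) / (stride * dt)

def eye_symmetry(left, right):
    # Left/right asymmetry for pupil size or eye openness.
    return left - right

# Toy head trajectory: 100 samples of (x, y, z) position and Euler rotation.
rng = np.random.default_rng(0)
pos = np.cumsum(rng.normal(scale=0.01, size=(100, 3)), axis=0)
rot = np.cumsum(rng.normal(scale=0.5, size=(100, 3)), axis=0)

print(movement(pos).shape, angular_speed(rot, dt=1 / 90).shape)  # (95,) (95, 3)
```

Each derived series is 5 samples shorter than the raw one, since the transformations compare points 5 timestamps apart.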
Finally, each feature of the previously described families is aggregated with tsfresh. Given a time series, this library extracts more than 100 features, including average, standard deviation, quantiles, and entropy. We further refined the features by keeping only the relevant ones. Thus, starting from the raw time series of a single action within a single task performed in a single trial by a single user, we extract a single aggregated data point. The process is repeated for all users, trials, actions, and tasks, obtaining 9360 data points in AR and 16520 data points in VR.
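The aggregation step can be illustrated with a simplified stand-in for tsfresh (which in practice extracts 100+ features per series); the particular feature subset and binning choice here are ours, purely for illustration:

```python
import numpy as np

def aggregate(series, bins=10):
    # Collapse one raw time series (e.g., head movement during one action in
    # one trial) into summary statistics -- a tiny subset of tsfresh's output.
    hist, _ = np.histogram(series, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return {
        "mean": float(np.mean(series)),
        "std": float(np.std(series)),
        "q25": float(np.quantile(series, 0.25)),
        "q75": float(np.quantile(series, 0.75)),
        "entropy": float(-np.sum(p * np.log(p))),  # histogram-based entropy
    }

# One aggregated data point per (user, trial, task, action) combination:
rng = np.random.default_rng(1)
point = aggregate(rng.normal(size=200))
print(sorted(point))  # ['entropy', 'mean', 'q25', 'q75', 'std']
```

Concatenating such dictionaries across all feature families yields the single aggregated data point per action described above.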

Models Training and Validation
In our experiments, we test four different algorithms: logistic regression, ridge classifier, decision tree, and random forest. As a baseline, we defined a Dummy classifier that randomly predicts the outcome based on the training ground-truth distribution. For each experiment presented in Section 6, we adopt a common validation strategy: for each model, we find the best hyper-parameters through a grid-search validation based on a training, validation, and testing split of 70%, 10%, and 20% of the samples, respectively. For the private attribute inference tasks (i.e., age and gender), the splits contain disjoint sets of users, i.e., users in the training set are not present in the validation and testing sets; similarly, users in the validation set are not present in the training and testing sets. The machine learning models address user identification as a multiclass task. In contrast, we considered both age (i.e., young and old) and gender (i.e., male and female) as binary tasks. Note that the young class corresponds to users aged 19-24 (AR) and 23-30 (VR), while the old class corresponds to users aged 25-29 (AR) and 31-69 (VR). We now report the parameter grids involved in the grid searches.
To provide accurate results, each experiment is repeated five times. We thus report both the mean and standard deviation of the F1-scores (with macro averaging). We implemented our experiments in Python 3.8.5, using the Scikit-Learn library [56] for model training and validation.
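For the age and gender tasks, the user-disjoint 70/10/20 split can be obtained with scikit-learn's GroupShuffleSplit, grouping the aggregated data points by user id. The data below are synthetic, and the exact splitting code is our sketch rather than the paper's implementation:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 8))            # aggregated behavioral features
users = rng.integers(0, 30, size=n)    # user id of each data point
y = users % 2                          # illustrative binary label (e.g., gender)

# Hold out ~20% of users for testing, then ~10% of the original data
# (0.125 of the remaining 80%) for validation -- always splitting by user.
outer = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=0)
trainval, test = next(outer.split(X, y, groups=users))
inner = GroupShuffleSplit(n_splits=1, test_size=0.125, random_state=0)
tr, va = next(inner.split(X[trainval], y[trainval], groups=users[trainval]))
train, val = trainval[tr], trainval[va]

# No user leaks across splits:
assert not (set(users[train]) & set(users[test]))
assert not (set(users[val]) & set(users[test]))
assert not (set(users[train]) & set(users[val]))
```

Grid-search then selects hyper-parameters on the validation indices, and the final F1-score is computed on the held-out test users only.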

Results
In this section, we present the results of our experiments, both at the task and action levels (Sections 6.1 and 6.2, respectively). We then conclude with an ablation study to better understand the effect of the different sensors on the models' performance (Section 6.3).

Task-Level
In this section, we present profiling performance at the task level. In particular, each experiment considers the tasks presented in Section 5.1 separately. In more detail, we train, validate, and test our models only on the task under investigation, predicting identity, age, and gender separately each time. For instance, we train a specific model to predict gender based only on the Mental Task.

Identification
Figure 4 shows the identification results in the AR and VR environments. LR and RI achieved the highest (and comparable) performances in AR, whereas LR and RF performed best in VR. In general, all our algorithms outperform the baseline (Dummy). Looking at the results on the Overall Tasks, both in VR (OT-VR) and AR (OT-AR), we immediately notice that VR identification performance remains quite stable as the number of users increases, while AR performance degrades significantly. Indeed, the best AR algorithms go from nearly 90% F1-score (two users) to slightly above 60% F1-score (30 users). In VR, instead, LR yields an almost perfect prediction on two users, and the F1-score remains above 95% when performing identification over 30 users. This might reflect the different number of sensors available in VR (headset, controller, and eye-related behaviors) compared to AR (only headset-related behaviors). We further discuss the impact of each of the involved sensors in Section 6.3.
When looking at the individual tasks, we can see that the identification algorithm performs even better than on the overall task, particularly in AR. For instance, we reached a 70% F1-score over 30 users in the NT-Low, which is roughly 10% higher than in the OT-AR. One reason for this result might be the nature of the performed task: in the NT-Low, participants actively moved around their surroundings without performing any additional task. Therefore, their movements might have been more linear compared to the situation in which they performed the same task under high workload (NT-High), thus revealing more identifiable movement patterns. The same does not apply to the VR scenario. Here, when looking at each of the identified tasks, the higher the workload, the better the performance of the identification algorithm. Indeed, the best performance was obtained on the AT-High and CT-High, where the F1-score was around 95% and 97%, respectively. Again, possible explanations might relate to the nature of the tasks and to the number of sensors embedded in the devices. In the VR scenario, participants only moved their upper body, and in the high workload conditions they were additionally engaged in a secondary mental task. We know from the literature that a higher workload is related to larger changes in eye behavior [49]. Therefore, the VR-embedded eye-tracker might have had an important impact on the identification performance, particularly when users were under higher mental strain rather than when performing less demanding tasks (i.e., CT-Low, AT-Low).

Age
Figure 5 shows the age classification results in the AR and VR environments at the task level. Age profiling clearly yielded better performance in the VR than in the AR scenario. While in VR all models performed significantly better than the baseline, in AR the F1-score was consistently lower than the baseline in all tasks. This is likely related to the low age variability of the participants who took part in the AR experiment. Therefore, we here discuss age profiling performance only in relation to the VR experiment.
In VR, the LR and RF algorithms appear to perform better than the other models in all tasks but the OT, where RI produced a higher F1-score than LR. At the task level, the users' age was profiled with higher accuracy when they performed the pick-and-place task via physical actions (AT-High and AT-Low, with F1-scores around 90% and 85%, respectively) than via controller buttons (CT-High and CT-Low, with F1-scores below 80% in both cases). A possible interpretation is that the movement patterns of older users might have been quite different from those of younger users. Also, we know from the literature that robot teleoperation is significantly influenced by age [57]. In this view, our algorithm was particularly successful in detecting users' age during the pick-and-place task only when physical actions were allowed.

Gender
Figure 6 shows the gender classification results in the AR and VR environments at the task level. When profiling users' gender, we obtained substantially better results in VR than in AR. Indeed, in VR, all the tested algorithms performed above the baseline (Dummy). More specifically, we observe better performance with LR and RF, which reached a maximum F1-score of 75%. In the AR scenario, instead, our algorithms performed only 5-10% above the baseline when detecting users' gender.
Regarding the algorithms' performance within each of the identified tasks in VR, better performance is achieved in tasks involving a higher workload (CT-High, AT-High) compared to those under low workload (CT-Low, AT-Low). These results align with recent literature on behavioral gender differences in VR pick-and-place tasks. For instance, Nenna et al. [50] demonstrated that men outperformed women in pick-and-place tasks in terms of task execution time, particularly when using controller buttons. These differences might have been even more marked when performing an additional mental task, thus allowing better gender profiling. We observe a similar trend in the AR scenario, in which better performance is reached in the task involving a higher workload (NT-High). This behavior reflects previous findings on the different walking patterns of men and women [48]. Indeed, on average, the walking velocity of men is significantly higher than that of women, particularly under high workload. As we recorded the headset shifts over time, the different walking velocities might have been prominent in the gender profiling.

Figure 6: Gender profiling on task-level.

Action-Level
Starting from the results obtained on the overall task, we here aim to see whether some actions had a particular influence on the identification and profiling performance. Specifically, we opted to use the model that demonstrated the best results, namely Logistic Regression (LR). Each experiment considers the actions presented in Section 5.2 separately. In more detail, we train, validate, and test our model only on the action under investigation, predicting identity, age, and gender separately each time. For instance, we train a specific model to predict age based only on the Button Interaction with Low Workload.

Identification
Table 4 shows the identification results in the AR and VR environments at the action level. The performance obtained at the task level reached an F1-score of about 0.60 in the AR and above 0.90 in the VR scenario. Looking at the action level, specifically for AR, we see that the walk action reaches the highest performance (F1-score of about 0.80 under low workload and 0.78 under high workload), while the search action and button interaction yield F1-scores below 0.70. This suggests that the walk action is prominent in identifying users in AR, possibly because the walking pattern is the most distinctive feature in such an AR use case. In VR, instead, we observe higher F1-scores for both button and physical interactions, specifically under high workload (F1-score of about 0.96 in both cases). The pointing action also reached a very similar F1-score (0.96), while the idle time intervals yield lower F1-scores (below 0.80 under both high and low workload). It seems that the most interactive actions (using controller buttons, pointing, and physically moving the upper body) thus yield better results compared to periods in which users were passively looking at the virtual surroundings.

Age
Regarding the VR scenario, we can note that, under low workload, the pointing (F1-score = 0.88) and physical interactions (F1-score = 0.82) were the most crucial in profiling users' age, compared to actions allowing less interactivity with the virtual environment (F1-scores below 0.80). This might hint at different movement and interaction patterns between older and younger users, which emerge particularly when higher freedom of movement is allowed. This is also in line with what was observed at the task level. Moreover, this trend becomes even more evident when the physical interactions are performed under high workload (F1-score = 0.90), likely reflecting the multitasking and motor difficulties related to age [58].

Gender
Table 6 shows the gender classification results in the AR and VR environments at the action level. At the task level, our algorithms generated an F1-score of about 0.50 in AR and above 0.70 in VR. Even though gender profiling did not perform sufficiently well in AR, we can observe that, under high workload, both walk (F1-score = 0.60) and search (F1-score = 0.58) had a major influence in detecting the user's gender compared to the same actions performed under low workload, and also compared to the button interaction (all F1-scores < 0.50). These results are in line with what was observed at the task level, whereby gender profiling performed better in the NT-High than in the NT-Low. Additionally, we observe that the walk action has the largest influence on the accuracy of gender profiling compared to the other actions. Again, this might be related to the different walking velocities demonstrated by men and women, particularly under high workload [48].
When looking at the actions performed in VR, the pointing action stands out. With an F1-score of 0.82, it contributes strongly to gender profiling compared to all other actions. This might be related to a distinctive movement pattern and/or to gender-related variations in eye parameters. Further, the task-level finding of better performance under high compared to low workload is here confirmed only for button interactions. Indeed, the F1-score for button interactions is about 0.08 higher when users are under high rather than low workload. Again, this reflects results shown in a previous study demonstrating faster operation times in men compared to women specifically when using controller buttons, but not when acting via physical actions [49]. This suggests that profiling users' gender might be easier during tasks involving button interactions, but not in those allowing higher interactivity with the virtual environment.

Sensors Relevance -Ablation Study
In this section, we conduct an ablation study to understand which sensors contribute the most to our identification, age, and gender predictions. In brief, we trained a Logistic Regression (LR) model using only specific subsets of features.
In the AR environment, we distinguish between Head Position and Head Rotation features. In VR, we also consider Eyes, Controller Position, and Controller Rotation features. The ablation study was carried out both at the Task Level (Section 6.3.1) and the Action Level (Section 6.3.2).
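The ablation can be sketched as training one LR per feature family and comparing macro F1-scores. The column grouping and the synthetic data below are purely illustrative (we plant signal in the "eyes" columns on purpose); they are our assumptions, not the paper's data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 400
families = {  # hypothetical column spans for the five VR feature families
    "head_pos": slice(0, 4), "head_rot": slice(4, 8), "eyes": slice(8, 12),
    "ctrl_pos": slice(12, 16), "ctrl_rot": slice(16, 20),
}
X = rng.normal(size=(n, 20))
y = rng.integers(0, 2, size=n)              # e.g., binary gender labels
X[:, families["eyes"]] += 1.5 * y[:, None]  # make the eye features informative

split = int(0.8 * n)  # simple holdout, for illustration only
scores = {}
for name, cols in families.items():
    # Train on one feature family at a time and score it on the holdout.
    clf = LogisticRegression(max_iter=1000).fit(X[:split, cols], y[:split])
    scores[name] = f1_score(y[split:], clf.predict(X[split:, cols]),
                            average="macro")

print(max(scores, key=scores.get))  # the informative family wins: eyes
```

Repeating this per task (or per action) and per target (identity, age, gender) produces tables of per-family F1-scores like Tables 7-10.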

Task-level
Table 7 and Table 8 show the results of the ablation study for the AR and VR tasks, respectively. In the AR environment, Head Rotation features are predominant in the Mental Task for identification and gender prediction. Indeed, in this task, participants were standing still and were instructed not to move their head; however, it is plausible that their heads oscillated in distinctive ways, which were detected by our algorithm and leveraged for identification. In contrast, during the navigation task, Head Position has more impact on all targets, given that it records the walking patterns. Such patterns have been used in the literature to identify people [59], and could help in age and gender prediction as well.
In VR, the identification stage seems to be driven mainly by Eyes features, followed by Controller features. Reasonably, eye blinking patterns and pupil dilation can be person-specific [53, 54, 55], and thus act as biometric features. The controllers, instead, were the main means of interacting with the virtual world; thus, it is reasonable that how a person interacts within the environment helps in identification. This result is in line with recent findings on profiling video game players through mouse and keyboard usage [37]. Therefore, we could expect AR identification to achieve better performance if such sensors were available, particularly eye trackers, as reasoned in Section 6.1. In predicting age, the Controller features yield the best performance. This finding could be a consequence of younger people being more familiar with joystick usage: when the workload is high, younger participants may pay more attention to the task rather than to how to use the joystick. Moreover, in the low workload scenario, Head and Eyes features contribute similarly. For gender inference, on the other hand, the Head and Eyes features play the most significant role. Indeed, as shown in past literature, there are gender-based differences in how people visually explore a virtual world [60]. Controller features influence the prediction mainly in high workload controller-based tasks.

Action-level
Table 9 and Table 10 report the results of the ablation study for the AR and VR actions, respectively. In AR, Head Position has more impact than Head Rotation in predicting our targets, especially for the walk action. This is reasonable, given that this sensor mainly records the users' walking speed. Head Rotation becomes relevant in the Button Interaction action, in which participants could only rotate their head, and is quite useful for distinguishing between genders. As in previous results, age was difficult to predict. The only case in which we surpass the baseline is the walk action under high workload, but the improvement is too small to reason about.
Looking at VR, we notice that Head Position remains relevant for predicting gender, particularly in scenarios with low workload. However, most of the time, the Eyes features are the main discriminant for predicting our targets. In identification, Eyes reached the highest F1-score in six out of seven actions, suggesting that these features might be the main reason behind the higher identification performance in VR compared to AR. Further, Eyes are predominant in predicting the users' age in low workload scenarios. Controller features are quite useful for inferring the user's age, especially in high workload actions, while only small gender-related differences appear in their usage. Regarding the identification task, Controller Rotation appears more useful than Controller Position. Last, it is interesting to see how, in the idle actions, the Eyes play a significant role, particularly in the high workload scenario, in which they were able to identify a person with an 81% F1-score.

Discussions and Conclusions
Profiling users wearing virtual technologies presents several opportunities and threats, so it is important to examine it closely. In this work, we performed user identification and profiling in two virtual-based scenarios, one involving AR and the other involving VR. We aimed to test different algorithms and leverage the behavioral data output by the two virtual devices to accurately trace it back to the user's identity and personal information (i.e., gender and age). Further, we developed a generic pipeline that can be used with different virtual devices and in different behavioral contexts: i.e., while walking, searching for landmarks in the surroundings, pointing at a virtual keyboard for typing, and operating a virtual robot both via controller-based interactions and physical actions. Both virtual environments simulated highly realistic scenarios, and most of these behaviors were executed under both high and low workload, giving good insight into realistic applications of virtual technologies in the field.
The results show that users can be identified and profiled both in AR and VR, with VR accuracy being higher. Specifically, in AR, user identification reached good results within the walk action at low workload, while in VR, the identification algorithm was particularly successful when users performed more physical actions (i.e., pointing, physically interacting with the virtual robot) under a higher workload. As observed in the ablation study, this was mainly due to the additional eye-tracking sensors embedded in the VR but not in the AR headset. Indeed, while in VR the eye parameters had the most significant impact, head movement had the greatest influence on AR user identification.
Our algorithms, instead, were not able to accurately detect the users' age in AR. This was plausibly related to the low age variability of the tested sample, as the age of the participants who took part in the experiment ranged between 19 and 29. In VR, by contrast, we worked on an experimental sample whose age ranged between 23 and 69 years old, and our algorithms were thus able to detect the users' age with good accuracy. Age detection performed better in the most physical actions and interactions than in those involving just joysticks and controller buttons, specifically under a higher workload. Interestingly, eye parameters had the greatest influence on age detection in all actions but the physical interactions, in which the controller position and rotation had a higher impact.
Regarding gender profiling, we observed that the walking activity was again the most prominent in helping detect the user's gender in AR, with the head position being the most influential sensor for detecting such personal information. In VR, instead, our algorithms performed better during the pointing action and in actions under high workload.
In this case, the eye-related behaviors demonstrated the most considerable influence on gender detection during both these actions. In agreement with the AR findings, the head position is also quite relevant. Both findings align with the literature on the different eye and head movement behaviors of men and women.
In conclusion, our work thoroughly studied user profiling in AR and VR technologies. To the best of our knowledge, previous applied research on user profiling never compared the performance obtained with these technologies. On this matter, our results highlight that profiling is more straightforward in Virtual Reality. Through our ablation study, we additionally found eye sensors to be particularly useful in all our predictions (i.e., identification, age, gender), thus likely being responsible for the performance differences between AR and VR. Therefore, while being conscious of the technical challenges of accurately detecting eye behaviors in the real world, our findings highlight the importance of incorporating eye-tracking technologies into AR headsets. To sum up, our work shows the potential of user profiling methodologies with virtual technologies, and paves the road for several future works on how to improve AR and VR technologies with respect to user profiling.

Figure 1 :
Figure 1: Overview of the proposed framework for user profiling in Augmented and Virtual Reality.

Table 1 :
State of the art overview.

Table 2 :
Augmented Reality actions organized per type of operation and workload level.

Table 3 :
Virtual Reality actions organized per type of operation and workload level.

Table 4 :
User identification on action-level organized per type of operation and workload level. Random guess at 0.03 for both AR and VR tasks. All the measures in F1-Score.

Table 5
shows the age classification results in the AR and VR environments at the action level. Users' age was profiled with an F1-score of about 0.50 on the overall task executed in AR, and 0.80 in VR. As age profiling proved unsuccessful in AR, we do not pay close attention to the action-level results for this use case. These results confirm what we observed at the task level (see Figure 5).

Table 5 :
Age profiling on action-level organized per type of operation and workload level. Random guess at 0.5 for both AR and VR tasks. All the measures in F1-Score.

Table 6 :
Gender profiling on action-level organized per type of operation and workload level. Random guess at 0.5 for both AR and VR tasks. All the measures in F1-Score.

Table 7 :
Ablation study of sensor importance at task-level in AR. All the measures in F1-Score.

Table 8 :
Ablation study of sensor importance at task-level in VR. All the measures in F1-Score.

Table 9 :
Ablation study of sensor importance at action-level in AR. All the measures in F1-Score.

Table 10 :
Ablation study of sensor importance at action-level in VR. All the measures in F1-Score.