A Study on Collaborative Visual Data Analysis in Augmented Reality with Asymmetric Display Types

Collaboration is a key aspect of immersive visual data analysis. Augmented reality is often useful in such collaborative scenarios due to its inherent benefit of letting users see their co-located collaborators. However, different types of technology are available to augment the real environment. While specific devices are constantly evolving, each of these device types provides different premises for collaborative visual data analysis. In our work we combine handheld, optical see-through and video see-through displays to explore and understand the impact of these different device types on collaborative immersive analytics. We conducted a mixed-methods collaborative user study in which groups of three performed a shared data analysis task in augmented reality, with each user working on a different device, to explore differences in collaborative behaviour, user experience and usage patterns. Both quantitative and qualitative data revealed differences in user experience and usage patterns. For collaboration, the different display types influenced how well participants could participate in the collaborative data analysis, although there was no measurable effect on verbal communication.


INTRODUCTION
Collaboration is an integral part of immersive analytics [4,11] and cross-virtuality analytics [38]. It allows users to gain a deeper understanding of their data by discussing it in a shared space with their peers. There have been multiple efforts to create and explore such immersive spaces for shared data analysis in virtual reality (VR) [13,25], Augmented Reality (AR) [7,23,43] and Cross-Reality (CR) [10,41], as well as frameworks that facilitate data visualisation in immersive spaces [12]. Nevertheless, the AR space is especially interesting for co-located collaboration, as users can directly see their team members. Therefore, we want to further explore the AR stage of Milgram's Reality-Virtuality Continuum (RVC) [28] for collaborative immersive analytics [4].
There is ongoing development in consumer-grade display devices for AR. Nevertheless, the overarching categories of display devices are still similar to those compared by Milgram et al. [28] and Rolland et al. [39]. Handheld (HH) devices are now highly accessible, as every modern smartphone can function as a display device for AR. Optical see-through (OST) devices have become established, especially for use cases in industry, such as the Microsoft HoloLens 2 1 or the Magic Leap 2 2 . For video see-through (VST) devices, there is continuous development leading to high-resolution head-mounted displays (HMDs) becoming more readily available, such as the Varjo XR-3, the Apple Vision Pro and the Meta Quest 3. With this progression in all categories of AR display devices, it is not yet possible to tell whether all these device classes will establish themselves in parallel or whether one type of device will prevail, especially since each of these devices comes with advantages and disadvantages. Therefore, a scenario with more than one category of display device is becoming more and more realistic. While there has already been considerable work on collaboration with symmetric devices and collaboration across realities, co-located collaboration in AR with asymmetric display devices has only been explored sparsely at this point [26].

1 https://www.microsoft.com/de-de/hololens
2 https://www.magicleap.com/magic-leap-2

Fig. 2: Milgram's RVC [28] with a subdivision of the Augmented Reality stage according to the different device technologies used in the user study.
To our knowledge, there is at this point no research that examines collaboration with different AR display technologies, or the differences, advantages and disadvantages of each of these devices. Exploring these different categories of devices can therefore shed light on a novel yet realistic kind of asymmetric collaboration.
To investigate this, we want to explore how co-located collaboration using asymmetric display technologies can be performed in the context of immersive analytics. We focus on collaboration where each user has their own personal display. Using a within-subjects mixed-methods study design, we combine quantitative and qualitative as well as objective and subjective data to explore the scenario from multiple perspectives. The within-subjects design also allows us to compare the different display technologies and their capabilities from the participants' subjective perspective, as all participants get to know the limitations and possibilities of all devices. Therefore, our main contributions are insights on the influence of asymmetric display devices on collaborative data analysis, as well as on the usage of each of the devices within the collaborative data analysis process.

RELATED WORK
Our research builds upon prior findings in the area of collaborative AR and its various application domains. Given the extensive nature of this field, we concentrate our discussion on three key categories that have influenced our work: co-located collaboration with an asymmetric technology approach, the application area of collaborative immersive analytics, and the asymmetry of AR technology.

Co-located Collaboration in AR
While the initial prototypes for collaborative AR date back almost three decades [6,37,47], recent surveys [33,43] indicate that this remains a prominent and extensively discussed topic. It continues to pose open research questions and challenges across various research domains [15,32,46].
Combining such different output devices leads to co-located collaboration with asymmetric AR technology, which aligns with the primary focus of our research. In a study conducted by MacWilliams et al., an AR HMD was combined with a handheld AR device, a notebook, and projection-based AR to develop a game centred around herding sheep [26]. It showcases the potential of merging various devices with tangible interaction. However, its main emphasis lies on the technical implementation, and it incorporates only informal user feedback rather than a detailed examination of device disparities and their impact on collaborative dynamics.

Collaborative Immersive Analytics
Collaborative immersive analytics involves users collectively analysing data and making decisions using immersive technologies such as AR in a collaborative work environment [4]. AR offers distinct advantages in this context by reducing cognitive load [5] and enabling seamless interaction with both the physical and the virtual world simultaneously [40,47]. It extends traditional 2D displays used in data analysis, enhancing them with stereoscopic 3D content [7,27,48] or overcoming screen-space limitations [23]. This approach allows for differentiation between a user's individual perspective and a shared view, which remains consistent across all participants [36]. Users can then utilise their personal workspace to interact with and manipulate the shared AR content [42].
Seraji and Stürzlinger have recently developed and investigated an asymmetric immersive system tailored for collaborative analysis of scatterplots, histograms, line graphs, and parallel coordinate plots [41]. Their approach divides users into two groups: a desktop group and an immersive HMD-based AR group. Each group can customise the visualisation within their private space or share it with the other group. Their preliminary pilot study suggests that this system is easy to understand, engaging, and enhances data exploration.

Asymmetry in Hardware (VST, OST and HH)
HMD-based AR technology can be categorised into two main types: OST and VST [39]. OST merges real and virtual content by using optical combiners, while VST records the real environment with cameras and integrates virtual content through a video compositor [1,39]. Rolland et al. classified HMD challenges into three primary groups [39]. These include technological concerns, such as latency and field of view; perceptual issues, such as depth of field and object overlapping; and human factors considerations, encompassing user acceptance and safety. HH devices, although similar to VST in that they rely on cameras for real-world perception [29], differ by being non-stereoscopic, having a limited screen size, and consequently a smaller field of view [43]. These devices provide tangible feedback but have the drawback of occupying the user's hands and necessitating attention on the screen for augmented content, potentially constraining interaction with collaborators or the physical environment [30]. To address certain limitations, prior research has proposed pairing an AR HMD with an HH device, such as a tangible input device [2,8,26].

Friedl-Knirsch et al.: A Study on Collaborative Visual Data Analysis in Augmented Reality ...

CONCEPT AND DESIGN
Within the AR stage of the RVC, different types of display technologies are available. While they all enable data analysis within AR, providing a view of the real environment with registered 3D data, some technologies lean more towards the real end of the RVC, while others tend more towards the virtual end. We illustrate the location of these display devices on the RVC, as well as the other tools we utilise in the user study, i.e. pen & paper and a notebook, in Fig. 2.
• Handheld devices are closest to the reality end of the RVC. The user is located in the unmediated reality and has a small window into the augmented world.
• In contrast, OST devices are head-worn. The users can therefore permanently see their environment augmented with digital content. Their view of the real world is not completely unmediated either, as there is a translucent screen mounted in front of their eyes. How much this screen interferes with the users' view of reality depends on the specific device and chosen technology. For example, the Magic Leap OST device used in our study has a slightly tinted screen and will therefore darken the view of reality for the sake of higher contrast levels when displaying digital content.
• The VST devices, on the other hand, completely mediate what users see, as they are head-worn and everything users see is first recorded by cameras and then displayed on a screen. Users are therefore more dependent on the technological specifications of the device. Nevertheless, VST devices provide the opportunity for seamless transitions towards complete VR.

Besides the RVC, there are three further dimensions to consider when analysing the capabilities of different display types, presented by Milgram et al. [28]: Extent of World Knowledge (EWK), Reproduction Fidelity (RF) and the Extent of Presence Metaphor (EPM). Figure 3 shows our classification of the device categories along the dimensions of the theoretical framework introduced by Milgram et al., considering the current state of technology and the advantages and downsides of each category of displays.
Extent of World Knowledge: The EWK describes how much and how detailed information about the environment and the objects the user sees is available to the system [28]. This world knowledge is highly dependent on the actual device implementation and its available sensors. In modern devices, there can be great differences that are not specific to their display type. For example, there are some HH devices with built-in LiDAR scanners, which enable room reconstruction. Furthermore, front-facing cameras in HH devices can gather the same information as the eye-tracking sensors of an HMD. Thus, all of the display device types range from specific devices with low EWK to devices with high EWK. To reflect this wide range of devices, all display types are assigned the same value in the middle of the EWK dimension in Fig. 3.
Reproduction Fidelity: The RF dimension represents the ability of the device to display objects. In the original description, it captures the quality of both the virtual and the real objects that are displayed. It includes both a progression dimension of hardware and one of computer graphics modelling and rendering techniques. However, both dimensions are mapped to the same axis, arguing that the perfect configuration in both cases would mean that real and virtual content are indistinguishable. While we agree with this definition of a shared upper bound, we believe it is still useful to distinguish between the display capabilities for real and digital objects, as it depends on the specific AR application scenario whether the RF of digital content (RF-digital) or the RF of real content (RF-real) is more relevant. The VST devices are ranked the highest on the RF-digital dimension as they are independent of outside influences, such as lighting, and are able to display high-fidelity 3D animations. Nevertheless, the actual resolution depends on the specific hardware, which is why we allocated them to the upper middle of the dimension. OST devices are located just below the middle as they lag behind in colour representation, opacity and FOV. While the HH devices also provide a smaller FOV, they are ranked higher than OST devices as they do not share the difficulties with opacity and colour representation. On the RF-real dimension, the OST devices are ranked the highest as they provide an almost completely unmediated view of the real environment, where users simply look through a translucent screen. For VST and HH devices, on the other hand, RF-real is only as good as their respective cameras. Yet, VST devices do provide a stereoscopic view of the real environment and are therefore positioned slightly higher than HH displays. While HH devices do provide the opportunity to see the real environment completely unmediated when they are put aside, we refer to their capabilities when used in an AR scene.
Extent of Presence Metaphor: The last dimension is the EPM, which describes the technical abilities of devices to allow users to feel present in the displayed augmented scene. There are different definitions around the term presence [44]. Although Milgram et al. use the term presence [28], it describes the system capabilities rather than the subjective feeling of users and is therefore more closely related to Slater's definition of immersion [44]. The HH devices provide the least of these technical abilities, with no stereoscopic view and the need to be deliberately held up to view the AR environment. Therefore, they are placed at the lower end of this dimension. OST and VST devices provide a stereoscopic view based on the location of the user's head instead of their hands, as is the case with the HH device category. Nevertheless, current OST devices provide a more limited FOV, and the displayed content is less vibrant and sharp than with VST devices. Therefore, we ranked OST displays slightly lower than their VST counterparts, see Fig. 3.
Besides the work of Milgram et al. [28], which we used to classify the display technologies in our study, as illustrated in Fig. 3, there is also the work of Rolland et al., which discusses the differences between the VST and OST hardware classes [39]. Based on these differences, which we find both in early research on AR display systems and in our own classification of modern hardware along Milgram's dimensions [28], we want to explore how they influence user collaboration in the context of visual data analysis. More specifically, we want to empirically evaluate the impact of asymmetric AR displays on the collaborative behaviour, user experience and usage patterns of users in an immersive analytics scenario. Therefore, we implemented a multi-user AR system which combines three different categories of AR display devices, i.e. HH, OST and VST. We then conducted a collaborative mixed-methods within-subjects user study with groups of three users at a time. Finally, we analysed data from multiple sources, such as quantitative standardised questionnaires, quantitative objective system log data and qualitative semi-structured interview data, to arrive at conclusions that are validated by multiple data sources and to answer the following research questions:
RQ1: How do the different display devices influence the communication and collaboration behaviour of the users?
RQ2: How does user experience in a collaborative immersive analytics task differ between the AR display types?
RQ3: Do users exhibit different usage patterns in a collaborative data analysis scenario with each of the AR display devices?

IMPLEMENTATION
This section describes the specific implementation of the concept presented in Sec. 3 and the overall prototype setup (see Fig. 4).

Device Configuration
Our collaborative prototype encompasses three distinct device setups designed for group data analysis. Each setup comprises an AR device paired with a notebook. These setups include the HH device (Samsung Galaxy Tab S7), the OST device (Magic Leap One), and the VST device (Varjo XR-3), with the notebook being consistent across all configurations. It is important to note that the data visualisation and manipulation capabilities remain uniform across all setups. The primary distinction lies in the underlying device technology, influencing how the augmentation is delivered to users. This distinction and its implications are the fundamental focus of our research. The notebook serves as a common input interface for all device setups, ensuring uniform input modalities across all users. The only exception is when users need to select specific details within the AR visualisation, which is achieved via a 6-DoF controller (OST, VST) or touch input (HH). The collaborative prototype facilitates the sharing of visualisations among users, thereby enhancing data analysis. To enable this feature, each device is linked to a server PC responsible for managing cross-platform communication.

Cross-Platform Communication
To accommodate all three device types within a single software solution, we adopted a cross-platform approach. We used the Unity Engine (2020.3.48f) because it supports development for different operating systems, including Windows, Android and Lumin OS, by integrating the respective software packages. This approach allowed us to create a single project that is compatible with multiple platforms. To achieve this cross-platform functionality, we utilised conditional compilation to selectively include or exclude platform-dependent code. We established a local network employing a client-server architecture to facilitate communication between all devices. The three notebooks and the three AR devices act as clients and establish connections with the server. Our primary focus was to ensure responsive network communication [34] to support a seamless and consistent data-sharing experience among all three users. To optimise network efficiency, we utilised Unity's Netcode library, which supports server remote procedure calls (RPCs) and client RPCs. Data manipulations were distributed via event messages exchanged between the server and clients.
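The relay pattern described above, where data manipulations travel as event messages from one client through the server to all other clients, can be sketched independently of Unity's Netcode API. The following minimal in-process model is an illustration only; the class and method names (`EventServer`, `Client`, `broadcast`) are ours and not taken from the actual prototype:

```python
# Minimal sketch of the server-relay pattern: clients send data-manipulation
# events to a central server, which forwards them to all other clients.
# All names here are illustrative, not from the study prototype.

class Client:
    def __init__(self, name):
        self.name = name
        self.received = []          # event messages delivered to this client

    def on_event(self, event):
        self.received.append(event)

class EventServer:
    def __init__(self):
        self.clients = []

    def connect(self, client):
        self.clients.append(client)

    def broadcast(self, sender, event):
        # Relay the event to every connected client except the sender,
        # analogous to a server RPC fanning out as client RPCs.
        for client in self.clients:
            if client is not sender:
                client.on_event(event)

server = EventServer()
hh, ost, vst = Client("HH"), Client("OST"), Client("VST")
for c in (hh, ost, vst):
    server.connect(c)

# The HH user shares their current plot configuration with the group.
server.broadcast(hh, {"type": "share_view", "x": "horsepower", "y": "weight"})
```

In the actual system, the same fan-out is handled by Netcode RPCs over the local network rather than direct method calls.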

Tracking
For the spatial registration of all three AR devices within the same tracking space and for synchronising their position and orientation, we opted for optical tracking for the initial position determination. To ensure comprehensive tracking coverage from all angles, we employed a combination of five different fiducial markers arranged to form a cube. This setup enables all three users to move freely around the desk, facilitating face-to-face collaboration.
The data visualisation remains anchored to a fixed position relative to the central fiducial marker on the desk. Both the Android tablet and the Magic Leap One rely on ARCore optical tracking to continually align the data visualisation with the reference position of the fiducial marker. In cases where the marker becomes obscured from the optical tracking system, the position and orientation are determined through a combination of inertial tracking and optical tracking of spatial features, a process managed by the ARCore library.
In contrast, the Varjo XR-3 employs optical tracking solely for the initial alignment phase. Following this initial alignment, the tracking system transitions to laser-based lighthouse tracking to enhance tracking accuracy and stability.
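Anchoring the visualisation relative to the marker, as described above, amounts to composing the tracked marker pose with a fixed marker-to-plot offset. A pure-Python sketch with homogeneous 4×4 matrices; the poses and the 0.2 m offset are made-up illustration values, not measurements from the setup:

```python
# Sketch of marker-relative anchoring: the plot's pose in world space is the
# tracked marker pose composed with a fixed marker-to-plot offset transform.
# Pure-Python 4x4 homogeneous matrices; all values are illustrative.

def mat_mul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(x, y, z):
    """Homogeneous translation matrix."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

# Pose of the central fiducial marker as reported by optical tracking.
world_T_marker = translation(1.0, 0.0, 0.0)
# Fixed offset: the plot floats 0.2 m above the marker on the desk.
marker_T_plot = translation(0.0, 0.2, 0.0)

world_T_plot = mat_mul(world_T_marker, marker_T_plot)
# The last column now holds the plot's world position: (1.0, 0.2, 0.0).
```

When the marker is re-detected after occlusion, only `world_T_marker` is updated; the offset stays fixed, so the plot snaps back to the same spot on the desk.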

Data Visualisation
To create an immersive data visualisation environment, we integrated the IATK [12] Unity plugin into the prototype. This plugin offers interactive and scalable data visualisation capabilities and is compatible with OpenXR headsets. To meet the specific needs of our prototype, we implemented the following enhancements:
• Details-on-Demand: We introduced a feature that allows users to access additional information for each data point as needed.
• Billboard for Axis Labelling: To improve readability from all angles, we incorporated billboards for labelling the axes.
• Support for Nominal Data: We extended the plugin's functionality to include support for nominal data, accommodating our requirements for data representation.

Input and Interaction
The interaction across all three device setups is fundamentally the same, since we want to isolate the effect of the display technology in our observations.
All setups use a mouse as the input device to interact with GUI elements on the notebook and to provide input for data manipulation. On the notebook, users have the capability to manipulate and configure the data visualisation, as well as initiate the sharing of their current AR plot view. The parameterisation and configuration settings of the visualisation chosen by the user who wishes to share are then transferred to the other two users.
While the general interaction remains consistent across all setups, we had to modify the input method for accessing details-on-demand. This adjustment was necessary because the spatial input required for this interaction would be impractical on the notebook's 2D interface. Therefore, Varjo and Magic Leap users can utilise a 6-DoF controller to select data points, whose details are then presented on a billboard. In the case of the tablet, data points can be selected through touch input.
Users can configure the data visualisation through the GUI on the notebook by performing the following actions:
• Assign Dimensions to Axes: Users have the flexibility to designate specific dimensions to the axes, allowing for tailored data representation.
• Adjust Axis Scaling: The scaling of the axes can be modified to alter the data's visual proportions as needed.
• Rotate Axes and Resize the Plot: Users can rotate the axes and resize the plot to achieve the desired orientation and visual size.
• Toggle Camera Modes: Users can switch between orthographic and perspective camera modes within the notebook's plot preview, adapting the view to their preferences.
• Apply Axis Filters: The GUI enables users to apply filters to individual axes, facilitating focused data exploration and analysis.

USER STUDY
We chose a within-subjects mixed-methods study design for a collaborative study in groups of three within a visual data analysis scenario. This allows us to draw conclusions on the impact of the three distinct categories of AR display devices on collaborative data analysis, examined from both quantitative and qualitative as well as objective and subjective perspectives.

Pre-Study
We conducted a short pre-study [14] with two groups of three people. First, users performed a demo trial to get familiar with the prototype and the task. Afterwards, the three study trials were administered, where users switched display devices after each trial. By performing the demo trial first, users were exposed to the first device twice as long as to the other devices. Therefore, we omitted the demo trial and substituted it with a video of the interaction with the desktop interface. Moreover, we observed that users did not move around the table as much as we had anticipated. The interviews then revealed that being seated at a specific spot around the table may have impeded this physical use of the space. Furthermore, participants reported severe issues with neck pain when wearing the VST device. To encourage users to use the physical space and to limit the effect of the heavy HMD, we switched to stand-up work desks. This way, users would automatically be encouraged to use the physical space while also being able to balance out the weight of the HMD.

Participants
For the study we recruited 18 participants (10 male, 8 female, 0 diverse) from faculty, students, and companies around campus. The participants were split into six groups of three. Based on ratings on a nine-point Likert scale, we calculated a familiarity percentage. This level of familiarity varies greatly between the groups (G1 = 75%, G2 = 95.83%, G3 = 45.83%, G4 = 56.25%, G5 = 20.83%, G6 = 37.5%). The average age was 31.78 (SD = 8.68). All participants had normal or corrected-to-normal eyesight. Due to the hardware limitations of our OST device, we had to exclude persons who wear glasses. Thus, every person with corrected eyesight wore contact lenses. Additionally, the OST device was only truly suitable for users with an interpupillary distance (IPD) smaller than 65 mm. Since we could not measure the IPD of each participant beforehand, we asked users with an IPD greater than 65 mm whether they experienced difficulties with clearly seeing the digital objects when using the OST device. While three participants did report some degree of blur, all participants mentioned that it did not negatively influence their ability to work on the task. We asked users to rate their experience with certain technologies using a nine-point Likert scale ranging from 0 (no experience) to 8 (expert), with a value of 4 (medium) marking the middle of the scale. On average, users reported little to medium experience with visual data analysis (M = 3.61; SD = 2.45), VR (M = 3.89; SD = 2.40), video-based AR (M = 3.39; SD = 2.45), optical see-through AR (M = 2.50; SD = 2.46) and handheld AR (M = 3.00; SD = 1.86).
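The familiarity percentages above are consistent with reading each group's mean Likert rating as a fraction of the scale maximum of 8; note that the exact aggregation used in the study is not stated, so this normalisation is an assumption, and the ratings below are invented for illustration:

```python
# Assumed derivation of a familiarity percentage from nine-point Likert
# ratings (0 = no familiarity ... 8 = very familiar): mean rating
# normalised by the scale maximum. The aggregation rule is an assumption,
# and the example ratings are illustrative, not study data.

def familiarity_percentage(ratings, scale_max=8):
    return sum(ratings) / len(ratings) / scale_max * 100

# E.g. three within-group ratings of 8, 8 and 7 would yield 95.83%,
# matching the value reported for G2.
print(round(familiarity_percentage([8, 8, 7]), 2))  # 95.83
```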

Dataset and Task
As a dataset we chose the AutoMPG dataset from the UCI machine learning repository 6 , which has also been used in other studies on visual data analysis such as [9,45]. It was chosen because it provides comprehensible correlations that can be analysed by users without experience in visual data analysis. Since the participants in our study are not used to measuring gasoline consumption in miles per gallon, we transformed this dimension to the more common unit of litres per 100 kilometres.
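The unit transformation is a fixed reciprocal conversion, since miles per gallon measures distance per fuel volume while litres per 100 km measures fuel volume per distance. A minimal sketch using the standard constants (1 mile = 1.609344 km, 1 US gallon = 3.785411784 l):

```python
# Conversion applied to the AutoMPG consumption dimension:
# miles per US gallon -> litres per 100 kilometres.

KM_PER_MILE = 1.609344
LITRES_PER_US_GALLON = 3.785411784

def mpg_to_l_per_100km(mpg):
    # Distance covered per gallon in km, then litres needed for 100 km.
    km_per_gallon = mpg * KM_PER_MILE
    return 100 / km_per_gallon * LITRES_PER_US_GALLON

# A typical AutoMPG value of 23.5 mpg corresponds to about 10.01 l/100 km.
print(round(mpg_to_l_per_100km(23.5), 2))  # 10.01
```

The conversion is equivalent to dividing the constant 235.215 by the mpg value, which also makes clear why high mpg values map to low consumption values.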
For each run we asked participants to collaboratively answer two questions, which were designed to be answerable by novice users who had never seen a 3D scatterplot before. The first was a simple search task where participants had to find a car that met three predefined requirements, a common task also used in studies such as [19,45]. The requirements were set so that only one car would meet the criteria. This task allowed participants to get familiar with interacting with each device at the start of a condition. The second task was a trend identification task, similar to [9,45]. Participants were given four dimensions and had to identify whether these data dimensions showed any visible correlations. This task allowed users to discuss different perspectives and viewpoints on the data set while still limiting the overall study time, as opposed to an open exploration task. The study only continued when all participants agreed on their answer.

Apparatus
The study prototype was executed on the Varjo XR-3 7 HMD as the VST device, a Magic Leap One 8 as the OST device, and a Samsung Galaxy Tab S7 9 as the HH device. Additionally, we used three standard notebooks with a 14" display. In addition, an HTC Vive handheld controller was used as an input device for the VST. The VST was powered by a GeForce RTX 3090, an Intel Core i9-11900K, and 64 GB of RAM, resulting in an average frame rate of 85 frames per second. The tracking space measured 4 × 4 metres. The device configuration is illustrated in Fig. 4 and the study setting is displayed in Fig. 1.

Measures
We included different types of measures in our mixed-methods design to collect both quantitative and qualitative data, as well as subjective and objective data.
RQ1: Collaboration For user communication and collaboration, we used the recordings of the user study to quantitatively measure how much participants talked during the study, a measure also used by Kiyokawa et al. [22]. We also included an adapted version of the subjective measures questionnaire from Kiyokawa et al. [22] for more detailed information on how the devices influenced their collaboration. Additionally, we observed the users' collaboration during the study and conducted a semi-structured group interview as a source of qualitative data. There, we asked users about the perceived influence of the devices on their collaboration and communication behaviour.
RQ2: User Experience To measure user experience, we administered the User Experience Questionnaire (UEQ) [24] as a standardised quantitative measure. The UEQ calculates scores for six different subscales related to user experience, namely attractiveness, perspicuity, efficiency, dependability, stimulation, and novelty. We also asked users to rank the display types from their favourite to their least favourite device. Additionally, as a qualitative measure, we asked participants in the interview why they preferred a specific device and what their user experience with each of these devices was. The Simulator Sickness Questionnaire (SSQ) [21] was administered once before the study, to gather baseline data, and then after each condition. We chose this tool as it is commonly used and allows for a detailed analysis using subscales for nausea, disorientation, and oculomotor-related symptoms. For measuring differences in perceived task load between the devices, we administered the NASA Task Load Index (NASA-TLX) [17], which includes subscales for physical, mental, and temporal demand, as well as subjective assessments of performance, effort, and frustration.
RQ3: Usage Patterns For the usage patterns, we analysed differences in the system log data to find out how much and in which way participants interacted with the AR space and the user interface on the notebook. Additionally, we observed the users' behaviour during the study. Finally, we asked participants in the group interview about their strategies for solving the tasks and how suitable each of the device categories was for the different aspects of the task, such as data analysis, data selection, manipulating the scatterplot on the notebook and writing on a piece of paper.

Procedure
First, participants received a short explanation of the study purpose, gave their informed consent and filled out a pre-study SSQ. Next, the study procedure was explained in detail and participants were given an introduction to the desktop interface for the study and an explanation of the dataset. This introduction was aided by screenshots and a demo video of the notebook interface. When there were no more questions, the task was explained to the participants, and they were introduced to the AR devices. When the participants were ready, they received their first task. As soon as the discussion came to an end, participants were asked whether they were satisfied with their collective answer and then given the second task. When the group had again reached a consensus on their answer, they were instructed to take off their respective device and fill out the SSQ, the NASA-TLX, the UEQ and the subjective measures questionnaire. When all participants had completed the questionnaires, they switched to the next device in clockwise order and started with the next task. This process was repeated until all three participants of a group had worked with each of the devices and filled out the questionnaires. Then participants filled out the demographic questionnaire and the semi-structured qualitative group interview was conducted. Finally, participants were thanked for their time and handed a small treat.

RESULTS
In this section we first state the quantitative results from the ranking, standardised questionnaires, subjective measures, speech times and interaction logs. Then we continue with the qualitative results from the semi-structured interview and the observation.

Quantitative Results
Here we report on the results of the quantitative data analysis. Before testing for significant differences, we used Shapiro-Wilk tests and visual plots to check for normality of the data. When data was normally distributed, we used an ANOVA with post-hoc Tukey-HSD test. When data was not normally distributed, we administered a Friedman test, applied the Bonferroni correction to the significance level for the post-hoc pairwise comparisons, and calculated Cohen's d for effect sizes, except for the ranking, where we used Cramér's V, since both variables had more than two categories. For comparing two nominal variables we calculated Pearson's χ² and conducted a post-hoc test using the Bonferroni correction for the analysis of the interaction log data; for the ranking we used Fisher's exact test due to the smaller sample size. We only report on significant results. There are some cases with a significant omnibus result where the pairwise comparison did not reveal any significant differences. This is most likely due to the strictness of the Bonferroni correction, which guards against type 1 errors (false positives) when multiple tests are applied.
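The test-selection pipeline described above can be sketched with SciPy as follows. This is a minimal illustration under assumptions not stated in the paper: one 1-D array of scores per device condition (same participants, same order), `f_oneway` as a between-groups stand-in for the repeated-measures ANOVA (the Tukey-HSD post-hoc is omitted), and Wilcoxon signed-rank tests as the pairwise procedure after a Friedman test.

```python
import numpy as np
from scipy import stats

def omnibus_test(conditions, alpha=0.05):
    """Choose the omnibus test based on per-condition normality:
    ANOVA when all conditions pass Shapiro-Wilk, Friedman otherwise."""
    normal = all(stats.shapiro(c).pvalue > alpha for c in conditions)
    if normal:
        stat, p = stats.f_oneway(*conditions)       # parametric branch
        return "ANOVA", stat, p
    stat, p = stats.friedmanchisquare(*conditions)  # non-parametric branch
    return "Friedman", stat, p

def pairwise_bonferroni(conditions, alpha=0.05):
    """Post-hoc pairwise Wilcoxon signed-rank tests, compared against a
    Bonferroni-corrected significance level, with Cohen's d effect sizes."""
    k = len(conditions)
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    alpha_adj = alpha / len(pairs)  # corrected significance level
    results = []
    for i, j in pairs:
        a = np.asarray(conditions[i], dtype=float)
        b = np.asarray(conditions[j], dtype=float)
        p = stats.wilcoxon(a, b).pvalue
        pooled_sd = np.sqrt((a.std(ddof=1) ** 2 + b.std(ddof=1) ** 2) / 2)
        d = (a.mean() - b.mean()) / pooled_sd  # Cohen's d
        results.append({"pair": (i, j), "p": p,
                        "significant": p < alpha_adj, "d": d})
    return results
```

This mirrors the reporting style used here: an omnibus statistic first, then pairwise comparisons judged against the Bonferroni-adjusted level.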
Ranking In the ranking of favourite to least favourite display device, we found a significant association between the ranking and the device type (p = 0.001, V = 0.43). The OST device was clearly preferred by the participants, as it was ranked in first place by twelve participants, see Fig. 6 (A). This was also confirmed in the post-hoc evaluation, as the OST device was ranked in first place significantly more often than expected, see Fig. 7. In second place is the HH device, which was put in this spot by eleven participants, again significantly more often than expected according to the post-hoc test depicted in Fig. 7. Finally, the VST device was ranked third by ten participants. Nevertheless, it also received more first-place rankings than the HH device. Overall, the OST device received a mean rank of 1.5 (SD = 0.76), the HH device an average rank of 2.17 (SD = 0.60) and the VST device an average of 2.33 (SD = 0.82).
SSQ We found a significant difference in the disorientation subscale (χ²(3) = 19.50, p < 0.001), with the post-hoc test revealing a significant effect between the VST condition and the baseline value (z = −1.22, adjusted p = 0.027, d = 0.29). Additionally, there was a significant difference in the nausea scale (χ²(3) = 8.80, p = 0.032), which was not found in the pairwise comparison with Bonferroni-adjusted significance level. For the total score we also found a significant effect (χ²(3) = 18.15, p < 0.001), where the difference in the pairwise comparison was significant between HH and VST (z = −1.39), see Fig. 8. Therefore, VST led to a significantly higher simulator sickness score.
NASA-TLX There was a significant difference in the temporal subscale (χ²(2) = 6.50, p = 0.039) which did not have any significant results in the pairwise comparison. In the frustration subscale we found a significant difference (χ²(2) = 6.92, p = 0.031) which stems from the difference in frustration between the OST and the HH device (z = 0.81, adjusted p = 0.047, d = 0.19), see Fig. 9 (A).
Subjective Measures [22] We found a difference for three of the five questions of this questionnaire. For the question on how natural the view of the world is, there was a significant difference (χ²(2) = 18.66, p < 0.001), where the VST received a lower score (M = 2.33, SD = 1.85) than both the HH (z = 1.25, adjusted p = 0.001, d = 0.29) and the OST device (z = 1.08, adjusted p = 0.003, d = 0.25), see Fig. 9 (B). For the questions on the ease of viewing their teammates (χ²(2) = 6.00, p = 0.050) and on the ability to effectively communicate while using the respective device (χ²(2) = 6.87, p = 0.032), there was no significant difference in the post-hoc test with Bonferroni-adjusted significance level.
Speech Times For the speech times there was no significant difference between the devices.
Interaction with Notebook and AR Space Based on the frequencies of events in the system log data, we found a significant relation between the device type and the interaction with the notebook and AR space (χ²(2, N = 4745) = 277.89, p < 0.001). The post-hoc test then revealed significant deviations from the expected frequency for all the comparisons. This means that with the HH and the OST device, there was less frequent interaction with the AR space than expected, but more frequent interaction with the notebook than expected. For the VST it was the opposite, as there was more interaction with the AR space than expected and less with the notebook, see Fig. 6 (B).
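The χ² analysis of the log frequencies can be sketched as follows. The cell counts are hypothetical stand-ins (only the total N = 4745 is taken from the study), and adjusted standardised residuals are used as one common post-hoc for contingency tables; the paper does not specify its exact post-hoc procedure beyond the Bonferroni correction.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Illustrative 3x2 contingency table of logged interaction events
# (rows: HH, OST, VST; columns: notebook, AR space).
# These counts are invented for the sketch, not the study's data.
observed = np.array([[1000,  350],
                     [ 950,  400],
                     [ 650, 1395]])

chi2, p, dof, expected = chi2_contingency(observed)

# Post-hoc: adjusted standardised residual per cell, tested against a
# Bonferroni-corrected two-sided z threshold (one test per cell).
n = observed.sum()
row = observed.sum(axis=1, keepdims=True) / n   # row proportions
col = observed.sum(axis=0, keepdims=True) / n   # column proportions
residuals = (observed - expected) / np.sqrt(expected * (1 - row) * (1 - col))
z_crit = norm.ppf(1 - 0.05 / (2 * observed.size))
significant_cells = np.abs(residuals) > z_crit
```

A cell whose residual exceeds the corrected threshold interacted with that target significantly more (or less) often than expected under independence, which is how the HH/OST versus VST pattern above would surface.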

Qualitative Results
For the qualitative data analysis, two researchers attended the interviews in person and kept handwritten logs, while a third researcher watched the recordings of the study without the interviews. The main researcher then transcribed the interviews and grouped the feedback according to the discussed topics. Finally, the main researcher and the third researcher discussed the results emerging from the interviews and the observation during the tasks.

Collaboration
In the semi-structured interviews, 13 participants reported that the devices did not influence verbal communication. Nevertheless, the shared AR representation between the participants also encouraged communication (1/18). If more collaboration had been required in the physical space, such as collaborative writing on a whiteboard, the devices could have made more of a difference (2/18). On the other hand, participants reported that the interaction modalities did have an influence on how well they could participate in the collaboration. While using the HH device, users felt that they could not really participate in the data analysis, as it provided too little overview and had to be put down when interacting with the notebook or writing something down (3/18), and that they could not freely participate because it was not possible to simultaneously manipulate the plot on the notebook and view the plot in AR (6/18). Additionally, when looking at the tablet, they did not really know what their teammates were doing, as their eyes would focus on the tablet instead of looking into the room through the tablet (2/18). The VST device, on the other hand, limited the users' freedom of movement (6/18), which we also observed during the study. VST users rarely moved from their side of the table, while HH and OST users often walked around the table to look at the data from different perspectives. Furthermore, the motion blur of the VST distracted users from the task (2/18). While this perceived motion blur might be manageable in a single-user scenario, it leads to considerable problems in a collaborative scenario, where users need to constantly react to an environment that constantly changes due to the other participants. The OST device was not reported to have any negative influence on collaboration.
User Experience For the VST device, participants reported that they enjoyed the higher resolution and larger FOV compared to the other devices (12/18), providing an extraordinary view of the digital objects in the AR scene. For interaction with the real environment, participants reported major issues with the motion blur of the device (11/18), as the cameras cannot focus as quickly as the human eye. Participants even reported dizziness and nausea, with two participants in different groups using the same analogy, stating that it felt like they were drunk. Additionally, participants reported ergonomic problems, as the VST device was very heavy (6/18), leading to physical discomfort.
As for the OST device, participants liked that it was lightweight (2/18) and that the manipulation of the scatter plot using the notebook felt natural (4/18). However, ergonomic issues occurred, as users needed to move their head a lot more (7/18) than with the VST device due to the small FOV, and two users had issues with the fit of the HMD as well as the device heating up during use (2/18).
Since the HH device is the only one that is not head-worn, it does not cause issues with fitting different users. Nevertheless, having to put down the device for interaction with the real world was perceived as very inconvenient (15/18). Additionally, users reported that the HH device was heavy, especially since it had to be held up high in front of the head to see the data representation in AR (9/18). Furthermore, the HH device was perceived as not engaging and interesting enough (1/18).

Usage Patterns
Participants appreciated the VST's large FOV, as it facilitated distinguishing between individual data points during selection in AR (4/18). Additionally, it provided a good overview of the data (12/18). Interaction with the real environment proved to be difficult, as participants reported that reading the question on the screen and writing down the answer were barely possible with the VST (15/18). This was also revealed in our observation, as only ten out of 36 answers (6 questions per group) were written down using pen & paper when working with the VST. Since the HMD needed to be connected to a PC with a cable, participants felt limited in their physical movement (6/18).
For the OST device, users emphasised that they could naturally interact with the notebook while still immediately seeing the changes in the AR space (4/18). Moreover, it does not introduce any problems with writing or reading in the real environment (1/18), and since it is wireless, it is comfortable for walking around the physical space (2/18). Therefore, users felt that this device category provided a good middle ground for both interaction in the real environment and interaction with the AR space (12/18). Since the resolution and colour representation as well as the FOV are not as good as in the VST device, it is harder to distinguish overlapping data points (2/18), making it less suitable for data analysis and selection in the AR space than the VST device.
As opposed to the other display types, the HH device provides the unique opportunity of two users sharing the exact same view of the AR space by looking at the device (1/18), which was observed during the study in G1. Furthermore, the HH device is also the only one to provide a completely unmediated view of, and interaction possibility in, the real environment, by simply putting the device down (4/18). This also led to users writing down 23 out of the 36 answers in the HH condition. Nevertheless, this opportunity can be perceived as very inconvenient (15/18), as users need to reorient themselves in the AR graph every time they pick up the device again. Therefore, participants reported that they predominantly worked in the 2D space on the notebook while they were in the HH condition (6/18). Holding the HH display to see the AR representation also makes it more difficult to select specific points in the AR space, as one hand needs to hold the HH device still while the other taps on the screen (12/18). The missing stereoscopic view then adds an additional obstacle to the data analysis in AR, as data points are not as distinguishable as they are with the HMD devices (2/18). Using the HH device also leads to keyhole navigation, where the user only sees the AR space through a small window, which negatively impacts the overview of the whole plot (2/18). The accumulation of all these factors even led users to disregard the HH device when working with it (6/18) or give up on the task completely (1/18).
General Observations In the interview, the groups reported two different types of strategies. Two groups preferred to start sharing their view early on and then collaboratively discuss and answer the questions (G3, G4). The four remaining groups, on the other hand, stated that they preferred to look at the data individually and then discuss whether everybody had reached the same conclusion (G1, G2, G5, G6). Although these groups described their strategy similarly in the interview, in the observation we saw that G6 did in fact share their view a lot and would rather fall under the first category. Furthermore, there was a great difference in the way the other three groups (G1, G2, G5) followed their strategy. While one group worked almost exclusively individually and only checked whether everyone had reached the same conclusion (G5), the other two groups discussed the data constantly while mainly looking at their personal views (G1, G2).
Additionally, the groups developed strategies to facilitate discovering correlations. To this end, participants reduced the complexity of the data by filtering it as much as possible (G1, G3), by looking at 2D representations of the data (G2, G3, G6), and by extensively discussing which dimension should be displayed on which axis (G4, G6).

DISCUSSION
Having reported the quantitative and qualitative results, we now connect them to our research questions to reach comprehensive conclusions.

RQ1: How do the different display devices influence the communication and collaboration behaviour of the users?
The main tool for collaboration in our study was verbal communication, which was not influenced by the devices, according to our qualitative interview data, the quantitative questionnaire data, and the quantitative measure of speech time. However, the different devices influenced how much participants felt they could participate in the collaboration, as the HH device limits collaborative awareness and the ability to interact with the 2D and 3D space at the same time. The VST device, on the other hand, limited the users' freedom of movement, which was both reported in the interviews and observed during the study. Additionally, the motion blur of the VST was amplified by the frequent changes in a collaborative scenario. This is also reflected in the subjective measures questionnaire, as for the VST, the view of the real world was significantly less natural than with the other two devices. The OST device was not reported to have any negative influence on collaboration.

RQ2: How does user experience in a collaborative immersive analytics task differ between the AR display types?
In the interviews, participants reported that with its high resolution and large FOV, the VST was especially useful for viewing digital objects. However, the motion blur influenced participants negatively, leading to reports of dizziness and nausea, which are also reflected in the SSQ scores. Here we found that the VST device led to higher disorientation than the baseline, and in the total score it revealed significantly higher simulator sickness than any other device. This may also be the reason for participants rating this device as confusing and difficult to learn, leading to a significantly lower mean score for the VST device on the perspicuity scale than for the HH device, meaning that they felt it was harder to get familiar with. In addition to these issues with simulator sickness and getting familiar with the device, the weight of the VST device caused physical discomfort. The OST device was the favourite device of twelve participants, see Fig. 6 (A), due to its low weight and the feeling of natural interaction with the 2D interface. Overall, this led to the lowest frustration level in the NASA-TLX, significantly lower than for the HH device.
In the interviews, the interaction with the real world using the HH device was reported to be inconvenient, and when manipulating the 3D environment, the device was reported as too heavy. Furthermore, it was perceived as not engaging and interesting enough, which is also reflected in its significantly lower scores on the stimulation and novelty scales of the UEQ.

RQ3: Do users exhibit different usage patterns in a collaborative data analysis scenario with each of the AR display devices?
The VST device was especially useful for data selection in the AR space, as the analysis of the system log data revealed, see Fig. 6 (B). This was further confirmed by the qualitative interviews, where users reported that the large FOV and high resolution facilitated the interaction with the 3D space. However, reading and writing in the real world were barely possible, and motion was constricted by the cable connection of the device. This was also observed during the study. The OST device, on the other hand, was especially well suited for manipulating the graph on the notebook, as the interaction log data showed. This was also confirmed in the interviews, as users felt that they could naturally interact with the physical environment and walk around. Nonetheless, in comparison to the VST, the resolution and FOV are reduced, resulting in less overview and difficulties with distinguishing data points in the virtual space.
In contrast to the other devices, the HH device allows more than one user to share the same view of the 3D space, which was also observed in one group. Moreover, the possibility of an unmediated view of the real environment led to users writing down more of the answers when using this device. However, this also entails several inconvenient aspects that users reported in the interviews. For example, it was more difficult to select a specific data point in the AR space, as it required both hands and holding the HH device very still. This led to users predominantly working in the 2D space when they were in the HH condition, which we also found in the interaction log data, see Fig. 6 (B). Thus, the HH device is not well suited as an AR display device in visual data analysis.

Recommendations and Key Findings
The key findings of the study relate to how the different device types can be used in a collaborative data analysis scenario in AR.
• There is no single best display device. Each of the device types exhibits advantages and disadvantages which reflect on the collaboration. Whether a device type is suitable or not depends on the use case.
• The VST device is suitable for data analysis and data selection in the AR space due to its resolution and FOV, but interaction with the physical environment remains challenging because of the limited real-world clarity introduced by the camera specification. We recommend VSTs for scenarios that focus on visual data analysis and where change in the environment is limited.
• OST devices provide a good trade-off between data analysis capabilities in the AR space, interactivity with the real environment and ergonomics. However, they are not as capable as the VST for data analysis and do not fit every user perfectly. We recommend this device for scenarios where the combination of 2D and 3D interaction is important and suggest optimising the colour scheme in the data representation to be distinguishable on the specific hardware.
• The HH device is suitable for interacting with the real environment and for sharing the same view, but is not well suited for data analysis and interaction in the AR space. We recommend these devices for collaborative discussions where the main data analysis has already been performed.
• In a collaborative scenario, the motion blur of the VST can lead to considerable simulator sickness and overall discomfort.
• There was no direct influence of the different device types on verbal communication.
• In an asymmetric collaborative scenario, a user with an HH device needs to make more of an effort to participate in the AR space, as they need to constantly hold up their display to access the AR environment. This can lead to team members feeling left out or giving up on collaborating.

LIMITATIONS AND FUTURE WORK
To measure collaboration, we focused our evaluation on verbal communication. In our mixed-methods approach, we used speech time as a quantitative objective measure, a custom questionnaire adapted from Kiyokawa et al. [22] as a subjective quantitative measure and semi-structured interviews as a subjective qualitative measure. We expected users to mainly collaborate using verbal communication, as the prototype did not include extensive awareness features. We could not find any quantitative difference in our study. However, this only considers a small portion of collaborative data analysis, and other aspects should be investigated in future work. A quantitative study could include different metrics of verbal and non-verbal communication as well as different task types, and investigate a possible task dependence of the relationship between communication and display device. A qualitative study could focus on a single, more complex task and elaborate on collaborative coupling, information sharing and territoriality. We chose a wired VST device to achieve the best possible display quality. Even though we instructed participants that the cable was long enough for them to reach the borders of our tracking space, the wire still limited how users moved in the room. As display hardware improves constantly, a future study should consider using a wireless VST HMD with sufficient display quality, which was not available at the time of our study.
Another limitation of our study is the inclusion of three participants who reported experiencing blur at the end of the study, despite not reporting it when they were first introduced to each device. We still included those participants, as our measures focused on investigating collaboration and did not include measures like task completion time and error rate, which would have been influenced very directly. Additionally, these participants reported that the experienced blur did not influence their ability to complete the task. Nevertheless, this is a limitation of our study.
It would also be useful to examine the impact of different factors, such as FOV or display resolution, within each display class on collaboration using a quantitative comparative approach.
Future studies should also include more stable tracking methods for the HH device, since we employed Varjo's pre-defined fiducial marker as our reference cube for optical tracking. This marker contains a limited number of features compared to traditional ARCore markers. As a result, the tracking was less robust, leading to occasional automatic recalibration phases during the user study. Additionally, recalibration was triggered when participants set the tablet aside completely.

CONCLUSION
We explored collaboration in AR using asymmetric display categories and therefore included HH, OST and VST devices in a use case for visual data analysis. To examine the influence of these device categories on collaboration, user experience and usage patterns from multiple perspectives, we conducted a collaborative within-subjects mixed-methods user study, in which we collected both qualitative and quantitative as well as subjective and objective data. We found different usage patterns, with OST devices being the most suitable for the combination of interacting with virtual and real objects. Furthermore, user experience was most influenced by the factors of simulator sickness and frustration with device handling. These individual factors of user experience and usage patterns for each display class then influenced the users' capability to participate in the collaboration.

Fig. 4 :
Fig. 4: Device configuration and prototype layout. 1. HH with notebook, 2. VST with notebook, 3. OST with notebook, 4. Reference marker and data visualisation as augmentation, 5. Instruction screen for user study tasks, 6. Server to provide client-server-based network communication.

Fig. 5 :
Fig. 5: Data visualisation captured from the Varjo XR-3 with details-on-demand billboard and notebook to configure and filter the data representation. The visualisation is anchored to the fiducial marker.

Fig. 6 :
Fig. 6: (A) Ranking of the display categories by participants, (B) Statistically expected and actual sum of interactions with the notebook and the AR space, with the brackets showing significant results at the 0.05 level.

Fig. 7 :
Fig. 7: Statistically expected and actual device rankings, with the brackets showing significant results.

Fig. 8 :
Fig. 8: Results from the SSQ, with the brackets showing significant results at the 0.05 level.

Fig. 9 :
Fig. 9: (A) Results from the NASA-TLX frustration scale, (B) Question 1 of the subjective measures questionnaire adapted from Kiyokawa et al. [22], with the brackets showing significant results at the 0.05 level.

Fig. 10 :
Fig. 10: Results from the perspicuity, stimulation and novelty scales of the UEQ, with the brackets showing significant results at the 0.05 level.