Handwriting for Text Input and the Impact of XR Displays, Surface Alignments, and Sentence Complexities

Text input is desirable across various eXtended Reality (XR) use cases and is particularly crucial for knowledge and office work. This article compares handwriting text input between Virtual Reality (VR) and Video See-Through Augmented Reality (VST AR), facilitated by physically aligned and mid-air surfaces when writing simple and complex sentences. In a $2\times 2\times 2$ experimental design, 72 participants performed two ten-minute handwriting sessions, each including ten simple and ten complex sentences representing text input in real-world scenarios. Our developed handwriting application supports different XR displays, surface alignments, and handwriting recognition based on digital ink. We evaluated usability, user experience, task load, text input performance, and handwriting style. Our results indicate high usability with a successful transfer of handwriting skills to the virtual domain. XR displays and surface alignments did not impact text input speed and error rate. However, sentence complexities did, with participants achieving higher input speeds and fewer errors for simple sentences (17.85 WPM, 0.51% MSD ER) than complex sentences (15.07 WPM, 1.74% MSD ER). Handwriting on physically aligned surfaces showed higher learnability and lower physical demand, making them more suitable for prolonged handwriting sessions. Handwriting on mid-air surfaces yielded higher novelty and stimulation ratings, which might diminish with more experience. Surface alignments and sentence complexities significantly affected handwriting style, leading to enlarged and more connected cursive writing both in mid-air and for simple sentences. The study also demonstrated the benefits of using XR controllers in a pen-like posture to mimic styluses and pressure-sensitive tips on physical surfaces for input detection. We additionally provide a phrase set of simple and complex sentences as a basis for future text input studies, which can be expanded and adapted.


INTRODUCTION
Text input is desirable across various eXtended Reality (XR) use cases and is particularly crucial for knowledge and office work. XR overcomes the physical limitations of traditional seated and stationary workspaces, enabling flexible and portable work in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), collectively referred to as XR. While XR workspaces can be employed everywhere [6,24,33,42,43,50], they also require versatile text input techniques. Besides virtual and physical keyboards, handwriting text input has great potential, offering unique advantages like natural intuitiveness, flexibility, and expressiveness. Handwriting in XR also benefits from prior knowledge acquired through the use of pen and paper or styluses on tablets. Furthermore, handwriting enhances cognitive engagement, improves understanding, and supports memory retention [51,53], making it valuable for education and work. For VR and Optical See-Through (OST) AR, previous work demonstrated the potential of handwritten text input using handwriting recognition [18,19,21,23,65]. Nevertheless, handwriting text input in Video See-Through (VST) AR remains underexplored. Handwriting typically takes place on physical surfaces, providing passive haptic feedback that supports fine-grained movement, improves performance [15,67], and reduces arm fatigue [31,71,72]. Consequently, physical surfaces are also integrated in XR. However, text input on mid-air surfaces is essential for flexible and non-stationary XR use cases.
Furthermore, text input studies often include transcription tasks, requiring participants to copy predefined sentences. While participants in previous handwriting studies transcribed simple text phrases by writing individual letters [18,19,58], words [21,65], or sentences [23], this does not necessarily represent real-world scenarios. Therefore, the impact of sentence complexities on handwriting text input in XR is unclear.
To address these gaps, we investigated the following research questions: (1) How do XR displays (VR and VST AR) affect usability, user experience, task load, text input performance, and handwriting style? (2) How do surface alignments (physically aligned and mid-air surfaces) affect usability, user experience, task load, text input performance, and handwriting style? (3) How do sentence complexities (simple and complex sentences) affect usability, user experience, task load, text input performance, and handwriting style?
This work provides novel insights into handwriting for text input in VR and VST AR, facilitated by physically aligned and mid-air surfaces when writing simple and complex sentences. We assessed handwriting performance by text input speed and error rate and identified characteristics of handwriting style using stroke-level analysis. We collected subjective user ratings through questionnaires to evaluate usability, user experience, and task load. From our results, we derived three key findings and discussed four design implications to support developers and researchers in using handwriting text input for future XR use cases. We also highlight the benefits of XR controllers as styluses and demonstrate the use of pressure-sensitive tips on physical surfaces for input detection. We additionally provide a phrase set of simple and complex sentences for text input studies, which can be expanded and adapted.

RELATED WORK
Advancements in VR, MR, and AR, collectively referred to as XR, overcome physical limitations and enable flexible and portable workspaces that can be employed everywhere [6,24,33,42,43,50]. While VR presents a purely virtual environment, OST AR enables users to perceive the real world with virtual content optically superimposed. In contrast, VST AR, or AR passthrough, shows the real world to the user captured by the front-facing cameras of the device and allows an opaque and integrated overlay of virtual content. MR is considered an advanced form, merging [69] or situating [62] virtual content in the real world to facilitate context-aware interaction and response to the physical surroundings. Consequently, the visual-sensory display features can affect the design of text input techniques in XR for both typing and handwriting. The most common text input solutions are physical [25,38,49,55] and virtual keyboards [17,33]. They are widely available, benefit from prior knowledge, and offer efficient text input performance. Besides the prevalence of keyboards, handwriting emerges as a promising candidate for text input in XR [18,21,65], offering unique advantages like natural intuitiveness, flexibility, and expressiveness.

Handwriting for Text Input
Handwriting, typically developed in childhood [22,37], requires fine and precise movements, which activate neural processes and brain activation patterns associated with learning, creativity, and cognition [54,63]. In digital spaces, handwriting is visualized by digital ink, offering a natural writing experience by mimicking the fluidity and responsiveness of pen and paper in the real world. Additionally, digital note-taking with a stylus and tablet showed similar benefits to pen and paper [51,52], forming the basis for handwriting in XR. For activities like searching, organizing, and editing content, handwriting recognition is essential to transform digital ink into a textual representation. Common use cases are note-taking, document editing, entering emails, phone numbers, and passwords, or browsing the internet [14,31,64].
Handwriting recognition can be categorized into online and offline recognition [12,57,64]. Offline handwriting recognition, like Optical Character Recognition (OCR), interprets handwriting based on shapes and patterns in static data after writing is completed [12,28]. Online handwriting recognition, or digital ink recognition, analyzes the handwritten trajectory during the writing process, considering dynamic and contextual information like position, direction, velocity, and acceleration [36]. Digital ink recognition also enables text composition and dynamic document editing via gestures. Examples are erasing by scratching out, separating or joining words using vertical lines, and formatting by underlining, italicizing, or bolding. As a result, digital ink recognition offers greater flexibility than offline handwriting recognition, making it a more suitable choice for text input in XR. Moreover, digital ink also forms the basis for stroke-level analysis in user studies.
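To illustrate the kind of dynamic information that online recognition can draw on, the following minimal sketch derives direction, velocity, and acceleration from timestamped digital ink samples. The sample format `(t_seconds, x_mm, y_mm)` and the function name `ink_dynamics` are assumptions made for this example and are not taken from any particular recognition engine.

```python
import math


def ink_dynamics(samples):
    """Derive per-sample dynamics from one stroke of digital ink.

    `samples` is a hypothetical list of (t_seconds, x_mm, y_mm) tuples,
    ordered by time. Returns one feature dict per consecutive sample pair.
    """
    features = []
    prev_speed = None
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        dx, dy = x1 - x0, y1 - y0
        speed = math.hypot(dx, dy) / dt      # velocity magnitude in mm/s
        direction = math.atan2(dy, dx)       # writing direction in radians
        # Acceleration is undefined for the first segment of a stroke.
        accel = None if prev_speed is None else (speed - prev_speed) / dt
        features.append({
            "t": t1,
            "position": (x1, y1),
            "direction": direction,
            "speed": speed,
            "acceleration": accel,
        })
        prev_speed = speed
    return features


if __name__ == "__main__":
    stroke = [(0.0, 0.0, 0.0), (0.5, 5.0, 0.0), (1.0, 15.0, 0.0)]
    for f in ink_dynamics(stroke):
        print(f)
```

An offline recognizer, by contrast, would only see the final rendered shape and would have no access to these time-dependent features.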

Handwriting for Text Input in XR
In 1998, Poupyrev et al. [58] introduced the Virtual Notepad, a pioneering tool for handwritten notes in VR. Users navigated the notebook by writing individual letter commands using handwriting recognition. Although the handwriting recognition capabilities were limited, the authors acknowledged its future potential for handwritten text input.
González et al. [23] conducted a study using letter-based online handwriting recognition for text input in VR. Participants transcribed sentences using a pen and tablet placed on a physical surface in a horizontal orientation. Corrections were omitted to preserve consistent writing speeds. With a text input speed of about 2.3 words per minute (WPM) and an error rate (ER) of 23%, the study marks a successful implementation of handwriting recognition in VR. However, this work also showed the impact of recognition model constraints by requiring participants to write each letter individually and adapt their handwriting style to predefined shapes to minimize recognition errors.
Elmgren [18] investigated letter-based recognition for handwritten text input in VR. Participants transcribed a pangram using distance-based raycasting with controllers on a vertical surface in mid-air without error correction. Although the input speed of 4.16 WPM (36.86% ER) nearly doubled compared to prior work [23], it remained relatively slow. The authors explained this with letter-based handwriting and an artificial pause before recognition. Further, they noted that handwriting in mid-air via raycasting could be challenging due to hand tremors and lack of physical support, which may lead to increased error rates.
Venkatakrishnan et al. [65] evaluated the effects of input methods, canvas geometries, and inking triggers on word-based VR handwriting on vertical surfaces. They used online handwriting recognition and enabled corrections by deleting the last stroke. Handwriting with controllers in direct contact outperformed raycasting and finger use regarding input speed (10.33 WPM), accuracy (99.50%), and overall NASA TLX workload (32.33). Writing using fingers was significantly faster on physical surfaces (8.10 WPM) than on mid-air surfaces (7.23 WPM) with a trigger button. These results emphasize the impact of input methods and passive haptic feedback on handwriting performance and underline the importance of a thoughtful selection.
Fourrier et al. [21] examined handwriting text input in VR using online handwriting recognition. The study focused on vertical and slanted surfaces in mid-air and on visual, auditory, and active haptic feedback. Participants wrote individual words from predefined sentences and triggered handwriting by direct contact with the mid-air surface. For all conditions, usability was considered good to excellent (80.05) using the SUS. Slanted surfaces yielded the highest text input speeds (14.15 WPM), the lowest physical demands (42.31), and the best overall NASA TLX workload scores (40.11). Multimodal feedback reduced error rates to 1.05%, although some participants felt overwhelmed by the sounds and vibrations. This suggests a potential benefit in reducing feedback cues. The authors proposed to improve text editing by reducing the erasing granularity from rewriting entire words to more fine-grained correction (e.g., strokes). Similar to previous work, this study underlines the great potential of handwriting text input in VR, with high usability, low task load, and reasonable error rates. Nevertheless, further flexibility in error correction and more adaptive and accurate handwriting recognition are desirable.
For OST AR, Fang et al. [19] developed Handwriting Velcro, a wearable system with touch sensors for fingers using letter-based handwriting recognition. Participants achieved 12.32 WPM (97.21% accuracy), highlighting the applicability of AR handwriting on physical surfaces.
However, handwriting in VR and OST AR was still notably slower than with pen and paper at 30.36 WPM [21] and 21.9 WPM [11] for transcription tasks, 31.1 WPM [11] with memorized sentences, and 21.5 WPM [39] with styluses on laptop touchscreens.
The evident room for improvement and the merging boundaries between VR and VST AR require a profound understanding of impact factors. Handwriting has been explored in VR, but remained underexplored in VST AR, particularly regarding visual incongruencies [33,41,44]. The impact of surface alignments also needs more investigation, with previous research preferring physical surfaces over mid-air surfaces for tasks requiring precision, fine-grained movements [5,56], and low physical demand [71,72]. Furthermore, prior work limited handwriting text input to transcribing simple text phrases by writing individual letters [18,19,58], words [21,65], or sentences [23]. While writing individual entities or simple sentences represents fundamental parts of text input, it does not necessarily reflect the complexity of text input in real-world scenarios.

Visual Incongruencies in AR
Handwriting text input has been explored in VR [18,21,23,65] and OST AR [19], but remained underexplored in VST AR. In VST AR, a significant challenge is the visual augmentation of the real-world environment, which can lead to visual incongruencies [33,44], impacting performance and user experience [4,33]. Common issues are depth distortion and object misplacement [41,59,68], forming the basis for many other visual incongruencies experienced by users. Adams et al. [1] found that depth perception is a fundamental aspect of AR and varies across displays. They explain differences by the diverse cues and characteristics specific to each AR device, which can affect distance judgment. Additionally, factors like misalignment in VST AR, magnification or minification effects through cameras, and even the weight of Head-Mounted Displays (HMDs) could significantly influence users' spatial understanding. Pham and Stuerzlinger [56] revealed performance discrepancies between VR and OST AR, with VR users significantly outperforming their AR counterparts in pointing tasks, likely due to fewer visual inconsistencies. Similarly, Kern et al. [33] observed faster text input in VR compared to VST AR when using virtual tap and swipe keyboards. The authors attributed this difference to visual mismatches inherent in VST AR that can affect depth perception and, consequently, performance and user experience. While these studies showed that visual incongruencies in AR can influence performance and user experience, their impact on handwriting for text input in XR needs to be investigated.

Physical and Mid-Air Surfaces
Physical surfaces are omnipresent in traditional knowledge and office work and will be leveraged in future XR workspaces. Physical surfaces support precise and fine-grained movements, which are recommended for tasks like sketching [3,13], handwriting [30,31], and interactions with physical and virtual keyboards [17,38]. In contrast, mid-air surfaces offer more flexibility but lack the passive haptic feedback and stability of physical surfaces, linked to higher performance [15,67] and reduced task load, particularly due to arm fatigue [31,71,72]. For handwritten text input in VR, previous work investigated the impact of surface alignments and orientations. Venkatakrishnan et al. [65] found that VR handwritten text input with fingers was significantly more performant on physical surfaces compared to mid-air surfaces using a trigger button. Employing controllers for handwriting on mid-air surfaces resulted in higher performance and lower overall workload than finger use. This suggests an influence of both input methods and physical and mid-air surfaces. Fourrier et al. [21] showed that surface orientation also significantly influenced mid-air handwriting. Slanted surfaces achieved better results than vertical surfaces regarding performance and physical demand using controllers. Horizontal surfaces were not examined due to usability issues when standing. These findings indicate that handwriting on physical surfaces outperforms mid-air surfaces and that controllers are superior to fingers. Additionally, surface orientations towards more horizontal inclinations seem preferable. Notably, these studies did not examine handwriting with controllers on physical surfaces and horizontal surface orientations as well as finger-based raycasting, leaving substantial research gaps.

Sentence Complexities
Evaluating handwriting for text input in XR often involved transcription tasks [40], in which participants had to write simple text phrases by individual letters [18,19,58], words [21,65], or sentences [23]. Although writing letters or words is a fundamental part of text input, it does not necessarily reflect complex real-world scenarios. Phrase sets address this limitation by providing different levels of sentence complexity. The MacKenzie and Soukoreff phrase set [47] provides simple sentences, omitting capitalization and punctuation. In contrast, the EnronMobile phrase set created by Vertanen and Kristensson [66] contains simple and complex phrases with uppercase and lowercase letters, digits, punctuation marks, and other symbols. Curran et al. [16] investigated sentence complexities for text input on mobile phones. They used simple phrases with lowercase alphabetical characters. Complex phrases also included uppercase, digits, parentheses, punctuation marks, and other symbols.
The results showed that simple phrases led to higher typing speed and lower error rates than complex phrases. These findings are in line with results from Pham and Stuerzlinger [55] evaluating physical keyboards, where participants typed lowercase simple sentences significantly faster than complex sentences containing mixed letter case, digits, parentheses, punctuation marks, and other symbols. While keyboards require the use of modifier keys for uppercase or digits, handwriting faces interruptions due to single strokes for complex units (e.g., numbers, parentheses, or colons). Therefore, simple sentences might lead to more fluent and connected cursive handwriting than complex sentences, which can result in higher performance and adapted handwriting style regarding the number, length, and height of strokes. However, the impact of sentence complexity on XR handwriting text input was not evaluated.

XR Controllers as Styluses
Previous work successfully used controllers in a power grip for handwriting text input in VR [18,21,65], the traditional gripping technique of input devices [5,31]. However, handwriting tools are usually held in a pen-like posture, the so-called precision grip, enabling fine-grained and intricate movements using wrist and fingers [31,61]. While the power grip offers more stability, it restricts hand and wrist mobility, requires more arm strength, and involves larger movements than fine adjustments with fingers [5,31]. Consequently, prior work recommends using XR styluses and controllers in precision grip for tasks requiring fine-grained movements and performance. Examples are pointing [5,56], drawing [13], handwriting [31], and keyboard typing [33]. Additionally, input devices can support pressure-sensitive stylus tips, like the Logitech VR Ink or Meta Quest Touch Pro Controllers, which are highly beneficial for input detection on physical surfaces [8,9]. Although the potential of XR controllers as styluses is clearly evident, their use for handwriting text input in XR remains underexplored.

METHODS
Our study evaluated the impact of XR displays, surface alignments, and sentence complexities on handwriting text input in a structured study using qualitative questionnaires and quantitative performance measures. An ethics application was submitted to the ethics committee of the Institute for Human-Machine-Media (MCM) at the University of Würzburg, which found the study to be ethically unobjectionable.

Hypotheses
For our first research question, addressing the impact of XR displays, we expect higher text input speed, usability, and user experience in VR than in VST AR.

H1.3: VR leads to higher user experience (UEQ) than VST AR.
With our second research question focusing on surface alignments, we expect higher text input speed, usability, and user experience ratings for physically aligned surfaces (PA) and increased physical demand for mid-air surfaces (MA). Furthermore, we expect an enlarged and more connected cursive handwriting style in mid-air.
H2.1: PA lead to higher text input speed (WPM) than MA.
H2.5: PA lead to higher stroke counts than MA.
H2.6: MA lead to higher stroke lengths than PA.
H2.7: MA lead to higher stroke heights than PA.
For our third research question, targeting the impact of sentence complexities, we expect higher text input speed and fewer errors for simple sentences (SS) than for complex sentences (CS). We also expect an enlarged and more connected cursive handwriting style when writing simple sentences than complex sentences.
H3.1: SS lead to higher text input speed (WPM) than CS.
H3.2: CS lead to higher error rates (MSD ER) than SS.
H3.3: CS lead to higher stroke counts than SS.
H3.4: SS lead to higher stroke lengths than CS.
H3.5: SS lead to higher stroke heights than CS.

Participants
We recruited 72 participants via the university's participant management system. They received either 15 euros or student credit points. Inclusion criteria were fluency in German and an age of 18 years or older. We utilized the Simulator Sickness Questionnaire (SSQ) [29] to assess well-being. No participants were excluded due to a high difference before and after the experiment or consistently high scores. Participants' ages ranged from 18 to 48 years, with an average of 23.79 years and a standard deviation of 4.90 years. Regarding social gender, 40 identified as female, 32 as male, and none as non-binary. 69 were right-handed and three left-handed. The vast majority, 61 participants, were students. Four were employed, two were job-seekers, two were pupils, and one had other employment. 69 participants reported not being color-blind, while three had red-green blindness. None reported hearing impairments. 41 had no visual impairment, 21 used contact lenses, seven wore glasses, and three had uncorrected visual conditions. All participants reported fluency in the German language, the ability to think arithmetically, and no impairments in wrist movements. Two reported a limitation in literacy skills and one participant in linguistic abilities. Based on experiment observations, these participants were not excluded. All participants had used handwriting with pen and paper before, with 71 having more than 20 hours of experience. For handwriting on tablets, 30 participants had more than 20 hours of experience, while four had no experience. 17 participants had used tablets with handwriting recognition for more than 5 hours. 17 participants had never used handwriting recognition, and four had never tried handwriting on a tablet. Of 67 participants who had used VR and 44 who had used AR, only 26 had spent more than 5 hours in VR and only four in AR. Conversely, five participants had never tried VR before, and 28 had no experience with AR. Interestingly, nine participants had used handwriting in VR and five in AR, although 65 in VR and 68 in AR had no experience with handwriting recognition.

Apparatus
The study was conducted in a 20 m² room to ensure adequate distance between the experimenter and participants. Participants were seated to enhance safety (see Figure 1). In the physically aligned surface condition, participants were positioned directly at the physical table, while in the mid-air condition, they were located one meter away from the table. On the physical table, we used the black Logitech VR Ink Drawing Mat in DIN A1 (594 mm x 841 mm) as writing surface, providing low and uniform friction and increasing future applicability.

Hardware Environment
We used the Meta Quest Pro with two Meta Quest Touch Pro controllers. A unique feature of the controllers is the native pressure-sensitive stylus tip. This feature makes the Meta Quest Pro an outstanding device for applications that require precise tracking and reliable contact detection on physical surfaces. To ensure sufficient battery life and avoid interruptions, we charged the Meta Quest Pro and its controllers with the docking station between experiment sessions. Our technical setup included an XMG Pro 17 laptop with an Intel Core i7-10875H CPU, 16 GB DDR4 RAM, an NVIDIA GeForce RTX 2070 Super GPU, and a 1 TB Samsung 970 EVO Plus NVMe M.2 SSD. Questionnaires were answered on another desktop computer using mouse and keyboard. We used a self-hosted instance of LimeSurvey 4 to collect responses.

Software Environment
We developed two applications: an XR handwriting application optimized for the Meta Quest Pro running on Android and an experimenter application using Microsoft Windows 10, enabling monitoring and management. Experimental log files were recorded on the XR device. We chose Unity 2020.3 LTS for both applications. We imported the Oculus integration asset and based our implementation on the Reality Stack I/O framework [32]. For providing web browser functionality, we used the Vuplex 3D WebView asset for Android and iOS. We opted for MyScript, an online handwriting recognition solution based on digital ink, and developed a web interface. We used the Exit Games Photon PUN2 asset and a self-hosted server instance to create a network connection between the Meta Quest Pro and the experimenter application. A stable Wi-Fi 6 wireless network connection was provided by the mobile hotspot capability of the laptop.
A precise alignment of the virtual surface on the physical table is essential for handwriting with passive haptic feedback. Therefore, we decided to use the publicly available Circle Refinement Technique (CRT) introduced by Kern et al. [35], which enhances the original 3ViSuAl technique [31] by recording numerous measure points in a circular movement to improve precision. We also installed four colored visual markers to support tracking stability.
The handwriting controller was held in a pen-like posture, the precision grip, to mimic a stylus, allowing more precise movements than the regular power grip [5,31,56] used in prior research [18,21,65]. The other controller was held in power grip and had no interaction functionality but provided a visual reference point. Both virtual controller representations were visible in VR and VST AR.
Previous studies employed forward-direction raycasting to identify surface contact points for controllers and fingers [18,65]. While this approach is typically used for distance-based interaction, it also introduces significant cursor movements with even a small controller or wrist displacement at the original position. Instead, we determined contact points by an orthogonal projection of the stylus tip on the virtual surface, preventing cursor displacements. We detected input intention for handwriting on mid-air surfaces by the orthogonal distance between the stylus tip and the virtual surface (i.e., less than 1 cm in front and 5 cm behind). For handwriting on the physical surface, we utilized the pressure-sensitive stylus tips of the Meta Quest Touch Pro controllers.
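The contact point and mid-air input intention described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's Unity implementation: the surface is modeled as a plane given by a point `origin` and a `normal` pointing from the surface towards the user, all coordinates are in meters, and the function name `project_tip` is hypothetical. Only the thresholds (1 cm in front, 5 cm behind) come from the text.

```python
import math


def project_tip(tip, origin, normal, front_m=0.01, behind_m=0.05):
    """Orthogonally project a stylus tip onto a planar writing surface.

    tip, origin, normal: 3D tuples in meters (assumed representation).
    `normal` points from the surface towards the user ("in front").
    Returns (contact_point, signed_distance, inking).
    """
    norm = math.sqrt(sum(n * n for n in normal))
    if norm == 0:
        raise ValueError("surface normal must be non-zero")
    unit = tuple(n / norm for n in normal)

    # Signed orthogonal distance: positive in front of, negative behind the surface.
    offset = tuple(t - o for t, o in zip(tip, origin))
    signed = sum(a * b for a, b in zip(offset, unit))

    # Remove the normal component to obtain the contact point on the plane.
    contact = tuple(t - signed * u for t, u in zip(tip, unit))

    # Input intention: tip is less than 1 cm in front and less than 5 cm behind.
    inking = -behind_m < signed < front_m
    return contact, signed, inking


if __name__ == "__main__":
    surface_origin = (0.0, 0.0, 0.0)
    surface_normal = (0.0, 0.0, 1.0)
    print(project_tip((0.10, 0.20, 0.005), surface_origin, surface_normal))
    print(project_tip((0.10, 0.20, 0.020), surface_origin, surface_normal))
```

Because the projection discards only the component along the surface normal, tilting or rotating the controller around its tip does not move the cursor, which is the property the raycasting approach lacks.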
We provided visual feedback through digital ink on the virtual surface and added continuous controller vibrations when writing on mid-air surfaces. Passive haptic feedback was present when using physically aligned surfaces. Surface snapping of the controller was not implemented to prevent feeling disconnected when lifting the physical controller without an immediate response from the visual counterpart [8].
The virtual surface (DIN A2, 420 mm x 594 mm) consists of a handwriting area with horizontal lines at a distance of 3.5 cm and a control area. The control area shows the target sentence for the transcription and the currently recognized text below it. Two round buttons are used to switch between digital inking and erasing by scratching out the produced digital ink. The blue rectangular submit button confirms the transcribed text.
We created a realistic and correctly sized digital twin of the physical room to show a similar environment in VR and VST AR (see Figure 1 and Figure 2). We used Blender 3.5 and captured real-world textures with an Apple iPhone 13 Pro. In VR, participants saw the digital twin, which was implicitly aligned by the virtual surface. In VST AR, they viewed the real environment through the stereoscopic and perspective-corrected video stream of the Meta Quest Pro's front-facing cameras.

Table 1: Our phrase set includes twenty sentences. For the ten simple sentences, we took the German translation of Kern et al. 2023 [33] based on the MacKenzie and Soukoreff corpus [47]. For complex sentences, we self-developed ten phrases based on the EnronMobile corpus [66]. Simple sentences include lower and uppercase letters, commas, and terminal punctuation at the end of a sentence. Complex sentences also contain numbers, internal punctuation within a sentence, and other symbols. German and English translations can vary due to language differences.

Simple Sentences
Complex Sentences

Design
We applied a $2\times 2\times 2$ study design with two levels for the between-subject factor XR displays (VR and VST AR), two levels for the within-subject factor surface alignments (physically aligned and mid-air), and two levels for the within-subject factor sentence complexities (simple and complex sentences). We followed a strict procedure to minimize the experimenter's influence, with one person conducting the entire user study. We applied the applicable COVID-19 guidelines and believe these regulations did not affect the results of our study. After each participant, the room was ventilated, and all physical surfaces, including tables and electronic equipment, were disinfected. We informed participants that they could stop the experiment at any time if they felt uncomfortable. The experimenter also asked participants about their well-being before and after each exposure. We applied systematic counterbalancing, a type of quasi-randomization, when assigning participants to the experimental conditions. This improves the internal validity of our analysis and reduces order and sequence effects. Additionally, we used social gender as a blocking variable to obtain similar sample sizes. We developed a phrase set for our user study, including simple and complex sentences (see Table 1). Simple sentences were taken from the MacKenzie and Soukoreff phrase set [47] and translated by Kern et al. [33] with correction of lower and uppercase, commas, and terminal punctuation at the end of the sentence (i.e., period, question mark, and exclamation mark). Complex sentences were self-developed based on the EnronMobile corpus [66] and additionally contain numbers, internal punctuation within the sentence (i.e., hyphens, colons, and parentheses), and other symbols. In total, we collected $72\times 22\times 2 = 3168$ phrases.
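The distinction between simple and complex sentences described above can be expressed as a small character-class check, which may help when expanding or adapting the phrase set. The sketch below is one possible reading of those criteria, not the procedure used to build the phrase set: the function name `classify_sentence` is hypothetical, and the third example sentence is invented for illustration. The first two examples are the training sentences from the study procedure.

```python
TERMINAL_PUNCTUATION = ".?!"


def classify_sentence(sentence):
    """Classify a phrase as 'simple' or 'complex' by its character classes.

    Simple: letters (lower and uppercase), spaces, commas, and one terminal
    punctuation mark at the end of the sentence.
    Complex: anything else, e.g. digits, hyphens, colons, parentheses, or
    other symbols inside the sentence.
    """
    body = sentence.strip()
    # Allow a single terminal punctuation mark at the very end.
    if body and body[-1] in TERMINAL_PUNCTUATION:
        body = body[:-1]
    for ch in body:
        # str.isalpha() also accepts German umlauts and the sharp s.
        if ch.isalpha() or ch in " ,":
            continue
        return "complex"
    return "simple"


if __name__ == "__main__":
    examples = [
        "Ein Fuchs ist ein sehr schlaues Tier.",
        "Spielen wir eine Runde Karten?",
        "Treffen um 10:30 (Raum 2)?",  # invented example, not from the phrase set
    ]
    for text in examples:
        print(classify_sentence(text), "-", text)
```

A check like this only covers the surface criteria named in the text; it does not judge vocabulary or grammatical difficulty.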

Study Procedure
Figure 3 shows the study procedure. The duration of each study session was approximately 60 minutes. The experimenter assigned participants to either the VR or VST AR condition. In the beginning, participants had to read and give consent to the study information. They answered questions about demographics and prior experience, limitation of abilities, and completed the SSQ. Participants watched a video showing handwriting input and correction, and were instructed on using the Meta Quest Pro and the Meta Quest Touch Pro controllers. We used Meta's fitting procedure with lens spacing adjustment. The controller chosen by the participants for handwriting, typically the dominant hand, was held upside down in a pen-like posture, the precision grip. The other controller was held in the regular power grip. Participants were advised to enter text phrases as fast and error-free as possible. By touching the round buttons at the top center of the virtual surface, they could switch between digital inking and erasing. Before each handwriting session, the experimenter aligned the virtual surface using CRT [35] and confirmed the alignment using the experimenter application on the laptop. In the training phase, participants were asked to familiarize themselves with the system by writing two training sentences, "Ein Fuchs ist ein sehr schlaues Tier." ("A fox is a very smart animal.") and "Spielen wir eine Runde Karten?" ("Shall we play a round of cards?").
In the main phase, participants wrote ten simple sentences followed by ten complex sentences on each surface alignment (physically aligned and mid-air). Within each complexity, the order of sentences was randomized. The current sentence remained visible during handwriting. One handwriting session took approximately ten minutes. After each alignment, participants were asked to put down the XR device and to complete questionnaires (SSQ, RTLX, UEQ, and SUS). They also had the option to add comments or ask questions. Text input performance measures and stroke-level analysis were computed during the exposure or afterward using the experimental log files. The procedure was then repeated for the second surface alignment condition.

Measures
Text input performance was measured with Words Per Minute (WPM) and Minimum String Distance Error Rate (MSD ER), based on formulas provided by Arif and Stuerzlinger [2]. As recommended, we omitted the first character subtraction in the calculation of WPM because, in our study, the input of the first character is timed. An analysis on the stroke level provides information about the individual's handwriting style and legibility, indicating whether the text was written in print or more connected cursive. For this, we included stroke height in millimeters (mm), representing vertical handwriting size, as well as stroke (path) length in millimeters (mm) and stroke count (also called pen lifts), indicating connected cursive handwriting. We also collected participants' subjective feedback using a series of well-established questionnaires. We used the Simulator Sickness Questionnaire (SSQ) [29] as a control measure for discomfort or unwanted symptoms that participants might experience during VR and VST AR exposures. We chose the User Experience Questionnaire (UEQ) [45,60] to evaluate the overall user experience in terms of attractiveness, perspicuity, efficiency, dependability, novelty, and stimulation. To assess perceived workload, we included the NASA Raw Task Load Index (RTLX) [26,27], assessing several dimensions of task load for mental, physical, and temporal demand, as well as performance, effort, and frustration. Finally, we measured usability with the System Usability Scale (SUS) [10] and an extension separating the SUS into a usable and learnable subscale [7,46].
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
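The two text input performance measures described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: WPM uses the conventional five characters per word and, as described in the text, omits the first-character subtraction; MSD ER divides the Levenshtein distance by the longer of the presented and transcribed strings. The function names are hypothetical and are not taken from the study software.

```python
def words_per_minute(transcribed: str, seconds: float) -> float:
    """WPM with five characters per word and no first-character subtraction."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (len(transcribed) / seconds) * 60.0 / 5.0


def minimum_string_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def msd_error_rate(presented: str, transcribed: str) -> float:
    """MSD error rate in percent, normalized by the longer string."""
    longest = max(len(presented), len(transcribed))
    if longest == 0:
        return 0.0
    return minimum_string_distance(presented, transcribed) / longest * 100.0
```

For example, a 50-character transcription entered in 30 seconds corresponds to 20 WPM, and one wrong character in a four-character phrase yields an MSD ER of 25%.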

RESULTS
Each model was evaluated against an alpha level of .05 (*). For a more detailed view, we also consider p < .01 (**) and p < .001 (***). We found violations of assumptions essential to parametric statistical analysis, namely related to outliers, normality, and homoscedasticity. As a result, we adopted a robust statistical approach using the R package WRS2 [48]. We used the bwtrim function of the package for robust two-way between-within (mixed) ANOVA and the rmanova function for robust one-way repeated measures ANOVA. Robust Cohen's d effect sizes were computed and can be categorized as 0.2 (small effect), 0.5 (medium effect), and 0.8 (large effect). Following Mair and Wilcox [48], we chose a trimmed mean of 20% because it has nearly equivalent power to the mean of a normally distributed sample and can lead to a substantially smaller standard error when outliers exist. Information about robust methods as an alternative to classic inferential analysis can be found in Field and Wilcox [20] and Wilcox [70].
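To make the building blocks of this robust approach concrete, the following Python sketch computes a 20% trimmed mean, the corresponding Winsorized variance, and one common robust variant of Cohen's d for two independent groups (the Algina-Keselman-Penfield estimator). It is an illustration only: the analysis itself was run with WRS2 in R, the text does not state which robust effect size estimator was used, and within-subject comparisons would call for a dependent-groups variant. All function names are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.2):
    """Mean after removing the lowest and highest `trim` share of observations."""
    data = sorted(values)
    g = math.floor(trim * len(data))  # observations trimmed from each tail
    kept = data[g:len(data) - g]
    return sum(kept) / len(kept)


def winsorized_variance(values, trim=0.2):
    """Sample variance after pulling each tail in to the nearest kept value."""
    data = sorted(values)
    n = len(data)
    g = math.floor(trim * n)
    low, high = data[g], data[n - g - 1]
    winsorized = [min(max(x, low), high) for x in data]
    mean = sum(winsorized) / n
    return sum((x - mean) ** 2 for x in winsorized) / (n - 1)


def robust_cohens_d(group_a, group_b):
    """Algina-Keselman-Penfield effect size for two independent groups.

    The constant 0.642 rescales the Winsorized standard deviation so that the
    result matches Cohen's d under normality; it is valid for 20% trimming only.
    """
    n_a, n_b = len(group_a), len(group_b)
    pooled = (
        (n_a - 1) * winsorized_variance(group_a)
        + (n_b - 1) * winsorized_variance(group_b)
    ) / (n_a + n_b - 2)
    difference = trimmed_mean(group_a) - trimmed_mean(group_b)
    return 0.642 * difference / math.sqrt(pooled)
```

With the sample `[1, 2, 3, 4, 100]`, the ordinary mean is 22, whereas the 20% trimmed mean is 3, which shows how a single outlier is prevented from dominating the estimate.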

System Usability Scale (SUS)
The robust mixed ANOVA models for the SUS subscales overall and usable revealed no significant interaction effects and no significant main effects for XR displays and surface alignments. The subscale learnable revealed no significant interaction effect but a significant main effect, with higher scores on physically aligned surfaces than in mid-air, F(1, 41.64) = 6.06, p = .018, robust Cohen's d = -0.31. For XR displays, contrary to H1.2, the results suggest no differences in usability between VR and VST AR. For surface alignments, partially confirming H2.2, the results indicate comparable usability but easier learnability on physically aligned than mid-air surfaces.
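For readers unfamiliar with the SUS subscales reported here, the following Python sketch shows the standard SUS scoring together with the widely used two-factor split, in which items 4 and 10 form the learnable subscale and the remaining eight items form the usable subscale, each rescaled to 0-100. That the study scored the subscales with exactly this item split is an assumption; the function name is hypothetical.

```python
def sus_scores(responses):
    """Score ten SUS responses (each 1-5, in questionnaire order).

    Returns overall, usable, and learnable scores, each on a 0-100 scale.
    Assumes the common split with items 4 and 10 as the learnable subscale.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("expected ten responses between 1 and 5")
    # Odd-numbered items are positively worded, even-numbered items negatively.
    contributions = [
        (r - 1) if index % 2 == 0 else (5 - r)
        for index, r in enumerate(responses)
    ]
    learnable_raw = contributions[3] + contributions[9]  # items 4 and 10
    usable_raw = sum(contributions) - learnable_raw      # remaining eight items
    return {
        "overall": sum(contributions) * 2.5,
        "usable": usable_raw * 3.125,
        "learnable": learnable_raw * 12.5,
    }
```

The multipliers follow from the raw ranges: ten items give 0-40 overall, eight items give 0-32 for usable, and two items give 0-8 for learnable.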

User Experience Questionnaire (UEQ)
The robust mixed ANOVA models for all UEQ subscales revealed no significant interaction effects. We found no significant main effects for attractiveness, perspicuity, efficiency, and dependability, for XR displays and surface alignments. The subscales novelty and stimulation revealed significant main effects for surface alignments. Novelty ratings were significantly higher in mid-air than on physically aligned surfaces, F(1, 39.64) = 19.64, p < .001, robust Cohen's d = 0.55. Stimulation ratings were also significantly higher in mid-air than on physically aligned surfaces, F(1, 41.97) = 6.44, p = .015, robust Cohen's d = 0.25. For XR displays, contrary to H1.3, the results suggest similar user experiences in VR and VST AR. For surface alignments, contrary to H2.3, the results show similar user experience, except for higher novelty and stimulation for handwriting on mid-air than on physically aligned surfaces, indicating that participants had less prior experience with mid-air handwriting.

NASA Raw Task Load Index (RTLX)
The robust mixed ANOVA models revealed no significant interaction effects and no significant main effects for the RTLX subscales mental demand, temporal demand, performance, effort, frustration, and overall, for XR displays and surface alignments. The subscale physical demand revealed no significant interaction effect but a significant main effect. Physical demands were significantly higher in mid-air than on physically aligned surfaces, F(1, 38.87) = 9.4, p = .004, robust Cohen's d = 0.36. For surface alignments, confirming H2.4, the results imply that handwriting is less fatiguing on physically aligned surfaces.

Simulator Sickness Questionnaire (SSQ)
For well-being, the robust mixed ANOVA model for the SSQ revealed no significant interaction effects and no significant main effects for XR displays and surface alignments, indicating that participants' well-being was not impaired by either factor.

Words Per Minute (WPM)
For all sentences (simple and complex), the robust mixed ANOVA models for WPM revealed no significant interaction effect and no significant main effects for XR displays and surface alignments. Analyzing simple and complex sentences separately, we also found no significant interaction effect and no significant main effects. Comparing sentences by complexity (regardless of XR displays and surface alignments), the robust repeated measures ANOVA model revealed a significant main effect with higher WPM for simple than complex sentences. For sentence complexities, confirming H3.1, the results highlight that handwriting text input is faster with simple sentences than with complex sentences.
KERN ET AL.: HANDWRITING FOR TEXT INPUT AND THE IMPACT OF XR DISPLAYS, SURFACE ALIGNMENTS...

Minimum String Distance Error Rate (MSD ER)
For all sentences, the robust mixed ANOVA models for MSD ER revealed no significant interaction effect and no significant main effects for XR displays and surface alignments. Analyzing sentences separately, we also found no significant interaction effect and no significant main effects for XR displays and surface alignments. Comparing sentences by complexity, the robust repeated measures ANOVA model revealed a significant main effect with higher MSD ER for complex than simple sentences, F(1, 43) = 154.62, p < .001, robust Cohen's d = 0.75. For sentence complexities, confirming H3.2, the results suggest that more errors remain in the transcribed text when writing complex sentences than simple sentences. This also indicates higher cognitive and motor demands.

Stroke Count
For all sentences, the robust mixed ANOVA models for stroke counts revealed no significant interaction effect but a significant main effect for surface alignments, with higher stroke counts for physically aligned than mid-air surfaces, F(1, 41.93) = 144.37, p < .001, robust Cohen's d = -1.48. Analyzing sentences separately, we found no significant interaction effect but significant main effects for surface alignments with higher stroke counts on physically aligned than mid-air surfaces, for both simple sentences, F(1, 41.06) = 133.54, p < .001, robust Cohen's d = -1.51, and complex sentences, F(1, 42.0) = 132.76, p < .001, robust Cohen's d = -1.42. Comparing sentences by complexity, the robust repeated measures ANOVA model revealed a significant main effect with higher stroke counts for complex sentences than simple sentences, F(1, 43.0) = 776.49, p < .001, robust Cohen's d = 0.91. For surface alignments, confirming H2.5, the results imply that handwriting is more connected on mid-air surfaces than on physically aligned surfaces. For sentence complexities, confirming H3.3, the results indicate that complex sentences require more strokes than simple sentences.
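The stroke-level measures analyzed in this and the following two subsections (stroke count, stroke length, and stroke height) can be derived from logged pen positions. The Python sketch below assumes that each stroke is a list of (x, y) points in millimeters on the writing surface and that per-stroke values are averaged per sentence; both the data layout and the aggregation are assumptions, as the log format is not described in the text.

```python
import math


def stroke_length(stroke):
    """Path length of one stroke: sum of distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(stroke, stroke[1:]))


def stroke_height(stroke):
    """Vertical extent of one stroke on the writing surface."""
    ys = [y for _, y in stroke]
    return max(ys) - min(ys)


def sentence_stroke_metrics(strokes):
    """Stroke count plus mean stroke length and height for one sentence.

    `strokes` is a list of strokes, each a list of (x, y) points in mm.
    Averaging per stroke is a hypothetical aggregation choice.
    """
    if not strokes:
        raise ValueError("at least one stroke is required")
    count = len(strokes)
    return {
        "stroke_count": count,
        "mean_stroke_length_mm": sum(map(stroke_length, strokes)) / count,
        "mean_stroke_height_mm": sum(map(stroke_height, strokes)) / count,
    }
```

Under this reading, connected cursive writing shows up as fewer strokes with longer paths for the same sentence, whereas print writing produces many short strokes.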

Stroke Length
For all sentences, the robust mixed ANOVA for stroke length in mm revealed no significant interaction effect but a significant main effect for surface alignments, with higher stroke lengths for mid-air than physically aligned surfaces, F(1, 41.99) = 87.42, p < .001, robust Cohen's d = 1.13. Analyzing sentences separately, we found no significant interaction effect but significant main effects for surface alignments with higher stroke lengths for mid-air than physically aligned surfaces, for both simple and complex sentences. For sentence complexities, confirming H3.4, the results also suggest that simple sentences enable longer strokes than complex sentences.

Stroke Height
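For all sentences, the robust mixed ANOVA for stroke height in mm revealed no significant interaction effect but a significant main effect for surface alignments, with higher stroke heights for mid-air than physically aligned surfaces, F(1, 40.99) = 140.4, p < .001, robust Cohen's d = 1.3. Analyzing sentences separately, we found no significant interaction effect but significant main effects for surface alignments with higher stroke heights for mid-air than physically aligned surfaces, for both simple sentences, F(1, 41.54) = 146.54, p < .001, robust Cohen's d = 1.33, and complex sentences, F(1, 39.76) = 131.56, p < .001, robust Cohen's d = 1.3. Comparing sentences by complexity, the robust repeated measures ANOVA model revealed a significant main effect with higher stroke heights for simple sentences than complex sentences.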

DISCUSSION
In this work, we compared handwriting text input between VR and VST AR (XR Displays), facilitated by physically aligned and mid-air surfaces (Surface Alignments) when writing simple and complex sentences (Sentence Complexities). Our results emphasize that handwriting is a promising text input technique, offering high usability, similar to previous work [21]. Participants reported good to excellent user experience and perceived less physical demand and overall workload than in prior studies [21,65]. Moreover, high perspicuity showed that it was easy to learn and become familiar with handwriting text input in XR. The averaged text input performance of 15.92 WPM and 1.13% MSD ER in VR, and 16.71 WPM with 1.35% MSD ER in VST AR, clearly surpassed earlier results in VR and OST AR [18,19,21,23,65], but also shows great potential for improvement compared to handwriting with pen and paper [11,21] and styluses on laptop screens [39]. We summarize our key findings as follows: (1) XR displays did not affect text input performance and subjective user ratings. (2) Surface alignments yielded similar text input performance but significantly impacted physical demand, learnability, novelty, and stimulation, and led to adaptations in handwriting style. (3) Sentence complexities significantly influenced text input performance and handwriting style.
In the following, we present four design implications to support developers and researchers. Our considerations and suggestions aim to make handwriting text input more reliable and reusable in future XR use cases, which is particularly crucial for knowledge and office work.

Visual Incongruencies in VST AR
Based on prior research [33,41,44,68], we expected that potential visual incongruencies in VST AR would adversely impact text input speed (H1.1), usability (H1.2), and user experience (H1.3). However, this was not the case in our study. We attribute this to using the Meta Quest Pro, a recent XR device that produces subjectively less depth distortion and camera magnification than the Varjo XR3 employed in previous work [33]. This likely contributed to mitigating the negative effects of VST AR on handwriting text input. On the downside, the Meta Quest Pro compromises color fidelity by merging gray and color camera images, potentially affecting color perception. Based on our results, we assume that depth distortion and camera magnification have a greater impact on text input in VST AR than color fidelity. It also suggests that text input techniques perform differently across XR devices due to the varying visual fidelity of VST AR displays [33,41], which can significantly impact text input performance and user experience.
Consequently, we advise evaluating text input techniques for each relevant display modality and, if possible, also for target XR devices, as findings from one setting may not necessarily apply to another.

Desirable but not Indispensable Physical Surfaces
Our findings indicate that it was easy for participants to learn and familiarize themselves with XR handwriting text input on physically aligned and mid-air surfaces, reflecting a successful transfer of handwriting skills to the virtual domain. However, notably higher learnability ratings for physically aligned surfaces (H2.2) suggest that handwriting on a physical table more closely resembles pen and paper or tablet stylus work. This assumption is supported by significantly higher novelty and stimulation ratings for mid-air surfaces (H2.3), indicating less prior experience, which we expect to diminish with more practice. While text input speed (H2.1) and error rate were similar on physically aligned (16.21 WPM, 1.20% MSD ER) and mid-air surfaces (16.42 WPM, 1.28% MSD ER), participants adapted their handwriting style to enlarged (H2.7) and more connected cursive (H2.5, H2.6) handwriting on mid-air surfaces. We attribute this to the lack of passive haptic feedback or resistance, which reduces control over precise movements. Although we simulated passive haptic feedback by controller vibrations for mid-air surfaces, the tactile sensation of depth and distance is not replicated, making it difficult to perceive virtual surface contact accurately. Nevertheless, participants in our study still produced legible handwriting, minimizing the need for user corrections, which in turn can reduce error rates and increase input speeds. Figure 5 visualizes handwriting legibility and adaptations of handwriting style. The lower physical demand for handwriting on physically aligned than mid-air surfaces (H2.4) suggests that both surfaces are suitable for short handwriting sessions but also indicates that physical surfaces are preferable for prolonged handwriting. This is in line with prior work, recommending physical surfaces for tasks requiring precise and fine-grained movements [3,13,31] or reducing arm and shoulder fatigue [15,71,72]. However, mid-air surfaces offer more flexibility, which can be particularly useful for non-stationary XR use cases [24,33,42]. Therefore, physical surfaces for handwriting are desirable for their familiarity and low physical demand but are not indispensable for all XR use cases. We propose a balanced approach, using surface alignment techniques [31,35] to support physically aligned and mid-air surfaces in arbitrary orientations, accommodating user preferences and situations, as well as making handwriting text input reusable across XR devices and different input methods (e.g., controllers and fingers).

Representative Handwriting by Sentence Complexities
Sentence complexities significantly influenced text input performance, with higher input speeds (H3.1) and fewer errors (H3.2) for simple sentences (17.85 WPM, 0.51% MSD ER) than complex sentences (15.07 WPM, 1.74% MSD ER). We also found differences in handwriting style, which we explain by the transcribed phrases. Simple sentences usually consist of consecutive letters [16,47,55]. Complex sentences also include numbers, punctuation marks, and other symbols [16,55,66] that have to be represented individually, each requiring different shapes and separated strokes [28,57]. This suggests that the cognitive and motor demands of writing complex sentences are notably higher, requiring thought and careful movement to reflect the content accurately. In combination with ensuring legibility for handwriting recognition, this likely impacts input speeds and error rates. As a result, complex sentences caused higher stroke counts (H3.3), lower stroke lengths (H3.4), and lower stroke heights (H3.5), leading to enlarged and more connected cursive handwriting of simple sentences, as shown in Figure 5. These findings highlight the participants' adaptability to different task demands and underscore the importance of including varying sentence complexities in future text input studies.

Improving Precision with XR Controllers as Styluses
The gripping technique of controllers can strongly influence performance and user comfort [5,31,56]. With our study, we showed that handwriting text input benefits from XR controllers in precision grip, mimicking styluses in pen-like postures. In contrast, previous research used controllers in power grip for VR handwriting [18,21,65], offering more stability, but limiting finger and wrist mobility, requiring more strength, and involving larger arm movements [5,31,61]. We assume that the precision grip supported precise and intricate movements [5,31,56], reflected in small stroke heights on physically aligned (8.50 mm) and mid-air surfaces (12.14 mm), while maintaining handwriting legibility, visualized in Figure 5. The size, shape, weight, and weight distribution of the XR controller can also influence handwriting text input. Uneven weight distribution makes input devices more uncomfortable, and heavier weight could limit fine-grained movements [8,31,56], leading to increased physical demand and potentially affecting handwriting over time. For example, the HTC Vive Pro controller, weighing 202 grams [31], was found to be unbalanced and impractical for finger movements [56]. In contrast, the Meta Quest Touch Pro Controller is lighter and more compact at 164 grams, contributing to a better weight distribution. Even more lightweight input devices like the 68 gram Logitech VR Ink stylus seem preferable but are often expensive or unavailable [31]. Therefore, we agree with previous work and recommend using XR controllers as styluses for tasks requiring precision and fine-grained movements, such as handwriting. Handwriting text input on physical surfaces also benefits from pressure-sensitive tips (e.g., Meta Quest Touch Pro controller and Logitech VR Ink), enabling fast and accurate input detection. For mid-air surfaces where physical contact is impossible, we suggest calculating orthogonal distances instead of forward raycasting [18,65] to minimize the influence of wrist movements. Distance-based approaches can also be used for controllers without pressure-sensitive tips on physical surfaces [31,34].
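The difference between orthogonal distances and forward raycasting can be illustrated with a small Python sketch. It assumes a planar writing surface given by a point and a normal, a tracked pen-tip position, and a pen forward direction; the function names, the vector representation, and the 2 mm contact threshold are hypothetical and not taken from the study software.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _unit(v):
    length = math.sqrt(_dot(v, v))
    if length == 0:
        raise ValueError("zero-length vector")
    return tuple(x / length for x in v)


def orthogonal_distance(tip, plane_point, plane_normal):
    """Signed distance from the pen tip to the surface along the surface normal."""
    return _dot(_sub(tip, plane_point), _unit(plane_normal))


def raycast_distance(tip, forward, plane_point, plane_normal):
    """Distance from the tip to the surface along the pen's forward direction.

    Returns None if the pen is parallel to the surface or points away from it.
    """
    direction = _unit(forward)
    normal = _unit(plane_normal)
    denominator = _dot(direction, normal)
    if abs(denominator) < 1e-9:
        return None
    t = _dot(_sub(plane_point, tip), normal) / denominator
    return t if t >= 0 else None


def is_touching(tip, plane_point, plane_normal, threshold=0.002):
    """Contact test based on orthogonal distance (threshold in meters, assumed)."""
    return orthogonal_distance(tip, plane_point, plane_normal) <= threshold
```

With the tip 2 cm above the surface, the orthogonal distance stays at 2 cm regardless of pen orientation, whereas the raycast distance grows to about 2.8 cm when the pen is tilted by 45 degrees. This is why the orthogonal measure is less sensitive to wrist movements.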

LIMITATIONS
Our research offers new insights into XR handwriting, yet there are limitations. We expect visual incongruencies like color and depth distortions can affect handwriting performance, but our research did not specifically investigate XR display characteristics. Further, our findings indicate that handwriting text input on mid-air surfaces is more physically demanding and less learnable than on physically aligned surfaces, despite similar text input speed, error rates, and other subjective ratings. This suggests the need for longer handwriting sessions with more sentences to understand the impact of physical demand on performance and user experience. While our study revealed higher text input speeds than in prior work, there is a noticeable gap to handwriting with pen and paper or tablet styluses, indicating evident potential for optimization. Another limitation is that we focused on horizontal surfaces. However, vertical and non-stationary surfaces such as whiteboards or tablets could yield different results. Given the potential impact of a controller's shape and weight on task performance, additional studies are essential to clarify how different input devices affect handwriting in XR. Lastly, we used virtual controller models in VR and VST AR, but the influence of externalized embodiment on handwriting is unclear.

CONCLUSION AND FUTURE WORK
Handwriting text input, with its high usability, natural intuitiveness, flexibility, and expressiveness, is a promising text input technique for many XR use cases, including office and knowledge work. We found that handwriting performance and handwriting style are consistent across XR displays, though visual incongruencies may impact performance and user experience. Surface alignments did not affect performance but significantly influenced handwriting style, which we attribute to the lack of passive haptic feedback on mid-air surfaces. The higher novelty and stimulation ratings for mid-air surfaces likely decrease with more experience. Therefore, physical surfaces are desirable for lower physical demand and higher learnability but are not indispensable for XR handwriting. Sentence complexities significantly influenced text input performance and handwriting style, indicating higher cognitive and motor demands for complex sentences. Consequently, it is reasonable to include different sentence complexities in text input studies, reflecting real-world scenarios. XR controllers are promising styluses, especially when enhanced with pressure-sensitive tips for input detection on physical surfaces. We additionally provide a phrase set of simple and complex sentences with lower and uppercase letters, numbers, punctuation marks, and other symbols, which serves as a basis for text input studies and can be expanded and adapted.
Future work should explore how visual incongruencies in XR display characteristics, such as color and depth fidelity, affect handwriting. Furthermore, investigating the effects of prolonged handwriting on both physically aligned and mid-air surfaces is important. Vertical and non-stationary surfaces are currently underexplored and also require more studies. Although this study successfully used XR controllers in precision grip, the potential of dedicated XR styluses is evident.

Fig. 2: In VR, we showed a precisely aligned digital twin, which we created with Blender (left). In VST AR, the real world was captured by the front-facing cameras of the Meta Quest Pro (right).
Complexity | German phrase | English translation
Simple | fängt den Wurm. | The early bird gets the worm.
Complex | Ruf mich an unter: +49 800 1234567 | Call me at: +49 800 1234567
Simple | Was du siehst, ist was du bekommst. | What you see is what you get.
Complex | Ich arbeite in Gebäude 6 (Raum 4.025). | I work in building 6 (Room 4.025).
Simple | Bist du sicher, dass du das willst? | Are you sure you want this?
Complex | Das Frühstück & Abendessen kostet $30. | Breakfast & dinner costs $30.
Simple | Das ist eine sehr gute Idee. | This is a very good idea.
Complex | Ich verspäte mich um ca. 8 Minuten. | I will be about 8 minutes late.
Simple | Morgen soll es sonnig werden. | It should be sunny tomorrow.
Complex | Auf den Artikel gibt es 20% Rabatt. | There is a 20% discount on the article.
Simple | Diese Kamera macht schöne Fotos! | This camera takes nice photographs!
Complex | Die Besprechung geht von 14-15 Uhr. | The meeting is from 2-3 pm.
Simple | Lerne laufen, bevor du rennst! | Learn to walk before you run!
Complex | Mein Flug landet um 17:38 Uhr in Berlin. | My flight lands at 17:38 in Berlin.
Simple | Bewegung ist gut für den Geist! | Exercise is good for the mind!
Complex | Am 25.06.2023 gehe ich auf ein Konzert. | On 06.25.2023 I am going to a concert.
Simple | Wo habe ich meine Brille gelassen? | Where did I leave my glasses?
Complex | Eine vegetarische Pizza kostet 12,50 €. | A vegetarian pizza costs 12.50 €.
Simple | Kaufst du gerne am Sonntag ein? | Do you like to shop on Sunday?
Complex | Die Geschwindigkeit liegt bei 230 km/h. | The speed is at 230 km/h.

Fig. 3: Overview of the study procedure, divided into five phases.

Fig. 5: Handwriting samples from eight participants in our user study for simple and complex sentences. Participants 1-4 participated in VR, and participants 5-8 used VST AR. They wrote each sentence on a physically aligned and mid-air surface. Colored dots indicate measurement points acquired during the writing process. Fewer colors within a sentence represent lower stroke counts and higher stroke lengths, indicating more connected cursive handwriting.

Table 2: Descriptive results of subjective user ratings, text input performance, and stroke-level analysis (n = 72). Reported as Mean (SD). Square brackets after the name indicate the values' range or unit.