The Impact of Avatar and Environment Congruence on Plausibility, Embodiment, Presence, and the Proteus Effect in Virtual Reality

Many studies show the significance of the Proteus effect for serious virtual reality applications. The present study extends the existing knowledge by considering the relationship (congruence) between the self-embodiment (avatar) and the virtual environment. We investigated the impact of avatar and environment types and their congruence on avatar plausibility, sense of embodiment, spatial presence, and the Proteus effect. In a $2\times 2$ between-subjects design, participants embodied an avatar in either sports- or business wear in a semantically congruent or incongruent environment while performing lightweight exercises in virtual reality. The avatar-environment congruence significantly affected the avatar's plausibility but not the sense of embodiment or spatial presence. However, a significant Proteus effect emerged only for participants who reported a high feeling of (virtual) body ownership, indicating that a strong sense of having and owning a virtual body is key to facilitating the Proteus effect. We discuss the results assuming current theories of bottom-up and top-down determinants of the Proteus effect and thus contribute to understanding its underlying mechanisms and determinants.


INTRODUCTION
The legend of Proteus tells the story of an old man of the sea, Proteus, a prophet who will change shape when he is seized [35]. However, changing shape is not exclusive to Greek mythology. It has gained further attention since virtual reality applications make it possible to embody almost arbitrary digital self-representations, so-called avatars [4]. In 2007, Yee and Bailenson referred to the flexibility of transforming self-representation in virtual environments (VEs) and observed participants adapting their behavior to conform to their avatars' characteristics. They named this phenomenon the Proteus effect [72].
• David Mal, Erik Wolf

The Proteus effect is assumed to depend on the user's subjective virtual reality experience. In this context, recent theoretical work focuses on thoughts about experiences and effects in mixed, augmented, and virtual reality (MR, AR, VR: XR for short). Thereby, plausibility and congruence have received increased attention. Slater et al. [54] discussed an update of the plausibility illusion, while Skarbez et al. [52] and Wienrich et al. [64], in particular, incorporated the current possibilities of MR and investigated how the experience can be described in comparison to immersive virtual worlds. Further, Latoschik and Wienrich [29] recently proposed a novel theoretical congruence and plausibility model (CaP-model). They focus on congruence and plausibility potentially influencing various XR-related qualia. Examples are spatial presence (the feeling of really being in a VE) [53] and the sense of embodiment (SoE, the feeling of being inside, having, and controlling an avatar in a VE) [23], which, in turn, have been identified as influencing factors of the Proteus effect [22,36,46,73]. However, the theoretical work does not specify the concrete manipulations and congruencies needed to result in certain qualia or effects. Avatars are essential for many relevant applications, e.g., those reliant on the Proteus effect, and have also been indicated to be an important feature of VEs contributing to a user's plausible VR experience [33,51]. In this work, we aim to extend existing knowledge on plausibility by investigating how semantic congruence between an avatar and the virtual environment affects plausibility and the important VR-related qualia SoE and spatial presence. Furthermore, we investigate how the avatar-environment congruence affects the Proteus effect, while we also consider the VR-related qualia SoE and spatial presence as potential determinants of the Proteus effect.
Therefore, we investigated the impact of avatar and environment types and their congruence on avatar plausibility, SoE, and spatial presence in a VR-based study. We analyzed moderation effects of the named measures on the Proteus effect. Participants embodied an avatar in either sports- or business wear in a semantically congruent or incongruent environment while performing lightweight exercises (see Fig. 1). Embodying an avatar with athletic attributes might lead to more physical activity than an avatar with business-related attributes. Thus, our experiment was designed to induce a Proteus effect in the context of exercising behavior. The present study extends the existing knowledge about the Proteus effect by considering the relationship (congruence) between the self-embodiment (avatar) and the environment. Thereby, it contributes to understanding the underlying mechanisms and determinants of the Proteus effect and its great potential for safe and effective serious VR applications [42,45].

RELATED WORK
As the first contribution, we aim to extend existing knowledge on congruence and plausibility, by investigating how semantic congruence between an avatar and the virtual environment affects plausibility and important VR-related qualia such as SoE and spatial presence.

Congruence and Plausibility
Congruence and plausibility have been named essential concepts in VR and have recently received increased attention [29,50,54]. Slater et al. [53] defined plausibility as an illusion that "what is apparently happening is really happening" [53, p. 3553]. In contrast, coherence has been named a concept that describes whether a virtual scenario behaves reasonably or predictably, making it plausible [50]. While coherence characterizes a virtual scenario's properties, it relies on how the individual user evaluates it [54]. Slater et al. [54] proposed this evaluation to be dependent on the reactivity of the environment, the contingent references by the elements of the environment to the participant, and the credibility of expectation. Furthermore, in the recently proposed CaP-model, Latoschik and Wienrich [29] named congruence as an ontological specification of coherence. They hypothesized that congruence activations on sensory, perceptual, and cognitive layers lead to a state of plausibility. This state potentially influences various XR- and thus also VR-related qualia, such as spatial presence or the SoE, which we consider essential determinants of VR experiences and effects like the Proteus effect.

Avatar Plausibility
In many serious VR applications, e.g., those reliant on the Proteus effect, a user's digital representation plays a crucial role. These so-called avatars have been indicated to be an essential feature of VEs, contributing to a user's plausible virtual experience [51]. Mal et al. [33] presumed that avatars (originally referred to as "virtual humans") have a considerable impact on the overall state of plausibility. The authors define virtual human plausibility (VHP) as "the subjective feeling of how reasonable and believable a virtual human appears to a user" [33, p. 1], a concept reflecting users' evaluations of how congruently virtual humans act or appear within the virtual scenario. In this work, we refer to this concept as avatar plausibility since the virtual humans we used serve as self-representations of the users, i.e., as avatars. The authors further named virtual human congruence (originally referred to as coherence) to be composed of the internal congruence of a virtual human's appearance and behavior, its congruence with the VE, and the congruencies between all additional sensory impressions of the virtual human [33]. The named congruencies hold the potential to manipulate an avatar's appearance and behavior plausibility (ABP) and the avatar's perceived match with the VE (MVE). Taking inspiration from the CaP-model [29], in our study, we manipulated avatar congruencies on the cognition level, as the cognitive plasticity of users possibly allows the manipulation of visual cues and their associated semantics in VR without compromising the overall plausibility of the experience.
We, therefore, manipulated (1) the semantic congruence between the avatar's clothing and the environment's interior and (2) the semantic congruence between the avatar's clothing and the behavior, as lightweight exercising might be more plausible for avatars in sports- than in business wear. We propose the following hypotheses:
H1.1: Participants embodying an avatar dressed in attire semantically congruent with the VE will rate the avatar's plausibility concerning its match with the VE higher than participants embodying an avatar with semantically incongruent attire.
H1.2: Participants embodying an avatar dressed in attire semantically congruent with the sports tasks will rate the avatar's plausibility concerning its appearance and behavior higher than participants embodying an avatar with semantically incongruent attire.

VR-related Qualia: Spatial Presence and Sense of Embodiment (SoE)
Spatial presence is an essential VR-related quale primarily defined as the subjective sense of being in a VE [56]. It is driven bottom-up by the objective concept of immersion, which describes a system's properties that provide the boundaries within which spatial presence occurs [53]. Spatial presence is considered an elementary foundation for other VR potentials to become effective [63]. In our experiment, we kept the degree of immersion (bottom-up) constant between conditions and did not suspect strong top-down influences of avatar type or environment style manipulation. Therefore, we assume that avatar-environment congruence would not strongly impact spatial presence. We conclude with the following hypothesis:
H1.3: The manipulation of avatar type and the environment style will not lead to significant differences in spatial presence.
Another essential concept describing VR experiences is the sense of embodiment (SoE). It is the feeling of being inside (self-location), having (body ownership), and controlling (agency) an avatar [23,55]. SoE is considered to emerge from a combination of bottom-up and top-down influences [23]. The bottom-up information originates from the coherence of visual, tactile, and proprioceptive input, determined by the technical system [54]. The top-down information comes from cognitive processing of the avatar's visual properties, such as self-similarity [15,62] and realism [28]. Thus, avatar embodiment integrates several different perceptual and cognitive processes, which can modulate emotional responses [18], influence the perception of the virtual avatar's body [10,70], or act reciprocally on our behavior as described by the Proteus effect [42,45]. In the presented study, we did not explicitly manipulate the SoE from the bottom up, as we aimed to keep the embodiment configuration, determined by our technical system [54], constant between the conditions. We do not suspect a strong manipulation of top-down processes related to the SoE since avatar realism was kept constant and self-similarity was only affected interindividually and not at the manipulation level. We conclude with the following hypothesis:
H1.4: The manipulation of avatar type and the environment style will not lead to significant differences in the SoE.

The Proteus Effect
As a second contribution, we investigate the relationship (congruence) between the self-embodiment (avatar) and the virtual environment as possible influential factors of the Proteus effect. Following the CaP-model, we consider congruence activations to impact the users' subjective VR experience and its effects. We further consider the influence of the named VR-related qualia, which have been indicated as possible determinants of the Proteus effect. The Proteus effect describes that users adapt their behavior and attitudes to conform to their avatar's characteristics [45,72]. This comes with great potential as it may decrease racial bias [6,38], promotes self-confident behavior [72], and is considered to increase efficacy in mental health applications [9,63]. However, with great power comes great responsibility, as the Proteus effect may also lead to unfavorable or unintended behavior changes, like the activation of negative thoughts [39]. Recent reviews summarized multiple studies with various applications, manipulating avatar characteristics and reporting behavioral and attitudinal changes reflecting the Proteus effect [42,45]. For example, various studies have investigated the impact of different avatar attributes related to sportiness or athleticism on users' exercise behavior. They showed that the muscularity of the avatars [25,26,31], their body size [40,41], the visualization of sweat [24], and their clothing [22,34] affect the performance in physical exercises, the extent of physical activity, or the perceived exertion or self-efficacy during exercising. In this work, we induced a potential Proteus effect by manipulating the athleticism of the users' avatars using sports- and business clothing.
Based on the assumption that embodying an avatar with athletic attributes might lead to more physical activity than an avatar with business-related attributes, we propose the following hypotheses:
H2.1: Participants embodying an avatar dressed in sportswear will show higher exercise performance than those embodying an avatar dressed in business wear.
H2.2: Participants embodying an avatar dressed in sportswear will report higher levels of commitment, enjoyment, and motivation for future activity than those embodying an avatar dressed in business wear.

Determinants of the Proteus Effect
Understanding its fundamental mechanisms seems essential for the Proteus effect to become effective. Recent explanatory approaches assume that the Proteus effect is stronger if the user feels closer to the avatar [45]. Factors affecting this user-avatar bond are typically those directly dependent on users' interindividual differences and the avatars' characteristics like self-similarity or wishful identification, and high-level VR-specific qualia describing users' subjective VR experience like the SoE and spatial presence [42]. This is in line with the overview of Wienrich et al. [63], presenting possible VR-related features affecting human behavior changes. The authors stress the importance of VR qualia (referred to as "corresponding perceptions"), which moderate the effect of VEs on affect, attitude, and behavior, and name the SoE as a crucial moderator for the impact of avatars and spatial presence as a crucial moderator for the impact of environment representation. In this realm, Yee and Bailenson indicated greater behavioral changes in participants who embodied an avatar than in those who observed identical visual stimuli without embodiment [73]. Subsequently, visuomotor coherence has been shown to impact the SoE and be a pivotal factor in provoking the Proteus effect [5,36]. Kilteni et al. [22] demonstrated that the individually experienced SoE could moderate the impact of avatar appearance on behavior, indicating that a stronger illusion of body ownership may lead to greater behavior changes, even when SoE is not actively manipulated. In contrast, Reinhard et al. [46] could not find an impact of SoE on the Proteus effect, but they reported that a more pronounced spatial presence led to a stronger Proteus effect.
As mentioned in Section 2.1.2, we did not actively manipulate spatial presence and SoE. However, we consider inter-individual differences in the named qualia potential moderators for the Proteus effect and propose the following hypotheses:
H2.3: Participants who report a greater SoE towards the avatar will experience a stronger Proteus effect.
H2.4: Participants who report greater spatial presence will experience a stronger Proteus effect.
In this work, we extend the existing knowledge about the Proteus effect by considering the congruence between the avatar and the virtual environment. The results of Kilteni et al. [22] indicated a behavior change only for participants embodying a semantically congruent avatar. However, previous work on the impact of avatar plausibility on the Proteus effect is sparse, while current work has highlighted plausibility as an important factor for the effects of XR on human processing and behavior [29]. We propose the following hypotheses:
H2.5: Participants embodying an avatar congruent with the VE will experience a stronger Proteus effect than participants embodying an avatar incongruent with the VE.
H2.6: Participants who report greater avatar plausibility will experience a stronger Proteus effect.

Participants
A total of 72 bachelor students of the University of Würzburg were recruited through the university's participant management system and received credit points equivalent to one hour for participation. Prespecified exclusion criteria ruled out the data of three participants reporting an uncorrected visual impairment, three participants suffering from severe underweight (BMI < 16.5) or obesity class II (BMI > 35), two participants due to feeling discomfort, and a further five participants due to technical issues (2× recording failed, 2× participants lost a tracker, and 1× SteamVR error). In the resulting 59 valid data sets, participants' age ranged from 19 to 27 years (M = 22.14, SD = 2.05), with 33 participants being female (56 %) and 26 male (44 %). The native language of all included participants was German. Three participants had no VR experience, 48 participants had experienced VR between 1 and 10 times, and eight participants had experienced VR more than ten times before the experiment.
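The participant accounting above can be retraced with a few lines of arithmetic. The following is only a bookkeeping sketch that uses the counts reported in the text; the dictionary keys and the helper `percent` are illustrative names, not part of the study materials.

```python
# Counts as reported in the Participants paragraph.
recruited = 72
excluded = {
    "uncorrected visual impairment": 3,
    "BMI below 16.5 or above 35": 3,
    "discomfort": 2,
    "technical issues": 5,
}
valid = recruited - sum(excluded.values())

female, male = 33, 26
vr_experience = {"none": 3, "1 to 10 times": 48, "more than 10 times": 8}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to whole percent."""
    return round(100 * part / whole)


print(valid)                                          # 59
print(percent(female, valid), percent(male, valid))   # 56 44
```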

Design
In a 2 × 2 between-subjects design with the independent variables avatar type (sport/business, see Fig. 2) and environment style (sport/business, see Fig. 1), participants were assigned to one of the four conditions in a counterbalanced manner. Therefore, participants embodied a gender-matching avatar with sports- or business wear in a VE with sports- or business interior. Our dependent variables assessed the participants' motivation when performing lightweight sports tasks in front of a virtual mirror. To this end, we captured the exercise performance of three exercises as behavioral measures and evaluated self-reports on subjective measures. We further measured the VR-related qualia avatar plausibility, SoE, and spatial presence. Control variables accounted for participants' gender, BMI, signs of simulator sickness, and self-reported physical activity. Further information regarding the used measures is provided in Section 3.5. The study received approval from the ethics committee of the Institute Human-Computer-Media at the University of Würzburg without further obligations.
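To make the four cells of the design concrete, here is a minimal sketch of a counterbalanced assignment. The text does not state the exact assignment scheme; the round-robin rule and the function name `assign_condition` are assumptions for illustration only.

```python
from itertools import product

AVATAR_TYPES = ("sport", "business")
ENVIRONMENT_STYLES = ("sport", "business")

# The four cells of the 2 x 2 design: (avatar type, environment style).
CONDITIONS = list(product(AVATAR_TYPES, ENVIRONMENT_STYLES))


def assign_condition(participant_index):
    """Cycle through the four cells so that group sizes differ by at most one.

    Round-robin assignment is an assumption; the study only reports that
    assignment was counterbalanced.
    """
    avatar, environment = CONDITIONS[participant_index % len(CONDITIONS)]
    return {
        "avatar": avatar,
        "environment": environment,
        # Congruent cells are sport-sport and business-business.
        "congruent": avatar == environment,
    }
```

With this rule, any block of four consecutive participants covers each cell exactly once, two of them congruent and two incongruent.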

Apparatus
All self-reported measures were assessed on a separate workstation in the experiment room using the online survey tool LimeSurvey 4.5 [30]. Participants' body height and weight were measured with calibrated medical equipment. The interpupillary distance of the participants was determined using the mobile application GlassesOn [1] running on a Samsung Galaxy S6 smartphone.
The VR system was implemented using Unity 2019.4.20f1 [59]. The VR hardware was integrated using SteamVR version 1.20.4 [60] and the corresponding Unity plug-in version 2.7.3. It consisted of a Valve Index HMD [61], two Valve Index controllers, and three HTC Vive Trackers 3.0 [20] (see Fig. 3, left). The Valve Index HMD has a display resolution of 1440 × 1600 pixels per eye, a field of view of 120°, and ran at a refresh rate of 120 Hz. The tracking area (3 m × 3 m) was set up with three SteamVR Base Stations 2.0. Particular attention was paid to the cable routing to minimize possible disturbance to the participants. A VR-capable gaming PC (Intel Core i7-7700K CPU, NVIDIA GeForce GTX 1080, 16 GB RAM) ensured fluent rendering. The motion-to-photon latency of the system during full embodiment averaged 73.5 ms. The latency was determined by counting frames [57,58] between real-world hand movements and the hand movements of the avatar rendered in the HMD and on a ROG Swift PG43UQ Monitor at 120 fps, using an Aten VanCryst VS192 DisplayPort splitter.
Fig. 2: The figure shows our used sport (left) and business (right) avatars taken from the Rocketbox Library [19] in comparison.
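The frame-counting approach to latency mentioned in the Apparatus description boils down to converting a number of recorded frames into time. The sketch below illustrates that conversion; the recording frame rate and the frame counts are hypothetical values, since the text reports only the resulting average.

```python
from statistics import mean


def latency_ms(frame_counts, recording_fps):
    """Convert frame counts between real and rendered movement onset to ms."""
    if recording_fps <= 0:
        raise ValueError("recording_fps must be positive")
    frame_duration_ms = 1000.0 / recording_fps
    return [count * frame_duration_ms for count in frame_counts]


# Hypothetical example: three measurements from a hypothetical 240 fps recording.
samples = latency_ms([17, 18, 18], recording_fps=240)
average_latency_ms = mean(samples)
```

The temporal resolution of such a measurement is one frame of the recording, so a higher recording frame rate yields a finer latency estimate.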

Avatars and Virtual Environments (VEs)
The avatars' appearance differed in the semantic attributes of their clothing, as they were either dressed in sportswear (sport) or formal attire (business), shaping the manipulation of athleticism. To this end, we used the freely available rigged avatars from the Rocketbox Library [19]. The avatars are shown in Figure 2.
The VEs were based on Unity assets, which we adapted for our purposes. They differed in their semantic attributes of interior design, appearing either as a fitness room (sport)¹ or an office (business)². The resulting VEs had the same dimensions, lighting conditions, and virtual mirror positions. The VEs are shown in Figure 1. Custom dumbbells were modeled that were initially not visible in the VEs and only appeared when needed for a task (see Fig. 3, right).

Embodiment
We animated the avatars using inverse kinematics [2] implemented with FinalIK version 1.9 [47] following the system of prior work [11,67]. The trackers were attached to the participants' lower back using a belt and on the top of each foot using custom straps (see Fig. 3, left). The avatars were automatically scaled to the height of participants. Participants could observe their avatar's virtual body directly from the first-person perspective and in a virtual mirror from the third-person perspective (see Fig. 3, right). Following Wolf et al. [66], we kept the distance to the mirror around 1.7 m, where no distance-related influence is expected. Participants could control their avatar's fingers using the touch sensors of the controllers. Finger control was limited to approximating the participant's fingers' curvature. The participant could grasp the dumbbells by pressing the trigger button of one of the controllers while the respective hand was near the dumbbell's center (< 10 cm). When grasped, the dumbbells snapped to the avatar's hand, and the fingers were adjusted to form a holding fist (see Fig. 3, right).
¹ https://assetstore.unity.com/packages/3d/props/interior/fitness-room-143613
² https://assetstore.unity.com/packages/3d/environments/urban/hq-archviz-office-163824
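The height scaling and the grasp rule described in the Embodiment section can be illustrated with a short sketch. It is written in Python for readability and is not the Unity implementation used in the study; the function names and the assumption of uniform scaling by the height ratio are illustrative.

```python
import math

GRASP_RADIUS_M = 0.10  # the hand must be closer than 10 cm to the dumbbell center


def avatar_scale(participant_height_m, avatar_height_m):
    """Uniform scale factor that matches avatar height to participant height.

    Uniform scaling by the height ratio is an assumption; the text only
    states that avatars were scaled to the participants' height.
    """
    return participant_height_m / avatar_height_m


def can_grasp(hand_position, dumbbell_center, trigger_pressed):
    """Grasp succeeds only if the trigger is pressed while the hand is near."""
    distance = math.dist(hand_position, dumbbell_center)
    return trigger_pressed and distance < GRASP_RADIUS_M
```

For example, a hand 5 cm from the dumbbell center with the trigger pressed would grasp, while the same hand position without the trigger, or a hand 20 cm away, would not.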

Exercises
Participants had a two-minute time frame to perform each exercise in VR. Exercises were of voluntary duration as participants were informed that they could stop each exercise at any time in case they were exhausted or felt they had exercised enough. Participants were blind to all behavioral measures. We deliberately chose a short time frame as well as light and easy-to-understand tasks to not be overly dependent on participants' athleticism but rather on their willingness to engage. First, participants performed warm-up body movements to familiarize themselves with the virtual body and induce the SoE through visuomotor coherence by waving at their virtual mirror reflection in a relaxed manner for 20 s with each hand. During warm-up, no measures were taken. After the warm-up, participants performed three exercises.
(1) Knee Lifting: Participants were instructed to alternately lift the right and left knee to hip level (see Fig. 1, top right). We counted each leg lift as half a repetition and measured the time of execution.
(2) Arm Lifting: Participants were instructed to lift the stretched arms sideways to shoulder height and back (see Fig. 1, bottom left). We counted each lift as one repetition and measured the execution time.
(3) Weight Lifting: Participants were instructed to perform the same movement as in arm lifting exactly ten times while holding a dumbbell. To this end, they were told to grasp one of five virtual dumbbells, which appeared in front of them (see Fig. 1, bottom right). The dumbbells were equipped with two weight plates each, linearly increasing in size from 1 (13 cm) to 5 (23 cm). As the number of exercise repetitions was given, the repetitions and the execution time were not measured.

Exercise Performance
For exercises 1 and 2, we assessed exercise performance by capturing the number of exercise repetitions and the time of exercise performance (see Sec. 3.4). For exercise 3, we captured weight selection as participants grasped 1 of 5 virtual dumbbells (1 = lowest weight).
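As a rough illustration of how these behavioral measures could be derived from logged events, consider the sketch below. The counting rules follow the text (a leg lift is half a repetition, an arm lift is one repetition, dumbbells are numbered 1 to 5); the data structure, the function names, and the capping of execution time at the two-minute limit are assumptions.

```python
from dataclasses import dataclass

TIME_LIMIT_S = 120.0  # two-minute time frame per exercise


@dataclass
class ExercisePerformance:
    repetitions: float
    execution_time_s: float


def knee_lifting(leg_lifts, start_s, stop_s):
    """Exercise 1: each single leg lift counts as half a repetition."""
    duration = min(stop_s - start_s, TIME_LIMIT_S)
    return ExercisePerformance(leg_lifts * 0.5, duration)


def arm_lifting(lifts, start_s, stop_s):
    """Exercise 2: each sideways lift and return counts as one repetition."""
    duration = min(stop_s - start_s, TIME_LIMIT_S)
    return ExercisePerformance(float(lifts), duration)


def weight_selection(dumbbell_index):
    """Exercise 3: the chosen dumbbell, from 1 (lowest weight) to 5."""
    if dumbbell_index not in range(1, 6):
        raise ValueError("dumbbell_index must be between 1 and 5")
    return dumbbell_index
```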

Subjective Measures
Four self-report questions captured participants' assessment of whether they (1) attempted to do as many repetitions as possible in the given time, (2) put effort into the execution of the exercises (both commitment), (3) enjoyed performing the exercises (enjoyment), and (4) intend to do movement exercises more often in the near future than they have done so far (future motivation). Participants were instructed to tick the extent to which the statements applied on a 7-point Likert scale (1 = does not apply at all). We averaged the first and second questions to calculate the value for commitment.
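The scoring of the three subjective measures from the four items can be written down in a few lines. This is a sketch of the rule stated above; the function and key names are illustrative.

```python
from statistics import mean


def score_subjective_measures(q1, q2, q3, q4):
    """Score the four 7-point items into the three subjective measures."""
    for rating in (q1, q2, q3, q4):
        if not 1 <= rating <= 7:
            raise ValueError("ratings must lie on the 7-point scale (1 to 7)")
    return {
        "commitment": mean([q1, q2]),  # average of items 1 and 2
        "enjoyment": q3,
        "future_motivation": q4,
    }


print(score_subjective_measures(6, 7, 5, 3))
# {'commitment': 6.5, 'enjoyment': 5, 'future_motivation': 3}
```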

VR-related Qualia
Avatar Plausibility
We assessed the avatars' plausibility with the Virtual Human Plausibility Questionnaire (VHPQ) [33]. The measure consists of two dimensions: (1) the avatar's appearance and behavior plausibility (ABP) and (2) the avatar's match to the VE (MVE). All 11 questions were rated on a 7-point Likert scale (7 = highest plausibility).

Sense of Embodiment (SoE)
We assessed the SoE with the Virtual Embodiment Questionnaire (VEQ) [48]. The questionnaire consists of three dimensions: (1) virtual body ownership (VBO), (2) agency over a virtual body (Agency), and (3) the perceived change in the body schema (Change). Each factor is assessed with four items rated on a 7-point Likert scale (7 = highest SoE).

Spatial Presence
Spatial presence was assessed with the SP sub-dimension of the Igroup Presence Questionnaire (IPQ) [49]. All questions are rated on a 7-point Likert scale (6 = highest presence).

Control Measures
We controlled for interindividual differences between participants by considering their gender (male, female, diverse), BMI [71], perceived signs of simulator sickness, and self-reported physical activity.

Simulator Sickness
Simulator sickness was assessed using pre- and post-VR measurements of the Simulator Sickness Questionnaire [21]. It consists of 32 items capturing symptoms associated with simulator sickness. The total score ranges from 0 to 235.62 (235.62 = strongest). We considered the difference between the pre- and post-assessments for the evaluation. An increase in score indicates the occurrence of simulator sickness due to VR usage.

Physical Activity
We assessed participants' usual level of activity during leisure and work time in the past year with the Tromsø Study Physical Activity Questionnaire [12] on a scale from 1 to 4 (4 = high physical activity).
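The upper bound of 235.62 for the simulator sickness total score follows from the commonly used scoring scheme of the questionnaire, which the text does not spell out. The sketch below assumes that scheme (three raw subscale sums, each ranging from 0 to 21, summed and multiplied by 3.74); the function names are illustrative.

```python
SSQ_TOTAL_WEIGHT = 3.74   # total-score weight in the standard scoring (assumed here)
MAX_RAW_SUBSCALE = 21     # assumed maximum raw sum per subscale


def ssq_total(nausea_raw, oculomotor_raw, disorientation_raw):
    """Total score from the three raw subscale sums."""
    return (nausea_raw + oculomotor_raw + disorientation_raw) * SSQ_TOTAL_WEIGHT


def ssq_change(pre, post):
    """Post-VR minus pre-VR total; a positive value indicates an increase."""
    return ssq_total(*post) - ssq_total(*pre)


maximum = ssq_total(MAX_RAW_SUBSCALE, MAX_RAW_SUBSCALE, MAX_RAW_SUBSCALE)
print(round(maximum, 2))  # 235.62
```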

Manipulation Check -Avatar Athleticism
To check whether the between-condition manipulation of the avatar type was successful, we asked participants to self-report how athletic they rated their avatar on a 5-point Likert scale (5 = athletic).

Procedure
The experimental procedure is visualized in Figure 4. It averaged 55 minutes (SD = 12.05). First, the experimenter informed the participants about COVID-19 regulations and ensured that they did not suffer from any physical limitations preventing their participation. Participants then read the experimental information and gave explicit written consent. Subsequently, participants' body height, weight, and interpupillary distance (IPD) were measured. Thereafter, participants answered pre-questionnaires on demographic data, physical activity, and pre-VR simulator sickness. Lastly, the experimenter demonstrated how to use and wear the equipment and how to configure the HMD's IPD value. Right before the VR experience, the experimenter primed participants to choose the number of repetitions and the duration of the following exercises themselves.

VR Experience
During the VR experience, pre-recorded audio instructions guided participants through their tasks following a pre-scripted linear procedure. In the beginning, participants performed a vision test and the embodiment calibration. Before starting the exercises, participants could practice grasping virtual objects. Subsequently, participants performed warm-up body movements and the three exercises described in Section 3.4. Participants could stop the experiment for any reason at any time.

Closing
After the VR experience, participants continued with self-reported questionnaires assessing subjective measures, avatar athleticism, post-VR simulator sickness, as well as the VR-related qualia avatar plausibility, SoE, and spatial presence. Lastly, the experimenter debriefed participants about the experiment and all manipulated variables and answered all questions.
Fig. 4: The figure provides an overview of the evaluation process as a whole (left) and a detailed overview of the VR exposure (right).

Statistical Analysis
Statistical analyses were performed in R [44]. If Levene's test indicated a violation of homogeneity of variances, we calculated robust tests using the WRS2 R package [32]; see Wilcox [65] for details. To verify our manipulation, we calculated an ANOVA with avatar type and gender as predictors and avatar athleticism as the outcome variable. We further tested whether control measures significantly impacted any dependent variables and verified their independence from the predictor variables. Where applicable, we added control measures to the models described below. A summary of control measures added to at least one of our statistical models is given in Section 4.2.

Avatar and Environment Congruence (H1)
We calculated two-way ANOVAs with avatar type and environment style as predictors and one of the respective VR-related qualia (avatar plausibility, SoE, and spatial presence) as the outcome variable.

Multivariate Analysis of Variance (H2)
To reduce the probability of a type I error and increase the clarity of the results, we grouped semantically correlated dependent variables and calculated multivariate analyses of variance (MANOVA) using Pillai's trace test statistics. We combined the outcome variables (1) time of performance and (2) number of performed repetitions of exercises 1 and 2 as exercise performance and (1) commitment, (2) enjoyment, and (3) future motivation as subjective measures. Weight selection was analyzed separately, using univariate ANOVA. This resulted in three statistical models for each combination of predictors. We first calculated the models with only the avatar type as a predictor, reflecting the mere Proteus effect (H2.1 and H2.2). Further, for each potential moderator, we calculated ANOVA/MANOVA models including two predictors, (1) avatar type and (2) either one of the VR-related qualia (H2.3, H2.4, H2.6) or the environment style (H2.5). A potential moderation is indicated by the interaction between avatar type and one of the other predictors. Main effects are reported and were analyzed in an exploratory manner. Analyses were guided by Field et al. [16].
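For reference, Pillai's trace can be written in its textbook form. The symbols below are not defined in the original text and are introduced here only for illustration: $H$ is the hypothesis (between-groups) sums-of-squares-and-cross-products matrix of the grouped outcome variables, $E$ the corresponding error matrix, and $\lambda_i$ the eigenvalues of $HE^{-1}$.

```latex
\[
V \;=\; \operatorname{tr}\!\left[ H \,(H + E)^{-1} \right]
  \;=\; \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i},
\qquad s = \min(p,\, df_H),
\]
```

Here $p$ is the number of outcome variables in the group (for example, $p = 3$ for the subjective measures) and $df_H$ the degrees of freedom of the tested effect. Larger values of $V$ indicate that the effect explains a larger share of the multivariate variance.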

Follow-up Analysis
For MANOVAs showing a significant effect, we performed a follow-up analysis with separate ANOVAs for each dependent variable. In case one of the described MANOVAs showed a significant interaction effect between avatar type and one of the other predictors (moderator), additional moderation analyses were performed for each of the original outcome variables to break down the moderation into single exercises or subjective measures, respectively. To this end, simple slope analyses [43] were performed with conditional values of the moderator variable's (1) mean (average moderation), (2) mean plus one standard deviation (high moderation), and (3) mean minus one standard deviation (low moderation). We used the PROCESS function of the bruceR package [7]. Numeric predictors were mean-centered.
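The logic of the simple slope analysis can be sketched as follows, assuming a linear moderation model of the form outcome = b0 + b1·avatar + b2·moderator + b3·avatar·moderator with a mean-centered moderator. This is an illustration of the principle, not the bruceR implementation; the function names, coefficient values, and ratings are hypothetical.

```python
from statistics import mean, stdev


def mean_center(values):
    """Subtract the sample mean so that 0 corresponds to the average value."""
    m = mean(values)
    return [v - m for v in values]


def conditional_effects(b_avatar, b_interaction, moderator_values):
    """Effect of avatar type at low, average, and high moderator levels.

    With a centered moderator w, the conditional effect of avatar type is
    b_avatar + b_interaction * w, evaluated at w = -SD, 0, and +SD.
    """
    centered = mean_center(moderator_values)
    sd = stdev(centered)
    levels = {"low": -sd, "average": 0.0, "high": sd}
    return {label: b_avatar + b_interaction * w for label, w in levels.items()}


# Hypothetical coefficients and hypothetical 7-point ownership ratings.
effects = conditional_effects(5.0, 2.0, [1, 2, 3, 4, 5, 6, 7])
```

In this hypothetical case the effect of avatar type grows with the moderator, which is the pattern a positive interaction coefficient describes.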

Manipulation Check -Avatar Athleticism
There was a significant main effect of the avatar type on the ratings of avatar athleticism, F(1, 55) = 19.34, p < .001, ω² = 0.23. Sport avatars (M = 4.10, SD = 0.79) were rated higher on athleticism than business avatars (M = 3.11, SD = 0.96). Furthermore, there was no significant main effect of gender, F(1, 55) = 1.71, p = .197, ω² = 0.01, and no significant interaction between avatar type and gender, F(1, 55) = 0.10, p = .756, ω² < 0.01. We therefore consider the manipulation of avatar athleticism sufficient for both avatar genders.  as a control variable to models predicting selected weight as an outcome and leisure activity to models predicting subjective measures. None of the other control variables affected a dependent variable significantly.

Avatar Plausibility (H1.1 and H1.2)
In line with H1.1 (the effect of avatar and environment on MVE), for MVE a robust two-way ANOVA revealed a significant interaction effect between avatar type and environment style, Q = 39.65, p = .001. Post-hoc tests showed a significant difference between conditions with congruent avatar and environment, i.e., business-business and sport-sport (M = 6.09, SD = 0.53), and conditions with incongruent avatar and environment, i.e., sport-business and business-sport (M

Sense of Embodiment (H1.4)
In line with H1. (1) For the number of repetitions of exercise 1, there was an interaction effect of avatar type and VBO, F(1, 55) = 4.8, p = .032, ω² = 0.06. Simple slope analysis revealed a significant conditional effect of avatar type on the number of repetitions of exercise 1 for high VBO, b = 19.14, t(55) = 2.31, p = .024 (see Fig. 5, left), indicating that participants embodying a sport avatar performed more repetitions of exercise 1 than those embodying a business avatar.
(2) For the execution time of exercise 1, there was an interaction effect of avatar type and VBO, F(1, 55) = 9.16, p = .004, ω² = 0.12. Simple slope analysis revealed a significant conditional effect for high VBO, b = 34.56, t(55) = 2.51, p = .015 (see Fig. 5, center), indicating that participants embodying a sport avatar executed exercise 1 for a longer time than those embodying a business avatar.
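For readers who want to relate the reported F statistics to the ω² effect sizes, the following sketch uses a commonly cited approximation of partial omega squared from F, the effect degrees of freedom, and the total sample size. Whether the original analysis used exactly this estimator is an assumption, and the function and parameter names are hypothetical. With N = 59 and one effect degree of freedom, the approximation gives values close to those reported for the two interaction effects above.

```python
def partial_omega_squared(f_value, df_effect, n_total):
    """Approximate partial omega squared from an F statistic.

    Uses df_effect * (F - 1) / (df_effect * (F - 1) + N).
    Negative estimates (F < 1) are truncated to 0.
    """
    numerator = df_effect * (f_value - 1.0)
    return max(0.0, numerator / (numerator + n_total))


# Interaction effects of avatar type and VBO, using F values from the text.
omega_repetitions = partial_omega_squared(4.8, 1, 59)
omega_time = partial_omega_squared(9.16, 1, 59)
print(round(omega_repetitions, 2), round(omega_time, 2))
```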

DISCUSSION
Our study investigated the impact of avatar and environment types and their congruence on avatar plausibility, SoE, and spatial presence. We further considered the named congruence and the VR-related qualia as potential moderators of the Proteus effect.

Avatar-Environment Congruence
Our results confirm the assumption that a congruent avatar and environment would lead to a significantly higher reported plausibility concerning the avatars' match with the VE, compared to an incongruent combination of avatar and environment (H1.1). The large effect reflects a sufficient top-down manipulation of the avatars' fit to the VE on a semantic level (sport vs. business). Furthermore, participants embodying an avatar dressed in sportswear when performing sports tasks reported significantly higher appearance and behavior plausibility than participants embodying an avatar dressed in business wear (H1.2). This result indicates that participants processed the congruencies between the self-performed behavior (exercises) and the semantics of the embodied avatar as part of its plausibility. In line with H1.3 and H1.4 concerning spatial presence and SoE, respectively, the manipulation of avatar type and environment style did not lead to significant differences, indicating that avatar-environment congruence did not strongly impact top-down processing of SoE and spatial presence. At the same time, in line with the concept of Slater et al. [54], we kept bottom-up influences (the technical system and the degree of immersion) constant between the conditions.

The Proteus Effect
We consider the manipulation of avatar athleticism sufficient, as our manipulation check showed a considerable effect of the avatar type on the reported avatar athleticism independent of the avatar's gender. Nevertheless, without considering the determinants of the Proteus effect, participants embodying an avatar dressed in sportswear did not show higher physical activity in exercise sessions and did not report higher levels of subjective measures than participants embodying an avatar dressed in business wear (further referred to as our Proteus effect). This contradicts the results of related work on the effects of avatar athleticism in VR, which found, e.g., reduced perceived exertion and increased grip strength for more muscular avatars [26], or reduced perceived exertion and heart rate for athletic avatars [25].
Regarding the Proteus effect, interindividual variance is seemingly high. Most of the named work uses within-subjects designs to compensate for individual differences. In contrast, we used a between-subjects design to prevent repeated measures from having a potential sequential effect on (1) physical activity due to exertion and (2) the user-avatar bond due to the consecutive embodiment of avatars with different characteristics, potentially leading to unintended effects. In this regard, there are indications that the Proteus effect impacts participants' behavior even after (each) VR experience [74]. One solution to account for the individuality of the Proteus effect is using a reasonably large sample size [3]. For example, Navarro et al. [34] reported differences in cardiac frequency between avatars wearing sports or formal clothes with over 300 participants. We followed a different approach and investigated the extent to which different moderator variables account for interindividual variance. Taking these variables into account, we revealed a Proteus effect even with a comparatively small sample. In our work, the sample size of N = 59 may not have had the intended power to reveal small-to-medium effects as they are indicated for the Proteus effect [45]. A post hoc sensitivity analysis (α = 0.05, 1 − β = 0.8, N = 59) using G*Power version 3.1.9.6 [14] revealed that a two-group (avatar type) MANOVA on exercise performance (4 responses) would be sensitive to medium to large effects (f² = 0.22) [8]. Therefore, we cannot rule out the risk of a type 2 error for small to medium effects.
However, we explicitly accounted for between-subject differences in the VR experience as we further considered the moderation effects of avatar-environment congruence and the VR-related qualia avatar plausibility, SoE, and spatial presence.
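To put the sensitivity value reported above into perspective: Cohen's conventional benchmarks for $f^2$ are 0.02 (small), 0.15 (medium), and 0.35 (large), so 0.22 lies between medium and large. Assuming the conversion between $f^2$ and Pillai's trace $V$ that G*Power documents for global MANOVA effects, and assuming $s = 1$ for the two-level factor avatar type (neither is stated explicitly in the text), the value would correspond to:

$$f^2 = \frac{V}{s - V} \quad\Rightarrow\quad V = \frac{s\, f^2}{1 + f^2} = \frac{0.22}{1.22} \approx 0.18$$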

Moderation by Avatar-Environment Congruence and Avatar Plausibility
Our results indicate that participants cognitively processed the top-down manipulation of avatar plausibility, as we accepted H1.1 and H1.2. However, neither the manipulation of avatar-environment congruence nor ABP or MVE moderated the relation between avatar type and any outcome variables. Therefore, we reject H2.5 and H2.6 and assume that the Proteus effect was not influenced by the avatar-environment congruence or the avatar plausibility.

Spatial Presence
Spatial presence did not function as a moderator for the Proteus effect, contradicting our hypothesis (H2.4) and the work of Reinhard et al. [46]. To make a comprehensive statement, we propose that future work explicitly manipulate spatial presence to further investigate its influence on the Proteus effect.

Moderation by SoE
We partly accept H2.3, stating that participants who report a greater SoE toward their avatar will experience a stronger Proteus effect. To this end, we distinguish between the VEQ's dimensions and the outcome measures. Agency and change did not moderate the relation between avatar type and any outcome variables.

Virtual Body Ownership
Our results indicate a significant interaction between VBO and avatar type on exercise performance, while there were no significant main effects of avatar type and VBO on the dependent measures. Therefore, we suggest that both the avatar type (Proteus effect) and VBO jointly influenced task performance. Follow-up analysis revealed that only for participants who reported a high feeling of virtual body ownership (M_high = 5.85) was there a significant difference between the avatar types on the outcome variables execution time of exercises 1 and 2 and the repetitions of exercise 1, but not the repetitions of exercise 2. Figure 5 thereby indicates that embodying a sports avatar led to greater exercise performance than embodying a business avatar for participants reporting high VBO.
The result for the number of repetitions of exercise 2 (arm lifting) does not cohere with the other measures of exercise performance. While we consider voluntary execution time a good implicit measure for exercise performance, the number of repetitions for exercise 2 (arm lifting) seemed to be strongly influenced by interindividual differences in how the movements were executed, i.e., fast repetitions or controlled slow movements as known from lateral raises. Fox and Bailenson [17] showed shoulder and arm exercises to be an effective behavioral measure in a voluntary exercising session after providing participants with a printout portraying exercises and a research assistant demonstrating the exercises. However, participants in our work were only provided with prerecorded audio instructions. For future work, we recommend (1) ensuring that all behavioral measures are explained accurately and that the exercises carried out are comparable between participants and (2) further investigating whether the inconsistent results of exercise 2 exclusively stem from differences in exercise strategies.
Furthermore, regardless of the reported levels of VBO, the weight selection task and subjective measures were not affected by the avatar type. For weight selection, participants had to know which weight is ideal for performing arm lifting exercises (i.e., habituation to the context of training with dumbbells) and had to understand and cognitively process that virtual weights would not make the task physically harder. Since participants did not lift real weights, the individual perception of the virtual weight may also have influenced participants' performance and behavior. This additional task complexity may explain why we could not quantify a Proteus effect for the weightlifting exercise. Also, we could not show any impact of the avatar type on subjective measures. This result may be explained by the statement of Ratan et al., who suggest that behavioral effects are more valid reflections of the Proteus effect, as it is situated in self-perception theory, which proposes that "individuals assume their attitudes by observing their behaviors" [45, p. 9].
These results are in line with the work of Kilteni et al. [22], who demonstrated that the individually experienced SoE could moderate the impact of avatar appearance on behavior even when the SoE is not actively manipulated. In our work, the moderating effect of VBO on the relation between avatar type and exercise performance was also based on interindividual differences and not on a manipulation, as we kept bottom-up influences constant between conditions [54]. We suspect the characteristics of the participants themselves to have been an important factor influencing the cognitive processing of the avatar's visual properties, such as self-similarity [62], which in turn has been named a determinant of the Proteus effect [42], also in physical activity [34]. For future work, we recommend controlling for the similarity between the used avatars and the individual participant, e.g., by capturing self-identification as suggested by Fiedler et al. [15].
For execution times, Figure 5 might suggest a potential inverse effect of avatar characteristics for low VBO. We recommend further investigating this exploratory finding in future work by explicitly manipulating the VBO from the bottom up, e.g., by introducing additional latency for the virtual body.

The Proteus Effect: Conclusion
We could not show a moderating impact of the manipulation of avatar-environment congruence, avatar plausibility, and spatial presence on the Proteus effect. However, our results indicated the SoE to be an important factor contributing to the Proteus effect. We suggest that the SoE strengthens the bond between the user and the avatar, as has been indicated by the introduced related work [22,42,45,73]. In particular, we found evidence for the component of virtual body ownership to be a driving factor moderating the effect of the avatars' athleticism on behavioral changes in exercise performance. Here, we relate the previously known relationship between embodiment and the Proteus effect to a single factor of a validated embodiment questionnaire [48]. Overall, an interplay of appropriate behavioral measures and a high level of body ownership appeared to be the basis for quantifying the Proteus effect. Therefore, we assume that the environment does not necessarily have to match the avatar and its characteristics for the Proteus effect to become effective, as long as the SoE is preserved on a high level.

Exploratory Insights
The results revealed significant main effects of VBO and ABP on enjoyment. From an HCI perspective, this indicates that a plausible avatar and a strong sense of having and owning a virtual body are important components for user satisfaction and enjoyment when performing physical activity in VR. Further, our results suggest that the feeling of controlling one's virtual body movements (agency) has an impact on exercise behavior. For all exploratory results, we recommend further in-depth exploration in future work.

On VR Experiences and Effects
The CaP-model by Latoschik and Wienrich [29] postulates that VR experiences and effects result from a function of weighted congruence activations resulting in VR-typical qualia, which in turn can act as moderators and mediators for VR effects and high-level applications. However, the model does not specify the concrete manipulations and congruencies needed to result in certain qualia or effects. It defines the overall frame wherein to integrate such effects consistently. Our results indicate that manipulating the avatar's plausibility by semantic congruencies between the avatar, its behavior, and the environment indeed led to a self-reported change in cognitive processing (H1.1 and H1.2). Further, these manipulations did not significantly affect the SoE and spatial presence for our sample, as we kept the bottom-up processing constant (H1.3 and H1.4) and did not manipulate strong top-down influences of these essential qualia. Most importantly, our results indicate that a central characteristic of VR experiences, their huge flexibility concerning the design of virtual environments and the manifestation of the users, mainly affects the cognitive layer. The Proteus effect breaks the congruence between the participant's real appearance and the avatar's characteristics. However, the cognitive plasticity of the human being seems to allow these changes in the virtual world without breaking the overall plausibility, still allowing the Proteus effect to become effective. Even when breaking the avatar's plausibility from the top down, we could show that other essential qualia like spatial presence or the SoE stayed unaffected, so that the effects they support, in this work the Proteus effect, can still occur, as indicated by participants reporting a high level of VBO (H2.3).
Relating our work to Slater et al. [54], who named presence (spatial presence and plausibility as two orthogonal illusions) and the illusion of ownership over the virtual body as the key illusions of VR, we kept the properties of the immersive configuration for spatial presence and SoE constant between the conditions, accordingly not significantly affecting the named concepts (H1.3 and H1.4). Furthermore, concerning plausibility, we manipulated the credibility of the environment based on the expectations users would have on how sports or business avatars behave in a congruent or incongruent environment. This manipulation was reflected in the ratings of the avatars' plausibility (H1.1 and H1.2). By further evaluating the three "key illusions of virtual reality" [54, p. 1], we found evidence for virtual body ownership to be a key factor in facilitating the Proteus effect (H2.3), which supports previous work highlighting the importance of embodiment for the Proteus effect [5,22,36,42,45].

LIMITATIONS AND FUTURE WORK
The evaluation and interpretation of our results are strongly based on the validated VEQ embodiment questionnaire and its factors, especially on the sub-scale of VBO. We did not capture self-location as a component of SoE [23], preventing us from analyzing its moderating influence on the Proteus effect. Recently published alternative embodiment measures were not yet available at the time of data collection [13] or did not consider a discussion of their components compared to those of the VEQ, which in turn hinders the assessment of its benefits [37].
While current theoretical models concerning plausibility increasingly refer to the broader realm of XR systems [29,52,54,64], our findings are limited to the application of fully immersive VR systems. Different display types possibly lead to different degrees of incongruence between the renderings of an avatar and the presentation of the (virtual) environment [69]. We advocate future work to extend our results in lower immersive XR systems, such as the video see-through or optical see-through AR system introduced by Wolf et al. [67,68].
In our experiment, exercises were introduced by audio instructions, which may have led to ambiguities in the execution of exercises and, therefore, may have biased behavioral measures, especially for the arm lifting exercise. For future work, we recommend showing and explaining the exercises before the experiment and performing baseline measurements to keep unintended variance between subjects low. We further recommend considering the participants' qualitative statements to get insights into how they felt about the exercises and their behavior.
We manipulated the athleticism of avatars and VEs using sports and business attributes for attire and interior. Participants performed lightweight exercises to elicit the Proteus effect, reflecting physical activity as a behavioral measure. While it might be likewise interesting to have participants perform typical office tasks congruent with business attributes, we have no theory-based hypotheses that would suggest a different or opposite effect when mirroring the study's design with business tasks. Therefore, this work focused exclusively on sports tasks, reducing the study design's complexity. We suggest future work to replicate our findings with a business-related task.
In this work, we did not find evidence for the (in)congruence between avatar and environment to impact the SoE and spatial presence. However, given a between-subjects design and a sample size of N = 59, we cannot rule out a type 2 error [14], although the overall small effect sizes (ω 2 < 0.05) do not point in this direction. In general, the sample size was limited by the availability of participants, seemingly lowered by COVID-19 health and safety regulations that were in place at the time of data collection. Furthermore, the strict application of pre-specified exclusion criteria (N = 13) further decreased the number of valid data sets for statistical evaluation.
There are many other (in)congruences conceivable to manipulate a virtual scenario with avatars at sensory, perceptual, or cognitive levels, e.g., by altering the self- and external perception of avatars [27]. We suggest future work to deepen the knowledge of (in)congruences and their impact on VR experiences and effects.

CONCLUSION AND CONTRIBUTION
The Proteus effect brings great potential for multiple relevant use cases. However, understanding its underlying mechanisms is crucial for effective and safe use in VR. This article extends the existing knowledge by considering the relationship (congruence) between the self-embodiment (avatar) and the virtual environment. The present study investigated the impact of the subjective avatar plausibility and the VR-related qualia SoE and spatial presence on the Proteus effect. We showed that the top-down manipulation of avatar-environment congruence affects the avatar's plausibility but not spatial presence or SoE. In our study, we further identified the individual's feeling of body ownership as a key driver of the Proteus effect providing further evidence for the known importance of embodiment in this phenomenon.
However, avatar-environment congruence, spatial presence, and self-reported avatar plausibility did not moderate the Proteus effect. In summary, we have indicated that not only the avatar's properties but also their synergy with the SoE determine the Proteus effect. We showed that the avatar-environment congruence affected the avatar's plausibility but did not impact or hinder the Proteus effect. We discussed the results assuming current theories of bottom-up and top-down determinants of the Proteus effect and thus contribute to understanding its underlying mechanisms and determinants.