Location-Dependent Augmented Reality Services in Wireless Edge-Enabled Metaverse Systems

Metaverse is envisioned as the next-generation paradigm that links a virtual world to the physical world, providing immersive experience for human activities in the hypothetical environment with augmented reality (AR) and virtual reality (VR) technology. Recent development of wireless multiaccess edge computing (MEC) has accelerated the realization of this vision. In this paper, we consider location-dependent AR services in MEC-enabled Metaverse systems. In such a system, each user performs simultaneous localization and communication first to estimate its position and request AR contents from the MEC server. Due to limited computation and communication resources, the MEC server adaptively adjusts the resolution of AR contents. However, two challenges arise in such a system. First, the localization error will degrade the user’s quality-of-experience (QoE); and second, the coupling of communication and computation complicates the resource management. To address these issues, we first model the QoE taking the localization errors into consideration, and then propose a minimum QoE maximization scheme to achieve fairness among users by jointly optimizing the waveform at users, the transmit power at the base station (BS), and the resolution of transmitted videos. Simulation results verify the effectiveness of the proposed scheme in comparison with benchmark schemes.


I. INTRODUCTION
METAVERSE is an emerging concept that describes how a hypothetical world is connected to the real world, and it is considered a potentially disruptive technology for the next-generation Internet [1], [2]. Facebook was rebranded as "Meta" specifically to demonstrate its dedication to developing Metaverse technology. Due to recent changes in social interaction, users of the Metaverse expect to enjoy immersive experiences in the virtual world provided by augmented reality (AR) and virtual reality (VR) services [3]. First, since online meetings and instruction are becoming more common as a result of the pandemic, it will be crucial to offer immersive interactions in the digital world. Second, users may become "super heroes" by using AR glasses to connect all relevant information with the physical world and react to events promptly [4]. For example, when meeting a new person at a conference, a user can see that person's LinkedIn profile and learn their background immediately.
Given the aforementioned use cases, it is conceivable that the Metaverse will be an essential component in our daily lives soon.
On the other hand, the recent development of wireless communications and multi-access edge computing (MEC) technology [5], which allows users to access the virtual environment whenever and wherever they are, has become a key enabler for the growth of the Metaverse [6]. Particularly, the Third Generation Partnership Project (3GPP) has begun standardizing the AR/VR support in New Radio (NR). A Release 17 study on NR performance for AR/VR services was undertaken [7], and the work will continue in Release 18 (the first 5G-Advanced release), with the goal of improving the performance of AR/VR services over the NR. Moreover, the wireless MEC-empowered Metaverse can offer seamless immersive experiences with lower latency, due to the faster response of the MEC than cloud computing [8].
In the literature, a few studies have addressed the wireless MEC-enabled Metaverse [9], [10] or AR applications [11], [12], [13], [14]. In [9], an advertising strategy was studied for VR content transmission in the wireless MEC-enabled Metaverse from the perspective of preserving privacy. In [10], a learning-based incentive mechanism was proposed for trading VR services between users and providers in the wireless MEC-enabled Metaverse. These two works considered VR services without physical location information, and thus cannot be applied to the AR services considered in this paper. In [11], the authors proposed to insert an edge layer between the user and cloud layers to deliver a better QoE for AR services. In [12], an energy-efficient resource allocation approach over both communication and computation resources was proposed for MEC-enabled AR applications. The authors in [13] investigated a server assignment and resolution selection problem to achieve a latency-accuracy trade-off. The work in [14] studied caching of popular content and performing computation at the MEC server. However, these methods typically assume that the location estimates for users are perfect and neglect the impact of localization errors on the QoE, which motivates this paper.
In this paper, we consider location-dependent AR services in wireless MEC-assisted Metaverse systems, as the virtual AR contents should be consistent with the physical environment.¹ In the Metaverse system, two phases are performed to transmit AR contents: uplink and downlink. In the uplink phase, each user first performs simultaneous localization and communication to estimate its position and request AR contents from the MEC server according to its estimated position. However, the data volume of the entire set of virtual objects might be too large to be transmitted within one time slot. In the downlink phase, the MEC server needs to select part of the AR content and adjust its resolution for transmission based on the available computing and communication resources, aiming to maximize the user's quality-of-experience (QoE). However, two challenges arise in managing the resources of the Metaverse system. First, localization estimation errors are inevitable and might degrade the QoE of the user. For example, a virtual dolphin should be placed in the pool, but location errors might place it on land. Therefore, it is important to consider the localization errors in the resolution adjustment. Second, compressing more data to save communication resources requires more computation resources. This coupling of communication and computation makes the resource management nontrivial.
¹ The location information should be considered in AR services because the provided AR contents need to be consistent with the physical world. For example, presenting a shark on a beach will confuse the Metaverse user and thus degrade the QoE of the Metaverse. Therefore, accurate localization helps maintain this consistency.
To address these challenges, we first model the QoE by taking location estimation errors into consideration, which are modeled with a known distribution within a given range. The distribution determines the probability of the field of view (FoV) of each user for the AR video. Next, we formulate a minimum QoE maximization problem for fairness among users by optimizing the waveform in the uplink phase, and the transmit power at the base station (BS) as well as the resolution of transmitted videos. To tackle this problem, we decouple the problem into two subproblems: the uplink and downlink subproblems, and propose a waveform design algorithm for the uplink subproblem and a joint resolution and transmit power optimization algorithm for the downlink subproblem.
Our contributions can be summarized as follows:
• We study a wireless MEC-enabled Metaverse system for location-dependent AR services with multiple users. Limited by the communication and computing resources owned by the system, we formulate a minimum QoE maximization problem by properly scheduling these resources for fairness among users, where the physical locations of users are considered in the QoE model.
• To address this problem, we decouple the problem into two subproblems: uplink and downlink subproblems. We propose a waveform design algorithm for the uplink subproblem and a joint resolution and transmit power optimization algorithm for the downlink subproblem.
• We perform extensive simulations for validation.
Simulation results show that the proposed scheme outperforms the benchmark schemes in terms of QoE. They also show that the estimation error needs to be taken into account in data transmission, which effectively improves the system QoE in the Metaverse.
The rest of this paper is organized as follows. In Section II, we introduce the system model for the wireless MEC-enabled Metaverse system. In Section III, we first introduce the QoE model to quantify the performance of the Metaverse system, and then formulate a minimum QoE maximization problem. Our proposed algorithm is presented in Section IV. Simulation results are shown in Section V, and finally conclusions are drawn in Section VI.
Notation: Boldface lower and upper case symbols denote vectors and matrices, respectively. $\mathbb{C}^{M \times N}$ denotes a complex matrix with dimension $M \times N$. Conjugate transpose is denoted by $(\cdot)^H$, transpose is denoted by $(\cdot)^T$, and the trace of matrix $\mathbf{A}$ is denoted by $\mathrm{Tr}(\mathbf{A})$. Expectation operator is denoted by

II. SYSTEM MODEL
In Section II-A, we present the scenario of a wireless MEC-enabled Metaverse system for location-dependent AR services, and introduce its uplink and downlink phases. In Section II-B, we describe the simultaneous localization and communication models in the uplink phase. In Section II-C, we formulate the communication and computation models in the downlink phase.

A. SCENARIO DESCRIPTION
We study a wireless MEC-enabled Metaverse system for location-dependent AR services, as depicted in Fig. 1. In this system, there exists a BS equipped with $M$ antennas, denoted by $\mathcal{M} = \{1, \ldots, M\}$, and an MEC server is installed at the BS to provide AR services for Metaverse users. $K$ Metaverse users, denoted by $\mathcal{K} = \{1, \ldots, K\}$, run AR applications and periodically request videos from the BS. Suppose that each user's AR device has $N$ antennas. The system owns $K$ channels, each with bandwidth $W$. To avoid mutual interference, each channel is used to serve one user.
The system is assumed to be well synchronized. To be specific, the timeline is divided into time slots, and each time slot consists of the following two phases:
• Uplink phase: In this phase, the user performs simultaneous localization and transmission. Specifically, the AR device transmits wireless signals to update its own position and orientation. At the same time, it also transmits a video update request according to the estimated results.
• Downlink phase: After the uplink phase, the MEC server processes the video, and then the BS transmits the processed video back to the user. The MEC server and the BS need to take the computation and communication resources into consideration when processing these videos.
It is important to note that the virtual objects remain unchanged within the considered time window. In other words, the total amount of data to be transmitted within the time window is fixed. However, for certain AR applications, the total data volume or the number of users can be large [15]. Therefore, we consider an adaptive data transmission approach for AR data, where the data cannot be fully transmitted within a time slot and only a portion of the data is transmitted. Moreover, the data transmitted in previous time slots is cached in the headset and does not need to be retransmitted in the following time slots. Details of these two phases are presented in the following two subsections. Note that these two phases are for each time slot $t$, and we omit the index $t$ in the following unless mentioned otherwise.

B. UPLINK PHASE
In the uplink phase, each user will send wireless signals for simultaneous communications and localization, as elaborated below.

1) COMMUNICATION
To guarantee that the BS can receive the video request from each user, the received signal-to-noise ratio (SNR) should be larger than a predefined threshold. To be specific, denote the channel between the BS and user $k$ as $\mathbf{H}_k \in \mathbb{C}^{N \times M}$ and the transmitted signals from user $k$ as $\mathbf{x}_k \in \mathbb{C}^{(A+1) \times 1}$, where $A$ is the number of single-antenna anchor nodes for localization. We also define the beamformer for user $k$ as $\mathbf{W}_k \in \mathbb{C}^{N \times (A+1)}$. Therefore, the received signal at the BS from user $k$ can be expressed as [16] where $\mathbf{n}_k$ is the additive Gaussian noise with zero mean and variance $\sigma_c^2$. Let $|\mathbf{x}_k|^2 = 1$, and the SNR constraint can be expressed as where $\eta_k$ is the SNR threshold for AR user $k$, which is specific to the AR service provided. Moreover, subject to the power budget for each user, the beamformer should satisfy where $P_{\max}$ is the power budget for each user.
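A form of the uplink model that is consistent with the dimensions stated above can be sketched as follows. This is an assumption based on a standard multi-antenna uplink, not necessarily the exact expressions of [16]: the use of $\mathbf{H}_k^H$ (rather than $\mathbf{H}_k^T$), the symbol $\mathbf{y}_k$, and the definition of the SNR as received signal power over $\sigma_c^2$ are illustrative choices.

```latex
\begin{align*}
\mathbf{y}_k &= \mathbf{H}_k^{H}\,\mathbf{W}_k\,\mathbf{x}_k + \mathbf{n}_k
  && \text{($M \times N$ times $N \times (A+1)$ times $(A+1) \times 1$ gives $M \times 1$)} \\
\frac{\lVert \mathbf{H}_k^{H}\mathbf{W}_k\mathbf{x}_k \rVert^{2}}{\sigma_c^{2}} &\ge \eta_k
  && \text{(SNR threshold for the video request)} \\
\mathrm{Tr}\!\left(\mathbf{W}_k\mathbf{W}_k^{H}\right) &\le P_{\max}
  && \text{(per-user power budget)}
\end{align*}
```

The dimension check in the first line is the main point: with $\mathbf{H}_k \in \mathbb{C}^{N \times M}$, some transpose of the channel is needed for the product to yield one sample per BS antenna.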

2) LOCALIZATION
At the same time, each user also generates beams for localization estimation [17]. Let the channel between user $k$ and anchor node $i$ be $\mathbf{h}_{ik} \in \mathbb{C}^{1 \times N}$, and thus, the received signal can be expressed as where $n_{ik}$ is also additive Gaussian noise with zero mean and power $\sigma^2$. The SNR at each anchor node can be expressed as Each user can gather the signals received at these anchor nodes and estimate its location from these signals. However, the estimation will not be perfect and will inevitably contain errors. To be specific, the location estimation error can be expressed as where $\hat{\mathbf{l}}_k$ is the estimated location and $\mathbf{l}_k$ is the ground truth. For any unbiased estimator of the user's position, the estimation error is bounded by the Cramér-Rao bound [18].
To be specific, the estimation error can be approximated as where $\mathbf{J}_k$ is the Fisher information matrix (FIM). According to the results in [19], the FIM can be defined as where $c$ is the propagation speed, and $\mathbf{q}_{ik} = [\cos\phi_{ik}, \sin\phi_{ik}]^T$ with $\phi_{ik}$ being the angle between user $k$ and anchor $i$, which is a constant obtained by estimating the angle of arrival (AoA). For orientation, we assume that the headset is equipped with an accelerometer, a gyroscope, and a magnetometer [20]; the orientation obtained through these sensors is assumed to be highly accurate.
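The role of the anchor geometry in the Cramér-Rao bound can be illustrated with a small sketch. It assumes a 2D FIM of the form $\mathbf{J} = \sum_i w_i \mathbf{q}_i \mathbf{q}_i^T$ and takes $\mathrm{Tr}(\mathbf{J}^{-1})$ as the error bound; the per-anchor weights `weights` are hypothetical placeholders for the SNR- and bandwidth-dependent factors of the FIM in [19], which are not reproduced here.

```python
import math


def fisher_information(angles, weights):
    """Entries (j11, j12, j22) of the symmetric 2x2 matrix
    J = sum_i w_i * q_i q_i^T, with q_i = [cos(phi_i), sin(phi_i)]^T."""
    j11 = j12 = j22 = 0.0
    for phi, w in zip(angles, weights):
        c, s = math.cos(phi), math.sin(phi)
        j11 += w * c * c
        j12 += w * c * s
        j22 += w * s * s
    return j11, j12, j22


def crb_trace(angles, weights):
    """Tr(J^{-1}) for the 2x2 FIM; for a symmetric 2x2 matrix this equals
    (j11 + j22) / det(J)."""
    j11, j12, j22 = fisher_information(angles, weights)
    det = j11 * j22 - j12 * j12
    if det <= 1e-12:
        # All anchors lie on one line through the user: position is unobservable.
        raise ValueError("FIM is singular for this anchor geometry")
    return (j11 + j22) / det


# Three anchors spread evenly around the user, unit weights (hypothetical).
even_angles = [0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0]
bound = crb_trace(even_angles, [1.0, 1.0, 1.0])
```

With evenly spread anchors the FIM is a scaled identity, and doubling every weight (for example through a higher anchor SNR) halves the bound, which is the mechanism the waveform design exploits.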

C. DOWNLINK PHASE
In the downlink phase, the BS needs to transmit the data of virtual objects to users. However, limited by the communication resources, the BS might not be able to transmit all the required data to users within one time slot. Instead, the BS only sends the data that might be within the FoV of the user at the current position, and thus the transmitted data changes according to the movement (including position and orientation) of the user, as shown in Fig. 2. On the other hand, even though the BS only transmits the data of the FoV, the bandwidth might not be sufficient for the highest resolution. Therefore, the BS transmits a portion of the video according to the available communication resources by adjusting the resolution of the video. As different parts of a video contribute differently to the QoE, their corresponding resolutions might vary. In the following, we first explain how the system adjusts the resolution of the video.
As shown in Fig. 3, the video for each user is partitioned into $I$ 3D tiles, and only part of these tiles fall inside the user's FoV. Here, we define $\mathcal{I}_k = \{1, \ldots, I_k\}$ as the set of tiles for user $k$, where $I_k$ is the total number of tiles. However, at a certain position and orientation, as different tiles contribute differently to the user's QoE, the resolution for different tiles should vary. For user $k$, define the resolution for the $i$-th tile as $\beta_i^k \in (0, B]$, where $B$ is the highest resolution that a tile can have [21]. Therefore, the resolution for each tile should be adjusted according to the available computation and communication resources. It is worth pointing out that the highest resolution level for each tile varies across time slots. To be specific, the transmitted video is cached in the headset, and thus these data do not need to be retransmitted. Based on these facts, we elaborate on the computation and communication models in the following.

1) COMMUNICATION MODEL
The volume of transmitted data for each tile is proportional to the resolution of this tile [22]. Define $\kappa$ as the coefficient, and thus, the data traffic to be transmitted to user $k$ can be expressed as
$$D_k = \kappa \sum_{i \in \mathcal{I}_k} \beta_i^k. \tag{9}$$
Let $p_m^k$ be the transmit power of antenna $m \in \mathcal{M}$ for user $k$, and $\mathbf{p}^k = [p_1^k, \ldots, p_M^k]^T$ be the transmit power vector at the BS for user $k$. The transmission rate between the BS and user $k$ can be expressed as where $\sigma^2$ is the power of the additive white Gaussian noise. To guarantee that all the data can be transmitted to the user, the data rate should meet the following constraint: Moreover, the transmit power of each antenna over all the channels cannot exceed its energy budget, i.e.,
$$\sum_{m \in \mathcal{M}} \sum_{k \in \mathcal{K}} p_m^k \le P, \tag{12}$$
where $P$ is the energy budget at the BS.
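The communication model can be exercised with a minimal sketch. The data volume follows the proportionality stated above; the rate function uses the Shannon formula $W \log_2(1 + \mathrm{SNR})$ as a stand-in assumption, since the exact rate expression of the paper is not reproduced here, and all function and parameter names are hypothetical.

```python
import math


def data_volume(resolutions, kappa):
    """Data traffic for one user: kappa times the sum of tile resolutions."""
    return kappa * sum(resolutions)


def shannon_rate(bandwidth_hz, rx_power_w, noise_power_w):
    """Assumed rate model in bits/s: W * log2(1 + SNR)."""
    return bandwidth_hz * math.log2(1.0 + rx_power_w / noise_power_w)


def can_deliver(resolutions, kappa, rate_bps, slot_s=1.0):
    """True if the data volume fits into one time slot at the given rate."""
    return rate_bps * slot_s >= data_volume(resolutions, kappa)


# Hypothetical numbers: three tiles, 1 MHz channel, SNR of 3 (linear).
volume = data_volume([1.0, 2.0, 3.0], kappa=2.0)
rate = shannon_rate(1e6, rx_power_w=3.0, noise_power_w=1.0)
```

Lowering any tile resolution reduces the data volume linearly, which is why resolution adjustment can trade QoE against the available rate.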

2) COMPUTATION MODEL
To adjust the video resolution, the edge server needs to utilize its central processing units (CPUs). Let the number of overall CPU cycles available to process the videos within a time slot be $V$. According to the video model shown in Fig. 3, for user $k$, the original data volume is $\kappa I_k B$ while the transmitted volume is $D_k$ given in (9). Therefore, the compressed data volume is
$$\kappa I_k B - D_k. \tag{13}$$
It should be noted that the maximum resolution $B$ varies across different slots. When more data is cached in the headset, the required maximum resolution decreases. The required number of CPU cycles is linear with respect to the compressed data volume [23]. Let $\nu$ be the number of data bits that can be processed in a CPU cycle; the required number of CPU cycles for user $k$ can then be expressed as
$$\frac{\kappa I_k B - D_k}{\nu}. \tag{14}$$
Therefore, to ensure that all the computation tasks can be completed on time, we have
$$\sum_{k \in \mathcal{K}} \frac{\kappa I_k B - D_k}{\nu} \le V. \tag{15}$$
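The computation model above can be checked numerically with a short sketch. It takes the compressed volume as the original volume $\kappa I_k B$ minus the transmitted volume $D_k$ and divides by the bits processed per cycle; the function names and the example numbers are hypothetical.

```python
def compressed_volume(num_tiles, max_res, kappa, transmitted):
    """Original volume kappa * I_k * B minus the transmitted volume D_k."""
    return kappa * num_tiles * max_res - transmitted


def cpu_cycles(compressed_bits, bits_per_cycle):
    """Cycles needed when the workload is linear in the compressed volume."""
    return compressed_bits / bits_per_cycle


def computation_feasible(users, kappa, max_res, bits_per_cycle, cycle_budget):
    """users is a list of (num_tiles, transmitted_volume) pairs.
    Returns True if the total cycle demand fits into the budget V."""
    total = sum(
        cpu_cycles(compressed_volume(n, max_res, kappa, d), bits_per_cycle)
        for n, d in users
    )
    return total <= cycle_budget


# Two hypothetical users with 24 tiles each; 1/2000 bit per cycle
# corresponds to 2,000 cycles per bit.
users = [(24, 1e6), (24, 2e6)]
ok = computation_feasible(users, kappa=2.0, max_res=1e5,
                          bits_per_cycle=1.0 / 2000.0, cycle_budget=6e10)
```

Transmitting more data (a larger $D_k$) reduces the compression workload, which is the coupling between communication and computation noted in the introduction.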

III. PROBLEM FORMULATION
In Section III-A, we first elaborate on how the estimation error influences the QoE. In Section III-B, we introduce the QoE model to quantify the performance of the Metaverse system. Finally, we formulate the minimum QoE maximization problem in Section III-C.

A. IMPACT OF ESTIMATION ERROR
As introduced in Section II-B2, the localization estimation cannot be perfect and introduces errors. As a result, the estimation error affects the probability that each tile falls within the FoV. In the following, we show in detail how the estimation error influences this probability.
In this paper, we assume that the FoV consists of $b_w \times b_h$ tiles. However, as shown in Fig. 4, the estimation error of the position causes a shift of the FoV, where the yellow and blue ones indicate different tiles in spherical and planar representations. Therefore, to quantify the impact of the estimation error, we use the probability that each tile falls in the FoV. Intuitively speaking, if the estimation error is smaller, the tiles within the FoV associated with the estimated position have a higher probability of being shown in the FoV.
We assume that the ground-truth position is within the error range centered at the estimated position [24]. To be specific, we divide the range of location errors into several discrete levels so that the corresponding FoV moves left (right) or up (down), as illustrated by positions 1 and 2 in Fig. 4. Moreover, the orientation, including the azimuth and elevation angles, also influences the shift of the FoV, as discussed below.
Without loss of generality, the position error range is divided into $L$ levels with a fixed gap, and the set of levels is denoted by $\mathcal{L}$. We define the azimuth and elevation angles for user $k$ as $\theta_k$ and $\phi_k$, respectively. We also consider that the position follows a certain known distribution within the error range, denoted by $f_p$. For tile $i$, we use $x_i^l$ to indicate whether this tile is within FoV $l$² under azimuth angle $\theta_k$ and elevation angle $\phi_k$, where
$$x_i^l = \begin{cases} 1, & \text{if tile } i \text{ is within FoV } l, \\ 0, & \text{otherwise}. \end{cases} \tag{16}$$
Therefore, the probability that tile $i$ falls in the FoV can be expressed as
$$f_i^k = \sum_{l \in \mathcal{L}} f_p^l \, x_i^l, \tag{17}$$
where $f_p^l$ is the probability that the $l$-th location is the ground truth.
² This indicates that the FoV is associated with the $l$-th position in $\mathcal{L}$.
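The tile probability described above, a sum over candidate positions of the position probability times the tile indicator, can be sketched as follows. The one-dimensional strip of tiles and the three error levels in the example are hypothetical and only serve to show the computation.

```python
def tile_probabilities(num_tiles, fov_tiles_per_level, level_probs):
    """For each tile i, sum the probability of every candidate position l
    whose FoV contains tile i (the indicator x_i^l equals 1)."""
    probs = [0.0] * num_tiles
    for tiles, p in zip(fov_tiles_per_level, level_probs):
        for i in set(tiles):  # set() guards against duplicate tile indices
            probs[i] += p
    return probs


# Hypothetical example: 6 tiles in a row, FoV covers 3 tiles, and the
# position error shifts the FoV by -1, 0, or +1 tile with equal probability.
fovs = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
probs = tile_probabilities(6, fovs, [1.0 / 3.0] * 3)
```

In this example the central tile is covered by every candidate FoV and therefore has probability one, while tiles near the edge of the shifted FoVs have lower probabilities; a wider error range spreads the probability over more tiles.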
Remark 1: The estimation range is a circular plane, and the FoV shift caused by the estimation error of the position is horizontal, as shown in Fig. 4.
Remark 2: It should be noted that the number of tiles $I_k$ is highly related to the location estimation error. In general, the higher the estimation error, the more tiles need to be transmitted, leading to degradation of resolution.

B. QOE MODEL
According to the previous discussions, different tiles contribute to the total QoE differently since their probabilities of being presented in the FoV vary. Before presenting the optimization problem, we first need to model the QoE contributed by each tile.
According to the results in [25], the QoE contributed by tile $i$ for user $k$ can be expressed as Here, $\omega_i^k$ is the quality weight for the tile, which can be obtained from historical data [33]. In other words, this parameter indicates the importance of tile $i$ in the video for user $k$. $\bar{\beta}_i^k$ is the data of tile $i$ that has been stored in the headset, and $f_i^k$ is the probability that tile $i$ falls within the FoV of user $k$, which is calculated by (17).
According to the definition in (18), the QoE for videos with different resolutions will be different, leading to unfairness among users. To reduce the QoE variation caused by the resolution, we define the QoE for user $k$ as where $\bar{Q}$ is the constant representing the maximum QoE that the system can achieve and $s$ is a scaling factor.

C. PROBLEM STATEMENT
As communication and computation resources are limited, it is not possible to guarantee the QoE for all the users. For fairness among these users, we aim to maximize the minimum QoE among all the users, which aligns the QoE of different users under limited resource constraints [26]. Mathematically, the problem can be written as Problem (20) is a max-min problem, which is not easy to deal with. To address this issue, we first introduce an auxiliary variable $\eta$, and replace the objective with the maximization of $\eta$. To guarantee that $\eta$ is the minimum of $Q_k$ among $\mathcal{K}$, we need to introduce the following constraint:
$$Q_k \ge \eta, \quad \forall k \in \mathcal{K}. \tag{21}$$
Moreover, according to Remark 2, the uplink and downlink phases are coupled, which makes the problem hard to solve. To effectively solve the original problem, we decouple it into two subproblems: uplink and downlink subproblems. In this way, the estimation errors are obtained in the uplink phase, and thus, the number of tiles to be transmitted is fixed in the downlink phase, making the problem easier to tackle.
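The reformulation described above is the standard epigraph form of a max-min problem and can be sketched as follows. The optimization variables and the list of constraints are assembled from the surrounding text and should be read as an assumption about the structure of (20), not as its exact statement.

```latex
\begin{align*}
&\max_{\{\mathbf{W}_k\},\,\{\mathbf{p}^k\},\,\{\beta_i^k\}} \;\; \min_{k \in \mathcal{K}} \; Q_k
&& \text{s.t. } (2),\ (3),\ (11),\ (12),\ (15) \\[4pt]
\Longleftrightarrow\quad
&\max_{\{\mathbf{W}_k\},\,\{\mathbf{p}^k\},\,\{\beta_i^k\},\,\eta} \;\; \eta
&& \text{s.t. } Q_k \ge \eta \;\; \forall k \in \mathcal{K},\ (2),\ (3),\ (11),\ (12),\ (15)
\end{align*}
```

The two forms are equivalent because, at an optimum of the second problem, $\eta$ must equal $\min_k Q_k$: if it were strictly smaller, $\eta$ could be increased without violating any constraint.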
1) Uplink Subproblem: Before introducing the uplink subproblem, we would like to give a proposition to show how the QoE changes with a larger estimation error range.
Proposition 1: Under a scaled distribution, i.e., the mean and variance of the distribution are multiplied by the same scaling factor by which the range of the estimation error is increased, the QoE will decrease with a larger estimation error range.
Proof: Please see Appendix A.
Based on this proposition, in the uplink phase, the subproblem can be transformed into the minimization of the estimation error range. Mathematically, as each Metaverse user optimizes its waveform independently, the optimization problem can be written as
$$\min_{\mathbf{W}_k} \; e_k \quad \text{s.t. (2) and (3)}. \tag{22}$$
2) Downlink Subproblem: In this subproblem, we can obtain the localization error range according to the waveform $\{\mathbf{W}_k\}$ optimized in the uplink phase. After that, the BS needs to determine the transmit power for each antenna and the resolution for each tile. Mathematically, the optimization problem can be written as
$$\max_{\{\mathbf{p}^k\},\,\{\beta_i^k\},\,\eta} \; \eta \quad \text{s.t. (11), (12), (15), and (21)}. \tag{23}$$
It is worth pointing out that these two subproblems are not solved iteratively. Instead, they are solved one after the other within a time slot. This is because a joint optimization would require frequent signaling between the Metaverse users and the BS, as the uplink phase is performed by the Metaverse user while the downlink phase is executed by the MEC server and the BS.

IV. ALGORITHM DESIGN
In this section, we elaborate on how to solve the aforementioned two subproblems in the following two subsections, respectively.

A. WAVEFORM DESIGN FOR THE UPLINK SUBPROBLEM
In this subproblem, we aim to minimize the estimation error under the SNR constraint and power budget. It is easy to check that the constraints are convex. Therefore, we elaborate on how to handle the objective $e_k$. According to [27], such a problem can be converted into a semi-definite program (SDP). To be specific, we first introduce an auxiliary matrix $\mathbf{Z}_k$ and introduce a new constraint as: Since $\mathbf{J}_k$ is a positive semi-definite matrix, due to the property of the Schur complement, the inequality is equivalent to Therefore, the problem for user $k$ can be rewritten as Such a problem is an SDP, which can be solved by existing convex optimization techniques [28].
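The Schur complement step can be sketched as follows, under the assumption that the objective is the trace of the inverse FIM, $e_k = \mathrm{Tr}(\mathbf{J}_k^{-1})$, as suggested by the Cramér-Rao bound; the exact constraints of the paper may differ in form.

```latex
\begin{align*}
\mathbf{Z}_k \succeq \mathbf{J}_k^{-1}
\quad &\Longleftrightarrow \quad
\begin{bmatrix} \mathbf{Z}_k & \mathbf{I} \\ \mathbf{I} & \mathbf{J}_k \end{bmatrix} \succeq \mathbf{0}
\qquad \text{(for } \mathbf{J}_k \succ \mathbf{0}\text{)}, \\[4pt]
\min_{\mathbf{W}_k,\,\mathbf{Z}_k} \; \mathrm{Tr}(\mathbf{Z}_k)
\quad &\text{s.t. the matrix inequality above, (2), and (3)}.
\end{align*}
```

Since $\mathbf{Z}_k \succeq \mathbf{J}_k^{-1}$ implies $\mathrm{Tr}(\mathbf{Z}_k) \ge \mathrm{Tr}(\mathbf{J}_k^{-1})$, with equality at $\mathbf{Z}_k = \mathbf{J}_k^{-1}$, minimizing $\mathrm{Tr}(\mathbf{Z}_k)$ minimizes the original objective while the block matrix constraint is linear in $\mathbf{Z}_k$ and $\mathbf{J}_k$.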

Remark 3: The computational complexity of solving this subproblem is $O(KN^6)$ according to [29]. This shows that the subproblem can be solved in polynomial time.

B. JOINT RESOLUTION AND TRANSMIT POWER OPTIMIZATION FOR THE DOWNLINK SUBPROBLEM
In this subproblem, we use the waveform obtained in the uplink subproblem to calculate the estimation error range. In other words, the probability $f_i^k$ that each tile falls within the FoV, as defined in Section III-A, is constant in this subproblem.
As the constraints define a convex set, the problem can be solved by existing convex optimization techniques. In the following, we introduce the algorithm with more details.
Let $\lambda_1^k$, $\lambda_2^m$, $\lambda_3$, and $\lambda_4^k$ be the Lagrangian multipliers corresponding to (11), (12), (15), and (21), respectively. Therefore, the Lagrangian can be expressed as in (27), and the Lagrangian dual problem can be written as We can solve the dual problem iteratively by decomposing it into master and slave subproblems as follows:
a) Slave subproblem: According to the Karush-Kuhn-Tucker (KKT) conditions, we can obtain the optimal solution by setting the first partial derivative of the Lagrangian function $U$ with respect to each variable to 0. Specifically, we have We can see that $p_m^k$ and $\beta_i^k$ can be obtained by solving the linear programs defined in (29) and (30), respectively. Equation (31) gives a condition on the Lagrangian multipliers. The auxiliary variable $\eta$ can be set to the minimum $Q_k$ among all $k \in \mathcal{K}$.
b) Master subproblem: Once the results of the slave subproblem are obtained, the solution of the dual problem can be obtained by a subgradient method [30]. The update rule is given as where $t$ is the iteration index, $[x]^+ = \max\{0, x\}$, $w_k^{t+1}$ is a weighting factor to satisfy (31), and $\delta_1$, $\delta_2$, $\delta_3$, and $\delta_4$ are the step sizes that guarantee convergence.
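The master/slave structure can be illustrated on a deliberately simple toy problem rather than on the paper's Lagrangian (27): maximize $\sum_i \log(1 + g_i p_i)$ subject to $\sum_i p_i \le P$. The slave step solves for the powers in closed form for a fixed multiplier, and the master step applies a projected subgradient update. The gains, step size, and iteration count are hypothetical.

```python
def slave_step(gains, lam):
    """For a fixed multiplier lam, maximize sum(log(1 + g*p)) - lam*sum(p).
    Setting the derivative g / (1 + g*p) - lam to zero gives
    p = 1/lam - 1/g, clipped at zero."""
    return [max(0.0, 1.0 / lam - 1.0 / g) for g in gains]


def dual_subgradient(gains, power_budget, step=0.05, iterations=2000):
    """Alternate slave and master steps; returns (powers, multiplier)."""
    lam = 1.0
    for _ in range(iterations):
        powers = slave_step(gains, lam)
        # Master step: move lam along the constraint violation and project.
        # A tiny positive floor replaces [x]^+ = max(0, x) so that the
        # division by lam in the slave step stays well defined.
        lam = max(1e-9, lam + step * (sum(powers) - power_budget))
    return slave_step(gains, lam), lam


# Hypothetical example: two identical channels sharing a budget of 2.
powers, lam = dual_subgradient([1.0, 1.0], power_budget=2.0)
```

The multiplier rises when the budget is exceeded and falls when power is left unused, so the iteration settles where the constraint is tight; the updates of the multipliers in the master subproblem follow the same pattern, one update per constraint.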
Remark 4: Subproblem (23) has a non-empty feasible set only when where $\sum_{k \in \mathcal{K}} R_k^*$ is the maximum sum-rate under the power budget constraint in (12).
Proof: Please see Appendix B.
Remark 4 provides the conditions under which both the computation and communication requirements are satisfied. This provides guidelines on how to deploy communication and computation resources, which is important for practical system design.
Remark 5: The number of iterations required for the gradient descent algorithm to converge is $O(\log(1/\pi))$, where $\pi$ is the error tolerance level [28]. The complexity of each iteration is linear in the number of users, i.e., $O(K)$.

V. SIMULATION RESULTS
In this section, we evaluate the performance of the proposed scheme. We first present our simulation settings, and then discuss the evaluation results.

A. SIMULATION SETTING
The simulation setting is presented in Fig. 5, where a virtual object is placed in the center of the room, and $K = 5$ users walk around the virtual object along a circular trajectory with a radius of 10 m, as suggested in [15]. $A = 3$ single-antenna anchor points for localization are randomly located in a circle with a radius of 50 m. The distance between the virtual object and the BS is set to 150 m, and the height of the BS is 10 m. According to [31], we assume that the possible positions are uniformly distributed within the error range.
For video transmission, the videos are segmented into chunks of one second in length, i.e., the duration of each time slot is one second [32], and we consider a total duration of $T = 10$ s. The video is encoded by FFMPEG with X.264 and its best quality is 4K,³ corresponding to a bitrate of 35 Mbps. In other words, $B = 3.686 \times 10^5$ and $\kappa = 2.086$. The FoV is divided into 4 rows and 6 columns, and the tile weights are given in Table 1 [33]. The maximum QoE constant is $\bar{Q} = 30$ and the scaling factor is $s = 10$.
We assume that the AR device is equipped with 4 antennas, and the BS is equipped with 16 antennas. The channel is Rayleigh faded and the channel gain is modeled as given in [34], where the decay factor is 3.76 and the power gain factor is −17.7 dB. The CPU capacity of the edge server is $V = 6 \times 10^{10}$ cycles/second and the required number of CPU cycles for the compression task is $\mu = 2{,}000$ cycles/bit [35]. Other simulation settings are listed in Table 2 [36].
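Two of the stated parameters can be cross-checked with simple arithmetic. The sketch below assumes that the 4K frame (4096 × 2160 pixels) is split evenly into the 4 × 6 tile grid; that the result coincides with the stated value of $B$ is an observation about the numbers, not a definition given in the text. The variable names are hypothetical.

```python
# 4K frame size and the 4 x 6 tile grid from the simulation settings.
FRAME_WIDTH, FRAME_HEIGHT = 4096, 2160
TILE_ROWS, TILE_COLS = 4, 6

# Pixels per tile if the frame is split evenly over the grid.
pixels_per_tile = FRAME_WIDTH * FRAME_HEIGHT // (TILE_ROWS * TILE_COLS)

# Edge server capacity and compression cost from the simulation settings.
CPU_CYCLES_PER_SECOND = 6e10
CYCLES_PER_BIT = 2000.0

# Bits the server can process per one-second time slot.
bits_per_slot = CPU_CYCLES_PER_SECOND / CYCLES_PER_BIT
```

The first quantity evaluates to 368,640, i.e., about $3.686 \times 10^5$, and the second to $3 \times 10^7$ bits per slot, which is the compression capacity against which constraint (15) is checked in each slot.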

B. EVALUATION RESULTS
To evaluate the performance of the proposed scheme, we compare it with the following two benchmark schemes:
• Fixed location: the edge server only transmits the AR content according to the estimated location without considering the estimation error. The resolution adjustment and the communication and computation resource management schemes are the same as those in the proposed scheme;
• Uniform resolution: the edge server does not recognize the weight of each tile and uses the same resolution for all the possible tiles within the FoV. The resolution needs to be optimized according to the available computation and communication resources.
Fig. 6(a) shows how the QoE changes with the number of users $K$. In general, the system QoE, i.e., the minimum QoE among all the users, decreases as the number of users increases. The reasons are two-fold: 1) the more severe interference among more users makes the estimation less accurate, and as a result, more data needs to be transmitted; and 2) the communication and computation resources available to each user decrease with more users. On the other hand, compared with the benchmark schemes, we can observe that the proposed scheme achieves the best performance, indicating its effectiveness. In comparison with the fixed location scheme, our scheme is similar to the idea of robust optimization [37], which allocates communication and computation resources to those FoVs with a lower probability to reduce the QoE degradation caused by the estimation error. Moreover, compared with the uniform resolution scheme, we find that proper resolution matching, i.e., transmitting tiles with a higher weight at a higher resolution,⁴ results in a higher total QoE.
³ 4K video means 4096 × 2160 pixels.
Furthermore, we can also observe that the QoE obtained by the uniform resolution scheme is higher than that obtained by the fixed location scheme, indicating that the estimation error has a more significant impact on the QoE than the resolution matching.
In Fig. 6(b), we plot the QoE vs. the estimation error range for different video quality levels, i.e., 2K, 4K, and 8K. It is observed that the QoE decreases with a larger estimation error range, which is consistent with Proposition 1. We can also see that the rate of decrease grows with a larger estimation error. This is because a larger range results in a larger potential FoV range, and thus more data needs to be transmitted. Moreover, we can observe that the QoE decreases for higher-quality videos under limited communication resources. The reason behind this observation is that more data needs to be transmitted to maintain the same QoE level, while the communication resources might not be sufficient to support more data transmission.
⁴ The idea is similar to maximum ratio combining in wireless communications [38].
In Fig. 7, we show how the number of anchors influences the system performance. From Fig. 7(a), we can observe that the QoE slightly increases with the number of anchors. This is because the estimation error can be reduced with more anchor nodes, leading to less data to be transmitted and a higher resolution for each tile under limited communication resources, which is also verified by the results shown in Fig. 7(b). In this subfigure, we can observe that less data is compressed with more anchors, indicating that 1) fewer tiles are involved in the possible FoVs (caused by the shrinking estimation error range); or 2) higher-resolution video is transmitted for each tile. On the other hand, we find that the performance improvement brought by increasing the number of anchors $A$ is much smaller than the impact of the number of users $K$, because the sensing accuracy improvement is limited when the number of anchors is larger than a fixed threshold. Moreover, from this figure, we can conclude that the system tends to maximize the usage of communication resources, as the computation resources are not fully utilized. The reason behind this observation is that the QoE metric motivates the system to transmit video with as high a resolution as possible.
In Fig. 8, we show how the QoE changes over time slots. Fig. 8(a) shows the QoE of a static user. From this figure, we can observe that the QoE increases as time progresses. This is because the data transmitted in the previous time slots is stored in the device, so the video resolution becomes higher over time. For the proposed scheme and the uniform resolution scheme, the QoE eventually reaches its maximum value, as all the data at the edge server is eventually transmitted to the users. In contrast, the fixed location scheme saturates at a relatively lower QoE because it does not transmit the tiles in the shifted FOV caused by the estimation error. Moreover, we can also see that the proposed scheme has a higher convergence rate, indicating that the adaptive resolution allocation can achieve a better QoE.
In Fig. 8(b), we present the case where the Metaverse user moves randomly. As the speed of the user is typically low, the FOVs of two successive time slots overlap with each other. We assume that 30% of the FOV is the same for two successive time slots in this case. From this figure, we can see that the QoE also increases over time because the data already cached in the headset helps save communication resources in the following time slots. Moreover, we can also observe that the proposed scheme outperforms the benchmark schemes in the mobile case.
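The caching effect discussed for Fig. 8 can be illustrated with a toy model. The following sketch is not the simulator used in this paper: the function name `cached_fraction`, the per-slot delivery budget, and the normalization of the FOV data to one unit are all assumptions made for illustration. The parameter `overlap` is the fraction of cached data that remains useful in the next time slot (1.0 for a static user, 0.3 for the mobile case assumed above).

```python
def cached_fraction(num_slots, budget_per_slot, overlap):
    """Fraction of the current FOV's data held on the device after each slot.

    Toy assumptions (not from the paper): the FOV needs 1.0 unit of data,
    each slot delivers at most `budget_per_slot` units, and only a fraction
    `overlap` of the cached data is still inside the FOV in the next slot.
    """
    cached = 0.0
    history = []
    for _ in range(num_slots):
        # Deliver new data in this slot, capped at the full FOV.
        cached = min(1.0, cached + budget_per_slot)
        history.append(cached)
        # Before the next slot, part of the cache falls outside the new FOV.
        cached *= overlap
    return history


static_user = cached_fraction(num_slots=6, budget_per_slot=0.25, overlap=1.0)
mobile_user = cached_fraction(num_slots=6, budget_per_slot=0.25, overlap=0.3)
```

In this toy setting, the static user accumulates the full FOV data after a few slots, while the mobile user still benefits from the cache but saturates at a lower level, which mirrors the qualitative trend described for Fig. 8(a) and Fig. 8(b).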
In Fig. 9, we show the sensitivity of the QoE with respect to communication and computation resources when the number of Metaverse users is K = 3. From this figure, we can observe that the QoE increases with a higher energy budget for data transmission when the computation resource is fixed. However, when we double the CPU cycles, the QoE remains almost the same. This is because the QoE is positively related to the communication resource but, in this setting, independent of the computation resource. If more communication resources are deployed in the system, the BS tends to transmit higher-resolution videos since the QoE will be improved, while increasing the computation resources does not change the QoE as the computation requirement has already been satisfied.
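The sensitivity behavior in Fig. 9 is what one would expect from a bottleneck relationship between the two resources. The sketch below is a deliberately simplified illustration, not the QoE model of this paper: the function `toy_qoe`, the conversion factors `bits_per_joule` and `bits_per_cycle`, and the cap `max_bits` are hypothetical. It only shows that when the computation limit is not the binding constraint, adding CPU cycles leaves the result unchanged, whereas adding transmission energy improves it.

```python
def toy_qoe(energy_budget, cpu_cycles,
            bits_per_joule=2.0, bits_per_cycle=1.0, max_bits=100.0):
    """Normalized toy QoE limited by the scarcer of two resources.

    All parameters are hypothetical: the deliverable data is the minimum of
    what the energy budget can transmit, what the CPU can process, and the
    total amount of data available at the server.
    """
    comm_limit = energy_budget * bits_per_joule   # data the link can carry
    comp_limit = cpu_cycles * bits_per_cycle      # data the server can process
    delivered = min(comm_limit, comp_limit, max_bits)
    return delivered / max_bits


baseline = toy_qoe(energy_budget=10.0, cpu_cycles=60.0)
more_energy = toy_qoe(energy_budget=20.0, cpu_cycles=60.0)
more_cpu = toy_qoe(energy_budget=10.0, cpu_cycles=120.0)
```

Here the baseline is communication-limited, so doubling the energy budget raises the toy QoE while doubling the CPU cycles does not.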

VI. CONCLUSION AND FUTURE WORK
In this paper, we studied location-dependent AR services in wireless MEC-enabled Metaverse systems. To maximize the minimum QoE among users, we first optimized the waveform in the uplink phase to achieve a low localization error, and then, in the downlink phase, we proposed a joint resolution and transmit power optimization algorithm to achieve a high QoE. Simulation results verified the effectiveness of the proposed scheme, with the following findings: 1) Considering the localization estimation error can effectively improve the system QoE in the Metaverse; 2) The QoE metric encourages the system to fully utilize the communication resources to transmit videos with a higher resolution; and 3) The impact of the number of anchor nodes on the system QoE is much weaker than that of the number of Metaverse users.
In the current system, the users use orthogonal channels, and interactions among users are not considered. In future work, we will consider such interactions, e.g., cooperation or interference. To be specific, if multiple users cooperate to perform the same task provided by AR services, the QoE per user should account not only for the individual video display of each user, but also for the synchronization among these users. Therefore, a protocol should be designed to facilitate the synchronization, and the QoE model should be modified accordingly. Moreover, since the videos transmitted to these users are highly correlated, the BS can use network coding techniques [39] to save bandwidth, where the users can share the same content using direct communications. Under such a setting, interference among users will be inevitable, and proper wireless resource management will be important to alleviate it.

APPENDIX A PROOF OF PROPOSITION 1
Let α > 1 be the scaling factor of the extended range in which the tiles can possibly fall within the FOV. Therefore, the distribution over these tiles also follows the correspondingly scaled distribution.
However, as the amount of data that can be transmitted is limited, the average resolution for each tile will decrease, i.e., $\hat{\beta}^k \le \bar{\beta}^k$, where $\hat{\beta}^k$ and $\bar{\beta}^k$ are the average resolutions for the extended and the original ranges for user $k$, respectively. Therefore, the relation in (37) holds, where $\hat{\omega}^k_i$ and $\hat{f}^k_i$ are the weight and the probability for the $i$-th tile in the extended range, respectively, while $\bar{\omega}^k_i$ and $f^k_i$ are the weight and the probability for the $i$-th tile in the original range, respectively. The equality (a) in equation (37) is achieved as the distribution is scaled. Similarly, we can show the corresponding relation for the resolutions, where $\bar{B}^k_i$ and $\hat{B}^k_i$ are the maximum resolution and the extended one for tile $i$ of user $k$, respectively. Therefore, the QoE for the extended range is no larger than that for the original range. This shows that the QoE decreases as the FOV range is extended.
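As a numerical sanity check of this monotonicity, the following sketch evaluates a toy version of the argument. It does not use the QoE expression of this paper: the functions `average_resolution` and `toy_expected_resolution`, the uniform viewing probability, and all numerical values are assumptions for illustration. A fixed data budget is spread evenly over the tiles that may fall within the FOV, so extending the range by a factor α > 1 lowers the average resolution per tile.

```python
def average_resolution(data_budget, num_tiles, max_resolution):
    """Average per-tile resolution when a fixed budget is shared evenly.

    Toy assumption: resolution is measured in the same units as data, and
    no tile can exceed `max_resolution`.
    """
    return min(max_resolution, data_budget / num_tiles)


def toy_expected_resolution(data_budget, num_tiles, max_resolution):
    """Probability-weighted resolution over the candidate tiles.

    Toy assumption: each candidate tile is viewed with equal probability.
    """
    prob = 1.0 / num_tiles
    beta = average_resolution(data_budget, num_tiles, max_resolution)
    return sum(prob * beta for _ in range(num_tiles))


alpha = 1.5                      # hypothetical scaling of the FOV range
original_tiles = 16              # hypothetical tile count in the original range
extended_tiles = int(alpha * original_tiles)

qoe_original = toy_expected_resolution(48.0, original_tiles, max_resolution=8.0)
qoe_extended = toy_expected_resolution(48.0, extended_tiles, max_resolution=8.0)
```

Under these assumptions the extended range yields a strictly smaller value than the original range, in line with the statement of Proposition 1.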

APPENDIX B PROOF OF REMARK 4
According to (13)-(15), we have On the other hand, we can derive from (11)