Low-Complexity Multi-Antenna Coded Caching Using Location-Aware Placement Delivery Arrays

A location-aware multi-antenna coded caching scheme is proposed for applications with location-dependent data requests, such as wireless immersive experience, where users are immersed in a three-dimensional virtual world. The wireless connectivity conditions vary as the users move within the application area, motivating the use of a non-uniform cache memory allocation process to avoid excessive delivery time for users located in wireless bottleneck areas. To this end, a location-aware placement and delivery array (LAPDA) is designed for cache-aided multi-antenna data delivery with a fast-converging, iterative linear beamforming process. The underlying weighted max-min transmit precoder design enables the proposed scheme to serve users in poor connectivity areas with smaller amounts of data while simultaneously delivering larger amounts to other users. Our new scheme is suitable for large networks due to its linear transceiver structure and, unlike the existing schemes, it is not constrained by the number of users, cache size, or the number of antennas at the transmitter. Despite non-uniform cache placement, the proposed scheme still achieves a significant degree of coded caching gain that is additive to the multiplexing gain and greatly outperforms the conventional symmetric coded caching (CC) schemes in terms of both average and 95th-percentile delivery time.


I. INTRODUCTION
Mobile data traffic is growing exponentially, and this trend will continue as the market is constantly inundated with technologies and devices that support new data-intensive applications in different forms and capabilities [1]. Wireless eyewear devices, for example, enable data-intensive mobile extended reality (XR) applications [2], [3], which are also subject to strict quality of service (QoS) requirements such as low latency (< 10 ms) and high data rate transmission (6.37-95.55 Gbps) [4]-[9]. This differs greatly from conventional ultra-low latency and low-rate requirements for internet-of-things applications [10]. The low latency, along with the high delivery rates, requires more sophisticated transmission methods than those offered by current wireless network standards [6]-[8]. Therefore, to meet the requirements of future wireless XR applications, new delivery schemes with higher bandwidth efficiency are needed.
One possible option, given that upcoming mobile broadband applications rely heavily on asynchronous content reuse [11], is to utilize proactive caching at the end-users to relieve network congestion and bandwidth consumption during peak times [12]. In this regard, various studies have explored proactive caching in single-input single-output (SISO) configurations, demonstrating its benefits for meeting XR application requirements [13]-[16]. Specifically, with the available memory at the end users, the whole or part of the requested content can be cached beforehand and rendered by the end user at the request time. This results in significant bandwidth and delay-reduction gains and alleviates the traffic burden over the wireless network [13]-[16].
Unlike conventional caching schemes that rely on the available memory of each user (see, e.g., [11]-[18]), the coded caching (CC) scheme originally introduced in [19] benefits from the aggregated memory throughout the network. In fact, it enjoys a so-called global caching gain, available through careful cache placement and multicast transmissions, that results in improved overall performance compared to traditional schemes [11]-[18]. As such, the transmission bandwidth for delay-constrained XR applications has been effectively reduced in SISO setups by leveraging coded cache placement and mobile device computing capabilities [20]. The CC scheme is especially advantageous for large networks as the achievable global caching gain scales linearly with the number of users in the network. This makes it ideal for collaborative XR scenarios where a group of users is served simultaneously within a confined environment, with each user's individual actions impacting the results perceived by all users (cf. [2], [21]-[24]). In this regard, a location-dependent CC-based cache placement and delivery scheme, originally designed for SISO setups, has been proposed in [21].
In addition, the CC scheme's ability to combine global caching and spatial multiplexing gains is another critical feature [25]. This is particularly appealing given that multi-antenna connectivity will be a crucial feature of upcoming communication systems [5]. Thus, the SISO setup in [21] has been extended to a multiple-input single-output (MISO) setup in [22]-[24], to benefit from spatial multiplexing and global caching gains simultaneously. In this paper, we intend to overcome some of the practical limitations of our earlier schemes in [22]-[24]. Notably, we propose a new location-dependent CC scheme, suitable for large networks due to its linear transceiver structure, and not constrained by the number of users, cache size, or the number of antennas at the transmitter, unlike the existing schemes.

A. Literature review
Coded caching. The original CC scheme in [19] was intended for SISO setups with an error-free shared link. This work was later extended to more practical scenarios, including multi-server [26] and MISO [25], [27], [28] setups. The early high signal-to-noise ratio (SNR) analysis in [25]-[27] proved that the so-called degrees of freedom (DoF) achieved by the MISO-CC scheme is optimal under uncoded cache placement and single-shot data delivery. Later, the analysis in [28] showed that an optimized multi-antenna precoder design is necessary for the CC scheme to perform well also in the low-SNR regime. Soon, device-to-device (D2D) CC schemes were proposed (e.g., [29]-[31]) to increase the network throughput.
Bit- and signal-level CC. Despite exciting theoretical gains, various practical issues have restricted the real-world implementation of CC schemes. One prominent problem is the exponentially growing file-division requirement (w.r.t. the network size), known as the subpacketization bottleneck. To address this issue, a combinatorial subfile assignment based on placement delivery arrays (PDA) was proposed in [32].
The PDA structure provides a set of conditions (reviewed for MISO systems in Section III) that allows a given matrix to be used for both content placement and delivery of a CC scheme, thus translating the subpacketization reduction problem to finding a small-dimension matrix satisfying PDA conditions. Interestingly, the authors of [32] demonstrated that all the schemes in [19], [25]-[31] could also be presented as PDAs. Motivated by the generalized framework in [32], various PDA-based CC schemes were later proposed for different settings, aiming for reduced subpacketization [33]-[35].
A major breakthrough in subpacketization reduction was achieved with the introduction of signal-level CC schemes in [36]. In contrast to bit-level CC schemes [19], [25]-[31], where file fragments intended for different users are combined/separated using bit-wise XOR operations in the finite field, signal-level CC schemes rely on the superposition of all precoded data terms in the signal domain and the regeneration and cancellation of the unwanted parts at the physical layer of each receiver (see [2] for a more detailed explanation). As a result, the design flexibility is greatly increased compared with bit-level schemes, enabling the subpacketization requirement of MISO-CC setups to be even smaller than their comparable SISO-CC settings [36]. The signal-level scheme of [36] was then extended to centralized [37] and decentralized [38] shared-cache scenarios where a limited number of cache-enabled helper nodes serve a group of cache-less users. The applicability of signal-level CC schemes was later extended, e.g., to multiple-input multiple-output (MIMO) setups [39], [40], and to dynamic networks wherein users may freely enter/depart the network at will [41]-[43]. Finally, to make the design of signal-level CC schemes more systematic, an enhanced PDA framework, called multi-antenna placement and delivery arrays (MLPDA), was proposed in [44]. Of course, signal-level CC schemes also suffer from drawbacks such as inferior finite-SNR performance compared with bit-level CC schemes [45], [46] and the requirement to regenerate and remove the interference in the physical layer [2]. However, the remarkable flexibility of signal-level approaches continues to inspire ongoing research endeavors aimed at utilizing them to overcome different implementation challenges encountered in CC schemes.
The near-far effect. Another crucial problem of conventional CC schemes is the near-far issue, which affects content delivery applications (e.g., [25]-[35]) in general and XR applications in particular. In CC schemes, a common multicast message is transmitted to serve several users at a time, and all these users must be able to decode the message simultaneously. As a result, the achievable rate is always limited by the user(s) with the worst channel condition. Studies on SISO-CC networks have shown that the practical gains of CC schemes could entirely vanish at the low-SNR region due to the near-far issue [47]. To address this issue, a congestion control technique is proposed in [48] with the intention of avoiding serving users that experience adverse channel conditions altogether. Similar scheduling approaches are also proposed in [49], [50], where joint queue minimization and packet control, as well as power minimization and scheduling, are considered for delay-constrained CC applications. In another work [51], a CC scheme with partial codewords (i.e., with a smaller number of data terms in the codeword compared to the baseline CC scheme of [52]) is introduced to adjust the user-specific QoE based on their current channel conditions.
Multi-rate transmission in CC. In [25]-[38], [44], [53], equal-sized data chunks are combined to form a common message. In contrast, in [54], data terms with different sizes are combined via nested code modulation (NCM), creating codewords that serve every user in the multicasting group at a different rate. Similarly, combining the shared-cache idea of [36] with the NCM of [54], the near-far problem was mitigated in [55], [56]. The proposed system model in [54] assumed fixed link capacities, particularly tailored for backhaul networks. Motivated by the results in [54], a location-dependent CC scheme was proposed in [21] for networks with variable link capacities, particularly applicable for future wireless XR applications. Later, the SISO setting in [21] was extended to a location-dependent MISO setup in [22]-[24]. Specifically, while the schemes in [22], [24] benefit from the NCM for data delivery, the scheme in [23] uses a modified version of the signal-level scheme in [53] to support multi-rate transmission. Nevertheless, the schemes proposed in [22], [24] do not scale well with the increasing number of users. This is attributed to the exponential increase in the number of variables and constraints in the transmit precoder design optimization problem and the complex receiver structure. Similarly, the signal-level scheme proposed in [23] is limited to scenarios where the global caching gain is not greater than the multiplexing gain. Hence, a general framework that can scale with the number of users without such scaling impediments is still missing in the literature.

B. Our Contribution
A novel location-dependent multi-antenna CC scheme is proposed in this paper, leveraging location-aware placement and delivery arrays (LAPDA) formed using a proper set of MLPDAs (cf. [44]). In the considered system setup, users are equipped with dedicated cache memories and can roam freely within the application environment. A location-aware, non-uniform memory allocation strategy similar to [24] is employed to ensure that users in areas with poor wireless link quality do not experience excessive delivery times. Due to different amounts of memory allocated to different locations, multiple location-dependent MLPDAs are utilized to define which part(s) of every file should be cached by each user. As a result, the number of cached subfiles for each location-dependent content could be different, and hence, a different transmission schedule may be needed to deliver the missing subfiles requested at each location. To handle this requirement, a file-mapping process is devised where all requested files in different locations are divided into equal numbers of file fragments (of various sizes). A common MLPDA is then used to deliver all requested file segments to all users, regardless of their location.
Parts of this paper have been published in our previous work [23]. In this paper, one important limitation of [23] is addressed, namely the requirement for a larger spatial multiplexing gain than the global caching gain. This limitation arose from the utilization of the CC scheme proposed in [53], which is primarily designed for scenarios with a lower global caching gain relative to the multiplexing gain. To circumvent this limitation, we instead utilize a set of appropriately designed MLPDAs, which are demonstrated to be a general framework for signal-level CC schemes [44], including shared caching-based approaches (e.g., [39], [40], [57]). Serving a large number of concurrent users immersed in a collaborative XR experience within a bounded environment is an ideal scenario for the proposed delivery scheme. In this regard, we follow the XR connectivity framework proposed in our earlier work [24] but utilize a simple linear transceiver design for data delivery with a fast iterative beamforming process through the use of LAPDAs that enable unicast transmissions. In particular, the unicast transmission allows a low-complexity beamformer design based on weighted max-min optimization, which is iteratively solved via Lagrangian duality. This fast beamformer design allows the proposed scheme to be applied to large networks, leading to significantly higher achievable coded caching gains compared to our previous method presented in [24]. This translates into a notable performance advantage over conventional unicasting and multicasting schemes, as demonstrated by simulation results.

C. Notation and structure
Matrices and vectors are presented by boldface upper and lower case letters, respectively, and calligraphic letters are used to denote sets. For the set $\mathcal{A}$ and vector $\mathbf{v}$, $|\mathcal{A}|$ and $\|\mathbf{v}\|$ represent the cardinality of $\mathcal{A}$ and norm of $\mathbf{v}$, respectively. For two sets $\mathcal{A}$ and $\mathcal{B}$, $\mathcal{A} \setminus \mathcal{B}$ includes the elements of $\mathcal{A}$ that are not in $\mathcal{B}$. Finally, $\mathbb{W}$ denotes the set of non-negative integers, $[m]$ represents the set of integers from 1 to $m$, and $\oplus$ denotes addition in the corresponding finite field.
The rest of this paper is organized as follows. In section II, we describe our location-based system model. A two-phase cache placement scheme comprised of memory allocation and cache arrangement processes is described in section III, while section IV discusses the delivery procedure. In section IV-B, weighted-max-min beamforming, tailored for the considered location-based cache placement setup, is introduced. Finally, numerical results are provided in section V, while section VI concludes the paper.
Fig. 1: An application environment with $K = 3$ users, split into $S = 8$ STUs. $r(s)$ is the STU-specific achievable rate and $r(3) > r(2) > r(7)$. $X_{123}$ is the transmitted message, and $X_i$ and $w_i$ represent the data part intended for user $i$ and its corresponding precoder, respectively. The black bar below each user indicates how much of the requested data is cached.

II. SYSTEM MODEL
A downlink scenario is considered where a server with $L$ transmit antennas serves $K$ single-antenna, cache-enabled users. The users are located within a bounded environment, such as a gaming hall, an operating room, or an exhibition hall. The system model is quite similar to our previous study in [24] but with new placement and transmission schemes designed to support large networks with improved scalability. Let $\mathcal{K} = [K]$ denote the set of users with limited memory capacities who can navigate within the coverage area. Users are assumed to request data from the server, depending on their location and application requirements. The environment is partitioned into $S$ single transmission units (STUs), wherein a distinct 3D image is required to reconstruct the 360-degree spherical virtual viewpoint around the user at each STU [2]. As a small example, Figure 1 represents a simple application environment with eight STUs, where $\mathcal{S}$ denotes the set of STUs.
The STU mapping is designed so that the wireless channel quality can be assumed to be almost the same for all points within a given STU. We also assume the 3D image within each STU can be decomposed into static and dynamic components [2], [24]. An example of such decomposition is shown in Figure 2. A proper modeling structure, such as the one described in [3], would allow users to cache the entire static part and a significant portion of the dynamic part in advance [24]. In this paper, we concentrate on the efficient delivery of this cacheable part of the content. Denote by $W(s)$ the (cacheable part of the) file required for reconstructing the detailed field of view (FoV) in STU $s \in \mathcal{S}$ and, without loss of generality, assume $|W(s)| = F$ bits for every $s \in \mathcal{S}$. Unless otherwise stated, this paper considers normalized data units, and $F$ is dropped in subsequent notations. System operation consists of two phases: cache placement and delivery. During the placement phase, each user $k$, equipped with a cache memory of size $M$ (normalized) bits, stores a message $Z_k = Z_k(W(s_1), \ldots, W(s_S))$ in its cache, where $Z_k(\cdot)$ denotes a function of the files $W(s)$, $\forall s \in \mathcal{S}$, with entropy not larger than $M$ bits.
During the delivery phase, users located within the application environment request missing data from the server to reconstruct the FoV of their current locations. Specifically, a request vector $\mathbf{d} = [d_1, \ldots, d_K]$ collects the requests of all users, where $W_{d_k}$ denotes the file requested by user $k$. To deliver missing parts of the files in $\mathbf{d}$, the base station (BS) then transmits several precoded messages $\mathbf{x}_{\mathcal{U}}$ at different intervals, where $\mathcal{U} \subseteq \mathcal{K}$ denotes the set of users receiving (a part of) their requested data from $\mathbf{x}_{\mathcal{U}}$. The number of precoded messages (and hence, the number of different user sets $\mathcal{U}$) depends on the underlying CC scheme. Every message $\mathbf{x}_{\mathcal{U}}$ comprises several unit-power codewords $x_k$, where $x_k$ contains useful data for a user $k \in \mathcal{U}$. Thus, $\mathbf{x}_{\mathcal{U}}$ is built as $\mathbf{x}_{\mathcal{U}} = \sum_{k \in \mathcal{U}} \mathbf{v}_k x_k$, where $\mathbf{v}_k \in \mathbb{C}^L$ denotes the precoding vector dedicated to codeword $x_k$. To be specific, $\mathbf{v}_k$ is designed to suppress the interference caused by $x_k$ on a subset of users $\mathcal{I}_k \subset \mathcal{U}$ that cannot remove the interference by their cache contents. After the transmission of $\mathbf{x}_{\mathcal{U}}$, every user $k \in \mathcal{U}$ receives
$$y_k = \mathbf{h}_k^{H} \sum_{\bar{k} \in \mathcal{U}} \mathbf{v}_{\bar{k}} x_{\bar{k}} + z_k, \tag{1}$$
where the channel vector between the BS and user $k$ is denoted by $\mathbf{h}_k \in \mathbb{C}^L$, and $z_k \sim \mathcal{CN}(0, N_0)$ represents the additive white Gaussian noise. Note that both the local cache content $Z_k$ and the received signals from the wireless channel over different time intervals are used at the decoder of user $k$ to reproduce the requested file $W_{d_k}$. Moreover, the instantaneous channel state information at the transmitter (CSIT) is assumed to be available during the delivery phase, which is utilized for beamformer design and rate allocation (CSIT measurement is feasible through reciprocal reverse-link pilot measurements, assuming data delivery is carried out within the channel coherence time; a detailed discussion on CSI acquisition in CC networks can be found in [53]).
Finally, as discussed in [24], an approximate throughput estimate, e.g., based on the statistics collected from previous application runs, is required for proper location-dependent cache placement. Unlike the delivery phase, it is not possible to calculate instantaneous achievable rates during the placement phase. This is because important information such as concurrently scheduled users, their locations, channel conditions, and precoding algorithms is not yet available. The aim of the approximation is to have a relative rate difference among different STUs available to allocate distinct portions of memory to them accordingly. Intuitively, to avoid extensive transmission times for users with poor connectivity, data needed at STUs with the lowest approximated rates should occupy most of the memory. To this end, we use the following rate approximation originally proposed in [24]:
$$r(s) = C_p \, \Omega \, \mathbb{E}\left[\log\left(1 + \frac{P_T \|\mathbf{h}_{ks}\|^2}{N_0}\right)\right], \tag{2}$$
where $C_p$ is a pre-log scaling factor containing any practical overhead, $P_T$ is the transmission power, $\Omega$ is the communication bandwidth, and $\mathbf{h}_{ks} \in \mathbb{C}^L$ is the channel vector between the server and a user $k$ located in STU $s$. Note that the expectation is taken over all user locations and channel realizations in STU $s$ (cf. [24] for more details).

III. CACHE PLACEMENT
A PDA-based location-dependent cache placement scheme comprising two consecutive processes, memory allocation and cache arrangement, is used in this paper. The memory allocation process is similar to [24] and prioritizes content requested in locations with poor wireless connectivity in order to mitigate excessive delays during the content delivery phase. Given the result of the memory allocation process, the cache arrangement process is then used to clarify what data parts should be cached by each user. One of the key novelties of this paper is the introduction of a PDA-based cache arrangement process that allows the overall scheme to be scalable w.r.t. various network parameters. Nevertheless, for clarity, the memory allocation process of [24] is also briefly explained in the following.
Memory allocation [24]: Real-time applications (e.g., XR) typically require a bounded delivery time. Excessive delivery delays can be circumvented by reserving a larger share of memory to store data requested in poor connectivity areas. Accordingly, the memory allocation process specifies the normalized cache size $m(s)$ reserved for storing (a fraction of) each STU-specific file $W(s)$ at every user. This paper assumes no prior knowledge about the users' spatial locations during the subsequent delivery phase; hence, we consider uniform access probability for all STUs during the placement phase. Following [24], if the $m(s)$ values are known, the total delivery time $T_T$ can be approximated as
$$\hat{T}_T = \frac{K}{K\bar{m} + L} \max_{s \in \mathcal{S}} \frac{1 - m(s)}{r(s)}, \tag{3}$$
where $\bar{m} = \min_{s \in \mathcal{S}} m(s)$ is the least allocated memory for an STU and $r(s)$ is the approximated rate at STU $s$ (cf. Eq. (2)).
The $K\bar{m} + L$ term in the denominator of (3) approximates the achievable DoF for the non-uniform memory allocation scenario (note that for the uniform allocation, the DoF is upper bounded by $\frac{KM}{S} + L$ [44]), and the term $K \max_{s \in \mathcal{S}} \frac{1 - m(s)}{r(s)}$ approximates the worst-case delivery time across all the STUs when $K$ users are served simultaneously. In order to find approximate $m(s)$ values that minimize the expected delivery time, we first rewrite (3) as $\hat{T}_T = \frac{K\gamma}{K\bar{m} + L}$, where $\gamma = \max_{s \in \mathcal{S}} \frac{1 - m(s)}{r(s)}$, and then formulate the memory allocation process as the following linear fractional programming (LFP) problem:
$$\min_{\{m(s)\},\, \gamma \ge 0,\, \bar{m} \ge 0} \ \frac{K\gamma}{K\bar{m} + L} \quad \text{s.t.} \quad \frac{1 - m(s)}{r(s)} \le \gamma, \ \ \bar{m} \le m(s), \ \ \forall s \in \mathcal{S}, \qquad \sum_{s \in \mathcal{S}} m(s) \le M. \tag{4}$$
Note that at the optimal solution to (4), $\bar{m} = \min_{s \in \mathcal{S}} m(s)$.
Using the Charnes-Cooper transformation, the LFP in (4) can be reformulated as a linear programming problem and solved efficiently [24]. For ease of exposition, we assume that $Km(s)$ is a positive integer for all $s \in \mathcal{S}$ throughout the text. The non-integer $Km(s)$ case is addressed in Appendix B using time-sharing, an alternative method that surpasses the performance of the approach proposed in [24].
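The allocation problem can also be explored numerically without an LP solver. The following is a minimal sketch and not the Charnes-Cooper route used in the paper: it assumes the objective $K\gamma/(K\bar{m} + L)$ with constraints $(1 - m(s))/r(s) \le \gamma$, $\bar{m} \le m(s)$, and $\sum_s m(s) \le M$ (with $M \le S$), scans a grid of candidate $\bar{m}$ values, and finds the smallest feasible $\gamma$ for each by bisection. The function name, grid resolution, and the rate values in the usage line are hypothetical.

```python
def allocate_memory(rates, M, K, L, grid=200, iters=60, tol=1e-12):
    """Grid search over m_bar plus bisection over gamma (illustrative only)."""
    S = len(rates)
    best = None
    for i in range(grid + 1):
        # Every m(s) >= m_bar and sum_s m(s) <= M, so m_bar cannot exceed M / S.
        m_bar = (M / S) * i / grid
        # At gamma = 1 / min(rates), 1 - gamma * r(s) <= 0 for all s, so m(s) = m_bar.
        lo, hi = 0.0, 1.0 / min(rates)
        for _ in range(iters):
            gamma = 0.5 * (lo + hi)
            used = sum(max(m_bar, 1.0 - gamma * r) for r in rates)
            if used <= M + tol:
                hi = gamma  # feasible: try a smaller worst-case time
            else:
                lo = gamma  # infeasible: this gamma needs more memory than M
        gamma = hi
        objective = K * gamma / (K * m_bar + L)
        if best is None or objective < best[0]:
            m = [max(m_bar, 1.0 - gamma * r) for r in rates]
            best = (objective, m, m_bar)
    return best


# Hypothetical rates for S = 5 STUs, with K = 4, L = 2, and M = 2.25.
objective, m, m_bar = allocate_memory([3, 2, 1, 2, 3], M=2.25, K=4, L=2)
```

For these assumed rates, the sketch returns an allocation close to (0.25, 0.5, 0.75, 0.5, 0.25), i.e., the STU with the lowest rate receives the largest memory share. The grid only approximates the optimal $\bar{m}$, so an LP solver remains the better choice when exact values are needed.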
Cache arrangement: We utilize a location-aware placement delivery array (LAPDA) to store data fragments of files in users' cache memories. Let $\mathcal{Q}$ denote a specific LAPDA consisting of a set of $S$ STU-specific MLPDA matrices $\mathbf{Q}_s$, $s \in \mathcal{S}$, that are interrelated with an extra cross-matrix condition that ensures data delivery is possible with the given non-uniform memory allocation. Before going through a detailed explanation of LAPDA, let us first review the general definition of MLPDAs.
Definition 1. An $(L, K, F_s, Z_s, N_s)$ MLPDA $\mathbf{Q}_s$, $s \in \mathcal{S}$, is an $F_s \times K$ matrix whose elements include the specific symbol "*" and $N_s$ positive integers $\{1, 2, \ldots, N_s\}$. For positive integers $L$, $K$, $F_s$, and $Z_s$, $\mathbf{Q}_s$ satisfies [44]:
C1. The symbol "*" appears $Z_s$ times in each column, such that $\frac{Z_s}{F_s} = m(s)$;
C2. Each integer $n \in [N_s]$ appears at least once in the matrix;
C3. Each integer appears at most once in each column;
C4. For every $n \in [N_s]$, if we define $\mathcal{U}(n, s)$ [...]
According to [44], the $(L, K, F_s, Z_s, N_s)$ MLPDA $\mathbf{Q}_s$ uniquely identifies a placement-delivery strategy for a MISO network with $K$ cache-enabled users, a coded caching gain of $t_s = Km(s)$, and a spatial multiplexing gain of $L$. In this regard, each file is first divided into $F_s$ subpackets, from which all subpackets $W_f(s)$ with $\mathbf{Q}_s(f, k) = *$ are stored in the cache of user $k$.
Algorithm 1 (placement for a set of MLPDAs $\{\mathbf{Q}_s\}$):
    for all $s \in \mathcal{S}$ do
        divide $W(s)$ into $F_s$ subfiles $W_f(s)$, $f \in [F_s]$
        for all $f \in [F_s]$ do
            for all $k \in \mathcal{K}$ do
                if $\mathbf{Q}_s(f, k) = *$ then
                    put $W_f(s)$ in the cache of user $k$
Note that condition C1 ensures that the memory constraints are met, C2 prevents empty transmission, C3 removes the need for successive interference cancellation (SIC), and C4 ensures that any interference that cannot be removed with cache content is suppressed by a proper precoder.
For the proposed location-dependent cache placement where each STU $s \in \mathcal{S}$ has a possibly different allocated memory portion $m(s)$, it is necessary to use a different MLPDA $\mathbf{Q}_s$ for each STU to satisfy the memory constraint. This is different from conventional MLPDA schemes that use a single MLPDA to store all library files. In fact, given a set of MLPDAs $\{\mathbf{Q}_s\}$, the files for every STU $s \in \mathcal{S}$ are first divided into $F_s$ subfiles, where $F_s$ can vary for each STU. Then, for every STU $s \in \mathcal{S}$, the subfiles are stored according to the "*" entries of $\mathbf{Q}_s$; Algorithm 1 summarises the placement process for a set of MLPDAs $\{\mathbf{Q}_s\}$.
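The placement rule can be sketched in a few lines. The snippet below is a minimal illustration under stated assumptions: the 4 x 4 array `Q_TOY` is hypothetical (it is not one of the paper's matrices), it is only checked against conditions C1 and C3, and it is not claimed to satisfy C4. The function names are likewise illustrative.

```python
STAR = "*"

# Hypothetical toy array: F_s = 4 subfiles (rows), K = 4 users (columns), Z_s = 2.
Q_TOY = [
    [STAR, STAR, 1, 2],
    [1, STAR, STAR, 3],
    [2, 3, STAR, STAR],
    [STAR, 4, 5, STAR],
]


def check_c1_c3(Q, m_s):
    """Check the star fraction per column (C1) and integer uniqueness per column (C3)."""
    F, K = len(Q), len(Q[0])
    for k in range(K):
        column = [Q[f][k] for f in range(F)]
        if abs(column.count(STAR) / F - m_s) > 1e-12:
            return False  # C1 violated: Z_s / F_s differs from m(s)
        integers = [e for e in column if e != STAR]
        if len(integers) != len(set(integers)):
            return False  # C3 violated: an integer repeats within a column
    return True


def place(Q):
    """Return, for each user (1-based), the 1-based indices of the subfiles it caches."""
    F, K = len(Q), len(Q[0])
    return {k + 1: [f + 1 for f in range(F) if Q[f][k] == STAR] for k in range(K)}


caches = place(Q_TOY)
```

Here every user stores two of the four subfiles, so the toy array corresponds to $m(s) = 0.5$. Running `place` once per STU-specific array reproduces the outer loop over $s \in \mathcal{S}$.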
Using a distinct $\mathbf{Q}_s$ for each STU results in an unequal number of subfiles $F_s$, number of cached data elements $Z_s$, and time slots $N_s$ to deliver location-dependent missing data. Hence, an additional cross-matrix condition is added to the conventional MLPDA definition to ensure a feasible delivery scheme for the proposed non-uniform placement. Accordingly, a proper LAPDA $\mathcal{Q}$ is defined as follows.
In Section IV, we propose a novel delivery algorithm tailored for the above-described non-uniform cache placement that achieves a significant coded caching gain, similar to the location-dependent scheme of [24], but now applicable to much larger networks. Moreover, in Appendix A, we discuss how the extra condition in Definition 2 ensures the feasibility of the delivery scheme. The following example illustrates the entire cache placement process, including memory sharing and cache arrangement.
Example 1. To illustrate the proposed location-dependent cache placement, we consider an example scenario from [24] with $K = 4$ users and $L = 2$ transmit antennas. The environment is split into $S = 5$ STUs, and for each STU, the required data size is $F = 400$ Megabytes. Each user has a cache size of 900 Megabytes; hence, the normalized cache size is $M = 2.25$ data units. The approximated normalized throughput value for each STU is as given in Table I, where the memory allocation resulting from solving (4) is also shown.
Table I: Location-specific throughput $r(s)$ and memory allocation $m(s)$; $m(s)$ = 0.25, 0.5, 0.75, 0.5, 0.25 for $s = 1, \ldots, 5$.
Consider the following MLPDA matrices $\mathbf{Q}_1$-$\mathbf{Q}_5$, which satisfy the conditions in Definition 1 for the resulting $m(s)$ values in Table I:
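As a quick cross-check of the numbers in Example 1, and assuming the five values 0.25, 0.5, 0.75, 0.5, 0.25 are the memory allocations $m(1), \ldots, m(5)$, the normalized cache size and the resulting coded caching gains $t_s = Km(s)$ work out as follows:

```latex
\begin{align*}
M &= \frac{900~\text{MB}}{400~\text{MB}} = 2.25, \\
\sum_{s=1}^{5} m(s) &= 0.25 + 0.5 + 0.75 + 0.5 + 0.25 = 2.25 = M, \\
t_s &= K\,m(s) \in \{1, 2, 3, 2, 1\} \quad \text{for } s = 1, \ldots, 5 .
\end{align*}
```

Under this reading the memory budget is fully used, and every $Km(s)$ is a positive integer, consistent with the assumption made for ease of exposition.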

IV. CONTENT DELIVERY
During the delivery phase, users move within the application environment (hence, change their locations) over time. In each time instance, users reveal their location-dependent file requests to the server. Without loss of generality, we consider a specific time slot, where every user $k$ in STU $s_k$ requests the file $W_{d_k} \equiv W(s_k)$ from the server to reconstruct its STU-specific FoV. Accordingly, the server builds and transmits several precoded messages to the requesting users. To reconstruct $W(s_k)$, user $k$ requires one normalized data unit, from which a portion of size $m_k \equiv m(s_k)$ units is available in its cache and the remaining part should be delivered by the server. Note that the conventional PDA-based delivery schemes are suited for scenarios where all users cache the same amount of data (e.g., [36], [44], [53]). So, they do not apply to our case where users have cached different amounts of their requested files.

A. PDA-based Delivery
To tackle the challenge posed by uneven memory allocation, we first make a temporary assumption that all users have cached the same portion $m = \min_{k \in \mathcal{K}} m_k$ of their requested files, and use any conventional PDA-based delivery scheme in the literature (e.g., [36], [44], [53]) to generate a set of preliminary transmission vectors (PTVs). These vectors are subsequently adjusted to accommodate the different file-indexing procedures employed for STU-dependent cache placement during the placement phase. Using the 'min' operation can limit the performance in certain scenarios when a subset of users have relatively smaller $m(s_k)$ values compared to the rest, e.g., as they are close to the transmitter. To address such cases, the concept of phantom users has been proposed in [24] to separately serve users with small $m(s_k)$ values via unicasting. However, in larger networks considered in this paper, such scenarios are less probable. This is due to the variable $\bar{m}$ in the LFP formulation (4), which inhibits assigning small values to any $m(s)$, especially when the ratio $L/K$ is small (e.g., for the $K \gg L$ case considered in this paper).
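For clarity, we designate $\hat{\mathbf{x}}_{\mathcal{U}}$ to represent a PTV, which will be later modified to form the transmission vector $\mathbf{x}_{\mathcal{U}}$. In order to build $\hat{\mathbf{x}}_{\mathcal{U}}$, we use the MLPDA $\hat{\mathbf{Q}}$ corresponding to the location of the user with the least available memory, i.e., $\hat{\mathbf{Q}} \equiv \mathbf{Q}_{s_{k^*}}$, $k^* := \arg\min_{k \in \mathcal{K}} m(s_k)$. This means we need $\hat{N} = N_{s_{k^*}}$ consecutive transmissions, and the PTV at time $n \in [\hat{N}]$ is built as
$$\hat{\mathbf{x}}(n) = \sum_{k \in \mathcal{U}(n)} W_{f_k^n}(s_k) \, \mathbf{v}_k(n), \tag{6}$$
where $\mathcal{U}(n)$ is the set of users served in the $n$'th transmission, $f_k^n$ is the temporary index of the subfile destined to user $k \in \mathcal{U}(n)$, and $\mathbf{v}_k(n)$ is the optimized beamforming vector dedicated to user $k$.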
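How the served user sets and temporary indices are read off a delivery array can be sketched as follows. This is a minimal illustration assuming the usual PDA convention that an integer $n$ in row $f$ and column $k$ means "subfile $f$ is sent to user $k$ in transmission $n$"; the toy array and both function names are hypothetical and not taken from the paper.

```python
STAR = "*"


def delivery_schedule(Q_hat):
    """Map each integer n to {user k: temporary subfile index f} (both 1-based)."""
    schedule = {}
    for f, row in enumerate(Q_hat, start=1):
        for k, entry in enumerate(row, start=1):
            if entry != STAR:
                # C3 guarantees n appears at most once per column, so no overwrite.
                schedule.setdefault(entry, {})[k] = f
    return dict(sorted(schedule.items()))


def least_memory_user(m_of_user):
    """Return k* = argmin_k m(s_k) for a {user: cached fraction} mapping."""
    return min(m_of_user, key=m_of_user.get)


# Hypothetical toy array (rows: subfiles, columns: users).
Q_HAT = [
    [STAR, STAR, 1, 2],
    [1, STAR, STAR, 3],
    [2, 3, STAR, STAR],
    [STAR, 4, 5, STAR],
]
schedule = delivery_schedule(Q_HAT)
k_star = least_memory_user({1: 0.25, 2: 0.5, 3: 0.75, 4: 0.5})
```

In this toy case, `schedule[1]` states that the first transmission serves users 1 and 3 with temporary subfile indices 2 and 1, respectively. The keys of each inner dictionary play the role of $\mathcal{U}(n)$ and the values play the role of $f_k^n$.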
Since $\hat{\mathbf{x}}(n)$ is built using matrix $\hat{\mathbf{Q}}$ but data placement is done using the set of matrices $\mathbf{Q}_s$, the temporary index $f_k^n$ may not coincide with the missing subfiles of the files requested by every user $k \in \mathcal{U}(n)$. Example 2 clarifies this statement.
Example 2. Consider the network in Example 1, for which the cache placement is given in (5). Consider a specific time instant with the following user-to-STU associations: $s_1 = 1$, $s_2 = 2$, $s_3 = 3$, and $s_4 = 4$. Denoting the set of requested subfiles for user $k$ with $\mathcal{M}_k$ and assuming $A \equiv W(1)$, $B \equiv W(2)$, $C \equiv W(3)$, and $D \equiv W(4)$, the sets $\mathcal{M}_k$ follow from the placement in (5) and are listed in (9). Note that the subfiles of $A$, $B$, $C$, $D$ are 1/12, 1/4, 1/4, 1/4 data units in size, respectively. Here, the minimum available amount of the requested data in cache belongs to user 1; hence, $\hat{\mathbf{Q}} \equiv \mathbf{Q}_1$, and we serve all users in $\hat{N} = N_{s_1} = 12$ time slots. For example, the first PTV is built as
$$\hat{\mathbf{x}}(1) = A_4 \mathbf{v}_1(1) + B_1 \mathbf{v}_2(1) + C_1 \mathbf{v}_3(1),$$
and the rest can be built accordingly. Now, considering $\hat{\mathbf{x}}(1)$, temporary file indices for users 1, 2, and 3 are 4, 1, and 1, respectively. However, from (9), users 2 and 3 already have $B_1$ and $C_1$ in their cache memories. Hence, we must carry out an appropriate index mapping process in PTVs before transmission. As a side note, using (8), one can easily verify that the interference indicator sets for PTV $\hat{\mathbf{x}}(1)$ are $\mathcal{I}_1(1) = \{3\}$, $\mathcal{I}_2(1) = \{3\}$, and $\mathcal{I}_3(1) = \{2\}$.
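To adjust temporary file indices in PTVs, we first note that every user $k$ appears $\hat{F} - \hat{Z}$ times in all PTVs, where $\hat{Z} \equiv Z_{s_{k^*}}$ and $\hat{F} \equiv F_{s_{k^*}}$. However, in practice, every user $k$ needs $F_{s_k} - Z_{s_k}$ subfiles to construct its FoV. Hence, to uniformly map the missing subfile indices to the temporary indices, each missing subfile of user $k$ is divided into $D_k = \frac{\alpha(\hat{F} - \hat{Z})}{F_{s_k} - Z_{s_k}}$ file-fragments, where $\alpha$ is a normalizing coefficient guaranteeing $D_k$ is an integer for every user $k$ (for example, we may set $\alpha$ to be the least common multiple of all $F_{s_k} - Z_{s_k}$). We use $W^{q}_{f_s}(s_k)$, $q \in [D_k]$, to represent the file-fragments resulting from subfile $W_{f_s}(s_k)$.
Algorithm 2 (computation of the file-fragment index matrices $\{\mathbf{P}_k\}$ from the FM matrices $\{\mathbf{G}_k\}$):
    for all $k \in [K]$ do
        ...
        for all $\omega \in [F_{s_k}]$ do
            ...
            if $g_{\chi,\omega,k} > 0$ then
                ...
                break
    return $\{\mathbf{P}_k\}$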
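The fragment counts can be computed with simple integer arithmetic. The sketch below assumes that $D_k$ is chosen so that the $\alpha(\hat{F} - \hat{Z})$ fragment slots of user $k$ exactly cover its $F_{s_k} - Z_{s_k}$ missing subfiles, i.e., $D_k = \alpha(\hat{F} - \hat{Z})/(F_{s_k} - Z_{s_k})$, and it picks the smallest $\alpha$ that makes every $D_k$ an integer. The function name is hypothetical, and the $(F_{s_k}, Z_{s_k})$ pairs in the usage line are inferred from the subfile sizes quoted in Example 2 rather than given explicitly in the text.

```python
from functools import reduce
from math import gcd


def fragment_counts(F_hat, Z_hat, F_Z_per_user):
    """Return (alpha, [D_k]) with D_k = alpha * (F_hat - Z_hat) / (F_k - Z_k)."""
    slots = F_hat - Z_hat  # number of PTVs in which each user appears
    # alpha must be a multiple of missing_k / gcd(slots, missing_k) for every user.
    factors = [(F - Z) // gcd(slots, F - Z) for F, Z in F_Z_per_user]
    alpha = reduce(lambda a, b: a * b // gcd(a, b), factors, 1)  # least common multiple
    D = [alpha * slots // (F - Z) for F, Z in F_Z_per_user]
    return alpha, D


# Assumed (F_{s_k}, Z_{s_k}) pairs for the four users, with F_hat = 12 and Z_hat = 3.
alpha, D = fragment_counts(12, 3, [(12, 3), (4, 2), (4, 3), (4, 2)])
```

With these assumed inputs each user ends up with $D_k (F_{s_k} - Z_{s_k}) = \alpha(\hat{F} - \hat{Z})$ fragments in total, which is the balance the mapping needs. Any common multiple of the computed $\alpha$ would work equally well, at the cost of finer fragmentation.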
After the division of subfiles into file-fragments, the transmission vectors $x(n)$ are obtained from the PTVs $\hat{x}(n)$ by replacing each subfile with a concatenation of $\alpha$ file-fragments,

$$x_k = \big\Vert_{m \in [\alpha]} W^{q(W_{P_k(n,m)}(s_k))}_{P_k(n,m)}(s_k), \tag{10}$$

where $\Vert$ denotes bit-wise concatenation, $P_k$ is the $N \times \alpha$ user-specific file-fragment index matrix (explained shortly), and $q(W_{P_k(n,m)}(s_k))$ represents the file-specific counter corresponding to the $q$'th fragment of the file $W_{P_k(n,m)}(s_k)$. After initializing with $q = 1$, it is incremented by one each time a fragment of $W_{P_k(n,m)}(s_k)$ is assigned to the transmission vector. Finally, $x(n)$ is expressed as the superposition of the beamformed messages $x_k$, $k \in U(n)$.

Matrix elements $P_k(n,m) \in [F_{s_k}]$ are designed such that 1) all the missing subfiles are delivered to all the users and 2) the cache-aided interference cancellation is performed correctly. To build $P_k$, we first form $K$ user-specific File-Mapping (FM) matrices $G_k$, $\forall k \in [K]$, with size $\hat{F} \times F_{s_k}$, to map missing subfile indices to temporary PTV indices. The element in the $i$'th row and $j$'th column of $G_k$, denoted by $g_{i,j,k}$, represents the number of file-fragments of the subfile $W_j(s_k)$ that should be included in the concatenation process while building $x(n)$ in (10), if the corresponding PTV $\hat{x}(n)$ includes $W_i(s_k)$ (recall that $j \in [F_{s_k}]$ and $i \in [\hat{F}]$). Accordingly, the FM matrices $G_k$, $k \in [K]$, are defined as follows:

C3. Based on Definition 1, each temporary index $i \in \check{N}_k$ appears only once in PTVs. However, every user $k$ needs a total number of $(F_{s_k} - Z_{s_k}) D_k = \alpha(\hat{F} - \hat{Z})$ file-fragments. Thus, $\alpha$ file-fragments must be considered for the concatenation process in (10), i.e., $\sum_{j \in N_k} g_{i,j,k} = \alpha$, $\forall i \in \check{N}_k$.

C4. As discussed above, each missing subfile of user $k$ is divided into $D_k$ file-fragments. To make sure that all these file-fragments are delivered, the sum of the elements in every column of $G_k$ that is included in the RIS must be equal to $D_k$, i.e., $\sum_{i \in \check{N}_k} g_{i,j,k} = D_k$, $\forall j \in N_k$.

The conditions C1-C4 in Definition 3 constitute a system of equations that needs to be solved to obtain the matrix $G_k$. The details of creating and solving this system of equations are provided in Appendix A.

Algorithm 3 (excerpt)
Form $G_k$ based on $Q$
Compute $P_k$ based on $G_k$ using Algorithm 2
for all $n \in [N]$ do
  $x(n) \leftarrow 0$
  for all $k \in U(n)$ do
    $x_k \leftarrow 0$
    for all $m \in [\alpha]$ do
      $p \leftarrow P_k(n,m)$
      $q(W_p(s_k)) \leftarrow q(W_p(s_k)) + 1$
      $\chi \leftarrow q(W_p(s_k))$
      ...
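The counter-based assignment of file-fragments to transmissions can be sketched as follows. This is only an illustration of the bookkeeping, assuming a toy index matrix with $\alpha = 3$ and $D_k = 2$; the function name `assign_fragments` and the example values are hypothetical, and the counters start at zero so that the first fragment of each subfile receives index 1.

```python
from collections import defaultdict


def assign_fragments(P_k):
    """Map each transmission n to the (subfile, fragment) pairs it carries.

    P_k : dict {n: [p_1, ..., p_alpha]} holding the subfile indices
          P_k(n, m) for the transmissions that serve user k (assumed input).
    """
    counters = defaultdict(int)  # one fragment counter per subfile index p
    messages = {}
    for n in sorted(P_k):
        fragments = []
        for p in P_k[n]:
            counters[p] += 1                   # advance the counter of W_p
            fragments.append((p, counters[p])) # fragment chi of subfile p
        messages[n] = fragments                # concatenated to build x_k
    return messages, dict(counters)


# Toy example: two transmissions, alpha = 3 fragments per message.
P_k = {0: [1, 1, 2], 1: [2, 3, 3]}
messages, counters = assign_fragments(P_k)
```

After the last transmission, each counter equals $D_k$, i.e., every fragment of every missing subfile has been scheduled exactly once.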

B. Weighted Max-Min Beamforming
In this section, we explain the optimal design of the precoding vectors $v_k(n)$ in (6) to enhance the finite-SNR performance. Unlike the conventional max-min beamforming problem addressed in previous works such as [28], [31], [53], we consider a weighted max-min (WMM) beamforming approach to enable multi-rate transmission. In this regard, we slightly modify the iterative approach presented in [53] to account for the weighted beamforming requirements, wherein the weights reflect the non-uniform quantities of data transmitted to different users. To this end, we first briefly review the delivery process.
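As discussed in Section IV, the proposed scheme serves all the users using $N$ transmission vectors $x(n)$. Every vector $x(n)$ comprises $|U(n)|$ data terms $x_k$ and the same number of beamforming vectors $v_k(n)$, as represented in (1). Hence, the corresponding received signal at user $k$ in the $n$'th transmission can be rewritten as in (12). In (12), each intended message $x_k$ contains fresh data for user $k$ with size $c_k = \frac{\alpha}{D_k F_{s_k}}$. Recall that $x_k$ comprises $\alpha$ file-fragments, where each fragment is $\frac{1}{D_k}$ of a subfile and each subfile is $\frac{1}{F_{s_k}}$ data units in size. Note that the underlined terms in (12) are removed from the received signals by utilizing cache memories and estimating the equivalent channels $h_k^H v_j(n)$ prior to the decoding process. Consequently, these terms are not regarded as interference. So, the received SINR $\gamma_k$ at user $k$ is the ratio of the intended signal power $|h_k^H v_k(n)|^2$ to the sum of the remaining (non-cancelled) interference power and the noise power $N_0$. Moreover, the required time to deliver $x_k$ is calculated as $T_k = \frac{c_k}{\log(1 + \gamma_k)}$. Thus, the total delivery time to send $x(n)$ is $T(n) = \max_{k \in U(n)} T_k$.

Now, since we aim to minimize the delivery time, the beamformer optimization problem is formulated as $\min_{\{v_k\}} \max_{k \in U(n)} T_k$, or equivalently as $\max_{\{v_k\}} \min_{k \in U(n)} \frac{1}{c_k} \log(1 + \gamma_k)$, where $\log(1 + \gamma_k)$ is the dedicated rate to user $k$. Hence, the weighted minimum rate maximization for a given transmission $x(n)$ can be formulated as

$$\max_{\{v_k(n)\}} \ \min_{k \in U(n)} \ \frac{1}{c_k} \log(1 + \gamma_k).$$

Now, an equivalent objective can be achieved by using $(1 + \gamma_k)^{\frac{1}{c_k}}$, and we can express the weighted max-min (WMM) beamforming problem as

$$\max_{\{v_k(n)\},\, \bar{\gamma}} \ \bar{\gamma} \quad \text{s.t.} \quad \bar{\gamma}^{c_k} - 1 \le \gamma_k, \ \forall k \in U(n), \qquad \sum_{k \in U(n)} \|v_k(n)\|^2 \le P_T, \tag{15}$$

where $P_T$ is the transmit power and $\bar{\gamma} = \exp(\gamma)$, with $\gamma$ denoting the weighted minimum rate. The quasiconvex problem (15) is similar to the one discussed in [53], but with additional convex constraints $\bar{\gamma}^{c_k} - 1 \le \gamma_k$, $\forall k \in U(n)$.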
This problem can be optimally solved by conducting a bisection search over $\bar{\gamma}$ and applying the Lagrangian duality (LD) scheme in [53] for a fixed $\bar{\gamma}$. The LD scheme is an iterative fast-beamforming method used for linear-beamformer design in the literature (cf. [53] and the references therein). Since it has already been thoroughly described in [53], we only briefly review the LD scheme and its modifications for our WMM optimization. Specifically, we iteratively determine the optimal value of $\bar{\gamma}$ using a bisection search, where $\gamma_k = \bar{\gamma}^{c_k} - 1$. Once $\bar{\gamma}$ is fixed, we employ the fixed-point iteration method of [53] to obtain the dual variables $\nu_k$, $\forall k \in U(n)$. In the initial step, we set $\nu_k \leftarrow \nu_k^{[1]}$. Subsequently, we iteratively update the dual variables until the desired level of convergence is attained: at each iteration $\tau + 1$, the dual variable $\nu_k$ is updated according to (16), whose matrix term includes the noise contribution $N_0 I$. To determine the power vector of the beamformers, denoted by $p$, we follow the same steps as presented in [53, Eq. (26)]. Note that $\bar{\gamma}$ is updated based on $p$ to satisfy the power constraint $P_T$. Hence, the set of optimal downlink beamformers is computed from $p$, where $p_k$ is the $k$'th entry of $p$.
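The outer bisection over $\bar{\gamma}$ can be illustrated with a deliberately simplified sketch. It assumes that inter-user interference is absent (e.g., already nulled), so that the minimum power needed for the SINR target $\gamma_k = \bar{\gamma}^{c_k} - 1$ is simply $\gamma_k N_0 / g_k$ for an effective channel gain $g_k$. This closed-form feasibility check stands in for the LD fixed-point iteration of [53], which is not reproduced here; the function names and numerical values are hypothetical.

```python
import math


def required_power(gamma_bar, c, gains, noise=1.0):
    """Total power needed to meet gamma_k = gamma_bar**c_k - 1 for all users,
    under the simplifying assumption of interference-free effective links."""
    return sum((gamma_bar ** ck - 1.0) * noise / g for ck, g in zip(c, gains))


def wmm_bisection(c, gains, p_total, noise=1.0, iters=200):
    """Largest gamma_bar whose SINR targets fit in the power budget."""
    lo, hi = 1.0, 2.0
    while required_power(hi, c, gains, noise) <= p_total:
        hi *= 2.0                      # grow the bracket until infeasible
    for _ in range(iters):             # fixed iteration count, always halts
        mid = 0.5 * (lo + hi)
        if required_power(mid, c, gains, noise) <= p_total:
            lo = mid                   # feasible: try a larger target
        else:
            hi = mid                   # infeasible: shrink the target
    return lo


c = [1.0, 0.5]        # hypothetical data sizes c_k
gains = [1.0, 0.1]    # hypothetical effective channel gains
gamma_bar = wmm_bisection(c, gains, p_total=10.0)
rates = [ck * math.log(gamma_bar) for ck in c]   # log(1 + gamma_k)
times = [ck / rk for ck, rk in zip(c, rates)]    # T_k = c_k / rate_k
```

In this toy setting the user with the weaker link and the smaller $c_k$ receives a proportionally smaller rate, and all users finish at the same time $1/\log\bar{\gamma}$, which is the behaviour the weighted formulation is designed to produce.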
Remark 1. Similar to [24], the proposed WMM beamforming in (15) results in proportional rate allocation such that $\frac{\log(1 + \gamma_k)}{\log(1 + \gamma_j)} = \frac{c_k}{c_j}$ for all $k, j \in U(n)$. However, unlike [24], the proposed delivery scheme in this paper removes the need for successive interference cancellation (SIC) at the receiver. Herein, the SIC requirement is removed due to condition C3 in Definition 1, which prevents multiple message transmissions to a single user at a time. The proposed scheme is therefore more suitable for large networks, not only because of its simplified transceiver design, but also because of its reduced processing requirements for signal-level delivery, decreased subpacketization [43], and the ability to employ shared caching concepts [37], [38], [42].

Now, using Eq. (13), the total transmission time $T_T$ for the whole delivery process can be calculated as

$$T_T = \sum_{n \in [N]} \max_{k \in U(n)} \frac{c_k}{\log(1 + \gamma_k)}. \tag{17}$$

Note that the calculation of the total transmission time $T_T$ in (17) relies on knowing the beamformers for all $N$ transmissions, which in turn requires accurate information about user locations and channel conditions. However, during the placement phase, the actual user locations and channel states are unknown, rendering the computation of $T_T$ infeasible. To address this challenge, we adopt an approximation of $T_T$ for placement purposes that assumes a uniform access probability for all STUs. This approximation is in accordance with the findings presented in [24] and is based on the delivery in Section IV. Thus, the same approximation as in [24] is obtained in this paper, despite the utilization of different delivery and placement techniques.
Lemma 1. The total delivery time $T_T$ calculated in (17) can be approximated as in (18).

Proof. We first substitute $\log(1 + \gamma_k)$ in (17) with its upper bound $r(s_k)$ given in (2) to obtain (19). Then, using an inequality on the maximum over $r(s)$, we substitute the RHS of (19) with its upper bound, in which $\frac{K(\hat{F} - \hat{Z})}{N}$ is nothing but the sum-DoF [44]. Hence, we approximate $\frac{K(\hat{F} - \hat{Z})}{N}$ with its upper bound $Km + L$ [44], where $m = \min_{k \in [K]} m(s_k)$. Note that $m$ requires user location knowledge, which is unknown during the placement phase. Thus, replacing $m$ with its lower bound, (18) is achieved.
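For concreteness, the accumulation of the total delivery time over all transmissions can be sketched as below: each transmission lasts as long as its slowest user, and the per-transmission times add up. The natural logarithm and the input values are assumptions made for illustration only; `total_delivery_time` is a hypothetical helper name.

```python
import math


def total_delivery_time(transmissions):
    """Sum over transmissions of max_k c_k / log(1 + gamma_k).

    transmissions : list of lists of (c_k, gamma_k) pairs, one inner list
                    per transmission (assumed input format).
    """
    total = 0.0
    for users in transmissions:
        # A transmission ends when its slowest user has decoded its data.
        total += max(c / math.log(1.0 + g) for c, g in users)
    return total


# Hypothetical SINRs chosen so that the rates are 1 and 2 nats.
tx = [
    [(1.0, math.e - 1.0), (0.5, math.e - 1.0)],  # rates 1, 1 -> time 1.0
    [(2.0, math.e ** 2 - 1.0)],                  # rate 2     -> time 1.0
]
T_total = total_delivery_time(tx)
```

In the first transmission the two users are not rate-balanced, so the user with $c_k = 1$ dictates the duration; with WMM beamforming the ratios $c_k / \log(1 + \gamma_k)$ would instead coincide.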

V. SIMULATION RESULTS
To evaluate the proposed location-dependent scheme, we conduct numerical simulations in a scenario similar to the one studied in [24], albeit with a much larger number of users $K$. Specifically, we consider a 30 × 30 [m²] XR application environment, where a unique 3D image is required at each STU of size 1 × 1 [m²] to reconstruct a detailed FoV (resulting in a total of $S = 900$ STUs). A transmitter with $L$ antennas is located on the ceiling, 5 [m] above the floor, in the middle of the room. We assume that the small-scale fading of the channel vectors $h_k$ follows a Rayleigh distribution and use the path loss model of [24] for a user at STU $s \in [S]$, where $d_s$ represents the distance between the center of STU $s$ and the transmitter, $\eta = 3$ is the path-loss exponent, and $f$ denotes the frequency. To simulate the effect of randomly placed objects that can obstruct the propagation path between the transmitter and receivers, we use the term $\zeta \sim N(0, \sigma)$, where $\sigma$ is the standard deviation. Note that $\zeta$ is similar to the shadowing effect observed in outdoor propagation environments. We calculate the expected delivery rates $r(s)$ for the initial memory allocation in (4) by averaging over the rate values for all possible user locations and channel realizations in a given STU (cf. Eq. (2)). Without considering the shadowing effect $\zeta$, the transmit power is assumed to provide a 5 [dB] SNR at the room boundaries (unless mentioned otherwise). During the delivery phase, the optimized weighted max-min beamformers (15) are used, and users are assumed to be located at any STU with equal probability. Due to the stringent requirement of XR applications for a bounded delivery time, we use the 95-percentile of the expected delivery time as a key performance metric (see [24]). We evaluate four distinct placement and delivery schemes:
• Single-user placement, w/o CC: This scheme employs a memory allocation process similar to [21] that maximizes the local caching gain at bottleneck areas (equivalent to (4) when the denominator of the objective function is ignored). Data transmission is done with conventional unicast beamforming and without using any CC technique.
• Multi-user placement, w/o CC: This scheme employs the memory allocation process proposed in Section III, but data transmission is done with conventional unicast beamforming and without using any CC technique.
• Multi-user placement, w/ CC: This scheme employs the memory allocation process proposed in Section III, together with the location-dependent CC delivery scheme in Algorithm 3.
• Uniform placement, w/ CC: This scheme employs the conventional uniform memory allocation together with the CC delivery scheme proposed in [42], based on the shared caching idea.
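The geometry of this setup can be reproduced with a short sketch that computes the distance $d_s$ from the ceiling-mounted transmitter to the center of every STU. It assumes that receivers sit at floor level, and it evaluates only the distance-dependent term $10\eta\log_{10}(d_s)$ of a generic log-distance model; the constant and frequency-dependent terms of the model in [24] and the shadowing term $\zeta$ are not included. All function names are hypothetical.

```python
import math


def stu_distances(side_m=30, stu_m=1, height_m=5.0):
    """Distances from a ceiling transmitter at the room center to STU centers."""
    n = side_m // stu_m                  # STUs per side
    cx = cy = side_m / 2.0               # transmitter position (x, y)
    dists = []
    for ix in range(n):
        for iy in range(n):
            x = (ix + 0.5) * stu_m       # STU center coordinates
            y = (iy + 0.5) * stu_m
            dists.append(math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + height_m ** 2))
    return dists


def distance_loss_db(d, eta=3.0):
    """Distance-dependent attenuation term only (assumed log-distance form)."""
    return 10.0 * eta * math.log10(d)


distances = stu_distances()
# Attenuation gap between the farthest (corner) and nearest (central) STU.
spread_db = distance_loss_db(max(distances)) - distance_loss_db(min(distances))
```

Even without shadowing, the corner STUs are noticeably more attenuated than the central ones under these assumptions, which is the source of the location-dependent rates $r(s)$ used by the memory allocation.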
In all of these schemes, we use the shared caching approach described in [42] to construct a proper PDA satisfying the conditions outlined in Definition 1 (as well as Definition 2 for location-dependent CC schemes). All schemes are compared in terms of their delivery times for 500 random user drops, and the resulting cumulative distribution functions (CDFs) are shown in Fig. 3. As shown, the scheme with uniform placement has the largest variation in total delivery time, making it unsuitable for applications with real-time content requests (e.g., XR gaming). This variation occurs because uniform placement only maximizes the minimum global caching gain, leading to optimal performance when all users have good connectivity but poor performance when some users experience poor connectivity in bottleneck areas. A location-dependent placement (even without any CC technique) can avoid this issue and improve overall performance. Furthermore, we observe that when no CC technique is used, single-user placement outperforms multi-user placement due to its higher local caching gain. However, single-user placement is unsuitable for multicast CC transmission, as it does not allocate any cache to the content requested in locations with good connectivity, reducing the possibility of achieving any coded caching gain. Finally, the proposed CC-based transmission scheme with multi-user placement provides the best performance, as it enables a global caching gain while also avoiding wireless connectivity bottleneck areas.
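The two statistics reported in this section, the empirical CDF and the 95-percentile of the delivery time over random user drops, can be computed as in the following sketch. The nearest-rank percentile definition is one common convention and is an assumption here, since the exact estimator is not specified; the helper names are hypothetical.

```python
def empirical_cdf(samples):
    """Return sorted (value, cumulative probability) pairs."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def percentile_nearest_rank(samples, q):
    """q-th percentile (integer q in 1..100) using the nearest-rank rule."""
    xs = sorted(samples)
    # Integer ceiling of q * n / 100 avoids floating-point rounding issues.
    rank = max(1, (q * len(xs) + 99) // 100)
    return xs[rank - 1]


# Placeholder delivery times for 20 drops (not simulation results).
drops = list(range(1, 21))
p95 = percentile_nearest_rank(drops, 95)
cdf = empirical_cdf([3, 1, 2])
```

With 500 drops, as used here, the nearest-rank rule selects the 475th smallest delivery time.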
Figure 4 compares the performance of different schemes for various values of $\sigma$, which controls the attenuation intensity in different STUs. For small $\sigma$, the traditional uniform-placement method performs just as well as the proposed CC scheme with multi-user placement, and sometimes even better. This is because, with small $\sigma$, the variation in large-scale fading among STUs is small, and hence, non-uniform memory allocation is unnecessary since it reduces the minimum achievable coded caching gain. However, the proposed scheme is more effective (outperforming all other schemes) for larger $\sigma$, as there are more attenuated STUs with significant rate differences to well-conditioned STUs. In fact, in these cases, the rate improvement for individual users outweighs the DoF loss caused by the memory allocation process ($Km + L$ versus $K\frac{M}{S} + L$ for the uniform placement case).
Figure 5 compares the different schemes based on the SNR value at the room border. As depicted, the single-user cache placement scheme is the best option when the received SNR is very low. In this case, the achievable rate at different locations is highly diverse, making the local caching gain the most influential factor in reducing the overall delivery time. Conversely, when the transmit power is high enough for all locations to have similar achievable rates, the coded caching gain is the primary factor in reducing the overall transmission time. Therefore, uniform placement is optimal in this regime, as it maximizes the minimum achievable DoF.
Finally, Figure 6 compares the different schemes for different numbers of transmit antennas ($L$). The results show that when $L$ is large, the coded caching gain is less effective in improving overall performance since $L$ is the main contributor to the achievable DoF, i.e., $Km + L$. Thus, location-dependent schemes with no CC techniques perform almost as well as the proposed method with multicast CC transmission. However, when $L$ is relatively small, the coded caching gain is crucial in reducing the transmission time, and the proposed scheme is much more effective than the single-user cache placement case. Figures 7 and 8 support a similar conclusion. Figure 7 illustrates that the local caching gain is the most influential factor in reducing delivery time when the available memory is small, because the CC gain $Km$ is much smaller than the number of transmit antennas ($L$) in such scenarios. Figure 8 shows that the performance gap between the proposed method and the rest widens as the number of users increases, due to the higher achievable CC gain ($Km$) for larger $K$.
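The trade-off between the coded caching gain and the spatial multiplexing gain in the DoF expression $Km + L$ can be made tangible with a small numerical illustration. The parameter values below are hypothetical and are not taken from the simulations; `cc_share` is an illustrative helper name.

```python
def sum_dof(K, m, L):
    """Achievable DoF expression K*m + L used in the discussion."""
    return K * m + L


def cc_share(K, m, L):
    """Fraction of the DoF contributed by the coded caching gain K*m."""
    return (K * m) / sum_dof(K, m, L)


# Hypothetical values: K = 8 users, normalized cache ratio m = 0.25.
few_antennas = cc_share(8, 0.25, 2)    # K*m = 2, L = 2
many_antennas = cc_share(8, 0.25, 14)  # K*m = 2, L = 14
more_users = cc_share(24, 0.25, 2)     # K*m = 6, L = 2
```

Under these assumed values, the share of the DoF due to coded caching shrinks as $L$ grows and increases with $K$, which matches the trends described for Figures 6 and 8.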

VI. CONCLUSION
In this paper, we have proposed a cache placement and delivery scheme for location-dependent data requests, suitable for future collaborative wireless XR applications. Our scheme mitigates excessive delivery times through an efficient memory allocation process: a larger portion of memory is allocated to bottleneck areas based on the approximate rate differences between various locations in the application hall. Due to its simple transceiver design, reduced subpacketization and processing requirements, and the ability to employ shared caching concepts, the proposed scheme can be easily implemented in large networks with many users by utilizing a well-defined LAPDA structure. Numerical results demonstrated the superiority of the proposed approach in different scenarios, especially those with high channel variations and a large number of users, for which a bounded transmission time was ensured by minimizing the use of wireless resources in bottleneck areas. In the future, the proposed approach could be extended to include multiple transmitters, incorporate side information regarding user movements and STU transition probabilities, and examine more dynamic scenarios where users' cache content is updated as they navigate through the environment.

APPENDIX A FM MATRIX FORMATION
Section IV introduces the mapping process from the RIS $N_k$ to the TIS $\check{N}_k$ to serve user $k$ appropriately. To facilitate this mapping, $K$ FM matrices $G_k$ are defined. The element $g_{i,j,k}$ in $G_k$ specifies the number of file fragments $\{W^q_j(s_k), j \in N_k, q \in [D_k]\}$ that should be substituted for the temporary index $i \in \check{N}_k$. According to Algorithm 3 for the delivery process, the content transmitted with temporary index $i \in \check{N}_k$ must be available in the caches of the users in $U_i(n)$. On the other hand, based on the cache placement described in Section III, subfile $W_j(s)$ is cached by all users $k \in U_j(s)$, where $U_j(s) := \{k \mid k \in [K], Q_s(j,k) = *\}$. Hence, the value of $g_{i,j,k}$ is determined as follows. In cases where $U_i(n) \not\subseteq U_j(s_k)$ for some $j \in N_k$ and $i \in \check{N}_k$, there is at least one user in the set $U_i(n)$ that does not have $W^q_j(s_k)$ in its cache. This violation of condition C.4 in Definition 1 results in interference-limited transmission. Consequently, for such cases, no file fragment $W^q_j(s_k)$ should be substituted for the delivery index $i \in \check{N}_k$, i.e., $g_{i,j,k} = 0$.
Next, as discussed in Section IV, the total number of file-fragments substituted for a temporary index $i \in \check{N}_k$ must be equal to $\alpha$. Thus, the non-zero variables $g_{i,j,k}$ must satisfy

$$\sum_{j \in N_k:\ U_i(n) \subseteq U_j(s_k)} g_{i,j,k} = \alpha, \quad \forall i \in \check{N}_k. \tag{21}$$

Additionally, the total number of file fragments for each subfile $W_j(s_k)$ is equal to $D_k$. Therefore, the total number of file fragments $\{W^q_j(s_k), j \in N_k\}$ substituted for different temporary indices $\{i \mid U_i(n) \subseteq U_j(s_k), i \in \check{N}_k\}$ should sum up to $D_k$. In other words,

$$\sum_{i \in \check{N}_k:\ U_i(n) \subseteq U_j(s_k)} g_{i,j,k} = D_k, \quad \forall j \in N_k. \tag{22}$$

Consequently, to form each user-specific matrix $G_k$, the server needs to solve for the non-zero variables $\{g_{i,j,k}\}$ satisfying the $\hat{F} - \hat{Z}$ conditions in (21) and the $F_{s_k} - Z_{s_k}$ conditions in (22). Hence, to form the matrix $G_k$, the server needs to solve the following system of equations:

$$A_k y = b_k, \tag{23}$$

where $A_k \in \{0,1\}^{\phi_k \times \psi_k}$ is the coefficient matrix, $\phi_k = \hat{F} + F_{s_k} - (\hat{Z} + Z_{s_k})$ is the total number of conditions in (21) and (22), $\psi_k$ is the total number of non-zero variables, $y \in \mathbb{W}^{\psi_k}$ is the variable vector, and $b_k \in \{\alpha, D_k\}^{\phi_k}$ is the target vector. Depending on the user state $s_k$, the variable $y$ in (23) is given in one of the closed forms of [58], which distinguish the over-determined case from the case $\mathrm{Rank}(A_k) = \phi_k$ (the latter form involves a vector $y_0$). In case $\mathrm{Rank}(A_k) < \min\{\phi_k, \psi_k\}$, there are several methods (e.g., singular value decomposition and rank decomposition [58]) available to solve (23), which are beyond the scope of this study. It is worth noting that Definition 2 ensures that there is at least one $g_{i,j,k} > 0$ for every $i \in \check{N}_k$ in (21), and similarly for every $j \in N_k$ in (22). Consequently, these equalities can always be satisfied, and (23) admits a non-empty solution set.
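To illustrate how the row and column conditions translate into a linear system of the form $A_k y = b_k$, the sketch below builds the 0/1 coefficient matrix from a support mask that marks which pairs $(i, j)$ are allowed to be non-zero, and checks a candidate solution. The mask, the values $\alpha = 3$ and $D_k = 2$, and the candidate vector are hypothetical; the sketch only verifies a given solution and does not implement the closed-form solvers of [58].

```python
def build_system(support, alpha, d_k):
    """Build (variables, A, b) for the row-sum and column-sum conditions.

    support[i][j] is True when g_{i,j,k} may be non-zero, i.e., when the
    cache-availability condition holds for the pair (i, j) (assumed input).
    """
    rows, cols = len(support), len(support[0])
    variables = [(i, j) for i in range(rows) for j in range(cols) if support[i][j]]
    A, b = [], []
    for i in range(rows):   # one condition per temporary index: sum = alpha
        A.append([1 if vi == i else 0 for vi, _ in variables])
        b.append(alpha)
    for j in range(cols):   # one condition per missing subfile: sum = D_k
        A.append([1 if vj == j else 0 for _, vj in variables])
        b.append(d_k)
    return variables, A, b


def satisfies(A, b, y):
    """Check A y = b for an integer candidate vector y."""
    return all(sum(a * v for a, v in zip(row, y)) == t for row, t in zip(A, b))


# Toy case: 2 temporary indices, 3 missing subfiles, all pairs allowed.
support = [[True, True, True], [True, True, True]]
variables, A, b = build_system(support, alpha=3, d_k=2)
y = [2, 1, 0, 0, 1, 2]      # candidate G_k = [[2, 1, 0], [0, 1, 2]]
ok = satisfies(A, b, y)
```

Here the system has $\phi_k = 5$ conditions and $\psi_k = 6$ variables, and consistency requires $2\alpha = 3 D_k$, i.e., the total number of fragments must match on both sides of the mapping.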

APPENDIX B NON-INTEGER CODED CACHING GAINS
We follow a memory-sharing scheme similar to [19] for non-integer coded caching gains (i.e., when $Km(s)$ is not an integer). In this case, the delivery is done in two sub-phases based on time-sharing. For data delivery in the first sub-phase, we consider two cases: 1) $\lfloor Km(s_k) \rfloor > \lfloor Km(s_{k^*}) \rfloor$ and 2) $\lfloor Km(s_k) \rfloor = \lfloor Km(s_{k^*}) \rfloor$.

Case 1 ($\lfloor Km(s_k) \rfloor > \lfloor Km(s_{k^*}) \rfloor$): Since in this case, for every user $k$, the placement is done differently for $W^1(s_k)$ and $W^2(s_k)$, the index mapping process is also performed separately for $W^1(s_k)$ and $W^2(s_k)$. To this end, during the first delivery sub-phase, the $\beta_1$ portion of every subfile of $W^1(s_k)$ is divided into $D^1_k$ smaller fragments, where $\hat{F}$ and $\hat{Z}$ are equivalent to $F_{s_{k^*}}$ and $Z_{s_{k^*}}$, respectively. Similarly, the $\beta_1$ portion of every subfile of $W^2(s_k)$ is divided into $D^2_k = \alpha \frac{\hat{F} - \hat{Z}}{F_{s_k} - Z_{s_k}}$ smaller fragments. Then, based on $Q^1$ and the placement arrays $Q(s_k)$ of the two file parts, two file-fragment matrices $P^1_k$ and $P^2_k$ are formed for $W^1(s_k)$ and $W^2(s_k)$, respectively. Using $P^1_k$ and $P^2_k$, each transmitted message to user $k$ carries $\alpha$ file-fragments of the $\beta_1$ portion of $W^1(s_k)$ and $\alpha$ file-fragments of the $\beta_1$ portion of $W^2(s_k)$. The remaining $\beta_2$ portion of $W^1(s_k)$ and $W^2(s_k)$ is delivered in the second sub-phase, using $Q^2$ and the placement arrays $Q(s_k)$ to form $P^1_k$ and $P^2_k$. For the sake of brevity, we avoid reviewing the similar process in the second delivery sub-phase.

Case 2 ($\lfloor Km(s_k) \rfloor = \lfloor Km(s_{k^*}) \rfloor$): In this case, the $\lfloor Km(s_k) \rfloor + 1 - Km(s_k)$ portion of $W(s_k)$ (i.e., $W^1(s_k)$) is already cached based on $Q^1$ and can be easily delivered by forming $P^1_k$. The remaining $\beta_1 - (\lfloor Km(s_k) \rfloor + 1 - Km(s_k))$ portion of $W(s_k)$ can also be delivered in the first delivery sub-phase based on $Q^1$ and $Q(s_k)$. In this regard, the first $\beta_1 - (\lfloor Km(s_k) \rfloor + 1 - Km(s_k))$ portion of every subfile of $W^2(s_k)$ is divided into $D^2_k = \alpha \frac{\hat{F} - \hat{Z}}{F_{s_k} - Z_{s_k}}$ smaller fragments. Then, using $P^1_k$ and $P^2_k$, each transmitted message to user $k$ carries $\alpha$ file-fragments of $W^1(s_k)$ and $\alpha$ file-fragments of the $\beta_1 - (\lfloor Km(s_k) \rfloor + 1 - Km(s_k))$ portion of $W^2(s_k)$. In the second sub-phase, the remaining portion of $W(s_k)$, i.e., $Km(s_k) - \lfloor Km(s_k) \rfloor - \beta_1 - \lfloor Km(s_k) \rfloor +$

TABLE I: Location-specific rate and memory allocation for Example 1.
a LAPDA according to Definition 2, and hence, can be utilized for location-dependent data delivery.In this regard, using Q 1 -Q 5 for the data placement, files W (1) and W (5) are first divided into 12 subfiles, from which 3 are cached in each user's memory.Similarly, files W (2) and W (4) are divided into 4 subfiles, and 2 of them are cached in the memory of each user.Finally, the file W (3) is divided into 4 subfiles, from which 3 are cached in each user's memory.