Gambling on Reconfigurable Intelligent Surfaces

We consider multi-operator wireless networks where broadband reconfigurable intelligent surfaces (RISs) effectively cover the transmission bands of all operators. These RISs are supplied by a dedicated provider and dynamically leased on-demand to individual operators to support their transmissions. When an operator takes control of a RIS, it can adjust its phase-response to meet the requirements of its users. This sets the stage for a competitive scenario where operators vie for control of RISs. To address this competition, we introduce an auction format designed to efficiently allocate RISs to operators. Furthermore, we develop a multi-agent reinforcement learning environment to optimize operators’ bidding strategies, demonstrating its superiority over the heuristic dominant strategy of greedy bidding.


Stefan Schwarz, Senior Member, IEEE

I. INTRODUCTION
In recent years, RISs have demonstrated their ability to enhance the performance of wireless networks across various aspects. They have the capability to improve the capacity, coverage, and energy efficiency of wireless networks [1], [2], [3]. This becomes particularly crucial in mitigating shadowing effects at higher carrier frequencies. Additionally, RISs can enhance the secrecy of wireless transmissions [4] and support wireless localization [5].
In this study, our focus is on incorporating RISs into multi-operator wireless networks. Given their capacity for achieving wide-band transmission [6], RISs can effectively cover the frequency bands of multiple operators. This presents a coexistence challenge, as a RIS response setup suitable for one operator may compromise the performance of another. This challenge is also highlighted in [7], where the so-called bandwidth-of-influence, i.e. the frequency band over which a RIS significantly impacts incident signals, of several RIS types is experimentally characterized, showing a strong influence of the applied technology. Multi-operator RIS deployments are also compromised by inter-operator pilot contamination [8].
To overcome this coexistence challenge, we consider a network model where RISs are not owned by individual operators but by a dedicated RIS provider/controller. This provider sells on-demand leases of RISs in a free market to individual operators, supporting transmissions within their networks. This dynamic introduces a unique competitive landscape, which has not been investigated before, where operators actively compete for control over RISs. We propose the implementation of an auction format tailored to efficiently allocate RISs among operators and develop a multi-agent reinforcement learning (RL) framework to dynamically optimize operators' bidding strategies. Through our experiments, we substantiate the effectiveness of this approach, showcasing its superiority over the heuristic dominant strategy of greedy bidding. This not only enhances the overall efficiency of RIS allocation but also underscores the adaptability achievable through the integration of RL into the auction process for multi-operator RIS-assisted wireless networks. The proposed auction format could be realized within the RIS-enabled smart wireless environment architecture of [7]. RL has previously proven effective in optimizing the phase-response of RIS alongside beamforming weights for a base station within a single-operator setup [9]. Conversely, our study demonstrates that RL can effectively direct the allocation of RISs in a multi-operator environment.
Manuscript received 4 January 2024; accepted 28 January 2024. Date of publication 31 January 2024; date of current version 11 April 2024. The authors acknowledge TU Wien Bibliothek for financial support through its Open Access Funding Programme. The associate editor coordinating the review of this letter and approving it for publication was A. Celik.
Digital Object Identifier 10.1109/LCOMM.2024.3360477
Notation: The complex Gaussian distribution with mean $\mu$ and covariance $\mathbf{C}$ is $\mathcal{CN}(\mu, \mathbf{C})$. The indicator of $a$ is $\mathbb{1}(a)$, i.e. $\mathbb{1}(a) = 1$ if $a \neq 0$ and zero otherwise. The phase of complex number $z$ is $\arg(z)$. The expected value of random variable $r$ is $\mathrm{E}(r)$.

II. SYSTEM MODEL
We consider the mobile networks of $N_O$ operators, each serving the same geographic region through its own base stations, in separate and non-interfering bands centered around operator-specific carrier frequencies. We assume that users and base stations are equipped with single antennas. We consider orthogonal frequency division multiple access (OFDMA) transmissions at each base station, so that transmissions are only impaired by inter-cell interference. These assumptions may appear simplistic; however, they are not essential for the auction problem described later and should be regarded as illustrative: they simplify the formulation while maintaining relevance.
Under these assumptions, we obtain a frequency-flat per-subcarrier single-input single-output (SISO) down-link input-output relationship for user $u$ served by operator $o$, in which $d_u$ denotes the index of the base station serving user $u$, $h_{u,\ell}$ is the channel between the user and base station $\ell$, and $n_u \sim \mathcal{CN}(0, \sigma_n^2)$ is noise.
Within the considered region, $N_R$ RISs assist the transmissions between base stations and users. We consider the most basic idealized model for the interaction of RISs with wireless signals through frequency-flat diagonal phase-response matrices. While more accurate models exist [10], incorporating them would obscure the main focus of this work. With appropriate modifications of the RIS response and the RIS-assisted channels defined below, the proposed setup can also be generalized to more advanced RIS technologies, such as intelligent omni-surfaces [11], stacked intelligent metasurfaces [12] and non-diagonal RIS; we leave this for future work.
The RISs are owned by an independent RIS provider, who offers on-demand leases of the RISs to the highest-bidding operator. We assume that the RISs are sufficiently broadband to effectively cover the transmission bands of all operators. If an operator takes control of a RIS, it can set the phase-response of the RIS to maximize the performance of its users. In particular, considering RIS $r$ consisting of $M$ discrete reconfigurable elements, the RIS response is governed by the diagonal matrix $\Phi_r = \mathrm{diag}\left(e^{j\phi_1^{(r)}}, \ldots, e^{j\phi_M^{(r)}}\right) \in \mathbb{C}^{M \times M}$.
Although the operators transmit in separate, non-overlapping frequency bands, sharing broadband RISs still leads to a mutual coupling effect between them. In particular, the optimization of the RIS response for one operator can negatively impact the performance of another operator. For instance, it may result in increased inter-cell interference for the non-controlling operator or create destructive multi-path interference for users' channels.
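The users' channels are comprised of a direct component and a RIS-assisted component. The RIS-assisted component can further be written in terms of the RIS responses $\Phi_r$ and the channels $h_{u,r}, h_{r,\ell} \in \mathbb{C}^{M \times 1}$, which denote the vector-valued user-to-RIS and RIS-to-base station channels, respectively.
The users' signal to interference and noise ratios (SINRs) $\beta_u^{(o)}$ and achievable rates $r_u^{(o)}$ are functions of the instantaneous channel gains $|h_{u,\ell}|^2$.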
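To make the channel composition concrete, the following minimal Python sketch evaluates a RIS-assisted SISO link and the resulting SINR and rate. It assumes the cascaded form $h_{u,r}^T \Phi_r h_{r,\ell}$ for the RIS-assisted component, equal transmit power at all base stations, and inter-cell interference treated as noise. The function names and the simple co-phasing rule are illustrative assumptions and not the implementation used for the simulations.

```python
import cmath
import math


def ris_assisted_channel(h_ur, phases, h_rl):
    """Cascaded channel via one RIS: sum_m h_ur[m] * exp(j*phi_m) * h_rl[m]."""
    return sum(a * cmath.exp(1j * p) * b for a, p, b in zip(h_ur, phases, h_rl))


def composite_channel(h_direct, ris_links):
    """Direct component plus the contributions of all RISs.

    ris_links is a list of (h_ur, phases, h_rl) tuples, one per RIS.
    """
    return h_direct + sum(ris_assisted_channel(*link) for link in ris_links)


def align_phases(h_ur, h_rl, h_direct):
    """Co-phase every RIS element with the direct path (assumed optimization)."""
    ref = cmath.phase(h_direct) if h_direct != 0 else 0.0
    return [ref - cmath.phase(a * b) for a, b in zip(h_ur, h_rl)]


def sinr_and_rate(h_serving, h_interferers, tx_power, noise_power):
    """SINR with interference treated as noise, and the rate log2(1 + SINR)."""
    signal = tx_power * abs(h_serving) ** 2
    interference = sum(tx_power * abs(h) ** 2 for h in h_interferers)
    sinr = signal / (noise_power + interference)
    return sinr, math.log2(1.0 + sinr)
```

With aligned phases all RIS elements add up in phase with the direct path, so the controlling operator sees a larger channel gain than with an arbitrary RIS configuration. The same functions can be applied to interfering base stations to observe how a configuration chosen for one link changes the interference seen on another.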

III. AUCTION-BASED RIS ALLOCATION
We now develop an auction-based allocation of RISs to operators. To conduct this auction, operators must be able to estimate the utility of a RIS allocation, i.e. how much a given allocation improves the performance of their networks. This estimation cannot be based on perfect channel knowledge, since these channels can only be accurately estimated once the RISs have been assigned and pilot signals have been transmitted. To obtain a coarse utility estimate, we first derive SINR and rate expressions that are based only on macroscopic channel properties which are relatively easy to observe. Based on these utility estimates, we then develop a low-complexity simultaneously ascending auction format in Section III-B.
A. Utility Estimation
a) Utility function: Consider bidding for a subset $\mathcal{R} \subseteq \{1, \ldots, N_R\}$ of RISs. To determine the appropriate bid amount for this specific subset, each operator must assess the utility or valuation of this subset. Hence, we establish a utility function (5) by calculating the sum of exponentiated rates for the operator, where $r_u^{(o)}(\mathcal{R})$ is an estimate of the rate achieved by user $u$ when the RISs in $\mathcal{R}$ are controlled by the operator. We express the utility as the percentage improvement compared to allocating no RISs to the operator. The parameter $\alpha \in (0, \infty)$ tunes the fairness of user rates. When $\alpha$ approaches 0, the operator prioritizes higher user rates, whereas as $\alpha$ approaches infinity, the emphasis shifts towards equalizing the rates of all users.
b) Channel model: To illustrate the estimation of the rate $r_u^{(o)}(\mathcal{R})$, we adopt a specific simple geometry-based wireless channel model (6). In particular, consider a vector-valued channel $h \in \mathbb{C}^{M \times 1}$ of an arbitrary link between two nodes (users, base stations, RISs); for SISO channels we just set $M = 1$. In this model, $\gamma$ denotes the distance-dependent macroscopic path loss. It also depends on the propagation conditions of the link, i.e. whether it is in line-of-sight (LOS) or non-line-of-sight (NLOS); in the simulations these factors are determined for each link individually. The Rician K-factor serves to linearly combine a directional path $e^{j\varphi} a(\theta)$ with a random scattering component $g \sim \mathcal{CN}(0, I_M)$. Here, $a(\theta)$ is the RIS response vector w.r.t. a plane wave in angular direction $\theta$, and $\varphi$ is a propagation distance/delay dependent phase shift. For links related to non-serving base stations, we assume that $\varphi$ follows a uniform distribution $\varphi \sim \mathcal{U}(0, 2\pi)$, whereas, for channels associated with the serving base station, we set $\varphi = 0$ assuming perfect synchronization. We consider different K-factors depending on the LOS/NLOS conditions of the link. This model is applied to all channels in (2), (3).
c) SINR estimation: To estimate the SINR and rate of user $u$, we replace the instantaneous channel gains $|h_{u,\ell}|^2$ in (4) with their expected values. For this, we assume that $\Phi_r$ is optimized for the user if the RIS is controlled by the corresponding operator, $r \in \mathcal{R}$, whereas it is random otherwise. The optimization (7) only accounts for the directional part in (6), because the scattering part varies too quickly to adjust the RIS for it.
For channels $h_{r,\ell}$ between base stations and RISs, we assume that the directional path is dominant, i.e. the K-factor is so large that the Gaussian scattering component can be neglected. This is justified considering that RISs are intentionally placed by the RIS provider to support the operators' base stations. Under these assumptions, the expected channel gain of the intended signal decomposes into three terms, $p_c$, $p_i$ and $p_u$. Here, $p_c$ contains the coherently combined signals received over the directional components of the direct and RIS-assisted channels. Term $p_i$ contains the incoherently combined signals received over the Gaussian scattering components. These are incoherently combined, because the RIS responses $\Phi_r$ of controlled RISs $r \in \mathcal{R}$ only compensate for the phase shifts of the directional components. Finally, $p_u$ is the incoherent combination of signals received from RISs not controlled by the operator. These are incoherently combined, because their RIS response matrices $\Phi_r$, $r \notin \mathcal{R}$, are random.
Equivalent expressions are derived for the channels of interfering signals. A difference arises in the coherently combined signal power $p_c$. This is due to the fact that the RIS responses $\Phi_r$, optimized in (7) for the serving base station direction $a(\theta_{r,d_u})$, generally do not align with the channels of interferers $a(\theta_{r,i})$. This misalignment reduces the coherently combined power of the interfering signals. All necessary terms for computing average signal powers can thus be derived from macroscopic channel properties. Substituting into (4) results in $r_u^{(o)}(\mathcal{R})$.
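The difference between coherent and incoherent combining can be illustrated numerically. The sketch below draws a Rician channel in the usual form $h = \sqrt{\gamma}\,(\sqrt{K/(K+1)}\, e^{j\varphi} a(\theta) + \sqrt{1/(K+1)}\, g)$ and compares the average gain of a purely directional RIS cascade with aligned versus random phases. The half-wavelength uniform linear array used for $a(\theta)$, the normalization of the K-factor weights and all function names are assumptions made for illustration only.

```python
import cmath
import math
import random


def steering_vector(theta, m, spacing=0.5):
    """Plane-wave response of an assumed uniform linear array (spacing in wavelengths)."""
    return [cmath.exp(2j * math.pi * spacing * k * math.sin(theta)) for k in range(m)]


def rician_channel(gamma, k_factor, theta, phi, m, rng):
    """Draw one realization of the Rician vector channel with path loss gamma."""
    los_w = math.sqrt(k_factor / (k_factor + 1.0))
    nlos_w = math.sqrt(1.0 / (k_factor + 1.0))
    a = steering_vector(theta, m)
    h = []
    for k in range(m):
        # Unit-variance circularly symmetric complex Gaussian scattering sample.
        g = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
        h.append(math.sqrt(gamma) * (los_w * cmath.exp(1j * phi) * a[k] + nlos_w * g))
    return h


def mean_cascade_gain(m, aligned, trials, rng):
    """Average |sum_m exp(j(phi_m + psi_m))|^2 for unit-modulus directional paths.

    psi_m are the cascade phases of the directional components; phi_m are the
    RIS phases, either compensating psi_m (aligned) or drawn uniformly at random.
    """
    total = 0.0
    for _ in range(trials):
        psi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
        if aligned:
            phi = [-p for p in psi]
        else:
            phi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(m)]
        total += abs(sum(cmath.exp(1j * (a + b)) for a, b in zip(phi, psi))) ** 2
    return total / trials
```

For $M$ elements the aligned case yields a gain of $M^2$, whereas random phases yield a gain of about $M$ on average. This gap is what makes controlling a RIS valuable to an operator and what the utility estimate captures through the coherent and incoherent terms.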

B. Auction Format
The allocation of RISs to operators can be handled using a combinatorial auction, similar to spectrum auctions. However, our goal is to conduct this auction dynamically during live operation of the operators' networks, depending on demand. We cannot execute a complex combinatorial auction using a direct-revelation mechanism for valuations of all potential RIS subset allocations (e.g., employing the Vickrey-Clarke-Groves (VCG) mechanism [13]). This would demand excessive computation and communication overhead to exchange valuations with the auctioneer. Note that the valuation/utility (5) of a subset $\mathcal{R}$ is not simply the sum of the utilities of the individual RISs.
We therefore consider a low-complexity indirect auction mechanism, in particular a simultaneously ascending "Japanese" forward auction [14]. In round $t$ of the auction, the auctioneer (RIS provider) sets a uniform price $p_t > p_{t-1}$ for each RIS, starting from an initially low price $p_0$. In each round, the price is increased by a fixed increment $\Delta_p = p_t - p_{t-1}$. The bidders (operators) submit their bids in the form of binary vectors $b^{(o)} \in \{0, 1\}^{N_R}$, indicating their willingness to pay the current price for specific RIS units. If a particular RIS unit receives a bid from only one operator, the auctioneer accepts the bid, and the RIS is allocated to that operator. If there are no further bids for a particular RIS, it remains unassigned and its response is set randomly. The auction concludes when all RISs are assigned or when there are no more bids. Furthermore, we implement a consistent bidding activity rule, meaning that operators are not allowed to bid for a particular RIS in round $t$ if they did not place a bid for the same RIS in round $t - 1$ (rule-defying bids are ignored by the auctioneer). This rule facilitates the identification of the preferences amongst operators.
Alternatively, a descending auction could be used, where the price starts high and falls until a bidder accepts it. However, ascending auctions are more bidder-friendly as every participant can observe the interest of other bidders from the beginning. Through our experiments, we have noticed that the RL agent learns more effectively using the adopted ascending format.
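The ascending auction rules of this section can be sketched in a few lines of Python. The sketch assumes that the first round is held at price $p_0$ and that bidders are given as callables returning the set of RISs they want at the current price; the function names and the bidder interface are hypothetical and only serve to illustrate the mechanism.

```python
def run_ascending_auction(n_ris, bidders, p0=0.1, delta_p=0.1, max_rounds=1000):
    """Simulate a simultaneous ascending auction with a uniform price per round.

    bidders maps an operator name to a callable
    bid(price, open_ris, won) -> iterable of RIS indices it is willing to pay for.
    Returns (allocation, prices) as dicts keyed by RIS index.
    """
    open_ris = set(range(n_ris))
    # Activity rule: an operator may only bid on RISs it bid on in the last round.
    eligible = {o: set(open_ris) for o in bidders}
    won = {o: set() for o in bidders}
    allocation, prices = {}, {}
    for t in range(max_rounds):
        if not open_ris:
            break
        price = p0 + t * delta_p
        bids = {}
        for o, bid in bidders.items():
            wanted = set(bid(price, frozenset(open_ris), frozenset(won[o])))
            bids[o] = wanted & eligible[o] & open_ris  # rule-defying bids are ignored
            eligible[o] = bids[o]
        for r in sorted(open_ris):
            takers = [o for o in bidders if r in bids[o]]
            if len(takers) == 1:
                # A sole bidder wins the RIS at the current price.
                allocation[r], prices[r] = takers[0], price
                won[takers[0]].add(r)
                open_ris.discard(r)
            elif not takers:
                # No bids left: the RIS remains unassigned.
                open_ris.discard(r)
    return allocation, prices


def threshold_bidder(valuations):
    """Bidder that stays in for a RIS while its valuation exceeds the price."""
    return lambda price, open_ris, won: {r for r in open_ris if valuations[r] > price}
```

With two threshold bidders, each RIS goes to the operator with the higher valuation at roughly the price at which the other operator drops out, which is the price-driving behavior of greedy bidding discussed later in the simulations.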
IV. BIDDING STRATEGIES
Consider round $t$ of the auction: we denote by $\mathcal{R}_{t-1}^{(o)}$ the set of RISs that have been allocated to operator $o$ in previous rounds. The set of remaining RISs is denoted by $\mathcal{R}_t$. Each bidder has to decide for which of the remaining RISs in $\mathcal{R}_t$ it is willing to pay the current price $p_t$. Therefore, each bidder has to estimate the value of adding RIS $r \in \mathcal{R}_t$ to its already allocated RISs $\mathcal{R}_{t-1}^{(o)}$. In principle, this requires determining the average utility of incorporating RIS $r$ alongside all combinations of the remaining RISs in $\mathcal{R}_t$. This is necessary because the operator lacks knowledge about which other RISs it might secure. However, for a larger number of RISs, the complexity of this combinatorial approach becomes impractical. Consequently, we simplify the calculation by assessing the value of acquiring RIS $r$ with the assumption that it would be the sole additional RIS secured by the bidder. This yields the percentage value $V_t^{(o)}(r)$ in (9).

A. Heuristic Bidding
In the given auction setup, each bidder has a dominant strategy, namely to stay in the auction as long as the valuation of a RIS unit is higher than the price. To compare the percentage value in (9) to the price $p_t$, we introduce a constant $c_V^{(o)}$ that represents the maximum price the operator is willing to pay for 100 % improvement. Using this approach, we identify the set $\varrho^{(o)}$ of RISs that may be worth bidding on, i.e. those whose scaled value exceeds the current price. However, if the operator bids on all of them, it risks surpassing its available budget. Let $B_t^{(o)}$ denote the available budget of operator $o$ at round $t$. Without a strict budget cap, the operator may bid on more items, knowing it may not secure all of them. Nevertheless, in our simulations we apply a strict budget cap and have found that the conservative approach yields the highest reward as defined below.
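A minimal Python sketch of the conservative greedy rule is given below. It assumes that values are expressed as fractions (1.0 corresponding to 100 % improvement), that a value of zero marks an unavailable RIS, and that at most $\lfloor B_t^{(o)} / p_t \rfloor$ units are affordable at the current price; the function name and data layout are hypothetical.

```python
import math


def greedy_bid(values, price, budget, c_v):
    """Conservative greedy bid for one auction round.

    values: dict mapping RIS index -> fractional utility improvement
            (1.0 = 100 %), with 0 marking RISs unavailable to this bidder.
    Returns the set of RIS indices to bid on.
    """
    # RISs whose monetary valuation c_v * value exceeds the current price.
    worth = [r for r, v in values.items() if v > 0 and c_v * v > price]
    # Strict budget cap: at most floor(budget / price) units are affordable.
    affordable = int(math.floor(budget / price)) if price > 0 else len(worth)
    # Keep only the most valuable affordable RISs.
    worth.sort(key=lambda r: values[r], reverse=True)
    return set(worth[:max(affordable, 0)])
```

As the price rises, both the set of RISs worth bidding on and the number of affordable units shrink, so the bidder gradually concentrates on its most valuable RISs before dropping out.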

B. RL-Based Bidding
As an alternative to this greedy approach, we can train RL agents for each operator to learn optimized bidding strategies. In our simulations, we investigate using the same agent for each operator individually, but there is also the possibility of training an RL agent to compete against other strategies. Below we specify the state, observation and action spaces, and the rewards utilized for RL training. Note that the observations are based only on the information available to each operator individually, without any exchange of information between them.
a) States: The state of our auction environment at time $t$ is determined by a set of environment variables.
b) Observations: The partial observation $O_t^{(o)}$ of operator/agent $o$ at time $t$ contains only quantities available to that operator, including the values $V_t^{(o)}(r)$. To maintain a fixed-size observation space, a requirement for existing algorithms, we assign a value $V_t(r)$ of 0 to RIS units that are no longer available. This applies to RISs that have already been allocated in previous rounds, as well as to RISs that the operator did not bid on in previous rounds, enforcing the mentioned activity rule. This approach also allows using the same trained agent in environments with varying numbers of RIS units. We achieve this by defining spaces large enough to cover a maximum number of RISs and setting values to zero if the actual number of RISs is less than the maximum allowed. However, it is advisable not to unnecessarily increase the complexity of the RL algorithm with larger spaces.
The reward includes the following penalty terms:
• Penalize invalid bids on RISs with value 0, where $c_P^{(o)}$ denotes a tunable punishment factor.
• Penalize bids that cause exceeding the available budget.
It is not possible to strictly enforce staying within the available budget with RL. Therefore, $c_P^{(o)}$ must be chosen sufficiently large to keep such occurrences to a minimum.
e) Implementation: We used the proximal policy optimization algorithm [15] of Stable-Baselines3 2.1.0 [16] to build our RL agent, using its default hyper-parameters. The multi-agent RL auction environment was realized using Gymnasium 0.29.0 [17] and PettingZoo 1.24.1 [18] combined with Supersuit 3.9.0 [19] to create a vector environment for multi-agent training. In such environments, it is necessary to normalize the continuous state and observation spaces to finite ranges. This requires some experimentation with the environment, since the possible range of values in (9) depends on the environment geometry, the number of RISs $N_R$, and the number of RIS elements $M$. Code is available at https://github.com/StefanSchwarzTUW.
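The zero-padding of the observation and the penalty structure of the reward can be sketched as follows. This is a simplified illustration in plain Python and not the environment from the linked repository: the first reward contribution (monetary value of the RISs won minus their cost) is an assumption, as are the function names and the choice of applying the same punishment factor to both penalties.

```python
def padded_values(values, available, max_ris):
    """Fixed-size value vector: zero for unavailable or non-existent RISs."""
    obs = [0.0] * max_ris
    for r, v in enumerate(values):
        if r in available:
            obs[r] = v
    return obs


def step_reward(bid, won, obs_values, price, budget, c_v, c_p):
    """Reward for one auction round (illustrative composition).

    bid: 0/1 list over all RIS slots; won: set of RIS indices won this round.
    """
    # Assumed first contribution: monetary value of the RISs won minus their cost.
    gain = sum(c_v * obs_values[r] - price for r in won)
    # Penalty for every bid placed on a slot whose observed value is zero.
    invalid = sum(1 for r, b in enumerate(bid) if b and obs_values[r] == 0.0)
    # Penalty if paying the current price for all bids would exceed the budget.
    over_budget = 1 if price * sum(bid) > budget else 0
    return gain - c_p * invalid - c_p * over_budget
```

Because unavailable RISs always show a value of zero, the agent can learn the activity rule from the penalty alone, and the same observation size works for any number of RISs up to the chosen maximum.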

V. SIMULATIONS
We consider transmissions in millimeter wave wireless networks of $N_O = 2$ operators. We consider a region of interest of 100 m² where we randomly place users. The base stations and RISs are arranged in a regular grid, with an additional layer of randomness introduced through a Gaussian distortion with a standard deviation of 20 meters. We assume that RISs are intentionally placed such that there is a LOS path between RISs and base stations (e.g., on top of buildings). We also assume that the direct paths between base stations and users are blocked, such that NLOS propagation conditions apply. For the channels between RISs and users, we consider a distance-dependent LOS probability as specified in Table I.
The RL agent undergoes training through episodes, each aligning with a distinct auction. The episode length is thereby variable, depending on the number of steps to complete the auction. In every episode, a fresh environment is created, complete with randomized positions and wireless channel realizations. Invalid actions are penalized by a punishment factor $c_P^{(o)}$ that is ten times larger than the largest possible value $V_t^{(o)}$. The agent is trained for $3 \cdot 10^5$ auction steps. In Fig. 1, we observe that the RL agent outperforms the heuristic approach and achieves a higher reward. Breaking it down into costs and utility achieved by the RIS allocation, we can see that both approaches essentially achieve the same utility; however, the RL agent incurs significantly lower costs. This means that the agent implicitly learns to coordinate the bids of operators: fundamentally, the RL agent only bids on the most valuable RISs, while not driving the price of other RISs up. The heuristic, on the other hand, drives the price of all RISs up until one operator drops out. It is worth noting that each operator uses its own isolated instance of the RL agent, observing only its values and not those of others. As a result, there is no direct collaboration between the operators.
Next, we vary the budget of operator 1, while keeping the budget of operator 2 constant. This enables operator 1 to acquire a larger number of RISs, thereby improving its reward; see Fig. 2. The achieved result, coming from an agent trained with a budget set to one, implies that adjusting the budget does not require retraining the agent. This is also true for scenarios with varying numbers of users and base stations, although not explicitly shown in the simulations due to space constraints. The agent, trained solely on observing the RIS allocation value, does not rely on knowledge of specific values for $N_U$ and $N_B$.
The results so far are based on the utility estimates derived in Sec. III-A. Next, we therefore demonstrate the behavior in terms of the instantaneous SINR (4) for fixed positions and random microscopic fading channels. In Fig. 3, we show the distribution of the users' SINRs for various RIS allocations. Gray curves depict performance in the absence of RISs, essentially mirroring the operators' performance when RISs are present but remain unallocated to them (dotted curves). The notation RIS@OpK-OpL means that all RISs are assigned to operator K and we observe the performance of operator L. Dashed curves show the performance when all RISs support the respective operator,² while solid lines show the performance achieved by the RL agent. There is a substantial enhancement in performance when compared to scenarios without available RISs or when they are not explicitly allocated to the operators. Therefore, strategically sharing RISs between operators can significantly improve the performance of both networks at the same time.
² Simultaneously achieving both dashed curves is not possible since RISs can only be assigned to a single operator.

VI. CONCLUSION
In this study, we delved into multi-operator wireless networks supported by RISs supplied by a dedicated RIS provider. Introducing an auction format for this RIS market, we illustrated the applicability of RL in learning effective bidding strategies. The outcomes from our simulations showcased significant enhancements in SINR and data rates, underscoring improved performance for all participating network operators.
$P_\ell^{(o)}$ denotes the transmit power of base station $\ell$ of operator $o$. For simplicity we consider $P_\ell^{(o)} = P$, $\forall \ell, o$.
The available budget $B_t^{(o)}$ of operator $o$ at round $t$ is calculated as the initial budget $B_0^{(o)}$ at the start of the auction minus the costs paid for $\mathcal{R}_{t-1}^{(o)}$. A conservative strategy is to bid only on the $\lfloor B_t^{(o)} / p_t \rfloor$ most valuable RISs in $\varrho^{(o)}$.
c) Actions: Based on $O_t^{(o)}$, each agent makes bidding decisions using the binary action vector $b^{(o)} \in \{0, 1\}^{N_R}$.
d) Rewards: Assume that the agent wins the set $w_t^{(o)} \subseteq \mathcal{R}_t$ of RISs in round $t$. The reward is then composed of three contributions.

Fig. 3. Distribution of instantaneous SINRs for fixed random positions of network elements and varying microscopic fading channel conditions.
© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.