Intelligent Beam Steering for Wireless Communication Using Programmable Metasurfaces

Reconfigurable Intelligent Surfaces (RIS) are well established as a promising solution to the blockage problem in millimeter-wave (mm-wave) and terahertz (THz) communications, which are envisioned to serve demanding networking applications such as 6G and vehicular networks. HyperSurfaces (HSF) is a revolutionary enabling technology for RIS, complementing Software Defined Metasurfaces (SDM) with an embedded network of controllers to enhance intelligence and autonomous operation in wireless networks. In this work, we consider feedback-based autonomous reconfiguration of the HSF controller states to establish a reliable communication channel between a transmitter and a receiver via programmable reflection on the HSF when Line-of-Sight (LoS) between them is absent. The problem is to regulate the angle of reflection on the metasurface such that the power at the receiver is maximized. Extremum Seeking Control (ESC) is employed, and the generated control signals are mapped into appropriate metasurface coding signals, which are communicated to the controllers via the embedded controller network (CN). This information dissemination process incurs delays, which can compromise the stability of the feedback system and are thus accounted for in the performance evaluation. Extensive simulation results demonstrate the effectiveness of the proposed method in maximizing the power at the receiver within a reasonable time, even when the receiver is mobile. The spatiotemporal nature of the traffic for different sampling periods is also characterized.



I. INTRODUCTION
Future generations of wireless communications are envisioned to provide data rates of several Gbps per user for billions of users simultaneously [1]. The proliferation of billions of Internet of Things (IoT) devices in various industries has pushed wireless networks to their limits in terms of the capacity required to communicate the vast amounts of data that are generated [2]. Moreover, the increasing number of IoT devices is changing mobile communication services from interpersonal communication to smart interconnection among billions of devices. 6G is expected to fulfill the requirements of a fully connected world and provide ubiquitous wireless connectivity for all [3], adopting transformative solutions, e.g., intelligent surfaces and programmable wireless environments [4], [5], [6], [7]. Furthermore, among these interconnected devices, millions of vehicles fitted with onboard communication systems and a range of autonomous capabilities are increasingly being phased in [8]. According to the US Department of Transportation and the Connected-Intelligent Transportation System (C-ITS) initiative of the European Commission [9], [10], this connectivity would enable vehicles to participate in intelligent transportation system (ITS) applications such as See-Through, High-Density Platooning, and Automated Overtake [11], [12]. However, next-generation vehicular networks require high data rates and reliable communications. For example, applications such as See-Through vision and bird's-eye view require a data rate in excess of 50 Mbps and a delay of 50 ms, while Automated Overtake requires a 10 ms delay and 99.999 percent reliability [13], [14].
Existing communication technologies limit the achievable data rates. For example, Dedicated Short-Range Communication (DSRC) technologies such as IEEE 802.11p/DSRC and ITS-G5/DSRC, and 3GPP's Long Term Evolution (LTE), which were adopted for vehicular communication, have been shown to support only limited data rates (6 Mbps for DSRC and up to 100 Mbps for LTE-A) [15]. Key stakeholders have therefore sought solutions in the millimeter-wave (mm-wave) [16], [17] and even terahertz (THz) bands [13].
Frequency allocation in the mm-wave and THz range can offer tremendous bandwidth compared to microwave frequencies; however, such high-frequency signals suffer from high path loss and atmospheric absorption, which necessitates highly directional transmissions. As a result, they rely on directional antenna technologies [18] to increase the antenna gain [19]. Moreover, directional beams can reduce interference from signals of non-interest [20], [21]. To serve these functions, transmitter and receiver alignment is critical [22]; however, it is greatly challenged by node mobility and blockage, which are typical in vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communications. It is well established that the key challenges in mobile mm-wave networks are effective beam management under high mobility and blockage recovery [23], [24].
Several beam management schemes have been proposed in the literature to address the aforementioned challenges [25], [26], [27], with notable attempts incorporating techniques such as beam switching [28], 3D interaction with the spatial channel profile [29], out-of-band direction inference [30], decoupling of transmitter and receiver steering [31], and predictability emanating from the abundance of available data [32], [33]. It must be noted that beam steering, which relies on reconfigurable antennas, is a significant part of the beam management problem. Beam steering techniques beyond the beamforming commonly used in 5G multiple-input multiple-output (MIMO) systems are reported in the literature [34].
One of the key technologies for addressing the blockage recovery problem, as shown in Fig. 1, is Reconfigurable Intelligent Surfaces (RIS) [35], [36], also known as Intelligent Reflecting Surfaces (IRS) among other names, which can be realized using a number of technologies, for example reflectarrays [37], [38], [39] or metasurfaces [40], [41], with recent designs reported even in the THz band [42], [43]. Beam training in the presence of RIS has recently been addressed in [44], [45], [46], and [47] for single-user and multiuser cases. The approaches therein, however, are generic, without reference to the underlying RIS technology. Beam management in 802.11ad networks assisted by reconfigurable reflectarrays has been addressed in [48], leading to a three-party beam searching protocol involving the RIS. The protocol includes a second stage of beam training that provides finer control of the link between the RIS and the receiver. The approach is accompanied by experimental verification results and is compatible with existing 802.11ad transceivers. However, metasurfaces, owing to their subwavelength apertures, which allow finer control of electromagnetic waves [49], have distinct advantages relative to competing technologies and have thus recently attracted significant attention. Beam management in the presence of a metasurface RIS is an open problem [50], and a number of recent works have considered online tuning of metasurface parameters to optimize the transmitter-receiver channel via the RIS [51], [52]. These approaches are feedback based and aim toward the standalone operation of metasurfaces without human intervention. However, they are based on machine learning techniques, which might lead to slow convergence due to model training. Moreover, they do not account for the implementation details of the metasurface, i.e., the metasurface coding that will yield the desired functionality and the methodology with which the control signals will be disseminated to the reconfigurable unit cells. As such, the effect of delays on the performance of the closed-loop system is not accounted for. A notable work incorporating feedback and aiming towards autonomous stand-alone operation is reported in [53]. In that work, practical aspects of a 2-bit digital programmable metasurface are accounted for, and experimental results demonstrate the effectiveness of the proposed design. However, its focus is different: it targets orientation control, with the desired angles fed directly to the metasurface controller.
In this work, different from the aforementioned approaches, we consider feedback-based iterative beam steering, based on power measurements at the receiver, taking into account the required coding patterns of a multi-state metasurface and the methodology with which reconfiguration signals are sent from the metasurface controller to the appropriate unit cells. The underlying architecture is based on the recently proposed hypersurface (HSF) paradigm [4], which integrates an embedded network of controllers (CN) on the software-defined metasurface (SDM), whose unit cells can assume multiple states by suitable choice of the varactor values within the unit cell [54]. A well-known form of extremum seeking control (ESC) [55] is adopted as the feedback-based iterative control algorithm, and its effectiveness in tracking mobile targets moving at speeds beyond 100 km/h is demonstrated, rendering it suitable for V2X scenarios. Beyond this, we characterize the "workload" incurred within the metasurface as a result of the iterative control procedure and the packetized directives sent to each of the unit cells. This workload characterization allows us to account for the feedback delays associated with message transmission and can also be used to characterize the energy consumption of the overall procedure.
The work presented in this paper is based on our previous work in [56]; in this extension, however, we consider changes in both the elevation and azimuth angles and, in addition, demonstrate the effectiveness of the methodology in tracking mobile targets, even in the presence of delays and fading. The main new contributions of this paper relative to our previous work are outlined below:
• Simultaneous control of the azimuth and elevation angles is realized via a pair of independent Extremum Seeking Controllers (only elevation angle changes were considered in our previous work).
• The effectiveness of the proposed scheme is demonstrated in the presence of mobile receivers typical of vehicular environments, with indicative performance bounds in terms of the achievable speed.
• The discrete-time version of the ESC is presented and the effect of the sampling period is investigated.
• The load within the CN and the effect of delays are characterized for the newly introduced scenarios outlined above.
• It is demonstrated that the convergence time of the proposed scheme can be significantly improved by fine-tuning the controller parameters.
• The effectiveness of the proposed scheme is shown in the presence of shadow fading.
The paper is organized as follows. In Section II, we describe the considered HSF; in Section III, we formulate the problem mathematically and present the controller used; in Section IV, we evaluate the performance of the closed-loop system; and finally, in Section V, we offer our conclusions and future research directions.

A. Hypersurface Structure
The technology considered in this work for the realization of the RIS is the Hypersurface, which complements the SDM with an embedded controller network (CN) to provide programmability and autonomous operation. A detailed description of the HSF is provided in recent literature [57], and a reference HSF is shown in Fig. 2, including multiple layers such as the gateway layer, the embedded control layer, and the EM layer. In short, the EM layer of the metasurface is a collection of unit cells that modify the impinging wave as demanded by the corresponding application programming interface (API) calls. In more detail, the EM layer concerns the physical realization of the deep-subwavelength unit cells, or meta-atoms, of the metasurface, as well as of the tunable actuators that allow controlling the reflection. Coupling between the elements depends on the metasurface design: if the size of the elements is larger than their spacing, the coupling is high, and vice versa. The high density of elements allows for the micromanagement of EM waves at the level of electric and magnetic field vectors [58]. It is in fact the interaction of an impinging wave with the unit cells that induces local currents in the EM layer; by controlling these currents as secondary EM sources, one can manipulate the scattered field wavefront. The characteristics of these secondary currents are essentially determined by the impedance of each unit cell as seen by the EM wave. This means that individual and independent changes must be applied to each unit cell's impedance to effectively engineer the wave. To do so, tunable elements need to be integrated into each unit cell to tune its resistance R and capacitance C, which set the real and imaginary parts of the surface impedance, respectively. These elements may be circuit based, such as varactors and varistors, which have been adopted in the prototype under development [54], [59].
However, as these reach their operational limits at mm-wave frequencies, other tuning technologies, for example nematic liquid crystals [61], may prove more suitable, despite very recent advancements that may also render circuit-based elements usable at these frequencies [60].
An embedded control layer [62] exists behind the EM layer and includes the hardware and protocols required for transferring information between the gateway (GW) and the tuning elements at each unit cell to modify their state. In its simplest form, the control layer can be a direct interconnection from the GW to all unit cells, which can be practical for some programmable wireless environment (PWE) applications [63]. However, as more unit cells are integrated into an HSF, a more scalable network of embedded controller chips is required. We refer to this network as the CN [64]. In this case, it is assumed that each controller chip can only apply a discrete number of states with cardinality N_s, whereby each state is a combination of RC values. The GW layer operates on top of the embedded control layer. It not only connects the HSF to the external world via standardized protocols but also converts external software commands into specific metasurface codes. The metasurface code is the set of unit cell states (RC values), chosen from the N_s available ones, with which the metasurface achieves the desired functionality with low error. The GW applies the code by distributing simple individual messages in the CN that trigger the appropriate changes of state at each unit cell.

B. Metasurface Coding
The code to be applied to a metasurface depends not only on the functionality but also on functionality-dependent inputs [65]. Here, we describe the process of metasurface coding for the particular case of anomalous reflection used in beam steering. For a metasurface in reflection mode, unit cells shall be designed for high reflection amplitude and wide phase range. Anomalous reflection is achieved by applying a phase gradient to the impinging wavefront [66]. In particular, phase gradients are applied in the x and y directions, where x and y directions define the plane on which the HSF is located.
Let us denote the reflection phase as Φ. Then, the desired reflection phase gradients in the x and y directions are Φ_x = ∂Φ/∂x and Φ_y = ∂Φ/∂y, respectively. By applying such gradients, the phase of the unit cell at location (m, n), i.e., the m-th column and n-th row, can be expressed as:
$$\Phi_{mn} = \Phi_{00} + m D_u \Phi_x + n D_u \Phi_y$$
where D_u is the lateral size of a square unit cell and Φ_00 is the phase of the first unit cell (whose value can be arbitrary, since only the phase gradient matters). In order to relate the target reflection angle {θ_r, φ_r} (in polar coordinates, where θ_r and φ_r denote the elevation and azimuth components, respectively) to the angle of incidence {θ_i, φ_i} and the phase gradients implemented in the HSF (Φ_x and Φ_y), we apply the momentum conservation law for wave vectors:
$$k_r \sin\theta_r \cos\varphi_r - k_i \sin\theta_i \cos\varphi_i = \Phi_x$$
$$k_r \sin\theta_r \sin\varphi_r - k_i \sin\theta_i \sin\varphi_i = \Phi_y$$
where k_r = 2π/λ_r and k_i = 2π/λ_i are the wave numbers in the reflection and incidence media, with wavelengths λ_r and λ_i, respectively. In this paper, we consider λ_i = λ_r = λ_0, because the host medium for most communication applications is air. Since the deflection angle (the difference between the incident and reflected beam angles), rather than the absolute values of the incident and reflected angles, dictates the phase gradient, we can derive the formula for normal incidence (θ_i = 0) and then extend it to arbitrary cases [67], [68].
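As a worked example of the normal-incidence case, take the momentum-conservation relations in the generalized-Snell form k_0(sin θ_r cos φ_r − sin θ_i cos φ_i) = Φ_x and k_0(sin θ_r sin φ_r − sin θ_i sin φ_i) = Φ_y, with λ_i = λ_r = λ_0 and k_0 = 2π/λ_0. Setting θ_i = 0 gives the relations below, which can also be inverted in closed form; the numeric line is only an illustrative choice of angles and cell size, not a design from this work.

$$\Phi_x = k_0 \sin\theta_r \cos\varphi_r, \qquad \Phi_y = k_0 \sin\theta_r \sin\varphi_r$$
$$\sin\theta_r = \frac{\sqrt{\Phi_x^2 + \Phi_y^2}}{k_0}, \qquad \varphi_r = \operatorname{atan2}(\Phi_y, \Phi_x), \qquad \text{valid only if } \sqrt{\Phi_x^2 + \Phi_y^2} \le k_0$$
$$\theta_r = 30^\circ,\ \varphi_r = 0^\circ,\ D_u = \frac{\lambda_0}{4}: \quad \Phi_x = \frac{k_0}{2} = \frac{\pi}{\lambda_0}, \quad \Phi_y = 0, \quad \Phi_x D_u = \frac{\pi}{4}$$

In this example the reflection phase advances by π/4 from one cell to the next along x, so the phase profile repeats every 8 cells.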
Since the number of unit cell states is limited to N_s, the target phase Φ_mn is in practice mapped to that of the closest available state [69]. Whenever the incidence or reflection angle changes, the HSF code must change to accommodate the newly required phase gradients, triggering the transmission of internal commands from the GW to the affected unit cells.
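The coding procedure (phase gradients from the target angles, ideal phase per cell, then mapping to the nearest available state) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the generalized-Snell form k_0(sin θ_r cos φ_r − sin θ_i cos φ_i) = Φ_x (and the analogous sine form for Φ_y) with λ_i = λ_r, and it hypothetically assumes that the N_s states realize equally spaced phases 0, 2π/N_s, ...; real states are RC combinations with a measured phase response, so a lookup table would replace the uniform `step`. The function names are illustrative only.

```python
import math


def phase_gradients(theta_i, phi_i, theta_r, phi_r, wavelength):
    """Phase gradients (Phi_x, Phi_y) in rad per unit length, assuming lambda_i = lambda_r."""
    k0 = 2 * math.pi / wavelength
    phi_x = k0 * (math.sin(theta_r) * math.cos(phi_r) - math.sin(theta_i) * math.cos(phi_i))
    phi_y = k0 * (math.sin(theta_r) * math.sin(phi_r) - math.sin(theta_i) * math.sin(phi_i))
    return phi_x, phi_y


def coding_pattern(M, N, D_u, phi_x, phi_y, n_states, phi_00=0.0):
    """Map the ideal phase of every cell to the nearest of n_states equally spaced phases.

    Returns code[m][n] in {0, ..., n_states - 1}, with m the column and n the row index.
    The equal phase spacing is a simplifying assumption, not a property of real unit cells.
    """
    step = 2 * math.pi / n_states
    code = []
    for m in range(M):
        column = []
        for n in range(N):
            target = (phi_00 + m * D_u * phi_x + n * D_u * phi_y) % (2 * math.pi)
            # Nearest available state; a phase just below 2*pi wraps around to state 0.
            column.append(round(target / step) % n_states)
        code.append(column)
    return code
```

For example, `phase_gradients(0.0, 0.0, math.radians(30), 0.0, 1.0)` returns Φ_x ≈ π per wavelength and Φ_y = 0; feeding the result to `coding_pattern(24, 24, D_u, px, py, 4)` gives a 24 × 24 state map of the kind shown in Fig. 7, although with this simplified phase model rather than the actual unit cell response.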

C. Far Field Evaluation
A model is necessary to assess the viability of the communication link by comparing the signal power at the receiver with the receiver sensitivity. To obtain the received power, one must calculate the far-field pattern of the metasurface. Since the size of the unit cells is small compared to the wavelength, we can assume that the current distribution on the patches is uniform. In other words, we can model each unit cell as an independent point source and apply Huygens' principle to obtain the far-field pattern as the sum of the contributions of all unit cells. This model assumes that the crosstalk between adjacent unit cells is negligible, which can be enforced with proper unit cell design and spacing. Further, we model the scattering pattern of the unit cell with the function cos(θ). Under full illumination of the metasurface, the scattered field can be expressed as [68]
$$E(\theta, \varphi) = K \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \Gamma_{mn} \cos(\theta)\, e^{j\left(\Phi_{mn} + \zeta_{mn}(\theta, \varphi)\right)}$$
where K is a constant defined by the incident amplitude, Γ_mn is the reflection amplitude (ideally unity), Φ_mn is the reflection phase of unit cell (m, n), M and N are the numbers of unit cells in a row and a column, k_0 is the wave number, and ζ_mn(θ, φ) is the relative phase shift of unit cell (m, n) in the observation direction (θ, φ), given by
$$\zeta_{mn}(\theta, \varphi) = -k_0 D_u \sin\theta \left[\left(m + \tfrac{1}{2}\right)\cos\varphi + \left(n + \tfrac{1}{2}\right)\sin\varphi\right]$$
This method has been shown to be accurate in evaluating the far field of a metasurface for beam steering through comparison with full-wave simulations [67], [68], [70] and has been used extensively in a number of recent studies [46], [47], [67], [68], [70], [71], [72]. The approximations made have a small impact on the radiation pattern.
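The far-field model (a coherent sum of cos(θ)-weighted point sources) can be sketched directly. This is a hedged illustration, assuming normal incidence, cell centres at ((m + 1/2)D_u, (n + 1/2)D_u), and the sign convention ζ_mn = −k_0 D_u sin θ[(m + 1/2)cos φ + (n + 1/2)sin φ], chosen so that a positive phase gradient Φ_x steers the beam towards φ = 0; other sources use the opposite sign with a mirrored phase profile. The helper names `ideal_phase_profile` and `far_field` are hypothetical.

```python
import cmath
import math


def ideal_phase_profile(M, N, D_u, wavelength, theta_r, phi_r, phi_00=0.0):
    """Continuous phases Phi_mn = Phi_00 + m D_u Phi_x + n D_u Phi_y for normal incidence."""
    k0 = 2 * math.pi / wavelength
    phi_x = k0 * math.sin(theta_r) * math.cos(phi_r)
    phi_y = k0 * math.sin(theta_r) * math.sin(phi_r)
    return [[phi_00 + m * D_u * phi_x + n * D_u * phi_y for n in range(N)] for m in range(M)]


def far_field(phase, D_u, wavelength, theta, phi, amplitude=None, K=1.0):
    """Scattered field E(theta, phi) as a sum of cos(theta)-weighted point-source contributions.

    phase[m][n] is the reflection phase (rad) of cell (m, n); amplitude[m][n] defaults to unity.
    """
    k0 = 2 * math.pi / wavelength
    total = 0j
    for m, column in enumerate(phase):
        for n, phase_mn in enumerate(column):
            gamma_mn = 1.0 if amplitude is None else amplitude[m][n]
            # Relative phase of cell (m, n) towards the observation direction (theta, phi).
            zeta = -k0 * D_u * math.sin(theta) * ((m + 0.5) * math.cos(phi) + (n + 0.5) * math.sin(phi))
            total += gamma_mn * cmath.exp(1j * (phase_mn + zeta))
    return K * math.cos(theta) * total
```

With an ideal (unquantized) profile, all cells add in phase in the target direction, so |E| equals K cos θ_r M N there; evaluating `abs(far_field(...)) ** 2` over a grid of (θ, φ) and dividing by its maximum gives a normalized power map similar in spirit to the heat maps of Fig. 6. Passing a quantized profile instead shows the effect of the limited number of states N_s.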

III. BEAM STEERING ALGORITHM
A. Problem Formulation
We consider a scenario comprising a transmitter, a receiver, and a Hypersurface acting as the RIS (H-RIS). Because there is no LoS between the transmitter and the receiver, the transmitter attempts to establish a reliable communication channel via the H-RIS. A beam is directed towards the H-RIS, and the objective is then to iteratively reconfigure the controller states so that the impinging wave is directed towards the receiver. Although channel reliability can be assessed by a number of metrics, here we adopt the objective of maximizing the received signal strength. Note that maximizing the signal strength may not be optimal in terms of channel quality; consideration of alternative metrics is left for future work. It is also worth noting that the same methodology and solution are applicable to a directional SDM-based transmitter, where the objective is to direct the outgoing beam towards the receiver. This may find significant applications in the mm-wave and THz bands and in space applications. Below we introduce the necessary notation to formalize the described problem, depicted graphically in Fig. 3.
The transmitted beam is incident on the H-RIS with the vector-valued angle of incidence ψ_i = [θ_i, φ_i], comprising two components relative to the RIS plane: the elevation angle θ_i and the azimuth angle φ_i. Depending on the configuration of the controller states, the beam is reflected by the H-RIS at a reflection angle ψ_r = [θ_r, φ_r], where θ_r and φ_r denote the elevation and azimuth components, respectively. The reflection angle is the control variable, or actuation signal, which is dictated by the controller states. The controller states are updated by packetized directives sent by the input GW, one by one, at the beginning of each update period. Deadlock-free delivery of these directives is challenging owing to the unique Manhattan-type topology of the controller network [74], which comprises N = M × L controllers, where M and L denote the number of controllers along a single row and a single column, respectively. These challenges have been addressed in [75], and the resulting workload has been characterized in [69]. Here, for simplicity, we assume an "agnostic" XY routing mechanism and, unlike previous work, characterize the workload in the considered feedback setting, where the update times are determined by the sampling period (in [69], the updates are determined by the angular step). Such a characterization is necessary because the delivery of the reconfiguration packets from the input GW to all controller nodes that need to be updated to realize a particular reflection angle is associated with a delivery delay, the main source of the overall feedback delay. It is well known from the control systems literature [76], [77] that feedback delays can compromise the stability of closed-loop systems such as the one considered here. Further information on the packet delivery delays is provided in Section IV; it is noted here that they are time-varying but bounded.
Further, given the architectural properties of the metasurface (e.g., the number of states, the CN size, and the routing algorithm), the delays depend only on the change of the reflection angle to be realized, and this is the approach adopted in Section IV for their characterization.
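How the delivery delay follows from the change of code can be illustrated with a deliberately simplified, contention-free model. The sketch below assumes (hypothetically) that the gateway injects one packet per time unit at controller (0, 0), that every hop takes a fixed time, and that only controllers whose state changes receive a directive; it ignores the asynchronous operation and contention captured by the dedicated CN simulator, so its numbers are illustrative only. All function names and parameters are hypothetical.

```python
def xy_route(src, dst):
    """Nodes visited by dimension-order (XY) routing: first along x, then along y."""
    x, y = src
    x_dst, y_dst = dst
    path = [(x, y)]
    while x != x_dst:
        x += 1 if x_dst > x else -1
        path.append((x, y))
    while y != y_dst:
        y += 1 if y_dst > y else -1
        path.append((x, y))
    return path


def reconfiguration_messages(old_code, new_code):
    """One (x, y, new_state) directive per controller whose state must change."""
    return [
        (x, y, new_code[x][y])
        for x in range(len(new_code))
        for y in range(len(new_code[0]))
        if old_code[x][y] != new_code[x][y]
    ]


def delivery_time(old_code, new_code, gateway=(0, 0), t_inject=1.0, t_hop=1.0):
    """Contention-free estimate: packet i leaves the gateway at i * t_inject and travels hop by hop."""
    messages = reconfiguration_messages(old_code, new_code)
    finish = 0.0
    for i, (x, y, _state) in enumerate(messages):
        hops = len(xy_route(gateway, (x, y))) - 1
        finish = max(finish, i * t_inject + hops * t_hop)
    return len(messages), finish
```

Feeding it the codes before and after a change of reflection angle yields the number of directives and a completion time for that change, which is the kind of angle-dependent quantity the delay characterization relies on; a realistic model would add queueing at shared links.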
The controller states c_1, c_2, ..., c_N are lumped into a vector c = [c_1, c_2, ..., c_N], with each entry taking a value from the set S = {s_1, s_2, s_3, ..., s_{N_s}}. Based on the chosen controller states and the angle of incidence, a particular reflection angle is realized according to the metasurface coding procedure described in the previous section. This relationship is expressed via a function f(.) such that ψ_r = f(c, ψ_i). The receiver at any time t is located at position l(t), with corresponding coordinates in 3D space; the function l(t) defines its time-varying trajectory. The receiver is appropriately equipped to measure the received power P(t). The received power is a function of the transmitter power P_t, the location of the receiver, the incidence angle, and the HSF controller states, which yield a far-field pattern as described in the previous section. This is expressed via the function g(.) such that P(t) = g(c, ψ_i, l(t), P_t). Based on the latter, one may consider c to be the control variable and choose c directly such that the power is maximized. This can be cast as a multi-input single-output (MISO) control problem, which, however, poses challenges due to the high dimensionality of the input space. To overcome this, we account for the function f(.) in g(.) and consider the reflection angle, rather than c, as the control variable. This results in the composite function v(.) such that P(t) = v(ψ_r, ψ_i, l(t), P_t), which renders the control variable (i.e., the reflection angle) two-dimensional, as opposed to the high-dimensional case that arises if each controller is considered independently. This change in the input space also renders P(t) concave with respect to the input variable ψ_r, at least in the vicinity of the main lobe, which is important in the application of ESC. The considered optimization problem thus becomes:
$$\mathcal{P}: \quad \max_{\psi_r(t)} \; P(t) = v\big(\psi_r(t), \psi_i, l(t), P_t\big)$$
The difficulty in solving the problem above stems from the following:
• The functions f(.) and g(.), although appropriately specified using the modeling procedures described in the previous section, are subject to uncertainty due to modeling errors, which are inevitable given the complex interactions involved, as well as due to inaccuracies incurred during the construction of the HSF and the possibility of hardware faults and run-time errors. This, coupled with their highly nonlinear nature, suggests that the true behavior can deviate severely from that predicted by the assumed model.
• External disturbances acting at different points in the considered system are random in nature.
• The assumed single point of measurement in 3D space at the location of the receiver (assumed unique and dimensionless) only allows partial observation of the system states.
• The angle of incidence may be unknown to the HSF and the receiver.
• The location of the receiver may be unknown to the HSF.
The aforementioned challenges, and especially the lack of robustness with respect to model uncertainties and external disturbances, point towards a closed-loop rather than an open-loop implementation [48]. Open-loop implementations may be realized if, for example, the location of the receiver is known and the incidence angle can be estimated via a suitable algorithm. Such an approach, however, does not account for modeling and estimation errors, which can yield a significant bias from the desired reflection angle. For the closed-loop implementation, we assume the existence of a control channel that allows the receiver to communicate the received power to the input gateway of the HSF; the propagation delay over this channel is assumed negligible. The input gateway then processes the received power according to the control algorithm to be designed and generates the desired reflection angle ψ_r. The latter is then transformed into the controller states c via the inverse of the function f(.), assumed to be one-to-one. The desired controller states are communicated to the corresponding controllers via the embedded controller network, as described earlier. Once the controller messages are delivered, the controllers adopt the new states, the far-field pattern is modified, and the procedure is repeated. We thus consider a control law, implemented at the input gateway, that takes as input the measured power P and yields the desired reflection angle. A general description of the control law [78], in the form of a nonlinear state-space representation, is given below:
$$\dot{q}(t) = h\big(q(t), P(t)\big), \qquad \psi_r(t) = v\big(q(t), P(t)\big)$$
where ψ_r(t) is the desired angle of reflection, P(t) is the measured signal strength, q is a vector of controller states, q̇ is the vector of rates of change of the controller states, and h(.) and v(.) are possibly nonlinear functions. The problem is then to design the functions h(.) and v(.) such that problem P is solved.
A block diagram of the feedback system, as described above is shown in Fig. 4(a).

B. Proposed Control Algorithm
ESC is a model-free adaptive control method, deemed suitable for the problem under consideration owing to its compatibility with the received-signal maximization objective and to the uncertain, nonlinear nature of the cost function, which depends strongly on the incidence angle, itself hard to estimate. In this scheme, the input signal is perturbed sinusoidally and the resulting change in the output signal is measured to determine and follow a local maximum of a measurable objective function.
Since the maximization must be performed with respect to both the elevation and azimuth angles, the controller takes the form of a 2 × 2 matrix. Here we consider decoupled control action [79], i.e., the matrix is diagonal, with the diagonal entries referring to the elevation and azimuth angles, respectively. Coupled control action with off-diagonal elements, as for example in [80], is left to be explored in the future. For clarity of presentation, we present below the controller for the elevation angle θ_r; the algorithm is the same for φ_r. Fig. 4(b) shows a schematic representation of the employed ESC scheme, where P(t) and θ_r(t) are the controller input and output, respectively, with the latter identified as the control variable. As shown in Fig. 4(b), a perturbed signal θ_r is obtained by adding a dither signal, here the sinusoid α sin(ωt), to the best estimate θ̂ of the optimal input signal. The updated θ_r is realized via appropriate HSF reconfiguration, yielding a new received power P(t), which is measured. A high-pass filter (HPF) is used to remove undesired steady-state components, and the filtered signal is then demodulated by multiplication with the dither signal α sin(ωt). This procedure induces unwanted high-frequency components, which are removed by the low-pass filter (LPF). Integral action is then employed to improve the best estimate θ̂. The integration essentially implements a gradient policy on the time-averaged system, aiming to maximize the cost function, as is typical in adaptive control. The complete algorithm can be represented in continuous time as follows:
$$\theta_r(t) = \hat{\theta}(t) + \alpha \sin(\omega t)$$
$$\dot{\hat{\theta}}(t) = k\,\xi(t)$$
$$\dot{\xi}(t) = -\omega_l\,\xi(t) + \omega_l\,\big(P(t) - \eta(t)\big)\,\alpha \sin(\omega t)$$
$$\dot{\eta}(t) = -\omega_h\,\eta(t) + \omega_h\,P(t)$$
where ξ is the LPF state, P(t) − η(t) is the HPF output, ω is the frequency of the perturbation signal, k is the gain, ω_l represents the cut-off frequency of the LPF, and ω_h is the cut-off frequency of the HPF.
The principle of operation of the algorithm can be explained as follows [81]. Whenever θ̂ is on either side of the maximum point θ*, the periodic dither signal excites a periodic response of P(t), whose dc component is removed by the high-pass filter. The remaining periodic response is either in phase with the perturbation signal (if θ̂ is smaller than θ*) or out of phase (if θ̂ is larger than θ*). In the former case, the dc component of the product of the two signals, which is extracted by the low-pass filter, is positive and thus, after integration, drives the estimate θ̂ "uphill" towards the maximum. In the latter case, the dc component is negative and thus drives the estimate downwards towards the maximum. One can thus view the modulation/demodulation procedure as a gradient estimation procedure whose output is integrated. An alternative derivation, which again supports the interpretation of the method as a gradient estimation algorithm, can be found in [81] and [82].
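The continuous-time loop (dither, high-pass filter, demodulation, low-pass filter, integrator) can be sketched numerically with a forward-Euler discretization. This is a hedged illustration rather than the simulator used in this work: the far-field model is replaced by a hypothetical single-peaked power map `toy_power` (peak at 55°, angles in degrees), the parameter values are illustrative, and the high-pass stage is written as P − η, with η a first-order low-pass of P, which is one standard way to realize it.

```python
import math


def toy_power(theta_deg):
    """Hypothetical normalized received power with a single peak at 55 degrees."""
    return math.exp(-((theta_deg - 55.0) / 10.0) ** 2)


def simulate_continuous_esc(power, theta0, alpha=1.0, omega=20.0, k=50.0,
                            omega_l=2.0, omega_h=2.0, dt=1e-3, t_end=30.0):
    """Forward-Euler integration of the continuous-time ESC loop; returns the final estimate."""
    theta_hat = theta0      # best estimate of the optimal reflection angle
    eta = power(theta0)     # HPF state: P - eta is the high-pass filtered power
    xi = 0.0                # LPF state, i.e. the demodulated gradient estimate
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        dither = alpha * math.sin(omega * t)
        p = power(theta_hat + dither)               # P(t) measured at theta_r = theta_hat + dither
        p_hp = p - eta                              # high-pass filtered power
        eta += dt * omega_h * (p - eta)             # eta' = -omega_h*eta + omega_h*P
        xi += dt * omega_l * (p_hp * dither - xi)   # xi' = -omega_l*xi + omega_l*(P - eta)*alpha*sin(omega*t)
        theta_hat += dt * k * xi                    # theta_hat' = k*xi
        t += dt
    return theta_hat
```

Starting from 45°, `simulate_continuous_esc(toy_power, 45.0)` should settle close to the 55° peak. The dither frequency ω is chosen well above ω_l and ω_h so that the in-phase/out-of-phase averaging argument above applies.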

C. Discrete Time Implementation
In practice, the continuous-time algorithm needs to be transformed into its discrete-time equivalent, since the latter is suitable for implementation on the HSF gateway. The discrete-time algorithm and the chosen sampling period are of primary significance for the embedded CN, as higher sampling frequencies lead to higher data traffic and consequently higher energy consumption. The traffic aspect is investigated in the performance evaluation section. Below, we present the discrete-time ESC algorithm.
where k denotes the discrete time index, P(k) is the received signal power at instant k, θ r (k) is the angle of reflection at time k and P h (k) is the received signal power after high pass filtering at instant k. The LPF parameter γ is chosen such that γ > 0, the HPF parameter h is constrained by 0 < h < 1 and the modulation frequency ω is usually set according to ω = βπ, with 0 < |β| < 1 and β being rational.
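As a hedged illustration, the sketch below implements one common discrete-time ESC structure that uses the parameters named above: the modulation θ_r(k) = θ̂(k) + α sin(ωk) with ω = βπ, a standard first-order discrete high-pass (washout) filter P_h(k) = h P_h(k−1) + P(k) − P(k−1) with 0 < h < 1, and the update θ̂(k+1) = θ̂(k) + γ α sin(ωk) P_h(k) with γ > 0. This particular filter form, the toy power map `toy_power`, the parameter values, and the optional `delay` (in samples, a crude stand-in for CN delivery delay) are assumptions, not the exact algorithm of this work.

```python
import math


def toy_power(theta_deg):
    """Hypothetical normalized received power with a single peak at 55 degrees."""
    return math.exp(-((theta_deg - 55.0) / 10.0) ** 2)


def simulate_discrete_esc(power, theta0, alpha=1.0, beta=0.5, gamma=5.0, h=0.5,
                          n_steps=1000, delay=0):
    """Discrete-time ESC; `delay` postpones the realisation of each commanded angle by that many samples."""
    omega = beta * math.pi
    theta_hat = theta0
    pending = [theta0] * delay     # commanded angles not yet realised by the CN
    p_prev = power(theta0)
    p_h = 0.0                      # high-pass filtered power P_h(k)
    for k in range(n_steps):
        dither = alpha * math.sin(omega * k)
        pending.append(theta_hat + dither)   # theta_r(k) requested by the gateway
        realised = pending.pop(0)            # angle actually in effect at sample k
        p = power(realised)                  # P(k) measured at the receiver
        p_h = h * p_h + p - p_prev           # washout: P_h(k) = h P_h(k-1) + P(k) - P(k-1)
        p_prev = p
        theta_hat += gamma * dither * p_h    # demodulate with the dither and integrate
    return theta_hat
```

Each increment of k corresponds to one sampling period, i.e., one round of power feedback and CN reconfiguration, so a shorter sampling period means more directives per second in the CN. With a nonzero `delay`, the realised dither lags the demodulating sinusoid by delay·ω radians, which attenuates the effective gradient estimate and can even reverse its sign; this is one way to see why delivery delays can compromise the closed loop.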

IV. PERFORMANCE EVALUATION
In this section, we evaluate the performance of the proposed control scheme, investigating its ability to guide the reflected beam towards a possibly mobile receiver so that the received power is maximized. A unique feature of the system under consideration is the embedded CN through which packetized directives are forwarded to the metasurface controllers. The delivery procedure is associated with forwarding delays, and the effect of these delays on the convergence properties of the algorithm is assessed. In addition, characterizing the workload and traffic within the CN is critical, since the load should be kept to a minimum. Towards this end, the spatiotemporal traffic patterns for different sampling periods are also investigated.
The evaluation is conducted using two simulators, one developed in Matlab and one developed on the AnyLogic simulation platform. The latter models the delivery of packetized directives within the HSF embedded CN, taking into account its unique characteristics, for example, its asynchronous operation. The Matlab simulator comprises three main components: the controller implementation, the function mapping the reflection angle generated by the controller to the controller states, and the module that computes the far-field pattern resulting from the chosen controller states. The far-field pattern dictates the received signal strength. Throughout the evaluation, we consider received power values normalized by the maximum value, such that 0 ≤ P(t) ≤ 1. Time-varying receiver profiles involving changes in both the azimuth and elevation angles are considered.

A. Static Receiver
The base scenario of the initial simulation experiments assumes zero feedback delays. We consider initial elevation and azimuth reflection angles of 45° and 50°, respectively, with the receiver placed such that the desired reflection angles are θ = 55° and φ = 60°. Fig. 5 depicts the time evolution of the elevation and azimuth angles of the reflected beam together with the normalized power of the received signal. The results demonstrate that the ESC scheme successfully directs the beam towards the desired direction, raising the normalized received power from approximately 0 to 1. In our previous work [56], convergence was rather slow, of the order of a couple of seconds; a major objective of the current work is therefore to reduce the convergence time by appropriately tuning the controller parameters. This tuning enables the ESC algorithm to guide the beam to the vicinity of the target location within 1 second, as indicated in Fig. 5. In addition, the radiation patterns, in the form of normalized heat maps over azimuth and elevation angles, are depicted in Fig. 6 at the start of the simulation experiment (t = 0 s) and after the system reaches a steady state within reasonable proximity of the target (t = 2.03 s). Most of the intensity is concentrated around the center point, which shifts from 50° to 60° on the x-axis and from 45° to 55° on the y-axis. Fig. 7 shows the corresponding phase profiles of the 24 × 24 metasurface, with N_s = 4 unit-cell states, at t = 0 and t = 2 s. Each color indicates a different state. These profiles illustrate the reconfigurations that must be realized to achieve the desired change in beam direction.
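The mapping from a commanded reflection angle to controller states, and the far-field it produces, can be sketched with textbook anomalous-reflection array theory. The sketch assumes normal incidence, half-wavelength cell spacing, and that the N_s = 4 states correspond to reflection phases 0, π/2, π and 3π/2; the paper's actual mapping function and unit-cell response are not reproduced here. `phase_profile` quantizes a linear phase gradient to the nearest available state, and `normalized_power` evaluates the array-factor power in a given direction, ignoring the element pattern and path loss.

```python
import numpy as np

def _cell_indices(size):
    return np.meshgrid(np.arange(size), np.arange(size), indexing="ij")

def phase_profile(theta_deg, phi_deg, size=24, n_states=4, d_over_lambda=0.5):
    """Quantized controller states steering a normally incident wave towards (theta, phi)."""
    k_d = 2 * np.pi * d_over_lambda                       # k * d for cell spacing d
    th, ph = np.radians(theta_deg), np.radians(phi_deg)
    m, n = _cell_indices(size)
    ideal = -k_d * np.sin(th) * (m * np.cos(ph) + n * np.sin(ph))  # linear phase gradient
    step = 2 * np.pi / n_states                           # phase spacing between states
    return np.round(ideal / step).astype(int) % n_states  # nearest available state per cell

def normalized_power(states, theta_deg, phi_deg, n_states=4, d_over_lambda=0.5):
    """Array-factor power towards (theta, phi); an ideal continuous-phase
    beam would give 1 in its steering direction."""
    k_d = 2 * np.pi * d_over_lambda
    th, ph = np.radians(theta_deg), np.radians(phi_deg)
    m, n = _cell_indices(states.shape[0])
    path = k_d * np.sin(th) * (m * np.cos(ph) + n * np.sin(ph))    # propagation phase per cell
    field = np.sum(np.exp(1j * (path + states * 2 * np.pi / n_states)))
    return float(np.abs(field) ** 2 / states.size ** 2)

start = phase_profile(45.0, 50.0)    # profile for the initial reflection angles
target = phase_profile(55.0, 60.0)   # profile for the desired reflection angles
# Power towards the desired direction, and number of cells whose state must change.
print(normalized_power(target, 55.0, 60.0), int(np.sum(start != target)))
```

With four states, the quantization loss in the steering direction is roughly (sin(π/4)/(π/4))² ≈ 0.81 of the ideal continuous-phase value, which is why the power printed above stays below 1 even though the beam points at the target.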
In the subsequent simulation study, we investigate the effect of delays on the closed-loop system performance. Delays arise in both the forward and feedback paths: the propagation delay when forwarding the power recorded at the receiver to the gateway (GW), the latency of delivering the packetized directives to the controller nodes of the CN, and the response time of the state-altering elements once the directives have been received. The first and the last of these have been established to be negligible compared to the dissemination delays within the CN, so the focus is on characterizing these dissemination delays and examining their effect on the convergence and stability of the control algorithm. This characterization is conducted using a custom-developed simulation tool based on the AnyLogic simulator, which accounts for the considered non-regular controller topology, the asynchronous design, and the XY-agnostic routing algorithm. To avoid directly integrating the AnyLogic simulator with Matlab, we instead use the AnyLogic simulator to pre-calculate the information dissemination delay for each possible change in the reflected azimuth and elevation angle pair, as depicted in Fig. 8a, and embed the resulting relationship in the Matlab simulator for the evaluation of the closed-loop system. Note that the GW disseminates the messages using unicast routing to each controller node, one at a time. Multicast solutions, which can reduce the induced traffic and thus the delays, will be pursued in the future.
For the scenario under consideration, the delay relationship of Fig. 8a leads to the time profile of Fig. 8b. To account for this time-varying delay in the simulations, the time-dependent delay is represented by a polynomial obtained through a fitting procedure in Matlab. The time evolution of the elevation and azimuth angles in the presence of time-varying delays is depicted in Fig. 9, indicating that convergence to the desired reflection angles is still achieved, although the response becomes slightly more oscillatory.
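A Python analogue of the polynomial-fitting step is sketched below with `numpy.polyfit` and `numpy.poly1d`. The delay samples are synthetic placeholders, not the values of Fig. 8b, and the cubic degree is an arbitrary choice; with real data one would fit the AnyLogic-derived profile and inspect the residual before embedding the fit in the closed-loop simulation.

```python
import numpy as np

# Synthetic (hypothetical) delay profile: time in s -> dissemination delay in s.
t_samples = np.linspace(0.0, 2.0, 21)
delay_samples = 1e-3 * (2.0 - 0.8 * t_samples + 0.15 * t_samples ** 2)

coeffs = np.polyfit(t_samples, delay_samples, deg=3)   # least-squares cubic fit
delay_model = np.poly1d(coeffs)                        # callable: delay_model(t) -> seconds

def delay_at(t):
    """Fitted delay at time t, clipped at zero so that extrapolation cannot go negative."""
    return max(float(delay_model(t)), 0.0)

residual = float(np.max(np.abs(delay_model(t_samples) - delay_samples)))
print(delay_at(1.0), residual)
```

Clipping at zero is a defensive choice: a polynomial fitted on a finite window can extrapolate to physically meaningless negative delays outside that window.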
Remark 1: The delay values for the continuous-time version of the proposed ESC scheme were up to 15 ms, as presented in our previous paper [56]. In this paper, we implement the discrete-time version of the scheme, which yields delays below 2.2 ms, as shown in Fig. 8. This reduction is expected, because the controller parameters are updated less frequently, which results in fewer update directives.
It is well known in the control literature [77], [83] that the stability of feedback systems can be compromised by delays. We therefore investigate the tolerance of the system to feedback delays, i.e., the delay values beyond which the system performance degrades significantly. We plan to address the stability properties in the presence of delays analytically in the near future; in this work, we investigate the problem using simulations. The transient and steady-state behavior of the system is examined for feedback delays of 20, 30, and 50 ms. Fig. 10 shows the resulting time responses of the received power and the azimuth angle. When the feedback delay reaches 50 ms, the beam is no longer steered towards the receiver, and the received signal power drops to almost 0. Note that a 50 ms delay is significantly larger than the delays recorded in the discrete-time implementation of the control algorithm (Figs. 8 and 9); delay-related instabilities are therefore not expected unless very high sampling rates, approximating the continuous-time case, are used. A significant design parameter of the discrete-time implementation is the sampling period. Larger sampling periods are known to reduce the stability margins of the system, but they bring less frequent reconfigurations of the HSF, less traffic within the controller network, and smaller feedback delays. In the subsequent analysis, we investigate these effects of the sampling period T on the system performance. We consider sampling periods of 0.01, 0.02, 0.03, 0.04, and 0.05 s. The time evolution of φ_r and the corresponding signal power P(t) for each sampling period are shown in Fig. 11.
Increasing the sampling period increases the system damping, resulting in slower responses. When T reaches a sufficiently large value (0.05 s), the system even fails to converge to the desired azimuth and elevation angles, and the received power remains below 1. For Figs. 10 and 11, the time evolution of the elevation angle θ_r is omitted to save space.
Fig. 11. Effect of the sampling period on system performance.
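The role of delay in the sampled loop can be illustrated with an assumed single-angle ESC structure (hypothetical power map and gains, not the paper's tuning). In this sketch, each power sample reflects the angle commanded `delay_steps` samples earlier, while demodulation uses the current dither; with ω = βπ, a delay of D samples therefore rotates the demodulation phase by βπD. With β = 1/2 in this toy setting, one sample of delay still yields a positive gradient estimate, whereas two samples reverse its sign and push the angle away from the maximum. The hypothetical helper `delay_in_samples` (nearest-integer rounding is itself an assumption) shows why a fixed physical delay matters more at high sampling rates: it spans more samples when T is small.

```python
import numpy as np
from collections import deque

def received_power(theta_deg, theta_star=55.0, width=5.0):
    """Hypothetical normalized power map peaked at theta_star (degrees)."""
    return float(np.exp(-((theta_deg - theta_star) ** 2) / (2.0 * width ** 2)))

def delay_in_samples(delay_s, period_s):
    """Feedback delay expressed in whole sampling periods (nearest-integer rounding assumed)."""
    return int(round(delay_s / period_s))

def delayed_esc(delay_steps, theta0=45.0, steps=1000, gamma=5.0, h=0.5, beta=0.5, a=1.0):
    """ESC sketch where each power sample stems from the angle commanded
    delay_steps samples earlier. Returns the angle estimate after every step."""
    omega = beta * np.pi
    theta_hat = theta0
    # Directives "in flight": the oldest entry is the one currently acting on the beam.
    in_flight = deque([theta0] * (delay_steps + 1), maxlen=delay_steps + 1)
    p_prev = received_power(theta0)
    p_h = 0.0
    estimates = np.empty(steps)
    for k in range(steps):
        dither = np.sin(omega * k)
        in_flight.append(theta_hat + a * dither)  # directive issued at step k
        p = received_power(in_flight[0])          # produced by the directive from step k - delay_steps
        p_h = h * (p_h + p - p_prev)              # first-order high-pass filter
        p_prev = p
        theta_hat += gamma * dither * p_h         # demodulation uses the *current* dither
        estimates[k] = theta_hat
    return estimates

for d in (0, 1, 2):
    print(d, delayed_esc(d)[-1])
```

For example, with T = 0.01 s a 20 ms delay corresponds to two samples. The sample counts of this toy loop should not be compared directly with the millisecond thresholds reported for Fig. 10, since the dither and gains used there differ.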
Remark 2: In our previous work, where φ was kept constant and only θ was varied, the proposed controller was able to drive θ to its desired value even for relatively large sampling periods (T < 0.5 s). In this work, however, simultaneous control of both the elevation and azimuth angles requires a relatively small sampling period. This reveals a trade-off between the sampling period and the achieved performance.
However, as highlighted above, higher sampling frequencies (smaller sampling periods) cause the controller states to be reconfigured more often, which in turn results in higher traffic loads within the CN. To characterize this effect, we use the custom-developed AnyLogic simulator to analyze the traffic patterns generated by the control policy within the CN. First, we consider the effect of the sampling period on the spatial distribution of the traffic workload. Throughout the simulation experiment, we record the number of times each controller is reconfigured and visualize this information as a heatmap whose cells correspond to the controllers of the HSF; the "hotter" the color of a cell, the more times it has been reconfigured. Fig. 12 depicts the spatial distribution recorded for the basic scenario in which both θ and φ are successfully guided to the desired values. For small sampling periods, only a small percentage of the cells are reconfigured frequently, while most cells are rarely reconfigured. This is because the variations in the reflection angle between consecutive samples are minor, so the required phase-profile modifications are spatially limited and affect only a small number of cells. As the sampling period increases, the load spreads spatially because more controllers need to be updated, while the reconfiguration count of the heavily loaded controllers drops. The net load on the HSF is assessed through the sending rate of packetized directives injected by the gateway, which characterizes the temporal behavior. This is depicted in Fig. 13: as the sampling period becomes larger, the periodic traffic patterns exhibit a decrease in both their peak amplitude and frequency, until the sampling period reaches 0.05 s, at which point the peak amplitude rises again while the frequency of the peaks decreases further.
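Both the spatial (heatmap) and temporal (directive-rate) views described above can be derived from the sequence of controller-state matrices. The sketch below assumes, as a simplification, that every cell whose state changes between consecutive samples costs exactly one unicast directive; the function names are hypothetical, and routing and delivery delays are not modeled.

```python
import numpy as np

def reconfiguration_counts(state_sequence):
    """Spatial view: number of state changes per controller cell over the whole run."""
    states = np.asarray(state_sequence)      # shape (steps, rows, cols)
    changed = states[1:] != states[:-1]      # True where a cell must be reconfigured
    return changed.sum(axis=0)               # heatmap with one entry per controller

def directives_per_step(state_sequence):
    """Temporal view: directives injected by the gateway at each step,
    assuming one unicast directive per reconfigured cell."""
    states = np.asarray(state_sequence)
    changed = states[1:] != states[:-1]
    return changed.reshape(changed.shape[0], -1).sum(axis=1)

# Tiny hand-made example: a 2 x 2 surface observed at four sampling instants.
sequence = [
    [[0, 0], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 0], [0, 2]],
    [[0, 0], [0, 2]],
]
print(reconfiguration_counts(sequence).tolist())   # [[2, 0], [0, 1]]
print(directives_per_step(sequence).tolist())      # [1, 1, 1]
```

Feeding in the per-sample state matrices of a real run (e.g., the quantized phase profiles commanded at each sampling instant) would give a heatmap in the spirit of Fig. 12 and a directive-rate trace in the spirit of Fig. 13, up to the simplifying one-directive-per-cell assumption.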

B. Mobile Receiver
Until now, we have considered a target receiver that is stationary with respect to the HSF. However, in many applications, including Vehicle-to-Vehicle (V2V), Vehicle-to-Infrastructure (V2I), and UAV communications, the target receiver is usually mobile. Receiver mobility poses additional challenges to the beam steering scheme and raises concerns about its effectiveness. In the rest of the paper, we investigate this effect and its limitations with reference to two types of receiver mobility models. The first involves a single mobile receiver that moves in a straight line parallel to the surface. For simplicity, the target receiver is considered to be at the same height as the HSF, such that φ_i = φ_r = 0. As the receiver moves parallel to the HSF, its elevation angle θ with respect to the HSF changes continuously, so the proposed algorithm must continuously reconfigure the HSF to maintain maximum received power. In the first set of experiments, we consider a constant receiver speed of 5.04 km/h (1.4 m/s), which corresponds to a person walking. Fig. 14 depicts the behavior of the beam steering scheme and the corresponding received signal power. As highlighted earlier, the target moves only horizontally, parallel to the surface, so its azimuth is fixed and only the elevation angle changes. Fig. 14a depicts the target elevation angle, which starts at 42° and decreases linearly with time to gradually reach 36°. The initial deviation of the actual elevation angle from the one that would steer the beam towards the receiver is reflected in the received signal power, which initially attains half its maximum value, as shown in Fig. 14b.
However, the proposed beam steering algorithm successfully steers the beam towards the target location, and within a very short time (0.25 s) the beam points sufficiently close to the receiver for the received signal power to approach its maximum. As the target continues to move, the algorithm tracks it fairly well, maintaining high power values. Fig. 14c presents the corresponding spatial distribution of reconfigured (active) cells. In the second and third sets of experiments, we continue to consider linear motion of the receiver, but the speed is increased to 20 and 72 km/h, respectively. For the 72 km/h case, we also account for shadow fading, modeled as a log-normal distribution with a standard deviation of 8.14 dB according to the design guidelines of [84]. The results are presented in Figs. 15 and 16, the latter showing both the fading and no-fading cases. First, shadow fading has no significant effect on the output responses. This can be attributed to the low-pass filter incorporated in the ESC scheme, which filters out the high-frequency components induced by fading. Further, the proposed algorithm is capable of steering the beam towards the target despite its increased speed, which highlights the suitability of the proposed method for applications such as V2V and V2I. Moreover, Fig. 15c shows that as the speed of the target increases, more aggressive control action is required and hence more traffic is generated; consequently, Fig. 15c shows more active cells than Fig. 14c. Despite the good tracking, an increase in speed results in larger fluctuations of the received signal power around its maximum value, as shown in Fig. 15b. When the speed is increased further, as shown in Fig. 16, the algorithm is still able to keep the beam near the target location; however, the received signal power varies more and the performance of the algorithm degrades. As expected, the results demonstrate that increasing the speed has an adverse effect on tracking performance, which raises the question of up to which speed the system performance remains acceptable. Additional simulations show that for a speed of 144 km/h the algorithm yields poor performance in terms of convergence time and received signal power, and that for speeds greater than 160 km/h it fails to converge within a reasonable time.
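The low-pass-filtering argument for the fading case can be illustrated in isolation. The sketch applies i.i.d. log-normal shadowing with σ = 8.14 dB (the value used above) to a constant power level and passes it through a first-order low-pass filter. Independent samples are a simplification, since shadowing is spatially correlated in practice, and the filter coefficient `alpha` is a hypothetical choice rather than the ESC scheme's actual filter.

```python
import numpy as np

def shadowed_power(p_clean, sigma_db=8.14, seed=0):
    """Multiply a power sequence by i.i.d. log-normal shadowing (zero-mean Gaussian in dB)."""
    rng = np.random.default_rng(seed)
    fading_db = rng.normal(0.0, sigma_db, size=np.shape(p_clean))
    return np.asarray(p_clean, dtype=float) * 10.0 ** (fading_db / 10.0)

def low_pass(x, alpha=0.05):
    """First-order IIR low-pass filter y(k) = y(k-1) + alpha * (x(k) - y(k-1)), y(-1) = 0."""
    y = np.empty(len(x))
    acc = 0.0
    for k, xk in enumerate(x):
        acc += alpha * (xk - acc)
        y[k] = acc
    return y

raw = shadowed_power(np.full(5000, 0.5))
smoothed = low_pass(raw)
print(raw.std(), smoothed.std())
```

For independent samples, this filter scales the fluctuation variance by about alpha / (2 − alpha), so the standard deviation of `smoothed` should be a small fraction of that of `raw`. A low-pass filter does not, however, remove the mean of the log-normal factor, which for σ = 8.14 dB is about 5.8 rather than 1.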
Remark 3: This performance bound in terms of the allowable speed of the target for the effective operation of the proposed algorithm has been tested via simulations. However, in the future, we intend to evaluate the performance via mathematical analysis and improve the performance by incorporating predictions of the receiver trajectory.
Remark 4: The scenarios leading to Figs. 14, 15, and 16 involve horizontal movement of the receiver parallel to the HSF at different speeds. At relatively low speeds (Figs. 14 and 15), the simulation experiment ends before the receiver passes the center of the HSF, so the tracked elevation angle profile is strictly decreasing. At high speeds (Fig. 16), the receiver has sufficient time to pass the center, so the tracked elevation angle profile is convex (decreasing, reaching a minimum, and then increasing).
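The two kinds of elevation profile in Remark 4 follow from simple geometry. The sketch assumes a straight track parallel to the HSF at a hypothetical perpendicular distance of 10 m, an initial elevation of 42° (the starting value of Fig. 14a), and an angle measured from the surface normal; these are illustrative values, not the paper's simulation geometry. The profile is only approximately linear over small angular ranges.

```python
import numpy as np

def kmh_to_ms(speed_kmh):
    """Convert km/h to m/s."""
    return speed_kmh / 3.6

def target_elevation_deg(t, speed_kmh, theta0_deg=42.0, distance_m=10.0):
    """Elevation angle (deg, from the HSF normal) of a receiver on a straight track
    parallel to the surface at a hypothetical perpendicular distance distance_m."""
    x0 = distance_m * np.tan(np.radians(theta0_deg))        # initial lateral offset from the centre
    x = x0 - kmh_to_ms(speed_kmh) * np.asarray(t, dtype=float)
    return np.degrees(np.arctan(np.abs(x) / distance_m))    # non-negative angle from the normal

t = np.linspace(0.0, 1.0, 101)
walking = target_elevation_deg(t, 5.04)   # stays on one side of the centre
vehicle = target_elevation_deg(t, 72.0)   # crosses the centre during the window
print(walking[[0, -1]], vehicle.min(), t[np.argmin(vehicle)])
```

In this hypothetical geometry, a walking receiver (1.4 m/s) only sweeps a few degrees in one second, producing a strictly decreasing profile, whereas at 72 km/h (20 m/s) it reaches the centre after roughly 0.45 s, after which the elevation angle increases again.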

C. Mobility of the Receiver in 3D
The considered linear movement model is adequate for evaluation purposes and describes a number of real-life cases, e.g., a vehicle moving along a straight road. However, more sophisticated models describing a richer set of scenarios, for example UAV movement, involve mobility in 3D space, which causes both the azimuth and elevation angles to change. In this case, θ_r and φ_r must be controlled simultaneously so that the beam tracks the moving target in 3D space and the received signal power is maximized. In the next set of experiments, we consider target movement that causes both θ_r and φ_r to change in spherical coordinates. The changes in θ and φ are piecewise linear, with one proportional to the other. We evaluate the ability of our algorithm to control both angles simultaneously when both target values are time-varying. Fig. 17 shows the corresponding results: initially, the target is at (θ_r = 52°, φ_r = 42°) while the beam points towards (θ_r = 49°, φ_r = 39°). Within 2 seconds, the algorithm directs the beam toward the target location and maximizes the received signal power. After this time, the target keeps changing its location in 3D space and the beam successfully tracks it.
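One standard way to steer two angles from a single power measurement, assumed here rather than taken from the paper, is to dither θ_r and φ_r at distinct frequencies (here β = 1/2 and β = 1/3) so that demodulating the same high-passed power signal at each frequency recovers the corresponding partial derivative. The sketch uses a hypothetical power map peaked at the target direction, the initial beam and target values quoted above, and a hypothetical slow ramp (`ramp_target`) in which both angles increase together; it does not address the case of Remark 5 in which the angles move in different directions.

```python
import numpy as np

def power_2d(theta, phi, target, width=5.0):
    """Hypothetical normalized power, peaked when (theta, phi) points at target (degrees)."""
    d2 = (theta - target[0]) ** 2 + (phi - target[1]) ** 2
    return float(np.exp(-d2 / (2.0 * width ** 2)))

def esc_two_angles(target_fn, start=(49.0, 39.0), steps=1000, gamma=5.0, h=0.5,
                   betas=(1 / 2, 1 / 3), a=1.0):
    """Steer (theta_r, phi_r) from one power measurement using two distinct dither
    frequencies. target_fn(k) returns the target direction at step k."""
    w_th, w_ph = betas[0] * np.pi, betas[1] * np.pi
    th_hat, ph_hat = start
    p_prev = power_2d(th_hat, ph_hat, target_fn(0))
    p_h = 0.0
    estimates = np.empty((steps, 2))
    for k in range(steps):
        d_th, d_ph = np.sin(w_th * k), np.sin(w_ph * k)   # one dither per angle
        p = power_2d(th_hat + a * d_th, ph_hat + a * d_ph, target_fn(k))
        p_h = h * (p_h + p - p_prev)                      # one shared high-pass filter
        p_prev = p
        th_hat += gamma * d_th * p_h                      # demodulate at theta's frequency
        ph_hat += gamma * d_ph * p_h                      # demodulate at phi's frequency
        estimates[k] = (th_hat, ph_hat)
    return estimates

def ramp_target(k, onset=300, rate=0.005):
    """Piecewise-linear target: fixed at (52, 42), then both angles grow together."""
    shift = rate * max(0, k - onset)
    return (52.0 + shift, 42.0 + shift)

tracked = esc_two_angles(ramp_target)
print(tracked[-1], ramp_target(999))
```

The two dither periods (4 and 6 samples) share a common period of 12 samples, over which the cross-products between the two dithers average out; this separation is what lets a single scalar measurement drive two angle estimates.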
Remark 5: When both the azimuth and elevation angles change, the power converges to its maximum value more slowly than when only one angle changes. Moreover, since both angles are adjusted based on a single measurement, namely the received signal strength, the proposed algorithm may fail to achieve its goals when the azimuth and elevation angles change in different directions, in which case the control objectives become contradictory. Tackling this requires a more sophisticated control algorithm involving observers that estimate the states from the received power measurement, which will be the topic of future work.

V. CONCLUSION
In this paper, we have considered the application of extremum seeking control to the autonomous reconfiguration of HyperSurface controller states, guiding an impinging wave towards the receiver so that the received power is maximized. Extensive simulations have demonstrated the effectiveness of the method even when the receiver is mobile. Speeds beyond 100 km/h can be tolerated, rendering the method suitable even for vehicular or UAV settings. The workload within the CN has also been characterized, and it has been demonstrated that the traffic incurs delays of up to 2.2 ms, which are not large enough to render the system unstable. Finally, increasing the sampling period reduces the load at the expense of degraded performance. Future work will involve multivariable cascaded control over multiple tiles and event-based extremum-seeking controllers that minimize the frequency of reconfigurations while accounting for delays through appropriate predictors. A practical demonstration of the method on a real testbed is also planned.

He has also been appointed to the Senate of the University of Cyprus and to the Board of Directors of CYNET, the National Research and Educational Network, where he has been serving as Chair since November 2016. He has more than 75 publications in academic journals, books, and international conferences, with more than 1300 citations. His research interests include protocol design and performance aspects of networks (fixed, mobile, and wireless), in particular mobility management, QoS adaptation and control, resource allocation techniques, wireless sensor networks, and the Internet of Things.

He has published over 350 refereed articles in flagship journals (e.g., IEEE, Elsevier, IFAC, Springer), international conferences, and book chapters, and has coauthored two books (one edited).
His broad research interests include communication networks, software defined metasurfaces/reconfigurable intelligent surfaces, and their application in programmable wireless environments, nanonetworks, smart systems, and e-health, and networking security aspects. He has a particular interest in adapting tools from various fields of applied mathematics such as adaptive non-linear control theory, computational intelligence, game theory, and complex systems and nature inspired techniques, to solve problems in communication networks.
Marios Lestas (Member, IEEE) received the B.A. and M.Eng. degrees in electrical and information engineering from the University of Cambridge, U.K., and the Ph.D. degree in electrical engineering from the University of Southern California in 2000 and 2006, respectively. He is currently an Associate Professor with Frederick University, Cyprus. His research interests include application of control theoretic tools and optimization methods toward the development of practical solutions in a number of Intelligent Networks and Cyber-Physical Systems, for example, computer networks, the Internet of Things, transportation networks, power networks, bio-nano-networks, and nano-metasurface controller networks. In the aforementioned networks, he has investigated issues pertinent to congestion control, information dissemination, network vulnerability, demand response, and more recently privacy and security. He has participated in a number of projects funded by the Research Promotion Foundation and the EU.