Cooperative Beamforming with Predictive Relay Selection for Urban mmWave Communications

While millimeter wave (mmWave) communications promise high data rates, their sensitivity to blockage and severe signal attenuation presents challenges in their deployment in urban settings. To overcome these effects, we consider a distributed cooperative beamforming system, which relies on static relays deployed in clusters with similar channel characteristics, and where, at every time instance, only one relay from each cluster is selected to participate in beamforming to the destination. To meet the quality-of-service guarantees of the network, a key prerequisite for beamforming is relay selection. However, as the channels change with time, relay selection becomes a resource demanding task. Indeed, estimation of channel state information for all candidate relays, essential for relay selection, is a process that takes up bandwidth, wastes power and introduces latency and interference in the network. We instead propose a unique, predictive scheme for resource efficient relay selection, which exploits the special propagation patterns of the mmWave medium, and can be executed distributively across clusters, and in parallel to optimal beamforming-based communication. The proposed predictive scheme efficiently exploits spatiotemporal channel correlations with current and past networkwide Received Signal Strength (RSS), the latter being invariant to relay cluster size, measured sequentially during the operation of the system. Our numerical results confirm that our proposed relay selection strategy outperforms any randomized selection policy that does not exploit channel correlations, whereas, at the same time, it performs very close to an ideal scheme that uses complete, cluster size dependent RSS, and offers significant savings in terms of channel estimation overhead, providing substantially better network utilization, especially in dense topologies, typical in mmWave networks.


I. INTRODUCTION
T HE continuously growing number of connected devices has led to congestion in the licensed spectrum. To alleviate the problem, next generation of commercial wireless networks will exploit the previously untapped millimeter wave (mmWave) spectrum band [1]. The mmWave spectrum encompasses the frequencies 30 − 300 GHz, however the considered mmWave bands for use in urban cellular systems are at 28 and 38 GHz, since it has been shown that rain attenuation and A. Dimas  oxygen absorption is low [2], [3] at those frequencies. The large available bandwidth will enable data rates of the order of Gigabits-per-second (Gbps), and thus will be able to support future wireless applications such as Vehicle-to-Vehicle (V2V) or Vehicle-to-Infrastructure (V2I) communications. However, to exploit the full potential of mmWave networks for outdoorto-outdoor wireless communications, a new set of challenges, which are not present in current wireless networks would need to be overcome, such as severe path loss, and sensitivity to blockage and fading [4].
Various measurement campaigns [2], [5]- [7] have revealed that mmWaves incur increased propagation path-loss as compared to the sub-6 GHz frequencies used today. One way to compensate for that effect is to employ transmit beamforming [8], which can increase the probability of the signal arriving successfully at the destination. MmWaves also experience large-scale fading, which, according to measurement campaigns conducted in New York City and Austin, and Texas [2], can be modeled as normally distributed in the dB domain. A similar campaign conducted in Daejeon City, Korea led to similar conclusions [5]. MmWave wireless communications in densely built cities are also susceptible to blockage by their surrounding environment, e.g., by high-rise buildings, moving vehicles, pedestrians, etc. [9]- [11]. In such an environment, the dominant propagation mode of mmWaves is along street canyons by reflecting off of buildings, in a way analogous to ray optics. This is facilitated by the high reflection coefficient of common materials found in cities, such as concrete and glass [12].
One way to overcome blockage effects would be to deploy a dense network of street level mmWave base stations. Although this would optimize the city wide Line-of-Sight (LoS) coverage rate [13], it would not provide Quality-of-Service (QoS) assurances. In fact, a dense network of mmWave base stations can degrade the QoS of users due to interfering base station transmissions [14], [15]. An alternative solution to overcoming blockage and improve coverage is through relaying [4], [8], [16], a concept that has been included in the currently applied LTE-Advanced standard [17]. Multi-hop relaying can improve connectivity [18] as well as the coverage probability and the transmission capacity of a network [8]. However, a large number of hops requires significant signaling and scheduling overhead to determine the most suitable relays, which in turn can lead to increased energy consumption and latency. Also, the potential security issues associated with multi-hop relaying cannot be ignored [19], [20], as the points at which a malicious user can intercept the transmitted signal increase. On the other  Fig. 1. A birds-eye view of the assumed urban topology. Each numbered red circle depicts an intersection, that connects two or more road segments. A solid numbered blue line depicts a relay cluster placed along the respective street segment. The dotted lines depict all mmWave signal propagation paths from the source to the destination, through relay cluster 3.
hand, 2-hop relaying in combination with relay beamforming can increase the communication range while avoiding most of the aforementioned issues [21], at the cost, however, of added complexity for the computation of beamforming weights and phase synchronization [22]. By additionally taking into consideration the 3D structure of the environment, signal blockage can also be avoided by optimal positioning of aerial mobile relays [23]. This approach has already been considered for supporting mmWave communication [24], in the context of Unmanned Aerial Vehicles (UAV) networks.
The most widely studied problem in 2-hop relaying for device-to-device outdoor communication is relay selection, i.e., determining the minimum number of deployed relays [25],and/or best relay configuration, in order to optimize the quality of communication [8], [26]- [29]. Typically, relay selection requires instantaneous Channel State Information (CSI) between the source and relays, and the relays and destination. When CSI changes with time, optimal relay selection can become a resource-demanding task, as the network needs to first compute its channels via the exchange of pilots, and then decide on the best relay configuration. This process takes up bandwidth, wastes power and introduces latency and interference in the network. In the following, any relay selection scheme that conforms to the aforementioned serial procedure will be referred to as ideal.
In this paper, we consider point-to-point, relay-assisted mmWave communications in an urban scenario, and propose a new, resource efficient relay selection scheme for overcoming the effects of blockage and severe attenuation, caused by compact urban forms. The proposed scheme is designed to optimally enhance QoS in 2-hop Amplify-and-Forward (AF) cooperative networks and, as compared to the state of the art, induces significantly lower CSI estimation overhead, while, providing substantially better network utilization.
More specifically, the system model considered consists of static relays deployed in clusters across streets, within a specific area over which the channels have similar statistical characteristics. This is typical in mmWave networks, which are primarily designed for relatively short distance pointto-point communications. Assuming a time-slotted system operation, the proposed scheme optimizes QoS in a 2-stage fashion, where, in every time slot, and simultaneously with AF beamforming to the destination, each cluster predictively selects a new representative (1st stage) to optimally enhance AF beamforming at the subsequent time slot (2nd stage). Predictive relay selection is achieved by exploiting channel correlations with current and past networkwide magnitudeonly CSI (also known as Received Signal Strength (RSS)) which is invariant to relay cluster size, and is measured sequentially during the operation of the system. In our approach, network QoS is quantified by the Signalto-Interference+Noise Ratio (SINR) at the destination, which is a standard performance metric, and our goal is to maximize that SINR, subject to a shared power constraint among all relay clusters. Therefore, the optimal beamforming weights need to be computed centrally for all clusters. Nevertheless, the proposed relay selection procedure is conducted in a completely distributed manner; each cluster independently decides its successor representative for the subsequent time slot by solving a simple local stochastic optimization problem, without the need for inter-cluster information exchange.
Exploiting CSI correlations to predictively determine mobile relay movement in cooperative beamforming networks was recently explored in [22], in a convetional, free-space setting. Here, we use a similar idea for relay selection. However, the special mmWave signal propagation characteristics make the treatment of the selection problem substantially different than that of [22]. In particular, the diversity resulting from the reflective nature of the mmWave signal propagation, as well as the possibly street-wise varying channel parameters are two features not present in free-space formulations, which induce significant complications herein, as far as both the description of the problem and the development of corresponding efficient implementation techniques are concerned.
Along with our proposed adaptive relay selection scheme, this paper makes the following additional contributions: 1) We propose distributed cooperative beamforming for SINR maximization in mmWave networks. Our beamforming formulation alone allows for efficient exploitation of the spatial diversity induced by dominant mmWave propagation paths, which is a consequence of the spatial propagation patterns of the mmWave medium [30]- [32]. The CSI induced by the mmWave propagation paths is optimally combined constructively at the destination, resulting in superior network QoS, without the disadvantages of multihop relaying. 2) As briefly mentioned above, within each time slot, the execution of the proposed relay selection scheme is completely decoupled from optimal beamforming. Consequently, optimal communications and optimal relay selection can be performed in parallel, resulting in substantially improved time slot utilization. This is due to the predictive nature of the proposed scheme, which allows for the determination of the best cluster representative before the start of each time slot. Additionally, we show explicitly that our predictive scheme requires less CSI estimation effort per time slot as compared to the respective ideal scheme, with such reduction being more pronounced as the relay density per cluster increases. This is particularly important in mmWave networks, where dense infrastructure is essential for achieving satisfactory performance [33]. 3) We propose a practical and computationally efficient technique for implementing our proposed relay selection scheme. Specifically, the local stochastic problem each cluster is responsible for is replaced by a surrogate based on Sample Average Approximation (SAA) [34], which relies on predictive Monte Carlo sampling of the channel uncertainty involved. The proposed technique efficiently exploits spatiotemporal correlations of the mmWave channel structure via a well-designed combination of Kalman filtering and Gaussian process regression, and results in easily computable, near-optimal relay selection policies.
The effectiveness of the proposed joint beamforming and relay selection system is confirmed through numerical simulations, conducted using synthetically generated CSI data. First, all our numerical results show that the SINR performance of our adaptive relay selection scheme is almost identical to the ultimate performance achieved via the respective ideal scheme. On the other extent, our simulations also verify that, as expected, the proposed strategic scheme clearly outperforms any randomized selection policy which does not exploit experience accumulated during system operation.
Additionally, we examine the effect of different cluster topologies on overall system performance. On the one hand, we confirm that expected network QoS increases with the number of clusters taking part in the communication, also revealing a probably sublinear relevant trend. On the other hand, our simulations corroborate that cluster placement in the city indeed affects the overall QoS of the network users, recognizing the importance of cluster assignment as an open problem for future research.

II. MMWAVE URBAN CHANNEL MODEL
This section is dedicated to the development of a sufficiently detailed urban mmWave channel model, which will be exploited throughout the rest of the paper. Our channel model is versatile enough so that it can be applied to any city topology consisting of a densely built area with high-rise buildings, separated by non-curved street canyons.
To facilitate our presentation, we hereafter consider simplified city topologies such as that of Fig. 1, which shows a top view schematic of a particular urban area, where the numbered circles indicate road intersections, and the vertical and horizontal lines connecting those circles denote streets. Due to blockage caused by high-rise buildings, the only way a mmWave signal starting from a source located at p S can reach its destination at p D is by traversing street segments [30]. More specifically, the transmitted signal is spatially diversified through all sets of consecutive, non-repeating segments from the source to the destination. Then, a (dominant) propagation path is defined as every such set of traversed street segments whose aggregate length is equal to the minimum 1 -distance from the source to the destination. We adopt the convention of [30], where the Line-of-Sight (LoS) portion of every path is the segment between the transmitting node and the nearest intersection, while the remaining segments comprise the Non-Line-of-Sight (NLoS) portion of the path. All considered paths have common LoS portion, while their NLoS portions differ.
To overcome severe signal attenuation, we deploy clusters of relays across certain street segments, which will beamform the signal to its destination. For simplicity, we assume that each cluster contains evenly spaced relays. At each time instance, only one relay from every cluster, namely, the cluster representative, is active. The relays are connected via fiber to a central node via which they can exchange information. A propagation path between the source or destination and any of the cluster representatives, as well as the corresponding LoS and NLoS portions of the path are all defined in exactly the same fashion as in the previous paragraph.
Let N c be the number of available relay clusters in the network. Also, let L r be the number of all possible signal paths from p S to relay cluster r = 1, . . . , N c . The channel between p S and a relay in cluster r located at p is experienced as a combination of all channels across all possible paths between p S and p. In particular, under the flat fading assumption, the complex channel gain from p S to point p along path i can be decomposed as [35] f ri (p, t) where f P L ri (p) is the path-loss component, f SH ri (p, t) the large-scale fading component (shadowing), and f M F ri (p, t) the small-scale fading component (multi-path). A similar decomposition holds for the channel g ri (p, t) from p to p D , along path i = 1, . . . , K r , where K r is the number of respective signal paths from cluster r to the destination.
In the mmWave setting, the channel path-loss does not depend on the Euclidean distance between p S and p, but rather on their absolute locations (Manhattan distance), and is therefore parametrized separately for each segment [7], [30], [36]. Let the set of all individually traversed street segments τ of path i to cluster r be denoted by S f ri , which includes the segment τ S where the source is located, but does not include segment τ r where cluster r is located. Similarly, the set of traversed segments of path i from cluster r to the destination, including the segment τ D the destination is located but excluding segment τ r , is S g ri . In the following, we consider only the source-relay channels f ri , for every path i associated with cluster r. The discussion for g ri follows in a completely analog manner, and is omitted for brevity.
As in [30], we also assume an additional loss ∆ occurring at every intersection, i.e., every propagation path exhibits a total intersection loss ∆N f r , where N f r are the number of traversed intersections from p S to p. Therefore, the overall path-loss component of channel f ri is expressed as where d τ S denotes the length of the LoS segment τ S , d f τr (p) is the distance between the intersection of segment τ r associated with f ri and location p in τ r (that is, the intersection of τ r which is 1 -closest to the source), and d τ denotes the length of the τ -th street segment. We assume that a relay cannot be located exactly on an intersection, so d f τr = 0. Likewise, the shadowing and multi-path components of the channel experienced across each path may be decomposed on a per-segment basis as (3) where s τ and q τ are the shadowing and multi-path terms experienced across segment τ . Consequently, by expressing the magnitude of (1) in logarithmic scale (dB), we obtain the additive model where We should note that, in (7), the combined shadowing components pertaining to segments without relays have been separated from the respective term referring to the segment containing the relay cluster. Those terms exhibit distinct statistical behavior, and will be considered separately.
In addition to the above, for every time slot t, we assume a phase term e j2πφτ (t) for each distinct segment τ ∈ S f ri , Similarly, for every time slot t and every location p, we assume another phase term φ f τr (p, t), also uniformly distributed in [0, 1]. Across all time slots, all segments, and all locations, all phase components are mutually independent, and also independent of the respective channel magnitudes, as well. Then, the channel f ri (p, t) can be reconstructed as . For all segments, we assume a log-normal distribution for modeling shadowing and multi-path fading [11]. The channel paths f ri , i = 1, . . . , L r , are statistically dependent, as they might traverse common segments. Still, it is reasonable to model all shadowing and multi-path components as being mutually independent across different segments, since each segment will exhibit distinct spatial features.
However, within each segment τ ∈ S f ri , β τ (t) is assumed to be zero mean and jointly Gaussian in time, with correlation between two time slots k and l given by where η 2 is the shadowing power and γ the correlation time [22]. Further assuming that the multi-path component q τ (t) is white in time with variance σ 2 ξ [37], the combined logmagnitude terms z τ (t) β τ (t) + ξ τ (t), t = 1, . . . , N T are jointly Gaussian with mean zero and covariance Likewise, the term β f τr (p, t), corresponding to the segment where cluster r is located, is assumed to be jointly Gaussian, both in space and time. Specifically, we assume that the individual relays of cluster r can be located at a discrete set of δ positions across the segment τ r . At two such positions, say p m and p n , and between any two time slots k and l, the spatiotemporal correlation of β τr (p, t) is defined as [22] E where the spatial kernel K F F is given by We further assume that the "incoming" and "outgoing" shadowing terms β f τr (p, t) and β g τr (p, t) at positions p m and p n and between time slots k and l are themselves correlated as . Crosscorrelation structure of the incoming and outgoing channel terms where the crosscorrelation kernel K F G is defined as and where d f ull is the length of segment τ r , and d max is the furthest possible distance between two discrete relay positions of a cluster. This kernel describes the correlation between the incoming and outgoing channels at each cluster, at different locations and at different time slots. Intuitively, correlation should be proportional to the size of the part of the segment which is traversed by both channels, if such a part exists (see Fig. 3c). Otherwise, as the distance between the locations where the two channels are respectively experienced increases, their correlation should be decreasing (see Fig. 3b). The proposed kernel captures precisely the behavior outlined above, while resulting in a valid cross-covariance structure for β f τr and β g τr . As above, assuming that q f τr (p, t) and q g τr (p, t) are both white in both space and time, as well as mutually independent, the collection of combined terms for i = 1, . . . , δ and t = 1, . . . , N T , are Gaussian with mean zero and covariance Σ τr ∈ R 2δN T ×2δN T given by where ⊗ indicates Kronecker product, and the per-slot crosscovariance matrix K ∈ R 2δ×2δ is defined as where, overloading notation, K F F , K GG and K F G are correlation matrices corresponding to the kernels (13) and (15), respectively, each evaluated on all δ 2 pairs of possible positions across segment τ r , according to some common order.
III. JOINT BEAMFORMING AND RELAY SELECTION Determining the cluster topology, i.e., the number of deployed clusters as well as their locations is an important problem that will be studied in our future work, and could draw from current literature on optimal relay placement [25], [38], [39]. Our proposed scheme operates assuming that the clusters change at a low rate, and focuses on the time period over which the clusters have been optimally determined and are fixed. During that time, the statistical model of the channel stays the same; however, the channel itself changes.
In every time slot, the proposed system jointly performs beaforming and relay selection, by addressing a 2-stage stochastic problem [22]. Before going into the details (and the advantages) of each stage separately, we should note that although the 2-stage problem refers to the necessary actions needed to be taken during a single time slot, in practice these actions refer to two consecutive time slots, due to the availability of the required CSI. More specifically, during time slot t, both current beamforming weights of the cluster representative are calculated (corresponding, as discussed below, to the 2nd stage problem at time slot t), and the relays from all clusters to be selected at the next time slot are determined (which corresponds to the 1st stage problem at time slot t + 1). Both tasks (beamforming and relay selection) are based on current CSI, as well as past CSI of cluster representatives selected up to time slot t.
We assume that 2-hop relaying is used to assist the communication between p S and p D . The whole network is assumed to operate for N T time slots. In each time slot t = 1, . . . , N T , the source at p S transmits the signal √ P S s(t), where s(t) is an information symbol with E[|s(t)| 2 ] = 1, and P S > 0 the source transmission power. The signal received at the representative relay of each cluster r, located at p r (t) is, where n r (t) ∼ CN (0, σ 2 ) is the reception noise at cluster r. Working in an Amplify-and-Forward (AF) fashion, each cluster representative modulates its received signal R r (t) by a complex weight w r (t) and re-transmits it. Note that a mmWave signal arriving at p D directly from the source and without the help of a relay has negligible power, and can be ignored. Therefore, the aggregate signal received at the destination from all relay representatives is where n D (t) ∼ CN (0, σ 2 D ) is the reception noise at p D .

A. Optimal Beamforming for 2-hop relaying
We extend the distributed relay beamforming schemes studied in [40], [41], to the significantly more complex setting of urban mmWave relay networks. Here, distributed beamforming is considered for enforcing relay cluster cooperation, such that all individual signal paths forwarded from all relay clusters are combined constructively at the destination.
At every time slot t, the goal is to obtain the respective beamforming weights to be used by each cluster, w(t) [w * 1 (t), . . . , w * Nc (t)] T ∈ C Nc×1 , such that the SINR at p D is maximized, subject to a total transmission power budget P C > 0 over all relay clusters. Define the vectors r = 1, . . . , N c . Then, after dropping dependence on (t) and (p r (t), t) for brevity, the SINR is maximized by solving [40] maximize where and 1 is the all-ones vector. A crucial technical property of problem (23) is that its optimal can be explicitly expressed as [22], [40] where S r p r (t), t is a vector of all random variables referring to the shadowing, multi-path, and phase terms of all unique segments traversed for all paths from p S to p D , which also pass through each cluster representative, located at p r (t).
As can be seen from (27), V (t) depends on the relay positions at time slot t. Thus, by optimally positioning the relays, V (t) can be further maximized. This problem is explored in the next subsection. Interestingly enough, it turns out that the optimal beamforming vector that achieves (27) enjoys an explicit form similar to that used in the free space scenario [40], i.e., where the alignment vector v max ∈ C Nc×1 is defined as and where |1 T f r | 2 and |1 T g r | 2 are the incoming and outgoing aggregate channels at p r , respectively. In other words, each cluster representative at p r does not need to estimate the individual channels from every propagation path, but rather only the aggregate channel from all propagation paths. In practice, this can be computed by the selected relay via the exchange of pilots. One may also observe that, while the i−th element of v max can be estimated by the i-th relay only, the vector norm in (28) involves the source and destination channels of all cluster representatives who will beamform at the current time. Therefore, that scalar will have to be computed centrally and then distributed to all clusters. This can be done through a highspeed, optical fiber based, backhaul network, that connects all relay clusters, as well as all relays within a cluster, with each other [42], [42]. Clearly, for the source and destination which are, e.g. moving vehicles, no wired connection to the backhaul exists. The beamforming stage requires O(N c ) operations.
As a final remark, we should also note that in practice, during this beamforming step, phase synchronization is required to take care of local oscillator phase offsets. Distributed beamforming synchronization is an active field of research [43], [44], that has also been studied in the context of mmWaves [45]- [47]. Here, we assume that the backhaul network can also take care of the synchronization between the relay clusters.

B. Optimal Relay Selection for 2-hop relaying
At every time slot, each cluster must decide which is the appropriate relay to be used for beamforming. Typically, this would first require estimating the respective channel of every relay in the cluster, and then deciding upon the strongest one. Clearly, this decision making procedure not only wastes power and bandwidth during CSI estimation, but also induces extra delay before optimized communication can take place within each time slot. This delay is significant, especially if the number of relays per cluster is large. In this subsection, we propose a new scheme for adaptive relay selection which completely avoids this overhead, thus resulting in much better time slot utilization.
More specifically, the proposed relay selection scheme is based on transferring the implementation of the relay selection procedure from the current time slot, to the previous time slot. In other words, relay selection would be implemented predictively by efficiently exploiting the statistical model of the mmWave channel, before the respective time slot starts. This immediately results in the complete elimination of the "waiting delay" discussed above; indeed, if predictive relay selection is sufficiently accurate, then the cluster representatives at each time slot can be predetermined, before the slot starts. This means that relay selection and beamforming can be completely decoupled within each time slot, and thus can be parallelized. What is more, as a desirable byproduct of eliminating this "waiting delay", the proposed scheme also comes with a substantial reduction on the amount of CSI required for relay selection, as well as significant power savings. See Section IV for a more detailed discussion.
We now describe the proposed relay selection scheme in detail. As described above, at time slot t, we are interested in deciding on the best relay representatives from all clusters to participate in beamforming at time slot t + 1, such that the networkwide SINR, V (t + 1), is maximized. However, at the current time slot t, future CSI needed for evaluating V (t+1) is not yet available. Nevertheless, a reasonable causal criterion for optimal relay selection is to maximize a projection of V (t + 1) on information available at time slot t. Following this path, we propose to maximize an MMSE predictor of V (t + 1) relative to the collection C r (t) of all magnitude CSI, or RSS, from the segments of all propagation paths associated with all previously selected representatives of cluster r, as well as the positions of the representatives themselves, up until and including t. Then, due to the additive structure of (27), each cluster r can independently solve where C r (·) constitutes the set of candidate relays within the cluster which can potentially be selected. This set can either be unconstrained, including any relay within the cluster, or constrained to only a subset of relays within the cluster. Next, define the sets S f r = ∪ Lr i=1 S f ri and S g r = ∪ Kr i=1 S g ri . Then, at every feasible location p ∈ C r (t), the objective of (30) may be expressed as where, dropping dependence on (p, t + 1), V I may be reexpressed in a more integration-friendly form as with F being a function of the sets Z f r = {z f τr , {z τ } τ ∈S f r }, and ϕ f r = {φ f τr , {φ τ } τ ∈S f r }, corresponding to the combined shadowing plus multi-path, and phase terms of the unique segments traversed in all paths between the source and cluster r, and respectively for G, Z g r and ϕ g r . Analytical expressions for F and G are presented in Appendix .
By a slightly tedious but straightforward procedure, it may be shown that the joint conditional density of all random variables contained in vector S r (p, t + 1) relative to C r (t) can be expressed as (by overloading notation) where U(·; 0, 1) denotes the uniform density on Note that, although phase information at time slot t + 1 is present in (32), the objective (31) is independent of phase information at past time slots. This is because of the standard assumption that, for each segment, the phase component of the channel is white in time and space, and mutually independent of the respective phase component of all other segments. Indeed, one may readily observe that, in (33), all distributions associated with channel phases are uniform in [0, 1], which is precisely the prior distribution of all phase components, for all segments taking part in the communication.
From the discussion above, it follows that tractably evaluating (31) is a challenging task. As it might be expected, the first step towards evaluation of (31) is the efficient determination of the aforementioned predictors. This is the subject of the next two subsections.
1) Channel Prediction for Cluster-free Segments: The shadowing component of the channel for a cluster-free segment τ , β τ , is a Gaussian process evolving in time, which may also be represented as a stable autoregression of order 1. Indeed, it may be easily shown that, at every segment τ ∈ ∪ Nc r=1 (S f r ∪ S g r ), β τ can be represented via the stochastic difference equation [48] where κ e −1/γ , β τ (0) ∼ N (0, η 2 ), with the latter being independent of w τ (t) At the same time, across time slots, the shadowing process β τ (t) cannot be measured directly. Instead, it may be considered as corrupted by unpredictable noise, due to the presence of the multi-path component ξ τ (t); indeed, at each time slot t and segment τ , the term z τ (t) = β τ (t) + ξ τ (t) is observed. Now, for every segment τ ∈ ∪ Nc r=1 (S f r ∪ S g r ), define the vector which contains all observable CSI magnitudes associated with that segment, up to time t. Then, exploiting the autoregressive representation of (34), it follows that the posterior distribution of z τ (t+1) relative to m 1:t τ is Gaussian with conditional mean and variance given by respectively, where, by definition, are the conditional mean and variance of β τ (t) relative to m 1:t τ , respectively. Therefore, determination of µ initialized by setting β 0|0 τ = 0 and ρ 0|0 βτ = η 2 , stemming from the statistics of the initial condition β τ (0). By direct comparison of (36) and (37) to the Kalman filter equations (40), (41) and (42), it is easy to derive an algorithm for the direct recursive evaluation of µ t+1|t τ and (σ t+1|t τ ) 2 , comprised, for t = 1, . . . , N T , by the dynamic equations initialized by setting µ 1|0 τ = 0 and (σ 1|0 τ ) 2 = η 2 + σ 2 ξ . For each cluster-less segment τ ∈ ∪ Nc r=1 (S f r ∪ S g r ), the corresponding Kalman filter may be implemented either centrally within each cluster, or in a completely distributed fashion, where each cluster-less segment is responsible for tracking its own channel, and then for distributing its estimate to the associated cluster, responsible for the actual relay selection.
2) Channel Prediction for Segments Containing Clusters: Next, consider the segment τ r , containing cluster r. Then, if we define z f,g τr [z f τr z g τr ] T and store all CSI measurements of every previously selected representative of cluster r in m 1:t τr = [z f,g τr (p r (1), 1), . . . , z f,g τr (p r (t), t)] T ∈ R 2t×1 , (46) then, for each location p ∈ C r (t), the mean vector and covariance matrix of the Gaussian random vector z f,g τr (p, t+1) conditioned on m 1:t τr are µ t+1|t τr (p)=(σ 1:t τr (p)) T (Σ 1:t τr ) −1 m 1:t τr ∈ R 2×1 (47) respectively, wherē andΣ 1:t τr ∈ R 2t×2t ,σ 1:t τr ∈ R 2t×2 are sampled for every time slot until t from Σ τr ∈ R 2δN T ×2δN T , at the positions that correspond to the distance between the candidate location p and the respective locations where the incoming and outgoing channels of segment τ r have been experienced so far. We should note that unlike before, (47) and (48) cannot be estimated using a Kalman filter, but rather, using full-blown Gaussian process regression. The dominant operation of (47) and (48) is the inversion of the covariance matrixΣ 1:t τr . The computational complexity of this inversion is of the order of O(t 3 ) operations, and grows with time due to conditioning on past CSI. Nevertheless, the complexity can be reduced to O(t 2 ), via a typical application of the matrix inversion lemma; for details, see ( [22], section VI).

3) Reduced-Complexity Sample Average Approximation:
Having determined the necessary posterior statistics involved in (33), our next step would be to evaluate the objective of (30) or, equivalently, the multidimensional integral (31). However, to the best of our knowledge, a closed-form representation of (31) is impossible to derive. Therefore, we resort to a near-optimal approach. In particular, we rely on the SAA method, and replace (30) by an easily computable surrogate, constructed via unconditional Monte Carlo sampling.
To define the proposed surrogate to (30), fix p ∈ C r (t) and t = 1, . . . , N T , and consider the change of variables (again, overloading notation) to the integral of (31). Additionally, also define the collections Then, (31) may be equivalently represented as with the functions F t+1|t and G t+1|t being defined as 1 and G t+1|t p, V g r , ϕ g G p, Σ t+1|t and whereS follows the distribution induced by the density The representation (52) exhibits an important and rather practically appealing property: The density pS is completely independent of both p and C r (t), and all such dependence has been transferred toV t+1|t I . Consequently, sampling from pS is greatly facilitated, and this fact is exactly what makes our proposed SAA-based scheme attractive, whose description now follows.
For each relay cluster r and at every time slot t, the SAA method works by randomly generating a total of N S scenarios, Beamforming (2nd stage of time slot t)

5:
Relay selection (1st stage of time slot t + 1) 6: for every cluster r do 7: Inputs: a) RSS {z τ (t)} τ ∈S f r ∪S g r until t. 8: b) RSS z f τr (t) and z g τr (t) until t.
Note that, following [22], it may be argued that exactly the same set of scenarios may be used by all relays, in all clusters, and even at all time slots. This, of course, keeps the sampling requirements at a bare minimum, networkwide.

C. 2-stage joint beamforming/relay selection
Our 2-stage joint beamforming and relay selection scheme is described in Algorithm III-B3. At time t, beamforming towards the destination is performed, which corresponds to the 2nd stage problem of time slot t. In this stage, the RSS and phases of the channel aggregates at every cluster representative need to be centrally collected, in order to compute the optimal beamforming weights. Within the same time slot t, and in parallel to beamforming, the 1st stage problem of time slot t + 1 is solved, i.e., every cluster individually selects the relay to be used for beamforming in the subsequent time slot. In this stage, in addition to the CSI of the cluster representative at p r , the relay selection process also requires the CSI of the unique segments that comprise the propagation paths to that cluster. This information can be easily acquired via low cost devices, e.g., channel sounders, placed on every street segment, and then sent through the backhaul network to the respective cluster.
In practice, it is sufficient to condition on a window of past time slots, as opposed to the entire observed RSS history. Such an approximation is expected to work well even for a relatively small window size, due to the exponentially decaying structure of the temporal correlation component of the channel model. Moreover, depending on the mmWave channel coherence time, it might be sufficient to follow a two-timescale design, where the beamforming weights would be computed in every time slot but relay selection would be executed over a longer time interval.

IV. OPERATIONAL PHASES OF THE TIME SLOT
In this section, we discuss how the operations of relay selection and beamforming are scheduled within each time slot, comparatively for the proposed and ideal selection schemes.
In every time slot of the ideal scheme ( Fig. 4 top), relay selection is always implemented before optimal beamforming; this is simply due to the fact that acquisition of the current RSS has to inevitably be performed in the same time slot as beamforming. On the other hand, in the proposed scheme ( Fig. 4 bottom), relay selection at the current time slot is implemented predictively during the previous time slot, by efficiently exploiting past RSS observations. As a result, the overhead caused by the relay selection process can be effectively bypassed, and optimal beamforming at the current time slot may be implemented completely in parallel with the predictive relay selection affecting the next time slot.
Next, let us look at the CSI estimation requirement of each selection scheme into more detail. In the ideal scheme, the incoming and outgoing channels of every relay for all clusters are initially estimated. This requires estimating N ideal = 2δN c distinct channels. Channel estimation is initiated by a pilot symbol broadcaster from the source, with every relay of all clusters measuring their RSS. A similar procedure is done for estimating the respective channels towards the destination. On the contrary, our proposed scheme requires estimating only the CSI of the cluster representatives, as well as the CSI of segment τ ∈ S f r ∪ S g r , for all r = 1, . . . , N c . Therefore, N proposed = 2N c + | ∪ Nc r=1 S f r ∪ S g r | channels have to be estimated, where | · | denotes the cardinality of a set.
Compared to the ideal, the proposed relay selection scheme is particularly advantageous in dense network topologies, where, to account for high channel variability, each cluster needs to include a large number of relays, and the number of relays per cluster is relatively larger than the number of segments taking part in the communication. Actually, a dense network is required even if the shadowing variance is low, since this implies weaker channel correlation, due to the now dominant multi-path fading. It is then clear that our proposed scheme requires significantly fewer channels to be estimated, which in turn leads to reduced channel estimation overhead. Note that, while our proposed scheme does incur a computational burden associated with the relay selection process, due to the need for execution of Algorithm III-B3, the parallelization of relay selection and optimal beamforming in each time slot not only compensates for that burden, but also naturally leads to more consistent ergodic performance, as long as the accuracy of predictive relay selection is adequate.

V. SIMULATIONS
We examine the performance of the proposed relay selection scheme using synthetically generated CSI data. For our simulations, we assume the topology and cluster placement of Fig. 1, unless otherwise stated. All segments are of length d f ull = 100m. For segments containing a relay cluster, we consider δ = 50 evenly spaced positions upon which the individual relays are located. We assume a carrier frequency of 28 GHz, with path-loss exponent similar to those reported in [5], [49], i.e. α L = α N = 2.1. The assumed channel parameters are η 2 = 40, γ = 15, β = 10m, ∆ = 10dB, σ 2 ξ = 20, σ 2 = 1, and σ 2 D = 1. The transmission power (which also includes the directional antenna gain) is P S = 80dBm and the total relay power budget is P C = 100dBm. The relay clusters are connected to a stable power source.
To assess the long term system performance of our SAAbased system, we compare it against two benchmark relay selection policies. The first is the ideal policy, where, as described previously, the best relay from every cluster is selected and used in beamforming, during the current time slot. This policy provides an upper bound on the performance of any admissible policy. The second benchmark is a nonstrategic, randomized policy, where a relay is chosen uniformly at random from its cluster to be used in the subsequent time slot, irrespective of the observed CSI. For the SAA and randomized policies, we also examine the constrained neighborhood case, where the only candidate relays are those in the 4 positions closest to either one of the two cluster segment edges.
All selection policies were executed over N T = 50 time slots, with N S = 500 SAA scenarios. This constitutes one trial of the whole communication task. The QoS achieved in each time slot, averaged over 10 4 trials, is shown for all policies in Fig. 5b. Note that assessing the performance of the proposed relay selection policies via averaging is correct and technically justified; see [22] for details. The superiority of the SAA policy is evident, as it achieves almost 7.7dB larger SINR than the randomized policy, while also performing only 2.3dB worse than the ideal policy. This significant result implies that near optimal selection is achieved by exploiting one-step ahead SINR prediction, rather than waiting for the actual future channel values, available at the next time slot.
For the ideal and SAA approach, we also show the histograms of the fraction of times a specific relay was chosen in every time slot. Observing the ideal policy of Fig. 7, it seems that there is a tendency to select relays located near either of the two end points of the cluster, which for relay clusters 2 and 3 is done about 60% of the time. This is more evident in the SAA approach where at every time slot about 99% of the total trials choose either of the end point relays.
Under the same channel and simulation parameters, we varied the number of available clusters. We first considered the case where the topology consisted only of clusters 1 and 2 of Fig. 1, and another were in addition to the four clusters of Fig. 1, two more were placed between intersections 6 and 5, and 4 and 7. The average SINR for all policies of the two and six cluster configurations are plotted in Fig. 5a and 5c, respectively. Notice that, as compared to Fig. 5b, the average SINR achieved for the two cluster configuration performs about 5dB worse for all policies, while for the six cluster configuration the performace is about 3dB better. This implies that the more relay clusters present in the city, the better the overall SINR performance achieved at the destination. Of course, when more relays are involved the synchronization overhead increases.
In Fig. 6a we also examined a topology where two clusters are placed near the source and two near the destination. This case can be thought of as directly assisting the source and destination, fairly. In reality, the probability of this happening in a city is expected to be low as there will be multiple users the clusters would need to service. Still, one can observe that the SINR performance is practically equivalent to that of Fig. 5c. In other words, the cluster placement problem is equally important to the performance of the overall system, and therefore deserves further investigation.

VI. CONCLUSION
This paper considered a set of spatially distributed clusters of static relays, cooperatively enhancing mmWave communications in an urban environment. In particular, on a per time slot basis, one relay from each cluster is selected to participate in optimal transmit beamforming towards the destination. In order to decide on the best relay configuration to be used for beamforming, we proposed an adaptive, sampling-based, distributed across clusters and of reduced complexity relay selection scheme, which efficiently exploits spatiotemporal correlations of the mmWave medium, and can be executed completely in parallel to optimal beamforming-based communication. The effectiveness of our approach is established via numerical simulations, which corroborate its superiority against a RSS-agnostic, purely randomized relay selection policy, as well as its efficiency relative to a reference, resource demanding ideal selection scheme. As a byproduct of our results, we also demonstrated the sensitivity of the proposed system with respect to spatial cluster placement, confirming that efficient spatial cluster assignment constitutes an interesting, natural direction for future research.

APPENDIX
If we assume the set S f r = ∪ Lr i=1 S f ri that contains all unique segments of all channels f i leading to relay r, as well as the sets Z f r = {z τ }, τ ∈ S f r and set ϕ f = {φ τ }, we have that: |f ri | 2 + |f r1 ||f r2 |(e j(Φr1−Φr2) + e j(Φr2−Φr1) ) + . . . × cos φ f τr (p i , t) − φ f τr (p k , t) where χ = ln(10)/10. Note that in the above, the left hand side refers to the unique segments traversed in all paths, while the equation on the right hand side to the ordered segments in a path. A similar expansion holds for G(Z g , ϕ g ).