Blockage-Robust Hybrid Beamforming Enabling High Sum Rate for Millimeter-Wave OFDM Systems

We propose a scheme for the concomitant design of hybrid beamforming and per-carrier transmit power allocation to mitigate the effect of random path blockages in coordinated multi-point (CoMP) systems using orthogonal frequency division multiplexing (OFDM) in millimeter-wave (mmWave) channels. In order to optimize both the beamformers and power allocation while dealing simultaneously with outage minimization and sum rate maximization (SRM) requirements, a regularized sum-of-outage minimization problem is formulated. The problem is then transformed into an empirical risk minimization (ERM) problem, solved via block stochastic learning and manifold optimization, with required learning rates derived and tuned to guarantee convergence. The method, which demands only a few radio frequency (RF) chains and relies only on knowledge of blockage probabilities, is shown via simulation results not only to outperform state-of-the-art (SotA) alternatives, but also to achieve outage probabilities comparable to those of a fully digital CoMP-SRM scheme with perfect knowledge of instantaneous blockages.

further utilization of the mmWave and sub-THz bands will be crucial to address spectrum shortage and meet even more sophisticated requirements [4], [5], [6].
Wireless communications at high-frequency bands suffer, however, from severe signal attenuation owing to larger free-space propagation losses [7], [8]. Fortunately, mmWave systems can be equipped with more antenna elements than microwave systems, due to the shorter wavelengths of carriers. Therefore, beamforming schemes operating with large antenna arrays that achieve high directivity can compensate for power losses, making multiple-input multiple-output (MIMO) technologies fundamental in future wireless communication systems [9], [10], [11].
The first contributions in this direction relied on fully digital beamforming approaches that require systems to be equipped with the same number of radio frequency (RF) chains as the number of antenna elements. In large MIMO systems, however, a fully digital architecture leads to high cost and high power consumption demands, which limits the practicality of its implementation. In response to the latter, hybrid beamforming methods, which require fewer RF chains than antenna elements, have gained much attention as a practical alternative for mmWave MIMO [12], [13], [14], [15]. However, the highly directive transmissions in sharp beams are prone to sudden and rapid attenuation due to blockages in propagation paths, caused by small objects such as pedestrians or vehicles [16], [17], [18], [19], [20].
In order to overcome this challenge, new blockage-robust transmission strategies have been actively discussed recently. An example of this is CoMP transmission, whereby multiple synchronized base stations (BSs) transmit data cooperatively, which was shown to maintain high data rates even in the presence of blockages [21], [22].
However, CoMP systems do not really resolve the path blockage problem, but rather avoid it by adding more diversity to the channel. The approach therefore detracts efficiency from the network, since power continues to be transmitted towards blocked paths. In contrast, blockage prediction and mitigation strategies have been proposed [23], [24], [25] to directly address the path blockage problem. To cite a few examples, methods based on the spatial correlation between mmWave and sub-6 [GHz] channels [23], visual information from cameras [24], and in-band signatures [25] were proposed, which predict instantaneous blockage occurrences or their probabilities.
As for mitigation approaches, although techniques based on handover management have been proposed [26], that strategy may lead to rapid throughput degradation and considerable delays due to the need for re-establishing links when prediction fails, such that recent work has rather focused on the incorporation of blockage predictions into CoMP transmission methods [27], [28], [29], [30], [31]. As an example, a robust CoMP transmission method was proposed in [27], which is based on the design of hybrid beamforming with blockage probability, solved by a worst-case optimization approach, aiming at maximizing the total system data rate. Despite the elegance of the design, due to the high computational complexity, the method can mitigate blockage effects on line-of-sight (LOS) paths only, which results in frequent outages for some users and, consequently, sub-optimal quality of service (QoS).
In turn, cooperative beamforming designs to guarantee QoS under blockage occurrences on both LOS and non-line-of-sight (NLOS) paths were proposed in [28], [29], [30], and [31], which employ stochastic learning methods to minimize outage probabilities with respect to prescribed target rates, relying on knowledge of blockage probabilities. In particular, a fully digital outage minimization (OutMin) beamforming was first presented in [28], where a sum-of-outage minimization problem was formulated and cast into an empirical risk minimization (ERM) problem, efficiently solved via a mini-batch stochastic gradient descent (MSGD) approach. The extension of the latter to a hybrid design was then proposed in [29], and the approach was modified in [30] to also exploit reconfigurable intelligent surfaces (RISs), by employing a block mini-batch stochastic gradient descent (BMSGD) technique to design beamforming vectors and reflection coefficients jointly.
A limitation of the aforementioned blockage-robust mitigation methods [28], [29], [30] is, however, that they all consider single-carrier transmission over frequency-flat channels, making them unsuitable for mmWave systems, which operate over much wider bandwidths than sub-6 [GHz] systems and are affected by frequency selectivity. While the effect of the frequency selectivity can be effectively mitigated by equalization over orthogonal frequency division multiplexing (OFDM) transmissions, conventional methods based on this approach exhibit high outage probabilities due to a per-carrier transmit power allocation that considers neither the blockage probability nor the distribution of the sum rate over subcarriers.
In order to address this limitation, in [31], a BMSGD-based scheme for the joint design of OutMin fully digital beamformers and optimal transmit power allocation was proposed for multi-carrier OFDM mmWave MIMO systems, which was shown to successfully combat both path blockage and frequency-selective effects of the channel. Still, the method proposed in [31] has two drawbacks: it requires a fully digital architecture, and it exhibits a decrease in total system data rate, as do the single-carrier approaches [29], [30].
From all the above, it is natural to consider the joint design of per-carrier power allocation, baseband beamforming, and analog beamforming to guarantee QoS requirements while maintaining a high total system data rate for mmWave OFDM systems, which is the novelty of this article compared with our previous work [31]. We therefore extend the latter approach to a flexible hybrid beamforming alternative, which is furthermore designed to minimize both outage and loss in data rate. Simulation results confirm that the proposed scheme achieves, using only blockage probabilities and a few RF chains, outage probabilities comparable to those of a fully digital CoMP-sum rate maximization (SRM) transmission scheme under the ideal case where full knowledge of actual instantaneous blockage occurrences is available. These results also show that the proposed scheme achieves higher total system data rates than state-of-the-art (SotA) schemes while maintaining comparable outage probabilities.
The method is further optimized by the derivation and tuning of the convergence-guaranteeing learning rates, as well as with a discrete Fourier transform (DFT)-based initialization beamforming, which are also shown via simulations to be effective. The contributions of the article can be summarized as follows:
• A sum-outage-probability minimization problem is formulated, including per-user data rates aggregated over multiple subcarriers, manifold constraints, and a regularizer to increase the data rate, which ultimately enables joint hybrid robust beamforming design and power allocation, balancing outage probability and total system data rate.
• A new BMSGD approach is developed to efficiently solve the aforementioned problem, yielding hybrid (both baseband and analog) beamformers for all subcarriers and users with optimal powers.
• The learning rates required to guarantee the convergence of the method are derived and tuned to obtain the lowest empirical risks, both for outage probabilities and rate losses.
• A simple initialization beamforming using the DFT matrix is introduced, which is also shown to be effective in improving the overall performance of the scheme.

Notation: The following notation is used throughout the article. Matrices and vectors are denoted by upper- and lower-case bold letters, as in $\mathbf{X}$ and $\mathbf{x}$, respectively. The $j$-th column of a matrix $\mathbf{X}$ is denoted by $[\mathbf{X}]_j$. The sets of integers, real numbers, and complex numbers are represented by $\mathbb{N}$, $\mathbb{R}$, and $\mathbb{C}$, respectively. The operators $(\cdot)^T$, $(\cdot)^*$, $(\cdot)^H$, and $\mathrm{Tr}(\mathbf{X})$ respectively denote the transpose, conjugate, complex conjugate transpose, and trace of the argument. A diagonal matrix obtained from a vector $\mathbf{x}$, and a block diagonal matrix obtained from given matrices, are respectively denoted by $\mathrm{diag}(\mathbf{x})$ and $\mathrm{blkdiag}(\cdots)$. An $N$-dimensional vector whose elements are all 1, and the identity matrix of size $N$, are respectively denoted by $\mathbf{1}_N$ and $\mathbf{I}_N$. The operators $\otimes$ and $\odot$ denote the Kronecker and Hadamard products, respectively. The functions $\mathrm{vec}(\cdot)$ and $\|\cdot\|_p$, respectively, denote vectorization and the $\ell_p$ norm of the argument. The real part of a complex number is denoted by $\Re(\cdot)$, and the circularly symmetric complex normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $\mathcal{CN}(\mu, \sigma^2)$.

A. Communication Scenario
Consider a CoMP downlink system employing OFDM transmission, as shown in Fig. 1, where multiple BSs, each equipped with a uniform planar array (UPA) comprising $N_t$ antenna elements and a fully connected structure with $N_{RF}$ RF chains, cooperatively serve multiple single-antenna user equipments (UEs). The BSs are synchronized and connected via a fronthaul to a common central processing unit (CPU), which designs the baseband and analog beamformers. It is assumed that space division multiple access (SDMA) enables access to multiple UEs, and all subcarriers are assigned to each UE.
It is assumed that the uplink and downlink communications are separated via time-division duplexing (TDD), such that the uplink and downlink channels can be assumed reciprocal, except for path blockages¹, as illustrated in Fig. 2.
The path gains, angles of departure (AoDs), and multipath propagation time delays are assumed to be perfectly estimated from uplink signals. Following related literature, it is assumed that propagation paths may be suddenly blocked by surrounding obstacles, with probabilities ranging from 20 % to 60 % [16], [18], and that the blockage probability of each path can be perfectly estimated using any blockage prediction method [24], [25]. These assumptions imply that the available channel state information (CSI) may differ from the actual CSI during data transmissions, even under perfect channel estimation.

B. Channel Model
Let $B \in \mathbb{N}$ and $U \in \mathbb{N}$ denote the total number of BSs and UEs, while $b \in \mathcal{B} \triangleq \{1, 2, \ldots, B\}$ and $u \in \mathcal{U} \triangleq \{1, 2, \ldots, U\}$ denote the BS and UE indices, respectively. It is assumed that the mmWave channel between the $b$-th BS and the $u$-th UE contains a random number $C_{b,u}$ of clusters, with $C_{b,u}$ modeled

¹ Although path blockage at the uplink also happens, modeling the phenomenon is unnecessary for design purposes, since uplink blockages merely prevent the corresponding paths from being known/exploited at the downlink [28].
Consequently, the estimated channel $\hat{\mathbf{h}}_{b,u}[d] \in \mathbb{C}^{N_t}$ between the $b$-th BS and the $u$-th UE at the $d$-th delay tap can be modeled as [33]

where $N_t^h \in \mathbb{N}^+$ and $N_t^v \in \mathbb{N}^+$ respectively denote the number of antenna elements in the horizontal and vertical directions, satisfying $N_t = N_t^h N_t^v$, and $\mathbf{c}_N \in \mathbb{C}^N$ is the uniform linear array (ULA) response, given by [33]

Let $k \in \mathcal{K} \triangleq \{0, 1, \ldots, K-1\}$ denote the subcarrier indices, where $K \in \mathbb{N}^+$ is the total number of available subcarriers. Using the relationship between the delay and frequency domains described by the Fourier transform, the mmWave channel between the $b$-th BS and the $u$-th UE at the $k$-th subcarrier can be modeled as

During data transmission, objects such as human bodies or vehicles block the estimated path [7], [16], [17], [18], [19], [20] with probability $p^c_{b,u}$. These blockages are modeled using the random variables $\omega^c_{b,u} \in \{0, 1\}$, which are assumed to follow a Bernoulli distribution, as is typical in related work on blockage-robust CoMP [27], [28], [29], [30], [34]. The mean of the Bernoulli distribution corresponding to the $c$-th cluster from the $b$-th BS toward the $u$-th UE is the blockage probability $p^c_{b,u}$. During data transmission, the actual channel between the $b$-th BS and the $u$-th UE at the $d$-th delay tap and the $k$-th subcarrier can be modeled as

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Because the sampling duration $T_s$ specified in 5G new radio (NR) is significantly shorter than the continuous blockage time revealed in some studies [17], [18], it is assumed that all the delayed rays of the $c$-th cluster experience the same blockage. Hence, if $\omega^c_{b,u}$ is zero, the path gain of the $c$-th cluster from the $b$-th BS toward the $u$-th UE disappears completely for any subcarrier. The actual channel in the data transmission phase $\mathbf{h}_{b,u}[k]$ differs from the estimated channel $\hat{\mathbf{h}}_{b,u}[k]$ unless all estimated paths are available, i.e., $\omega^c_{b,u} = 1, \forall b, u, c$.
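The blockage model described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array layout (clusters, delay taps, antennas), and the DFT sign convention are assumptions, and a cluster is taken to be blocked ($\omega = 0$) with probability $p$, in line with the statement that the path gain disappears when $\omega$ is zero.

```python
import numpy as np


def sample_blockages(p_block, num_samples, rng):
    """Draw blockage indicators: 1 = path available, 0 = path blocked.

    Assumption: a cluster is blocked (omega = 0) with probability p_block.
    """
    p_block = np.asarray(p_block, dtype=float)
    draws = rng.random((num_samples,) + p_block.shape)
    return (draws >= p_block).astype(int)


def delay_to_subcarrier(h_taps, num_subcarriers):
    """K-point DFT along the delay axis (axis 0); missing taps count as zero."""
    return np.fft.fft(h_taps, n=num_subcarriers, axis=0)


def actual_channel(cluster_taps, omega, num_subcarriers):
    """Sum the unblocked clusters, then move to the subcarrier domain.

    cluster_taps: (C, D, Nt) per-cluster delay-domain contributions.
    omega:        (C,) one indicator per cluster, shared by all of its rays.
    """
    taps = np.tensordot(omega, cluster_taps, axes=(0, 0))  # shape (D, Nt)
    return delay_to_subcarrier(taps, num_subcarriers)
```

Because one indicator multiplies every delayed ray of a cluster, a blocked cluster vanishes on all subcarriers at once, which is the behavior assumed in the text.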

C. Received Signal Model
Let $\mathbf{f}_{b,u}[k] \in \mathbb{C}^{N_{RF}}$ and $\mathbf{V}_b \in \mathbb{C}^{N_t \times N_{RF}}$ denote the baseband beamforming vector from the $b$-th BS toward the $u$-th UE at the $k$-th subcarrier and the analog beamforming matrix at the $b$-th BS, respectively. The received signal at the $u$-th UE and the $k$-th subcarrier can be represented as

where the channel vector $\mathbf{h}_u[k]$, the baseband beamforming vector $\mathbf{f}_u[k]$, and the analog beamforming matrix $\mathbf{V}$ are respectively defined as

while the transmitted data symbols are complex scalars taken from a zero-mean unit-energy constellation, and additive white Gaussian noise (AWGN) is present at the $u$-th UE and the $k$-th subcarrier.

D. Proposed Problem Formulation
In view of the system and signal models of the previous subsections, consider the case when the system has only knowledge of blockage probabilities, such that the achievable data rate of each user is a probabilistic quantity dependent on the actual (unknown) blockage realization. In this case, actual rate maximization is impossible, and since we are considering a multi-band system, the most reasonable option to guarantee QoS is to minimize the outage probability of the sum of data rates over all subcarriers.
It is known [31], however, that such an OutMin scheme tends to lead to a loss of total sum data rate compared with conventional approaches [21], [22]. We therefore propose a beamforming design considering the balance between outage minimization and sum rate maximization, formulated as the regularized sum-of-outage-probability minimization problem (8), whose parameters are the maximum transmit power $P_{\max,b}$ at the $b$-th BS, the target rate for the $u$-th UE, and the achievable data rate $R_u[k]$ at the $k$-th subcarrier of the $u$-th UE.
For clarity, the data rate in equation (8a) is given by

The set $\mathcal{M}^{N_t N_{RF}}$ in constraint (8c) is the Riemann circle manifold defined as

The second term $\ell_u$ in the objective function (8a) is a regularizer introduced to control eventual losses in total sum rate resulting from the outage-centric approach, with the scalar $\mu \in \mathbb{R}^+$ denoting, as usual, a hyper-parameter to be determined later. Details of the motivation to employ such a regularizer are explained in the next section.
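The outage criterion behind the formulation can be made concrete with a short Monte Carlo sketch. It estimates the probability that the sum rate over subcarriers falls below a target, given SINR samples for several blockage realizations. The names are hypothetical, and the rate expression $R = \log_2(1 + \mathrm{SINR})$ is an assumption, since the rate equation is not reproduced in the text.

```python
import numpy as np


def sum_rate(sinr):
    """Sum rate over subcarriers (last axis).

    Assumption: per-subcarrier rate is log2(1 + SINR).
    sinr: (M, K) array, M blockage realizations by K subcarriers.
    """
    return np.log2(1.0 + np.asarray(sinr, dtype=float)).sum(axis=-1)


def empirical_outage(sinr, target_rate):
    """Fraction of realizations whose sum rate is below the target rate."""
    return float(np.mean(sum_rate(sinr) < target_rate))
```

The point of the sketch is that outage is declared on the aggregate rate of a user, not on each subcarrier separately, which is what couples the power allocation across subcarriers.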

III. PROPOSED HYBRID BEAMFORMING DESIGN
In this section, we propose methods to solve the problem (8), starting however with a brief comparative review of the approach generally taken in related literature, with the aim of establishing a reference for the future purpose of performance assessment.

A. Conventional Approach: Per-Carrier Outage Minimization
In conventional methods [29], [30], rather than attempting to solve the original optimization problem (8), the latter is first divided into $K$ sub-problems with $\mu = 0$, which implies that the multi-band nature of OFDM is not explicitly taken into account. In addition, as described in [31], the mismatch between the estimated and actual CSI (due to post-estimation path blockages), and the lack of an analytical expression of the rate distribution, require the relaxation of each sub-problem into

Notice that the solution of this problem is bound to be sub-optimal in terms of outage probability itself, since the latter objective is not minimized over the ensemble of subcarriers, leading to the allocation of powers that are oblivious to sum-rate distributions, including effects of both frequency selectivity and blockages. Although a beamforming scheme that inherently integrates optimal power allocation based on numerical evaluations of sum-rate distributions over all subcarriers was proposed in [31], the latter is based on a fully digital architecture (i.e., $N_{RF} = N_t$ and $\mathbf{V}_b = \mathbf{I}_{N_t}$) and still maintains the focus on outage minimization (i.e., $\mu = 0$), exposing the method to degradation in total system data rate.
In contrast to [31], the proposed formulation in equation (8) enables the suppression of data rate loss. Furthermore, if solved directly in a hybrid fashion, the alternative also results in the optimum power allocation across subcarriers, yielding a cost-effective mechanism to reduce the sum-of-outage probability while maintaining a high total system data rate. This motivates our key contribution, introduced in the next subsection.

B. Proposed Regularization
In principle, the regularizer $\ell_u, \forall u$ can be any outage function that returns a larger value at a lower data rate. In order to evaluate the sum-rate distribution properly, however, the regularizer must be stochastic, so as to enable the design of hybrid beamforming under a stochastic learning framework alone.
We therefore consider the regularizer defined by

where $r_u \in \mathbb{R}^+$ denotes the predicted data rate at the $u$-th UE. This regularizer implies that the proposed approach improves data rates by minimizing the rate loss probability, calculated from the difference between the achievable data rate $\sum_{k \in \mathcal{K}} R_u[k]$ and the predicted data rate $r_u$.² In what follows, we show an example of a predicted data rate. Given that minimization of rate loss, defined as the difference between achievable and ideal data rates, is equivalent to SRM, the predicted data rate $r_u$ can be determined by hybrid beamformers designed on the basis of the estimated CSI $\hat{\mathbf{h}}_u[k]$ consisting of unblocked estimated paths only, which can be achieved by solving the following SRM problem

whose solution can be obtained via the quadratic transform (QT) [35] and any hybrid beamforming design [13], [15]. Let $\mathbf{f}[k], \forall k$ and $\mathbf{V}$ denote the optimal solutions of the optimization problem (13). Then, the predicted data rate $r_u$ can be calculated as

which, substituted into problem (8), yields

where the expectation is taken over the channel realizations caused by path blockages, and the function $t_1$ is defined as

Since the indicator function in (14) is not smooth, the gradient of the objective (15a) cannot be directly evaluated.
To circumvent this problem, we introduce the following generalized smooth-hinge surrogate function $\nu(\cdot)$, defined as [31]

Thanks to the above, the cost function $t_\nu(\cdot)$ can be defined as in equation (18), which, introduced into objective (15a), enables the original optimization problem (8) to be rewritten as the following ERM problem

that can be solved efficiently, for instance by stochastic approximation approaches [36].

D. Proposed Solver: Block Stochastic Gradient Descent
If information on the path gain coefficients $g^c_{b,u}$, the AoDs $\theta^c_{b,u}$ and $\phi^c_{b,u}$, the time delays $\tau_c$, and the blockage probabilities $p^c_{b,u}$ is available, channels with various blockage patterns can be generated as training data for stochastic optimization. In turn, following the mini-batch approach [36], the ERM problem in equation (19) can be reduced to an equivalent problem by replacing the expected value with an empirical mean calculated over the training data

where $M \in \mathbb{N}^+$ denotes the mini-batch size.
Although baseband beamforming vectors for all subcarriers f [k], ∀k ∈ K and analog beamforming matrices V b , ∀b ∈ B should be updated to minimize the empirical risk, it is difficult to calculate the gradient of the objective function for all variables.Fortunately, such nonconvex multivariate ERM problems with manifold constraints can be solved via MSGD, using either an alternate update approach [29] or BMSGD algorithms [30].
Since, however, the contributions in [29] and [30] do not incorporate optimal power allocation over the subcarriers, nor do they account for the summation inside the hinge function, we introduce in the sequel a new, purpose-built BMSGD approach to update such variables efficiently and thus solve problem (20).
1) Baseband Beamforming Design: First, the gradient of the objective function (20a) is calculated for each beamforming vector $\mathbf{f}[k'], k' \in \mathcal{K}$, with all the remaining beamforming vectors $\mathbf{f}[k], k \in \mathcal{K} \setminus k'$ and the analog beamforming matrix $\mathbf{V}$ fixed, using

where the gradient of the SINR with respect to the baseband beamforming vector $\mathbf{f}[k']$ is given by

with

where $\mathbf{e}_u \in \{0, 1\}^U$ denotes the vector of length $U$ with only the $u$-th element equal to 1, and $\bar{\mathbf{e}}_u \in \{0, 1\}^U$ denotes its complement, such that $\mathbf{e}_u + \bar{\mathbf{e}}_u = \mathbf{1}_U$. With the gradient in equation (21) in hand, the corresponding baseband beamforming vector $\mathbf{f}[k']$ is updated as

where $\alpha^f_i[k]$ and $\mathbf{f}^{(i)}[k]$ denote the learning rate and the baseband beamforming vector at the $i$-th iteration and the $k$-th subcarrier, respectively.
On the matter of baseband beamforming, similarly to the fully digital approach of [31], after updating one vector $\mathbf{f}[k'], k' \in \mathcal{K}$, the vectors for all subcarriers $\mathbf{f}[k], \forall k \in \mathcal{K}$ are projected onto the feasible region by normalizing them to satisfy the power constraints. The baseband beamforming for each subcarrier is sequentially updated following these operations with the same training dataset $\mathbf{h}^m_u[k]$, with $m \in \{1, \ldots, M\}, \forall k, u$, in order to combine the beamforming design with power allocation over subcarriers based on the stochastic learning of blockage effects.
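The projection step can be sketched as a joint rescaling of the baseband vectors of one BS over all subcarriers and users. This is only an illustration under stated assumptions: the per-BS constraint is taken to be a total radiated power budget summed over subcarriers and users, the rescaling is applied only when the budget is exceeded, and the names `total_power` and `project_power` are hypothetical.

```python
import numpy as np


def total_power(F, V):
    """Radiated power of one BS, summed over subcarriers and users.

    F: (K, U, N_RF) baseband vectors, V: (N_t, N_RF) analog beamformer.
    """
    tx = np.einsum('nr,kur->kun', V, F)  # V @ f for every (k, u) pair
    return float(np.sum(np.abs(tx) ** 2))


def project_power(F, V, p_max):
    """Scale all baseband vectors jointly so that total power <= p_max.

    Assumption: feasible points are left untouched; infeasible ones are
    scaled back onto the boundary of the power constraint.
    """
    power = total_power(F, V)
    if power > p_max:
        F = F * np.sqrt(p_max / power)
    return F
```

Since a single scale factor is shared by all subcarriers, enlarging one vector during an update implicitly takes power away from the others after projection, which is how the update and the power allocation interact in this sketch.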
2) Analog Beamforming Design: After updating all baseband beamforming vectors, the analog beamforming matrix is updated using the same training dataset. The received signal is rewritten using the relation $\mathrm{vec}(\mathbf{A}\mathbf{X}\mathbf{B}) = (\mathbf{B}^T \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$ so as to enable the calculation of the gradient of the objective function (20a) for the analog beamforming, which is given by

The analog beamforming vector $\mathrm{vec}(\mathbf{V})$, which has a sparse structure representing CoMP transmission, can be decomposed into the non-sparse vector $\mathbf{v} \triangleq \mathrm{vec}(\tilde{\mathbf{V}}) \in \mathbb{C}^{B N_t N_{RF}}$ and the matrix

The Euclidean gradient of the objective function (20a) for the analog beamforming vector $\mathbf{v}$ can be calculated as

where $\nabla_{\mathbf{v}} \Gamma_u[k]$ denotes the gradient of the SINR for the analog beamforming vector, given by

Unlike the previous baseband beamforming process, here updates based on Euclidean gradients are insufficient, since manifold optimization [37] must be performed to satisfy the unit-modulus constraints (20c) defined by the Riemann circle manifold in (10). In this case, the tangent space at a given $\mathbf{x} \in \mathcal{M}^{N_t N_{RF}}$ can be defined as [29]

while the Riemann gradient can be expressed as an orthogonal projection of the Euclidean gradient

such that the analog beamforming vector can be updated as

where $\alpha^v_i \in \mathbb{R}^+$ and $\mathbf{v}^{(i)} \in \mathbb{C}^{B N_t N_{RF}}$ denote the learning rate and the analog beamforming vector at the $i$-th iteration, respectively, and $\mathrm{Retr}[\cdot]$ is a retraction operation to satisfy the unit-modulus constraints in equation (20c).
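A manifold update of this kind can be sketched with the standard tools for the complex circle manifold: the Euclidean gradient is projected onto the tangent space, a descent step is taken, and each entry is renormalized to unit modulus. The expressions below are the textbook forms for this manifold and are assumed, not confirmed, to match the paper's equations; the function names are hypothetical.

```python
import numpy as np


def riemannian_grad(x, egrad):
    """Project the Euclidean gradient onto the tangent space at x.

    For unit-modulus x, the tangent space holds vectors z with
    Re(z * conj(x)) = 0 entrywise, so the radial part is removed.
    """
    return egrad - np.real(egrad * np.conj(x)) * x


def retract(x):
    """Map each entry back onto the unit circle."""
    return x / np.abs(x)


def manifold_step(x, egrad, learning_rate):
    """One descent step followed by retraction.

    The step direction is tangent, so every entry of the intermediate
    point has modulus at least 1 and the division in retract() is safe.
    """
    return retract(x - learning_rate * riemannian_grad(x, egrad))
```

A plain Euclidean step would generally leave the feasible set, whereas this step keeps every entry of the analog beamformer on the unit circle at each iteration.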
We summarize the proposed OutMin hybrid beamforming design in Algorithm 1, where f ini [k] and v ini denote the initial baseband and analog beamforming for stochastic learning, respectively.
3) Initial Beamforming Design: It is well-known [38] that the convergence behavior of stochastic learning methods can be significantly affected by the initial point used to kick-start the optimization process. It is therefore worthwhile to briefly discuss a suitable initialization alternative, especially in the case of hybrid designs where the unit-modulus constraint imposed onto the analog beamforming component makes the design of initializers more challenging.
In order to satisfy such constraints, we first select the analog beamforming vectors among the column vectors of the DFT matrix, and set the $\{1, 2, \ldots, N_{RF}\}$-th columns of the initial analog beamformer at the $b$-th BS as

Finally, the initial baseband beamformer is obtained based on the maximum ratio transmission (MRT) criterion to maximize the signal-to-noise ratio (SNR), which contributes to both outage minimization and rate maximization, namely

where the resulting beamformer is normalized to satisfy the power constraints, considering equal power allocation over the subcarriers.

Algorithm 1 (steps 5 to 14)
5: Determine update order randomly, $\mathcal{K}' = \{k_1, \ldots, k_K\}$
6: Update $\mathbf{f}^{(i)}[k']$ following equation (25)
8: Projection of $\mathbf{f}^{(i)}[k], \forall k$ onto the feasible region
9: end for on $k'$
10: Update the vector $\mathbf{v}^{(i)}$ following equation (32)
11: Retraction
12: Projection of $\mathbf{f}^{(i)}[k], \forall k$ onto the feasible region
13: Go to line 14 if convergence before $i = I_{BMSGD}$
14: end for on $i$
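The DFT-based initialization discussed in this subsection can be sketched as follows. Several choices here are assumptions made for illustration only: a one-dimensional DFT of size $N_t$ is used, columns are ranked by their aggregate correlation with the estimated channels, the MRT baseband vector is taken as $\mathbf{V}^H \mathbf{h}$, and the power normalization is omitted. The function names are hypothetical.

```python
import numpy as np


def dft_init_analog(H, n_rf):
    """Pick n_rf DFT columns as the initial analog beamformer of one BS.

    H: (K, U, N_t) estimated channels toward all users and subcarriers.
    Assumption: columns are ranked by sum_{k,u} |h^H d|^2.
    """
    n_t = H.shape[-1]
    n = np.arange(n_t)
    # Unnormalized DFT matrix: every entry has unit modulus.
    D = np.exp(-2j * np.pi * np.outer(n, n) / n_t)
    score = np.sum(np.abs(np.conj(H) @ D) ** 2, axis=(0, 1))
    best = np.argsort(score)[::-1][:n_rf]
    return D[:, best]


def mrt_baseband(H, V):
    """MRT on the effective channel: f_u[k] = V^H h_u[k] (unnormalized).

    Returns an array of shape (K, U, N_RF).
    """
    return H @ np.conj(V)
```

Because the DFT entries already have unit modulus, the selected columns satisfy the analog constraint without any further projection, which is what makes this initializer convenient for hybrid designs.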

E. Learning Rates for Convergence Guarantee
It is well known that sufficient conditions for the convergence of stochastic learning algorithms with shrinking learning rates $\alpha_i$ are [39]

In turn, the Lipschitz criterion [38] ensures the convergence of the BMSGD algorithm by adjusting the learning rate via $\alpha_i = \rho / (\sqrt{i} \cdot L^\star)$, where $\rho \in \mathbb{R}$ and $L^\star$ denote the scaling coefficient and the lower bound on the Lipschitz constant, respectively. In what follows, we therefore derive lower bounds on the Lipschitz constants, $L^\star_f$ and $L^\star_v$, to be respectively used in baseband and analog beamforming, so as to obtain the learning rates $\alpha^f_i[k], \forall k$ and $\alpha^v_i$ that ensure the convergence of Algorithm 1.
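The shrinking schedule $\alpha_i = \rho / (\sqrt{i} \cdot L^\star)$ can be written down directly. In the sketch below, the helper that reads a Lipschitz estimate off the largest eigenvalue of a symmetric Hessian is an assumption for illustration, as are the function names and the default $\rho = 1$.

```python
import numpy as np


def lipschitz_bound(hessian):
    """Largest eigenvalue of a symmetric (Hermitian) Hessian.

    Assumption: this eigenvalue serves as the Lipschitz estimate L*.
    """
    return float(np.max(np.linalg.eigvalsh(hessian)))


def shrinking_rate(iteration, lipschitz, rho=1.0):
    """Learning rate rho / (sqrt(i) * L*) for iteration i = 1, 2, ..."""
    return rho / (np.sqrt(iteration) * lipschitz)
```

For instance, with $L^\star = 2$ and $\rho = 1$ the rate at the fourth iteration is $1 / (2 \cdot 2) = 0.25$, so a loose (large) estimate of $L^\star$ directly translates into small steps.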
Let $\nabla^2 f$ be the Hessian of a generic objective function. Then, as discussed in [29], it can be shown using the Taylor theorem that the lower bound on the Lipschitz constant is given by

where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue. It follows that the largest eigenvalues of the Hessians $\nabla^2_{k'} t_{\nu u}$ and $\nabla^2_{\mathbf{v}} t_{\nu u}$ of the objective function (20a) with respect to the baseband and the analog beamformers for the $u$-th UE, respectively, satisfy the following inequalities:

such that the corresponding learning rates that ensure convergence are given by

Proof: See Appendix.

F. Learning Rates for Performance Improvement
Although the learning rates in (38) guarantee convergence, they may be quite small depending on the tightness of the inequalities in (37), which may either cause excessive delays or, when combined with the shrinking criterion, lead to a premature termination prior to sufficient learning, resulting in sub-optimal local solutions. To mitigate this problem, we propose in the sequel alternative (heuristic) learning rates, which are later shown via simulations to lead to lower outage probabilities.
The key idea is to tune the learning rates considering that the gradient at the first iteration is steeper for larger target rates. Therefore, in order to avoid undesirable updates caused by exceedingly steep gradients, the proposed learning rates consist of the inverse function of the target rate, while subsequently maintaining the shrinking strategy [39] to satisfy the convergence criteria (35). Mathematically, the alternative learning rates are given by

where f i ≜ 3 (•) and f t ≜ (•) are defined from the results of the hyperparameter tuning.
It will be shown in Section IV-A that the learning rates obtained from equation (39) lead to a convergence behavior similar to that obtained via hyperparameter optimization approaches such as Optuna [40], which is the current SotA.

G. Computational Complexity
The most expensive step of the proposed algorithm is the computation of the SINR expression for the gradients and the sum rates $\sum_{k \in \mathcal{K}} R_u[k], \forall u$. It is clear from equations (22) and (28) that the complexity orders of the gradient calculations for the baseband and analog beamformers are determined by the block diagonal structure of matrix $\mathbf{V}$, the sparsity of matrix $\mathbf{W}$, and the number of subcarriers.
The proposed algorithm must also recalculate the sum rate over subcarriers $\sum_{k \in \mathcal{K}} R_u[k]$ for each beamformer update, resulting in $K$ sum-rate calculations at each iteration ($i = 1, \ldots, I_{BMSGD}$). In the first update of the baseband beamforming vector $\mathbf{f}^{(i)}[k_1]$ in Algorithm 1, the complexity order of the sum rate calculation is the same as that of the gradients. In subsequent update phases, the SINR expression, consisting of the beamformer updated via equation (25) in the previous phase, requires the complexity order $O(\max\{U B^2 N_{RF}^2, B N_t N_{RF}\})$. For the remaining $K - 1$ subcarriers, calculations of the SINR require only scalar multiplications (i.e., normalizations), whose complexity order is $O((K - 1)BU)$. Therefore, the complexity order of the sum rate calculations is $O(\max\{K(K - 1)BU, 2KUB^2N_{RF}^2, 2KBN_tN_{RF}\})$ in total at each iteration. From the above, the complexity order of the proposed algorithm is $O(\max\{K(K - 1)BU, 2KUB^2N_{RF}^2, KB^2N_t^2N_{RF}^2\})$ owing to the SINR expression, which is common to both the data rate and gradient expressions.⁴ Considering mmWave systems, which usually employ large or massive antenna arrays, such that the inequalities $N_t^2 > 2U$ and $\frac{B}{U} N_t^2 N_{RF}^2 - 1 > K$ typically hold, the computational complexity of the proposed hybrid beamforming design is $O(KB^2N_t^2N_{RF}^2)$, which is the same order as the conventional design of [29], on a per-subcarrier basis.

IV. PERFORMANCE ASSESSMENT
In this section, we evaluate the proposed OutMin hybrid beamforming scheme, contrasting its performance with those of the comparable hybrid alternating minimization (AltMin) approach of [15], as well as with SotA fully digital beamforming techniques such as the MRT, the minimum mean square error (MMSE) [21], [22], and the conventional OutMin methods of [28], [31].
In order to serve as a reference lower bound on the achievable outage probability, comparisons with an ideal CoMP-SRM transmission scheme with perfect knowledge of the actual CSI $\mathbf{h}_{b,u}[k], \forall b, u, k$ and their instantaneous blockages are also offered.
In our computer simulations, we consider a square cell with a width of 100 [m] and $B = 4$ BSs located at the corners, which cooperatively serve $U = 2$ single-antenna UEs randomly located within the cell. Each BS is equipped with a UPA with $N_t^h = 4$ horizontal and $N_t^v = 4$ vertical antenna elements, but only $N_{RF} = 2$ RF chains in the case of hybrid methods.⁵ The maximum transmit power per BS is set to $P_{\max,b} = 30$ [dBm], and a total of $K = 36$ subcarriers with $W = 240$ [kHz] subcarrier spacing operating at the 28 [GHz] band is assumed, which leads to a sampling period of $T_s = 1/(WK) = 0.115$ [µs].⁶ Pulse shaping is performed with a root-raised cosine roll-off filter with a roll-off factor of 0.8, and the equivalent pulse response $p(dT_s - \tau_c)$ is calculated using the raised cosine roll-off filter. It is also assumed that the time delay of each cluster $\tau_c$ follows a uniform distribution in the interval $[0, DT_s]$ as in [14], where $D = K/4 = 9$.
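The timing figures quoted above follow from simple arithmetic, reproduced here as a small sketch (the function name is hypothetical). With $W = 240$ kHz and $K = 36$, the sampling period evaluates to roughly 0.1157 µs, which agrees with the quoted 0.115 µs up to rounding, and the maximum delay spans $D = K/4 = 9$ taps.

```python
def ofdm_timing(subcarrier_spacing_hz, num_subcarriers):
    """Return (sampling period in seconds, number of delay taps D = K/4)."""
    t_s = 1.0 / (subcarrier_spacing_hz * num_subcarriers)
    max_delay_taps = num_subcarriers // 4
    return t_s, max_delay_taps


t_s, d_taps = ofdm_timing(240e3, 36)
print(f"T_s = {t_s * 1e6:.4f} us, D = {d_taps}")
```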
The blockage probability $p_{b,u}^c$ of each path independently follows an identical uniform distribution in the interval $[0.2, 0.6]$ [16], [18], and the AWGN variance at the $u$-th UE and the $k$-th subcarrier is given by

where $\kappa$ denotes the Boltzmann constant, $T = 293.15$ [K] denotes the physical temperature, and $\mathrm{NF} = 5$ [dB] is the noise figure.
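To make these two modeling choices concrete, the following sketch draws per-path blockage probabilities and one blockage realization, and evaluates a thermal noise floor. Two points are assumptions rather than statements from the text: the noise power is taken as $\kappa T W$ scaled by the noise figure over one subcarrier spacing $W$, and a value of `True` is taken to mean that the path is blocked. All function names are hypothetical.

```python
import math
import random

BOLTZMANN = 1.380649e-23  # Boltzmann constant [J/K]


def noise_power_dbm(bandwidth_hz, temperature_k=293.15, noise_figure_db=5.0):
    """Thermal noise kappa*T*W in dBm plus the noise figure (assumed form)."""
    thermal_w = BOLTZMANN * temperature_k * bandwidth_hz
    return 10 * math.log10(thermal_w * 1e3) + noise_figure_db


def sample_blockages(num_paths, rng, p_low=0.2, p_high=0.6):
    """Draw blockage probabilities in [p_low, p_high] and one realization.

    Assumed convention: True means the path is blocked.
    """
    probs = [rng.uniform(p_low, p_high) for _ in range(num_paths)]
    blocked = [rng.random() < p for p in probs]
    return probs, blocked


rng = random.Random(0)
probs, blocked = sample_blockages(num_paths=8, rng=rng)
sigma2_dbm = noise_power_dbm(bandwidth_hz=240e3)
```

Under these assumptions the noise floor per subcarrier is roughly -115 dBm.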
The MRT and MMSE beamformers are designed based on estimated CSI and equal power allocation, which is optimal under unpredictable CSI errors [41], [42], [43], to ensure a fair comparison. These beamformers are then computed by normalizing to the power constraint the expressions

where the matrices are respectively defined as

The hybrid AltMin beamformer [15] is obtained via the solution of the following Frobenius-norm minimization problem

where $F_b^{\mathrm{opt}}[k] \in \mathbb{C}^{BN_t\times U}$ denotes a pre-calculated fully digital beamformer, which for the sake of a fair comparison is here given by the proposed baseband beamformer of Section III-D.1, with $N_{RF} = N_t$ and $V_b = I_{N_t}, \forall b$.
In turn, the fully digital MSGD-based OutMin beamformer is obtained from the optimization problem (11) with $N_t = N_{RF}$ and $V_b = I_{N_t}, \forall b$. Notice that this beamformer achieves the upper bound on the performance of the conventional hybrid beamforming proposed in [29].
As for the fully digital BMSGD-based OutMin approach [31], the beamformers are designed based on the optimization problem (8) with $\mu = 0$, $N_t = N_{RF}$, and $V_b = I_{N_t}, \forall b$. We remark that in all fully digital beamforming designs based on stochastic learning, the initial beamformer is obtained via the MMSE approach summarized by equation (42), and the mini-batch size is set to $M = 16$ in all stochastic learning approaches.
In the ideal CoMP-SRM scheme, fully digital beamforming is designed via the optimization problem (13) with $N_t = N_{RF}$, $V_b = I_{N_t}, \forall b$, and actual CSI $h_u[k]$, as described in equation (6), including realizations of blockage patterns $\omega_{b,u}^c, \forall b, u, c$. Note that the CoMP transmissions in [27] can achieve the same outage probabilities and total system data rate as the ideal scheme if the worst-case optimization is solved with a fully digital architecture ($N_{RF} = N_t$) and realizations of blockage patterns $\omega_{b,u}^c, \forall b, u, c$.

A. Convergence Behavior
Let us start by assessing the convergence of the proposed approach, capturing in particular the effects of the beamformer initialization methods described in Section III-D.3, the learning rate mechanisms of Section III-E, and the hyperparameter $\mu$. To this end, we first compare in Fig. 3 the convergence behavior of the proposed OutMin beamforming design algorithm at the target rate of $r_u = 20$ [Mbps], $\forall u$, and the specific hyperparameter $\mu = 0.1$, with the various initialization and learning rate adaptation schemes.
The horizontal and vertical axes of this figure correspond to the number of iterations (i.e., training-dataset generations) and the objective function in equation (19a), respectively. The values of the objective function with learning rates that ensure the convergence in equations (38a) and (38b) are denoted by lines without markers. Conversely, convergence behaviors determined by tuned learning rates are denoted by lines with markers. Curves with dotted lines show the convergence behavior with randomly initialized analog beamforming and the initial baseband beamforming following equation (34).
The results confirm that Algorithm 1 converges to locally optimal points regardless of the initialization method employed, under learning rates satisfying the convergence criterion. The effectiveness of the initialization method described in Section III-D.3 is also corroborated.
The gap between lines with and without markers shows the gain due to learning rate tuning. In particular, it is found that if learning rates are tuned via Optuna, the best convergence behavior is obtained, which confirms both the necessity of the hyperparameter tuning and the effectiveness of the proposed method. For instance, the tuning results for the proposed approach with initialization via the DFT codebook are $\alpha_i^f[k] = 0.0228$, $\alpha_i^v = 0.9837$, $\forall k, i$. Building on the findings of Fig. 3, in what follows, the performance of the proposed approach with initialization via the DFT codebook and learning rates as in equation (39) is further assessed, considering the balance between the outage probability and the data rate. In terms of the outage probability, Fig. 4(a) confirms that the proposed algorithm converges to locally optimal points within only a few iterations, regardless of the hyperparameter values, although the level of outage actually achieved is found to be lower for lower values of $\mu$, as can be expected from the objective function in equation (8a).
In turn, the vertical axis in Fig. 4(b) corresponds to the total effective data rate for all users $\sum_{u\in\mathcal{U}} R_{u,\mathrm{eff}}$, with the effective rate for each user defined as

It can be seen from the results of Fig. 4(b) that, complementary to the findings in Fig. 4(a), a higher effective aggregate rate is achieved with larger values of $\mu$ in the range $0 \le \mu \le 0.4$, again as expected from the formulation of problem (8). Altogether, the results of Fig. 4 demonstrate that the proposed regularized formulation is suitable for hybrid beamforming design to improve both outage probability and data rate.
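The two metrics traded off in Fig. 4 can be estimated empirically from a set of rate realizations, as sketched below. The effective-rate definition used here (a realization counts with its full rate if it meets the target and with zero otherwise) is an assumption made for illustration, since the defining expression is not reproduced above. The function names and the toy rate values are hypothetical and are not simulation results.

```python
def outage_probability(rates, target):
    """Fraction of realizations whose rate falls below the target rate."""
    return sum(r < target for r in rates) / len(rates)


def mean_effective_rate(rates, target):
    """Average rate, counting only realizations that meet the target.

    Assumed definition: R_eff = R if R >= target, else 0.
    """
    return sum(r if r >= target else 0.0 for r in rates) / len(rates)


# Toy rate realizations in Mbps (illustrative numbers only).
rates_mbps = [5.0, 12.0, 25.0, 30.0]
p_out = outage_probability(rates_mbps, target=20.0)
r_eff = mean_effective_rate(rates_mbps, target=20.0)
```

In this toy case half of the realizations fall below the 20 Mbps target, so the outage estimate is 0.5 and the mean effective rate is 13.75 Mbps.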
To elaborate, we emphasize that the results of the proposed method with µ = 0 are equivalent to SotA OutMin approaches such as those in [28] and [31], which achieve the lowest outage probabilities at the expense of decreased data rates.In contrast, the proposed regularized OutMin approach achieves low outage probabilities, comparable to those of previous methods [28], [31], while avoiding data rate losses.
In the following subsection, the proposed approach with µ = 0.1 is evaluated in more detail, considering different values of target rates.

B. Outage and Rate Performance
The improvement achieved with the proposed method in terms of both outage probability and effective data rate is assessed under various system conditions in Fig. 5. To that end, we first compare the outage probabilities of the proposed and SotA schemes directly in Fig. 5(a), with the performances of fully digital and hybrid beamforming methods depicted in white and black markers, respectively. The results confirm that all BMSGD-based OutMin approaches achieve lower outage probabilities than SotA alternatives based on other techniques. In particular, it is found that despite relying only on statistical path blockage information, the BMSGD-based OutMin methods come closest to the performance of the ideal CoMP-SRM with full and instantaneous knowledge of path blockages, achieving nearly the same result for low target rates.
As a highlight, the wide gap between the performance of the proposed approach and the fully-digital MSGD-OutMin [28], [29] approach demonstrates the effectiveness of the joint beamforming and power allocation approach here introduced.
It can also be clearly observed that methods such as the MRT, MMSE, and SRM with estimated CSI, which ignore blockages altogether, result in poor outage performance compared to the proposed scheme and to all BMSGD-based OutMin methods in general.
In order to further elucidate the advantage of the proposed scheme over other BMSGD-based methods [31], we compare in Fig. 5(b) the sum effective data rates achieved by the various techniques. These results indicate that, unlike the OutMin approach without regularization, the proposed approach outperforms the MMSE and SRM methods without blockage information in terms of both outage probability and total system data rate.
In addition, the performance gap between the fully digital and hybrid variations of the proposed method is narrower than that of the AltMin approach [15], which illustrates the small loss in performance paid for the significant reduction in the number of RF chains achieved by the technique here contributed.

C. Data Rate Statistics
Given that outage probability and sum rate are opposing performance metrics, in the sense that a high sum rate at the expense of a high outage is as undesirable as a low outage at the cost of a low sum rate, we conclude our performance assessment by evaluating the CDFs of the data rates achieved by the various methods compared, as shown in Fig. 6, for four distinct target rates.
It is found that the rate CDFs for the beamformers that are oblivious to target rates, such as the MRT, the MMSE, and the SRM with estimated CSI, are similar in all figures, in the sense that they do not exhibit a reduction in the likelihood of rates below the target. For this reason, the lines corresponding to the latter three methods are shown in grey, so as not to disturb the visibility of the better-performing and more recent alternative schemes. In contrast, the curves for all OutMin schemes have a "knee shape" in the vicinity of the target rate, clearly showing a reduction in that region of their respective CDFs, which for the BMSGD-based schemes remarkably come close to the curve corresponding to the ideal CoMP-SRM, the only method assumed to have full and perfect knowledge of instantaneous blockages.
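For readers who wish to reproduce this type of plot, an empirical CDF can be built from rate samples as sketched below. This is a generic construction, not the authors' plotting code, and the sample values are hypothetical toy numbers chosen to mimic a gap just below a target rate of 8 Mbps.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function F(x) giving the fraction of samples <= x."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(x):
        # bisect_right counts how many ordered samples are <= x
        return bisect_right(ordered, x) / n

    return cdf


# Toy rates in Mbps (illustrative only): few samples fall below 8 Mbps.
rates = [2.0, 8.0, 8.5, 9.0, 21.0, 22.0, 25.0, 40.0]
F = empirical_cdf(rates)
```

Evaluating `F` on a grid of rates and plotting the result yields a staircase curve whose flat segment below the target corresponds to the "knee" discussed above.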
The CDFs in the whole region, including the non-outage region, show that the proposed hybrid approach achieves higher data rates than the fully digital SotA technique [31] with comparable outage probabilities, despite the significant savings in the number of RF chains, which is reduced from $N_{RF} = N_t = 16$ to $N_{RF} = 2$. The results also confirm that the OutMin approaches achieve higher data rates than the conventional SRM approach with probabilities of about 80% and 70% at target rates of 8 [Mbps] and 20 [Mbps], respectively, thanks to the proposed regularization.

V. CONCLUSION
We proposed new schemes to jointly optimize hybrid beamformers and the per-carrier allocation of transmit powers, with the aim of mitigating the effect of random path blockages in CoMP systems using OFDM in mmWave channels. Our designs are based on a newly formulated sum-of-outage minimization problem with manifold constraints, per-user sum data rates, and a regularizer corresponding to the ideal data rate. In order to enable an efficient solution, the latter problem is transformed into an ERM problem, solved via a stochastic learning method here introduced, which requires only knowledge of path blockage probabilities. To further improve the convergence behavior of the proposed technique, beamforming initialization and learning rate adaptation schemes are also contributed. Numerical results confirm that under realistic conditions, the proposed approach outperforms SotA methods in terms of convergence, outage probability, total system data rate, and the required number of RF chains.
Possible future work includes reducing the complexity of the blockage-robust hybrid beamforming design, so as to enable practical mmWave systems employing larger antenna arrays and more UEs. In addition, since the stochastic approach may require a large amount of training data, owing to mini-batch size tuning and imperfect knowledge of channel gains and angles, the design of a deterministic approach with comparable performance remains a possible target for future work.

APPENDIX
Then, the Hessians of the objective function t νu for the baseband and analog beamformers for the $u$-th UE are given by

where $\nabla_{f^*}$ and $\nabla_{v^*}$ denote conjugate gradients for the baseband and analog beamforming, respectively, and the scalar $S_u$ takes the value $\beta_u$ or $\Lambda_u$.
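To calculate the largest eigenvalues of these Hessians, we consider a scalar function $\Gamma \in \mathbb{R}$ of the form

$$\Gamma = \frac{x^{\mathsf{H}} H x}{x^{\mathsf{H}} \bar{H} x + \sigma^2},$$

where $x \in \mathbb{C}^N$ and $\sigma^2 \in \mathbb{R}$ are a given vector and scalar, respectively, while $H \in \mathbb{C}^{N\times N}$ and $\bar{H} \in \mathbb{C}^{N\times N}$ are positive semi-definite matrices. Then, the partial derivative and conjugate partial derivative of the scalar function $\Gamma$ with respect to the vector $x$ are respectively given by

$$\frac{\partial \Gamma}{\partial x} = \frac{H^{\mathsf{T}} x^*}{x^{\mathsf{H}} \bar{H} x + \sigma^2} - \frac{x^{\mathsf{H}} H x}{(x^{\mathsf{H}} \bar{H} x + \sigma^2)^2}\, \bar{H}^{\mathsf{T}} x^*, \quad (47)$$

$$\frac{\partial \Gamma}{\partial x^*} = \frac{H x}{x^{\mathsf{H}} \bar{H} x + \sigma^2} - \frac{x^{\mathsf{H}} H x}{(x^{\mathsf{H}} \bar{H} x + \sigma^2)^2}\, \bar{H} x. \quad (48)$$

Hence, the Hessian of the function $-\log_2(1 + \Gamma)$ with respect to the vector $x$ is calculated as

with

where $\beta \triangleq 1/\log 2 \in \mathbb{R}_+$. Define the matrix $\Phi \triangleq H + \bar{H}$. Then, the partial derivatives of the vector in equation (50a) with respect to the vectors $x^{\mathsf{T}}$ and $x^{\mathsf{H}}$ are given by

where $\xi \in \mathbb{R}$ and $\bar{\xi} \in \mathbb{R}$ are defined as $\xi \triangleq x^{\mathsf{H}} \Phi x + \sigma^2$ and $\bar{\xi} \triangleq x^{\mathsf{H}} \bar{H} x + \sigma^2$, respectively. Similarly, the partial derivatives of the vector in equation (50b) are given by

In turn, the Hessian $H_\Gamma$ in equation (49) can be written as

From the triangle inequality, the largest eigenvalue of the Hessian $H_\Gamma$ satisfies

where the matrices $Q_1$ through $Q_6$ are rank 1 and positive semi-definite. Next, notice that the largest eigenvalues of the matrices $-Q_1$, $-Q_4$, and $-Q_6$ are zero, while the largest eigenvalues of the remaining matrices are given by

where the relation

was used in (55c).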
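The conjugate partial derivative of $\Gamma$ can be checked numerically against a central finite difference, as sketched below using only the Python standard library. The check relies on the standard Wirtinger identity that, for a real-valued function, the directional derivative along $d$ equals $2\,\mathrm{Re}\{d^{\mathsf{H}}\, \partial\Gamma/\partial x^*\}$. The helper names and the randomly generated test matrices are assumptions introduced for illustration.

```python
import random


def matvec(A, x):
    """Matrix-vector product for lists of complex numbers."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def quad(A, x):
    """Hermitian form x^H A x (real for Hermitian A)."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, matvec(A, x))).real


def gamma(x, H, Hbar, sigma2):
    return quad(H, x) / (quad(Hbar, x) + sigma2)


def gamma_conj_grad(x, H, Hbar, sigma2):
    """Conjugate gradient: Hx/xi_bar - (x^H H x)/xi_bar^2 * Hbar x."""
    xi_bar = quad(Hbar, x) + sigma2
    num = quad(H, x)
    return [a / xi_bar - num / xi_bar**2 * b
            for a, b in zip(matvec(H, x), matvec(Hbar, x))]


def random_psd(n, rng):
    """Random Hermitian positive semi-definite matrix B B^H."""
    B = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
         for _ in range(n)]
    return [[sum(B[i][k] * B[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]


rng = random.Random(1)
n, sigma2, eps = 3, 0.5, 1e-6
H, Hbar = random_psd(n, rng), random_psd(n, rng)
x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
d = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

g = gamma_conj_grad(x, H, Hbar, sigma2)
analytic = 2 * sum(di.conjugate() * gi for di, gi in zip(d, g)).real
x_plus = [xi + eps * di for xi, di in zip(x, d)]
x_minus = [xi - eps * di for xi, di in zip(x, d)]
numeric = (gamma(x_plus, H, Hbar, sigma2)
           - gamma(x_minus, H, Hbar, sigma2)) / (2 * eps)
```

Agreement between `analytic` and `numeric` up to finite-difference error supports the form of the gradient, although it is of course not a substitute for the derivation itself.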
In the baseband beamforming design, it is considered that $x = f[k']$, $H = A_u^m[k']$, $\beta = S_u$, and $\bar{H} = \bar{A}_u^m[k']$, such that the largest eigenvalue of the matrix $Q_3$ becomes

Fig. 1. A CoMP system, in which multiple BSs cooperatively serve multiple single-antenna UEs.

Fig. 2. A TDD system, in which channel parameters are perfectly estimated during uplink.
$p(dT_s - \tau_c)\, a_{N_t}(\theta_{b,u}^c, \phi_{b,u}^c)$, (1)

where $d \in \{0, 1, \ldots, D-1\}$ denotes the delay tap index, and $D \in \mathbb{N}$ denotes the total number of delay taps; the path gain of the $c$-th cluster is modeled as $g_{b,u}^c \sim \mathcal{CN}(0, 10^{-\mathrm{PL}_{b,u}^c/10})$, with the associated path loss $\mathrm{PL}_{b,u}^c \in \mathbb{R}$ calculated as $\mathrm{PL}_{b,u}^c = \alpha + 10\beta \log_{10}(d_{b,u}) + \gamma$, in which $d_{b,u}$ denotes the distance between the $b$-th BS and the $u$-th UE, and the parameters $\alpha, \beta, \gamma$ are listed in [32, Table I]; the function $p(dT_s - \tau_c)$ represents the equivalent pulse response at the transmitter and receiver, calculated from the sampling duration $T_s$ and the time delay $\tau_c$ of the $c$-th cluster. Without loss of generality, the index $c = 1$ and the time delay $\tau_1 = 0$ represent the LOS component. In the above, the vector $a_{N_t}(\theta_{b,u}^c, \phi_{b,u}^c) \in \mathbb{C}^{N_t}$ denotes the array response as a function of the elevation $\theta_{b,u}^c$ and azimuth $\phi_{b,u}^c$ of the AoD of the $c$-th cluster from the $b$-th BS toward the $u$-th UE and is given by

, and then align them with the estimated channels via the inner products $\|\hat{H}_b^{\mathsf{H}}[k]\, d_i\|_2$ between the channel matrix $\hat{H}_b[k] \triangleq [\hat{h}_{b,1}[k], \ldots, \hat{h}_{b,U}[k]] \in \mathbb{C}^{N_t\times U}$ and each $i$-th column $d_i \in \mathbb{C}^{N_t}$ of the DFT matrix. Denoting the indices of the columns of the DFT matrix by $\mathcal{D}$, and the indices of vectors assigned to the analog beamformer by $\mathcal{V}$, we can concisely describe the $j \in$

Algorithm 1 BMSGD-Based Hybrid Beamforming Design
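The DFT-codebook alignment mentioned in the initialization fragment above can be sketched as follows. This is a simplified illustration under explicit assumptions: a one-dimensional unitary DFT matrix is used instead of a UPA codebook, a single subcarrier is considered, and the $N_{RF}$ columns with the largest inner-product norms are selected, which is one plausible reading of the truncated description. The function names are hypothetical.

```python
import cmath
import math


def dft_matrix_columns(n):
    """Columns d_i of the unitary n-point DFT matrix."""
    scale = 1 / math.sqrt(n)
    return [[scale * cmath.exp(-2j * math.pi * i * r / n) for r in range(n)]
            for i in range(n)]


def select_dft_beams(h_columns, n_rf):
    """Pick the n_rf DFT columns with the largest ||H^H d_i||_2.

    h_columns: list of U estimated channel vectors, each of length N_t.
    Selection of the top n_rf columns is an assumed rule for illustration.
    """
    n = len(h_columns[0])
    scores = []
    for i, d in enumerate(dft_matrix_columns(n)):
        # Each entry of H^H d_i is the inner product h_u^H d_i.
        proj = [sum(hj.conjugate() * dj for hj, dj in zip(h, d))
                for h in h_columns]
        scores.append((math.sqrt(sum(abs(p) ** 2 for p in proj)), i))
    best = sorted(scores, reverse=True)[:n_rf]
    return sorted(i for _, i in best)


# Toy check: channels that coincide with DFT columns 1 and 5.
cols = dft_matrix_columns(8)
chosen = select_dft_beams([cols[1], cols[5]], n_rf=2)
```

In this toy case the two channels are themselves DFT columns, so the selection recovers exactly their indices.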

Fig. 3. Convergence behavior with different initial beamforming designs and learning rates at the target rate of 20 [Mbps].

Fig. 5. Outage probability and effective data rate comparison with SotA.

Fig. 6. Cumulative distribution functions (CDFs) of achievable data rates with target rates.