Outage-Minimization Coordinated Multi-Point for Millimeter-Wave OFDM With Random Blockages

We consider millimeter-wave (mmWave) orthogonal frequency division multiplexing (OFDM) systems subject to random propagation path blockages and propose a new coordinated multi-point (CoMP) transmission scheme that minimizes the outage probability of users with respect to given target data rates. To this end, a stochastic sum-outage-probability minimization problem is formulated for the joint design of beamforming, data rate allocation, and power allocation over subcarriers. In order to solve this problem efficiently, a block stochastic learning approach is introduced using training data generated from a priori knowledge of path blockage probabilities. To initialize the stochastic learning solver, a novel initial beamforming design is also proposed based on an upper bound of the original objective function, which improves convergence without requiring hyperparameter tuning. Numerical results confirm the effectiveness of the proposed block stochastic learning approach in terms of both convergence behavior and outage probability. Furthermore, these results confirm that the outage performance of the proposed approach, which requires only blockage probabilities, is comparable to that of a CoMP sum rate maximization (SRM) transmission scheme with perfect channel state information (CSI) and perfect knowledge of blockages.


I. INTRODUCTION
OWING to the growing number of wireless devices and the complexity of user demands, modern wireless communications systems require increasingly larger bandwidths [1], [2], [3]. Fifth-generation mobile communications systems (5G) have therefore been designed to exploit available spectrum in the mmWave band within 24 [GHz] to 300 [GHz] as a means to address spectrum shortages [4], [5], [6], with further utilization of the mmWave band considered crucial to meet the requirements of beyond-5G and sixth-generation mobile communications systems (6G) [7], [8], [9], [10], [11]. However, mmWave systems suffer from severe path loss owing to the more significant signal attenuation and atmospheric absorption faced at increased frequencies [12], [13]. On the other hand, the shorter wavelengths of mmWave carriers allow more antenna elements to be equipped, enabling highly directional transmissions that can compensate for unfavorable propagation effects.
For all the above, mmWave multiple-input multiple-output (MIMO) and beamforming technologies are considered fundamental for future wireless communications systems [14], [15], [16] and have therefore been the subject of various innovations.
To cite a few examples, hybrid beamforming techniques were proposed in [17], [18] to compensate for signal attenuation while avoiding the implementation issues, in terms of cost and power consumption, caused by enormous numbers of radio frequency (RF) chains. These were shown to achieve significant spectral efficiency improvements, comparable to fully digital architectures, under perfect CSI. In turn, [19], [20], [21] addressed the fact that maximizing beamforming gain requires large antenna arrays and accurate CSI, proposing efficient channel estimation and tracking algorithms for mmWave systems that exploit the sparsity of the channel owing to the presence of few dominant paths. In order to compensate for the CSI errors inevitably introduced by estimation techniques, a beamforming design robust to CSI errors has also been proposed in [22].
While the aforementioned beamforming techniques compensate for attenuation, mmWave signals are also subject to sudden and rapid signal attenuation during data transmission due to path blockage by obstacles such as pedestrians and vehicles. Path blockages are caused by the combined effects of the weak diffraction and penetration characteristic of mmWave signals and the sharp beams created by highly directive MIMO antenna arrays [23], [24], [25], [26]. CoMP transmission has been actively studied to mitigate this effect [27], [28]: multiple geographically distributed base stations (BSs) transmit data cooperatively, providing spatial diversity gain. However, this approach is probabilistic and does not fully exploit the instantaneously available paths.
In order to determine which propagation paths exist during data transmission, blockage prediction strategies have been proposed, which exploit out-of-band information, visual information, or mmWave in-band signatures [29], [30], [31], [32], [33], [34], [35]. In [29], [30], [31], blockages are predicted using sub-6 GHz channels, whose spatial similarity to mmWave channels improves prediction accuracy and reduces the mmWave beam training overhead. Blockage prediction approaches using visual images from cameras mounted at BSs have been proposed in [32], [33], [34], [36]. In [32], [33], the received signal power a few milliseconds ahead is predicted using machine learning with spatiotemporal images. The method in [35] exploits mmWave in-band signatures to provide high prediction accuracy using machine learning with received signal power sequences. To further enhance the prediction accuracy, a technique to track the positions of user equipments (UEs) via received power and visual images has been studied [37]. Based on the prediction of instantaneous blockage occurrences, handover strategies have been proposed in [33], [34]. Although these approaches exhibit significant performance gains, sudden throughput degradation due to prediction errors and unacceptable delays to re-establish the communication link remain inevitable. In [38], [39], robust CoMP-SRM transmission schemes exploiting blockage probabilities instead of instantaneous blockage occurrences have been proposed. Thereby, cooperative beamforming methods based on worst-case optimization using blockage probability information were shown to maintain high data rates without knowledge of the instantaneously available paths.
The aforementioned contributions, however, consider only total throughput, such that quality of service (QoS) cannot be guaranteed, and assume that blockage occurs only on line-of-sight (LOS) paths, which is impractical for mmWave systems [25], [40]. In contrast, stochastic approaches for beamforming design aiming at guaranteeing QoS have been proposed in [41], [42], [43]. These works considered cooperative outage minimization (OutMin) beamforming schemes with predicted blockage probabilities, which minimize the outage probability with respect to the given users' target data rates. In [41], the OutMin beamforming design is formulated as an empirical risk minimization (ERM) problem, which can be solved by stochastic learning based on predicted blockage probabilities. Specifically, a fully digital CoMP scheme employing the mini-batch stochastic gradient descent (MSGD) approach was proposed for mmWave channels in which both the LOS and non-line-of-sight (NLOS) paths are subjected to blockages. The extension of that approach to hybrid beamforming design was then obtained in [42] by following a block coordinate descent (BCD) framework. In [43], these stochastic approaches were applied to a reconfigurable intelligent surface (RIS)-aided mmWave system, with the beamforming and reflection coefficient vectors updated based on block mini-batch stochastic gradient descent (BMSGD).
All these stochastic approaches have learning convergence guaranteed by setting the learning rate based on the Lipschitz constant and were found to achieve low outage probabilities, at the penalty of a slight decrease in the system's total throughput. However, these methods considered only single-carrier narrowband channel models, which do not suit practical mmWave communications systems: mmWave communications usually operate on channels with bandwidths considerably wider than those of sub-6 GHz systems and employ multi-carrier OFDM to mitigate the effects of frequency selectivity.
In light of the above, we propose a CoMP transmission scheme robust against random blockages and designed to minimize the outage probability of users in mmWave OFDM systems. To that end, we formulate a new sum-outage-probability minimization problem with the per-user sum rate over subcarriers. This problem is solved efficiently using a block stochastic learning method that ultimately yields the OutMin beamforming design jointly with the allocation of per-user data rate and power over subcarriers. In order to improve the convergence behavior, we also propose an initial beamforming for the stochastic learning. This beamforming is designed based on an upper bound of the original ERM objective, which allows us to obtain a better local optimal solution with fewer iterations and without introducing tuned hyperparameters. Furthermore, we compare the proposed method with state-of-the-art (SotA) methods via computer simulations and confirm its effectiveness in mitigating blockage effects. Then, comparing against a CoMP scheme with knowledge of the actual blockages, we also confirm that the proposed method, relying only on blockage probabilities, achieves comparable outage performance, and we clarify the relationship between blockage prediction and mmWave communication performance.

Notation: The following notation is used throughout the article. The sets of natural, real, and complex numbers are denoted by N, R, and C, respectively. Boldface capital and lowercase letters denote matrices and vectors, such as in X and x, respectively. A circularly symmetric complex Gaussian distribution with mean μ and variance σ² is represented by CN(μ, σ²). The operators (·)^T and (·)^H denote transpose and Hermitian transpose, such as in X^T and X^H, respectively. The l_p-norm of a vector x is denoted by ‖x‖_p, with p ≥ 0. The operator ⊗ denotes the Kronecker product. The N-dimensional identity matrix is denoted by I_N, and the operator E[·] denotes the expected value taken over blockage patterns.

II. SYSTEM MODEL

A. Communication Scenario
Consider a CoMP OFDM system operating in downlink transmission, as illustrated in Fig. 1, in which multiple BSs, each equipped with a uniform planar array (UPA) of N_t transmit antenna elements, simultaneously serve multiple single-antenna UEs. The BSs are connected via fronthaul to a central processing unit (CPU), which designs the cooperative beamforming. Space division multiple access (SDMA) is used to provide access to multiple UEs, each of which makes use of all subcarriers. The mmWave channel of each BS-UE pair is assumed to consist of one LOS path and several NLOS paths generated by scatterer clusters. Uplink and downlink communications are also assumed to be separated based on time-division duplexing (TDD), as illustrated in Fig. 2. As a result, it is assumed that the path gain and time delay, as well as the azimuth and elevation angles of departure (AoD) of the multipath signals, can be perfectly estimated via the uplink and used for the downlink owing to channel reciprocity. However, even with perfect channel estimation, the estimated CSI may differ from the actual CSI during data transmission due to path blockage by surrounding obstacles.

B. Channel Model

It follows from the above that the estimated channel ĥ_{b,u}[d] ∈ C^{N_t×1} between the b-th BS and the u-th UE at the d-th delay tap can be modeled as

ĥ_{b,u}[d] = Σ_{c=1}^{C} g^c_{b,u} p(dT_s − τ_c) a(θ^c_{b,u}, φ^c_{b,u}),  (1)

where d ∈ {0, 1, . . . , D − 1} denotes the delay tap index, with D ∈ N denoting the total number of delay taps and C ∈ N the number of clusters. The function p(dT_s − τ_c) is the d-th delay-tap sample of the equivalent response of the pulse shaper at the transmitter and receiver after the propagation delay τ_c, where T_s denotes the sampling period and τ_c the time delay associated with the c-th cluster. Without loss of generality, it is assumed that the time delay of the first cluster is zero (i.e., τ_1 = 0), as c = 1 is the index of the LOS component of the channel. The channel gain of the c-th cluster path between the b-th BS and the u-th UE is modeled as g^c_{b,u} ~ CN(0, 10^{−PL^c_{b,u}/10}), where the associated path loss PL^c_{b,u} ∈ R is given, in accordance with [12], by

PL^c_{b,u} = α + 10β log_10(d_{b,u}) + ξ^c_{b,u},  ξ^c_{b,u} ~ N(0, γ²),  (2)

in which d_{b,u} denotes the distance between the b-th BS and the u-th UE. The parameters α, β, and γ are given in [12, Table I].
The array response vector corresponding to the elevation θ^c_{b,u} ∈ R and azimuth φ^c_{b,u} ∈ R AoDs of the c-th cluster from the b-th BS toward the u-th UE is given by

a(θ^c_{b,u}, φ^c_{b,u}) = c_{N_t^v}(cos θ^c_{b,u}) ⊗ c_{N_t^h}(sin θ^c_{b,u} cos φ^c_{b,u}),  (3)

where N_t^h ∈ N and N_t^v ∈ N denote the numbers of antenna elements in the horizontal and vertical directions of the UPA, respectively, such that N_t = N_t^h N_t^v. The vector c_N is the standard uniform linear array response vector, defined as

c_N(ψ) = (1/√N) [1, e^{jπψ}, . . . , e^{jπ(N−1)ψ}]^T.  (4)

According to [24], it has been experimentally confirmed that propagation paths are randomly blocked with a probability ranging from 20% to 60%. Therefore, similarly to [23], [38], [39], [41], [42], [43], we assume that blockage effects are modeled by random variables ω^c_{b,u} ∈ {0, 1} following the Bernoulli distribution with mean p^c_{b,u}. Then, during the data transmission phase after channel estimation, the actual channel between the b-th BS and the u-th UE at the d-th delay tap can be expressed as

h_{b,u}[d] = Σ_{c=1}^{C} (1 − ω^c_{b,u}) g^c_{b,u} p(dT_s − τ_c) a(θ^c_{b,u}, φ^c_{b,u}).  (5)

We remark that since the sampling period T_s specified in 5G new radio (NR) is much shorter than the continuous blockage time revealed in empirical studies [25], [26], all delayed rays of the c-th cluster are assumed to be blocked when the c-th cluster is blocked. This is implied by the notation in (5), namely that ω^c_{b,u} does not depend on the delay-tap index d. In what follows, it is assumed that only the blockage probabilities for all clusters, p^c_{b,u}, ∀b, u, c, are perfectly available [38], [39], [41], [42], [43]. It is worth noting that, since stochastic prediction of the blockage probabilities does not require precise information on the locations of UEs and surrounding obstacles, the availability of this information does not imply the availability at the CPU of the actual CSI h_{b,u}[d] during data transmission.
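As a concrete illustration, the Bernoulli blockage model above can be sketched as follows. This is a minimal sketch: the `(C, D, Nt)` array layout, the function name, and the convention that ω = 1 marks a blocked cluster are our own assumptions, not the paper's notation.

```python
import numpy as np

def sample_actual_channel(h_taps, p_block, rng):
    """Apply random cluster blockages to estimated channel taps.

    h_taps:  (C, D, Nt) complex, per-cluster contributions to hat{h}[d]
    p_block: (C,) blockage probability p^c of each cluster
    A blocked cluster (omega^c = 1, drawn Bernoulli(p^c)) is removed from
    ALL delay taps, since the blockage duration far exceeds T_s.
    Returns the (D, Nt) actual channel taps h[d].
    """
    omega = rng.random(p_block.shape) < p_block     # True -> cluster blocked
    keep = (~omega).astype(float)[:, None, None]    # broadcast over (d, antenna)
    return (keep * h_taps).sum(axis=0)
```

With p^c = 0 every cluster survives and the actual taps equal the estimated ones; with p^c = 1 the channel vanishes entirely.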
Using the relationship between the delay-time domain and the frequency domain described by the Fourier transform, the estimated and actual mmWave channels between the b-th BS and the u-th UE at the k-th subcarrier can be modeled as

ĥ_{b,u}[k] = Σ_{d=0}^{D−1} ĥ_{b,u}[d] e^{−j2πkd/K},  (6)

h_{b,u}[k] = Σ_{d=0}^{D−1} h_{b,u}[d] e^{−j2πkd/K},  (7)

respectively, where k ∈ K ≜ {0, 1, . . . , K − 1} denotes the subcarrier index, with K ∈ N denoting the total number of subcarriers. The blockage pattern is identical across all subcarriers.
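The delay-to-frequency conversion in (6) and (7) is a partial DFT over the taps. A minimal sketch (array shapes and the function name are our own convention):

```python
import numpy as np

def taps_to_subcarriers(h_taps, K):
    """Convert (D, Nt) delay-tap channel to (K, Nt) per-subcarrier channel:
    h[k] = sum_d h[d] * exp(-j*2*pi*k*d / K), k = 0, ..., K-1."""
    D = h_taps.shape[0]
    # (K, D) partial DFT matrix applied to the first D taps
    F = np.exp(-2j * np.pi * np.outer(np.arange(K), np.arange(D)) / K)
    return F @ h_taps
```

This is equivalent to `np.fft.fft` applied to the taps zero-padded to length K, which can serve as a cross-check.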

C. Received Signal Model
Let f_{b,u}[k] ∈ C^{N_t×1} denote the beamforming vector from the b-th BS toward the u-th UE at the k-th subcarrier. The received signal at the u-th UE and the k-th subcarrier can be written as

y_u[k] = h_u^H[k] f_u[k] s_u[k] + Σ_{u′∈U\{u}} h_u^H[k] f_{u′}[k] s_{u′}[k] + n_u[k],  (8)

where s_u[k] and n_u[k] ~ CN(0, σ_u²[k]) are respectively the transmitted data symbol and the additive white Gaussian noise (AWGN) with variance σ_u²[k] at the u-th UE and the k-th subcarrier, and where the channel vector h_u[k] ≜ [h_{1,u}^T[k], . . . , h_{B,u}^T[k]]^T ∈ C^{BN_t×1} and the beamforming vector f_u[k] ≜ [f_{1,u}^T[k], . . . , f_{B,u}^T[k]]^T ∈ C^{BN_t×1} stack the corresponding per-BS quantities.

D. Problem Formulation
As shown in (5), the channel realization during data transmission might differ from the estimated CSI defined in (1). Therefore, the achievable data rate of each user, determined by the channel realization, is a random variable. To meet the QoS requirements of the system, a judicious option is to minimize the sum of the users' outage probabilities, an outage being the event that the resulting data rate falls below the given target rate. Since the achievable data rate of the u-th UE equals the sum rate over all available subcarriers, the associated stochastic sum-outage-probability minimization problem can be described as

minimize_{f_u[k], ∀u,k}  Σ_{u∈U} Pr[ Σ_{k∈K} R_u[k] < r_u ]  (9a)
subject to  Σ_{u∈U} Σ_{k∈K} ‖f_{b,u}[k]‖²₂ ≤ P_max,b, ∀b ∈ B,  (9b)

where R_u[k] ≜ log₂(1 + Γ_u[k]) and r_u denote the achievable data rate at the u-th UE and the k-th subcarrier, and the target rate for the u-th user, respectively; P_max,b denotes the maximum transmit power at the b-th BS, and the transmitted data symbols s_u[k], ∀k, u, are assumed to follow a circularly symmetric complex Gaussian distribution. Hence, the function Γ_u[k] is the signal-to-interference-plus-noise ratio (SINR) at the u-th UE and the k-th subcarrier, given by

Γ_u[k] = |h_u^H[k] f_u[k]|² / ( Σ_{u′∈U\{u}} |h_u^H[k] f_{u′}[k]|² + σ_u²[k] ).  (10)

Note that in order to solve (9) by conventional OutMin optimization approaches [41], [42], [43], the problem must be divided into K sub-problems, described by

minimize_{f_u[k], ∀u}  Σ_{u∈U} Pr[ R_u[k] < r_u[k] ]  (11a)
subject to  Σ_{u∈U} ‖f_{b,u}[k]‖²₂ ≤ P_max,b[k], ∀b ∈ B,  (11b)

where r_u[k] represents the target rate at the k-th subcarrier for the u-th UE, and P_max,b[k] is the maximum transmit power allocated to the k-th subcarrier at the b-th BS.
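Since the outage probabilities in (9) have no closed form, they can be estimated by sampling blockage patterns. A minimal Monte-Carlo sketch (array shapes and names are our own assumptions):

```python
import numpy as np

def mc_sum_outage(rates, r_target):
    """Monte-Carlo estimate of sum_u Pr[ sum_k R_u[k] < r_u ].

    rates:    (M, U, K) achievable rates R_u[k] under M sampled blockage patterns
    r_target: (U,) per-user target rates r_u
    """
    per_user_sum = rates.sum(axis=2)            # (M, U) sum rate over subcarriers
    outage = per_user_sum < r_target[None, :]   # outage event per sample and user
    return outage.mean(axis=0).sum()            # sum of empirical outage probabilities
```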
To solve these sub-problems, the target rate r_u[k] and maximum transmit power P_max,b[k] at the k-th subcarrier must be determined. For the transmit power allocation, considering the channel uncertainty caused by random blockages, i.e., ĥ_u[k] ≠ h_u[k], ∀u, k, an equal power allocation is optimal [44], [45], [46], since this channel uncertainty can be seen as a CSI error. For the target rate allocation, one option is to allocate the rate based on the average achievable data rate over the possible blockage events. To this end, the distribution of the data rate under all the blockage probabilities is necessary, which is mathematically intractable. Hence, a reasonable target rate allocation at the k-th subcarrier is r_u[k] = r_u/K.
Then, the sub-problems described in (11) can be rewritten by substituting r_u[k] = r_u/K and P_max,b[k] = P_max,b/K into (11). The conventional approach [41] can design OutMin beamforming by solving this optimization problem with stochastic learning based on the given blockage probabilities. Obviously, this solution is sub-optimal, since it does not minimize the outage probability of the rate summed over the subcarriers.
In contrast to such a naive approach, if statistical knowledge of the blockage probabilities is available, an alternative is to design both the allocation and the beamforming jointly by considering the distribution induced by the blockage probabilities. However, it is again mathematically difficult to obtain this distribution in closed form, so that a judicious option is to design them numerically. This motivates the key contribution of this article, described in the next section: the joint design of OutMin beamforming, data rate allocation, and maximum transmit power allocation over all subcarriers of an OFDM system by directly solving the problem given in (9) based on a block stochastic learning approach.

III. PROPOSED OUTAGE-MINIMIZATION TRANSMISSION SCHEME

A. Empirical Risk Minimization
To solve problem (9) directly and efficiently, we transform it into an ERM problem by introducing an indicator function defined as

χ(x) ≜ 1 if x < 0, and χ(x) ≜ 0 otherwise,  (13)

which yields

minimize_{f_u[k], ∀u,k}  E[ Σ_{u∈U} χ( Σ_{k∈K} R_u[k] − r_u ) ]  (14a)
subject to  Σ_{u∈U} Σ_{k∈K} ‖f_{b,u}[k]‖²₂ ≤ P_max,b, ∀b ∈ B,  (14b)

where the expectation in (14a) is taken over the channel realizations given by random blockages. This ERM problem can be efficiently solved via a stochastic approximation approach using samples from the possible channel realizations, which is possible because the channel gains, the AoDs, and the blockage probabilities are assumed to be available at the BSs in the present study. However, the direct calculation of a gradient direction for (14) is difficult because the indicator function in (14a) is not smooth. Therefore, a generalized smooth hinge surrogate function ν(·), defined in (15), is introduced in place of the indicator function. Hence, the ERM problem of (14) becomes

minimize_{f_u[k], ∀u,k}  E[ Σ_{u∈U} ν_u( Σ_{k∈K} R_u[k] ) ]  (16a)
subject to  Σ_{u∈U} Σ_{k∈K} ‖f_{b,u}[k]‖²₂ ≤ P_max,b, ∀b ∈ B.  (16b)

The ERM problem given above has the summation over subcarriers inside the hinge function and therefore cannot be solved by the algorithms in the literature [41], [42], [43]. Hence, we propose a new BMSGD algorithm in the next subsection.
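To make the role of the surrogate concrete, any C¹ decreasing function interpolating the indicator can serve in its place. The following is an illustrative stand-in of our own choosing (it is not the paper's ν in (15)): a smoothstep complement of the rate normalized by the target.

```python
import numpy as np

def smooth_hinge(t):
    """C1 surrogate for the outage indicator 1[t < 1], with t = rate/target.

    Equals 1 for t <= 0, 0 for t >= 1, with a smooth cubic blend in between
    (the derivative vanishes at both ends, so the overall function is C1 and
    amenable to stochastic gradient descent, unlike the step function).
    """
    t = np.clip(t, 0.0, 1.0)
    return 1.0 - 3.0 * t**2 + 2.0 * t**3
```

At the target rate (t = 1) the surrogate reaches 0, and well below the target it saturates at 1, matching the indicator's limits.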

B. Block Stochastic Gradient Descent
By generating training data h_u^m[k] and replacing the expectation with the ensemble mean, the objective function (16a) can be rewritten as in (18), where M_mini ∈ N is the mini-batch size (1 ≤ M_mini ≤ M). In order to minimize the outage probability, the beamforming vectors of all subcarriers f[k], ∀k ∈ K, must be updated, although it is difficult to compute the gradient of the objective function (18a) with respect to all variables simultaneously. Therefore, we instead compute the gradient for one beamforming vector f[k′], k′ ∈ K, with the remaining beamforming vectors f[k], k ∈ K\{k′}, fixed, and update all beamforming vectors in turn based on the BMSGD. The gradient of the hinge function ν(·) with respect to the selected vector f[k′] can be calculated as in (19)-(22), where e_u ∈ {0, 1}^{U×1} is a vector of length U with all elements 0 except for the u-th element, which is 1, and ē_u ∈ {0, 1}^{U×1} denotes the complement of e_u, such that all elements of e_u + ē_u are 1.
With the above, the update of f[k′] can be written as

f^{(i+1)}[k′] = f^{(i)}[k′] − α_i[k′] ∇_{k′} ν̂^{(i)},  (23)

where α_i[k] ∈ R_+ and f^{(i)}[k] ∈ C^{UBN_t×1} are the learning rate and the beamforming vector at the i-th iteration and the k-th subcarrier, respectively, and ∇_{k′} ν̂^{(i)} denotes the mini-batch gradient of (19). In the proposed BMSGD approach, all beamforming vectors f^{(i)}[k], ∀k ∈ K, are projected onto the feasible region to satisfy the power constraints upon each update of f^{(i)}[k′]. It should be noted that this normalization rule, which differs from the conventional BMSGD approach [43], allows efficient data rate and transmit power allocation over subcarriers. After all beamforming vectors have been updated, the procedure is repeated using new training data. By following this update method, not only can beamforming robust against blockages be designed, but transmit power and data rate can also be optimally allocated over subcarriers in terms of outage minimization, based on statistical learning of blockage effects.
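The per-update projection onto the per-BS power constraints amounts to radially rescaling each BS's stacked beamforming coefficients whenever their total power exceeds the budget. A minimal sketch (the `(B, U, K, Nt)` layout and the function name are our own assumptions):

```python
import numpy as np

def project_per_bs_power(F, P_max):
    """Project beamformers onto { F : sum_{u,k} ||f_{b,u}[k]||^2 <= P_max[b] }.

    F:     (B, U, K, Nt) complex beamforming coefficients
    P_max: (B,) per-BS transmit power budgets
    Each violating BS block is scaled down radially, so beam directions are
    preserved; BSs already within budget are left untouched.
    """
    F = F.copy()
    for b in range(F.shape[0]):
        power = np.sum(np.abs(F[b]) ** 2)
        if power > P_max[b]:
            F[b] *= np.sqrt(P_max[b] / power)
    return F
```

This is the Euclidean projection onto the intersection of the per-BS power balls, since the constraints decouple across BSs.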
We remark that although the order in which subcarriers are updated can affect the convergence behavior and the optimal value obtained, determining the optimal subcarrier order that contributes the most to minimizing the outage probability under blockage occurrences is a combinatorial problem. Therefore, we adopt here a random update order for each training dataset.
The proposed OutMin beamforming design based on BMSGD is summarized in Algorithm 1, where f ini [k] is the initial beamforming vector, the design of which is considered in the next subsection.

C. Initial Beamforming Design
As discussed in the previous subsection, Algorithm 1 requires an initial beamforming vector, which may impact the convergence behavior. In order to address this issue, we propose here an initial beamforming design based on an upper bound of the hinge function (15), so as to obtain a better local optimal value with less learning. First, notice that the generalized smooth hinge surrogate function ν(·) is convex with respect to the data rate, whose minimum and maximum values at the u-th UE, denoted respectively by V_u ∈ R and W_u ∈ R, are known. Thanks to these limits, an upper bound of the objective function (16a) can be derived from the Edmundson-Madansky (EM) bound [47], [48], described in (25), which, given that the minimum data rate is zero (i.e., V_u = 0 when all paths are blocked for the u-th UE), reduces to

E[ν(R_u)] ≤ 1 + ((ν(W_u) − 1)/W_u) E[R_u].  (26)

Using the latter inequality in the problem formulated in (16) yields the formulation for the initial beamforming given in (27). Unfortunately, the maximum-value constraints (27b) and the multiplication of an expectation by quantities dependent on the maximum value W_u make it hard to solve problem (27) efficiently. In order to circumvent this challenge, we relax problem (27) by dropping the constraint (27b) and replacing the nonnegative quantity (1 − ν(W_u))/W_u with the hyperparameter β_u ≥ 0. Furthermore, by ignoring the constant term +1, the initial beamforming design can be reformulated as the following weighted expected SRM problem

maximize_{f_u[k], ∀u,k}  Σ_{u∈U} β_u E[ Σ_{k∈K} R_u[k] ]  (28a)
subject to  Σ_{u∈U} Σ_{k∈K} ‖f_{b,u}[k]‖²₂ ≤ P_max,b, ∀b ∈ B.  (28b)

Next, we observe that the objective function of problem (28) can be upper-bounded via Jensen's inequality [49], yielding (29). In addition, in order to relax the problem into a convex one, the expected value of the SINR appearing in (29) is further developed using a first-order Taylor approximation, which

yields the approximation given in (30) [50], where the involved positive semi-definite matrix is defined therein. Substituting the approximate expected SINR expression of (30) into the bound (29) transforms problem (28) into (31), which under the quadratic transform (QT) [51] is finally transformed into the equivalent convex problem (32), where the QT objective function is described in (33), shown at the bottom of this page, and the auxiliary variable γ_u[k] is defined in (34). Lastly, let us address the hyperparameter β_u introduced in the SRM-reformulated problem (28), which must be determined using the quantity (1 − ν(W_u))/W_u ≥ 0, in accordance with the constraint (27b) and the EM bound [47], [48] given in (25).
That is, we set

β_u = (1 − ν(W_u))/W_u.  (35)

A pseudo-code summarizing the initial beamforming design described in this subsection is offered in Algorithm 2.
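For concreteness, the EM bound invoked above can be stated explicitly. For ν convex on [V_u, W_u], the random sum rate R_u ≜ Σ_{k∈K} R_u[k] taking values in [V_u, W_u], and assuming ν(0) = 1 (consistent with ν approximating the outage indicator at zero rate):

```latex
\mathbb{E}\left[\nu(R_u)\right]
  \le \frac{W_u-\mathbb{E}[R_u]}{W_u-V_u}\,\nu(V_u)
     +\frac{\mathbb{E}[R_u]-V_u}{W_u-V_u}\,\nu(W_u)
  \;\overset{V_u=0}{=}\;
  1+\frac{\nu(W_u)-1}{W_u}\,\mathbb{E}[R_u].
```

Minimizing the right-hand side is thus equivalent to maximizing E[R_u] weighted by the nonnegative coefficient (1 − ν(W_u))/W_u, which the relaxation leading to (28) replaces with the hyperparameter β_u.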

D. Convergence
We emphasize that the convergence of Algorithm 2 is guaranteed by the fractional programming (FP) framework employed, as proved in [51]. Moreover, the convergence of Algorithm 1 is also guaranteed if the learning rate α_i decreases appropriately as the number of iterations increases, so as to meet the sufficient convergence conditions of stochastic gradient descent (SGD) for smooth nonconvex optimization problems given in (36) [52], [53]. The Lipschitz criterion [53] also guarantees the convergence of the block stochastic learning by setting the learning rate α_i ∈ (0, 1/L], where L denotes the Lipschitz constant. Furthermore, if the exact Lipschitz constant cannot be obtained, the learning rate α_i = ρ/(√i · L′), where ρ and L′ denote a scaling coefficient and a lower bound on the Lipschitz constant, respectively, also leads to convergence [53]. In Algorithm 1, as in [42], [43], [54], following Taylor's theorem, the lower bound on the Lipschitz constant for the k′-th subcarrier is given in (37), where λ_max(·) denotes the largest eigenvalue of the matrix argument and ∇²_{k′}ν_u the Hessian of the hinge function for the u-th UE.
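The diminishing learning-rate schedule described above can be sketched as follows (the function name is ours):

```python
import math

def learning_rate(i, rho, L_lower):
    """alpha_i = rho / (sqrt(i) * L'), for iteration i >= 1, with L' a lower
    bound on the Lipschitz constant and rho a scaling coefficient.  The
    1/sqrt(i) decay is the standard diminishing step size for nonconvex
    SGD [52], [53]."""
    return rho / (math.sqrt(i) * L_lower)
```

A smaller (looser) lower bound L′ yields a larger step, so the bound derived in the sequel directly controls how aggressively the BMSGD may step.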
In order to derive learning rates that ensure the convergence of Algorithm 1, the Hessian of the hinge function with respect to the beamforming vector f[k′] of the u-th UE is calculated as in (38), where ζ_u ∈ R_+ is defined as ζ_u ≜ 1/(r_u · log_e 2), and ∇_{k′*} and ∇_{k′} denote the partial derivatives ∂/∂f*[k′] and ∂/∂f[k′], respectively. Following the Wirtinger derivative, the Hessian ∇²_{k′}ν_u can be calculated as per (39), shown at the bottom of this page. Then, consider the triangle inequality given in (40). The matrices Q_1 through Q_6 are rank-one and positive semi-definite, owing to their construction as outer products involving the channel vectors. It follows that the largest eigenvalue of the matrices −Q_1, −Q_4, and −Q_6 is zero, for all channels and beamforming vectors. The remaining largest eigenvalues required to evaluate (40) are given in (41)-(43). For the matrix Q_3, following the identity Tr(AB) = Tr(BA), the largest eigenvalue can be obtained as in (44). In turn, the expression for the largest eigenvalue of the matrix Q_2 can be simplified as in (45) and (46),
where the inequality Tr(AB) ≤ λ_max(A)Tr(B) was used. The final inequality in (46) follows from two facts. The first is that the channels h_u^m[k′] used for training include blocked paths and therefore have a smaller l₂-norm than the estimated channels ĥ_u[k′], in which no propagation path is blocked. The second is that the squared l₂-norm ‖f[k′]‖²₂ is never greater than Σ_{b∈B} P_max,b, owing to the power constraints.
Similarly, the upper bound on λ_max(Q_5) is given in (47). Considering ζ_u = 1/(r_u · log_e 2), the largest eigenvalue of the Hessian satisfies (48). Finally, the lower bound on the Lipschitz constant for the k′-th subcarrier is given in (49), from which we obtain the learning rate that ensures the convergence of Algorithm 1.

IV. PERFORMANCE ASSESSMENT
Before we proceed to the numerical performance evaluation of the proposed joint OutMin beamforming, data rate, and power allocation scheme, let us briefly assess its complexity. The most computationally expensive operation in Algorithm 1 is the differentiation defined by (21), whose complexity order is O(N_t² B² U) per subcarrier, such that the total complexity order of the method can be estimated as O(K N_t² B² U). We highlight that this computational complexity is, on a per-subcarrier basis, the same as that of preceding work such as [41].

A. Comparison to State-of-The-Art Methods
In this section, we evaluate the proposed joint OutMin beamforming, data rate, and power allocation scheme, comparing its performance to those of maximum ratio transmission (MRT), minimum mean square error (MMSE) beamforming, and the MSGD-based OutMin scheme [41]. Note that the MMSE beamformer is utilized to initialize both Algorithm 2 and the MSGD-based approach of [41].
To elaborate further, the MRT beamforming is given by the conjugate of the channel vector, while the MMSE beamforming is followed by a normalization to satisfy the power constraint, considering equal power allocation over the subcarriers, as given in (51). In addition, a CoMP-SRM scheme with perfect knowledge of the actual CSI h_{b,u}[k], ∀b, u, k, during data transmission is considered, in which the beamforming is designed by maximizing the sum rate over users and subcarriers subject to the per-BS power constraints; this problem can be solved efficiently by the QT [51]. Obviously, this CoMP-SRM scheme never experiences outage events due to random blockages and represents the maximum performance of distributed antenna systems [38], [39]. Moreover, a BS handover scheme with perfect knowledge of instantaneous blockages is also considered. This scheme assumes that each UE establishes a link to one BS based on the actual CSI h_{b,u}[k], ∀b, u, k, in (7). Specifically, the serving BS b_u ∈ B of the u-th UE is selected by a maximum-channel-quality criterion computed from the actual CSI. This approach also employs SRM beamforming and, for a fair comparison with the CoMP schemes, its maximum transmit power is set to a value B times higher than theirs.
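A plausible reading of the handover criterion, selecting for each UE the BS with the largest actual channel power summed over subcarriers (an assumption on our part, since the exact criterion equation is not reproduced here), can be sketched as:

```python
import numpy as np

def select_serving_bs(H):
    """H: (B, U, K, Nt) actual per-BS downlink channels h_{b,u}[k].
    Returns, for each UE, the index b_u of the BS with the largest channel
    power summed over subcarriers and antenna elements."""
    power = np.sum(np.abs(H) ** 2, axis=(2, 3))   # (B, U) total channel power
    return np.argmax(power, axis=0)               # (U,) serving-BS index per UE
```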

B. Simulation Setup
We consider a system operating in the 28 [GHz] band and employing OFDM transmission with K = 36 subcarriers, W = 240 [kHz] subcarrier spacing, and a sampling period of T_s = 1/(W K) = 0.115 [μs]. A square cell 100 [m] wide is assumed, with B = 4 BSs located at the corners and U = 2 single-antenna UEs uniformly and randomly distributed within it. Each BS is equipped with a UPA of N_t = 16 antennas arranged in N_t^h = 4 horizontal and N_t^v = 4 vertical elements. The maximum transmit power of each BS is P_max,b = 30 [dBm]. It is assumed that pulse shaping is performed using a root-raised-cosine roll-off filter with a roll-off rate of 0.8, such that the equivalent pulse response p(·) is the impulse response of a raised-cosine roll-off filter.
It is also assumed that the time delays of the clusters τ_c are uniformly distributed in the interval [0, DT_s), as suggested in [21], where D = K/4 = 9. The AWGN variance, in dBm, of each user at the k-th subcarrier is modeled as

10 log₁₀(σ_u²[k]) = 10 log₁₀(1000 κT) + 10 log₁₀(W) + NF,  (50)

where κ, T, and NF denote the Boltzmann constant, the temperature, and the noise figure, respectively. Learning rates based on the derived Lipschitz lower bound and scaling with ρ are used for all stochastic learning approaches. The blockage probability of each path p^c_{b,u} is assumed to follow an independent uniform distribution in the interval [0.2, 0.6], according to [24], [26].
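The noise model in (50) evaluates as follows (T = 290 [K] and the noise figure value are assumptions on our part, since the surrounding text does not state them):

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant kappa [J/K]

def noise_power_dbm(bandwidth_hz, nf_db, temp_k=290.0):
    """Per-subcarrier AWGN power in dBm according to (50):
    10*log10(1000*kappa*T) + 10*log10(W) + NF."""
    return (10.0 * math.log10(1000.0 * BOLTZMANN * temp_k)
            + 10.0 * math.log10(bandwidth_hz)
            + nf_db)
```

For instance, with W = 240 [kHz] and an assumed NF = 9 [dB], this gives roughly −111.2 [dBm] per subcarrier.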

C. Convergence Behavior
We start by evaluating the convergence of the proposed method in terms of the achieved outage probability. Fig. 3 shows the outage probabilities of Algorithm 1 with different initial beams and target rates over the number of iterations. Here, initial beams are given in three distinct ways: the simple MMSE beamforming in (51), a conventional SRM beamforming with estimated CSI ĥ_{b,u}[k], and the proposed beamforming based on the EM bound, summarized in Algorithm 2. From the figure, the proposed BMSGD method stably converges to a locally optimal solution regardless of the initialization, whereas the initialization by Algorithm 2 leads to a lower outage probability with fewer iterations than the MMSE or SRM beamforming, especially when a higher target rate is assumed. This confirms the fast and stable convergence behavior of the proposed BMSGD with the proposed initialization, even without heuristic hyperparameter tuning. The comparison between the initial beam and the proposed OutMin beamforming indicates the gain obtained by the proposed BMSGD. Moreover, the comparison between the conventional OutMin approach [41], dubbed "OutMin (Conventional, MSGD)" in the figure, and the proposed one reveals the gain obtained by the blockwise learning of the proposed BMSGD. The MRT beamforming, which aims to maximize the signal-to-noise ratio (SNR) while ignoring interference [28], and the MMSE beamforming, which aims to maximize the SINR [27], are found to be significantly outperformed by the proposed OutMin beamforming.

D. Outage Performance
Although the BS handover scheme is assumed to have the actual CSI with perfect knowledge of instantaneous blockages h_{b,u}[k], ∀b, u, k, this scheme is found to yield a worse outage probability than the proposed OutMin scheme, which relies only on statistical knowledge of blockage probabilities. This gap reveals the importance of exploiting spatial degrees of freedom for inter-UE interference mitigation and diversity via cooperation among BSs.
The proposed OutMin beamforming is hence superior to all the conventional methods except CoMP-SRM with perfect knowledge of blockages. Even though CoMP-SRM ideally exploits the available channels during data transmission, the proposed OutMin beamforming approaches its performance up to target rates of 16 [Mbps], whereas the gap between them grows as the target rate increases. This fact must not, however, be misunderstood as a weakness of the proposed method, because it is a consequence of the idealistic assumption of perfect knowledge of instantaneous path blockages made in the CoMP-SRM method. Instead, this comparison reveals the outstanding performance of the proposed method, which sustains, up to a target rate of 16 [Mbps] and with only statistical knowledge of path blockage probabilities, the same levels of outage probability that would be obtained with full knowledge of instantaneous blockages.
Perfect estimation of blockages is impractical, and a natural question is how robust these schemes are against imperfect blockage prediction. To this end, we compare CoMP-SRM with the proposed OutMin beamforming under the assumption of imperfect prediction of either blockages or blockage probabilities.
For fair comparisons, we introduce the following model of imperfect blockage prediction. For CoMP-SRM, the estimated blockage realization ω̂^c_{b,u} ∈ {0, 1} deviates from the true realization with the prediction error probability Δ^c_{b,u} ∈ [0, 1] at the c-th cluster between the b-th BS and the u-th UE. For the proposed OutMin beamforming, the estimated blockage probability p̂^c_{b,u} is defined as p̂^c_{b,u} = p^c_{b,u} + nΔ^c_{b,u}, where the integer n ∈ {−1, 1} takes −1 or 1 with probability 0.5. Note that we set p^c_{b,u} ∈ [0.2, 0.6] and Δ^c_{b,u} ∈ [0, 0.14] in the subsequent simulations, so that 0 ≤ p^c_{b,u} + nΔ^c_{b,u} ≤ 1 always holds.

From the figure, accurate instantaneous blockage prediction is required for the CoMP-SRM to outperform the proposed method. More specifically, when the target rates are 20 [Mbps] and 28 [Mbps],² the CoMP-SRM scheme would require blockage prediction error probabilities as low as 4% and 11%, respectively, which are hard to maintain in practice [34], [35]. Moreover, the outage probability of the CoMP-SRM scheme increases linearly and rapidly with the blockage prediction error probability, while that of the proposed OutMin scheme is almost constant. These results confirm the robustness of our proposed approach against the prediction errors that are unavoidable in practice.

² Although we should evaluate target rates defined by the use cases of beyond-5G and 6G, evaluations with such rates are remarkably difficult due to computational complexity. Meanwhile, the behavior of the outage probability is determined by the spectral efficiency rather than the specific target rate for the given bandwidth. Therefore, balancing the simulation time against a reasonable rate, we have chosen 20 [Mbps] and 28 [Mbps].

Fig. 6. Effective throughput as a function of the target rate. The CoMP-SRM and the proposed OutMin schemes have blockage prediction errors such that the difference in outage probability between them is within 0.5%.
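The imperfect-prediction model can be sketched as follows. The flip-based realization error is our reading of the prediction error probability Δ^c_{b,u} (the paper's display equation is not reproduced in this excerpt), while the perturbed probability follows the p̂ = p + nΔ rule directly:

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_blockage_realization(omega: int, delta: float) -> int:
    """CoMP-SRM side: the predicted blockage flips the true realization
    omega in {0, 1} with probability delta (assumed flip model, consistent
    with delta being a 'prediction error probability')."""
    return omega ^ int(rng.random() < delta)

def noisy_blockage_probability(p: float, delta: float) -> float:
    """OutMin side: p_hat = p + n*delta with n = -1 or +1 equiprobably,
    which stays in [0, 1] for p in [0.2, 0.6] and delta in [0, 0.14]."""
    n = rng.choice([-1, 1])
    return p + n * delta
```

For the worst-case corner p = 0.2, Δ = 0.14, the perturbed probability is 0.06 or 0.34, confirming the stated [0, 1] constraint.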
Motivated by the above results, we seek another metric to compare the CoMP-SRM and OutMin schemes. To that end, we define the effective throughput R_{u,eff} per user and compare in Fig. 6 the average effective throughput R̄_eff = (Σ_{u∈U} R_{u,eff})/U of the CoMP-SRM and OutMin schemes as a function of the target rate, with blockage prediction errors chosen such that the difference in their outage probabilities is within 0.5%. The result indicates that the actual effective throughput of the CoMP-SRM scheme decreases rapidly for target rates beyond 16 [Mbps], due to its stronger dependence on accurate knowledge of blockages, while that of the proposed OutMin scheme remains steady in the higher target-rate region.
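A simple proxy for this metric can be sketched as follows. The per-user definition below (target rate discounted by the empirical probability of meeting it) is an assumption for illustration only, since the paper's exact formula for R_{u,eff} is not reproduced in this excerpt; only the user averaging R̄_eff = (Σ_u R_{u,eff})/U is taken from the text:

```python
import numpy as np

def effective_throughput(rates: np.ndarray, target: float) -> float:
    """Assumed effective-throughput proxy (not necessarily the paper's
    definition): target rate times the empirical success probability,
    R_eff = target * Pr(R >= target)."""
    return target * float(np.mean(np.asarray(rates) >= target))

def average_effective_throughput(per_user_rates, target: float) -> float:
    """User average R_eff_bar = (1/U) * sum_u R_{u,eff}, as in the text."""
    return float(np.mean([effective_throughput(r, target) for r in per_user_rates]))

# Illustrative per-drop throughput samples [Mbps] for one user:
r_eff = effective_throughput(np.array([10.0, 20.0, 30.0]), target=20.0)
```

Under this proxy, raising the target rate increases the per-success payoff but shrinks the success probability, reproducing the trade-off seen in Fig. 6.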
In order to fully appreciate the trade-off between the SRM and OutMin approaches observed in the results so far, a more detailed analysis of the statistics of the achieved rates is required, which is the objective of the following subsection.

E. Throughput Statistics
The throughput statistics of the proposed and SotA methods are compared in Fig. 7(a) through Fig. 7(d) for target rates ranging from 8 [Mbps] to 32 [Mbps], where perfect blockage prediction is again assumed. The CDFs show that the statistical properties of the throughput of the stochastic learning approaches vary with the target rate, and that the proposed OutMin achieves the lowest CDF in the outage region among all practical methods. The difference between this CDF and those of, e.g., the MMSE beamforming and the Algorithm 2 beamforming used as an initializer for the stochastic learning demonstrates that the OutMin design indeed shapes the beamforming to minimize the outage probability. The CDF curves also confirm that the proposed OutMin coincides with the idealized CoMP-SRM scheme under actual CSI up to the target-rate point in Fig. 7(a) and (b), and maintains a relatively small gap to this lower bound in Fig. 7(c) and (d). These results confirm the efficacy of the proposed method in minimizing outage probability, despite a decrease in total throughput compared with the CoMP-SRM approach.
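The CDF comparison described above can be reproduced from per-drop throughput samples with a simple empirical CDF; the sample values below are illustrative, not the paper's data:

```python
import numpy as np

def empirical_cdf(samples, x: float) -> float:
    """Empirical CDF F(x) = fraction of throughput samples <= x; values of
    F below the target rate correspond to the outage region, so a lower
    curve there means fewer outage events."""
    return float(np.mean(np.asarray(samples) <= x))

# Illustrative per-drop throughput samples [Mbps]:
throughputs = np.array([12.0, 18.0, 25.0, 31.0])
outage_level = empirical_cdf(throughputs, 16.0)  # fraction of drops below 16 Mbps
```

Evaluating F at the target rate directly yields the outage probability plotted in the earlier figures.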

V. CONCLUSION
We proposed a robust CoMP transmission approach to minimize the outage probability of mmWave OFDM systems subjected to random path blockages. The proposed approach minimizes the outage via beamforming design using knowledge of blockage probabilities, formulating an ERM problem over the sum data rate across subcarriers. The ERM problem is then solved via a BMSGD approach, combined with an initial beamforming method obtained by applying the EM bound to the empirical risk, followed by Jensen's inequality on the rate function and a QT of the resulting objective. Unlike conventional approaches based on MSGD, the proposed approach allows for joint beamforming design, data rate allocation, and power allocation over subcarriers. Comparisons of the proposed method against various conventional methods and a CoMP-SRM scheme with perfect knowledge of instantaneous blockages corroborated the efficacy of the contributed art.
The computational complexity of the proposed beamforming design, which grows in proportion to the number of subcarriers, could perhaps be reduced by designing a dedicated optimizer. The robustness against the beam-squint effect must also be considered in practice. Furthermore, to enhance user fairness, design criteria other than sum-outage-probability minimization should be considered. These remain as our future work.