Joint Beamforming Algorithm for Multi-Stream MIMO Systems Assisted by Multiple Reconfigurable Intelligent Surfaces

In recent years there has been a growing interest in reconfigurable intelligent surfaces (RISs) as enablers for the realization of smart radio propagation environments which can provide performance improvements with low energy consumption in future wireless networks. However, to reap the potential gains of RIS it is crucial to jointly design both the transmit precoder and the phases of the RIS elements. Within this context, in this paper we study the use of multiple RIS panels in a parallel or multi-hop configuration with the aim of assisting a multi-stream multiple-input multiple-output (MIMO) communication. To solve the nonconvex joint optimization problem of the precoder and RIS elements targeted at maximizing the achievable rate, we propose a novel iterative algorithm based on the monotone accelerated proximal gradient (mAPG) method which includes an extrapolation step for improving the convergence speed and monitoring variables for ensuring sufficient descent of the algorithm. Based on the sufficient descent property we then present a detailed convergence analysis of the algorithm which includes expressions for the step size. Simulation results in different scenarios show that the use of multiple RIS panels combined with the proposed algorithm can be an effective solution for improving the achievable rates.


I. INTRODUCTION
O VER the last two decades we have been witnessing tremendous advances in wireless communication systems, which have enabled the support of a multitude of multimedia applications while also coping with the evergrowing number of mobile devices. Still, it is already clear that the full support of demanding applications such as immersive reality, holographic projection, autonomous driving, and tactile Internet, will only be feasible in future 6G and beyond networks [1]. Addressing this ever-increasing demand for higher data rates motivates further exploration of the mmWave bands that started to be integrated into 5G systems, and the move to even higher frequencies such as the Terahertz (THz) band [2]. While these bands can offer large available bandwidths they also pose several difficult challenges for implementing viable communication systems. One of the main problems lies in the very high propagation loss, which can drastically limit the signal coverage range. To address this limitation, one can resort to antenna arrays comprising a massive number of elements since the short wavelengths at these bands enable the deployment of elements with small footprints. Combined with multi-input multi-output (MIMO) beamforming techniques, these massive antenna arrays are able to provide high directional gains. However, the transmitted signals are still very susceptible to obstacles [3] which has raised the interest in recent years for the adoption of reconfigurable intelligent surfaces (RISs).
A RIS is an artificial planar structure, typically composed of a large number of low-cost, quasi-passive elements which can be independently tuned to induce a specific phase shift/amplitude on the reflected electromagnetic wave [4]. These elements can be reconfigured in real-time thus providing the capability to modify the wireless communication environment. Therefore, RISs have arisen has a potential solution for increasing coverage of communication systems as they have the capability to reflect and shape the incoming radio waves without the need for a power amplifier, RF chain, nor sophisticated signal processing [5], [6]. This allows the use of RIS for accomplishing passive beamforming which can significantly improve the spectral efficiency with low energy consumption [7], [8], [9]. Still, realizing RIS-based communication systems is extremely challenging as recent design formulations have shown [10]. The optimization problems tend to be large, the RIS matrices and precoding matrices are coupled, and they typically include non-convex constraints such as constant modulus or discrete-valued phase shifts. Therefore, finding global optimal solutions efficiently is non-trivial. A strategy to simplify the problem is to independently design the precoder and RIS phases. This approach is followed within the context of RIS-aided MIMO THz systems in [11], where an adaptive gradient method is applied for optimizing the RIS phases. Once the RIS matrix is computed, the precoder and combiner matrices are then obtained using a singular value decomposition (SVD) of the cascaded channel. While the approach achieves large gains over a RIS with random phase, the complexity tends to become very high when working with large antenna arrays and RISs with many elements. Also relying on the independent optimization of the precoder and RIS, [12] presented a low complexity proximal gradient algorithm which receives a previously computed SVD-based precoder and then optimizes the RIS matrix resorting to simple element-wise normalization. Due to its simplicity, it can cope with large problem settings while providing large gains. Still, to obtain improved performance it is necessary to jointly optimize both the precoder and RIS. However, since finding the global optimal solution is difficult, most works try to propose efficient approaches for finding local optimal solutions. For example, the work in [13] considered a joint beamforming scheme based on fractional programming optimization for RIS-assisted THz communications. Even though it corresponds to a limited scenario, it showed encouraging results. In [14] the authors targeted the minimization of the transmitted power constrained on the signal to interference plus noise ratio required at the receivers. To solve the problem, they proposed an alternating optimization (AO) algorithm that iteratively optimizes the RIS phase shifts and the transmit beamforming vector. The coverage and achievable rate performance are significantly improved when compared to conventional systems without RIS. However, it requires many iterations, especially when the number of RIS elements is large. Another iterative algorithm based on the AO approach was proposed in [15] but it only optimizes the transmit covariance matrix or one of the reflection coefficients with all the other variables being fixed. While this allows the implementation complexity to be reduced, it can still be high when the number of reflecting elements is large. In [16], the authors addressed the joint optimization of the transmit beamformer and RIS phase shifts with the objective of maximizing the achievable rate in single-antenna single-user scenarios. Algorithms based on the fixed-point iteration method and manifold optimization were proposed for solving the problem. Also aiming at maximizing the achievable rate but in a multi-stream MIMO system, the authors in [17] proposed an iterative projected gradient method (PGM) for solving the joint optimization problem of the covariance matrix of the transmitted signal and the RIS elements. The PGM algorithm was shown to achieve similar achievable rates to the AO method from [14] in single-user scenarios but requiring with fewer iterations. However its complexity can become very high when using large transmitter and receiver arrays.
While all the previous works consider single RIS scenarios, the adoption of multiple RIS can potentially enable additional degrees of freedom and ensure uninterrupted connectivity inside a target area, as discussed in [18]. Therefore, a few recent works with multiple RIS have started to appear [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31]. Regarding distributed RISs, two main approaches can be adopted. In the simpler approach, only one of the RIS panels selected according to some criteria participates in the transmission. In the second approach, all the RISs aid in establishing the communication which can potentially provide better performance at the cost of additional complexity [19]. Focusing on the second approach, in [20] the authors showed that unless the RIS is located near the base station (BS) or near the user, the adoption of multiple RISs is capable to achieve higher ergodic capacity than a single large panel RIS. Considering a single antenna BS and receiver, the outage and the rate performance of a multi-RIS aided system with limited backhaul capacity was analyzed in [21]. It was shown that, while rate performance degradation can occur due to backhaul limitation, the capacity can be improved when one of the RISs is deployed close to the line between the BS and the UE. Focusing on energy efficiency, in [22] the authors established a federated deep learning framework for the configuration optimization of multiple RISs in highdensity networks with multiple single-antenna users. The work in [23] studied the joint active and passive precoding design, where multiple parallel RISs assist a single stream transmission between a BS and a single-antenna receiver. In this case, a semidefinite relaxation-based method was proposed for solving the problem. Also concerning multiple parallel RISs, the work in [24] considered a simpler approach where the RISs matrices are first optimized followed by the computation of a hybrid precoder/combiner using the SVD.
Regarding the use of multiple RISs in a multi-hop configuration, a few recent studies have started to appear such as in [25], which emphasized the need for innovative design approaches and optimization methods in this type of systems. The authors in [26] addressed the problem of optimizing the RIS deployment and proposed a hybrid strategy that includes not only single reflection RIS-aided links but also doublereflection links. The approach can be exploited to establish additional line-of-sight (LOS) paths between the BS and users, especially when severe blocking exists in the direct and single-reflection links. Considering a single-antenna BS and user, [27] showed that given a sufficient total number of RIS elements, a LOS double reflection link can outperform a single RIS deployment, even if the former suffers a more severe pathloss due to the double signal reflection. Also addressing RIS-aided dual-hop links, the authors in [28] presented a joint active and passive beamforming design to maximize the secrecy rate in multiple-input single-output channels. To cope with the more general multi-hop scenario, the authors in [29] applied machine learning methods, in particular a deep reinforcement learning based framework, to find feasible solutions to the design problem of transmit beamforming at the BS and phase shift matrices at the RISs. This system model considered a multi-antenna BS transmitting to several single-antenna users. The work in [30], adopted a statistical approach to the multi-RIS beamforming problem in the absence of channel information. Similarly to other works, the system model is simplified as it assumes that the transmitter and receiver are equipped with one antenna each. Also focusing on multi-hop RIS-aided communications, [31] proposed the optimal active and cooperative passive beamforming for the downlink connection between a multi-antenna BS and a remote single-antenna user. For this setting, it also presented the optimal RIS selection and beam routing solution, which maximizes the cascaded LoS channel power. It was shown that there exists a fundamental trade-off between minimizing the end-to-end path loss and maximizing the cooperative passive beamforming gain.
Motivated by the work above, in this paper we study the realization of smart radio propagation environments where multiple passive RIS panels deployed in the vicinity of a BS assist a transmission towards a user. Whereas most of the previous works on active and passive beamforming designs restrict the system model to either parallel RIS or to dual-hop RIS links, and to simple configurations relying on a single stream, a single antenna user and sometimes a single-antenna BS, in this paper we consider a more general multi-stream MIMO setting where both the BS and user can be equipped with multiple antennas potentially arranged in large arrays, as envisioned in mmWave/THz communications. Targeting multiple RIS-aided links in parallel or multi-hop configurations, we address the joint design problem of the precoder and of all the RIS elements in the panels with the aim of maximizing the achievable rate. To efficiently tackle this nonconvex problem with largescale settings, we apply the monotone accelerated proximal gradient (mAPG) method [32] in order to obtain a novel iterative algorithm that is simple to implement and is guaranteed to converge to a critical point of the problem. Although gradient-based methods have already been successfully applied for optimizing the achievable rate of RIS-aided systems such as the PGM in [17], they typically exhibit slow convergence speed. To counter this problem, the adopted mAPG method includes an extrapolation step based on a linear combination of the previous and current solution estimates. It also includes the computation of additional monitoring variables for ensuring sufficient descent of the algorithm. This property facilitates the convergence analysis of the algorithm which, otherwise would be difficult to perform due to the nonconvex nature of the problem. The mAPG approach is first developed for multiple parallel RISs and then extended to multi-hop scenarios.
The main contributions of the paper can be summarized as follows.
• We formulate the joint achievable rate maximization problem defined over the precoder and the phase shifts of all RIS elements, considering a multi-stream MIMO communication system assisted by multiple parallel RIS panels. To find a solution for this problem we derive an mAPG-based iterative algorithm, presenting exact expressions for the required gradients and proximal operators. • We show that the objective function in the optimization problem has Lipschitz continuous gradients and provide a bound for the Lipschitz constants. Using the Lipschitz continuity of the gradients we then prove that the proposed joint precoding and RIS mAPG based (JPR-mAPG) algorithm is guaranteed to converge to a critical point. • We extend the JPR-mAPG algorithm for multi-hop RIS-assisted scenarios, providing expressions for the required gradients. We establish the Lipschitz continuity of the gradients also for this scheme, presenting the respective bound for the Lipschitz constants. This ensures that the convergence of the algorithm is still valid for the multi-hop case. • The achievable rate of the proposed approach is evaluated through numerical simulations and compared against the RIS-optimized APG algorithm (R-APG) algorithm that only optimizes the RIS phases shifts, the AO algorithm and the PGM algorithm which, while being a gradient-based algorithm, does not apply acceleration. Results show that, while the adoption of some type of optimization in the design of the RIS can offer clear gains over a static approach, the proposed JPR-mAPG approach is often able to provide higher rates. Furthermore, unlike the PGM and AO algorithms, JPR-mAPG does not have a computational complexity with a cubic or quadratic dependency on the transmitter and receiver array sizes, thus making it better suited for scenarios with very large antenna arrays which are common in the mmWave/THz bands. In the case of multi-hop, it was shown that the RIS panels optimized with the JPR-mAPG algorithm can overcome the multiplicative effect of the path losses and provide gains, enabling signal coverage when severe blockage between transmitter and receiver exists. The rest of the paper is organized as follows. Section II defines the system model and optimization problem for the multiple parallel RIS-assisted scheme, presenting the proposed JPR-mAPG algorithm and providing the respective convergence analysis. Section III extends the work to a multi-hop RIS-assisted scheme. Section IV presents several simulation results in different scenarios, whereas the conclusions are provided in Section V.
Notation: Matrices and vectors are represented by uppercase and lowercase boldface letters, respectively. (·) T and (·) H denote the transpose and conjugate transpose of a matrix/vector, · 2 is the 2-norm of a vector/matrix, · F is the Frobenius norm, diag( . ) represents a diagonal of a matrix if the argument is a matrix or represents a diagonal matrix if the argument is a vector, blkdiag i=1,...,N {A i } is a block diagonal matrix whose diagonal elements are matrices A i (i = 1 · · · N), ∂g(x) is the sub-differential set of g(x), I n is the n×n identity matrix and I D (v) is the indicator function which returns 0 if v ∈ D and +∞ otherwise. We also consider that for matrices we have n i=1 A i = A n ×· · ·×A 1 .

A. SYSTEM MODEL AND PROBLEM FORMULATION
First, we address the case where, to help establish the communication, the receiver is served by N PAN RIS panels simultaneously, as illustrated in Fig. 1. The transmitted signal comprises N s data streams and is represented as s = [s 1 · · ·s N s ] T , where s i ∈ C corresponds to an amplitude and phase modulated symbol with E[||s|| 2 ] = N s . The signal arriving at the receiver array, can be expressed as where √ ρ denotes the power per stream, H ∈ C N rx ×N tx is the RIS dependent channel matrix, N tx is the number of transmit antennas, N rx is the number of receive antennas, F ∈ C N tx ×N s is the BS precoder matrix and n ∈ C N rx ×1 is the noise vector, which contains independent zero-mean circularly symmetric Gaussian samples with covariance σ 2 n I N rx . We consider the existence of blockages between multiple RIS which, combined with the high path-loss at high frequencies, will tend to result in negligible powers in the signals that are reflected by the RISs two or more times. Therefore, similarly to [24], we ignore the cross-reflections and approximate the channel matrix as where H S,D ∈ C N rx ×N tx denotes the direct channel between the transmitter and receiver, H S,i ∈ C N ris ×N tx is the channel matrix between the transmitter and the i th RIS panel, and H i,D ∈ C N rx ×N ris is the channel matrix between the i th RIS panel and the receiver. Matrix i ∈ C N ris ×N ris models the phase shifting effect of the i th RIS panel. This matrix has a diagonal structure and can be written represents the phase shift of the m th RIS element and a denotes the amplitude of the reflection coefficients. Along the paper we will assume perfect channel state information (CSI) at the BS. Even though, obtaining CSI in RIS aided schemes is challenging, several methods have been proposed in recent years [10], [33].
In this paper we design the precoder and RIS matrices to maximize the achievable rate of the system which, assuming Gaussian signaling, can be written as [8] in bits/s/Hz, with P n denoting the noise power (i.e., P n = σ 2 n ). It is important to remind that the channel matrix H, and thus also the rate, depend on the RIS coefficient vectors ϕ i , i = 1 · · · N pan . The optimization problem can then be formulated as the set of all complex valued vectors whose elements have a magnitude of a.

B. ACCELERATED PROXIMAL GRADIENT BASED SOLUTION
The coupling between optimization variables F and ϕ i in (4a), as well as constraint (4c) make the problem nonconvex and difficult to solve efficiently. Furthermore, when considering a typical RIS assisted massive or ultra-massive MIMO scenario, formulation (4) will typically correspond to a large-scale optimization problem. This motivates the resort to first-order methods with low iteration cost for solving the problem. In this case, we propose an algorithm based on the application of the monotone Accelerated Proximal Gradient (mAPG) method for nonconvex optimization problems described in [32]. First, we use the indicator function to encode constraints (4b) and (4c) and rewrite problem (4) as To find a solution for (5) we apply a gradient step consisting of , followed by a proximal mapping, whose operator for a function g is defined as To improve the typical slow convergence rate of gradientbased methods, we can apply an extrapolation step as proposed in [34], which involves the computation of additional variables (P (q) , y N pan ). However, this extrapolation may result in a bad update which, while not causing the objective function to increase, makes the convergence difficult to analyze, especially due to the nonconvex nature of the problem. Therefore, we add an extra set of variables N pan ) whose role is to monitor and correct the update of (F (q) , ϕ N pan ). Algorithm 1 summarizes all the steps of the proposed approach. Variables α, Q and t k denote the step size, maximum number of iterations and extrapolation parameter, respectively. In step 5 and 6, the proximal operator of the indicator functions corresponds to the Euclidean projection over the respective sets and can be computed as and where ∅ denotes the Hadamard division and |·| represents the absolute value applied elementwise. Note that these two projections enforce constrains (4b) and (4c).
Regarding the later projection, we note that in the special case of prox αI U 1 (0), any point of the form ae jθ with θ ∈ [0, 2π) is a valid solution. The complex-valued gradient of f (F, ϕ 1 , . . . , ϕ N pan ) required in the proposed algorithm can be computed according to the following Lemma.
Lemma 1: Let f (F, ϕ 1 , . . . , ϕ N pan ) be defined as in (4a) with the channel matrix represented as (2). Then, the corresponding complex-valued gradient with respect to F * and ϕ * i (i = 1, . . . , N pan ) can be computed as Proof: Proof is shown in Appendix A.

C. COMPUTATIONAL COMPLEXITY
Regarding the complexity of the proposed JPR-mAPG algorithm, it is associated mainly to steps 5, 6 and 7. These steps require the computation of the channel matrix using (2) which has a complexity cost of O(N tx N ris N rx N pan ).
Steps 5 and 6 are dominated by the computation of the gradients ∇ F * f and ∇ ϕ * i f (i = 1 · · · N pan ), which entail several matrix multiplications and a small matrix inversion, requiring O(N s N tx N rx ) and O(N tx N ris N rx N pan ) operations, respectively. As for the projections (7) and (8), they are relatively cheap having a complexity of O(N s N tx ) and O(N ris ), respectively. Step 7 computes the objective function f (F, ϕ 1 , . . . , ϕ N pan ) which also comprises several matrix multiplications and the determinant of a small matrix. In this case the complexity is O(N tx N ris N rx N pan + N s N tx N rx ).
Steps 5-7 are repeated for Q iterations which, keeping only the dominant terms, results in an overall computational complexity order of O(Q(N tx N ris N rx N pan +N s N tx N rx )). As a comparison, the complexity of the R-APG algorithm [12] (directly extended to the parallel multi-RIS case) . While the complexity of both JPR-mAPG and R-APG grows with the product of the transmitter, receiver and RIS array sizes and has a linear dependency on the number of RIS panels, the R-APG algorithm has a stronger dependency on the receiver array size (grows with O(QN tx N rx 2 ) and O(QN rx 3 )). Bearing in mind that they were designed for a single panel RIS (i.e., N pan = 1), the AO method [15] has a complexity of O( ) (L is the number of initial random phase shifts realizations and Q AM is the number of outer iterations) and the PGM approach [17] has a complexity of The PGM algorithm has a dependency on the product of the transmitter, receiver and RIS array sizes, as happens with JPR-mAPG. However, its complexity also has a cubic dependency on the transmitter and receiver array sizes (grows with O(Q PGM N tx 3 ) and O(Q PGM N rx 3 )) which may hinder the application to very large arrays. The AO algorithm also suffers from the same problem as it also grows with the cubic size of the transmitter and receiver arrays.

D. CONVERGENCE ANALYSIS
In this subsection we show that with the proper selection of the step size, the algorithm can be guaranteed to converge to a critical point. To help prove the convergence of the algorithm we first present the following theorem.
Proof: Proof is shown in Appendix B.
After establishing the Lipschitz continuity of the gradients, the convergence of the algorithm can be stated in the following theorem.
Theorem 2: Let the step size in Algorithm 1 be selected N pan )}, then it must be a critical point of (4).
Proof: Proof is shown in Appendix C.

III. MULTI-HOP RIS-ASSISTED COMMUNICATION A. SYSTEM MODEL
The second case that we address in this paper corresponds to a multi-hop scheme where, to help circumvent existing obstructions, the signal transmitted by the BS arrives at the receiver after being sequentially reflected by N PAN aiding RIS panels, as illustrated in Fig. 2. For this type of scenario, we still use the received signal model (1), but with the channel matrix defined as In this expression H 1 ∈ C N ris ×N tx is the channel matrix between the transmitter and the first RIS panel, H i ∈ C N ris ×N ris (i=2, . . . , N pan ) is the channel matrix between RIS panels i and i+1 and H N pan +1 ∈ C N rx ×N ris is the channel matrix between the N th pan RIS panel and the receiver. It is important to note that, with the exception of the first and last RIS, in this model we are neglecting the direct links between the BS and a RIS and also between a RIS and the user, as these are assumed to be all NLOS and suffer a high propagations loss (considering the envisioned mmWave/THz frequencies). The same happens to the multiple reflections between non-consecutive RIS panels. These assumptions are often used in multi-hop RIS aided works [25], [29]. If a more accurate channel model that includes for example the direct links between the middle RISs and the BS/user is required, the proposed algorithm can still be directly applied (only the gradient expression with respect to ϕ * i will need to be re-computed).

B. PRECODER AND RIS OPTIMIZATION
Since the received signal model in the multi-hop case is essentially the same as for the multiple parallel RIS scheme, we can also formulate the precoder and RIS design problem as (4a)-(4c) and solve it using Algorithm 1. However, due to the different channel matrix representation, the complexvalued gradient of f (F, ϕ 1 , . . . , ϕ N pan ) will be different, as stated in the following lemma.
Lemma 2: Let f (F, ϕ 1 , . . . , ϕ N pan ) be defined as in (4a) with the channel matrix represented as (18). Then, the corresponding complex-valued gradient with respect to F * can be computed as (9) whereas the gradient with respect to ϕ * i (i = 1, . . . , N pan ) can be computed as with and Proof: The derivation is straightforward by directly extending the approach presented in Appendix A for the multiple parallel RIS-assisted channel model.
Convergence of the JPR-mAPG algorithm can also be established for the multi-hop case through Theorem 2, as long the gradients are still Lipschitz continuous. The following theorem states that this condition is still satisfied in this case.
Theorem 3 : Function f (F, ϕ 1 , . . . , ϕ N pan ) defined as in (4a), with the channel matrix represented as (18), has Lipschitz continuous gradients over feasible set (4b)-(4c), thus satisfying (11) with constant L given by Proof: Proof is similar to the one in Appendix B for Theorem 1. Therefore, the details are omitted here.

C. COMPUTATIONAL COMPLEXITY
In the case of the multi-hop RIS-assisted communication, the main differences lie on the computation of the channel matrix (18) which has complexity of O(N tx N ris 2 (N pan − 1) + N tx N ris N rx ), on the gradients ∇ ϕ * i f (i = 1 · · · N pan ) which require O(N ris 2 N rx (N pan − 1) + N s N tx N rx + N tx N ris N rx N pan ) operations and also on the computation of the objective function f (F, ϕ 1 , . . . , ϕ N N tx N rx )). Similarly to the parallel RIS configuration, the complexity grows with the product of the transmitter, receiver and RIS array sizes. However, in this case it also reveals a quadratic dependency on the panel size as a result of the cascaded channels. This dependency also exists in the R-APG algorithm [12] (extended to the multi-hop RISassisted configuration), since its computational complexity is O(N rx 3 + Q((N tx N ris 2 + N ris 2 N rx )(N pan − 1) + N rx 3 + N tx N rx 2 + N tx N ris N rx N pan )).

IV. NUMERICAL RESULTS
In this section we evaluate the achievable rate of the proposed approach with the aid of Monte Carlo simulations. Two main scenarios are considered according to the two versions of the JPR-mAPG algorithm described previously namely, one based on multiple parallel RIS and one based on multi-hop.

A. CHANNEL MODEL
Even though the proposed approach is independent of a specific channel model, in this evaluation we adopt a clustered geometric channel [35], that is commonly assumed in mmWave and sub-THz literature [36], [37]. In the case of the channels between the transmitter, RIS and receiver, we consider that the RIS panels were properly located so that they consist of a LOS and a non-line-of-sight ( where K Rice defines the energy ratio between the LOS and NLOS components, α S,i l,n,m is the complex gain of the l th path between element m and n of the BS and i th RIS panel (normalized as  (27) and (28). When the distances between transmitter, RIS and receiver are larger than the Fraunhofer distance, we adopt a planar wave model. In this case, the respective NLOS channel matrix is approximated as ) (φ denotes the azimuth and θ the elevation). β S,i NLOS represents the path loss, which we express as where d i S↔RIS denotes the distance between the center of the transmitter array and the center of the i th RIS panel and γ is the path loss exponent. A similar expression is used for the LOS matrix (in this case with only one path). For the direct channel between the BS and the user, we consider that due to the presence of many obstructing obstacles it will typically consist solely of NLOS components and, therefore, can be expressed similarly to (28) or (29).
In all simulations uniform planar arrays (UPAs) with √ N tx × √ N tx , √ N rx × √ N rx and √ N ris × √ N ris elements are adopted at the transmitter, receiver and RIS, respectively. In this case, the steering vectors for the transmitter array are given by where p, q are the antenna indices, and d S is the interelement spacing. A similar expression is adopted for the RIS and receiver arrays. In the simulations, the inter-element spacing of the transmitter and receiver arrays, as well as the distance between the centers of adjacent elements of the RIS are always set as λ/2, whereas the sizes of the RIS elements are λ/2 × λ/2.

B. MULTIPLE PARALLEL RIS SCENARIO
In the first scenario we consider a large indoor environment (office/shopping mall) where one or more RISs are deployed in the vicinity of the BS. The transmission is configured with N s = 3, N tx = 64 and N rx = 16. The operating frequency is set to 28 GHz with an occupied bandwidth of 800 MHz. We assume that the amplitude of the reflection coefficients is a =1. Based on [40] we apply a path loss exponent of 1.90 for the LOS propagation and 4.39 for the NLOS propagation. The number of propagation paths is set to N ray = 10, with the azimuth and elevation AoD/AoA uniformly random distributed in (−π, π) and (−π/2, π/2), respectively [41]. For the indirect link through the RIS we apply K Rice = 10, whereas the direct link has no LOS component (K Rice = 0). All simulated results are averaged over 500 random channel realizations. To simplify the setups, we work in a two-dimensional coordinate system (the third coordinate is assumed to be the same for the BS, RIS panels and receiver) with the BS located at (0,0) m and the receiver at (50,0) m. In Fig. 3 we evaluate the convergence behavior of the JPR-mAPG algorithm. The RIS panels have N ris = 256 elements each and are placed at positions (10,10), (10,-10),  [42]. It can be observed that even if the initialization corresponds to a higher initial achievable rate, it does not seem to have a relevant impact on the quality of the final solution provided by the algorithm. The curves referred in the legend as "no accel." represent the case where the JPR-mAPG algorithm is operating with P (q) = F (q) and y which means that no extrapolated and monitoring variables are being used and the algorithm reduces to a conventional proximal gradient approach (only steps 6 and 9 of Algorithm 1 are applied). It can be observed that while the normal JPR-mAPG algorithm requires less than 25 iterations to converge, the version without acceleration can take from 50 to over 100 iterations, depending on the number of panels. Fig. 4 shows the achievable rate as a function of the transmitted power with N ris = 256 and only one panel. The evaluation is performed with the RIS in two different positions: (10,10) m and (25,10) m. As benchmarks, we include curves for a static RIS (acting as a simple mirror), the R-APG from [12], the AO method from [15] and the PGM approach from [17]. To limit the computational complexity, the maximum number of iterations in all these benchmark algorithms is set to 300. As a reference we also include the case of no RIS. It can be seen that the adoption of a RIS can substantially improve the achievable rate over a transmission without RIS, with the proposed JPR-mAPG algorithm providing a 5-fold improvement at P tx = 30 dBm. We can also observe that this gain is higher when the RIS is closer to the BS (or closer to the user, even though these curves were not included). Comparing the different RIS approaches, we can  see that both JPR-mAPG and AO achieve similar results, with PGM also achieving close results at high transmission powers. R-APG displays a somewhat inferior performance as it only tries to optimize the RIS matrix after the precoder is computed.
Keeping basically the same setup but with P tx = 30 dBm and a single RIS positioned at (10,10) m, Fig. 5 evaluates the impact of the number of RIS elements using different algorithms. We can observe that the achievable rate clearly improves with the number of RIS elements, with the curves showing a higher slope for low N ris values, which gradually decreases as the number of elements grows. Similarly to the previous figure, all the evaluated RIS optimization algorithms provide substantial gains over a static RIS, with the proposed JPR-mAPG approach achieving the best results, especially with larger RISs.
Considering the same scenario, Fig. 6 shows the achievable rate versus transmitter-receiver distance, depicting the effect of using multiple RIS panels combined with the proposed algorithm. As expected, the adoption of multiple RIS panels can improve the rates even further. However, similarly to what was observed in the single panel case, the location of the various panels can have a significant impact on the results. Considering the case N pan = 6, it can be seen that spreading the panels along a straight path starting from the transmitter instead of locating them all in the vicinity of the BS can result in a reduced number of panels being effective in aiding the communication depending on the position of the receiver. It is important to note that while the optimal deployment of multiple RIS is outside the scope of this paper, this behavior is in line with previous studies that stated that placing RISs near the BS and user can not only minimize the product-distance path-loss but also increase the path diversity and increase the resilience of the network against signal blockage [25], [26]. Still, finding optimal RIS deployments as well as RIS-user association and multi-hop RIS-aided routing are still active topics of research [25], [30].
While the proposed JPR-mAPG algorithm assumes continuous phase shifts, in practical implementations of RISassisted systems only a finite number of phase shifts can be supported in the RIS elements [43]. In this case we adopt the simplest approach that is commonly followed with continuous based algorithms which relies on directly quantizing the obtained final solutions to the nearest discrete values [17]. Using this approach, Fig. 7 evaluates the impact of using discrete phase shifts at the RIS, considering the same setup of the previous figures. Curves for different numbers of multiple RIS panels are presented, where the locations are assumed to be the same as those used in Fig. 6 (the case N pan = 6 corresponds to the deployment where the panels are all in the vicinity of the BS). As expected, the quantization of the phase shifts results in some performance loss which is worse when using one bit quantization. In this case the degradation ranges between 16%-22%, depending on the number of panels. Still, increasing the number of quantization bits strongly reduces the degradation, and with 3 bits the loss to the ideal unquantized curves is under 1% for all the different sets of panels.

C. MULTI-HOP RIS-ASSISTED SCENARIO
In the second scenario we consider the transmission between an access point (AP) and user which is supported with the aid of two RIS panels, establishing a two-hop communication scheme. The layout with the positions of AP, user and RIS panels is presented in Fig. 8. In this case the system is operating at the sub-THz band with the frequency set to 142 GHz and the occupied bandwidth kept at 800 MHz. The number of propagation paths is set to N ray = 3 and K Rice = 10. For the AP→RIS1, RIS1→RIS2 and RIS2→user links we consider LOS propagation with a path loss exponent of 2.05 [40]. For the direct link AP→user (as well as for the RIS1→user link considered for the N pan = 1 results) we assume NLOS propagation with a path loss exponent of 4.6 [40]. The other parameters are configured as N s = 1, N tx = 256, N rx = 36.
In Fig. 9 we study the convergence behavior of the JPR-mAPG algorithm in this multi-hop scenario (N pan = 2). Both panels are set with N ris = 2500 elements. As previously observed in the parallel RIS scenario, the normal algorithm with the extrapolation steps converges faster than the "no accel." version which relies on a direct projected gradient step. The former requires around 50 iterations whereas the latter needs almost 150 iterations to converge. It can also be observed that in this multi-hop configuration, there seems to be a small impact on the convergence speed when initializing both RIS panels as static reflectors instead of adopting random phases.
Keeping the same sizes of the RIS panels as in the previous figure, Fig. 10 shows the achievable rate versus transmitted power of the proposed JPR-mAPG approach as well as of the other benchmark algorithms, considering both two-hop and one-hop links. The one-hop case corresponds to having RIS1 in operation and RIS2 removed. Curves with N pan = 2 are not included for PGM and AM since these algorithms were designed for one-hop links. With only one RIS panel, the gain provided by any of the algorithms over the case without a RIS is very small. This small gain is a consequence of the RIS1→user link being NLOS with a high path loss exponent. So, even if the AP→RIS1 is LOS and has a low path loss exponent, the path loss of the indirect link depends on the product of these partial path losses and therefore, it would be necessary to adopt a very large RIS to obtain a strong indirect link which could result in substantial gains. In this case, the two-hop link is more effective since by using two similarly sized panels that enable establishing an indirect link where all segments have LOS propagation with a low path loss exponent, the optimized RISs are able to provide substantial gains even under the multiplicative path loss effect. In fact, for a transmitted power of P tx = 30 dBm, the two-hop JPR-mAPG algorithm is able to increase the achievable rate from 7.4 bps/Hz to 13.4 bps/Hz. Similarly to what was observed in the multiple parallel RIS scenario, adopting a joint design of the precoder and RIS (JPR-mAPG), achieves better performances than applying an algorithm that only optimizes the RIS panels (R-APG).
In order to get a better insight on the impact of the RIS panels, especially at these high frequency bands, Fig. 11 presents the achievable rate for the same settings of Fig. 10, but considering the existence of the indirect link only. Curves with different numbers of RIS elements per panel, N ris , are shown. Besides the results with the JPR-mAPG algorithm, we include results with static RISs (simple reflectors) and also a curve representing a reference direct transmission with LOS propagation, a path loss exponent of 2.05 and a distance equal to the overall length of the dual-hop RIS aided link. We can see that working in a higher frequency band and with a multi-hop scheme requires the adoption of properly optimized RIS panels with a higher number of elements (even though not necessarily larger due to the smaller wavelength) in order to achieve acceptable rates. In fact, adopting simple reflectors requires large panels in order to obtain some noticeable improvements on the achievable rate. In contrast, if the RISs are optimized using the JPR-mAPG algorithm, they can overcome the multiplicative effect of the path losses of the segments comprising the indirect link and large improvements can be observed in the achievable rate. This capability to overcome the higher path loss caused by multiple signal reflections and provide substantial beamforming gains becomes more evident when the number of RIS elements is large, which is in line with the conclusions in [31]. In fact, with N ris = 10000, the behavior of the indirect link is becoming close to the reference LOS direct link whose path loss depends solely on the sum of the lengths of the different segments. This type of behavior was previously discussed for single RIS in [44].

V. CONCLUSION
In this paper we addressed the joint design of the transmit precoder and RIS phases within the context of multi-stream MIMO communications assisted by multiple RIS panels. Aiming at solving the nonconvex problem of maximizing the achievable rate, we derived an algorithm based on the mAPG method that can be tailored for the case of parallel RISs as well as for multi-hop RIS-assisted scenarios. The mAPG approach relies on an extrapolation step for improving the convergence speed and a set of monitoring variables that ensure the sufficient descent of the algorithm. Detailed convergence analysis was presented which included expressions for the step size that guarantee convergence to a critical point. Simulation results showed that the proposed JPR-mAPG algorithm can often achieve higher rates than other benchmarked approaches. Furthermore, it was observed that the use of properly located multiple RIS panels operating in parallel can have a substantial impact on the achieved rates, whereas the adoption of several RIS panels in a multihop configuration can be an effective approach for providing coverage in the case of severe blockage effects.

APPENDIX A PROOF OF LEMMA 1
The following derivations for the complex-valued gradient of f (F, ϕ 1 , . . . , ϕ N pan )with respect to F * and ϕ * i (i = 1, . . . , N pan ) are based on the procedure described in [45]. First, we derive the complex differential of f (F, ϕ 1 where we used the relation [45, Table 3.1] d(lndet(X)) = Tr X −1 dX .

APPENDIX B PROOF OF THEOREM 1
The following inequalities will be useful for the proof. For matrices A and B, we have AB ≤ A B (38) for the Frobenius and the 2-norm. Also and Before proving Theorem 1, we first introduce the following Lemma which will be applied in several steps of the proof. 2, and k=1,..,N (N ≥ 2) be a set of matrices. Then we have the following inequality Proof: We prove by induction on N. For N=2 we can write: (2) k 2 (42) thus showing that the inequality is valid. For N+1 we can write N+1 k=1 Assume that inequality (41) Noting that It can be easily shown that K(F, ϕ) 2 ≤ 1 and i 2 = a. Using Lemma 2 we can write Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Applying these inequalities and Lemma 2 to (52) results Similarly, we can also use Lemma 2 and write the following inequality If we square both sides of the expressions in (55) and (56) we can rewrite the inequalities as where we applied the relation Combining both expressions and applying the square root to both sides we finally obtain (59) This completes the proof.