Relay Placement for Maximum Flow Rate via Learning and Optimization Over Riemannian Manifolds

Network topologies can be represented over Riemannian manifolds (i.e., curved surfaces), given the symmetric positive definite (SPD) property of their spectral graphs. Moreover, maximizing the flow rate of a baseline network topology through relay placement can be equivalent to finding the relay location that maximizes the geodesic distance (i.e., Riemannian metric) between the representations of a relay-assisted network topology and the baseline one over Riemannian manifolds. Therefore, in this paper, we propose two complementary approaches to find relay locations that maximize Riemannian metrics, such as the Log-Euclidean metric (LEM), and hence maximize the network flow rate. First, we propose a Riemannian multi-armed bandit (RMAB) reinforcement learning model to track the relay positions that increase the LEM towards the baseline network. In particular, selecting a possible relay location is considered as an action, whereas the LEM represents the reward of the RMAB model. Second, we propose a Riemannian Particle Swarm Optimization (RPSO) algorithm that iteratively searches for the representation of the relay-assisted network topology with maximum LEM towards that of the baseline network over the Riemannian manifold. Simulation results show that both the RMAB and RPSO approaches converge to near-optimum solutions, which, in the case of single relay placement, achieve 94.3% and 90.6%, respectively, of the maximum possible network flow rate.


I. INTRODUCTION
Conventional mobile networks, with fixed base stations, are not prepared to address the spatio-temporal variability in wireless traffic demands [1]. Dynamic networks, which include mobile relays [2] such as road side units (RSUs) mounted on trucks or unmanned aerial vehicles (UAVs), can adapt to such spatio-temporal demand variability. More specifically, mobile relays can temporarily move closer to areas of high traffic demand. Finding the best locations for mobile relays, which maximize the network flow rate within such networks, is the overarching objective of this paper.
Multiple criteria for relay positioning have been previously considered in the literature [3], [4]. For example, maximizing the algebraic connectivity (AC) of relay-assisted networks was considered for 3-dimensional (3-D) UAV positioning in [5] and [6], and for 2-D relay positioning in [7]. While maximizing the AC naturally enhances the network flow rate, it does not guarantee the maximum achievable flow rate, as was shown in [8]. Maximizing the network flow rate was directly targeted in [9], but only for the association between small cells and UAVs, using an integer programming approach with no consideration of UAV or relay positioning.
Owing to its high-quality solutions, computational efficiency, fast convergence, and ease of implementation, the particle swarm optimization (PSO) algorithm [10] has also been considered for node placement in wireless networks. For example, the authors in [11] and [12] applied PSO for optimal deployment of nodes in wireless sensor networks using maximum coverage as the optimization criterion. A similar PSO approach is studied in [13], where energy consumption is used as the cost and the algorithm searches for near-optimal base station locations in heterogeneous sensor networks. The study in [14] tackles a node localization problem using PSO, with the localization error as the cost function to be minimized.
VOLUME 1, 2023 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Along with analytical approaches, machine learning (ML) models have also been considered for optimal relay positioning. For example, a reinforcement learning (RL) model was developed in [15], which utilizes channel map estimation in time and space to place a relay so that the signal-to-interference-plus-noise ratio (SINR) at the destination is improved. A multi-agent Q-learning-based RL scheme is studied in [16], where the optimal positions of the UAVs are determined within a trajectory design problem to increase the sum transmit rate. RL-assisted relay placement for improving the localization accuracy of multiple objects was studied in [17]. The optimal positioning of UAVs is utilized to improve the coverage in [18]. The authors in [19] proposed a meta-reinforcement learning-based relay positioning scheme with adaptive hyperparameters, whose target is to serve the maximum number of user equipments (UEs) within a considered network. Also, coordinated relay positioning based on RL is proposed in [20], where multiple drone base stations aim to find the optimal trajectories that enhance the coverage of a time-varying network. In addition to simple ML models, multi-layered deep RL solutions were also considered to place a relay so as to enhance the link capacity [21].
While the aforementioned learning models ([15], [16], [17], [18], [19], [20], [21]) and analytical approaches ([11], [12], [13], [14]) have addressed relay placement from various angles, none of them considered optimization metrics or learning cost functions that are directly related to the network flow rate; this is the research gap addressed in this paper. In particular, we address this gap by transforming the relay placement problem to a different domain, namely, Riemannian manifolds, over which a cost function that is related to the network flow rate can be defined and optimized.
Riemannian manifolds refer to non-Euclidean (i.e., curved) surfaces such as cones and spheres. Network topologies can be represented over conic (i.e., Riemannian) manifolds through their graph Laplacian matrices [8]. More specifically, a positively shifted Laplacian matrix of a particular network topology is symmetric positive definite (SPD) and can therefore be represented as a point over Riemannian manifolds. Using SPD matrices to represent the network topology offers an abstract representation of the network, which facilitates analysis and learning for various applications. This is a significant advantage of incorporating Riemannian manifolds in addressing the relay placement problem in this work. Furthermore, larger geodesic distances, such as the Log-Euclidean metric (LEM) [22], among such points over the Riemannian manifolds reflect distinguishable functional connectivity paths [23], which enable parallel data flows and can be mapped to the network flow rate [8]. Consequently, and as we have shown in [8], relay positions that maximize the LEM geodesic distance achieve a higher network flow rate than those selected by other metrics (e.g., algebraic connectivity).
While this concept was established in [8], no practical solution was proposed beyond an exhaustive search over all possible relay locations. Therefore, and unlike [8], there is a need for efficient relay placement solutions based on the mapping over Riemannian manifolds, and this is the specific goal of this paper.
In this paper, we propose two novel solutions for relay selection towards achieving the maximum network flow rate. The first one considers an RL approach, while the second employs a PSO-based one. Together, the two solutions present a comprehensive approach to addressing the relay placement problem through mapping over Riemannian manifolds. Both solutions represent any potential relay-assisted network topology as a concatenation of two graphs: a) the baseline graph (i.e., without relays) with edges among its network nodes, and b) additional edges among the network nodes due to the connectivity enhancements provided by that particular relay. Like the baseline topology, such a relay-assisted network topology can be modeled as a point over Riemannian manifolds, thanks to its underlying SPD structure. Consequently, the proposed solutions aim to maximize the LEM geodesic distance between the points representing the relay-assisted networks and the baseline one over the Riemannian manifold.
Utilizing such problem modeling, we first employ a multi-armed bandit (MAB) model over Riemannian manifolds, which we refer to as Riemannian MAB (RMAB). MAB is a reinforcement learning model that is known for its good exploration-exploitation trade-off [24]. While MAB has been previously employed for multiple wireless communication challenges, such as beamforming vector tracking in millimeter wave (mmWave) mobile systems [25], [26], dynamic channel selection [27], and resource allocation [28], it has not been applied over Riemannian manifolds to address wireless challenges (e.g., relay placement), as is proposed in this paper. The proposed RMAB model learns to select the best relay location from a set of possible discrete relay locations by finding the maximum LEM from the baseline network representation over the Riemannian manifold. We show that the proposed RMAB model achieves 94.3% of the maximum network flow rate after a few learning episodes and with less complexity compared to our proof-of-concept work in [8].
Second, we propose an analytical optimization approach, namely Riemannian Particle Swarm Optimization (RPSO), that searches for the best relay positions through continuous optimization over the Riemannian manifold. While PSO is a popular bio-inspired algorithm that has been studied in several applications, including positioning nodes to maximize the coverage [11], [29] and localization using minimum localization error as the optimization metric [14], [30], it has not been studied in applications involving Riemannian manifolds for relay selection towards maximum flow rate. In this work, to find the relay with the maximum LEM distance towards the baseline network, the proposed RPSO algorithm minimizes the negative of the LEM distance, which is used as the cost. We show that our RPSO algorithm converges once a stopping criterion is satisfied and achieves 90.6% of the maximum network flow rate.
We summarize the contributions of this paper as follows.
• Representing network topologies as points over Riemannian manifolds, using their SPD structures, which paves the way towards finding relay locations with maximum network flow rate.
• Employing MAB to efficiently learn the best relays that maximize LEM distance among a discrete set of relay-based representations over the Riemannian manifold.
• Employing PSO for efficiently maximizing the LEM distance through continuous optimization over the Riemannian manifold.

The abbreviations used in this paper are listed in Table 1. The rest of the paper is organized as follows. Section II introduces the system model and preliminaries about Riemannian geometry. The main relay positioning problem for maximum flow rate and its mapping towards the LEM metric are given in Section III. Sections IV and V explain our LEM-based RMAB and RPSO approaches for finding the best relay towards maximizing the LEM distance and network flow rate. Section VI presents the performance evaluation of the proposed solutions in comparison with appropriate benchmarks. Finally, we conclude the paper in Section VII.

II. SYSTEM MODEL AND PRELIMINARIES
We introduce the reader to some preliminaries on Riemannian manifolds before describing our system model for optimal relay positioning.

A. RIEMANNIAN MANIFOLDS
A topological manifold $\mathcal{M}$ represents a non-Euclidean (i.e., curved) surface and is defined as a space that is locally similar to a Euclidean one. As shown in Fig. 1, the tangent space $T_x\mathcal{M}$ at any point $x \in \mathcal{M}$ is the set of all tangent vectors $\nu$, which are derivatives of curves passing through the point $x$ [31]. Riemannian manifolds are differentiable manifolds equipped with an inner product on each tangent space $T_x\mathcal{M}$; this inner product induces mathematically tractable geodesic distance metrics (e.g., LEM [22]), which compute the length of the shortest curve between any two points over the Riemannian manifold [32]. SPD matrices form points over conic Riemannian manifolds [33], which enables the use of Riemannian-geometric tools to solve problems over the Riemannian surface. Riemannian manifolds were previously studied for several applications, including beamforming codebook design via learning channel covariance matrices [34] and link scheduling for device-to-device pairs [35], [36], [37].

B. SYSTEM MODEL
Fig. 2 shows an example of a network topology where relays are placed to increase the network flow rate. Such a network topology can be represented as a graph $G(\mathcal{N}, \mathcal{E})$, where $\mathcal{N}=\{n_1, n_2, \dots, n_K\}$ represents the set of $K$ available network nodes (i.e., without relays) and $\mathcal{E}=\{e_1, e_2, \dots, e_L\}$ represents the set of all $L$ edges among such nodes. It is assumed that an edge exists between any two network nodes if their inter-distance is less than a threshold $R$. The $K \times 1$ edge vector $\mathbf{a}_l$ of an edge $l$ (where $l = 1, 2, \dots, L$) connecting the nodes $\{n_i, n_j\}$ has $a_{n_i} = 1$, $a_{n_j} = -1$, and zeros elsewhere. Such edge vectors are grouped in a $K \times L$ incidence matrix, denoted as $\mathbf{A}$.
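The incidence matrix $\mathbf{A}$ is used to compute its corresponding Laplacian matrix $\mathbf{L} = \mathbf{A}\mathbf{A}^T$, which is positive semi-definite, where $(\cdot)^T$ denotes the matrix transpose. An SPD regularized Laplacian matrix can be created as

$$\mathbf{S} = \mathbf{L} + \lambda \mathbf{I}, \tag{1}$$

where $\mathbf{I}$ is a $K \times K$ identity matrix and $\lambda$ is a small positive scalar ($\lambda = 0.5$ in this paper). Since the eigenvalues of $\mathbf{L}$ are nonnegative, those of $\mathbf{S}$ are at least $\lambda > 0$. Such regularized Laplacian matrices (or simply SPD matrices) can therefore be modeled as points over Riemannian manifolds, thanks to their SPD structures.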
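A minimal NumPy sketch of the construction above, assuming node positions are given as a $K \times 2$ array and using the threshold rule that nodes closer than $R$ share an edge. The function names `incidence_matrix` and `regularized_laplacian` are illustrative and are not taken from the authors' code.

```python
import numpy as np

def incidence_matrix(positions: np.ndarray, R: float) -> np.ndarray:
    """Return the K x L incidence matrix A of the threshold graph.

    An edge (n_i, n_j) exists when the nodes are closer than R; its column
    has +1 at row i, -1 at row j, and zeros elsewhere.
    """
    K = positions.shape[0]
    columns = []
    for i in range(K):
        for j in range(i + 1, K):
            if np.linalg.norm(positions[i] - positions[j]) < R:
                a = np.zeros(K)
                a[i], a[j] = 1.0, -1.0
                columns.append(a)
    if not columns:                      # no edges: L = 0, so S = lam * I
        return np.zeros((K, 0))
    return np.column_stack(columns)

def regularized_laplacian(A: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """S = A A^T + lam * I, which is SPD for any lam > 0, as in (1)."""
    K = A.shape[0]
    return A @ A.T + lam * np.eye(K)

# Three nodes on a line: only nodes 0 and 1 are within R = 2.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
S = regularized_laplacian(incidence_matrix(pos, R=2.0))
```

Here the isolated node 2 still contributes the diagonal entry $\lambda = 0.5$, which is exactly what keeps $\mathbf{S}$ invertible.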
Let $\mathbf{S}_1$ and $\mathbf{S}_2$ denote the SPD matrices of the baseline and a given relay-assisted network topology, respectively. The LEM between these two SPD matrices can be calculated as

$$D(\mathbf{S}_1, \mathbf{S}_2) = \left\| \log(\mathbf{S}_1) - \log(\mathbf{S}_2) \right\|_F, \tag{2}$$

where $\log(\cdot)$ is the matrix logarithm and $\|\cdot\|_F$ is the matrix Frobenius norm. The LEM is a geodesic distance, which accounts for the curvature of the Riemannian manifold. In particular, the Riemannian manifold of SPD matrices is generally of a conic shape [33], as shown in Fig. 3, where the baseline network (i.e., without any relay), the relay-1-enhanced network, and the relay-2-enhanced network are each represented as a different point (A, B, and C, respectively) on the cone. LEM is used to measure the non-Euclidean distance between these points over the conic manifold. We seek the point of the relay-assisted topology that falls furthest from the baseline topology, i.e., the relay-assisted SPD point that deviates most from the baseline SPD point over the curved Riemannian surface and hence yields the maximum LEM.
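The LEM in (2) can be evaluated with an eigendecomposition, since the logarithm of an SPD matrix $\mathbf{S} = \mathbf{V}\,\mathrm{diag}(\mathbf{w})\,\mathbf{V}^T$ is $\mathbf{V}\,\mathrm{diag}(\log \mathbf{w})\,\mathbf{V}^T$. The sketch below uses hypothetical helper names (`spd_log`, `lem_distance`) and assumes the inputs are already SPD, e.g., regularized Laplacians as in (1).

```python
import numpy as np

def spd_log(S: np.ndarray) -> np.ndarray:
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(S)             # S = V diag(w) V^T with w > 0
    return (V * np.log(w)) @ V.T         # V diag(log w) V^T

def lem_distance(S1: np.ndarray, S2: np.ndarray) -> float:
    """Log-Euclidean distance ||log(S1) - log(S2)||_F, as in (2)."""
    return float(np.linalg.norm(spd_log(S1) - spd_log(S2), ord="fro"))

S_b = np.eye(2)
S_r = np.diag([np.e, 1.0])               # log(S_r) = diag(1, 0)
print(lem_distance(S_r, S_b))            # approximately 1.0
```

Because each matrix is mapped to its logarithm once, the LEM behaves like a Euclidean distance in the log domain, which is what makes it cheap to evaluate repeatedly inside a search loop.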

III. PROBLEM FORMULATION
In this section, we present the problem formulation for maximum flow rate through relay positioning along with its mapping over Riemannian manifolds.

A. MAXIMUM NETWORK FLOW RATE
Let $\mathcal{C} = \{c_1, c_2, \dots, c_M\}$ be a specific set of candidate relay locations. The goal is to find the optimal relay locations among the set $\mathcal{C}$ that maximize the network flow rate between any two nodes, namely, a source $s$ and a destination $d$. We note that selecting a relay at any given location $c_m$ creates edges between that relay and all network nodes that are within the connectivity threshold distance $R$ from it. These relay-based additional edges are used to forward data packets to and from the original network nodes. Effectively, relays create new edges among the nodes of the original network beyond those that exist in the baseline network. The set of edges of a relay-assisted network, which is generally larger than that of the baseline network, is denoted as $\mathcal{E}(c_m)$.
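From a source node $s \in \mathcal{N}$, the maximum flow problem can be formulated as

$$\begin{aligned} f(s, c_m) = \max_{\{f_{i,j}\}} \quad & \sum_{j:(s, n_j) \in \mathcal{E}(c_m)} f_{s,j} \\ \text{s.t.} \quad & \sum_{i:(n_i, n_j) \in \mathcal{E}(c_m)} f_{i,j} = \sum_{k:(n_j, n_k) \in \mathcal{E}(c_m)} f_{j,k}, \quad \forall n_j \in \mathcal{N} \setminus \{s, d\}, \\ & f_{i,j} \in \{0, 1\}, \quad \forall (n_i, n_j) \in \mathcal{E}(c_m). \end{aligned} \tag{3}$$

The last constraint, $f_{i,j} \in \{0, 1\}$, indicates that the flow between any two nodes $(n_i, n_j)$ can be 1 only if their inter-distance is less than the threshold $R$ (i.e., they share an edge); otherwise, it is 0. Thus, the average maximum flow rate of the network, considering all available nodes as potential source nodes, can be computed as $\frac{1}{K}\sum_{s \in \mathcal{N}} f(s, c_m)$. Consequently, for our relay placement problem, the optimum relay locations $c^*$ are the solution of the following optimization problem:

$$c^* = \arg\max_{c_m \in \mathcal{C}} \frac{1}{K} \sum_{s \in \mathcal{N}} f(s, c_m). \tag{4}$$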
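Each $f(s, c_m)$ in (3) is a unit-capacity maximum-flow problem. The following plain-Python sketch uses the classical Edmonds–Karp augmenting-path method for a single source-destination pair, assuming nodes are indexed $0, \dots, n-1$ and every undirected edge carries at most one unit in each direction. It is an illustration, not the solver used in the paper, and how destinations are chosen for the average in (4) is left to the caller.

```python
from collections import deque

def max_flow_unit(n, edges, s, d):
    """Edmonds-Karp maximum flow from s to d on an undirected graph with
    n nodes whose edges each carry at most one unit (f_ij in {0, 1})."""
    if s == d:
        raise ValueError("source and destination must differ")
    cap = [[0] * n for _ in range(n)]
    for i, j in edges:
        cap[i][j] += 1                   # unit capacity i -> j
        cap[j][i] += 1                   # and j -> i (undirected link)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path s -> d.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[d] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[d] == -1:
            return flow                  # no augmenting path remains
        v = d
        while v != s:                    # push one unit along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

baseline = [(0, 1), (1, 3)]              # single path 0-1-3
with_relay = baseline + [(0, 2), (2, 3)] # relay adds path 0-2-3
print(max_flow_unit(4, baseline, 0, 3), max_flow_unit(4, with_relay, 0, 3))  # 1 2
```

The toy case shows why non-overlapping routes matter: the extra path 0-2-3 is edge-disjoint from 0-1-3, so the flow from node 0 to node 3 doubles.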

B. MAXIMUM RIEMANNIAN LOG-EUCLIDEAN METRIC
A mapping between maximizing the network flow rate in (4) and maximizing Riemannian metrics (e.g., LEM) can be established as follows.
On the one hand, multiple research works on brain networks have utilized Riemannian metrics as bases for classifying connectivity patterns of different tasks [23] or persons [38]. In other words, a larger Riemannian metric on the manifold reflects distinguishable (i.e., non-overlapping) inter-regional brain paths. On the other hand, transitioning to multi-commodity data networks, constructing multiple non-overlapping routes leads to a higher aggregated network flow. Therefore, and as we have shown in [8], selecting the relay position that gives a point on the manifold with the maximum Riemannian metric towards the baseline point (i.e., the point representing the baseline network with no relays) adds new distinguishable, non-overlapping routes to the network, which then leads to a higher aggregated network flow rate.
Therefore, the optimum relays maximizing the network flow rate in (4) can be found by maximizing the LEM from the baseline network, which can be formulated as follows:

$$c^* = \arg\max_{c_m \in \mathcal{C}} D\big(\mathbf{S}_r(c_m), \mathbf{S}_b\big). \tag{5}$$
The two regularized Laplacian matrices $\mathbf{S}_r$ and $\mathbf{S}_b$ in (5) correspond to the relay-assisted network topology and the baseline no-relay one, respectively, and $D$ represents their LEM as in (2). In the following sections, we explain our solution approaches to find these optimal relays towards maximizing the network flow rate.
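An exhaustive LEM-based search for (5), i.e., the benchmark that the learning and optimization approaches aim to avoid, can be sketched as below. The rule for which edges a relay adds is an assumption made for illustration: consistent with the concatenation of the baseline graph and relay-induced edges described in the introduction, a relay at $c_m$ is taken to connect every pair of network nodes that are both within $R$ of it, so that $\mathbf{S}_r$ stays $K \times K$ and can be compared with $\mathbf{S}_b$. All function names are hypothetical.

```python
import itertools
import numpy as np

def laplacian_from_edges(K, edges, lam=0.5):
    """Regularized Laplacian (1) from a list of (i, j) node pairs."""
    A = np.zeros((K, len(edges)))
    for col, (i, j) in enumerate(edges):
        A[i, col], A[j, col] = 1.0, -1.0
    return A @ A.T + lam * np.eye(K)

def lem(S1, S2):
    """Log-Euclidean distance (2) using eigendecomposition-based logs."""
    def logm(S):
        w, V = np.linalg.eigh(S)
        return (V * np.log(w)) @ V.T
    return float(np.linalg.norm(logm(S1) - logm(S2)))   # Frobenius norm

def edges_within(points, R):
    """Baseline edges: node pairs closer than R."""
    return {(i, j) for i, j in itertools.combinations(range(len(points)), 2)
            if np.linalg.norm(points[i] - points[j]) < R}

def relay_edges(nodes, relay, base, R):
    """Assumed rule: a relay links every pair of nodes that are both within
    R of it and not already connected in the baseline graph."""
    near = [k for k in range(len(nodes)) if np.linalg.norm(nodes[k] - relay) < R]
    return set(itertools.combinations(near, 2)) - base

def best_relay_by_lem(nodes, candidates, R, lam=0.5):
    """Exhaustive search for (5): score every candidate c_m by its LEM."""
    base = edges_within(nodes, R)
    S_b = laplacian_from_edges(len(nodes), sorted(base), lam)
    scores = []
    for c in candidates:
        E_r = base | relay_edges(nodes, c, base, R)
        S_r = laplacian_from_edges(len(nodes), sorted(E_r), lam)
        scores.append(lem(S_r, S_b))
    return int(np.argmax(scores)), scores

nodes = np.array([[0.0, 0.0], [3.0, 0.0], [6.0, 0.0]])   # no baseline edges
candidates = np.array([[1.5, 0.0], [10.0, 10.0]])
best, scores = best_relay_by_lem(nodes, candidates, R=2.0)
```

The cost of this benchmark grows linearly with $M$, since every candidate requires building and taking the logarithm of a $K \times K$ matrix; the RMAB and RPSO approaches are designed to reduce the number of such evaluations.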

IV. RMAB FOR OPTIMAL RELAY SELECTION
In this section, we propose the MAB learning model over Riemannian manifolds, or briefly RMAB, which is an RL approach that learns over time. The proposed RMAB algorithm for selecting the optimal relay locations given in (5) is summarized in Algorithm 1. To map the problem described in Section III into an RMAB one, we model the candidate relay locations $\mathcal{C}$ as the possible arms of an RMAB agent, which are denoted by $a_{m,t}$ for $m = 1, 2, \dots, M$ at time $t$. For a dynamic network scenario, we assume that the network layout evolves over time, with slight changes in the locations of the nodes from a given time index $t$ to the next one, $t + 1$. These time indices, $t = 1, 2, \dots, T$, represent the learning episodes of an RL algorithm [39], where the algorithm learns more after each episode and finally converges once the learning is sufficient. Every episode in the proposed RMAB model requires matrix computations that involve calculating the incidence matrices. The observed reward $r_t$ at episode $t$ is defined by the LEM distance $D_t$ between the baseline SPD point $\mathbf{S}_b$ and the selected relay-assisted SPD point $\mathbf{S}_{r,t}$ as

$$r_t = D_t = D(\mathbf{S}_{r,t}, \mathbf{S}_b). \tag{6}$$

Algorithm 1
At a given time (or episode) $t$, playing the arm $a_{m,t}$ is equivalent to selecting the relay location $c_{m,t}$. The maximum reward of the proposed MAB-based model, $r_t^*$, for the selected action $a_t^*$, is given by the Riemannian LEM $D_t$ between $\mathbf{S}_r$ and $\mathbf{S}_b$, as in (5).
The algorithm applies its prior knowledge about each arm's reward to select the best arms (i.e., actions) for the next layout at episode $t + 1$. For a probabilistic representation of the prior knowledge of an arm's reward, a suitable prior is required before taking an observation. We consider the Thompson sampling (TS) algorithm, which is a Bayesian inference algorithm [24], to update the reward distribution of each relay location according to [25], [26]. The TS algorithm models the Bayesian prior of the expected reward as a Dirichlet distribution with parameter $\alpha_{c_m,t}$, i.e., $\mathrm{Dir}(\alpha_{c_m,t})$. The Dirichlet distribution is the multivariate generalization of the beta distribution [40]. The proposed RMAB model applies the TS approach and uses this reward distribution to play the best arm (step 3.a) with the highest probability among the available arms. In particular, at every episode $t$, the RMAB model virtually samples from each arm's updated distribution using TS and selects the action according to

$$a_t^* = \arg\max_{m \in \{1, \dots, M\}} \tilde{r}_{m,t}, \tag{7}$$

where $\tilde{r}_{m,t}$ is the reward sampled for arm $a_{m,t}$ from $\mathrm{Dir}(\alpha_{c_m,t})$. Modeling the Bayesian prior as Dirichlet also makes the Bayesian posterior, i.e., the updated reward distribution, a Dirichlet distribution [26]. We assume that feedback about the reward is available, which helps the algorithm select its actions in the following episodes. Once a reward is observed at step 3.b, the algorithm updates the distribution of each arm's reward using steps 5.a–5.c. Since the network varies because of the mobile nodes, the algorithm needs to continuously explore and keep track of the changes within the network. To this end, we introduce a forget factor $\gamma_1$ in step 5.a, which discounts past observations, and a boost factor $\gamma_2$ in step 5.b, which increases the impact of the most recent observations, as outlined in Algorithm 1.
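A minimal Thompson-sampling sketch with Dirichlet posteriors, in the spirit of steps 3 and 5 of Algorithm 1. Since Algorithm 1 itself is not reproduced here, several details are assumptions: the LEM reward is quantized into `levels` bins over $[0, r_{\max}]$, each arm's sampled expected reward is the sampled bin distribution weighted by the bin centers, the forget factor $\gamma_1$ decays all Dirichlet parameters toward the uniform prior, and the boost factor $\gamma_2$ is added to the bin of the newest reward. The class name and default values are hypothetical.

```python
import numpy as np

class DirichletTSBandit:
    """Thompson-sampling bandit with one Dirichlet per arm (hypothetical sketch).

    Rewards in [0, r_max] are quantized into `levels` bins; alpha[m] holds
    the Dirichlet parameters of arm m over those bins.
    """

    def __init__(self, n_arms, levels=10, r_max=1.0,
                 gamma1=0.9, gamma2=1.0, seed=None):
        self.alpha = np.ones((n_arms, levels))              # uniform prior
        self.centers = (np.arange(levels) + 0.5) * r_max / levels
        self.levels, self.r_max = levels, r_max
        self.gamma1, self.gamma2 = gamma1, gamma2
        self.rng = np.random.default_rng(seed)

    def select(self):
        """Sample a reward distribution per arm and play the best sample, cf. (7)."""
        sampled = [self.rng.dirichlet(a) @ self.centers for a in self.alpha]
        return int(np.argmax(sampled))

    def update(self, arm, reward):
        """Forget old evidence (gamma1), then boost the newest reward (gamma2)."""
        frac = float(np.clip(reward / self.r_max, 0.0, 1.0))
        b = min(int(frac * self.levels), self.levels - 1)
        # Assumed forget rule: shrink all parameters toward the prior of ones,
        # which also keeps every parameter >= 1 (valid for the Dirichlet).
        self.alpha = 1.0 + self.gamma1 * (self.alpha - 1.0)
        self.alpha[arm, b] += self.gamma2

# Toy loop: arm 1 has the largest (hypothetical) LEM reward.
true_rewards = [0.2, 0.9, 0.5]
bandit = DirichletTSBandit(n_arms=3, seed=0)
for t in range(100):
    m = bandit.select()
    bandit.update(m, true_rewards[m])
```

In an RMAB loop, `select()` would return the index $m$ of the candidate location $c_m$ to play, and `update(m, r_t)` would be called with the LEM reward from (6). Decaying toward the prior, rather than toward zero, is one simple way to let old layouts fade while keeping exploration alive in a changing network.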
The proposed RMAB algorithm continues to iterate over the previous steps to select the next relay location and updates the reward $r_t$ accordingly. When multiple relays are selected iteratively, the relay selected in the previous step is considered as a new network node, which generates a new network topology before the algorithm starts searching for the next relay position. For a fixed network topology, where the network nodes do not change, the RMAB algorithm learns the same network at every episode from $t = 1$ to $t = T$ until it converges. In other words, under the fixed network assumption, every episode presents the same network layout, as opposed to the dynamic network case.

V. SELECTING THE BEST RELAY POSITION APPLYING RPSO
In this section, we turn our attention to another solution, namely RPSO, for selecting the optimal relay positions. Inspired by the behaviour of a flock of birds, Kennedy and Eberhart first proposed the standard PSO technique [41], where a number of random entities, called particles, move over the search space of a given problem to minimize an objective function. Each particle uses information about its current and best (best-fitness) locations, as well as those of other swarm members, along with certain random perturbations, to determine how it will travel through the problem's solution space [42], which in standard PSO is a linear (Euclidean) space. Once all the particles have been moved, the algorithm advances to the next iteration.

Algorithm 2 Relay Selection Algorithm Using RPSO
1: Initialize a population of particles over $\mathcal{M}$ with random positions and velocities
2: for $iter = 1$ to $\psi$ do
3:   For each particle $\varepsilon$, evaluate the fitness function, i.e., the negative of (2)
4:   Compare the particle's fitness with its $\mathrm{pbest}_\varepsilon$
4.a:   if the current fitness value is better than $\mathrm{pbest}_\varepsilon$, then
4.b:     set $\mathrm{pbest}_\varepsilon$ to the current fitness value, and
4.c:     set $\vec{p}_\varepsilon = \vec{x}_\varepsilon$
5:   Identify the particle with the best fitness so far and assign its index to variable $g$ for gbest
6:   Update $\vec{v}_\varepsilon$ and $\vec{x}_\varepsilon$ according to (9) and (10)
7:   Exit the loop if the stopping criterion is met (a good fitness value or the maximum number of iterations)
8: Quantize the RPSO output
8.a:   Apply Cholesky decomposition using (11)
8.b:   Compare adjacency matrices to find the new edges
8.c:   Find the relay that provides the maximum number of new connections
9: Go to step 1 to select the next relay
Our proposed PSO solution on the Riemannian manifold (or simply RPSO) is based on this standard PSO principle, with the exception that the particles move along geodesics on the curved manifold instead of along straight lines. The proposed RPSO algorithm is an analytical approach that optimizes over the Riemannian manifold based on the LEM metric and outputs a relay-assisted topology with a relay position for which the maximum network flow rate is achieved. The proposed algorithm for relay selection using RPSO is summarized in Algorithm 2.
Towards finding the optimal relays over the Riemannian manifold, the task is decomposed among the local particles of the RPSO model. When the particles communicate, they exchange information about the task using geodesic lines. The particles move along the tangent space so that they stay on the Riemannian manifold; this is achieved by projecting the velocity onto the tangent space $T_x\mathcal{M}$. Let $\vec{x}_\varepsilon$ be the current position of particle $\varepsilon$, $\vec{v}_\varepsilon$ its current velocity, $\vec{p}_\varepsilon$ its best position found so far, and $F(x)$ the fitness (i.e., cost) function to be minimized. The current position $\vec{x}_\varepsilon$ can be considered as a set of coordinates describing a point on the manifold $\mathcal{M}$.
In our context of maximizing the flow rate using a Riemannian metric, we model the LEM as the cost function of the proposed RPSO algorithm, where the goal is to find the relay-assisted SPD matrix with the maximum LEM distance from the no-relay baseline one. Since the optimization minimizes the cost, we model the cost as the negative of the LEM (step 3), which is equivalent to maximizing the LEM between the two SPD matrices $\mathbf{S}_r$ and $\mathbf{S}_b$ given in (5).
The algorithm evaluates the positions of the particles (steps 4.a–4.c) to find the solution at each iteration $iter \in \{1, \dots, \psi\}$. Throughout these iterations, the particles move along the given manifold and continue to share their data with the neighboring particles of the swarm. When one particle meets another as it moves, the two particles merge into one. If the current position $\vec{x}_\varepsilon$ is better than all previous ones found so far (step 4.a), the current best is updated (step 4.b) and the coordinates are stored in $\vec{p}_\varepsilon$ (step 4.c). The best position $\vec{p}_\varepsilon$ is given by [43]

$$\vec{p}_\varepsilon^{\,iter+1} = \begin{cases} \vec{x}_\varepsilon^{\,iter+1}, & \text{if } F(\vec{x}_\varepsilon^{\,iter+1}) < F(\vec{p}_\varepsilon^{\,iter}), \\ \vec{p}_\varepsilon^{\,iter}, & \text{otherwise}. \end{cases} \tag{8}$$

The best fitness value found so far is stored in a variable called $\mathrm{pbest}_\varepsilon$ (for "previous best") and is used for comparison in later iterations. The findings from the above steps are used to update the global best $\overrightarrow{gbest}$ (step 5) and to calculate each particle's position and velocity (step 6).
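Steps 4 and 5 of Algorithm 2, i.e., the personal-best update in (8) and the selection of the global-best index, can be sketched in vectorized form as follows. Positions may be scalars, vectors, or $K \times K$ matrices stacked along the first axis, and the fitness is assumed to be the negative LEM (lower is better). The function name `update_bests` is hypothetical.

```python
import numpy as np

def update_bests(x, p, pbest, fitness):
    """Personal-best update (8) and global-best index for a whole swarm.

    x, p    : current and best-so-far positions, stacked along axis 0
    pbest   : best fitness so far per particle (lower is better)
    fitness : F evaluated at x, e.g. the negative LEM to the baseline
    """
    improved = fitness < pbest                    # step 4.a
    pbest = np.where(improved, fitness, pbest)    # step 4.b
    p = p.copy()
    p[improved] = x[improved]                     # step 4.c: p_eps <- x_eps
    g = int(np.argmin(pbest))                     # step 5: gbest = p[g]
    return p, pbest, g

x = np.array([[1.0], [2.0], [3.0]])
p = np.zeros((3, 1))
pbest = np.array([0.0, -5.0, 1.0])
fitness = np.array([-1.0, -2.0, 0.5])
p, pbest, g = update_bests(x, p, pbest, fitness)
```

Particles 0 and 2 improve and copy their current positions, while particle 1 keeps its older and still better record, which therefore becomes the global best.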
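The objective of the algorithm is to keep looking for better positions and to update $\vec{p}_\varepsilon$ and $\mathrm{pbest}_\varepsilon$. Let $\mathbf{S}_\varepsilon$ be the $K \times K$ SPD matrix at point $\vec{x}_\varepsilon^{\,iter}$. Then, the positions of the particles are updated according to

$$\vec{x}_\varepsilon^{\,iter+1} = \mathrm{Exp}_{\mathbf{S}_\varepsilon}\big(P_\varepsilon \vec{v}_\varepsilon^{\,iter+1}\big), \tag{9}$$

where $\mathrm{Exp}_{\mathbf{S}_\varepsilon}(\cdot)$ is the exponential map at $\mathbf{S}_\varepsilon$ and $P_\varepsilon \vec{v}_\varepsilon^{\,iter+1}$ is the projection of the velocity $\vec{v}_\varepsilon^{\,iter+1}$ of particle $\varepsilon$ onto the tangent space [43]. The algorithm iteratively adjusts $\vec{v}_\varepsilon$, which can effectively be seen as a step size [42], as

$$\vec{v}_\varepsilon^{\,iter+1} = \vec{v}_\varepsilon^{\,iter} + \phi_1 R_1 \big(\overrightarrow{gbest} - \vec{x}_\varepsilon^{\,iter}\big) + \phi_2 R_2 \big(\vec{p}_\varepsilon - \vec{x}_\varepsilon^{\,iter}\big), \tag{10}$$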
where $\phi_1$ and $\phi_2$ are acceleration coefficients that control the influence of the global best $\overrightarrow{gbest}$ and the individual best $\vec{p}_\varepsilon$ positions, respectively, on the particle's velocity and movement direction. $R_1$ and $R_2$ are uniformly distributed random vectors, which are used to maintain adequate diversity in the swarm population [44].
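The velocity and position updates (10) and (9) on the SPD manifold can be sketched as below. Two choices are assumptions rather than details given in the text: the tangent space of SPD matrices is taken as the space of symmetric matrices, so the projection $P_\varepsilon$ simply symmetrizes the velocity, and the affine-invariant exponential map $\mathrm{Exp}_{\mathbf{S}}(\mathbf{V}) = \mathbf{S}^{1/2}\exp(\mathbf{S}^{-1/2}\mathbf{V}\mathbf{S}^{-1/2})\mathbf{S}^{1/2}$ is used to move each particle. The coefficients $\phi_1 = \phi_2 = 1.4$ and all function names are placeholders.

```python
import numpy as np

def _eig_fun(S, f):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def project_tangent(V):
    """Assumed projection P: symmetric part (tangent space of SPD matrices)."""
    return 0.5 * (V + V.T)

def spd_exp(S, V):
    """Affine-invariant exponential map Exp_S(V) for symmetric V."""
    S_half = _eig_fun(S, np.sqrt)
    S_ihalf = _eig_fun(S, lambda w: 1.0 / np.sqrt(w))
    M = S_ihalf @ V @ S_ihalf
    out = S_half @ _eig_fun(0.5 * (M + M.T), np.exp) @ S_half
    return 0.5 * (out + out.T)          # remove round-off asymmetry

def rpso_step(x, v, p, gbest, phi1=1.4, phi2=1.4, rng=None):
    """One velocity update (10) and position update (9) for one particle."""
    rng = np.random.default_rng() if rng is None else rng
    R1 = rng.uniform(size=x.shape)       # random weights, elementwise
    R2 = rng.uniform(size=x.shape)
    v_new = v + phi1 * R1 * (gbest - x) + phi2 * R2 * (p - x)
    v_new = project_tangent(v_new)      # P_eps v
    return spd_exp(x, v_new), v_new     # move along the manifold

rng = np.random.default_rng(0)
x = np.diag([1.0, 2.0, 3.0])
x1, v1 = rpso_step(x, np.zeros((3, 3)), 2.0 * np.eye(3), 0.5 * np.eye(3), rng=rng)
```

Whatever the velocity, the exponential map returns a matrix of the form $\mathbf{S}^{1/2}\mathbf{E}\mathbf{S}^{1/2}$ with $\mathbf{E}$ SPD, so every particle remains a valid SPD point, which a plain Euclidean PSO step would not guarantee.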
The iterations of the proposed RPSO algorithm are simple and, unlike the RMAB episodes, do not require large matrix calculations. In particular, at each iteration, the RPSO algorithm generates an SPD matrix and only calculates its LEM distance $D$ from the baseline topology point. In addition, since the algorithm generates SPD matrices through optimization over a continuous space, the resulting relay location could be anywhere in the region of interest; hence, it requires quantization to match the given discrete relay locations. Recall that a relay can only be deployed at one of the fixed candidate locations, not at an arbitrary location.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
We apply Cholesky decomposition to the output SPD matrix of the RPSO algorithm in step 8.a to retrieve its corresponding adjacency matrix. Cholesky decomposition factorizes a Hermitian positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. For any (real) SPD matrix $\mathbf{S}_r$, the Cholesky decomposition is written as

$$\mathbf{S}_r = \mathbf{A}_r \mathbf{A}_r^T, \tag{11}$$

where $\mathbf{A}_r$ is lower triangular. We compare the retrieved adjacency matrix $\mathbf{A}_r$ with the baseline adjacency matrix in step 8.b to obtain the additional accumulated edges, which are then matched to the nearest relay position among the existing ones. More specifically, we use the additional edges to find the node that forms the highest number of new edges with other nodes when a relay is considered, and pick the available relay position that provides the maximum number of additional edges with that node (step 8.c). Finally, the network flow rate with that relay is calculated using (3).
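Steps 8.a and 8.c can be sketched as follows. `np.linalg.cholesky` returns the lower-triangular factor in (11); how that factor is turned into the set of new edges (step 8.b) is not spelled out here, so the hypothetical `pick_relay` takes those edges as input. Its matching rule (the node with the most new edges, then the candidate within $R$ of that node covering most of its new neighbors) is our reading of step 8.c, and the function names are hypothetical.

```python
import numpy as np

def cholesky_factor(S):
    """Step 8.a: lower-triangular factor with S = Lc @ Lc.T, as in (11)."""
    return np.linalg.cholesky(S)

def pick_relay(new_edges, nodes, candidates, R):
    """Step 8.c (our reading): take the node with the most new edges, then
    the candidate within R of it that also covers most of its new neighbors."""
    degree = {}
    for i, j in new_edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    hub = max(degree, key=degree.get)
    nbrs = {j if i == hub else i for i, j in new_edges if hub in (i, j)}
    best, best_cover = None, -1
    for m, c in enumerate(candidates):
        if np.linalg.norm(nodes[hub] - c) >= R:
            continue                      # relay cannot reach the hub node
        cover = sum(int(np.linalg.norm(nodes[k] - c) < R) for k in nbrs)
        if cover > best_cover:
            best, best_cover = m, cover
    return best                           # None if no candidate reaches the hub

S_r = np.array([[2.5, -1.0], [-1.0, 2.5]])
Lc = cholesky_factor(S_r)
nodes = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
candidates = np.array([[0.5, 0.5], [4.0, 4.0], [-1.5, 0.0]])
choice = pick_relay([(0, 1), (0, 2)], nodes, candidates, R=2.0)
```

In the toy layout, node 0 gains two new edges; candidate 0 reaches node 0 and both of its new neighbors, candidate 2 reaches only one of them, and candidate 1 is out of range, so candidate 0 is chosen.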

VI. SIMULATION RESULTS
In this section, we present simulation results of the proposed LEM-based RMAB and RPSO models for finding relay locations that maximize the network flow rate.

A. SIMULATION MODELING
The simulation parameters considered in this paper are given in Table 2. The RMAB algorithm was implemented in Python, while we used the MATLAB toolbox Manopt [45] to implement our RPSO algorithm. A typical network layout is shown in Fig. 4, where the network nodes are randomly placed and the blue dots represent the possible relay locations. A distance of 2 is considered as the threshold to construct an edge between two nodes, and any larger distance indicates no connection. Initially, we assume 20 nodes and 16 possible relay locations, but we increase these numbers later to validate the scalability of our work. In the dynamic scenario, the network nodes are allowed to move by 10% within the 6 × 6 deployment area in any direction. Consequently, each time instant represents a new network layout with some spatio-temporal correlation with the previous one.
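A layout generator matching the parameters above might look as follows. The 4 × 4 grid of candidate relays and the mobility model (each coordinate moves by at most 10% of the area side per time step, clipped to the area) are assumptions made for illustration; the text states only the counts, the 6 × 6 area, the threshold of 2, and the 10% movement. Function names are hypothetical.

```python
import numpy as np

AREA = 6.0            # side length of the 6 x 6 deployment area
R = 2.0               # edge threshold
K, M = 20, 16         # network nodes and candidate relay locations

def candidate_grid(per_side=4, area=AREA):
    """Hypothetical: M = per_side**2 candidates on a uniform grid."""
    ticks = (np.arange(per_side) + 0.5) * area / per_side
    return np.array([(x, y) for x in ticks for y in ticks])

def initial_layout(rng, k=K, area=AREA):
    """Nodes placed uniformly at random in the area."""
    return rng.uniform(0.0, area, size=(k, 2))

def move_nodes(nodes, rng, frac=0.10, area=AREA):
    """Assumed mobility: each coordinate shifts by at most frac * area,
    then is clipped back into the deployment area."""
    step = rng.uniform(-frac * area, frac * area, size=nodes.shape)
    return np.clip(nodes + step, 0.0, area)

rng = np.random.default_rng(0)
nodes = initial_layout(rng)
relays = candidate_grid()
layouts = [nodes]
for t in range(15):                    # 15 further, correlated layouts
    layouts.append(move_nodes(layouts[-1], rng))
```

Because each layout is derived from the previous one by a bounded perturbation, consecutive layouts share most of their edges, which is the spatio-temporal correlation that lets the RMAB model reuse what it learned in earlier episodes.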

B. PERFORMANCE BENCHMARKS
We compare the performance of the two proposed solutions with a number of benchmarks. First, the maximum flow scheme is the optimal scheme that places a relay so that the maximum network flow rate is achieved. However, this scheme has a high complexity on the order of $O(|\mathcal{N}||\mathcal{E}|^2)$, where $\mathcal{N}$ and $\mathcal{E}$ are the sets of nodes and edges, respectively, of the graph, and $|\cdot|$ denotes the cardinality of a set [46]. This complexity is amplified when multiple candidate relay locations are considered, as in the relay positioning problem. Second, the LEM-based exhaustive search scans all $M$ potential relay locations to find the one with the maximum LEM towards the no-relay baseline network. Third, we include a benchmark that uses algebraic connectivity (AC) [6] as a metric towards maximizing the flow rate. Each of these three schemes is an exhaustive search. Finally, the maximum flow of the baseline network, with no relays, is presented as a lower bound.

C. TRACKING IN SINGLE-RELAY RMAB FOR DYNAMIC NETWORKS
Assuming a single-relay scenario, Fig. 5 demonstrates that, starting from the 4th network change, the LEM-based RMAB approach achieves 99% of the LEM of the exhaustive search benchmark and, starting from the 8th network change, it tracks exactly the best position of the deployed relay, similar to the LEM-based exhaustive search. In other words, the proposed RMAB solution finds the best relay location in such a dynamic network in only 8 episodes, as opposed to conducting an exhaustive search over all $M = 16$ potential relay positions. Therefore, our proposed LEM-based RMAB model requires half of the searches compared to the exhaustive search upper-bound benchmark.
The advantage of such LEM learning in Fig. 5 translates towards the main objective of this paper, which is maximizing the network flow rate in (4). As for the network flow rate, Fig. 6 demonstrates that, for the single-relay case, the proposed LEM-based RMAB learning model approaches the high-complexity maximum flow benchmark starting from the 4th network layout. More specifically, it achieves 98% of the flow rate achieved by the maximum flow scheme [46]. Finally, the proposed RMAB model significantly increases the network flow rate beyond the baseline benchmark (e.g., by 28% at the 15th network layout) by adding only one relay.

D. SINGLE-RELAY RPSO AND RMAB FOR STATIC NETWORKS
We show the flow rate for stationary network layouts in Fig. 7, where each network layout remains fixed throughout the process. As shown in Fig. 7, for a single relay the proposed LEM-based RPSO scheme achieves a high network flow rate and approaches the upper-bound benchmarks (e.g., 97% of the maximum flow and exhaustive search at the 8th network layout). Similar performance is observed for the LEM-based RMAB solution. We also note that the proposed RMAB model matches the flow rate of the LEM-based exhaustive search benchmark in 50% of the considered layouts. In addition, the flow rate achieved by the proposed solutions is similar to or even better than that of the AC benchmark [6]. Moreover, both proposed LEM-based solutions perform significantly better than the baseline lower-bound scheme. A key reason for this advantage of the Riemannian-based solutions is that the entire network topology is represented by a single SPD matrix point, which enables faster and more accurate learning and facilitates identifying the relay locations that maximize the flow rate.
On average and for the single-relay case, Table 3 summarizes the network flow rate of the proposed models in comparison with the other benchmarks, averaged over 150 randomly generated network layouts. We report the percentage loss in flow rate of the proposed solutions relative to the optimal maximum flow upper bound. The LEM-based RMAB and RPSO models provide a significantly higher flow rate than the baseline benchmark and approach the maximum flow upper bound with only 5.7% and 9.4% loss, respectively, i.e., 94.3% and 90.6% of the maximum flow rate. Compared to the baseline lower bound, the LEM-based RMAB and LEM-based RPSO provide 37% and 31% gains in flow rate, respectively. Additionally, the RMAB scheme achieves a 3% higher flow rate than the AC benchmark [6].
We also report the number of episodes (for RMAB) or iterations (for RPSO) that each proposed model requires to reach the optimal solution. As listed in Table 3, the proposed RMAB solution requires only 6 episodes on average to track the optimal relay location. On the other hand, the proposed RPSO approach takes an average of 126 iterations to find the same relay. Recall that each RPSO iteration involves only LEM calculations, without the large matrix calculations required in each RMAB episode. This result therefore reflects a trade-off between the many low-complexity iterations of RPSO and the few, comparatively higher-complexity episodes of RMAB. In contrast, the exhaustive search benchmark requires 16 such complex calculations, each equivalent to an RMAB episode, to find the optimal relay, and is therefore less efficient at maximizing the flow rate.

E. SCALABILITY
Selecting multiple relays instead of a single relay further increases the overall network flow rate, as demonstrated in Fig. 8. For any randomly selected network layout, the network flow rate of the proposed RMAB scheme increases by more than 42% when selecting five relays compared to a single relay. Similar behavior is observed for RPSO and the other relay selection schemes. In addition, our RMAB and RPSO solutions provide a higher flow rate than the Euclidean-based AC benchmark [6]; for instance, RMAB and RPSO achieve 8.3% and 5.8% higher network flow rate, respectively, than AC when selecting five relays. In the multiple-relay evaluation of Fig. 8, the relays were selected iteratively, as shown in Algorithms 1 and 2. More specifically, each previously selected relay was treated as a new node within the network before the subsequent relay was selected.
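The sequential multi-relay procedure can be sketched as a greedy loop in which each selected relay is appended to the node set before the next one is chosen. The scoring function is a placeholder: in the paper it would be the LEM-based RMAB or RPSO selection, while the toy `neighbours_within` score (the number of current nodes within radius $R$ of a candidate) is a hypothetical stand-in used only to keep the example self-contained.

```python
import math

def select_relays_sequentially(nodes, candidates, k, score):
    """Greedily pick up to k relays; each chosen relay becomes a node.

    score(nodes, candidate) -> float is any relay-quality metric
    (e.g., an LEM towards the current network in the paper's setting).
    """
    nodes = list(nodes)
    remaining = list(candidates)
    chosen = []
    for _ in range(k):
        if not remaining:
            break
        best = max(remaining, key=lambda c: score(nodes, c))
        chosen.append(best)
        nodes.append(best)        # treat the selected relay as a regular node
        remaining.remove(best)    # a location can host at most one relay
    return chosen

def neighbours_within(radius):
    """Hypothetical toy score: number of nodes closer than `radius`."""
    def score(nodes, c):
        return sum(1 for p in nodes if math.dist(p, c) < radius)
    return score
```

Because the node set grows after every pick, later relays are scored against a network that already contains the earlier ones, which is the property that distinguishes this sequential scheme from the joint selection left for future work.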
Fig. 9 shows that our proposed solutions perform significantly better than the baseline solution as the network is scaled to a larger number of nodes. Relative to the maximum flow upper bound, our proposed RMAB and RPSO achieve 91% and … of the network flow rate, respectively, for 40 nodes.

F. CONVERGENCE
Both proposed solutions are iterative and converge once the algorithm satisfies a given stopping criterion. Fig. 10 shows that the LEM-based RMAB algorithm converges at different episodes for different network layouts. For instance, layout-1 converges at the 6th episode, whereas layout-2 and layout-3 require 4 and 7 episodes, respectively. Nevertheless, convergence is always reached in fewer than 16 episodes, which is the number of searches required by the LEM-based exhaustive search benchmark. For example, for layout-2 the proposed LEM-based RMAB converges after 4 of the 16 searches, a 75% saving compared to the exhaustive search benchmark.
For the proposed RMAB model, we define the stopping criterion based on the stability of the rewards (i.e., the LEM) over consecutive learning episodes. Specifically, if the RMAB model generates the same reward for four consecutive episodes, the algorithm terminates and the corresponding relay location is chosen as the solution for the given network layout. This criterion is reasonable because a reward that no longer changes indicates that learning has completed and the RMAB model has settled on its optimal relay location.
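The RMAB stopping rule can be expressed as a check over the reward history. This sketch interprets "the same reward for four consecutive episodes" as the last four rewards being equal, up to an optional tolerance `tol` (a hypothetical parameter; the default of zero requires exact equality), and returns the 1-indexed episode at which learning stops.

```python
def stopping_episode(rewards, window=4, tol=0.0):
    """Return the 1-indexed episode at which the last `window` rewards
    (LEM values) agree to within `tol`, or None if this never happens."""
    for t in range(window, len(rewards) + 1):
        recent = rewards[t - window:t]    # rewards of episodes t-window+1 .. t
        if max(recent) - min(recent) <= tol:
            return t
    return None
```

For a reward history such as `[0.8, 1.1, 1.3, 1.3, 1.3, 1.3]`, learning stops at episode 6, the first episode that closes a run of four identical rewards.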
As with the RMAB approach, the number of iterations the LEM-based RPSO solution needs to converge depends on the network layout, as shown in Fig. 11, and it may exceed the number of RMAB episodes, as reported in Table 3. However, as explained earlier, each RPSO iteration is computationally less expensive because it involves only LEM calculations, whereas each of the 16 steps of the exhaustive search benchmark involves matrix calculations.
The stopping criterion of the proposed RPSO model is based on the behavior of the cost function, which is given by the LEM distance. When the cost function changes by less than 0.1%, indicating that the RPSO model has found an optimal relay-assisted SPD matrix point over the Riemannian manifold that maximizes the flow rate, the algorithm terminates and the corresponding relay location is selected as the solution. We next compare the computational time of the less complex RPSO iterations with that of the more complex matrix calculations in the RMAB episodes. Fig. 12 shows that the RPSO scheme requires less time to converge than the RMAB learning model, even though RPSO may need more iterations than RMAB needs episodes (see Table 3). For example, at network layout-5, the LEM-based RPSO requires 70% less computational time than the LEM-based RMAB algorithm to reach convergence. The computational-time simulations were performed on an Intel(R) i7-4770 CPU at 3.4 GHz with 16 GB RAM. We compare the computational time only with the LEM-based exhaustive search benchmark, which searches over all possible locations to find the optimal relay; the maximum flow based benchmark has the highest complexity and is therefore not shown in Fig. 12.
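Similarly, the RPSO rule terminates when the relative change of the LEM cost between successive iterations drops below 0.1%. The `patience` parameter, which requires the small change to persist for several consecutive iterations, is a hypothetical addition not stated in the text; with the default `patience=1` the function applies the single-step test described above. The returned value is the 0-based index of the iteration at which the rule fires.

```python
def rpso_stop_iteration(costs, rel_tol=1e-3, patience=1):
    """Return the first 0-based iteration index at which the LEM cost has
    changed by less than rel_tol (0.1% by default) relative to the previous
    iteration for `patience` consecutive iterations, or None if never."""
    streak = 0
    for i in range(1, len(costs)):
        prev, cur = costs[i - 1], costs[i]
        if prev == 0:
            change = 0.0 if cur == 0 else float("inf")  # avoid division by zero
        else:
            change = abs(cur - prev) / abs(prev)
        streak = streak + 1 if change < rel_tol else 0
        if streak >= patience:
            return i
    return None
```

A patience greater than one can be useful because the swarm's best cost in particle swarm optimization may stall for a few iterations before improving again.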
Table 4 indicates that the computational time of the proposed RMAB and RPSO models increases by only 3.7% and 3.2%, respectively, when the number of nodes increases from 20 to 40. This modest growth, together with the convergence times in Fig. 12, indicates that our proposed solutions converge faster than the high-complexity maximum flow algorithm and the LEM-based exhaustive search upper-bound benchmark.

G. NETWORK CONNECTIVITY
While a higher network flow rate is the main focus of this paper, network robustness, measured in terms of connectivity degree, is also of great importance. The AC benchmark applied in previous studies [6], which aims to maximize the connectivity degree, serves as the upper bound for this goal. Consequently, we compare the robustness of our proposed solutions with this AC benchmark in Fig. 13.
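The AC benchmark [6] is built on algebraic connectivity, i.e., the second-smallest eigenvalue of the graph Laplacian $L = D - A$, which is positive if and only if the graph is connected. A minimal NumPy sketch, assuming an unweighted, symmetric adjacency matrix with at least two nodes:

```python
import numpy as np

def algebraic_connectivity(adj):
    """Second-smallest Laplacian eigenvalue (Fiedler value) of an undirected
    graph given by a symmetric adjacency matrix with at least two nodes."""
    lap = np.diag(adj.sum(axis=1)) - adj
    eigenvalues = np.linalg.eigvalsh(lap)    # real eigenvalues, ascending order
    return float(eigenvalues[1])
```

For instance, two isolated nodes have an algebraic connectivity of 0, while placing a relay between them yields a three-node path whose algebraic connectivity is 1; an AC-based placement chooses the relay location that maximizes this value.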
As shown in Fig. 13, our proposed LEM-based RMAB and RPSO models provide higher network connectivity, and consequently better robustness, than the maximum flow based algorithm. Both proposed relay selection schemes therefore also have a smaller gap between their connectivity and the AC upper bound. For example, for the 3rd network layout, the LEM-based RMAB and RPSO solutions achieve 89% and 98% of the connectivity of the AC-based scheme, respectively. Hence, in addition to achieving a high network flow rate (Table 3), our proposed LEM-based solutions provide balanced robustness to the network (Fig. 13) while selecting the optimal relay. Moreover, the connectivity of both proposed solutions is significantly higher than that of the no-relay baseline lower-bound benchmark.
For dynamic networks in which the nodes move by 10% within the deployment region, the connectivity of the proposed LEM-based RMAB solution mostly lies between the algebraic connectivity upper bound and the maximum flow based benchmark, as shown in Fig. 14. Hence, for dynamic networks, our proposed RMAB reinforcement learning solution again provides balanced robustness in addition to a higher network flow rate.

H. SENSITIVITY ANALYSIS
Fig. 15 shows the impact of varying the forget factor $\gamma_1$ and the boost factor $\gamma_2$ on the flow rate performance of our proposed RMAB scheme. When varying only $\gamma_1$ in Fig. 15(a), with $\gamma_2 = 5$, the worst flow rate is observed for very small $\gamma_1$, such as $\gamma_1 = 0.05$. On the other hand, when $\gamma_1$ is very large, for example $\gamma_1 = 0.7$, the RMAB algorithm again achieves poor flow rate performance. This is because, when the forget factor is too small, the algorithm forgets the initial information almost instantly, and when it is very large, the algorithm keeps selecting the same relay every time and cannot adapt to the changing network dynamics. At $\gamma_1 = 0.3$, the proposed RMAB relay selection scheme balances the exploration-exploitation trade-off and achieves better flow rates.
Fig. 15(b) shows that, except for $\gamma_2 < 4$, the flow rate of the RMAB model remains nearly unchanged, owing to the additive effect of $\gamma_2$ [25], as opposed to the multiplicative effect of $\gamma_1$ in Algorithm 1.

I. LIMITATIONS AND FUTURE RESEARCH DIRECTION
At this point, we highlight a few limitations of our work along with some future research directions. We have employed an iterative relay selection approach for selecting multiple relays, treating each previously selected relay as a node before determining the subsequent relay location. Although this approach served our purpose and achieved a higher network flow rate, a solution that selects multiple relays simultaneously is still needed. Hence, our future objective is to develop an algorithm that enables joint selection of relays and to compare its performance with our sequential relay selection models.
Furthermore, we have used the LEM to identify the maximum distance between two points over the Riemannian manifold. While this study did not compare different non-Euclidean distance metrics, such a comparison warrants separate research on its own merits. Therefore, an interesting avenue for future research is to evaluate our approach with alternative non-Euclidean metrics, such as the Stein divergence and the affine-invariant metric, among others.

VII. CONCLUSION
In this paper, we have represented network topology over Riemannian manifolds, thanks to its underlying symmetric positive definite (SPD) data structure. Consequently, we have proposed two different approaches for maximizing the flow rate via optimal relay selection. First, we employed a multi-armed bandit (MAB) learning model over the Riemannian manifold to find the relay positions that maximize the network flow rate, using the Log-Euclidean metric (LEM) as the reward function of the proposed Riemannian MAB (RMAB). We have shown that the proposed LEM-based RMAB model approaches the maximum flow rate within the first few learning episodes. Furthermore, we have shown that the LEM-based RMAB increases the network flow rate by more than 37% by adding a single relay, which corresponds to 94.3% of the maximum achievable flow rate. Second, we have proposed a LEM-based Riemannian particle swarm optimization (RPSO) solution that applies an optimization algorithm over the Riemannian manifold to find the optimal relay, and we have shown that it achieves 90.6% of the maximum achievable flow rate. We have also shown that the proposed LEM-based solutions converge once a given stopping criterion is fulfilled. Finally, we have simulated the connectivity degree of the network layouts produced by the proposed solutions and validated their robustness against other benchmarks.

FIGURE 1. A non-Euclidean manifold $\mathcal{M}$ with tangent plane $T_{\chi}\mathcal{M}$ at point $\chi$, which is the collection of all tangent vectors $\nu$.

FIGURE 2. A network where mobile relays are placed to increase connectivity paths.

FIGURE 4. A network graph with its nodes and corresponding edges. An edge between two network nodes is formed if their inter-distance is less than R. The blue circles represent possible relay locations.

FIGURE 5. The proposed RMAB model tracks the LEM achieved by the LEM-based exhaustive search in a dynamic network with a single relay.

FIGURE 6. Network flow rate of the proposed LEM-based RMAB model in a dynamic network with a single relay, and its relative performance with respect to other benchmarks.

FIGURE 7. Network flow rate of the proposed solutions for fixed network layouts with a single relay, in comparison with other benchmarks.

FIGURE 8. Network flow rate of the proposed solutions for multiple relays, in comparison with other benchmarks.

FIGURE 9. Network flow rate versus number of nodes.

FIGURE 12. Required time to converge for the proposed LEM-based solutions.

FIGURE 13. Network robustness in terms of connectivity for the proposed LEM-based solutions, in comparison with other benchmarks.