Machine Learning for Relaying Topology: Optimization of IoT Network with Energy Harvesting

In this paper, we examine an Internet of Things system dedicated to applications such as smart cities, smart factories, and connected cars. To support such systems over a wide area with low power consumption, energy harvesting technology without wired charging infrastructure is an important enabler of network longevity. Since the position and amount of energy charged at each device may be unbalanced according to the distribution of nodes and energy sources, the problem of maximizing the minimum throughput among all nodes becomes an NP-hard challenge. To overcome this complexity, we propose a machine learning based relaying topology algorithm with a novel backward-pass rate assessment method that provides a proper learning direction, and an iterative balancing time slot allocation algorithm that can utilize nodes with sufficient energy as relays. To validate the proposed scheme, we conducted simulations on the established system model, confirming that the proposed scheme is stable and superior to conventional schemes.


I. INTRODUCTION
Internet of Things (IoT) technology will bring enormous innovations to societal and industrial systems in terms of improved efficiency, sustainability, and safety. By exchanging many types of information such as traffic, energy usage, and environment data (e.g., temperature, humidity), IoT devices can create smart cities, connected industries, connected vehicles, and integrated health care services [1]. As these application services must support numerous wireless devices with scalability and a long lifespan, there is an emerging need for easy-to-maintain wireless networks with a simplified topology. In addition, such networks do not require a high data rate, but they do require several distinct characteristics: frame sizes on the order of tens of bytes, intermittent transmission a few times per day, ultra-low speeds with an insensitive delay requirement, and mostly uplink-centric transmission. These requirements for an uplink-centric, low data rate, and maintenance-free network motivate wireless-powered star-of-stars network technologies, where a new air interface provides an energy-efficient relaying alternative to available wireless network systems while covering larger areas.

The preprint of this article may be found in: http://arxiv.org/abs/2301.08481
Recently, a study on a relay-based star-of-stars topology for energy efficiency was conducted to cover a large area [2]. [2] proposed a cooperative multi-hop transmission scheme in wireless sensor networks considering the circuit energy consumption, and proved that the energy consumption per unit transmit distance can be minimized. As a new method for long-term operation without maintenance, research on energy harvesting-based IoT networks is also being actively conducted [3].
In the energy harvesting-based IoT research area, a key issue is that the amount of energy charged by each node differs according to the distribution of the energy sources (e.g., power beacons (PBs)) and nodes, leaving a varying amount of residual energy in the devices. [4] studied resource allocation in energy harvesting (EH)-enabled Long Range (LoRa) networks with external energy sources such as power beacons, solving the problem of regional energy distribution with a sub-optimal spreading factor and power allocation algorithm. Moreover, [5] proposed a clustering algorithm for grouping sensor nodes based on the energy distribution of the remaining nodes. These studies maximize the energy efficiency of the network through scheduling under a given distribution of energy. Furthermore, [6] proposed a distributed protocol maximizing the minimum sensing rate of sources operating in energy harvesting-based IoT networks, suggesting a method for determining the nodes to be charged, the transmission power and charging duration of a PB, and the routing of data by nodes and link scheduling.
However, prior studies have yet to consider that energy efficiency could be improved if the role of a node is not fixed, that is, if it is allowed to operate as a relay, optimally adapting to changes in the energy distribution. More specifically, a node close to PBs may harvest more energy. Moreover, nodes closer to the sink require less energy for transmission; conversely, nodes farther from the sink require more. Accordingly, if nodes with sufficient energy can transmit data on behalf of nodes with lower energy, the efficiency of the entire network could be maximized. However, such routing problems are usually formulated as max-min mixed integer optimization problems, which hinders finding the optimal solution in polynomial time due to the NP-hardness and non-linearity of the problem [7].

A. RELATED WORKS
Recently, deep learning based techniques have been applied to various wireless network problems [8]-[10]. Unlike traditional solutions based on mathematical models, deep learning provides solutions without complex mathematical manipulation: the neural network itself is trained to produce the optimal solution based on the given data or learning direction. Applications of deep learning to communication networks can largely be classified into supervised or unsupervised learning, depending on whether a dataset is available. There has been recent work in the communication community on bringing network topology graph information into learning models. [11] proposed a network routing method based on supervised deep learning, which shows adequate results on simple topologies but is less effective on complex ones. In addition, the application of deep learning to graph-related problems was examined in [12], based on supervised learning. Beyond topology problems, deep learning is being applied in many communication studies. [13] designed an efficient deep neural network framework and a novel training strategy for wireless-powered secure communication with near-optimal performance and low computation time. As can be seen, solutions using supervised learning achieve modest results, but their premise is the possession of a vast labeled dataset for learning.
Unfortunately, since the routing topology problem defined above is NP-hard and non-linear, a labeled dataset for supervised learning is not easily obtainable. Unsupervised learning does not require a dataset, which makes it important to design a loss function or rewards that set the correct direction for learning. Deep Reinforcement Learning (DRL), one of the unsupervised learning techniques, involves an agent interacting with the environment, learning an optimal policy by trial and error for sequential decision-making problems. In a given communication problem, if the environment, state, reward, and action are properly modeled, the solution can be found through DRL without a dataset. Consequently, the optimization level of the solution depends on how accurately the environment, states, reward, and action are modeled, and such modeling usually requires considerable skill as the problem becomes more complex. Recently, [14] proposed energy-efficient fair communication through trajectory design and band allocation (EEFC-TDBA), which allows an unmanned aerial vehicle (UAV) to adjust its flight speed and direction to enhance energy efficiency and to allocate a frequency band to achieve fair communication service. A DRL algorithm solving the latency minimization problem for both communication and computation in a maritime UAV swarm mobile edge computing network was suggested and analyzed in [15].
Additionally, the variational autoencoder (VAE), first proposed by [16], is an unsupervised learning technique that can avoid the complex modeling of DRL and can solve the NP-hard and non-linear problem in an unsupervised manner [17]. VAEs are probabilistic generative models that use neural networks as the encoder and the decoder. The encoder maps the input variable to a latent space that corresponds to the parameters of a variational distribution. The decoder has the opposite function: it maps from the latent space to the input space to produce or generate data points. Configured in this way, the encoder and decoder can generate multiple different samples that all come from the same latent distribution. Therefore, applications of VAEs are usually found in generating data for speech, images, and text. For example, [18] implemented a chemical molecule graph generative model using a VAE, though its application was limited to smaller graphs and required a dataset for training, making it difficult to apply directly to our NP-hard problem.
In the communication field, the VAE has been applied in the following recent studies. [19] analyzed the VAE in terms of performance and flexibility over a classical additive white Gaussian noise channel with inter-symbol interference and over a dispersive linear optical dual-polarization channel, showing that it can extend the application range of blind adaptive equalizers. A probabilistic model based on VAEs was proposed for short packet wireless communication systems in [20], where the information messages are represented by so-called packet hot vectors, which are inferred by the VAE latent random variables.
However, the generative capability of the VAE decoder can also be used as a novel method to solve optimization problems. After appropriately reflecting the optimization problem in the loss function for training the decoder, the solution to the optimization problem can be obtained by applying various latent inputs during learning. Therefore, in this paper, we apply the VAE to solve the NP-hard and non-linear IoT problem defined above.

B. CONTRIBUTIONS AND ORGANIZATION
The main contributions of this paper are threefold:
• First, we propose a VAE-based scalable and unsupervised machine-learning scheme that can determine the dynamic relay topology under the regional distribution of nodes and energy, thus overcoming the need for a labeled dataset and the lack of scalability shown in previous works using supervised learning.
• Second, we propose a novel backward-pass based rate evaluation method, called "Packet-Tracing" (PT), which can properly and concisely assess the output of our VAE scheme, thereby giving a proper direction for training without the skilled, problem-specific agent and environment modeling that is inevitable in previous works using DRL.
• Lastly, we propose an iterative balancing (IB) algorithm for time slot allocation over a time-division multiple access (TDMA) based system, which ultimately yields the solution for both topology planning and time slot planning in the formulated max-min optimization problem regarding fairness.
The remainder of this paper is organized as follows. In Section II, we define our system model and formulate a max-min optimization problem: the problem we aim to solve. In Section III, from the problem formulation, we present a topology algorithm based on a VAE scheme and our novel backward-pass based rate evaluation method, and finally a time slot allocation algorithm that maximizes the max-min fairness of nodes under a TDMA system. In Section IV, we present operational details of our proposed scheme and perform numerical simulations and analysis confirming its sub-optimality, leading to our conclusions in Section V.

A. SYSTEM MODEL
In this subsection, we describe the system model. Here, uplink transmissions in an IoT sensor network during the time frame T are considered. In our system model, N_d IoT devices (nodes, for short) are uniformly distributed in a circle of radius R centred on the sink. Let us denote N_d = {1, 2, ..., N_d} as the set associated with the nodes.
By indexing the packet sink with N_d + 1, we can define N = N_d ∪ {N_d + 1} as the communicating node set. In addition, a TDMA system is assumed in our system model. TDMA is an effective scheme for improving the performance of nodes placed in an inferior power/position environment [21], by allocating them more time slots within the uplink frame, taking spare time slots from nodes placed in a superior power/position environment. The considered time frame T is divided into N_d time slots allocated to each IoT device, where the time slot of each node n is denoted by t_n ∈ (0, T). During t_n, node n has the opportunity to transmit.
Next, we consider energy harvesting (EH) for our system model. In our model, each node is self-powered by harvesting energy from PBs and stores the remaining energy in a rechargeable battery with limited capacity. Power beacons (PBs) can supply energy of any type (solar, RF, or others); the only condition is that, if an RF source is considered, its frequency band should differ from the communication band used. This condition prevents the energy harvesting signals from causing additional interference at the sink. In our problem, PBs are also randomly located in the cell of radius R, and their transmit power is defined as P_b. Let us denote N_b = {1, 2, ..., N_b} as the set associated with the PBs. An example of the network is presented in Fig. 1, consisting of 2 PBs and 25 nodes around the sink. In this paper, we assume that the distribution of nodes and PBs follows a uniform distribution for simplicity. However, other practical distributions considering urban, indoor, or transportation environments are also applicable.
We also consider RF as the power source type in our system model. In this case, the energy received at node n from the PBs can be defined as

E_rec,n = T Σ_{i∈N_b} P_b |h_{i,n}|^2 d_{i,n}^{-α},     (1)

where d_{i,n} is the distance between node i and node n, h_{i,n} is the channel between node i and node n, which follows a complex Gaussian distribution with zero mean and unit variance such that h_{i,n} ∼ CN(0, 1), and α is the path-loss exponent. In our system model, we consider a discrete-time block-fading model, where the channel state is constant for a time frame T [22], and we also assume that the instantaneous channel state information for h_{i,n} is available. Then, the harvested energy at each node depends on which EH model is considered: either linear or nonlinear [23]. One possible nonlinear EH model to consider is the sigmoidal model [4], which was shown to fit well with experimental results. In this paper, a linear model is considered for simplicity, but nonlinear models can also be applied. When the linear model is considered, the harvested energy E_n is given by E_n = ηE_rec,n, where η ∈ [0, 1] is the conversion efficiency.
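As a concrete illustration, the linear EH model can be evaluated as follows. This is a minimal sketch assuming the received-energy expression above with illustrative values for P_b, T, η, and α; the function name and numbers are ours, not the paper's.

```python
import numpy as np

def harvested_energy(d, h, P_b=1.0, T=1.0, eta=0.6, alpha=3.0):
    """Linear-EH harvested energy at one node.

    d : distances from the node to each PB (array of length N_b)
    h : complex channel gains to each PB (array of length N_b)
    The received energy is summed over all PBs with path loss d**-alpha,
    then scaled by the conversion efficiency eta (E_n = eta * E_rec,n).
    """
    E_rec = T * np.sum(P_b * np.abs(h) ** 2 * d ** (-alpha))
    return eta * E_rec

# Two PBs at 10 m and 20 m, Rayleigh channels h ~ CN(0, 1)
rng = np.random.default_rng(0)
d = np.array([10.0, 20.0])
h = (rng.standard_normal(2) + 1j * rng.standard_normal(2)) / np.sqrt(2)
print(harvested_energy(d, h))
```

With a unit channel gain and a single PB, the result reduces to η P_b T d^{-α}, which gives a quick sanity check of the path-loss scaling.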
We also assume that the harvested energy during T is consumed for signal transmission in the t_n time slot of the next T time window, where t = {t_n | ∀n ∈ N_d}. Then, the transmit power can be expressed as P_n = E_n / t_n. In addition, in the TDMA system, the signal-to-noise ratio from a transmitting node n to a receiving node n', Γ_{n,n'}, can be expressed as

Γ_{n,n'} = P_n |h_{n,n'}|^2 d_{n,n'}^{-α} / N,

where N is the noise power. Moreover, we define c = {c_{n,n'} | ∀n ∈ N_d, ∀n' ∈ N}, whose element c_{n,n'} ∈ {0, 1} represents the connection of the relay topology, where c_{n,n'} = 1 if node n transmits its data and its child nodes' data to node n'; otherwise, c_{n,n'} = 0. Using the definition of c and Shannon's capacity, the amount of bits per frequency (bits/Hz) that node k can transmit is Σ_{n'∈N, n'≠k} c_{k,n'} t_k log2(1 + Γ_{k,n'}), and the amount of bits/Hz that node k receives is Σ_{n∈N_d, n≠k} c_{n,k} t_n log2(1 + Γ_{n,k}) (as shown in Fig. 2, which illustrates relaying data from lower nodes to a higher node). Then, the difference between the two can be considered the amount of bits/Hz that node k itself can transmit, denoted as B_k(c, t):

B_k(c, t) = Σ_{n'∈N, n'≠k} c_{k,n'} t_k log2(1 + Γ_{k,n'}) − Σ_{n∈N_d, n≠k} c_{n,k} t_n log2(1 + Γ_{n,k}).     (2)
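The net bits/Hz of a node, i.e., its outbound capacity minus the capacity consumed relaying its children, can be sketched as below. The matrix shapes and toy SNR values are illustrative assumptions.

```python
import numpy as np

def B(k, c, t, Gamma):
    """Net bits/Hz node k can transmit for itself: outbound minus inbound.

    c     : (N_d, N_d + 1) 0/1 adjacency matrix (last column is the sink)
    t     : (N_d,) time slots within the frame
    Gamma : (N_d + 1, N_d + 1) SNR matrix between nodes
    Self-links are assumed absent (c[k, k] == 0).
    """
    N_d = len(t)
    rate = np.log2(1.0 + Gamma)                  # per-link spectral efficiency
    out_bits = t[k] * np.sum(c[k] * rate[k, :])  # what k sends upward
    in_bits = sum(c[n, k] * t[n] * rate[n, k] for n in range(N_d) if n != k)
    return out_bits - in_bits

# Toy network: nodes 0 and 1, sink = 2; node 0 relays through node 1
c = np.array([[0, 1, 0], [0, 0, 1]])
t = np.array([0.5, 0.5])
Gamma = np.full((3, 3), 3.0)                     # SNR 3 on every link -> 2 bits/Hz
print(B(0, c, t, Gamma), B(1, c, t, Gamma))      # 1.0 for node 0, 0.0 for relay 1
```

The relay (node 1) nets zero here because it spends exactly as much capacity forwarding node 0's data as it gains from its own slot, matching the subtraction in (2).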

B. PROBLEM FORMULATION
In this subsection, we formulate the optimization problem based on the system model described above. The problem of maximizing the minimum of B_k over all nodes N_d can be formulated as follows:

(P1):  max_{c,t} min_{k∈N_d} B_k(c, t)     (3)
s.t.   Σ_{n∈N_d} t_n = T,     (4)
       Σ_{n'∈N, n'≠n} c_{n,n'} = 1, ∀n ∈ N_d,     (5)
       every node is connected, via the links in c, to the sink,     (6)

where C is the extended adjacency matrix of the topology. Constraint (4) guarantees that the sum of t_n over all nodes is T. Constraint (5) indicates that the outward link of each node is connected to exactly one of the other nodes, including the sink. Constraint (6) means that every node is ultimately connected to the sink. This optimization problem is NP-hard; therefore, an exact algorithm demands enormous computational effort. However, by decoupling the problem into two sub-problems, we can transform the original problem over c and t into a single-variable problem over c.
Since the result of the IB time slot allocation algorithm proposed in the next section satisfies max_k B_k(c, t) = min_k B_k(c, t) under a given c, we can always find a t satisfying (P1-1) by using this algorithm. In other words, t can be considered a variable determined by c. If we define the optimal value of (P1-1) as B_IB(c), the original problem (P1) can be expressed as:

(P2):  max_c B_IB(c)     (9)

Problem (P2) is then equivalent to problem (P1). We can also ignore the constraints (10) and (11), since any c that does not satisfy them yields B_IB(c) ≤ 0. Unfortunately, no computationally efficient method can solve (9) without resorting to exhaustive search, whose time complexity grows exponentially in the number of nodes, since each node may select any of the other nodes or the sink as its outward link. To avoid this complexity, we propose a VAE based machine-learning algorithm in the next section.
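To make the cost of exhaustive search concrete, the sketch below enumerates every candidate parent assignment for a toy network and filters those satisfying the connectivity constraint (6); the helper names are ours.

```python
from itertools import product

def exhaustive_topologies(N_d):
    """Enumerate every candidate parent assignment: each node picks one
    outward link among the other nodes or the sink (index N_d), per
    constraint (5), giving N_d ** N_d candidates before the connectivity
    check of constraint (6)."""
    choices = [[j for j in range(N_d + 1) if j != i] for i in range(N_d)]
    return list(product(*choices))

def reaches_sink(parents, N_d):
    """Constraint (6): following parents from every node must end at the sink."""
    for start in range(N_d):
        seen, node = set(), start
        while node != N_d:
            if node in seen:          # cycle among the devices -> invalid
                return False
            seen.add(node)
            node = parents[node]
    return True

cands = exhaustive_topologies(3)
print(len(cands))                                  # 3**3 = 27 parent assignments
print(sum(reaches_sink(p, 3) for p in cands))      # 16 of them form valid trees
```

Already at N_d = 25 (the example network of Fig. 1), the candidate count 25**25 is astronomically large, which is why an exhaustive solver is ruled out.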

III. DESCRIPTION OF THE ALGORITHM
In this section, we propose a VAE scheme to find c * and use our novel PT algorithm to assess it. We also propose the IB time slot allocation to find t * , thereby obtaining the sub-optimal solution over both c and t.

A. TIME SLOT ALLOCATION ALGORITHM
In this subsection, we propose an IB time slot allocation algorithm which derives the optimal t satisfying (P1-1) under a given c, thus maximizing B_min. Firstly, we define the hyper-parameters of the algorithm, ε_1 and ε_2, which are the upper bound of the difference between B_max and B_min and the minimum allocatable time slot, respectively. Additionally, the conditions ε_2 ≪ T and B_k(c, 2ε_2) < ε_1 for ∀k ∈ N_d must be met to guarantee the convergence of our algorithm.
The key principle of the proposed algorithm is to repeatedly reduce the difference between B_max and B_min. First, under the initial t assignment, we find i* = arg max_k B_k(t) and j* = arg min_k B_k(t). Then, the bisection algorithm is applied repeatedly to make the bits/Hz difference between nodes i* and j*, δ_2, less than ε_1, or to make the time slot to be delivered between nodes i* and j*, ∆/2, less than ε_2. The bisection algorithm sets ∆ = ∆/2 and delivers the time slot between nodes i* and j* by reducing the time slot of the node with the superior B_k(t) by ∆ and allocating that ∆ to the node with the inferior B_k(t).
When the bisection over the two selected nodes is finished, the same procedure is repeated with newly found nodes i* and j* under the condition described above. This procedure is performed until δ_1 becomes less than ε_1. A detailed description of the process is given in Alg. 1. Furthermore, the convergence proof of Alg. 1 is given as follows. Proof. Let us define the time slot allocated to node i and the amount of bits/Hz of node i at the beginning of the k-th iteration as t_{i,k} and B_{i,k} = B_i(t_{i,k}), respectively. In this notation, B_{max,k} and B_{min,k} at the beginning of the k-th iteration can be written as B_{i*}(t_{i*,k}) and B_{j*}(t_{j*,k}), respectively.
Conversely, termination of the inner loop, consisting of lines 7 to 18 in Alg. 1, indicates that the condition ∆ ≤ 2ε_2 or the condition ε_1 ≥ |δ_2| has been met. If at least one iteration of the inner loop has been performed, we can find a positive constant σ over all outer loop iterations satisfying σ < min(B_{max,k} − B_{i*,k+1}, B_{j*,k} − B_{min,k+1}) and σ < ε_1, according to Lemma 1. Since ∆ > ε_2 is always guaranteed, this eventually guarantees the existence of a constant σ and N_δ < N/2 satisfying δ_{1,k} ≥ δ_{1,k+N_δ} + σ for ∀k over the total outer loop iteration count.
Thus, in this case, the two expressions δ_{1,k} ≥ δ_{1,k+N_δ} + σ and δ_{1,k} ≥ 0, always satisfied for ∀k with constants ∃σ > 0 and ∃N_δ < N/2, eventually lead to the satisfaction of the condition δ_{1,k} ≤ ε_1 for some k.
However, such a termination condition of the inner loop could already be met at the beginning of the inner loop iteration; hence, the inner loop may not be performed at all. We show that, in either case, the outer loop termination condition is nevertheless already satisfied.
Firstly, satisfying the former condition ∆ ≤ 2ε_2 means that t_{i*,k} ≤ 2ε_2, since we set the initial ∆ = t_{i*}. Subsequently, the inequality δ_{1,k} < ε_1 follows from the precondition B_k(c, 2ε_2) < ε_1. Next, satisfying the latter condition ε_1 ≥ |δ_2| implies that ε_1 ≥ δ_{1,k}, since we set the initial δ_2 = δ_{1,k}. Therefore, both cases already satisfy the termination condition of the outer loop, eventually satisfying the proposition.
Γ_{n,n'} and P_n = E_n / t_n, respectively. This satisfies ε_2 ≤ t_A − t_B and c_{n,n'} = 1.
Thus, it is sufficient to set the positive constant σ satisfying the lemma above as B_{i*}(T + ε_2) − B_{i*}(T).
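As a toy sketch of the IB principle described in this subsection, the loop below repeatedly halves a slot transfer ∆ between the current best and worst nodes until the bits/Hz gap drops below ε_1 or the transfer drops below ε_2. Here B is an arbitrary callable mapping a slot vector to per-node bits/Hz; this simplification omits the exact bookkeeping of Alg. 1.

```python
import numpy as np

def iterative_balancing(B, t, eps1=1e-3, eps2=1e-6):
    """Simplified IB sketch: bisection transfers between arg max and arg min."""
    while True:
        b = B(t)
        i, j = int(np.argmax(b)), int(np.argmin(b))
        if b[i] - b[j] <= eps1:          # outer termination: delta_1 <= eps_1
            return t
        delta = t[i]                      # initial transfer budget Delta = t_{i*}
        while True:
            b = B(t)
            delta2 = b[i] - b[j]
            if delta / 2 <= eps2 or abs(delta2) <= eps1:
                break
            delta /= 2
            if delta2 > 0:                # move slot from the richer node
                t[i] -= delta
                t[j] += delta
            else:                         # overshoot: move it back
                t[i] += delta
                t[j] -= delta

res = iterative_balancing(lambda tt: np.array([2 * tt[0], tt[1]]),
                          np.array([0.5, 0.5]))
print(res)    # approaches [1/3, 2/3], where both nodes reach ~2/3 bits/Hz
```

In the toy case, node 0 earns bits/Hz twice as fast per slot, so balancing settles near t = (T/3, 2T/3), the point where max_k B_k = min_k B_k.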

B. MACHINE LEARNING BASED TOPOLOGY ALGORITHM
In this subsection, we propose a machine-learning based VAE model and a backward-pass based rate evaluation model called the Packet-Tracing evaluation model (PT-EVM), inspired by the ray-tracing method used in 3-D graphics rendering. Our proposed model to find c* consists of two parts: first, deriving c* using the VAE part and assessing the derived c* with the PT-EVM part; and second, training the VAE part with the assessment from the PT-EVM part.
Since (P2) has non-linearity and discontinuity characteristics due to the nature of the Shannon capacity formula and the mixed-integer structure, it is difficult to find the sub-optimal c* using analytical methods. Given these considerations, we propose machine learning as a method to overcome these limitations. However, there are restrictions on applying supervised learning to our formulated problem. The dataset required for supervised learning, such as a paired set of positional distributions of nodes and PBs and the corresponding optimal topologies, cannot be obtained due to the characteristics of our formulated problem. Moreover, it is difficult to build such a dataset exhaustively due to the exponential growth of computation time with the number of nodes.
One alternative to consider is a reinforcement learning method that retrieves rewards by applying the topology resulting from the algorithm to a physical network and analyzing the real achievable rates as a reward. However, this method has some drawbacks. While a model is in the learning phase, its topology output is imperfect, yet applying it is an essential procedure for receiving a reward to use in reinforcement learning. As a result, initial performance degradation while learning in practical use is inevitable. Moreover, as depicted in Section I, the IoT network environment has ultra-low speed, intermittent transmission properties, meaning that rewards usable in reinforcement learning arrive very infrequently, magnifying the drawback described above. These drawbacks make applying reinforcement learning impractical.
Taking these limitations into consideration, we propose a VAE based generative model which can be used in an unsupervised learning manner. Typically, generative models are used to resemble an observed dataset and embed the dataset into a high-dimensional space, called the latent vector space, thus gaining the ability to generate new data points based on the observed distribution by simply modifying the latent vector. Our hybrid generative model and unsupervised-learning scheme slightly modifies this original structure: it explores the latent vector space and its mapped data point space to derive the sub-optimal solution c*. However, to realize this concept, a proper assessment of the resulting output c is necessary, which enables exploration of the latent vector space and derivation of the sub-optimal solution. Therefore, a simulation-based assessment of the resulting topology is essential, and it must be concise yet sufficiently detailed to model the network precisely. Conventional network modeling and simulation (M&S) tools such as NS-3 or the Riverbed Modeler require excessive computing overhead, since they simulate the entire network operation, including buffer management, the TCP/IP protocol procedure, wireless channel emulation, and packet switching among interacting interfaces. Such simulation overhead is unnecessary, since our assessment only needs to examine the achievable rate over nodes with regard to congestion and the distribution of PBs and nodes. One imaginable solution is to subtract the sum of the inbound capacity from the sum of the outbound capacity on the basis of Shannon's capacity, as depicted in (2). Unfortunately, this naive solution cannot consider how much of the rate (packets) will be inbound to a specific node, nor how much of the rate (packets) can be handled at the destination node with regard to congestion.
Since these limitations and requirements cannot be fully addressed by typical solutions, we propose a PT algorithm to enable congestion-aware rate assessment. In our PT algorithm, we sequentially assign the achievable rate of each node starting from the packet sink and proceeding in a backward-pass manner, like the ray-tracing algorithm in 3-D graphics rendering. Ray-tracing works by tracing a path from a viewpoint (camera) to an object in a virtual 3-D graphics space; the ray propagates under photonics simulation, eventually hitting the light source, thereby identifying a valid path along which light propagates [25]. Since the ray-tracing algorithm does not consider rays that never reach the camera, the computation cost is drastically reduced. Similarly, the backward-pass property of our packet-tracing algorithm provides a feasible and concise rate assessment, analogous to that of ray-tracing.
As shown in Fig. 3, the proposed unsupervised-learning model consists of two parts: the VAE part and the PT-EVM part. To summarize, the proposed concept is implemented by entering a random latent vector input into the VAE, assessing the output c, and updating the parameters of the VAE with these assessments iteratively. This procedure eventually focuses and maps the prior latent vector space to the sub-optimal solution data point and draws c* without the dataset necessary for supervised learning. A detailed description of each part of the scheme is given in the subsections below.

1) VAE structure
The VAE part generates a topology for a given latent vector input. The left side of Fig. 3 depicts the VAE structure: 4 fully connected (FC) layers with 3 rectified linear unit (ReLU) layers between them, followed by a 2DMap layer and a softmax layer.
Firstly, in the FC layer, each input element x_i is processed into the output element y_j by the following equation:

y_j = Σ_{i∈N_I} w_{i,j} x_i + b_j,

where i and j are the input and output element indices, respectively, and N_I, N_O, w, and b are the index set of the input elements, the index set of the output elements, and the weights and biases of the layer, respectively. For each FC layer, the number of output elements is set in an arithmetic sequence, with the number of output elements of the last layer fixed to N_d(N_d + 1) for topology mapping (i.e., each node in N_d can select its upper link among N).
Between the FC layers, a ReLU layer is used as an activation function to handle the non-linear property of the formulated problem. In the ReLU layer, each input element x is processed into the output element y by

y = max(0, x).

Next, a 2DMap layer is used to map the FC layer output to adjacency matrix form. The 2DMap layer maps the 1-dimensional input vector x, with N_d(N_d + 1) elements in total, to the N_d by (N_d + 1) adjacency matrix, denoted as y_{i,j}:

y_{i,j} = x_{(i−1)(N_d+1)+j}.

Finally, a softmax layer is applied to make each row sum to 1, so that the total outward connectivity summation for each node equals 1. The softmax layer also functions as an activation function, further enhancing the ability to handle the non-linear property. It operates on the input elements x_i and output elements y_i of each row as follows:

y_i = e^{x_i} / Σ_j e^{x_j}.

Therefore, the VAE part produces an adjacency matrix c as its final output, which determines the entire topology connectivity configuration as described above.
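The decoder pipeline described above (FC layers with ReLU between them, a 2DMap reshape, and a row-wise softmax) can be sketched with plain NumPy; the layer widths and random weights are illustrative, not the trained model.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def row_softmax(X):
    """Softmax over each row so every node's outward connectivity sums to 1."""
    e = np.exp(X - X.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def decoder_forward(z, weights, biases, N_d):
    """4 FC layers with ReLU between them, then 2DMap (reshape) + softmax.
    The last FC layer must output N_d * (N_d + 1) elements."""
    x = z
    for i, (W, b) in enumerate(zip(weights, biases)):
        x = x @ W + b
        if i < len(weights) - 1:      # ReLU only between FC layers
            x = relu(x)
    C = x.reshape(N_d, N_d + 1)       # 2DMap: flat vector -> adjacency matrix
    return row_softmax(C)

# Toy run: N_d = 3 nodes, latent dim 4, widths in arithmetic-like progression
rng = np.random.default_rng(1)
N_d, dims = 3, [4, 6, 8, 10, 3 * 4]
Ws = [rng.standard_normal((dims[i], dims[i + 1])) * 0.1 for i in range(4)]
bs = [np.zeros(dims[i + 1]) for i in range(4)]
C = decoder_forward(rng.standard_normal(4), Ws, bs, N_d)
print(C.shape, C.sum(axis=1))         # (3, 4); each row sums to 1
```

Each row of C is a probability distribution over the candidate parents of one node, which is what the later post-processing step hardens into a single outbound link.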

2) PT-EVM structure
The PT-EVM part evaluates the resulting topology from the VAE part. The right side of Fig. 3 depicts the PT-EVM structure, consisting of the PT algorithm module, which obtains the rates required for the training loss function calculation, and the post-processing module, which post-processes the inference result.
Similar to the ray-tracing algorithm, our PT algorithm assesses the achievable rate budget in a backward process, starting from the packet sink. A detailed description of the procedure is as follows. Firstly, in line 9 of Alg. 2, the focused node (i.e., the packet sink in the initial step) starts the algorithm with the granted rate budget (i.e., ∞ in the initial step) given in the previous step. Next, the focused node takes its share of the rate budget, R_self, which depends on the total inbound connection status and the rate budget (line 12). We turn off auto-differentiation when calculating I, B, and R_self, thus preventing the assessment of the other nodes from being affected.
Subsequently, in line 17, the focused node gives its remaining rate budget to its inbound nodes. Basically, each inbound node's budget is set to its link rate. However, if the remaining budget of the focused node, B − R_self, is less than I, congestion is occurring at the focused node; in this case, each inbound node's budget is reduced in proportion to (B − R_self)/I. Then, the next unallocated node is called in line 20, and these steps are repeated until a termination condition (e.g., the remaining allocatable rate budget drops below a certain threshold, B_th) is satisfied. This recursive algorithm spreads out through the inbound connections, and the sum of all granted rate budgets over the various packet-rays passing through a specific node gives the achievable rate of that node. Calculated in this manner, all granted achievable rates are guaranteed to reach the packet sink, since the calculation proceeds backward from the packet sink, like the rays in ray-tracing.
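The recursive budget-granting idea can be sketched as below. Note that the rule used here for R_self (an equal split over inbound links plus the node itself) is our assumption, since this excerpt does not fix it; the congestion scaling (B − R_self)/I and the B_th termination follow the description above.

```python
def packet_trace(node, budget, children, link_rate, B_th=1e-6, rates=None):
    """Backward-pass rate assessment (simplified sketch of the PT idea).

    children[n]       : nodes whose outbound link points to n
    link_rate[(a, b)] : achievable rate of link a -> b
    Starting from the sink with an infinite budget, each call grants the
    node a share R_self (assumed equal split; the sink takes none) and
    passes the rest down the inbound links, scaled by (B - R_self) / I
    when the node is congested.
    """
    if rates is None:
        rates = {}
    kids = children.get(node, [])
    R_self = budget / (len(kids) + 1) if budget != float("inf") else 0.0
    rates[node] = rates.get(node, 0.0) + R_self
    I = sum(link_rate[(k, node)] for k in kids)        # total inbound link rate
    remaining = budget - R_self
    scale = 1.0 if budget == float("inf") or remaining >= I else remaining / I
    for k in kids:
        grant = link_rate[(k, node)] * scale
        if grant > B_th:                               # termination condition
            packet_trace(k, grant, children, link_rate, B_th, rates)
    return rates

# Chain 0 -> 1 -> sink(2): node 1's budget is its 2.0 link, split with node 0
children = {2: [1], 1: [0]}
link_rate = {(1, 2): 2.0, (0, 1): 0.5}
print(packet_trace(2, float("inf"), children, link_rate))
```

If the inbound link rate of node 0 were raised above node 1's remaining budget, the (B − R_self)/I scaling would shrink node 0's grant so that the total never exceeds what node 1 can forward toward the sink.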
After finalizing the backward calculation of all packet-rays, we calculate the net packet-ray for all edges of the topology, since a single packet-ray does not consider other packet-rays; thus, a net rate calculation is essential to PT-EVM modeling (lines 5-7). Alg. 2 depicts the implementation details of the proposed packet-tracing algorithm, where I, B, and R_self are the sum of the inbound rates, the granted rate budget through the packet-ray, and the allocated rate of the called node, respectively. t is the time slot scale factor for the simulation algorithm and B_th is the rate budget threshold determining the termination condition. After assessing the achievable rate of each node through the PT-EVM part, we calculate the training loss function using the calculated achievable rates and use its value for the unsupervised learning of the VAE part. Since the PT-EVM part traces auto-differentiation along the calculation, we can feed the output of the PT-EVM part into the training loss calculation for the VAE part and train it directly with a conventional neural-network training method (e.g., stochastic gradient descent). Our model is trained specifically for the given topology, which enables applicability to arbitrary node and PB configurations (e.g., the number of nodes/PBs and the positional distribution of nodes/PBs). The unsupervised learning proceeds by minimizing the training loss function value L, defined in (16), where (R_sim)_i is the rate of node i calculated through the PT-EVM part. Note that the training loss function we propose does not indicate the degree of similarity to, or gap from, the optimal answer, but is only a performance indicator evaluated by the PT structure. That is, the training loss value cannot be zero, as such a situation implies that (R_sim)_i = ∞, ∀i ∈ N_d. Also, the optimal training loss may vary for different IoT network settings (e.g., the optimal training loss L_opt may have a value of about 0.8 while the optimal solution of the target IoT network configuration has (R_sim)_i = 0.2, ∀i ∈ N_d).
Additionally, since the algorithm learns through interactions that occur in a specific target IoT network configuration, there is no separate training or validation dataset in this methodology, and hence no applicable validation loss metric. Alternatively, to validate the training epoch without a validation loss, the actual performance indicator (B_min) was tracked during the training phase in the analysis described below.
Next, the parameters of the VAE part are updated toward minimizing the training loss function (16) using the Adaptive Moment Estimation (ADAM) optimization algorithm [26] as follows:

m_t = β_1 m_{t−1} + (1 − β_1) ∇_θ L,
v_t = β_2 v_{t−1} + (1 − β_2) (∇_θ L)^2,
m̂_t = m_t / (1 − β_1^t),   v̂_t = v_t / (1 − β_2^t),
θ_t = θ_{t−1} − κ m̂_t / (√v̂_t + ε),

where θ, t, m, v, m̂, and v̂ are the learnable parameters (such as the weights and biases of the neural network), the timestep, the biased first moment, the biased second raw moment, the bias-corrected first moment, and the bias-corrected second raw moment, respectively. κ, β_1, β_2, and ε are the hyper-parameters of the model: the learning rate, the decay rate for the first moment, the decay rate for the second raw moment, and the small scalar used to prevent division by zero, respectively. [27] has shown that an arbitrary simple prior latent vector distribution can be converted to a specific target manifold distribution, since the first few layers of the VAE structure adequately provide such a conversion. Thus, we use the uniform distribution U(0, 1) as the prior latent vector distribution for each element during training and measure the training loss function for the sampled result. We then update the parameters of the VAE structure using the measured loss. Through these iterations, the VAE structure maps the simple prior latent vector distribution to the specific point of the latent space which embeds the best topology the VAE can provide, and then maps that latent space point to the best topology.

After training, we obtain the final inference result for a specific topology by post-processing the results as follows. The post-processing step finds the maximum index j* in the adjacency matrix over each row, defined by C_{i,j*} ≥ C_{i,k}, ∀k ∈ N − {j*}. It then zeros out the other outbound connections, which are inferior to the one with the maximum connectivity, so that C_{i,j} ∈ {0, 1}, i ∈ N_d, j ∈ N. This post-processing step forces the topology result to have only one outbound connection per node. The computation can be described for i ∈ N_d as below:

C_{i,j} = 1 if j = j*, and C_{i,j} = 0 otherwise, ∀j ∈ N.

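The row-wise post-processing above can be sketched in a few lines of NumPy (an illustrative sketch; the matrix values are made up):

```python
import numpy as np

def postprocess(C):
    """Keep only the strongest outbound connection per device row.

    C : (N_d, N) soft connectivity matrix from the VAE output.
    Returns a 0/1 matrix with exactly one outbound connection per row.
    """
    out = np.zeros_like(C)
    j_star = np.argmax(C, axis=1)           # j* with C[i, j*] >= C[i, k] for all k
    out[np.arange(C.shape[0]), j_star] = 1  # zero out all inferior connections
    return out

C = np.array([[0.2, 0.7, 0.1],
              [0.5, 0.3, 0.9]])
hard = postprocess(C)   # each row becomes one-hot at its max entry
```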
IV. PERFORMANCE ANALYSIS AND DISCUSSIONS

A. TEST ENVIRONMENT AND PARAMETER CONFIGURATION
To evaluate the performance of our proposed method, we performed simulations using the following target system model parameters. First, the path loss exponent was assumed to be α = 3 and the bandwidth BW = 125 kHz. The noise power was defined as N = −174 + NF + 10 log_10(BW) in dBm, where NF is the noise figure, equal to 6 dB. The nodes and PBs were distributed around a single packet sink with a coverage radius of R = 0.5 km. Next, for the generation of the wireless fading channels, |h_{i,j}|^2, an exponential random variable with unit mean was used. The transmit power of the PB was set to 1 W. The linear model with η = 0.7 was considered for the energy harvesting model. Finally, the time frame T was set to 100 milliseconds.
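As a quick sanity check of the noise power formula under these parameters (our own illustration, not part of the paper):

```python
import math

NF_dB = 6.0                  # noise figure from the parameter configuration
BW_Hz = 125e3                # bandwidth, 125 kHz
# Thermal noise floor plus noise figure, per the formula in the text
N_dBm = -174 + NF_dB + 10 * math.log10(BW_Hz)
N_W = 10 ** ((N_dBm - 30) / 10)   # same value converted from dBm to watts
print(round(N_dBm, 2))       # ≈ -117.03 dBm
```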
For our simulation, we set the VAE hyper-parameters as follows. First, the learning rate was set to κ = 0.001, the decay rate for the first moment to β_1 = 0.9, the decay rate for the second raw moment to β_2 = 0.999, and the divide-by-zero prevention scalar to ε = 10^−8. Next, we set the VAE input vector size to N_d√N_d and initialized the neural network weights using a Glorot initializer [28]; the biases were initialized to zero. We trained the VAE neural network until no training loss value smaller than the previous minimum was found for 30 + 500/N_d epochs.
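The stopping rule above is, in effect, early stopping with a patience of 30 + 500/N_d epochs; a minimal sketch (the function name and loss values are our own, illustrative choices):

```python
def train_with_patience(losses, N_d):
    """Stop when no new minimum loss appears for 30 + 500/N_d consecutive epochs."""
    patience = 30 + 500 / N_d
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0      # new minimum found; reset the counter
        else:
            stale += 1
            if stale >= patience:
                return epoch, best     # epoch where training stops
    return len(losses) - 1, best

# For N_d = 25 the patience is 30 + 500/25 = 50 epochs; a flat loss curve
# therefore stops 50 epochs after the initial minimum.
stop_epoch, best_loss = train_with_patience([1.0] * 60, 25)
```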
In addition, we set the PT-EVM hyper-parameters as follows: time slot scale factor t = T/N_d and rate budget threshold B_th = 0.01. Also, for the iterative balancing time slot allocation, the hyper-parameters were set as follows: the upper bound of the difference between B_max and B_min was ϵ_1 = 10^−6 bits/Hz and the minimum allocatable time slot was ϵ_2 = 10^−7 sec.
For a performance comparison, we adopted three conventional schemes and the optimal solution case along with our proposed scheme:
• Optimal solution (Opt): exhaustively searches all valid topology configurations while allocating the time slot through Algorithm 1, then chooses the best-rate topology.
• Direct connect scheme (Dir): connects all nodes directly to the sink (c_{n,N_d+1} = 1, ∀n ∈ N_d) and allocates the time slot through Algorithm 1.
• MST scheme (MST): builds a Minimum Spanning Tree (MST) starting from the sink under the initial t with t_i = T/N_d for i ∈ N_d, based on link costs set to the reciprocal of the capacity, 1/(t_i log_2(1 + Γ_{i,j})), and allocates the time slot through Algorithm 1.
• Greedy scheme (Greedy): connects all nodes directly to the sink (c_{n,N_d+1} = 1, ∀n ∈ N_d), then repeatedly selects a random node i and connects it to the node with the highest achievable rate, min(t_i log_2(1 + Γ_{i,j}), B_j), ∀j ∈ N_d, and finally allocates the time slot through Algorithm 1.
• Proposed scheme (Prop): decides the topology with the proposed VAE and PT-EVM scheme and allocates the time slot with Algorithm 1.
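As an illustration of the MST baseline, Prim's algorithm over the reciprocal-capacity link costs might look as follows (a sketch under our own assumptions; the cost matrix is made up, with t_i and Γ_{i,j} already folded into each entry):

```python
import heapq

def mst_from_sink(cost, sink):
    """Lazy Prim's algorithm rooted at the sink.

    cost : symmetric matrix with cost[i][j] = 1 / (t_i * log2(1 + SNR_ij)),
           i.e., the reciprocal-capacity link costs from the MST baseline.
    Returns a list of (node, parent) edges pointing toward the sink.
    """
    n = len(cost)
    in_tree, edges = {sink}, []
    heap = [(cost[sink][j], sink, j) for j in range(n) if j != sink]
    heapq.heapify(heap)
    while len(in_tree) < n:
        c, u, v = heapq.heappop(heap)
        if v in in_tree:               # stale entry; v was already attached
            continue
        in_tree.add(v)
        edges.append((v, u))           # node v forwards toward u (outbound link)
        for w in range(n):
            if w not in in_tree:
                heapq.heappush(heap, (cost[v][w], v, w))
    return edges

# Toy 3-node example: node 2 is cheaper to reach via node 1 than directly.
cost = [[0, 1, 4],
        [1, 0, 1],
        [4, 1, 0]]
tree = mst_from_sink(cost, 0)
```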
All considered schemes were implemented in Matlab R2022b on a computer equipped with an Intel Core i7-11700 and 16 GB of memory, without GPU acceleration.

Fig. 4 shows how the resulting raw topology (not post-processed) gradually changes for the example given in Fig. 1 as learning progresses. Each sub-figure shows the raw topology result at epoch 0, 10, and 50. We can see that the proposed scheme effectively converges to a sub-optimal topology within a reasonable number of training epochs. As shown in Fig. 4, the proposed VAE scheme converges to the final topology within 50 iterations (taking roughly 60 seconds to compute). The enlarged final topology for this network example is depicted in Appendix A.

Fig. 5 shows the time slot allocation results for the example given in Fig. 1. Note that relaying nodes such as node IDs 1, 4, 5, and 15 are allocated relatively more time slots, as are the nodes in an inferior power/position environment such as node IDs 12 and 14. This shows that the proposed IB time slot allocation algorithm effectively allocates the time slots fairly.

Fig. 6 shows the training loss and performance (bits/Hz, B_min) convergence with respect to the number of training epochs for the proposed VAE and PT-EVM hybrid scheme under the N_d = 25, N_b = 2 sample previously described in Fig. 1.

B. EXPERIMENT ANALYSIS AND DISCUSSION
The blue dashed line in Fig. 6 indicates the point at which the proposed VAE scheme ends its training. Fig. 6 shows that (R_sim)_i has an initial value of zero and a final value of about 0.2 for the network configuration in Fig. 1, meaning that the training loss value L starts from 1 and finalizes at approximately 0.7 to 0.8, respectively. Therefore, the training loss value is bound to show minimal change.

Fig. 7 and Table 1 show the topology results and performances (bits/Hz) of the considered schemes with respect to the transmit power of the PB. It can be seen that the topology results of the proposed scheme adapt to the transmit power of the PB, showing a preference for direct connections in a high-power environment and for relaying connections in a low-power environment. The MST algorithm yields an identical topology despite the changing power configuration, showing a lack of adaptability to various IoT network configurations. As shown in Table 1, we can also confirm that our proposed scheme outperforms the other schemes over the different configurations of the transmit power of the PB.

Fig. 8 shows the measured computation time of the considered schemes with respect to the number of nodes, from N_d = 5 to 25 with an interval of 2. Note that the number of PBs does not affect the measured computation time. We can see that the computation time of the optimal solution search grows exponentially, reaching approximately 2 hours for the N_d = 7 case, which makes the exhaustive search method infeasible in practical environments, e.g., N_d ≥ 8. Consequently, the computation time of the optimal solution is not presented for N_d ≥ 8. This suggests the need for an alternative scheme with sub-optimal performance and rational computation time.
Although our proposed VAE and PT-EVM scheme takes more computation time than the other considered schemes, the computation time gap between the proposed scheme and the conventional schemes decreases as the order of magnitude of N_d increases. This makes the proposed scheme plausible for use over large IoT networks.

Fig. 9(a) to 9(c) show the minimum bits per frequency with respect to N_b ∈ {1, 2, 3} and N_d ∈ {5, 6, 7}, respectively, over all the considered schemes. Each data point in Fig. 9(a) to 9(c) is averaged over the performance results of 30 random distributions of nodes and PBs. The minimum bits/Hz can be converted to a rate by multiplying by BW·T, and the difference between the minimum and maximum bits/Hz is guaranteed to be below 10^−6 since Alg. 1 is implemented with ϵ_1 = 10^−6 bits/Hz. Fig. 9(a) to 9(c) show that the minimum bits/Hz increases for all considered schemes as N_b increases, since the received power at each node increases. In particular, the MST scheme shows relatively good performance in the low-power condition, while the direct connect scheme shows good performance in the high-power condition; these schemes thus have strengths in different power environments. Conversely, our proposed scheme achieves better performance than all other considered schemes at all configurations of the number of nodes and PBs, coming closest to the optimal solution we found.

Fig. 10(a) to 10(c) show the minimum bits/Hz with respect to N_b ∈ {1, 2, 3, 4, 5} and N_d ∈ {10, 20, 30}. Note that each data point in Fig. 10(a) to 10(c) is also averaged over the performance results of 30 random distributions of nodes and PBs. The greedy method shows lower performance relative to the direct connect method than in the smaller-N_d cases of Fig. 9(a) to 9(c), since the optimal choice from a local perspective may not be appropriate from a global perspective, thus deteriorating the overall performance of the network.
Our proposed scheme is also confirmed to be superior to the three other considered schemes, ranking highest at all simulation configurations. In these cases, we do not provide the optimal solutions due to the excessive computation time of an exhaustive search. Also, as shown in Fig. 9(a) to 10(c), our proposed scheme is superior for various N_d from 5 to 30, illustrating the scalability of our method. Therefore, we can confirm that our proposed scheme outperforms the other existing schemes while achieving sub-optimal performance. Subsequently, we can infer three reasons for the performance superiority and adaptability of our proposed scheme toward varying environment configurations. i) The VAE stage has a non-linearity property within the model, thereby effectively coping with our NP-hard mixed-integer system model problem. ii) The backward-pass based evaluation stage and its novel packet-tracing algorithm give a sufficiently concise and accurate assessment of the output topology from the VAE part, thus providing the VAE stage with an appropriate learning direction. iii) Lastly, our proposed IB based time allocation algorithm fairly distributes the time slots in the TDMA system, resulting in effective final fine-tuning of the formulated max-min problem.

V. CONCLUSIONS
In this study, we formulated a max-min optimization problem for the TDMA based IoT relay network with energy harvesting (EH). We proposed a VAE based module to effectively solve the formulated problem despite the lack of a dataset and the non-linear characteristic of the problem. We also proposed a novel backward-pass based assessment algorithm called "Packet-Tracing" to precisely and concisely assess and train the proposed VAE module. Finally, we proposed an IB time slot allocation algorithm to achieve TDMA fine-tuning optimization in the fairness aspect. We presented a practical example of our proposed scheme running along with the loss and performance plots, showing its stability and performance. We also provided a computation and performance comparison with three other conventional schemes and the optimal brute-force solution. We observed and confirmed through numerical simulations that our proposed scheme is both stable and superior to the other considered schemes.

[Figure: Output of proposed scheme in the example given in Fig. 1]