Convolutional Neural Networks for Flexible Payload Management in VHTS Systems

Very high throughput satellite (VHTS) systems are expected to face a large increase in traffic demand in the near future. However, this increase will not be uniform throughout the service area because of the nonuniform user distribution and the changing traffic demand during the day. This problem is addressed using flexible payload architectures, which allow payload resources to be allocated flexibly to meet the traffic demand of each beam, leading to dynamic resource management (DRM) approaches. However, DRM adds significant complexity to VHTS systems, which is why, in this article, we analyze the use of convolutional neural networks (CNNs) to manage the resources available in flexible payload architectures for DRM. The VHTS system model is first outlined to introduce the DRM problem statement and the CNN-based solution. Different payload architectures are compared in terms of DRM response, and the CNN algorithm performance is compared with that of three other algorithms previously suggested in the literature, to demonstrate the effectiveness of the suggested approach and to examine the challenges involved.

, based on multibeam coverage with polarization schemes, frequency reuse, and spectrum optimization [4].
Nowadays, VHTS systems provide uniform throughput over the entire service area; however, traffic demands are expected to be nonuniformly distributed over the service area since the user distribution is not uniform within the coverage. This will result in a system where some beams do not have the required capacity, i.e., do not meet the traffic demand, whereas other beams exceed the required capacity, i.e., waste resources [5], [6]. On the other hand, operators claim that one of the main challenges in the design of future satellite broadband systems is how to increase satellite revenues while addressing uneven and dynamic traffic demands. In that sense, a flexible payload is a promising solution to meet changing traffic demand patterns [3], [7]-[9].
Most existing satellite communication (SatCom) payloads do not offer any flexibility in terms of either bandwidth or coverage. Power flexibility can instead be achieved by modifying the working point of the on-board amplifier according to the transponder loading. Recently, research interest has focused on designing a new generation of flexible satellite payloads enabling dynamic resource management (DRM) [7]-[9].
Cocco et al. [10] represent the problem of radio resource allocation for VHTS as an objective function that minimizes the error between the offered and the required capacity. However, extensive analysis is required for both the payload architecture design and resource management.
The next-generation VHTS systems will provide terabit connections using advanced flexible payloads, which will allow the redirection and reconfiguration of beams, in addition to individual per-beam power and bandwidth allocation. Thus, DRM techniques for SatCom will be key for operators [11]. While it may seem feasible to solve this problem through optimization techniques, on a larger scale, the number of resources to be managed, the constraints coming from the system, and the infinite number of traffic demand situations may lead to a problem that cannot be solved by conventional techniques [12]. To solve this problem, Liu et al. [13] suggested an assignment game-based dynamic power allocation (AG-DPA) to achieve a suboptimal solution with low complexity in multibeam satellite systems. The authors compare the results obtained with a proportional power allocation (PPA) algorithm, obtaining a remarkable advantage in terms of power saving; however, the management of resources is still insufficient for the required demand.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, an increasing amount of research effort has proposed solving the DRM problem in SatCom using machine learning (ML) techniques. In this sense, Ferreira et al. [14] discuss why ML techniques should be used instead of traditional optimization techniques for resource allocation in SatCom. The authors comment that the use of traditional optimization techniques is limited when the number of communication resources and the selection of a set of them present contradictory objectives. Alternatively, the use of ML techniques in SatCom has overcome some of the limitations of other approaches to resource allocation in cognitive radios.
Accordingly, interest in using ML algorithms in satellite systems has increased recently [14]-[16]. There have been some technological advances in the use of ML on board communications satellites (e.g., by NASA for cognitive space communications) [17]-[19]. In addition, a model for satellite image motion deblurring using a convolutional neural network (CNN) is proposed in [20], and the results obtained confirm that the proposed method reliably eliminates more motion blur than conventional methods. Similarly, a new method for satellite rapid positioning using neural-network-based beam scanning is proposed in [21].
In [22], the application of reinforcement learning to manage the complex space link operations occurring in LEO and deep space missions shows promising results. Here, the authors propose the use of software-defined radio technology to enable the flexibility and configurability that future cognitive communication missions will require, thus allowing the system to adapt to changes through software updates.
Ortiz-Gomez et al. [23], [24] propose to solve the DRM problem using a neural network through a classification algorithm, where the classes correspond to all the possible configurations of payload resource allocation. The management of the payload resources is performed autonomously; the main advantage of this methodology is that the management is performed with a low computational cost, since the neural network training is performed offline. However, this methodology has several challenges; one of them is the exponential dependence of the number of classes on the number of beams and on the possible variations of power, bandwidth, and/or beamwidth, which results in unsolvable problems as flexibility increases.
Liu et al. [16] suggest a novel dynamic channel allocation algorithm based on deep reinforcement learning (DRL-DCA) in multibeam satellite systems, where the results showed that this algorithm can achieve a lower blocking probability compared to traditional algorithms; however, joint channel and power allocation is not taken into consideration. Continuing with deep reinforcement learning (DRL) architectures, Ferreira et al. [14], [15] demonstrated that a feasible solution to real-time and single-channel resource allocation problems can be designed. However, in their proposed study, DRL architectures are based on quantizing the resources before they are allocated, whereas satellite resources, such as power, are inherently continuous. In that sense, Luis et al. [25] explore a DRL architecture for power allocation that uses continuous state and action spaces, avoiding the need for discretization. Nonetheless, the policy is not optimal, as some demand is still being lost.
The operation of DRL algorithms for DRM has proven to have great advantages among the various ML algorithms; however, latency has a vital role in SatCom. Hence, the great disadvantage of DRL algorithms is the additional latency due to the online processing delay. When the algorithm is implemented offline, the DRM functions as an intelligent switch for the SatCom system, reducing the added latency [23].
Some of the recent literature considers DRM with only two flexible resources [10], [11], [23], i.e., power and bandwidth, whereas other studies consider only one flexible resource [14], [15], [25]. Differently from previous approaches, the main objective of this article is to solve the DRM problem using an ML algorithm based on CNNs.
The main contributions of this article can be listed as follows.
1) A CNN algorithm is suggested that allows the system to be implemented offline, thus avoiding the latency problems of DRL.
2) A novel system model and cost function are suggested that allow for an optimal solution, which determines how to match resources to a demand pattern while minimizing the resource consumption of satellites.
3) Differently from previous approaches, we consider three possible flexible resources for the study of DRM, i.e., power, bandwidth, and beamwidth.
4) A comparison is made between different flexible payloads according to the three previously identified resources. The CNN algorithm performance is compared with three other algorithms: PPA [13], AG-DPA [13], and the classification algorithm [23].
The article is organized as follows. Section II includes the considered system model and the problem statement, Section III presents the CNN-based algorithm, Section IV presents the simulation results and the analysis of the case study, and finally, Section V concludes the article. In the Appendix, the CNN design for the considered DRM problem is outlined.

II. SYSTEM MODEL AND PROBLEM STATEMENT
In this section, the system architecture is introduced, and the DRM problem is defined.

A. System Architecture
Fig. 1 depicts the high-level model of the system. The considered payload is assumed to be able to flexibly manage three resource types, i.e., power, bandwidth, and beamwidth, similar to that in [26] and [27]. The flexibility in these resources is achieved using the architecture shown in Fig. 1.
From the technological point of view, the flexible power allocation can be obtained through traveling wave tube amplifiers adapting the input back-off. At the same time, the flexible payload must be able to separate the signals into frequency blocks and then rearrange them to obtain a flexible bandwidth; this process requires a channelizer on board the satellite, as mentioned in [27], able to identify the frequency plan (color assignment by frequency and polarization). The proposed system has a four-color frequency plan (two frequencies and two polarizations). Finally, to achieve flexibility in beamwidth, the output multiplexer of the traditional payload must be replaced by beam forming networks (BFNs). The BFN configuration change is halfway between the possibility of synthesizing any beam and the possibility of choosing from a set of configurations for the same coverage [7].
The proposed system manages communication resources in response to changes in traffic demand. The payload manager receives the input data from the gateways and the user beams, and, then, generates an optimal control through the payload control center using a DRM and sends it to the satellite. This in turn affects the downlinks of the users.
In the following, we consider that the system is composed of B beams, in which the overall bandwidth is BW and the transmission power of the bth beam is P_b.

B. Link Budget
We assume that the capacity C_b offered by the bth beam depends on the bandwidth allocated to the beam, BW_b, and the spectral efficiency in the beam, SE_b, as given in (1), where SE_b is the spectral efficiency of the modulation and coding scheme of a commercial reference modem used to obtain the capacity of each beam [28]. Hence, in (2), the variation of the spectral efficiency of each beam with the carrier-to-interference-plus-noise ratio (CINR_b) is modeled through a generic function f_1(·) as in [3] and [28]. A bent-pipe transponder architecture is considered in the satellite. The feeder link from the gateway to the satellite is not considered in the forward link because different technologies, such as uplink power control and gateway diversity, are considered for guaranteeing the total link budget [29], [30]. In this sense, the downlink link budget in the user link can be written as in (3), where CINR_b, CIR_b (i.e., carrier-to-interference ratio), and CNR_b (i.e., carrier-to-noise ratio) are expressed in decibels. CIR_b in (4) represents the ratio of the power allocated to the bth beam (P_b, in dBW) to the interference power at the bth beam (I_b, in dBW). The beam gain must be evaluated for the bth beam in the service area, and it depends on θ_b, which is the beamwidth, i.e., the 3-dB aperture angle of the bth beam. The sidelobes of the satellite antenna pattern are taken into consideration only for the calculation of the cochannel interference due to spatially separated cochannel beams (the same color in the frequency plan). The cochannel interference power (I_b) is a function of the frequency reuse scheme and θ_b, and can be calculated from the set of cochannel beams in the system as in (5), where ϕ denotes the ϕth interfering spot, Φ is the total number of beams interfering with beam b, and P_co is the power level (in W) of the ϕth interferer inside the bth beam.
The traditional calculation of CNR_b is defined as a function of the beam power (P_b, in dBW), beam gain (G_b, in dB), and bandwidth (BW_b, in Hz) assigned to each beam, as presented in (6). G_b represents the beam antenna gain at the user location, whose dependence on the beamwidth (θ_b) can be modeled through an adequate function f_2(·) [3]. CNR_b also depends on the antenna gain of the user (G_u, in dB), the Boltzmann constant (k, in W/(K·Hz)), and the system temperature (T_sys, in K). In addition, the free space loss, atmospheric loss [31], transmission loss, and receive loss (L_{FS,b}, L_{atm,b}, L_{RF,b}, and L_{RF,u}, respectively, in dB) are included in the total link loss (L_b, in dB).

C. Traffic Model
In [32], traffic models representing realistic operating scenarios are introduced. The authors consider four different datasets that provide measurements for all beams and cover a sufficiently long time window. These data were provided by SES S.A.; by exploiting these datasets, the authors aim to represent, first, the demand behavior during a typical daily operation cycle, and second, the infrequent cases that could lead to major service failures if the algorithms do not provide adequate results.
The reference model used in [32] represents the throughput demand in a typical commercial scenario, where a higher data rate is requested during specific time intervals of the day.
It is possible to define x as a specific geographic area of 1 km² and A_b as the area of the bth beam (in km²). In this sense, r_x(t) is defined as the required throughput density (in bps/km²) inside x at time t, r_b(t) as its expected value over the whole area inside the bth beam at time t (in bps/km²), and R_b(t) as the requested traffic for the bth beam. The throughput density per km² depends on the throughput per user (C_u, in bps/user), the population density (D_x, in inhabitants/km²), the penetration rate (F_x, in users/inhabitant), and the concurrence rate, which depends on the time of day (T_x(t)). The variation of the concurrence rate throughout the day (T_x(t)) is obtained by simulating the behavior of the data presented in [32]. The traffic demand in each beam behaves as shown in Fig. 2, in which the traffic demand varies depending on the time of day. The figure shows a one-day cycle of traffic demand in two different beams (numbers 1 and 37).
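The traffic model above can be sketched as follows. The product form of the throughput density and the summation over 1 km² cells follow from the units given in the text, but they are assumptions rather than the article's exact equations; the daily profile in `concurrence_rate` is a purely hypothetical raised cosine and does not reproduce the data of [32].

```python
import math


def throughput_density(c_u_bps, density_inh_km2, penetration, concurrence):
    """r_x(t) in bps/km²: throughput per user times the active users per km²."""
    return c_u_bps * density_inh_km2 * penetration * concurrence


def concurrence_rate(hour, low=0.1, high=0.5, peak_hour=20.0):
    """Hypothetical T_x(t): raised cosine over 24 h that peaks at peak_hour."""
    phase = 2.0 * math.pi * (hour - peak_hour) / 24.0
    return low + (high - low) * 0.5 * (1.0 + math.cos(phase))


def beam_demand(cells, c_u_bps, hour):
    """R_b(t) in bps for a beam described by its 1 km² cells.

    cells is a list of (population density, penetration rate) pairs; summing
    the density of each 1 km² cell equals A_b times the mean density r_b(t).
    """
    t_x = concurrence_rate(hour)
    return sum(throughput_density(c_u_bps, d, f, t_x) for d, f in cells)
```

For example, `beam_demand([(100, 0.1), (200, 0.1)], 1e6, 20.0)` evaluates the demand of a two-cell beam at the assumed evening peak.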

D. DRM Cost Function
The DRM must manage the available resources to minimize the error between the offered capacity in each beam (C_b) and the required capacity (R_b) while optimizing the used resources (power, bandwidth, and beamwidth) over time. In that sense, the DRM cost function is defined as in (12), subject to (13)-(18), where the beamforming antenna generates B beams over the coverage area. The capacity offered by the bth beam at time t, C_b(t) (in bps), must change as it depends on R_b(t) (in bps), the required capacity in the bth beam at time t.
The DRM cost function proposed in (12) aims at minimizing three terms for each time instant t. The first term is the error between the offered capacity and the required capacity, where α (in s/b) is the weight of the error in the cost function. The second term is the total effective isotropic radiated power (EIRP) assigned to all the beams (in W), where β (in 1/W) is the weight of the total EIRP. The third term is the total bandwidth (in Hz) assigned to the beams of each color (BW_c) within the frequency plan (N_c is the number of colors in the frequency plan), where γ (in 1/Hz or s) is the weight of the total bandwidth assigned in each color of the frequency plan.
From (13), the offered capacity in the bth beam at time t, C_b(t), is a function f_3(·) of the power, beamwidth, and bandwidth assigned to the bth beam at time t (P_b(t), θ_b(t), and BW_b^c(t), respectively) [3]. On the other hand, the EIRP of the bth beam at instant t, EIRP_b(t), is a function f_4(·) of the power and beamwidth assigned to the bth beam at time t (14) [3].
Equation (15) introduces the minimum capacity constraint, stating an important limitation of the proposed cost function, where the offered capacity must be greater than or equal to the required capacity for each beam, with the condition that the power, beamwidth, and bandwidth allocated to the bth beam at time t are less than the maximum allowed for each beam (P_{max,b}, θ_max, and BW_{max,b}). In case the offered capacity cannot satisfy the beam requirement constraint, the capacity offered on the bth beam shall be the maximum possible value.
Equations (16) and (17) represent the other two constraints of the cost function. These constraints are that the total power used (i.e., Σ_{b=1}^{B} P_b(t)) should not be greater than the maximum total system power (P_{max,S}), and that the total bandwidth allocated in each color of the frequency plan should not be greater than the available bandwidth per color (BW_{max,c}). In addition, the beamwidth of the bth beam must belong to the set of possible configurations previously established (18). The selected beamwidths must meet the requirement of completely covering the entire service area.
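As an illustration of how the three weighted terms and the system-level constraints fit together, the following sketch evaluates one candidate allocation at a single time instant. It is a sketch under stated assumptions: the capacity error is taken as a sum of absolute differences, which the text does not confirm, and the function names and argument layout are hypothetical.

```python
def drm_cost(offered_bps, required_bps, eirp_w, bw_per_color_hz,
             alpha, beta, gamma):
    """Weighted sum of capacity error, total EIRP, and total bandwidth per color.

    Assumption: the error term is the sum over beams of |C_b(t) - R_b(t)|.
    """
    capacity_error = sum(abs(c - r) for c, r in zip(offered_bps, required_bps))
    return (alpha * capacity_error
            + beta * sum(eirp_w)
            + gamma * sum(bw_per_color_hz))


def satisfies_system_limits(power_w, p_max_system_w,
                            bw_per_color_hz, bw_max_color_hz,
                            beamwidths_deg, allowed_beamwidths_deg):
    """Check the total power, per-color bandwidth, and beamwidth-set limits."""
    power_ok = sum(power_w) <= p_max_system_w
    bandwidth_ok = all(bw <= bw_max_color_hz for bw in bw_per_color_hz)
    beamwidth_ok = all(th in allowed_beamwidths_deg for th in beamwidths_deg)
    return power_ok and bandwidth_ok and beamwidth_ok
```

Because α, β, and γ carry the inverse units of their terms, the returned cost is dimensionless, so the three terms can be traded off directly.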

III. CNN-BASED DRM ALGORITHM
A CNN algorithm handling resource allocation is the main part of the proposed DRM. The CNN determines how to match resources to a demand pattern while minimizing the resource consumption of satellites. Other ML algorithms are capable of providing solutions for time-variant data; however, a CNN architecture has been chosen because the distribution of traffic demand in the service area can be represented by a spatial dependence, and CNNs have demonstrated good performance in exploiting the features of spatial distributions [33], [34].
The training of the CNN is performed offline so that from the SatCom system point of view, the CNN will be operating as an intelligent switch performing the DRM.
In this section, we introduce the proposed CNN algorithm for managing the dynamic resources. For more details on CNN operations, the CNN architecture is introduced in the Appendix, whereas more general details on CNN can be found in [33]- [35].

A. CNN Architecture
A CNN is a deep learning (DL) algorithm that can take an input and assign importance (weights) to various aspects or objects in the input in order to distinguish one from another. The preprocessing required in a CNN is much lower than in other classification algorithms [33]-[35]. In that sense, a CNN can successfully capture the spatial and temporal dependencies in the input through the application of relevant filters. Compared with other classification algorithms, the CNN architecture improves the processing performance on a dataset thanks to the reduction in the number of parameters and the reuse of the weights. Thus, the CNN can be trained to better understand the complexity of the neural network input. Fig. 3 presents a typical CNN architecture. In order to understand its behavior, it is divided into four main CNN operations: 1) convolution; 2) nonlinearity with a rectified linear unit (ReLU); 3) pooling or subsampling; 4) classification (full-connected layer). One of the most typical CNN applications is image classification. In image classification, the channel is a conventional term used to refer to a certain component of an image. For example, an image from a standard digital camera has three channels: red, green, and blue. The channels can be seen as three 2-D matrices stacked together (one for each color), meaning that at the CNN input, we have a tensor of matrices [35].
CNNs derive their name from the "convolution" operator. The main goal of convolution in the case of a CNN is to extract features from the input image (tensor of matrices). Convolution preserves the spatial relationship between pixels by learning the features of the image using small squares of input data.
The ReLU is an element-wise operation that replaces all negative values in the feature map with zero. The purpose of ReLU is to introduce nonlinearity into the CNN [33].
On the other hand, spatial pooling (also called subsampling) reduces the dimensionality of each feature map but retains the most important information. The result of the convolutional and pooling layers represents high-level features of the input. The purpose of the full-connected layer is to use these features to classify the input image into various classes according to the training dataset [33]-[35].
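The first three operations described above (convolution, ReLU, and pooling) can be illustrated on a single small matrix with plain Python lists. This is a didactic sketch, not the implementation used in the article; as in most deep learning libraries, the "convolution" is computed as a sliding dot product without flipping the kernel, and no padding is applied.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(k_rows) for v in range(k_cols))
             for j in range(out_cols)]
            for i in range(out_rows)]


def relu(feature_map):
    """Replace every negative element of the feature map with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2 x 2 window."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, cols - 1, 2)]
            for i in range(0, rows - 1, 2)]
```

Chaining the three functions as `max_pool_2x2(relu(conv2d_valid(x, k)))` mirrors one convolution block of the architecture in Fig. 3.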
The CNN belongs to the family of supervised DL algorithms, so two sets of data are required for its operation: the training data and the test data. Observations in the training set form the experience that the algorithm uses to learn. In supervised learning problems, each observation consists of one observed output variable and one or more observed inputs, whereas the test data are a set of observations used to evaluate model performance using some efficiency metric, such as accuracy [33]-[35].

B. CNN for DRM
In this article, we propose to adapt the CNN architecture, usually considered in image classification problems, to solve the DRM problem. The traditional CNN architecture has been adapted in the input layer and in the output layer. Fig. 4 represents the adaptation in the input layer, where there is a tensor of matrices and each matrix represents the required capacity at each geographical position in the service area. The service area does not have a regular geometrical shape, so the geographical coordinates contained in each matrix outside the service area are set to zero. With this adaptation, the depth of the matrix tensor does not represent the channels of an image but the time instants (states) in which the system is evaluated. That is, the depth of the matrix tensor is given by the vector {t, t − 1, t − 2, ..., t − T + 1}, where t is the current time instant and T is the time window size.
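A minimal sketch of how such an input tensor could be assembled is given below. The class `DemandWindow`, the Boolean `service_mask`, and the list-of-lists representation are hypothetical choices made for illustration; the article does not describe its data pipeline at this level of detail.

```python
from collections import deque


def mask_outside_service_area(demand_grid, service_mask):
    """Set to zero every grid position that lies outside the service area."""
    return [[demand if inside else 0.0
             for demand, inside in zip(demand_row, mask_row)]
            for demand_row, mask_row in zip(demand_grid, service_mask)]


class DemandWindow:
    """Keeps the most recent demand matrices, newest first (hypothetical helper)."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest matrix once the window is full.
        self.frames = deque(maxlen=window_size)

    def push(self, demand_grid, service_mask):
        self.frames.appendleft(
            mask_outside_service_area(demand_grid, service_mask))

    def tensor(self):
        """Return the stacked matrices; index 0 is the current time instant."""
        return list(self.frames)
```

Each call to `push` adds the demand matrix of a new time instant, so after the window has filled, `tensor()` always has a depth equal to the time window size.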
Depending on these features and on the constraints and flexibility of the payload, there is a set of possible configurations for allocating resources in the beams. These possible configurations are coded in a vector of size L, where L is the number of possible configurations of the payload resources. By exploiting this, the CNN provides, in the output layer, the configuration that minimizes the cost function in (12) for the conditions of the input layer. This is shown in Fig. 5. Fig. 6 shows the scheme of the different layers of the CNN. The input layer is a matrix tensor with the required capacities at each geographic coordinate. The convolution layers are used to obtain the main features of the traffic demand over the geographic coordinates. The full-connected layers are used to allocate the resource configuration with the features obtained in the convolution layers. The output layer results in a resource allocation that minimizes the DRM cost function (12).
In convolutional layers, the single neurons of a perceptron are replaced by matrix processors; thus, they perform an operation on the input matrix data rather than on a single numerical value. For a better understanding, the CNN architecture and related parameters are explained in additional detail in the Appendix. The output of each convolutional neuron is calculated as in [33], where Y_j, the output of the jth neuron, is a matrix calculated as a linear combination of the outputs Y_i of the neurons in the previous layer, each convolved with the kernel K_ij corresponding to that connection; this amount is added to a bias term p_j and then passed through a nonlinear activation function g_1(·). The convolution operator has the effect of filtering the input matrix with a previously trained kernel.
After the convolution layers, the data finally reach the full-connected layers, where they are refined; thus, these layers implement the resource configuration that minimizes the DRM cost function (12). The neurons in these layers work identically to those in a multilayer perceptron, where the output of each neuron of each layer is calculated as in [33]. The output y_j of the jth neuron is a value calculated as the linear combination of the outputs y_i of the neurons in the previous layer, each multiplied by a weight w_ij corresponding to that connection. This amount is added to a bias term p_j and then passed through a nonlinear activation function g_2(·).
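Written out from the verbal definitions above, the two neuron outputs take the following standard form. This is a reconstruction under the assumption that the sums run over all neurons i of the previous layer and that ⊗ denotes the 2-D convolution; it is not a transcription of the article's numbered equations.

```latex
% Convolutional neuron: matrix-valued output
Y_j = g_1\!\left( \sum_{i} K_{ij} \otimes Y_i + p_j \right)

% Full-connected neuron: scalar output
y_j = g_2\!\left( \sum_{i} w_{ij}\, y_i + p_j \right)
```

The two expressions share the same structure; the only difference is that the scalar product w_ij y_i of the perceptron is replaced by a convolution with the kernel K_ij.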
The CNN cost function is represented by the error between the expected and the obtained values, where Ω is used to denote the CNN configuration. That is, Ω is a function f_5(·) of W and K, the values of the weights and biases, whereas D is the size of the training set, Q(·) is a function to be determined by the CNN configuration, and ỹ represents the expected values [33]. Training data are generated with the traffic demand model defined in (9) and (10), and the labels designated (Y) for training are obtained with the DRM cost function (12).
CNN training is performed with the backpropagation algorithm [34]. An accuracy parameter is defined for this purpose, where accuracy is a metric for measuring the performance of the CNN and is defined in (23) as the ratio of the total number of times the correct resource configuration was allocated (RA_correct) to the total number of times a resource configuration was allocated (RA_total), i.e., Accuracy = RA_correct / RA_total.
The proposed system manages the communication resources in response to changes in traffic demand. The payload manager receives the input data from the gateways and the user beams, generates an optimal control through the DRM, and sends it to the satellite. This in turn affects the downlinks of the users.
A CNN algorithm that handles resource allocation is at the core of the DRM (see Fig. 7). The CNN determines how to match resources to a demand pattern while minimizing the resource consumption of satellites. The training of the network is performed offline so that, for the SatCom system, the CNN represents an intelligent switch that works as a DRM.
The software tool chain used to implement the CNN consists of a Jupyter development environment using Keras 2.0. The CNN has been defined using four convolutional layers (Conv) and two full-connected layers (FC), as shown in Table I. The first convolutional layer, Conv1, consists of 16 kernels, each of them 10 × 10 in size. The second convolutional layer, Conv2, consists of 16 kernels, each with a size of 8 × 8. The third convolutional layer, Conv3, consists of 32 kernels, each with a size of 5 × 5, and Conv4 consists of 32 kernels, each with a size of 3 × 3. The first full-connected layer, FC1, reshapes the output of Conv4 using a flatten layer. The second layer (FC2) is connected to the classification layer. The input to the Conv1 layer has the size of the input layer tensor, with matrices of 256 × 256 and a depth equal to T (i.e., the time window size). One pooling layer is added after each convolutional layer, taking the maximum value over 2 × 2 windows to reduce the size of the convolutional layer outputs. The classification layer provides L probabilities, the lth of which is the probability that the lth resource configuration corresponds to the received input.
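To make the layer dimensions concrete, the following sketch walks a 256 × 256 input through the four convolution and pooling stages and counts the convolutional parameters. The text does not state the padding mode or the strides, so unpadded ("valid") convolutions with unit stride and 2 × 2 pooling with stride 2 are assumed here; the resulting sizes would differ under other settings and may not match Table I.

```python
# (layer name, number of kernels, kernel side) as listed in the text
CONV_LAYERS = [("Conv1", 16, 10), ("Conv2", 16, 8),
               ("Conv3", 32, 5), ("Conv4", 32, 3)]


def feature_map_sizes(input_size=256, pool=2):
    """Side length of each feature map after convolution and max pooling."""
    sizes = []
    size = input_size
    for name, kernels, kernel_side in CONV_LAYERS:
        size = size - kernel_side + 1  # assumed 'valid' convolution, stride 1
        size = size // pool            # assumed 2 x 2 pooling, stride 2
        sizes.append((name, kernels, size))
    return sizes


def flattened_length(input_size=256):
    """Number of values handed to the first full-connected layer."""
    _, kernels, size = feature_map_sizes(input_size)[-1]
    return kernels * size * size


def conv_parameter_count(time_window=3):
    """Trainable weights and biases in the four convolutional layers."""
    total = 0
    in_channels = time_window  # input depth equals the time window size T
    for _, kernels, kernel_side in CONV_LAYERS:
        total += kernels * (kernel_side * kernel_side * in_channels + 1)
        in_channels = kernels
    return total
```

Under these assumptions the feature maps shrink from 256 to 123, 58, 27, and finally 12 pixels per side, so the flatten step yields 4608 values for T = 3.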

C. Performance Evaluation
To evaluate the performance of the DRM algorithm, two key performance indicators (KPIs) are proposed with which an important tradeoff can be observed.
The first KPI is the normalized mean error of the DRM algorithm, defined in (24), where ρ is the normalization parameter. The normalization parameter allows the performance of the different DRM algorithms to be compared when they are used in different traffic demand scenarios. The second KPI is the power saving [13], defined in (25), where P_{Total,UPA} is the total payload power when using a uniform power allocation (UPA) and P_{Total,Alg} is the total payload power when using the power allocation of the proposed algorithm.
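The two indicators can be sketched as follows. Both expressions are assumptions consistent with the description above rather than the article's exact definitions: the error is taken as the mean absolute difference between offered and required capacity divided by ρ, and the power saving as the relative reduction with respect to UPA, with powers expressed in W.

```python
def normalized_mean_error(offered_bps, required_bps, rho):
    """KPI 1 (assumed form): mean absolute capacity error normalized by rho."""
    errors = [abs(c - r) for c, r in zip(offered_bps, required_bps)]
    return sum(errors) / (len(errors) * rho)


def power_saving(p_total_upa_w, p_total_alg_w):
    """KPI 2 (assumed form): fraction of the UPA power saved by the algorithm."""
    return (p_total_upa_w - p_total_alg_w) / p_total_upa_w
```

A lower value of the first indicator and a higher value of the second are desirable, which is the tradeoff mentioned at the start of this subsection.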

IV. NUMERICAL RESULTS AND ANALYSIS
In this section, the numerical results obtained in a reference scenario are presented, comparing the performance of the proposed CNN approach with the PPA, AG-DPA, and classification algorithms [13], [23]. The selected reference scenario corresponds to a multibeam coverage for Europe and the Mediterranean basin with 82 beams, similar to the coverage currently provided by KA-SAT [36]. Table II shows the characteristics of the eight different payload architectures that have been evaluated using a CNN for DRM. In order to analyze the advantage of the proposed CNN approach, we considered different payload architectures achieved by exploiting three parameters (i.e., beamwidth, bandwidth per beam, and power per beam) in a flexible way.
The flexibility obtained by changing the beamwidth provides the possibility of generating irregular beam coverage, allowing adaptation to changing requests. However, the flexibility of the beamwidth size depends on the definition of the BFN [7]. In this sense, three possible beamwidth sizes are considered (see Table II). Each payload has fixed or flexible parameters according to the features presented in Table II.
These parameters represent the resource allocation per beam. The fixed parameters are power with 15 dBW, bandwidth with 250 MHz, and beamwidth with 0.60°. The flexible parameters are: power 8-15 dBW with steps of 0.5 dB, bandwidth 100, 200, or 250 MHz, and beamwidth 0.55°, 0.60°, or 0.65°. The three possible beamwidth values comply with the constraint set in (17). In a multibeam system, the power allocation is usually a continuous variable [25]; however, in this article, the power allocation is considered to be performed by selecting one value among several in a set of possible power levels per beam, due to existing technologies that allow power to be modified in 0.5-dB steps [7].
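The size of the per-beam search space implied by these flexible parameters can be checked with a few lines of code. The enumeration below ignores the system-level constraints, so it gives an upper bound on the per-beam options of a fully flexible payload, not the number L of configurations actually used by the CNN; the function names are illustrative.

```python
from itertools import product

BANDWIDTHS_MHZ = (100, 200, 250)
BEAMWIDTHS_DEG = (0.55, 0.60, 0.65)


def power_levels_dbw(start=8.0, stop=15.0, step=0.5):
    """Power levels from 8 to 15 dBW in 0.5-dB steps, both ends included."""
    count = int(round((stop - start) / step)) + 1
    return [start + index * step for index in range(count)]


def per_beam_configurations():
    """All (power, bandwidth, beamwidth) combinations for a single beam."""
    return list(product(power_levels_dbw(), BANDWIDTHS_MHZ, BEAMWIDTHS_DEG))
```

With 15 power levels, three bandwidths, and three beamwidths, a single beam has 135 candidate settings, which illustrates why the number of joint configurations grows so quickly with the number of beams.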
In Table II, each payload has specific features depending on the defined flexibility parameters. Those have a direct effect on the DRM cost function parameters (α, β, and γ). Payload 1 has all values set to zero because it represents a traditional payload (no flexibility), so it has no parameters to optimize. To minimize the error of the capacity offered and to save resources, the parameters α, β, and γ are used to normalize each term of the cost function. α is equal to (C_{T,P1})^{−1} for all payloads that have some flexibility factor (Payloads 2-8), where C_{T,P1} represents the total capacity provided by Payload 1 (traditional payload). β is equal to (EIRP_{T,P1})^{−1} for all payloads that have flexibility in power and/or beamwidth; otherwise, it is equal to zero, where EIRP_{T,P1} represents the total EIRP provided by Payload 1. Finally, γ is equal to (BW_{T,P1})^{−1} for all payloads that have flexibility in bandwidth; otherwise, it is equal to zero, where BW_{T,P1} represents the total bandwidth provided by Payload 1.

A. CNN Training Analysis
The first step to be addressed is the CNN training, which sets up the CNN parameters for subsequent real-time use.
Algorithm convergence can be observed during training and testing, with accuracy as the performance measure. The CNN accuracy, as defined in (23), gives the relationship between the correctly predicted values and the desired values obtained from the training data at each iteration. The accuracy is very close to 1 in both training and testing.
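As an illustration of this measure, the sketch below computes accuracy as the fraction of predicted class labels that match the desired labels. Equation (23) is not reproduced in this section, so this standard classification form is an assumption, and the label values are made up for the example.

```python
def accuracy(predicted, desired):
    """Fraction of predictions that match the desired labels."""
    if len(predicted) != len(desired):
        raise ValueError("predicted and desired must have the same length")
    if not desired:
        raise ValueError("at least one sample is required")
    correct = sum(1 for p, d in zip(predicted, desired) if p == d)
    return correct / len(desired)


# Hypothetical class indices (each class stands for one resource configuration).
predicted_labels = [3, 7, 7, 1]
desired_labels = [3, 7, 2, 1]
print(accuracy(predicted_labels, desired_labels))  # 0.75
```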
Among the influential parameters, the time window size is the one that most affects the CNN training. Fig. 8 shows the performance of the CNN during training for four different time window sizes, where time window size T = 1 corresponds to only one state, T = 2 to two states, T = 3 to three states, and T = 4 to four states.
The Payload 8 configuration was used to compare the impact of the different time window sizes on the training and test accuracy, with the minimum required accuracy set at 0.97 [34]. The training performance with time window sizes 3 and 4 is almost the same. Since a larger time window size can add more delay to the system, it is concluded that time window size T = 3 is the most suitable for the proposed CNN approach.

B. CNN Performance for DRM and Comparison With Benchmark Algorithms
Once convergence is guaranteed at the training step, the DRM works online as an intelligent switch that manages resources according to the required capacity (see Fig. 7). The DRM attempts to minimize KPI 1 in (24) while maximizing KPI 2 in (25). The performance of the algorithm used for the DRM is therefore evaluated through a joint KPI, $F_3$, defined in (26), where $A_1$ and $A_2$ are weight parameters that assign different importance to the two KPIs. The better an algorithm performs, the lower $F_3$ in (26) is. Assuming that KPI 1 and KPI 2 have the same importance, both $A_1$ and $A_2$ are set to 0.5. Four algorithms have been evaluated for DRM using the Payload 4 architecture (see Table II), a time window size T = 3, and a normalization parameter ρ = $R_{\max}$, where $R_{\max}$ represents the maximum capacity required per beam in each scenario in which the algorithm was evaluated.
The evaluated algorithms are as follows:
1) PPA: Proportional power allocation algorithm [13];
2) AG-DPA: Assignment game-based dynamic power allocation algorithm [13];
3) Classification: Neural network for classification algorithm [23];
4) CNN: Convolutional neural network algorithm.
Fig. 9. Performance comparison of different algorithms for DRM using Payload 4 architecture (see Table II).
Fig. 9 shows the performance comparison of the four algorithms in terms of power management. The figure shows the tradeoff between the defined KPIs. The PPA algorithm ($F_3$ = 0.47) is not suitable because it manages power in proportion to traffic demand without taking into account the interference between beams of the same color [13]. On the other hand, resource management using a classification algorithm with neural networks loses effectiveness as the number of possible traffic demand situations increases [23]; for that reason, it is the algorithm with the worst performance ($F_3$ = 0.48).
The algorithm that obtains the lowest normalized mean error, i.e., KPI 1, is the CNN algorithm (0.4096), whereas the algorithm that obtains the highest power saving is AG-DPA (3.86).
AG-DPA focuses more on optimizing power savings [13] and, therefore, neglects the normalized capacity error, which results in an $F_3$ of 0.38. Based on (26), the best-performing algorithm is therefore the CNN, which obtains an $F_3$ value of 0.34, since the proposed cost function (12) achieves a balance between minimizing KPI 1 and maximizing KPI 2.

C. Payload Architectures Performance
After evaluating the impact of the training phase and the algorithm KPIs, the performance of the different payload architectures presented in Table II was evaluated using the CNN algorithm and a 3-state time window size. Fig. 10 shows the DRM performance for the different payload architectures at a specific time of the day (i.e., 10 A.M.) for the 82 beams in the service area. The blue bars show the required capacity in each beam at 10 A.M. (using the proposed traffic model). The offered capacity of each payload architecture is represented by the respective marker shown in Fig. 10. Payloads 7 (flexibility in bandwidth and power) and 8 (full flexibility) achieve the best performance, since the capacity offered by these payloads closely follows the required capacity.
As expected, Payload 1 (traditional) has the worst performance since there is no flexibility in any of its parameters and the capacity offered is always the same.
The parameters that have the most notable effect on payload performance are power and bandwidth, whereas beamwidth alone (Payload 2) has very little influence on resource management performance (given the options in the case study). However, the beamwidth allows the cost function to be adjusted when it is combined with other flexibility parameters (Payload 5, Payload 6, and Payload 8).
The DRM performance of the eight payloads was evaluated over 48 h and is shown in Figs. 11 and 12. Fig. 11 shows the normalized mean error between the offered capacity and the required capacity of the system. The CNN was used for all payloads and evaluated with the same traffic demand distribution during the 48-h evaluation. For this reason, the mean error presented in Fig. 11 was normalized to ρ = $\mathrm{error}_{\max}$, where $\mathrm{error}_{\max}$ is the maximum error obtained in all cases, corresponding to 0.475 Gb/s. Fig. 11 shows that Payloads 7 and 8 perform better than the rest throughout the 48 h, reducing the mean error between the offered capacity and the required capacity by up to 89% compared to a traditional payload (Payload 1). Under the conditions of the case study, the bandwidth (Payload 3) is the parameter with the greatest influence on the error between the offered capacity and the required capacity, reducing the error by up to 68% compared to a traditional architecture (Payload 1), followed by the power (Payload 4), which reduces the error by up to 51%, and finally the beamwidth, which reduces it by only 9%.
Fig. 12 shows the normalized power consumed by the payloads, representing the inverse of the power saving. It also shows the power requirement of the payload over 48 h. The payloads that have a constant power value (i.e., Payloads 1, 2, 3, and 6) have the highest power requirement. This is due to the conditions of the case study (see Table II): in addition to being fixed, the power assigned to these payloads is the maximum possible value.
The power requirements of the payload are reduced by up to 50% (see Fig. 12) when the flexibility parameters are power and beamwidth (Payload 5), because the gain in each beam can also be adjusted to meet the required capacity, allowing power savings.
Power required for Payload 7 (flexibility in power and bandwidth) is similar to that required for Payload 8 (full flexibility). Fig. 12 also shows that both payloads can reduce the power by up to 65% given the conditions of the case study (see Table II).
Payload 4 requires the least power (see Fig. 12) because power is the only resource being managed: in addition to reducing the error between the offered capacity and the required capacity, the payload also tries to minimize the power it uses. However, if the priority is to reduce the error between the offered capacity and the required capacity, Payloads 3, 5, 6, 7, and 8 perform better (see Fig. 11).

V. CONCLUSION
This article analyzes the use of a CNN to solve the DRM problem in SatComs by using a suitable cost function and a realistic traffic model. The suggested DRM cost function aims to minimize the error between the offered capacity and the required capacity while minimizing the amount of resources used in the satellite.
This contribution provides a tool to implement a dynamic and efficient resource management in future VHTS systems. The satellite industry increasingly expresses an interest in DL applications for satellite systems [19], thus opening a great possibility for future works.
Compared to PPA, AG-DPA, and classification, the proposed algorithm achieves a better tradeoff between reducing the capacity error and reducing power consumption.
One of the limitations of the CNN in DRM is its dependence on the traffic model used during training. Thus, in a real system whose traffic behavior departs from the model, the CNN will have to be trained again.
A multiscale CNN has been shown to provide better predictions for other applications in engineering systems compared to the CNN [37]. Therefore, in future work, it is suggested to implement this algorithm for DRM and make a comparison with the CNN performance.

APPENDIX
The CNN architecture can be divided into the input layer, the hidden convolutional layers, the fully connected hidden layers, and the output layer, as shown in Figs. 3-6.
In the convolutional layers, the features are extracted by the kernels, yielding the output $Y_j$ as represented in (19). The activation function $g_1(\cdot)$ used in the convolutional layers is the ReLU function defined in [34].
MaxPooling was used to pool the convolutional layers. MaxPooling positions a 2 × 2 window on the feature map and selects the largest value in that window. The window moves from left to right across the feature map, selecting the largest value at each position. These values form a new matrix called the pooled feature map. MaxPooling preserves the main features while reducing the size of the image.
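The pooling step described above can be sketched in a few lines. The example assumes a stride of 2 (non-overlapping windows), which the text does not state, and uses a made-up 4 × 4 feature map; the function name `max_pool_2x2` is hypothetical.

```python
def max_pool_2x2(feature_map, stride=2):
    """Slide a 2 x 2 window over a 2-D feature map and keep each window's maximum.

    Assumption: stride 2, so windows do not overlap. Rows or columns that
    cannot fill a complete window are dropped.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, stride):
        pooled_row = []
        for c in range(0, cols - 1, stride):
            window = (
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Made-up 4 x 4 feature map for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 5, 1],
    [7, 2, 9, 8],
    [0, 1, 3, 4],
]
print(max_pool_2x2(feature_map))  # [[6, 5], [7, 9]]
```

The 4 × 4 input shrinks to 2 × 2 while the largest response in each region is kept.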
After the pooled feature map is obtained, the next step is to flatten it. The flatten layer transforms the entire feature map matrix into a single column that is fed into the neural network for processing.
In the neural networks area, perceptron refers to the artificial neuron or basic unit of inference, from which an algorithm is developed.
In the fully connected layers, the features obtained in the convolutional layers are analyzed in order to allocate the resources that minimize the DRM cost function (12). The multilayer perceptron outputs are obtained with (20), where the activation function $g_2(\cdot)$ used in the fully connected layers is the hyperbolic tangent (tanh) function defined in [34].
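The flatten step and one fully connected layer with a tanh activation can be illustrated as follows. Equation (20) is not reproduced in this section, so the usual form tanh(Wx + b) is assumed; the weights, biases, and input values are placeholders, and the function names are hypothetical.

```python
import math


def flatten(feature_maps):
    """Concatenate all values of a list of 2-D feature maps into one vector."""
    return [value for fmap in feature_maps for row in fmap for value in row]


def dense_tanh(inputs, weights, biases):
    """One fully connected layer, assumed to compute tanh(W x + b)."""
    outputs = []
    for weight_row, bias in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(weight_row, inputs)) + bias
        outputs.append(math.tanh(pre_activation))
    return outputs


# One made-up 2 x 2 pooled feature map.
pooled_maps = [[[6, 5], [7, 9]]]
vector = flatten(pooled_maps)
print(vector)  # [6, 5, 7, 9]

# Placeholder weights and biases for a layer with two neurons.
weights = [[0.1, -0.1, 0.0, 0.0], [0.0, 0.0, 0.05, -0.05]]
biases = [0.0, 0.0]
print(dense_tanh(vector, weights, biases))
```

Because tanh is bounded, each neuron output lies strictly between -1 and 1 regardless of the scale of the pooled features.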
The softmax function was used in the output layer of a CNN classifier. Softmax generates a vector that represents the probability distributions of a list of potential outcomes.
Softmax takes a vector z of L possible configurations as input and normalizes it into a probability distribution consisting of L probabilities proportional to the exponentials of the input numbers. That is, before softmax is applied, some vector elements could be negative or greater than one, depending on the cost function, and the elements might not add up to 1. After softmax is applied, each element lies in the range 0-1 and the elements add up to 1, so that they can be interpreted as probabilities.
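A minimal sketch of this normalization is given below. Subtracting the maximum before exponentiating is a common numerical safeguard that does not change the result; it is an implementation choice here, not something stated in the article. The input vector is made up.

```python
import math


def softmax(z):
    """Map a vector of real scores to probabilities that sum to 1."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(value - shift) for value in z]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for L = 3 configurations; one is negative, one exceeds 1.
scores = [2.0, -1.0, 0.5]
probabilities = softmax(scores)
print(probabilities)
print(sum(probabilities))
```

The largest score receives the largest probability, so the predicted configuration is the index of the maximum element of the output.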
Since the CNN is a supervised learning algorithm, the cost function of the DRM (12) is used to assign in a supervised way the labels (Y) corresponding to the training data generated with the traffic models (9) and (10).
The CNN cost function (21) aims to minimize the error between the predicted values (ỹ) and the labels (Y) in the output layer. In this article, the CNN output layer is a classification layer (see Fig. 5) where each class represents a possible configuration of satellite resources. In that sense, the Q(·) function in (21)
The method used to train supervised DL algorithms is known as "backpropagation." The output errors are propagated backward from the output layer to all the neurons in the hidden layers that contribute directly to the output. This process is repeated, layer by layer, until all the neurons in the network have received an error signal describing their relative contribution to the total error. For more detail on how the backpropagation algorithm works, see El-Amir and Hamdy [33].
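The backward propagation of the error can be illustrated on a very small network with one tanh hidden layer and a softmax output. This is a sketch under stated assumptions: the loss is taken to be cross-entropy, which is a common choice for classification layers but is not confirmed by this section, and all weights and inputs are placeholder values. The analytic gradient is compared against a finite-difference estimate as a sanity check.

```python
import math


def forward(x, W1, b1, W2, b2):
    """Hidden tanh layer followed by a softmax output layer."""
    a1 = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(W1, b1)]
    z2 = [sum(w * v for w, v in zip(row, a1)) + b for row, b in zip(W2, b2)]
    shift = max(z2)
    exps = [math.exp(v - shift) for v in z2]
    total = sum(exps)
    return a1, [e / total for e in exps]


def loss(x, y, W1, b1, W2, b2):
    """Cross-entropy for the true class index y (assumed loss)."""
    _, p = forward(x, W1, b1, W2, b2)
    return -math.log(p[y])


def backward(x, y, W1, b1, W2, b2):
    """Propagate the output error back to every weight and bias."""
    a1, p = forward(x, W1, b1, W2, b2)
    # Error signal at the output layer (softmax combined with cross-entropy).
    dz2 = [p_i - (1.0 if i == y else 0.0) for i, p_i in enumerate(p)]
    dW2 = [[d * a for a in a1] for d in dz2]
    # Error signal received by each hidden neuron.
    da1 = [sum(W2[i][j] * dz2[i] for i in range(len(dz2))) for j in range(len(a1))]
    dz1 = [d * (1.0 - a * a) for d, a in zip(da1, a1)]  # tanh derivative
    dW1 = [[d * v for v in x] for d in dz1]
    return dW1, dz1, dW2, dz2  # bias gradients equal dz1 and dz2


# Placeholder input, label, and parameters.
x, y = [0.5, -1.0, 2.0], 1
W1 = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
b1 = [0.05, -0.05]
W2 = [[0.2, -0.3], [0.1, 0.5], [-0.4, 0.2]]
b2 = [0.0, 0.1, -0.1]

dW1, db1, dW2, db2 = backward(x, y, W1, b1, W2, b2)

# Finite-difference check on a single weight, W1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_minus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus[0][1] -= eps
numeric = (loss(x, y, W1_plus, b1, W2, b2) - loss(x, y, W1_minus, b1, W2, b2)) / (2 * eps)
gradient_matches = abs(numeric - dW1[0][1]) < 1e-6
print(gradient_matches)
```

In a training loop, each parameter would then be moved a small step against its gradient; that update rule is omitted here to keep the sketch short.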