Cost- and Delay-Efficient Backhaul Selection for Time-Sensitive Maritime Communications

Extending the existing near-shore terrestrial infrastructure with non-terrestrial network capabilities helps maritime operators alleviate high communication costs and meet the requirements of time-sensitive applications. Deploying terrestrial and non-terrestrial networks requires selecting among the available wireless backhaul solutions, which have dissimilar data transmission costs and communication link qualities, so an appropriate backhaul selection policy is essential. Specifically, in this letter, we propose a backhaul selection policy that manages the inherent trade-off between data transmission expenses and timely throughput guarantees for maritime communications. We formulate the backhaul selection problem as a Markov decision process and show that, in contrast to the currently used greedy strategies, the proposed solution is not only more cost-efficient but also satisfies the timely throughput requirements.

5G base stations (BSs) can be deployed onboard vessels similarly to the cells-on-wheels and cells-on-wings utilized in public safety applications [5]. A vessel-mounted BS (VBS) can thus employ wireless backhauling and transmit data from onboard devices through the available terrestrial BS (TBS), UAV-mounted BS (UBS), or low Earth orbit (LEO) satellite. Specifically, we consider a scenario where traffic from onboard devices is aggregated at the VBS and subsequently forwarded to the core network either over a direct backhaul route to the TBS or over two-hop wireless backhaul routes via UBS or LEO satellite. Such onboard devices include wearables, which typically perform discontinuous transmissions to save battery. Hence, the periodically generated data should be delivered within a given time frame to offer timely and relevant onboard monitoring.
To date, coverage extension of the near-shore wireless terrestrial networks with the aid of non-terrestrial platforms, as well as their optimal deployment, management [6], and integration [7] within a single system for maritime applications, have been extensively studied in the literature. However, the use of non-terrestrial platforms is usually limited to areas wherein vessels are out of coverage of the terrestrial infrastructure, i.e., beyond coastal waters [6]. Even in coastal waters, vessels may experience low radio link quality or be out of the terrestrial network coverage.
The duration and frequency of out-of-coverage periods and poor communication conditions increase with the distance between the TBS and the vessel, owing to the irregular deployment of the BSs. Such coverage gaps may significantly degrade the performance of time-sensitive maritime applications, e.g., asset monitoring, where staff onboard a vessel send high-definition footage from body cameras to shore-based control centers for data analytics [8]. In addition, backhauling solutions can be costly and may not always be efficient for the vessel operator, who seeks a time-averaged cost-performance trade-off when choosing a backhauling solution at a given time. This problem motivates our work and, to the best of our knowledge, has not been addressed in prior literature.
In this letter, we propose a framework to devise a backhaul selection policy that achieves a desired trade-off between communication expenses and system performance. We formulate the backhaul selection problem as an infinite-horizon discounted Markov decision process (MDP) with the aim of jointly minimizing the total cost of data transmission and the number of data units lost due to deadline violations. We then compare the obtained ϵ-optimal policy with greedy approaches in terms of communication cost efficiency and timely throughput [9] under different system parameters. As computing the ϵ-optimal policy may be time- and resource-consuming, we further contribute lightweight heuristics that offer near-optimal backhaul selection for the given environment and system parameters.

II. SYSTEM MODEL AND PROBLEM FORMULATION
We consider an integrated terrestrial and non-terrestrial cellular network to support near-shore maritime communications. The network comprises a terrestrial infrastructure (i.e., TBS and ground station (GS) for satellite communications) and a non-terrestrial segment (i.e., UBS and satellite). Hence, it offers the VBS three options for wireless backhaul relaying: direct backhaul to the TBS, two-hop backhaul via UBS, or two-hop backhaul via LEO satellite. Let $\mathcal{N}$, $\mathcal{M}$, and $\mathcal{K}$ be the sets of TBSs, UBSs, and satellites, respectively.
The locations of TBSs and UBSs form 3D spatial Poisson point processes $\Phi_N$ and $\Phi_M$ with densities $\lambda_N$ and $\lambda_M$ nodes per km$^2$, respectively, while the location of the LEO satellite is defined by its altitude $h_{LEO}$ and elevation angles $\phi_{VS}$ and $\phi_{SG}$. The geometry of $\Phi_N$ and $\Phi_M$ reflects the essential properties of the coastline, the feasible communication ranges of the considered backhaul, and potential mobile network operator restrictions. Hence, the terrestrial access area is bounded by $Y_N$ and $H_N$, and the aerial access segment by $Y_M$ and $H_M$ (see Fig. 1).
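For simulation, a homogeneous Poisson point process on a bounded region can be sampled in two steps: draw a Poisson-distributed number of nodes with mean equal to density times area, then place them uniformly at random. The sketch below assumes the rectangular approximation of Section IV-A and samples only the two horizontal coordinates; the bounds $H_N$ and $H_M$ are not modeled, and the densities and strip widths are hypothetical placeholders rather than the paper's settings.

```python
import numpy as np

def sample_ppp_rectangle(density_per_km2, length_km, width_km, rng):
    """Sample a homogeneous Poisson point process on [0, length_km] x [0, width_km].

    The node count is Poisson with mean density * area; given the count,
    nodes are placed independently and uniformly over the rectangle.
    """
    area_km2 = length_km * width_km
    n_nodes = rng.poisson(density_per_km2 * area_km2)
    x = rng.uniform(0.0, length_km, size=n_nodes)  # along-shore coordinate
    y = rng.uniform(0.0, width_km, size=n_nodes)   # coordinate across the access strip
    return np.column_stack((x, y))

rng = np.random.default_rng(seed=1)
# Hypothetical densities and strip widths; X = 20 km follows Section IV-A.
tbs_xy = sample_ppp_rectangle(density_per_km2=0.5, length_km=20.0, width_km=2.0, rng=rng)
ubs_xy = sample_ppp_rectangle(density_per_km2=0.2, length_km=20.0, width_km=5.0, rng=rng)
print(len(tbs_xy), "TBS nodes,", len(ubs_xy), "UBS nodes")
```

Conditioning on the Poisson count and then placing points uniformly is the textbook construction of a homogeneous PPP on a finite window, so no rejection sampling is needed.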
In the considered communication scenario, a given VBS transmits aggregated onboard traffic to the core network. As the vessel moves at a constant speed $v$ along the coastline, the quality of its communication links may change due to time-varying fading. Therefore, the network controller may choose among the three backhaul options (TBS, UBS, or satellite) to provide timely throughput guarantees [9].
Let the system time be slotted and indexed by $t \in \{1, 2, \dots\}$, with equal slot duration $\Delta t$. At the beginning of every slot $t$, $Q$ data units arrive at the backlog queue of the VBS, and the deadline for the newly arrived units is $\Delta t$. The controller chooses a backhaul $a_t$ from the set $\mathcal{A}$ of available options for data transmission based on the system state $s_t$. Owing to the predefined vessel mobility and the slowly changing environment, the controller estimates only the expected loss rate $l(s_t, a_t)$ of each backhaul at a given time $t$. Hence, the outcome of the data transmission in every slot is not known in advance.
Let $D$ be a random variable for the number of delivered data units out of $Q$ units under the given loss rate $l(s,a)$. Then $D$ follows a binomial distribution [10] with parameters $Q$ and $1 - l(s,a)$, and its probability mass function is

$$\Pr\{D = k\} = \binom{Q}{k} \bigl(1 - l(s,a)\bigr)^{k} \, l(s,a)^{Q-k}, \quad k \in \{0, 1, \dots, Q\}. \tag{1}$$

Hence, the expected number of delivered data units $d(s,a)$ in state $s$ if backhaul $a$ is selected becomes

$$d(s,a) = \mathbb{E}[D] = Q \bigl(1 - l(s,a)\bigr). \tag{2}$$

Data transmission over the three backhaul options incurs different costs. Let $m_a$ be a known monetary cost per data unit transmitted over backhaul $a$. The immediate cost of using backhaul $a$ in state $s$ is $m(d(s,a)) \triangleq m_a \, d(s,a)$. Let $\pi$ denote a feasible policy that specifies the backhaul selection in state $s_t$, i.e., $a_t = \pi(s_t)$. For an arbitrary communication session that lasts $\tau$ slots on average, the expected total cost $C_{\pi,\tau}$ and the timely throughput $T_{\pi,\tau}$ can be obtained as

$$C_{\pi,\tau} = \mathbb{E}_{\tau}\Bigl[\mathbb{E}_{\pi}\Bigl[\sum_{t=1}^{\tau} m\bigl(d(s_t,a_t)\bigr)\Bigr]\Bigr], \tag{3}$$

$$T_{\pi,\tau} = \mathbb{E}_{\tau}\Bigl[\mathbb{E}_{\pi}\Bigl[\sum_{t=1}^{\tau} d(s_t,a_t)\Bigr]\Bigr], \tag{4}$$

where $\mathbb{E}_{\tau}[\cdot]$ and $\mathbb{E}_{\pi}[\cdot]$ are the expectations with respect to the probability distribution of $\tau$ and policy $\pi$, respectively. We assume that $\tau$ follows a geometric distribution with parameter $\gamma$ and mean $1/(1-\gamma)$ [11, p. 125]. The expectation over $\tau$ can then be transformed into the discounted cost $C_{\pi}$ and timely throughput $T_{\pi}$ over an infinite time horizon:

$$C_{\pi} = \mathbb{E}_{\pi}\Bigl[\sum_{t=1}^{\infty} \gamma^{t-1} m\bigl(d(s_t,a_t)\bigr)\Bigr], \tag{5}$$

$$T_{\pi} = \mathbb{E}_{\pi}\Bigl[\sum_{t=1}^{\infty} \gamma^{t-1} d(s_t,a_t)\Bigr]. \tag{6}$$

Let $q^*$ be the timely throughput requirement. If the outcome of the transmission $d(s_t,a_t)$ violates this requirement, action $a_t$ incurs the penalty $\epsilon(d(s_t,a_t))$. Our goal is to find a policy $\pi^* \in \Pi$ that minimizes the expected total normalized cost and penalty, as given in (7).
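The binomial delivery model, the expected delivered units $d(s,a)$, and the step from a geometric session length to discounting can all be checked numerically. The sketch below evaluates the PMF, its mean, and the immediate cost $m_a d(s,a)$, and verifies on a finite sequence of per-slot costs that averaging the session total over a geometric $\tau$ equals the $\gamma$-discounted sum. The loss rate, unit cost, and cost sequence are arbitrary illustrative values.

```python
from math import comb

def delivered_pmf(Q, loss_rate):
    """P(D = k) for k = 0..Q, with D ~ Binomial(Q, 1 - loss_rate)."""
    p = 1.0 - loss_rate
    return [comb(Q, k) * p**k * loss_rate ** (Q - k) for k in range(Q + 1)]

def expected_delivered(Q, loss_rate):
    """d(s, a) = E[D] = Q * (1 - l(s, a))."""
    return Q * (1.0 - loss_rate)

def immediate_cost(m_a, Q, loss_rate):
    """m(d(s, a)) = m_a * d(s, a)."""
    return m_a * expected_delivered(Q, loss_rate)

def geometric_session_total(rewards, gamma):
    """E_tau[sum_{t=1}^{tau} r_t] for P(tau = n) = (1 - gamma) * gamma**(n - 1).

    Truncated at len(rewards); the gap to the truncated discounted sum
    is gamma**len(rewards) * sum(rewards).
    """
    total, running_sum = 0.0, 0.0
    for n, r in enumerate(rewards, start=1):
        running_sum += r
        total += (1.0 - gamma) * gamma ** (n - 1) * running_sum
    return total

def discounted_total(rewards, gamma):
    """sum_{t >= 1} gamma**(t - 1) * r_t, truncated at len(rewards)."""
    return sum(gamma ** (t - 1) * r for t, r in enumerate(rewards, start=1))

pmf = delivered_pmf(Q=10, loss_rate=0.1)
mean_from_pmf = sum(k * pk for k, pk in enumerate(pmf))
print(sum(pmf), mean_from_pmf, expected_delivered(10, 0.1))

costs_per_slot = [1.0, 3.0, 0.5] * 500  # hypothetical per-slot costs
print(geometric_session_total(costs_per_slot, 0.9), discounted_total(costs_per_slot, 0.9))
```

The equivalence follows from exchanging the two sums: slot $t$ contributes only if $\tau \ge t$, which happens with probability $\gamma^{t-1}$.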

III. MDP FRAMEWORK AND SOLUTION
The problem in (7) can be cast as an infinite-horizon MDP. In this section, we define the states, actions, transition probabilities, cost, and penalty that transform the optimization problem (7) into an MDP, and we solve the latter to obtain the optimal backhaul selection policy.

A. States
Let $s_t = \bigl(n_1(t), \dots, n_{|\mathcal{A}|}(t)\bigr)$ denote the system state at slot $t$, where $n_a(t)$ stands for the state of backhaul $a$. We assume that every backhaul has a finite number of states and that $n_a(t)$ takes an integer value from the set $\{1, \dots, N_a\}$.
Each state $n_a$ of backhaul $a$ is associated with the loss rate $l(s,a)$. The state space $\mathcal{S} = \{(n_1, \dots, n_{|\mathcal{A}|}) : n_a \in \{1, \dots, N_a\}, \forall a \in \mathcal{A}\}$ is therefore a finite set with cardinality $|\mathcal{S}| = \prod_{a=1}^{|\mathcal{A}|} N_a$.
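The state space is the Cartesian product of the per-backhaul state sets, so its size is the product of the $N_a$. The sketch below enumerates it with `itertools.product`; the values of $N_a$ are hypothetical.

```python
from itertools import product

def enumerate_states(num_states_per_backhaul):
    """All system states s = (n_1, ..., n_|A|) with n_a in {1, ..., N_a}."""
    ranges = [range(1, N_a + 1) for N_a in num_states_per_backhaul]
    return list(product(*ranges))

# Hypothetical numbers of states for (TBS, UBS, LEO).
N = [4, 3, 2]
states = enumerate_states(N)
print(len(states))  # 24 = 4 * 3 * 2
print(states[:3])   # [(1, 1, 1), (1, 1, 2), (1, 2, 1)]
```

Doubling the number of states of any single backhaul doubles $|\mathcal{S}|$, which is the combinatorial growth that later motivates the lightweight heuristics.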

B. Actions
At each time slot $t$, the controller chooses action $a_t$ from the set of actions $\mathcal{A}$. We assume that any action $a \in \mathcal{A}$ is feasible in any state $s \in \mathcal{S}$. Hence, $\mathcal{A}$ does not depend on the state of the system, and it is finite with $|\mathcal{A}| = 3$.

C. Transition Probabilities
Let $P_{n_a, n'_a}$ denote the transition probability from state $n_a$ to state $n'_a$ of backhaul $a$. In the proposed formulation, we assume that the states of the three backhaul options evolve independently and that their evolution is not affected by the action taken. The transition probability $p(s' \mid s, a)$ from system state $s = (n_1, \dots, n_{|\mathcal{A}|})$ to $s' = (n'_1, \dots, n'_{|\mathcal{A}|})$ can then be given as

$$p(s' \mid s, a) = \prod_{i=1}^{|\mathcal{A}|} P_{n_i, n'_i},$$

which is the same for every action $a \in \mathcal{A}$.
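Because the per-backhaul chains are assumed independent and unaffected by the action, the joint transition matrix is the Kronecker product of the per-backhaul matrices. The sketch below assumes 0-based state indices ordered lexicographically with the last backhaul varying fastest (the order produced by `itertools.product`); the two-state matrices are hypothetical.

```python
import numpy as np
from functools import reduce

def joint_transition_matrix(per_backhaul_matrices):
    """Joint chain with p(s'|s) = prod_i P_i[n_i, n'_i] for independent backhauls.

    np.kron orders joint states lexicographically, with the last backhaul
    varying fastest, matching itertools.product over the per-backhaul states.
    """
    return reduce(np.kron, per_backhaul_matrices)

# Hypothetical 2-state chains per backhaul (index 0 = good, 1 = poor).
P_tbs = np.array([[0.9, 0.1], [0.3, 0.7]])
P_ubs = np.array([[0.8, 0.2], [0.4, 0.6]])
P_leo = np.array([[0.95, 0.05], [0.5, 0.5]])

P = joint_transition_matrix([P_tbs, P_ubs, P_leo])
print(P.shape)                          # (8, 8)
print(np.allclose(P.sum(axis=1), 1.0))  # True: every row is a distribution
```

Since $p(s' \mid s, a)$ does not depend on $a$, the same matrix can be reused for all three actions.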

D. Cost and Penalty
The cost incurred by transmitting $d(s,a)$ data units is $m(d(s,a))$. To penalize the actions that violate the timely throughput requirement $q^*$, we define the penalty function $\epsilon(d(s,a))$ as follows:

E. Solution
Based on the above MDP formulation, we note that (i) the cost and transition probabilities are stationary and do not vary from slot to slot, (ii) the cost $c(s,a)$ is bounded and strictly positive for all $a \in \mathcal{A}$ and $s \in \mathcal{S}$, (iii) future costs are discounted according to the discount factor $\gamma$, and (iv) the state set and the action set available in each state are discrete and finite.
Let $V^{\pi}(s)$ be the expected total discounted cost over the infinite horizon when starting from state $s$ and applying policy $\pi$.
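A policy $\pi^* \in \Pi$ is optimal if and only if $V^{\pi^*}$ is a solution of the optimality (Bellman) equation $v(s) = \min_{a \in \mathcal{A}} \bigl\{ c(s,a) + \sum_{s' \in \mathcal{S}} \gamma \, p(s' \mid s,a) \, v(s') \bigr\}$ for all $s \in \mathcal{S}$. This equation can be solved iteratively through

$$v^{n}(s) = \min_{a \in \mathcal{A}} \Bigl\{ c(s,a) + \sum_{s' \in \mathcal{S}} \gamma \, p(s' \mid s,a) \, v^{n-1}(s') \Bigr\}, \tag{12}$$

where $v^{n}(s)$ denotes the expected total discounted cost over $n$ slots starting in state $s$.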
According to [11, Theorem 6.2.10], under assumptions (i)-(iv), there exists an optimal deterministic stationary policy. Since the costs are bounded and discounted with $\gamma < 1$, we apply the value iteration method summarized in Algorithm 1, which yields an $\epsilon$-optimal stationary policy after a finite number of iterations. By choosing a small $\epsilon$, e.g., $\epsilon = 0.01$, the value of the obtained sub-optimal policy $\pi^*_{\epsilon}$ approaches that of the optimal policy $\pi^*$ in norm, i.e., $\|V^{\pi^*_{\epsilon}} - V^{\pi^*}\| \le \epsilon$.

Algorithm 1 Value Iteration for ϵ-Optimal Solution of MDP
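The recursion in (12) can be sketched as standard value iteration with the usual $\epsilon$-optimal stopping rule for discounted MDPs, $\|v^{n} - v^{n-1}\|_\infty < \epsilon(1-\gamma)/(2\gamma)$. This is a minimal sketch that may differ in details from Algorithm 1; the cost array stands for $c(s,a)$ (monetary cost plus penalty) and is taken as an input, and all numbers in the usage part are hypothetical.

```python
import numpy as np

def value_iteration(P, c, gamma, eps=0.01, max_iter=100_000):
    """Value iteration for a finite discounted-cost MDP.

    P: array of shape (A, S, S), P[a, s, s2] = p(s2 | s, a).
    c: array of shape (S, A), immediate cost c(s, a).
    Stops once ||v^n - v^{n-1}||_inf < eps * (1 - gamma) / (2 * gamma); the policy
    that is greedy w.r.t. the last iterate is then eps-optimal (for 0 < gamma < 1).
    If max_iter is reached first, this guarantee does not hold.
    """
    v = np.zeros(c.shape[0])
    threshold = eps * (1.0 - gamma) / (2.0 * gamma)
    for _ in range(max_iter):
        # q[s, a] = c(s, a) + gamma * sum_{s2} p(s2 | s, a) * v(s2)
        q = c + gamma * np.einsum("ast,t->sa", P, v)
        v_new = q.min(axis=1)
        converged = np.max(np.abs(v_new - v)) < threshold
        v = v_new
        if converged:
            break
    # Deterministic stationary policy, greedy with respect to the final values.
    q = c + gamma * np.einsum("ast,t->sa", P, v)
    return v, q.argmin(axis=1)

# Hypothetical 2-state, 3-action example with action-independent transitions.
rng = np.random.default_rng(0)
P_chain = np.array([[0.9, 0.1], [0.3, 0.7]])
P = np.stack([P_chain] * 3)
c = rng.uniform(1.0, 5.0, size=(2, 3))  # positive, bounded costs c(s, a)
v, policy = value_iteration(P, c, gamma=0.9)
print(v, policy)
```

When, as in this model, the transitions do not depend on the action, the expected future term is identical for every action, so the greedy policy reduces to $\arg\min_a c(s,a)$ in each state; this makes a convenient sanity check for the sketch.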
IV. PERFORMANCE EVALUATION
In this section, we outline the behavior of $\pi^*$ using heuristics and evaluate the performance of our system under several policies. We start by providing the main simulation parameters used in this evaluation.

A. Communication Assumptions
Areas $\Phi_N$ and $\Phi_M$ are approximated by the rectangles $[X \times Y_N] \subset \mathbb{R}^2$ and $[X \times Y_M] \subset \mathbb{R}^2$, respectively. Here, $X = 20$ km represents the length of the area along the shore, and $Y_M$ denotes the distance from the coastline to the vessel path in km. To capture propagation losses such as reflection, diffraction, and tropospheric scattering in maritime environments, we employ the empirical propagation model (EPM-73) [12] for the corresponding backhaul links between the VBS, TBS, and UBS. We also account for random fading caused by sea wave movement, as suggested in [13]. We assume that both TBS and UBS operate in the 3.5 GHz frequency band, with available bandwidths of 100 MHz and 80 MHz, respectively. For the satellite backhaul links, we use the Third Generation Partnership Project (3GPP) propagation model for non-terrestrial networks (NTNs) [3] and assume the Ka-band with 30 GHz and 20 GHz for uplink and downlink transmissions, respectively. A total bandwidth of 400 MHz is available in both directions. The transmit power of all involved nodes is $P_T = 33$ dBm, while the antenna gains for the backhaul links are selected according to the 3GPP recommendations in [3] and [6].
We vary the number $Q$ of data units that arrive at the backlog queue of the VBS and must be transmitted within a slot. We assume a slot duration of $\Delta t = 100$ ms, which is convenient for periodic time-sensitive transmissions and backhaul switching. Moreover, for the two-hop backhaul solutions, we assume that each time slot is equally shared by the two communication links and that the data units are transmitted sequentially. Intuitively, compared with the direct link between the VBS and the TBS, this requires higher data rates on both links of the two-hop backhaul path to deliver the demand within a slot; the loss rate $l(s,a)$ captures this effect, since data units that miss the deadline are counted as lost. The data unit size is 1500 bytes, and we set the requirement to $q^* = 0.8 \cdot Q$.
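A back-of-the-envelope calculation shows why two-hop routes need faster links: with sequential transmission and an equal split, each hop has only $\Delta t/2$ of airtime. The sketch uses the stated 1500-byte data units, $\Delta t = 100$ ms, and $q^* = 0.8Q$; the demand $Q = 100$ is a hypothetical value, and protocol overheads are ignored.

```python
UNIT_BITS = 1500 * 8  # data unit size: 1500 bytes
SLOT_S = 0.100        # slot duration: 100 ms

def required_rate_bps(Q, two_hop):
    """Minimum per-link rate (bit/s) to deliver Q units before the slot deadline.

    Direct VBS-TBS link: the whole slot is available.
    Two-hop route (via UBS or LEO): each hop gets half of the slot, sequentially.
    """
    airtime_s = SLOT_S / 2 if two_hop else SLOT_S
    return Q * UNIT_BITS / airtime_s

Q = 100                    # hypothetical demand per slot
q_star = 0.8 * Q           # timely throughput requirement
p_star = (Q - q_star) / Q  # largest tolerable loss rate in rule (13)
print(f"direct: {required_rate_bps(Q, False) / 1e6:.1f} Mbit/s")           # 12.0
print(f"two-hop: {required_rate_bps(Q, True) / 1e6:.1f} Mbit/s per hop")   # 24.0
print(f"p* = {p_star:.2f}")                                                # 0.20
```

Because $q^*$ scales with $Q$, the tolerable loss rate $p^*$ stays at 0.2 for every demand level, whereas the required link rate grows linearly with $Q$.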

B. Heuristic Backhaul Selection
Computing the $\epsilon$-optimal policy $\pi^*_{\epsilon}$ may be challenging for several reasons. First, it requires knowledge of all possible MDP states and transition probabilities, which can be difficult to obtain in practice. Second, storing all state- and action-related information and computing the $\epsilon$-optimal policy can be time- and resource-consuming, as the number of states is subject to combinatorial explosion. We therefore propose a lightweight heuristic algorithm that computes a sub-optimal backhaul selection on the fly for any state of the system. It requires no knowledge of previous or future states and relies only on the expected loss rates $l(s,a)$ in the current state for all $a \in \mathcal{A}$. Each decision compares at most $|\mathcal{A}|$ loss rates against a threshold, so its complexity is constant ($O(1)$) for a fixed action set, and it is fast because the number of actions is small ($|\mathcal{A}| = 3$). The main shortcoming of the suggested heuristics is that the bound on the optimality gap may vary with the parameters of the system model. However, the heuristics show adequate empirical results, as discussed in Section IV-C.
To introduce a heuristic algorithm that describes the behavior of the obtained policy $\pi^*_{\epsilon}$, we first define $\pi_1$, $\pi_2$, and $\pi_3$ such that $\pi_1$ chooses the backhaul with the lowest monetary cost, $\pi_2$ selects the backhaul with the second-lowest cost, and $\pi_3$ chooses the backhaul with the third-lowest cost, which is the most expensive one in our scenario. The proposed lightweight method for backhaul selection applies each of these policies according to (13), where $p^* = (Q - q^*)/Q$. Hence, the most costly backhaul is selected only when the loss rate $l(s, \pi_3)$ associated with its state is less than or equal to $p^*$, while the loss rates of the other backhaul options are greater than $p^*$. If the loss rate $l(s, \pi_1)$ associated with the state of the least expensive backhaul is greater than $p^*$ but the loss rate $l(s, \pi_2)$ associated with the state of the second-cheapest backhaul does not exceed $p^*$, then the backhaul with the second-lowest cost is selected. Otherwise, the algorithm suggests the backhaul with the lowest cost:

$$a_t = \begin{cases} \pi_3(s_t), & \text{if } l(s_t, \pi_1) > p^*, \; l(s_t, \pi_2) > p^*, \; l(s_t, \pi_3) \le p^*, \\ \pi_2(s_t), & \text{if } l(s_t, \pi_1) > p^*, \; l(s_t, \pi_2) \le p^*, \\ \pi_1(s_t), & \text{otherwise}. \end{cases} \tag{13}$$
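Rule (13) translates into a handful of comparisons. The sketch below assumes that the loss rates of the current state and the per-unit costs are given as dictionaries keyed by a backhaul name and that there are exactly three options; the names and numbers are illustrative.

```python
def heuristic_backhaul(loss_rates, unit_costs, Q, q_star):
    """Select a backhaul following rule (13).

    loss_rates: backhaul -> expected loss rate l(s, a) in the current state.
    unit_costs: backhaul -> monetary cost per data unit m_a (exactly three entries).
    """
    p_star = (Q - q_star) / Q
    # pi_1, pi_2, pi_3: backhauls ordered from cheapest to most expensive.
    pi1, pi2, pi3 = sorted(unit_costs, key=unit_costs.get)
    if loss_rates[pi1] > p_star and loss_rates[pi2] > p_star and loss_rates[pi3] <= p_star:
        return pi3
    if loss_rates[pi1] > p_star and loss_rates[pi2] <= p_star:
        return pi2
    return pi1  # cheapest backhaul, also when no option meets p*

costs = {"TBS": 1.0, "UBS": 2.0, "LEO": 8.0}  # proportions used in Section IV-C
print(heuristic_backhaul({"TBS": 0.05, "UBS": 0.0, "LEO": 0.0}, costs, Q=10, q_star=8))   # TBS
print(heuristic_backhaul({"TBS": 0.50, "UBS": 0.1, "LEO": 0.0}, costs, Q=10, q_star=8))   # UBS
print(heuristic_backhaul({"TBS": 0.50, "UBS": 0.4, "LEO": 0.05}, costs, Q=10, q_star=8))  # LEO
```

Only the cost ordering enters the decision, not the cost values themselves, so the rule needs no tuning when prices change proportionally.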

C. Evaluation Results
We compare the ϵ-optimal policy and the proposed heuristic solution given in (13) with two greedy backhaul selection approaches chosen as benchmarks. In the considered scenario, a greedy policy either always selects the backhaul with the minimal monetary cost of data transmission (referred to as MinCost) or selects the option with the highest signal-to-noise ratio (named MaxSNR). The key performance indicators used in this comparison are cost efficiency, defined as the number of data units delivered within the deadline per unit of monetary cost, and timely throughput [9].
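Both greedy baselines and the cost-efficiency metric reduce to one-liners. In this sketch, the two policies share a common signature so they can be swapped in a simulation loop, the SNR values are hypothetical, and timely throughput is taken as the average number of on-time data units per slot; that normalization is an assumption about the definition used in [9] and in the figures.

```python
def min_cost_policy(unit_costs, snr_db):
    """MinCost: backhaul with the lowest monetary cost per data unit."""
    return min(unit_costs, key=unit_costs.get)

def max_snr_policy(unit_costs, snr_db):
    """MaxSNR: backhaul with the highest signal-to-noise ratio in the current slot."""
    return max(snr_db, key=snr_db.get)

def cost_efficiency(delivered_on_time, slot_costs):
    """Data units delivered within the deadline per unit of monetary cost."""
    return sum(delivered_on_time) / sum(slot_costs)

def timely_throughput(delivered_on_time):
    """Assumed definition: average number of on-time data units per slot."""
    return sum(delivered_on_time) / len(delivered_on_time)

costs = {"TBS": 1.0, "UBS": 2.0, "LEO": 8.0}
snr_db = {"TBS": 3.0, "UBS": 12.0, "LEO": 7.5}  # hypothetical per-slot SNR values
print(min_cost_policy(costs, snr_db), max_snr_policy(costs, snr_db))  # TBS UBS
```

Neither baseline looks at the loss rate relative to $p^*$, which is the information that rule (13) adds.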
In Fig. 2, we vary the cost per data unit transmission via TBS ($m_{TBS}$), while the costs of the two remaining options are updated accordingly by considering the proportions from existing microwave and satellite backhaul deployments [14]. More precisely, we assume that transmitting one data unit via UBS and via satellite is two and eight times more expensive, respectively, than via TBS (i.e., $m_{UBS} = 2 \cdot m_{TBS}$ and $m_{LEO} = 8 \cdot m_{TBS}$) [14]. Fig. 3 displays the obtained timely throughput as a function of $Q$.
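With fixed proportions $m_{UBS} = 2 m_{TBS}$ and $m_{LEO} = 8 m_{TBS}$, sweeping $m_{TBS}$ rescales all prices without changing their order, so the ranking $(\pi_1, \pi_2, \pi_3)$ used by the heuristics remains (TBS, UBS, LEO). The sketch below checks this for a few hypothetical values of $m_{TBS}$; it does not model the $\epsilon$-optimal policy, whose trade-off also involves the penalty.

```python
def unit_costs(m_tbs):
    """Per-unit costs with the proportions m_UBS = 2 m_TBS and m_LEO = 8 m_TBS."""
    return {"TBS": m_tbs, "UBS": 2.0 * m_tbs, "LEO": 8.0 * m_tbs}

for m_tbs in (0.5, 1.0, 4.0):  # hypothetical sweep values
    costs = unit_costs(m_tbs)
    ranking = sorted(costs, key=costs.get)  # cheapest first: pi_1, pi_2, pi_3
    print(m_tbs, ranking, costs)
```

For a fixed sequence of heuristic decisions, the delivered units do not change with $m_{TBS}$ while the total cost scales linearly with it, so the heuristic's cost efficiency scales as $1/m_{TBS}$.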
As reported in Figs. 2 and 3, the output of the heuristics given in (13) approaches that of the ϵ-optimal policy for varying monetary cost $m_{TBS}$ per data unit transmission and traffic demand $Q$. Therefore, the proposed heuristics demonstrate near-optimal performance over the considered range of key system parameters. Fig. 4 highlights the gap in average timely throughput between the ϵ-optimal policy and the proposed heuristics as a function of demand $Q$. This gap grows as $Q$ increases, but the performance loss remains below 1%, which justifies the use of the proposed heuristics as a lightweight solution for near-optimal backhaul selection.
By applying the proposed heuristics, the system can make more cost-efficient backhaul selection decisions than when following the MaxSNR strategy (see Fig. 2). Both the heuristics and MaxSNR help the system meet the timely throughput requirement, in contrast to the MinCost approach, as demonstrated in Fig. 3. Hence, the heuristic solution proposed in this work provides better cost efficiency than the MaxSNR strategy and higher timely throughput than the MinCost approach, which confirms the benefits of accounting for both cost and loss rate in backhaul selection and the limitations of deciding based on signal quality or monetary cost alone.
As the distance from the vessel to the coastline affects the behavior of the backhaul selection policy, Fig. 5 illustrates the utilization of each of the three backhauling options as a function of this distance. The system relies more on the NTNs as the vessel-to-shore distance increases, and specifically on the UBS backhauling solution. As the LEO satellite is the most costly option, it is selected only in the few cases where the other two alternatives are not available.
V. CONCLUSION
In this letter, we studied the backhaul selection problem for time-sensitive applications in a near-shore maritime communication system leveraging integrated terrestrial and non-terrestrial networks. Aiming to control the trade-off between data transmission expenses and stringent application requirements, we formulated the problem as an MDP and proposed a lightweight algorithm for near-optimal backhaul selection. We evaluated the performance of the ϵ-optimal policy and the heuristics, and compared them against two reference greedy strategies in terms of cost efficiency and timely throughput. Several extensions are envisioned for future work, including more complex scenarios with additional relaying options and relay mobility.