Online Learning for Rate-Adaptive Task Offloading Under Latency Constraints in Serverless Edge Computing

We consider the interplay between latency-constrained applications and function-level resource management in a serverless edge computing environment. We develop a game-theoretic model of the interaction between rate-adaptive applications and a load-balancing operator under a function-oriented pay-as-you-go pricing model. We show that under perfect information the strategic interaction between the applications can be formulated as a generalized Nash equilibrium problem, and use variational inequality theory to prove that the game admits an equilibrium. For the case of imperfect information, we propose an online learning algorithm that allows applications to maximize their utility through rate adaptation and resource reservation. We show that the proposed algorithm converges to equilibria and achieves zero regret asymptotically, and our simulation results show that it achieves good system performance at equilibrium, ensures fast convergence, and enables applications to meet their latency constraints.

between data rate and latency, while providing information about pricing and billing. At the same time, the abstraction should allow edge infrastructure operators to efficiently manage the available physical resources, subject to energy and reliability constraints.
A promising lightweight abstraction that could potentially suit a variety of edge applications is function as a service (FaaS). In the case of FaaS, applications are explicitly composed of the subsequent parallel or sequential invocation of subtasks, referred to as functions [4]. Functions are managed, i.e., instantiated, executed and shut down, by the infrastructure, relieving the programmer from the burden of system configuration. Stateless FaaS has already found adoption in cloud computing, referred to as serverless computing, as it provides autoscaling and follows the pay-as-you-go pricing model [5]. Recently proposed solutions for stateful FaaS could extend this offering with low-latency mutable state and communication in the near future [4], [6].
Nonetheless, compared to a cloud computing environment, resource management for FaaS in an edge computing environment faces a number of novel challenges [7]. First, it has to cater for heterogeneous hardware platforms, and has to consider the orchestration of communication and computing resources. Second, it should cater for the latency requirements of applications that involve the execution of multiple functions, and at the same time may be able to adjust their data rate so as to maximize their utility. Third, it has to deal with the strategic interaction between multiple applications for constrained resources. The outcome of the resulting interaction between infrastructure resource management and application behavior is, however, not well understood.
Motivated by the above challenges, in this paper we consider the interaction between rate control and infrastructure resource management for latency-sensitive tasks in a serverless edge computing system, and make the following main contributions:
• We propose a queuing network model of task graph execution and use it to formulate a game-theoretic model of the interaction between self-interested wireless devices, which can reserve communication and computing resources, and a FaaS edge operator that allocates the resources. We prove monotonicity and pseudoconvexity of the sojourn time in a G/G/1 fork-join network, a result that may be of independent interest.
• We show that under perfect information the strategic interaction between Wireless Devices (WDs) can be formulated as a generalized Nash equilibrium problem, and we show the existence of Nash equilibria by using variational inequality theory.
• For the case of imperfect information, we propose an online algorithm called Online Adaptive Rate Reservation and Control (OARC) for learning equilibria in a distributed manner. We show that OARC converges to equilibria and achieves zero regret asymptotically.
• Our numerical results show that OARC outperforms the state of the art in Online Convex Optimization (OCO) for a variety of task graphs.
The rest of the paper is organized as follows. We present the system model and problem formulation in Section II, and prove pseudoconvexity and monotonicity of the sojourn time in fork-join networks in Section III. We consider equilibria under perfect information in Section IV, and learning equilibria under imperfect information in Section V. Section VI presents numerical results, Section VII discusses related work, and Section VIII concludes the paper.

II. SYSTEM MODEL AND PROBLEM FORMULATION
We consider an edge computing system that consists of a set N = {1, 2, . . . , N} of wireless devices (WDs), a set A = {1, 2, . . . , A} of access points (APs) and an edge cloud that hosts a set C = {1, 2, . . . , C} of computing resources (CRs), illustrated in Figure 1. We define the set R = A ∪ C of edge (communication and computing) resources.
Tasks and subtask graphs: We consider that WD i ∈ N generates latency-sensitive computational tasks of type i with intensity λ_i. We model a type-i task as a directed acyclic graph G_i = (V_i, E_i), where each node v ∈ V_i is a subtask. The source node v^i_0 ∈ V_i represents wireless transmission of the task's input data via an AP a ∈ A to the edge cloud.
Nodes v ∈ V_i \ {v^i_0} are computational (execution) subtasks, and correspond to the execution of the functions that constitute the task. The sink node v^i_{|V_i|} is the last execution subtask, and its completion marks the completion of the task. We denote by T_i the maximum average task completion time acceptable to tasks of WD i. A directed edge e(v^i_m, v^i_o) ∈ E_i indicates that subtask v^i_m has to finish before subtask v^i_o can start execution. We refer to G_i as the task graph of WD i, and we consider that the task graphs G_i represent fork-join type jobs, i.e., subtasks are executed sequentially or in parallel. Finally, we define V = ∪_{i∈N} V_i. Observe that for a task of type i the arrival rate of every subtask v ∈ V_i equals λ_i.
Communication and Computing Resources: We denote by R_v ⊆ R the set of resources that can be used for performing subtask v ∈ V_i. For a wireless transmission subtask v^i_0 ∈ V_i the resources are R_{v^i_0} ⊆ A, i.e., a subset of the APs, while for execution subtasks v ∈ V \ {∪_{i∈N} v^i_0} they are R_v ⊆ C, i.e., a subset of the CRs. Similarly, for a resource r ∈ R we define the set V_r = {v ∈ V | r ∈ R_v} of subtasks that can be performed using resource r. We denote by μ_{r,v} the service rate at which resource r can process subtask v; thus, μ_{r,v^i_0} is the achievable transmission rate of WD i ∈ N when using communication resource r ∈ A, while for an execution subtask v ∈ V \ {∪_{i∈N} v^i_0} the service rate is μ_{r,v} when using CR r ∈ C. Heterogeneous service rates allow us to model infrastructures with heterogeneous communication and computing resources. Figure 1 illustrates the components of the considered system, including WDs, heterogeneous communication and computing resources, and the corresponding modeling abstraction, which maps every subtask to a corresponding G/G/1 queue, resulting in a G/G/1 queuing network as a model of data transmission and subtask graph execution.
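To make the abstraction concrete, the fork-join task graph of a WD can be sketched as a small data structure; the class, resource names, and rates below are illustrative, not from the system model.

```python
from dataclasses import dataclass, field

# Sketch of a fork-join task graph G_i: a transmission source, parallel
# function subtasks, and a join/sink subtask. Each subtask lists its
# candidate resources R_v with service rates mu_{r,v} (values are ours).

@dataclass
class Subtask:
    name: str
    resources: dict                       # resource id -> service rate mu_{r,v}
    successors: list = field(default_factory=list)

def make_fork_join(num_parallel):
    """Build source -> num_parallel parallel subtasks -> sink."""
    sink = Subtask("join", {"cr2": 8.0})
    parallel = [Subtask(f"f{k}", {"cr1": 5.0}, [sink]) for k in range(num_parallel)]
    source = Subtask("tx", {"ap1": 10.0}, parallel)
    return source, parallel, sink

source, parallel, sink = make_fork_join(3)
```

Every directed edge encodes the precedence constraint of E_i: a subtask starts only after all of its predecessors complete.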

A. Edge Resource Allocation
Our model of resource allocation in the serverless edge infrastructure allows resources to be shared dynamically among subtasks. We denote by p_{r,v} the fraction of resource r allocated for processing subtask v ∈ V, and by p = (p_{r,v})_{r∈R,v∈V} the resulting resource allocation vector. Furthermore, we define the resource utilization ρ_r = Σ_{v∈V_r} p_{r,v} ≤ 1, and the vector ρ ∈ [0, 1]^{|R|}, which contains the resource utilizations ρ_r in nonincreasing order. We consider that the processing capacity not allocated at a resource is shared among the subtasks in proportion to their allocations; thus the perceived allocation of resource r available to subtask v is

p̃_{r,v} = p_{r,v} + (1 − ρ_r) p_{r,v}/ρ_r = p_{r,v}/ρ_r.   (1)

We denote by μ̃_{r,v} = p̃_{r,v} μ_{r,v} the resulting perceived service rate for subtask v on resource r, and we express the total perceived service rate for subtask v as

μ̃_v = Σ_{r∈R_v} p̃_{r,v} μ_{r,v}.   (2)

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Similar to existing serverless offerings and to bandwidth SLAs in 5G networks [8], we consider that users can reserve
computing capacity and communication resources. The ability to reserve computing capacity is akin to provisioned concurrency in existing serverless offerings.¹ Nonetheless, unlike in existing commercial offerings, for simplicity we define the reservation in terms of processing rate (instead of processing capacity). This formulation is reasonable, as users can know the average service times of their subtasks. We denote by σ_{v_i} the service rate reservation made by WD i ∈ N for its subtask v_i ∈ V_i. Furthermore, we denote by σ_i = Σ_{v_i∈V_i} σ_{v_i} the total rate reservation of WD i. Throughout the paper we consider that σ_{v_i} = σ_i/|V_i|, ∀v_i ∈ V_i, i.e., WDs make the same service rate reservation for all of their subtasks. We make this assumption for two reasons. First, a uniform allocation of service rates to the servers minimizes the mean sojourn time in a tandem network of M/M/1 queues; it may not be optimal for non-M/M/1 queues, but it is likely not far from optimal. Second, this model allows for a simple interaction between the users and the infrastructure, as each user can reserve resources through a single parameter independent of the number of subtasks in its task graph, providing ease of use for customers. Considering non-homogeneous rate reservations could be an interesting extension of our work.
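As an illustration of the proportional redistribution of idle capacity described above, the following sketch computes perceived service rates under the assumption that the perceived allocation simplifies to p_{r,v}/ρ_r; the resource and subtask names are ours.

```python
def perceived_rate(p, mu, v):
    """Perceived service rate of subtask v when the idle capacity of each
    resource is redistributed in proportion to allocations:
    p~_{r,v} = p_{r,v} / rho_r,  mu~_v = sum_r p~_{r,v} * mu_{r,v}.
    p, mu: dict keyed by (resource, subtask)."""
    rho = {}
    for (r, w), frac in p.items():
        rho[r] = rho.get(r, 0.0) + frac            # rho_r = sum_v p_{r,v}
    return sum(frac / rho[r] * mu[(r, w)]
               for (r, w), frac in p.items() if w == v)

# Two subtasks share one resource at utilization 0.5 + 0.25 = 0.75:
p  = {("r1", "a"): 0.5, ("r1", "b"): 0.25}
mu = {("r1", "a"): 12.0, ("r1", "b"): 12.0}
```

Subtask "a" perceives allocation 0.5/0.75 = 2/3 of the resource, subtask "b" perceives 1/3, so the full capacity is always distributed.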
Load-balancing Network Operator: To effectively serve user requests, we consider that the network operator performs load balancing periodically. It does so by minimizing ρ, i.e., the vector of the utilizations of the communication and computing resources, in the lexicographical sense, subject to rate stability constraints.² Thus, the operator periodically solves the lexicographic minimization problem (3)-(8), i.e., lex min ρ subject to constraints (4)-(8). Constraint (4) ensures that each subtask receives the reserved rate and allows WD i to adjust the sojourn time for subtask v (c.f. Kingman's approximation of the waiting time in a G/G/1 queue [9]), constraint (5) defines the utilization of each resource r ∈ R under resource allocation vector p, constraint (6) enforces resources to be allocated uniformly among the execution subtasks of a WD, and constraints (7) and (8) ensure that the allocation of resources to the subtasks respects the assignment constraints.
¹ Amazon Lambda allows function instances to be kept initialized, a feature called provisioned concurrency.
² Let ρ, ρ′ ∈ R^{|R|}_{≥0}. Then ρ <_L ρ′ (ρ is smaller according to the lexicographical order) if and only if there exists 1 ≤ r* ≤ |R| such that for r < r* we have ρ_r = ρ′_r, and ρ_{r*} < ρ′_{r*}. Given that ρ consists of the utilizations in nonincreasing order, lexicographical minimization results in a particular min-max solution. The two are equivalent for |R| = 2.
The resource allocation implemented by the operator determines the perceived service rates of the subtasks, and together
with the task arrival rates it determines the average task completion times of the users. To express this dependence, we define the collection λ = (λ_i)_{i∈N} of arrival intensities of the WDs. Similarly, we define the collection σ = (σ_i)_{i∈N} of resource reservations of the WDs. Finally, we denote by S̄_i(λ, σ) the mean completion time of tasks generated by WD i, which in our model equals the mean sojourn time of customers in the G/G/1 fork-join queuing network corresponding to the subtask graph G_i.
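The lexicographical order used by the operator (footnote 2) can be sketched as a comparison of sorted utilization vectors; a minimal illustration:

```python
def lex_less(rho_a, rho_b):
    """rho_a <_L rho_b for utilization vectors compared after sorting in
    nonincreasing order: there is an index r* with rho_a[r] == rho_b[r]
    for r < r* and rho_a[r*] < rho_b[r*] (footnote 2)."""
    a = sorted(rho_a, reverse=True)
    b = sorted(rho_b, reverse=True)
    return a < b   # Python compares lists lexicographically

# Moving load off the busiest resource improves the operator's objective:
assert lex_less([0.6, 0.5], [0.7, 0.4])
```

Note that for two resources this coincides with min-max: the vector with the smaller maximum utilization is lexicographically smaller.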

B. User Utility
Aligned with the pay-as-you-go billing model widely used in serverless computing, we denote by c^λ_i and c^σ_i the unit cost per arrival rate and per resource reservation, respectively, and we define the computing cost for WD i as c^λ_i λ_i + c^σ_i σ_i. The term c^λ_i accounts for the cost due to the number of invocations, but it can also account for the computational resources actually used for executing tasks, as usual in existing serverless offerings. Furthermore, we define the utility of WD i as

U_i(λ, σ) = f_i(λ_i) − (c^λ_i λ_i + c^σ_i σ_i),   (9)

where f_i is a concave, differentiable, and increasing function. Concavity of the utility is a natural assumption for many monitoring and control applications, and is widely used as it captures diminishing marginal gains [10], [11], [12], [13], [14], while differentiability ensures analytical tractability. We also make the reasonable assumptions that f_i(0) = 0 and c^λ_i < df_i/dλ_i|_{λ_i=0} ≤ L_i ∈ R_{>0}. Since the WDs pay for the rate at which they generate tasks and for the resource reservations they make (c.f. equation (9)), for each WD i ∈ N there exists a maximum rate λ̄_i and a maximum resource reservation σ̄_i, which can be obtained as the solution to ∂U_i/∂λ_i(λ̄_i, 0) = 0 and to U_i(λ̄_i, σ̄_i) = 0, respectively. Therefore, we can consider that WD i ∈ N chooses σ_i from the compact set S_i = [σ_i^min, σ̄_i] and λ_i from the compact set [λ_i^min, λ̄_i], for some σ_i^min ≥ 0 and λ_i^min ≥ 0.
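A hedged example of a utility satisfying these assumptions: with f_i(x) = L_i log(1 + x) we get f_i(0) = 0 and f_i'(0) = L_i, and the maximum sensible rate follows from the first-order condition. The parameter values below are ours, for illustration only.

```python
import math

# Toy instance of the utility in (9): f(x) = L*log(1+x) is concave,
# increasing, f(0) = 0, f'(0) = L > c_lam as required.
def utility(lam, sigma, L=4.0, c_lam=1.0, c_sig=0.5):
    return L * math.log(1.0 + lam) - (c_lam * lam + c_sig * sigma)

def max_rate(L=4.0, c_lam=1.0):
    """Maximum sensible rate: solve dU/dlam = L/(1+lam) - c_lam = 0."""
    return L / c_lam - 1.0
```

Beyond max_rate() the marginal utility of additional traffic is negative, which is why the rate can be restricted to a compact interval.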

C. Serverless Stochastic Rate Allocation Game
In the considered system the WDs are engaged in repeated strategic interaction through the resource allocation p, which they can influence through the resource reservations σ. We consider that the WDs can update their resource reservations σ periodically, i.e., whenever the network operator updates the resource allocation p by solving (3)-(8). Between subsequent updates of the resource reservations the WDs can adjust their rates λ. We adopt the game theoretic notation σ_{−i} and λ_{−i} to denote the resource reservations and the rates of all WDs except WD i, respectively.
Each WD i ∈ N aims at maximizing its utility (9) subject to its average task completion time constraint T_i, by choosing its resource reservation σ_i and rate λ_i. Thus, each WD i aims at solving the optimization problem

max_{λ_i, σ_i} U_i(λ, σ)  s.t.  S̄_i(λ, σ) ≤ T_i,  λ_i ∈ [λ_i^min, λ̄_i],  σ_i ∈ S_i.

The resulting game played by the WDs is a dynamic game in which not only the objective functions of the WDs depend on each other's strategies, but also the strategy sets, through the stochastic constraints. Importantly, in practice the mean task sojourn times, and thus the action sets, are not known, but have to be learned by the WDs. We refer to the resulting game as the Serverless Stochastic Rate Allocation (SSRA) game. In what follows we investigate (i) whether the SSRA game admits an equilibrium, and (ii) whether WDs can learn an equilibrium strategy in a distributed manner.

III. SOJOURN TIME CHARACTERIZATION
In this section we first show monotonicity and pseudoconvexity of the mean task completion time S̄_i(λ, σ), i.e., of the sojourn time in a G/G/1 fork-join network, in the task arrival rate λ_i. We then characterize the structure of the optimal solution of the operator's load balancing problem (3)-(8), and finally we show monotonicity of the mean task completion time S̄_i(λ, σ) in the resource reservation σ_i. We use these results in Sections IV and V.

A. Monotonicity and Pseudoconvexity of the Sojourn Time in the Arrival Rate
It is known that even in a single G/G/1 queue with FCFS service discipline the mean sojourn time need not be a convex function of the arrival rate [15]. Nonetheless, in what follows we show that the mean sojourn time is a monotone, pseudoconvex function of the arrival rate. The importance of this result is that pseudoconvexity is a sufficient condition for gradient-based learning algorithms to converge to the optimal solution.
We start with showing the result for tandem queues; we consider a set V = {1, 2, . . . , V} of G/G/1 queues in series, and we assume that the service discipline is FCFS and work-conserving (i.e., a server is never idle when its queue is non-empty). We make the common assumption that the interarrival and service time distributions satisfy the stability criterion [16], [17]. We denote by I^v_n the time between the arrival of customer n − 1 and customer n at queue v ∈ V. Furthermore, we denote by s^v_n, w^v_n and S^v_n the service, waiting and sojourn times of customer n in queue v ∈ V, respectively, and we introduce the notation

Σ^x_{l,m} = Σ_{j=l}^{m} x_j   (12)

for a sequence (x_j), where for l > m the sums are empty and are thus 0. Before we present our results, let us recall two fundamental results concerning the waiting times and the sojourn times in tandem queues. We first present the waiting time expression for a single G/G/1 queue, and then we extend the result to tandem queues.
Lemma 1 [16]: Lindley's recursion w^v_n = max(0, w^v_{n−1} + s^v_{n−1} − I^v_n) has the unique solution

w^v_n = max_{1 ≤ m ≤ n} ( Σ^{s^v}_{m,n−1} − Σ^{I^v}_{m+1,n} ).

The second result follows from Lemma 1 and provides a closed-form expression for the sojourn time in G/G/1 tandem queues.
Lemma 2 [17]: The total time S^{1:V}_n that customer n spends in a system of V ≥ 1 queues connected in series can be expressed as

S^{1:V}_n = max_{1 ≤ k_1 ≤ · · · ≤ k_V ≤ n} ( Σ_{v=1}^{V} Σ^{s^v}_{k_v, k_{v+1}} − Σ^{I^1}_{k_1+1, n} ),  with k_{V+1} = n.

We note that both results hold for stable queuing systems, including the heavy traffic regime, whenever the offered load is less than 1. In what follows we prove our first main result concerning the sojourn time of individual tasks based on Lemma 2.
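Lemma 2 can be checked numerically against the standard FCFS departure recursion D^v_n = max(D^v_{n−1}, D^{v−1}_n) + s^v_n; a small brute-force sketch (0-indexed, variable names ours):

```python
import itertools
import random

def departures(arrivals, services):
    """Departure times in a FCFS tandem of G/G/1 queues, via the standard
    recursion D[v][n] = max(D[v][n-1], D[v-1][n]) + s[v][n]."""
    prev = list(arrivals)
    for s_v in services:
        d = []
        for k in range(len(arrivals)):
            start = max(prev[k], d[k - 1]) if k else prev[k]
            d.append(start + s_v[k])
        prev = d
    return prev

def closed_form_departure(arrivals, services, n):
    """Max-plus form of Lemma 2: maximize over nondecreasing index chains
    k_1 <= ... <= k_V <= n (brute force, small instances only)."""
    V = len(services)
    best = float("-inf")
    for chain in itertools.combinations_with_replacement(range(n + 1), V):
        ks = list(chain) + [n]
        total = arrivals[ks[0]]
        for v in range(V):
            total += sum(services[v][ks[v]:ks[v + 1] + 1])
        best = max(best, total)
    return best

random.seed(1)
arr = sorted(random.uniform(0, 10) for _ in range(6))
svc = [[random.uniform(0.1, 2.0) for _ in range(6)] for _ in range(3)]
```

For every customer n the two computations agree, and subtracting the arrival time from the departure time yields the sojourn time S^{1:V}_n.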
Theorem 1: Consider a G/G/1 tandem queue consisting of V queues, and an arbitrary customer n. The total sojourn time S 1:V n of customer n is an increasing pseudoconvex function of the customer arrival rate λ.
Proof: For an arrival rate of λ, let us denote by τ_{k−1} = t_{k−1}/λ and τ_k = t_k/λ the times at which customers k − 1 and k arrive in the system (i.e., at the first queue), respectively; t_{k−1} and t_k can assume any non-negative values and can be any realizations of random variables. Then, the interarrival time between customer k − 1 and customer k at the first queue is I^1_k = τ_k − τ_{k−1} = (t_k − t_{k−1})/λ. Therefore, it follows from (12) and Lemma 2 that the total sojourn time S^{1:V}_n of customer n can be expressed as

S^{1:V}_n(λ) = max_{1 ≤ k_1 ≤ · · · ≤ k_V ≤ n} ( Σ_{v=1}^{V} Σ^{s^v}_{k_v, k_{v+1}} − (t_n − t_{k_1})/λ ),  with k_{V+1} = n.   (14)

First observe that for two successive jobs k − 1 and k we have t_{k−1} ≤ t_k. Since Σ^{s^v}_{k_v,k_{v+1}} is not a function of λ (c.f. equation (12)), we have that S^{1:V}_n is defined as the maximum of functions that are increasing in λ, as the term −(t_n − t_{k_1})/λ is nondecreasing in λ with t_{k_1} ≤ t_n. Hence S^{1:V}_n is continuous, but it is not necessarily a differentiable function of λ. Therefore, to prove pseudoconvexity of S^{1:V}_n we need to consider the upper Dini derivative of S^{1:V}_n, which we denote by D⁺S^{1:V}_n. It is easy to see from (14) that S^{1:V}_n is an increasing function of λ such that D⁺S^{1:V}_n(λ′) > 0 for any λ′ > 0. To prove pseudoconvexity, we need to show that S^{1:V}_n is increasing in any direction in which the upper Dini derivative is positive.
Therefore, to check pseudoconvexity it suffices to show that λ′ ≤ λ implies S^{1:V}_n(λ′) ≤ S^{1:V}_n(λ), i.e., that S^{1:V}_n is nondecreasing in λ, which is clearly the case. This proves the theorem. Next, we extend the above result to fork-join networks.
Theorem 2: Consider a G/G/1 fork-join network G = (V, E) of queues with FCFS and work-conserving service discipline. Then the sojourn time S n of customer n is an increasing pseudoconvex function of the arrival rate λ.
Proof: Let us denote by Π = {v_1, . . . , v_{|Π|}} the set of parallel queues and let v_0 and v_{|V|} be the first and the last queue in the network, respectively, i.e., V = {v_0} ∪ Π ∪ {v_{|V|}}. Furthermore, let us denote by S^{p_π}_n the sojourn time of customer n on the simple path p_π which connects the first queue v_0 with the last queue v_{|V|} via parallel queue v_π ∈ Π. Then, the total sojourn time S_n of customer n in the fork-join network G = (V, E) can be expressed as

S_n = max_{v_π ∈ Π} S^{p_π}_n.   (15)

By Theorem 1 we know that S^{p_π}_n is an increasing pseudoconvex function of λ. Furthermore, it is easy to see from (14) and (15) that S_n is also an increasing function of λ with upper Dini derivative D⁺S_n > 0. By following an approach similar to the one used in the proof of Theorem 1, it follows that S_n is also pseudoconvex in λ, which proves the result.
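The path decomposition in the proof can be verified numerically: simulating the fork-join network directly and taking the maximum over per-path tandem simulations yield identical departure times. A sketch with illustrative parameters (names ours):

```python
import random

def fcfs(arrivals, service):
    """Departure times from a single FCFS work-conserving queue."""
    d = []
    for k, a in enumerate(arrivals):
        start = max(a, d[k - 1]) if k else a
        d.append(start + service[k])
    return d

def fork_join(arrivals, s_first, s_parallel, s_last):
    """First queue, |Pi| parallel queues fed by its departures, and a last
    queue that starts a customer only after all branches have finished."""
    d0 = fcfs(arrivals, s_first)
    branches = [fcfs(d0, s) for s in s_parallel]
    joined = [max(col) for col in zip(*branches)]
    return fcfs(joined, s_last)

def per_path_max(arrivals, s_first, s_parallel, s_last):
    """Equation (15): the departure of customer n equals the maximum over
    the simple paths first -> pi -> last, each treated as a tandem."""
    d0 = fcfs(arrivals, s_first)
    paths = [fcfs(fcfs(d0, s), s_last) for s in s_parallel]
    return [max(col) for col in zip(*paths)]

random.seed(2)
arr = sorted(random.uniform(0, 10) for _ in range(8))
sf = [random.uniform(0.1, 1.5) for _ in range(8)]
sp = [[random.uniform(0.1, 1.5) for _ in range(8)] for _ in range(3)]
sl = [random.uniform(0.1, 1.5) for _ in range(8)]
```

The agreement follows by induction on n, since the maximum over branches commutes with the FCFS recursion at the last queue.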
Finally, we extend the result to the mean sojourn times.

Theorem 3: The mean sojourn time S̄ in a G/G/1 fork-join network G = (V, E) is an increasing pseudoconvex function of the arrival rate λ.
Proof: Since Theorems 1 and 2 hold for any non-negative values of t_{k−1} and t_k (c.f. equation (14)), they also hold when t_{k−1} and t_k are realizations of random variables; taking the expectation of S_n over these realizations yields the result.
Using the above we can obtain a useful characterization of the completion times of the tasks generated by the WDs in the considered serverless edge computing system.
Corollary 1: The mean sojourn time S̄_i(λ, σ) of a task generated by WD i ∈ N is an increasing pseudoconvex function of the task arrival rate λ_i.
Proof: The result follows from Theorem 3.

B. Perceived Service Rate Under Load Balancing
We now turn our attention to the perceived service rates μ̃*_v of the WDs' subtasks. In order to obtain a characterization, we first analyze the structure of an optimal solution of the operator's problem (3)-(8).

Proposition 1: Consider an optimal solution (p*, ρ*) to the operator's problem (3)-(8). Then
(i) equality holds in each constraint (4), and
(ii) ρ*_r = ρ*_{r′} holds for any two resources r, r′ ∈ R_v.
Proof: We start with proving (i). Let us assume that there is an optimal solution p* to (3)-(8) in which constraint (4) is not tight for some subtask v and resource r ∈ R_v. Then the allocation p_{r,v} can be decreased until (4) holds with equality, yielding an allocation p′ with utilizations ρ′. Then, ρ′_r < ρ*_r and ρ′_{r′} = ρ*_{r′}, r′ ∈ R \ {r} hold. Since ρ contains the utilizations of the resources in nonincreasing order, we obtain that ρ′ <_L ρ*, which contradicts the assumption that (p*, ρ*) is an optimal solution to (3)-(8), and proves (i).
We continue with proving (ii). Let us assume that there is an optimal solution p* to (3)-(8) in which ρ*_r < ρ*_{r′} for two resources r, r′ ∈ R_v. Then a small amount of the allocation of subtask v can be shifted from r′ to r, keeping the perceived service rate of v unchanged, yielding an allocation p′ with utilizations ρ′ in which the larger of the two utilizations is reduced. Since ρ contains the utilizations of the resources in nonincreasing order, we obtain that ρ′ <_L ρ*, which contradicts the assumption that (p*, ρ*) is an optimal solution to (3)-(8), and proves (ii). This concludes the proof.
Proposition 1 allows us to formulate the following results.
Corollary 2: Consider an optimal solution (p*, ρ*), a subtask v ∈ V, and a resource r′ ∈ R_v with p*_{r′,v} > 0. Then the perceived service rate is

μ̃*_v = (1/ρ*_{r′}) Σ_{r∈R_v} p*_{r,v} μ_{r,v}.

Proof: First, from (ii) in Proposition 1 we have that ρ*_r = ρ*_{r′} for any resource r ∈ R_v \ {r′}, and thus the perceived service rate μ̃*_v defined in (2) can be expressed as above.
Corollary 3: The perceived service rate μ̃*_v of every subtask v ∈ V_i is a nondecreasing function of the resource reservation σ_i.
We can provide a stronger result if we restrict our attention to the case that resources form equivalence classes, defined as follows.
Corollary 4: Under Assumption A1 the utilization ρ*_r is an affine function of σ_i. Furthermore, the perceived service rate μ̃*_v of every subtask v ∈ V_i is a concave nondecreasing function of the resource reservation σ_i.
We proceed with providing a general result concerning the sojourn time in a fork-join network G = (V, E). To do so, we denote by μ v the service rate in queue v ∈ V.
Theorem 4: Consider a fork-join network G = (V, E) of G/G/1 queues with FCFS and work-conserving service discipline. The sojourn time S_n of customer n and the mean sojourn time S̄ are decreasing functions of the service rates μ_v.
Proof: Let us consider three customers l, n and m such that l ≤ n ≤ m. For a service rate of μ_v we can express the time required to serve customer n at queue v as s^v_n = x_{n,v}/μ_v, where x_{n,v} denotes the service requirement (amount of work) of customer n at queue v. For any realization of x_{n,v}, it follows from the definitions of Σ^{s^v}_{l,m}, S^{1:V}_n and S_n (c.f. equations (12), (14) and (15)) that the sojourn time S_n of customer n is a decreasing function of the service rate μ_v in queue v ∈ V. Taking expectations, it follows that the mean sojourn time S̄ in a fork-join network is also a decreasing function of the service rate μ_v in queue v ∈ V, which proves the result.
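The pathwise argument can be illustrated numerically: scaling up the service rate of any single queue can only reduce the simulated mean sojourn time. A sketch (parameters ours):

```python
import random

def tandem_sojourn_mean(arrivals, demands, mus):
    """Mean sojourn time in a FCFS tandem where queue v serves customer n in
    time s^v_n = x_{n,v} / mu_v (service requirement over service rate)."""
    prev = list(arrivals)
    for x_v, mu in zip(demands, mus):
        d = []
        for k in range(len(arrivals)):
            start = max(prev[k], d[k - 1]) if k else prev[k]
            d.append(start + x_v[k] / mu)
        prev = d
    n = len(arrivals)
    return sum(dk - a for dk, a in zip(prev, arrivals)) / n

random.seed(3)
arr = sorted(random.uniform(0, 20) for _ in range(50))
dem = [[random.uniform(0.1, 2.0) for _ in range(50)] for _ in range(2)]
slow = tandem_sojourn_mean(arr, dem, [1.0, 1.0])
fast = tandem_sojourn_mean(arr, dem, [1.5, 1.0])   # speed up queue 1 only
```

Since every service time weakly decreases pathwise, every departure time, and hence the mean sojourn time, is nonincreasing in each μ_v.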
Theorem 4 allows us to formulate the following result.
Corollary 5: The mean sojourn time S̄_i(λ, σ) of a task generated by WD i ∈ N is a decreasing function of the perceived service rate μ̃_v of each subtask v ∈ V_i.
Proof: The result follows from the proof of Theorem 4.
Finally, we use the above results to show that the mean sojourn time S̄_i(λ, σ) is a monotonic function of the resource reservation σ_i.
Theorem 5: Consider an optimal solution to the operator's problem (3)-(8), and the resulting perceived service rates μ̃*_v of the subtasks v ∈ V_i. The mean sojourn time S̄_i(λ, σ) of a task generated by WD i ∈ N is a nonincreasing function of the resource reservation σ_i.
Proof: First, from Corollary 5 we have that the mean sojourn time S̄_i(λ, σ) is a decreasing function of the perceived service rate μ̃*_v of each subtask v ∈ V_i. Second, from Corollary 3 we have that the perceived service rate μ̃*_v of each subtask v ∈ V_i is a nondecreasing function of the resource reservation σ_i. Hence, S̄_i(λ, σ) is nonincreasing in σ_i, which proves the result.

IV. EQUILIBRIA UNDER PERFECT INFORMATION
We first consider the case of perfect information, i.e., each WD i knows λ and σ, and can infer its mean task completion time S̄_i(λ, σ). Observe that the sets of feasible rates and reservations of the players form coupled constraints, and hence the resulting game is a generalized Nash equilibrium problem. In what follows we use Variational Inequality (VI) theory to prove the existence of equilibria in the SSRA game under perfect information. First, we recall the definition of the VI(K, F) problem from [18].
Definition 1: Let K ⊆ R^n be a closed convex set and F : K → R^n a continuous function. The VI(K, F) problem is to find a point x* ∈ K such that F(x*)^T (x − x*) ≥ 0 for all x ∈ K.
We are now ready to formulate one of our main results.
Theorem 6: The SSRA game under perfect information admits a pure strategy Nash equilibrium.
Proof: First, let us recall that the WDs can update their resource reservations periodically, and that between two updates of the resource reservations they can adjust their rates. In order to model the dynamics of the game played by the WDs, we introduce two fictitious players i_σ and i_λ for each WD i ∈ N, which decide about the resource reservation σ_i and the rate λ_i, respectively. Furthermore, we denote by N_σ, |N_σ| = N, and N_λ, |N_λ| = N, the sets of fictitious players that decide about the resource reservations and the rates, respectively. Finally, we denote by N_f = N_σ ∪ N_λ the set of all fictitious players. In order to model how the fictitious players interact with each other, we define for each i_σ ∈ N_σ the set K_{i_σ}(λ, σ_{−i}) = {σ_i ∈ S_i | S̄_i(λ, σ_i, σ_{−i}) ≤ T_i}, and for each i_λ ∈ N_λ the set K_{i_λ}(λ_{−i}, σ) = {λ_i ∈ [λ_i^min, λ̄_i] | S̄_i(λ_i, λ_{−i}, σ) ≤ T_i}. We can then define the generalized Nash equilibrium problem (GNEP) Γ_f = <N_f, (K_{i_f})_{i_f∈N_f}, (U_i(λ, σ))_{i_f∈N_f}>, in which both fictitious players i_σ and i_λ aim at maximizing the utility U_i(λ, σ) of WD i subject to the latency constraint of WD i. Therefore, Γ_f is a strategic game in which each fictitious player i_σ ∈ N_σ aims at maximizing its utility U_{i_σ}(λ, σ) = U_i(λ, σ) by solving

max_{σ_i ∈ K_{i_σ}(λ, σ_{−i})} U_i(λ, σ),

and each fictitious player i_λ ∈ N_λ aims at maximizing its utility U_{i_λ}(λ, σ) = U_i(λ, σ) by solving

max_{λ_i ∈ K_{i_λ}(λ_{−i}, σ)} U_i(λ, σ).

Clearly, a pure strategy Nash equilibrium of Γ_f is an equilibrium of the SSRA game in which the WDs update their resource reservations and rates separately. We thus have to prove that Γ_f has a pure strategy Nash equilibrium. In the following we use VI theory to prove the result concerning the existence of equilibria in Γ_f. Therefore, we need to define a suitable VI(K, F) problem that corresponds to the game Γ_f, i.e., we have to specify the set K and the function F [18], [19], [20].
First, we define the set

K = Π_{i∈N} ( K_{i_σ}(λ, σ_{−i}) × K_{i_λ}(λ_{−i}, σ) ).   (21)

Second, we define the function

F(λ, σ) = −( ∇_σ U(λ, σ), ∇_λ U(λ, σ) ),

where ∇_σ U(λ, σ) and ∇_λ U(λ, σ) are the gradient vectors given by ∇_σ U(λ, σ) = (∂U_1/∂σ_1, . . . , ∂U_N/∂σ_N) and ∇_λ U(λ, σ) = (∂U_1/∂λ_1, . . . , ∂U_N/∂λ_N). The proof relies on showing that the set K is compact and convex, and that the utility U_i(λ, σ) of each WD i is continuously differentiable in (λ, σ) and concave in σ_i and in λ_i [20]. We start with proving the compactness of the set K. Let us recall that WD i ∈ N can choose σ_i and λ_i from the compact sets [σ_i^min, σ̄_i] and [λ_i^min, λ̄_i], respectively. Therefore, K_{i_σ}(λ, σ_{−i}) and K_{i_λ}(λ_{−i}, σ) are closed subsets of compact sets, and are thus compact. Since the Cartesian product of compact sets is compact (c.f. Tychonoff's theorem), we obtain that the set K defined in (21) is compact.
We continue with proving the convexity of the set K. From Corollary 1 and Theorem 5 we have that S̄_i(λ, σ) is an increasing pseudoconvex function of the task arrival rate λ_i and a nonincreasing function of σ_i, respectively. Therefore, S̄_i(λ, σ) is quasiconvex in λ_i and in σ_i, and thus the sublevel sets K_{i_λ}(λ_{−i}, σ) and K_{i_σ}(λ, σ_{−i}) are convex [21]. Since the Cartesian product of convex sets is a convex set [21], we obtain that the set K defined in (21) is convex as well.
Finally, it is easy to check that the utility function U_i(λ, σ) defined in (9) is continuously differentiable in (λ, σ) and concave in σ_i and in λ_i. Hence, it follows from Theorem 2.1 and Proposition 2.2 in [20] that a solution of VI(K, F) exists and that it is also a Nash equilibrium of Γ_f, and thus of the SSRA game. This proves the theorem.
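A toy illustration of the VI machinery, on a two-player game with quadratic utilities (not the SSRA game itself): the projection method x ← P_K(x − γF(x)) converges to the point where both players' first-order conditions hold.

```python
def F(x):
    """F = -(gradient of each player's own utility) for the toy game with
    U_i(x) = x_i - x_i**2 / 2 - x_1 * x_2 / 4 on the strategy set [0, 2]^2."""
    return [x[0] + x[1] / 4.0 - 1.0, x[1] + x[0] / 4.0 - 1.0]

def solve_vi(F, lo=0.0, hi=2.0, step=0.1, iters=3000):
    """Projection method for VI(K, F): x <- P_K(x - step * F(x)); converges
    for strongly monotone, Lipschitz F with a small enough step."""
    x = [hi, lo]                                   # arbitrary point in K
    for _ in range(iters):
        y = [xi - step * fi for xi, fi in zip(x, F(x))]
        x = [min(hi, max(lo, v)) for v in y]       # Euclidean projection on K
    return x

x = solve_vi(F)
# the unique equilibrium solves x_i + x_j/4 = 1, i.e., x = (0.8, 0.8)
```

At the solution, F(x*)^T (x − x*) ≥ 0 for all x ∈ K, which is exactly the variational inequality of Definition 1.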
We have thus shown that equilibria exist in the SSRA game under perfect information, which is a prerequisite for the study of learning equilibria under imperfect information considered in the following section. In the Appendix, included in the supplementary material, we also show that rate reservation is essential in the considered problem, as the interaction between rate control and resource allocation may otherwise lead to starvation. Next, we study whether equilibria can be reached under imperfect information.

V. LEARNING TO PLAY EQUILIBRIUM USING ONLINE OPTIMIZATION
In what follows we propose an online optimization algorithm, called OARC, for WDs to maximize their individual utility based on the measured sojourn times of their computational tasks. The pseudo-code of the algorithm is shown in Figure 2. The algorithm makes use of online gradient ascent based on a perturbation of σ_i, used for estimating the gradient of the utility function U_i, and in between perturbations it ensures that the latency constraint is met through rate adaptation (RA). In each iteration, the algorithm first updates the perturbation size η_t and the learning rate α_t (Line 1). It then computes the perturbed reservations σ−_i(t) and σ+_i(t) and reports them to the operator (Lines 3, 4 and 6, 7). The WDs estimate the resulting arrival intensities and average response times (Lines 5 and 8). In Lines 9-11, the algorithm computes the stochastic subgradient with respect to the rate reservation. Finally, it computes the estimated arrival rate and updates the reservation using a gradient ascent step based on the computed stochastic subgradient (Lines 12-13).
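A stripped-down, single-WD sketch of the OARC loop on a synthetic concave utility (the perturbation and step-size schedules below are illustrative choices, not the ones in Figure 2):

```python
def oarc_sketch(u, lo, hi, T=4000):
    """Sketch of OARC's reservation loop (names and schedules are ours):
    perturb sigma by +/- eta_t/2, observe the utility attained by rate
    adaptation, form a central-difference subgradient, and take a projected
    gradient ascent step with learning rate alpha_t."""
    sigma = lo + 0.25 * (hi - lo)
    for t in range(1, T + 1):
        eta, alpha = t ** -0.25, t ** -0.5
        u_plus = u(min(hi, sigma + eta / 2))   # report sigma+(t), measure utility
        u_minus = u(max(lo, sigma - eta / 2))  # report sigma-(t), measure utility
        grad = (u_plus - u_minus) / eta        # stochastic subgradient estimate
        sigma = min(hi, max(lo, sigma + alpha * grad))  # projected ascent step
    return sigma

# Stand-in for the utility under optimal rate adaptation, maximizer sigma* = 1:
sigma_hat = oarc_sketch(lambda s: 2.0 * s - s * s, 0.0, 2.0)
```

With a noise-free concave stand-in utility, the iterate settles at the maximizer; in the real system the two measured utilities are noisy, which is why the diminishing schedules η_t and α_t matter.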
In what follows we first show that the proposed algorithm can indeed ensure that the mean sojourn time constraint is met, and that under certain assumptions it converges to an equilibrium.
Proposition 2: Let σ_i be fixed, and consider the rate adaptation problem

max_{λ_i ∈ [λ_i^min, λ̄_i]} U_i(λ, σ)  s.t.  S̄_i(λ, σ) ≤ T_i.   (22)

Then the set of solutions of problem (22) is compact and convex.
Proof: We prove the result by showing convexity and compactness of the solution set. By Corollary 1 the mean sojourn time S̄_i(λ, σ) is pseudoconvex in λ_i. Pseudoconvexity implies quasiconvexity, and every sublevel set of a quasiconvex function is convex, which together with the finiteness of λ*_i(σ_i) proves the result.
Observe that pseudoconvexity of the objective in (22) implies that stochastic gradient descent algorithms, such as stochastic approximation and the Adam algorithm [22], can be used for finding a solution efficiently [23, Th. 4.1]. We can thus consider that users are able to solve (22) using a rate adaptation (RA) algorithm, which we formulate as the following assumption.
Assumption 2 (A2): Denote by λ̂_i(σ_i(t)) the estimated solution to (22). The arrival rate estimation error ζ_{i,t} = λ̂_i(σ_i(t)) − λ*_i(σ_i(t)) is unbiased, i.e., E[ζ_{i,t}] = 0. The assumption that the estimate is unbiased is justified by the fact that η_t → 0, which makes the perturbed reservations converge to σ_i(t), and hence the computed arrival rates converge to the actual optimal arrival rate. We now turn to the analysis of the task arrival rate and the utility under the following assumption.
Assumption 3 (A3): Consider two strategies (λ_i, σ_i) and (λ′_i, σ′_i), and let 0 ≤ θ ≤ 1. Then

S̄_i(θλ_i + (1−θ)λ′_i, λ_{−i}, θσ_i + (1−θ)σ′_i, σ_{−i}) ≤ max{ S̄_i(λ_i, λ_{−i}, σ_i, σ_{−i}), S̄_i(λ′_i, λ_{−i}, σ′_i, σ_{−i}) },

i.e., S̄_i is jointly quasiconvex in (λ_i, σ_i). In what follows we show that under Assumption 3 the maximum task arrival rate of each user is concave in its rate reservation.

Proposition 3: Let us define the maximum task intensity λ*_i(σ_i) = max{ λ_i ∈ [λ_i^min, λ̄_i] | S̄_i(λ, σ) ≤ T_i }. Then λ*_i(σ_i) is a concave function of the resource reservation σ_i.
Proof: Recall that by Corollary 1 and Theorem 5 the mean sojourn time S̄_i(λ, σ) is increasing and pseudoconvex in λ_i, and nonincreasing in σ_i, respectively. Assumption 3 implies that the mean sojourn time S̄_i is jointly quasiconvex in (λ_i, σ_i). Quasiconvexity implies that each sublevel set {(λ_i, σ_i) | S̄_i(λ, σ) ≤ T_i} is convex, and λ*_i(σ_i), being the upper boundary of a convex set, is a concave function of σ_i, which proves the result.
A consequence of the above result is that the utility is concave in the rate reservation.
Proof: Concavity follows from Proposition 3 and the concavity of f_i. L-Lipschitz continuity follows from the fact that S̄(λ_i, σ_i, σ_{−i}) is bounded by T_i, and λ_i and σ_i have compact domains, thus S̄ is L-Lipschitz. Observe that for any T_i < ∞, the set S_i ⊂ [0, ∞) is compact, and since Ũ_i is concave, it is Lipschitz continuous in the relative interior of its domain [24, Prop. 2.107]. In addition, f_i is L_i-Lipschitz continuous by assumption, thus U_i is L-Lipschitz continuous for some L > 0.
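The sublevel-set step behind Proposition 3 can be made explicit; the following is a sketch consistent with the proof above, with L_i denoting the sublevel set of the mean sojourn time:

```latex
L_i = \{(\lambda_i,\sigma_i) : \bar{S}_i(\lambda_i,\sigma_i,\sigma_{-i}) \le T_i\},
\qquad
\lambda_i^*(\sigma) = \max\{\lambda_i : (\lambda_i,\sigma_i) \in L_i\}.
% Quasiconvexity of \bar{S}_i makes L_i convex; hence for \theta \in [0,1] the point
% \theta(\lambda_i^*(\sigma_i),\sigma_i) + (1-\theta)(\lambda_i^*(\sigma_i'),\sigma_i')
% lies in L_i, and therefore
\lambda_i^*(\theta\sigma_i + (1-\theta)\sigma_i')
\;\ge\; \theta\,\lambda_i^*(\sigma_i) + (1-\theta)\,\lambda_i^*(\sigma_i'),
% i.e., \lambda_i^* is concave in \sigma_i.
```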
Our first main result about OARC establishes that if OARC converges then it indeed converges to an equilibrium of the SSRA game.
Theorem 7: Assume that the sequence σ(t) generated by OARC converges to σ*. Then σ* is a Nash equilibrium of the SSRA game.
Before we present the proof, we introduce three technical results related to the update expression and to the measured utility under noisy rate estimates.
Lemma 3: The update expression in Line 13 of the OARC algorithm can be written as a projected gradient update, where P_i is the Euclidean projection onto S_i. Equivalently, it can be expressed in terms of the penalty function h(σ_i) = σ_i²/2 and the aggregated gradient σ̄_i(t) ∈ R.
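A minimal sketch of such a lazy projected update (illustrative names and a toy concave utility; the scalar strategy set is an interval, matching S_i ⊆ R):

```python
# Sketch (not the paper's code) of the update form in Lemma 3: gradients are
# aggregated into sigma_bar, and the played reservation is the Euclidean
# projection of sigma_bar onto the interval strategy set, which for the
# quadratic penalty h(s) = s^2/2 coincides with the lazy mirror-descent update.
def project(x, lo, hi):
    return min(max(x, lo), hi)

sigma_bar, sigma, lo, hi = 0.0, 0.0, 0.0, 2.0
for t in range(1, 101):
    grad = 1.0 - sigma                     # toy concave utility U(s) = s - s^2/2
    sigma_bar += (0.5 / t ** 0.5) * grad   # aggregate the ascent gradient
    sigma = project(sigma_bar, lo, hi)
print(round(sigma, 2))                     # settles near the maximizer s = 1
```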
Second, we characterize the bias of the gradient estimates used in OARC.
Lemma 4: Consider the measured central difference derivative estimate ∇Û_i(t). The estimate has a bias that vanishes as η_t → 0, together with a term θ(ζ_{i,t})/η_t, where θ(ζ_{i,t}) is the error due to the arrival rate estimation error.

Proof: Consider the Taylor expansion of Ũ_i at σ_i(t), and use it to express the true gradient at σ_i(t) as a function of the central difference derivative estimate. Consider now the measured utility based on (9), where λ_i(σ_i(t)) is the arrival intensity at σ_i(t). We can perform a Taylor series expansion of the measured utilities at the two perturbed points, where θ_+(ζ_{i,t}) and θ_−(ζ_{i,t}) denote the utility estimation errors due to the arrival rate estimation error. Subtracting the two expansions and dividing by η_t, together with the expression for the true gradient, yields the result.

We note that the above result may be extended to nondifferentiable functions following the analysis in [27]. Third, we show that the utility estimation error due to the arrival rate estimate vanishes.
Lemma 5: Assume that f_i is smooth and Assumption 2 holds. Then the utility estimation error term due to the arrival rate estimate vanishes asymptotically.

Proof: Consider the Taylor series expansions of the measured utilities at the two perturbed points, as in the proof of Lemma 4, and the limit of the expectation of their difference, recalling that the denominator is deterministic. The difference of the first order derivatives in the first term equals the second order derivative in the limit η_t → 0; following the same logic, the difference in the second term equals the third order derivative, and this holds for all higher order terms as η_t → 0. Now, by assumption f_i is a smooth and L-Lipschitz continuous function, hence its derivatives are bounded. Furthermore, by Assumption 2 we have E[ζ²_{i,t}] → 0 as t → ∞, hence higher moments do so too with probability 1, which concludes the proof.
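To make the two preceding lemmas concrete, a sketch of the two-point (central difference) estimate they analyze is given below; the utility, the noise model, and the choice of perturbing by ±η (with division by 2η) are illustrative assumptions, not the paper's exact scheme:

```python
import random

# Hypothetical sketch: perturb the reservation by +/- eta, measure the (noisy)
# utility at both points, and divide the difference by 2*eta. The noise term
# plays the role of the utility error caused by the arrival rate estimate,
# whose contribution to the gradient estimate scales as 1/eta.
def central_diff(utility, sigma, eta, noise_std=0.0):
    u_plus = utility(sigma + eta) + random.gauss(0.0, noise_std)
    u_minus = utility(sigma - eta) + random.gauss(0.0, noise_std)
    return (u_plus - u_minus) / (2.0 * eta)

U = lambda s: s - 0.5 * s ** 2         # toy concave utility, U'(s) = 1 - s
g = central_diff(U, 0.5, 1e-3)         # noiseless: exact for a quadratic U
print(round(g, 4))
```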
Using the above results we are now ready to prove Theorem 7.
Proof of Theorem 7: Let g* = g(σ*) = ∇Ũ(σ*) and assume that σ* is not a Nash equilibrium. By the characterization of Nash equilibria (see [26] for details), there exist a player i ∈ N and a deviation q_i ∈ [σ_i, σ̄_i] = S_i ⊆ R such that ⟨g*_i, q_i − σ*_i⟩ > 0. By continuity, there exist some c > 0 and neighborhoods U and G of σ* and g*, respectively, such that ⟨g_i, q_i − σ_i⟩ ≥ c whenever σ ∈ U and g ∈ G. Now, let Ω be the event that σ(t) converges to σ*, so P(Ω) > 0 by assumption.
Within Ω we can also assume for simplicity that σ(t) ∈ U and g(σ(t)) ∈ G for all t. Recall that in OARC the learning rate α_t satisfies the prescribed step size conditions. By using the update rule given in Lemma 3 and Assumption 2, we can rewrite the update rule in terms of the bias and the error term. By Lemma 5, the term due to the arrival intensity estimation error vanishes almost surely. Consequently, g(σ(t)) → g* in Ω and P(Ω) > 0, and hence we can conclude that P(ḡ(t) → g* | Ω) = 1. Consider now the penalty function h defined in Lemma 3, and its subdifferential ∂h. The function h is called subdifferentiable at x ∈ R whenever ∂h(x) is nonempty, and by [28, Th. 12.60(b)] and [29, Th. 23.5] for the subdifferential ∂h it holds that σ̄_i(t) ∈ ∂h(σ_i(t)) ⇐⇒ σ_i(t) = P_i(σ̄_i(t)). Thus, using the definition of the subdifferential, we can lower bound h(q_i) − h(σ_i(t)) in terms of ⟨ḡ_i(t), q_i − σ_i(t)⟩. Since ḡ(t) → g* almost surely on Ω, the continuity argument above yields ⟨ḡ_i(t), q_i − σ_i(t)⟩ ≥ c > 0 for all sufficiently large t. Substituting this into the lower bound, we obtain h(q_i) − h(σ_i(t)) > cτ_t → ∞ with positive probability. This is a contradiction, since h is continuous and 1-strongly convex, and S_i is compact. Thus we conclude that σ* is a NE, which proves the result.
We have so far shown that if OARC converges then it converges to an equilibrium of the SSRA game. In what follows we show that OARC also achieves zero regret asymptotically. For simplicity we present the proof for the case of noiseless rate estimates, but the proof can easily be extended to noisy rate estimates for the expected regret.
Thus, lim sup_{T→∞} R_i(T)/T = 0.

Proof: Since Ũ_i is concave and L-Lipschitz, for any σ_i(t) we can upper bound the per-round regret by the linearization of Ũ_i at σ_i(t), as in (46). At the same time, we can use the update equation and Lemma 3 to obtain the bound (48), where the inequality is due to the projection P_i. Rearranging the inequality and combining (46) and (48), we obtain a bound on R_i(T). Using α_t = t^{−γ_1} and η_t = t^{−γ_2} with 0 < γ_1, γ_2 ≤ 1, together with the bound Σ_{t=1}^T t^{−γ} ≤ 1 + T^{1−γ}/(1 − γ) for 0 < γ < 1, we obtain lim sup_{T→∞} R_i(T)/T = 0. Furthermore, using γ_1 = γ_2 = 0.5 we obtain (44), which proves the result. Thus, the OARC algorithm can compute a solution that is asymptotically optimal in hindsight.
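The summation bound used in the last step can be checked numerically; the bound below, Σ_{t=1}^T t^{−γ} ≤ 1 + T^{1−γ}/(1 − γ) for 0 < γ < 1, is the standard integral bound we take the proof to rely on:

```python
# With gamma = 0.5 the accumulated step sizes grow like O(sqrt(T)),
# so the time-averaged regret bound vanishes as O(T^{-1/2}).
def power_sum(T, gamma):
    return sum(t ** (-gamma) for t in range(1, T + 1))

gamma = 0.5
for T in (100, 10_000):
    s = power_sum(T, gamma)
    bound = 1 + T ** (1 - gamma) / (1 - gamma)
    assert s <= bound                  # the integral bound holds
    print(T, round(s / T, 4))          # the per-round average shrinks with T
```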

VI. NUMERICAL RESULTS
We performed extensive simulations in order to assess equilibrium behavior and to validate the proposed OARC algorithm. For the evaluation we consider three scenarios with different task graphs and queue types. In Scenario 1 the task graph consists of two subtasks in series: a wireless transmission subtask followed by one computational subtask. Scenario 2 consists of three subtasks in series: a wireless transmission subtask followed by two computational subtasks. Scenario 3 is a fork-join queuing system in which a wireless transmission subtask is followed by two computational subtasks executed in parallel, followed by a final computational subtask. For all scenarios we have |A| = 4 APs and |C| = 8 servers, and we assigned up to |N|/|A| users at random to each AP.
We set the WDs' latency constraints T_i uniformly at random on [0.01, 0.1] s, which is reasonable for a variety of low latency applications envisioned for 5G systems [30]. We chose the service rate μ_{r,v} of each resource and subtask uniformly at random for Scenario 1; for Scenario 2 and Scenario 3 we set the service rates to be 50% higher, on average. Finally, as an example of a non-negative concave function we use f_i(λ_i) = log(1 + λ_i) for computing the WDs' utility [31], and set c_λ = c_σ = 0.02. Note that with these parameters the maximum arrival rate is λ̄_i = 49, and we set the minimum rate to λ_i = 0. For the evaluation we consider Poisson arrival processes; the service times are either exponentially distributed (M) or deterministic (D), allowing us to validate our results under significantly different service processes.
We used two algorithms as baselines for comparison. The first is the OCO algorithm proposed in [32], an extension of the Zinkevich algorithm meant to satisfy convex stochastic constraints, which maximizes the expected utility by adjusting (λ_i, σ_i) simultaneously. We used perturbations to estimate the local gradients, as those are assumed to be known by OCO. The second baseline is obtained by applying OARC using the sum utility of all users as objective function, i.e., considering that users cooperate to maximize their sum utility instead of competing. We refer to this baseline as the OARC-SUM algorithm. In addition, to be able to assess the impact of stochastic approximation (SA) on the performance of OARC, we consider a baseline for Scenario 1 where we compute the optimal arrival rates λ_i analytically instead of using SA. We refer to this as OARC-Model. The results shown are averages together with 95% confidence intervals computed from 30 simulations. Fig. 3 shows the total utility as a function of the number of WDs for Scenario 1 with exponential service times, for OARC, OCO, OARC-SUM and OARC-Model. Surprisingly, the total utility for OARC is not monotonically increasing. The reason is that above N = 4 the WDs can no longer achieve their maximum rate λ̄_i and thus contend for the communication and computing resources. Contention in turn decreases the maximum service capacity of the system due to the latency constraints (cf. the achievable rate in an M/M/1 queue with service rate μ under latency constraint T, vs. the sum of the achievable rates in two M/M/1 queues with service rate μ/2 under latency constraint T).
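The parenthetical comparison can be made concrete with a short calculation (illustrative numbers, using the standard M/M/1 mean sojourn time 1/(μ − λ)):

```python
# For an M/M/1 queue the mean sojourn time is 1/(mu - lam), so the largest
# arrival rate meeting a latency constraint T is lam = mu - 1/T. Splitting one
# server of rate mu into two servers of rate mu/2 costs one extra 1/T of
# latency-constrained capacity.
def max_rate(mu, T):
    return max(mu - 1.0 / T, 0.0)

mu, T = 10.0, 0.5
single = max_rate(mu, T)                  # one queue of rate mu
split = 2 * max_rate(mu / 2.0, T)         # two queues of rate mu/2
print(single, split)                      # contention reduces service capacity
```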

A. Utility Performance
The figure also shows that OARC-SUM outperforms OARC, which is explained by the fact that OARC-SUM aims at maximizing the sum utility of all WDs, i.e., WDs do not act independently. The figure also allows us to assess the effect of rate adaptation on the utility obtained by OARC. Comparing the curves for OARC and OARC-Model, we observe that the impact of stochastic rate adaptation is negligible. Comparing the results for OARC and OCO, it may be surprising that OCO achieves higher utility than OARC for N > 4. To explain why this is possible, Fig. 4 shows the empirical CDF of the normalized sojourn times of the WDs for the two algorithms for N = 12. We compute the normalized sojourn time as the ratio of the average sojourn time of a WD to its latency constraint. The figure shows that OCO leads to a significant violation of the latency constraint for the majority of WDs. In contrast, OARC-Model does not lead to a violation of the latency bound, while OARC leads to minor violations of the latency constraint due to SGD-based rate adaptation. Another observation that can be drawn from Fig. 4 is that in the heavy traffic regime OARC enables WDs to adjust their rates and prevents latency violations with high probability. On the contrary, OCO fails to keep the mean sojourn time of the WDs under their latency constraints: when there are many WDs, OCO might lead to unstable queues, whereas OARC ensures queue stability by keeping the mean sojourn time of the WDs at their latency constraints. We can thus conclude that OCO does not solve the SSRA problem, mainly due to the fact that the utility is not jointly concave in the arrival rate and the rate reservation, which highlights the importance of the approach followed by OARC.
Corresponding results for deterministic service times, included in the Appendix, show that the utility for deterministic service times is slightly higher than for exponential service times, but the curves show similar characteristics. In what follows we show results for deterministic service times for clarity of exposition. Fig. 5 shows the total utility as a function of the number of WDs for Scenario 2 and Scenario 3 with deterministic service times. The results show that OARC performs close to OARC-SUM for more complex subtask graphs as well, including a fork-join task graph (Scenario 3). Importantly, they also show that the shapes of the curves are not affected by the subtask graph topology, i.e., the utility decreases due to contention for resources. The superior performance of OCO in Scenario 2 and Scenario 3 is again due to the fact that OCO results in significant latency constraint violations (we omit the figure for brevity). Fig. 6 shows the total arrival intensity as a function of the number of WDs for Scenario 2 and Scenario 3 with deterministic service times. The results show that the utility is to a large extent determined by the arrival intensity, both for OARC and for OCO. It is interesting to note that OARC-SUM has a lower total arrival intensity (particularly for N < 4) even though it has a higher total utility than OARC. This is because OARC-SUM prevents a few users from achieving a very high arrival intensity at the expense of the rest of the users. We also note that the total utility and arrival rate are far from the social optimum for N > 4, as the utility obtained for N = 4 would be achievable for N > 4 by assigning zero rate to all but 4 users; this is, however, not an equilibrium. Fig. 7 shows the total reservation as a function of the number of WDs for Scenario 2 and Scenario 3 with deterministic service times.
Surprisingly, the total reservation for OARC does not increase linearly with the number of users beyond N > 4, which can be explained by the fact that WDs learn that they cannot increase their utility by increasing their reservation parameter due to the congestion on the resources. Interestingly, OCO results in significantly higher resource reservations than OARC and OARC-SUM, which is due to the fact that the latency constraint is not met by the WDs, allowing significantly higher rates. Fig. 8 shows the total revenue of the edge cloud operator as a function of the number of WDs for Scenario 2 and Scenario 3 with deterministic service times. Since the revenue is a linear function of the reservation parameter and the arrival intensity, its shape is similar to that of the curves shown in Figs. 7 and 6. Somewhat surprisingly, the results in Fig. 8 show that the total revenue decreases beyond N > 4 when using OARC and OARC-SUM, i.e., the edge cloud operator loses revenue because the WDs contend for the resources and consequently reduce their arrival rates so as to meet their latency constraints. This observation leads us to conclude that operators would need to implement admission control to maximize their revenue in a serverless computing environment with latency constrained tasks. Fig. 9 shows the average computation time for solving problem (3)-(8) for all scenarios, based on a Python implementation executed on an Intel i9-10900 CPU. Recall that the task graphs in Scenario 1, Scenario 2 and Scenario 3 contain 2, 3 and 4 subtasks per user, respectively, which is why the computation time is highest for Scenario 3. Overall, we observe that the computation time increases approximately linearly with the number of WDs. This is because as the number of WDs N increases, so does the number of subtasks |V| = |∪_{i∈N} V_i|, indicating that the average complexity of problem (3)-(8) is linear in the number of subtasks. Fig. 10 shows the total utility as a function of the number of computing resources for Scenarios 1, 2 and 3 with exponential service times for |A| = 4 APs. The figure shows that the utility is a monotonically increasing concave function of the number of computing resources for all scenarios, and indicates that the proposed algorithm utilizes the available computing resources. We note that the concavity of the curves is due to the concavity of the utility functions.

VII. RELATED WORK
Our problem is related to network utility maximization introduced in [33], later extended to, e.g., packet losses [34], and to queuing networks subject to a stability constraint [35]. Unlike in the case of network utility maximization, in the problem we consider the objective of the network is not aligned with that of the users, which makes the two problems fundamentally different.
Related to ours are recent works on rate control in queuing networks. In [36] authors considered distributed rate control for a fork-join processing network under a static server assignment, and proposed a solution akin to the back-pressure algorithm. The focus of that work was on rate stability, and thus the issue of utilities and latency constraints was not considered. Authors in [37] analyze the convexity of the system time in queuing networks, and authors in [38] consider constrained stochastic approximation and provide unbiased estimators that can be used for GI/G/1 queues. The results hold as long as the cost function is strictly unimodal, which includes the convex case.
There are few works focusing on resource management for serverless computing [5]. Authors in [5] use Bayesian optimization for learning the execution time and cost of serverless functions on Amazon AWS. Their approach does not consider server side resource allocation and the interaction among users explicitly, and the solution requires the repeated solution of an integer linear program based on estimated parameters for choosing parameters for service chains.
Our work is related to recent work on online learning. Closely related to our algorithm is the Zinkevich algorithm for unconstrained online convex optimization [32]. The algorithm was extended in [32] to online convex optimization with stochastic constraints. These works focus on a single decision maker, and assume that the cost and the constraint functions are revealed after every round. Similarly, authors in [39] and [40] propose algorithms for nested stochastic approximation, but the problem formulations do not consider stochastic constraints.
In the area of computation offloading, authors in [41] propose an offline policy for a dynamic computation offloading and resource scheduling problem under task completion constraints, consider that both wireless devices and the network operator are decision makers, and assume that the task of each device can be modeled as a DAG with the same number of subtasks. Authors in [42] model an application as a directed acyclic data flow graph, consider a system with limited wireless and abundant computing resources shared by multiple applications, and address the problem of deciding which components in the data flow graph should be offloaded onto the cloud such that the throughput of the applications is maximized. Authors in [43] model a computational task as a DAG, consider the congestion on computing resources only, and propose a heuristic for solving an offline task placement problem in which the objective is to minimize the sum cost of the devices under constraints on the dependency among subtasks, the task completion time deadlines and the amount of available computing resources. Finally, authors in [44] consider a task graph with loops, cycles and branches, under the assumption of deterministic service and waiting times. They present heuristic algorithms for solving two related optimization problems, minimizing the response time under a budget constraint, and minimizing the cost under a response time constraint.
These works do not consider, however, the interaction between application rate control, server side resource management and the stochastic service processes. To the best of our knowledge, ours is the first work that considers this interaction, analyzes the existence of equilibria and proposes an online optimization algorithm for learning equilibria in a distributed manner.

VIII. CONCLUSION
In this paper, we proposed a modeling abstraction and a problem formulation for investigating the interaction between latency constrained services and resource management for serverless edge computing. The proposed abstraction is based on a queuing network model of task graph execution and allows the analysis of the interaction between selfish WDs that reserve edge resources and a serverless operator that allocates resources among WDs, formulated as a non-cooperative game. Our analytical results show that rate reservation plays an essential role for latency sensitive services, at the same time a simple abstraction for rate reservation allows conceptually simple algorithms, like the proposed OARC, to converge to equilibria with good performance. Our numerical results confirm the analytical findings and also reveal that current practice of serverless service rate allocation leads to a loss of service capacity under latency constraints, and to a loss of operator revenue at the same time. Consequently, solutions for admission control complemented with new abstractions and related scheduling policies would be desirable for latency constrained computing tasks in a serverless edge computing infrastructure. Our model could be extended to consider that the computing price is dependent on the total reservation, i.e., increasing with the contention for computing resources, it could be used to study the impact of different forms of signaling between the WDs and the operator on convergence speed and the resulting utility, and it could be extended to consider more complex models of task graphs. We leave these to be subject of our future work.