Second-Order Mirror Descent: Convergence in Games Beyond Averaging and Discounting

In this paper, we propose a second-order extension of the continuous-time game-theoretic mirror descent (MD) dynamics, referred to as MD2, which provably converges to mere (but not necessarily strict) variationally stable states (VSS) without using common auxiliary techniques such as time-averaging or discounting. We show that MD2 enjoys no-regret as well as an exponential rate of convergence towards strong VSS upon a slight modification. MD2 can also be used to derive many novel continuous-time primal-space dynamics. We then use stochastic approximation techniques to provide a convergence guarantee of discrete-time MD2 with noisy observations towards interior mere VSS. Selected simulations are provided to illustrate our results.


I. INTRODUCTION
A central problem of multi-agent online learning is the design of adaptive policies which a set of subscribing agents (or players) can utilize to arrive at a desired collective outcome. These adaptive policies can typically be viewed as the following iterative process: an information processing step, whereby game-relevant information is made available to the player and then collected for processing, followed by a decision step, whereby the processed information is converted into the next strategy. A unified mathematical codification of this two-step process that emerged in recent years is referred to as mirror descent, first studied by [3] and subsequently by [4]-[11] in the context of optimization.
For games, mirror descent is often analyzed as a set of ordinary differential equations (ODEs), referred to as the continuous-time mirror descent dynamics (MD), which can be seen as the multi-agent extension of [3, p. 87]. MD is also referred to as dual averaging (DA) in continuous games [1], [19] and Follow-the-Regularized-Leader (FTRL) or exponential learning in finite games [18], [20], [21]. The core appeal of MD lies in its flexibility in adapting to a wide variety of problem settings as well as its rich theoretical properties. In particular, MD is known to converge to the Nash equilibrium (NE) in terms of its generated strategies under any one of a trio of strictness assumptions: (i) the game is strictly monotone, i.e., the pseudo-gradient of the game is a strictly monotone operator [50]; the class of strictly monotone games captures games that admit strictly concave potential functions, as well as saddle-point problems with strictly convex-strictly concave saddle functions, a special case of the diagonally strict concavity of [51]; (ii) the NE is strictly variationally stable (in the sense of [1]); the class of games with a strictly variationally stable NE (also known as strictly variationally stable games [17]) contains the class of strictly monotone games, as well as strictly coherent saddle-point problems [46]; (iii) the NE is strict [14], which coincides with a locally strictly variationally stable NE in finite games [1]. Despite these theoretical guarantees, the applicability of MD and its variants remains limited, as many games found in practice do not conveniently exhibit these strict properties. In other words, MD does not converge in many games, most notably in zero-sum (ZS) finite games that feature a unique interior NE [21]. [12] showed that no two-player ZS finite game is strictly monotone. Furthermore, strictness is intimately related to the uniqueness of the game equilibrium. For instance, a strictly monotone game admits a unique NE [50]. In practice, however, many games exhibit a convex set of equilibria as opposed to a unique one. This means strictness imposes serious constraints on the problem parameters or types. Moreover, many dynamics or algorithms directly derived from MD (through discretization or otherwise) share similar limitations, in that some type of strictness needs to be assumed to ensure convergence, e.g., [1], [2], [16]-[20], [22].
To achieve convergence beyond strictness in a continuous-time setup, the existing literature typically relies on two primary approaches: averaging and discounting. The first method utilizes the time-averaged (ergodic) strategies generated by MD as opposed to the actual strategies [18]. However, this approach has two drawbacks. Firstly, the players need to combine their time-averaged strategies in order to recover the NE, which is unrealistic in non-cooperative settings. Secondly, time-averaging could fail outside of ZS games [57]. Another method is via a discounting procedure [15], [27], [28], which is conceptually related to the weight decay method of [41]. Discounting can be seen as a regularization technique that makes use of a strictly convex function to offset poor game properties, such as the lack of strict monotonicity. However, in general, discounting cannot yield exact convergence [15], [27], [28]. Since averaging and discounting represent two of the most widely studied methods for improving the convergence properties of continuous-time MD, yet both have been met with challenges, it is natural to set our sights on alternative approaches.
Contributions: In this work, in a game-theoretic setup, we propose a second-order variant of the continuous-time MD (the first-order version of which was studied in [4], [5], [18]-[21]), which we refer to as MD2. Second-order means that the set of ODEs is second-order in time; thus MD2 can be seen as a dual-space formulation of the heavy-ball method [32] and is closely related to the class of nth-order discounted game dynamics of [27], adapted here to a more general game setting. Like MD, we show that MD2 satisfies the basic requirement of no-regret. However, unlike MD, MD2 converges without using averaging even when the NE is not strictly variationally stable. Moreover, unlike the discounted MD [28], which partially overcomes the convergence issue of MD at the cost of inexact convergence, MD2 converges exactly. Exact convergence via a primal-space, second-order pseudo-gradient-type dynamics has been achieved in (merely) monotone games [26]. We provide a dual-space generalization of this result, beyond the monotone game setting, to the case where the equilibria are merely variationally stable. Furthermore, MD2 can exactly recover the unconstrained, primal-space dynamics of [26] in the full-information case. Finally, we use our continuous-time convergence results to guide the proof of convergence of a discrete-time MD2. In short, we show that higher-order dynamics can converge towards Nash solutions with more general stability properties.
Related Works: This paper incorporates several ideas from the variational stability, higher-order dynamics, and stochastic (semi-bandit) Nash-seeking literature. The concept of a variationally stable state (VSS) has roots in evolutionary game theory, and it is motivated by and analogous to that of an evolutionarily stable state (ESS) [37]. Namely, an ESS is a locally strict VSS for games with one or more populations of agents, also known as population games (similarly, a global ESS or GESS is a globally strict VSS) [12]. Many evolutionary game dynamics such as the replicator and projection dynamics have been shown to converge towards NEs of strictly stable games, which are GESS [2], [12]. In contrast to ESS, VSS is defined for the more general class of continuous games [1]. The literature on strict VSS seeking without structural properties such as strict monotonicity is relatively recent [1], [16], [17], [19], [20]. [19] showed that DA converges towards a globally strict VSS. [16], [17] studied online gradient descent (discrete MD with projection map) where the feedback is received asynchronously between agents.
In contrast to a strict VSS, the class of mere VSS is motivated by the more general notion of a neutrally stable state (NSS) and global NSS (GNSS) [38]. Comparatively speaking, the literature dealing with convergence towards a mere VSS is smaller, especially in the absence of structural assumptions such as the game being merely monotone (also known as a stable game [12]). In a population game setup, [12] showed that the best response, integrable excess payoff and impartial pairwise comparison dynamics converge globally to the set of NEs in (merely) stable games, which are GNSS. With a few exceptions such as [46], the current literature on the (non-ergodic) convergence towards a mere VSS outside of a finite/population game setup typically also assumes monotonicity of the game [24], [26], [58]. In contrast to the above references, in this work we generalize the convergence behavior of MD to non-strict NEs (i.e., mere VSS) without any structural assumption, through the use of higher-order augmentation.
Higher-order dynamics were pioneered by Polyak [32], whose algorithm later became widely known as the heavy-ball method (HB). HB was recognized as an analogue of a second-order ODE upon its inception and has since been heavily investigated in the optimization literature [39]; it arguably forms the backbone of deep learning, where non-convexity and local minima dominate [40].
Inspired by [32], the authors of [34] studied a second-order dynamics for continuous concave games, which can be seen as a second-order extension of the projection dynamics of [36], and showed that their dynamics converge whenever there exists a Lipschitz and bounded potential function. In a finite game setup, [33] studied an nth-order variant of exponential learning [18], whereby the payoff is processed via successive integration, and showed that these higher-order learning schemes can achieve a faster rate of elimination of iteratively dominated solutions and convergence towards strict NEs. An important connection between higher-order dynamics and the stability of a NE was established by [35], which showed that the addition of an anticipatory process to first-order gradient-play and replicator dynamics can result in local convergence towards an interior (non-strict) NE, despite the dynamics themselves being uncoupled. For continuous-time equilibrium seeking via a higher-order augmentation in the dual space, the closest work related to ours is the second-order discounted exponential learning scheme in [27], which overcomes the problem of non-convergence towards interior NEs of [33] at the cost of inexact convergence. It was found empirically in [27] that such higher-order augmentation improves the convergence property of its first-order counterpart and converges in non-strictly monotone games where the first-order dynamics fail. We improve upon [27] by generalizing these results from finite games to continuous games and going beyond a monotone game setup while simultaneously achieving exact convergence. Exact convergence (also known as last-iterate convergence) via a primal-space, second-order pseudo-gradient dynamics was achieved in merely monotone games [26], which we also generalize, by stripping away monotonicity assumptions and showing that MD2 encapsulates the dynamics of [26] as a special case.
In the discrete-time, semi-bandit setup, whereby each player receives a (possibly) noise-corrupted version of the pseudo-gradient, the closest work related to ours is [1]. [1] studied the first-order MD, known as DA therein, and showed that under an imperfect gradient setup, DA converges to a globally variationally stable NE under diminishing step-sizes, and its ergodic average converges to the set of NEs in 2-player ZS games. Several variations of DA have been studied in the semi-bandit literature, whereby convergence is towards either strict (strong) VSS or strict NEs. For instance, in finite games, [23] studied a Hedge-variant of the exponential weights algorithm (HEDGE) and provided convergence when the NE is strongly VS with respect to the L1 norm. In a similar setup, [15] studied a discounted variant of HEDGE for potential games. A similar analysis was also performed by [22] for FTRL in finite games. [25] provided several primal-space algorithms for merely monotone games. [46] studied the optimistic MD algorithm, which converges in coherent games, i.e., two-player games that possess mere VSS. Compared to optimistic MD, discrete-time MD2 only requires one mirror projection instead of two prox projections. Furthermore, optimistic MD cannot converge in the presence of noise [46] unless the game is strictly coherent (a stronger assumption), whereas discrete-time MD2 can converge under noisy conditions.
Organization: We provide the background materials in Section II. Section III reviews several basic properties of the first-order MD. We discuss the convergence properties of MD2 in Section IV and provide the rate of convergence as well as a regret minimization guarantee in Section V. We derive associated primal-space dynamics in Section VI. Section VII discusses MD2 in discrete time with noisy observations. Simulations are presented in Section VIII. Section IX presents our conclusions. For readability, all proofs are relegated to Section X. A table of the main notations used in this paper (Table 1) is provided in the Appendix.

A. Sets and Vectors
Given a convex set C ⊆ R^n, the interior of C is denoted int(C) and its relative interior rint(C); rint(C) coincides with int(C) whenever int(C) is non-empty. The closure of a set C is denoted cl(C). We denote the non-negative orthant of R^n as R^n_≥0 and the positive orthant as R^n_>0. A column vector in R^n is denoted x = (x_1, . . . , x_n)^⊤. 1 and 0 denote the column vectors of all ones and all zeros. I_{n×n} and O_{n×n} denote the n×n identity and zero matrices; the subscript is omitted when the dimensionality is unambiguous. For a natural number n, [n] := {1, . . . , n}.

B. Convex Functions and Duality
Let M = R^n be endowed with norm ∥•∥ and dot product ⟨•, •⟩. The normal cone of a convex set C at x is N_C(x) := {v ∈ R^n : ⟨v, y − x⟩ ≤ 0, ∀y ∈ C} for x ∈ C, and equals ∅ for all x ∉ C. Id denotes the identity function. Recall that π_C(x) = argmin_{y∈C} ∥y − x∥²_2 is the Euclidean projection of x onto C, where we refer to π_C as the Euclidean projector. Let ∂f(x) denote the set of subgradients of f at x and ∇f(x) the gradient of f at x. For differentiation on the boundary of a closed set C, in lieu of the subgradient, we can also assume f is defined and differentiable on an open set containing C. Given f, the function f⋆(z) := sup_{x∈M} {⟨z, x⟩ − f(x)} is called the conjugate function of f, where M⋆ is the dual space of M, endowed with the dual norm ∥•∥⋆. f⋆ is closed and convex if f is proper. By the conjugate subgradient theorem [48], if f : R^n → (−∞, +∞] is proper, closed and convex and f⋆ is its Fenchel conjugate, then for any pair (x, z), z ∈ ∂f(x) if and only if x ∈ ∂f⋆(z). Given a vector-valued function F, the Jacobian of F is denoted J_F.

C. N-Player Concave Games
Let G = (N, {Ω^p}_{p∈N}, {U^p}_{p∈N}) be a game, where N = {1, . . . , N} is the set of players and Ω^p ⊆ R^{n_p} is the set of player p's strategies (actions). We denote the strategy (action) set of player p's opponents as Ω^{−p} ⊆ ∏_{q∈N, q≠p} R^{n_q}, and the set of all the players' strategies as Ω := ∏_{p∈N} Ω^p. U^p : Ω → R is player p's real-valued payoff function, where x = (x^p)_{p∈N} ∈ Ω is the action profile of all players and x^p ∈ Ω^p is the action of player p. We also denote x as x = (x^p; x^{−p}), where x^{−p} ∈ Ω^{−p} is the action profile of all players except p. For differentiability purposes, we make the implicit assumption that there exists some open set containing Ω on which U^p is defined and continuously differentiable. Assumption 1. For all p ∈ N, Ω^p is a non-empty, closed, convex subset of R^{n_p}; U^p(x^p; x^{−p}) is (jointly) continuous in x = (x^p; x^{−p}); and U^p(x^p; x^{−p}) is concave and continuously differentiable (C^1) in x^p for all x^{−p} ∈ Ω^{−p}.
Under Assumption 1, we refer to G as a (continuous) concave game. Given x^{−p} ∈ Ω^{−p}, each agent p ∈ N aims to solve the following optimization problem: maximize U^p(x^p; x^{−p}) subject to x^p ∈ Ω^p. A profile x⋆ = (x^{p⋆})_{p∈N} is a Nash equilibrium (NE) if U^p(x^{p⋆}; x^{−p⋆}) ≥ U^p(x^p; x^{−p⋆}) for all x^p ∈ Ω^p and all p ∈ N; the NE is strict if this inequality is strict for all x^p ≠ x^{p⋆} [14].
A useful characterization of a NE of a concave game G is given in terms of the pseudo-gradient [51], defined as U(x) := (∇_{x^p} U^p(x^p; x^{−p}))_{p∈N}. Equivalently, x⋆ is a NE if and only if it is a solution of the Stampacchia variational inequality VI(Ω, −U), i.e., ⟨U(x⋆), x − x⋆⟩ ≤ 0 for all x ∈ Ω [50]. Following [2], we say that a NE is globally strict if this inequality holds strictly for all x ≠ x⋆.

D. Monotonicity
A general class of games in which many dynamics are guaranteed to converge is the class of monotone games, also known as stable games [12], [13] or dissipative games [44].We contrast some known definitions associated with monotone games in the literature.
Definition 1. The game G is: (i) μ-strongly monotone if ⟨U(x) − U(x′), x − x′⟩ ≤ −μ∥x − x′∥² for all x, x′ ∈ Ω, μ > 0; (ii) strictly monotone if ⟨U(x) − U(x′), x − x′⟩ ≤ 0 for all x, x′ ∈ Ω, with equality if and only if x = x′; (iii) merely monotone if ⟨U(x) − U(x′), x − x′⟩ ≤ 0 for all x, x′ ∈ Ω; (iv) μ-weakly monotone if ⟨U(x) − U(x′), x − x′⟩ ≤ μ∥x − x′∥² for all x, x′ ∈ Ω, μ > 0. Strongly and strictly monotone games have been extensively investigated in the literature, at least dating from [51]. Merely monotone games have been studied in [12], [24]-[28], [58]. Weakly monotone games were considered in [27], [30], [43]. When U is C^1, there exists a natural characterization in terms of the definiteness of its Jacobian [50, p. 155, Prop. 2.3.2]. Note that strictly monotone games can have at most one NE, whereas NEs can form a non-singleton convex set in merely monotone games [50].

E. Variational Stability
Although many classical examples of games satisfy monotonicity properties [12], these conditions may not hold in more complex scenarios.A recent line of research has started to relax monotonicity notions from a game to that of an equilibrium via the notion of a variationally stable (VS) equilibrium or state [1].
Definition 2. Let D ⊆ Ω. A point x⋆ ∈ Ω is: (i) η-strongly variationally stable (VS) if ⟨U(x), x − x⋆⟩ ≤ −η∥x − x⋆∥² for all x ∈ D, η > 0; (ii) strictly VS if ⟨U(x), x − x⋆⟩ < 0 for all x ∈ D, x ≠ x⋆; (iii) merely VS if ⟨U(x), x − x⋆⟩ ≤ 0 for all x ∈ D; (iv) μ-weakly VS if ⟨U(x), x − x⋆⟩ ≤ μ∥x − x⋆∥² for all x ∈ D, μ > 0. If a condition (i)-(iv) holds on D ⊂ Ω, then the associated definition is said to hold locally; otherwise the definition is said to hold globally.
Remark 2. We refer to x⋆ as a globally strong/strict/mere/weak variationally stable state (VSS) if it satisfies one of the corresponding VS notions in Definition 2 on all of Ω; x⋆ is a locally strong/strict/mere/weak VSS otherwise. For practical reasons, we will informally refer to a μ-weak VSS with a small μ as a nearly mere VSS, i.e., a weak VSS within a small distance away from becoming a mere VSS.
Globally mere VSS are the solutions to the Minty variational inequality [50]. In a population game context, the set of globally mere VSS is called the GNSS and (the unique) globally strict VSS is called the GESS [2], [12], [13]. Strict VSS was extended to a local/global set-wise definition in [1]. [16] introduced a slight variation of strict VSS called λ-VS. [47] studied a local version of strict VSS under the name of locally asymptotically stable differential NE, which requires twice continuous differentiability of the payoff functions. Strong VSS (Definition 2(i)) was also studied in [1]. [23] studied a variant of strong VSS with the L1 norm.
It can be seen that any NE of a strongly/strictly/merely/weakly monotone game is a strong/strict/mere/weak VSS. In particular, the non-empty set of NEs coincides with the set of mere VSS for merely monotone concave games [2, Theorem 2]. When int(Ω) ≠ ∅ and x⋆ is an interior NE, i.e., U(x⋆) = 0, the VS condition recovers the corresponding monotonicity condition at x′ = x⋆. Hence VS can be seen as a type of point-wise monotonicity, and it is known to have a second-order characterization similar to that for monotone games, but specifically at the NE; see [1].
Recall that a NE x⋆ is interior if it lies in the relative interior of Ω, that is, x⋆ ∈ rint(Ω). Throughout this work, we make the following assumption.
Assumption 2. G admits an interior mere VSS.

F. Second-Order Characterization of VSS
In practice, outside of monotone games, it is often difficult to verify that a NE is a particular type of VSS directly through Definition 2. A more common approach to characterizing the VSS type of a solution is to inspect the symmetrized game Jacobian whenever the pseudo-gradient U is C^1, given by J^s_U(x) := ½(J_U(x) + J_U(x)^⊤). The matrix J^s_U(x) ∈ R^{n×n} is symmetric and thus guaranteed to have real eigenvalues, which makes it amenable to analysis. We now provide several sufficient conditions for verifying whether a VSS x⋆ is strict, mere or weak. This is performed by checking the definiteness of J^s_U(x) or J^s_U(x⋆). Our condition for strict VSS is the same as that in [1]. Before proceeding, we say that the (symmetric) game Jacobian J^s_U(x) ∈ R^{n×n} (5) is negative definite on T_Ω(x) if v^⊤ J^s_U(x) v < 0 for all v ∈ T_Ω(x), v ≠ 0, and negative semi-definite on T_Ω(x) if the preceding inequality is non-strict. We use the shorthand notations J^s_U(x) ≺ O and J^s_U(x) ⪯ O, respectively. For µ > 0, J^s_U(x) − µI ⪯ O is written as J^s_U(x) ⪯ µI. Similar conventions apply when x = x⋆.
Proposition 1. Let x⋆ ∈ Ω be a NE of G. Suppose U is continuously differentiable and its symmetrized Jacobian J^s_U satisfies: (i) J^s_U(x) ≺ O on T_Ω(x), ∀x ∈ Ω; then x⋆ is globally strictly VS and isolated. (ii) J^s_U(x) ⪯ O on T_Ω(x), ∀x ∈ Ω; then x⋆ is globally merely VS. (iii) J^s_U(x) ⪯ µI on T_Ω(x), ∀x ∈ Ω; then x⋆ is globally µ-weakly VS. Suppose instead that the conditions (i')-(iii'), obtained by evaluating (i)-(iii) at x = x⋆ only, hold; then the corresponding conclusions hold locally.

Remark 3. These conditions can be verified by calculating the maximum eigenvalue of J^s_U(x) (resp. J^s_U(x⋆)), where λmax(J^s_U(x)) (resp. λmax(J^s_U(x⋆))) denotes the largest (real) eigenvalue associated with an eigenvector in T_Ω(x) (resp. T_Ω(x⋆)). When J_U(x) (resp. J_U(x⋆)) is symmetric, the analysis can be performed directly on J_U without resorting to calculating J^s_U.

Remark 4. Note that, even if x⋆ is shown to be µ-weak through Proposition 1(iii) or (iii'), this does not preclude the possibility that x⋆ could in fact be strict or mere, due to the looseness of the bound in the proof of Proposition 1. Furthermore, unlike strict VSS, mere VSS need not be isolated.

The next examples illustrate the various notions of variational stability.

Example 1. (Monotone game with a mere VSS) Every NE of a merely monotone game is a globally mere VSS. Perhaps the simplest example of a merely monotone game is the so-called bilinear saddle point problem: suppose we have a saddle function f(x_1, x_2) = x_1 x_2, x_p ∈ R, which we want to minimize in x_1 and maximize in x_2. We can cast this saddle point problem as a game by utilizing the two payoff functions U^1(x_1; x_2) = −x_1 x_2 and U^2(x_2; x_1) = x_1 x_2. This game has pseudo-gradient U(x) = (−x_2, x_1) and symmetrized game Jacobian J^s_U(x) = O. By Proposition 1(ii), the unique interior NE x⋆ = (0, 0) is a globally mere VSS.

Example 2. 
(Non-monotone potential game with a mere VSS) This example shows that the mere monotonicity enjoyed by Example 1 can be destroyed by the addition of another player, even when all the players' payoff functions remain linear in their own arguments. Consider a three-player game that admits an interior NE at x⋆ = (0, 0, 0). Its game Jacobian is symmetric but indefinite in general; hence this game is non-monotone and in fact possesses a non-convex and non-concave potential function.

Example 3. (RPS with non-negative payoff for ties) Consider a two-player Rock-Paper-Scissors (RPS) game with A and B being the payoff matrices for players 1 and 2 respectively, where l, w ≥ 0 are the values associated with a loss or a win and ς ∈ R is the payoff of a tie. The strategy set associated with this game is the product of simplices. The pseudo-gradient (also known as the payoff vector [12], [13]) and the game Jacobian can be computed directly from A and B. To simplify our analysis, consider an example where l = 0. When ς = w/2, the game is merely monotone (Definition 1). Taken together, this means that for ς ∈ (w/2, w), x⋆ is a unique, locally µ-weak VSS with µ = ς − w/2, whereas for ς = w/2, x⋆ is a unique, globally mere VSS. Finally, note that when l = w = ς = 0 (and hence both A and B are zero matrices), every strategy is globally merely VS and forms a convex set (the strategy set itself).
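To make the eigenvalue test of Proposition 1 concrete, the sketch below numerically checks the definiteness of the symmetrized game Jacobian of the RPS game on the tangent space of the product simplex. Since the payoff matrices are not reproduced above, the matrices below follow the usual RPS convention (rows index one's own action, columns the opponent's) and are our own reconstruction; the tangent-space basis is likewise an illustrative assumption.

```python
import numpy as np

def rps_jacobian(w, l, tie):
    # Assumed RPS payoff matrix for player 1 (rows: R, P, S vs. columns: R, P, S);
    # player 2's matrix is taken as B = A^T (same game with roles swapped).
    A = np.array([[tie, -l,  w],
                  [ w, tie, -l],
                  [-l,  w, tie]], dtype=float)
    B = A.T
    Z = np.zeros((3, 3))
    # Pseudo-gradient U(x) = (A x^2, B^T x^1) has the constant Jacobian below.
    J = np.block([[Z, A], [B.T, Z]])
    return (J + J.T) / 2          # symmetrized game Jacobian

def lambda_max_tangent(Js):
    # Orthonormal basis of the tangent space of the product simplex
    # (per-player coordinates summing to zero).
    b = np.array([[1.0, -1.0, 0.0], [1.0, 1.0, -2.0]]).T / np.sqrt([2.0, 6.0])
    E = np.block([[b, np.zeros((3, 2))], [np.zeros((3, 2)), b]])
    return np.linalg.eigvalsh(E.T @ Js @ E).max()

for tie in (0.5, 0.75):
    lam = lambda_max_tangent(rps_jacobian(w=1.0, l=0.0, tie=tie))
    print(f"tie={tie}: lambda_max on tangent space = {lam:.4f}")
```

Under this reconstruction with w = 1, l = 0, the restricted λmax equals 0 at ς = w/2 (merely monotone) and ς − w/2 for ς > w/2 (µ-weak with µ = ς − w/2), matching the claims in Example 3.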

III. REVIEW OF FIRST-ORDER MD DYNAMICS
We now describe the dynamic process that a group of agents utilizes in order to arrive at one of the equilibrium notions discussed in the previous sections. One such general model is the mirror descent (MD) dynamics. Intuitively, the family of MD dynamics ascribes an abstract behavior model to each player, which states that, upon receiving the partial gradient of its payoff function, each player processes the partial-gradient information (typically via an aggregation), then converts the processed information into the next strategy. This process can be described by the following set of ODEs [18]-[21],

ż^p = γ∇_{x^p}U^p(x),  x^p = C^p_ϵ(z^p),

where z^p is referred to as the score or dual variable, γ > 0 is a rate parameter, and C^p_ϵ : R^{n_p} → Ω^p is referred to as the mirror map,

C^p_ϵ(z^p) := argmax_{x^p∈Ω^p} {⟨z^p, x^p⟩ − ϵϑ^p(x^p)}. (10)

Here, ϑ^p : R^{n_p} → R ∪ {∞} is assumed to be a closed, proper and (at least) strictly convex function, referred to as a regularizer, where dom(ϑ^p) is assumed to be a non-empty, closed and convex set which agrees with the strategy set Ω^p, and ϵ > 0 is referred to as the regularization constant. The regularizer often satisfies one of the following distinct assumptions, which ensure the existence of a unique solution of MD and (10): Assumption 3. ϑ^p : R^{n_p} → R ∪ {∞} is closed, proper, convex, with dom(ϑ^p) = Ω^p non-empty, closed and convex. In addition, ϑ^p is, (i) Legendre (i.e., strictly convex, steep, int(dom(ϑ^p)) ≠ ∅) and supercoercive, or (ii) ρ-strongly convex, ρ > 0.
We note that for ϑ^p to be steep means that ∥∇ϑ^p(x^p_k)∥⋆ → +∞ (∥•∥⋆ is the dual norm) whenever {x^p_k}^∞_{k=1} is a sequence in rint(dom(ϑ^p)) = rint(Ω^p) converging to a point on the (relative) boundary. Let ψ^{p⋆}_ϵ be the convex conjugate of ψ^p_ϵ := ϵϑ^p. Then by Lemma 3 and Lemma 2 (in the Appendix), under Assumption 3(i), C^p_ϵ maps onto all values in Ω^p except those at the boundary; C^p_ϵ could map to boundary points otherwise. We refer to C^p_ϵ as the mirror map induced by ψ^p_ϵ, and as Legendre- or strongly-induced (by ψ^p_ϵ) when ϑ^p satisfies Assumption 3(i) or (ii). Note that the notions of Legendre and strongly convex need not be dichotomous: it is possible for C^p_ϵ to be both Legendre- and strongly-induced. The following examples capture the three types of mirror maps that are extensively used in the literature [4], [5], [15], [17]-[21], [27], [28].
(Euclidean regularizer on R^{n_p}) Let ϑ^p(x^p) = ½∥x^p∥²_2 with Ω^p = R^{n_p}; hence it satisfies Assumption 3(i), (ii). In this case, C^p_ϵ = Id (up to the scaling by ϵ) and it is both Legendre- and strongly-induced.

(Entropic regularizer on the simplex) Let ϑ^p(x^p) = Σ_i x^p_i log(x^p_i) with Ω^p the unit simplex, where we assume 0 log(0) = 0. ϑ^p is Legendre and strongly convex with respect to ∥•∥_1; in this case, C^p_ϵ is the logit (softmax) map.

(Euclidean regularizer on a compact set) Let ϑ^p(x^p) = ½∥x^p∥²_2 with Ω^p compact and convex. ϑ^p satisfies Assumption 3(ii) and is non-steep. Here, C^p_ϵ = π_{Ω^p} (applied to z^p/ϵ), where π_{Ω^p} is the Euclidean projection onto Ω^p.

We can represent MD in a more compact stacked notation,

ż = γU(x),  x = Cϵ(z), (11)

where x = (x^p)_{p∈N}, z = (z^p)_{p∈N} and Cϵ = (C^p_ϵ)_{p∈N}. The rest points of MD are the pairs (z̄, x̄) satisfying

z̄ ∈ C⁻¹ϵ(x̄),  U(x̄) = 0, (12)

where C⁻¹ϵ(x̄) := {z : Cϵ(z) = x̄} is the inverse (or the preimage) of Cϵ. The rest condition (12) implies that x̄ is an interior NE. Hence if a trajectory z(t) of MD comes to a rest, x(t) = Cϵ(z(t)) converges to an interior NE. We note that the uniqueness of x̄ does not imply the uniqueness of z̄ unless Cϵ is Legendre-induced (for which Cϵ is one-to-one; this follows from the Legendre theorem [52]). Hence, in general, the convergence of MD is to a set and not to an equilibrium, and z(t) may continue to evolve even after x(t) has reached an equilibrium. Note that rest points are not the only game-relevant solutions to which MD may converge. As pointed out in [18], there are non-rest points on the boundary of Ω that are asymptotically stable under MD. The following lemma broadly summarizes the key convergence properties of MD (refer to [18], [19] for proofs).
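As an illustration, the two most common mirror maps admit simple closed forms. The sketch below (our own; the ϵ-scaling conventions in the comments are assumptions, since the scalings are not spelled out above) implements the logit map induced by the entropic regularizer and the identity-type map induced by the Euclidean regularizer on R^n:

```python
import numpy as np

def logit_map(z, eps=1.0):
    # Mirror map induced by the scaled entropic regularizer eps * sum_i x_i log(x_i)
    # on the simplex: argmax_x { <z, x> - eps*theta(x) } = softmax(z / eps).
    w = np.exp(z / eps - np.max(z / eps))   # shift by the max for numerical stability
    return w / w.sum()

def euclidean_map(z, eps=1.0):
    # Mirror map induced by eps/2 * ||x||^2 on Omega = R^n: a scaled identity
    # (the plain identity when eps = 1).
    return z / eps

x = logit_map(np.array([1.0, 2.0, 3.0]))
print(x, x.sum())   # interior point of the simplex; coordinates sum to 1
```

Note that the logit map always returns a relative-interior point of the simplex, consistent with the Legendre (steep) case discussed above, whereas the Euclidean projection onto a compact set can land on the boundary.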
Lemma 1. Let G be a concave game with a globally strict VSS x⋆. Suppose that all players choose strategies according to MD (11). Let x(t) = (x^p(t))_{p∈N} = Cϵ(z(t)) be generated by (11) and Cϵ = (C^p_ϵ)_{p∈N} be induced by ψ^p_ϵ := ϵϑ^p. Then for any γ, ϵ > 0 and any initial condition z(0), x(t) converges to x⋆.

In general, MD does not converge beyond strict VSS, i.e., to a mere VSS. This poses considerable limitations in practice; for instance, mere VSS are commonly found in ZS games. [12], [13] showed that every ZS finite game is merely (but not strictly) monotone, hence all of its NEs are mere (non-strict) VSS. A standard method to overcome non-convergence to the NE in ZS games is to calculate a time-averaged (ergodic) trajectory x_avg(t) = t⁻¹∫₀ᵗ x(τ)dτ in tandem with MD (we refer to this combination as MDA), for which the time-averaged strategy x_avg has been shown in many contexts to converge, e.g., [18]. The main critique of using MDA is that the actual strategies do not arrive at the NE in the long run, thereby making it unsuitable for online equilibrium seeking in the absence of a central planner or coordination between players. Furthermore, averaging may fail to converge outside of ZS games [57], which makes this approach vulnerable to parameter perturbation.
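The contrast between the actual and time-averaged trajectories can be reproduced in a few lines. The following sketch (our own) Euler-integrates MD on the bilinear saddle game of Example 1 with the identity mirror map; the last iterate orbits the NE while the ergodic average approaches it:

```python
import numpy as np

# Bilinear saddle game of Example 1: U(x) = (-x2, x1), unique interior NE at (0, 0).
def U(x):
    return np.array([-x[1], x[0]])

gamma, dt, steps = 1.0, 0.01, 20_000
z = np.array([1.0, 1.0])            # with the identity mirror map, x = C_eps(z) = z
running_sum = np.zeros(2)
for _ in range(steps):
    z = z + dt * gamma * U(z)       # Euler step of the MD dynamics z' = gamma*U(x)
    running_sum += z * dt
x_avg = running_sum / (steps * dt)  # ergodic average t^-1 * integral of x(tau)

print(np.linalg.norm(z))      # actual strategy: cycles, stays away from the NE
print(np.linalg.norm(x_avg))  # time average: close to the NE (0, 0)
```

The forward-Euler discretization in fact drifts slowly outward (a known artifact), which only reinforces the point: the last iterate does not approach the NE, while the average does.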
Another method for overcoming non-convergence is through discounting, which was studied in [28] as discounted mirror descent (DMD). Compared to MD, an extra −z term is inserted in the ż subsystem, i.e., ż = γ(U(x) − z), which translates into an exponentially weighted decay (or discounting) term in the closed-form solution z(t). DMD has a connection with the so-called weight decay method in the machine learning literature [41], as z ∈ C⁻¹ϵ(x) can be shown to be equivalent to a (usually non-Euclidean) regularization term, which directly interacts with the monotonicity property of U. While it is known that DMD can converge exactly in subclasses of finite games that exhibit symmetric interior NEs, whereby Cϵ is also chosen to enforce symmetry (see [27]), in general it cannot converge exactly to a NE, which also means that it cannot converge exactly to a VSS. This directs our attention to alternative methods, such as higher-order augmentation of game dynamics [27], [33]-[35].

IV. SECOND-ORDER MIRROR DESCENT DYNAMICS
We now propose the second-order mirror descent (MD2), which in terms of each player p appears as

ż^p = γ(∇_{x^p}U^p(x) + α(ξ^p − x^p)),  ξ̇^p = β(x^p − ξ^p),  x^p = C^p_ϵ(z^p),

and in stacked notation,

ż = γ(U(x) + α(ξ − x)),  ξ̇ = β(x − ξ),  x = Cϵ(z), (13)

where α, β > 0. The rest point condition of MD2 requires ξ̄ = x̄ and U(x̄) = 0, which coincides with that of MD, i.e., x̄ = x⋆ is an interior NE.
Remark 5. (Intuitive Learning Interpretation of MD2) We can explicitly write ξ^p(t) = (ξ^p_i(t))_{i∈[n_p]} as

ξ^p(t) = e^{−βt}ξ^p(0) + β∫₀ᵗ e^{−β(t−τ)}x^p(τ)dτ,

which represents an "exponential weighting" of the strategies. Hence, we refer to ξ as the primal aggregate, and z can be referred to as the dual aggregate. We note that ξ^p resides in the unconstrained (ambient) space R^{n_p} containing the action set. Note that, whenever ξ^p(0) is initialized in Ω^p, then ξ^p(t) ∈ Ω^p for all times. Furthermore, we can re-write the ż^p subsystem in MD2 as ż^p = γ(∇_{x^p}U^p(x) − αβ⁻¹ξ̇^p). It can be seen that, when t is small, the effects of ξ^p_i(0) and x^p_i are at their largest, so the presence of the ξ term takes into consideration the uncertainty during the initial periods of play. Hence, the incorporation of ξ has an exploratory effect on the play, which can result in strategies being played more conservatively, preventing the players from immediately reaching a deadlock.

Remark 6. (Comparison with Existing Dynamics) The closest dynamics related to MD2 is the higher-order exponentially discounted dynamics (H-EXPD-RL) for finite games [27]. Specifically, when the higher-order augmentation is taken to be a high-pass filter, we obtain a dynamics (17) that can be seen as the second-order extension of DMD. In [27] it was observed that (17) is robust to a greater degree of parameter perturbation in monotone games as compared to DMD.

Let us now consider MD2 as a purely second-order ODE in the dual space. To simplify our calculations, we assume x(t) = Cϵ(z(t)) is differentiable. Using MD2 and noting that ż = γ(U(x) + α(ξ − x)), taking a time derivative of ż and making the assumption that U and Cϵ are C^1, we have

z̈ = γ(d/dt)U(Cϵ(z)) + γα(ξ̇ − ẋ), (18)

or, by using the chain rule for Jacobians,

z̈ = γJ_{U∘Cϵ}(z)ż + γαβ(x − ξ) − γαJ_{Cϵ}(z)ż, (19)

where U∘Cϵ := U(Cϵ(•)). To the best of our knowledge, most of the existing second-order continuous gradient-type dynamics (see [39] and references therein) cannot recover MD2 due to the presence of the Jacobian of Cϵ. This is because in the unconstrained, primal setting (as studied in [26], [39]), Cϵ is the identity, hence J_{Cϵ}(z) = I, ∀z, and its contribution is lumped together with βγ⁻¹I.
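For comparison with the averaging approach, the sketch below (our own; it assumes the stacked form ż = γ(U(x) + α(ξ − x)), ξ̇ = β(x − ξ) with the identity mirror map and unit parameters, which are illustrative choices) Euler-integrates MD2 on the bilinear saddle game of Example 1. Here the actual strategies, not an average, approach the NE (0, 0):

```python
import numpy as np

def U(x):
    # Pseudo-gradient of the bilinear saddle game of Example 1: U(x) = (-x2, x1).
    return np.array([-x[1], x[0]])

# Euler integration of the (assumed) stacked MD2 dynamics with identity mirror map:
#   z' = gamma*(U(x) + alpha*(xi - x)),  xi' = beta*(x - xi),  x = z.
gamma, alpha, beta, dt, steps = 1.0, 1.0, 1.0, 0.01, 10_000
z = np.array([1.0, 1.0])
xi = z.copy()                       # initialize the primal aggregate at x(0)
for _ in range(steps):
    x = z                           # identity mirror map
    z = z + dt * gamma * (U(x) + alpha * (xi - x))
    xi = xi + dt * beta * (x - xi)

print(np.linalg.norm(z))            # last iterate: close to the NE (0, 0)
```

In this linear example the coupling with the primal aggregate shifts the purely imaginary eigenvalues of the MD vector field into the open left half-plane, which is what produces the last-iterate convergence that plain MD lacks.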
We proceed to demonstrate that the primal aggregate ξ contributes to the convergence of MD2 beyond strict VSS, and that such convergence depends on both the properties of the regularizer as well as the topological properties of the underlying strategy set.
We now sharpen Theorem 1 by considering several other classes of games in which MD2 is guaranteed to converge. From [2], [50], we recall that G is pseudo-monotone if, for all x, x′ ∈ Ω, ⟨U(x′), x − x′⟩ ≤ 0 implies ⟨U(x), x − x′⟩ ≤ 0, and quasi-monotone if, for all x, x′ ∈ Ω, ⟨U(x′), x − x′⟩ < 0 implies ⟨U(x), x − x′⟩ ≤ 0. Observe that when x′ in the above definitions is a NE x⋆ of a pseudo-monotone game (or a globally strict NE of a quasi-monotone game), it automatically follows that the associated NE is globally merely VS. Moreover, x⋆ is globally strictly VS when the above games are strict. Since MD converges to a globally strict VSS [1], it therefore converges to the NE in strictly pseudo-monotone games and to the strict NE in strictly quasi-monotone games. Our next corollary, which follows immediately from Theorem 1, partially generalizes these results to the non-strict setting under MD2.
Corollary 1. Let G be a concave game and assume that every NE is interior. Suppose that all players choose strategies according to MD2 (13). Let x(t) = (x^p(t))_{p∈N} = Cϵ(z(t)) be generated by (13), and let Cϵ = (C^p_ϵ)_{p∈N} be induced by ψ^p_ϵ := ϵϑ^p, where ϑ^p is C² and satisfies Assumption 3(i), ∀p. Then for any α, β, γ, ϵ > 0 and any x(0) = Cϵ(z(0)), (i) x(t) converges to an interior NE x⋆ whenever G is merely monotone or pseudo-monotone, and, (ii) x(t) converges to an interior strict NE x⋆ if G is quasi-monotone, whenever such x⋆ exists.
The same conclusions hold whenever ϑ^p satisfies Assumption 3(ii) and Ω^p is compact ∀p.
Remark 7. We note that the existence of a NE in pseudo-monotone games is guaranteed under our continuous game assumption [

V. ADDITIONAL CONVERGENCE PROPERTIES OF MD2
In this section, we investigate two additional properties of MD2, namely its rate of convergence and regret minimization. To account for the geometry of the problem, we provide a relative extension of the strong VSS. Definition 4. Let h : R^n → R ∪ {∞} be any differentiable, convex function with domain dom(h) = Ω. Then x⋆ ∈ Ω is η-relatively strongly VS (with respect to h) if, for all x ∈ Ω, U(x)^⊤(x − x⋆) ≤ −η D_h(x⋆, x), where D_h denotes the Bregman divergence of h. We note that Definition 4 is analogous to that of a relatively strongly monotone game, which was previously introduced in [29]. Following [29, Theorem 4.4], it can be shown that MD converges to a relatively strongly VSS at rate O(e^{−γηϵ^{−1}t}) given a strongly-induced Cϵ that is adapted to the geometry of this VSS. We now wish to provide a similar result for the convergence of MD2 towards a relatively strongly VSS. However, exponential convergence does not follow from Theorem 1. Instead, we propose an augmented version of MD2, referred to as MD2γ, which exhibits exponential convergence, where γ(0) > 0. Observe that MD2γ is equivalent to the nonautonomous system (20), whose rest point condition coincides with that of MD2.
Remark 8. From (20), MD2γ can be seen as MD with a vanishing perturbation g(t, z, ξ) = e^{−ηϵ^{−1}t}α(0)(x − ξ), which allows for the following simple learning interpretation: when the players are aware that the game being played has an η-strong VSS, they no longer bother with exploring the strategy space during the initial stages of play and instead discard the extra information represented by x − ξ exponentially fast.
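This interpretation can be sketched numerically for a scalar instance, with the perturbation α(0)(x − ξ) discounted by e^{−ηϵ^{−1}t}; the one-player game U(x) = −x (which has a 1-strong VSS at 0), the Euclidean mirror map, and all constants below are illustrative assumptions rather than the paper's general setup.

```python
import math

def md2_gamma(x0=5.0, xi0=6.0, T=30.0, dt=0.001,
              eta=1.0, eps=1.0, alpha0=1.0, beta=1.0, gamma=1.0):
    x, xi, t = x0, xi0, 0.0
    for _ in range(int(T / dt)):
        u = -x                                   # U(x) = -x: toy 1-strong VSS at x* = 0
        damp = math.exp(-eta * t / eps) * alpha0 * (x - xi)
        x += dt * (gamma / eps) * (u - damp)     # Euclidean mirror map: x and z coincide
        xi += dt * beta * (x - xi)
        t += dt
    return x
```

The exploratory term x − ξ is discarded exponentially fast, after which x decays to the strong VSS at the rate of the unperturbed dynamics.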
Next, we turn our attention to the question of regret minimization. Let us define the time-averaged (external/static) regret function of player p ∈ N as R^p(t) := (1/t) sup_{y^p ∈ Ω^p} ∫_0^t [u^p(y^p; x^{−p}(s)) − u^p(x^p(s); x^{−p}(s))] ds. (23) Intuitively, the regret value at time t represents the time-averaged total payoff difference between the actual strategy x(t) = (x^p(t); x^{−p}(t)) and the best strategy (y^p; x^{−p}(t)) that player p could have played in hindsight, given any opponents' strategy x^{−p}(t) ∈ Ω^{−p}. Player p's dynamics generating x^p(t) are said to achieve no-regret with respect to (23) if lim sup_{t→∞} R^p(t) ≤ 0.
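In discrete time, (23) can be approximated with a Riemann sum; the bilinear toy payoff u^1(x^1; x^2) = (x^1)^⊤A x^2 below is our own example, and we use the fact that, for payoffs linear in y^p, the hindsight maximum over the simplex is attained at a vertex.

```python
A = [[1.0, -1.0], [-1.0, 1.0]]          # toy payoff matrix (our own choice)

def u1(x1, x2):
    # bilinear payoff of player 1: u1(x1; x2) = x1^T A x2
    return sum(x1[i] * A[i][j] * x2[j] for i in range(2) for j in range(2))

def time_averaged_regret(traj1, traj2, dt):
    """Riemann-sum version of the time-averaged external regret of player 1."""
    t = dt * len(traj1)
    actual = sum(u1(x1, x2) for x1, x2 in zip(traj1, traj2)) * dt
    # the payoff is linear in y^1, so the best fixed strategy in hindsight
    # can be found by scanning the pure strategies (simplex vertices)
    best = max(sum(u1(e, x2) for x2 in traj2) * dt
               for e in ([1.0, 0.0], [0.0, 1.0]))
    return (best - actual) / t
```

Playing the interior equilibrium (1/2, 1/2) forever yields zero regret, while a constantly exploited pure strategy accumulates positive regret.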
While MD was shown to achieve no-regret in finite games, it does not converge in ZS finite games with an interior mixed equilibrium [21]. In contrast to [20], [21], our next result, coupled with Theorem 1, shows that MD2 achieves both no-regret and exact convergence whenever the (interior) equilibrium is a mere VSS.
Theorem 3. Let G be a concave game, where Ω^p is compact for all p ∈ N. Then for every continuous trajectory x^{−p}(t) of the opponents of player p, each player that uses MD2 achieves no-regret, independently of the rest of the players.

VI. CONSTRUCTION OF SECOND-ORDER PRIMAL-SPACE DYNAMICS
In this section, we show that MD2 can either be used to create new primal-space dynamics (that is, dynamics that solely evolve on Ω) or to recover existing ones. To simplify our presentation and to avoid the technicalities of non-differentiable mirror maps Cϵ (for which examples of induced primal-space dynamics can still be generated through a careful technical treatment; see the so-called projection dynamics discussed in [18]), we impose the following general assumption.
Assumption 4. ϑ^p : R^{n_p} → R ∪ {∞} is steep and induces a C¹ mirror map Cϵ = (C^p_ϵ)_{p∈N} = (∇ψ^{p⋆}_ϵ)_{p∈N}.
Under this assumption, taking the time derivative of x = Cϵ(z) shows that MD2 can be written as a pair of differential inclusions (24), where γ > 0, α, β ≥ 0 and the Jacobian of Cϵ is the block-diagonal matrix J_{Cϵ}(z) = diag(J_{C^1_ϵ}(z^1), . . ., J_{C^N_ϵ}(z^N)). We note that the inclusions in (24) come from z ∈ C^{−1}_ϵ(x). We can further express (24) for each p as (25). In the following, we offer two general ways of rewriting (25) as a set of ODEs in the primal space.
We can also reduce (27) to a single-population model (p = 1), which takes the form (28), where A = {1, . . ., n} represents n subpopulations of non-atomic players and U_i represents the fitness of subpopulation i. We note that (28) is structurally similar to the anticipatory replicator dynamics studied in [35]. Note, however, that the variable ξ_j in (28) does not directly interact with the pseudo-gradient/payoff vector U(x), as is the case for the anticipatory RD.
Legendre and Supercoercive: A more structured scenario is when ψ^p_ϵ := ϵϑ^p is Legendre and supercoercive for all p (Assumption 3(i)). Since ψ^p_ϵ is finite on int(Ω^p), coercive, strictly convex and C², applying [62, Example 11.9, p. 480] shows that ψ^{p⋆}_ϵ shares the same properties, which allows us to write (25) or (26) as the dynamics we refer to as the second-order natural gradient descent (NG2). NG2 is so named because in the optimization setup (p = 1), with U = −∇f for some objective function f : R^n → R ∪ {∞} and α, β = 0, it is analogous to the so-called natural gradient descent. Example 8. (Unconstrained) Let Ω^p = R^{n_p} and ϑ^p(x^p) = (1/2)∥x^p∥²_2, so that ∇²ψ^p_ϵ(x^p) = ϵI, where ψ^p_ϵ = ϵϑ^p. By comparing with NG2, we obtain (30), which recovers the dynamics recently introduced in [26]. (30) can further be shown, via discretization, to recover algorithms such as optimistic gradient descent/ascent and Polyak's heavy-ball method, among others (see [26]). While (30) is by far the most standard choice for unconstrained action sets, more than one type of dynamics can reside on the same action set. To see this, consider a regularizer representing an interpolation between entropic and Euclidean terms [61]; such a ϑ^p is Legendre and supercoercive. By comparing with NG2, we obtain a version of (30) with a strategy-dependent diagonal coefficient S(x^p).
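As a sanity check of the unconstrained Euclidean case, the following sketch Euler-integrates the single-player pair ϵẋ = γ(U(x) − α(x − ξ)), ξ̇ = β(x − ξ) with U = −∇f; the quadratic objective f(x) = (x − 3)²/2 and all constants are illustrative assumptions.

```python
def ng2_minimize(x0=0.0, T=50.0, dt=0.001,
                 alpha=1.0, beta=1.0, gamma=1.0, eps=1.0):
    # Single-player unconstrained case of Example 8: C_eps(z) = z/eps, so the
    # primal pair reduces to eps*xdot = gamma*(U(x) - alpha*(x - xi)).
    x, xi = x0, x0
    for _ in range(int(T / dt)):
        u = -(x - 3.0)                  # U(x) = -f'(x) for f(x) = 0.5*(x - 3)^2
        x += dt * (gamma / eps) * (u - alpha * (x - xi))
        xi += dt * beta * (x - xi)
    return x
```

With α = β = 0 this reduces to plain gradient descent; here the ξ term only reshapes the transient, since the toy objective is already strongly convex.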

VII. DISCRETE-TIME SECOND-ORDER DUAL AVERAGING WITH NOISY OBSERVATIONS
So far we have considered a continuous-time setup in which each player is able to acquire a partial gradient U^p at each time instant. In practical scenarios where games are played in discrete time, the acquired pseudo-gradient information could be corrupted for a multitude of reasons, such as a noisy communication channel. This leads us to consider the so-called "noise-corrupted" pseudo-gradient scenario (also known as "semi-bandit" learning). In this setting, each player receives a realization of a noise-corrupted version of the true pseudo-gradient, where ζ^p_k is some noise process. A special case of the semi-bandit scenario is when the payoff is obtained as the expectation of the true payoff with respect to some random vector v^p_k, where E denotes the expectation operator; in this case U^p_k = U^p(X^p_k; v^p_k) is an estimate of the expected partial gradient. Let us consider the convergence of the discrete-time MD2 with noisy observations, which we refer to as second-order dual averaging or DA2, where X^p_k, Z^p_k, Ξ^p_k are the stochastic counterparts of (respectively) x^p, z^p, ξ^p at time k. {t_k}_{k∈N}, {τ_k}_{k∈N} denote deterministic sequences of non-increasing step-sizes, assumed to be common to all players, and α, β, γ > 0 are the auxiliary parameters from before. When α = β = 0, we refer to the resulting expression as dual averaging with noisy observations, which coincides with the dual averaging scheme studied in [1] for γ = 1.
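One possible DA2 loop, read as an Euler discretization of MD2's dual dynamics in the Robbins-Monro style with two step-size sequences; the update form, the schedules t_k = k^{−0.6}, τ_k = k^{−0.9} (square-summable but not summable), and the scalar test problem in the usage note are illustrative assumptions, not the paper's exact scheme.

```python
import random

def da2(grad_oracle, c_eps, z0, xi0, n_steps,
        gamma=1.0, alpha=1.0, beta=1.0):
    z, xi = list(z0), list(xi0)
    x = c_eps(z)
    for k in range(1, n_steps + 1):
        t_k = 1.0 / k ** 0.6            # sum t_k = inf, sum t_k^2 < inf
        tau_k = 1.0 / k ** 0.9
        u_hat = grad_oracle(x)          # noisy pseudo-gradient U(X_k) + zeta_k
        for i in range(len(z)):
            z[i] += t_k * gamma * (u_hat[i] - alpha * (x[i] - xi[i]))
            xi[i] += tau_k * beta * (x[i] - xi[i])
        x = c_eps(z)
    return x
```

For instance, with the identity mirror map and a noisy gradient of f(x) = (x − 2)²/2, the iterates settle near the optimum x⋆ = 2 despite the injected noise.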
To analyze the convergence behavior of DA2, we impose the following set of regularity assumptions [54], [55].
Assumption 5. (ℓ²\ℓ¹-Summability and Diminishing Step-sizes) The step-size sequences satisfy Σ_k t_k = Σ_k τ_k = ∞, Σ_k t_k² < ∞, Σ_k τ_k² < ∞ and lim_{k→∞} t_k = lim_{k→∞} τ_k = 0.
Assumption 6. (L²-Bounded Martingale Difference Noise) {ζ_k}_{k∈N} = ((ζ^p_k)_{p∈N})_{k∈N} is an L²-bounded martingale difference process adapted to the filtration {F_k}_{k∈N}: each ζ_k is a random vector that is measurable with respect to F_k for each k, where each F_k is the σ-field F_k = σ(Ξ_0, Z_0, ζ_0, . . ., ζ_k) and F_k ⊆ F_{k+1}. In particular, {ζ_k}_{k∈N} satisfies E[ζ_{k+1} | F_k] = 0 and, for some σ ≥ 0, E[∥ζ_{k+1}∥² | F_k] ≤ σ².
Assumption 7. (Bounded Iterates) The iterates remain bounded, i.e., sup_k ∥(Ξ_k, Z_k)∥ < ∞ almost surely.
Finally, we impose "global integrability" on MD2, i.e., MD2 has a complete vector field. To do so, we convert MD2 into a first-order system by defining ω^p := (ξ^p, z^p), which generates a stacked system, written equivalently as ω̇^p = F^p(ω) = g^p(z) + A^p(ξ^p), where A^p(ξ^p) = (−βξ^p, γαξ^p) and g^p(z) = (βC^p_ϵ(z^p), γ(U^p(Cϵ(z)) − αC^p_ϵ(z^p))). (39) We then impose the following assumption on the overall system.
Assumption 8. (Global Integrability) The vector field F : R^{2n} → R^{2n} of (39) is continuous and globally integrable; that is, for every initial condition (ξ(0), z(0)) ∈ R^{2n}, the unique solution of (39) is defined for all t ∈ R.
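For intuition, the stacked field F of (39) can be written out for a scalar toy instance; the single player, identity mirror map (ϵ = 1), and game U(x) = −x are our own choices, which make the rest-point condition ξ = x = Cϵ(z), U(x) = 0 easy to inspect.

```python
def F(omega, alpha=1.0, beta=1.0, gamma=1.0):
    # omega = (xi, z); scalar toy with C_eps(z) = z and U(x) = -x (our assumptions)
    xi, z = omega
    x = z                                    # identity mirror map, eps = 1
    u = -x                                   # pseudo-gradient of the toy game
    return (beta * (x - xi),                 # xi_dot = beta*(x - xi)
            gamma * (u - alpha * (x - xi)))  # z_dot = gamma*(U(x) - alpha*(x - xi))
```

Here F is globally Lipschitz, so it is trivially globally integrable in the sense of Assumption 8, and the origin is the unique rest point.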
Remark 10. We note that global integrability of MD2 is satisfied whenever one of the following holds: (i) Suppose C^p_ϵ is bounded on R^{n_p} and U^p is bounded and locally Lipschitz on ran(C^p_ϵ) for all p ∈ N; then F in (39) is bounded and locally Lipschitz, hence continuous and globally integrable. This follows from [54]. (ii) Suppose U^p is locally Lipschitz and both C^p_ϵ and U^p ∘ Cϵ are sublinear for all p; then F is sublinear, hence continuous and globally integrable. This follows from [56].
Theorem 4. Let G be a concave game and assume that every interior NE is globally merely VS. Suppose that all players choose strategies according to DA2 and that Cϵ = (C^p_ϵ)_{p∈N} is induced by ψ^p_ϵ := ϵϑ^p, where ϑ^p is C² and satisfies Assumption 3(i). Suppose that the following assumptions hold:
• Assumption 5 (ℓ²\ℓ¹-summability and diminishing step-sizes),
• Assumption 6 (L²-bounded martingale difference noise),
• Assumption 7 (bounded iterates),
• Assumption 8 (global integrability),
then X_k converges to an interior mere VSS of G almost surely. The same conclusions hold whenever ϑ^p satisfies Assumption 3(ii) and Ω^p is compact ∀p.
Remark 11. The closest result to ours is [1, Theorem 4.7], but for DA (i.e., α, β = 0). Assumptions 5 and 6 are identical to theirs, and Assumption 8 encapsulates the Lipschitz continuity requirement on U. The major departure is Assumption 7, which does not hold under certain circumstances, such as convergence towards (boundary) strict VSS in finite games. Hence our result only deals with interior mere VSS in general. Accounting for these boundary VSS possibly involves an extension of the inductive shadowing argument used by [1], which we leave for future work.
Remark 12. In addition to DA [1], there are several other algorithms that converge in similar settings. The most notable example is the mirror-prox/optimistic mirror descent algorithm, which converges to mere VSS in the perfect-gradient feedback setting [46]. However, the authors of [46] noted that convergence in null-coherent saddle-point problems (games with a mere VSS) fails in the presence of noise. This is a key advantage of DA2 over mirror-prox. Another algorithm is the stochastic iterative Tikhonov method of [25], which converges in the semi-bandit setting. However, [25] requires the game to be strictly monotone, whereas DA2 does not require monotonicity.

VIII. SIMULATIONS
In this section, we consider three illustrative examples. The first example is the RPS game with a non-negative payoff for ties, as in Example 3. We provide the convergence behavior of MD, MD2 and MDA towards mere and weak VSS, followed by the convergence behavior of DA2. The second example concerns a wireless power control game previously studied in [58]. We show that DA2 converges in this game under different noise assumptions and study the effect of the number of players. Finally, we provide an example involving a generative adversarial network (GAN), which admits a locally mere VSS. We show that DA2 also converges in this game under different noise assumptions. In lieu of an exact estimate of the basin of attraction for locally mere VSS, which is difficult to obtain, we handle all such cases through appropriate initialization.
We begin by contrasting the continuous-time MD and MD2. For ς = 1, this game is merely monotone and x⋆ is globally merely VS; MD2 converges by Theorem 1 while MD diverges (Figure 1). For ς = 1.1, the game is 0.1-weakly monotone, but since x⋆ is only a 0.1-weak VSS, it can be considered a nearly mere VSS (see Remark 2). In this case, MD2 still converges while MD approaches a heteroclinic orbit (Figure 2). It is also useful to examine the advantage of MD2 over time-averaged MD (MDA). While MDA does converge towards the mere VSS for ς = 1, when the equilibrium becomes even slightly weak (ς = 1.1), it no longer converges and instead approaches a Shapley triangle [57] (Figure 3). This shows that time-averaging is, in general, not a panacea for non-convergence.
Next, we perform a set of experiments for the globally merely VS case (ς = 1) using DA2, where we assume the same initial condition as before. For each of the following simulations, we separately perturb the pseudo-gradient U^p (9) with zero-mean Gaussian noise with the same variance σ²_ζ > 0 across all players. Such a perturbed-gradient setting could describe a game involving multiple users interacting over a fully connected network with noisy communication channels. Figure 4 shows that DA2 easily converges in the low-variance regime σ²_ζ = 1. As we employ a larger variance, e.g., σ²_ζ = 10, the standard step-sizes no longer lead to convergence: a larger t_k will amplify the additive noise. Instead, we utilize step-size sequences of the form c k^{−ι} and search over both ι ∈ (0, 1) and c > 0 for optimal sets of parameters. The simulation with tuned step-sizes is shown in Figure 5. We see that despite the larger noise injected into U^p, the strategies X = (X^p)_{p∈N} still converge toward the vicinity of the mere VSS.
Example 11. (Wireless power control game with a non-concave potential) Consider the wireless power control game in [58], where N = {1, . . ., N} network users decide on intensities x^p ∈ R of power flow to send over a wireless network. These intensities are converted to the transmitted power through an exponential function, i.e., exp(x^p) ∈ R_{>0}. The payoff function for each user p ∈ N is modeled with a^p ∈ (0, 1], ∀p, and K^p(x^p) = b^p log(1 + exp(x^p)) − c^p x^p is the cost of user p for transmission, where b^p > 0, c^p ≥ 0.
The partial gradient U^p and the second-order partial derivatives of U^p can be calculated explicitly, generating a symmetric Jacobian J_U(x), ∀x; hence the game is a potential game with a non-concave potential function P such that ∇P = U. Consider an example with N = 2 and problem parameters (3, 3). For all players, Z^p(0) is sampled uniformly from [0, 10] and Ξ^p(0) = 0. The NE of this game is found at x⋆ = (1.8663, 1.8663). The Jacobian at the NE is J(x⋆) = diag(−4.308, 0); hence, by Proposition 1, the NE is a locally mere VSS. By Theorem 4, DA2 converges towards this NE. For each of the following experiments, we simulate for T = 10⁴ steps and report the strategies at termination. We restrict Ω^p to the large but compact set [−1000, 1000]² and set C^p_ϵ to be the projection operator C^p_ϵ(Z^p) = argmin_{y^p ∈ Ω^p} ∥ϵ^{−1}Z^p − y^p∥²_2 with ϵ = 1. Unless specified otherwise, all other parameters γ, α, β are kept at 1. All additive noise is zero-mean Gaussian with the same variance σ²_ζ. We start our experiments with a small variance σ²_ζ = 0.1 and set the step-sizes to t_k = 0.39/k^{0.26}, τ_k = 0.12/k^{0.64}. Figure 6 shows that DA2 converges to the NE. We then increase the variance to σ²_ζ = 10, and Figure 7 shows a similar observation. Next, we investigate some computational issues that can arise when we increase the number of players in this game. Consider an example with N = 10 players. The game parameters are (15, 20, 20, 14, 21, 20, 16, 19, 17, 17) and c = (13, 12, 14, 8, 10, 19, 10, 12, 15, 1).
A variance of σ²_ζ = 1 is applied for the following experiments. We employ the step-size sequences t_k = 0.4/k^{0.033}, τ_k = 0.78/k^{0.001} across all players. The NE of this game is located at x⋆ = (1.87, 0.41, 0.85, 0.28, −0.10, 16.51, 0.51, 0.54, 2.01, −2.77) (rounded to two decimal places) and was confirmed to be a nearly mere VSS (Remark 2). We note that an exact mere VSS is difficult to produce for this game when there is a large number of players. Figure 8 shows that the strategy profile converges to (1.87, 0.41, 0.85, 0.28, −0.10, 11.29, 0.51, 0.54, 2.01, −2.77), which matches x⋆ in all entries except the sixth (16.51 vs 11.29); this is likely due to the small rate parameters employed in the step-sizes. This highlights the difficulty of employing the same step-size sequences across all players. We then adjust the parameters β, α and the step-sizes c k^{−ι}, ι ∈ (0, 1), c > 0 on a per-player basis, which results in much closer convergence to the NE than applying them uniformly across all players (Figure 9).
Example 12. (Learning to Generate a Gaussian) Let Z ∼ P(z) and X ∼ Q(x) be two random variables. We wish to construct a model G_θ : R^r → R^m, r, m ≥ 1, with an unknown, continuous parameter θ such that G_θ(Z) recovers the statistics (mean, variance, etc.)
of X. Following [53], G_θ can be constructed by solving a saddle-point problem in which the model D_w : R^m → R is parametrized by an unknown continuous parameter w. This problem can be thought of as a game between the owners of the models G_θ and D_w, where θ, w are their respective strategies. The owner of G_θ (the designer) varies θ and uses G_θ(Z) to estimate the statistics of X, whereas the owner of D_w (the tester) varies w in an attempt to incur the largest possible penalty for the discrepancy between G_θ(Z) and X. We construct a more complicated variant of the examples in [28], [53], whereby the G_θ owner wishes to learn the mean and variance of a one-dimensional Gaussian at the same time: let Z ∼ N(0, 1). Letting x₁ = θ and x₂ = w, we obtain a two-player ZS game with U²(x₂; x₁) = −U¹(x₁; x₂). Since x₁ is not restricted to be positive, this game is not monotone [50, Prop. 2.3.2]. However, at the interior NE the conditions of Proposition 1(ii') hold, which implies that x⋆ is locally merely VS.
First, we consider the noiseless case, i.e., σ²_ζ = 0. From Figure 11, we see that X₁ converges to X₁⋆ (highlighted with a green star). Next, we consider the harder case where U^p is subjected to zero-mean Gaussian noise with a very large variance σ²_ζ = 30² for each p. The trajectories of X₁ are shown in Figure 12. Our previous observation continues to hold.

IX. CONCLUSIONS
In this paper, we have shown that MD2 overcomes many shortcomings of the first-order MD as well as of other members of this class of dynamics, such as MD with time-averaging or discounting. To summarize our findings: first, we showed that the trajectories of MD2 converge exactly to interior mere (and not necessarily strict) VSS (Theorem 1). As such, MD2 converges to interior mere NE of monotone and pseudo-monotone games, and to interior strict NE of quasi-monotone games (Corollary 1). Through simulation, we also found that MD2 is robust around interior mere VSS (Remark 2, Example 10). Next, we provided a modification of MD2 based on vanishing perturbation, which results in an exponential rate of convergence towards a relatively strong VSS (Theorem 2). We then showed that MD2 can achieve no-regret while converging beyond strict VSS, e.g., to interior/fully-mixed NE of zero-sum finite games (Theorem 3). This result improves our current understanding of the relationship between no-regret learning and convergence [20], [21]. We then provided several ways of deriving primal-space dynamics from MD2. Finally, we moved to the scenario where the game evolves in discrete time while the pseudo-gradient is subjected to noise perturbation. Using stochastic approximation techniques, we related the limiting behavior of MD2 to that of its discrete-time, noisy counterpart, and showed that discrete MD2 with noisy observations, or second-order dual averaging (DA2), converges to interior mere VSS under standard regularity assumptions (Theorem 4).
There are several outstanding issues that could be better understood. First, our work did not provide a thorough convergence proof for local VSS, which were dealt with via appropriate initialization. Moreover, we did not address convergence towards boundary points for MD2, as our proofs relied on an interiority condition associated with interior NE (14). One possible approach for dealing with these boundary solutions is to adopt the approach of [18] by restricting ourselves to the context of finite games, where such boundary solutions are quite relevant. Another basic issue is that it is still not entirely clear how MD2 relates to other existing continuous-time dynamics for optimization or games. Aside from the cases that we know of, e.g., [26], it is possible that other existing algorithms appear as specific instances of MD2.
Our work opens up an extensive array of directions at the interface between dynamical systems and games. As a starting point, in continuous time, an interesting direction would be to explain the difference between MD and MD2 in zero-sum games through a volume compressibility perspective, as in [15], [22]. We could also craft third- or even higher-order versions of MD based on the technique in [27]. In discrete time, we could further reduce the available information to the zeroth-order case (full-bandit feedback). Moreover, we could tackle the case where the game itself has time-varying parameters. Finally, it is also worthwhile to examine MD2 in a discrete-time setting where the mirror map is replaced with a proximal operator.

X. APPENDIX
The following two lemmas are standard in this area of the literature; see [18], [22], [28]. In particular, Lemma 2 follows directly from the Legendre theorem [52]. The proofs of these claims boil down to the equality condition found in [59, Prop. 1.44]. Letting x′ = x⋆, then (x − x⋆)^⊤U(x⋆) ≤ 0, and the definiteness assumptions on J_U(x) over T_Ω(x) for all x ∈ Ω imply statements (i), (ii), (iii). Next, suppose the definiteness condition on J_U(x⋆) holds for all y ∈ T_Ω(x⋆). Since U is assumed to be continuously differentiable, y^⊤J_U(x)y is continuous for all x ∈ Ω. Using the standard ϵ-δ definition of continuity, for every ϵ > 0 there exists a δ(ϵ) > 0 such that for all x ∈ Ω, ∥x − x⋆∥ < δ(ϵ) implies |y^⊤J_U(x)y − y^⊤J_U(x⋆)y| < ϵ, and hence y^⊤J_U(x)y < ϵ + y^⊤J_U(x⋆)y. Since this holds for every ϵ > 0, it follows that y^⊤J_U(x)y ≤ y^⊤J_U(x⋆)y, so J_U(x) shares the same definiteness property in a neighborhood of x⋆, and (i′), (ii′), (iii′) follow. The isolation conditions of (i), (i′) are proven in [1, Prop. 2.7].
Proof. (Proof of Theorem 1) (i) Consider the Lyapunov function V(ξ, z), composed of a quadratic distance term, which measures the progress of ξ(t) towards ξ⋆ = x⋆, added onto a function consisting of the collection of all terms associated with the Fenchel-Young inequality [48, p. 88], also known as the Fenchel coupling in [1]. Since ϑ^p satisfies Assumption 3(i), V(ξ, z) is continuous and positive definite (by the Legendre theorem), with V(ξ, z) = 0 iff ξ = x⋆ and z = C^{−1}_ϵ(x⋆). Taking the time derivative of V(ξ, z) along the solutions of MD2, using x = Cϵ(z) and substituting ξ̇ = β(x − ξ), we obtain V̇(ξ, z) ≤ 0, where we have used that x⋆ is a mere VSS. Observe that V̇(ξ, z) = 0 only if ξ̇ = 0 ⟺ x = ξ. By the Legendre theorem [52], ψ^{p⋆}_ϵ is coercive, which implies V(ξ, z) is coercive (radially unbounded) on R^n × R^n, and hence all of its sublevel sets are compact. Let D_c = {(ξ, z) ∈ R^n × R^n | V(ξ, z) ≤ c} be a compact sublevel set for some c > 0.
(ii) Suppose instead that Ω^p is compact for all players p. Let x_ω be an ω-limit point of x(t) = Cϵ(z(t)) under MD2. Since x(t) is bounded for all t ≥ 0, the existence of x_ω follows from the Bolzano-Weierstrass theorem. Suppose that x_ω is not a globally mere VSS x⋆. Take O_ω to be an open ball around x_ω. By definition, there exists a sequence {x(t_k)}_{k∈N}, x(t_k) ∈ O_ω, converging towards x_ω, where {t_k}_{k∈N} is an increasing sequence of times. Following the technique in [19], we build an auxiliary sequence in O_ω and show that V is unbounded below along this sequence, thereby obtaining a contradiction.
First, note that since ϑ^p is ρ-strongly convex, ψ^p_ϵ := ϵϑ^p is ϵρ-strongly convex, and hence the conjugate correspondence theorem [48, Theorem 5.26] applies. Since O_ω is open, there exists some δ > 0 such that B_τ(x(t_k)) ⊆ O_ω for all τ ∈ [0, δ], and hence x(t_k + τ) ∈ O_ω. Integrating (52), we obtain a bound involving the constant V₀ := V(ξ(0), z(0)). Since x⋆ is assumed to be a globally mere VSS, expressing this inequality in terms of the sequences {x(t_k)}_{k∈N} and {ξ(t_k)}_{k∈N}, using −∥x(τ) − ξ(τ)∥² ≤ 0, and using the δ previously found, the bound holds for each k. Since Ω is compact, x is continuous, ξ is a continuous function of x (15), and ξ is not eventually identically equal to x (otherwise, using the rest point condition of MD2 (14) together with our assumption that all interior NE are globally mere, we arrive at a contradiction), this shows that V is unbounded below as k → ∞, a contradiction.
Proof. (Proof of Theorem 3) Without loss of generality, let γ = 1. Let y^p ∈ Ω^p be an arbitrary strategy. The integral term of (23) can then be rewritten (suppressing the time index for brevity) as (72). Carrying on from (65), using (72) and applying Fenchel's inequality [52], we obtain an upper bound on R^p(t). Since Ω^p is assumed to be compact and the numerator is continuous on all of Ω^p, by the Weierstrass theorem [48, Theorem 2.12] a maximum is achieved. Taking the limsup yields the desired result. When α = 0, this proof recovers the no-regret bound of MD [21].
The proof of Theorem 4 requires the following definitions from [54]-[56]. Recall that a semiflow Φ on a metric space (M, d) is a continuous map Φ : T × M → M, (t, x) ↦ Φ(t, x) = Φ_t(x), where T = R_{≥0}, such that Φ₀ = Id and Φ_{t+s} = Φ_t ∘ Φ_s for all (t, s) ∈ R_{≥0} × R_{≥0}. If T = R, then Φ defines a flow. We say that a continuous function f : R_{>0} → M is an asymptotic pseudotrajectory of Φ if lim_{t→∞} sup_{0≤h≤T} d(f(t + h), Φ_h(f(t))) = 0 for any T > 0. A set A ⊂ M is positively invariant if Φ_t(A) ⊂ A for all t ≥ 0, and invariant if Φ_t(A) = A for all t ∈ T.
Proof. (Proof of Theorem 4) For simplicity, we assume that all constants associated with MD2 are set to 1. We break the proof into the following steps: 1) From the updates of Z^p_{k+1}, and similarly of Ξ^p_{k+1}, we show that MD2 is the mean ODE of DA2. 2) Next, we build interpolated processes for Z_k and Ξ_k and show that they converge to a rest point of MD2. The construction is as follows: let {ℓ_k}_{k∈N} be a sequence of interpolation times and define the interpolated process associated with Z^p for 0 ≤ s < t_{k+1}. Similarly, let {ℓ′_k}_{k∈N} be a sequence of interpolation times and define the interpolated process associated with Ξ^p for 0 ≤ s′ < τ_{k+1}. 3) Let Z^p : R_{≥0} → R^{n_p} and Ξ^p : R_{≥0} → R^{n_p} denote the continuous functions (as functions of s, s′, respectively) associated with the above processes, and let Z = (Z^p)_{p∈N}, Ξ = (Ξ^p)_{p∈N} be the stacked vectors of the individual interpolated processes. 4) Next, define the overall interpolated process W. By Propositions 4.1 and 4.2 of [54], under Assumptions 5-8, W is an asymptotic pseudotrajectory of the semiflow Φ : R_{≥0} × R^{2n} → R^{2n} induced by ω̇ = F(ω) (39), that is, lim_{t→∞} sup_{0≤h≤T} ∥W(t + h) − Φ(h, W(t))∥₂ = 0, (88) for any T > 0. 5) By Assumption 7, W has compact closure, i.e., it is pre-compact.
6)-7) Since we have shown that W is a pre-compact APT of the semiflow Φ induced by ω̇ = F(ω), by [54, Theorem 5.7(i)] and [55, Prop. 3.27], the limit set L(W) is contained in E. By our assumption that every NE is a mere VSS, every point in E corresponds to an interior mere VSS; since L(W) ⊂ E, L(W) is contained in a compact subset of the rest points of ω̇ = F(ω). 8) By the definition of a limit set, for any W(0), the interpolated process W(t) converges as t → ∞. 9) From our construction of the interpolated processes and the diminishing step-size assumption, i.e., 0 ≤ s < t_{k+1}, 0 ≤ s′ < τ_{k+1} and lim_{k→∞} t_k = lim_{k→∞} τ_k = 0, the convergence of the interpolated processes Ξ and Z implies the convergence of Ξ_k, Z_k, respectively. 10) By continuity of Cϵ, it follows that X_k = Cϵ(Z_k) converges almost surely to an interior mere VSS x⋆.

TABLE 1 :
A list of main notations used in this paper.