A Converse Result on Convergence Time for Opportunistic Wireless Scheduling

Abstract: This paper proves an impossibility result for stochastic network utility maximization for multi-user wireless systems, including multiple access and broadcast systems. Every time slot an access point observes the current channel states for each user and opportunistically selects a vector of transmission rates. Channel state vectors are assumed to be independent and identically distributed with an unknown probability distribution. The goal is to learn to make decisions over time that maximize a concave utility function of the running time average transmission rate of each user. Recently it was shown that a stochastic Frank-Wolfe algorithm converges to utility-optimality with an error of O(log(T)/T), where T is the time the algorithm has been running. An existing Ω(1/T) converse is known. The current paper improves the converse to Ω(log(T)/T), which matches the known achievability result. It does this by constructing a particular (simple) system for which no algorithm can achieve better performance. The proof uses a novel reduction of the opportunistic scheduling problem to a problem of estimating a Bernoulli probability p from independent and identically distributed samples. Along the way we refine a regret bound for Bernoulli estimation to show that, for any sequence of estimators, the set of values p ∈ [0, 1] under which the estimators perform poorly has measure at least 1/6.


I. INTRODUCTION
This paper establishes the fundamental learning rate for network utility maximization in wireless opportunistic scheduling systems, such as multiple access systems and broadcast systems. The recent work [2] shows that a stochastic Frank-Wolfe algorithm with a vanishing stepsize achieves a utility optimality gap that decays like O(log(T)/T), where T is the time the algorithm is in operation. It does this without a priori knowledge of the channel state probabilities. This paper establishes a matching converse. A simple example system is constructed for which all algorithms have an error gap of at least Ω(log(T)/T). Specifically, we construct a system with channel states parameterized by an unknown probability q ∈ [0, 1] such that for any algorithm, there is a set Q ⊆ [0, 1] with measure at least 1/6 under which the algorithm performs poorly. This is done by a novel reduction of the opportunistic scheduling problem to a problem of estimating a Bernoulli probability p from independent and identically distributed (i.i.d.) Bernoulli samples. Along the way, a refined statement regarding the regret of Bernoulli estimation is developed.
A partial version of this work was presented at the IEEE INFOCOM conference, 2020 [1].
This work was supported in part by one or more of: NSF CCF-1718477, NSF SpecEES 1824418.
A general structure for the class of opportunistic scheduling systems is as follows: The system is assumed to operate over slotted time t ∈ {0, 1, 2, . . .}. There are n users. Every slot t an access point allocates a transmission rate vector X[t] = (X 1 [t], . . . , X n [t]) for transmission of independent data belonging to each user. In the case of wireless multiple access systems, the n users transmit their data over uplink channels to the access point. It is assumed they use a coordinated scheme that allows successful decoding of all transmissions at the scheduled bit rates X[t]. In the case of wireless broadcast systems, the access point transmits data for each user over downlink channels at the scheduled bit rates X [t].
The set of all transmission rate vectors that are available on a particular slot t can change from one slot to the next. This can arise from time-varying connection properties such as channel states that vary due to device mobility. We model this time-variation by a random state vector S[t] ∈ R m that is observed by the access point at the start of every slot t (where m is a positive integer that can be different from n).
The goal is to make decisions over time to solve:

Maximize: φ(x̄_1, . . . , x̄_n) (1)
Subject to: X[t] ∈ D(S[t]) ∀t ∈ {0, 1, 2, . . .} (2)

where x̄_i = lim_{T→∞} (1/T) Σ_{t=0}^{T−1} E[X_i[t]] and φ(x_1, . . . , x_n) is a given real-valued utility function of the average user transmission rates. This problem seeks to maximize the utility of the long term time average transmission rate vector. The function φ is assumed to be concave and entrywise nondecreasing. Let φ* be the optimal utility value, which considers all possible algorithms that operate over an infinite time horizon and yield probabilistically measurable random vectors X[t], including algorithms that have perfect knowledge of the probability distribution F_S. This includes non-causal algorithms that have knowledge of future states. It is challenging to design a (causal) scheduling algorithm that achieves a utility that is close to φ*, particularly when the distribution F_S is unknown. Algorithms that are causal (having no knowledge of the future) and that have no a priori knowledge of F_S shall be called statistics-unaware algorithms. This paper establishes the fundamental convergence delay required for any statistics-unaware algorithm to achieve utility that is close to the optimal value φ*.
A general statistics-unaware algorithm may incorporate some type of learning or estimation of the distribution F_S or some functional of this distribution. Observations of past channel states can be exploited when making online decisions. Consider some statistics-unaware algorithm that makes decisions over time t ∈ {0, 1, 2, . . .}. For each positive integer T, the expression E[φ(X̄[T])], where X̄[T] = (1/T) Σ_{t=0}^{T−1} X[t], is the utility associated with running the algorithm over the first T slots {0, 1, 2, . . . , T − 1}. This utility includes decisions X[t] made at each step of the way (including the decision X[0] made at time t = 0 based only on the observation S[0]). Decisions must be made intelligently at each step of the way and fast learning is crucial. How close can the achieved utility get to the optimal value φ*? What time T is required?
A naive greedy policy might choose X[t] ∈ D(S[t]) to maximize φ(X[t]) on each slot t. This is optimal when φ(·) is linear because time average expectations commute with linear functions. However, this greedy policy can be far from optimal in general. For example, consider a 2-user system where on every slot the controller selects either X[t] = (20, 0) or X[t] = (0, 19). Choosing user 1 independently with probability p yields long term average rates (20p, 19(1 − p)). Maximizing log(20p) + log(19(1 − p)), which is the utility function applied to the long term transmission rate vector (X̄_1, X̄_2), leads to p* = 1/2 and (X̄_1, X̄_2) = (10, 9.5), which achieves the optimal utility for problem (1)-(2). In contrast, the greedy policy chooses user 1 every slot, so (X̄_1, X̄_2) = (20, 0) and user 2 never transmits! If there is concern that the greedy policy in this example is ambiguous due to the singularity at zero, a similar example holds under a modified utility (defined in the next subsection) that removes the singularity.
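This gap can be checked numerically. The sketch below assumes the two per-slot rate options are (20, 0) and (0, 19) and the utility log(x_1) + log(x_2); these values are reconstructed from the numbers quoted above, and a grid search over the time-sharing probability p recovers p* = 1/2 and average rates (10, 9.5):

```python
import math

# Hypothetical 2-user example: each slot the controller picks either
# user 1 at rate 20 (giving (20, 0)) or user 2 at rate 19 (giving (0, 19)).
# Choosing user 1 with probability p yields average rates (20p, 19(1-p)).
def avg_utility(p):
    """Utility log(x1) + log(x2) of the long-term average rate vector."""
    x1, x2 = 20.0 * p, 19.0 * (1.0 - p)
    return math.log(x1) + math.log(x2)

# Grid search for the utility-maximizing time-sharing probability p*.
best_p = max((i / 10000 for i in range(1, 10000)), key=avg_utility)
print(round(best_p, 3))                 # p* = 1/2
print(20 * best_p, 19 * (1 - best_p))   # average rates (10.0, 9.5)
```

The greedy policy corresponds to p = 1, which drives the second log term to minus infinity.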

A. Example utility functions
Different concave utility functions can be used to provide different types of performance (with corresponding fairness properties). The following linear utility function seeks to maximize a weighted sum of average transmission rates:

φ(x_1, . . . , x_n) = Σ_{i=1}^n w_i x_i

where w_i > 0 are given weights. From a fairness perspective, linear utilities are undesirable. For example, suppose there are two users and the above linear utility function is used in a system where, for all t, user 1 can be served at a higher rate than user 2 (perhaps because user 1 is always closer to the access point). To solve (1)-(2) in this case, it is optimal to always transmit with user 1. Thus, user 2 (unfairly) receives a time average rate of 0. One way to improve fairness is to use a new utility function:

φ(x_1, . . . , x_n) = min{x_1, . . . , x_n}

Under this concave (but non-smooth) utility function, the problem is to maximize the minimum time average transmission rate. Another type of utility function is the (smooth) function

φ(x_1, . . . , x_n) = Σ_{i=1}^n log(x_i) (3)

called the proportionally fair utility function [3][4]. This function is often modified to remove the singularity at zero:

φ(x_1, . . . , x_n) = Σ_{i=1}^n log(c + x_i)

where c > 0 is a constant. Other types of concave and nonlinear utility functions can be used for other types of fairness, such as α-fair utility functions [5][6][7].
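A minimal sketch of these candidate utilities (the weights and the constant c below are illustrative choices, not values fixed by the paper):

```python
import math

# Sketches of the concave utility functions discussed above, for n users.
def linear_utility(x, w):
    """Weighted sum of average rates: favors strong users, can starve weak ones."""
    return sum(wi * xi for wi, xi in zip(w, x))

def max_min_utility(x):
    """Maximize the minimum average rate (concave but non-smooth)."""
    return min(x)

def proportional_fair(x):
    """Proportionally fair utility: sum of log rates (singular at zero)."""
    return sum(math.log(xi) for xi in x)

def shifted_proportional_fair(x, c=0.1):
    """Modified proportionally fair utility removing the singularity at zero."""
    return sum(math.log(c + xi) for xi in x)

rates = (10.0, 9.5)
print(proportional_fair(rates))
```

Note that the shifted version remains finite at a zero rate vector, which is what makes the greedy-policy comparison of Section I unambiguous.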

B. Prior work
The work [8][9] develops statistics-unaware Frank-Wolfe type algorithms (with various step size rules) for solving the problem (1)-(2) for smooth utility functions using a fluid limit analysis. An alternative statistics-unaware drift-plus-penalty algorithm of [10][11] can be used to solve (1)-(2) for smooth or nonsmooth utility functions, and this achieves utility within ε of optimality with convergence time O(1/ε²). Drift-plus-penalty can also be used for extended problems of multihop networks with power minimization and constraints [12], and related algorithms for these extended problems are in [13][14][15][16].
The basic stochastic Frank-Wolfe algorithm for smooth utility functions is to choose X[t] ∈ D(S[t]) to maximize the linearized utility ∇φ(γ[t])ᵀX[t], where γ[t] is an average of past decisions:

γ[t] = (1 − η[t])γ[t − 1] + η[t]X[t − 1]

where η[t] is a stepsize. Recent work in [2] shows that, for smooth utility functions, this algorithm with a constant stepsize yields convergence time O(1/ε²), while a particular vanishing stepsize yields an improved O(log(1/ε)/ε) convergence time. In particular, in the latter case we obtain

φ* − E[φ(X̄[T])] ≤ c log(T)/T

where c > 0 is a particular system constant. The work [2] also provides a near-matching converse of Ω(1/T). The problem of closing the logarithmic gap between the achievability bound and the converse bound was left as an open question. We resolve that open question in this paper by showing that the O(log(T)/T) gap is optimal.
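The iteration can be sketched on the 2-user example system of Section III (state S[t] Bernoulli(q), decision sets D(0) = {(1, 0)} and D(1) = {(r, 1 − r²)}). The utility φ(x) = log(0.1 + x_1) + log(0.1 + x_2) and the stepsize η[t] = 1/(t + 1) are illustrative assumptions, not the exact choices of [2]:

```python
import random

def grad_phi(x, c=0.1):
    """Gradient of the assumed utility phi(x1, x2) = log(c + x1) + log(c + x2)."""
    return (1.0 / (c + x[0]), 1.0 / (c + x[1]))

def frank_wolfe_step(gamma, s):
    """Pick X[t] in D(S[t]) maximizing the linearized utility grad_phi(gamma) . X."""
    g1, g2 = grad_phi(gamma)
    if s == 0:
        return (1.0, 0.0)                        # D(0) = {(1, 0)}: no choice
    r = min(max(g1 / (2.0 * g2), 0.0), 1.0)      # argmax of g1*r + g2*(1 - r^2)
    return (r, 1.0 - r * r)

random.seed(0)
q, T = 0.5, 20000
gamma = (0.0, 0.0)
for t in range(T):
    s = 1 if random.random() < q else 0
    x = frank_wolfe_step(gamma, s)
    eta = 1.0 / (t + 1)       # vanishing stepsize: gamma is the running average
    gamma = tuple((1 - eta) * gi + eta * xi for gi, xi in zip(gamma, x))
print(gamma)
```

With q = 1/2 the time average settles near the utility-optimal operating point of the example system (roughly (0.674, 0.439) under the assumed utility).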
A related logarithmic convergence time result is developed by Hazan and Kale in [17] for the context of online convex optimization with strongly convex objective functions. The prior work [17] provides an example online convex optimization problem for which no algorithm can achieve regret less than Ω(log(T)). Specifically, they seek to choose X[t] ∈ [0, 1] for each slot t to make the regret

Σ_{t=1}^T f_t(X[t]) − min_{x∈[0,1]} Σ_{t=1}^T f_t(x)

as small as possible, where regret is measured by comparison to the best fixed decision and the costs f_t are quadratic losses driven by i.i.d. Bernoulli samples with unknown mean p. This structure immediately reduces the problem to an estimation problem that seeks to estimate the value of p from i.i.d. Bernoulli samples in order to minimize a sum of mean square errors Σ_t E[(X[t] − p)²]. The work [17] then provides a deep analysis of the Bernoulli estimation problem to show, via a nested interval argument, that for any sequence of Bernoulli estimators there exists a probability p ∈ [1/4, 3/4] under which the estimators have a sum mean square error that grows at least logarithmically in the number of samples. This prior work inspires the current paper. We show that certain opportunistic scheduling problems can also be reduced to Bernoulli estimation; then we can use the Bernoulli estimation result of [17]. However, this reduction is not obvious. Online convex optimization problems have a different structure than opportunistic scheduling problems and the same reduction techniques cannot be used. New techniques are used to establish the converse, including a novel reduction of the opportunistic scheduling problem to a Bernoulli estimation problem.
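The logarithmic growth of the sum of mean square errors is easy to see for the natural empirical-mean estimator: its error at time n is exactly p(1 − p)/n, so the sum over n = 1..N behaves like p(1 − p) ln(N). A short analytic check:

```python
import math

# Sum of mean square errors of the empirical-mean estimator of a Bernoulli p:
# E[(p_hat_n - p)^2] = p(1-p)/n, so the sum over n = 1..N grows like
# p(1-p) * ln(N) (a harmonic sum).
def sum_mse(p, N):
    return sum(p * (1 - p) / n for n in range(1, N + 1))

p = 0.5
for N in (100, 10000):
    print(N, sum_mse(p, N), p * (1 - p) * math.log(N))
```

The converse of [17] says no estimator sequence can do fundamentally better than this logarithmic growth for every p.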
A conceptually related problem on the fundamental time required to minimize a deterministic convex function with noisy observations of the gradients/subgradients is treated in [17] (where an achievability result is developed concerning the error versus time) and from an information theoretic perspective in [18] (where converse results are developed); see also early work on computational complexity for this problem in [19]. At a high level, the convergence time and learning concepts for that problem are similar to the current paper. However, the structure and analysis of that problem is quite different. For example: (i) Unlike the problem of this paper, the fundamental asymptotic tradeoffs for that problem depend significantly on whether or not the convex function is strongly convex; (ii) The fundamental tradeoffs for that problem do not have the same logarithmic properties as the problem of the current paper.
Prior work on fundamental bounds on the time and accuracy for estimation problems is in, for example, [20] for classification problems, and work in [21][22] treats bounds for estimation of functionals of discrete distributions.
C. Our contributions

1) This paper proves an Ω(log(T)/T) converse for opportunistic scheduling. This matches an existing achievability result and resolves the open question in [2] to show that this performance is optimal.

2) This paper shows that strongly concave utility functions cannot be used to improve the asymptotic convergence time for opportunistic scheduling problems in comparison to functions that are concave but not strongly concave. This is surprising because strong convexity/concavity provides convergence improvements in other contexts, including online convex optimization problems [23][24], deterministic minimization via gradient descent [25], and deterministic minimization via stochastic gradients [18][19][17]. 1 This emphasizes the unique properties of opportunistic scheduling problems.
3) The technique for reducing opportunistic scheduling to Bernoulli estimation can more broadly impact future work on more complex networks (see open questions in this direction in the conclusion).

4) This paper refines the regret analysis for Bernoulli estimation theory in [17] to show that for any sequence of estimators, not only does there exist a probability p ∈ [1/4, 3/4] for which the regret grows at least logarithmically, but the set of all such values p has measure at least 1/6. This is used to establish a 1/6 result for opportunistic scheduling: If any particular statistics-unaware algorithm is used, and if nature selects the channel according to a Bernoulli process with parameter p that is independently chosen over the unit interval, then with probability at least 1/6 the algorithm will be limited by the Ω(log(T)/T) converse bound. Shouldn't algorithms always be limited by this bound? No. Imagine a scheduling algorithm that makes an a priori guess q̂ ∈ [0, 1] about the true network probability q, and then makes decisions that are optimal under the assumption that the guess is exact. In the "lucky" situation when q̂ = q, this algorithm would perform optimally and would not be limited by the Ω(log(T)/T) converse. Nevertheless, our analysis shows that every algorithm (including algorithms that attempt to make lucky guesses) will fail to beat the Ω(log(T)/T) converse with probability at least 1/6.

II. BERNOULLI ESTIMATION
This section gives preliminaries on estimating an unknown probability p from i.i.d. Bernoulli samples {W_n}_{n=1}^∞ with P[W_n = 1] = p and P[W_n = 0] = 1 − p.

1 For example, online convex optimization compares to the best fixed-decision policy using a criterion of regret. As shown in [23], a regret of O(log(T)) is achievable when utility functions are strongly convex, while an example of linear (not strongly convex) utility functions is given for which no algorithm can yield regret lower than Ω(√T). For deterministic minimization of a smooth convex function using linear combinations of observed gradients, [25] shows that an optimality gap of 1/T² is fundamentally improved to O(e^{−rT}) under strong convexity.

A. Estimation functions
Let {W p n } ∞ n=1 be a sequence of i.i.d. Bernoulli random variables with P [W p n = 1] = p (called a Bernoulli-p process). The value of p ∈ [0, 1] is unknown. On each time step n we observe the value of W p n and then make an estimate of p based on all observations that have been seen so far.
A general estimation method is characterized as follows: Let {Â_n}_{n=1}^∞ be an infinite sequence of Borel-measurable functions such that each function Â_n(u, w_1, ..., w_n) maps a binary-valued sequence (w_1, ..., w_n) ∈ {0, 1}^n and a random seed u ∈ [0, 1) to a real number in the interval [0, 1]. That is, for all n ∈ {1, 2, 3, ...} we have

Â_n : [0, 1) × {0, 1}^n → [0, 1] (4)

The Â_n functions shall be called estimation functions. Let U be a random variable that is uniformly distributed over [0, 1) and that is independent of the Bernoulli process; its binary expansion can be viewed as an infinite sequence of independent and equally likely binary digits that provide a never-ending source of randomness. For a Bernoulli-p process, the estimate at time n is

A_n^p = Â_n(U, W_1^p, . . . , W_n^p) (5)
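A minimal instance of such an estimation function is the empirical mean, which ignores the random seed u entirely:

```python
def empirical_mean_estimator(u, *w):
    """An estimation function A_hat_n(u, w_1, ..., w_n) -> [0, 1].
    This simple instance ignores the random seed u and returns the sample mean."""
    return sum(w) / len(w)

# The estimate at time n is A_n = A_hat_n(U, W_1, ..., W_n).
samples = (1, 0, 1, 1)
print(empirical_mean_estimator(0.3, *samples))  # 0.75
```

Randomized estimators would instead use u (for example, to break ties or to randomize an initial guess), which is why the general definition carries the seed.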
For a given sequence of estimator functions, let A_n^p be the estimate at time n, as defined by (5). Define E_p[(A_n^p − p)²] as the mean square estimation error at time n ∈ {1, 2, 3, . . .}. The expectation is with respect to the random seed U and the random Bernoulli vector (W_1^p, . . . , W_n^p). For a given random seed u ∈ [0, 1) and for two different probabilities p, q, the quantities E_p[(A_n^p − p)² | U = u] and E_q[(A_n^q − q)² | U = u] are the mean square errors at time n associated with the same deterministic estimation function Â_n but assuming a Bernoulli-p process and a Bernoulli-q process, respectively. The following theorem is due to Hazan and Kale in [17].
Theorem 1: (Bernoulli estimation from [17]) Fix any sequence of estimation functions {Â_n}_{n=1}^∞. Then there exists a probability p ∈ [1/4, 3/4] such that

Σ_{n=1}^N E_p[(A_n^p − p)²] ≥ Ω(log(N))

where A_n^p is defined by (5). It is important to distinguish the result of Theorem 1 from the Cramer-Rao estimation bound (see, for example, [26]). The Cramer-Rao bound is most conveniently applied to unbiased estimators. While biased versions of the Cramer-Rao bound exist, they require additional structural assumptions, such as knowledge of a (differentiable) bias function b(p) with a derivative that is bounded away from −1 so that a term (1 + b′(p))² does not vanish. Moreover, Cramer-Rao bounds are typically applied to a single estimator for time step n. In contrast, the Hazan and Kale theorem above treats the sum mean square error over a sequence of estimators, which is essential for establishing connections to the regret of online scheduling algorithms.
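As a sanity check on this comparison, the empirical mean is unbiased and its mean square error p(1 − p)/n meets the unbiased Cramer-Rao bound for Bernoulli samples. A short Monte Carlo experiment (the trial count and seed below are arbitrary choices) confirms the agreement at a single n:

```python
import random

# The empirical mean of n Bernoulli(p) samples is unbiased with variance
# p(1-p)/n, matching the Cramer-Rao bound for unbiased Bernoulli estimators.
def monte_carlo_mse(p, n, trials=20000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        est = sum(1 if rng.random() < p else 0 for _ in range(n)) / n
        total += (est - p) ** 2
    return total / trials

p, n = 0.3, 50
bound = p * (1 - p) / n          # Cramer-Rao bound for unbiased estimators
print(monte_carlo_mse(p, n), bound)
```

Theorem 1 is stronger in a different direction: it lower bounds the *sum* of errors over all n simultaneously, for some p, with no unbiasedness assumption.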
Using the nested interval techniques of [17], the asymptotic bound Ω(log(N )) in Theorem 1 can be written as an explicit function b log(N ) − c where b and c are system constants that do not depend on N . Unfortunately, there is a minor constant factor error in Lemma 15 of [17]. That error does not affect correctness of the Ω(log(N )) result. For completeness, the minor error is identified and fixed in Appendix A.

B. Positive measure in the unit interval
Theorem 1 shows that for any sequence of Bernoulli estimators, there is a probability p ∈ [1/4, 3/4] under which the estimators perform poorly, in the sense of having a sum mean square error that grows at least logarithmically in N. The next theorem shows that, not only does such a probability p exist, the set of all such probabilities p is measurable and has measure at least 1/6 within the interval [1/4, 3/4]. It also generalizes to treat arbitrary powers of the absolute error.

Theorem 2: Fix α > 0 and define Q as the set of all p ∈ [1/4, 3/4] that satisfy (6), where c = 8/3. Then Q is Lebesgue measurable and has measure µ(Q) ≥ 1/6. Thus, a randomly and uniformly chosen p ∈ [1/4, 3/4] satisfies (6) with probability at least 1/3. In particular, for α = 2 the inequality (6) implies a sum mean square error that grows at least logarithmically in N.

There is nothing special about the interval [1/4, 3/4]. This interval enables a specific bounding constant in the right-hand-side of (6). A similar result could be obtained for the larger interval [ε, 1 − ε] for 0 < ε ≤ 1/4, but the corresponding bound gets worse as ε → 0.

III. THE CONVERSE BOUND
This section constructs a simple 2-user opportunistic scheduling system with state vectors S[t] described by a single probability parameter q ∈ [1/4, 3/4]. It produces a converse bound on the utility optimality gap by mapping the problem to a Bernoulli estimation problem and then using Theorem 1 and Theorem 2.
A. The example 2-user system

The channel state S[t] is binary valued, with {S[t]}_{t=0}^∞ i.i.d. and q = P[S[t] = 1] an unknown parameter in [1/4, 3/4]. The decision sets are

D(0) = {(1, 0)}, D(1) = {(r, 1 − r²) : r ∈ [0, 1]}

In particular, if S[t] = 0 then the controller has no choice but to allocate X[t] = (1, 0), which gives no transmission rate to user 2. On the other hand, if S[t] = 1 then the controller is free to allocate X[t] = (r, 1 − r²) for some r ∈ [0, 1], which allows giving a nonzero transmission rate to user 2. Observe that under any system state and any decision, it holds that X_2[t] ≤ 1 − X_1[t]². The curve (r, 1 − r²) for r ∈ [0, 1] is shown as the solid curve in Fig. 1. While this example decision set D(S[t]) is very specific, it is representative of the following physical scenario: Imagine that user 2 goes offline independently every slot t with probability 1 − q (possibly due to a time-varying channel condition, or because it allocates its resources to other tasks according to a randomized schedule). Hence, user 1 can allocate a full rate of 1 on those slots (corresponding to slots t such that S[t] = 0). On the other hand, during the slots in which users 1 and 2 are both online (corresponding to S[t] = 1), the users can simultaneously transmit but, due to interference, they cannot both transmit at the full rate of 1. During such slots t for which S[t] = 1, there is a tradeoff between the rates X_1[t] and X_2[t] that can be allocated, so that X_2[t] is a nonincreasing function of X_1[t]. The particular nonincreasing function X_2[t] = 1 − X_1[t]² that is used is shown in Fig. 1. This function is chosen for mathematical convenience (it simplifies the proof to be given). Similar proofs can be given for curves that are qualitatively similar but that have more physical meaning: For example, for slots t such that S[t] = 1, suppose the total bandwidth available is B and the rates of users 1 and 2 are chosen by allocating fractions of the bandwidth θ_1[t] and θ_2[t] to users 1 and 2, so that user 1 is allocated a total bandwidth of Bθ_1[t], user 2 is allocated a total bandwidth of Bθ_2[t], and θ_1[t], θ_2[t] are chosen as nonnegative values that sum to 1.
The users thus transmit over frequency-separated channels. Assuming each channel is an additive white Gaussian noise channel (with noise density uniform over the given frequency spectrum) and given these particular frequency division allocations on slot t, the point-to-point Shannon capacity of each channel is [26]:

C_i[t] = Bθ_i[t] log(1 + P/(θ_i[t]N)), i ∈ {1, 2}

where P and N are fixed positive parameters. The expression P/(θ_i[t]N) represents the signal-to-noise ratio for channel i ∈ {1, 2} and the noise θ_i[t]N is proportional to the bandwidth used on channel i.
The resulting (C_1[t], C_2[t]) allocations are given in Fig. 1 for a particular choice of parameters B = 0.7, P/N = 3. The mathematical curve is different from the curve (r, 1 − r²), but it is qualitatively similar. In particular, like the curve (r, 1 − r²), it can be shown to have a strongly concave structure. The proof of our converse can be extended to apply to this particular (C_1[t], C_2[t]) curve, and to similar curves that are strongly concave. This includes curves of the form (r, 1 − r^a) for 1 < a ≤ 2. We use the curve (r, 1 − r²) because it is simple and yields the most direct proof of the desired converse result.
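The bandwidth-sharing curve can be traced numerically. The sketch below assumes the capacity expression C_i = Bθ_i log2(1 + (P/N)/θ_i) with B = 0.7 and P/N = 3; the base of the logarithm is a modeling assumption here:

```python
import math

# Point-to-point AWGN capacity of each frequency-division channel, assuming
# C_i = B * theta_i * log2(1 + (P/N) / theta_i) with B = 0.7 and P/N = 3.
def capacity_pair(theta1, B=0.7, snr=3.0):
    theta2 = 1.0 - theta1
    def cap(theta):
        # theta * log(1 + snr/theta) -> 0 as theta -> 0, so define cap(0) = 0.
        return B * theta * math.log2(1.0 + snr / theta) if theta > 0 else 0.0
    return cap(theta1), cap(theta2)

# Sweep the bandwidth split to trace the (C1, C2) tradeoff curve.
curve = [capacity_pair(i / 10) for i in range(11)]
print(curve[5])   # equal split theta1 = theta2 = 1/2
```

Sweeping theta1 over [0, 1] traces a concave tradeoff curve between the two rates, qualitatively matching (r, 1 − r²).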

B. The example network utility maximization problem
For positive integers T define:

X_i(T) = (1/T) Σ_{t=0}^{T−1} X_i[t], i ∈ {1, 2}

and define X(T) = (X_1(T), X_2(T)). Let φ : [0, 1]² → R be a continuous and concave utility function. The goal of the network controller is to allocate X[t] over time to solve:

Maximize: lim inf_{T→∞} φ(E[X(T)]) (10)
Subject to: X[t] ∈ D(S[t]) ∀t ∈ {0, 1, 2, . . .} (11)

This problem indeed has the structure of (1)-(2). Define φ* as the optimal utility for the above problem. That is, φ* is the supremum value of (10) over all algorithms that satisfy (11).

C. Optimality over stationary policies
Results in [11] show that optimality for the problem (10)-(11) is characterized by the closure of the set C ⊆ R² of all one-shot expectations E[(X_1[0], X_2[0])] that can be achieved on slot t = 0. Consider the set of all (x_1, x_2) ∈ R² that satisfy the inequalities (15), (16), (17). The set of all such points is shown in Fig. 2 for the case q = 0.5. The next result shows that this set is equal to C.
The set C equals Conv(C_1), where Conv(·) denotes the convex hull. This set C is closed, bounded, convex, and is equivalently described as the set of all (x_1, x_2) ∈ R² that satisfy the inequalities (15), (16), (17). Proof: Define the set C_1 by

C_1 = {((1 − q) + qr, q(1 − r²)) : r ∈ [0, 1]}

The set Conv(C_1) corresponds to the set defined in the right-hand-side of (18). The set C_1 is closed and bounded and so Conv(C_1) is convex, closed, and bounded. Define C_2 as the set of all points (x_1, x_2) that satisfy the three inequality constraints (15), (16), (17). Define C as the set of all expectations E[(X_1[0], X_2[0])] achievable on slot t = 0. We want to show that Conv(C_1) = C = C_2. We first show Conv(C_1) ⊆ C. Fix r ∈ [0, 1]. Consider the following decision policy for slot t = 0: allocate X[0] = (1, 0) if S[0] = 0 and X[0] = (r, 1 − r²) if S[0] = 1, so that E[(X_1[0], X_2[0])] = ((1 − q) + qr, q(1 − r²)). Hence, any point in the set C_1 can be achieved as an expectation on slot t = 0. Any point in Conv(C_1) can also be achieved as an expectation by randomizing over policies that achieve particular points in C_1. Thus, Conv(C_1) ⊆ C.
To show that (16) also holds, observe from (19) that the same bound on the allocated rates holds regardless of whether S = 0 or S = 1, which implies that inequality (16) holds.
To show that (17) holds, we use a chain of inequalities in which (a) holds by (19); (b) holds by Jensen's inequality; (c) holds by (20). Thus, inequality (17) holds. It follows that Conv(C_1) ⊆ C ⊆ C_2. Finally, it is not difficult to show that the upper boundary of the set C_2, defined by all points (x_1, x_2) that satisfy 1 − q ≤ x_1 ≤ 1 and that satisfy inequality (17) with equality, is equal to the set C_1 (see Fig. 2 for the case q = 0.5). Further, the convex hull of this upper boundary is the entire set C_2, so that Conv(C_1) = C_2.
Results in [11] imply that the optimal utility φ* for problem (10)-(11) is equal to the maximum of the continuous and concave function φ(x_1, x_2) over all (x_1, x_2) in the closed, bounded, and convex set C. That is,

φ* = max_{(x_1, x_2) ∈ C} φ(x_1, x_2) (23)

The utility function of interest can be shown to be strongly concave and so the maximizer (x*_1, x*_2) is unique. The maximizer is given in terms of q in the next lemma.
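The maximization in (23) can be sketched numerically: the upper boundary of C is traced by the stationary policies of Section III-A, whose one-shot expectation is ((1 − q) + qr, q(1 − r²)) for r ∈ [0, 1]. The utility below, φ(x) = log(0.1 + x_1) + log(0.1 + x_2), is an illustrative strongly concave choice, not the paper's exact function:

```python
import math

# The upper boundary of the region C is traced by stationary policies:
# use X = (1, 0) when S = 0 and X = (r, 1 - r^2) when S = 1, giving the
# expectation ((1 - q) + q*r, q*(1 - r^2)) for r in [0, 1].
def expected_rates(r, q):
    return ((1.0 - q) + q * r, q * (1.0 - r * r))

def phi(x, c=0.1):
    """Assumed smooth, strongly concave utility (illustrative choice)."""
    return math.log(c + x[0]) + math.log(c + x[1])

def optimal_point(q, grid=10000):
    rs = (i / grid for i in range(grid + 1))
    best_r = max(rs, key=lambda r: phi(expected_rates(r, q)))
    return best_r, expected_rates(best_r, q)

r_star, x_star = optimal_point(0.5)
print(r_star, x_star)
```

Since φ is strongly concave and the boundary parameterization is smooth, the grid search converges to the unique maximizer as the grid is refined.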

D. Statistics-unaware algorithms for utility maximization
Fix T as a positive integer. Define X̄[T] as the time average over the first T slots:

X̄[T] = (1/T) Σ_{t=0}^{T−1} X[t]

Taking expectations of both sides of the above equality and using the definition of X(T) gives E[X̄[T]] = E[X(T)]. Let x* = (x*_1, x*_2) be the optimal operating point defined in (23) of Lemma 2. For φ(x_1, x_2) in (12), let ∇φ(x*) denote the gradient at x* expressed as a row vector:

∇φ(x*) = (∂φ(x*)/∂x_1, ∂φ(x*)/∂x_2)

By concavity of φ we have the chain of inequalities in (31), where (a) holds by the gradient inequality for concave functions; (b) holds by (29); (c) holds by (30) and the fact that φ(x*) = φ* for the vector x* = (x*_1, x*_2) defined in Lemma 2; (d) holds by (24). Now consider a particular t ∈ {0, 1, . . . , T − 1}. Define F[t] as the history of channel states over the first t slots:

F[t] = (S[0], S[1], . . . , S[t − 1])

By Jensen's inequality we have (33). We then have (34), where (b) holds by (7); (c) holds as an entrywise inequality by (32) and (33). Taking expectations of (34) with respect to the random F[t] and using the law of iterated expectations gives (35). On the other hand, recall from Lemma 2 that (36) holds. Using (35) and (36) together gives an expression for the conditional expected rates. Substituting this and r = h(q) in (37) gives (38). Substituting the resulting inequality into (31) yields a bound in which (a) holds by the fact that x*_2 ∈ [0, 1] and q ∈ [1/4, 3/4] so that q/(1 + x*_2) ≥ 1/8, and (b) holds by neglecting the nonnegative term for t = 0. By the expansion property of h in (27) we obtain (40). Here is the crucial observation: For each slot t ∈ {1, 2, 3, . . .} we can view θ[t], as defined in (38), as a deterministic estimator of q computed from the observed history F[t]. Finally, assuming T ≥ 2 and rearranging (40) gives (41). Again observing that {θ[t]}_{t=1}^∞ is a sequence of deterministic estimators of q, from Theorem 2 we know there is a set Q ⊆ [1/4, 3/4] with measure µ(Q) ≥ 1/6 such that for all q ∈ Q the sum of mean square estimation errors of {θ[t]} grows at least logarithmically, as in (6). Substituting this into (41) yields the Ω(log(T)/T) bound on the optimality gap, completing the proof of Theorem 3.

E. Discussion
The O(log(T )/T ) achievability result derived in [2] holds for smooth and concave utility functions and does not require strong concavity. The Ω(log(T )/T ) converse bound of Theorem 3 was carried out using a smooth and strongly concave utility function. This was intentional: This shows that, for these opportunistic scheduling problems, strong concavity cannot improve the fundamental convergence time. This is surprising because strong convexity/concavity is known to significantly improve convergence time in other optimization scenarios, including deterministic subgradient minimization [25] and online convex programming [24] [23].
The discussion in Section I-A shows that the Ω(log(T)/T) converse does not hold for linear utility functions. For intuition on how the proof of Lemma 4 would fail with linear utilities, note that if φ_1(x) = a_1x and φ_2(x) = a_2x for some real numbers a_1 > 0, a_2 > 0, then the r value that solves (42) does not depend on q and hence h′(q) = 0 for all q, so there is no β > 0 such that h′(q) ≥ β. Assumption 2 of Lemma 4 enforces nonlinearity. Assumption 2 implies that the φ function is strongly concave over the domain [0, 1]².

IV. CONCLUSION
This paper establishes a converse bound of Ω(log(T)/T) on the utility gap for opportunistic scheduling. This matches a recently established achievability bound of O(log(T)/T). This means that log(T)/T is the optimal asymptotic behavior. The bound in this paper was proven for an example 2-user system with a strongly concave utility function. This demonstrates the surprising result that strong concavity of the utility function cannot improve the asymptotic convergence time for opportunistic scheduling systems. This is in contrast to other optimization scenarios, such as online convex optimization, where strong convexity/concavity is known to significantly improve asymptotic convergence. The converse proof constructed a nontrivial mapping of the opportunistic scheduling problem to a Bernoulli estimation problem and used a prior result on the regret associated with Bernoulli estimation. The paper also develops a refinement on Bernoulli estimation to show that for any sequence of Bernoulli estimators, not only do probabilities exist for which the estimators perform poorly, but such probabilities have measure at least 1/6 in the unit interval. This is used to show that for any opportunistic scheduling algorithm, if nature chooses a Bernoulli state distribution by selecting the Bernoulli probability uniformly over the unit interval, the algorithm is limited by the Ω(log(T)/T) bound with probability at least 1/6.
The converse bound of this paper was established for a simple 2-user system. This means that there exist systems that are limited by the Ω(log(T )/T ) bound. The techniques in this paper link opportunistic scheduling to estimation problems and can likely be used in future work to investigate bounds on more general networks, including networks with state variables S[t] that are described by more complex distributions. This motivates the following open questions: Can refined bounds be established for non-Bernoulli S[t] processes? Can more detailed coefficients of the log(T )/T curve be obtained in terms of simple parameters of the distribution on S[t]? The Cramer-Rao bound of estimation theory allows bounds for non-Bernoulli variables that depend on Fisher information of the underlying probability distribution. However, it is currently unclear how to reduce a general opportunistic scheduling problem to a generalized (non-Bernoulli) estimation problem, and it is not clear how to incorporate Fisher information concepts to provide "regret" type bounds for networks.

APPENDIX A -A REFINED VARIATION INEQUALITY
This appendix refines a lemma in [17] about the total variation distance associated with the measure of i.i.d. Bernoulli random variables. Let Ω be a finite and nonempty sample space and consider the sigma algebra of all subsets of Ω. Let P and P′ be probability measures on this space. The Pinsker inequality states that

Σ_{ω∈Ω} |P(ω) − P′(ω)| ≤ sqrt(2 D_KL(P||P′))

where D_KL(P||P′) is the Kullback-Leibler divergence (in nats) between P and P′:

D_KL(P||P′) = Σ_{ω∈Ω} P(ω) log(P(ω)/P′(ω))

For a fixed positive integer n, define the sample space Ω = {0, 1}^n and let B_p^n and B_q^n denote the product measures of n i.i.d. Bernoulli-p and Bernoulli-q variables, respectively. To compute the right-hand-side of the above inequality, we use a chain of inequalities in which (a) uses the inequality log(1 + x) ≤ x for all x > −1; this yields a divergence bound of 16nε²/3, where ε = |p − q|. Substituting this inequality into (47) proves the result.
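These quantities can be checked numerically: the KL divergence between n-fold Bernoulli product measures is n times the single-sample divergence, and Pinsker's inequality upper-bounds the total variation distance. The following sketch computes both exactly for a small n:

```python
import math

def kl_bernoulli(p, q):
    """KL divergence (nats) between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def tv_bernoulli_product(p, q, n):
    """Total variation distance between the n-fold product measures B_p^n and
    B_q^n, computed exactly by grouping sample paths by their number of ones."""
    tv = 0.0
    for k in range(n + 1):
        bk = math.comb(n, k)
        tv += abs(bk * p**k * (1 - p)**(n - k) - bk * q**k * (1 - q)**(n - k))
    return tv / 2.0

p, q, n = 0.5, 0.55, 10
tv = tv_bernoulli_product(p, q, n)
pinsker = math.sqrt(n * kl_bernoulli(p, q) / 2.0)  # KL of products = n * KL
print(tv, pinsker)
```

The grouping by the number of ones is valid because each of the 2^n sample paths with k ones has the same probability under a Bernoulli product measure.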
We now utilize the above refined lemma. Let {Â_n}_{n=1}^∞ be an infinite sequence of estimation functions as defined in Section II-A, so that (48) holds. Let E_p[·] and E_q[·] represent expectations with respect to the probability distributions that form the random vectors (U, W_1^p, . . . , W_n^p) and (U, W_1^q, . . . , W_n^q), respectively. Since U is independent of the samples, the conditional expectation E_p[|A_n^p − p|^α | U = u] is with respect to the probability measure B_p^n associated only with the random vector of i.i.d. Bernoulli-p variables (W_1^p, . . . , W_n^p). Similarly, E_q[|A_n^q − q|^α | U = u] considers the same estimation function Â_n(·) but is with respect to the probability measure B_q^n associated only with the random vector of i.i.d. Bernoulli-q variables (W_1^q, . . . , W_n^q). For n ∈ {1, 2, 3, . . .}, these expectations can be evaluated in terms of the measures B_p^n and B_q^n, for which the Pinsker inequality applies.
The following lemma generalizes Lemma 16 of [17], which treats mean square error, to treat general powers of the absolute error. The proof closely follows the structure developed in [17] and uses Lemma 5 in a key place.
Lemma 6: Fix α > 0. Fix any sequence of measurable estimation functions {Â_n}_{n=1}^∞ of the form (48). Let p, q be probabilities that satisfy p, q ∈ [1/4, 3/4]. Then for all n ∈ {1, 2, 3, . . .} that satisfy |p − q| ≤ 1/(2c√n), the inequality (51) holds, where A_n^p and A_q^n are defined by (49)-(50) and c = 8/3.

Proof: Fix n ∈ {1, 2, 3, . . .}. Define ε = |p − q| and assume that ε ≤ 1/(2c√n). Fix u ∈ [0, 1). It suffices to prove that (52) holds. Without loss of generality assume q ≥ p, so that q = p + ε. If ε = 0 then (52) trivially holds. Assume ε > 0 and suppose (52) is false (we reach a contradiction). Then (53) holds. Thus, there is a constant θ ∈ (0, 1) such that (54) and (55) hold. 4 Following the technique in [17], applying the Markov inequality to (54) and (55) gives (56) and (57), where P_p[· | U = u] and P_q[· | U = u] represent probabilities under the probability measures B_p^n and B_q^n, respectively. For simplicity of notation, for the remainder of this proof we suppress the explicit "U = u" conditioning, with the understanding that all probabilities are implicitly conditioned on U = u. With this simplified notation the above inequalities become (58) and (59). From (56) we obtain (60), where (a) holds because q = p + ε. Now define the set C ⊆ {0, 1}^n as the set of binary observation vectors on which the resulting estimate is closer to p than to q. Then (58) implies P_p[C] < θ/2 and (59) implies P_q[C] > 1/2 + θ/2, and so the total variation distance between B_p^n and B_q^n exceeds 1/2, which contradicts the bound given by Lemma 5 when ε ≤ 1/(2c√n). This proves the lemma.

4 Indeed from (53): If E_p[|A_n^p − p|^α] = 0 then any θ ∈ (0, 1) satisfies (54) and we can choose θ ∈ (0, 1) sufficiently close to 0 to ensure (55). Else, if E_q[|A_n^q − q|^α] = 0 then any θ ∈ (0, 1) satisfies (55) and we can choose θ ∈ (0, 1) sufficiently close to 1 to ensure (54). Else, define θ ∈ (0, 1) by the ratio in (53).

We now use Lemma 6 to prove Theorem 2. Fix α ∈ [0, 2). Fix any sequence of measurable estimation functions {Â_n}_{n=1}^∞ of the form (4) and define A_n^p according to (5). Define Q as the set of all p ∈ [1/4, 3/4] that satisfy the lim sup inequality (6). Let µ(Q) denote the total Lebesgue measure of the set Q. We first show Q is Lebesgue measurable: Let X_n be the set of all 2^n binary-valued vectors of size n.
For each random seed u ∈ [0, 1), each function Â_n(u, x_1, ..., x_n) maps (x_1, ..., x_n) ∈ X_n to a real number in the interval [0, 1]. So f_n(p), as defined in (64), can be written as a finite sum of integrals (with respect to u) of bounded and measurable functions of (u, p). Those integrals are measurable functions of p, and so f_n(p) is a measurable function of p. Now let Z be a random variable that is independent of all else and is uniform over [1/4, 3/4]. Define H_m = Σ_{n=1}^m f_n(Z). Inequality (66) can be interpreted as a lower bound on E[f_n(Z)]. Summing the above over n ∈ {1, . . . , m} gives a corresponding lower bound on E[H_m]. Define A_m(α) as the event that this bound is violated at index m, and define A_m(α)^c as the complement of this event. Then (70) holds, where (a) holds because A_m(α) ⊆ ∪_{i=m}^∞ A_i(α), and (b) holds by monotonicity of probability. Here the notation "i.o." represents "infinitely often"; that is, P[A_m(α) i.o.] is the probability that the event A_m(α) occurs for an infinite number of indices m. However, by definition of the events A_m(α), we have (72). Finally we note by definition of f_n(p) in (64) that f_n(p) equals the error moment E_p[|A_n^p − p|^α]. Substituting this into (72) gives the lim sup inequality (6). Since Z is chosen uniformly over the size-(1/2) interval [1/4, 3/4], it follows that the measure of all values p ∈ [1/4, 3/4] for which the lim sup inequality holds is at least 1/6; that is, µ(Q) ≥ 1/6.