Ergodic Spectrum Management

This paper introduces the basic features of Ergodic Spectrum Management (ESM), a cloud-based approach to managing wireless connectivity that targets improvement of internet users' quality of experience (QoE). ESM learns and exploits near-ergodicity, or time-consistency, to make a communication link's use of time, space, and frequency more stable and efficient, with consumer QoE as the target metric. ESM methods can also improve existing radio resource management, in particular advancing unlicensed-spectrum efficiency to levels at or exceeding those associated with licensed spectra, as shown herein. Because ESM relies on the dimensional (time, space, and frequency) consistencies of learned probability distributions, latency-insensitive remote cloud-based resource management can be applied to wireless multi-user transmission. ESM methods are developed in 3 increasingly effective stages that rely correspondingly more on data collection and on functional-profile (policy) guidance of physical-layer design choices. ESM is suggested for existing and future networks, in both unlicensed and licensed spectra, as a means to improve overall wireless performance. Examples and field data show the potential for very large improvements in wireless system connectivity, throughput, and quality of experience.


Ergodic Spectrum Management (ESM) methods target cloud-based remote management of wireless transmission links' efficient and stable operation, as learned from link users' quality of experience (QoE). Both the transmission link's characteristics and the link user's experience have certain statistical consistencies, or ergodicities; an ESM-empowered cloud server therefore learns and exploits these ergodicities to manage (ideally optimize) the use of time, space, and frequency, as well as inter-user contention. Such ESM use provides a tool for wireless network evolution that does not require computational burden at the network edge, but instead makes more efficient use of available cloud computing resources to effect large improvements in network throughput as well as in consumer-user QoE. This paper introduces ESM basics with the goal of enabling and encouraging others to explore the concepts and pursue the area's vast possibilities to improve wireless network use in general.
Statistical consistency, or ergodicity, has enabled averaged wireless-design analysis and performance projection for several wireless-network generations to date. Such averaged analysis permits link budgets, data rates, and corresponding transmission ranges to be estimated. The ergodic analysis is used while actual transceiver designs are based on instantaneous transceiver training/pilot packets initially, and usually on interpolation thereafter. The average over the ergodic distributions from which channel conditions are sampled then presumes that the corresponding instantaneous design is used for each channel instance. As bandwidths widen in modern communication systems, radio resource management (RRM) has increasingly exploited the apparently slower time variation (relative to the wider bandwidth) in these instance-dependent designs; see for instance [1]-[3]¹ for licensed spectra and [4]-[6] for unlicensed (Wi-Fi) spectra. Wireless RRM then increasingly approximates the earlier slow-time-variation methods of dynamic spectrum management² (DSM) used in wireline copper networks [7], where the instantaneous channel in both wireless RRM and wireline DSM is presumed to be tracked/learned accurately. Some DSM methods are predecessors of what wireless systems expand upon and call "Non-Orthogonal Multiple Access" or NOMA [8], [9]. However, the channel-probability distribution's slow variation can be particularly important in wireless networks where not all users have a common spectrum controller. ESM additionally learns and exploits any near-ergodicity or consistent use patterns to improve connection stability and efficiency in the use of time, space, and frequency. ESM can be viewed as a specific example of the intriguing findings and summary in [2], where ESM simplifies determination of some decoupled, cloud-based, delay-insensitive spectra-assignment and modulation-coding choices through artificial intelligence and learning methods for RRM.
Also important to ESM's distinction from earlier RRM/DSM is the differentiation of Quality of Experience (QoE) from Quality of Service (QoS). Many technical works confuse QoS with QoE. QoE is measured by the connection user's contentment, often through promoter scores, complaint rates, corrective-action costs, or simple service churn (cancellation). QoS is measured in more technical quantities like probability of bit/packet errors, latency, and achieved data rates. QoE and QoS need not be well correlated. For instance, a very good QoS may occur most of the time/use, but outages may nonetheless occur in situations readily noticed by users and thus lead to poor QoE. Alternately, some links have good QoE even if the QoS levels are below some prescribed levels, depending on the user, place, time, and application being used. ESM can address QoE more directly, while RRM/DSM addresses QoS, as becomes evident in Sections IV and V as this paper culminates. ESM will base energy-allocation decisions (as in Section III) upon ergodic probability-distribution behavior while optimizing QoE relative to those choices; however, these are related by the throughput (data rates) achieved through direct QoE-driven choices of code rates and constellation sizes in Section IV.
¹ The series of documents under 5G-Xhaul at [3] has a good balance of theoretical RRM with plans for 5G standards, along with front-haul and backhaul in practice, in describing today's generation of RRM in wireless licensed spectra and LTE.
² A subset of DSM is often called dynamic line management (DLM), more formally and originally known as DSM Level 1 in fixed-line standards and efforts.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
Traditional RRM depends on low-latency resource assignment, causing computational capability to be placed closer to the radio cells, often known as "edge computing" or "fog computing"; see [10]. This traditional RRM often presumes a nearly instantaneous knowledge of all channels, noises, and interference levels to which the edge computing must quickly respond. The instantaneous reaction then often ignores learned network/user behavior. Alternately, traditional RRM may introduce various levels of randomness to sense and/or avoid collisions [11]. Figure 1's Learn-ed Resource Manager (LRM)³ in ESM introduces the ability to learn and exploit any statistical consistency. ESM's LRM can guide such policy decisions by providing a functional description of the statistical consistencies of the various cell uses and dynamics. Figure 1 also shows 3 radio nodes and their sub-nodes and/or devices. Each radio node's coverage area has its own identification index or "color" for whatever spectra it may actively use, and dimensional uses may overlap between the radio nodes (or colors), corresponding to interference between different nodes' signals. Devices 1b, 2a, and 2c experience interference that RRM/ESM attempts to reduce or eliminate.
The term "ergodic" formally means that time averages are equal to statistical averages. This paper uses the term "ergodic" more loosely to mean that certain consistencies recur [12]. These consistencies may, however, depend on the state that the channel, noises, and interference are likely to be in; thus, consistent behavior is expected within each state, but not necessarily the same behavior across states. The current state is determined locally, but the set of possible states and their corresponding spectra, constellation sizes, and code parameters are guided by a cloud server's policy recommendations; that server is Figure 1's Learn-ed Resource Manager.
Ergodic approaches estimate the (possibly state-dependent) probability distribution of what will be called here the channel gains, $g_n$, defined formally in Section II and indexed by $n$. A certain distribution's consistency is determined to assist ESM policy guidance provided to locally implemented RRM. Section II also develops a geometric-equivalent model for channel gains that will correspond to wireless systems' present-day use of channels (often themselves comprised of many tones using single-user-focused so-called "orthogonal-frequency-division multiplexing" modulation systems). Section II also largely decouples spectra decisions from code/rate choices to simplify the complexity challenges that are well described in [2] and to introduce an element of learned QoE improvement.
Section II also briefly reviews Goldsmith's original concepts in ergodic loading [13], [14] (see also [15], Chap. 4) for a single user while preparing for Section III's multi-user alterations thereof. Section II shifts emphasis from traditional RRM wireless-system design, which depends on instantaneous channel gains, to ESM's learned probability distributions and correlations of these channel gains, which instead guide ESM policy decisions. In ESM, the distributions of the other users' channel gains become mutually dependent in a jointly controlled manner when possible and prudent. Section III develops intuition around these dependencies and defines 3 stages of increasing ESM sophistication that could help guide ESM's incremental introduction into legacy networks as well as future networks with greater tunability. Section III contains a few simple examples that illustrate ESM's fundamental gains with respect to the contention-based approaches typically used in unlicensed spectra. These stages will roughly parallel the 3 DSM "levels" originally proposed in [7] and later used successfully to advance fixed-line speeds and efficiencies significantly. Section IV augments Section III's ergodic-spectra guidance with QoE-influenced additional functional-choice (policy) specification of the modulation and coding-system (MCS) parameters by expanding traditional outage-probability metrics and mechanizing them with Markov models as adaptively learned and output-optimized. Section IV also describes some simple distribution estimation and various methods for estimation and ESM's use of QoE metrics. Section V provides some examples of correspondingly large potential ESM gains in QoE. Section V notes that using only a portion of ESM's full capability already improves QoE significantly, which further motivates the development of more complete cloud-based application interfaces that would lead to field deployment of ESM's fullest opportunities as theoretically projected in earlier sections.
Section VI concludes. Table I provides a list of mathematical quantities and brief corresponding explanations, along with the section in which they first appear, which the authors hope assists the reader.
II. RESOURCE DIMENSIONALITY, LOADING, AND STATISTICS
Resources in this paper will be measured in dimensions. Dimensions in modern wireless communication networks occur traditionally in time and frequency, but also increasingly in space, where increasing numbers of antennas are used to improve system performance. Generally, these dimensions are viewed as system resources. Power is sometimes also viewed as a resource in that power (or really energy) can be assigned (or not) on any subset or on all the available dimensions. There may be a probability associated with that assigned energy as well, but the assignment of any non-zero probability of non-zero energy to any dimension is a use of that dimensional resource. This section generically casts dimensional resources as first equal in contribution and avoids specifically associating them with time, frequency, or space, treating them instead as an equal partitioning of available resources. There will be a finite number of space/time/frequency dimensions per symbol, and transmission of successive symbols is presumed. Such presumption tacitly will require an overall ESM symbol clock in some situations, for which the corresponding synchronization can be approximated in actual systems, but may become more exact in the most sophisticated highest-gain cases. As this section progresses, ESM's resource partitioning shifts from an all-dimensions-are-equal deterministic view to a statistical view based on the probability that the dimensional resource is useful. Several examples will illustrate ESM's improvement upon collision-detection methods, and even in some cases on perfectly deterministic RRM. This section thus prepares for alteration of original single-user ergodic approaches to Section III's multi-user cases.

A. Multi-Dimensional Channel Generics
Loading (see [14]) refers to the assignment of energy and information (a sub-function of coding and modulation) to a channel's possibly variable-quality⁴ dimensions that may not necessarily all have the same gain and noise. Figure 2 illustrates this for 4 channels, each of which may itself have many "dimensions,"⁵ which in some cases may not all have the same amplitude and noise. As in Figure 2's second row, ESM often represents a channel by an equivalent constant dimension repeated a certain number of times per symbol, which in turn can generalize into a probability that a certain type of dimension/resource is available. Loading for single-user channels is addressed in detail in [14] (see also [15], Chapter 4). Figure 2's Channel A uses variable energy on the dimensions to maximize performance, while Channels B and C use equal energy on all dimensions. Channels B and C, though, each have different total energy. Channel D uses equal energy or zero energy on its dimensions. Channel A might be representative of a massive MIMO⁶ system with many spatial channels, each of which has a different gain and SNR. Channel A could also correspond to the frequency dimensions of a wireline system. Other types of communication links might also produce Channel A. Channels B and C might be wireless Coded-OFDM⁷ systems that use a constant energy on all subcarriers within a specific channel.
⁴ Variable quality could be viewed in time as "time-variant," in frequency as "frequency-selective" channel filtering, or as the different gains on different spatial paths, etc.
⁵ A dimension can be thought of as a time slot, a subcarrier/tone, or a spatial dimension. This paper focuses on wireless transmission using some base quadrature modulation on each dimension, so a dimension is viewed as a complex dimension, or thus two real dimensions.
Channel D could correspond to a wireless system that aggregates two channels for transmission that are perhaps not contiguously located in frequency. Channel D could also represent 4 spatial streams, of which 3 are used. Any dimension has an SNR defined by
$$\mathrm{SNR} = \bar{E} \cdot g \tag{1}$$
where $\bar{E} = E/N$ is the average transmit energy used on the dimension, $E$ is the total energy used over $N$ dimensions, and $g$ is the "channel gain" that represents the channel energy gain/attenuation normalized to the dimensional noise energy. For Figure 2, loading decides the transmit energy assigned to each dimension. There may be many loading criteria, resulting in variable or flat energy distributions. The gain $g$ is a function of the given channel, cannot be changed by the designer, and is viewed as random in ESM. The maximum bit rate $\bar{b}$ for such a channel (with any $g$ value) is well known to be
$$\bar{b} = \log_2(1 + \mathrm{SNR}) \tag{2}$$
with the per-dimensional quantity $\bar{b}$ computed from the total number of bits $b$ transmitted over $N$ dimensions as $\bar{b} \triangleq b/N$. For any energy distribution on a channel's dimensions, this distribution and channel may always be represented by an equivalent geometric single-SNR channel that has the same information-bearing capacity, $b$. That equivalent, or geometric, SNR is given exactly, for example for Channel A, by (with $\mathrm{SNR}_{n,A}$ being the SNR for dimension $n$ of Channel A)
$$\mathrm{SNR}_{geo,A} = \left[ \prod_{n=1}^{N_A} \left(1 + \mathrm{SNR}_{n,A}\right) \right]^{1/N_A} - 1 \tag{3}$$
Since good loading methods [14] will typically not assign energy to channels (or dimensions) where the SNR is not significantly greater than 1, Equation (3) is often approximated by dropping the 1 terms; the geometric SNR then appears exactly as the $N_A$-th root of the product of the $N_A$ constituent dimensional SNRs.
In this case, by assigning constant per-dimension energy $\bar{E}_A$ to the equivalent channel in $N_A$ instances, the channel gains also can be represented by their geometric average as
$$g_{geo,A} = \left[ \prod_{n=1}^{N_A} g_{n,A} \right]^{1/N_A} \tag{4}$$
and thus
$$\mathrm{SNR}_{geo,A} = \bar{E}_A \cdot g_{geo,A} \tag{5}$$
As Figure 2 illustrates, a nested-loading problem can now be solved for the constant energy assigned for the channel set (as if it were a single "wider" dimension), and an overall aggregate geometric SNR can then represent the channel set:
$$\mathrm{SNR}_{geo} = \left[ \prod_{X \in \{A,B,C,D\}} \left(1 + \mathrm{SNR}_{geo,X}\right)^{N_X} \right]^{1/N} - 1, \qquad N = \sum_X N_X \tag{6}$$
Equation (6) can also be accurately approximated by dropping all the 1 terms while only including loaded channels. Figure 2, though, does show $\mathrm{SNR}_{geo,C}$ as not being loaded (zero energy assigned), so Channel C's 1 term should not be ignored; that zero-SNR term then trivially exits the formula in (6) as a unity gain factor. The overall nested loading problem would assign constant (or zero) energy to each of the geometric-equivalent channels' dimensions (but possibly different energy to different geometric-equivalent channels). The ratio $N_X / N$, where X = A, B, C, D, could be viewed as the average probability that a certain resource (dimension) is used. The values in the distribution represent the probability that a dimension appears in a certain channel. Roughly speaking, this probability corresponds to the likelihood that a certain channel "resource" is available to be used. With such interpretation, the probability is the trivial ratio $N_X / N$ and is no longer a function of the channel gains (and thus noises). However, the probability concept could be generalized to correspond to the probability that a certain channel resource is available, with fading, gains, interference from other users, and noises taken into account. Such a conceptual interpretation is particularly useful in ESM when the probability of certain channel conditions is known or, more importantly, can be estimated. The probabilities of these estimated channel gains ($g_{geo,X}$) become the ESM-LRM's inputs to set energy-use policy.
This probability-of-$g_{geo,X}$ concept expands into the concept of ergodic loading that appears later in this section, which then becomes the foundation for Section III's multi-user extension of ergodic loading in ESM. Figure 2's ESM channels could be considered to be different channels in an IEEE 802.11-series Wi-Fi system (each typically 20 MHz wide, but possibly power-of-2 multiples of 20 MHz) or in same-frequency-bandwidth transmission systems. In LTE, these are known as resource blocks (or resource units), typically corresponding to 12-tone groups (180 kHz wide) of a Coded-OFDM system over certain time slots of duration 0.5 ms (usually containing 6 or 7 successive OFDM symbols) [16]. This concept has somewhat abstractly also been called the "all-band" wireless system by Ericsson [17]. The system might even further combine a fixed-line DSL DMT⁸ or DOCSIS 3.1 Coded-OFDM system with wireless channels, with the former themselves each viewed as channels. In effect, the aggregate forms a "channel of channels." Narrowband low-power wide-area networks (LPWANs) could also each be considered as a channel in this context. These LPWANs can include wireless systems such as Bluetooth,⁹ LTE-M's narrowband IoT¹⁰ (Internet of Things), or LoRa¹¹ (long range). Dimensions in this context are probabilistically weighted partitions of resources (each partition base element typically corresponds to a certain single least-common-divisor use of time, frequency, and space over all channel resources, with different probabilities scaling with the number of such base units).
⁸ DMT, or Discrete Multi Tone, is an OFDM method that adaptively sets each dimension's energy and gain, unlike the equal energy and information in OFDM methods [7]. As such, DMT represents an upper bound on OFDM performance and more closely parallels Shannon's original capacity recommendations.
⁹ Bluetooth was originally IEEE standard 802.15.1, but is now maintained by the Bluetooth Special Interest Group; for more information see https://www.bluetooth.com.
¹⁰ 3GPP LTE's Release 13.
¹¹ Long Range low-power transmission methods for narrowband Internet of Things communication; see https://lora-alliance.org for more information.
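The geometric-equivalent SNR used in this subsection can be checked numerically. The following is a minimal sketch, assuming unit gap and the standard capacity-preserving definition (the product of the $1 + \mathrm{SNR}$ terms raised to one over the number of dimensions, minus 1); the function names and the example SNR values are hypothetical and are not taken from Figure 2.

```python
import math

def geometric_snr(snrs):
    """Single equivalent SNR with the same total capacity as the listed
    per-dimension SNRs: (prod(1 + SNR_n)) ** (1 / N) - 1."""
    log_sum = sum(math.log2(1.0 + s) for s in snrs)
    return 2.0 ** (log_sum / len(snrs)) - 1.0

def aggregate_snr(channels):
    """Nested version over channels given as (snr_geo, n_dims) pairs."""
    total_dims = sum(n for _, n in channels)
    log_sum = sum(n * math.log2(1.0 + s) for s, n in channels)
    return 2.0 ** (log_sum / total_dims) - 1.0

# Hypothetical per-dimension SNRs for a 4-dimension channel "A".
snr_a = [15.0, 7.0, 3.0, 1.0]
snr_geo_a = geometric_snr(snr_a)

# Hypothetical unloaded 4-dimension channel "C" (zero energy, so SNR = 0):
# its (1 + 0) term contributes a unity factor but its dimensions still count.
overall = aggregate_snr([(snr_geo_a, 4), (0.0, 4)])

bits_a = sum(math.log2(1.0 + s) for s in snr_a)
print(snr_geo_a, overall, bits_a)
```

In this sketch the equivalent channel carries the same number of bits as the original dimensions, which is the property the nested-loading argument relies on.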

B. Water-Filling as a Dimension-Management Tool
The water-filling energy allocation to a set of parallel independent channels (or dimensions) dates to Shannon [4], [18], [19], and various methods for computing it appear in [14], [15]. Of particular importance here is the water-filling distribution's energy assignment per channel according to
$$E_n + \frac{\Gamma}{g_n} = K \tag{7}$$
In Equation (7), the energy input to the $n$-th channel¹² is $E_n$; Forney's [15] "coding-gap" parameter $\Gamma$ characterizes the applied code's capability, with $\Gamma = 1$ (0 dB) implying that a capacity-achieving code is used;¹³ $K$ is the water-level constant; and the $n$-th channel (or dimensional) gain is defined by
$$g_n \triangleq \frac{\text{channel energy amplification (attenuation)}}{\text{sum of all noises}}$$
where the "noises" can include interference from other users who simultaneously attempt to use that same channel. The channel amplification/attenuation is the squared ratio of the noise-free signal voltage at the receiver input to the transmitted signal voltage.
Water-filling essentially says that the sum of the energy and the inverse gain¹⁴ is constant on all used subchannels. The term used is italicized because water-filling will zero certain channels that are unable to solve Equation (7) with positive energy. Normal water-filling will order the $\{g_n\}$ from largest ($n = 1$) to smallest ($n = N$) and choose the largest $N^*$ such that (to maximize data rate or total bits carried $b$)
$$K = \frac{1}{N^*} \left[ E + \Gamma \sum_{n=1}^{N^*} \frac{1}{g_n} \right] \tag{8}$$
is satisfied with all non-negative energies, where $E = \sum_{n=1}^{N} E_n$ is the total energy allowed. Water-filling can be viewed with the ordered set of channel gains as the transmit per-dimension rule of "transmit if good enough" (with $\gamma_0 \triangleq \Gamma / K$), or
$$\begin{cases} g_n > \gamma_0 & \text{transmit at energy } E_n = K - \dfrac{\Gamma}{g_n} \\ g_n \le \gamma_0 & \text{do not transmit, so } E_n = 0. \end{cases} \tag{9}$$
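The rate-adaptive procedure just described (order the gains, then shrink the used set until all energies are non-negative) can be sketched as follows. This is a minimal illustration under the standard form $E_n = K - \Gamma/g_n$ on used dimensions; the function name `water_fill_ra` and the example gains are hypothetical, and gains are assumed strictly positive.

```python
def water_fill_ra(gains, total_energy, gap=1.0):
    """Rate-adaptive water-filling sketch.

    Returns (K, energies), where energies[i] is the energy assigned to
    gains[i]. Tries the largest used set first and drops the weakest
    dimension until every used dimension has non-negative energy.
    """
    if not gains:
        raise ValueError("at least one dimension is required")
    # Indices ordered from largest gain to smallest gain.
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    energies = [0.0] * len(gains)
    for n_used in range(len(gains), 0, -1):
        used = order[:n_used]
        k = (total_energy + gap * sum(1.0 / gains[i] for i in used)) / n_used
        # Only the weakest used dimension needs checking: it has the
        # largest inverse gain and therefore the smallest energy.
        if k - gap / gains[used[-1]] >= 0.0:
            for i in used:
                energies[i] = k - gap / gains[i]
            return k, energies
    raise ValueError("no feasible water level (is total_energy negative?)")

# Hypothetical example: two good dimensions and one poor one.
k_ra, e_ra = water_fill_ra([10.0, 5.0, 0.1], total_energy=1.0)
print(k_ra, e_ra)
```

With these example values the poor dimension receives zero energy and the two good dimensions share the whole budget, which is the "transmit if good enough" behavior described above.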
The rate-adaptive¹⁵ water-level constant $K_{RA}$ can also be viewed as the sum of the used-dimension average energy $\langle E \rangle^* = E / N^* = (N / N^*) \cdot \bar{E}$ (so with $N^* < N$ the water-fill loading process increases energy on average for the better, used dimensions) and the gap-scaled average inverse gain $\Gamma \langle 1/g \rangle^* \triangleq \Gamma \cdot \frac{1}{N^*} \sum_{n=1}^{N^*} \frac{1}{g_n}$, so
$$K_{RA} = \frac{N}{N^*} \cdot \bar{E} + \Gamma \left\langle \frac{1}{g} \right\rangle^* \tag{10}$$
The dimensional averages use angle brackets to indicate that they are over time, space, or frequency and do not correspond to averages over the channel input or noise distributions. These may also be viewed here as "ergodic averages."
¹² Channel could mean "dimension" (or really a set of dimensions as in Figure 2's nested loading) or it could more simply mean the base-unit dimension for the tones/slots of a single channel.
¹³ Equations (3)-(6) assume that $\Gamma = 1$. For more on $\Gamma$, see [15], Chapter 1.
¹⁴ The inverse gain is a channel-input-referred noise component, similar to the noise figure often used in communication systems. The noise figure is typically reserved for analog front-ends (antenna and subsequent amplification electronics) but is the same "noise on the output referred to the input" quantity used here by Shannon's water-filling. The noise figure is the ratio of the inverse gain to the inherent lower bound of pure thermal noise.
¹⁵ Rate adaptive means the criterion is to maximize the data rate, or sum of data rates $b$ over all the dimensions, given fixed total energy $E$ (which is the sum of energies for the dimensions). Margin adaptive is a dual criterion that first minimizes the total energy given a fixed target total data rate $b$, and then boosts the "margin" from that minimum energy up to the allowed energy level.
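A useful water-filling interpretation (with $\Gamma = 0$ dB) is that the transmit energy on any used dimension exceeds (or deceeds) the average transmit energy by an amount equal to the amount by which that dimension's inverse channel gain deceeds (exceeds) the average inverse channel gain, or (with unit gap or perfect codes):
$$E_n - \langle E \rangle^* = \left\langle \frac{1}{g} \right\rangle^* - \frac{1}{g_n} \tag{11}$$
where $\langle E \rangle^* = E / N^*$ and $\langle 1/g \rangle^* = \frac{1}{N^*} \sum_{n=1}^{N^*} \frac{1}{g_n}$ are averages over the $N^*$ used dimensions. Similarly, to minimize energy for a given data rate or total bits $b$ over all channels, dual margin-adaptive (MA) water-filling instead chooses the largest $N^*$ such that
$$K_{MA} = \Gamma \cdot \left[ \frac{2^{b}}{\prod_{n=1}^{N^*} g_n} \right]^{1/N^*} \tag{12}$$
is satisfied with all non-negative energies. The MA water-level constant can also be written as
$$\log_2 K_{MA} = \frac{1}{N^*} \left[ b + \sum_{n=1}^{N^*} \log_2 \frac{\Gamma}{g_n} \right]$$
These water-filling formulas presume that the (single-user) RRM knows the channel gains instantaneously and accurately at both the transmitter and the receiver, so the statistical interpretation appears superfluous as yet.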
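The margin-adaptive dual can be sketched in the same style. This is a minimal illustration, assuming the per-dimension rate $\log_2(1 + E_n g_n / \Gamma)$, from which the water level follows as $\log_2 K = \frac{1}{N^*}[b + \sum_{n \le N^*} \log_2(\Gamma / g_n)]$; the function name `water_fill_ma` and the example values are hypothetical, and gains are assumed strictly positive.

```python
import math

def water_fill_ma(gains, total_bits, gap=1.0):
    """Margin-adaptive water-filling sketch: minimum total energy that
    carries total_bits. Returns (K, energies) aligned with gains."""
    if not gains:
        raise ValueError("at least one dimension is required")
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    energies = [0.0] * len(gains)
    for n_used in range(len(gains), 0, -1):
        used = order[:n_used]
        log2_k = (total_bits
                  + sum(math.log2(gap / gains[i]) for i in used)) / n_used
        k = 2.0 ** log2_k
        # Accept the largest used set whose weakest member has E_n >= 0.
        if k - gap / gains[used[-1]] >= 0.0:
            for i in used:
                energies[i] = k - gap / gains[i]
            return k, energies
    raise ValueError("no feasible water level (is total_bits negative?)")

# Hypothetical example: carry 4 bits over two good dimensions and a poor one.
gains_ma = [10.0, 5.0, 0.1]
k_ma, e_ma = water_fill_ma(gains_ma, total_bits=4.0)
bits_ma = sum(math.log2(1.0 + e * g) for e, g in zip(e_ma, gains_ma))
print(k_ma, e_ma, bits_ma)
```

The final line recomputes the carried bits from the returned energies (with unit gap), which is a quick way to confirm that the target rate is met exactly.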

C. Calculation of the Channel Gain
The channel gain for user $u$ in channel $X$ is $g_{u,X}$. This quantity is derived from other reported QoS parameters. Two major standardized, measured, and reported wireless parameters are the Reference Signal Received Power (RSRP) and the Received Signal Strength Indicator (RSSI). They are formally defined as
$$\mathrm{RSRP}_{u,X} = |h_{u,X}|^2 \cdot E_{u,X} \tag{13}$$
and
$$\mathrm{RSSI}_{u,X} = \sum_{i=1}^{U} |h_{i,X}|^2 \cdot E_{i,X} + \sigma^2_{u,X} \tag{14}$$
where $|h_{u,X}|^2$, $E_{u,X}$, and $\sigma^2_{u,X}$ are respectively the average squared magnitude of the channel transfer function, the transmitted energy, and the noise variance for the corresponding user $u$ and band $X$.
RSRP measures only a single user's channel-output energy for a specific reference signal that is known to the receiver (allowing averaging in the receiver to remove the effect of other users and of noise). RSSI measures the total received energy from all users together with the noise. The channel gain is derived via
$$g_{u,X} = \frac{|h_{u,X}|^2}{\mathrm{RSSI}_{u,X} - \mathrm{RSRP}_{u,X}} = \frac{\mathrm{RSRP}_{u,X}}{E_{u,X} \cdot \left( \mathrm{RSSI}_{u,X} - \mathrm{RSRP}_{u,X} \right)} \tag{15}$$
The channel transfer function $|h_{u,X}|^2$ can be derived in (13) because the transmitted energy $E_{u,X}$ will be known. Knowing a particular set of simultaneously occurring $\{g_{u,X}\}_{u=1,...,U}$ implies also knowing the set of simultaneously occurring $\{|h_{u,X}|\}_{u=1,...,U}$, which values will help the LRM compute the iterative water-filling algorithm in Section III.
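As a rough illustration of deriving a channel gain from reported measurements, the sketch below assumes that RSRP equals $|h|^2$ times the known transmit energy, that RSSI is the total received energy (all users plus noise), and that the gain is $|h|^2$ divided by everything in RSSI other than the user's own signal. These relations are assumptions consistent with the verbal description above, not a definitive statement of the standardized measurements; all quantities are taken in linear units (not dB or dBm), and the function name and example values are hypothetical.

```python
import math

def channel_gain(rsrp, rssi, tx_energy):
    """Gain g = |h|^2 / (noise + interference), in linear units.

    Assumes rsrp = |h|^2 * tx_energy and that rssi - rsrp is the sum of
    noise and other-user interference seen by this user.
    """
    if tx_energy <= 0.0:
        raise ValueError("tx_energy must be positive")
    noise_plus_interference = rssi - rsrp
    if noise_plus_interference <= 0.0:
        raise ValueError("rssi must exceed rsrp in linear units")
    h_squared = rsrp / tx_energy
    return h_squared / noise_plus_interference

# Hypothetical linear-scale measurements.
g_example = channel_gain(rsrp=2e-9, rssi=3e-9, tx_energy=0.1)
print(g_example)
```

In practice reported values are usually in logarithmic units and averaged differently by each device, so a conversion and calibration step would precede a computation like this one.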

D. Ergodic Water-Filling
ESM guides loading decisions through a statistically based function of the instantaneously measured channel gain (or gains). The LRM computes the probability distribution over the channel gains, $p_g$, over a discrete set of gain values (ranges), $G = \{g\}$. The instantaneous geometric-average channel gain value $g_{geo,X}$ for $X \in \{A, B, C, D\}$ may also be all that is known at the local radio node's transmitter via an initial training process for each channel use or for very recent history. The instantaneous transmitted packet's $g_{geo,X}$ value is often fed back (as "channel state information," or CSI) to the transmitter through a training protocol often called channel sounding, using what Wi-Fi, for instance, calls an NDP (null data packet) [4] (Chapter 13). LTE runs continuously on the channel, with the channel gains instead interpolated from embedded training pilots that range through the used channels. This instantaneous CSI should be distinguished from statistical CSI, which corresponds to a presumption or calculation (see Section II-E) of the probability distribution of the channel gains. The CSI (or instantaneous CSI) is computed at the radio node, while the statistical CSI is computed in the LRM. The energy transmitted for a specific value $g$ is $E_g$.
An ergodic water-filling solution, first found by A. Goldsmith and summarized in her book [13], generalizes for a discrete distribution [14], [15] (Chapter 4) to maximize the average data rate
$$\bar{b} = \sum_{g \in G} p_g \cdot \log_2 (1 + E_g \cdot g) \tag{16}$$
subject to an average energy constraint of
$$\sum_{g \in G} p_g \cdot E_g = \bar{E} \tag{17}$$
where $p_g$ is the probability of gain $g$. Maximization of (16) leads to the ergodic water-filling constant as
$$K = \frac{\bar{E} + \Gamma \sum_{g \in G^*} \frac{p_g}{g}}{\sum_{g \in G^*} p_g} \tag{18}$$
where $G^*$ is the largest set of the (ordered again from largest to smallest) gains' range values for the discrete distribution for which all energies in (10) are non-negative. The ergodic water level generalizes Equation (10)'s uniform distribution over the used channels and replaces it by a more general distribution $p^*_g$ over the used channels that have sufficiently large gain, but otherwise retains Equation (10). Ergodic water-filling replaces the deterministic resource index $n$ by the channel gain value $g$. However, ESM also requires the instantaneous channel gain to be known locally at the transmitter and also follows Equation (9), or equivalently (11). Essentially, single-user ergodic water-filling differs from normal water-filling only in the calculation of the water-fill constant $K$.
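The ergodic water level for a discrete gain distribution can be sketched along the same lines as ordinary water-filling, with probabilities replacing the uniform weighting over dimensions. The sketch below assumes the water level $K = (\bar{E} + \Gamma \sum_{g \in G^*} p_g / g) / \sum_{g \in G^*} p_g$ and the per-gain energy $E_g = K - \Gamma / g$ on the used set; the function name `ergodic_water_fill`, the dictionary representation of the distribution, and the example values are hypothetical.

```python
def ergodic_water_fill(pmf, avg_energy, gap=1.0):
    """Ergodic water-filling sketch over a discrete gain distribution.

    pmf maps each (positive) gain value to its (positive) probability.
    Returns (K, policy), where policy maps each gain value g to the
    energy E_g that the transmitter uses when it measures that gain.
    """
    if not pmf:
        raise ValueError("at least one gain value is required")
    gains = sorted(pmf, reverse=True)  # largest gain first
    for n_used in range(len(gains), 0, -1):
        used = gains[:n_used]
        p_used = sum(pmf[g] for g in used)
        k = (avg_energy + gap * sum(pmf[g] / g for g in used)) / p_used
        # The smallest used gain has the smallest energy; check it only.
        if k - gap / used[-1] >= 0.0:
            used_set = set(used)
            policy = {g: (k - gap / g if g in used_set else 0.0)
                      for g in gains}
            return k, policy
    raise ValueError("no feasible water level (is avg_energy negative?)")

# Hypothetical distribution: good gain half the time, poor gain 20% of it.
pmf_example = {10.0: 0.5, 5.0: 0.3, 0.1: 0.2}
k_erg, policy = ergodic_water_fill(pmf_example, avg_energy=1.0)
avg_used = sum(pmf_example[g] * e for g, e in policy.items())
print(k_erg, policy, avg_used)
```

The returned dictionary plays the role of a policy function: a cloud-side manager could compute it from the distribution alone, while the transmitter looks up the energy for the gain it measures locally.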
1) Outage Probability and Loading for Ergodic Channels: As in [14], when the spectra/channels' energies are determined, the traditional-RRM radio node locally decides two code parameters: the constellation size $|C|$ (nominally chosen from among BPSK, 4QAM, 16QAM, 64QAM, ..., 4096QAM) and the code rate $r$ (typically, code rates are simple fractions like 1/2, 2/3, ..., $i/(i+1)$ created by puncturing a rate-1/2 convolutional code to have less redundancy).¹⁶ With reasonable code decisions (fixed gap), the water-filling spectrum decisions are independent of the code choice. When the code is capacity achieving (0 dB gap), the data rate is simply determined by the well-known $\log_2(1 + \mathrm{SNR})$ formula; but for realistic codes, the code rate and constellation size are computed independently of energy policy for each channel with constant SNR over the band. A possible local radio-node Quality of Service (QoS) objective for $r$ and $|C|$, which also uses a channel-gain threshold parameter $\gamma_0$ for optimization, is effectively equivalent to the following problem statement:
$$\max_{r, |C|, \gamma_0} \; \bar{b} \triangleq r \cdot \log_2 |C| \quad \text{subject to: } P_e < \delta \text{ and } P_{out} \le 1 - r \tag{19}$$
where, for instance on a channel with additive white Gaussian noise, the average probability of symbol error $P_e$, with average number of adjacent constellation points $N_e$, is given by Equation (20) (limited by a specified maximum tolerable level $\delta$), and the probability of outage¹⁷ is
$$P_{out} = \sum_{g \le \gamma_0} p_g \tag{21}$$
The code distance profile versus rate is known as $d_{free}(r)$ and is known for the codes allowed in the radio node. The formulation in (19)-(21) assumes the receiver correctly erases (or soft decodes) channels with $g \le \gamma_0$; decoder imperfections simply tighten this inequality constraint. The parameter $\gamma_0$ on the sum's index is chosen to satisfy both (20) and (21).
Equation (20) also admits an overall data-rate ordering $\bar{b} = r \cdot \log_2(|C|)$; the QoS optimization problem can be solved by successively testing candidates in this order of overall data rate in (19) until the performance objectives in (20) and (21) are met.
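The successive-testing search over code rate, constellation size, and gain threshold can be sketched as follows. The outage probability is taken as the total probability of gains at or below the threshold. The symbol-error expression is NOT the paper's: `placeholder_error_prob` is a hypothetical stand-in (an uncoded square-QAM nearest-neighbor approximation at the weakest used gain, ignoring the code's free distance) so that the search is runnable. All function names and example values are assumptions for illustration only.

```python
import math

def outage_probability(pmf, gamma0):
    """Probability that the gain is at or below the threshold gamma0."""
    return sum(p for g, p in pmf.items() if g <= gamma0)

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def placeholder_error_prob(pmf, energy, c_size, gamma0):
    """Hypothetical stand-in for the symbol-error constraint: uncoded
    square-QAM approximation at the weakest gain still used."""
    used = [g for g in pmf if g > gamma0]
    if not used:
        return 0.0
    snr = energy * min(used)
    return 4.0 * q_function(math.sqrt(3.0 * snr / (c_size - 1)))

def select_mcs(pmf, rates, constellations, energy, delta):
    """Test (rate, constellation) pairs from highest to lowest data rate
    and return the first (bits, r, |C|, gamma0) meeting both constraints."""
    candidates = sorted(
        ((r * math.log2(c), r, c) for r in rates for c in constellations),
        reverse=True)
    thresholds = [0.0] + sorted(pmf)  # candidate values for gamma0
    for bits, r, c in candidates:
        for gamma0 in thresholds:
            ok_outage = outage_probability(pmf, gamma0) <= 1.0 - r + 1e-12
            ok_error = placeholder_error_prob(pmf, energy, c, gamma0) < delta
            if ok_outage and ok_error:
                return bits, r, c, gamma0
    return None

# Hypothetical gain distribution and candidate code parameters.
pmf_mcs = {100.0: 0.5, 20.0: 0.3, 1.0: 0.2}
choice = select_mcs(pmf_mcs, rates=[0.5, 2.0 / 3.0, 0.75],
                    constellations=[4, 16, 64], energy=1.0, delta=1e-3)
print(choice)
```

The structure, rather than the placeholder error model, is the point: candidates are visited in decreasing data rate, and the threshold trades outage (which must be absorbed by code redundancy) against the error rate on the gains that remain in use.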
2) Nesting With Ergodic Water-Filling: Nested loading with ergodic water-filling presumes that a geometric-average channel gain is available locally (at the radio node) for each channel and for its corresponding packet and/or "time slot." Thus, the lowest level of loading is performed locally in the radio node. The ergodic decision then may simply become "use or don't use" a certain channel at a certain time, along with the energy level to use, which is based on the instantaneous measured channel gain. For a single user, this is relatively simple. Section III will progress to multiple users, where the joint probability distributions tacitly (Stage 1 ESM, see Section III) or explicitly (Stage 2 ESM, Section III) will be needed to create a useful multi-user form of ergodic water-filling. There again, a level of local deterministic water-filling underlies an overall averaging.
In ESM, the local transmitter will know only the gain for its own channels $X \in \{A, B, C, D\}$, and the LRM will know the distribution of such values, but not the instantaneous value. The LRM will provide energy-use and code-use policy guidance to the local transmitter as a function of the locally measured gain value. This policy function is $E_g$, and it essentially amounts to supplying the water-fill constant in the simple cases viewed so far.
Equation (18) can be rewritten, by defining the probability $P^*_{geo} = \sum_{g \in G^*} p_g$ that corresponds only to used resources after the optimization selects the set $G^*$, as indexed through the channel gain (or inverse gain), as
$$K_{RA} = \frac{\bar{E}}{P^*_{geo}} + \Gamma \sum_{g \in G^*} \frac{p^*_g}{g} \tag{22}$$
with the distribution on the used set $\{g \in G^*\}$ defined as
$$p^*_g = \frac{p_g}{P^*_{geo}} \tag{23}$$
The ergodic water-fill factor $1 / P^*_{geo}$ in (22) is similar to the factor $N / N^*$ in (non-ergodic) water-filling and corresponds again to only the better channels using the available energy. The energies are again determined, now indexed by $g$, as
$$E_g = \begin{cases} K_{RA} - \dfrac{\Gamma}{g} & g \in G^* \\ 0 & g \notin G^* \end{cases} \tag{24}$$
In practice, the usable range of energies is typically close to on/off, as in (non-ergodic) water-filling, because the factor $\Gamma / g$ will be small relative to the water-fill level $K_{RA}$ on the used channels with large-enough channel gains for significant data-rate transmission (and will be very large for the poor channels that cannot be used). This is because the data-rate contribution is very small when the SNR is low, and thus "edge SNRs" contribute little on most practical channels of interest. The LRM cannot know the current instantaneous $g_{geo}$ value.
For a single user, the decision of energy and coding parameters to use could be guided by the LRM through ESM's functional specification, or set, of spectra/codes for each locally measured geometric channel gain. Thus, while feedback of instantaneous g values for each dimension is impractical, the LRM knows and specifies the set $\{g_{geo}\}$ of possible values along with the associated energy use; for instance, the energy for Figure 2's channels X = A, B, C, and D, at a certain time of day in a certain location (or for a certain user), could have arisen from such a policy specification. The locally measured $g_{geo,X}$ are the inputs to the LRM's provided energy-policy function. If the average bit rate is fixed in (16), there is a corresponding dual ergodic water-filling solution for minimum average energy. ESM generalizes the concept of resource use from the fraction of used dimensions to a probability distribution: when nesting loading over many channels, $N_X / N \to p_{geo,X}$. ESM correspondingly transforms the overall SNR in (6). While these generalizations may as yet appear superfluous for a single user, they become more helpful in comprehending their alternatives in Section III's ESM multi-user case.

E. Probability Distribution Estimation
This subsection suggests some methods for single-user and multi-user channel probability-distribution estimation in two successive subsections (II-E-1 and II-E-2). The single-user distribution can be used in single-user ergodic water-filling. The multi-user distribution can be used in Section III. Time is divided into epochs that may differ from one another but are statistically consistent within each group; most basically, these epochs would be periods of nominal user behavior. For instance, a 24-hour day can be divided into 96 15-minute periods, which is common in many telecommunication maintenance systems. Each of these corresponds to certain common user behaviors. For instance, the peak-use periods in residences tend to be between 7pm and 10pm, while minimal-use periods are often 2am to 4am. Each of these 2-3 hour periods will exhibit common statistics and use, but the statistics might be quite different between the peak- and minimal-use periods. For this reason, the ergodic probability distributions would be separately estimated in these periods. These might be further separated into weekend periods versus week-day periods (and even holidays). Business-system use tends to be in the working hours of the day, and less heavy in the evenings and on weekends. Determining common probability distributions is the area known as statistical inference and is beyond the scope of this paper; the reader is referred to any of the many fine texts on this subject, for instance [20].
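As a small illustration of the epoch division described above, the following sketch maps a measurement's time stamp to one of the 96 quarter-hour periods and to a weekday/weekend label, so that separate distributions can be accumulated per epoch. The function name `epoch_key` and the two-way day-type split are assumptions for illustration; holidays and coarser groupings (such as a 7pm-10pm peak block) would need additional rules.

```python
from datetime import datetime

def epoch_key(ts: datetime):
    """Map a time stamp to (day_type, quarter_hour_index), index in 0..95."""
    # Monday is 0 in datetime.weekday(); 5 and 6 are Saturday and Sunday.
    day_type = "weekend" if ts.weekday() >= 5 else "weekday"
    # Four 15-minute periods per hour give 96 periods per day.
    return day_type, ts.hour * 4 + ts.minute // 15
```

Measurements that share the same key would then feed the same estimated distribution.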

1) Estimation of a Single Probability Distribution:
The channel gains, $g_{geo,X}$, themselves can be continuously distributed between 0 (channel is unusable) and some reasonable maximum value. The range of gain values thus needs discretization for ESM. Equation (20) provides a reasonable way to discretize the gains' range by looking at the minimum gain levels necessary at a presumed nominal transmit power spectral density and target random-error probability level (say $p = 10^{-7}$), as determined through the Q-function (see footnote 18) according to the allowed values for r |C| for the given code, where free distance is given as a function of rate for some known applied code(s) as $d_{free}(r)$. Finer resolution of gains is possible, but perhaps of diminishing value. The solutions of (19) provide the successive gain regions' endpoints, and each region is typically characterized or represented by the lowest gain value at its lower boundary. Each of these ranges can correspond to certain interference situations (different sets of other active users, for instance, as in the EIW example in Section III) or also to the channel's attenuation varying with user/environmental movement/change (or both).
For ESM, these gain ranges can each correspond to the currently measured values of $g_{geo,X}$ for different channels X that are reported to the LRM. The gains create range segments $G_{i,X}$, where $g_{0,X} = 0$. The measured set of all $\{g_{geo,X}(k)\}$ (with k an observation-interval index) for a certain channel X will have size $|G_X|$, which is the total number of measurements. Each subset will have a size $|G_{i,X}|$ that equals the number of measurements that fall in range segment $G_{i,X}$. After a sufficient number of such measurements for each channel, the gain distribution can be estimated from the set of measured gains $g_{i,X}$ for that channel as $\hat{p}_{g_{i,X}} = |G_{i,X}| / |G_X|$. Typically, the total number of observation intervals should be at least ten times larger than the number of ranges in the discrete distribution $p_g$ to ensure that distribution-estimation error is relatively small. If such distributions are computed for different times of day, then this rule should hold for each such distribution individually at its respective time of day. A good estimate-accuracy measure is that the distribution no longer changes much with additional measurements; that is, the distribution appears "ergodic." The distribution would not immediately remain ergodic if a new radio node or user is introduced that has not previously been observed. Entire textbooks, see for instance [20], have been written on the subject of statistical inference, which attempts to estimate distributions or to detect whether a distribution has changed. A simple, but perhaps not optimum, method computes the moments (such as mean and variance) of the distribution $p_{g,X}$ at each observation interval, or, in place of moments, the statistical distances between empirical distributions at each observation interval. The most commonly used statistical distances include the Kolmogorov-Smirnov statistic and the Kullback-Leibler divergence [21]. If those moments or statistical distances have changed by more than the previously observed variation of earlier ergodic values, then policy recommendation should be suspended until the new distribution stabilizes. Movement of the radio node (or a subtended device/user) can also cause such change. Such movement does not prevent an average "ergodic" distribution appearance if the movements are roughly consistent (for instance, movement down a hallway that occurs often during a certain time of day, or even movements down a roadway by vehicles using the ESM channel spectra fairly consistently). The estimation process can average the distribution over several intervals by using a sliding block of intervals that simply averages the distributions found for each of the intervals within the sliding block. Alternatively, an exponential fading window can update the distribution recursively, which exponentially and more gradually reduces the effect of old data while introducing new data with the highest weight, governed by a factor λ where 0 ≤ λ < 1 (typically close to 1). However, in many situations the distributions will be consistent at certain times/places. This will be particularly evident in indoor networks where most users/things are not frequently changing position, and/or the same positions of use are often also common at certain times of day. Consistent movements may degrade channel gains on average, but the averages may still be consistent and reliable in that degradation.

18 The Q-function is defined as $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2} \, du$, which is not integrable in closed form, but is heavily used and tabulated. $Q^{-1}(x)$ is its unique inverse function, which can also be tabulated.
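The range-segment counting, the exponentially windowed update, and a Kolmogorov-Smirnov style change check can be sketched as follows. This is a minimal illustration under stated assumptions: segments are described by their lower edges (the first being 0), and the update uses the common convention in which λ multiplies the previous estimate, which may differ from the exact recursion intended in the text. All function names are hypothetical.

```python
import bisect

def estimate_distribution(measured_gains, lower_edges):
    """Fraction of measurements falling in each range segment.

    lower_edges is sorted ascending and starts at 0.0; segment i covers
    [lower_edges[i], lower_edges[i + 1]), and the last segment is open-ended.
    """
    counts = [0] * len(lower_edges)
    for g in measured_gains:
        i = bisect.bisect_right(lower_edges, g) - 1
        counts[max(i, 0)] += 1
    total = len(measured_gains)
    return [c / total for c in counts]

def exponential_update(p_old, p_new, lam=0.9):
    """Assumed convention: lam weights the previous estimate."""
    return [lam * a + (1 - lam) * b for a, b in zip(p_old, p_new)]

def ks_distance(p, q):
    """Largest gap between the cumulative sums of two discrete distributions."""
    cp = cq = d = 0.0
    for a, b in zip(p, q):
        cp += a
        cq += b
        d = max(d, abs(cp - cq))
    return d
```

An LRM-side monitor could, for instance, compare `ks_distance` between the latest interval's estimate and the running estimate against its historically observed variation before deciding whether to keep issuing policy guidance.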

2) Estimation of a Multi-User Channel-Gain Vector:
This subsection extends the distribution estimation to the distribution of a random, U-dimensional gain vector g across all users, where U is the number of users. Such joint-distribution estimation with a reasonably large number of users (even just a few to tens) can become computationally prohibitive with Subsection II-E-1's simple range-segment counting method, because the number of measurements required for a joint distribution $p_{g_1,\ldots,g_U}$ grows exponentially with U. A table can be used to describe this joint discrete distribution. The chain rule of probability, which factors the joint distribution into the product of conditional distributions in (31), can help reduce this complexity by recognizing that any order of the users will produce the same product. Indeed, all the possible conditional probabilities of one user given any set of the others are possible, according to which users are simultaneously active at any observation interval. In (31), such an interpretation provides, upon each measurement interval, an opportunity to update the entire product (assuming some value for other users' prior conditional probabilities, and similarly for different other users' post conditional probabilities).
The computed joint distribution can be averaged with the last joint distribution computed (sliding block or exponentially windowed). The entire product can be initialized by assuming that all users are independent, or effectively that any users for which there is as yet no joint data are independent, which simplifies (31). The probability distribution can thus be initialized by the product of marginals $p_g = \prod_u p_{g_u}$, where only those users who have been active for a sufficient number of intervals are included in the product. At any point in time when a $p_{g_u}$ is reported, the LRM checks for all other active users reported at that time to form the set $U_{act}$, and this reported distribution then becomes the term $p_{g_u} \to p_{g_u/U_{act}}$ in (31). Combinations not observed simultaneously present over multiple observation intervals would have their terms eventually zeroed through equation (31). More interesting is the correlation between different users' distribution values. For instance, a certain value of channel gain for User 1 may be often (or nearly always) associated with another value for User 2. Many other combinations will be zero, corresponding to a sparse table for $p_g$ (namely, many zero or very small entries that can be assumed zero). In products like (31), various terms may have values that are non-zero (or significant) only when other users' specific channel-gain values occur, and are otherwise zero for all other combinations. This simply means that interference between users basically occurs in certain pairings, or tuples for U > 2. Essentially, the channel-gain value of one user may well suggest which other users are active and which are silent when it is observed. The set $U_{act}$ may be one of up to U! possible such sets. Each user u has a probability distribution $p_{g_u/U_{act}}$ that is a function of its gain $g_u$ and the channel-gain values of all the other users $U_{act} \setminus u$. Again, this function will be effectively zero for all but a few channel-gain vector settings. Those non-zero settings correspond to (ergodic) patterns of mutual interference.
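The sparse-table view of the joint distribution can be sketched with a counter keyed by the set of simultaneously active users and their range-segment indices. This is only an illustration of the bookkeeping, not the update rule of (31): the data layout (one dictionary per observation interval) and the function names are assumptions.

```python
from collections import Counter

def joint_distribution(observations):
    """Sparse joint table from per-interval reports.

    observations: list of dicts {user: segment_index}, one dict per interval,
    listing only the users active in that interval.
    Returns {((user, segment), ...): probability}; unobserved combinations
    are simply absent (treated as zero).
    """
    counts = Counter(tuple(sorted(obs.items())) for obs in observations)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}

def conditional(table, user, given):
    """Distribution of `user`'s segment given the other active users' segments."""
    matches = {}
    for combo, p in table.items():
        state = dict(combo)
        others = {u: s for u, s in state.items() if u != user}
        if user in state and others == given:
            matches[state[user]] = matches.get(state[user], 0.0) + p
    norm = sum(matches.values())
    return {seg: p / norm for seg, p in matches.items()} if norm else {}
```

With a handful of intervals in which a hypothetical user "u1" is alone or shares the band with "u2", the table holds only the observed combinations, and `conditional(table, "u1", {"u2": 1})` returns u1's segment distribution for the intervals in which u2 was seen in segment 1.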
The LRM will also know, for reported $g_{u,X}$, the corresponding levels of reported $E_{u,X}$. These, if changed over subsequent measurements, can be used to calibrate any such energy changes by using the derived associated set (see Section II-D) of channel transfer functions to adjust the value of any user's (or all users') $g_{u,X}$. Only the LRM needs to know these pairings, as becomes evident in Section III. These pairings will also be a function of the channels available, X = A, B, C, D, . . ., for each non-zero-probability set. For a particular user u's $g_{u,X}$ values, the corresponding values of $g_{i \neq u,X}$ will thus be known basically through the non-zero entries of the $p_g$ tabulation.

The LRM collects channel-gain values and time-correlates these individual-user values with those of other users through the non-zero $p_g$ table entries, as in Section II-E-2. These gains are collected historically with time stamps (see Section V) for each radio node's (RN's) users and subtended connections. These time stamps allow a periodicity of observed ergodicity, that is, presuming the statistical consistency occurs for certain times of day or week, while there may be different statistical consistencies at different times of the day. While it may be possible to infer joint channel-gain distributions across multiple radio nodes' connections, the Stage 1 ESM LRM finds the non-zero-probability values of $p_{g_u/U_{act}}$ and uses the corresponding sets to implement Subsection III-A's iterative water-filling process, which will produce a recommended spectrum policy $E_{u,X}(g)$ for each value of g that is communicated by the LRM to the radio node for user u. The local radio node otherwise operates mostly independently of the LRM. Individual channel-gain probability distributions may be computed to estimate the probability of data rates achieved and the corresponding energy levels that can possibly be attempted by the different user sets that can arise, as well as average values and percentile performance levels.
Subsection III-B's Stage 2 ESM more aggressively applies spectral constraints based upon joint distributions for radio nodes with sub radio nodes or across mesh networks where more severe and comprehensive interference occurs. Stage 2 ESM uses more sophisticated optimum spectrum balancing methodologies.

III. ESM STAGES
Stage 2 may also better extend to mesh situations where there are sub radio networks within a given radio node's coverage, as shown in Figure 1's middle (red) radio-node coverage. Stage 2 in its full form would result in more complicated multi-user functional guidance to radio nodes. However, Subsection III-B develops methods to simplify the guidance to the same level as Stage 1 in the context that Stage 2 solutions always select mutually exclusive channel-use patterns. This is a quasi-distributed form of null-space steering methods, as described in [22] and [23], but here in ESM for systems without instantaneous central control of all radio nodes. Stage 3 represents a higher-level ability for a neighborhood of radio nodes' spectra use to be additionally well synchronized and coordinated, based again on ergodicity and not instantaneous conditions. Stage 3 Vectored ESM guides and improves RRM across a group of radio nodes that otherwise were individually optimizing within their own limits. Typically, a Stage 3 system may have radio nodes and devices that have many antennas and can follow (phase lock to) a common symbol clock accurately; their spectra, space, and time use can be yet better coordinated than in Stage 1 or Stage 2. Such methods would be well suited to DAS [24], 5G-DSL [25], and/or CoMP/FeICIC [26] methods. Figure 3 illustrates management-information flows in an ESM ecosystem of any stage. The system has physically separate radio nodes that may only coordinate (for ESM purposes) indirectly through the cloud-based LRM. One element of the parameter vector θ is the throughput (data rate) of the user connection, which is how the spectral choices of this section (using the generalization of Section II's loading methods) are linked to the modulation and coding parameters optimized for QoE in Section IV.
The information provided by the radio nodes to the LRM consists of the channel gains for any/all subtended connections to devices (or sub radio nodes), any measured interference transfer gains/phases (only Stage 3 ESM), and QoS parameters like times of use, outages, packet errors, previously achieved data rates and corresponding conditions (see Sections IV and V), all possibly time-stamped. The ESM control information provided to the radio nodes consists of policy functions, to be considered for use by the radio node. The inputs for these policy functions are the future local-radio-node channel gains measured (or derived, see Section II-D) instantaneously. As this section illustrates, such guidance can lead to improved performance when certain ergodic consistencies are present. The radio node may always overrule the guidance if an obvious fault would occur, and simply report such actions to the LRM.

A. Stage 1 - A Form of Ergodic Iterative Water Filling (EIW)
Iterative Water-filling (IW) is for multiple users who each simultaneously practice single-user water-filling in shared channels. It is a deterministic method to reduce the mutual interference between the users. Figure 4's flow chart outlines IW [27]. The user index is u = 1, . . . , U, where U is the number of users. IW is indirectly a function of all the users' gains $\{g_{u,X}\}_{u=1,\ldots,U}$, which enter IW's energy loading through the "noise" that includes other users' interference in the same band (presuming the other users' interference cannot be cancelled). These gain values can be partitioned into the discrete subsets of Section II-E-1. In practice, these gains are measured by the wireless radio nodes' equipment and reported to the LRM through low-bandwidth cloud/internet feedback. The LRM determines which gain sets mutually correspond to non-zero probability and, for each user in such a set, the corresponding water-fill spectra for specific channel-gain values. The values representing these non-zero-probability sets can be used in the LRM's IW calculations. There will consequently be a delay in reporting a channel-gain value to the LRM, so only the specific user device and radio node will know the current instantaneous gain value. The LRM, however, can compute the distribution from reported values (as in Section II-E) and find the mutually active sets to be used in IW. Channel gains can be locally measured (see Section II-C) in the radio node before being reported to the LRM. Iterative water-filling is not always guaranteed to converge, although there are numerous cases where it can be mathematically proven to converge and many others in which mild conditions are necessary for convergence [28]-[30].
The convergence point need not be optimum in all these cases, but it usually is an improvement over all the users attempting to use all the bands, or all the users attempting to avoid one another completely (using collision detection or other fixed assignments of users to channels), as this section's example will illustrate.
Various improvements [31]-[33] to IW have been proposed, but they increasingly require knowledge of the exact inter-user interference-filtering transfer functions (or their equivalents), while IW implicitly measures those users as part of the noise in the denominator of g (or through interference's impact on the measured probability distribution). Iterative water-filling can essentially be computed in a nearly distributed fashion where each user's transmissions simply water-fill against the others' sensed interference. However, usually the data rates for each user, as in the components of the vector of different users' data rates $b = [b_1 \ldots b_U]$, are fixed, and then all users implement energy-minimization (MA) water-filling, which tends to prevent any user's data rate from being zeroed in favor of the rest. This fixing of the data-rate vector and imposition of an energy-minimization criterion at that data rate is a form of "central control," so there is, even in IW, some degree of central control, and IW is then not completely distributed. In ESM, this control is in the LRM. In EIW, all the users' water-fill computations are performed (essentially simulated) in the LRM, based on the sets of non-zero-joint-probability channel gains that are also computed in the LRM from reported (and delayed) values of past $g_{u,X}$. Functional guidance is then returned to the radio nodes and their subtended devices. Figure 5 illustrates iterative water-filling's incremental actions. Water-filling resource energization appears for 5 channels (A, B, C, D, and E). User 1 initially water-fills with User 2 not present. This creates the interference shown for User 2, who then attempts to water-fill. Progressing (downward in Figure 4), User 2 now water-fills on Channels B, C, and E, which then creates interference to User 1. This will manifest itself as lower g values, particularly for Channel C, and thus a higher probability of low g values in Channel C's probability distribution.
User 1 then proceeds to water-fill a second time knowing that Channel C is not good with high probability so less energy goes there. Correspondingly, this means less interference on Channel C into User 2, who then sees higher g values and loads more energy into Channel C.
The energy-minimizing dual water-filling form is particularly effective as long as the two data rates selected for the two users are feasible (each with a water-fill solution relative to the other). This is equivalent to a two-user game in which each user can do no better by additional changes, sometimes known as a Nash Equilibrium [34]. The following simple example illustrates the LRM's potential use and guidance to two radio-node users with IW. Calibration of changing $\{E_{i \neq u,X}\}$ is executed as necessary in the IW steps. Figure 6's example has two users who each can use both of two channels with different gains (User 1 has attenuation corresponding to a "far" or longer-length channel, while User 2 is a "near" or shorter-length channel). Both frequency bands A and B have the same gain on both channels (so they are likely close in terms of carrier frequencies). However, the interference between them is somewhat different. The parameter a is initially set at 0.1, but will later be revisited at 0.9 to illustrate some effects. The noise is zero-mean white, uncorrelated between the two users, and has variance 0.1. Each user is allowed 2 units of energy to be allocated to channels A and B. Table III illustrates the iterative water-filling process for the case of a = 0.1.
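The rate-adaptive form of iterative water-filling can be sketched generically as below, under the assumption that each user's effective gain on a channel is its direct power gain divided by noise plus the other users' received interference. The cross-gain values used in the usage note are hypothetical placeholders and are not Figure 6's values, so the sketch does not reproduce Table III; the function names and data layout are likewise illustrative.

```python
def water_fill(g, budget):
    """Single-user water-filling over effective gains g (gain over noise-plus-interference)."""
    order = sorted(range(len(g)), key=lambda i: g[i], reverse=True)
    energy = [0.0] * len(g)
    for n in range(len(g), 0, -1):
        used = order[:n]
        K = (budget + sum(1.0 / g[i] for i in used)) / n   # water level
        if K - 1.0 / g[used[-1]] > 0:                      # worst used channel stays positive
            for i in used:
                energy[i] = K - 1.0 / g[i]
            break
    return energy

def iterative_water_fill(direct, cross, noise, budget, sweeps=50):
    """direct[u][x]: power gain of user u on channel x.
    cross[v][u][x]: power gain from user v's transmitter into user u on channel x.
    """
    U, N = len(direct), len(direct[0])
    E = [[0.0] * N for _ in range(U)]
    for _ in range(sweeps):
        for u in range(U):
            # Other users' energies appear as extra noise for user u.
            g = [direct[u][x] / (noise + sum(cross[v][u][x] * E[v][x]
                                             for v in range(U) if v != u))
                 for x in range(N)]
            E[u] = water_fill(g, budget)
    return E
```

With direct power gains of 0.25 ("far" user) and 1.0 ("near" user) on both channels, noise 0.1, 2 units of energy each, and a hypothetical strong cross gain from the near user into the far user on the second channel only, the far user ends up placing all of its energy on the first channel while the near user keeps both, which is the qualitative behavior discussed for this kind of near/far situation.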

1) Example - 2-User IW Versus Contention Protocol:
The data rates reflected in Table III are continuously flowing (streaming) for both users; there is no contention, even though channel A is occupied by both users as in Figure 6. This IW example illustrates that User 1 zeros Channel B, a quasi-frequency-division-multiplexing-like solution. However, User 2 always uses both channels since it is the "near" channel, while the far channel (User 1) yields to the near channel on the band for which it performs worse (band B). For a symmetric channel with a = 0.9, the second step would lead to a fully FDM channel, with User 1 using only band A and User 2 using only band B. Stage 1 ESM methods may often instead exploit a sufficient symmetry between channels: when a larger inverse gain is evident, the energy moves to a beneficial channel split accordingly, with each user occupying one channel exclusively. This may not be optimal, but can provide an acceptable solution when both users are heavily active.
As an alternative for the case of a = 0.1, a contention protocol on this channel operating continuously for fair comparison might initially attempt to transmit User 1 for one-half the time and User 2 for the other half. This would have no interference. The corresponding contention-avoiding protocol's data rates are (see footnote 19) $b_1 = 0.5 \cdot 2 \cdot \log_2(1 + 10 \cdot 0.5^2) = 1.81$ and $b_2 = 0.5 \cdot 2 \cdot \log_2(1 + 10 \cdot 1) = 3.46$ (33), and thus a sum of 5.27 < 5.29 (which it has to be, since the iterative water-filling considered this solution). However, for such always-on transmission, the effect of retransmission when contention might occur has been ignored, as would arise if data were arriving randomly from the two users. Indeed, if both users desire access 1/2 the time, the contention protocol will fail and the data rate zeros. However, the IW solution clearly handles this case. Thus, if IW were feasible, this example illustrates that IW would be much better than collision detection when channel use is heavy. An alternative comparison could assume that User 1 and User 2 simultaneously transmit data only 10% of the time. In this case, Collision Detection (CD) will function properly with data rates of $b_{CD,1} = 0.9 \cdot \log_2(1 + 10 \cdot 0.5^2) = 1.6$ and $b_{CD,2} = 0.9 \cdot \log_2(1 + 10 \cdot 1) = 3.1$. The rate sum, now considering the efficiency related to retransmission, is $b_{CD,tot} = 1.6 + 3.1 = 4.7$.
The Ergodic IW at 5.3 bits would in this case only apply 10% of the time, while the remaining 90% of the time would transmit the nominal CD (or otherwise) sum of 5.2 bits. The LRM policy guidance to Users 1 and 2 would be to transmit the water-fill solution in Table III if the interference is non-zero, and otherwise to use equal energy in both bands because there is no interference. The average remains roughly 5.3 bits (see footnote 20), and the gain of IW (DSM) over collision detection is 13%. As the wireless system use increases, the probability of collision increases, and the IW advantage would increase to be infinite at the point where the full throughput of both channels were used by IW. Again, IW, while better, is NOT optimal, and better solutions are possible. It just does better than collision detection, as is evident in this example.

19 Half the time but 2 channels corresponds to the factor 0.5 · 2 in Equation (33), and there is no interference with this time-multiplexed scheme.

20 It is somewhat of a coincidence in this example that the average data rates for the cases of interference and no interference are almost equal, and it is not true in general that these two different data rates will be the same.
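The arithmetic of this comparison can be checked with a few lines. The sketch assumes, consistently with the $10 \cdot 0.5^2$ and $10 \cdot 1$ terms in the text, amplitude gains of 0.5 (User 1) and 1 (User 2), noise variance 0.1, and one unit of energy per channel while transmitting; the variable names are illustrative. The 13% figure is reproduced from the rounded values 5.3 and 4.7 quoted in the text.

```python
import math

noise = 0.1
amp_gain = {1: 0.5, 2: 1.0}      # inferred from the 10*0.5^2 and 10*1 terms

def two_channel_rate(h, time_fraction):
    """Rate over 2 channels at unit energy each, active a fraction of the time."""
    return time_fraction * 2 * math.log2(1 + h ** 2 / noise)

# Time-multiplexed (contention-avoiding) case: each user has half the time.
tdm = {u: two_channel_rate(h, 0.5) for u, h in amp_gain.items()}
tdm_sum = sum(tdm.values())

# Collision detection with 10% simultaneous demand: 0.9 efficiency factor.
cd = {u: 0.9 * two_channel_rate(h, 0.5) for u, h in amp_gain.items()}
cd_sum = sum(cd.values())

# Gain of IW over CD using the rounded sums quoted in the text.
iw_gain_percent = 100 * (5.3 / 4.7 - 1)

print(round(tdm_sum, 2), round(cd[1], 1), round(cd[2], 1), round(iw_gain_percent))
```

The unrounded CD sum is slightly above 4.7, so the percentage gain depends mildly on where rounding is applied.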
For Stage 1 ESM water-filling, the LRM needed only to know the joint occurrence of certain sets of channel gains for the different users. This was tacit in assuming that the iterative water-filling procedure could be simulated in the LRM; thus, that LRM process knew the channel gains from the other users. This means the LRM has previously observed situations where every other user's individual interference into a current user was viewed for a known transmit power level, and no other users were present. This would be evident from the multi-user distributions estimated, for instance, as described in Subsection II-E-2.
The authors also investigated a direct iterative use of ergodic water-filling, which a curious reader might also be tempted to construct. This, however, degraded performance in the situations tested. The main issue was that the lack of a joint distribution leads the individual ergodic water-filling instances to be averaged over each single user's distribution, independent of the joint distribution. This can lead to a non-zero probability of significant interference between users. The preferred version of Ergodic Iterative Water-filling here instead uses water-filling according to the joint distributions of non-zero probabilities of joint channel occupancy with significant interference, but provides policy guidance as a function of the channel gains. Those channel gains (when measured locally) will be different for the situations where a channel's users are simultaneously active, and the policy then anticipates and exploits this in the form of EIW presented here.

B. Stage 2 - Optimal Spectrum Balancing
For deterministic channels, the optimal multi-user spectra selection is well known (without any interference cancellation permitted) as Optimal Spectrum Balancing (OSB) [31]. The admissible range of all users' data rates is found by maximizing the weighted data-rate sum ($\phi_u$ are the weights) subject to an energy constraint on each user (the interference from other users enters through the term $g_{X,u}$): $\max_{\{E_{X,u}\}} \sum_{u=1}^{U} \phi_u \cdot b_u$ subject to $0 \le \sum_X E_{X,u} \le E_u$, $u = 1, \ldots, U$ (36). The constraints in Equation (36) can be relaxed to a single constraint on the total energy summed over all users u = 1, . . . , U, which may correspond better to effective radiation limits in wireless antenna systems. OSB therefore outer bounds the data-rate combinations that IW can achieve. IW can only match, at best, OSB. Margin-adaptive IW can pick a rate vector for all the users, $b = [b_1 \ldots b_U]$, and attempt to achieve this rate tuple by minimizing energy for each user. However, such a point may also not be a best operational point for a given amount of maximum energy for each user, or for the total-energy constraint. OSB's vector of data-rate weightings $\phi = [\phi_1 \ldots \phi_U]$ can adjust the influence of different users. (Stage 1 IW essentially arbitrarily assigns these weights.) The achievable outer bound of rate tuples corresponds to tracing the region for all possible non-negative weightings $\phi \ge 0$.
OSB's solution forms the Lagrangian in (37) as the sum $L = \sum_u L_u$, defining $L_{X,u} = \omega_u \cdot E_{X,u} - \phi_u \cdot b_u$ and $L_u = \sum_X L_{X,u}$. The energy-constraint Lagrangian vector $\omega = [\omega_1 \ldots \omega_U]$ (which reduces to a scalar ω for the total-energy-constrained problem) can also be viewed in the above-mentioned MA dual problem that fixes a rate vector b and minimizes a weighted energy sum using these non-negative weights. The OSB algorithm discretizes the energy range with some ΔE into $M = \max_u E_u / \Delta E$ energy values and recognizes the separability over the channels to optimize individually each of the $L_{X,u}$ terms over the $|X| \cdot U \cdot M^{|X| \cdot U}$ possible energy values (see footnote 21) for any given weight vectors ω and φ. The calculation of the possible interference transfers indeed requires a U × U tensor generalization (each matrix element is viewed as a function with |X| input/output mappings) of the channel gain from the vector $g = [g_{1,X} \ldots g_{U,X}]$ to a matrix G. Calculation of the OSB solution is known to be complex (NP-hard). The maximum in (36) and (37) then sums the terms in L when the best vectors have been found. An OSB implementation (slow converging but simple to describe) is a gradient-descent iteration (for the RA problem of maximum weighted rate sum for given θ), with $E = [E_1 \ldots E_U]$ and $E_X = [E_{1,X} \ldots E_{U,X}]$, so that each energy is a scalar function of the frequency bands indexed as X, and with α a positive "step-size" constant. Similarly, a corresponding iteration applies for the energy-minimization (MA) problem with fixed energy-weight vector ω and known admissible/feasible target rate vector b, where γ is another positive "step-size" constant. Such a solution requires large complexity and also would need each radio node to know the channel gains of other radio nodes (physically impossible if required on an instantaneous basis). However, Stage 2 ESM can be considerably simplified with some limits on the search that also allow local guidance to be a function only of local instantaneous values, as a revisit of Figure 6's example now shows.
A neural-net-based machine-learning method by Sun et al. [35] has been found in simple cases to approximate OSB well.

21 The factor $|X| \cdot U$ corresponds to summing U interference components for each gain calculation in each of the |X| bands, while the $M^{|X| \cdot U}$ factor corresponds to all the possible discrete energy combinations that could create $g_{u,X}$ values in computing $b_u$.
1) Example Revisited: Revisiting the previous example readily determines that a solution of $E_{1A} = 2$ and $E_{1B} = 0$, with instead $E_{2A} = 0$ and $E_{2B} = 2$, yields the data rates of $b_1 = 2.6$ and $b_2 = 4.4$ (or a sum of 7 bits). A careful check of User 2's least significant bits would reveal that User 2 has a slightly higher data rate in the first instance of this example. User 1 of course does much better with this frequency-division-multiplexed (FDM) solution, which would also be produced trivially by OSB for some appropriate choice of the weight vector θ. Indeed, OSB is a function of this vector. OSB does not always produce an FDM solution, because it depends on the weight vector. The sum data rate is higher, and User 2 has essentially the same data rate while User 1 is much improved. The rate sum is 32% higher, while User 1 is 292% better. The guidance in this situation would be as simple as "User 1, use Channel A," and "User 2, use Channel B, when interference is present." (This might be a solution guessed by a designer without all the theory, but it helps illustrate the otherwise intimidating OSB math.)

2) Orthogonal Dimension Multiple Access (ODMA) Constraints: OSB solutions often exhibit a strong orthogonal-dimension-multiple-access (ODMA) character (see footnote 22), in which each user uses a set of channels that is mutually exclusive of the other users' channels, particularly for some choices of the user weights. The ODMA solutions often occur when there is an asymmetry in crosstalk between users, sometimes called a "near/far" or "strong/weak" situation. In such situations, iterative water-filling tends to over-emphasize the stronger user since the criterion tends to favor all users equally and has no weighting. Instead, OSB will, with most weightings that do not zero the strong user's effect, tend to allow the weak users to obtain a minimum data rate.
Prior work [29] has found that when OSB has a significant advantage over IW, the better OSB solution is almost always based on orthogonal dimensional multiplexing. A simpler design would therefore largely use ESM Stage 1 EIW unless a significant channel-gain difference is evident between users with non-zero joint probability of occurrence, in which case an orthogonal-division multiplexing solution is best to use. Neither IW nor OSB is optimum in general, where "NOMA" (Non-Orthogonal Multiple Access, see [9]) solutions can be used, of which ESM Stage 3 is a special case, as in Section III-C.
ESM Stage 2 in practice would have the LRM search all possible ODMA solutions. If the number of channels is $|X|$, then each user could have $2^{|X|}$ possible band choices. For $U$ users, this becomes $2^{|X| \cdot U}$ searches if equal energy were assigned to each channel. If there were $M$ energy choices for each channel, then this becomes $M^{|X| \cdot U}$, so the order of computation is the same as OSB. However, the guidance for the ODMA solutions can follow the same format as ESM Stage 1 with one exception: certain different (singular) sets of active users could produce the same channel gain for the same victim user. The LRM needs to consider this in its calculations and provide the worst-case FDM solution for such situations.

Footnote 22: Frequency Division Multiplexing (FDM) is a simple form, but the channels may also be in space.
Various OSB simplifications, like ISB [36] and SCALE convex bounding [32], approximate basic OSB well with faster-converging algorithms, as do Multi-Level Iterative Water-filling solutions [33], [37]. All of these would need similar modifications to restrict the search to ODMA solutions only, and all would also correspond to Stage 2 ESM. It is possible that Stage 1 would outperform Stage 2 simply because Stage 1 is less restrictive: it only specifies functional guidance to each radio node rather than ergodically imposed ODMA constraints. However, the LRM would know this and simply provide the Stage 1 style guidance. Indeed, the radio node in this case would not know whether it was being operated by a Stage 1 or Stage 2 LRM.
Stage 2 ESM is particularly pertinent for "mesh networks" that have sub radio nodes within a node. In such systems, the sub nodes act as relays and thus correspond to 2 users (one receiving and one retransmitting) on different channels. An LRM operating for a single radio node with sub nodes would find the optimal ODMA solution among all the FDM solutions for the mesh. It is feasible to consider also solutions where Stage 1 is used between radio nodes and Stage 2 is used within the node's mesh.
Subsection III-B-4 introduces an ODMA-specific algorithm that greatly simplifies the search and is essentially a discrete form of the "multi-level water-fill" algorithms in [37], [33]; it remains considerably simpler than the neural network solution in [38].

3) Example of 4-Band 2-User Complexity:
Another simple example illustrates the rapid growth of Stage 2 complexity. Returning to Figure 2, two users will divide the 4 channels between them. For this example $b_1 = b_2 = 4$. If integer bits are allowed, the number of energy levels on any channel cannot exceed $M = 5$ (zero plus the energy to transport 1, 2, 3, or 4 bits on that channel). The maximum number of possibilities to search therefore cannot exceed 5 levels for each of the 4 channels for 2 users, or $(5 \cdot 5 \cdot 5 \cdot 5)^2 = 5^8 = 390625$ possible OSB spectrum choices. However, this maximum can be reduced because each user's bits must add to 4. If one user places all 4 bits on 1 channel, there are 4 choices for that user (one for each channel). If that user instead places 3 bits on one channel, then the remaining bit must go on one of the other 3 channels, leading to 12 more choices. If that user places 2 bits on one channel and 2 bits on another, there are 6 distinct choices. If that user places 2 bits on one channel and 1 bit on each of two of the remaining channels, there are 3 ways to choose those two channels for each of the 4 choices of the 2-bit channel, so a total of 12. The number of possibilities so far is 4 + 12 + 6 + 12 = 34. The last combination of 1 bit on all 4 channels leads to 35. The total number of combinations is then 35 · 35 = 1225 for two users that each have 4 bits.

The ODMA restriction reduces the complexity further, to 150. If each user uses 2 channels, the computational complexity is $\binom{4}{2} \cdot 5 = 30$ (5 arises from the possible bit distributions across the two channels, which are 40, 31, 22, 13, 04). If User 1 uses 1 channel and thus User 2 can use up to 3 channels, there are 4 · 15 = 60 possibilities, where 4 is the number of possible channel selections of User 1 and 15 is the number of possible bit distributions for User 2 (400, 310, 301, 220, 211, 202, 130, 121, 112, 103, 040, 031, 022, 013, 004). The reversal of User 1 to 3 channels and User 2 to 1 channel gives another 60 possibilities by symmetry. The [1111] combination for one user is not possible because the other user then cannot get bandwidth for 4 bits (or any). Therefore, the total complexity reduction is from 35 · 35 = 1225 to 150. However, as the number of bits (and therefore energy-level possibilities) increases, for instance to the 6 possible SQ QAM choices of LTE and Wi-Fi, the necessary computation rapidly rises (for 4 channels) to nearly $2^{50}$. This example illustrates that for a small number of users, it is possible to compute an ODMA solution fairly easily, but the alternative in Section III-B-4 allows a less complex algorithm to address large numbers of users.
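The counts in this example can be checked by brute-force enumeration. The following is a minimal sketch in Python; the helper name `bit_allocations` is illustrative and not from the paper, and the ODMA figure simply reproduces the tally as it is counted in the text (30 + 60 + 60).

```python
from collections import Counter
from itertools import product
from math import comb


def bit_allocations(total_bits, num_channels):
    """Every way to place `total_bits` integer bits on `num_channels` channels."""
    return [alloc
            for alloc in product(range(total_bits + 1), repeat=num_channels)
            if sum(alloc) == total_bits]


per_user = bit_allocations(4, 4)  # one user, 4 bits, 4 channels

# Group allocations by bit pattern, e.g. (3, 1) means 3 bits + 1 bit.
patterns = Counter(tuple(sorted((b for b in alloc if b), reverse=True))
                   for alloc in per_user)

unconstrained = (5 ** 4) ** 2          # M = 5 levels, 4 channels, 2 users
rate_constrained = len(per_user) ** 2  # each user must carry exactly 4 bits

# ODMA tally as counted in the text: 2+2 channel split, then 1+3 and 3+1.
two_two = comb(4, 2) * len(bit_allocations(4, 2))
one_three = 4 * len(bit_allocations(4, 3))
odma_tally = two_two + 2 * one_three

print(dict(patterns))
print(unconstrained, rate_constrained, odma_tally)
```

The pattern counts come out as 4, 12, 6, 12, and 1, matching the case-by-case argument above.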
4) An Implementation of the Stage 2 ODMA Algorithm: The Stage 2 compatible radio node can provide the LRM with an indication of its volume of use for an observation interval. (Again, this indication can be indexed by time of day, peak periods, off-peak periods, etc.) The LRM can compute the data volume for each user over a given normative time/observation period. That volume will be called $V_u$, essentially an average throughput measure: it is the number of bits/bytes transferred in an observation interval. The LRM orders the channel gains for each user across the channels $X$ from largest to smallest. The users are ordered from largest to smallest $V_u$. This algorithm is somewhat greedy (serving the users with the greatest volume of need), but those needs can increase if a user receiving little channel assignment therefore begins to see greater average volume need. The algorithm's complexity is basically $U \cdot |X|$ (essentially on the order of the Stage 1 IW approach). To determine transmit-energy policy, the algorithm essentially creates a water-fill problem with different water levels for the different channels used by any particular user who uses more than 1 channel. The channels with zero energy for any particular user have low water levels, while used channels have a higher water level, emulating [33], [37]. This will be considerably simpler than the methods in [38], although the latter are intriguing and might be modified to include simultaneously Subsection II-E's joint $p_g$ estimation.
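The two orderings described above (users by $V_u$, channels by gain) can be sketched as follows. The text does not spell out the assignment rule itself, so this sketch assumes a simple round-robin pick: in volume order, each user takes its best channel that is still free. The function name `greedy_odma`, the input layout, and the example numbers are all hypothetical, and the subsequent multi-level water-fill step is not included.

```python
def greedy_odma(volumes, gains):
    """Assign each channel to exactly one user (an ODMA partition).

    volumes: {user: V_u}, data volume per observation interval.
    gains:   {user: {channel: gain}}, per-user channel gains.
    ASSUMPTION: round-robin picks in descending-volume order; the paper
    does not state the exact assignment rule.
    """
    users = sorted(volumes, key=volumes.get, reverse=True)
    # Each user's channels ordered from largest to smallest gain.
    prefs = {u: sorted(gains[u], key=gains[u].get, reverse=True) for u in users}
    free = set()
    for u in users:
        free.update(gains[u])
    assignment = {u: [] for u in users}
    while free:
        progressed = False
        for u in users:
            for ch in prefs[u]:
                if ch in free:
                    assignment[u].append(ch)
                    free.remove(ch)
                    progressed = True
                    break
            if not free:
                break
        if not progressed:
            break
    return assignment


# Hypothetical 2-user, 4-channel example.
volumes = {"u1": 9.0, "u2": 3.0}
gains = {
    "u1": {"A": 4.0, "B": 1.0, "C": 2.0, "D": 0.5},
    "u2": {"A": 3.0, "B": 2.5, "C": 0.2, "D": 1.0},
}
assignment = greedy_odma(volumes, gains)
print(assignment)
```

A volume-proportional rule (more channels to users with larger $V_u$) would be an equally plausible reading; the round-robin choice here only keeps the sketch short.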

5) Ergodic OSB:
An ergodic form of OSB might, however, also be considered. Ergodic OSB only guarantees optimality (in the absence of any Stage 3-like interference cancellation) with infinite buffer-scheduling delay and truly ergodic statistics. Nonetheless, this paper provides it for completeness because it has not appeared elsewhere (to the best of the authors' knowledge). The complexity of the joint averaging and of the algorithm renders it likely impractical, so apart from presenting the result, this interesting theoretical algorithm is not pursued further in this paper.
Ergodic OSB uses the joint probability distribution $p_{g,X}$ for the random vector of channel gains in each band $X$, and becomes (or a total energy constraint) where averages over the joint probability distribution's marginal distributions for each of the users, $p_{g_u,X}$, are (found by summing over all the other users' possible gain values): presumably pre-calculated and stored, requiring $|G|^U$ calculations. The Lagrangian terms adjust to

$$L_{g_u,X} = \omega_u \cdot p_{g_u,X} \cdot E_{u,X} - \phi_u \cdot p_{g_u,X} \cdot \log_2(1 + E_{u,X} \cdot g_{u,X})$$

with $L_u = \sum_{g_{u,X} \in G_{u,X}} L_{g_u,X}$ ($\omega_u = \omega$ with a total energy constraint), and then Equation (37) remains the same. The energy range is similarly partitioned into $M$ discrete levels, and the complexity then becomes $|G| \cdot M$ calculations (footnote 23) for each term and then adding $|G|$ of these maxima together for each index in $L_u$, so then $|G|^2$. EIW's complexity might appear less than IW, but the large calculation burden shifts to the large computation amount $|G|^U$ for the probability distribution in (41). The gradient search steps adjust to or for the MA case More sophisticated search/descent methods than the slowly converging gradient can be used, and OSB methods are notorious for high complexity and a variety of numerical-precision problems. Nonetheless, the basics illustrate Ergodic OSB.
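The per-term search over $M$ discrete energy levels can be sketched as follows. This is only the inner step for one user on one channel, using the Lagrangian term as written above and assuming that each term is minimized (equivalently, its negative is maximized). The function name, the example distribution, and the values of $\omega$ and $\phi$ are hypothetical; the outer gradient updates and the computation of the marginal distribution are not shown.

```python
import math


def best_energy_per_gain(gain_pmf, energy_levels, omega, phi):
    """Pick, for every gain value g, the discrete energy E that minimizes
    omega * p * E - phi * p * log2(1 + E * g), and return the policy
    together with the summed term L_u.

    gain_pmf: {gain value g: probability p} (marginal for one user).
    Cost is |G| * M evaluations, as in the text.
    """
    def term(p, g, energy):
        return omega * p * energy - phi * p * math.log2(1.0 + energy * g)

    policy = {}
    total = 0.0
    for g, p in gain_pmf.items():
        best = min(energy_levels, key=lambda energy: term(p, g, energy))
        policy[g] = best
        total += term(p, g, best)
    return policy, total


# Hypothetical marginal gain distribution and energy grid (M = 4 levels).
gain_pmf = {0.5: 0.25, 2.0: 0.5, 8.0: 0.25}
policy, l_u = best_energy_per_gain(gain_pmf, [0, 1, 2, 3], omega=1.0, phi=2.0)
print(policy, l_u)
```

With these numbers, larger gains receive larger energies, which is the expected water-filling-like behavior for a fixed energy price $\omega$.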

C. Stage 3 -Vectored ESM
Stage 3 Vectored ESM allows spatial interference cancellation through some additional coordination of multiple radio nodes' multiple-antenna systems. Stage 3 ESM essentially configures the multiple antennas to provide signal separation without performing real-time adaptation. ESM Stage 3 attempts to capture consistent spatial patterns in adaptive instantaneous edge RRM that, for instance, is well addressed in [39] and [40]. Such spatial division multiplexing also occurs with Massive MIMO systems; see, for instance, [38] and [24]. Each ESM Stage 3 radio node has multiple antennas for at least downlink transmit and for uplink reception. The devices in each radio node's cell (same "color" as the radio node, and therefore the same used channels) can have one or more antennas. The Stage 3 ESM radio node has more antennas than current devices simultaneously in use, and ideally the number of such radio-node antennas significantly exceeds the total number of users, $L \gg U$. ESM's Stage 3 therefore depends on massive MIMO's [41] presence in the radio node. This subsection focuses on a radio node with $L \gg U$ antennas, and $U$ devices with 1 or more antennas each. However, the math will be for the case of a single antenna per device/user. Extension to more device antennas is notationally tedious but otherwise straightforward.

Footnote 23: $|G|$ is the maximum number of gain segments for any user over all used channels $X$.
The ergodic cloud-managed portion of Stage 3 ESM relates to the consistency of spatial directions for either nulling (to reduce other users' interference in reception by an antenna array) or directional energy focusing (to allocate more power in the direction of the user in transmission from an antenna array). The user positions may be relatively constant, as might readily occur when, say, one user usually uses a laptop at a desk for certain hours of the day while another user's television set is in a particular location at those same times. The spatial-array settings for the corresponding desired directional nulling and focusing then will not vary relative to each other if the radio nodes have a common clock. Movement of a cellphone user in a certain area of call receipt/generation also might well have most probability in certain locations. These will be reflected in the probability distributions generated. Small spatial-position variations of the exact transfer functions can be centered in clusters based on the average position, or equivalently the average transfer function measured. This section's ESM Stage 3 then exploits these ergodicities in spatial-interference cancellation, presuming a common clock.
Within any node, multi-user MIMO (MU-MIMO) methods are well established, dating to their first uses with independent energy allocation in 2001 [40]; these methods also make use of the diagonal dominance that will also occur for $L \gg U$ in wireless applications. MU-MIMO methods benefit from the nodes' learned knowledge and coordinated management of all downlink transmissions or alternately from learned co-processing at a single point for all the uplink signals. These methods make use of Generalized Decision Feedback Equalizers or their dual generalized precoder forms, and have essentially optimal multi-user performance on vector broadcast and vector multiple-access channels [15] (Chapters 13 and 14). Again, they all require centralized control at the radio node.
ESM Stage 3 assumes that precoded interference cancellation of other radio nodes' signals is not possible (because those signals are not available, unlike the MU-MIMO/vectored case). ESM differs from LTE's Coordinated Multipoint Transmission (CoMP) [1], which operates with a smaller number of antennas and physically coordinates separated radio nodes at the instantaneous transmit-signal level (footnote 24). Similarly, no individually controlled post-coded subtraction of another radio node's user interference is (generally) possible for a receiver, because it may not have access to, nor be able to decode (itself), that radio node's user signals. However, if a radio node has enough "extra" antennas, it is possible to exploit these extra dimensions spatially and linearly such that the radio nodes jointly steer downlink to each other's "null space" (or null/notch uplink). When centrally coordinated by a single radio node, this can be very effective; it is the rationale behind Massive MIMO and is nicely addressed in [42]. Enough extra antennas usually means that the number of antennas significantly exceeds the total number of users, although the exact excess needed depends on the link. The greater the excess, the more flexible are the possibilities for steering and acquiring without explicit need for other radio nodes' user signals (unlike optimal MU-MIMO or the GDFE ([15], Chapter 5)). Reference [43] shows that 2x to 3x the number of users is usually sufficient for the number of antennas to be considered large.
ESM Stage 3 again presumes radio nodes' symbol synchronization. This can be achieved by radio nodes' use of a common inferred clock through multiple methodologies beyond this paper's scope, but Section V provides some suggestions. Better synchronization implies greater spatial accuracy.
1) Vector Channel Models: To understand ESM vectoring, a deterministic channel model is first summarized. For the deterministic model, a prescient controller might theoretically have access to Figure 7's large channel-gain matrix $H$ and non-user noise autocorrelation matrix $R_{nn}$ such that a vector of all channel outputs' responses $y$ to all users' inputs $x$ follows the vector model

$$y = Hx + n. \quad (44)$$

This spatial model applies to each tone within a channel in a synchronized ESM Stage 3 system. The spatial time-ergodicity then essentially presumes slow movement/time variation within the environment. Essentially this implies $E[H] \to H$.
The gains matrix $G$ would be $G = E[H^* \cdot R_{nn}^{-1} \cdot H]$ and would have within it all interfering paths specified in terms of each's channel gains to all others. The LRM can compute the average, which distinguishes it from the nominal real-time-computed MU-MIMO spatial processing that often occurs within the radio node if all the antennas were to be connected to that node. In ESM, the antennas can be distributed and the averages are thus used without need of all antennas' connection to the same node(s). Equation (44) with these constraints is known as the vector interference channel.
The entire downlink multi-user channel will have $U$ outputs (1 antenna at each device or output), and each input radio node has $L \gg U$ antennas, so there are a total of $LU$ antennas. The model is: Transmission from user input $u$ corresponds to the model component is the contribution from the $1 \times L$ input $x_{down,u}$ and user $u$'s corresponding output component can be written as: The input to this channel, $x_{down,u}$, when $L \gg U$ can beamform zero energy to each of the other users' $i \neq u$ single antenna locations (directions). Some authors refer to this as transmitting in other users' null space [42]. Correspondingly, the uplink channel is similarly modelled with a single scalar transmit antenna at each user location, all transmitting to $U$ separate radio nodes (the variable $U$ is reused here with a user corresponding to each radio node; more than one user on a particular node would be handled by the existing MU-MIMO locally present at the radio node; thus the ESM Stage 3 focus is on signals from $U$ users going to separate radio nodes): In the uplink case, each radio node $u$ has an uplink $L \times 1$ received vector, and the received signal is where $J_u$ is a puncturing matrix with an identity in the positions to pass only user $u$'s output dimensions and zeros elsewhere, so it passes the appropriate $L$ rows of $H_{up}$. In the uplink direction, each of the $U - 1$ columns ($i \neq u$) of the $L \times U$ channel row-subset matrix $H_{up,u}$ represents interference. When $L \gg U$, a single $1 \times L$ "equalizer" (diversity combiner) can zero all the users' $i \neq u$ energy at the detection point of user $u$'s detector so that only user $u$ is received. In this case again, the other users transmit uplink in user $u$'s spatial null space.
In the situation where different radio nodes may possibly be allowed to use the same channels for uplink and downlink, effectively the number of users doubles (U → 2U ) in the models, and some of the individual user's input/output models have corresponding dimensionalities of anywhere between L× 1 to L × L downlink and 1 × L to L × L uplink. Otherwise the concepts remain the same, but with tedious bookkeeping of antennas in models.
Nominally, the linear matrix operations in Equations (44) - (48) are per (complex) dimension (tone). While the geometric averages readily apply to energy quantities, and the channel coefficients for adjacent dimensions in frequency may have the same absolute magnitude, their phases at least will be different. In the most complete case, Stage 3 ESM would then use these models for all dimensions. In wireless practice there is a coherence bandwidth over which adjacent frequency dimensions will be strongly correlated and thus largely of the same amplitude (with readily adjustable linear-phase change for tones over the coherence bandwidth). Many wireless systems thus only compute the magnitude and phase for tones that are spaced apart but within the limits of the coherence bandwidth. LTE, for instance, uses pilots or reference signals that are within this bandwidth. Thus the models above can apply (with linear phase interpolation) over many dimensions. The interpolation used can be sophisticated or simple; methods to interpolate between the tones or frequency dimensions are well studied by Ling in [44]. Most LTE and Wi-Fi systems already use such interpolation. This work will assume such interpolation is already in place, although Section V will discuss interfaces between the LRM and the radio nodes in more detail, including the control-system bandwidth and this per-tone issue in Stage 3 ESM.
2) Optimization of the Vector Interference Channel: When the number of downlink transmit antennas is large relative to the number of users ($L \gg U$) for many or all of the radio nodes' transmitters, the tall matrix $E[H_{down,u}] \to H_{down,u}$ can be preprocessed (precoded) with a set of linear transmit matrices at each radio node's massive set of $L$ antennas. This creates many degrees of freedom upon which spatial modes may transmit for each of the massive-antenna transmitters' radio nodes.
These spatial modes can be energized or zeroed such that only the desired receiver captures energy from the intended user. Energy transmitted in the (average) spatial direction of the un-intended users is zeroed. Apart from singular cases where two users are exactly on the same line that passes also through all transmit antenna locations, enough antennas can achieve the spatial separation. Dually uplink, a large number of receive antennas can capture only energy from the intended user while spectrally notching all other users' directions. These effects are sometimes called "channel hardening" [41]. Figure 7 shows a downlink transmit orthogonal matrix tuner that zeros energy output at all locations except the intended location.
For the deterministic case with Figure 7's instantaneous adaptation of all transmit precoders $W_{down,u}$ (downlink) or all receive postcoders $W_u$ (uplink), it is possible largely to eliminate interference if the number of antennas $L$ at any one transmit (downlink) or one receive (uplink) location significantly exceeds the number of users, $L \gg U$. In these cases a linear solution is (asymptotically) optimum and can be found for each user using the corresponding pinning vector $\sigma_u$ that is all zeros, except for one "1" in the $u$th position. If the corresponding $U \times L$ downstream matrix is given by $H_{down,u}(u)$, the optimal set of synchronized linear precoders (each operating on its own input) is given as an $L \times 1$ vector by where $\alpha_u$ is a scalar that ensures the transmit energy is not increased. Again, an implied ergodic average is present in (49), and in (50) below. In reality, some spatial motion creates a variance around the average that will in ESM be treated as an additional noise, and the LRM computes the average. A superscript of $+$ means pseudoinverse when the $I$ term is ignored. (The added identity is usually ignored in zero-forcing approaches, without much loss, in (49).) A superscript of $*$ means conjugate transpose. The corresponding uplink receiver postcoder is $1 \times L$ and equal to The downstream $H_{down}$ and upstream $H_{up}$ matrices need not be the same, and of course again would be for each frequency dimension within a channel (whether interpolated or fully reported in the Xlin s of the channel matrices). This system enables space-division multiplexing where the same time/frequency dimensions can be shared by all users because the large number of transmit (downstream) antennas, or receive (upstream) antennas, essentially beamforms a notch in the direction of the other users, allowing the common channels' spatial reuse.
However, the requirements on control are severe in that no one device or radio node has access to all the signals, so their channels' $H_{down}$ and $H_{up}$ would need to be known at a central (LRM) location. In the vector interference channel, none of the users require inputs from the other channels (which would not be physically possible since all are processed in different locations), but they do need to know the average channel matrices. Essentially this Vector ESM solution is no different on average than each radio node viewing all other systems as within its own cell and adapting antennas/space accordingly as in a MU-MIMO system; however, each has the benefit of being close to its own radio node on the one non-zero path that links the relevant user. Such an ideal system would then allow frequency/time dimensional reuse across space. This would require training protocols to be completely synchronized on a time/frequency grid over which spatial reuse was agreed between all users. This is the reason for the common symbol clock that was listed earlier as a Stage 3 ESM requirement. The spatial cancellation's ergodic time average will be correct if the users do not (significantly) change position, even if the common clock drifts for all synchronized users over time.
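The zero-forcing idea in this subsection (the variant in which the identity term is ignored, so the superscript $+$ is a plain pseudoinverse) can be sketched in a few lines. The sketch below assumes the precoder direction is $H^{+}\sigma_u$ with $H^{+} = H^{*}(HH^{*})^{-1}$, then scales it to unit energy to play the role of $\alpha_u$; this is the standard zero-forcing form and is offered as an illustration rather than as a transcription of (49). The function names and the 2-user, 5-antenna channel values are hypothetical.

```python
import math


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[complex(x) for x in row] + [complex(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def zf_precoder(H, u):
    """Unit-energy L x 1 precoder that pins energy on user u only.

    H: U x L list of rows; row i holds the (average) gains from this
       node's L antennas to user i's single antenna.
    """
    num_users, num_antennas = len(H), len(H[0])
    H = [[complex(x) for x in row] for row in H]
    # Gram matrix H H^* (U x U).
    gram = [[sum(H[i][k] * H[j][k].conjugate() for k in range(num_antennas))
             for j in range(num_users)] for i in range(num_users)]
    sigma = [1.0 if i == u else 0.0 for i in range(num_users)]  # pinning vector
    z = solve(gram, sigma)
    # w = H^* (H H^*)^{-1} sigma_u, i.e. column u of the pseudoinverse.
    w = [sum(H[i][k].conjugate() * z[i] for i in range(num_users))
         for k in range(num_antennas)]
    norm = math.sqrt(sum(abs(x) ** 2 for x in w))
    return [x / norm for x in w]


# Hypothetical averaged channel: row 0 is the intended user, row 1 the victim.
H = [[1.0, 1.0, 1.0, 1.0, 1.0],
     [0.5, 0.5, -0.5, -0.5, 0.5]]
w = zf_precoder(H, 0)
received = [sum(h * x for h, x in zip(row, w)) for row in H]
print([abs(r) for r in received])
```

The second entry of `received` is zero to numerical precision, which is the "transmit in the other user's null space" behavior described above.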
3) Updating the Precoders and Postcoders: Stage 3 ESM recognizes that each $L \times L$ precoder/equalizer is first locally computed through a QR factorization of an identified channel matrix. For downlink, this channel matrix can be recursively constructed one user's row $h_u$ at a time (complex measured gains from the radio node antennas to the single user antenna), reported to the radio node during initialization and corresponding to each user's identification of training signals sent to it. That initialization will only return this information for devices associated with that same radio node. However, the LRM can collect (more slowly) and average these channel row vectors for every user (with respect to every radio node's $L$ antennas). This identification would be associated with the other radio nodes' colors (different from the radio-node color associated with each device's primary environment). Each node could then accept such effectively "user-direction" vectors (from the LRM) as input to the QR factorization (which becomes larger, but still computable since $U < L$) of (51). As long as the user's position relative to the radio node remains the same (small zero-mean movement on average), the Stage-3-capable radio node simply accepts (up to) $U - 1$ such vectors to add to its QR factorization to determine the transmit precoder matrix. Each row can be written as (52) where the separation point $u$ increases (moves to the left) with the number of users. The users can be reordered at any radio node so that the user of interest (user 1, which corresponds to the particular radio node or color under direct ESM Stage 3 control) is at the bottom, thus $\sigma_u \to \sigma = [0 \ldots 0\ 1]$. Thus, user 1 is the one for which the transmit energy is desired to be non-zero at the single-antenna receiver in its own radio node (or color). The user indices $u \geq 2$ then refer to other users (colors) in whose direction zero-energy transmission is desired. When $u = 1$, the situation is single user and h u =h u .
The QR factorization of the $u \times L$ matrix $H_{down,u}$ can then be written where $Q_{down,u}$ is unitary ($QQ^* = Q^*Q = I$) and will not be unique when $L > U$. Further, when $L \gg U \geq u$, the square upper-triangular matrix $R_{down,u}$ will be diagonally dominant [40], so the off-diagonal terms are small relative to the diagonal elements in the corresponding row. By combining (51) and (53), The uplink process is the same, simply with commuting of matrices (and again $L \gg U$ diagonal (column) dominance) to get For uplink, the radio node directly identifies the channel from devices within its cell (same SSID in Wi-Fi). This can be more instantaneous than downlink's via-LRM. The LRM could indicate when other-color radio nodes are likely to be excited, and thus for each other-color active uplink user, an additional column would be added to $H_u$ prior to the QR factorization that determines $W_u$. If the device(s) relative to the radio nodes are stationary, then these additional columns should be constant. The users' rows can be ranked in terms of importance to add to the overall channel matrix in terms of the values of $\|h_u\|^2$, since this interference otherwise would be the largest noise contribution.
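The row-by-row update can be illustrated with a Gram-Schmidt orthogonalization, which is one way to compute a QR factorization. The sketch below ranks the LRM-supplied "user-direction" rows by $\|h_u\|^2$, orthonormalizes them one at a time, and then removes their span from the matched-filter direction of the node's own user. The function names, the `max_rows` cap, and the numeric rows are hypothetical, and the sketch assumes real-time adaptation is replaced by averaged rows as described above.

```python
import math


def inner(a, b):
    """Inner product sum_k a_k * conj(b_k)."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def conjugate(h):
    return [complex(x).conjugate() for x in h]


def orthonormalize(vectors, tol=1e-12):
    """Modified Gram-Schmidt; each new vector extends the basis by one column."""
    basis = []
    for v in vectors:
        v = [complex(x) for x in v]
        for q in basis:
            c = inner(v, q)
            v = [x - c * y for x, y in zip(v, q)]
        norm = math.sqrt(inner(v, v).real)
        if norm > tol:
            basis.append([x / norm for x in v])
    return basis


def steering_vector(h_own, h_others, max_rows=None):
    """Unit-energy precoder that nulls the strongest other-user rows."""
    # Rank interfering rows by squared norm, largest first.
    ranked = sorted(h_others, key=lambda h: inner(h, h).real, reverse=True)
    if max_rows is not None:
        ranked = ranked[:max_rows]
    basis = orthonormalize([conjugate(h) for h in ranked])
    w = conjugate(h_own)  # matched-filter starting direction
    for q in basis:
        c = inner(w, q)
        w = [x - c * y for x, y in zip(w, q)]
    norm = math.sqrt(inner(w, w).real)
    return [x / norm for x in w]


# Hypothetical averaged rows: own user plus two other-color users.
h_own = [1.0, 1.0, 1.0, 1.0, 1.0]
h_others = [[0.1, -0.1, 0.1, -0.1, 0.1],
            [0.5, 0.5, -0.5, -0.5, 0.5]]
w = steering_vector(h_own, h_others)
leak = [abs(sum(h * x for h, x in zip(row, w))) for row in h_others]
print(leak)
```

Setting `max_rows` below the number of supplied rows nulls only the strongest interferers, which mirrors the ranking by $\|h_u\|^2$ suggested in the text.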

4) Example of Vectored ESM:
Two radio nodes operate downlink, each with $L = 5$ transmit antennas. Each radio node attempts communication in the same frequency band to a single user with 1 antenna. There is interference from the other radio node's single user. A very simple model to illustrate the effects is .9 .9 −.9 −.9 −.9 In (57) each of the two inputs can have total power 1 (across all antennas) and the noises are independent, Gaussian, and of variance 0.01. Such a channel is oversimplified, but basically creates a situation where User 1 interferes with User 2 at 6 dB below signal level (the negative signs attempt to indicate some phase differences without overly complicating the mathematics here, which are only intended to illustrate the basic concept). User 1 experiences heavier interference (basically only 1 dB reduced) from User 2, possibly indicative of a mild "near-far" channel. User 2 is physically separated from User 1 on the device side. Radio Node 2 does not have access to User 1's inputs, and vice versa. Nonetheless, the channel matrix can be written as in (57). The linear downlink precoder at Radio Node 2 is a $5 \times 1$ matrix that can be computed from (49) as The second column of the pseudoinverse is shown in (58) because it might be that the two users' roles were reversed (or that users even roam from one node to the other), but only User 2 is important at the device for User 2. This is checked readily by computing This means that any energy from User 1 will not appear at receiver 2. That is needed because similarly Thus, the two users can spatially share the same frequency/time dimensions. If there is time variation of $H_{down}$, then it is replaced by the average value as determined by the LRM. To ensure 1 unit of energy across the 5 antennas, it is useful to note that $\|W_{down,2}\|^2 = 0.2$ or 1/5, so that the input energy to the precoder would then be 5 units to ensure that 1 unit of energy is transmitted across all antennas. These 5 units would reach User 2's device interference-free.
Apart from synchronization, Radio Node 2 knew nothing about the input of Radio Node 1 (and vice versa). This is vectored ESM in its simplest form. However, vectored ESM is not optimum. For small noise, the optimal receiver on this channel would rely on the factorization in (53), for which the $R$ matrix can be found as The matrix is not quite diagonally dominant with 5 antennas, but the loss factor for a perfect dirty-paper precoder [15] (Chapters 5 and 14) would be 4. The optimal precoder's overall Stage 3 ESM improvement is 5/4 ≈ 1 dB. Thus, using a linear precoder instead of the optimal nonlinear precoder loses about 1 dB. If $L \to 10$ in this example (with the interference coefficient remaining at amplitude 0.5), the loss is 0.46 dB, and with $L = 100$, the loss is 0.04 dB. Thus, diagonal dominance is increasingly evident in ensuring that the linear solution is nearly optimum. A similar uplink example could be constructed. The overall gain here is at least 100% because two users can share the bandwidth that only 1 could use previously (and collision detection would dramatically increase the 100% if both users are streaming, similar to Subsection III-A's example). If time variation is significant, the downlink MCS choice will consequently be less aggressive and the gain reduced.
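The three loss figures quoted above (about 1 dB, 0.46 dB, and 0.04 dB) are all consistent with a loss ratio of $L/(L-1)$. That closed form is an inference from the quoted numbers, not a formula stated in the text, and the function name below is illustrative. Under that assumption the arithmetic is:

```python
import math


def linear_precoder_loss_db(num_antennas):
    """Loss of the linear precoder relative to the optimal nonlinear one,
    ASSUMING the loss ratio is L / (L - 1) as the quoted figures suggest."""
    return 10.0 * math.log10(num_antennas / (num_antennas - 1))


losses = {L: linear_precoder_loss_db(L) for L in (5, 10, 100)}
for L, loss in losses.items():
    print(f"L = {L:3d}: loss = {loss:.2f} dB")
```

The loss shrinks roughly as $1/L$ for large $L$, which is the diagonal-dominance effect the example is meant to illustrate.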

IV. MCS CRITERIA, FUNCTIONAL SPECIFICATION, AND GAINS' PROBABILITY-DISTRIBUTION ESTIMATION
This section first addresses the LRM's separation of QoE-based MCS selection from Section III's spectral (and Stage 3's spatial-vectoring) optimization. Figure 8 illustrates the overall ESM process, showing both parametric feedback to the LRM and policy guidance from the LRM. The radio nodes (only one is shown, but the process is the same for all) provide recent channel-gain, energies used, and MCS values to the LRM along with various recent-history QoS parameters $\theta_u$. The LRM processes these values to produce the QoE estimates, as described in Subsection IV-A, in parallel with calculation of the channel gains' probability distributions $p_{g_u}$, as earlier in Subsection II-E. Section III used the latter $p_{g_u}$ to compute the spectral policy guidance. This $p_{g_u}$ also maps into an ergodic-average MCS, which is in turn used by the logistic regression process to determine whether the radio node's MCS choices are consistent with the user's QoE. QoE measures indications of internet-user/thing satisfaction; typical indications include complaint calls or complaint messages, need for repair, service drops, "like" or (more importantly) help or "thumbs-down" buttons, mean opinion scores, etc. QoS measures engineering metrics like packet-error rates, variation in data rate, and outage probabilities. Subsection IV-A introduces logistic regression methods to estimate QoE from QoS based on earlier training that used QoE data indications, presuming a level of ergodicity in this relationship.
Because of the closed feedback system, ESM effectively then jointly optimizes the spectra and the MCS, although both are done largely independently, simplifying one of traditional RRM's major computational challenges. There is however the link of the two optimizations through the data rates (throughputs) achieved as these in turn affect the spectrum choices through the loading methods of Sections II and III. Similarly, the throughput achieved is one of the observable inputs to the QoE estimation in the QoS-parameter vector θ. Subsection IV-B describes a Markov Model (state-transition control system) that simplifies the MCS optimization guidance via an offset method as determined by the QoE estimates. This is sometimes called reinforcement learning [45], particularly when the state machine on which it is based is dynamically determined.

A. QoE Estimation From QoS
For ESM, the QoS objective extends to QoE via a logistic regression calculation that relates a QoE "happy/sad user/customer" random variable to a linear combination of various measured QoS observables (footnote 25): where $p_{QoE}$ is defined as the probability that the customer's QoE is good. The most accurate use would be for each user, but that presumes data has been previously collected. Instead, users can be clustered based on characteristics shared by the users, and then training results are applied to any and all members of the clustered set. For example, users can be clustered based on the type of subscribed services or service locations to avoid basing the clusters on the same observables used for prediction. The variables $\theta_j$, $j = 1, \ldots, J$ typically include observables like the number (or percentage) of recent (or historical) collisions ("outages") on the particular user's link, indications of errors (like cyclic-redundancy-check failures) or erasures on the link, a device-model/version indicator, large (max-min) data-rate variations, an application type (streaming video vs short data packets vs audio, etc.), and/or other observable metrics. Figure 8 also shows that the current reported MCS and channel-gain values can be observables used in the overall ESM process. Sometimes features are extracted from other data and then converted into the observables, possibly with nonlinear functions (for instance a neural-net rectified linear unit (RELU)) used on the observable data [2], as a forthcoming example will suggest. $\beta_0$ is typically an offset/constant and thus $\theta_0 = 1$. The LRM learns the row vector of coefficients $\beta = [\beta_0 \ldots \beta_J]$. The observables can be similarly stacked into a column vector $\theta$, so $LLR_{QoE} = \beta \cdot \theta$. The QoS criterion in (19) can then be updated to be the QoE criterion The LRM learns the customer-QoE probability $LLR_{QoE}$ through the LRM's collection of various user QoE data.
Typically LLR QoE is estimated (or updated) over an observation interval (typically much longer than a symbol period) from QoE data such as user-complaint calls to a service/help desk, user requests for chat-box help, dispatches in some cases to a user's location, discontinuation of service (drop/quit the service), customer surveys, mean-opinion scores, and like (or better yet "unlike") buttons. The quantity p QoE can be learned over several successive observation intervals and may have distinct values for different types of observation intervals (like peak-use periods for the evening in residences or off-peak use periods, even every hour, every day, etc.), and ESM can apply an individual metric, as defined in (62), to each such type of observation interval.
As mentioned above, clustering of users with common characteristics can also be used to apply common training results to all members of the cluster. The user clustering can include U subsets (in a slight abuse of the U notation to mean number of users from Section III, while here it means number of user clusters). A method for such clustering can be the "k-means clustering" [46] where the clusters are based on the characteristics vectors θ being partitioned into U groups where the mean-square distance for any point in the group from the centroid of its group is minimized over the choice of being in any other group.
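The clustering step above can be sketched with a plain Lloyd-style k-means iteration. This is a minimal illustration only: the function name `k_means`, the deterministic choice of the first U points as initial centroids, and the toy characteristic vectors are assumptions made here, not details taken from [46] or from any deployed ESM system.

```python
def k_means(points, num_clusters, iterations=50):
    """Lloyd's algorithm on a list of equal-length numeric vectors.

    Returns (centroids, labels). Initial centroids are simply the first
    `num_clusters` points (an illustrative, deterministic choice).
    """
    centroids = [list(p) for p in points[:num_clusters]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(num_clusters),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(num_clusters):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical two-dimensional user characteristics, U = 2 clusters.
    users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = k_means(users, 2)
    print(labels, centroids)
```

Training results learned for one cluster would then be applied to every user whose label matches that cluster.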
The base of the log in (62) simply scales the learned β. Base 10 logarithms lead to simple interpretations like LLR QoE = 2 means the user is happy roughly 99% of the time, while LLR QoE = 5 is "five-nines reliability," and so on. The quantity p QoE is presumed stationary or really ergodic (if computed separately for different times like peak, off-peak or times of the week, the terms "cyclo-stationary" or "cyclo-ergodic" might be more appropriate). Connectivity-usage patterns/statistics have been often found in the field to be consistently periodic, with of course some random unpredictable part that augments the consistent cyclo-ergodic/stationary part. The random part is inherently averaged or statistically bounded in ESM.
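The base-10 interpretation can be checked numerically. The sketch below assumes that (62) has the usual log-odds form, LLR QoE = log(p QoE / (1 − p QoE)); the function names are illustrative and not from the paper.

```python
import math


def llr_from_probability(p_qoe, base=10.0):
    """Log-likelihood ratio log_base(p / (1 - p)) for a good-QoE probability p."""
    if not 0.0 < p_qoe < 1.0:
        raise ValueError("p_qoe must lie strictly between 0 and 1")
    return math.log(p_qoe / (1.0 - p_qoe), base)


def probability_from_llr(llr, base=10.0):
    """Inverse mapping: p = 1 / (1 + base**(-llr))."""
    return 1.0 / (1.0 + base ** (-llr))


if __name__ == "__main__":
    for llr in (2, 5):
        print(f"LLR = {llr}: p_QoE = {probability_from_llr(llr):.6f}")
```

Changing `base` only rescales the LLR (and hence the learned β) by a constant factor, which is the scaling noted above.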
The regression vector β can be computed from the raw data sets used to compute p QoE , which are matched to θ's observation intervals. Such computation uses a time index k for the series of successive observation intervals. For instance, an observation interval in which any of the events like a call, dispatch, "unlike button", etc. occurs could be viewed as a binary QoE variable d with d = 0, while periods of no (negative) consumer reaction set d = 1. These variables can be aggregated into a data vector d over the set of such observation intervals. Correspondingly, the observations' value for observation-interval index k is θ k . The matrix Θ stacks these vectors of measurements as rows so that Θ * = [θ * 0 θ * 1 . . .]. By initializing the estimate β̂ 0 = 0 and defining the data's intermediate estimate as LLR QoE,k = β̂ k · θ * k , the quantity p QoE can be estimated by inverting (62), that is, p̂ QoE,k = 1/(1 + 10^(−LLR QoE,k)) for base-10 logarithms. An Iteratively Reweighted Least-Squares (IRLS) update (see [47]) can then be computed over all the observed data to refine β̂, and it will converge under reasonable conditions [34].
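A textbook IRLS (Newton) iteration for logistic regression can be sketched as follows. This is the standard form and not necessarily the exact update used by the authors; it uses natural logarithms (a base-10 LLR differs only by a constant scaling of β), and the helper names and the toy data set are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def irls_logistic(theta_rows, d, iterations=25):
    """Fit beta so that p_k = 1 / (1 + exp(-beta . theta_k)) matches labels d_k.

    theta_rows: one observable vector per observation interval (first entry 1).
    d: 1 for an interval with no negative customer reaction, 0 otherwise.
    """
    n_feat = len(theta_rows[0])
    beta = [0.0] * n_feat  # initial estimate of zero, as in the text
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-sum(b * t for b, t in zip(beta, row))))
             for row in theta_rows]
        w = [pk * (1.0 - pk) for pk in p]  # per-interval reweighting
        hess = [[sum(wk * row[i] * row[j] for wk, row in zip(w, theta_rows))
                 for j in range(n_feat)] for i in range(n_feat)]
        grad = [sum((dk - pk) * row[i] for dk, pk, row in zip(d, p, theta_rows))
                for i in range(n_feat)]
        step = solve(hess, grad)  # Newton step: (Theta' W Theta)^-1 Theta' (d - p)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical data: theta = [1, outage count]; more outages, fewer happy intervals.
    rows = [[1.0, float(x)] for x in (0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5)]
    d = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    print(irls_logistic(rows, d))
```

The toy labels overlap (they are not perfectly separable), which is one of the "reasonable conditions" under which the iteration settles on finite coefficients.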
Once an acceptable β has been found, it can be used to compute an estimate of LLR QoE through (62) for situations where the QoE actual data are not yet known, but presumably consistent ergodically with previous training. This ergodic consistency could be particularized to individual users and their applications/devices in use, depending on the LRM's desired sophistication (and age in terms of available earlier training data), as Subsection IV-B further examines.
As the LRM experience grows over many observation intervals, the vector β can be used along with the computed distribution p g to predict the channels/dimensions to be used with appropriate corresponding energy, but then also used to predict the modulation-coding-system (MCS) parameters r |C| that will be best ergodically. The MCS parameters' separation (from assigned spectral energies) does not reduce the performance [14] and constitutes a simplification over many previous RRM methods. These MCS parameters, as a simplified function of g, are communicated to the radio node as a set of recommended policies to be taken for that radio node's observed subsequent instantaneous g geo,X values. This g geo is also reported (with delay) to the LRM as historical data for the LRM's subsequent calculations, as in Figure 8 to complete the ESM feedback process. In this manner the LRM can update its distributions and derived functional outputs to accommodate any new (unexpected and not predicted) conditions.

1) Example -Feature Extraction:
A QoE probability of p QoE = .99 is observed in training data, meaning that only 1 user in 100 is showing discontent. Three observable QoS parameters are available: the number of packet errors over a certain time interval, the number of unexpected retrains or outages in that same interval, and the difference between maximum data rate and minimum data rate over that same interval. In training or feature extraction, it is noted that discontent periods will often show that at least 2 out of the 3 conditions in Table IV(a) are present (15-minute observation intervals might be typical here). Table IV(a)'s simple feature extraction essentially hard-limits each observable at a threshold of occurrence. Then, the LLR QoE could be estimated, perhaps resulting in β PE = β OUT = β ΔR = 2/3, which would then correspond trivially to Table IV(b)'s QoE range specification. However, the observed feature extraction might instead be a piece-wise linear function with extreme values ±1, but with intermediate values allowed for indication levels below the thresholds. In this case the LLR QoE will take a continuum of values and fall into one of the ranges. Thresholds could be learned (as could the values of β be adjusted). This can be modeled as a depth-2 neural network with a ReLU in the first stage to implement the feature thresholds and continuous outputs below Table IV(a)'s thresholds, and the second (linear) stage to implement β. The computed LLR QoE then provides a means to assess whether a current ESM guidance function may need an update of the MCS coding-parameter functions. For instance, too many very unstable measurements would suggest that more conservative coding parameters (lower code rate and/or smaller constellation size) be used in the guidance function, while very stable indications would suggest higher code rates and larger constellation sizes for larger data rates. These in turn could cause further adjustments in the thresholds and/or β values.
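The two-stage structure of this example can be sketched as below. Table IV(a)'s actual thresholds are not reproduced in this text, so the threshold values here are purely hypothetical; the sign convention (+1 for a clean observable, −1 at or above the threshold) is also an assumption, chosen so that three clean observables give LLR QoE = 2, matching p QoE ≈ .99 with base-10 logarithms.

```python
def relu(x):
    """Rectified linear unit."""
    return x if x > 0.0 else 0.0


def piecewise_feature(value, threshold):
    """First stage: +1 at zero occurrences, falling linearly to -1 at the threshold."""
    return -1.0 + 2.0 * relu(threshold - max(value, 0.0)) / threshold


def hard_feature(value, threshold):
    """Hard-limited alternative: -1 at or above the threshold, +1 below it."""
    return -1.0 if value >= threshold else 1.0


def llr_qoe(observables, thresholds, betas, feature=piecewise_feature):
    """Second (linear) stage: weighted sum of the extracted features."""
    return sum(b * feature(v, t) for v, t, b in zip(observables, thresholds, betas))


if __name__ == "__main__":
    # Hypothetical thresholds: packet errors, outages, max-min rate change (Mbps).
    thresholds = (100.0, 3.0, 20.0)
    betas = (2.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0)
    print(llr_qoe((0.0, 0.0, 0.0), thresholds, betas))         # all observables clean
    print(llr_qoe((150.0, 4.0, 5.0), thresholds, betas, hard_feature))
    print(llr_qoe((50.0, 1.0, 5.0), thresholds, betas))         # intermediate levels
```

With hard-limited features the LLR can only take the four values ±2 and ±2/3; the piece-wise linear version fills in the continuum between them.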
Subsection IV-B next elaborates on fairly simple state machines (or Markov models) that can be used for such situations and that largely ensure the feedback system's stability. Figure 9 provides an example of the LRM's state-transition representation of the MCS parameter choices for a particular radio node. Each box represents a state. The darkened boxes contain specific example numbers, while all the empty boxes' similar numbers can easily be determined by inspection: the constellation size |C| increases upward on the diagram and the code-rate parameter r increases to the right. The red-colored path indicates a possible sequence of MCS choices that starts at QPSK (|C| = 4) and r = 1/3. The ESM process first increases the code rate to r = 1/2 while holding the constellation size at QPSK; then the ESM process increases constellation size to 16QAM while maintaining code rate; these changes precede another code-rate increase, two more constellation-size increases, and finally a code-rate decrease before a state (MCS setting) is determined that looks to be the best for some system. This sequence might have occurred, for instance, for a code being optimized according to Equation (19). This ESM-optimization sequence occurs in the LRM for certain QoE metrics learned as a function of including the MCS "state" itself in the logistic regression process, as shown in Figure 8's feedback.

B. Markov Modeling of the Regression and Optimization Processes
The LRM would presumably know the code and choices that a radio node can implement. These may be specified in standards for the transmission system (for instance Wi-Fi has over 100 possible MCS settings in recent versions that are required by the 802.11 standards), or they could be learned from observation of the radio-node-to-LRM-supplied MCS settings over time, perhaps creating an initially sparse version of Figure 9 that expands as settings are observed and applied. As Figure 9 indicates, up and to the right corresponds to better QoE while down and to the left indicates worsening QoE. Optimization clearly tries to get as far up and to the right in the state machine as possible without producing poor QoE, because these directions correspond to higher data rate. However, overly aggressive MCS settings could result in more interference to other systems and their consequent responses, which would in turn create more reverse interference and thereby poorer QoE.

Table V illustrates 4 possible thresholds and actions for any particular state. These rules are fixed in the table for illustration purposes but could be determined dynamically by implementation with neural nets or other artificial-intelligence methods, and would then properly be called reinforcement learning, as per the above comments in Section IV's introduction.
The thresholds here appear higher than those in Section III's example. However, that example did not include the ergodic-average MCS parameters as an observable, which presumably is used to drive QoE closer to the objective of 99% happy users. The numbers in the 3rd column represent potential indications given by the LRM MCS guidance in the form of an offset to the instantaneous MCS that the radio node would otherwise (in the absence of ESM guidance) select. These of course arrive in parallel with channel and energy policy. The MCS offsets ±2, ±1, 0 could also be supplied as a function of locally measured g, i.e. as a policy. A +2 means move up in constellation size relative to the nominal position in the state-transition diagram that would otherwise have been selected, while +1 means move right, 0 means stay in the current state, and so on in Table V. A more sophisticated system might allow for larger values in the 3rd column that would correspond to more aggressive moves (more than adjacent states) in the state diagram. This particular type of optimization is relative to what the radio node would do without guidance and thus in effect uses and improves upon the radio node's initial design-time models on MCS for a particular channel.
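The offset mechanism can be sketched as a small state-grid update. The grid contents below are hypothetical (Figure 9's full set of states is not reproduced here), and reading −1 as "move left" and −2 as "move down" is an assumption by symmetry with the +1 and +2 meanings given above; moves that would leave the grid are simply clamped.

```python
from fractions import Fraction

# Hypothetical state grid: constellation size grows upward, code rate to the right.
CONSTELLATIONS = (4, 16, 64, 256)
CODE_RATES = (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3),
              Fraction(3, 4), Fraction(5, 6))

# Offset -> (change in constellation index, change in code-rate index).
MOVES = {+2: (1, 0), -2: (-1, 0), +1: (0, 1), -1: (0, -1), 0: (0, 0)}


def apply_offset(nominal_state, offset):
    """Shift the radio node's nominal (constellation_idx, rate_idx) by an LRM offset."""
    if offset not in MOVES:
        raise ValueError(f"unsupported MCS offset: {offset}")
    c_idx, r_idx = nominal_state
    dc, dr = MOVES[offset]
    c_idx = min(max(c_idx + dc, 0), len(CONSTELLATIONS) - 1)
    r_idx = min(max(r_idx + dr, 0), len(CODE_RATES) - 1)
    return c_idx, r_idx


def describe(state):
    """Human-readable MCS for a grid state."""
    c_idx, r_idx = state
    return f"|C| = {CONSTELLATIONS[c_idx]}, r = {CODE_RATES[r_idx]}"


if __name__ == "__main__":
    state = (0, 0)                   # QPSK with r = 1/3
    state = apply_offset(state, +1)  # code rate up to 1/2
    state = apply_offset(state, +2)  # constellation up to 16QAM
    print(describe(state))
```

Because the offset is applied to whatever state the radio node would have chosen on its own, the LRM never needs to know the node's internal selection rule.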
V. SOME ESM RESULTS AND SUGGESTIONS

ESM results so far have used simple examples to illustrate concepts. A deployed system's settings and information transfers require both good design and data-based experience. Actual observables, learned thresholds, exact choice of spectra-selection/optimization algorithm, state deletion from state transition diagrams, distribution estimation, and other considerations can vary from deployment to deployment. In some cases, the actual data can also be too proprietary for public disclosure. This section attempts to illustrate field benefit achieved for some early ESM field use. One important area is simply QoE versus QoS, and Subsection V-A addresses some field results that show QoE estimates versus actual aggregated customer data. Some suggestions appear for various standards groups. Some optimization benefits also are illustrated for various geographical regions with some explanations. Subsection V-B discusses logical interfaces between radio nodes and the LRM that could be reasonably specified (or even where some may already exist). Synchronization assumptions often merit skepticism from experienced communication engineers, as might be the case for Stage 3 ESM (despite its potential large benefits). Subsection V-C attempts to allay such concern.

A. QoE/QoS Correlations and Ergodicity Examples
Section IV-A's simple example on stability suggested the LLR QoE have intermediate ranges, for instance good, indeterminate, and bad. Figure 10 illustrates ESM field-diagnostic correlation with the two QoE raw-data inputs of connections that had complaint calls and connections that needed a dispatch for repair. These field results are for millions of customers who subscribe to an internet service, with Wi-Fi as the last link, for which the QoE data was available after training was complete. Thus, Figure 10's data is not the training data, but instead measures the QoE estimation's true accuracy. QoS parameters including packet errors, retrain counts, and data-rate changes were reported to the LRM, and then the LRM computed LLR QoE with the 3 ranges shown of good QoE (green), poor QoE (yellow), and bad QoE (red).
Clearly the projections based on earlier training correlate well with new data, in that the LRM's declaration of a bad connection correlates strongly with a large percentage of calls and dispatches. This exhibits a form of ergodicity. Similarly, the LRM's declaration of a good connection corresponds to comparatively low call and dispatch incidents. Once such correlation is established, the additional observable of MCS-parameter choice can be introduced to improve further the total numbers of bad QoE (unstable) customers, as in Figure 9's ESM process. Figure 11 instead shows field results of very simple Stage 1 ESM based systems (only roughly 10-20 states in the state transition diagram) that are used to alter MCS parameters for large internet service providers 26 in the countries listed. In these systems, the parameters that could be varied were basically the code rate (2 choices, roughly 3/4 and 9/10) and a power-margin and data-rate combination of parameters equivalent to constellation size. The different levels of improvement merit some explanation. The stability improvement plotted is the decrease in the number of internet-service connections that were characterized as bad QoE after optimization (as for example the red area in Figure 10), as a percentage of the bad QoE without optimization. The UK and France are highly competitive internet-service markets with low prices and service-provider attempts to offer higher speeds to retain customers. Thus, those countries' internet connection speeds often see aggressive setting of MCS and data-rate parameters. By isolating only those customers who have poor QoE (as in Figure 11) and then optimizing them, ESM improves upon otherwise overly conservative designs that previously had applied correspondingly ubiquitous overly restrictive worst-case spectra and MCS choices (and thereby ESM provides a better competitive internet service offering).
The United States in Figure 11 corresponds to a less competitive market (wireline internet plus Wi-Fi) with higher prices, and thus has less aggressive speed-attempt practice by internet service providers. Consequently, less gain occurs (although average connection speed is lower than in the countries with aggressive speed attempts). A different country ordering might be observed for wireless LTE service, but the range of gains can be comparable. Other countries' lower stability improvements in Figure 11 may be caused by less competitive markets and/or the offering of services that are not as bandwidth consumptive.

Fig. 12. ESM throughput increase field results (Optimized, Simple Stage 2) over NO ESM (Baseline).

Figure 12 instead plots the simple QoS measure of throughput (Wi-Fi here), where throughput is defined as the volume of user data actually delivered over a period of time, for a very simple ODMA Stage 2 ESM.
In this case, the system used largely IEEE 802.11ac components from a lead manufacturer who takes pride in excellent designs, larger number of antennas, advertised speed, and expertise in their RRM methods sold. This baseline 802.11ac system has NO ESM and shows the speed distribution on average over several hundred thousand customers (all powered by a fiber backhaul connection from the Wi-Fi access point, so there was no "slow copper" limiting the throughputs). Also note that these throughputs are generally much lower than speeds normally advertised for Wi-Fi connections. The optimized system uses a very early form of simplified Stage 1 ESM that was possible to impose on the system. 27 Low throughputs tend to correlate to poor QoE, and the brown areas show a reduction by more than a factor of 3 in the number of such very low throughputs (and correspondingly a shift towards higher average speeds throughout the deployment).
The authors at present have no field results on Stage 3 ESM because that would require equipment that supports it, which at time of publication does not yet exist.

B. Migration Paths for ESM Application Interfaces
Subsection V-A cites an issue of ESM-compatible management interfaces (sometimes called "application programmer interfaces" or "APIs") for wireless equipment. Each ESM Stage requires increasingly more information or data from the radio node, and tacitly from its subtended devices, and may provide somewhat more information or control policies ("controls" for short) to them. This subsection attempts to enumerate those information flows for consideration by standards groups, forums, or manufacturers who might consider providing such "ESM-compatible" interfaces.
This paper uses an index k to represent time in observation intervals. Data and controls to/from the LRM are thus associated with k in observation intervals since initialization. 28 Typical intervals might be 15 minutes, 5 minutes, or 30 seconds. The radio node's time-stamp should accompany each information flow from the radio node to the LRM. This should be the absolute k index of the first transmission that used Figure 8's associated transmit energy E u,X (k), g u,X (k) and corresponding MCS(k). For Stage 1 and 2 ESM, the time instant would be the beginning of packet data transmission corresponding to the parameters and such transmission's duration in symbols. For Stage 3, the observation interval index should align with some known and fixed multiple of the corresponding established common symbol clock. Stage 1 and 2 do not need absolute accuracy of a common symbol clock, and drifts or changes manifest themselves in changes of the distribution p g .

27 The integrated-circuit manufacturer here was highly resistant to opening their interface to allow reasonable ESM, but it was possible to circumvent that resistance through the appreciated assistance of the box manufacturer and the internet service provider involved to allow some Stage 1 and 2 ESM.

28 If cyclo-ergodic periods occur then k may also be indexed as kτ where τ is a cyclic or peak/valley index corresponding to statistically different epochs, each of which though shows ergodicity with respect to the value of k.
1) Flows to the LRM: For E u,X (k), the radio node can report essentially the power spectral density for each used band X and user u (those not reported should be zeroed for all ESM stages). For instance, a transmit power of 17 dBm in a single 20 MHz channel corresponds to a power spectral density of 17 − 73 = −56 dBm/Hz, because 10 log10(20 × 10^6) ≈ 73 dB. If that same power is equally distributed over two 20-MHz-wide channels, the reported energy is E u,X (k) = −59 dBm/Hz. If 4 antennas were used (with 4 spatial streams) equally on the same 20 MHz channel, the number would be −62 dBm/Hz for each. These numbers apply to both LTE and Wi-Fi. Reported transmit powers might reasonably range from as high as +45 dBm/Hz in LTE systems to perhaps as low as −93 dBm/Hz in Wi-Fi (with any smaller value causing no report and thus "0" energy emitted in that band), in 0.5 or 1 dB steps. Good ESM transmit reporting might also include the energy in adjacent bands after filtering, if known, since sidelobe energy is probably not zero and could be large in a high-powered system.
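The reporting arithmetic above can be reproduced with a few lines. This is a minimal sketch; the function name and the assumption of an equal power split across channels and spatial streams are illustrative.

```python
import math


def psd_dbm_per_hz(total_power_dbm, channel_bandwidth_hz,
                   num_channels=1, num_streams=1):
    """Power spectral density per channel and per stream, in dBm/Hz.

    Assumes the total transmit power is split equally over `num_channels`
    channels of equal bandwidth and `num_streams` spatial streams.
    """
    split_db = 10.0 * math.log10(num_channels * num_streams)
    bandwidth_db = 10.0 * math.log10(channel_bandwidth_hz)
    return total_power_dbm - split_db - bandwidth_db


if __name__ == "__main__":
    print(f"{psd_dbm_per_hz(17.0, 20e6):.1f}")                  # -56.0
    print(f"{psd_dbm_per_hz(17.0, 20e6, num_channels=2):.1f}")  # -59.0
    print(f"{psd_dbm_per_hz(17.0, 20e6, num_streams=4):.1f}")   # -62.0
```

A report quantized to 0.5 or 1 dB steps would round these values before they are sent to the LRM.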
Presumably this transmit power will eventually be controlled by ESM politely, but it should also be reported because ESM policy guidance may be ignored (or may need to be calibrated relative to issued guidance).
The parameter g u,X (k) is probably the most challenging in that present systems do not report it, despite it being essentially the well-quoted SINR (signal to interference-and-noise ratio) in technical documents, normalized to unity transmit power. However, as in Subsection 2.3, LTE systems for instance do compute a parameter called RSRQ (Reference Signal Received Quality) that can be used to compute g u,X (k), as in (66). The RSRQ u,X is derived from LTE's reported RSRP u,X (Reference Signal Received Power, see Section II), the channel-output signal power for certain received training signals that are specific to the radio node (color) and measured by the receiver during training sequences (or for inserted pilot/reference signals in LTE). RSRQ u,X is then the ratio of this to the total power received, or RSSI u,X (Received Signal Strength Indicator). Section II-C explains this area further. Wi-Fi does not appear to report this RSRQ u,X quantity nor RSRP u,X , although it is internally necessary in some form for all systems. The LRM can learn or estimate it from reported MCS values if the code is known, but reporting is more desirable. Ideally, Wi-Fi's future reporting 29 of an SINR u,X (k) (along with E u,X (k)) would be helpful because it allows direct calculation of g u,X (k) by the LRM. The MCS parameters are known and exchanged in all wireless systems of interest and simply need reporting to the LRM (these are typically a finite number of options specified in standards). Table VI(a) illustrates these parameters for potential assistance to groups considering standardizing or agreeing on their use. In the ESM case, at least the current MCS(k) should be supplied; however, an initial supply of the allowed values would also be helpful.
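One way an LRM might turn such reports into a gain estimate is sketched below. This is not the paper's own expression for g u,X (k): it takes the simplified ratio described above, RSRQ = RSRP/RSSI, assumes that RSSI equals desired signal plus interference plus noise over the same bandwidth (the standardized LTE definition also involves a resource-block scaling that is ignored here), and then normalizes the resulting SINR by the reported transmit energy. All function names are illustrative.

```python
def sinr_from_rsrq(rsrq_linear):
    """SINR implied by RSRQ = S / (S + I + N), in linear units: S / (I + N)."""
    if not 0.0 < rsrq_linear < 1.0:
        raise ValueError("this simplified RSRQ must lie strictly between 0 and 1")
    return rsrq_linear / (1.0 - rsrq_linear)


def channel_gain(sinr_linear, transmit_energy_linear):
    """SINR normalized to unit transmit energy: g = SINR / E."""
    if transmit_energy_linear <= 0.0:
        raise ValueError("transmit energy must be positive")
    return sinr_linear / transmit_energy_linear


def db_to_linear(value_db):
    """Convert a dB quantity to a linear ratio."""
    return 10.0 ** (value_db / 10.0)


if __name__ == "__main__":
    rsrq = db_to_linear(-3.0)  # hypothetical report: signal is about half of RSSI
    sinr = sinr_from_rsrq(rsrq)
    print(sinr, channel_gain(sinr, transmit_energy_linear=2.0))
```

If SINR u,X (k) were reported directly, only the final normalization step would be needed.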
Stage 3 downlink transmission requires each radio node to measure the interference from others using the known transmission packets that are used for training and/or reference/pilots. The radio node (and devices within) needs to support such measurement (that is, implement it) and then report it as a complex vector h u of measured gain/phase channel coefficients (for a single antenna at that device). Preferably this is measured when the relevant radio node is silent. The same methods that are used today for measuring its own such complex vector (same color) can be used for a different color in this situation. The value is then reported either directly from the measuring device to the LRM or indirectly through the radio node and then to the LRM. The LRM can then average this over time. If multiple values are observed in a single observation interval, then the average can be supplied. A power-delay-profile-like variance about this mean for each h u tap would also be useful. Stage 3 uplink requires no reporting of coefficients.
Similar parameters have been proposed as part of a Wi-Fi Alliance project in [48].
2) Flow from the LRM: The LRM's guidance functions will also need a time index for the first implementation of any guidance (or change in guidance) and thereafter. This index would be the same index and resolution as that used in flows to the LRM. Namely, policy enforcement commences at a start index specified with the guidance and applies for all k at or beyond it; this start index needs to account for some implementation delay.
The functional specification of energy can trivialize in Stage 1 IW to the specification of the water-level for each band, a constant K u for all bands such that a binary band-use indicator control i u,X is positive if it is assumed that MCS parameters are also specified simultaneously. Table VI(b) lists the recommended controls. This may simplify early ESM systems' implementation. Stage 2 expands to a tabular specification of energy as E g,X for each user sent to the radio node. The locally measured channel gain (for instance computed locally by (66)) is used as the index to the table (shown as size M in Subsection III-B). These would correspond to the number of partitions of the gain range to be used in computing the probability distribution p g . Subsection II-E-1 suggests one such range |G u |.
Table VI(b)'s MCS parameters show a full table indexed by g u,X , although the relative state-machine offset described in Subsection IV-B would be a simpler way to achieve the same specification and make it relative also to local practice of the radio node and its client devices (which may not otherwise be open by manufacturers to say what they do). Stage 3 ESM could require, in the uplink case only, an indication to the uplink radio-node receiver of which color other-node user is to be treated first, second, third in the nominal internal QR factorization (or determination of precoder). This is not strictly needed by the LRM for uplink Stage 3, but could be used for energy allocation guidance in systems with mixed ESM stages or where ESM Stage 3 vectored cancellation is imperfect. Stage 3 ESM downlink requires the same prioritization and indication, but in this case will be based on the h u supplied. This is sometimes called Xlin in standards (X for crosstalk, and linear specification).

C. Synchronization Thoughts
Stage 3 ESM requires a common symbol clock. While inter-user phase shifts of a few samples in OFDM systems (with their "guard intervals") will not cause excessive interference increase, the symbol frequency needs to be common and accurate. This common symbol clock is most useful in relatively stationary environments (the users are not moving or their movement is slow). In these cases, Stage 3 is possible. If there is no common clock, only Stage 1 or 2 is feasible. The LRM cannot be the source of the common clock, which means that Stage-3-ESM-compatible radio nodes would need to accommodate such a symbol clock in Wi-Fi (LTE systems already have such a common clock). However, it can indicate a control synchronization to i = u, effectively leading to a master clock confirmation at the LRM. Even if this common clock drifts with time, the average spatial inter-relationships should remain stable if the carrier clock is stable (for instance, a drift of 10-100 parts per million in clock corresponds to a spatial shift of 10^{-6} or 10^{-5} of a wavelength, which is typically small for most radio systems) and thus have small effect on the stationary object's relative spatial-position appearance.
A simple process to establish such a Wi-Fi common clock follows. First, a Stage 3 ESM-compliant radio node need only be phase-locked to a common symbol clock when it desires transmission (otherwise it is silent, creating no interference nor sensing any). Any radio node today capable of collision detection (Wi-Fi) can "listen before it talks" and, instead of waiting a random period of time, this node can continue reception for the sole purpose of phase locking to the largest interference it senses. This radio node then transmits on that same symbol clock (at energies or with notches/nulls accordingly observed). It also reports to the LRM which source it locked to. In turn, any other radio node that subsequently has traffic would follow the same procedure. Any radio nodes hidden from a first radio node would eventually synchronize to the same symbol clock (unless they never experienced a silent period, which has probability zero). Any non-ESM radio nodes would manifest themselves through the channel gains measured and lead the guidance to accommodate that interference. In a situation where no synchronization occurred, the performance would fall back to Stage 2/1 or even to an existing collision-detection system with consequent performance. When all interfering systems report synchronization, Stage 3 ESM control policies can be issued by the LRM.

VI. CONCLUSION
ESM's learned exploitation of wireless networks' statistical consistencies can help reduce the costs of existing RRM industry drives towards concentrated edge computing/reaction. ESM also can remove the need to have as much computation for RRM at the edge, because part of the computational responsibility then moves to the cloud. ESM methods may have the largest performance advantage when compared to collision-detection methods in unlicensed spectra (like Wi-Fi), but also provide some improvement on more centrally coordinated systems like LTE's 4G/5G by allowing artificially intelligent dimensional re-use by simultaneous users. Increasingly sophisticated ESM stages, with correspondingly increasing ESM gain, could be accommodated by relatively minor adjustments to management interfaces and by ensuring that those interfaces are available to cloud/internet servers (even if on slow control paths). This also allows a significant improvement on all systems, including LTE edge RRM despite RRM's mobile-edge high-computational requirement. ESM also provides a base upon which better QoE can be accommodated, and indeed allows movement of users/devices across many bands/regions as they roam. ESM provides a cost-effective alternative to low-latency-only management schemes that merits and motivates further investigations as wireless networks evolve. ESM also further advances the industry direction toward efficient high-performance wireless now partially supported by Wi-Fi 6 and 5G. A range of practical matters, such as precise specification, accuracy of such specification, etc., is best addressed by standards groups.