The probability of observing $x_t$ at time $t$, given past observations $x_1\dots x_{t-1}$, can be computed if the true generating distribution $\mu$ of the sequence $x_1x_2x_3\dots$ is known. If $\mu$ is unknown but known to belong to a class $\mathcal{M}$, one can base one's prediction on the Bayes mix $\xi$, defined as a weighted sum of distributions $\nu\in\mathcal{M}$. Various convergence results of the mixture posterior $\xi_t$ to the true posterior $\mu_t$ are presented. In particular, a new (elementary) derivation of the convergence $\xi_t/\mu_t\to 1$ is provided, which additionally yields the rate of convergence. A general sequence predictor is allowed to choose an action $y_t$ based on $x_1\dots x_{t-1}$ and receives loss $\ell_{x_t y_t}$ if $x_t$ is the next symbol of the sequence. No assumptions are made on the structure of $\ell$ (apart from boundedness) or on $\mathcal{M}$. The Bayes-optimal prediction scheme $\Lambda_\xi$ based on the mixture $\xi$ and the Bayes-optimal informed prediction scheme $\Lambda_\mu$ are defined, and the total loss $L_\xi$ of $\Lambda_\xi$ is bounded in terms of the total loss $L_\mu$ of $\Lambda_\mu$. It is shown that $L_\xi$ is bounded for bounded $L_\mu$, and that $L_\xi/L_\mu\to 1$ for $L_\mu\to\infty$. Convergence of the instantaneous losses is also proven.
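To make the objects above concrete, here is a minimal simulation sketch under strong simplifying assumptions that are not part of the abstract: a binary alphabet, a small finite class $\mathcal{M}$ of i.i.d. Bernoulli($\theta$) sources, a uniform prior, and 0-1 loss (any bounded loss matrix can be substituted). The mixture predictive probability is $\xi(x_t|x_{<t})=\sum_\nu w_\nu(x_{<t})\,\nu(x_t|x_{<t})$, with posterior weights updated by Bayes' rule, and each scheme $\Lambda_\rho$ picks the action minimising $\sum_x \rho(x|x_{<t})\,\ell_{xy}$. The losses accumulated are the $\mu$-conditional expected instantaneous losses along one sampled path, so their sums are only a one-sample estimate of the total expected losses $L_\xi$ and $L_\mu$. All function names (`mixture_prob_one`, `bayes_update`, `bayes_action`, `expected_loss`, `run`) are hypothetical and chosen for illustration.

```python
import random


def mixture_prob_one(weights, thetas):
    """xi(x_t = 1 | x_<t): posterior-weighted average of nu(x_t = 1 | x_<t)."""
    return sum(w * th for w, th in zip(weights, thetas))


def bayes_update(weights, thetas, x):
    """Bayes rule: w_nu(x_1:t) is proportional to w_nu(x_<t) * nu(x_t | x_<t)."""
    new = [w * (th if x == 1 else 1.0 - th) for w, th in zip(weights, thetas)]
    z = sum(new)  # equals xi(x_t | x_<t); stays positive while mu is in the class
    return [w / z for w in new]


def bayes_action(p_one, loss):
    """Lambda_rho: choose y minimising sum_x rho(x | x_<t) * loss[x][y]."""
    probs = (1.0 - p_one, p_one)
    actions = range(len(loss[0]))
    return min(actions, key=lambda y: sum(probs[x] * loss[x][y] for x in (0, 1)))


def expected_loss(p_one_true, loss, y):
    """Instantaneous loss of action y, in expectation over x_t ~ mu(. | x_<t)."""
    probs = (1.0 - p_one_true, p_one_true)
    return sum(probs[x] * loss[x][y] for x in (0, 1))


def run(thetas, prior, mu_index, loss, n, seed=0):
    """Simulate Lambda_xi and Lambda_mu for n steps (assumption: i.i.d. Bernoulli class)."""
    rng = random.Random(seed)
    mu = thetas[mu_index]  # true environment, assumed to lie in the class M
    weights = list(prior)
    total_xi = total_mu = 0.0
    ratio = None
    for _ in range(n):
        xi_one = mixture_prob_one(weights, thetas)
        y_xi = bayes_action(xi_one, loss)  # uses only the mixture xi
        y_mu = bayes_action(mu, loss)      # informed scheme knows mu
        total_xi += expected_loss(mu, loss, y_xi)
        total_mu += expected_loss(mu, loss, y_mu)
        x = 1 if rng.random() < mu else 0  # next symbol drawn from mu
        xi_x = xi_one if x == 1 else 1.0 - xi_one
        mu_x = mu if x == 1 else 1.0 - mu
        ratio = xi_x / mu_x  # xi_t / mu_t at the symbol that actually occurred
        weights = bayes_update(weights, thetas, x)
    return {"L_xi": total_xi, "L_mu": total_mu, "last_ratio": ratio}


# Hypothetical example: three Bernoulli sources, the true one has theta = 0.8.
thetas = [0.2, 0.5, 0.8]
prior = [1 / 3, 1 / 3, 1 / 3]
zero_one = [[0.0, 1.0], [1.0, 0.0]]  # loss[x][y]
result = run(thetas, prior, mu_index=2, loss=zero_one, n=2000)
print(result)
```

In this toy setting one expects `last_ratio` to sit very close to 1, `L_mu` to grow linearly (0.2 per step here), and `L_xi` to exceed it by only the few early steps in which the mixture still favours the wrong action, so `L_xi / L_mu` approaches 1. Since $\Lambda_\mu$ minimises exactly the quantity being accumulated, `L_mu <= L_xi` holds step by step.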