I. Introduction
We investigate sequential compound decision problems that arise in several signal processing [1]–[5], information theory [6], [7], and machine learning applications [8]–[11]. In particular, we sequentially observe a real-valued sequence and, at each time step, produce a decision (or an action) as our output based on the past observations. We then suffer a loss based on this output once the true outcome is revealed, and our goal is to minimize the (weighted) accumulated or expected loss while using only a limited amount of information from the past. As an example, in the well-known sequential prediction problem under the square error loss, the output at each time step is an estimate of the next data point, and the algorithm suffers the squared difference between this estimate and the next data point once that data point is revealed. The algorithm can then adjust itself in order to reduce future losses. This generic setup models a wide range of problems in applications ranging from adaptive filtering [12], channel equalization [13], and repeated game playing [9] to online compression by sequential probability assignment [14], [15].
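The observe, predict, suffer-loss, adjust cycle described above can be illustrated with a minimal sketch. It is not the method developed in this paper: it assumes a hypothetical one-parameter linear predictor updated by an LMS-style gradient step, and the names `run_sequential_prediction`, `weight`, and `step_size` are introduced here purely for illustration.

```python
def run_sequential_prediction(xs, step_size=0.1):
    """Run one pass of sequential prediction under the square error loss.

    Assumption (not from the paper): the predictor is a single scalar
    weight applied to the most recent sample, updated LMS-style.
    """
    weight = 0.0      # hypothetical adjustable parameter
    total_loss = 0.0  # accumulated square error
    losses = []       # per-round losses, kept for inspection
    for t in range(len(xs) - 1):
        # Decide using only data observed so far.
        prediction = weight * xs[t]
        # The next data point is revealed and the loss is suffered.
        error = xs[t + 1] - prediction
        loss = error ** 2
        total_loss += loss
        losses.append(loss)
        # Adjust the predictor to reduce future losses
        # (gradient step on the squared error, constant factor absorbed).
        weight += step_size * error * xs[t]
    return weight, total_loss, losses


if __name__ == "__main__":
    # Toy sequence: a constant signal, so the ideal weight is 1.
    samples = [1.0] * 50
    final_weight, total, per_round = run_sequential_prediction(samples)
    print("final weight:", final_weight)
    print("accumulated loss:", total)
```

On the constant toy sequence the per-round loss shrinks as the weight approaches one, which mirrors the adjustment step in the text. The step size and the choice of a single-tap model are arbitrary illustrative settings.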