We consider the problem of maximizing the expected sum of n variables Xk chosen sequentially in an i.i.d. sequence of length N. It is equivalent to the following resource allocation problem: n machines have to be allocated to N jobs of value Xk (k=1,…,N) arriving sequentially, the ith machine has a (known) probability pi to perform the job successfully and the total expected reward must be maximized. The optimal solution of this stochastic dynamic-programming problem is derived when the distribution of the Xks is known: the optimal threshold for accepting X k, or allocating the ith machine to job k, is given by a backward recurrence equation. This optimal solution is compared to the simpler (but suboptimal) open-loop feedback-optimal solution for which the threshold is constant, and their asymptotic behaviors are investigated. The asymptotic behavior of the optimal threshold is used to derive a simple open-loop solution, which is proved to be asymptotically optimal (N→∞ with n fixed) for a large class of distributions for Xk
Published in:
Automatic Control, IEEE Transactions on
(Volume:46
,
Issue:
5
)
Date of Publication: May 2001