Little is known about the distributed learning of the global maximum in a stochastic framework when there is no communication between the decisionmakers. The case of two decisionmakers is considered, and prior knowledge of the expected rewards is assumed. Any asymmetries present in the reward matrix are captured by this prior knowledge. It is shown that each decisionmaker, completely unaware of the other, converges to the global optimum with arbitrary accuracy over time.
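The setting described above can be illustrated with a minimal simulation. The sketch below is not the paper's algorithm; it assumes a hypothetical 2×2 common-interest game with an asymmetric Bernoulli reward matrix, and lets two independent ε-greedy learners act on their own reward statistics only, with no communication and decaying exploration:

```python
import random

# Hypothetical asymmetric reward matrix (illustrative, not from the paper):
# expected Bernoulli reward when the row player picks i and the column
# player picks j. The global maximum is the joint action (1, 1).
R = [[0.2, 0.5],
     [0.4, 0.9]]

def run(steps=20000, seed=0):
    rng = random.Random(seed)
    # Each player keeps only its own statistics: no communication.
    counts = [[0, 0], [0, 0]]          # counts[player][action]
    values = [[0.0, 0.0], [0.0, 0.0]]  # running mean reward per action
    for t in range(1, steps + 1):
        eps = t ** -0.5  # decaying exploration rate
        acts = []
        for p in range(2):
            if rng.random() < eps:
                a = rng.randrange(2)  # explore
            else:
                a = 0 if values[p][0] >= values[p][1] else 1  # exploit
            acts.append(a)
        # Both players observe the same stochastic reward (common interest).
        reward = 1.0 if rng.random() < R[acts[0]][acts[1]] else 0.0
        for p, a in enumerate(acts):
            counts[p][a] += 1
            values[p][a] += (reward - values[p][a]) / counts[p][a]
    # Greedy joint action after learning.
    return tuple(0 if v[0] >= v[1] else 1 for v in values)

print(run())
```

In this particular matrix, action 1 strictly dominates for both players, so the independent learners settle on the joint optimum (1, 1); the paper's contribution concerns the harder general stochastic setting where such convergence is not obvious.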