Multiscale representations of images have become a standard tool in image analysis. Such representations offer a number of advantages over fixed-scale methods, including the potential for improved performance in denoising, compression, and the ability to represent distinct but complementary information that exists at various scales. A variety of multiresolution transforms exist, including both orthogonal decompositions such as wavelets as well as nonorthogonal, over-complete representations. Recently, techniques for finding adaptive, sparse representations have yielded state-of-the-art results when applied to traditional image processing problems. Attempts at developing multiscale versions of these so-called dictionary learning models have yielded modest but encouraging results. However, none of these techniques has sought to combine a rigorous statistical formulation of the multiscale dictionary learning problem and the ability to share atoms across scales. We present a model for multiscale dictionary learning that overcomes some of the drawbacks of previous approaches by first decomposing an input into a pyramid of distinct frequency bands using a recursive filtering scheme, after which we perform dictionary learning and sparse coding on the individual levels of the resulting pyramid. The associated image model allows us to use a single set of adapted dictionary atoms that is shared-and learned-across all scales in the model. The underlying statistical model of our proposed method is fully Bayesian and allows for efficient inference of parameters, including the level of additive noise for denoising applications. We apply the proposed model to several common image processing problems including non-Gaussian and nonstationary denoising of real-world color images.