Sparse signal representation, analysis, and sensing have received a lot of attention in recent years from the signal processing, optimization, and learning communities. On one hand, learning overcomplete dictionaries that facilitate a sparse representation of the data as a liner combination of a few atoms from such dictionary leads to state-of-the-art results in image and video restoration and classification. On the other hand, the framework of compressed sensing (CS) has shown that sparse signals can be recovered from far less samples than those required by the classical Shannon-Nyquist Theorem. The samples used in CS correspond to linear projections obtained by a sensing projection matrix. It has been shown that, for example, a nonadaptive random sampling matrix satisfies the fundamental theoretical requirements of CS, enjoying the additional benefit of universality. On the other hand, a projection sensing matrix that is optimally designed for a certain class of signals can further improve the reconstruction accuracy or further reduce the necessary number of samples. In this paper, we introduce a framework for the joint design and optimization, from a set of training images, of the nonparametric dictionary and the sensing matrix. We show that this joint optimization outperforms both the use of random sensing matrices and those matrices that are optimized independently of the learning of the dictionary. Particular cases of the proposed framework include the optimization of the sensing matrix for a given dictionary as well as the optimization of the dictionary for a predefined sensing environment. The presentation of the framework and its efficient numerical optimization is complemented with numerous examples on classical image datasets.