The sparsity of images in a transform domain or dictionary has been exploited in many image processing applications. For example, analytical sparsifying transforms, such as wavelets and the discrete cosine transform (DCT), have been extensively used in compression standards. Recently, synthesis sparsifying dictionaries that are directly adapted to the data have become popular, especially in applications such as image denoising. Following up on our recent research, where we introduced the idea of learning square sparsifying transforms, we propose here novel problem formulations for learning doubly sparse transforms for signals or image patches. These transforms are the product of a fixed, fast analytical transform, such as the DCT, and an adaptive matrix constrained to be sparse. Such transforms can be learnt, stored, and implemented efficiently. We show that our learnt transforms are more promising for image representation than analytical sparsifying transforms such as the DCT. We also show promising performance in image denoising that compares favorably with approaches involving learnt synthesis dictionaries, such as the K-SVD algorithm. The proposed approach is also much faster than K-SVD denoising.
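
To make the structure of a doubly sparse transform concrete, the following minimal sketch applies W = B Φ to a short signal, where Φ is an orthonormal DCT and B is a sparse matrix. It is an illustration under stated assumptions, not the learning algorithm itself: the function names, the row-wise storage format for B, and the values in B are all hypothetical, and the DCT is applied as a dense matrix product for clarity rather than with a fast algorithm.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (the fixed analytical transform)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                     for j in range(n)])
    return rows


def apply_dense(matrix, x):
    """Dense matrix-vector product (stand-in for a fast DCT routine)."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def apply_sparse(sparse_rows, x):
    """Apply a sparse matrix stored as one list of (column, value) pairs per row."""
    return [sum(v * x[j] for j, v in row) for row in sparse_rows]


def doubly_sparse_transform(sparse_rows, phi, x):
    """Compute W x = B (Phi x): the fixed transform first, then the sparse factor."""
    return apply_sparse(sparse_rows, apply_dense(phi, x))


n = 4
phi = dct_matrix(n)

# Hypothetical sparse factor B: identity plus one off-diagonal entry.
# In the proposed approach B would be learnt from data; these values
# are chosen by hand purely for illustration.
B = [[(0, 1.0)], [(1, 1.0), (3, 0.5)], [(2, 1.0)], [(3, 1.0)]]

x = [1.0, 1.0, 1.0, 1.0]  # a constant signal
z = doubly_sparse_transform(B, phi, x)

# Only the nonzero entries of B need to be stored.
nonzeros_in_B = sum(len(row) for row in B)
```

In this sketch, storing the transform requires only the nonzero entries of B (five values here instead of sixteen), and applying it costs one pass of the fixed transform plus one multiply-add per nonzero entry of B.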