Sparsity has been shown to be very useful in source separation of multichannel observations. However, in most cases, the sources of interest are not sparse in their current domain and one needs to sparsify them using a known transform or dictionary. If such a priori about the underlying sparse domain of the sources is not available, then the current algorithms will fail to successfully recover the sources. In this paper, we address this problem and attempt to give a solution via fusing the dictionary learning into the source separation. We first define a cost function based on this idea and propose an extension of the denoising method in the work of Elad and Aharon to minimize it. Due to impracticality of such direct extension, we then propose a feasible approach. In the proposed hierarchical method, a local dictionary is adaptively learned for each source along with separation. This process improves the quality of source separation even in noisy situations. In another part of this paper, we explore the possibility of adding global priors to the proposed method. The results of our experiments are promising and confirm the strength of the proposed approach.