In this work, we present a new framework for channel estimation in MIMO-OFDM systems. Sparse channel estimation refers to estimating the time-domain channel impulse response by exploiting the fact that the channel has very few nonzero taps. We formalize the problem and derive a necessary and sufficient condition on the number of pilots for perfect channel recovery, which leads to an L0-norm optimization problem. We then propose a practical suboptimal solution: a modified orthogonal matching pursuit (OMP) algorithm that exploits the sparsity structure of the MIMO channel. Our investigations reveal that the training overhead can be drastically reduced while maintaining the same accuracy as current state-of-the-art techniques.
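To make the greedy recovery idea concrete, the following is a minimal sketch of standard OMP applied to a single-antenna, noiseless pilot model. It is not the modified MIMO variant proposed in this work. The measurement model (pilot observations equal to a partial DFT matrix times the sparse tap vector), the function names `pilot_matrix` and `omp`, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np


def pilot_matrix(n_sub, pilots, n_taps):
    """Rows of the n_sub-point DFT at the pilot subcarriers, first n_taps columns.

    Assumed model: the channel frequency response at pilot subcarrier k is
    sum_l h[l] * exp(-2j*pi*k*l/n_sub).
    """
    k = np.asarray(pilots).reshape(-1, 1)
    l = np.arange(n_taps).reshape(1, -1)
    return np.exp(-2j * np.pi * k * l / n_sub)


def omp(A, y, n_nonzero):
    """Standard OMP: greedily pick n_nonzero columns of A that best explain y."""
    norms = np.linalg.norm(A, axis=0)
    support = []
    residual = y.copy()
    coef = np.zeros(0, dtype=complex)
    for _ in range(n_nonzero):
        # Correlate the residual with every (normalized) column.
        corr = np.abs(A.conj().T @ residual) / norms
        if support:
            corr[support] = -1.0  # never reselect a chosen column
        support.append(int(np.argmax(corr)))
        # Least-squares refit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    h = np.zeros(A.shape[1], dtype=complex)
    h[support] = coef
    return h, sorted(support)


# Illustrative run (hypothetical sizes): 64 subcarriers, 16 candidate taps,
# 3 nonzero taps, 12 randomly placed pilots.
rng = np.random.default_rng(0)
n_sub, n_taps, n_nonzero = 64, 16, 3
h_true = np.zeros(n_taps, dtype=complex)
taps = rng.choice(n_taps, size=n_nonzero, replace=False)
h_true[taps] = rng.standard_normal(n_nonzero) + 1j * rng.standard_normal(n_nonzero)

pilots = np.sort(rng.choice(n_sub, size=12, replace=False))
A = pilot_matrix(n_sub, pilots, n_taps)
y = A @ h_true  # noiseless pilot observations

h_hat, support = omp(A, y, n_nonzero)
print("true taps:", sorted(taps.tolist()), "estimated taps:", support)
print("estimation error:", np.linalg.norm(h_hat - h_true))
```

In this sketch the number of pilots is smaller than the number of candidate taps, which is the regime where exploiting sparsity matters. Whether plain OMP recovers the exact support depends on the pilot placement and the sparsity level, so the run above should be read as an experiment rather than a guarantee.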