Perceptual image quality metrics have explicitly accounted for human visual system (HVS) sensitivity to subband noise by estimating just noticeable distortion (JND) thresholds. A recently proposed class of quality metrics, known as structural similarity metrics (SSIM), models perception implicitly by exploiting the fact that the HVS is adapted for extracting structural information from images. We evaluate SSIM metrics and compare their performance to traditional approaches in the context of realistic distortions that arise from compression and error concealment in video compression/transmission applications. To better explore this space of distortions, we propose models for simulating typical distortions encountered in such applications. We compare specific SSIM implementations in both the image space and the wavelet domain, including the complex wavelet SSIM (CWSSIM), a translation-insensitive SSIM implementation. We also propose a perceptually weighted multiscale variant of CWSSIM, which introduces a viewing distance dependence and provides a natural way to unify the structural similarity approach with the traditional JND-based perceptual approaches.
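For orientation, the standard single-scale SSIM index of Wang et al. compares two patches through their means, variances, and cross-covariance. The sketch below is an illustrative global-statistics version (no sliding window, no multiscale weighting), not the specific implementation evaluated here; the function name and default constants are assumptions following the common SSIM parameterization.

```python
import numpy as np

def ssim_patch(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-scale SSIM between two same-size grayscale patches.

    Illustrative whole-patch version: statistics are computed once over
    the patch rather than in a sliding window. Stabilizing constants
    C1, C2 follow the standard SSIM definition.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

Identical patches score 1, and any luminance, contrast, or structure mismatch pulls the score below 1; CWSSIM replaces the pixel statistics with complex wavelet coefficient statistics, which is what makes it insensitive to small translations.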