A crucial step in the assessment of an image compression method is the evaluation of the perceived quality of the compressed images. Typically, researchers ask observers to rate perceived image quality directly and use these ratings, averaged across observers and images, to assess how image quality degrades with increasing compression. These ratings in turn are used to calibrate and compare image quality assessment algorithms intended to predict human perception of image degradation. There are several drawbacks to using such omnibus measures. First, the interpretation of the rating scale is subjective and may differ from one observer to the next. Second, it is easy to overlook compression artifacts that are present in only particular kinds of images. In this paper, we apply a recently developed method for assessing perceived image quality, maximum likelihood difference scaling (MLDS), to evaluate the performance of a widely used image quality assessment algorithm, multiscale structural similarity (MS-SSIM). MLDS allows us to quantify supra-threshold perceptual differences between pairs of images and to examine how perceived image quality, estimated through MLDS, changes as the compression rate is increased. We apply the method to a wide range of images and also analyze results for specific images. This approach circumvents the limitations inherent in the use of rating methods, and also allows us to evaluate MS-SSIM for different classes of visual images. We show how the data collected by MLDS allow us to recalibrate MS-SSIM to improve its performance.