On a Structural Similarity Index Approach for Floating-Point Data

Data visualization is typically a critical component of post-processing analysis workflows for floating-point output data from large simulation codes, such as global climate models. For example, images are often created from the raw data as a means for evaluation against a reference dataset or image. While the popular Structural Similarity Index Measure (SSIM) is a useful tool for such image comparisons, generating large numbers of images can be costly when simulation data volumes are substantial. In fact, computational cost considerations motivated our development of an alternative to the SSIM, which we refer to as the Data SSIM (DSSIM). The DSSIM is conceptually similar to the SSIM, but can be applied directly to the floating-point data as a means of assessing data quality. We present the DSSIM in the context of quantifying differences due to lossy compression on large volumes of simulation data from a popular climate model. Bypassing image creation results in a sizeable performance gain for this case study. In addition, we show that the DSSIM is useful in terms of avoiding plot-specific (but data-independent) choices that can affect the SSIM. While our work is motivated by and evaluated with climate model output data, the DSSIM may prove useful for other applications involving large volumes of simulation data.

Because lossy compression remains a powerful tool for reducing the enormous volumes of simulation data produced on modern HPC machines, we are particularly interested in the use case in which IQAs help to discover and quantify image artifacts due to lossy compression. Recall that, in contrast to lossless compression, applying lossy compression to a dataset prohibits exact reconstruction of the original data. The SSIM is useful in this context as an objective means of assessing the effects of lossy compression. For example, in the medical imaging field, images are typically compressed to reduce unmanageable data volumes, but the potential loss of critical information, such as details needed to make an accurate diagnosis, is clearly a concern. Accordingly, the SSIM has been advocated as a means of evaluating compressed-medical-image quality in many studies (e.g., see [7], [8], [16], [26], [37]).
The application area of particular interest to us is climate modelling, where simulations are well known for producing enormous amounts of output data (e.g., terabytes or even petabytes). For a number of years, lossy data compression has been proposed as a means of mitigating the big data problem in climate research (e.g., [3], [11], [17], [38]), though its acceptance in the climate community is far from secured, as more comprehensive measurements for evaluating the loss of information are still needed. Given the importance of data visualization to climate scientists interacting with model output, an objective means of assessing whether images generated from the compressed model data are noticeably different from images based on the original model data is critical. Therefore, as part of an effort to persuade climate scientists to adopt lossy compression, we included the SSIM in a suite of measures to evaluate the "quality" of compressed climate simulation data [5]. In a follow-up work [4], we proposed a minimum threshold for SSIM values to indicate when differences could be seen when comparing images. This threshold was based on a forced-choice visual evaluation study in which participants indicated whether a visual difference could be seen with respect to the reference image that was created from uncompressed data. Note that evaluating the impact of lossy compression is a nontrivial task and depends on a number of factors, such as the characteristics of the data, the type of compression algorithm, and the intended scientific analysis of the data. Therefore, multiple evaluation approaches are typically necessary to instill confidence in the compressed dataset. Here, we do not analyze different methods of detecting compressor artifacts, as that has been done previously (e.g., [3], [5], [25]), but rather acknowledge that the SSIM is beneficial in this context and focus on how to reduce its cost.
While the SSIM is undoubtedly useful for objectively comparing images, several shortcomings arise in the context of its use in our compression-related research on large volumes of data. While we address those issues in this manuscript in the context of climate model data, the findings may be applicable to other application areas that rely on large volumes of floating-point simulation data. First, because the SSIM calculation is based on two corresponding grids of pixel values (i.e., a reference and a modified image), images must be rendered from the dataset values to be compared before the SSIM can be computed. Generating many images, for example, from a long time series of climate data, potentially at high spatial resolution, can be quite computationally intensive. Indeed, the computational cost of generating the images required for the SSIM from the climate model output makes the SSIM a much more expensive measure of lossily compressed data quality than other data comparison measures that only require the floating-point data for identifying problematic compressor artifacts. However, we are loath to abandon the SSIM due to cost considerations, given its popularity and documented usefulness in image quality assessment [35] as well as our own positive experiences with it [4], [5].
A second, albeit more minor, motivation is that the SSIM value is naturally dependent on plot parameters such as color scheme or geometric transformation (or other decisions that are not based on information contained in the data) that a scientist may make when creating a plot. As a result, for a particular floating-point dataset, two images that are considered indistinguishable based on their SSIM value given one set of plot parameters may become distinguishable (again, based on the SSIM) if the plot parameters are changed. A simple example of a plot setting for climate data that affects the SSIM is the global map projection. Because in our use case we often do not know how an image will be generated from the data, we would like the similarity measure, which indicates whether an image created from a compressed dataset is likely to be distinguishable from one generated from the original dataset, to be independent of plot parameters.
Hence, the cost of generating images from the raw data, as well as the possible dependence on plotting choices, motivates applying the SSIM directly to the climate model's floating-point dataset values rather than to the pixel values. Unfortunately, simply applying the standard SSIM formula without modification to floating-point data, rather than to pixel values of images created from the data, does not always result in the desired behavior. However, by making a few relatively simple but critical modifications, which we collectively refer to as the Data SSIM (DSSIM), we obtain a useful measure to apply directly to floating-point datasets that is comparable to the SSIM and better discriminates between the differences in our test datasets. In this paper, we make the following contributions:
• demonstrate the effect of basic data modifications and image generation choices on the computed SSIM value to improve our understanding of SSIM value ranges and dependencies;
• present an SSIM-like statistic that can be applied directly to floating-point data, thus avoiding the computational expense of rendering otherwise unnecessary images; and
• provide an in-depth evaluation that illustrates the method's utility in evaluating the effects of data compression on large volumes of climate data, particularly in terms of cost reduction.
The remainder of this paper is organized as follows. In Section 2, we review the SSIM and demonstrate its dependence on plot and parameter choices. Next, in Section 3, we discuss considerations for floating-point data and introduce the DSSIM approach. In Section 4, we discuss the applicability of the DSSIM in the context of evaluating lossy compression on climate data. We provide concluding remarks in Section 5.

STRUCTURAL SIMILARITY INDEX (SSIM)
As previously noted, full-reference IQAs are a popular means for comparing two images, where one image is typically the reference image against which the quality of the second image is being compared. The IQA value is intended to be an objective measurement of the more subjective concept of how noticeably different the two images are, say to a human observer. While the SSIM was developed to compare the encoding of natural images, we found in previous work [4] that the SSIM showed good predictive ability to gauge when experts perceive differences in images generated from climate model simulation data. In fact, while a number of other IQAs showed good predictive ability, the SSIM performed the best. It is important to note that the plots of most interest to the climate community, in diagnostic packages for example, are typically those that smoothly map the floating-point data to RGB values. In other words, pseudocolor plots (or possibly filled contour plots, depending on the transfer function) that use a smooth colormap are suitable for comparison with the SSIM. We do not consider scatter plots, data plots with a lot of white space, glyph-based techniques, or, more generally, plots for which a small change in the data causes a large or abrupt change in the image. Also note that the SSIM can be sensitive to accessories on the plot such as plot grids and labels [32], which should be removed for comparison purposes. Figure 1 contains an example of the type of pseudocolor plots that climate scientists typically create for a commonly used climate variable. Both plots in the figure show a single time slice of surface temperature (TS) data; the top plot contains the original (not compressed) data and the bottom plot contains data that has been aggressively (lossily) compressed such that artifacts are clearly visible. The application dataset from which we obtained these data and the compressor are described in Section 4.

Method overview
The SSIM enjoys widespread use across a number of disciplines. While some recent works cast doubt on the popular notion that the SSIM truly represents human visual perception (e.g., [6], [21]), it nevertheless remains very popular in practice, due in large part to its simplicity and usefulness as a statistical measure [31]. The SSIM is the product of three factors that are intended to represent luminance, contrast, and structure. Consider comparing two 2D images $X$ and $Y$, each of dimension $m_x \times m_y$ with $M = m_x m_y$ pixel values. The SSIM is computed by first calculating so-called per-pixel SSIM values comparing local patches (or windows) of the images. Let $x_i$ and $y_i$ be local image patches (i.e., 2D arrays that have been flattened) taken from the same location in $X$ and $Y$, respectively. Subscript $i$ indicates the pixel index in $X$ and $Y$ that is at the center of the local window ($i \le M$), and $N = n^2$ is the number of pixels in the local window. Then, in the local window centered at pixel $i$ with $x_i$ and $y_i$ containing $N$ pixel values, the per-pixel SSIM value is

$$\mathrm{SSIM}(x_i, y_i) = [l(x_i, y_i)]^{\alpha} \, [c(x_i, y_i)]^{\beta} \, [s(x_i, y_i)]^{\gamma}, \tag{1}$$

where $l(x_i, y_i)$ is the luminance term, $c(x_i, y_i)$ is the contrast term, and $s(x_i, y_i)$ is the structure term. Parameters $\alpha$, $\beta$, and $\gamma$ adjust the relative importance of the three terms. The luminance, contrast, and structure terms for each local patch $i$ use the means ($\mu_{x_i}$, $\mu_{y_i}$), variances ($\sigma_{x_i}^2$, $\sigma_{y_i}^2$), and covariance ($\sigma_{x_i y_i}$) that are computed on the local window, typically with Gaussian weights, from the arrays $x_i$ and $y_i$:

$$l(x_i, y_i) = \frac{2\mu_{x_i}\mu_{y_i} + C_1}{\mu_{x_i}^2 + \mu_{y_i}^2 + C_1}, \tag{2}$$

$$c(x_i, y_i) = \frac{2\sigma_{x_i}\sigma_{y_i} + C_2}{\sigma_{x_i}^2 + \sigma_{y_i}^2 + C_2}, \tag{3}$$

$$s(x_i, y_i) = \frac{\sigma_{x_i y_i} + C_3}{\sigma_{x_i}\sigma_{y_i} + C_3}. \tag{4}$$

Constants $C_1$, $C_2$, and $C_3$ are chosen to provide numerical stability by avoiding a zero denominator. Note that the choice $C_1 = C_2 = C_3 = 0$ is equivalent to the universal quality index [34], or UQI, which is a precursor to the SSIM. To simplify (1), the following assumptions are suggested in [36]:

$$\alpha = \beta = \gamma = 1, \qquad C_3 = C_2 / 2.$$

This yields a simpler form of the per-pixel equation (1):

$$\mathrm{SSIM}(x_i, y_i) = S_1(x_i, y_i) \, S_2(x_i, y_i), \tag{5}$$

where

$$S_1(x_i, y_i) = \frac{2\mu_{x_i}\mu_{y_i} + C_1}{\mu_{x_i}^2 + \mu_{y_i}^2 + C_1} \tag{6}$$

and

$$S_2(x_i, y_i) = \frac{2\sigma_{x_i y_i} + C_2}{\sigma_{x_i}^2 + \sigma_{y_i}^2 + C_2}. \tag{7}$$

Then the SSIM for the entire image, $\mathrm{SSIM}(X, Y)$, which is sometimes referred to as the mean SSIM, is the average of the per-pixel SSIM values calculated for each local window $i$:

$$\mathrm{SSIM}(X, Y) = \frac{1}{M} \sum_{i=1}^{M} \mathrm{SSIM}(x_i, y_i). \tag{8}$$

The SSIM value has a couple of important properties. First, $\mathrm{SSIM}(X, Y) = 1$ if and only if $X = Y$. Also, $-1 \le \mathrm{SSIM}(X, Y) \le 1$, and the closer $\mathrm{SSIM}(X, Y)$ is to 1, the more similar the images are. In practice, most calculated SSIM values are positive, with a negative value only occurring when the covariance term is negative (assuming nonnegative pixel values).
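The simplified per-pixel form (the product of the mean-based term S1 and the variance/covariance-based term S2, averaged over windows) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the reference implementation from [36]: the function names `gaussian_window` and `mean_ssim` are hypothetical, the Gaussian width `sigma=1.5` is an assumed default, and only windows that fit entirely inside the image are evaluated.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_window(n=11, sigma=1.5):
    """Return an n x n Gaussian weight window that sums to one."""
    ax = np.arange(n) - (n - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()


def mean_ssim(X, Y, data_range=1.0, K1=0.01, K2=0.03, n=11, sigma=1.5):
    """Mean of the per-pixel values S1 * S2 over all full interior windows."""
    X = np.asarray(X, dtype=np.float64)
    Y = np.asarray(Y, dtype=np.float64)
    C1 = (K1 * data_range) ** 2
    C2 = (K2 * data_range) ** 2
    w = gaussian_window(n, sigma)

    # One n x n patch per interior pixel; border pixels are skipped.
    px = sliding_window_view(X, (n, n))
    py = sliding_window_view(Y, (n, n))

    # Gaussian-weighted local means, variances, and covariance.
    mu_x = (px * w).sum(axis=(-2, -1))
    mu_y = (py * w).sum(axis=(-2, -1))
    var_x = (px * px * w).sum(axis=(-2, -1)) - mu_x ** 2
    var_y = (py * py * w).sum(axis=(-2, -1)) - mu_y ** 2
    cov_xy = (px * py * w).sum(axis=(-2, -1)) - mu_x * mu_y

    S1 = (2.0 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    S2 = (2.0 * cov_xy + C2) / (var_x + var_y + C2)
    return float((S1 * S2).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((64, 64))
    Y = np.clip(X + 0.05 * rng.standard_normal(X.shape), 0.0, 1.0)
    print("SSIM(X, X) =", mean_ssim(X, X))
    print("SSIM(X, Y) =", mean_ssim(X, Y))
```

Comparing an image with itself gives a value of one, and adding noise lowers the value, in line with the properties listed above.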

Implementation
For the implementation proposed in [36], the authors suggest the following constant values:

$$C_1 = (K_1 L)^2, \qquad C_2 = (K_2 L)^2, \tag{9}$$

where $K_1 = 0.01$, $K_2 = 0.03$, and $L$ is the dynamic range of the pixels (so $L = 255$ for 8-bit images or $L = 1$ if the image range is [0, 1]). The authors note that the choice of these constants is "somewhat arbitrary," but claim that the SSIM is "fairly insensitive" to their values [36]. The constant values are of particular interest when applied directly to floating-point simulation data, as discussed in the next section. Another implementation detail is the local window (or patch) over which the statistics (mean, variance, and covariance) for the per-pixel SSIM values in (8) are computed. The recommendation in [36] is an $N = 11 \times 11$ window with a Gaussian filter kernel. Note that for the windows centered on the pixels at the boundaries of the image (i.e., within five pixels of the edge for the 11 × 11 kernel), the Gaussian filter requires special treatment to handle the missing values (i.e., those outside the image boundary). However, in the implementation in [36], these per-pixel SSIM values from the edge regions are simply excluded from the averaged SSIM value in (8), which could lead to reduced emphasis on pixels near the edges of an image.
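To make the dependence of the stabilizing constants on the dynamic range concrete, the short sketch below evaluates C1 = (K1 L)^2 and C2 = (K2 L)^2 for a few values of L. The helper name `ssim_constants` is hypothetical, and the value L = 100 is only an illustrative floating-point range, not one prescribed by [36].

```python
def ssim_constants(L, K1=0.01, K2=0.03):
    """Return (C1, C2) = ((K1 * L)**2, (K2 * L)**2) for dynamic range L."""
    return (K1 * L) ** 2, (K2 * L) ** 2


if __name__ == "__main__":
    # 8-bit pixels, pixels scaled to [0, 1], and an illustrative data range.
    for L in (255.0, 1.0, 100.0):
        C1, C2 = ssim_constants(L)
        print(f"L = {L:6.1f}: C1 = {C1:.6g}, C2 = {C2:.6g}")
```

Because both constants scale with the square of L, a modest change in the range of the data changes their size relative to the local statistics considerably.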
Because the SSIM is quite popular, many implementations are available. We use the Python implementation of the SSIM that is available via SCIKIT-IMAGE [30]. This version closely follows that of the simplified SSIM from [36], given here in (5), using the default suggested parameters for the constants $C_1$ and $C_2$ and the Gaussian kernel size 11 × 11. This implementation similarly ignores the border per-pixel SSIM values when computing the overall mean SSIM. Note that for this SCIKIT-IMAGE version, one must specify "gaussian_weights=True" and "use_sample_covariance=False" to match the implementation in [36].

SSIM value dependencies
The computed SSIM value for a pair of images naturally depends on a number of plot and SSIM parameter choices that are independent of the floating-point data. Here we give a few examples to demonstrate the effect of these choices on the computed SSIM for two commonly used climate variables: surface temperature (TS) and precipitation rate (PRECT). Recall that the original TS data is shown in the top panel of Figure 1; Figure 2 displays the PRECT data. These plots are representative of the types that climate scientists typically create and were generated with the LDCPY Python package, which uses MATPLOTLIB [12] and CARTOPY [20]. While our test data is further detailed in Section 4, note that these two sample variables have quite different characteristics: variable TS is relatively smooth and has a modest-sized range (≈ 100 K), while PRECT contains zero and near-zero values, changes more abruptly, and spans several orders of magnitude.
For reference, we briefly familiarize ourselves with the range of SSIM values that result from comparing climate variable images such as these. Table 1 lists in the third column the SSIM value that results from comparing the original TS data, as shown at the top of Figure 1, with several test data cases (columns 4 and 5 are discussed in later sections). For example, the LOSSY case uses the lossily compressed data shown in the bottom of Figure 1, and the visual difference due to the rather aggressive choice of compression is quite obvious, resulting in an SSIM value of 0.93453. The remaining test data cases are generated as follows. We refer to the original TS data as ts_orig and its maximum and minimum values as max and min, respectively. The data for the inverse (INV) case are computed by inv_data = max − ts_orig + min. Data for the RAND case are set to randomly generated values between min and max, and data for MEAN are constant values equal to the mean over all data in ts_orig. Finally, the MIN data are all equal to min, ZERO indicates an array of all zeros, and PERT adds a random perturbation from [1.0e-7, 0.1] to ts_orig. Note that when comparing two images created from simulation data with the SSIM, one needs to ensure that both figures use the same colorbar, meaning that the transform from data to image space must be the same. For the tests in Table 1, we use the same colorbar shown in Figure 1 with its extents set to the minimum and maximum values from the two datasets being compared.
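The test cases described above are simple array transformations, which the following NumPy sketch reproduces on a synthetic stand-in field. Several details are assumptions made for illustration only: the smooth synthetic "temperature" array replaces the actual TS data, the helper name `make_test_cases` is hypothetical, and uniform sampling is assumed for RAND and PERT, since the text specifies only the intervals.

```python
import numpy as np


def make_test_cases(ts_orig, seed=0):
    """Build the INV, RAND, MEAN, MIN, ZERO, and PERT variants of ts_orig."""
    rng = np.random.default_rng(seed)
    lo, hi = float(ts_orig.min()), float(ts_orig.max())
    return {
        "INV": hi - ts_orig + lo,                          # flip about the range
        "RAND": rng.uniform(lo, hi, size=ts_orig.shape),   # no structure left
        "MEAN": np.full_like(ts_orig, ts_orig.mean()),     # constant at the mean
        "MIN": np.full_like(ts_orig, lo),                  # constant at the minimum
        "ZERO": np.zeros_like(ts_orig),                    # all zeros
        # Assumed uniform perturbation drawn from [1.0e-7, 0.1].
        "PERT": ts_orig + rng.uniform(1.0e-7, 0.1, size=ts_orig.shape),
    }


if __name__ == "__main__":
    # Synthetic smooth field with a range of roughly 80 K (illustrative only).
    lat = np.linspace(-np.pi / 2, np.pi / 2, 48)
    lon = np.linspace(0.0, 2.0 * np.pi, 96, endpoint=False)
    ts_orig = 220.0 + 80.0 * np.cos(lat)[:, None] + 2.0 * np.sin(lon)[None, :]
    for name, data in make_test_cases(ts_orig).items():
        print(f"{name:5s} min = {data.min():9.4f}  max = {data.max():9.4f}")
```

Each variant can then be compared against `ts_orig` with any of the SSIM variants discussed in this paper.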
The SSIM values for these test cases in column 3 of Table 1 may not be immediately intuitive. For example, comparing to all zeros does not result in an SSIM value of zero. In fact, because the comparison uses the same colorbar for the transformation to image space, the ZERO test case is more visually similar to the original than one would think. In particular, because the colorbar range has been expanded to include zero, the original data visually appears much closer to constant-valued than in the top of Figure 1. Thus, when the original is compared to the constant array of zeros, the SSIM value is surprisingly far from zero (0.72804). Interestingly, the MIN and ZERO cases have nearly the same SSIM value, both of which are noticeably lower than that of the MEAN case. Considering that the MEAN is also constant-valued, one might desire the SSIM to be lower than it is (0.90384). Indeed, that the SSIM of MEAN is considerably higher than that of MIN is not necessarily intuitive, as approximating the original data by a constant-valued array of either the mean or the minimum would make for quite a poor compressor in terms of visual quality. The RAND case is by far the worst according to the SSIM, which is expected as all structure is gone. The INV case is interesting as well, as structure has been maintained in some sense, but the data are quite different. These test cases are presented not to draw sweeping conclusions about the SSIM, but simply to illustrate what the SSIM value range may look like for a particular variable. We now explore the effects of plot choices and SSIM constants on the SSIM calculation for both TS and PRECT and list the results in Table 2. Each row in the table compares the same two sets of data for both TS and PRECT. In particular, the TS and PRECT data used in this section are included in the LDCPY package in the data/cam-fv directory (from NetCDF files zfp1e-1.TS.100days.nc, orig.TS.100days.nc, zfp1e-7.PRECT.60days.nc, and orig.PRECT.60days.nc). The SSIM values in subsequent rows differ from the default due to choices in either the way the plots are generated or the way the SSIM is calculated; the underlying floating-point data is the same for each row. The first row of the table, labeled "default," lists the result of computing the SSIM via LDCPY, which uses the previously mentioned SCIKIT-IMAGE SSIM implementation. Recall that the default images for the original TS and PRECT variable data are given in Figure 1 (top) and Figure 2, respectively. The compressed data that we compare against are not shown, as they have been chosen to be similar enough that differences cannot be seen for TS at this scale and are unlikely to be noticed for PRECT either, as indicated by the relatively high SSIM values for TS and PRECT: 0.99985 and 0.99153, respectively. We purposely chose a default case with high SSIM values for each variable, as values near a potential SSIM cutoff threshold (e.g., one that indicates whether differences are noticeable to a human) are of most interest for our application. This concept of a cutoff threshold is discussed in the context of climate data in Section 4.
In the top half of Table 2, a subset of the parameter choices used to create these default plots are listed in the second column. For each row after the "default" case, the parameter that is changed when creating both the original and compressed images is indicated in the third column, labeled "modification". For reference, plots for a subset of these modifications for TS are displayed in Figure 3. The differences in SSIM values give an idea of the effect of the plotting choices for these two sample variables. Other climate variables may be more or less sensitive to these changes, but that is not our focus. Note that while the first two significant digits are the same (0.99) in each row for TS, we are interested in five significant digits to match the number of digits in the SSIM threshold from previously mentioned quality measures for climate compression [4]. The second row shows that removing the coastlines does not have much effect on the SSIM for TS, but a bit more for PRECT. Enlarging the extents on the colormap (Figure 3a) as in row 3 of Table 2 moves the SSIM closer to 1.0 for both variables, as would be expected. Figure 3b shows the plot generated using contourf() instead of pcolormesh(), which also has a small effect for TS but a larger effect for PRECT as seen in Table 2 (row 3). The type of data projection onto a map, which is common for climate data, influences the SSIM as well.
The equal-area map projection (row 6) actually has more of an influence for TS (Figure 3c) than for PRECT. On the other hand, the often used equirectangular projection (row 7) has no effect on TS (Figure 3d), but does affect PRECT. The last two rows in the top half of the table illustrate that using a different colormap can definitely affect the SSIM, particularly when it is quite different from the original, such as that shown for TS in Figure 3e. The colormap changes the SSIM less (but still notably for PRECT) when it is similar to the original (Figure 3f). Note that while the SSIM is not influenced by the color or hue of an image [10], [31], [32], when we encode the floating-point data into a colormap, the characteristics of the colormap can influence the SSIM [32]. For example, because the prism colormap is more segmented than the default, it affects the SSIM more [32] than the cool colormap, which is more similar to the default coolwarm map. Other factors that we have not yet mentioned that have been shown to affect the SSIM include the size of the local window, whether or not to use Gaussian weights, and how the edge pixels are treated [14].
The bottom half of Table 2 focuses on the SSIM constants $K_1$ and $K_2$ as defined in (9). While it is customary to use the recommended defaults proposed in [36], we have found the SSIM calculation to be sensitive to the values for $K_1$ and $K_2$ (which in theory should only contribute to numerical stability) in some cases. We note that the SSIM's sensitivity to constants is shown in [27] as well, particularly the sensitivity to $K_2$, which is what we found with our climate data. Recall that the first row in Table 2 (labeled "default") gives the SSIM values for the default values of $K_1$ and $K_2$. For the number of significant digits that we list, only the effects of changing $K_2$ are noticeable. Indeed, Table 2 shows that for the PRECT data, changing $K_2$ can cause quite a large difference, exceeding anything shown in the top half of the table by orders of magnitude.
For our use case of interest and likely others, the observation that the SSIM calculation as in (5) is more sensitive to $K_2$ than $K_1$ meets expectations. In practice, the SSIM is generally used to compare images that are supposed to be similar in some sense (e.g., in our case, differing only in compressor-induced artifacts). Therefore, it is reasonable to assume that the means of the two images will be quite similar. As a result, for the first term in the SSIM, $S_1$ as given in (6), if $\mu_{x_i}$ and $\mu_{y_i}$ are nearly the same over the local window, then the value of $C_1$ (and therefore $K_1$) is unimportant, and $S_1$ will be nearly one. However, the situation is less clear for the second term in the SSIM calculation, $S_2$ as given in (7). The magnitudes of the numerator and denominator statistics, $\sigma_{x_i y_i}$ and $(\sigma_{x_i}^2 + \sigma_{y_i}^2)$, respectively, may be less stable than the mean over the 11 × 11 window, in which case the value of $C_2$ (and thus $K_2$) becomes more influential. This influence is important as we now move to applying the SSIM directly to the floating-point data.
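A small numerical sketch illustrates the argument above. The local window statistics used here are hypothetical values chosen for illustration (nearly equal means, small variances, data range L = 1), not numbers taken from our data; the function names `s1` and `s2` are likewise illustrative.

```python
def s1(mu_x, mu_y, C1):
    """Mean-based term: (2*mu_x*mu_y + C1) / (mu_x**2 + mu_y**2 + C1)."""
    return (2.0 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)


def s2(var_x, var_y, cov_xy, C2):
    """Variance/covariance term: (2*cov_xy + C2) / (var_x + var_y + C2)."""
    return (2.0 * cov_xy + C2) / (var_x + var_y + C2)


if __name__ == "__main__":
    # Hypothetical statistics for one local window (assumed, with L = 1).
    mu_x, mu_y = 0.500, 0.501
    var_x, var_y, cov_xy = 4.0e-4, 5.0e-4, 3.0e-4

    for K in (0.03, 0.01, 0.001, 0.0001):
        C = K ** 2
        print(f"K = {K:7.4f}  S1 = {s1(mu_x, mu_y, C):.6f}  "
              f"S2 = {s2(var_x, var_y, cov_xy, C):.6f}")
```

With these values, S1 stays essentially at one for every constant, whereas S2 moves from roughly 0.83 down to roughly 0.67 as the constant shrinks, because the constant is comparable in size to the variances and covariance.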
A few considerations that we do not specifically address are related to the grid that the floating-point data live on. When an image is created from gridded data, the image may have either more or fewer pixels than grid points, depending on the chosen image resolution and the data grid size. In addition, we are assuming that we have structured grid data.

APPLICATION TO FLOATING-POINT DATA
Recall that our primary interest in the SSIM is as a tool for evaluating the effects of lossy compression on climate simulation data. While we also use other metrics to evaluate compression quality, the SSIM has proven useful for quantifying visual differences in images created from climate data [4]. The primary advantage of and motivation for applying an SSIM-like statistic directly to the floating-point data (rather than the image pixel values) is the reduction in computational cost associated with generating images, particularly when data volumes are large and the images created are not actually needed for any other purpose. A second, more minor advantage of operating directly on the floating-point data is that we can avoid plot-specific but data-independent choices (color, scale, axes, grid transform, etc.), which may result in different SSIM values for images created from the same datasets. To summarize, our goal is to determine whether we can apply the SSIM to the raw simulation data and obtain an indication of whether differences in the data are likely to impact a visual assessment, without committing to the creation of a specific set of images from the data.
Finally, in developing a modified SSIM for floating-point data, we were motivated to rethink the choices for the constants. While the SSIM constants were introduced to prevent dividing by zero [36], we would prefer that they do not noticeably affect the SSIM values, as they did in the previous section, especially for PRECT. In particular, if the SSIM statistics in the local window (i.e., mean, covariance, variance) are close to zero for the floating-point data, then the constants may have an outsized effect. This characteristic is not uncommon when comparing to data with modest compression. Further, it is known that SSIM values tend to saturate toward one, and we see this effect even when the data are quite different, as for TS in Table 1. By making the constants less influential, we can spread the range away from one.
In this section, we first describe applying the SSIM formula, without modification, directly to floating-point data and explain why further modifications were desired. We then discuss the further modifications that collectively result in the DSSIM and how they are useful. We compare the SSIM variants on climate model data with compression in Section 4.

A straightforward approach (SF-DSSIM)
The straightforward approach to extending the SSIM to floating-point data is simply to use the SSIM equation in (8) to compare the 2D arrays with $M$ grid points, where arrays $x_i$ and $y_i$ now contain the floating-point values in the local window of size $N$, where typically $N = 11 \times 11$, and are centered at grid location $i$. The constant definitions and defaults in (9) remain the same, but now $L$ is the dynamic range of the floating-point data. We refer to this variant as the straightforward data SSIM, or SF-DSSIM.
There are a number of considerations when applying the SSIM formula directly to floating-point data instead of to pixel values. First, the suggested SSIM values for $K_1$ and $K_2$ may not be appropriate for every dataset. While for the SSIM the pixel range is typically $L = 255$ or $L = 1$, for floating-point numbers the range may be much larger. If the range is quite large, then the suggested values for $K_1$ and $K_2$ may result in constants that are too big compared to the data values at some locations. This situation is of concern as the constants are only meant to prevent division by zero. Another scenario is that in which the dynamic range for a set of data could be $L = 1$, with all values in [0, 1], including many very small near-zero values (e.g., of order 1e-20), as happens for the previously mentioned PRECT variable. In other words, if the data range is small but the range of exponents is quite large, then the constants will dominate the SF-DSSIM computation. This size mismatch results in SF-DSSIM values of one or near one, even when the two datasets are quite different. (Recall that the SSIM value is 1.0 only when the two images are identical.) Two more minor considerations also arise when dealing with floating-point simulation data. One is that a NaN (or fill value or missing value) can be encountered in the data, which is common for climate data. In this case, we do not want such values to propagate to the entire local window when the Gaussian filter is applied, and the code must handle this situation. The other is that while pixel values are typically nonnegative, floating-point values are often negative (or a mix of positive and negative). Therefore, the situation where the signs of the means $\mu_{x_i}$ and $\mu_{y_i}$ are opposite can occur and cause $S_1(x_i, y_i)$ as given in (6) to be negative.
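Two of the considerations above can be reproduced with a single-window calculation. The sketch below is illustrative only: it uses unweighted population statistics over one hypothetical 121-value window rather than the Gaussian-weighted statistics of the actual method, and the synthetic values (uniform numbers of order 1e-20, and a window whose sign is flipped) are assumptions chosen to trigger each effect.

```python
import numpy as np


def window_terms(x, y, L=1.0, K1=0.01, K2=0.03):
    """Return (S1, S2) for one window using unweighted statistics."""
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    S1 = (2.0 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)
    S2 = (2.0 * cov_xy + C2) / (var_x + var_y + C2)
    return float(S1), float(S2)


rng = np.random.default_rng(0)

# (a) Tiny values: y is the "inverse" of x, so the two windows are
# perfectly anticorrelated, yet the default constants hide this.
x = rng.uniform(0.0, 1.0e-20, size=121)
y = x.max() + x.min() - x
s1_def, s2_def = window_terms(x, y)                  # default K1, K2
s1_zero, s2_zero = window_terms(x, y, K1=0.0, K2=0.0)  # no constants

# (b) Opposite-sign means: the mean-based term becomes negative.
u = rng.normal(0.5, 0.1, size=121)
s1_neg, _ = window_terms(u, -u)

if __name__ == "__main__":
    print("tiny values, default constants: S1*S2 =", s1_def * s2_def)
    print("tiny values, zero constants:    S2    =", s2_zero)
    print("opposite-sign means:            S1    =", s1_neg)
```

In case (a) the default constants are many orders of magnitude larger than the local statistics, so the product is one to machine precision even though the covariance-based term without constants is close to negative one.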
In Table 1, the second column from the right lists the SF-DSSIM values for the modified TS datasets. Note that the MIN and ZERO cases have nearly the same SSIM values, but quite different SF-DSSIM values, which arguably makes more intuitive sense in that a comparison to all zeros results in a nearly zero value. For the PERT case, the SF-DSSIM is actually closer to 1.0 than the SSIM value. As will be discussed further in the context of climate data, this behavior is largely the result of the quantization step in rendering the images for the SSIM calculations. This particular perturbation results in quantization bin changes (which can increase the difference) for some of the values perturbed at the high end of the interval. If the perturbation values were all small enough, e.g., in [1.0e-7, 1.0e-5], then the SSIM and SF-DSSIM would both be 1.0 (to five significant digits).

Data SSIM (DSSIM)
We now explain the modifications to the SF-DSSIM approach that collectively result in our proposed variant of the SSIM for floating-point data, which we refer to as DSSIM. To begin, we normalize both sets (the original and that to compare) of floating-point data to the range [0, 1]. We normalize the data for a couple of reasons. First, normalizing to this range makes determining appropriate constants, which we discuss shortly, much easier. This step both eliminates the need for the $L$ term in the constants in (9), as $L = 1$, and ensures that $S_1(x_i, y_i)$ is nonnegative, as is typically the case with the SSIM (meaning that a negative value can only result from a negative covariance). Also, when visualizing floating-point data, the first step is to transform the data to the color bar range. We assume a linear transform and note that pixel values are often normalized so that each pixel has a value between 0 and 1.

Fig. 4: This plot shows the effect of modifying the value of $K_1$ ($= K_2$) when computing the SSIM, SF-DSSIM, and DSSIM for comparing the original surface temperature (TS) data to the lossily compressed data (with compressor ZFP and p=8).
In determining the choice of constants for the DSSIM, recall that the two constants $C_1$ and $C_2$ are intended to provide numerical stability. We simply want $C_1$ and $C_2$ to be small enough to not disproportionately influence the value of the DSSIM, yet big enough to prevent division by zero. We set them to equal values largely for convenience: $C_1 = C_2$ and $K_1 = K_2$. Therefore, because $L = 1$ for the DSSIM, we have $C_1 = K_1^2$, and we find that a small value is a reasonable choice for the DSSIM. We verified this choice for our application data by examining the influence of changing the constant on the DSSIM calculation for a number of different variables and compressor levels and finding the largest constant value that no longer influences the DSSIM value. For example, in Figure 4, we show the effect of changing the constant values for the SSIM, SF-DSSIM, and DSSIM when comparing the original TS data with the lossily compressed version shown in Figure 1 and listed in the first row of Table 1. The dashed lines indicate the default values for the three SSIM approaches; for the SSIM and SF-DSSIM, the default lies at approximately $K_1 = K_2 = 0.01$ on the plot ("approximately" because $K_2 = 0.03$ in these methods, which results in a subtle difference). The plot shows that the DSSIM value no longer decreases with decreasing constant at $K_1 = 0.0001$, and this behavior is representative of what we observed for test data from other variables. While this choice is likely reasonable for data from other application areas as well, it is easy to verify that the computed DSSIM values are not sensitive to the constants.
Another difference for the DSSIM is that after normalizing the floating-point data to the range [0, 1], the DSSIM quantizes the data into 256 bins. The quantization mimics a linear color map transformation by allowing the DSSIM to use a precision on the floating-point data similar to the precision the SSIM uses on image pixel data. This step is particularly needed when comparing data that is visually very similar, meaning that the SSIM value is quite close to one. We demonstrate and further explain the effects of this modification for climate variables in the next section.
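The preprocessing described above (normalization to [0, 1], quantization into 256 levels, and constants $C_1 = C_2 = K_1^2$ with $L = 1$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the LDCPY implementation: the helper names are hypothetical, the two datasets are assumed to be scaled with their joint minimum and maximum, quantization is assumed to round to the nearest of 256 evenly spaced values, and the SSIM formula is evaluated on a single patch with uniform weights rather than with the Gaussian window. The data are synthetic.

```python
import numpy as np


def normalize_pair(a, b):
    """Scale both arrays to [0, 1] using their joint min and max (an assumption)."""
    lo = min(np.nanmin(a), np.nanmin(b))
    hi = max(np.nanmax(a), np.nanmax(b))
    span = hi - lo
    if span == 0:
        # Constant data: map everything to zero to avoid dividing by zero.
        return np.zeros_like(a, dtype=float), np.zeros_like(b, dtype=float)
    return (a - lo) / span, (b - lo) / span


def quantize(u, levels=256):
    """Round values in [0, 1] to the nearest of `levels` evenly spaced values."""
    return np.round(u * (levels - 1)) / (levels - 1)


def patch_ssim(x, y, k1=1e-4):
    """SSIM formula on one patch, uniform weights, L = 1, C1 = C2 = K1**2."""
    c = k1 ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    mean_term = (2 * mx * my + c) / (mx ** 2 + my ** 2 + c)
    cov_term = (2 * cov + c) / (vx + vy + c)
    return mean_term * cov_term


# Synthetic temperature-like patch and a slightly perturbed copy (not CESM data).
rng = np.random.default_rng(0)
orig = rng.normal(288.0, 5.0, size=(11, 11))
recon = orig + rng.normal(0.0, 0.05, size=orig.shape)

x, y = normalize_pair(orig, recon)
x, y = quantize(x), quantize(y)

# Sweep the constant to see where the value stops changing.
for k1 in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"K1 = {k1:g}: patch SSIM = {patch_ssim(x, y, k1):.8f}")
```

Sweeping $K_1$ in this way is the same kind of sensitivity check as the one shown in Figure 4, applied here to a single synthetic patch.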
Finally, we address any NaN values (or fill or missing values, which we hereafter also refer to as NaNs) present in the data when the Gaussian kernel is applied locally to each 11 × 11 window. If the center point of the window (grid point $i$) is NaN, that window calculation is simply excluded from the final mean SSIM calculation given in (8). However, if the value at the center of the window is not NaN but any of the other local values in the window are NaN, the filter must be modified (otherwise the DSSIM$(x_i, y_i)$ value will be set to NaN). In the convolution and 2D Gaussian kernel functions in the ASTROPY² package [2], [1], NaNs are replaced by interpolating from neighboring data points within a given kernel. In particular, for the kernel, the DSSIM uses Gaussian2DKernel(x_stddev=1.5, x_size=11, y_size=11), and then we convolve with filter_args = {'boundary': 'fill', 'preserve_nan': True}. The boundary option for convolve is not important in the current implementation because the DSSIM values whose windows extend past the boundary are ignored when computing the mean DSSIM over the grid, as with the SSIM. Note that SCIPY convolution routines do not properly deal with NaNs at this time, but our DSSIM implementation, available in the previously mentioned LDCPY package, does.
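To make the NaN handling concrete, the following sketch computes a Gaussian-weighted local mean that skips NaNs inside the window and renormalizes the remaining weights. It is a NumPy-only stand-in written for illustration and is neither the ASTROPY nor the LDCPY code; renormalizing the weights is our reading of "interpolating from neighboring data points within a given kernel", and the function names are hypothetical. Points whose center is NaN, or whose window would extend past the array edge, are left as NaN so that they drop out of a subsequent mean.

```python
import numpy as np


def gaussian_kernel(size=11, sigma=1.5):
    """Normalized 2D Gaussian weights on a size x size window."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def nan_aware_weighted_mean(field, kernel):
    """Gaussian-weighted local mean that ignores NaNs and renormalizes weights.

    The output is NaN where the window center is NaN and where the window
    would extend past the array edge.
    """
    half = kernel.shape[0] // 2
    rows, cols = field.shape
    out = np.full(field.shape, np.nan)
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            if np.isnan(field[i, j]):
                continue  # center is NaN: this window is excluded
            win = field[i - half:i + half + 1, j - half:j + half + 1]
            valid = ~np.isnan(win)
            w = kernel[valid]
            out[i, j] = np.sum(w * win[valid]) / np.sum(w)
    return out


# Constant synthetic field with two missing values.
field = np.full((15, 15), 2.0)
field[7, 7] = np.nan
field[6, 8] = np.nan

local_mean = nan_aware_weighted_mean(field, gaussian_kernel())
print(local_mean[7, 8])  # center is valid; NaN neighbors are skipped
print(local_mean[7, 7])  # center is NaN; stays NaN
```

The explicit double loop keeps the logic easy to follow; a production implementation would vectorize the window computation.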
Referring back to Table 1, the rightmost column lists the DSSIM values. Note that the negative value for the INV case helpfully reflects the negative correlation in the data. The three constant-valued sets MEAN, MIN, and ZERO are now all similar and very close to zero. RAND is now also near zero. From our user base's point of view, the DSSIM values are more intuitively in line with the type of changes represented by the test cases. In the next section, we will use climate simulation data to demonstrate the benefit of using the DSSIM.

APPLICATION TO CLIMATE DATA
We now describe our investigation into whether we can use the DSSIM instead of the SSIM for evaluating the effects of lossy compression on climate data in order to avoid the computational cost incurred by generating images. We use data from the Community Earth System Model (CESM) [13], which is a popular climate model that generates far too much data (e.g., terabytes or petabytes), hence the interest in lossy compression. In previous work in [5] and [4], we applied the SSIM to images created from CESM data with the NCAR Command Language (NCL) [29], which are similar to those generated by the Atmosphere Working Group Diagnostics Package (AMWG-DP). While AMWG-DP-type images are familiar to scientists in the Earth science community because of the package's historically widespread use, Python has been quickly replacing NCL as the analysis tool of choice in recent years. In fact, many scientists are doing their own analyses in Python with the help of communities such as Pangeo³ [9] and creating their own images. This change in post-processing analysis provided further motivation for us to use an SSIM-like measurement that is independent of plot choices.

Experimental data details
The experiments in this paper use a subset of data from the popular and publicly available CESM Large Ensemble Community Project (CESM-LENS) [15]. In particular, we use the CESM-LENS data corresponding to the RCP8.5 forcing period, which begins in January 2006, and ensemble member 31. We focus on the atmospheric model output, which uses a one-degree latitude-longitude grid (32-bit). In Table 3, we list a subset of the atmospheric variables for which we show results. These variables are among the most frequently downloaded and analyzed from the CESM-LENS dataset and have differing characteristics. For example, in terms of compression, TS is considered an "easy" variable to compress due to its relatively narrow range and smoothness, while variables such as PRECT, which has a large range of values, including some very small values close to zero, are typically challenging for lossy compressors.
For these experiments, we compress CESM-LENS data with the popular ZFP compressor [19]. ZFP is a high-speed lossy compressor designed for logically regular and spatially correlated arrays of floating-point numbers, and it compresses data subject to various accuracy or size constraints. We use ZFP 0.5.5 in fixed-precision mode, meaning that the precision encoded for the transform coefficients is fixed. The fixed-precision mode parameter (p) specifies how many uncompressed bits per value to store (related to the relative error), so the smaller the value of p, the more aggressive the compression. To improve compression quality, we also use a newer ZFP feature that addresses biased error and is available in ZFP 0.5.5 by checking out the "feature/unbiasederror" branch from the ZFP GitHub page. In particular, we enable the pre-rounding mode by configuring ZFP with "cmake -DZFP_ROUNDING_MODE=ZFP_ROUND_FIRST -DZFP_WITH_TIGHT_ERROR=ON".

SSIM variants and compressed climate data
To better illustrate their differences, we compare the SSIM, SF-DSSIM, and DSSIM on the atmospheric variables in Table 3, each of which is compressed by varying amounts. The SSIM quantities are again computed via LDCPY with default settings as in Section 2.3, and results for each variable are given in Figure 5. For each variable, we plot the SSIM, DSSIM, and SF-DSSIM values that compare the original dataset to its compressed version. We show results for eight different levels of ZFP fixed-precision compression (p = 6, 8, 10, 12, 14, 16, 18, 20). As p increases, the compressor is increasingly conservative. In particular, p = 6 is the most aggressive compressor option shown and should result in the smallest SSIM values for all variables, and p = 20 is fairly conservative and should result in values close to or even equal to one. We do not show results for p > 20 because they are indistinguishable from each other and from 1.0 in the plots. A fourth SSIM variant appears in these figures as well: "DSSIM (no quant)". This approach is equivalent to the DSSIM without the quantization step and will be discussed shortly.
Recall that the primary motivation for using the DSSIM instead of the SSIM is to avoid the cost of generating plots that are needed only to compute the SSIM, as is the case here for evaluating lossy compressor artifacts. In applying the SSIM formula directly to the floating-point data, we took the opportunity to make a few beneficial modifications, collectively referred to as the DSSIM, whose effects can be seen in the plots in Figure 5. First, note that by normalizing the data to [0, 1] and then choosing constants that do not have an outsized influence on the computed SSIM values (e.g., as discussed for Figure 4), the DSSIM obtains much lower values for more aggressive compressor options, i.e., smaller values of p, than the SSIM and SF-DSSIM. The slope of the DSSIM line is much steeper in this aggressive-compression region. Furthermore, this behavior, which results from mitigating sensitivity to the constants, is desirable to us because intuitively we want compressor choices that are too aggressive to produce a more noticeable drop in the DSSIM value.
Another trend in Figure 5 is that as the compression becomes more conservative with increasing p values, the SF-DSSIM values reach 1.0 sooner than the SSIM for all of the variables. Recall that for the SSIM, values of 1.0 are obtained only when the images are exactly the same. When applying this calculation directly to the floating-point data, a value of 1.0 is undesirable when the SSIM value is still below 1.0, which indicates that the images are not equivalent. Instead, the DSSIM's sensitivity to small differences when the data is quite similar (higher values of p) can be beneficial for identifying minor differences. Because the point at which each SSIM variant reaches 1.0 (to five significant digits) is difficult to discern in Figure 5, we list the corresponding ZFP parameter p at which this occurs in Table 4. In our experience, the DSSIM value is always less than the SSIM value, whereas the SF-DSSIM lines typically intersect the SSIM line at some point, because for smaller p, the SF-DSSIM is usually smaller than the SSIM (but then levels off to 1.0 more quickly). This statement may not be universally true because the SSIM is sensitive to plot choices, but it holds for all the variables that we have examined in this representative climate dataset. As mentioned previously, we assume that the floating-point data is smoothly mapped to RGB values, as with pseudocolor plots that use a smooth colormap.
Figure 5 and Table 4 also include results from running the DSSIM without the quantization step, as it is more difficult to intuit the usefulness of this modification. In fact, for the TS and FLUT variables, the quantization has virtually no noticeable effect in the subplots in Figure 5, and for the remainder of the variables, it is difficult to see what is happening when the SSIM value is approaching one. Again, Table 4 better illustrates the situation by providing the DSSIM results without quantization in the rightmost column. As with the SF-DSSIM, DSSIM (no quant.) reached 1.0 before the SSIM (i.e., at a smaller, more aggressive value of p) for 5 of the 6 variables. For TS and CLOUD, the SSIM and DSSIM reach 1.0 at the same compressed dataset parameter p, and the DSSIM reaches 1.0 under more conservative compression for the remainder. This behavior is acceptable because we prefer to err on the conservative side, and these SSIM and DSSIM values near 1.0 are important when determining a similarity cutoff threshold, which we address in the next subsection. And, as previously discussed, we do not want a value of 1.0 from a similarity measure on the floating-point data when the SSIM on the images is less than 1.0, implying that there is a visual difference. Note that the data compression here with ZFP is lossy for all tested values of p, meaning that the two floating-point datasets being compared are not equivalent. However, we do not require that a DSSIM of 1.0 when comparing datasets X and Y implies that X = Y.
In effect, the DSSIM quantization step is useful in terms of better differentiating changes in visual quality of the data in this region of interest where the data compression is quite conservative, meaning that differences between the two datasets are quite small and the SSIM and SSIM-like variants should be nearly 1.0. Consider the situation in which the two datasets are nearly equivalent and the data varies smoothly in a local patch. Quantizing the data values will then amplify small differences in data that spans quantization bin boundaries and remove small differences within a quantization bin. Even though most of the differences will fall within a bin (and thus be zero after quantization), when the differences between bin boundaries are an order of magnitude or so larger than differences between the datasets (as happens when the DSSIM is nearly 1.0), then in our experience, the quantization step allows the DSSIM to highlight the differences in the data in a similar manner to transforming data to an image with 256 colors as done with the SSIM. While the exact DSSIM value depends on the number of quantization bins, of course, we find that the DSSIM with sensible choices for the number of bins (e.g., 128, 256, 512, 1024) better differentiates differences near 1.0 than the DSSIM without quantization.
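A small numerical example illustrates how quantization removes differences within a bin and amplifies differences that straddle a bin boundary. This is a sketch under the assumption that quantization rounds normalized values to the nearest of 256 evenly spaced levels; the helper name and the specific values are chosen only for illustration.

```python
import numpy as np


def quantize(u, levels=256):
    """Round values in [0, 1] to the nearest of `levels` evenly spaced values."""
    return np.round(u * (levels - 1)) / (levels - 1)


step = 1.0 / 255.0   # spacing between adjacent quantization levels
delta = 0.1 * step   # dataset difference, an order of magnitude below the spacing

inside = 100.0 * step     # value in the middle of a bin
near_edge = 100.48 * step  # value just below a rounding boundary

# The same small difference applied at the two locations.
d_inside = abs(quantize(inside + delta) - quantize(inside))
d_edge = abs(quantize(near_edge + delta) - quantize(near_edge))

print(f"difference before quantization:   {delta:.6f}")
print(f"after quantization, mid-bin:      {d_inside:.6f}")
print(f"after quantization, near an edge: {d_edge:.6f}")
```

In the mid-bin case the difference disappears entirely, while near the boundary it grows to one full level spacing, ten times the original difference.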

A cutoff threshold for DSSIM
The primary reason for developing the DSSIM as a replacement for the SSIM is to reduce computational costs when evaluating compression effects on large amounts of data. When using the SSIM to quantify the similarity of an image generated with the compressed climate data to that with the original, we use a so-called cutoff threshold, above which the quality of similarity is deemed acceptable. More specifically, in [4] we determined that the SSIM with a cutoff threshold of 0.99995 indicated whether climate scientists would be able to detect a difference in CESM diagnostic images after compression. This threshold, based on a large user study, is much tighter than the generally accepted SSIM indistinguishability threshold of 0.99 (e.g., [21]) and the 0.98 suggested for medical imaging (e.g., [8], [37]). Because we are using the SSIM with a hard threshold, small effects from plotting choices can be important in regions near the threshold. Thus, the DSSIM's independence from plotting decisions is a desirable addition to the cost savings gained by not generating specific images. However, to use the DSSIM instead of the SSIM for evaluating the climate data, we need to determine an appropriate cutoff threshold for the DSSIM, and conducting another large user study is not feasible at this time. Instead, we use statistical techniques and the previous study results to show that the DSSIM can be used to evaluate lossy compression artifacts in climate data with an appropriate cutoff threshold.

Fig. 5: A similarity comparison between the original data and the ZFP-compressed data for different compression parameters (p-level on the x-axis). The range on the y-axis differs for each plot.
Ideally, we want to find a DSSIM threshold such that datasets that pass the SSIM threshold test also pass the corresponding DSSIM threshold test, and datasets that fail the SSIM threshold test also fail the corresponding DSSIM threshold test. We define a compressed dataset to be a true "pass" if the SSIM value meets or exceeds the 0.99995 threshold; otherwise, we consider that dataset to be a true "fail". One method to assess an appropriate DSSIM threshold is with classification matrices, which are commonly used tools to evaluate the results of a classification model. In our case, the classification matrix is a 2 × 2 matrix where the columns correspond to the true pass or fail status of the data as determined by the SSIM. The rows correspond to whether our model (i.e., the DSSIM) passes or fails the dataset, which is based on whether the DSSIM is above (pass) or below (fail) the DSSIM threshold being tested. This setup means that the diagonal entries of the matrix correspond to the number of instances where there is agreement or consistency between the DSSIM and the SSIM, and the off-diagonal elements correspond to the number of instances where the DSSIM and SSIM disagree (an inconsistency) in their classification decision. For example, if we take the first 3 time slices from the 79 2D monthly variables in the CESM-LENS dataset and apply the ZFP compressor using 10 parameters for p (p = 6, 8, 10, 12, 14, 16, 18, 20, 22, 24), then we obtain 79 × 3 × 10 = 2370 SSIM values by plotting and comparing the original dataset and the compressed dataset. As before, SSIM and DSSIM values are computed via LDCPY with the default settings. In Figure 6, the top plot shows the number of images for which the DSSIM result was classified differently than the SSIM result (i.e., "inconsistent") for a range of DSSIM thresholds. The bottom plot is the classification matrix corresponding to a DSSIM threshold of 0.99919, which minimizes the inconsistent results (the sum of the off-diagonal entries in orange). Note that alternatively one could choose to minimize either the number of inconsistent fails or inconsistent passes, depending on the use case.
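The threshold search described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used for Figure 6: the function names are hypothetical, the SSIM and DSSIM values are synthetic stand-ins (not CESM-LENS results), and a DSSIM value equal to the threshold is assumed to count as a pass.

```python
import numpy as np

SSIM_THRESHOLD = 0.99995  # SSIM cutoff from the earlier user study


def classification_matrix(ssim, dssim, dssim_threshold,
                          ssim_threshold=SSIM_THRESHOLD):
    """2 x 2 counts: rows = DSSIM decision (pass, fail),
    columns = SSIM-based truth (pass, fail)."""
    true_pass = ssim >= ssim_threshold
    model_pass = dssim >= dssim_threshold  # assumption: ties count as a pass
    return np.array([
        [np.sum(model_pass & true_pass), np.sum(model_pass & ~true_pass)],
        [np.sum(~model_pass & true_pass), np.sum(~model_pass & ~true_pass)],
    ])


def inconsistent(ssim, dssim, dssim_threshold):
    """Number of off-diagonal (disagreeing) cases for one DSSIM threshold."""
    m = classification_matrix(ssim, dssim, dssim_threshold)
    return m[0, 1] + m[1, 0]


def best_threshold(ssim, dssim, candidates):
    """Candidate DSSIM threshold with the fewest inconsistent cases."""
    return min(candidates, key=lambda t: inconsistent(ssim, dssim, t))


# 79 variables x 3 time slices x 10 ZFP precisions.
n_values = 79 * 3 * 10

# Synthetic similarity values; DSSIM is made systematically lower than SSIM.
rng = np.random.default_rng(1)
ssim = 1.0 - 10.0 ** rng.uniform(-8.0, -2.0, n_values)
dssim = 1.0 - (1.0 - ssim) * 10.0 ** rng.normal(1.0, 0.3, n_values)

candidates = np.linspace(0.99, 1.0, 1001)
best = best_threshold(ssim, dssim, candidates)
matrix = classification_matrix(ssim, dssim, best)
print("selected DSSIM threshold:", best)
print(matrix)
```

To minimize only inconsistent passes or only inconsistent fails, as mentioned above, the key function can return a single off-diagonal entry instead of their sum.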
This analysis for finding a cutoff threshold assumes that the data distribution in our analysis is representative of the data distribution in practice. Because the CESM variables have quite different characteristics, an appropriate DSSIM threshold based on only a single variable may be slightly different (lower or higher). However, because the SSIM threshold determined in [4] is quite conservative, we find that the corresponding DSSIM threshold of 0.99919 is conservative enough to use on all CESM variables. In practice, we often reduce this threshold further to allow for more aggressive data compression (e.g., to 0.995 or 0.95).
Figure 7 gives an indication of how much compression can be achieved with ZFP in fixed-precision mode for the variables in Table 3, including how the data reduction corresponds to the DSSIM. The compression ratio (CR) is the size of the losslessly compressed data divided by the size of the data lossily compressed with ZFP. We see that the CR decreases as the ZFP precision parameter p increases, as expected. CR values near 1.0 mean that the lossy compression is so conservative that there is little reduction beyond what lossless compression has achieved. The DSSIM values increase in a nonlinear fashion, which is consistent with the idea that we get diminishing returns in data fidelity as we approach lossless compression. Moreover, there is clearly a limit to the amount of compression possible while maintaining a high DSSIM value. The larger the CR, the lower the DSSIM tends to be, which can easily be seen by noting how the colors of the dashed and solid lines are nearly in reverse order from top to bottom. Further examinations of the DSSIM in the context of data compression for climate model data can be found in technical reports [23] and [22].

SSIM vs. DSSIM speedup
Finally, we show that, as expected, computing the DSSIM is much faster than computing the SSIM. We time the calculations of the DSSIM and the SSIM using the implementations in the LDCPY package via a Jupyter notebook. We again use the TS and PRECT variables from the CESM-LENS data that are included with LDCPY. Timing results (in seconds) are given in Table 5. The algorithmic costs of the DSSIM and SSIM are similar, so the difference in time is due to the rendering of the images from the floating-point data needed to calculate the SSIM. While the actual times for these calculations will vary depending on the computing platform (these were performed on a laptop), the speedup indicates what type of performance can be gained by using the DSSIM instead of the SSIM. While the rendering performance could of course be improved by using different hardware (e.g., GPUs) or more optimized software, avoiding the rendering altogether will always be less expensive. Whether this savings is significant enough to matter to the user will be specific to their particular hardware, software stack, and data volume. For our application, the DSSIM is more competitive from a computational standpoint with other metrics for data similarity applied to the floating-point data, like the mean-squared error (MSE) or the peak signal-to-noise ratio (PSNR). (Note that the MSE and PSNR can be applied to either floating-point or image data. When applied to floating-point data, it is well known that they are not particularly indicative of visual similarity. However, they may be useful for evaluating visual similarity in some cases when applied to image data.)
The time savings for the DSSIM are important for comparing compressor results with climate data, particularly as we automate testing and must evaluate the data quality and similarity on large amounts of data. Indeed, in previous work we had found the SSIM to be a better indication of image similarity than other measurements, but its relative expense made its use hard to justify. The cost of the DSSIM, on the other hand, is more reasonable for our large-scale evaluations and has proven useful in practice.
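The timing protocol reported with Table 5 (fastest of 5 executions) can be reproduced with the standard library as sketched below. The two workloads are hypothetical stand-ins that do not call LDCPY or render real images; they only show the pattern of timing a computation on the data directly against the same computation preceded by an extra conversion step.

```python
import timeit


def fastest_of(func, repeats=5):
    """Smallest wall-clock time in seconds over `repeats` single runs."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


# Synthetic data; values lie in [0, 50).
data_a = [i * 1e-3 for i in range(50_000)]
data_b = [v + 1e-6 for v in data_a]


def metric_on_data():
    """Stand-in for a similarity metric applied directly to the data."""
    return sum((a - b) ** 2 for a, b in zip(data_a, data_b))


def metric_after_render():
    """Stand-in for 'render first, then compare': map to 256 integer levels."""
    img_a = [int(a / 50.0 * 255) for a in data_a]
    img_b = [int(b / 50.0 * 255) for b in data_b]
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b))


t_data = fastest_of(metric_on_data)
t_render = fastest_of(metric_after_render)
print(f"direct: {t_data:.4f} s, with conversion: {t_render:.4f} s, "
      f"ratio: {t_render / t_data:.2f}x")
```

Taking the minimum over repeated runs reduces the influence of transient system load, which is one reason to report the fastest execution rather than the mean.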

CONCLUDING REMARKS
In this manuscript, we have proposed an alternative to the popular SSIM that can be applied directly to floating-point data. Applying the DSSIM to the floating-point data is computationally cheaper than generating images from the data and then applying the SSIM. This reduced computational cost is quite important when analyzing large volumes of data in an automated fashion and is appropriate for situations in which we need only a general idea of whether images created from the data will be similar. An additional benefit is that the DSSIM is independent of plot-specific choices that can affect the SSIM. The DSSIM is implemented in the LDCPY (Large Data Comparison for Python) package [24].
While conceptually simple, the DSSIM has been tremendously beneficial to us for comparing lossily compressed to uncompressed climate model data. Prior to our development of the DSSIM, the SSIM had become an important measurement in our compression evaluation toolkit, largely due to its intuitiveness for the user and ability to represent visual similarity well. However, the SSIM was far more expensive to compute than all other measurements in the toolkit on large data volumes because of the image generation requirement, motivating us to propose the DSSIM. In practice, we have been using the DSSIM instead of the SSIM with success in terms of identifying compression artifacts and saving compute time when evaluating data compression. While we have only evaluated the DSSIM in the context of comparing climate model simulation data, we are optimistic that it could be a useful measurement in other application areas as well, especially those producing large volumes of simulation data.
putational and Information Systems Laboratory, sponsored by the National Science Foundation.

Fig. 1: The plots for surface temperature (TS) data for the original data (top) and the lossily compressed data (bottom) with compressor ZFP and p=8. (Compressor details are discussed in Section 4.1.)

Fig. 2: The default plot of original data for precipitation rate (PRECT) in meters per second (m/s).

Fig. 3: Examples of data-independent plot choices that can affect the SSIM values for surface temperature (TS). Note that the corresponding SSIM values are given in Table 2.

Fig. 6: The top plot shows the number of differently classified (i.e., inconsistent) datasets by DSSIM threshold, which is minimized when the DSSIM threshold is 0.99919. The bottom plot contains the classification matrix for a DSSIM threshold of 0.99919.

Fig. 7: The relation between various amounts of lossy compression and the compression ratio (CR) and DSSIM values for the variables in Table 3. The horizontal axis indicates the different fixed-precision parameters for ZFP compression from most (left) to least aggressive (right). The dashed lines correspond to the DSSIM values (right axis) and the solid lines correspond to the CR (left axis). The compression ratio is the losslessly compressed file size divided by the lossily compressed file size.

TABLE 1: A comparison of the original TS data to multiple test cases that are modifications of the TS test data.

TABLE 2: Examples of modifications to plot choices and constant values that affect the SSIM values. Differences between the new and default SSIM values are given in parentheses (negative values indicate a decrease due to the modification).

TABLE 3: Sample atmospheric variables with their descriptions and selected characteristics (minimum, absolute nonzero minimum, and maximum values) for the first time slice. Note that the absolute nonzero minimum is not listed when it is equivalent to the minimum. All listed variables are 2D. (CLOUD is normally a 3D variable, but here we use vertical level 20 only.)

2. http://www.astropy.org
3. http://pangeo.io

TABLE 4: A list of the smallest ZFP compression parameters (p) for which the SSIM variants equal 1.0. Parameters in red indicate that the SSIM variant (SF-DSSIM, DSSIM, or DSSIM (no quant.)) reached 1.0 for a more aggressive compression parameter (lower p) than the original SSIM value (bold) did, which is undesirable.

TABLE 5: Timings for computing the SSIM and DSSIM via the DataCalcs object in LDCPY. Times reported are the fastest of 5 executions in seconds. The first time slice is used for each variable.