Uncertainty-Aware Multidimensional Scaling

We present an extension of multidimensional scaling (MDS) to uncertain data, facilitating uncertainty visualization of multidimensional data. Our approach uses local projection operators that map high-dimensional random vectors to low-dimensional space to formulate a generalized stress. In this way, our generic model supports arbitrary distributions and various stress types. We use our uncertainty-aware multidimensional scaling (UAMDS) concept to derive a formulation for the case of normally distributed random vectors and a squared stress. The resulting minimization problem is numerically solved via gradient descent. We complement UAMDS by additional visualization techniques that address the sensitivity and trustworthiness of dimensionality reduction under uncertainty. With several examples, we demonstrate the usefulness of our approach and the importance of uncertainty-aware techniques.


INTRODUCTION
Dimensionality reduction is used in numerous fields of application as a tool to reduce data while keeping the relevant data characteristics. It conceptually projects high-dimensional data to, or embeds it into, a target space of lower dimensionality. It is a popular approach for visualizing high-dimensional data: the mapping to 1D, 2D, or 3D spaces enables visual representation, processing, and interaction. However, dealing with uncertain data makes these tasks significantly more challenging and needs additional consideration. Unfortunately, non-linear dimensionality reduction techniques such as multidimensional scaling (MDS) [19], t-distributed stochastic neighbor embedding (t-SNE) [35], or uniform manifold approximation and projection (UMAP) [23] do not provide a direct mechanism to deal with uncertain data and propagate such uncertainty through the process of dimensionality reduction.

• All authors are with the University of Stuttgart. E-mail: firstname.lastname@visus.uni-stuttgart.de.
In this paper, we pick MDS because it is one of the most common dimensionality reduction techniques. We address the problem of uncertainty visualization by introducing uncertainty-aware MDS (UAMDS), an extension of MDS to uncertain data. The contributions of this paper can be summarized as follows: first, the extension of MDS to uncertain data by providing a generic and rigorous mathematical formulation that supports any kind of distributions and a variety of stress types; second, a reformulation of the generic concept for the important case of normally distributed input data and squared stress, leading to an efficient numerical algorithm and intuitive interpretation of the mathematical terms that go into the MDS computation; third, several visualization techniques tailored to our approach that address the trustworthiness and sensitivity of dimensionality reduction under uncertainty. In addition to these technical contributions, we show the usefulness of UAMDS with several examples and illustrate the application of our visualization techniques. Fig. 1 shows a simple example, where the left column demonstrates the application of UAMDS to a small data set without uncertainty (in this case, UAMDS boils down to traditional MDS). The middle column presents the result of UAMDS for the same data set, yet with uncertainty attached to one point (modeled as a random vector). In the right column, we introduce uncertainty to the remaining data points. UAMDS embeds the multivariate probability density functions from the input data into the 2D space. This leads to 2D probability densities, which are illustrated via isolines. We can see that uncertainty not only manifests itself in spread-out distributions, but also in the location of the centers of these distributions. The UAMDS projection in the right column shows that there is an even greater impact on the means. In addition, this projection also changes the orientation of the orange distribution in 2D space.
In short, MDS needs to be extended to UAMDS to correctly capture the impact of data uncertainty on dimensionality reduction, leading to a more reliable representation of the data.

RELATED WORK
Our paper addresses uncertainty visualization, which has been identified as one of the challenging problems of visualization research [17].
General approaches to uncertainty visualization. In their seminal paper, Pang et al. [26] provide a classification of uncertainty visualization techniques and describe several strategies for representing uncertainty. Since then, much additional progress has been made in uncertainty visualization; a summary of techniques can be found in survey papers [2,3,10,18] and in an interactive visual collection [16].
The above surveys also discuss models of representing uncertainty in data. A common approach assumes some kind of probabilistic modeling. We follow this approach in our paper as well: we describe uncertainty of input data in the form of probability density functions, which allows us to describe typical sources of uncertainty, such as measurement imprecision or variability from statistical aggregation over input samples. A broader discussion of uncertainty levels is provided by Skeels et al. [33]; however, probabilistic modeling can play a relevant role in many of their levels of describing uncertainty.
According to Brodlie et al. [3], there is a difference between visualization of uncertainty and uncertainty of visualization. Both aspects play a role for UAMDS. Our main goal is to make MDS aware of uncertainty in the data, addressing the problem of visualizing uncertainty. However, dimensionality reduction introduces additional error because it cannot guarantee that distances are preserved, i.e., we are also facing the problem of uncertainty of visualization. We address this problem by devising visualization techniques that explicitly show such error. In this way, we contribute to improving trust in the visualization [30]. Indeed, our approach looks at how uncertainty is propagated through the visualization pipeline [5,11]. We focus on the propagation of uncertainty through dimensionality reduction via (UA)MDS, but also consider the visual representation at the stage of visual mapping and rendering.
Dimensionality reduction, MDS, and uncertainty. There are many publications on linear or non-linear dimensionality reduction. An overview of non-linear techniques is provided by Lee and Verleysen [20]. As this paper addresses the extension of MDS to uncertain data, our discussion focuses on methods that consider uncertainty.
One of the most frequently used techniques of (linear) dimensionality reduction is principal component analysis (PCA) [27]. In the context of fuzzy data, Masson and Denoeux [7] presented an extension of PCA by using autoassociative neural networks. Görtler et al. [9] introduced an uncertainty-aware version of PCA with a formulation that incorporates moments of a distribution. Their method projects the first and second order moments via a linear mapping from high-dimensional to low-dimensional space. Correa et al. [6] demonstrated uncertainty-aware visual analytics and also showed propagation of uncertainty through PCA. While these approaches are based on linear dimensionality reduction, our approach extends the non-linear technique MDS, which requires different mathematical and numerical modeling.
Schulz et al. [32] presented a method for a probabilistic graph layout of uncertain networks. Their method uses Monte Carlo sampling of graphs and combines the resulting graph layouts into an uncertainty visualization. In contrast, UAMDS does not require a (computationally costly and potentially inaccurate) Monte Carlo approximation but utilizes an analytic mathematical model that considers distributions.
In the context of MDS, there are probabilistic extensions. For example, the model developed by Zinnes and MacKay [21,37] assumes random vectors with fixed variance in all dimensions as input and outputs corresponding distributions in screen space. Therefore, it differs from our approach. Masson and Denoeux [22] adapted MDS to fuzzy systems where dissimilarities are given as intervals or fuzzy numbers. Their approach results in fuzzy regions. Besides the modeling aspects, our work also addresses a much more generic concept of uncertainty.
Bayesian models of MDS were presented by Bakker and Poole [1] and Soh [34] for probabilistic embeddings of noisy dissimilarities. Their approaches estimate distributions in the embedding space, whereas our mathematical model results in a projection operator that maps high-dimensional distributions to low-dimensional space. Nguyen and Holmes [24] consider uncertain dissimilarities for MDS projections allowing them to propagate uncertainty into visualization spaces. Our method takes more uncertainty information into account, e.g., covariance, and is able to propagate orientation and shape of distributions.
Another related approach was developed by Chen et al. [4]. Their uncertainty-aware projection technique for multivariate ensemble data estimates distributions from data and uses MDS on the means of a subset of the distributions for an initial layout of landmark points. The remaining points are then projected to screen space using a least squares approach that takes mean distances but also a divergence measure between distributions into account. Although the approach also deals with distributions and projections for dimensionality reduction, our mathematical model is a consistent extension of MDS that takes all aspects of a distribution simultaneously into account.

MATHEMATICAL MODEL AND NUMERICAL ALGORITHM
In this section, we motivate and derive our mathematical model of UAMDS. We start with basic conceptual considerations and the derivation of the generic model that allows processing of arbitrary kinds of distributions. Then, we consider the special, yet common and relevant case of uncertainty described by normal distributions. Accordingly, we reformulate the generic model, leading to a concise description of a corresponding optimization problem. Based on this, we discuss the numerical algorithm to compute UAMDS.

Concept
We start with MDS, i.e., data is not yet associated with uncertainty. MDS formulations are typically classified into metric and non-metric types. Our work addresses the class of metric MDS using a generalized form of the stress consisting of a sum of squares of dissimilarity differences. The relevant information about objects in high-dimensional space is expressed by pairwise dissimilarities d_{i,j} = ∥p_i − p_j∥_p^α, often with p = 2 (representing Euclidean distance) and α = 1. Given points p_1, ..., p_K ∈ R^N in the high-dimensional space R^N, the goal is to find points x_1, ..., x_K ∈ R^n in a low-dimensional target space R^n (i.e., n < N) such that the following stress is minimized:

S(x_1, ..., x_K) = Σ_{i=1}^K Σ_{j=i+1}^K ( d_{i,j} − ∥x_i − x_j∥_p^α )²   (1)
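To make the point-based formulation concrete, the following sketch minimizes this stress numerically with plain NumPy and a central-difference gradient. This is our own toy implementation with p = 2 and α = 1; the step size and iteration count are ad hoc choices, not the paper's solver.

```python
import numpy as np

def mds_stress(X, D):
    """Generalized MDS stress (Equation 1): sum over i < j of
    (d_ij - ||x_i - x_j||)^2, here with p = 2 and alpha = 1."""
    K = X.shape[0]
    s = 0.0
    for i in range(K):
        for j in range(i + 1, K):
            s += (D[i, j] - np.linalg.norm(X[i] - X[j])) ** 2
    return s

def mds_gradient_descent(P, n=2, steps=2000, lr=0.01, seed=0):
    """Toy metric MDS via plain gradient descent with a central-difference
    gradient. Illustration only; not an efficient or robust solver."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean dissimilarities of the high-dimensional points.
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    X = rng.normal(size=(P.shape[0], n))  # random initial embedding
    eps = 1e-5
    for _ in range(steps):
        g = np.zeros_like(X)
        for idx in np.ndindex(*X.shape):
            Xp = X.copy(); Xp[idx] += eps
            Xm = X.copy(); Xm[idx] -= eps
            g[idx] = (mds_stress(Xp, D) - mds_stress(Xm, D)) / (2 * eps)
        X = X - lr * g
    return X, D
```

For coplanar points in R^3, the stress can be driven close to zero, since a perfect 2D embedding exists.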
How can we now arrive at the uncertainty-aware version of MDS? Our derivation is inspired by limit process considerations in physics that take one from the mechanics of individual mass points to continuum mechanics [8]. Fig. 2 applies this concept to our problem of uncertainty awareness in MDS. The left part of Fig. 2 shows the "world" of probability density functions, both in the high-dimensional space (top-left; illustrated as 2D space for the sake of visual presentation) and the low-dimensional space (bottom-left; illustrated as 1D space). Unfortunately, we do not yet know how to handle this left part. Therefore, we reduce the problem to something that we already know: MDS for data points. We achieve this by conceptually sampling the distributions, which leads to clouds of data points in high-dimensional space (top-right). For these, we can then apply MDS as usual, leading to the dimensionality reduction of the point cloud (bottom-right). Finally, we can construct the corresponding probability density functions in the low-dimensional space (bottom-left). This is a conceptual model assuming that we have a limit process of infinitely dense sampling, similar to the limit process taken in calculus for defining Riemann integration. The resulting integral representation of UAMDS can be seen as the result of a limit process of the sampled model.

Generic Mathematical Model
To work with this conceptual model, we need a formulation that provides points in high-dimensional space. For the time being, we assume that these are given in a high-dimensional Cartesian space. Later, we will see that we do not need Cartesian coordinates for the final computation, but that distances are sufficient, as for traditional MDS.
With this, we can express the desired low-dimensional points x_1, ..., x_K in terms of the high-dimensional points p_1, ..., p_K by using local projections Φ_1, ..., Φ_K : R^N → R^n, i.e., we obtain the relation x_k = Φ_k(p_k) for k = 1, ..., K. In other words, we describe the overall dimensionality reduction as a combination of individual local projections, one for each of the uncertain input distributions. We want to point out that the local projections are not yet known; in fact, it is our goal to eventually compute them in order to determine the dimensionality reduction. In the next subsection, we will see that the formulation of the stress truly restricts the dimensionality reduction process and, thus, reasonable local projections can be determined.
Due to the use of local projections, the minimization problem in Equation 1 is equivalent to minimizing the following stress (with projections Φ_k that map p_k to x_k, e.g., constant functions Φ_k(p) = x_k):

S(Φ_1, ..., Φ_K) = Σ_{i=1}^K Σ_{j=i+1}^K ( d_{i,j} − ∥Φ_i(p_i) − Φ_j(p_j)∥_p^α )²   (2)

With this representation, it is possible to formulate UAMDS because the dimension of the point objects is the same for all i and j. Now, we need to transition from the world of data points (right-hand side of Fig. 2) to the world of distributions (left-hand side of Fig. 2). Given random vectors P_1, ..., P_K with realizations in the high-dimensional space R^N and associated probability density functions f_{P_1}, ..., f_{P_K} : R^N → R, the goal is to find low-dimensional random vectors X_1, ..., X_K via X_k = Φ_k ∘ P_k with realizations in the low-dimensional target space R^n and associated probability density functions f_{X_1}, ..., f_{X_K} : R^n → R such that stress S is minimized.
For the formulation of the stress, we again resort to the conceptual model of sampled random vectors, as illustrated in Fig. 2 (top-right). In traditional point-based MDS, we would have to compute pairwise distances (i.e., distances between all pairs of sampled points), which go into the summation in the stress term (Equation 1). Now, we have to transform this by the corresponding limit process. Here, we replace the summation for the traditional stress term by integration, leading to the following stress (compare Fig. 2):

S(Φ_1, ..., Φ_K) = Σ_{i=1}^K Σ_{j=i+1}^K ∫_{R^N} ∫_{R^N} ( ∥v − w∥_p^α − ∥Φ_i(v) − Φ_j(w)∥_p^α )² f_{P_i,P_j}(v, w) dv dw   (3)

where f_{P_i,P_j} is the joint probability density function of P_i and P_j.
We can see that the integrand directly corresponds to the summands in Equation 1. The joint probability density function f_{P_i,P_j} describes the probability density for the respective pair of points. The integration has to go over the space of such pairs, i.e., over R^N × R^N. The integration variables v and w represent the locations of the two respective points.
In the case of independent random vectors, it holds that f_{P_i,P_j}(v, w) = f_{P_i}(v) · f_{P_j}(w). Hence, the formulation is a direct extension of Equation 2 to an integral representation that accounts for the uncertainty in the data. This aspect is clearly observed if the random vectors are modeled by Dirac delta distributions (thus, there is no uncertainty in the data). In this case, Equation 3 directly leads to Equation 1. This shows that our new model is compatible with traditional MDS.
Besides the stress function, other modeling parameters of Equation 3 are the local projections Φ_1, ..., Φ_K that determine the nature of the low-dimensional random vectors X_1, ..., X_K via X_k = Φ_k ∘ P_k. We observe that a suitable choice of projections highly depends on the input distributions P_1, ..., P_K. In the next subsection, this aspect is examined in the framework of normal distributions. For a completely generic approach, however, it is useful to approximate the local projections via a finite Taylor series that is tailored to the distribution, e.g., the Taylor series should be evaluated at the expected values.

Normal Distributions
While the generic UAMDS model above provides the framework for arbitrary distributions, in this subsection, we want to derive a model based on normal distributions. We have two goals: (1) find an adequate choice of local projections Φ_1, ..., Φ_K that are suitable for multivariate normal distributions; (2) evaluate the stress (Equation 3) to obtain an adequate mathematical formulation and solve it numerically.
We choose p = 2 and α = 2 for the dissimilarity, i.e., dissimilarity based on the squared Euclidean distance. The reason for this choice is the tractability of the resulting integrals: eliminating the square root of the regular Euclidean distance allows us to handle the integrals analytically. For the sake of transparency and easy reproducibility, the full details of the technical parts of the calculations are placed in the supplemental material [13] (Sections 1-3), while important results are summarized in this document as well.
According to the previous subsection, random vectors P_1, ..., P_K with realizations in the high-dimensional space R^N and associated probability density functions f_{P_1}, ..., f_{P_K} are given.

Assumption 1: The random vectors P_1, ..., P_K are independent and normally distributed with P_k ∼ N(µ_{P_k}, Σ_{P_k}), µ_{P_k} ∈ R^N, Σ_{P_k} ∈ R^{N×N}, and the singular value decompositions Σ_{P_k} = U_{P_k} S_{P_k} U_{P_k}^T.

The first consequence is that f_{P_i,P_j} = f_{P_i} · f_{P_j} due to the independence of the random vectors. Secondly, we can perform a change of variables with the transformations Ψ_k : R^N → R^N, Ψ_k(v) = U_{P_k} S_{P_k}^{1/2} v + µ_{P_k} (for the degenerate case, a limit process is needed that leads to the same result, though), which turns Equation 3 into an integral against standard normal densities:

S(Φ_1, ..., Φ_K) = Σ_{i=1}^K Σ_{j=i+1}^K ∫_{R^N} ∫_{R^N} ( ∥Ψ_i(v) − Ψ_j(w)∥² − ∥Φ_i(Ψ_i(v)) − Φ_j(Ψ_j(w))∥² )² f_N(v) f_N(w) dv dw   (4)

where f_N denotes the probability density function of the N-dimensional standard normal distribution. Now, the local projections Φ_1, ..., Φ_K are the only remaining model parameters in Equation 4. In general, the use of an affine transformation (first-order Taylor) Φ : R^N → R^n, Φ(p) = B p + c, preserves normality. In fact, for a normally distributed random vector P ∼ N(µ, Σ), the transformed distribution is given by Φ(P) ∼ N(B µ + c, B Σ B^T). Therefore, we model the local projections by the following affine transformations:

Φ_k(p) = B_k (p − µ_{P_k}) + c_k,   with B_k ∈ R^{n×N} and c_k ∈ R^n.
The effect of the adjusted affine transformations onto the high-dimensional normally distributed random vectors is again normal: with Φ_k(p) = B_k (p − µ_{P_k}) + c_k, we obtain X_k = Φ_k ∘ P_k ∼ N(c_k, B_k Σ_{P_k} B_k^T). Inserting these transformations into the stress and evaluating the integrals analytically yields an equivalent function of the parameters c = (c_1, ..., c_K) and B = (B_1, ..., B_K). We denote this equivalent function as S. For the sake of transparency, this stress S(c, B) is split into three parts (see Subsection 3.1 in the supplemental material [13]):

S(c, B) = Σ_{i=1}^K Σ_{j=i}^K ( S_1^{i,j}(c, B) + S_2^{i,j}(c, B) + S_3^{i,j}(c, B) )   (5)

where the closed-form expressions of S_1, S_2, and S_3 (which involve the Kronecker delta δ_kl) are given in the supplemental material. For a compact matrix notation of the three stress components, we refer to Subsection 3.2 in the supplemental material [13].
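The closed-form effect of an affine map on a normal distribution, which this derivation relies on, can be verified numerically. The following sketch uses illustrative values for µ, Σ, B, and c (not values fitted by UAMDS) and the affine form Φ(p) = B(p − µ) + c assumed above; it compares the analytic pushforward N(c, B Σ Bᵀ) with Monte Carlo samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# High-dimensional input P ~ N(mu, Sigma), here with N = 3 (toy values).
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 0.1 * np.eye(3)  # random symmetric positive definite

# Assumed affine local projection Phi(p) = B (p - mu) + c to n = 2
# dimensions; B and c are illustrative, not optimized.
B = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, -0.5]])
c = np.array([0.2, 0.3])

# Closed-form pushforward: X = Phi(P) ~ N(c, B Sigma B^T).
mean_X = c
cov_X = B @ Sigma @ B.T

# Monte Carlo check of the closed form.
samples = rng.multivariate_normal(mu, Sigma, size=200_000)
projected = (samples - mu) @ B.T + c
```

The sample mean and sample covariance of `projected` match `mean_X` and `cov_X` up to Monte Carlo error.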
With the interpretation that a normally distributed random vector can be represented as an ellipse in 2D, it can be observed that the stress components characterize certain aspects of a normal distribution. In this representation, the ellipse is spanned by the principal axes, which are given by the eigenvectors of the covariance matrix. Using this interpretation, the first stress component S_1^{i,j} solely addresses the assimilation of the covariance matrices in terms of size and shape.
In contrast, the third component S_3^{i,j} characterizes two aspects: the fitting of the expected values and the comparison of the principal axes of the covariance matrices. Since the first stress component already covers this covariance matrix aspect (in fact, this term is also included in the sums), S_3^{i,j} mainly fits the expected values. The only thing missing is the orientation of the ellipses. This aspect is reflected in the second stress term S_2^{i,j}, where the principal axes are aligned to the difference of the expected values (used like an orientation line).
Another important aspect of the stress formulation is that the actual random vectors do not need to be defined explicitly. In fact, analogously to traditional MDS, where only distances of points (d_{i,j} = ∥p_i − p_j∥_2) are necessary, our normal distribution UAMDS only requires certain scalar values as well. These are given by:
• scalar products of the principal axes (the columns of U_{P_k} S_{P_k}^{1/2}),
• scalar products of the principal axes with the differences of the expected values, and
• scalar products (squared norms) of the differences of the expected values.
This list shows the input for normal distribution UAMDS, indicating the information requirements.
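These input quantities can be precomputed once from the means and covariances. A minimal sketch, assuming the principal axes are the columns of U S^{1/2} from the SVD of each covariance matrix; the function and variable names are our own, not the reference implementation's.

```python
import numpy as np

def uamds_inputs(means, covs):
    """Precompute the scalar values listed above: axis-axis products,
    axis-mean-difference products, and squared mean distances. The
    principal axes of each distribution are taken as the columns of
    U S^{1/2} from the SVD Sigma = U S U^T (Assumption 1)."""
    K = len(means)
    axes = []
    for C in covs:
        U, S, _ = np.linalg.svd(C)
        axes.append(U * np.sqrt(S))  # scale columns by sqrt of singular values
    axis_axis = {(i, j): axes[i].T @ axes[j]
                 for i in range(K) for j in range(K)}
    axis_dmu = {(i, j): axes[i].T @ (means[i] - means[j])
                for i in range(K) for j in range(K)}
    dmu_sq = {(i, j): float((means[i] - means[j]) @ (means[i] - means[j]))
              for i in range(K) for j in range(K)}
    return axis_axis, axis_dmu, dmu_sq
```

For unit covariances, the axis-axis products reduce to identity matrices and the squared mean distances to ordinary squared Euclidean distances.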

Numerical Algorithm
After the evaluation of the stress, we have to deal with a minimization problem. In the case of the normal distribution model, our goal is to find

min_{c, B} S(c, B)

over the translation vectors c = (c_1, ..., c_K) and projection matrices B = (B_1, ..., B_K), which we solve via gradient descent. The reference implementation is available as a Java library [12]. According to the previous subsection, the stress consists of three different components. While the presented formulation (Equation 5) highlights the connections between single objects, the corresponding matrix notation, which can be found in the supplemental material [13] (Subsection 3.2), has numerical and algorithmic advantages. For the implementation, in particular for the evaluation of the stress, we recommend using the matrix notation. For an implementation of gradient descent, we also provide an analytic gradient in the supplemental material [13] (Section 4) for the normal distribution model. However, for more complex distributions or choices of local projections, there may not be an easy analytical expression of the gradient. In this case, we recommend resorting to a numerical gradient computation.
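The recommended fallback of a numerical gradient can be as simple as a central-difference scheme. A generic sketch of our own minimal fixed-step optimizer (not the Java library's implementation):

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        xp = x.astype(float).copy(); xp.flat[i] += eps
        xm = x.astype(float).copy(); xm.flat[i] -= eps
        g.flat[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

def gradient_descent(f, x0, lr=0.05, steps=500):
    """Plain fixed-step gradient descent (no line search or momentum)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * numerical_grad(f, x)
    return x
```

Note that the numerical gradient costs two function evaluations per parameter and step, which is why an analytic gradient is preferable whenever it is available, as for the normal distribution model.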
By default, our implementation uses a random initialization of the affine transformations, but we also support two other initialization strategies for speeding up the starting stage of the optimization: the No-Variance initialization performs normal distribution UAMDS on a version of the data set with zero covariances, which allows for omitting major parts of the stress calculation and provides reasonable c_i, and the UA-PCA initialization performs UA-PCA [9] on the data set and uses the resulting linear projection as initialization for the affine transformations.

VISUALIZATION
In this section, we discuss how to visualize the results of UAMDS and provide visualization techniques that support the analysis with UAMDS. Finally, we present a visualization analysis for varying uncertainty. For conciseness, we omit the normal distribution qualifier when referring to our evaluated UAMDS model from here on.

Representation of Distributions
Here, we describe how the resulting low-dimensional random vectors can be visualized. According to our generic approach, the low-dimensional random vectors are characterized by X_k = Φ_k ∘ P_k, where P_k is the input high-dimensional random vector and Φ_k : R^N → R^n the projection operator computed via UAMDS. The effect of the projection operator onto the probability density functions is the pushforward of f_{P_k} under Φ_k. In the case of arbitrary distributions, there might be no compact formula for the resulting probability density function. Thus, a discretization is needed to visualize the shape of the distribution in the low-dimensional space. However, in the case of normal distributions, our model shows that the projection of the continuous distributions P_k ∼ N(µ_{P_k}, Σ_{P_k}) boils down to a projection of the characteristic moments, i.e., X_k ∼ N(c_k, B_k Σ_{P_k} B_k^T). For the visualization of normally distributed random vectors in 2D, there are several options [36]. A simple but effective option is the use of scatter plots, where a large number of samples (drawn from a distribution) are visualized (see Fig. 3a). Another sampling-based approach is the use of hypothetical outcome plots [15], which use animation to show possible realizations of a random vector. The latter technique allows an understanding of distributions and correlations over time, whereas the former provides a sense of sample likelihood based on the density of certain regions.
Visualizing the probability density function as a 2D scalar field provides information about the likelihood of certain regions as well. While it is possible to encode continuous probability densities with a color gradient, binning certain value ranges has its advantages [25]. The use of isobands at specific quantiles of the distribution is shown in Fig. 3b. In this context, the 50th, 80th, and 95th percentiles are used. Using stippling of the scalar field, the issue of overdraw present with isobands can be mitigated [31]. In our visualizations of projected normal distributions, we choose isolines (see Fig. 3c), which have fewer overdraw issues than probability samples or isobands, and give a clear impression of shape and size. These aspects are beneficial for our analysis and comparisons of projections. In general, we advise users of UAMDS to choose a representation that fits the individual tasks.
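Isolines of a projected 2D normal distribution can be computed without numerical contouring, since the level set enclosing a given probability mass is an ellipse. A sketch with our own helper function; for two degrees of freedom, the chi-square quantile −2 ln(1 − q) is available in closed form, so no SciPy dependency is needed.

```python
import numpy as np

def gaussian_isoline(mean, cov, percentile, num=64):
    """Polyline on the isoline of a 2D Gaussian that encloses the given
    probability mass (e.g., percentile = 95 for the 95th percentile)."""
    q = percentile / 100.0
    r = np.sqrt(-2.0 * np.log(1.0 - q))        # Mahalanobis radius (df = 2)
    t = np.linspace(0.0, 2.0 * np.pi, num, endpoint=False)
    circle = np.stack([np.cos(t), np.sin(t)])  # (2, num) unit circle
    L = np.linalg.cholesky(cov)                # cov = L L^T
    return (np.asarray(mean)[:, None] + r * (L @ circle)).T  # (num, 2)
```

Calling this for the 50th, 80th, and 95th percentiles of each projected N(c_k, B_k Σ_{P_k} B_k^T) yields the nested isolines used in our plots.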

Trustworthiness of Projections
When projecting data with UAMDS, or any other dimensionality reduction method, some information cannot be preserved. While the resulting projection is optimal with respect to the stress, some random vectors may be represented worse than others in favor of the global layout. Thus, by the nature of dimensionality reduction, a projection cannot be fully trusted. This issue is amplified when dealing with uncertain inputs, making it all the more important to support the assessment of projection quality and faithfulness during analysis.
A straightforward way to inform about the projection faithfulness of the individual distributions is color coding of the stress. We compute how much stress results from the projection of each distribution P_i, which is given by S^i(c, B) = Σ_j ( S_1^{i,j}(c, B) + S_2^{i,j}(c, B) + S_3^{i,j}(c, B) ), and color the projected elements in the visualization accordingly. This makes it easy to spot projections that may not faithfully represent the high-dimensional distribution and its relationship to others. Leveraging interaction, we allow for the selection of an individual distribution P_i and give more details on the stress by color coding the elements corresponding to P_j according to the pairwise stress S^{i,j}(c, B). This way, it can be assessed which distances between the projected distributions can be trusted and which cannot. Fig. 4 shows an example, where distributions 1 and 4 have the most overall stress. The right plot in Fig. 4 shows the local stress of distribution 1, i.e., distributions are colored according to the stress they share with distribution 1. It can be seen that especially the adjacent distribution 4 causes high stress, while the others exhibit only little stress, hinting at a faithful representation of their distances. Fig. 10 shows the individual pairwise stress components in a scatter plot with x = S_1^{i,j} and y = S_2^{i,j}, which allows for an even more detailed assessment of projection quality and detection of poorly represented relationships. Looking at the stress is a good first indicator for projection quality, but we can do more to assess how well distances are preserved in the projection.

Distribution Shepard Diagram
Since UAMDS is a distance-preserving technique, the use of a Shepard diagram to evaluate projection quality is an obvious choice. In regular point-based projections, such as MDS, the Shepard diagram is a scatter plot of pairwise distances of the original and the projection space. A single point in a Shepard diagram has coordinates (D_{i,j}, d_{i,j}), and its position shows how the distance d_{i,j} in the projection between i and j compares to the original distance D_{i,j}. This notion of distances is also present in our general UAMDS model, where the high-dimensional distances ∥v − w∥ and the projected distances ∥Φ_i(v) − Φ_j(w)∥ enter the stress (Equation 3). However, the high-dimensional points v, w are samples from their respective distributions P_i, P_j, so that there can be infinitely many of them. Since we are interested in the projection quality for the whole distributions, we may analyze the distribution of high- and low-dimensional distances between random vectors P_i and P_j. For this, we define the bivariate random vector D^{i,j} of pairwise distances between P_i and P_j as follows:

D^{i,j} = ( ∥P_i − P_j∥_2, ∥Φ_i(P_i) − Φ_j(P_j)∥_2 ).

In a Monte Carlo fashion, we draw a number of samples from P_i and P_j, project them according to UAMDS, and compute pairwise high- and low-dimensional distances. Plotting the samples from D^{i,j} in a scatter plot shows an approximation of its distribution. Observing its location and spread with respect to the plot's diagonal, we can see whether distances between P_i and P_j are overestimated (above the diagonal) or underestimated (below the diagonal) in the projection. Using brushing and linking, we support highlighting of samples in the Shepard diagram corresponding to projected distributions in the visualization to mitigate the problem of massive overdraw stemming from sampling. Fig. 5 shows a Shepard diagram for the pairwise distances D^{1,j}, i.e., the distances between distribution 1 (which has high overall stress) and the others.
It can be seen that there is high point density along the diagonal, with some spread above and considerable spread below it, which means that distances from distribution 1 tend to be underestimated; specifically, the distances to distribution 4 (orange) are underestimated. In the right Shepard diagram, the pairwise distance distribution D^{1,0} is highlighted (red dots), which was occluded in the other diagram. It can be seen that the points are close to the diagonal, which means that distances between distributions 0 and 1 are faithfully represented.
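The Monte Carlo construction of D^{i,j} described above can be sketched as follows; the helper is our own, and the projections phi_i, phi_j are assumed to be vectorized over arrays of row vectors.

```python
import numpy as np

def shepard_samples(mu_i, cov_i, phi_i, mu_j, cov_j, phi_j, m=1000, seed=0):
    """Monte Carlo samples of the bivariate distance vector D^{i,j}:
    high-dimensional distance ||v - w|| vs. projected distance
    ||phi_i(v) - phi_j(w)|| for v ~ P_i, w ~ P_j."""
    rng = np.random.default_rng(seed)
    v = rng.multivariate_normal(mu_i, cov_i, size=m)
    w = rng.multivariate_normal(mu_j, cov_j, size=m)
    hi = np.linalg.norm(v - w, axis=1)
    lo = np.linalg.norm(phi_i(v) - phi_j(w), axis=1)
    return np.column_stack([hi, lo])  # scatter plot: x = hi, y = lo
```

As a sanity check, a perfectly distance-preserving projection places all samples exactly on the diagonal.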
We also provide two other Shepard diagram versions tailored to projections with UAMDS. For these, we consider the pairwise differences of the normally distributed random vectors Y_{i,j} = P_i − P_j ∼ N(µ_{P_i} − µ_{P_j}, Σ_{P_i} + Σ_{P_j}). To analyze the degree to which variability is preserved, we use two different metrics on the respective covariances. Fractional anisotropy,

FA = sqrt( N/(N−1) ) · sqrt( Σ_{k=1}^N (λ_k − λ̄)² / Σ_{k=1}^N λ_k² ),

where λ̄ is the average eigenvalue of an N × N covariance matrix with eigenvalues λ_1, ..., λ_N, is a metric for the anisotropy of the distribution. We use the trace tr(Σ_{Y_{i,j}}) as a metric for the total variability of a distribution (i.e., the expected squared norm of the centered random vector Y_{i,j}, without taking into account the expected location). With these metrics, we can analyze how well variability information is preserved in the projection when plotting them for pairs of high- and low-dimensional random difference vectors. Together with brushing and linking, we facilitate quality assessment of a UAMDS projection as shown in Fig. 10 and also in the supplemental material [13].
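Both metrics are straightforward to compute from a covariance matrix. A sketch using the common N-dimensional generalization of fractional anisotropy; the function names are our own.

```python
import numpy as np

def fractional_anisotropy(cov):
    """N-dimensional generalization of fractional anisotropy:
    sqrt(N/(N-1)) * sqrt(sum((lambda_k - mean)^2) / sum(lambda_k^2)).
    Yields 0 for isotropic covariances and 1 in the fully degenerate case."""
    lam = np.linalg.eigvalsh(cov)
    N = lam.size
    num = ((lam - lam.mean()) ** 2).sum()
    return float(np.sqrt(N / (N - 1)) * np.sqrt(num / (lam ** 2).sum()))

def total_variability(cov):
    """tr(cov), i.e., the expected squared norm of the centered vector."""
    return float(np.trace(cov))
```

Applied to the covariance of a high-dimensional difference vector Y_{i,j} and its projected counterpart, these two values form one point in each of the tailored Shepard diagrams.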

Sensitivity Analysis of Dimensionality Reduction
We argue that the analysis of high-dimensional data becomes even more challenging when data is uncertain. While using off-the-shelf dimensionality reduction is possible, for instance, when only expected values are regarded, the faithfulness of such projections has to be questioned since the uncertainty is not accounted for. In Fig. 1, we illustrate that taking different levels of uncertainty into account leads to different results compared to projection of just the expected values. To show that UAMDS is indeed aware of uncertainty, we provide an analysis on a synthetic data set for the sensitivity to data variance.
Our synthetic data set consists of 5 uncertain inputs described by 4D multivariate normal distributions. For the analysis, we introduce a scaling parameter to vary the uncertainty by scaling the distributions' covariance matrices. Initially, we scale down the covariances to zero and project the data with UAMDS, which in this case yields a projection equivalent to regular MDS. We then iteratively increment the scaling of the covariances of the input and apply UAMDS while using the affine transformations that resulted from the previous iteration for initialization. This is important since it provides a consistent global orientation over all scaling steps and avoids convergence to very different local optima in each iteration. While gradually increasing the scaling back to 1, we trace the projected means and covariances.
In Fig. 6a, the projections for the covariance scalings 0.0 and 1.0 are shown. It can be seen that distributions 0 and 2 as well as 1 and 4 are projected close to each other when no uncertainty is considered, but they drift apart when variance is introduced. According to the isolines, the mean of distribution 1 is within the 80th percentile of projected distribution 2, which means that possible realizations of the random vectors can be closer than what is illustrated by the upper plot.
In Fig. 6b, we plot the traces of the means in the projection as they travel with increasing uncertainty. We also show the principal axes of the projected covariances for the scalings 0.0125 (in light color), 0.5, and 1.0 (saturated color) on the mean traces. It can be seen that the means travel in different directions: blue (1) and orange (4), for instance, diverge from each other, but blue also takes a sharp turn after the first half of the upscaling process. While red and purple stay rather stationary, green moves toward blue during the second half of the process. We can also see that the principal axes of the covariances turn, which means that not only the positions change, but also the way the distributions are oriented relative to each other in the projection, to better preserve the distances of hypothetical samples.
To further analyze the sensitivity, we take a closer look at the rate of change of the projected means and covariances by plotting the magnitudes of their velocities in Fig. 6c. The upper plot shows the velocity of the means for increasing scalings. High velocities for all of the means can be seen in the beginning, hinting at a high sensitivity to changes in variance when the variance is small. The positions of the means seem to converge around a scaling of 0.35, but further increments of the scaling cause some of the means to move again; blue and green even gain more and more momentum. The velocities of the covariances of the projected distributions show a behavior similar to the means in the beginning and around the 0.35 scaling mark.
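The velocity magnitudes can be computed from the traced positions by finite differences with respect to the scaling parameter. A minimal sketch, using a made-up toy trajectory in place of an actual traced mean:

```python
import numpy as np

# toy trace of a projected mean as the covariance scaling s goes from 0 to 1
# (rows are scaling steps, columns are 2D coordinates)
s = np.linspace(0.0, 1.0, 81)
trace = np.stack([np.sin(s), s ** 2], axis=1)  # (steps, 2) positions

# magnitude of the mean's velocity: finite differences w.r.t. the scaling
velocity = np.linalg.norm(np.diff(trace, axis=0), axis=1) / np.diff(s)
```

The same computation applies to covariance trajectories when each covariance matrix is flattened into a vector before differencing.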
We also looked into the effect of varying the variance scaling for only a single input. In Fig. 1, the effect of introducing uncertainty to only a single data vector (colored orange) is shown in the middle column. The former positions of the projected data points are indicated in faded color, showing that some are mostly unaffected (red and green), while others have moved considerably. In general, our analysis shows that UAMDS is sensitive to changes in uncertainty on a global and local level, and that changes in the projection can be rather unpredictable and surprising, as the velocities in Fig. 6c demonstrate.

RESULTS
While the previous sections used only synthetic data, we now showcase our technique on real example data sets. We also take the opportunity to demonstrate the versatility of UAMDS and the benefits that come with our mathematical model.

Iris Data Set
Since dimensionality reduction is often used to assess the similarity of subsets of data, e.g., when dealing with labeled data, the locations of single points in the projection may not be of interest. Instead, the shape of clusters and, for example, their overlap are interesting to analyze. Furthermore, many data sets consist of observations that can be interpreted as samples of an underlying distribution. Such distributions are usually not explicitly given and are a source of uncertainty for the analysis. When estimating the underlying distributions from data, the uncertainty is quantified and we can use UAMDS to visualize it. With our current model, we need to assume that the distributions are normal, so we can estimate their densities from the mean and sample covariance, and then map them to two dimensions. As an example of this procedure, we use the well-known Iris data set, which consists of observations from three different species of Iris flowers. Fig. 7 shows a comparison of a regular MDS projection and our UAMDS projection. For Fig. 7b, the high-dimensional distributions of the three species have been estimated from the data as normal distributions. It can be seen that the densities of the UAMDS projection are very similar in orientation and spread to the clusters in Fig. 7a. Both plots show an overlap of the Versicolor (orange) and Virginica (blue) species, and that Setosa (green) samples are less similar to the aforementioned ones.
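The estimation step is standard moment estimation per class. A minimal sketch, assuming scikit-learn is available for loading the Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris  # assumes scikit-learn is installed

iris = load_iris()
X, y = iris.data, iris.target  # 150 samples, 4 features, 3 species

# estimate one multivariate normal per species from its samples
estimates = {}
for label in np.unique(y):
    samples = X[y == label]
    mu = samples.mean(axis=0)              # 4D mean
    sigma = np.cov(samples, rowvar=False)  # 4x4 sample covariance
    estimates[iris.target_names[label]] = (mu, sigma)
# the (mu, sigma) pairs are the uncertain inputs handed to UAMDS
```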
Outliers cannot be spotted in the UAMDS density plot, since it projects the estimated distributions instead of the whole data set. However, the densities are well suited for analyzing relationships between whole populations. In fact, we are able to project the data points with UAMDS as well, since our model is based on projection operators for distribution samples, which we approximate by affine transformations. Applying the transformation that UAMDS yields for each distribution to the original data provides results very similar to regular MDS (see Fig. 7b). This way, UAMDS can also be used for processing massive data sets with many instances, which is often infeasible with regular MDS.
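Reusing a distribution's transform for its member points is a single affine map per cluster. A sketch, where the linear part `P`, the translation `t`, and the mean-centered form of the transform are assumptions for illustration, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 4))  # data points attributed to one distribution
mu = samples.mean(axis=0)

# hypothetical affine transform found by UAMDS for this distribution:
# a 2x4 linear part P and a 2D translation t
P = rng.normal(size=(2, 4))
t = np.array([0.5, -1.0])

# reuse the distribution's transform to project every original data point
low_dim = (samples - mu) @ P.T + t  # shape (200, 2)
```

Because the transform is fitted once per distribution rather than once per point, the cost of projecting additional instances is a single matrix multiplication, which is what makes massive data sets feasible.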

Student Grades Data Set
In this section, we demonstrate the application of our method to data sets with uncertain observations. Since uncertainty can stem from different sources, let us first characterize what kind of uncertainty we are considering here. By data consisting of uncertain observations, we mean that every instance of the set is a random vector instead of a regular vector. The random vector can be fully described by a multivariate distribution, or each feature by a distribution on its own. An example of this is the student grades data set presented by Denoeux and Masson [7] and also used by Görtler et al. [9]. It consists of four features describing students' performance in different subjects (M1, M2, P1, P2). The performance in a subject is based on grading, ranging from 0 to 20 points. However, the grading is expressed with different levels of uncertainty, for instance, it can be given as a real number, a range of values, or even as a textual description such as 'very good' or 'fairly bad' [6, 9]. For UAMDS, we need normally distributed uncertainty. Value ranges [a, b] are translated to normal distributions with mean µ = (a + b)/2 and variance σ² = (b − a)²/12, matching the moments of a uniform distribution on [a, b]. For textual descriptions, we provide two mappings to normal distributions as shown in Fig. 8 (top).
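The translation of ranges to normal distributions can be written as a small moment-matching helper. The textual-grade ranges below are placeholders for illustration, not the mappings actually used in Fig. 8:

```python
import numpy as np

def range_to_normal(a, b):
    """Match a grade range [a, b] by a normal distribution with the mean
    and variance of the uniform distribution on [a, b]."""
    return (a + b) / 2.0, (b - a) ** 2 / 12.0

# hypothetical mapping of textual grades to point ranges (placeholder values,
# not the concrete mappings from the paper)
textual = {"very good": (16, 20), "fairly bad": (4, 8)}
textual_normals = {k: range_to_normal(*v) for k, v in textual.items()}
```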
In the following, we will compare UAMDS and uncertainty-aware principal component analysis (UA-PCA) [9] on the basis of the student grades data set. For this, we compute a projection with UA-PCA, use the UA-PCA projection as initialization for UAMDS, and compare both projections. We choose to initialize UAMDS with UA-PCA to obtain similar global orientation. From the density plots on the left half of Fig. 8, it can be seen that the projections look similar, but there are some differences, e.g., the distribution for Tom is more spread out for UAMDS. There is also more overlap between Jane and Tom in the UAMDS projection, and less for Tom and David.
Using a Shepard diagram, we can assess the difference in projection quality in terms of distance preservation. By the nature of PCA, with its global linear projection, distances cannot be overestimated, which is why all pairwise distances are on or below the diagonal. We take a closer look at the distances to Tom's distribution (red) in the Shepard diagram, since it is the distribution causing the most projection stress. For UAMDS, we can see slight over- and underestimation of the distances between Tom and the green, orange, and brown distributions (Bob, Joe, Jack). These distances seem to be more faithfully represented with UAMDS than with UA-PCA. Also, the pairwise distances between Tom and Jane (purple) are located more closely to the diagonal. When looking at the projections with the alternative mapping of textual grades to distributions (right half of Fig. 8), we see that this difference in modeling uncertainty has a noticeable effect on the projections (UA-PCA and UAMDS), underlining their sensitivity to uncertainty. From the Shepard diagram with the red distribution highlighted, we can see that distances of samples within Tom's distribution are more faithfully represented by UAMDS, since they have less spread and deviation from the diagonal. Also interesting are the prominent horseshoe shapes in the Shepard diagram for distances to David (blue) and Jane (purple). This means that for sample pairs with a large distance in the original space, the projection either represents those distances well or underestimates them considerably. This can happen when overlapping and non-overlapping parts of the distributions are projected to close-by locations, similar to a photo in which foreground and background elements can appear close. We encourage the reader to have a look at our supplemental material [13], where we provide additional visualizations for this data set.
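The data behind a Shepard diagram is just the set of pairwise distances before and after projection. A minimal sketch on made-up points, using coordinate dropping as a stand-in for a global linear orthogonal projection, which also illustrates why such projections cannot overestimate distances:

```python
import numpy as np

rng = np.random.default_rng(2)
high = rng.normal(size=(8, 4))  # toy high-dimensional points
low = high[:, :2]               # coordinate-dropping projection (PCA-like)

# Shepard diagram data: all pairwise distances before vs. after projection
i, j = np.triu_indices(len(high), k=1)
d_high = np.linalg.norm(high[i] - high[j], axis=1)
d_low = np.linalg.norm(low[i] - low[j], axis=1)
# for an orthogonal linear projection, d_low <= d_high for every pair,
# so all Shepard dots lie on or below the diagonal
```

Plotting `d_high` against `d_low` and coloring the dots by distribution membership yields the highlighted diagrams discussed above.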

Pokemon Data Set
Uncertain and statistical data can be found in many domains; video games, in particular, contain carefully crafted sources of uncertainty to diversify gameplay. From this domain, we use a data set stemming from the Pokemon video games that consists of monsters (the Pokemon) and their different battle stats: Health Points (HP), Attack Strength (Att), Defense (Def), Special Attack Strength (SpA), Special Defense (SpD), and Speed. Each Pokemon comes with a type (e.g., fire, water, grass, or bug type), but we omit the explanation of this mechanic for conciseness. The source of uncertainty is the range of possible values for the different stats. We collected the possible minimum and maximum values of the six stats for every kind of Pokemon from a web source [28], resulting in a set of 6D random vectors. In the following analysis, we select the first 15 Pokemon, containing the three starter Pokemon and their evolutions, as well as two early obtainable bug-type Pokemon and their evolutions.
From the left plot in Fig. 9, we can see a pattern of grass, water, and fire types following the same direction. The strongest Pokemon (Charizard, Venusaur, and Blastoise) are found on the right side of the plot, while their weaker evolution predecessors are located to their left. The weakest monsters are found on the left side of the plot. Such patterns could also be found with traditional MDS; however, with UAMDS we have the means to analyze why elements have a certain arrangement in the visualization. Examining the projection matrices of the different Pokemon, we see how the different dimensions of their distributions are mapped to 2D space. Each distribution could potentially use a very different projection matrix, but due to the coupling of the projections in the stress term, the matrices need to be similar to some extent. Visualized at the bottom are the projection matrices for a selection of Pokemon, where it can be seen how the different stats are projected to 2D. For instance, the Charizard distribution is mapped in such a way that samples with high speed stats are placed to the top right. It can also be seen that all samples with higher than average stats are found to the right of the mean, which means that weaker Charizard samples are mapped to the left. When looking at the visualizations of the other projection matrices, we see that they all agree on "weak samples left," which implies that monsters are ordered from left to right according to total strength in the whole visualization. Comparing the projection matrices of Charizard and Blastoise, we can see that they agree on projection directions to a large degree. Looking at the most dominant projection directions, speed and defense, we can explain the spatial arrangement of the distributions in this local area.
Quicker monsters are located toward the top right, while defensive monsters are found toward the bottom, which implies that their distance in the visualization mainly encodes these stats.
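This kind of reading can be done programmatically: each column of a 2×6 projection matrix is the 2D direction a stat is mapped to, and the column length measures how strongly that stat influences the projected position. A sketch with a made-up matrix whose entries are chosen only for illustration:

```python
import numpy as np

stats = ["HP", "Att", "Def", "SpA", "SpD", "Speed"]

# hypothetical 2x6 projection matrix of one distribution; each column is
# the 2D direction the corresponding stat is mapped to (made-up values)
P = np.array([
    [0.1, 0.3,  0.1, 0.2,  0.1, 0.8],  # x components
    [0.0, 0.2, -0.7, 0.1, -0.2, 0.5],  # y components
])

# column length = how strongly a stat influences the 2D position
influence = np.linalg.norm(P, axis=0)
dominant = [stats[k] for k in np.argsort(influence)[::-1][:2]]
# here, Speed and Def dominate, mirroring the discussion above
```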
The right plot in Fig. 9 color codes the stress of the projection. It can be seen that the projection of Beedrill's distribution causes the most stress. When examining its projection matrix, we can see that Att, instead of Speed, is the dominantly mapped stat. The main part of Beedrill's high stress is caused by the nearby Butterfree distribution, whose projection almost omits the Att stat, which explains the stress and also tells us that the spatially encoded similarity of the two holds only when this stat is ignored. Fig. 10 provides further plots for assessing the quality of the UAMDS projection of Fig. 9: the stress parts S1 and S2 for all pairs i, j (left), and Shepard diagrams comparing the traces of high- and low-dimensional covariance combinations (middle) and the fractional anisotropy of high- and low-dimensional difference distribution pairs (right). Dots in light colors are pairs of different Pokemon (i ≠ j), with their original color mapping blended; highlighted dots with outlines are pairs with Beedrill. These plots indicate that Beedrill's high stress relates to both the position and shape of the distribution. Beedrill's distribution is also projected considerably more isotropically than the original distribution, as can be seen in the middle Shepard diagram (low dark green dot). The total variability of all distributions is mostly underestimated, as indicated by the dots below the diagonal in the right Shepard diagram.
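Fractional anisotropy summarizes how elongated a distribution is from the eigenvalues of its covariance matrix. A sketch using one common d-dimensional generalization of the measure (the exact definition used for Fig. 10 is not restated here):

```python
import numpy as np

def fractional_anisotropy(cov):
    """Fractional anisotropy of a covariance matrix: 0 for an isotropic
    distribution, approaching 1 for a fully degenerate (line-like) one.
    One common d-dimensional generalization of the measure."""
    lam = np.linalg.eigvalsh(cov)
    d = len(lam)
    num = np.sqrt(((lam - lam.mean()) ** 2).sum())
    return np.sqrt(d / (d - 1)) * num / np.sqrt((lam ** 2).sum())
```

Comparing this value for a high-dimensional covariance and its projected counterpart reveals cases like Beedrill's, where the projection is noticeably more isotropic than the original distribution.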

CONCLUSION
We showed that our conceptual idea, considering an infinite number of samples from the input distributions in the limit and plugging them into an already existing optimization problem, provides a good starting point for developing uncertainty-aware variants of existing methods, such as MDS in our case. We think that this concept can also be fruitful for other dimensionality reduction methods, but more considerations will be required to arrive at useful models, such as the introduction of a projection operator in UAMDS. Even though we have a distribution-agnostic general model, a concrete stress has to be derived from it for specific distributions to be able to perform UAMDS. We already provide a model for the common case of normal distributions and show a detailed derivation in the supplemental material [13]. This derived model assumes data consisting of normally distributed random vectors and independence between the random vectors, uses squared distances in the stress, and estimates a separate linear projection operator for each of them. Introducing new types of distributions involves further derivations and is thus not straightforward. Fortunately, normally distributed data is quite common, so our presented model should be widely applicable nonetheless. Assuming independence between the random vectors of our normal distribution model is not always valid. For example, student grades could be correlated due to students being taught by the same teacher. However, such information is usually not provided, in which case independence is assumed anyway. Our choice of squared distances for dissimilarity made the stress term tractable for analytical derivation. However, this choice implies that pairs of distributions with large distances are emphasized in the stress, whereas pairs of close-by distributions are less crucial for the stress minimization.
Realizing the projection through local affine transformations follows the principle of a first-order Taylor approximation, but may not perform well in cases where the operator cannot be assumed to be locally linear. This can be the case when there is major overlap between distributions in high-dimensional space. To support the identification of such cases, we provided visualization techniques for checking the trustworthiness of the projection. However, we also benefit from assuming local linearity, as it is easy to interpret and provides the opportunity to apply analysis methods from linear projections, as we showed in the results. Our examples and the comparison to UA-PCA showed that UAMDS is a useful addition to uncertainty-quantifying methods. Due to its approximation of the projection operator by local linear projections, it provides greater flexibility than UA-PCA, which is especially useful when dealing with anisotropic distributions. Generally, UAMDS makes it possible to analyze relationships in high-dimensional data sets while considering uncertainty and propagating it to the visualization space, thus allowing for a more informed and faithful analysis.
In the future, we want to derive models for a broader range and mix of distributions, e.g., uniform, bimodal, and heavy-tailed distributions. We would also like to support a wider variety of stress types, such as non-metric MDS, and test more sophisticated projection operators. Investigating the viability of different dissimilarity measures, as well as their effect on the projection, will be an important part of improving the flexibility of UAMDS. Making other dimensionality reduction methods, like t-SNE or UMAP, uncertainty-aware and examining their stability [29] under uncertainty is also on our roadmap. Finally, we plan to perform a user-study-based evaluation [14] of the uncertainty visualization via MDS.