A New Approach to Improve the Topological Stability in Non-Linear Dimensionality Reduction

Dimensionality reduction in the machine learning field mitigates the undesired properties of high-dimensional spaces to facilitate classification, compression, and visualization of high-dimensional data. During the last decade, researchers proposed many new non-linear techniques for dimensionality reduction based on the assumption that data laying on or near a complex low-dimensional manifold is embedded in the high-dimensional space. On the other side, new techniques for dimensionality reduction aim to identify and extracting the manifold from the high-dimensional space. Isomap is one of widely-used low-dimensional embedding methods, where geodesic distances on a weighted graph are incorporated with the classical scaling (metric multidimensional scaling). The Isomap chooses the k-nearest neighbors based on the distance only which causes bridges and topological instability. In this paper, we propose a new algorithm to find the nearest neighbors to optimize the number of short-circuit errors and thus improve the topological stability. We assume that any point on the manifold and its nearest neighbors form a vector subspace and the orthogonal to that subspace is orthogonal to all vectors spans the vector subspace. Therefore, in the proposed algorithm, the point on the manifold and its nearest neighbors (i.e. the selected number of nearest neighbors based on the required subspace’s dimension) are used to find the bases of the subspace and the orthogonal to that subspace which belongs to the orthogonal complementary subspace. Then, candidate neighbors points will be added to the nearest neighbors based on the distance and the angle between each candidate point and the orthogonal to the subspace. The new algorithm is tested using low dimensional (3D) synthesized datasets and high dimensional real datasets. The superior performance of the new algorithm in choosing the nearest neighbors is confirmed through visually inspecting its capability find to the correct $2D$ representation of the synthesized datasets at different $k$ . Moreover, In the case of the high dimensional real datasets, the new algorithm yields a lower residual variance than the standard Isomap. Finally, we investigated using the new algorithm to find the nearest neighbors for the Locally Linear Embedding (LLE) algorithm. We tested the LLE variant using synthesized datasets and high dimensional real datasets and the results were promising


I. INTRODUCTION
Dimensionality reduction is the process of transforming a high-dimensional data-set into a meaningfully reduced dimensionality. The ideal goal of the dimension reduction is discovering the minimum number of parameters needed to account for the observed properties of the data, which is called the intrinsic dimensionality of data [1]. Dimensionality reduction techniques take as input a collection of unorganized data points. The topology and intrinsic dimensionality The associate editor coordinating the review of this manuscript and approving it for publication was Zhanyu Ma . of the underlying manifold are unknown and need to be estimated in the process of constructing a faithful lowdimensional embedding using the data sample. By representing the datasets using its reduced intrinsic dimension, many machine learning tasks such as classification, visualization, and compression of high-dimensional data become easier.
In the real world, many datasets such as global climate patterns, stellar spectra, or human gene distributions lie on complex nonlinear data manifolds and need nonlinear dimensionality reduction techniques that are capable of dealing with it. Antoulas [13], and Roweis and Saul [14] addressed the issue of model reduction for the large-scale dynamical systems considering the accuracy and complexity. Many reductions problems are solved using model order reduction techniques. In electrical engineering applications by Fortuna et al. [15]. Joshua B. Tenenbaum et al. [2] proposed the Isomap algorithm which is a global approach for dimension reduction where nearby points in the high dimensional space should be nearby in the new reduced dimensional space and the same should be happening for the faraway points. Isomap is a nonlinear dimension reduction technique that is capable of discovering the nonlinear degrees of freedom that underlie complex natural observations, such as human handwriting or images of a face under different viewing conditions. The Isomap algorithm has three steps. The first step is constructing a weighted graph G over the data points by establishing the neighborhood relationship between them. In the second step, Isomap estimates the geodesic distances between all pairs of points on the manifold M . These distances are the shortest path distances d G (i, j) in graph G which are calculated by Floyd's algorithm. The Isomap's third step is finding the top M eigenvectors of the matrix (−HSH /2) where S ij is the square of the shortest distance between points i and j and H is the centring matrix.
There are two types of Isomap: unsupervised and supervised Isomap. Conformal Isomap (C-Isomap) is unsupervised Isomap which is developed to guarantee conformality [3]. Landmark Isomap (L-Isomap) is another unsupervised Isomap which is a computationally efficient approximation of Isomap developed to increase the speed of Isomap [4]. Also, Incremental Isomap is a version of unsupervised Isomap, which is efficiently applied when data is collected sequentially [5]. The second type of Isomap is supervised Isomap. S-Isomap [6] and M-Isomap [7] are two supervised Isomap algorithms which utilize class information to guide the procedure of nonlinear dimensionality reduction. In all of the above versions of the Isomap, neighborhood search is a common and an important step.
The simplest way to identifying the neighborhood of each data point is identifying a fixed number of nearest neighbors, k, per data point, as measured by Euclidean distance or any other distance measure. The other criteria used to choose neighbours is based on the distance too, where all points that have distance smaller or equal to a certain threshold are considered neighbors. For example, one can identify neighbors by choosing all points within a ball of fixed radius. The number of neighbors for each point could be different. The step of neighborhood selection can be more sophisticated.
As an example, all points within a certain radius up to some maximum number can be considered as neighbors. Another quite sophisticated method, we can take up to a certain number of neighbors but none outside a maximum radius. In general, specifying the neighborhoods in Isomap depends only on distance information and does not incorporate any other information that may help to get better low dimensional embedding. Isomap estimating the nonlinear intrinsic geometry of a data manifold based on a rough estimate of each data point's neighbors on the manifold, hence the ability of Isomap to estimate the intrinsic geometry of a data manifold depends on choosing the correct neighbors for each point. Unfortunately, defining the connectivity of each data point via its nearest Euclidean neighbors in the high-dimensional is vulnerable to short-circuit errors [8]. These errors could happen if the folds in the manifold on which the data points lie are too close and the distances between these points are smaller than the true neighborhood distance. For example, a single data point which lies on one surface and very close to another surface could connect these two surfaces if the true neighborhood distance is relatively large. Many entries in the geodesic distance matrix can be altered by a single short-circuit error and consequently it leads to a drastically different (and incorrect) low-dimensional embedding. The left column ((a) and (c)) in Figure (1) shows an example of how short-circuit errors lead to incorrect low dimensional embedding. On the other hand, the right column ((b) and (d)) of Figure (1) shows no short-circuit errors and a correct low dimensional embedding. In this work, we incorporate another piece of information into the neighborhood selection in order to ensure the topological stability. Since each point and its neighbor lie in low dimension vector subspace (W ) we use the angle between the neighbouring candidate and the normal to the bases of that vector space (W ⊥ ). This angle should be close to 90 degrees to consider the candidate neighbor as neighbor.

II. THEORETICAL BACKGROUND AND METHODS
Given a set of unorganized data points which are sampled from the vector space V over the field R, [9], [10].
Let W be a finite-dimensional subspace of a vector space V which means the subspace W has a finite basis. If W is a finite-dimensional subspace of a vector space V then all basis for W will have the same number of vectors. The number of vectors in any basis is called the dimension of VOLUME 8, 2020 The white dotted vector, (bridge)x 3 has a small angle θ (less than 90 degrees) with W ⊥ which is normal to the subspace W . This means that x 3 cannot be approximated by a vector in the subspace W spanned by x 2 and x 1 .
the subspace W . If V itself has finite bases we say V is a finite-dimensional vector space then the number of vectors in any basis is called the dimension of V . Let W ⊥ denote the collection of all vectors in R D which are orthogonal to all the vectors in W ; At any point x in the vector space V we can assume that the all close neighbours to x that lie on the same manifold surface belong to the same subspace W or has a small angle with its projection on W. As shown in Figure (2) we can use the well-known theory above to solve the instability problem in Isomap. As shown in the figure, if we use the distance only to choose the neighbors of x 0 , then x 3 will be chosen as one of x 0 's neighbours. If we assume the two vectors from x 0 to its very two nearest neighbours are independent vectors and span the subspace W . Then using the bases vectors of W , we can find the W ⊥ . W ⊥ is normal to all vectors in W and if x 3 is not on the same surface as x 1 and x 2 then angle between W ⊥ and x 3 will be smaller than 90 degrees.

III. THEORETICAL BACKGROUND AND METHODS
Establishing the neighborhood relations is very important and if not accurate, will lead to the unstable topology. The proposed algorithm is used to construct the weighted graph G over the data points by establishing the neighbourhood relationship between them. The algorithm establishes the neighbourhood relationship for point x i based on two criteria; 1-The distances d(x i , x j ) between the pairs x i and x j . 2-The angle between the vector (x i − x j ) and any vector belong to W ⊥ . The other two steps are typically the same as those used in Isomap. Table 1 shows the proposed algorithm to establish neighborhood relationships for a set of data points M which are sampled from the vector space; As shown in Table 1   algorithm will use the three nearest neighbours to form the subspace W . We should highlight that in step 5 we choose a data point x q that is not an element of the nearest neighbors to find one of the vectors that span W ⊥ . The intuition behind this choice is that this data point is not local to the x p . We can not well approximate the vector (x q − x p ) using a linear combination of the bases v 1 and v 2 ; x q is a data point and x q / ∈ {x N 1 , x N 2 , . . . ., x Nk }. Hence we can use Gram-Schmidt to find one of the vectors that span W ⊥ .

IV. EXPERIMENTAL WORK
The experimental work consists of three parts, in the first part, we used low dimensional (3D) synthesized datasets and in the second part, we used high dimensional datasets. We started with the synthesized dataset because they have well-known manifold and can be used as a ground truth. Finally, in the third part we investigate if using the new algorithm in choosing the nearest neighbors to construct the graph representation of the data points in the nonlinear dimensionality reduction by Locally Linear Embedding algorithm (LLE) [1].

A. LOW DIMENSIONAL(3D) SYNTHESIZED
We have generated three synthesized datasets to test the proposed methods. These datasets are: 1-Helix. 2-Swiss Roll.
3-Punctured Sphere (the sampling is very sparse at the bottom and dense at the top). We applied the proposed algorithm to the three datasets using different number of nearest neighbors k. Each data point in the k nearest neighbour is added to NSS if θ within the range 90 • ± 5 • . The mapped data to lower dimensions is visually checked and it shows a good performance. The outputs of the proposed algorithm are shown in the following figures when using different k. Moreover, for the sake of visual comparison, we applied the t-Distributed Stochastic Neighbor Embedding (t-SNE) [16], (which is the state of art dimensionality reduction method) to the three datasets. The t-SNE method uses a measure called perplexity instead of the number of neighbors. One advantage of t-SNE is that its performance is robust to changes in the perplexity. Figure (4) shows the low dimensional representation of the helix where the right column is the result of the proposed method. The results in the figure show that for the proposed algorithm there are no bridges between points that belong to different surfaces. Also, the results show good low dimensional representation at k = 3 and 10 and a perfect representation at k = 5 which is to our mind better than the t-SNE presentation.
The proposed algorithm is applied to the Swiss Roll dataset at different k. The 2D representation of the Swiss Roll is shown in Figure (5).
Figure (5) shows the 2D representation of the Swiss role where the right column is the result of the proposed method. The results show good 2D representation at k = 10, 15, 20 and a perfect representation at k = 30. The standard ISOMAP gives a perfect representation at k = 10, and a representation looks like linear projection at other k. Furthermore, the t-SNE method did return a good representation which divided the 2D presentation into three separated clusters.
The proposed algorithm is applied to the punctured sphere dataset at different k. The 2D representation of the punctured sphere is shown in Figure (6). Figure (6) shows the 2D representation of the Punctured Sphere where the right column is the result of the proposed method. The results of the proposed algorithm show good 2D representation at k = 5, 10, 15 and 20. The standard Isomap gives bad representation at k = 5, 10, 15 and 20, and a representation looks like linear projection at k = 5, 10, 15 and 20. The t-SNE representation is good where it shows the large distances between the dark blue data point (i.e. the very sparse data points at the bottom of the 3D sphere) and small distances between the red points (i.e. the dense data points at the top of the 3D sphere).

B. HIGH DIMENSIONAL REAL DATASETS
The second part of the experiments is done using high dimensional datasets. The datasets used are: 1-The MNIST data set consists of 60,000 grayscale images of handwritten digits. For our experiments, we selected 1,000 of the images for computational reasons [11].  which introduce more variation into the images. The background of all images is homogeneous dark and the subjects in an upright, frontal position (with tolerance for some side movement) [12].     Because the datasets used in this set of experiments have high dimension and the underlying manifold of each dataset is unknown. We have used the residual variance to show the improvement of the proposed method.We compared the residual variance equation (4) of the proposed method and the standard Isomap.
VOLUME 8, 2020 where D Y is the matrix of Euclidean distances in the lowdimensional embedding recovered by the algorithm, D G is the graph distance matrix and R is the standard linear correlation coefficient. Figure  In order to further show the improved performance of our proposed approach over the standard Isomap, we calculated the residual variance at different k for both approaches. Figure (9) shows the residual variance of the two methods. Comparing the residual variance of the two methods shows that the proposed algorithm has a better residual variance. The 2D representations with highlighting images of the dataset using both algorithms are shown in Figures (10-11). These figures confirm that close points in the 2D space have similar features.
We applied both the standard Isomap and the proposed method to the Olivetti faces dataset using different k. The 2D representations of the dataset using both algorithms are shown in Figures (12-13). We show representations which have the best residual variance in each algorithm. Figure (14) shows a comparison between the residual variance of the two methods when applied to the Olivetti Faces the red curve is the best residual variance for the proposed method (at k = 13) and the blue curve is the best residual variance for the standard Isomap. The figure shows that the proposed method has lower residual variance which means it has better performance.

C. USING THE NEW ALGORITHM IN CHOOSING THE NEAREST NEIGHBORS TO CONSTRUCTS THE GRAPH REPRESENTATION OF THE DATA POINTS IN THE LLE
As shown in the above two subsections, using the proposed new algorithm to find the nearest neighbors reduces the number of short-circuit errors and thus improve the topological stability of the Isomap. In this subsection, we introduced a variant of the LLE where we used the proposed algorithm to choose the k-nearest neighbors for the LLE algorithm. It is well known that LLE follows a local graph-based approach hence it is less sensitive to short-circuiting than Isomap. [17], [18]. However, we hypothesis that using the   new algorithm to find the nearest neighbors for the LLE will produce better low-dimension representation, and sometimes low-dimension representation as good as the standard LLE.
In order to test our hypothesis, we used the standard LLE and our variant of LLE to reduce the dimensions of the three synthesized datasets. We used Equation 5 to compare the VOLUME 8, 2020  proposed LLE variant to the standard LLE.
percentage of cost function difference (PCFD) where (Y ) = i Y i − j W ij Y j is the LLE cost function Y i is the low-dimensional embedding data point of the high-dimensional data point X i Y j is the set of neighbours of the data points Y i 33906 VOLUME 8, 2020  W ij are the weights that linearly reconstruct Y i from its set of neigbors Y j Table compares the two methods in terms of the PCFD at the different k values. The results show that the new variant of the LLE produced better low-dimensional representation where local relationships between neighbors are preserved. Figure 15 shows some of the 2D mapping of the three synthesized datasets at different k which confirms that the new variant of the LLE better preserve the local topology between the neighbors data points. In Figure 15  Since, dimension reduction is the upstream step for more downstream data analysis such as clustering and classification. We used a subset of the MNIST data set to compare between the standard LLE and the proposed LLE variant in terms of the classification accuracy. In this experiment the high-dimensional data are converted into the low-dimensional data. Consequently, we used 10-fold cross-validation to estimate the classification accuracy of ten trees random forest classifier. As shown in Figure, the proposed LLE variant outperforms the standard LLE at the different lower dimension spaces and k values.
The results in the this subsection suggest that using the new algorithm to find the nearest neighbors for dimensional reduction methods that need to constructing a neighborhood relationships between the data points may produce a better low-dimensional representation.

V. CONCLUSION
Dimension reduction is a significant technique for discovering the underlying structure of the data manifold. Isomap is one of the best reduction algorithms, however, this algorithm has a drawback which is related to topology instability. This instability arises when the manifold surfaces are close to each other at some points which introduce bridges and lead to drastically different (and incorrect) low-dimensional embedding. We proposed an algorithm based on the subspace and orthogonal complementary subspace to solve the instability problem. In our algorithm, the neighbors are chosen based on both their distance and angle with the orthogonal complementary space. The proposed algorithm is tested with both synthesized datasets and high dimensional real datasets. The results show the improved stability of our approach over the standard Isomap and its lower residual variance. Moreover, we investigated using the new algorithm to find the nearest neighbors for the LLE algorithm. We tested the LLE variant using synthesized datasets and high dimensional real datasets and the results were promising.