Fuzzy Granule Manifold Alignment Preserving Local Topology

Granular computing has the advantage of discovering complex knowledge in data, and manifold alignment has proven valuable in many areas of machine learning. We propose a novel fuzzy granule manifold alignment (FGMA) algorithm, in which we define new operations, measurements, and a local topology for fuzzy granular vectors in fuzzy granular space. Unlike the semi-supervised and Procrustes alignment algorithms, FGMA does not require a predetermined correspondence. A projection is learned that maps instances described by two types of features to a common low-dimensional space, while the local topology of the fuzzy granular vector induced by each instance is preserved and matched within each set in the lower-dimensional space. This makes it possible to compare data instances from different spaces directly. We convert an alignment problem of data in feature space into a fuzzy granule manifold alignment problem in granular space. Specifically, we first define fuzzy granules, fuzzy granular vectors, and their operations and measurements in fuzzy granular space, and prove the associated theorems and deductions. Next, the local topology around a fuzzy granular vector is introduced, and the optimal local topology matching is achieved by minimizing the Frobenius norm of the difference. Finally, the two manifolds are connected and the optimal mapping is calculated to obtain a dimensionality reduction of the joint structure, from which the correspondence between data instances is obtained. We verified the algorithm on the Oxford image dataset and an Alzheimer's disease voice dataset. Theoretical analysis and experiments demonstrate that the proposed algorithm is robust and effective.


I. INTRODUCTION
(The associate editor coordinating the review of this manuscript and approving it for publication was Bilal Alatas.)
The term manifold learning was first proposed by Bregler and Omohundro in 1995 [1], [2]. It refers to recovering the underlying low-dimensional manifold structure of high-dimensional data while building a low-dimensional description of the data. This low-dimensional description reflects the intrinsic variables that control the changes in the distribution of manifold data, that is, the minimum independent variables required to describe and perceive the changes in the distribution of data points on the manifold. In a broad sense, as long
as it is based on the manifold distribution assumption of the dataset, any feature learning method whose purpose is to learn the internal rules and structural characteristics of the dataset can be regarded as belonging to the category of manifold learning. The difference between various manifold learning methods lies only in how they collect neighborhood information and the structural characteristics of the neighborhood. Manifold learning can be classified into global and local methods [3]. The global method maintains the distance relationships of all data instance points at various scales from the perspective of the global geometry of the dataset's distribution structure. That is, when constructing a low-dimensional embedded representation, points that are close on the manifold must be projected to neighboring points in the lower-dimensional space. Meanwhile, data points that are distributed farther apart are mapped into the lower-dimensional space so that their long-distance distribution relationship is also maintained. Representative algorithms include ISOMAP, Maximum Variance Unfolding (MVU), and Diffusion Maps (DM). This idea based on global geometric structure is simple and easy to understand, and can give an accurate description of the global distribution structure of data. However, this type of method generally requires the subset of low-dimensional space to be convex; otherwise the geometric distance between the data points cannot be accurately described. In addition, because the distance between every pair of data points needs to be calculated, the algorithm is computationally expensive. The local method is very different from the global one.
It uses the perspective of local geometry to ensure that data points that are close within a local neighborhood obtain similar projection positions in the low-dimensional space. This type of method constructs a low-dimensional description by maintaining the change of the distribution structure of the manifold data in the local neighborhood. Although it cannot accurately describe the overall structure of the data, its computational cost is low. Moreover, for complex manifold structures, there is no correct prior knowledge to serve as guidance, so the calculated distances between distant points are often not guaranteed to be precise. Therefore, local methods are more widely employed in practice. Representative algorithms include local tangent space alignment (LTSA), Laplacian eigenmaps (LE), and locally linear embedding (LLE).

II. RELATED WORK
Three papers published in Science in 2000 brought manifold learning to a new stage of research [4]-[6]. These papers not only demonstrated the biological basis of the existence of perceptual manifolds, but also proposed the isometric feature mapping algorithm and the locally linear embedding algorithm, with good results. The ISOMAP method maintains the geodesic distance between data points and has been successfully used in biomedical data visualization [7], [8]. In the LLE method, data points are represented as linear combinations of their neighbors, so local properties of the data are maintained during dimensionality reduction; the method has been successfully applied to non-convex manifolds, although the artificial biomedical database could not be visualized [9]. Another algorithm that has attracted attention, Laplacian eigenmaps (LE), considers the distance between neighbors and obtains a low-dimensional data representation by maintaining the local properties on the manifold [9], [10]. Although the algorithms mentioned above have excellent performance in dimensionality reduction, they do not yield an explicit mapping; when new data arrive, features cannot be extracted quickly and effectively.
To solve this problem, the Locality Preserving Projections (LPP) algorithm [11]-[13], the Neighborhood Preserving Embedding (NPE) algorithm [14], [15], and the Orthogonal Neighborhood Preserving Projections (ONPP) algorithm [16] were successively proposed. These algorithms share a similar idea. First, they establish a neighbor graph over the instance points of the original dataset and construct a suitable relationship matrix to characterize the similarity of each pair of points inside the neighbor graph. Then they retain the neighbor relationships of the original dataset when reducing the dimension. Finally, a linear embedding map in explicit form is obtained through the optimization criterion. This type of method can map new points directly into a lower-dimensional space by the linear embedding mapping. However, these methods do not use the category information in the dataset, so the separability between the projected categories is not prominent. Hence they are unfit for the classification and recognition of data, but suitable for dimensionality reduction or clustering [17]. In 2018, Peng and his colleagues proposed a novel subspace clustering approach by introducing a new deep model, the Structured AutoEncoder (StructAE), to handle realistic data without the linear subspace structure [18]. Next, they proposed a novel objective function to project raw data into a space in which the projection embraces geometric consistency (GC) and cluster assignment consistency (CAC) [19]. Furthermore, they proposed a novel clustering method that minimizes the discrepancy between pairwise instance assignments for each data point [20]. These studies are highly relevant to instance alignment. Granular computing has the advantage of discovering complex knowledge in data; if manifold alignment is combined with granular computing, a new algorithm can be designed to enhance performance.
The idea of granular computing is derived from the concept of fuzzy information granulation proposed by Zadeh [21]. He held that there are three main characteristics of human cognition, namely granulation (breaking wholes into parts), organization (combining parts into wholes), and causation [22]. Hobbs presented the concept of granularity in 1985 [23]. Then, Lin presented the concept of granular computing on the basis of binary relations in 1998 [24]. Yager et al. also discussed the importance of granular computing in intelligent engineering in the same year [25]. In 2000, Yao gave the construction and calculation of granules in granular computing in detail [26]. In 2002, he studied the problem of information granulation and approximate concept structure in rough set theory [27]. After that, Yao concentrated on the basic concepts of granular computing and gave a granular computing model via region division using set theory [28]; analyzed the past, present, and future development of granular computing and presented the triadic theory [29]; and discussed in depth the connection between cognitive science and granular computing, constructing a granular computing framework for cognitive concept learning [30]. The rough outline of granular computing thus gradually formed. Pedrycz et al. published the book ''Handbook of Granular Computing'', which aims to provide guidance and assistance in granular computing related to computational intelligence [31]. Ling Zhang and Bo Zhang published a book on problem solving in quotient space, which elaborated a mathematical model of hierarchical multi-granule computing based on quotient spaces [32]. Duoqian Miao, Deyi Li, Yiyu Yao et al. introduced the research on the uncertainty of granular computing and its cross-over studies in [33].
Wang and his colleagues analyzed and summarized the uncertainty problems in granular computing models [34]. Liang et al. combined the concept of the information granule with entropy theory and established the complementary relationship between them [35]. Yao et al. gave a new way of thinking about the prospects and challenges of future granular computing in [36]. Wu et al. established a rough approximation model of the order granular structure in order information systems with multi-granularity marking [37]. For the three core issues of information granulation, information granularity, and granular operation in granular computing, Qian et al. established a unified set-based representation framework for granular computing [38]. Hu et al. realized attribute reduction for mixed data by constructing a new information granulation method on fuzzy rough sets [39]. Xu et al. established a formal conceptual system describing human cognitive processes and gave definitions and algorithms for the granular transformation of related information [40]. Herbert and Yao regarded the growth and absorption of neurons as the construction and decomposition of granules, and proposed a hierarchical self-organizing-map granular computing framework for neural networks [41]. Jankowski et al. used the interactive operations between complex information granules in the rough set model to present an approach for dealing with uncertain problems in complex systems [42]. Dubois and Prade discussed the common features of extended fuzzy sets from the two aspects of similarity relationships and formal concepts [43]. Peters and Weber summarized existing dynamic granule clustering algorithms and proposed a unified framework for them [44]. Li et al. proposed a formal concept learning method on the basis of granular computing from the angle of cognitive computing [45]. Xu et al. presented a granular computing method that uses information granules to describe machine learning for fuzzy datasets [46]. Chiaselotti et al.
connected the automorphism of graphs with the indistinguishability relationship and introduced the relationship between simple graphs based on adjacency matrices and granular computing [47]. Salehi et al. conducted a systematic classification study of granular computing research from the four perspectives of key areas, contribution types, research types, and research frameworks; they found that research on clustering analysis for information granules was rare, and compared the information granulation effect of five common clustering algorithms [48]. Al-Hmouz et al. introduced a granular computing framework based on time series description and prediction [49]. Wang and his colleagues surveyed existing granular computing research on the optimization and conversion of granularity and on multi-granularity joint problem solving, and presented a graph of the relationships among the three primary modes of granular computing [50]. Xu and his colleagues presented a local-density-based optimal granulation model [51]. Granular computing has achieved many results in theory, methods, and applications, but there are still many key problems to be solved in the analysis and processing of big data, such as the abstract description of multi-level granular space and knowledge acquisition technology, the establishment and optimization of multi-granularity models, the interpretation and construction of the multi-level granularity spatial structure, and the conversion mechanism and methods between granular layers and granularities. The three-way decision proposed by Yiyu Yao is one of the promising approaches to these problems. In addition, Li et al. presented a fastest robust path optimization algorithm that forms the best routing by combining the effects of traffic events [52]. Li and his colleagues adopted granular computing to design boosted k-nearest-neighbor classifiers [53] and a searchable voice encryption scheme [54].
Mencar and Pedrycz proposed the definition of granular counting, which is realized in the presence of uncertain data modeled through possibility distributions [55]; this also offers a new angle for studying uncertain data.

III. CONTRIBUTIONS
Our contributions are as follows:
• We convert an alignment problem of instances in feature space into a fuzzy granule manifold alignment problem in fuzzy granular space. This method incorporates the idea of granular computing and solves the problem at different levels: datasets can be observed and processed hierarchically at different granularities, which reduces the complexity of the alignment problem.
• We propose the notion of the fuzzy granular vector by defining new operations and metrics, and design the fuzzy granule manifold alignment algorithm based on them. Moreover, this algorithm can align instances without a predetermined correspondence, which is required by the semi-supervised and Procrustes algorithms.

IV. THE PROBLEM
As defined in Table 1, let P be a set of instances sampled from manifold P, and let Q be a set of instances collected from manifold Q. Our aim is to learn mappings α and β to a new space Z, while preserving the neighborhood relationships inside P and inside Q in the new space Z. Suppose that the local topologies of p_i ∈ P and q_j ∈ Q can be well matched in the original spaces; then p_i and q_j will also be matched in the new space Z. Our goal is to align the instances in the datasets P and Q. Since the instances in these two datasets are represented by features of different dimensionality, it is very difficult to compare p_i ∈ P and q_j ∈ Q directly. To establish the relationship between them, we use the fuzzy granular vector constructed from p_i together with the fuzzy granular vectors induced by its neighbors to jointly characterize the local topology of p_i. The local topology of q_j is expressed in the same way. p_i and q_j can then be compared directly through the relations denoted by local topology instead of through features. However, p_i might be similar to many instances in Q, so finding which instance in Q is truly similar is very difficult. In fact, applications often admit many candidate matches. Interestingly, discovering the true match may be more difficult than solving the coupled version of the original problem, because the structures of the two manifolds must be preserved during the matching process; this preservation helps us filter out false positive matches. In our method, we first establish all likely matches for every instance based on the local topology mentioned above. The instance alignment problem is then transformed into a constrained embedding problem, which can be solved using generalized eigenvalue decomposition. The algorithm flow is shown in Figure 1. Since p_i ∈ P and q_j ∈ Q come from different manifolds, they cannot be compared directly.
The proposed algorithm learns the mappings α and β, which map the instances from P and Q, respectively, into one common space. In this way, samples from different manifolds with similar local topology can be projected to similar locations while the manifold structure is preserved. In this process, each instance in P and Q is first fuzzily granulated to construct a fuzzy granular vector. We then analyze the local topology of each fuzzy granular vector, put the fuzzy granular vectors and their local topologies from P and Q into the joint structure, and find the best local topology match so as to obtain the mappings α and β. Thus, α^T p_i and β^T q_j lie in the same space and have the same dimensionality; they can be compared directly and serve as the evaluation of the similarity between p_i and q_j.
Each instance q_j ∈ Q is represented by its feature values q_j = (f_e1(q_j), . . . , f_es(q_j))^T, and Q is an s × n matrix. The aim is to discover the matched point pairs between P and Q. We first fuzzily granulate each point in P and Q. Then we can match the points according to the local topology of the fuzzy granular vectors they induce in the fuzzy granular space. In this way the points from P and Q can be aligned, i.e., the matching point pairs can be found.

A. CONVERTING DATA INTO FUZZY GRANULES
In general, the granularity of human reasoning and concept building is fuzzy. Fuzzy information granulation is generally achieved by a fuzzy binary relation defined in the fuzzy granular space. Each atomic attribute is fuzzily granulated, and then fuzzy granular vectors can be constructed. For ∀p_i, p_j ∈ P and ∀r ∈ R, the distance on attribute r between p_i and p_j can be denoted as

s_r(p_i, p_j) = |f_r(p_i) − f_r(p_j)|,

where s_r(p_i, p_j) ∈ [0, 1], and f_r(p_i) and f_r(p_j) express the normalized values of p_i and p_j on the attribute r, respectively. According to this distance definition, a fuzzy granule can be constructed. For ∀p_i ∈ P, ∀r ∈ R, the fuzzy granule of instance p_i on attribute r can be denoted as

N_r(p_i) = s_r(p_i, p_1)−p_1 + s_r(p_i, p_2)−p_2 + · · · + s_r(p_i, p_n)−p_n.

Here N_r(p_i) is the fuzzy granule induced by p_i (which can also be seen as the fuzzy neighborhood of p_i), ''+'' represents the union operator, and ''−'' denotes the separator. This fuzzy granule can also be seen as a set composed of the distance pairs between instances. By accumulating the elements inside the fuzzy granule, the cardinality of the fuzzy granule is denoted as

|N_r(p)| = Σ_{j=1..n} s_r(p, p_j).

We have 0 ≤ |N_r(p)| ≤ |P|, where |P| represents the number of instances in the set P. Since an attribute induces one fuzzy granule per instance, an attribute set induces one fuzzy granular vector per instance; in other words, a fuzzy granular vector consists of several fuzzy granules. We define it as follows: for ∀p ∈ P, ∀A ⊆ R, namely A = {r_1, r_2, . . . , r_k} (k ≤ |R|), the fuzzy granular vector of p on the attribute subset A is

N̂_A(p) = N_{r_1}(p)−r_1 + N_{r_2}(p)−r_2 + · · · + N_{r_k}(p)−r_k.

Similarly, here ''+'' denotes the union operator and ''−'' represents the separator. Furthermore, the module of the fuzzy granular vector induced by p on attribute subset A can be denoted as

|N̂_A(p)| = Σ_{j=1..k} |N_{r_j}(p)|.

Given the definitions above, operators and metrics between fuzzy granules can be denoted as follows.
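The granulation step can be sketched in code. This is a minimal illustration, assuming the attribute-wise relation is the absolute difference of min-max normalized feature values and that a granule is stored as the vector of its memberships over P; the function names (`fuzzy_granule`, `granular_vector`) are illustrative, not from the original.

```python
import numpy as np

def fuzzy_granule(P, i, r):
    """Fuzzy granule of instance i on attribute r: the vector of
    attribute-wise relations s_r(p_i, p_j) = |f_r(p_i) - f_r(p_j)|
    (an assumed form; features are normalized to [0, 1])."""
    col = P[:, r]
    return np.abs(col[i] - col)          # one membership per instance in P

def granule_cardinality(granule):
    """|N_r(p)|: accumulate the elements inside the granule."""
    return granule.sum()

def granular_vector(P, i, attrs):
    """Fuzzy granular vector of instance i on attribute subset A,
    stored as a (k, n) array of k granules."""
    return np.stack([fuzzy_granule(P, i, r) for r in attrs])

# Toy instance set: 4 instances, 3 normalized attributes.
P = np.array([[0.0, 0.5, 1.0],
              [0.2, 0.5, 0.8],
              [1.0, 0.0, 0.0],
              [0.4, 0.9, 0.6]])
g = fuzzy_granule(P, 0, 0)
v = granular_vector(P, 0, [0, 1, 2])
print(g)                        # [0.  0.2 1.  0.4]
print(granule_cardinality(g))   # 1.6  (between 0 and |P| = 4)
print(v.shape)                  # (3, 4)
```

Note that the cardinality stays within the bound 0 ≤ |N_r(p)| ≤ |P| stated in the text, since each membership lies in [0, 1].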
Let N_r(p) and N_r(v) be the fuzzy granules of instances p and v on attribute r, respectively. Four operators ∩, ∪, ∼, and ⊕ can be defined membership-wise as follows: for each p_j ∈ P,

N_r(p) ∩ N_r(v) has membership min(s_r(p, p_j), s_r(v, p_j));
N_r(p) ∪ N_r(v) has membership max(s_r(p, p_j), s_r(v, p_j));
∼N_r(p) has membership 1 − s_r(p, p_j);
N_r(p) ⊕ N_r(v) has membership |s_r(p, p_j) − s_r(v, p_j)|.

Next, operators and metrics of fuzzy granular vectors can be designed on this basis. For ∀p, v ∈ P, consider two fuzzy granular vectors N̂_A(p) and N̂_A(v) on A ⊆ R, r_i ∈ A, i = 1, 2, . . . , k; the operators above are applied granule by granule, with ''+'' representing union and ''−'' denoting the separator. Based on these definitions, the distance between fuzzy granular vectors can be described as

dist(N̂_A(p), N̂_A(v)) = (1/k) Σ_{i=1..k} |N_{r_i}(p) ⊕ N_{r_i}(v)| / |N_{r_i}(p) ∪ N_{r_i}(v)|.

Below we give two theorems about the distance and the monotonicity of fuzzy granular vectors.

Theorem 1: For ∀p, v ∈ P, the distance of fuzzy granular vectors satisfies 0 ≤ dist(N̂_A(p), N̂_A(v)) ≤ 1.

Proof: For ∀r ∈ R and every p_j, we have |s_r(p, p_j) − s_r(v, p_j)| ≤ max(s_r(p, p_j), s_r(v, p_j)), so 0 ≤ |N_r(p) ⊕ N_r(v)| / |N_r(p) ∪ N_r(v)| ≤ 1; averaging the k ratios over the attributes in A gives the bound.
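The operators and the granular-vector distance can be sketched numerically. This assumes the membership-wise min/max/complement/absolute-difference forms and the averaged-ratio distance described above; all function names are illustrative.

```python
import numpy as np

def g_union(g1, g2):        # N_r(p) ∪ N_r(v): membership-wise max
    return np.maximum(g1, g2)

def g_intersect(g1, g2):    # N_r(p) ∩ N_r(v): membership-wise min
    return np.minimum(g1, g2)

def g_complement(g):        # ~N_r(p): membership-wise complement
    return 1.0 - g

def g_symdiff(g1, g2):      # N_r(p) (+) N_r(v): membership-wise |difference|
    return np.abs(g1 - g2)

def granular_distance(V1, V2):
    """Distance between two fuzzy granular vectors, each stored as a
    (k, n) array of k granules over n instances: the mean over
    attributes of |g1 (+) g2| / |g1 ∪ g2|."""
    ratios = [g_symdiff(a, b).sum() / max(g_union(a, b).sum(), 1e-12)
              for a, b in zip(V1, V2)]
    return float(np.mean(ratios))

rng = np.random.default_rng(0)
V1, V2 = rng.random((3, 5)), rng.random((3, 5))
d = granular_distance(V1, V2)
assert 0.0 <= d <= 1.0              # the bound claimed in Theorem 1
assert granular_distance(V1, V1) == 0.0   # identical vectors have distance 0
print(round(d, 3))
```

Because |a − b| ≤ max(a, b) for non-negative memberships, each per-attribute ratio lies in [0, 1], which is exactly the bound the theorem asserts for the averaged distance.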
Theorem 2 (Monotonicity): For ∀p ∈ P, let the attribute subsets A and F satisfy A ⊆ F ⊆ R, and let N̂_A(p) and N̂_F(p) be the fuzzy granular vectors of p on A and F, respectively. Then |N̂_A(p)| ≤ |N̂_F(p)|.

Proof: For ∀r ∈ A, its fuzzy granule is N_r(p). Since A ⊆ F, we have r ∈ F, so its fuzzy granule satisfies N_r(p) ∈ N̂_F(p) and |A| ≤ |F|. Hence Σ_{r∈A} |N_r(p)| ≤ Σ_{r∈F} |N_r(p)|, that is, |N̂_A(p)| ≤ |N̂_F(p)| is established.

The distance between fuzzy granular vectors is regarded as the metric of their similarity: the smaller the distance, the more similar the fuzzy granular vectors are; the bigger the distance, the less similar they are. We now give an example.
Example 1: As shown in Table 2, given an instance set P = {p_1, p_2, p_3, p_4} and an attribute set R = {r_1, r_2, r_3}, the granulation process is as follows.

B. LOCAL TOPOLOGY OF FUZZY GRANULAR VECTOR
Let L_{p_i} be a (k + 1) × (k + 1) matrix denoting the local topology of the fuzzy granular vector induced by p_i. Its element L_{p_i}(a, b) = dist(N̂_A(x_a), N̂_A(x_b)) denotes the distance between the fuzzy granular vectors of x_a and x_b, where {x_1 = p_i, x_2, . . . , x_{k+1}} comprise p_i and the k nearest neighbors of its fuzzy granular vector. Similarly, L_{q_j} is also a (k + 1) × (k + 1) matrix denoting the local topology of the fuzzy granular vector induced by q_j. Let M be an m × n matrix, where M_{ij} is the distance between L_{p_i} and L_{q_j}. As we know, there are k! permutations of the k nearest neighbors of the fuzzy granular vector of p_i; in other words, L_{p_i} has k! variants. Let {L_{p_i}}_h represent its h-th variant. Similarly, L_{q_j} also has k! variants, and {L_{q_j}}_h denotes its h-th variant. For ∀M_{ij} ∈ M_{m×n}, calculating M_{ij} is equivalent to calculating the distance between L_{p_i} and L_{q_j}, dist(L_{p_i}, L_{q_j}), which is obtained as the minimum Frobenius-norm discrepancy over all variants (|| · ||_F represents the Frobenius norm). We give the theorem and proof as follows.
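The construction of the local topology matrix and its permutation matching can be sketched as follows. The sketch uses plain Euclidean distance as a stand-in for the granular-vector distance, omits the Theorem-3 rescaling, and uses illustrative names (`local_topology`, `topo_distance`).

```python
import itertools
import numpy as np

def local_topology(X, idx, k, dist_fn):
    """(k+1)x(k+1) pairwise-distance matrix over instance idx and its
    k nearest neighbors (neighbors found with the same dist_fn)."""
    d = np.array([dist_fn(X[idx], X[j]) for j in range(len(X))])
    order = np.argsort(d)
    pts = [idx] + [j for j in order if j != idx][:k]
    return np.array([[dist_fn(X[a], X[b]) for b in pts] for a in pts])

def topo_distance(L1, L2):
    """min over the k! neighbor permutations h of ||L1 - {L2}_h||_F.
    Row/column 0 (the center instance) stays fixed."""
    k = L1.shape[0] - 1
    best = np.inf
    for perm in itertools.permutations(range(1, k + 1)):
        p = (0,) + perm
        L2h = L2[np.ix_(p, p)]           # h-th variant of L2
        best = min(best, np.linalg.norm(L1 - L2h, 'fro'))
    return best

euclid = lambda a, b: float(np.linalg.norm(a - b))
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
L = local_topology(X, 0, 2, euclid)
print(L.shape)                 # (3, 3)
print(topo_distance(L, L))     # 0.0: a topology matches itself exactly
```

Enumerating all k! permutations is feasible here precisely because, as the text notes, k is a very small positive integer.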
Theorem 3: Given two (k + 1) × (k + 1) distance matrices L_1 and L_2, π_2 = trace(L_2^T L_1)/trace(L_2^T L_2) can be obtained by minimizing ||L_1 − π_2 L_2||_F. Symmetrically, π_1 = trace(L_1^T L_2)/trace(L_1^T L_1) can be obtained by minimizing ||L_2 − π_1 L_1||_F.

Proof: The aim is to find π_2 to minimize ||L_1 − π_2 L_2||_F, where || · ||_F expresses the Frobenius norm. This is equivalent to solving π_2 = argmin_{π_2} ||L_1 − π_2 L_2||_F. It is easy to verify that ||L_1 − π_2 L_2||_F^2 = trace(L_1^T L_1) − 2π_2 trace(L_2^T L_1) + π_2^2 trace(L_2^T L_2) according to the definition of the Frobenius norm. Since trace(L_1^T L_1) is constant, the problem is equivalent to solving π_2 = argmin_{π_2} {π_2^2 trace(L_2^T L_2) − 2π_2 trace(L_2^T L_1)}. Differentiating with respect to π_2 and setting the derivative to zero implies 2π_2 trace(L_2^T L_2) = 2 trace(L_2^T L_1), and further π_2 = trace(L_2^T L_1)/trace(L_2^T L_2). Similarly, we can find π_1 = trace(L_1^T L_2)/trace(L_1^T L_1).

To calculate the matrix M, all pairs of matrices L need to be compared. When computing L_{p_i} and L_{q_j}, we assume that p_i and q_j are matched. However, it is unknown which of the k nearest neighbors of the fuzzy granular vector of p_i matches which neighbor of that of q_j. To find the optimum matching, k! permutations need to be considered. This is feasible, because the local pattern and k are very small positive integers. As demonstrated in Theorem 3, the two local patterns are optimally rescaled to match each other. In other words, we consider all possible matchings between two local patterns; that is, we compute dist(L_{p_i}, L_{q_j}) and return the distance calculated from the best possible match.
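The closed-form rescaling of Theorem 3 can be checked numerically: the sketch below computes π_2 = trace(L_2^T L_1)/trace(L_2^T L_2) and verifies against a brute-force grid that no other scale gives a smaller Frobenius residual (the matrices are random stand-ins for the local topology matrices).

```python
import numpy as np

def optimal_scale(L1, L2):
    """pi_2 = trace(L2^T L1) / trace(L2^T L2), the minimizer of
    ||L1 - pi_2 L2||_F from Theorem 3."""
    return np.trace(L2.T @ L1) / np.trace(L2.T @ L2)

rng = np.random.default_rng(1)
L1, L2 = rng.random((4, 4)), rng.random((4, 4))
pi2 = optimal_scale(L1, L2)

# Brute-force comparison over a grid of candidate scales.
grid = np.linspace(pi2 - 1.0, pi2 + 1.0, 2001)
errs = [np.linalg.norm(L1 - s * L2, 'fro') for s in grid]
assert np.linalg.norm(L1 - pi2 * L2, 'fro') <= min(errs) + 1e-9

# First-order condition: the derivative vanishes at pi2, i.e.
# trace(L2^T (L1 - pi2 L2)) = 0.
assert abs(np.trace(L2.T @ (L1 - pi2 * L2))) < 1e-9
print(round(pi2, 4))
```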

C. MANIFOLD ALIGNMENT WITHOUT PREDETERMINING CORRESPONDENCE
In this section, to find the best matching between P and Q, a loss function LS(α, β) is constructed by fusing their local topologies. By minimizing LS(α, β), the best mappings α and β can be found. This computing process is equivalent to solving a generalized eigenvalue problem. We first give some notation as follows.
Let M_p^{ij} denote the distance between p_i and p_j, let D_p be a diagonal matrix with D_p^{ii} = Σ_j M_p^{ij}, and let V_p = D_p − M_p. Similarly, M_q^{ij} denotes the distance between q_i and q_j, D_q is also a diagonal matrix with D_q^{ii} = Σ_j M_q^{ij}, and V_q = D_q − M_q. We use matrix multiplication to map the vector p_i to a scalar α^T p_i (α is a t × 1 matrix); similarly, β^T q_j is a scalar (β is an s × 1 matrix). Let ξ = (α^T, β^T)^T. LS(α, β) is a loss function, where µ (> 0) is the weight of its first term.
The aim is to minimize the loss function

LS(α, β) = µ Σ_{i,j} (α^T p_i − β^T q_j)^2 M^{ij} + µ_1 Σ_{i,j} (α^T p_i − α^T p_j)^2 M_p^{ij} + µ_2 Σ_{i,j} (β^T q_i − β^T q_j)^2 M_q^{ij},

where the weights satisfy µ_1 + µ_2 = 1. The first term of LS(α, β) penalizes the difference between P and Q on the matched local topology in fuzzy granular space: the larger the weight M^{ij}, the more strongly α^T p_i and β^T q_j are pulled together in the new space, while pairs with a small weight are left nearly unconstrained. The second and third terms preserve the similarity of the local topology inside P and inside Q, respectively. The optimal solution of this loss function LS(α, β) is provided in Theorem 4.

Theorem 4: Minimizing the loss function LS(α, β) is equivalent to calculating the minimum eigenvectors of the generalized eigenvalue decomposition ZVZ^T ξ = λZDZ^T ξ, which yields the optimal mappings to align P and Q.
Proof: It can be verified that LS(α, β) = ξ^T ZVZ^T ξ. The two manifolds can be aligned and their embedded structure discovered by joining the two graphs through the matrix V. To eliminate an arbitrary proportional scaling in the embedding, α^T P D_p P^T α + β^T Q D_q Q^T β = ξ^T ZDZ^T ξ = 1 needs to be added as a constraint condition. The matrices D_p and D_q give a measure on the vertices of the graph, where the vertices can be seen as instances; if the value D_p^{ii} or D_q^{ii} is small, it implies that p_i or q_i is less essential. Without the constraint condition, all instances projected into the new space might be indistinguishable. Finally, the optimization problem can be converted into solving

min_ξ ξ^T ZVZ^T ξ  subject to  ξ^T ZDZ^T ξ = 1,

whose solution is given by the minimum eigenvectors of ZVZ^T ξ = λZDZ^T ξ. Then α^T p_i and β^T q_j have the same dimensionality and can be compared directly.
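The final step, minimizing ξ^T A ξ subject to ξ^T B ξ = 1 via the generalized eigenproblem A ξ = λ B ξ, can be sketched with numpy alone. Here A and B are small toy symmetric matrices playing the roles of ZVZ^T and ZDZ^T; Cholesky whitening reduces the problem to a standard symmetric eigenproblem.

```python
import numpy as np

def min_generalized_eigvec(A, B):
    """Smallest generalized eigenpair of A xi = lambda B xi, computed
    by Cholesky whitening (B must be symmetric positive definite):
    with B = L L^T and u = L^T xi, solve L^{-1} A L^{-T} u = lambda u."""
    Lc = np.linalg.cholesky(B)
    Linv = np.linalg.inv(Lc)
    C = Linv @ A @ Linv.T            # standard symmetric problem
    w, U = np.linalg.eigh(C)         # eigenvalues in ascending order
    xi = Linv.T @ U[:, 0]            # map back: xi = L^{-T} u
    return w[0], xi

rng = np.random.default_rng(2)
R = rng.random((4, 4))
A = R @ R.T                                   # symmetric PSD, plays Z V Z^T
B = np.eye(4) + 0.1 * np.diag(rng.random(4))  # SPD, plays Z D Z^T
lam, xi = min_generalized_eigvec(A, B)

assert np.allclose(A @ xi, lam * B @ xi, atol=1e-8)   # generalized eigenpair
assert abs(xi @ B @ xi - 1.0) < 1e-8                  # constraint xi^T B xi = 1
```

Since `eigh` returns unit eigenvectors u, the mapped-back ξ automatically satisfies ξ^T B ξ = u^T u = 1, which is exactly the scaling constraint in the proof.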
The algorithm is mainly divided into three phases: instance granulation, local topology construction, and eigenvector solving, as shown in Table 3.

VI. EXPERIMENTAL ANALYSIS
In this section, we adopted recall and precision to evaluate the performance of the algorithm. Here recall = TP/(TP + FN) and precision = TP/(TP + FP), where TP represents true positives, FN denotes false negatives, and FP is false positives.
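These two metrics are straightforward to compute; a small helper makes the definitions concrete:

```python
# Evaluation metrics used in this section:
# recall = TP/(TP+FN), precision = TP/(TP+FP).

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)   # 0.9 0.75
```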
We employed the Oxford 5K image dataset and an Alzheimer's disease voice dataset to verify the performance of the algorithm. Oxford 5K is a widely used public standard dataset released by Oxford. It consists of 5062 landmark building images collected from Flickr. There are 11 categories, each category has 5 query images, and there are 55 query images in total. The dataset contains many types of image changes, such as translation, rotation, and perspective transformation. For all datasets we adopted the standard evaluation protocols and reported average precision and average recall. Following standard practice, we resized the images and extracted crops of 256 × 256 pixels. In this paper, we used Histogram of Oriented Gradients (HOG) [56] and Speeded Up Robust Features (SURF) [57] as image features. In the image processing cases, P and Q represent images; we need to match point pairs in P and Q, and p_i and q_j are points in P and Q, respectively. R, A, and E denote feature (attribute) sets, and r_i, a_j, and e_k express feature (attribute) values, respectively. The HOG feature is obtained by computing and counting gradient direction histograms over local areas inside an image. Since the shape and appearance of a local area are well embodied by the gradient directions, we adopted the HOG feature descriptor to characterize the local details around an image point. The image was split into several blocks; the center of each 16 × 16 block was the focus point. Each block was evenly divided into 4 cells, and the gradient amplitudes and directions of these cells were computed. The gradient direction range was evenly divided into 8 directions, and the gradient amplitudes of the pixels in each cell falling in the same gradient direction range were accumulated to obtain an 8-dimensional cell feature vector. The cell vectors were concatenated to construct a 32-dimensional vector, which was used as the descriptor of the focus point.
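The 32-dimensional block descriptor described above can be sketched as follows. This is a simplified illustration of the idea (a 16 × 16 block, four 8 × 8 cells, an 8-bin orientation histogram per cell, weighted by gradient magnitude); real HOG implementations additionally apply Gaussian weighting, bin interpolation, and normalization.

```python
import numpy as np

def block_descriptor(block):
    """32-dim descriptor of a 16x16 block: 4 cells x 8 orientation bins,
    each bin accumulating the gradient magnitudes of its pixels."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)       # unsigned orientation
    desc = []
    for r in (slice(0, 8), slice(8, 16)):
        for c in (slice(0, 8), slice(8, 16)):     # the four 8x8 cells
            bins = np.floor(ang[r, c] / (np.pi / 8)).astype(int) % 8
            hist = np.bincount(bins.ravel(),
                               weights=mag[r, c].ravel(), minlength=8)
            desc.append(hist)
    return np.concatenate(desc)                   # 4 cells x 8 bins = 32

block = np.random.default_rng(3).random((16, 16))
d = block_descriptor(block)
print(d.shape)   # (32,)
```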
SURF is a widely used target feature extraction and matching algorithm. It is efficient and stable and can be employed in scenes that require real-time operation, such as target recognition and tracking. The basic idea of the SURF algorithm is derived from the scale-invariant feature transform (SIFT) algorithm [58], but it adopts fast approximations in keypoint search, neighborhood feature description, and descriptor matching, which makes its execution efficiency and stability better than those of the SIFT algorithm. For example, in keypoint search, the SURF algorithm uses a square filter instead of the Gaussian filter adopted in the SIFT algorithm, and with the help of the integral image, the convolution of the image with the Gaussian differential template is converted into additions and subtractions on the integral image, so its computing speed is improved. HAAR features and integral images are used to decrease computational complexity. Similarly to SIFT, a 64-dimensional feature vector was obtained by calculating horizontal and vertical HAAR wavelet responses in a certain area around each feature point. Figure 2 compares the performance of FGMA, semi-supervised manifold alignment (SSMA) [59], manifold alignment using Procrustes analysis (MAPA) [60], semi-supervised manifold alignment with multi-graph embedding (SSMAME) [61], and a method only considering local geometric alignment (OCLGA). The performance refers to average precision and average recall in the experiments.
The left side of Figure 2 shows the average precision. Its abscissa represents the number of local neighbors of the fuzzy granular vector, and its ordinate denotes the average precision. When k = 17, FGMA reached a peak of 0.912, while SSMAME, SSMA, MAPA, and OCLGA achieved 0.903, 0.881, 0.880, and 0.868, respectively; average precision increased by 1.00%, 3.52%, 3.64%, and 5.07%, respectively. The average precision curves of these methods show a shape with a high middle and low sides, which indicates that performance is better when the number of local neighbors of the fuzzy granular vector is between 10 and 20; whether k is too large or too small, performance may decline. Overall, FGMA outperforms the other four methods. The graph on the right of Figure 2 shows the average recall. From the curve shapes, FGMA envelopes OCLGA, and OCLGA envelopes SSMA, which illustrates that in terms of average recall FGMA outperformed OCLGA and OCLGA was superior to SSMA. At the peak of the curves, when k = 17, FGMA achieved the maximum value of 0.916, while SSMAME, SSMA, MAPA, and OCLGA reached their maximum values of 0.896, 0.883, 0.878, and 0.869, respectively (i.e., 2.23%, 3.74%, 4.33%, and 5.41% improvement, respectively). Similarly, the average recall curves also show a shape with a high middle and low sides. If the parameter k is too large, much more noise is imported, so performance may fall; if k is too small, the local topology is not embedded in the matching process as a constraint condition. Figure 3 shows the result of using fuzzy granule manifold alignment, where HOG and SURF were employed to extract the raw features of the images.
Besides the image dataset, we also adopted voice data to verify the performance of the algorithm. The Alzheimer's disease dataset came from the University of Pittsburgh; the data was stored in the form of speech and text from participants including elderly controls, people with possible Alzheimer's disease, and people with other dementia diagnoses. There are 1263 instances in the corpus. Here, P and Q represent two voices, and point pairs in P and Q need to be compared; p_i and q_j denote points in P and Q, respectively. R, A, and E denote feature (MFCC and GFCC) sets, and r_i, a_j, and e_k are feature values, respectively. Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) are highly robust, so these two methods were used to extract speech features. First, the voice instances were resampled to 22050 Hz and normalized. A window with a frame length of 512 (about 23 ms) and a frame shift of 256 was adopted to divide each sound instance into multiple frames. Next, MFCC features and GFCC features were extracted for each voice frame separately. For MFCC, we selected the first 15 dimensions together with their first-order and second-order differences, which were concatenated to obtain 42-dimensional features. For GFCC, we selected the first 12 dimensions together with their first-order and second-order differences, which were connected to get 33-dimensional features. Blank frames were deleted and the dataset was shuffled. As shown in Table 4, when the number of neighbors k = 7, the precision of FGMA reached a maximum of 0.921, while SSMAME, SSMA, MAPA, and OCLGA achieved 0.905, 0.887, 0.879, and 0.868 (i.e., improvements of 1.77%, 3.83%, 4.78%, and 6.11%, respectively). Next, we report computation time, because the improvement of efficiency also depends on running time.
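The framing step described above (frame length 512, frame shift 256, at 22050 Hz) can be sketched with numpy; the function name `frame_signal` is illustrative, and a synthetic test tone stands in for the voice instances.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames of length frame_len
    with a hop (frame shift) of hop samples."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop: i * hop + frame_len] for i in range(n)])

sr = 22050
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # 1-second test tone
frames = frame_signal(x)
print(frames.shape)          # (85, 512): 85 frames from 22050 samples
print(512 / sr * 1000)       # frame duration in ms, about 23 ms
```

With a hop of half the frame length, consecutive frames overlap by 256 samples, which is the usual trade-off between time resolution and redundancy before cepstral features such as MFCC or GFCC are extracted per frame.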
As demonstrated in Table 4, in terms of computation time FGMA is better than SSMAME (i.e., an improvement of 3.37%), while OCLGA, MAPA, and SSMA outperformed FGMA by 7.75%, 3.49%, and 2.33%, respectively. In terms of recall, compared with SSMAME, SSMA, MAPA, and OCLGA, FGMA increased by 2.30%, 6.14%, 6.87%, and 7.12%, respectively. In addition, Figure 2 also illustrates how average precision and average recall change with the number of neighbors in the image dataset.
In summary, whether the dataset is image or voice, FGMA is slightly superior to SSMAME, and SSMAME performs better than SSMA, MAPA, and OCLGA. The reasons are as follows: on the one hand, FGMA considers global characteristics from the perspective of fuzzy granules; on the other hand, it not only considers the local topology of the fuzzy granular vector induced by each instance point, but also establishes an objective function to obtain an optimal solution.

VII. DISCUSSION
In this paper, we propose a novel fuzzy granule manifold alignment method that does not require a predetermined correspondence. The approach can find correspondences between different fuzzy granular spaces, making it possible to align instance points denoted by different features. In addition to the theoretical analysis, we also apply it to real-world image feature point alignment and Alzheimer's speech instance matching. In future work, we will consider local and distributed granulation to further enhance the efficiency of the algorithm.