Local Low-Rank Approximation With Superpixel-Guided Locality Preserving Graph for Hyperspectral Image Classification

Given the detrimental effect of spectral variations in a hyperspectral image (HSI), this article investigates how to recover its discriminative representation to improve the classification performance. We propose a new method, namely local low-rank approximation with superpixel-guided locality preserving graph (LLRA-SLPG), which can reduce the spectral variations and preserve the local manifold structure of an HSI. Specifically, the LLRA-SLPG method first clusters the pixels of an HSI into several groups (i.e., superpixels). By taking advantage of the local manifold structure, a Laplacian graph is constructed from the superpixels to ensure that a typical pixel is similar to its neighbors within the same superpixel. The LLRA-SLPG model can increase the compactness of pixels belonging to the same class by reducing spectral variations while promoting local consistency via the Laplacian graph. The objective function of the LLRA-SLPG model can be solved efficiently in an iterative manner. Experimental results on four benchmark datasets validate the superiority of the LLRA-SLPG model over state-of-the-art methods, particularly in cases where only extremely few training pixels are available.

HSIs have wide applications in various areas, such as mineral detection [2], agriculture [3], [4], urban planning [5], [6], and environment monitoring [7], [8]. However, given the noise associated with complex environmental conditions, the acquired HSIs often suffer from spectral variations [9], i.e., pixels of the same material may vary considerably, which significantly impairs the performance of HSI classification.
Many low-rank-based methods have been proposed to alleviate the impact of spectral variations and accordingly improve classification accuracy. Particularly, much attention has been paid to the low-rank approximation (LRA)-based methods [10], [11], which assume that the pixels of an HSI should be distributed in one or more low-dimensional space(s) [12], [13]. For example, some methods [14], [15] assume that the data come from a unified subspace and then employ robust LRA on a whole HSI. To better capture data drawn from the union of multiple subspaces, a multisubspace-based LRA [16] is used to model an entire HSI [17], [18], [19], [20], [21], [22], [23]. All the above methods assume the global low-rank property, i.e., they apply robust LRA or multisubspace-based LRA to the entire HSI. However, they neglect the fact that pixels within a homogeneous local region are often from the same class, i.e., the local low-rank property. To this end, many local LRA-based methods have been proposed [24], [25], [26], [27], [28], which first divide an HSI into multiple rectangular patches and then process each patch individually via robust LRA. As shape-adaptive regions (i.e., superpixels) are better at capturing the complex local spatial structure than rectangular patches, superpixel-based segmentation has also been used in some local LRA-based methods [29], [30], [31]. It is worth noting that all of the above studies conduct LRA on 2-D matrices. As an HSI can be naturally represented as a 3-D tensor, some tensor LRA-based methods have been proposed [32], [33], [34], [35], [36]. Additionally, due to the powerful representation ability of deep neural networks [37], [38], [39], some low-rank-based deep learning methods [39], [40] have been proposed. Besides low-rank-based methods, some other advanced feature extraction methods [41], [42] utilize graph preservation to enhance the representation of an HSI.
Although some of the existing local LRA-based methods use superpixels to characterize the complex local spatial structure, they fail to preserve the local manifold structure, which is unfavorable to the discriminative ability of the representation. In other words, they cannot sufficiently capture the locality of an HSI, which limits the further improvement of classification performance.
Considering the above points, we propose a local low-rank approximation with superpixel-guided locality preserving graph (LLRA-SLPG) model, which can increase the within-class compactness by reducing the spectral variations for each superpixel and promote the local consistency via a proposed superpixel-guided locality preserving graph. Moreover, our proposed LLRA-SLPG method is the first to combine the local manifold structures with the local LRA into a unified model. Our goals are to use the local low-rank term with the ℓ1 norm to separate the low-rank part and the sparse noise part of an HSI, and to use the superpixel-guided locality preserving graph to further preserve the local manifold structure of the low-rank part. Specifically, the proposed graph regularizer in the LLRA-SLPG method makes each pixel similar to its neighbors within the same superpixel. We formulate the objective function of the LLRA-SLPG method as a constrained optimization problem, which can be solved efficiently in an iterative manner. Extensive experiments validate the superiority of the LLRA-SLPG model over state-of-the-art methods, particularly for cases with few training samples. The contributions of this article are given as follows.
1) Our proposed LLRA-SLPG method is the first to fuse the local LRA and the local manifold structures into a unified model simultaneously. Specifically, existing local LRA-based methods cannot adequately capture the complex local spatial structures of the original HSI, e.g., local manifolds (or local consistencies) are ignored, which further limits the classification performance. To address this issue, our proposed LLRA-SLPG method designs a superpixel-guided locality preserving graph to better preserve the local spatial structure of an HSI, thus further improving the discriminative ability of the representation.
2) The proposed LLRA-SLPG method can be solved efficiently in an iterative manner. The experiments demonstrate its advantage over state-of-the-art methods, especially when the training samples are extremely few.
We organize the remainder of this article as follows. Section II gives a brief review of existing LRA-based methods for HSI. Then, we present the proposed LLRA-SLPG method in Section III, followed by comprehensive experimental results and analyses in Section IV. Finally, Section V concludes this article.

II. RELATED WORK
This section reviews existing LRA-based methods for modeling HSIs.
Some methods assume that the data are distributed in a single low-rank subspace and simply apply LRA to a whole HSI to recover a discriminative representation. For example, Mei et al. [14] applied ℓ1-based LRA to reduce the spectral variations and improve HSI classification. Given data drawn from a union of multiple subspaces, multisubspace-based LRA is introduced to represent the HSI in some methods [17], [18], [19], [20], [21], [22], [23]. Specifically, Lu et al. [17] proposed a graph regularized LRA method within multiple subspaces to remove striping noise from an HSI. Sumarsono and Du used a multisubspace-based LRA model [18] to preprocess the spectral feature, which is then used by both supervised and unsupervised learning methods. Wang et al. [19] incorporated the local geometric structure into a multisubspace-based LRA model to improve the classification performance. Wang et al. [20] and Mei et al. [21] used cluster-based regularizers to incorporate the superpixel information into the multisubspace-based LRA model; that is, they represent the feature of a superpixel as a cluster center and make all the pixels in the superpixel close to the cluster center. Xu et al. [22] integrated a hypergraph-based regularizer into a multisubspace-based LRA model for unsupervised HSI classification, whose graph induces the spatial-spectral information based on a superpixel segmentation. Xing et al. [23] incorporated a classwise regularization into a multisubspace-based LRA model to capture the classwise block-diagonal structure, which maps pixels from one class into the same subspace.
All the aforementioned methods adopt the global low-rank assumption. As the pixels within a small local region are usually from the same class, some local LRA-based methods have been proposed. Zhang et al. [24] and Zhu et al. [25] divided an HSI into regular patches and applied robust LRA patch by patch for HSI restoration. As the noise level in different bands may change significantly, He et al. [26] proposed a noise-adjusted iteration framework that uses a patchwise LRA method for HSI denoising. Apart from applying the patchwise LRA in the spectral domain, Mei et al. [27], [28] explored the low-rank property from the spatial domain (i.e., applying LRA to a whole HSI on each spectral band). Specifically, Mei et al. [27] applied LRA to an HSI from the spectral and spatial domains in two distinct steps. Moreover, Mei et al. [28] also proposed a unified model to combine the spectral and spatial low-rank properties.
Since the patchwise segmentation cannot thoroughly exploit the complicated local spatial structure of an HSI, some superpixel-induced local LRA-based methods have been proposed [29], [30], [31], in which each superpixel can be regarded as a shape-adaptive region. Specifically, Xu et al. [29] first performed LRA on all the superpixels to extract the low-rank data, followed by a Markov random field to model the local correlation. Fan et al. [30] first employed PCA to obtain the first component of an HSI, which is then processed by the superpixel segmentation method to get the homogeneous regions. Finally, they applied LRA to each homogeneous region to remove the noise and outliers. Yang et al. [31] proposed a discriminative low-rank model that can increase the intraclass similarity by applying LRA on each superpixel while promoting the global separability between classes. Based on such a model, a superpixel-based classification framework is proposed to utilize the prediction of a typical classifier to improve the superpixel segmentation.
Since an HSI can be inherently represented as a 3-D tensor, some tensor-based LRA methods have been investigated [32], [33], [34], [35], [36]. An et al. [32] proposed a tensor-based low-rank graph to perform the dimension reduction for an HSI, and this method characterizes the intraclass compactness and the interclass separability via a multimanifold regularizer. Deng et al. [33] developed a tensor low-rank discriminative embedding model, which utilizes the low-rank reconstruction to uncover the potential sample relationships and incorporates the label information to improve the discriminability of features. Deng et al. [34] proposed a patch tensor-based multigraph embedding framework, which builds three different types of subgraphs to capture the intrinsically geometrical structure of HSIs. Sun et al. [35] proposed a lateral-slice sparse tensor robust principal component analysis to remove noises or outliers in an HSI to improve the subsequent classification performance. Liu et al. [36] proposed a local-global balanced tensor LRA method, which can be viewed as an extension of [31] to the tensor case.
Given the powerful representation learning capacity of deep neural networks [37], [38], [39], some deep learning models [39], [40] were proposed to combine the low-rank property with deep learning. Specifically, Wang et al. [39] proposed an unsupervised segmented stacked denoising autoencoder for extracting features, followed by a low-rank classifier for the HSI classification. Zhang et al. [40] proposed an end-to-end low-rank spatial-spectral network for removing noises in HSIs. By integrating the low-rank property into a deep convolutional neural network (DCNN), this method benefits from the powerful feature representation ability of DCNN and the good generalization ability of the low-rank property.

III. METHODOLOGY
Let Y ∈ R^{b×n} denote an HSI, which contains n pixels, each represented by a b-dimensional spectral vector. To suppress the spectral variations, a straightforward approach is to adopt the global low-rank approximation of an HSI by solving the following objective function:

min_{Z,N} ||Z||_* + γ||N||_1,  s.t.  Y = Z + N,   (1)

where ||·||_1 and ||·||_* denote the ℓ1 norm and the nuclear norm, respectively, γ is a nonnegative regularization parameter, Z ∈ R^{b×n} is the low-rank part of Y, and N ∈ R^{b×n} is the sparse noise part (also called the spectral variations). A drawback of the approach defined in problem (1) is that the local spatial structure cannot be exploited in such a global manner [24], [29], as pixels within a homogeneous region often belong to the same category.
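To make the decomposition in problem (1) concrete, the following synthetic sketch (all sizes and the noise pattern are illustrative assumptions, not taken from the article) builds a low-rank matrix plus sparse corruption and checks that the corruption destroys the low-rank structure that problem (1) tries to recover:

```python
import numpy as np

rng = np.random.default_rng(0)
b, n, rank = 50, 200, 5

# Low-rank "clean" part Z: all pixels lie in a rank-5 subspace.
Z = rng.standard_normal((b, rank)) @ rng.standard_normal((rank, n))

# Sparse "spectral variation" part N: a few large-magnitude entries.
N = np.zeros((b, n))
idx = rng.choice(b * n, size=(b * n) // 100, replace=False)
N.flat[idx] = 10 * rng.standard_normal(idx.size)

Y = Z + N  # the observed HSI matrix, as in problem (1)

# Z has only `rank` nonzero singular values, while the corrupted Y
# has a much heavier spectral tail (i.e., it is no longer low-rank).
sv_Z = np.linalg.svd(Z, compute_uv=False)
sv_Y = np.linalg.svd(Y, compute_uv=False)
```

Minimizing the nuclear norm of Z plus the ℓ1 norm of N, as in problem (1), is exactly the convex surrogate that undoes this corruption.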
To address this problem, we follow the superpixel-induced local LRA-based methods, i.e., applying LRA on each superpixel. Note that doing so enhances the data compactness of each homogeneous region (i.e., superpixel), which increases the intraclass similarity. However, existing methods cannot preserve the complex local spatial structures of the original HSI, such as local manifolds (or local consistencies). To address this issue, we propose a superpixel-guided locality preserving graph-based regularizer to maintain the local consistency by forcing each pixel to have a spectral feature representation similar to those of its neighbors (i.e., the neighboring pixels located within a squared window of a homogeneous region). In this manner, the local manifold structure of the original HSI can be adequately exploited to improve the discriminative ability of the representation for the low-rank part Z.

A. Superpixel-Guided Locality Preserving Graph
This section describes how to design a superpixel-guided locality-preserving graph in detail.
We first adopt the entropy rate superpixel method [43] to generate superpixels. To define the graph, we refer to (p_i, q_i) as the pixel location of the ith pixel (i.e., y_i). The neighboring pixels of y_i are defined as the pixels within a squared window with radius r centered on y_i; these neighboring pixels, together with y_i, constitute the neighbor set

N_i = { y_j : |p_j − p_i| ≤ r, |q_j − q_i| ≤ r }.

To preserve the local consistency of the original HSI Y, i.e., to make the discriminative representation of the ith pixel similar to that of its neighbors within the same superpixel, the weight W_{i,j} between the ith and jth pixels is defined as

W_{i,j} = 1 if y_j ∈ N_i and I_i = I_j;  W_{i,j} = 0 otherwise,   (2)

where I_i and I_j represent the indices of the superpixels that the ith and jth pixels belong to, respectively. Finally, the superpixel-guided Laplacian graph regularizer can then be established based upon W as

(1/2) Σ_{i=1}^n Σ_{j=1}^n W_{i,j} ||z_i − z_j||_2^2 = Tr(Z G Z^T),   (3)

where z_i denotes the ith column of Z, Tr(·) computes the trace of the corresponding matrix, D is a diagonal matrix with the ith diagonal element defined as D_{i,i} = Σ_{j=1}^n W_{i,j}, and G = D − W is the Laplacian matrix.
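The graph construction above can be sketched as follows. This is a minimal dense implementation for clarity (a real HSI would need a sparse matrix), and it assumes self-loops are excluded from W, which is a modeling choice rather than something stated in the article:

```python
import numpy as np

def build_slpg(coords, seg, r):
    """Superpixel-guided locality preserving graph (a sketch).

    coords : (n, 2) array of (row, col) pixel locations (p_i, q_i)
    seg    : (n,) superpixel index I_i of each pixel
    r      : window radius for the neighbor set

    W[i, j] = 1 iff pixel j lies in the (2r+1)x(2r+1) window centered
    on pixel i AND both pixels belong to the same superpixel.
    Returns W, the degree matrix D, and the Laplacian G = D - W.
    """
    n = coords.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        in_window = np.all(np.abs(coords - coords[i]) <= r, axis=1)
        same_sp = seg == seg[i]
        W[i] = (in_window & same_sp).astype(float)
    np.fill_diagonal(W, 0.0)  # drop self-loops (assumption)
    D = np.diag(W.sum(axis=1))
    return W, D, D - W

# Toy 2x2 image split into two superpixels: only same-superpixel
# pixels inside the window are connected.
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
seg = np.array([0, 0, 1, 1])
W, D, G = build_slpg(coords, seg, r=1)
```

By construction W is symmetric, and every row of the Laplacian G sums to zero, which is what makes Tr(ZGZ^T) penalize differences between neighboring columns of Z.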

B. The Model
The objective function of the proposed LLRA-SLPG model is formulated as

min_{Z,N} Σ_{i=1}^S ||Z_i||_* + λ||N||_1 + β Tr(Z G Z^T),  s.t.  Y_i = Z_i + N_i, i = 1, …, S,   (4)

where S denotes the number of superpixels, Y_i ∈ R^{b×n_i} denotes the ith superpixel, which contains n_i pixels, Z_i ∈ R^{b×n_i} and N_i ∈ R^{b×n_i} denote the decomposed parts whose sum is Y_i, and λ and β are nonnegative regularization parameters.
In the objective of problem (4), the first term (i.e., Σ_{i=1}^S ||Z_i||_*) promotes the data compactness of each Z_i by employing LRA on each superpixel. The second term (i.e., λ||N||_1) uses the ℓ1 norm to make the spectral variations N sparse.
The third term makes the local manifold of Y consistent with that of the low-rank representation Z.

C. Optimization
To solve problem (4), we first introduce an auxiliary variable Q ∈ R^{b×n} and reformulate it as

min_{Z,N,Q} Σ_{i=1}^S ||Z_i||_* + λ||N||_1 + β Tr(Q G Q^T),  s.t.  Y = Z + N,  Z = Q.   (5)

The inexact augmented Lagrangian multiplier (IALM) method [44] is employed to solve problem (5) alternately. The augmented Lagrangian of problem (5) is formulated as

L(Z, N, Q) = Σ_{i=1}^S ||Z_i||_* + λ||N||_1 + β Tr(Q G Q^T) + ⟨Γ_1, Y − Z − N⟩ + ⟨Γ_2, Z − Q⟩ + (ρ/2)(||Y − Z − N||_F^2 + ||Z − Q||_F^2),   (6)

where Γ_1, Γ_2 ∈ R^{b×n} are two Lagrange multipliers, ρ is a nonnegative penalty parameter, and ||·||_F denotes the Frobenius norm of a matrix. We alternately update the three variables (i.e., Z, N, and Q) by solving three subproblems in each iteration until convergence. In the following, we give the details of the three subproblems as well as the update of the Lagrange multipliers.

a) By fixing the other variables, the subproblem to update Z is formulated as

min_Z Σ_{i=1}^S ||Z_i||_* + ⟨Γ_1, Y − Z − N⟩ + ⟨Γ_2, Z − Q⟩ + (ρ/2)(||Y − Z − N||_F^2 + ||Z − Q||_F^2).   (7)

It is easy to see that each Z_i can be solved independently through the following problem:

min_{Z_i} ||Z_i||_* + ρ||Z_i − M_i||_F^2,  with  M_i = (Y_i − N_i + Q_i + (Γ_{1,i} − Γ_{2,i})/ρ)/2,   (8)

where Γ_{1,i}, Γ_{2,i}, and Q_i denote the ith components of Γ_1, Γ_2, and Q, respectively. According to [45], problem (8) has the closed-form solution

Z_i = U diag(S_{1/(2ρ)}(diag(Σ))) V^T,   (9)

where U, Σ, and V are derived from the singular value decomposition (SVD) of M_i, diag(·) transforms a diagonal matrix to a vector or a vector to a diagonal matrix, and S_ε(x) is the soft thresholding operator, i.e., S_ε(x) = 0 if |x| ≤ ε and S_ε(x) = (1 − ε/|x|)x otherwise.
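The closed-form solution (9) is the standard singular value thresholding operator. A minimal sketch follows; note that the threshold value (here a parameter `tau`) depends on how the subproblem is scaled, so it is passed in explicitly rather than hard-coded to 1/(2ρ):

```python
import numpy as np

def soft(x, eps):
    """Soft thresholding S_eps(x) = sign(x) * max(|x| - eps, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - eps, 0.0)

def svt(M, tau):
    """Singular value thresholding: the closed-form minimizer of
    tau * ||Z||_* + (1/2) * ||Z - M||_F^2 (cf. solution (9))."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(soft(s, tau)) @ Vt

# Example: thresholding diag(3, 1) at tau = 2 shrinks the singular
# values to (1, 0), i.e., the result is rank-1.
out = svt(np.diag([3.0, 1.0]), 2.0)
```

Singular values below the threshold are zeroed, which is what drives each Z_i toward a low-rank matrix.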
b) By fixing the other variables, the subproblem to update N is formulated as

min_N λ||N||_1 + (ρ/2)||N − (Y − Z + Γ_1/ρ)||_F^2.   (10)

It is easy to see that problem (10) has the closed-form solution [46], [47]

N = S_{λ/ρ}(Y − Z + Γ_1/ρ).   (11)

c) By fixing the other variables, the subproblem to update Q is formulated as

min_Q β Tr(Q G Q^T) + (ρ/2)||Q − (Z + Γ_2/ρ)||_F^2.   (12)

By setting its gradient with respect to Q to zero, problem (12) has the analytical solution

Q = (ρZ + Γ_2)(β(G^T + G) + ρI)^{-1},   (13)

where I denotes an identity matrix of appropriate size.

d) The Lagrange multipliers Γ_1 and Γ_2 and the penalty parameter ρ are updated as

Γ_1^{iter+1} = Γ_1^{iter} + ρ(Y − Z − N),  Γ_2^{iter+1} = Γ_2^{iter} + ρ(Z − Q),  ρ^{iter+1} = μρ^{iter},   (14)

where iter denotes the index of the iteration, and the parameter μ > 1 improves the convergence rate. The proposed LLRA-SLPG method is summarized in Algorithm 1 of the Appendix, which gives the initial values of the variables and the convergence condition. After obtaining Z, a classifier (e.g., SVM) receives it to conduct the classification.
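One full IALM sweep over the three subproblems can be sketched as below. This is an illustrative simplification, not the article's Algorithm 1: Z is thresholded globally rather than superpixel-by-superpixel, and all names and sizes are assumptions:

```python
import numpy as np

def soft(X, eps):
    return np.sign(X) * np.maximum(np.abs(X) - eps, 0.0)

def ialm_step(Y, Z, N, Q, L1, L2, G, lam, beta, rho, mu):
    """One IALM sweep for problem (5); L1, L2 play the roles of the
    multipliers Gamma_1, Gamma_2."""
    # a) Z-update: singular value thresholding of the averaged target.
    M = 0.5 * (Y - N + Q + (L1 - L2) / rho)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Z = U @ np.diag(np.maximum(s - 1.0 / (2.0 * rho), 0.0)) @ Vt
    # b) N-update: elementwise soft thresholding, cf. (11).
    N = soft(Y - Z + L1 / rho, lam / rho)
    # c) Q-update: linear system from setting the gradient of (12)
    # to zero; A is symmetric, so solve(A, X.T).T gives X @ inv(A).
    A = beta * (G + G.T) + rho * np.eye(G.shape[0])
    Q = np.linalg.solve(A, (rho * Z + L2).T).T
    # d) Multiplier and penalty updates, cf. (14).
    L1 = L1 + rho * (Y - Z - N)
    L2 = L2 + rho * (Z - Q)
    return Z, N, Q, L1, L2, mu * rho
```

Running this step repeatedly drives both constraint residuals, ||Y − Z − N||_F and ||Z − Q||_F, toward zero as ρ grows, which is the usual IALM convergence behavior.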

D. Complexity and Convergence of Algorithm 1
The computational complexity of Algorithm 1 in the Appendix is mainly determined by the steps of updating {Z_i}_{i=1}^S and Q. Specifically, the step of updating {Z_i}_{i=1}^S performs an SVD on each superpixel, whose total cost is O(Σ_{i=1}^S b n_i × min(b, n_i)). Moreover, the step of updating Q needs to take the inverse of a matrix of size n × n and to calculate the product of two matrices of sizes b × n and n × n, respectively. Therefore, the computational complexity of updating Q is O(n^3 + bn^2). In summary, the total computational complexity of Algorithm 1 is O(n^3 + bn^2 + Σ_{i=1}^S b n_i × min(b, n_i)) per iteration.
Remark: It should be emphasized that, as the matrix β(G^T + G) + ρI in problem (13) is very sparse, its inverse can be computed quickly in practice.
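In practice one would factorize the sparse matrix once rather than form its inverse. The following SciPy sketch (sizes and the random graph are illustrative assumptions) solves the system in (13) for all b rows of the right-hand side with a single sparse LU factorization:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n, b, beta, rho = 500, 20, 1.0, 0.5
rng = np.random.default_rng(0)

# A random sparse nonnegative W stands in for the superpixel-guided
# graph; G = D - W is its (sparse) Laplacian.
W = sp.random(n, n, density=0.01, random_state=0)
W = W + W.T
G = sp.diags(np.asarray(W.sum(axis=1)).ravel()) - W

A = (beta * (G.T + G) + rho * sp.eye(n)).tocsc()  # sparse and symmetric
B = rng.standard_normal((b, n))  # plays the role of rho*Z + Gamma_2

# Q A = B  <=>  A Q^T = B^T since A is symmetric; one LU
# factorization of A is reused for all b right-hand sides.
Q = splu(A).solve(B.T).T
residual = np.abs((A @ Q.T).T - B).max()
```

Because A depends only on G, β, and ρ, the factorization can even be cached across IALM iterations until ρ changes.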
Moreover, in Algorithm 1, we use IALM to solve the objective function in an iterative manner, which is similar to the classic expectation-maximization (EM) algorithm [48]. It is worth pointing out that IALM has been widely used in many LRA-based methods [28], [31], [35]. The global convergence of IALM has been theoretically proved when the convex problem has at most two blocks [44], [49]. For the proposed LLRA-SLPG method, which involves three blocks, it remains an open problem, to the best of our knowledge, to theoretically prove the global convergence of IALM with three or more blocks [49]. Fortunately, we find that LLRA-SLPG empirically converges well on all the benchmark datasets (see Fig. 1).

IV. EXPERIMENTS
In this section, we empirically evaluate the proposed method.

A. Datasets and Experiment Settings
We use four widely used benchmark datasets 1 to evaluate the effectiveness of the proposed method. The details of these datasets are described as follows.
1) Indian Pines Dataset: This scene is acquired by the Airborne Visible/InfraRed Imaging Spectrometer (AVIRIS) sensor over the Indian Pines test site, consisting of 145 × 145 pixels with 200 spectral bands after band removal. The ground truth map contains 10 249 pixels that belong to 16 classes.
2) Salinas Valley Dataset: The AVIRIS sensor collects this scene over Salinas Valley, consisting of 512 × 217 pixels with 204 spectral bands after band removal. The ground truth map contains 54 129 pixels that belong to 16 classes.
3) Pavia University Dataset: The ROSIS sensor acquires this scene over Pavia, Italy, which consists of 610 × 340 pixels with 103 spectral bands. The ground truth map contains 42 776 pixels that belong to nine classes.

4) WHU-Hi-LongKou Dataset:
The imaging sensor acquires this scene over Longkou Town, Hubei Province, China, on July 17, 2018; it consists of 550 × 400 pixels with 270 bands. A rectangular part (rows 151 to 350 and columns 1 to 400) that suffers heavily from noise is used for testing. The ground truth map contains 74 474 pixels that belong to nine classes. More details of all four benchmark datasets are given in Tables I-IV. The numbers of superpixels are empirically set to the values suggested in [31] for the four benchmark datasets, i.e., 64 (Indian Pines), 50 (Salinas Valley), 50 (Pavia University), and 64 (WHU-Hi-LongKou).
Three widely used evaluation criteria, i.e., overall accuracy (OA), average accuracy (AA), and the Kappa coefficient (κ), are used to measure the classification results. For all the LRA-based methods, after learning the recovered representation, an SVM classifier equipped with an RBF kernel is used for classification. Specifically, our proposed LLRA-SLPG method uses the max normalization method [50] as preprocessing.
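The classification stage described above can be sketched with scikit-learn. Everything here is a stand-in: the features are synthetic (real use would take the recovered Z from Algorithm 1), and dividing by the global maximum absolute value is only an assumed form of the max normalization in [50]:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic recovered representation Z (b bands x n pixels) with two
# well-separated classes, standing in for the output of Algorithm 1.
b, n = 30, 400
labels = rng.integers(0, 2, size=n)
Z = rng.standard_normal((b, n)) + 3.0 * labels  # class-dependent shift

Z = Z / np.abs(Z).max()  # max normalization (assumed form of [50])

X = Z.T  # scikit-learn expects one pixel per row
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, train_size=0.05, stratify=labels, random_state=0)

# SVM with an RBF kernel, as used for all the LRA-based methods.
clf = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```

With `train_size=0.05` this mirrors the article's few-training-pixel regime: only 5% of the pixels are used for fitting, the rest for evaluation.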

B. Parameter and Convergence Analysis
In this section, we investigate how the three hyperparameters of the proposed LLRA-SLPG method, i.e., λ, β, and r, affect the classification performance. Note that λ depends on the severity of the noise in an HSI, β controls how strongly the local manifold structure is preserved, and r determines the number of neighbors in the locality preserving graph.
1) Influence of λ and β: Following [31], for the Indian Pines dataset, as shown in Table V, the proposed LLRA-SLPG method achieves the highest OA (97.13%) with λ = 0.1 and β = 5. It can be observed that, for different values of λ, the corresponding highest OAs do not change much; that is, with λ = 0.05 and 0.1, the corresponding highest OA equals 96.96% and 97.13%, respectively. According to the results, the values of β that yield OA larger than 96.00% lie in the interval [1, 50].
For the Salinas Valley dataset, as shown in Table VI, the LLRA-SLPG method achieves the highest OA (98.10%) when λ = 0.05 and β = 20. For different values of λ, the corresponding best OAs are very close; that is, when λ equals 0.05 and 0.1, the best OA is 98.10% and 98.09%, respectively. The values of β that yield OA larger than 97% lie in the interval [2, 100].
For the Pavia University dataset, as shown in Table VII, the LLRA-SLPG method achieves the highest OA (90.79%) with λ = 0.01 and β = 0.1. The values of β that yield OA larger than 87% lie in the interval [0.1, 2].
For the WHU-Hi-LongKou dataset, as shown in Table VIII, the LLRA-SLPG method achieves the highest OA (98.42%). On all four datasets, with a fixed value of λ, the OA first increases and then decreases as β increases. For example, on the Indian Pines dataset, with λ = 0.1, the OA steadily increases from 79.02% to 97.13% as β changes from 0 to 5, and decreases to 90.01% as β increases from 5 to 10 000. Besides, when β varies in a specific range on those datasets (e.g., 0.01 ≤ β ≤ 20 on the Indian Pines dataset), the LLRA-SLPG method performs consistently better than with β = 0, which validates the effectiveness of the third term in problem (4) (i.e., βTr(ZGZ^T)) in improving the classification performance.
2) Influence of Radius r of the Superpixel-Guided Locality Preserving Graph: In this part, we study the effect of the radius r; Table IX shows the classification results under different values of r. Accordingly, in the subsequent experiments, the parameters r, λ, and β are set to the values that achieve the best performance (i.e., the highest OA) according to Table IX. Specifically, r is set to 1, 3, 2, and 4 for the Indian Pines, Salinas Valley, Pavia University, and WHU-Hi-LongKou datasets, respectively.

3) Ablation Study for the Superpixel-Guided Locality Preserving Graph:
In this part, an ablation study is conducted to verify the effectiveness of the superpixel-guided locality preserving graph. First, we define the objective function in problem (15), namely M1, which removes the superpixel-guided locality preserving graph term (i.e., βTr(ZGZ^T)) from problem (4). The candidate values of λ for M1 are the same as for the proposed LLRA-SLPG method. The comparison results on all four datasets are shown in Table X, from which it can be seen that the proposed LLRA-SLPG method achieves much better performance than the M1 method on all the evaluation metrics (i.e., OA, AA, and κ). This observation validates that the superpixel-guided locality preserving graph term (i.e., βTr(ZGZ^T)) can significantly enhance the classification performance.
From Fig. 1, we can see that problem (6) converges after about 150 iterations on the Indian Pines and Salinas Valley datasets. In addition, on the Pavia University and WHU-Hi-LongKou datasets, problem (6) converges faster (i.e., in about 100 iterations). These results validate the convergence of the proposed LLRA-SLPG method.
We adopt the settings suggested in the original papers for all the baseline methods. We repeat the experiments ten times with randomly sampled training pixels for all the methods and report the average results.
According to the results on the Indian Pines dataset shown in Table XI, where the training percentage per class equals 5%, we can see that the LLRA-SLPG method significantly outperforms the baseline methods. For example, the proposed LLRA-SLPG method achieves the highest accuracies (i.e., OA (97.18%), AA (96.52%), and κ (96.79%)), much better than the second-best ones (i.e., OA (93.99%), AA (90.45%), and κ (93.15%)). Moreover, the proposed LLRA-SLPG method achieves the best performance in 12 out of 16 classes, especially for classes 1, 2, 3, 5, 7, 9, 10, 11, 12, and 14. For the classes with limited samples, i.e., classes 1, 7, and 9, the LLRA-SLPG method achieves the best performance with a remarkable margin over the second-best one. We also compare the classification performance of all the methods with various training percentages per class, i.e., P = 1%, 3%, 5%, and 7%. As shown in Fig. 2, the proposed LLRA-SLPG method is consistently better than the baseline methods in OA, AA, and κ. The classification accuracies of all the methods gradually improve as the training percentage increases. Moreover, our method performs significantly better than the compared ones when the number of training pixels is extremely small, e.g., at 1% and 3%.
Table XII shows the performance comparison on the Salinas Valley dataset, where the training percentage per class is set to 0.5%. We can observe that the proposed LLRA-SLPG method also achieves the highest values on all the accuracy metrics. Specifically, it achieves the best performance in most categories, i.e., 14 out of 16 classes. Especially for classes 1, 2, 3, 8, 10, 11, 14, 15, and 16, the proposed LLRA-SLPG method shows remarkable margins over the baseline methods. Furthermore, as illustrated in Fig. 3, the proposed LLRA-SLPG method performs well under various training percentages, which shows its superiority.
It can be observed that the classification performance of all the methods gradually improves as the training percentage increases.
Table XIII shows the performance of all the compared methods on the Pavia University dataset, in which the training percentage per class equals 0.2%. According to the results, we can see that the proposed LLRA-SLPG method achieves better performance. Specifically, it performs remarkably better than the baseline methods, especially in classes 1, 2, 3, 6, and 8. In addition, as shown in Fig. 4, the proposed LLRA-SLPG method consistently performs better than the baseline methods under different training percentages, especially for small training percentages, e.g., 0.2%.
Table XIV shows the performance comparison on the WHU-Hi-LongKou dataset, in which the training percentage per class is 1%. We can observe that the proposed LLRA-SLPG method outperforms the other methods in OA, AA, and κ. Specifically, it achieves the best performance in 7 out of 9 classes, especially for classes 3, 5, and 8. Moreover, as shown in Fig. 5, the proposed LLRA-SLPG method shows significant superiority over the other methods under different training percentages, especially for small training percentages, e.g., 0.3% and 0.5%.
The classification maps of all the methods on the four benchmark datasets are given in Figs. 6-9, which further manifest the advantage of the proposed LLRA-SLPG method. We also compare the running time of all the methods, where the training percentages per class for the four datasets (i.e., Indian Pines, Salinas Valley, Pavia University, and WHU-Hi-LongKou) are 5%, 0.5%, 0.2%, and 1%, respectively. It can be seen that the proposed LLRA-SLPG method has a reasonable running time compared with the baseline methods.

V. CONCLUSION
In this article, we propose the LLRA-SLPG method to improve the discriminative representation of pixels in an HSI. The LLRA-SLPG method can reduce the spectral variations and promote the local manifold structure to improve the discriminability of features in the low-rank component. Experiments on four benchmark datasets demonstrate that the proposed LLRA-SLPG method outperforms state-of-the-art methods. In addition, the proposed LLRA-SLPG method shows remarkable performance improvement with extremely few training samples. In the future, we would like to improve the efficiency of the proposed LLRA-SLPG method by using Anderson acceleration [55], and to adaptively determine the values of the parameters, e.g., using maximum a posteriori (MAP) estimation [56], [57], [58], [59] according to the characteristics of the input samples.