


I. Introduction

FEATURE-EXTRACTION methods are used to create a subset of new features by combinations of the existing ones. Feature extraction is typically applied before classification or regression algorithms to discard redundant or noisy components and to reduce the dimensionality of the data, which also helps to prevent numerical problems. The use of linear projection methods, such as principal component analysis (PCA) [1], is quite common in remote sensing data analysis. However, PCA can be limited for several reasons. Alternative feature-extraction methods, such as partial least squares (PLS) [2], include information about the target variable (or labels) in the projection matrix. Other methods include information about the noise; see, e.g., the minimum noise fraction transformation [3].

All previous methods assume that there exists a linear relation in the original data. However, in many situations, this linearity assumption does not hold, and a nonlinear feature extraction is needed to obtain acceptable performance. In this context, kernel methods are a promising approach, as they constitute a nice framework to formulate nonlinear versions from linear algorithms [4]. Since the early use of support vector machines in remote sensing [5], several kernel-based feature-extraction methods have been proposed in the field. Multivariate kernel feature-extraction methods, such as kernel PCA (KPCA) and kernel PLS, have been proposed for hyperspectral classification [6], [7] and target detection [8].

In this letter, we present the application of a new kernel-based data transformation method, called kernel entropy component analysis (KECA) [9], to remote sensing data processing. KECA, like KPCA, is a spectral method based on the kernel similarity matrix; however, it does not necessarily use the top eigenvalues and eigenvectors of the kernel matrix. Unlike KPCA, which maximally preserves the second-order statistics of the data set, KECA is founded on information theory and tries to preserve the maximum Rényi entropy of the input space data set. The entropy of a probability density function (pdf) can be interpreted as a measure of information [10], [11]. In this context, the entropy concept can be extended to obtain a measure of dissimilarity between distributions [12], and the Cauchy–Schwarz (CS) divergence between two pdfs can be related to the cosine of the angle between their kernel feature space mean vectors. Hence, the combination of KECA feature extraction (which maximally preserves entropy) with an angle-based clustering (which maximizes the CS divergence between the cluster distributions) provides a new information theoretic learning tool. This method enables nonlinear data analysis and captures higher order statistics of the data. In particular, nonlinearly related input space data clusters are distributed in different angular directions with respect to the origin of the kernel feature space (see Fig. 1). KECA thus reveals cluster structure and, hence, information about the underlying labels of the data.

Fig. 1. Projections extracted by different methods on the (red) cloud versus (black) cloud-free problem (MERIS bands 1 and 8). The RBF kernel was used, and the width parameter was set to the median distance of all training samples.

This letter is organized as follows. Section II presents KECA formulation and proposes a clustering that exploits kernel features characteristics. Section III is devoted to the analysis of the results. We use KECA as a feature-extraction method before performing clustering. In particular, the challenging problem of cloud screening from multispectral MEdium Resolution Imaging Spectrometer (MERIS) images is tackled. This letter is concluded in Section IV.



II. Kernel Entropy Component Analysis

The Rényi quadratic entropy [13] is given by
$$H(p) = -\log \int p^{2}({\bf x})\, d{\bf x} \eqno{(1)}$$
where $p({\bf x})$ is the pdf generating a data set ${\cal D} = \{{\bf x}_{1}, \ldots, {\bf x}_{N}\}$, with ${\bf x}_{t} \in {\cal R}^{d}$, $t = 1, \ldots, N$.

The aim of KECA is to maximally preserve the entropy of the input data in (1) with the smallest number of extracted features. Since the logarithm is a monotonic function, we concentrate on the quantity $V(p) = \int p^{2}({\bf x})\, d{\bf x}$, i.e., the expectation of $p({\bf x})$ with respect to the density $p({\bf x})$ itself. Here, we use Parzen windows to estimate $V(p)$ [14]
$$\hat{p}({\bf x}) = {1 \over N} \sum_{{\bf x}_{t} \in {\cal D}} K({\bf x}, {\bf x}_{t} \,\vert\, \sigma) \eqno{(2)}$$
where $K({\bf x}, {\bf x}_{t} \,\vert\, \sigma)$ is the so-called Parzen window, or kernel, centered at ${\bf x}_{t}$, and $\sigma$ is a width parameter [15]. Using the sample mean approximation of the expectation operator and assuming a positive semidefinite (psd) Parzen kernel, e.g., the Gaussian or radial basis function (RBF), we have [16]
$$\hat{V}(p) = {1 \over N^{2}} \sum_{{\bf x}_{t} \in {\cal D}} \sum_{{\bf x}_{t'} \in {\cal D}} K({\bf x}_{t}, {\bf x}_{t'} \,\vert\, \sqrt{2}\sigma) = {1 \over N^{2}}\, {\bf 1}^{\top} {\bf K}\, {\bf 1} \eqno{(3)}$$
where element $(t, t')$ of the $(N \times N)$ kernel matrix ${\bf K}$ equals $K({\bf x}_{t}, {\bf x}_{t'} \,\vert\, \sqrt{2}\sigma)$ and ${\bf 1}$ is an $(N \times 1)$ vector of ones. Hence, the empirical Rényi entropy estimate resides in the elements of the kernel matrix [17].
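As a numerical illustration of (3), the sketch below (not the authors' code; NumPy, an RBF kernel, and synthetic Gaussian data are assumptions) computes the information potential $\hat{V}(p) = (1/N^2)\,{\bf 1}^\top {\bf K}\,{\bf 1}$ and the corresponding entropy estimate:

```python
# Sketch: empirical Rényi quadratic entropy estimate from Eq. (3),
# using an RBF Parzen kernel on synthetic data (assumed setup).
import numpy as np

def rbf_kernel(X, Y, sigma):
    """Gaussian/RBF kernel matrix K[t, t'] = exp(-||x_t - y_t'||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def renyi_entropy_estimate(X, sigma):
    """Information potential V(p); the entropy of Eq. (1) is -log of this."""
    N = X.shape[0]
    # Eq. (3) uses the Parzen width scaled by sqrt(2)
    K = rbf_kernel(X, X, np.sqrt(2.0) * sigma)
    return K.sum() / N ** 2        # equals (1/N^2) 1^T K 1

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))      # synthetic data standing in for D
V = renyi_entropy_estimate(X, sigma=1.0)
H = -np.log(V)                     # Rényi quadratic entropy estimate
```

Since all RBF kernel values lie in $(0, 1]$, the potential $V$ lies in $(0, 1)$ for non-degenerate data, so the entropy estimate $H$ is positive.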

Moreover, the Rényi entropy estimator can be expressed in terms of the eigenvalues and eigenvectors of the kernel matrix, which can be eigendecomposed as ${\bf K} = {\bf E} {\bf D} {\bf E}^{\top}$, where ${\bf D}$ is a diagonal matrix storing the eigenvalues $\lambda_{1}, \ldots, \lambda_{N}$ and ${\bf E}$ is a matrix with the corresponding eigenvectors ${\bf e}_{1}, \ldots, {\bf e}_{N}$ as columns. Rewriting (3), we then have
$$\hat{V}(p) = {1 \over N^{2}} \sum_{i = 1}^{N} \left(\sqrt{\lambda_{i}}\, {\bf e}_{i}^{\top} {\bf 1} \right)^{2} = {1 \over N^{2}} \sum_{i = 1}^{N} \psi_{i}. \eqno{(4)}$$
Each term $\psi_{i}$ contributes to the entropy estimate, but since $\psi_{i}$ depends on both $\lambda_{i}$ and ${\bf e}_{i}$, certain eigenpairs contribute more than others.
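The decomposition in (4) can be checked numerically. The sketch below (an assumed NumPy setup on synthetic data, not the authors' code) computes the terms $\psi_i = (\sqrt{\lambda_i}\,{\bf e}_i^\top {\bf 1})^2$ and verifies that they sum back to $(1/N^2)\,{\bf 1}^\top {\bf K}\,{\bf 1}$:

```python
# Sketch: per-axis entropy contributions psi_i of Eq. (4) and a check
# that they reproduce the direct estimate of Eq. (3).
import numpy as np

def entropy_terms(K):
    """Return psi_i = (sqrt(lambda_i) e_i^T 1)^2 for each eigenpair of K."""
    lam, E = np.linalg.eigh(K)        # K = E D E^T, eigenvalues ascending
    lam = np.clip(lam, 0.0, None)     # guard tiny negative round-off
    return (np.sqrt(lam) * (E.T @ np.ones(K.shape[0]))) ** 2

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
sigma = 1.0
K = np.exp(-sq / (2.0 * (np.sqrt(2.0) * sigma) ** 2))  # Parzen width sqrt(2)*sigma

N = X.shape[0]
psi = entropy_terms(K)
V_from_terms = psi.sum() / N ** 2     # (1/N^2) sum_i psi_i
V_direct = K.sum() / N ** 2           # (1/N^2) 1^T K 1
```

The two quantities agree up to floating-point error because ${\bf K} = \sum_i \lambda_i {\bf e}_i {\bf e}_i^\top$.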

A. KECA Transformation

Let $\phi: {\cal R}^{d} \rightarrow {\cal F}$ denote a nonlinear map such that ${\bf x}_{t} \rightarrow \phi({\bf x}_{t})$, and let ${\mmb \Phi} = [\phi({\bf x}_{1}), \ldots, \phi({\bf x}_{N})]$. Inner products in the Hilbert space ${\cal F}$ can be computed via a psd Mercer kernel function $K: {\cal R}^{d} \times {\cal R}^{d} \rightarrow {\cal R}$
$$K({\bf x}_{t}, {\bf x}_{t'}) = \left\langle \phi({\bf x}_{t}), \phi({\bf x}_{t'}) \right\rangle. \eqno{(5)}$$
Defining the $(N \times N)$ Mercer kernel matrix ${\bf K}$ such that element $(t, t')$ of ${\bf K}$ equals $K({\bf x}_{t}, {\bf x}_{t'})$, then ${\bf K} = {\mmb \Phi}^{\top} {\mmb \Phi}$ is an inner-product (Gram) matrix in ${\cal F}$. The kernel matrix can be eigendecomposed as ${\bf K} = {\bf E} {\bf D} {\bf E}^{\top}$ as before.

A projection of ${\mmb \Phi}$ onto a single principal axis ${\bf u}_{i}$ in ${\cal F}$ is given by ${\bf u}_{i}^{\top} {\mmb \Phi} = \sqrt{\lambda_{i}}\, {\bf e}_{i}^{\top}$. Hence, the empirical Rényi entropy estimate (4) is based on the projections onto all principal axes in ${\cal F}$, ${\bf U}^{\top} {\mmb \Phi} = {\bf D}^{1/2} {\bf E}^{\top}$, where ${\bf U} = [{\bf u}_{1}, \ldots, {\bf u}_{N}]$ is the projection matrix.

We define KECA as an $m$-dimensional data transformation obtained by projecting ${\mmb \Phi}$ onto a subspace ${\bf U}_{m}$ spanned by those $m$ feature space principal axes contributing most to the Rényi entropy estimate of the data, obtaining the extracted KECA features
$${\mmb \Phi}_{eca} = {\bf U}_{m}^{\top} {\mmb \Phi} = {\bf D}_{m}^{1/2} {\bf E}_{m}^{\top}. \eqno{(6)}$$
This is the solution to the minimization problem
$${\mmb \Phi}_{eca} = {\bf D}_{m}^{1/2} {\bf E}_{m}^{\top}: \quad \min_{\lambda_{1}, {\bf e}_{1}, \ldots, \lambda_{N}, {\bf e}_{N}} \hat{V}(p) - \hat{V}_{m}(p) \eqno{(7)}$$
where the entropy estimate associated with ${\mmb \Phi}_{eca}$ is
$$\hat{V}_{m}(p) = {1 \over N^{2}}\, {\bf 1}^{\top} {\bf E}_{m} {\bf D}_{m} {\bf E}_{m}^{\top} {\bf 1} = {1 \over N^{2}}\, {\bf 1}^{\top} {\bf K}_{eca}\, {\bf 1} \eqno{(8)}$$
and where ${\bf K}_{eca} = {\mmb \Phi}_{eca}^{\top} {\mmb \Phi}_{eca} = {\bf E}_{m} {\bf D}_{m} {\bf E}_{m}^{\top}$. Note that ${\mmb \Phi}_{eca}$ is not necessarily based on the top eigenvalues $\lambda_{i}$, since ${\bf e}_{i}^{\top} {\bf 1}$ also contributes to the entropy estimate.
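A minimal sketch of the transformation in (6), under the same assumptions as before (NumPy, synthetic data, not the authors' implementation): the $m$ axes are selected by their entropy contribution $\psi_i$ rather than by eigenvalue magnitude, and the transformed entropy estimate of (8) never exceeds the full estimate of (3):

```python
# Sketch: KECA transformation of Eq. (6), selecting the m axes with the
# largest entropy contributions psi_i (which need not be the top eigenvalues).
import numpy as np

def keca(K, m):
    """Return (Phi_eca, idx): Phi_eca = D_m^{1/2} E_m^T of shape (m, N)."""
    lam, E = np.linalg.eigh(K)                 # K = E D E^T
    lam = np.clip(lam, 0.0, None)              # guard round-off negatives
    psi = (np.sqrt(lam) * E.sum(axis=0)) ** 2  # entropy terms of Eq. (4)
    idx = np.argsort(psi)[::-1][:m]            # m largest entropy contributions
    return np.sqrt(lam[idx])[:, None] * E[:, idx].T, idx

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2.0 * 2.0 ** 2))             # RBF kernel, width 2 (assumed)
Phi_eca, idx = keca(K, m=2)
```

Here `Phi_eca.T @ Phi_eca` reproduces ${\bf K}_{eca}$, so the retained entropy $(1/N^2)\,{\bf 1}^\top {\bf K}_{eca}\,{\bf 1}$ is at most the full estimate, consistent with (7).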

An alternative expression of the minimization yields
$${\mmb \Phi}_{eca} = {\bf D}_{m}^{1/2} {\bf E}_{m}^{\top}: \quad \min_{\lambda_{1}, {\bf e}_{1}, \ldots, \lambda_{N}, {\bf e}_{N}} {1 \over N^{2}}\, {\bf 1}^{\top} ({\bf K} - {\bf K}_{eca})\, {\bf 1}. \eqno{(9)}$$
The minimum value obtained from (9) is given by
$$\min_{\lambda_{1}, {\bf e}_{1}, \ldots, \lambda_{N}, {\bf e}_{N}} {1 \over N^{2}}\, {\bf 1}^{\top} ({\bf K} - {\bf K}_{eca})\, {\bf 1} = {1 \over N^{2}} \sum_{i = m + 1}^{N} \psi_{i} \eqno{(10)}$$
where $\psi_{i}$ corresponds to the $i$th largest term of (4). Note that the KPCA transformation is based solely on the top eigenvalues of ${\bf K}$ and will, in general, differ from KECA.

B. Out-of-Sample Extension

Although we have direct access neither to the mapped samples $\phi({\bf x})$ nor to the principal axes, we can obtain an explicit expression for projections onto a principal axis ${\bf u}_{i}$ in the kernel feature space ${\cal F}$. This is helpful to derive a formula for projecting an out-of-sample (test) data point $\phi({\bf x})$ onto that axis. Requiring unit norm $\Vert {\bf u}_{i} \Vert^{2} = 1$, we have ${\bf u}_{i} = \lambda_{i}^{-1/2}\, {\mmb \Phi}\, {\bf e}_{i}$. Hence
$${\bf u}_{i}^{\top} \phi({\bf x}) = \left\langle \lambda_{i}^{-1/2} \sum_{t = 1}^{N} e_{i, t}\, \phi({\bf x}_{t}), \phi({\bf x}) \right\rangle = \lambda_{i}^{-1/2} \sum_{t = 1}^{N} e_{i, t}\, K({\bf x}_{t}, {\bf x}) \eqno{(11)}$$
where $e_{i, t}$ denotes the $t$th element of ${\bf e}_{i}$. Let ${\mmb \Phi}^{\ast}$ refer to a collection of out-of-sample data points, and define the inner-product matrix ${\bf K}^{\ast} = {\mmb \Phi}^{\top} {\mmb \Phi}^{\ast}$. Then, by (11), one obtains
$${\mmb \Phi}_{eca}^{\ast} = {\bf U}_{m}^{\top} {\mmb \Phi}^{\ast} = {\bf D}_{m}^{-1/2} {\bf E}_{m}^{\top} {\mmb \Phi}^{\top} {\mmb \Phi}^{\ast} = {\bf D}_{m}^{-1/2} {\bf E}_{m}^{\top} {\bf K}^{\ast}. \eqno{(12)}$$
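The out-of-sample rule in (12) can be sketched as follows (assumed NumPy setup on synthetic data, not the authors' code). Note that applying (12) to the training data itself must recover ${\bf D}_m^{1/2} {\bf E}_m^\top$, which provides a useful consistency check:

```python
# Sketch: out-of-sample KECA projection of Eq. (12),
# Phi*_eca = D_m^{-1/2} E_m^T K*.
import numpy as np

def rbf(X, Y, sigma):
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def keca_fit(X, sigma, m):
    """Eigendecompose K and keep the m axes with largest entropy terms."""
    K = rbf(X, X, sigma)
    lam, E = np.linalg.eigh(K)
    lam = np.clip(lam, 1e-12, None)            # avoid division by zero later
    psi = (np.sqrt(lam) * E.sum(axis=0)) ** 2  # entropy terms of Eq. (4)
    idx = np.argsort(psi)[::-1][:m]
    return lam[idx], E[:, idx]

def keca_transform(X_train, X_new, sigma, lam_m, E_m):
    """Project new points via Eq. (12)."""
    K_star = rbf(X_train, X_new, sigma)        # (N x N*) matrix K* = Phi^T Phi*
    return (E_m.T @ K_star) / np.sqrt(lam_m)[:, None]

rng = np.random.default_rng(3)
Xtr = rng.normal(size=(50, 3))
Xte = rng.normal(size=(10, 3))
lam_m, E_m = keca_fit(Xtr, sigma=1.5, m=2)
Z = keca_transform(Xtr, Xte, 1.5, lam_m, E_m)  # test-set KECA features
```

Since ${\bf E}_m^\top {\bf K} = {\bf D}_m {\bf E}_m^\top$, projecting the training set reproduces the training features of (6) exactly.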

C. Remarks

Some interesting properties of KECA should be noted. From an input space perspective, $\hat{V}_{m}(p)$ preserves, as much as possible, the Rényi entropy estimate $\hat{V}(p)$ of the original data ${\bf x}_{1}, \ldots, {\bf x}_{N}$. Therefore, one can associate ${\bf K}_{eca}$ with a transformation such that the input and transformed data entropies are maximally similar. From a feature space perspective, the KECA transformation preserves the squared Euclidean length of the kernel feature space mean vector ${\mmb \mu}$
$$\hat{V}(p) = {1 \over N^{2}}\, {\bf 1}^{\top} {\bf K}\, {\bf 1} = {1 \over N^{2}}\, {\bf 1}^{\top} {\mmb \Phi}^{\top} {\mmb \Phi}\, {\bf 1} = {\mmb \mu}^{\top} {\mmb \mu} = \Vert {\mmb \mu} \Vert^{2} \eqno{(13)}$$
where ${\mmb \mu} = (1/N) \sum_{{\bf x}_{t} \in {\cal D}} \phi({\bf x}_{t})$. Similarly, the entropy estimate of the transformed data, $\hat{V}_{m}(p)$, can be expressed as
$$\hat{V}_{m}(p) = {1 \over N^{2}}\, {\bf 1}^{\top} {\bf K}_{eca}\, {\bf 1} = \Vert {\mmb \mu}_{eca} \Vert^{2} \eqno{(14)}$$
where ${\mmb \mu}_{eca} = (1/N) \sum_{{\bf x}_{t} \in {\cal D}} \phi_{eca}({\bf x}_{t})$ is the mean vector of the transformed data ${\mmb \Phi}_{eca} = [\phi_{eca}({\bf x}_{1}), \ldots, \phi_{eca}({\bf x}_{N})]$. Hence, attending to (7) and (9), KECA minimizes the difference between the squared Euclidean lengths of the mean vectors of the original and transformed data in the kernel feature space, $\Vert {\mmb \mu} \Vert^{2} - \Vert {\mmb \mu}_{eca} \Vert^{2}$. It is worth noting that, unlike in most kernel-based feature-extraction methods, centering does not make sense in KECA, since ${\mmb \mu} = {\bf 0}$ corresponds to assuming infinite-entropy input space data.
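The identity in (13) can be verified numerically by building an explicit finite-dimensional feature map from the eigendecomposition, since ${\mmb \Phi} = {\bf D}^{1/2}{\bf E}^\top$ satisfies ${\mmb \Phi}^\top {\mmb \Phi} = {\bf K}$ (a sketch under assumed NumPy and synthetic-data choices):

```python
# Sketch: numerical check of Eq. (13) -- the entropy estimate
# (1/N^2) 1^T K 1 equals the squared length of the feature space mean vector.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)                     # RBF kernel (assumed width)

N = X.shape[0]
V = K.sum() / N ** 2                      # V(p) = (1/N^2) 1^T K 1

lam, E = np.linalg.eigh(K)
lam = np.clip(lam, 0.0, None)
Phi = np.sqrt(lam)[:, None] * E.T         # explicit features: Phi^T Phi = K
mu = Phi.mean(axis=1)                     # mean vector mu = (1/N) Phi 1
```

The squared norm `mu @ mu` matches `V` up to round-off, confirming $\hat{V}(p) = \Vert {\mmb \mu} \Vert^{2}$.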

D. KECA Spectral Clustering Algorithm

KECA often leads to a data set with a distinct angular structure, where clusters are distributed more or less in different angular directions with respect to the origin of the kernel feature space (see Fig. 1). Therefore, an angle-based clustering of the kernel features ${\mmb \Phi}_{eca}$ is reasonable. In [12], the kernel entropy concept was extended to express the CS divergence between pdfs in terms of the cosine of the angle between kernel feature space mean vectors. For example, the CS divergence between the pdf of the $i$th cluster $p_{i}({\bf x})$ and the overall pdf of the data $p({\bf x})$ is given by $D_{CS}(p_{i}, p) = -\log(V_{CS}(p_{i}, p))$, where
$$V_{CS}(p_{i}, p) = {\int p_{i}({\bf x})\, p({\bf x})\, d{\bf x} \over \sqrt{\int p_{i}^{2}({\bf x})\, d{\bf x} \int p^{2}({\bf x})\, d{\bf x}}}. \eqno{(15)}$$
Via Parzen windowing, we have $\hat{V}_{CS}(p_{i}, p) = \cos \angle ({\mmb \mu}_{i}, {\mmb \mu})$, where ${\mmb \mu}_{i}$ is the mean vector of cluster $C_{i}$ in the kernel feature space. In this context, we propose an angle-based clustering cost function in terms of the kernel feature space data set ${\mmb \Phi}_{eca}$
$$J(C_{1}, \ldots, C_{k}) = \sum_{i = 1}^{k} N_{i} \cos \angle \left(\phi_{eca}({\bf x}), {\mmb \mu}_{i}\right) \eqno{(16)}$$
where $N_{i}$ is the number of samples in cluster $C_{i}$ and ${\mmb \mu}_{i}$ is its centroid. This is the kernel k-means clustering objective using an angular distance measure instead of a Euclidean distance-based measure as used in, e.g., [18]. The optimization procedure is simply the well-known k-means algorithm using angular distances in KECA space, and it is guaranteed to converge to a local optimum [9]. The spectral clustering algorithm is formalized as follows.

KECA Spectral Clustering Algorithm

  1. Obtain ${\mmb \Phi}_{eca}$ by KECA.
  2. Initialize the means ${\mmb \mu}_{i}$, $i = 1, \ldots, k$.
  3. For each training sample $t$, assign ${\bf x}_{t}$ to the cluster $C_{i}$ maximizing $\cos \angle (\phi_{eca}({\bf x}_{t}), {\mmb \mu}_{i})$.
  4. Update the mean vectors ${\mmb \mu}_{i}$.
  5. Repeat steps 3 and 4 until convergence.

In summary, we assign a kernel feature space data point $\phi_{eca}({\bf x}_{t})$ to the cluster represented by the mean vector ${\mmb \mu}_{i}$ closest in terms of angular distance.
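The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the initialization by explicit sample indices and the synthetic two-direction data standing in for ${\mmb \Phi}_{eca}$ are assumptions.

```python
# Sketch of the angle-based k-means of Section II-D: assign each point to
# the centroid with the largest cosine similarity, then update the means.
import numpy as np

def angular_kmeans(Phi, k, init_idx, n_iter=50):
    """Phi: (m, N) array whose columns are feature space points phi_eca(x_t)."""
    mu = Phi[:, init_idx].astype(float)            # (m, k) initial centroids
    N = Phi.shape[1]
    labels = np.full(N, -1)
    for _ in range(n_iter):
        # cosine of the angle between every point and every centroid
        cos = (Phi.T @ mu) / (
            np.linalg.norm(Phi, axis=0)[:, None] * np.linalg.norm(mu, axis=0)[None, :]
        )
        new = cos.argmax(axis=1)                   # step 3: nearest-angle assignment
        if np.array_equal(new, labels):            # step 5: converged
            break
        labels = new
        for i in range(k):                         # step 4: update means
            mask = labels == i
            if mask.any():                         # keep old centroid if empty
                mu[:, i] = Phi[:, mask].mean(axis=1)
    return labels

# synthetic stand-in for Phi_eca: two clusters in distinct angular directions
rng = np.random.default_rng(4)
a = np.stack([rng.uniform(1.0, 2.0, 50), rng.uniform(0.0, 0.1, 50)])  # near x-axis
b = np.stack([rng.uniform(0.0, 0.1, 50), rng.uniform(1.0, 2.0, 50)])  # near y-axis
Phi = np.concatenate([a, b], axis=1)
labels = angular_kmeans(Phi, k=2, init_idx=[0, 50])
```

Because the two synthetic groups occupy clearly separated angular sectors with respect to the origin, the angular assignment recovers them exactly, mirroring the structure KECA is expected to expose (Fig. 1).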



III. Experimental Results

This section presents the results of applying the KECA-based clustering algorithm to the problem of cloud screening from multispectral imagery.

A. Data Collection

We perform pixelwise binary decisions about the presence or absence of clouds in multispectral satellite images. In particular, we focus on cloud masking using data acquired by the MERIS instrument on board the Environmental Satellite (ENVISAT) [19]. We used a set of MERIS Level 1b images1 taken over Spain and France (cf. Fig. 3). Images from different locations and acquisition times correspond to different atmospheric conditions, illuminations, and kinds of land cover.

For our experiments, we used as input 13 spectral bands (MERIS bands 11 and 15 were removed since they are affected by atmospheric absorption) and six physically inspired features obtained from MERIS bands in a previous work [20]. The dimension of the data is thus $d = 19$. The features capture general properties of clouds: brightness and whiteness in the visible and near-infrared spectral ranges, along with atmospheric oxygen and water-vapor absorption. The main problems in optical cloud screening come from thin clouds and sparse cirrus; low-height clouds; bright, white, and cold surfaces; and high-altitude regions with low pressure. Therefore, in our experiments, the classification complexity of each MERIS image increases in the following order: the first Barrax image (BR-2003-07-14) presents a bright and thick cloud in the center of the image; the second Barrax image (BR-2004-07-14) presents small clouds over sea in the right part of the image; and, finally, the France image (FR-2005-03-19) presents not only opaque clouds over southern and northern France but also snowy mountains at various altitudes.

Fig. 2. (Top left) Average and standard deviation of the estimated kappa statistic, and individual kappa statistics for the three considered cloudy scenes, as a function of the number of training samples for different clustering methods.

B. Numerical Comparison

The proposed spectral clustering method was applied to the MERIS images with the aim of identifying clouds. The algorithm essentially performs the (angular) k-means algorithm on KECA features using the cosine similarity measure. The number of clusters was fixed to $k = 2$, assuming that the two clusters found using the extracted features correspond to cloud-free and cloudy areas, respectively. That is, we assume that a good feature-extraction method is capable of capturing the data structure, after which a simple clustering algorithm should distinguish the two classes in the images. After clustering, in order to compute classification accuracies, the class label of the majority of the training samples belonging to a cluster was propagated to all the test samples in the cluster. A total of 30 000 pixels (10 000 per image) were manually labeled. Varying numbers of labeled samples, $N \in \{50, \ldots, 1000\}$, were used to train the models, and results were computed over the remaining samples. The labeled data sets were balanced in the sense that the numbers of positive and negative examples were roughly the same: The training samples were selected randomly while ensuring the same number of samples in each class. Results were compared with those of standard k-means clustering, KPCA plus k-means, and kernel k-means [4]. In all kernel methods, the RBF kernel was used, and the width parameter was tuned for each method following a grid search in the range $[\sigma_{d} 10^{-2}, \sigma_{d} 10^{2}]$, where $\sigma_{d}$ was the median distance between all training samples, and the best kernel width was selected by cross-validation. Differences between the KPCA and KECA methods decrease as the number of extracted features increases. Therefore, in the experiments, the number of extracted features for KPCA and KECA was fixed to $m = 2$ to stress the differences between both methods.
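The median-distance anchoring of the grid search can be sketched as below (assumed NumPy setup with random data of the same dimension $d = 19$; the grid resolution of nine points is an illustrative choice not stated in the text):

```python
# Sketch: median-distance heuristic sigma_d and the width grid
# [sigma_d * 1e-2, sigma_d * 1e2] used to tune the RBF kernel.
import numpy as np

def median_sigma(X):
    """Median pairwise Euclidean distance over distinct sample pairs."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    iu = np.triu_indices(X.shape[0], k=1)      # distinct pairs only
    return np.median(np.sqrt(sq[iu]))

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 19))                 # d = 19 as in the experiments
sigma_d = median_sigma(X)
grid = sigma_d * np.logspace(-2, 2, 9)         # candidate kernel widths
```

Each candidate width would then be scored by cross-validation, as described above; the logarithmic spacing keeps the search symmetric around $\sigma_d$.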

Fig. 2 shows the kappa statistic results over ten realizations for all images. On average (top left subfigure), the proposed algorithm outperforms standard k-means (average gain of +25%) as well as kernel k-means and KPCA (gain of +15%). Since the scenes differ notably in complexity, we also show the individual performances: As expected, the highest classification accuracy is obtained for the first Barrax image $(\kappa_{max} > 0.9)$, followed by the second Barrax image $(\kappa_{max} \sim 0.75)$, with the France image yielding the lowest accuracy $(\kappa_{max} \sim 0.5)$. The general trends suggest that more training samples improve the results, except for the France image. This behavior was already observed in [21], where a relatively high number of labeled samples was required to correctly detect clouds.

C. Classification Maps

We carry out a quantitative and a visual analysis of the classification maps of the test images. The outputs are compared with a ground truth obtained in [20], in which the cloud labeling was supervised by an expert. Fig. 3 shows the comparison between the ground truth and the maps obtained with all the methods for all images.

Fig. 3. (First column) Color composite of MERIS FR images over Spain (BR-2003-07-14 and BR-2004-07-14) and France (FR-2005-03-19) and comparison of the ground truth with the clustering maps of k-means, kernel k-means, and the clustering of the top two KPCA and KECA features $(m = 2)$. Discrepancies with the ground truth are shown in red when the proposed methods detect cloud and in yellow when pixels are classified as cloud free.

In KECA, the computational cost is the same as in KPCA since, in the training phase, it only depends on the eigendecomposition of the kernel matrix, typically ${\cal O}(N^{3})$. For the test phase, the features are extracted following the out-of-sample procedure in (12) over the whole image, and the classification is based on the cluster centers obtained using 1000 training samples. We observed that clustering KECA features according to a nearest-angle criterion rather than the Euclidean distance was a good choice (results not shown), which is in line with the clustering principle of the method described in Section II-D.

Classification agreement is depicted in white for cloudy pixels and in blue for cloud-free pixels; discrepancies are shown in red when the analyzed methods detect cloud and in yellow when pixels are classified as cloud free. Overall classification accuracies (OAs) higher than 90% are obtained in most cases, but the lower values of $\kappa$ in some cases point out that results are unbalanced due to the misclassification of a significant number of pixels of one class. The best Cohen's kappa result for each experiment is highlighted in bold, which corresponds to the KECA method in the three test sites.

The numerical results in Fig. 2 are confirmed here by visual inspection. The Barrax images represent an easy cloud screening problem, and KECA clustering shows good agreement. On the contrary, the France image presents snow in the Alps, the Pyrenees, and the Massif Central, which makes prediction really difficult because snow is misclassified as cloud. In this test site, none of the methods provides an accurate cloud mask, since clouds and snow are grouped in the same cluster owing to their similar spectral features. Unfortunately, this is a general drawback of unsupervised methods when the spectral clusters and the defined land-cover classes are not directly related. In this context, it is worth noting that the other clustering methods produce a significant number of false positives (red) and false negatives (yellow) for some of the scenes. For example, k-means simply groups bright pixels, which is quite robust but makes it impossible to detect most of the cloud borders and small clouds (yellow pixels in BR-2003-07-14 and BR-2004-07-14) and misclassifies snow as cloud (red pixels in FR-2005-03-19). On the other hand, kernel methods can produce more complex clusters, but the Euclidean clusters found by kernel k-means in the transformed space or the directions of maximum variance of KPCA might not be related to the defined classes. This can be observed for kernel k-means in BR-2004-07-14 and for KPCA in BR-2003-07-14 and FR-2005-03-19, where some dark and wet areas are grouped with clouds due to the effect of the atmospheric input features. These results demonstrate the validity of the proposed nonlinear feature extraction for remote sensing applications, which maximizes the information content by maximizing the preserved Rényi entropy.



IV. Conclusions

This letter proposed KECA for clustering remote sensing data. The method extracts nonlinear features that maximally preserve the entropy of the input data with the smallest number of features, and the angular clustering of the mapped data reveals the data structure in terms of maximum divergence between cluster distributions. An out-of-sample extension of the method was also presented, which is mandatory in remote sensing image processing due to the amount of data to be processed. Good results were obtained on cloud screening from MERIS images. The code is available online for interested readers. Future work will be devoted to including labeled and noise information in KECA and to testing the method in more remote sensing applications.


Acknowledgment

This work was supported in part by the Spanish Ministry for Science and Innovation under Projects AYA2008-05965-C04-03 and CSD2007-00018 and by the UV-INV-AE11-41223 Project.

L. Gómez-Chova and G. Camps-Valls are with the Image Processing Laboratory, Universitat de València, C/ Catedrático Escardino, E-46980 Paterna (València), Spain.

R. Jenssen is with the Department of Physics and Technology, University of Tromsø, N-9037 Tromsø, Norway.

Color versions of one or more of the figures in this paper are available online.

1All scenes correspond to MERIS full spatial resolution (FR) images with a pixel size of 260 m across track and 290 m along track and an image size of 1153 × 1153 pixels (300-km swath by 334-km azimuth) in scenes over Spain and with an image size of 2241 × 2241 pixels (582-km swath by 650-km azimuth) in the other scene.

