Avoiding Invalid Transitions in Land Cover Trajectory Classification With a Compound Maximum a Posteriori Approach

Classifying remote sensing time-series in land cover trajectories can provide essential information about ecosystems functioning and about the impacts of natural phenomena or human activities over the environment. The existing approaches commonly used for this task can present serious drawbacks, such as the possibility to derive invalid trajectories, use of complex data inputs, and several classification steps. To provide a simple method for land cover trajectory classification, we present a novel algorithm named Compound Maximum a Posteriori (CMAP) classifier. CMAP incorporates the knowledge of land cover dynamics and the information of multi-temporal data sets to produce only valid land cover trajectories using a global generative classification approach and simple inputs. CMAP was tested in two case studies, in which we compared land cover trajectories obtained by CMAP to those obtained using the traditional Maximum Likelihood (ML) classifier in a post-classification comparison approach. In the first case study, we classified 6 images from the same sensor, using the same land cover legend. In the second case study, we classified 3 images from different types of sensors, using different land cover legend levels. Because of its formulation, CMAP does not return invalid transitions/trajectories as classification results. The use of ML and post-classification comparison, on the other hand, resulted in invalid land cover trajectories in more than 50% of the used images in both case studies. Furthermore, the use of CMAP leads to better accuracy indexes for land cover classification of each date and reduces the classification noise.


I. INTRODUCTION
The information about the dynamics of land use and land cover (LULC) is essential to understand how ecosystems function over time, as well as to better assess the impacts of natural phenomena or human activities over the environment [1], [2]. Land cover refers to the biophysical state of the terrestrial surface, whereas land use involves the activities carried out in a given area or the intention of manipulation [3]. Concerning LULC changes, we consider that the identification of land cover classes in the same unit of analysis in two consecutive times defines one land cover transition.
The associate editor coordinating the review of this manuscript and approving it for publication was Stefania Bonafoni .
A sequence of land cover transitions forms a land cover trajectory, also defined as the succession of land cover types in a given unit of analysis in three or more observed times [4].
One of the most established and widely used methods to characterize land cover trajectories consists of independently classifying remote sensing images at varying points in time and then stacking these classifications [5]- [7]. This method, known as post-classification comparison, is often criticized because the accuracy of the mapped land cover trajectory depends on the quality of the land cover classifications [1], [6], [8]. Therefore, it is impossible to accurately measure changes occurring at a lower rate than the combined errors of the individual classifications [8]. Errors in the land cover classifications can result in trajectories that would never VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ occur in the real world, defined here as invalid land cover trajectories. Invalid trajectories are usually considered as classification errors and masked from analysis or corrected in post-classification processes. Methods to correct these trajectories include manual edition and temporal filter usage. In this case, classes that result in invalid transitions are substituted by others following a user-determined set of rules, such as 'areas previously deforested and classified as forest should be renamed as secondary vegetation' [9]. These rules can consider or not the spectral characteristics of data and change information [10]. Problems caused by invalid trajectories can be avoided if the temporal features of remote sensing images are included in the analysis [11]. For instance, it is possible to classify a multi-temporal set of images directly into land cover trajectories, with either unsupervised (reference samples not needed) or supervised (reference samples of trajectories needed) methods [6]. However, both methodologies present some problems. The use of unsupervised algorithms usually offer very complex results, and collecting reference samples of land cover trajectories over a set of multi-temporal images may be laborious and requires full understanding of the characteristics of different kinds of change [7].
Nonetheless, the knowledge about the validity of land cover transitions on a given multi-temporal study has been included both to improve [10], [12], [13] as well as to evaluate the classification of land cover trajectories [14]. One common approach to classify land cover trajectories is to generate a base map, which is updated with change information about the area. In this case, one considers only the types of land cover transitions that are actually feasible in the studied area and time gap [12], [15]. Change detection and land cover classification are either done visually, so using it for very dense time series may be laborious and subjective, or based on the spectral characteristics of data, which limits the use of these methods for multi-sensor analysis when using very different data, like those obtained by the joint use of optical and Synthetic Aperture Radar (SAR) sensors. It is also possible to use external data as masks that condition the classification. As an example, in the Programme for the Estimation of Deforestation in Brazilian Amazon (PRODES), areas previously mapped as deforestation are masked from analysis, so these areas can not be classified as newly deforested in any given year. The knowledge of deforested areas mapped by PRODES is also used by the Land Use and Land Cover Mapping of Deforested Areas in Legal Amazon Project (TerraClass) [16], so features of secondary vegetation are only mapped in previously deforested areas.
Specifically for dense time series, it is also possible to segment the image time series considering a given statistical rule so each segment represents a period of analysis of no change [7].These segments of no change are then classified as land cover classes, using either traditional supervised classifiers, those based on the distance between time series [17], or those that consider the spectral variation of a feature over time [18].
Given the presented drawbacks of the commonly used approaches for land cover trajectory classification, it would be convenient to have a classifier that: 1) returns only valid land cover trajectories; 2) uses automatic supervised classification approaches that require reference samples of land cover classes in each time as input, instead of reference samples of transitions/trajectories; 3) incorporates the knowledge of land cover dynamics and the information of multi-temporal data sets in a simple way; 4) is also applicable to multisensor data sets; and 5) can be used for sparse time series. In order to cover these aspects, we present a novel algorithm named Compound Maximum a Posteriori (CMAP) classifier, derived from a global generative classification approach, along with two case studies with different examples of land cover trajectory classification. The proposed algorithm is described in Section II. The data and methodology employed for the case studies are presented in Section III. The results are presented and discussed in Section IV. Conclusions and considerations for future work are drawn in Section V.

II. COMPOUND MAXIMUM A POSTERIORI ALGORITHM
Consider a single geographic position in which one wants to estimate the trajectory s = {ω k 1 1 , . . . , ω k t t , . . . , ω k T T }, T is the length of the time sequence and ω k t in which ω k t t is the actual class at time position t of s and k t is the indicator of the class in the set t that holds the K t possible classes on time t. Additionally, s ∈ = 1 ⊗ . . . ⊗ t ⊗ . . . ⊗ T , in which ⊗ is the Cartesian Product of sets. A given observation vector X = { x 1 , . . . , x t , . . . , x T } contains the T temporal observations that can indicate the trajectory composition, like the digital numbers in correspondent pixels in a given set of images X, and X ∈ X.
A generative method [19] for trajectory classification can be formulated as: Based on the definition of the conditional probability: s = arg s max(P( X |s) × P(s), s ∈ , X ∈ X). (2) in which P(s) is the a priori probability of a trajectory s. Supposing that the observations are independent in time and that each one depends only on the observed object, we have: As an example, for the Gaussian distribution, P( x t |ω k t t ) is calculated by [20]: (4) in which B is the number of observed variables (like the channels of the image) of the time t, m k t and k t are respectively the mean vector and the covariance matrix of the class ω k t t , | · | is the determinant and is the transpose.
Observe that Equation 2 returns the same trajectory as concatenating the Maximum Likelihood classifications in each point of time if the a priori probability of all trajectories are equal. The rule expressed in Equation 3, once the a priori term is removed, is called Compound Likelihood (CML). Once Equation 3 is inserted in Equation 2, we have what we refer to as the Compound Maximum a Posteriori (CMAP) estimation of a trajectory.
In eqnarray 2, P(s) is the a priori probability of a trajectory s and can be given by: Assuming the premise that the a priori probability of a class in a given time depends only on the class in the previous time, Equation 6 is the basic assumption of Markov chains [20], adapted here for trajectory classification purposes. The conditional probabilities of Equation 6 can be represented by a matrix that tabulates the probability of each transition between legends used in two consecutive dates, or a transition matrix for simplicity, as illustrated in Fig. 1. Given the premise of dependence to a previous time, this matrix will be referred here as 'forward transition matrix'. Following the concepts of [19], this structure is considered discriminative. In the forward transition matrix, we consider that Equation 7 is based on a hard classification approach, in which the summation includes all possible classes and these are mutually exclusive. In the example presented in Fig. 1, it means that the sum of probabilities in each line of the matrix equals to 1. However, it is also possible to assume that the a priori probability of a class in a given time depends only on the class in the posterior time. In this case and the transition matrix, referred as 'backward transition matrix', changes accordingly to depict the values of . Nonetheless, one may be interested in defining the values of P(s) based on less restrictive assumptions than the discriminative transition matrices. Here we propose a simplified model, in which P(s) = 0 for invalid trajectories and P(s) = 1/N s for valid trajectories, in which N s is the number of valid trajectories.

III. MATERIALS AND METHODS
In this section, we present the data and methodology for two applications of CMAP to generate land cover maps. Case Study 1 corresponds to what we considered a usual case of land cover trajectory classification: the classification of 6 images, one for each year between 2005 and 2010, from the commonly used sensor Thematic Mapper (TM) onboard Landsat5, with the same land cover class legend. Case Study 2 corresponds to the classification of images from different sensors and using different levels of a land cover class legend for each date/image. For this case study, we used one image from each of the following satellite/sensors: the Advanced Land Observing System (ALOS)/Phase Array L- Band Synthetic Aperture Radar (PALSAR), Landsat5/TM, and Earth Observer 1 (EO-1)/Advanced Land Imager (ALI), respectively from years 2008, 2010, and 2013. The study area, data, and methods used for each case study are described in the following sections.

A. STUDY AREA
Both case studies were conducted in a region along the Cuiabá-Santarém highway (BR-163) in Belterra, Pará state, within the Brazilian Amazon. This area is illustrated in Fig. 2 along with political and natural limits. This is an area of humid tropical climate with a relatively flat relief. The original vegetation cover is humid tropical rainforest, with the presence of woody lianas, palm trees and epiphytes [21]. Due to the occupation process, it is possible to observe patches of secondary vegetation in diverse stages of development, pasture and agriculture within the forest matrix. Additionally, an important feature in the region is the Tapajós National Forest, a forest reserve created by the Act number 73,684 of February 19, 1974, with approximately 600,000 ha. The study area has been selected due to the available information about land cover classes over the years, collected in field works realized by the research team between August and September of years 2009, 2010, and 2013, as well as those published in varied studies [22]- [24].

B. LAND COVER LEGENDS AND TRANSITION MATRICES
Ten main land cover classes were identified in the study area [24] and organized in four different legend levels. The definition of each land cover class is presented in Table 1 and the defined legends are illustrated in Fig. 3. The legend with 6 classes was used to classify all dates from Case Study 1. The remaining legend levels, with 5, 10, and 8 classes, were Labeled samples of each class of a correspondent legend were manually collected over the images, considering the field information, maps from the TerraClass project [16], and  visual interpretation of ancillary images. These samples were randomly divided into training and test sets. We collected at least 147 samples (pixels) for each class/year for the training set and at least 95 samples/class/year for the test sets. The class Agriculture was not found in the study area for years 2006 and 2009, so samples of the remaining 5 classes used in Case Study 1 were collected for these years. As an example, the labeled samples collected for 2010 using the most detailed legend level are presented in Fig. 4. In this figure, the samples are illustrated over the band 5 of the Landsat5/TM image of the same year.
We used both the simplified and the discriminative approaches for calculating the a priori probabilities of trajectories in each case study. For the simplified approach, we considered as valid trajectories those composed only by valid transitions, as illustrated by Figs. 5 and 6. For the discriminative approach, we considered the backwards transition matrices depicted in Figs. 7 and 8. In these matrices, the sum of each column equals to 1 (Equation 9). In Equation 8,

C. REMOTE SENSING IMAGES
The used images are described in Table 2. The Landsat5/TM images were collected from the United States Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center Archive at L1 processing level and tier 1 terrain correction. These images present co-registration with Root Mean Square Error (RMSE) less than 12 m, considered a suitably geometric correction for time-series analysis. The image from ALOS/PALSAR was acquired from the Japan Aerospace Exploration Agency (JAXA) in Fine Beam Dual   pixel size, re-sampled using the nearest neighbor approach to preserve the statistical properties of the images. All images were preprocessed in order to be co-registered and to have the same pixel size and geographic projection, as summarized by Fig. 9 and explained as follows.
Since the Landsat5/TM images were downloaded suitably geometrically corrected, preprocessing of these images comprised a feature selection step for both case studies, done in order to remove redundant or noisy spectral bands from the analysis. For Case Study 2, there is the additional re-sampling step. Feature selection was based on the minimum Jeffries-Matusita (JM) distance [25] between all pairs of classes of the 10 class legend used for Landsat5/TM classification in Case Study 2 (Fig.3). Bands 2, 4, and 5 were selected for classification. The ALOS/PALSAR image was first geocoded to UTM (zone 21S)/WGS84, converted to ground range, and transformed to intensity in ASF MapReady 3.0 software. It was then orthorectified using Shuttle Radar Topography Mission version 4 (SRTM4) data and the Rational Function Model (RFM) from the PCI Geomatics 13.0 software (with less than 7.5 m of RMSE), and re-sampled to square pixels of 15 m. The orthorectified ALOS/PALSAR image was then filtered by the Stochastic Distances Nonlocal Means (SDNLM) filter [26] and converted to amplitude values. The parameters used in SDNLM are filtering window=5 × 5 pixels, patch=3 × 3 pixels and confidence level=90%. For analysis, we used both the HH and HV polarizations.
The EO-1/ALI image was geometrically corrected using first order polynomial transformation and control points col- lected using the Landsat5/TM image from 2010 as reference, with less than half a pixel (15 m) of RMSE. Bands 4', 5' and 7 from the EO-1/ALI image were selected for use based on the minimum JM distance between all pairs of classes from the 8 class legend used to classify this image.
In Study Case 1, areas under clouds or cloud shadows at any given point were masked from analysis (2485 pixels, less than 0.3% of the image subset). The image subsets used for Study Case 2 were cloud-free.

D. IMAGE CLASSIFICATION AND EVALUATION OF RESULTS
The land cover trajectory classification process is similar for the two case studies and is illustrated in Fig. 10. In both cases, CMAP was calculated considering the training samples collected over each image and using Gaussian distribution. We also classified each image using the traditional ML pixelwise classifier, in order to verify the impact of including the a priori probabilities of each land cover trajectory. We implemented CMAP and ML classifiers in R language and processed the classifications for both case studies in a computer with Windows 7, Intel R Core TM i7-4770 (3.40 GHz) Processor, and 8 GB of RAM memory. Processing time for each set of classifications using a non-optimized version of the code varied between 12 and 17 minutes. We also independently implemented a version of the algorithm in Interface Description Language (IDL) which we used to verify the correctness of results acquired using the R version.
Classified images were evaluated using confusion matrices and global and class-focused accuracy indexes. Both the confusion matrices and the derived accuracy indexes are calculated based on the Monte Carlo approach and evaluated by the average and standard deviation values. For each classified image, we randomly selected 50 samples per land cover class from the test set and constructed one confusion matrix, from which the Overall Accuracy (OA), Global Kappa Index, Pro- ducer's Accuracy (P.A.) and User's Accuracy (U.A.) indexes were calculated. This process was repeated 100 times for each land cover classified image, with varying sample selection, resulting in 100 values for each index, which were compared using the Wilcoxon unpaired test at 1% significance level. Additionally, we also verified the number and placement of disagreements in pixel classification considering the ML and CMAP classifiers, as well as the number of invalid transitions obtained when stacking ML classifications. We also checked the eventual classification of invalid trajectories from CMAP results, to further validate the algorithm.

A. CASE STUDY 1
As previously mentioned, a common way to obtain land cover trajectories would be stacking land cover classifications obtained by the independent use of supervised classifiers, such as ML. The stacking of ML classifications in Case Study 1 resulted in 54.9% of the observed pixels classified as an invalid trajectory, that is, a land cover trajectory that contains at least one impossible transition resulting from obvious classification errors. The invalid trajectories obtained using ML are illustrated in Fig. 11. In CMAP, these trajectories are non-existent.
The number of pixels in which the traditional ML classification and the CMAP ones disagreed is illustrated in Table 3. As can be seen, the use of the discriminative transition matrix to calculate the a priori probabilities of trajectories resulted in more pixels with disagreement than the use of the simplified approach. This was expected, since the simplified approach effectively produces results similar to concatenating ML classifications but only considering possible transitions, whereas the discriminative matrix has the potential to change the classification at each time by weighting possible transitions differently.
A subset of the classifications of years 2006 and 2007, respectively the ones in which fewer and more disagreements occurred, are presented in Fig. 12. Notice that in ML classifications there are many pixels misclassified as Developed Secondary Vegetation scattered among those classified as Forest.
In CMAP results, pixels classified as Developed Secondary Vegetation are concentrated in features located in previously deforested areas, although the information about the actual event of deforestation was not included in the analysis. As can be seen, just by avoiding invalid or improbable transitions, the use of CMAP reduces the classification noise, without considering a contextual approach.
As a secondary effect of only classifying valid trajectories, CMAP also provides improved land cover classification for each date. The average values of the Global Kappa indexes for the classifications obtained using CMAP are presented in Table 4, along with the values for ML traditional classifications, for comparison purposes. In this table, the highest value for each Global Kappa, as well as those statistically similar at 1% of significance level, are highlighted in bold font. The complete set of accuracy indexes for the classifications of Case Study 1 is presented in Appendix A.
The use of CMAP led to higher values of Global Kappa and Overall Accuracy indexes than the traditional ML classifier for 5 of the 6 classified images of Case Study 1. Only for 2006, ML and CMAP with the simplified approach for the trajectory probability calculation obtained statistically similar  overall indexes. Regarding the approach used for calculating the trajectory probability, the simplified one achieved similar overall indexes than the one based on the discriminative matrix for 5 of the 6 images. The average Overall Accuracy values differed in less than 0.02 for all CMAP classifications. Improvements in overall classification indexes occur mainly due to the improvement of the classification of the classes Initial Secondary Vegetation, Developed Secondary Vegetation, and Forest, as expected given those are the classes involved in the majority of the defined invalid transitions. The decrease in the misclassification of these classes can be assessed in the average confusion matrices of the classifications for 2007 presented in Table 5. In this sense, we also perceive a significant improvement of the User's Accuracy of the classes Developed Secondary Vegetation and Forest in CMAP classifications, compared to the ML based ones. It happens because the transitions that result in these classes are limited in CMAP, so the number of pixels of other classes 98794 VOLUME 8, 2020  misclassified as Developed Secondary Vegetation and Forest tends to reduce. This is exemplified by the average User's Accuracy values of the class Developed Secondary Vegetation illustrated in Table 6. As expected, classes not limited in CMAP in this case study (Bare Soil, Agriculture, and Pasture) are very similar in all the analyzed classifications of each date, as indicated by statistically similar Producer's and User's Accuracy average values for the majority of classifications (Appendix A).

B. CASE STUDY 2
The validity of land cover trajectories obtained by stacking the ML classifications of Case Study 2 is illustrated in Fig. 13. The percentage of pixels in which the traditional ML classification and the CMAP ones disagreed is presented in Table 7.
The average values of the Global Kappa for the classifications of Case Study 2 are illustrated in Table 8. The highest  value for each index, as well as those statistically similar at 1% of significance level, are highlighted in bold font. The complete set of accuracy indexes for the classifications of Case Study 2 is presented in Appendix A. VOLUME 8, 2020  Approximately half of the pixels (50.1%) in the used images would be classified as an invalid land cover trajectory by the stacking of the ML classifications in Case Study 2. Because of its formulation, all land cover trajectories obtained by CMAP are valid. Similarly to Case Study 1, we also obtained more accurate land cover classification using CMAP rather than using ML, with classes involved in invalid transitions being the most changed.
The improvement in classification results is particularly expressive for the classification of the ALOS/PALSAR image of 2008 (gain of 30.4% in the average Kappa value and more than 30% of changed pixels in relation to ML classification) mainly due the contribution of optical data in other dates to decrease the confusion between Secondary Vegetation and Forest, as illustrated in Fig. 14 and Table 9. As is widely known, optical data may be greatly affected by cloud cover whereas SAR data can be acquired almost independently of atmospheric conditions [2]. Therefore, in areas frequently covered by clouds, such as tropical forests, it is probable that only SAR data may be available on certain periods of the year. This type of data usually delivers poorer results for land cover classification than optical data in tropical forests, mainly due to the confusion between primary and secondary forests [23], [27], and more refined data processing and classification methods are necessary [28]. However, CMAP allows the use of information of optical data available in other dates to better classify the SAR image on the date of interest, even allowing the use of different legends in order to meet accuracy requirements.

V. CONCLUSION
This study presented the Compound Maximum a Posteriori (CMAP) classifier and two applications for land cover trajectories classification. For both applications, CMAP was capable of incorporating the knowledge of land cover dynamics to classify multi-temporal sets of remote sensing images directly into valid land cover trajectories. These results were compared to those obtained using the post-classification comparison approach and the traditional Maximum Likelihood (ML) classifier.
For the presented case studies, at least 50% of the land cover trajectories generated by post-classification comparison and ML were considered invalid. By its own design,  CMAP does not return invalid transitions/trajectories as classification results. In a way, CMAP expands the ML classifier by adding a multi-temporal a priori probability of trajectories and excluding invalid transitions from the analysis. The main difference between CMAP and ML is that in CMAP the land cover trajectory is classified as a whole, so the decision about the attributed land cover classes is done after all probabilities of each time are calculated. In ML, decisions are made independently for each date. It means that, for each image being analyzed, CMAP is able to navigate among the class likelihoods in order to choose the highest one that results in a valid transition and not simply the highest one, which is not viable in hard algorithms like the traditional ML classifier. In this sense, we verified that not only the use of CMAP leads to better accuracy indexes, but it also reduces the classification noise for individual dates.
CMAP is a fairly direct way of introducing previous knowledge into the classification process. The only additional input in relation to the traditional ML classifier is the calculation of the a priori probability of trajectories, which may be as simple as defining which transitions are valid and which are not. We proposed two ways to calculate these probabilities, namely the simplified approach and the one based on discriminative transitions matrices. Although both approaches resulted in land cover classification with similar overall accuracy indexes, the discriminative approach may be more suitable for difficult classification situations, because of the possibility of weighting each transition differently. Another interesting feature of CMAP is that it allows for different legends being used for each image, which may be interesting in cases of low class separability for specific data sets, as illustrated by the classification of one SAR image.
A few limitations were identified in the way we applied the CMAP classifier in the present study. The first one is the use of land cover training samples collected for each image to be classified which may be very expensive for studies with a high number of images. However, parallel studies show good results in using the spectral information from one Landsat5/TM image to train an ML classifier that will be used to classify a different calibrated image from the same sensor [29]. This process is known as signature extension, spectral extensibility or generalization of training samples [30]. Thus it is also expected that generalizing training samples for use in CMAP will lead to accurate classification results.
The second perceived limitation regards areas under clouds/cloud shadows that were masked from analysis in this study. Although these areas were small, greater areas are expected when using a higher number of optical images, even considering image composition techniques. How to properly integrate areas under clouds/shadows in CMAP and the impacts of generalizing training samples and image composites in CMAP will be investigated in future studies.
Additionally, we focused on simple assumptions about the land cover transitions in the present study, such as if these were possible or impossible given the class definition and time gap. Although we have found promising results using CMAP this way, future studies could benefit from more rigorous methods of calculating the used transition matrices, as well as analysis in different classification scenarios (study area, classes of interest and type of data).

APPENDIX A ACCURACY INDEXES
In this section we present the full set of accuracy indexes calculated for the classifications analyzed in this manuscript. The indexes calculated for the classifications from Case Study 1 and Case Study 2 are, respectively, illustrated in Tables 10 and 11. In these tables, the highest value for each index, as well as those considered statistically similar using the Wilcoxon unpaired test at 1% of significance level, are highlighted in bold font.