Interpretable POLSAR Image Classification Based on Adaptive-Dimension Feature Space Decision Tree

The decision tree method has been applied to POLSAR image classification because it can interpret scattering characteristics while achieving good classification accuracy. Compared with popular machine learning classifiers, the decision tree approach can explain the scattering process of certain types of targets through the polarimetric features used at the tree nodes. Besides interpretability, a decision tree can be transplanted to other data sets with the same terrain types without a training process, since the polarimetric features are inherently connected to the physical scattering properties. Currently, decision tree based classifiers typically employ a single polarimetric feature at each node of the tree. Combining two or more polarimetric features at a node to form a two- or higher-dimensional feature space is expected to improve the classification result. In this way, classes that cannot be discriminated with one feature may be separated in the space constructed from several features. However, this inevitably increases the computational burden, and not all nodes require a high-dimensional feature space to achieve high classification precision. Therefore, in this article we propose that the dimension of the feature space used at the decision tree nodes be adaptively changed from one to three, according to the separability of the classes under each node. The developed classification method is examined with the classical AIRSAR data of the Flevoland area in the Netherlands, as well as GaoFen-3 data of Hulunbuir in China. The experiments show that its classification performance is superior to that of the fixed-dimension feature decision tree methods, with a reasonable computation time that is shorter than that of the three-dimensional tree. In addition, the transferability of the polarimetric features obtained by the decision tree is preliminarily demonstrated by applying them to another AIRSAR data set.

features has become one of the central tasks of POLSAR image processing [7]. Combining polarimetric features with different classification methods to achieve more effective POLSAR image classification is an important research field. Depending on whether data labels and manual intervention are required, classification methods can be divided into two main types: supervised and unsupervised. Unsupervised methods classify data according to their statistical characteristics without prior knowledge; examples include the complex Wishart classifier [8]-[10], k-means clustering [11], [12], fuzzy c-means clustering [13], and Expectation Maximization [14]. Unsupervised methods cannot obtain satisfactory classification accuracy when the difference in scattering characteristics between targets is small. Many supervised classification methods have also been proposed, such as Support Vector Machine (SVM) [15], Random Forest (RF) [16], [17], deep learning [18], [19], and Nearest-Regularized Subspace (NRS) [20], among others [21]. These supervised methods are data-driven: although they exploit differences in polarimetric features to classify targets, they do not map the scattering characteristics of targets to specific features.
Compared with those popular machine learning and deep learning methods, the classical decision tree approach has its own advantages in several aspects. Based on a review of the related literature, these advantages are briefly summarized as follows.
1) Interpretability: Polarimetric radar data can capture geometrical and bio-physical information about targets, which is the key capability for realizing unsupervised classification. The geometrical and bio-physical properties of targets determine their radar scattering process. In the decision tree method, the polarimetric features used at the tree nodes for discriminating certain categories are explicit, so it is possible to explain the scattering characteristics of these categories.
2) Transferability: Generally, the observation capability of a given POLSAR sensor remains stable, so the polarimetric features used for classification can probably be applied, without a training process, to data acquired by the same sensor over a different area. Furthermore, they may also be applied in a suitable way to data from different sensors, because similar targets show similar scattering properties and polarimetric features.
3) Adjustability: The hierarchical structure of the decision tree exposes the role of the polarimetric features used and their relations with terrain types. The classification accuracy of a type of interest can therefore be increased by adjusting the tree structure, such as the branch order or the features employed, according to practical needs.
At present, almost all decision tree based classifiers employ only a single polarimetric feature at the nodes of the decision tree. Zhang et al. used a one-dimensional feature decision tree to classify crops and explained the scattering mechanism of the crops through the classification results [22]. Zhang and Yan used 72 polarimetric features as input to establish a one-dimensional feature decision tree, and pointed out that as the number of polarimetric features increases, the useful information for classification may also increase and the classification results of terrain objects tend to become more accurate [23]. In [24], Jain and Singh proposed a decision-tree-based approach for land cover classification of Radarsat-2 data, in which multiple inequalities containing polarimetric parameters are used but no feature space is formed. Thakur et al. developed a decision tree based on a separability index to classify ALOS-PALSAR data [25]. Phartiyal et al. analyzed the polarimetric signature to decide the individual class boundary values, which helps in building a decision tree based classification technique [26]. The above methods focus on feature selection or optimization before the decision tree algorithm, not on improving the tree nodes.
In decision tree applications, research has mostly focused on extending the feature set, while the nodes of the decision tree still adopt a single polarimetric feature. The traditional one-dimensional feature decision tree has poor classification ability among targets with similar scattering characteristics. Because of the complexity of their physical properties, actual targets often show a mixture of more than one kind of scattering characteristic. Therefore, the scattering characteristics of an actual target cannot be explained completely by one polarimetric feature. A one-dimensional polarimetric feature can discriminate the basic types of scattering mechanism well, such as surface scattering, double-bounce scattering, and volume scattering. However, it cannot separate vegetation types very well, because almost all vegetation exhibits not only volume scattering from branches and leaves, but also some double-bounce scattering from the dihedral formed by the soil surface and trunks. Shao and Hong [27] increased the dimension of the decision tree nodes to two and verified that this improves classification accuracy. As the number of polarimetric features increases, the separability of classes increases as well, and so does the classification accuracy [28], [29]. In this article, the maximum dimension of the decision tree nodes reaches three, and a classification method based on an adaptive-dimension feature decision tree is proposed. The dimension of a node starts from one and expands to two or three, depending on the purity of the linearly separable clusters. On the continuously updated training sample set, Fisher Linear Discriminant analysis [30], [31] is used to project the multidimensional space into one dimension. Then the Jeffries-Matusita (J-M) distance [32], [33] is adopted to calculate the thresholds used to delimit the class boundary. The determinant of the confusion matrix [34] is calculated as the purity, and the linearly separable clusters with the highest purity are selected. A branch of the decision tree ends when there is only one category label in each of the two groups. The proposed classification method is examined with the widely tested AIRSAR data of the Flevoland area in the Netherlands, as well as GaoFen-3 data of Hulunbuir in China.

VOLUME 8, 2020

The main contribution of this article is a new decision tree method for POLSAR data classification, in which the dimension of the polarimetric feature space is adaptively decided at the tree nodes. Traditional fixed-dimension decision tree approaches, covering the one- to three-dimensional cases, are compared with the proposed one. The classical SVM method is also included for comparison. Among them, the method developed in this study shows the best compromise between classification accuracy and computation efficiency. With the adaptive-dimension feature space, better features for discriminating certain class groups are found. The method achieves a fine-grained interpretation of the similar scattering mechanisms of terrain types, which is preliminarily demonstrated by transplanting the features obtained at the tree nodes directly to another data set of the same sensor for classification without a training process.
The rest of this article is organized as follows. The theory and classification methods are described in Section II, the experimental data sets and the analysis of results are given in Section III, the discussion is given in Section IV, and the conclusion is given in Section V.

A. POLARIMETRIC FEATURES
In this article, eight polarimetric features [27], [35] are selected for the decision tree, as shown in Table 1. The eight features can be divided into two categories. The first is derived from the second-order backscattering statistics and includes $|S_{HH}|^2$ (the backscattering power of HH), $|S_{VV}|^2$ (the backscattering power of VV), $S_{HH}S_{VV}^*$ (the co-polarization cross product), CPR (the ratio of the co-polarization component to the cross-polarization component), and Span (the total backscattering power). The second is derived from polarimetric decomposition components and includes $\alpha$ (the scattering angle), $H(1-A)$ (the combination of entropy and anisotropy), and $P_V$ (the volume scattering component from the Freeman-Durden decomposition).
In full polarization observation, assuming that transmission and reception use linear horizontal and vertical polarization, the backscattering matrix $[S]$ is expressed as

$$[S] = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix}$$

where $S_{HH}$ and $S_{VV}$ contain the backscattering power of the co-polarization channels, and $S_{HV}$ and $S_{VH}$ contain the backscattering power of the cross-polarization channels. If the transmitted and received signals propagate in a medium that satisfies reciprocity, the backscattering matrix also satisfies the reciprocity theorem, i.e., $S_{HV} = S_{VH}$. From the backscattering matrix, and assuming reciprocity, three circular polarization components can be derived: $S_{RR}$ and $S_{LL}$, the right-right and left-left circular polarization components, and $S_{RL}$, the right-left circular polarization component. The Circular Polarization Ratio (CPR) can be used to distinguish three scattering mechanisms, i.e., surface, volume, and double-bounce scattering [36]. It is defined as a ratio of the powers of these circular polarization components.
The polarimetric covariance matrix is derived from the scattering matrix. Assuming reciprocity, it is defined as

$$[C_3] = \left\langle \begin{bmatrix} |S_{HH}|^2 & \sqrt{2}\,S_{HH}S_{HV}^* & S_{HH}S_{VV}^* \\ \sqrt{2}\,S_{HV}S_{HH}^* & 2|S_{HV}|^2 & \sqrt{2}\,S_{HV}S_{VV}^* \\ S_{VV}S_{HH}^* & \sqrt{2}\,S_{VV}S_{HV}^* & |S_{VV}|^2 \end{bmatrix} \right\rangle$$

where $*$ represents complex conjugation, $\langle\cdot\rangle$ represents the ensemble average in time or space, and $S_{XY}$ represents the complex scattering amplitude when the transmitted and received signals have polarizations $X$ and $Y$, respectively.
$|S_{HH}|^2$ and $|S_{VV}|^2$ can effectively improve classification accuracy [37], while $S_{HH}S_{VV}^*$ can distinguish three types of scattering: single-bounce, volume, and double-bounce scattering [38].
The total scattering power (Span), which is the sum of the diagonal elements of the covariance matrix, is an important representation of spatial information and can be used for image edge extraction, texture analysis, etc.
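As an illustration of how the covariance matrix and Span relate, the following minimal sketch averages the outer product of the lexicographic target vector over a small window and sums the diagonal. It assumes reciprocity and the common convention in which the cross-polarized element is scaled by the square root of two; the function names and the sample pixel values are hypothetical and are not taken from the experiments.

```python
from typing import List, Sequence, Tuple

# One pixel's scattering amplitudes (S_HH, S_HV, S_VV); reciprocity is assumed.
Pixel = Tuple[complex, complex, complex]


def covariance_c3(pixels: Sequence[Pixel]) -> List[List[complex]]:
    """Average k k^H over the pixels, with k = [S_HH, sqrt(2) S_HV, S_VV]."""
    n = len(pixels)
    c3 = [[0j, 0j, 0j] for _ in range(3)]
    for s_hh, s_hv, s_vv in pixels:
        k = (s_hh, 2 ** 0.5 * s_hv, s_vv)
        for i in range(3):
            for j in range(3):
                c3[i][j] += k[i] * k[j].conjugate() / n
    return c3


def span(c3: List[List[complex]]) -> float:
    """Total scattering power: sum of the diagonal elements of C3."""
    return sum(c3[i][i].real for i in range(3))


# Two made-up pixels standing in for an averaging window.
window = [(1 + 0j, 0.1j, 0.8 + 0j), (0.9 + 0j, -0.1j, 1 + 0j)]
c3_example = covariance_c3(window)
print(span(c3_example))
```

With this scaling, the diagonal sum equals the HH power plus twice the HV power plus the VV power, which is why the trace can serve directly as the total power.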
Cloude and Pottier proposed an eigenvalue-based decomposition theory that uses second-order statistics to extract the average parameters of samples [39]. Three averaged parameters can be derived from the covariance matrix $[C_3]$: the mean scattering angle ($\alpha$), the entropy ($H$), and the anisotropy ($A$), which are defined as

$$\alpha = \sum_{k=1}^{3} p_k \alpha_k, \qquad H = -\sum_{k=1}^{3} p_k \log_3 p_k, \qquad A = \frac{\lambda_2 - \lambda_3}{\lambda_2 + \lambda_3}$$

where $p_k = \lambda_k / \sum_{j=1}^{3} \lambda_j$ $(k = 1, 2, 3)$, $\alpha_k$ is the scattering angle associated with the $k$-th eigenvector, and the eigenvalues are arranged in descending order ($\lambda_1 > \lambda_2 > \lambda_3 > 0$). The mean scattering angle $\alpha$, ranging from 0° to 90°, describes the continuous variation of the scattering mechanism, from surface scattering ($\alpha \approx 0°$) to dipole scattering ($\alpha \approx 45°$) and then to double-bounce scattering ($\alpha \approx 90°$). The entropy $H$, ranging from 0 to 1, represents the randomness of the scatterer, from isotropic scattering ($H = 0$) to totally random scattering ($H = 1$). The anisotropy $A$ is very useful for discriminating scattering mechanisms, especially those with different eigenvalue distributions but similar entropy values. The combination of entropy and anisotropy $H(1-A)$ represents the random scattering process, which corresponds to a high value of $H$ and a low value of $A$, i.e., $\lambda_1 \approx \lambda_2 \approx \lambda_3$.
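The three parameters can be sketched numerically once the eigenvalues and the per-eigenvector scattering angles are available. The snippet below is a minimal illustration that takes these as inputs (the eigen-decomposition itself is omitted); the function name is hypothetical, and the guards for zero eigenvalues are an assumption added so that degenerate cases do not fail.

```python
import math
from typing import Sequence, Tuple


def h_a_alpha(eigenvalues: Sequence[float],
              alphas_deg: Sequence[float]) -> Tuple[float, float, float]:
    """Return (entropy H, anisotropy A, mean scattering angle in degrees)."""
    # Keep each eigenvalue paired with its angle while sorting in descending order.
    pairs = sorted(zip(eigenvalues, alphas_deg), reverse=True)
    lam = [value for value, _ in pairs]
    total = sum(lam)
    p = [value / total for value in lam]
    # Terms with p_k = 0 contribute nothing to the entropy.
    entropy = -sum(pk * math.log(pk, 3) for pk in p if pk > 0)
    denom = lam[1] + lam[2]
    anisotropy = (lam[1] - lam[2]) / denom if denom > 0 else 0.0
    mean_alpha = sum(pk * angle for pk, (_, angle) in zip(p, pairs))
    return entropy, anisotropy, mean_alpha


h, a, alpha = h_a_alpha([1.0, 1.0, 1.0], [0.0, 45.0, 90.0])
print(h, a, alpha, h * (1 - a))  # the last value is the feature H(1 - A)
```

Three equal eigenvalues give the fully random case, where the entropy is at its maximum and the anisotropy vanishes, so $H(1-A)$ is also at its maximum.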
Freeman-Durden decomposition is an incoherent matrix decomposition method based on three physical scattering models, for surface, dihedral, and volume scattering, respectively. Assuming that these components are uncorrelated, the polarimetric covariance matrix $[C_3]$ can be expressed as the sum of the three model contributions, where $f_V$, $f_S$, and $f_D$ are the contributions of the volume, surface, and double-bounce scattering components. The five parameters $f_V$, $f_S$, $f_D$, $\alpha$, and $\beta$ are estimated from the actual radar data.
The results of the F-D target decomposition are $P_S$, $P_D$, and $P_V$, which represent the powers of the three scattering mechanism components, respectively.

B. FISHER LINEAR DISCRIMINANT ANALYSIS
When the decision tree nodes employ a feature space instead of a single feature, the feature space needs to be projected onto a certain direction in order to judge the linear separability of the classes. In this article, Fisher Linear Discriminant Analysis (FLD) [30], [31] is adopted to obtain the projection direction with the largest degree of dispersion between classes and to project the feature space into one dimension. The formula is as follows:

$$y = \omega^T x$$

where $\omega$ is the projection direction. However, projecting a multidimensional space into one dimension results in a loss of information. Classes that are well separated in the multidimensional space may overlap severely after being projected to one dimension. Therefore, Fisher proposed a criterion function (Fisher's ratio) for choosing the direction. The formula is as follows:

$$J_F(\omega) = \frac{(m_1 - m_2)^2}{S_1^2 + S_2^2}$$

where $m_1$ and $m_2$ represent the class means of the projected samples, and $S_1^2$ and $S_2^2$ represent the intra-class scatters of the projected samples.
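The projection step can be illustrated for the two-feature case with a short sketch. It uses the textbook closed-form direction, the inverse within-class scatter matrix applied to the difference of the class means, and evaluates Fisher's ratio on the projected samples; the function names and the toy samples are hypothetical, and the restriction to two features is only to keep the matrix inverse explicit.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # two polarimetric feature values for one pixel


def _mean(samples: Sequence[Sample]) -> Sample:
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)


def _scatter(samples: Sequence[Sample], m: Sample) -> List[float]:
    """Entries [sxx, sxy, syy] of the scatter matrix around the mean m."""
    sxx = sum((s[0] - m[0]) ** 2 for s in samples)
    sxy = sum((s[0] - m[0]) * (s[1] - m[1]) for s in samples)
    syy = sum((s[1] - m[1]) ** 2 for s in samples)
    return [sxx, sxy, syy]


def fisher_direction(c1: Sequence[Sample], c2: Sequence[Sample]) -> Sample:
    """w = Sw^{-1} (m1 - m2) for two classes in a 2-D feature space."""
    m1, m2 = _mean(c1), _mean(c2)
    sxx, sxy, syy = (a + b for a, b in zip(_scatter(c1, m1), _scatter(c2, m2)))
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    dx, dy = m1[0] - m2[0], m1[1] - m2[1]
    return ((syy * dx - sxy * dy) / det, (sxx * dy - sxy * dx) / det)


def fisher_ratio(c1: Sequence[Sample], c2: Sequence[Sample], w: Sample) -> float:
    """(m1 - m2)^2 / (S1^2 + S2^2) evaluated on the projected samples."""
    y1 = [w[0] * s[0] + w[1] * s[1] for s in c1]
    y2 = [w[0] * s[0] + w[1] * s[1] for s in c2]
    mu1, mu2 = sum(y1) / len(y1), sum(y2) / len(y2)
    s1 = sum((y - mu1) ** 2 for y in y1)
    s2 = sum((y - mu2) ** 2 for y in y2)
    return (mu1 - mu2) ** 2 / (s1 + s2)


class_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_b = [(4.0, 0.0), (5.0, 0.0), (4.0, 1.0), (5.0, 1.0)]
w = fisher_direction(class_a, class_b)
print(w, fisher_ratio(class_a, class_b, w))
```

In this toy case the two classes differ only along the first feature, so the second component of the direction is zero and the projection reduces to a scaled copy of the first feature.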

C. J-M DISTANCE
In this article, the J-M distance [32], [33] is used to calculate the degree of separation between the samples. The range of the distance is [0, 2]: "0" means that the two sample categories are completely confused, while "2" means that they are completely separated. The J-M distance formula is as follows:

$$J = 2\left(1 - e^{-B}\right)$$

where $B$ is the Bhattacharyya distance

$$B = \frac{(m_1 - m_2)^2}{4\left(\sigma_1^2 + \sigma_2^2\right)} + \frac{1}{2}\ln\frac{\sigma_1^2 + \sigma_2^2}{2\sigma_1\sigma_2}$$

where $m_1$ and $m_2$ represent the mean values of the two categories, and $\sigma_1$ and $\sigma_2$ represent their standard deviations. When the J-M distance on one feature satisfies the linearly separable condition between two categories, the threshold between the two categories on this feature can be calculated according to the Gaussian probability density function

$$P(x|\omega_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(x - m_i)^2}{2\sigma_i^2}\right), \quad i = 1, 2$$

where $\omega_1$ and $\omega_2$ represent the two classes, $P(\omega_1)$ and $P(\omega_2)$ represent their prior probabilities, and $P(x|\omega_1)$ and $P(x|\omega_2)$ represent the class-conditional probability densities. When a point $x_0$ exists at which $P(x_0|\omega_1) = P(x_0|\omega_2)$, the two classes are best separated at $x_0$, so this value can be used as the threshold. The threshold $T$ is calculated from $m_1$, $m_2$, $\sigma_1$, $\sigma_2$, and the term $A = \log_{10}\left(\frac{\sigma_1}{\sigma_2} \times \frac{m_2}{m_1}\right)$.
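The separability measure and the threshold can be sketched for one feature as follows. The J-M distance uses the univariate Gaussian Bhattacharyya distance. The threshold is computed as the intersection of the two prior-weighted Gaussian densities, which is the textbook construction; its log-ratio term is built from the class priors (equal by default), which is an assumption and may differ from the term $A$ used in this article. Function names are hypothetical.

```python
import math


def bhattacharyya(m1: float, s1: float, m2: float, s2: float) -> float:
    """Bhattacharyya distance between two univariate Gaussians."""
    var_sum = s1 ** 2 + s2 ** 2
    return (m1 - m2) ** 2 / (4 * var_sum) + 0.5 * math.log(var_sum / (2 * s1 * s2))


def jm_distance(m1: float, s1: float, m2: float, s2: float) -> float:
    """Jeffries-Matusita distance, ranging from 0 (confused) to 2 (separated)."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya(m1, s1, m2, s2)))


def gaussian_threshold(m1: float, s1: float, m2: float, s2: float,
                       prior1: float = 0.5, prior2: float = 0.5) -> float:
    """Point where the two prior-weighted Gaussian densities are equal."""
    # Assumption: log-ratio term built from priors, not the article's A.
    a = math.log((s1 * prior2) / (s2 * prior1))
    if math.isclose(s1, s2):
        if m1 == m2:
            raise ValueError("identical distributions have no threshold")
        # Equal variances: the quadratic degenerates to a linear equation.
        return (m1 + m2) / 2 + a * s1 ** 2 / (m1 - m2)
    dv = s1 ** 2 - s2 ** 2
    disc = (m1 - m2) ** 2 + 2 * a * dv
    if disc < 0:
        raise ValueError("the two densities do not intersect")
    root = s1 * s2 * math.sqrt(disc)
    base = m2 * s1 ** 2 - m1 * s2 ** 2
    candidates = [(base + root) / dv, (base - root) / dv]
    # Keep the intersection closest to the midpoint of the two means.
    mid = (m1 + m2) / 2
    return min(candidates, key=lambda x: abs(x - mid))


print(jm_distance(0.0, 1.0, 4.0, 2.0), gaussian_threshold(0.0, 1.0, 4.0, 2.0))
```

When the two classes have equal spread and equal priors, the threshold falls at the midpoint of the means; a wider second class pulls the threshold toward the narrower one.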

D. PROPOSED METHOD
The traditional one-dimensional decision tree cannot satisfy the classification requirement very well; in particular, it has poor classification ability among targets with similar scattering characteristics. Because of the complexity of its physical characteristics, an actual target often shows more than one kind of scattering property. Therefore, selecting several polarimetric features to form a feature space at some nodes is expected not only to improve classification accuracy, but also to explain the scattering mechanism of targets more comprehensively. However, when more than three polarimetric features are selected at the tree nodes, the computational complexity increases to an unacceptable level. In this article, an improved decision tree classification method is proposed. Its core is the construction of an adaptive-dimension feature space, which can have one, two, or three dimensions. FLD and the J-M distance are used as the linear separability measurement and the boundary partition algorithm, and "purity" is used as the branching criterion of the decision tree. Here we adopt the determinant of the confusion matrix as the purity. The confusion matrix $[C]$ is defined as follows:

$$[C] = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

where $c_{11}$ and $c_{22}$ represent the numbers of correctly classified samples, and $c_{12}$ and $c_{21}$ represent the numbers of wrongly classified ones. Then the determinant of the normalized confusion matrix $\tilde{C}$ is taken as the purity, i.e., $p = |\tilde{C}|$. The more samples are correctly classified, the closer the calculated purity is to "1". Because this purity takes into account the numbers of both correctly and incorrectly classified samples, it ensures that the combination of terrain types is optimal.
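The purity computation can be sketched in a few lines. The normalization is not spelled out in the text, so the sketch assumes row normalization, in which each row (one true group) is divided by its sample count; the function name and the example counts are hypothetical.

```python
from typing import Sequence


def purity(confusion: Sequence[Sequence[float]]) -> float:
    """Determinant of the row-normalized 2x2 confusion matrix."""
    (c11, c12), (c21, c22) = confusion
    row1, row2 = c11 + c12, c21 + c22
    # Assumption: each row is normalized by the number of samples in that group.
    n11, n12 = c11 / row1, c12 / row1
    n21, n22 = c21 / row2, c22 / row2
    return n11 * n22 - n12 * n21


print(purity([[50, 0], [0, 50]]))   # perfect split of the two groups
print(purity([[45, 5], [10, 40]]))  # some samples fall on the wrong side
```

Under this normalization the purity equals the sum of the two per-group accuracies minus one, so it reaches 1 only when both groups are classified without error.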
In this article, a classification method based on an adaptive-dimension decision tree is proposed. First, the decision tree is constructed with the training samples; then the testing samples are classified with the tree. The construction of the decision tree is shown in Algorithm 1, and the diagram of classification with the adaptive-dimension decision tree is shown in Fig. 1.

III. EXPERIMENTS
This section consists of three subsections. Subsection III-A introduces the experimental data set. Subsection III-B analyzes the fine-grained interpretability of the adaptive-dimension feature decision tree. Subsection III-C compares the classification accuracy of different methods.

A. DATA SET
The experimental data set is the widely used L-band data acquired by the NASA/JPL AIRSAR system over the Flevoland test site in mid-August 1989. The incidence angles are around 20° at near range and 44° at far range. Fifteen terrain types are marked in the ground truth image: stem bean, forest, potatoes, alfalfa, wheat, bare soil, beets, rapeseed, peas, grasses, water, barley, buildings, wheat2, and wheat3. The image size is 750 × 1024 pixels. The data is filtered with the refined Lee filter (7 × 7 window) to reduce speckle noise. The Pauli image and the ground truth image are shown in Fig. 2. 5% of the labeled pixels are selected as training pixels. The number of samples is shown in Table 2.
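One possible way to draw the training pixels is sketched below. The text states only that 5% of the labeled pixels are used for training; random sampling within each class, the fixed seed, and the function name are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def split_training(labels: Dict[Hashable, str], fraction: float = 0.05,
                   seed: int = 0) -> Tuple[List[Hashable], List[Hashable]]:
    """Split labeled pixels into training and test lists, class by class."""
    by_class = defaultdict(list)
    for pixel, name in labels.items():
        by_class[name].append(pixel)
    rng = random.Random(seed)
    train, test = [], []
    for pixels in by_class.values():
        # Assumption: at least one training pixel is kept for every class.
        k = max(1, round(fraction * len(pixels)))
        chosen = set(rng.sample(pixels, k))
        train += [p for p in pixels if p in chosen]
        test += [p for p in pixels if p not in chosen]
    return train, test


# Made-up labels keyed by (row, column) pixel coordinates.
toy_labels = {(0, c): "wheat" for c in range(100)}
toy_labels.update({(1, c): "water" for c in range(40)})
train_pixels, test_pixels = split_training(toy_labels)
print(len(train_pixels), len(test_pixels))
```

Sampling per class keeps small classes represented in the training set, which matters here because the class sizes differ considerably.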

B. FINE-GRAINED INTERPRETABILITY
The decision tree classification method differs from other POLSAR image classification methods in that it retains the polarimetric features used at the tree nodes, which can be used to describe the target scattering mechanism and to interpret the classification rules. Compared with the classical polarimetric feature decision tree, the proposed method can realize the one-time classification of a single terrain type, that is, a single terrain type corresponds to only one leaf node in the decision tree.

Algorithm 1 (partial listing, from line 8)
 8:     classification result of the node ← J-M distance
 9:     p ← |C̃| (C: confusion matrix)
10:     purity ← max {p}
11: end for
12: if purity ≥ H then
13:     break
14: else if H > purity ≥ L then
15:     Construct two-dimensional feature space:
16:     for each f1 ∈ [1, 8] do
17:         for each f2 ∈ [1, 8] do
18:             dimensionality reduction ← FLD
19:             classification result of the node ← J-M distance
20:             p ← |C̃| (C: confusion matrix)
21:             purity ← max {p}
22:         end for
23:     end for
24: else
25:     Construct three-dimensional feature space:
26:     for each f1 ∈ [1, 8] do
27:         for each f2 ∈ [1, 8] do
28:             for each f3 ∈ [1, 8] do
29:                 dimensionality reduction ← FLD
30:                 classification result of the node ← J-M distance
31:                 p ← |C̃| (C: confusion matrix)
32:                 purity ← max {p}
33:             end for
34:         end for
35:     end for
36: end if
37: Select the node with the highest purity for branching
38: repeat
39:     carry out processing: 3 ∼ 37
40: until the node contains only one terrain type

Because of the complexity of their physical characteristics, actual targets often show more than one kind of scattering phenomenon. In this case, if a single polarimetric feature is used at the tree nodes, only the dominant scattering mechanism of the target is identified, and the description of the scattering mechanism for complex areas (such as agricultural regions) is incomplete. Therefore, in this article, we replace the single feature with an adaptive-dimension feature space, which selects the number of dimensions according to the degree of separation difficulty. Here the maximum dimension of the feature space is increased to three, which enhances the fine-grained interpretation ability of the decision tree.
The branch order of the proposed decision tree for the AIRSAR Flevoland data is shown in Fig. 3. It can be seen that targets with significant differences in scattering mechanism are separated first, with a low-dimensional polarimetric feature space. As the separation difficulty increases, the number of polarimetric features selected at the decision tree nodes increases to two or three, as at nodes N8-N13.
In the structure of the decision tree, the polarimetric features used for classification at each node can be seen directly. The polarimetric features are therefore connected with the physical characteristics of the targets, which explains the role of the polarimetric features in classification. For example, the HH backscattering power of buildings is greater than that of the other targets, so buildings can be easily separated by the HH scattering power at node N0. The scattering mechanism of bare soil is close to surface scattering, and its mean scattering angle is smaller than that of the other targets except rapeseed. Since the total scattering power of bare soil is also small, bare soil can be discriminated from the others by using both $\alpha$ and Span. Next, the scattering from the water surface is relatively simple, consisting of rough-surface scattering, so its entropy is much smaller than that of the other unclassified targets, and water can be separated by using the feature $H(1-A)$. Stem bean, forest, and potatoes have dense leaves, so they exhibit volume scattering, which distinguishes them from the other targets. Usually the volume scattering intensity of stem bean is smaller than that of forest and potatoes. Further analysis shows that $|S_{HH}|^2$ can be used to separate stem bean from potatoes, while $|S_{VV}|^2$ can be used to separate stem bean from forest. Therefore, using $P_V$, $|S_{HH}|^2$, and $|S_{VV}|^2$ simultaneously separates stem bean more accurately. The scattering angle and canopy scattering intensity of forest are both greater than those of potatoes, and because of the complex scattering phenomena in the forest area, its entropy is also greater than that of potatoes. Therefore, the simultaneous use of $\alpha$, $H(1-A)$, and $P_V$ can separate forest and potatoes. The total scattering power of wheat3, grasses, alfalfa, and barley is much larger than that of the other unclassified targets, so Span can be used to put them into one group. Since the circular polarization ratio of this group is slightly larger than that of the unclassified targets, using CPR can further improve classification accuracy. The scattering mechanism of beets, peas, and wheat is a combination of surface scattering and volume scattering. However, the co-polarization cross product of beets is smaller than that of wheat and peas. Although the scattering mechanisms of peas and wheat are similar, their HH backscattering is significantly different.

C. CLASSIFICATION ACCURACY
To highlight the advantages of the proposed method, the classical decision tree (that is, the one-dimensional polarimetric feature decision tree), the two- and three-dimensional decision trees, and the classic Support Vector Machine (SVM) classifier are run on the same training and test samples. The classification result images are shown in Fig. 4. The classification accuracies of the five methods are compared in Table 3. Because the numbers of samples per class differ considerably, it is unreasonable to use the total classification accuracy to measure the classification results, so the average accuracy (AA) is used for comparison in this study.
As shown in Table 3, the one-dimensional polarimetric feature decision tree has good classification accuracy only for easily separable terrain types, such as bare soil and buildings. The classification accuracy of stem beans and water, however, is only 45.13% and 42.75%, respectively. This situation is greatly improved in the two-dimensional polarimetric feature decision tree, but using a two-dimensional feature space causes feature redundancy and reduces classification accuracy for easily separated terrain types, such as buildings and bare soil. It is worth noting that in the results of the three-dimensional polarimetric feature decision tree, the classification accuracy of water is only 51.86%, which further indicates that simply increasing the number of features at the tree nodes causes polarimetric feature redundancy and reduces classification accuracy.
The average classification accuracy of the proposed method is 8.18% higher than that of the one-dimensional polarimetric feature decision tree, 0.95% higher than that of the two-dimensional polarimetric feature decision tree, and 1.61% higher than that of the three-dimensional decision tree. The one-dimensional tree has the lowest result, with very low accuracy for specific types; the two-dimensional tree improves this considerably; and the three-dimensional tree yields lower accuracy than the two-dimensional one. The proposed adaptive-dimension tree achieves better results than all of them, especially for the complicated types, together with better interpretability of scattering properties and reasonable computation complexity. Although the SVM classifier cannot explain the role of polarimetric features in classification, we still compare its performance with the proposed method. The average classification accuracy of the proposed method is 9.81% higher than that of SVM. The penalty parameter of the SVM is set to 1000 in our experiment.
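The difference between the total (overall) accuracy and the average accuracy used in this comparison can be illustrated with a small sketch. The function name and the confusion matrix values are made up for illustration; rows are assumed to be true classes and columns predicted classes.

```python
from typing import List, Sequence, Tuple


def overall_and_average_accuracy(
        confusion: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (overall accuracy, average of the per-class accuracies)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    per_class: List[float] = [row[i] / sum(row) for i, row in enumerate(confusion)]
    return correct / total, sum(per_class) / len(per_class)


# A large class classified well and a small class classified poorly.
oa, aa = overall_and_average_accuracy([[990, 10], [5, 5]])
print(oa, aa)
```

In this toy case the overall accuracy stays high because the large class dominates the count, while the average accuracy exposes the poorly classified small class.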
The experimental environment is as follows: macOS operating system, Intel Core i5 processor, 8 GB memory, Intel Iris Graphics 6100 graphics card, and Matlab software. Measuring the running time of the code shows that a node using a one-dimensional feature space takes about 3.5 seconds; a node using a two-dimensional feature space takes about 40 times as long, about 130 seconds; and a node using a three-dimensional feature space takes about twice as long as a two-dimensional one, about 260 seconds. Since the adaptive-dimension polarimetric feature decision tree has seven nodes using a three-dimensional feature space, four nodes using a two-dimensional feature space, and two nodes using a one-dimensional feature space, its running time is reduced by 28% compared with the three-dimensional decision tree.
Another data set is employed to test the effectiveness of the proposed method. The data is a fully polarimetric C-band image collected by the GaoFen-3 SAR system over the Hulunbuir test site. In this GaoFen-3 data set, four categories are known: forest, bare land, cole, and wheat. Based on unsupervised classification and experience, four other categories were identified and assumed to be grasses, water, sand, and wetland. The Pauli image and ground truth are shown in Fig. 5. 3% of the labeled pixels were selected as training pixels. The number of samples is shown in Table 4. The experimental environment is the same as for the previous data. Fig. 6 gives the classification results of four methods, and Table 5 lists their classification accuracies. As shown in Table 5, the average classification accuracy of the adaptive-dimension decision tree is 7.32% higher than that of the one-dimensional decision tree and 0.17% higher than that of the two-dimensional decision tree. With the three-dimensional decision tree, the classification accuracy is 3.82% lower than with the proposed method because of redundancy at the nodes, which further demonstrates the necessity of the adaptive dimension. In this experiment, SVM achieved a classification accuracy 0.43% higher than the proposed method, since the eight terrain types here differ more from each other, which makes discrimination much easier. The scattering diversity can be observed even in the Pauli image in Fig. 5. In the first experiment, by contrast, many types belong to vegetated or agricultural areas with similar scattering characteristics, so the advantage of the proposed approach is more obvious there.
Considering that a two-dimensional node takes about 40 times as long as a one-dimensional node and a three-dimensional node about 80 times as long, and that the adaptive-dimension decision tree has one node using a three-dimensional feature space and six nodes using a two-dimensional feature space, the running time is reduced by 43% compared with the three-dimensional decision tree.
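The 43% figure for the GaoFen-3 tree can be reproduced with a short calculation. The sketch assumes that the fixed three-dimensional tree has the same number of nodes as the adaptive tree and that the per-node costs are 1, 40, and 80 units for one, two, and three dimensions; the function name is hypothetical.

```python
def relative_saving(n_1d: int, n_2d: int, n_3d: int,
                    cost_1d: float = 1.0, cost_2d: float = 40.0,
                    cost_3d: float = 80.0) -> float:
    """Fraction of running time saved versus an all-3-D tree of equal size."""
    adaptive = n_1d * cost_1d + n_2d * cost_2d + n_3d * cost_3d
    # Assumption: the fixed-dimension tree has the same number of nodes.
    fixed_3d = (n_1d + n_2d + n_3d) * cost_3d
    return 1.0 - adaptive / fixed_3d


# GaoFen-3 tree: six two-dimensional nodes and one three-dimensional node.
print(round(100 * relative_saving(n_1d=0, n_2d=6, n_3d=1)))
```

The adaptive tree costs 6 × 40 + 1 × 80 = 320 units against 7 × 80 = 560 units for the fixed tree, a saving of about 43%.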
From the comparison of classification experiments for all these methods, it can be seen that the proposed decision tree with adaptive-dimension feature space outperforms the fixed-dimension decision tree approaches when both classification accuracy and computation complexity are considered. SVM can achieve classification results similar to or somewhat lower than those of our method; however, it cannot provide the relation between polarimetric features and terrain types. Therefore, SVM is not an interpretable method and cannot be extended to other data sets of the same sensor for unsupervised or low-supervised classification. In the following discussion section, the transplantation of the polarimetric features obtained by the decision tree is implemented, which not only achieves high classification accuracy but also saves the time of the training process.

A. NECESSITY OF ADAPTIVE DIMENSION FEATURE SPACE
The necessity of using an adaptive-dimension feature space at the decision tree nodes is demonstrated by separating bare soil from the other targets in the Flevoland data set. As shown in Fig. 7, when bare soil and the other targets are classified by the feature $f_1$ ($\alpha$) or $f_8$ (Span) alone, it is difficult to separate them. However, as shown in Fig. 8, the two-dimensional scatter plot over $f_1$ and $f_8$ clearly separates bare soil from the other targets. Conversely, for certain terrain types, a single feature is enough for discrimination from the others. As shown in Fig. 9, buildings can be completely separated from the other targets by using only $f_2$. As shown in Table 3, the classification accuracy of buildings using a one-dimensional feature space reaches 100%, whereas that using a two-dimensional feature space is 99.33%. For this type, increasing the number of polarimetric features at the node not only increases computational complexity, but also reduces classification accuracy.
Therefore, it is very important to select the feature space dimension at each node of the decision tree, which is done by setting high and low thresholds on the purity. To ensure that nodes with a one-dimensional feature have sufficient separation capability, the high threshold is set to 1. When the purity of the classification results of the training samples reaches its maximum (= 1), there is no benefit in increasing the dimension further. The low threshold is set to 0.97 based on experience. When the purity of the classification results of the training samples is greater than 0.97, the purity can generally reach 0.99 or even higher after switching to the two-dimensional feature space. If the purity is less than 0.97, the scattering mechanisms of the categories can be considered quite similar, and the categories need to be classified with three-dimensional features. If the low threshold is chosen too low, the two-dimensional feature space will be applied to some categories that it cannot separate adequately. Conversely, if it is chosen too high, the three-dimensional feature space will be used for categories that are easily separated, which increases the computational complexity and sometimes decreases the accuracy.
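The threshold rule described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, purity is assumed here to be the fraction of training samples at a node that are assigned correctly, and the behaviour at a purity of exactly 0.97 (not specified in the text) is assumed to fall on the three-dimensional side.

```python
HIGH_THRESHOLD = 1.0   # from the text: a purity of 1 needs no extra dimension
LOW_THRESHOLD = 0.97   # from the text: chosen empirically


def purity(true_labels, predicted_labels):
    """Assumed definition: share of training samples at a node assigned correctly."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def select_dimension(purity_1d, high=HIGH_THRESHOLD, low=LOW_THRESHOLD):
    """Pick the feature-space dimension for a node from its one-dimensional purity."""
    if purity_1d >= high:
        return 1   # a single feature already separates the training samples
    if purity_1d > low:
        return 2   # close to pure: a second feature is expected to suffice
    return 3       # similar scattering mechanisms: use three features


# Example with made-up labels for one node.
truth = ["soil", "soil", "other", "other"]
guess = ["soil", "other", "other", "other"]
node_purity = purity(truth, guess)
print(node_purity, select_dimension(node_purity))
```

Keeping the two thresholds as parameters makes it easy to see the trade-off discussed above: raising `low` sends more nodes to the three-dimensional case, and lowering it sends more nodes to the two-dimensional case.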

B. MIGRATION OF POLARIMETRIC FEATURES
The decision tree classification method based on polarimetric features differs from data-driven classification methods in that the scattering mechanisms of the targets are described in the classification results. Because the observation capability of a given POLSAR sensor remains stable, the polarimetric features used for classification can probably be applied to other data from the same sensor without a training process. The migration of the proposed method is examined using another AIRSAR data set of the Flevoland area, as shown in Fig. 10. Comparing the categories shared by the two AIRSAR data sets, the classification of eight categories can be transplanted, that is, rapeseed, potatoes, wheat, alfalfa, peas, stem beans, beets and grasses. The features in the constructed tree nodes are applied directly to the classification of the Flevoland II data without any training process, and the results are shown in Table 6. The decision tree nodes trained on the first AIRSAR data obtain satisfactory classification accuracy on the second AIRSAR data, and high classification accuracies are achieved for all eight types. It is worth noting that pairs of targets with large differences in scattering mechanisms, such as rapeseed and peas or rapeseed and beets, are almost completely separated. The type Wheat here corresponds to the type Wheat2 in the first Flevoland data.
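Applying trained nodes to a second data set without retraining can be sketched as below. Everything in this block is hypothetical: the feature names, the decision rules, the cut values, the pairing of class names with rules, and the representation of the tree as an ordered cascade of nodes are placeholders chosen for illustration. Only the Wheat2 to Wheat correspondence comes from the text.

```python
# Hypothetical frozen tree learned on the first data set. Each node names the
# class it splits off, the features it reads, and a fixed decision rule.
# Rules and cut values are placeholders, not values from the paper.
FROZEN_NODES = [
    {"label": "Rapeseed", "features": ("f_a",),
     "rule": lambda f_a: f_a > 0.5},                    # one-dimensional node
    {"label": "Wheat2", "features": ("f_b", "f_c"),
     "rule": lambda f_b, f_c: f_b + f_c < 1.0},         # two-dimensional node
]
DEFAULT_LABEL = "Peas"  # placeholder for whatever remains after the last node

# Class names may differ between data sets: Wheat in the second data set
# corresponds to Wheat2 in the first.
LABEL_MAP = {"Wheat2": "Wheat"}


def classify(sample, nodes=FROZEN_NODES, default=DEFAULT_LABEL, label_map=LABEL_MAP):
    """Classify one feature dictionary with the stored nodes; nothing is refitted."""
    for node in nodes:
        args = [sample[name] for name in node["features"]]
        if node["rule"](*args):
            return label_map.get(node["label"], node["label"])
    return label_map.get(default, default)


# Feature vectors standing in for pixels of the second data set (made up).
new_pixels = [
    {"f_a": 0.9, "f_b": 0.0, "f_c": 0.0},
    {"f_a": 0.1, "f_b": 0.2, "f_c": 0.3},
    {"f_a": 0.1, "f_b": 0.8, "f_c": 0.9},
]
labels = [classify(p) for p in new_pixels]
print(labels)
```

The point of the sketch is only that the second data set passes through stored rules; no thresholds are re-estimated, which is where the saving in training time comes from.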

V. CONCLUSION
In this article, an adaptive-dimension decision tree based on polarimetric features is proposed, in which a feature space of adaptive dimension replaces the fixed one-, two- or three-dimensional cases. It not only improves the fine-grained interpretability of decision trees, but also improves the classification accuracy with reasonable computational complexity. AIRSAR data of the Flevoland area and GaoFen-3 data of the Hulunbuir area are used to verify the validity of the method. Compared with the one-dimensional feature decision tree, the proposed method improves the classification accuracy by 8.18% on the AIRSAR data and by 7.32% on the GaoFen-3 data. Compared with the two-dimensional decision tree, the improvements are 0.95% and 0.17%, respectively. Compared with the three-dimensional decision tree, the average classification accuracy of the proposed method is 1.61% higher on the AIRSAR data and 3.82% higher on the GaoFen-3 data. Flexible selection of the feature space dimension avoids feature redundancy, improves classification accuracy, effectively reduces computational complexity, and explains the scattering mechanisms of the targets more completely. In addition, compared with the data-driven SVM classifier, the adaptive-dimension decision tree achieves better or equivalent classification accuracy while being superior in interpretability and transplantation capability. Another AIRSAR data set of the Flevoland area is used to demonstrate the migration of polarimetric features in terrain classification, and promising results are achieved for all the transferable terrain types.