Deep Evidential Remote Sensing Landslide Image Classification With a New Divergence, Multiscale Saliency and an Improved Three-Branched Fusion

Image-level classification of remote sensing landslide images has received attention, but the accuracy of traditional deep learning-based methods still has room for improvement. Evidence theory has been found effective at boosting the accuracy of neural networks; however, the present study identifies three challenges that hinder the introduction of this theory into deep landslide image classification, and makes three corresponding improvements. To address the interpretability and decision-invariance losses of three previous divergences, we propose a belief Jensen–Renyi divergence with proven properties. To couple evidence theory with deep remote sensing landslide image classification, a channelwise multiscale visual saliency fusion is developed. We additionally find that the channelwise fusion is able to reduce false recognition by networks compared with original RGB images. To avoid decision failures in the evidence-theoretic fusion process, we design an interpretability improved three-branched fusion. Experiments on the Bijie Landslide dataset corroborate the synergistic benefits of the three improvements, where the proposal is compared with state-of-the-art image classification backbone networks, remote sensing image scene classifiers, evidence fusion algorithms, and versatile evidence-theoretic deep learning classifiers. We also evaluate the new method under two sorts of image degradation, as well as in an actual scenario in Luding County, China, whose data are publicly available.

falls, topples, slides, spreads, and flows [2], to name a few, most of which have severe adverse effects on human life. According to Casagli et al. [3], from at least 2004 to 2016, the global death toll from 4862 landslide events approximated 56 000. Beyond any doubt, there is an urgent need to monitor, detect, and classify landslides with high accuracy.
Landslide hazards can be quickly recognized using visual surface morphology analysis through remote sensing technology [4]. In computer-assisted recognition systems, deep learning has been increasingly accepted owing to its high-performing "black-box" feature extraction, which is driven by data and avoids sophisticated manual design of geomorphologic feature descriptors [5]. Among the subtasks of deep learning and computer vision, at least three approaches have been investigated to link deep learning with autonomous landslide recognition: 1) landslide target detection; 2) pixel-level or object-based segmentation; and 3) image-level classification.
Landslide target detection originates from traditional object detection [6] in computer vision, and aims to enclose potential landslide areas in bounding boxes on images, such as radar, multispectral, and panchromatic images remotely captured by airborne or satellite sources [7]. Hou et al. [8] improved the YOLOX network to enhance the extraction of landslide features in optical remote sensing. Liu et al. [9] devised an SE-YOLOv7 network to improve detection precision. However, how to find an optimal network structure for remote sensing landslide detection remains an open question, especially when the background scenarios are intricate or the target landslide is small [10].
Pixel-level landslide segmentation aims to identify the pixels (or subpixels) belonging to landslide areas using distinguishable visual features, whereas object-based methods group adjacent pixels to make better use of spatial distribution information [11]. Some studies also refer to segmentation-based schemes as landslide detection or landslide semantic classification [12]. In this realm, Ji et al. [13] combined attention mechanisms with convolutional neural networks for optical remote sensing landslide segmentation. Lv et al. [14] proposed ShapeFormer for the same task. Besides these works, many state-of-the-art learning techniques, including multiclass classification [15] and few-shot learning [16], have also been found effective. Nevertheless, pixel- or object-level detection carries a large computational burden, and Wang and Qiao [17] pointed out that in many scenarios the accurate identification of landslide boundaries is challenging.
In contrast with pixel-level segmentation, image-level landslide classification differentiates landslide scenes from nonlandslide scenes according to geological and visual features, such as object colors, textures, and topographic patterns, at the level of image labels. In this group, Defang et al. [18] investigated transfer learning for landslide image classification combining hybrid datasets. Fang et al. [19] presented a framework with generative adversarial networks. Among the three conventional computer vision tasks, from the view of remote sensing, image-level classification is the more fundamental [20], [21], [22], [23]. However, few of the mentioned image-level classification schemes, although dominated by deep learning, are coupled with the well-known evidence theory to achieve higher accuracy. The accuracy-boosting effect of this theory in neural networks, especially in general deep image-level classification, has been extensively discussed in very recent work [24], [25], [26]. Nevertheless, remote sensing images are distinct from natural images, since the former have a higher spatial resolution and are often disturbed by object rotation and shape distortion [27], [28], [29]. For landslides, scales and patterns are often undetermined, which can cause them to be falsely recognized as other objects, such as swamps, woods, and hills [10]. Herein, considering the importance of deep learning-boosted landslide classification, the present study considers a problem that remains relatively unstudied: the research community is not clear how and to what extent evidence theory can boost the accuracy of deep learning in image-level remote sensing landslide classification. To investigate this issue, this article constructs a new evidence-theoretic scheme, which is introduced next.
Evidence theory, also known as Dempster-Shafer theory, is an advanced technique for intelligent decision support [30]. It has been exploited as a plug-and-play module in many domains, such as machine learning [31], [32], [33], [34], object detection [35], and opinion aggregation [36]. This theory is sensitive to decision uncertainty [37], and provides a dedicated "Dempster's combination rule" for multiple evidence fusion [38], [39]. On the basis of these merits, the present study investigates evidence theory-coupled deep learning in remote sensing landslide image classification. However, we identify three challenges that hinder the introduction of evidence theory.
1) Belief divergence challenge: Some belief divergences (BDs) may limit evidence-theoretic landslide image classification due to their explainability and decision-invariance losses.
2) Evidence theory involvement challenge: For landslide image classification, how to efficiently form an evidence set via deep learning to implement evidence theory is not off-the-shelf.
3) Evidence conflict challenge: Dempster's combination rule may bring about counterintuitive classification results when "evidence conflict" occurs.
For the first challenge, the present study mainly focuses on the intrinsic defects of two BDs. Here, "BDs" refers to "the divergences in evidence theory," to distinguish them from the divergences in other theories [40]. To date, dozens of BDs, such as the belief Jensen-Shannon divergence [41] and the fractal belief Kullback-Leibler divergence [42], have brought about a tectonic change in evidence fusion. However, some BDs have inherent shortcomings [40]. The present study points out that the belief Renyi divergence [43] and the generalized belief Renyi divergence [44] can lead to explainability and decision-invariance losses in the newly developed evidence-theoretic system (before formally providing a solution, we first give two analyses to illustrate these two losses). To address this issue, this article provides a multi-evidence remedy, i.e., the new belief Jensen-Renyi (BJR) divergence, which is more general than the current solution [45]. With its properties justified, the two mentioned losses are resolved. In addition, the ill-definition of its probabilistic prototype [46] is also addressed.
The second challenge may be unique to landslide image classification, but it drastically affects the proposal's accuracy. Currently, at least two approaches are mainly used to introduce evidence theory into deep learning: one is to attach evidence-theoretic operations directly after deep feature extraction, and the other is to combine the decisions of networks using evidence fusion. The first approach shows advantages, but its ability to handle multisource information is limited. For the second approach, a diversified evidence set is the prerequisite [47]. For instance, Xu et al. [48] formed an evidence set via multiple modalities; Tong et al. [49] combined heterogeneous datasets to generate such a set. Nevertheless, at least two modalities or datasets are required in these two methods, limiting their real-world applications. In [50], the evidence set is obtained via multiple color spaces, but the simple color space conversion is detrimental to network accuracy. The ideas of the authors in [51] and [52] may be efficient, but they are not designed for image classification. This study therefore presents a new design idea drawn from landslides that avoids the listed issues, i.e., using images fused channelwise with multiscale visual saliency to form the desired evidence set. We show that the geographical visual saliency of landslides is effective at diversifying networks' decisions, and can reduce the networks' false classifications in comparison with original RGB images.
In the third challenge, "evidence conflict" indicates that the multiple pieces of evidence are highly conflicting [53]. Evidence conflict is an intrinsic shortcoming of evidence theory, and in landslide image classification it may result in false recognition. One solution for overcoming evidence conflict is to assign weights to the evidence. The weighting-based methods can be categorized into three groups [47]. The first group utilizes "belief entropy" or "BD" to form the weight, as in [54]. The second group often incorporates "belief distance (divergence) + belief entropy," such as in [41], [55], [56], and [57]. The third group, which also shows efficiency, may choose a "belief distance + belief entropy + impurity" strategy. Zhang et al. [47] investigated this approach and proposed a three-branched fusion model. However, Zhang et al.'s [47] model lacks theoretical explainability on involving

II. THEORETICAL JUSTIFICATIONS: NEW BJR DIVERGENCE
Before the evidence theory is applied in the geotechnical image classification task, the present study starts from the theoretical analysis on explainability and decision-invariance losses, with which the new BJR divergence is proposed to manage the "BD" challenge.

A. Intrinsic Limitations of the Two Belief Renyi Divergences
The major motivation for developing the new divergence is to fix the "BD" challenge of the belief Renyi divergence and the generalized belief Renyi divergence via a multi-BPA approach. The belief Renyi divergence is defined in [43]; in that definition, $\alpha \in (0, 1) \cup (1, +\infty)$, $\theta_i \in \Theta$, $\Theta$ is the frame of discernment (FoD) (the set containing all the hypotheses in evidence theory [47]), and

$$P_{Bl_m}(\theta_i) = \frac{Bel(\theta_i) + Pl(\theta_i)}{\sum_{\theta_i \in \Theta} \left[ Bel(\theta_i) + Pl(\theta_i) \right]}.$$

The $Bel$ and $Pl$ functions can be found in [43] and [44].
Note that the two mentioned BDs are symmetric iff $\alpha = 1/2$. This property is essential to the analysis of their limitations. To be precise, most BDs are symmetric between two pieces of evidence, i.e., suppose SBD is a symmetric belief divergence and $m_1$ and $m_2$ are two arbitrary BPAs; then the following relationship holds:

$$\mathrm{SBD}(m_1, m_2) = \mathrm{SBD}(m_2, m_1). \quad (3)$$

However, the two mentioned divergences are symmetric only under the limited condition noted above (referred to as limited symmetry hereinafter), and this property gives rise to the "BD" challenge, i.e., the explainability and decision-invariance losses. Since few prior studies have discussed this topic both qualitatively and quantitatively, without loss of generality the present study provides the two short analyses below.

Fig. 2. Fused belief on the preferred hypothesis versus the number of fusion times for (a) and (b) the belief Renyi divergence [43] and (c) and (d) the generalized belief Renyi divergence [44]. Note that (a) and (c) are symmetric cases, whereas (b) and (d) are nonsymmetric cases. The abscissa is the number of fusion times with Dempster's combination rule [38] when fusing the weighted evidence in [43] and [44].

TABLE I FOUR SETS OF BPAS
Analysis 1 (Explainability loss): This analysis concerns the explainability loss. Suppose $m_1$ and $m_2$ are two arbitrary BPAs and NBD is a nonsymmetric belief divergence. Given the nonsymmetry, it is rational to suppose the following relationship:

$$\mathrm{NBD}(m_2, m_1) > \mathrm{NBD}(m_1, m_2) \quad (4)$$

whose semantics can be read as "the dissimilarity between evidence $m_2$ and $m_1$ is greater than that between $m_1$ and $m_2$," which is self-contradictory and illogical. This article regards this illogicality as an explainability loss incurred when an NBD is applied.
Analysis 2 (Decision-invariance loss): This analysis focuses on the decision-invariance loss. A divergence measure matrix (DMM) among $N$ sets of BPAs under an arbitrary BD can be characterized as

$$DMM = \left[ BD(m_i, m_j) \right]_{N \times N} \quad (5)$$

where $i, j \in \{1, \ldots, N\}$. It is easy to derive that for symmetric BDs the transposition invariance holds, i.e., $DMM = DMM^{T}$. For nonsymmetric BDs, since $BD(m_i, m_j) \neq BD(m_j, m_i)$ if $i \neq j$, we have $DMM \neq DMM^{T}$; that is, if the divergence is nonsymmetric, the transposition invariance vanishes. This article finds that for symmetric BDs the fusion results also remain invariant even if the matrix is transposed, whereas for the rest the decision invariance collapses. The decision-invariance loss forces the system to store the full matrix, which reduces storage efficiency. We use an example to illustrate the numerical effect of the decision-invariance loss. Suppose $m_1, \ldots, m_4$ are four BPAs on $\Theta = \{\theta_1, \theta_2, \theta_3\}$, listed in Table I. Then, in line with the fusion algorithms in [43] and [44], the belief Renyi divergence and the generalized belief Renyi divergence are used to make decisions. For the symmetric case, $\alpha = 0.5$ is adopted, whereas for the nonsymmetric case, $\alpha = 0.3$ is chosen instead. The fused beliefs on the desired hypothesis $\{\theta_1\}$ under $DMM$ and $DMM^{T}$ are shown in Fig. 2. In the nonsymmetric case, the fusion indeed loses decision invariance with respect to matrix transposition for both divergences, suggesting that the decision system becomes ill-posed, whereas in the symmetric case this invariance is kept.
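The transposition argument in Analysis 2 can be checked numerically with a small sketch. As an assumption, the classical probabilistic Renyi divergence stands in for the belief Renyi divergences of [43] and [44], and the three distributions are hypothetical (they are not the BPAs of Table I); the function names are illustrative only.

```python
import math

def renyi_divergence(p, q, alpha):
    """Classical Renyi divergence D_alpha(p || q) for discrete distributions.

    Used here only as a probabilistic stand-in for the belief divergences.
    """
    s = sum((pi ** alpha) * (qi ** (1.0 - alpha)) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)

def divergence_matrix(dists, alpha):
    """Entry [i][j] holds the divergence from dists[i] to dists[j]."""
    return [[renyi_divergence(p, q, alpha) for q in dists] for p in dists]

def is_transpose_invariant(mat, tol=1e-12):
    """True when the matrix equals its transpose up to a small tolerance."""
    n = len(mat)
    return all(abs(mat[i][j] - mat[j][i]) <= tol
               for i in range(n) for j in range(n))

# Hypothetical distributions over three hypotheses.
dists = [(0.7, 0.2, 0.1), (0.5, 0.3, 0.2), (0.1, 0.1, 0.8)]

symmetric_case = is_transpose_invariant(divergence_matrix(dists, alpha=0.5))
nonsymmetric_case = is_transpose_invariant(divergence_matrix(dists, alpha=0.3))
print(symmetric_case, nonsymmetric_case)
```

With the order set to 0.5 the matrix equals its transpose, whereas with 0.3 it does not, which mirrors the symmetric and nonsymmetric cases discussed above.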
Recall that the belief Renyi divergence [43] and the generalized belief Renyi divergence [44] are nonsymmetric when $\alpha \neq 0.5$. Given the at least two adverse effects of nonsymmetry analyzed above, this article seeks to fix their undesired limited symmetry. Two possible solutions are mainly considered.
i) Choose only $\alpha = 1/2$ when they are applied for information fusion, as preconditioned in [43] and [44].
ii) Develop a BD that is symmetric on the whole definition domain of $\alpha$, like the method in [45].

However, approach i) may waste the definition domain of $\alpha$. The method in [45] satisfies approach ii), but it is symmetric only within evidence pairs, which is less general than being symmetric among an evidence set [55]. Fortunately, this article finds that the classical Jensen-Renyi (JR) divergence in (6) can be a potential candidate, as it is symmetric among multiple BPAs. The JR divergence is defined as

$$JR_{\alpha}^{\omega}(P_1, \ldots, P_N) = R_{\alpha}\!\left( \sum_{i=1}^{N} w_i P_i \right) - \sum_{i=1}^{N} w_i R_{\alpha}(P_i) \quad (6)$$

where $R_{\alpha}(\cdot)$ denotes the Renyi entropy of order $\alpha$, $P_1, P_2, \ldots, P_N$ are $N$ discrete probability distributions, and their corresponding weight vector $\omega = \{w_1, w_2, \ldots, w_N\}$ satisfies $w_i \geq 0$ and $\sum_{i=1}^{N} w_i = 1$. Nevertheless, through the next example, this study finds that the JR divergence can be negative when $\alpha > 1$, indicating that under certain conditions it may not be well-defined.

TABLE II PROBABILITY DISTRIBUTIONS OVER SAMPLE SPACE Θ

Fig. 3. Surface and contour graphs of the classical JR divergence with an evenly distributed weight vector. The figure shows that negative divergences indeed occur.
As can be seen in Fig. 3, negative divergences indeed appear. Thus, the mentioned proposition is proven.
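The sign change of the classical JR divergence can also be reproduced with a few lines of code. The sketch below assumes the standard definition (Renyi entropy of the weighted mixture minus the weighted mean of the Renyi entropies) with natural logarithms; the two distributions are hypothetical and are not those of Table II.

```python
import math

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha (alpha > 0, alpha != 1), natural log."""
    return math.log(sum(pi ** alpha for pi in p if pi > 0.0)) / (1.0 - alpha)

def jensen_renyi(dists, weights, alpha):
    """Classical JR divergence: entropy of the mixture minus mean entropy."""
    size = len(dists[0])
    mixture = [sum(w * p[k] for w, p in zip(weights, dists)) for k in range(size)]
    mean_entropy = sum(w * renyi_entropy(p, alpha) for w, p in zip(weights, dists))
    return renyi_entropy(mixture, alpha) - mean_entropy

# Hypothetical binary distributions with an evenly distributed weight vector.
dists = [(1.0, 0.0), (0.6, 0.4)]
weights = [0.5, 0.5]

jr_low = jensen_renyi(dists, weights, alpha=0.5)    # order below one
jr_high = jensen_renyi(dists, weights, alpha=10.0)  # order above one
print(jr_low > 0, jr_high < 0)
```

For these inputs the divergence is positive at order 0.5 and negative at order 10, consistent with the observation that the Renyi entropy loses concavity for sufficiently large orders.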
Up to now, the present study has verified that the two belief Renyi divergences may harm the theoretical explainability and decision invariance of the fusion framework, which is the "BD" challenge. Besides, we find that the classical JR divergence can be ill-defined. To address these issues, a new BJR divergence should be provided.

B. Proposed BJR Divergence Measure
Next, the BJR divergence is formally defined to address the "BD" challenge. To differentiate it from other Renyi entropy-like belief entropies [59], the Renyi entropy for the Enhanced Pignistic probability is formalized first.
Definition 1 (Renyi entropy for the Enhanced Pignistic probability): Suppose $\Theta$ is the FoD and $H_j \in 2^{\Theta}$; then the Renyi entropy for the Enhanced Pignistic probability is defined in (8), where $\alpha > 0$, $\alpha \neq 1$ is the order and EBetP is the Enhanced Pignistic probability that can be found in [58].
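For orientation, a plausible form of the entropy in Definition 1 is sketched below. It assumes that the definition mirrors the standard Renyi entropy with EBetP in place of the probability mass; the symbol $R_\alpha$ and the logarithm base are assumptions rather than notation confirmed by the source or by [58].

```latex
% Assumed form, mirroring the standard Renyi entropy of order alpha
R_{\alpha}(m) = \frac{1}{1-\alpha}
  \log \sum_{H_j \in 2^{\Theta}} \mathrm{EBetP}(H_j)^{\alpha},
  \qquad \alpha > 0,\ \alpha \neq 1 .

% Standard limit of the Renyi entropy: the Shannon form is recovered as alpha -> 1
\lim_{\alpha \to 1} R_{\alpha}(m)
  = -\sum_{H_j \in 2^{\Theta}} \mathrm{EBetP}(H_j) \log \mathrm{EBetP}(H_j).
```

Under this assumption, the excluded order $\alpha = 1$ corresponds to the Shannon-type limit, which is why the definition domain omits it.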
As an entropy for belief functions, the newly defined Renyi entropy for the Enhanced Pignistic probability is not a disaggregated information measure; it inspects the randomness uncertainty only [60]. With this new entropy, the BJR divergence can be safely defined.
It is effortless to derive that the new BJR divergence can manage the limited symmetry of the two belief Renyi divergences, so the next theorem is provided without proof.This theorem points out that the new divergence is rearrangement invariant regarding ω within α's full definition domain, rather than 1/2 solely, and is symmetric among multiple BPAs, satisfying our expectation.
Theorem 1 (Rearrangement invariance regarding ω): Unlike divergences in (1) and (2)
In addition, the ill-definition of the JR divergence revealed in Proposition 1 has been fixed due to the exp(•) operator, which is self-explanatory in the next theorem.
Theorem 2 (Nonnegativity): Unlike the divergence in (6), for all $\alpha > 0$, $\alpha \neq 1$, the proposed divergence is nonnegative. The nonnegativity theorem is pivotal and is further verified via the next example. For a clear comparison with the probabilistic JR divergence, the data for which negative JR divergences were found are retained.
It is evident in Fig. 4 that, no matter how the parameters μ and α are set, the minimum value of the BJR divergence is always greater than zero. This is in significant contrast to the traditional JR divergence, whose possible negativity was proven in Proposition 1 and exemplified in its proof.
Then, except for the two desired properties, i.e., Theorems 1 and 2, two additional properties are also provided with proofs attached in Appendixes A and B, respectively.
Theorem 4 (Generalization): The BJR divergence can be considered as the generalization of the generalized evidential Jensen-Shannon ($GEJS_{\omega}$) divergence [55] on the exponent operator. When $\alpha \to 1$ and the $m$ function is used to replace the EBetP function, the next relationship holds, in which $|\Phi_j|$ denotes the cardinality of $\Phi_j$ and $\Phi_j \in 2^{\Theta}$. This article considers Theorem 4 elegant since it reveals that the new BD is coherently linked with the well-known generalization circle of Jensen-Shannon divergence-like BDs, such as the belief Jensen-Shannon divergence [41] and the generalized Jensen-Shannon divergence [55].
Finally, when the evidence reliability is unobservable, one can directly assign an average weight to the divergence, i.e., set $\omega = \{1/N, \ldots, 1/N\}_{1 \times N}$, where $N$ denotes the number of pieces of evidence. This case is used later; thus, it is useful to state the following corollaries for the proposed BD with an evenly distributed weighting vector.

III. METHODOLOGY: THE DEEP EVIDENCE-THEORETIC REMOTE SENSING LANDSLIDE IMAGE CLASSIFICATION
As described, the BJR divergence is devised to manage the "BD" challenge. Next, we deal with the "evidence theory involvement" and "evidence conflict" challenges, respectively. Note that the new BD is applied in Section III-B.

A. Proposed Strategy to Involve Evidence Theory
The accurate recognition of landslide images is very thorny primarily due to their irregular shape patterns, but the shape information can be reflected on multiscale visual saliency maps [27].The "evidence theory involvement" challenge also indicates a diversified evidence set is vital to lead in the evidence theory [50].Therefore, we develop a solution with channelwise fusion, which is also the first usage of geological visual saliency for evidence set generation.
The calculation of the multiscale visual saliency map can be briefly introduced as follows [61]. Assume the input image is $I_{ori}$, which is converted to the hue, saturation, and value (HSV) color space [62]; then, downsample $I_{ori}$ to 2/3 and 1/3 of its size. Denote the downsampled images as $I_{2/3}$ and $I_{1/3}$, respectively, and let $\eta(\cdot)$ denote upsampling an image to its original size. Next, the multiscale visual saliency $MS(I_{ori})$ can be computed following [61], where $id \in \{ori, 2/3, 1/3\}$, $\psi$ is the spectral residual visual saliency operator [63], and $G(\cdot)$ is the Sobel image gradient extractor [64]. $w_{id}$ represents the image entropy [65] of the image $I_{id}$ after a step of normalization. The multiscale visual saliency in [61] can automatically generate visual attention maps containing geological information from remote sensing images, yielding the importance degree of pixels. With $MS(I_{ori})$, this article proposes a channelwise fusion structure to obtain a diversified evidence set. Denote by $\otimes$ the Hadamard product used to fuse the multiscale visual saliency channelwise into the remote sensing landslide images in (14), where $C \in \{H, S, V\}$ represents the image channels.
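The channelwise fusion step can be illustrated with a minimal sketch. It assumes that fusion multiplies a single HSV channel elementwise by a saliency map normalized to [0, 1]; the exact form of (14) is not reproduced here, and the saliency map itself (computed as in [61]) is taken as a given input. The function name `fuse_channel` and the toy data are hypothetical.

```python
import colorsys

def fuse_channel(rgb_image, saliency, channel):
    """Multiply one HSV channel by a saliency map (assumed fusion form).

    rgb_image: rows of (r, g, b) tuples with values in [0, 1].
    saliency:  rows of floats in [0, 1], same height and width.
    channel:   'H', 'S', or 'V'.
    Returns the fused image converted back to RGB.
    """
    idx = {"H": 0, "S": 1, "V": 2}[channel]
    fused = []
    for pixel_row, sal_row in zip(rgb_image, saliency):
        out_row = []
        for (r, g, b), s in zip(pixel_row, sal_row):
            hsv = list(colorsys.rgb_to_hsv(r, g, b))
            hsv[idx] *= s  # elementwise (Hadamard) product on one channel
            out_row.append(colorsys.hsv_to_rgb(*hsv))
        fused.append(out_row)
    return fused

# Toy 1x2 image with a hypothetical saliency map.
image = [[(0.8, 0.4, 0.2), (0.1, 0.6, 0.3)]]
saliency = [[1.0, 0.0]]

# One fused image per channel; each would feed its own network.
evidence_inputs = {c: fuse_channel(image, saliency, c) for c in "HSV"}
```

A saliency of one leaves a pixel unchanged, while a saliency of zero removes the hue, saturation, or brightness of that pixel, so the three outputs differ from each other in the way an evidence set requires.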
The two major advantages of HSV space-based channelwise fusion are that i) it provides the prerequisite for a diversified evidence set, and ii) the channel-fused images experimentally yield higher neural network accuracy than direct classification on the original RGB images. For advantage i), the RGB images, multiscale visual saliency maps, and channelwise fused images are displayed in Fig. 5. The colors in the H channel fused images are diversified. The saturation of salient regions is changed in the S channel fused images. Artificial halos can be observed in the V channel fused images. All these diversifications are consistent with the expected evidence set, and with such a set one can safely exploit evidence fusion. Advantage ii) arises because the multiscale visual saliency of landslides in Fig. 5 is commonly higher than that of nonlandslides, which makes the landslide images more diversified; the classification improvement is verified in Section V-F. Fusion on spaces other than HSV is discussed in Section V-G.
Since the evidence set is the premise of evidential fusion [47], after channelwise fusion, three homogeneous networks are trained, one on each category of channelwise fused images. The class activation maps in Fig. 6 show that, among the homogeneous networks, the networks' attention on pixels of interest is changed or shifted (both the activation location and the weighting degree are considered), indicating that the networks focus on different locations or assign different degrees of attention to the same location [66]. The observed phenomenon indicates that the desired evidence set is formed [47].
As suggested in [47], the algorithm should appropriately convert the evidence set into fusion candidates. Therefore, after network training, the evidence theory is implemented. Suppose the predicted probabilities from the networks are $P_H$, $P_S$, and $P_V$ for each image category, and the FoD is $\Theta = \{\theta_l, \theta_{nl}\}$, denoting "landslide" and "non-landslide," respectively. Then, the BPAs before normalization are formed for each $C \in \{H, S, V\}$. A step of normalization is then required to form the BPAs, where $\Phi \in 2^{\Theta}$ and $C \in \{H, S, V\}$. It is easy to verify that the generated BPAs satisfy the property in (18). By verifying (18), it can be determined that the computations of evidence theory on landslide classification are valid [43].

B. New Interpretability Improved Three-Branched Fusion With Proposed BJR Divergence
The final decision on landslides is difficult, which should be made with caution.To tackle the "evidence conflict challenge" in evidence-theoretic decision fusion, after fusion candidate generation, this study proposes the interpretability improved three-branched evidential fusion algorithm.
The motivation for the new algorithm is that the three-branched fusion devised in [47] lacks explicit interpretability for its third weighting branch, i.e., the "impurity" branch: the third branch is just the first-order expansion of the second branch, which makes it difficult to answer why a third weighting branch is indispensable. To manage this issue, the new model abandons the "impurity"-based structure and improves the justification for introducing a third branch in landslide classification. The new workflow can be roughly divided into five steps, of which Steps 2 to 4 are the three branches.
Step 2. (Calculate branch 1: Supporting degree branch) The first branch takes supporting degrees into consideration. Since no prior knowledge of the information weight is provided, the proposed BJR divergence with an evenly distributed weighting vector is exploited for the calculation, where $Dis(m_1, \ldots, m_{j-1}, m_{j+1}, \ldots, m_N \mid m_j)$ denotes the average dissimilarity among the $N$ sets of BPAs except $m_j$, $j \in \{1, \ldots, N\}$. Then, the supporting degree $Sup_j$ of the jth evidence is obtained.
According to the suggestions by Xiao [55], if $Sup_j < 1$, the average dissimilarity with evidence $m_j$ is greater than the average dissimilarity without $m_j$; thus, $m_j$ is a conflicting piece of evidence, and a smaller credibility should be assigned. If $Sup_j > 1$, the average dissimilarity decreases after $m_j$ is provided; thus, a larger credibility should be assigned. If $Sup_j = 1$, no significant change in dissimilarity is obtained. Then, the final supporting degree is obtained as

$$\widetilde{Sup}_j = \frac{Sup_j}{\sum_{s=1}^{N} Sup_s} \quad (22)$$

where $j \in \{1, \ldots, N\}$.
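The leave-one-out logic of the supporting degree branch can be sketched as follows. Two assumptions are made and should be read as such: the raw supporting degree is taken as the ratio of the dissimilarity without $m_j$ to the dissimilarity with all evidence (which matches the interpretation of $Sup_j < 1$ above), and an equal-weight generalized Jensen-Shannon divergence on probability vectors stands in for the BJR divergence. The inputs are hypothetical.

```python
import math

def shannon(p):
    """Shannon entropy with natural logarithm."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def average_dissimilarity(dists):
    """Equal-weight generalized Jensen-Shannon divergence (stand-in for BJR)."""
    n = len(dists)
    mixture = [sum(p[k] for p in dists) / n for k in range(len(dists[0]))]
    return shannon(mixture) - sum(shannon(p) for p in dists) / n

def supporting_degrees(dists):
    """Leave-one-out ratio per evidence, then normalization to sum to one.

    Requires at least three inputs that are not all identical, so that the
    overall dissimilarity in the denominator is positive.
    """
    dis_all = average_dissimilarity(dists)
    raw = []
    for j in range(len(dists)):
        others = dists[:j] + dists[j + 1:]
        raw.append(average_dissimilarity(others) / dis_all)
    total = sum(raw)
    return [r / total for r in raw]

# Two agreeing sources and one conflicting source (hypothetical values).
evidence = [(0.90, 0.10), (0.85, 0.15), (0.10, 0.90)]
sup = supporting_degrees(evidence)
print(sup)
```

Removing the conflicting third source leaves two nearly identical distributions, so its ratio is the smallest and it receives the lowest normalized supporting degree.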
Step 3. (Calculate branch 2: Information volume branch): Information volume measures quantify the informational quality of evidence, and Deng entropy has achieved success [47]; thus, the Enhanced Pignistic Deng entropy (EPDE) [58] is utilized to measure the uncertainty degree of the jth evidence, where $j \in \{1, \ldots, N\}$. Then, the information volume of the jth evidence is formed, which completes Step 3. Note that, as a Deng entropy-like entropy, the EPDE linearly combines the uncertainty measured from randomness and nonspecificity [60]. However, it is argued in [68] that the nonspecificity part of Deng entropy may grow too large and thus cancel the uncertainty measurement from randomness. As a belief entropy analogous to Deng entropy, the EPDE also faces this problem. With this motivation in mind, unlike the original method in [47], this study adopts another entropy of belief functions in Step 4, which enhances the explainability of the three-branched fusion process.
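The split of a Deng-type entropy into randomness and nonspecificity parts can be made concrete with the original Deng entropy, whose formula is standard. The sketch below does not implement the EPDE of [58]; it only shows the two additive parts on hypothetical BPAs represented as dictionaries keyed by `frozenset` focal elements.

```python
import math

def deng_entropy_parts(bpa):
    """Return the (randomness, nonspecificity) parts of Deng entropy.

    bpa maps frozenset focal elements to masses that sum to one.
    Deng entropy = -sum m(A) * log2(m(A) / (2**|A| - 1)),
    which splits into the two parts returned here.
    """
    randomness = -sum(m * math.log2(m) for m in bpa.values() if m > 0.0)
    nonspecificity = sum(m * math.log2(2 ** len(a) - 1)
                         for a, m in bpa.items() if m > 0.0)
    return randomness, nonspecificity

theta = frozenset({"l", "nl"})            # two-hypothesis frame
precise = {frozenset({"l"}): 1.0}         # all mass on one singleton
vacuous = {theta: 1.0}                    # all mass on the whole frame

print(deng_entropy_parts(precise))
print(deng_entropy_parts(vacuous))
```

For the vacuous BPA the randomness part is zero and the whole entropy comes from nonspecificity, which grows with the cardinality of the focal elements; this is the imbalance that motivates the third branch.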
Step 4. (Calculate branch 3: Improved information volume branch): Following the drawback analysis of Deng entropy presented in Step 3, to enhance the uncertainty measurement from randomness, the Renyi entropy for the Enhanced Pignistic probability defined in (8) is exploited, where $j \in \{1, \ldots, N\}$. Then, a step of normalization is required. According to (8), the proposed belief entropy measures only the uncertainty from randomness and is parameterized by $\alpha$, which is more feasible. Unlike its counterpart in [47], the new approach abandons the "impurity" term, which can be more beneficial to the fusion model, because the third branch inherently addresses the shortcoming of the second branch. Consequently, the third branch can serve as an improvement of the second branch with better theoretical significance. Note that although the branches can conflict, a computational crash will not occur, as the three branches are only weights rather than decisions.
Step 5. (Multiple evidence combination for landslide classification): The evidence combination determines the decision belief assignment on landslides [58]. First, fuse the three weight components to attain the evidential weight of the jth evidence

$$\hat{W}_j = \widetilde{Sup}_j \times IV_j \times EIV_j \quad (27)$$

for $j \in \{1, \ldots, N\}$, where $\widetilde{Sup}_j$ is the final supporting degree from (22). Then, a step of normalization is required, where $j \in \{1, \ldots, N\}$. Next, form the weighted evidence $WEBetP$ from the normalized weights. Then, fuse the weighted evidence $N - 1$ times to obtain the decision evidence function in (30), where $\oplus$ marks Dempster's combination rule [38].

This completes the interpretability improved three-branched fusion for landslide image classification. Fig. 7 displays the full evidence-theoretic fusion process. The refined fusion framework helps to gather and fuse the diversified evidence from deep neural networks; thus, the decisions can be integrated more reasonably. For decision-making, the Pignistic transformation [69] is utilized to convert belief functions back into probability functions, where $\theta \in \Theta$. Ultimately, the BetP function carries the final classification on remote sensing landslide images.
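The combination stage of Step 5 can be sketched end to end under stated assumptions. Dempster's rule and the Pignistic transformation follow their standard definitions; the weighted evidence is formed here as a weighted average of plain BPAs (the source averages EBetP functions from [58], which are not reproduced), and the BPAs and branch weights are hypothetical values rather than those of the paper's tables.

```python
from functools import reduce

def dempster(m1, m2):
    """Dempster's combination rule for BPAs keyed by frozenset focal elements."""
    combined, conflict = {}, 0.0
    for a, x in m1.items():
        for b, y in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + x * y
            else:
                conflict += x * y  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

def pignistic(m):
    """BetP(theta): share each focal mass equally among its elements."""
    betp = {}
    for a, x in m.items():
        for theta in a:
            betp[theta] = betp.get(theta, 0.0) + x / len(a)
    return betp

def weighted_fusion(bpas, branch_weights):
    """Multiply the three branch weights, normalize, average, then fuse."""
    raw = [w1 * w2 * w3 for (w1, w2, w3) in branch_weights]
    total = sum(raw)
    weights = [r / total for r in raw]
    averaged = {}
    for w, m in zip(weights, bpas):
        for a, x in m.items():
            averaged[a] = averaged.get(a, 0.0) + w * x
    fused = reduce(dempster, [averaged] * len(bpas))  # N - 1 combinations
    return pignistic(fused)

L, NL = frozenset({"l"}), frozenset({"nl"})
BOTH = L | NL
bpas = [
    {L: 0.05, NL: 0.90, BOTH: 0.05},  # conflicting source
    {L: 0.80, NL: 0.10, BOTH: 0.10},
    {L: 0.85, NL: 0.05, BOTH: 0.10},
]
branch_weights = [(0.10, 0.30, 0.30), (0.45, 0.35, 0.35), (0.45, 0.35, 0.35)]
betp = weighted_fusion(bpas, branch_weights)
print(betp)
```

Because the three branch weights are multiplied, a source that scores low on every branch is suppressed strongly before fusion, so the conflicting first source has little influence on the final Pignistic probabilities.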

IV. THEORETICAL COMPARISON ON EVIDENCE CONFLICT MANAGEMENT
In Section III, an interpretability improved three-branched fusion framework is proposed. Recall that the new fusion model is mainly proposed to cope with evidence conflict [41]; thus, like [41], [55], and [56], this study provides a short fusion analysis of an evidence conflict that emerged in remote sensing landslide image classification.
Example 2 (Numerical validation on conflict management): This example is from a real landslide classification problem.Fig. 8 displays the channelwisely fused remote sensing images that contain a landslide occurred near a mountain road.Suppose FoD Θ = {θ l , θ nl }, where θ l and θ nl denote landslide and nonlandslide respectively.m H , m S , and m V are three BPAs converted from networks' predictions on H, S, and V channel fused images.The BPAs have been displayed in Table V, and m H is noticed to be a piece of conflicting evidence with the highest belief 0.9591 on {θ nl }.Next, this example involves seven previous evidential fusion schemes to fuse the evidence.Table III displays the results.
In Table III, it is clear that except Dempster's rule [38], the remaining algorithms have successfully discriminated the desired hypothesis, i.e., {θ l }.In addition, the proposed fusion with α = 0.5 reaches the highest belief on the desired hypothesis with 0.9906, which reflects its efficiency in evidence conflict management [41].In summary, this result primarily verifies the practicality of the proposed evidential fusion scheme.Even when confronting evidence conflict, the proposal can still keep robust to desired fusion results.
Next, the contribution ratio of each weighting branch and the final weights of evidence are shown in Table IV. One can easily see that all six involved algorithms assign the lowest weight to the untrusted evidence $m_H$ as desired, but the weights from the three-branched schemes, i.e., Zhang's method [47] and the proposal, are on average even lower, at 0.0563 and 0.0184, respectively. This suggests that the three-branched schemes can be more sensitive to evidence conflict, because once the third weighting branch is added, the multiplier in (27) reinforces the canceling effect on the conflicting evidence. Moreover, the proposal is more sensitive to evidence conflict than the original Zhang's method [47], since it assigns an even lower evidential weight (0.0184) to the untrusted $m_H$, which is more beneficial to conflict-based evidential fusion.
Finally, since the participating weighting-based fusion methods, except Zhang's method [47], mainly fuse weighted BPAs rather than weighted EBetP functions with Dempster's combination rule [38], this study also provides the fused belief on the desired hypothesis $\{\theta_l\}$ under the two mentioned cases in Fig. 9. It is clear that when using weighted EBetP functions, the model is more competitive (+0.1109 belief at most). But even when fusing weighted BPAs, except for the case α = 0.1, the proposal still obtains a higher belief on the desired hypothesis than the other models that fuse weighted BPAs (the best of which is the generalized Renyi divergence [44] with a fused belief of 0.8911). Consequently, this case study shows that the proposal remains robust to evidence conflict even when weighted BPAs are fused.
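To make the idea of weighted fusion of BPAs concrete, the sketch below implements Dempster's combination rule together with a generic weighted-average fusion, in which the weighted mean BPA is combined with itself n − 1 times. This is a common simplified scheme and not the three-branched weighting of this article; the BPAs, the weights, and the function names `dempster` and `weighted_fusion` are hypothetical and chosen only to mimic one conflicting source.

```python
from itertools import product


def dempster(m1, m2):
    """Dempster's rule for two BPAs given as {frozenset: mass}."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty intersection
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def weighted_fusion(bpas, weights):
    """Average the BPAs with normalized weights, then fuse n - 1 times."""
    total = sum(weights)
    avg = {}
    for bpa, w in zip(bpas, weights):
        for focal, mass in bpa.items():
            avg[focal] = avg.get(focal, 0.0) + (w / total) * mass
    fused = avg
    for _ in range(len(bpas) - 1):
        fused = dempster(fused, avg)
    return fused


L, NL = frozenset({"landslide"}), frozenset({"nonlandslide"})
# Hypothetical BPAs: m_h conflicts with m_s and m_v.
m_h = {L: 0.04, NL: 0.96}
m_s = {L: 0.90, NL: 0.10}
m_v = {L: 0.85, NL: 0.15}

plain = dempster(dempster(m_h, m_s), m_v)
# Hypothetical weights that down-weight the conflicting source m_h.
weighted = weighted_fusion([m_h, m_s, m_v], [0.05, 0.475, 0.475])
```

With these illustrative numbers, assigning a small weight to the conflicting source raises the fused belief on the landslide hypothesis compared with applying Dempster's rule directly.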

A. Dataset Description
To guarantee a fair comparison, the involved dataset is the open-source Bijie Landslide dataset [13], an optical remote sensing landslide image dataset collected by research groups at Wuhan University. This dataset consists of 2773 images captured by the TripleSat satellite in Bijie, China, from May to August 2018, including 770 optical landslide images and 2003 nonlandslide images. Example images are shown in Fig. 10.
Then, this article uses a ratio of 0.64:0.16:0.20 to randomly split the data into training, evaluation, and test sets for network training. Details are listed in Table VI.
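A random split with the stated ratio could be sketched as follows. The seed, the rounding rule, and the function name `split_indices` are assumptions for illustration, so the resulting subset sizes may differ slightly from those reported in Table VI.

```python
import random


def split_indices(n, ratios=(0.64, 0.16, 0.20), seed=0):
    """Shuffle indices 0..n-1 and cut them into train/evaluation/test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed (assumed) for repeatability
    n_train = round(n * ratios[0])
    n_eval = round(n * ratios[1])
    # The test set takes whatever remains, so the three parts always sum to n.
    return idx[:n_train], idx[n_train:n_train + n_eval], idx[n_train + n_eval:]


train, evaluation, test = split_indices(2773)
```

Letting the test set absorb the remainder avoids losing or duplicating images when the ratios do not divide the dataset size evenly.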

B. Setup and Training Details
The following experiments are executed on a platform with an Intel i9-11900K 3.50 GHz CPU and 64.00 GB of RAM. An NVIDIA GeForce RTX 3090 graphics processing unit is also exploited to accelerate the network training process.
To guarantee a reproducible study, the involved networks are trained for 100 epochs with initial weights pretrained on the ImageNet-1K dataset [70]. Images are center-cropped to 224 × 224 with a batch size of 32. Optimization details of the participating networks are shown in Table VII. For the "Step" updater, the decay steps are 30, 60, and 90. For the "CA" updater, the minimum learning rate ratio is 0.01. The label smoothing value is 0.1. Data augmentation [71] is also employed to mitigate the issues of data imbalance and the small amount of data.
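The two learning rate updaters can be sketched as plain functions of the epoch index. The decay steps (30, 60, 90), the 100 training epochs, and the minimum learning rate ratio (0.01) come from the text; the decay factor `gamma = 0.1` and the base learning rate used in the example are assumptions, since the actual values are given in Table VII.

```python
import math


def step_lr(epoch, base_lr, milestones=(30, 60, 90), gamma=0.1):
    """'Step' updater: multiply the rate by gamma at each passed milestone."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def cosine_lr(epoch, base_lr, total_epochs=100, min_ratio=0.01):
    """'CA' updater: cosine annealing from base_lr down to min_ratio * base_lr."""
    min_lr = base_lr * min_ratio
    cos = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return min_lr + (base_lr - min_lr) * cos


base = 0.1  # hypothetical base learning rate
step_schedule = [step_lr(e, base) for e in range(100)]
cosine_schedule = [cosine_lr(e, base) for e in range(100)]
```

Both schedules start at the base rate; the step schedule drops abruptly at the milestones, whereas cosine annealing decays smoothly toward the minimum rate.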

C. Comparison Algorithms
Considering that this article is a de facto integrated study, we involve four comparative experiments, whose organization is shown in Fig. 11.
Deep image classification backbone networks are essential to image classification tasks in computer vision-related areas. Thus, in the first experimental study, this article involves eight classical or very recently proposed image classification backbone network families (12 networks in total) to verify the proposal's efficiency: ResNet101 [72], EfficientNet (b0) [73], MobileNetV3 (small) [74], RepVGG (B0) [75], Visual Attention Network (VAN, Tiny and Base) [76], Swin Transformer (Tiny and Base) [77], Swin Transformer V2 (Tiny and Base) [78], and HorNet (Tiny-GF and Base-GF) [79]. In the second comparison, the improved three-branched evidential fusion is compared with SOTA evidence fusion algorithms on the best-performing network, which include Dempster's rule [38], Murphy's method [80], Deng's method [81], belief Hellinger distance [57], belief Renyi divergence [43] and its generalization [44], Zhang's method [47], and this proposal.
Then, in the third comparative study, this article provides the comparison with eight remote sensing scene image classification algorithms. Bazi et al. [82] employed ViT-32 for this vision task. In [83], SCCovNet is devised for end-to-end remote sensing scene image classification with skip connections and covariance pooling. In [84], the network VGG-VD-16 is applied. In [85], a multiscale feature fusion covariance network named MF2CNet with octave convolution is constructed. Tang et al. [86] proposed EMTCAL, a remote sensing scene classification algorithm with a multiscale transformer and cross-level attention learning. A homo-heterogenous transformer learning (HHTL)-based remote sensing scene image classification framework is demonstrated in [87]. And Huang et al. [50] proposed ECMS, a multicolor spaces-based remote sensing image classification method with optimizable BPA function discounting weights. This article chooses their best-performing network, GSANet [88], for the comparison.
Fig. 10. Exemplified images from the Bijie Landslide dataset [13], where (1)-(4) are landslide images, and (5)-(7) are nonlandslide images.
As for the fourth study, we compare the proposal with three recently prevalent evidence-theoretic deep learning classifiers. In [90], an evidence-theoretic deep learning algorithm with expected utility theory was developed. In [89], dynamic evidence fusion is connected with trusted multiview classification. Debaque et al. [91] presented a deep evidence-theoretic framework for accurate sheep classification. This study compares them in the deep image-level classification task on remote sensing landslide images.

D. Evaluation Criteria
In the experiments, four metrics (overall accuracy, precision, true positive rate, and F-measure score) are involved to guarantee a fair comparison [50]
$$\mathrm{OA} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \tag{33}$$
where TP, TN, FP, and FN represent true positive, true negative, false positive, and false negative.
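For reference, the four metrics can be computed from the confusion-matrix counts as sketched below. Only the OA formula appears explicitly above; the precision, true positive rate, and F-measure follow their standard definitions, which are assumed here to match those used in the experiments. The counts in the example are hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return OA, precision, TPR (recall), and F-measure from raw counts."""
    oa = (tp + tn) / (tp + tn + fp + fn)
    pre = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    # F-measure as the harmonic mean of precision and TPR.
    f_measure = 2 * pre * tpr / (pre + tpr) if pre + tpr else 0.0
    return {"OA": oa, "Pre": pre, "TPR": tpr, "F": f_measure}


# Hypothetical counts for a landslide (positive) vs. nonlandslide test set.
scores = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```

The guards against zero denominators only matter for degenerate cases, for example when a classifier never predicts the landslide class.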

E. Comparative Analyses
1) Efficiency Analysis on Network Structures:
One of this study's goals is to demonstrate the efficiency of evidence theory-coupled deep learning in remote sensing landslide image classification. Thus, to begin with, this article compares using a single image classification backbone network alone with the proposal employing evidence theory (both the channelwise fusion and the interpretability improved three-branched fusion are included). The evaluation results in Fig. 12 are revealing in two ways, which are discussed below.
1) The introduction of evidence theory has proven efficient with almost all the involved neural networks. Except for the slightly decreased Pre score on RepVGG-B0, the proposed framework significantly improves the OA, Pre, TPR, and F-measure scores on the remaining 11 neural networks (by up to +4.15%, +5.19%, +5.72%, and +4.94%, respectively). The dominant reason is that evidence theory can combine multiple meaningful features and pieces of information and consequently make comprehensive decisions, whereas a single network does not have this property. It also suggests that evidence theory does not seem to have an explicit preference for a particular network depth or structure. Namely, the tested networks vary drastically in scale, and they may be built on CNNs, ViTs, or their combinations, but we find that most of them tend to be improved alike.
2) Under the four given criteria, the Swin Transformer V2-Tiny and VAN-Base achieve the highest OA, TPR, and F-measure with 0.9963, 0.9935, and 0.9935, respectively. For Swin Transformer V2-Tiny, this is because its scaled cosine attention improves the learning of diversified features from landslides. As for VAN-Base, its channel adaptability is improved, which is compatible with the proposed channelwise fusion. In the following parts, this article chooses the Swin Transformer V2-Tiny as the best backbone network for further investigations.

2) Comparison With Previous Evidence Fusion Algorithms:
Besides the theoretical comparison in Section IV, the improved three-branched fusion should also be investigated in remote sensing image classification. Therefore, the present article compares the improved model with seven state-of-the-art evidence fusion algorithms, and the results are shown in Table VIII. The parameter analysis on α is also included.
The data in Table VIII reveal that the proposed scheme yields the highest OA, Pre, TPR, and F-measure scores with 0.9964, 0.9935, 0.9935, and 0.9935 when α = 0.1 and 0.5. This is because, in contrast with earlier proposals, the proposed BD introduces a more general mathematical form that can handle multiple pieces of evidence. As a result, the a priori knowledge from landslide visual features can be reasonably fused with a global interplay, and the decision accuracy is further improved. Note that when α = 1.5, the performance of the proposal drops to a lower level, suggesting that when applying this scheme to real-world landslide image classification problems, it is better to fine-tune α to guarantee the algorithm's performance.
3) Comparison With Deep Remote Sensing Image Scene Classification Algorithms: In this section, the present study turns to the comparison with state-of-the-art (evidential) remote sensing image scene classification schemes.
To start with, the algorithms' performances are displayed in Table IX. A positive finding is that the new method with the best-performing backbone network ranks first under OA, Pre, TPR, and F-measure with 0.9964, 0.9935, 0.9935, and 0.9935, respectively. The result from ECMS-GSANet is also satisfying. Nonetheless, its simple channel conversion is adverse to network accuracy. That is, although it also uses evidence theory to combine multisource information, its fusion candidates have already made more false classifications than those of the proposal. The lower competitiveness of the remaining methods is attributable to two reasons: 1) they are generalist networks, whose structures are less optimized for the complex landslide features, and 2) they rely to a great extent on a single image, which is less comprehensive than exploiting integrated visual patterns and features.
Then, we go a step further. Since the Precision/Recall curve and the receiver operating characteristic (ROC) curve are two more objective performance indicators [92], the present article involves them in Fig. 13. In the Precision/Recall curves, it is apparent that the proposal with the best-performing backbone network dominates this metric with the highest average precision score of 0.9985, demonstrating its superiority. Then, in the ROC curves, since both the proposal and ECMS-GSANet yield satisfying performance, they seem to dominate this metric simultaneously. However, because an algorithm can dominate the Precision/Recall curves only when it ranks first in the ROC curves [93], and the proposal has gained the highest area under curve score of 0.9994, the proposal is still more competitive than the second best-performing algorithm.
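As a reminder of what the area under the ROC curve measures, the sketch below computes it through its rank-based interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The scores and labels are hypothetical, and the function name `roc_auc` is an assumption; this is an illustrative computation rather than the evaluation code behind Fig. 13.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical landslide scores (label 1 = landslide, 0 = nonlandslide).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(example_scores, example_labels)
```

The quadratic pairwise loop is adequate for small test sets; for large ones a sorting-based implementation would be preferable.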
Finally, the present study uses Fig. 14 to compare the classification on difficult samples. The images in the figure were randomly selected from the samples that are confusing even for humans. A positive finding from the first three rows is that our approach stands out as compared with ViT-32, MF2CNet, and HHTL. However, our method also fails when dealing with extremely difficult samples, as presented in the last row. The khaki scene in the image center closely resembles the color and shape pattern of real landslides, making it very challenging for artificial intelligence classifiers to recognize.

4) Comparison With Evidence-Theoretic Deep Learning Classifiers:
In this section, the present study further compares the proposal with three versatile deep learning classifiers combining evidence theory, namely evidence deep learning with utility theory (EDLU) [90], evidence deep learning for sheep classification (EDLS) [91], and evidential trusted multi-view classification (ETMC) [89]. The backbone network of EDLU and the proposal is Swin Transformer V2-Tiny, and Table X exhibits the classification results.
Through data analysis, it is clear that EDLS achieves inferior performance. The core reason is that its ResNet18 backbone networks are too shallow to capture useful visual features for landslides. The proposal also has significant potential to surpass both ETMC and EDLU. For ETMC, this seems to be because its fusion rule is akin to the traditional Dempster's fusion rule, which is not robust against evidence conflict when it has to face confusing a priori knowledge. The underlying reason for EDLU is that it lacks a decision fusion module, which hinders the accuracy boost of evidence theory when integrating landslide information, such as colors and shape patterns. The proposal can avoid the three analyzed disadvantages; thus, a better performance can be obtained.

F. Ablation Studies
1) Ablation Study on Each Fusion Stage: First, we present the effects of each fusion stage. Fig. 15 displays the confusion matrices before and after each fusion (the results of the channelwise fusion are listed in accordance with their fused channel, i.e., the H, S, or V channel). The results strongly support the efficiency of our proposal, since, in contrast with the RGB images, the number of misclassifications significantly drops on all the H, S, and V channel fused images. Namely, we find that the HSV space conversion plus the channelwise fusion results in three better feature spaces than the traditional RGB space. This encouraging phenomenon can be observed on 11 backbone networks (still excluding RepVGG-B0), and this is also the first study to discover that the proposed HSV channelwise fusion can enhance neural networks on remote sensing landslide images. Then, with the new three-branched fusion, the final results with the fewest misclassifications can be achieved. In summary, both the channelwise fusion and the interpretability improved three-branched fusion are found effective.
2) Ablation Study on Improved Three-Branched Fusion: The second study concerns the refined three-branched fusion, where the conditions are presented in Table XI. Fig. 17 displays the comparison results on the RepVGG-B0 and Swin Transformer V2-Tiny backbone networks. Fig. 17 reveals that if the third weighting branch is not involved, the OA scores on both networks drop to a lower level. Then, if we adopt the third branch, the proposed scheme achieves the same performance as the current best three-branched scheme, i.e., Zhang's method [47]. However, recall that Zhang's model faces a dilemma between numerical efficiency and model explainability, which is avoided by the proposal. Consequently, this article considers that the proposed scheme is better not only in terms of model explainability, but also in its guarantee of computational effectiveness.
Fig. 15. ... (d) SwinTransformer-T [77], (e) SwinTransformer V2-T [78], (f) HorNet-T-GF [79], (g) EfficientNet [73], (h) RepVGG-B0 [75], (i) ..., (j) SwinTransformer-B [77], (k) SwinTransformer V2-B [78], and (l) HorNet-B-GF [79], respectively. Columns (1) and (6) are the classification results from the original images, (2) and (7) are from the H channel fused images, (3) and (8) are from the S channel fused images, (4) and (9) are from the V channel fused images, and (5) and (10) are from the proposed fusion flow (including the channelwise fusion and the new divergence-based interpretability improved three-branched fusion).

G. Sensitivity Analyses
1) Sensitivity on Fused Color Spaces:
In the proposed channelwise fusion, this article chooses an HSV space conversion strategy. However, the fusion in other color spaces, such as RGB, XYZ, and Lab, has not been verified yet. Therefore, this article also tests the model performance when the fusion is applied in spaces other than HSV. Fig. 16 demonstrates the OA score when the channelwise fusion is applied in the RGB, XYZ, Lab, HSV, and YIQ spaces. A key conclusion is that, except for EfficientNet-b0 and RepVGG-B0, the HSV-space fused images tend to reach a better classification. A possible explanation is that the HSV space is naturally more suitable for feature representation, which is in line with the literature [50]. Thus, by appending an HSV transformation, the neural networks can automatically extract better features on landslides than the features in the remaining color spaces. This finding also explains why the HSV space is chosen as the basis for our later operations.
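The color space conversion that precedes the channelwise fusion can be illustrated with the standard library module `colorsys`. The sketch below only separates an RGB image into its H, S, and V channels; it does not implement the multiscale saliency fusion itself. The nested-list image layout, the 8-bit value range, and the function name `split_hsv_channels` are assumptions for illustration.

```python
import colorsys


def split_hsv_channels(image):
    """Split an RGB image (rows of 8-bit (r, g, b) tuples) into H, S, V planes.

    Each returned plane is a list of rows with values in [0, 1].
    """
    h_ch, s_ch, v_ch = [], [], []
    for row in image:
        hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
               for r, g, b in row]
        h_ch.append([h for h, _, _ in hsv])
        s_ch.append([s for _, s, _ in hsv])
        v_ch.append([v for _, _, v in hsv])
    return h_ch, s_ch, v_ch


# A tiny hypothetical 2 x 2 image: red, gray, green, blue.
tiny = [[(255, 0, 0), (128, 128, 128)],
        [(0, 255, 0), (0, 0, 255)]]
h, s, v = split_hsv_channels(tiny)
```

Unlike RGB, the HSV representation separates hue and saturation from brightness, which is one intuitive reason why per-channel processing in this space can expose different visual cues.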
2) Sensitivity on Evidence Conflict Management: This section investigates the sensitivity of evidential weight assignment regarding α, which determines the sensitivity of evidence conflict management.This test is performed on Swin Transformer V2-Tiny network, and results are demonstrated in Fig. 18.
An essential outcome is that, as α increases, the evidential weights gradually approximate long-tailed distributions. More weights are assigned near 0 and 1, the two boundary values, which indicates that the sensitivity to evidence conflict is reinforced. In accordance with [58], an evidence fusion system with more sensitivity to evidence conflict might be more helpful in managing the counterintuitive results of Dempster's combination rule. Therefore, we can learn that as α rises, the management of evidence conflict also varies dramatically.
3) Sensitivity to Image Noise Degradation: Image degradation can greatly influence deep learning on remote sensing image classification [95]. This article simulates two types of sensor-affected image degradation in remote sensing, i.e., additive and multiplicative Gaussian noise, and compares the proposal with a single backbone network. Fig. 19 displays example noisy landslide images affected by a stochastic zero-mean Gaussian signal with a variance of 0.1.
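The two noise models can be sketched as follows for pixel intensities normalized to [0, 1]. The additive model adds zero-mean Gaussian noise to each pixel, while the multiplicative model is taken here as x(1 + n), a common speckle-like formulation. That formulation, the clipping to [0, 1], the seed, and the function name `add_gaussian_noise` are assumptions, since the exact simulation procedure is not specified in the text.

```python
import random


def add_gaussian_noise(pixels, variance=0.1, mode="additive", seed=0):
    """Corrupt normalized pixel values with zero-mean Gaussian noise."""
    if mode not in ("additive", "multiplicative"):
        raise ValueError("mode must be 'additive' or 'multiplicative'")
    rng = random.Random(seed)  # fixed seed (assumed) for repeatability
    sigma = variance ** 0.5
    noisy = []
    for x in pixels:
        n = rng.gauss(0.0, sigma)
        y = x + n if mode == "additive" else x * (1.0 + n)
        noisy.append(min(1.0, max(0.0, y)))  # keep values in the valid range
    return noisy


row = [0.0, 0.25, 0.5, 0.75, 1.0]  # hypothetical row of pixel intensities
additive = add_gaussian_noise(row, mode="additive")
multiplicative = add_gaussian_noise(row, mode="multiplicative")
```

Under the multiplicative model the perturbation scales with the pixel intensity, so dark pixels are affected less than bright ones, whereas additive noise perturbs all pixels equally.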
The comparison is presented in Fig. 20. One can learn that, as the noise variance increases, both the single network and the proposal show a performance degradation. Nevertheless, the proposal can still produce a better OA score than the single networks, which suggests robustness against noise-caused image degradation. A possible explanation is that, in contrast with a single network, evidence theory is more adept at organically capturing and unifying several information sources. Consequently, a more robust decision-making is observed. This also provides new evidence of the effectiveness of evidence theory in remote sensing landslide image classification.
With the data from Table XII, it is clear that the proposal ranks first as compared with the remaining algorithms, which reflects the proposal's efficiency. The underlying cause is that the proposal chooses an improved three-branched structure to evaluate the information sources, which is more comprehensive than its former counterparts. In addition, from the ranking of OA scores, the law "the faster, the better" is strictly followed. Namely, the simplest algorithm participating in this study faces the heaviest risk of performance degradation. This finding suggests that our algorithm is still far from being so complex as to lose generalization ability.

TABLE XII COMPUTATIONAL COMPLEXITY COMPARISON WITH COMPETING EVIDENTIAL FUSION ALGORITHMS
2) Real Scenario Verification: So far, the proposed method has only been verified on the Bijie Landslide dataset. Next, we execute a verification on a real scenario in Luding, China, to further evaluate the proposal.
Luding County is located in Sichuan Province, Southwest China. As shown in Fig. 21, due to its complex terrain and fragile ecological environment, landslides are frequent in this area. Fu et al. [97] published a series of real landslide scenarios in Luding, which includes 200 optical remote sensing images of earthquake-induced landslides captured by the GF-6 satellite on September 5, 2022. Although they did not point out the exact regions where the landslides are located, one can still use this real image set as unseen samples to test the models' generalization ability. Consequently, we apply four image classification networks that are coupled with evidence theory and pretrained on the Bijie dataset to make predictions on the Luding image set, where the results are shown in Fig. 22.
The finding that stands out from Fig. 22 is that the evidence theory-coupled methods outperform the single networks. The dominant reason is that, when facing unseen instances, the evidence-theoretic fusion can efficiently exploit its nature in dealing with "uncertain" and "ignorant" a priori information, which brings about a more robust decision-making. In contrast, the original single networks are less competitive in processing the uncertainty under dataset shift; thus, they can be less accurate. The authors notice that the work in [98] can support this analysis.
3) Proposal's Shortcomings and Future Works: An algorithm can hardly be competitive in every aspect. Even though the proposal has satisfying decision accuracy, it is still limited to pure vision-based classification. Some environmental factors, such as the profile curvature and slope angle, are not involved. Besides, the current model has a high complexity and has not yet been pruned for in-orbit embedded systems. Future works can focus on the combination of geological landslide discriminators, as well as the evidence theory-boosted fast recognition on in-space platforms.

VI. CONCLUSION
In the task of remote sensing landslide image classification, the accuracy of conventional deep learning still has room for improvement. Aiming at this issue, the present study introduces evidence theory to enhance deep learning classifiers. We address three insurmountable challenges that hinder the implementation of evidence theory, namely the "evidence conflict," "BD," and "evidence theory involvement" challenges, and make three improvements. To tackle the BD challenge, the BJR divergence is proposed. Next, for the evidence theory involvement challenge, we newly design a channelwise fusion strategy with multiscale visual saliency. Its reduction effect on false classification is also witnessed when compared with RGB landslide images. Finally, to address the challenge of evidence conflict, an interpretability improved three-branched fusion is meticulously devised, which successfully refines the model explicability of its former counterpart.
To comprehensively evaluate the proposal, in the theoretical comparison, we verify its robustness against evidence conflict; in the experimental studies, we uncover its suitability for state-of-the-art image classification backbone networks. The proposal also tends to yield better classification as compared with remote sensing scene classifiers, versatile deep evidence-theoretic classifiers, and evidential information fusion schemes. We also discussed the proposal's sensitivity to two types of image degradation, as well as its performance under different color spaces. Finally, the proposal is verified in a real landslide scenario in Luding County, China, whose data is publicly available.
Despite its effectiveness, the present article has pointed out that the proposal is still restricted to pure vision-based landslide classification, and the current model has not been pruned for in-orbit platforms. In future works, the authors will focus on evidence theory-boosted landslide classification combining multiple topographic discriminators, as well as its fast and accurate identification on in-orbit instruments.

ACKNOWLEDGMENT
The authors would like to sincerely thank Prof. Lianmeng Jiao of Northwestern Polytechnical University, China, for his timely consultations, and the editors and peer reviewers for all their encouragement and constructive suggestions.
Lemma 2: When α ∈ (1, +∞), the Renyi entropy of a probability distribution P is always upper bounded by the traditional Shannon entropy
$$R_\alpha(P) \le H(P) \tag{37}$$
where H(P) is the classical Shannon entropy of the probability distribution P [99].
With the lemmas, this theorem can be proven safely.
Proof: It is clear that the inequality $\mathrm{BJR}_{\omega,\alpha} \ge 0$ holds. By using Lemma 1, it is rational to suppose that when α ∈ (0, 1), the upper bound of the JR divergence is M; then, it is easy to derive the next relationship ... $w_i \mathrm{EBetP}_i$ (39) where H represents the traditional Shannon entropy. Since the Shannon entropy achieves its maximum value when P is an n-dimensional discrete uniform distribution, we have ... Thus, the BJR divergence is upper bounded by $e^{\log n}$. In summary, for α ∈ (0, 1) ∪ (1, +∞), the BJR divergence has an upper bound $M = \max\{e^{M}, e^{\log n}\}$.
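The two entropy facts used in the proof, namely that the Renyi entropy with α > 1 does not exceed the Shannon entropy and that the Shannon entropy of an n-dimensional distribution is at most log n, can be checked numerically with the short sketch below. It uses the standard definitions with natural logarithms; the test distribution and the function names are hypothetical, and the check illustrates the lemmas rather than replacing the proof.

```python
import math


def shannon_entropy(p):
    """H(P) = -sum p_i log p_i (natural logarithm)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi_entropy(p, alpha):
    """R_alpha(P) = log(sum p_i^alpha) / (1 - alpha), alpha > 0, alpha != 1."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


p = [0.7, 0.2, 0.1]          # hypothetical probability distribution
uniform = [0.25] * 4         # 4-dimensional discrete uniform distribution

h_p = shannon_entropy(p)
renyi_values = {a: renyi_entropy(p, a) for a in (1.5, 2.0, 5.0)}
h_uniform = shannon_entropy(uniform)
```

For the uniform distribution both entropies coincide with log n, which is the maximum that the proof relies on.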

Fig. 1. Proposed deep evidence-theoretic remote sensing landslide image classification workflow with a BJR divergence and a new fusion model called interpretability improved three-branched evidential fusion. n denotes the size of the power set of FoD.
Fig. 2. Fused belief on the preferred hypothesis regarding the fusion times from (a) and (b) belief Renyi divergence [43] and (c) and (d) the generalized Renyi divergence [44]. Note that (a) and (c) are symmetric cases, whereas (b) and (d) are nonsymmetric cases. The abscissa represents the number of fusion times with Dempster's combination rule [38] when fusing the weighted evidence in the proposals of [43] and [44]. (a) Symmetric case 1. (b) Nonsymmetric case 1. (c) Symmetric case 2. (d) Nonsymmetric case 2.
(15) where C ∈ {H, S, V} marks the image channel. For the case of ignorance, i.e., $m_C(\Theta)$, one can easily derive its lower and

Fig. 7. Workflow of interpretability improved three-branched evidence fusion. n denotes the size of FoD's power set.

Fig. 9. Fused belief on the desired hypothesis $\{\theta_l\}$ using two different decision functions in the proposal.

Fig. 12. Results of efficiency analysis. "Backbone" means only a single SOTA backbone network is applied for landslide image classification.
displays the comparison results on RepVGG-B0 and Swin Transformer V2-Tiny backbone networks.

Fig. 16. Classification results in Sensitivity analysis 1 when the proposed channelwise fusion is applied on different color spaces. Here the RGB, XYZ, Lab, YIQ, and HSV spaces are investigated.

Fig. 21. Satellite terrain image of Luding from Google Earth. The data collectors did not point out where these landslides are located.

$$\lim_{\alpha \to 1} R_\alpha(P) = H(P) \tag{41}$$
where H represents Shannon entropy. Then, if we use the m function to replace the EBetP function, it is easy to derive $\lim_{\alpha \to 1}$ ...

TABLE III FUSION RESULTS OF EXAMPLE 2

TABLE IV CONTRIBUTION RATIO OF EACH WEIGHTING BRANCH TOWARD BPAS AS WELL AS THE FINAL WEIGHTS IN EXAMPLE 2, WHERE "↓" MEANS SMALLER IS BETTER

TABLE VI SIZE OF EACH SET AFTER STOCHASTICALLY SPLITTING 2773 IMAGES

TABLE VII TRAINING DETAILS OF THE BACKBONE NETWORKS, WHERE "CA" IS SHORT FOR COSINE ANNEALING

Fig. 11. Comparative studies' organization.

TABLE VIII COMPARISON STUDY BETWEEN THE BJR DIVERGENCE-INVOLVED FRAME AND STATE-OF-THE-ART EVIDENTIAL SCHEMES

TABLE IX COMPARISON BETWEEN THE PROPOSAL AND SOTA (EVIDENTIAL) REMOTE SENSING SCENES IMAGE CLASSIFICATION ALGORITHM

TABLE X COMPARISON WITH DEEP EVIDENCE-THEORETIC CLASSIFIERS

TABLE XI INVOLVING CONDITIONS IN ABLATION STUDY 2

Table XII displays the results.