WCDL: A Weighted Cloud Dictionary Learning Method for Fusing Cloud-Contaminated Optical and SAR Images

Cloud cover hinders accurate and timely monitoring of urban land cover (ULC). The combination of synthetic aperture radar (SAR) and cloud-free optical data has demonstrated promising performance in previous research. However, ULC studies on cloud-prone areas are scarce despite the inevitability of cloud cover, especially in the tropics and subtropics. This study proposes a novel weighted cloud dictionary learning (WCDL) method for fusing optical and SAR data for ULC classification in cloud-prone areas. We propose a cloud probability weighting model and a pixelwise cloud dictionary learning method that take into account the differing interference at various cloud probability levels in order to mitigate cloud interference. Experiments reveal that the overall accuracy (OA) of fused data rises by more than 6% and 20% compared to single SAR and optical data, respectively. The method improves OA by 3% compared with other methods that directly stitch optical and SAR data together regardless of cloud interference, and it improves the producer's accuracy (PA) and user's accuracy (UA) of almost all land covers by up to 9%. Ablation studies further show that the cloud probability weighting model improves the OA of all classifiers by up to 5%, and that the pixelwise cloud dictionary learning model improves OA by more than 2% under all cloud conditions while enhancing UA and PA by up to 9% and 10%, respectively. The proposed WCDL method can serve as a reference for fusing cloud-contaminated optical and SAR data and for timely, continuous, and accurate land surface monitoring in cloudy areas.

air pollution, and climate change [1], [2]. The increasing number of satellite observations has provided opportunities for accurate and timely ULC monitoring using various remote sensing technologies [3]. However, cloud cover, a persistent problem in optical remote sensing, still hinders accurate and timely ULC monitoring. It can corrupt the reflected optical signal and obstruct the view of the land surface [4]. Moreover, research has shown that, on average, 67% of the Earth's surface is covered by clouds [5]. In cloudy tropical and subtropical regions, there are even fewer cloud-free images. For example, one study shows that for the center of the highly urbanized Pearl River Delta (PRD) in southern China, located in the subtropical humid climate belt, cloud-free Sentinel-2 images account for only about 7.6% of all images, and most of them fall between November and January [6].
Facing this challenge, synthetic aperture radar (SAR) offers the possibility of timely and continuous ULC mapping in cloud-prone areas because it is weather independent. SAR can penetrate clouds and is sensitive to the electrical conductivity and geometric structure of the ground surface, and thus it can provide discriminative land cover information in all weather. Therefore, SAR has been widely adopted for land surface monitoring, including urban built-up identification, crop growth monitoring, and soil moisture retrieval [7], [8], [9]. However, due to its imaging mechanism, it also suffers from several problems, such as speckle noise, foreshortening, and layover [10], and it provides limited information compared to the rich spectral information of optical remote sensing. Therefore, researchers have proposed various methods for fusing optical and SAR data for ULC classification, primarily at three different levels: pixel level, feature level, and decision level [11]. Pixel-level fusion refers to the overlay of optical and SAR data at the pixel level, without feature extraction [12]. Feature-level fusion combines the characteristics obtained from optical and SAR data [10], [13]. Support vector machine (SVM), random forest (RF), and deep network methods are common approaches used for feature-level fusion [11], [14]. Decision-level fusion refers to making decisions based on the classification results from both optical and SAR sources. Voting, Dempster-Shafer theory, and RF are the fundamental components of the decision-level fusion of optical and SAR data [15], [16], [17].
Nevertheless, most of these methods focus on fusing cloud-free optical and SAR data without considering cloud occlusion [10], [11], [13], [14], [17], [18], [19]. There are few studies on SAR and optical data fusion for image classification in cloudy areas. Yang et al. [20] proposed a marine disaster assessment method for cloudy coastal areas using optical and SAR images. Zhang et al. [21] combined optical and SAR data for classification in cloudy areas. Sun et al. [22] proposed a crop detection method in rainy areas based on optical and SAR response mechanisms. However, most of these methods directly concatenate optical and SAR features or discard the cloudy part of the optical data. Methods that account for cloud interference when integrating optical and SAR data for ULC discrimination under clouds therefore still need further exploration.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Our previous research found that, under the interference of clouds, the spectral information of pixels with different cloud-covering probabilities is disturbed to different degrees, and the credibility of the pixels' spectral features therefore differs [6]. Generally, the greater the cloud probability (CP), the thicker the cloud might be. Previous methods that directly concatenate optical and SAR features under cloud coverage and treat pixels of all CPs equally in one processor ignore cloud interference. As a result, samples under different cloud conditions interfere with each other, hindering the classifier's learning for ULC classification.
Considering the interference of clouds with the spectral features, in this study we propose a cloud-oriented optical and SAR fusion method for ULC classification. Dictionary learning (DL) methods have shown exemplary performance in the remote sensing field, for example in multisource/multimodality data fusion [23], [24], [25], [26], [27], image classification [28], [29], [30], [31], [32], [33], image recovery [34], [35], [36], and interferometric phase restoration [37]. We therefore propose a pixelwise cloud dictionary learning (CDL) method that considers the interference of clouds to better discriminate ULC in cloudy environments. Compared with previous research, the main contributions of this study are summarized as follows. First, it proposes a pixelwise CDL method that considers cloud interference for fusing cloud-contaminated optical and SAR data and discriminating ULC in cloud-prone areas. Second, it develops a cloud-probability-based weighting strategy for more credible joint extraction of optical and SAR information. The proposed weighted cloud dictionary learning (WCDL) method could serve as a reference for fusing cloud-contaminated optical and SAR data and for timely, continuous, and accurate land surface monitoring in cloudy areas.

II. STUDY AREA AND DATASET
The PRD in southern China has undergone intense urbanization in recent decades. However, with increased human activities, environmental consequences in this area have gradually emerged. As an essential index of urbanization and its economic, social, and environmental impacts, the dynamic monitoring of ULC is of great significance to urbanization research in the PRD. However, the PRD region lies in a subtropical humid climate zone, and a study shows that only 7.6% of images in this region are cloudless [6]. Long periods of cloudy weather throughout the year bring great difficulties. Clouds can reflect optical signals and hinder the observation of the land surface, making it challenging to classify land cover using remote sensing images. Therefore, exploring how to fuse optical and SAR data in the presence of clouds to achieve timely, continuous, and accurate classification of ULC will provide significant support for urbanization research in the PRD. The study site is located in Shenzhen and Hong Kong, which are representative of the highly urbanized cities in the PRD. The land covers in this region mainly include vegetation (VEG), soil (SOI), impervious surfaces (IS), and water (WAT). Scenes of cloud-contaminated Level-2A Sentinel-2 images acquired in January 2019 were selected, together with one scene of a fully polarimetric Gaofen-3 (GF-3) SAR image acquired on December 28, 2018 with a 5 m resolution. Another cloud-free Sentinel-2 image acquired on January 25, 2019 is used for reference. Fig. 1(a) shows the study area's location. Fig. 1(b) and (c) shows the study area's optical and SAR remote sensing images, respectively. The pixel property of "cloud possibility" could be identified with the F-mask algorithm; in this study, it is read directly from the quality bands of the Sentinel-2 images.

III. METHODOLOGY
This study aims to alleviate the interference of clouds with ULC identification in order to achieve timely, accurate, and continuous ULC classification in cloud-prone regions. To this end, a weighted cloud dictionary learning method, termed WCDL, that fuses cloud-contaminated optical and SAR data for ULC recognition under clouds is proposed. According to the CP, cloud states are divided into three levels in this study: Free_CP, Low_CP, and High_CP, with CP in [0, 10%], [10%, 60%], and [60%, 100%], respectively. The proposed WCDL consists of a CP-based weighting model and a pixelwise cloud-level DL method, as shown in Fig. 2.
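The three-level split can be illustrated with a short Python sketch. The function name and the boundary handling are assumptions made for illustration: the intervals in the text share their endpoints, so the sketch arbitrarily assigns a CP of exactly 10% to Low_CP and exactly 60% to High_CP.

```python
import numpy as np

# Level names follow the text; the boundary handling is an assumption.
CLOUD_LEVELS = ("Free_CP", "Low_CP", "High_CP")

def cloud_level(cp):
    """Map cloud probability in [0, 1] to a level index (0, 1, or 2).

    Assumes bins [0, 0.10), [0.10, 0.60), and [0.60, 1.00]; the text
    lists closed intervals that share their endpoints.
    """
    cp = np.asarray(cp, dtype=float)
    if np.any((cp < 0.0) | (cp > 1.0)):
        raise ValueError("cloud probability must lie in [0, 1]")
    # np.digitize returns 0 below 0.10, 1 in [0.10, 0.60), and 2 from 0.60 up.
    return np.digitize(cp, bins=[0.10, 0.60])

levels = cloud_level([0.0, 0.05, 0.10, 0.35, 0.60, 1.0])
names = [CLOUD_LEVELS[i] for i in levels]
```

The level index returned here is what a per-level model would use to route each pixel to its own subdictionary.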

A. Feature Extraction
Before feature extraction, the GF-3 data undergo a series of preprocessing steps. The GF-3 data were first radiometrically calibrated, and speckle filtering was then performed to reduce speckle noise. PolSAR decomposition based on coherence and covariance matrices helps reveal the scattering information in more detail. To fully describe the scattering process, in addition to extracting the backscattering coefficients of the HH, HV, VH, and VV modes, we also extracted different polarization decomposition features that have been shown to be effective, including Freeman-Durden decomposition parameters, H-A-Alpha decomposition parameters, Pauli decomposition parameters, and Yamaguchi four-component decomposition parameters [38], [39], [40]. Fig. 3 shows false color composites of the polarimetric features of the GF-3 data. Fig. 3(a) shows the backscattering coefficients, with the HH, HV, and VH coefficients set as the R, G, and B channels. Smooth surfaces mainly exhibit simple surface scattering, resulting in a weak echo, and therefore generally appear dark in the backscattering coefficient images, as with the water and roads in Fig. 3(a). City buildings, with their complex geometric shapes and angular reflectors, usually have strong scattering intensity and appear bright. For objects with rough surfaces (such as trees), the intensity of the HH and VV polarization modes might be weaker than that of cross polarization (HV or VH), and such objects thus appear green. Fig. 3(b)-(e) shows Pauli decomposition features, H-A-Alpha decomposition features, Freeman decomposition features, and Yamaguchi decomposition features, respectively. These features present different characteristics and provide rich scattering information. For Sentinel-2 images, 12 optical bands with spatial resolutions of 10, 20, and 60 m were obtained and resampled to 10 m resolution (the SWIR-Cirrus band was not adopted due to its poor quality).
Three scenes of cloud-covered optical images were adopted to obtain more cloud samples, as shown in Fig. 3(f)-(h). Pixel-level "Cloud possibility" information was extracted from the quality bands of the Sentinel-2 images.
Finally, a digital elevation model was introduced to geocode the processed GF-3 data. The geocoded GF-3 and Sentinel-2 data were then coregistered with WGS84 and UTM 49N systems.
The root-mean-square error for the coregistration was controlled within one pixel using over 30 ground control points. All features were normalized to avoid bias during the following processes.

B. Cloud Probability Weighting
With the extracted optical and SAR features, the common fusion strategy in previous studies directly concatenates the two feature sets, which means that their weights are 1 to 1. However, previous research [6] found that clouds can obscure the spectral information of image pixels. Under cloud coverage, the cloud-contaminated optical features might therefore represent spectral reflectance from the cloud surface rather than the ground surface. In this situation, directly concatenating the optical and SAR features exacerbates the interference of invalid cloud information and increases the difficulty of land surface recognition. On the other hand, the interference does not mean that the optical features of all cloud-covered samples are entirely harmful. Depending on the covering clouds, they may still retain a certain recognition capability. Inspired by this, this study develops a weighting strategy that assigns different weights to optical features by evaluating their credibility based on the sample's cloud coverage possibility, which reflects cloud thickness to some extent. In this way, the cloud-weighted fusion strategy makes the most of the optical information for pixels without clouds. For pixels with clouds, the strategy reduces the weight of optical features according to the cloud coverage possibility, which helps to use the advantages of the two data sources more effectively and improves ULC recognition with more credible information. In the cloud-based weighting model (1), W is the weight for optical features and CP is the cloud possibility of the sample pixel, ranging from 0 to 1. In this study, pixel-level cloud possibility information was extracted from the quality bands of Sentinel-2 images; it could also be identified with cloud detection algorithms such as F-mask. The greater the cloud possibility, the smaller the weight of the optical features.
For each sample, the vector of original optical features is multiplied by its cloud possibility weight W and then fused with its SAR features. For a cloud-free sample, the cloud possibility is 0, and the fusion weight is 1. For a sample with 100% cloud possibility, the weight of the optical features is reduced to 0.5, which means that the focus shifts toward the discrimination provided by the SAR features.
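The weighting-then-fusion step can be sketched as follows. The function names, array shapes, and in particular the form W = 1/(1 + CP) are assumptions for illustration: the text fixes only that W decreases with CP, equals 1 at CP = 0, and equals 0.5 at CP = 1, and other functions (for example 1 - CP/2) satisfy the same conditions.

```python
import numpy as np

def cp_weight(cp):
    """Hypothetical weight: 1 at CP = 0, 0.5 at CP = 1, decreasing in CP."""
    cp = np.asarray(cp, dtype=float)
    return 1.0 / (1.0 + cp)

def fuse(optical, sar, cp):
    """Scale each sample's optical features by its weight, then append SAR.

    optical: (n_samples, n_optical), sar: (n_samples, n_sar), cp: (n_samples,)
    """
    w = cp_weight(cp)[:, None]           # one weight per sample, broadcast over bands
    return np.hstack([w * optical, sar])

optical = np.ones((3, 4))                # toy normalized optical features
sar = np.full((3, 2), 0.3)               # toy normalized SAR features
fused = fuse(optical, sar, cp=[0.0, 0.5, 1.0])
```

In this toy case the optical part of the cloud-free sample is unchanged, the optical part of the fully clouded sample is halved, and the SAR part is never rescaled.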

C. Multilevel Cloud Dictionary Learning
Considering the large variance among different types of clouds, we then propose a pixelwise multilevel CDL model for ULC classification. Previous research has found that spectral information can be disturbed differently for cloud-covered pixels with different cloud probabilities [6]. Even for the same type of land cover, the optical features of samples with different cloud possibilities may be completely different due to different cloud interference. Cloud interference can therefore lead to large intraclass differences and small interclass differences among cloud-covered land cover samples. Under this condition, previous methods that process all samples equally may cause samples of the same land cover under different cloud states to interfere with each other, which hinders the learning of the classifier for ULC classification. Therefore, in this study, cloud information is introduced into the learning process of the classifier so that pixels with different cloud probabilities can be processed differently to reduce cloud interference on classifier learning. First, we divide the cloud states into the three levels Free_CP, Low_CP, and High_CP according to CP ranges of [0, 10%], [10%, 60%], and [60%, 100%]. A pixelwise CDL method is then proposed to design cloud-based subdictionaries for samples at different cloud levels. As seen in Fig. 2, the pixelwise cloud dictionary is composed of three cloud-level subdictionaries, with the subdictionary of each cloud level responsible for the land cover recognition of pixels covered by the same level of clouds.
Each cloud-level subdictionary comprises four land cover subdictionaries for ULC classification. Thus, we obtain a discriminant dictionary containing multiple subdictionaries representing different cloud and land cover categories. With this cloud-oriented dictionary, each dictionary atom has a corresponding cloud label and land cover label. The land cover label equips the discriminant dictionary with ULC classification capabilities. Each dictionary atom is composed of cloud-weighted, fused optical and SAR components and thus provides richer and more credible information for land surface monitoring.
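The layout of such a two-level labeled dictionary can be sketched in Python. This only illustrates the bookkeeping of atoms and labels, not the learning itself: the feature length, the random stand-in atoms, and the helper name `subdictionary` are hypothetical, while the three cloud levels, four land covers, and 100 atoms per subdictionary follow the values reported in the paper.

```python
import numpy as np

CLOUD_LEVELS = ("Free_CP", "Low_CP", "High_CP")
LAND_COVERS = ("VEG", "SOI", "IS", "WAT")
ATOMS_PER_SUBDICT = 100      # value reported for the experiments
N_FEATURES = 16              # hypothetical length of a fused optical+SAR vector

rng = np.random.default_rng(0)
n_atoms = len(CLOUD_LEVELS) * len(LAND_COVERS) * ATOMS_PER_SUBDICT

# Columns are atoms; random unit-norm columns stand in for learned atoms.
D = rng.standard_normal((N_FEATURES, n_atoms))
D /= np.linalg.norm(D, axis=0, keepdims=True)

# Per-atom labels: the cloud level varies slowest, the land cover next.
atom_cloud = np.repeat(np.arange(len(CLOUD_LEVELS)),
                       len(LAND_COVERS) * ATOMS_PER_SUBDICT)
atom_cover = np.tile(np.repeat(np.arange(len(LAND_COVERS)), ATOMS_PER_SUBDICT),
                     len(CLOUD_LEVELS))

def subdictionary(cloud, cover=None):
    """Return the atoms for one cloud level, optionally for one land cover."""
    mask = atom_cloud == cloud
    if cover is not None:
        mask &= atom_cover == cover
    return D[:, mask]
```

For example, `subdictionary(1)` returns the 400 atoms of the Low_CP level, and `subdictionary(2, 3)` returns the 100 WAT atoms of the High_CP level.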

D. Model Construction and Optimization
For basic discriminative DL, suppose that there are k classes of training samples. In the sparse coding objective (2), λ regularizes the fidelity of the solution, and $\|X_i\|_1$ is the $\ell_1$-norm constraint on $X_i$. By introducing the category information of samples in the above DL process, the learned dictionary D can be class discriminative. Therefore, for an input sample y, by obtaining its sparse coefficient x over the learned dictionary D through solving (2), the class label of sample y can be calculated by minimum residual.
To construct our objective function, cloud information and land cover label information are used to constrain the dictionary and coding coefficients, in order to learn a multisource fusion dictionary with category discrimination under different cloud conditions. Following the basic discriminative DL [41], the dictionary contains a part $D_{W\_optical}$ that relates to the weighted optical information and a part $D_{SAR}$ that relates to the SAR information. For each cloud-level subdictionary $D_c \in \mathbb{R}^{m \times d}$, the corresponding training samples $Y_c$ of the same cloud level are represented over $D_c$. Apart from requiring that the subdictionaries $D_c$ have a powerful capability of reconstructing $Y_c$, it is also required that $D_c$ have a powerful capability of distinguishing the samples in $Y_c$. To this end, two constraints need to be considered. First, a sample is supposed to be well represented by the subdictionary of its own class. Second, to enhance the class discrimination capability of the subdictionaries, there should be low coherence between the class-level subdictionaries, so that the subdictionaries associated with different classes are as independent as possible. The cost function of CDL (5) combines these constraints, where λ and µ denote the parameters that balance the tradeoff between the constraints. When one of D and X is fixed, the objective function (5) can be convex with respect to the other. The model can thus be optimized iteratively, by fixing D to solve for X and then fixing X to solve for D.
Then, to efficiently solve (5), the alternating direction method of multipliers [42] is adopted to reduce the computational complexity, and the conjugate gradient method is utilized to efficiently solve the convex quadratic programming problem. Further details of the solution procedure can be found in [43].
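As a hedged illustration, one formulation consistent with the description in this subsection is written out below. It assumes the common structured-incoherence form of discriminative DL; the per-class symbols $Y_c^i$, $D_c^i$, and $X_c^i$, the squared Frobenius norms, and the exact placement of λ and µ are assumptions and should not be read as the authors' stated equations.

```latex
% Basic discriminative DL over k classes (assumed form):
\min_{D,\,X}\ \sum_{i=1}^{k} \Big( \|Y_i - D X_i\|_F^2 + \lambda \|X_i\|_1 \Big)

% Minimum-residual labeling of a sample y with sparse code x,
% where x_i collects the coefficients on the class-i atoms D_i:
\operatorname{label}(y) = \arg\min_{i}\ \|y - D_i x_i\|_2

% Cloud-level objective for one cloud level c (assumed form):
% class-wise reconstruction, sparsity, and low coherence between class subdictionaries.
\min_{D_c,\,X_c}\ \sum_{i=1}^{k} \Big( \|Y_c^i - D_c^i X_c^i\|_F^2 + \lambda \|X_c^i\|_1 \Big)
  + \mu \sum_{i \neq j} \big\| (D_c^i)^{\top} D_c^j \big\|_F^2
```

Under this assumed form, fixing $D_c$ leaves an $\ell_1$-regularized least-squares problem in $X_c$, and fixing $X_c$ leaves a quadratic problem in each class subdictionary when the others are held fixed, which matches the alternating scheme described in the text.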
Finally, the learned cloud dictionary is used for ULC classification under clouds. For an input sample $y_c$ with its cloud label, the corresponding cloud-level subdictionary is used to identify its land cover label by solving the sparse coding problem over that subdictionary. For the constructed model, the sparsity regularization parameter λ and the coherence tradeoff parameter µ were set to 0.005 and 0.002, respectively, by fivefold cross validation. The number of atoms in each subdictionary was set to 100, and the number of iterations for model optimization was set to 100.
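The classification step can be sketched in Python as sparse coding over the subdictionary of the sample's cloud level followed by a minimum-residual decision. This is a minimal sketch under stated assumptions: the ISTA solver stands in for the solver used in the paper, the 0.5 factor on the fidelity term is a convention of this sketch, and the function names and the `subdicts` layout are hypothetical. Only the value λ = 0.005 is taken from the text.

```python
import numpy as np

def sparse_code(D, y, lam=0.005, n_iter=200):
    """Solve min_x 0.5*||y - D x||_2^2 + lam*||x||_1 with ISTA."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2       # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))        # gradient step on the fidelity term
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return x

def classify(y, cloud, subdicts, lam=0.005):
    """Pick the land cover whose atoms give the smallest reconstruction residual.

    subdicts[cloud] is a list of per-land-cover matrices (features x atoms).
    """
    blocks = subdicts[cloud]
    D = np.hstack(blocks)
    x = sparse_code(D, y, lam)
    residuals, start = [], 0
    for block in blocks:
        stop = start + block.shape[1]
        residuals.append(np.linalg.norm(y - block @ x[start:stop]))
        start = stop
    return int(np.argmin(residuals))

# Toy example: one cloud level, two land covers spanning orthogonal directions.
eye = np.eye(4)
toy = {0: [eye[:, :2], eye[:, 2:]]}
label = classify(np.array([0.0, 0.0, 0.9, 0.1]), cloud=0, subdicts=toy)
```

In the toy example the sample lies in the span of the second block of atoms, so the second land cover gives the smaller residual and is selected.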

IV. RESULTS

A. Urban Land Cover Classification Accuracy
To evaluate the performance of WCDL, a total of 37 806 samples were carefully labeled by visual interpretation over a coregistered cloud-free Sentinel-2 image and Google Earth images, including 9708 VEG, 9113 SOI, 10 596 IS, and 8389 WAT. The sample sizes of the three cloud levels are 5438 (Free_CP), 19 465 (Low_CP), and 12 903 (High_CP). Half of the samples were randomly selected as training samples in the experiments, and the other half were used as testing samples. Quantitative metrics, including the overall accuracy (OA), the producer's accuracy (PA), and the user's accuracy (UA), were applied to evaluate the overall classification accuracy and the confusion between land covers. The OA is the number of correctly classified samples divided by the total number of samples. The PA is the probability that a sample of a given class was correctly classified: the number of correctly classified samples of that class divided by the total number of true samples of that class. The UA is the likelihood that a sample predicted to belong to a particular class actually belongs to that class: the number of correctly predicted samples divided by the total number of samples predicted to belong to that class. OA, PA, and UA are calculated as follows:
$$\mathrm{OA} = \frac{1}{N}\sum_{c=1}^{i} n_{cc} \tag{7}$$
$$\mathrm{PA}(c) = \frac{n_{cc}}{n_{c+}} \tag{8}$$
$$\mathrm{UA}(c) = \frac{n_{cc}}{n_{+c}} \tag{9}$$
where c denotes the cth category, i is the number of rows in the matrix, $n_{cc}$ denotes the number of observations in row c and column c, $n_{+c}$ and $n_{c+}$ are the marginal totals of row c and column c, respectively, and N denotes the total number of observations.
In order to assess the effectiveness of WCDL, various fusion and classification methods were adopted for comparison, as given in Table I. Representative classification methods such as SVM, RF, and the deep neural network GoogleNet were chosen due to their proven effectiveness. Their parameters were set to the recommended values.
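The three accuracy metrics can be computed from a confusion matrix as in the sketch below. The function name and the toy matrix are illustrative, and the orientation (rows are predicted classes, columns are reference classes) is an assumption chosen to match the definitions in the text, where PA divides by a column total and UA by a row total.

```python
import numpy as np

def accuracy_metrics(cm):
    """Return (OA, PA per class, UA per class) from a square confusion matrix.

    Assumes rows are predicted classes and columns are reference classes.
    """
    cm = np.asarray(cm, dtype=float)
    diag = np.diag(cm)
    oa = diag.sum() / cm.sum()
    pa = diag / cm.sum(axis=0)    # column totals: true samples of each class
    ua = diag / cm.sum(axis=1)    # row totals: samples predicted as each class
    return oa, pa, ua

# Toy two-class matrix (made-up counts, not results from the paper).
cm = [[50, 5],
      [10, 35]]
oa, pa, ua = accuracy_metrics(cm)
```

For this toy matrix, 85 of the 100 samples lie on the diagonal, so the OA is 0.85, while the PA and UA of the first class are 50/60 and 50/55, respectively.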
In addition, the results with single optical data (OP) and single SAR data (SAR) were also compared with these fusion methods to explore the effects of optical-SAR fusion in cloud-prone areas. Fig. 4 further plots the accuracies of different land covers for each method separately for more intuitive analysis. Fig. 4(a) shows the PA, UA, and OA of the compared methods, with each method represented by one color. Each corner of the radar map represents the PA or UA of a land cover type. Fig. 4(b) shows the OA of all methods. Two main findings were observed from Table I and Fig. 4. First, compared with single cloud-contaminated optical or SAR data, all algorithms using fused data achieved higher accuracy, with OA above 80%. This result reinforces previous findings on the significance of combining optical and SAR data in cloud-prone areas. It also shows that the ULC classification accuracy of optical data can be significantly affected by clouds and was relatively low. On the other hand, although single SAR data performed better than single optical data, its accuracy was still unsatisfactory. Supplementing it with optical data improved its OA by more than 6%.
Second, compared with RF, SVM, and GoogleNet, the validity of WCDL is supported by a notable 3% increase in OA. GoogleNet did not perform remarkably in this case because it was more easily disturbed by inaccurate sample information caused by cloud contamination, which is consistent with the findings in [6]. Fig. 4(a) demonstrates that WCDL achieved the highest PA and UA in most cases, with a gain of up to 9% compared with other fusion methods. Although SVM and RF achieved higher UA for SOI, their PA was significantly lower than that of WCDL, which showed a better balance of PA and UA. GoogleNet showed the highest PA for VEG and SOI, but its overall performance was far from stable. Single optical data displayed disappointing accuracy, and the performance of single SAR could be improved by fusion with optical data. Among the four land covers, SOI and IS were harder to distinguish due to their more complicated surface reflection, while WAT was easier to identify, even for single optical data.
To verify the effectiveness of cloud-level dictionary learning (CDL), Table II compares the confusion matrices, with PA and UA, of general DL, which does not consider cloud differences, and the cloud-oriented CDL under the three CP levels Free_CP, Low_CP, and High_CP. Each of the six confusion matrices represents the results of DL or CDL under one cloud level. Comparing the results under the three cloud levels shows that, as CP increased, PA, UA, and OA decreased for both DL and CDL, reflecting the impact of clouds. WAT was less affected by clouds because SAR information alone is already effective for water recognition owing to the unique scattering behavior of water. In contrast, because single SAR has insufficient capability to distinguish other land covers, such as SOI and IS, more severe cloud disturbance caused a more serious failure of spectral information for these classes. The accuracy difference between land covers was consistent with Fig. 4, with higher accuracy for WAT and VEG and lower accuracy for SOI and IS. At each cloud level, introducing the cloud-level DL strategy improved OA by more than 2%. CDL mitigated the land cover confusion under all cloud conditions and improved UA and PA by up to 9% and 10% for SOI and IS. These results further verify the effectiveness of the cloud-level strategy, which reduces cloud interference on classifier learning by making separate cloud subdictionaries responsible for different cloud levels.
To further evaluate the effectiveness of the proposed CP weighting model, Fig. 5 shows the classification methods SVM, RF, GoogleNet, and DL with and without the CP weighting model. The original methods without CP weighting are represented by blue arrows, and the methods combined with the CP weighting model are represented by orange dots, exhibiting PA, UA, and OA for each land cover. For every classification method, the combination with the CP weighting model showed higher OA than the original method, and the PA and UA of the land covers improved in most cases. Among the classification methods, DL benefited from the cloud weighting model with more stable performance than SVM, RF, and GoogleNet. With cloud weighting, GoogleNet showed a large improvement in PA for IS and WAT and in UA for VEG, SOI, and IS. Across land covers, the PA for SOI and IS and the UA for VEG and IS improved with cloud weighting for all methods. The cloud weighting model might therefore be especially valuable for impervious surface identification.

B. Classification Map Under Clouds
To analyze the experimental results more intuitively, the ULC classification maps are illustrated in Fig. 6. The first to fourth rows show areas of soil, urban impervious surface, water, and vegetation, respectively, for detailed analysis of the performance on different land types. Fig. 6(a) shows the cloud-free image with labeled ground truth samples for reference, and Fig. 6(b) shows the experimental cloud-covered image. Fig. 6(c)-(h) shows the results of single optical, single SAR, SVM, RF, GoogleNet, and WCDL, respectively. As Fig. 6(c) shows, there was severe land cover confusion in the results of single optical data when covered by clouds. Although single SAR data [Fig. 6(d)] performed much better than single optical data, there was apparent noise in the SAR results, and many SOI samples were misidentified. SVM, RF, and GoogleNet, which directly concatenated optical and SAR data for fusion, shown in Fig. 6(e)-(g), respectively, also showed different degrees of land cover confusion, although their results were better than those using single data. By comparison, WCDL [see Fig. 6(h)] showed its effectiveness through more accurate ULC recognition.
Soil is sparsely distributed in highly urbanized areas and is found mainly in mountains. The first row of Fig. 6 shows the soil recognition performance of the compared methods. The single optical image (c) identified the soil area while missing most IS due to the cloud coverage. Single SAR (d) exhibited much "soil noise" due to its imaging issues. SVM, RF, and GoogleNet underestimated soil, affected by the fused cloud-contaminated optical information, whereas WCDL identified the mountain soil area well. Results on the IS area (the second row) showed that the RF results had more noise and overestimated IS. For example, many vegetation samples on both sides of the port were misclassified as IS by RF. Again, soil samples were missed by SVM and RF here. Among the methods, GoogleNet exhibited instability. For instance, although clouds do not cover the water area, other cloud samples interfered with the classifier and resulted in misclassification, even for cloud-free samples. It did, however, show less noise than the other methods. It also missed the anchored IS object floating on the sea, as seen in the third row. Recognizing vegetation and soil in mountain areas (the fourth row) can be difficult. Under thick clouds, SVM was heavily affected: it misidentified the vegetation as IS, in a shape resembling that of the covering cloud. Clouds also affected RF, which showed a mixture of IS and soil with noise. GoogleNet was also misled by floating clouds and reported IS in the middle of the mountain. In contrast, WCDL was less affected and successfully recognized vegetation and soil in mountain areas.
For a more detailed analysis of the proposed WCDL at different cloud levels and of the cloud-oriented strategies, Fig. 7 shows the classification results under three cloud levels. The first, second, and third rows show the ULC classification results under the Free_CP, Low_CP, and High_CP cloud levels, respectively. Fig. 7(a) and (b) shows the experimental cloud-covered image and the cloud-free image for reference. Fig. 7(c)-(f) shows the results of DL, WDL, CDL, and WCDL, respectively. Although clouds did not cover the area in the first row, DL (c) and WDL (d) failed to identify the IS pixels of seafood farming tools, possibly due to the interference introduced by other cloud-covered samples in the classifier training process. With cloud-level dictionary learning, which considers three cloud levels, CDL (e) and WCDL (f) showed correct recognition. For the case of Low_CP clouds depicted in the second row, DL, which does not consider the cloud-oriented strategies, could not accurately identify the fishpond and surrounding roads. WDL, which introduces the CP weighting strategy, showed better discrimination. CDL, which introduces cloud-level dictionary learning, showed clear fishponds and surrounding road boundaries in (e). By adding the CP weighting strategy to CDL, WCDL in (f) further identified the soil pixels. In the third row, even for the most easily identifiable water body, DL, which did not consider the influence of clouds, misjudged most water bodies as vegetation in (c); this was alleviated by the CP weighting strategy in (d). With the cloud-level dictionary learning method, the results in (e) and (f) avoided the misidentification of water, and (f) further correctly identified as IS the ships that were misjudged as SOI in (e).

V. DISCUSSION
The proposed WCDL method performed well in fusing cloud-contaminated optical and SAR data to identify ULCs in cloudy environments. Both the cloud possibility weighting and the cloud-level dictionary learning rely on cloud possibility information, so the identification of cloud possibility plays an important role in the proposed WCDL. This study extracts the cloud information from the quality band of Sentinel-2 images. For other optical images that do not provide quality information, cloud detection methods, such as F-mask, could be used to identify the pixelwise CP. Since CP information is a prerequisite of this method, the accuracy of CP identification may affect its performance. The experiments here show that the combination of the cloud information provided by Sentinel-2 and the WCDL method performs well. Further research that considers more accurate cloud detection methods may bring greater improvements.
Sentinel-2 optical data and Gaofen-3 PolSAR data are utilized in this study. With full-polarization capabilities in the four modes HH, HV, VH, and VV, GF-3 can provide rich scattering information of the land surface, including various polarization features. Such refined surface scattering information is very valuable for small-scale mapping. However, it might be difficult to obtain full-polarization data covering a large area for large-scale land surface classification, and the data processing work, such as the polarimetric decomposition of full-polarization data, might be another challenge. Open-access dual-polarization SAR data, such as Sentinel-1, are a good choice for large-scale land cover recognition. Fusing open-access optical and dual-polarization SAR data is therefore worth further study.
In addition, the limited storage and computing power of local computers hinder large-scale applications. Online platforms such as Google Earth Engine (GEE) can provide high computing power and data support. Therefore, modifying this method and embedding it into GEE to use the online platform for large-scale monitoring will also be considered in our future work.

VI. CONCLUSION
This work proposes a cloud-oriented method for fusing cloud-contaminated optical and SAR data to improve the accuracy of ULC classification in cloud-prone areas. The developed cloud possibility weighting model and the pixelwise CDL method can reduce cloud interference and maximize the integration of effective information from both optical and SAR data so that the two sources complement each other. The proposed WCDL method provides a reference for fusing cloud-contaminated optical and SAR data for more accurate ULC classification in cloud-prone regions. Experiments show the limitation of using only a single data source and justify the importance of data fusion. Comparison with other classification methods verifies the superiority of the proposed WCDL, which achieves a notable 3% increase in OA and up to 10% improvement in PA and UA compared with RF, SVM, and GoogleNet. Further ablation studies demonstrate the validity of the cloud weighting model in cloud-contaminated optical and SAR fusion for all classifiers, as well as that of the multilevel CDL method for land cover recognition under all cloud conditions.