Unsupervised Haze Removal for Aerial Imagery Based on Asymmetric Contrastive CycleGAN

Aerial image dehazing is an important preprocessing step, since haze severely degrades imaging quality and affects the subsequent applications of aerial imagery. Most current haze removal methods achieve encouraging performance by relying on paired synthetic data, but are limited in their generality and scalability for practical tasks. To this end, this paper aims to learn an effective unsupervised dehazing model from an unpaired set of clear and hazy aerial images. Motivated by the great advantages of contrastive learning in unsupervised representation learning, we make a first attempt to formulate an Asymmetric Contrastive CycleGAN dehazing framework (namely ACC-GAN) to maximize the mutual information between the hazy domain and the haze-free domain. In the latent representation space, the introduced contrastive constraint ensures that the restored image is pulled closer to the clear image and pushed away from the hazy image, so as to indirectly regularize the unsupervised dehazing process. Importantly, different from the standard CycleGAN, we develop an additional feature transfer network in the forward path to form the asymmetric structure of ACC-GAN, which enhances the encoded features transferred from the hazy domain to the haze-free domain. During training, multi-dimension loss terms are jointly built into a loss committee for generating dehazed results with higher naturalness and better fidelity. Experimental results on synthetic and real-world datasets indicate that our method is superior to existing unsupervised dehazing approaches, and is also very competitive with related supervised models.


I. INTRODUCTION
Aerial images refer to photos taken from UAVs, helicopters and other aircraft, and carry rich information content. Therefore, they are widely used in various fields, such as remote sensing [1], agriculture [9], geology [10] and earth science [11]. In addition, aerial images can also facilitate numerous subsequent high-level vision applications, such as target detection [12], aerial surveillance [13], scene understanding [14], and land cover classification [15]. Since aerial images are perceived from considerable distances, they often suffer from blurring, color shifts and low visibility due to variations in atmospheric paths. Such atmospheric effects, mainly in the form of clouds, fog and mist, can severely degrade the imaging quality and the performance of aerial vision-based systems. A cloud is a large mass of ice crystals or water droplets moving through the atmosphere. Under cloudy conditions, the imaging path is severely obstructed, and aerial sensors cannot directly perceive the ground. Haze is the condensation of smoke, fine dust or light water vapor, resulting in the attenuation of light transmission through the air [16]. Aerial images taken in hazy conditions are typically characterized by low intensity, low contrast, and low visibility. Thus, it is of great interest to develop an effective aerial image dehazing algorithm to recover high-quality haze-free images.
The dehazing technology of aerial images has attracted increasing attention [1], and several classical approaches have been developed. Early methods employed image enhancement strategies, such as histogram equalization [2], homomorphic filtering [3], wavelet transform [4] and Retinex [5], which can effectively recover clear images. However, they often only change the contrast of the image and do not really remove the haze. Later, some restoration-based dehazing algorithms were designed to deal with haze removal, such as dark channel prior (DCP) [6], haze thickness map (HTM) [7] and haze optimized transformation (HOT) [8]. These algorithms start from the blur mechanism and the causes of haze image degradation: a mathematical model of the imaging process is established, the inverse process of image degradation is solved, and the clear image is then restored. The commonly used atmospheric scattering model [17], [18] describes a simple approximation of the haze effect, and its mathematical expression is

I(x) = J(x)t(x) + A(1 − t(x)),

where I(x) and J(x) denote the hazy image under observation and the clear image, respectively. Note that there are two critical parameters: A refers to the global atmospheric light, and t(x) is termed the medium transmission map. Under homogeneous haze, the transmission map t(x) is defined as t(x) = exp(−βd(x)), where β and d(x) denote the atmospheric scattering coefficient and the scene depth, respectively. Although the above methods are generally simple and fast, their threshold parameters need to be fine-tuned to obtain satisfactory dehazing results, and their outputs easily suffer from serious color distortion (see Figure 2 (b)). In recent years, numerous deep learning based methods have employed convolutional neural networks (CNNs) and generative adversarial networks (GANs) for image dehazing.
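As an illustration of the scattering model above, the following minimal Python sketch synthesizes a hazy observation from a clear pixel intensity and a scene depth. The values chosen for A and β are purely illustrative, not parameters from this paper:

```python
import math

def transmission(depth, beta):
    """Medium transmission t(x) = exp(-beta * d(x)) under homogeneous haze."""
    return math.exp(-beta * depth)

def hazy_pixel(J, depth, A=0.9, beta=1.2):
    """Synthesize a hazy intensity I(x) = J(x) t(x) + A (1 - t(x))."""
    t = transmission(depth, beta)
    return J * t + A * (1.0 - t)

# Example: a bright ground pixel (J = 0.8) seen through increasing depth.
# As d(x) grows, t(x) -> 0 and the observation converges to the airlight A.
for d in (0.0, 1.0, 5.0):
    print(round(hazy_pixel(0.8, d), 4))
```

Dehazing methods built on this model invert the process: they estimate t(x) and A from I(x) and recover J(x).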
The CNN based methods mainly utilize the atmospheric scattering model to regress the transmission map and clear image [19]-[22], or directly restore the results in an end-to-end training manner without relying on the model [23]-[25]. The GAN based methods [26]-[28] adopt game strategies to map the hazy image to a clear one with the help of a generator and a discriminator. However, because aerial scenes are complex, the objects diverse and the space vast, directly applying these previous ordinary dehazing models to aerial images often yields low-quality restored results. These models lack the ability to express complex scenes, resulting in less effective extraction of haze-related features. In particular, they do not consider non-uniform haze, which is the most common haze state in aerial images. To overcome such problems, a series of haze removal methods [1] for aerial images have started to emerge.

FIGURE 2. Aerial image dehazing results by different methods, including DCP [6] (prior-based method), GFN [22] (paired learning-based method), and CycleGAN [35] (unpaired learning-based method). Zooming in the figures offers a better view of the dehazing capability.

On the one hand, the latest aerial image dehazing
datasets have been established, such as SateHaze1k (three levels of fog, namely thin, moderate, and thick fog) [29], UN-HAZE (uniform haze-clear image pairs) [30], and NONUN-HAZE (nonuniform haze-clear image pairs) [30]. On the other hand, recent aerial image dehazing algorithms exhibit remarkable performance, such as MRCNN [32], H2RL-Net [31], and RSDehazeNet [16]. However, it is worth noting that the above-mentioned dehazing networks rely on paired synthetic data to train their models in a fully supervised manner, directly learning a suitable representation from a one-to-one mapping relationship, as shown in Figure 1. The expression can be defined as

I_h = F(I_n),

where F(·) represents the mapping function related to image degradation. For supervised dehazing models, there exist two major limitations: (1) under complex and changeable hazy environments, it is time-consuming and laborious to collect paired training data at large scale; (2) since current synthetic hazy datasets are generally too simplified to represent complex real-world hazy scenarios, the performance of supervised methods on real aerial cases drops significantly due to the intra-domain and inter-domain gap. Figure 2 (c) shows a real-world example in which the dehazing result is suboptimal. Given that, some researchers [33], [34] have started to explore unsupervised aerial image dehazing strategies. Most existing methods can be viewed as an image-to-image transfer learning problem, which tries to obtain transferable and domain-invariant features by decreasing the distribution discrepancy of the two domains. Following the advantage of CycleGAN [35] in the unpaired setting, visual adversarial frameworks have performed feature-level and pixel-level adaptation jointly between the hazy domain and the haze-free domain, see Figure 1.
Due to the lack of supervised constraints suitable for haze removal, directly applying limited CycleGAN-based adversarial frameworks to the dehazing task inevitably suffers from the under-constrained problem. Figure 2 (d) gives a corresponding dehazing example. Fundamentally, existing CycleGAN-based unsupervised dehazing methods do not supply an effective mechanism to better learn the mutual information between the hazy domain and the haze-free domain and infer its latent-space representation, which could be helpful for unsupervised learning [46]. Moreover, owing to the domain gap between the hazy domain and the haze-free domain, the symmetric architecture of previous unsupervised CycleGAN-based dehazing methods is not conducive to domain translation [48].
Towards this end, a novel unpaired aerial image dehazing solution called Asymmetric Contrastive CycleGAN (ACC-GAN) is developed. CycleGAN is a common framework for unsupervised low-level vision tasks. Motivated by the great advantages of contrastive learning in unsupervised representation learning, we introduce recent contrastive learning into the CycleGAN framework to achieve end-to-end training without paired information. Contrastive learning has been widely applied in unsupervised representation learning [36]-[38] and has shown a great ability to capture useful visual features by leveraging both positive and negative samples. Considering the image dehazing task, the information of the hazy image and the clear image are taken as negative samples and positive samples, respectively. The proposed ACC-GAN aims to maximize mutual information: representations are learned in their embeddings by encouraging the positives to move closer while keeping the negatives further away. In other words, our introduced contrastive regularization indirectly constrains the latent space of corresponding patches to guide dehazing by acting as a "supervising teacher". To obtain the best dehazing performance (see Figure 2 (e)), we combine multiple loss functions into a loss committee to further constrain the proposed ACC-GAN for better joint optimization.
In summary, we make the following contributions:
• We formulate an effective ACC-GAN that leverages contrastive learning to maximize the mutual information between the hazy domain and the haze-free domain, which helps the model train in an unpaired fashion and improves the generalization of aerial image dehazing.
• To the best of our knowledge, this study is the first attempt to integrate contrastive learning into an asymmetric CycleGAN architecture, which provides an information supplement for one-way translation by utilizing the feature transfer network.
• We build an effective loss committee embedded in ACC-GAN for collaborative optimization, which benefits generating dehazed results with higher naturalness and better fidelity.
• Experimental results on multiple synthetic and real-world datasets indicate that our method is superior to existing unsupervised dehazing approaches, and is also very competitive with related supervised dehazing models.
The remainder of this paper is organized as follows: Section II gives an overview of related work. In Section III, we provide the main technical details of our proposed approach. Then, Section IV presents comprehensive experimental results with discussions. Finally, conclusions are given in Section V.

II. RELATED WORK
In this section, we briefly introduce several aerial image dehazing approaches of prior-based, supervised learning and unsupervised learning types. In addition, the recently successful contrastive learning mechanism is presented.

A. AERIAL IMAGE DEHAZING
Haze removal from aerial images has attracted increasing attention, and many classic methods have been developed. Traditional researchers approached the aerial image dehazing task by utilizing various kinds of prior knowledge: He et al. [6] proposed the dark channel prior (DCP), Zhang et al. [8] presented the haze optimized transformation (HOT) method, and Makarau et al. [7] introduced the haze thickness map (HTM) method.
Recently, learning-based approaches have started to employ CNNs and GANs for haze removal and achieve state-of-the-art performance. Based on the physical model, CNN based methods regress the transmission map or directly dehaze the image using multiple features. Cai et al. [19] first exploited DehazeNet to recover haze-free images with simple pixel-wise operations. The lightweight AOD-Net [21] was proposed as an end-to-end architecture for the dehazing problem. Liu et al. [24] presented a trainable GridDehazeNet (GDN) that fuses the learned multi-scale features with confidence weights. Later, the success of these models inspired great efforts in the development of aerial image dehazing methods, such as MRCNN [32], H2RL-Net [31], and RSDehazeNet [16], where more effective network architectures were mainly designed. Unlike CNNs, GAN based methods adopt generation and discrimination networks to regularize the dehazed image to have reliable colors and structures. The work of [39] explored a dehazing approach using SAR and multi-spectral images as input to train a GAN. Afterwards, Huang et al. [29] developed a conditional GAN with an SAR image prior to eliminate image blurring. Despite the remarkable progress, the above data-driven learning methods depend on paired training data that is extremely difficult to acquire. In practice, for real hazy aerial images without ground truth (i.e., unpaired), most existing supervised dehazing models may fail due to irregular and nonuniform haze.
To the best of our knowledge, there are only a few unsupervised efforts for aerial image dehazing. CycleGAN [35] has been employed to address the unpaired image-to-image translation problem and thereby achieve an unsupervised dehazing process. SkyGAN [34] was suggested by utilizing HSI guidance and multi-cue color input for dehazing. Hu et al. [33] used an edge-sharpening cycle-consistent adversarial network without requiring prior information. However, the performance of these unsupervised dehazing methods is still insufficient, leaving room for further improvement. Note that these unsupervised dehazing networks are still implemented on the standard CycleGAN, and few studies consider the impact of an asymmetric CycleGAN on dehazing performance. Therefore, this paper focuses on learning an effective asymmetric CycleGAN-based dehazing model from an unpaired set of clear and hazy aerial images.

B. CONTRASTIVE LEARNING
In recent years, contrastive learning has been widely studied in self-supervised and unsupervised representation learning [40], [41]. Its working principle is to pull the positive sample near the anchor and push the negative sample away in the latent space, so as to increase the mutual information in the learned representation. The selection of positive and negative samples may vary depending on the specific downstream tasks. Previous work has applied contrastive learning to a host of high-level vision tasks [42], [43], which has shown the ability to capture complex visual features and model the contrast between positive and negative features.
More recently, contrastive learning has also been explored in low-level vision tasks. For example, CUT [36] first utilized noise contrastive estimation to achieve image-to-image translation, aiming to learn the mutual relationship between the input image patches and the generated image patches. Building on the dual advantages of CUT [36] and CycleGAN [35], Han et al. [37] proposed DCLGAN to perform unsupervised image-to-image translation and achieved new state-of-the-art performance. Wu et al. [38] developed a novel AECR-Net based on an autoencoder network and contrastive regularization for single image dehazing. CWR [44] applied contrastive learning to the underwater image restoration problem and shows impressive restoration performance on the created underwater dataset HICRD. Chen et al. [46] designed an effective unsupervised single image deraining GAN that explores the mutual properties of unpaired examples by means of contrastive learning. Wang et al. [45] introduced an unsupervised super-resolution network, which uses contrastive learning to deal with various degradations based on the learned representations.
Inspired by the above related work, this paper introduces a contrastive learning scheme for the aerial image dehazing task. Considering the properties of the haze removal task, the information of the hazy image and the clear image are taken as negative samples and positive samples, respectively. The schematic diagram is shown in Figure 3. Specifically, the features of each example are first extracted by the encoder, then the stacked feature representation is formed and finally sent to a two-layer MLP. In the latent representation space, the introduced contrastive learning scheme ensures that the restored image is pulled closer to the clear image and pushed away from the hazy image.

III. PROPOSED APPROACH
In this section, we first illustrate the proposed approach, including the overall network design and technical implementation details. Then we introduce each member of the loss committee.

A. ASYMMETRIC ARCHITECTURE
Given a hazy domain D_h (a set of hazy images I_h) and a haze-free domain D_n (a set of haze-free images I_n), the goal of traditional paired image dehazing is to learn a direct mapping function F_pair : D_h → D_n based on a paired dataset, so that the corresponding ground-truth haze-free image can be well estimated. However, this way of paired learning often suffers from certain limitations. In this work, our goal is to learn an indirect mapping function F_unpair : D_h → D_n from unpaired training data, where we do not have any explicit association between individual images in D_h and D_n. Since ground-truth labeled data is not available, unpaired image dehazing is much more challenging than paired image dehazing. To achieve this goal, it is very important to explore the internal relationship between the images of the two domains. We now present our proposed ACC-GAN; its overall architecture is shown in Figure 4. In detail, it has two generators G, F generating the clean and hazy images respectively, and two discriminators D_1, D_2 distinguishing between fake generated images and real ones. Table 1 and Table 2 show the specific architecture details of the generator and discriminator. Specifically, the designed ACC-GAN covers the two main branches below: (1) the hazy-to-hazy cycle-consistency branch h → n_h → h*, predicting clean images from hazy images and subsequently reconstructing them with the generator; (2) the haze-free-to-haze-free cycle-consistency branch n → h_n → n*, generating hazy images from clean images and afterwards reconstructing them with the generator. Most CycleGAN-based [35] models adopt a symmetric translation structure for the optimization of a stable equilibrium. Instead, motivated by the similar intuition in [48], we design a novel asymmetric CycleGAN for the bidirectional translation process. The reason why we adopt an asymmetric architecture is intuitive.
Treating the forward cycle and the backward cycle equally is not enough to fully excavate the feature properties from the hazy domain to the haze-free domain. Therefore, we insert an additional feature transfer network into the forward path to obtain enhanced and disentangled information for domain translation.
However, relying only on cycle-consistency constraints can hardly handle the heavy variations in the haze removal task, because this constraint is weak in a complex image space [46]. Notably, additional regularization based on unpaired information has been adopted, enabling the model to improve the quality of the restored images. Therefore, we further introduce a contrastive regularization that indirectly constrains the latent space of corresponding image patches to better guide dehazing by acting as a "supervising teacher". The component is elucidated below.

This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3186004

Inspired by contrastive learning, we seek to maximize the mutual information between the corresponding patches of input and output, which supervises the dehazing process with the feature disparity obtained by constructing a patch-wise contrastive loss. For example, the ground patch of the generated dehazed image should be associated more closely with the same ground patch of the original input image than with the rest of the image. In contrastive learning, there are two aspects to consider: how to construct the feature representation space for comparison, and how to select positive and negative pairs. Considering the properties of the haze removal task, the information of the hazy image, the clear image and the restored image are taken as negative samples, positive samples and anchors, respectively. The positive pair consists of the restored image and the corresponding clear image, while the negative pair consists of the restored image and the corresponding hazy image. The feature visualization process is shown in Figure 5. For an anchor point, contrastive learning aims to encourage the positives to move closer while keeping the negatives further away.
To constrain features with the discovered pairs of patches in their embeddings, let us denote the query, the positive, and the N negatives by v, v+, and v−_n. Motivated by [36], we establish an (N+1)-way classification problem and calculate the probability of choosing the "positive" instead of a "negative". Mathematically, this can be regarded as a cross-entropy loss, calculated by the following formula:

ℓ(v, v+, v−) = −log [ exp(sim(v, v+)/τ) / (exp(sim(v, v+)/τ) + Σ_{n=1}^{N} exp(sim(v, v−_n)/τ)) ],

where the cosine similarity between u and v is expressed as

sim(u, v) = u⊤v / (‖u‖ ‖v‖).

Here, τ indicates the temperature parameter used to scale the distance between the query and the other examples; we use 0.07 as the default value. The number of negatives is set to 255 by default. Technically, we extract features from four encoding layers of the generator (downsampling-1 and downsampling-2, residual block-1 and residual block-5) and send them to a two-layer multi-layer perceptron (MLP) projection head, obtaining a stack of features {z_l}_{l=1}^{L} from the selected layers. Such latent representations can capture the relationship between the hazy domain D_h and the haze-free domain D_n by acting in the role of a "supervising teacher". Thus, we introduce a contrastive regularization term to regularize the captured latent representations of the images; in other words, this regularization term is employed in the form of a contrastive loss (as a member of the loss committee). Mathematically, the contrastive loss for the forward mapping sums ℓ(·) over the selected layers and sampled patches, and we introduce a similar loss for the reverse mapping on the other side. Interestingly, this guidance from contrastive learning can essentially be regarded as an adversarial learning collaboration: the lower the disparity of the mutual features, the more similar the latent representations, and the cleaner the dehazed output, and vice versa.
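The (N+1)-way contrastive objective above can be sketched in plain Python as follows. The toy 2-D feature vectors are purely illustrative, not features produced by ACC-GAN:

```python
import math

def cosine_sim(u, v):
    """sim(u, v) = u.v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(anchor, positive, negatives, tau=0.07):
    """(N+1)-way cross-entropy:
    -log( e^{sim(v,v+)/tau} / (e^{sim(v,v+)/tau} + sum_n e^{sim(v,v-_n)/tau}) )."""
    pos = math.exp(cosine_sim(anchor, positive) / tau)
    neg = sum(math.exp(cosine_sim(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

# An anchor aligned with its positive yields a much lower loss than an
# anchor aligned with a negative.
anchor = [1.0, 0.0]
loss_good = contrastive_loss(anchor, [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
loss_bad = contrastive_loss(anchor, [-1.0, 0.0], [[0.9, 0.1], [0.0, 1.0]])
print(loss_good < loss_bad)  # True
```

Minimizing this loss therefore pulls the restored-image features toward the clear-image features (positives) and pushes them away from the hazy-image features (negatives).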

C. LOSS COMMITTEE
For the aerial image dehazing task, the output of the dehazing process should be close to the haze-free image domain at a certain level. In addition, another important goal is to ensure content consistency and avoid color-/structure-corruption after dehazing. To this end, we combine multi-dimension loss terms into a loss committee as the task-specific proxy guidance. These optimization losses are jointly embedded into the proposed ACC-GAN during the training stage. We describe the members of this committee in the following.

TABLE 3. Detailed descriptions of the synthetic and real-world datasets (numbers of hazy and clear images in the training and test sets).

1) Adversarial Loss
To promote the generated images to be highly realistic, adversarial losses are adopted in the two domains. For the mapping G : D_h → D_n, the adversarial loss is formulated as

L_adv(G, D_1) = E_{I_n∼D_n}[log D_1(I_n)] + E_{I_h∼D_h}[log(1 − D_1(G(I_h)))].

Similarly, we define the same adversarial loss for the mapping F : D_n → D_h:

L_adv(F, D_2) = E_{I_h∼D_h}[log D_2(I_h)] + E_{I_n∼D_n}[log(1 − D_2(F(I_n)))].

The overall adversarial loss is calculated by

L_adv = L_adv(G, D_1) + L_adv(F, D_2).
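The discriminator side of this objective can be sketched as follows, assuming a vanilla (log-likelihood) discriminator that outputs probabilities; the sample values are illustrative only:

```python
import math

def adversarial_loss(d_real, d_fake):
    """Vanilla GAN objective E[log D(real)] + E[log(1 - D(fake))],
    where d_real / d_fake are lists of discriminator outputs in (0, 1)."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# The objective is high for a confident discriminator and drops as the
# generator fools it (d_fake -> 1).
confident = adversarial_loss([0.95, 0.9], [0.05, 0.1])
fooled = adversarial_loss([0.95, 0.9], [0.8, 0.9])
print(confident > fooled)  # True
```

The discriminator maximizes this quantity while the generator minimizes it, which is the game strategy referred to in the text.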

2) Cycle Consistency Loss
We also introduce a cycle consistency loss to address the issue that an adversarial loss alone fails to ensure the matching of the target distribution and the output distribution. L_cycle is capable of limiting the space of generated samples and preserving image content. The loss in the image domain is expressed as

L_cycle = E_{I_h∼D_h}[‖F(G(I_h)) − I_h‖_1] + E_{I_n∼D_n}[‖G(F(I_n)) − I_n‖_1].

3) Contrastive Loss
As mentioned above, the overall contrastive loss can be defined as the sum of the two directional terms:

L_cl = L_cl^{h→n} + L_cl^{n→h}.

4) Identity Preserving Loss
To further preserve identity information such as color invariance between the input and the output, we add an identity preserving loss. Considering training speed and time complexity, we do not use the PatchNCE loss [36] as the identity preserving loss. Instead, we define it as

L_idt = E_{I_n∼D_n}[‖G(I_n) − I_n‖_1] + E_{I_h∼D_h}[‖F(I_h) − I_h‖_1].

5) Perceptual Loss
Other than measuring the per-pixel difference of images, the perceptual loss captures the distinction between feature maps, covering a wide range of content and perceptual quality. As such, we exploit the perceptual loss to encode the difference between the original hazy image and the corresponding dehazed image:

L_per = Σ_l ‖ϕ_l(I_h) − ϕ_l(G(I_h))‖_2^2,

where ϕ_l(·) denotes the features extracted from the l-th layer of a pretrained CNN.
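Conceptually, the committee reduces to a weighted sum of its members. The sketch below uses hypothetical balancing weights chosen purely for illustration; the paper's actual tuned weights are not reproduced here:

```python
def committee_loss(losses, weights=None):
    """Weighted sum over the loss committee members.

    `losses` maps member name -> current scalar value; the default
    `weights` are illustrative hyperparameters, not the paper's values.
    """
    if weights is None:
        weights = {"adv": 1.0, "cycle": 10.0, "contrastive": 1.0,
                   "identity": 5.0, "perceptual": 1.0}
    return sum(weights[name] * value for name, value in losses.items())

# Example: combine the five members introduced above into one objective.
total = committee_loss({"adv": 0.3, "cycle": 0.05, "contrastive": 0.8,
                        "identity": 0.02, "perceptual": 0.1})
print(total)
```

In training, the generators and discriminators would be optimized against this single combined objective, so each member constrains a different aspect (realism, content, latent similarity, color, perception) of the dehazed output.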

IV. EXPERIMENTS AND RESULTS
In this section, we describe the datasets and training details. Then, comprehensive dehazing experiments are conducted to demonstrate the effectiveness of the proposed ACC-GAN against current competing approaches. In addition, ablation studies are carried out to validate the efficiency of our designed framework.

A. DATASETS SETUP
We use two synthetic datasets, the RESIDE dataset [20] and the SateHaze1k dataset [29], with different haze distribution states. Besides, two real-world datasets are involved to evaluate dehazing performance: the Overwater dataset [47] and the RealHaze dataset [31]. The detailed descriptions of the synthetic and real-world datasets are tabulated in Table 3. Note that we shuffle the above-mentioned images randomly in order to obtain an unpaired learning setting.

B. TRAINING DETAILS
The detailed architecture and parameter settings of the proposed ACC-GAN are depicted in Table 1 and Table 2; the model is implemented in the PyTorch framework. To accelerate the training process, Adam optimization is applied with a batch size of 1. For better performance, we train our model from scratch for a total of 200 epochs. For the RESIDE and Overwater datasets, the training phase took about 5 days; for the SateHaze1k dataset, it took about 3 days. All comparative testing experiments use the same datasets and hardware environment, performed on a server with an NVIDIA Tesla P100-PCIE-16GB GPU. All parameters are selected via cross validation using the validation set, and the whole network is trained in an unsupervised manner.

C. RESULTS ON SYNTHETIC DATASETS
With the help of the ground truth in the synthetic datasets, we perform a quantitative comparison using two commonly used evaluation metrics: the Peak Signal-to-Noise Ratio (PSNR) and the Structural SIMilarity index (SSIM).
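For reference, PSNR follows directly from the mean squared error; a minimal sketch over flattened pixel lists:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized images
    (given as flat lists of pixel values): 10 * log10(MAX^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

clean = [120, 130, 140, 150]
restored = [121, 129, 141, 149]   # off by 1 everywhere -> MSE = 1
print(round(psnr(clean, restored), 2))
```

Higher PSNR means the restored image is closer to the ground truth; SSIM complements it by comparing local luminance, contrast, and structure rather than raw pixel error.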

1) Evaluation on RESIDE Dataset
We compare our method with five other image dehazing methods, including DCP [6], DehazeNet [19], AOD-Net [21], GFN [22], and CycleGAN [35]. Meanwhile, the Mean Square Error (MSE) and computational cost are also compared. As shown in Table 4, our proposed method achieves the highest PSNR/SSIM values and the lowest MSE. These notably higher scores reflect the excellent performance of ACC-GAN, which benefits from the contrastive regularization for unsupervised learning. Even though the DCP and AOD-Net algorithms have advantages in running speed, their dehazing performance is far from sufficient. In contrast, our proposed method balances both dehazing performance and running cost. Beyond the quantitative results, we further present one example for visual comparison in Figure 6. CycleGAN is an unsupervised learning algorithm; unfortunately, its recovery performance is also far from sufficient. In contrast, our network performs better than the other models and removes the majority of the fog, while the other results contain more residual haze and color distortion, consistent with the PSNR results above.

2) Evaluation on SateHaze1k Dataset
To objectively evaluate aerial image dehazing performance, we compare our method with DCP [6], DehazeNet [19], FCTF [30] and CGAN [29]. Table 5 and Figure 7 show quantitative and qualitative results on the SateHaze1k dataset, respectively. The SateHaze1k dataset is more challenging because non-uniform haze is a common state in aerial images. As shown in Table 5, our proposed method achieves the highest scores in both PSNR and SSIM, except for the PSNR under thick fog. As displayed in Figure 7, DehazeNet leaves too much haze in the dehazed results, particularly in the thick fog condition. Zooming into the color boxes, the main drawback of CGAN is that it tends to cause color distortion and blur the contents. Compared to the above approaches, our method can remove various densities of haze while preserving color and structural information.

D. RESULTS ON REAL-WORLD DATASETS
For further general verification in practical use, we conduct experiments on two real-world hazy datasets. Since these cases lack ground truth, we only perform qualitative comparisons.

1) Evaluation on Overwater Dataset
Figure 8 visually compares the results of all competing methods on the Overwater dataset, including DCP [6], GFN [22], CGAN [29] and CycleGAN [35]. As hazy aerial images from ocean scenes are more complex, all the competing methods fail to recover high-quality clear images. It can be observed that the proposed method significantly outperforms the others in removing the majority of haze while preserving image structure, even under strong lighting effects and dark surroundings. Obviously, developing unpaired learning-based dehazing methods is more valuable for real-world generalized scenes.

2) Evaluation on RealHaze Dataset
Similarly, we further evaluate the competing algorithms on the real-world RealHaze dataset, including DCP [6], GFN [22], CGAN [29] and CycleGAN [35]. As shown in Figure 9, the proposed method still exhibits remarkable performance and recovers clearer images with truthful structures. This benefits from the contribution of the loss committee we established. In general, the above experiments reveal that our method performs well on image dehazing in various real hazy scenarios, demonstrating both the generality and effectiveness of the proposed ACC-GAN.

E. ABLATION STUDIES
We study the impact of different loss components, asymmetric architecture, and negatives choices on the final dehazing performance. All the studies are performed in the same environment by using the RESIDE dataset.

1) Study on Loss Committee
To better demonstrate the effectiveness of our loss committee, we remove one member at a time from the full configuration. The PSNR values are listed in Table 6. We observe that the best dehazing performance of 24.35 dB is obtained by using all of the above loss terms, which indicates that each considered design strategy has its own contribution to the final performance of ACC-GAN, especially the contrastive loss.

2) Study on Asymmetric Architecture
We remove the feature transfer network acting on the forward cycle path to obtain the comparison model. As reported in Table 7, the proposed asymmetric CycleGAN architecture in ACC-GAN achieves a higher PSNR than the symmetric model, thanks to the further performance gains brought by the feature transfer network, which helps to enhance the encoded features transferred from the hazy domain to the haze-free domain.

3) Study on Selection of Negatives
The most important step in contrastive learning is how to select the negatives. In our work, we sample 256 negatives from within the same image (internal negatives). Similar to [36], we further use external negatives instead of internal negatives to investigate the impact of the different strategies. Figure 10 provides the visual dehazing results on a sample image. We can observe that the performance drops sharply with external negatives, which proves that internal negatives are more effective than external ones.

F. LIMITATIONS
When trained on small-scale datasets, our proposed method has limited dehazing performance, because contrastive learning often requires a large number of sample pairs to deliver satisfactory results. Another limitation is that GAN training is unstable, and a large number of parameters need to be tuned for convergence.

V. CONCLUSIONS
In this paper, inspired by the significant advantages of contrastive learning, we make the first attempt to construct a novel Asymmetric Contrastive CycleGAN framework (called ACC-GAN) for aerial image dehazing. The proposed framework aims to learn an effective unsupervised dehazing model from an unpaired set of clear and hazy aerial images. Instead of only employing adversarial learning and cycle-consistency constraints, we introduce contrastive regularization to maximize the mutual information between the hazy domain and the haze-free domain. Furthermore, an additional feature transfer network is integrated into the GAN to form an asymmetric structure, which further enhances the encoded features transferred from the hazy domain to the haze-free domain. To avoid dehazed results suffering from color-/structure-destroying effects, we combine multiple loss functions into a loss committee as the task-specific proxy guidance for better joint optimization. Experimental results on synthetic and real-world datasets demonstrate the effectiveness and generalization of our designed model. In future research, we plan to explore the contribution of the contrastive representation learning scheme to other low-level vision tasks, such as aerial image denoising and deblurring.