Residual Dense Blocks and Contrastive Regularization Integrated Underwater Image Enhancement Network

Owing to severe information degradation, underwater image deblurring remains a challenging, ill-posed problem. Moreover, most deep learning models do not adequately exploit the hierarchical features of the original underwater images: they typically use only clear images as positive samples to guide the training of image enhancement networks, neglecting the utilization of negative information. In this paper, we introduce the Residual Dense Block (RDB) and Contrastive Regularization (CR) techniques. By leveraging the local and global feature fusion of RDB and the contrastive learning of CR, our model effectively extracts multi-level features from the original images, adaptively preserves hierarchical features, and achieves high-quality underwater image deblurring by also learning from the original images. Experimental results demonstrate that our model outperforms the comparative algorithms in terms of subjective visual quality and objective evaluation metrics across four datasets.


I. INTRODUCTION
With the rapid advancement of computer technology, the widespread application of computer vision systems in underwater image enhancement has received increasing attention. Underwater images serve as essential information sources in marine environments, playing critical roles in marine resource exploration [1], underwater robot navigation [2], underwater monitoring [3], and other fields. However, the optical characteristics and water quality conditions of underwater environments often lead to light attenuation, scattering, and absorption, resulting in low image quality, including blurriness, dimness, and lack of detail. These challenges impose severe limitations on the successful implementation of underwater vision tasks. Therefore, enhancing underwater images to obtain high-quality underwater information is a prerequisite for advancing underwater research tasks in practical applications.
In recent years, numerous underwater image enhancement methods have emerged, exhibiting promising results.
However, some issues remain to be addressed. Most learning-based underwater image enhancement methods [4], [5] overlook the utilization of hierarchical features to extract more clues: although they can acquire features from several convolutional layers, they fail to extract multi-level features from the original underwater images. Many existing methods use clear images as positive samples [5], [6], employing an L1/L2-based image reconstruction loss to guide the training of enhancement networks without any regularization. However, relying solely on the image reconstruction loss may not effectively handle image details, potentially leading to color distortion in the enhanced images.
To address the aforementioned issues and limitations, we introduce Residual Dense Blocks (RDB), consisting of dense connection layers, Local Feature Fusion (LFF), and Local Residual Learning (LRL). The RDB can adaptively extract local dense features and perform global feature fusion, preserving hierarchical features in a global manner; this enables the extraction of features from the original blurry underwater images to the greatest possible extent. Additionally, we introduce a novel Contrastive Regularization (CR) inspired by contrastive learning. CR leverages not only clear images but also blurry images for contrastive learning, helping the deblurring network approach positive images and move away from negative images, thereby improving the enhancement results. To our knowledge, this simultaneous use of both blurry and clear images for contrastive regularization is the first in the field of underwater image enhancement. In contrast to other underwater image enhancement models that tend to optimize only the enhancement results, our model, through the RDB and CR modules, focuses on mining additional features from the original images, resulting in clearer images.
The primary contributions of this paper can be summarized as follows: (1) A comprehensive end-to-end fully convolutional network, denoted RDCR, is developed specifically for the enhancement of underwater images. We introduce the RDB, which enables feature extraction from preceding RDBs through a continuous memory (CM) mechanism and utilizes local dense connections to better leverage all layers within each block.
(2) We introduce a novel Contrastive Regularization, which effectively aids the image enhancement network in approximating positive images and distancing itself from negative images, without introducing additional parameters during the testing phase.
(3) We evaluate the model on four different datasets against other underwater enhancement models. The results indicate that the proposed RDCR model outperforms the comparative algorithms in terms of both subjective visual effects and objective evaluation quality of the processed underwater images.

II. RELATED WORK
Owing to the robust feature extraction capabilities inherent to deep learning methodologies within the domain of low-level vision tasks, substantial progress has been achieved in underwater image enhancement, resulting in remarkably competitive performance levels. In the subsequent section, underwater image enhancement methods are systematically classified into two distinct categories, CNN-based and GAN-based approaches, according to their underlying network models [7], as shown in Table 1. Concluding this section, we furnish a comprehensive synopsis of the existing underwater datasets, thereby offering an encompassing portrayal of the prevailing state in this domain.

A. UNDERWATER IMAGE ENHANCEMENT METHODS
1) CNN-BASED METHODS
In the realm of CNN-based techniques for underwater image enhancement, the potency of convolutional neural networks is harnessed to extract pertinent information from input images, thereby facilitating image enhancement. The core emphasis lies in learning the mapping function that bridges underwater images to their corresponding ground truth counterparts. As an illustration, Wang et al. [8] introduced the UICE^2-Net approach, which innovatively amalgamates the RGB and HSV color spaces, constituting a pioneering endeavor in integrating deep learning into underwater image enhancement. This method showcased exceptional performance on both artificially generated and authentic underwater images, thereby markedly ameliorating color distortion challenges.
Alternatively, Lyu et al. [9] devised a succinct yet efficacious CNN architecture for underwater image enhancement with a dual-stage processing mechanism: a lightweight CNN-based enhancement phase seamlessly coupled with YUV-based post-processing. Notably, this approach outperformed extant methodologies in both qualitative and quantitative assessments while maintaining a reduced computational overhead. Liu et al. [10] introduced a supervised adaptive learning attention network that retains shallow information and adaptively learns crucial features of underwater images; this method exhibited strong performance across various underwater datasets. Li et al. [11] constructed Water-Net, a CNN model for underwater image enhancement that, in comparative experiments, demonstrated significantly better enhancement results than most other models. Additionally, they established a new underwater image dataset, providing more data for future underwater image enhancement research.

2) GAN-BASED METHODS
GAN-based approaches for underwater image enhancement utilize generative adversarial networks to generate lifelike underwater images through adversarial training, thus improving image quality and addressing issues such as color distortion and detail loss. Wang et al. [12] introduced a novel CA-GAN framework, which employs category-conditional attention generative adversarial networks specifically designed for underwater image enhancement. This framework establishes a many-to-one mapping function based on the category of underwater images, effectively restoring both color and detail; it outperforms existing techniques on both synthetic and real underwater images, underscoring its efficacy in the domain. Liu et al. [13] introduced a conditional GAN-based deep network focused on multiscale feature fusion for underwater image color correction. Through the extraction of multiscale features and the subsequent fusion of global features, this network facilitates expedited and proficient learning, yielding pronounced improvements in both color correction and detail preservation that eclipse contemporary techniques, as demonstrated through comprehensive experimental validation. Liu et al. [14] proposed a dual-contrastive-learning-based underwater image enhancement method aimed at the decreased object detection accuracy in underwater environments; it demonstrated significant improvements in both visual quality and target detection accuracy.
Hambarde et al. [15] introduced the UW-GAN framework for single underwater image depth estimation and enhancement. The framework consists of two networks, UWC-Net and UWF-Net, for coarse-level and fine-level depth estimation, respectively, and exhibited superiority in single underwater image depth estimation and enhancement. Furthermore, other researchers [16], [17] proposed GAN-based adaptive underwater image enhancement methods that dynamically adjust network structures or parameters during training to adapt to different input data or tasks, thereby achieving better performance across various scenarios.

B. DATASETS
Acquiring real underwater images is challenging due to the complex underwater environment. Researchers therefore either synthesize datasets through networks or collect datasets with underwater robots. For instance, Islam et al. [18] established the EUVP dataset, containing paired and unpaired underwater images from seven cameras under varying visibility conditions. Fabbri et al. [19] used CycleGAN [20] to create image pairs for the Underwater Imagenet dataset. Li et al. [11] applied classic algorithms to process images and manually filtered the clear ones, forming the UIEBD dataset. Duarte et al. [21] simulated underwater scenes by adding milk to water and captured a dataset. Li et al. [22] introduced the U-45 dataset of underwater-degraded images. Berman et al. [23] constructed the SQUID dataset of underwater stereo images. Islam et al. [24] developed the UFO-120 dataset, comprising training samples and a benchmark test set. Training on these datasets improves underwater image quality and lays the foundation for data-driven enhancement.

III. METHOD
The overall architecture of the proposed RDCR model is illustrated in Fig. 1 and consists of four components. (1) The Multiscale Fusion Module (MFM) enhances effective connections of spatial information among the three branches. (2) The RDBs utilize local dense connections to better leverage all layers and adaptively retain accumulated features through LFF. (3) The Three-Group Structure (3GS) increases the depth and expressive power of the network. (4) The CR aids the image enhancement network in approximating positive images and moving away from negative images. Multiple loss functions are constructed to enhance the network's performance.

A. MULTISCALE FUSION MODULE
To enhance the extraction of global features from the input image, we designed three branches that learn convolutional kernels with receptive field sizes of 3 × 3, 5 × 5, and 7 × 7; larger kernels more effectively capture the image's global features. The outputs of these branches are labeled x1, x2, and x3. Additionally, we introduced a Multiscale Fusion Module (MFM) to merge the global features from the branches with different kernel sizes. The MFM takes x1, x2, and x3 as inputs for multi-scale feature fusion, as illustrated in Fig. 2.
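As a concrete illustration, the following PyTorch sketch implements the three parallel branches; fusing their outputs by concatenation followed by a 1 × 1 convolution is our assumption, since the exact fusion topology of the MFM is given only in Fig. 2.

```python
import torch
import torch.nn as nn

class MultiscaleFusionModule(nn.Module):
    """Sketch of the three-branch extraction and fusion (MFM).

    The 3x3/5x5/7x7 branch sizes follow the text above; merging the
    branch outputs with a concatenation and a 1x1 convolution is an
    assumption, as the exact fusion topology appears only in Fig. 2.
    """
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        # Three parallel branches with increasing receptive fields.
        self.branch3 = nn.Conv2d(in_ch, feat_ch, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, feat_ch, kernel_size=5, padding=2)
        self.branch7 = nn.Conv2d(in_ch, feat_ch, kernel_size=7, padding=3)
        # 1x1 conv merges the concatenated branch features.
        self.fuse = nn.Conv2d(3 * feat_ch, feat_ch, kernel_size=1)

    def forward(self, x):
        x1, x2, x3 = self.branch3(x), self.branch5(x), self.branch7(x)
        return self.fuse(torch.cat([x1, x2, x3], dim=1))
```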

B. RESIDUAL DENSE BLOCK
We illustrate the details of the Residual Dense Block that we introduced in Fig. 3. Our RDB consists of dense connection layers, LFF, and LRL, resulting in a CM mechanism.
The Continuous Memory mechanism. The idea behind CM is to maximize the integration of information from all Conv layers. However, directly fusing the feature maps of every Conv layer is impractical, as it leads to a substantial feature stack. Instead, we first adaptively fuse the information and then pass it on for subsequent feature fusion. This is realized by propagating the state of the preceding RDB to every layer of the current RDB. Let $F_{d-1}$ and $F_d$ denote the input and output of the $d$-th RDB, both comprising $G_0$ feature maps. The output of the $c$-th convolutional layer within the $d$-th RDB can then be expressed as

$$F_{d,c} = \sigma\big(W_{d,c}\,[F_{d-1}, F_{d,1}, \ldots, F_{d,c-1}]\big),$$

where $\sigma$ signifies the ReLU activation function and $W_{d,c}$ denotes the weights of the $c$-th Conv layer (bias terms are omitted for simplicity). We presume each $F_{d,c}$ comprises $G$ feature maps, so $[F_{d-1}, F_{d,1}, \ldots, F_{d,c-1}]$, the concatenation of the feature maps produced by the $(d-1)$-th RDB and by layers $1, \ldots, c-1$ of the $d$-th RDB, holds $G_0 + (c-1) \times G$ feature maps. Because the preceding RDB and the output of each layer connect directly to all ensuing layers, the block retains a feed-forward structure while extracting dense local features.

Local Feature Fusion. LFF is harnessed to adaptively amalgamate the state originating from the prior RDB with all Conv layers of the present RDB. As underscored previously, a pivotal aspect is diminishing the number of features: the feature maps of the $(d-1)$-th RDB are concatenated with those of the $d$-th RDB, and a $1 \times 1$ convolutional layer exerts dynamic control over the output information. This composite action, denoted the LFF operation, is

$$F_{d,LF} = H^{d}_{LFF}\big([F_{d-1}, F_{d,1}, \ldots, F_{d,C}]\big),$$

where $H^{d}_{LFF}$ represents the function of the $1 \times 1$ Conv layer in the $d$-th RDB and $C$ is the number of Conv layers in the block.

Local Residual Learning. LRL is integrated into the RDB to amplify information propagation, a prudent consideration given the many convolutional layers within a single RDB. The output of the $d$-th RDB is then

$$F_{d} = F_{d-1} + F_{d,LF}.$$
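The equations above translate directly into a compact PyTorch module, sketched below; the number of layers per block and the growth rate G are illustrative choices, not values reported in the paper.

```python
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """One RDB: densely connected Conv layers, LFF (1x1 conv), and LRL.

    Follows the equations above; g0 (input channels G_0), g (growth
    rate G), and num_layers (C) are illustrative values.
    """
    def __init__(self, g0=64, g=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for c in range(num_layers):
            # Layer c sees F_{d-1} plus all earlier outputs of this RDB.
            self.layers.append(nn.Sequential(
                nn.Conv2d(g0 + c * g, g, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
        # LFF: 1x1 conv reduces the concatenated features back to G_0 channels.
        self.lff = nn.Conv2d(g0 + num_layers * g, g0, kernel_size=1)

    def forward(self, f_prev):  # f_prev = F_{d-1}
        feats = [f_prev]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # F_{d,c}
        # LRL: F_d = F_{d-1} + F_{d,LF}
        return f_prev + self.lff(torch.cat(feats, dim=1))
```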

C. THREE-GROUP STRUCTURE
The 3GS module is formed by concatenating three identical structures, as illustrated in Fig. 4. Each structure consists of two parts: Global Average Pooling (GAP) and a Group Structure (GS). The input is fed through both the GAP and GS operations, their output features are multiplied together, and shallow features are then added to the result to yield the output:

$$F_{out} = \big(U_{GAP}(X) \otimes U_{GS}(X)\big) \oplus I_n,$$

where $F_{out}$ represents the output of the module, $U_{GAP}$ and $U_{GS}$ execute the GAP and GS operations, respectively, $X$ denotes the input feature of the current group, $\oplus$ represents addition, $\otimes$ signifies multiplication, and $I_n$ denotes the incorporated shallow-level information. Each GS comprises 20 Parallel Attention Modules (PAMs); the sequential integration of these PAMs not only extends the network's depth but also augments its performance. Concluding the network, a convolutional layer is enlisted to effectuate dimensionality reduction, culminating in the creation of vibrant and luminous underwater images.
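A structural sketch of one group and the three-group cascade is given below. The internal design of the PAM is not detailed above, so a plain convolutional block stands in for it, and the broadcast multiplication of the pooled branch is likewise our assumption.

```python
import torch
import torch.nn as nn

class GroupStructure(nn.Module):
    """One group: F_out = (U_GAP(X) * U_GS(X)) + I_n.

    The PAM internals are unspecified above, so a simple Conv+ReLU
    block serves as a placeholder for each of the cascaded PAMs.
    """
    def __init__(self, ch=64, num_pams=20):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # U_GAP
        self.gs = nn.Sequential(*[          # U_GS: cascaded PAM stand-ins
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(num_pams)
        ])

    def forward(self, x, shallow):
        # Multiply pooled statistics with GS features (broadcast over H, W),
        # then add the shallow-level information I_n.
        return self.gap(x) * self.gs(x) + shallow

class ThreeGroupStructure(nn.Module):
    """Three identical groups concatenated in sequence, per Fig. 4."""
    def __init__(self, ch=64):
        super().__init__()
        self.groups = nn.ModuleList(GroupStructure(ch) for _ in range(3))

    def forward(self, x, shallow):
        for group in self.groups:
            x = group(x, shallow)
        return x
```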

D. CONTRASTIVE REGULARIZATION AND LOSS FUNCTION
Inspired by contrastive learning [25], [26], we not only use clear images for contrastive learning but also incorporate blurry images as negative samples to constrain the solution space. Within our CR framework, a consistent set of intermediate features is extracted from a fixed pre-trained model denoted as G. These features are employed to establish positive pairs, where clear images are paired with their enhanced counterparts, while negative pairs are formed by associating blurry images with their enhanced versions. By bringing the positive pairs closer in representation space and simultaneously pushing apart the negative pairs, our approach elevates the learning efficacy of the model and enhances the clarity of the resultant images. The CR loss function can be formulated as

$$L = \big\|\varphi(X, \omega) - Y\big\|_{1} + \beta\, \rho\big\{G(X),\, G(Y),\, G[\varphi(X, \omega)]\big\},$$

where X represents a blurry image, Y is the corresponding clear image, φ denotes the entire underwater image enhancement network, and φ(X, ω) represents the enhanced image obtained after passing through the network. The first component corresponds to the reconstruction loss, which serves to align the reconstructed image with its corresponding ground truth in the data domain. In this context, we opt for the L1 loss function, as empirical evidence underscores its superior performance in comparison to the L2 counterpart.
The second term, ρ{G(X), G(Y), G[φ(X, ω)]}, introduces the contrastive regularization. It exerts opposing forces within the same latent feature space: pulling the enhanced image φ(X, ω) closer to the clear image Y while pushing it away from the blurry image X. The hyperparameter β balances the reconstruction loss and the CR. To enhance the contrastive ability, we extract hidden features from different layers of a fixed pre-trained model, so the overall CR loss function can be further formulated as

$$L = \big\|\varphi(X, \omega) - Y\big\|_{1} + \beta \sum_{i=1}^{n} \omega_{i} \cdot \frac{D\big(G_{i}(Y),\, G_{i}[\varphi(X, \omega)]\big)}{D\big(G_{i}(X),\, G_{i}[\varphi(X, \omega)]\big)},$$

where $G_i$, $i = 1, 2, \ldots, n$, extracts the $i$-th hidden feature from the fixed pre-trained model, $D(x, y)$ signifies the L1 distance between $x$ and $y$, and $\omega_i$ functions as a weighting coefficient.
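A hedged sketch of the contrastive term follows. Using VGG-19 as the fixed pre-trained extractor G, together with the particular tap layers and weights ω_i below, is our assumption; the text specifies only a fixed pre-trained model with multiple hidden layers.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class ContrastiveRegularization(nn.Module):
    """CR term: sum_i w_i * D(G_i(Y), G_i(phi)) / D(G_i(X), G_i(phi)).

    VGG-19 as G, the tap layers, and the weights w_i are assumptions;
    inputs are assumed normalized as the extractor expects.
    """
    def __init__(self, layer_ids=(3, 8, 17), weights=(0.25, 0.5, 1.0)):
        super().__init__()
        features = vgg19(weights="IMAGENET1K_V1").features.eval()
        self.slices, prev = nn.ModuleList(), 0
        for lid in layer_ids:
            self.slices.append(features[prev:lid + 1])
            prev = lid + 1
        for p in self.parameters():
            p.requires_grad = False  # G stays fixed
        self.weights = weights
        self.l1 = nn.L1Loss()

    def forward(self, enhanced, clear, blurry):
        loss, e, c, b = 0.0, enhanced, clear, blurry
        for w, s in zip(self.weights, self.slices):
            e, c, b = s(e), s(c), s(b)  # G_i features of each image
            # Pull toward the clear image, push away from the blurry one.
            loss = loss + w * self.l1(e, c) / (self.l1(e, b) + 1e-7)
        return loss
```

The total objective is then obtained by adding β times this term to the L1 reconstruction loss, matching the formulation above.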
For the preservation of edge information, we incorporate the Laplacian operator as an edge detection mechanism, convolving with a 3 × 3 Laplacian template to accentuate the intricacies within the generated underwater images. Let the images generated by RDCR be denoted as $I_g$ and their corresponding reference images as $I_l$, and let $\nabla^2$ denote convolution with the Laplacian template. The Laplacian loss $L_{lap}$ is defined as

$$L_{lap} = \frac{1}{N} \sum_{i=1}^{N} \big| \nabla^{2} I_{g}(i) - \nabla^{2} I_{l}(i) \big|,$$

where $N$ represents the total number of pixels in the image and $i$ denotes the pixel positions.
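The edge loss can be realized with a single grouped convolution, as sketched below; the 4-neighbour Laplacian template used here is our assumption, since the paper's exact 3 × 3 template is not reproduced above.

```python
import torch
import torch.nn.functional as F

# 4-neighbour 3x3 Laplacian template (an assumed choice).
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian_loss(i_g, i_l):
    """L_lap: mean absolute difference of Laplacian-filtered images.

    i_g is the generated image and i_l the reference, both shaped
    (B, C, H, W); the filter is applied per channel via groups=C.
    """
    c = i_g.shape[1]
    kernel = LAPLACIAN.to(i_g.device).repeat(c, 1, 1, 1)
    edge_g = F.conv2d(i_g, kernel, padding=1, groups=c)
    edge_l = F.conv2d(i_l, kernel, padding=1, groups=c)
    return (edge_g - edge_l).abs().mean()
```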
IV. EXPERIMENTS
A. EXPERIMENTAL SETTINGS
We conducted experiments on four datasets: EUVP, UFO-120, UIEBD, and Underwater Imagenet. Each dataset was divided into training and testing subsets, ensuring a uniform comparison among all methods within a consistent experimental framework. The partitioning is as follows: (1) EUVP: 2000 image pairs were randomly selected for training and 200 pairs for testing. (2) UFO-120: 1200 image pairs were randomly sampled for training, with 150 pairs reserved for testing. (3) UIEBD: of its 890 underwater image pairs, 800 were allocated for training and the remaining 90 for testing. (4) Underwater Imagenet: 2000 image pairs were randomly chosen for training, alongside another 200 pairs for testing. The number of training and testing images per dataset is summarized in Table 2.
We utilized the Adam optimizer with a learning rate of 0.0001 for training the RDCR network. The batch size was configured as 4, and the training process extended over 40 epochs. The model was implemented in the PyTorch deep learning framework, with computations executed on an NVIDIA GeForce RTX 3090 GPU equipped with 24 GB of memory.
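For concreteness, these settings map onto a training skeleton like the following sketch; the tiny stand-in network and random tensors merely make it self-contained and would be replaced by the RDCR model, the paired underwater data, and the full loss described above.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-ins so the loop runs end to end (not the actual RDCR model/data).
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(64, 3, 3, padding=1))
pairs = TensorDataset(torch.rand(16, 3, 64, 64), torch.rand(16, 3, 64, 64))
loader = DataLoader(pairs, batch_size=4, shuffle=True)      # batch size 4

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # lr = 0.0001
l1 = nn.L1Loss()

for epoch in range(40):                                     # 40 epochs
    for blurry, clear in loader:
        enhanced = model(blurry)
        # In the full model: L1 + beta * CR + Laplacian edge loss.
        loss = l1(enhanced, clear)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```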
B. QUALITATIVE ANALYSIS
Fig. 5 presents the performance of various models on green and blue underwater images. First, both the HE and UCM models increase brightness but suffer from severe overexposure and exhibit a red color cast. The UDCP, RGHS, and ICM models show unsatisfactory results on green color-shifted images: UDCP's brightness reduction on blue underwater images exacerbates the color shift, and ICM produces more blurred results. The LANet model improves both brightness and contrast, but its color correction on blue color-shifted images is not ideal. Conversely, WaterNet performs poorly on green color-shifted images and even produces blurrier, less clear results on blue underwater images.
By inspecting the images, it is evident that RDCR delivers significantly superior results on green color-shifted images, even surpassing the reference images in color correction. Although its color correction on blue underwater images may not be ideal, RDCR excels in brightness, contrast, and clarity compared with the other models.
Fig. 6 illustrates the performance of various models on low-light and turbid underwater images. Examining the images, it is evident that the UDCP model produces significantly subpar results with darker brightness. Although the RGHS and ICM models improve brightness on low-light underwater images, their results on turbid underwater images exhibit blue color shifts and blurry edge regions. The HE model suffers from severe overexposure, while the UCM model shows an overall yellow color cast. Both LANet and WaterNet produce results on turbid underwater images that appear as if covered by a layer of haze, and although they enhance brightness on low-light underwater images, they exhibit slight yellow color shifts, making them less realistic than the reference images. In contrast, the RDCR model demonstrates exceptional results in both environments: it enhances brightness and clarity and shows the closest resemblance to the reference images, thus better matching real underwater conditions.

C. QUANTITATIVE EVALUATION
After the qualitative analysis, we conducted a quantitative analysis using four evaluation metrics: Structural Similarity (SSIM), Peak Signal-to-Noise Ratio (PSNR), Underwater Image Quality Measure (UIQM), and Underwater Image Colorfulness Measure (UICM). The results of the various models are presented in Table 3 and Table 4, where bold fonts indicate the best-performing model. From the tables, it is evident that RDCR consistently achieves top performance across all four datasets; the only exceptions are slightly lower UICM scores on the UFO-120 and UIEBD datasets, where it still outperforms most methods. This demonstrates the strong generalization ability of our RDCR model, which yields excellent results across multiple datasets. Additionally, the enhanced images produced by RDCR exhibit vibrant colors, high clarity, and reduced noise, making them visually closer to the reference images.
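As a reference point, the two full-reference metrics can be computed with scikit-image as sketched below; this tooling choice is ours, and the no-reference UIQM/UICM metrics are omitted because several published variants exist.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced, reference):
    """PSNR and SSIM for one image pair, given uint8 RGB arrays (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim
```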

D. ABLATION STUDY
To validate the effectiveness of the introduced modules in RDCR, we conducted several ablation studies, with results shown in Table 5 and Table 6. The specific configurations are as follows: (1) NCR group: the CR module is removed from the network. (2) NRDB group: all RDB modules are removed from the network. (3) 2GS group: the 3GS module is replaced with two GS modules. (4) 1GS group: the 3GS module is replaced with one GS module. We included LANet in the comparison alongside the NCR, NRDB, 2GS, and 1GS groups to investigate the impact and contribution of each individual module in RDCR.
The data in the tables reveal that across the four datasets, the outcomes of NCR, NRDB, 2GS, and 1GS significantly surpass those of the original images, substantiating the crucial contribution of each module within the RDCR network. Furthermore, RDCR consistently outperforms the other models in terms of both SSIM and PSNR, underscoring its superiority. While most of the UIQM and UICM results also favor RDCR, there are three instances where it slightly lags behind 2GS and 1GS. We can thus infer that the GS module may exert a dampening effect on the UIQM and UICM metrics under specific circumstances, although its impact remains positive in the majority of cases.

V. CONCLUSION
In this work, we have proposed an underwater image enhancement network that combines Residual Dense Blocks (RDB) and Contrastive Regularization (CR). By introducing RDB, we extract multi-level features from the original underwater images while adaptively preserving hierarchical features. Our core module, the 3GS module, enhances the image's color and brightness features. We utilize CR for contrastive learning together with multiple loss functions; this not only contrasts against clear images but also constrains the enhancement process using the original images, achieving high-quality underwater image enhancement. Qualitative and quantitative experiments demonstrate that RDCR significantly improves image color and brightness, yields outstanding clarity, and performs well across various datasets, showing strong generalization. It particularly excels at addressing green color bias and low-light conditions. On the widely used UIEBD dataset, RDCR achieves an SSIM of 0.8842 and a PSNR of 23.20 dB, surpassing the second-best results by 0.0101 and 0.15 dB, respectively. However, the model's performance is less favorable for images with blue color bias, leaving room for improvement; our future research will be directed towards this aspect.

FIGURE 5. Running results of various models on blue and green underwater images.

FIGURE 6. Running results of various models on low illumination and turbid underwater images.

TABLE 1. Advantages and limitations of different methods.

TABLE 2. The number of testing and training images.

TABLE 3. Results of various models on EUVP and UFO-120 data sets.

TABLE 4. Results of various models on UIEBD and Underwater Imagenet data sets.

TABLE 5. Results of ablation experiment on EUVP and UFO-120 data sets.

TABLE 6. Results of ablation experiment on UIEBD and Underwater Imagenet data sets.