Benchmarking Underwater Image Enhancement and Restoration, and Beyond

Image enhancement and restoration are among the most investigated topics in underwater machine vision. Objective image quality assessment is a fundamental part of optimizing underwater enhancement and restoration technologies. However, most no-reference (NR) metrics are not specifically designed for underwater image quality assessment. Moreover, since reference (undegraded) images are not available in underwater scenes, classical full-reference (FR) metrics cannot be used to evaluate underwater image enhancement and restoration methods. In this paper, we first design an underwater image synthesis algorithm (UISA) that, guided by a real-world underwater image, produces a synthetic underwater image from an outdoor ground-truth image. Based on this strategy, we establish a new large-scale benchmark, called the synthetic underwater image dataset (SUID), which contains ground-truth images and synthetic underwater images of the same scenes. SUID is constructed on the basis of the underwater image formation model (IFM) and the characteristics of underwater optical propagation, which makes it reliable and practical. SUID makes FR evaluation of existing underwater image enhancement and restoration technologies possible, as illustrated by extensive experiments and quantitative analysis. SUID is available online at: http://dx.doi.org/10.21227/agdr-y109.


I. INTRODUCTION
With wide underwater applications including geological exploration, biological exploration, object recognition, artificial intelligence, and other related activities, underwater image processing has attracted increasing attention in recent years. Unfortunately, captured underwater images often suffer from poor visibility, such as low contrast and blur, caused by the absorption and scattering of light as it travels through water. To address these problems, a range of underwater image enhancement and restoration technologies has been applied to underwater vision tasks.
Depending on whether they rely on the image formation model (IFM), underwater image processing technologies can be divided into two categories: image enhancement and image restoration [1]-[3]. Enhancement algorithms improve images according to quantitative objective criteria, without taking into account the factors that cause underwater image degradation. Numerous enhancement techniques have been proposed to improve the visibility of degraded underwater images, such as histogram equalization [4]-[7], Retinex-based algorithms [8]-[12], particle swarm optimization (PSO) [13]-[15], and fusion-based algorithms [16]-[21]. Unlike enhancement schemes, restoration algorithms generally aim to recover the original reflectance of the scene based on an underwater image degradation model. The underwater image formation model was first proposed by McGlamery [22] and Jaffe [23]; known as the Jaffe-McGlamery model, it has been applied in the development of various underwater image acquisition systems. In 2006, Trucco and Olmos-Antillon [24] proposed a self-tuning filter for underwater image restoration based on a simplified Jaffe-McGlamery model. To address the blurring of underwater images caused by light scattering from ripples and suspended particles, Hou et al. [25] combined underwater optical properties with deconvolution to estimate the light scattering parameters. Among the IFM-based methods [26]-[31], the dark channel prior (DCP) [26] is widely used because the underwater imaging model is highly similar to the outdoor haze model. In 2013, the underwater dark channel prior (UDCP) method [27] was designed to estimate the medium transmission map by considering only the blue and green channels. Afterward, Galdran et al. [28] introduced a red channel prior, derived as a variant of DCP, to restore underwater images. In 2016, also inspired by DCP, Gao et al. [29] constructed a bright channel prior (BCP) to estimate the transmittance and background light. Recently, several variational approaches [32]-[37] and deep learning methods [38]-[40] have been proposed for dehazing and denoising. Underwater image enhancement and restoration has thus attracted extensive attention from researchers and achieved many desirable results. However, the performance of these methods is still assessed largely by subjective observation of the enhanced and restored results, which lacks objectivity and consistency.
The associate editor coordinating the review of this manuscript and approving it for publication was Claudio Cusano.
Image quality assessment (IQA) [41]-[44] plays an important role in analyzing and evaluating the performance of image enhancement and restoration algorithms. Since reference (undegraded) images are not available in underwater scenarios, no-reference image quality assessment (NR-IQA) methods [45]-[53], which require no reference image, are commonly adopted. In general, these methods can be divided into two categories depending on whether they assume prior knowledge of the distortion type: distortion-specific (DS) [45]-[49] and non-distortion-specific (NDS) [50]-[53]. However, these NR-IQA methods are not specifically designed for underwater images. In 2015, Yang and Sowmya [54] designed the underwater color image quality evaluation (UCIQE) metric to quantify color cast, blurring, and contrast; UCIQE is a linear combination of chroma, saturation, and contrast. Afterward, Panetta et al. [55] proposed the no-reference underwater image quality measure (UIQM), inspired by the human visual system, which combines three underwater image attribute measures: colorfulness, sharpness, and contrast. To some extent, NR-IQA methods can be employed to quantitatively evaluate underwater image enhancement and restoration methods. However, classical full-reference (FR) metrics cannot be used to evaluate these schemes because underwater reference images are lacking. This limitation hinders the progress of both underwater image enhancement and restoration technologies and their quality evaluation. To fill this gap, several researchers have attempted to develop benchmark datasets for evaluating underwater image enhancement and restoration methods. Berman et al. [56], Li et al. [57], and Liu et al. [58] built the real-world underwater image datasets SQUID, UIEB, and RUIE, respectively. These datasets enable the analysis of existing underwater image enhancement methods in terms of NR metrics, but they cannot support FR underwater image quality evaluation. In [59], the 3D TURBID dataset was constructed to simulate underwater environments; however, it only covers different turbidity levels and contains just 30 images, which limits its application. In 2019, Sánchez-Ferreira et al. [60] presented the underwater image database UID-LEIA with the help of the Laboratory of Embedded Systems and Integrated Circuits Applications. They produced 135 greenish degraded images with three levels of turbidity by mixing in different amounts of green tea. In [61], Li et al. proposed an underwater image synthesis algorithm to generate underwater image degradation datasets (UIDD) with different turbidity types. Similarly, Uplavikar et al. [62] constructed a dataset covering different Jerlov water types. Both approaches synthesize underwater images from indoor ground-truth images using randomly assigned background light and depth.
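For reference, both underwater-specific metrics mentioned above are weighted linear combinations of attribute measures. They are commonly written as follows; the weights are the values usually quoted from [54] and [55] and are reproduced here for illustration, not taken from this paper:

$$\mathrm{UCIQE} = c_1\,\sigma_c + c_2\,\mathrm{con}_l + c_3\,\mu_s, \qquad (c_1, c_2, c_3) = (0.4680,\ 0.2745,\ 0.2576),$$

$$\mathrm{UIQM} = c_1\,\mathrm{UICM} + c_2\,\mathrm{UISM} + c_3\,\mathrm{UIConM}, \qquad (c_1, c_2, c_3) = (0.0282,\ 0.2953,\ 3.5753),$$

where $\sigma_c$ is the standard deviation of chroma, $\mathrm{con}_l$ is the contrast of luminance, $\mu_s$ is the average saturation, and UICM, UISM, and UIConM are the colorfulness, sharpness, and contrast measures, respectively.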
In this paper, we propose an underwater image synthesis algorithm (UISA) and construct a new large-scale underwater image benchmark, the synthetic underwater image dataset (SUID). The main contributions of our work are as follows.
(i) The proposed UISA is based on the underwater IFM, which ensures its physical validity. Moreover, our synthesis strategy uses hierarchical searching and the red channel prior to estimate the underwater background light (BL) and transmission map (TM), respectively, from a real-world underwater image, which ensures its robustness.
(ii) The synthetic underwater images in SUID are generated from outdoor ground-truth images by substituting the estimated BL and TM into the underwater IFM, yielding more natural synthetic results.
(iii) The constructed large-scale SUID contains 900 degraded images with different turbidity types and degradation levels, reproducing four common challenging underwater scenes: greenish, bluish, low-light, and hazy.
(iv) SUID provides a rich basis for evaluating underwater image enhancement and restoration algorithms. With SUID, classical FR metrics become applicable to underwater image quality evaluation, which helps address the inconsistency between subjective evaluations and some NR metrics.
The rest of this paper is organized as follows. Section II briefly reviews the underwater image formation model. Section III describes the proposed UISA for estimating the underwater BL and TM. Section IV introduces the constructed SUID and presents qualitative and quantitative evaluation results for several state-of-the-art algorithms to verify the reliability and applicability of SUID. Finally, Section V concludes the paper.

II. UNDERWATER IMAGE FORMATION MODEL
In this section, we introduce the underwater image formation model (IFM). Understanding the underwater IFM helps us design a robust and effective strategy for producing realistic synthetic images. The propagation of light in water differs from that in the atmosphere: light is absorbed and scattered as it travels through water. Absorption causes a loss of energy as light travels through the medium, depending on the refractive index of the medium, while scattering deflects light from its propagation path. Underwater, the attenuation of light depends on its wavelength. Red light decays fastest because it has the longest wavelength, followed by orange, yellow, and green light. As shown in Fig. 1, red light disappears at a depth of about 3 m, and orange and yellow light disappear at depths of about 5 m and 10 m, respectively. Green and blue light disappear at greater depths; blue light can still travel beyond 30 m. Therefore, images captured underwater usually exhibit green-bluish tones.
According to the Lambert-Beer empirical law [63], the intensity of light decays exponentially in water. The irradiance $E$ after light travels from position 0 to position $d$ can be modeled as

$$E(d) = E(0)\,e^{-c\,d}, \qquad c = a + b, \tag{1}$$

where $c$ is the total attenuation coefficient, and $a$ and $b$ are the absorption and scattering coefficients, respectively. In general, typical total attenuation coefficients for bay water, coastal water, and deep ocean water are 0.33 m⁻¹, 0.2 m⁻¹, and 0.05 m⁻¹, respectively.
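To make the exponential decay concrete, the following minimal sketch evaluates the Lambert-Beer law for the three typical attenuation coefficients quoted above. The 10 m path length is an arbitrary illustrative choice, and the function name is hypothetical.

```python
import math

# Typical total attenuation coefficients c = a + b (in 1/m) quoted in the text.
ATTENUATION_PER_M = {"bay": 0.33, "coastal": 0.2, "deep ocean": 0.05}


def remaining_irradiance(e0: float, c: float, d: float) -> float:
    """Lambert-Beer decay E(d) = E(0) * exp(-c * d), with d in metres."""
    if c < 0 or d < 0:
        raise ValueError("attenuation coefficient and distance must be non-negative")
    return e0 * math.exp(-c * d)


if __name__ == "__main__":
    for water, c in ATTENUATION_PER_M.items():
        fraction = remaining_irradiance(1.0, c, 10.0)
        print(f"{water:>10}: {fraction:.3f} of E(0) remains after 10 m")
```

Over 10 m, about 3.7%, 13.5%, and 60.7% of the initial irradiance survives in bay, coastal, and deep ocean water, respectively, which illustrates why turbid coastal scenes lose visibility so quickly.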
The underwater image formation model was originally presented by McGlamery [22] in 1980 and extended by Jaffe [23] in 1990. In the Jaffe-McGlamery model, the underwater optical imaging process is represented as the linear superposition of three components: the direct transmission component, the forward scattering component, and the background scattering component, as illustrated in Fig. 1. Mathematically,

$$E_T = E_d + E_f + E_b, \tag{2}$$

where $E_T$ is the total irradiance entering the camera, $E_d$ is the light reflected by the object that reaches the camera directly, $E_f$ is the light reflected by the object that is scattered at small angles (forward scattering) and thus deviates from its direct path to the camera, and $E_b$ is the ambient light scattered into the camera by plankton and suspended particles. For an underwater image $I_\lambda(x)$, $E_d$ can be defined as

$$E_d = J_\lambda(x)\,t_\lambda(x), \qquad t_\lambda(x) = e^{-c_\lambda d(x)}, \tag{3}$$

where $J_\lambda(x)$ is the undegraded image, $t_\lambda(x)$ is the transmittance, $c_\lambda$ is the total attenuation coefficient of channel $\lambda$, $d(x)$ is the distance between the scene point and the camera, $\lambda \in \{R, G, B\}$ denotes the color channel, and $x$ is the pixel coordinate.
Further, $E_b$ can be defined as

$$E_b = B_\lambda\,\bigl(1 - t_\lambda(x)\bigr), \tag{4}$$

where $B_\lambda$ denotes the background light. In general, forward scattering can be ignored because the distance between the underwater scene and the camera is relatively short. Following previous research [27]-[29], [61], [62], by considering only the direct transmission and background scattering components, the simplified underwater IFM can be expressed as

$$I_\lambda(x) = J_\lambda(x)\,t_\lambda(x) + B_\lambda\,\bigl(1 - t_\lambda(x)\bigr). \tag{5}$$
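The simplified IFM is a per-channel blend between the clear scene and the background light, weighted by the transmittance. A minimal stdlib-only sketch of that blend follows; the nested-list image layout (a list of R, G, B channels, each a 2-D list of intensities in [0, 1]) and the function name are assumptions made for illustration.

```python
from typing import List, Sequence

Channel = List[List[float]]  # one color channel, intensities in [0, 1]


def synthesize_underwater(
    clear: Sequence[Channel],         # J: ground-truth image as [R, G, B] channels
    transmission: Sequence[Channel],  # t: one transmission map per channel
    background: Sequence[float],      # B: background light per channel
) -> List[Channel]:
    """Apply the simplified IFM I = J * t + B * (1 - t) channel by channel."""
    if not len(clear) == len(transmission) == len(background):
        raise ValueError("J, t and B must have the same number of channels")
    result: List[Channel] = []
    for j_ch, t_ch, b in zip(clear, transmission, background):
        result.append([
            [j * t + b * (1.0 - t) for j, t in zip(j_row, t_row)]
            for j_row, t_row in zip(j_ch, t_ch)
        ])
    return result


# A single white pixel seen through water that attenuates red most strongly.
white = [[[1.0]], [[1.0]], [[1.0]]]
t_map = [[[0.1]], [[0.6]], [[0.7]]]
tinted = synthesize_underwater(white, t_map, [0.1, 0.6, 0.7])
# tinted is approximately [[[0.19]], [[0.84]], [[0.91]]]: the white pixel turns blue-green.
```

With $t = 1$ the output equals the clear image, and with $t = 0$ it collapses to the background light, which are the two limits the model is meant to interpolate between.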

III. PROPOSED UNDERWATER IMAGE SYNTHESIS ALGORITHM (UISA)
The proposed UISA consists of two main parts. First, we estimate the underwater background light (BL) and transmission map (TM) from a real-world degraded underwater image. Then, we use these estimates to generate a synthetic underwater image from a natural ground-truth image based on the underwater IFM. Fig. 2 illustrates the flowchart of synthetic underwater image generation with the proposed UISA.
From (5), accurate estimates of BL and TM are key to synthesizing a realistic underwater image. The following subsections describe how we estimate the BL and TM from a real-world underwater image.

A. UNDERWATER BACKGROUND LIGHT ESTIMATION
The global background light $B$ is often determined by simply picking the brightest pixel in the image. However, this selection is ill-suited to underwater scenes, because the brightest pixel may belong to an artificial light source or a bright spot on a living creature. To improve robustness, we employ a hierarchical searching method based on quad-tree subdivision [64] to estimate the underwater BL. An example of underwater BL estimation is illustrated in Fig. 3.
First, we divide the real-world underwater image into four equal rectangular sub-regions and assign each a score, defined as the mean pixel value of the region minus the standard deviation of its pixel values. The sub-region with the highest score is selected as the candidate region:

$$R_c = \operatorname*{arg\,max}_{R \in \{UL,\,UR,\,LL,\,LR\}} \bigl(\mu(R) - \sigma(R)\bigr), \tag{6}$$

where $\mu(R)$ and $\sigma(R)$ are the mean and standard deviation of the pixel values in region $R$, and UL, UR, LL, and LR denote the upper-left, upper-right, lower-left, and lower-right sub-regions of the image, respectively.
We repeat this operation on the candidate region until its size falls below a pre-specified threshold, namely until the candidate region contains fewer than 1% of the pixels of the original image. As shown in Fig. 3(b), the red block is the selected candidate region. Finally, we sort the pixels in the candidate region in descending order and take the pixel located at the 1/4 position of the sorted list as the estimated background light.
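The quad-tree search can be sketched as follows. This is a minimal illustration, not the authors' code: it operates on a single 2-D channel of intensities (which color representation the search runs on is not stated, so that is an assumption), uses the population standard deviation, and breaks score ties in UL, UR, LL, LR order.

```python
import statistics
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end), ends exclusive


def region_values(img: List[List[float]], region: Region) -> List[float]:
    r0, r1, c0, c1 = region
    return [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]


def region_score(img: List[List[float]], region: Region) -> float:
    """Score = mean minus (population) standard deviation of the region."""
    vals = region_values(img, region)
    return statistics.fmean(vals) - statistics.pstdev(vals)


def estimate_background_light(img: List[List[float]], min_fraction: float = 0.01) -> float:
    """Quad-tree hierarchical search for the background light of one channel."""
    h, w = len(img), len(img[0])
    region: Region = (0, h, 0, w)
    while True:
        r0, r1, c0, c1 = region
        area = (r1 - r0) * (c1 - c0)
        # Stop once the candidate holds < min_fraction of all pixels or cannot be split.
        if area < min_fraction * h * w or r1 - r0 < 2 or c1 - c0 < 2:
            break
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        quadrants = [(r0, rm, c0, cm), (r0, rm, cm, c1),   # UL, UR
                     (rm, r1, c0, cm), (rm, r1, cm, c1)]   # LL, LR
        region = max(quadrants, key=lambda q: region_score(img, q))
    ranked = sorted(region_values(img, region), reverse=True)
    return ranked[len(ranked) // 4]  # value at the 1/4 position of the descending list
```

Subtracting the standard deviation penalizes regions that are bright only because of an isolated highlight, so a single saturated pixel in an otherwise dark quadrant does not attract the search the way it would with a plain brightest-pixel rule.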

B. MEDIUM TRANSMISSION MAP ESTIMATION
The most popular and effective method for estimating the transmission map is the dark channel prior (DCP) proposed by He et al. [26]. However, light of different wavelengths is attenuated at different rates underwater. In particular, the red channel usually suffers the most aggressive attenuation and loses its intensity rapidly, so it typically has the lowest intensity. In this situation, the traditional DCP method is not suitable for underwater scenes. To tackle this issue, we employ the red channel prior (RCP) method [28], which estimates the transmission map starting from the red channel. Derived from DCP, the red channel prior can be stated as

$$J^{RED}(x) = \min\Bigl(\min_{y \in \Omega(x)} \bigl(1 - J_R(y)\bigr),\; \min_{y \in \Omega(x)} J_G(y),\; \min_{y \in \Omega(x)} J_B(y)\Bigr) \approx 0, \tag{7}$$

where $J^{RED}$ is the red channel of image $J$ as defined by the prior, $J_R$, $J_G$, and $J_B$ are the color channels of $J$, and $\Omega(x)$ is a local patch centered at location $x$.
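The red channel of the prior is a patch-wise minimum over the inverted red channel and the green and blue channels. A minimal sketch, assuming intensities in [0, 1] and a square patch of hypothetical radius 1 (the text does not specify the patch size); the naive min filter is fine for illustration but slow for full-size images:

```python
from typing import List

Channel = List[List[float]]  # intensities in [0, 1]


def patch_min(ch: Channel, y: int, x: int, radius: int) -> float:
    """Minimum of ch over the (2*radius+1)^2 patch centred at (y, x), clipped at borders."""
    h, w = len(ch), len(ch[0])
    return min(
        ch[yy][xx]
        for yy in range(max(0, y - radius), min(h, y + radius + 1))
        for xx in range(max(0, x - radius), min(w, x + radius + 1))
    )


def red_channel(r: Channel, g: Channel, b: Channel, radius: int = 1) -> Channel:
    """J^RED(x) = min(min_patch(1 - R), min_patch(G), min_patch(B))."""
    inv_r = [[1.0 - v for v in row] for row in r]
    h, w = len(r), len(r[0])
    return [
        [min(patch_min(inv_r, y, x, radius),
             patch_min(g, y, x, radius),
             patch_min(b, y, x, radius)) for x in range(w)]
        for y in range(h)
    ]
```

For a clear, strongly red object the map is close to zero, as the prior expects; for a uniformly gray, hazy patch it stays well above zero, which is what the transmission estimate exploits.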
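Given the estimated BL, the TM of the red channel can be expressed as

$$t_R(x) = 1 - \min\Bigl(\min_{y \in \Omega(x)} \frac{1 - I_R(y)}{1 - B_R},\; \min_{y \in \Omega(x)} \frac{I_G(y)}{B_G},\; \min_{y \in \Omega(x)} \frac{I_B(y)}{B_B}\Bigr). \tag{8}$$

Owing to the optical properties of underwater scenes, the propagation distance of red light differs from that of blue and green light. Therefore, the medium transmittances of the green and blue channels are computed differently from that of the red channel. The scattering coefficient of light underwater is linearly related to the wavelength of light [65]:

$$b_\lambda = \bigl(m\,\beta_\lambda + i\bigr)\,\tau, \tag{9}$$

where $b_\lambda$ is the scattering coefficient, $\beta_\lambda$ is the wavelength of channel $\lambda$, $m$ and $i$ are the empirical slope and intercept of this linear relation, and $\tau$ is a value shared by all color channels. According to [66], the ratios of the total attenuation coefficients of the different color channels are given by

$$\frac{c_G}{c_R} = \frac{B_R\,(m\,\beta_G + i)}{B_G\,(m\,\beta_R + i)}, \qquad \frac{c_B}{c_R} = \frac{B_R\,(m\,\beta_B + i)}{B_B\,(m\,\beta_R + i)}, \tag{10}$$

where $c_G/c_R$ and $c_B/c_R$ are the green-to-red and blue-to-red ratios of the total attenuation coefficient, respectively.

VOLUME 8, 2020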
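The red-channel transmission and the attenuation ratios can be sketched as follows. This assumes the red channel prior transmission estimate of Galdran et al. [28], $t_R = 1 - \min\bigl(\min (1 - I_R)/(1 - B_R),\ \min I_G/B_G,\ \min I_B/B_B\bigr)$ over each patch, and ratios of the form $c_\lambda/c_R = B_R(m\beta_\lambda + i)/(B_\lambda(m\beta_R + i))$. The constants $m = -0.00113$ and $i = 1.62517$ (from Gould et al.) and the nominal wavelengths 620, 540, and 450 nm are assumptions borrowed from related underwater restoration work; the text does not state them. The clamp to [0, 1] is a safeguard added for this sketch.

```python
from typing import List, Sequence, Tuple

Channel = List[List[float]]  # intensities in [0, 1]

# Assumed constants, not given in the text: slope/intercept of the linear
# scattering-wavelength relation and nominal R, G, B wavelengths in nm.
SLOPE, INTERCEPT = -0.00113, 1.62517
WAVELENGTHS_NM = (620.0, 540.0, 450.0)


def patch_min(ch: Channel, y: int, x: int, radius: int) -> float:
    h, w = len(ch), len(ch[0])
    return min(ch[yy][xx]
               for yy in range(max(0, y - radius), min(h, y + radius + 1))
               for xx in range(max(0, x - radius), min(w, x + radius + 1)))


def red_transmission(img: Sequence[Channel], bl: Sequence[float], radius: int = 1) -> Channel:
    """t_R = 1 - min(min(1-I_R)/(1-B_R), min(I_G)/B_G, min(I_B)/B_B) over each patch."""
    (r, g, b), (br, bg, bb) = img, bl
    norm = (
        [[(1.0 - v) / (1.0 - br) for v in row] for row in r],
        [[v / bg for v in row] for row in g],
        [[v / bb for v in row] for row in b],
    )
    h, w = len(r), len(r[0])
    # Clamping to [0, 1] is a numerical safeguard added for this sketch.
    return [[min(1.0, max(0.0, 1.0 - min(patch_min(n, y, x, radius) for n in norm)))
             for x in range(w)] for y in range(h)]


def attenuation_ratios(bl: Sequence[float], wavelengths=WAVELENGTHS_NM) -> Tuple[float, float]:
    """Return (c_G/c_R, c_B/c_R) from the background light and channel wavelengths."""
    (br, bg, bb), (lr, lg, lb) = bl, wavelengths

    def s(lam: float) -> float:
        return SLOPE * lam + INTERCEPT

    return (br * s(lg)) / (bg * s(lr)), (br * s(lb)) / (bb * s(lr))
```

For a greenish-blue background light such as $B = (0.2, 0.6, 0.7)$, both ratios fall below 1, meaning green and blue are attenuated less than red, consistent with the blue-green cast described in Section II.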
Then, according to the Lambert-Beer law in (1) and (3), the underwater TMs of the green and blue channels are

$$t_G(x) = t_R(x)^{\,c_G/c_R}, \qquad t_B(x) = t_R(x)^{\,c_B/c_R}. \tag{11}$$

Finally, after estimating the TM and BL from a real-world underwater image, we obtain a synthetic underwater image from a ground-truth image via (5).
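The power-law relation between the channel transmittances follows directly from the exponential form of the Lambert-Beer law, assuming that all three channels share the same scene distance $d(x)$. A one-line derivation, followed by a worked example with illustrative numbers that are not taken from the paper:

$$t_G(x) = e^{-c_G d(x)} = \left(e^{-c_R d(x)}\right)^{c_G/c_R} = t_R(x)^{\,c_G/c_R}.$$

For example, if $t_R(x) = 0.5$ and $c_G/c_R = 0.4$, then $t_G(x) = 0.5^{0.4} \approx 0.758$: because green light is attenuated less than red light, its transmittance at the same distance is higher.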

IV. DATASET AND EVALUATION
Since undegraded images are not available in underwater scenarios, classical FR metrics cannot be applied to underwater image quality evaluation. To overcome this limitation, we construct a large-scale synthetic underwater image dataset (SUID). This section also discusses its application to the performance evaluation of underwater image enhancement and restoration technologies.

A. SYNTHETIC UNDERWATER IMAGE DATASET (SUID)
Given a real-world underwater image, we obtain the underwater BL and TM using the proposed UISA and then produce a synthetic underwater image based on the underwater IFM in (5). Owing to the scattering and absorption of light, real-world underwater images can be mainly classified into four types of scenes [57]: greenish, bluish, low-light, and hazy. Fig. 4 shows examples of real-world underwater images of these four types. For each ground-truth image, we generate 30 synthetic underwater images using 30 different real-world underwater images, for a total of 900 synthetic underwater images in SUID (i.e., 30 ground-truth images × 30). Because of space limitations, Table 1 lists only some of the parameters estimated from different types of real-world underwater images. In addition, Fig. 5 shows, as examples, 80 synthetic underwater images generated from 10 outdoor ground-truth images using these estimated BL and TM. The synthetic underwater images are evidently close to real-world underwater images in appearance and characteristics. To verify the reliability and applicability of SUID, we compare the histogram distributions of samples from our dataset with those of two other benchmarks: UID-LEIA [60] and UIDD [61]. For a fair comparison, we randomly select seven sets of synthetic images and seven real-world images with different water types or turbidity levels, and display their corresponding histogram distributions. Three levels of water turbidity and four different water types are illustrated in Fig. 6(a) and 6(b), respectively. We further quantify the similarity of the histogram distributions between these synthetic and real-world underwater images using two metrics: Euclidean distance (ED) and Bhattacharyya coefficient (BC).
A lower ED (or a higher BC) indicates that the synthetic result is closer to the real-world image. Among the compared synthetic underwater images, those from SUID are the most similar to the real-world images across the different degradation types, demonstrating the reliability of our dataset. In terms of ED, SUID achieves lower scores than the other two datasets; similarly, in most cases, its BC is the highest. We attribute this advantage to estimating the BL and TM from real-world underwater images rather than assigning them randomly.

FIGURE 5. Synthesized results generated by the proposed UISA from 10 ground-truth images with the parameters given in Table 1. (a) Ground-truth images, (b) synthetic underwater images with bluish scenes, (c) synthetic underwater images with greenish scenes, (d) synthetic underwater images with low light, (e) synthetic underwater images with hazy scenes.
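Both histogram similarity measures are short formulas over normalized histograms. A minimal sketch, assuming 8-bit integer intensities and one 256-bin histogram per channel normalized to sum to 1 (the bin count and per-channel handling used for the reported numbers are not stated):

```python
import math
from typing import List, Sequence


def normalized_histogram(values: Sequence[int], bins: int = 256) -> List[float]:
    """Histogram of integer intensities in [0, bins), normalised to sum to 1."""
    if not values:
        raise ValueError("empty input")
    counts = [0] * bins
    for v in values:
        if not 0 <= v < bins:
            raise ValueError(f"intensity {v} outside [0, {bins})")
        counts[v] += 1
    return [c / len(values) for c in counts]


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """ED between two histograms; 0 means identical, lower is more similar."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    """BC between two normalised histograms; 1 means identical, 0 means no overlap."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))
```

ED penalizes bin-by-bin differences directly, whereas BC measures the overlap of the two distributions, so reporting both gives a distance view and a similarity view of the same comparison.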
In addition, we use the Fourier spectrum to visualize the energy distribution of synthetic underwater images in the frequency domain. Fig. 7 shows four different types of underwater images from SUID as examples, along with their Fourier spectra. Comparing Fig. 7(a) and Fig. 7(c), more dark spots appear in the Fourier spectra of our synthetic underwater images, which indicates that their contrast and edge sharpness are reduced. Comparing Fig. 7(b) and Fig. 7(c), the Fourier spectra of the synthetic samples are visually similar to those of the real-world underwater images. Thus, both the histogram similarity measures and the Fourier spectra support the reliability and realism of the synthetic underwater images generated by the proposed UISA.
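The spectrum visualization amounts to a 2-D discrete Fourier transform, a shift that moves the zero-frequency term to the center, and a logarithmic scaling of the magnitude. In practice one would call `numpy.fft.fft2` and `numpy.fft.fftshift`; the stdlib-only sketch below spells out the same computation for tiny images (it is O((h·w)²)). The log(1 + |F|) display scaling is a common convention and an assumption about how such figures are rendered.

```python
import cmath
import math
from typing import List

Image = List[List[float]]


def dft2(img: Image) -> List[List[complex]]:
    """Naive 2-D discrete Fourier transform, O((h*w)^2): for small illustrations only."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def log_magnitude_spectrum(img: Image) -> Image:
    """log(1 + |F|) with the zero-frequency term shifted to the centre (like fftshift)."""
    spectrum = dft2(img)
    h, w = len(spectrum), len(spectrum[0])
    return [
        [math.log1p(abs(spectrum[(u - h // 2) % h][(v - w // 2) % w])) for v in range(w)]
        for u in range(h)
    ]
```

A perfectly flat image puts all its energy in the central (zero-frequency) value; sharp edges spread energy toward the borders of the shifted spectrum, so blurred, low-contrast images show weaker high-frequency content there.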

1) QUALITATIVE COMPARISON ON SUID
In the following qualitative comparison, we apply the above-mentioned enhancement and restoration methods to 10 synthetic underwater images as examples. Fig. 8(a) and Fig. 8(b) show 10 outdoor ground-truth images and the corresponding synthetic underwater images generated by the proposed UISA, respectively. The enhanced and restored results are presented in Figs. 8(c)-(l).
From Fig. 8(c), we observe that the HE method effectively enhances contrast and brightness; however, it may also amplify noise and lead to over-enhancement in bright regions. Fig. 8(d) shows that the CS algorithm achieves one of the best results among the compared methods. Similarly, the FE method performs well in color correction and contrast enhancement, with results close to those of CS, as shown in Fig. 8(e). Fig. 8(f) and Fig. 8(k) show that the UDCP and Proximal Dehaze-Net methods both succeed in removing haze, but UDCP fails to enhance contrast. Compared with UDCP, Proximal Dehaze-Net preserves more details and suppresses noise. In Fig. 8(g), the RCP method recovers much of the lost visibility but may produce some over-saturated regions due to the color lines prior, especially in the hazy scene. Fig. 8(h) shows that although BCP improves the visibility of the restored images, serious color imbalance appears in some regions (e.g., the white shuttlecock in the second image). As shown in Fig. 8(i), the DCE algorithm effectively increases contrast and brightness and reveals more useful information, but it cannot remove the effects of noise. Fig. 8(j) shows that IBLA has little dehazing effect and may introduce color distortion, especially in bright background regions (e.g., the white flowers in the fourth image). In Fig. 8(l), the images recovered by DehazeNet-HWD become less saturated, but their sharpness and contrast are greatly enhanced and their visibility is improved. These visual observations show that SUID provides a basis for the subjective evaluation of image enhancement and restoration methods.
For a more objective assessment, we employ a single-stimulus (SS) method based on ITU-R BT.910 [70] to rate the images recovered by the ten compared methods. In all sessions, we used a 21.5-inch Lenovo LED monitor with a fixed resolution of 1920 × 1080; its contrast and brightness were set to 90 and 40, respectively. Fifteen observers were seated in front of the monitor at a distance of about 0.5 m. The experiment was conducted in a separate room under natural light. Thirty synthetic images were randomly chosen from SUID, and the resulting 30 × 10 recovered images were divided into 30 sets of 10 and randomly assigned to the observers. The observers were asked to score the 300 images from 0 (low quality) to 10 (high quality) according to their own preferences. Each test image was displayed for 5 s, and the rating time was limited to less than 10 s. The data entered by each observer were collected using the Neurobehavioral Systems software [71]. The subjective mean opinion scores (MOS) of the compared methods are presented in Table 2. No method obtains the best result on all test images; however, Table 2 shows that the FE, CS, DCE, and RCP methods achieve relatively high MOS values, indicating superior perceptual quality.
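The MOS of a method is the average of the scores its recovered images received. A minimal sketch, assuming the raw ratings are stored as hypothetical (method, image, observer, score) tuples on the 0-10 scale; it applies no observer screening or score normalization, since the text does not mention any:

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (method, image_id, observer_id, score) with score on the 0-10 scale used in the test.
Rating = Tuple[str, str, str, float]


def mean_opinion_scores(ratings: Iterable[Rating]) -> Dict[str, float]:
    """Average all scores given to each method's recovered images."""
    scores: Dict[str, List[float]] = defaultdict(list)
    for method, _image, _observer, score in ratings:
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"score {score} outside the 0-10 scale")
        scores[method].append(score)
    return {method: statistics.fmean(vals) for method, vals in scores.items()}


ratings = [
    ("HE", "img1", "obs1", 6), ("HE", "img2", "obs2", 8),
    ("CS", "img1", "obs1", 7), ("CS", "img2", "obs3", 9),
]
mos = mean_opinion_scores(ratings)  # {"HE": 7.0, "CS": 8.0}
```

When every method is rated on the same number of images and observers, as in the balanced 30 × 10 design described above, this pooled mean equals the mean of per-image MOS values.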

2) QUANTITATIVE COMPARISON ON SUID
SUID enables us to objectively evaluate underwater image enhancement and restoration algorithms with FR metrics. To validate the reliability and applicability of SUID, we further quantify the performance of the compared methods using both no-reference (NR) and full-reference (FR) metrics.
First, we employ five NR metrics: the blind/referenceless image spatial quality evaluator (BRISQUE) [48], the naturalness image quality evaluator (NIQE) [49], the multi-task end-to-end optimized deep neural network (MEON) [53], the underwater color image quality evaluation (UCIQE) [54], and the underwater image quality measure (UIQM) [55]. We evaluate the ten compared methods on 30 synthetic underwater images from SUID. For each NR metric, Fig. 9 plots the scores of 8 sample images with different degradation types, and Table 3 gives the average values of the five NR metrics for each method. Fig. 9 and Table 3 show that the per-image results are consistent with the average results. Higher MEON, UCIQE, and UIQM scores indicate better color rendition, contrast enhancement, and visibility improvement, whereas lower BRISQUE and NIQE values indicate better results. As Table 3 shows, the FE and CS methods achieve better NIQE, UCIQE, and UIQM results, while the UDCP and Proximal Dehaze-Net methods produce the worst results, in accordance with the visual results in Fig. 8.
In addition, following common practice, we use the peak signal-to-noise ratio (PSNR), noise quality measure (NQM) [72], universal quality index (UQI) [73], structural similarity index (SSIM) [74], visual information fidelity (VIF) [75], and information fidelity criterion (IFC) [76] for FR assessment. PSNR and NQM assess the ability to suppress noise; higher PSNR and NQM values indicate less noise in the images. UQI combines three factors: loss of correlation, luminance distortion, and contrast distortion. SSIM is normally employed to measure how well luminance, contrast, and structure are recovered. VIF combines a natural image statistical model, an image distortion model, and a human visual system model. Larger UQI, SSIM, and VIF values indicate better results. Similar to Fig. 9, Fig. 10 shows sample FR evaluation results for individual images, and Table 5 gives the average values over the 30 selected images for each method.
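Two of these FR metrics are easy to state in code. The sketch below computes PSNR and a single-window ("global") version of SSIM on flattened grayscale intensities with peak value 255. The global SSIM is a simplification: the standard SSIM [74] averages the same expression over local, typically Gaussian-weighted, windows, so its values will differ from those reported in the tables. Setting both stabilizing constants to zero in the same formula gives the global UQI.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], y: Sequence[float]) -> None:
    if not x or len(x) != len(y):
        raise ValueError("inputs must be non-empty and the same length")


def psnr(x: Sequence[float], y: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    _check(x, y)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def global_ssim(x: Sequence[float], y: Sequence[float], peak: float = 255.0) -> float:
    """SSIM computed once over the whole image (no sliding window)."""
    _check(x, y)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # usual stabilising constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

PSNR only sees pixel-wise error, while SSIM also rewards preserved structure through the covariance term; an intensity-inverted image therefore gets a negative SSIM even though its PSNR may not look drastically worse.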
obtained from CS, DCE, and HE methods indicate that the noise is also amplified when enhancing edges and details. To better display their performance, we rank the results of each algorithm from 1 (best) to 10 (worst), as shown in Table 4 and Table 6.
Moreover, we compute the Pearson linear correlation coefficient (PLCC) and the Spearman rank-order correlation coefficient (SROCC) between the outputs of the IQA metrics (NR and FR) and the MOS to analyze their statistical relationship. We perform this test on the 300 recovered images produced from 30 synthetic images by the ten enhancement and restoration algorithms. The PLCC and SROCC values are summarized in Table 7 (NR metrics) and Table 8 (FR metrics). As shown in Table 7, the PLCC and SROCC values of NIQE are 0.7214 and 0.6621, respectively, whereas those of BRISQUE are 0.4322 and 0.3243, the lowest among the five NR metrics. NIQE thus presents the highest PLCC and SROCC values, followed by UCIQE and UIQM. This can be attributed to the fact that NIQE is built on a natural scene statistics (NSS) model, while UCIQE and UIQM were specifically designed for underwater image evaluation. Table 8 shows that PSNR has the lowest PLCC and SROCC, 0.3354 and 0.4276, which is consistent with the results in Table 5. SSIM and VIF correlate better with MOS than NQM, UQI, and IFC do. Taking Table 7 and Table 8 together, we conclude that, among all the tested metrics, NIQE and SSIM are the most suitable objective functions for guiding the enhancement and restoration process because of their higher correlations with MOS.
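PLCC measures linear agreement between metric outputs and MOS, and SROCC is the Pearson correlation of their ranks, so it only checks monotonic agreement. A minimal stdlib-only sketch, with tied values receiving their average rank. Many IQA studies fit a nonlinear logistic mapping before computing PLCC; the text does not say whether that was done, so this sketch computes the plain coefficient. For lower-is-better metrics such as NIQE, the raw correlation with MOS is expected to be negative.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank-order correlation coefficient (SROCC)."""
    return pearson(average_ranks(x), average_ranks(y))
```

A metric that is a monotonic but nonlinear function of MOS reaches SROCC = 1 while its PLCC stays below 1, which is why both coefficients are usually reported.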

V. CONCLUSION
In this paper, we construct a large-scale synthetic underwater image dataset containing 900 images with different degradation types and turbidity levels based on the proposed underwater image synthesis algorithm. The underwater BL and TM are first estimated from a real-world underwater image using the proposed UISA; a synthetic underwater image is then generated by substituting the acquired BL and TM into the underwater IFM. Subjective and objective quality assessments in terms of MOS, NR metrics, and FR metrics demonstrate the effectiveness of SUID. Extensive qualitative and quantitative experimental results show that SUID can serve as a benchmark for testing various enhancement and restoration algorithms for underwater vision applications. With SUID, we are able to evaluate state-of-the-art underwater image enhancement and restoration algorithms using both NR and FR metrics. The main contribution of SUID is the availability of clear reference images, which makes FR underwater image quality evaluation possible. In addition, since no single method addresses all problems, the analysis based on SUID can provide guidance for optimizing underwater enhancement and restoration algorithms. Despite its good performance, SUID has some limitations. First, it is constructed on the simplified underwater IFM, which ignores forward scattering, yet blur caused by forward scattering exists in some real underwater images. Second, the influence of nonuniform illumination from artificial light sources is not taken into account. Compared with real-world underwater images captured in extremely degraded scenarios, some features such as light sources, plankton, and light spots are not reproduced in the synthetic underwater images.