Underwater Image High Definition Display Using the Multilayer Perceptron and Color Feature-Based SRCNN

High-definition display technology for underwater images is of great significance for many applications, such as marine animal observation, seabed mining, and marine fishery production. The traditional underwater visual display systems have problems, such as low visibility, poor real-time performance, and low resolution, and cannot meet the needs of real-time high-definition displays in extreme environments. To solve these issues, we propose an underwater image enhancement method and a corresponding image super-resolution algorithm. To improve the quality of underwater images, we modify the Retinex algorithm and combine it with a neural network. The Retinex algorithm is used to defog the underwater image, and then, the image brightness is improved by applying gamma correction. Then, by combining with the dark channel prior and multilayer perceptron, the transmission map is further refined to improve the dynamic range of the image. In the super-resolution process, the current convolutional neural network reconstruction algorithm is only trained on the Y channel, which will lead to problems due to the insufficient acquisition of the color feature. Therefore, an image super-resolution reconstruction algorithm that is based on color features is proposed. The experimental results show that the proposed method improves the reconstruction effect of the image edges and texture details, increases the image clarity, and enhances the image color recovery.


I. INTRODUCTION
In the process of image formation, due to the influence of severe weather and the limitation of the equipment, the details of the images are often lost in the process of image transmission and storage, which reduces the image resolution. In recent years, as the most direct way to obtain information, images are used in important applications such as facial recognition, medical imaging, video monitoring, remote sensing imaging, computer vision and other fields. In the current research field, the methods for image superresolution mainly include interpolation [1]- [5], reconstruction constraint [6], [7] and learning [8]- [10].
For example, among the reconstruction-based methods, Stark and Oskoui [14], Wernick and Chen [15], and Stark and Olsen [16] were the first to propose a convex set projection method to solve the problem of image superresolution. In this method, the intersection of the solution space of the high-resolution image and the constraint sets representing the high-resolution image is obtained iteratively, which narrows the solution space until the high-resolution image is reconstructed. A point is selected from the high-resolution image space as the starting point, and it is projected repeatedly onto the constraint convex sets until a point satisfying all the constraints is reached; this point is taken as the high-resolution image. The convex set projection method provides a simple way to solve the superresolution problem and makes full use of prior knowledge, which can better guarantee the quality of the edges and details in high-resolution images. However, the method depends heavily on the random selection of the initial point (leading to unstable and nonunique solutions), and it has slow convergence, large computational requirements and low convergence stability. Later, Rasti et al. [17] proposed adding bicubic interpolation and bicubic sampling to each iterative back projection (IBP) iteration, which could reduce the mean square error in each iteration and thus improve the image reconstruction. However, these methods only modify the IBP iterations to improve the quality of the reconstructed high-resolution image; they do not replace the traditional interpolation algorithm used to estimate the initial image, so jagged edges remain.
The associate editor coordinating the review of this manuscript and approving it for publication was Huimin Lu.
VOLUME 7, 2019. This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/
In IBP-based superresolution, sawtooth artifacts may appear at edge positions, even though the original image often contains nonlocal redundancies. Kai and Shifei [18] argue that making full use of the similarity of nonlocal information during the iterative projection can reduce the reconstruction error, correct the initial IBP estimate and limit the error accumulation, thereby alleviating the above problems. Irani and Peleg [19] proposed the iterative back-projection method and used it to solve the problem of image superresolution. This method first obtains an initial estimate of the high-resolution image through interpolation. If the initial estimate is equal to the original high-resolution image, then the low-resolution image simulated from the estimate is equal to the observed low-resolution image. Otherwise, the error between the two is back-projected onto the estimate to correct it. The iterative process is terminated when the error is acceptable. This method is simple and intuitive, with a small computational load and fast convergence. However, it is difficult to select the back-projection operator, and the solution is usually not unique. In addition, the errors of each iteration are added uniformly to the reconstructed image, so there are jagged edges in the image. Among the learning-based methods, Fan et al. [10], Jobson et al. [9], and Huang et al. [20] proposed the neighborhood embedding method, which first selects, from the set of low-resolution image blocks, the k nearest neighbors of each input image block and then solves a constrained least squares problem to obtain the weights used to reconstruct the corresponding high-resolution image block. The neighborhood embedding method is simple and direct, and its dependence on the sample set is greatly reduced. However, the number of neighbors is fixed and controlled by the user, which can cause overfitting or underfitting and result in blurring in the final image reconstruction.
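The iterative back-projection loop described above can be illustrated with a minimal one-dimensional sketch. It is not the implementation of [19]: the block-average degradation model, the nearest-neighbor back-projection operator, and all function names are assumptions chosen to keep the example dependency-free.

```python
def downsample(signal, factor):
    """Assumed degradation model: average each block of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal), factor)]


def replicate(signal, factor):
    """Assumed back-projection operator: repeat each sample `factor` times."""
    return [value for value in signal for _ in range(factor)]


def linear_upsample(signal, factor):
    """Initial high-resolution estimate obtained by linear interpolation."""
    n = len(signal)
    out = []
    for j in range(n * factor):
        pos = min(max((j + 0.5) / factor - 0.5, 0.0), n - 1.0)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1.0 - frac) * signal[left] + frac * signal[right])
    return out


def iterative_back_projection(low_res, factor, iterations=20, step=1.0,
                              tolerance=1e-9):
    """Refine an interpolated estimate until it reproduces `low_res`."""
    estimate = linear_upsample(low_res, factor)
    for _ in range(iterations):
        simulated = downsample(estimate, factor)
        error = [obs - sim for obs, sim in zip(low_res, simulated)]
        if max(abs(e) for e in error) < tolerance:
            break  # the error is acceptable, so the iteration stops
        correction = replicate(error, factor)
        estimate = [e + step * c for e, c in zip(estimate, correction)]
    return estimate


low = [1.0, 4.0, 2.0, 8.0]
high = iterative_back_projection(low, factor=2)
```

Because the error is added uniformly to every sample of a block, the result reproduces the observation but keeps blocky transitions, which mirrors the jagged-edge limitation noted in the text.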
Yang et al. [12] proposed a sparse coding-based method (ScSR) to reconstruct high-resolution images. The ScSR method imposes a sparsity constraint on the low-resolution and high-resolution image blocks while jointly training the dictionaries, so that a low-resolution image block and the corresponding high-resolution image block share the same sparse representation. Reconstructing the superresolution image through sparse representation requires only two compact learned dictionaries rather than a huge library of training image blocks, so this method is highly efficient and has good scalability and good noise resistance. However, the regularization effect of the coefficient representation in the reconstruction algorithm is not obvious, the dictionary is not sufficiently complete (which limits the image superresolution to a specific field), and the quality of the edge details in the reconstructed images is not high. Glasner et al. [13] proposed the anchored neighborhood regression (A+) method, which learns a sparse dictionary and performs regression on a fixed number of dictionary atoms for fast superresolution. This method computes the neighborhood of each dictionary atom rather than the neighborhood of each low-resolution image block directly, which reduces the complexity and running time. In the extreme case, instead of searching for the nearest atom, all the atoms use the same mapping matrix, which reduces the computational load even further. In practice, however, this extreme approach fails to adapt to specific input characteristics, resulting in reduced flexibility.
Considering the learning methods based on neural networks, Dong et al., inspired by the sparse coding method in [1], were the first to propose the super-resolution convolutional neural network (SRCNN), which learns an end-to-end mapping from low-resolution (LR) images to high-resolution (HR) images. The traditional SR pipeline is integrated into a convolutional neural network learning model, which greatly simplifies the SR workflow. The image is first enlarged to the target size through interpolation, and then feature extraction, nonlinear mapping and feature reconstruction are carried out by a three-layer convolutional neural network to achieve end-to-end image reconstruction. The SRCNN has a simple structure, easy convergence and low computational complexity, and it greatly improves on the traditional SR algorithms. However, the SRCNN's single-channel training cannot meet the requirements of higher color fidelity and precision. Because of the characteristics of the convolutional neural network, a single-channel input captures only part of the texture information and color features of the image, whereas a multichannel input allows the network to obtain more information and color features during training. To solve the above problems, this paper proposes a convolutional neural network algorithm based on color features. The algorithm modifies the input channels of the SRCNN: the original Y channel input is replaced by RGB multichannel input training, and the output images are then fused to extract more color features and high-frequency information and to achieve better reconstruction of high-resolution images.
The structure of this paper is as follows: Section 1 briefly introduces the research methods, the advantages and disadvantages of image superresolution technology and the algorithm in this paper. Section 2 introduces some methods and techniques used in this paper. Section 3 describes the improved algorithm in detail. Section 4 presents the experiments and an analysis of the improved algorithm. Finally, the conclusion is given.

II. RELATED WORK
To the best of our knowledge, there are no researchers focused on studying underwater image high definition displays. We discuss the background of the related works in the following sections.

A. MULTILAYER PERCEPTRON
The multilayer perceptron [21]- [24] (MLP) is also known as an ANN (artificial neural network). In addition to input and output layers, it can have multiple hidden layers in the middle.
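Each neuron computes a weighted sum of its inputs plus a bias. The obtained result is processed by a predefined activation function, $f$, which can be described as follows:
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
where $x_i$ are the inputs, $w_i$ are the weights and $b$ is the bias. The most common activation function used in a perceptron is the hyperbolic tangent function, tanh, which is expressed as
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
The training of the ANN adjusts the weight and bias values to obtain the desired output according to the input combinations.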
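As an illustration of the perceptron described above, the following minimal sketch propagates an input through fully connected layers with the tanh activation. The layer sizes and parameter values are hypothetical and hand-picked; they are not taken from the paper, and no training is shown.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through tanh."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(total)


def mlp_forward(inputs, layers):
    """Propagate `inputs` through a list of layers.

    Each layer is a list of (weights, bias) pairs, one pair per neuron.
    """
    activations = inputs
    for layer in layers:
        activations = [neuron(activations, w, b) for w, b in layer]
    return activations


# Hypothetical 2-3-1 network (two inputs, three hidden neurons, one output).
hidden = [([0.5, -0.2], 0.1), ([0.3, 0.8], -0.4), ([-0.6, 0.1], 0.0)]
output = [([1.0, -1.0, 0.5], 0.2)]
result = mlp_forward([1.0, 2.0], [hidden, output])
```

Since tanh is bounded, every activation lies strictly between -1 and 1, which is why inputs and targets are usually scaled to a comparable range before training.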

B. THE SRCNN
The combination of image superresolution and convolutional neural networks and the integration of traditional SR methods into a deep learning model can effectively simplify the neural network and reduce the number of parameters for the neural network. The convolutional neural network is applied to the superresolution image. By learning the feature mapping relationship between the input and output, the neural network realizes the image reconstruction process from the low resolution to high resolution. The SRCNN network structure is constructed by a three-layer convolutional neural network, which is composed of image block feature extraction and representation, and also nonlinear mapping and reconstruction of high-resolution images.
In the first layer, feature extraction and representation are applied to the image blocks of the input image, and the image blocks extracted by convolution can be expressed as:
$$F_1(Y) = \max(0, W_1 * Y + B_1)$$
where $Y$ represents the low-resolution image after interpolation amplification to the target size, $W_1$ and $B_1$ represent the convolution kernels and biases, respectively, and $*$ represents the convolution operation. The size of $W_1$ is $n_1 \times c \times f_1 \times f_1$, where $f_1$ is the size of the filter in the first layer, $c$ is the number of channels contained in the input image, and $n_1$ is the number of convolution kernels in the first layer. The convolution kernel, $W_1$, is convolved with the input image, $Y$, and then the bias, $B_1$, is added. Then, the feature map obtained by the convolution is processed by the activation function, ReLU, which takes the maximum of 0 and the convolution result as the final value. This layer outputs $n_1$-dimensional feature maps as the input of the second layer.
The second layer performs nonlinear mapping: the $n_1$-dimensional feature maps output by the first layer are mapped nonlinearly into an $n_2$-dimensional feature space, that is, from the low-resolution space to the high-resolution space, which can be expressed as:
$$F_2(Y) = \max(0, W_2 * F_1(Y) + B_2)$$
where $W_2$ and $B_2$ represent the convolution kernels and biases, respectively. The size of $W_2$ is $n_1 \times n_2 \times f_2 \times f_2$, where $f_2$ is the size of the second layer filter and $n_2$ is the number of convolution kernels in the second layer. This layer outputs $n_2$-dimensional feature maps as the input to the third layer.
For the reconstruction of high-resolution images in the third layer, a convolution is conducted on the high-resolution image blocks output by the second layer to generate images that are close to the original high-resolution images, which can be expressed as:
$$F(Y) = W_3 * F_2(Y) + B_3$$
where $W_3$ and $B_3$ represent the convolution kernels and biases, respectively. The size of $W_3$ is $c \times n_2 \times f_3 \times f_3$, and $f_3$ represents the size of the filter in the third layer.
The whole training process of the SRCNN involves the estimation and optimization of the parameters $\Theta = \{W_1, W_2, W_3, B_1, B_2, B_3\}$. The mean square error between the generated image and the original high-resolution image is minimized to obtain the optimal solution. The mean square error is expressed as:
$$L(\Theta) = \frac{1}{n}\sum_{i=1}^{n} \left\| F(Y_i; \Theta) - X_i \right\|^2$$
where $n$ is the number of training samples, $X_i$ is the $i$th original high-resolution image and $Y_i$ is the corresponding interpolated low-resolution image. The SRCNN uses the standard back-propagation stochastic gradient descent method to minimize the loss function.
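The three layers and the mean square error described above can be sketched in plain Python as follows. This is only an illustrative forward pass under stated assumptions: zero padding keeps the output the same size as the input, the sliding window is not flipped (as in most deep learning libraries), the error is averaged over the pixels of a single image, and the "identity" parameters at the end are hypothetical values used to check the code, not trained weights.

```python
def conv2d(feature_maps, kernels, biases):
    """Zero-padded convolution of C input maps with N filters.

    feature_maps: list of C maps, each an H x W nested list
    kernels:      list of N filters, each a list of C square f x f kernels
    biases:       list of N floats
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    outputs = []
    for kernel, bias in zip(kernels, biases):
        pad = len(kernel[0]) // 2
        out = [[float(bias)] * width for _ in range(height)]
        for channel, k in zip(feature_maps, kernel):
            for y in range(height):
                for x in range(width):
                    for dy, row in enumerate(k):
                        for dx, weight in enumerate(row):
                            yy, xx = y + dy - pad, x + dx - pad
                            if 0 <= yy < height and 0 <= xx < width:
                                out[y][x] += weight * channel[yy][xx]
        outputs.append(out)
    return outputs


def relu(feature_maps):
    """Element-wise max(0, value)."""
    return [[[max(0.0, v) for v in row] for row in fmap]
            for fmap in feature_maps]


def srcnn_forward(image, params):
    """Three-layer forward pass on a single-channel, pre-interpolated image."""
    (w1, b1), (w2, b2), (w3, b3) = params
    f1 = relu(conv2d([image], w1, b1))  # feature extraction and representation
    f2 = relu(conv2d(f1, w2, b2))       # nonlinear mapping
    return conv2d(f2, w3, b3)[0]        # reconstruction, no activation


def mse(prediction, target):
    """Mean square error over the pixels of one image."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(prediction, target)
             for p, t in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


# Hypothetical parameters: one 3 x 3 identity filter, then two 1 x 1 filters.
identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
params = [([[identity]], [0.0]), ([[[[1.0]]]], [0.0]), ([[[[1.0]]]], [0.0])]
image = [[0.1, 0.5], [0.9, 0.3]]
restored = srcnn_forward(image, params)
```

In practice the parameters would be learned by minimizing the loss with stochastic gradient descent in a deep learning framework; the nested loops here only make the layer arithmetic explicit.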

III. THE PROPOSED METHOD
To improve upon the abovementioned method, we propose a multilayer perceptron-based underwater image enhancement method, followed by a color feature-based superresolution method.

A. MULTILAYER PERCEPTRON-BASED ENHANCEMENT
This paper proposes an improved scheme for underwater image enhancement, which is divided into two steps. The first step is image defogging. The second step enhances the image details and improves the dynamic range. First, the Retinex algorithm is used to initially defog the image. The formulas are shown in Eqs. (4) and (5). Second, due to the low contrast of underwater images, the image brightness was adjusted by a gamma correction to make the image more natural.
The preprocessed image is obtained by applying gamma correction to $r(x, y)$, where $r(x, y)$ is the image enhanced with the Retinex algorithm and $r'(x, y)$ is the gamma-corrected map. Finally, a dark channel prior is used, and the contrast stretch technique is applied to improve the dynamic range of the image. Here, the transmission map is refined by the multilayer perceptron, that is, the refined map is $MLP[t(x, y)]$, where $t(x, y)$ is the transmission map of the dark channel of $r'(x, y)$.
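The gamma correction and the dark channel transmission map mentioned above can be sketched as follows. The exact formulas used in the paper are not recoverable from the text, so this sketch assumes the standard power-law gamma correction and the widely used dark channel prior estimate, in which the transmission is one minus a scaled dark channel of the airlight-normalized image. The parameters `gamma`, `omega`, `radius` and `airlight` are hypothetical, and the MLP refinement step is not included.

```python
def gamma_correct(image, gamma):
    """Power-law correction, out = in ** gamma, for values scaled to [0, 1]."""
    return [[[value ** gamma for value in pixel] for pixel in row]
            for row in image]


def dark_channel(image, radius=1):
    """Minimum over the color channels and a (2 * radius + 1)^2 window."""
    height, width = len(image), len(image[0])
    per_pixel_min = [[min(pixel) for pixel in row] for row in image]
    dark = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [per_pixel_min[yy][xx]
                      for yy in range(max(0, y - radius),
                                      min(height, y + radius + 1))
                      for xx in range(max(0, x - radius),
                                      min(width, x + radius + 1))]
            row.append(min(window))
        dark.append(row)
    return dark


def transmission_map(image, airlight, omega=0.95, radius=1):
    """Assumed estimate: t = 1 - omega * dark_channel(image / airlight)."""
    normalized = [[[c / a for c, a in zip(pixel, airlight)] for pixel in row]
                  for row in image]
    return [[1.0 - omega * d for d in row]
            for row in dark_channel(normalized, radius)]


# Tiny 2 x 2 RGB image with values in [0, 1].
image = [[[0.2, 0.5, 0.9], [0.8, 0.7, 0.6]],
         [[0.4, 0.3, 0.5], [1.0, 1.0, 1.0]]]
brightened = gamma_correct(image, gamma=0.5)
t = transmission_map(brightened, airlight=[1.0, 1.0, 1.0])
```

A gamma value below 1 brightens dark regions, which matches the stated goal of raising the brightness of low-contrast underwater images; the resulting map `t` is the quantity that a refinement network would take as input.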

B. COLOR FEATURE-BASED SUPERRESOLUTION
Youm et al. [11] proposed that training that increases the number of input channels can extract more image features, obtain more high-frequency information and reconstruct high-resolution images better. The method in this paper is based on an SRCNN composed of three convolution layers to improve the input image. The method in this paper is divided into three steps. The first step is to divide the low-resolution image into three separate images with three RGB channels.
In the second step, the three images are trained by a convolutional neural network to obtain three output images. In the third step, the three output images are fused to obtain the final high-resolution image reconstruction. The SRCNN flowchart for this paper is shown in Figure 1. The first step is channel processing. The original image is divided into its R, G and B channels to extract different information from each channel. In this segmentation, $i$ represents one of the three channels R, G and B, and $F_i$ represents the corresponding single-channel image segmented from the original image, which, after processing, serves as the input image of the neural network.
The second step is the convolutional neural network training. The images of the three different channels are trained by a convolutional neural network. In the training, $W_i$ is the convolution kernel of each layer in the convolutional neural network, $B_i$ is the bias of each layer in the convolutional neural network, and $F_{i(j-1)}$ is the output of the color channel $i$ after the $(j-1)$th convolution.
The third step is image fusion, where the output images of the second step are fused into the final image. To obtain the low-resolution image, $F_i$, the subimage extracted from the original image is first blurred by Gaussian filtering and then downsampled, and bicubic interpolation (BI) is then used to enlarge the image to the same size as the original image.
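The three steps above (channel splitting, per-channel reconstruction and fusion) can be outlined with the following sketch. It is only a structural illustration: `channel_models` is a hypothetical list of three callables standing in for the trained networks, and the fusion is assumed to be a plain concatenation of the three output planes.

```python
def split_channels(image):
    """Split an H x W image of (R, G, B) pixels into three H x W planes."""
    return [[[pixel[c] for pixel in row] for row in image] for c in range(3)]


def fuse_channels(planes):
    """Concatenate three H x W planes back into an H x W RGB image."""
    red, green, blue = planes
    return [[(r, g, b) for r, g, b in zip(r_row, g_row, b_row)]
            for r_row, g_row, b_row in zip(red, green, blue)]


def reconstruct_rgb(image, channel_models):
    """Apply one model per channel and fuse the three outputs."""
    planes = split_channels(image)
    outputs = [model(plane) for model, plane in zip(channel_models, planes)]
    return fuse_channels(outputs)


def identity_model(plane):
    """Placeholder for a trained per-channel network (returns its input)."""
    return [list(row) for row in plane]


image = [[(1, 2, 3), (4, 5, 6)],
         [(7, 8, 9), (10, 11, 12)]]
fused = reconstruct_rgb(image, [identity_model] * 3)
```

Passing the same callable three times corresponds to sharing one network across the channels, while passing three different callables corresponds to training a separate network per channel; the text does not state which variant is used.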
As mentioned above, more color features and high-frequency information can be obtained by using RGB multichannel images as the input, which is more effective in SR reconstruction. Compared with other algorithms, the SRCNN has obvious advantages in terms of evaluation indexes such as the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), and its operating speed is fast. However, its input image is only a single Y channel image, and the extracted image features are limited, which leads to problems such as unclear edges and textures.

IV. EXPERIMENTAL RESULTS AND DISCUSSION
A. VISUAL ANALYSIS
Superresolution images have a profound importance in the field of marine ecology since studying the ocean is of great significance for disaster prevention, marine resource development and underwater environment monitoring; thus, it is important to research the improvement of underwater images. In obtaining underwater image information, unlike the characteristics of ground images, the underwater optical image is usually affected by low illumination, high turbidity scattering and wavelength absorption, of which scattering and color change are the two main sources of underwater image distortion. Scattering is caused by large suspended particles, such as when turbid water contains a large number of particles. The color change or color distortion corresponds to the different degrees of light attenuation from different wavelengths when they propagate in water, which makes the underwater environment appear bluish. Many scholars have made numerous contributions in this field. For example, Serikawa and Lu [26] and Lu et al. [27], [28] proposed an optical field imaging method to solve the underwater imaging problem under the influence of low illumination, high turbidity scattering and wavelength absorption for underwater optical images and used a deep convolutional neural network for depth estimation to solve the descattering problem of optical field images. In addition, a color correction method based on spectral features is proposed for color restoration. A new underwater model is proposed to compensate for the attenuation error along the propagation path, and a fast joint triangulation filter algorithm is proposed for the study of image defogging and underwater image enhancement. The enhanced images exhibit a reduced noise level, better exposure to dark areas, improved overall contrast, and significantly enhanced details and edges.
At the same time, the cognitive ocean network (CONet) architecture and an important and useful demonstration of applying CONet were proposed and described in detail in terms of the integration of the ocean network and artificial intelligence, and future trends in CONet research were discussed. Therefore, to verify the universality of the method in this paper, some underwater images are selected for the test with an upsampling factor of 3.
In the first stage, to test the superiority of the improved method, it is compared with the methods in [24] and [25]. As shown in Figure 3, the method in [24] does not realize image defogging, and there is a large amount of haze in the image. The method in [25] has a good effect in defogging, but the details around the characters are blurred and the details are lost.
As shown in Figure 3, the contrast of the original image is lower than that of the other images. The method in [24] improved the image brightness, but the details are fuzzy. For example, the fish cannot be seen in the darker area in Figure 3, and the result of [24] in Figure 3 is fuzzier than that of the method in this paper. From the perspective of images 4 and 5, the defogging effect of [24] is relatively uniform, and the object cannot be highlighted. The image enhanced in [25] has a certain color difference, and some areas are lighter in color and almost the same as the surrounding color, which leads to the loss of details. From the position of the red box in the picture, the face of the Buddha becomes gradually clear, and the color has been restored. Figure 4 shows the visual test results of the underwater image. As shown in Figure 4, compared with the bilinear interpolation (BI) technique, the image reconstructed by our method is clearer and contains more color features and high-frequency information. This is because our method learns more about the high-frequency components from the input of multiple different channels than from the input of a single channel.

B. IMAGE QUALITY EVALUATION
To measure the performance of the different methods, the underwater image quality measure, UIQM, in the literature [29] has been adopted. The method comprises three attribute measures, namely, the underwater image colorfulness measure (UICM), the underwater image sharpness measure (UISM), and the underwater image contrast measure (UIConM).
The UICM is computed from the two opponent color components, $RG = R - G$ and $YB = \frac{R + G}{2} - B$, using the asymmetric alpha-trimmed mean, $\mu_{\alpha,RG}$. The first-order statistic mean value, $\mu$, represents the chrominance intensity. A mean value that is closer to zero in the RG − YB opponent color component implies a better white balance. The second-order statistic variance,
$$\sigma^2_{\alpha,RG} = \frac{1}{N}\sum_{p=1}^{N}\left(Intensity_{RG,p} - \mu_{\alpha,RG}\right)^2,$$
demonstrates the pixel activity within each color component.
For the UISM, the image is divided into $k_1 k_2$ blocks, $\frac{I_{\max,k,l}}{I_{\min,k,l}}$ indicates the relative contrast ratio within each block, and the EME measures in each RGB color component are combined linearly with coefficients $\lambda_c$, where $\lambda_R = 0.299$, $\lambda_G = 0.587$ and $\lambda_B = 0.114$ are used according to the relative visual responses of the red, green, and blue channels.
The logAMEE is
$$\log AMEE = \frac{1}{k_1 k_2} \otimes \sum_{l=1}^{k_1}\sum_{k=1}^{k_2} \frac{I_{\max,k,l}\,\Theta\,I_{\min,k,l}}{I_{\max,k,l} \oplus I_{\min,k,l}} \times \log\left(\frac{I_{\max,k,l}\,\Theta\,I_{\min,k,l}}{I_{\max,k,l} \oplus I_{\min,k,l}}\right) \quad (16)$$
where an image is divided into $k_1 k_2$ blocks, and $\oplus$, $\otimes$ and $\Theta$ are the PLIP operations. The logAMEE introduces the entropy-like operation to the traditional Agaian measure of enhancement by entropy (AMEE), which is formulated as the average Michelson contrast in image local regions [29]. The overall underwater image quality measure is then given by
$$UIQM = c_1 \times UICM + c_2 \times UISM + c_3 \times UIConM$$
where $c_1$, $c_2$ and $c_3$ are application-dependent parameters.
As seen from Table 1, the improved algorithm proposed in this paper is better than those of Salazar-Colores [24] and Zhang et al. [25] in the measurement of image color, sharpness and contrast. However, for some images, the results were not as good as those of the two comparison methods. For example, the sharpness of Figure 3(b) is not as good as that of Salazar's method, and the contrast of Figure 3(c) is lower than that of Zhang's method. Our improved method has advantages in restoring the color loss. The color restoration effect of all the images is better than that of the other two methods. The color restoration is more natural and conforms to human vision.
Table 2 shows the data test results of the underwater images in terms of PSNR and SSIM. From the results of Table 2, we can see that the method in this paper has better performance than the BI and SRCNN approaches in terms of PSNR and SSIM, and the details and edges of the reconstructed high-resolution images are better.
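For reference, the PSNR used in Table 2 and the linear combination that defines the UIQM can be computed as in the following sketch. The PSNR follows its standard definition, 10 log10(peak^2 / MSE); the peak value of 255 assumes 8-bit images, and the coefficient values passed to `uiqm` in the example are hypothetical placeholders, since the text only states that they are application dependent.

```python
import math


def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared = [(a - b) ** 2
               for ref_row, rec_row in zip(reference, reconstructed)
               for a, b in zip(ref_row, rec_row)]
    mean_squared_error = sum(squared) / len(squared)
    if mean_squared_error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mean_squared_error)


def uiqm(uicm, uism, uiconm, c1, c2, c3):
    """Linear combination of the three attribute measures."""
    return c1 * uicm + c2 * uism + c3 * uiconm


reference = [[10, 20], [30, 40]]
reconstructed = [[11, 21], [31, 41]]
quality_db = psnr(reference, reconstructed)
score = uiqm(1.0, 2.0, 3.0, c1=0.5, c2=0.25, c3=1.0)  # placeholder weights
```

A higher PSNR means a smaller pixel-wise error with respect to the reference image, whereas the UIQM needs no reference image, which is why it is used for the enhancement stage where no ground truth exists.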

V. CONCLUSION
This paper presented an improved image enhancement method and an image superresolution method. The proposed enhancement method employed a combination of the Retinex algorithm and a neural network to enhance the details of the image and restore the image color. According to the underwater image quality measure (UIQM), the improved algorithm enhances the local details and contrast of the underwater image and restores the image color. We also presented an improved color feature-based image superresolution algorithm. Referring to the network structure of the SRCNN, a convolutional neural network model that is trained on the R, G and B channels of an image is adopted, and the output images are fused to obtain clear textures and edges while maintaining the PSNR. Although the algorithm proposed in this paper has a good reconstruction effect on most images, its advantages are not obvious for images with indistinct edges and irregular textures. In addition, only a simple concatenation (cat) operation was used for the three-channel image fusion. The next step is to try to classify and reconstruct the edges and textures to generate images with clear edges and rich textures and to further study image fusion.
SEIICHI SERIKAWA received the B.S. and M.S. degrees in electronic engineering from Kumamoto University, in 1984 and 1986, respectively, and the Ph.D. degree in electronic engineering from the Kyushu Institute of Technology, in 1994, where he is currently the Vice President. He is also a Professor with the Center for Socio-Robotic Synthesis and the Department of Electrical and Electronic Engineering. His current research interests include computer vision, sensors, and robotics.