Underwater Image Enhancement Method Based on Feature Fusion Neural Network

Aiming at the problems of uneven illumination caused by supplementary lighting in deep-sea and night waters, and of image noise, low contrast, and color deviation caused by suspended particles in water, a new underwater image enhancement method for non-uniform illumination is proposed. First, a heterogeneous feature fusion module is designed to fuse features of different levels and scales, improving the network's overall perception of detail and semantic information. Second, a new feature attention mechanism is designed that improves the traditional channel attention mechanism; the improved channel attention and pixel attention are added to the heterogeneous feature fusion process to strengthen the network's ability to extract pixel features under different turbidity. Then, a dynamic feature enhancement module is designed to adaptively expand the receptive field, improving the network's adaptability to distorted scenes and its transferability, and strengthening its learning of regions of interest. In addition, a multi-scale feature extraction module extracts features of different levels at the beginning of the network, and the output is obtained through convolution layers with skip connections and the attention modules. Finally, a color loss function is designed, and the absolute error loss and structural similarity loss are jointly minimized to correct color deviation while preserving image texture. Experimental results on several data sets show that this method performs well on both synthetic and real underwater images and restores image color and texture details better than existing methods. The results conform to the characteristics of human vision, and the visual effect surpasses existing underwater image enhancement algorithms.


I. INTRODUCTION
The complex underwater environment leads to color deviation, haze effects, and low contrast in images, which seriously hinder a robot's visual perception of the underwater scene [1]. Therefore, it is necessary to study underwater image enhancement algorithms to improve the performance of underwater robots performing advanced vision tasks. (The associate editor coordinating the review of this manuscript and approving it for publication was Tao Zhou.) Underwater image enhancement can be roughly divided into traditional methods and deep-learning-based methods. Traditional methods can in turn be divided into non-physical-model-based and physical-model-based image enhancement methods. Non-physical-model methods, such as histogram equalization [2], the gray-world hypothesis [3], the wavelet transform [4], and automatic white balance [5], improve the quality of underwater images by directly adjusting pixel values. However, lacking a physical model, they easily cause problems such as large color deviations and image artifacts. Physical-model-based methods include restoration methods based on the Jaffe-McGlamery image defogging model [6], [7], such as the dark channel prior [8], [9] and the blurriness prior [10], [12]. These methods depend on the assumption of an a priori physical underwater image formation model, their feature computation is very complex, and the a priori assumptions are idealized and have serious limitations. They cannot effectively correct the color of underwater images or remove artificial light spots.
The color deviation and fuzzy boundaries in marine biological images cause existing image detection algorithms to perform poorly on them, and researchers have proposed a variety of improved algorithms. Ping et al. [13] added a trained underwater image enhancement GAN generator to the YOLOv3 detection network as its enhancement network, making the enhancement stage more conducive to improving marine biometric recognition accuracy. Kun et al. [14] applied an automatic color equalization algorithm to images already enhanced by multi-scale retinex with color restoration to optimize color and brightness, and at the same time replaced the residual module in the YOLOv3 feature extraction network with dense blocks to strengthen feature propagation, but the resulting detection accuracy is not very high. Song et al. [14] proposed a method for detecting underwater targets that achieves high-precision detection on small-sample data sets by combining an image enhancement algorithm with Mask R-CNN, but its detection speed is low, its practicability is poor, and it is unsuitable for long-distance underwater target detection. Wang et al. [15] proposed an improved adaptive image enhancement algorithm to clean up collected degraded underwater images and detect marine organisms using YOLOv3, but the detection accuracy for sea cucumbers and scallops is not high. Existing algorithms still cannot achieve high detection accuracy on marine biological images, and there is room for improvement.
Common image displacement detection methods include gray-scale template matching algorithms, such as Normalized Cross Correlation [16] and the sum of absolute differences (SAD), but these can detect only whole-pixel displacement; and matching algorithms based on feature points, such as speeded-up robust features (SURF) and the scale-invariant feature transform (SIFT) [17]. The above methods can be classified as spatial-domain detection methods. They not only have low detection accuracy, but the gray information in the spatial domain is also vulnerable to noise, which affects matching accuracy. Methods based on phase information in the frequency domain include the phase correlation algorithm (PCA) [18] and the phase difference algorithm [19]. Frequency-domain information has better anti-interference ability than spatial-domain information. Non-physical underwater image enhancement algorithms do not consider the physical imaging model and achieve enhancement by applying mathematical models in the spatial or frequency domain. He et al. [20] calculated the atmospheric light component A and the transmittance T through a prior principle and obtained the target values to defog the image. Underwater images resemble foggy images, so this model is also often used for underwater image enhancement, but it performs poorly on images with low exposure: after processing, the overall color of the image is dark and the background is blurred. Ancuti et al. [21] processed images with white balancing and sharpening, and adopted a multi-scale fusion strategy to fuse the white-balanced and sharpened images, improving global contrast and edge definition; however, it has limitations in deep scenes and fails to eliminate the blue-green background. Peng et al.
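The frequency-domain phase correlation idea mentioned above can be illustrated with a minimal 1-D sketch. The function name, signal length, and shift below are illustrative assumptions, not from the source; sub-pixel refinement (as this paper's algorithm requires) is omitted, so only the integer shift is recovered:

```python
import numpy as np

def phase_correlation_shift(s1, s2):
    """Estimate the integer shift between two 1-D signals via phase correlation:
    the normalized cross-power spectrum has a delta peak at the shift."""
    F1, F2 = np.fft.fft(s1), np.fft.fft(s2)
    cross = F1 * np.conj(F2)
    cross /= np.abs(cross) + 1e-12          # normalize to keep only phase
    corr = np.fft.ifft(cross).real          # peak location encodes the shift
    shift = int(np.argmax(corr))
    n = len(s1)
    return shift if shift <= n // 2 else shift - n  # wrap to a signed shift

rng = np.random.default_rng(0)
a = rng.random(128)
b = np.roll(a, 5)                            # b is a circularly shifted by +5
print(phase_correlation_shift(b, a))
```

Because only the phase of the cross-power spectrum is kept, the peak is sharp even when the two frames differ in overall brightness, which is the anti-interference property the text attributes to frequency-domain methods.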
[22] used underwater scene depth estimation based on image blur and light absorption in an image formation model to restore and enhance underwater images. However, this method considers only the scene depth and the background light and ignores the noise in the water, which aggravates image noise during restoration. Song et al. [23] proposed a manually annotated background-light database and processed underwater images according to the relationship between the background light and the image's histogram distribution; however, the background light is over-compensated, so the processed underwater image is over-exposed, the background is blurred, and image details are covered by the white fog caused by the over-compensated background light. The above algorithms address only individual cases of the underwater environment and have limitations in practical application. The evolving fusion-based visibility restoration model has outstanding advantages in overall enhancement effect, but it falls short of expectations in the details of individual blurred underwater images [24].
To sum up, the deblurring effect of traditional network models has improved, but their training efficiency is low, as shown in Table 1. When restoring blurred low-resolution images, artifacts and untrue image details still appear, seriously degrading the quality of the restored image and failing to meet subsequent detection requirements. To address the uneven illumination, blur, and color distortion of underwater images, an underwater enhancement algorithm based on a guided filter and an adaptive operator is proposed. Firstly, according to the rapid attenuation of visible light, and red light in particular, in the underwater environment, an adaptive color correction operator is designed for the three RGB channels to restore the real color of the underwater image, increase its color contrast, and improve its visual quality. Then the image is put into a model combining guided filtering with the Retinex model; the edge-preserving smoothing property of guided filtering is used to smooth the underwater image while retaining edge information, enhancing its definition and detail. Finally, image weights are calculated and multi-scale fusion is carried out according to these weights to avoid the artifacts of plain image fusion. Experiments show that, compared with existing algorithms, this algorithm better restores the color the image would have in air, performs better numerically, preserves more image detail, improves the low exposure of dark areas, and enhances the overall contrast. In addition, the GAN-based image enhancement algorithms above directly learn the degradation mapping between distorted and clear images.

A. DYNAMIC HETEROGENEOUS FEATURE FUSION NETWORK
In order to solve the problems of uneven enhancement and loss of image details in existing methods, an end-to-end underwater image restoration network with heterogeneous feature fusion and dynamic feature enhancement is designed in this paper. The network structure is shown in Figure 1. Based on an encoder-decoder structure, it integrates a feature attention module, a dynamic feature enhancement module, and a heterogeneous feature fusion module. The improved attention mechanism is added to the encoder to gradually extract different types of features from low level to high level; the dynamic feature enhancement module strengthens feature extraction in the low-resolution space; and the decoder uses up-sampling operations to reconstruct the feature vectors and gradually restore the corresponding clear image. As the network deepens, the shallow features gradually degrade, so the feature fusion module is used to integrate low-level and high-level information.
Such fusion improves information transmission between layers and reduces the loss of detailed features. The encoder is composed of an 8x down-sampling operation and the improved feature attention mechanism: the 8x down-sampling consists of one standard 7 x 7 convolution with stride 1 and three standard convolutions with stride 2, and the decoder uses transposed convolution as the corresponding up-sampling method to gradually restore the image resolution.
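As a sanity check on the 8x down-sampling described above, the output spatial size of the encoder's convolution stack can be traced with the standard formula out = floor((in + 2p - k)/s) + 1. The padding values below are assumptions ("same"-style padding), since the source does not state them:

```python
def conv_out(size, k, s, p):
    """Output spatial size of a standard convolution layer."""
    return (size + 2 * p - k) // s + 1

# Encoder sketch: one 7x7 stride-1 convolution, then three stride-2 convolutions.
size = 256
size = conv_out(size, k=7, s=1, p=3)      # 7x7, stride 1: size unchanged (256)
for _ in range(3):
    size = conv_out(size, k=3, s=2, p=1)  # each stride-2 convolution halves the size
print(size)  # 256 / 8 = 32
```

Three stride-2 layers give the 8x reduction (2^3 = 8), while the initial stride-1 7x7 layer only widens the receptive field.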
Because most encoder-decoder structures treat the whole image indiscriminately during processing, more irrelevant features are extracted when the image contains much information, making it difficult to adapt to the complex and changeable underwater environment. An attention mechanism is therefore needed to improve the accuracy of the model's feature extraction. To this end, this paper improves the channel attention and pixel attention mechanisms and designs a feature attention module to improve the network's adaptability to underwater image regions of different turbidity. Fig. 2 shows the structure of the feature attention module. A local residual structure is designed: the output features of the previous layer are first convolved, then passed through the Leaky ReLU activation function for nonlinearity; the result is added to the input features, and the summed features are convolved again. The obtained features are fed into the improved channel attention and pixel attention mechanisms to extract the more implicit mapping relationships.
Previous channel attention mechanisms usually apply global average pooling to the input feature F to convert channel-related information into a channel descriptor g_c.
Under global average pooling, the shape of the feature map changes from C x H x W to C x 1 x 1. A fully connected layer is then used to obtain the weights of the different channels, reducing model complexity by dimensionality reduction: the channel features are projected into a low-dimensional space and then re-mapped. This destroys the direct correspondence between channels and their weights and degrades the prediction of channel attention. To improve the correlation between channels, this paper replaces the fully connected layer in the channel attention module with a fast one-dimensional convolution with kernel size k, whose learned parameters are shared by all channels.
where σ represents the sigmoid activation function and conv1D represents one-dimensional convolution. The mapping function between the channel dimension C and the kernel size k is designed as shown in formula (3), so that the kernel size k is determined adaptively.
Given the channel dimension C, the kernel size k is expressed as k = ψ(C) = | log2(C)/γ + b/γ |_odd, where γ is set to 2, b is set to 1, and |ξ|_odd denotes the odd number nearest to ξ. Finally, the input feature F is multiplied element-wise by the learned channel weight W_c, and the output channel feature is F_c = W_c ⊗ F, where ⊗ denotes element-wise multiplication. Because a plain encoder-decoder structure processes image content indiscriminately, while the turbidity produced by different underwater environments is non-uniform, this paper designs a pixel-wise attention mechanism as a supplement to the channel attention mechanism, enabling the network to focus on image areas with higher turbidity. Similar to the channel attention mechanism, the feature F_c output by the channel attention mechanism is fed into the pixel attention layer as its input. The pixel attention layer consists of three convolution layers interleaved with two PReLU activation functions and one sigmoid activation function.
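The adaptive kernel-size rule above can be sketched as a small helper. This is a minimal interpretation of k = |log2(C)/γ + b/γ|_odd with the stated γ = 2, b = 1, rounding down to an integer and then up to the nearest odd number, as is common in ECA-style implementations:

```python
import math

def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1-D convolution kernel size k = |log2(C)/gamma + b/gamma|_odd."""
    t = abs(math.log2(channels) / gamma + b / gamma)
    k = int(t)
    return k if k % 2 else k + 1  # force an odd kernel so the window is centered

# Wider layers (more channels) get a larger local cross-channel window.
for c in (64, 128, 256, 512):
    print(c, eca_kernel_size(c))
```

The odd kernel size keeps the 1-D convolution symmetric around each channel, so every channel's weight depends on its k nearest neighbors without any dimensionality reduction.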
where conv denotes 1 x 1 convolution and δ represents the PReLU activation function, with a sigmoid activation after the final convolution. After this series of convolution operations, the feature shape becomes 1 x H x W. Finally, the input feature F_c and the obtained weight W_p are multiplied pixel by pixel to obtain the output feature F_sc = W_p ⊗ F_c. The ideal motion state of the linear motor mover is one-dimensional rigid-body translation, but under actual working conditions longitudinal vibration of the mover is difficult to avoid; the camera fixed on the mover then suffers lens offset, which affects the collected signal sequence. Therefore, the aperiodic fence stripe image shown in Figure 3 is designed. This kind of image has two characteristics: 1) the information in the horizontal direction is aperiodic, which avoids information overlap during signal acquisition; 2) the gray consistency in the vertical direction ensures that the signal collected during lens offset is consistent with the signal under ideal conditions. Moreover, computing the displacement of the whole image would cause a large amount of redundant calculation, because the gray values are consistent in the vertical direction. In this paper, a linear-array camera is used to acquire the one-dimensional information in the horizontal direction of the image, improving the calculation speed of the algorithm. The M aperiodic stripes can be constructed pixel by pixel as follows.
where G_i is the gray value of the i-th stripe and w_i is the width of the i-th stripe; x and y denote the horizontal and vertical pixel coordinates, respectively.
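The aperiodic stripe profile described above can be sketched by concatenating constant-gray runs. The gray values G_i and widths w_i below are hypothetical examples, not the paper's actual pattern; the varying widths are what make the pattern aperiodic:

```python
import numpy as np

def build_stripe_signal(grays, widths):
    """Build the 1-D horizontal profile of an aperiodic fence-stripe image.
    Stripe i contributes widths[i] pixels of gray value grays[i]."""
    return np.concatenate([np.full(w, g) for g, w in zip(grays, widths)])

# Hypothetical stripe parameters: alternating black/white with irregular widths.
grays  = [0, 255, 0, 255, 0, 255]
widths = [3, 5, 7, 4, 6, 8]
signal = build_stripe_signal(grays, widths)
print(len(signal))  # 3+5+7+4+6+8 = 33 pixels
```

Because no width repeats in a regular cycle, any window of the profile is distinctive, which is what lets a matching algorithm localize the window without the ambiguity a periodic fence would cause.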

C. MULTISCALE FEATURE EXTRACTION MODULE (MFE) AND LOSS FUNCTION
In this network, a multi-scale feature extraction module is used to obtain features of different levels [25]. Here, multi-scale refers to dilated (hole) convolutions with different sampling rates r = 1, 2, 4, which extract characteristic information over different scale ranges. Because the dilation rates, and hence the receptive fields, differ, the sizes of the captured feature regions differ. In the 1-D case, given a 1-D input x and a filter w of size k, the output of the dilated convolution is y[i] = Σ_{j=0}^{k-1} x[i + r·j] w[j], where r is the dilation rate and k determines the size of the receptive field. When r = 1, the dilated convolution reduces to the standard convolution. As shown in Fig. 3, four different points of the next layer are indicated by different colors. They are obtained from completely unrelated points of the previous layer, which may lead to gridding artifacts. Smoothed dilated convolution therefore adds a separable shared convolution layer before the dilated convolution, which increases the dependence between input units. The features obtained by smoothed dilated convolutions with different sampling rates are weighted by the attention module and then concatenated. After one convolution layer, features of different scales are extracted adaptively, and the obtained features are added to the input and passed to the next stage.
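A minimal 1-D dilated convolution matching the formula above can be written directly. The input and filter are illustrative; only the valid region (no padding) is computed:

```python
import numpy as np

def dilated_conv1d(x, w, r):
    """1-D dilated convolution: y[i] = sum_j x[i + r*j] * w[j], valid region only.
    The effective receptive field is (k - 1) * r + 1 input samples."""
    k = len(w)
    span = (k - 1) * r + 1
    n = len(x) - span + 1
    return np.array([sum(x[i + r * j] * w[j] for j in range(k)) for i in range(n)])

x = np.arange(10.0)
w = np.array([1.0, 1.0, 1.0])
print(dilated_conv1d(x, w, r=1))  # r = 1 reduces to the standard convolution
print(dilated_conv1d(x, w, r=2))  # r = 2 samples every second input position
```

With k = 3, the receptive field grows from 3 samples at r = 1 to 5 at r = 2 and 9 at r = 4, which is exactly the multi-scale coverage the module exploits.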

III. NETWORK IMPROVEMENT

A. RETINEX MODEL
The absorption of light by the water body makes the deep-water world dark, so clear underwater images must be taken with auxiliary lighting. However, the scattering of light by granular substances in the water gives underwater imaging low contrast, blurred details, and fewer observable objects. To address the influence of scattering on underwater images, this paper improves the Gaussian low-pass convolution function in single-scale Retinex [28] with guided filtering. The single-scale Retinex model is selected to avoid the halo phenomenon, but the edge details of underwater images processed by the single-scale Retinex model are incomplete, and guided filtering is introduced to remedy this. Retinex theory holds that the color, brightness, shape, and details of an object perceived by the human eye depend on the incident light and on the reflection of that light from the object's surface. That is, the image I(x, y) received by the human visual system is composed of incident information L(x, y) and reflected information R(x, y): I(x, y) = L(x, y) · R(x, y). At present, the single-scale Retinex algorithm (SCR) mostly adopts Gaussian low-pass convolution [10] to obtain the ideal illumination estimate: L(x, y) = I(x, y) * G(x, y), where L(x, y) is the ideal illumination estimate and * denotes convolution of the original image with the Gaussian function [11]: G(x, y) = K exp(-(x² + y²)/σ²), where σ is the scale of the Gaussian function and the coefficient K should satisfy ∬ G(x, y) dx dy = 1. The ideal reflection image R(x, y) of the target object is then obtained as log R(x, y) = log I(x, y) - log L(x, y). The traditional simple weighted fusion has the advantages of easy implementation, a simple principle, fast running speed, an improved peak signal-to-noise ratio, and reduced noise.
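The single-scale Retinex pipeline above (illumination estimate by Gaussian convolution, then log-domain division) can be sketched in 1-D. The kernel size and σ are assumed values, and a small epsilon guards the logarithms:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Normalized 1-D Gaussian kernel: the coefficient K is chosen so it sums to 1."""
    x = np.arange(size) - size // 2
    g = np.exp(-(x ** 2) / (sigma ** 2))
    return g / g.sum()

def ssr_1d(i, size=15, sigma=3.0):
    """Single-scale Retinex on a 1-D signal: log R = log I - log(I * G)."""
    g = gaussian_kernel(size, sigma)
    l = np.convolve(i, g, mode="same")     # illumination estimate L = I * G
    return np.log(i + 1e-6) - np.log(l + 1e-6)

# A flat dark signal with a bright patch: the log-ratio isolates reflectance.
signal = np.array([10., 10., 200., 200., 10., 10., 10., 10.])
r = ssr_1d(signal, size=5, sigma=1.5)
```

Subtracting the log of the smoothed signal removes the slowly varying illumination component while keeping the local reflectance structure, which is exactly what the guided-filter variant in this paper then improves at edges.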
However, such fusion not only optimizes the image but also reduces its detail information to a certain extent, leaving incomplete edges, low contrast, and artifacts. To solve these problems, this algorithm calculates a Laplacian weight, a photometric weight, and a saliency weight, and introduces pyramid multi-scale fusion [30] to obtain a better image. The saliency weight emphasizes salient objects that lose their saliency in the underwater image. First, the RGB image is converted into a Lab image, the mean value of each of the three channels is calculated, and the sum of the squared deviations from these means is computed:

W_s = (L - L_mean)² + (a - a_mean)² + (b - b_mean)²  (16)

where W_s is the saliency weight, L is the brightness, a and b are the two color channels, L_mean is the average brightness, and a_mean and b_mean are the color means. Fig. 4 shows the saliency weight map after compensation with the color compensation coefficient and the saliency map after improved guided filtering. The saliency weight tends to highlight areas with high brightness, so a photometric weight that reduces the saturation of bright areas is introduced. First, RGB is converted to Lab to obtain the brightness channel L, and then the deviation between each of the three RGB channels and the L channel is calculated: where W_L is the photometric weight, R, G, B are the pixel values of the three RGB channels, and L is the value of the brightness channel.
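The saliency weight of formula (16) can be sketched as follows, assuming the Lab channels are already available as arrays; the square-root (Euclidean) form used here is one common reading of "distance from the channel means," so treat the exact exponent as an assumption:

```python
import numpy as np

def saliency_weight(L, a, b):
    """Per-pixel saliency weight: distance of each pixel's Lab value from the
    image-wide channel means, W_s(x) = sqrt((L-L_mean)^2 + (a-a_mean)^2 + (b-b_mean)^2)."""
    return np.sqrt((L - L.mean()) ** 2 + (a - a.mean()) ** 2 + (b - b.mean()) ** 2)

# A uniform image has no salient region, so every weight is zero.
flat = np.full((4, 4), 50.0)
print(saliency_weight(flat, flat, flat).max())
```

Pixels whose color or brightness departs most from the global average receive the largest weight, which is why the text notes the weight tends to highlight bright areas and must be balanced by the photometric weight.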

B. DETECTION MODULE
In YOLOv4, the total loss is composed of the CIoU loss, the confidence loss, and the classification loss, and BCE loss is used for classification: L_BCE = -ω [m log n + (1 - m) log(1 - n)], where ω is the weight, m is the ground-truth value, and n is the predicted value. In the training set used in this paper, the numbers of samples in the four categories are unbalanced, as shown in Table 2, which degrades the measured precision. Therefore, focal loss is used to replace the BCE loss [31] in YOLOv4. Focal loss uses a modulating function to measure the contribution of hard and easy samples to the total loss: FL(p_t) = -α_t (1 - p_t)^γ log(p_t). The factor α_t = 0.25 adjusts the weight of positive and negative samples, while γ controls the weight of hard and easy samples; here α_t = 0.25 and γ = 2, values taken from the literature [12]. The model structure of YOLOv4 is shown in Fig. 4. This paper extends this idea, improves the red channel compensation strategy, and solves the problem caused by directly applying the gray-world algorithm to greenish underwater images; in the improved compensation strategy, the attenuation of the salient area of the red channel is increased. The final red channel compensation formula uses the improved compensation rate G_com, defined with constants β and γ obtained experimentally as β = 13.15 and γ = 3.79. After using the improved color correction algorithm to preliminarily correct the color deviation of the underwater image, a simple feature extraction network is designed to obtain the feature F1 of the color-corrected image. Its structure is shown in Fig. 5. The structure is composed of two convolution layers, each using a kernel of size 3 and stride 1, followed by a batch normalization (BN) layer and a Leaky ReLU activation layer.
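The focal loss above can be sketched for the binary case, with α and γ defaulting to the paper's stated values; the epsilon inside the logarithm is a numerical-safety assumption:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    p: predicted probabilities of the positive class; y: labels in {0, 1}."""
    p_t = np.where(y == 1, p, 1 - p)          # probability of the true class
    a_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing factor
    return -a_t * (1 - p_t) ** gamma * np.log(p_t + 1e-12)

easy = focal_loss(np.array([0.95]), np.array([1]))  # well-classified sample
hard = focal_loss(np.array([0.05]), np.array([1]))  # badly misclassified sample
print(easy[0], hard[0])
```

With γ = 2, the (1 - p_t)² factor shrinks the easy sample's loss by roughly three orders of magnitude relative to the hard one, which is how focal loss counteracts the class imbalance noted in Table 2.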
The number directly below each blue block represents the number of feature channels. In this paper, a U-Net feature extraction module is used to extract the features of underwater degraded images. A symmetrical codec structure is adopted, and skip connections are added between the corresponding encoder and decoder layers; its structure is shown in Fig. 6. The feature extraction module is mainly composed of a down-sampling unit and an up-sampling unit. The down-sampling unit includes a convolution layer with stride 1 and a convolution layer with stride 2, while the up-sampling unit includes a convolution layer with stride 1 and a deconvolution layer with stride 2. In the feature extraction module, all convolution layers adopt kernels of size 3, and a BN layer and a Leaky ReLU activation layer are added after each convolution layer. The underwater image is mapped into a 32-channel feature F2 through the feature extraction network.
At the end of the generator G, the features of the color-corrected image and the original underwater image are fused by element-wise multiplication, and the fused features are mapped into the enhanced underwater image through a convolution layer. This convolution layer uses a kernel of size 1 and is followed by a sigmoid activation layer.

C. LOSS FUNCTION DESIGN
Underwater images suffer from low brightness, blur, color deviation, and other problems, and previous underwater image enhancement methods may not achieve good restoration using a single loss function. This paper jointly considers minimizing the absolute error loss L_1, the structural similarity loss L_ssim, and the color loss L_color, and designs a loss function suited to underwater images [33]. The L_2 loss is often used in image restoration tasks, but it over-penalizes large errors and reduces restoration quality. To avoid this, this paper minimizes the absolute error loss, as in formula (14):

L_1 = (1/|U|) Σ_{p∈U} |x(p) - y(p)|  (14)

where p is the pixel index and U is the pixel block; x(p) and y(p) are the pixel values of the enhanced image and the reference image, respectively. The L_1 loss compares images pixel by pixel in absolute value, keeping image brightness and color stable. When an autoencoder-like structure is used for image reconstruction, the structure and texture of the original image are easily distorted. Structural similarity [34] measures image similarity by combining the differences in brightness, contrast, and structure between the original image and the label image. In underwater image enhancement and other low-level vision tasks, preserving the consistency of brightness, contrast, and texture structure is essential. The mean is used to estimate brightness, the standard deviation to estimate contrast, and the covariance to estimate structural similarity, as shown in formula (22):

SSIM(x, y) = [(2 u_x u_y + C_1)(2 σ_xy + C_2)] / [(u_x² + u_y² + C_1)(σ_x² + σ_y² + C_2)]  (22)

where u_x and u_y are the means of pixels x and y, σ_xy is the covariance of x and y, and σ_x² and σ_y² are the variances of x and y, respectively; to prevent the denominator from being zero, C_1 = 0.01 and C_2 = 0.02.
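The absolute-error and structural-similarity terms can be sketched with global statistics. Note that practical SSIM implementations use local sliding windows; this single-window version is a simplification for illustration, using the paper's C_1 = 0.01 and C_2 = 0.02:

```python
import numpy as np

def l1_loss(x, y):
    """Mean absolute error over all pixels (the L_1 term of formula (14))."""
    return np.abs(x - y).mean()

def ssim_global(x, y, c1=0.01, c2=0.02):
    """Global (single-window) SSIM from means, variances, and covariance."""
    ux, uy = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - ux) * (y - uy)).mean()
    return ((2 * ux * uy + c1) * (2 * cov + c2)) / ((ux**2 + uy**2 + c1) * (vx + vy + c2))

x = np.linspace(0.0, 1.0, 64)
print(l1_loss(x, x), ssim_global(x, x))  # identical signals: zero error, SSIM = 1
```

Since SSIM equals 1 for identical images, the corresponding loss term is taken as 1 - SSIM, so both terms are zero at a perfect reconstruction.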
The structural similarity loss is defined as L_ssim = 1 - SSIM(x, y). Although the L_1 loss accounts for color error, it measures only the per-pixel value difference between the generated and real images and cannot guarantee that the color vectors have the same direction, so obvious color differences may remain. Therefore, this paper designs a loss function with color angle information: the color is further corrected by computing, pixel by pixel, the angle between the color vectors of the generated image and the real image and summing them [35]. The color correction loss is expressed as

L_color = Σ_p ∠(x(p), y(p))  (24)

where (·)(p) denotes one pixel and ∠(·, ·) is an operator that treats RGB colors as three-dimensional vectors and calculates the angle between them. Formula (24) sums the angles between the color vectors of each pixel pair in the generated image and the reference image. In summary, the total loss function L designed in this paper consists of three components, as shown in formula (25):

L = L_1 + λ_1 L_ssim + λ_2 L_color  (25)

where λ_1 and λ_2 are weighting coefficients balancing the three terms.
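The color-angle loss can be sketched by treating each pixel's RGB triple as a 3-D vector. The clipping and epsilon below are numerical-safety assumptions:

```python
import numpy as np

def color_angle_loss(x, y, eps=1e-8):
    """Sum over pixels of the angle between RGB color vectors of x and y.
    x, y: arrays of shape (N, 3), one RGB vector per pixel."""
    dot = (x * y).sum(axis=1)
    norm = np.linalg.norm(x, axis=1) * np.linalg.norm(y, axis=1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return np.arccos(cos).sum()

x = np.array([[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])
print(color_angle_loss(x, x))       # identical colors: near-zero angle
print(color_angle_loss(x, 2.0 * x)) # brighter copy, same hue: still near zero
```

Unlike the L_1 term, this loss ignores brightness scaling and penalizes only hue direction, which is why the two terms complement each other.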

IV. EXPERIMENT AND ANALYSIS
The compensation amount of the red channel is proportional to the difference between the normalized means of the green and red channels. To avoid super-saturating the red channel in some areas when the gray-world algorithm is applied after red-loss compensation, the red channel compensation follows the principle of enhancing areas where the red-channel pixel intensity is small and reducing the compensation where it is large. However, when the algorithm is applied directly to greenish underwater images, excessive red compensation may still occur, especially when the color deviation is severe, as shown in Fig. 7.
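The compensation principle above (compensation proportional to the green-red mean difference, and largest where the red channel is weak) matches the well-known red-channel compensation of Ancuti et al. The sketch below uses that standard form; this paper's improved formula with β and γ differs, so treat this as a baseline illustration only:

```python
import numpy as np

def compensate_red(r, g, alpha=1.0):
    """Baseline red-channel compensation (after Ancuti et al.):
    I_r'(x) = I_r(x) + alpha * (mean(I_g) - mean(I_r)) * (1 - I_r(x)) * I_g(x).
    Channels are normalized to [0, 1]; the (1 - I_r) factor makes the added
    compensation largest where the red channel is weakest."""
    return r + alpha * (g.mean() - r.mean()) * (1 - r) * g

r = np.array([0.1, 0.8])   # one dim red pixel, one already-strong red pixel
g = np.array([0.6, 0.6])
print(compensate_red(r, g))
```

The multiplication by the green channel transfers structure from the best-preserved channel underwater, while the (1 - I_r) factor implements exactly the "enhance where red is small" rule stated above.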
The essence of Gaussian low-pass filtering is smoothing with weights set according to the distance between each point in the neighborhood and the central point, which preserves the overall gray-value characteristics of the image. However, after Gaussian low-pass filtering the image edges are over-smoothed, so the ideal illumination image lacks edge details, and the resulting ideal reflection image inevitably loses a large amount of edge detail. General filtering cannot distinguish between noise and edges: the pixel gradient around image noise is large, causing gradient faults at edges [29]. The advantage of this paper is to use the guided filter instead of the Gaussian filter to obtain the illumination estimate of the image. Edge details are preserved to the greatest extent, the essence of the image is retained, and noise removal is achieved.
As shown in Fig. 8, the underwater image processed by Retinex with improved guided filtering has higher definition and color reduction degree than single-scale Retinex (SCR), multi-scale Retinex with color restoration (MSRCR) [30], and the original image, and is closer to the image under natural light. In Fig. 9, after the image is locally enlarged by 300 times, the algorithm in this paper smooths the image and removes noise better than SCR and MSRCR while retaining the edge details, making the overall details more prominent. The dark areas of the enhanced image are brighter, the colors are realistic, and the edge details remain intact.

A. OBJECTIVE INDEX AND ABLATION EXPERIMENT
In order to evaluate the effect of the algorithm more objectively, full-reference and no-reference index evaluations are carried out in addition to subjective assessment. The test sets Test A and Test B, which contain label images, were evaluated with the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM). A higher PSNR score means the enhancement result is closer to the label image in content, and a higher SSIM score means it is closer in structure and texture. For the test set Test C, which has no label images, the enhanced images are evaluated with the no-reference Underwater Image Quality Measure (UIQM) [33]. UIQM evaluates the degree of underwater image degradation as a weighted sum of underwater image colorfulness, sharpness, and contrast; the larger the value, the better the enhancement. As shown in Table 3, the algorithm in this paper achieves the best results on Test A and Test B, indicating the smallest difference between the enhanced image and the label image and notable gains in improving image brightness and contrast and maintaining texture structure. Compared with Test A, all methods restore the real underwater images of Test B relatively poorly, but this algorithm still improves markedly over the comparison methods. In addition, the UIQM index of this algorithm on Test C is slightly lower, and the FIEGAN method achieves the best Underwater Color Image Quality Evaluation (UCIQE) score, showing that that method performs well in improving image clarity and color saturation and better matches human visual perception.
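The PSNR metric used above can be computed directly from the mean squared error; the peak value of 255 assumes 8-bit images:

```python
import numpy as np

def psnr(x, y, peak=255.0):
    """Peak signal-to-noise ratio: 10 * log10(peak^2 / MSE), in decibels."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.zeros((4, 4))
print(psnr(a, a))            # identical images: infinite PSNR
print(psnr(a, a + 1.0))      # MSE = 1 gives 20*log10(255) ≈ 48.13 dB
```

Because PSNR depends only on the pixel-wise error, it complements SSIM, which instead compares brightness, contrast, and structure statistics.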
To illustrate the effectiveness of each component of the proposed network, ablation experiments are carried out under the same experimental conditions on the feature attention module CP (channel pixel), the dynamic feature enhancement module DFE (dynamic feature enhancement), the heterogeneous feature fusion module HFF (heterogeneous feature fusion), and the loss function. The basic network consists of an encoder, a decoder, and skip connections, and the different modules are added to this basic network one by one. In the ablation experiments on the modules, only the L1 loss is used to train the network, the EUVP image data set is used for training and testing, and the number of experimental steps is set to 52 10. Starting from the L1 loss alone, the structural similarity loss and the color loss are then added in turn to train the network, which demonstrates the effectiveness of the designed joint loss function.
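The joint objective used in the loss ablation can be sketched as below. This is a hypothetical NumPy illustration, not the paper's exact formulation: a single-window SSIM replaces the usual sliding-window version, a simple per-channel mean-shift penalty stands in for the unspecified color loss, and the weights `w_ssim` and `w_color` are assumed values.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between prediction and label."""
    return float(np.mean(np.abs(pred - target)))

def global_ssim(pred, target, max_val=1.0):
    """Simplified SSIM over one global window (no sliding window)."""
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (pred.var() + target.var() + c2)
    return float(num / den)

def color_loss(pred, target):
    """Hypothetical stand-in: penalize per-channel mean color shifts."""
    return float(np.mean(np.abs(pred.mean(axis=(0, 1)) - target.mean(axis=(0, 1)))))

def joint_loss(pred, target, w_ssim=0.5, w_color=0.1):
    """L1 + structural-similarity + color terms (weights are assumed)."""
    return (l1_loss(pred, target)
            + w_ssim * (1.0 - global_ssim(pred, target))
            + w_color * color_loss(pred, target))
```

The loss is zero for a perfect reconstruction and grows with content, structure, and color deviations, matching the roles the three terms play in the ablation.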
According to the one-dimensional rigid-body translation characteristics of the linear motor mover, a mover position detection platform based on a linear array camera and an aperiodic fence-stripe image is designed. As shown in Fig. 10, the linear array camera is installed on the mover and remains relatively stationary with respect to it, and the aperiodic fence image is fixed in front of the camera lens. During the movement of the mover, the linear array camera collects the signal sequence in real time; the proposed algorithm calculates the sub-pixel displacement between adjacent signal sequences, and the actual displacement of the mover is then obtained by combining it with the calibration coefficient. The mathematical model is as follows: the signal sequences of two adjacent frames collected by the linear array camera are I₁(x) and I₂(x), respectively, and there is a one-dimensional rigid-body translation relationship between them as shown in formula (26).
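A common baseline for the sub-pixel estimation step (the paper's own estimator is not reproduced here, so this is only an illustrative stand-in) is integer-lag cross-correlation followed by three-point parabolic interpolation around the correlation peak:

```python
import numpy as np

def subpixel_shift(seq1, seq2, max_lag=20):
    """Estimate the 1-D shift dx such that seq2(x) ~ seq1(x - dx).

    Baseline method: mean-removed cross-correlation over integer lags,
    then a parabolic fit through the peak and its two neighbors.
    """
    a = seq1 - seq1.mean()
    b = seq2 - seq2.mean()

    def corr_at(d):
        # c(d) = sum_x a[x] * b[x + d], over the overlapping samples.
        if d >= 0:
            return float(np.dot(a[:len(a) - d] if d else a, b[d:]))
        return float(np.dot(a[-d:], b[:len(b) + d]))

    lags = np.arange(-max_lag, max_lag + 1)
    c = np.array([corr_at(int(d)) for d in lags])
    i = int(np.argmax(c))
    shift = float(lags[i])
    if 0 < i < len(c) - 1:
        # Vertex of the parabola through the peak and its neighbors.
        denom = c[i - 1] - 2 * c[i] + c[i + 1]
        if denom != 0:
            shift += 0.5 * (c[i - 1] - c[i + 1]) / denom
    return shift
```

Multiplying the estimated shift by the calibration coefficient (pixels to millimetres) then yields the mover displacement.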
where Δx is the sub-pixel displacement between the two frame signal sequences; the proposed algorithm calculates Δx accurately, and the mover displacement is then obtained from it. Under low light intensity, the overall gray level of the image collected by the CCD decreases, along with its gradient information. In reference [18], Taylor series expansion of the two signal sequences to be registered proves that when the gray gradient sum of the signal sequence decreases, the detection accuracy also decreases. Therefore, the collected signal sequence must be preprocessed to improve the detection accuracy and the robustness to illumination. The image enhancement method based on gray linear transformation is simple to compute and adds little to the running time of the algorithm, so it is introduced into the preprocessing of the signal sequence. If the collected signal sequence is I(x), the signal sequence I_e(x) enhanced by gray linear transformation is

I_e(x) = (I(x) − a) · (n − m) / (b − a) + m,

where [a, b] and [m, n] are the gray ranges of I(x) and of the transformed I_e(x), respectively, and [m, n] takes the maximum gray range, i.e., [0, 255]. As shown in Fig. 11, Fig. 11(a) and Fig. 11(b) show the two-dimensional projection and gray distribution of the reference signal sequence and of the signal sequence under low light intensity, respectively. It can be seen that the gray values of the signal sequence are generally low under low light, and Table 4 shows that, for the same target image, the gray gradient sum of the sequence collected under low light decreases significantly. Fig. 11(c) and Fig. 11(d) show the signal sequence of Fig. 11(b) after enhancement by histogram equalization and by the linear transformation method, respectively. The histogram-equalized result shows more distortion relative to the reference signal, while the linear transformation method better preserves the gray variation characteristics of the signal.
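The gray linear stretch described above can be written directly (a minimal NumPy version; variable names follow the text, with [m, n] taken as [0, 255]):

```python
import numpy as np

def gray_linear_transform(seq, m=0.0, n=255.0):
    """Gray linear stretch: I_e(x) = (I(x) - a) * (n - m) / (b - a) + m.

    [a, b] is the observed gray range of the input sequence I(x) and
    [m, n] is the target range, taken at its maximum, i.e. [0, 255].
    """
    seq = np.asarray(seq, dtype=np.float64)
    a, b = float(seq.min()), float(seq.max())
    if b == a:  # flat signal: nothing to stretch
        return np.full_like(seq, m)
    return (seq - a) * (n - m) / (b - a) + m
```

Unlike histogram equalization, this mapping is monotone and affine, which is why it preserves the relative gray variation of the signal while restoring the gradient magnitude lost under low light.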

B. COMPARISON BETWEEN iYOLOv4 AND YOLOv4 ALGORITHMS
The whole training process comprises 100 epochs. The first 50 epochs are trained with the network frozen: the initial learning rate is 0.001, cosine annealing is used to decay the learning rate, and the batch size is set to 64. For the last 50 epochs the network is unfrozen: the initial learning rate is 0.0001, cosine annealing is again used to decay the learning rate, and the batch size is set to 6. iYOLOv4 and YOLOv4 are trained in the same environment, and the loss reaches a stable state after 100 epochs of training. The Adam optimizer is used for parameter learning; it iteratively updates the network weights based on the training data and is suitable for large-scale data and parameter problems. To justify this choice, 1000 images are taken from the data set and the same network is trained for 100 epochs with four classical optimizers: SGD, Momentum, RMSprop, and Adam. Since the loss values are relatively large at the beginning of training, the losses of the first 10 epochs are discarded. The results are shown in Fig. 12: SGD and Momentum perform moderately, while RMSprop and Adam perform better, and after 40 epochs the loss of the Adam optimizer is lower. Therefore, the Adam optimizer is selected for training in this paper. In deep learning, complexity analysis mainly involves two indicators: computation and parameter count. FLOPs refers to the number of floating-point operations and corresponds to time complexity; Params refers to the total number of trainable parameters in the network model and corresponds to space complexity. In this paper, the complexity of each method is compared using the control variable method, divided into three groups as shown in Table 5.
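The cosine-annealed schedule used in both training stages follows the standard formulation below (a sketch; the floor `lr_min = 0` is an assumption, as the text does not state one):

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_init, lr_min=0.0):
    """Cosine annealing: lr_init at epoch 0, decaying to lr_min at the end."""
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_min + (lr_init - lr_min) * cos_factor
```

For the frozen stage, for example, `cosine_annealed_lr(0, 50, 1e-3)` returns the initial 0.001 and the rate decays smoothly toward zero by epoch 50; the unfrozen stage restarts the schedule from 0.0001.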
The first group (G1) is the original YOLOv4 algorithm; the second group (G2) adds the ASFF structure to YOLOv4; the third group (G3) replaces BCE loss with focal loss on the basis of G2. The results are shown in Table 5. Comparing G1 and G2, FLOPs and Params increase by only 9.6% and 14.2%, respectively, indicating little impact on the time and space complexity of the network; comparing G2 and G3, replacing the loss function affects neither FLOPs nor Params.
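For reference, the two complexity indicators can be accumulated per convolutional layer as below, and the 9.6%/14.2% figures are relative increases of this form. The example layer dimensions are hypothetical, not taken from Table 5.

```python
def conv2d_complexity(c_in, c_out, k, h_out, w_out, bias=True):
    """Params and multiply-accumulate count (MACs) for one conv layer.

    Params = (k*k*c_in + 1 if bias) * c_out;
    MACs   = k*k*c_in * h_out*w_out * c_out.
    FLOPs is often reported as 2 * MACs (one multiply plus one add).
    """
    params = (k * k * c_in + (1 if bias else 0)) * c_out
    macs = k * k * c_in * h_out * w_out * c_out
    return params, macs

def relative_increase(baseline, modified):
    """Percentage increase of a complexity metric over a baseline (e.g. G1 vs G2)."""
    return (modified - baseline) / baseline * 100.0
```

Summing these quantities over every layer of G1 and G2 and applying `relative_increase` reproduces the kind of comparison reported in Table 5.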

V. CONCLUSION
This paper proposes an underwater image enhancement network based on heterogeneous feature fusion and dynamic feature enhancement. The network improves the encoder–decoder structure and adds a feature attention module, a dynamic feature enhancement module, and a heterogeneous feature fusion module. Through heterogeneous feature fusion, the network extracts both low-level and high-level features, improves communication between modules, and retains more image details. To address the uneven illumination, low contrast, heavy noise, and color deviation of underwater images taken in deep-sea and night waters, the underwater imaging model and water parameters are used to synthesize more realistic data sets for training. The trained network achieves good results not only on the synthetic data sets but also on data sets collected in the real world, restoring the color and detail of real underwater scenes well. The experimental results show that the proposed method outperforms many existing methods: it effectively removes the color cast caused by water, the enhanced images restore the original appearance of objects well, and the method is robust for underwater images with severe color attenuation. In future work, we will consider further optimizing the network structure to reduce the number of parameters and the single-frame running time. In addition, although the proposed algorithm enhances and restores images well, it focuses only on static images; we will consider applying the network to continuous video scenes, where introducing multi-frame information extraction into the network structure could enable the enhancement of low-resolution, blurry video.