Low-Light Image Enhancement via Gradient Prior-Aided Network

Low-light images suffer from low brightness and low contrast, which poses major obstacles for intelligent video surveillance systems. Low-light image enhancement must simultaneously handle interfering factors such as brightness, contrast, artifacts, and noise. To this end, we propose a gradient prior-aided low-light enhancement network (GPANet). The main idea is to improve the network's ability to extract edge features and remove unwanted noise by introducing first-order (i.e., Sobel filter) and second-order (i.e., Laplacian filter) gradient features. Unlike previous methods, we first extract the first- and second-order gradient information of low-light images and concatenate it with the low-light images for multi-view feature analysis in a multi-view fusion encoder (MFE). Then, we propose a multi-branch topology module (MTM) to fuse and decompose the multi-view features. Finally, we reconstruct the multi-view features through multi-view decomposition decoders (MDDs, comprising three sub-decoders) to generate potential normal-light images. The first- and second-order gradient decoders provide the enhancement decoder with multi-scale gradient prior features. Furthermore, we adopt a residual structure to speed up network convergence while ensuring stable enhancement performance. We conduct experiments on widely adopted datasets. The results demonstrate the advantages of our method over other methods from both qualitative and quantitative perspectives. The source code is available at https://github.com/LouisYuxuLu/GPANet.


I. INTRODUCTION
Computer vision provides sensing and processing technologies for various applications and services, including vision-based autonomous driving systems and visible-light intelligent surveillance systems. However, insufficient light in low-light conditions degrades otherwise visible scene information. The light reflected from the scene and perceived by the camera's photosensitive device is weak, so the captured image is dim and blurred and suffers from unwanted noise interference. This visual deterioration severely restricts advanced visual tasks (such as object detection [1], [2] and segmentation [3]) and causes an unpleasant visual experience. Therefore, low-light image enhancement is essential for improving the performance of existing optical systems.
In recent decades, to extract the hidden feature information from the low-light background, researchers have proposed a variety of solutions for low-light image enhancement. These methods can be divided into two categories: traditional methods and learning-based methods. Histogram equalization (HE)-based, Retinex-based, and Dehazing-based methods are the main traditional enhancement solutions. HE-based methods [4], [5], [6], [7] adjust the contrast and brightness of low-light images by increasing the gray level and flattening the intensity or color distribution of pixels. Because it is difficult to adjust the gray level adaptively, the enhanced image may lose local details or show an exaggerated increase in contrast.

FIGURE 1. Sobel- [37] and Laplacian-based [38] edge detection results for low-light and normal-light images from the LOL dataset [25].

Retinex-based methods [8], [9], [10], [11], [12], [13], [14], [15], [16] decompose the low-light image into illumination and reflection components. The illumination component is then analyzed and adjusted to generate a potential normal-light image. However, Retinex-based methods are susceptible to local color distortion and cannot efficiently eliminate noise. Dehazing-based methods [17], [18], [19] have also been extensively explored for low-light image enhancement. The inverted low-light image can be dehazed [20] and then inverted again to generate an enhanced image. However, the pixel-value compositions of inverted low-light images and hazy images differ significantly, so Dehazing-based methods are also susceptible to localized color distortion. Camera response model-based methods [17], [21], [22] adjust the brightness of the image by mapping each pixel to the desired exposure based on an estimated exposure-ratio map. Deep learning [23] has achieved great success in low-level computer vision tasks, and learning-based methods [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36] have been extensively studied for low-light image enhancement. They mainly comprise end-to-end learning, model-based learning, and generative adversarial learning. However, learning-based methods struggle to meet the enhancement requirements of different low-light scenes. In addition, they inevitably lose or destroy detailed information, such as the edge texture of the image.
Real-world low-light images mask or destroy potential edge texture features due to insufficient brightness, noise interference, etc., resulting in local over-smoothing or serious noise interference in the enhanced image. Considering the advantages and disadvantages of the above methods, and to more accurately extract the latent edge feature information in the low-light background, we propose to introduce first-order (i.e., Sobel filter [37]) and second-order (i.e., Laplacian filter [38]) gradient features [39], [40] to assist the enhancement network in learning the mapping of global and local edge features. The first- and second-order spatial derivatives improve the stability of the network for edge information extraction and its robustness to noise. To this end, we propose a gradient prior-aided low-light enhancement network (GPANet). Specifically, the multi-view fusion encoder (MFE), multi-branch topology module (MTM), and multi-view decomposition decoders (MDDs) enable the transmission and exchange of three-view feature information (i.e., Sobel-based, Laplacian-based, and original) at three scales. Through multi-view and multi-scale feature fusion and decomposition, our GPANet can capture imperceptible edge detail features from the dark background more accurately, thereby producing a potential normal-light image. The main contributions of our work can be summarized as follows:
• We introduce first-order Sobel and second-order Laplacian gradient priors to assist the deep enhancement network in learning and mapping edge features in low-light images.
• We construct an end-to-end network consisting of the MFE, MTM, and MDDs. It improves the enhancement performance of the deep network by fusing and decomposing multi-view and multi-scale features.
• Extensive quantitative and qualitative evaluation experimental results demonstrate that GPANet can achieve high-quality low-light image enhancement in complex imaging environments compared to state-of-the-art methods.
The rest of this paper is organized as follows. Recent studies on low-light image enhancement are reviewed in Section II. In Section III, we introduce our GPANet. In Section IV, extensive experiments on both synthetic and real-world scenarios are conducted to evaluate the enhancement performance. Conclusions and discussion are given in Section V.

II. RELATED WORK
In this section, we briefly review the research on low-light image enhancement methods, including traditional and learning-based methods.

A. TRADITIONAL LOW-LIGHT ENHANCEMENT METHODS
1) HE-BASED
Histogram equalization (HE) [4] aims to control the processed image histogram so that pixel values follow a uniform distribution, which improves the contrast and clarity of the image. Because it is a global operation that ignores local brightness variation, it can cause over- or under-enhancement issues. Dynamic histogram equalization (DHE) [41] separates the histogram into sub-histograms and equalizes each one. Contrast-limited adaptive histogram equalization (CLAHE) [42] adaptively controls the degree of contrast enhancement. The methods described above may cause major color misalignment issues, and the details in darker areas are often not appropriately enhanced. To enhance the overall visual effect, some subsequent methods have improved HE by maintaining the average image brightness, enhancing resilience against noise, etc. For example, brightness-preserving dynamic histogram equalization (BPDHE) [43] is appropriate for images with a very dark foreground and background because it improves image contrast while maintaining the original image's overall brightness. Nonetheless, it is likely to increase background noise, limit proper signal contrast, and generate excessive saturation in some parts of the image. Lee et al. [44] also improved contrast by increasing the gray values of adjacent pixels. Shubhi et al. [45] proposed a novel unsharp mask filtering technique combined with histogram equalization to maximize the entropy of generic images; in addition, it controls over-enhancement and under-enhancement by clipping the histogram of the image. Nonlinear exposure intensity-based modification histogram equalization (NEIMHE) [46] divides a non-uniformly illuminated image into five sub-regions and modifies each sub-region histogram by setting nonlinear weights in its cumulative density function.
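The core HE operation described above can be sketched in a few lines of NumPy. The function name and the 8-bit grayscale assumption are illustrative; practical variants such as DHE or CLAHE operate on sub-histograms or local tiles instead of one global histogram.

```python
import numpy as np

def histogram_equalize(img):
    """Global histogram equalization for an 8-bit grayscale image.

    Maps pixel values through the normalized cumulative histogram so
    that intensities spread toward a uniform distribution. A minimal
    sketch of classic HE [4]; color images are usually equalized on a
    luminance channel instead.
    """
    hist = np.bincount(img.ravel(), minlength=256)       # per-level counts
    cdf = np.cumsum(hist).astype(np.float64)             # cumulative histogram
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())    # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)           # gray-level mapping
    return lut[img]                                      # apply lookup table
```

Note how a low-contrast input is stretched to the full dynamic range, which also illustrates the drawback discussed above: any noise in the dark levels is stretched along with the signal.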
HE-based methods effectively enhance contrast in the whole or part of the image, but most methods are inflexible. Some parts of the image have undesirable visual effects, such as underexposure, overexposure, and amplified noise.

2) RETINEX-BASED
The principle of the Retinex theory [8] is to decompose the image into an illumination component and a reflection component and then restore the original detail information of the image. After estimating and boosting the illumination component, the two components are fused to achieve the enhanced result. The single-scale Retinex method (SSR) [47] approximated the reflection component via a Gaussian function convolved with the image. Multi-scale Retinex restoration (MSR) [48] was a modification of SSR: it used Gaussian filtering at various scales to approximate the illumination image, followed by a weighted average of the filtering results. Multi-scale Retinex with color restoration (MSRCR) [49] was built upon MSR and included a color restoration factor to correct image distortion and bring results closer to the real scene. Subsequently, several methods emerged that combined the Retinex theory with other theories. For example, structure-revealing low-light image enhancement (SRIE) [11] employed a compensation technique to compensate for the dark areas of the image that are exaggerated by the logarithmic-domain gradient. Low-light image enhancement (LIME) [12] estimated the brightness component and used an inverse technique to obtain the reflection component. Naturalness-preserved enhancement (NPE) [10] employed Retinex theory and a log-bilateral transformation to bring the light component mapping closer to natural color. To better remove unwanted noise in low-light images, the low-rank regularized Retinex model (LR3M) [50] injected a low-rank prior into the Retinex decomposition process. It avoids the residual noise commonly found in illumination and reflectance maps by sequentially estimating a piecewise-smooth illumination and a noise-suppressed reflectance. Wang et al. [51] proposed an improved logarithmic transformation-based adaptive color image enhancement method by applying the Weber-Fechner law to grayscale mapping in logarithmic space.
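The SSR decomposition described above can be sketched as follows. The Gaussian scale `sigma` and the offset `eps` are illustrative values, not the settings of [47]:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img, sigma=15.0, eps=1.0):
    """Single-scale Retinex (SSR) sketch.

    The illumination is approximated by a Gaussian-blurred copy of the
    image; the reflectance estimate is the difference of the two in the
    log domain: log(I) - log(G_sigma * I).
    """
    img = img.astype(np.float64) + eps          # avoid log(0)
    illumination = gaussian_filter(img, sigma)  # smooth illumination estimate
    return np.log(img) - np.log(illumination)   # log-domain reflectance
```

MSR is the same computation repeated at several `sigma` values and averaged with fixed weights.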
The above Retinex-based methods improve image contrast and mitigate the effect of noise to a certain extent.

3) DEHAZING-BASED
Due to the similar properties of low-light and hazy images, mature dehazing methods [20] can be used to enhance low-light images. Dong et al. [17] enhanced the image by first inverting the low-light image and then improving its contrast with a dehazing method. Li et al. [19] employed an appropriate BM3D [52] denoising procedure to separate the base and enhancement layers and then adjusted the two layers independently. While these methods produce respectable results, they lack a physically plausible explanation.

B. LEARNING-BASED LOW-LIGHT ENHANCEMENT METHODS
Convolutional neural networks (CNNs) have been widely employed for image enhancement in low-light conditions. LLNet [24] is a deep learning method that trains a stacked sparse denoising autoencoder on synthetic data.

VOLUME 10, 2022

Algorithm 1 Gradient Prior-Aided Network (GPANet)
Require: low-light image x with its Sobel-based gradient feature G_x^s and Laplacian-based gradient feature G_x^l; backbone network F_GPANet; learning loss L_GPANet; three-view ground truth y, G_y^s, and G_y^l; batch size N
1: while not converged do
2:   for each sampled minibatch {x_k}_{k=1}^N do
3:     pre-process the data for task F_GPANet
4:     L_En = L_En(ŷ, y)
5:     compute the gradients with respect to L_GPANet
6:     update the layers within F_GPANet
7:   end for
8: end while
9: return F_GPANet and ŷ; discard Ĝ_y^s and Ĝ_y^l
The method can enhance low-light images while simultaneously reducing noise. HDRNet [53] learns local, global, and content-dependent decisions to approximate the intended image transformation. RetinexNet [25] builds upon the Retinex theory and adjusts the illumination map with a dedicated network to improve low-light images. LightenNet [54] takes a weakly illuminated image as input and outputs its illumination map, which is then used to generate the enhanced image through the Retinex model. DeepUPE [55] improves low-light images by predicting the brightness map but does not take noise into consideration. While learning-based methods outperform traditional methods, most of them are ineffective at noise reduction, and some even neglect it entirely. MBLLEN [26] applies multiple levels of feature extraction and fusion to low-illuminance image enhancement, achieving a more noticeable enhancement effect. KinD [27] and KinD++ [56] use the Retinex theory to optimize the decomposition, while the reconstruction structure incorporates an adjustment network and efficiently performs continuous light-map modification. Jiang et al. [29] proposed EnlightenGAN, which employs a global-local discriminator and a self-regularized attention mechanism to avoid training on paired image datasets, thereby increasing the adaptability of the network to most real-world scenarios. Guo et al. [30] established a non-reference network and developed a new zero-reference deep curve estimation method (Zero-DCE). The non-reference network addresses the over-fitting problem and has superior generalization power compared to reference-based networks; however, it could be improved in terms of handling noise and color aberrations. Zhao et al. [57] proposed a novel Retinex decomposition ''generative'' strategy to generate more accurate latent components and used a unified deep framework to perform low-light image enhancement. Zhang et al. [32] proposed a novel method to learn and infer the motion field (optical flow) from a single image and synthesize short-range video sequences, thereby enforcing temporal stability in low-light video enhancement with only static images. Sobashi et al. [35] proposed a low-light homomorphic filtering network, which performs image-to-frequency filter learning and is jointly trained to optimize image enhancement and classification performance. The light channel enhancement network (LiCENet) [58] combines an autoencoder and a convolutional neural network in the HSV color space to train a low-light enhancer and further improve the details of low-light images on top of improved lighting. However, learning-based methods often have insufficient ability to express image features at different scales, making it difficult for the network to recover detailed information from extremely dark images. In addition, the enhanced image is prone to color distortion, amplified noise, and blurred edges.

III. GPANet: GRADIENT PRIOR-AIDED NETWORK
In this section, we elaborate on the details of GPANet. First, we extract Sobel- and Laplacian-based edge gradient features from low-light images. Then, we describe the multi-view fusion encoder (MFE), multi-branch topology module (MTM), and multi-view decomposition decoders (MDDs). Finally, we elaborate on the loss function and the network parameter settings.

A. EDGE GRADIENT DETECTION
In image processing, an edge is a significant, abrupt change in intensity. However, low-light images have inconspicuous edge features due to inconsistent local brightness, as shown in Fig. 1. Therefore, we learn to optimize the gradient features of low-light images to extract the masked gradient edge information in complex low-light imaging environments. Spatial edge detection techniques are mostly based on first-order derivatives (e.g., Sobel) and second-order derivatives (e.g., Laplacian).

1) SOBEL FILTER
The Sobel operator is a discrete differentiation operator used to approximate the gradient of the image brightness function. It can return the gradient vector (or its normal vector) at any point in the image. In addition, the Sobel operator helps balance edge detection and noise suppression. Considering a digital image $I(u, v)$, where $(u, v)$ are spatial coordinates, the image gradient magnitude is defined as

$$G_s = \sqrt{G_{su}^2 + G_{sv}^2},$$

where $G_{su}$ and $G_{sv}$ are the gradient components in the $u$ and $v$ directions, respectively. $G_{su}$ and $G_{sv}$ are obtained by filtering the image with the directional kernels $K_{su}$ and $K_{sv}$, i.e.,

$$K_{su} = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix}, \quad K_{sv} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix}.$$

Therefore, $G_{su} = I(u, v) * K_{su}$ and $G_{sv} = I(u, v) * K_{sv}$, where the $*$ symbol denotes convolution. However, because the Sobel operator does not adapt to the grayscale statistics of the image, it is challenging to distinguish the image's main subject from its background, and the extracted contours are often unsatisfactory.
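The operations above can be sketched directly in NumPy/SciPy. The function name and the use of `scipy.signal.convolve2d` with symmetric boundary handling are illustrative choices, not the paper's implementation:

```python
import numpy as np
from scipy.signal import convolve2d

# Standard 3x3 Sobel kernels for the u (horizontal) and v (vertical)
# derivative directions; K_sv is the transpose of K_su.
K_su = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
K_sv = K_su.T

def sobel_gradient(img):
    """Return the Sobel gradient magnitude G_s = sqrt(G_su^2 + G_sv^2)."""
    g_su = convolve2d(img, K_su, mode="same", boundary="symm")
    g_sv = convolve2d(img, K_sv, mode="same", boundary="symm")
    return np.sqrt(g_su ** 2 + g_sv ** 2)
```

A flat region yields zero magnitude (both kernels sum to zero), while a step edge produces a strong response, which is exactly the behavior the gradient prior exploits.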

2) LAPLACIAN FILTER
The Laplacian operator is a second-order derivative operator that produces a steep zero-crossing at an edge. The Laplacian operator is isotropic and can sharpen boundaries and lines in any direction, with no directional characteristics. In mathematical terms, the Laplacian filter is a differential operator. The continuous Laplacian of a two-dimensional function $I(u, v)$ is defined as

$$\nabla^2 I = \frac{\partial^2 I}{\partial u^2} + \frac{\partial^2 I}{\partial v^2}.$$

The Laplacian filter can be represented as a $3 \times 3$ mask, with the center value being negative or positive depending on the neighboring values. A common discrete kernel of the Laplacian filter is given by

$$K_l = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}.$$

We compute the convolution between $K_l$ and the image $I(u, v)$ to obtain the second-order gradient $G_l$, i.e., $G_l = I(u, v) * K_l$. The Laplacian operator is sensitive to noise. However, considering that this paper uses Laplacian-based features as prior information and uses deep networks for learning and prediction, this does not cause additional negative effects.
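The second-order gradient can be computed the same way as the Sobel response; here the 4-neighbor kernel shown above is used (one common choice; an 8-neighbor variant also exists):

```python
import numpy as np
from scipy.signal import convolve2d

# 4-neighbor discrete Laplacian kernel K_l.
K_l = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)

def laplacian_gradient(img):
    """Second-order gradient G_l = I * K_l; zero-crossings mark edges."""
    return convolve2d(img, K_l, mode="same", boundary="symm")
```

On a linear intensity ramp the Laplacian is zero (the second derivative of a linear function vanishes), while intensity steps produce paired positive/negative responses around the zero-crossing.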

B. MULTI-VIEW FUSION ENCODER
As shown in Fig. 2, we take the low-light image $x$ and the corresponding gradient features $G_s$ and $G_l$ together as the input, i.e., $f_0 = [x, G_s, G_l]$, where $[\cdot]$ denotes the channel-wise concatenation of the multi-view feature maps. On the one hand, the multi-view input makes the encoder purposefully attend to the edge gradient information in low-light images. On the other hand, the encoder improves its feature-mining ability in complex imaging environments by receiving different types of inputs, thereby further optimizing the deep model. The encoder concatenates several residual units (ResUnits). The particular operation of a ResUnit, denoted by $R(\cdot)$, can be expressed as

$$f_{i+1}^k = R(f_i^k) = f_i^k + \tau(l(c(f_i^k))),$$

where $f_i^k$ and $f_{i+1}^k$ are the input and output of the $(i+1)$-th ResUnit in the $k$-th residual module, $c(\cdot)$ is a convolutional layer, $l(\cdot)$ is Layer Normalization, and $\tau(\cdot)$ is the PReLU activation. There are four ResUnits in the encoder, connected through max pooling $M(\cdot)$. The output $f_E$ after this sequence of ResUnit operations is

$$f_E = R_4(M(R_3(M(R_2(M(R_1(f_0))))))).$$

The output of the encoder contains spatially downsampled but information-rich feature maps of the low-light, Sobel-based, and Laplacian-based views.
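The ResUnit composition $f + \tau(l(c(f)))$ can be sketched in NumPy. The convolution is stubbed out as an identity placeholder (in the real network it is a learned layer), and the PReLU slope of 0.25 is an assumed illustrative value:

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU activation tau(.): identity for positive inputs, slope `a`
    for negative inputs (`a` is learned in the real network)."""
    return np.where(x > 0, x, a * x)

def layer_norm(x, eps=1e-5):
    """Layer Normalization l(.): zero-mean, unit-variance normalization
    over the elements of the feature map."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def res_unit(f, conv=lambda f: f):
    """Residual unit R(.) = f + tau(l(c(f))). `conv` stands in for the
    learned convolution c(.) (identity here for illustration)."""
    return f + prelu(layer_norm(conv(f)))
```

The identity skip path is what lets gradients flow directly through the four stacked units, which is the convergence benefit the residual design targets.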
C. MULTI-BRANCH TOPOLOGY MODULE
To enhance the three-view feature information, as shown in Fig. 2, we propose a multi-branch topology module (MTM). The MTM can transmit information along the depth and width directions of the network through different network nodes, so as to use the parameters between neurons more effectively and improve the network's learning ability [59]. The elements that make up the MTM (i.e., the pink circles) are convolutional units, each composed of a convolution, a layer normalization, and the PReLU function. It is worth noting that although multi-view feature addition can enhance high-frequency information such as gradient edges, unwanted noise is enhanced at the same time. To this end, we adopt a strategy of first addition fusion and then concatenation fusion of the output feature maps of the three views from the MTM. In addition, we insert a convolutional layer between the addition and concatenation operations to fuse and enhance edge texture features. The MTM outputs feature maps of three views, including the Sobel-based feature $y_m^s$, the Laplacian-based feature $y_m^l$, and the mainline enhancement feature $y_m^e$. To better fuse and decompose the features of the three views, we use dense connections to strengthen the transfer of the information flow. In addition, to improve the fusion capability of the convolutional layer in the middle of the MTM for large amounts of information, we increase its number of convolutional channels.
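One plausible reading of the "addition, then convolution, then concatenation" fusion order is sketched below on (C, H, W) feature maps. The function name, the identity `conv` stub, and the exact concatenation layout are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def fuse_views(y_s, y_l, y_e, conv=lambda f: f):
    """Sketch of the MTM fusion order: element-wise addition of the
    three view features (which boosts edges but also noise), a
    convolutional refinement (stubbed as `conv`), then channel-wise
    concatenation with the original views."""
    added = y_s + y_l + y_e                                   # addition fusion
    enhanced = conv(added)                                    # intermediate convolution
    return np.concatenate([y_s, y_l, y_e, enhanced], axis=0)  # concatenation fusion
```

Keeping the unfused views alongside the additively fused map lets later layers suppress the noise that the addition amplified, which is the motivation stated above.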
In the end, we fuse the outputs of the three views of the MTM again to strengthen the mainline features, which can be given by

$$f_M = y_m^e + y_m^s + y_m^l,$$

where $f_M$ is the input of the main (enhancement) decoder.

D. MULTI-VIEW DECOMPOSITION DECODER
Our GPANet supervises the optimization of the deep network from three views. Therefore, the proposed network consists of three sub-decoders (i.e., the Sobel-based, Laplacian-based, and enhancement decoders). Mirroring the encoder, a decoder can be given as

$$f_D = R_4(U(R_3(U(R_2(U(R_1(D_{in}))))))),$$

where $U$ is the upsampling operation. In this paper, we use bilinear interpolation to increase the resolution of the feature maps. $D_{in}$ is one of the three-view outputs of the MTM, i.e., $y_m^s$, $y_m^l$, or $y_m^e$. In addition, we introduce long skip connections between residual units of corresponding scales in the encoder and decoder for more efficient learning and inference. For the enhancement decoder, each of its residual units simultaneously receives the output of the corresponding residual units of the Sobel- and Laplacian-based decoders, thereby further improving the mainline decoder's attention to edge feature information.
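The bilinear upsampling step $U$ can be sketched with SciPy's order-1 spline interpolation on a (C, H, W) feature map; the function name and fixed scale factor are illustrative:

```python
import numpy as np
from scipy.ndimage import zoom

def bilinear_upsample(fmap, scale=2):
    """Bilinear (order-1) interpolation that multiplies the spatial
    resolution of a (C, H, W) feature map by `scale`, leaving the
    channel axis untouched, as the decoders do before each ResUnit."""
    return zoom(fmap, (1, scale, scale), order=1)
```

Bilinear interpolation is a common choice here because, unlike transposed convolution, it introduces no checkerboard artifacts into the upsampled feature maps.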

E. LOSS FUNCTION
To optimize the trainable parameters and improve image quality both qualitatively and quantitatively, we propose a loss function $L_{total}$ composed of three components, i.e., the Sobel-based gradient consistency loss $L_{Sobel}$, the Laplacian-based gradient consistency loss $L_{Lap}$, and the final enhancement loss $L_{En}$:

$$L_{total} = \omega_1 L_{Sobel} + \omega_2 L_{Lap} + \omega_3 L_{En},$$

where $\omega_1$, $\omega_2$, and $\omega_3$ denote the weight of each loss term, respectively. Based on extensive experiments, these values are set to 0.8, 0.1, and 0.1 for the best performance. The details of the three loss functions are given below.
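The weighted combination is a one-liner; note that the pairing of the weights 0.8/0.1/0.1 to the three terms follows the order in which the losses are listed in the text, which is an assumption since the original equation is not reproduced here:

```python
def total_loss(l_sobel, l_lap, l_en, w1=0.8, w2=0.1, w3=0.1):
    """L_total = w1*L_Sobel + w2*L_Lap + w3*L_En, with the weight
    values reported in the text (pairing assumed from listing order)."""
    return w1 * l_sobel + w2 * l_lap + w3 * l_en
```

Because the three weights sum to 1, the total loss stays on the same scale as its individual terms.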

1) SOBEL-BASED GRADIENT CONSISTENCY LOSS
To better approximate the image gradients, we compute the first and second derivatives and use the $\ell_1$ and $\ell_2$ norms as regularizers to penalize incorrect gradient estimates in $L_{Sobel}$ and $L_{Lap}$. In addition, the $\ell_1$ and $\ell_2$ terms improve the edge-detection accuracy of the two gradient prior-aided branch networks by forcing the semantic properties of the low-light image to be as close to the ground truth as possible. The Sobel-based loss can be given by

$$L_{Sobel} = \|\hat{G}_y^s - G_y^s\|_1 + \|\hat{G}_y^s - G_y^s\|_2,$$

where $\hat{G}_y^s$ is the Sobel-view output of the network and $G_y^s$ is the corresponding ground-truth gradient feature.

2) LAPLACIAN-BASED GRADIENT CONSISTENCY LOSS
Similar to L Sobel , we still suggest 1 and 2 as the loss function of L Lap , i.e.,

3) ENHANCEMENT LOSS
Whether the structure, brightness, color, etc., of the enhanced image are natural is an important criterion for judging the performance of an enhancer. To optimize the trainable parameters of the proposed method, we formulate a multi-constrained enhancement loss $L_{En}$, which consists of two parts, i.e., a data loss $L_d$ and an edge loss $L_e$:

$$L_{En} = \omega_1^1 L_d + \omega_1^2 L_e,$$

where we set $\omega_1^1 = 0.99$ and $\omega_1^2 = 0.01$, respectively. As with Eqs. (10) and (11), the $\ell_1$ and $\ell_2$ norms penalize enhancement results that deviate from the corresponding ground truth in a pixel-averaged sense, so we adopt them as our major data constraint loss, defined as

$$L_d = \|\hat{y} - y\|_1 + \|\hat{y} - y\|_2.$$

The Sobel- and Laplacian-based decoders provide the enhancement decoder with a large number of edge features at different scales. To further improve the learning and reasoning ability of the enhancement decoder for high-frequency edge features, we use an edge loss to constrain the difference between $y$ and $\hat{y}$, i.e.,

$$L_e = \sqrt{\|Lap(\hat{y}) - Lap(y)\|^2 + \varepsilon^2},$$

where $Lap(\hat{y})$ and $Lap(y)$ represent the edges extracted from $\hat{y}$ and $y$ through the Laplacian operator, respectively. The penalty coefficient $\varepsilon$ is empirically set to $10^{-3}$.
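The edge loss can be sketched as a Charbonnier-style penalty on Laplacian edge maps. The exact norm (a pixel mean is used here) is an assumption; only the Laplacian edge extraction and the ε = 1e-3 penalty coefficient come from the text:

```python
import numpy as np
from scipy.signal import convolve2d

# 4-neighbor Laplacian kernel used as the edge extractor Lap(.).
K_l = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)

def edge_loss(y_hat, y, eps=1e-3):
    """Charbonnier-style edge loss: sqrt(mean((Lap(y_hat)-Lap(y))^2) + eps^2).

    The eps term keeps the square root differentiable at zero error,
    which is why it is called a penalty coefficient.
    """
    lap = lambda im: convolve2d(im, K_l, mode="same", boundary="symm")
    diff = lap(y_hat) - lap(y)
    return float(np.sqrt(np.mean(diff ** 2) + eps ** 2))
```

When the prediction matches the ground truth exactly, the loss bottoms out at ε rather than zero, so gradients remain well defined throughout training.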

IV. EXPERIMENTS AND DISCUSSION
In this section, we present the details of the experimental procedure to clearly demonstrate GPANet. First, the operating environment and implementation details of network training are presented. Second, we introduce the reference and no-reference evaluation metrics used in our experiments. Third, we perform qualitative and quantitative comparisons with traditional and learning-based enhancement methods on real-world paired and non-paired low-light standard test datasets. Fourth, to validate the value of the gradient prior features introduced in this paper, we conduct ablation experiments. Finally, we measure the running time of the proposed model on a single high-resolution image.

A. IMPLEMENTATION DETAILS
The imaging environment of low-light images is complex and changeable. To improve the generalization ability and robustness of the deep model, we adopt two strategies to obtain multi-scene paired training datasets. We train on the existing LOL [25] and GLADNet [61] datasets. Specifically, the LOL dataset obtains low-/normal-light data pairs by varying exposure parameters in daylight. It contains a total of 1500 pairs, of which 500 pairs are collected from real scenes and the other 1000 pairs are synthetic. The GLADNet dataset contains a total of 5000 pairs of synthetic data.

TABLE 2. Quantitative comparison between our method and the state of the art on Eval15 from the LOL dataset [25]. The best results are highlighted in red, and the second-best results are highlighted in blue.
Furthermore, to further improve the diversity of the training data, we randomly rotate and flip the training dataset to enrich the feature structure information of the samples. For test-time validation, we perform objective and subjective evaluation on real-world paired and non-paired low-light datasets, including LOL [25], DICM [44], ExDARK [60], LIME [12], MEF [62], MIT-Adobe FiveK [63], NPE [10], and TMDIED.1 Our GPANet is trained and tested on a machine with Windows OS, an Intel(R) Core(TM) i9-10850K CPU @ 3.60 GHz, and an Nvidia GeForce RTX 2080 Ti GPU. The training framework is PyTorch, and the Adam optimizer is used to train GPANet for 60 epochs. The initial learning rate is 10^-3, and at epochs 20 and 40 the learning rate is decayed by a factor of 0.1. Training the proposed model takes about 24 hours. For a fair comparison of all traditional and learning-based low-light image enhancement methods, all test codes are downloaded from the links published in the authors' papers.
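The step-decay schedule described above can be sketched as a small helper (in PyTorch this corresponds to a multi-step scheduler; the function name here is illustrative):

```python
def learning_rate(epoch, base_lr=1e-3, milestones=(20, 40), gamma=0.1):
    """Step decay used in training: the initial learning rate of 1e-3
    is multiplied by 0.1 at each milestone epoch (20 and 40)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

So epochs 0-19 train at 1e-3, epochs 20-39 at 1e-4, and epochs 40-59 at 1e-5.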

B. PERFORMANCE EVALUATIONS
To more comprehensively evaluate the performance of the proposed model, our GPANet is compared on synthetic and real-world low-light images with 19 state-of-the-art methods, including an HE-based method (i.e., HE [4]), Retinex-based methods (i.e., NPE [10], SRIE [11], LIME [12], and JIEP [13]), camera response model-based methods (i.e., CRM [21], Dong [17], and BIMEF [22]), a dehazing-based method (i.e., DeHz [18]), and learning-based methods (i.e., RetinexNet [25], MBLLEN [26], KinD [27], DeepUPE [55], EnlightenGAN [29], DLN [28], Zero-DCE [30], DeepLPF [64], RUAS [33], and DSLR [65]). In this subsection, we quantitatively analyze the performance of different enhancement methods from several aspects. Evaluation metrics fall roughly into two categories: evaluation with a reference and evaluation without a reference. Reference-based metrics require the ground-truth image to be available so that the enhanced results can be compared against it, while no-reference metrics do not. Specifically, we utilize the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [66], feature similarity (FSIM) [67], visual saliency-induced index (VSI) [68], and lightness order error (LOE) [10] to quantitatively evaluate the enhancement performance under different degradation conditions. These reference metrics evaluate performance in terms of numerical distance and structural similarity between the enhanced result and the corresponding well-exposed image (i.e., the ground truth). Meanwhile, three popular no-reference image quality assessment methods, including the natural image quality evaluator (NIQE) [69], the perception-based image quality evaluator (PIQE) [70], and AIC (entropy) [71], are used for blind image quality assessment in practical experiments.

1 https://sites.google.com/site/vonikakis/datasets/tm-died
The above three metrics are based on the entropy of informative regions, the quality-aware set of natural scene statistics models, tone mapping, and the hierarchical perception mechanism (from local structure to global semantics) in human vision systems in a non-referenced manner. These visual quality metrics are expected to provide objective criteria to evaluate the performance of each method from different perspectives.
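As a concrete example of the reference metrics, PSNR measures the log-scaled mean squared error between the enhanced result and the ground truth (higher is better; SSIM and the other metrics require more involved implementations):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between a ground-truth image
    `ref` and an enhanced result `test`: 10*log10(max_val^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

A perfect reconstruction gives infinite PSNR, while the maximal possible error (every pixel off by the full dynamic range) gives 0 dB.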

C. QUANTITATIVE ANALYSIS
1) PAIRED TEST DATASET
To objectively evaluate the performance of our GPANet, we first select the standard test set of 15 paired images from the LOL dataset. According to Table 2, GPANet ranks first in three reference evaluation metrics (i.e., PSNR, SSIM, and VSI) and one no-reference evaluation metric (i.e., NIQE) when compared with the other 19 competing methods. Although our method does not achieve the best performance on every other reference and no-reference metric, it still ranks highly. Compared to some learning-based methods, the performance of traditional methods is relatively stable. On the one hand, this is due to the insufficient robustness of the deep models' structures; on the other hand, a lack of diversity in the learning data can hinder performance improvement. Our GPANet is able to accurately extract latent gradient features and learn multi-view features at multiple scales, which strengthens learning and reasoning in complex low-light environments.

TABLE 3. Quantitative comparison of NIQE between our method and the state of the art on the DICM [44], ExDARK [60], LIME [12], MEF [62], MIT-Adobe FiveK [63], NPE [10], and TMDIED datasets. The best results are highlighted in red, and the second-best results are highlighted in blue.
2) NON-PAIRED TEST DATASETS
As shown in Table 3, although our GPANet does not achieve the best objective evaluation results on every test dataset, it ranks first in the mean over all test datasets, further validating the robustness and generalizability of GPANet.

D. VISUAL ANALYSIS
To compare the visual performance of state-of-the-art enhancement methods, three real-world low-light images are chosen from the LOL test dataset. As shown in Fig. 3, low-light images captured in the real world are largely low in brightness and contrast, accompanied by noise that destroys image texture details. HE can enhance the image's contrast, but it introduces color distortion and amplifies noise. NPE can measure image brightness in a more natural manner than other Retinex-based methods (i.e., SRIE, LIME, and JIEP). LIME benefits from the denoising function of BM3D, which suppresses unwanted noise points and makes the result appear natural. It is challenging for camera response model-based methods to improve the brightness and contrast of images captured in excessively dark environments. Although RetinexNet decomposes low-light images into illumination and reflection components, the color of its enhanced images is distorted due to the limitations of its training data. The images enhanced by MBLLEN are over-smoothed, the edge texture information is locally lost, and the contrast remains low. KinD shows a slight overexposure phenomenon, but it has a strong perception of latent structural features and can improve image contrast while preserving texture structure. DeepUPE, DeepLPF, and DSLR struggle to extract valuable information from the dark background when the light intensity is too low. The images enhanced by DLN and Zero-DCE have low contrast, and color information is lost. Although the color distribution of RUAS is close to the ground truth, its texture information is lost and the image is locally blurred. Our GPANet benefits from the constraints of gradient prior features, which enhance the contrast and brightness of images without destroying the underlying texture and color features. Moreover, its visual performance is closer to that of real clear images.

FIGURE 3. Visual comparison of different enhancement methods on real-world low-light images from the LOL test dataset. From left to right: (a) low-light image, and restored images generated by (b) HE [4], (c) NPE [10], (d) SRIE [11], (e) LIME [12], (f) JIEP [13], (g) CRM [21], (h) Dong [17], (i) BIMEF [22], (j) DeHz [18], (k) RetinexNet [25], (l) MBLLEN [26], (m) KinD [27], (n) DeepUPE [55], (o) EnlightenGAN [29], (p) DLN [28], (q) Zero-DCE [30], (r) DeepLPF [64], (s) RUAS [33], (t) DSLR [65], and (u) the proposed GPANet, respectively.

As shown in Fig. 4, we choose the best-performing methods and compare the Sobel and Laplacian edge detection results of their enhanced images. Multi-view and multi-scale learning and reasoning enable GPANet to accurately extract gradient information and assist image enhancement in complex low-light imaging environments. To test the robustness of the proposed method in visual evaluation, we randomly select one image from each of the DICM, ExDARK, LIME, MEF, MIT-Adobe FiveK, NPE, and TMDIED datasets. Fig. 6 shows the enhancement results of different methods applied to the seven images from these test datasets, and Fig. 5 shows the enhancement results on the TMDIED dataset. After zooming in on local areas, it can be clearly seen that the gradient prior-enabled GPANet can balance image brightness enhancement and edge texture information preservation.

FIGURE 6. Visual comparison of different enhancement methods for seven real-world low-light images in different low-light scenes. From left to right: (a) low-light image, and restored images generated by (b) NPE [10], (c) LIME [12], (d) CRM [21], (e) DeHz [18], (f) KinD [27], (g) DeepUPE [55], (h) DLN [28], (i) Zero-DCE [30], (j) DSLR [65], and (k) the proposed GPANet, respectively.
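The first- and second-order edge maps compared in Fig. 4 come from standard convolution kernels. Below is a minimal numpy-only sketch of extracting both gradient priors from a grayscale image; the kernels are the textbook Sobel and Laplacian choices, not necessarily the authors' exact implementation.

```python
import numpy as np

# Textbook 3x3 kernels for the first-order (Sobel) and
# second-order (Laplacian) gradient priors.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T
LAPLACIAN = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def filter2d(img, kernel):
    """'Same'-size 2-D filtering (cross-correlation) with zero padding."""
    p = np.pad(img, 1)
    out = np.empty(img.shape, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def gradient_priors(gray):
    """Return (Sobel magnitude, Laplacian response) of a 2-D image."""
    first = np.hypot(filter2d(gray, SOBEL_X), filter2d(gray, SOBEL_Y))
    second = filter2d(gray, LAPLACIAN)
    return first, second

# Toy image: a vertical step edge at column 4.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
first, second = gradient_priors(img)
```

The Sobel magnitude responds on both sides of the step, while the Laplacian changes sign across it, which is why the two priors carry complementary edge information.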

E. USER STUDY
Furthermore, we conduct a user study to understand how our model differs from other state-of-the-art methods; the evaluated methods are LIME, CRM, DeHz, KinD, DeepUPE, DeepLPF, Zero-DCE, DSLR, and GPANet. The test dataset consists of 20 images from the datasets mentioned in Section IV-A. Each user assigns each enhanced image a score between 1 and 10. To be fair and impartial, the results of each method are presented to the users anonymously. As shown in Fig. 7, our method obtains the highest single score of 10 and ranks first with an average score of 8.288. Therefore, we can further confirm the performance of the proposed GPANet in low-light image enhancement.

F. ABLATION STUDY
In this section, we verify the value of gradient prior information in GPANet. We again utilize the 15 images from the LOL test dataset for the ablation study. According to the index values reported in Table 6, the objective evaluation performance is lowest when no gradient prior information is included. Adding either the Sobel or the Laplacian prior information improves the performance of the network. When both types of priors are used for deep network learning and inference, the PSNR, SSIM, FSIM, VSI, and LOE performance improves by 1.08, 0.017, 0.014, 0.015, and 12.19, respectively, compared to the case without gradient information.
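For reference, the PSNR gain above is measured in dB. A small self-contained sketch of the standard PSNR computation (the other metrics follow their usual published definitions):

```python
import numpy as np

def psnr(ref, est, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((ref.astype(float) - est.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

ref = np.full((4, 4), 0.5)
noisy = ref + 0.1                 # uniform offset -> MSE = 0.01
print(round(psnr(ref, noisy), 2))  # → 20.0
```

A 1.08 dB PSNR gain thus corresponds to roughly a 22% reduction in mean squared error against the reference image.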

G. OBJECT DETECTION AFTER VISIBILITY ENHANCEMENT
We investigate how high-level vision tasks interact with low-light enhancement by detecting objects in under-lit scenes. The basic recognition module is the YOLOv4 model [73]. We then combine the GPANet and YOLOv4 models for joint optimization. We use the PASCAL VOC2007 dataset as the ground truth to generate synthetic low-light images. As demonstrated in Table 7, the images enhanced by our GPANet yield the most obvious improvement in object detection accuracy. GPANet optimizes the enhancement through auxiliary edge gradient information and has a better ability to extract latent texture features of images. Therefore, the GPANet-driven object detection network achieves higher detection accuracy than the other three methods.
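The text does not detail how the synthetic low-light VOC images are produced; a common degradation model combines a gamma curve, global dimming, and additive noise. The sketch below follows that assumption, and the parameters `gamma`, `peak_scale`, and `noise_std` are illustrative values, not the paper's.

```python
import numpy as np

def synth_low_light(img, gamma=3.0, peak_scale=0.4, noise_std=0.02, seed=0):
    """Darken a normal-light image in [0, 1]: gamma curve, brightness
    scaling, and additive Gaussian noise (an assumed degradation model)."""
    rng = np.random.default_rng(seed)
    dark = peak_scale * np.power(np.clip(img, 0.0, 1.0), gamma)
    dark = dark + rng.normal(0.0, noise_std, img.shape)
    return np.clip(dark, 0.0, 1.0)

# Toy "normal-light" frame; real inputs would be VOC2007 images.
bright = np.full((8, 8, 3), 0.8)
dark = synth_low_light(bright)
```

Pairs produced this way let the enhancement network and the detector be evaluated against ground-truth boxes annotated on the original well-lit images.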

H. RUNNING TIME
Although modern computers have significantly improved computing power, real-time processing of high-resolution videos is still a severe challenge for most algorithms. This section compares the running time of traditional and learning-based methods on the same computer mentioned above, i.e., Windows OS, an Intel(R) Core(TM) i9-10850K CPU @ 3.60 GHz, and an Nvidia GeForce RTX 2080TI GPU. As shown in Table 8, we perform low-light enhancement on a 2K image (2560 × 1440). The original HE algorithm needs to count the statistics of the entire image; although its computing time is short, it is still insufficient to meet real-time processing needs. In general, Retinex-based methods take a longer time. Although the learning-based methods make use of a powerful GPU, methods other than Zero-DCE still require additional computing time for mapping. In comparison with the other methods, our GPANet is capable of meeting the requirements for real-time enhancement of low-light videos while also providing superior enhancement effects.
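A simple wall-clock protocol for per-image timings of the kind reported in Table 8 can be sketched as follows. The gamma-correction "enhancer" and the frame size are stand-ins (Table 8 uses full 2560 × 1440 frames), and GPU-based models would additionally need device synchronization before reading the clock.

```python
import time
import numpy as np

def time_enhancement(enhance, img, warmup=1, runs=5):
    """Average wall-clock time (seconds) of `enhance` over several runs."""
    for _ in range(warmup):            # warm caches / lazy initialisation
        enhance(img)
    t0 = time.perf_counter()
    for _ in range(runs):
        enhance(img)
    return (time.perf_counter() - t0) / runs

# Small stand-in frame and a trivial stand-in "enhancer" (gamma correction).
img = np.random.default_rng(0).random((360, 640, 3))
secs = time_enhancement(lambda x: np.power(x, 0.5), img)
```

For 25-fps video, "real-time" means the measured per-frame time must stay under 40 ms, which is the budget against which the Table 8 figures are judged.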

V. CONCLUSION AND DISCUSSION
This paper proposes a gradient prior-aided enhancement solution and implements it by introducing first-order (i.e., Sobel Filter) and second-order (i.e., Laplacian Filter) gradient prior features to handle low-light image enhancement.
The key is to construct an end-to-end network consisting of an encoder, a multi-branch topology module, and three sub-decoders. GPANet is able to improve the enhancement performance by fusing and decomposing multi-view and multi-scale features. Extensive experimental results demonstrate the advantages of GPANet over other methods from both qualitative and quantitative perspectives in maritime-related and other natural low-light scenes. To make our work more reliable and applicable, the research presented here can be extended in the following directions.
• To further improve the robustness and effectiveness of GPANet, the loss function weights should not rely solely on experimental experience. They should be adaptively adjusted and updated for different brightness distributions, scene compositions, etc. To this end, we will consider using an uncertainty-based weighting strategy to adaptively adjust the constant weights.
• Sobel and Laplacian are basic first- and second-order gradient operators, and the gradient information they generate has limitations such as noise interference and insensitivity to grayscale changes. To extract edge gradient features from dark backgrounds more accurately, we will try different gradient detection operators or algorithms, such as Laplacian of Gaussian, Prewitt, and Canny, which can help balance edge extraction and brightness enhancement.
• The generalization ability of deep networks in different low-light scenarios remains a challenge. Therefore, how to adapt the network to different low-light scenarios is one of the urgent problems to be solved. Introducing physical prior knowledge to constrain the brightness, contrast, and structure of the generated image is a crucial way to improve the stability of the network in complex low-light imaging. We will explore more physical priors and ways of fusing them into deep networks to develop better solutions for low-light image enhancement.

China, in 2020. She is currently working as an Information Engineer with Wuhan Baosight Software Company Ltd., Wuhan, China. Her research interests include computer vision and machine learning.

XIANJUN HU received the Ph.D. degree in system engineering from the Naval University of Engineering, Wuhan, China, in 2013. He is currently a Senior Lecturer with the Naval University of Engineering. His research interests include system optimization, artificial intelligence, and remote sensing applications in coastal and lake ecosystems.