Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance

Real-time transportation surveillance is an essential part of the intelligent transportation system (ITS). However, images captured under low-light conditions often suffer from poor visibility and various types of degradation, such as noise interference and vague edge features. With the development of imaging devices, the resolution of visual surveillance data keeps increasing (e.g., 2K and 4K), which places stricter requirements on the efficiency of image processing. To satisfy the requirements on both enhancement quality and computational speed, this paper proposes a double domain guided real-time low-light image enhancement network (DDNet) for ultra-high-definition (UHD) transportation surveillance. Specifically, we design an encoder-decoder structure as the main architecture of the learning network. In particular, the enhancement process is divided into two subtasks (i.e., color enhancement and gradient enhancement) via the proposed coarse enhancement module (CEM) and LoG-based gradient enhancement module (GEM), which are embedded in the encoder-decoder structure. This enables the network to enhance color and edge features simultaneously. Through decomposition and reconstruction in both the color and gradient domains, our DDNet can restore the detailed feature information concealed by darkness with better visual quality and efficiency. Evaluation experiments on standard and transportation-related datasets demonstrate that DDNet provides superior enhancement quality and efficiency compared with state-of-the-art methods. Besides, object detection and scene segmentation experiments indicate its practical benefits for higher-level image analysis under low-light environments in ITS. The source code is available at https://github.com/QuJX/DDNet.


I. INTRODUCTION
WITH the rapid growth of intelligent transportation systems (ITS), more and more visual sensors are employed for transportation surveillance. However, when the imaging device operates in low-light environments, the acquired images often suffer from poor sharpness, low contrast, and undesirable noise [1]. The poor imaging quality makes it difficult to see the captured scenes clearly and poses great challenges to higher-level image analysis, such as object detection [2]-[4] and scene segmentation [5]-[7]. Although some imaging devices attempt to illuminate the darkness with extra artificial light such as infrared and ultraviolet flashes [8], cost and poor imaging quality are their main limitations. Therefore, an effective low-light image enhancement method is necessary for nocturnal transportation surveillance. Moreover, with the development of imaging and parallel computing devices, the resolution of captured visual surveillance data keeps increasing, from standard definition (SD, 480p, 720p) and high definition (HD, 1080p) to ultra-high definition (UHD, 4K). Image processing algorithms have also been widely investigated in multiple transportation scenes, e.g., parking lot [9], waterway [10], and airport surveillance [11]. The trade-off between visibility enhancement and computational complexity is a major problem to be solved in current transportation applications [12].

A. Motivation
Real-time transportation surveillance places two main requirements on low-light image enhancement: effectiveness and efficiency. Specifically, the main targets of transportation surveillance are vehicles [13], pedestrians [14], vessels [15], etc. It is thus necessary to brighten dark scenes effectively while suppressing noise and preserving features. Traditional low-light enhancement methods mainly improve illumination by enhancing contrast globally (e.g., histogram equalization (HE) [16]), which improves visual perception but does not suppress noise effectively. Compared with traditional methods, learning methods are more robust to noise owing to the strong learning ability of deep neural networks, and GPU acceleration can also improve their computational efficiency. However, edge features, which are especially important for higher-level image analysis in transportation scenes such as vehicle detection [13], pedestrian detection [18], and scene segmentation [19], are rarely considered in previous low-light image enhancement methods [17]. In practical applications, the frame rate of most transportation surveillance cameras is less than 30 FPS [20], which is thus the basic efficiency requirement for real-time image processing methods. However, most previous low-light image enhancement methods cannot satisfy this requirement [21]. Therefore, in most cases, UHD images are first resized to smaller scales to reduce computational complexity. Unfortunately, resizing severely degrades image quality. As shown in Fig. 2, the resizing operation causes significant detail loss and blurs vehicle license plates. Several methods, such as Zero [22], SCI [23], and UHDFour [24], achieve real-time processing on UHD images, but their results are unsatisfactory in ITS scenes.
To achieve effective real-time low-light image enhancement for UHD transportation surveillance, we propose a double domain guided network (DDNet). It achieves superior noise suppression and brightness enhancement by enhancing feature maps in the color and gradient domains simultaneously. Running-time experiments demonstrate its efficiency on UHD images. Furthermore, object detection and scene segmentation experiments indicate its practical benefits for higher-level image analysis. In general, this paper provides an effective and efficient method to improve transportation surveillance under low-light environments.

B. Contributions
In this paper, we propose a real-time low-light image enhancement network for UHD transportation surveillance, which achieves competitive enhancement quality and computational efficiency. The main contributions of the proposed method can be summarized as follows:

II. RELATED WORK
In this section, we briefly introduce the previous low-light image enhancement methods (i.e., traditional and learning methods) and their applications in ITS.

A. Traditional Methods
Traditional methods employ mathematical models to enhance low-light images. Histogram equalization (HE) [16] flattens the histogram and expands the dynamic range of intensity to improve image brightness. However, it is challenging for HE-based methods to discriminate noise from useful information: excessive noise corrupts the histogram distribution, making it harder to extract reliable information from low-light backgrounds. Retinex theory [25] and related methods [26]-[28] decompose a low-light image into reflectance and illumination components to recover the underlying normal-light image, aiming at a better balance between brightness enhancement and noise suppression. However, Retinex-based methods have two major drawbacks. First, insufficient brightness enhancement in complex scenes results in unsatisfactory enhanced images. Second, they have difficulty balancing noise suppression and edge feature preservation. Ying et al. [29], [30] proposed a camera response model to improve low-light image enhancement. Dong [31] and DeHz [32] enhanced low-light images based on the atmospheric scattering model. SRRP [33] preserved the smoothness of the original illumination to achieve satisfactory enhancement. However, these methods fail to simultaneously achieve satisfactory detail preservation, illumination enhancement, and computational efficiency for real-time UHD transportation surveillance.
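To make the HE step concrete, here is a minimal pure-Python sketch of global histogram equalization on a flat list of 8-bit intensities. It uses the classic CDF-remapping formulation, which we assume is representative of the HE baseline [16]; the function name `equalize_histogram` is our own.

```python
from typing import List


def equalize_histogram(pixels: List[int], levels: int = 256) -> List[int]:
    """Global histogram equalization for a flat list of integer intensities."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function (CDF) of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)  # CDF value at the darkest occupied level
    n = len(pixels)
    if n == cdf_min:  # constant image: there is no range to stretch
        return list(pixels)

    # Remap each level so the output CDF is approximately uniform on [0, levels - 1].
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0 for c in cdf]
    return [lut[p] for p in pixels]


print(equalize_histogram([10, 10, 20, 30]))  # a dark patch spread over the full range
```

Because the mapping depends only on intensity counts, noisy pixels in dark regions are stretched exactly like signal pixels, which is the noise-amplification weakness of HE noted above.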

B. Learning Methods
In recent years, deep learning [34] has achieved widespread success in diverse computer vision tasks, such as object detection, scene segmentation, and low-light image enhancement. Based on Retinex theory, many methods employed CNNs to model the decomposition and enhancement of low-light images, e.g., KinD [35], RetinexNet [36], RUAS [37], URetinex-Net [38], and LR3M [39]. Meanwhile, many multi-branch networks [40]-[45] were designed to tackle different subtasks in low-light enhancement, e.g., noise reduction and color restoration. Beyond supervised training, EnlightenGAN [46] and DRBN [47] enhanced low-light images with unsupervised and semi-supervised networks, respectively. LLFormer [48] used a vision transformer to achieve UHD low-light image enhancement. Despite these considerable efforts, the running time of most previous works is unsuitable for real-time UHD transportation surveillance. Besides, edge feature restoration, which is particularly important in transportation scenes, has rarely been considered. Lu et al. [49] proposed a gradient prior-aided neural network that employs Laplacian and Sobel filters to guide the enhancement. However, these filters are sensitive to noise interference, which is harmful to image quality enhancement. In this paper, we employ the more noise-robust LoG operator to extract gradient information and enhance it within the network to obtain better edge features.

C. Applications in Transportation System
Efficient low-light image enhancement methods are necessary for nocturnal surveillance in ITS. Therefore, many efforts have been devoted to overcoming the restriction of poor illumination. For instance, a CycleGAN-based image enhancement method was proposed for railway inspections [50], and an attention-guided lightweight generative adversarial network was designed for maritime video surveillance [51]. Guo et al. [52] enhanced low-light maritime transportation scenes with a lightweight neural network. Besides, [53] and [54] have demonstrated the benefits of low-light enhancement for improving the accuracy of higher-level image analysis tasks in ITS.

III. DOUBLE DOMAIN GUIDED LOW-LIGHT IMAGE ENHANCEMENT NETWORK
In this section, we first introduce the Laplacian of Gaussian (LoG) operator in Section III-A. The architecture of DDNet and the implementation details of the self-calibrated convolutions are then presented in Sections III-B and III-C, respectively. The joint loss function is introduced in Section III-D.

A. Laplacian of Gaussian Operator
Transportation surveillance under low-light environments suffers from low brightness along with vague edge features, which causes considerable difficulties for higher-level visual tasks in ITS [55]. Therefore, it is necessary to take the restoration of edge features into consideration [56]. The Laplacian operator is the sum of the second-order partial derivatives of the gray image function in the horizontal and vertical directions [57]. It responds to areas where the intensity changes rapidly and can therefore be used to extract image edge features. The Laplacian $L(u, v)$ of the image intensity $I$ at pixel $(u, v)$ is given by
$$L(u,v) = \frac{\partial^2 I}{\partial u^2} + \frac{\partial^2 I}{\partial v^2}.$$
A single image can be represented by a discrete set of pixel values. The gradient feature map can thus be generated through a discrete second-order-derivative convolutional kernel $K_L$ that approximates the Laplacian operator, i.e.,
$$I_L = I \otimes K_L,$$
where $\otimes$ denotes convolution. However, images captured in low-light environments commonly contain unwanted noise, and the Laplacian's sensitivity to noise makes it challenging to accurately extract gradient features from low-light images. To this end, we first reduce the interference of noise by Gaussian smoothing, i.e., by convolving the image with the Gaussian filter
$$G(u,v) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{u^2+v^2}{2\sigma^2}\right),$$
where $\sigma$ is the Gaussian standard deviation. Benefiting from the associative property of convolution, $(I \otimes G) \otimes K_L = I \otimes (G \otimes K_L)$, we obtain a hybrid filter by convolving the Gaussian smoothing filter with the Laplacian filter to generate LoG-based gradient features. The 2-D LoG function centered on zero with Gaussian standard deviation $\sigma$ is given by
$$\mathrm{LoG}(u,v) = -\frac{1}{\pi\sigma^4}\left(1-\frac{u^2+v^2}{2\sigma^2}\right)\exp\left(-\frac{u^2+v^2}{2\sigma^2}\right).$$
The LoG convolutional kernel is small and its parameters are pre-calculated, so it brings little computational burden. In this work, the convolutional kernel parameters of LoG can be given as follows. In the network, we first generate the gradient map of the low-light image via the LoG-based operator, which is then enhanced in the GEM, as shown in Fig. 3.
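The LoG filtering described above can be sketched in NumPy as follows. The 5×5 size and σ = 1.0 are illustrative assumptions (the paper's pre-calculated kernel values are not reproduced here), and `log_kernel`/`filter2d` are hypothetical helper names. Subtracting the kernel mean is a common practical choice so that flat regions produce zero response; we assume it rather than take it from the paper.

```python
import numpy as np


def log_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Sample the zero-centred 2-D LoG function on a size x size grid."""
    half = size // 2
    v, u = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    r2 = u**2 + v**2
    k = -1.0 / (np.pi * sigma**4) * (1 - r2 / (2 * sigma**2)) * np.exp(-r2 / (2 * sigma**2))
    # Truncating the infinite LoG leaves a small non-zero sum; removing the mean
    # restores zero response on constant regions.
    return k - k.mean()


def filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Same'-size filtering with zero padding. The LoG kernel is symmetric,
    so correlation (computed here) equals convolution."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out


# A vertical step edge: the response crosses zero at the edge and vanishes on flat areas.
step = np.zeros((9, 9))
step[:, 5:] = 100.0
print(np.round(filter2d(step, log_kernel())[4], 1))
```

Applying `filter2d` to each color channel separately is one straightforward way to obtain a per-channel gradient map to concatenate with the low-light input.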

B. Network Architecture
An ordinary neural network cannot simultaneously and accurately generate the normal-light image and the gradient feature map from a low-light image. We thus use a multistage architecture to perform fusion-decomposition-fusion on the color and gradient domains. Fig. 4 depicts the architecture of our DDNet. Specifically, we first concatenate the low-light images and their corresponding LoG-based gradient feature maps and feed them into the network. The proposed architecture includes six self-calibrated convolutions with attention modules (ScCAMs) in the peripheral encoder-decoder, GEM, and CEM, respectively. As introduced in Section III-C, the ScCAMs leverage spatial attention to identify the locations of valuable information within the feature maps, which are then used for self-calibrated convolution. This enables the convolutional modules to extract more important features without incurring additional computational costs. Additionally, the color and gradient feature maps share similar structures, e.g., the same size (width and height) and intensity range ([0, 255]), allowing ScCAM to effectively extract and enhance spatial features in the gradient and color domains simultaneously. Therefore, the potential spatial features of the gradient and color domains are effectively mined and enhanced during encoding and decoding in GEM and CEM. The outputs of GEM and CEM, as well as the outputs of the preceding encoders, are then fed to the final feature fusion decoder, which reconstructs the normal-light image from the fused feature map. Note that during training, the enhanced gradient and color maps are generated by their respective decoders, which are constrained by individual loss functions to guarantee the restoration of both gradient and color information, as introduced in Section III-D. Owing to the comprehensive enhancement in both domains with GEM and CEM, the proposed DDNet restores low-light images with clear edges and natural colors.
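The data flow of Fig. 4 can be summarized with the toy wiring sketch below. Every learned block (encoder, CEM, GEM, decoders) is replaced by a hypothetical random 1×1 channel-mixing stand-in, the channel widths are arbitrary, and a three-channel gradient map is assumed; only the wiring (double-domain input, two auxiliary outputs, fused final decoder) mirrors the description above.

```python
import numpy as np

rng = np.random.default_rng(0)


def placeholder_block(x: np.ndarray, c_out: int) -> np.ndarray:
    """Hypothetical stand-in for a learned ScCAM stage: random 1x1 channel mixing."""
    w = rng.standard_normal((c_out, x.shape[0])) * 0.1
    return np.einsum("oc,chw->ohw", w, x)


def ddnet_forward(rgb: np.ndarray, grad: np.ndarray):
    """Toy wiring of the double-domain pipeline; inputs are C x H x W arrays."""
    x = np.concatenate([rgb, grad], axis=0)          # 6 x H x W double-domain input
    feat = placeholder_block(x, 16)                  # peripheral encoder
    f_color = placeholder_block(feat, 16)            # CEM features
    f_grad = placeholder_block(feat, 16)             # GEM features
    coarse = placeholder_block(f_color, 3)           # CEM decoder: enhanced color map (supervised in training)
    grad_out = placeholder_block(f_grad, 3)          # GEM decoder: enhanced gradient map (supervised in training)
    fused = np.concatenate([f_color, f_grad, feat], axis=0)  # GEM/CEM outputs + encoder features
    final = placeholder_block(fused, 3)              # fusion decoder: final enhanced image
    return final, coarse, grad_out
```

Returning the two auxiliary maps alongside the final image is what allows each domain to receive its own loss term during training.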

C. ScCAM
To reduce the number of parameters, most deep learning-based lightweight low-light enhancement networks extract hierarchical features progressively. However, this strategy leads to insufficient utilization of low-frequency information, which results in poor restoration of image details. Meanwhile, self-calibrated convolutions (SCCs) perform satisfactorily in a variety of low-level and higher-level vision tasks [58]. SCCs can efficiently extract multi-domain and multi-scale feature information to guide the enhancement process without additional computational effort. In this section, we propose the ScCAM to build the encoder-decoder structures. It mainly consists of two parts (i.e., the upper and lower branches), as shown in Fig. 5. In particular, the upper branch computes attention information by introducing the spatial attention module, which can be expressed as follows, where $x_{\mathrm{in}}$, $f_{1\times1}$, $f_{3\times3}$, $F_{\mathrm{sam}}$, $M(\cdot;\cdot)$, and $F_{\mathrm{scm}}$ represent the input of the convolutional layer, the convolution with 1×1 kernel size, the convolution with 3×3 kernel size, the spatial attention module, the multiplication function, and the standard convolution module, respectively. In addition, the lower branch uses the standard convolution module to recover spatial-domain information, which can be expressed as follows. The output features of the two branches are then concatenated and fed into a 1×1 convolution layer for information fusion. To speed up model training, a local residual path is employed to generate the final output feature. The output $y_{\mathrm{ScCAM}}$ of ScCAM is thus given by
$$y_{\mathrm{ScCAM}} = x_{\mathrm{in}} + f_{1\times1}\big((y_{\mathrm{up}};\, y_{\mathrm{low}})\big),$$
where $(\cdot;\cdot)$ represents the concatenation operation, and $y_{\mathrm{up}}$ and $y_{\mathrm{low}}$ denote the outputs of the upper and lower branches, respectively.
1) Spatial Attention Module: In low-light image enhancement, the complexity of scene information increases the difficulty of enhancement. Inspired by the human visual cortex, an attention mechanism can analyze complex scene information more quickly and effectively. The spatial attention module identifies where the valuable information on the feature map is located, which helps the network focus more precisely on it. As shown in Fig. 5, to achieve spatial attention, we first apply average pooling and max pooling along the channel dimension. The resulting maps are then concatenated and fed into a convolution layer with a 7 × 7 kernel to generate the final spatial attention map. The spatial attention function can be expressed as
$$F_{\mathrm{sam}}(I) = S\Big(f_{7\times7}\big((F^{s}_{\mathrm{avg}}(I);\, F^{s}_{\mathrm{max}}(I))\big)\Big),$$
where $I$, $F^{s}_{\mathrm{avg}}$, $F^{s}_{\mathrm{max}}$, $f_{7\times7}$, and $S(\cdot)$ represent the input of the spatial attention module, average pooling, max pooling, the convolution with 7×7 kernel size, and the sigmoid function, respectively.
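A NumPy sketch of the spatial attention module described above (channel-wise average and max pooling, concatenation, a 7×7 convolution, and a sigmoid). The 7×7 kernel is passed in rather than learned and no bias term is used; both are simplifications, and `spatial_attention` is a hypothetical name.

```python
import numpy as np


def spatial_attention(feat: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """feat: C x H x W feature map; weight: 2 x k x k kernel (k odd, e.g. 7).
    Returns a 1 x H x W attention map with values in [0, 1]."""
    avg = feat.mean(axis=0, keepdims=True)          # average pooling over channels
    mx = feat.max(axis=0, keepdims=True)            # max pooling over channels
    pooled = np.concatenate([avg, mx], axis=0)      # 2 x H x W
    k = weight.shape[-1]
    p = k // 2
    padded = np.pad(pooled, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    h, w = feat.shape[1:]
    logits = np.zeros((1, h, w))
    for dy in range(k):
        for dx in range(k):
            # Weighted sum over the two pooled channels for this kernel tap.
            logits[0] += np.einsum("c,chw->hw", weight[:, dy, dx], padded[:, dy:dy + h, dx:dx + w])
    return 1.0 / (1.0 + np.exp(-logits))           # sigmoid
```

The map is broadcast over channels when multiplied with a feature tensor (e.g. `feat * att`), which corresponds to the multiplication $M(\cdot;\cdot)$ in the upper branch; which tensors enter that product is not recoverable from the text here, so that pairing is an assumption.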
2) Standard Convolution Module: In the standard convolution module, a convolution layer is first employed to guarantee the learning ability. Layer normalization (LN) is independent of the batch size, which reduces the computational complexity of calculating normalization statistics. Furthermore, the Parametric Rectified Linear Unit (PReLU) is employed to perform nonlinear activation on the normalized data, which improves the generalization ability of the network in complex low-light scenes. The standard convolution function can be expressed as follows, where $w$, $LN(\cdot)$, and $PR(\cdot)$ represent the input of the standard convolution module, layer normalization, and the parametric rectified linear unit, respectively.
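The normalization, activation, and fusion steps of ScCAM can be sketched as below. Assumptions: LN normalizes over all (C, H, W) elements of a single sample without learnable affine parameters, the PReLU slope is fixed at 0.25 (PyTorch's default initial value for `nn.PReLU`), the convolution that precedes LN is omitted, and the two branch outputs are supplied directly. The fusion follows the concatenation, 1×1 convolution, and local residual path described in Section III-C.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize one C x H x W sample; no other sample in a batch is involved."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def prelu(x: np.ndarray, a: float = 0.25) -> np.ndarray:
    """Identity for non-negative inputs, slope `a` for negative ones."""
    return np.where(x >= 0, x, a * x)


def standard_conv_tail(conv_out: np.ndarray, a: float = 0.25) -> np.ndarray:
    """LN followed by PReLU, applied to the output of the module's convolution."""
    return prelu(layer_norm(conv_out), a)


def sccam_fuse(x_in: np.ndarray, y_up: np.ndarray, y_low: np.ndarray, w_fuse: np.ndarray) -> np.ndarray:
    """Concatenate both branch outputs, mix them with a 1x1 convolution
    (per-pixel channel mixing), and add the local residual path."""
    cat = np.concatenate([y_up, y_low], axis=0)         # 2C x H x W
    fused = np.einsum("oc,chw->ohw", w_fuse, cat)       # C x H x W
    return x_in + fused
```

With an all-zero fusion kernel the module reduces to the identity, which illustrates why the residual path makes the block easy to optimize early in training.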

D. Loss Function
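To effectively constrain each component of DDNet, we propose a joint loss function $L_{\mathrm{total}}$ consisting of a Laplacian-based gradient consistency loss $L_{\mathrm{Lap}}$, a coarse enhancement loss $L_{\mathrm{Coarse}}$, and an SSIM-based fine enhancement loss $L_{\mathrm{SSIM}}$, i.e.,
$$L_{\mathrm{total}} = \omega_1 L_{\mathrm{Lap}} + \omega_2 L_{\mathrm{Coarse}} + \omega_3 L_{\mathrm{SSIM}},$$
where $\omega_1$, $\omega_2$, and $\omega_3$ are the weights of each loss, which are set to 0.2, 0.2, and 0.6, respectively. The GEM and CEM are designed to enhance the gradient and color features, respectively, and are constrained by the ℓ2 loss function.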
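$L_{\mathrm{Lap}}$ and $L_{\mathrm{Coarse}}$ are given by
$$L_{\mathrm{Lap}} = \frac{1}{N}\sum_{p=1}^{N}\sum_{i}\big(\hat{I}^{l}_{i}(p) - I^{l}_{i}(p)\big)^2, \qquad L_{\mathrm{Coarse}} = \frac{1}{N}\sum_{p=1}^{N}\sum_{i}\big(\hat{I}^{c}_{i}(p) - I^{c}_{i}(p)\big)^2,$$
where $N$ is the number of pixels, and $\hat{I}^{l}_{i}(p)$ and $I^{l}_{i}(p)$ are the $i$-th color channel of pixel $p$ in the enhanced gradient map and in the gradient map of the ground truth, respectively. $\hat{I}^{c}_{i}(p)$ and $I^{c}_{i}(p)$ represent the corresponding values in the color domain.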
To finely fuse the gradient and coarse enhancement features, we use structural similarity (SSIM) [59] as the constraint on the final enhancement to further refine the learning and mapping, i.e.,
$$L_{\mathrm{SSIM}} = 1 - \frac{1}{3}\sum_{i}\mathrm{ssim}\big(\hat{I}^{f}_{i}, I_{i}\big),$$
where $\hat{I}^{f}_{i}$ is the $i$-th color channel of the final enhanced image, and $I_{i}$ is that of the ground truth. $\mathrm{ssim}(\cdot,\cdot)$ measures structural similarity in terms of luminance, contrast, and structure.
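The joint objective can be sketched in NumPy for channel-first (C, H, W) arrays with the weights 0.2, 0.2, and 0.6 given above. The ℓ2 terms average over pixels and sum over color channels. The SSIM term, however, uses a single global window per channel with the usual constants K1 = 0.01 and K2 = 0.03, a simplification of the sliding Gaussian-window SSIM of [59]; the 1 − SSIM form and all function names are assumptions.

```python
import numpy as np


def l2_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Squared error summed over channels, averaged over the H*W pixels."""
    n_pixels = pred.shape[1] * pred.shape[2]
    return float(np.sum((pred - target) ** 2) / n_pixels)


def global_ssim(x: np.ndarray, y: np.ndarray, data_range: float = 1.0) -> float:
    """Single-window SSIM over a whole 2-D channel (simplified)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2)))


def total_loss(grad_pred, grad_gt, coarse_pred, coarse_gt, final_pred, final_gt,
               weights=(0.2, 0.2, 0.6)) -> float:
    l_lap = l2_loss(grad_pred, grad_gt)          # gradient-domain consistency
    l_coarse = l2_loss(coarse_pred, coarse_gt)   # color-domain coarse enhancement
    ssim = np.mean([global_ssim(final_pred[i], final_gt[i]) for i in range(final_pred.shape[0])])
    l_ssim = 1.0 - ssim                          # fine enhancement
    w1, w2, w3 = weights
    return w1 * l_lap + w2 * l_coarse + w3 * l_ssim
```

In an actual training loop these terms would be written with differentiable tensor operations; the NumPy version only illustrates how the three domain-specific terms are weighted and summed.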

IV. EXPERIMENTS AND ANALYSIS
In this section, the experimental details, including datasets, evaluation metrics, and the running platform, are first introduced. To demonstrate the superiority of DDNet, qualitative and quantitative comparisons with several state-of-the-art methods on standard and transportation-related datasets are then presented. To validate the rationality of the network design, we conduct ablation experiments on each module. Finally, experiments on running time, object detection, and scene segmentation demonstrate the practical contributions of the proposed method to real-time UHD transportation surveillance in ITS.
A. Implementation Details
1) Datasets: It is commonly intractable to capture real-world low/normal-light image pairs, which poses great challenges for data-driven image enhancement networks. Therefore, to improve the robustness of DDNet in complex natural environments, we use real-captured and synthesized low-light images simultaneously. The most commonly used dataset is LOL [36], which contains 1500 pairs of low-light images. Among them, 500 pairs are captured in real scenes, and the rest are synthesized by adjusting the Y channel of YCbCr images through the interface of Adobe Lightroom software¹.
Besides LOL, to improve the enhancement effect on transportation surveillance scenes, we select 1000 clear outdoor images from the PASCAL VOC 2007 [60], COCO [61], and DETRAC [62] datasets and synthesize low-light images with another method, which multiplies all image pixels by a specific coefficient. The synthesized image $L(x)$ is generated by
$$L(x) = m(x)\,C(x),$$
where $C(x)$ is the clear image, and $m(x)$ is the coefficient, a random number between 0.1 and 0.9. To evaluate the generalization ability of DDNet, besides the LOL dataset, we also select representative low-light images from the DICM [22], LIME [27], MEF [63], and TMDIED datasets for testing.
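The darkening step can be sketched as follows. We read "a specific coefficient to all image pixels" as one scalar m per image drawn uniformly from [0.1, 0.9); a spatially varying m(x) would be a different design. Images are assumed to be floats in [0, 1], and `synthesize_low_light` is a hypothetical name.

```python
import numpy as np


def synthesize_low_light(clear: np.ndarray, rng: np.random.Generator,
                         low: float = 0.1, high: float = 0.9):
    """Darken a clear image by a single random coefficient m ~ U(low, high)."""
    m = rng.uniform(low, high)
    return m * clear, m


rng = np.random.default_rng(42)
clear = np.full((4, 4, 3), 0.8)          # toy clear image
dark, m = synthesize_low_light(clear, rng)
```

Since the pair (dark, clear) differs only by a global scale, such data mainly teaches brightness recovery; the noise characteristics come from the real-captured LOL pairs.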
2) Evaluation Metrics: Evaluation metrics for low-light image enhancement can be broadly classified into two groups: with or without ground-truth references. To analyze the enhancement effectiveness more comprehensively, we first use peak signal-to-noise ratio (PSNR) [64], structural similarity (SSIM) [59], and learned perceptual image patch similarity (LPIPS) [65] as reference-based metrics. Additionally, we incorporate the natural image quality evaluator (NIQE) [66] and the perception-based image quality evaluator (PIQE) [67] as no-reference metrics to quantitatively evaluate enhancement performance across diverse low-light scenarios. Larger PSNR and SSIM values, as well as smaller NIQE, PIQE, and LPIPS values, indicate better image quality.
¹ The hyperparameters of Adobe Lightroom software: Exposure (−5 + 5F), Highlights (50 min{Y, 0.5} + 75), Shadows (−100 min{Z, 0.5}), Vibrance (−75 + 75F), and Whites (16(5 − 5F)), where X, Y, and Z are random variables following the uniform distribution U(0, 1), and F = X².
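The footnote's parameter ranges can be reproduced with a small sampler. This sketch assumes X, Y, and Z are drawn independently; the dictionary keys simply mirror the slider names, and no Lightroom API is involved.

```python
import random


def sample_lightroom_params(rng: random.Random) -> dict:
    """Draw one set of slider values following the footnote's formulas."""
    x, y, z = rng.random(), rng.random(), rng.random()  # X, Y, Z ~ U(0, 1)
    f = x ** 2                                          # F = X^2
    return {
        "Exposure": -5 + 5 * f,
        "Highlights": 50 * min(y, 0.5) + 75,
        "Shadows": -100 * min(z, 0.5),
        "Vibrance": -75 + 75 * f,
        "Whites": 16 * (5 - 5 * f),
    }


params = sample_lightroom_params(random.Random(0))
```

Because squaring biases F toward 0, most samples land near the darkest Exposure setting (the mean Exposure is −5 + 5/3 ≈ −3.3), so the synthetic half of LOL is skewed toward strongly darkened images.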

TABLE II THE QUANTITATIVE COMPARISON OF NIQE BETWEEN OUR METHOD AND THE STATE-OF-THE-ARTS ON DICM [22], LIME [27], MEF [63]
3) Running Platform: DDNet is trained for 100 epochs with the Adam optimizer. The initial learning rate is 0.001 and is multiplied by 0.1 every 20 epochs. The network is trained and tested in a Python 3.7 environment using the PyTorch package. The computational device is a PC with an AMD EPYC 7543 32-Core Processor CPU and an Nvidia A40 GPU; this GPU is also widely used in industrial-grade servers (e.g., Advantech SKY-6000 series and Thinkmate GPX servers). The proposed method could thus be easily extended to higher-level visual tasks (e.g., vehicle detection and tracking) in ITS.
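The step decay above can be written as a closed-form function of the epoch. In PyTorch the same schedule corresponds to `torch.optim.lr_scheduler.StepLR` with `step_size=20` and `gamma=0.1` wrapped around `torch.optim.Adam(..., lr=1e-3)`; that mapping is our reading of the description, not the released training script.

```python
def learning_rate(epoch: int, base_lr: float = 1e-3, step: int = 20, gamma: float = 0.1) -> float:
    """Step decay: multiply the base rate by `gamma` once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


# Learning rate used in each of the 100 training epochs (epochs counted from 0).
schedule = [learning_rate(e) for e in range(100)]
```

Under this schedule the last 20 epochs run at 1e-7, so most of the useful optimization happens in the first 40 to 60 epochs.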
1) Quantitative Analysis: We first compute the objective evaluation metrics (PSNR, SSIM, NIQE, PIQE, and LPIPS) on the 15 LOL test images. As presented in Table I, LIME outperforms the Retinex-based approach NPE overall, thanks to the noise reduction achieved by BM3D. Furthermore, CRM uses a camera response model, which is more effective at extracting information from low-light backgrounds. Zero yields unsatisfactory results in extremely low-light regions. Although DLN uses both local and global features of low-light images and exhibits better generalization, its enhancement still falls short. Compared with the state-of-the-art methods, our DDNet has a clear advantage in the objective evaluation metrics with better stability, which benefits from the comprehensive guidance of both the color and gradient domains.
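For reference, PSNR scores in such a comparison are typically computed per image and then averaged over the test set. A minimal pure-Python version for flattened 8-bit images (peak value 255) follows; whether Table I averages per image in exactly this way is an assumption, and the helper names are our own.

```python
import math
from typing import Sequence, List, Tuple


def psnr(pred: Sequence[float], target: Sequence[float], peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def mean_psnr(pairs: List[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average of per-image PSNR values over (prediction, ground truth) pairs."""
    return sum(psnr(p, t) for p, t in pairs) / len(pairs)
```

Averaging per-image PSNR weights every test image equally, whereas pooling the MSE over the whole set first would let bright, high-error images dominate; papers differ on this choice.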
We also objectively evaluated images from other public datasets, including DICM [22], LIME [27], MEF [63], and TMDIED, as reported in Tables II and III. Traditional methods perform unevenly because they struggle to handle nonuniform noise. Learning methods achieve satisfactory performance in both low-light enhancement and noise suppression and thus perform better. In addition, owing to the decomposition and reconstruction of double-domain features, DDNet can effectively recover the valuable information hidden in the dark with better robustness. Therefore, its enhanced images better suit complex transportation scenes and achieve the best quantitative scores. In Fig. 6, we present the quantitative evaluation results as box plots: the first row shows the NIQE results, and the second row shows the PIQE results. These no-reference metrics indicate that our method achieves better image quality than the state-of-the-art methods.
2) Visual Analysis: To compare the visual performance of our DDNet with the state-of-the-art methods, we first analyze visual differences on the standard LOL test dataset. As shown in Fig. 7, HE significantly improves the brightness and contrast of low-light images with high computational efficiency. However, it cannot suppress noise and causes color distortion in local areas. NPE and BIMEF exhibit similar visual performance with poor contrast. Although LIME can eliminate noise in localized regions of the image, the BM3D algorithm struggles to distinguish between noise and texture information. CRM produces severely skewed colors compared with Retinex-based methods. RetinexNet demonstrates promising color extraction, but edge features are often severely compromised. MBLLEN and KinD can effectively remove unwanted noise; however, their color naturalness is often unsatisfactory. EnlightenGAN, which is trained without paired supervision, can enhance low-light images but is ineffective in extremely dark areas. Zero is lightweight and efficient, but its enhancement quality is often compromised for the sake of computational speed. DLN suffers from noise interference, which limits its effectiveness. While StableLLVE recovers a significant amount of valuable information from dark regions, the resulting image is often overexposed, leading to a grayish-white image with minimal contrast. SCI performs unsatisfactorily on extremely low-light images. By comparison, our proposed DDNet achieves a better balance between brightness enhancement and noise suppression than the current state-of-the-art methods.
Fig. 7. The visual comparisons of different enhancement methods for three typical images from the LOL dataset [36]. From left to right: (a) Low-light images, and restored images generated by (b) HE [16], (c) NPE [26], (d) LIME [27], (e) CRM [29], (f) Dong [31], (g) BIMEF [30], (h) DeHz [32], (i) RetinexNet [36], (j) MBLLEN [40], (k) KinD [35], (l) EnlightenGAN [46], (m) DLN [41], (n) Zero [22], (o) StableLLVE [42], (p) LLFlow [43], (q) MTRBNet [44], (r) SCI [23], (s) the proposed DDNet, and (t) Ground Truth, respectively.
Fig. 8. … (b) HE [16], (c) Dong [31], (d) EnlightenGAN [46], (e) DLN [41], (f) Zero [22], (g) RUAS [37], (h) LLFlow [43], (i) SCI [23], and (j) the proposed DDNet, respectively.
To verify the robustness of the proposed method in low-light transportation surveillance, we also collect UHD low-light …
Fig. 9. The qualitative results of object detection experiments on low-light transportation surveillance data, using YOLOv5 and YOLOX [3] as the basic detection methods. From left to right: (a) Low-light images, and the enhanced images of (b) KinD [35], (c) EnlightenGAN [46], (d) Zero [22], (e) RUAS [37], (f) LLFlow [43], (g) SCI [23], and (h) the proposed DDNet, respectively. DDNet is more beneficial for improving detection accuracy owing to the enhancement of edge features in the gradient domain.
3) Running Time Comparisons: To demonstrate the computational efficiency of DDNet, we compare running time together with objective indicators of enhancement performance, as shown in Table VII and Fig. 10. Running times exceeding one second are marked with '-', since such methods are too slow to be considered for UHD transportation surveillance. While delivering superior enhancement performance, our method enhances 4K images at over 35 FPS on the experimental platform, which is faster than most previous methods and meets the requirements of UHD transportation surveillance. Although Zero [22] and SCI [23] are faster, their enhancement quality is much worse than ours.
Fig. 11. … ACDC [68], which selects DAFormer [69] as the basic segmentation method. The first and third rows are raw images, and the second and fourth rows are the visualized results of scene segmentation. From top-left to bottom-right: (a) Low-light image, and the segmentation results on the enhanced images of (b) HE [16], (c) RetinexNet [36], (d) KinD [35], (e) EnlightenGAN [46], (f) Zero [22], (g) RUAS [37], (h) SCI [23], (i) the proposed DDNet, and (j) Ground Truth, respectively. The employed DAFormer is pre-trained on the Cityscapes dataset. Compared with other methods, our DDNet enables the model pre-trained on normal-light images to perform better under low-light conditions.
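The real-time criterion used in the running-time comparison reduces to per-frame arithmetic: at the 30 FPS camera rate each frame has 1000/30 ≈ 33.3 ms, and 35 FPS corresponds to about 28.6 ms per 4K frame. The helpers below are a generic wall-clock timing sketch with hypothetical names; on a GPU, pending kernels must be synchronized (e.g., with `torch.cuda.synchronize()`) before the timer is read, which this framework-agnostic version leaves to the caller.

```python
import time
from typing import Callable


def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def measure_fps(process: Callable[[], None], warmup: int = 3, runs: int = 20) -> float:
    """Average throughput of `process` after a few untimed warm-up calls."""
    for _ in range(warmup):
        process()
    start = time.perf_counter()
    for _ in range(runs):
        process()
    elapsed = time.perf_counter() - start
    return runs / elapsed


def is_real_time(fps: float, camera_fps: float = 30.0) -> bool:
    """True if processing keeps up with the camera's frame rate."""
    return fps >= camera_fps
```

Warm-up calls matter in practice because the first GPU invocations include one-off costs such as kernel selection and memory allocation that would otherwise deflate the measured FPS.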

C. Ablation Study
In this section, we verify the necessity of ScCAM and double-domain guidance, using the 15 images from the LOL test dataset as the basic reference. According to the metrics in Table IV, employing the spatial attention module (SAM) and the standard convolution module (SCM) significantly improves enhancement performance. When both SAM and SCM are employed, PSNR, SSIM, and LPIPS improve by 1.38, 0.015, and 0.019, respectively. The results on double-domain guidance are reported in Table V. The objective evaluation performance is the worst when neither the color nor the gradient domain is enhanced. Employing the coarse enhancement module (CEM) and the LoG-based gradient enhancement module (GEM) significantly improves enhancement performance. When both CEM and GEM are employed, PSNR, SSIM, and LPIPS improve by 0.85, 0.009, and 0.019, respectively.
In addition, to verify the balance between the constraints on different domains, we conduct an ablation experiment on the design of the loss function. Specifically, we train the network with different loss weights; Table VI presents the quantitative results. First, we fix the ratio between ω1 and ω2 and adjust the ratio between them and ω3. We then fix ω3 at the best value obtained and adjust the ratio between ω1 and ω2. The ablation results indicate that the current weights supervise the network better and yield more satisfactory enhancement results.

D. Improvement of Object Detection in ITS
To further demonstrate the practical benefits of DDNet for transportation surveillance, we employ YOLOv5 and YOLOX [3] to detect objects under low-light conditions and compare the detection results with and without image enhancement. Experimental images are selected from the COCO [61] and ExDark [70] datasets. Specifically, we first select 1500 transportation-related images from COCO to train the detection networks, and then perform evaluation tests on ExDark. As depicted in Fig. 9, the detection networks perform poorly in dark transportation scenes and often fail to detect objects accurately owing to low contrast and vague edge features. After enhancement, detection accuracy increases significantly. Furthermore, compared with state-of-the-art methods, images enhanced by DDNet yield superior detection performance, primarily owing to the comprehensive recovery of both color and gradient features. These findings provide evidence that DDNet holds practical benefits for low-light transportation surveillance and for higher-level visual tasks in ITS under low-light environments.

E. Improvement of Scene Segmentation in ITS
Scene segmentation is another typical higher-level visual task in transportation surveillance. To demonstrate the practical improvement our method brings to scene segmentation, we conduct a comparison experiment on ACDC [68], a real-captured transportation-related dataset under adverse visual conditions, including low light, haze, and rain. We employ DAFormer [69] with model weights pre-trained on the Cityscapes dataset, which mainly consists of normal-light images. Fig. 11 presents the visual results. As can be observed, in low-light environments the edge features of objects appear vague and the color brightness is low, making it challenging for segmentation methods to classify pixels accurately. Additionally, small objects such as distant pedestrians are difficult to classify owing to the low contrast. After low-light image enhancement, the visibility of low-light scenes is significantly improved. However, most state-of-the-art methods suffer from noise interference and color distortion, leading to erroneous segmentation, and small objects remain difficult to segment accurately because of the vague edge features. In contrast, DDNet recovers low-light images with more natural color and clearer edge features, resulting in more accurate classification of challenging pixels in the enhanced images. Overall, our method enables models pre-trained on normal-light images to perform better in low-light conditions.
V. CONCLUSION AND FUTURE PERSPECTIVES
This paper proposes a double domain guided real-time low-light image enhancement network (DDNet) for UHD transportation surveillance. Specifically, we adopt an encoder-decoder structure as the main architecture of the learning network and divide the original task into two subtasks (i.e., coarse enhancement and Laplacian of Gaussian (LoG)-based gradient enhancement). The coarse enhancement module (CEM) and the LoG-based gradient enhancement module (GEM) are proposed and embedded in the encoder-decoder structure, where they help the network efficiently enhance the color and gradient features under the constraint of the proposed joint loss function. Through the decomposition and reconstruction of both color and gradient features, DDNet perceives the detailed information concealed by the dark background more precisely. Image quality and running time experiments on standard datasets and on UHD low-light images in transportation surveillance demonstrate that DDNet satisfies the requirements of real-time transportation surveillance. Besides, compared with state-of-the-art methods, the object detection and segmentation experiments show that our method contributes more to higher-level image analysis tasks under low-light environments in ITS. This benefit mainly stems from the guidance of both the color and gradient domains.
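The LoG-based gradient domain can be illustrated with a minimal NumPy sketch that builds a Laplacian of Gaussian kernel, LoG(x, y) = −1/(πσ⁴) · (1 − (x² + y²)/(2σ²)) · exp(−(x² + y²)/(2σ²)), and convolves it with a grayscale image to obtain a gradient feature map in the spirit of Fig. 3(b). The kernel size of 9, σ = 1.4, the zero-mean normalization, and the edge padding are illustrative assumptions rather than DDNet's settings, and the learned enhancement that GEM applies to this map is not modeled. A comparable response can also be obtained with SciPy's `scipy.ndimage.gaussian_laplace`.

```python
import numpy as np


def log_kernel(size: int = 9, sigma: float = 1.4) -> np.ndarray:
    """Laplacian of Gaussian kernel, shifted to zero mean so flat regions give 0."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    r2 = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)
    k = -(1.0 / (np.pi * sigma ** 4)) * (1.0 - r2) * np.exp(-r2)
    return k - k.mean()


def log_response(gray: np.ndarray, size: int = 9, sigma: float = 1.4) -> np.ndarray:
    """Filter a 2-D grayscale image with the LoG kernel (edge padding, same size)."""
    k = log_kernel(size, sigma)
    half = size // 2
    padded = np.pad(gray.astype(np.float64), half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, k.shape)
    # The kernel is radially symmetric, so correlation equals convolution here.
    return np.einsum("ijkl,kl->ij", windows, k)


if __name__ == "__main__":
    step = np.zeros((16, 16))
    step[:, 8:] = 1.0  # vertical edge between columns 7 and 8
    resp = log_response(step)
    # Non-zero only near the edge; the response changes sign across it.
    print(np.round(resp[8, 2:14], 3))
```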
In conclusion, our work presents a real-time low-light image enhancement method for UHD transportation surveillance in ITS. Although our method obtains promising results in this study, it still faces several challenges, e.g., inadequate real-captured data and a relatively large model size. Future improvements of our method include the following.
• To overcome the shortage of real-captured data, semi-supervised architectures and generative adversarial networks (GANs) will be considered to reduce the dependence of DDNet on paired datasets.
• Although the proposed method currently achieves real-time processing for transportation surveillance, the model is not yet lightweight enough. In the future, we will consider employing pruning techniques [71] to build more lightweight models.
• To handle the blurred appearance of fast-moving objects in real-time transportation surveillance (e.g., vehicles on expressways), we will consider utilizing multi-task learning to achieve image deblurring and enhancement simultaneously.

Fig. 1. Illustration of our DDNet for real-time low-light transportation surveillance in different practical scenes.

Fig. 2. Comparison of low-light enhancement results on UHD images in transportation surveillance. From left to right: (a) raw 4K low-light image, (b) enhanced result after resizing the image to 1080p, and (c) enhanced result at the original 4K resolution. The resizing operation clearly causes significant detail loss on UHD images.
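The detail loss in Fig. 2(b) follows from simple arithmetic: resizing 3840 × 2160 to 1920 × 1080 halves each axis, so every 1080p pixel must summarize four 4K pixels. The minimal NumPy sketch below assumes 2 × 2 box averaging for downscaling and nearest-neighbour upscaling (not necessarily the resizing used to produce Fig. 2). It shows that structure at the one-pixel scale is averaged away and cannot be recovered by upscaling, while structure at the two-pixel scale survives.

```python
import numpy as np


def box_downsample_2x(img: np.ndarray) -> np.ndarray:
    """Average non-overlapping 2x2 blocks (e.g., 3840x2160 -> 1920x1080)."""
    h, w = img.shape
    img = img[: h - h % 2, : w - w % 2]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def nearest_upsample_2x(img: np.ndarray) -> np.ndarray:
    """Repeat every pixel into a 2x2 block to return to the original size."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def roundtrip_max_error(img: np.ndarray) -> float:
    """Largest per-pixel error after downsampling by 2 and upsampling back."""
    restored = nearest_upsample_2x(box_downsample_2x(img.astype(np.float64)))
    h, w = restored.shape
    return float(np.abs(restored - img[:h, :w]).max())


if __name__ == "__main__":
    print(3840 * 2160 / (1920 * 1080))  # pixel-count ratio: 4.0
    fine = (np.indices((8, 8)).sum(axis=0) % 2).astype(np.float64)  # 1-pixel checkerboard
    coarse = np.kron(fine[:4, :4], np.ones((2, 2)))  # 2x2-pixel checkerboard
    print(roundtrip_max_error(fine))  # 0.5: the finest detail is averaged away
    print(roundtrip_max_error(coarse))  # 0.0: coarser structure survives
```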

• We propose a double domain guided low-light image enhancement network (DDNet), aided by Laplacian of Gaussian (LoG)-based gradient information. It effectively improves the quality of images captured under low-light conditions while preserving most details in both the color and gradient domains.
• We design the LoG-based gradient enhancement module (GEM) and the coarse enhancement module (CEM), embedded in the encoder-decoder structure, which effectively enhance the color and gradient domain features. Besides, a joint loss function is proposed to constrain the enhancement of the different domains separately.
• Quantitative and qualitative evaluation experiments against state-of-the-art methods are conducted on standard and transportation-related datasets. Experimental results show that DDNet significantly improves the enhancement performance, and its running time satisfies the requirements of real-time UHD transportation surveillance. The object detection and scene segmentation experiments indicate the improvement DDNet brings to higher-level visual tasks in ITS.

The rest of this paper is organized as follows. Recent studies on low-light image enhancement are reviewed in Section II. In Section III, we introduce the details of DDNet. In Section IV, extensive experiments on standard and transportation-related datasets evaluate the enhancement performance and the practical benefits for transportation surveillance. Conclusion and future perspectives are finally given in Section V.

Fig. 3. Examples of enhanced results in the gradient domain, from left to right: (a) low-light images, (b) LoG-based gradient feature maps, (c) GEM-enhanced gradient maps, and (d) final enhanced images.

Fig. 4 .Fig. 5 .
Fig. 4. Flowchart of our double domain guided low-light image enhancement network. The coarse enhancement module (CEM) and the LoG-based gradient enhancement module (GEM) are embedded in the encoder-decoder structure to improve image quality in separate domains. Moreover, the outputs of the different decoders are each constrained by the proposed joint loss function.

Fig. 10. Trade-off between running time, NIQE, and PSNR on 4K images (3840 × 2160 pixels). The results show the superiority of DDNet among state-of-the-art methods.

Fig. 11. Detailed results of the segmentation experiments on the ACDC dataset [68], using DAFormer [69] as the basic segmentation method. The first and third rows are raw images, and the second and fourth rows are the visualized scene segmentation results. From top-left to bottom-right: (a) low-light image, and the segmentation results on the images enhanced by (b) HE [16], (c) RetinexNet [36], (d) KinD [35], (e) EnlightenGAN [46], (f) Zero [22], (g) RUAS [37], (h) SCI [23], (i) the proposed DDNet, and (j) ground truth, respectively. Note that the employed DAFormer is pre-trained on the Cityscapes dataset. Compared with other methods, DDNet enables the model pre-trained on normal-light images to perform better under low-light conditions.

TABLE I
THE QUANTITATIVE COMPARISON BETWEEN OUR METHOD AND STATE-OF-THE-ART METHODS ON THE LOL TEST DATASET [36]. THE BEST THREE RESULTS ARE HIGHLIGHTED IN RED, BLUE, AND GREEN, RESPECTIVELY. ↑ AND ↓ INDICATE THAT HIGHER OR LOWER VALUES ARE BETTER, RESPECTIVELY.
, AND TMDIED DATASETS. THE BEST THREE RESULTS ARE HIGHLIGHTED IN RED, BLUE, AND GREEN, RESPECTIVELY.

TABLE III
THE QUANTITATIVE COMPARISON OF PIQE BETWEEN OUR METHOD AND STATE-OF-THE-ART METHODS ON THE DICM [22], LIME [27], MEF [63], AND TMDIED DATASETS. THE BEST THREE RESULTS ARE HIGHLIGHTED IN RED, BLUE, AND GREEN, RESPECTIVELY.

TABLE IV
THE ABLATION EXPERIMENTS ON THE SAM AND SCM. THE RESULTS ARE SHOWN IN PSNR, SSIM, AND LPIPS ON THE 15 IMAGES FROM THE LOL TEST DATASET [36]. ↑ AND ↓ INDICATE THAT HIGHER OR LOWER VALUES ARE BETTER, RESPECTIVELY.

TABLE V
THE ABLATION EXPERIMENTS ON THE GEM AND CEM. THE RESULTS ARE SHOWN IN PSNR, SSIM, AND LPIPS ON THE 15 IMAGES FROM THE LOL TEST DATASET [36]. ↑ AND ↓ INDICATE THAT HIGHER OR LOWER VALUES ARE BETTER, RESPECTIVELY.

TABLE VI
THE ABLATION EXPERIMENTS ON THE WEIGHTS OF THE LOSS FUNCTIONS. THE RESULTS ARE SHOWN IN PSNR, SSIM, AND LPIPS ON THE 15 IMAGES FROM THE LOL TEST DATASET [36]. ↑ AND ↓ INDICATE THAT HIGHER OR LOWER VALUES ARE BETTER, RESPECTIVELY.

TABLE VII
THE COMPARISON OF RUNNING TIME (UNIT: SECOND) BETWEEN DDNET AND OTHER LOW-LIGHT IMAGE ENHANCEMENT METHODS.