SETR-YOLOv5n: A Lightweight Low-Light Lane Curvature Detection Method Based on Fractional-Order Fusion Model

End-to-end automatic driving requires the identification of lane curvature. We propose a lightweight detection method for low-light lane curvature based on the Fractional-Order Fusion Model (FFM) to ensure real-time performance and increase the reliability of automatic driving in low-light conditions. First, the FFM method is introduced to enhance images with low average brightness, fuzzy detail, and a low signal-to-noise ratio, which cannot clearly convey information under low-light conditions (such as rainy, snowy, foggy, and other harsh external environments). Then, aiming at the problems of the previously proposed YOLOv5, namely its complex network structure, the high hardware configuration required for training, and the low Frames Per Second (FPS) of real-time detection, the SETR-C3Block module is proposed. YOLOv5n is improved by optimizing the configuration of the detection head and the network structure, which solves the problems of inefficient feature extraction and redundant parameters in the network. According to the experimental results on the lane curvature dataset, the mAP@.5:.95 of SETR-YOLOv5n is 87.22%, the FPS of real-time detection is 70.4, and the number of model parameters is only 1.8M. This shows that SETR-YOLOv5n can meet the lightweight and accuracy requirements of target detection on mobile terminals or embedded devices.

image illumination and proposed the fusion-based image enhancement method [23], which effectively improved the image's illumination. However, the method is aimed only at poorly lit images, not at degraded or low-quality ones. Similarly, Zhang et al. [24] proposed an automatic image exposure correction system based on dual illumination estimation. Its multi-exposure image fusion approach transforms an image containing both under-exposed and over-exposed regions into a well-exposed image overall. Despite its positive results, this approach has not been shown to preserve the fine details of medical images necessary for diagnosis. More image enhancement methods based on nonlinear optimization with many constraints have also been proposed. Zhou et al. [25] proposed a new optimization strategy that manages gamma values to improve image contrast; a typical gamma correction is used in the enhancement procedure, but imposing many optimization constraints increases the computational complexity of the technique. As a result, in this work we introduce the FFM-based model to enhance various lane curvature images by brightening the dark areas while preserving the bright areas of the input image.

Compared with other methods, end-to-end automatic driving dramatically reduces the hardware cost and the research difficulty [26]. It can also achieve universality across different scenarios with the help of dataset diversity. However, at present, end-to-end automatic driving still lacks practicability and reliability and depends on the performance of the onboard computer [27]. Therefore, applying this technology places high requirements on the lightweight design and detection accuracy of the target detection method [28]. YOLOv5 is widely used in CV tasks such as target detection and has achieved great success.
However, due to the limitations of storage space and hardware performance, storing and running the YOLOv5 network model on vehicle equipment remains a considerable challenge. In 2021, YOLOv5 released version 6.0. Building on version 5.0, this new version integrates many new functions, streamlines and fine-tunes the network structure, and introduces YOLOv5n, which keeps the depth of YOLOv5s but reduces the width parameter from 0.5 to 0.25. After this change, the total parameters are reduced by 75%, from 7.3M to 1.8M, and the number of Floating-Point Operations (FLOPs) is reduced by 72%, from 17.0G to 4.7G, which is very suitable for the onboard computing environment. However, YOLOv5n still has some problems, such as low detection accuracy and poor convergence. At the same time, under low-light conditions the overall clarity of the image is low, the edges are fuzzy, and the details and texture structure of the image are not well preserved, which seriously degrades the detection performance of YOLOv5n. As a result, it is essential to integrate the fractional-order image enhancement method with a lane curvature detection method based on improved YOLOv5n, to increase the real-time performance and reliability of automatic driving while lowering its hardware and software costs.

The main contents of each section of this paper are as follows: Section II describes the composition and background of SETR-YOLOv5n. In Section III, the effectiveness of the method is illustrated by a series of comparative experiments. In Section IV, the influence of the SETR-YOLOv5n components on performance and the future application of SETR-YOLOv5n in automatic driving are discussed. Finally, Section V summarizes the full text and draws a conclusion.

Image is an important way to record and transmit information.
Due to many influences, the definition and quality of an image gradually decline during transmission, which causes difficulties when images are transmitted repeatedly for analysis. Image enhancement has therefore become an important part of image processing. The image sensor is the main source of input data for various optical imaging devices, computer image processing systems, and auto-drive systems. During automatic driving, weather, exposure conditions, and other factors reduce the visibility and contrast of the image sensor's output, resulting in poor image quality and degrading the effect of automatic driving. Low-light images have low contrast, a concentrated gray-level range, and low overall quality, which seriously affects the effect of target detection [30]. Therefore, improving low-light image quality is significant in practical applications.

The Retinex hypothesis was developed to model the human visual perception system [31]. In this theory, the image perceived by the observer is regarded as an image with multiplicative noise: the illumination map is a multiplicative noise with a gradual, generally uniform transformation. The Retinex theory works by estimating the noise at distinct pixels in an image and then removing it to recover the original reflection image. Furthermore, the lighting of the three channels of a color image is assumed to be identical. The theory holds that the observed color image can be decomposed into two components, reflectance and illumination, expressed mathematically as

S = R • L,

where S and R are the captured image and the reflectance, respectively, L represents illumination, and • represents element-wise multiplication.
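The decomposition can be illustrated numerically. The sketch below (NumPy, not from the paper) builds a toy low-light image as S = R • L and recovers the reflectance by element-wise division once the illumination map is known; estimating L robustly is the hard part that Retinex-based methods actually address.

```python
import numpy as np

def retinex_decompose(S, L, eps=1e-6):
    """Recover reflectance R from observed image S and an estimated
    illumination map L, inverting S = R * L element-wise."""
    return S / (L + eps)  # eps guards against division by zero

# Toy 2x2 single-channel image under non-uniform, smoothly varying light.
R_true = np.array([[0.8, 0.2], [0.5, 0.9]])   # scene reflectance
L_true = np.array([[0.3, 0.3], [0.7, 0.7]])   # illumination map
S = R_true * L_true                            # observed low-light image
R_est = retinex_decompose(S, L_true)           # recovered reflectance
```

Dividing out the illumination lifts the dark pixels back toward their true reflectance, which is exactly the brightening effect exploited by low-light enhancement.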

Fractional-order calculus was born in 1695, when the German mathematician Leibniz considered what the expression means when the derivative order becomes 1/2. For most of its 300-year history, the evolution of fractional-order calculus theory was of nearly exclusive interest to a few mathematicians and theoretical physicists [32]. Compared with integer-order calculus, fractional-order calculus expands the order of operation [33]. Although its implementation complexity is relatively high, it offers a higher degree of freedom and flexibility. The gamma function involved in its definition is

Γ(z) = ∫₀^∞ t^(z−1) e^(−t) dt.

In recent years, the research and application of fractional-order calculus have developed rapidly. Dai et al. [29] proposed the FFM method using fractional-order calculus; compared with integer-order calculus, it better preserves the texture details of the image and suppresses noise. The energy function is modeled under several constraints, the third of which is that L and S should be close enough [35].
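To make the discrete side concrete, here is a small sketch (an illustration, not the paper's implementation) of a Grünwald-Letnikov-style fractional difference. Its generalized binomial coefficients come from the gamma function and can be generated by the recursion c₀ = 1, cₖ = cₖ₋₁ (1 − (v + 1)/k); for v = 1 the operator collapses to the ordinary backward difference.

```python
import numpy as np

def gl_coefficients(v, n):
    """First n Grunwald-Letnikov coefficients (-1)^k * C(v, k) for a
    fractional order v, via c_0 = 1, c_k = c_{k-1} * (1 - (v+1)/k)."""
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (1.0 - (v + 1.0) / k)
    return c

def gl_fractional_diff(f, v, h=1.0):
    """Discrete GL fractional derivative of order v for a 1-D signal f,
    using all available past samples at each position."""
    c = gl_coefficients(v, len(f))
    out = np.zeros_like(f, dtype=float)
    for i in range(len(f)):
        out[i] = np.dot(c[: i + 1], f[i::-1]) / h ** v
    return out
```

For 0 < v < 1 the coefficients decay slowly, so the operator weights a long tail of past samples; this memory effect is one reason fractional-order operators can enhance texture while suppressing noise better than integer-order differences.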

In general, we set the optimization objective as a weighted combination of these terms, where α, β, and γ are the weighting parameters. As shown in Figure 1, the final enhancement result of the image is obtained through a fusion process: the fused image is a per-pixel weighted sum of the input images, where Wᵢ is the weight of the i-th image. The principle block diagram of the FFM is shown in Figure 1.

The weight of each image is computed from its corresponding illumination I.
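The exact weight formula of [29] is not reproduced here. As an illustrative stand-in, the sketch below uses a hypothetical well-exposedness weight (a Gaussian around mid-gray illumination, a common choice in exposure fusion) and performs the per-pixel normalized weighted sum of the images:

```python
import numpy as np

def exposure_weight(I, sigma=0.25):
    """Hypothetical illumination-based weight: pixels whose illumination
    I is near mid-gray (0.5) get high weight. A stand-in for the weight
    formula of the FFM paper, not the exact one."""
    return np.exp(-((I - 0.5) ** 2) / (2.0 * sigma ** 2))

def fuse(images, illuminations, eps=1e-12):
    """Fused result = sum_i W_i * image_i, with the per-pixel weight
    maps normalized so they sum to one at every pixel."""
    W = np.stack([exposure_weight(I) for I in illuminations])
    W = W / (W.sum(axis=0) + eps)          # normalize across images
    return (W * np.stack(images)).sum(axis=0)
```

With equal illumination maps the fusion reduces to a plain average; a well-exposed region in one input dominates the same region of a badly exposed input.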

According to [29], the n-th iterate of R can be obtained by solving its sub-problem; in the same way (the derivation is identical to that of the R sub-problem), the n-th iterate of L can be obtained.
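The closed-form updates of [29] are not reproduced here, but the shape of the alternating scheme can be sketched. The plain divisions below are placeholders for the actual regularized R and L sub-problem solvers:

```python
import numpy as np

def alternate_decompose(S, iters=10, eps=1e-6):
    """Toy alternating scheme for S = R * L: fix one factor, update the
    other, and repeat. The real FFM updates solve regularized
    sub-problems; the plain divisions here only illustrate the loop."""
    L = np.maximum(S, eps)      # initialize illumination with the image
    R = np.ones_like(S)
    for _ in range(iters):
        R = S / (L + eps)       # R sub-problem (placeholder update)
        L = S / (R + eps)       # L sub-problem (placeholder update)
    return R, L
```

Each pass keeps the product R • L consistent with the observation S while the regularizers (omitted here) push R toward detailed reflectance and L toward smooth illumination.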

This research employs Information Entropy (IE) to objectively evaluate the enhanced image and capture its small variations more rigorously. Entropy estimates the amount of information available before an outcome is known, and the information entropy of an image reflects the average quantity of information it contains; it is a statistical feature. The image's IE is represented as

IE = −Σᵢ p(aᵢ) log₂ p(aᵢ),

where aᵢ is the random output signal of the image and p(aᵢ) is its probability.

Seven low-light images are tested to validate the performance of the suggested model, as shown in Figure 2. Visual comparison shows that FFM can significantly enhance low-visibility images.
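The image information entropy defined above can be computed from the gray-level histogram; a minimal sketch:

```python
import numpy as np

def image_entropy(img, levels=256):
    """Shannon information entropy of a grayscale image:
    IE = -sum_i p(a_i) * log2 p(a_i), where p(a_i) is the normalized
    frequency of gray level a_i in the histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                      # empty bins contribute 0 (0*log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```

A flat image carries zero entropy, while a wider, more even gray-level distribution, which is what enhancement aims for, raises the score.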

Table 1 reports the quantitative comparison. In the similarity metric used there, Ω represents the whole image domain, S_L(x) represents the similarity between the original image x and the reconstructed image x̂, and PC_m(x) represents the maximum value of phase consistency.

The YOLOv5 series includes five network models of different sizes: s, m, l, x, and n. Among them, YOLOv5n is the latest model in the series [38]. On the one hand, the YOLOv5n network model has high detection accuracy and fast inference speed; on the other hand, its weight file is modest, about 75% smaller than that of YOLOv5s, which makes YOLOv5n well suited for deployment on embedded devices for real-time detection.

In each convolution, some complex interference information is inevitably distributed across some of the channels. Channel attention assigns a different weight to each channel and screens the channel information according to these weights, which can effectively mitigate the impact of interference information in the complex automatic driving environment. As shown in Figure 3, the typical representative of channel attention is the Squeeze-and-Excitation Network (SENet).

In Figure 3, the input feature map U has C channels, and the spatial size of each channel is H × W. A global average pool is performed over each channel, giving the channel weight Z_C:

Z_C = (1 / (H × W)) Σᵢ₌₁^H Σⱼ₌₁^W u_C(i, j),

where the output Z is a one-dimensional array of length C representing the weights obtained by compressing the channels. The activation is shown as

S_C = σ(W₂ δ(W₁ Z)),

where the dimension of S_C is 1 × 1 × C and corresponds to the generated channel attention weights, δ is the ReLU activation, and σ is the sigmoid function. The channel attention weights are obtained through fully connected layers and nonlinear learning: the dimension of W₁ is C/r × C, the dimension of W₂ is C × C/r, and r is the shrinkage coefficient; the shrinkage block consists of these two fully connected layers. Finally, the input channels are weighted and adjusted:

X̃_C = S_C · U_C.

In the multi-head self-attention used by the SETR-C3 module, each head computes scaled dot-product attention, head_h = softmax(Q_h K_hᵀ / √d) V_h, where the query, key, and value projections Q_h, K_h, and V_h are obtained from the input features.

Any C3 module of YOLOv5n can be replaced by our proposed SETR-C3 module, which effectively obtains deeper feature information and rich semantic information. However, in order to introduce the attention mechanism without changing the backbone network, so that weights pre-trained on public datasets can be used for transfer learning and the training time is reduced, this design replaces only some of the C3 modules in the YOLOv5n network with the SETR-C3 module (see Figure 5).

The equipment used for this experiment is configured as follows: NVIDIA 3070 Ti graphics card, AMD 5600X CPU, and 32GB of Corsair memory.
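The squeeze, excitation, and re-weighting steps above can be sketched in NumPy (an illustration with random weight matrices, not a trained SENet):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_attention(U, W1, W2):
    """Squeeze-and-Excitation forward pass on a feature map U of shape
    (C, H, W). W1 has shape (C/r, C) and W2 has shape (C, C/r),
    matching the two fully connected layers of the shrinkage block."""
    z = U.mean(axis=(1, 2))                  # squeeze: channel weights Z_C
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0))  # excitation: ReLU then sigmoid
    return U * s[:, None, None]              # re-weight each channel map

# Random feature map and bottleneck weights (shrinkage coefficient r = 2).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
U = rng.standard_normal((C, H, W))
W1 = rng.standard_normal((C // r, C))
W2 = rng.standard_normal((C, C // r))
V = se_attention(U, W1, W2)
```

Because the sigmoid keeps every attention weight in (0, 1), the block can only attenuate channels, suppressing those that carry interference while leaving informative channels nearly untouched.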

To further verify the detection performance of the method, SETR-YOLOv5n is compared in accuracy and speed with YOLOv5n and several representative YOLOv5 variants and improved methods: YOLOv5x, YOLOv5s, Improved YOLOv5m, Improved YOLOv5l, and YOLOv5-Liteg. Table 4 and Figure 10 show that the SETR-YOLOv5n detection approach provides the best precision, and its performance is significantly better than that of the other object detection methods. Compared with the baseline YOLOv5n, SETR-YOLOv5n improves precision and mAP@0.5:0.95 by 6.29% and 15.72%, respectively. Overall, our proposed model achieves the best performance among all models and significantly outperforms the lightweight object detection model YOLOv5-Liteg, with a 3.15 FPS improvement, a 3.5MB reduction in parameters, and a 20.06% improvement in mAP@0.5:0.95, reflecting the excellent performance of our lightweight method in real-time object detection tasks. However, because the SETR-C3 module increases the computational load and the FFM method is used for image preprocessing, the FPS of SETR-YOLOv5n is slightly lower than that of YOLOv5n.

As shown in Figure 9, several images are selected to compare YOLOv5n with the improved method in this paper. Both methods can effectively detect the lane curvature target. However, for lane curvature targets with sparse features, low visibility, and unclear boundaries, the positioning accuracy of YOLOv5n is poor, which easily causes missed detections; the lane curvature target is also easily affected by image noise, resulting in a high false alarm rate. The improved method reduces the missed detection rate of lane curvature and improves the detection accuracy. The reason is that the FFM is introduced to preprocess the images before detection.

FIGURE 8. The network module of the SETR-YOLOv5n. Compared to the original YOLOv5n, there are three improvements in the architecture. First, the detection head with the red background is simplified, reducing the model parameters and improving detection efficiency. Second, shallow semantic features are combined with deep semantic features, and the FFM is introduced to preprocess the dataset images. Third, the C3 module is improved to strengthen its feature extraction ability.

According to the performance comparison and analysis of the above models, the FLOPs and parameter counts of the YOLOv5ntypeB and YOLOv5ntypeC network models are lower than those of YOLOv5n, which saves a lot of memory space. Although YOLOv5ntypeB has slightly more parameters than YOLOv5ntypeC, its precision, recall, mAP@0.5, and mAP@0.5:0.95 are all higher. It can be concluded that the YOLOv5ntypeB network model performs better than the YOLOv5n and YOLOv5ntypeC network models.

To evaluate network performance, refer to the parameters and FLOPs in the training process. The test results of the three models are shown in Table 5; they show that the parameter count of YOLOv5n is relatively large.

To verify the impact of each improvement on the YOLOv5n target detection method, we evaluated each method on the lane curvature dataset and designed seven groups of ablation experiments. Using consistent experimental settings, the impact of the different methods on the YOLOv5n target detection method is shown in Table 6. Among them, model A integrates the SENet module on the basis of YOLOv5n, and model B integrates the MHSA module on the basis of YOLOv5n.

Driving behavior can be controlled directly according to the input image, an artificial intelligence approach closer to the human mode of thinking. In practical autonomous driving systems, deep reinforcement learning is widely used by researchers, for example in methods based on the Actor-Critic framework (see Figure 12), which can make driving behavior decisions according to the driving environment. However, due to the high training requirements of deep reinforcement learning, its environment perception network cannot be made too deep; as a result, the driving environment states that the reinforcement learning method can perceive under low-light conditions are limited, and it cannot make driving behavior decisions normally. Currently, the target detection method has been widely used as an auxiliary task for reinforcement learning.