Research on Car License Plate Recognition Based on Improved YOLOv5m and LPRNet

License plate recognition technology is being applied ever more widely. In view of the practical requirements for recognition accuracy and real-time performance of license plate recognition systems in complex scenes, existing object detection and license plate recognition methods are studied, and a car license plate recognition method based on an improved YOLOv5m and the LPRNet model is proposed. Building on a study of the YOLOv5m algorithm and the image features of car license plates, YOLOv5m is improved in three aspects: the K-means++ algorithm is used to improve the match between the anchor boxes and the detection targets; the DIOU loss function is used to improve the NMS method; and the $20\times 20$ feature map is removed to reduce the number of detection layers. A lightweight LPRNet network is used to recognize license plate characters without character segmentation. Combining the improved YOLOv5m algorithm with the LPRNet network, a license plate recognition system based on the IYOLOv5m-LPRNet model is designed. The experimental results show that the average recognition accuracy of license plates in frontal, tilted, night, and strong-light-interference scenes exceeds 98%. Compared with the YOLOv3-LPRNet, YOLOv4-LPRNet, YOLOv5s-LPRNet, and YOLOv5m-LPRNet models, the proposed method improves recognition accuracy and recall, reaching 99.49% and 98.79% respectively; its mAP is also the highest, reaching 98.56%. In terms of recognition speed, the proposed method is also faster than the other four, processing 5 more pictures per second than the YOLOv5m-LPRNet model. The improved license plate recognition method therefore performs well in both robustness and speed.


With the continuous advancement of smart city construction, intelligent transportation systems have developed rapidly, and the license plate recognition system is a necessary part of them. At present, there are two main types of license plate recognition technologies: one uses traditional image processing, and the other uses deep learning methods. Deep learning methods are more robust than traditional methods and have been widely studied [1], [2]. Shi and Zhang [3] proposed using a BGRU to optimize the license plate recognition network model, combined with an improved YOLOv3 network to locate the license plate; this method has good robustness. To address the problem of recognizing blurred license plate characters, Zhang et al. [4] proposed a license plate character recognition algorithm without character segmentation, based on an improved CRNN+CTC (Convolutional Recurrent Neural Network + Connectionist Temporal Classification); the algorithm has good robustness and fast operation speed. Fu and Qiu [11] used an improved YOLOv3 network structure to recognize the characters of the license plate. The network is composed of seven fully connected layers that predict the seven characters of the license plate, with each fully connected layer predicting the position and category of one character.

(The associate editor coordinating the review of this manuscript and approving it for publication was Wei Liu.)

The existing license plate recognition methods have shortcomings such as low recognition accuracy and poor real-time performance, and applications of the YOLOv5 algorithm to license plate recognition are rare. Therefore, this paper studies the application of the YOLOv5m algorithm to license plate recognition, improves YOLOv5m according to the characteristics of license plate objects, and proposes a license plate recognition method that combines the improved YOLOv5m with the LPRNet license plate recognition network. Taking blue car license plates as the object, recognition tests are carried out and the recognition performance is quantitatively evaluated.

II. IMPROVEMENT OF YOLOv5m ALGORITHM
YOLOv5m is one of the models of YOLOv5 [12]. It is characterized by CSP module depths of CSP1_2, CSP1_6, CSP1_6, CSP2_2, CSP2_2, CSP2_2, CSP2_2 and CSP2_2, and the model widths (i.e., the number of convolution kernels) of the Focus and CBS_1, CBS_2, CBS_3, CBS_4 stages are 48, 96, 192, 384 and 768 respectively. YOLOv5m has greater depth and width than YOLOv5s; its structure is slightly more complex, but its detection accuracy is higher. The YOLOv5m network is composed of four parts: Input, Backbone, Neck and Prediction. Its structural framework is shown in Figure 1. An advantage is that more small targets are added through random scaling, which improves robustness.

The K-means algorithm allocates the initial cluster centers randomly, which is not suitable for clustering the license plate data set. Therefore, to improve the accuracy of license plate detection, the K-means++ algorithm [15] is used for multidimensional clustering of the labeled data set, which can effectively reduce the time the model spends searching for anchors. So that the anchor box and the detection box have a large intersection, allowing the best a priori box to be selected, the distance metric of the algorithm is

$d(\text{box}, \text{centroid}) = 1 - IOU(\text{box}, \text{centroid})$

where IOU represents the intersection over union of the prediction box and the ground-truth box.
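The anchor clustering described above can be illustrated with a short sketch: K-means++ seeding combined with the $1 - IOU$ distance over the (width, height) pairs of the labeled boxes. This is a minimal illustration, not the authors' implementation; the function names and the width–height box representation are our own.

```python
import random

def iou_wh(box, centroid):
    """IoU of two (w, h) boxes assumed to share a corner, as is standard
    in anchor clustering (only shape matters, not position)."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union

def kmeanspp_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) label boxes into k anchors.

    Seeding is K-means++: each new center is chosen with probability
    proportional to the squared distance d = 1 - IoU to the nearest
    existing center; Lloyd iterations then refine the centers.
    """
    rng = random.Random(seed)
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        d2 = [min((1 - iou_wh(b, c)) ** 2 for c in centroids) for b in boxes]
        r, acc = rng.random() * sum(d2), 0.0
        for b, d in zip(boxes, d2):
            acc += d
            if acc >= r:
                centroids.append(b)
                break
    for _ in range(iters):
        # Assign each box to the centroid with the highest IoU
        # (equivalently, the smallest 1 - IoU distance).
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[j].append(b)
        centroids = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return sorted(centroids)
```

For example, a mixed set of small and large plate boxes separates into one small and one large anchor.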

The DIOU loss function is

$L_{DIOU} = 1 - IOU + \frac{\rho^2(b, b^{gt})}{l^2}$

where $b$ is the predicted target box, $b^{gt}$ is the ground-truth target box, $\rho^2(b, b^{gt})$ is the squared distance between the center points of the predicted and ground-truth target boxes, and $l$ is the diagonal length of the minimum circumscribed rectangle of the two boxes.
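A direct transcription of this loss for axis-aligned boxes in $(x_1, y_1, x_2, y_2)$ form might look as follows (a minimal sketch; the function name and box representation are illustrative, not from the paper):

```python
def diou_loss(pred, gt):
    """DIoU loss: 1 - IoU + rho^2 / l^2, where rho is the distance between
    box centers and l is the diagonal of the smallest enclosing box."""
    # Intersection and IoU.
    ix1, iy1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    ix2, iy2 = min(pred[2], gt[2]), min(pred[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter)
    # Squared distance rho^2 between the two box centers.
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxg, cyg = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cxp - cxg) ** 2 + (cyp - cyg) ** 2
    # Squared diagonal l^2 of the minimum circumscribed rectangle.
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    l2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return 1 - iou + rho2 / l2
```

Identical boxes give a loss of 0, while disjoint boxes are penalized beyond 1 by the center-distance term, which is what drives the regression even when the boxes do not overlap.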

Assuming that the network model detects a candidate box set $H_i$, then for the prediction target box $M$ with the highest category confidence, the DIOU-NMS update formula for $p_i$ is defined as

$p_i = \begin{cases} p_i, & IOU - R_{DIOU}(M, H_i) < \varepsilon \\ 0, & IOU - R_{DIOU}(M, H_i) \geq \varepsilon \end{cases}$

where $i$ is the index of the anchor box corresponding to each grid, $p_i$ is the classification score of the target in each category, $IOU$ is the intersection over union, $R_{DIOU}(M, H_i)$ is the value of $R_{DIOU}$ for $M$ and $H_i$, and $\varepsilon$ is a manually set threshold for the NMS operation.
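The update rule above can be sketched in plain Python (a minimal illustration, assuming boxes in $(x_1, y_1, x_2, y_2)$ form; suppressed boxes are dropped rather than kept with a zero score, and the helper names are our own):

```python
def _iou_rdiou(a, b):
    """Return (IoU, R_DIoU) for two boxes, where R_DIoU = rho^2 / l^2 is the
    normalized squared distance between the box centers."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    iou = inter / union
    rho2 = (((a[0] + a[2]) - (b[0] + b[2])) ** 2
            + ((a[1] + a[3]) - (b[1] + b[3])) ** 2) / 4.0
    l2 = ((max(a[2], b[2]) - min(a[0], b[0])) ** 2
          + (max(a[3], b[3]) - min(a[1], b[1])) ** 2)
    return iou, rho2 / l2

def diou_nms(boxes, scores, eps=0.5):
    """DIoU-NMS: a candidate H_i survives the highest-scoring box M only
    while IoU(M, H_i) - R_DIoU(M, H_i) stays below the threshold eps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        m, order = order[0], order[1:]
        keep.append(m)
        survivors = []
        for i in order:
            iou, r = _iou_rdiou(boxes[m], boxes[i])
            if iou - r < eps:  # distant centers lower the suppression score
                survivors.append(i)
        order = survivors
    return keep
```

Because $R_{DIOU}$ is subtracted from the IOU, an overlapping box whose center is far from $M$'s center is less likely to be suppressed, which helps retain plates on nearby but distinct vehicles.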

The DIOU-NMS method takes into account the distance, overlapping area and aspect ratio between the predicted target box and the ground-truth target box. The farther apart the center points of two rectangular boxes are, the more likely the boxes are to lie on different detection objects. Combining the IOU of the two rectangular boxes with the distance between their center points optimizes the IOU loss on the one hand and guides the learning of the center point on the other, so the predicted target box can be regressed more accurately [16].

Dropout layers are set to prevent overfitting. The network input is a 94×24 image and the output layer is a convolution layer; its structure is shown in Table 1. Each basic module contains four convolution layers, one input layer and one feature output layer; its structure is shown in Table 2.

Experimental configuration: the CPU is an Intel® Core™ i7-108750 @ 2.60GHz × 8, the GPU is a GeForce RTX 3060 12GB, the RAM size is 16GB, the operating system is Windows 10, the development environment is PyCharm 2021, the framework is PyTorch 1.7, the development language is Python 3.8, with CUDA 11.6 and CuDNN 7.6.

Hyperparameter settings: the optimizer is Adam, the initial learning rate is 0.01, the learning rate decay coefficient is 0.1, batch_size is 32 and the weight decay coefficient is 0.0005. A larger epoch value generally yields a more stable training model, higher accuracy and better convergence, but with diminishing returns; through experiments, the number of epochs is determined to be 300.
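These settings correspond to a PyTorch training configuration along the following lines (a sketch only: the toy model stands in for the real IYOLOv5m network, and the decay step of 100 epochs is an assumption, since the paper states only the decay coefficient of 0.1):

```python
import torch

# Toy stand-in for the detection network; the real model is IYOLOv5m.
model = torch.nn.Linear(4, 1)

# Adam with the stated initial learning rate and weight decay coefficient.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0005)

# Multiply the learning rate by the decay coefficient 0.1; the step size
# (every 100 of the 300 epochs) is assumed, not given in the paper.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
```

The training DataLoader would use batch_size=32, and `scheduler.step()` would be called once per epoch for 300 epochs.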

In order to effectively evaluate the robustness of the model and the accuracy of license plate recognition, the performance of the model is evaluated with five indicators: Precision, Recall, $F_{score}$, mean average precision (mAP) and FPS. $F_{score}$ comprehensively reflects the performance of the model. FPS represents the number of pictures processed per second; the larger the value, the faster the operation speed. The closer mAP is to 1, the better the overall performance of the model. Using the confusion matrix, the evaluation indicators are defined as [19]:

$Precision = \frac{TP}{TP + FP}$

$Recall = \frac{TP}{TP + FN}$

$F_{score} = \frac{2 \times Precision \times Recall}{Precision + Recall}$

The results are shown in Table 4. It can be seen from Table 4 that the Precision of the method in this paper reaches 99.49%, which is 13.75%, 11.5%, 6.57% and 5.29% higher than the Precision of the YOLOv3-LPRNet, YOLOv4-LPRNet, YOLOv5s-LPRNet and YOLOv5m-LPRNet models respectively; Recall reaches 98.79%, which is 19.66%, 17.17%, 4.14% and 3.59% higher than the Recall of the same four models respectively.
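The confusion-matrix indicators above are straightforward to compute; the following sketch shows them as plain functions (illustrative names; mAP here is simply the mean of per-class average precision values, which for a single license plate class reduces to that class's AP):

```python
def precision(tp, fp):
    """Fraction of predicted plates that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual plates that are detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def f_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

def mean_ap(ap_per_class):
    """mAP: the mean of the per-class average precision values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For instance, 99 true positives against 1 false positive gives a Precision of 0.99, and the reported Precision of 0.9949 and Recall of 0.9879 combine to an $F_{score}$ of about 0.991.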