Multinational License Plate Recognition Using Generalized Character Sequence Detection

Automatic license plate recognition (ALPR) is generally considered a solved problem in the computer vision community. However, most current ALPR systems are designed to work on license plates (LPs) from specific countries and use country-specific information, which limits their practical applicability. Such ALPR systems require changes in the algorithm to work on other countries’ LPs. Previous works on multinational LP recognition are tested on datasets from various countries that share the same LP layout. To address this issue, this study presents a deep ALPR system designed to be applicable to multinational LPs. The proposed approach consists of three main steps: LP detection, unified character recognition, and multinational LP layout detection. The system is mainly based on the you only look once (YOLO) networks. In particular, tiny YOLOv3 is used for the first step, whereas the second step uses YOLOv3-SPP, a version of YOLOv3 that includes the spatial pyramid pooling (SPP) block. The localized LP is fed into YOLOv3-SPP for character recognition. The character recognition network returns the bounding boxes of the predicted characters but does not provide information about the sequence of the LP number. An LP number with an incorrect sequence is considered wrong. Thus, we propose a layout detection algorithm that can extract the correct sequence of LP numbers from multinational LPs. We collected our own Korean car plate (KarPlate) dataset and made it publicly available. The proposed system was evaluated on LP datasets from five countries: South Korea, Taiwan, Greece, USA, and Croatia. In addition, a small dataset containing LPs from 17 countries was collected to evaluate the effectiveness of the multinational LP layout detection algorithm. The proposed ALPR system takes about 42 ms per image on average to extract the LP number. Experimental results demonstrate the effectiveness of our ALPR system.


I. INTRODUCTION
Automatic license plate recognition (ALPR) is widely applicable to tasks such as stolen vehicle identification, parking lot management, electronic toll collection, and traffic flow monitoring. This topic has been extensively studied by researchers worldwide to improve the performance of ALPR in real-world scenarios. Current ALPR algorithms achieve exemplary performance in controlled environments; however, performance decreases when dealing with complex scenes. The current challenges in ALPR include multinational ALPR and dealing with uncontrolled conditions such as uneven illumination, weather (snow, fog, rain, etc.), image distortion, image blurring, and occlusions. Multinational ALPR is a challenging issue due to the differences in license plate (LP) layouts among different countries and the non-availability of public multinational LP datasets. Different LP layouts and the lack of publicly available multi-country datasets are responsible for the meager amount of research conducted on the problem of multinational ALPR. A few research works [6], [15] have proposed multinational ALPR systems that claim to work on LPs from different countries. However, these methods were validated on datasets from various countries that shared a common LP layout. Based on our analyses, most of the LPs worldwide can be broadly classified as single line or double line LPs. The datasets used in previous works [6], [15] contain only single line LPs, and these methods may require additional steps to recognize double line LPs.
The associate editor coordinating the review of this manuscript and approving it for publication was Massimo Cafaro.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
A typical ALPR pipeline commonly consists of the following three steps: license plate detection, character segmentation, and character recognition. License plate detection is responsible for finding the location of the LP in a given image. Character segmentation is responsible for segmenting individual characters from the detected LP, whereas the role of character recognition is to classify each of the segmented characters. The first two steps are crucial for correct ALPR since they directly affect the character recognition stage. Failure to localize the LP in the first stage leads to failure in the subsequent stages. To overcome this issue, some works merge the character segmentation and recognition steps into a single object recognition step. A few recent publications proposed end-to-end deep learning structures to completely remove the inter-dependency among the three stages. However, as mentioned in the preceding paragraph, these methods are either tailored to work on a specific country's LPs or are tested on multi-country datasets that share a common single line LP layout. This paper presents a highly accurate deep ALPR system that is applicable to license plates belonging to multiple countries. In this study, we propose a three-stage deep multinational ALPR approach that combines deep learning with an image processing-based multinational LP layout detection algorithm. LP detection is the first stage of the proposed ALPR system and is responsible for detecting the LP region in an image. This stage uses the tiny YOLOv3 network architecture [22] for detecting the LP region and is referred to as the ''attention network'' since it provides the LP region to the next stage of our ALPR system.
Tiny YOLOv3 was selected since LP detection is a relatively simple object recognition task and does not require an extremely deep network. Unified character recognition is the second stage of our ALPR pipeline and uses YOLOv3-SPP, which we call the ''recognition network''. YOLOv3-SPP is a modified version of YOLOv3 [22] that includes a spatial pyramid pooling (SPP) [23] block. YOLOv3-SPP was chosen due to its ability to deal with multiscale and small objects. The LP region from the attention network is fed as input to the recognition network. The recognition network recognizes all the characters printed on the LP but provides no information about the order of the recognized characters. To extract an ordered string from the recognized characters, we propose a multinational LP layout detection algorithm. The proposed layout detection algorithm is based on image processing techniques and can extract the correct sequence of the LP number from multinational LPs. It does so by classifying LPs as single line or double line. The proposed ALPR system was tested on Korean (KarPlate dataset), Taiwanese (AOLP dataset [26]), American (Caltech Cars (Rear) 1999 [28]), Greek (Medialab LPR Database [29]), and Croatian (University of Zagreb [30]) license plate datasets. Our own KarPlate dataset was generated by using our semi-automatic dataset generation strategy. Our ALPR system outperformed previous works in terms of performance and speed, as shown in Section V. To test our layout detection algorithm, we collected a small dataset consisting of LPs from 17 different countries and evaluated the algorithm on it. The results show the effectiveness of our layout detection algorithm and validate the applicability of our ALPR system to multinational LPs.
The main contributions of our study are summarized below:
1) We present a deep end-to-end ALPR system that is applicable to multinational LPs without the need for any additional steps. Given an image containing an LP, the proposed system can extract the LP number with the correct sequence. The proposed ALPR system does not require any specific pre-processing to be used on different datasets. Previous studies could also be fine-tuned on other countries' datasets; however, modifications in the algorithm would be required to work on different LP layouts.
2) We propose a simple and effective multinational LP layout detection algorithm that can classify the LP layouts used in most LPs worldwide. Given the correct bounding boxes, the layout detection algorithm can extract the correct sequence of the LP number from various LP layouts belonging to different countries.
3) To the best of our knowledge, we present the first publicly available Korean car plate dataset (KarPlate dataset), containing more than 4,000 full HD images of Korean cars.
4) Based on the experimental results on datasets from five different countries, the proposed approach performs better than previous works, even though many previous works utilize artificial data to prevent overfitting and to increase the dataset size. We did not use any artificial data in our system.
The remainder of the paper is organized as follows. Section II reviews related studies along with their limitations. The details of our semi-automatic data annotation strategy are discussed in Section III. The working of our algorithm is explained in Section IV, while the experimental evaluation is presented in Section V. The conclusion is covered in the last section of this paper.

II. RELATED WORKS
ALPR approaches can be broadly divided into two main categories: traditional image processing methods and deep learning methods. This section reviews the recent works in the latter category since our approach also falls into this category. Specifically, we review the relevant literature in the license plate detection (LPD) and license plate recognition (LPR) subcategories. In addition, we review commercially available ALPR software and the work done on multinational LP detection/recognition. Lastly, we discuss the limitations of previous works.

A. LICENSE PLATE DETECTION
A cascade framework consisting of two convolutional neural networks was used to detect the LP region in [1]. The first convolutional neural network (CNN) in the cascade framework was responsible for extracting regions containing text from the image, while the second CNN classified text regions as LP or as general text. In [4], each image was divided into sub-regions which were fed independently into a CNN. The CNN outputs a score which indicates how likely a specific sub-region is to contain an LP. A cascaded approach like the one in R-CNN [27] was used in [5] for LP localization. A weak SNoW classifier generated candidate LP regions which were fed into a strong CNN classifier (AlexNet [16]) to be scrutinized. Images that failed to pass a confidence test were fed into another CNN (AlexNet [16]) which identified the reason for failure. Identifying the reason for failure helps pinpoint the probable problem, which makes it easier to troubleshoot the ALPR system early. A simple CNN was used in [5], [10] for LPD. A Region Proposal Network and a Box Regression layer were used in [5] and [10], respectively, to detect the LP location.
Many real-time LPD approaches [3], [7], [8], [11], [13] used YOLO networks or their modified versions due to their fast inference speed. Most deep neural networks struggle at detecting small objects. To tackle this issue, the study in [7] trained FAST-YOLO [18] to detect the frontal view of a car. The detected frontal view was cropped and fed into the same network to detect the LP. The LP appears bigger in the cropped frontal view image, which makes it easier for the network to detect the LP. The study in [3] modified YOLO [18] and YOLO9000 [19] for application in LPD and achieved better accuracy than the original YOLO networks. A multidirectional LPD method [8] used two networks to detect rotated LPs. The first network, referred to as the attention network, detected the LP region, while the second network, MD-YOLO (a modified version of YOLO [18]), detected the rotated bounding box of the LP. Two YOLO networks, one for vehicle detection and the other for LPD, were used in [13]. A CNN called WPOD-NET that regresses coefficients of an affine transformation for detecting and unwarping distorted LPs was proposed in [11]. The input to WPOD-NET is the image of a vehicle detected by YOLO [18].

B. LICENSE PLATE RECOGNITION
Many studies [1], [5], [6] have tried to unify the subtasks (character segmentation and character recognition) of LPR. The studies in [1], [6] considered LPR as a sequence labeling problem making the character segmentation step unnecessary. The study in [1] proposed the use of a recurrent neural network (RNN) with long short-term memory (LSTM) and Connectionist Temporal Classification (CTC) [20] for LPR. On the other hand, the study in [6] used Bidirectional RNNs (BRNNs) with CTC loss [20] for LPR. The study in [5] employed a sweeping OCR technique that swept an OCR classifier across the LP image and localized the characters by utilizing a probabilistic inference method based on hidden Markov models (HMMs). Viterbi decoding was used for determining the most likely code sequence using a language model. A slightly modified version of YOLO called CR-NET along with a heuristic method was used in [7] for Brazilian LPR. A few other studies [9], [11] also used CR-NET for LPR. A low-computation CNN inspired by SqueezeNet Fire Blocks [21] and Inception Blocks was used for realtime LPR in [10]. The work in [12] combined feature maps after ROI pooling and fed them into subsequent classifiers for LPR. A different approach which segmented characters from a LP using semantic segmentation followed by a counting refinement stage was adopted in [13]. A modified version of DeepLabv2 ResNet-101 model was used for semantic segmentation. The counting refinement stage extracted character regions and fed them into AlexNet [16] for character counting. A sliding-window single class detector via tiny YOLO classifiers was used in [17] for LPR. 36 tiny YOLO models were used to recognize characters in the AOLP dataset [26].

C. COMMERCIAL ALPR SOFTWARE
Commercially available ALPR software includes Sighthound [2] and OpenALPR [14]. Sighthound outperformed OpenALPR and used a sequence of deep CNNs; however, exact details about the CNN architecture are unavailable since it is a commercial product. It is worth mentioning that [2], [14] were trained on their own large, privately collected datasets for various countries.

D. MULTINATIONAL LICENSE PLATE DETECTION/RECOGNITION
To the best of our knowledge, very little research has been conducted on the topic of multinational license plate detection/recognition. The study in [15] proposed a system for multinational license plate detection in images with complex backgrounds. First, the rear vehicle lights were extracted by converting the image to the YUV color space. Once the rear lights were detected, the LP area was detected by using a histogram-based approach on the edge energy map. The study in [6], although not termed multinational, tested its approach on LP datasets from various countries to validate the generalizing capability of the method.

E. LIMITATIONS
Most related works have one of the following limitations:
1) Most previous works on LP detection/recognition are tailored to work on license plates from specific countries. Generally, such approaches use country-specific information to constrain the problem of LP detection/recognition.
2) The previous works [6], [15] dealing with multinational license plates were tested on datasets from various countries; however, the datasets used for testing contained the same single-line license plate layout. In short, the datasets used lack diversity in LP layouts.
3) Certain previous works [1], [17] that test their approach on the AOLP dataset require additional pre-processing using the Hough transform when testing on subset RP to handle the rotated LPs.

III. SEMI-AUTOMATIC DATA GENERATION STRATEGY
Most Korean license plate datasets are not publicly available. Therefore, we created our own Korean car plate (KarPlate) dataset by using our own semi-automatic dataset generation strategy, as illustrated in Fig. 1. The details of our dataset generation strategy are described below:

A. INITIAL DATA COLLECTION AND TRAINING
We collected image data from a car parking lot in South Korea. This initial dataset consists of 372 images, each containing a distinct car. We manually annotated this data using an open-source image labeling software known as LabelImg [24]. After labeling our data, we generated two datasets: one consists of full car images with bounding box annotations for the LP, while the other consists of images of cropped LPs with bounding box annotations for every character.
After dataset annotation, we augmented our dataset by using the augmentation strategy shown in Fig. 2. We designed this augmentation strategy to enhance the robustness of our system across varying conditions. The system was trained by using the augmented images. Our networks were trained until they could be tested on unseen data with sufficient accuracy. The weights of the trained networks were saved and used in the next step for automatic annotation of unseen data.

B. AUTOMATIC ANNOTATION
We collected raw video data over many days from a CCTV camera installed at a car wash facility in South Korea. From the raw video data, we extracted video clips that contain cars and merged all clips into one clip of about 32 hours duration.
The merged video clip was run through the network trained in the previous step. If the system detected an LP in a frame, a script saved the frame along with the detected bounding box annotations and LP number. A total of 471,981 frames along with annotations and LP numbers were saved after running inference on the video data.
However, these 471,981 images contained many redundant frames. These frames looked identical in appearance. It is important to extract only unique frames with distinct appearance.

C. KEYFRAME EXTRACTION
This step deals with the automatic selection of unique frames (keyframes) and deletion of the redundant ones. The primary goal of this step is to discriminate keyframes $I_{key}$ from redundant frames $I_r$. Simply put, the keyframes are found by calculating the similarity between two images using the correlation coefficient $r$. The overall approach is shown in Fig. 3.
Given a sequence $I_{seq}$ of $n$ images, the first frame was set as the initial reference frame $I_{ref}$. The histogram of the reference frame $H_{ref}$ was calculated and repeatedly compared with the histogram of the following query frame $H_q$ until the distance $d$ between the two histograms reached a threshold value $T$. The correlation coefficient was used as the distance measure. The criterion used to discriminate among frames is defined as follows:

$$I^{n}_{seq} = \begin{cases} I_{key}, & \text{if } d(H_{ref}, H_q) > T \\ I_r, & \text{otherwise} \end{cases}$$

$$d(H_{ref}, H_q) = \frac{\sum_i \left(H_{ref}(i) - \bar{H}_{ref}\right)\left(H_q(i) - \bar{H}_q\right)}{\sqrt{\sum_i \left(H_{ref}(i) - \bar{H}_{ref}\right)^2 \sum_i \left(H_q(i) - \bar{H}_q\right)^2}}$$

where $I^{n}_{seq}$ is the $n$-th image in $I_{seq}$, $H_{ref}(i)$ and $H_q(i)$ denote the normalized frequency at pixel value $i$ in the histograms, and $\bar{H}_{ref}$ and $\bar{H}_q$ denote the means of the two histograms. If $d$ is below the threshold value, then the query frame is considered redundant and the comparison continues with the following query frame. On the other hand, if $d$ exceeds the threshold, then the query frame is considered a keyframe. As soon as a keyframe is found, it replaces the current reference frame and the process continues until no frame remains in $I_{seq}$. A total of 3,893 keyframes were extracted from the 471,981 images by using keyframe extraction.
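The keyframe selection loop described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' script: the function names, the 256-bin grayscale histogram, and the threshold value 0.3 are all hypothetical. The text uses the correlation coefficient as the distance measure; because a correlation grows with similarity, this sketch assumes the dissimilarity $d = 1 - r$ so that "d exceeds T" means the query frame differs from the reference. It also assumes the first frame is itself kept as a keyframe.

```python
import numpy as np


def gray_histogram(image, bins=256):
    """Normalized intensity histogram of an 8-bit grayscale image."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def correlation(h_ref, h_q):
    """Pearson correlation coefficient r between two histograms."""
    a = h_ref - h_ref.mean()
    b = h_q - h_q.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 1.0


def extract_keyframes(frames, threshold=0.3):
    """Return the indices of keyframes in a list of grayscale frames.

    Assumption (not from the paper): d = 1 - r, and the first frame
    is kept as the first keyframe.
    """
    if not frames:
        return []
    keyframes = [0]
    h_ref = gray_histogram(frames[0])
    for idx in range(1, len(frames)):
        h_q = gray_histogram(frames[idx])
        d = 1.0 - correlation(h_ref, h_q)
        if d > threshold:
            keyframes.append(idx)
            h_ref = h_q  # the new keyframe becomes the reference frame
    return keyframes


# Two identical dark frames followed by two identical bright frames.
dark = np.zeros((4, 4), dtype=np.uint8)
bright = np.full((4, 4), 200, dtype=np.uint8)
print(extract_keyframes([dark, dark.copy(), bright, bright.copy()]))  # [0, 2]
```

Comparing each query frame against the most recent keyframe, rather than against its immediate predecessor, keeps slow gradual changes from slipping through unnoticed.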

D. MANUAL VERIFICATION
In this step, we manually verified the annotations of the 3,893 extracted keyframes, which had been automatically annotated, to ensure that all the annotations were correct. First, we verified the LP number of each image. This was followed by verification of the bounding box coordinates and bounding box labels. In case of any incorrect annotation, the image's annotation file was corrected and updated.

IV. PROPOSED ALPR SYSTEM
In this section, we describe the working of our proposed ALPR approach. Our system is primarily based on YOLOv3 [22]. The overall block diagram of our algorithm is illustrated in Fig. 4.

A. LICENSE PLATE DETECTION
This step of the algorithm deals with the detection and localization of the LP. The primary goal of this stage is to constrain the search area for the character recognition step. The localized LP plays a crucial role in reducing the number of false character detections outside the LP area. We propose to use the YOLO network architecture in this stage of our system. Specifically, we use tiny YOLOv3 for this step of our proposed algorithm. Tiny YOLOv3 is a smaller version of YOLOv3 and achieves a high FPS at the expense of a lower mAP score. Tiny YOLOv3 is suitable for our application since LP detection is a simpler task than LP recognition. The YOLO object detector works by splitting an image into an $S \times S$ grid. For each grid cell, $K$ bounding boxes along with the confidence scores $\Pr(\text{Object}) \times \text{IOU}^{truth}_{pred}$ are predicted. The confidence score indicates how confident the model is about the existence of an object. The confidence score is zero in the absence of an object. In the presence of an object, the confidence score is equal to the IOU between the predicted and the ground truth bounding box. A conditional class probability score $\Pr(\text{Class}_i \mid \text{Object})$ is also predicted for each grid cell containing an object. The class-specific confidence score for each box is calculated using Equation 3 and is encoded as an $S \times S \times (K \times (5 + C))$ tensor.

$$\Pr(\text{Class}_i \mid \text{Object}) \times \Pr(\text{Object}) \times \text{IOU}^{truth}_{pred} = \Pr(\text{Class}_i) \times \text{IOU}^{truth}_{pred} \quad (3)$$
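As a small worked illustration of the quantities above, the following sketch computes the IOU of two boxes, the class-specific confidence as the product of the conditional class probability, the objectness, and the IOU (the standard YOLO formulation), and the shape of the prediction tensor. The function names and the example grid size $S = 13$ with $K = 3$ boxes per cell are assumptions chosen for illustration, not values taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def class_confidence(p_class_given_object, p_object, iou_truth_pred):
    """Class-specific confidence: Pr(Class_i | Object) * Pr(Object) * IOU."""
    return p_class_given_object * p_object * iou_truth_pred


def yolo_output_shape(s, k, num_classes):
    """Shape of the S x S x (K * (5 + C)) prediction tensor."""
    return (s, s, k * (5 + num_classes))


# Hypothetical values: a 13 x 13 grid, 3 boxes per cell, one class (LP).
print(yolo_output_shape(13, 3, 1))  # (13, 13, 18)
```

The 5 in the tensor depth covers the four box coordinates plus the objectness score of each predicted box.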
The main reason for selecting YOLO is its real-time performance and high accuracy. Specifically, we utilize the latest YOLOv3 instead of YOLOv2 even though YOLOv2 achieves higher frames per second (fps). This is because YOLOv3 achieves a higher mean average precision (mAP) score and makes predictions at three different scales. An accurate system with a reasonable level of real-time speed is preferred over a high-fps system with low accuracy. The ability of YOLOv3 to make predictions at varying scales preserves fine features, which enables the network to detect small objects. Previous versions of YOLO struggled to detect small objects. The YOLOv3 used in this step is referred to as the attention network since it extracts the important region for the recognition network. The network was trained on images resized to 640 × 640 for the KarPlate subset LPD, AOLP [26], and Medialab [29] datasets. For the Caltech Cars [28] and University of Zagreb [30] datasets, network input image sizes of 704 × 416 and 640 × 480, respectively, were selected based on the LP aspect ratio. We changed the number of filters used in the last convolution layer of YOLOv3. The number of filters depends upon the number of classes to predict (1 in this case) and is calculated by using Equation 4.

$$filters = (classes + coords + 1) \times A \quad (4)$$

YOLO utilizes anchors $A$ to predict bounding boxes with coordinates coords $(x, y, w, h)$. The default value of $A$ was used in our experiments. YOLO requires a confidence threshold value. An object's location is returned only if its confidence is over the threshold. Sometimes, YOLO predicts other similar objects (such as billboards) as an LP. To solve this issue and improve LP detection and localization, we added negative image samples during training.
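The filter count for the last convolution layer can be checked with a one-line helper. This sketch assumes the usual Darknet rule for YOLOv3, filters = (classes + coords + 1) × anchors, with 4 box coordinates and 3 anchors per detection scale (the Darknet default); the function name and parameter names are hypothetical.

```python
def last_layer_filters(num_classes, anchors_per_scale=3, coords=4):
    """Filters in the convolution layer feeding each YOLO detection layer.

    Each anchor predicts `coords` box values, 1 objectness score, and
    one score per class. The defaults (3 anchors, 4 coords) are assumed
    from the stock YOLOv3 configuration.
    """
    return (num_classes + coords + 1) * anchors_per_scale


print(last_layer_filters(1))   # LP detection, one class: 18
print(last_layer_filters(36))  # A-Z and 0-9: 123
print(last_layer_filters(45))  # 35 Hangul + 10 digits: 150
```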

B. UNIFIED CHARACTER RECOGNITION
This step is responsible for recognizing characters in the extracted LP from the previous step. Contrary to most previous works, which address character recognition as a two-step problem (segmentation and recognition), we pose character recognition as an object recognition problem. Using object recognition, we unify the character segmentation and recognition step into one by treating characters as objects. This stage uses YOLOv3-SPP which is an improved version of YOLOv3 that uses spatial pyramid pooling (SPP) block [23].
Contrary to traditional pooling, spatial pyramid pooling splits a feature map into $B_i = n_i \times n_i$ bins, where $B_i$ denotes the number of bins in the $i$-th layer of the pyramid. The feature maps are then max-pooled within each bin. This produces an $N \times B$ vector, where $N$ denotes the number of filters in the convolution layer and $B$ denotes the number of bins. By pooling within local spatial bins, SPP generates fixed-length vectors regardless of the input size. The filter size in traditional pooling is fixed, whereas the filter size in SPP depends upon the input and output size. The SPP block has been shown to improve performance in various CNNs since SPP handles multi-scale images effectively.
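The binning idea described above can be sketched in NumPy. This follows the description in the text (max pooling into $n \times n$ bins whose size adapts to the input) and is not the YOLOv3-SPP implementation; the function names and the pyramid levels (1, 2, 4) are assumptions chosen for illustration.

```python
import numpy as np


def spp_level(feature_map, n):
    """Max-pool an (N, H, W) feature map into n x n bins, giving (N, n * n)."""
    channels, height, width = feature_map.shape
    # Bin edges scale with the input size, so the output size stays fixed.
    rows = np.linspace(0, height, n + 1).astype(int)
    cols = np.linspace(0, width, n + 1).astype(int)
    pooled = np.empty((channels, n * n), dtype=feature_map.dtype)
    for i in range(n):
        for j in range(n):
            r0, r1 = rows[i], max(rows[i + 1], rows[i] + 1)
            c0, c1 = cols[j], max(cols[j + 1], cols[j] + 1)
            pooled[:, i * n + j] = feature_map[:, r0:r1, c0:c1].max(axis=(1, 2))
    return pooled


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Concatenate all pyramid levels into one fixed-length vector."""
    parts = [spp_level(feature_map, n) for n in levels]
    return np.concatenate(parts, axis=1).ravel()


# Two inputs with different spatial sizes give vectors of the same length:
# N * (1 + 4 + 16) = 8 * 21 = 168.
a = spatial_pyramid_pool(np.random.rand(8, 13, 17))
b = spatial_pyramid_pool(np.random.rand(8, 20, 9))
print(a.shape, b.shape)  # (168,) (168,)
```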
The network was trained on images resized to 384×192, 288×224, 256×224, 384×224, and 384×224 for the KarPlate subset LPR, AOLP [26], Caltech Cars [28], Medialab [29], and University of Zagreb datasets, respectively. The network input image sizes were chosen based on the aspect ratios of the LPs. For the KarPlate LPR dataset, the network was trained to recognize 45 classes. Korean LPs consist of 35 Korean alphabetic (Hangul) characters and 10 numerical characters. The details of the 35 Hangul and 10 numerical characters are shown in Table 1. For the other datasets, the network was trained to recognize 36 classes (A-Z and 0-9). As in the preceding step, the YOLO filters were recalculated to match the number of classes. Since the number of characters $n_c$ in a specific country's LP is known, we use this information to keep only the top $n_c$ characters among the detected characters. This way, characters with low confidence scores are filtered out and false positives are reduced. If the number of characters is variable, we compute the IOU among the detected objects and reject the object with the lower confidence if the IOU between two bounding boxes is greater than a threshold.
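The two post-processing rules above (keep the top $n_c$ detections when the character count is known, otherwise reject the lower-confidence box of any heavily overlapping pair) can be sketched as follows. The detection format `(label, confidence, box)`, the function names, and the IOU threshold of 0.5 are assumptions made for this illustration; the paper does not state the threshold value.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_top_characters(detections, n_c):
    """Known character count: keep the n_c most confident detections."""
    return sorted(detections, key=lambda d: d[1], reverse=True)[:n_c]


def reject_overlaps(detections, iou_threshold=0.5):
    """Variable character count: drop the weaker box of each overlapping pair."""
    kept = []
    for det in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(det[2], other[2]) <= iou_threshold for other in kept):
            kept.append(det)
    return kept


detections = [
    ("A", 0.90, (0, 0, 10, 20)),
    ("4", 0.40, (1, 0, 11, 20)),   # overlaps "A" heavily, lower confidence
    ("B", 0.80, (12, 0, 22, 20)),
]
print([d[0] for d in reject_overlaps(detections)])         # ['A', 'B']
print([d[0] for d in keep_top_characters(detections, 2)])  # ['A', 'B']
```

Both helpers return detections ordered by confidence; restoring the reading order is left to the layout detection step.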

C. MULTINATIONAL LICENSE PLATE LAYOUT DETECTION
There is a specific layout for every country's LP. Every LP number should be extracted in the correct order. The output from the recognition network does not provide information about the order of the LP number. Therefore, certain heuristics are required to extract the final number. Most previous works design methods for extracting the correct order of license number. However, these methods work only on specific countries' LPs and fail when applied to other countries' LPs. In this study, we propose an algorithm for extracting the correct order of the LP number that generalizes to multinational LPs.
In order to develop a universal algorithm, we first analyze the layouts of LPs existing in the world. We examine LPs belonging to various countries from all continents of the world (except Antarctica). Particularly, we analyze LPs from 17 countries belonging to different continents as shown in Fig. 5. Based on our analyses, we observed that most LPs in the world can be classified as single line or double line license plate as shown in Fig. 5.
The block diagram of our algorithm is shown in Fig. 6. First, we sort all the recognized bounding boxes by $x^{TL}_n$ (top-left x coordinate). Let bboxes be the sorted list of lists in ascending order:

$$bboxes = \left[[x^{TL}_1, y^{TL}_1, x^{BR}_1, y^{BR}_1], \ldots, [x^{TL}_n, y^{TL}_n, x^{BR}_n, y^{BR}_n]\right]$$

where $x^{TL}_n$ and $y^{TL}_n$ represent the top-left x and y coordinates, whereas $x^{BR}_n$ and $y^{BR}_n$ represent the bottom-right x and y coordinates of the $n$-th bounding box. After sorting, we make line segments on the left side of all bounding boxes. The end points of the line segment $l_i$ of the $i$-th bounding box are:

$$l_i = (x^{TL}_i, y^{TL}_i), (x^{TL}_i, y^{BR}_i)$$

This is followed by extracting the left ($l_{left}$) and right ($l_{right}$) borderlines (see the yellow lines in Figs. 7 and 8). Since the bboxes list is sorted in ascending order, the end points of the $l_{left}$ and $l_{right}$ borderlines can be found by using the first and the last bounding box coordinates in bboxes, respectively. The coordinates for $l_{left}$ and $l_{right}$ are:

$$l_{left} = (x^{TL}_1, y^{TL}_1), (x^{TL}_1, y^{BR}_1)$$

$$l_{right} = (x^{TL}_n, y^{TL}_n), (x^{TL}_n, y^{BR}_n)$$

Then, we draw a line segment $l_{center}$ (see the purple line in Figs. 7 and 8) with the midpoints of the borderlines ($l_{left}$ and $l_{right}$) as its end points. The end points of $l_{center}$ are:

$$l_{center} = \left(x^{TL}_1, \frac{y^{TL}_1 + y^{BR}_1}{2}\right), \left(x^{TL}_n, \frac{y^{TL}_n + y^{BR}_n}{2}\right)$$

Next, we check the number of line segments intersecting the center line $l_{center}$. If $l_{center}$ intersects all the line segments, then the LP is a single line LP (see the rightmost image in Fig. 7). In contrast, if $l_{center}$ intersects only some of the line segments, then the LP is a double line LP (see the rightmost image in Fig. 8). To find the number of intersections, we generate line equations in general form ($ax + by = c$) from the end points of the line segments ($l$). The intersection point $P(x, y)$ can be calculated by:

$$P(x, y) = \left(\frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1}, \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}\right) \quad (10)$$

where $a_1$, $b_1$, and $c_1$ represent the coefficients of one line while $a_2$, $b_2$, and $c_2$ represent the coefficients of the other line. Note that Equation 10 finds the point of intersection of the infinitely long lines defined by the end points, rather than of the line segments between the end points. Therefore, to find whether an intersection point exists within the line segments, we apply the following criterion:

$$f(x) = \begin{cases} 1, & \text{if } P(x, y) \text{ lies within both line segments} \\ 0, & \text{otherwise} \end{cases}$$

If $f(x) = 1$, an intersection point exists within the two line segments, while $f(x) = 0$ means that no intersection point exists within the two line segments.
Based on the number of line segments that intersect $l_{center}$, we can find the type of LP by using the criterion:

$$g(x) = \begin{cases} 1, & \text{if } N_{int} = N_{bboxes} \\ 2, & \text{otherwise} \end{cases}$$

where $N_{int}$ is the number of intersections and $N_{bboxes}$ is the total number of bounding boxes in bboxes. The function $g(x)$ returns 1 for a single line LP and 2 for a double line LP. For a single line LP, the final number has the same order as bboxes. For a double line LP, the bounding boxes lying in the top and bottom areas of the LP must be found. To find the bounding boxes lying in the top part of the LP, the following criterion is applied:

$$bboxes_{top} = \begin{cases} \overline{bboxes_{int}}, & \text{if } y^{TL}_1\left(bboxes_{int}\right) > y^{TL}_1\left(\overline{bboxes_{int}}\right) \\ bboxes_{int}, & \text{otherwise} \end{cases}$$

where $bboxes_{int}$ and $\overline{bboxes_{int}}$ are the lists of bounding boxes that intersect and do not intersect $l_{center}$, respectively. $bboxes_{int}$ and $\overline{bboxes_{int}}$ are sorted by $x^{TL}_n$ (top-left x coordinate) in ascending order. $bboxes_{int}$ includes the bounding boxes that are associated with the borderlines ($l_{left}$ and $l_{right}$). $y^{TL}_1$ is the top-left y coordinate of the first bounding box in $bboxes_{int}$ or $\overline{bboxes_{int}}$. The bounding boxes in the top area are written first, while the ones in the bottom area are written at the end of the final string.
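The layout detection steps above can be put together in a short, self-contained sketch. It is an illustration of the described procedure under stated assumptions, not the authors' code: the detection format `(label, (x_tl, y_tl, x_br, y_br))`, the function names, the tolerance `EPS`, and the fallback for the degenerate case where the first and last boxes share the same x coordinate are all hypothetical. Image coordinates are assumed to have y increasing downward, so a smaller top-left y means a higher position on the plate.

```python
EPS = 1e-6


def line_coefficients(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    return a, b, a * p[0] + b * p[1]


def segments_intersect(s1, s2):
    """True if the intersection point of the two lines lies on both segments."""
    a1, b1, c1 = line_coefficients(*s1)
    a2, b2, c2 = line_coefficients(*s2)
    det = a1 * b2 - a2 * b1
    if abs(det) < EPS:  # parallel lines: treated as no intersection
        return False
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det

    def within(seg):
        (x1, y1), (x2, y2) = seg
        return (min(x1, x2) - EPS <= x <= max(x1, x2) + EPS
                and min(y1, y2) - EPS <= y <= max(y1, y2) + EPS)

    return within(s1) and within(s2)


def read_plate(detections):
    """Return (number_of_lines, ordered_text) for a list of detections."""
    dets = sorted(detections, key=lambda d: d[1][0])  # sort by top-left x
    text = "".join(label for label, _ in dets)
    if len(dets) < 2:
        return 1, text
    # Line segment on the left side of every bounding box.
    edges = [((b[0], b[1]), (b[0], b[3])) for _, b in dets]
    left, right = edges[0], edges[-1]
    center = ((left[0][0], (left[0][1] + left[1][1]) / 2),
              (right[0][0], (right[0][1] + right[1][1]) / 2))
    hits = [segments_intersect(edge, center) for edge in edges]
    if all(hits):
        return 1, text
    if not any(hits):  # degenerate vertical center line: order by y instead
        stacked = sorted(dets, key=lambda d: d[1][1])
        return 2, "".join(label for label, _ in stacked)
    on_line = [d for d, hit in zip(dets, hits) if hit]
    off_line = [d for d, hit in zip(dets, hits) if not hit]
    # The group whose first box has the smaller top-left y is the top row.
    if on_line[0][1][1] < off_line[0][1][1]:
        top, bottom = on_line, off_line
    else:
        top, bottom = off_line, on_line
    return 2, "".join(label for label, _ in top + bottom)


single = [("A", (20, 0, 28, 20)), ("1", (0, 0, 8, 20)),
          ("B", (30, 0, 38, 20)), ("2", (10, 0, 18, 20))]
double = [("A", (12, 0, 20, 10)), ("B", (22, 0, 30, 10)),
          ("1", (0, 12, 10, 30)), ("2", (11, 12, 21, 30)),
          ("3", (23, 12, 33, 30)), ("4", (35, 12, 45, 30))]
print(read_plate(single))  # (1, '12AB')
print(read_plate(double))  # (2, 'AB1234')
```

In the double line example the first and last boxes both belong to the bottom row, so the center line runs through that row and the two top-row characters are the ones that fail the intersection test.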

A. DATASETS
The proposed approach was validated on datasets containing license plates from South Korea, Taiwan, Greece, USA, and Croatia. These datasets were manually annotated since none of them provided bounding box annotations for LP detection (except for the AOLP dataset) or character recognition.

1) KARPLATE DATASET (SOUTH KOREA)
The KarPlate (Korean car plate) dataset is publicly available and can be downloaded from our project webpage. The details of our dataset, generated using the semi-automatic dataset generation strategy, are discussed in this section. The KarPlate dataset is divided into subset LPD, subset LPR, and subset EER. Subsets LPD and LPR each contain 3,417 images for training and 850 images for testing. Subset EER contains only 929 test images along with the LP numbers. Among the 929 images in subset EER, 850 images are the same as the test images in subset LPD, while the remaining 79 images were captured by a mobile phone camera. These 79 images are relatively more challenging since each image contains multiple LPs.
Each subset is intended to be used for a specific task. Subset LPD is for license plate detection, subset LPR is for license plate recognition, and subset EER is for end-to-end recognition. Each image in subset LPD and subset EER has a resolution of 1920×1080. The train split is augmented using the augmentation strategy shown in Fig. 2. After augmentation, more than 30,000 and 60,000 training images were generated from subset LPD and subset LPR, respectively. The annotations of the augmented images were also augmented using the imgaug library [25]. Sample images from each subset are shown in Fig. 10. It should be noted that our dataset does not contain images of commercial and rental vehicles in South Korea since none passed through our camera setup.

2) AOLP DATASET (TAIWAN)
The application-oriented license plate (AOLP) [26] database is a public dataset consisting of 2,049 images of Taiwanese license plates. The dataset is divided into access control (AC), law enforcement (LE), and road patrol (RP) subsets. Subset AC, subset LE, and subset RP consist of 681, 757, and 611 images, respectively. License plates in each subset present varying application parameters (such as tilt, width ratio, and distance) depending upon the application case. Subset AC comprises images of vehicles passing through fixed passages such as toll stations. Subset LE includes images captured by roadside cameras that are used for checking traffic law violations. Lastly, subset RP contains images taken by handheld cameras that are used for finding parking violations, searching for lost vehicles, etc. Following previous works, two subsets were used for training while the third subset was used for testing.

3) MEDIALAB LPR DATABASE (GREECE)
The Medialab LPR database consists of 716 images containing Greek license plates and is provided by the National Technical University of Athens. It is divided into a normal subset and a difficult subset which covers situations like shadows, low-light, blur, dirt, etc. The normal subset contains 437 images while the difficult subset contains 279 images.
Following previous works, the normal subset was used for testing and the difficult subset was used for training our system. However, it must be noted that the difficult subset contains only 279 images, of which 20 contain license plates that are unreadable due to small image dimensions. This leaves 259 images, which are far too few for training our deep neural networks even after applying data augmentation. To tackle this issue, we additionally used 501 images from the University of Zagreb [30] dataset, 108 images from the OpenALPR Europe dataset [31], and 1,428 images from the ReId dataset [32]. All the images taken from other datasets contain European LPs. For testing, 431 out of 437 images from the normal subset were used since the remaining images contained unreadable LPs. The characters '1' and 'I' were trained as a single class because they have the same appearance in Greek LPs. '0' and 'O' were also trained as a single class for the same reason. During the testing phase, these merged classes were resolved by position, since the first two or three characters in Greek LPs are always letters while the remaining four characters are always digits.
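The position-based resolution of the merged classes can be sketched as below. This is an illustrative sketch rather than the implementation used in this work: the mapping tables, the function name `resolve_greek_plate`, and the choice of which symbol the network emits for a merged class are all assumptions. Only the layout rule (letters first, four digits last) comes from the text.

```python
# Hypothetical mapping tables: which symbol the network emits for each
# merged class is an assumption, so both directions are covered.
TO_DIGIT = {"I": "1", "O": "0"}
TO_LETTER = {"1": "I", "0": "O"}
NUM_DIGITS = 4  # Greek LPs end with four digits, per the text.


def resolve_greek_plate(chars):
    """Map merged-class characters to letters or digits by position."""
    if len(chars) <= NUM_DIGITS:
        return chars  # too short to match the layout; leave untouched
    split = len(chars) - NUM_DIGITS
    letters = "".join(TO_LETTER.get(c, c) for c in chars[:split])
    digits = "".join(TO_DIGIT.get(c, c) for c in chars[split:])
    return letters + digits


print(resolve_greek_plate("IKOOI23"))  # three letters followed by four digits
print(resolve_greek_plate("0B1234"))   # two letters followed by four digits
```

Because the split point is counted from the end of the string, the same rule covers plates with either two or three leading letters.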

4) CALTECH CARS (REAR) 1999 DATASET (USA)
The Caltech Cars (Rear) 1999 dataset contains 126 images of vehicles with license plates from different states of the USA. The images have a resolution of 896×592 pixels and were captured at a Caltech parking lot with a cluttered background. The dataset was randomly split into 80 images for training and 46 images for testing. The train-test split was inspired by previous works. In addition to the 80 images, 244 images from the OpenALPR US dataset, 108 images from the OpenALPR Europe dataset [31], and 501 images from the University of Zagreb [30] dataset were used for training the character recognition network. On the other hand, only the OpenALPR US dataset [31] was used for training the LP detection network.

TABLE 2. LP detection comparison in terms of precision (P) and recall (R) at IOU of 0.5.

5) UNIVERSITY OF ZAGREB DATASET (CROATIA)
This database comprises 510 images of vehicles with Croatian license plates collected by the University of Zagreb, Croatia. Among the 510 images, 9 images were discarded, and the remaining 501 images were utilized for training and testing. The dataset was randomly split into 401 images for training and 100 images for testing. In addition, 108 images from the OpenALPR Europe dataset [31] were used for training. The characters '0' and 'O' were trained and tested as a single class because they have the same appearance in Croatian license plates.

B. IMPLEMENTATION DETAILS
The proposed system was trained and tested on a personal computer containing an Intel Core i7-4770 processor along with an NVIDIA Titan X Pascal and 24 gigabytes of RAM. AlexeyAB's version of Darknet [33] was used to train the YOLO networks. The overall system was programmed in Python, and a Python wrapper was used for incorporating Darknet into the system. Imgaug [25] was used to augment the images together with their bounding boxes.

C. EXPERIMENTAL RESULTS
The proposed method was evaluated on five LP datasets, each belonging to a different country, to validate the effectiveness of our approach. The performance evaluation is divided into five sections: license plate detection, license plate recognition, end-to-end recognition, multinational LP layout detection, and time consumption. The sections are described below.

1) LICENSE PLATE DETECTION
Precision is calculated by dividing the number of correctly detected LPs by the total number of detected LPs. Recall is calculated by dividing the number of correctly detected LPs by the total number of ground truth LPs. Precision and recall are mathematically defined as follows:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{14}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{15}$$

where TP, FP, and FN represent true positives, false positives, and false negatives, respectively. A detected LP bounding box is considered correct if the intersection over union (IoU) between the predicted and the ground truth bounding box is greater than 0.5. We select the threshold value of 0.5 for a fair comparison, since it is the evaluation criterion used in previous works. The LP detection performance was compared with the recent methods [1], [8], [17], [34]-[36]. It is evident from Table 2 that the proposed approach outperformed most previous works on the five datasets. The proposed approach outperforms the work by [34] with a precision/recall rate of 98.85/99.76% on the Medialab dataset [29]. Similarly, a precision/recall rate of 100.00/100.00% surpassed the results of the methods in [35], [36] on the Caltech Cars dataset [28]. Precision/recall rates of 98.01/99.00% and 100.00/100.00% were obtained on the University of Zagreb [30] and KarPlate subset LPD datasets, respectively. In addition, our system achieves a recall rate of 100.00% on both AOLP subset AC and AOLP subset RP.
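The evaluation criterion above can be sketched in a few lines. This is a minimal illustration for a single image, not the evaluation script used in this work: the box format `(x_min, y_min, x_max, y_max)`, the function names, and the greedy one-to-one matching of predictions to ground truth boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to at most one unmatched ground truth box."""
    unmatched = list(ground_truth)
    tp = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) > threshold:
            tp += 1
            unmatched.remove(best)
    fp = len(predicted) - tp  # detections with no matching ground truth
    fn = len(unmatched)       # ground truth boxes that were never matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(precision_recall(predicted, ground_truth))
```

In this toy case one prediction overlaps a ground truth box with an IoU of 0.81 and the other overlaps nothing, so one of two detections is correct and one of two plates is found.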

2) LICENSE PLATE RECOGNITION
In this section, the LP recognition performance is compared with previous works [1], [11], [13], [17], [37]. The accuracy reported in this section corresponds to the combined accuracy of the character recognition network and the layout detection algorithm. It should be noted that the input to the character recognition network is the cropped LP image. Given the cropped LP image, the character recognition network detects the characters in the image, which are then fed into the layout detection algorithm, resulting in a character sequence that represents the LP number. The result is considered correct only if all the characters are detected correctly and the LP number is in the correct sequence. The result is declared incorrect if any character is missed or if extra characters are detected. The LP recognition results on the AOLP, Medialab, Caltech Cars, University of Zagreb, and KarPlate subset LPR datasets are presented in Table 3. The proposed approach achieves high accuracy on all five datasets. The results show that our method achieves an average accuracy of 99.34% on the AOLP dataset, outperforming the previous state-of-the-art method [13]. Our approach performed well on all subsets of the AOLP dataset. It should be noted that the accuracy for AOLP subset LE by [13] was calculated for only 582 out of 757 images, whereas we present results for all 757 images in AOLP subset LE. Hence, our result on this subset was obtained under a stricter evaluation than that of [13]. In the case of AOLP subset RP, the toughest subset in the AOLP dataset, we achieved an accuracy of 99.53%. It should be noted that the study in [7] used the Hough transform to straighten the LPs as a preprocessing step before inputting them to their proposed algorithm. Our algorithm can effectively handle rotated LPs without requiring any preprocessing.
The study in [11] did not include results on subset AC and subset LE, while the study in [17] only provided average accuracy. Hence, our results could not be compared with theirs on the individual subsets.
It must be considered that the accuracy reported by [13] for the Medialab dataset was calculated for only 427 images. Hence, for a fair comparison, the accuracy of 98.36% presented in Table 3 for the proposed approach was also calculated for only 427 images. The accuracy achieved by our method for 431 images was 97.75%. In the case of the Caltech Cars, University of Zagreb, and KarPlate subset LPR datasets, accuracies of 95.65%, 98.00%, and 98.59% were achieved, respectively. To the best of our knowledge, none of the previous works reported accuracy for LP recognition alone on the Caltech Cars and University of Zagreb datasets.
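The correctness criterion used in this section amounts to an exact string match between the predicted and the ground truth LP number. A minimal sketch follows, assuming one predicted string per plate; the function name `recognition_accuracy` and the sample plate strings are hypothetical.

```python
def recognition_accuracy(predicted_numbers, ground_truth_numbers):
    """Percentage of plates whose predicted string equals the ground truth exactly."""
    if len(predicted_numbers) != len(ground_truth_numbers):
        raise ValueError("one prediction per ground-truth plate is required")
    if not ground_truth_numbers:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted_numbers, ground_truth_numbers))
    return 100.0 * correct / len(ground_truth_numbers)


# Made-up plates: the second has a missing character and the fourth has
# the right characters in the wrong order, so both count as errors.
predicted = ["AB1234", "CD567", "EF9012", "1234GH"]
ground_truth = ["AB1234", "CD5678", "EF9012", "GH1234"]
print(recognition_accuracy(predicted, ground_truth))
```

Under this criterion there is no partial credit. On a test set of 46 images such as Caltech Cars, each misread plate lowers the accuracy by roughly 2.17 percentage points, and 44 correct plates out of 46 would correspond to 95.65%.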

3) END-TO-END RECOGNITION
This section evaluates the end-to-end performance of our approach, which indicates the overall performance of the proposed solution. The uncropped image is passed into our three-step algorithm and the result is the LP number. The result is correct only if the LP is detected correctly, the characters are recognized correctly, and the layout is detected correctly. An incorrect result at any of the three steps leads to an incorrect recognition. Put simply, given an uncropped image, the resulting character sequence must match the ground truth character sequence. Table 4 presents the end-to-end performance of our method and compares it with previous academic [6], [11], [36] and industrial works [2], [14]. The proposed method surpasses the results of commercial software [2], [14] and academic works [6], [11], [36] on the five datasets, with the exception of [36] on the Caltech Cars dataset. It is worth mentioning that commercial software is generally trained on much larger datasets, which is a huge advantage in deep learning. Our method outperforms previous academic works [6], [11], [36] and both commercially available software packages [2], [14] with an average accuracy of 98.93% on the AOLP dataset. For OpenALPR [2], the region was set to Europe when testing on the AOLP dataset since OpenALPR lacks a Taiwan region. Europe was chosen since both European and Taiwanese LPs are single line LPs and use the same characters. The work by [11] used artificial data to achieve an accuracy of 98.36% on AOLP subset RP, while ours did not use artificial data. The performance of the work by [11] on AOLP subset RP drops to 93.29% without artificial data, while our work achieves 99.51% accuracy without artificial data. Also, the work by [6] cannot deal with rotated LPs in subset RP, which is evident from its low accuracy of 83.63%.
The proposed method outperformed commercial software [2], [14] and achieved accuracies of 96.98% and 97.00% on the Medialab and University of Zagreb datasets, respectively. An accuracy of 97.83% was obtained by our method as compared to 98.70% by [36] on the Caltech Cars dataset. However, it must be taken into consideration that the test data was randomly selected both in [36] and in our work. It is possible that relatively difficult images were included in our test set, which could be responsible for the slightly lower accuracy. Lastly, an accuracy of 98.82% was achieved on the KarPlate subset EER dataset. Since Sighthound [14] cannot detect Korean characters, we could not test it on the KarPlate subset EER dataset.

4) MULTINATIONAL LP LAYOUT DETECTION
We already evaluated the proposed approach on five datasets from different countries. However, to further validate the applicability of our approach to multinational LPs, a demo evaluation on LPs from 17 countries is presented. A demo evaluation is presented since most countries do not have public LP datasets. Even if a public LP dataset for a specific country is available, it must be annotated with character-level bounding boxes since our method treats character recognition as an object recognition problem. Annotating datasets for about 17 countries is extremely time consuming.
Hence, a demo dataset was collected, consisting of images that each contain an LP from a different country (some LPs belong to the same country but differ in layout). The purpose of this evaluation is to demonstrate the applicability of our work to multinational LPs without the need for any additional steps. This evaluation assumes that the character recognition network outputs correct results. Fig. 11 shows the resulting sequence of characters for the demo dataset. Fig. 11 demonstrates that the proposed layout detection algorithm successfully extracts the correct sequence of the LP number. However, it is worth mentioning that new LP layouts may be introduced over time, and the algorithm may fail on them. We show that our layout detection algorithm works on most LPs from different countries.
It is possible that the character recognition network might also recognize the state's name. For instance, 'FLORIDA' in a USA LP can be detected as characters even though it is not a part of the LP number. To address this issue, such words can either be trained as a separate class that is rejected later if recognized, or be used as negative samples when training the character recognition network.
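To make the sequence extraction task and the rejection of non-number classes concrete, the sketch below orders character detections for single line and double line plates. It is a simplified stand-in heuristic written for illustration and is not the layout detection algorithm proposed in this paper. The detection format `(label, x_min, y_min, x_max, y_max)`, the function name `order_characters`, the `row_gap_ratio` threshold, and the `reject_labels` parameter are all assumptions.

```python
from statistics import median


def order_characters(detections, reject_labels=frozenset(), row_gap_ratio=0.5):
    """Return the plate string from (label, x_min, y_min, x_max, y_max) detections."""
    # Drop classes that are not part of the LP number, e.g. a state name.
    kept = [d for d in detections if d[0] not in reject_labels]
    if not kept:
        return ""
    # Sort by vertical centre so a second row, if any, ends up at the tail.
    by_y = sorted(kept, key=lambda d: (d[2] + d[4]) / 2)
    centres = [(d[2] + d[4]) / 2 for d in by_y]
    char_height = median(d[4] - d[2] for d in by_y)

    # A large jump between consecutive vertical centres suggests two rows.
    gaps = [b - a for a, b in zip(centres, centres[1:])]
    rows = [by_y]
    if gaps and max(gaps) > row_gap_ratio * char_height:
        split = gaps.index(max(gaps)) + 1
        rows = [by_y[:split], by_y[split:]]

    # Read each row left to right, top row first.
    return "".join(
        d[0] for row in rows for d in sorted(row, key=lambda d: (d[1] + d[3]) / 2)
    )


double_line = [("3", 12, 20, 20, 36), ("1", 4, 0, 10, 10), ("A", 0, 20, 8, 36),
               ("2", 14, 0, 20, 10), ("4", 24, 20, 32, 36), ("5", 36, 20, 44, 36)]
print(order_characters(double_line))

with_state = [("FLORIDA", 10, 0, 50, 6), ("A", 0, 10, 8, 26), ("1", 12, 10, 20, 26)]
print(order_characters(with_state, reject_labels={"FLORIDA"}))
```

A heuristic this simple would misread strongly rotated single line plates as double line plates, which is one reason a dedicated layout detection step is needed.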

5) TIME CONSUMPTION
This section evaluates the time consumption of the proposed approach on three tasks. These three tasks include license plate detection (LPD), license plate recognition (LPR), and end-to-end recognition (EER).
The time consumption of our proposed system, presented in this section, was computed using a computer with an Intel i7-4770 processor, an NVIDIA GTX Titan X Pascal, and 24 gigabytes of RAM. It must be noted that the time consumption for LPR includes the time taken by the layout detection algorithm. Also, it is worth mentioning that the time consumption reported for our approach in this paper includes the image reading time. The proposed system is faster if the image reading time is excluded. Table 5 presents the time consumption for LPD, LPR, and end-to-end recognition on all five datasets. For each dataset, we run our ALPR system on all test images from that dataset. The average of the time taken to process each image is reported in Table 5. Our system takes 25.45 ms, 16.54 ms, and 41.56 ms per image on average for LPD, LPR, and end-to-end recognition, respectively. The highest execution time is for the KarPlate dataset. This is because the images in the KarPlate dataset have a resolution of 1920×1080 pixels, which is larger than the resolution of the images in the other four datasets. Another reason is that KarPlate subset EER contains images with multiple LPs, whereas the other four datasets contain images with a single LP.
In practical applications, ALPR systems are required to process images that contain multiple LPs, which increases time consumption. Therefore, for a more realistic evaluation, Table 6 presents the time consumption as a function of the number of LPs present in an image. The execution time presented in Table 6 is for the KarPlate dataset (subset EER). To compute the execution time for multiple LPs, we manually divided the dataset into three subsets based on the number of LPs. For each subset, we test all images in that subset and report the average of the time taken to process each image in Table 6. Execution time for multiple LPs could not be computed for the other four datasets since none of their images contain multiple LPs. It is evident from Table 6 that the execution time increases by a few milliseconds as the number of LPs increases. The proposed system can process a full HD image with three LPs in 93.10 ms (about 11 frames per second). Although 11 frames per second is not especially fast, it is sufficient for real-time usage. It is also worth mentioning that the execution time would decrease if our system were implemented in C++ with multi-threading. Due to time limitations, the results presented in this study were computed using the Python implementation of our system, which does not use multi-threading.
Table 7 compares the time consumption for LPD and end-to-end recognition with recent works [6], [38]. For a fair comparison, the hardware used in previous works [6], [38] is mentioned in Table 7. The time consumption of previous works [6], [38] was taken from the respective studies. It is clear from Table 7 that our approach is faster than previous works [6], [38]. Our approach consumes 29.19 ms for end-to-end recognition on the AOLP dataset, which is considerably less than [6].
In the case of LPD, the proposed method outperforms [38] with a time consumption of 19.36 ms and 14.69 ms for the Caltech Cars and University of Zagreb datasets, respectively. The time consumption for LPR could not be compared since we could not find related works that report time consumption for LPR on the datasets used in this paper.
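The per-image averages and the frame rate quoted in this section follow from simple arithmetic, sketched below. The helper names `average_ms_per_image` and `frames_per_second` are hypothetical, and the `process` callable stands in for whichever stage is being timed; it is assumed to include image reading, as in the reported figures.

```python
import time


def average_ms_per_image(process, images):
    """Average wall-clock time of process(image) in milliseconds."""
    if not images:
        raise ValueError("at least one image is required")
    start = time.perf_counter()
    for image in images:
        process(image)  # assumed to include reading the image from disk
    return 1000.0 * (time.perf_counter() - start) / len(images)


def frames_per_second(ms_per_image):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_image


# Latencies reported in the text.
print(round(frames_per_second(93.10), 2))  # full HD image with three LPs
print(round(frames_per_second(41.56), 2))  # average end-to-end time
```

The first conversion gives roughly 10.7 frames per second, which is the "about 11 frames per second" figure quoted for images with three LPs, while the average end-to-end time corresponds to roughly 24 frames per second.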

VI. CONCLUSION
This paper presented a generalized solution for multinational license plate recognition. Our algorithm consists of LP detection, unified character recognition, and multinational LP layout detection steps. LP recognition is posed as an object recognition problem, which unifies the character segmentation and character recognition steps. Our system is applicable to license plates from multiple countries through our proposed multinational LP layout detection algorithm. To the best of our knowledge, LPs from most countries can be broadly classified into single line and double line LPs. The proposed layout detection algorithm is simple, yet it can effectively distinguish among various LP layouts. Given the correct bounding boxes, our algorithm can effectively extract the correct sequence of the LP number from an image. A new Korean car plate (KarPlate) dataset was made publicly available for research purposes. The proposed solution was tested on license plates from South Korea, Taiwan, USA, Greece, and Croatia. Results show that our proposed approach outperforms previous research works and commercial software. In addition, we collected a small demo dataset consisting of LP images from 17 different countries and tested our layout detection algorithm on it. The layout detection algorithm extracted the correct sequence of the LP number from the LPs of all 17 countries. The proposed solution consumes about 42 ms per image on average, which is considerably faster than previous works. In summary, our proposed algorithm can work on datasets from multiple countries without the need for any additional algorithms or country-specific information.