General Aviation Aircraft Identification at Non-Towered Airports Using A Two-Step Computer Vision-Based Approach

Aircraft identification in airport operations is critical to various applications, including airport planning and environmental studies. Previous research and commercially available systems heavily rely on recognizing aircraft tail numbers using text recognition. However, this approach alone does not provide accurate results in situations when the tail number visibility is reduced or obstructed. Furthermore, general aviation aircraft are harder to identify because they are small in size, and their tail numbers include substantial variations in fonts, sizes, and orientations. To tackle these issues, we propose a two-step computer vision-based aircraft identification method, first identifying the aircraft type and then recognizing the tail number in a probabilistic multi-frame-based (MFB) framework. In the first step, a convolutional neural network (CNN)-based aircraft classifier is customized to decrease the search space in the registration database. In the second step, the identification process is finalized by integrating the text recognition results into the designed probabilistic MFB framework. The proposed method achieves approximately 90% identification accuracy when tested on video data collected from three general aviation airports. This is a significant improvement compared to text recognition alone, which recognizes 67% of the individual tail number characters.


I. INTRODUCTION
AIRPORT operations data is critical for preparing master plans and fair allocation of state and federal funds [1]. Also, operations data facilitates environmental studies investigating the negative effects on the surrounding communities [2]. While air traffic control (ATC) towers provide detailed information about operating aircraft, more than 97% of the United States airports are not equipped with control towers [3]. These airports typically host general aviation and, in addition to corporate and self-piloted flights, provide the public with vital services, such as aerial firefighting, law enforcement, and aeromedical flights [4], [5]. Accordingly, several attempts have been made to automatically identify operating aircraft, including transponder-based methods; however, they exhibit limited accuracy and require aircraft to be equipped with transponders. Alternatively, computer vision has been shown to be effective for intelligent transportation systems [6], especially at airports [7], [8]. Moreover, a vision-based approach for aircraft identification can provide additional functionalities, such as billing landing fees and preventing runway incursions.
Previous studies have used computer vision to assist control towers with identifying airliners on the airfield surface [9]-[11]. Similarly to commercially available applications, they merely use optical character recognition (OCR) techniques to identify the aircraft registration number (i.e., tail number) imprinted on the aircraft fuselage, which follows the International Civil Aviation Organization (ICAO) and Federal Aviation Administration (FAA) regulations [12]. However, the presented approaches have drawbacks in identifying "difficult-to-read" tail numbers. Specifically, general aviation aircraft are considerably smaller than airliners and exhibit more variation in their shape. They comprise more than 90% of the registered civil aviation aircraft in the U.S. [13] and also account for most operations at non-towered airports. Their imprinted tail numbers exhibit much variation in font, size, position, and orientation due to the lack of strict regulations compared to passenger airliners. Moreover, the visibility of tail numbers may be seriously affected by adverse lighting conditions. Accordingly, to overcome these challenges, this paper makes the following contributions:
• We propose a two-step aircraft identification method that classifies the aircraft before recognizing its tail number. To integrate the identification system with the FAA registration database, we classify aircraft based on the available visual information in the database (e.g., aircraft and engine type). We explore transfer learning and customize a convolutional neural network (CNN) architecture to obtain the best classifier. The proposed approach improves system accuracy by disregarding irrelevant tail numbers from the database and by finding the class of operating aircraft for misidentified cases.
• We design a probabilistic multi-frame-based (MFB) framework to finalize the identification results. It uses the results of a text recognition network in a sequence of video frames and transforms them into a probabilistic tail number identification. Additionally, we propose a fast tail number detector using a cascaded feature-based approach to reduce the computing times and enable real-time applications.
• We publish the test dataset collected from the airfields of three general aviation airports in Utah, containing video frames of operating aircraft with annotation text files for each aircraft and its tail number, to facilitate future studies and comparisons of algorithms [14].
The remainder of the paper is organized as follows. We first review the literature and then elaborate upon the proposed method. Next, we discuss the system setup and report its accuracy on the collected data. Lastly, we discuss the limitations of the system and conclude the paper.

II. LITERATURE REVIEW
Acoustic, radio, and satellite-based methods are used to measure aircraft operations at non-towered airports. However, acoustic-based [15] and radio-based (i.e., general audio recording) [16] systems are incapable of identifying the aircraft. Mott [17] proposed a satellite-based method to detect airport flight activities by encoding the signal transmitted by a transponder carried by aircraft. This method utilizes the Automatic Dependent Surveillance-Broadcast (ADS-B) system. Mode A/C and Mode S aircraft transponder signals are common for civilian use. Only in Mode S signals is each aircraft assigned a fixed ICAO 24-bit address that can be matched with the aircraft registration in the FAA database (aircraft identity) [18]. Nevertheless, this system is currently inefficient due to the low transponder equipage rate of the general aviation fleet (about 65% [19]). In addition, many of the general aviation aircraft in the U.S. (approximately 84% [20]) do not have transponders capable of transmitting Mode S signals that contain aircraft identity information. Consequently, this system cannot identify even a large portion of the equipped aviation fleet. A vision-based system can properly address this issue as it is a passive system and is not limited to cooperative aircraft.
Another advantage of a vision-based system is its additional applications in the areas of safety and security. For example, the recognized visual information about the airfield can be used to signal flight clearance to pilots approaching the airport. This would reduce runway incursion risk, which is a major safety concern (e.g., more than 1,511 runway incursions were reported in the last year alone [21]). Identifying unauthorized landings is another useful application of a vision-based system.

1) Aircraft Classification
The image-based approach for measuring the aircraft operations presented in [22], [23] focuses on counting the operations. Another line of research uses image classification methods to find the aircraft type or model. Image features like SIFT and SURF are used to detect and classify aircraft [24]. The deep learning approach has provided an end-to-end solution for automatic feature extraction for image classification [25]. Neural networks have also been extensively used for aircraft classification in remote sensing images [26]-[28]. Nonetheless, remote sensing image classification models cannot be used for airport-level operation monitoring systems because of the different camera fields of view. The closer range of view at airports can help us recognize the differences between similar aircraft models. Van Phat et al. [29] use deep neural networks for classifying two airliner classes (B737 and B767). Saghafi et al. [30] evaluate multilayer neural networks for classifying aircraft via simulated training data using photos from 3D aircraft models (helicopters and propellers). Likewise, Ali and Choudhry [31] classify civil airliners using feedforward neural networks for video docking systems. Their developed models can classify a small range of aircraft types/models, while operations at general aviation airports include a wide range of aircraft from propellers to jet airliners. Maji et al. [32] introduce a fine-grained aircraft image dataset that has been used for testing image-based fine-grained category recognition models using neural/non-neural techniques [33]. However, these proposed models focus on the classification of aircraft and cannot recognize the identity of individual aircraft, which can reveal detailed and necessary information (i.e., owner, technical information on aircraft registration, etc.).
This paper uses aircraft classification prior to tail number recognition to decrease the search space in the aircraft registration database and increase the aircraft identification accuracy. Therefore, a novelty of our work is the development of a classifier that is customized to leverage the FAA database as well as the visually perceivable information about the registered aircraft in that database (sections III-A and IV-A1). We use the HyperBand algorithm to enhance the performance of our customized classifier, which attains a comparable or even better classification accuracy compared to its larger transferred deep learning model counterparts, ResNet50 [34] and Xception [35]. The developed classifier includes all types of aircraft included in the FAA database and is also tested and proved effective on video data collected from three general aviation airports.
Extensive research has focused on detecting airliners in video footage from commercial primary airports [9], [36]. Nevertheless, only a handful of articles have paid attention to vision-based aircraft identification, and these focus exclusively on airliners using merely OCR methods. Next, we review existing articles and explain the differences and novelties of our proposed approach.

2) Tail Number Recognition
Airliner identification methods via tail number recognition are studied by Molina et al. [10] and Vidakis and Kosmopoulos [11]. Their methods target surface movements in the airport terminal area. The image-based system developed in [10] (their results are also presented in [37], [38]) processes the manually selected images captured from slow-moving airliners in the terminal area. To localize the tail number zone in [10], the authors assume that the image histogram is distinctive in the region of interest (i.e., tail number) because of its high contrast with the local background, which helps them narrow down the search for tail numbers in images. However, in the case of smaller aircraft models, the fuselage size requires a close arrangement of different elements, including the tail number, windows, aircraft wing, and aircraft tail, which induces much more variations in the histograms of the subwindows of the aircraft image. In addition, in their proposed method, Molina et al. [10] focus on developing a custom OCR model that is sensitive to the spatial transformation of the characters. As a result, their method cannot be applied to smaller aircraft. There is a significant difference in the appearance of tail numbers imprinted on airliners and general aviation aircraft due to the lack of strict regulation for smaller aircraft. Generally, smaller aircraft have a more challenging tail number shape for visual recognition. These challenges stem from the higher variations in sizes, fonts, baseline orientations, and overall design of the smaller aircraft tail numbers.
Vidakis and Kosmopoulos [11] find the target video frame that contains the airliner tail (i.e., fin), assuming that the tail number lies in the vicinity of the tail. In the proposed method, a sliding window searches the image part close to the airliner tail, which is detected using blob analysis after background-foreground extraction (i.e., motion-based detection). Subsequently, the most frequently recognized number is introduced as the associated tail number to the taxiing aircraft in the apron area. This approach is inefficient for the following reasons: 1) it is computationally expensive to execute OCR over a large number of sliding windows; 2) the proposed approach is prone to identification-related errors as many sliding windows are not associated with the aircraft tail number; 3) there are many visual distractions in the actual video footage of operating aircraft, such as airport service vehicles, construction equipment, and nearby traffic, thus making the motion-based detections inefficient; 4) the authors' assumption regarding the aircraft tail structure shape is not applicable to the smaller aircraft body shape configuration.
Similar to approaches proposed in [10], [11], [39], the commercially available applications (Vantage [40] and Vax-OCR [41]) for identifying aircraft focus on recognition of the aircraft tail number using OCR techniques. As a result, their approach will be limited in cases of difficult-to-read tail numbers and where the tail number visibility is affected by illumination. To overcome these challenges, we propose a two-step identification method and add a layer of classification by processing the visual information from the aircraft silhouette shape (using CNN-based classifiers) prior to tail number recognition (using text recognition) together with the FAA database to finalize the aircraft identification. Using the designed probabilistic MFB framework, we enhanced the system's reliability. In addition, we imposed a limit on the computational intensity of our algorithms to achieve a real-time system.
Fig. 1 illustrates a schematic flowchart of the proposed two-step vision-based method for automatically identifying operating aircraft from video data. Any motion detected by background analysis triggers the system. The source of motion could be airport ground vehicles, personnel, nearby highway traffic, construction equipment, or animals. Next, we use a Single Shot Detector (SSD) to detect operating aircraft and a fast correlation-based object tracker, Minimum Output Sum of Squared Error (MOSSE), to track the aircraft in the rest of the video frames. The built trajectory is further processed to count and recognize the type of aircraft activity (i.e., departure and landing). The system builds a recognition data bank for identifying the target aircraft by using a CNN-based classifier to detect the aircraft class and by recognizing the possible sequences associated with the aircraft tail number at selected video frames. The operation time windows vary between half a second and ten seconds.
During this time, pilot maneuvers during take-off or landing might cause some distortion or occlusion of the tail number (e.g., due to the tilted aircraft fuselage). Thus, we extract video frames every 0.1 seconds from the collected aircraft operations footage. This ensures sufficient video frames to minimize the possibility of missing the tail number and maintains a real-time system considering the algorithms' processing times.
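As a rough illustration of the sampling rule described above, the following sketch lists the frame indices that would be extracted from one operation time window. The function name `sample_frame_indices` and the 30 fps source frame rate are assumptions made for illustration; only the 0.1-second interval and the half-second to ten-second window range come from the text.

```python
def sample_frame_indices(window_seconds, fps=30.0, interval=0.1):
    """Return indices of the video frames sampled every `interval` seconds.

    fps is an assumed camera frame rate; the 0.1 s interval is from the text.
    """
    if window_seconds <= 0 or fps <= 0 or interval <= 0:
        raise ValueError("window_seconds, fps, and interval must be positive")
    n_samples = int(round(window_seconds / interval))
    # Multiply an integer counter instead of accumulating 0.1 repeatedly,
    # which keeps floating-point error from drifting across samples.
    return [int(round(k * interval * fps)) for k in range(n_samples)]


shortest = sample_frame_indices(0.5)    # shortest reported operation window
longest = sample_frame_indices(10.0)    # longest reported operation window
print(len(shortest), len(longest))
```

Under these assumptions, even the shortest window yields several candidate frames for tail number recognition, while the longest yields about one hundred.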

III. METHODOLOGY
A CNN-based classifier uses the aircraft image box as input to perform the first step, find the aircraft class, and essentially remove the irrelevant candidates in the registration database. Subsequently, a tail number detector searches over the aircraft image box to localize and extract the tail number image to be used as an input of the text recognition algorithm in the second step. The departure of the aircraft from the camera field of view terminates tracking, and the aircraft identity will be predicted using three probabilistic MFB approaches by processing the data accumulated in the data bank and the registration database.

FIGURE 1. Aircraft identification system flowchart. (Panel labels: Two-Step Identification; Frame + 2 data: 1. detected aircraft classes, 2. recognized sequences; Registration Database.)

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

A. AIRCRAFT CLASSIFICATION
This module serves as a refinement filter to assist the identification process by reducing the search space for finding the associated tail number in the aircraft registration database. Specifically, it eliminates the tail numbers that are irrelevant to the recognized aircraft class. Aircraft classification can also alleviate the possible tail number misidentifications by closely estimating the operating aircraft class information. We test two common approaches for constructing a CNN-based classifier: (1) building a custom model from scratch and (2) transfer learning. The former has more flexibility in terms of optimizing the arrangement of the neurons in a convolutional layer pattern. The latter is more common among engineers because it yields a consistent performance, especially for cases with limited training data. To obtain a real-time aircraft identification system, we pay particular attention to classification speed and accuracy when assembling different models. Notably, the number of parameters influences the speed and accuracy of a CNN model. In this optimization problem, we fixed 5 convolutional blocks (denoted as Conv B) and 3 fully connected layers as the building blocks of our custom CNN-based classifier. Next, we apply the HyperBand algorithm [42] embedded in Keras [43] to tune and optimize the following parameters of those building blocks:
• Output dimensionality of each convolutional layer is bounded to 16-64 for B1 layers, 32-128 for B2 layers, 64-256 for B3 layers, and 128-512 for B4 and B5 layers (step sizes equal to lower bounds).
• Output dimensionality of the first two fully connected layers is bounded to 512-1024 for FC1 and 256-512 for FC2 (step sizes equal to half of the lower bounds).
• Learning rate of the optimizer is selected from {5e-3, 1e-3, 5e-4, 1e-4, 5e-5}.
The mentioned boundaries are determined experimentally for an optimized recognition performance. In each block, the set of convolutional layers (all with 3x3 kernels) is followed by a batch normalization layer to accelerate the training process (as suggested in [44]) and a max pooling layer to reduce the number of parameters progressively. The rectified linear unit (ReLU) function is assigned to calculate the output of the convolutional layers and the first two fully connected layers. At the top of the network, a softmax function is set to compute the probability of each aircraft class using the products of the last fully connected layer.
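To make the size of this search space concrete, the sketch below enumerates the stated bounds and step sizes as a plain grid. It is only an illustration: the dictionary keys are hypothetical names, a single filter count per block is assumed, and HyperBand samples configurations from such a space rather than evaluating every combination.

```python
from itertools import product


def stepped_range(low, high, step):
    """Inclusive list of integers from low to high in increments of step."""
    return list(range(low, high + 1, step))


# Bounds and step sizes as stated in the text. Using one filter count per
# block is an assumption of this sketch.
SEARCH_SPACE = {
    "B1_filters": stepped_range(16, 64, 16),
    "B2_filters": stepped_range(32, 128, 32),
    "B3_filters": stepped_range(64, 256, 64),
    "B4_filters": stepped_range(128, 512, 128),
    "B5_filters": stepped_range(128, 512, 128),
    "FC1_units": stepped_range(512, 1024, 256),   # step = half of lower bound
    "FC2_units": stepped_range(256, 512, 128),    # step = half of lower bound
    "learning_rate": [5e-3, 1e-3, 5e-4, 1e-4, 5e-5],
}


def iter_configurations(space):
    """Yield every configuration in the grid as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


# Total number of grid points is the product of the per-parameter choices.
n_configs = 1
for choices in SEARCH_SPACE.values():
    n_configs *= len(choices)
print(n_configs)
```

Even under these simplifying assumptions the grid is far too large to train exhaustively, which is the motivation for a budgeted search such as HyperBand.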

2) Transferred CNN-Based Classifier Models
Here, we take a CNN model that is previously trained on the ImageNet dataset [45] (a large dataset with common learnable features to the target dataset) and train it on our target dataset. Our adopted transfer learning framework is summarized into four steps:
1. Installing the layers from a pre-trained model as the "base model" on our target classifier, excluding the fully connected layers at the top of its network.
2. Freezing the base model to prevent its weights from being modified.
3. Adding trainable, fully connected layers on top of the frozen base model to turn its features into predictions on our target dataset.
4. Training the (trainable) layers on our new training dataset.
Step 3 comprises a tuning process, similar to the custom model tuning process, to optimally configure the added fully connected layers. The overall architecture of the transferred classifier model is similar to the custom model, with the convolutional layers replaced by a base model (Fig. 2). Then, the flattening layer is preceded by global average pooling to transform the product of the base model's last layer into a 2D feature map representation. In the end, the top fully connected layers scheme is the same as that used for the custom classifier model. We used ResNet50 [34] and Xception [35], two deep neural networks, as the base models for transfer learning.
The reason for this choice is their good performance. ResNet50 is constructed based on a residual learning framework, which was originally presented to enable deeper networks with faster inference steps, but without reducing the capacity of the network. On the other hand, Xception was developed with the idea of gaining a higher performance than its counterpart network (i.e., Inception V3 [46]), but without increasing the capacity. This goal is achieved by replacing the regular convolutions in the Inception V3 architecture with depthwise separable convolutions.
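To illustrate why depthwise separable convolutions use parameters more economically, the following sketch compares the weight counts of a single regular convolution and its separable counterpart. The layer dimensions are hypothetical and bias terms are omitted; the counts describe one layer only, not the full Xception or Inception V3 networks.

```python
def regular_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution (bias terms omitted)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel, c_in, c_out):
    """Weights in a depthwise separable convolution (bias terms omitted).

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    return kernel * kernel * c_in + c_in * c_out


# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels.
regular = regular_conv_params(3, 128, 256)
separable = separable_conv_params(3, 128, 256)
print(regular, separable, round(regular / separable, 1))
```

For this example layer the separable form needs roughly one ninth of the weights, so a network can spend the saved parameters elsewhere while keeping its overall capacity comparable.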

B. TAIL NUMBER RECOGNITION

1) Tail Number Detection
We propose a rapid feature-based tail number detector and compare it with a deep neural network text detector (TextBoxes [47]) for validation. We use a Haar cascade classifier [48] to construct the feature-based detector, which is faster than deep neural network text detectors and facilitates real-time application. For the comparison, we chose the TextBoxes algorithm developed by Liao et al. [47], which is a popular deep learning text detector.
The FAA refers to tail numbers as N-Numbers because U.S. aircraft registration numbers start with the letter "N" [49]. With that, once the "N" character is detected, the tail number window is set to the same height as the bounding box of the "N" character. As Fig. 3a indicates, the width of this window is an extended width of the "N" bounding box. The experiments have shown that extending the "N" bounding box width four times accurately encompasses the actual tail number length. Therefore, we construct a feature-based text detector to detect the "N" character in the aircraft image plane using the Haar cascade classifier. This cascaded method classifies each candidate subwindow of an image in consecutive rejection and acceptance stages; each is structured with a combination of weak classifiers, i.e., Haar-like features (Figures 3b and 3c). Haar-like features are convolved with the subwindow to determine if it contains the object. An Adaboost algorithm is used to optimize the selection of the classifier parameters to increase the hit rate (true positive detection rate) and decrease the false detection rate at each stage. While a minimum hit rate of 0.999 is recommended by Lienhart et al. [50], we chose 0.997 as any higher value would significantly decelerate the training process. A maximum false alarm rate of 0.4 proved to be efficient in our experiments.
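The window construction described above (same height as the "N" box, width extended four times) can be sketched as follows. The function name, the `(x, y, w, h)` box convention, anchoring the window at the left edge of the "N", and clipping at the image border are all assumptions of this sketch rather than details given in the text.

```python
def tail_number_window(n_box, image_width, width_factor=4):
    """Derive the tail number window from the detected "N" bounding box.

    n_box is (x, y, w, h) in pixels. The window keeps the height of the "N"
    box and stretches its width by width_factor (four, per the experiments
    in the text). Anchoring at the left edge of "N" and clipping to the
    image width are assumptions of this sketch.
    """
    x, y, w, h = n_box
    if w <= 0 or h <= 0:
        raise ValueError("bounding box must have positive width and height")
    window_w = min(w * width_factor, image_width - x)
    return (x, y, window_w, h)


inside = tail_number_window((100, 40, 20, 30), image_width=640)
clipped = tail_number_window((600, 40, 20, 30), image_width=640)
print(inside, clipped)
```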
As for the other hyperparameters related to the degenerate trees of weak classifiers, we fixed the weight trim rate to 0.95, maximal weak tree depth to 1, and maximal weak trees per stage to 100. As suggested in [51], a Gentle Adaboost, which typically enhances the generalization performance, is used to form the degenerate trees. Finally, a grid search is performed over three maximum numbers of stages (i.e., 10, 15, and 20) and two sample window sizes (i.e., 20x20 and 24x24). The training data comprise 248 positive image samples (i.e., images of "N") and 415 negative image samples extracted from random images retrieved from the web. A validation dataset of 100 aircraft images supported the choice of the model with the 24x24 window size for its higher tail number detection rate. The best model continued to train for 15 stages and terminated after achieving the overall false alarm rate criterion (i.e., $\text{MaxFalseAlarmRate}^{\text{MaximumNumberOfStages}}$).
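As a worked example of how the per-stage rates compound across the cascade, the sketch below raises each per-stage rate to the number of stages. It assumes the stages act independently, which is the usual simplification behind this stopping criterion; the function name is illustrative.

```python
def overall_rate(per_stage_rate, n_stages):
    """Rate of a cascade whose stages each pass a sample independently."""
    return per_stage_rate ** n_stages


MIN_HIT_RATE = 0.997        # per-stage minimum hit rate from the text
MAX_FALSE_ALARM = 0.4       # per-stage maximum false alarm rate from the text
N_STAGES = 15               # stages trained for the best model

false_alarm_bound = overall_rate(MAX_FALSE_ALARM, N_STAGES)
hit_rate_bound = overall_rate(MIN_HIT_RATE, N_STAGES)
print(f"overall false alarm bound: {false_alarm_bound:.2e}")
print(f"overall hit rate bound:    {hit_rate_bound:.3f}")
```

With these values the overall false alarm bound is on the order of one in a million, while the overall hit rate bound stays above 95%.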

2) Text Recognition
We propose three approaches for predicting the aircraft identity, each succeeding approach with incrementally added complexity to the model. All three approaches use a text recognition network with a conditional probability distribution representation of the predicted label sequences. We use the text recognition model of the convolutional recurrent neural network (CRNN) [52] (structured with convolutional, recurrent, and transcription layers) for its accuracy and speed. The convolutional layer receives the image as the input and outputs feature sequences that are associated with rectangular regions (windows) of the input image (Fig. 4). These feature sequences are then fed to recurrent layers to produce per-receptive-field predictions. Each prediction is a score list of all possible character classes, including the uppercase English alphabet, numbers, and a blank. The set of character classes is denoted by Ω. The letters I and O are not used, based on the FAA regulations on forming an N-number, to avoid confusion with the numbers one and zero [49].
We added a softmax function to this network after the bidirectional long short-term memory (BLSTM) layer in the recurrent layers to normalize the score list of prediction of each window ($z_i$) to a probability distribution over the character classes,

$$\mathrm{softmax}(z_i)_{\omega} = \frac{\exp(z_{i,\omega})}{\sum_{\omega' \in \Omega} \exp(z_{i,\omega'})},$$

where the sum runs over all $L_r = |\Omega|$ character classes and $\omega$ is an element of set $\Omega$. Lastly, the blanks and repeated labels (overlapped receptive fields) will be removed to determine the final probability distribution of the recognized label sequence for a video frame, $r = r_1, \ldots, r_{L_s}$, where $L_s$ is the length of the predicted label sequence.
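The normalization and label clean-up steps can be illustrated with a small greedy decoder. This is a simplified sketch, not the CRNN transcription layer itself: it keeps only the most probable class per window instead of the full distribution, and the blank symbol, the toy class subset, and the scores are made up for illustration.

```python
import math

BLANK = "-"  # placeholder symbol for the blank class (an assumption)


def softmax(scores):
    """Normalize a list of raw scores into a probability distribution."""
    peak = max(scores)                       # subtract max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def collapse(labels, blank=BLANK):
    """Merge consecutive repeated labels, then drop blanks."""
    out = []
    previous = None
    for label in labels:
        if label != previous and label != blank:
            out.append(label)
        previous = label
    return out


def greedy_decode(score_lists, classes):
    """Pick the most probable class per window and collapse the result."""
    best = []
    for scores in score_lists:
        probs = softmax(scores)
        best.append(classes[probs.index(max(probs))])
    return "".join(collapse(best))


classes = ["N", "1", "2", BLANK]             # toy subset of the class set
scores = [
    [4.0, 0.1, 0.2, 0.3],   # "N"
    [3.5, 0.2, 0.1, 0.4],   # "N" again (overlapping receptive field)
    [0.1, 0.2, 0.3, 2.0],   # blank
    [0.2, 3.0, 0.1, 0.5],   # "1"
    [0.1, 0.3, 0.2, 2.5],   # blank
    [0.3, 2.8, 0.1, 0.4],   # "1" (separated from the previous "1" by a blank)
    [0.1, 0.2, 3.1, 0.6],   # "2"
]
decoded = greedy_decode(scores, classes)
print(decoded)
```

Note that the blank between the two "1" windows is what lets a genuinely doubled character survive the removal of repeated labels.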
Probabilistic Multi-Frame-Based (MFB) Approaches. For all three approaches, we implement a CRNN model pretrained on two well-known synthetic text recognition datasets named MJSynth and SynthText [53]. Fig. 5 summarizes the following approaches of aircraft identification.
Approach 1: At each video frame, $r_i^* = \arg\max_{\omega \in \Omega} r_i(\omega)$ denotes the most probable value of $r_i$. Furthermore, $r^* = r_1^*, \ldots, r_{L_s}^*$ denotes the predicted label sequence for that particular video frame, and $A = \{r_1^*, \ldots, r_{N_f}^*\}$ denotes the set of predicted label sequences for all video frames, where $N_f$ is the number of selected frames. Approach 1 ultimately selects the sequence with the most frequent occurrence during the operation,

$$\arg\max_{a \in A} \; \mathrm{freq}_A(a),$$

where $\mathrm{freq}_A(a)$ counts the occurrences of $a$ in $A$.

Approach 2: Here, we alter the first approach by associating the most frequent label sequence argument with a lexicon defined for tail numbers,

$$\arg\max_{a \in A \cap \Psi} \; \mathrm{freq}_A(a),$$

where $\Psi$ denotes the tail number lexicon, which is compiled by extracting from the database the list of registered aircraft tail numbers that are relevant to the recognized aircraft classes.

Approach 3: The third approach enhances the second approach in cases where there is no actual match between the tail number lexicon and the set of predicted label sequences during the operation time window. Here, the tail number in the lexicon that achieves the highest conditional probability given the observation of $r$,

$$s^* = \arg\max_{T_D \in \Psi} P(T_D \mid r),$$

is considered as the predicted label sequence for a video frame, and $B = \{s_1^*, \ldots, s_{N_f}^*\}$ denotes the set of predicted label sequences for all video frames. $T_D$ denotes a tail number listed in the lexicon and

$$P(T_D \mid r) = \prod_{j=1}^{N_t} P(T_D(j) \mid r_j),$$

where $N_t$ is the tail number length and $T_D(j)$ is the $j$th character of the target tail number. Finally, we select the most frequently recognized tail number during the operation time window,

$$\arg\max_{b \in B} \; \mathrm{freq}_B(b).$$
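The three approaches can be compared on a toy example. The sketch below is a simplified reading of the framework under stated assumptions: each frame is represented as a list of per-position dictionaries mapping characters to probabilities, a length mismatch between a lexicon entry and a frame is scored as zero, and the tail numbers and probabilities are hypothetical.

```python
from collections import Counter


def most_frequent(items):
    """Return the most common item, or None when there are no items."""
    return Counter(items).most_common(1)[0][0] if items else None


def greedy_sequence(dists):
    """Most probable character at each position of one frame."""
    return "".join(max(d, key=d.get) for d in dists)


def sequence_probability(tail_number, dists):
    """Product of the per-position probabilities of the tail number characters.

    Returning 0 for a length mismatch is an assumption of this sketch.
    """
    if len(tail_number) != len(dists):
        return 0.0
    prob = 1.0
    for char, dist in zip(tail_number, dists):
        prob *= dist.get(char, 0.0)
    return prob


def approach_1(frames):
    """Most frequent predicted label sequence over all frames."""
    return most_frequent([greedy_sequence(d) for d in frames])


def approach_2(frames, lexicon):
    """Most frequent predicted sequence that also appears in the lexicon."""
    seqs = [greedy_sequence(d) for d in frames]
    return most_frequent([s for s in seqs if s in lexicon])


def approach_3(frames, lexicon):
    """Per frame, pick the most probable lexicon entry; then take the mode."""
    if not lexicon:
        return None
    picks = []
    for dists in frames:
        best = max(sorted(lexicon), key=lambda t: sequence_probability(t, dists))
        if sequence_probability(best, dists) > 0.0:
            picks.append(best)
    return most_frequent(picks)


# Hypothetical frames for an aircraft whose true tail number is "N12".
frame_a = [{"N": 0.9, "M": 0.1}, {"7": 0.6, "1": 0.4}, {"2": 0.8, "Z": 0.2}]
frame_b = [{"M": 0.6, "N": 0.4}, {"1": 0.7, "7": 0.3}, {"2": 0.9, "Z": 0.1}]
frames = [frame_a, frame_b, frame_a]
lexicon = {"N12", "N45"}

print(approach_1(frames), approach_2(frames, lexicon), approach_3(frames, lexicon))
```

In this example the per-frame readings never match a registered tail number exactly, so the first approach returns an unregistered string and the second returns nothing, whereas the third recovers the registered tail number from the per-character probabilities.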

IV. EXPERIMENTAL SETUP
This section illustrates how we conduct the training of the CNN-based classifiers. Then, we elaborate on actual data collection procedures to evaluate the performance of the proposed identification method.

A. CNN-BASED CLASSIFIERS

1) Training Data and Customized Aircraft Classes
For training purposes, we use the FGVC-Aircraft image dataset [32] that contains 10,000 aircraft images, with 100 images for each of the 100 model variants. The dataset is organized in a hierarchy based on the model, variant, family, and manufacturer. However, we classify them into aircraft classes that are useful specifically for our proposed identification system. The information available in the FAA-standard registration database governs the definition of functional classes. The releasable database archive file includes the Aircraft Registration Master file and Aircraft Reference file by Make/Model/Series Sequence [54]. The list of the visually noticeable and perceivable information in database files comprises the aircraft type, engine type, number of engines, aircraft size (i.e., weight class), and in some cases, the aircraft manufacturing model. Given these details as well as the aircraft models available in the FGVC dataset, we defined 13 classes that encompass aviation fleet mixes at a wide range of airports. Table 1 presents the 13 classes and the criteria by which the proposed method refines the registration database after classifying the operating aircraft into one of these 13 classes.
The differences between reciprocating, 4 cycle, 2 cycle, and rotary engines are mostly in their combustion process; therefore, they are placed in the same group. Turbo-prop aircraft generally can carry more payload than piston-powered aircraft. This attribute frequently leads to further appearance differences between them (class a and class b). The same logic applies to differentiating turbo-fan/jet models from turbo-prop models. The absence of propellers in turbo-fan/jet models is another expressive feature. As Fig. 6 illustrates, the number of engines and weight class of aircraft models allow further grouping of models with the same engine type. Finally, we took advantage of the available set of heavy turbo-fan/jet aircraft images in the FGVC-Aircraft dataset to recognize the manufacturer of these aircraft (class h-k). The similarities between the proxies from Embraer, Bombardier, and Gulfstream in the image dataset encouraged us to have them in one class (class j).

2) Training Procedure
We randomly split the selected FGVC-Aircraft images into three sets: training set (66% of data), validation set (17% of data), and test set (17% of data). Batches of 32 images are propagated through the network at each epoch. We applied two regularization strategies to avoid overfitting: kernel-level regularization using L2 regularizers with a factor of 0.01 for convolutional layers in the custom model, and dropout with a probability of 0.35 for fully connected layers in all three models. We used categorical cross-entropy as the loss function and the Adam optimizer to train the classifier models. The three CNN-based classifiers were optimized to achieve maximum performance by allowing a maximum of 100 training epochs for generated agents (i.e., model configurations) from the predefined search space in section III-A1. The optimum architectures found after the search process (using the HyperBand algorithm) are tabulated in Table 2. The total number of parameters in the models was bounded by imposing the fixed boundaries for the search space. Interestingly, the number of parameters of the optimum architecture found for the custom model is close to the mean of all 40 generated agents (Fig. 7). Next, we continue training each optimized model (which was already trained for 100 epochs) and terminate the training process when the validation loss does not improve for more than 30 epochs. For the transferred models, after having the model converge on the target dataset, we fine-tune the model by unfreezing the base model and continuing the training process in order to utilize the entire model capacity. The learning rate is reduced by a factor of 10 during the fine-tuning process to achieve incremental improvements and, at the same time, avoid overfitting.
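A random 66/17/17 split of this kind can be sketched with the standard library alone. The function name, the fixed seed, and the list of 1,000 placeholder identifiers are assumptions for illustration; the text does not state how the split was implemented.

```python
import random


def split_dataset(items, train_frac=0.66, val_frac=0.17, seed=0):
    """Shuffle items and split them into training, validation, and test sets.

    The test set receives whatever remains after the first two sets, so the
    three sets always cover every item exactly once.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    n_val = int(round(val_frac * len(shuffled)))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Hypothetical list of 1,000 image identifiers.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

Seeding the shuffle keeps the split reproducible, which matters when several classifier models are compared on the same test set.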

3) Comparison of the CNN-based Classifier Models
The confusion matrices in Fig. 8 summarize the performance of all three models on the test set of the FGVC-Aircraft dataset. They illustrate a consistently better performance in classifying heavyweight jet aircraft classes (g-m classes) using the custom classifier model, which exhibits a minimum classification accuracy of 80% (Fig. 8, top). The custom model has a slightly higher rate of false positives for lightweight trijet aircraft (class f) as this aircraft class is confused with twinjet aircraft (class e) in 18% of cases. A potential reason could be the lower number of samples for class f aircraft. Similarly, the unbalanced dataset has potentially caused the slightly higher false-positive rates in detecting single-engine turbo-prop aircraft (class b) for all three classifier models. A portion of the misclassifications relates to a bias toward predicting the classes with the higher number of image samples. The column "i" in the confusion matrices of all three models indicates the impact of the disproportionately larger number of BOEING image samples, especially on the Xception model with a total false-positive rate of 47% (Fig. 8, bottom).
As Fig. 9 shows, the top 3 predicted classes on the test set by applying all three models can secure more than 96% accuracy, so we consider the top-3 recognized classes (after applying the classifier to the detected aircraft in selected video frames from the operation time window) while fil- tering the registration database. It minimizes the chance of eliminating the actual tail number from the database. Top-3 accuracy is the accuracy where the true class matches with any one of the 3 most probable classes predicted by the model. We selected the custom classifier model for use in the proposed identification system for its fewer parameters (faster inference), lower loss, higher accuracy, and slightly more consistent recognition performance for the individual aircraft classes.
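The top-3 accuracy metric can be made concrete with a short sketch. The function name and the toy probability vectors below are assumptions for illustration; they are not outputs of the classifiers evaluated here.

```python
def top_k_accuracy(probabilities, true_labels, k=3):
    """Fraction of samples whose true class is among the k most probable classes.

    probabilities: list of per-sample class-probability lists.
    true_labels:   list of integer class indices, one per sample.
    """
    if not true_labels or len(probabilities) != len(true_labels):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        # Rank class indices from most to least probable.
        ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example with four classes and three samples (hypothetical values).
toy_probs = [
    [0.5, 0.3, 0.1, 0.1],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
]
toy_labels = [1, 3, 3]
print("top-1:", top_k_accuracy(toy_probs, toy_labels, k=1))
print("top-3:", top_k_accuracy(toy_probs, toy_labels, k=3))
```

Relaxing k from 1 to 3 trades a larger filtered search space for a lower risk of discarding the true class.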

B. DATA COLLECTION FROM AIRFIELD
The model performance is explored on data collected at three general aviation airports in Utah: Bountiful Airport, Heber Valley Airport, and Brigham City Municipal Airport. The airports' runways ranged from moderately short to long, with lengths of 4,700 ft, 6,800 ft, and 8,800 ft, respectively. The three airports allowed us to capture various aircraft models with different approach speeds. We used commercial off-the-shelf digital cameras (Fuji Film and GoPro) and recorded the video data at 3120 × 1760 resolution. The data were collected in several daytime sessions on sunny, overcast, and snowy days. Under adverse weather conditions, aircraft classification is expected to work, but the visibility of tail numbers can be affected. Fig. 10 shows a schematic of the camera layout, including the cameras' positioning and field of view at one end of the airport. With runway safety area considerations, two cameras are located at each end of the runway and oriented toward the runway ends, capturing landing aircraft above the runway surface level while they approach the airport. These cameras also capture departure operations at the runway surface level, since pilots start the take-off roll from the end of the runway to add a safety margin for stopping on the runway in case of an engine failure or rejected take-off. In the case of multiple runways, each runway would require a separate set of cameras.
To validate the occurrence of the operations, we assigned the two ends of the runway and the entrance taxiways to human observers who documented the traffic via visual inspection and by monitoring the pilots' radio communications on the airport common traffic advisory frequency (CTAF). The designated camera setup successfully captured all 141 flight operations during the experimental data collection sessions. The operations comprised departures as well as all landing [...] 11). There were quite a few long landings associated with training touch-and-go activities.
[Figure panel labels: "Consecutive video frames captured during operation time window"; "Zoomed Area"; "Airfield 2D View"; "3D View of the Runway End"]
In this experiment, we used the database of Utah-registered aircraft (released by the FAA) and added the tail numbers of out-of-state aircraft that were previously recorded in the database of the test airports. The database comprised 7,800 registered aircraft.

V. RESULTS
We evaluate the performance of the proposed system by applying our algorithms to 2,351 frames extracted (one every 0.1 seconds) from video data of 141 aircraft operations (as stated in section III).

A. PERFORMANCE OF TAIL NUMBER DETECTION AND TEXT RECOGNITION
To evaluate the performance of the proposed tail number detection method, we used precision and recall, computed by measuring the similarity of the predicted bounding boxes to their associated ground truth via their intersection over union. Table 3 shows that the developed tail number detector using a Haar cascade is remarkably faster than TextBoxes and appropriate for a real-time system. Despite the significant difference in their processing times, the accuracy of both detectors is high and differs only moderately, with TextBoxes being slightly superior. The major reason for this difference was the occlusion of the letter "N" by the aircraft wing in some frames during the operation.
On average, 67% of the individual tail number characters are correctly recognized by applying the CRNN (the text recognition algorithm) to the detected tail number image boxes in the video frames of each operation. The actual tail numbers are obtained by reviewing the video and matching human observations with the notes taken during in-field data collection sessions. Errors in tail number detection and character recognition mainly stemmed from the small size or skew of the tail number characters as well as the blur caused by fast-moving aircraft.
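As an illustration of the evaluation described above, the following minimal sketch computes the intersection over union (IoU) of two boxes and derives precision and recall by greedy one-to-one matching. The IoU threshold of 0.5, the box format, the matching strategy, and the toy coordinates are assumptions; the source does not state the values or procedure actually used.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, iou_threshold=0.5):
    """Greedily match each prediction to its best unmatched ground-truth box.

    The threshold of 0.5 is an assumed, commonly used default.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box matches at most once
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy example (hypothetical pixel coordinates): one good detection, one spurious.
toy_predictions = [(0, 0, 10, 10), (100, 100, 110, 110)]
toy_ground_truth = [(1, 0, 11, 10)]
print(precision_recall(toy_predictions, toy_ground_truth))
```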

B. PERFORMANCE OF AIRCRAFT IDENTIFICATION WITH COMPONENT ANALYSIS
Despite the CRNN's limited accuracy in recognizing individual characters of the aircraft tail number in the video frames of each operation, the identification approaches predicted the actual tail number of the operations with an incremental improvement in each succeeding approach. Fig. 12a shows that approach 3 with aircraft classification (two-step identification) achieved the highest accuracy in predicting the actual tail numbers of the 141 operations. Approach 1, which finds the most frequently predicted label sequence with no lexical constraint, achieved the lowest accuracy. Approach 2 increased accuracy by associating the tail number lexicon with sequence recognition in the identification process, and its accuracy increased further with aircraft classification. After observing the CRNN results, the proposed approach 3 (the most complete probabilistic MFB approach in our framework) corrected a considerable portion of the misidentifications of approach 2.
The aircraft classification module played a significant role in the reliability of the proposed identification method. The CNN-based custom classifier model successfully predicted the class of 97.16% (137 out of 141) of the operating aircraft within its top-3 predictions. The consistency of the top-3 accuracies for both the FGVC-Aircraft image test set (Fig. 9) and the video data collected from the airfield (Fig. 12b) indicates the high generalizability of the classifier model, which stems from the effective regularization applied. Fig. 12 shows that even though the classification module misclassified 2.84% of the operating aircraft, it increased the overall identification accuracy of the third approach from 82.97% to 89.36% by removing irrelevant tail numbers from the search space in the database. On average, the two-step method reduced the search space in the database by 56%. The classification module could be even more helpful in the case of a larger database, where it would filter out even more irrelevant tail numbers.
The two-step method alleviates disparities between the text recognition results that stem from noise in the tail number images by removing many irrelevant but potentially similar tail numbers (Fig. 13a). Having a close estimate of the operating aircraft class is another benefit, especially where the identification module fails. Examples include the cases where the aircraft has no imprinted tail number or the tail number is unreadable. In this particular study, the classifier correctly recognized the class of 73% of the misidentified aircraft within its top-3 predictions (Figures 12b, 13b, and 13c).
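The difference between an unconstrained multi-frame vote and a lexicon-constrained decision can be sketched as follows. This is a simplified stand-in, not the probabilistic MFB framework proposed in this work: the first function mimics the idea behind approach 1 (most frequent label sequence), while the second replaces the probabilistic model with a plain total edit distance to each registered tail number. All tail numbers and per-frame reads are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def most_frequent_read(frame_reads):
    """No lexical constraint: return the string read most often across frames."""
    return Counter(frame_reads).most_common(1)[0][0]


def closest_registered(frame_reads, lexicon):
    """Lexicon constraint: registered tail number closest to all frame reads."""
    return min(lexicon,
               key=lambda tail: sum(edit_distance(r, tail) for r in frame_reads))


# Hypothetical per-frame text recognition outputs for one operation.
toy_reads = ["N123A8", "N1Z3AB", "N123A8", "N12BAB", "M123AB"]
toy_lexicon = ["N123AB", "N128AB", "N723AC"]
print("unconstrained vote:", most_frequent_read(toy_reads))
print("lexicon-constrained:", closest_registered(toy_reads, toy_lexicon))
```

In this toy case the unconstrained vote returns a string that is not a registered tail number, whereas the lexicon-constrained choice recovers a registered one even though no single frame was read correctly.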
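The filtering step discussed above can be sketched in a few lines. The registry layout (a mapping from tail number to aircraft class), the function names, and all entries are hypothetical; the sketch only shows how restricting the database to the top-3 predicted classes shrinks the search space.

```python
def filter_by_top_classes(registry, top_classes):
    """Keep only tail numbers whose registered aircraft class is in top_classes."""
    allowed = set(top_classes)
    return {tail: cls for tail, cls in registry.items() if cls in allowed}


def search_space_reduction(registry, filtered):
    """Fraction of registry entries removed by the class filter."""
    return 1.0 - len(filtered) / len(registry)


# Hypothetical registry: tail number -> aircraft class label.
toy_registry = {
    "N101AA": "a",
    "N202BB": "b",
    "N303CC": "e",
    "N404DD": "j",
    "N505EE": "a",
}
toy_top3 = ["a", "b", "c"]  # assumed top-3 classes for one operation

candidates = filter_by_top_classes(toy_registry, toy_top3)
print(sorted(candidates))
print("reduction:", search_space_reduction(toy_registry, candidates))

# Reported top-3 classification rate on the airfield data: 137 of 141 operations.
print("top-3 rate (%):", round(100 * 137 / 141, 2))
```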
It is noteworthy that "class a" aircraft was the prevalent class in the collected video data, which added difficulty to the identification task due to the higher variation in the appearance of these aircraft tail numbers. Specifically, our collected video data mainly contains operations of the general aviation fleet, which comprises more than 90% of the registered civil aviation aircraft in the US [13]. These aircraft types typically operate at airports that are outside of the airspace where ADS-B-enabled avionics are required [1], increasing the usability of a vision-based system as an alternative solution. As Fig. 14 illustrates, the aircraft classes in our collected video data comprise classes a, b, c, e, and j. Interestingly, all captured aircraft of classes b, c, e, and j are correctly classified.

C. PROCESSING TIMES
The computational experiments in this paper are benchmarked on a 64-bit Windows operating system with a 3.20 GHz Intel Core i7 CPU. The CPU-only inference closely estimates the system's performance when deployed on low-cost processing platforms such as single-board computers for edge computing. We observe a total processing time of 122 milliseconds per frame during the operation time window (i.e., tracking the aircraft and recognizing the target aircraft class and its tail number). Since this is close to our extraction rate of video frames (one every 0.1 seconds), we conclude that the proposed system can be used in near real time on an affordable processing platform. Furthermore, aircraft classification reduced the processing time of the final identification by approximately 50%. Fig. 15 shows video frames extracted from the proposed video-based aircraft identification system.
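The near-real-time claim can be checked with simple arithmetic on the reported figures (122 ms per frame, one frame every 0.1 s, and 2,351 frames over 141 operations). The sketch below assumes frames are processed sequentially on a single worker, which is a modeling assumption and not a statement about the actual implementation; the function names are illustrative.

```python
def real_time_ratio(processing_time_s, frame_interval_s):
    """Ratio above 1 means frames arrive faster than they are processed."""
    return processing_time_s / frame_interval_s


def backlog_after(n_frames, processing_time_s, frame_interval_s):
    """Seconds by which sequential processing trails capture after n_frames."""
    return max(0.0, n_frames * (processing_time_s - frame_interval_s))


PROCESSING_TIME_S = 0.122              # reported per-frame processing time
FRAME_INTERVAL_S = 0.1                 # reported frame extraction interval
MEAN_FRAMES_PER_OPERATION = 2351 / 141 # average frames per operation

ratio = real_time_ratio(PROCESSING_TIME_S, FRAME_INTERVAL_S)
throughput_fps = 1.0 / PROCESSING_TIME_S
lag_s = backlog_after(MEAN_FRAMES_PER_OPERATION,
                      PROCESSING_TIME_S, FRAME_INTERVAL_S)

print(f"processing/capture ratio: {ratio:.2f}")
print(f"sustained throughput: {throughput_fps:.1f} frames/s")
print(f"lag at the end of an average operation: {lag_s:.2f} s")
```

Under these assumptions the processing falls behind by well under a second per operation, which is consistent with describing the system as near real time rather than strictly real time.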

VI. DISCUSSION
The proposed two-step method has proven to be an effective approach for identifying general aviation aircraft. The collected data posed many challenges for aircraft identification, including many tail numbers that were difficult to read due to small fonts, inclined fonts, and blurred images. Moreover, the MFB framework is superior to single-shot detection because using only a subset of the frames can cause misidentification due to reduced or obstructed visibility.
Any vision-based system may underperform at nighttime. Nevertheless, nighttime operations are rare at non-towered airports. Airports that expect significant nighttime operations may still use the proposed system in conjunction with appropriate lighting or night-vision cameras. Additionally, the minimum required visibility for landing aircraft on a runway exceeds the distance between the camera and the operating aircraft in an airport.
The proposed classification scheme is designed to cover a wide range of airports with various fleet mixes, including general aviation fleet and heavy airliners. That said, other classification schemes might work better for specific airports, e.g., commercial service, general aviation, or cargo service airports. In those cases, the availability of the training images and visually perceivable information (aircraft specifications) in the registration database would help determine the best classification scheme.
The proposed vision-based system can assist in automating the billing process associated with landing fees at non-towered airports. This application has not been considered in previous works [10], [11]. Moreover, unlike the ADS-B-based methods [17], our proposed method can identify non-