Automated Firearm Classification From Bullet Markings Using Deep Learning

Firearm violence is one of the leading causes of death in many countries around the world, including Thailand. This work proposes a fast and accurate automated method to classify firearm brands from bullet markings. Specifically, a panoramic image of a bullet collected from a crime scene was captured using a developed mobile phone application and custom-built portable hardware. The top three state-of-the-art CNNs pretrained on ImageNet—DenseNet121, ResNet50, and Xception—were further trained on the same training set, which was composed of 718 bullets collected from eight different firearm brands—Beretta, Browning, CZ, Glock, Norinco, Ruger, Sig Sauer, and Smith & Wesson—using a five-fold cross validation technique. DenseNet121 provided the highest AUC of 0.99 for CZ classification (the most common registered firearm brand in Thailand) and the highest average AUC for the eight firearm brands (0.9780 ± 0.0130 SD), which was significantly higher than those of ResNet50 and Xception. In addition, there were no interaction effects between the CNN model and firearm brand on AUC. DenseNet121, which had the highest AUC, was evaluated on the test set (72 bullets), and the results showed that the Beretta and CZ classifications had the lowest accuracy (91.18%), followed by the Browning and Norinco classifications (96.88%), whereas the Glock, Ruger, Sig Sauer, and Smith & Wesson classifications had the highest accuracy (98.41%). These results suggest that the developed mobile phone application based on a deep learning algorithm and the custom-built portable hardware have promising potential for use at crime scenes to classify firearms from bullet markings. By narrowing down the list of suspects, this convenient approach can potentially accelerate bullet identification processes for many forensic science examiners.


I. INTRODUCTION
The right to bear arms is one of the most contentious issues worldwide. In 2016, there were 250,000 deaths worldwide as a result of firearm attacks, making firearm violence one of the leading causes of death in many countries, including Thailand. In 2017, Thailand had a gun violence rate of 3.71 victims per 100,000 people; many of these events occurred in southern Thailand [1]. Thailand has the 19th highest gun violence rate in the world. Moreover, among the countries in East Asia, Southeast Asia, and Australasia, Thailand has the second highest firearm violence rate, after the Republic of the Philippines. In fact, Thailand has a higher gun violence rate than other countries with a notoriety for violence, such as Iraq, which has a gun violence rate of 3.54 victims per 100,000 people [2].

The associate editor coordinating the review of this manuscript and approving it for publication was Kim-Kwang Raymond Choo.
Apart from the high number of registered firearms, the high rate of gun violence in Thailand is predominantly a consequence of the prolonged process of firearm identification; there are only two firearm examination laboratories in the country.

Class characteristics are general rifling features, such as the widths of the lands and grooves [8], [14]. Table 1 summarizes the class characteristics of eight commonly registered firearm brands in Thailand. On the other hand, individual characteristics are the random imperfections in the barrels that create the toolmarks on the bullets [14]. The class and individual characteristics of all the firearm brands used in this paper are shown in Fig. 2. Individual characteristics are more specific and are normally used for bullet identification; however, the individual characteristics of a firearm can be changed by exchanging the barrel with one from another firearm from the same manufacturer. For this reason, analyzing the individual characteristics of a discharged bullet does not necessarily indicate the identity of the criminal. Hence, class characteristics can prevent incorrect associations with unrelated suspects.

B. RELATED WORK
To date, many traditional bullet identification methods have been developed to help identify unknown firearms, including microscopic detection [8], [15], [16], the continuous shooting method [17], and roughness measurements with a stylus [18]. These identification methods are time consuming and tend to be subjective [19]-[21]. Although there are accepted methodologies that allow qualitative comparisons of bullets [22], little quantitative evidence supports the resulting morphological deductions. Furthermore, professional examiners must be present during the process for accurate conclusions to be drawn.
Since the abovementioned identification methods are time consuming and have certain limitations, many forensic science examiners have used integrated ballistics identification systems (IBIS Heritage) [19] to digitally compare images of ballistic evidence to real bullet markings [19], [20]. However, using the IBIS Heritage can lead to loss of information because the correspondence and the surface topography are not one to one, possibly reducing the accuracy of the conclusions. Furthermore, the list of firearm rankings generated by the IBIS Heritage is limited to the firearms that are inside its database [23], [24], thereby requiring a high level of expertise for analysis. Most importantly, the price for adopting the IBIS Heritage is quite high for developing countries including Thailand, as the system costs approximately 98,000,000 Baht (approximately 3,250,000 USD), not including annual maintenance costs. For this reason, the IBIS Heritage is available in only six of the seventy-seven provinces in Thailand.
Most studies have used image processing for bullet identification, but image processing has drawbacks. Chu et al. proposed an algorithm to perform correlation calculations to identify 48 bullets fired from six different barrel manufacturers, which were further classified into their class characteristics according to their land-engraved areas [19]. After further applying the correlation function for automatic selection of the effective correlation area and the extraction of a signature bullet profile, the correlation results showed a 9.3% higher accuracy rate than current commercial systems. Although this work improved the ranking of correct matches, the developed software could only distinguish the impression quality along the longitudinal (bullet axis) direction.
Lu et al. implemented a 3D laser color scanner (3DLCS) to scan bullets and map a 3D model of each bullet onto a 2D plane [17]. They fired nine bullets (three from each of three different barrels) and then classified the bullets into different groups. The unwrapped images were preprocessed by various methods, including image enhancement, edge detection, binarization, thinning, and denoising. Then, different algorithms were applied to extract the individual characteristics of the bullets. Their results showed a high similarity between bullets fired from the same barrel, with one group exceeding 90% similarity [17]. However, their experimental results were based on a few data points, suggesting that the algorithm could be inaccurate if applied to larger data samples.
Xie et al. developed a measurement system based on a Talyrond 365 roundness/cylindricity system [25]. Their 3D system consists of a rotary stage that allows the collection of all facets of 3D surface topography information; moreover, this system is composed of a small and shapeable smart stylus sensor that can measure form deviations [18]. In addition to the use of a novel measurement system, feature extraction of bullet marks based on surface topography techniques has been implemented [22], [25]. They extracted class characteristics through surface segmentation, effectively separating the bullets into regions. On the other hand, individual feature extraction was completed through surface abstraction, wavelet filtering, and comparison. After initially identifying the bullets, firearm examiners suggested the best matches out of the whole list [22].
In addition to image processing, recent works have implemented machine learning methods for bullet identification. Petraco and Chan used multivariate statistical analysis and machine learning for toolmark impression pattern recognition and bullet identification [26]. Striation patterns were viewed as mean profiles, a form of multivariate feature vectors. Along with the use of standard multivariate machine learning methods, these mean profiles were used to estimate identification error rates by using a combination of principal component analysis (PCA), canonical variate analysis (CVA), and support vector machines (SVMs). Their experimental results showed low identification error rate estimates, with a general error rate of 1% with 95% confidence intervals. Many pieces of software were developed for the visualization of toolmark surfaces in the database; however, the analysis of 3D impression patterns and incomplete toolmarks from striation patterns proved to be too complicated for software analysis.
Banno also used a machine learning approach, developing a neural network to process binary signals obtained from striation images of ten unidentified bullets and ten database bullets [27]. After two signals are input into the network, the network evaluates their similarity and produces a score that indicates how closely the bullet striations match. Even though the neural network was able to correctly pair the unidentified bullets with the database bullets, the dataset used to train the proposed algorithm was small, and a final decision must still be made by a forensic scientist. Changmai applied the k-nearest neighbor (k-NN) machine learning algorithm to classify data based on similar features of the sample data [28]. Six real bullets were tested and classified into three classes. The k-NN algorithm correctly classified 86.67% of firearms on average.
Although previous studies have used image processing and machine learning to identify bullets from bullet markings, the corresponding feature selections are time consuming and require a high level of expertise to accurately compare the bullets. In this paper, we apply deep learning to firearm classification, since deep learning has produced significant improvements in tasks such as object detection, face recognition, and speech recognition [29]-[32]. To the best of our knowledge, deep learning has never been used for bullet identification or firearm classification. Therefore, in this paper, we develop a deep learning algorithm to classify panoramic images of markings on 9 mm bullets (the most commonly used bullet size in Thailand) collected from crime scenes into eight classes: Beretta, Browning, CZ, Glock, Norinco, Ruger, Sig Sauer, and Smith & Wesson (the most commonly used firearm brands in Thailand). Table 2 summarizes the advantages and disadvantages of previous approaches and ours.
First, we developed portable hardware and used a mobile phone to record an immersive video from which a panoramic image of the bullet markings can be obtained. Second, all of the data from the discharged 9 mm bullets were collected at the Firearms and Ammunition Subdivision of the Central Police Forensic Science Division in Thailand, a division that issues firearm-carrying permits and investigates physical evidence from crime scenes. Then, the collected data were preprocessed and augmented prior to deep learning development. These steps will be explained in Section II. The top three performing deep learning algorithms tested on ImageNet [33] were used in this paper. These algorithms will be summarized in Section III. The hyperparameter settings of these three deep learning algorithms, five-fold cross validation, and evaluation will be explained in Section IV. The experimental results and statistical significance of these algorithms are presented in Section V. The findings of this paper will be concluded and discussed in Section VI. Finally, future work will be discussed in Section VII.

II. DATA ACQUISITION AND PREPARATION
The cylindrical surface of a bullet contains characteristic marks, such as grooves, thread patterns, and microscopic details, made by the specific gun barrel used to fire the bullet. These unique characteristic marks are crucial inputs for bullet identification, so close-up panoramic detail of the bullet surface is required as raw digital data for further computational analysis. The instrument incorporates a bullet rotating mechanism and an illuminating system to capture high-quality images of the bullet markings with a smartphone. Moreover, an aspherical plano-convex lens was attached over the smartphone camera to shorten its focal length; as a result, the smartphone was able to capture close-up details of the bullet surface.

A. HARDWARE DEVELOPMENT
Our developed hardware was mainly made of 10-mm thick polymethyl methacrylate (PMMA), a clear plastic, as shown in Fig. 3 (a)-(c). This hardware was prototyped by laser cutting, 3D printing, and a computer numerical control (CNC) machine, and all the parts were fastened together with bolts and nuts to enhance precision and avoid blurring effects from the evaporated solvent of the adhesive. Two white light LED lamps were placed on both sides of the bullet to illuminate the surface details without contrast generated by grooves and thread patterns. They were set at 30-45 degrees to the horizontal plane (the angle depends on the material, e.g., copper and lead) to obtain the highest intensity and the least reflection on the surface where the lens is focused (along the green line shown in Fig. 3 (a)). Additionally, the illumination system provides stable light to obtain a similar condition of white balance in each captured image.
A 9.0 mm diameter BK-7 glass plano-convex lens was used (Fig. 3 (b)) and embedded into the smartphone case with the planar side facing the smartphone camera; its back focal length was 15.0 mm. Consequently, the distance between the lens and the bullet surface was fixed at 15.0 mm, and the design of the whole instrument was mainly dependent on this parameter. During use, the smartphone is placed on a stationary platform that is parallel to the horizontal plane, where the camera faces the bullet and the smartphone screen faces the user. The user is able to place the smartphone on and off the platform without clamping, making the platform compatible with every smartphone model. This design decreases the amount of stray light that can interfere with the white balance of the camera. A driving thread is rotated by a knob, as shown in Fig. 3 (b), to bring the sliding jaw into close contact with the bullet; as a result, the jaw is able to rotate along with the motor.
A 28BYJ-48 stepper motor combined with its ULN2003 driver (Amazon, United States) was selected for the bullet rotating system because it operates with 5 V DC, which can be directly controlled by an Arduino UNO microcontroller (Amazon, United States). The motor consists of 4 magnetic stepping poles and a driving gear system, as shown in Fig. 3 (b), and takes 512 steps to complete one full rotation. The speed of the motor was fixed by a delay time (the time taken for the stepper motor to move from the current step to the next step) of 121 milliseconds; hence, one full rotation took 62 seconds. The motor is attached to the "static jaw". An O-ring is used to connect the motor shaft to a plastic cylindrical jaw, which has a 6 mm inner diameter and a 2 mm round thickness at the end. This jaw was designed to fit the bullet head tightly because the bullet has to be rotated along with the motor without slippage; therefore, this jaw is called the "static jaw", as shown in Fig. 3 (c). The "sliding jaw" is a plastic cylinder, with one flat end designed to push the bottom of the bullet and the other end connected to a 15 mm bearing that is aligned concentrically with the motor shaft, as shown in Fig. 3 (c). The "sliding jaw" can translate along the Y-axis (parallel to the central axis of the bullet).

B. MOBILE APPLICATION
An iPhone 8 was used to record the video of the markings on the rotating bullet. The 9 mm bullet rotates at 62 seconds per revolution. Thus, the angular velocity of the rotating bullet is 5.81 degrees per second, and the translational speed at the surface is 0.46 mm per second. Taking into account the shutter speed of the iPhone 8 camera, the experimental results indicate that rotation at this speed produces images of optimal quality. The 80-second videos cover approximately 30% more than one full revolution, which ensures that all the surface details are recorded. The videos were recorded at a resolution of 1,080p with a frame rate of 30 frames per second.
To further aid forensic science examiners, we designed a mobile application to increase the convenience and efficiency of bullet identification. The mobile application can be installed on various smartphones. Using the smartphone camera and the developed hardware, the mobile application can record a video of a bullet specimen. The recorded video has a resolution of 1,080p at 30 frames per second, meaning that 30 pictures of the rotating bullet specimen are taken per second; this is the standard frame rate for most videos captured on a smartphone. Each frame is cropped from 1,920 × 1,080 pixels (1,920 rows × 1,080 columns) to 1,920 × 2 pixels (1,920 rows × 2 columns), i.e., one column of pixels on each side of the vertical midline of the frame. This strip width matches the rotational speed of the bullet specimen. Figure 4 illustrates each component of bullet classification in our mobile application. The smartphone displays the video of the bullet specimen as it is recorded, as shown in (a). Red grid lines are shown on the smartphone screen, allowing the user to check whether the full height of the bullet specimen is being recorded. The top red line is placed below the top of the bullet because the excluded parts do not have any useful striations for identifying the firearm brand. Since the mobile application is designed to classify 9 mm bullets, the size of the red grid lines can also be used to check whether the size of the bullet is appropriate for classification. The user can check whether the height of the bullet specimen can be recorded, as shown in (b). The red line allows the user to check the width of the bullet specimen, as shown in (c). The user can also check whether the bullet striations are placed in the middle, which is the correct position for evaluation, as shown in (d).
While a video of the bullet specimen is being recorded, the video is converted into a panoramic image, as shown in (e), where the yellow arrow indicates the direction in which the panoramic image is built. This process reduces the file size for each bullet specimen, decreasing the time taken to transfer the data needed to obtain the predicted firearm brand. Then, the developed algorithm analyzes the panoramic image of the bullet specimen to identify the firearm brand within 62 seconds.

C. DATA COLLECTION
The data were collected at the Firearms and Ammunition Subdivision of the Central Police Forensic Science Division in Thailand, a division that issues firearm-carrying permits and investigates physical evidence collected from crime scenes. The data collected for this paper were bullets acquired from real crime cases, as presented in Table 3. The table shows that the number of copper bullets is significantly higher than the number of lead bullets. As a bullet passes through the barrel, bullets that are not covered with a copper jacket cannot withstand the increased friction, resulting in fouling [13]. For this reason, most people prefer bullets with a copper jacket in semiautomatic guns. Thus, most of the bullets collected from crime scenes are copper-jacketed bullets.

D. PREPROCESSING
To process the images of bullet markings, a panoramic image must be obtained; however, not every smartphone can take panoramic images. To solve this problem, we developed a mobile application that allows videos to be taken from smartphones that lack the ability to capture panoramic images. By converting these videos and generating panoramic images in our mobile application, the control points of the sequence of images taken as a subset of a single video can be extracted; this technique is similar to panoramic stitching in various software packages [34]. As shown in Fig. 5, the recorded video was first split into 1,860 frames (62 seconds × 30 frames per second); each frame was then cropped at its middle to a 1,920 × 2 pixel strip, and the background images were removed.
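The cropping-and-stitching step above can be sketched as follows. This is a minimal illustration with blank toy arrays standing in for the video frames; the frame size and strip width follow the text, while the helper names are our own:

```python
import numpy as np

def center_strip(frame, strip_width=2):
    """Crop a strip_width-column strip from the horizontal center of a frame."""
    mid = frame.shape[1] // 2
    return frame[:, mid - strip_width // 2 : mid + strip_width // 2]

def stitch_panorama(frames):
    """Place the center strips of successive frames side by side."""
    return np.hstack([center_strip(f) for f in frames])

# In the real pipeline: 62 s x 30 fps = 1,860 frames of 1,920 x 1,080 pixels
# (rows x columns). A handful of blank frames is used here as a stand-in.
frames = [np.zeros((1920, 1080), dtype=np.uint8) for _ in range(10)]
panorama = stitch_panorama(frames)
print(panorama.shape)  # (1920, 20): each frame contributes a 2-column strip
```

Because the bullet surface advances roughly two pixel columns per frame at the chosen rotation speed, concatenating the center strips approximates an unwrapped panorama of the bullet surface.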

E. DATA AUGMENTATION
Data augmentation is a technique to increase the size of a dataset to avoid overfitting. The features of the dataset are spiral grooves with a degree of rotation and a left-hand or right-hand circular direction; therefore, flipping or rotation cannot be applied. Thus, the data were augmented by the shifting method. The original image, a 360-degree panoramic image of a fired bullet, was circularly shifted by 10 degrees at a time until the starting panoramic image was obtained again (10 × 36 = 360 degrees), as shown in Fig. 5.
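A minimal sketch of this circular-shift augmentation, assuming the panorama width is divisible by the number of shifts (the function name and toy array are ours):

```python
import numpy as np

def shift_augment(panorama, n_shifts=36):
    """Generate n_shifts circularly shifted copies of a 360-degree panorama;
    each shift corresponds to 360 / n_shifts = 10 degrees of rotation."""
    step = panorama.shape[1] // n_shifts   # columns per 10-degree shift
    return [np.roll(panorama, k * step, axis=1) for k in range(n_shifts)]

pano = np.arange(4 * 36).reshape(4, 36)    # toy 4 x 36 "panorama"
augmented = shift_augment(pano)
print(len(augmented))  # 36 augmented images per bullet
```

Because the panorama wraps around the bullet circumference, a circular shift produces a physically valid image of the same bullet, unlike flips or in-plane rotations, which would reverse or distort the rifling twist.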

III. DEEP LEARNING ALGORITHMS
The goal of this section is to investigate the performance of different deep learning algorithms in classifying panoramic images of bullet markings into eight different firearm brands. To accomplish this aim, we selected deep learning algorithms available in Keras that are among the smallest and most accurate of those benchmarked on the ImageNet dataset. The three algorithms with the highest performance and the lowest number of layers were chosen because we would like our developed model to be lightweight yet accurate. This study used three models, residual neural network (ResNet50) [35], densely connected convolutional network (DenseNet121) [36], and Xception [37], which were pretrained on the ImageNet dataset. Specifically, the pretrained weights were used as the initial weights; these weights were then further updated on the training set until a local optimum was reached for each model. These three deep learning algorithms are summarized hereinafter.
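A minimal fine-tuning sketch of this setup, assuming a TensorFlow/Keras environment. In practice weights="imagenet" loads the pretrained backbone (weights=None here only keeps the snippet runnable offline); the input shape is an assumption, and the classification head follows the hyperparameters given in Section IV:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import DenseNet121

# Backbone; use weights="imagenet" to start from the pretrained weights.
base = DenseNet121(weights=None, include_top=False, input_shape=(224, 224, 3))
x = layers.GlobalAveragePooling2D()(base.output)  # 1,024-dim feature vector
x = layers.Dropout(0.5)(x)                        # dropout rate 0.50 (Section IV)
out = layers.Dense(8, activation="softmax")(x)    # eight firearm brands
model = models.Model(base.input, out)
model.compile(optimizer="adadelta", loss="categorical_crossentropy",
              metrics=["accuracy"])
print(model.output_shape)  # (None, 8)
```

The same head can be attached to ResNet50 or Xception by swapping the backbone class; only the pooled feature dimensionality changes.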

A. RESIDUAL NEURAL NETWORK
Residual neural network (ResNet) [35] is one of the first architectures that can handle sophisticated deep learning tasks. In the past, attempts were made to increase the number of layers of deep neural networks to extract high-level features, obtain a higher understanding of the data, and make better predictions. As the number of neural network layers increases, the gradient of the loss function decreases and becomes too small for effective training, since the weights and biases of the initial layers cannot be updated effectively in each training session. Although normalization and the rectified linear unit (ReLU) activation function might be able to resolve the vanishing gradient problem, they are not the best alternatives when the depth increases, due to the emergence of the degradation problem: as neural networks start to converge, the accuracy becomes saturated, leading to a higher training error [35].
He et al. [35] proposed ResNet to alleviate the vanishing gradient and degradation problems using skip connections, allowing the model to successfully train many layers of a network by feeding the output of one layer as an input to subsequent layers to reconstruct information required from previous layers. Consequently, the skip connection forms an alternate shortcut path that mitigates both the vanishing gradient problem and the degradation problem. Furthermore, the skip connection allows the model to learn an identity function that ensures that higher layers perform at least as well as the lower layers [35]. In this paper, ResNet50 was used.
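The skip connection can be illustrated with a toy fully connected residual block in NumPy; the weights and shapes below are illustrative, not from the paper:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """y = relu(F(x) + x): the input is added back to the transformed signal,
    giving gradients a shortcut path around F during backpropagation."""
    return relu(relu(x @ w1) @ w2 + x)

# With all-zero weights the block reduces to the identity for non-negative
# inputs, illustrating why extra residual layers need not degrade accuracy.
x = np.array([[1.0, 2.0, 3.0]])
w = np.zeros((3, 3))
print(residual_block(x, w, w))  # [[1. 2. 3.]]
```

This identity behavior is the key point: a residual layer can always fall back to passing its input through unchanged, so stacking more layers cannot make the representable function set worse.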

B. DENSELY CONNECTED CONVOLUTIONAL NETWORK
The densely connected convolutional network (DenseNet) is a convolutional neural network architecture that is state-of-the-art according to the classification results on the ImageNet validation dataset. Huang et al. [36] used a direct connection from each layer to every other layer in a feed-forward direction. Each layer in the network receives the concatenation of the feature maps produced in the previous layers as its input and applies nonlinear functions such as batch normalization, ReLU, and convolution or pooling.
After the nonlinear function operation, the resulting feature maps of each layer are used as inputs to every subsequent layer. The concatenation operation is not possible when the size of the feature maps varies; therefore, the pooling operation, which changes the size of the feature maps, is important. To facilitate the pooling operation, the architecture is divided into multiple blocks, i.e., densely connected dense blocks, and the layers between dense blocks are transition layers that perform batch normalization, convolution, and pooling operations.
Generally, each function produces k feature maps, where k is a hyperparameter called the growth rate. The growth rate determines how many feature maps each layer contributes to the network. The feature maps can be accessed anywhere in the network once they are contributed; unlike in traditional architectures, there is no need to replicate one layer to another. Each layer in the network produces k feature maps and typically has many inputs. For this reason, a [1 × 1] convolution is used in the bottleneck layer to reduce the number of input feature maps to 4k. Another merit of DenseNet is compactness: the ability to reduce the number of feature maps at transition layers. If a dense block contains m feature maps, the number of feature maps is reduced to θm after the transition layer, where 0 < θ ≤ 1 is referred to as the compression factor. In addition to compactness, the architecture of DenseNet has several advantages: alleviating the vanishing gradient problem, strengthening feature propagation, and reducing the number of parameters [36]. In this paper, DenseNet121 was used.
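The channel bookkeeping of a dense block can be sketched as follows; the per-layer transform here is a simple stand-in for the BN-ReLU-convolution composite, so only the concatenation and growth-rate behavior reflect DenseNet:

```python
import numpy as np

def dense_block(x, n_layers, k):
    """Each layer receives the concatenation of all earlier feature maps and
    contributes k new maps, so channels grow linearly by the growth rate k."""
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)                # all previous maps
        h = np.maximum(0.0, inp.mean(axis=-1, keepdims=True))  # stand-in transform
        features.append(np.repeat(h, k, axis=-1))              # k new feature maps
    return np.concatenate(features, axis=-1)

x = np.ones((8, 8, 16))                 # 16 input channels
out = dense_block(x, n_layers=4, k=12)
print(out.shape[-1])  # 16 + 4 * 12 = 64 channels
```

The linear channel growth is why the [1 × 1] bottleneck (reducing inputs to 4k maps) and the transition-layer compression factor θ are needed to keep the network small.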

C. XCEPTION
Xception [37] is a convolutional neural network that was developed by Google. Xception has been shown to outperform VGGNet [38], ResNet, Inception-v3, and Inception-v4 [39]. Xception works using depthwise separable convolution and shortcuts between convolution blocks. Depthwise separable convolution involves two main processes: depthwise convolution and pointwise convolution. This approach is efficient in terms of computation time. In depthwise convolution, an n × n convolution is applied to each input channel separately. In contrast, pointwise convolution is a [1 × 1] convolution that is applied to change the channel dimension. Another component of Xception is the skip connections found in ResNet; in this sense, Xception builds on ResNet [37].
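The computational advantage can be seen from a simple parameter count (bias terms omitted): splitting an n × n convolution into depthwise and pointwise stages shrinks the weight count dramatically.

```python
def standard_params(n, c_in, c_out):
    """Weights in a standard n x n convolution layer."""
    return n * n * c_in * c_out

def separable_params(n, c_in, c_out):
    """Weights in a depthwise separable convolution layer."""
    depthwise = n * n * c_in          # one n x n filter per input channel
    pointwise = c_in * c_out          # 1 x 1 convolution mixing channels
    return depthwise + pointwise

print(standard_params(3, 128, 128))   # 147456
print(separable_params(3, 128, 128))  # 17536, roughly an 8.4x reduction
```

For typical layer sizes the separable form needs roughly 1/(n^2) of the standard parameters (plus the pointwise term), which is the source of Xception's efficiency.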

IV. EXPERIMENTAL SETUP
After data augmentation, all of the marking images from the discharged bullets in the training dataset were used to train DenseNet121, ResNet50, and Xception. The Adadelta optimization algorithm [40] was used to iteratively update the networks with the training data, and the hyperparameters used in the training process were set as follows: a dropout rate of 0.50, a minibatch size of 32, and 30 epochs of training. The initial weights were those of a model pretrained on the ImageNet dataset [41]. After passing a panoramic image of a discharged bullet as an input to the models, the features were automatically extracted, resulting in a feature vector with 1,024 dimensions; the model weights were updated at the end of each learning rate cycle. The feature vector was fed as an input to the softmax classifier [42].

Panoramic images of bullets from the development dataset were split into five folds, where each fold is composed of 53 guns, corresponding to 6,408-6,552 images (median 6,444). Subsequently, four folds were taken as the training dataset, and one fold was taken as the validation dataset. The testing dataset consists of 12 images of Beretta bullets (12 bullets from four guns), three images of Browning bullets (three bullets from one gun), 20 images of CZ bullets (20 bullets from six guns), 16 images of Glock bullets (16 bullets from five guns), seven images of Norinco bullets (seven bullets from one gun), four images of Ruger bullets (four bullets from one gun), six images of Sig Sauer bullets (six bullets from two guns), and seven images of Smith & Wesson bullets (seven bullets from two guns). After the five-fold cross-validation was performed, the overall sensitivity, specificity, accuracy, and area under the receiver operating characteristic (ROC) curve (AUC) for each experiment were calculated.
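The gun-level fold split described above can be sketched with scikit-learn's GroupKFold, assuming an array gun_ids (our name) records which gun fired each image; grouping by gun keeps images from the same gun out of both the training and validation folds at once:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_images = 200
gun_ids = rng.integers(0, 40, size=n_images)   # toy gun IDs
labels = rng.integers(0, 8, size=n_images)     # eight firearm brands

n_folds = 0
for train_idx, val_idx in GroupKFold(n_splits=5).split(
        np.zeros((n_images, 1)), labels, groups=gun_ids):
    # No gun contributes images to both the training and validation folds.
    assert set(gun_ids[train_idx]).isdisjoint(gun_ids[val_idx])
    n_folds += 1
print(n_folds)  # 5
```

Splitting at the gun level rather than the image level prevents the model from being validated on markings made by a barrel it has already seen during training.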

B. EVALUATION
The efficiency of the models in classifying fired bullet images as Beretta, Browning, CZ, Glock, Norinco, Ruger, Sig Sauer, or Smith & Wesson was evaluated by graphing the ROC curves, and the performance of each model was quantified by calculating the AUC. Because CZ has the most registered firearms in Thailand, we focused on the classification of CZ. The model with the highest AUC was selected to be tested on the testing dataset.
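The per-brand AUCs can be computed one-vs-rest with scikit-learn; the scores below are random placeholders standing in for the models' softmax outputs:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 8, size=120)          # ground-truth brand indices
scores = rng.random((120, 8))
scores /= scores.sum(axis=1, keepdims=True)    # softmax-like probabilities

# One ROC curve (and AUC) per brand: that brand versus all others.
auc_per_class = [roc_auc_score((y_true == c).astype(int), scores[:, c])
                 for c in range(8)]
print(len(auc_per_class))  # 8
```

Treating each brand as a one-vs-rest binary problem yields the eight per-brand AUCs that are averaged and compared across models in Section V.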

C. SIMILARITY SCORES
The experimental setup was designed for classification among eight firearm brands, from which the confusion matrix was obtained. The similarity score and the perceptual distance for each brand pair were systematically derived from the confusion matrix based on a method proposed in [43]. The similarity score S_ij between brand i and brand j was calculated from the confusion matrix, where P_ij is the element of the confusion matrix for bullets from brand i (row) classified as brand j (column). Finally, the perceptual distance D_ij is derived from the similarity score as follows [43]:
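Since the paper's exact equations follow [43] and are not reproduced here, the sketch below uses one commonly used confusion-based form as an assumption: S_ij = (P_ij + P_ji) / (P_ii + P_jj) with D_ij = sqrt(-ln S_ij). The definitive formulas are those of [43]:

```python
import numpy as np

def similarity(P, i, j):
    """Assumed form: off-diagonal confusions normalized by the correct rates."""
    return (P[i, j] + P[j, i]) / (P[i, i] + P[j, j])

def distance(P, i, j):
    """Assumed perceptual distance: D_ij = sqrt(-ln S_ij)."""
    return float(np.sqrt(-np.log(similarity(P, i, j))))

# Toy 2-brand confusion matrix in proportions (rows = ground truth).
P = np.array([[0.90, 0.10],
              [0.05, 0.95]])
print(round(distance(P, 0, 1), 4))
```

Under this form, brand pairs that are frequently confused with each other receive a high similarity and hence a short perceptual distance, matching the interpretation of Table 7.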

V. EXPERIMENTAL RESULTS
After performing five-fold cross-validation, ROC curves were plotted, and AUCs were calculated; the resulting curves are shown in Figure 7.

Two-factor balanced ANOVA was applied to determine whether there were any significant differences among the three CNN models and any interaction effect between the CNN model and the firearm brand. Therefore, we considered two independent variables, the CNN model (DenseNet121, ResNet50, and Xception) and the firearm brand (Beretta, Browning, CZ, Glock, Norinco, Ruger, Sig Sauer, and Smith & Wesson), and one dependent variable, the AUC. The experimental results showed that the three CNN models are significantly different [F(2, 96) = 6.28, p = 0.0027]. DenseNet121 provided the highest average AUC; post hoc analysis showed a statistically significantly higher AUC than that of ResNet50 (p = 0.0018) and a higher, but not significantly different, AUC than that of Xception (p = 0.2547). In addition, there are no interaction effects between the two independent variables (CNN model and firearm brand) on the AUC [F(14, 96) = 1.45, p = 0.1447]. When considering the AUC of DenseNet121 in classifying firearm brands from bullet markings, the average AUCs of six firearm brands (Beretta, CZ, Glock, Ruger, Sig Sauer, and Smith & Wesson) are significantly higher than that of Browning.

DenseNet121 was the model chosen for firearm classification from bullet markings in this paper. The accuracy, sensitivity, and specificity for each firearm brand tested on the test set are shown in Table 5. From the results, DenseNet121 has very high accuracy for all gun brands, ranging from 91.18% (Beretta and CZ) and 96.88% (Browning and Norinco) to 98.41% (Glock, Ruger, Sig Sauer, and Smith & Wesson). The confusion matrix of DenseNet121 on the testing dataset is shown in Table 6, where the row is the ground truth and the column is the predicted class. Each number represents a raw count, and its normalized percentage is given in parentheses. For Beretta, ten of twelve fired bullets were correctly classified, but two fired bullets were misclassified as CZ.
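The two-factor balanced ANOVA can be reproduced in NumPy/SciPy; with 3 models × 8 brands × 5 folds the degrees of freedom match those reported, F(2, 96) for the model effect and F(14, 96) for the interaction (the synthetic AUC values below are placeholders, not the paper's data):

```python
import numpy as np
from scipy.stats import f as f_dist

def two_way_anova(y):
    """Balanced two-factor ANOVA on y with shape (a, b, n):
    a model levels, b brand levels, n replicates (folds) per cell."""
    a, b, n = y.shape
    grand = y.mean()
    mA, mB, mAB = y.mean(axis=(1, 2)), y.mean(axis=(0, 2)), y.mean(axis=2)
    ss_a = b * n * ((mA - grand) ** 2).sum()                       # model effect
    ss_ab = n * ((mAB - mA[:, None] - mB[None, :] + grand) ** 2).sum()
    ss_e = ((y - mAB[:, :, None]) ** 2).sum()                      # residual
    df_a, df_ab, df_e = a - 1, (a - 1) * (b - 1), a * b * (n - 1)
    F_a = (ss_a / df_a) / (ss_e / df_e)
    F_ab = (ss_ab / df_ab) / (ss_e / df_e)
    p_a = f_dist.sf(F_a, df_a, df_e)
    p_ab = f_dist.sf(F_ab, df_ab, df_e)
    return (df_a, df_e, F_a, p_a), (df_ab, df_e, F_ab, p_ab)

auc = np.random.default_rng(2).normal(0.95, 0.02, size=(3, 8, 5))
(model_df, err_df, model_F, model_p), (inter_df, _, _, _) = two_way_anova(auc)
print(model_df, inter_df, err_df)  # 2 14 96
```

The reported error degrees of freedom (96 = 3 × 8 × 4) confirm that the analysis treated each of the five cross-validation folds as a replicate within each model-brand cell.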
For Browning, only three fired bullets were available in the test set: one fired bullet was correctly classified, but two were misclassified as Beretta. For CZ, 15 of 17 fired bullets were correctly classified, whereas one fired bullet was misclassified as Glock and another as Smith & Wesson. Impressively, bullets fired from Glock and Smith & Wesson were all correctly classified. For Norinco, five fired bullets were correctly classified, but one fired bullet was misclassified as Beretta and another as CZ. For Ruger, three fired bullets were correctly classified, and one fired bullet was misclassified as CZ. For Sig Sauer, five fired bullets were correctly classified, and one bullet was misclassified as Beretta. Among all misclassifications, fired bullets were most frequently misclassified as Beretta and CZ (four bullets each). Specifically, of the four fired bullets misclassified as Beretta, two were fired from Browning, one from Norinco, and one from Sig Sauer. Of the four bullets misclassified as CZ, two were fired from Beretta, one from Norinco, and one from Ruger. In addition, one fired bullet was misclassified as Glock and another as Smith & Wesson. These results will be discussed in the next section. Table 7 presents the distance scores between pairs of the eight firearm brands, which were calculated from Eq. (2) using the percentage values in Table 6. The shorter the distance score, the higher the similarity between a pair of firearm brands (based on bullet markings). From the results, Beretta and Browning are the most similar, with the shortest distance score (0.2513), which can be visualized from the bullet markings shown in Fig. 2.
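The per-brand AUCs reported above are one-vs-rest: each brand in turn is treated as the positive class against all others. A minimal sketch of this computation using the rank-based (Mann-Whitney) form of the AUC; the scores and labels below are illustrative, not the paper's data:

```python
def auc_one_vs_rest(scores, labels, positive):
    """One-vs-rest AUC as P(score_pos > score_neg), with ties counted half.

    scores: per-sample softmax score for the `positive` class.
    labels: per-sample true brand label.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative example: CZ-class scores for four bullets
scores = [0.92, 0.81, 0.30, 0.15]
labels = ["CZ", "CZ", "Glock", "Beretta"]
auc = auc_one_vs_rest(scores, labels, "CZ")  # perfectly separated -> 1.0
```

Averaging this quantity over the eight brands and five folds yields the per-model AUCs that the ANOVA compares.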

VI. DISCUSSION AND CONCLUSION
We have developed a highly accurate automated firearm classification method. The proposed method uses computer-aided approaches to provide fast and accurate classification of eight firearm brands from panoramic images of fired bullets. As the first firearm brand classifier on a mobile application to use deep learning, it can classify the most common registered firearms in Thailand, bringing convenience to many forensic science examiners and accelerating bullet identification processes by narrowing down the list of suspects.
In Thailand, criminals often reuse the same brand of high-quality handguns because firearms are expensive. Therefore, classifying firearm brands can help identify criminals involved in repeated crimes, regardless of where the crimes are committed. Most importantly, the mobile application is relatively cheap and accessible to all officers, without requiring a high level of expertise in examining physical evidence collected from crime scenes. The proposed firearm brand classifier decreases the time required for forensic science investigations, increasing the likelihood of identifying and arresting criminals responsible for gun violence.
To avoid overfitting caused by bullets with similar features fired from the same barrel, five-fold cross-validation was applied during data partitioning: all bullet images fired from the same firearm were grouped as one subject, and every fold was partitioned by subject. As a state-of-the-art classification approach, deep learning benefits from transfer learning from networks pretrained on a large dataset. For this reason, we chose the top three CNN models with the highest accuracy and the fewest layers. According to the two-factor balanced ANOVA performed on the average AUCs of all three models, as shown in Table 4, DenseNet121 outperformed ResNet50 and Xception, and the difference among the models was statistically significant. No interaction effects were found between the CNN model and firearm brand, suggesting that the relative performance of the three CNN models was not influenced by any particular firearm brand.
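The subject-wise partitioning described above can be sketched as follows; this is a minimal pure-Python analogue of scikit-learn's GroupKFold, and the firearm IDs are illustrative. Each firearm is one group, so no firearm's bullets ever appear in both the training and validation split of the same fold:

```python
from collections import defaultdict

def group_kfold(groups, k=5):
    """Yield (train_idx, val_idx) pairs with every group confined to one fold.

    groups[i] is the subject (firearm) ID of sample i; folds are balanced
    greedily by assigning the largest groups to the smallest fold first.
    """
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    folds = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: -len(by_group[g])):
        min(folds, key=len).extend(by_group[g])
    for i in range(k):
        val = folds[i]
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        yield train, val

# Illustrative: 10 bullet images from 5 firearms
groups = ["gun_a"] * 3 + ["gun_b"] * 2 + ["gun_c"] * 2 + ["gun_d", "gun_e", "gun_e"]
splits = list(group_kfold(groups, k=5))
```

Splitting by firearm rather than by image is what prevents near-duplicate markings from the same barrel leaking across folds.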
The proposed models using DenseNet121, ResNet50, and Xception were implemented in Keras with the TensorFlow backend. The rationale behind the model selection was to choose models with high accuracy scores and as few layers as possible; thus, DenseNet121 and ResNet50 were selected herein. We suggest that several advantages of DenseNet, including alleviating the vanishing-gradient problem, strengthening feature propagation, and reducing the number of parameters [36], are the main reasons that it outperformed ResNet50 and Xception.
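A minimal sketch of how such a transfer-learning model can be assembled in Keras; the pooling head, input shape, and optimizer here are assumptions for illustration, not the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_brand_classifier(num_classes=8, input_shape=(224, 224, 3),
                           weights="imagenet"):
    # DenseNet121 backbone; weights="imagenet" loads the pretrained weights
    # (pass weights=None to build the architecture without downloading them)
    base = tf.keras.applications.DenseNet121(
        include_top=False, weights=weights, input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = models.Model(inputs=base.input, outputs=outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

The ImageNet-pretrained backbone is then fine-tuned on the panoramic bullet images resized to the chosen input shape.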
In addition, the dense connectivity pattern of DenseNet requires fewer parameters than ResNet and does not need to relearn redundant feature maps. In ResNet, each layer passes information to the subsequent layer through identity connections, but many layers contribute very little to this information and can be randomly dropped during training. Typical deep learning architectures use depthwise convolution followed by pointwise convolution, whereas the Xception architecture uses pointwise convolution followed by depthwise convolution, a distinction from other architectures that is worth exploring.
Out of all firearm brand classifications, Glock classification achieved the highest results, mainly because its rifling differs from that of the other seven firearm brands. Glock is the only firearm brand in the dataset that has a 6-groove right-hand twist with polygonal rifling, as shown in Fig. 2(d), whereas the other firearm brands have conventional rifling, as shown in Fig. 2. Similarly, Smith & Wesson bullets were rarely misidentified because the Smith & Wesson barrel has a 5-groove right-hand twist, which differs from the other firearm brands used in this experiment, as shown in Fig. 2(h).
There were a few drawbacks to the dataset, since a limited number of bullets was available for data collection at the Firearms and Ammunition Subdivision of the Central Police Forensic Science Division in Thailand. First, the collected discharged bullet samples were composed of different materials because the dataset was collected from crime cases, and its composition reflects the popularity of firearm brands used by criminals in Thailand. The Browning training dataset has 48 fired bullets from 14 guns, whereas the CZ training dataset has 226 fired bullets from 68 guns; consequently, the differences in the number of firearms and bullets fired from each firearm brand may affect the learning of the models.
In the experiment, unequal numbers of lead bullets with and without copper jackets were used to train and test the model; considerably fewer lead bullets without copper jackets were collected than lead bullets with copper jackets. Second, the number of discharged bullet samples from each firearm brand was unbalanced, which may lead to bias in the models. In our dataset, there was an overwhelming number of CZ bullets: 226 bullets fired from 68 CZ guns. This could potentially affect the classification of bullets fired from other firearm brands that have bullet markings similar to those of CZ, such as Beretta, Norinco, Ruger, and Sig Sauer. On the other hand, bullets with markings different from those of CZ, such as those produced by the barrel rifling types and numbers of twists of Glock and Smith & Wesson, seemed unaffected by the unbalanced data. For example, the Browning lead bullet samples came from only two firearms (one of which was already partitioned into the test dataset), which could not be partitioned into five folds. However, this drawback was addressed by putting the remaining firearm into the training set of every fold, leaving the validation sets with no lead bullets.
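One standard mitigation for such class imbalance, sketched below for illustration (it follows scikit-learn's "balanced" heuristic and is not necessarily what was applied in this work), is inverse-frequency class weighting, which scales each brand's contribution to the training loss. Only the 226 CZ and 48 Browning counts below come from our dataset:

```python
def inverse_frequency_weights(counts):
    """Per-class weights w_c = total / (num_classes * count_c),
    i.e., scikit-learn's 'balanced' class-weight heuristic."""
    total = sum(counts.values())
    k = len(counts)
    return {c: total / (k * n) for c, n in counts.items()}

# Training-set counts for the two extreme brands in our dataset
counts = {"CZ": 226, "Browning": 48}
weights = inverse_frequency_weights(counts)
```

Rare brands then receive weights above 1 and dominant brands weights below 1, counteracting the model's tendency to favor the majority class.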

VII. FUTURE WORK
Since an iPhone 8 was used to take videos of the rotating bullets, the optimal rotational speed of the motor was found by trial and error. In our future work, we will use position markers placed on the two concentric shafts of our hardware not only to identify the width of the bullet specimen but also to calculate the optimal speed of the rotating motor so that it corresponds to the shutter speed of the smartphone used to take the videos.
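The intended relation between motor speed and the camera's capture rate can be sketched as follows, using the frame rate as a proxy: if consecutive video frames must overlap by a given fraction of the field of view on the bullet surface, the rotational speed is bounded accordingly. All parameter values below are illustrative assumptions, not measurements from our hardware:

```python
import math

def max_motor_rpm(bullet_diameter_mm, fov_width_mm, fps, overlap=0.5):
    """Highest motor speed (rpm) at which consecutive frames of the rotating
    bullet still overlap by `overlap` of the camera's field of view.

    The surface moves at pi * d * rpm / 60 (mm/s), so each frame may advance
    at most (1 - overlap) * fov_width_mm along the bullet surface.
    """
    max_advance_per_frame = (1.0 - overlap) * fov_width_mm  # mm per frame
    surface_speed = max_advance_per_frame * fps             # mm per second
    return 60.0 * surface_speed / (math.pi * bullet_diameter_mm)

# Illustrative: 9 mm bullet, 5 mm field of view, 30 fps video, 50% overlap
rpm = max_motor_rpm(9.0, 5.0, 30.0, overlap=0.5)
```

With the position markers giving the specimen diameter, such a bound could replace the current trial-and-error tuning.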
In the future, we hope to further develop our algorithm to enable the classification of more class characteristics, such as the firearm model and serial number, from bullet markings. Furthermore, we also aspire to expand our dataset so that damaged bullets can be classified, since our current dataset consists only of undamaged bullets. Taking this limitation into consideration, we will crop out sections containing a smaller number of pairs of lands and grooves from bullet images and then use the cropped images to train our model instead of using the whole panoramic image as input.