An Ensemble-Based IoT-Enabled Drones Detection Scheme for a Safe Community

With the increasing use of Internet of Things (IoT)-enabled drones for various purposes, including photography, delivery, and surveillance, concerns related to privacy and security have arisen. Drones have the potential to capture sensitive information, invade privacy, and cause security breaches. Therefore, the need for advanced technology for the automated detection of drones has become crucial. In this paper, we propose an ensemble-based IoT-enabled drones detection scheme (in short, EDDSBS). The presented model is part of a computer vision-based module and uses transfer learning for improved performance. Transfer learning allows the reuse of pre-trained models and their knowledge in a different but related domain, enabling better performance with less training data. To evaluate the performance of the proposed EDDSBS, we test it on benchmark datasets, including the Drone–vs–Bird Dataset and the UAVDT dataset. The proposed EDDSBS outperforms the existing schemes of drone detection (i.e., in terms of accuracy). The results of the presented scheme demonstrate the potential of deep learning-based technology for automated drone detection in critical areas, such as airports, military bases, and other high-security areas. Thus the paper introduces a comprehensive process methodology for drone detection that can be applied in real-world settings for a sustainable and secure environment, which is required for a safe community.


I. INTRODUCTION
A SOCIETY that is safe and sustainable is an ecosystem in which the human, natural, and economic components depend on one another and derive their vitality from one another. Moreover, the subjects (i.e., humans) feel safe and secure in that environment because they are protected from any kind of threat. In such an environment, safety and sustainability persist for a long time [1].
The term "Internet of Things (IoT)" refers to physical things that include sensors, processing power, software, and other technologies, and that can communicate with other systems and devices over the Internet to exchange data [2]. The applications of IoT include smart home automation, smart manufacturing, smart farming, intelligent transportation systems, security and surveillance, smart healthcare, and many more [2], [3]. There are different variants of IoT, such as the Internet of Drones (IoD), Internet of Vehicles (IoV), Internet of Medical Things (IoMT), Industrial Internet of Things (IIoT), and Internet of Battlefield Things (IoBT) [4], [5].
With the arrival of drones that are advanced yet affordable for a wide range of consumers [6], [7], the field of aerial technology has evolved at a very rapid rate. The Internet of Drones (IoD), also known as IoT-enabled drones, is an architectural framework that was created to facilitate rapid and secure communication between drones and users using the Internet as a medium [8], [9]. However, this progress has given rise to various challenges, the major ones relating to privacy. The availability of low-cost IoT-enabled drones equipped with high-powered cameras has made it easier for adversaries to spy on others and capture images and videos of them without their consent [10]. The rapidly increasing misuse of drones, particularly in sensitive areas such as the defence sector, has caused serious concern [11], [12]. To address these concerns, there is a growing need for IoT-driven automated drone detection systems using computer vision that can provide security and alert the authorities when drones venture into restricted airspace [13].
Machine learning (ML) has demonstrated its impact in multiple fields through its capabilities in analysis and automation. In healthcare, ML has been integrated into the Internet of Medical Things ecosystem for the diagnosis of diseases and the development of personalized treatment plans [14]. In the finance sector, ML is extensively used for automated fraud detection and market trend analysis [15]. In the manufacturing sector, ML algorithms help to increase efficiency by predicting equipment failure and streamlining the production workflow [16]. Lastly, in the transportation industry, self-driving cars rely significantly on optimized computer vision models to drive and navigate safely in a real-time environment [17].
Computer vision is a subset of the artificial intelligence domain that deals with the development of algorithms that help train computers to understand visual data for extracting information and patterns. With increasingly powerful machines and advanced ML techniques, computer vision is being continuously applied in a variety of fields like robotics, facial recognition, medical diagnosis, and enhanced surveillance [18], [19]. Transfer learning is a critical methodology in computer vision, enabling already trained models to be adapted for different tasks [20]. Transfer learning has helped in developing extremely efficient object detection models, particularly in areas where little data is available, including drone detection.
Ensemble modeling in computer vision is a paradigm that combines multiple deep learning models to build a robust and accurate classifier [21], [22]. This technique involves training multiple models with the same or different architectures and layers, whose outputs are then combined into a single output. By capturing different aspects of the data through their different architectures, these models perform better, notably by reducing the biases and limitations that are prevalent in single models. In addition, ensemble models can help to improve generalization by reducing overfitting and providing a more stable prediction. Overall, ensemble models are a powerful technique in deep learning that can significantly improve the performance of machine learning models.
In the proposed scheme, the concepts of the Internet of Things (IoT) and deep learning are used. There is a Raspberry Pi, which acts as the central control server for an alarm detection module. The deployed cameras are equipped with the features of IoT and ensemble models of deep learning. The Raspberry Pi is equipped with a camera and is responsible for capturing and extracting images locally. These images are then sent to our Flask REST API, hosted on the cloud server, for further processing and prediction. Once a drone is predicted, the Raspberry Pi triggers the alarm system, consisting of multiple buzzers and speakers, to alert the authorities of the drone's presence. To ensure timely action, we also connect the Raspberry Pi to Gmail SMTP access so that the authorities receive email notifications of drone detection. By integrating the Raspberry Pi with the alarm system and the cloud-based API, we create an efficient and effective monitoring system to detect and respond to unauthorized drone activity in real time.

A. RESEARCH MOTIVATION
The motivation for this work can be summarized as follows. Advanced drones are now affordable and are being utilized in various domains. However, with this surge, they increasingly pose a security threat, with a spike in their use for criminal activities, including surveillance and smuggling. Therefore, there is a critical need for an automated drone detection process and prevention measures to handle these privacy concerns [23]. Computer vision-based techniques have shown promise in detecting drones from videos and CCTV footage, with advances towards real-time detection and monitoring. However, the limitations of these algorithms are reflected in the number of false positives and false negatives, which raises concerns about the robustness of the models. Ensemble models in deep learning have shown the potential to improve the accuracy and robustness of these baseline models by reducing the impact of the biases of the individual models. Hence, it is crucial to continue research to develop effective drone detection schemes to counter these emerging threats [24], [25]. The proposed ensemble deep learning-based scheme for drone detection from videos aims to utilize transfer learning-based ensemble models and provide a complete approach for automated drone detection. The scheme's potential impact includes enhancing public safety and privacy and reducing the potential for breaches of privacy through attacks using drones.

B. RESEARCH CONTRIBUTIONS
The research contributions of this paper are given below:
• This paper presents an ensemble-based IoT-enabled drones detection scheme (named EDDSBS).
• To assess its impact on various performance parameters, a practical demonstration of the EDDSBS is performed.
• The paper introduces a comprehensive process methodology for drone detection that can be applied in real-world settings for a secure environment.
• In the comparative study, it is observed that the proposed EDDSBS performs better than the other existing schemes.

C. ROADMAP OF THE PAPER
The remaining part of this article is structured as follows. An analysis and review of the related existing schemes is given in Section II. The details of the proposed drone detection scheme (EDDSBS) are provided in Section III. Then, the practical implementation of the proposed EDDSBS is presented in Section IV. Furthermore, the performance of the proposed EDDSBS is compared with that of other similar existing schemes in Section V. Finally, the paper is concluded in Section VI with some concluding remarks and future research directions.

II. RELATED WORK
This section outlines the specifics of several currently implemented methods for detecting drones using vision-based techniques.
While radar-based methods are commonly used for detecting and tracking drones in airspace, some studies have been conducted on visual-based identification of drones. These studies typically involve using computer vision techniques to analyze video or image data and identify drones based on their appearance.
Rozantsev et al. [26] employed a Convolutional Neural Network (CNN) model and Histograms of Gradients for drone detection on a dataset of UAV and aircraft images. The study used a multi-scale sliding window technique to generate spatiotemporal cubes for detection. Samadzadegan et al. [27] presented a novel deep learning-based approach for efficient detection and recognition of drones. The method uses a CSPDarknet53 feature extraction network and monitors the IoU loss function to distinguish between drones and birds. The proposed approach can detect and differentiate between two types of drones as well as differentiate them from birds. Carrio et al. [28] presented a drone detection method that utilizes 6000 synthetic depth maps of drones and includes a 3D localization module for the collision-free deployment of drones. The method achieves an average detection rate of 74.7% with a detection distance of 9.5 meters. Lv et al. [29] proposed a background reduction module that is combined with drone detection using SAG-YOLOv5 models. The model's speed is increased by using SimAM's attention modules to reduce background and increase FPS. The method achieves a detection speed of 13.2 FPS for drone detection from high-resolution images under a fixed camera. Peng et al. [30] introduced a Physically Based Rendering toolkit for creating a synthetic dataset of drones with varying positions, orientations, camera specifications, backgrounds, and post-processing techniques. The method improves a Faster R-CNN model with training weights from ResNet-101 and achieves a precision of 80.69%.
Al-Qubaydhi et al. [31] utilized an optimized version of YOLO, specifically YOLOv5, for drone detection in videos with diverse contrasts, including low contrast, using a dataset consisting of images of drones with various backgrounds such as water, buildings, trees, and humans. Seidaliyeva et al. [32] proposed a drone detection method that employs a background subtraction module and a CNN model for classification to enhance the model's robustness. This approach allows for the accurate detection of drones on a static background, with a processing speed significantly higher than existing approaches while maintaining comparable accuracy. Wang et al. [33] demonstrated a quick and efficient detection method for unmanned aerial vehicles (UAVs) that was based on video images recorded by stationary cameras. The technology saved money and cut down on operational costs. They identified moving objects in the video by applying "temporal median background subtraction and then extracted global Fourier descriptors and local HOG features" from the images of the moving objects. After that, the combined features were sent to a Support Vector Machine (SVM) classifier so that the objects could be classified and recognized.
Fang et al. [23] examined whether it would be possible to use a multistatic SDPR (MSDPR) for drone detection in practice. Analyses of signal processing procedures involving multipath energy, extracted reference signal purity, and the receiving antenna were used to investigate SDPR's detection range. Kang et al. [24] gathered the complete frequencies of leakage signals and the radiation pattern of a drone equipped with a GPS module while it was operating in an anechoic room, so that they could analyze the leakage that occurred from the GPS module while it was in use. They measured the leakage signals in an open-air environment using the collected data. The open-air measurements showed that the theoretical attenuation effect was consistent with the measured value even though the distance varied. Delleji et al. [25] proposed a method of drone detection that included deep learning-based classification and localization tasks in order to protect sensitive places and restricted areas. In particular, they chose the YOLOv3 family of one-stage object detectors, known for their speed and precision. To better recognize small objects, such as small drones, they used the YOLOv3 deep learning neural network and worked to improve it. To do this, they upgraded the architecture of the network and fine-tuned its parameters. Reddy et al. [34] proposed a deep learning-based object recognition model, in which YOLOv3 was applied to a particular dataset in order to improve the speed and precision with which drones could be identified. An image classification technique based on a convolutional neural network was proposed by Chen et al. [35]. This algorithm converts the data collected by cooperative spectrum sensing at a sensing slot into a single image. In addition, to use more information and enhance the effectiveness of the detection process, they developed an algorithm for trajectory classification. This algorithm transforms the flying process of the drones into trajectory images using multiple consecutive sensing slots. Furthermore, they ran simulations to validate the performance of the presented technique using various parameter settings.
Unlu et al. [36] used vision-based features, namely the 2-dimensional scale-, rotation-, and translation-invariant Generic Fourier Descriptor (GFD), to distinguish drones from birds by training a CNN model. Brown et al. [37] used various models for the detection of UAVs. They obtained a dataset of various images featuring UAVs and used it to build a classification model based on ResNet-18, VGG-16, MobileNetV2, and AlexNet. During the classification process, they took into account the view angle and elevation as crucial factors in determining the model's detection performance. Xun et al. [38] developed a drone surveillance system that employed the YOLOv3 model with pre-trained weights. The model was trained on a custom dataset and tested and validated in real-time settings using an NVIDIA Jetson TX2 computing device. Shi and Li [39] employed a drone detection framework that utilized three models (YOLOv4, YOLOv3, and the Single-Shot Detector, SSD) with a CSPDarknet53 backbone network structure to ensure lightweight models for real-time drone detection. The models were trained and tested on an augmented dataset that included images collected from the Internet and their own collected images.

III. THE PROPOSED DRONE DETECTION SCHEME: EDDSBS
In this section, we provide the details of EDDSBS. The architectural representation of the EDDSBS scheme is given in Figure 1. Figure 2 illustrates the flow of execution for the various processes involved.
• The process involves installing the application on the security server and authenticating the cloud server hosting the Ensemble Deep Learning Model. The next step is to install the background subtraction module on the security server machine and carry out an authentication process between the alarm system and the security server. Meanwhile, the security server registers the installed camera with a secured channel.
• The installed camera system sends video surveillance of the airspace to the security server for monitoring.

Algorithm 1
1: Installation of application on security server SS_i.
2: Authentication of application server with cloud server CS_j.
3: CS_j hosts ensemble deep learning model EDL_{CS_j}.
4: Deployment of alarm system AS_k.
5: Install background subtraction module on SS_i.
6: Carry out an authentication process between AS_k and SS_i.
7: SS_i registers camera CM_l. Then CM_l is deployed.
8: Drone DR_r is flying randomly.
9: CM_l sends video recording VD_{CM_l} of surveillance to SS_i for monitoring.
10: SS_i extracts image frames from VD_{CM_l} by applying image processing techniques such as dehazing, denoising, and resolution enhancement.
11: SS_i sends enhanced frames to installed background subtraction module BSM_m.
12: BSM_m uses pre-trained model to split the frame into the background and foreground.
13: The image with the removed background is sent to CS_j for further analysis and prediction.
14: EDL_{CS_j} does the analysis.
15: if EDL_{CS_j} detects DR_r as a drone then
16:     AS_k notifies authority through sound alerts or email notifications for proper action taking.
17: else
18:     Continue the detection process.
19: end if
20: CS_j adds information of DR_r in L_DR.

The procedure starts with the installation of an application on the security server SS_i. Then, the application server is authenticated with the cloud server CS_j using some standard method. CS_j hosts the ensemble deep learning model EDL_{CS_j}. Moreover, the alarm system AS_k is deployed, and a background subtraction module is installed on SS_i. We also perform an authentication process between AS_k and SS_i. Next, SS_i registers the cameras CM_l before they are deployed. A drone DR_r is flying randomly in the deployment area. CM_l sends the video recording VD_{CM_l} of surveillance to SS_i for monitoring. At this point, SS_i extracts image frames from VD_{CM_l} by applying image processing techniques such as dehazing, denoising, and resolution enhancement. After that, SS_i sends the enhanced frames to the installed background subtraction module BSM_m. BSM_m uses the pre-trained model to split the frame into the background and foreground. The image with the removed background is sent to CS_j for further analysis and prediction. Here, EDL_{CS_j} does the analysis. If EDL_{CS_j} detects DR_r as a drone, then AS_k notifies the authority through sound alerts or email notifications so that proper action can be taken. Otherwise, it continues the detection process. At the end, CS_j adds information about the detected drone DR_r to the list of detected drones L_DR.
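The control flow described above can be sketched in a few lines of Python. This is only an illustrative outline under stated assumptions: the function `monitor` and its callable parameters (`enhance`, `subtract_background`, `predict_is_drone`, `notify`) are hypothetical stand-ins for the security server, the background subtraction module, the cloud-hosted ensemble model, and the alarm system, and the returned list plays the role of L_DR.

```python
from typing import Any, Callable, Iterable, List, Optional


def monitor(
    frames: Iterable[Any],
    enhance: Callable[[Any], Any],
    subtract_background: Callable[[Any], Optional[Any]],
    predict_is_drone: Callable[[Any], bool],
    notify: Callable[[int], None],
) -> List[int]:
    """Run the detection loop over frames and return indices of frames with a drone."""
    detected: List[int] = []  # hypothetical stand-in for the list of detected drones
    for index, frame in enumerate(frames):
        # Frame enhancement (dehazing, denoising, ...) happens on the security server.
        foreground = subtract_background(enhance(frame))
        if foreground is None:
            # Nothing left after background removal: keep monitoring.
            continue
        if predict_is_drone(foreground):
            notify(index)           # sound alert or email notification
            detected.append(index)  # record the detection
    return detected
```

Passing the stages in as callables keeps the sketch independent of any particular camera, model, or alarm hardware, none of which are specified at code level in the text.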

A. DATASET ACQUISITION AND PREPROCESSING
For the implementation of our proposed approach, we collected a hybrid dataset consisting of drone and bird images. We utilized two publicly available datasets, namely Birds Vs Drone and UAV dataset, and gathered additional images by Web scraping. The resulting dataset comprises 3000 images, consisting of 1500 images of flying drones and 1500 images of birds, which are used to evaluate the classification accuracy of our model.
In order to make the image data suitable for further analysis, the first preprocessing step we implemented in this study was resizing. This was done to standardize the image size, making it easier to work with during subsequent stages of the study. Next, we utilized dehazing to remove any atmospheric haze or fog that may have been present in the images, which could have affected the accuracy of subsequent analyses. We also applied denoising to remove any random noise that may have been present in the images, thus improving the overall clarity and detail. Finally, we applied color filtering to enhance specific colors or remove unwanted color casts. Overall, these preprocessing techniques were crucial in improving the overall quality of the image data, which ultimately aided in obtaining reliable and meaningful results from the subsequent analyses.
In addition to the previously mentioned preprocessing techniques, data augmentation techniques were also employed to increase the amount of training data available for the machine learning model. The data augmentation techniques we used included cropping, flipping, rotation, scaling, and brightness and contrast adjustments. Cropping involves selecting a specific region of an image and removing the rest to create a new variation. Flipping was used to create a mirror image of an image, which can often help the machine learning model learn more robust and invariant features. Rotation was applied to rotate the image by a certain degree, which can help the model recognize objects from different perspectives. Scaling was used to resize the image to a larger or smaller size, creating a new variation with different image dimensions. Brightness and contrast adjustments were applied to alter the brightness and contrast of the image, which can help the model learn to recognize objects under varying lighting conditions. Overall, applying these data augmentation techniques generated new and diverse variations of the original images and increased the training data by a factor of five, improving the model's robustness and performance on unseen test data.
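To make these operations concrete, the following is a minimal sketch of four of the augmentations on a tiny grayscale image stored as nested lists. It is not the pipeline used in this work: the function names, the choice of a 90-degree rotation, and the parameters `alpha` and `beta` are illustrative assumptions, and returning the original plus four variants is just one way a fivefold increase could arise.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities in [0, 255]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Keep only the selected rectangular region."""
    return [row[left:left + width] for row in img[top:top + height]]


def adjust_brightness_contrast(img: Image, alpha: float, beta: float) -> Image:
    """Scale contrast by alpha, shift brightness by beta, and clip to [0, 255]."""
    return [[max(0, min(255, round(alpha * p + beta))) for p in row] for row in img]


def augment(img: Image) -> List[Image]:
    """Return the original image plus four variants (five images in total)."""
    height, width = len(img), len(img[0])
    return [
        img,
        flip_horizontal(img),
        rotate_90_clockwise(img),
        crop(img, 0, 0, max(1, height // 2), max(1, width // 2)),
        adjust_brightness_contrast(img, alpha=1.2, beta=10),
    ]
```

In practice, an image library would perform these operations on full-color arrays; the nested-list form is used here only so the sketch has no dependencies.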
The dataset was split into two subsets, one for training the model and the other for testing its classification accuracy. The training subset consisted of 75% of the images (1125 drone images and 1125 bird images), while the testing subset contained the remaining 25% (375 drone images and 375 bird images). The dataset was balanced to ensure that there was an equal number of images in each class.
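A class-balanced 75/25 split of this kind can be sketched as follows. The function name `stratified_split`, the fixed seed, and the file names in the usage note are hypothetical; the sketch only illustrates how splitting each class separately preserves the balance described above.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (item, class label)


def stratified_split(
    items_by_class: Dict[str, Sequence[str]],
    train_fraction: float = 0.75,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle each class separately and cut it at the same fraction."""
    rng = random.Random(seed)
    train: List[Sample] = []
    test: List[Sample] = []
    for label, items in items_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_fraction)
        train += [(item, label) for item in shuffled[:cut]]
        test += [(item, label) for item in shuffled[cut:]]
    return train, test
```

With 1500 placeholder names per class (for example `drone_0.jpg` to `drone_1499.jpg`), this yields 1125 training and 375 testing samples for each class, matching the counts reported above.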
The image processing procedure is explained in Algorithm 2. It executes for all the given images on SS_i. First, the deployed mechanism resizes the images to the standard size for easy analysis. It also applies dehazing to remove atmospheric haze or fog. Furthermore, it applies denoising to remove random noise and improve clarity. It then applies color filtering to enhance specific colors or remove unwanted casts. In addition, it crops the images to select specific regions and create new variations. After that, it flips the images to create mirror images for learning robust features. Next, it rotates the images by certain degrees to recognize objects from different perspectives, and also adjusts the brightness and contrast of the images to learn object recognition under varying lighting conditions. It generates new and diverse variations of the original images to improve the model's robustness and performance on unseen test data. Finally, the dataset is split into training and testing subsets.

Algorithm 2: Image processing procedure
1: for each image on SS_i do
2:     Do resize of images to the standard size for ease of analysis.
3:     Apply dehazing to remove atmospheric haze or fog.
4:     Apply denoising to remove random noise as well as improve clarity.
5:     Apply color filtering to enhance specific colors or remove unwanted casts.
6:     Do cropping of images to select specific regions and create new variations.
7:     Apply the flipping of images to create mirror images for learning robust features.
8:     Perform the rotation of images by certain degrees to recognize objects from different perspectives.
9:     Adjust the brightness and contrast of images to learn object recognition under varying lighting conditions.
10:     Generate new and diverse variations of the original images to improve the model's robustness and performance on unseen test data.
11: end for
12: Split dataset into training and testing subsets.

B. BACKGROUND SUBTRACTION
To perform background subtraction of drone images using a pre-trained AlexNet [40], we first loaded the AlexNet model and replaced the last fully connected layer with a new layer that has only two output neurons, one for the foreground class and one for the background class. Then, we froze the weights of all layers except for the last layer so that only the new layer is trained on the drone dataset. Next, we collected a dataset of drone images and labeled them as either foreground or background, with the background being the areas of the image that do not contain the drone. We then trained the modified AlexNet model on this labeled dataset using a Stochastic Gradient Descent optimizer and a binary cross-entropy loss function.
Once the model was trained, we used it as an inference model to perform this subtraction on new drone input images by passing the images through the trained model and extracting the foreground pixels as an output. The resulting foreground pixels thus represent the drone, while the background pixels represent the unnecessary background behind the drone that would not be considered for detection and identification. This approach was quite useful in our drone detection scheme as it allows for accurate segmentation and detection of the drone from the background, helping us track and monitor only the drone's movements and neglect the background.
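As a rough illustration of the final step only, the sketch below keeps the foreground pixels of an image given a binary mask and reports whether any foreground remains. The per-pixel mask is an assumption made for this example (the text does not specify the exact output format of the modified AlexNet), and both function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale image: rows of pixel intensities
Mask = List[List[int]]   # 1 marks a foreground pixel, 0 marks background


def apply_foreground_mask(image: Image, mask: Mask) -> Image:
    """Keep foreground pixels and set background pixels to zero."""
    return [
        [pixel if keep else 0 for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


def foreground_is_empty(mask: Mask) -> bool:
    """Return True when the mask marks no pixel as foreground."""
    return not any(any(row) for row in mask)
```

A frame whose mask is empty carries no candidate object, so it can be skipped before it is sent on for classification.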

C. ENSEMBLE MODEL
ResNet50 [41] is a 50-layer deep convolutional neural network that has found intensive use in image classification and object detection tasks. It incorporates multiple techniques, including its bottleneck design and skip connections, which help to mitigate the problem of vanishing gradients by allowing gradients to propagate through the network. It provides high accuracy and makes it possible to build deeper networks, and thus has been used extensively in complex problems in medical image analysis and facial recognition.
YOLOv4 (You Only Look Once version 4) [42] is a well-known state-of-the-art object detection model that can efficiently detect objects in a real-time environment. It uses a one-stage detection algorithm as its architectural methodology, enabling it to predict the bounding boxes of objects in a single pass, along with classification into different object classes with prediction probabilities. YOLOv4, along with its multiple versions, is renowned for its speed and accuracy and is used in multiple use cases, including autonomous vehicles and surveillance systems.
To build the proposed model, we utilized a transfer learning-based approach to create an ensemble model consisting of YOLOv4 and ResNet50 models as the individual components. The input image sent from the background subtraction module is duplicated and sent to both the ResNet50 and YOLOv4 models. The output from each of these models is then passed through a dropout layer, and the resulting two feature vectors are concatenated using a concatenation layer. This concatenated feature vector is then further reduced in dimensionality by a max-pooling layer and sent to a final dropout and dense layer. A sigmoid activation function is applied to this output layer to classify the image as a drone or non-drone.
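The fusion head described above (concatenation, max-pooling, dense layer, sigmoid) can be illustrated with a small dependency-free sketch of the forward pass at inference time. The feature vectors, weights, pool size, and 0.5 threshold are made-up values for illustration, not parameters reported in this work, and the two backbones are represented only by their output vectors.

```python
import math
from typing import List, Sequence, Tuple


def concatenate(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Join the two feature vectors end to end."""
    return list(a) + list(b)


def max_pool_1d(values: Sequence[float], pool_size: int = 2) -> List[float]:
    """Keep the maximum of each non-overlapping window of pool_size values."""
    last_start = len(values) - pool_size + 1
    return [max(values[i:i + pool_size]) for i in range(0, last_start, pool_size)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_head(
    resnet_features: Sequence[float],
    yolo_features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Return (drone probability, is_drone) for one pair of feature vectors."""
    # Dropout only acts during training; at inference it is the identity,
    # so the dropout layers do not appear in this forward pass.
    fused = concatenate(resnet_features, yolo_features)
    pooled = max_pool_1d(fused)
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias  # dense layer, one unit
    score = sigmoid(logit)
    return score, score >= threshold
```

For example, `ensemble_head([0.2, 0.8], [0.5, 0.1], weights=[1.0, -1.0], bias=0.0)` pools the fused vector to `[0.8, 0.5]`, giving a logit of 0.3 and a score slightly above 0.5.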
The procedure for training the model is explained in Algorithm 3. It first loads the augmented training data and then reads through each image in the training data. Background subtraction is applied to separate the image's foreground and background. If the foreground is empty, it goes to the next image. Otherwise, it continues with the process. It then sends the foreground image to both the ResNet50 and YOLOv4 models and obtains the output features from both models. After that, it sends each output to a dropout layer and concatenates the output features from both dropout layers using the concatenation layer. Furthermore, it passes the concatenated feature vectors through the max-pooling and dense layers. After that, it passes the output of the dense layer through the sigmoid layer for classification. At this stage, the loss between the predicted output and the true label is calculated. It then uses back-propagation to update the weights of the model. If the condition P_ACC < ACC is met, it goes to Step 2 in Algorithm 3 with an increment in the learning rate of the optimizer. By following this procedure, the required model is trained.

D. IOT BACKEND
In the proposed EDDSBS, the concepts of the Internet of Things (IoT) and deep learning are used. There is a Raspberry Pi, which acts as the central control server for an alarm detection module. The deployed cameras are equipped with the features of IoT and ensemble models of deep learning. The Raspberry Pi is equipped with a camera and is responsible for capturing and extracting images locally. These images were then sent to our Flask REST API, hosted on the cloud server, for further processing and prediction. Once a drone was predicted, the Raspberry Pi triggered the alarm system, consisting of multiple buzzers and speakers, to alert the authorities of the drone's presence. To ensure timely action, we also connected the Raspberry Pi to Gmail SMTP access so that the authorities receive email notifications of drone detection. By integrating the Raspberry Pi with the alarm system and the cloud-based API, we created an efficient and effective monitoring system to detect and respond to unauthorized drone activity in real time.

Algorithm 3: Procedure for training the model
1: Load the augmented training data.
2: for each image in the training data do
3:     Apply background subtraction to separate the image's foreground and background.
4:     if the foreground is empty then
5:         Go to the next image.
6:     else
7:         Continue with the process.
8:     end if
9:     Send the foreground image to the ResNet50 model.
10:     Send the foreground image to the YOLOv4 model.
11:     Get the output features from both models.
12:     Send each output to a dropout layer.
13:     Concatenate the output features from both dropout layers using the concatenation layer.
14:     Pass the concatenated feature vectors through the max-pooling and dense layers.
15:     Pass the output of the dense layer through the sigmoid layer for classification.
16:     Calculate the loss between the predicted output and the true label.
17:     Use back-propagation to update the weights of the model.
18:     if P_ACC < ACC then
19:         Go to Step 2 with an increment in the learning rate of the optimizer.
20:     else
21:         Required model is trained.
22:     end if
23: end for
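The email notification step could look roughly like the sketch below, which uses only Python's standard `smtplib` and `email.message` modules. The function names, addresses, camera identifier, and message wording are hypothetical; Gmail's SMTP endpoint (`smtp.gmail.com`, port 465 over SSL) is assumed, and credentials would have to be supplied by the deployment.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender: str, recipient: str, camera_id: str, score: float) -> EmailMessage:
    """Compose the notification email for one detection."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Drone detected by camera {camera_id}"
    msg.set_content(
        f"A drone was detected by camera {camera_id} "
        f"with prediction score {score:.2f}."
    )
    return msg


def send_alert(msg: EmailMessage, username: str, app_password: str) -> None:
    """Send the message through Gmail's SMTP server over SSL."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(username, app_password)
        server.send_message(msg)
```

Separating message construction from sending makes the alert content easy to check without network access; only `send_alert` touches the SMTP server.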

IV. PRACTICAL IMPLEMENTATION
This section provides practical details on the implementation of the proposed EDDSBS, including information on the hardware and software utilized. The setup had 2 × Intel Xeon processors and 12 GB of random access memory (RAM). The implementation was carried out on the Google Colab platform running Ubuntu 18.04.5 LTS. The programming was done in Python 3.8 with the TensorFlow library and its Keras API.
We now provide the details of the UAVDT dataset used, which is an unmanned aerial vehicle (UAV) detection and tracking benchmark dataset. It contains around 80,000 sample frames from ten hours of raw videos. It covers three fundamental tasks, namely object detection (abbreviated as DET), single object tracking (abbreviated as SOT), and multiple object tracking (abbreviated as MOT). The dataset was collected by using UAVs in a variety of challenging environments. Vehicles are the primary focus of attention in this benchmark. The frames have been manually annotated with bounding boxes and a few other helpful attributes, such as the category of the vehicles and occlusion. The UAVDT benchmark comprises one hundred video sequences that are chosen from more than ten hours of videos acquired with a UAV platform at various sites in metropolitan regions. The locations include squares, arterial streets, toll stations, highways, crossings, and T-junctions [43].
The required model was trained and validated on the desired dataset. We conducted 4-fold cross-validation (K = 4) by dividing the dataset into four equal folds. The folds were then used in turn to train and test the model, and the accuracy was averaged over the runs. The Stochastic Gradient Descent (SGD) optimizer and the binary cross-entropy loss function were used to compile the model. The model was then trained for fifty epochs. During training, the performance metrics, i.e., training accuracy and loss and validation accuracy and loss, were monitored to ensure that the model was not overfitting the training data.
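The fold construction behind this kind of cross-validation can be sketched as follows. The generator `k_fold_indices` is a hypothetical helper written for this illustration; it only produces index lists, and shuffling the samples beforehand is assumed to be done separately.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 4) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds: List[List[int]] = []
    start = 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

For a dataset of 3000 images and four folds, each round trains on 2250 images and validates on the remaining 750; the reported accuracy would then be the mean over the four rounds.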

A. EVALUATION METRIC
The proposed EDDSBS was evaluated using four basic counts: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). TP and TN are the numbers of correctly identified drones and non-drones (such as birds and airplanes), respectively. FP is the number of non-drones incorrectly identified as drones, and FN is the number of drones incorrectly identified as non-drones [44], [45].
• Accuracy: The proportion of all cases that are correctly identified [45]. It is most informative when the classes are equally important. It is estimated as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
• Recall: The proportion of positive examples in the dataset that are predicted as positive [45]. It is estimated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}$$
• Precision: The proportion of positive class predictions that actually belong to the positive class [45]. It is estimated as follows:
$$\text{Precision} = \frac{TP}{TP + FP}$$
• F1-score: Also referred to as the F1-measure, it is the harmonic mean of precision and recall. It reflects the incorrectly classified cases better than accuracy does [45]. It is mathematically denoted as:
$$\text{F1-score} = \frac{2 \times P \times R}{P + R}$$
where P is Precision and R is Recall.
• False Positive Rate (FPR): The proportion of negative class instances that are wrongly predicted as positive [44]. It is estimated as follows:
$$\text{FPR} = \frac{FP}{FP + TN}$$
• False Negative Rate (FNR): The proportion of positive class instances that are wrongly predicted as negative [44]. It is estimated as follows:
$$\text{FNR} = \frac{FN}{FN + TP}$$

B. RESULTS AND DISCUSSIONS
Table 1 lists the performance metrics of EDDSBS: accuracy, precision, recall, F1-score, false positive rate (FPR), and false negative rate (FNR). The values obtained for the proposed EDDSBS are 91.20%, 89.92%, 92.80%, 0.9134, 10.40%, and 7.20%, respectively. Figure 3 shows the training and validation accuracy curves of the proposed EDDSBS. The confusion matrix of the proposed EDDSBS is given in Figure 4. It reports 348 true positives, 27 false negatives, 39 false positives, and 336 true negatives.
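As a quick consistency check, the reported metrics can be recomputed from the confusion-matrix counts using the standard definitions of the six metrics. The helper name `classification_metrics` is ours and serves only as an illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the six evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }


# Counts reported for the confusion matrix of the proposed EDDSBS.
metrics = classification_metrics(tp=348, fn=27, fp=39, tn=336)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, this gives 0.9120, 0.8992, 0.9280, 0.9134, 0.1040, and 0.0720, which agrees with the values listed above.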

V. COMPARATIVE STUDY
In this section, we compare the effectiveness of the proposed EDDSBS with existing schemes. The comparison is made in terms of accuracy and other important features.
The accuracies of the schemes of Unlu et al. [36], Brown et al. [37], Xun et al. [38], Shi and Li [39], and the proposed EDDSBS are 85.30%, 85.90% (VGG-16), 88.90%, 89.32% (YOLOv4), and 91.20%, respectively. The details of the comparison between the proposed EDDSBS and the existing schemes are presented in Table 2. The results show that the proposed EDDSBS achieves higher accuracy than the other existing approaches.

VI. CONCLUSION AND FUTURE WORK
In this article, we presented a novel ensemble-based IoT-enabled drones detection scheme using transfer learning and a background subtraction technique (EDDSBS). The presented results demonstrate that the proposed EDDSBS can detect drones effectively, with an accuracy of 91.20% and a precision of 89.92%. The proposed EDDSBS also showed superior detection accuracy compared with the competing existing schemes. Therefore, we believe that the proposed EDDSBS has significant potential to provide security and safety against unauthorized flying drones in various day-to-day applications.
In the future, we would like to extend the proposed EDDSBS with other advanced machine learning techniques, such as attention-enabled deep learning and reinforcement learning, in order to further improve its accuracy and efficiency. We would also like to add cyber security mechanisms (for example, authentication, key management, and intrusion detection) to the presented scheme.

FIGURE 1. Architectural representation of the proposed EDDSBS.

FIGURE 2. Process flow diagram of the proposed EDDSBS.

Algorithm 2 Image Processing Procedure
Output: Preprocessed and augmented image dataset
1: for all given images SS_i do
2: …
8: Rotate images by certain degrees to recognize objects from different perspectives.
9: Scale images to larger or smaller sizes to create new variations with different dimensions.
10: Adjust the brightness and contrast of images to learn object recognition under varying lighting conditions.
11: …
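The three augmentation steps listed above (rotation, scaling, and brightness/contrast adjustment) can be illustrated with a minimal pure-Python sketch on a grayscale image stored as a list of rows. Everything here is an assumption made for illustration: the function names, the fixed 90-degree rotation, nearest-neighbour resampling, and the chosen factors. A practical pipeline would more likely rely on an image library and arbitrary rotation angles.

```python
def rotate90(image):
    """Rotate a 2-D grayscale image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def scale_nearest(image, factor):
    """Resize by a positive factor using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    new_h = max(1, round(height * factor))
    new_w = max(1, round(width * factor))
    return [
        [image[min(height - 1, int(r / factor))][min(width - 1, int(c / factor))]
         for c in range(new_w)]
        for r in range(new_h)
    ]


def adjust_brightness_contrast(image, contrast=1.0, brightness=0):
    """Apply pixel' = contrast * pixel + brightness, clipped to [0, 255]."""
    return [
        [max(0, min(255, round(contrast * p + brightness))) for p in row]
        for row in image
    ]


def augment(image):
    """Return the original image plus three augmented variants."""
    return [
        image,
        rotate90(image),
        scale_nearest(image, 2.0),
        adjust_brightness_contrast(image, contrast=1.2, brightness=20),
    ]


sample = [[10, 20], [30, 40]]
variants = augment(sample)
```

Each input image yields several training samples, which is how augmentation enlarges a dataset without new recordings.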

Algorithm 3 Model Training Procedure
Output: Trained ensemble model with prediction accuracy P_ACC greater than threshold accuracy ACC
1: for all augmented training data CS_j and BSM_m do
2: Load the augmented training data.
3: Read through each image in the training data.
4: Apply background subtraction to separate the foreground and background of the image.
5: if foreground is empty then
6: …
7: else
8: Continue with the next step.
…
14: Pass the concatenated feature vector through the max-pooling and dense layers.
15: …
18: if P_ACC < ACC then
19: Go to Step 2 with an increment in the learning rate of the optimizer.
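The control flow of Steps 4 to 8 and Steps 18 to 19 of Algorithm 3 can be sketched as follows. This is a hedged illustration rather than the authors' implementation. Simple frame differencing stands in for the model-based background subtraction used in the scheme. Skipping frames with an empty foreground is our assumption, since Step 6 is not recoverable here. The `train_once` callback, the learning-rate values, and the round limit are hypothetical.

```python
def foreground_mask(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the background exceeds threshold."""
    return [
        [abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]


def has_foreground(mask):
    """Return True if at least one pixel is marked as foreground."""
    return any(any(row) for row in mask)


def train_until_threshold(frames, background, train_once, target_acc,
                          lr=0.01, lr_step=0.005, max_rounds=10):
    """Retrain with a growing learning rate until the accuracy threshold is met."""
    # Steps 4 to 8: keep only frames whose foreground is non-empty (assumed behavior).
    usable = [f for f in frames if has_foreground(foreground_mask(f, background))]
    acc = 0.0
    for _ in range(max_rounds):
        acc = train_once(usable, lr)  # hypothetical: trains and returns P_ACC
        if acc >= target_acc:         # Step 18: stop once the threshold is reached
            break
        lr += lr_step                 # Step 19: retry with a larger learning rate
    return acc, lr, len(usable)


def fake_train_once(usable_frames, lr):
    # Placeholder score that grows with the learning rate, for illustration only.
    return min(1.0, 0.80 + 5 * lr)


background = [[0, 0], [0, 0]]
frames = [[[0, 0], [0, 0]], [[0, 200], [0, 0]]]
acc, final_lr, n_used = train_until_threshold(
    frames, background, fake_train_once, target_acc=0.88)
```

The `max_rounds` limit is an addition of this sketch; without it, a threshold that is never reached would make the loop run forever.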

FIGURE 3. Accuracy curve of the proposed EDDSBS.
• The security server (Raspberry Pi) extracts image frames from the live video and applies image processing techniques such as dehazing, denoising, and resolution enhancement.
• The enhanced frames are sent to the installed background subtraction module.
• The background subtraction module uses the pre-trained model to split the frame into the background and foreground, where the foreground contains the object of interest.
• The image with the background removed is sent by the Raspberry Pi through a Flask API to the cloud server hosting the ensemble deep learning model for further analysis and prediction.
• If the ensemble model predicts an object as a drone, the IoT-enabled alarm system is activated, notifying the user through sound alerts or email notifications using the Simple Mail Transfer Protocol (SMTP) service of the Raspberry Pi.
The details of the proposed drone detection scheme (EDDSBS) are given in Algorithm 1. It works as follows. The detection process is executed for all flying drones.

Algorithm 1 Drone Detection Scheme (EDDSBS)
Output: List of detected drones L_DR
1: for all flying drones do
…
16: if EDL_{CS_j} detects DR_r then
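The final alerting step, where a positive drone prediction triggers an email notification over SMTP, might look like the sketch below. It uses only the Python standard library (`smtplib` and `email.message`). The prediction dictionary format, the addresses, the function names, and the SMTP host and port are illustrative assumptions, not details taken from the scheme.

```python
import smtplib
from email.message import EmailMessage


def build_alert(sender, recipient, label, confidence):
    """Compose the notification email for a positive detection."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Alert: {label} detected"
    msg.set_content(
        f"The ensemble model reported a {label} with confidence {confidence:.2f}."
    )
    return msg


def notify_if_drone(prediction, sender, recipient, send):
    """Send an alert only when the predicted label is 'drone'.

    prediction is a hypothetical dict such as {"label": "drone", "confidence": 0.97};
    send is any callable that delivers an EmailMessage.
    """
    if prediction["label"] != "drone":
        return None
    msg = build_alert(sender, recipient, prediction["label"], prediction["confidence"])
    send(msg)
    return msg


def smtp_send(msg, host="localhost", port=25):
    # Assumes an SMTP server is reachable at host:port (placeholder values).
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


# Demonstration with an in-memory outbox instead of a real SMTP server.
outbox = []
notify_if_drone({"label": "drone", "confidence": 0.97},
                "pi@example.com", "owner@example.com", outbox.append)
notify_if_drone({"label": "bird", "confidence": 0.88},
                "pi@example.com", "owner@example.com", outbox.append)
```

Passing the delivery function as a parameter keeps the decision logic testable without network access. In deployment, `smtp_send` (or a variant with authentication) would be passed in place of `outbox.append`.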