Drone Navigation Using Region and Edge Exploitation-Based Deep CNN

Drones are unmanned aerial vehicles (UAVs) utilized for a broad range of functions, including delivery, aerial surveillance, traffic monitoring, architecture monitoring, and even warfare. Drones confront significant obstacles while navigating independently in complex and highly dynamic environments. Moreover, the targeted objects within a dynamic environment have irregular morphology, occlusion, and minor contrast variation with the background. In this regard, a novel deep Convolutional Neural Network (CNN) based data-driven strategy is proposed for drone navigation in complex and dynamic environments. The proposed Drone Split-Transform-and-Merge Region-and-Edge (Drone-STM-RENet) CNN is comprised of convolutional blocks, where each block methodically implements region and edge operations to preserve a diverse set of targeted properties at multiple levels, especially in congested environments. In each block, the systematic implementation of average- and max-pooling operations deals with region homogeneity and edge properties, respectively. Additionally, these convolutional blocks are merged at multiple levels to learn texture variation, which efficiently discriminates the target from the background and helps with obstacle avoidance. Finally, Drone-STM-RENet generates a steering angle and a collision probability for each input image: the steering angle controls the drone's movement while avoiding hindrances, and the collision probability allows the UAV to spot risky situations and respond quickly. The proposed Drone-STM-RENet has been validated on two urban car and bicycle datasets, Udacity and collision-sequence, and achieved considerable performance in terms of explained variance (0.99), recall (95.47%), accuracy (96.26%), and F-score (91.95%). The promising performance of Drone-STM-RENet on urban road datasets suggests that the proposed model is generalizable and can be deployed for real-time autonomous drone navigation and real-world flights.

In this work, we propose a drone navigation method using a Region and Edge Exploitation-Based Deep CNN. Our main focus is to provide a CNN model through which drone navigation can be performed using a single mounted camera. A UAV flying effectively in the streets must follow the road and react to dangerous circumstances in the same manner as any manned ground vehicle would. As a result, we introduce the use of information acquired from ground vehicles operating in the above-mentioned settings. The Drone-STM-RENet architecture is adapted to the input feature-map dimensions and to the multi-class output task (2 classes) by changing the initial and final layers. Comprehensively, the contributions made by this work are as follows:

1) We propose a novel Drone Split-Transform-Merge Region-and-Edge based convolutional neural network (Drone-STM-RENet) that can undertake safe UAV flight in urban areas by predicting the probability of collision and the steering angle.
2) For training, an outdoor dataset collected from vehicles and bicycles was used. To allow a UAV to detect potentially harmful scenarios, an outdoor collision-sequences dataset is used.

Despite our system's impressive outcomes, we do not aim to replace the standard ''map-localize-plan'' drone navigation approaches; instead, we want to explore whether a similar task can be performed with a single shallow neural network. Traditional and learning-based techniques, we believe, will eventually complement one another.

In this section, a detailed review of the available literature is given, which is not only the inspiration for this research but also provides insight into how the Convolutional Neural Network emerged as one of the most researched areas in artificial intelligence.

The obstacle identification and avoidance tasks [25] are closely linked with those of autonomous navigation. Object detection methods rely on either machine learning algorithms or computer vision techniques to identify obstacles.

The GPS, range, and optical sensors of an unmanned aerial vehicle (UAV) that operates outdoors are usually used to assess the device state, detect the presence of obstacles, and determine the flight route [2], [5]. However, these kinds of systems are still likely to suffer in urban areas because of buildings, heavy crowds, and dynamic conditions, which result in critical unobserved errors in the estimation of the system state. In such cases, SLAM is a typical approach, in which the robot develops a map of the environment while also self-locating within it [26], and it may be beneficial for global navigation.

In one related approach, visual information is utilized to generate the drone steering instructions, and this information was enhanced to create a 'navigation envelope.' In applications such as surveillance, package delivery, or humanitarian assistance distribution, the technique may be used to automate drone navigation so as to reduce the number of excursions or visits to the same location. A D-CNN was built to produce drone steering instructions based on observed images and thereby achieve autonomous drone navigation. The suggested approach uses video acquired by a camera mounted on the drone. When a CNN and a fully connected regression layer are used together, it has been shown that the steering angles required to fly the drone on its planned path can be forecast with high accuracy.

Accurate analysis of real-time images [43] in a changing environment is complex due to the following factors: (i) low contrast variation between foreground and background boundaries, (ii) high texture variation, (iii) significant variation in the size, shape, and position of the foreground in images, and (iv) low illumination. Additionally, these images are heavily deformed due to the shifting environment's noise level.

This work reports an approach for automating drones based on CB-CNN. The suggested approach is used to forecast the steering angle and the probability of collision. The workflow for drone navigation is illustrated in fig. 1. The proposed block is made up of four sub-branches, as shown in the diagram. The principle of Region- and Edge-based feature extraction is applied thoroughly in every branch, with max- and average-pooling along with convolution and ReLU activation to capture discriminating features in considerable detail. To extract patterns from an image dataset, Drone-STM-RENet separates the input into four branches, uses the RE-based operator to learn region-specific variations and their distinct boundaries, and then uses the concatenation operation to merge the outputs of the multiple paths. Drone-STM-RENet extracts a diverse set of abstract-level features by stacking two STM blocks with identical topology in series. At the end of the process, Dropout is used, followed by ReLU activation and two fully connected layers in parallel: one for steering-angle prediction and one for calculating the probability of colliding with another vehicle. Incorporating this concept into Drone-STM-RENet allows it to extract various variants from the input feature maps.

Vehicle data exhibits a lot of variance across images, which is why a strong CNN is essential for excellent discrimination. Using Channel Boosting [46], [47], the proposed Drone-STM-RENet's discriminating ability is improved; the concept of Channel Boosting was originally proposed to solve complex problems. Extraction of significant characteristics from distorted pictures is made possible by average smoothing of the image contents inside the recorded distorted images, and outliers are also managed using the suggested approach. The region and edge operations assist in managing region homogeneity and smoothing, and their systematic exploitation inside a given block helps delineate discriminating boundary or edge characteristics.
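To make the block design concrete, the following is a minimal PyTorch sketch of a split-transform-merge region-and-edge block with four branches and of the two-head output described above. The channel widths, kernel sizes, branch compositions, and the grayscale input are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class STMREBlock(nn.Module):
    """Split-Transform-Merge block: two 'region' (average-pooling)
    and two 'edge' (max-pooling) branches, merged by concatenation.
    Widths and kernel sizes are illustrative assumptions."""
    def __init__(self, in_ch, branch_ch):
        super().__init__()
        def conv_relu(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True))
        # Region branches: average pooling smooths homogeneous areas.
        self.region1 = nn.Sequential(conv_relu(in_ch, branch_ch), nn.AvgPool2d(2))
        self.region2 = nn.Sequential(nn.AvgPool2d(2), conv_relu(in_ch, branch_ch))
        # Edge branches: max pooling emphasizes boundary responses.
        self.edge1 = nn.Sequential(conv_relu(in_ch, branch_ch), nn.MaxPool2d(2))
        self.edge2 = nn.Sequential(nn.MaxPool2d(2), conv_relu(in_ch, branch_ch))

    def forward(self, x):
        # Merge the four branch outputs along the channel axis.
        return torch.cat([self.region1(x), self.region2(x),
                          self.edge1(x), self.edge2(x)], dim=1)

class DroneSTMRENet(nn.Module):
    """Two stacked STM-RE blocks, then Dropout, ReLU, and two
    parallel fully connected heads (a sketch, not the exact net)."""
    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, 32, 5, stride=2, padding=2),
                                  nn.ReLU(inplace=True))
        self.block1 = STMREBlock(32, 16)   # -> 64 channels
        self.block2 = STMREBlock(64, 32)   # -> 128 channels
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.dropout = nn.Dropout(0.5)
        self.steer_head = nn.Linear(128, 1)  # steering angle s_k
        self.coll_head = nn.Linear(128, 1)   # collision logit

    def forward(self, x):
        f = self.pool(self.block2(self.block1(self.stem(x))))
        f = torch.relu(self.dropout(f.flatten(1)))
        return self.steer_head(f), torch.sigmoid(self.coll_head(f))

# Example forward pass; the 200x200 grayscale frame size is an assumption.
net = DroneSTMRENet()
steer, p_coll = net(torch.randn(1, 1, 200, 200))
```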

CNNs with various architectural designs have varying capabilities for feature learning. Multilevel information can be seen in the many channels learned from distinct deep CNNs. These channels reflect different patterns that can support a scheme [48] in which a single learner makes the ultimate choice by assessing numerous image-specific patterns [49].

TABLE 1. Quantitative results on the regression and classification tasks: EVA and RMSE are measured for the steering regression task, and average accuracy and F1-score are evaluated for the collision prediction task; the confidence interval for recall is calculated at 95%. In comparison to the evaluated baselines, our model performed remarkably well on both tasks, even though Drone-STM-RENet does not have many parameters.

The testing set, which was kept distinct from the training and validation datasets, was used to evaluate the model. We utilize one of the freely accessible datasets from Udacity's project [51] to learn steering angles. Nearly 60,000 images are included (see fig. 3).

The UAV is instructed to fly with a forward velocity v_k and a steering angle θ_k using the two outputs. The network uses the probability of collision to regulate the forward velocity: when the collision probability is zero, the vehicle is commanded to move at the maximum velocity V_max, and it stops when the probability of collision is close to 1. The forward velocity is filtered using a low-pass filter (1 > α > 0), as shown in (1).
Similarly, the predicted steering angle is converted into a yaw angle (rotation around the z-axis). We transform s_k from the [−1, 1] range to the required yaw angle θ_k in the [−π/2, π/2] range and low-pass-filter it, as shown in (2).
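Equations (1) and (2), referenced above, are not reproduced in this text; the following is a plausible reconstruction consistent with the surrounding description and with the standard low-pass formulation used in DroNet-style controllers, where p_k is the predicted collision probability and β is a second filter coefficient assumed analogous to α:

$$ v_k = (1-\alpha)\, v_{k-1} + \alpha\,(1 - p_k)\, V_{\max}, \qquad 0 < \alpha < 1 \qquad (1) $$

$$ \theta_k = (1-\beta)\, \theta_{k-1} + \beta\,\frac{\pi}{2}\, s_k, \qquad 0 < \beta < 1 \qquad (2) $$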
Lastly, a novel dynamic navigation strategy that can operate a drone accurately with only a single forward-looking camera is developed. Our method has the advantage of calculating a collision probability from a single image, without any prior knowledge of the platform's velocity. We believe that the proposed architecture makes its decisions based on the distance between observed objects in the field of view [42].

The hyper-parameters used for training are listed in Table 2.

We initially analyze our model's regression performance using the Udacity dataset's testing sequence [54]. We use two metrics to evaluate steering prediction performance:

Root Mean Square Error (RMSE) and Explained Variance Ratio (EVA). We use the F-score and average classification accuracy to evaluate collision prediction performance, and we compare our model against baselines from the literature [50]. Weak baselines include a constant estimator that always predicts 0 for the steering angle and ''no collision,'' as well as a random estimator; our method outperforms both in prediction accuracy. Figure 5 compares various known architectures with Drone-STM-RENet, and it can be seen that our architecture performs very well against the known architectures in the literature while also having fewer parameters and fewer layers, as shown in Table 1. Additionally, a favorable comparison with the VGG-16 network [50] demonstrates the benefit of the residual learning scheme in terms of generalization. As demonstrated in Table 1 and fig. 4, our design achieves great performance compared to other models in the literature.

Various common performance indicators are used to assess the performance of the implemented models; accuracy, recall, and F-score are examples of these measures. Equation (3) is used to assess accuracy by counting the total number of correct assignments. Recall is a metric that measures the fraction of accurate collision probability estimates (4). The F-score is specified by (5). Explained Variance is a metric used to measure the quality of a regressor (6). The major goal of Equation (7) is to enhance the true-positive rate while lowering false negatives for foreground or region-of-interest detection. Accordingly, the Standard Error (S.E.) at the 95 percent Confidence Interval (CI) is presented for the recall/sensitivity/detection rate to quantify the uncertainty of the proposed Drone-STM-RENet [46]. In this instance, z = 1.96 for the S.E. at the 95% CI, so the two-tailed error is (100 − 95)/2 = 2.5% in each tail. The total number of samples corresponds to the number of images in the dataset. Figure 6 shows
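As a concrete illustration, the following minimal Python sketch computes these measures from binary collision labels and steering predictions; the function names and example arrays are ours, while the formulas follow the standard definitions referenced by Equations (3)-(6) and the normal-approximation confidence interval with z = 1.96.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, recall, and F-score from binary labels (cf. Eqs. 3-5)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / len(y_true)          # fraction of correct assignments
    recall = tp / (tp + fn)                     # fraction of true collisions caught
    precision = tp / (tp + fp)
    f_score = 2 * precision * recall / (precision + recall)
    return accuracy, recall, f_score

def explained_variance(y_true, y_pred):
    """EVA for the steering regressor (cf. Eq. 6): 1 - Var(residual)/Var(target)."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

def recall_ci95(recall, n):
    """Standard error and 95% CI for recall (z = 1.96), treating the
    number of images in the dataset as the total sample count n."""
    se = np.sqrt(recall * (1 - recall) / n)
    return recall - 1.96 * se, recall + 1.96 * se

# Hypothetical example: 8 test frames with collision ground truth.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 0])
acc, rec, f1 = classification_metrics(y_true, y_pred)
lo, hi = recall_ci95(rec, len(y_true))
print(f"accuracy={acc:.3f} recall={rec:.3f} F={f1:.3f} CI95=({lo:.3f}, {hi:.3f})")
```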