UGEN: UAV and GAN-Aided Ensemble Network for Post-Disaster Survivor Detection Through ORAN

Gunasekaran Raja, Senior Member, IEEE, Abhishek Manoharan, and Harun Siljak, Senior Member, IEEE

Abstract—Post-disaster scene understanding frameworks are increasingly crucial in Search And Rescue (SAR) operations. Unmanned Aerial Vehicles (UAVs) provide an efficient means to carry out the task of scene understanding due to the higher altitudes at which they function. However, complex environments in post-disaster scenarios make it difficult for UAVs to detect humans or objects accurately. Inefficient object detection mechanisms lead to low accuracy for object detection tasks. Hence, to mitigate these issues, we propose a UAV and GAN-aided Ensemble Network (UGEN) framework for efficient ORAN-based post-disaster survivor detection. This approach deploys a Context-Conditional Generative Adversarial Network (CCGAN)-based model to remove occlusion in the images obtained from the UAVs. The UGEN framework classifies entities present in the visual scope of the UAV using a semantic segmentation framework deployed on the CCGAN-enhanced images, resulting in a pixel-level prediction of entities present in the post-disaster images. An ensemble network comprising a combination of single-stage and multi-stage detectors detects survivors present in the post-disaster scenario, thereby combining the benefits of both architectures, resulting in a reduced false negative rate and improved performance. An Open Radio Access Network (ORAN) executes data propagation between the UAV and the ground station for reduced transmission latency. The proposed model achieved a survivor detection accuracy of 96.7%.

Index Terms—Unmanned aerial vehicles, generative adversarial networks, semantic segmentation, ensemble network, open radio access network.


I. INTRODUCTION
Post-disaster scene understanding frameworks utilizing Unmanned Aerial Vehicles (UAVs) are becoming increasingly crucial in Search And Rescue (SAR) operations and damage assessment initiatives [1]. As the number of societal disasters initiated by natural hazards continues to rise, efficient and accurate disaster response has become paramount [2]. UAVs functioning at a higher altitude have proven to be an efficient and cost-effective method for scene understanding [3]. Many deep learning strategies have been deployed to effectively execute visual detection from camera sensors mounted on UAVs [4]. However, the complex environments in post-disaster scenarios make it difficult for UAVs to detect humans or objects accurately. Inefficient object detection mechanisms utilizing standalone Convolutional Neural Networks (CNN) lead to low accuracy and long inference times for object detection tasks, which can be particularly problematic in urgent SAR situations. Survivors appear as small objects in post-disaster UAV images, which makes survivor detection using traditional techniques daunting [5]. Furthermore, survivors tend to suffer occlusion, wherein they are covered by debris or damaged buildings in the image, making the task of survivor detection even more challenging. Generative Adversarial Networks (GAN) provide an efficient means to remove occlusions in images due to their underlying image regeneration mechanisms. Another contributing factor to poor survivor detection in real-time environments is the transmission delay during data transfer between the onboard module of the UAV and the ground station. Hence, an Open Radio Access Network (ORAN) is being extensively utilized to mitigate the shortcomings of data propagation in existing UAV systems by providing a medium for the effective and low-latency transmission of data between UAVs and ground stations [6]. Adopting ORAN principles to design the radio unit of a UAV in accordance with open standards and interfaces enables smooth communication between the UAV's radio unit and the ground-based Software-Defined Radio (SDR) components. The SDR components can then process the received signals, applying object detection algorithms to analyze the data collected by the UAV's sensors.
To mitigate the above issues of existing survivor detection systems, the main objective of the proposed UAV and GAN-aided Ensemble Network (UGEN) framework is to deploy a UAV-based scene understanding scheme involving a GAN-aided semantic segmentation mechanism through ORAN. A Context-Conditional GAN (CCGAN)-based denoiser results in images having lower occlusion, thereby highlighting the essential features of the object. Deploying CCGAN improves the detection of small and dense objects [7], which is the case for survivors in images obtained from a UAV, owing to the visual regeneration of occluded survivors. Semantic segmentation on the CCGAN-enhanced images leads to a pixel-level prediction of the various entities present in an image, thereby generating a color coding for each entity. The ensemble model, a hybrid architecture consisting of single-stage and multi-stage detectors, detects the presence of survivors. The envisioned framework incorporates an ensemble comprising the You Only Look Once (YOLOv8), Faster Region-based CNN (Faster RCNN), and Cascade RCNN mechanisms, thereby improving the performance of survivor detection while decreasing the false negative rate. The framework deploys an ORAN for flexible data transfer between the UAV and the ground station, resulting in lower transmission latency and thereby aiding efficient SAR operations. Deploying the proposed UGEN framework increases the accuracy and efficiency of the survivor detection task, resulting in successful SAR operations.
The key contributions of this paper include the following:
1) A CCGAN-based denoiser and occlusion remover mechanism to regenerate occluded survivors and improve their detection in post-disaster UAV images.
2) A SegFormer-based semantic segmentation mechanism on the CCGAN-enhanced images to classify various entities, thereby improving survivor detection.
3) A hybrid ensemble network comprising single-stage and multi-stage detectors for efficient survivor detection, resulting in a reduction of the false negative rate compared to the multi-stage mechanism and an improvement in performance over the single-stage detector, along with low-latency transmission of survivor data using an ORAN medium.

The remainder of this paper is organized as follows. Section II summarizes the related works. Sections III, IV, and V describe the proposed work and its components. Section VI evaluates and outlines the results of the overall preprocessing module comprising the CCGAN-based occlusion remover and the semantic segmentation module, along with the hybrid ensemble network's survivor detection performance and the transmission latency of the ORAN network. Finally, Section VII presents the conclusion and future work for this article.

II. RELATED WORKS
Post-disaster survivor detection is an essential task in SAR operations. However, survivors are hard to detect, especially from UAV-mounted sensors, and require specialized techniques to be detected effectively. Various CNN-based models have been developed for efficient object detection. Intending to decrease the excessive false negative rate of multi-stage detectors while improving the performance of single-stage detectors, [8], [9] propose various ensemble networks that combine a multi-stage detector with a single-stage detector for effective object detection. However, detecting objects in drone images is more challenging than in images taken from the ground. Hence, the accuracy of models trained on UAV images is still low compared to that of models trained on ground images. Several deep learning techniques used for the detection of objects from UAVs are studied in [10], [11], wherein various architectures are outlined, including GANs, autoencoders, and deep Reinforcement Learning (RL), along with their contributions to improving vehicle detection. RL is deployed in object detection mechanisms to enhance the performance and adaptability of the detection models and can be integrated to address challenges like active object detection, multi-object tracking, active learning, and resource allocation. [12], [13] discuss deploying UAV trajectory prediction mechanisms for efficient object detection and mapping. The use of a blockchain for secure data sharing among UAVs in post-disaster scenarios and a 6G drone ecosystem are discussed in [14] and [15], respectively. However, videos captured by the UAVs are sent to on-ground workstations or the cloud for processing rather than being processed onboard the UAV. This leads to the absence of a lightweight system for real-time detection.
GANs are being increasingly deployed in many modern detection algorithms due to their extensive applications, like the removal of occlusion present in images. [16], [17], [18] discuss denoising and occlusion-removal strategies empowered by GAN-based systems for better image recognition, wherein image regeneration after occlusion removal leads to better object detection performance. Many UAV-driven and synthetic datasets have been generated to analyze efficient methods for survivor detection using UAVs. [19] introduces a high-resolution post-disaster UAV dataset named RescueNet, which contains comprehensive pixel-wise annotation of images for semantic segmentation to detect survivors after a disaster. However, smaller objects like vehicles and pools are more challenging to segment than larger objects like buildings and roads. [20] proposes the UAV-Human dataset for understanding human action, pose, and behavior. It contains 67,428 multi-modal video sequences of people for action recognition and pose prediction, encouraging the development of information-centric learning models for UAV-based human behavior understanding. However, the UAV-Human dataset limits attribute recognition since the dataset is captured over a relatively long period.
Semantic segmentation classifies all entities present in an image through pixel-level prediction. This enables the detection model to recognize objects present in an image quickly. The authors of [21] propose self-attention segmentation models on the HRUD dataset. However, the HRUD dataset presents significant challenges due to the presence of classes with varying sizes and similar textures. Debris, debris textures, sand, and buildings with destruction damage greatly influence the segmentation performance of the assessed network models. [22], [23] propose two high-resolution aerial datasets aimed at UAV-based semantic segmentation for regular environments and IoT-based applications, namely UAVid and DroneSegNet. The literature [24] discusses various semantic segmentation frameworks used for entity separation and classification in UAV-driven images. ORAN is being extensively deployed in UAV trajectory estimation and in data transfer to and from onboard modules and ground stations. [25], [26], [27] discuss the various methodologies deployed to implement ORAN in UAV networks and the applications and advancements in ORAN-based systems. [28] portrays deploying a 5G RAN-based network for Internet of Things (IoT) device connectivity. The authors of [29] discuss the utilization of Cloud Radio Access Networks (CRAN) for UAV-based systems. A 5G communication medium is the backbone of ORAN-based architectures, owing to efficient data transfer.
Hence, to mitigate the limitations of existing systems for survivor detection, the UGEN framework provides an efficient UAV-based post-disaster survivor detection framework encompassing a CCGAN-based occlusion removal mechanism, a semantic segmentation mechanism on the CCGAN-enhanced images that classifies various entities in the scene, and a hybrid ensemble network comprising single-stage and multi-stage detectors for survivor detection using the semantically classified entities. This results in a reduction of the excessive false negative rate of multi-stage mechanisms and an improvement in the performance of the single-stage detector, which in turn leads to an efficient survivor detection model for SAR operations.

III. CCGAN-BASED OCCLUSION REMOVAL MECHANISM
The UGEN framework aims to serve as an efficient methodology to detect the presence of survivors in post-disaster scenes, thereby aiding SAR operations. The overall mechanism incorporates a Context-Conditional GAN (CCGAN)-based occlusion remover that regenerates occluded survivors present in the post-disaster scene from images captured by a swarm of UAVs. The model implements semantic segmentation on the CCGAN-enhanced images to produce entity-wise color coding for efficient human classification and reduced ambiguity in survivor detection. On top of that, a hybrid single-stage and multi-stage ensemble network for efficient survivor detection is implemented.
Post-disaster images taken from a UAV bring in several issues, namely noise, distortion, and poor clarity. However, the primary concern that needs to be addressed in UAV images of post-disaster scenes is the obstruction caused by object occlusion. Occlusion occurs when objects of interest are covered or masked by other objects, noise, or other characteristics in the image. To render the image devoid of the occlusions bound to occur in panoramic disaster stills, which pose a hindrance to the detection of survivors, a CCGAN-based survivor occlusion removal mechanism has been implemented. Fig. 1 describes the overall workflow of the proposed UGEN framework for survivor detection, wherein post-disaster images obtained from a UAV are processed through a CCGAN-based occlusion remover mechanism followed by semantic segmentation for entity classification. The enhanced images obtained from the CCGAN-aided semantic segmentation module are provided as input to a hybrid ensemble network for detecting survivors. This is followed by the low-latency transmission of survivor location data using an ORAN medium.

The CCGAN-based occlusion removal framework is deployed to regenerate occluded humans, resulting in better survivor detection. CCGAN aids occlusion removal by learning to generate realistic and coherent image content in regions occluded in the input images. It accomplishes this by training on pairs of occluded and unoccluded images while considering conditional information. The generator network produces the output images, and the discriminator network helps ensure the generated images are convincing and visually accurate. Algorithm 1 (CCGAN-Based Occlusion Removal) discusses the usage of CCGAN for removing occlusion in post-disaster images. The generator loss L_G is used to evaluate the difference between the generated image and the ground truth and is minimized by training the model for several epochs. It is represented as

$$L_G = \lambda_{L_1} L_1 + \lambda_g L_g \qquad (1)$$

where L_G is the generator loss function, λ_L1 is the hyperparameter that controls the relative performance of the L1 loss, L_1 is the L1 distance between the generated output image and the real output image, λ_g is the hyperparameter that controls the relative performance of the GAN loss, and L_g is the GAN loss function.
CCGANs most commonly find their use in inpainting problems, wherein a portion of the image is either to be removed or generated. The CCGAN model possesses an autoencoder-based structure for the generator, with an encoder and a decoder, both involving CNNs consisting of Conv2D operations. A periodic increase in the number of operations is executed, followed by a bottleneck layer bridging the encoder and decoder, thereby storing the encoded context of the image. The decoder is similarly composed of Conv2D operations, termed deconvolutions, as the number of channels continuously reduces. The reconstruction loss Γ_rec, an L2 loss also termed the generator loss, is a vital evaluation metric used to fine-tune the regeneration of occlusion-removed versions of the post-disaster images through image inpainting. It is represented as

$$\Gamma_{rec} = \lVert P - C_E(X) \rVert_2^2 \qquad (2)$$

where P is the original region before damage, C_E is the model, and X is the entire image that needs to be inpainted. The overall loss of the context encoder Γ is minimized for efficient occlusion removal and is represented as

$$\Gamma = \lambda_{adv} \Gamma_{adv} + \lambda_{rec} \Gamma_{rec} \qquad (3)$$

where λ_adv is the coefficient for tuning the influence of the adversarial loss Γ_adv, and λ_rec is the coefficient that tunes the reconstruction loss Γ_rec.
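As a concrete illustration, the composite losses of (1)-(3) can be sketched in PyTorch as weighted sums of standard loss terms. This is a minimal sketch, not the authors' implementation; the function names, default coefficients, and the use of binary cross-entropy for the adversarial term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def generator_loss(fake_img, real_img, disc_fake_logits,
                   lambda_l1=100.0, lambda_g=1.0):
    """Composite generator loss of (1): weighted L1 plus adversarial term."""
    l1 = F.l1_loss(fake_img, real_img)                       # L1 distance to ground truth
    adv = F.binary_cross_entropy_with_logits(                # GAN loss: fool the discriminator
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return lambda_l1 * l1 + lambda_g * adv

def context_encoder_loss(pred_region, true_region, disc_fake_logits,
                         lambda_rec=0.999, lambda_adv=0.001):
    """Overall context-encoder loss of (3): reconstruction plus adversarial term."""
    rec = F.mse_loss(pred_region, true_region)               # L2 reconstruction loss of (2)
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return lambda_adv * adv + lambda_rec * rec
```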

IV. SEMANTIC SEGMENTATION
A post-disaster image taken from a UAV will comprise several objects or entities. For the task of survivor detection, human survivors and the cars in which survivors may be present are the only entities of interest for the succeeding survivor detection model. Other entities in the scene are unnecessary for survivor detection and may cause ambiguity when trying to detect survivors. Hence, to mitigate the problem of ambiguity and improve survivor detection performance and accuracy, we propose a semantic segmentation mechanism on top of the CCGAN-based image-enhancement mechanism to differentiate between the various entities present in the post-disaster UAV image. Semantic segmentation deploys pixel-level prediction of images and categorizes and classifies the various entities present in the image. Algorithm 2 describes the deployment of semantic segmentation in the UGEN framework.
The output segmentation map Y_S is obtained by applying the convolution process over several layers for pixel-wise entity classification and is represented as

$$Y_S = \mathrm{softmax}\left(W \cdot g(V; \theta) + b\right) \qquad (4)$$

where X is the input image, Θ is the set of network parameters for the convolutional layers that encode X, W is the weight matrix, g(·) represents the upsampling layers in the decoder, V is the input to the decoder, θ is the set of network parameters for the decoder, and b is the bias term. While implementing pixel-level entity classification, a color coding is generated for each entity observed in the image. All entities belonging to one class of objects in the image are given the same color coding.
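To make (4) concrete, the following minimal PyTorch sketch wires a small convolutional encoder, an upsampling decoder standing in for g(·; θ), and a 1×1 convolution playing the role of W and b, ending in a per-pixel softmax. The architecture, layer sizes, and class count are illustrative assumptions, not the SegFormer network used by UGEN.

```python
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    """Minimal encoder-decoder for pixel-wise entity classification, per (4)."""
    def __init__(self, n_classes=3):
        super().__init__()
        self.encoder = nn.Sequential(                  # conv layers with parameters Θ
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(                  # upsampling layers g(·; θ)
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)        # weight matrix W and bias b

    def forward(self, x):                              # x: input image X
        v = self.encoder(x)                            # V, the decoder input
        logits = self.head(self.decoder(v))
        return torch.softmax(logits, dim=1)            # per-pixel class probabilities Y_S

seg = TinySegNet()
probs = seg(torch.randn(1, 3, 256, 256))               # -> (1, n_classes, 256, 256)
```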
For the detection of survivors, the SegFormer model has been deployed. Fig. 2(a) portrays a sample of the output obtained from the trained CCGAN model, Fig. 2(b) showcases a sample of the output obtained from the trained SegFormer model, and Fig. 2(c) showcases the output of the survivor detection model based on the hybrid ensemble mechanism. Being a semantic segmentation model composed of several key components, SegFormer incorporates a Transformer encoder with self-attention layers to capture relationships between image patches, enabling spatial dependencies to be learned across the entire image. The input image is divided into non-overlapping patches, which are linearly embedded into a lower-dimensional feature space. The patch and position embeddings pass through Transformer encoder layers and then into a segmentation head, which employs convolutional layers to predict semantic segmentation masks. A decoder and upsampling module refine the segmentation predictions and upsample the feature maps to the original input resolution, collectively contributing to the model's effectiveness in semantic segmentation tasks. The loss of the SegFormer model is an essential characteristic of the training procedure. The SegFormer loss function L_S is minimized through the training of the model and is represented as

$$L_S = -\frac{1}{N_p} \sum_{i,j} y_{ij} \log \hat{y}_{ij} \qquad (5)$$

where N_p is the total number of pixels in the image, y_ij is the ground truth label for pixel (i, j), ŷ_ij is the predicted label for pixel (i, j), and the summation is taken over all pixels in the image. The attention mechanism used in the model is of vital importance in the working of the SegFormer framework.
Algorithm 2: Semantic Segmentation.
Input: CCGAN-enhanced images
Output: Images with semantically classified entities (γ)
 1: procedure Semantic_Segmentation
 2:   Φ[] ← entities present in the input images
 3:   ν ← 0
 4:   W ← weight matrix
 5:   g(·) ← decoder upsampling layers
 6:   V ← input to decoder
 7:   θ ← network parameters for decoder
 8:   b ← bias term
 9:   for α in the input images do
10:     ν ← ν + 1
13:   end for
14:   Embed Φ in γ
15:   return γ
16: end procedure

The attention output A is represented as

$$A = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d}}\right) V \qquad (6)$$

where Q and K are the query and key matrices, d is the dimension of the embedding space, and V is the value matrix. The classes of importance, namely car and people, were classified efficiently through the entity-wise color coding generated by the SegFormer module.
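The attention computation in (6) and the pixel-wise cross-entropy loss in (5) map directly onto a few lines of PyTorch. This sketch is illustrative and independent of the actual SegFormer implementation; the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    """Scaled dot-product attention per (6): softmax(QK^T / sqrt(d)) V."""
    d = Q.size(-1)
    weights = F.softmax(Q @ K.transpose(-2, -1) / d ** 0.5, dim=-1)
    return weights @ V

def pixelwise_ce(logits, target):
    """Per-pixel cross-entropy per (5), averaged over all N_p pixels."""
    # logits: (B, C, H, W) class scores; target: (B, H, W) integer labels
    return F.cross_entropy(logits, target)

# Example: 196 patch embeddings of dimension 64.
A = attention(torch.randn(1, 196, 64), torch.randn(1, 196, 64), torch.randn(1, 196, 64))
```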

V. HYBRID ENSEMBLE NETWORK
The images received as output from the semantic segmentation module were given as input to the hybrid ensemble network for detecting the presence of survivors in the images. For survivor detection, we propose a hybrid single-stage and multi-stage detector combination as an ensemble model for object detection. Standalone object detection models using single-stage or multi-stage detectors have their own advantages and disadvantages. The main disadvantages include the excessive false negative rate associated with multi-stage detectors and the comparatively low performance of single-stage detectors. We propose that deploying an ensemble model for survivor detection will nullify the disadvantages of both systems, thereby decreasing the false negative rate while obtaining high performance. The set of bounding boxes Y_E constructed using an ensemble of detection models is represented as

$$Y_E = \bigcup_{i=1}^{N_e} f_i(X; \Theta_i) \qquad (7)$$

where Y_E is the set of predicted bounding boxes and class labels, N_e is the number of models in the ensemble, f_i(·) represents the ith model in the ensemble, X is the input image, and Θ_i is the set of network parameters for the ith model.
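A minimal sketch of the ensemble combination in (7): each member detector is applied to the same image, the predictions are pooled, and overlapping duplicates across models are suppressed. The assumed output format (dicts with 'boxes' and 'scores', as in torchvision detection models) and the NMS-based fusion are illustrative choices, not the paper's exact fusion rule.

```python
import torch
from torchvision.ops import nms

def ensemble_detections(models, image, iou_thresh=0.5):
    """Pool predictions from all ensemble members per (7), then suppress duplicates."""
    boxes, scores = [], []
    for f in models:                    # f_i(X; Θ_i) for each of the N_e detectors
        out = f(image)                  # assumed to return a dict with 'boxes' and 'scores'
        boxes.append(out["boxes"])
        scores.append(out["scores"])
    boxes, scores = torch.cat(boxes), torch.cat(scores)
    keep = nms(boxes, scores, iou_thresh)   # keep the highest-scoring box per overlap group
    return boxes[keep], scores[keep]
```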
In the proposed model, we use the YOLOv8 framework as the single-stage detector and a combination of the Faster RCNN and Cascade RCNN mechanisms as the multi-stage frameworks. The YOLOv8 model is trained to minimize a composite loss L, represented as

$$L = \lambda_r L_r + \lambda_c L_c + \lambda_a L_a \qquad (8)$$

where λ_r, λ_c, and λ_a are hyperparameters that control the relative importance of the different components of the loss function, L_r is the regression loss that imposes a penalty on the disparity between the predicted and ground truth bounding box coordinates, L_c is the classification loss that imposes a penalty on the disparity between the predicted and ground truth class probabilities, and L_a is the anchor loss that encourages the model to predict anchors that match the ground truth objects. Algorithm 3 describes the working of the hybrid single-stage and multi-stage ensemble network. The YOLOv8 model uses anchor boxes to predict bounding box coordinates, which act as k clusters of bounding boxes computed using k-means clustering on the ground truth bounding boxes.
The anchor loss L_a is used to encourage the model to predict anchor boxes that match the ground truth objects and is represented as

$$L_a = \lambda_a \sum_{j} \left(1 - IOU(B_j, A_{mj})\right) \qquad (9)$$

where λ_a is a hyperparameter that controls the relative importance of the anchor loss, IOU(B_j, A_mj) is the Intersection Over Union (IOU) between the predicted and ground truth bounding boxes, and A_mj is the mth anchor box that best matches the jth ground truth object.
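A hedged sketch of the anchor loss in (9), using torchvision's box_iou to compute IOU(B_j, A_mj) for prediction/anchor pairs; the matching of predictions to anchors is assumed to have been done upstream, and the summation form follows the reconstruction above.

```python
import torch
from torchvision.ops import box_iou

def anchor_loss(pred_boxes, matched_anchors, lambda_a=1.0):
    """Anchor loss per (9): penalize low IoU between matched prediction/anchor pairs."""
    # Both tensors are (N, 4) boxes in (x1, y1, x2, y2) format, already matched by index.
    ious = box_iou(pred_boxes, matched_anchors).diagonal()   # IOU(B_j, A_mj) for each j
    return lambda_a * (1.0 - ious).sum()
```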
Like the YOLOv8 model, the Faster RCNN model is trained to reduce its loss to optimize it for efficient survivor detection. The loss L_F of the Faster RCNN model is represented as

$$L_F = L_{rpn} + L_{roi} \qquad (10)$$

where L_rpn is the loss function for the Region Proposal Network (RPN) that encourages it to generate accurate proposals, and L_roi is the loss function for the second-stage network that classifies the proposals and refines their bounding box coordinates. The RPN generates a set of object proposals by sliding a small window (called an anchor) over the feature map output by the backbone network. Each anchor is associated with a set of scores that indicate the likelihood that it contains an object and the accuracy of its bounding box coordinates. The RPN loss function L_rpn encourages the network to generate accurate scores and coordinates for the positive proposals, i.e., those that have a high overlap with a ground truth object. The RPN loss also suppresses the scores for negative proposals that have low overlap with any ground truth object. The loss function L_rpn associated with the RPN is represented as

$$L_{rpn} = L_{obj} + \lambda_{reg} L_{reg} \qquad (11)$$

where L_obj is the binary cross-entropy loss for objectness classification, λ_reg is a hyperparameter that controls the relative importance of the two components, and L_reg is the smooth L1 loss for bounding box regression. The second-stage network takes the proposals generated by the RPN and classifies them into one of the target classes, thereby refining their bounding box coordinates. The loss function for the second-stage network L_roi consists of two components, namely the classification loss and the bounding box regression loss, and is represented as

$$L_{roi} = L_{cls} + \lambda_{reg} L_{reg} \qquad (12)$$

where L_cls is the cross-entropy loss for classification, λ_reg is the hyperparameter that controls the relative importance of the two components, and L_reg is the smooth L1 loss for bounding box regression.
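The two-component losses in (11) and (12) can be sketched as follows. The tensor layouts are assumptions, and real Faster RCNN implementations add proposal sampling and normalization details omitted here.

```python
import torch.nn.functional as F

def rpn_loss(obj_logits, obj_targets, box_deltas, box_targets, lambda_reg=1.0):
    """RPN loss per (11): objectness BCE plus smooth-L1 box regression."""
    l_obj = F.binary_cross_entropy_with_logits(obj_logits, obj_targets)
    l_reg = F.smooth_l1_loss(box_deltas, box_targets)
    return l_obj + lambda_reg * l_reg

def roi_loss(cls_logits, cls_targets, box_deltas, box_targets, lambda_reg=1.0):
    """Second-stage loss per (12): classification CE plus smooth-L1 regression."""
    l_cls = F.cross_entropy(cls_logits, cls_targets)
    l_reg = F.smooth_l1_loss(box_deltas, box_targets)
    return l_cls + lambda_reg * l_reg
```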
Following the Faster RCNN model, the Cascade RCNN mechanism is deployed to further enhance survivor detection. As in the Faster RCNN model, the RPN loss associated with Cascade RCNN encourages accurate region proposal generation by penalizing the discrepancy between the predicted objectness scores and the ground truth labels:

$$L_{rpn} = L_{obj\_rpn} + \lambda L_{reg\_rpn} \qquad (13)$$

where L_rpn represents the overall RPN loss, which consists of the objectness loss L_obj_rpn, which encourages accurate objectness predictions, and the bounding box regression loss L_reg_rpn, which penalizes the discrepancy between predicted and ground truth bounding box adjustments. The parameter λ controls the trade-off between the two components. Furthermore, the region classification and box regression losses are calculated for each cascade stage, aiming to refine the detection results progressively:

$$L_{stage} = L_{cls\_stage} + \lambda L_{reg\_stage} \qquad (14)$$
where L_stage represents the overall loss for a specific cascade stage. It consists of the classification loss L_cls_stage, which encourages accurate class predictions for the region proposals, and the bounding box regression loss L_reg_stage, which penalizes the discrepancy between predicted and ground truth bounding box adjustments. The parameter λ controls the trade-off between the two components.
The overall loss of the Cascade R-CNN model is the combination of the RPN loss and the losses at each cascade stage:

$$L_{cascade} = L_{rpn} + \sum_{s} L_{stage}^{(s)} \qquad (15)$$
where L_cascade represents the overall loss of the Cascade R-CNN model. It is the sum of the RPN loss L_rpn and the losses at each cascade stage L_stage, taken over all cascade stages. The total loss of the Cascade R-CNN model combines the overall cascade loss and any additional regularization terms:

$$L_{total} = L_{cascade} + \lambda_{reg} L_{reg} \qquad (16)$$
where L_total represents the total loss of the Cascade R-CNN model. It includes the overall cascade loss L_cascade and additional regularization terms L_reg, such as weight decay or regularization on the model parameters. The parameter λ_reg controls the strength of the regularization. The three models are trained, implemented, and tested as standalone mechanisms for survivor detection. An ensemble network combining the three models was also implemented and evaluated. The location coordinates of the detected survivors are sent to ground stations from the onboard UAV module through an ORAN-based communication medium. This methodology results in a low-latency data transmission mechanism, owing to the flexibility and dynamic allocation of network resources, thereby optimizing the communication link between the UAV and the ground station. The UGEN framework is designed to be deployable in any ORAN-based UAV system.
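Pulling (13)-(16) together, the total Cascade R-CNN loss is an accumulation of the per-part losses. This sketch assumes the individual loss terms have already been computed; the numeric values in the example are illustrative only.

```python
def cascade_total_loss(l_rpn, stage_losses, l_reg, lambda_reg=1e-4):
    """Total Cascade R-CNN loss per (15)-(16)."""
    l_cascade = l_rpn + sum(stage_losses)      # (15): RPN loss plus all cascade-stage losses
    return l_cascade + lambda_reg * l_reg      # (16): add the weighted regularization term

# Example: three cascade stages with precomputed (illustrative) loss values.
total = cascade_total_loss(0.42, [0.31, 0.24, 0.19], l_reg=3.5)
```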

VI. RESULTS AND DISCUSSION

A. Experimental Setup
The RescueNet dataset was used for training the survivor detection models. It is a newly introduced dataset for the task of SAR in disaster scenarios. This dataset comprises 10,000 synthetic images and 100 real-world images, designed to provide a comprehensive training and evaluation platform for researchers working on computer vision and robotics applications in the domain of SAR. The main classes of interest were cars and people, annotated using the Roboflow software when found in an image. Once the images were annotated, the annotations were exported in the COCO JSON [31] and YOLO PyTorch formats for use in the respective training procedures. For the training of the models, an environment comprising an Intel Core i7-10750H processor, an Nvidia RTX 2070 Max-Q Graphical Processing Unit (GPU), and 16 GB of RAM was deployed to improve the efficiency of training, thereby improving performance and reducing training time.
The CCGAN-based occlusion removal model was trained for 100 epochs, and the SegFormer-based semantic segmentation module was trained for 300 epochs utilizing the GPU. The YOLOv8 survivor detection model was trained for 1000 epochs, and the models comprising the multi-stage segment of the ensemble, namely Faster RCNN and Cascade RCNN, were trained for 200 epochs. The ORAN network deployed in the UGEN framework had a bandwidth of 1 MHz, with the UAVs comprising the network having a maximum transmission power of 200 mW (23 dBm). The network had a packet loss rate of 0.1% and a throughput of 10 Mbps. TensorFlow, PyTorch, and torchvision were used to implement the survivor detection models and the hybrid ensemble network. OpenCV was used to process the dataset for training the survivor detection models, and matplotlib was deployed to plot the performance evaluation metrics after training the models. The GPU was enabled for training purposes using the CUDA framework.

B. CCGAN-Based Occlusion Remover
The CCGAN framework was used to remove occlusions of survivors present in images, thereby regenerating them to improve detection accuracy. All necessary libraries are imported to implement the CCGAN architecture, followed by the configuration initialization for setting up and training the generator and the discriminator. The ImageDataset class enables the loading and usage of a custom dataset for training purposes, and instantiating the testing and training data loaders allows for efficient dataset loading using this class. The Generator and Discriminator classes enable the creation of the CCGAN model for occlusion removal. The CCGAN model was then trained by instantiating a model using the previously declared classes and training it on the RescueNet dataset for occlusion removal. The training results and performance evaluation metrics were plotted to visualize the efficiency of the trained CCGAN model. Fig. 3(a) plots the discriminator loss against the number of iterations, and Fig. 3(b) plots the generator adversarial loss over the iterations through which the CCGAN model was trained. The performance statistics of the trained CCGAN model showcase its low error rate in regenerating occlusion-removed images.
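A skeleton of the adversarial training loop described above, assuming the Generator, Discriminator, and ImageDataset classes mentioned in the text (and the generator_loss sketch from Section III) are defined elsewhere; the dataset path, learning rates, and batch size are illustrative placeholders.

```python
import torch
from torch.utils.data import DataLoader

def bce(logits, label):
    # Binary cross-entropy against a constant real (1.0) / fake (0.0) target
    return torch.nn.functional.binary_cross_entropy_with_logits(
        logits, torch.full_like(logits, label))

# Generator, Discriminator, ImageDataset: the classes described in the text (assumed defined).
gen, disc = Generator(), Discriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4, betas=(0.5, 0.999))
loader = DataLoader(ImageDataset("rescuenet/train"), batch_size=8, shuffle=True)

for epoch in range(100):                          # 100 epochs, per the experimental setup
    for occluded, clean in loader:
        fake = gen(occluded)                      # inpaint the occluded regions
        opt_d.zero_grad()                         # discriminator step: real vs. generated
        d_loss = 0.5 * (bce(disc(clean), 1.0) + bce(disc(fake.detach()), 0.0))
        d_loss.backward(); opt_d.step()
        opt_g.zero_grad()                         # generator step: composite loss of (1)
        g_loss = generator_loss(fake, clean, disc(fake))
        g_loss.backward(); opt_g.step()
```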

C. SegFormer Semantic Segmentation Model
Semantic segmentation and classification of entities in the CCGAN-enhanced images are executed utilizing a SegFormer-based semantic segmentation mechanism. This mechanism generated a color coding, wherein each entity present in the image was associated with a unique color, making it easy for the detection models to distinguish between various entities. Classes of importance, namely cars and people, were annotated for semantic segmentation and used to train the SegFormer model. The necessary libraries were imported to implement the SegFormer framework, and the occlusion-removed images obtained as output from the GAN module were utilized for semantic segmentation. The SemanticSegmentationDataset class loads and processes the dataset for the semantic segmentation task, and the SegformerFinetuner class prunes and fine-tunes the parameters used for implementing the model. The model was then trained for 300 epochs, with an early stopping mechanism incorporated to reduce the model's chances of overfitting. Tensorboard was instantiated to print the performance evaluation metrics witnessed during training. The model had a minimal training loss of 0.0233, thus showcasing the model's segmentation efficiency. Fig. 3(c) and (d) depict the Step-Accuracy and Step-Loss curves for the trained SegFormer model, indicating a steady rise in accuracy and a steady decline in the training loss value as the training procedure progressed.
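A hedged sketch of SegFormer fine-tuning with the Hugging Face transformers library; the checkpoint name, label count, and random stand-in tensors are illustrative assumptions, and the SegformerFinetuner wrapper from the text is not reproduced here.

```python
import torch
from transformers import SegformerForSemanticSegmentation

# Fine-tune a pretrained SegFormer encoder with a fresh segmentation head.
model = SegformerForSemanticSegmentation.from_pretrained(
    "nvidia/mit-b0",                   # pretrained backbone checkpoint (illustrative choice)
    num_labels=3,                      # e.g. background, car, person (assumed label set)
    ignore_mismatched_sizes=True)

pixel_values = torch.randn(2, 3, 512, 512)        # stand-in batch of CCGAN-enhanced images
labels = torch.randint(0, 3, (2, 512, 512))       # stand-in per-pixel class masks
loss = model(pixel_values=pixel_values, labels=labels).loss  # cross-entropy, as in (5)
loss.backward()                                   # gradients for one fine-tuning step
```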

D. Hybrid Ensemble Network
The images obtained as output from the SegFormer model were used to train the YOLOv8, Faster RCNN, and Cascade RCNN survivor detection models, thereby training the models on enhanced images with entity separation for better survivor detection performance. The YOLOv8 model was instantiated by downloading all dependencies from the official 'ultralytics' GitHub repository. The RescueNet dataset was annotated and augmented using the Roboflow software and used for training the models. Additional pre-processing techniques were incorporated within the Roboflow workspace, and a unique API key was generated for using the processed images for training. The YOLOv8 model was then trained with a batch size of 16 for 1000 epochs. The configurations previously defined in custom_yolov8s.yaml were also given as input parameters. The mean Average Precision (mAP) was calculated and displayed for every epoch. The total number of instances the model visualized under each class was tabulated and presented, and the weights were stored as 'best.pt'. Tensorboard was then used to visualize the model's performance using the log generated during training, plotting all performance metrics observed over the range of epochs for which the model was trained. The overall training loss of the model was found to be 0.002 for identifying the various classes, and the accuracy of the YOLOv8 model was found to be 84.6%. The trained model was then tested by visualizing the results obtained when test images were passed through the model, with the detection model showcasing the bounding boxes generated for each identified class. The confidence-F1 score, confidence-recall, confidence-precision, and recall-precision curves for YOLOv8 are plotted in Fig. 4(a)-(d), respectively. Initially, the precision, recall, and F1 score increase rapidly with respect to the confidence value, after which they settle to a constant value, indicating the successful completion of the model's training. The precision-recall curves for people, cars, and all classes indicate the robustness of the model in detecting survivors efficiently.
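The described YOLOv8 workflow maps onto the ultralytics API roughly as follows; the dataset and image paths are placeholders mirroring the names in the text, not the authors' actual files.

```python
from ultralytics import YOLO

model = YOLO("custom_yolov8s.yaml")          # build the model from the custom config in the text
model.train(data="rescuenet/data.yaml",      # placeholder path to the Roboflow-exported dataset
            epochs=1000, batch=16)           # 1000 epochs with batch size 16, as described
metrics = model.val()                        # evaluate; mAP is also logged each training epoch
results = model("test_image.jpg")            # inference; bounding boxes in results[0].boxes
```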
The Faster RCNN multi-stage CNN model was deployed using Detectron2, a computer vision library that implements various CNN models using PyTorch. The Faster RCNN model was instantiated and trained using the faster_rcnn_X_101_32x8d_FPN_3x architecture in Detectron2. The model was trained for 200 epochs with a batch size of 64. Once the model was trained, Tensorboard was initialized, providing the various performance metrics observed during the training process of the Faster RCNN model. The trained model was then tested using the test set, the results were visualized, and the model and its corresponding weights were stored. Similarly, the Cascade RCNN multi-stage model was deployed using Detectron2 through PyTorch. The Cascade RCNN model was instantiated and trained using the cascade_mask_rcnn_R_50_FPN_3x architecture in Detectron2. This model was also trained for 200 epochs with a batch size of 64. Tensorboard was used post-training to plot the performance metrics observed during the training process of the Cascade RCNN model, followed by the testing of the model and the visualization of results. The model and its corresponding weights were saved. Fig. 5(a) depicts the epoch-accuracy curve for each class identified by the Faster RCNN model, depicting the increase in accuracy as the model's training progressed, leading to a final accuracy of 94.92% for detecting the various classes. The training loss of the model reduced rapidly as the training procedure took place, as indicated in the Epoch-Loss curve of the model depicted in Fig. 5(b). Fig. 5(c) and (d) showcase the epoch-accuracy and epoch-loss curves of the Cascade RCNN model, showing trends similar to those of the Faster RCNN model and achieving an accuracy of 95.31%.

The final ensemble model was created using the trained YOLOv8, Faster RCNN, and Cascade RCNN models. All three models were executed on the same image, and their respective outputs were visualized. The models were ensembled using the simple averaging technique, wherein the three models were trained independently and their predictions were combined to improve overall performance. The ensemble model was then evaluated using the same image used for the previous models, and the outputs were compared. Survivors were detected with an accuracy of 96.72%.
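A sketch of the Detectron2 setup described above. The dataset registration name, class count, and iteration budget are assumptions (Detectron2 schedules by iterations rather than epochs); the Cascade RCNN variant follows the same pattern with the Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml config.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-Detection/faster_rcnn_X_101_32x8d_FPN_3x.yaml")  # COCO-pretrained weights
cfg.DATASETS.TRAIN = ("rescuenet_train",)  # assumes the dataset was registered under this name
cfg.DATASETS.TEST = ()
cfg.SOLVER.IMS_PER_BATCH = 64              # batch size of 64, as in the text
cfg.SOLVER.MAX_ITER = 10000                # iteration budget standing in for the 200-epoch schedule
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 2        # cars and people

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```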
Table I indicates the false negative rates of the standalone models and the UGEN framework for survivor detection, from which it is evident that the ensemble model reduces the overall false negative rate compared to the multi-stage Faster RCNN model. The performance of standalone object detection mechanisms for survivor detection is compared with that of the proposed hybrid ensemble network trained on the CCGAN-enhanced post-disaster images in Table II; the hybrid ensemble network surpassed the standalone models. The performance of the survivor detection models for different train-test splits is depicted in Table III, indicating the 70-30 split to be the most suitable in all cases.
The accuracy of various UAV-based detection models is illustrated in Table IV and compared to the proposed UGEN system. The results indicate that the UGEN framework's performance surpasses existing detection mechanisms. The ensemble model implemented in [8], comprising the Cascade RCNN and CenterNet models, uses the VisDrone dataset [30] for object detection. The dataset contains more than 6000 aerial images obtained from a UAV-mounted camera sensor, wherein people are one of the classes of importance for detection purposes. The pruned YOLOv3-MobileNetV1 framework used in [1] utilizes a thermal UAV dataset containing 6447 thermal images of people in post-disaster scenarios taken by a camera-mounted UAV. The ResNet model deployed for survivor detection in post-earthquake scenarios in [3] was trained on a synthetic dataset comprising model images of earthquake-damaged buildings and survivors observed from a UAV's perspective. The InceptionNetV3 detection model executed in [2] for survivor detection in post-flood scenarios incorporated a post-flood image dataset from a UAV. All datasets used in the above-mentioned detection models were deployed for detection tasks similar to the survivor detection objective of the RescueNet dataset in the UGEN framework.
Fig. 6(a) depicts the average response latency of the ORAN network used in the UGEN framework compared to other networks deployed in UAV systems, namely 5G RAN [28] and CRAN [29], indicating a low data propagation latency of 150 μs for the ORAN network compared to an average of 200 μs across the three networks. Fig. 6(b) portrays the change in latency with an increase in packet processing time for the encompassed ORAN network as opposed to the other networks, indicating a low latency of 400 μs for ORAN compared to an average of 600 μs across the three networks during packet transmission.

VII. CONCLUSION
Natural calamities lead to immense building damage and cause havoc among people who get trapped in the disaster. Since survivors may be present in a disaster-struck area, post-disaster survivor detection is essential for effective SAR operations. Though UAVs are widely used to scan the post-disaster area for survivors, inaccurate detection mechanisms lead to many survivors not being detected. Hence, to mitigate the issues faced by current survivor detection frameworks, the UGEN framework, comprising a UAV-based post-disaster survivor detection mechanism that employs a GAN-aided ensemble network, has been implemented. The survivor detection ensemble network is trained using UAV imagery, enabling it to detect survivors efficiently. A novel CCGAN-aided semantic segmentation preprocessing module has been implemented to remove occlusion and semantically classify entities in the images fed as input to the detection model. A hybrid ensemble model comprising a single-stage YOLOv8 model and a combination of multi-stage models, namely Faster RCNN and Cascade RCNN, improves the accuracy of survivor detection and reduces the inference time. An ORAN-based communication network enhances the efficiency of the UGEN framework by reducing the data transmission latency. Upon extensive performance analysis, the UGEN framework detected survivors with an accuracy of 96.72%.





TABLE IV: COMPARISON OF VARIOUS DETECTION MODELS