A Survey on Waste Detection and Classification Using Deep Learning

Waste management is receiving increased attention as part of intelligent and sustainable development in both developed and developing countries. A waste management system comprises several related processes that carry out complex functions. Interest in deep learning (DL) as an alternative computational technique for solving waste management problems has grown recently, and a significant body of research has been published, particularly in recent years. According to the literature, a few comprehensive surveys have been done on waste detection and classification. However, no study has investigated the application of DL to waste management problems across domains while also highlighting the datasets available for waste detection and classification in those domains. To this end, this survey reviews image classification and object detection models and their applications to waste detection and classification, provides a precise and organized analysis of waste detection and classification techniques, and compiles over twenty benchmarked trash datasets. The study also discusses the challenges of existing methods and the future potential of the field. This will give researchers a solid grounding in state-of-the-art deep learning models and insight into the research areas that can still be explored.


I. INTRODUCTION
Waste generation has risen dramatically in recent years. According to World Bank data, global solid waste generation in 2016 was approximately 2.01 billion tonnes per year. By 2030 and 2050, the world is expected to produce 2.59 and 3.40 billion tonnes per year, respectively [1], [2]. Trash management failure can have disastrous consequences for almost every environment. Because of the large amount of waste, detection and sorting should be done early in the waste management process to maximize the number of recyclable items and reduce the possibility of environmental contamination by other items.
The daily increase in solid waste in all environments endangers both human and animal health and life. Poorly managed and openly deposited trash harms the environment, endangers local residents' health, causes water and air pollution and land contamination and degradation, and has numerous other consequences [3]. Illegal trash burying happens in areas that are not designated as waste dump sites, such as cultivable land, highways, buildings, and construction sites, and occasionally inside or near homes.
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
Due to the challenges posed by improper trash deposition in undesignated locations [4], many researchers have used various techniques to detect and classify trash [5], [6]. Some research, such as [7], focuses on the direct detection of waste through its spectral signature using satellite imagery and remote sensing methods. However, satellite images vary in their characteristics: they have different resolutions at different distances, and objects are captured at different angles. Although variations in light absorption allow satellites to locate objects from space, acquisition often takes place over inaccessible regions with limited transportation options [8]. In contrast to features seen in terrestrial litter, those found in marine litter will be observed from the most advantageous vantage position. However, most of the methods used to classify trash depend on human expertise, which makes accurate classification challenging and tedious, while satellite imaging methods are computationally costly and cannot perfectly separate trash objects, especially under occlusion and variation in light.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Different kinds of waste are shown in Figure 1. Waste can be hazardous or non-hazardous, with further subdivisions in different environments. The physical state, technical elements, reuse potential, biodegradability, manufacturing source, and degree of environmental effect are some of the specific features considered in the classification of garbage [9]. Based on these characteristics, waste can be divided by material nature into three primary types: liquid, solid, and gaseous waste [10], [11]. Domestic waste is also called municipal solid waste, although some of its content may be associated with commercial and industrial waste.
FIGURE 1. Classification of waste [9].
Computer vision is a field of study that enables computers to analyze and derive information from visual data. Object detection and image classification are its two most common applications. Object detection locates each object of interest in a digital image and draws a bounding box around it; face detection and pedestrian detection are common examples. Image classification predicts the class of an object in an image.
Machine learning and deep learning are subsets of artificial intelligence that learn from input data automatically, without being explicitly programmed and without direct human intervention. Both have been widely used in object detection and image classification, and the same techniques are applied to waste detection and classification.
Several survey papers related to waste management have been written. Some surveys, such as [12], [13], [14], [15], [16], and [17], focus on general object detection only. Surveys of illegal waste disposal have also been conducted; for example, the authors of [18] survey gaseous and water waste in borehole drilling. The paper [19] discusses the conventional techniques used to dispose of waste, the shortcomings of current systems, and how to fix them. A bibliometric review of domestic waste classification covering the years 2000 to 2019 was conducted by [20]. The authors reported that European countries lead research in this field and that plastic and metal wastes were the main focus of automated trash detection and classification. However, that review is limited to Engineering, Environmental Science, Economics, and Chemistry. Another review [21] focuses on underwater images for better detection and classification. The authors introduce existing research on underwater target image recognition, primarily present deep learning-based underwater image recognition technology, and then summarize the current problems of underwater image recognition. A systematic literature review was conducted by [22]. The authors examine disaster waste management research from nine perspectives: planning, waste, waste treatment options, environment, economics, social considerations, organizational aspects, legal frameworks, and funding. Table 1 shows the existing survey papers, their contributions, and topics not well discussed.
Most of the reviewed surveys focus on object detection, and only a few address waste detection and classification. However, none of them comprehensively surveys the available benchmark datasets and the deep learning models for single- and multi-object waste detection and classification.
This survey aims to contribute by reviewing existing deep learning models for detecting and classifying waste. It offers an organized and thorough review of existing waste detection, classification, and DL approaches. It also describes existing waste detection and classification datasets in various environments. The benefits and drawbacks of current approaches and datasets, and the potential for future research, are highlighted to support the study.
To give a comprehensive technical study of trash object identification and classification techniques, many articles were retrieved from several digital libraries, including IEEE Xplore, Science Direct, Scopus, ACM Digital Library, and others. The most appropriate papers were identified and organized logically after careful consideration of each paper's title, abstract, introduction, experiments, and future scope. Both automatic and manual search strategies were used. The automatic search was carried out by entering keywords into online scientific databases. The search keywords are (''Waste Detection'' AND ''Waste Classification'', ''Garbage Detection'', ''Garbage Classification''). The manual search was carried out by scanning the references of the primary studies found by the automatic search. This study collected primary studies from journals and conference proceedings that are written in English and related to DL-based waste detection and classification. Studies that are unrelated to the domain or not written in English were excluded.
The remainder of this paper is organized as follows. Section 2 reviews the related literature on image classification and object detection models. Section 3 discusses deep learning methods for waste detection and classification. Section 4 describes the benchmark datasets for waste detection and classification. Section 5 summarizes the challenges of waste detection and classification. The conclusion is provided in Section 6.

II. IMAGE CLASSIFICATION AND OBJECT DETECTION MODELS
Before deep learning gained popularity in the previous decade, nearly all object detection and classification was done using traditional machine learning methods. Typical ones were the histogram of oriented gradients [23], the scale-invariant feature transform [24], and the Viola-Jones object detection algorithm [25]. These would identify a number of recurring features in the image and categorize their clusters using random forests, logistic regression, or color histograms. Image classification with machine learning relies on the ability of algorithms to learn previously unknown information from a dataset containing both structured and unstructured samples (supervised learning). Deep learning, a kind of machine learning that employs numerous hidden layers within a model, is now the most widely used approach in the field. Combined with powerful graphics processing units (GPUs), deep learning has made it possible to obtain exceptional results in image classification tasks. As a result, it has been the driving force behind several recent breakthroughs in image identification, facial recognition, and image classification algorithms that attain above-human-level performance, as well as real-time object detection. Current deep learning-based methods perform far better than the traditional ones. This section reviews the deep learning models widely used or adapted for waste detection and classification.
In image classification, a computer examines an image to determine the ''class'' that the image belongs to (or the likelihood that the image belongs to a certain class). A class is essentially a label, such as ''vehicle,'' ''animal,'' ''house,'' and so on. Deep learning classification models include AlexNet, VGG16, ResNet, MobileNet, Inception-ResNet, and DenseNet.

A. AlexNet
AlexNet comprises five convolutional layers and three fully connected layers, along with ReLU nonlinearity and Local Response Normalization (LRN). AlexNet accepts an input image of 227 by 227 pixels. It was trained on ImageNet, which consists of images from 1000 classes. Figure 2 depicts the architecture of the AlexNet network.

B. VGG16
This model grew out of the ImageNet competition in 2014. During training, the ConvNets receive RGB images of a fixed size of 224 by 224 as input. The only pre-processing is subtracting the mean RGB value, computed on the training set, from each pixel. The model was trained on a thousand different classes. The VGG16 architecture is shown in Figure 3.
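The mean-subtraction step described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the VGG authors' code; the nested-list image representation and the function names `channel_means` and `subtract_mean` are assumptions made for the example.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]   # (R, G, B)
Image = List[List[Pixel]]            # rows of pixels (illustrative representation)


def channel_means(images: List[Image]) -> Pixel:
    """Mean R, G and B value over every pixel of every training image."""
    totals = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(3):
                    totals[c] += pixel[c]
                count += 1
    return (totals[0] / count, totals[1] / count, totals[2] / count)


def subtract_mean(image: Image, mean: Pixel) -> Image:
    """Centre an image by subtracting the training-set mean from each pixel."""
    return [[tuple(p[c] - mean[c] for c in range(3)) for p in row] for row in image]
```

The mean is computed once on the training set and then reused unchanged for validation and test images, so that all inputs are centred in the same way.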

C. Inception-ResNet (InceptionResNetV2)
Inception-ResNet is a hybrid of Inception modules and residual connections, introduced together with Inception-v4, an improved version of the Inception-v3 model. Residual connections improve the efficiency of deeper and wider Inception networks with fewer hyperparameters. Previously, module implementation required more training resources and time; with residual connections, these requirements were reduced and the model proved more efficient to train. In contrast to earlier residual networks, batch normalization is applied only on top of the traditional convolutional layers rather than on the residual summations, which allows the number of Inception blocks to be increased. The design also scales the residuals before they are added to the accumulated layer activations, which results in more stable training and higher accuracy, as well as the ability to build a larger model [28]. The input size for Inception-v4 and Inception-ResNet-v2 is 299 by 299 pixels.

D. MobileNet
MobileNets are models proposed by Google's research team for efficient use on mobile devices. After an initial full convolution, MobileNets use depthwise separable convolutions, which allows high accuracy with a small number of parameters. Beyond depthwise convolution, MobileNets can be made thinner, with fewer parameters, by using a reduced representation of the input and a model shrinkage parameter. Because of their smaller size, MobileNets can train faster with fewer resources, which is one of their most useful properties for versatility [29]. The input size for MobileNets is 224 by 224.
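To make the parameter saving concrete, the sketch below compares the weight count of a standard convolution with that of a depthwise separable convolution (a depthwise convolution followed by a 1 by 1 pointwise convolution). Biases are ignored, and the layer sizes in the usage lines are illustrative assumptions, not values taken from the MobileNet paper.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution plus a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in   # one kernel x kernel filter per input channel
    pointwise = c_in * c_out             # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 32 input channels, 64 output channels.
standard = standard_conv_params(3, 32, 64)          # 18432
separable = depthwise_separable_params(3, 32, 64)   # 2336
ratio = separable / standard                        # equals 1/c_out + 1/kernel**2
```

For a 3 by 3 kernel the ratio approaches 1/9 as the number of output channels grows, which is where most of the saving comes from.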

E. DEEP RESIDUAL NETWORKS
Deep residual networks, or ResNets, won the 2015 ILSVRC (ImageNet Large Scale Visual Recognition Challenge) on the ImageNet dataset, and ResNet50 is one of the most commonly used residual network architectures. Deep residual networks address the degradation problem, in which training and test error increase as more layers are stacked in deeper networks. The use of residual blocks enabled the construction of extremely deep convolutional neural networks. Rather than simply stacking convolutional layers, ResNet proposes an elegant formation of residual blocks with three convolutional layers, batch normalization, and a rectified linear unit (ReLU) activation function. The repeated use of these residual blocks allows ResNet to be built very deep with fewer hyperparameters. Because of this deep structure and batch normalization between residual blocks, ResNet achieves better feature extraction than earlier architectures [30]. For our classification task in this study, we used the ResNet50 variation. ResNet50 input dimensions are 224 by 224.
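The core idea of a residual block, adding the block's input back onto the output of its stacked layers, can be shown with a short sketch. This is a simplified illustration on plain Python lists; the `residual_fn` argument stands in for the convolution and batch normalization layers and is an assumption of the example, not the ResNet implementation.

```python
from typing import Callable, List

Vector = List[float]


def relu(values: Vector) -> Vector:
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Compute relu(F(x) + x), where F is the learned residual mapping."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut needs F(x) and x to have the same size")
    return relu([f + s for f, s in zip(fx, x)])


# If the stacked layers output zero, the block reduces to the identity
# for non-negative inputs, so extra blocks need not hurt accuracy.
identity_like = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
```

This is why very deep stacks remain trainable: each block only has to learn a correction to its input rather than a full mapping.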

F. DENSELY CONNECTED CONVOLUTIONAL NETWORKS
Densely Connected Convolutional Networks, or DenseNets for short, are among the most efficient deep convolutional neural network structures because they have shorter connections between layers near the input and output. DenseNets improve on previous work by reducing parameters, strengthening feature propagation, and alleviating the vanishing gradient problem as the network grows. DenseNet's main building block is a dense connectivity layer, whose input concatenates the feature maps of all preceding layers into a single tensor. DenseNet works more efficiently, with fewer parameters, than ResNet variants. Identity mapping, deep supervision, and diversified depth are all natural properties of DenseNets. It is worth noting that DenseNet is claimed to work better without data augmentation due to large margin properties [32]. Despite the limited number of data points, our findings appear to support this claim. DenseNet input dimensions are 224 by 224.
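Because each layer in a dense block receives the concatenation of all earlier feature maps, the number of input channels grows linearly with depth. The sketch below computes that growth; the function names are hypothetical, and the values in the usage lines (64 initial channels, growth rate 32, 6 layers) are chosen for illustration.

```python
from typing import List


def dense_block_input_channels(initial: int, growth_rate: int, num_layers: int) -> List[int]:
    """Input channels seen by each layer of a dense block.

    Layer i (0-based) receives the block input plus the growth_rate
    feature maps produced by each of the i preceding layers.
    """
    return [initial + growth_rate * i for i in range(num_layers)]


def dense_block_output_channels(initial: int, growth_rate: int, num_layers: int) -> int:
    """Channels after concatenating the block input with every layer's output."""
    return initial + growth_rate * num_layers


inputs = dense_block_input_channels(64, 32, 6)     # [64, 96, 128, 160, 192, 224]
outputs = dense_block_output_channels(64, 32, 6)   # 256
```

Each layer adds only `growth_rate` new feature maps, which is how the architecture keeps its parameter count low despite the dense connectivity.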
Deep learning-based object detectors identify and label objects based on their features using neural network architectures such as RetinaNet, YOLO (You Only Look Once), CenterNet, SSD (Single Shot MultiBox Detector), and region-proposal methods (R-CNN, Fast R-CNN, Faster R-CNN, Cascade R-CNN).
Modern object detection architectures fall into two families, single-stage and two-stage detectors, and many of them have already been pre-trained on the COCO dataset [33]. The COCO image dataset includes 90 distinct object categories (e.g., person, airplane, car, bicycle, handbag, tv, door, etc.). Many models exist, drawing their potential from neural network architectures. Some of the existing deep learning models are discussed below.

G. R-CNN
R-CNN [34] is an object detection model that applies high-capacity CNNs to bottom-up region proposals in order to localize and segment objects. Multiple bounding-box object region candidates (''regions of interest'') are found using selective search, and each region's features are extracted separately and used for classification. Depending on the version, R-CNN accepts multiple input sizes, including 640 by 640, 1024 by 1024, and 800 by 1333.

H. YOLO
You Only Look Once, more often known as YOLO, is a well-known real-time object detection technique. YOLO takes what was traditionally a multi-step process and integrates it into a single operation, employing a single neural network to perform both classification and bounding-box prediction for detected objects. The input image is resized to 448 by 448 pixels. The PASCAL VOC dataset, which consists of 20 labeled classes, was used to evaluate YOLO. Figure 7 shows the YOLO model architecture. The YOLOR [35] technique, launched in 2021, achieves inference times of 12 milliseconds on the MS COCO dataset, surpassing the popular deep learning algorithms YOLOv3 [36] and YOLOv4 [37]. Other versions of YOLO, such as YOLOv5 and YOLOv6, are considered unofficial but still perform well. However, YOLOv5 and YOLOv6 do not outperform YOLOv4 [38]. The most recently released version is YOLOv7 [39], which addresses some of the issues of the previous versions.
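To illustrate how a single network pass can produce both boxes and class scores, the sketch below computes the size of the output tensor of the original YOLO formulation. The grid size S = 7 and B = 2 boxes per cell are taken from the original YOLO paper and are not stated in this survey; the function names are hypothetical.

```python
from typing import Tuple


def yolo_output_shape(grid_size: int, boxes_per_cell: int, num_classes: int) -> Tuple[int, int, int]:
    """Shape S x S x (B * 5 + C): each box carries x, y, w, h and a confidence."""
    return (grid_size, grid_size, boxes_per_cell * 5 + num_classes)


def responsible_cell(cx: float, cy: float, image_size: int, grid_size: int) -> Tuple[int, int]:
    """Grid cell (row, col) that contains an object's centre point."""
    cell = image_size / grid_size
    col = min(int(cx // cell), grid_size - 1)
    row = min(int(cy // cell), grid_size - 1)
    return (row, col)


# Assumed settings: 448x448 input, 7x7 grid, 2 boxes per cell, 20 PASCAL VOC classes.
shape = yolo_output_shape(7, 2, 20)            # (7, 7, 30)
cell = responsible_cell(224.0, 100.0, 448, 7)  # (1, 3)
```

With these settings each grid cell covers 64 by 64 pixels, and the cell containing an object's centre is the one trained to predict that object.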

I. CenterNet
CenterNet [41] is a single-stage object detector that recognises each object as a triplet of keypoints rather than a pair. To enrich the information gathered by the top-left and bottom-right corners and to provide more recognisable information in the central regions, it makes use of two customized modules called cascade corner pooling and centre pooling. In CenterNet, if the predicted bounding box has a high IoU with the ground-truth box, then the centre keypoint in its central region is likely to be predicted as belonging to the same class. CenterNet accepts an input size of 512 by 512.
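The IoU (intersection over union) measure that this argument relies on can be computed as follows. This is a generic sketch for axis-aligned boxes given as (x1, y1, x2, y2) corners; the box format and function name are assumptions of the example rather than part of CenterNet.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # intersection 1, union 7
```

An IoU of 1 means the predicted and ground-truth boxes coincide, and 0 means they do not overlap at all; detection benchmarks typically count a prediction as correct above a chosen IoU threshold.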

J. EfficientDet
EfficientDet [43] was developed after a careful study of neural network architecture design choices for object detection and proposes several key optimizations to improve efficiency. It introduces a weighted bi-directional feature pyramid network (BiFPN), which allows easy and fast multi-scale feature fusion, and a compound scaling method that uniformly scales the resolution, depth, and width of the backbone, the feature network, and the box/class prediction networks. Together these form the EfficientDet architecture, which accepts an input image size of 512 by 512.
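The weighted fusion used in BiFPN can be sketched as a normalized weighted sum of feature maps. The example below follows the "fast normalized fusion" form described in the EfficientDet paper, in which weights are clipped to be non-negative and divided by their sum plus a small epsilon; feature maps are flattened to plain lists, and the function name and epsilon default are assumptions of this illustration.

```python
from typing import List, Sequence


def fast_normalized_fusion(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-4,
) -> List[float]:
    """Fuse equally sized feature maps as sum(w_i * f_i) / (eps + sum(w_j))."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature map")
    clipped = [max(0.0, w) for w in weights]   # ReLU keeps every weight non-negative
    denom = eps + sum(clipped)
    size = len(features[0])
    return [
        sum(w * f[i] for w, f in zip(clipped, features)) / denom
        for i in range(size)
    ]


fused = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
```

With equal weights the result is very close to the plain average of the inputs; during training the weights are learned, so the network can emphasise the more informative resolution.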

K. ExtremeNet
ExtremeNet [44] is a bottom-up object detection framework that recognizes an object's four extreme points (top-most, left-most, bottom-most, and right-most). To identify extreme points, it predicts four multi-peak heatmaps for each object category using a keypoint estimation technique. Additionally, it predicts the object center, taken as the average of the two bounding box edges in each of the x and y dimensions, using one heatmap per category. A purely geometric method is used to group extreme points into objects: four extreme points, one from each map, are combined if and only if their geometric center is predicted in the center heatmap with a score greater than a predetermined threshold. All combinations of extreme-point predictions are enumerated, and the valid ones are chosen [44].
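The geometric grouping rule can be sketched for a single candidate combination of extreme points. This is a simplified illustration: points are (x, y) pairs, the center heatmap is a nested list indexed as `heatmap[row][col]`, and the function names are hypothetical rather than taken from the ExtremeNet code.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y)


def geometric_center(top: Point, left: Point, bottom: Point, right: Point) -> Point:
    """Centre of the box spanned by the four extreme points."""
    return ((left[0] + right[0]) / 2, (top[1] + bottom[1]) / 2)


def is_valid_group(
    top: Point, left: Point, bottom: Point, right: Point,
    center_heatmap: List[List[float]], threshold: float,
) -> bool:
    """Accept the combination only if the center heatmap scores above the threshold."""
    cx, cy = geometric_center(top, left, bottom, right)
    row, col = int(cy), int(cx)
    if not (0 <= row < len(center_heatmap) and 0 <= col < len(center_heatmap[0])):
        return False
    return center_heatmap[row][col] > threshold


# Illustrative 5x5 center heatmap with a single peak at row 2, column 2.
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[2][2] = 0.9
accepted = is_valid_group((2, 0), (0, 2), (2, 4), (4, 2), heatmap, 0.1)
```

A full detector would repeat this check for every combination of peaks drawn from the four extreme-point heatmaps and keep only the accepted ones.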

L. Mask-RCNN
Image classification and detection have seen major breakthroughs over the last few years. In 2017, with an inference time of 330 milliseconds per frame, Mask R-CNN [45] was the most successful real-time object detector on the MS COCO benchmark.

III. WASTE DETECTION AND CLASSIFICATION

A. WASTE DETECTION
Some researchers concentrate on detecting and reporting the presence of abandoned waste through real-time video stream analysis. The research by [46] uses an improved YOLOv3 network model to perform waste detection and recognition.
The network was fine-tuned using the dataset gathered for this purpose. The findings indicate that the proposed approach could make a significant contribution to more efficient waste management in smart cities.
The study in [47] examines a variety of deep learning algorithms for visually detecting trash in realistic underwater environments, with the goal of exploring, mapping, and extracting such debris using AUVs. A large, publicly available dataset of actual debris in open-water locations is annotated and used to train a variety of convolutional neural network architectures for object detection. The four algorithms tested are YOLOv2, Tiny-YOLO, Faster R-CNN with Inception v2, and SSD with MobileNet v2, which achieved mAP scores of 47.9, 31.6, 81, and 67.4, respectively.
The authors of [48] proposed an automatic trash detection system based on deep learning and the narrowband Internet of Things. The system detects and identifies decoration trash directly in the front-end embedded monitoring module and manages thousands of monitoring front ends via the narrowband Internet of Things and a background server. An improved YOLOv2 was used for the experiments.
The authors of [49] use a deep learning strategy to detect trash automatically. Fast R-CNN was the model trained, and a data fusion and augmentation strategy is proposed to improve the method's accuracy. The experiments show that the method generalizes well and detects with high precision.
Three different waste classes were used in the experiments reported by [50] using Fast R-CNN. On the overall classification of the trash images, the authors achieved a mean Average Precision (mAP) of 0.683.
A smartphone app called SpotGarbage was proposed and developed by [51]. It detects and coarsely segments garbage regions in a geo-tagged image taken by the user. To detect garbage in images, the app employs the proposed deep architecture of fully convolutional networks. The model was trained on the Garbage In Images (GINI) dataset and achieved a mean accuracy of 87.69%.
Floating trash also harms aquatic animals, either directly or through environmental contamination, and can easily lead to their death. Research by [52] proposes a method for detecting visible trash floating on the water surface of urban canals. The authors also provide a large dataset of trash in water channels, the first of its kind, with object-level annotations. A novel attention layer that improves the detection of smaller objects is proposed. In the same environment, [53] proposed AquaVision, a deep learning-based object detection model, trained on the AquaTrash dataset. With a mean Average Precision (mAP) of 0.8148, the proposed model detects and classifies various pollutants and hazardous waste items floating in the oceans and on the seashores. The proposed method localizes waste objects, which aids in the cleaning of water bodies and contributes to the environment by preserving the aquatic ecosystem.
Research by [54] proposes a garbage detection algorithm for underwater environments based on an enhanced version of the YOLOv5s algorithm. The feature extraction module of the YOLOv5s network is replaced by the lightweight MobileNetv3 network. The enhanced network is then pruned to cut down the number of redundant parameters and further compress the model. The experimental findings indicate that the approach's detection accuracy can reach 97.5% with one-ninth of the parameters of YOLOv5s, and that the real-time detection speed on the CPU is 2.5 times that of YOLOv5s. The authors of [55] developed and implemented an image-based detection system that can distinguish between various garbage cans for the sake of categorization.
Research conducted by [56] presented a strategy with the goal of reducing the costs associated with monitoring urban waste and better coordinating the data acquired with the essential information requirements of cities. The authors used cameras mounted on vehicles and a deep convolutional neural network model to quantify the amount of urban waste that accumulated along roadsides. The model was used to identify trash in the images that were captured. Using data collected along 84 road segments in two California cities, they compared the performance of three different models for trash detection, with the highest performing model (Mask R-CNN) obtaining 91% recall, 83% precision, and 77% accuracy.
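For readers comparing detectors on precision and recall, the sketch below shows how the two metrics are defined from detection counts and how they combine into an F1 score. The F1 value derived in the last line is not reported in the source; it is computed here purely as an illustration from the reported 83% precision and 91% recall.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of predicted trash detections that are correct."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual trash instances that were detected."""
    return true_positives / (true_positives + false_negatives)


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Derived for illustration only from the reported precision and recall.
f1 = f1_score(0.83, 0.91)  # approximately 0.868
```

The harmonic mean penalises a large gap between precision and recall, so it is a useful single number when the two are traded off against each other.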
Research by [48] proposes an automatic garbage detection system based on deep learning and the narrowband Internet of Things. The system detects and identifies decoration garbage directly in the front-end embedded monitoring module, and it manages thousands of monitoring front ends via the narrowband Internet of Things and the background server. The improved YOLOv2 network model is used in the system's front-end embedded module for garbage detection and recognition. To deal with low image resolution, [57] proposed a new lightweight feature fusion module as part of an improved single-shot multibox detector (SSD). In that study, the VGG16 backbone network was replaced with ResNet-101 to achieve more precise detection.
A Semi Smart Trash Separator to detect and classify garbage was proposed by [58]. It uses precycling techniques, assigning a barcode or QR code to each material so that materials can be separated according to the assigned code. A magnetic separator collects conductive metal, and the non-conductive materials are then classified according to their hardness. The material recognition accuracy rates obtained with AlexNet and GoogLeNet are 75% and 83%, respectively. The lightweight network GhostNet is used as the backbone of the detection network in [59], which detects trash outdoors in real time using robots. The network was trained on a dataset created by the researchers that contains four categories of items. The experimental findings show that the proposed improved YOLOv4 method has better detection performance than the original YOLOv4 algorithm and generalizes adequately across various sorts of trash. Similar waste detection research for robotics applications was conducted by [60].
YOLO-Green is a waste identification model proposed by [61]. The model was trained on a dataset acquired from real-world trash and categorized into seven of the most prevalent forms of solid waste. YOLO-Green reaches an mAP of 78.04% after training for only 100 epochs. A new lightweight waste identification system based on a modified version of the YOLOv5 algorithm was suggested by [62]. In addition, the researchers proposed two approaches, named tracking object transmission and video backtracking, together with a tracking algorithm based on a kernelized correlation filter. Table 2 summarizes some of the existing waste detection methods and models.

B. WASTE CLASSIFICATION
Many scholars have begun research in this field in the context of promoting waste sorting and recycling and its effect [63]. The research by [64] uses the Trashnet dataset, which consists of six classes of trash objects, for trash image classification. Support vector machines (SVM) with scale-invariant feature transform (SIFT) features and a convolutional neural network (CNN) were used as models. In their experiment, the SVM outperformed the CNN; however, the CNN was not trained to its full potential due to difficulties in determining optimal hyperparameters. With a 70/30 train/test data split, the SVM achieved a test accuracy of 63%, while the neural network achieved 27%.
The RecycleNet model developed by [32] is a carefully optimized deep convolutional neural network architecture for the classification of selected recyclable object classes. The Trashnet dataset was also used, and many deep learning models were tested for waste classification, both with saved model weights and with training from scratch.
Deep learning models can be hybridized to improve the accuracy of object classification. The study by [65] uses 5000 images with a resolution of 640 by 480 pixels and a plain grey background, with items positioned in both fixed and random orientations. When the investigated items have strong image features, both the Multilayer Hybrid System (MHS) and the CNN perform well. The CNN, on the other hand, performs poorly when waste items lack distinguishing image features, particularly ''other'' waste. Under the two testing scenarios, MHS achieves significantly higher classification performance: the overall accuracies are 98.2 percent and 91.6 percent, respectively, compared with 87.7 percent and 80.0 percent for the reference model.
As trash can belong to different environments, [66] propose a deep learning approach for medical waste identification and classification. The authors propose ResNeXt, a deep learning-based classification method that was applied to 3480 images and successfully identified 8 types of medical waste with an accuracy of 97.2 percent; the average F1-score of five-fold cross-validation was 97. Most of the trash classification models focus on a single object in an image. A paper by [68] attempts to identify a single trash object in an image and classify it into one of the recycling categories. Support vector machines (SVM) with HOG features, simple convolutional neural networks (CNN), and CNN with residual blocks are among the models used. According to the results of the evaluation, they conclude that simple CNN networks with or without residual blocks perform well. Besides a single object detection, a single trash class was investigated by [69]. Different types of waste necessitate different management techniques; thus, proper waste segregation according to type is essential to facilitate proper recycling. The current method of segregation still relies on manual hand-picking. In the paper [70], a method for classifying wastes using images into six different waste types (glass, metal, paper, plastic, cardboard, and others) based on deep learning and computer vision concepts is proposed. For waste classification, a multiple-layered Convolutional Neural Network (CNN) model, specifically the well-known Inception-v3 model, was used, with a trained dataset obtained from online sources. The proposed method achieves a high classification accuracy of 92.5%. A model that can realize intelligent decision-making for garbage categorization for big data in the scene in a complicated system is proposed in [71]. This model also includes certain conditions for promotion and landing. 
The test results indicate that the proposed model achieves higher detection and classification accuracy than the original YOLOv5 model, and that its real-time performance meets the requirements of practical applications. The study in [72] proposes an algorithm based on InceptionV3 networks and tests it on a large-scale garbage classification dataset. Transfer learning was used, and the dataset was split into a training set (80%), a validation set (10%), and a test set (10%). The accuracy of the model was 93.125%.
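The 80/10/10 partition reported for [72] can be reproduced with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `split_dataset`, the fixed seed, and the placeholder file names are assumptions made for illustration.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle items and split them into train, validation, and test lists."""
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder becomes the test set
    return train, val, test


# Hypothetical image list; a real dataset would supply its own file paths.
image_paths = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(image_paths)
print(len(train), len(val), len(test))
```

For imbalanced waste datasets, a stratified variant that applies the same fractions within each class may be preferable; the source does not state which variant was used.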
In [73], the authors present a novel garbage image recognition model called Garbage Classification Net (GCNet), which is based on transfer learning and model fusion. After features are extracted from the trash images, the EfficientNetv2, Vision Transformer, and DenseNet models are successively integrated to form GCNet. The dataset is expanded by data augmentation, and the expanded dataset contains 41,650 garbage images. The authors of [74] construct a deep CNN tailored to garbage image classification. They propose an attention module called DSCAM, which offers an original way to compute attention weights. Several other classification models, including VGG16, Xception, MobileNet-V3, and GNet, were also evaluated, and the proposed DSCAM model achieved the highest accuracy, 98.9%.
A deep neural network model for garbage classification, named DNN-TC, was developed by [75]. It builds on the ResNeXt model and is designed to improve predictive performance. The authors modified the original ResNeXt-101 model by adding two fully connected layers after the global average pooling layer, with output dimensions of 1,024 and N (the number of classes), respectively, in order to reduce redundancy in the model. The model was evaluated on both the VN-trash and TrashNet datasets. [76] proposes AlphaTrash, a machine that can be fitted to a conventional curbside trash can to sort deposited trash automatically. Using a pre-trained convolutional neural network (Inception-v1), the machine classifies trash with an accuracy of 94% and takes 4.2 seconds per classification.
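The classification head described for DNN-TC [75] (global average pooling followed by two fully connected layers) can be illustrated with a toy forward pass. This is a sketch under stated assumptions rather than the published model: the ReLU between the two layers, the random weights, and the small layer sizes are all chosen here for illustration, and every function name is hypothetical.

```python
import math
import random


def global_average_pool(feature_maps):
    """Reduce C feature maps of size H x W to a length-C vector of channel means."""
    return [sum(map(sum, channel)) / (len(channel) * len(channel[0]))
            for channel in feature_maps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(logits):
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_dense(rng, in_dim, out_dim):
    """Random weights stand in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def classify(feature_maps, fc1, fc2):
    pooled = global_average_pool(feature_maps)
    hidden = relu(dense(pooled, *fc1))   # activation choice is an assumption
    return softmax(dense(hidden, *fc2))  # one probability per waste class


rng = random.Random(0)
# Toy sizes for readability; the text above reports 1,024 hidden units and N classes.
CHANNELS, HIDDEN, NUM_CLASSES = 8, 4, 6
feature_maps = [[[rng.random() for _ in range(3)] for _ in range(3)]
                for _ in range(CHANNELS)]
fc1 = init_dense(rng, CHANNELS, HIDDEN)
fc2 = init_dense(rng, HIDDEN, NUM_CLASSES)
probabilities = classify(feature_maps, fc1, fc2)
print(probabilities)
```

In practice the feature maps would come from the convolutional backbone, and the two layers would be trained jointly with it.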
Because trash data are scarce, [77] use a two-stage variational autoencoder (VAE) together with a binary classifier to generate additional data (augmentation). The effect of the augmentation procedure is evaluated with a multi-class classifier, by measuring how well an object detector performs when trained on a mixture of real and synthetic trash images. [78] focuses on classifying garbage using metadata and evaluates several deep learning models, such as VGG16, ResNet50, and DenseNet169, against the recently developed ThanosNet, which achieved an accuracy of 94%. Further research addresses trash image classification on different devices, such as robots [79], or relies purely on CNNs and reports low accuracy, as in [80] and [81], using different benchmark datasets. Table 3 summarizes the research on trash classification.

IV. BENCHMARK TRASH DATASET
Trash detection is one of the trending topics in object detection within computer vision. By helping to keep polluted environments under control, such systems can improve the health and quality of life of living things. However, before an automated waste detector or classifier is made available to end users, it is vital to assess its efficacy on benchmark datasets in experimental scenarios, especially in the wild. To this end, the scientific community has created and released a number of datasets for waste detection and classification. Selecting which of them to use for evaluation, and which methods are best suited to a specific environment or waste class, is a difficult challenge and is essential to moving this field forward.
The detection, classification, and segmentation of waste using deep learning have been the subject of numerous recent studies. A few available datasets have been used to classify litter or trash images according to typical waste types. Table 4 summarizes the available datasets. Figure 10 and Figure 11 show image samples from the TrashNet dataset and images collected in the wild, respectively. Figure 11 illustrates the challenges posed by the presence of other objects and by unstable backgrounds, which hinder the ability of deep learning models to generalize in trash detection and classification tasks. In Figure 10, by contrast, all images have a clear background, so a model trained only on such images is unlikely to perform well in real-life settings.
A. TRASHNET DATASET [64]
The dataset includes six categories: glass, paper, cardboard, plastic, metal, and trash. It comprises 2,527 photos, each labeled with one category (501 glass, 594 paper, 403 cardboard, 482 plastic, 410 metal, and 137 trash). The images were taken against a white backdrop under various exposure and lighting settings, mostly with one object per photo. The authors investigate SVM and CNN algorithms for sorting waste into the six recycling categories. The CNN uses an architecture resembling AlexNet but with fewer and smaller filters. The SVM outperformed the neural network: with a 70/30 train/test split, the SVM attained a test accuracy of 63 percent, whereas the neural network reached 27%.
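The per-class counts listed for TrashNet make its class imbalance easy to quantify. The sketch below computes inverse-frequency class weights, one common way to compensate for imbalance during training; this weighting scheme is an illustrative choice and is not reported in [64], and the function name is hypothetical.

```python
# Per-class image counts as listed above for TrashNet.
TRASHNET_COUNTS = {
    "glass": 501,
    "paper": 594,
    "cardboard": 403,
    "plastic": 482,
    "metal": 410,
    "trash": 137,
}


def inverse_frequency_weights(counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = inverse_frequency_weights(TRASHNET_COUNTS)
imbalance_ratio = max(TRASHNET_COUNTS.values()) / min(TRASHNET_COUNTS.values())

for name, weight in weights.items():
    print(f"{name:<10} n={TRASHNET_COUNTS[name]:<4} weight={weight:.3f}")
print(f"largest / smallest class: {imbalance_ratio:.2f}")
```

With these weights, every class contributes the same total weight to a weighted loss, so the small trash class is not dominated by the larger classes.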
Garbage in Images (GINI) Dataset [51]
The Garbage in Images (GINI) collection consists of 2,561 images of unknown resolution, 1,496 of which are annotated with bounding boxes (one class: trash). The authors created the dataset using the Bing Image Search API. They use a pre-trained AlexNet, and their method focuses on segmenting a pile of trash in an image without providing any information about the kinds of waste contained in that segment. Their approach, which is based on extracting image patches and aggregating predictions, cannot capture the finer details of object boundaries. GarbNet achieved an accuracy of 87.69% on the garbage detection task; however, it made incorrect predictions when an image contained waste-like objects or when the waste was far away.
B. TrashICRA19 Dataset [47]
This dataset was derived from the J-EDI marine debris dataset. The videos that make up J-EDI vary widely in quality, depth, objects in the scene, and the cameras used. They contain images of many distinct types of marine debris, captured in real-world environments, and show objects in various states of decay, occlusion, and overgrowth. Additionally, the clarity of the water and the quality of the light vary greatly from one video to the next. From these videos, 5,700 images were extracted to form the dataset, all annotated with bounding boxes on instances of trash, biological objects such as plants and animals, and ROVs. The ultimate objective is to develop efficient and accurate trash detection methods suitable for onboard robot deployment.
C. TACO [82]
Trash Annotations in Context (TACO) is an open image dataset for litter detection and segmentation that is growing through crowdsourcing. TACO contains high-resolution images taken mostly with mobile phones. The dataset consists of 1,500 annotated images with approximately 5,000 objects. All trash is assigned to one of 60 categories, which are grouped into 28 super (top) categories, including Unlabeled litter for objects that are difficult to identify or heavily occluded. The annotations are provided at the instance segmentation level in the well-known COCO format [33], with an additional background label: Trash, Vegetation, Sand, Water, Indoor, or Pavement. Additionally, TACO provides about 3,000 further images annotated at the detection level, resulting in a total of over 14,000 instances. A major benefit of TACO is its wide range of litter types and its considerable diversity of backgrounds, from tropical beaches to London streets. However, because the dataset is crowdsourced, labels may contain some user-induced bias and inaccuracies, and not all objects in TACO can be strictly classified as litter, since their category often depends on context.
D. UAVVaste [83]
The public UAVVaste dataset currently has 772 images and 3,718 annotations and is expected to be updated. The primary motivation for developing this dataset was a lack of domain-specific data. The image set is therefore recommended not only for benchmarking object detection, but also for building solutions related to UAVs, remote sensing, and environmental cleaning.
E. WADABA [84]
The WADABA dataset contains images of plastic waste collected from households. The plan was to capture at least 100 objects, with each object photographed forty times under various conditions. Two types of lighting were used: fluorescent lamps and LEDs. The image settings are as follows: 1920 × 1277 pixels, 300 dpi resolution, RGB 24-bit colour palette, and JPG file format.
F. GLASSENSE-VISION [85]
The Glassense-Vision dataset is a collection of images of different objects. It contains 505 photos representing several object types (banknotes, cereals, medications, cans, tomato sauce, water bottles, and deodorant sticks). All images have been manually annotated. The use cases (object categories) can be grouped into three geometrical types: flat objects, boxes, and cylinders. All photos were saved at a resolution of 665 × 1182 pixels.
G. MJU-Waste [86]
This dataset was created by collecting waste items from a university campus, bringing them to a lab, and photographing people holding the waste objects. All images were taken by the dataset's authors with a Microsoft Kinect RGBD camera. The current version, MJU-Waste V1, contains 2,475 co-registered RGB and depth image pairs. The dataset is divided into a training set, a validation set, and a test set of 1,485, 248, and 742 images, respectively. Due to sensor limitations, the depth frames contain missing values at reflective surfaces, occlusion boundaries, and distant regions. To obtain high-quality depth images, a median filter was used to fill in the missing values. MJU-Waste annotates each image with a pixel-wise mask of the waste objects.
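The median-filter hole filling mentioned for the MJU-Waste depth frames can be sketched as follows. This is an illustration of the general technique, not the procedure used in [86]: the 3 × 3 window, the use of 0 as the missing-value marker, the repeated passes, and the function name are all assumptions.

```python
from statistics import median

MISSING = 0  # assumption: the sensor reports 0 where no depth was measured


def fill_depth_holes(depth, window=3, max_passes=10):
    """Replace missing pixels with the median of the valid pixels around them."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    radius = window // 2
    height, width = len(depth), len(depth[0])
    filled = [row[:] for row in depth]
    for _ in range(max_passes):
        holes = [(r, c) for r in range(height) for c in range(width)
                 if filled[r][c] == MISSING]
        if not holes:
            break
        updated = [row[:] for row in filled]
        changed = False
        for r, c in holes:
            # Collect valid depth values inside the window, clipped at the borders.
            neighbours = [filled[rr][cc]
                          for rr in range(max(0, r - radius), min(height, r + radius + 1))
                          for cc in range(max(0, c - radius), min(width, c + radius + 1))
                          if filled[rr][cc] != MISSING]
            if neighbours:
                updated[r][c] = median(neighbours)
                changed = True
        filled = updated
        if not changed:  # remaining holes have no valid neighbours at all
            break
    return filled


# Toy depth frame in millimetres with three missing measurements.
depth_mm = [
    [800, 810, 0, 830],
    [805, 0, 0, 835],
    [810, 820, 830, 840],
]
filled = fill_depth_holes(depth_mm)
for row in filled:
    print(row)
```

Only missing pixels are modified, so valid measurements are preserved; larger holes are closed from their edges inward over successive passes.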
H. OPEN LITTER MAP [87]
Open Litter Map is a free, public, crowdsourced dataset of over 100k images taken with phone cameras. Each image is accompanied by metadata such as the type of litter shown, the coordinates, the timestamp, and the phone model. The images come from all over the world and were captured by many different individuals, so they differ considerably from one another.
I. WASTE PICTURES [88]
Waste Pictures contains nearly 24,000 waste images collected from Google searches and divided into 34 classes. The images are highly varied and even include X-rays and drawings of trash. Image sizes also vary greatly, although the majority are smaller than 2000 × 2000 pixels. Because of their provenance, the images should be carefully examined before being used in a classification task.
J. Wade-AI [99]
The Wade-AI dataset provides images of trash in natural settings, sourced from Google Street View. It has 2,200 manually labelled instance mask annotations on around 1,400 images in COCO format, all belonging to a single class, garbage. The environment and size of the images depend on their source. The majority of images are smaller than 1000 × 1000 pixels.
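Both TACO and Wade-AI are described above as using COCO-format annotations. The sketch below shows the general layout of such a file and how instances per category can be counted. The file content is made up for illustration and is not taken from either dataset; the helper names are hypothetical. The only format detail relied on is that a COCO `bbox` is stored as `[x, y, width, height]`.

```python
import json
from collections import Counter

# A tiny, made-up annotation file in COCO layout (not real dataset content).
coco_text = json.dumps({
    "images": [
        {"id": 1, "file_name": "street_001.jpg", "width": 640, "height": 480},
        {"id": 2, "file_name": "street_002.jpg", "width": 640, "height": 480},
    ],
    "categories": [
        {"id": 1, "name": "garbage", "supercategory": "litter"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "bbox": [100, 120, 50, 40], "area": 2000,
         "segmentation": [[100, 120, 150, 120, 150, 160, 100, 160]]},
        {"id": 11, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "bbox": [300, 200, 20, 30], "area": 600,
         "segmentation": [[300, 200, 320, 200, 320, 230, 300, 230]]},
        {"id": 12, "image_id": 2, "category_id": 1, "iscrowd": 0,
         "bbox": [10, 10, 80, 60], "area": 4800,
         "segmentation": [[10, 10, 90, 10, 90, 70, 10, 70]]},
    ],
})


def instances_per_category(coco):
    """Count annotated instances for each category name."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    return Counter(names[a["category_id"]] for a in coco["annotations"])


def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return x, y, x + w, y + h


coco = json.loads(coco_text)
counts = instances_per_category(coco)
print(counts)
print(bbox_to_corners(coco["annotations"][0]["bbox"]))
```

The same counting function applies unchanged to multi-class files such as TACO, where it exposes the per-category imbalance.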
K. NWNU-TRASH [100]
A recyclable waste image dataset named NWNU-TRASH was created using web crawling (implemented in Python) and manual photography. It includes waste glass (3,845), waste fabric (3,862), waste paper (3,766), waste plastic (3,865), and waste metal (3,573), for a total of 18,911 images. The images use a variety of backgrounds, and the classes are balanced in size and diverse in content, which better matches real-world conditions.
L. CLASSIFY-WASTE [101]
The classify-waste dataset includes over 21,000 waste instances from Extended TACO, drinking-waste, Waste Pictures, Google search, TrashNet, and Places. Most instances fall into the metals and plastics category or the unknown category, a distribution that is close to that of the waste types produced by humans. Nonetheless, the dataset contains a diverse set of trash, which supports the generalizability of models trained on it. The dataset has eight labels: bio waste (fruit, vegetables, herbs, used paper towels, and tissues); glass (glass bottles, jars, and cosmetic packaging); metals and plastics (scrap metal and non-ferrous metal, beverage cans, plastic beverage bottles, plastic shards, plastic food packaging, and plastic straws); non-recyclable waste (disposable diapers, string, polystyrene packaging, polystyrene elements, blankets, clothing, and used paper cups); other waste (construction and demolition waste, large-sized waste such as tyres, used electronics and household appliances, batteries, paint and varnish cans, and expired medicines); paper (paper, cardboard packaging, receipts, newspapers, catalogs, and books); unknown waste (highly decomposed and difficult-to-identify litter); and an extra background label for images with no litter (for example, a sidewalk, a forest trail, or a lawn).
M. CIGARETTE BUTT DATASET [97]
This dataset contains 2,200 synthetically generated images of cigarette butts on the ground. It is intended for training convolutional neural networks (CNNs). The images were generated automatically with custom code that used the Python Imaging Library to apply random scale, rotation, brightness, and other transformations to the foreground cutouts. The source photos were taken with an iPhone 8 at an original resolution of 3024 × 4032 pixels.
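The synthetic-generation idea behind the cigarette butt dataset (a foreground cutout pasted onto a background after random scale, rotation, and brightness changes) can be sketched without any imaging library. The code below is a simplified stand-in and not the dataset's actual generator, which is reported to use the Python Imaging Library: images are plain grayscale lists, rotation is limited to multiples of 90 degrees, and all function names and parameter ranges are assumptions.

```python
import random


def rotate90(patch, times):
    """Rotate a 2-D list clockwise by 90 degrees, `times` times."""
    for _ in range(times % 4):
        patch = [list(row) for row in zip(*patch[::-1])]
    return patch


def scale_nearest(patch, factor):
    """Resize a 2-D list with nearest-neighbour sampling."""
    height, width = len(patch), len(patch[0])
    new_h = max(1, round(height * factor))
    new_w = max(1, round(width * factor))
    return [[patch[min(height - 1, int(r / factor))][min(width - 1, int(c / factor))]
             for c in range(new_w)]
            for r in range(new_h)]


def composite(background, cutout, rng):
    """Paste a randomly transformed cutout; None pixels are treated as transparent.

    Returns the new image and the pasted region as (left, top, width, height),
    which can serve as an automatically generated bounding-box label.
    """
    patch = rotate90(cutout, rng.randrange(4))           # random rotation
    patch = scale_nearest(patch, rng.uniform(0.5, 2.0))  # random scale
    gain = rng.uniform(0.7, 1.3)                         # random brightness
    patch_h, patch_w = len(patch), len(patch[0])
    top = rng.randrange(len(background) - patch_h + 1)
    left = rng.randrange(len(background[0]) - patch_w + 1)
    image = [row[:] for row in background]
    for r in range(patch_h):
        for c in range(patch_w):
            value = patch[r][c]
            if value is not None:
                image[top + r][left + c] = min(255, round(value * gain))
    return image, (left, top, patch_w, patch_h)


# Toy data: a uniform 16 x 16 background and a 2 x 3 cutout with one transparent pixel.
background = [[50] * 16 for _ in range(16)]
cutout = [[None, 200, 200],
          [200, 200, 200]]
image, bbox = composite(background, cutout, random.Random(0))
print(bbox)
```

Because the paste position is known, every synthetic image comes with a free annotation, which is the main appeal of this approach when labelled trash data are scarce.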
There are currently few open waste datasets, and TrashNet is the most widely used. It is a small collection of 2,527 images of recyclable waste, including glass, paper, cardboard, plastic, and metal. Most current research on image-based waste classification uses the TrashNet dataset and reports high classification accuracy on it. This dataset, however, has some flaws: 1. The amount of sample data is insufficient; 2. The samples are unevenly distributed across waste types; 3. The image backgrounds are uniform and clean, which does not reflect real scenes and is detrimental to the generalization ability of trained models; and 4. The number of distinct items is insufficient to represent the majority of objects in a community or domain.

V. CHALLENGES OF WASTE DETECTION AND CLASSIFICATION
Deep learning-based models have emerged as an extremely powerful framework for dealing with various types of vision problems, such as image classification [26], object detection [26], [102], and, more relevantly, single object tracking. Despite these contributions, challenges remain in trash or garbage detection and classification.

A. LIGHTING CONDITION
Illumination changes make the problem more complex: different lighting conditions change the visibility of an object or alter its appearance, which makes recognition considerably more difficult.
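The effect of an illumination change on pixel values can be simulated with gamma correction, which is also a common way to augment training images with different lighting. The sketch below is illustrative only and is not a method evaluated in this survey; the function name and the toy image are assumptions.

```python
def adjust_gamma(image, gamma):
    """Apply out = 255 * (in / 255) ** gamma to an 8-bit grayscale image.

    The image is a list of rows of integers in [0, 255]. A gamma above 1
    darkens the image and a gamma below 1 brightens it.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the mapping once for all 256 possible intensities.
    lookup = [round(255 * (value / 255) ** gamma) for value in range(256)]
    return [[lookup[pixel] for pixel in row] for row in image]


image = [[0, 64, 128],
         [192, 255, 32]]
darker = adjust_gamma(image, 2.0)    # simulates a dimly lit scene
brighter = adjust_gamma(image, 0.5)  # simulates a brightly lit scene
print(darker)
print(brighter)
```

Pure black and pure white are unchanged by the mapping, while mid-tones shift the most, which is where object texture is usually found.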

B. INSUFFICIENT DATA
A lack of available trash data is a major obstacle to the implementation of AI systems. AI models depend on large datasets for training and calibration. Current research is frequently hampered by missing or inadequate waste data. This is partly because waste management industries are often outdated, with few reliable records and scarce sensor data, particularly in developing countries.

C. OBJECT SIZE AND LOCATION
Objects in low visibility may not be visible enough to be recognized. The system may fail if the object is too small or too far from the camera. Varying lighting conditions and shadows can also make objects difficult to identify.
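One common way to make "too small" concrete is the size convention used in COCO-style evaluation, where objects with an area below 32² pixels are treated as small, those up to 96² pixels as medium, and the rest as large. The sketch below applies these thresholds to bounding boxes. Using box area instead of mask area, the handling of values exactly on a threshold, and the example boxes are simplifying assumptions.

```python
SMALL_MAX_AREA = 32 ** 2   # COCO-style threshold between small and medium
MEDIUM_MAX_AREA = 96 ** 2  # COCO-style threshold between medium and large


def size_bucket(width, height):
    """Assign a box to a size bucket from its pixel area (an approximation)."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area <= MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def bucket_shares(boxes):
    """Return the fraction of (width, height) boxes in each size bucket."""
    shares = {"small": 0, "medium": 0, "large": 0}
    for width, height in boxes:
        shares[size_bucket(width, height)] += 1
    return {name: count / len(boxes) for name, count in shares.items()}


# Hypothetical litter boxes (width, height) in pixels.
boxes = [(20, 20), (12, 30), (50, 50), (200, 150)]
print(bucket_shares(boxes))
```

Reporting accuracy separately per bucket shows whether a detector's failures are concentrated on small or distant litter.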

D. OBJECT LOCALIZATION OR IN THE WILD
Image classification only determines the class of an image and cannot predict where an object is located within it. This is a major limitation for detecting and identifying objects, especially in images taken in the wild, where other objects and unstable backgrounds are present.

E. OCCLUSION OR TRASH IN THE WILD
Some objects are partly or mostly hidden by other objects in the image, which can severely reduce recognition accuracy.
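The degree of occlusion can be approximated from bounding boxes as the share of the target box that is covered by another object's box. The following sketch assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) and assumes that the second object lies in front of the target; the function names and the example coordinates are hypothetical.

```python
def intersection_area(a, b):
    """Area of overlap between two (x_min, y_min, x_max, y_max) boxes."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def occluded_fraction(target, occluder):
    """Fraction of the target box covered by the occluder box (0.0 to 1.0)."""
    target_area = (target[2] - target[0]) * (target[3] - target[1])
    if target_area <= 0:
        raise ValueError("target box must have positive area")
    return intersection_area(target, occluder) / target_area


# Hypothetical example: a bottle partly covered by a leaf.
bottle = (10, 10, 50, 90)
leaf = (30, 40, 80, 100)
print(occluded_fraction(bottle, leaf))
```

Because boxes are coarser than object outlines, this measure tends to overestimate occlusion; pixel-wise masks give a tighter estimate when they are available.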

VI. CONCLUSION
In conclusion, this paper discusses a large number of research papers on deep learning for trash detection and classification, as well as object recognition, with a primary emphasis on the most recently published articles in the field. The papers used for this survey are listed in the references. They were collected from reputable and reliable sources such as IEEE, Scopus, Google Scholar, and Springer, among others. The purpose of this survey is to investigate the many uses of deep learning for recognizing and classifying waste. The study provides an orderly and comprehensive assessment of the available methods for detecting and classifying garbage using machine learning and deep learning. In addition, it describes benchmark datasets for trash detection and classification in a variety of settings. To support the study, the benefits and drawbacks of the existing methods and datasets, as well as the possibilities for future research, are highlighted. In future work, we plan to perform a systematic literature review on this subject and to experiment with different machine learning and deep learning algorithms.