ScrapNet: An Efficient Approach to Trash Classification

As people have become more aware of how their actions affect their surroundings, they have realized the dire state of the environment, and the recycling movement has gained momentum as a result. Yet the recycling industry has not seen a major shift, and the problems that existed decades ago persist. Trash classification lies at the core of these problems: waste that cannot be classified cannot be recycled. Manual classification often leads to misclassification, as humans judge based on their experience and knowledge rather than absolute criteria. Additionally, if the waste to be sorted is toxic, direct contact may physically harm the people involved. Until this problem is solved, the recycling industry will not keep pace with the rise in recycling culture. This is the problem we have set out to solve: this paper proposes a Deep Learning model based on the EfficientNet architecture that can classify different kinds of trash with an accuracy high enough to make it a viable solution for the industry, while using fewer parameters than existing methods. We achieved an accuracy of 98% on the TrashNet dataset, the standard dataset for trash classification, outperforming all existing models. Additionally, as no large dataset with a varied set of trash images existed, we created a new dataset of 8135 images by combining and standardizing various datasets, achieving a classification accuracy of 92.87% with EfficientNet B3.


I. INTRODUCTION
Annually, 2.01 billion tonnes of municipal solid waste is generated, and at least 33 percent, a conservative estimate by any means, is not managed correctly. The daily waste generated per person averages 0.74 kilograms but ranges from 0.11 to 4.54 kilograms. Even though high-income countries account for only 16% of the world's population, they account for 34% of the waste. Waste composition also differs across income levels, reflecting consumption trends. High-income countries generate more dry waste that is easily recyclable, like paper, plastic, cardboard, metal, and glass. On the opposite end of the spectrum, low-income countries generate more food and green waste and just 20% recyclable waste. There is a clear inverse relationship between organic waste produced and the level of economic development [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Yudong Zhang .
According to [11], municipal solid waste is expected to rise to 3.4 billion tonnes annually by 2050. Waste collection, followed by recycling, combustion, or landfilling, is crucial for managing this rising threat. Still, only 48% of waste is collected in low-income countries, dropping to 26% outside urban areas. One of the main constituents of waste is plastic. Since the rise of plastic waste in the 1950s, only 9% has been recycled and 12% has been incinerated, leaving 79% of it in the environment. The rapid growth in plastic manufacturing has outpaced the rise of nearly every other man-made material. For comparison, half of all steel produced is used in construction, while half of all plastic manufactured becomes trash in less than a year.
A solution for this excessive waste is the need of the hour, and the long-term solution is recycling. As shown in Figure 1, even though recycling has been on a rising trend, only 25.6% of the municipal solid waste generated in the U.S. in 2015 was recycled [9]. The recycling industry faces several hurdles, which begin right at the source, i.e., the household. People often do not separate their waste, which makes it a challenge to separate the recyclable material at the plant. This leaves a majority of the waste in landfills, further aggravating the situation and increasing pollution.
Even if a substantial part of this waste reaches recycling plants, the plants are neither accurate nor large enough to efficiently recycle the waste. They are heavily dependent on manual labor riddled with problems like lack of training, poor information retention, excessive downtime, and inability to distinguish materials, which slows down the throughput of the plant [12]. Globally, less than 14% of the total waste is recycled, and almost one-fourth of the waste is thrown into uncontrolled landfills (Figure 2). It is estimated that 1.6 billion tonnes of carbon dioxide (5% of global emissions) was generated from disposing of waste in open dumps and uncontrolled landfills. This number is expected to rise to 2.38 billion tonnes by 2050 if nothing changes.
The current COVID-19 pandemic has only worsened the situation with an astronomical rise in medical waste. With advances in technology, techniques like Deep Learning have gained enough robustness to be implemented in real-life situations. Many papers have been proposed to solve the problem of waste classification, but the majority of the experiments have been done on the TrashNet dataset. Albeit the most frequently used waste dataset, it is small in size and variety. The other datasets currently used to classify waste are too specific in their use cases to be used in the industry on a large scale. Another drawback of the majority of existing research is that the highest accuracies were achieved through large networks like ResNet, DenseNet, and InceptionNet, which would be harder to implement on smaller devices. Evidently, there is a lot of scope for improvement in this field. This paper presents a modern approach, one that implements and compares the latest Deep Learning neural networks to aid the struggling recycling industry. We suggest a deep neural network for object detection in multi-object scenes to separate trash, and then a different deep neural network that classifies trash based on material and composition into seven categories. Finally, we suggest using a similar network to further sub-classify the plastic category as recyclable or non-recyclable based on popular guidelines. Our major contributions through this paper are as follows:
1) Creation of a new diverse dataset, larger than any pre-existing waste dataset, which can be used as the new standard for waste classification.
2) Comparison and analysis of various deep learning architectures for object detection on the novel dataset to achieve superior accuracy.
3) Modification of the deep neural network used on our data to classify images of the TrashNet dataset and achieve the highest accuracy yet.
4) Reclassification of plastic objects, using a deep neural network, as recyclable or non-recyclable based on UK recycling guidelines.
The rest of the paper is organized as follows: Section 2 covers previous related work in waste classification through machine learning and deep learning approaches; Section 3 covers the creation of a new dataset called ScrapNet and its distribution; Section 4 covers the methodology, the architectures, and the experiments performed, while Section 5 discusses the results of these experiments. Section 6 concludes the paper while comparing the accuracy attained on TrashNet.

II. RELATED WORKS
Even though this field is still quite undeveloped, with considerably little research on the use of advanced technology like deep learning, the following studies have been conducted and are relevant to our research: • Tan et al. [19] proposed a weighted Bi-directional Feature Pyramid Network (BiFPN) along with a compound scaling method to improve the efficiency of object detection networks. Based on this, they developed a new family of object detectors called EfficientDet, which achieves much higher efficiency than existing networks across a wide range of constraints.
• Yang and Thung [24] collected images of single pieces of recycling or garbage and classified them into six classes: glass, paper, metal, plastic, cardboard, and trash. The data was hand-collected and consisted of 400-500 images per class. An SVM with SIFT features and a CNN were implemented and compared for this classification task, achieving accuracies of 63% and 22%, respectively.
• In another study, Chu et al. [6] proposed a Multi-layer Hybrid deep-learning System (MHS) to sort waste automatically. A high-resolution camera with sensors was used to extract image features and other features. The MHS used a CNN and a multi-layer perceptron (MLP) to combine the image with features and classify wastes as recyclable or not. They achieved an accuracy higher than 90% under two different testing scenarios.
• Adedeji and Wang [1] proposed an intelligent waste classification system that uses a ResNet-50 CNN to extract features and an SVM to classify them into different categories such as glass, metal, and paper. The trash image dataset created by [24] was used, and an accuracy of 87% was achieved.
• A more targeted approach was taken by Sousa et al. [15]. They compared several deep learning architectures by classifying images of the TrashNet dataset into different categories, and implemented a combined Inception-ResNet model that outperformed the others with an accuracy of 88.6%.
• Several deep CNN architectures were implemented and compared by Bircanoglu et al. [4] to find the best one. They compared several models without pre-trained weights and found that Inception-ResNet and Inception-v4 outperformed other networks with a 90% accuracy. Further, they also implemented transfer learning and fine-tuning of weight parameters using ImageNet, achieving an accuracy of 95% with DenseNet121.
• Aral et al. [3] tested several deep learning models on the TrashNet dataset, including the DenseNet121, DenseNet169, MobileNet, InceptionResNetV2, and Xception architectures. The accuracy achieved by DenseNet121 was 95%, followed closely by InceptionResNetV2, which achieved 94% after fine-tuning. The related works are summarized in Table 1, which also highlights their drawbacks along with how this paper aims to address them.

III. DATA
Most research in this domain is done either on datasets that are very particular, like the Labelled Waste in the Wild [17] dataset, which had images of left-over waste on food trays; on data simulated and captured in a controlled environment with limited variety, like the dataset used in [6]; or on datasets with limited classification categories, like the VN-trash dataset [20]. As a result, trash classification models are rarely generalized and only distinguish a specific kind of waste; they do not represent the actual waste that requires sorting. To overcome this problem, we combined multiple datasets so that they cover a broad range of categories and better simulate real-life waste. We unified and standardized multiple datasets into one through filtering and extraction of images, creating a single dataset which for convenience we refer to as the ScrapNet dataset. This dataset is used in all our experiments along with TrashNet, to provide a comparison. We hope to make the mapping, extraction, and standardization process used in building the ScrapNet dataset available to the scientific community for further research. All the datasets used are listed in Table 2 along with their number of images and classification categories.
All images from TrashNet [24] and OpenRecycle [7] were taken as-is and filtered manually at the annotators' discretion to remove erroneous and unrecognizable images. The images from TACO [14] contained several objects per image; these objects were separated through bounding boxes and then extracted to create several images. During data creation, it was noticed that images were scarce in the compost category; to deal with this, 850 representative compost images were extracted from the Waste Classification data [16] available on Kaggle.
To account for the discrepancy in the classification categories, the images were classified by two annotators with domain knowledge, with an overlap rate of 99% to ensure proper classification. The segregated images accounted for the material of the objects and the possibility of recycling them, as the classification of trash does not have clear boundaries. For example, one might assume that paper cups belong in the paper category, but for recycling purposes they do not: their thin lining of plastic, which prevents water from leaking, also prevents them from being recycled.
Images in the following six categories: Plastic, Metal, Glass, Paper, Cardboard, and Compost, are classified as belonging to the particular material class and recognized as recyclable. On the other hand, objects in the Trash category are either in poor condition or made of a combination of materials, which renders them unrecyclable through most simple processes. The distribution of data is shown in Figure 3.
Lastly, while these four datasets combined had only 7179 images distributed over varying sets of categories, ScrapNet has a total of 8135 images distributed over seven categories. This increase of 956 images was achieved through multiple-object extraction from the TACO dataset. The data distribution is shown in Table 3.

IV. METHODS
In this paper, we experimented with, compared, and analyzed the implementation of different CNN architectures on our dataset. Experimentation with multiple models allowed us to analyze the results, and determine the level of augmentation needed. Also, it led to the crystallization of complex relations within the data that were not visible before. The overall objective of analyzing different models was to understand the performance-accuracy trade-offs between the models.

A. COMPARISON OF DIFFERENT NETWORKS
We explored multiple neural network architectures: ResNet [10], ResNext [23], and EfficientNet [18], along with their different sub-architectures. The deep learning models currently used for trash classification implement ResNext101 and ResNext152, which are relatively huge compared to EfficientNet. EfficientNet consistently reduces parameter counts by up to 8.4x and requires up to 16x fewer floating point operations (FLOPS) compared to current state-of-the-art models. This increased speed is necessary for applying deep learning models in real-time applications where efficiency is crucial for success. Moreover, bigger CNN models are harder to train and struggle to operate in edge-computing applications.
As a baseline experiment, we used the ResNet50 model. Although not a very large model, it is still considered one of the leading models to achieve speed and accuracy in image classification tasks.

B. ARCHITECTURES
1) EfficientNet AND EfficientDet ARCHITECTURE
The EfficientNet [18] architecture was introduced in 2019 at the International Conference on Machine Learning (ICML) by Mingxing Tan and Quoc V. Le. It is constructed on the idea that balancing the network depth, width, and image resolution can increase performance. To bring this idea to life, they introduced the concept of compound scaling through EfficientNet. Compound scaling uses a coefficient to scale up the width, image resolution, and depth uniformly.
The authors propose the following compound scaling parameters and constraints:

depth: d = α^φ,  width: w = β^φ,  resolution: r = γ^φ
subject to α · β² · γ² ≈ 2,  with α ≥ 1, β ≥ 1, γ ≥ 1

where φ is the user-specified coefficient used to control the floating point operations (FLOPs) available for constructing the model, and α, β, and γ are the parameters that distribute these resources across depth, width, and image resolution.
The FLOPs required for a convolutional operation are directly proportional to d, w², and r², as represented in Equation 1. The authors restrict α · β² · γ² to approximately 2 so that with every new φ, the FLOPs of the new model grow by about 2^φ.
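The scaling rule above can be sketched numerically. The α, β, γ values used here (roughly 1.2, 1.1, 1.15) are the ones reported by Tan and Le for the base search; treat them as illustrative:

```python
# Sketch of EfficientNet-style compound scaling. The coefficients
# alpha=1.2, beta=1.1, gamma=1.15 are the values reported in the
# EfficientNet paper for the base grid search (illustrative here).
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    depth = alpha ** phi        # number of layers scales as alpha^phi
    width = beta ** phi         # channel count scales as beta^phi
    resolution = gamma ** phi   # input image size scales as gamma^phi
    return depth, width, resolution

# FLOPs are proportional to d * w^2 * r^2, so the constraint
# alpha * beta^2 * gamma^2 ~= 2 keeps FLOPs growth at roughly 2^phi.
d, w, r = compound_scale(phi=1)
flops_multiplier = d * w ** 2 * r ** 2
print(round(flops_multiplier, 2))  # close to 2 for phi = 1
```

Each increment of φ therefore roughly doubles the compute budget while spreading it across all three dimensions at once, which is the core difference from scaling only depth or only width.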
The architecture of the base EfficientNet model, EfficientNet B0, was found by manipulating the depth, width, and resolution parameters to maximize accuracy while minimizing FLOPs. The search space is the same as in Tan et al., 2019, and as a result EfficientNet B0 is very similar to MnasNet [3]. This network was used as a baseline to construct the larger networks in the EfficientNet series.
The same concept was further propagated into the EfficientDet [19] architecture: compound scaling was applied to the BiFPN network to scale alongside the EfficientNet backbone and provide a faster convergence time.
The experiments conducted by the authors revealed that for the same FLOPs, the accuracy of EfficientNet was higher than any existing architecture. Additionally, it provided a faster convergence along with better performance in computer vision tasks [18].

C. TRANSFER LEARNING
When all the parameters of a neural network are learned from scratch, the weights are initialized randomly; as a result, the model learns very slowly and often has to be trained for a long time to reach convergence. Instead, a concept known as transfer learning [13] is often used. Generally, transfer learning relies on very large networks that have been pre-trained on a vast dataset; as a result, minimal training is required to achieve state-of-the-art accuracy, even with a small amount of data. Transfer learning can be done either by retraining the pre-existing layers of the model, or by adding new head layers to the end of the model and training either the newly added layers or all of them.
We applied transfer learning by using the architectures above as the base model and building upon them by adding new head layers. Experiments on all the architectures were done using the widely popular weights trained on ImageNet, but the EfficientNet models were also trained using the Noisy Student weights [22], which achieve 88.4% top-1 accuracy on ImageNet, outperforming the previous state-of-the-art model. We also experimented with training ResNet and ResNext from scratch but observed that transfer learning with pre-trained weights provided better results.
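A minimal PyTorch sketch of this head-replacement setup follows. The backbone here is a tiny stand-in, not the actual EfficientNet used in the paper, and the layer sizes are illustrative:

```python
import torch
import torch.nn as nn

# Illustrative transfer-learning setup: a (mock) pre-trained backbone is
# frozen and a new head is trained on the 7 ScrapNet categories.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
for p in backbone.parameters():      # freeze the pre-trained weights
    p.requires_grad = False

head = nn.Sequential(                # new, randomly initialized head layers
    nn.BatchNorm1d(16),
    nn.Dropout(0.3),
    nn.Linear(16, 7),                # 7 ScrapNet categories
)
model = nn.Sequential(backbone, head)

# Only the head's parameters would be handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
out = model(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 7])
```

Training all layers instead of only the head corresponds to the second variant described above; it is the same code with the freezing loop removed.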

D. EXPERIMENTAL SETUP
The experiments were performed on a computer with an eight-core CPU, 32 GB of RAM, and a GTX 1080 GPU running the Ubuntu 20.04 operating system. The deep neural network models were coded in Python using PyTorch 1.8.0 and CUDA 10.2. We assume that pre-processing tasks like normalization and reshaping do not modify the informational integrity of the data.
Experiments were performed to investigate the effect of different network architectures, optimizers, and transfer of weights. The first phase was to find the best architecture for the dataset. From there, the next problem we tackled was deducing the best sub-architecture; for example, in the EfficientNet architecture, the sub-architectures are B0, B1, ..., B7. In this phase, we also compared the performance of pre-trained weights while deducing the sub-architectures. Finally, we compared the performance of different optimizers on the dataset. Each optimizer has a different set of parameters attached to it; the optimizer parameters were set to their default values or as suggested by the authors of the architectures. A total of 22 different experiments were conducted on the dataset. A combination of metrics (Precision, Recall, F1-Score, and Accuracy) on the validation set was used for evaluating the different architectures. Each network was trained with a batch size of 4 and gradient accumulation to an effective batch size of 16.
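The batch-size-4 with effective-batch-16 scheme amounts to taking an optimizer step every four micro-batches. A minimal sketch, with a single scalar weight and made-up gradient values standing in for a real model:

```python
# Gradient accumulation sketch: micro-batches of 4 accumulated to an
# effective batch of 16, i.e. one optimizer step every 4 micro-batches.
# The "model" is a single scalar weight; gradients are illustrative.
MICRO_BATCH, EFFECTIVE_BATCH = 4, 16
accum_steps = EFFECTIVE_BATCH // MICRO_BATCH  # 4 micro-batches per update

w, lr = 0.0, 0.1
grad_sum, updates = 0.0, 0
for step, grad in enumerate([1.0, 2.0, 3.0, 2.0] * 2, start=1):
    grad_sum += grad / accum_steps        # average gradients over micro-batches
    if step % accum_steps == 0:           # step once the effective batch is full
        w -= lr * grad_sum                # single optimizer update
        grad_sum = 0.0
        updates += 1

print(updates)  # 2 optimizer steps for 8 micro-batches
```

This lets a 4-image micro-batch fit in GPU memory while the optimizer behaves as if it saw batches of 16.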

E. TRAINING AND EXPERIMENTS
The primary objective was to make sure we had objects separated in images with multiple objects; without separated objects, no model would be able to adequately determine the class. We looked into state-of-the-art object detection models and came across EfficientDet, a recently released family of object detectors that outperforms its competitors by a significant margin. Afterward, we performed tests by constructing different models using the ResNet, ResNext, and EfficientNet architectures as the base, followed by additional layers. We also conducted tests in two different ways: either initializing the base model from scratch or using pre-trained weights.
When trained from scratch, the weights for all layers in the architecture are randomly initialized from a Gaussian distribution. When training with pre-trained weights, the three architectures are initialized with pre-trained weights, the new fully connected layers are randomly initialized, and the model is then trained on the new categories of our dataset.
Initially, all the EfficientNet models, along with the ResNet and ResNext models, were trained using all four optimizers to understand the performance differences, as seen in Table 4. The optimizer comparison for EfficientNet B3 is visualized in Figure 4 to show the change in accuracy with increasing epochs. During our testing, the RMSProp optimizer achieved the highest accuracy, followed by Adam, Adagrad, and SGD. While Adagrad converged the fastest, at four epochs, RMSProp, SGD, and Adam converged much later, at 7-15 epochs. Even though the Adagrad optimizer converged earlier than RMSProp, it had much lower accuracy, and as a result we went forward with the RMSProp optimizer.
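For reference, a single RMSProp update can be sketched as below. The smoothing constant 0.99 and epsilon 1e-8 are common library defaults, not values confirmed by our experiments:

```python
import math

def rmsprop_step(w, grad, sq_avg, lr=1e-4, alpha=0.99, eps=1e-8):
    """One RMSProp update: scale the step by a running RMS of past gradients."""
    sq_avg = alpha * sq_avg + (1 - alpha) * grad ** 2   # running mean of grad^2
    w = w - lr * grad / (math.sqrt(sq_avg) + eps)       # normalized step
    return w, sq_avg

# Toy loss L(w) = w^2 with gradient 2w: the weight moves toward 0.
w, sq = 1.0, 0.0
for _ in range(3):
    w, sq = rmsprop_step(w, 2 * w, sq)
print(w < 1.0)  # True
```

The per-parameter normalization is what lets RMSProp take comparatively uniform step sizes across layers, which is one plausible reason it edged out SGD and Adagrad here.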
Weight Standardization (WS) [21] was implemented as part of the convolution layers to improve results. WS standardizes the weights in a convolution layer so that they have zero mean and unit variance. This reduces the Lipschitz constants of the loss and smooths out the loss landscape. WS has been tested on computer vision tasks and has shown significant improvement in image classification and object detection, among many others.
Ŵ = (W − μ_W) / σ_W

where Ŵ denotes the modified weights, W the original weights, and μ_W and σ_W the mean and standard deviation computed over the n weights of a filter. The models combined with the EfficientNet architecture were tested using both methods. When trained from scratch, the weights of the model were initialized from a random Gaussian distribution. The model was trained for up to 50 epochs using the RMSProp optimizer at a learning rate of 0.0001. For fine-tuning, early stopping and a learning rate scheduler were applied, reducing the learning rate by a factor of 2 each time, along with reducing the patience for early stopping [5] by a factor of 2.
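The standardization step for one filter's weights can be sketched in pure Python (the filter values are illustrative; a real WSConv layer applies this per output channel before the convolution):

```python
import statistics

# Weight Standardization sketch: shift and scale a filter's weights to
# zero mean and unit variance before the convolution uses them.
def standardize(weights, eps=1e-5):
    mean = statistics.fmean(weights)
    var = statistics.pvariance(weights)          # population variance
    return [(w - mean) / (var + eps) ** 0.5 for w in weights]

filt = [0.5, -1.0, 2.0, 0.1]                     # illustrative filter weights
std = standardize(filt)
print(round(statistics.fmean(std), 6))           # ~0.0
print(round(statistics.pvariance(std), 2))       # ~1.0
```

Note that, unlike BatchNorm, this normalizes the weights rather than the activations, so it adds no batch-size dependence.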
In the last fully-connected layer, the softmax function is used to calculate the confidence of each prediction as follows:

y_i = exp(x_i) / Σ_{j=1}^{n} exp(x_j)

where y_i denotes the probability of each class, n is the number of classes, and x_i and x_j are the inputs. The architecture of our modified EfficientNet B3 is shown in Figure 5. This architecture is the general representation of all models used in our experiments; only the model layer varied depending on which model was being implemented.
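The softmax computation can be illustrated directly (the seven logit values are made up):

```python
import math

# Softmax as used in the final layer: exponentiate each logit and
# normalize so the outputs form a probability distribution.
def softmax(logits):
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Seven logits -> seven class confidences summing to 1.
probs = softmax([2.0, 1.0, 0.5, 0.1, -0.3, 0.0, 1.5])
print(round(sum(probs), 6))     # 1.0
print(probs.index(max(probs)))  # 0: the class with the largest logit
```

The maximum of these probabilities is what the model reports as its confidence for the predicted class.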
All three architectures use the image sizes they were primarily trained on for the results, i.e., 224 × 224 for ResNet50 and 380 × 380 for EfficientNet B4. According to our experiments, the pre-trained models perform better when retrained using their original image sizes; the EfficientNet models performed about 2% worse with an image size of 224 × 224 than with their original image sizes.
Models initialized with random weights are referred to by their standard names, like "EfficientNet B0", while models trained with pre-trained weights carry the suffix "PW" for "pre-trained weights", like "EfficientNet B0-PW".
The ResNet models on average took longer to converge than the EfficientNet models, by about 7-10 epochs compared to the much smaller EfficientNet architecture. EfficientNet models converged in 4 to 7 epochs, while the ResNet models took 14 epochs. The ResNet models also had lower performance than all the EfficientNet models, as seen in Figure 6.
As another validation of our model, we tested it on the TrashNet dataset. For this, our data was split 80/10/10 into training, validation, and testing sets. We also recognized the possibility that the entire TrashNet data might end up in the training set; as a safeguard, we split the TrashNet data within our larger dataset 70/15/15 as well. Moreover, we implemented transfer learning to reuse the weights found during testing on our primary dataset. These weights were mapped onto all layers except the final layer, due to the absence of the compost category in TrashNet.
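The layer-wise weight transfer (all layers except the final classifier, which shrinks from 7 classes to 6) can be sketched with mocked state dictionaries; the layer names below are hypothetical, not the real module names:

```python
# Weight-transfer sketch: copy every layer's weights from the ScrapNet
# model except the final classifier, which changes from 7 classes
# (ScrapNet) to 6 (TrashNet). State dicts are mocked as plain dicts.
scrapnet_weights = {
    "backbone.conv1": [0.1, 0.2],
    "backbone.conv2": [0.3, 0.4],
    "classifier.final": [0.0] * 7,   # 7-class head
}

def transfer(src, n_classes):
    dst = {k: v for k, v in src.items() if not k.startswith("classifier.final")}
    dst["classifier.final"] = [0.0] * n_classes  # re-initialized smaller head
    return dst

trashnet_weights = transfer(scrapnet_weights, n_classes=6)
print(len(trashnet_weights["classifier.final"]))  # 6
print(trashnet_weights["backbone.conv1"])         # [0.1, 0.2] carried over
```

In PyTorch this would amount to loading the state dict with the final layer's keys excluded and letting the new head initialize randomly.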
We have also implemented Adaptive Gradient Clipping (AGC). Gradient clipping limits the size of gradients to ensure stable optimization of deep neural networks in regions of higher loss. Adaptive gradient clipping removes the need for hand-tuning the clipping parameters and allowed us to train our model with larger batch sizes and better data augmentations.
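A simplified sketch of the adaptive clipping rule: a gradient is rescaled whenever its norm exceeds a fixed fraction of the corresponding parameter norm, so the threshold adapts per parameter. The clip factor 0.01 below is illustrative, not the value used in our experiments:

```python
import math

# Adaptive Gradient Clipping sketch: rescale a gradient whenever its norm
# exceeds clip * ||weights||, so the clipping threshold tracks each
# parameter's own scale instead of being a single hand-tuned constant.
def agc(weights, grads, clip=0.01, eps=1e-3):
    w_norm = max(math.sqrt(sum(w * w for w in weights)), eps)
    g_norm = math.sqrt(sum(g * g for g in grads))
    max_norm = clip * w_norm
    if g_norm > max_norm:
        scale = max_norm / g_norm
        return [g * scale for g in grads]     # rescaled to the threshold
    return grads                              # small gradients pass through

clipped = agc(weights=[1.0, 2.0], grads=[3.0, 4.0], clip=0.01)
g_norm = math.sqrt(sum(g * g for g in clipped))
print(round(g_norm / math.sqrt(5), 4))  # 0.01: clipped to clip * ||w||
```

Real implementations apply this unit-wise (per output channel) rather than over a whole flat parameter vector, but the rescaling rule is the same.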
The process flow used on the ScrapNet dataset to detect whether a plastic item is recyclable is summarized in Figure 7, and the process flow for TrashNet is summarized in Figure 8. A subset of TrashNet is set aside for testing, and the rest of TrashNet, along with the OpenRecycle, TACO, and Waste Classification datasets, was combined as explained previously and called ScrapNet. The dataset was split into training and test sets, and the ScrapNet model was trained on the training set. The ScrapNet model uses Conv and WSConv layers, followed by Global Average Pooling and then BatchNorm and Dropout, to form a 7-class classifier. AGC, gradient accumulation, and RandAugment were also used to achieve state-of-the-art accuracy with ScrapNet. Finally, as a comparative measure, we replaced the 7-class classifier layer with a 6-class classifier layer; the TrashNet test set was then fed to the model, which achieved the highest accuracy on TrashNet.
The Plastic category in particular was chosen for sub-classification. The main reason is that plastic objects are recyclable only if they belong to a certain sub-group of plastic, and even then, specialized equipment is often needed to recycle them. The driving factor behind this decision was that plastic is still one of the largest waste components and one of the toughest to sort and recycle due to the complexity of the material. Traditionally, plastic is divided into seven categories, some of which are recyclable while others are not. This leads to a lot of confusion in manual sorting, eventually leading to a lot of recyclable plastic being thrown out instead [2].
The UK sustainability guide [8] was used as a baseline for the division of the TACO dataset into recyclable and non-recyclable objects. The EfficientNet B3 network, which performed best for 7-class trash classification, was trained on the TACO dataset's plastic category and then used to classify the entire plastic category of ScrapNet. The division of recyclable and non-recyclable objects in the TACO dataset is explained in Table 5.

V. RESULTS AND DISCUSSION
FIGURE 7. ScrapNet process flow, from localizing trash in images to eventually predicting the recyclability status of the trash.

The EfficientDet B3 achieved the highest mAP in our testing, 82.34%, and was used for creating bounding boxes and separating multiple objects in a scene. The results of the trash classification experiments are shown in Table 6. The Precision, Recall, and F1-Score have been computed using a weighted average. The experiments used augmented data to increase the heterogeneity of our dataset.
The experiments were performed with 5 runs for each model, and the mean accuracy is used as the primary data point going forward. The experiments are summarised in Table 7 and Figure 9, along with the standard deviation and the minimum and maximum accuracy over the 5 runs for each model.
ResNet50 was chosen as the baseline model because it is one of the most popular models for image classification. While relatively easy to implement, it achieved a decent accuracy of 83.11% on our dataset. In the world of CNNs, ResNet50 is still a relatively small model, so we decided to implement larger models with higher complexity: ResNet101 and ResNet152. A fall of 1.9% was observed moving to ResNet101, and the accuracy decreased by a further 2.1% on ResNet152. Then ResNext101 was implemented, but its results were close to those of ResNet101, and its accuracy was lower than that of ResNet50 and ResNet101.
While researching different models and techniques to improve trash classification, we discovered a new deep neural network architecture released by Google called EfficientNet. The team at Google created eight models based on this architecture and released them in 2019, ranging from EfficientNet B0 to EfficientNet B7 with increasing depth and parameters. The architecture claimed to outperform all existing networks on ImageNet. As our existing results were not on par even with fine-tuning and optimization, we decided to experiment with the EfficientNet architecture.
First, EfficientNet B0 was implemented, as it is the base EfficientNet model, and it outperformed ResNet101 by an astounding 6% and 9% for the pre-trained and non-pre-trained versions, respectively, becoming our top-performing model yet. With this positive result, we decided to implement and fine-tune all the other EfficientNet models as well.
Through the experimentation process, we observed that, for the pre-trained versions of the models, the accuracy increased from EfficientNet B0 and peaked at B2 before declining substantially from B3 to B7. For the non-pre-trained versions, the accuracy showed erratic behavior and no consistent pattern, but the best accuracies were achieved with the non-pre-trained versions of B0, B3, and B5. B5 was not chosen for comparison because of its much larger size and lack of real benefit over the smaller networks. After further fine-tuning and training, the best accuracy achieved was 92.87%, by EfficientNet B3.
Additionally, we achieved an accuracy of 89.21% with EfficientNet B3 for the final sub-classification of plastic as recyclable or non-recyclable. The non-pre-trained versions of the models had higher accuracy than their pre-trained counterparts in all cases except one (B2). The same was true for our top-performing model, where B3 beat B3-PW by almost 5%, as seen in Figure 10.
During the testing and comparison of the models, some interesting patterns were observed. Referring to Table 6, we noticed that EfficientNet B0 and B3 had similar F1 scores and accuracies close to our top model, so we examined the confusion matrices to find where B0 might be lacking. Figures 11 and 12 show the confusion matrices for B0 and B3, respectively. We can observe the following:
1) B0 performed better than B3 in the paper category by 3.3%, in metal by 0.5%, and in compost by 2.7%, while performing similarly to B3 or only marginally lower in the other four categories. This means that B0, which has less than half the parameters of B3, can perform comparably, and classifies better where paper, metal, or compost is the priority or the major category. Its small size also enables quick training when new data is added.
2) B3, on the other hand, performed better than B0 in the trash category by 1.5%, in plastic by 4.2%, and in glass by 1.4%, while performing similarly to B0 or marginally lower in the other four categories. Even though B3 has more parameters, it is useful due to its higher overall accuracy, especially in the above three categories; moreover, the accuracy gap might even widen as the model trains on more data over time.
This observation led to another use case for EfficientNet B0: it could be used in smaller devices, as it works with fewer parameters and requires less computing power. B3, on the other hand, can be used where accuracy is the priority without regard for computation power. The final EfficientNet B0 and B3 networks were trained using the RMSProp optimizer with a learning rate of 0.0001, a batch size of 4, and gradient accumulation to an effective batch size of 16, on a training set of 6840 images, and tested on a testing set of 1260 images.
The callbacks used in the model were early stopping and reduce-learning-rate-on-plateau, with a patience of 5 epochs based on validation loss, a reduction factor of 0.5, and a minimum learning rate of 0.41. Finally, stochastic weight averaging was used with a starting period of 5 epochs and an averaging period of 3 epochs.
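The reduce-on-plateau schedule above can be sketched in a few lines; this is a minimal stand-in for the framework callback, not the paper's code. The class name is ours, and the `min_lr` default is a placeholder assumption (the paper's settings supply the patience of 5 and factor of 0.5).

```python
class PlateauScheduler:
    """Minimal sketch of reduce-LR-on-plateau: if validation loss
    fails to improve for `patience` consecutive epochs, multiply the
    learning rate by `factor`, never dropping below `min_lr`."""

    def __init__(self, lr, patience=5, factor=0.5, min_lr=1e-6):
        self.lr, self.patience, self.factor, self.min_lr = lr, patience, factor, min_lr
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs since last improvement

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                # plateau detected: halve the learning rate and reset
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr
```

Early stopping follows the same pattern, except that after the patience is exhausted it terminates training instead of reducing the learning rate.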
The current standard trash dataset is TrashNet. To obtain comparative results, we put our model against all other models found in our research and literature review in Table 8, plotted in Figure 13. The model was trained on ScrapNet with its seven categories, and transfer learning was applied by substituting the final seven-element classifier with a six-element classifier for TrashNet. The model achieved an accuracy of 98.4%, which, to the best of our knowledge, is the highest accuracy reported on TrashNet; it outperformed modified SVMs, ResNets, and DenseNets by a margin of 3.5%. TrashNet is quite limited in its categories and size; as dataset size and variety increase, these models should improve substantially, and the ScrapNet dataset is a step in that direction. As the research community adopts this dataset as the new standard and builds models that surpass ours, we move closer to real-life implementation of these technologies in industry.
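The head substitution described above, replacing the seven-way ScrapNet classifier with a fresh six-way head for TrashNet while keeping the backbone weights, can be sketched in PyTorch. The toy backbone below is a stand-in, not the actual EfficientNet; only the swap-and-freeze pattern is the point.

```python
import torch
import torch.nn as nn

class ToyNet(nn.Module):
    """Stand-in for a network trained on the 7 ScrapNet classes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(32, 16)   # pretend feature backbone
        self.classifier = nn.Linear(16, 7)  # seven-element ScrapNet head
    def forward(self, x):
        return self.classifier(torch.relu(self.features(x)))

model = ToyNet()
# Transfer to TrashNet: keep the trained backbone, swap in a
# six-element head whose weights are freshly initialized.
model.classifier = nn.Linear(model.classifier.in_features, 6)
# Optionally freeze the backbone so only the new head trains at first.
for p in model.features.parameters():
    p.requires_grad = False
```

A forward pass now yields six logits per image; once the new head has converged, the backbone can be unfrozen for end-to-end fine-tuning.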
For future work, the aim is to improve the model by enabling it to classify objects into sub-categories. This sub-classification would be immensely useful in real-life applications, as not all objects are recycled through the same procedure, and different procedures often require different machinery. It would give recycling plant owners the option to accept only those items the plant can recycle, leading to a direct decrease in the amount of non-recyclable trash collected at these plants. Additionally, we aim to improve the size and quality of the data so that human intervention and validation can be eliminated, improving safety in the recycling industry. At the time of writing, we see a lot of untapped potential in this domain, and the field is ripe for further research.

VI. CONCLUSION
We set out to create a novel dataset and to compare and analyze different deep learning approaches for classifying waste. We successfully constructed a new dataset for trash classification consisting of over 8100 images of objects spanning seven categories; this dataset is larger and more diverse than any existing one. We achieved an mAP score of 82.34 for object detection with EfficientDet B3. After extensive comparison and analysis, we proposed a new deep neural network model for trash classification based on EfficientNet B3, modified and fine-tuned to achieve an accuracy of 92.87%. To provide a comparative study and showcase the robustness and effectiveness of our model, we performed several experiments using different architectures such as ResNet and ResNeXt, and our model outperformed all of them. In addition, we tested our model on TrashNet, compared it with several previous related works on this dataset, and achieved one of the highest reported accuracies, 98.4% (Figure 13). Lastly, we utilized EfficientNet B3 to achieve an accuracy of 89.21% for sub-classification of the plastic category.
ABHISHEK MASAND is currently pursuing the bachelor's degree in computer science engineering with Manipal University Jaipur. His research interests include imitation learning, reinforcement learning, NLP, and computer vision.
SURYANSH CHAUHAN is currently pursuing the bachelor's degree in computer science engineering with Manipal University Jaipur. His research interests include machine learning, data and business analytics, and data exploration.
MAHESH JANGID (Senior Member, IEEE) received the Ph.D. degree in deep learning. The objective of his research is to apply deep learning approaches in the fields of computer vision and document analysis and recognition. He is currently working as an Associate Professor. He has guided bachelor's and master's degree students in the fields of deep learning and digital image processing. He has published and presented more than 20 research papers in peer-reviewed journals and international conferences. His research interests include machine learning, soft computing, pattern recognition, and image processing. He is actively involved in IEEE activities. Apart from that, he is also an active member of various global societies, such as CSTA-ACM, the IEEE Computer Society, UACEE, and IAENG.
RAJESH KUMAR received the M.Tech. degree from the Computer Science and Engineering Department, Guru Jambheshwar University of Science and Technology, Hisar, Haryana, India. He is currently pursuing the Ph.D. degree with the Computer and Communication Engineering Department, Manipal University Jaipur. He is also with the Department of Computer Science, College of Informatics, Bule Hora University, Ethiopia. He has more than eight years of experience in teaching and two years of experience in industry. His areas of expertise and interest include computer networking, especially wireless protocols, and cyber security. He has in-depth knowledge of Cisco device configuration, including routing, switching, and security. He is also working on 5G wireless technology and has published several research papers in various peer-reviewed journals and conferences.
SATYABRATA ROY (Senior Member, IEEE) received the B.Tech. degree in computer science and engineering, in 2009, and the M.Tech. and Ph.D. degrees (Hons.) in computer science and engineering, in 2014 and 2020, respectively. He is currently an Assistant Professor (Senior Scale) with the Department of Computer Science and Engineering, School of Computing and Information Technology, Manipal University Jaipur, Rajasthan, India. He is an enthusiastic and motivating technocrat with more than ten years of research and academic experience at different reputed institutes. He has supervised many students in their M.Tech. dissertation work and is supervising Ph.D. scholars in the domain of information security. He has published many research articles in top-quality international journals and national/international conferences of repute. He has organized many international conferences, FDPs, and workshops, and has participated in many short-term courses, faculty development programs, workshops, and MOOCs offered by prestigious universities in India and abroad. He has served as a resource person for many FDPs and seminars. His research interests include cryptography, the Internet of Things, cellular automata, computer networks, computational intelligence, machine learning, and formal languages. He is a Professional Member of ACM. He has served as a technical program committee member for many international conferences and symposiums, and works as a reviewer for several reputed international journals.