A Reliable and Robust Deep Learning Model for Effective Recyclable Waste Classification

In response to the growing waste problem caused by industrialization and modernization, the need for an automated waste sorting and recycling system for sustainable waste management has become ever more pressing. Deep learning has made significant advances in image classification, which makes it well suited to waste sorting applications. Such applications depend on a deep learning model that can accurately distinguish between categories of waste. In this study, we present RWC-Net (recyclable waste classification network), a novel deep learning model designed to classify six distinct waste categories in the TrashNet dataset of 2,527 waste images. The performance of our model is subjected to extensive quantitative and qualitative evaluation and is compared with various state-of-the-art waste classification techniques. The proposed model outperformed several state-of-the-art models, achieving an overall accuracy of 95.01%. It also achieves high F1-scores for each of the six waste categories: 97.24% for cardboard, 96.18% for glass, 94% for metal, 95.73% for paper, 93.67% for plastic, and 88.55% for litter. The reliability of the model is demonstrated qualitatively through saliency maps generated with Score-CAM, a class activation mapping (CAM) method, which provide visual insight into the model's behavior across the waste categories. These results highlight the model's accuracy and demonstrate its potential as an effective automated solution for waste classification and management.


I. INTRODUCTION
Globalization, fueled by population growth, industrial development, and economic expansion, has increased the demand for natural resources. This increased resource consumption has, in turn, led to an alarming rise in waste production [1]. A significant amount of urban waste continues to be improperly disposed of, primarily through landfills and incineration [2]. This continuous flow of pollution poses a serious risk to urban ecosystems and to the health of local residents. Notably, a significant portion of this waste consists of household garbage, and the decomposition of certain of its components can lead to the accumulation of hazardous compounds in the environment, thereby escalating ecological risks [3]. In addition, some residential waste materials are poorly biodegradable, as exemplified by the plastic pollution observed in aquatic ecosystems worldwide [4]. One-third of the world's waste is improperly managed, without proper sorting or adequate handling, which causes extensive environmental pollution and poses a serious threat to sustainable development [5]. In response to these escalating environmental challenges, the Environmental Protection Agency (EPA) has emphasized the reprocessing of municipal solid waste (MSW) as an environmentally responsible waste management strategy [6]. Global MSW production reached 2.01 billion tons in 2016 and is projected to increase to 2.59 billion tons by 2030 [5]. To mitigate these environmental consequences and ensure the development of sustainable societies, efficient waste management procedures are needed more than ever.
The associate editor coordinating the review of this manuscript and approving it for publication was Liandong Zhu.
Over the past ten years, deep learning has advanced remarkably, driven by substantial improvements in computational capability and theoretical understanding [7]. These advances have had a significant impact on many computer vision domains, producing exceptional results in tasks such as image classification, object detection, and semantic segmentation. In particular, Convolutional Neural Networks (CNNs) have ushered in a new era of image classification. These networks automate feature extraction, improve accuracy, and extend the capabilities of computer vision, which is especially relevant for detecting and classifying waste in recycling, where they can reduce labor-intensive processes and costs [8]. Computer vision and deep learning methodologies therefore hold great promise for automating the identification and classification of waste types and streamlining waste management [9]. Although recycling and waste sorting are recognized as effective strategies for reducing waste production and promoting sustainability, they face obstacles such as the low efficiency of traditional machine-based and manual waste classification, limited public awareness of waste categorization, and the inherent complexity of the classification process itself [10], [11], [12]. These obstacles have prompted research into automatic waste detection and classification technologies aimed at improving operational efficiency and reducing costs. Owing to their robust modelling capabilities [13] and their end-to-end learning paradigm, which reduces the need for explicit feature engineering [14], deep learning approaches have proven superior to conventional machine learning techniques. The success of these models, however, depends on the availability of relevant datasets [15]. Yang and Thung's introduction of the TrashNet dataset in 2016 marked a milestone in waste image classification [16]. Although additional waste datasets such as TACO, AquaTrash, and VN-trash have since emerged and expanded the available resources [9], [17], [18], they have limitations such as small sample sizes, a focus on specific environmental contexts, and restricted accessibility for datasets that are not open source. Addressing these obstacles and optimizing waste classification performance remains a top priority for researchers.
The TrashNet dataset, which covers the most prevalent waste types (cardboard, glass, metal, paper, plastic, and litter), is the focus of our experiments. Our objective is to develop a robust deep learning framework that accurately classifies these waste types, thereby improving the effectiveness and efficiency of waste sorting and recycling. The main contributions of our work can be summarized as follows:
• Six different categories of waste were classified with high reliability.
• A novel deep learning model, the recyclable waste classification network (RWC-Net), is proposed to classify the waste images.
• Saliency-map visualizations generated with Score-CAM (class activation mapping) are presented as a qualitative evaluation of the model.
The manuscript is organized as follows. Section II reviews prior work on waste classification with the TrashNet dataset. Section III describes the materials and methods used in our study, including the dataset, the preprocessing steps, the model specifications, and the evaluation metrics. Section IV analyzes the quantitative and qualitative results of our investigation. Finally, Section V concludes with a discussion of our findings and outlines future research directions aimed at advancing sustainable waste management practices.

II. RELATED WORKS
Effective waste management has become a major social concern, necessitating efficient automated waste classification systems. In a rapidly expanding and industrialized world, encouraging residents to actively sort and recycle waste has become essential to managing it effectively. Early research classified waste images with traditional machine learning techniques. In 2016, Yang and Thung applied a Support Vector Machine (SVM) to the TrashNet dataset, attaining 63% accuracy [16]. In 2018, Costa et al. classified the six categories of garbage images in the same dataset using the K-Nearest Neighbors (KNN) algorithm with 88% accuracy [19]. Other efforts, including work by Mandar, employed Random Forest (RF) and Extreme Gradient Boosting (XGBoost), yielding 62.61% and 70% accuracy, respectively [20]. With recent advances in deep learning, however, the landscape of waste classification has changed: the superior performance of deep learning models over traditional machine learning techniques has led to significant progress in waste management [21], [22].
In recent years, deep learning models have made significant contributions to the field [23]. In early 2018, Kennedy et al. implemented OscarNet, based on a fine-tuned VGG19, which attained 88.42% accuracy on the TrashNet dataset [24]. In October 2018, Costa et al. reported a fine-tuned AlexNet with 91% accuracy and a fine-tuned VGG16 with 93% accuracy [19]. Rabano et al. used a MobileNet network that reached 87.2% accuracy [25]. In December 2018, Aral et al. evaluated several classical networks on TrashNet; Inception-ResNet-V2 and DenseNet121 both achieved 89% accuracy. Their fine-tuned, ImageNet-pretrained versions performed better still: DenseNet121 obtained 95% accuracy and Inception-ResNet-V2 attained 94% [26]. In June 2019, Ruiz et al. built on this foundation, attaining 87.71% accuracy with an Inception network, 88.34% with Inception-ResNet, and 88.66% with ResNet [27]. These developments demonstrate the substantial progress made in waste classification with deep learning models. The transition from conventional machine learning to deep learning has not only improved classification accuracy but also opened new avenues for understanding the intricate waste categorization process.

III. MATERIALS AND METHODS
In this section, we describe the TrashNet dataset, its preprocessing and data preparation procedures, and our waste image classification procedure. The following subsections then detail the deep learning methodologies used in this study and their methodological foundations. We also describe the quantitative metrics used to evaluate each experiment and the qualitative methods used to interpret the results. FIGURE 1 gives an overview of the complete workflow of our proposed waste classification method.

A. DATASET DESCRIPTION
In this study, we used the publicly accessible TrashNet dataset. It contains 2,527 high-resolution images divided into six waste categories: cardboard, glass, plastic, paper, metal, and litter. Each image depicts a single object at a standard resolution of 512 by 512 pixels. With its variety of waste categories and single-object focus, this dataset served as the foundation of our research, allowing us to classify waste materials in a structured and methodical manner. Table 1 details the training, validation, and testing sets of each fold.

B. DATA PREPROCESSING
The TrashNet images are stored in Portable Network Graphics (PNG) format with standardized dimensions of 512 by 512 pixels. The dataset's creators arranged the images into six folders, one for each waste category: cardboard, glass, plastic, paper, metal, and litter. Before training our deep learning models, we applied a series of preprocessing steps, namely resizing, augmentation, normalization, and cross-validation, to improve the models' overall performance.

1) DATA PREPARATION
To prepare the data for training the deep learning models, we resized the images and applied k-fold cross-validation, a common method for evaluating model performance across an entire dataset. We first shuffled the entire dataset and created five folds; each fold uses the whole dataset, partitioned into different training, validation, and test sets. We used the conventional allocation of 70% for training, 10% for validation, and 20% for testing, so that the five test sets together cover the entire dataset. The images were resized to 224 by 224 pixels, the standard input size for most of our models; for Inception-v3, they were resized to 299 by 299 pixels to match the model's expected input size. These data preparation steps were carried out to ensure the robustness and dependability of our deep learning models.
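The fold construction described above can be sketched in plain Python over image indices. This is a minimal sketch under our own assumptions: the function name `five_fold_splits`, the fixed seed, and drawing the validation images from the non-test remainder are illustrative choices, since the text does not specify how validation images are selected or whether the folds are stratified by class.

```python
import random

def five_fold_splits(n_items, n_folds=5, val_fraction=0.10, seed=0):
    """Yield (train, val, test) index lists, one triple per fold.

    Each fold tests on a different 1/n_folds of the shuffled data (20% for
    five folds). Assumption: the validation set is simply the first
    val_fraction of all images taken from the remaining (non-test) indices.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into n_folds nearly equal chunks.
    base, extra = divmod(n_items, n_folds)
    chunks, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        chunks.append(indices[start:start + size])
        start += size
    n_val = int(n_items * val_fraction)  # 10% of all images, e.g. 252 of 2,527
    for k in range(n_folds):
        test = chunks[k]
        rest = [i for j, chunk in enumerate(chunks) if j != k for i in chunk]
        val, train = rest[:n_val], rest[n_val:]
        yield train, val, test

splits = list(five_fold_splits(2527))
```

With 2,527 images, each test fold in this sketch holds 505 or 506 images and each validation set 252. Given the class imbalance, a class-stratified split (for example, scikit-learn's `StratifiedKFold`) would be a natural alternative.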

2) AUGMENTATION
The per-category counts in Table 1 reveal significant disparities in the distribution of images across categories. To correct these disparities and enrich the training data, we applied several data augmentation techniques using the PyTorch framework in Python. First, a 'Random Horizontal Flip' with a probability of 0.5 flips images horizontally, increasing the variability of the training data. Next, a '30-degree Random Rotation' rotates images by a random angle of up to 30 degrees, broadening the range of viewpoints represented in the dataset. Finally, a 'Random Crop' operation applied with a probability of 0.5 further increases the dataset size and the variety of each class's samples.
These combined augmentation strategies increased the number of training images per class to roughly 2,500, forming the basis for training and evaluating our deep learning models. Table 1 lists the number of images in each class after augmentation.
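The class-balancing step can be sketched as a schedule of which original images to augment, and with which random parameters, until every class reaches the same target. This is an illustrative sketch rather than the authors' pipeline: `augmentation_schedule`, `sample_augmentation`, and the per-class training counts (roughly 70% of each TrashNet class) are assumptions, and the actual pixel transforms would be applied by the PyTorch augmentation pipeline described above.

```python
import random

def augmentation_schedule(count, target):
    """Indices of original images to augment so a class of `count` images
    grows to exactly `target` samples (the originals are kept as well)."""
    extra = max(target - count, 0)
    # Cycle through the originals so each is augmented a similar number of times.
    return [i % count for i in range(extra)]

def sample_augmentation(rng):
    """Random parameters for one augmented copy (hypothetical helper), mirroring
    the text: horizontal flip (p = 0.5), rotation within +/-30 degrees, crop (p = 0.5)."""
    return {
        "hflip": rng.random() < 0.5,
        "angle": rng.uniform(-30.0, 30.0),
        "crop": rng.random() < 0.5,
    }

# Hypothetical per-class training counts: about 70% of each TrashNet class.
train_counts = {"cardboard": 282, "glass": 351, "metal": 287,
                "paper": 416, "plastic": 337, "litter": 96}
rng = random.Random(0)
plan = {name: [(src, sample_augmentation(rng))
               for src in augmentation_schedule(n, 2500)]
        for name, n in train_counts.items()}
```

Cycling through the originals spreads the augmented copies evenly, so in this example each of the 96 litter images receives about 25 random variants, while larger classes need far fewer; six classes of 2,500 images give the 15,000 training images used in each fold.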

3) NORMALIZATION
Normalization is a widely used computer vision technique for standardizing the pixel values of the images in a dataset. In our implementation, we first calculated the global mean and standard deviation of the dataset. The mean is the average pixel value, and the standard deviation quantifies how much pixel values vary around this mean. Each pixel value is then transformed by subtracting the mean and dividing by the standard deviation, which scales the pixel values to zero mean and unit variance and centers the data distribution around zero. Normalization was applied while loading the images; because the images are RGB, each of the three color channels was normalized. Formally, each channel is normalized according to Eq. (1):
$$X_{norm} = \frac{X - \mu}{\sigma} \quad (1)$$
Here, X is the original pixel value, X_norm is the normalized pixel value, µ (mu) is the global mean, and σ (sigma) is the global standard deviation across the dataset. Incorporating normalization into our image processing pipeline improves the convergence of the models during training and helps ensure consistent performance across the variety of images in the dataset.
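The global statistics and Eq. (1) can be illustrated in plain Python. The sketch assumes one mean and one standard deviation per RGB channel, pooled over all pixels of all images, and uses hypothetical helper names; production code would compute the same quantities with tensor operations.

```python
import math

def channel_stats(images):
    """Global per-channel mean and standard deviation over a whole dataset.

    Assumption: `images` is a list of images, each a list of (r, g, b) pixel
    tuples with values in [0, 1]; every pixel of every image is pooled.
    """
    n = 0
    sums = [0.0, 0.0, 0.0]
    sq_sums = [0.0, 0.0, 0.0]
    for image in images:
        for pixel in image:
            n += 1
            for c in range(3):
                sums[c] += pixel[c]
                sq_sums[c] += pixel[c] ** 2
    means = [s / n for s in sums]
    # Population standard deviation sqrt(E[X^2] - E[X]^2), clipped at 0 for rounding.
    stds = [math.sqrt(max(sq / n - m ** 2, 0.0)) for sq, m in zip(sq_sums, means)]
    return means, stds

def normalize(image, means, stds):
    """Eq. (1) applied per channel: X_norm = (X - mu) / sigma."""
    return [tuple((p[c] - means[c]) / stds[c] for c in range(3)) for p in image]

images = [[(0.0, 0.2, 0.4), (1.0, 0.6, 0.8)],
          [(0.5, 0.4, 0.6), (0.5, 0.2, 0.2)]]
means, stds = channel_stats(images)
```

In a PyTorch pipeline, the resulting per-channel values would typically be passed to `torchvision.transforms.Normalize(mean, std)`, which applies the same per-channel formula to image tensors.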

C. MODEL DESCRIPTION
In this section, we describe the architecture of our proposed waste image classification model and the key decisions behind its construction. FIGURE 1 depicts how the model was trained to classify waste images into six categories: cardboard, glass, metal, plastic, paper, and litter. Prior research on waste classification has investigated well-known deep learning models such as AlexNet, GoogleNet, ResNet, DenseNet, Inception, MobileNet, and EfficientNet [23], [28]. In our experiments, we examined five well-known models in depth: GoogleNet [29], ResNet50 [30], Inception-v3 [31], MobileNet-v2 [32], and DenseNet201 [33]. Among these, DenseNet201 and MobileNet-v2 performed best on the TrashNet dataset. We therefore developed a novel model, RWC-Net, that combines the strengths of MobileNet-v2 and DenseNet201 and outperformed all the pretrained models we evaluated. Section IV presents a comprehensive comparison based on various evaluation metrics. For waste classification, we used a 'LogSoftMax' activation in the output layer together with the cross-entropy loss function. The loss was optimized with the Adam optimizer at a learning rate of 0.00001. The following subsections discuss the architectures of these models in depth, shedding light on their design principles and operational characteristics.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

1) ARCHITECTURE OF DENSENET201
DenseNet201 is a well-known deep convolutional neural network (DCNN) whose architecture prioritizes information flow and gradient propagation [33]. The network is composed of multiple dense blocks, and within each block every layer is densely connected to all preceding layers. This dense connectivity promotes efficient feature reuse, allowing the model to capture intricate patterns effectively. The design relies on bottleneck layers, batch normalization, and ReLU activations. The bottleneck layers place 1×1 convolutions before the 3×3 convolutions, which reduces computational complexity. Batch normalization keeps the distributions of the feature maps consistent, which increases training stability, and ReLU activations introduce nonlinearity for modeling complex relationships while mitigating the vanishing-gradient problem. Transition layers inserted between dense blocks control the spatial dimensions; they typically consist of batch normalization, a 1×1 convolutional layer, and a 2×2 average pooling layer, all of which reduce computational complexity. After the final dense block, global average pooling converts the 4D feature maps into 2D feature vectors (one vector per image), reducing spatial complexity before the dense layers. Class predictions are generated by a dense layer with one output per class, followed by a 'LogSoftMax' activation. FIGURE 2(a) depicts the architecture of DenseNet201, including its dense connectivity, bottleneck layers, transition layers, and global average pooling. DenseNet201's architecture optimizes information flow, gradient propagation, and feature learning, making it a versatile option for a wide range of computer vision tasks [33].
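To make the dense-connectivity bookkeeping concrete, the sketch below tracks channel counts through DenseNet201. The configuration values (64 initial channels, growth rate 32, dense blocks of 6, 12, 48, and 32 layers, and transition compression of 0.5) are the published DenseNet-201 settings, which torchvision's `densenet201` also uses; the function itself is a hypothetical illustration.

```python
def densenet_channels(num_init_features=64, growth_rate=32,
                      block_config=(6, 12, 48, 32), compression=0.5):
    """Hypothetical helper that tracks the channel count through a DenseNet.

    Inside a dense block, every layer appends `growth_rate` new feature maps
    to the concatenation of all earlier ones; each transition layer's 1x1
    convolution then compresses the channels (by half in DenseNet-BC).
    """
    channels = num_init_features
    after_each_block = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate     # dense connectivity: concatenation
        after_each_block.append(channels)
        if i != len(block_config) - 1:            # no transition after the last block
            channels = int(channels * compression)
    return channels, after_each_block

final_channels, per_block = densenet_channels()
```

The final 1920 channels are what global average pooling turns into the per-image feature vector that feeds the classification layer.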

2) ARCHITECTURE OF MOBILENET-V2
MobileNet-V2, the second version of the MobileNet series, is a lightweight and highly efficient deep learning model that excels in resource-constrained environments such as mobile devices and edge computing systems [32]. Its streamlined architecture is designed to balance model size against computational efficiency. The design is built around "bottleneck" layers composed primarily of depthwise separable convolutions, which reduce the number of parameters and the computational cost while preserving the model's representational capacity. MobileNet-V2 is also distinguished by its "inverted residual" blocks: each block expands a thin input with a 1×1 convolution, filters it with a lightweight 3×3 depthwise convolution, and projects it back to a thin output with a linear 1×1 convolution, with the shortcut connection linking the thin bottleneck layers where input and output shapes match. This combination of lightweight expansion and a linear bottleneck enhances the model's efficiency and adaptability.
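The parameter savings of depthwise separable convolutions, and the structure of an inverted residual block, can be checked with simple arithmetic. The sketch counts convolution weights only (no biases or batch-normalization parameters) and assumes MobileNet-V2's default expansion factor of 6; the helper names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """A k x k depthwise convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution that mixes channels."""
    return k * k * c_in + c_in * c_out

def inverted_residual_params(c_in, c_out, expansion=6, k=3):
    """MobileNet-V2-style block: 1x1 expansion -> k x k depthwise -> linear 1x1
    projection. Assumption: default expansion factor 6; biases and BN omitted."""
    hidden = c_in * expansion
    return c_in * hidden + k * k * hidden + hidden * c_out

standard = standard_conv_params(3, 128, 128)
separable = depthwise_separable_params(3, 128, 128)
block = inverted_residual_params(32, 32)
```

For a 3×3 layer with 128 input and 128 output channels, the depthwise separable version needs 17,536 weights instead of 147,456, roughly an 8.4-fold reduction.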
Squeeze-and-excitation (SE) modules, which recalibrate channel-specific feature responses to emphasize critical features, are not part of the original MobileNet-V2 design; they were added to the MobileNet family in MobileNetV3. For class prediction, the architecture concludes with a fully connected layer with one output per class, followed by the 'LogSoftmax' activation function. FIGURE 2(b) depicts the architecture of MobileNet-V2 in detail. Its design accommodates a variety of applications and constraints, which has made MobileNet-V2 widely used in computer vision and deep learning [32]. To fine-tune the model further, we adjusted the loss weights exponentially from deeper to shallower layers, as shown in Eq. (2):
$$L_{i(\text{adjusted})} = \left(\frac{1}{2}\right)^{i} L_i \quad (2)$$
Here, following a (1/2)^i exponential decay, L_i(adjusted) is the adjusted loss for the output L_i of head i. The index i = 0 denotes the final output, whose loss keeps a weight of 1 (i.e., L_i(adjusted) = L_i), while the loss weights of shallower heads decrease progressively. Accordingly, the losses of auxiliary outputs 1 and 2 are multiplied by 1/4 and 1/2, respectively (i = 2 and i = 1). The features feeding the auxiliary outputs and the final output pass through custom branches that produce outputs of the same form: every branch predicts the six classes of the dataset, so RWC-Net is deeply supervised. For the auxiliary outputs, we extracted features from the inverted residual modules of MobileNet-v2 and combined them with features from DenseNet201. The final output was produced by concatenating the features of the last CNN layers of both models, followed by adaptive average pooling, flattening, and a linear Multi-Layer Perceptron (MLP) layer and classifier. The auxiliary branches and the loss weighting were inspired by the Inception-v3 architecture [31], and our auxiliary branches mirror the structure of those in the original Inception-v3 model. The feature maps were average pooled with large kernels (e.g., 5 × 5 or 7 × 7) and then compressed by a convolutional block with a (1,1) kernel; a second convolutional block with a (1,1) kernel then restored the original dimension of the feature map. Pooling features with (1,1) kernels in this way helped consolidate the features across all three branches. After flattening to a one-dimensional feature vector, the features were passed through a linear MLP block and classifier. The MLP block has as many input neurons as there are features in the flattened vector and as many output neurons as there are output classes (six, one for each waste category). For classification, both the auxiliary and final outputs use 'LogSoftmax' activations, defined in Equation (3):
$$\mathrm{LogSoftmax}(x_i) = \log\left(\frac{\exp(x_i)}{\sum_{j}\exp(x_j)}\right) \quad (3)$$
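Eqs. (2) and (3) combine into a single training objective. The following plain-Python sketch applies log-softmax to each head's logits, takes the negative log-likelihood of the target class, and weights head i by (1/2)^i. Summing the weighted losses is our assumption, since the text does not state how they are combined, and all function names are illustrative.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax, Eq. (3): x_i - log(sum_j exp(x_j))."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def nll(log_probs, target):
    """Negative log-likelihood of the target class; applied to log-softmax
    outputs, this equals the cross-entropy loss."""
    return -log_probs[target]

def deeply_supervised_loss(head_logits, target):
    """Weight each head's loss by (1/2)**i, Eq. (2).

    head_logits[0] is the final output (weight 1); head_logits[1] and
    head_logits[2] are auxiliary outputs 2 and 1 (weights 1/2 and 1/4).
    Assumption: the weighted losses are summed into one objective.
    """
    return sum((0.5 ** i) * nll(log_softmax(logits), target)
               for i, logits in enumerate(head_logits))

# Three heads that are all uniformly unsure over the six waste classes.
loss = deeply_supervised_loss([[0.0] * 6] * 3, target=2)
```

In PyTorch, `nn.LogSoftmax(dim=1)` followed by `nn.NLLLoss()` computes the same per-head loss as `nn.CrossEntropyLoss()` applied to raw logits, which is why a 'LogSoftmax' output layer pairs naturally with the cross-entropy loss.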
The RWC-Net architecture thus exploits the complementary strengths of DenseNet201 and MobileNet-v2 to improve accuracy in waste image classification.
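The fusion step at the end of RWC-Net can be summarized as shape bookkeeping. The sketch assumes that the "final CNN layers" are the standard last feature layers of the ImageNet backbones (1920 channels for DenseNet201 and 1280 for MobileNet-v2, both 7 × 7 for a 224 × 224 input), and it collapses the MLP and classifier into a single linear layer for brevity; the constants and the function are hypothetical.

```python
# Assumed final feature widths of the standard ImageNet backbones.
DENSENET201_CHANNELS = 1920
MOBILENET_V2_CHANNELS = 1280
NUM_CLASSES = 6

def fused_head_shapes(batch, height, width,
                      c_a=DENSENET201_CHANNELS, c_b=MOBILENET_V2_CHANNELS,
                      num_classes=NUM_CLASSES):
    """Shapes for: concatenate -> adaptive average pool -> flatten -> linear.

    Channel-wise concatenation requires both feature maps to share the same
    spatial size (height x width). The linear layer stands in for the MLP
    and classifier (simplifying assumption).
    """
    fused = c_a + c_b
    concat = (batch, fused, height, width)     # channel-wise concatenation
    pooled = (batch, fused, 1, 1)              # adaptive average pooling to 1 x 1
    flat = (batch, fused)                      # one 1D feature vector per image
    logits = (batch, num_classes)              # linear layer: fused -> 6 classes
    linear_params = fused * num_classes + num_classes   # weights + biases
    return concat, pooled, flat, logits, linear_params

shapes = fused_head_shapes(8, 7, 7)
```

Under these assumptions, each image contributes a 3,200-dimensional fused feature vector, and a single linear layer mapping it to the six classes has 19,206 parameters.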

D. EXPERIMENTS
In our experiments, several deep learning models were trained to classify waste into the six categories. The investigation was conducted on the TrashNet dataset of 2,527 images. To ensure reliable model training and evaluation, the training set was augmented to 15,000 images, while 252 images were reserved for validation and 504 for testing. This split follows the standard 70%/10%/20% allocation for training, validation, and testing in each fold of the 5-fold cross-validation.

VOLUME 12, 2024
A 5-fold cross-validation strategy was employed throughout, so that the entire dataset is assessed across the five test sets (5 × 20% = 100%).

E. QUANTITATIVE EVALUATION
In this study, we used a variety of deep learning models to classify six waste categories: cardboard, glass, plastic, paper, metal, and litter. To evaluate the models' performance, we used established evaluation metrics: accuracy, precision, recall (sensitivity), specificity, and the F1-score. These metrics were computed from the confusion matrix, which records the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). Each metric is reported with a 95% confidence interval (CI), which indicates the reliability and robustness of the evaluation outcomes. The CI for each evaluation metric was calculated using Eq. (4) [34]:
$$\mathrm{CI} = \mathrm{metric} \pm z\sqrt{\frac{\mathrm{metric}\,(1-\mathrm{metric})}{N}} \quad (4)$$
where N is the number of test samples and z is the standard normal critical value for a 95% CI, which is 1.96. All values were computed from the global confusion matrix, which aggregates the test results of all five folds of the cross-validation in each investigation. Eqs. (5) to (9) give the formulations of accuracy, precision, recall (sensitivity), specificity, and F1-score used for the per-class evaluation in our study [35], [36].

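$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (5)$$
$$\mathrm{Precision} = \frac{TP}{TP + FP} \quad (6)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \quad (7)$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \quad (8)$$
$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (9)$$
To account for class imbalance, all metrics except accuracy were weighted. For accuracy, we report the overall value derived from the confusion matrix of the entire dataset.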
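Eqs. (4) to (9) and the class weighting can be sketched in plain Python over a confusion matrix whose rows are true classes and whose columns are predictions. The helper names and the row/column convention are assumptions; weighting each class by its share of the samples (its support) is one standard reading of "weighted" metrics.

```python
import math

def per_class_counts(cm, c):
    """TP, FP, FN, TN for class c; cm[i][j] counts true class i predicted as j."""
    total = sum(map(sum, cm))
    tp = cm[c][c]
    fn = sum(cm[c]) - tp
    fp = sum(row[c] for row in cm) - tp
    tn = total - tp - fn - fp
    return tp, fp, fn, tn

def weighted_metrics(cm):
    """Support-weighted precision, recall, specificity, and F1, Eqs. (6)-(9),
    plus overall accuracy computed from the whole confusion matrix."""
    names = ("precision", "recall", "specificity", "f1")
    out = dict.fromkeys(names, 0.0)
    total = sum(map(sum, cm))
    for c in range(len(cm)):
        tp, fp, fn, tn = per_class_counts(cm, c)
        weight = (tp + fn) / total                # class support / all samples
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        s = tn / (tn + fp) if tn + fp else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        for name, value in zip(names, (p, r, s, f)):
            out[name] += weight * value
    out["accuracy"] = sum(cm[c][c] for c in range(len(cm))) / total
    return out

def ci_half_width(metric, n, z=1.96):
    """Half-width of the 95% CI in Eq. (4): z * sqrt(metric * (1 - metric) / N)."""
    return z * math.sqrt(metric * (1 - metric) / n)

cm = [[5, 1, 0], [2, 6, 2], [0, 1, 8]]   # toy three-class confusion matrix
metrics = weighted_metrics(cm)
```

With support weighting, the weighted recall is mathematically identical to the overall accuracy, because each class contributes TP/N; this is consistent with the recall and accuracy of 95.01% reported in Section IV. For a metric of 0.95 over N = 2,527 test images, Eq. (4) gives a half-width of about 0.0085.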
We also present the confusion matrix of the best-performing model.

F. QUALITATIVE EVALUATION
Class Activation Mapping (CAM) techniques were used for the qualitative evaluation of our CNN-based deep learning models [37]. CAM generates a weighted activation map for each image based on the model's prediction, highlighting the regions that most influence that prediction. Among several advanced CAM techniques, including Grad-CAM [38], Grad-CAM++ [39], Smooth Grad-CAM++ [40], and Score-CAM [41], we conducted our analysis with Score-CAM. This method generates weighted heatmaps for test images in each class, enabling visualization of the classifier's class-specific learning. Unlike gradient-based variants such as Grad-CAM, which weight activation maps by backpropagated gradients, Score-CAM is gradient-free: it weights each activation map by the trained model's score for the target class when the input is masked by that map. Complementing the quantitative metrics, this qualitative evaluation gives a deeper understanding of the model's performance, clarifies how the model arrives at its predictions, and provides additional validation for the CNN-based models used in this study.
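The core of Score-CAM can be illustrated with a toy example in plain Python. This is a simplified sketch, not the reference implementation: it assumes single-channel images stored as nested lists, activation maps already at input resolution (real implementations upsample them), an all-zero baseline input, and a caller-supplied `class_score` function standing in for the trained network. Published implementations differ in details such as upsampling and how the scores are normalized.

```python
def score_cam(image, activation_maps, class_score):
    """Toy Score-CAM saliency map for one target class.

    Assumptions: activation maps match the image size, and the baseline
    input is all zeros.
    """
    h, w = len(image), len(image[0])
    baseline = class_score([[0.0] * w for _ in range(h)])
    weights = []
    for a in activation_maps:
        lo = min(min(row) for row in a)
        hi = max(max(row) for row in a)
        span = (hi - lo) or 1.0
        # Normalize the activation map to [0, 1] and use it as a soft mask.
        mask = [[(a[i][j] - lo) / span for j in range(w)] for i in range(h)]
        masked = [[image[i][j] * mask[i][j] for j in range(w)] for i in range(h)]
        # Weight = increase in the target-class score when only the region
        # highlighted by this map is kept (no gradients are used).
        weights.append(class_score(masked) - baseline)
    # Weighted sum of the activation maps followed by ReLU.
    return [[max(0.0, sum(wk * a[i][j] for wk, a in zip(weights, activation_maps)))
             for j in range(w)] for i in range(h)]

# Toy usage: the class evidence lies in the top-left pixel only.
W = [[1.0, 0.0], [0.0, 0.0]]

def class_score(x):
    """Stand-in for the network's target-class score (a fixed linear model)."""
    return sum(x[i][j] * W[i][j] for i in range(2) for j in range(2))

image = [[1.0, 1.0], [1.0, 1.0]]
maps = [[[1.0, 0.0], [0.0, 0.0]],   # activation map covering the top-left pixel
        [[0.0, 0.0], [0.0, 1.0]]]   # activation map covering the bottom-right pixel
saliency = score_cam(image, maps, class_score)
```

Only the map that overlaps the class evidence raises the score, so the resulting saliency concentrates on the top-left pixel, mirroring how the heatmaps in Section IV highlight the regions that drive each prediction.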

G. IMPLEMENTATION DETAILS
The algorithm was implemented in the PyTorch deep learning framework to train the waste classification models. Training was performed on Google Colab Pro with a single NVIDIA Tesla T4 GPU (15 GB of GPU memory), a 2-core Intel Xeon CPU at 2.00 GHz, and 26 GB of system memory. All experiments used Python 3.10.12 and PyTorch 1.11.0.

IV. RESULTS AND DISCUSSION
This section presents the performance of our deep learning models in classifying the six waste categories described above, covering both quantitative and qualitative evaluations of the experimental outcomes.
We also compare our results with various state-of-the-art models from the waste management literature. While analyzing our findings, we shed light on the limitations we encountered in our research and discuss prospective avenues for future research and development to address them.

A. CLASSIFICATION RESULTS OF THE DEEP LEARNING MODELS
The cumulative results of the 5-fold experiments for our deep learning models are reported in Table 2, highlighting their performance metrics and their ability to classify waste images accurately. Supplementary Table 1 provides a detailed breakdown of the 5-fold results for each model. We also explored various optimizers and learning rates to determine the most effective combination; the results of this exploration are presented in Supplementary Table 2 for comparison.
The RWC-Net model was developed to leverage the distinct feature extraction capabilities of two models, DenseNet201 and MobileNet-v2. The resulting model achieved the highest F1-score, 95.01%, outperforming various state-of-the-art models across a range of performance metrics. Its overall accuracy was likewise 95.01%, with a precision of 95.04%, a recall of 95.01%, and a specificity of 98.88%. These results demonstrate the model's proficiency in classifying waste images. Table 3 presents class-based evaluation metrics for a more thorough evaluation. As the table shows, the "cardboard" class received the highest F1-score (97.24%) and the "litter" class the lowest (88.55%). The remaining classes all obtained F1-scores of 93.67% or higher. The "litter" class was particularly difficult because of its limited representation in the original dataset, which contains only 137 litter images, showing small-sized waste objects. Because this class makes up a small share of the dataset, its lower score had only a minor effect on the model's overall performance. For further transparency, Supplementary Table 3 presents the fold-wise evaluation metrics.
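As a quick consistency check on the class-wise results, the support-weighted mean of the per-class F1-scores can be recomputed. Because the five test folds together cover all 2,527 images, each class's support in the global confusion matrix equals its size in the dataset. The class sizes below are those of the public TrashNet release, an assumption on our part since Table 1 is not reproduced here; the litter count of 137 matches the text.

```python
# Per-class F1-scores reported in this section (%).
F1 = {"cardboard": 97.24, "glass": 96.18, "metal": 94.00,
      "paper": 95.73, "plastic": 93.67, "litter": 88.55}
# Class sizes of the public TrashNet release (assumed here; litter = 137 as stated).
SUPPORT = {"cardboard": 403, "glass": 501, "metal": 410,
           "paper": 594, "plastic": 482, "litter": 137}

def weighted_average(scores, support):
    """Average per-class scores, weighting each class by its number of samples."""
    total = sum(support.values())
    return sum(scores[name] * support[name] for name in scores) / total

weighted_f1 = weighted_average(F1, SUPPORT)   # support-weighted, as in Section III-E
macro_f1 = sum(F1.values()) / len(F1)         # unweighted mean, for contrast
```

The weighted mean comes out at about 95.0%, matching the reported overall F1-score of 95.01% to within the rounding of the per-class values, whereas the unweighted (macro) mean is about 94.2% because the small litter class counts as much as the larger classes.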
FIGURE 3 presents the class-based confusion matrix of our proposed model, combining the results of all five folds. It illustrates the model's ability to classify waste materials into the six categories, with accurate and reliable classification across every category.

B. CAM BASED QUALITATIVE EVALUATION
In this section, we evaluate the proposed RWC-Net model using saliency maps, specifically class activation maps (CAMs) generated with the Score-CAM method. These heatmaps show whether the model focuses on the most relevant regions of an image when classifying waste types. FIGURE 4 shows four images from each of the six waste categories, each accompanied by its CAM heatmap. The images were chosen at random from the original five-fold dataset to give a representative assessment of the model's overall behavior.
The Score-CAM heatmaps illustrate where the classification model focuses its attention when classifying different waste categories. FIGURE 4 demonstrates that, in most cases, the model focuses primarily on the object's centre. For larger waste items that occupy a significant portion of the image, the model attends to multiple regions within the object. The model consistently concentrates on the waste-containing regions of an image across waste categories and object sizes, which indicates that it localises and identifies waste objects accurately. The qualitative evaluation with CAM-based saliency maps thus provides valuable insight into the model's behavior and supports its capacity to identify and classify a broad variety of waste categories.

C. COMPARISON OF RWC-NET PERFORMANCE WITH EXISTING WORKS
The TrashNet dataset, published in 2016, has become the standard benchmark for waste image classification and has been used in a variety of research projects. However, several studies did not employ cross-validation. This can introduce bias into the assessment and, when test images influence training or model selection, data leakage. To preserve the integrity of the evaluation, we adopted a five-fold cross-validation strategy. In each fold, 20% of the dataset is reserved for testing. Every image is therefore tested exactly once, by a model that did not see it during training. This strategy makes our results more reliable and more representative of the whole dataset. Table 4 compares the performance of our model with several state-of-the-art studies on the TrashNet dataset.
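The five-fold protocol described above can be sketched as a stratified split in which every image lands in exactly one test fold of about 20%. This is a minimal sketch under assumptions. The text states five folds with 20% test data per fold, but the stratification, the shuffling seed, and the function name are illustrative choices. The validation subset listed in Table 1 would be carved out of each fold's training indices, which is not shown here.

```python
import numpy as np

def stratified_kfold(labels, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal each class round-robin so every fold receives ~1/n_folds of it,
        # keeping small classes represented in every test split
        fold_of[idx] = np.arange(len(idx)) % n_folds
    for k in range(n_folds):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)
```

For example, `list(stratified_kfold(labels))` on the 2,527 TrashNet labels gives five disjoint test sets whose union is the full dataset, each holding about 20% of the images.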
In both classical machine learning and deep learning, k-fold cross-validation is important because it evaluates a model's performance across the entire dataset rather than on a single split. A review of previous research, summarized in Table 4, shows that most studies did not employ cross-validation. This omission can introduce bias and data leakage into the test set, which undermines the reliability of the performance metrics reported on the TrashNet dataset. As a concrete example, one study [48] employed 9-fold cross-validation and reported an overall F1-score of 93.68%. With five-fold cross-validation, our model achieved an overall F1-score of 95.01%. This result attests to the effectiveness of the proposed model and underscores the potential of RWC-Net for real-world waste management applications.
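When comparing overall scores such as 93.68% and 95.01%, it helps to know how much a metric could vary with the number of test samples. One common way to attach a 95% confidence interval to a proportion-like metric is the normal (Wald) approximation, z·sqrt(p(1−p)/n). The sketch below assumes this formula and z = 1.96. It is only a heuristic for F1, and we do not claim it is the exact procedure behind the intervals in Tables 2 and 3. The function name is hypothetical.

```python
import math

def wald_ci(metric, n, z=1.96):
    """Normal-approximation CI for a proportion-like metric measured on n samples."""
    half_width = z * math.sqrt(metric * (1.0 - metric) / n)
    return metric - half_width, metric + half_width
```

Under this assumption, `wald_ci(0.9501, 2527)` gives a half-width of roughly 0.85 percentage points around 95.01%.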

D. LIMITATIONS OF OUR WORK AND FUTURE DIRECTIONS
Our research aimed to improve the efficiency of waste management systems by classifying six distinct categories of recyclable waste using the TrashNet dataset. Although the dataset provides a foundation for this endeavour, it has limitations that affected the performance of our deep learning models. The most significant limitation is its relatively small size: 2,527 images in total. The 'litter' class, with only 137 images, is especially underrepresented. This was insufficient for robust model training. The litter class, which consists of small waste items not covered by the other five categories, achieved the lowest F1-score, 88.50%. The disparity in class representation hinders the model's ability to recognize and classify these items effectively. The dataset also lacks bounding boxes and segmentation masks. Each image depicts a single waste item on a white background, which limits its applicability to more complex waste detection and segmentation tasks. In practical waste management scenarios, the ability to identify waste within larger scenes and to localize it precisely via bounding boxes or segmentation masks would be invaluable. Furthermore, the dataset includes only six categories of recyclable waste, whereas many more waste types are encountered in real waste management every day. Expanding the dataset to a broader range of recyclable waste categories would give a more precise and extensive representation of real waste management challenges. To address these limitations, future waste management research should prioritize collecting larger and more diverse datasets. This may involve gathering additional images to improve class representation and to cover more waste varieties. In addition, annotating waste images with bounding boxes or segmentation masks is a promising direction for improving waste detection and segmentation techniques. Such work can contribute to more efficient and comprehensive waste management solutions that account for the complexities of real waste scenarios.
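The class imbalance described above can be quantified directly. The sketch below uses the commonly reported TrashNet class sizes (cardboard 403, glass 501, metal 410, paper 594, plastic 482, litter/trash 137, summing to 2,527). It computes inverse-frequency ("balanced") class weights, n_samples / (n_classes · n_c), a standard mitigation for imbalance. This study does not state that it used class weighting, so this is an illustrative sketch only. The function name is hypothetical.

```python
import numpy as np

# Commonly reported TrashNet class sizes (sum = 2,527); "litter" is TrashNet's "trash"
counts = {"cardboard": 403, "glass": 501, "metal": 410,
          "paper": 594, "plastic": 482, "litter": 137}

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * n_c).
    After weighting, every class contributes the same total weight, n_samples / n_classes."""
    n = np.array(list(counts.values()), dtype=float)
    w = n.sum() / (len(n) * n)
    return dict(zip(counts, w))

weights = balanced_class_weights(counts)
imbalance_ratio = max(counts.values()) / min(counts.values())  # largest / smallest class
```

Here the largest class (paper) is about 4.3 times the size of the litter class, and the litter weight comes out at about 3.07. Such weights could, for instance, be passed to a weighted cross-entropy loss.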

V. CONCLUSION
Recycling is crucial to minimizing waste and optimizing waste management procedures. Automatic classification tools powered by deep learning models can sort different types of waste, which can significantly improve processing efficiency and reduce operational costs in waste management. In our research, we developed a deep learning-based image classification model capable of categorizing six distinct waste types. The proposed RWC-Net model, which combines two well-known pretrained models, DenseNet201 and MobileNet-V2, performed very well at classifying waste images. By combining the features of these two networks and optimizing the loss function with two auxiliary outputs, we outperformed several existing models on the waste classification task with an overall accuracy of 95.01%. In addition, our model attained an accuracy of 94% or higher for five of the six waste categories. To ensure a thorough evaluation, the model was evaluated across all five folds of the dataset, which gives a representative picture of its capabilities. These results surpass several state-of-the-art models for waste image classification and represent a meaningful step toward automated waste recycling. As a further demonstration of the robustness of RWC-Net, we generated Score-CAM-based heatmaps for waste images. They show that the model attends to the waste objects when recognizing the various waste categories, which further supports its utility for waste classification. The proposed method is well suited for incorporation into waste sorting devices, where it could improve the efficiency of waste sorting and recycling. Future research may focus on improving classification accuracy, especially for the 'litter' category, and on waste detection, which may involve drawing bounding boxes around waste objects in image data. Given the variation in waste generation and recycling practices between nations, our future research will involve collecting waste images from various geographical regions to evaluate the adaptability and effectiveness of RWC-Net in diverse waste management systems.
MOHAMMAD NASHBAT received the B.Sc. degree in chemical engineering from the Jordan University of Science and Technology (JUST) and the M.Sc. degree in chemical engineering from Universiti Putra Malaysia (UPM). He holds a Post-Secondary Instructor Certificate from the Memorial University of Newfoundland. He has extensive professional experience that relates directly to chemical process engineering. He was a Lecturer of chemical process engineering technology with multiple institutions, including Memorial University, the College of the North Atlantic, the Jubail Industrial College, and the Malaysia-France Institute. In these roles, he designed and developed curricula, taught various courses, and supervised students' capstone projects. His experience as a lead instructor demonstrates his ability to deliver course material effectively and support students' learning. His professional experience also includes positions as an application process engineer, a technology consultant, and an offshore DCS engineer and senior training instructor. These roles highlight his practical knowledge of and hands-on experience with process engineering equipment and distributed control systems. His expertise in the operation and troubleshooting of offshore platforms and his training in simulation systems further enhanced his qualifications. Additionally, his work as a research assistant and his involvement in industry-related projects, such as converting palm oil mill waste to fertilizer and animal feed, and water treatment, further demonstrate his practical understanding of chemical processes.
AZAD ASHRAF is a Lecturer of chemical and process engineering with the University of Doha for Science and Technology (UDST). He has expertise in renewable energy, chemical engineering, solar power, life cycle assessment, and environmental science. He has years of experience working in the chemical, petrochemical, and wastewater industries in the USA and Canada. He was with Dow Chemical Company, Union Carbide, General Electric, and Crompton Corporation, USA. Previously, he worked in gas separation, hydraulic fluid, and energy audit research. He also worked in the industrial wastewater and energy audit field in Canada. He was appointed as an Environmental Officer and a Chemical Analyst for the United Nations Stabilization Mission in Haiti (MINUSTAH), where he was responsible for training 5000 military and police personnel from various contingents around the world in environmental auditing. In addition to his industrial and corporate experience, he has more than 12 years of teaching experience with colleges and universities in the USA, Canada, U.K., and Bangladesh. He was a Research Supervisor (Energy Audit Team) with McMaster University, Canada. He taught engineering with Georgian College, Canada; Chelsea College, London, U.K.; and American International University Bangladesh (AIUB), Dhaka, Bangladesh. He has authored or coauthored 29 articles in peer-reviewed journals. His research has focused primarily on air, water, and land pollution assessment with life cycle analysis of products and processes.

Open Access funding provided by 'Qatar National Library' within the CRUI CARE Agreement

FIGURE 1. An overview of our overall methodology for waste image classification.

3) ARCHITECTURE OF THE PROPOSED RWC-NET
The proposed RWC-Net model for waste image classification combines two deep convolutional neural network (DCNN) models: DenseNet201 and MobileNet-v2. The motivation for combining them was to leverage the complementary feature extraction and learning capabilities of the two networks. To achieve this, we used DenseNet201 and MobileNet-v2 models pretrained on the large-scale ImageNet dataset to obtain rich image representations. Multiple auxiliary outputs were added to optimize the loss function and improve the model's overall performance. FIGURE 2(c) depicts the architecture of RWC-Net, which includes two auxiliary outputs. The first auxiliary output extracts and concatenates features from the second dense block of DenseNet201 and the fifth inverted residual block of MobileNet-v2. The second auxiliary output is derived from the third dense block of DenseNet201 and combined with features from the final inverted residual block of MobileNet-v2. The final RWC-Net output is obtained by concatenating the DenseNet201 and MobileNet-v2 outputs, which yields a comprehensive representation that combines features from both networks.

FIGURE 3. The combined confusion matrix with class-based results.

FIGURE 4. Class activation maps (CAMs) generated by Score-CAM for the six waste categories, along with their respective original images.
MAZHAR HASAN-ZIA is a Lecturer of chemical and process engineering with the University of Doha for Science and Technology (UDST). He has expertise in renewable energy, chemical engineering, solar power, life cycle assessment, and environmental science. He has authored or coauthored a number of articles in peer-reviewed journals.

ALI K. ANSARUDDIN KUNJU is a Lecturer of chemical and process engineering with the University of Doha for Science and Technology (UDST). He has expertise in renewable energy, chemical engineering, solar power, life cycle assessment, and environmental science. He has authored or coauthored a number of articles in peer-reviewed journals.

SAIDUL KABIR received the B.Sc. degree (Hons.) from the Department of Electrical and Electronics Engineering, University of Dhaka, in 2022. His undergraduate thesis was on recognizing human activities from smartphone sensor data using CNN- and LSTM-based models. His research interests include artificial intelligence and computer vision.

MUHAMMAD E. H. CHOWDHURY (Senior Member, IEEE) received the Ph.D. degree from the University of Nottingham, U.K., in 2014. He was a Postdoctoral Research Fellow with the Sir Peter Mansfield Imaging Centre, University of Nottingham. He is currently an Assistant Professor and a Program Coordinator of the Department of Electrical Engineering, Qatar University. He is currently running NPRP, UREP, and HSREP grants from the Qatar National Research Fund (QNRF) and internal grants (IRCC and HIG) from Qatar University, along with academic projects from HBKU and HMC. He has filed several patents and published more than 180 peer-reviewed journal articles, more than 30 conference papers, and several book chapters. His current research interests include biomedical instrumentation, signal processing, wearable sensors, medical image analysis, machine learning, computer vision, embedded system design, and simultaneous EEG/fMRI. He is a member of British Radiology, ISMRM, and HBM. He has won the COVID-19 Dataset Award, the AHS Award from HMC, and the National AI Competition Awards for his contribution to the fight against COVID-19. His team was a gold medalist at the 13th International Invention Fair in the Middle East (IIFME). He has been listed among the world's top 2% of scientists in the list published by Stanford University. He is serving as a Guest Editor for Polymers, an Associate Editor for IEEE ACCESS, and a Topic Editor and a Review Editor for Frontiers in Neuroscience.

TABLE 1. The complete details of each fold's data split into training, validation, and test sets.

TABLE 2. The performance of different models for classifying the waste images, with 95% confidence intervals (CIs).

TABLE 3. The class-based performance of the proposed model on the different waste categories, with 95% confidence intervals (CIs).

TABLE 4. A comparison of our proposed model with existing research on the TrashNet dataset.