IncepX-Ensemble: Performance Enhancement Based on Data Augmentation and Hybrid Learning for Recycling Transparent PET Bottles

Recycling used plastic bottles is a significant step towards environmental protection and the reduction of land pollution. Lifestyle changes in developing countries such as South Korea have substantially increased the volume of plastic waste year by year. Plastic bottles of different types usually have different recycling values. In most countries, human labor is used to manually categorize and handle recyclable waste. This study aims to provide an automated classification system for recyclable transparent plastic bottles that can replace existing manual sorting methods. Transfer Learning (TL) and Ensemble Learning (EL) techniques have recently been studied for their usefulness in image classification. We first developed TL structures based on InceptionV3, Xception, ResNet152, and DenseNet169. Then, to improve performance further, we propose an ensemble model of InceptionV3 and Xception, named IncepX-Ensemble, to classify both well-presented and poorly presented transparent plastic bottle images. We also applied data augmentation to overcome the class imbalance problem before evaluating the proposed algorithm. In our experiments, the classification accuracy for transparent plastic bottles reached 99.76%. The potential uses and limitations of the proposed ensemble model have also been examined. This method provides image classification of transparent plastic bottles and has significant potential value for environmental protection and pollution control.


I. INTRODUCTION
THE rapid expansion in mineral water bottle and beverage bottle use has resulted in many problems, including resource depletion and degradation of the environment. Recycling is a crucial technique for dealing with these issues. Plastics that have been recycled are employed as raw materials in new products such as concrete, vehicles, and textiles.
Every minute, more than a million plastic bottles are sold worldwide. Nearly 13 million tons of plastic garbage are dumped into the oceans every year. Plastics take a long time to disintegrate. PET (polyethylene terephthalate) is a typical plastic bottle material that takes 450 years to completely degrade (naturally decompose).
In 2019, South Koreans consumed 23.5 billion single-use plastic goods, according to a market research analysis by Statista, which includes roughly 23.5 billion single-use plastic bags (per capita consumption: 460 single-use plastic bags), 4.9 billion PET bottles (96 PET bottles per person), and 3.3 billion plastic cups (per capita consumption: 65 plastic cups). In 2019, South Korea's single-use plastic per capita was estimated to be around 11.5 kg.
In 2018, South Korea's plastic waste recycling rate was about 44.8%, and about 8,100 plastic waste was generated every day. The plastic recycling rate has been hovering between 40% and 50% in the past ten years. The average domestic garbage collection rate in South Korea has exceeded 50%.
Plastic waste has been increasing day by day in South Korea recently; more than 450,000 tons of disposable plastic bags and 70,000 tons of PET bottles have been used in South Korea.
The Ministry of Environment is working to amend current laws to make it easier for consumers to recycle plastic bottles. Plastic bottles accounted for 22.7% of all trash found on beaches last year, making them the primary source of beach litter. Paper and food waste grew by 3.8% and 7.7%, respectively, over the previous year. Plastic bottles account for 23.9% of all trash in the ocean. Plastic packaging, such as PET bottles, accounts for most of the waste accumulated along Korea's coastlines. Compared with other beaches across the country, Jungmun Beach and Hamdeok Beach have significantly more plastic garbage. To raise awareness about the need for environmental preservation, the Beautiful Community Forum held an Environment Photo Exhibition and a Recycled Garbage Exhibition at Gangwon's Sokcho Beach and Jeju's Hamdeok Beach. More plastic bottles were detected on Jungmun Beach in Jeju, Sokcho Beach in Gangwon, and Daecheon Beach in Chungnam than on other beaches. Food, plastic, and metal garbage were found on Jeju's famous Hamdeok Beach, Pohang's Wolpe Beach, and Busan's Songjeong Beach.
The highest recycling value is seen in PET water bottles. Dirty plastic and food-contaminated plastic cannot be recycled; clean plastic is the only kind that can be. Bulky plastics should be directed to other waste disposal routes, and disposable plastics that are difficult to recycle should be discarded. At the same time, recycling conserves non-renewable materials and complements the environmental benefits of biodegradable plastics made from renewable resources. Another thing to consider is that plastic production releases significant amounts of greenhouse gases that contribute to global warming. Recycling reduces oil and energy consumption, which in turn reduces emissions of greenhouse gases such as carbon dioxide. Plastic recycling is also very beneficial at the social level. Recycling gives materials a second life: technology is now applied to turn waste plastic bottles into high-quality consumer items. Clear plastic bottles that have been recycled have been reborn as clothing, eco-friendly purses, and cosmetic bottles, among other high-end products. Only 30%-40% of the community's old plastic bottles were recycled into high-quality consumer goods.
As a result, things made from a single material are the most straightforward to recycle. Water bottles, for example, are composed entirely of PET plastic [1]. Sorting waste is the first stage in the recycling process. Plastic bottle sorting can be accomplished in a variety of ways, using a range of manual and automatic methods. Manual classification has gradually faded as science and research have advanced, and improvements in machine-learning image classification have made sorting more efficient. PET material does not need to be separated into different colors or processed in batches. We need to categorize each clear plastic bottle for the recycling process, and our proposed model can efficiently classify transparent plastic bottles. Sorting must therefore begin with the type of plastic resin used. The plastic bottles are squeezed or crushed when the separation process is completed. This helps with the conversion to pellets while also reducing the amount of space taken up by the bottles. A grinding machine is then used to break down the pellets into smaller pellets. These plastic pellets can be used to make a variety of items.
The image classification of bottles is efficient, has considerable potential, and adds research value in reducing environmental pollution. Classifying the type of plastic bottle from its image has received less attention. The image classification process is cost-effective but can be time-consuming. Learning from limited images is a significant challenge in image classification using deep learning (DL) [2], because deep models need many images to train successfully and produce correct results. Transfer learning is offered in this work as a solution to this challenge. For a variety of effective DL models, TL is the key to success [3]. These models are pretrained on a source dataset and then fine-tuned for the target task. The most well-known pretraining dataset is ImageNet, which is widely used to improve image processing tasks such as segmentation, detection, and classification [4], [5].
Trash classification, which has recently gained much attention from researchers, is also a viable industrial application of computer vision. CNN algorithms require a large amount of training data and considerable computation time for training. Therefore, in our research we compare four distinct CNN models pre-trained on the ImageNet dataset for classifying transparent plastic bottle images. On the PET-image dataset, we first look at fine-tuning the weights. We compare the applicability of a linear decay learning rate schedule and cyclical learning rates for fine-tuning pretrained CNNs for transparent plastic bottle image categorization. Despite deep learning's higher performance in many areas, traditional machine learning is increasingly preferred over deep learning, as represented by CNNs, because of cost and other concerns [6]. The main contributions of this study are as follows:
• This research begins by gathering 1667 images from six different classes. We propose an ensemble model for evaluating the PET-bottle dataset, and accuracy is improved by using deep neural networks. Our main contribution is an ensemble model that enhances classification results for plastic bottle recycling.
• A data augmentation technique balances the imbalanced transparent plastic bottle dataset, reducing effort and computation.
• Our proposed module, IncepX-Ensemble, achieves higher classification accuracy by using a weighted average ensemble, together with a more balanced single-category classification ability.

II. RELATED WORK
The unique quality of convolutional neural networks, i.e., learning features directly from images, has contributed to unprecedented improvements in deep-learning image classification and recognition. However, achieving good results in image classification with CNNs depends on a vast, appropriately labeled dataset and, most importantly, on considerable computational resources. We applied a transfer learning strategy to reduce training time and accelerate convergence. It transfers generic information from pre-trained CNNs to build robust classifiers for a new target task. Instead of training the model from scratch, it uses another model trained on a similar problem to transfer the learned knowledge of the pre-trained model to a new model, which then learns some new features. A computer-aided deep learning framework proposed by Liang et al. [7] for image classification of pneumonia in children makes use of a residual network combined with dilated convolution. The proposed model achieved a recall and F1-score of 96.7% and 92.7%, respectively. Much work has been done on pneumonia detection using transfer learning. The authors of [8] proposed an ensemble approach that used chest X-ray images to classify pneumonia and achieved 96.4% accuracy and a recall of 99.62%. Waste classification has a significant impact on the environment. The well-known TrashNet dataset was used with a convolutional neural network architecture (DenseNet121) in [9], where the authors achieved an accuracy of 99.60%. There has also been a lot of research on COVID-19 classification based on chest CT images. The authors of [10] proposed a COVID-19 detection model named CCSHNet to improve diagnosis. The model was based on a transfer learning algorithm and the CCSHNet dataset, and it achieved 95.61%, 96.25%, 98.30%, and 97.86% accuracy for four classes, respectively. Yinghao Chu et al. [11] used a CNN-based method in their proposed multilayer hybrid deep-learning system to classify waste, including plastic, metal, and paper waste, and achieved 90% accuracy. The authors of [12] proposed a smartphone application named SpotGarbage that uses a deep learning architecture to detect and segment garbage in images. The proposed model was trained on the GINI dataset and achieved a mean accuracy of 87.69%. The authors of [13] proposed a solution for sorting waste through the development of a stable model that can autonomously classify trash using deep neural networks. In their experiments, the authors used the ResNet model to increase predictive performance and achieved an accuracy of 94% on the DNN-TC dataset and 98% on the TrashNet dataset [14]. Next, a CNN-based architecture was introduced to classify waste for recycling purposes. The authors of [15] proposed a model using VGG, Inception, and ResNet to detect waste types and achieved 88.6% accuracy. To classify waste, the authors of [16] proposed a combination of three CNN models and achieved 96.5% and 94% accuracy on two waste image datasets. Intelligent waste management plays an essential role in developing an environment; with this in mind, the authors of [17] proposed a smart E-bin for waste classification using CNN and achieved 96% classification accuracy. Another study addressed waste classification on a dataset with seven classes using the DenseNet121 module and reported 93.3% accuracy [18]. An experiment on waste bottle image classification using ResNet with a serial attention frame achieved an mAP value of 94.1% [19]. A garbage classification method using the YOLOv3 network was proposed in [20] to reduce pollution and maximize the recycling of resources; it introduced the classification concept and achieved a recognition rate of 95.33%. A machine learning algorithm was proposed in [21] for automatic waste sorting machines with low computational capacity, using MobileNetV2 and Inception pre-trained models, and achieved 99.75% accuracy.

III. METHODOLOGY
This section first describes the data augmentation techniques and the PET bottle dataset used in our experiment. It then explains the overall architecture and the implementation details of each model. Finally, we give an overview of the proposed model and describe how we implement the weighted average ensemble technique to improve classification results.

A. DATA AUGMENTATION
In our experiment, to avoid overfitting, we artificially enlarge the dataset. Results may vary when additional web or real-life data are collected by others. After collecting the data for each class, we enlarge the dataset using the following six data augmentation methods [22]:
1) Affine transformation: Lines that are parallel before the transform remain parallel after it. Affine transformations include scaling, translation, and rotation. In computer graphics, a transformation matrix is a convenient tool for performing affine transformations: it is a matrix that multiplies the coordinates of a point to produce the transformed point, as defined in equation 1 [23]. An affine transformation is conveniently represented by a 2 x 3 matrix, which is multiplied by the column vector [x y z], where (x, y) are the coordinates of the point and z is fixed at 1 so that the translation terms in the third column of the matrix are applied. Multiplying the 2 x 3 matrix by the 3 x 1 vector yields a 2 x 1 vector with the new point coordinates.
2) Rotation: The rotation function shown in equation 2 is applied, where θ is between 10 and 180 degrees.
3) Vertical flip: With a specified probability, the vertical flip augmentation flips the input image about its horizontal axis, so that the top and bottom are exchanged. Equations 5 and 6 give the vertical flip formula.
4) Shear: Each image is sheared using the affine transformation in equation 7, where s defines the amount by which the image is sheared and lies in the range [0.1, 0.35].
5) Scaling: We scale each image in either the x or y direction, as shown in equation 8.
6) Zoom range: This method randomly zooms the image. Zooming in enlarges the image content, while zooming out adds pixels around the image. The zoom_range argument of the ImageDataGenerator class is used for this purpose. If a single float value is provided, the zoom factor is drawn from [1-floatValue, 1+floatValue]. As in the case of the brightness parameter, zoom has boundary values: the image is magnified if the zoom value is less than 1.0, and it is zoomed out if the zoom value is greater than 1.0.
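The affine operations listed above can be illustrated with a short NumPy sketch. This is only an illustration under stated assumptions, not the authors' augmentation pipeline: the helper names (rotation, shear_x, scale, vertical_flip, apply_affine) and the sample points are hypothetical, and the matrices follow the standard textbook forms rather than the paper's own equations, which are not reproduced in this text.

```python
import numpy as np

def rotation(theta_deg):
    """2 x 3 matrix rotating points by theta degrees about the origin."""
    t = np.deg2rad(theta_deg)
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0]])

def shear_x(s):
    """2 x 3 matrix shearing along x by factor s (the text uses s in [0.1, 0.35])."""
    return np.array([[1.0, s,   0.0],
                     [0.0, 1.0, 0.0]])

def scale(sx, sy):
    """2 x 3 matrix scaling x by sx and y by sy."""
    return np.array([[sx,  0.0, 0.0],
                     [0.0, sy,  0.0]])

def vertical_flip(height):
    """2 x 3 matrix mapping row y to (height - 1) - y, i.e. top and bottom swap."""
    return np.array([[1.0,  0.0, 0.0],
                     [0.0, -1.0, height - 1.0]])

def apply_affine(matrix, points):
    """Apply a 2 x 3 affine matrix to an (N, 2) array of (x, y) points."""
    pts = np.asarray(points, dtype=float)
    # Append the constant third coordinate (1) so translation terms take effect.
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return homogeneous @ matrix.T

# Hypothetical sample points, chosen only to make the effect easy to check by hand.
rotated = apply_affine(rotation(90), [[10.0, 0.0]])
sheared = apply_affine(shear_x(0.2), [[10.0, 5.0]])
flipped = apply_affine(vertical_flip(224), [[3.0, 0.0]])
```

In practice, an image library applies the same matrices to pixel grids and interpolates the result; the point-wise version here only shows how one 2 x 3 matrix covers rotation, shear, scaling, and flipping.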

B. DATASET
Our dataset has six classes of plastic bottle images collected from industry in Korea, and all images show real-life plastic bottles. This project aims to classify plastic bottles correctly before they enter a recycling machine. Few plastic bottle datasets are available, and those that exist are too small to use for classification. The PET-bottle dataset was designed specifically for this research and includes six plastic bottle classes defined by clear bottles, colored caps, and labels on the bottles. The choice of dataset follows from the application requirements. The dataset consists of bottle images taken from different angles. The recycling machine must detect different kinds of bottles: actual bottles are placed on the conveyor belt as input, and the system sorts them to determine whether each plastic bottle is recyclable. The background does not change because it is always the conveyor belt. This setup provides fast recognition and high accuracy. Its main drawback is that a bottle is not recognized if it is damaged, so a machine using this technique cannot detect heavily used bottles.
Little publicly available data exists for plastic bottles, and the available datasets contain mixed categories rather than plastic bottles specifically. The data collected for our experiment are not publicly available. We first collected 1667 plastic bottle images and then organized the data according to the specifications of the experimental setup. Unlike other classification datasets, each image in the PET bottle dataset contains only one object, a plastic bottle, against a plain background. Such an image is easy for the human eye to interpret but not for a recycling machine. There are no other objects in the image that could provide additional information. The original image sizes range from 2240 x 2240 to 3024 x 3024 pixels. We first scaled all the images to 224 x 224 pixels for our experimental setup. Another issue is that the dataset is small. Because the input is merely raw images (for a computer, 3-dimensional arrays with height, width, and channels), pre-processing is required before they can be classified into the provided labels.
Before conducting our research, we applied a simple sorting to our PET bottle dataset. There are two types of transparent plastic bottles: mineral water bottles and beverage bottles. Mineral water bottles contain plain drinking water, whereas beverage bottles contain colored, flavored, soda-type liquid. We first cleaned the beverage bottles; as a general rule, a quick rinse is more than adequate. For most items, it is enough to fill the container with water and vigorously swish the water around the inside. If the residue is sticky, a scrub brush or scraper may be needed to remove more of the product and make the bottle more suitable for recycling. The labels on most plastic bottles must be removed. This non-plastic trash cannot be recycled and may compromise the structural integrity of the finished product. In our dataset, transparent plastic bottles have different designs and labels on their bodies and differently colored caps. We define three classes according to design and cap color: Bottle_ShapeA, Bottle_ShapeB, and Bottle_ShapeC. The other three classes are Masinda and Samdasoo, two well-known mineral water brands, and Pepsi as a beverage bottle. Table 1 and Table 2 show the datasets before and after augmentation, respectively.
Each image belongs to one transparent plastic bottle category, and each labeled class is stored in a distinct folder. We divided the images of each class into 60% for training, 10% for validation, and 30% for testing. To do so, we first split the dataset into 60% training data and 40% holdout data. We then split the holdout data into validation data (25% of the holdout data, i.e., 10% of the total) and test data (75% of the holdout data, i.e., 30% of the total). The following subsections provide an overview of the whole dataset. The transparent plastic bottle images are divided under the following class names: Bottle_ShapeA, Bottle_ShapeB, Bottle_ShapeC, Masinda, Pepsi, Samdasoo. The primary purpose of training the model on balanced images is to improve its learning stage. Because there is not enough data for deep learning training, an augmentation procedure is used to boost classification accuracy. We used two TL models for our image classification task, the InceptionV3 and Xception networks [24]. A weighted average ensemble approach was then developed that gives more weight to the more accurate classification results. Finally, to check the efficacy of the proposed method, the accuracy, precision, recall, and F1-score are calculated. The workflow of our models is shown in Fig. 1.
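The 60/10/30 split can be checked with a few lines of Python. The sketch below is an assumption-laden illustration: the per-class image counts are those reported for the dataset before augmentation, while the rounding rule (round the training and validation counts, give the remainder to the test set) and the name split_counts are hypothetical choices, not something the text specifies.

```python
# Per-class image counts before augmentation, as reported for the PET bottle dataset.
CLASS_COUNTS = {
    "Bottle_ShapeA": 169,
    "Bottle_ShapeB": 238,
    "Bottle_ShapeC": 41,
    "Masinda": 249,
    "Pepsi": 339,
    "Samdasoo": 631,
}

def split_counts(n, train_frac=0.6, val_frac=0.1):
    """Return (train, validation, test) sizes for n images.

    Assumed rule: round the train and validation shares and assign the
    remainder to the test set so that the three parts always sum to n.
    """
    train = round(n * train_frac)
    val = round(n * val_frac)
    test = n - train - val
    return train, val, test

splits = {name: split_counts(n) for name, n in CLASS_COUNTS.items()}
# Column-wise totals over all classes: (train, validation, test).
totals = tuple(sum(column) for column in zip(*splits.values()))
```

Under this assumed rounding rule, the per-class sizes add up to 1000 training, 167 validation, and 500 test images, which sums to the 1667 collected images.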

C. TRANSFER LEARNING
Transfer learning means that, instead of learning everything from scratch, we use a model that was trained on a similar problem, transfer its learned knowledge to a new model, and then learn some new features. We experimented with both ResNet152 and ResNet50 on the same dataset; ResNet50 performed poorly in terms of accuracy, so we report the results for ResNet152. We chose DenseNet169 rather than DenseNet201: both reach the same accuracy, but DenseNet169 is smaller and faster to compute, whereas DenseNet201 has more parameters and requires more computational resources. TL focuses on storing knowledge gained from solving one problem and applying it to new, related problems. When an ANN processes additional data using the characteristics learned in previous training, it generalizes better [25]. Using TL techniques, we can directly use deep learning models pre-trained on many out-of-the-box datasets. We then decide which layers can be reused. Finally, we use the output of these layers as input to train a smaller network with fewer parameters.
This study compares the InceptionV3, Xception, ResNet152, and DenseNet169 TL models. The final decision is based on the overall classification performance and the number of features. We use the InceptionV3, Xception, DenseNet169, and ResNet152 networks with parameters previously trained on the ImageNet dataset. For the parameter settings of these four networks, we use an input image size of 224 x 224 x 3, a batch size of 32, and 100 epochs. The learning rate follows a decay schedule with a decay step of 5 and a decay rate of 0.1. After data processing and parameter settings, our next step is to choose an optimization algorithm. The binary cross-entropy function is minimized using the stochastic gradient descent (SGD) optimizer [26], which is widely used because it improves CNN performance while learning quickly, and the Adam (Adaptive Moment Estimation) optimizer [27]. Compared with plain gradient descent and stochastic gradient descent, Adam is relatively stable and suitable for large datasets or large numbers of parameters. Following a study of the methodologies mentioned above, the main goal of this research is to construct a network model using InceptionV3 and Xception. The models are first pre-trained on the ImageNet dataset and then fine-tuned on the PET bottle dataset. Finally, the proposed method achieves satisfactory classification results on a limited dataset.

TABLE 1. PET bottle dataset before augmentation.
No.  Class          Total  Train  Validation  Test
0    Bottle_ShapeA  169    101    17          51
1    Bottle_ShapeB  238    143    24          71
2    Bottle_ShapeC  41     25     4           12
3    Masinda        249    149    25          75
4    Pepsi          339    203    34          102
5    Samdasoo       631    379    63          189
     Total          1667   1000   167         500
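The learning-rate decay described in the parameter settings above (decay step of 5, decay rate of 0.1) can be sketched as a simple step schedule. The text gives neither the initial learning rate nor whether the decay is applied in discrete steps or continuously, so the value 0.01 and the staircase form below are assumptions made only for illustration; step_decay is a hypothetical helper name.

```python
def step_decay(epoch, initial_lr=0.01, decay_rate=0.1, decay_step=5):
    """Staircase decay: multiply the learning rate by decay_rate every decay_step epochs.

    initial_lr is a hypothetical value; the paper does not state it.
    """
    return initial_lr * decay_rate ** (epoch // decay_step)

# Learning rate at the start of each 5-epoch stage for the first 15 epochs.
schedule = [step_decay(epoch) for epoch in (0, 5, 10)]
```

A function of this shape could be passed to a per-epoch learning-rate callback in most deep learning frameworks. Note that under the staircase reading the rate shrinks by a factor of ten every five epochs, so it becomes very small well before epoch 100.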

1) InceptionV3 model
InceptionV3 is a variant of the GoogLeNet Inception image recognition architecture that was introduced in 2015 [28]. Generally, an Inception module has convolutions of three different sizes and a max-pooling layer. The outputs of these branches, computed from the output of the previous layer, are combined along the channel dimension and then fused nonlinearly. In this way, over-fitting can be reduced, and the network's expressiveness and adaptability to multiple scales can be improved. InceptionV3 is available in Keras as a network pre-trained on ImageNet [29]. The default image input size is 299 x 299 pixels with three channels. Compared with the previous versions (Inception v1 and v2), the Inception v3 architecture factorizes large convolution kernels into smaller convolutions. For example, a 3 x 3 convolution is split into a 3 x 1 convolution and a 1 x 3 convolution. This factorization reduces the number of parameters, speeds up network training, and extracts spatial features more effectively. Inception v3 also optimizes the Inception module structure using three grid sizes (35 x 35, 17 x 17, and 8 x 8), as illustrated in Fig. 2.
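The parameter saving from factorizing a 3 x 3 convolution into a 3 x 1 and a 1 x 3 convolution can be verified with a short calculation. The sketch below counts weights only (no biases), and the channel width of 64 is a hypothetical value chosen for illustration, not a figure taken from the InceptionV3 architecture.

```python
def conv_params(kernel_h, kernel_w, c_in, c_out):
    """Number of weights in a convolution layer, ignoring biases."""
    return kernel_h * kernel_w * c_in * c_out

CHANNELS = 64  # hypothetical channel width, for illustration only

full_3x3 = conv_params(3, 3, CHANNELS, CHANNELS)
# A 3 x 1 convolution followed by a 1 x 3 convolution covers the same 3 x 3 window.
factorized = (conv_params(3, 1, CHANNELS, CHANNELS)
              + conv_params(1, 3, CHANNELS, CHANNELS))
saving = 1 - factorized / full_3x3
```

With equal input and output widths, the factorized pair uses 6 weights per channel pair instead of 9, so the saving is one third regardless of the channel count.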

2) Xception Model
Xception, often referred to as the extreme version of Inception, is a deep learning architecture based on depthwise separable convolutional layers, developed by F. Chollet of Google. The architecture is a linear stack of depthwise separable convolutional layers with residual connections, which makes the deep network easy to define and modify. Xception modifies the Inception architecture by using depthwise separable convolutions in place of conventional Inception modules [30]. A depthwise separable convolution comprises a depthwise convolution, i.e., a spatial convolution performed independently over each input channel, followed by a pointwise convolution, i.e., a 1 x 1 convolution that changes the channel dimension. Another important factor is that this module follows the depthwise and pointwise convolutions with a ReLU non-linearity. Fig. 3 shows the architecture of Xception, which consists of three main stages: entry flow, middle flow, and exit flow. The architecture has 36 convolutional layers organized into 14 modules. All modules except the first and last are surrounded by linear residual connections.
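To make the benefit of depthwise separable convolution concrete, the weight counts of a standard convolution and its separable counterpart can be compared directly. This is a back-of-the-envelope sketch that ignores biases; the kernel size and channel widths (3, 128, 256) are hypothetical values and are not taken from a specific Xception layer.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution, ignoring biases."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution, ignoring biases.

    Depthwise step: one k x k spatial filter per input channel.
    Pointwise step: a 1 x 1 convolution mapping c_in channels to c_out.
    """
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise

K, C_IN, C_OUT = 3, 128, 256  # hypothetical layer sizes
standard = standard_conv_params(K, C_IN, C_OUT)
separable = separable_conv_params(K, C_IN, C_OUT)
ratio = separable / standard
```

For these sizes the separable layer needs roughly 11.5% of the weights of the standard layer, which is why stacking many such layers stays affordable.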

3) ResNet152 Model
A Residual Network (ResNet) stacks multiple residual blocks on top of each other [31]. Each block has various layers, including two convolutional layers, followed by ReLU activation and batch normalization functions. Fig. 4 shows the architecture of ResNet152, which takes an input image of size 224 x 224 x 3, performs an initial convolution with a kernel size of 7 x 7, and then applies a max-pooling operation with a kernel size of 3 x 3. The output of the pooling layer is then passed through a series of residual blocks. In ResNet152, each block contains three layers: a 1 x 1 convolution, a 3 x 3 convolution, and another 1 x 1 convolution. The residual blocks are followed by an average pooling layer and a fully connected layer with softmax activation that produces the ImageNet class predictions [32].

4) DenseNet169 Model
DenseNet is similar to ResNet, but the main difference is that layer outputs are concatenated, creating direct connections between each layer and the other layers. The architecture of the DenseNet model is shown in Fig. 5. DenseNet shares ResNet's advantage of alleviating the vanishing-gradient problem and offers further benefits, such as improving feature propagation between layers, simplifying feature reuse, and drastically decreasing the network's number of learnable parameters [33].
It takes a 224 x 224 x 3 input image and processes it with an initial convolution with a 7 x 7 kernel and a stride of 2, followed by a max-pooling operation with a 3 x 3 kernel and a stride of 2. The max pooling is followed by 4 dense blocks and 3 transition layers. Each transition layer consists of a 1 x 1 convolution and 2 x 2 average pooling. After the 4th dense block, global average pooling is performed, followed by softmax activation. Dense blocks consist of 1 x 1 and 3 x 3 convolutions. Each convolution operation consists of batch normalization, ReLU activation, and convolution.

5) Ensemble Learning
Training several models instead of a single model and combining their predictions is a successful strategy for lowering the variance of neural network models. Ensemble learning (EL) is usually used to improve the performance of a model (for classification, prediction, etc.) or to reduce the possibility of making wrong decisions [34]. In this research, EL is implemented through a weighted average ensemble mechanism [35]. Other uses of EL include: 1) Providing confidence in the conclusion of the model. 2) Selecting the best (or close to ideal) features, data fusion. 3) Incremental learning. 4) Abnormal learning. 5) Error correction algorithms.

D. PROPOSED INCEPX-ENSEMBLE MODEL DESCRIPTION
This section introduces a deep learning framework for recycling transparent plastic bottle images. Because the weighted average ensemble approach is effective for classification, we have integrated this technique to produce a more accurate result [36], [37]. The weighted average or weighted sum ensemble is a variation on voting ensembles. A voting ensemble assumes that all models are equally skilled and contribute equally to the ensemble prediction, whereas a weighted average ensemble lets each model contribute in proportion to its skill. Each model is given a fixed weight, which is multiplied by the model's prediction and used to calculate the sum or average prediction. The most challenging aspect of the weighted average strategy is choosing the relative weights for each model. The weights can be selected based on the skill of each model, such as classification accuracy or negative error, where a larger weight means that the model performs better. Figure 6 shows an overview of the weighted average ensemble approach.
In our study, we selected two TL models according to their classification accuracy and computation time, and we used a fixed weight for each model. The weighted average is calculated by multiplying each prediction by the model's weight to get a weighted sum and then dividing that value by the sum of the weights, as shown in equation 9:
$$W_{ae} = \frac{W_1 P_1 + W_2 P_2}{W_1 + W_2} \tag{9}$$
where $W_{ae}$ is the weighted average ensemble prediction, $P_1$ and $P_2$ are the predictions from the individual models, and $W_1$ and $W_2$ are the weights of the models.
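The weighted average described above (a weighted sum of the model predictions divided by the sum of the weights) can be sketched in NumPy as follows. This is a minimal illustration and not the authors' implementation: the function name, the softmax-style probability vectors, and the weights 0.6 and 0.4 are all hypothetical, and only three classes are used to keep the example short.

```python
import numpy as np

def weighted_average_ensemble(predictions, weights):
    """Combine per-model class probabilities with fixed model weights.

    predictions: sequence of arrays, each of shape (n_samples, n_classes)
    weights:     one non-negative weight per model
    Returns the weighted sum of predictions divided by the sum of weights.
    """
    preds = np.asarray(predictions, dtype=float)  # (n_models, n_samples, n_classes)
    w = np.asarray(weights, dtype=float)          # (n_models,)
    return np.tensordot(w, preds, axes=1) / w.sum()

# Hypothetical outputs of two models for two images and three classes.
p1 = np.array([[0.7, 0.2, 0.1],
               [0.1, 0.5, 0.4]])
p2 = np.array([[0.5, 0.4, 0.1],
               [0.2, 0.2, 0.6]])

wae = weighted_average_ensemble([p1, p2], weights=[0.6, 0.4])
labels = wae.argmax(axis=1)  # predicted class index per image
```

In this toy example the first model alone would assign the second image to class 1, while the ensemble assigns it to class 2 because the second model is confident enough to outweigh it. Dividing by the sum of the weights keeps each output row a valid probability distribution even when the weights do not sum to one.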
This paper proposes a deep learning framework for recycling transparent plastic bottle images. The weighted average or weighted sum ensemble is a machine learning strategy that aggregates predictions from several models, with each model's contribution weighted according to its competence or performance. Figure 7 shows the general workflow of the proposed IncepX-Ensemble model architecture.
In the early phase, input images are resized to 224 x 224 x 3 pixels before being fed into the proposed architecture. One-hot encoding is then used to assign the label for each class in the dataset.
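The one-hot labeling step can be sketched as follows. The class names are those of the PET bottle dataset in the order listed earlier; the function name one_hot and the choice of float32 are assumptions for illustration, and image resizing is left out because it depends on the image library in use.

```python
import numpy as np

# Class names of the PET bottle dataset, in the order listed in the text.
CLASS_NAMES = ["Bottle_ShapeA", "Bottle_ShapeB", "Bottle_ShapeC",
               "Masinda", "Pepsi", "Samdasoo"]

def one_hot(label_names, class_names=CLASS_NAMES):
    """Encode a list of class names as one-hot rows of length len(class_names)."""
    index = {name: i for i, name in enumerate(class_names)}
    encoded = np.zeros((len(label_names), len(class_names)), dtype=np.float32)
    for row, name in enumerate(label_names):
        encoded[row, index[name]] = 1.0
    return encoded

# Two hypothetical image labels.
targets = one_hot(["Pepsi", "Bottle_ShapeA"])
```

Each row contains a single 1 at the position of its class, which is the target format expected by a six-way softmax output layer.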

2) Step2: Training Model and Validation
Following that, we define a function that generates the list of models for the ensemble. The experiment uses the performance of each ensemble member as its relative weight when making predictions. Performance is measured as classification accuracy, the fraction of correct predictions, on a scale from 0 to 1; the higher the value, the better the model and its predictions. Each ensemble member is evaluated first on the training set and then on the validation set, and its accuracy on the validation set is used as the model's weight.
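Deriving the ensemble weights from validation accuracy, as described above, might look like the sketch below. The helper names and the tiny four-image validation set are hypothetical, and the per-model predictions are hard-coded one-hot rows rather than real network outputs.

```python
import numpy as np

def accuracy(probabilities, true_labels):
    """Fraction of samples whose arg-max class equals the true label (0 to 1)."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == np.asarray(true_labels)))

def validation_weights(model_probabilities, true_labels):
    """Use each model's validation accuracy as its ensemble weight."""
    return [accuracy(p, true_labels) for p in model_probabilities]

# Hypothetical validation set of four images with three classes.
val_labels = [0, 1, 2, 1]
model_a = np.eye(3)[[0, 1, 2, 2]]  # last image misclassified
model_b = np.eye(3)[[0, 1, 2, 1]]  # all four images correct

weights = validation_weights([model_a, model_b], val_labels)
```

The resulting weights need not sum to one, because the weighted average divides by the sum of the weights.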

3) Step3: Classification
Here, the testing data is sent to the tuned deep learning classifier of the proposed architecture, which categorizes all bottle images into six classes. Table 4 shows the performance of all transfer learning methods along with our proposed model tested on the PET bottle dataset; Acc, Pre, Rec, F1, Sens, and Spec refer to accuracy, precision, recall, F1-score, sensitivity, and specificity, respectively. We used the same dataset and the same training and testing ratio for all the models. The results improve markedly when the proposed fine-tuning approach is applied to state-of-the-art CNN architectures. All four TL models and our proposed IncepX-Ensemble model achieve an accuracy of over 99%, and the maximum accuracy is achieved after applying the weighted average ensemble approach.
After the hyperparameters and the optimization algorithm have been configured, the model is compiled and fine-tuned during training. Model performance is evaluated using a test dataset consisting of 1260 images. Xception and InceptionV3 perform better than the other transfer learning architectures, but the weighted average ensemble surpasses all the individual transfer learning models, including Xception. The weighted average ensemble also behaves more stably, with a test loss of 0.04. Among all the methods, our proposed method has the highest accuracy, 99.76%.
Figures 8 to 11 show the loss and accuracy variation of the training and validation sets over 100 epochs. These graphs indicate that the IncepX-Ensemble model obtains the highest accuracy and the smallest loss value after 73 epochs, which suggests that the proposed model can quickly reach stable, well-generalized performance on the PET bottle dataset. It can be seen from the curves that the ResNet152, InceptionV3, Xception, and IncepX-Ensemble networks are the more stable ones.

IV. RESULTS AND DISCUSSION
Although we show 100 epochs in order to display the performance of the other models, the classification results of the proposed model remain constant after 28 epochs. We also compare our results with those of other recent studies of a similar nature, and our proposed weighted average ensemble model outperforms all of them. Accuracy, precision, recall, and F1-score are used as the ranking indicators, and the best values are highlighted in bold in Table 5.
Accuracy: The classification accuracy is the percentage of images that are correctly classified. It is calculated using equation 10:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \quad (10)$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
Precision: Precision is the proportion of positive predictions that are correct, as given by equation 11:

$$\text{Precision} = \frac{TP}{TP + FP} \times 100 \quad (11)$$

Recall: The recall rate is the proportion of actual positives that the classification system identifies correctly, as shown in equation 12:

$$\text{Recall} = \frac{TP}{TP + FN} \times 100 \quad (12)$$

F1-score: The F1-score is the harmonic mean of precision and recall, as shown in equation 13:

$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (13)$$

Sensitivity: Sensitivity is a statistic that assesses a model's ability to predict true positives in each category, as shown in equation 14:

$$\text{Sensitivity} = \frac{TP}{TP + FN} \quad (14)$$

Specificity: Specificity is a statistic that assesses a model's ability to predict true negatives in each category, as shown in equation 15:

$$\text{Specificity} = \frac{TN}{TN + FP} \quad (15)$$
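The metric definitions above can be checked with a small worked sketch. It assumes the standard confusion-count forms of the metrics, with accuracy, precision, and recall scaled to percentages and sensitivity and specificity left as fractions. The function name and the counts are hypothetical and do not reproduce the paper's results.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute the evaluation metrics for one class from its confusion counts."""
    precision = _ratio(tp, tp + fp) * 100
    recall = _ratio(tp, tp + fn) * 100
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn) * 100,
        "precision": precision,
        "recall": recall,
        # Harmonic mean of precision and recall (both already percentages).
        "f1": _ratio(2 * precision * recall, precision + recall),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
    }


# Hypothetical counts for a single class.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

For a multi-class problem such as the six bottle classes, these counts would be taken per class in a one-versus-rest manner and the resulting metrics averaged.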

D. COMPARISON OF CLASSIFICATION EFFICIENCY
First, Table 6 shows the details of the four transfer learning models. These models are trained using the same initialization and learning rate strategy. It can be seen that the classification performance of the proposed ensemble model is the best among all the transfer learning methods. We compare the proposed model with the other TL methods in terms of training loss, validation loss, training accuracy, and validation accuracy.
A confusion matrix is a technique for evaluating the performance of a machine learning classifier. Figure 12 shows the confusion matrices of the InceptionV3, Xception, and IncepX-Ensemble models for the PET bottle dataset. The vertical axis shows the actual categories, and the horizontal axis shows the predicted categories. The diagonal boxes of the confusion matrix, starting from the top left corner, represent the correctly predicted images in the test dataset. The remaining boxes show the images that were incorrectly predicted. The figure shows that it is challenging to distinguish classes with similar characteristics. However, the proposed IncepX-Ensemble model improves accuracy by reducing the number of misclassified cases. As shown in the figure, InceptionV3 misclassified a total of 21 images, which were incorrectly predicted as class 0, class 1, class 2, class 3, or class 5. For Xception, 1 image was misclassified as class 0, 1 image as class 4, and 3 images as class 5. Our proposed model shows a slightly better result than the other two models, with 4 misclassified images.
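The way misclassified images are read off a confusion matrix can be sketched as follows. The layout follows the description above, with rows for the actual class and columns for the predicted class. The function names and the toy labels are hypothetical and unrelated to the counts reported in Figure 12.

```python
from typing import List, Sequence


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], num_classes: int
) -> List[List[int]]:
    """Build a matrix where matrix[actual][predicted] counts the images."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def count_correct(matrix: Sequence[Sequence[int]]) -> int:
    """Sum of the diagonal: images whose predicted class equals the actual one."""
    return sum(matrix[i][i] for i in range(len(matrix)))


def count_misclassified(matrix: Sequence[Sequence[int]]) -> int:
    """Sum of all off-diagonal cells."""
    total = sum(sum(row) for row in matrix)
    return total - count_correct(matrix)


# Hypothetical labels for six images and three classes.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
errors = count_misclassified(cm)
```

Summing a column without its diagonal entry gives the number of images wrongly predicted as that class, which is how statements such as "misclassified as class 5" are obtained.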

V. CONCLUSION
We propose an ensemble model based on transfer learning, called IncepX-Ensemble, which can successfully classify images of transparent plastic bottles. Because the high cost of labeling makes it difficult to obtain a large amount of training data, we combine ensemble learning with data augmentation to avoid overfitting during model training. This paper proposes a weighted average ensemble model that improves the classification performance indicators by fine-tuning deep transfer learning architectures, and uses it to classify images of transparent plastic bottles. One remaining challenge is developing a model that can be deployed on a system with limited computational resources. In addition, the proposed application does not recognize a bottle that is semi-damaged or damaged. In future work, we want to improve the model so that it classifies semi-damaged and damaged plastic bottles better.
The proposed method uses the ImageNet pre-trained weights of two Convolutional Neural Networks, InceptionV3 and Xception, as the initialization for the new model. After the images are preprocessed and scaled, the Adam optimizer is used to minimize the binary cross-entropy function. Our proposed strategy outperforms the baseline models on several performance indicators, achieving 99.76% accuracy, 99.38% precision, 99.81% recall, and 99.86% F1-score in these initial results.
The possibilities of our study and the limitations of the proposed strategy were examined in light of the initial findings. We hope that this work will play a positive role in managing plastic bottle waste and protecting the environment.