
Deep Learning Based Improved Classification System for Designing Tomato Harvesting Robot

This system has good performance for designing visual system.

Abstract:

A maturity-level-based classification system plays an essential role in the design of a tomato harvesting robot. Traditional knowledge-based systems are unable to meet the current production management requirements of precision picking because they are time-consuming and have low accuracy. Our research proposes an improved deep learning-based classification method that improves the accuracy and scalability of tomato ripeness classification with a small amount of training data. This study examined the relationship between different dataset augmentation methods and the prediction results of the final classification task. We implemented classification systems based on a convolutional neural network (CNN), trained and validated the model on differently augmented datasets, and sought the optimal augmentation method for the dataset. The experimental results showed an average accuracy of 91.9% with a prediction time of less than 0.01 s. Compared with existing methods, our solution achieved better prediction results in terms of both accuracy and time consumption. Moreover, the method is versatile and can be extended to other related fields.

Topic: AI-Driven Big Data Processing: Theory, Methodology, and Applications

Published in: IEEE Access (Volume: 6)

Page(s): 67940 - 67950

Date of Publication: 02 November 2018

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2018.2879324

SECTION I.

Introduction

Tomato is one of the most popular vegetables in daily life and is consumed by millions of people every day. However, as the work force ages, labor costs are rising and have become one of the limiting factors in many agricultural industries. On the one hand, a large number of agricultural enterprises face the challenge of low profit. On the other hand, with the growing world population, tomato production still needs to satisfy demand. A tomato harvesting robot seems to be a plausible way to resolve these issues by maintaining tomato quality control at a reduced labor cost. For these reasons, many researchers have been working on robots for fruit and vegetable harvesting over the last few decades [1], [2].

The color of a tomato is a major index for judging its maturity. Tomato fruit passes through five different stages of maturity. These can be recognized through color changes from green, turning to light pink, pink, light red, and then red, which classify the fruit into five distinct categories. An appropriate appearance of the produce brings a high price for the company, so one must take into account the length of the transport route and the storage time for an optimal harvest. In general, starting from green, a tomato needs 21 to 28 days for the breaker stage, 15 to 20 days for the turning stage, 7 to 14 days for the pink stage, 5 to 6 days for the light red stage, and 2 to 4 days for the red stage [3]. Therefore, improving the tomato classification system is an important task in the design of a harvesting robot.

In recent years, methods based on machine vision and pattern recognition have been well studied and applied, especially in the processing and sorting of agricultural products [4]–[6]. Computer vision in particular is one of the most important parts of a harvesting robot. However, the features used by such machine vision methods are selected by experienced personnel. As a result, these methods have drawbacks in flexibility and timeliness that make them hard to apply in farm enterprises. Furthermore, developing a system that performs well in terms of accuracy, timeliness and scalability requires resolving many challenging issues, such as illumination variation and occlusion. Although many researchers have used machine vision technologies, there is still a long way to go before they can serve an automatic harvesting robot: the accuracy and efficiency needed to design such a robot have not yet both been achieved.

Recently, convolutional neural network (CNN) based classification systems have made groundbreaking advances in many tasks. However, although deep neural network (DNN) systems have shown outstanding performance in many respects, they often encounter over-fitting problems. Over-fitting is mainly caused by three factors: complex models, data noise and limited training data [7]. Creating a dataset with enough samples is often a difficult and time-consuming task. In particular, some images are hard to obtain, such as those of a specific disease of one kind of plant. Therefore, when the number of images in a dataset is insufficient, data augmentation is an effective way to artificially increase the training data.

Our motivation for this study was to find an efficient procedure for observing tomato ripeness. We needed a method that performs well in both prediction time and accuracy. Furthermore, we aimed to design a classification system that is extensible, so that it can be applied effectively to a harvesting robot. Based on these considerations, we designed and implemented the classification system shown in Fig. 1.

FIGURE 1.

The flow chart of the classification system for harvesting robot.

In this study, we analyzed the relationship between tomato storage duration and changes in appearance, and then divided the tomato images into five categories according to ripeness. We propose a novel network architecture with low complexity to perform fast classification. To construct the dataset, we collected tomato images at different ripeness levels in several ways. After labeling each image, we verified the accuracy of the dataset. Because acquiring a dataset is time-consuming, we used several different methods for dataset augmentation. By comparing both the training performance and the prediction results of the model under the different augmentation methods, we derived the most suitable augmentation method for this study. The resulting method offers guidance for designing a tomato harvesting robot.

The structure of this paper is as follows. Section I introduces the motivation of this research and some relevant background. Section II reviews related work. Section III describes the establishment of our dataset, including the collection of images, annotation, filtering of inappropriate data, and data augmentation. Section IV presents a novel deep learning-based framework for ripeness classification. Section V reports the results of the classification system on different datasets. Section VI discusses and analyzes these results, and Section VII concludes the paper.

SECTION II.

Related Works and Background

Estimating tomato maturity is an important problem for automatic picking. To estimate the maturity of tomatoes, Goel and Sehgal [8] proposed a color-based method whose ripeness classification system achieved high accuracy. Pavithra et al. [9] used machine learning for automatic detection and sorting of cherry tomatoes and designed a classifier system to improve accuracy and reduce time consumption. Lu et al. [10] used machine vision and visible/near-infrared spectroscopy to achieve rapid assessment of tomato ripeness.

Mohapatra et al. [11] adopted an image processing approach to determine the ripening grade of red bananas. Although these methods performed well in experiments in certain environments, they are still difficult to apply in practice.

At present, CNN-based classification systems have achieved good results in many areas. In this regard, researchers have proposed different data augmentation methods. For example, Zhou et al. [12] proposed cross-label suppression dictionary learning for signal representation in face recognition to preserve the label property effectively. Chen et al. [13] proposed a novel approach that applies a cascade of three deep convolutional neural networks (DCNNs) to detect defects in fasteners. Li et al. [14] proposed FingerNet, which consists of one common convolution part and two different de-convolution parts, to enhance fingerprints. To effectively suppress outliers and accurately reconstruct an image from compressively measured data, Li et al. [15] presented a novel multiplier network based algorithm that achieves better performance in image reconstruction. Ma et al. [16] presented a new method based on variational Bayesian learning that achieved flexible performance in modeling vectors with positive elements using a Dirichlet process mixture of inverted Dirichlet distributions. There are also CNN-based studies on face recognition [17], wireless communications [18]–[20], automatic speaker verification [21] and the internet of things [22]. Similarly, driven by the remarkable success of deep learning, CNN-based classification and identification systems have recently made groundbreaking advances in the agricultural industry. For example, classification and identification systems have been investigated for crop diseases [23], [24], and many researchers have developed systems for plant identification and detection [25], [26].

Over-fitting is one of the most serious problems of CNN-based methods. Because data augmentation is an effective way of preventing over-fitting, many deep learning-based studies exploit it. For example, a CNN-based approach to alcoholism detection that uses data augmentation requires only one hundred training images [27]. For road detection, Muñoz-Bulnes et al. [28] used two kinds of dataset augmentation. The first is geometric transformation, which includes random affine transformations, perspective transformations, mirroring and so on. The second is pixel value change, which includes noise, blur and color changes. Their experimental results showed that training with data augmentation improved performance by 1 to 2% [28]. Hussein et al. [29] augmented CT images for lung nodule characterization by random rotation or by adding several kinds of noise separately, and compared the identification accuracy on different datasets to demonstrate the superiority of the proposed method. Ma et al. [30] proposed another novel method for bounded support data that can be used in many important applications.

SECTION III.

Dataset

A. Image Annotation and Verification

For this study, we took nearly 200 color images covering five maturity levels, collected on a farm during the daytime under natural light. Each maturity level included more than 30 tomato images. This is a relatively small amount of data compared with what is generally needed for network training.

According to the market demand, we took both quality and storage life into consideration and classified acquired images into five categories. Table 1 shows both the expiry date and quality of each stage. By these standards, we divided these 200 images into their corresponding categories.

TABLE 1 Quality of Appearance and Expiry Time for Each Category

t-Distributed Stochastic Neighborhood Embedding (t-SNE) is an algorithm derived from Stochastic Neighborhood Embedding (SNE) [31]. The idea of t-SNE is to treat the high-dimensional data $x_{i}$ as points in a high-dimensional space and to use a nonlinear mapping to map them to points $y_{i}$ in a low-dimensional space.

In the high-dimensional space, the pairwise distance between two points is converted into a joint probability $p_{ij}$, which can be formulated as Eq. (1).

$\begin{equation*} p_{ij}=\frac{\exp\left(-\left\|x_{i}-x_{j}\right\|^{2}/2\sigma^{2}\right)}{\sum\nolimits_{k}\sum\nolimits_{l\ne k}\exp\left(-\left\|x_{k}-x_{l}\right\|^{2}/2\sigma^{2}\right)},\quad \textrm{for }\forall i,j: i\ne j\tag{1}\end{equation*}$

In the low-dimensional space, the pairwise distance between two points is converted into a joint probability $q_{ij}$, defined as Eq. (2).

$\begin{equation*} q_{ij}=\frac{\left(1+\left\|y_{i}-y_{j}\right\|^{2}\right)^{-1}}{\sum\nolimits_{k}\sum\nolimits_{l\ne k}\left(1+\left\|y_{k}-y_{l}\right\|^{2}\right)^{-1}},\quad \textrm{for }\forall i,j: i\ne j\tag{2}\end{equation*}$

t-SNE obtains the low-dimensional representation by minimizing the Kullback-Leibler divergence between the two distributions. The cost function can be formulated as Eq. (3).

$\begin{equation*} C=KL\left(P\,\|\,Q\right)=\sum\nolimits_{i}\sum\nolimits_{j\ne i}p_{ij}\log\frac{p_{ij}}{q_{ij}}\tag{3}\end{equation*}$

Therefore, after manually annotating the images, we used t-SNE to check the distribution of the dataset. Fig. 2 shows the result for a part of the dataset. The result shows that the images largely cluster according to their maturity level. We also deleted undesirable data based on this result.

FIGURE 2.

The visualization of a part of dataset images on t-SNE distribution.
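The quantities in Eqs. (1) to (3) can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes squared Euclidean distances and a single shared `sigma` (standard t-SNE tunes one bandwidth per point from a perplexity setting), and the feature vectors, the embedding and all function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(points):
    """Squared Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def high_dim_affinities(x, sigma=1.0):
    """Eq. (1): Gaussian joint probabilities p_ij with one shared sigma (assumption)."""
    w = np.exp(-pairwise_sq_dists(x) / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # pairs with i == j are excluded
    return w / w.sum()

def low_dim_affinities(y):
    """Eq. (2): Student-t joint probabilities q_ij."""
    w = 1.0 / (1.0 + pairwise_sq_dists(y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum()

def kl_cost(p, q, eps=1e-12):
    """Eq. (3): KL(P || Q) summed over all pairs i != j."""
    mask = ~np.eye(p.shape[0], dtype=bool)
    return float(np.sum(p[mask] * np.log((p[mask] + eps) / (q[mask] + eps))))

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 10))  # six hypothetical 10-dimensional image features
y = rng.normal(size=(6, 2))   # a random 2-D embedding of the same six points
p = high_dim_affinities(x)
q = low_dim_affinities(y)
cost = kl_cost(p, q)
print(f"KL cost of the random embedding: {cost:.4f}")
```

t-SNE itself would then move the points `y` by gradient descent to reduce this cost; the sketch only evaluates the cost for one fixed embedding.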

B. Augmentation Methods

Creating a dataset for learning often requires a lot of effort: collection, data cleaning, tagging and so on take a lot of time. Taking into account a wider range of usage scenarios, such as random noise and changes in size, this paper proposes two types of data augmentation operations to alleviate these problems.

1) Geometric Transformations

Scaling and rotation are two kinds of geometric transformation. In order to find the best augmentation method, we generated three datasets in this section.

$S(p,q)$ and $D(j,k)$ represent the source and target points of the discrete image. These two points are written as $s(u_{p},v_{q})$ and $d(x_{j},y_{k})$ in the Cartesian coordinate system. For the scaling transformation, we use the formula shown in Eq. (4).

$\begin{equation*} \begin{cases} x_{j}=s_{x}u_{p} \\ y_{k}=s_{y}v_{q} \\ \end{cases}\tag{4}\end{equation*}$

where $s_{x}$ and $s_{y}$ are random non-negative scaling coefficients on the horizontal and vertical axes. We generated the dataset S based on the random scaling transformation. For the rotation transformation, we use the formula in Eq. (5).

$\begin{equation*} \begin{cases} x_{j}=u_{p}\cos\theta -v_{q}\sin\theta \\ y_{k}=u_{p}\sin\theta +v_{q}\cos\theta \\ \end{cases}\tag{5}\end{equation*}$

where $\theta$ represents the counter-clockwise angle between the rotated image and the original image with respect to the horizontal axis, with $\theta \in \left(0, 360^{\circ}\right]$. We generated the dataset R based on the random rotation transformation.

Then, we generated the dataset R & S from the datasets S and R. The number of images in each category of each dataset is shown in Fig. 3.

FIGURE 3.

The number of images for each category.
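As a small illustration of Eqs. (4) and (5), the sketch below applies the scaling and rotation mappings to individual pixel coordinates. The sampling range for the scaling coefficients and the function names are assumptions made for the example; the paper states only that the coefficients are random and non-negative. A full image augmentation would also need resampling and interpolation, which are omitted here.

```python
import math
import random

def scale_point(u, v, s_x, s_y):
    """Eq. (4): scale a source coordinate (u, v) along each axis."""
    return s_x * u, s_y * v

def rotate_point(u, v, theta_deg):
    """Eq. (5): rotate (u, v) counter-clockwise by theta degrees about the origin."""
    t = math.radians(theta_deg)
    return (u * math.cos(t) - v * math.sin(t),
            u * math.sin(t) + v * math.cos(t))

rng = random.Random(0)
# Assumed sampling range for the scaling coefficients (not given in the paper).
s_x, s_y = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
# rng.uniform(0, 360) lies in [0, 360), so this draws theta from (0, 360].
theta = 360.0 - rng.uniform(0.0, 360.0)

# Corner coordinates of a 200 x 200 image, used as sample points.
corners = [(0, 0), (199, 0), (0, 199), (199, 199)]
scaled = [scale_point(u, v, s_x, s_y) for u, v in corners]
rotated = [rotate_point(u, v, theta) for u, v in corners]
print(scaled[-1], rotated[-1])
```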

2) Random Noise

We adopted three types of noise for data augmentation: Pepper, Salt and Gaussian. The probability density function (PDF) of Gaussian noise is expressed as Eq. (6).

$\begin{equation*} p_{g}\left(z\right)=\frac{1}{\sqrt{2\pi}\sigma}e^{-\left(z-\overline{z}\right)^{2}/2\sigma^{2}}\tag{6}\end{equation*}$

where $z$ represents the gray level of the image, and $\overline z$ and $\sigma$ represent the mean value and standard deviation of $z$, respectively. The PDF of Pepper noise is expressed as Eq. (7).

$\begin{equation*} p_{P}\left(z\right)=\begin{cases} p_{P} & z=p \\ 0 & \text{otherwise} \\ \end{cases}\tag{7}\end{equation*}$

where $p_{P}$ represents the probability of pepper noise occurring. The PDF of Salt noise is expressed as Eq. (8).

$\begin{equation*} p_{s}\left(z\right)=\begin{cases} p_{s} & z=s \\ 0 & \text{otherwise} \\ \end{cases}\tag{8}\end{equation*}$

where $p_{s}$ represents the probability of salt noise occurring. Based on the above probability models, the corresponding random noise was generated for each type.
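The three noise models in Eqs. (6) to (8) can be sketched with NumPy as below. The noise strength (`sigma`, `prob`), the choice of 0 for pepper and 255 for salt, and the decision to overwrite all three channels of an affected pixel are assumptions for illustration; the paper does not give these settings.

```python
import numpy as np

def add_gaussian_noise(img, mean=0.0, sigma=10.0, rng=None):
    """Add noise drawn from the Gaussian PDF of Eq. (6) to every pixel value."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = img.astype(np.float64) + rng.normal(mean, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_impulse_noise(img, prob, value, rng=None):
    """Set each pixel to `value` with probability `prob`, as in Eqs. (7) and (8).

    value=0 gives pepper noise and value=255 gives salt noise (assumed levels).
    """
    if rng is None:
        rng = np.random.default_rng()
    out = img.copy()
    mask = rng.random(img.shape[:2]) < prob  # one decision per pixel location
    out[mask] = value                        # overwrite all channels of that pixel
    return out

rng = np.random.default_rng(0)
image = np.full((200, 200, 3), 128, dtype=np.uint8)  # hypothetical flat gray image
gauss = add_gaussian_noise(image, sigma=10.0, rng=rng)
pepper = add_impulse_noise(image, prob=0.05, value=0, rng=rng)
salt = add_impulse_noise(image, prob=0.05, value=255, rng=rng)
print(gauss.std(), (pepper == 0).mean(), (salt == 255).mean())
```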

3) Combination

To study the relationship between the different data augmentation methods and the prediction results of this task, we combined geometric transformations and random noise by adding Pepper, Salt and Gaussian noise separately to the datasets R, S and R & S, which yielded nine datasets for training: R & PN, S & PN, S & R & PN, R & SN, S & SN, S & R & SN, R & GN, S & GN, and S & R & GN.
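The nine combined datasets are simply the Cartesian product of the three geometric datasets and the three noise types. The short sketch below enumerates the names in the order listed above; the variable names are illustrative only.

```python
from itertools import product

geometric = ["R", "S", "S & R"]  # rotation, scaling, and both combined
noise = ["PN", "SN", "GN"]       # Pepper, Salt and Gaussian noise

# Iterate over noise types first so the order matches the list in the text.
combined = [f"{g} & {n}" for n, g in product(noise, geometric)]
print(combined)
```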

SECTION IV.

Classification Architecture

In this paper, we designed the classification architecture shown in Fig. 4. The purpose of the design is to maintain overall information while preserving local details, with a short prediction response time. The architecture consists of three parts. The first part takes three-channel color images as input, with a height and width of 200 pixels each. In the second part, we use five CNN layers to extract features, with convolution kernel sizes of $9\times 9$, $5\times 5$ and $3\times 3$. To retain features, reduce unnecessary parameters and speed up calculation, we also inserted two max-pooling layers among the CNN layers. The last part performs the final classification with a fully connected layer.

FIGURE 4.

CNN based architecture for classification system.

A. Feature Extractor

There are many ways to extract features, such as SVM [32], HOG [33], SIFT [34] and so on. In general, feature extraction based on image processing or machine learning requires appropriate features to be selected according to experience, which limits the extensibility of the system. In the present study, we used CNN-based methods to extract image features without any extra pre-processing. This part mainly includes the following three sub-parts.

1) Convolutional Layers

The feature extraction architecture contains five convolutional layers. The first layer uses $9\times 9$ kernels with a dimension of 16, the next three layers use $5\times 5$ kernels with dimensions of 32, 64 and 128, and the last layer uses $3\times 3$ kernels with a dimension of 128.
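To give a sense of the size of this feature extractor, the sketch below counts the trainable parameters of the five convolutional layers. It assumes that "dimension" means the number of output filters, that the $5\times 5$ layers appear in the order 32, 64, 128, that the input has three color channels, and that each filter has one bias term; none of these details is confirmed beyond what the text states.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per filter for a square 2-D convolution."""
    return kernel * kernel * in_channels * out_channels + out_channels

# (kernel size, number of filters) for the five layers, in the assumed order.
layers = [(9, 16), (5, 32), (5, 64), (5, 128), (3, 128)]

in_channels = 3  # three-channel color input
counts = []
for kernel, filters in layers:
    counts.append(conv_params(kernel, in_channels, filters))
    in_channels = filters  # the next layer consumes this layer's output

total = sum(counts)
print(counts, total)
```

Under these assumptions the convolutional part has roughly 0.42 million parameters, which is consistent with the stated goal of a low-complexity model.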

2) Activation Function

Several activation functions have been proposed, where ReLU [35] is one of the most popular functions in many classification tasks. The function is defined as Eq. (9).

$\begin{equation*} f\left(x\right)=\max(0,x)\tag{9}\end{equation*}$

ReLU needs only a threshold to obtain the activation value, without any other costly operations. Therefore, a network based on this activation function converges faster than one based on most alternatives. For this reason, we used ReLU as the activation function of this architecture.

3) Pooling

As a deep learning architecture gets deeper, it has more parameters, requires more calculation and is more prone to over-fitting.

The most important functions of the pooling layer are to keep the main feature information invariant, reduce parameters and prevent over-fitting. Mean-pooling and max-pooling are the two common forms of pooling layer. Mean-pooling takes the average value of an image area as the pooled value of that area.

Similarly, max-pooling takes the maximum value of an image area as the pooled value of that area. In this research, we used max-pooling with a size of $4\times 4$ after each convolution layer.
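The difference between the two pooling forms, applied after the ReLU activation of Eq. (9), can be seen on a tiny feature map. The sketch below uses non-overlapping windows and a window size of 2 purely to keep the example small; the window size, the sample values and the function names are assumptions.

```python
import numpy as np

def relu(x):
    """Eq. (9): element-wise max(0, x)."""
    return np.maximum(0.0, x)

def pool2d(feature_map, size, mode="max"):
    """Non-overlapping pooling over size x size windows of a 2-D feature map."""
    h, w = feature_map.shape
    h_out, w_out = h // size, w // size
    # Crop to a multiple of the window size, then expose each window as axes 1 and 3.
    blocks = feature_map[:h_out * size, :w_out * size].reshape(h_out, size, w_out, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "mean":
        return blocks.mean(axis=(1, 3))
    raise ValueError(f"unknown pooling mode: {mode}")

# A hypothetical 4 x 4 feature map with some negative responses.
fm = np.arange(16, dtype=float).reshape(4, 4) - 4.0
activated = relu(fm)
max_pooled = pool2d(activated, size=2, mode="max")
mean_pooled = pool2d(activated, size=2, mode="mean")
print(max_pooled)
print(mean_pooled)
```

Max-pooling keeps the strongest response in each window, whereas mean-pooling smooths it with its neighbors.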

B. Classifier

Each node of the fully connected layer is connected to all nodes of the previous layer in order to combine the features extracted by the previous layers. Because of this full connectivity, the layer generally has more parameters than other layers. In this task, we used only one fully connected layer, with 32 neurons, connected to the last convolution layer. The softmax model [36], [37] can be used to solve classification problems effectively. The function is given as Eq. (10).

$\begin{equation*} p_{j}=\frac{e^{x_{j}}}{\sum\nolimits_{i=1}^{K}e^{x_{i}}}\tag{10}\end{equation*}$

where $p_{j}$ represents the probability of each category $j\in\left[1,5\right]$ and the value of $K$ is 32. The states of the softmax model are mutually constrained, which means that only one of the $K$ states has a value. By using the softmax model, we can calculate the probability of each category.
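A minimal NumPy sketch of Eq. (10) follows. The five input scores, one per ripeness category, are hypothetical, and subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import numpy as np

def softmax(x):
    """Eq. (10): exponentiate the scores and normalize them to sum to one."""
    shifted = x - np.max(x)  # avoids overflow without changing the output
    e = np.exp(shifted)
    return e / e.sum()

# Hypothetical scores for the five ripeness categories.
scores = np.array([2.0, 1.0, 0.1, -1.0, 0.5])
probs = softmax(scores)
predicted = int(np.argmax(probs))  # index of the most probable category
print(probs, predicted)
```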

C. Training Strategy

We used cross-entropy [38] as the loss function during the model training. The loss function is defined as Eq. (11).

$\begin{equation*} C=-\frac{1}{n}\sum\limits_{x}\left[y\ln a+\left(1-y\right)\ln\left(1-a\right)\right]\tag{11}\end{equation*}$

where $y$ is the expected output value and $a$ is the actual output. Stochastic gradient descent (SGD) [39] is used for each iteration. The update can be expressed as Eq. (12).

$\begin{equation*} \theta^{i}=\theta^{i-1}-\alpha\frac{\partial}{\partial\theta^{i-1}}J\left(\theta^{i-1}\right)\tag{12}\end{equation*}$

where $\alpha$ is the learning rate, which controls the size of the change at every iteration $i$ along the path toward the minimum of $J\left(\theta^{i-1}\right)$, and the partial derivative of $J\left(\theta^{i-1}\right)$ with respect to the variable $\theta^{i-1}$ represents the direction of maximum change in $J\left(\theta^{i-1}\right)$. Through the training of the model, we achieved a better performance.
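To make Eqs. (11) and (12) concrete, the sketch below trains a single sigmoid unit with mini-batch SGD on synthetic data. It is a toy stand-in, not the CNN described above: the data, the learning rate, the batch size and all names are assumptions chosen so that the example runs quickly.

```python
import numpy as np

def cross_entropy(y, a, eps=1e-12):
    """Eq. (11): mean binary cross-entropy between targets y and outputs a."""
    a = np.clip(a, eps, 1.0 - eps)  # keep log() finite
    return float(-np.mean(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)))

def sgd_step(theta, grad, alpha):
    """Eq. (12): move the parameters against the gradient, scaled by alpha."""
    return theta - alpha * grad

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 3))                           # synthetic inputs
y = (x @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)  # synthetic binary labels

theta = np.zeros(3)
initial_loss = cross_entropy(y, sigmoid(x @ theta))
for step in range(200):
    idx = rng.integers(0, len(x), size=8)      # random mini-batch
    a = sigmoid(x[idx] @ theta)
    grad = x[idx].T @ (a - y[idx]) / len(idx)  # gradient of Eq. (11) for a sigmoid unit
    theta = sgd_step(theta, grad, alpha=0.5)
final_loss = cross_entropy(y, sigmoid(x @ theta))
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

A larger `alpha` takes bigger steps along the gradient direction and may overshoot, while a smaller one converges more slowly.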

SECTION V.

Experiments

The experiments were performed on a 64-bit Windows 10 PC equipped with an Intel(R) Core(TM) i5-7500 CPU @ 3.20 GHz and 8 GB of RAM. Parallel computation is an important source of power for deep learning. To benefit from the parallel computing power of a GPU, we used an NVIDIA GTX 1060 GPU with 3 GB of memory to reduce training time. We also used a high-level neural network application programming interface of TensorFlow to implement the proposed deep learning model.

A. Classification Based on the Dataset S, R, S & R

In this section, we train the model on datasets augmented with geometric transformations.

Fig. 5 shows how the training accuracy of the model changes as the number of iterations increases on the S, R, and S & R datasets. The curves show that the model converges fastest on S & R, followed by S. The model trained on R converges slowest, and its validation accuracy is also poor (as shown in Fig. 6).

FIGURE 5.

Changes in training accuracy over iterations on datasets augmented with geometric transformations.

FIGURE 6.

Changes in validation accuracy over iterations on datasets augmented with geometric transformations.

Then, we used the trained models to predict 100 images that were not used for training. The results are presented in Table 2. The best result comes from the model trained on dataset S & R. It is worth mentioning that, although training on R did not perform very well, its prediction result is better than that of S.

TABLE 2 Predicted Results for Each Category of the Models Trained on R, S and R & S
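Per-category results of this kind can be computed from true and predicted labels as sketched below. The label lists are made-up placeholders, not values from Table 2, and the paper does not say whether its reported average is the overall accuracy or the mean of the per-category accuracies, so the sketch computes both.

```python
from collections import Counter

def per_category_accuracy(true_labels, predicted_labels):
    """Fraction of correctly predicted images within each true category."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {category: correct[category] / totals[category] for category in totals}

def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all images that were predicted correctly."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)

# Hypothetical labels for eight test images in three categories.
true_labels = [0, 0, 1, 1, 2, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 2, 2, 0]

by_category = per_category_accuracy(true_labels, predicted_labels)
overall = overall_accuracy(true_labels, predicted_labels)
macro = sum(by_category.values()) / len(by_category)
print(by_category, overall, macro)
```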

B. Classification Based on the Dataset S With Noise

In this section, we train the model on dataset S with three types of random noise. Fig. 7 shows how the training accuracy changes as the number of iterations increases on S, S & PN, S & GN and S & SN.

FIGURE 7.

Changes in training accuracy over iterations on S with Pepper, Salt and Gaussian noise added separately.

The curves show that the models trained on all of these datasets converge quickly. After thirty epochs, all the models have nearly converged. Moreover, the validation accuracy on these datasets, shown in Fig. 8, changes in a similar way to the training accuracy. Both the training and validation accuracy reach 96%.

FIGURE 8.

Changes in validation accuracy over iterations on S with Pepper, Salt and Gaussian noise added separately.

Then, we tested the prediction results of these models (see Table 3). The table shows that dataset S with Salt noise gives the best final results, followed by S with Gaussian noise and finally S with Pepper noise.

TABLE 3 Predicted Results for Each Category of the Models Trained on S With Three Kinds of Added Noise

C. Classification Based on the Dataset R With Noise

In this section, we train the model on dataset R with three types of random noise. Fig. 9 shows how the training accuracy changes as the number of iterations increases on R, R & PN, R & GN and R & SN. Unlike the S series, the models converge much more slowly on all of these datasets. Moreover, the validation accuracy on all of these datasets is hard to improve once it reaches nearly 80%, as shown in Fig. 10.

FIGURE 9.

Changes in training accuracy over iterations on R with Pepper, Salt and Gaussian noise added separately.

FIGURE 10.

Changes in validation accuracy over iterations on R with Pepper, Salt and Gaussian noise added separately.

Although the training performance on these datasets is inferior to that of the S series, their prediction results are close. The results are presented in Table 4.

TABLE 4 Predicted Results for Each Category of the Models Trained on R With Three Kinds of Added Noise

D. Classification Based on the Dataset Combined R & S

In this section, we train the model on dataset R & S with three types of random noise. Fig. 11 shows how the training accuracy changes as the number of iterations increases on R & S, R & S & PN, R & S & GN, and R & S & SN. Fig. 12 shows how the validation accuracy changes during training on these datasets. These curves show that the models trained on these datasets perform best in both convergence and accuracy. There was little oscillation during training. Although training on the R series alone did not perform well, performance improved when R was combined with S.

FIGURE 11.

Changes in training accuracy over iterations on R & S with Pepper, Salt and Gaussian noise added separately.

FIGURE 12.

Changes in validation accuracy over iterations on R & S with Pepper, Salt and Gaussian noise added separately.

This group also produced better predictions than the previous groups, as is evident from the results in Table 5. As before, the model trained with Salt noise provides the best prediction results.

TABLE 5 Predicted Results for Each Category of the Models Trained on R & S With Three Kinds of Added Noise

E. Performance Comparison With Augmented Methods

Changes in image brightness and flipping are two important augmentation methods that address over-fitting effectively, because brightness changes are prevalent in natural environments. Therefore, we compared our method with these two augmentation methods in this study.

First, we generated two datasets, one augmented by random brightness changes (BRIG) and the other augmented by flipping both horizontally and vertically (FLIP). Fig. 13 shows how the training accuracy changes as the number of iterations increases on BRIG, FLIP and R & S & SN.

FIGURE 13.

Changes in training accuracy over iterations on the datasets with three augmentation methods.

Fig. 14 shows how the validation accuracy changes during training on these datasets. These curves show that the model trained on R & S & SN still performs best in both convergence and accuracy compared with the other two datasets.

FIGURE 14.

Changes in validation accuracy over iterations on the datasets with three augmentation methods.

Again, we used these three trained models to predict 100 images that were not used for training. The results are given in Table 6. The best results are obtained with the model trained on dataset R & S & SN.

TABLE 6 Predicted Results for Each Category of the Models Trained on BRIG, FLIP and R & S & SN

F. Runtime Analysis

In addition, time cost plays an essential role in whether the system can be adopted. Therefore, we measured the response time of the system under two experimental conditions: one with only the Intel(R) Core(TM) i5-7500 CPU, and the other with the NVIDIA GTX 1060 GPU added. Under both conditions, we tested the response time of the system when predicting 100, 300, 500 and 700 images. The results are shown in Fig. 15.

FIGURE 15.

The time cost of the classification model on CPU or GPU for different numbers of images.

The results demonstrated that the response time of our classification system was less than 1 millisecond (ms) per one hundred images, whether the device had parallel computing power or not.
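A response-time measurement of this kind can be sketched with a simple timing harness. The `dummy_predict` function below is a hypothetical stand-in for the trained classifier, and the batch contents are placeholders; only the batch sizes of 100, 300, 500 and 700 come from the text.

```python
import time

def mean_latency_per_image(predict, images, repeats=3):
    """Average wall-clock seconds per image over several repeated runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(images)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(images))

def dummy_predict(images):
    """Hypothetical stand-in for the trained model: returns a label in 0..4."""
    return [sum(img) % 5 for img in images]

for n in (100, 300, 500, 700):
    batch = [[i, i + 1, i + 2] for i in range(n)]  # placeholder "images"
    latency = mean_latency_per_image(dummy_predict, batch)
    print(f"{n} images: {latency * 1e3:.6f} ms per image")
```

Repeating each measurement and averaging reduces the influence of one-off delays such as model loading or cache warm-up.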

SECTION VI.

Discussion and Analysis

Recently, a number of studies have examined data augmentation methods, and most of these methods perform well in certain fields. In our research, we found that a characteristic of the tomato classification task is that object distance and viewing angle change at the same time. Therefore, we used three methods to augment our datasets: rotation, scale change, and rotation combined with scale change.

Furthermore, noise exists widely in natural environments. To bring our dataset closer to real environmental conditions, we added three types of noise, i.e., Pepper, Salt and Gaussian, to the dataset. Adding Gaussian and Pepper noise introduces colored pixels into the images. Because color information plays an important role in this classification task, these methods can create confusion. By comparison, adding Salt noise introduces no color information other than white, so the results are much better than with the other two methods.

SECTION VII.

Conclusions

In this paper, we divided tomatoes into five categories according to different ripeness indices, based on the relationship between storage time and appearance. We designed and implemented a novel deep learning-based architecture for classifying tomato maturity levels. Compared with other classical architectures, it requires less parameter computation and achieves higher accuracy.

To improve the performance of the classification model, we used t-SNE to verify the distribution of the dataset and to delete bad images. To avoid over-fitting during training, we used three methods of dataset augmentation.

Through experiments on different groups of datasets, we obtained the best prediction results by training on the R & S & SN dataset. With this dataset, the final accuracy reaches 91.9% and the prediction time is less than 0.01 second per one hundred images.
