Detection of Pits in Olive Using Hyperspectral Imaging Data

The unnoticed presence of pits in olives is an undesirable and even dangerous experience that affects the overall quality of products that contain olives and of dishes served in salad bars, airline meals, and cocktail lounges. Therefore, detecting pits and pit fragments so that they can be removed is very important for maintaining a high quality of products and services. The contributions of this paper are a) new hyperspectral data captured in a wavelength range that is suitable for detecting olives with pits inside; and b) a classification method, a CNN classifier, that yields a high pit detection accuracy. We collected spectral responses of Manzanilla (black and green) and Gemlik (black) olives in the 400-900 nm spectrum. Pitted and whole olives were classified using a one-dimensional convolutional neural network (1D-CNN) with classification accuracies of 99.5% and 97.69% on the training and test sets, respectively. Adding a dropout layer to the CNN raised the accuracy on the test dataset further, to 98.27%. As expected, the CNN performed better than the Support Vector Machine (SVM), k-Nearest Neighbors (KNN), Random Forest (RF), and Logistic Regression (LR) classifiers.


I. INTRODUCTION
OLIVE (Olea europaea L.), traditionally grown in the Mediterranean basin, is now cultivated around the world. It is a valuable natural food source and is also used for extracting cooking oil. Olives are an excellent source of vitamin E and other powerful antioxidants, which make them beneficial for the heart and may protect against cancers and osteoporosis. Being cured in salt, they are also high in sodium [1]. Olives, like other fruits such as dates and cherries, are processed before consumption. Some of them go through a pitting process in which the pits are removed. Pitted olives are commonly used in salads, cooked dishes, sandwiches, and pizzas. Olives were pitted and stuffed manually for a long time. The first continuous olive pitting machine was developed in [2]. Most commercial pitting machines remove the stems and use punch needles to push the pits out of the olives. Current conventional pitting machines can pit up to 2500 olives/min with a minimum error rate of 1-2%. However, a higher error rate is to be expected when the machine is not correctly adjusted [3]. The unnoticed presence of pits in table olives or food is an undesirable and even dangerous experience that may lead to dental damage. Therefore, a flotation step usually follows to separate the pitted olives from the non-pitted ones as well as from their fragments. Finally, a visual inspection is carried out before the products are packed. Pits that remain hidden within the olives pose an obvious risk of significantly reducing the overall quality of the products. Hence, the industry is always seeking more time-efficient and precise ways to detect pits. Over the last 50 years, light reflectance technology has provided rapid and non-destructive methods of food analysis. These include various techniques for analyzing the internal structure of fruits, and pit detection is no exception.
[4] used visible light scattering and [5] investigated the near infrared (800-830 nm) for its potential to detect pits in cherries. The main issue was that their methods were sensitive to cherry orientation. [6] patented a light transmission based model to detect cherry pits. [7] improved the previous model by considering different transmission wavelength regions, including microwave transmission, ultrasound reflection, and light beam transmission. The accuracy was close to 95%. [8] patented a pit detection device that consisted of an infrared point light source and a linear CCD array. [9] showed that magnetic resonance imaging (MRI) could be a suitable technique for real-time detection of pits in processed cherries. Later, they used one-dimensional nuclear magnetic resonance (NMR) projections to detect pits in olives in motion, with a maximum classification accuracy of 97.7% at a belt speed of 15 cm/s [10]. However, NMR was too expensive for industrial implementation. [11] used high-speed NMR techniques to design, construct, and test a prototype on-line quality evaluation sensor. The results showed better classification accuracy at the highest speed of 250 mm/s (100% and 99% for whole and pitted cherries, respectively). [12] used MRI technology for citrus seed identification, with classification accuracies of 88.9% and 86.7% for seedless and seed-containing oranges, respectively, under stationary conditions. Later, they studied different MRI sequences, obtaining fast low angle shot (FLASH) and combined spiral radial (COMSPIRA) images as the fruits were conveyed at 50 mm/s, and the accuracy increased up to 100% [13]. [14] improved the Hernández-Sánchez (2006) model by designing a genetic segmentation algorithm to determine the area where seeds were likely to be present; seed detection was then performed with image processing techniques, with accuracies of 91% and 92% for seedless and seed-containing fruits, respectively.
[15] proposed a lower-cost and simpler line-scan x-ray inspection system to identify non-pitted cherries. It was able to detect 100% of missed non-pitted cherries, but orientation was critical. Other pit detection approaches that are not based on light properties, such as mechanical ones, are not covered here. Hyperspectral technology has already been successfully established in the field of quality and safety inspection for a variety of food products: cereal and grain [16], [17], beans and nuts [18], [19], fruit and vegetables [20]-[22], meat [23], [24], dairy [25]-[27], eggs [28], [29], fat and oil [30], [31], liquid and semi-liquid food [32], and spices [33]. To the best of our knowledge, this is the first report on using hyperspectral imaging (HSI) to detect pits in olives. However, in recent years there have been reported attempts to detect pits and their fragments in cherries using hyperspectral imaging. Although the differences in density and other physical properties between olives and cherries make it impossible to compare the performance of the pit detection models directly, the methods can be similar. [34] were the first to develop a hyperspectral imaging method over the spectral region of 450-1000 nm, in combination with a feed-forward back-propagation neural network (BNN), to detect pits in tart cherries of three different sizes (small, medium, and large) and two different colors (light red and dark red). They also studied the effect of bruising on pit detection by adding bruised cherries that had undergone two different post-bruising treatments (room storage vs. cold storage). Hyperspectral images of the cherries were captured before and after removing the pits, under four different orientations. The accuracies were 96.5% and 96.9% for whole and pitted cherries, respectively. [35] examined the applicability of hyperspectral imaging technology in the VNIR wavelength range (400-1000 nm) for detecting pits in fresh and frozen cherries of three selected cultivars.
The detection accuracy was close to 86%. Later, they improved their method to detect pit fragments in addition to whole pits [36]. They used the Correlation-based Feature Selection (CFS) algorithm and second-derivative pre-treatment of the hyperspectral data to construct the BNN model. Their model was found to be 94.6% accurate for fresh cherries and 83.3% accurate for frozen samples. Distinguishing between intact and drilled (damaged) cherries reached an accuracy higher than 87% jointly for fresh and frozen cherries. At the same time, in another study, they used principal component analysis (PCA) and second-derivative pre-treatment to extract features from the hyperspectral images, and then designed a BNN model to detect whole pits or pit fragments in fresh cherries [37]. They obtained an accuracy of 81.4% for whole pit/pit fragment detection and almost 96% for distinguishing between drilled and intact cherries. In a similar study, [38] investigated the potential of detecting pits and pit fragments (half pit + quarter pit) in fresh cherries of three cultivars using the near infrared in the wavelength region from 800 to 2600 nm. Partial least squares discriminant analysis (PLS-DA) was applied to the spectra of the cherries and obtained an overall accuracy of 95% for a binary model (no pit versus whole pit/pit fragments) using the reflectance at every wavelength. Recently, a study was conducted to find the optimal method for quickly detecting a pit inside processed fruit that is invisible to the naked eye [39]. Cherry data acquisition was carried out in the NIR region (740-850 nm). The proposed system was based on a Histogram of Oriented Gradients (HOG) descriptor and an SVM classifier, and it achieved a precision of 97% and an accuracy of 98%. The above-mentioned studies confirm that hyperspectral imaging is an appropriate technology for detecting pits in cherries, independent of their size, color, and orientation during image capture.
Recent advances in image processing techniques have made it possible to implement new ideas for problems that were impossible to solve just a few years ago. Among those pioneering developments, Convolutional Neural Networks (CNNs) have become the de facto standard for various approaches in computer vision and machine learning in different areas, including image and video analysis and enhancement. 1D CNNs have been proposed more recently. They have shown state-of-the-art performance in applications such as personalized biomedical data classification and early diagnosis, structural health monitoring, anomaly detection and identification in power electronics, and electrical motor fault detection [40]. This is explained by the low computational complexity of 1D convolutions and by their more compact architectures. Most 1D CNN applications use only 1-2 hidden CNN layers with fewer than 50 neurons and fewer than 10,000 parameters. Such "shallow" networks are easy to train and use. It has been shown that compact 1D CNNs perform well when labeled data are limited. Because of these features, 1D CNN architectures are suited for real-time, embedded, and mobile applications such as electrocardiogram (ECG) classification and monitoring systems [41]-[42], wireless and real-time structural damage detection [43], [44], real-time fault detection and identification for modular multilevel converters (MMC) [45], and intelligent bearing fault diagnosis systems [46]. When the training data set is insufficient, or in applications over 1D signals, 1D CNNs can obtain better results than 2D CNNs. Besides, their compact and straightforward configuration, which performs only 1D convolutions, allows low-cost hardware and real-time implementation [47]. While a 2D-CNN is suitable for processing 2D data such as images, a 1D-CNN is commonly used to process sequential data, natural language data, 1D signals such as ECG, or time series data.
For example, the authors of [48], [49] performed an encrypted traffic classification task with a 1D-CNN and obtained better performance than when the data were transformed to 2D and a 2D-CNN was applied. In another study, for the speech recognition task of Civil Aviation's Radiotelephony Communication (CARC), an experiment in [50] compared a 1D-CNN and a 2D-CNN with the same number of layers for processing the acoustic features of CARC speech, and the recognition results showed that the 1D-CNN performs better than the 2D-CNN in terms of Word Error Rate (WER). One-dimensional CNNs have also shown considerable performance on a variety of problems dealing with hyperspectral images of food and agricultural products. [51] applied a 1D-CNN to hyperspectral data to detect aflatoxin in food products. At the same time, [52] showed that hyperspectral imagery in combination with a 1D CNN has immense potential as a tool to detect and quantify spores on nutrient media as well as on a specific food matrix (mashed potato). In this manuscript, to process the olive hyperspectral data, we considered the classification power of the spectral signature of a pixel rather than the spatial texture information of an individual spectral band in the neighborhood of a pixel; the feasibility of this approach is supported by the high classification accuracy yielded by the classifier.
The objective of this study was to demonstrate the efficacy of hyperspectral imaging in the VNIR wavelength range (400-900 nm) as an accurate, rapid, and non-destructive method for detecting pits in green and black Manzanilla olives as well as black Gemlik olives. The paper is organized as follows. Section 2 presents an overview of the HSI classification procedure; it describes hyperspectral imaging as the source of data and 1D convolutional neural networks as the classification model. Section 3 presents the results of the trained models and provides a comparison with multiple conventional classifiers such as SVM. Section 4 concludes the paper.

A. HYPERSPECTRAL IMAGING
Hyperspectral imaging (HSI), also known as chemical or spectroscopic imaging, is an emerging platform that integrates conventional imaging and spectroscopy to obtain both spatial and spectral information from a sample simultaneously. Like other spectral imaging techniques, it collects and processes information from across the electromagnetic spectrum, but instead of providing only one spectral value (gray scale images) or three (conventional color images), it obtains more than one hundred spectral values for each pixel in the image, as demonstrated in Figure 1. The resulting image has two spatial dimensions and one spectral dimension and is called a hypercube, data-cube, or spectral cube. In effect, a hyperspectral image is a stack of gray scale sub-images, one behind the other, at different wavelengths; it can be described as I(x, y, λ), where x and y are the spatial coordinates and λ is the wavelength (spectral band). Equivalently, a hyperspectral image can be viewed as an image in which each pixel contains a complete spectrum. Each material shows a characteristic variation in its spectral features, mainly due to differences in composition and structure. The energy spectrum of a sample is defined as the electromagnetic energy that is scattered, absorbed, reflected, or emitted at each wavelength. It is also known as the spectral signature or spectral fingerprint, since it captures unique characteristics of a given sample over a wide spectral range [53]. Hence, HSI is a comprehensive technology that allows nondestructive analysis of food products.
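The two views of a hypercube described above (a stack of band images, or one spectrum per pixel) can be illustrated with a minimal sketch. The tiny synthetic cube and the helper names `pixel_spectrum` and `band_image` are illustrative assumptions, not part of the acquisition software used in this study.

```python
# Synthetic hypercube stored as cube[y][x][k]:
# row y, column x, wavelength index k (2 rows x 3 columns x 4 bands).
cube = [[[y * 100 + x * 10 + k for k in range(4)]
         for x in range(3)]
        for y in range(2)]

def pixel_spectrum(cube, x, y):
    """Return the full spectrum stored at spatial position (x, y)."""
    return cube[y][x]

def band_image(cube, k):
    """Return the gray scale sub-image at wavelength index k."""
    return [[pixel[k] for pixel in row] for row in cube]

spectrum = pixel_spectrum(cube, x=2, y=1)  # one spectrum per pixel
image = band_image(cube, k=3)              # one image per band
```

A real cube from the camera would hold 240 values per pixel instead of four, but the indexing logic is the same.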

B. HYPERSPECTRAL IMAGING SYSTEM
The samples were imaged using a Resonon Pika II VNIR (Visible + Near Infrared) hyperspectral imaging camera (Resonon Inc., Montana, United States) coupled with an objective lens (Schneider Xenoplan 1.4/23-0902 from Schneider-Kreuznach, Bad Kreuznach, Germany). The Pika II camera is a push-broom (line scan) hyperspectral sensor with a spectral range of 400-900 nm in 240 wavelength bands. It has a spectral resolution of 2.1 nm and a 12-bit dynamic range (4096 intensity levels). The HSI system was controlled by a desktop computer running the SpectrononPRO software (version 5.1, Resonon, Bozeman, MT, USA) for image acquisition. Figure 2 shows our HSI system, which consists of the following components: light sources, a Pika II hyperspectral camera, and a linear translation stage. The light sources illuminate the inspected samples; thus, their performance can greatly affect the efficiency and accuracy of the imaging system. Moreover, the type of lamps and their positions are determined by the hyperspectral camera specifications. For example, a study [54] reported noise in the spectrum beyond 750 nm (NIR) when LED illumination was used during data acquisition, as demonstrated in Figure 3. Therefore, we used incandescent light so that near infrared information could be captured in our experiment.

C. SAMPLES PREPARATION
Pitted and whole green and black olives of the Manzanilla and Gemlik (Tirilye) cultivars were used as the training samples. In total, 1,000 samples (250 samples for each category) were selected to train the classification models (Figure 4). The pitted and whole olives are different samples; that is, no olive was imaged both before and after pitting. The test VOLUME 4, 2016

D. HYPERSPECTRAL IMAGER CALIBRATION
Prior to scanning, accurate calibration is necessary because of differences in camera quantum efficiency and in the physical configuration of imaging systems, such as the illumination sources. Since the uncorrected radiance may vary between systems, or even for the same system used at different times, imager calibration guarantees the stable and consistent performance of hyperspectral imaging systems. The dark current of the camera's array detectors creates a non-uniform response. To measure the average dark current noise ($I_{dark}$), all light entering the camera is blocked by turning off the lights and fully covering the objective lens with the opaque black cap; the response is then collected and averaged. The white reference image ($I_{white}$), representing the highest intensity values, was obtained by scanning a white target that reflects almost 99% of the received light. The corrected image ($I_c$) was calculated using the following formula [55]:

$$I_c = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}}$$

where $I_{raw}$ denotes the uncorrected image. In effect, this is a normalization of the radiance spectrum that removes both the spectral non-uniformity of the illumination sources and the dark current.
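As a minimal sketch of this correction for a single pixel, the code below assumes the standard flat-field form, (raw - dark) / (white - dark), applied band by band. The function name `calibrate`, the guard against a zero denominator, and the three-band readings are illustrative assumptions, not values from this study.

```python
def calibrate(raw, dark, white, eps=1e-12):
    """Flat-field correction of one spectrum (one value per band)."""
    corrected = []
    for r, d, w in zip(raw, dark, white):
        span = w - d
        # Guard against dead bands where white and dark readings coincide.
        corrected.append((r - d) / span if abs(span) > eps else 0.0)
    return corrected

# Hypothetical 12-bit readings for three bands of a single pixel.
raw = [1050.0, 2040.0, 600.0]
dark = [50.0, 40.0, 100.0]
white = [4050.0, 4040.0, 2100.0]

reflectance = calibrate(raw, dark, white)
```

After correction, a value near 0 corresponds to the dark reference and a value near 1 to the white target, regardless of how bright the lamp is in that band.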

E. IMAGE ACQUISITION
To take images of the samples, we first arranged 8 or 10 olives of the same cultivar into two rows and four columns on a white plate, without the olives touching each other. Both pitted and whole olives were scanned from the side, with their major axes parallel to the plate's surface, as shown in Figure 5.
Hyperspectral images were acquired in the 400-900 nm wavelength range, with a spectral resolution of 2.1 nm and 240 spectral bands. Each three-dimensional (x, y, λ) image measured 640 × 200 pixels × 240 spectral bands, where x and y are the spatial dimensions and λ is the spectral dimension. In total, 177 scans were obtained, each of which included a data-cube in Band Interleaved by Line (BIL) image coding and a regular RGB image (Figure 6).
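To make the BIL layout concrete, the sketch below computes where a given value sits in a flat BIL-ordered array. In BIL order, each scan line stores all of its bands one after another, with one full row of samples per band. Treating 640 as the number of samples per line (and 200 as the number of lines) is an assumption about how the dimensions above map onto the push-broom scan; the function names are hypothetical.

```python
def bil_offset(x, y, band, samples=640, bands=240):
    """Index of element (x, y, band) in a flat BIL-ordered array."""
    return y * bands * samples + band * samples + x

def pixel_spectrum(flat, x, y, samples, bands):
    """Gather the spectrum of pixel (x, y) from a flat BIL array."""
    return [flat[bil_offset(x, y, b, samples, bands)] for b in range(bands)]

# Tiny synthetic cube: 3 samples x 2 lines x 4 bands.
# Each stored value encodes its own (line, band, sample) position.
samples, lines, bands = 3, 2, 4
flat = [100 * y + 10 * b + x
        for y in range(lines)
        for b in range(bands)
        for x in range(samples)]

spectrum = pixel_spectrum(flat, x=2, y=1, samples=samples, bands=bands)
```

With the full-size defaults, the first value of the second scan line sits at offset 240 × 640 = 153,600.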

F. DATA ACQUISITION
We extracted the spectral signature of each olive by averaging over all of its pixels in the spatial domain. The training and test datasets contain 1000 and 173 data points, respectively; each data point has 240 spectral values within the range of 400-900 nm. The mean spectral signatures of the different olive cultivars are shown in Figure 7. When features span widely different ranges, a learning model may arbitrarily give more weight to the features with the larger scale. To avoid this issue, standardization rescales features of vastly different magnitudes so that they are distributed around a mean of zero with unit standard deviation. In other words, after standardization every feature has a mean of zero and a standard deviation of one.
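The averaging step that turns the pixels of one olive into a single spectral signature can be sketched as follows. The function name `mean_spectrum` and the two three-band pixel spectra are illustrative assumptions; in the study each spectrum has 240 bands and the pixels come from the segmented olive region.

```python
def mean_spectrum(pixel_spectra):
    """Average a list of per-pixel spectra band by band."""
    n = len(pixel_spectra)
    bands = len(pixel_spectra[0])
    return [sum(p[b] for p in pixel_spectra) / n for b in range(bands)]

# Hypothetical reflectance spectra of two pixels of the same olive.
olive_pixels = [
    [0.25, 0.5, 0.75],
    [0.75, 0.5, 0.25],
]

signature = mean_spectrum(olive_pixels)
```

Each olive therefore contributes exactly one data point to the dataset, no matter how many pixels it covers.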
If the data contain outliers or are skewed, scaling by standardization is problematic, because the computed mean and standard deviation are affected by the outlying values. In this case, the median is a better measure of central tendency. One solution is to compute the scaling statistics in a way that ignores the outliers and then to use those statistics to scale all of the data. This approach is called robust standardization or robust data scaling, and it is the one used in this paper. First, the median (50th percentile) is subtracted from each value; the result is then divided by the interquartile range (IQR), which is the difference between the 75th and 25th percentiles [56]:

$$x' = \frac{x - \mathrm{median}(x)}{Q_{75}(x) - Q_{25}(x)}$$

The resulting data have a median of zero and an interquartile range of one. Although the scaling is not skewed by outliers, the outliers are still present. Table 1 and Figure 8 describe the architecture of the deep CNN used in this paper.
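The robust scaling step described above can be sketched for a single feature (one spectral band across samples) using only the standard library. The quartile interpolation convention (`method="inclusive"`) and the example values are assumptions; the scaler actually used in the study may compute percentiles slightly differently.

```python
from statistics import median, quantiles

def robust_scale(values):
    """Scale one feature by subtracting the median and dividing by the IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the feature is (nearly) constant")
    med = median(values)
    return [(v - med) / iqr for v in values]

# Hypothetical values of one band; the last one is an outlier.
band_values = [1.0, 2.0, 3.0, 4.0, 100.0]
scaled = robust_scale(band_values)
```

Here the median (3.0) and the IQR (2.0) are unaffected by the outlier, so the four ordinary values land in [-1, 0.5] while the outlier stays clearly separated at 48.5.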

G. ARCHITECTURE OF 1D CONVOLUTIONAL NEURAL NETWORK
The architecture consists of four sets of convolutional (C2, C4, C6, and C8) and pooling (P3, P5, P7, and P9) layers. After the last set of convolutional and pooling layers, the output has size (13, 128), where 13 is the length of each feature map and 128 is the number of feature maps in the fourth (last) set of layers. After the flatten operation, the data size becomes 1664 (13 × 128). The last three layers of the architecture, including the output layer, are fully connected. The 1D CNN model was built in Python 3.8 with the TensorFlow 2.2 framework. After four stages of convolutional and pooling layers stacked one after the other, the input pixel vector is converted into a feature vector that captures the spectral information. This is followed by two fully connected layers, and the binary classification is performed in the output layer. The best architecture may be the simplest one, or it may be the one that is most accurate at a low computational complexity. In this manuscript, we targeted accuracy: we started from the simplest CNN, consisting of a convolution layer, a max pooling layer, and a fully connected layer, and computed its accuracy. We then added convolution and pooling blocks one by one and computed the accuracy at each step, which allowed us to find an optimal number of convolution and pooling blocks. Similarly, we varied the fully connected layers to find the best accuracy. The performance of any artificial neural network depends on the connections (weights) between neurons; therefore, it is very important to choose a proper weight initialization method. In this paper, all weights are initialized with the "He" initialization method, which is a current standard approach for networks using the ReLU activation function. The complete description can be found in [57]. Reference [58] proved mathematically that the "He" approach is the best weight initialization strategy for ReLU.
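The layer sizes quoted for this architecture can be checked with a short calculation. The kernel size of 3, the absence of padding ("valid" convolution), and the pooling size of 2 are assumptions made for this sketch, since those values are given in Table 1 rather than in the text; they are chosen because they reproduce the reported feature-map length of 13.

```python
def conv_out(length, kernel):
    """Output length of a 'valid' 1D convolution with stride 1."""
    return length - kernel + 1

def pool_out(length, pool):
    """Output length of non-overlapping 1D pooling (floor division)."""
    return length // pool

length = 240          # spectral bands per olive
trace = [length]
for _ in range(4):    # four convolution + pooling blocks
    length = pool_out(conv_out(length, kernel=3), pool=2)
    trace.append(length)

flattened = length * 128   # 128 feature maps in the last block
```

Under these assumptions the length shrinks as 240, 119, 58, 28, 13, and flattening 13 × 128 gives the 1664 inputs of the first fully connected layer.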
With "He" initialization, the weights are drawn randomly from a Gaussian distribution with zero mean and a standard deviation of $\sqrt{2/n}$, where n is the number of inputs to the neuron. Before setting an update rule for the weights, it is necessary to define an "error" measure, that is, a cost function. In our implementation, we used the binary cross-entropy loss function, which is computed on a mini-batch of inputs as follows [59]:

$$c_0 = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \,\right]$$

where m is the mini-batch size, and $y_i$ and $\hat{y}_i$ are the i-th true and predicted labels in the mini-batch, respectively. All layers are trained using the back-propagation algorithm with the Adam optimizer [60], which is an extension of stochastic gradient descent. Gradient descent minimizes the cost function by updating the model's parameters in the direction opposite to the gradient of the cost function with respect to the parameters. The size of the steps taken towards the local minimum is determined by a hyperparameter called the "learning rate", η. Adam, a combination of the "gradient descent with momentum" and "RMSProp" algorithms, is efficient when working with a lot of data or parameters. It computes an adaptive learning rate for each parameter. The name Adam is derived from "Adaptive Moment Estimation", since the method estimates the first moment (mean) and the second moment (uncentered variance) of the gradient to adapt the learning rate for each weight. To estimate the moments, it computes exponential moving averages of the gradient, $m_t$, and of the squared gradient, $v_t$:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

where $g_t$ is the gradient on the current mini-batch. The newly introduced parameters $\beta_1$ and $\beta_2$ control the decay rates and usually have values of 0.9 and 0.999, respectively. Since these values are close to 1 and $m_t$ and $v_t$ are initialized as zero vectors, $m_t$ and $v_t$ are biased towards zero, especially during the initial time steps.
To counteract the bias of $m_t$ and $v_t$ towards zero, bias correction is performed:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

Finally, the weights are updated at step t as follows:

$$w_t = w_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $w_t$ and $w_{t-1}$ are the weights and η is the learning rate. Using ϵ (a very small number) prevents any division by zero in the implementation. The ability of CNNs to automatically learn many filters under the constraints of a specific problem results in highly specific features that can be detected anywhere in the input data. Overfitting is a common problem in machine learning: the model works very well on the training dataset but performs poorly on the test dataset. In other words, it is unable to generalize, so it makes inaccurate predictions when given new data. Because training samples of hyperspectral data are limited, HSI classification often leads to overfitting, so additional techniques are needed to avoid it. We used L2 regularization, which encourages the sum of the squares of the parameters to be small by applying a weight λ to the squared values of the network's weights and adding this penalty to the original cost function $c_0$, giving the modified cost:

$$c = c_0 + \frac{\lambda}{2m}\sum_{i=1}^{N} w_i^2$$

where m is the mini-batch size, N is the number of weights, and λ > 0 is the regularization parameter, which needs to be tuned manually. The factor $\frac{1}{2}$ simplifies the derivative. Under the assumption that a simpler model generalizes better, adding the regularization term decreases the values of the weights, which reduces the variance of the model and makes overfitting less likely to occur.
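The pieces described in this section (He initialization, binary cross-entropy, the L2 penalty, and the Adam update with bias correction) can be seen working together in a deliberately tiny setting. The sketch below trains a single logistic unit on hypothetical one-dimensional toy data, not the CNN or the olive spectra; the learning rate of 0.05 is an assumption chosen so that the toy problem converges in a few hundred steps.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w, b, xs, ys, lam):
    """Mean binary cross-entropy plus (lam / 2m) * w^2, with gradients."""
    m = len(xs)
    loss = gw = gb = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)   # avoid log(0)
        loss -= (y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe)) / m
        gw += (p - y) * x / m
        gb += (p - y) / m
    loss += lam / (2 * m) * w * w      # L2 penalty on the weight only
    gw += lam / m * w
    return loss, [gw, gb]

def adam_step(params, grads, state, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias-corrected moment estimates."""
    updated = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = b1 * state["m"][i] + (1 - b1) * g
        state["v"][i] = b2 * state["v"][i] + (1 - b2) * g * g
        m_hat = state["m"][i] / (1 - b1 ** t)
        v_hat = state["v"][i] / (1 - b2 ** t)
        updated.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return updated

# Toy, linearly separable data (illustrative only).
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
w = rng.gauss(0.0, math.sqrt(2.0 / 1))   # He: std = sqrt(2 / n), n = 1 input
b = 0.0
state = {"m": [0.0, 0.0], "v": [0.0, 0.0]}

first_loss, _ = loss_and_grads(w, b, xs, ys, lam=1e-4)
for t in range(1, 501):
    _, grads = loss_and_grads(w, b, xs, ys, lam=1e-4)
    w, b = adam_step([w, b], grads, state, t, lr=0.05)
final_loss, _ = loss_and_grads(w, b, xs, ys, lam=1e-4)
```

In a framework such as TensorFlow these steps are provided by the optimizer and regularizer objects; the explicit loop is only meant to show where each formula enters.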

A. EVALUATION OF CNN MODEL
We conducted experiments to find the best CNN parameters during the model training phase. We varied the filter size and the number of feature maps in all convolution layers to maximize the classification results. The numbers of samples used for training and testing are 1000 and 173, respectively. Five-fold cross-validation was used to create and evaluate the proposed model. We chose k = 5 because it is not so large as to require a long running time, and it also provided a baseline for repeated evaluations. Each validation set was 20% of the training dataset, or 200 samples, which is close to the size of the actual test data. The training dataset was shuffled before being split and was thereafter scaled using the Robust Scaler.
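The shuffle-then-split procedure described above can be sketched as an index generator. The function name `k_fold_indices` and the fixed seed are assumptions for illustration; when the sample count is not divisible by k, this simple version leaves the remainder in every training set.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices, then return k (train, validation) splits."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // k
    folds = []
    for i in range(k):
        val = idx[i * fold_size:(i + 1) * fold_size]
        train = idx[:i * fold_size] + idx[(i + 1) * fold_size:]
        folds.append((train, val))
    return folds

folds = k_fold_indices(1000, k=5)
```

With 1000 training samples and k = 5, every fold trains on 800 samples and validates on the remaining 200, and each sample is used for validation exactly once.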

1) Batch size
We examined Batch Size (BS) values of 4, 8, 16, 32, 64, 100, 128, 200, and 500. Figure 9 shows the training, validation, and test performance of the model trained with the different batch sizes. The average training and validation accuracies were calculated after all five folds had been evaluated; we therefore added error bars on top of their bar plots in Figure 9. The test accuracy was evaluated by applying the model to the test data. The learning rate and the L2-regularization parameter (λ) were held constant at 0.001 and 0.0001, respectively. We fixed the number of epochs at 200. According to Figure 9, the validation and test accuracies for batch sizes of 64, 100, and 200 are close to each other; we therefore selected a batch size of 100 for the study. Figure 10 shows the training graphs of the model after all five folds had been evaluated, with the learning rate, the regularization parameter, the batch size, and the number of epochs set to 0.001, 0.0001, 100, and 200, respectively.

2) Number of epochs
Underfitting and overfitting are two major causes of poor performance of learning models. Underfitting occurs when the model is unable to capture the underlying trend of the data, so it can neither model the training dataset nor generalize to a new dataset. In contrast, if the model is trained for too long on the same data, it starts to memorize the noise and fits the training dataset too closely, so it cannot generalize well to a new dataset. There are various reasons for underfitting and overfitting, and an inappropriate training period is one of them. Stopping training too soon can cause underfitting, while too many training iterations may lead to overfitting. Figure 11 displays the effect of the number of epochs on the model performance. The learning rate, the L2-regularization parameter, and the batch size were held constant at 0.001, 0.0001, and 100, respectively. As can be seen, the model is underfit before the number of epochs reaches 100 and overfit beyond 200 epochs. Based on the validation and test accuracies, we selected 200 epochs as optimal for training the model.

3) L2 regularization parameter
The L2 regularization parameter (λ) is an input to the model that reduces overfitting: it adds bias to the model in exchange for better generalization performance. In this case the prediction accuracy on the training dataset will not grow, but the test accuracy will increase. Since increasing λ results in less overfitting but also in greater bias, we need to find the best value. We initialized λ to zero, trained the model on the whole training dataset, and computed the average training and validation losses over the five-fold cross-validation. We then repeated the process for slightly larger values of λ to see how they affect the variability of the training and validation losses. Fig. 12 shows the results.
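The trade-off controlled by λ can be seen in closed form on a one-parameter toy problem. Assume an unregularized cost (w - a)^2 whose minimum is at w = a; adding the penalty (λ / 2m) w^2 moves the minimum to w = a / (1 + λ / 2m). The function name, the value a = 2, and the λ grid below are illustrative assumptions, not the sweep reported in Fig. 12.

```python
def regularised_minimiser(a, lam, m=1):
    """Minimiser of (w - a)^2 + lam / (2 m) * w^2."""
    return a / (1.0 + lam / (2.0 * m))

a = 2.0   # minimiser of the unregularised toy cost
sweep = {lam: regularised_minimiser(a, lam)
         for lam in (0.0, 0.0001, 0.01, 1.0, 2.0)}
```

With λ = 0 the weight stays at 2.0; as λ grows, the weight is pulled towards zero (1.0 at λ = 2), which is the extra bias that is traded for reduced overfitting.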

1) Accuracy
It is defined as the number of correct predictions divided by the total number of predictions.

2) Precision
It is the ratio of true positives (TP) to all predicted positives. In this study, precision is the proportion of olives classified as whole (with pit) that actually contain a pit.

3) Recall or Sensitivity
It is the ratio of correct positive results to the number of all relevant samples (all samples that should have been identified as positive). In this study, it is the proportion of olives that actually contain a pit that are classified as whole (with pit). Sensitivity gives us information about the model performance with respect to false negatives (how many whole olives we missed), while precision gives us information about the performance with respect to false positives (how many pitted olives we wrongly flagged as whole). Precision is about being precise: if only one olive is classified as whole and it really is whole, the classifier is 100% precise, even if a large number of other whole olives are missed. Sensitivity, on the other hand, is about capturing all olives that contain a pit (whole olives). If we wish to design a system that detects "all" whole olives in order to avoid any injury to customers, we should minimize the false negative rate, that is, increase sensitivity towards 100%, while keeping precision within acceptable limits. Since pitted olives misclassified as whole ones do not cause any serious problem, false positives are not critical. However, too many false positives waste good olives.

4) F1 Score
It is the harmonic mean of precision and sensitivity. The F1 score ranges between zero and one. It shows how precise the classifier is (how many samples it classifies correctly) as well as how robust it is (it does not miss a significant number of samples); the F1 score therefore balances precision and sensitivity.

$$F_1 = \frac{2 \times Prec \times Sen}{Prec + Sen} \qquad (11)$$

5) Specificity
It is the ratio of correctly identified negative samples (TN) to the number of all samples that should have been identified as negative. In this study, it is the proportion of pitted olives that are correctly classified as pitted.

6) False Omission Rate (FOR)
It is the ratio of false negative samples (FN) to the total number of predicted negatives. In other words, it measures the proportion of negative predictions that are wrong.
The FOR is a useful measure because it indicates the share of olives passed as pitted that actually contain a pit; these olives continue along the production line as a correct product, which is not desired.
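The six metrics defined above can all be computed from the four entries of a confusion matrix, with "whole olive (with pit)" as the positive class. The function name `pit_metrics` and the counts in the example are hypothetical and are not results from this study.

```python
def pit_metrics(tp, fp, tn, fn):
    """Classification metrics with 'whole olive (with pit)' as positive."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "specificity": tn / (tn + fp),
        "for": fn / (fn + tn),   # false omission rate
    }

# Hypothetical counts: 50 whole olives (5 missed), 150 pitted (10 flagged).
scores = pit_metrics(tp=45, fp=10, tn=140, fn=5)
```

In this made-up example the accuracy is 92.5%, yet 5 of the 145 olives passed as pitted (about 3.4%) still contain a pit, which is the quantity the FOR makes visible.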

C. PIT DETECTION RESULTS
To demonstrate the effectiveness of the proposed CNN model, we compared it with multiple conventional classifiers: k-nearest neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). For a fair comparison with the CNN, the numbers of training and testing samples were the same for all classifiers. A grid search was implemented to find the best hyperparameter values for the models, and 5-fold cross-validation was then applied to assess their performance. The comparison between the CNN and the other classifiers in terms of classification accuracy is tabulated in Table 2. All the classifiers were operated with their optimal parameters. The proposed CNN has the best performance among the models. Although the average accuracy of the RF model on the training set is 100%, its test set accuracy is low, which indicates overfitting to the training dataset. With the CNN model, the training, validation, and test accuracies are all above 97%. The SVM results are very close to those of the CNN; both perform significantly better than the other classifiers. Although the SVM trains much faster on large data sets than the CNN, the CNN is more consistent: the standard deviation of its validation accuracy is less than half that of the SVM. The Pika II camera can acquire RGB and hyperspectral images of the same scene concurrently. We have provided the performance results of the four conventional classifiers on the olive RGB dataset in Table 3. The results indicate the advantages of hyperspectral over RGB data for olive pit detection.

D. DROPOUT
Hyperspectral datasets typically offer a limited number of training samples, which can lead to overfitting. Dropout is a popular regularization method [61] that mitigates overfitting. The term "dropout" refers to dropping out nodes (hidden and visible) in a neural network, meaning that they are temporarily removed from the network along with all their incoming and outgoing connections. Dropout introduces a new hyperparameter, namely the probability with which nodes are dropped out.
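As a minimal sketch of the mechanism, the function below applies dropout to a list of activations. It assumes the common "inverted dropout" convention, in which the surviving activations are rescaled by 1/(1 - rate) during training so that no rescaling is needed at inference time; that convention, the function name, and the rate of 0.5 are illustrative assumptions and are not taken from this paper.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # dropout is inactive at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10_000
dropped = dropout(hidden, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)   # close to the dropout rate
mean_activation = sum(dropped) / len(dropped)       # close to the original mean
inference = dropout(hidden, rate=0.5, training=False)
```

Because a fresh random mask is drawn on every call, each training iteration effectively updates a different thinned version of the network.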
In every iteration, the process randomly selects some nodes and removes them from the network. A dropout layer is usually added after each layer to temporarily zero the neuron outputs with a certain probability, which is called the "dropout rate". We added a dropout layer after each pooling layer and each fully connected layer, except the output layer. We used two different dropout rates: one for the convolutional blocks and one for the fully connected layers. Then, we trained the network.
Table 3 shows the pit detection results obtained by the different classification models, including the CNN with and without dropout layers, with respect to the accuracy, precision, sensitivity, F1, specificity, and FOR scores. All models were tested using their optimal parameter values, as shown in Table 2. Moreover, the classification results were obtained for the green and black olives separately. The test dataset contains 68 green and 105 black olives, of which 25 green and 37 black olives are whole (contain a pit). The results show that adding dropout layers improved the performance of the CNN with respect to almost all of the classification scores, and the overall test accuracy increased from 97.69% to 98.27%. Although SVM performs slightly better on the black olives than the CNN, even with dropout layers, its performance on the green olives is worse than that of LR.
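The reported overall accuracies can be related to the test-set size with a short calculation. The sketch below assumes that overall accuracy is the number of correctly classified olives divided by the 173 test samples and that the percentages are rounded to two decimals; the misclassification counts it derives are an inference from the rounded figures and are not stated in the paper.

```python
TEST_SIZE = 68 + 105  # green + black olives in the test set (from the text)


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


def errors_matching(reported_pct: float, total: int):
    """Error counts whose rounded accuracy equals the reported percentage."""
    return [e for e in range(total + 1)
            if accuracy_pct(total - e, total) == reported_pct]


errors_without_dropout = errors_matching(97.69, TEST_SIZE)
errors_with_dropout = errors_matching(98.27, TEST_SIZE)
```

Under these assumptions, 97.69% corresponds to 169 of 173 olives classified correctly and 98.27% to 170 of 173, so the gain from dropout would amount to one additional correctly classified test sample.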

IV. CONCLUSION
In this study, we have demonstrated the capabilities of hyperspectral technology as a non-destructive and reliable method for olive pit detection. We collected hyperspectral data from different types of olives in the 400-900 nm spectral range and performed pit detection using a CNN. The data set consisted of 1,173 Manzanilla (black and green) and Gemlik (black) olives, which were divided into training and test sets of 1,000 and 173 samples, respectively. For training and classification with the CNN, we used the average of the spectral information over a group of pixels in the spatial domain. The performance of the CNN was compared with that of four conventional classifiers: SVM, RF, KNN, and LR. The SVM model achieves 97% of the CNN test accuracy; however, its average validation accuracy is lower and its standard deviation is about twice as high. Adding dropout layers to the architecture improves the overall performance of the CNN model. The results of this study are promising; future research will extend the dataset and determine the optimal wavelengths for pit detection. A continuation of this study will also address the detection of pit fragments.

ACKNOWLEDGMENT
The publication fee for this article was supported by the UNLV MSI Open Article Fund.

EMMA E. REGENTOVA received her Ph.D. in computer engineering from the State Engineering University of Armenia (Polytechnic of Yerevan). She is currently a Professor in the Electrical and Computer Engineering Department at the University of Nevada, Las Vegas. She has worked on numerous funded research projects and published on topics including image processing, computed tomography and compressed sensing, remote sensing and hyperspectral image analysis, computer-assisted medical diagnostics, biomedical research, data compression and coding, intelligent transportation systems, neutron and photon imaging for security and inspection, digital system design, and embedded DSP.
KAZEM TAGHVA received his Ph.D. in 1980 from the University of Iowa. He is currently the Chair and a Professor in the Computer Science Department at the University of Nevada, Las Vegas. Prior to joining UNLV, he was chairman of the Computer Science Department at New Mexico Tech. His research includes Information Retrieval (IR), Machine Learning (ML), and Database Systems (DBMS). His work in IR focuses on retrieval from noisy text (OCR), his work in ML covers Named Entity Recognition, including proper nouns, addresses, and acronyms, and his work in DBMS deals with dependency theory. He was supported by the Department of Energy for 25 years to construct the largest government database of documents for legal discovery (Licensing Support Network).