Zanthoxylum bungeanum Fruit Detection by Adaptive Thresholds in HSV Space for an Automatic Picking System

Zanthoxylum bungeanum is widely cultivated, and automatic picking systems and robots are being researched to pick the pepper fruit in place of human hands. The detection algorithm is a key part of an automatic picking system. The hue, saturation, and value (HSV) color space is widely used in color image segmentation. Existing methods use either a fixed threshold in the HSV color space or a dynamic threshold obtained by the Otsu method after the color image is converted into a gray image, because the Otsu method is very popular for gray pictures. After evaluating the fixed threshold and the Otsu method for pepper fruit detection, we propose an adaptive threshold method that works directly in the HSV color space, called AHT and BBSV, which stands for adaptive hue threshold and balance between saturation and value. The hue, saturation, and luminance components each have their own adaptive threshold. The hue threshold is obtained according to whether soil, rock, and the pepper fruit are present in the image. The saturation and luminance thresholds are obtained by keeping the sum of saturation and luminance constant. The test dataset contains many hundreds of pictures. Our proposed method works very well: the recall rate, accuracy, and false alarm rate reach 100%, 100%, and 0%, respectively. To evaluate the location precision, we use two metrics: the ratio of the overlap area between the detection region and the ideal region, and the location error. The overlap ratio of our method is above 64%, and the location error is about 8%. All of these metrics are greatly improved compared with the fixed threshold.

Zanthoxylum bungeanum is also known as Chinese pepper or Sichuan pepper. China ranks first in the world in both the planted area and the production of Chinese pepper [1]. So far, the pepper fruits are picked by hand in most places, and the workers are easily hurt by the thorns on the tree branches. Moreover, the pepper tree is generally 2-5 m tall, so it is difficult for the workers to pick the fruit on the high branches. Workers are always in short supply even though wages grow rapidly, since fewer and fewer farmers remain in the countryside as the economy has developed in recent years. Therefore, some researchers proposed new picking tools to improve picking efficiency [2], [3], and other researchers proposed automatic picking systems or intelligent picking robots without manual control to greatly reduce the number of workers and improve efficiency [4], [5], [6].
The machine vision device is the core part of the automatic picking system for Zanthoxylum bungeanum fruits. It takes a photograph, detects the pepper fruit in the photograph, and gets the location of the pepper fruit; then, the pepper fruit is picked and collected automatically by the system. So, the detection method of the pepper fruit is very important. Detecting the pepper fruit means assigning a label to every pixel in an image such that pixels with the same label belong to the pepper fruit, which is an image segmentation task.
Thresholding is one of the most used methods for image segmentation. The method of thresholding is to segment an image into two or more object regions. Many methods are proposed to find out the optimal threshold, which maximizes or minimizes an objective function. The goal of thresholding is to select a set of thresholds, which can discriminate between object and background pixels. Bi-level thresholding selects only one threshold, which separates the pixels into two classes, while multilevel thresholding determines multiple thresholds, which divide the pixels into several groups. The Otsu method is one of the traditional threshold selection methods. The optimal threshold is calculated according to the distribution of the gray-level histogram based on variance and intensity. It is very popular for the gray pictures [7], [8], [9].
Since there are three color components in color pictures, thresholding of color image segmentation is complex and always different according to the different implementations based on the choice of the color space. A color space is used to discriminate different colors. All the sets of colors are specified by a 3-D coordinate system and a subspace with the color component and specific models, which can reconstruct all the colors of the pictures. The frequently used color spaces are RGB, hue, saturation, and value (HSV), CIELAB, YCbCr, and so on.
Since the Otsu method is used for the gray images, color images can be converted into gray images for the application of the Otsu method. Color image conversion can be based on the Karhunen-Loeve transform, which is one representation of a stochastic process. The process is to map the multidimensional data with correlation into a new coordinate in the region of the data distribution in order to compress the data information, while the orientation of the new coordinate should keep the maximum amount of the information. This method is helpful to achieve dimensionality reduction of high-dimensional data. Then, the multithreshold Otsu method is used and three pictures show good segmentation result [10].
RGB color space can be used for color images, and the multilevel Otsu algorithm is applied to the R, G, and B channels separately [11]. Beyond RGB color space and the multilevel Otsu method, group search optimization (GSO) is used to get the dynamic threshold. The method outperforms the comparison approaches in the tests considering three, four, and five fixed thresholds for each RGB channel, and its performance is at least as good as the best comparison approach considering two thresholds for each RGB channel [12]. RGB color space is also used to detect the pepper fruit: the value of the red channel minus the green channel differs between the pepper fruit and the background and leaves, so a fixed threshold is used to separate the pepper fruit region from the other regions. The test uses 200 pictures of the pepper fruit photographed in natural scenes, including sunny and cloudy days. The location error of the center of the fruit bunch is about 10 mm [13].
CIELAB color space is used to detect tumors. Color pictures are converted from RGB to CIELAB color space. The A and B layers contain all the color information in the image. K-means divides the objects into several clusters using the Euclidean distance metric. The detection of the tumor uses a support vector machine (SVM) classifier. Three pictures are tested and the tumor is found [14]. LAB color space is also used to detect the pepper fruit. According to the obvious red tone of the mature pepper fruit, the image is converted from RGB to CIELAB color space, and the A component is used with the K-means clustering algorithm with K = 3. The algorithm works for the pepper fruit, but it fails when there are many branches and trunks [15].
YCbCr color space is used to detect navel orange. The Otsu threshold segmentation algorithm is used to calculate the threshold value of the Cr component in the YCbCr color model, remove the background of navel orange, and obtain the binary images of the segmented navel orange area. A total of 98 images including positive, normal, and reverse light images are randomly selected and the recognition rate is about 87% [16].
Only the hue component in HSV color space, with a threshold range between 110° and 130°, is used to classify good leaf surfaces and backgrounds; 60 experimental results show that the algorithm identifies the leaf surface accurately with a 100% accuracy rate [17]. Only the hue component of HSV color space is also used to detect the pepper fruit. The histogram of the hue component is calculated, and the threshold in the histogram is limited to a small range. The pepper fruit region is obtained from the threshold by the Otsu algorithm, which is called the improved Otsu algorithm. The threshold is fixed by three values according to three conditions: direct sunlight, backlight, and shade. There are 180 pictures with the pepper fruit for testing, and the correct recognition rate is about 90% [18]. The pepper fruit is detected by a fixed threshold of the hue component in HSV color space after a homomorphic filter is used to compensate for the light at the edge of the pepper fruit, and, according to the round shape, the degree of circularity of the pepper fruit region is calculated. There are 235 pictures with the pepper fruit from the pepper planting base for testing, and the average recall rate is up to 94% [19].
HSV color space and all of the HSV components are used to detect human skin. The hue (H) component, which is scaled from 0° to 360°, was divided into 36 primary intervals. The saturation (S) and value (V) components, which are scaled from 0% to 100%, were separated into ten primary intervals. Each was applied to the 951 skin samples extracted from the pictures of individuals, comprising human skin samples and nonskin samples, to observe which intervals contained at least one tone that could be regarded as a human skin tone. The HSV filter selected a well-defined band in the geometric representation of the color space, reducing the spectrum to 94 030 tones (2.5352% of the total spectrum) and thus rejecting 97.4648% of colors as probable human skin tones [20]. The histogram of each component in the HSV color space is calculated to detect tactile paving, and the threshold is dynamic according to the average and standard deviation of each histogram. There are 870 images taken in several indoor and outdoor environments in Europe and Asia for testing, and the accuracy is about 91% [21]. A fixed range of the hue and value components is used to detect green pixels; 150 green fluorescence images are tested and the acceptance rate was 90% [22]. In the image of a tea bud, there is no obvious boundary in the histograms of the separate R, G, and B channels in the RGB color space, whereas the image presents a relatively obvious boundary after HSI/HSV transformation, so it is easier to set the threshold value to separate buds [23]. The image is color-converted to HSV; then, the region of interest (ROI) is selected by a fixed threshold to detect and track the moving target [24].
HSV color space is also used to detect human faces [25], forest fires [26], koi fish [27], and so on.
Some researchers used hybrid color spaces. The HSV and YCbCr color spaces are used to detect mango leaves. Each component, including H, S, V, Cb, and Cr, is compared with the threshold of the histogram obtained by the Otsu method, and 62 images with mango leaves are tested; the Cr component has the better performance, with an average recall of about 97% [28]. Both RGB and HSV color spaces are used to detect the pepper fruit. The R component is split with a fixed threshold of 0.6, and the H component is split with a fixed threshold of 0.9. Then, the result of R and H is multiplexed to get the final detection result [29].
Compared with other color spaces, HSV is closer to human visual perception and is dominant for color image segmentation, especially for the pepper fruit with its red color, so we use HSV color space in our research. The thresholds in HSV space are mostly fixed, or dynamic through the Otsu method or the k-means algorithm. The fixed thresholds only work under limited conditions, so we try to get dynamic or adaptive thresholds. The objective function of the Otsu method is equivalent to that of the k-means algorithm in multilevel thresholding [30]. We evaluate the Otsu algorithm for the detection of the pepper fruit before we propose our method to get the adaptive threshold. In our method, the threshold of each component in the HSV color space adapts to the conditions of every image.
The rest of this article is organized as follows. In Section II, RGB and HSV color model is introduced, and we give the general flowchart to detect the red pepper fruit. The typical scenarios of the detection are defined and the Otsu algorithm is evaluated in Section III. Section IV describes our proposed algorithm and flowchart. The simulation and experimental results are discussed in Section V. The conclusion is presented in Section VI.

II. COLOR SPACE AND GENERAL FLOWCHART
We generally regard the basic colors as red, green, and blue and define other colors as a mix of these three. So, the RGB color space is basic, but what we perceive as color seems to depend on the characteristics of brightness, hue, and saturation, which are described by the HSV color space. So, we introduce the relationship between HSV and RGB.

A. RGB Color Space
In this color space, the primary colors or color components are red (R), green (G), and blue (B). It is an additive model, in which colors are produced by adding components, with white having all colors present and black being the absence of any color. This is the model used for active displays, such as television and computer screens. The RGB model is usually represented by a unit cube with one corner located at the origin of a 3-D color coordinate system, the axes being labeled R, G, and B, and having a range of values [0, 1], which is shown in Fig. 1(a). The origin (0, 0, 0) is considered black and the diagonally opposite corner (1, 1, 1) is called white.
Mixtures of light of these primary colors cover a large part of the human color space and thus produce a large part of human color experiences. This is why color television sets or color computer monitors need only produce mixtures of red, green, and blue light.
So, the ith pixel in the color image can be expressed by its position (x, y) and its color components (r, g, b), where x and y are the indexes of the width and height of the image, and r, g, and b are the red, green, and blue values, which are in the range from 0 to 1.

B. HSV Color Space
The HSV color space is a description of the color change of different gray-level colors. Compared with the RGB color space, the HSV space is less sensitive to illumination changes and can better reflect the color distribution of the picture.
In the HSV color space, the basic parameters of color light are measured by hue (H), saturation (S), and value (V). Value or luminance is defined as the brightness of the light as perceived by human eyes. In general, the more energy the color light contains, the brighter it appears; the less energy it contains, the darker it appears. The luminance range is from 0% (black) to 100% (white). Hue reflects the color category. Saturation refers to the degree of color depth. For the same hue, the higher the saturation is, the deeper the color is, and the lower the saturation is, the lighter the color is.
The HSV color space is represented by a cone, as shown in Fig. 1(b). The three points on the top surface represent the three color positions of red, green, and blue, respectively. The S direction indicates the saturation change, and the closer to the outer frame, the higher the saturation; the height of the cone indicates the luminance value, and the bottom of V is black and the top is white.
So, the ith pixel in the color image can be expressed by its position (x, y) and its components (H, S, V), where x and y are the indexes of the width and height of the image, and H, S, and V are hue, saturation, and luminance, which are transformed from RGB by (3)-(7), where max is the function to obtain the maximal value and min is the function to obtain the minimal value. From (3) to (7), H is in the range from 0° to 360°, S is in the range from 0 to 1, and V is in the range from 0 to 1.
In our research, OpenCV is used because it is a very popular open-source library for digital image processing. In OpenCV, H, S, and V are stored as 8-bit integers: the stored value of H is the hue divided by 2 and rounded to an integer, and the stored values of S and V are multiplied by 255 and rounded to integers. So, the range of the stored value of H is from 0 to 180, and the ranges of the stored values of S and V are from 0 to 255. In the following, we express the values of H, S, and V according to these ranges.
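The scaling described above can be illustrated with a minimal sketch. It relies on the standard-library `colorsys` module rather than OpenCV, so it only approximates the stored values; the function name `rgb_to_stored_hsv` is hypothetical, and wrapping a rounded hue of 180 back to 0 is an assumption made here to keep the hue inside one 8-bit cycle.

```python
import colorsys


def rgb_to_stored_hsv(r, g, b):
    """Map r, g, b in [0, 1] to 8-bit style (H, S, V) integers.

    Hypothetical helper: H is the hue in degrees divided by 2,
    S and V are scaled by 255, all rounded to integers.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v are each in [0, 1]
    hue_deg = h * 360.0                      # hue in degrees, 0 <= hue < 360
    # Assumption: a rounded value of 180 wraps around to 0.
    stored_h = int(round(hue_deg / 2.0)) % 180
    stored_s = int(round(s * 255.0))
    stored_v = int(round(v * 255.0))
    return stored_h, stored_s, stored_v
```

With this convention, a pure red pixel maps to a stored hue of 0, which is why the red range sits at both ends of the stored hue axis in the rest of the article.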

C. General Flowchart
Since the pepper fruit has an obvious characteristic, its red color, that distinguishes it from the leaves, branches, and other backgrounds, the pepper fruit can be detected from the images based on the color space. In general, the red pepper fruit in the color picture is detected by the following steps: 1) read the picture in the original RGB color space; 2) convert to HSV color space; 3) get the mask within some small range of H, S, and V separated by thresholds; 4) apply morphological operations, such as dilating and eroding; 5) get the contours; and 6) get the rectangle of the max contour. The detection algorithm aims to improve the mask and better get the fruit region from the contour. Since one picture is detected each time before the picking system operates, we take the max contour of each detection to calculate the location and make the picking system move to the fruit.
These steps can be shown in the general flowchart of Fig. 2.
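The last two steps (contours and the rectangle of the max contour) can be sketched in plain Python under simplifying assumptions. This is not the OpenCV pipeline itself: the mask is assumed to be already computed as a 2-D list of 0/1 values, morphology is omitted, and a flood fill over 4-connected pixels stands in for contour extraction, with the pixel count used as the region size. The name `largest_region_rect` is hypothetical.

```python
from collections import deque


def largest_region_rect(mask):
    """Return (x, y, width, height) of the largest 4-connected region, or None."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best_size, best_rect = 0, None
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Flood fill one region and track its bounding box.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            size = 0
            r_min = r_max = r0
            c_min = c_max = c0
            while queue:
                r, c = queue.popleft()
                size += 1
                r_min, r_max = min(r_min, r), max(r_max, r)
                c_min, c_max = min(c_min, c), max(c_max, c)
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if size > best_size:
                best_size = size
                best_rect = (c_min, r_min, c_max - c_min + 1, r_max - r_min + 1)
    return best_rect
```

Keeping only the largest region mirrors the one-detection-per-pick behavior described in the text.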
III. SCENARIOS AND PRESENT METHOD
To investigate the detection problems of Sichuan pepper fruit, we identify several typical scenarios and solve their problems based on their features.

A. Typical Scenarios
Since the automatic picking system is located in the field, it should work well whether the weather is cloudy or sunny and whether the background is soil, rock, road, or house. If the weather is cloudy, the pepper fruit is dim in the picture from the camera of the automatic picking system. The background of soil, rock, and other objects may interfere with the correct detection of the pepper fruit.
The typical scenarios are shown in the eight pictures in Fig. 3.

B. Otsu Algorithm
The Otsu algorithm is used to maximize the interclass variance. The principle is as follows.
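Suppose there are L gray levels [1, 2, ..., L] in a gray image. The number of pixels at level i is denoted by $n_i$ and the total number of pixels by $N = n_1 + n_2 + \cdots + n_L$. The gray-level histogram is normalized and its probability distribution is

$$p_i = \frac{n_i}{N}, \quad p_i \ge 0, \quad \sum_{i=1}^{L} p_i = 1.$$

The pixels are divided into two classes $C_0$ and $C_1$ by a threshold at level k; $C_0$ denotes pixels with levels [1, ..., k], and $C_1$ denotes pixels with levels [k + 1, ..., L]. Then, the probabilities and averages of $C_0$ and $C_1$, respectively, are given by

$$\omega_0 = \sum_{i=1}^{k} p_i = \omega(k), \quad \omega_1 = \sum_{i=k+1}^{L} p_i = 1 - \omega(k)$$

and

$$\mu_0 = \sum_{i=1}^{k} \frac{i\,p_i}{\omega_0} = \frac{\mu(k)}{\omega(k)}, \quad \mu_1 = \sum_{i=k+1}^{L} \frac{i\,p_i}{\omega_1} = \frac{\mu_T - \mu(k)}{1 - \omega(k)}$$

where

$$\omega(k) = \sum_{i=1}^{k} p_i, \quad \mu(k) = \sum_{i=1}^{k} i\,p_i, \quad \mu_T = \sum_{i=1}^{L} i\,p_i.$$

So, we can get the relation

$$\omega_0 \mu_0 + \omega_1 \mu_1 = \mu_T, \quad \omega_0 + \omega_1 = 1.$$

The class variances are

$$\sigma_0^2 = \sum_{i=1}^{k} \frac{(i - \mu_0)^2 p_i}{\omega_0}, \quad \sigma_1^2 = \sum_{i=k+1}^{L} \frac{(i - \mu_1)^2 p_i}{\omega_1}.$$

The optimal threshold k maximizes the criterion $\eta = \sigma_B^2 / \sigma_T^2$ or, equivalently, maximizes the interclass variance $\sigma_B^2$, where

$$\sigma_B^2 = \omega_0 \omega_1 (\mu_1 - \mu_0)^2, \quad \sigma_T^2 = \sum_{i=1}^{L} (i - \mu_T)^2 p_i.$$

The Otsu algorithm is used for the gray picture, and we can convert the color picture into a gray one to use the Otsu algorithm instead of converting to HSV color space in the general flowchart of Fig. 2. The number of pixels at every gray level from 0 to 255 is calculated to obtain the gray histogram, and the threshold is found by the Otsu algorithm. Then, we can get the thresholding image as the mask, in which the pixel value is set to 255 if the original pixel value is bigger than the threshold and set to 0 otherwise.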
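The threshold search can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: levels are indexed from 0 (as in a 0-255 histogram) rather than from 1, the interclass variance is evaluated in the equivalent form $(\mu_T\,\omega(k) - \mu(k))^2 / (\omega(k)(1 - \omega(k)))$, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(hist):
    """Return the level k that maximizes the interclass variance.

    hist[i] is the number of pixels at gray level i (levels start at 0).
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    mu_total = sum(i * n for i, n in enumerate(hist)) / total
    best_k, best_var = 0, -1.0
    omega_k = 0.0  # cumulative probability of class C0
    mu_k = 0.0     # cumulative first moment up to level k
    for k, n in enumerate(hist):
        p = n / total
        omega_k += p
        mu_k += k * p
        omega_rest = 1.0 - omega_k
        if omega_k <= 0.0 or omega_rest <= 0.0:
            continue  # one class is empty, variance undefined
        var_between = (mu_total * omega_k - mu_k) ** 2 / (omega_k * omega_rest)
        if var_between > best_var:
            best_var, best_k = var_between, k
    return best_k


def binarize(values, threshold):
    """Set a pixel to 255 if it is bigger than the threshold, else 0."""
    return [255 if v > threshold else 0 for v in values]
```

When several levels give the same variance (for example, across an empty gap between two histogram modes), this sketch keeps the first one.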
From the gray histogram of Fig. 4(b), we can get the threshold of 84; then, we can get the thresholding image and the detection result. In general, we report only the detection with the maximum area because our detection provides the location of the pepper fruit to control the picking system, that is, one detection for each pick.
On the other hand, we can use the image of the hue component instead of the gray one.
From the hue histogram of Fig. 5(b), we can get the threshold of 113; then, we can get the thresholding image of Fig. 5(c) and get the detection result of Fig. 5(d).
The Otsu method can work on the hue component image instead of the gray one if the fruit exists, but it cannot work if there is no fruit. As an example, in Fig. 6, from the hue histogram of Fig. 6(b), we get the threshold of 84; then, we get the thresholding image of Fig. 6(c) and the detection result of Fig. 6(d), where the blue rectangle is a false detection of the pepper fruit.

C. Fixed Threshold
Since the Otsu algorithm is designed for the gray picture, the color ROI is instead segmented through a color space, such as RGB, HSV, and so on.
When HSV space is used for segmentation, the red ROI is extracted by

$$\mathrm{ROI} = \{(x, y) \mid h \in [h_{0L}, h_{0H}] \cup [h_{1L}, h_{1H}],\ s \in [s_L, s_H],\ v \in [v_L, v_H]\}$$

where ∈ means to belong to some set, ∪ means the union, and [] means a set. According to the color space of HSV, $h_{0L}$ is fixed to 0 and $h_{1H}$ is fixed to 180. Empirically, $h_{0H}$ can be 10 and $h_{1L}$ can be 156, $s_L$ can be 43 and $s_H$ can be 255, and $v_L$ can be 46 and $v_H$ can be 255. When the empirical fixed values of H, S, and V are used, the detection results of Fig. 3(a) and (b) are good, as shown in Fig. 7.
In Fig. 7, the blue rectangle means the ROI, which means the pepper fruit is detected successfully.
But when the same empirical fixed values of H, S, and V are used, the soil or rock is falsely regarded as the pepper fruit, as shown in Fig. 8. Next, we propose our method to get adaptive thresholds of the H, S, and V components.
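A minimal sketch of the fixed-threshold test for a single pixel is given below, using the empirical ranges quoted above on the 8-bit stored scale. The function name `in_red_range` and its argument layout are hypothetical.

```python
def in_red_range(h, s, v,
                 hue_low=(0, 10), hue_high=(156, 180),
                 sat=(43, 255), val=(46, 255)):
    """Return True if a stored (h, s, v) pixel falls in the fixed red range."""
    # Red wraps around the hue axis, so two hue intervals are joined.
    hue_ok = hue_low[0] <= h <= hue_low[1] or hue_high[0] <= h <= hue_high[1]
    sat_ok = sat[0] <= s <= sat[1]
    val_ok = val[0] <= v <= val[1]
    return hue_ok and sat_ok and val_ok
```

For instance, a pixel with (H, S, V) = (6, 44, 243), the average reported later in the article for one rock region, passes this test, which is consistent with the false detections on soil and rock described above.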

IV. PROPOSED METHOD
Why can the empirical fixed values not handle soil or rock? We found some interesting features.

A. Interesting Features
First of all, we found that the saturation value of rock does not exceed 50. For example, we choose two regions to analyze, as shown in Fig. 9. The red, blue, and green points of Fig. 9(b) and (d) are the hue, saturation, and luminance values, respectively.
From Fig. 9, we can see that the values of H, S, and V in Regions 1 and 2 are very stable. In Region 1, the average values of hue, saturation, and luminance are 160, 10, and 255, respectively, and in Region 2, they are 6, 44, and 243, respectively.
The average values of hue, saturation, and luminance are obtained by these steps: 1) choose the hue value of all pixels of the region; 2) convert the original picture in jpg or png format into HSV space, according to (3)-(7); and 3) calculate the average values.
Second, we found that the saturation value of soil increases and the luminance value decreases compared with rock. For example, we choose two regions to analyze, as shown in Fig. 10.
From Fig. 10, we can see that the values of H, S, and V in Regions 1 and 2 are very stable.
Third, we found that the saturation and luminance values of the pepper fruit change alternately, which means that when saturation increases a little, luminance decreases a little at the same time, and vice versa. For example, we get the statistical result shown in Fig. 11.
From Fig. 11, we can see that the value of H is very stable, and S and V change alternately in both Regions 1 and 2. In Region 1, the average value of H is 176 and 4, and the average values of S and V are 181 and 180, respectively. In Region 2, the average values of H, S, and V are 175, 167, and 100, respectively. Now, we analyze why S and V change alternately. The RGB distribution of fruit region 1 is shown in Fig. 12(a). The value of the r component is always bigger than the values of the g or b components, and the value of the g component is very close to the value of the b component. The values are evenly divided into five parts along the horizontal pixel axis. The values of each RGB component in each part are averaged to calculate the saturation and value components by (4)-(7). The difference in S + V values between the five parts is small, especially between part 1 and part 2, as shown in Table I. The calculation results in Table I are in good agreement with Fig. 12(b). Similarly, the RGB distribution of fruit region 2 is shown in Fig. 12(c). The values are also evenly divided into five parts along the horizontal pixel axis. The values of each RGB component in each part are averaged to calculate the saturation and value components by (4)-(7). There is almost no difference in S + V values between the five parts. The calculation results in Table II are in good agreement with Fig. 12(d).
Fourth, if we calculate the hue histogram of the pepper fruit image with soil, rock, or road, there is a peak around 11, which is the start value of the orange color, namely the neighbor color of red. For example, we get the histogram result of soil and rock, as shown in Fig. 13.
The histogram of hue is obtained by the following steps: 1) convert the original picture in jpg or png format into HSV space, according to (3)-(7); 2) choose the hue value of all pixels; 3) count the number of pixels for every hue value; 4) search for the maximum of the counts; 5) normalize the counts by the maximum; and 6) plot the histogram with the x-axis as the hue value from 0 to 180 and the y-axis as the normalized pixel count.
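Steps 3)-5) can be sketched in a few lines. The input is assumed to be a flat list of stored hue values (integers from 0 to 180) already extracted from the HSV image; the function name `normalized_hue_histogram` is hypothetical, and plotting is left out.

```python
def normalized_hue_histogram(hues, h_max=180):
    """Count pixels per hue value and normalize the counts by their maximum."""
    counts = [0] * (h_max + 1)
    for h in hues:
        counts[h] += 1
    peak = max(counts)
    if peak == 0:
        return [0.0] * (h_max + 1)  # no pixels: return an all-zero histogram
    return [c / peak for c in counts]
```

After normalization the tallest bin equals 1, so peaks from different images can be compared on the same scale.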

B. Balance Between Saturation and Value
We can get the balance between saturation and luminance of the pepper fruit, but there is no balance in the images of rock and soil, so we can use that to extract the pepper fruit.
Our proposed algorithm of balance between saturation and luminance (BBSV) can be expressed by the following steps: 1) initialize ROI to be empty; 2) set $s_i = s_{\min}$; 3)-4) … and $v_H = 255$ to extract the objective region and get $\mathrm{ROI}_i$; 5) $\mathrm{ROI} = \mathrm{ROI} \cup \mathrm{ROI}_i$; 6) $s_{i+1} = s_i + s_{\mathrm{step}}$; and 7) repeat 3)-6) until $s_{i+1} > 255$ or $v_{i+1} < v_{\min}$, where $B$, $s_{\min}$, $v_{\min}$, and $s_{\mathrm{step}}$ are the parameters and will be analyzed in Section V; $h_{0L}$ is fixed to 0 and $h_{1H}$ is fixed to 180, and $h_{0H}$ and $h_{1L}$ can be adaptive values, as described next.
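The loop can be illustrated with a sketch that rests on two explicit assumptions, since the corresponding steps are not fully legible in the text: the lower value bound is taken as $v_i = B - s_i$ (keeping the sum of the two lower bounds equal to the balance parameter B), and each pass accepts pixels with saturation in $[s_i, 255]$ and value in $[v_i, 255]$. The function name `bbsv_roi`, the flat pixel list, and the default hue test are hypothetical; the default parameter values are the typical ones quoted in Section V.

```python
def bbsv_roi(pixels, hue_ok=lambda h: h <= 10 or h >= 156,
             balance=230, s_min=110, v_min=40, s_step=10):
    """Return the set of pixel indexes accepted by the BBSV-style sweep.

    pixels is a list of stored (h, s, v) tuples.
    Assumption: the lower value bound is v_i = balance - s_i.
    """
    roi = set()
    s_i = s_min
    v_i = balance - s_i
    while s_i <= 255 and v_i >= v_min:
        roi_i = {
            idx for idx, (h, s, v) in enumerate(pixels)
            if hue_ok(h) and s_i <= s <= 255 and v_i <= v <= 255
        }
        roi |= roi_i            # ROI = ROI U ROI_i
        s_i += s_step           # raise the saturation bound ...
        v_i = balance - s_i     # ... and lower the value bound to compensate
    return roi
```

Under these assumptions, the union of the passes approximates the region where S + V is at least B, so a low-saturation rock pixel such as (6, 44, 243) is rejected while a darker but saturated fruit pixel such as (175, 167, 100) is kept.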

C. Adaptive Hue Threshold
Since there is a peak around 11 in the hue histogram due to the dry grass or soil and, at the same time, a peak at 0 generated by the red color of the pepper fruit, there is a minimum between the two peaks, and we take the minimum in the range from 1 to 10 as the value of $h_{0H}$.
The other red range is similar. If there is a peak around 155 due to the dry grass or soil and, at the same time, a peak around 180 due to the pepper fruit, there is a minimum in the range from 156 to 179 between the two peaks, and we take that minimum as the value of $h_{1L}$.
But sometimes there is no peak around 11 or 155 in the hue histogram because there is no dry grass or soil in the image; then, the values of $h_{0H}$ and $h_{1L}$ should be 10 and 156, respectively.
So, we should estimate whether there is a peak around 11 and 155 in the hue histogram. If there is a peak around 11, there is a minimum in the range from 0 to 10, which is called the left minimum, and a minimum in the range from 11 to 25 (25 being the end value of the orange color, or the start value of the yellow color minus 1), which is called the right minimum. The slope of the line fitted from the left minimum to the peak (left line) is positive, and the slope of the line fitted from the peak to the right minimum (right line) is negative. We can use the slopes to estimate the presence of the peak. If the slope of the left line is above a threshold and the slope of the right line is below a threshold, the peak is present.
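The slope test for the peak near 11 can be sketched as below. Several details are assumptions rather than the article's exact procedure: the left minimum is searched over hues 0-10, the right minimum over hues 19-25, the peak between the two minima, and the lines are fitted by ordinary least squares with the hue axis divided by the peak (left line) or right-minimum (right line) index and the counts divided by the peak height. The slope thresholds default to the values 0.45 and -0.45 quoted in Section V; the function names are hypothetical.

```python
def fit_slope(xs, ys):
    """Least-squares slope of y against x (0.0 if x has no spread)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def orange_peak_present(hist, t_left=0.45, t_right=-0.45):
    """Return (peak_present, left_minimum_index) for a hue histogram."""
    h_l = min(range(0, 11), key=lambda h: hist[h])       # left minimum
    h_r = min(range(19, 26), key=lambda h: hist[h])      # right minimum (assumed range)
    h_p = max(range(h_l + 1, h_r), key=lambda h: hist[h])  # peak in between
    peak = hist[h_p]
    if peak <= 0:
        return False, h_l
    left = range(h_l, h_p + 1)
    right = range(h_p, h_r + 1)
    g_left = fit_slope([h / h_p for h in left], [hist[h] / peak for h in left])
    g_right = fit_slope([h / h_r for h in right], [hist[h] / peak for h in right])
    return (g_left >= t_left and g_right <= t_right), h_l
```

When the test succeeds, the returned left-minimum index is a candidate for the upper bound of the low red hue range; otherwise the fixed value of 10 would be kept.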
In some cases, there are multiple minima in the range from 1 to 10 or from 156 to 179 in the hue histogram, or there are multiple values that are almost equal to the minimum. Then, we take the biggest hue value as $h_{0H}$ and the smallest hue value as $h_{1L}$.
In some cases, although the trend from 0 to the minimum between 1 and 10, or from 180 to the minimum between 156 and 179, is declining, it may fluctuate, and the minimum is then not a good value for $h_{0H}$ or $h_{1L}$. If a value in the range from 0 to the minimum between 1 and 10, or from 180 to the minimum, is smaller than both of its neighbors, or the difference between contiguous values alternates between positive and negative, fluctuation is present, and we take the first value whose next value toward the minimum is bigger by more than a threshold; this is the fluctuation elimination.
Overall, the steps of our proposed adaptive hue threshold (AHT) algorithm are as follows:
1) calculate the histogram of hue and get the number of pixels for every hue value: $R(h_{\min}, \ldots, h_{\max})$;
2) assume $h_a$, $h_b$, and $h_c$ ($h_c > h_b > h_a$) as the first, second, and third color start points; for example, $h_a = 0$ is the red color start point, $h_b = 11$ is the orange color start point, and $h_c = 26$ is the yellow color start point;
6) search the peak $R_{h_p,\max} = \max_{h_b < h_i < \mathrm{int}((h_a + h_b)/2)} R(h_i)$ and get the index $h_p$;
7) search the right minimum $R_{h_r,\min} = \min_{\mathrm{int}((h_b + h_c)/2) < h_i < h_c} R(h_i)$ and get the index $h_r$;
8) fit a straight line with the x-axis of $h_i / h_p$ ($h_p \ge h_i \ge h_l$) and the y-axis of $R_{h_i} / R_{h_p,\max}$, then get the slope of the line $g_{l,l}$;
9) fit another straight line with the x-axis of $h_i / h_r$ ($h_r \ge h_i \ge h_p$) and the y-axis of $R_{h_i} / R_{h_p,\max}$, then get the slope of the line $g_{r,l}$;
10) if $g_{l,l} \ge T_1$ and $g_{r,l} \le T_2$, then $h_{0H} = h_l$;
11) get the normalized histogram of hue;
12) calculate the difference between the neighbors;
13) initialize sets $U_{\max}$ and $U_{\min}$ to be empty;
17) from 0 to the maximum index of $U_{\max}$ and $U_{\min}$, search the first matched result: if $U_{\min}(i) < U_{\max}(j) < U_{\min}(i + 1)$ and $R'(U_{\max}(j)) - R'(U_{\min}(i)) > T_3$, then $h_{0H} = \min(h_{0H}, U_{\min}(i))$.
Note: steps 11)-17) are the fluctuation elimination.
18) assume $h_d$, $h_e$, and $h_f$ ($h_d > h_e > h_f$) as the first, second, and third color endpoints; for example, $h_d = 180$ is the red color endpoint, $h_e = 155$ is the purple color endpoint, and $h_f = 124$ is the blue color endpoint;
19) initialize $h_{1L} = h_e + 1$;
20) search the right minimum $R_{h_r,\min} = \min_{h_d < h_i < \mathrm{int}((h_d + h_e)/2)} R(h_i)$ and get the index $h_r$;
21) if there are multiple minima, then choose the smaller index: from $h_r - 1$ to int(…);
28) calculate the difference between the neighbors;
29) initialize sets $U_{\max}$ and $U_{\min}$ to be empty;
32) repeat 13)-14) with $h_d - 1 > h_i > h_e$ from $h_d - 2$ to $h_e + 1$; and
33) from 0 to the maximum index of $U_{\max}$ and $U_{\min}$, search the first matched result: if $U_{\min}(i) < U_{\max}(j) < U_{\min}(i + 1)$ and $R'(U_{\max}(j)) - R'(U_{\min}(i)) > T_7$, then $h_{1L} = \min(h_{1L}, U_{\min}(i))$.
Note: steps 27)-33) are the fluctuation elimination. In the above steps, $T_0$-$T_7$ are the parameters, int is the function to get the integer, and in our research, $h_{\min} = 0$ and $h_{\max} = 180$.

D. Flowchart
After the processing of AHT and BBSV, there is edge detection to remove the indistinct region, which is induced by the pepper fruit far away. The gray level is very low after the indistinct region is processed by the edge detection, and it can be removed when it is below a threshold.
The Sobel algorithm is widely used because it is fast and its edge detection result is effective. The Sobel edge processor uses a convolution kernel to create a series of gradient magnitudes. Applying a convolution kernel K to a pixel group p gives the output N. In general, K is divided into two convolution kernels, one to detect changes in horizontal contrast ($K_x$) and another to detect changes in vertical contrast ($K_y$). After the horizontal and vertical convolution, the horizontal output is $N_x(x, y)$ and the vertical output is $N_y(x, y)$, and the Sobel processor output is obtained by combining them.
From Fig. 14, we can see that the indistinct region is removed very well after Sobel detection, dilating, and removing the lower gray region.
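A minimal sketch of the Sobel step on a small gray image (a 2-D list) is shown below. It uses the commonly cited 3 × 3 Sobel kernels and combines the two outputs as $\sqrt{N_x^2 + N_y^2}$; both choices are assumptions here, since the kernel signs and the combination rule used in the experiments are not recoverable from the text. The kernels are applied without flipping, which does not change the magnitude, and border pixels are left at 0. The name `sobel_magnitude` is hypothetical.

```python
import math

# Commonly used 3x3 Sobel kernels (assumed sign convention).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal contrast
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical contrast


def sobel_magnitude(img):
    """Return the gradient magnitude of a 2-D gray image (borders stay 0)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            n_x = 0
            n_y = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    n_x += KX[j][i] * p
                    n_y += KY[j][i] * p
            out[y][x] = math.hypot(n_x, n_y)
    return out
```

A blurred, far-away region produces small magnitudes, so thresholding this output is one way to drop indistinct regions as described above.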
Our proposed algorithm is used before getting the mask and after converting to HSV color space in the general flowchart in Fig. 2, so the general flowchart is modified and the flowchart of the whole detection of our proposed method is shown in Fig. 15.

V. EXPERIMENTAL RESULTS AND ANALYSIS
We evaluate the parameters of our proposed algorithm first and then introduce the dataset and performance metrics to get the final experimental result.

A. Parameters of AHT and BBSV
For AHT, $T_0$ to $T_7$ are the parameters. The typical values are $T_0 = 0.08$, $T_1 = 0.45$, $T_2 = -0.45$, $T_3 = 0.15$, $T_4 = 0.05$, $T_5 = 0.45$, $T_6 = -0.45$, and $T_7 = 0.15$. Fig. 13(a), and the blue rectangle in Fig. 16(e) and (f) is the detection of the pepper fruit; the output result is only the maximal detection contour. The hue thresholds obtained according to Fig. 16(a)-(d) are 4 and 156, and they are used in Fig. 16(f), whereas the fixed thresholds of 10 and 156 are used in Fig. 16(e).
From Fig. 16, we can see that the typical parameter values are effective and give the correct detection result; in contrast, the fixed hue threshold does not work.
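The difference between the adaptive hue thresholds (4 and 156) and the fixed ones (10 and 156) can be illustrated with a small sketch. It assumes, which is an interpretation rather than something stated above, that red hues wrap around the 0 to 180 hue scale, so a pixel is kept when its hue is at most the lower threshold or at least the upper one. The function names are hypothetical.

```python
def in_red_band(h, low=4, high=156):
    """True if hue `h` (0..180 scale) lies in the wrap-around red band.

    Assumption: the band is [0, low] together with [high, 180].
    """
    return h <= low or h >= high


def hue_mask(hue_rows, low=4, high=156):
    """Binary mask over a 2-D list of hue values."""
    return [[1 if in_red_band(h, low, high) else 0 for h in row]
            for row in hue_rows]


hues = [[0, 4, 8, 100, 156, 179]]
print(hue_mask(hues, low=4, high=156))   # [[1, 1, 0, 0, 1, 1]]
print(hue_mask(hues, low=10, high=156))  # [[1, 1, 1, 0, 1, 1]]
```

Under this reading, a hue of 8 is rejected by the adaptive threshold but accepted by the fixed one, which is the kind of extra pixel that can lead to a false detection.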
For BBSV, the typical parameter values are $B = 230$, $s_{\min} = 110$, $v_{\min} = 40$, and $s_{\mathrm{step}} = 10$, and Fig. 17 shows the detection result of BBSV with the typical parameters. There is no blue detection rectangle, so it is clear that BBSV works very well and there is no longer any false detection. Next, we turn the fluctuation elimination on and off, as shown in Fig. 18, where FE means fluctuation elimination. We can see that the false detection is eliminated further.
Finally, we turn on AHT and BBSV at the same time to see how our proposed algorithm works in the typical scenarios of Fig. 3. Fig. 19 shows the detection results, and we can see that our proposed algorithm works very well.

B. Dataset
To evaluate the performance of our algorithm, we build a test dataset. There are more than 300 original pictures with the pepper fruit, each at most 1706 × 1280 pixels (denoted by origin in the following equation); some of them were obtained from the internet, and the others were photographed in a Zanthoxylum bungeanum garden. We use the following equation to generate many pictures of 640 × 640 pixels (denoted by img) from every original picture:

$$\mathrm{img}(x, y) = \mathrm{origin}(\tilde{x}, \tilde{y}) \tag{28}$$

where $x$ and $y$ are the pixel indexes, $W$ is the width of the original picture, $H$ is the height of the original picture, and $\tilde{x}$ and $\tilde{y}$ are the corresponding pixel indexes in the original picture. In this way, we get more than 1900 pictures for the test dataset: close to 1800 pictures with the pepper fruit and more than 120 pictures without the pepper fruit.

C. Performance Metrics
True positive (TP) and true negative (TN) correspond to images in which the presence or absence of the pepper fruit, respectively, was correctly recognized. False positive (FP) corresponds to images without the pepper fruit in which fruit was wrongly detected, and false negative (FN) corresponds to images with the pepper fruit in which it was missed.
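The recall rate is

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

The accuracy is

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

The false alarm is

$$\mathrm{False\ alarm} = \frac{FP}{FP + TN}$$

Beyond the above classic measurements, we add two more measurements to evaluate the location precision of the detected fruit.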
The first is the ratio of the overlap area (Roa) between the real rectangle and the ideal rectangle of the fruit region

$$\mathrm{Roa} = \frac{A_o}{A_{\mathrm{ideal}}}$$

where $A_o$ is the overlap area and $A_{\mathrm{ideal}}$ is the ideal rectangle area. The ideal region is manually labeled according to the width and height of the fruit cluster.
The definition of Roa is illustrated in Fig. 20. The second is the coordinate error, which is defined as the ratio of the difference between the real and ideal central coordinates to the ideal width or height

$$x_{ce} = \frac{|x_{\mathrm{real}} - x_{\mathrm{ideal}}|}{w_{\mathrm{ideal}}}, \quad y_{ce} = \frac{|y_{\mathrm{real}} - y_{\mathrm{ideal}}|}{h_{\mathrm{ideal}}}$$

where $x_{\mathrm{ideal}}$ and $x_{\mathrm{real}}$ are the ideal and real central coordinates on the x-axis, respectively, $y_{\mathrm{ideal}}$ and $y_{\mathrm{real}}$ are the ideal and real central coordinates on the y-axis, respectively, and $w_{\mathrm{ideal}}$ and $h_{\mathrm{ideal}}$ are the ideal width and height, respectively. The definition of the coordinate error is illustrated in Fig. 21.
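The two location metrics can be sketched as follows. This is an illustration under stated assumptions: rectangles are given as hypothetical `(x, y, w, h)` tuples with `(x, y)` the top-left corner, the overlap is divided by the ideal area only, and the coordinate error uses the absolute center difference. The rectangle values are invented for the example.

```python
def overlap_ratio(real, ideal):
    """Roa: overlap area divided by the ideal rectangle area."""
    rx, ry, rw, rh = real
    ix, iy, iw, ih = ideal
    ow = max(0, min(rx + rw, ix + iw) - max(rx, ix))  # overlap width
    oh = max(0, min(ry + rh, iy + ih) - max(ry, iy))  # overlap height
    return (ow * oh) / (iw * ih)


def coordinate_error(real, ideal):
    """Return (x_ce, y_ce): center offsets relative to the ideal size."""
    rx, ry, rw, rh = real
    ix, iy, iw, ih = ideal
    x_ce = abs((rx + rw / 2) - (ix + iw / 2)) / iw
    y_ce = abs((ry + rh / 2) - (iy + ih / 2)) / ih
    return x_ce, y_ce


ideal = (100, 100, 200, 100)

# A detection shifted slightly from the ideal rectangle.
close = (120, 110, 200, 100)
print(overlap_ratio(close, ideal))     # 0.81
print(coordinate_error(close, ideal))  # (0.1, 0.1)

# An oversized detection covering the whole 640 x 640 picture.
huge = (0, 0, 640, 640)
print(overlap_ratio(huge, ideal))      # 1.0
print(coordinate_error(huge, ideal))   # (0.6, 1.7)
```

Because Roa is normalized by the ideal area alone, an oversized detection can reach a ratio of 1 while its coordinate error is large, so the two metrics are best read together.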

D. Result Analysis
The performance results with different parameters are shown in Fig. 22 and Table III. The coordinate error in Table III is the average of $x_{ce}$ and $y_{ce}$. From Fig. 22 or Table III, we can see that the false alarm of the fixed parameter is very poor; however, since the pictures without the pepper fruit make up only a small share of the dataset, the recall and accuracy are still good.
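The way a small share of fruit-free pictures keeps recall and accuracy high while the false alarm is poor can be illustrated numerically. The sketch below assumes the usual definitions (recall = TP/(TP+FN), accuracy = (TP+TN)/total, false alarm = FP/(FP+TN)); the counts are invented for illustration and are not results from this work.

```python
def recall(tp, fn):
    """Share of fruit pictures that were detected."""
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp, tn, fp, fn):
    """Share of all pictures that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def false_alarm(fp, tn):
    """Share of fruit-free pictures that triggered a detection."""
    return fp / (fp + tn) if fp + tn else 0.0


# Invented counts: 1800 fruit pictures, 120 fruit-free pictures.
tp, fn, fp, tn = 1800, 0, 100, 20
print(f"recall      = {recall(tp, fn):.3f}")            # 1.000
print(f"accuracy    = {accuracy(tp, tn, fp, fn):.3f}")  # 0.948
print(f"false alarm = {false_alarm(fp, tn):.3f}")       # 0.833
```

With these hypothetical counts, most fruit-free pictures are misclassified, yet accuracy stays near 95% because those pictures are only a small fraction of the total.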
As the parameter value of $B$ increases from 230 to 270, the recall, accuracy, and false alarm become better. When the parameter is above 250, the false alarm is zero, and the location error is below 10%. If a false detection happens, the picking system will operate wrongly, and the risk of a fault is high when the picking system recognizes rock or soil as the pepper fruit. So, we usually choose the parameter that reduces the false alarm as much as possible, and 260 is the best value of $B$. With the best parameter, the recall rate, accuracy, and false alarm of our proposed method reach 100%, 100%, and 0%, respectively, the ratio of the overlap area is above 64%, and the coordinate error is about 8%. All performance is greatly improved compared with the fixed threshold.
As the parameter value of $B$ increases from 230 to 270, the ratio of the overlap area becomes smaller, and the location error becomes higher. The reason can be seen in Fig. 24, which shows the masks obtained with different parameters. When $B$ increases to 250, the left pepper fruit region and the right one are no longer connected, as in Fig. 24(c) and (d). Since we only keep the maximal contour of the region, and the left region is the maximal one, the detection result is the left region, as in Fig. 23(b). When $B$ increases to 260, the top pepper fruit region and the bottom one are no longer connected, as in Fig. 24(e) and (f), and the detection result is the top region, as in Fig. 23(c).
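The effect of keeping only the maximal region can be sketched on a toy mask. This is a stand-in, not the contour routine used in this work: it labels 4-connected regions with a breadth-first search and ranks them by pixel count, and the function name `largest_region_box` and both masks are hypothetical.

```python
from collections import deque


def largest_region_box(mask):
    """Bounding box (x, y, w, h) of the largest 4-connected region of 1s.

    Returns None when the mask contains no 1s.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best_size, best_box = 0, None
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one region, tracking its size and extent.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            size = 0
            xmin = xmax = x0
            ymin = ymax = y0
            while queue:
                x, y = queue.popleft()
                size += 1
                xmin, xmax = min(xmin, x), max(xmax, x)
                ymin, ymax = min(ymin, y), max(ymax, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if size > best_size:
                best_size = size
                best_box = (xmin, ymin, xmax - xmin + 1, ymax - ymin + 1)
    return best_box


# A thin bridge joins the left and right blobs into one region.
connected = [[1, 1, 1, 1, 1],
             [1, 1, 0, 0, 1]]
# A stricter threshold removes the bridge; only the left blob is kept.
split = [[1, 1, 0, 0, 1],
         [1, 1, 0, 0, 1]]
print(largest_region_box(connected))  # (0, 0, 5, 2)
print(largest_region_box(split))      # (0, 0, 2, 2)
```

Once the bridge disappears, the reported box shrinks to the larger remaining blob, which lowers the overlap ratio and shifts the box center.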
In addition, with the fixed parameter, the overlap area ratio and the location error increase at the same time. This is because the detection region is much bigger due to the false alarm: the region includes the ideal detection part, which makes the overlap area ratio bigger, and it extends beyond the ideal detection part, which makes the location error bigger. This can be seen in Fig. 25, where the white rectangle is the ideal detection part and the blue rectangle is the real detection part.
Now, we compare the contributions of the BBSV and AHT algorithms. Fig. 26 and Table IV show the results. The coordinate error in Table IV is the average of $x_{ce}$ and $y_{ce}$. From Fig. 26 or Table IV, we can see that BBSV contributes more, but both BBSV and AHT take effect.

VI. CONCLUSION
We present a method, consisting of BBSV and AHT, to detect the Zanthoxylum bungeanum fruit in photographs taken by an automatic picking system. The recall rate, accuracy, and false alarm of our method reach 100%, 100%, and 0%, respectively. To evaluate the location precision, we introduce two metrics: the ratio of the overlap area between the real detection rectangle and the ideal detection rectangle, and the coordinate error between the real and ideal central coordinates on the x-axis and y-axis. The ratio of the overlap area of our method is above 64%, and the coordinate error is about 8%. All performance is greatly improved compared with the empirical fixed threshold.