Detection Method of Damaged Camellia Oleifera Seeds Based on YOLOv5-CB

To solve the problems that existing sorting equipment cannot effectively identify and sort damaged Camellia oleifera seeds and that traditional manual sorting is inefficient and slow, this paper designs a detection method for damaged Camellia oleifera seeds based on YOLOv5, coordinate attention, and a weighted bidirectional feature pyramid network. First, according to the actual requirements, the Coordinate Attention (CA) module was added to the YOLOv5 algorithm to improve the detection precision for damaged seeds among stacked Camellia oleifera seeds. Second, the network structure was optimized by adding the weighted bi-directional feature pyramid network (BiFPN), which integrates multi-scale features to reduce missed detections of slightly damaged Camellia oleifera seeds. The experimental results show that, compared with the original YOLOv5 model, the detection precision of the improved model YOLOv5-CB is improved by 6.1%, reaching 92.4%, and the mean Average Precision (mAP) is improved from 87.7% to 93.4%. The average detection time for a single Camellia oleifera seed image is 6.4 ms, which meets the precision and real-time requirements of practical applications.

2022 supports the expansion of the Camellia oil plantation area [7], which will further increase China's Camellia oil plantation area.
Camellia oleifera fruit consists of two parts: the husk and the Camellia oleifera seeds. In the production and processing of Camellia oil, two approaches are common: pressing with the shell and pressing after removing the shell. Camellia oleifera husks do not contain oil, and shelling before pressing effectively improves the oil yield and the quality of Camellia oil while reducing machine wear and extending the service life of machines and accessories [8]. Therefore, pressing after shelling is the most common oil-pressing method. The product of shelling contains a large amount of husk and a small number of broken Camellia oleifera seeds caused by the shelling itself. To ensure the quality of Camellia oil, it is necessary to separate the husk fragments and damaged seeds from the Camellia oleifera seeds used for oil extraction. Li et al. designed a seed-shell sorting machine composed of a rubber conveyor belt and a vibrating support plate that exploits the differences in shape and friction coefficient between seeds and shells, so that the shells move upward while the seeds move downward and the two are separated [9]. Chen et al. designed a cutting-type Camellia oleifera fruit sheller that exploits the difference in mass between seeds and husks and separates them by adjusting the rotation speed of spirally mounted rubber plates [10]. Lu et al. designed a shell-seed sorting machine with integrated photoelectric color sorting that exploits the color difference between shells and seeds; it captures and processes images of the material in real time with a CCD camera and separates shells from seeds with high-pressure gas [11]. At present, Camellia oleifera separation equipment can efficiently separate shells from seeds using sieve plates, wind force, friction, and color sorting. However, because damaged and intact Camellia oleifera seeds differ little in mass, color, and friction coefficient, these traditional sorting methods cannot separate them well. In recent years, with the development of computing, machine vision has gradually been applied to Camellia oleifera separation. Li et al. proposed a sorting method for Camellia oleifera fruit based on multi-feature identification with a preference immune network [12]. Although it is a clear improvement over traditional sorting, with a recognition rate of up to 90%, the algorithm is still prone to false and missed detections in practice because of the complexity of the environment. Yufei et al. proposed a machine-vision sorting method for Camellia oleifera husks and seeds [13]. The method jointly considers the color, texture, and geometric shape features of the fruit, but it is effective mainly for husks and cannot effectively sort damaged Camellia oleifera seeds. Xie et al. proposed a Camellia oleifera seed integrity recognition algorithm based on a convolutional neural network that reaches an integrity recognition rate of 98.05% [14], but the seeds in the samples it was tested on are spaced apart. In practical industrial applications, seeds are more often stacked, occlude each other, and are present in large numbers.
VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The above work shows that applying machine vision to Camellia oleifera seed sorting is entirely feasible. Its shortcoming is that it does not consider the stacking and occlusion of seeds in practical applications, and existing algorithms often cannot guarantee a high recognition rate in these situations.
An intact shell is conducive to the storage of Camellia oleifera seeds: seeds with damaged shells are vulnerable to microbial contamination during transportation and storage, and their mildew rate is much higher than that of seeds with intact shells [15]. Mildew in Camellia oleifera seeds is extremely dangerous; it not only seriously affects the quality of Camellia oil but also leads to the accumulation of toxic substances [16]. Using seeds with damaged shells during oil extraction risks introducing mildewed seeds [17]. Aspergillus is the most common mold in moldy Camellia oleifera seeds and can cause serious harm to the human body. For example, aflatoxin, which is commonly found in moldy agricultural products, has strong hepatotoxicity, carcinogenicity, teratogenicity, and mutagenicity [18]. Therefore, accurately sorting out damaged Camellia oleifera seeds is of great significance for ensuring the quality and safety of Camellia oil.
Based on the YOLOv5m model of the YOLO series, this paper integrates an attention module and a weighted bidirectional feature pyramid network. The attention mechanism focuses the model on important information, and the weighted bidirectional feature pyramid aggregates features at different resolutions. The resulting Camellia oleifera seed detection network addresses the problem that existing algorithms cannot effectively identify damaged seeds in complex conditions such as occlusion and stacking. Because coordinate attention (CA) and the weighted bidirectional feature pyramid network (BiFPN) are introduced together, we named the network YOLOv5-CB.

A. EXPERIMENTAL MATERIALS
The Camellia oleifera seed samples used in this experiment were produced in Suzhuang Town, Kaihua County, Quzhou City, Zhejiang Province. The whole experimental process simulated an actual industrial assembly line. The LabelImg software was used to manually mark the positions of damaged seeds in the collected images, and the resulting annotations were stored in the label folder. The experiment collected 558 images, of which 19 containing no damaged Camellia oleifera seeds were removed. Of the remaining 539 images that meet the experimental requirements, 70% were randomly selected as the training set, 20% as the validation set, and 10% as the test set. Sample images are shown in Figure 1(a).
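The 70/20/10 random split described above can be illustrated with a minimal sketch. The file names, the seed, and the helper `split_dataset` are hypothetical and only stand in for the 539 retained images; the paper does not state how the split was implemented.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder, roughly 10%
    return train, val, test


# Hypothetical file names standing in for the 539 retained images.
image_names = [f"img_{i:04d}.jpg" for i in range(539)]
train, val, test = split_dataset(image_names)
print(len(train), len(val), len(test))  # prints: 377 108 54
```

Because 539 is not divisible by 10, the three subsets can only approximate the stated percentages; here the test set simply receives whatever remains after the first two are taken.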
Cutout is a regularization technique for convolutional neural networks that improves robustness by randomly masking square regions of the input during training [19]. Because the number of samples is limited, the collected data cannot fully cover overlapping conditions. This paper therefore uses Cutout data augmentation to simulate overlapping samples during training. The randomly generated masked regions reasonably approximate the mutual occlusion of Camellia oleifera seeds in practical application. The training set processed with Cutout is shown in Figure 1(b).
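As a rough illustration of the Cutout idea, the sketch below masks one random square in a small grayscale image stored as a list of rows. The function name `cutout`, the mask size, and the fill value are assumptions for illustration; the paper does not give the settings it used.

```python
import random


def cutout(image, mask_size, rng, fill=0):
    """Return a copy of `image` (a list of rows) with one square region masked.

    The square is centred on a random pixel and clipped at the borders,
    so the masked area can be smaller than mask_size x mask_size.
    """
    height, width = len(image), len(image[0])
    cy, cx = rng.randrange(height), rng.randrange(width)
    y0 = cy - mask_size // 2
    x0 = cx - mask_size // 2
    y_lo, y_hi = max(0, y0), min(height, y0 + mask_size)
    x_lo, x_hi = max(0, x0), min(width, x0 + mask_size)
    out = [row[:] for row in image]  # copy so the input stays untouched
    for y in range(y_lo, y_hi):
        for x in range(x_lo, x_hi):
            out[y][x] = fill
    return out


# An 8x8 all-white test image (assumed values, for illustration only).
image = [[255] * 8 for _ in range(8)]
masked = cutout(image, mask_size=4, rng=random.Random(0))
num_masked = sum(row.count(0) for row in masked)
```

Centring the square on a random pixel and clipping it at the border follows the original Cutout formulation, which lets part of the mask fall outside the image so that border regions are masked less often.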

B. EXPERIMENTAL EQUIPMENT
The equipment used in this experiment consists of four main parts: a feeding device, an image acquisition device, a conveyor belt, and a parallel manipulator. The working process is as follows. The mixture of shells and Camellia oleifera seeds falls from the feed port onto the sieve plate. Because the mass of the fruit shells differs from that of the seeds, the sieve plate performs a rough separation: the shells fall into the debris collection box, and the seeds fall onto the conveyor belt. The conveyor belt carries the seeds to the image acquisition area, where the camera photographs them and transmits the images to the computer. The computer identifies the damaged seeds with the detection algorithm and controls the manipulator to place them in the corresponding collection box, while the intact seeds fall into the final collection box. The 3D assembly drawing of the experimental equipment is shown in Figure 2, and a photograph of the equipment is shown in Figure 3.
The image acquisition device, shown in Figure 4, includes one camera and two fill lights. The two parallel fill lights are located on both sides of the top of the camera, which effectively reduces the influence of shadows during image acquisition. The camera assembly consists mainly of an AI-230U150M industrial camera and an LB0814-5M lens.

III. METHOD
A. YOLOv5-CB NETWORK MODEL
The model used in this paper is YOLOv5, the latest version of the YOLO series. The YOLOv5 network comes in four sizes, S, M, L, and X, which differ in their depth_multiple and width_multiple settings, as shown in Table 1.
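To make the role of depth_multiple and width_multiple concrete, the following sketch shows how the two multipliers scale a block's repeat count and channel count. The multiplier values are those published in the public Ultralytics YOLOv5 model configs and are assumed here to match Table 1; the base values (9 repeats, 512 channels) and the helper names are illustrative only.

```python
import math

# depth_multiple and width_multiple from the public Ultralytics YOLOv5
# model configs (assumed to correspond to Table 1).
VARIANTS = {
    "s": (0.33, 0.50),
    "m": (0.67, 0.75),
    "l": (1.00, 1.00),
    "x": (1.33, 1.25),
}


def scale_repeats(base_repeats, depth_multiple):
    """Scale how many times a block is repeated; never drop below one."""
    if base_repeats <= 1:
        return base_repeats
    return max(round(base_repeats * depth_multiple), 1)


def scale_channels(base_channels, width_multiple, divisor=8):
    """Scale a channel count and round it up to a multiple of `divisor`."""
    return math.ceil(base_channels * width_multiple / divisor) * divisor


# Illustrative base block: 9 repeats and 512 output channels.
scaled = {
    name: (scale_repeats(9, depth), scale_channels(512, width))
    for name, (depth, width) in VARIANTS.items()
}
for name, (repeats, channels) in scaled.items():
    print(name, repeats, channels)
```

Under these assumptions the M variant keeps 6 of the 9 repeats and 384 of the 512 channels, which is why it sits between S and L in both model size and inference cost.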
To select a suitable network model, this paper experimentally compares the commonly used YOLOv5 series models. The performance comparison is shown in Table 2. The table shows that the YOLOv5 series models differ little in Precision, Recall, and mean Average Precision (mAP) on the data set collected in this experiment, but none of them meets the detection requirements of practical application. Considering the various metrics and model sizes together, and to meet the requirements of industrial precision and real-time detection at the same time, this paper chooses YOLOv5m as the base model for improvement.
Table 2 also shows that the performance of YOLOv5l and YOLOv5x decreases to a certain extent, which conflicts with the common expectation that greater depth enhances a network's ability to learn features. Although increasing the depth can improve the ability to learn different features, it does not always improve network performance [20]. In this data set, the features of damaged Camellia oleifera seeds are likely to be confused with those of normal seeds after over-learning, leading to performance degradation.
The overall structure of the YOLOv5 network consists of four parts: the input, the backbone network, the neck, and the output. The input performs preliminary processing such as data augmentation and adaptive image scaling. The backbone network obtains feature maps through a series of operations such as convolution, pooling, cross-scale connection, and spatial pyramid pooling. The neck performs a series of mixing and combining operations to fuse high-level and low-level image features and obtain richer feature information. Finally, the output predicts from the fused features and generates the bounding box with the highest confidence according to the size of the detection target, completing the target detection process. The improved model YOLOv5-CB is shown in Figure 5.

B. COORDINATE ATTENTION MODULE
In practical industrial applications, the Camellia oleifera seeds on the production line are inevitably stacked and occluded. To make the model focus more on the feature information of damaged seeds, an attention module was introduced into the YOLOv5 model. Attention is regarded as an effective solution for general classification tasks [21] and has been widely used in various neural networks. At present, the Squeeze-and-Excitation (SE) attention mechanism is widely used [22]. It computes channel attention through 2D global pooling and significantly improves the performance of mainstream network models. However, SE only encodes inter-channel information and ignores spatial information, which is equally important for capturing target structures in vision tasks [23]. Later, Woo et al. proposed the Convolutional Block Attention Module (CBAM) [24]. Unlike the SE module, which only computes channel attention, CBAM obtains position information by reducing the channel dimension of the input tensor and then uses convolution to compute spatial attention. Attention is thus inferred sequentially along two independent dimensions, channel and space, as shown in Figure 6(a). CBAM is a lightweight, general-purpose module that improves the performance of various models to a certain extent, and it is also a commonly used attention mechanism. However, spatial attention computed by convolution captures only local information and still cannot establish the long-range dependencies that are essential for vision tasks. To address this, Hou et al. proposed the Coordinate Attention (CA) mechanism [25], whose structure is shown in Figure 6(b).
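The contrast between SE-style 2D global pooling and the direction-wise pooling used by coordinate attention can be seen in a small sketch on a single channel. It covers only the pooling step; the convolution and sigmoid stages that follow in the published CA module are omitted, and the function names and the toy feature map are assumptions for illustration.

```python
def global_pool_2d(channel):
    """SE-style squeeze: one scalar per channel, all positions averaged."""
    height, width = len(channel), len(channel[0])
    return sum(sum(row) for row in channel) / (height * width)


def pool_along_width(channel):
    """Average each row over its width: one value per height index h."""
    return [sum(row) / len(row) for row in channel]


def pool_along_height(channel):
    """Average each column over the height: one value per width index w."""
    height = len(channel)
    return [sum(column) / height for column in zip(*channel)]


# Toy 3x4 feature map with a single strong response at row 1, column 1.
channel = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 12.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(global_pool_2d(channel))     # 1.0
print(pool_along_width(channel))   # [0.0, 3.0, 0.0]
print(pool_along_height(channel))  # [0.0, 4.0, 0.0, 0.0]
```

The single scalar from 2D pooling says nothing about where the response lies, whereas the two 1D descriptors together still locate it at row 1 and column 1. This is the positional information that coordinate attention is designed to keep.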
The CA attention module embeds position information into channel attention. To avoid the loss of position information caused by 2D global pooling, the 2D global pooling is decomposed into two 1D global pooling operations. These aggregate the input features along the horizontal and vertical directions into two independent, direction-aware feature maps, which are then encoded into two attention maps so that location information is preserved. Finally, the two attention maps are applied to the input feature map by multiplication. Specifically, given an input X, CA encodes each channel over two spatial ranges, (H, 1) and (1, W), along the horizontal and vertical coordinate directions respectively. The output of channel c at height h is:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$

Similarly, the output of channel c at width w is:

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

This paper introduces the CA attention mechanism into the backbone network of the YOLOv5 model. The first improvement adds the CA module to some of the convolution operations at the beginning of the backbone network so that more coordinate information is retained when the image is first processed. The second improvement adds a CA layer to the backbone network so that the image features are further processed before entering the neck.

[27]. The importance of analyzing images at many scales stems from the nature of the image itself. Real-world scenes contain objects of various sizes, and these objects carry information such as size, position, and color. The pyramid structure of FPN may miss information at other scales. To solve this problem, Tan et al. proposed a simple and efficient weighted bi-directional feature pyramid network (BiFPN) [28]. It repeatedly applies top-down and bottom-up multi-scale feature fusion while introducing learnable weights to learn the importance of different input features, as shown in Figure 7.
The difference between the weighted bidirectional feature pyramid and previous work is that most earlier feature pyramids simply sum the different input features when fusing them, although these inputs contribute unequally to the fused output. BiFPN therefore weights each input using fast normalized fusion:

$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$$

where $\epsilon = 0.0001$ is used to avoid numerical instability, and $w_i$ is a learned parameter, similar to an attention mechanism, used to distinguish the importance of different features in the feature fusion process. Taking the output of layer 6 as an example, the calculation expression is:

$$P_6^{out} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot P_6^{td} + w_3 \cdot \mathrm{Resize}(P_5^{out})}{w_1 + w_2 + w_3 + \epsilon}\right)$$

In the formula, $P_6^{td}$ is the intermediate feature of the 6th layer on the top-down path, $P_6^{in}$ is the input feature of the 6th layer from left to right, $P_7^{in}$ is the input feature of the 7th layer from top to bottom, $P_5^{out}$ is the output feature of the 5th layer on the bottom-up path, $P_6^{out}$ is the output feature of the 6th layer, and Resize is usually an upsampling or downsampling operation.
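The weighted fusion described above, with learned weights w and ε = 0.0001, can be sketched on flat feature vectors as follows. The Conv and Resize steps are left out, and the clamping of weights at zero follows the BiFPN paper rather than anything stated in this text. The function name and the example values are assumptions.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse equally sized feature vectors using normalized learned weights."""
    # Clamp weights at zero (a ReLU), as proposed for BiFPN, so that each
    # normalized weight lies between 0 and 1.
    clamped = [max(w, 0.0) for w in weights]
    total = eps + sum(clamped)
    fused = [0.0] * len(features[0])
    for w, feature in zip(clamped, features):
        for k, value in enumerate(feature):
            fused[k] += (w / total) * value
    return fused


# Toy stand-ins for P6_in, P6_td and the resized P5_out (assumed values).
p6_in = [1.0, 2.0]
p6_td = [3.0, 4.0]
p5_out_resized = [5.0, 6.0]
fused = fast_normalized_fusion([p6_in, p6_td, p5_out_resized], [1.0, 1.0, 2.0])
print(fused)
```

With weights 1, 1, and 2, the third input contributes about half of the result, which shows how a larger learned weight lets one scale dominate the fused feature without the cost of a softmax.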

IV. RESULTS AND ANALYSIS
The hardware configuration of the experimental environment is an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz and an RTX 3090 GPU with 24 GB of video memory. The software configuration is Windows 10 with CUDA 11.1. The deep learning framework is PyTorch 1.8.1 with Python 3.8.5.

A. PERFORMANCE EVALUATION INDEX
Appropriate evaluation indicators should be selected when analyzing the experimental results. Since Precision and Recall have certain limitations [29], this paper conducts a comprehensive analysis based on three indexes: Precision (P), Recall (R), and mean Average Precision (mAP). Precision is the proportion of samples predicted as positive for a given category that are truly positive. Recall is the proportion of truly positive samples that are correctly predicted as positive [30]. They are defined as:

$$P = \frac{TP}{TP + FP}$$

$$R = \frac{TP}{TP + FN}$$

where TP (True Positive) is a positive sample judged to be positive; FN (False Negative) is a positive sample judged to be negative; FP (False Positive) is a negative sample judged to be positive; and TN (True Negative) is a negative sample judged to be negative. Ideally, both Precision and Recall would reach high values, but in practice they influence and restrict each other. The most common approach in this case is to use the Average Precision (AP) and the mean Average Precision (mAP) to measure the detection precision of an algorithm [31], so that the detection performance of the model is reflected by the values of AP and mAP. They are calculated as follows:

$$AP = \int_0^1 P(R)\,dR$$

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i$$

where N is the number of detected categories. Since there is only one detection class in this experiment, mAP is equivalent to AP here.
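The metrics discussed above can be computed with a short sketch. Precision is taken as TP / (TP + FP) and Recall as TP / (TP + FN). AP is approximated as the area under the precision-recall curve using all-point interpolation, which is a common convention but not necessarily the exact procedure of the YOLOv5 evaluation code. The function names and the toy detections are assumptions.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scored_detections, num_ground_truth):
    """Area under the precision-recall curve for a single class.

    scored_detections: list of (confidence, is_true_positive) pairs.
    """
    ranked = sorted(scored_detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    # Interpolate: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


p, r = precision_recall(tp=8, fp=2, fn=2)
# Three toy detections against two ground-truth damaged seeds.
ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
mean_ap = ap  # with a single class, mAP equals AP
```

In the toy example the false positive ranked second lowers precision at full recall to 2/3, so the AP is 0.5 × 1 + 0.5 × 2/3, or about 0.83.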

B. EXPERIMENT
This paper also evaluates the commonly used CBAM attention module for comparison. As can be seen from Figure 8, the performance of the YOLOv5 model with only the CBAM module decreases rather than increases, indicating that CBAM alone is not suitable for Camellia oleifera seed detection. The model that combines CBAM with BiFPN improves significantly, but its mAP curve does not reach the level of the YOLOv5-CA model. In comparison, the improved method adopted in this paper is better suited to detecting damaged Camellia oleifera seeds on an industrial conveyor belt. The figure also shows that the model with both CA attention and BiFPN improves significantly. Although its precision and mAP are below those of the original model at the beginning of training, its curves after convergence are better than those of the original model.
Table 3 shows that YOLOv5-CB performs better than the other improved methods. Compared with YOLOv5m, the detection precision of YOLOv5-CB is 6.1% higher, reaching 92.4%, and the mAP is 5.7% higher. Although the recall is reduced to a certain extent, the improved model fully meets the precision requirements of practical applications. The size of the improved model increases from 40.2 MB to 44.3 MB. Nevertheless, the average detection time for a single Camellia oleifera seed image is 6.4 ms, which meets the requirements of detecting damaged Camellia oleifera seeds on an industrial assembly line.
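As a quick consistency check on the reported figures, the arithmetic can be worked through as below. The check assumes that "6.1% higher" and "5.7% higher" denote percentage-point differences, and that the 6.4 ms figure is a pure per-image inference time with no batching or I/O overhead; neither assumption is stated explicitly in the text.

```python
# Reported values from the text.
precision_cb = 92.4       # % precision of YOLOv5-CB
precision_gain = 6.1      # percentage points over YOLOv5m (assumed meaning)
map_before, map_after = 87.7, 93.4
size_before_mb, size_after_mb = 40.2, 44.3
latency_ms = 6.4

baseline_precision = round(precision_cb - precision_gain, 1)
map_gain = round(map_after - map_before, 1)
size_increase_mb = round(size_after_mb - size_before_mb, 1)
images_per_second = 1000.0 / latency_ms

print(baseline_precision)  # 86.3
print(map_gain)            # 5.7
print(size_increase_mb)    # 4.1
print(images_per_second)   # 156.25
```

Under these assumptions the implied baseline precision is 86.3%, the mAP gain matches the stated 5.7 points, and a 6.4 ms latency corresponds to roughly 156 images per second on the test hardware.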
Table 4 shows that the improved model outperforms several well-known target detection algorithms on this data set in all respects. On the one hand, YOLOv5-CB is built on the well-established YOLO series of algorithms. On the other hand, the improvements were selected according to the characteristics of Camellia oleifera seeds. First, based on YOLOv5, this paper integrates the CA attention module so that the algorithm pays more attention to the important feature information of the image and preserves the coordinate information of the image features. Second, the multi-scale feature fusion network BiFPN, a weighted bidirectional feature pyramid network, is introduced to further improve detection precision and reduce missed and false detections. The experimental results show that the YOLOv5-CB model has higher precision and mean Average Precision than the original model. The average detection time for a single Camellia oleifera seed image is 6.4 ms, which meets the real-time requirements.

C. MODEL DETECTION RESULTS ANALYSIS
In practical application, there are two main difficulties. On the one hand, the number of Camellia oleifera seeds on the conveyor belt is large and the proportion of damaged seeds is relatively small. At the same time, the damaged area on an individual seed is small and occupies a very low proportion of the whole image, which leads to missed detections. On the other hand, on an actual assembly line, Camellia oleifera seeds often slide onto the conveyor belt from the vibrating sieve plate, which makes damage within densely packed seeds difficult to identify. The optimization of the original YOLOv5 model in this paper effectively reduces missed and false detections in practical industrial applications. Figure 9 compares the detection results of the original model and the YOLOv5-CB model on some samples. Figure 9(a) shows that the improved YOLOv5 model detects slightly damaged Camellia oleifera seeds more accurately. Figure 9(b) shows that the YOLOv5-CB model has a higher detection precision for damaged seeds within dense groups of Camellia oleifera seeds. The evaluation indicators and the sample detection results indicate that the CA attention mechanism strengthens the model's processing of damage feature information and effectively improves its detection precision for slightly damaged Camellia oleifera seeds. Through the multi-scale feature fusion of the BiFPN weighted bidirectional feature pyramid, the ability to detect damaged seeds among dense Camellia oleifera seeds was greatly improved.
The above shows that YOLOv5-CB performs well, but some shortcomings remain. On the one hand, YOLOv5-CB is 4.1 MB larger than the original model and should be made more lightweight to suit mobile terminals and embedded devices. On the other hand, the detection time of 6.4 ms per image was measured in a laboratory environment. In practical application, the cost of RTX 3090-class equipment is prohibitive, so the model should be further simplified to improve the detection time on conventional equipment.

V. CONCLUSION
Improving the detection accuracy and efficiency for Camellia oleifera seeds is of great significance for supporting the expansion of the Camellia oleifera planting area. To address the low efficiency of manual sorting and the low accuracy of machine sorting of damaged Camellia oleifera seeds, a detection method for damaged Camellia oleifera seeds based on an attention mechanism and a feature pyramid was proposed. First, to simulate the real industrial environment, we designed a dedicated experimental scheme to collect the data set and used Cutout to simulate overlapping and occluded conditions. Second, through an experimental comparison of several YOLO series models with different widths and depths, we chose YOLOv5m, which offers a relatively balanced trade-off between detection performance and detection time, as the base model. Finally, a coordinate attention module and a weighted bi-directional feature pyramid network were introduced to strengthen the detection of slightly damaged Camellia oleifera seeds and to improve the detection of damaged seeds among densely packed seeds.
The final experimental results show that YOLOv5-CB meets the requirements of high precision and high-speed detection in practical applications, and its performance is superior to the general target detection model.
Although YOLOv5-CB meets the requirements of high precision and fast speed for industrial applications, it still needs further optimization in the future. First, YOLOv5-CB is larger than the original model, so it should be made more lightweight to suit mobile terminals and embedded devices, which place stricter demands on the model. Second, the detection time of 6.4 ms per image was measured in a laboratory environment. In practical application, the cost of RTX 3090-class equipment is prohibitive, so the model should be further simplified to improve the detection time on conventional equipment.

JUNJIE LIU was born in Yongzhou, Hunan, China, in 1999. He is currently pursuing the master's degree. His main research interests include deep learning and medical image processing.