
Identification of Tea Disease Under Complex Backgrounds Based on Minimalism Neural Network Architecture and Channel Reconstruction Unit




Abstract:

Tea production plays a crucial role in maintaining agricultural output, and the prompt diagnosis and efficient management of tea diseases are essential for ensuring a healthy tea industry. Traditional machine learning techniques for disease identification often require time-consuming feature engineering, which can be a bottleneck in achieving accurate and efficient results. In contrast, deep learning approaches have shown superior performance in disease identification by eliminating the need for manual feature engineering. However, in complex backgrounds, such as when environmental variables and multi-scale changes interact with the imaging of tea diseases, feature extraction becomes extremely challenging. In this study, a novel technique for tea disease recognition in complex settings, named VCRUNet, is proposed to address these issues; it combines the minimalist neural network model VanillaNet with the channel reconstruction unit (CRU). The CRU effectively reduces channel redundancy between features in the convolutional neural network, thereby improving the model's ability to extract relevant features. To overcome the limitations of limited sample data, we fine-tuned the model parameters using transfer learning on a self-built tea disease dataset, with the Plant Village dataset used for pre-training. The experimental results demonstrate that our proposed technique achieves an accuracy of 92.48% in detecting tea diseases in complex environments, outperforming current methods and enhancing the efficacy of tea disease identification. Meanwhile, the detection speed is 4.5 seconds per 100 images. The outcomes of this research have a direct impact on the early diagnosis and effective management of tea diseases. By providing a more accurate and efficient approach, our technique contributes to overall agricultural output and promotes a healthy tea industry.
Published in: IEEE Access (Volume: 12)
Page(s): 35934 - 35946
Date of Publication: 06 March 2024
Electronic ISSN: 2169-3536



SECTION I.

Introduction

Tea diseases strongly influence tea yield and quality [1]. In modern agricultural production, manual inspection remains the primary method of detecting tea diseases, which places a substantial burden on tea garden managers. Manual observation is time-consuming and labor-intensive, making it difficult to identify disease states quickly across large volumes of data, which can compromise the accuracy and reliability of identification. The swift advancement of artificial intelligence technology has therefore led to growing interest in more precise and effective ways to diagnose tea diseases.

Nowadays, deep learning-based image recognition algorithms are widely used in agriculture [2], [3], [4], [5], and deep learning is progressively replacing classical machine learning for disease identification in this field. Deep learning learns high-level features from data incrementally, significantly improving the accuracy of crop disease diagnosis, whereas traditional machine learning often relies on manually extracted disease features, which increases the complexity of feature extraction.

Although crop disease identification and detection techniques have achieved good results [6], [7], the timing and presentation of disease occurrence vary greatly across geographic regions. Many existing models are not robust to such variation and do not transfer well, so we need to construct our own dataset and experimentally confirm the effectiveness and practicality of deep learning methods for identifying specific diseases. Moreover, many recognition algorithms improve performance by increasing network depth, which not only risks gradient explosion or vanishing but also degrades recognition speed and inflates model size. It is therefore necessary to design a high-performance disease recognition algorithm with a minimalist structure.

In this study, a minimalist-architecture tea disease recognition model, VCRUNet, based on the channel reconstruction unit (CRU), is proposed to address the problem of accurately recognizing tea diseases in complex backgrounds. The proposed network uses convolutional modules for image feature extraction, adds CRUs to improve disease feature localization in complex backgrounds, and combines global features to significantly improve the accuracy of tea disease recognition. The main contributions of this paper can be summarized in the following three aspects:

  1. A tea disease dataset was constructed to address the shortage of tea disease samples with complex backgrounds. Meanwhile, the diversity of image features was increased using data enhancement strategies that simulate the lighting, angles, and occlusion encountered in real scenes;

  2. A tea disease recognition model, VCRUNet, is developed to address the challenge of recognizing tea diseases in complex backgrounds. It is built around a minimalist framework consisting of VanillaNet-13 and the CRU. In parallel, transfer learning is used to fine-tune the parameters of a pre-trained model on a self-constructed tea disease dataset in order to reduce the overfitting caused by an insufficient sample size;

  3. The proposed method achieves the best recognition accuracy of 92.48% on the tea disease dataset, with a recognition speed of 4.5 seconds per 100 images. Its recognition accuracy leads other advanced algorithms.

SECTION II.

Related Works

Many scholars have recently addressed agricultural disease problems with deep learning-based recognition systems. To limit the loss of feature information and expand the receptive field, Pan et al. [8] developed a dilated convolution-based feature fusion (DCFF) module. This module aggregates feature information from various convolutional layers in two phases, thereby extending the receptive field and providing a powerful feature representation capability; the InceptionC module was added to further improve performance. Jadhav et al. [9] presented an effective transfer learning-based strategy for identifying soybean diseases that used pre-trained AlexNet and GoogleNet convolutional neural networks (CNNs) to reach 98.75% and 96.25% accuracy, respectively. Simhadri et al. [10] applied transfer learning to rice leaf disease recognition, and experiments showed that the InceptionV3 model achieved the best performance. Zhao et al. [11] proposed a self-supervised Contrastive learning method for Leaf disease identification with domain Adaptation (CLA), which utilizes large-scale yet messy unlabeled data to train the encoder and obtain visual representations in the pre-training stage. Zhou et al. [12] developed a residual-distilled transformer architecture to address identification accuracy and model interpretability in rice leaf disease diagnosis. The authors present a distillation technique to extract weights and parameters from pre-trained vision transformer models; the residual concatenation of the vision transformer and the distilled transformer serves as residual blocks for feature extraction and is fed into a multi-layer perceptron (MLP) for prediction. Sanida et al. [13] used a transfer learning-based VGGNet to increase the accuracy of tomato disease identification, outperforming other state-of-the-art approaches on the test set. Zhang et al. [14] proposed a Multi-channel Automatic Orientation Recurrent Attention Network (M-AORANet) to extract abundant disease features, addressing the noise easily introduced during image acquisition and transmission of tomato disease images, as well as the intra-class variability and inter-class similarity of leaf diseases; noise interference in the photos was minimized using an asymptotic non-local means approach. For the detection of apple leaf diseases, Zheng et al. [15] developed an effective lightweight model (RepDI) based on structural reparameterization; to achieve better detection performance, the model embeds a parallel expansion attention mechanism module and applies depth-separable convolution and structural reparameterization techniques. To improve the accuracy of strawberry disease identification, Li et al. [16] proposed the spatial convolutional self-attention-based transformer (SCSA-Transformer), which uses Multi-Head Self-Attention (MSA) to capture long-distance feature dependencies in images of strawberry disease. Chen et al. [17] proposed MS-DNet, a novel lightweight network architecture that achieves an average accuracy of 98.32% in identifying various kinds of crop disease; the model is small and computes rapidly. Gao et al. [18] proposed BAM-Net, a backbone network based on an aggregate coordinate attention mechanism (ACAM) and a multi-scale feature refinement module (MFRM), to diagnose apple leaf diseases.

As deep learning technology for crop disease diagnosis and detection has advanced, research has gradually extended to various disease types in many crops, and people have tried to use these methods to solve a range of issues. Hu et al. [19] proposed a technique for identifying tea diseases based on an enhanced deep convolutional neural network (CNN); a multi-scale feature extraction module was added to an improved CIFAR10-quick deep CNN to enhance automatic feature extraction from various tea disease photos, achieving an average recognition accuracy of 92.5%. In 2022, Hu et al. [20] proposed employing weight initialization and diseased-leaf segmentation techniques in conjunction with the multi-convolutional neural network model MergeModel to automatically diagnose tea diseases from small samples. The results demonstrated that MergeModel could successfully discriminate between healthy and diseased tea leaves and recognize common tea diseases such as sooty mold, tea white scab, tea leaf blight, and tea red scab. Datta et al. [21] proposed using a deep CNN with several hidden layers to categorize diseased tea leaves into distinct groups. To distinguish between three types of plant stress, Zhao et al. [22] presented a multistep method based on hyperspectral imaging and continuous wavelet analysis (CWA); the experimental results demonstrated the effectiveness of hyperspectral imaging for plant phenotyping under pests and diseases.

SECTION III.

Materials and Methods

A. Data Acquisition

1) Tea Disease Dataset

The experimental data were obtained from the Liu Bao Tea Trial Base of Wuzhou College, Wanxiu District, Wuzhou City, Guangxi Zhuang Autonomous Region, China, as shown in Figure 1. The data were collected in early-to-mid 2023, when tea tree diseases have a high incidence, and images of tea leaves were taken at different times and under different climatic conditions. This supports the study of how multiple factors affect the identification of tea diseases in complex backgrounds and, at the same time, enhances the robustness of the data.

FIGURE 1. Wuzhou College Liu Bao Tea Trial Base.

The collection tools were a Nikon Z5 camera and a smartphone (iPhone 14 Pro). Images of tea leaf blight, tea brown leaf spot, tea red scab, and healthy tea leaves were collected, and 658 images were randomly selected for the recognition study. The images were saved in JPG format; some examples are shown in Figure 2. Because disease occurrence is uncertain, the collected image data are not evenly distributed across classes, as shown in Table 1.

TABLE 1. Details of the Number of Image Data.

FIGURE 2. Images of tea leaf disease samples.

2) Public Dataset

The public dataset Plant Village [23] was used for model pre-training, which contains a total of 54,305 images of 26 diseases on 14 plant species. An example of the Plant Village dataset images is shown in Figure 3.

FIGURE 3. Example of Plant Village dataset images.

3) Data Enhancement

In order to improve the accuracy and robustness of model classification, this study performs data enhancement on the collected sample data. The main data enhancement methods, sketched in code after the list, are as follows:

  1. Brightness adjustment: the images were processed at three different brightness levels to simulate different sunlight conditions;

  2. Rotation: the images were rotated 90°, 180°, and 270° clockwise to simulate different shooting angles;

  3. Horizontal flipping;

  4. Noise: salt-and-pepper noise and Gaussian noise were added in turn to simulate different shot qualities;

  5. Chroma adjustment;

  6. Contrast adjustment;

  7. Sharpness adjustment;

  8. Random erasing [24] was applied to the images to simulate occlusions that occur during shooting.
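As a rough illustration, the strategies above can be approximated with standard torchvision transforms. This is a minimal sketch assuming the pipeline is built in PyTorch (the framework used in Section IV); the exact augmentation parameters are not reported in the paper, so all values below are illustrative, and the Gaussian-noise helper is a hypothetical addition since torchvision has no built-in noise transform.

```python
import torch
from torchvision import transforms

# Illustrative pipeline approximating the eight enhancement strategies above;
# factor values are examples, not the paper's settings.
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ColorJitter(brightness=0.4,   # 1) simulate sunlight changes
                           saturation=0.4,   # 5) chroma adjustment
                           contrast=0.4),    # 6) contrast adjustment
    transforms.RandomChoice([                # 2) 90/180/270 degree rotations
        transforms.RandomRotation((90, 90)),
        transforms.RandomRotation((180, 180)),
        transforms.RandomRotation((270, 270)),
    ]),
    transforms.RandomHorizontalFlip(p=0.5),  # 3) horizontal flip
    transforms.RandomAdjustSharpness(2.0),   # 7) sharpness adjustment
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),        # 8) random erasing for occlusion
])

def add_gaussian_noise(img: torch.Tensor, std: float = 0.05) -> torch.Tensor:
    """4) Gaussian noise to simulate varying shot quality (hypothetical helper)."""
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)
```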

We divided the original data into training, validation, and testing sets in the ratio 7:1:2. The training set was augmented using the above enhancement strategies; an example of data enhancement is shown in Figure 4. The same split strategy was used for the public dataset.

FIGURE 4. Example of sample images after data enhancement.

B. Experimental Methodology

1) Vanilla Neural Architecture

With the development of artificial intelligence (AI) chip technology, factors such as the complexity and depth of a neural network's design have become major determinants of its inference speed. Huawei Noah's Ark Lab proposed a network architecture called VanillaNet [25], detailed in Figure 5. The architecture comprises a stem, a main body, and a fully connected layer. VanillaNet minimizes the number of levels by using only one layer per stage, constructing a very simple network topology whose goals are to reduce computational complexity and speed up inference.

FIGURE 5. VanillaNet-6 network architecture diagram.

The figure above shows the VanillaNet architecture with 6 layers as an example. For the stem, a $4\times4\times3\times C$ convolutional layer with a stride of 4 maps the 3-channel image to features with $C$ channels. At stages 1, 2, and 3, a max-pooling layer with a stride of 2 reduces the size of the feature maps while the number of channels is doubled. At stage 4, the number of channels is not increased because it is followed by an average pooling layer. The last layer is a fully connected layer that outputs the classification results. Each convolutional layer in the body has a $1\times1$ kernel, with the goal of minimizing computational cost while maintaining feature-map information. An activation function is applied after each $1\times1$ convolutional layer, and batch normalization is added after each layer.
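To make the stem-plus-stage pattern concrete, the sketch below reimplements it in PyTorch. It is a simplified reading of the description above, not the authors' released VanillaNet code: it omits VanillaNet's deep-training strategy and series-informed activations, and the channel width `c` is an arbitrary example.

```python
import torch.nn as nn

class VanillaStage(nn.Module):
    """One VanillaNet stage: a single 1x1 conv, BN, activation, optional pooling."""
    def __init__(self, in_ch: int, out_ch: int, pool: bool = True):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # minimal-cost 1x1 conv
        self.bn = nn.BatchNorm2d(out_ch)                     # BN after each layer
        self.act = nn.ReLU(inplace=True)                     # activation after the conv
        self.pool = nn.MaxPool2d(2, stride=2) if pool else nn.Identity()

    def forward(self, x):
        return self.pool(self.act(self.bn(self.conv(x))))

class VanillaNet6Sketch(nn.Module):
    """Simplified 6-layer VanillaNet: stem, four stages, fully connected head."""
    def __init__(self, num_classes: int = 4, c: int = 128):
        super().__init__()
        # stem: 4x4 conv, stride 4, maps the 3-channel image to C channels
        self.stem = nn.Sequential(nn.Conv2d(3, c, kernel_size=4, stride=4),
                                  nn.BatchNorm2d(c), nn.ReLU(inplace=True))
        self.stage1 = VanillaStage(c, 2 * c)                  # channels doubled
        self.stage2 = VanillaStage(2 * c, 4 * c)
        self.stage3 = VanillaStage(4 * c, 8 * c)
        self.stage4 = VanillaStage(8 * c, 8 * c, pool=False)  # no channel increase
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(8 * c, num_classes))

    def forward(self, x):
        x = self.stem(x)
        for stage in (self.stage1, self.stage2, self.stage3, self.stage4):
            x = stage(x)
        return self.head(x)
```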

2) Attention Mechanisms Module

The Channel Reconstruction Unit (CRU) [26] is designed to make full use of the redundant information in feature channels. Its structure is shown in Figure 6; the main idea is a three-step Split-Transform-Fuse strategy.

FIGURE 6. The architecture of the channel reconstruction unit.

In the split section, given an intermediate feature map $X \in \mathbb{R}^{N \times C \times H \times W}$ (where $N$ is the batch axis, $C$ the channel axis, and $H$ and $W$ the spatial height and width axes), the channels are split into $\alpha C$ and $(1-\alpha)C$ parts, where $\alpha$ ($0 \le \alpha \le 1$) is the split ratio. Subsequently, a $1\times1$ convolution compresses the channels of each part, with a squeeze ratio $r$ (set to 2) controlling the number of feature channels. After the split and squeeze operations, the feature $X$ is divided into an upper part $X_{up}$ and a lower part $X_{low}$.

In the transform section, $X_{up}$ is fed into the upper transform stage, which acts as a "Rich Feature Extractor" using group-wise convolution (GWC, with group size $g = 2$ in the experiments) and point-wise convolution (PWC) to extract high-level representative information. GWC reduces the number of parameters and computations but cuts off the information flow between channel groups.

PWC compensates for the loss of information and helps information flow between feature channels. The outputs are then summed to form a merged representative feature map $Y_1$:\begin{equation*} Y_{1}=M^{G}X_{up}+M^{P_{1}}X_{up} \tag{1}\end{equation*}
where $M^{G}$ and $M^{P_{1}}$ are the learnable weight matrices of GWC and PWC, and $X_{up}$ and $Y_1$ are the upper input and output feature maps, respectively. $X_{low}$ is sent to the lower transform stage, where a $1\times1$ PWC operation generates a feature map with shallow hidden details. The generated and reused features are concatenated to form the feature map $Y_2$:\begin{equation*} Y_{2}=M^{P_{2}}X_{low}\cup X_{low} \tag{2}\end{equation*}
where $M^{P_{2}}$ is the learnable weight matrix of PWC, $\cup$ is the concatenation operation, and $X_{low}$ and $Y_2$ are the lower input and output feature maps, respectively.

In the fuse section, the output features $Y_1$ and $Y_2$ from the upper and lower transform stages are adaptively merged using a simplified SKNet method [27]. Global average pooling is first applied to collect the global spatial information $S_m$ with channel statistics:\begin{equation*} S_{m}=\mathrm{Pooling}\left({Y_{m}}\right)=\frac{1}{H\times W}\sum \nolimits_{i=1}^{H} \sum \nolimits_{j=1}^{W} {Y_{m}(i,j)},\quad m=1,2 \tag{3}\end{equation*}
Next, the upper and lower global channel descriptors $S_1$ and $S_2$ are stacked together, and the feature importance vectors $\beta_1$ and $\beta_2$ are generated using a channel-wise soft attention operation:\begin{equation*}\beta _{1}=\frac {e^{S_{1}}}{e^{S_{1}}+e^{S_{2}}},\quad \beta _{2}=\frac {e^{S_{2}}}{e^{S_{1}}+e^{S_{2}}},\quad \beta _{1}+\beta _{2}=1 \tag{4}\end{equation*}

Finally, the upper feature $Y_1$ and the lower feature $Y_2$ are merged under the guidance of the feature importance vectors, $Y=\beta_1 Y_1+\beta_2 Y_2$, yielding the channel-refined feature $Y$.
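A compact PyTorch sketch of this Split-Transform-Fuse pipeline follows. It is our reading of Eqs. (1)-(4), not the reference CRU implementation; the `CRUSketch` name, the 3x3 kernel for the GWC, and the default hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

class CRUSketch(nn.Module):
    """Channel Reconstruction Unit sketch: Split, Transform, then Fuse."""
    def __init__(self, channels: int, alpha: float = 0.5, r: int = 2, groups: int = 2):
        super().__init__()
        self.up_ch = int(alpha * channels)
        low_ch = channels - self.up_ch
        sq_up, sq_low = self.up_ch // r, low_ch // r
        # split: 1x1 convs squeeze each part by the ratio r
        self.squeeze_up = nn.Conv2d(self.up_ch, sq_up, 1, bias=False)
        self.squeeze_low = nn.Conv2d(low_ch, sq_low, 1, bias=False)
        # upper transform: GWC + PWC act as the "rich feature extractor"
        self.gwc = nn.Conv2d(sq_up, channels, 3, padding=1, groups=groups, bias=False)
        self.pwc1 = nn.Conv2d(sq_up, channels, 1, bias=False)
        # lower transform: PWC generates the complementary channels
        self.pwc2 = nn.Conv2d(sq_low, channels - sq_low, 1, bias=False)

    def forward(self, x):
        x_up, x_low = torch.split(x, [self.up_ch, x.size(1) - self.up_ch], dim=1)
        x_up, x_low = self.squeeze_up(x_up), self.squeeze_low(x_low)
        y1 = self.gwc(x_up) + self.pwc1(x_up)             # Eq. (1)
        y2 = torch.cat([self.pwc2(x_low), x_low], dim=1)  # Eq. (2)
        s = torch.stack([y1.mean(dim=(2, 3)),             # Eq. (3): global pooling
                         y2.mean(dim=(2, 3))], dim=0)
        beta = torch.softmax(s, dim=0)[..., None, None]   # Eq. (4): soft attention
        return beta[0] * y1 + beta[1] * y2                # Y = beta1*Y1 + beta2*Y2
```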

3) Proposed Model Structure

In this study, a novel neural network structure for tea disease recognition, VCRUNet, is proposed; Figure 7 illustrates its structure. To improve recognition accuracy, the model introduces the attention mechanism module CRU into the VanillaNet-13 model to strengthen the extraction of tea disease features.

FIGURE 7. The architecture of VCRUNet.

First, a batch of tea leaf disease images is input at a resolution of $224\times224\times3$. After the first convolutional layer, the features enter Block1, where they first pass through the CRU module before the next convolutional layer; the ten subsequent modules, Block2 through Block11, perform the same operation, and a final fully connected layer outputs the disease type of the image. As the network depth increases, the CRU maintains the representational capability of each convolutional module, which improves the performance and stability of the network and allows the entire neural network to be trained to high accuracy.
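Read this way, each block can be wired as in the sketch below, which places a `CRUSketch` (from the previous sketch) in front of the block's convolutional layer. The wiring is illustrative, inferred from the description of Figure 7 rather than taken from released code.

```python
import torch.nn as nn

class VCRUBlockSketch(nn.Module):
    """One VCRUNet block: the feature map passes through a CRU before the
    block's convolution, as described above (illustrative wiring only)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.cru = CRUSketch(in_ch)            # channel reconstruction first
        self.conv = nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=1),
                                  nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
                                  nn.MaxPool2d(2, stride=2))

    def forward(self, x):
        return self.conv(self.cru(x))
```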

4) Tea Disease Recognition Model Based on Transfer Learning

Deep convolutional neural network models are larger and more accurate than traditional machine learning methods, but they usually require a large amount of training data. Solving tea disease identification this way presupposes a large amount of tea disease data, and in practice this requirement limits the method's application. Transfer learning can effectively improve model performance by reusing existing knowledge, reducing data requirements and computational overhead, and it is widely used in agriculture [28], [29], [30], [31]. We therefore use transfer learning to apply the knowledge learned in the source domain to the target domain: the network trained on the Plant Village dataset is transferred to the tea disease dataset, and the network's parameters and weights are reused by fine-tuning them during training. The model transfer process is shown in Figure 8.

FIGURE 8. Diagram of the model transfer process.
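A minimal sketch of this fine-tuning workflow, reusing the `VanillaNet6Sketch` model from the earlier sketch, is shown below. The checkpoint filename is hypothetical, the 38-class pre-training head assumes the standard Plant Village class count, and the momentum value is an assumption; only the learning rate comes from Table 3.

```python
import torch
import torch.nn as nn

# Pre-training configuration: 38 output classes (standard Plant Village release).
model = VanillaNet6Sketch(num_classes=38)
model.load_state_dict(torch.load("plant_village_pretrained.pth"))  # assumed file

# Transfer: replace the classification head for the four tea-leaf classes,
# keeping all other pre-trained weights for fine-tuning.
model.head[-1] = nn.Linear(model.head[-1].in_features, 4)

# Fine-tune every parameter with the reported learning rate (momentum assumed).
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```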

SECTION IV.

Results

A. Experimental Design

In our experiments, all image processing and model training were implemented using Anaconda3 (Python 3.7.16) and PyTorch. The hardware environment consists of an Intel(R) Core(TM) i9-13900K with 128 GB of RAM and an NVIDIA GeForce RTX 4090 graphics card for model training and testing, as detailed in Table 2. The experiments mainly include comparisons among different network models and with our proposed model, model transfer experiments, and ablation experiments between the public dataset and our self-built tea disease dataset.

TABLE 2. Experimental Environment Configuration Parameters.

Throughout model training, the number of training rounds was set to 100, and the batch size and learning rate were set to 64 and 0.01, respectively. Other detailed hyperparameters are shown in Table 3.

TABLE 3. Model Parameter Settings.

The use of cutting-edge convolutional neural networks (CNNs) in computer vision applications has recently gained popularity in research. To evaluate the performance of our proposed method, we selected several networks for experimental comparison: ResNet [32], MobileNetV2 [33], Swin-Transformer [34], Vision-Transformer [35], and VanillaNet. These models are widely used in disease identification, which makes their results a representative and convincing baseline. All models were therefore tested after 100 epochs of training, and their results were evaluated against the method proposed in this study.

B. Evaluation Metrics

In order to show the performance of the network in this study, accuracy, precision, recall, and F1-Score were used to measure the model's performance in recognizing tea diseases. The formulas for these measures are given below:\begin{align*}Accuracy&=\frac {TP+TN}{TP+FP+TN+FN} \tag{5}\\ Precision&=\frac {TP}{TP+FP} \tag{6}\\ Recall&=\frac {TP}{TP+FN} \tag{7}\\ F1\text{-}Score&=2\times \frac {Precision\times Recall}{Precision+Recall} \tag{8}\end{align*}
where true positives (TP) are positive samples correctly predicted as positive, false positives (FP) are negative samples incorrectly predicted as positive, true negatives (TN) are negative samples correctly predicted as negative, and false negatives (FN) are positive samples incorrectly predicted as negative.
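For reference, Eqs. (5)-(8) can be computed per class and then averaged, as in the sketch below; how the paper aggregates the per-class scores is not stated, so the macro average here is an assumption.

```python
import numpy as np

def classification_metrics(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int):
    """Accuracy plus macro-averaged precision, recall, and F1, per Eqs. (5)-(8)."""
    precision, recall, f1 = [], [], []
    for c in range(num_classes):
        tp = np.sum((y_pred == c) & (y_true == c))   # correct positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))   # wrongly predicted as class c
        fn = np.sum((y_pred != c) & (y_true == c))   # missed instances of class c
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    accuracy = float(np.mean(y_true == y_pred))
    return accuracy, np.mean(precision), np.mean(recall), np.mean(f1)
```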

C. Model Classification Results and Performance Analysis

The tea disease test dataset was used to evaluate the performance of all the models, and Table 4 shows their recognition accuracies. The VCRUNet proposed in this study performs best, with a recognition accuracy of 92.48%. VCRUNet adds the channel attention mechanism CRU to the VanillaNet-13 model, which improves the network's feature extraction ability and yields the best recognition results. The experimental results show an improvement in identification accuracy of 1.5% compared with ResNet18, ResNet50, and Swin-Transformer-Tiny, 2.25% compared with MobileNetV2 and Swin-Transformer-Small, and 0.75% and 3.76% compared with Vision-Transformer and VanillaNet-13, respectively.

TABLE 4. Accuracy, Recall, Precision, and F1-Score of Disease Identification for Different Models.

Therefore, based on the experimental results, we chose VanillaNet-13, the baseline with the highest accuracy, as the basic framework for improvement, and compared the recognition accuracy of the models for tea diseases in complex backgrounds after introducing the attention mechanism CRU. The comparison shows that the improved model better extracts the disease features in the image and enhances feature expression.

Compared with the traditional CNN networks, our network achieves the best results on all measures. The recall of the VCRUNet model was 92.37%: up 2.15% compared with ResNet18 and ResNet50, up 2% compared with MobileNetV2, up 2.17% relative to Swin-Transformer-Small, up 1.79% compared with Swin-Transformer-Tiny, up 0.78% over Vision-Transformer, up 3.96% against VanillaNet-6, and up 0.21% over VanillaNet-13.

The precision of the VCRUNet model was 92.27%: up 0.45% from ResNet18, up 1.09% compared with ResNet50, up 0.75% against MobileNetV2, up 2.6% over Swin-Transformer-Small, up 1.78% relative to Swin-Transformer-Tiny, up 0.57% compared with Vision-Transformer, up 4.52% compared with VanillaNet-6, and up 0.3% over VanillaNet-13.

The F1-Score of the VCRUNet model was 92.32%, up 0.26% compared with the best-performing baseline, VanillaNet-13. These data show that our method leads on all model performance evaluation indicators, so it has a clear advantage in identifying tea diseases.

The recognition time of the VCRUNet model is 4.5 s per 100 images, 2.16 s slower than the fastest model but 3.76% more accurate. Meanwhile, its recognition time does not differ much from those of the other models.

Our model was first pre-trained on the public Plant-Village dataset; the validation accuracy and loss curves of the different network models on Plant-Village are shown in Figures 9(a) and 9(b), and those on the tea disease dataset in Figures 9(c) and 9(d). The results show that the proposed VCRUNet converges fastest, and combined with the other evaluation metrics, the network structure proposed in this paper has clear advantages.

FIGURE 9. Model recognition accuracy and loss curves for different models. (a) Accuracy curves on the Plant-Village dataset; (b) loss curves on the Plant-Village dataset; (c) accuracy curves on the tea diseases dataset; (d) loss curves on the tea diseases dataset.

The confusion matrix is one method for evaluating the effectiveness of classification models. To verify the performance improvement that transfer learning brings to the models in this study, confusion matrices were produced for the different network models on the tea disease test set; the horizontal coordinates indicate the true values and the vertical coordinates the predicted values. Experiments were carried out on the test set's four classes (tea leaf blight, tea brown leaf spot, tea red scab, and healthy tea leaf), with the results shown in Figure 10: panels (a)-(h) show the comparison models and (i) shows VCRUNet. Our proposed method classifies each tea disease more accurately.

FIGURE 10. Confusion matrices of different models on the tea diseases test set. (a) ResNet18; (b) ResNet50; (c) MobileNetV2; (d) Swin-Transformer-Small; (e) Swin-Transformer-Tiny; (f) Vision-Transformer; (g) VanillaNet-6; (h) VanillaNet-13; (i) VCRUNet. D1: tea leaf blight; D2: tea brown leaf spot; D3: tea red scab; D4: healthy tea leaf.

Receiver Operating Characteristic (ROC) graphs give a clearer assessment than raw numerical results. The ROC curves for all models are shown in Figure 11. Analysis of the micro-average ROC curves shows that VCRUNet has the largest Area Under the Curve (AUC), indicating that combining the VanillaNet model structure with the CRU attention mechanism module helps in classifying tea diseases.

FIGURE 11. ROC graphs for different models. (a) ResNet18; (b) ResNet50; (c) MobileNetV2; (d) Swin-Transformer-Small; (e) Swin-Transformer-Tiny; (f) Vision-Transformer; (g) VanillaNet-6; (h) VanillaNet-13; (i) VCRUNet.
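The micro-average ROC referenced above is a standard computation; a sketch using scikit-learn follows, where `y_score` is assumed to be the model's softmax output of shape (n_samples, 4). The paper's exact plotting code is not given, so this shows only the computation.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

def micro_average_roc(y_true: np.ndarray, y_score: np.ndarray, num_classes: int = 4):
    """Micro-average ROC: binarize the labels, then pool all class decisions
    into one binary problem before computing the curve and its AUC."""
    y_bin = label_binarize(y_true, classes=list(range(num_classes)))
    fpr, tpr, _ = roc_curve(y_bin.ravel(), y_score.ravel())
    return fpr, tpr, auc(fpr, tpr)
```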

D. Model Feature Extraction Visualization Results

To better investigate the interpretability of VCRUNet, we used Grad-CAM [36] to visualize its class activation maps (CAMs), as shown in Figure 12. In the heat maps, the more extensive the red areas, the more attention the model pays to that part of the image; blue regions indicate likely redundancy. In the CAMs, all activation regions of the VCRUNet model were located in the diseased region of the leaf, with more accurate region coverage than the other models. The figure shows that the proposed tea disease recognition model VCRUNet recognizes tea diseases well and that the added CRU structure clearly improves model performance.

FIGURE 12. CAMs for different models. (a) Original image; (b) ResNet18; (c) ResNet50; (d) MobileNetV2; (e) Swin-Transformer-Small; (f) Swin-Transformer-Tiny; (g) Vision-Transformer; (h) VanillaNet-6; (i) VanillaNet-13; (j) VCRUNet.
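Heat maps like those in Figure 12 can be produced with the open-source pytorch-grad-cam package; the sketch below assumes the `VanillaNet6Sketch` model from the earlier sketches, and the choice of the last convolutional stage as the target layer is our assumption, not the paper's stated setting.

```python
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

# model: a trained classifier (e.g., VanillaNet6Sketch);
# images: a float tensor of shape (N, 3, 224, 224) with values in [0, 1]
cam = GradCAM(model=model, target_layers=[model.stage4.conv])  # assumed layer
grayscale_cam = cam(input_tensor=images)[0]         # (224, 224) activation map
rgb = images[0].permute(1, 2, 0).cpu().numpy()      # HWC float image in [0, 1]
overlay = show_cam_on_image(rgb, grayscale_cam, use_rgb=True)  # heat-map overlay
```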

SECTION V.

Discussion

Images of tea diseases collected in natural environments have complex backgrounds, and the lack of large-scale datasets leads to low recognition accuracy for many models. In this study, VCRUNet, which combines the minimalist neural network VanillaNet with a CRU, is proposed as an effective solution for recognizing tea diseases in complex backgrounds. Meanwhile, to cope with the small amount of data, data enhancement was first used to expand the disease dataset, and a public dataset was then used for transfer learning to obtain the best recognition results. However, the small amount of data remains a common problem in disease identification and detection: disease occurrence is not concentrated in time, and the difficulty of data collection is an important contributing factor. It is therefore also necessary to construct a publicly available large-scale dataset of tea diseases. In future work, data on common tea diseases will be collected and the dataset expanded.

Secondly, to further improve the practicality of VCRUNet, we will reduce the number of model parameters and computations in future work. In addition, during periods of high disease incidence in tea plantations, grading disease severity is an important issue that needs to be addressed urgently, so after collecting enough data we will also carry out related work such as disease severity assessment, enabling the occurrence of tea diseases to be managed and controlled in a timely manner.

SECTION VI.

Conclusion

The aim of this study is to solve the problem of identifying tea diseases in complex backgrounds and to provide a key method for increasing tea farmers' income and supporting the intelligent management of tea gardens. We have proposed VCRUNet, a model for recognizing tea diseases in complex backgrounds based on the minimalist neural network VanillaNet. The study constructed a dataset containing 658 images of three common tea diseases and healthy leaves and introduced various data enhancement strategies. The channel redundancy between features in the convolutional neural network is then reduced by introducing the CRU, which improves the recognition accuracy of the model. Finally, the public Plant Village dataset was used for pre-training, and the self-built tea disease dataset was used to fine-tune the pre-trained model's parameters via transfer learning, mitigating overfitting and other effects of the small amount of data. The experimental results show that the proposed method is superior to the classical methods based on ResNet18, ResNet50, MobileNetV2, Swin-Transformer-Small, Swin-Transformer-Tiny, Vision-Transformer, VanillaNet-6, and VanillaNet-13. The recognition accuracy reaches 92.48%, and the recognition time is 4.5 s per 100 images, so the method can be effectively applied to tea disease control.
