Plant Disease Detection and Classification by Deep Learning—A Review

Deep learning is a branch of artificial intelligence. In recent years, thanks to its capacity for automatic learning and feature extraction, it has attracted wide attention from academia and industry, and it is now widely used in image and video processing, speech processing, and natural language processing. It has also become a research hotspot in agricultural plant protection, for example in plant disease recognition and pest range assessment. Applying deep learning to plant disease recognition avoids the drawbacks of manually selecting disease spot features, makes feature extraction more objective, and speeds up both research and technology transfer. This review covers recent research progress in deep learning for crop leaf disease identification. We present current trends and challenges in detecting plant leaf diseases using deep learning and advanced imaging techniques, and we discuss open problems that still need to be resolved. We hope that this work will be a valuable resource for researchers who study the detection of plant diseases and insect pests.


I. INTRODUCTION
The occurrence of plant diseases has a negative impact on agricultural production. If plant diseases are not discovered in time, food insecurity will increase [1]. Early detection is the basis for effective prevention and control of plant diseases, and it plays a vital role in the management and decision-making of agricultural production. In recent years, plant disease identification has become a crucial issue.
Disease-infected plants usually show obvious marks or lesions on leaves, stems, flowers, or fruits. Generally, each disease or pest condition presents a unique visible pattern that can be used to uniquely diagnose abnormalities. Usually, the leaves of plants are the primary source for identifying plant diseases, and most of the symptoms of diseases may begin to appear on the leaves [2].
In most cases, agricultural and forestry experts identify fruit tree diseases and pests on site, or farmers identify them based on experience. This approach is subjective, time-consuming, laborious, and inefficient.
Farmers with less experience may misjudge a disease and apply pesticides blindly during the identification process. This reduces quality and yield, causes environmental pollution, and leads to unnecessary economic losses. To counter these challenges, the use of image processing techniques for plant disease recognition has become a hot research topic. The general process of using traditional image recognition technology to identify plant diseases is shown in Fig. 1. Dubey and Jalal [3] used K-means clustering to segment the lesion regions, combined the global color histogram (GCH), color coherence vector (CCV), local binary pattern (LBP), and completed local binary pattern (CLBP) to extract color and texture features of apple spots, and detected and identified three kinds of apple diseases with an improved support vector machine (SVM), reaching a classification accuracy of 93%.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Chai et al. [4] studied four tomato leaf diseases (early blight, late blight, leaf mildew, and leaf spot) and extracted 18 characteristic parameters covering the color, texture, and shape of tomato leaf spot images. Two approaches were used to select the characteristic parameters and construct the discriminant model: stepwise discriminant analysis combined with Bayesian discriminant analysis, and principal component analysis (PCA) combined with Fisher discriminant analysis. The accuracy of the two methods reached 94.71% and 98.32%, respectively. Li and He [5] selected 5 kinds of apple leaf diseases (speckled deciduous disease, yellow leaf disease, round spot disease, mosaic disease, and rust disease) as the research objects. Eight features of the apple leaf spot images, covering color, texture, and shape, were extracted, and a BP neural network model was used to classify and recognize the diseases, with an average recognition accuracy of 92.6%.
Guan et al. [6] extracted 63 parameters including morphology, color, and texture features of rice leaf disease spots, and applied stepwise discriminant analysis and the Bayesian discriminant method to classify and recognize three rice diseases (blast, stripe blight, and bacterial leaf blight), with a highest recognition accuracy of 97.2%.
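The traditional pipeline surveyed above (segment the lesion, compute hand-crafted features, feed a classifier) can be illustrated with a minimal sketch. It is not the method of any cited paper: the function names, the farthest-point initialization, and the "least green cluster is the lesion" rule are assumptions made for illustration, and the final classifier (for example an SVM) is omitted.

```python
import numpy as np

def kmeans_pixels(pixels, k=2, iters=10):
    """Plain k-means on an (N, 3) array of RGB pixels."""
    # Deterministic farthest-point initialization (an illustrative choice).
    centers = [pixels[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(pixels - c, axis=1) for c in centers], axis=0)
        centers.append(pixels[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

def segment_lesion(image, k=2):
    """Boolean lesion mask for an (H, W, 3) RGB image."""
    pixels = image.reshape(-1, 3).astype(float)
    labels, centers = kmeans_pixels(pixels, k)
    # Assumption: the lesion is the cluster with the lowest mean green value.
    lesion_cluster = centers[:, 1].argmin()
    return (labels == lesion_cluster).reshape(image.shape[:2])

def color_histogram(pixels, bins=4):
    """Normalized global color histogram with bins**3 entries."""
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()
```

Every step here encodes a manual decision (number of clusters, which cluster counts as diseased, histogram resolution), which is the dependence on segmentation and hand-crafted features that deep learning approaches aim to remove.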
In short, studies on plant disease recognition based on traditional image processing have achieved good results with high recognition accuracy, but they still have the following deficiencies and limitations: 1) the research workflow is cumbersome, highly subjective, time-consuming, and labor-intensive; 2) it depends heavily on lesion segmentation; 3) it depends heavily on hand-crafted feature extraction; 4) it is difficult to test the disease recognition performance of a model or algorithm in more complex environments.
Therefore, it is of great significance to realize intelligent, rapid, and accurate plant leaf disease recognition.
In recent years, deep learning has made considerable progress in plant disease recognition. Deep learning (DL) is transparent to the user: it does not require a high level of expertise in plant protection or statistics, and it automatically extracts image features and classifies plant disease spots. This eliminates much of the feature extraction and classifier design work required by traditional image recognition, allows the characteristics of the original image to be expressed directly, and makes the approach end-to-end. These properties have earned deep learning widespread attention in plant disease recognition, and it has become a hot research topic. Its rise is due to three factors: the availability of larger datasets, the adaptability of multicore graphics processing units (GPUs), and advances in training deep neural networks together with supporting software libraries, such as the compute unified device architecture (CUDA) from NVIDIA.
Recently, convolutional neural networks (CNNs), a special class of deep learning techniques, have quickly become the preferred method [7]. The CNN is the most popular classifier for image recognition, and it has shown outstanding ability in image processing and classification [8]. Deep learning approaches were first introduced to plant image recognition in work based on leaf vein patterns [9], in which CNNs with 3 to 6 layers classified three leguminous plant species: white bean, red bean, and soybean. Mohanty et al. [10] trained a deep learning model to recognize 14 crop species and 26 crop diseases. The trained model achieved an accuracy of 99.35% on the test set. Ma et al. [11] used a deep CNN to conduct symptom-wise recognition of four cucumber diseases (i.e., downy mildew, anthracnose, powdery mildew, and target leaf spots). The recognition accuracy reached 93.4%. Kawasaki et al. [12] introduced a CNN-based system to recognize cucumber leaf disease, which achieved an accuracy of 94.9%.
Although very good results have been reported in the literature, the diversity of the datasets used is limited. Training CNNs requires large datasets (comprising thousands of images). Unfortunately, such large and diverse datasets have not yet been collected for plant leaf disease recognition. At present, transfer learning is the most effective way to train robust CNN classifiers for plant leaf disease recognition. Transfer learning adapts pre-trained CNNs by retraining them on smaller datasets whose distribution differs from that of the larger datasets originally used to train the network from scratch [13]. In practice, taking CNN models pre-trained on the ImageNet dataset and retraining them for leaf disease recognition is effective. The combination of deep learning and transfer learning therefore offers a new way to address the limited size of plant disease datasets.
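The retraining pattern described above can be sketched with a toy example in which the backbone is frozen and only a new classification head is trained. This is a simplified stand-in, not a real pipeline: the random matrix `W_backbone` plays the role of weights pre-trained on ImageNet, and the two-class dataset, learning rate, and step count are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: a fixed projection followed by ReLU.
# In practice these weights would come from a CNN trained on a large dataset.
W_backbone = rng.normal(size=(2, 16))

def extract_features(x):
    """Frozen feature extractor; W_backbone is never updated."""
    return np.maximum(x @ W_backbone, 0.0)

# Small hypothetical target dataset: two classes with 40 samples each.
x = np.vstack([rng.normal(-1.5, 0.5, size=(40, 2)),
               rng.normal(1.5, 0.5, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

def log_loss(w, b, feats, labels):
    """Mean binary cross-entropy of a logistic head."""
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    eps = 1e-12
    return -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))

def train_head(feats, labels, lr=0.01, steps=300):
    """Gradient descent on the new head only."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
        grad = p - labels
        w -= lr * feats.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b
```

Because only the head has trainable parameters, far fewer labeled images are needed than for training the whole network from scratch. Fine-tuning, in which some backbone layers are also updated with a small learning rate, is a common variant that this sketch does not cover.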
Several earlier papers have summarized DL research in agriculture (including plant disease recognition) [8], [14], but they do not cover some recent developments, namely visualization techniques used alongside DL and modified versions of well-known DL models applied to plant disease identification.
The article [15] presented many imaging techniques for plant disease detection, and its focus was on imaging techniques. The major techniques presented for plant disease detection and classification are SVM, K-means, and KNN.
The article [16] presented many developed or modified DL architectures for detecting and classifying plant diseases and gave a comprehensive explanation of DL models used to visualize various plant diseases. However, it does not address early detection of diseases or how to detect and classify plant diseases from small samples.
In [17], the authors presented a comprehensive review of recent work on plant disease recognition using IPTs, organized by whether features are hand-crafted or extracted by deep learning techniques. They concluded that deep learning techniques have superseded shallow classifiers trained on hand-crafted features. However, the review does not cover some recent developments in visualization techniques, and it does not address early detection of diseases or detection and classification from small samples. To address the shortcomings of these existing reviews, we survey recent studies on plant leaf disease recognition using image processing, hyper-spectral imaging, and deep learning techniques. We hope that this work will be helpful for researchers in the area of plant leaf disease recognition using DL methods.
The rest of this paper is organized as follows. Section 2 reviews basic knowledge, including the concept, foundation, frameworks, and development history of deep learning, model evaluation criteria, plant leaf disease datasets, and data augmentation methods. Section 3 reviews work done so far on applying deep learning to crop leaf disease recognition. Section 4 discusses plant disease detection based on small-sample datasets. Section 5 discusses applications of hyper-spectral imaging in plant disease detection. Section 6 summarizes and discusses gaps in the existing literature that need to be addressed.

II. BASIC KNOWLEDGE OF DEEP LEARNING
The second generation of neural networks, back propagation (BP) (1986∼1998): In 1986, Hinton proposed the BP algorithm for the multi-layer perceptron (MLP) and adopted the sigmoid function for nonlinear mapping, which effectively solved the problem of nonlinear classification and learning. This method caused the second upsurge of neural networks. However, in 1991 it was pointed out that the BP algorithm suffers from the vanishing gradient problem.
The third generation of neural networks, DL (2006-present): In 2006, Hinton proposed a solution to the vanishing gradient problem in deep network training, but it attracted little attention because there was no particularly convincing experimental verification. It was not until 2011 that the ReLU activation function, which effectively suppresses the vanishing gradient problem, was put forward. The breakout came in 2012, when the Hinton team won the well-known ImageNet image recognition contest with the deep learning model AlexNet, far ahead of the second-place method (SVM). Since then, CNNs have attracted the attention of many researchers.
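The vanishing gradient problem and the effect of ReLU mentioned above come down to simple arithmetic. The sketch below multiplies activation derivatives along a single path through the layers. It deliberately ignores the weight matrices, so it is an idealized illustration rather than a full backpropagation.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0

def chained_gradient(grad_fn, pre_activations):
    """Product of activation derivatives along one path through the layers."""
    product = 1.0
    for z in pre_activations:
        product *= grad_fn(z)
    return product

depth = 10
# Best case for the sigmoid: every unit sits at z = 0, so the factor is 0.25 ** 10.
best_case_sigmoid = chained_gradient(sigmoid_grad, [0.0] * depth)
# Active ReLU units pass the gradient through unchanged.
relu_path = chained_gradient(relu_grad, [1.0] * depth)
```

Even in the sigmoid's best case the factor after ten layers is below one millionth, while an active ReLU path keeps it at 1.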
After the introduction of AlexNet [19], the DL architecture began to evolve over time as shown in VI. Many advanced DL models/architectures were used for image detection, segmentation, and classification, and these architectures were successively applied to plant disease detection.

B. DISEASE DATASETS
Common disease datasets are:
1) PlantVillage, an open dataset, which contains 54,309 plant leaf disease images covering 14 kinds of fruit and vegetable crops (apple, blueberry, cherry, grape, orange, peach, bell pepper, potato, raspberry, soybean, pumpkin, strawberry, tomato, and corn). It includes 26 diseases (17 fungal diseases, 4 bacterial diseases, 2 kinds of mycosis, 2 viral diseases, and 1 disease caused by mites) as well as healthy leaf images for 12 crops.
2) The 'Plant Pathology Challenge' for CVPR 2020-FGVC7 (https://www.kaggle.com/c/plant-pathology-2020-fgvc7), which consists of 3,651 high-quality annotated RGB images: 1,200 with apple scab symptoms, 1,399 with cedar apple rust symptoms, 187 with complex disease patterns (more than one disease on the same leaf), and 865 healthy apple leaves.
3) Datasets of real images collected by authors for their own research (corn, tea, soybeans, cucumbers, apples, grapes).
4) Images obtained by growing the plants and inoculating them with the virus, a data acquisition method commonly seen in applications that use hyperspectral images for disease detection.
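The class counts quoted for the Plant Pathology 2020 dataset show how unbalanced such collections can be. The sketch below checks the total and derives inverse-frequency class weights. The weighting formula is one common convention and is an assumption here, not something prescribed by the dataset authors.

```python
# Image counts per class as reported for the Plant Pathology 2020 dataset.
counts = {
    "apple scab": 1200,
    "cedar apple rust": 1399,
    "complex": 187,   # leaves with more than one disease
    "healthy": 865,
}

total = sum(counts.values())

# Share of each class in the dataset.
share = {name: n / total for name, n in counts.items()}

# Inverse-frequency weights: a perfectly balanced class would get weight 1.0.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

rarest = max(weights, key=weights.get)
```

The complex class makes up about 5% of the images, so its weight is nearly five times that of a balanced class.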

C. DATA AUGMENTATION
In leaf disease detection, collecting and labeling a large number of disease images requires considerable manpower, material, and financial resources. Some plant diseases have a short onset period, which makes images of them difficult to collect. In deep learning, small sample size and dataset imbalance are key factors behind poor recognition performance. Expanding the amount of data is therefore necessary for deep learning models for leaf disease detection. Data augmentation must meet the requirements of the practical application and cannot be applied arbitrarily: color is one of the main manifestations that distinguish diseases, so, for example, image enhancement must not change the color of the original image. There are two common ways to augment the datasets.

1) TRADITIONAL AUGMENTATION
The typical methods are physical expansion (stretching, rotation, resolution adjustment, translation, perturbation, etc.), web crawling, the variational auto-encoder (VAE), and autoregressive models. The shortcomings of samples produced by traditional expansion methods are poor quality, inadequate diversity, and unevenness.
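Since color is a diagnostic cue that augmentation must not alter, the safest physical expansions are those that only move pixels. The following sketch is limited to flips and 90-degree rotations and includes a helper to confirm that the set of pixel colors is unchanged. The function names are illustrative.

```python
import numpy as np

def geometric_augment(image):
    """Flipped and rotated copies of an (H, W, 3) image; pixel values are never altered."""
    return [
        np.fliplr(image),
        np.flipud(image),
        np.rot90(image, k=1),
        np.rot90(image, k=2),
        np.rot90(image, k=3),
    ]

def color_counts(image):
    """Distinct RGB colors in the image and how often each occurs."""
    return np.unique(image.reshape(-1, 3), axis=0, return_counts=True)
```

Stretching, resizing, and arbitrary-angle rotation involve interpolation and can introduce new color values, so they would need a separate check before being used on disease images.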

2) GENERATIVE ADVERSARIAL NETWORKS (GANS)
The GAN is a generative model proposed by Goodfellow et al. [29] in 2014. Many variants have since emerged, such as DCGAN, CGAN, PGGAN, LAPGAN, InfoGAN, WGAN, F-GAN, SeqGAN, LeakGAN, etc. The major goal is to generate synthetic samples with the same characteristics as the given training distribution. A GAN consists mainly of two parts, a generator and a discriminator. The structure diagram is shown in Fig. 3.
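The interplay between generator and discriminator can be made concrete through the two losses of the original GAN formulation. The sketch below computes them from discriminator outputs only. The networks themselves are omitted, and the generator loss shown is the commonly used non-saturating form rather than the minimax form.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Negative GAN value function: -[mean log D(x) + mean log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return -(real_term + fake_term)

def generator_loss(d_fake):
    """Non-saturating generator loss: -mean log D(G(z))."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)
```

Here `d_real` and `d_fake` are the discriminator's probabilities that real and generated images are real. When the discriminator can no longer tell them apart and outputs 0.5 everywhere, its loss equals 2 ln 2.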
Generative network approaches have been used extensively to generate samples in recent years. Nazki et al. [30] were the first to use GANs to synthetically augment a dataset in order to improve plant disease recognition performance. They proposed an improved model, AR-GAN, by optimizing an activation reconstruction loss (ARL) function. When the synthetic images produced by the model were added to the training data, classification accuracy on a test set of 2,789 images covering nine tomato categories increased significantly (+5.2%) compared with the classic approach.
Tian et al. [31] proposed an approach (CycleGAN) that can generate more apple disease images. In [32], conditional deep convolutional generative adversarial networks (C-DCGAN) were used to augment segmented tea disease spot images, which then served as the input of VGG16. The average accuracy was about 28% higher with C-DCGAN than with rotation and translation.
The article [33] generated images using deep convolutional generative adversarial networks (DC-GAN) and achieved a top-1 average identification accuracy of 94.33% on GoogLeNet. t-distributed stochastic neighbor embedding (t-SNE) verified that the distribution of the images generated by this method was close to the sample distribution of the real images.
In [34], images of four different kinds of grape leaf disease were expanded with a novel Leaf GAN model. The experimental results showed that the Leaf GAN model could highlight the diseased regions in grape leaf disease images and generate sufficient images. Leaf GAN was shown to be superior to DCGAN and WGAN.

D. VISUALIZATION TECHNIQUE
In recent years, the successful application of deep learning to plant disease classification has provided a new direction for research in this area. However, DL classifiers lack interpretability and transparency and are often considered black boxes that offer no explanation of the classification mechanism. Plant disease classification requires not only high accuracy but also information about how the detection is achieved and which symptoms are present in the plant. Therefore, many researchers have recently devoted themselves to visualization techniques, such as visual heat maps and saliency maps, to better understand the identification of plant diseases. Among them, the works of [35] and [36] are crucial to understanding how a CNN recognizes disease from images.
For example, Brahimi et al. [35] introduced saliency maps to visualize the symptoms of plant diseases. Mohanty et al. [10] used the AlexNet and GoogLeNet architectures and evaluated the models on PlantVillage using precision (P), recall (R), F1 score, and overall accuracy. They assessed the two well-known CNN architectures under three scenarios (color, grayscale, and segmented images) and concluded that GoogLeNet outperformed AlexNet; visualizations of the first layer also clearly showed the disease spots. Cruz et al. [37] used an improved LeNet model to detect olive plant diseases, with segmentation and edge maps used to identify the diseases. Brahimi et al. [38] proposed a new visualization method in which a new DL model, a teacher/student network, was introduced to identify the spots of plant diseases; compared with existing methods, it obtained a clearer visualization.
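One model-agnostic way to obtain a heat map of the kind discussed here is occlusion sensitivity: blank out one patch at a time and record how much the classifier's score drops. The sketch below illustrates that idea under stated assumptions. It is not the gradient-based saliency method of [35], and `toy_score` is a hypothetical stand-in for a trained network's disease probability.

```python
import numpy as np

def occlusion_map(image, score_fn, patch=2, fill=0.0):
    """Score drop when each patch is blanked out; larger values mean more influence."""
    h, w = image.shape[:2]
    base = score_fn(image)
    heat = np.zeros((h, w))
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            heat[top:top + patch, left:left + patch] = base - score_fn(occluded)
    return heat

def toy_score(image):
    """Hypothetical classifier score: mean intensity of the red channel."""
    return float(image[..., 0].mean())
```

With a real classifier, `score_fn` would wrap a forward pass that returns the probability of the predicted disease class, and the resulting map could be overlaid on the leaf image to show which regions drive the decision.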
Dechant et al. [39] used different CNN combinations: visual heat maps of maize disease images were used as inputs, and the probability associated with the occurrence of a particular type of disease was output. The ROC curve was used to evaluate the performance of the model. In addition, the feature maps of maize diseases were also drawn. Lu et al. [40] performed wheat disease detection using VGG-FCN and VGG-CNN models and visualized the module features. The results showed that the DMIL-WDDS based on VGG-FCN-VD16 achieved a progressive learning process for fine characteristics of the disease. The feature visualization was a good demonstration of what the DMIL-WDDS was learning. Moreover, the results indicated that Softmax aggregation was a superior choice for DMIL-WDDS to improve the recognition accuracy. Ha et al. [41] used the VGG-CNN model to detect blight of radish and used the k-means clustering method to show the disease markers. The method was able to detect the individual infected areas; that is, the regions of healthy radish and of radish with moderate Fusarium wilt were successfully detected. The results showed that the method can also be applied to other crops and plants, such as tomato, tobacco, and banana.
Barbedo [42] explored the use of individual lesions and spots, rather than the entire leaf, for identifying plant diseases with DL models. The accuracy obtained with this approach was, on average, 12% higher than that achieved using the original images.
Ghosal et al. [43] developed a deep CNN framework to identify and classify 8 kinds of soybean stress. They also presented an explanation mechanism that uses the top-K high-resolution feature maps, which isolate the visual symptoms, to make predictions. The unsupervised identification of visual symptoms provided a quantitative measure of stress severity, allowing identification (the type of foliar stress) and classification (low, medium, or high stress) without detailed symptom annotation by experts.
Lu et al. [44] used a CNN to identify rice diseases and detect them early, and also obtained feature maps of the disease spots. Picon et al. [45] proposed an adapted algorithm based on a deep residual neural network to detect multiple crop diseases in real conditions for early disease detection, and developed a mobile application in which heat maps were used to identify plant diseases. The results show an overall improvement in balanced accuracy up to 0.87 under exhaustive testing, and an accuracy greater than 0.96 in a pilot test performed in Germany.
Johannes et al. [46] used an algorithm based on heat map technology to extract the diseased objects. In addition, each heat map is described by two descriptors, one for evaluating the color information of the disease, and the other for identifying the texture of the heat map. The preliminary hot-spot detection and its ulterior description by color and textural descriptors allow real-time performance as only the suspicious regions are trained and described by the higher level classifiers and descriptors.
Khan et al. [47] proposed a new visualization technique using the correlation coefficient and DL models (e.g., the AlexNet and VGG16 architectures). Kerkech et al. [48] combined various vegetation indices in color space with the LeNet model to detect grape diseases. To interpret deep learning models, the article [49] compared some of the most popular explanation methods: saliency map, SmoothGrad, guided backpropagation, deep Taylor decomposition, integrated gradients, layer-wise relevance propagation, and gradient times input. The authors trained a DenseNet121 network to identify eight different soybean stresses (biotic and abiotic) and concluded that the interpretability methods identified the infected regions of the leaf as important features for some (but not all) of the correctly classified images.
Taking images of tea leaf diseases against complex backgrounds (261 images of 5 common tea diseases) as the research object, Sun et al. [50] proposed a method combining simple linear iterative clustering (SLIC) and a support vector machine (SVM) and obtained accurate saliency maps of tea leaf disease images, with an accuracy of 98.5%, a precision of 96.8%, a recall of 98.6%, and an F1 score of 97.7%. The results showed that the method can effectively extract saliency maps of tea leaves from complex backgrounds.
Hu et al. [51] proposed a new convolutional neural network model, ARNet (attention residual network), which combines the attention mechanism with the residual idea, and studied the leaves of five tomato diseases in their early and late stages. The study concluded that ARNet had better classification performance than existing models such as VGG16. The different layers of the attention convolutional block (ACB) are visualized as heat maps, and the attention information obtained by the module at different layers is shown in Fig. 4. Fig. 4 (a) and (b) show the output heat maps of the ACB module at different levels of the ARNet model for late early blight disease and late leaf frost disease, respectively. For each type of image, row 1 shows the heat map output and row 2 shows the heat map superimposed on the original image; from left to right are the outputs of the last ACB module of layers 2, 3, 4, and 5. As Fig. 4 shows, the ACB module accurately extracts the key features of each type of disease. The features extracted by shallow ACB modules are relatively scattered and not class-specific, whereas those extracted by deep ACB modules are more concentrated, with colors closer to red indicating locations that contribute more to the final classification decision.
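The four figures reported for [50] are related: the F1 score is the harmonic mean of precision and recall. The short sketch below recomputes it from the quoted precision and recall. The helper that derives precision and recall from raw counts is included only for completeness, since the underlying counts are not given in the text.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall as quoted for [50], in percent.
reported_f1 = f1_score(96.8, 98.6)
```

The result rounds to 97.7%, which matches the reported F1 score.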

III. LEAF DISEASE DETECTION BY DEEP LEARNING ARCHITECTURES
This section presents recent research that uses well-known DL architectures for the identification and classification of leaf diseases. There are also related works in which modified or improved versions of DL architectures were introduced to achieve better results, as well as work on software for disease identification systems. Since each disease region has its own characteristics, Barbedo [42] and Lee et al. [52] discussed the use of individual lesions and spots rather than the whole leaf. The advantages of this method are that multiple diseases on the same leaf can be detected and that the data can be augmented by cutting the leaf image into multiple sub-images. The article [55] took 79 diseases of 14 plant species, photographed in both experimental and complex field environments, as the research object and used the GoogLeNet model to identify the diseases. The overall accuracy using single lesions and spots was 94%, higher than that using the whole image (82%). Lee et al. [52] put forward a new view of leaf disease detection that focuses on the disease itself, that is, target categories are defined by the common name of the disease rather than by crop-disease pairs. Their experiments showed that models trained on common disease names generalize better regardless of crop, especially to new data obtained in different fields or to crops not seen before.
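The idea of enlarging a dataset by cutting a leaf image into sub-images can be sketched with a simple grid split. This is a simplification: the cited works crop individual lesions and spots, whereas the hypothetical `tile_image` helper below just divides the image into equal, non-overlapping tiles.

```python
import numpy as np

def tile_image(image, tile):
    """Split an (H, W, C) image into non-overlapping tile x tile crops, dropping ragged edges."""
    h, w = image.shape[:2]
    return [
        image[r:r + tile, c:c + tile]
        for r in range(0, h - tile + 1, tile)
        for c in range(0, w - tile + 1, tile)
    ]
```

A 256 x 200 image split into 64 x 64 tiles yields 12 training samples instead of one. In practice, tiles that contain only background or healthy tissue would need to be relabeled or discarded.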
Qiu et al. [53] used the Mask-RCNN whose feature extraction network was ResNet50 or ResNet101 to detect the wheat disease areas, and the average accuracy on the test dataset was 92.01%.
Ahmad et al. [54] used four pre-trained convolutional neural networks (VGG19, VGG16, ResNet, and Inception V3) and trained the models by fine-tuning their parameters. The experimental results showed that Inception V3 performed best on both datasets (the laboratory dataset and the field dataset), and that average performance was 10% to 15% higher on the laboratory dataset than on the field dataset. Bi et al. [55] reported recognition accuracies of 77.65%, 75.59%, and 73.50% for ResNet152, Inception V3, and MobileNet, respectively, on images of apple leaf spot and rust collected by agricultural experts.
Jiang et al. [56] first used the Mean Shift algorithm to segment the disease spots of four kinds of rice disease (red blight, stripe disease, rice blast, and sheath blight), then extracted shape features by manual calculation (proposing three new shape features: the number of lesions N, the lesion area S, and the lesion number ratio R) and color features with a CNN, and finally used an SVM classifier to identify the diseases. The results showed that the CNN reached an accuracy of 92.75% with the segmentation algorithm and 82.26% without it, and that the CNN combined with the SVM model reached 96.8%.
Liang et al. [57] established a dataset containing 2906 positive samples and 2902 negative samples to identify rice blast. The experimental results showed that the high-level features extracted by the CNN were more discriminative and effective than the traditional hand-crafted features, local binary pattern histograms (LBPH) and Haar wavelet transform (Haar-WT).
Huang et al. [58] proposed a plant leaf disease image recognition method based on a neural architecture search algorithm, which can automatically learn a neural network structure of appropriate depth on PlantVillage. Suitable network structures were found on both the imbalanced and the balanced datasets, with recognition accuracies of 98.96% and 99.01%, respectively. However, when the grayscale images were used without balancing, the accuracy fell to 95.40%.
Long et al. [59] trained AlexNet in two ways, from scratch and by transfer learning from ImageNet, to detect camellia leaf diseases (4 kinds of diseases and healthy). The results showed that transfer learning can significantly improve the convergence speed and classification performance of the models, and a classification accuracy as high as 96.53% was achieved.
To recognize corn leaf disease images (healthy, leaf blight, rust) against complex field backgrounds with small samples, Xu et al. [60] proposed a convolutional neural network model (VGG16) based on transfer learning. The weight parameters of the VGG16 model pre-trained on ImageNet were transferred to the model, and the average recognition accuracy was 95.33%.
A ResNet50 network pre-trained on ImageNet was used to study 4 types of apple leaf diseases in the Plant Pathology 2020 Challenge dataset, and the overall test accuracy of the model was 97%. However, for the complex disease pattern category (a combination of several disease symptoms), the recognition accuracy was only 51% [61].
Li et al. [62] used VGG16 and Inception V3 models to identify different degrees of Ginkgo biloba diseases. The accuracy of VGG16 was 98.44% on the laboratory dataset and 92.19% on the field dataset, while the accuracy of Inception V3 was 92.3% and 93.2%, respectively. Table 1 offers a brief overview of recent research works that apply DL frameworks directly.

2) NEW/MODIFIED DL ARCHITECTURES FOR LEAF-DISEASE DETECTION
Dechant et al. [39] integrated multiple CNN classifiers to study high-resolution corn disease images. The experimental results showed that the accuracy was 90.8% with a single CNN classifier, rose to 95.9% with two first-level classifiers, and reached 97.8% with three first-level classifiers.
Liu et al. [63] proposed a new CNN structure to identify apple leaf diseases. The network was formed by cascading an AlexNet-precursor network and an Inception network. The Inception network replaced the fully connected layers of the traditional AlexNet model, significantly reducing the number of trainable parameters and thereby the storage requirements. Nesterov's accelerated gradient (NAG) optimization algorithm was used instead of the stochastic gradient descent (SGD) algorithm to update the weights and improve the convergence speed. The performance of this network was compared with SVM, BP, AlexNet, GoogLeNet, ResNet20, and VGG16, whose accuracies were 68.73%, 54.63%, 91.19%, 95.69%, 92.76%, and 96.32%, respectively, while the accuracy of the proposed AlexNet-precursor + Cascade-Inception network was 97.62%.
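The difference between plain gradient descent and Nesterov's accelerated gradient lies in where the gradient is evaluated. The sketch below shows both update rules on a one-dimensional toy objective with a full, non-stochastic gradient. The learning rate, momentum, and step count are illustrative values, not the settings of [63].

```python
def gd_minimize(grad, x0, lr=0.1, steps=100):
    """Plain gradient descent: step against the gradient at the current point."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def nag_minimize(grad, x0, lr=0.1, momentum=0.9, steps=100):
    """Nesterov accelerated gradient: evaluate the gradient at the look-ahead point."""
    x, v = x0, 0.0
    for _ in range(steps):
        lookahead = x + momentum * v
        v = momentum * v - lr * grad(lookahead)
        x = x + v
    return x
```

Calling `nag_minimize(lambda x: 2 * x, 5.0)` minimizes f(x) = x² from the starting point 5. A toy problem like this only demonstrates the update rule and says nothing about relative convergence speed on a deep network.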
To extract detailed features of wheat disease symptoms, Picon et al. [45] replaced the first 7 × 7 convolutional layer of the ResNet50 network with two 3 × 3 convolutions and used the sigmoid activation function instead of the softmax layer. The improved ResNet50 network was used for early detection of three wheat diseases (septoria, tan spot, and rust) and achieved 96% accuracy on the balanced dataset.
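The text does not say why the softmax layer was replaced by a sigmoid. One plausible reading is that independent sigmoid outputs allow several diseases to be flagged on the same leaf, whereas softmax forces the classes to compete. The sketch below contrasts the two on hypothetical logits.

```python
import math

def softmax(logits):
    """Mutually exclusive class probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Independent per-class probabilities; several can be high at once."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical network outputs for septoria, tan spot, and rust.
logits = [3.0, 2.5, -4.0]
```

For these logits, the sigmoid reports both septoria and tan spot with probability above 0.9, while softmax splits the probability mass and leaves tan spot below 0.5.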
Existing deep network models suffer from a large number of parameters, long training time, and high storage and computational cost. To address these problems, Wang et al. [64] proposed an improved multi-scale residual (Multi-scale ResNet) model based on ResNet18. They added a multi-scale feature extraction module, changed the connection method of the residual layers, decomposed the large convolution kernels, and applied group convolution, which significantly reduced the model parameters, storage space, and computing overhead. The model achieved an accuracy of 95.95% on the PlantVillage dataset and 93.05% on a self-collected dataset of 7 diseases captured in real environments.
Current plant leaf disease recognition models are easily disturbed by shadows, occlusions, and light intensity, and their feature extraction is blind and uncertain. To address this, Ren et al. [65] constructed a deconvolution-guided VGG network (Deconvolution-Guided VGGNet, DGVGGNet), which can both identify plant leaf diseases and segment disease spots. The model had a recognition accuracy of 99.19% for the 10 types of tomato leaf disease images in the PlantVillage dataset. The pixel accuracy and mean intersection over union of disease spot segmentation were 94.66% and 75.36%, respectively. The model also showed good robustness under occlusion, low light, and other adverse conditions.
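For readers less familiar with the two segmentation metrics reported for DGVGGNet, the sketch below shows how pixel accuracy and mean intersection over union (the "average intersection ratio") are commonly computed from per-pixel labels. It is a generic illustration under the usual definitions, not the evaluation code of [65]; the function names and the tiny label lists in the usage note are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class) from flat label sequences."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class equals the true class."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def mean_iou(cm):
    """Average over classes of TP / (TP + FP + FN), skipping classes that never occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp
        fn = sum(cm[c]) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For example, with true labels `[0, 0, 1, 1]` (0 = background, 1 = lesion) and predictions `[0, 1, 1, 1]`, three of four pixels are correct, while the per-class IoU values are 1/2 and 2/3. This shows why mean IoU is typically lower than pixel accuracy on the same output.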
Guo et al. [66] designed a multi-receptive-field recognition model based on AlexNet (Multi-Scale AlexNet) by removing the local response normalization layers of AlexNet, modifying its fully connected layers, and using multi-scale convolution kernels to extract features. The model was evaluated on the PlantVillage dataset and a self-collected dataset of 7 kinds of tomato leaf diseases. It reduced the memory requirements of the original AlexNet by 95.4%, and the average recognition accuracy for tomato leaf diseases, including each disease in the early, middle, and late stages, was up to 92.7%.
Fan et al. [67] added a batch normalization layer to the convolutional layers of the Faster R-CNN model, introduced a central cost function to construct a mixed cost function, and used the stochastic gradient descent algorithm to optimize the training model. They studied 9 kinds of corn leaf diseases photographed in the field against complex backgrounds. Under the same experimental environment, the improved method increased the average accuracy by 8.86% and reduced the detection time of a single image by 0.139 s; compared with the SSD algorithm, the average accuracy was 4.25% higher and the detection time of a single image was 0.018 s shorter.
To address the long training time, poor segmentation quality, and sensitivity to illumination and background that traditional convolutional neural networks show when segmenting cucumber leaf lesions, Wang et al. [68] proposed a method based on a fully convolutional neural network. The rectified linear unit (ReLU) activation function was replaced by the exponential linear unit (ELU), batch normalization was used to stabilize the training process, and the softmax of the original CNN was replaced with a support vector machine (SVM). On a dataset of 6 kinds of cucumber leaf diseases, the average pixel segmentation accuracy was 80.46% and the mean intersection over union was 70.43%.
Hu et al. [51] addressed the lack of identification methods for fine-grained tomato diseases. Taking 5 kinds of tomato diseased leaves in the early and late stages as the research objects, they proposed ARNet, a new convolutional neural network model that combines attention and residual ideas. Compared with existing models such as VGG16, ARNet had better classification performance, with an average recognition accuracy of 88.2%.
Chen et al. [70] proposed an improved VGG model (INC-VGGN) based on the VGG framework by introducing two Inception modules, adding a pooling layer, and modifying the activation function. The average recognition accuracy for corn leaf diseases reached 92%.
To address the excessive number of parameters and the single feature scale of the AlexNet model, Zhang et al. [71] combined dilated convolution with global pooling and proposed a global pooling dilated convolutional neural network (GPDCNN) based on AlexNet. After the expansion, an accuracy of 95.18% was obtained on a dataset of 6 common cucumber leaf diseases photographed in the field.
To address the low recognition accuracy for grape leaves with different degrees of disease, He et al. [72] proposed a Multi-Scale ResNet based on ResNet18, in which the conv1 layer was changed to a combination of multiple convolution kernels and the SENet module was added. The model had an average recognition accuracy of 90.83% for seven grape diseases of different severity.
Agarwal et al. [73] developed a CNN model with 3 convolutional layers, 3 max-pooling layers, and 2 fully connected layers, each with a different number of filters, to detect 9 types of tomato leaf diseases. The experimental results showed that the average accuracy of the proposed model on the test set reached 91.2%, and its performance was much better than that of VGG16, MobileNet, and Inception. Table 2 briefly introduces the research progress on improved DL architectures for plant disease detection in recent years.

B. TARGET DETECTION OF PLANT DISEASES FOR LEAF DISEASE DETECTION
Fuentes et al. [74] used the Faster R-CNN, R-FCN, and SSD architectures to locate lesion areas of 9 kinds of tomato leaf diseases and insect pests and classified them according to the bounding box. They also explored the influence of different CNN architectures on the detector. The results showed that ResNet50 as the feature extractor achieved a mean average precision (mAP) of 85.98%, with a detection time of about 160 ms per image. Subsequent work [75] refined the Faster R-CNN by introducing a single-class CNN, and the results showed that the mAP increased by 13%.
Jiang et al. [76] proposed a novel method, the SSD with Inception module and rainbow concatenation (INAR-SSD). The VGG16 feature extractor used in the INAR-SSD network was modified by replacing two convolution layers (Conv4_1 and Conv4_2) with Inception modules, and the fully connected layers of VGG16 were replaced with 1 × 1 convolutions. On a dataset of 5 kinds of apple leaf diseases, the proposed INAR-SSD network achieved the highest mAP of 78.8%, compared with Faster R-CNN (73.78%) and SSD (75.82%). The detection speed of the model was 23.13 FPS.
Li et al. [77] studied 5 kinds of bitter gourd leaf diseases photographed in the field and modified the Faster R-CNN by increasing the size of the region proposal boxes and integrating a feature pyramid network (FPN) based on ResNet50. After the FPN was integrated, the average accuracy of the model was 86.39%, which was 7.54% higher than that of the original model, and the accuracy of gray spot detection improved by 16.56%. The detection time per image was 0.322 s, which the authors considered sufficient for real-time detection.
Real-time detection of apple leaf diseases under actual conditions is difficult because of complex backgrounds and small lesions. To address this, Li et al. [78] modified the Faster R-CNN by using the feature pyramid network (FPN) and adopting precise region of interest pooling (PROI Pooling). The results showed that the improved model can effectively detect five apple leaf diseases under natural conditions, with a mean average precision (mAP) of 82.28%. Compared with Faster R-CNN, YOLOv3, and Mask R-CNN, the mAP increased by 5.81%, 13.92%, and 4.86%, respectively, and the detection time of a single image was reduced by 43 ms.
Li et al. [79] proposed a video detection architecture for plant diseases and insect pests based on deep learning and a custom backbone, which can better reflect the quality of video detection in experiments. Experiments showed that, compared with systems using VGG16, ResNet50, or ResNet101 backbones and with YOLOv3, the custom backbone system was more suitable for detecting diseases in rice videos not seen during training. The custom DCNN backbone showed excellent detection sensitivity for withered leaves of rice sheath blight and rice stem borer symptoms, and the detection speed was 30 frames per second (FPS).
To address the low segmentation accuracy of traditional convolutional neural networks on images of diseased crop leaves, Wang et al. [80] constructed a regional disease detection network (RD-net) based on the traditional VGG16 model, in which the fully connected layer was replaced with a global pooling layer. Based on the Encoder-Decoder model structure, a regional segmentation network (RS-net) was established, and multi-scale convolution kernels were used to improve the local receptive field of the original convolution kernel and segment the lesion area accurately. Segmentation experiments were carried out on field-photographed datasets of corn leaf spot, corn round spot, wheat stripe rust, wheat anthracnose, cucumber target spot disease, and cucumber brown spot. The segmentation accuracy was 87.04% and the recall rate was 78.31%. The comprehensive evaluation index value was 88.22% and the single image segmentation speed was 0.23 s. Table 3 offers a brief overview of recent research works in target detection of plant diseases.

C. THE SYSTEM OF LEAF-DISEASE DETECTION
In an era of advanced smart agricultural technology, mobile phones have become a new type of ''farming tool'' that can help farmers identify diseases and insect pests. Researchers are currently developing small programs and mobile apps for this purpose. The farmer photographs and uploads the diseased parts of the crop, and the system returns the recognition result within a few seconds. It also provides the diagnosis results, similarity, disease characteristics, causes, and prevention and control plans, so that farmers can treat diseases and insect pests in a scientific way and increase crop yields.
Ozguven and Adem [81] modified the Faster R-CNN by increasing the size of the input layer from 32 × 32 pixels to 600 × 600 pixels and developed an automatic detection and recognition system for leaf spot disease at 3 levels of sugar beet disease severity (mild, moderate, and severe). The modified Faster R-CNN achieved an accuracy of 95.48%, compared with 92.89% for the original Faster R-CNN.
To address the insufficient accuracy of models that classify the severity of crop diseases and insect pests, Yu et al. [82] proposed an improved ResNet50 model (CDCNNv2) combined with deep transfer learning and developed a classification system for the severity of crop diseases and insect pests. In addition to real-time and fully automatic detection of crop pests and diseases, the system also implements a series of supporting functions such as prevention and control recommendations and drug recommendations.
Li et al. [83] combined the attention mechanism with the residual structure to build the PARNet model and developed a web application around it. The average accuracy of the platform for 5 tomato leaf diseases reached 96.84%, which was 2.25%∼11.58% higher than that of other models (VGG16, ResNet50, and SENet).
Jiang et al. [84] redesigned and optimized the convolutional neural network structure based on the traditional LeNet-5 network, and proposed a convolutional neural network system for ginger disease recognition based on the four kinds of ginger disease collected in the natural environment. The recognition rate of four kinds of ginger diseases reached 96%.
Zhou [85] identified 5 kinds of apple leaf diseases based on transfer learning and the Faster R-CNN and developed an apple leaf disease detection system based on the Android platform. The detection system had an average recognition accuracy of 76.55% for apple leaf diseases.
Liu et al. [86] deployed the MobileNet network on a mobile phone. The average recognition accuracy for 6 kinds of grape diseased leaves collected in the field was 87.5%, and the average computation time for a single image was 134 ms.
Based on the ResNet50 architecture, Esgario et al. [87] developed a system that can identify and estimate the severity of stress caused by biological agents on coffee leaves. The system had an accuracy of 95.24% for the classification of biological stress on coffee leaves, and an accuracy of 86.51% for estimation of the severity.
Xiong et al. [88] proposed an automatic image segmentation algorithm based on the GrabCut algorithm, selected MobileNet as the DL classification model, and designed a crop disease recognition system for mobile smart devices. The system had a recognition accuracy of more than 80% for a total of 27 diseases of 6 crops in both the laboratory environment and the field. Table 4 offers a brief overview of recent research works in the development of plant leaf disease identification systems.

IV. LEAF-DISEASE DETECTION BASED ON SMALL SAMPLES
In practical applications, the incidence of some plant diseases is low and the cost of acquiring disease images is high, so that only a few or a few dozen disease images can be collected. The transfer learning method can transfer the knowledge learned from a large general dataset to specialized fields with relatively little data. However, for datasets with only a few or a few dozen images, transfer learning also suffers from low recognition accuracy [23]. This is because it is difficult for a deep network to learn distinct features from so few samples, which leads to failure to converge or to over-fitting. Therefore, plant disease datasets with single or few samples can hardly support the training of DL architectures. In addition, to recognize new classes that do not appear in the training set, a deep learning model needs to be retrained.
Recent advances in DL have shown that several architectures can learn new classes from small datasets, a sub-field known as Few-Shot Learning (FSL) [89]. FSL can not only recognize new classes that did not appear during training but also mitigate the difficulty neural networks have in converging when the number of experimental samples is small, thereby improving recognition accuracy on small datasets.
The FSL methods used for image classification include model initialization, metric learning, and data generation methods. Initialization methods focus on the adjustable parameters in the network so that a classifier for a new class can be learned from a limited set of examples [90]- [92]. The aim of metric learning methods is learning to compare: once a network has learned to compare classes, it can learn new classes from few labeled samples [93]. Finally, data generation methods learn a generator from the data in the base classes and use that generator to generate data for new classes.
FSL solutions for plant leaf classification have been introduced recently. For example, Hu et al. [94] presented a low-shot learning method for tea leaf disease identification that used improved conditional deep convolutional generative adversarial networks (C-DCGAN) for data augmentation; the average identification accuracy of the proposed method was 90%. Wang and Wang [95] proposed a few-shot learning method based on the Siamese network with contrastive loss and kNN classifiers to solve the plant leaf classification problem with small samples. Das and Lee [96] proposed a two-stage multilayer neural network for the few-shot recognition of new categories, together with a detailed mathematical derivation.
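The metric-learning idea behind a Siamese network with contrastive loss and a nearest-neighbor classifier can be sketched in a few lines of Python. This is a generic illustration, not the implementation of [95]: embeddings are plain lists of floats standing in for network outputs, the margin value, function names, and class labels are hypothetical, and the loss follows the common convention in which same-class pairs are penalized by squared distance and different-class pairs only when closer than the margin.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_class, margin=1.0):
    """Pull same-class pairs together; push different-class pairs at least `margin` apart."""
    d = euclidean(emb_a, emb_b)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2


def nearest_neighbor_label(query, support):
    """1-NN classification: return the label of the closest support embedding.

    `support` is a list of (embedding, label) pairs, e.g. one or a few per class.
    """
    best_embedding, best_label = min(support, key=lambda item: euclidean(query, item[0]))
    return best_label
```

Once the embedding has been trained with such a loss, a new class can be added by storing a few labeled embeddings in `support`; no retraining of the network is needed for the nearest-neighbor step.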
To address the problem of too few samples in the training set, Li et al. [97] were the first to propose a one-shot learning method and used Bayesian functions to build a network. Subsequently, many DL architectures for one-shot learning tasks were proposed and achieved remarkable results. The work in [98] verified that FSL can transfer knowledge from a well-defined source domain (colon tissue) to a more general domain (composed of colon, lung, and breast tissues) using few training images. Experimental results showed that FSL obtained an average accuracy of 90% with only 60 training images, which was better than fine-grained transfer learning (73%). Zhong et al. [99] proposed a generative model based on a conditional adversarial auto-encoder (CAAE), which was used to perform generalized one-shot and few-shot learning with few or even zero training samples for the identification of citrus diseases.
Ren et al. [100] proposed a plant disease identification method based on one-shot learning for the small-sample problem of plant leaf diseases. Taking 8 kinds of plant diseases with a small number of samples in the public PlantVillage dataset as the identification object, they used the focal loss function (FL) to train a plant disease classifier based on the relation network. The results showed that the recognition accuracy of the method in 5-way 1-shot tasks reached 89.90%, which was 4.69% higher than the original relation network model. Compared with the matching network and transfer learning, the improved method increased the recognition accuracy on the experimental dataset by 25.02% and 41.90%, respectively.
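The focal loss mentioned above down-weights examples that the classifier already predicts confidently, so that training concentrates on hard examples. A minimal sketch of the standard form, FL(p) = -α (1 - p)^γ log p, where p is the predicted probability of the true class, is given below. The default values γ = 2 and α = 1 are assumptions for illustration and are not the settings reported in [100].

```python
import math


def cross_entropy(p_true):
    """Standard cross-entropy for one example, given the probability of its true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled by alpha * (1 - p_true) ** gamma.

    With gamma = 0 and alpha = 1 this reduces to ordinary cross-entropy.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)
```

With γ = 2, an easy example predicted at p = 0.9 keeps only 1% of its cross-entropy loss, whereas a hard example at p = 0.1 keeps 81%, which is the reweighting effect that makes the loss attractive when few samples are available.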
Argüeso et al. [101] took 38 kinds of plant disease images in the public PlantVillage dataset as the identification object and compared Siamese networks trained with triplet loss against classical fine-tuning transfer learning. The median accuracy was 55.5% when learning from 1 image per class, and 80.0% and 90.0% for 15 and 80 images per class, respectively. The FSL method outperformed classical fine-tuning transfer learning, which had an accuracy of 18.0% and 72.0% for 1 and 80 images per class, respectively. Wu [32] took 3 kinds of tea leaf diseases as the research object, first segmented the lesions and expanded the dataset of segmented lesions, and then combined deep transfer with the Cayley-Klein metric to identify tea diseases, achieving a recognition accuracy of 100%. Table 5 offers a brief overview of recent research works in plant leaf disease detection based on small samples.
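The triplet loss used with Siamese networks in [101] can be illustrated with the following minimal sketch. It uses plain Euclidean distances on hand-written embedding vectors; the margin value and function names are hypothetical, and some formulations use squared distances instead, so this should be read as one common variant rather than the exact loss of [101].

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin.

    The loss is zero once the positive (same class) is closer to the anchor
    than the negative (different class) by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)
```

A triplet in which the same-class image is already much closer than the different-class image contributes no loss, so training effort goes to triplets that still violate the margin.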

V. HYPER-SPECTRAL IMAGING (HSI) WITH DL MODELS
Plants may be affected by multiple pathogens at the same time during growth, and different pathogens may produce similar symptoms and signs [102], [103]. In addition, symptoms are not obvious at the early stage of plant diseases, which makes it very difficult to detect plant diseases with the naked eye or with simple computer vision.
The electromagnetic spectrum range of hyperspectral sensors is mainly concentrated in the visible and near-infrared (400 ∼ 1000 nm), and sometimes includes shortwave infrared (SWIR, 1000 ∼ 2500 nm). Such a sensor can obtain spectral information from hundreds of narrow spectral bands [104]. These narrow bands are highly sensitive to subtle changes in plant leaves caused by diseases and can distinguish different types of diseases, so that detection is possible at an early, asymptomatic stage. HSI has therefore been a focus of recent research on the early detection of plant diseases. For example, the review [105] provided an overview of advanced hyperspectral technologies for plant disease detection.
Xie et al. [106] investigated the feasibility of using a hyperspectral imaging technique to identify two kinds of diseases on tomato leaves. Both imaging information and spectral information were investigated in the study. The ELM model was established to identify the diseased samples, and the successive projection algorithm (SPA) was applied to select useful wavelengths. The classification accuracy of the SPA-ELM model on the testing set was 97.1%. Early hyperspectral images of cucumber downy mildew collected in the field in greenhouses are influenced by environmental illumination, which makes it difficult to extract effective features from them. Qin et al. [107] proposed a novel method of extracting feature bands based on disease difference information. The method improved and combined the adaptive weighting algorithm (CARS) and the successive projection algorithm (SPA), and an early detection model of cucumber downy mildew was established. Using hyperspectral images of healthy cucumber leaves and daily hyperspectral images taken within 12 days of infection, a detection rate of 100% was obtained from day 2 to day 12 of infection, and the detection rate on the test set for day 1 of infection reached 95.8%. Abdulridha et al. [108] combined hyperspectral imaging with the MLP classification method and achieved an accuracy of 99% across the four stages (healthy, asymptomatic, early, and late) of tomato bacterial spot disease and bacterial target spot disease.
Yuan et al. [109] proposed a method for detecting tea tree anthracnose based on hyperspectral imaging. Through spectral sensitivity analysis, the disease-sensitive bands were determined, and two new disease indexes were established using these bands: the tea tree anthracnose ratio index (TARI) and the tea tree anthracnose normalized index (TANI). A method combining unsupervised classification and adaptive two-dimensional threshold detection was proposed, based on a set of optimized spectral features. The results showed that the overall accuracy of identifying diseases was 98% at the leaf level and 94% at the pixel level.
In [110], a detailed review of DL with the HSI technique was provided. To avoid over-fitting and improve accuracy, a detailed comparison was made between several DL models: 1D/2D-CNN (2D-CNN gave the better result), LSTM/GRU (both over-fitted), and 2D-CNN-LSTM/GRU (still over-fitted). Therefore, a novel hybrid method (2D-CNN-BidLSTM/GRU), which combines a convolutional network with a bidirectional gated recurrent network, was proposed for hyperspectral images. The model resolved the problem of over-fitting and achieved a 75% F1-score and 73% accuracy for wheat disease detection [111]. In [112], the authors developed a supervised 3D-CNN model to learn the spectral and spatial information of hyperspectral images for the classification of healthy and charcoal rot infected samples. A visualization method based on a saliency map was used to identify the hyperspectral wavelengths that matter most for classification: the importance of a wavelength can be inferred by analyzing the magnitude of the gradient distribution of the image's saliency map over the hyperspectral wavelengths. Based on hyperspectral images of inoculated and simulated-inoculated stems, the classification accuracy of the 3D-CNN model was 95.73%, and the F1 score of the infected category was 87%.
For the detection of potato virus, DL was used to describe the hyperspectral images and achieved acceptable values of precision (78%) and recall (88%) [113]. In [114], a DL model of multiple Inception-ResNet layers was developed, which uses both spatial and spectral data from hyperspectral UAV images to detect yellow rust in wheat. The model achieved an accuracy of 85%, considerably higher than that of the RF classifier (77%). Gui et al. [115] divided early soybean mosaic virus disease (SMV) into degrees 0, 1, and 2 according to its severity. For the case of a small number of experimental soybean samples, they proposed a novel SMV early detection method that combined a convolutional neural network and a support vector machine (CNN-SVM), and achieved an accuracy of 96.67% on the training set and 94.17% on the testing set. The work in [116] took corn seedlings after cold stress as the research object, extracted the spectral curve of the comprehensive evaluation index of cold damage from hyperspectral images, and used DL to construct a corn seedling damage detection model. In [117], a novel hyperspectral proximal sensing analysis method based on generative adversarial nets (GANs), named outlier removal auxiliary classifier generative adversarial nets (OR-AC-GAN), was proposed to detect tomato plant disease before clear symptoms appear. The classification accuracy reached 96.25% at the plant leaf level before visible symptoms showed up (as shown in Fig. 5). Table 6 offers a brief overview of recent research works in plant leaf disease detection using hyperspectral images.

VI. CONCLUSION AND FUTURE DIRECTIONS
In this paper, we have introduced the basic knowledge of deep learning and presented a comprehensive review of recent research on plant leaf disease recognition using deep learning. Provided sufficient data is available for training, deep learning techniques are capable of recognizing plant leaf diseases with high accuracy. We have discussed the importance of collecting large datasets with high variability, data augmentation, transfer learning, and visualization of CNN activation maps for improving classification accuracy, as well as the importance of small-sample plant leaf disease detection and of hyperspectral imaging for early detection of plant diseases. At the same time, some inadequacies remain.
Most of the DL frameworks proposed in the literature perform well on their own datasets but poorly on other datasets; that is, the models have poor robustness. Therefore, more robust DL models are needed to adapt to diverse disease datasets.
In most studies, the PlantVillage dataset was used to evaluate the performance of the DL models. Although this dataset contains many images of several plant species and their diseases, the images were taken in the laboratory. Therefore, a large dataset of plant diseases captured under real conditions needs to be established.
Although some studies use hyperspectral images of diseased leaves, and some DL frameworks are used for early detection of plant leaf diseases, problems that limit the widespread use of HSI in the early detection of plant diseases remain to be resolved. In particular, labeled datasets for early plant disease detection are difficult to obtain: even experienced experts cannot mark where invisible disease symptoms are or define purely invisible disease pixels, which is very important for HSI-based detection of plant diseases.
SHUJUAN ZHANG received the Ph.D. degree in agricultural mechanization engineering from Zhejiang University. She is currently a Professor and the Doctoral Director with the College of Agricultural Engineering, Shanxi Agricultural University. She has published over 60 articles in her research-related fields. Her major research interests include digital agriculture information collection technology and equipment, technology and equipment for harvesting, primary processing, and non-destructive testing of agricultural products.
BIN WANG received the B.S. and M.S. degrees from Shanxi Agricultural University, China, in 2011 and 2015, respectively, where he is currently pursuing the Ph.D. degree with the College of Agricultural Engineering. His research interests include hyperspectral technology and nondestructive testing technology.