Cardamom Plant Disease Detection Approach Using EfficientNetV2

Cardamom is known as the queen of spices. It is indigenously grown in the evergreen forests of Karnataka, Kerala, Tamil Nadu, and the northeastern states of India. India is the third largest producer of cardamom. Plant diseases have a severe impact on food production and safety; they reduce both the quality and the quantity of agricultural products, and in extreme cases may cause very high losses or a complete loss of harvest. Various diseases and pests affect the growth of cardamom plants at different stages and reduce crop yields. This study concentrates on two diseases of cardamom, Colletotrichum Blight and Phyllosticta Leaf Spot, and three diseases of grape, Black Rot, ESCA, and Isariopsis Leaf Spot. Various methods have been proposed for plant disease detection, and deep learning has become the preferred method because of its strong performance. In this study, U2-Net was used to remove the unwanted background of an input image by extracting multiscale features. This work proposes a cardamom plant disease detection approach using the EfficientNetV2 model. A comprehensive set of experiments was carried out to ascertain the performance of the proposed approach and to compare it with other models such as EfficientNet and a Convolutional Neural Network (CNN). The experimental results showed that the proposed approach achieved a detection accuracy of 98.26%.


I. INTRODUCTION
Cardamom is widely used as a flavoring agent and in medicine, including allopathy and Ayurveda [1]. It is a high-value cash crop; modern agricultural production technology has been developed and widely adopted in all cardamom-growing regions of India. Still, the spread of various pests and diseases remains a challenge and is considered a significant production barrier for the cardamom sector. Small cardamom is affected by a host of pathogenic bacteria, which seriously damage the crop. Diseases that infect cardamom plants, such as Colletotrichum Blight and Leaf Spot, emerge frequently in fields where crop management is neglected [1].
The emergence of plant diseases harms agricultural production. If plant diseases are not diagnosed in time, food scarcity will intensify [2]. Plant diseases, pests, and weeds threaten production and quality farming, resulting in crop loss and economic loss that amount to about 15-25% of food production in India [3]. Various other factors, such as climate change and modern cultivation techniques that use large amounts of chemical fertilizers, also degrade the quality and quantity of agricultural products. Infected plants often show visible signs or lesions on leaves, stems, flowers, or fruits. In general, each disease or pest produces a characteristic visual pattern that can be used to diagnose the abnormality. Plant leaves are the primary source for identifying plant disease, and most of the early symptoms first appear on the leaves [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Zhouyang Ren.
Traditionally, agricultural and plant pathology experts visit the farmland, or farmers themselves identify plant disorders and pests based on experience. This approach is not only subjective but also time-consuming and ineffective. Agriculturists with less knowledge may misjudge the disease and use pesticides or insecticides indiscriminately during the screening process, which results in unnecessary economic losses. To address these challenges, an automatic plant leaf disease detection approach based on image processing is essential. Timely detection is the basis for effective prevention and control of plant leaf diseases, and it plays an essential role in the management of, and decision-making for, agricultural production.
In recent studies, computer vision and machine learning-based techniques have been developed for plant leaf disease detection. Real-time plant disease detection faces some significant challenges, such as complex backgrounds and varying disease severity, because the images are captured in real-time scenarios in the farm field.
In this study, an approach for detecting cardamom plant diseases is proposed. Cardamom plant leaf images were captured in the farm field with complex backgrounds, and the resulting dataset was used to measure the detection ability of the proposed approach. The U2-Net architecture [5], which uses multiscale features, was used to remove the background from the images. The state-of-the-art EfficientNetV2 deep learning model [6] was used for classification.
The key contributions of this work are:
• The cardamom plant leaf dataset was collected from a cardamom plantation in Chinnahalli, Sakaleshpur, India, from April to June 2021 using different electronic devices, and is offered as a benchmark dataset for subsequent studies.
• The complex background of the images was removed by extracting multiscale features with U2-Net.
• A cardamom plant disease detection approach was proposed using EfficientNetV2. A set of experiments was conducted to ascertain the detection efficiency of the proposed approach. The grape plant leaf dataset was also used to assess the performance of the proposed approach.
This paper is organized as follows: Section II discusses the literature survey, Section III describes the dataset used in this work, and Section IV explains the proposed method and experimental results. Conclusions and future work are discussed in Section V.

II. LITERATURE SURVEY
Modern image processing and deep learning-based techniques are widely used for the detection of plant leaf disease. Many diagnostic methods use a Convolutional Neural Network (CNN) and a pre-trained model to detect and classify healthy and unhealthy plants. Manso et al. used segmentation to remove background data and applied a trained neural network for classification [7]. Zhang et al. proposed a method to diagnose cucumber plant diseases by segmenting the diseased patches with K-means clustering, extracting the shape and color of the infected leaf lesions, and classifying unhealthy leaf images using sparse representation [8]. Yeh et al. proposed improved visual-attention deep neural networks for classification that learn feature maps highlighting the essential regions while weakening the contribution of uninformative layers [9]. Various artificial intelligence and image-based plant disease detection approaches have been proposed [10]. Many technological approaches have been proposed for the diagnosis of plant diseases [11]-[13] and the general diagnosis of disease [14]-[16].
In [17], an appropriate colorimetric scheme was selected, and the relevant feature parameters were chosen from the two color components and the texture. Pattern recognition of tomato leaves was performed using K-means clustering. However, the detection accuracy is affected by the selection of the input feature parameters. Hang et al. proposed a disease detection method that builds on the traditional backpropagation algorithm and adjusts the learning rate of the model dynamically [18]. In [19], a region-based single-shot multi-box detector was used with a Visual Geometry Group (VGG) network to detect cotton plant disease. Sibiya and Sumbwanyambe proposed a CNN model for maize plant disease detection on images captured with mobile devices and achieved a detection accuracy of 92.85% [20]. Shin et al. proposed an approach using six different pre-trained deep learning-based models for detecting powdery mildew disease on a strawberry dataset [21]. Karthik et al. proposed a tomato plant disease detection method based on a CNN [22]. Some studies used transfer learning for plant disease detection on various plant datasets [23], [24] and a peach dataset [25]; pre-trained AlexNet was employed for cucumber [26], [27] and tomato [28]. Pre-trained ResNet was employed for coffee and soybean plant disease detection [29], [30], and DenseNet was employed for apple leaves [31]. Jiang et al. proposed a novel multi-task approach employing transfer learning with VGGNet, in which the authors extracted independent features from multiple datasets and trained independently for multiple related tasks on wheat and rice plant datasets [32]. Singh et al. employed a multi-layer CNN to detect mango leaf diseases and achieved 97.13% detection accuracy [33]. The Internet of Things was employed with fuzzy networks to detect pauropsylla disease [34].
Chouhan et al. proposed a Wi-Fi-enabled plant disease detection approach that uses Wi-Fi cameras to capture images and a Radial Basis Neural Network (RBNN) to classify mango leaf diseases. The RBNN uses radial basis functions in the intermediate layers of the network [35]. Bacterial foraging optimization was applied together with the RBNN, and features were extracted using a region-growing approach [36].
Some studies considered the combination of segmentation and classification for plant disease detection; Lu et al.  for recognizing orange fruits by employing Mask R-CNN for instance segmentation in an image [41]. Fernand Meyer proposed an approach for segmenting leaves by setting a certain threshold [42]. Daniel Morris employed a CNN for instance segmentation to select a leaf from the image [43].
Singh et al. proposed a segmentation approach employing K-means clustering, watershed segmentation, and threshold-based exemption on coconut leaf image datasets to detect leaf blight disease, and attained a 96.94% detection accuracy using a CNN [44]. Tassis et al. proposed an approach that employs Mask R-CNN for instance segmentation and UNet for semantic segmentation to remove the background, and attained a 94.27% detection accuracy on coffee plant disease detection [45]. Chouhan et al. proposed a neural network model with superpixel clustering for segmentation and achieved 98.57% detection accuracy [46].
Most of the studies used the public dataset [47]; in contrast, this study collected cardamom plant leaf images with complex backgrounds from farm fields. To the best of our knowledge, this is the first study on cardamom plant leaf disease detection using a deep learning-based approach.

III. DATASET DESCRIPTION
A. IMAGE CAPTURING SETUP
Figure 1 shows the image capturing setup. Real-time cardamom plant leaf images were captured in the farmland using different mobile phones. To support real-time plant disease classification, images were captured with natural backgrounds, noise, varying illumination, and different angles.

B. CARDAMOM DATASET 2021
In this study, we collected 1724 cardamom plant leaf images of three classes, namely Colletotrichum Blight, Phyllosticta Leaf Spot, and healthy. The images were labeled with the help of officers of the Indian Cardamom Research Institute, Regional Station Sakaleshpur, Karnataka State, a wing of the Spices Board India. All images were captured in daylight between 10 AM and 5 PM from April to June 2021. Table 1 presents the cardamom dataset of 2021. The diseases listed in Table 1 are common to the cardamom plant and affect the growth and yield of the crop. Each image was captured under farm field conditions without specialized equipment, preserving all original information; the background of each image is removed in this work. The original cardamom plant leaf images had complex backgrounds with different dimensions and capturing conditions. The three different types of cardamom plant leaf images are shown in Figure 2.
VOLUME 10, 2022

C. PLANTVILLAGE DATASET
PlantVillage [47] is a freely available dataset that is widely used in the field of plant disease classification. It contains over 54,284 images, all of which are annotated. These images rarely exhibit challenging conditions such as complex backgrounds. In this study, the grape subset of the PlantVillage dataset was also used. The details of the datasets used in this study are described in Table 1.

IV. PROPOSED METHOD
In this study, we propose a cardamom plant leaf disease detection approach that first removes the complex background of the image using U2-Net and then classifies the image using the EfficientNetV2 deep learning model.

A. BACKGROUND REMOVAL
The cardamom plant leaf images are RGB images collected against complex backgrounds with different dimensions and resolutions, and the leaf is surrounded by other objects that are commonly present in the environment.
In most cases, the background is removed from an image using classical computer vision algorithms, such as image thresholding in OpenCV and GrabCut techniques [48]. These techniques help when the background color differs from that of the object of interest; in such cases, it is easy to remove the background, for example by using green or blue screens to remove the background and replace it with another scene.
Removing the background from an image without explicit pre- or post-processing is a highly challenging task. If the object has a color very similar to the background, it is difficult to find a clean contour because of soft edges or shadows.
The background removal approach used in this study is U2-Net [5], as shown in Figure 3. It takes the input image and produces a mask of the region of interest; a bitwise operation is then applied to the original image and the mask produced by U2-Net. The U2-Net architecture is a two-level nested U-structure, as shown in Figure 4, and consists of three parts.
The first part is a six-stage encoder built from ReSidual U-blocks (RSU). An RSU block has three components, described in Figure 5. First, an input convolutional layer extracts local features and generates the intermediate activation map $FM(x)$. Second, a U-Net-like encoder-decoder takes $FM(x)$ as input and learns to extract and encode the multiscale contextual features $U(FM(x))$. It extracts the multiscale features from gradually downscaled activation maps and encodes them into high-resolution activation maps by progressive upsampling, concatenation, and convolution, which mitigates the loss of detail caused by direct upsampling. Third, a residual connection fuses the local features and the multiscale features by summation, as shown in Equation 1, where $H_{RSU}(x)$ denotes the output of the block:
$$H_{RSU}(x) = U(FM(x)) + FM(x) \tag{1}$$
The second part of the U2-Net architecture is a five-stage decoder that uses the dilated version of the RSU. Finally, saliency probability maps are generated by attaching the decoder stages to the encoder stage. A saliency map scores how distinctive each pixel is, which helps to distinguish the region of interest from the background. Deep supervision was employed with the loss function in Equation 2 to minimize the loss and to provide robust regularization for the learned features:
$$Loss = \sum_{n=1}^{N} w_{side}^{(n)} \, l_{side}^{(n)} + w_{fusion} \, l_{fusion} \tag{2}$$
where $N$ is the number of saliency maps generated by the encoder and decoder, $l_{side}^{(n)}$ is the loss of the side saliency map of stage $n$, $l_{fusion}$ is the loss of the final fused saliency map, $w_{side}^{(n)}$ is the weight of the $n$-th side saliency map loss, and $w_{fusion}$ is the weight of the fusion loss term.
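The two operations described above, masking the original image with the predicted saliency map and summing the deep-supervision loss terms of Equation 2, can be sketched as follows. This is only an illustrative sketch: the function names, the nested-list image representation, and the 0.5 binarization threshold are assumptions made here, not details taken from the paper or from the U2-Net implementation.

```python
def apply_mask(image, saliency, threshold=0.5):
    """Blank out background pixels with a bitwise AND.

    image:    rows of (R, G, B) tuples with 8-bit channel values.
    saliency: rows of saliency probabilities in [0, 1], same height and width.
    threshold: assumed cut-off used to binarize the saliency map.
    """
    if len(image) != len(saliency) or any(
        len(img_row) != len(sal_row) for img_row, sal_row in zip(image, saliency)
    ):
        raise ValueError("image and saliency map must have the same size")

    result = []
    for img_row, sal_row in zip(image, saliency):
        out_row = []
        for (r, g, b), sal in zip(img_row, sal_row):
            # 0xFF keeps the pixel unchanged, 0x00 turns it black.
            bits = 0xFF if sal >= threshold else 0x00
            out_row.append((r & bits, g & bits, b & bits))
        result.append(out_row)
    return result


def deep_supervision_loss(side_losses, side_weights, fusion_loss, fusion_weight=1.0):
    """Weighted sum of the side losses plus the weighted fusion loss."""
    if len(side_losses) != len(side_weights):
        raise ValueError("one weight is required per side loss")
    side_term = sum(w * l for w, l in zip(side_weights, side_losses))
    return side_term + fusion_weight * fusion_loss
```

In practice the same masking is usually done on whole arrays with an image library; the per-pixel loop is kept here only to make the bitwise step explicit.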

B. CLASSIFICATION MODELS
CNNs consist of various layers, such as convolutional, pooling, and fully connected layers [49]. The convolutional layer is an essential part of the CNN because it extracts detailed information from the input images using different convolution kernels. Successive convolutional layers extract a set of feature maps that capture properties such as the color and edges of the input image. The feature map is defined in Equation 3 [49]:
$$FM_i = f(FM_{i-1} \otimes W_i + b_i) \tag{3}$$
where $FM_i$ denotes the feature map of layer $i$, $W_i$ denotes the weight, $b_i$ denotes the offset vector, and $f(\cdot)$ is the ReLU activation function defined in Equation 4 [50]:
$$f(z) = \max(0, z) \tag{4}$$
where $z$ is the input. The pooling layer reduces the possibility of overfitting by reducing the spatial dimensions of the convolutional feature maps, as defined in Equation 5 [49]:
$$y_i^{l} = down(y_i^{l-1}, s) \tag{5}$$
where $y_i^{l}$ indicates the feature vector, $s$ defines the pooling size, and $down(\cdot)$ indicates the downsampling function.
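The ReLU activation and the pooling step described above can be illustrated with a small pure-Python sketch. Max pooling is assumed here as the downsampling function; the text does not state which pooling operator the authors used, and the function names are hypothetical.

```python
def relu(z):
    """ReLU activation: negative inputs become zero, others pass through."""
    return max(0.0, z)


def max_pool2d(feature_map, s):
    """Downsample a 2-D feature map with non-overlapping s x s max pooling.

    Max pooling is an assumed choice for the downsampling function.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % s or width % s:
        raise ValueError("feature map size must be divisible by the pooling size")
    return [
        [
            max(feature_map[r + dr][c + dc] for dr in range(s) for dc in range(s))
            for c in range(0, width, s)
        ]
        for r in range(0, height, s)
    ]
```

Applying `relu` element-wise to a 4 × 4 map and then `max_pool2d(..., 2)` yields a 2 × 2 map, which shows how pooling halves each spatial dimension.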
Finally, one or more fully connected layers flatten the network by connecting all the neurons of the previous layer. The final fully connected layer predicts the class label using the Softmax activation function, which is used in the models employed in this work. Softmax is defined in Equation 6 [49]:
$$Softmax(y_i) = \frac{\exp(y_i)}{\sum_{j} \exp(y_j)} \tag{6}$$
where $y$ denotes the input vector, $\exp(y_i)$ is the exponential of the $i$-th element of the input vector, and the sum of $\exp(y_j)$ in the denominator runs over all elements so that the outputs sum to 1.
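A minimal sketch of the Softmax function follows. Subtracting the maximum before exponentiating is a common numerical-stability measure added here; it does not change the result and is not something the paper describes.

```python
import math


def softmax(y):
    """Convert a list of raw scores into probabilities that sum to 1."""
    shift = max(y)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - shift) for v in y]
    total = sum(exps)
    return [e / total for e in exps]
```

For a three-class problem such as the cardamom dataset, the index of the largest output would be taken as the predicted class.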
In a CNN, hyperparameters are set before training, whereas parameters such as weights and biases are updated during training. There are two types of hyperparameters: those that deal with the network structure and those that deal with training. Kernel sizes and the number of layers of the model are hyperparameters that deal with the network structure; the kernel size plays a role in extracting features on a large scale, and deeper networks generally achieve a higher classification rate. Hyperparameters that deal with training include batch size, learning rate, and dropout. The loss function is expressed by Equation 7.
EfficientNet is a family of CNNs proposed in [51]. It scales a CNN along three dimensions: depth, that is, the number of layers in the network; width, that is, how wide the network is; and resolution, that is, the input image resolution. EfficientNet scales up the CNN using a compound scaling method that scales these dimensions jointly. It uses a baseline network built from Mobile inverted Bottleneck Convolution (MB-Conv) blocks and scales this network up to obtain the EfficientNet family. EfficientNetV2 [6] is also a family of CNNs; it achieves higher performance and a shorter training time. To improve training speed and efficiency, it uses Fused MB-Conv in the first three stages and MB-Conv in the subsequent stages (Figure 6); it trains faster than existing models while being up to 6.8x smaller. The architectures of the three EfficientNetV2 versions are shown in Figure 7.
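The compound scaling idea can be illustrated numerically. The sketch below assumes the base coefficients reported for the original EfficientNet in [51] (depth 1.2, width 1.1, resolution 1.15); these values are not stated in this paper, and EfficientNetV2 itself uses a modified, non-uniform scaling strategy, so the example only illustrates the original method.

```python
# Base coefficients assumed from the EfficientNet paper [51], not from this text.
ALPHA = 1.2   # depth multiplier base
BETA = 1.1    # width multiplier base
GAMMA = 1.15  # resolution multiplier base


def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return (ALPHA ** phi, BETA ** phi, GAMMA ** phi)


def flops_multiplier(phi):
    """Approximate FLOPS growth: linear in depth, quadratic in width and resolution."""
    depth, width, resolution = compound_scale(phi)
    return depth * width ** 2 * resolution ** 2
```

With these coefficients, each unit increase of `phi` roughly doubles the computational cost, which is the constraint the compound scaling method is designed around.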
The proposed pipeline of the cardamom plant leaf disease detection approach has four stages. The first stage is dataset preparation; in this stage, we collected real-time cardamom plant leaf images from the cardamom plantation and labeled them. The second stage removes the background and noise from the leaf image, as explained earlier in this paper. The third stage trains the deep learning-based model from scratch using the generated dataset. The fourth and final stage is the performance evaluation of the trained model. The proposed approach is illustrated in Figure 8. It has two phases. The first is the training phase, which begins with a pre-processing stage that removes the complex background from the input image using U2-Net. The background-removed images are then resized using an image resizer and fed into the next stage, which trains the deep learning-based models, namely CNN, EfficientNet, and EfficientNetV2. The three versions of EfficientNetV2 used in this work are EfficientNetV2-S (Small, with 22 million parameters), EfficientNetV2-M (Medium, with 54 million parameters), and EfficientNetV2-L (Large, with 120 million parameters). Finally, the trained model produces the classification results.
In the testing phase, a cardamom plant leaf image is fed to the trained deep learning-based model after the pre-processing operations, namely background removal and resizing, have been completed; the trained model then produces the classification result.
In Equation 7, M represents the weight matrix, n represents the number of training samples, C represents the class labels, and P is the predicted probability.
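As an illustration of a loss built from these quantities, the sketch below assumes that Equation 7 is the standard categorical cross-entropy averaged over the n training samples. That form is an assumption made here, since only the symbol definitions are given in the text; the function name and the `eps` clipping value are likewise hypothetical.

```python
import math


def cross_entropy(probabilities, labels, eps=1e-12):
    """Mean categorical cross-entropy (assumed form of the training loss).

    probabilities: one list of predicted class probabilities per sample.
    labels:        the index of the true class for each sample.
    eps:           lower bound that avoids log(0); an illustrative choice.
    """
    n = len(probabilities)
    if n == 0 or n != len(labels):
        raise ValueError("need one label per sample and at least one sample")
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total -= math.log(max(probs[label], eps))
    return total / n
```

A perfect prediction gives a loss of zero, and a uniform prediction over two classes gives log 2, which is a quick sanity check for any implementation of this loss.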

C. EXPERIMENTAL RESULTS
All the experiments were implemented in Python 3.6 and executed on an NVIDIA DGX Station server with 4x Tesla V100 GPUs and 500 TFLOPS. As discussed in Section IV-A, all the cardamom plant leaf images were captured against complex backgrounds and have different dimensions; U2-Net [5] is used to remove the complex background from the captured images. Figure 9 shows the background removal of the cardamom plant leaf images using U2-Net: Figure 9 a) shows the original cardamom plant leaf images, Figure 9 b) shows the mask generated by U2-Net, Figure 9 c) shows the output generated by the background removal approach, and Figure 9 d) shows the resized images produced by the image resizer.
All the original input images are resized to 224 × 224 for all three deep learning-based models used in this study. In the experiments, 90% of the dataset was used for training and 10% for testing. A set of experiments was conducted to measure the performance of the proposed approach for 100 epochs on the cardamom plant dataset. The same set of experiments was conducted using the publicly available grape dataset to assess the performance of the proposed approach. Further, an additional set of experiments was carried out using other deep learning models, namely CNN and EfficientNet. The CNN attained a maximum detection accuracy of 91.30% on the cardamom plant dataset and 94.24% on the grape dataset. EfficientNet attained a maximum detection accuracy of 94.10% and 97.81% on the cardamom and grape plant datasets, respectively. EfficientNetV2-S attained a maximum detection accuracy of 95.59% and 96.44% on the cardamom and grape plant datasets, respectively. EfficientNetV2-M obtained a maximum detection accuracy of 88.44% and 93.72% on the cardamom and grape plant datasets, respectively. EfficientNetV2-L attained a maximum detection accuracy of 98.26% and 96.45% on the cardamom and grape plant datasets, respectively. Table 2 presents the performance evaluation of the proposed approach.
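A 90%/10% split of the kind described above could be produced as in the following sketch. The random shuffling, the fixed seed, and the rounding rule are assumptions made for illustration; the paper does not state how the split was drawn or whether it was stratified by class.

```python
import random


def split_indices(n_samples, train_fraction=0.9, seed=0):
    """Shuffle sample indices and split them into training and test lists.

    The seed and the use of simple random shuffling are illustrative choices.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # rounds down
    return indices[:n_train], indices[n_train:]


# Example with the size of the cardamom dataset (1724 images).
train_idx, test_idx = split_indices(1724)
```

Under this rounding rule, 1724 images are divided into 1551 training and 173 test samples; a stratified split would additionally keep the class proportions equal in both parts.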
Testing is broadly distinguished as either internal or external testing. Internal testing splits the dataset into a training set and a test set; the training set is used to train the model, and the test set is used to evaluate the model with metrics such as accuracy, precision, recall, and F1-Score. External testing involves testing the trained model on an independently derived dataset, that is, a dataset that is not used for training or internal testing. External testing challenges the trained model with an additional dataset that differs from the original dataset used during training. This process demonstrates the generalizability of the model.
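The evaluation metrics named above can all be derived from a confusion matrix. The sketch below assumes that rows hold the true classes and columns hold the predicted classes; the function name and the example matrix in the usage note are illustrative and are not results from this study.

```python
def classification_metrics(confusion):
    """Compute accuracy and per-class (precision, recall, F1) from a confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

    per_class = []
    for k in range(n_classes):
        tp = confusion[k][k]
        fp = sum(confusion[r][k] for r in range(n_classes)) - tp
        fn = sum(confusion[k]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        per_class.append((precision, recall, f1))
    return accuracy, per_class
```

For a hypothetical two-class matrix `[[8, 2], [1, 9]]`, the accuracy is 0.85 and the first class has precision 8/9 and recall 0.8.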
To understand the behavior of the trained models, external testing is essential; it examines the patterns learned during training and helps to measure the performance of the trained models. A set of external tests was conducted using the trained CNN, EfficientNet, and EfficientNetV2 models on the cardamom plant and grape plant datasets. Table 3 describes the performance evaluation of the external testing for the cardamom plant dataset. In the external testing, EfficientNetV2-S outperformed the other models on the cardamom plant dataset. Figure 10 shows the confusion matrix for external testing of the trained EfficientNetV2 models on the cardamom plant dataset. Table 4 describes the performance evaluation of the external testing for the grape plant dataset. In the external testing, CNN outperformed the other models on the grape plant dataset. Figure 11 shows the confusion matrix for external testing of the trained EfficientNetV2 models on the grape plant dataset.
A set of experiments was conducted with 5-fold cross-validation on the CNN, EfficientNet, and EfficientNetV2-L models; Table 5 describes the 5-fold cross-validation results for 100 epochs on the cardamom and grape plant leaf datasets. The EfficientNetV2-L model outperforms the other two models on the cardamom plant dataset with a 91.42% detection accuracy.
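The way 5-fold cross-validation partitions a dataset can be sketched as follows. The sketch uses contiguous folds without shuffling or stratification, which is a simplification assumed here; the paper does not describe how its folds were formed.

```python
def k_fold_indices(n_samples, k=5):
    """Return k (train_indices, validation_indices) pairs covering all samples.

    Folds are contiguous blocks; the first n_samples % k folds get one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]

    folds = []
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds
```

Each sample appears in exactly one validation fold, so every image is used for validation once and for training in the other four runs.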
A wide set of experiments was conducted on the CNN, EfficientNet, and EfficientNetV2 models. The results in Tables 2-4 show that the EfficientNetV2 model outperforms the other models and attains almost consistent results on the grape plant dataset, and that EfficientNetV2-L attains a maximum detection accuracy of 98.26% on the cardamom plant dataset. Table 6 shows a comparison of the proposed approach with the state-of-the-art methods. Employing U2-Net with EfficientNetV2 outperforms these methods for real-time cardamom plant leaf disease detection, with a 98.26% detection accuracy.

V. CONCLUSION AND FUTURE WORK
An efficient plant leaf disease detection approach is essential to detect plant diseases in real time. In this regard, a cardamom plant leaf disease detection approach is proposed, for which the cardamom plant leaf dataset was collected from farmland with complex backgrounds. Segmenting and detecting diseases in real-time images is a challenging task, as the images are affected by factors such as the image background, environmental factors such as lighting, and the capturing angle. In the proposed method, the U2-Net architecture is employed to remove the complex background, which produces results without deteriorating the quality of the original image. For classification, the CNN, EfficientNet, and EfficientNetV2 models were trained from scratch rather than with pre-trained weights for EfficientNet and EfficientNetV2. The EfficientNetV2-S and EfficientNetV2-L models outperformed the other models; EfficientNetV2-L achieved 98.26% detection accuracy on the cardamom plant dataset, and EfficientNetV2-S achieved 98.28% detection accuracy on the cardamom plant dataset in external testing.
In future work, the dataset can be enhanced by collecting images of cardamom plants with diseases and nutrient deficiencies. Furthermore, the model can be extended to identify the severity of the disease and of the nutrient deficiency.