Zero-Shot Transfer Learning Framework for Plant Leaf Disease Classification

Agriculture’s pivotal role in sustaining livelihoods and driving economic growth is widely recognized, yet various challenges like the adverse effects of climate change and limited resource availability hinder its productivity. Notably, plants are susceptible to various viruses and bacteria, impacting yield and food security. The emergence of deep learning, particularly convolutional neural networks (CNNs), has transformed agriculture by facilitating tasks such as disease detection. However, a significant challenge arises from the often unrealistic assumption that training and testing data share the same distribution. To address this, domain adaptation and transfer learning techniques have been employed, bridging the gap between different data distributions. Therefore, a novel framework named ‘Zero-Shot Transfer Learning’ is introduced. This addresses the challenge of improving classifier performance when trained on a source domain with different classes and tested on a target domain, exemplified by tomato and potato datasets. More specifically, in this framework, we include different CNN models along with techniques such as data augmentation, synthetic data generation, and robust discriminative losses, enhancing classifier performance in zero-shot scenarios. Extensive experiments on plant leaf disease classification under the zero-shot Transfer Learning assumption demonstrate the superiority of the proposed framework for effective disease classification. Ultimately, this framework holds the potential to promote crop yield optimization and ensure food security.


I. INTRODUCTION
Agriculture, with its multifaceted role in sustaining livelihoods and driving economic growth, is globally recognized as vital [1]. It not only provides employment opportunities and income but also contributes significantly to a country's gross domestic product (GDP) [2]. Particularly in rural areas, where farming forms the backbone of livelihoods, millions rely on agriculture as their primary income source. As the global population is projected to approach 10 billion by 2050, concerns about food shortages, increased hunger, and escalated food demand emerge [3], [4]. Addressing these challenges becomes paramount given the limitations of agricultural resources.
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
To address challenges related to disease diagnosis and the impracticality of experts' presence in rural areas, traditional machine learning approaches have been proposed [5], [6], [7], [8]. These approaches focus on categorizing plant diseases using attributes such as the texture, type, and color of plant leaf images. Prior research has employed machine learning methods such as k-nearest neighbors [9], random forests [10], support vector machines [11], and k-means clustering [12] for supervised or unsupervised classification after preprocessing steps that remove backgrounds and isolate infected regions. However, these methods have been criticized for their limited performance and for being evaluated primarily in controlled laboratory settings.
Therefore, recent years have seen a surge in deep learning methods, particularly convolutional neural networks (CNNs), addressing various agricultural tasks. These CNN-based approaches have garnered significant attention and success across diverse agricultural tasks, ranging from pinpointing plant diseases to detecting weeds and classifying crop pests [13], [14]. Notably, CNNs have emerged as a favored architectural choice for image and video analysis, encompassing pivotal computer vision tasks such as image recognition, object detection, and segmentation.
Although open-source agriculture datasets have become available, their quality and quantity are often not suitable for real-world applications owing to a lack of robustness in diverse real-world environments. A notable challenge for deep learning methods is their reliance on annotated data. To tackle this hurdle, researchers have adopted data augmentation strategies, both manual and synthetic. Manual data augmentation creates new images through techniques such as cropping, rotation, and flipping. This supplies the model with additional variations of each image, allowing it to extract attributes that were previously unavailable [15]. Synthetic data augmentation generates artificial images using well-established generative models such as Generative Adversarial Networks (GANs) [16]. This approach produces images with diverse styles and contextual settings, enriching the model's ability to capture a wider range of environmental attributes during training. To enhance deep learning models further, an approach focusing on the preservation of discriminative information has been introduced [17]. This strategy minimizes the distances between samples from similar classes while maximizing the distances between samples from dissimilar classes. Despite their potential, deep learning approaches assume that training and testing data come from the same distribution, which may not hold in real-world scenarios [18], [19]. Domain adaptation and transfer learning have emerged as solutions to this distribution shift challenge [18], [20], [21], [22].
Domain adaptation minimizes distribution discrepancies between training and testing data, while transfer learning leverages pre-trained models to enhance performance in new domains. Domain adaptation improves the performance of a model on a target domain by capitalizing on knowledge from a related source domain, even when the target domain has a different data distribution. It is particularly useful in scenarios where collecting labelled data in the target domain is prohibitively expensive or impractical [23], [24].
Nowadays, transfer learning has gained significant attention and has been successfully applied in various domains, including precision agriculture, natural language processing, computer vision, and robotics [25], [26], [27]. Its application has shown promising results in addressing real-world problems across multiple fields such as industry, security and surveillance, healthcare, agriculture, automobiles, and finance. As a result, transfer learning is increasingly recognized as a vital approach for tackling practical challenges. Within the realm of transfer learning, two specialized techniques are zero-shot learning [28], [29], [30], [31] and few-shot learning [32], [33]. These approaches address scenarios where there is limited or no labeled data available in the target domain.
Zero-shot learning is a formidable challenge involving the recognition or categorization of previously unseen objects or classes in a target domain [34], [35], [36]. The model, trained on a labeled source domain, aims to extend its grasp to unfamiliar classes in the target domain by using extra information such as semantic attributes, bridging the gap between the two domains. This enables the model to predict new classes without direct training. This approach is valuable for expansive or open-world classification tasks. In contrast, few-shot learning addresses knowledge acquisition from a limited set of annotated examples in the source domain [32]. Rather than relying on an extensive dataset, the model learns from scant labeled examples per class. Its objective is to generalize to new classes or instances in the target domain with minimal supervision. Often using meta-learning, the model swiftly adapts to new tasks by leveraging past knowledge. This empowers the model to generalize effectively and make accurate predictions with minimal labeled examples.
In this paper, we address one of the most critical challenges: the identification of diseases within a domain where we lack any labeled examples, making traditional model training impossible. However, we do have access to another domain that contains labeled samples for different classes. Leveraging these labeled samples from the other domain, we aim to identify classes within our target domain. As illustrated in Figure 2, we are provided with a source domain dataset for tomatoes with three labeled classes: Tomato-late-blight, Tomato-Healthy, and Tomato-Early-blight. Our objective is to apply this knowledge to classify the target domain dataset for potatoes, which consists of three classes: Potato-late-blight, Potato-Healthy, and Potato-Early-blight. The task at hand falls under zero-shot learning, one of the most challenging problems in the field of disease identification.
In this work, we introduce a novel framework called 'Zero-Shot Transfer Learning' to address the problem of improving classifier performance when trained on source domain labeled data with different classes and tested on target domain data. We explore various techniques within this framework, including data augmentation methods such as manual augmentation and synthetic data generation using generative models. Additionally, we investigate the benefits of employing robust discriminative losses, such as center loss and triplet loss, to further enhance the classifier's performance.
The contributions of our proposed work can be summarized as follows:
• We present an innovative framework that harnesses labeled data from the source domain to facilitate accurate classification of diverse classes within the target domain.
• To ensure seamless knowledge transfer between the source and target domains, we explore a range of strategies. These include the integration of pre-trained models, application of data augmentation techniques (both manual and generative), and the incorporation of discriminative methods.
• Through comprehensive experimentation on real-world datasets, we provide empirical evidence showcasing the effectiveness of each strategy integrated within our proposed framework.

II. RELATED WORKS
After conducting a comprehensive literature survey, we have identified six main categories of existing work related to our paper: 1) Traditional machine learning methods, 2) CNN-Based Supervised Learning (CNN-SL), 3) Data Augmentation Methods, 4) Discriminative Methods, 5) Transfer Learning and Domain Adaptation Methods, and 6) Zero-Shot Learning Methods. These six categories of existing work lay the foundation for our research, and we aim to build upon these insights to propose our framework of Zero-Shot Transfer Learning.

A. TRADITIONAL METHODS
In previous endeavors, a variety of machine learning techniques have been harnessed to address the challenge of identifying plant diseases. Typically, these approaches involve several distinct stages:
• Input Leaf Image: The process commences by considering an input leaf image.
• Pre-processing Techniques: Employing pre-processing techniques to eliminate noise and enhance image quality.
• Segmentation Techniques: Executing segmentation techniques to isolate the region of interest within the leaf image, thereby isolating the affected areas.
• Disease Detection: Employing implemented algorithms to detect diseases present in the segmented regions.
• Feature Extraction Techniques: Extracting features such as Local Binary Patterns (LBP), Histogram of Oriented Gradients (HoG), and Speeded-Up Robust Features (SURF) from the images.
• Feature Selection: Employing techniques like Particle Swarm Optimization (PSO) and Principal Component Analysis (PCA) to select crucial features.
• Classification: Utilizing the selected features to train classifiers such as Support Vector Machines (SVM) and K-Nearest Neighbors (KNN) to perform the final disease classification.
To gauge the effectiveness of disease identification, common performance measures such as Accuracy, Area Under the Curve (AUC), and F-score are often considered.
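As a minimal sketch of two of the performance measures mentioned above, accuracy and the per-class F-score can be computed directly from true and predicted labels. The function names and the toy labels below are illustrative assumptions and do not come from any of the cited studies.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f_score(y_true, y_pred, positive):
    """F1 score for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels, for illustration only.
y_true = ["healthy", "blight", "blight", "healthy"]
y_pred = ["healthy", "blight", "healthy", "healthy"]
acc = accuracy(y_true, y_pred)
f1_blight = f_score(y_true, y_pred, "blight")
```

Reporting the F-score per class alongside accuracy is useful when classes are imbalanced, since accuracy alone can hide poor recall on a rare disease class.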
For instance, Dhingra et al. [37] conducted an in-depth exploration of digital image processing techniques for leaf disease detection and classification. This study comprehensively discusses the performance of disease detection and classification, analyzing state-of-the-art techniques introduced between 1997 and 2016. Singh and Misra [38] proposed an algorithm for image segmentation, which facilitated automatic detection and classification of plant leaf conditions. Chilingaryan et al. [39] delved into research developments over the past 15 years regarding machine learning techniques for accurate crop yield prediction and estimation of nitrogen status. Pydipati et al. [40] introduced the color co-occurrence method (CCM), which is combined with statistical classification algorithms to classify diseased and normal citrus leaves based on texture-based hue, saturation, and intensity (HSI) color features. Camargo and Smith [41] outlined an image processing approach for identifying visual symptoms of plant diseases through exploration of colored images.
He et al. [42] proposed a method for identifying cotton leaf diseases, employing three color models for feature extraction. Tucker and Chakraborty [43] considered techniques based on thresholding and clustering parameters for identifying and detecting diseases in oat and sunflower leaves. Aduwo et al. [44] devised an automated vision-based analysis system to detect cassava mosaic disease, using color and shape features and various classifiers, including the Naive Bayes classifier. Muthukannam and Latha [45] emphasized the significance of Particle Swarm Optimization (PSO)-based image segmentation for plant leaf disease identification. Jian and Wei [46] introduced a method for identifying diseases in cucumber leaves using Support Vector Machines (SVM) with polynomial, radial basis function, and sigmoid kernel functions. Chandra and Bedi [47] described various SVM approaches and how parameter selection and the choice of approach affect the efficiency of image classification. Tanveer et al. [48] proposed using the Twin SVM (TWSVM), which classifies the data points with two non-parallel hyperplanes. Further, Wahab et al. [49] implemented an approach in which image segmentation is vital: instead of the entire image, only part of the image is considered for detecting diseases in chilli plants. Kour and Arora [50] proposed a prerequisite step before applying SVM in Particle Swarm Optimization-based SVM for plant segmentation and classification.

B. CNN-BASED SUPERVISED LEARNING (CNN-SL)
In recent years, the use of convolutional neural networks for supervised learning (CNN-SL) has found widespread application across various agricultural domains. CNN models vary in the number of layers and the number of parameters (in millions), as shown in Table 1. These applications encompass tasks such as identification, classification, detection, quantification, and prediction. In a notable study by Mohanty et al. [51], the efficacy of this technique was demonstrated by training two well-known deep convolutional neural networks: AlexNet and GoogLeNet. Their goal was to discern 14 distinct crop species and identify 26 diseases within the PlantVillage dataset. GoogLeNet emerged as the standout performer, achieving an accuracy of 99.35%. However, when the robustness of these models was subsequently tested against two separate validation datasets, overall accuracy dropped significantly to 31%. This outcome highlights a notable disparity between the models' performance on the initial dataset and their ability to generalize to other datasets.
Using a VGG-19-based convolutional neural network, one study attains 95.6% accuracy in automated plant disease classification on data from PlantVillage. The model, deployed in an Android app, performs real-time leaf disease detection across 13 plant species, underscoring its potential for accurate agricultural disease management. Another work proposes a hybrid model that combines a Convolutional Autoencoder (CAE) and a CNN for plant disease detection. Applied to Bacterial Spot disease in peach plants, it attains 99.35% training and 98.38% testing accuracy using only 9,914 training parameters, offering improved precision and efficiency in automatic disease identification. Thakur et al. [52] highlighted the importance of machine learning for vision-based plant disease identification. The authors pointed out the availability of datasets and their influence on model performance. The study by Hassan and Maji [53] affirmed the need for technology to replace manual actions in decision making, from image capture through to image classification by CNN models.

C. DATA AUGMENTATION METHODS
Generative Adversarial Networks (GANs) are a powerful class of generative models that operate in an adversarial framework. GANs consist of two main components: a generator and a discriminator. The generator's primary role is to create realistic data, such as images, videos, or audio, from random noise or latent representations. The discriminator acts as a binary classifier, attempting to distinguish between real data from the dataset and data produced by the generator. GANs have demonstrated impressive capabilities in various applications, such as generating high-quality images, creating realistic video sequences, and generating audio signals that sound authentic. They have also been used for data augmentation, style transfer, and image-to-image translation tasks. GANs have had a significant impact on the field of generative modeling, leading to a new era of creative and realistic data synthesis [54]. Running a model on a concatenated dataset, formed by adding augmented data to the existing data, enhances the performance of the system by reducing overfitting [55]. cGANs have been used to generate artificial images of maize and common weeds (Charlock, Fat Hen, Shepherd's Purse, and Small-flowered Cranesbill) to expand the dataset [56]. Zeng et al. used GANs to detect the severity of infection in the leaves of citrus plants [57]. Data augmentation improves model efficiency even when only small datasets of annotated images are available for plant disease segmentation. Data generated by augmentation methods helps to minimize the effort required to create pixel-wise segmentation annotations, which can be used in real-time applications [58]. The fidelity of synthetic images is evaluated using t-SNE visualization, and improved performance is demonstrated in crop/weed classification models using transfer learning (CNN) and feature extraction (SVM, LDA) techniques.

D. DISCRIMINATIVE METHODS
In the literature, various discriminative loss functions, such as hinge loss, triplet loss, and center loss, have been explored to enhance the discriminative capabilities of classifiers. For example, center loss is used in conjunction with softmax cross-entropy loss for face recognition tasks. It encourages the features of each class to cluster around their class-specific center, making the intra-class variations smaller and enhancing discriminability [59]. Triplet loss is used in triplet networks for tasks like image retrieval. It encourages embeddings from the same class to be closer to each other and farther from embeddings of other classes, thus improving the discriminative power of the model [60]. The hinge loss is commonly used in Support Vector Machines (SVM) for binary classification. It penalizes misclassifications and pushes the model to ensure that the correct class score is higher than the incorrect class score by a certain margin [61]. Margin ranking loss is employed in siamese networks or triplet networks used for tasks like face recognition or similarity learning [62]. The loss compares the similarity scores between anchor-positive pairs and anchor-negative pairs, aiming to maximize the margin between them. Similar to margin ranking loss, contrastive loss is used in siamese networks to learn embeddings that project similar samples close together and dissimilar samples farther apart [63].
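To make two of these losses concrete, the following is a minimal sketch of the binary hinge loss and a pairwise contrastive loss in their commonly used textbook forms. The function names, the default margins, and the omission of any scaling constant are assumptions made for illustration; the cited works may use different variants.

```python
def hinge_loss(score, label, margin=1.0):
    """Binary hinge loss for a label in {-1, +1} and a real-valued score.

    The loss is zero once the score lies on the correct side of the
    decision boundary by at least `margin`.
    """
    return max(0.0, margin - label * score)


def contrastive_loss(distance, same_class, margin=1.0):
    """Pairwise contrastive loss on an embedding distance.

    Similar pairs are penalized by their squared distance; dissimilar
    pairs are penalized only while they are closer than `margin`.
    """
    if same_class:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# Hypothetical scores and distances, for illustration only.
well_classified = hinge_loss(2.0, +1)          # beyond the margin
misclassified = hinge_loss(0.5, -1)            # wrong side of the boundary
far_negative = contrastive_loss(2.0, False)    # already separated
```

Both losses share the same design idea: once a sample or pair satisfies the margin, it contributes nothing, so training effort concentrates on the hard cases.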
Zhang and Zhang [64] introduced Orthogonal Locally Discriminant Projection for the classification of plant leaf diseases. Argueso et al. [65] introduced a few-shot learning methodology for the classification of plant diseases through field images. Their study employed three distinct CNN architectures to construct two foundational models, including a Triplet network and a Deep Adversarial Metric Learning (DAML) technique. Their assessment revealed that a foundational model trained on an extensive collection of source field images could be fine-tuned to classify novel diseases effectively using only a limited quantity of images. Fan et al. [17] presented an approach in which deep feature descriptors are combined with traditional handcrafted features through feature fusion for enhanced plant leaf image analysis. The integration includes center loss to improve the distinctiveness of fused features, ensuring compactness within classes and separation between them. Experimental validation on three datasets demonstrates classification accuracies of 99.79%, 92.59%, and 97.12%. [66]. In their study, they delved into eight pre-trained models, including VGG16, VGG19, and ResNet50, to extract intricate features from images. Rangarajan et al. utilized pre-trained VGG16 and AlexNet models to identify six distinct tomato diseases and pests, attaining recognition accuracies of 97.49% for both models [67]. Nsumba [73]. Ma et al. proposed a crop yield prediction method that applies unsupervised domain adaptation to improve performance, with features extracted from multiple source domains [74]. The approach of Magistri et al. relies mainly on segmentation within domain adaptation for a real-world crop monitoring application. Instead of treating the entire field as a single unit, the field is divided semantically to obtain better results [75]. Table 2 summarizes the works on the PlantVillage dataset, which differ in model architectures, testing approaches, and the use of techniques such as data augmentation and loss functions. For each work, the table indicates whether manual and/or synthetic data augmentation was used, whether discriminative losses such as center loss and triplet loss were employed, and whether the model was tested on unknown data.

F. ZERO-SHOT LEARNING METHODS
In the literature, the majority of efforts dedicated to zero-shot learning have centered on tasks such as object recognition and segmentation within real-world datasets. Notably, none of these efforts has addressed zero-shot learning applied to disease classification. In zero-shot learning (ZSL), semantic embedding spaces are utilized as a means of transferring knowledge. Most existing ZSL methods [76], [77], [78] employ attribute spaces as their semantic embedding spaces. However, using attribute spaces to represent object classes necessitates the manual definition of an attribute ontology, which can limit the effectiveness of attribute space-based ZSL methods. To address this limitation, [79], [80], [81] explored the use of semantic word vector spaces as an alternative to attribute spaces. Some ZSL approaches are based on visual-semantic similarity matching [82]. The issue of domain shift in ZSL was initially reported in [35] and referred to as the projection domain shift problem. This problem was addressed by introducing transductive multi-view embedding. A new ZSL technique for unsupervised domain adaptation was presented by Hou et al. [83]. To tackle the problem of projection domain shift, the authors introduced a novel regularized sparse coding framework. Reference [84] presented zero-shot learning for domain adaptation, where the test instances are restricted to be only from unseen classes. The work in [85] seeks to tackle the issue of generalized zero-shot learning, as outlined in [84]. This problem arises in a more practical scenario where test instances may fall into any category, whether a known or an unknown class. Li et al. [80] proposed a method for generalized zero-shot domain adaptation called TUPL (Target Unseen class Prototype Learning). In this approach, samples from both domains are projected into a shared subspace, ensuring that samples from the same class are close together to address domain differences. In attribute-based zero-shot learning, the model is trained on a source task with labeled data and learns to associate attributes or semantic descriptions with the classes. During zero-shot transfer, the model uses these attribute vectors to predict the target classes, even without any labeled data from the target task [86].
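The attribute-based idea can be illustrated with a small sketch: a model trained on seen classes predicts an attribute vector for a test image, and the unseen class whose attribute vector is most similar is chosen. The attribute names, the class attribute values, and the predicted vector below are hypothetical and serve only to show the mechanism; they are not taken from any cited method or dataset.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict_unseen(predicted_attributes, class_attributes):
    """Return the unseen class whose attribute vector is most similar."""
    return max(
        class_attributes,
        key=lambda name: cosine(predicted_attributes, class_attributes[name]),
    )


# Hypothetical attributes: [green_tissue, concentric_rings, water_soaked].
unseen_class_attributes = {
    "Potato-Healthy": [1.0, 0.0, 0.0],
    "Potato-Early-blight": [0.3, 1.0, 0.0],
    "Potato-late-blight": [0.3, 0.0, 1.0],
}

# Attribute vector that a source-trained model might output for a test leaf
# (assumed values).
predicted = [0.2, 0.9, 0.1]
label = predict_unseen(predicted, unseen_class_attributes)
```

No labeled image of the unseen classes is needed at any point; only their attribute descriptions are required, which is what makes the transfer zero-shot.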
Embedding-based methods leverage shared embeddings to transfer knowledge across tasks [87]. By learning a common embedding space during pre-training, the model can map new inputs to the same space and perform well on unseen tasks. In model-based zero-shot learning methods, a generative model is trained during pre-training to represent the data distribution [88]. During testing, the generative model can be used to synthesize samples for unseen classes. Generative adversarial networks (GANs) can be used in zero-shot learning to generate samples for unseen classes. The generator is trained on seen classes during pre-training, and it can then generate samples for unseen classes during testing [89]. Tsai et al. [90] proposed a Deep Domain Adaptation approach, which uses privileged information from task-irrelevant dual-domain pairs. Zhang et al. [91] proposed a method for zero-shot domain adaptation (ZSDA) that trains domain-invariant semantic features and task-invariant domain features simultaneously using adversarial learning. The approach aims to learn the domain shift in a task-agnostic manner. Meanwhile, [92] investigated the zero-shot scenario in the day-night domain by leveraging prior knowledge obtained from a physics-based reflection model. Table 2 lists works on the PlantVillage dataset and compares them with our proposed method against several objectives: testing the model on unknown classes, using discriminative losses such as center loss and triplet loss, and applying manual or synthetic data augmentation. From Table 2, it can be seen that none of the existing approaches except our proposed approach satisfies all the objectives important for improving the performance of the model on unknown class samples.
The experimental setup is organized by task. In Task-1, images of the TLB, TH, and TEB classes form the source domain, and images of the PLB, PH, and PEB classes form the target domain. In Task-2, images of the PLB, PH, and PEB classes form the source domain, and images of the TLB, TH, and TEB classes form the target domain. Instances of each class are represented with a unique color, as shown in Step-1 of Figure 3.
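The two tasks can be written down as a small configuration, sketched below. The abbreviations follow the text (T for tomato, P for potato; LB, H, and EB for late blight, healthy, and early blight). The helper that maps a source class to the target class with the same condition is an illustrative assumption about how predictions are aligned across crops, not a documented part of the framework.

```python
TASKS = {
    "Task-1": {"source": ["TLB", "TH", "TEB"], "target": ["PLB", "PH", "PEB"]},
    "Task-2": {"source": ["PLB", "PH", "PEB"], "target": ["TLB", "TH", "TEB"]},
}


def condition(label):
    """Drop the one-letter crop prefix, leaving LB, H, or EB."""
    return label[1:]


def map_to_target(source_label, task):
    """Map a source class to the target class sharing its condition.

    Assumption: classes are aligned across crops by condition, so a
    tomato late blight prediction corresponds to potato late blight.
    """
    for target_label in TASKS[task]["target"]:
        if condition(target_label) == condition(source_label):
            return target_label
    raise ValueError("no target class matches " + source_label)


mapped = map_to_target("TLB", "Task-1")
```

Keeping the task definitions in one structure makes it straightforward to run the same pipeline in both transfer directions.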

B. STEP-2: AUGMENTATION TECHNIQUES
In the realm of data augmentation, we delve into two primary strategies: 1) Manual Image Generation, and 2) Synthetic Image Generation.

1) MANUAL IMAGE GENERATION
In manual image generation, we generate new images from the original images of each class using elementary operations such as rotation, cropping, and flipping. An original image is first rotated by a specific angle, and the generated image is saved. Because the corners of the rotated image may extend beyond the original frame, its edges are cropped to fit the fixed dimensions. These subtle alterations introduce variations to images, enhancing the model's ability to generalize by learning from diverse viewpoints of the same object or scene. In the proposed approach, each original image is rotated by fixed angles in multiples of 5 (5, 10, 15, 20, and so on up to 90), and each generated image is cropped to the dimensions of the original image, as shown in Figure 4.
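A minimal sketch of this rotate-and-crop procedure is given below, using a nested list as a stand-in for an image so that it runs without any imaging library. The function names, the nearest-neighbor sampling, the zero fill for uncovered corners, and the interpretation of the angles as degrees are assumptions made for illustration.

```python
import math


def rotate_keep_size(image, angle_deg):
    """Rotate a 2-D grid about its center, keeping the input dimensions.

    Content that leaves the frame is cropped away; output pixels with no
    source pixel are filled with 0. Sampling is nearest-neighbor.
    """
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse-rotate the output coordinate to locate its source.
            dx, dy = x - cx, y - cy
            sx = int(round(cos_t * dx + sin_t * dy + cx))
            sy = int(round(-sin_t * dx + cos_t * dy + cy))
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


# Angles 5, 10, ..., 90, as described in the text (assumed to be degrees).
ANGLES = list(range(5, 91, 5))


def augment(image):
    """Return one rotated, same-size copy of `image` per angle."""
    return [rotate_keep_size(image, angle) for angle in ANGLES]


# Hypothetical 3x3 "image", for illustration only.
toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
augmented = augment(toy_image)
```

Each original image therefore yields 18 additional images of identical size, which keeps the augmented set compatible with a fixed CNN input shape.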

2) SYNTHETIC IMAGE GENERATION
This process involves crafting entirely new images based on the inherent patterns and characteristics of the original dataset. This approach often employs more sophisticated methodologies, such as generative adversarial networks (GANs) or image-to-image translation models. These techniques yield highly realistic images that closely resemble the original data, as shown in Figure 5.
In scenarios where the source domain comprises numerous classes but only a handful of samples per class, and this data is considered for domain adaptation, the learned model might struggle to generalize effectively to the target domain. To tackle this challenge, Generative Adversarial Networks (GANs) [93] have emerged as a solution. GANs are a class of modern neural networks that use two opposing models during training: the Generator (G) and the Discriminator (D). The Generator (G) endeavors to deceive the Discriminator (D) by producing output images akin to the provided input dataset. Meanwhile, the Discriminator strives to enhance its discernment by classifying the generated images as authentic or counterfeit.
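In this work, we explore a variant of the GAN, the Conditional Generative Adversarial Network (cGAN) [94]. In its standard form, the cGAN objective conditions both the generator and the discriminator on a label $y$:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x \mid y)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z \mid y))\right)\right]$$
This equation succinctly captures the interplay between the generator and discriminator, emphasizing the importance of their balance.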
Through cGANs, we empower ourselves to generate synthetic data that is conditioned on labeled data from the source domain.This is achieved by the dynamic interplay between the authentic source data and the synthetically generated data.
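The adversarial interplay can be made concrete by evaluating the standard conditional GAN value function from discriminator outputs. The sketch below assumes the usual formulation, in which the discriminator outputs a probability that a (sample, label) pair is real; the function name and the probabilities are hypothetical, and no network is trained here.

```python
import math


def cgan_value(d_real, d_fake):
    """Empirical value V(D, G) = mean(log D(x|y)) + mean(log(1 - D(G(z|y)))).

    d_real: discriminator probabilities on real, label-conditioned samples.
    d_fake: discriminator probabilities on generated samples.
    The discriminator tries to raise this value; the generator tries to
    lower it.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical discriminator outputs, for illustration only.
confused = cgan_value([0.5, 0.5], [0.5, 0.5])        # D cannot tell real from fake
confident = cgan_value([0.99, 0.98], [0.01, 0.02])   # D separates them well
```

When the discriminator outputs 0.5 everywhere, the value equals 2 log 0.5, which is the balance point at which generated samples are indistinguishable from real ones.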

C. STEP-3: NORMALIZATION STRATEGIES
Given that real-world data is sourced from diverse resources, each with its individual constraints, it is essential to acknowledge the potential for noise within the dataset. To address this, we employ normalization techniques. These techniques play a crucial role in standardizing the data and bringing it to a uniform scale, thereby facilitating more accurate and insightful analyses. Consequently, we apply min-max normalization before supplying the data to the CNN model.
The normalization process can be represented by the equation:
$$x' = \frac{x - \min(X)}{\max(X) - \min(X)}$$
In this equation, $x$ represents a data sample, and $X$ corresponds to a feature vector. This normalization approach aims to scale the data appropriately, ensuring that each feature contributes effectively to the model's training process.
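As a minimal sketch, min-max normalization of a feature vector maps its smallest value to 0 and its largest to 1. The function name and the handling of a constant vector (returning zeros to avoid division by zero) are assumptions made for illustration.

```python
def min_max_normalize(values):
    """Scale a sequence of numbers linearly into the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # Constant input: the range is zero, so return zeros by convention.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical pixel intensities, for illustration only.
pixels = [10, 20, 30]
scaled = min_max_normalize(pixels)
```

For 8-bit images whose intensities span the full range, this reduces to dividing each pixel by 255.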

D. STEP-4: PRE-TRAINED CNN MODELS
Consider a real-world scenario in which you might encounter a task with very limited or unavailable labeled data. In such situations, developing a deep learning model becomes impractical or even impossible. To address this challenge, transfer learning approaches have been introduced. Transfer learning leverages the concept of pre-trained models, specifically Convolutional Neural Network (CNN) architectures trained on abundant labeled data such as ImageNet. The effectiveness of this approach largely hinges on the choice of an appropriate CNN model. This selection depends on the specific requirements of the problem at hand. For instance, if the model's evaluation is constrained by low computational resources, opting for a CNN model with fewer parameters, such as MobileNetV2, proves advantageous. Conversely, if resource availability allows, selecting a higher-parameter model such as VGG19 could be beneficial. In this paper, we consider both types of models in our experiments, facilitating a comprehensive comparative analysis.

E. STEP 5: INCORPORATING THE LOSS FUNCTION
In the training of the CNN model, we incorporate the cross-entropy loss function as follows:

$$L_s = -\sum_{i} y_i \log \hat{y}_i$$

where $y_i$ represents the true label and $\hat{y}_i$ signifies the predicted label for the $i$-th sample.
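A minimal sketch of the softmax cross-entropy computation follows. Averaging over the batch (rather than summing) and the use of integer class indices as labels are assumptions of this illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_logits, labels):
    """Mean negative log-probability of the true class over the batch."""
    losses = []
    for logits, y in zip(batch_logits, labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[y]))
    return sum(losses) / len(losses)

# Hypothetical scores for two samples over three classes.
logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
labels = [0, 2]
print(round(cross_entropy(logits, labels), 4))
```

With uniform scores over $C$ classes the loss equals $\log C$, which is a convenient sanity check at the start of training.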
To enhance the classification capability of the model, we introduce two additional discriminative components to the classifier.

1) TRIPLET LOSS
The motivation behind adopting the triplet loss is twofold: it encourages the projection of all images of a single subject onto a single point in the embedding space, while simultaneously enforcing a margin between pairs of images of the same subject and all other subjects.
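In the context of the triplet loss, for a specific subject, the distance between an image $x_i^a$ (anchor) and all other images $x_i^p$ (positive) of the same subject should be smaller than the distance to any image $x_i^n$ (negative) of any other subject. The triplet constraint is defined as follows:

$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2, \quad \forall\, \big(f(x_i^a), f(x_i^p), f(x_i^n)\big) \in T$$

where $f(\cdot)$ denotes the embedding, $T$ is the set of all possible triplets in the training data with a cardinality of $M$, and $\alpha$ denotes the margin between positive and negative pairs. The minimized loss is formulated as:

$$T_L = \sum_{i=1}^{M} \Big[ \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \Big]_+$$

After incorporating the triplet loss with the cross-entropy loss, the final loss function becomes:

$$L = L_s + \alpha\, T_L$$

with $\alpha$ in this last expression denoting the weighting coefficient of the total loss rather than the margin.

2) CENTER-BASED LOSS
The center-based loss was initially introduced for face recognition. We adapt this loss by combining it with the cross-entropy loss for increased discrimination [96].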
The process begins by passing the training images through a network pretrained on ImageNet, yielding feature descriptors $f_{y_i}(x_i)$. The center $c_k$ of the $k$-th class is then computed as:

$$c_k = \frac{1}{N_k} \sum_{i\,:\,y_i = k} f_{y_i}(x_i)$$

where $N_k$ is the number of training images in class $k$. Furthermore, the distance $d_{ik}$ between the feature descriptor of each image and each class center $c_k$ is calculated as follows:

$$d_{ik} = \| f_{y_i}(x_i) - c_k \|_2^2$$

The overall center-based loss $Ce_L$ is computed from these distances over all $C$ classes, where $C$ is the number of classes. Upon incorporating the center-based loss with the cross-entropy loss, the final loss function becomes:

$$L = L_s + \alpha\, Ce_L$$
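The ingredients of the two discriminative terms can be sketched as follows: a hinge-style triplet penalty, the per-class feature centers, and the distances from each feature to each center. The use of squared Euclidean distance and all function names are assumptions of this sketch, and the final aggregation of the distances into the center-based loss follows [96] and is not reproduced here.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(triplets, margin):
    """Sum of hinge terms [d(a, p) - d(a, n) + margin]_+ over (a, p, n) triplets."""
    return sum(max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin)
               for a, p, n in triplets)

def class_centers(features, labels):
    """Mean feature vector c_k for each class k."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}

def center_distances(features, centers):
    """d_ik: squared distance from each feature i to every class center k."""
    return [{k: sq_dist(f, c) for k, c in centers.items()} for f in features]

# Toy 2-D features (hypothetical): two samples of class 0, one of class 1.
feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]]
labels = [0, 0, 1]
centers = class_centers(feats, labels)
print(centers)
print(center_distances(feats, centers)[0])
print(triplet_loss([(feats[0], feats[1], feats[2])], margin=1.0))
```

In the toy example the negative is already much farther from the anchor than the positive, so the hinge is inactive and the triplet term contributes nothing to the gradient.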

F. STEP 6: TRAINING THE MODEL
We implement the proposed method using mini-batch Stochastic Gradient Descent (SGD) [95]. The total loss is defined as $L = L_s + \alpha L_d$. Here, $L_s$ is the cross-entropy loss of a conventional softmax classifier, and $L_d$ can be either $T_L$ or $Ce_L$. Both functions are differentiable with respect to the inputs, allowing the parameters to be updated through standard backpropagation:

$$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}$$

where $\theta$ denotes the network parameters and $\eta$ represents the learning rate.
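The update can be sketched for a flat parameter list, given the gradients of the two loss terms. The function names and the toy numbers are hypothetical; in practice the gradients would come from backpropagation through the network.

```python
def total_loss_grad(grad_s, grad_d, alpha):
    """Gradient of L = L_s + alpha * L_d, given the two component gradients."""
    return [gs + alpha * gd for gs, gd in zip(grad_s, grad_d)]

def sgd_step(params, grad, lr):
    """One plain SGD update: theta <- theta - lr * grad."""
    return [p - lr * g for p, g in zip(params, grad)]

# Toy example with hypothetical numbers (not taken from the experiments).
theta = [0.5, -1.0]
grad_s = [0.2, -0.4]   # gradient of the cross-entropy term
grad_d = [1.0, 0.0]    # gradient of the discriminative term
theta = sgd_step(theta, total_loss_grad(grad_s, grad_d, alpha=0.1), lr=0.5)
print(theta)
```

Because the gradient is linear in the two terms, the weighting coefficient directly controls how strongly the discriminative term pulls on each update.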

G. STEP 7: ASSESSING FINE-TUNED MODELS
After training the model using the training dataset, it's essential to evaluate its performance using the testing dataset.
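The evaluation is typically done in terms of accuracy, calculated using the following formula:

$$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\%$$

This metric provides insight into how well the model's predictions align with the actual outcomes in the testing phase.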
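As a small illustration, accuracy can be computed from paired lists of true and predicted labels. The function name and the label strings are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Percentage of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

truth = ["healthy", "early_blight", "late_blight", "healthy"]
preds = ["healthy", "late_blight", "late_blight", "healthy"]
print(accuracy(truth, preds))
```

Three of the four predictions match, so the example evaluates to 75%.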

IV. EXPERIMENTATION AND RESULTS
This section presents comprehensive information regarding the experimental setup, dataset statistics, and evaluation metrics employed in this study, as well as the series of experiments conducted and their corresponding results. Additionally, a thorough comparative analysis with alternative methods is provided to offer further insights.

A. TRAINING CONFIGURATION
The training settings for the experiments are as follows. For data augmentation, all images are used at the same size, i.e., 32 × 32. The mini-batch size is 100. We use a softmax classifier and Stochastic Gradient Descent optimization with momentum 0.9 and learning rate 0.001.
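These settings can be collected in a small configuration object, shown here together with one common formulation of the momentum update (velocity accumulates the gradient, then the parameter moves against the velocity). The class and function names are hypothetical, and the exact momentum variant used in the experiments is not stated in the text.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class TrainConfig:
    image_size: int = 32          # images are 32 x 32
    batch_size: int = 100         # mini-batch size
    momentum: float = 0.9         # SGD momentum
    learning_rate: float = 0.001  # SGD learning rate

def iterations_per_epoch(num_samples, cfg):
    """Number of mini-batches needed to see every sample once."""
    return math.ceil(num_samples / cfg.batch_size)

def momentum_sgd_step(param, velocity, grad, cfg):
    """Momentum update (assumed variant): v <- mu * v + g; theta <- theta - lr * v."""
    velocity = cfg.momentum * velocity + grad
    return param - cfg.learning_rate * velocity, velocity

cfg = TrainConfig()
print(iterations_per_epoch(2550, cfg))   # 2550 is a made-up dataset size
print(momentum_sgd_step(param=1.0, velocity=0.0, grad=2.0, cfg=cfg))
```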

C. EXPERIMENT SETTINGS
1) ZERO-SHOT TRANSFER LEARNING (ZsTL)
In this setting, we explore pre-trained CNN models in two ways: without fine-tuning and with fine-tuning. In the setting without fine-tuning, we assess the performance of the pre-trained model on the target domain samples directly, without any further adjustments. In the fine-tuning setting, we take the existing pre-trained model and fine-tune its weights using source data samples. This fine-tuning process runs for 100 epochs, after which we evaluate the model's performance on the target domain samples.

2) ZsTL WITH DATA AUGMENTATION USING GANs (ZsTL-DAG)
In this setting, we use both images generated by conditional generative adversarial networks (CGANs) and the existing images as source domain images. The images newly generated through GANs are combined with the original source domain images. Specifically, we generate 200, 300, and 500 samples per class from random noise. We evaluate the model trained on this combined dataset against the target domain in two scenarios: without fine-tuning the pre-trained model and with fine-tuning the pre-trained model.

3) ZsTL WITH DATA AUGMENTATION USING CROPPING (ZsTL-DAC)
In this experimental setup, we generate new images from the original images. Original images from the different classes are used for data augmentation. Each original image is first rotated by a specific angle, and the resulting image is saved. Because the corners of the rotated image may fall outside the frame, its edges are cropped to fit the fixed dimensions. The rotation angles considered are multiples of 5 degrees, from 5 up to 90 degrees.
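The rotate-then-crop procedure can be sketched on a small 2-D grid standing in for an image. The nearest-neighbour sampling, the fill value for uncovered corners, and the function names are assumptions of this illustration; an image library would normally be used instead.

```python
import math

def rotation_angles(step=5, max_angle=90):
    """Angles 5, 10, ..., 90 degrees."""
    return list(range(step, max_angle + 1, step))

def rotate_and_crop(image, angle_deg, fill=0):
    """Rotate a 2-D grid about its center, keeping the original height and width.

    Pixels that rotate outside the frame are dropped (cropped); output pixels
    with no source pixel are set to `fill`.
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = cos_t * dx + sin_t * dy + cx
            sy = -sin_t * dx + cos_t * dy + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
augmented = [rotate_and_crop(img, a) for a in rotation_angles()]
print(len(augmented))
print(rotate_and_crop(img, 90))
```

With a 5-degree step up to 90 degrees, each original image yields 18 rotated variants of the same size as the input.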

4) ZsTL WITH DISCRIMINATIVE INFORMATION PRESERVATION (ZsTL-DIP)
To preserve the discriminative information during the experiment, we incorporate center-based losses (ZsTL-DIP-CL) and triplet-based losses (ZsTL-DIP-TL). These additional loss functions are applied during the fine-tuning of the existing pre-trained model, alongside the cross-entropy loss.

5) ZsTL WITH DATA AUGMENTATION USING GANs AND DISCRIMINATIVE INFORMATION PRESERVATION (ZsTL-DAG+DIP)
In this context, we enhance the data from the source domain by incorporating images generated from CGANs, while simultaneously preserving the essential characteristics of the target domain through the utilization of center and triplet losses.

6) ZsTL WITH DATA AUGMENTATION USING CROPPING AND DISCRIMINATIVE INFORMATION PRESERVATION (ZsTL-DAC+DIP)
In this experiment, we enhance the data from the source domain by employing simple techniques such as rotation, flipping, and so on. The pre-trained model is then fine-tuned on the augmented data using discriminative losses to further improve its effectiveness.

7) ZsTL WITH DATA AUGMENTATION USING CROPPING AND GANs, DISCRIMINATIVE INFORMATION PRESERVATION (ZsTL-DAC+DAG+DIP)
In this experiment, we incorporate both data generated using GAN models and data obtained through cropping techniques. Subsequently, the pre-trained model is fine-tuned on the augmented data using discriminative losses to improve its effectiveness.

D. RESULTS AND DISCUSSION
1) ZERO-SHOT TRANSFER LEARNING (ZsTL)
The first column in Table 5 displays the testing results of the models on the target domain, where the models were not trained using any data from the source domain. Models trained with additional data generally demonstrate higher accuracy compared to those without additional training. This indicates the importance of training on relevant datasets to improve model performance. Among the models considered, the VGG16 pre-trained model demonstrates the best performance with the highest accuracy of 50.27%. Following closely is the VGG19 model, achieving an accuracy of 49.95%. Among the more computationally efficient models, MobileNet-v2 performs reasonably well with an accuracy of 45.07%, surpassing the other models. However, the MnasNet1_0 model lags behind and exhibits poor performance compared to all the other models.
In the second column of Table 5, the testing results of the models on the target domain are presented, where the models were trained using data from the source domain containing classes unknown to the target domain. The improvement in all tasks clearly demonstrates that testing accuracy can be enhanced by leveraging data from other datasets with unknown classes. Among all the models, the VGG16 model achieves the highest accuracy of 71.46%, closely followed by the EfficientNet_b0 and VGG19 models with accuracies of 66.68% and 65.42%, respectively. Notably, the EfficientNet_b0 model shows an 83.53% accuracy improvement. Comparing the models in terms of the percentage improvement in accuracy, the MobileNet-V3-Small model stands out with the highest improvement of 158%. Following it is the ShuffleNet_v2_x0_5 model, which exhibits an improvement of 105%.
Without fine-tuning the pre-trained models, VGG16 achieves the highest accuracy of 44.92%, surpassing all others. Once the pre-trained model parameters are refined, MnasNet1_0 stands out with an accuracy of 59.73%, marking a substantial leap from its initial 39.31% accuracy.
In summary, the findings underscore the potential benefits of leveraging data from diverse datasets to improve the testing accuracy of models in domains with unknown classes.
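The percentage improvements quoted in this subsection exceed 100% in some cases, which suggests they are relative to the baseline accuracy rather than absolute differences in percentage points. Under that assumption, the calculation can be sketched as follows, using the MnasNet accuracies quoted above (39.31% before and 59.73% after fine-tuning); the function name is hypothetical.

```python
def relative_improvement(before, after):
    """Percentage change of `after` relative to `before`."""
    return 100.0 * (after - before) / before

# MnasNet accuracies quoted in the text: 39.31% -> 59.73%.
gain = relative_improvement(39.31, 59.73)
print(round(gain, 2))
```

The absolute gain in this example is 20.42 percentage points, which corresponds to a relative improvement of roughly 52% over the baseline.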

2) ZsTL WITH DATA AUGMENTATION USING GANs
In augmentation using GAN models, a notable pattern emerges when adding different numbers of images per class. The VGG16 model stands out, surpassing all other models with an accuracy of 73.14%. Augmenting 200 images per class results in a 2% improvement for the VGG16 model, a 5% improvement for the VGG19 model, and a significant 9% improvement for the AlexNet model, compared to their respective baseline accuracies. Furthermore, other models also demonstrate improvements when augmenting with varying numbers of images per class. The ResNet18 model shows an 8% improvement when adding 300 images per class, the DenseNet121 model achieves a 12% improvement under the same augmentation, and the ShuffleNet model experiences a 4% improvement. Likewise, when augmenting with 500 images per class, the GoogLeNet, EfficientNet, and MobileNet-v2 models show improvements of 4%, 7%, and 2%, respectively. These results underscore the efficacy of augmentation using GANs in enhancing model performance across various architectures. The VGG16 model, in particular, achieves the highest accuracy compared to all other models in this specific scenario. This highlights the potential of GAN-based augmentation as a valuable technique for improving the performance of deep learning models in classification tasks.
Upon augmenting the synthetic dataset with 200 images per class, VGG16, VGG19, ResNet18, and MobileNet-V3-Small emerge as the top performers in terms of accuracy among the various models. Notably, VGG19 exhibits a 23% increase in accuracy. However, introducing 300 and 500 instances per class into the source domain may not yield favorable outcomes for most of the fine-tuned models. Nonetheless, there are improvements observed in the GoogLeNet and EfficientNet_b0 models after the addition of 500 instances per class in the source domain. Likewise, the MobileNet-V2 model achieves its peak accuracy after the introduction of 300 instances per class.
In summary, the findings suggest that training models with additional relevant data can improve accuracy. Different models show varying levels of performance based on the tasks and training data used. Fine-tuning and data augmentation can also have significant impacts on model accuracy.

3) ZsTL WITH DATA AUGMENTATION USING CROPPING
When applying rotation and cropping techniques for augmentation, notable insights emerge when considering varying quantities of images per class, as depicted in Table 6. High accuracy is attained through rotation and cropping, especially when maintaining a uniform number of images across classes. In Task-2's accuracy assessment, the most noteworthy progress is observed in the VGG16 model, which achieves a 53.15% accuracy increase over the base model when supplemented with 200 instances. The EfficientNet_b0 and MobileNet_v2 models exhibit accuracies of 40.97% and 44.75%, respectively, upon augmentation with 300 instances. Furthermore, employing 500 instances results in accuracies of 42.73%, 43.66%, and 61.31% for the ResNet18, GoogLeNet, and MnasNet1_0 models, respectively.
The inclusion of rotation and cropping techniques for data augmentation leads to substantial accuracy improvements, particularly when maintaining a consistent number of augmented images per class, across a variety of model architectures.

4) ZsTL WITH DISCRIMINATIVE INFORMATION PRESERVATION (ZsTL-DIP)
The experimental results employing two distinct discriminators, namely Center Loss and Triplet Loss, are presented in Table 7. Upon comparing the performance of Triplet Loss and Center Loss, a discernible pattern emerges: Triplet Loss consistently outperforms Center Loss across the majority of the models. These comparative results underscore the significance of maintaining the discriminative information of source domain samples, which notably enhances the performance of the target domain model.

5) ZsTL WITH DAG+DIP
Table 8 presents a performance evaluation of various models and configurations, focusing on the achieved accuracy when using data instances generated through a GAN with a discriminator. In the context of Task T-1, a thorough examination of the data in Table 8 reveals a distinct pattern: when employing a GAN model to augment images, the triplet loss method preserves discriminative information more effectively than the center loss method.
For instance, notable models such as VGG19, ResNet18, and GoogLeNet consistently outperform other settings, especially when combined with the TrL+AUG-500 configuration. Similarly, the MobileNet_v2 model demonstrates strong performance when coupled with the TrL+AUG-200 setting. These observations emphasize the superiority of the triplet loss approach in maintaining discriminative information while augmenting data through a GAN.
When delving into Task T-2, an interesting contrast emerges: the utilization of center loss to preserve discriminative information surpasses the performance of triplet loss. To illustrate, when combined with the CeL+AUG-500 configuration, specific models such as GoogLeNet, DenseNet121, MobileNet_v2, and ShuffleNet_v2_x0_5 consistently excel, outperforming all other settings.
In summary, it becomes evident that the preservation of discriminative information, coupled with augmentation techniques, plays a crucial role in enhancing the performance of the source domain model as it translates to the target domain. As anticipated, many models demonstrate superior performance compared to other experimental settings. In T-1, the utilization of Triplet Loss with 500 instances per class results in the most notable accuracy increase. For instance, MnasNet1_0 experiences a remarkable 76.99% accuracy rise to 47.90%, while DenseNet121 achieves a commendable 71.7% accuracy with Center Loss and 200 instances per class. Among the models, VGG19 stands out, achieving the top accuracy of 75.17% with Triplet Loss and 300 instances.
Transitioning to T-2, further accuracy enhancements are observed. The MobileNet_v2, DenseNet121, GoogLeNet, and ShuffleNet_v2_x0_5 models achieve accuracy increases of 5.74%, 3.4%, 2.3%, and 4.5%, respectively, with Center Loss and 500 instances per class. Additionally, EfficientNet_b0 attains the highest accuracy across all settings, with a notable 50.96% accuracy with Triplet Loss and 500 instances per class.
These findings underscore the substantial impact of combining various instance sources with discriminative loss methods, further validating their effectiveness in enhancing model performance.

E. CONVERGENCE ANALYSIS
To gain insights into the convergence patterns exhibited by various models across different scenarios, we plot the training and testing accuracies in Figure 6. The graphs present the training and testing accuracies on the Y-axis and the number of epochs on the X-axis. This approach allows us to track the evolution of accuracy values over time and to reveal trends and convergence behaviors. These visual representations provide insights into the models' learning processes and their performance under diverse conditions.
Graph (f) depicts the results of Zero-Shot Transfer Learning with Center Loss and Triplet Loss discriminators executed on data-augmented images generated by GANs and cropping, employing 500 instances per class with the MnasNet1_0 model. The graph highlights that training accuracies for both Triplet Loss and Center Loss are consistently equivalent. However, testing accuracies differ. The initial testing accuracy for Triplet Loss is higher, gradually decreasing and then rising again as epochs progress. In contrast, the testing accuracy for Center Loss starts lower but steadily improves as epochs increase. By the 100th epoch, the testing accuracy for Center Loss lags behind Triplet Loss by 1.07%.

V. CONCLUSION
Our paper introduces a Zero-Shot Transfer Learning framework as a solution to the challenge of inadequate data availability in the target domain for leaf disease classification. By integrating state-of-the-art CNN models and a range of techniques such as data augmentation, synthetic data generation, and robust discriminative losses, we establish a methodology for effectively transferring knowledge from a well-populated source domain to a less abundant target domain. Through comprehensive experimentation on plant disease classification datasets, we validate the contribution of each model and technique within the framework.
The results obtained from our experiments highlight the significant improvements in classification accuracy achieved through the implementation of these techniques. This not only confirms the effectiveness of the Zero-Shot Transfer Learning framework but also underscores the broader potential of this approach in enhancing disease classification within agricultural contexts. As agriculture continues to face challenges in a rapidly changing environment, leveraging innovative methods like Zero-Shot Transfer Learning could play a crucial role in optimizing crop yield and thereby supporting food security. Our research contributes to the ongoing advancements in agricultural technology and opens up avenues for further exploration of data-driven approaches for tackling real-world agricultural challenges.

FIGURE 2. Zero-shot transfer learning: source domain classes, target domain classes.
The Zero-Shot Transfer Learning framework pipeline is shown in Figure 3. This diagram illustrates a comprehensive architecture designed for the classification of plant leaf diseases. The process encompasses several key steps, including data preparation, augmentation, normalization, integration of a pre-trained model, consideration of loss functions, model training, and final model testing. The pipeline is structured into seven distinct phases:
Dataset Preparation (Step 1): This phase involves the selection and preparation of datasets for the zero-shot learning process.
Augmentation Techniques (Step 2): Step 2 focuses on the application of various augmentation techniques to enhance the diversity and richness of the dataset.
Normalization Strategies (Step 3): Step 3 highlights the implementation of normalization techniques to ensure consistent and standardized input data.
Pre-trained CNN Models (Step 4): In Step 4, a range of pre-trained Convolutional Neural Network (CNN) models are presented for feature extraction.
Loss Function Integration (Step 5): Step 5 involves the integration and comparison of different loss functions, contributing to the overall effectiveness of the model.
Fine-tuning the Model (Step 6): Step 6 details the fine-tuning process, where the model is adjusted to improve its performance and adapt it to the specific task.
Evaluation of Fine-tuned Models (Step 7): The pipeline concludes with the evaluation of the fine-tuned models on the target data, assessing the model's ability to accurately classify plant leaf diseases.

A. STEP-1: DATA PREPARATION
Plant Village Dataset: The Plant Village dataset consists of 38 classes, organized by crop type and disease type. The crops in the fruit category are Apple, Blueberry, Cherry, Grape, Orange, Peach, Raspberry, and Strawberry. The crops in the vegetable category are Corn, Bell Pepper, Potato, Soybean, Squash, and Tomato. The classes are grouped by disease type: fungal, bacterial, oomycete, viral, and mite. In addition, the dataset contains classes of healthy leaves. The Plant Village dataset consists of 54,303 images from 14 different crop types, including 26 disease categories and 14 healthy leaf classes, as shown in Table 3. To evaluate the performance of the Zero-Shot Transfer Learning model, experiments with different settings were performed. The models were trained on images from one group of classes, treated as the source domain, and tested on images from another group of classes, treated as the target domain. Initially, experiments were run on the source data with different discriminators. Next, instances generated by the data augmentation model were added to the original images and treated as source data for one set of training runs. Finally, data-augmented images were added to the original instances for another set of training runs.
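The source/target grouping can be sketched as a partition of class names by crop. The "<Crop>___<Condition>" naming pattern, the choice of Tomato as source and Potato as target, and the function name are all assumptions made for illustration; the exact class groupings used in the experiments are not listed here.

```python
def split_domains(class_names, source_crop, target_crop, sep="___"):
    """Partition class names into source and target sets by crop prefix."""
    source = [c for c in class_names if c.split(sep)[0] == source_crop]
    target = [c for c in class_names if c.split(sep)[0] == target_crop]
    # In the zero-shot setting the two domains must not share any class.
    if set(source) & set(target):
        raise ValueError("source and target classes must be disjoint")
    return source, target

# Hypothetical class names in "<Crop>___<Condition>" form.
classes = [
    "Tomato___Early_blight", "Tomato___Late_blight", "Tomato___healthy",
    "Potato___Early_blight", "Potato___Late_blight", "Potato___healthy",
    "Apple___healthy",
]
source, target = split_domains(classes, source_crop="Tomato", target_crop="Potato")
print(source)
print(target)
```

Classes belonging to neither crop (here the Apple class) are simply left out of both domains.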

FIGURE 4. Data-augmented images generated by rotating (rotation angle indicated at the bottom of each image) and cropping the original image of the Tomato Healthy leaf class (first row, first column).

FIGURE 5. Data-augmented images generated by generative adversarial networks: original images of the Tomato Late Blight, Tomato Healthy, and Tomato Early Blight classes, with generated images for different numbers of epochs.
Graph (a) illustrates the outcomes of Zero-Shot Transfer Learning with Center Loss and Triplet Loss discriminators executed on data-augmented images generated by GANs with 200, 300, and 500 instances per class using the EfficientNet_b0 model. The graph demonstrates that training accuracies remain consistently high regardless of the instance count. Concerning testing accuracies, the results for 500 instances per class surpass those of 200 and 300 instances. The accuracies of 200 and 300 instances exhibit minor variations, with slightly higher values for the latter. Graph (b) showcases the results of Zero-Shot Transfer Learning with Center Loss and Triplet Loss discriminators executed on data-augmented images generated by cropping, featuring 200, 300, and 500 instances per class using the VGG19 model. The graph indicates that accuracy values fluctuate until around 50 epochs, after which variations stabilize. Notably, the testing accuracy for 500 instances per class starts higher compared to 200 and 300 instances. Graph (c) displays the outcomes of Zero-Shot Transfer Learning with Center Loss and Triplet Loss discriminators executed on the MobileNet_V2 model. The graph illustrates that testing accuracy values exhibit more fluctuations with Triplet Loss compared to Center Loss. With Center Loss, training accuracy rises until approximately 20 epochs, beyond which it gradually stabilizes. In contrast, the accuracy of Triplet Loss fluctuates throughout all epochs.

FIGURE 6. Accuracy of different models with instances generated by the GAN model and augmented models with center loss and triplet loss. Symbol notation in the graphs: TR, train accuracy; TS, test accuracy; CeL, center loss; TrL, triplet loss.

TABLE 1. State-of-the-art CNN models.

TABLE 2. Comparative study on the dataset for different architectures with discriminator and data augmentation.

TABLE 3. Plant Village dataset images by category and crop type.

TABLE 5. Zero-shot transfer learning with data augmentation using GAN models.

TABLE 6. Zero-shot transfer learning with data augmentation using cropping.

TABLE 7. Zero-shot transfer learning with discriminative information preservation.

TABLE 8. Zero-shot transfer learning with discriminator on GAN model.

TABLE 9. Zero-shot transfer learning with discriminator on augmented images.

TABLE 10. Zero-shot transfer learning with discriminator on GAN and augmented images.

Table 10 provides a comprehensive overview of accuracy across different models and configurations, focusing on instances generated through data augmentation, synthetic instances, and original instances combined with Center Loss and Triplet Loss discriminators. The experiment involves the incorporation of augmented instances alongside original and synthetic instances, with the addition of 200, 300, and 500 instances per class. Both Center Loss and Triplet Loss are employed for Tasks T-1 and T-2.