A Comprehensive Review on Deep Learning Assisted Computer Vision Techniques for Smart Greenhouse Agriculture

With the escalating global challenges of food security and resource sustainability, innovative solutions like deep learning and computer vision are transforming agricultural practices by enabling data-driven decision-making. This paper provides a focused review of recent advancements in deep learning-enabled computer vision techniques tailored specifically for greenhouse environments. First, deep learning and computer vision fundamentals are briefly introduced. Over 100 studies from 2020 to date are then comprehensively reviewed in which these technologies were applied within greenhouses for growth monitoring, disease detection, yield estimation, and other tasks. The techniques, datasets, models, and overall performance results reported in the literature are analyzed. Tables and figures showcase real-world implementations and results synthesized from current research. Key challenges are also outlined related to aspects like model adaptability, lack of sufficient labeled greenhouse data, computational constraints, the need for multi-modal sensor fusion, and other areas needing further investigation. Future trends and prospects are discussed to provide guidance for researchers exploring computer vision in the niche greenhouse domain. By condensing prior work and elucidating the state-of-the-art, this timely review aims to promote continued progress in smart greenhouse agriculture. The focused analysis, specifically on greenhouse environments, fills a gap compared to previous agricultural surveys. Overall, this paper highlights the immense potential of computer vision and deep learning in driving the emergence of data-driven, smart greenhouse farming worldwide.


I. INTRODUCTION
In the face of escalating global challenges such as population growth, climate change, and urbanization, the call to reimagine our agricultural and horticultural systems becomes more critical than ever. The shadow of these issues hovers ominously over future food security, intertwined with the sustainable usage of our dwindling natural resources [1]. Yet, in this realm of concerns, a beacon of promise emerges from technological advancements, holding the potential to revolutionize the agricultural and horticultural landscapes [2]. Our pursuit, therefore, centers on the strategic leverage of these cutting-edge technologies to create more sustainable, productive, and resilient farming ecosystems worldwide.
The associate editor coordinating the review of this manuscript and approving it for publication was Yongjie Li.
Sustainable access to high-quality food is not just a predicament for developing countries; even developed nations are grappling with this issue. Current agricultural practices, conducted predominantly in open fields, are not sufficiently productive to meet the escalating demand. As the World Health Organization projects, food production will need to increase by 70% by 2050 to meet the needs of an estimated global population of 10 billion [3], about 7 billion of whom will be living in urban areas [4]. Thus, exploring alternative production systems becomes not just an option but a necessity to ensure a sustainable food supply chain.
One promising alternative is Controlled Environment Agriculture (CEA), which includes a variety of methods such as greenhouses, high-tunnels, vertical farms, and plant factories [5]. In particular, greenhouse farming has emerged as a method offering greater control over the growing conditions of crops, thereby enhancing yield and quality. This technique leverages controlled environments to optimize plant growth, promising higher production rates compared to traditional farming [6]. However, despite the potential of greenhouse farming within the CEA framework, it faces certain challenges. Economic sustainability remains a significant concern due to high operational costs, complex microclimate controls, and the need for continuous labor [7], [8]. These factors can inhibit the scalability and efficiency of greenhouse farming. To overcome these obstacles, it is imperative to incorporate advanced technologies such as artificial intelligence (AI) and deep learning (DL) into these farming systems. These cutting-edge tools can potentially transform the operational aspects of greenhouse farming. For instance, they can be leveraged for improved micro-environment monitoring and root-zone control, creating the optimal conditions for plant growth while minimizing resource wastage [5]. Additionally, AI and DL can facilitate the automation of labor-intensive tasks, thereby enhancing overall operational efficiency. In this context, the adoption and integration of AI and DL in greenhouse farming becomes not only a strategic advantage but a necessity to realize the full potential of CEA in addressing our global food security challenges.
Simultaneously, the sphere of horticulture presents a distinct set of challenges and opportunities. Here, we face the necessity of balancing the cultural and economic value of crops with the labor-intensive nature of their cultivation. Thankfully, modern advancements offer promising solutions. Deep learning, in particular, holds the potential to streamline and revolutionize horticultural practices. The sheer volume of data that can now be collected from digital horticulture necessitates efficient processing and analysis. Deep learning algorithms can effectively handle such 'big data', enabling precise and timely decision-making in crop management [9].
In light of these advancements, it is crucial to remember that the applicability of such technologies is not limited to large-scale farming. Small indoor farms, which require significant labor year-round, can also benefit from the integration of intelligent automation. This inclusive approach to technological application in agriculture is critical to ensure food security in the long term.
Through this review, we strive to explore and underline the recent advances in deep learning-assisted computer vision technologies for agriculture and horticulture in greenhouse settings. We aim to shed light on the challenges, opportunities, and future prospects of these technologies, underlining their potential role in securing global food supply chains, improving horticultural productivity, and propelling us towards a sustainable future.

A. REVIEW SCOPE
In our extensive review of over 100 research papers sourced from esteemed scientific databases such as ScienceDirect, Web of Science, IEEE Xplore, and Scopus, we noticed a trend: while many surveys tackled the broader agricultural domain, there was a distinct gap in the literature specifically focusing on greenhouse farming. Recognizing this limited attention to greenhouse environments, we sought to make a significant contribution by narrowing our scope and offering an in-depth analysis of computer vision applications within greenhouse farming. This work emphasizes the state-of-the-art in computer vision techniques for this specialized agricultural setting. Table 1 provides a comparison between existing survey papers on deep learning-based computer vision applications in the broader agricultural domain and our niche exploration into greenhouses. With our paper, our aim was to shed light on every potential application of computer vision in greenhouse agriculture. This review is intended not only for agricultural researchers keen on understanding nuances in greenhouse setups but also for general computer vision enthusiasts curious about its specific applications in such controlled environments. Furthermore, we have pointed out the real-world impacts and challenges of scaling up these innovative solutions within greenhouses.

B. CONTRIBUTION
4486 VOLUME 12, 2024 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Numerous studies have explored the application of deep learning and computer vision techniques in the field of agriculture. However, there is a noticeable gap in the existing literature on the comprehensive implementation of these techniques within the context of greenhouse farming. This work aims to address that gap by providing a comprehensive analysis of the implementation of deep learning techniques in greenhouse farming, with specific emphasis on the application of computer vision. The key contributions are summarized as follows:
• Provides an in-depth literature review focused specifically on deep learning-enabled computer vision applications in greenhouse environments, addressing a gap in existing research.
• Investigates and synthesizes techniques, capabilities, limitations, and future work needed for computer vision across major greenhouse application areas including growth monitoring, disease detection, yield estimation, etc.
• Analyzes performance of computer vision techniques based on results in current literature and summarizes findings in coherent tables.
• Explains common deep learning architectures leveraged in agricultural research to enhance reader comprehension.
• Highlights challenges that need to be addressed for effective real-world implementation in greenhouses.
• Provides visualizations of real-world systems and schematics to showcase practical applications.
• Outlines future directions for advancements in computer vision technologies tailored to controlled agriculture settings.
• Focuses the scope on an understudied niche area of computer vision in greenhouses to fill a literature gap and make novel contributions.
Overall, this review paper makes significant contributions by providing a focused technical synthesis, performance analysis, and future outlook specifically targeted to greenhouse applications of computer vision. The findings aim to catalyze advancements in this promising domain.
The paper is organized as follows. Section I provides an introduction covering the background, objectives, and scope of the literature review. Section II presents an overview of smart greenhouses and controlled environment agriculture. Section III delves into the fundamentals of computer vision and deep learning, the key technologies explored. Section IV, the core of the paper, investigates various applications of computer vision in greenhouse farming based on current literature, spanning areas like growth monitoring, disease detection, yield estimation, and more. For each application, the techniques, results, limitations, and future work are discussed. Section V summarizes the key challenges faced in implementing computer vision in greenhouses and provides perspectives on the future direction of these technologies in advancing greenhouse agriculture. Finally, Section VI gives concluding remarks and highlights the contributions made by this focused review paper. Overall, the logical organization facilitates comprehension and showcases the in-depth analysis of computer vision capabilities, specifically in controlled greenhouse environments.

II. SMART GREENHOUSE: AN OVERVIEW
Controlled Environment Agriculture (CEA), a modern farming method that uses greenhouses, is a powerful tool in today's agriculture. This method is not new; in fact, greenhouses first started to be used in farming in The Netherlands and France back in the 19th century [13]. Since then, the technology has improved, and its use has spread worldwide. So, what exactly is a greenhouse? Essentially, a greenhouse is a structure, often built from glass or plastic, that allows for year-round crop production, irrespective of the season. The glass or plastic walls and roof let sunlight in while keeping pests, diseases, and bad weather out. Depending on the outside weather, different variations of greenhouses can be used. For example, in very cold places, smaller greenhouses called "cold frames" can be used. These structures trap heat from the sun to keep the plants warm. In hot and dry areas, "shade house" greenhouses can be used. These provide shade and help to keep the plants moist [7]. As illustrated in figure 1, a modern greenhouse is a structure that provides a controlled environment ideal for crop cultivation. The image depicts a typical setup, including the protective coverings and the internal layout. This practical example gives a clear understanding of how greenhouse farming can defy external weather conditions and provide suitable growth conditions for diverse crops.
The main advantage of greenhouses is the ability to manipulate environmental parameters such as temperature, light intensity, moisture, and nutrient levels, adapting them to specific crop needs [15]. This facilitates an extended growing season, improved crop quality, and efficient use of resources. However, the management and maintenance of greenhouses can be resource-intensive, and fine-tuning conditions to the optimal range for various crops can be a complex task. This is where smart greenhouses, equipped with Internet of Things (IoT) technology, come into the picture. Smart greenhouses integrate sensors and embedded controllers that collect real-time data and relay it to a cloud server. The system can then make automatic adjustments to the internal greenhouse conditions based on this data, minimizing human intervention [16], [17], [18]. Smart greenhouses offer automatic regulation of critical factors like temperature, light, and irrigation, as well as control over other mechanical operations [19]. This brings a new level of efficiency to farming, optimizing resource use and potentially improving crop yields.
FIGURE 1. A real-life example of a modern greenhouse. This image offers a practical perspective of the controlled environment within which diverse crops can be cultivated irrespective of external weather conditions. Source: [14].
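The sense-then-adjust loop described above can be illustrated with a minimal sketch. The function name, the setpoint, and the hysteresis band are hypothetical values chosen for illustration and are not taken from the cited systems; a deployed controller would also need to handle sensor faults, several actuators, and crop-specific targets.

```python
def ventilation_command(temp_c, fan_on, setpoint_c=26.0, band_c=1.0):
    """Return the new fan state for one temperature reading.

    Hypothetical rule: switch the fan on above setpoint + band,
    off below setpoint - band, and otherwise keep the current state
    (the hysteresis band avoids rapid on/off switching near the setpoint).
    """
    if temp_c > setpoint_c + band_c:
        return True
    if temp_c < setpoint_c - band_c:
        return False
    return fan_on


# Replay a short series of made-up sensor readings (degrees Celsius).
readings = [24.0, 26.5, 27.5, 26.2, 24.8]
state = False
history = []
for reading in readings:
    state = ventilation_command(reading, state)
    history.append(state)
```

In this toy run the fan switches on only once the reading exceeds 27 °C and stays on until the reading drops below 25 °C, which is the kind of automatic adjustment that reduces the need for manual intervention.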
Moreover, smart greenhouses provide farmers with valuable insights into the most suitable harvesting times, soil quality, nutrient requirements for plants, and water quality [20]. This data-driven approach allows for more informed decision-making, making farming more reliable and cost-effective. The future of farming appears even more promising with the integration of Artificial Intelligence (AI) and Computer Vision. These technologies can further automate the greenhouse processes and increase their precision [21], [22], [23]. For instance, AI can analyze the vast amounts of data collected, predicting future needs and helping make even more accurate adjustments [24], [25]. Computer Vision can monitor plant growth, detect diseases early, and even identify when crops are ready to be harvested [12], [26], [27], [28], [29]. In essence, the marriage of greenhouse farming with IoT, AI, and Computer Vision is ushering in a new era of efficient, data-driven agriculture, transforming the traditional greenhouse into a 'smart' one. This integration, as illustrated in figure 2, provides a complete picture of how smart greenhouses are designed with various components working harmoniously. The ultimate vision is a fully automated and remotely controlled farm, optimizing resources, enhancing crop yield, and paving the way for a sustainable agricultural future.

III. VISION THROUGH LEARNING: A DEEP DIVE INTO COMPUTER VISION AND DEEP LEARNING
Computer vision (CV) represents a sophisticated interdisciplinary field deeply rooted in both biological science and engineering, converging human-like perception with machine efficiency. Its core idea revolves around replicating how humans see and understand their surroundings and translating this understanding into computational models that machines can utilize. Historically, CV began as an effort to mimic human visual faculties. Early research aimed at understanding how humans perceive the world and then endowing machines with similar visual abilities. This required extensive collaboration between neuroscience to decode the human visual system and computer science to develop corresponding algorithms for machines [30]. Groundbreaking applications such as optical character recognition (OCR) [31] and vehicle plate detection marked the initial steps in this journey, impacting diverse areas like traffic control, law enforcement [32], and even retail [33]. These achievements were furthered by the use of deep learning (DL) and neural networks, allowing for the automatic extraction of high-level features from data, reducing the need for human intervention.
The evolutionary development of CV expanded into more advanced areas like medical imaging, for detecting and diagnosing diseases [34], and autonomous driving, where it guides self-driving cars [35]. In manufacturing, CV aids in quality control with machine consistency, surpassing human ability [36]. It even breathes life into virtual reality in entertainment and social interaction [37] and enhances personalized safety measures through facial recognition systems in security [38], [39]. A vivid representation of these diverse applications of computer vision across various sectors can be seen in figure 3. An intriguing expansion of CV can be seen in the agricultural domain [12], [40], [41], [42]. Here, CV assists not only in crop-related tasks but also plays a significant role in livestock management. It contributes to monitoring animal health, tracking behavior, and managing resources, thereby increasing efficiency and sustainability in animal farming. For crop health, CV enables the detection of diseases, identification of pests, and optimization of resource allocation. With the analysis of visual data, CV tools can automate tasks like fruit picking, guide precision farming techniques, and even assist in tasks related to animal husbandry, such as recognizing individual animals, monitoring their movements, and observing their health status.
Despite these advances, achieving a computer's understanding of an image at a human child's level remains unattainable. This underscores the ongoing challenges in a field with nearly limitless possibilities. With the continuous progress in machine learning, deep neural networks enable machines to recognize patterns and make decisions once believed to be exclusive to humans. CV is more than a field of study; it is a continually evolving area that harmoniously blends human perception with machine precision. Its applications are enriching various societal sectors, sometimes even surpassing human capabilities.

A. FROM BASIC SHAPES TO COMPLEX PATTERNS
In the initial stages of computer vision, the primary focus was on identifying simple and basic geometric forms such as edges, curves, and corners. These early techniques were often grounded in methods like gray-level segmentation [43], which involves dividing an image into different regions based on variations in brightness or color. However, these primitive methods had significant limitations and were not robust enough to handle more complicated visual tasks where understanding and interpreting complex patterns were required. To address these limitations and enhance the interpretation of visual data, researchers sought new approaches and began to integrate artificial neural networks into computer vision systems [44]. Artificial neural networks, inspired by the human brain's interconnected neuron structure, allowed for a more sophisticated analysis of visual data [45]. Unlike the pixel-by-pixel analysis used in more rudimentary methods, neural networks provided a way for computer vision systems to analyze entire sections of an image in context, providing a more holistic understanding of the image content. This shift in approach led to substantial improvements in both performance and accuracy, enabling computer vision systems to understand and recognize more intricate patterns and shapes. Furthermore, these new methods allowed computer vision systems to analyze dynamic visual data, such as videos, enabling a higher level of interpretation that considered not only the shapes and patterns within a single frame but also how these elements changed over time [46]. The integration of neural networks marked a turning point in the field, transforming it from a discipline that could only handle rudimentary visual tasks to one that could take on complex challenges. This evolution also laid the groundwork for the modern, advanced applications of computer vision that we see today.
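As a concrete illustration of the gray-level segmentation mentioned above, the following minimal sketch labels each pixel by comparing its brightness with a single fixed threshold. The image values and the threshold of 128 are made up for illustration; the methods in [43] may select thresholds differently.

```python
def threshold_segment(image, threshold):
    """Label each pixel 1 (foreground) if brighter than the threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Made-up 3x4 grayscale image (0 = black, 255 = white).
image = [
    [12, 20, 200, 210],
    [15, 180, 220, 30],
    [10, 25, 35, 40],
]
mask = threshold_segment(image, threshold=128)
```

Because each pixel is judged in isolation, a rule like this ignores the surrounding context, which is the limitation that motivated the move to neural networks.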
Today's computer vision applications follow a process that includes the acquisition of image data, the processing and analysis of that data using sophisticated algorithms and neural networks, and ultimately, the interpretation of the image, as shown in figure 4. These applications range from medical imaging to autonomous vehicles, demonstrating the field's progress from understanding basic shapes to interpreting complex, dynamic patterns.

B. THE TOOLBOX OF COMPUTER VISION: TECHNIQUES AND TASKS
Computer vision is a diverse and exciting field that tries to mimic how humans perceive and understand the world around them. Its primary goal is to make sense of images, just like our own eyes and brains do. By understanding what is depicted in an image, computer vision systems can provide valuable information to guide the actions of robots, AI, and other automated systems. In the following subsections, we delve deeper into the critical techniques that make this possible: image classification, object detection, segmentation, and 3-D reconstruction.

1) IMAGE CLASSIFICATION
Image classification, a fundamental technique in computer vision, assigns predefined labels to images, categorizing complex visuals. Its essence is the ability to capture the essential features of a scene, such as differentiating crops or identifying plant diseases, without focusing on each tiny detail. The advent of Convolutional Neural Networks (CNNs) has led to a significant transformation in this process. These robust networks use a variety of mathematical processes to efficiently learn from labeled images of crops (figure 5). The revolution began in 2012 with AlexNet [47], a novel CNN architecture that set a benchmark in the ImageNet Large Scale Visual Recognition Challenge and ushered in a new era for image classification via CNNs. Between 2014 and 2017, there was a surge in refining CNN architectures, as seen with advancements in models like ResNet [48], VGG [49], and DenseNet [50]. These models, by employing innovative algorithms and connectivity patterns, heightened the ability to discern intricate details in agricultural imagery. The period after 2017 saw reinforcement learning applied to CNN design, enabling the automatic search for optimal architectures and enhancing adaptability in dynamic agricultural environments. Interpretability-oriented variants like ZFNet [51] complement this line of work by leveraging visualization tools to understand neural activity and translating it into pixel-space insights. Such advancements not only enhance classification abilities but also unveil the working mechanisms of CNNs. Currently, these technologies are of great significance in making image-based agricultural decision-making more precise and insightful.
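To make the idea of assigning a predefined label concrete, the sketch below turns a vector of raw class scores into a label and a confidence value. The use of a softmax over the scores is an assumption about the final stage of a typical CNN classifier rather than something stated in the reviewed works, and the class names and scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most probable label together with its probability."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs[best]


labels = ["healthy", "powdery_mildew", "leaf_spot"]  # hypothetical classes
scores = [0.4, 2.1, -0.3]  # made-up network outputs for one image
label, confidence = classify(scores, labels)
```

The network itself (the layers that produce `scores`) is omitted here; only the last step, from scores to a single predefined label, is shown.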

2) OBJECT DETECTION
Object detection in computer vision aims to locate and classify all possible objects within a given image. Central to this task are Convolutional Neural Networks (CNNs), which have predominantly been employed in two main architectures: one-stage and two-stage detection systems (figure 6). Two-stage models, exemplified by the RCNN family, first pinpoint potential object regions (known as region proposals) and then classify these regions into distinct object categories. The evolution of this approach has led to various iterations, such as the original RCNN [52], Fast RCNN [53], and the more popular Faster RCNN [54]. Three pivotal components of Faster RCNN are the Region Proposal Network (RPN), which efficiently creates object regions; ROI pooling, which extracts consistent features from regions of varying dimensions; and a multitask loss function, which consolidates the training process. While Faster RCNN is acclaimed for its accuracy, its processing speed is a limiting factor, especially for real-time applications. In contrast, one-stage models like YOLO [55] and SSD [56] are designed for faster processing. These architectures generate candidate object regions from each pixel in feature maps, which are then classified and adjusted for accurate object boundaries. However, a key challenge with one-stage models is the significant imbalance between object and background regions in images. To address this, the RetinaNet [57] framework introduced a focal loss function, which down-weights the loss from easily classified regions so that the disproportionate number of irrelevant background regions does not dominate training. While RetinaNet offers a blend of accuracy and efficiency, the choice between one-stage and two-stage models depends largely on the specific application needs, with the former being ideal for real-time tasks and the latter prioritized when accuracy is paramount.
Notably, these state-of-the-art object detection methods have significant uses outside of traditional fields. The agricultural industry has embraced these technologies, especially in the context of greenhouse farming. Precision agriculture has entered a new era by adopting such advanced object detection technologies in greenhouse farming. These improvements have allowed for exact monitoring of plant growth, effective detection of pests, and efficient allocation of resources.
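The effect of the focal loss mentioned above can be seen in a small numerical sketch. It uses the binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), where p_t is the probability the model assigns to the true class. The default values gamma = 2 and alpha = 0.25 are commonly used settings and are treated here as assumptions; the example probabilities are made up.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction.

    p: predicted probability of the object class.
    y: 1 for an object region, 0 for a background region.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y):
    """Standard binary cross-entropy for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


easy_background = focal_loss(0.05, 0)  # background confidently rejected
hard_background = focal_loss(0.70, 0)  # background mistaken for an object
```

The confidently rejected background region contributes several orders of magnitude less loss than the misclassified one, so the many easy background regions no longer swamp the few informative examples. Setting gamma to 0 and alpha to 1 for the object class recovers ordinary cross-entropy.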

3) SEGMENTATION
Image segmentation is a fundamental computer vision task which categorizes each pixel of an image based on the object it belongs to. Early techniques used region division and merging, and later algorithms leveraged metrics like intra-regional consistency and inter-regional dissimilarity. Recently, machine learning has greatly enhanced segmentation, with key advancements being Mask RCNN [58], dual attention network [59], and particularly, U-Net [60] and its variants such as Attention U-Net [61], U-Net++ [62], ResUNet++ [63], and TransUNet [64]. These models have not only advanced the state of image segmentation but have also demonstrated exceptional efficacy in specialized tasks like medical image segmentation. Semantic and instance segmentation are both vital techniques in computer vision, and building on them further expands the versatility and precision of computer vision applications. At a high level, semantic segmentation assigns masks to groups of objects with the same meaning in an image, such as all plants, while instance segmentation focuses on individual objects. Two primary frameworks are employed in these types of segmentation: encoder-decoder-based frameworks and detection-based frameworks. In the encoder-decoder approach, models typically consist of two main phases, as shown in figure 7.
The encoder extracts meaningful feature maps from images using convolutional neural networks (CNNs). The decoder, on the other hand, upsamples these feature maps into per-pixel labels using transposed convolution. To enhance the precision of segmentation, these models often employ a lateral connection scheme. This connects feature maps between the encoder and decoder phases, ensuring the preservation of the image's semantic meaning throughout the process. Additionally, post-processing methods like conditional random fields (CRFs) are utilized to refine object boundaries. Notable models in this category include U-Net, fully convolutional network (FCN) [65], FastFCN [66] and DeepLab [67].
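The upsampling role of transposed convolution in the decoder can be sketched as follows. Each value of the small feature map scatters a scaled copy of the kernel into a larger output, with the stride setting the spacing between copies. The all-ones kernel and the 2x2 feature map are illustrative assumptions; in a trained decoder the kernel weights are learned.

```python
def transposed_conv2d(feature, kernel, stride=2):
    """Upsample a 2-D feature map with a square kernel (no padding)."""
    height, width = len(feature), len(feature[0])
    k = len(kernel)
    out_h = (height - 1) * stride + k
    out_w = (width - 1) * stride + k
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(height):
        for j in range(width):
            # Scatter the kernel, scaled by the input value, into the output.
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += feature[i][j] * kernel[a][b]
    return out


feature = [[1.0, 2.0],
           [3.0, 4.0]]
kernel = [[1.0, 1.0],
          [1.0, 1.0]]  # fixed weights for illustration only
upsampled = transposed_conv2d(feature, kernel)
```

With a stride of 2 and a 2x2 kernel, the 2x2 feature map becomes a 4x4 map, doubling the resolution in each direction; stacking such layers is how a decoder returns to per-pixel resolution.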
Detection-based frameworks, in contrast, pivot on CNN architectures tailored for object detection. Some early efforts attempted to leverage object detection models for instance segmentation, such as simultaneous detection and segmentation (SDS) [68] and DeepMask [69]. However, these approaches struggled to achieve desired performance levels. The game-changer in this arena was the Mask RCNN, which integrated an FCN with a Faster RCNN to create masks for individual objects. This model has consistently demonstrated top-tier performance in both semantic and instance segmentation, solidifying its reputation in the field.
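The difference between a semantic mask and an instance labeling, as described in this subsection, can be shown on a toy example. The sketch below splits a binary "plant" mask into separate objects using 4-connected component labeling. This is a classical stand-in chosen for illustration, not the mechanism used by Mask RCNN, and it would merge objects that touch each other.

```python
def label_instances(mask):
    """Assign a distinct id to each 4-connected foreground region of a binary mask."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 1 and labels[r][c] == 0:
                next_id += 1
                labels[r][c] = next_id
                stack = [(r, c)]
                while stack:  # flood fill from the seed pixel
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                            labels[ny][nx] = next_id
                            stack.append((ny, nx))
    return labels, next_id


# Semantic mask: every plant pixel carries the same label (1).
semantic = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
instances, count = label_instances(semantic)
```

The semantic mask only says which pixels are plant, whereas the instance labeling additionally says which plant each pixel belongs to, which is what tasks such as counting fruits or tracking individual plants require.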

4) 3-D MODELING
3-D modeling in computer vision is fundamentally about stereo correspondence and 3-D reconstruction [70], [71]. Stereo correspondence generates a 3-D model from multiple images of the same object or scene by finding matching pixels across these images and mapping their 2-D positions to 3-D. Techniques such as epipolar geometry, sparse correspondence, and dense correspondence are commonly used [72]. On the other hand, 3-D reconstruction creates a 3-D model from a single image [73], [74]. The earliest approach involved predicting object shape from visual shading, pioneered by Horn in 1970 [75]. This was followed by other "shape from X" methods like shape from texture and shape from focus. Active range finding and model-based reconstruction, which are often used in architectural 3-D modeling, are among other methods. Designing an effective loss function for evaluating predicted 3-D point clouds against ground truth remains challenging, with options including evaluating the coverage of the ground truth object's silhouette by the projected 3-D point clouds. Deep-learning-based algorithms have recently led to significant enhancements in the performance of 3-D reconstruction systems.
In summary, the world of computer vision is as complex as it is fascinating, comprising an array of techniques and tasks that together help the system understand and interact with its environment. With the rapid advancements in machine learning and artificial intelligence, we can expect to witness further evolution in these techniques, expanding the possibilities of what computer vision can achieve.
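The step of mapping matched 2-D positions to 3-D in stereo correspondence can be sketched for the simplest case. The example assumes a rectified pair of identical pinhole cameras separated horizontally by a known baseline, so that depth follows Z = f B / d for focal length f (in pixels), baseline B, and disparity d. The function name and all numeric values are hypothetical.

```python
def triangulate(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Back-project one matched pixel pair from a rectified stereo rig to 3-D.

    (u_left, v) and (u_right, v) are the pixel positions of the same scene
    point in the left and right images; (cx, cy) is the principal point.
    Returns (x, y, z) in meters in the left camera frame.
    """
    disparity = u_left - u_right  # horizontal shift between the two views
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


point = triangulate(u_left=420.0, u_right=380.0, v=260.0,
                    focal_px=800.0, baseline_m=0.10, cx=320.0, cy=240.0)
```

Repeating this for every matched pixel pair yields a point cloud of the scene; nearby points have large disparities and distant points small ones, which is why matching accuracy matters most for far objects.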

C. THE ADVENT OF DEEP LEARNING
Machine learning offers a significant advancement in data processing. Traditional methods typically necessitate manual feature extraction. With the surge in large data sets and the introduction of graphics processing units (GPUs), algorithmic methodologies have seen considerable refinement. Deep learning, an evolution from traditional machine learning, incorporates some "deeper" (more complex) structures, enabling automatic feature extraction from unprocessed data. It often surpasses the efficacy of its predecessor in various classification and prediction tasks [76]. By integrating multiple layers of abstraction, it allows hierarchical data representation [77], [78]. This multi-layered approach enhances the analytical performance for numerous large-scale data processing tasks [79], [80]. Essentially, deep learning is an advanced non-linear data processing technique grounded in representation learning and pattern analysis. Typically, deep learning models refine data representations through multi-layered neural networks. These networks comprise several neurons structured in layers. Neurons in adjacent layers connect based on weights, which are adjusted during learning. These neurons represent diverse non-linear functions, facilitating the creation of complex models. By connecting multiple layers, deep learning provides solutions for intricate real-world challenges efficiently [78].
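The layered structure described above, with weighted connections between adjacent layers and a non-linear function at each neuron, can be sketched as a forward pass through a tiny network. The weights, biases, and the choice of ReLU as the non-linearity are illustrative assumptions; in practice the weights are adjusted during learning rather than written by hand.

```python
def relu(values):
    """Non-linear activation: negative values are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(x, layers):
    """Apply the layers in order, with ReLU between layers but not after the last one."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if index < len(layers) - 1:
            x = relu(x)
    return x


layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),  # hidden layer: 2 inputs -> 2 neurons
    ([[2.0, 1.0]], [0.5]),                     # output layer: 2 neurons -> 1 output
]
output = forward([3.0, 1.0], layers)
```

Each additional layer re-represents the output of the previous one, which is the hierarchical data representation referred to above.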
The complex architecture and massive learning capability of deep learning models equip them with exceptional prediction and classification capabilities. This enables them to adapt effectively to complex data analyses. Leveraging its innate ability for automatic feature extraction, deep learning addresses several challenges in agriculture, such as recognition tasks, growth monitoring, yield estimation, quality assessment, stress detection, and more. These applications are discussed in detail in the next section.
Convolutional neural networks (CNNs) and their derived models play a pivotal role in artificial intelligence and have led to breakthroughs in image processing and analysis. CNNs are a class of deep, feed-forward, multi-layered artificial neural networks (ANNs) that have been used effectively in computer vision applications. They constitute a prominent method for analyzing vast amounts of data. Our analysis indicates that a significant majority of horticulture-related papers utilize CNNs. Typical CNNs incorporate convolution, pooling, and fully connected layers in various configurations to perform complex learning tasks. An example CNN architecture that classifies different flower species using multiple layers and components is shown in figure 8.
In the convolutional layer, local patterns within an image are identified using the process of convolution. A kernel is initially positioned on the top-left part of the image, as shown in figure 9. Each pixel under this kernel is multiplied by the respective kernel value. The resulting products are then summed, and a bias is added. The kernel then shifts by one pixel (for a stride of one), and the process continues until the entire image has been filtered. Following the convolutional process, the pooling layer down-samples the obtained feature map and extracts its prominent features. It also brings a degree of invariance to minor translations, rotations, and scaling in the image. Two prevalent pooling methods are max and average pooling, as shown in figure 10. While max pooling takes the highest value from a designated portion of the feature map, average pooling computes the mean of that portion. In most CNN architectures, convolutional and pooling layers alternate. The last key layer is the fully connected layer, where every neuron is connected to all neurons in the preceding layer. In this stage, the features obtained from earlier layers are flattened into a one-dimensional representation and combined, preparing them for detection or classification objectives.
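The convolution and pooling steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the implementation used in any reviewed study: it assumes a single-channel image, a stride of one, no padding, and non-overlapping pooling windows, and the names `conv2d` and `pool2d` are hypothetical.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding).

    At each position, multiply each covered pixel by the matching
    kernel value, sum the products, and add the bias.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = bias
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def pool2d(fmap, size=2, mode="max"):
    """Down-sample a feature map with non-overlapping size x size windows."""
    reduce = max if mode == "max" else (lambda v: sum(v) / len(v))
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


# Toy 5 x 5 "image" and a 2 x 2 diagonal-difference kernel (illustrative values).
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [1, 3, 2, 0, 1],
]
kernel = [[1, 0], [0, -1]]

feature_map = conv2d(image, kernel)          # 4 x 4 feature map
max_pooled = pool2d(feature_map, 2, "max")   # 2 x 2
avg_pooled = pool2d(feature_map, 2, "avg")   # 2 x 2
```

In a trained CNN the kernel values and the bias are learned rather than fixed by hand, and each layer applies many kernels across several input channels.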
The efficacy of a deep learning model depends on the appropriate choice of hyperparameters. These include the network architecture, number of layers, number of neurons in the hidden layers, convolution and pooling layer structure, learning rate, weight initialization, and activation function. While custom architectures can be innovative, they typically demand a level of computational expertise that may be beyond the reach of many agricultural researchers. As a result, researchers frequently start with a pre-trained architecture that has proven to perform well over a wide range of data structures and challenges and then modify it to fit the issue at hand. This approach is reliable and effective.
CNN architectures like LeNet, AlexNet, VGGNet, MobileNet, Inception V3, EfficientNet, GoogLeNet, and ResNet have been deployed for many computer vision tasks discussed in the next section. These networks process input through multiple convolution and pooling layers before utilizing fully connected layers for classification or detection tasks. The choice of parameters within these layers and the model selection should be tailored to the specific problem. As an illustration, figure 11 shows the SSD deep learning architecture that has been used to detect cherry tomatoes. The SSD utilized VGG16 as its base network, with multiple layers appended to its tail end. During the training process, both the image and its associated ground-truth boxes were fed into the system simultaneously. Each of the 6 feature-map layers then generated a set of default boxes, along with the confidence of each object category within each box.
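To make the idea of default boxes on six feature maps concrete, the sketch below enumerates them for the standard SSD300 layout from the original SSD paper. That layout is an assumption: feature maps of 38, 19, 10, 5, 3, and 1 cells per side, with 4 or 6 boxes per cell. The cherry tomato study may have used different input sizes and settings. The linear scale rule is also a simplification of what released implementations use, and the function names are hypothetical.

```python
import math


def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Linearly spaced box scales (relative to image size), one per feature map."""
    step = (s_max - s_min) / (num_maps - 1)
    return [s_min + step * k for k in range(num_maps)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios):
    """Return (cx, cy, w, h) default boxes for one square feature map."""
    boxes = []
    extra = math.sqrt(scale * next_scale)  # additional square box
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx = (j + 0.5) / fmap_size  # cell centre, normalised to [0, 1]
            cy = (i + 0.5) / fmap_size
            boxes.append((cx, cy, scale, scale))
            boxes.append((cx, cy, extra, extra))
            for ar in aspect_ratios:
                r = math.sqrt(ar)
                boxes.append((cx, cy, scale * r, scale / r))  # wide box
                boxes.append((cx, cy, scale / r, scale * r))  # tall box
    return boxes


# Assumed SSD300 configuration: six feature maps and their extra aspect ratios.
fmap_sizes = [38, 19, 10, 5, 3, 1]
ratios = [[2], [2, 3], [2, 3], [2, 3], [2], [2]]
scales = ssd_scales(len(fmap_sizes))

all_boxes = []
for k, size in enumerate(fmap_sizes):
    next_scale = scales[k + 1] if k + 1 < len(scales) else 1.0
    all_boxes.extend(default_boxes(size, scales[k], next_scale, ratios[k]))

total_boxes = len(all_boxes)
```

Under these assumptions the six maps produce 8732 default boxes in total; during training each box is matched against the ground-truth boxes, and the network predicts class confidences and box offsets for every one of them.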
Another prevalent deep learning model used for sequential data is the recurrent neural network (RNN).This model excels in predicting prices [83], processing natural languages [84], recognizing speech [85], among other applications [86], [87].An RNN's unique attribute is its ability to retain previous data, which influences current outputs.Figure 12 demonstrates the basic architecture of the RNN applied to fruit quality assessment, illustrating how prior outputs serve as inputs due to the presence of the hidden layer.This layer effectively acts as a memory cell, ensuring every prior result informs the subsequent iteration.Consequently, each unit within the hidden layer is termed a recurrent cell.
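The way the hidden layer carries information forward can be illustrated with a minimal recurrent cell. This sketch assumes the common Elman-style update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the weights below are arbitrary illustrative values, not parameters from any reviewed model, and the function names are hypothetical.

```python
import math


def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: the new state depends on the input AND the old state."""
    h = []
    for i in range(len(b_h)):
        s = b_h[i]
        s += sum(W_xh[i][k] * x[k] for k in range(len(x)))
        s += sum(W_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


def rnn_forward(sequence, W_xh, W_hh, b_h):
    """Run the cell over a sequence, starting from an all-zero hidden state."""
    h = [0.0] * len(b_h)
    states = []
    for x in sequence:
        h = rnn_step(x, h, W_xh, W_hh, b_h)
        states.append(h)
    return states


# Input size 1, hidden size 2 (illustrative weights).
W_xh = [[0.5], [-0.3]]
W_hh = [[0.1, 0.0], [0.0, 0.1]]
b_h = [0.0, 0.0]

# The second input is zero, yet the second state is non-zero:
# it is driven entirely by what the hidden layer remembered from step one.
states = rnn_forward([[1.0], [0.0]], W_xh, W_hh, b_h)
```

Variants such as LSTM replace this simple update with gated cells so that information can be retained over longer sequences.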
Image segmentation tasks have also benefited from deep learning techniques. The R-CNN method, for instance, combines CNNs with region proposals for object detection. Mask R-CNN is currently among the leading approaches to segmentation; it adds a branch to Faster R-CNN that generates a precise, high-quality segmentation mask for each region of interest, as covered in the previous section. Many other neural network architectures have been applied in the realm of deep learning. These include the Single Shot MultiBox Detector (SSD), Long Short-Term Memory (LSTM) networks, ''You Only Look Once'' (YOLO), Region-based CNN (R-CNN), Fast R-CNN, and its successor, Faster R-CNN. Beyond RGB imagery, these structures also support varied data formats, including videos, hyperspectral images, and spectral datasets. For ease of reference, we have compiled a summary of the models reviewed, including the year of development, key concepts, and links to source code or third-party implementations where available (see Table 2). To assess the effectiveness of these neural networks, various evaluation metrics have been used. The following metrics are commonly used in the studies we have reviewed:
• Classification Accuracy (CA): This metric calculates the ratio of correctly identified images or classes to the overall number of images or classes. For problems involving multiple classes, the CA is computed as the average across all categories.

$$\mathrm{CA} = \frac{\text{Number of Correctly Classified Samples}}{\text{Total Number of Samples}} \tag{1}$$
• Precision: Defined as the ratio of True Positives (TP) to the combined count of TP and False Positives (FP).
• Recall: Denoted as the ratio of TPs to the collective sum of TPs and False Negatives (FN).
• F1-score: This is the harmonic mean of precision and recall, providing a balance between the two.
• Root-Mean Square Error (RMSE): It measures the root-mean square of the differences between predicted and actual values, offering insight into the model's prediction accuracy.
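The metrics listed above follow standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), F1 = 2 · precision · recall / (precision + recall), and RMSE is the square root of the mean squared difference between predicted and actual values. The sketch below computes them for a binary task in plain Python; the labels and values are made up for illustration, and individual studies may average over classes differently.

```python
import math


def classification_accuracy(y_true, y_pred):
    """Share of samples whose predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one class treated as 'positive'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative labels: 1 = diseased leaf, 0 = healthy leaf.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]

accuracy = classification_accuracy(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(y_true, y_pred)

# Illustrative regression targets, e.g. fruit counts per plant.
error = rmse([1, 2, 3, 4], [1, 2, 3, 8])
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all come out at 0.75, and the RMSE of the regression example is 2.0.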

IV. DEEP LEARNING ASSISTED COMPUTER VISION APPLICATIONS IN GREENHOUSE FARMING
The integration of deep learning-assisted computer vision techniques into greenhouse environments has produced promising results, indicating significant effectiveness. Deep learning-enhanced computer vision has a wide range of applications within the greenhouse industry, including recognition and classification of crops, crop yield estimation, crop quality inspection, crop growth monitoring, automatic harvesting, and disease and pest management. A small taxonomy of these applications is illustrated in figure 13.
In this section of this paper, we explore the specific problems addressed in academic literature across these application areas.We investigate various computer vision techniques, models and architectures that have been utilized, the performance of the model adopted, discuss the underlying constraints and challenges, and offer insights to guide future research endeavors.

A. CROP GROWTH MONITORING
Ensuring the healthy growth of crops is essential in greenhouse farming in order to achieve optimal yield, quality, and efficient use of resources, all of which contribute to profitable outcomes in agricultural production. Crops require a variety of nutrient elements, including macronutrients, secondary nutrients, and micronutrients, for optimal growth. Traditionally, monitoring crops has always relied heavily on human observations, which frequently leads to inaccurate and delayed results. Precision agriculture, especially in controlled environments like greenhouses, emphasizes the importance of consistent and accurate crop monitoring at varied growth stages. By integrating computer vision into greenhouse management, we can achieve real-time, precise monitoring. It provides timely interventions for optimal growth by detecting minute changes in crop health due to nutritional deficiencies far earlier than manual inspections.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Utilizing innovative technologies to enhance agricultural processes, especially in the realm of greenhouse farming, is increasingly prevalent. Lee et al. [92] embarked on an exploration using clip-type Internet of Things (IoT) cameras to monitor tomato growth in a greenhouse setting, using deep learning-based object detection to track flower blooming and immature fruit development. By integrating the flower's bloom date with temperature forecasting, their system could predict harvest dates with an average error of two days. While the clip-type design effectively addressed challenges like plant overlap in densely populated greenhouses, reliance on wired power sources and occasional occlusions posed limitations. The introduction of battery-powered IoT cameras and more adaptive positioning might enhance its applicability. In a different study, Gang et al. [93] implemented a convolutional neural network (CNN)-based model to estimate growth indices for greenhouse lettuce, utilizing RGB-D data from stereo cameras. Their dual-stage CNN architecture, building upon the ResNet50V2 layers, demonstrated remarkable accuracy. Similarly, Zhang et al. [94] showcased the capabilities of convolutional neural networks in relation to digital images. The study demonstrated how CNN can be useful in monitoring growth indicators like leaf fresh weight, leaf area, and leaf dry weight, delivering results that outperform traditional methodologies, especially for specific lettuce cultivars. However, while these models can extract depth information to enhance accuracy, their real-time application demands considerable computational power. Nevertheless, the speed of image processing, particularly with edge devices like Jetson SUB mini-PC, implies a potential future in real-time monitoring.
The growing field of computer vision and deep learning has opened up new possibilities in crop management.Pretrained models like YOLO, ResNet, VGG16, MobileNet, Detectron2, etc., have showcased remarkable capabilities in object detection.Integrating these advanced techniques into horticultural research has yielded impressive performance and opened avenues for innovative applications.For instance, Moysiadis et al. [95] highlighted the potential of using the YOLOv5 pre-trained model for mushroom growth monitoring.The complexities of mushroom growth patterns raise the significance of this research.However, the accuracy of detection underscores the challenges of using computer vision in dense environments with overlapping objects.Similarly, in a recent study, Shinoda et al. [96] introduced the ''RoseTracker'', a system that combines YOLOv5, SORT, and a regression model aimed at monitoring the growth of roses in cultivation environments.The dataset they provided stands out due to its specific focus on the unique stages of rose growth.Though the results demonstrate remarkable accuracy, the system's adaptability across diverse global cultivation conditions and its versatility with other floral species will be crucial factors in determining its broader acceptance and utility.
These examples of applying deep learning-based computer vision to greenhouse crops demonstrate its utilization for various aspects of research and production, including species classification, organ detection, growth stage monitoring, and localization. Studies have implemented techniques like convolutional neural networks, YOLO, and other object detection models to track developmental indicators and phases precisely. While occlusion and adaptability across diverse environments remain challenges, deep learning enables real-time, accurate crop monitoring to support data-driven decision-making for optimal yields, quality, and resource efficiency in greenhouse farming. The technology displays remarkable potential in replacing time-consuming and imprecise manual inspection. Overall, deep learning-based computer vision facilitates timely, precise interventions for optimal greenhouse crop growth by detecting growth changes earlier than manual inspection can. Further technological advances could transform greenhouse management and contribute to more sustainable, productive agricultural practices. Some of the studies in this area are summarized in table 3.

B. RECOGNITION AND CLASSIFICATION OF CROPS
Automated recognition and classification of greenhouse crops is a challenging task due to the vast diversity and continuous evolution of plant species.Moreover, crops can undergo various mutations, leading to significant variations within a single class.As a result, flowers from different species may share similar features such as shape, colour, and general appearance.Therefore, the recognition and classification of greenhouse crops presents a complex challenge with multiclass problems.Although manual classification is possible, it is typically labour-intensive and prone to errors, particularly when handling large numbers of samples.Hence, the application of computer vision-based deep learning techniques for the purpose of identifying and classifying species or cultivars holds immense potential as a groundbreaking advancement in the field of smart agriculture and greenhouse farming, thanks to their outstanding speed and accurate recognition capabilities.
In recent advancements, the adoption of deep learning-based computer vision techniques in the domain of greenhouse agriculture has witnessed notable developments, particularly when it comes to enhancing precision and efficiency.Chen et al. [97] attempted to resolve the challenge posed by the dense intertwining of cucumber canopy vines.By proposing an image recognition model based on an enhanced YOLOv5, they aimed to increase the accuracy of detecting cucumber canopy vine tops.Notably, they introduced the CA (Coordinate Attention) mechanism module and transitioned from GIOU to EIOU for loss regression, resulting in a commendable accuracy of 97.1% in recognizing cucumber canopy tops in varied conditions.While the methodology is outstanding, there is potential to further adapt it for broader agricultural applications, considering the dynamism of outdoor conditions that might affect accuracy.
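The switch from GIoU to EIoU for box regression mentioned above can be illustrated numerically. The sketch below uses the commonly published formulations: GIoU subtracts from IoU the share of the smallest enclosing box not covered by the union, while EIoU adds penalties for centre distance and for width and height differences, each normalised by the enclosing box. It is an assumption that Chen et al. used exactly these forms. Boxes are `(x1, y1, x2, y2)` tuples with positive width and height, and the function names are hypothetical.

```python
def _intersection(a, b):
    """Overlap area of two (x1, y1, x2, y2) boxes."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return w * h


def _area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def _enclosing_size(a, b):
    """Width and height of the smallest box containing both a and b."""
    return (max(a[2], b[2]) - min(a[0], b[0]),
            max(a[3], b[3]) - min(a[1], b[1]))


def iou(a, b):
    inter = _intersection(a, b)
    return inter / (_area(a) + _area(b) - inter)


def giou_loss(pred, gt):
    inter = _intersection(pred, gt)
    union = _area(pred) + _area(gt) - inter
    cw, ch = _enclosing_size(pred, gt)
    enclose = cw * ch
    return 1.0 - (inter / union - (enclose - union) / enclose)


def eiou_loss(pred, gt):
    cw, ch = _enclosing_size(pred, gt)
    # Squared distance between box centres.
    dx = ((pred[0] + pred[2]) - (gt[0] + gt[2])) / 2.0
    dy = ((pred[1] + pred[3]) - (gt[1] + gt[3])) / 2.0
    # Width and height differences.
    dw = (pred[2] - pred[0]) - (gt[2] - gt[0])
    dh = (pred[3] - pred[1]) - (gt[3] - gt[1])
    return (1.0 - iou(pred, gt)
            + (dx * dx + dy * dy) / (cw * cw + ch * ch)
            + (dw * dw) / (cw * cw)
            + (dh * dh) / (ch * ch))


# Prediction twice as wide as the ground truth, same height and left edge.
pred, gt = (0.0, 0.0, 4.0, 2.0), (0.0, 0.0, 2.0, 2.0)
g = giou_loss(pred, gt)
e = eiou_loss(pred, gt)
```

For this pair the enclosing box equals the predicted box, so the GIoU loss reduces to 1 − IoU = 0.5, whereas the EIoU loss is 0.8 because it additionally penalises the centre offset (0.05) and the width mismatch (0.25). This illustrates why EIoU can give a more informative regression signal when boxes overlap heavily but differ in shape.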
Islam et al. [98] addressed the complications in separating leaf pixels from backgrounds in thermal images due to factors like thermal radiation and greenhouse humidity. They proposed TheLNet270v1, achieving a remarkable 91% accuracy in distinguishing canopy pixels. This innovative approach underscores the capability of deep learning in analyzing thermal images within greenhouses. While the results are promising, it is essential to ensure that the model works well with different pixel sizes and adapts to the diverse environmental shifts commonly observed in greenhouses.
Zhou et al. [99] proposed an ''improved Faster-RCNN'' architecture to detect strawberries from ground-level RGB images.This mechanism not only aids in efficient harvesting but also plays a pivotal role in selecting high-yield strawberry varieties.Their method achieved a notable fruit extraction accuracy of 86%, which surpassed the three other methods tested.However, the complexity and environmental adaptability of these models still necessitate further research.Future work in this domain needs to address training complexity and refine models to ensure robustness across various environmental conditions.
Cong et al. [100] innovatively incorporated the Swin Transformer attention mechanism into Mask RCNN, enhancing the model's feature extraction capabilities. Their model efficiently segmented sweet peppers even in complex scenarios, such as varying lighting conditions, pepper overlaps, and leaf occlusions. Achieving an average FPS value of 5, their approach holds promise for real-time monitoring of sweet pepper growth. Nevertheless, there is still room for improvement, especially in terms of real-time performance (e.g., inference speed), to make it more optimized and viable for large-scale practical applications, particularly in automated fruit-picking systems.
In recent years, the automation of crop harvesting, specifically tomatoes, has gained considerable attention in agricultural robotics. Su et al. [101] targeted tomato maturity detection, an essential aspect for determining post-harvest logistics such as transportation and storage. Their SE-YOLOv3-MobileNetV1 model excelled in classifying tomatoes into four distinct maturity levels with an average precision value of 97.5%. The incorporation of the Squeeze-and-Excitation attention mechanism ensured accurate detection while keeping the model lightweight, an attribute essential for embedded development in robotic applications. However, despite the robustness of their model, additional progress is required to address the challenges posed by real-world circumstances, such as the mutual occlusion caused by leaves and fruits. Meanwhile, Yuan et al. [82] focused on detecting cherry tomatoes in a complex greenhouse setting. Considering the operational environment and the precision offered by deep learning, they chose the SSD model and further experimented with varying base networks and input sizes. Their results demonstrate significant improvements in automatic cherry tomato detection, with a precision of 98.85% achieved using the Inception V2 network. However, the challenge of detecting side-grown tomatoes underscores the need to fine-tune the model for complex scenarios. Moreira et al. [102] highlighted a crucial step towards achieving fully automated robotic harvesting: the development of an accurate fruit detection system. They proposed a deep learning-based system using SSD MobileNet v2 and YOLOv4 models to detect tomatoes and introduced a histogram-based HSV colour space model for classifying their ripening stage. Notably, YOLOv4 performed strongly in both detection and classification, with an F1-score of 85.81% in the detection task. Yet, challenges persisted in identifying the middle stages of ripening due to subtle colour variations. This underlines the importance of continual model refinement, especially when differentiating closely related classes. In another similar study, Mao et al. [103] strived to enhance the accuracy and practicality of cucumber detection in complex environments. They introduced a multi-path convolutional neural network (MPCNN) with colour component selection and a support vector machine (SVM). The methodology effectively identified the cucumber region by reducing the background interference and emphasizing the colour differences between the cucumber and its surroundings. This approach yielded satisfactory results, with over 90% of pixels in cucumber images being accurately classified. However, it remains uncertain how this model would perform under different environmental conditions or with other cucumber varieties.
These applications of deep learning-based computer vision for automated recognition and classification of greenhouse crops show promising results, but also persisting challenges. Studies have utilized techniques like YOLO, Mask RCNN, Faster R-CNN, MobileNets, and Swin Transformers to accurately detect and classify various crop species, cultivars, and growth stages. The complexities of distinguishing between highly similar classes, handling occlusion, and adapting models across diverse environments and lighting conditions remain active research problems. However, deep learning methodologies have achieved remarkable accuracies surpassing traditional techniques. Continual refinement of models and architectures tailored to specific crops, growth conditions, and agricultural tasks is still needed for robust real-world performance. Table 4 summarizes the technical details of the studies presented in this subsection.

C. CROP DISEASE MONITORING
In agriculture, plant diseases remain a major issue that causes significant losses in the world's food production, particularly in controlled settings like greenhouses. They severely affect the yield and quality of agricultural products and have become a key concern in the development of global agriculture. Traditional identification methods, largely manual and guided by pathologists, often lack the desired speed and precision, making them unsuitable for the fast-paced requirements of modern agriculture. The development of deep learning and computer vision therefore presents a promising path toward fast and accurate disease recognition; however, environmental factors such as varying lighting and leaf occlusion remain ongoing challenges that researchers are working to overcome in greenhouse disease monitoring.
At present, advances in imaging technology have led to the creation of several open-source image datasets featuring various horticultural crops. Notable datasets include ImageNet, PlantVillage, and OUFD. These collections have significantly improved the accuracy of image classification and recognition. Such large-scale image datasets are extensively used, offering plenty of feature information for training deep neural network models in horticultural research. Using PlantVillage (https://www.tensorflow.org/datasets/catalog/plant_village), Wspanialy et al. [104] explored diseases affecting tomato leaves and highlighted the importance of automated disease detection because of its cost-effectiveness. Their system can identify various tomato leaf diseases and evaluate their severity. While the method showed the potential to detect previously unseen diseases and yielded severity estimations comparable to human assessments, biases in the dataset, especially concerning the background, could limit its real-world applicability. Future efforts should concentrate on diversifying datasets to ensure the development of more generalized and robust models. Similarly, using the PlantVillage dataset, Restrepo-Arias et al. [105] introduced a novel diagnostic approach that highlights the impact of genotypic and phenotypic characteristics on how plants respond to pathogens. Their method, which emphasizes texture-based features and uses Bayesian Optimization to train artificial neural networks, achieved an impressive accuracy of up to 96.31% with MobileNet. This method's emphasis on textural features shows promise, potentially reducing biases arising from leaf morphology. However, exploring more plant datasets, experimenting with different image sizes, and considering features beyond texture might enhance its classification accuracy and broaden its applicability.
Despite the widespread use of open-source datasets like PlantVillage, crop recognition systems are still in the development stage and have not been established on a large scale. As a result, most researchers prefer to experiment with their own collected image sets. Zhao et al. [106] collected images of healthy and diseased strawberry varieties to build their dataset. They introduced a modified Faster RCNN architecture, emphasizing multiscale feature fusion. Achieving a commendable mAP of 92.18%, their method stands out for its efficiency and accuracy. Still, continuous refinement of their model is required for its adaptability across a broader spectrum of strawberry diseases. Xu et al. [107] addressed the challenge of melon leaf disease detection using a pruned version of YOLO v5s combined with ShuffleNet v2. Their strategy achieved an impressive 95.7% mAP@0.5. By focusing on smaller disease features, they achieved real-time detection in intricate greenhouse environments. The model's speed and efficiency, with an inference time of just 13.8 ms, are noteworthy. Their work underscores the power of leveraging refined neural networks for specific tasks, though the expansion to other crops will determine its broader relevance. Zhang et al. [108] employed the EfficientNet-B4 model to identify diseases in cucumber leaves, achieving an impressive 97% accuracy. However, external factors like lighting introduced challenges in distinguishing similar diseases. While their approach offers promise for real-time greenhouse monitoring, addressing environmental variables and optimizing for different devices remain crucial for broader applicability. Zhang et al. [109] utilized color and color-infrared (CIR) images to diagnose wheat diseases like leaf rust and tan spot. Employing deep features extracted through the ResNet101 model, their approach achieved notable accuracies of up to 84%. Their approach demonstrates the advantages of automated image analysis over manual observation, and the deep-learning model's ability to capture finer features stands out, making it a promising direction for further advancements in disease detection.
These investigations show that deep-learning models, particularly when trained on comprehensive image datasets, offer substantial promise for the timely and accurate detection of plant diseases in controlled environments like greenhouses. The fusion of computer vision and advanced imaging technology has shown the potential to revolutionize traditional agricultural monitoring practices. A detailed summary of the techniques and their respective performances in the discussed research can be found in table 5.

D. AUTOMATIC HARVESTING
Historically, the agricultural landscape was dominated by manual and labor-intensive processes, leading to increased costs and limitations in efficiency. The recent advent and integration of computer vision technology have marked a significant agricultural transformation. Today, the use of intelligent harvesters equipped with vision-based robotics is on the rise, and research that both investigates and implements these technologies has reshaped the paradigms of contemporary agricultural production. For instance, Rong et al.'s [110] exploration into robotic harvesting of greenhouse tomatoes sheds light on the current technological challenges in replicating the efficiency and accuracy of manual harvesting. They proposed an advanced system to accurately identify tomato positions and determine the best grasping technique. By integrating a YOLOv5m-based detection mechanism, they achieved impressive recognition accuracies of 90.2% for tomato bunches and 97.3% for individual fruits. However, while their optimized strategies reduced collision impacts on the manipulator's grasp, yielding a promising harvesting success rate of 72.1%, their average harvesting time of 14.6 seconds per fruit still underscores the need for speed improvements.
Similarly, Benavides et al. [111] attempted to automate tomato crop harvesting using a Computer Vision System (CVS), with a primary focus on the detection and localization of ripe tomatoes. By employing a myriad of digital image processing tools and basic trigonometry, the system successfully classified around 80.8% of beef tomatoes and 87.5% of cluster tomatoes as ''collectible.'' An outstanding achievement was the sub-millisecond processing time, a significant leap from previous methodologies. Despite the advancements, challenges persist in terms of ambient lighting conditions and the variability of the working environment, emphasizing the need for more adaptable and robust systems.
A different study by Rong et al. [112] highlights the need for a fully automated mushroom harvesting robot. Their innovative robot system, employing Intel RealSense D435i for imaging and an improved SSD algorithm for detection, offers a promising mushroom recognition success rate of 95%. The system's robustness is further evidenced by an admirable harvesting success rate of 86.8% within an average time of 8.85 seconds per mushroom. However, their initial approach to mushroom recognition faced challenges with varying illumination, adhesion between mushrooms, and posture identification, underscoring the need for deep learning models with more sophisticated recognition algorithms that can handle mushroom identification across diverse circumstances. In addition, the fragile nature of oyster mushrooms highlights the critical need for advancements in end-effector designs, ensuring minimal damage during the harvesting process.
Recent innovations by Liu et al. [113] explored the application of the DA-Mask RCNN model for detecting green asparagus. The aim was to enhance detection precision during the autonomous harvesting of green asparagus by integrating Mask RCNN with depth information. The addition of the depth filter showed significant promise, especially under varying illumination conditions, achieving precision values of up to 0.993. While the model demonstrated resilience against false positives in bright lighting, further optimization, particularly for nighttime scenarios, is crucial for broader applications.
These studies underscore the transformative potential of combining computer vision with deep-learning models in reshaping harvesting practices. While significant advancements have been achieved, the interplay between accuracy and operational efficiency emerges as a central concern. Table 6 summarizes the technical details of the studies discussed in this subsection.

E. YIELD ESTIMATION
Yield estimation in agricultural production is pivotal for stakeholders across the spectrum, from farmers to agricultural enterprises. It aids in strategically navigating post-harvest operations, driving marketing initiatives, and optimizing resource allocation. While traditional techniques have dominated this space, there is an emergent reliance on cutting-edge computational methodologies. The intersection of computer vision and agriculture has seen innovative developments, particularly with the employment of deep learning models to address challenges like overlapping crops, dense vegetation, and varying light conditions, especially in greenhouse setups. This synthesis of technology and agriculture brings new methodologies, delivering improved precision in predictions, even as complexities arise. Wang et al. [114] employed an improved version of the YOLOv3 deep learning model to estimate tomato yields in plant factories with artificial lighting (PFAL). By refining the traditional YOLO algorithm, they achieved a notable mean average precision (mAP) of 99.3%, a 2.7% improvement over the original YOLOv3. Notably, their approach excelled in distinguishing densely packed and obscured fruits, paving the way for real-time crop monitoring and dynamic yield estimation. However, challenges persist due to the complex lighting conditions in the PFAL environment and similarities between green fruit and their surrounding vegetation. In another study, Maji et al. [115] investigated wheat yield estimation through SlypNet, a hybrid deep learning approach that combines Mask R-CNN and U-Net. This approach effectively captures wheat morphological features, attaining a high mAP of 97.57% in spike detection. The study underscored the technique's resilience to natural field constraints like overlapping and varying resolution. Yet, while SlypNet excelled in detection, the authors acknowledge the challenge of precisely estimating grain yield from spikelet counts. Further studies focused on investigating the intricate anatomical details of spikelets could potentially enhance the precision of yield prediction.
The growth and development of generative organs in greenhouse plants are essential for both yield estimation and higher productivity. Given the challenges in the greenhouse environment, such as leaf and branch obstructions and the risk of duplicate counts, there is a pressing need for more efficient methods. Manual counting approaches are time-consuming and often marred by inaccuracies, highlighting the necessity for rapid and automated solutions. Egi et al. [116] addressed this need by incorporating a drone-based AI system to detect and count greenhouse tomatoes. Their emphasis was not limited to the fruits; they also targeted the flowers. Using the YOLO V5 and Deep Sort algorithms, their method showcased remarkable accuracies of 99% for green tomatoes and 85% for red tomatoes. Nonetheless, their approach faced challenges with flower detection, securing only a 50% accuracy. While their achievements are commendable, the potential inaccuracies from drone movements, combined with the shortcomings in flower detection, hint at areas for improvement. An expanded dataset could offer a potential solution. Zhou et al. [117] utilized an ''Improved ResNet'' deep learning model to accurately segment and grade broccoli heads in greenhouse conditions. Their method achieved an impressive accuracy of 0.896 for broccoli head segmentation, even in varied lighting conditions. While the model's performance is notable, its dependence on controlled environments and its data processing pipeline's reliance on manual settings pose challenges for real-world applications. However, the potential integration of semi-supervised learning in future work indicates promising strides towards refining and optimizing crop yield estimation in greenhouse farming.
From the discussed studies centered on yield estimation in agricultural settings, particularly in greenhouses, it becomes evident that the scientific community is making strides in leveraging advanced computational techniques. These pioneering methodologies, while exhibiting remarkable promise, also underscore the complexities of the greenhouse environment. Further investigation and advancement in these methods could potentially bring about a paradigm shift in how the agricultural sector, including farmers and enterprises, strategizes planting, harvesting, and marketing initiatives. Table 7 provides more details about the complex technical aspects of the studies discussed here.
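The duplicate-count risk noted above is commonly handled by counting tracked identities rather than per-frame detections. The following is a minimal sketch of that idea, not the pipeline of any cited study: it assumes a detector and tracker (for example, a YOLO-style detector followed by a Deep SORT-style tracker) have already produced `(track_id, class_name)` pairs for each video frame. The function name `count_unique_tracks` and the `min_hits` parameter are hypothetical.

```python
from collections import defaultdict


def count_unique_tracks(frames, min_hits=3):
    """Count distinct objects per class from tracker output.

    frames: iterable of per-frame lists of (track_id, class_name) tuples.
    A track is counted only if it appears in at least `min_hits` frames,
    which suppresses short-lived spurious tracks (hypothetical heuristic).
    """
    hits = defaultdict(int)   # track_id -> number of frames in which it was seen
    labels = {}               # track_id -> class label at first sighting
    for detections in frames:
        seen_in_frame = set()
        for track_id, class_name in detections:
            if track_id in seen_in_frame:
                continue      # ignore repeated ids within one frame
            seen_in_frame.add(track_id)
            hits[track_id] += 1
            labels.setdefault(track_id, class_name)

    counts = defaultdict(int)
    for track_id, n_frames in hits.items():
        if n_frames >= min_hits:
            counts[labels[track_id]] += 1
    return dict(counts)
```

Because each fruit contributes one track identity regardless of how many frames it appears in, the yield count does not grow as the camera lingers on the same plant. Identity switches in the tracker would still inflate the count, which is one reason camera motion matters.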

F. CROP HEALTH ANALYSIS
In the realm of greenhouse plant health analysis, advanced computer vision techniques have provided innovative solutions for the prompt detection of crop biotic and abiotic stresses, nutrient deficiencies, and water stress, ensuring superior plant quality. The ability to promptly identify biotic and abiotic stresses is imperative for effective greenhouse crop management and optimal plant health. While visual inspection is tedious and subjective, computer vision and spectral imaging offer great promise for automated, non-invasive assessment of crop condition. For example, Taha et al. [118] employed spectral analysis and machine learning models to estimate nutrient contents in aquaponically grown lettuce. Using the selected optimal wavelengths, they achieved commendable predictive accuracies (R² ≥ 0.94), suggesting a potential automated solution for nutrient estimation in aquaponics. However, the real challenge lies in translating these lab-based experiments to real-world greenhouses, where external factors can significantly impact spectral readings. Eshkabilov et al. [119] also employed hyperspectral imaging within the 400-1000 nm range to predict nutrient concentrations in lettuce cultivars using PLSR and PCA models, achieving high agreement with laboratory measurements (R² = 0.784-0.987). However, while hyperspectral imaging holds promise, its implementation might be restrictive due to cost implications, especially for small-scale greenhouse farmers.
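For readers comparing the R² values quoted above, the coefficient of determination can be computed from laboratory reference values and model predictions as sketched below. This is the textbook definition; the cited studies may report calibration, cross-validation, or prediction R², which this sketch does not distinguish. The function name `r_squared` is illustrative.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all reference values are equal")
    return 1.0 - ss_res / ss_tot
```

A value of 1 means the predictions reproduce the reference measurements exactly, while 0 means the model does no better than predicting the mean nutrient concentration for every sample.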
RGB imaging has also been utilized for abiotic stress detection. Lak et al. [120] developed a classification algorithm using visible light images and MLP neural network modeling to detect water stress in greenhouse tomatoes. After optimizing with PCA, the technique achieved 83.3% accuracy in distinguishing normal and water-stressed plants using only RGB image features. Surprisingly, adding thermal imagery did not improve results. Similarly, Levanon et al. [121] utilized RGB and thermal imaging along with neural networks to predict water and nutrient stress in banana plantlets. The multi-modal data fusion approach enabled models to achieve high prediction accuracy of over 90% for four stress classes. However, the small sample size of 16 plants may limit generalization.
Beyond abiotic factors, computer vision shows promise for biotic disease screening. Najafian et al. [122] introduced a large dataset of over 40,000 wheat kernel images to detect Fusarium damage using deep learning models like EfficientNet and ResNet. The semi-supervised approach reached F1 scores up to 84.29% for binary classification but was limited for multi-class tasks. Overall, the dataset provides a valuable benchmark, but more samples are needed. Janani et al. [123] proposed a unique method to identify nitrogen nutrient levels in groundnut leaves using the CNN-based HVN model, achieving a training accuracy of 95% and validation accuracy of 92%. Although their method highlighted the significance of leaf color in determining nitrogen concentration, they acknowledged potential inaccuracies stemming from external factors unrelated to nitrogen levels. Such nuances emphasize the importance of considering all potential variables in crop health prediction.
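The F1 score reported above for binary damage classification balances precision and recall, which matters when damaged samples are much rarer than healthy ones. The sketch below shows the standard computation from label lists; the convention that label 1 denotes a damaged kernel is an assumption made only for illustration.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 score for one positive class (here assumed: 1 = damaged)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

Unlike plain accuracy, this score stays low if a model simply labels every kernel as healthy, since recall for the damaged class would then be zero.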
Overall, these studies demonstrate that computer vision and spectroscopy can enable automated, non-destructive monitoring of multiple crop health indicators. However, translating these technologies into commercial systems will require more robust models that fuse multi-modal data to overcome environmental noise, larger datasets to improve generalizability, and model optimization to account for subtleties between stress symptoms and normal variations.
Table 8 provides more details about the complex technical aspects of the studies discussed here.

G. PEST AND INSECT MONITORING
Monitoring pests and insects in greenhouses has always been a primary concern for sustainable agriculture, impacting both the quality and quantity of yield. It has now become an active research field in the agricultural domain, emphasizing the critical need for precise, real-time solutions. Traditional methods have shown limitations in accuracy and response time. Consequently, computer vision and deep learning, particularly convolutional neural networks (CNNs), are being increasingly explored to identify pests like whiteflies and thrips, aiming to streamline and improve the detection process in greenhouse environments. Traditionally, farmers manually sample, count, and identify pests, a time-consuming and error-prone process. The integration of computer vision presents a more effective and accurate alternative, and a wide range of research is exploring its possibilities in the context of integrated pest management (IPM) [124], [125], [126], [127], [128], [129], [130], [131]. In a recent study, Liu et al. [129] investigated real-time pest detection on crops using advanced computer vision and deep learning. By adopting CNNs and generating a virtual database for training, they achieved 97.8% detection accuracy for various invertebrate pests on crops. In the future, utilizing multispectral or hyperspectral imaging may make it possible to identify well-camouflaged pests. Furthermore, the development of ground-based robotic systems capable of performing real-time proximal detection of invertebrate pests could be the future of pest management. Yang et al. [128] focused their research on identifying greenhouse pests, especially whiteflies and thrips, using image processing. Their method employed dual color spaces (HSI and Lab) combined with advanced ensemble learning classifiers, resulting in a commendable 95.73% recognition accuracy. While this approach substantially minimized manual intervention, the study underscores the challenge of uniformly illuminating complex greenhouse environments, possibly impacting accuracy in varied scenarios. Despite this success, there is still room for improvement. The system could be further optimized by reducing the number of false positive detections. Moreover, extending the system's capability to a broader range of environments beyond greenhouses is on the horizon. On a similar note, Li et al. [131] detected whiteflies and thrips in sticky trap images using a deep learning model, "TPest-RCNN", based on Faster R-CNN. This model, fine-tuned for small insect detection, showed excellent precision with a mAP of 0.95. While the results were promising, challenges like diverse pest densities and lighting conditions remain. However, their approach sets a robust groundwork for real-time monitoring, which can significantly aid timely interventions. In a distinctive approach, Zhao et al. [130] tackled pest detection in Brassica chinensis through images captured by unmanned aerial vehicles (UAVs). Given the challenges of aerial imagery, such as image blur and small object sizes, their strategy leveraged deep learning combined with an improved CenterNet algorithm, reaching an R-squared of up to 94.7%. Despite its potential, aerial imagery's inherent issues, like blurring, could affect its widespread adaptability unless integrated with superior quality cameras or advanced image restoration techniques. In a different study, Lins et al. [127] addressed the task of automating the counting and classification of aphids, particularly Rhopalosiphum padi, using software named "AphidCV". This software, rooted in computer vision and machine learning, not only expedited the counting process but also introduced morphometry data. While the software performed well overall, some limitations in classifying winged aphids were evident. Diversifying the training dataset, perhaps with data augmentation techniques, could be beneficial. This solution holds promise for extension to other aphid species and direct application in field-based pest management, reducing reliance on labor-intensive manual methods.
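Color-space methods such as the dual color space approach described above work by re-expressing each RGB pixel so that hue is separated from brightness, which makes small insects easier to threshold under uneven lighting. As an illustration only, the sketch below converts one RGB pixel to hue-saturation-intensity using the common textbook formulas. It is not taken from the cited study, and the function name `rgb_to_hsi` is hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (channels in [0, 1]) to (hue_degrees, saturation, intensity)."""
    intensity = (r + g + b) / 3.0
    if intensity == 0:
        return 0.0, 0.0, 0.0                     # pure black: hue and saturation set to 0
    saturation = 1.0 - min(r, g, b) / intensity
    numerator = 0.5 * ((r - g) + (r - b))
    denominator = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if denominator == 0:
        return 0.0, saturation, intensity        # gray pixel: hue undefined, set to 0
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, numerator / denominator))
    theta = math.degrees(math.acos(cos_theta))
    hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity
```

Because hue is largely independent of intensity, a threshold on the hue channel is less sensitive to the illumination non-uniformity that the study identified as a limitation, although strong shadows and specular highlights still distort it.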
Given the advancements detailed above, computer vision, particularly when coupled with deep learning methodologies like CNNs, showcases transformative potential in the realm of greenhouse pest detection and management. Nevertheless, as research progresses, it is imperative to address inherent challenges, such as diverse pest densities and varied lighting conditions, to ensure consistent, real-time accuracy. Embracing multi-modal data fusion, hyperspectral imaging, and more extensive datasets can further amplify the precision and reliability of these systems within greenhouse environments. Table 9 details the technical aspects of the discussed studies.

H. GREENHOUSE CROP QUALITY INSPECTION
The quality control and grading of agricultural commodities is essential for determining their market value, safety, and appeal to consumers [132]. Manual inspection, however, can be inconsistent, labor-intensive, and unscalable for large-scale production. The integration of computer vision and artificial intelligence has unlocked exciting new possibilities for automated, non-destructive, and real-time assessment of horticultural produce quality [133], [134], [135], [136], [137]. As explored by Tan et al. [134], deep convolutional neural networks demonstrated high accuracy (mean average precision of 95.52%) in classifying the maturity of tomatoes based on color features extracted from images. The use of task-specific architectures like Mask R-CNN allowed precise segmentation and localization of the produce within complex backgrounds. However, the model training time of 6 hours indicates room for optimization in computational efficiency for real-time adoption.
Beyond maturity, the assessment of quality parameters like pest/disease damage, ripeness, and shelf-life has also been automated using imaging techniques. For instance, Hendrawan et al. [135] developed a convolutional neural network model to categorize large green chili peppers into three maturity classes with 91.27% accuracy. The model shows promise for rapid, objective maturity grading to standardize quality. However, classification accuracy was lower for immature peppers, warranting further research into data augmentation and transfer learning to improve model robustness. In another study, Shi et al. [136] combined deep learning and causal analysis to predict maturity dates of leafy greens in greenhouses, achieving a root mean squared error of only 2.49 days. While novel, the approach struggled with crops in late static growth stages, suggesting the need for adaptive models that emphasize historical data over static phenotypes in late stages.
Wei et al. [137] developed a model using grape skin color analysis and a backpropagation neural network to predict the maturity of greenhouse-grown grapes, achieving up to 79.4% accuracy. A two-factor color model performed better than single-color predictors. However, prediction accuracy varied between grape varieties depending on color changes during ripening. Custom varietal models or adaptive techniques may further improve prediction.
Zhu et al. [133] proposed a computer vision approach using YOLOv5 and OpenCV to automatically grade mushrooms in greenhouses based on size features. The model achieved 96% accuracy in identifying and measuring occluded mushrooms under varying illumination. However, further work is needed to optimize mushroom recognition speed and expand functionality for tasks like robotic spraying. Nyalala et al. [138] presented a technique using machine learning and image processing to estimate the weight and volume of tomatoes on a simulated greenhouse conveyor system. Occluded tomatoes were segmented using polygon approximation before extraction of shape features. The best models achieved high correlation with reference measurements, demonstrating feasibility for in-line, non-destructive quality screening. Nonetheless, additional validation is required across diverse tomato varieties and shape features.
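Size-based grading of the kind described above typically reduces to converting a detected bounding box from pixels to physical units and comparing the result against grade boundaries. The sketch below illustrates that step under stated assumptions: the camera scale `mm_per_pixel` comes from a separate calibration, the diameter is approximated as the mean of box width and height, and the grade thresholds (25 mm and 40 mm) are hypothetical placeholders rather than values from any cited study.

```python
def grade_by_diameter(box, mm_per_pixel,
                      thresholds=((25.0, "small"), (40.0, "medium"))):
    """Estimate diameter from a bounding box and assign a size grade.

    box: (x_min, y_min, x_max, y_max) in pixels, e.g. from an object detector.
    thresholds: ascending (upper_bound_mm, label) pairs; hypothetical values.
    Returns (diameter_mm, grade_label).
    """
    x_min, y_min, x_max, y_max = box
    width, height = x_max - x_min, y_max - y_min
    if width <= 0 or height <= 0 or mm_per_pixel <= 0:
        raise ValueError("box must have positive size and scale must be positive")
    # Approximate a roughly circular cap by the mean of the box sides.
    diameter_mm = 0.5 * (width + height) * mm_per_pixel
    for upper_bound, label in thresholds:
        if diameter_mm < upper_bound:
            return diameter_mm, label
    return diameter_mm, "large"
```

For occluded produce the visible bounding box underestimates the true diameter, so a practical system would first need to recover the full contour, for example by fitting a circle or ellipse to the visible edge.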
The studies presented highlight the immense potential of computer vision and AI techniques to automate the quality assessment of greenhouse-grown horticultural produce. Advanced models have enabled non-destructive evaluation of multiple quality traits, from external features like pest/disease damage and maturity to internal parameters like texture and shelf-life. However, enhancements in model versatility, accuracy, and computational efficiency are needed to account for the diversity of greenhouse varieties and environments. Current systems also have limited real-world testing beyond controlled settings. Expanding datasets, integrating multi-modal sensor inputs, and optimizing deep learning architectures tailored for greenhouse conditions will be critical next steps. While research is still progressing, computer vision and AI solutions promise to transform quality control practices for greenhouse horticulture. Automated, real-time quality grading and defect detection could provide invaluable objective data to support selective harvesting, packing, pricing, and sales for greenhouse produce. Overall, these emerging technologies are poised to enhance productivity, reduce waste, and add value to the competitive greenhouse industry. Table 10 presents the technical details of several studies conducted in this area.

I. SEED QUALITY ANALYSIS
Seeds are crucial to modern agriculture, determining both food supply and crop yield [118]. Traditional manual assessments of seed quality, while essential, are laborious and prone to inaccuracies. The commercial seed industry is now turning to computer vision technology, tapping into its potential to extract seed features with precision. With the rapid advancement of various imaging techniques, enhanced by deep learning, this technology sets new benchmarks for seed quality evaluation in greenhouse settings [140], [141], [142], [143], [144], [145]. For example, Medeiros et al. [141] investigated the application of convolutional neural networks (CNNs) with X-ray imagery to determine crambe seed quality. Their deep learning models categorized seeds based on their tissue integrity, germination, and vigor with accuracies of 91%, 95%, and 82%, respectively. This substantiates the potential of X-ray imagery in furnishing critical insights into the physical and physiological attributes of seeds. However, a lingering concern is the reliance on digital radiographic images, which, while powerful, might not fully capture seed vitality in diverse scenarios. On a similar note, Hong et al. [142] employed a combination of hyperspectral and X-ray imaging techniques for non-destructive viability prediction of pepper seeds. The ensemble-based fusion model, integrating both hyperspectral and X-ray data, stood out with an accuracy of 92.51%. This approach demonstrates that combining different imaging modalities can produce more accurate categorization results. Nonetheless, the study's dependence on just two pepper-seed cultivars calls for broader experimental validation to establish reliability across varied conditions. In a different study, Lube et al. [143] introduced the MultipleXLab system, a flexible platform for monitoring seed germination and root growth. Using deep learning methodologies, they showcased the system's capability to screen seed vigor and evaluate seedling responses under varied conditions. The system, however, presents two key challenges: its restricted experimental duration due to agar dehydration and a potential light deprivation issue for seeds placed in specific positions. Overcoming these limitations could elevate its utility. Gao et al. [144] presented an end-to-end platform named HyperSeed, which provides hyperspectral information specifically for seeds. Their application on rice seeds using a 3D convolutional neural network (3D CNN) outperformed traditional methods like the support vector machine (SVM) model, reaching 97.5% accuracy. Nevertheless, the system is constrained by its single-threaded software and still requires exploration of global spatial traits, hinting at potential future advancements. On the other hand, Sabanci et al. [140] conducted a study to distinguish between tomato seed cultivars, employing a multi-tier deep learning approach. They initially utilized CNN models for seed image classification, with MobileNetv2 showing the highest efficacy. Furthermore, they leveraged deep features from this model to feed a Bidirectional Long Short-Term Memory (BiLSTM) network, pushing the classification accuracy to a notable 96.09%. Despite the promising results, the study underscores the importance of diverse datasets for model robustness and suggests hyperparameter optimization to enhance performance.
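The benefit of combining imaging modalities, as in the hyperspectral and X-ray fusion discussed above, can be illustrated with decision-level (late) fusion, in which each modality has its own classifier and their class probabilities are combined. The sketch below uses generic weighted soft voting. It is an assumption-laden illustration, not the ensemble used in the cited study, and the function name `fuse_probabilities` and the modality names are hypothetical.

```python
def fuse_probabilities(prob_by_model, weights=None):
    """Weighted soft voting over per-modality class probabilities.

    prob_by_model: dict mapping a modality name to a list of class
        probabilities (all lists must have the same length).
    weights: optional dict of non-negative weights per modality
        (defaults to equal weighting).
    Returns (fused_probabilities, predicted_class_index).
    """
    names = list(prob_by_model)
    if not names:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_by_model[names[0]])
    if any(len(prob_by_model[name]) != n_classes for name in names):
        raise ValueError("all models must output the same number of classes")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [
        sum(weights[name] * prob_by_model[name][k] for name in names) / total_weight
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted
```

In practice the weights could be set from each modality's validation accuracy, so that a more reliable sensor dominates when the two disagree.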
To synthesize, while computer vision technologies are profoundly redefining seed quality assessments in greenhouse farming, there remains an imperative for further refinements to harness their full potential. Such tools, when properly optimized, can provide real-time and non-destructive solutions to the complexities of seed quality evaluation in the modern agricultural landscape. Table 11 provides a concise summary of several studies conducted in this field.

J. WEED MANAGEMENT
Weed management poses a significant challenge in contemporary agriculture, as weeds compete with crops for vital resources such as light, water, nutrients, and space. Weeds are a primary factor behind agricultural yield losses. One study [146] reported that weeds account for about 34% of yield losses, substantially more than losses from pests (18%) or pathogens (16%). Recently, the agriculture sector has seen a growing interest in integrating modern weed management techniques with computer vision. Extensive research has been carried out using these technologies to control and manage weed growth in greenhouses. In one study, Koparan et al. [147] emphasized the role of image background in deep learning models for weed detection. The study applied architectures like VGG16 and ResNet50, highlighting a decrease in model accuracy when transitioning from a uniform to a non-uniform background and vice versa. However, when combined datasets from both backgrounds were used, the performance surged to nearly 99%. Despite their rigorous approach, the limitation lies in the model's dependency on the image backgrounds, underscoring the need for more diverse training data. This could enhance the model's adaptability across varying environmental conditions. In another study, Wang et al. [148] underscored the challenges posed by the limited availability of weed datasets in the field. The study introduced "Weed25", a dataset containing images of 25 weed species, and utilized state-of-the-art models like YOLOv3, YOLOv5, and Faster R-CNN to achieve accuracy rates of around 92%. However, the research confines its scope to only 25 species, leaving room for the inclusion of more diverse weed types, which could lead to further refinement of models tailored for precision in greenhouse weed management. Oda et al. [149] developed a multispectral camera system specifically for recognizing weeds within crops. Their findings revealed that the infrared band was more precise than other bands, highlighting its importance in plant detection. This affirms the potential of combining computer vision with multispectral imaging in enhancing post-emergence herbicide applications. Nonetheless, challenges persist, including the influence of leaf overlap and varied light intensities on detection accuracy. Addressing these challenges could pave the way for more robust and accurate weed detection models. Koparan et al. [150] examined the role of site-specific weed management using RGB image texture features. Their methodology compared the support vector machine (SVM) and the deep learning-based VGG16 models. Their deep learning approach, specifically the VGG16 model, achieved an F1-score of 100% for corn classification, a notable result for corn production systems. While their method holds potential, one must consider the vast variability of crops and regions. A significant highlight of their work is the effectiveness of VGG16 in identifying weeds in the presence of various crops, elucidating the intricacies of weed-crop dynamics in precision agriculture. In a different study, Rai et al. [151] deployed deep learning models on edge devices, aiming to detect weeds through aerial imagery. By comparing both heavyweight and lightweight deep learning models, they showed that lightweight models, specifically CSPMobileNet-v2 and YOLOv4-lite, achieved a mean average precision (mAP) of 83.2% and 82.2%, respectively. Their approach offers real-time detection with commendable accuracy but must be scrutinized for potential hardware biases and storage challenges, especially with high-resolution aerial data. The flexibility offered by edge computing cannot be ignored; however, securing these devices remains a pertinent concern.
Drawing insights from these studies, it is evident that integrating computer vision techniques holds immense potential for refining greenhouse weed management. However, achieving precision remains a formidable challenge. Diverse data sources, enhanced models, and advanced imaging techniques may collectively steer the future of this realm, allowing for more sustainable and efficient weed control in greenhouses. A brief overview of different studies in this area, along with their technical details, can be found in Table 12.
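Since many of the detection results in this section are reported as mean average precision (mAP), a compact sketch of how average precision is obtained for one class may help in interpreting them. The code below computes intersection over union (IoU), greedily matches confidence-ranked detections to ground-truth boxes, and integrates the interpolated precision-recall curve. It is a simplified single-image, single-class version, and the matching and interpolation conventions differ slightly between benchmarks, so the numbers in the cited studies may follow other protocols. The function names are illustrative.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """AP for one class. detections: list of (confidence, box)."""
    if not ground_truth:
        return 0.0
    matched = [False] * len(ground_truth)
    precisions, recalls = [], []
    tp = fp = 0
    # Process detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, -1
        for idx, gt_box in enumerate(ground_truth):
            if not matched[idx]:
                overlap = iou(box, gt_box)
                if overlap > best_iou:
                    best_iou, best_idx = overlap, idx
        if best_idx >= 0 and best_iou >= iou_threshold:
            matched[best_idx] = True   # each ground-truth box is matched once
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / len(ground_truth))
    # Make precision non-increasing in recall (all-point interpolation).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, previous_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap
```

The mAP is then the mean of this value over all classes (and, in some benchmarks, over several IoU thresholds), which is why mAP figures from different studies are comparable only when the evaluation protocol matches.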

V. CRITICAL CHALLENGES FACED BY COMPUTER VISION TECHNOLOGY IN SMART GREENHOUSE OPERATIONS AND THE WAY OUT
While computer vision has shown immense potential across various greenhouse applications, as highlighted in this review, significant challenges must be addressed for effective real-world deployment.

A. LACK OF LARGE-SCALE STANDARDIZED DATASETS
A major limitation identified is the shortage of large-scale standardized image datasets for the greenhouse domain (as noted in [66], [102], [105]). Most studies rely on small proprietary datasets collected by the researchers themselves, often just a few hundred images, which restricts generalization of techniques ([93], [98], [107]). There is a need to establish extensive public databases encapsulating the diversity of greenhouse environments, with variability in factors like lighting, humidity, crop types, growth stages, and imaging angles represented ([92], [95], [106]). Centralized repositories like PlantVillage offer a valuable start but have limited coverage and annotation complexity. Constructing large-scale greenhouse image datasets with standard formatting and annotations will be critical to benchmark performance of computer vision techniques and fuel advancements ([94], [114], [115]).

B. NEED FOR SPECIALIZED AND OPTIMIZED MODELS
While deep neural networks have driven progress in many visual recognition tasks, off-the-shelf models pre-trained on open-field images fail to account for the unique intricacies of greenhouse environments ([96], [97], [105]). Networks designed for generic datasets do not transfer effectively to greenhouses, as the specialized appearance and growth patterns of crops in controlled settings are not represented ([99], [108], [109]). The frequent occlusion from dense foliage, glass surface reflections, condensation, extreme illumination fluctuations, and evolution of visual features across plant growth require specialized model architectures and training strategies ([95], [100], [114]). For instance, Li et al. [97] found that directly applying YOLOv3 for cucumber canopy recognition resulted in insufficient accuracy, requiring an enhanced model with additional coordinate attention modules. Xu et al. [107] showed that extensive pruning and adaptation of YOLOv5s was needed for effective real-time melon disease detection in greenhouses. Rong et al. [110] demonstrated that off-the-shelf YOLOv5 had difficulties recognizing occluded tomato bunches, needing optimization of the loss function and model architecture. These studies underscore the need to tailor base models to address nuances like occlusion, illumination variation, and growth patterns unique to greenhouse environments. Beyond architectures, the training process must also account for greenhouse-specific factors. Models like those proposed by Cong et al. [100], Gang et al. [93], and Liu et al. [113] illustrate the benefits of customized training regimes using specialized greenhouse datasets over pre-trained networks. However, collecting exhaustive labeled greenhouse data can be challenging. Recent work by Moysiadis et al. [95] demonstrates the potential of simulated or synthetically augmented data for greenhouse model training. Such emerging data-centric solutions must be explored in combination with adapted model architectures.
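As a simple illustration of synthetically augmenting scarce greenhouse data, the sketch below generates image variants by random horizontal flips and brightness scaling, the latter loosely mimicking illumination fluctuations. This is classical augmentation used as a stand-in, not the simulation approach of the cited work. Images are represented as nested lists of RGB tuples to keep the example dependency-free, and the function names and the brightness range of 0.6 to 1.4 are assumptions.

```python
import random


def horizontal_flip(image):
    """Mirror an image (list of rows of (r, g, b) tuples) left to right."""
    return [list(reversed(row)) for row in image]


def adjust_brightness(image, factor):
    """Scale every channel by `factor`, clipping to the 0-255 range."""
    return [
        [tuple(min(255, max(0, round(channel * factor))) for channel in pixel)
         for pixel in row]
        for row in image
    ]


def augment(image, n_variants=4, seed=0):
    """Return `n_variants` randomly flipped and brightness-jittered copies."""
    rng = random.Random(seed)          # seeded for reproducible variants
    variants = []
    for _ in range(n_variants):
        if rng.random() < 0.5:
            variant = horizontal_flip(image)
        else:
            variant = [list(row) for row in image]
        # Hypothetical range chosen to imitate shading and strong sunlight.
        variant = adjust_brightness(variant, rng.uniform(0.6, 1.4))
        variants.append(variant)
    return variants
```

For detection or segmentation tasks the annotations must be transformed together with the image, for example by mirroring bounding-box coordinates whenever a flip is applied.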
Overall, while deep learning has shown immense potential, realizing robust computer vision for greenhouses will require specialized model architectures and training techniques tailored to the unique intricacies of controlled environments ([92], [94], [98]). Greenhouse-specific networks, re-training regimes, augmented data, and continual optimization will be instrumental in developing computer vision solutions adept at the nuances of greenhouse conditions.

C. CONSTRAINTS IN COMPUTATIONAL RESOURCES
Many state-of-the-art techniques like Mask R-CNN are computationally intensive, making real-time deployment difficult given the constrained resources of embedded greenhouse systems ([111], [114]). Hardware devices in agricultural settings often have tight power budgets, restricting the complexity of models that can be run ([93], [116]). While cloud-based solutions can provide greater parallel processing power, sole reliance on the cloud introduces drawbacks like network latency, connectivity dependencies, and data privacy concerns. A balanced approach could be edge-cloud co-design, with lightweight models handling core functionality on-device while leveraging the cloud for more intensive computations. Still, optimizing models and inference pipelines for efficient execution on low-power devices remains a key challenge ([112], [117]). Research on extremely lightweight yet accurate architectures specialized for greenhouse conditions is imperative.
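One way to realize the edge-cloud co-design described above is confidence-based routing: a lightweight on-device model answers when it is confident, and only uncertain images are forwarded to a heavier cloud model. The sketch below is a minimal illustration under stated assumptions. Both models are passed in as callables returning `(label, confidence)`, the threshold of 0.8 is arbitrary, and a failed cloud call is assumed to raise `ConnectionError`. None of these names come from the reviewed studies.

```python
def route_inference(image, edge_model, cloud_model,
                    confidence_threshold=0.8, cloud_available=True):
    """Run the edge model first and escalate to the cloud only when needed.

    edge_model, cloud_model: callables taking an image and returning
        (label, confidence); both are hypothetical placeholders.
    Returns (label, confidence, source) with source "edge" or "cloud".
    """
    label, confidence = edge_model(image)
    if confidence >= confidence_threshold or not cloud_available:
        return label, confidence, "edge"
    try:
        cloud_label, cloud_confidence = cloud_model(image)
    except ConnectionError:
        # Connectivity loss: fall back to the on-device result.
        return label, confidence, "edge"
    return cloud_label, cloud_confidence, "cloud"
```

With this arrangement the greenhouse system keeps working through network outages, and the share of images sent to the cloud, and hence latency and bandwidth, can be tuned through the confidence threshold.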

D. LIMITED INTEGRATION OF DIVERSE SENSING MODALITIES
While most research relies solely on RGB images, other sensing modalities like hyperspectral imaging can provide valuable crop insights ([118], [119], [120]). However, techniques to integrate and fuse multi-modal data sources are still emergent. Capturing relationships between RGB, spectral, depth, thermal, and other data could significantly enhance model robustness and performance ([121], [122]). Developing sensor fusion methods and tailored multi-modal networks is an open research frontier.

E. SHORTAGE OF SPECIALIZED AND INTERDISCIPLINARY TALENT
The effective development and implementation of computer vision technologies for greenhouses require skilled professionals spanning multiple disciplines ([14], [103], [104], [105]). Emerging deep learning solutions rely heavily on the parallel computing capabilities of GPUs ([64]). Fully leveraging these tools requires both computer vision and agriculture expertise ([93], [114]). However, there is a shortage of talent with competencies and experience across these domains ([97], [99]). Computer vision involves diverse fields like image processing, machine learning, and pattern recognition ([14]). Integrating techniques from these areas into the complexities of greenhouse farming demands specialized interdisciplinary knowledge ([95], [108]). From researchers advancing scientific innovations to technicians managing real-world deployment, skilled personnel able to bridge computer vision and agriculture are imperative but lacking ([100], [106]). Closing this talent gap across computer science and agricultural engineering is vital to drive progress. Initiatives to support education, training, and collaboration across disciplines are critical to develop professionals who can effectively apply computer vision to transform greenhouse agriculture.
In summary, major challenges like the lack of sufficient labeled data, the need for task-specific models, computational constraints, and limited multi-modality demonstrate that significant work remains to develop computer vision techniques adept at the nuances of greenhouse environments. Substantial progress has been made, but addressing these challenges through data generation, model optimization, efficient designs, and sensor fusion will be essential to realize the full potential of computer vision for smart greenhouse agriculture.

VI. CONCLUSION
This paper presented a focused analysis of recent advancements in applying computer vision and deep learning for greenhouse agriculture automation. Spanning diverse application areas like crop monitoring, disease detection, yield forecasting, and quality analysis, over 100 studies were reviewed in detail. The innovations showcased offer glimpses into the transformative potential of data-driven intelligent solutions for optimizing productivity and sustainability in controlled environments. However, significant challenges remain that must be addressed before widespread adoption. The lack of large-scale standardized datasets restricts model generalization and limits performance benchmarking. Meanwhile, computational constraints of embedded systems pose bottlenecks for real-time deployment. There is also a need for more skilled talent with expertise across computer vision, deep learning, and agriculture. Most importantly, the uniqueness of greenhouse environments demands specialized, optimized techniques, because off-the-shelf solutions pre-trained on open-field data often fail to transfer effectively. Nonetheless, the progress made indicates that an exciting future lies ahead. Expanded collaborations for standardized dataset development, computational advancements in low-power devices, interdisciplinary training programs, and research on tailored solutions for greenhouses will be critical to drive the field forward. Computer vision and deep learning have already shown initial success in automation tasks like robotic harvesting and quality assessment. With continued innovation, they are poised to transform greenhouse infrastructure worldwide, enabling autonomous, efficient, data-driven systems that enhance productivity, resilience, and sustainability. This timely review provided a holistic synthesis of the state-of-the-art, analyzed key challenges, and outlined prospective directions. By condensing current work and elucidating future needs, it aims to motivate and guide ongoing research to unlock the full potential of AI and computer vision for next-generation smart greenhouse agriculture globally. The possibilities are boundless, and the opportunities endless.

FIGURE 2. Schematic Representation of Smart Greenhouse Components Integrated with IoT, AI, and Computer Vision Technologies.

FIGURE 3. Diverse Applications of Computer Vision Across Various Sectors.

FIGURE 4. Flowchart of the Computer Vision Process. This diagram illustrates the sequential stages involved in computer vision applications, from the initial acquisition of image data to the final interpretation and utilization of the processed information.

FIGURE 5. Diagram of CNN's image classification architecture mechanism.

FIGURE 6. CNN architecture mechanism for Object detection.

FIGURE 8. A Deep Convolutional Neural Network (DCNN) layout. The structure comprises an initial input layer followed by four convolutional layers with their corresponding ReLU activations. It also features two stochastic pooling layers and a pair of fully connected layers, and concludes with a softmax regression output layer. Source: Prasad et al. [81].
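
As a rough illustration of the layout described in the caption of Fig. 8, the sketch below traces feature-map shapes through four convolution + ReLU stages, two pooling stages, and two fully connected layers. The caption gives only the layer types and counts, so the input size, kernel sizes, channel counts, layer ordering, fully connected width, and number of classes used here are all assumptions chosen for illustration; they are not the configuration reported by Prasad et al. [81].

```python
# Hypothetical layer sizes: the caption of Fig. 8 specifies layer types and counts only.

def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_dcnn(input_hw=64, in_ch=3, n_classes=10):
    """Trace (layer, shape) pairs for an assumed conv-conv-pool-conv-conv-pool-fc-fc stack."""
    layers = [
        ("conv", 16), ("conv", 16), ("pool", None),
        ("conv", 32), ("conv", 32), ("pool", None),
    ]
    hw, ch = input_hw, in_ch
    shapes = [("input", ch, hw, hw)]
    for kind, out_ch in layers:
        if kind == "conv":
            # Assumed 3x3 kernel with padding 1: spatial size is preserved.
            # The ReLU that follows each convolution does not change the shape.
            hw, ch = conv_out(hw, kernel=3, stride=1, padding=1), out_ch
        else:
            # Assumed 2x2 pooling window with stride 2: spatial size is halved.
            hw = conv_out(hw, kernel=2, stride=2)
        shapes.append((kind, ch, hw, hw))
    flat = ch * hw * hw                        # features fed to the first FC layer
    shapes.append(("fc1", 128))                # assumed hidden width
    shapes.append(("fc2/softmax", n_classes))  # class scores passed to softmax
    return flat, shapes


flat, shapes = trace_dcnn()
for entry in shapes:
    print(entry)
```

Under these assumed sizes, the two pooling stages reduce a 64 x 64 input to 16 x 16 before flattening; changing `input_hw` or the channel counts shows how quickly the first fully connected layer grows, which is relevant to the embedded-deployment constraints discussed in this review.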

FIGURE 9. The process of the convolution operation.

FIGURE 10. The process of the pooling operation.
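
The two operations named in the captions of Figs. 9 and 10 can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any reviewed study: the 5 x 5 input, the 2 x 2 filter, and the function names `conv2d_valid` and `max_pool2d` are hypothetical, and max pooling is used as one common pooling variant.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the
    elementwise products. As in most deep learning frameworks, the kernel
    is not flipped, so this is strictly a cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window of the feature map."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


# Toy 5x5 single-channel "image" and a 2x2 diagonal-difference filter.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool2d(features)           # 2x2 map after pooling
print(features)
print(pooled)
```

The convolution shrinks the 5 x 5 input to a 4 x 4 feature map, and pooling then halves each spatial dimension, which is how these layers progressively reduce resolution while retaining the strongest responses.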

FIGURE 13. Taxonomy of computer vision applications in greenhouse farming.

TABLE 1. Comparison of deep learning-based computer vision applications in agriculture-related survey papers. GM: Growth Monitoring; RCC: Recognition and Classification of Crops; DM: Disease Monitoring; AH: Automatic Harvesting; QI: Quality Inspection; YE: Yield Estimation; PHA: Plant Health Analysis; PIM: Pest and Insect Management; SQA: Seed Quality Analysis; WM: Weed Management.

TABLE 2. Summary of major CNN architectures.

TABLE 3. Summary of vision-based crop growth monitoring studies.

TABLE 4. Summary of recognition and classification of crops.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

TABLE 5. Summary of greenhouse crop disease monitoring.
VOLUME 12, 2024

TABLE 6. Summary of greenhouse automatic harvesting tasks.

TABLE 7. Summary of greenhouse yield estimation.

TABLE 8. Summary of greenhouse crop health analysis.

TABLE 9. Summary of greenhouse pest and insect management.

TABLE 10. Summary of greenhouse crop quality inspection.

TABLE 11. Summary of greenhouse seed quality analysis.

TABLE 12. Summary of greenhouse weed management.