A Reinforcement Learning Approach for Optimal Placement of Sensors in Protected Cultivation Systems

Optimal placement of sensors in protected cultivation systems to maximize monitoring and control capabilities can guide effective decision-making toward achieving the highest levels of productivity and other desirable outcomes. Reinforcement learning, unlike conventional machine learning methods such as supervised learning, does not require large, labeled datasets, thereby providing opportunities for more efficient and unbiased design optimization. With the objective of determining the optimal locations of sensors in a greenhouse, a multi-armed bandit problem was formulated using the Beta distribution and solved by the Thompson sampling algorithm. A total of 56 two-in-one sensors designed to measure both internal air temperature and relative humidity were installed at a vertical distance of 1 m and a horizontal distance of 3 m apart in a greenhouse used to cultivate strawberries. Data were collected over a period of seven months covering four seasons: February (winter); March, April, and May (spring); June and July (summer); and October (autumn). Each month was analyzed separately. Results showed unique patterns of sensor selection for temperature and relative humidity during the different months. Furthermore, temperature and relative humidity each had different optimal location selections, suggesting that two-in-one sensors might not be ideal in these cases. The use of reinforcement learning to design optimal sensor placement in this study aided in identifying 10 optimal sensor locations for monitoring and controlling temperature and relative humidity.


I. INTRODUCTION
Agriculture is important for the sustenance of livelihoods worldwide by providing nutrition, raw materials for industries, draft power, and mobility, among others.
The associate editor coordinating the review of this manuscript and approving it for publication was Dongxiao Yu.
Protected cultivation systems, including greenhouses, make it easier to control macro- and micro-environments, which allows year-round crop cultivation and helps provide favorable growing conditions under uncertainties such as extreme weather and pests [1]-[3]. Financially, protected cultivation systems usually have higher returns per unit area than open-field cultivation [4]. The adoption of these systems is rising, with an estimated 405,000 hectares of greenhouses spread across all continents and fifty-five countries operating them on commercial scales [5].
VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Unlike in open field cultivation, operations in protected systems tend to be more sophisticated and require the adoption of several technologies [4], [6], [7]. These technologies make protected cultivation systems capital intensive. First-generation intelligent protected cultivation systems focused primarily on adopting sensors for monitoring indoor climate conditions and performing essential controls of temperature, humidity, and irrigation. For most plants, an air temperature range of 10 °C to 24 °C must be maintained for survival and optimum yield [8], [9]. Temperature control is critical not just for plant performance but also for managing the system's energy consumption, which invariably influences production costs. In addition, maintaining the right range of relative humidity is critical in controlling transpiration and preventing fungal infection [10].
Autonomous condition observation is essential to health monitoring in protected cultivation, especially for large systems. Drastic changes in atmospheric conditions are more likely in protected cultivation systems than in open fields. Monitoring and controlling these basic parameters in addition to others such as carbon dioxide and electrochemical conductivity [11]- [15] are therefore more critical in protected systems. These have given rise to the development of sensors for monitoring the micro and macro conditions of protected cultivation systems. The development of mobile robotics to automate processes and optimize production has also transformed operations and increased the need for sensors.
Currently, most sensors are randomly installed in protected cultivation systems depending on the grower's resources, available technical know-how, and the size of the protected cultivation system [16]. Conventionally, as many sensors as possible have been placed at random locations within enclosures to monitor climatic conditions. The use of multiple redundant sensors could result in big data and associated problems with data management. The quality of information obtained depends considerably on the number and location of sensors. Optimizing sensor locations within a distributed process reduces operating costs but is challenging, since most distributed processes are intrinsically nonlinear with infinite dimensions. Early methods were derived from approximation models of the partial differential equation (PDE), such as the finite difference method, or from the error covariance matrix of Kalman filters [17]-[19]. These early methods had been applied only to linear systems with a small number of sensors and without any general systematic approach. Therefore, they were infeasible for complex nonlinear systems, which required higher-dimensional representations.
Reports [20]-[23] have also shown that sensor data validation can be very challenging in system monitoring because of the stochastic nature of failure occurrence, an insufficient number of sensors, and incorrect sensor locations. This can make it impossible to obtain sufficient information and lead to an incorrect understanding of the conditions of the systems. For example, in vegetable cultivation, the many leaves and fruits on plants can cause blind areas and low utilization of directional sensors that are placed incorrectly [24].
Several techniques, such as the system reliability criterion under epistemic uncertainty [25], genetic algorithms [26], [27], Harris hawks optimization [28], an exponential-time exact algorithm [29], and the Fisher information matrix [30], have been used to design optimal sensor placement. To the best of our knowledge, methods such as supervised and unsupervised learning have been less explored in controlled environment cultivation than in other applications. In [31], the authors reported two methods (equal sensor-spacing and trial-and-error) to select the numbers and locations of wireless sensor nodes in an intelligent greenhouse. The study suggested that increasing the number of sensors did not necessarily reduce errors in measurement. Therefore, an optimal number and placement of sensors needed to be determined through trial and error.
Advances in machine learning techniques that simulate human thinking, such as reinforcement learning models, provide robust ways to address sensor placement problems that are high-dimensional, complex, and full of uncertainties. In this study, a machine learning approach was used to determine the optimum number and locations of sensors for high-quality and representative data collection in a greenhouse. The major steps involved (a) designing and installing two-in-one temperature and relative humidity sensors to take measurements, together with a network architecture to transmit the data and store it remotely on a server, and (b) based on the reinforcement learning method, tracking the errors encountered in each trial and programming an agent to interact with the environment (the greenhouse) to reduce errors. This paper is organized as follows: Section I introduces the background of the study and related works. Section II describes the method of data acquisition and the properties of the studied greenhouse. Section III presents the proposed framework, categorized into problem formulation, the Thompson sampling algorithm, and its implementation. Finally, Section IV analyzes the environmental data, and Section V presents our conclusions.

A. RELATED WORK
The optimal sensor placement problem has been explored in various fields with different machine learning techniques. In [32], for example, Gaussian processes were explored to develop a learning scheme of greedy algorithms. Also, [33] employed a random forests algorithm to select the most important input variables as the optimal sensor locations. In another study, [31] developed a scalable algorithm for sensor placement under constraints of cost and complete coverage for arbitrary sensor fields, a problem that is non-deterministic polynomial-time complete (NP-complete), that is, in a class of computational problems for which no efficient solution algorithm has been found. Adopting a grid-based placement scenario, the problem was formulated as a combinatorial optimization problem for minimizing the highest distance error in a sensor field under the constraints. Also, [34] introduced a Bayesian approach to optimal sensor placement for structural health monitoring, focusing on the example of active sensing using guided ultrasonic waves by implementing an appropriate statistical model of the wave propagation and feature extraction process. For structural health monitoring, [26] employed a genetic algorithm technique, which assigns a fitness value to each candidate solution of the problem and applies the principle of survival of the fittest. The modal strain energy (MSE) and the modal assurance criterion (MAC) were used as the fitness functions. A limitation of this approach was the low probability of obtaining good results when the number of iterations was high. For sensor placement to maximize the efficiency of a fault detector, [35] proposed a neural network method to locate and classify faults to determine an optimal (or near-optimal) sensor distribution.
Some earlier work has also been reported for optimal sensor placement in greenhouses. In [16], an error-based method and entropy-based method to determine the optimal sensor locations were explored. It was believed that in controlling the internal environment of a greenhouse, sensors should be installed at points that accurately represent the entire environment and this inspired the error-based approach. This approach first calculated the reference trend by averaging the air temperature data measured by all the sensors followed by a combination trend calculated by averaging the air temperature data measured at the selected sensor locations for all the combinations. The error trend of each combination was calculated as the difference between the reference trend and the combination trend. Finally, the combinations were ranked according to statistical indices calculated using the error trends (the average, standard deviation, outlier, and z-index). The z-index, an index for evaluating how close the distribution of error trends is to a Gaussian distribution was used. The combinations were ranked according to each statistical index and then scored accordingly. Attempts to identify the optimal sensor location for detecting areas with significant air temperature variations led to the second entropy-based method. The approach here was to minimize the amount of redundant information in the measured data while maximizing the quality of information obtained. In [24], a hierarchical cooperative particle swarm optimization algorithm was used for directional sensor placement in a vegetable-cultivating greenhouse to maximize target coverage without occlusion. Their experimental results showed improved sensor utilization to a certain degree. The particle swarm optimization algorithm decomposed the global effective coverage problem into the utilization optimization of each sensor and finally led to the orientation angles proposed [24]. 
The experimental results showed that the studied model and algorithm could avoid occlusion between covered objects (as observed in the arrangement of leaves on a tree) while improving sensor utilization to a certain degree. Liu [36] proposed a solution to optimal sensor placement in a solar greenhouse located on a roof based on the analysis of the characteristics of temperature in the internal environment and using computational fluid dynamics.
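The error-based idea attributed to [16] above (a reference trend from all sensors, a combination trend from a subset, and their difference as the error trend) can be illustrated with a small sketch. This is not the implementation used in [16]: the function name `rank_combinations`, the sensor labels, the readings, and the choice of mean absolute error as the only ranking index are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def rank_combinations(readings, k):
    """Rank every k-sensor combination by how closely its average
    follows the average of all sensors (smaller error ranks first).

    readings: dict mapping a sensor label to a list of temperatures,
              one value per time step (all lists the same length).
    """
    sensors = sorted(readings)
    n_steps = len(readings[sensors[0]])
    # Reference trend: average over all sensors at each time step.
    reference = [mean(readings[s][t] for s in sensors) for t in range(n_steps)]
    ranked = []
    for combo in combinations(sensors, k):
        # Combination trend: average over the selected sensors only.
        combo_trend = [mean(readings[s][t] for s in combo) for t in range(n_steps)]
        # Error trend: difference between combination and reference trends.
        error_trend = [c - r for c, r in zip(combo_trend, reference)]
        ranked.append((combo, mean(abs(e) for e in error_trend)))
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical readings (degrees Celsius) for five sensors over three time steps.
readings = {
    "A1": [20.0, 22.0, 24.0],
    "B1": [21.0, 23.0, 25.0],
    "C1": [22.0, 24.0, 26.0],
    "D1": [25.0, 27.0, 29.0],
    "E1": [27.0, 29.0, 31.0],
}
ranking = rank_combinations(readings, k=2)
best_combo, best_error = ranking[0]
print(best_combo, best_error)
```

The exhaustive loop over combinations also shows why such approaches scale poorly: the number of candidate subsets grows combinatorially with the number of sensors.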
Most of the approaches mentioned above relied on complex control assumptions and schemes, or an exhaustive search over a large set of candidate placements that were defined in advance. Therefore, they were infeasible for complex nonlinear systems which required a high-dimensional representation. The Reinforcement Learning (RL) approach helps in addressing these challenges by allowing an autonomous active agent to learn the optimal policies while interacting with an initially unknown environment. The self-learning from unknown environments makes RL a promising candidate for an optimal sensor placement problem.
In [37], an RL-based method was proposed for optimal sensor placement in the spatial domain for modeling distributed parameter systems (DPSs), in which the sensor placement configuration was mathematically formulated as a Markov decision process (MDP) with specified elements. The sensor locations were optimized by learning the optimal policies of the MDP according to the spatial objective function. Paris et al. [38] worked on robust flow control and optimal sensor placement using deep reinforcement learning. They focused on the efficiency and robustness of the identified control strategy and introduced a novel algorithm (S-PPO-CMA) to optimize the sensor layout. An energy-efficient control strategy reducing drag by 18.4% at Reynolds number 120 was obtained. This control policy was shown to be robust both to the Reynolds number in the range [100, 216] and to measurement noise, enduring signal-to-noise ratios as low as 0.2 with negligible impact on performance. Along with a systematic study of sensor number and location, the proposed sparsity-seeking algorithm successfully optimized a reduced 5-sensor layout while keeping state-of-the-art performance [38]. In [39], foot plantar sensor placement by a deep reinforcement learning algorithm, without using any prior knowledge of the human foot anatomical area, was studied. To apply an RL algorithm, the authors proposed a sensor placement environment and reward system that aimed to optimize fitting the center of pressure (COP) trajectory during a self-selected speed running task. In this environment, the agent considered placing eight sensors within a 7 × 20 grid coordinate system, and the final pattern became the result of sensor placement. The results showed that this method could generate a sensor placement with a low mean square error in fitting the ground truth COP trajectory, and that it robustly discovered the optimal sensor placement among many combinations.
To the best of our knowledge, all the methods seen so far for solving the sensor placement problem using RL selected their actions based on the current averages of the rewards received from those actions [37], [38]. However, Thompson Sampling [40], [41], sometimes known as the Bayesian Bandits algorithm, takes a different approach. It extends this current mean reward by building a probability model from the obtained rewards and then sampling from this model to choose an action. This gives not only an increasingly accurate estimate of the possible reward, but also a level of confidence in that reward, and this confidence increases as more samples are collected. This process is known as Bayesian inference [42], which entails updating viewpoints as more evidence becomes available.

Figure 1 shows the aerial map of the research farm where the experimental greenhouse (indicated with the letter G) is located. The greenhouse in which the experiments were performed is used to grow strawberries (Figure 2A) in the research farm of Kyungpook National University, Daegu, Republic of Korea. It was Quonset-shaped, with transparent vinyl material for the roof and walls and a concrete floor. The internal air temperature and relative humidity of the greenhouse were measured using 56 two-in-one temperature and humidity sensors installed 1 m apart vertically and 3 m apart horizontally. The sensors were installed in 8 rows and 7 columns to achieve a uniform distribution. The Cartesian coordinates of each sensor were determined using the X, Y, Z three-dimensional coordinate system. The sensors' range for measuring temperature was −20 to 80 °C with an error of ±0.3 °C, while that for relative humidity was 0 to 100% with an error of ±2%. In Figure 2B, the letters A-H indicate columns, while in Figure 2C, the numbers 1-7 represent the points along the rows at which the sensors were installed.
As a precaution to minimize error in the set-up for data collection, sensors were installed in plastic protective cases to shield them from direct solar radiation. Figure 3 illustrates the set-up for data collection, which is described as follows: the sensors were connected by cables to the sensor nodes, and the sensor nodes transmitted the temperature and relative humidity data wirelessly to the network controller through gateways. These data were sent to the server and then to a mobile display.

III. PROPOSED FRAMEWORK
A. PROBLEM FORMULATION
Given m sensors and n candidate placements (m < n), the task is to find the m sampling locations, out of the assumed n, that are most informative with respect to a predefined measure. This makes the optimal sensor placement problem NP-hard [43], [44]. If m < n is chosen so that fewer sensors are used while approximately the same measurements are obtained, the sensor placement problem can be seen as a multi-armed bandit problem. With this approach, a fixed and limited set of resources (sensor locations) must be allocated between competing (alternative) sensors in a way that maximizes the expected gain. The properties of each choice (sensor) are only partially known at the time of allocation and may become better understood with time or by allocating resources to the choice [45] using the Thompson Sampling algorithm (Equation 1), where γ is modeled as a Beta distribution. The Beta distribution was used to model the simplest form of the multi-armed bandit problem, the one with a binary outcome or reward. Instead of each location returning a varying number of selections, each location was either selected or not. The rewards had only two possible values: 1 when the chosen location was selected and 0 otherwise. When a random variable has only two possible outcomes, its behavior can be described by the Bernoulli distribution, which validates the solutions as reported in [46]-[48]. In this study, each machine (sensor location) received a reward of 1 when the outcome was successful (that is, if the average temperature or average relative humidity measurement of a sensor was greater than the mean measurement) and 0 otherwise, since our goal was to identify the sensor with the highest probability of success. Hence, for each day of the month, experiments were run and the number of successes was recorded. The sensor locations with the highest number of successes were selected. After each trial, the parameters were updated as in Equation 2:

$$(\alpha, \beta) \leftarrow \begin{cases} (\alpha + 1,\ \beta) & \text{if success} \\ (\alpha,\ \beta + 1) & \text{otherwise} \end{cases} \tag{2}$$

A random variable following Beta(α, β) takes values within the interval [0, 1]; α and β correspond to the counts of trials in which we succeeded or failed to get a reward, respectively.
The posterior probability is Beta(α, β) with the updated parameters in Equation 2.
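As a minimal sketch of the binary reward and the count update described above, the following code thresholds one day of per-location mean readings against the overall mean and increments α or β accordingly. The location labels, the readings, the starting values α = β = 1, and the function names are hypothetical. For simplicity the sketch updates every location at once, whereas a bandit loop updates only the location selected in a given round.

```python
from statistics import mean


def binary_rewards(daily_means):
    """Return 1 for a location whose mean reading exceeds the overall mean, else 0."""
    overall = mean(daily_means.values())
    return {loc: int(value > overall) for loc, value in daily_means.items()}


def update_beta(params, rewards):
    """Increment alpha on a reward of 1 and beta on a reward of 0."""
    updated = {}
    for loc, (alpha, beta) in params.items():
        if rewards[loc] == 1:
            updated[loc] = (alpha + 1, beta)
        else:
            updated[loc] = (alpha, beta + 1)
    return updated


# Hypothetical mean temperatures (degrees Celsius) for one day at four locations.
daily_means = {"A1": 22.0, "B4": 26.0, "E3": 27.0, "H7": 21.0}
# Assumed starting point: alpha = beta = 1 for every location (a uniform prior).
params = {loc: (1, 1) for loc in daily_means}

rewards = binary_rewards(daily_means)
params = update_beta(params, rewards)
print(rewards)
print(params)
```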

B. THOMPSON SAMPLING ALGORITHM AND ITS IMPLEMENTATION
Thompson Sampling (sometimes referred to as Posterior Sampling or Probability Matching) is an algorithm that balances exploration and exploitation to maximize the cumulative rewards obtained by performing an action. In this algorithm, an action is performed multiple times (exploration) and, based on the results obtained from the actions, the algorithm either rewards or penalizes. Further actions are performed with the goal of maximizing the reward (exploitation). In other words, new choices are explored to maximize rewards while the already explored choices are exploited. Since Thompson Sampling uses probability distributions and Bayes' rule to predict the success rate of each slot machine, it is mathematically expressed in Equation 3 as

$$P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)} \tag{3}$$

where D represents the data observed, P(θ | D) is our posterior, P(D | θ) is the likelihood of observing the data given θ, P(θ) is the prior belief on the distribution of θ, and P(D) is the marginal probability of the data, which acts as a normalizing constant.
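The role of Bayes' rule here can be checked numerically. The sketch below assumes a Beta(1, 1) prior and a hypothetical record of 7 successes and 3 failures. It evaluates likelihood times prior on a grid, normalizes by a numerical estimate of P(D), and compares the result with the closed-form Beta posterior obtained by adding the counts to the prior parameters. The function names and the grid size are illustrative choices.

```python
import math


def beta_pdf(theta, a, b):
    """Density of Beta(a, b) at theta, computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta))


def grid_posterior(a, b, successes, failures, n_grid=2000):
    """Posterior density on a midpoint grid: likelihood * prior / evidence."""
    thetas = [(k + 0.5) / n_grid for k in range(n_grid)]
    unnormalized = [
        t ** successes * (1 - t) ** failures * beta_pdf(t, a, b) for t in thetas
    ]
    # Midpoint-rule estimate of P(D), the normalizing constant.
    evidence = sum(unnormalized) / n_grid
    return thetas, [u / evidence for u in unnormalized]


a, b = 1.0, 1.0              # assumed uniform prior
successes, failures = 7, 3   # hypothetical observed rewards

thetas, numeric = grid_posterior(a, b, successes, failures)
closed_form = [beta_pdf(t, a + successes, b + failures) for t in thetas]
max_gap = max(abs(x - y) for x, y in zip(numeric, closed_form))
print(f"largest difference between grid and closed-form posterior: {max_gap:.2e}")
```

Because the Beta prior is conjugate to the Bernoulli likelihood, the posterior never has to be computed on a grid in practice; keeping two counters per location is enough.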
To model the prior distribution of θ, the Beta distribution was adopted as a parametric assumption. The Beta distribution is a function of a and b, which represent the counts of successes and failures for a given θ, respectively (Equation 4):

$$f(\theta; a, b) = \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)} \tag{4}$$

where $B(a, b) = \Gamma(a)\,\Gamma(b) / \Gamma(a + b)$ is the normalizing constant. In the context of a prior, a and b represent the pseudo-counts of successful and unsuccessful trials for the sensor, which encode the initial perspective on the reward function of that specific choice of sensor. In other words, each sensor location is selected based on the Beta distribution of rewards and penalties associated with it.

In summary, for each row of the data, that is, each new round n, one of the 56 sensor locations was selected to calculate the placement rate (whether a sensor is placed there or not). The goal was to select the best location at each round, over many rounds. For each round n, over 3,000 rounds, the following three steps were repeated:

- For each location i, take a random draw from the following distribution (Equation 5):

$$\theta_i(n) \sim \mathrm{Beta}\big(N_i^1(n) + 1,\ N_i^0(n) + 1\big) \tag{5}$$

where $N_i^1(n)$ is the number of times location i has received a reward of 1 up to round n, and $N_i^0(n)$ is the number of times location i has received a reward of 0 up to round n. Note that a reward of 1 corresponds to any temperature or humidity measurement greater than the overall mean measurement.

- Select the strategy s(n) that has the highest $\theta_i(n)$ (Equation 6):

$$s(n) = \arg\max_{i \in \{1, \dots, 56\}} \theta_i(n) \tag{6}$$

- Update $N_{s(n)}^1(n)$ and $N_{s(n)}^0(n)$ according to the following conditions:

• If the selected location s(n) received a reward of 1 (Equation 7):

$$N_{s(n)}^1(n) := N_{s(n)}^1(n) + 1 \tag{7}$$

• If the selected location s(n) received a reward of 0 (Equation 8):

$$N_{s(n)}^0(n) := N_{s(n)}^0(n) + 1 \tag{8}$$

In Figure 4, sensor locations from 1 to 56 are given in step A, and each location has its own Beta distribution, as seen in step B. The available sensor locations are denoted with A, and an example matrix obtained by recording whether a particular sensor location was selected (1) or not (0) in each trial is represented with B. In this study, instead of simulating the Beta distribution [49], the generated data set was made to conform to the binary (1 and 0) rewards assumed by the Beta distribution, with the mean used as the threshold. Measurements below the mean of the data were categorized as 0, and as 1 otherwise. The mean was chosen because it allows determination of the overall trend of a data set, and there were no outliers in the data.
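The three steps of each round (draw from every location's Beta distribution, select the location with the largest draw, update that location's count of 1 or 0 rewards) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name, the four synthetic locations, their success probabilities, and the random seeds are assumptions, and the reward matrix is simulated instead of being derived from greenhouse measurements.

```python
import random


def thompson_sampling(reward_rows, seed=0):
    """Run Thompson Sampling over a matrix of binary rewards.

    reward_rows: list of rounds; each round is a list with one 0/1 reward
                 per location.
    Returns (selection counts, counts of 1 rewards, counts of 0 rewards).
    """
    rng = random.Random(seed)
    n_locations = len(reward_rows[0])
    n_ones = [0] * n_locations      # N_i^1(n)
    n_zeros = [0] * n_locations     # N_i^0(n)
    selections = [0] * n_locations
    for row in reward_rows:
        # Step 1: one random draw per location from Beta(N1 + 1, N0 + 1).
        draws = [
            rng.betavariate(n_ones[i] + 1, n_zeros[i] + 1)
            for i in range(n_locations)
        ]
        # Step 2: select the location with the highest draw.
        chosen = max(range(n_locations), key=draws.__getitem__)
        selections[chosen] += 1
        # Step 3: update only the selected location's counts.
        if row[chosen] == 1:
            n_ones[chosen] += 1
        else:
            n_zeros[chosen] += 1
    return selections, n_ones, n_zeros


# Synthetic stand-in for the binarized data: 3,000 rounds, four locations
# with hypothetical probabilities of exceeding the mean.
data_rng = random.Random(42)
success_prob = [0.2, 0.5, 0.8, 0.3]
rows = [[int(data_rng.random() < p) for p in success_prob] for _ in range(3000)]

selections, n_ones, n_zeros = thompson_sampling(rows)
best = max(range(len(selections)), key=selections.__getitem__)
print("selection counts per location:", selections)
print("most frequently selected location index:", best)
```

The selection counts play the role of the frequencies of occurrence reported per sensor location: locations that rarely return a reward are drawn low and drop out of the selection early.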
As the number of trials of a sensor location increased, the confidence in the estimated mean also increased. This was reflected in the probability distribution becoming narrower. The sampled value was then drawn from a range closer to the true mean (see the smallest yellow histogram in step C in Figure 4). This decreased exploration and increased exploitation, since the location with a higher probability of returning a reward would be selected with increasing frequency (see Figures 7a-g). On the other hand, locations with a low estimated mean were chosen less frequently, and they dropped out of the selection process early.
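The narrowing can be made concrete with the standard mean and variance formulas of the Beta distribution. The sketch below assumes a location that returns a reward in 70% of its trials (a hypothetical figure) and uses the Beta(N1 + 1, N0 + 1) form; the function name is illustrative.

```python
import math


def beta_mean_std(n_ones, n_zeros):
    """Mean and standard deviation of Beta(n_ones + 1, n_zeros + 1)."""
    a, b = n_ones + 1, n_zeros + 1
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Hypothetical location rewarded in 70% of its trials, observed for
# an increasing number of trials.
summary = {}
for trials in (10, 100, 1000):
    successes = 7 * trials // 10
    summary[trials] = beta_mean_std(successes, trials - successes)
    mean, std = summary[trials]
    print(f"{trials:5d} trials: mean = {mean:.3f}, std = {std:.4f}")
```

The standard deviation shrinks roughly with the square root of the number of trials, which is why draws for a frequently tried location cluster ever more tightly around its estimated mean.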
For each day of the month, the distribution of the sensor location with the highest information measured was progressively shifted to the right, while the location capturing lower information was progressively shifted to the left. We took the analysis a step further than the conventional greedy algorithm [50] and immediate averages of the rewards received from agents' actions. The additional step involved the exploration and exploitation approach and building a probability model from the obtained rewards. Afterward, samples from this were used to choose an action. With this approach, we not only achieved an increasingly accurate estimate of the possible rewards received but also a relatively higher level of confidence in these rewards which increased as more samples were collected. This process is known as Bayesian Inference, and it entails updating viewpoints as more evidence becomes available [42], [51].

IV. EXPLORATORY DATA ANALYSIS OF ENVIRONMENTAL DATA
Generally, data were collected monthly; hence, the preprocessing was done separately for each month. For each month, approximately 5 to 6 percent of the values were missing. Rows with missing values were deleted because missing values in the dataset could impact the performance of the model by creating a bias in the dataset. The basic visualization of the collected data showed patterns of negative correlation between temperature and relative humidity. The line charts in Figure 6 show temperature and relative humidity measurements from four randomly selected sensor points. It was observed that for any randomly selected measurement point, the trends were similar. Excluding rows with missing data, 38,953 air temperature and relative humidity data points were used for identifying the optimal sensor locations. From the descriptive statistics carried out on the April dataset, the mean temperature was approximately 24 °C, with a standard deviation of 7 °C. The minimum temperature was approximately 11 °C, while the maximum was approximately 47 °C, across the temperature measurements.
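The preprocessing described above (deleting rows with missing values, then computing descriptive statistics and checking the sign of the temperature-humidity relationship) can be sketched with the Python standard library. The column names `temp_c` and `rh_pct`, the helper names, and the four rows of readings are hypothetical and are not taken from the study's data.

```python
import math
from statistics import mean, stdev


def drop_incomplete(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def describe(values):
    """Basic descriptive statistics for a list of numbers."""
    return {"mean": mean(values), "std": stdev(values),
            "min": min(values), "max": max(values)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings; None marks a missing value.
rows = [
    {"temp_c": 11.0, "rh_pct": 90.0},
    {"temp_c": 24.0, "rh_pct": 60.0},
    {"temp_c": None, "rh_pct": 55.0},
    {"temp_c": 37.0, "rh_pct": 30.0},
]
clean = drop_incomplete(rows)
temps = [row["temp_c"] for row in clean]
humidity = [row["rh_pct"] for row in clean]
temp_stats = describe(temps)
correlation = pearson(temps, humidity)
print(temp_stats)
print("temperature vs. relative humidity correlation:", correlation)
```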
Exploratory data analysis demonstrated the problem of installing sensors in protected cultivation systems (Tables 1 and 2 in the Appendix). It showed that the locations of the optimal sensors for measuring temperature and relative humidity in the greenhouse varied from month to month. In the current study, it was evident that two-in-one sensors were not ideal for measuring temperature and relative humidity simultaneously. The inverse relationship between the two parameters and the behavior of moisture in the environment resulted in different patterns in the variation of the optimal location for measuring them. In all the studied months, the reinforcement learning approach found no high-ranking sensor locations that measured temperature and relative humidity simultaneously. Furthermore, the optimal location to measure both the temperature and relative humidity varied across the different months. This could be caused by external disturbance factors [52] such as the movement of the sun, wind direction, solar radiation, outdoor wind speed, temperature, or relative humidity.
From the measured values, the fluctuation in the frequency of occurrence occurred more with temperature compared to relative humidity. In Figure 7a-1, about 20% of the temperature sensors occurred (were selected) over 1000 times with the sensor at location E3 occurring about 100% more than the rest for the month of February (end of winter).
In the same period, a similar percentage of the relative humidity sensors occurred over 1,000 times (Figure 7a-2). Also, the sensor at location B4 occurred (was selected) about 220% more often than the rest. For both temperature and relative humidity, one selected sensor location clearly stood out from the others and possibly best explained the condition in the protected cultivation system.
In March (beginning of spring; Figure 7b), the share of sensors occurring (selected) more than 1,000 times increased slightly to about 30%, compared to the percentage of sensors selected in February (Figure 7a). In this month, five locations were close to one another in how much they explained the condition of the greenhouse. For temperature, they occurred with a difference of about 25%, compared to the previous month, which had a 100% difference in occurrence (Figure 7b-1). However, a different trend was seen for relative humidity (Figure 7b-2), where the sensor at location D6 occurred (was selected) 70% more times than the rest of the sensors. The sensors at locations D6 and G6 occurred (were selected) 100% of the time compared to the rest of the sensors.
Towards the end of spring (May), the share of temperature and relative humidity sensors occurring (selected) more than 1,000 times fell to 20%. This was close to the observations in February (Figure 7a). However, the sensors occurring (selected) over 1,000 times were different from those in February (Figure 7a). The exception was the temperature sensor at location 23, which, as in February, occurred 100% more than the rest (Figure 7d-1). For relative humidity (Figure 7d-2), three sensors (locations 3, 9, and 40) occurred about 100% more than the remaining sensors, with locations A3 and B2 occurring about 50% more than location F6.
In the summer month of July (Figure 7f), an entirely different trend was recorded. Only about 15% of the sensors were selected (occurred) more frequently than the others for temperature (Figure 7f-1). Here, it was observed that the other sensors occurred (were selected) less often than in the winter and spring months. This lower occurrence of most of the sensors was also observed in the relative humidity data (Figure 7f-2). As in the temperature data (Figure 7f-1), only about 15% of the sensors occurred (were selected) over 1,000 times. Of these, three sensors (B6, C3, and D3) occurred 125% more than the rest, with sensors B6 and D3 occurring about 50% more than sensor C3.
TABLE 2. Sensors' rankings and percentages of occurrence using measured relative humidity dataset.
In autumn (October), over twenty sensors occurred over 1,000 times and had close relationships in frequencies ( Figure 7g) for temperature reading. In this, a maximum difference of occurrence of about 250% was observed between three sensors and the rest for temperature data (Figure 7g-1). However, the remaining sensors occurring over 1,000 times had a similar occurrence in frequency. About 40% of the temperature sensors occurred over 1,000 times and could feasibly explain the condition of the protected cultivation system (Figure 7g-1). For relative humidity (Figure 7g-2), a different trend was seen in the number of sensors occurring over 1,000 times. Just like in February (Figure 7a), only 20% of the sensors occurred over 1,000 times with one sensor occurring about 100% more than the rest. Also, two sensors (D2 and E4) occurred about 100% more than the rest of the sensors making them the most important sensors for relative humidity in October.
As mentioned earlier, growers and sensor companies tend to install sensors in the middle of protected cultivation systems and usually spread them across the center when they have more resources [16], [53]. In this study, our method showed that the sensors that occurred more frequently for temperature were not the center sensors across the different months and seasons. In the investigated months, the center sensors did not show prominence in February, March, April, May, July, and October (Figure 8). In these months, their percentages of frequency of occurrence were 4.94%, 5.61%, 5.53%, 3.42%, 27.40%, and 5.53%, compared to side sensors that occurred at 29.68%, 13.99%, 10.97%, 20.86%, 37.84%, and 10.97%, respectively (Table 1 in the Appendix). In addition, over 70% of the sensors had a frequency of occurrence below 1%, indicating redundancy.
Similar trends were seen in the relative humidity data set. The highest occurrences for the sensors at the center of the greenhouse were 4.79%, 23.21%, 15.18%, 31.16%, and 15.18% for February, March, April, July, and October, respectively (Table 2 in the Appendix). None of the center sensors were found among the ten sensors with the highest occurrences in May and June (Figure 8). However, the center sensors were the most frequently occurring sensors in March and July, with differences of 42.9% and 20.3%, respectively, from the next most frequently occurring side sensors. As in the temperature dataset, over 70% of the sensors here were also redundant.
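The percentages of frequency of occurrence discussed above can be derived from raw selection counts as in the sketch below, which also reports the combined share of the top-ranked locations and flags locations occurring less than 1% of the time as redundant. The counts, the location labels, and the function names are hypothetical and do not reproduce the values in Tables 1 and 2.

```python
def occurrence_percentages(selection_counts):
    """Convert selection counts per location into percentages of all rounds."""
    total = sum(selection_counts.values())
    return {loc: 100.0 * count / total for loc, count in selection_counts.items()}


def top_k_share(percentages, k):
    """Return the k most frequently selected locations and their combined share."""
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    return top, sum(pct for _, pct in top)


# Hypothetical selection counts over 3,000 rounds for five locations.
counts = {"E3": 1500, "B4": 900, "D6": 450, "A1": 130, "H7": 20}

percentages = occurrence_percentages(counts)
top2, share = top_k_share(percentages, k=2)
# Locations selected in fewer than 1% of rounds are treated as redundant here.
redundant = [loc for loc, pct in percentages.items() if pct < 1.0]

print("top locations:", top2)
print(f"combined share of top locations: {share:.1f}%")
print("redundant locations:", redundant)
```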
As stated previously, the differences in the occurrences of these sensors in every simulated month could be attributed to the rapid and wide changes in external disturbances. Despite recording monthly changes in the frequency of occurrence (selection criteria) of these sensors, changes in the patterns with the seasons were observed likely due to the wide disparities in the amount of sunlight, temperature, and amounts of water in the external environment (relative humidity), as well as the times of the rising and setting of the sun in these seasons.

V. CONCLUSION
A reinforcement learning approach was adopted and programmed to solve the optimal sensor placement problem in a greenhouse, a type of protected cultivation system. Data was collected over 7 months across the four seasons, namely February (winter); March, April, and May (spring); June and July (summer); and October (autumn), and analyzed separately. The problem was formulated as a multi-arm bandit problem using the Beta distribution and solved with the Thompson sampling algorithm. Two possible outcomes were programmed for a sensor selection: 1 when the chosen sensor location is selected and 0 otherwise. Using the Thompson sampling algorithm, the distribution of the sensor location with the highest information measured was progressively shifted to the right, while that of the location capturing lower information was gradually shifted to the left. We used the exploration and exploitation approach and built a probability model from the obtained rewards and the immediate averages of the rewards received from agents' actions. The 10 optimal sensor locations out of 56 for temperature and relative humidity were identified. These accounted for about 63-95% and 70-91% of the total frequency of occurrence across all sensors for the temperature and relative humidity data, respectively. Results showed that, for each month, the selected optimal sensor locations for temperature were distinct from those for relative humidity, suggesting that the current practice of using two-in-one sensors was not ideal in these use cases. This study proposes a system aimed at finding optimal sensor location points in protected cultivation systems. Future research could explore prior distributions other than the Beta distribution.

APPENDIX
See Tables 1 and 2.

BLESSING ITORO BASSEY received a bachelor's degree in mathematics from the University of Ilorin, Nigeria, and a master's degree in mathematical sciences from AIMS Cameroon. She is currently pursuing the master's degree with the African Masters in Machine Intelligence (AMMI), and is a Research Intern with the Smart Agriculture Innovation Center, Kyungpook National University, Republic of Korea, and Carnegie Mellon University, USA. She has employed techniques and theories drawn from many fields within mathematics, statistics, data analytics, and machine learning to solve problems and proffer solutions in different disciplines. Her research interests include natural language processing (NLP) and the application of machine learning for social good, such as health and agriculture.
RAMMOHAN MALLIPEDDI (Senior Member, IEEE) received the master's and Ph.D. degrees in computer control and automation from the School of Electrical and Electronics Engineering, Nanyang Technological University, Singapore, in 2007 and 2010, respectively. He is currently an Associate Professor with the Department of Artificial Intelligence, School of Electronics Engineering, Kyungpook National University, Daegu, South Korea. He has coauthored articles published in the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION and many others. His research interests include evolutionary computing, artificial intelligence, image processing, digital signal processing, robotics, and control engineering. He also serves as an Associate Editor for Swarm and Evolutionary Computation, an international journal from Elsevier, and a regular Reviewer for journals, including the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION and the IEEE TRANSACTIONS ON CYBERNETICS.
SENORPE ASEM-HIABLIE received a bachelor's degree in oceanography and fisheries science from the University of Ghana, Legon, a master's degree in marine estuarine environmental sciences from the University of Maryland Eastern Shore, and a Ph.D. degree in agricultural and biological engineering from Penn State. She is currently an Assistant Research Professor with the Institutes of Energy and the Environment, The Pennsylvania State University, and a Research Fellow with Project Drawdown. Her interests include applied research that provides stakeholders with data-driven decision tools and solutions to help prioritize actions to reduce negative environmental impacts and promote efficient resource use.
MARYLEEN AMAIZU received a bachelor's degree in electrical and electronic engineering from the Federal University of Technology Owerri and an M.Sc. degree in embedded systems from Coventry University. She is currently pursuing the Ph.D. degree with the University of Leicester, U.K. Her Ph.D. research focuses on real-time detection of anomalous events in smart visual networks, which can find application in video surveillance, occupancy monitoring, and human behavior understanding. Specifically, she focuses on how visual systems can self-learn and profile activities to distinguish special occurrences over time. Her current research interests include deep learning, anomaly detection, and cloud/edge computing.