Crop Prediction Based on Characteristics of the Agricultural Environment Using Various Feature Selection Techniques and Classifiers

Agriculture is a growing field of research. In particular, crop prediction in agriculture is critical and is chiefly contingent upon soil and environmental conditions, including rainfall, humidity, and temperature. In the past, farmers were able to decide on the crop to be cultivated, monitor its growth, and determine when it could be harvested. Today, however, rapid changes in environmental conditions have made it difficult for the farming community to continue to do so. Consequently, in recent years, machine learning (ML) techniques have taken over the task of prediction, and this work uses several of these to determine crop yield. To ensure that a given ML model works at a high level of precision, it is imperative to employ efficient feature selection methods to preprocess the raw data into an easily computable, ML-friendly dataset. To reduce redundancy and make the ML model more accurate, only data features with a significant degree of relevance in determining the final output of the model should be employed. Optimal feature selection thus ensures that only the most relevant features enter the model. Incorporating every single feature from the raw data without checking its role in building the model complicates the model unnecessarily. Furthermore, additional features that contribute little to the ML model increase its time and space complexity and affect the accuracy of its output. The results show that an ensemble technique offers better prediction accuracy than the existing classification techniques.

monoxide. The others are: mercury, arsenic, dioxins and furans, asbestos, and aflatoxins [3]. Abiotic factors also include bedrock, relief, climate, and water conditions, all of which affect soil properties. Soil-forming factors have a diversified effect on the formation of soils and their agricultural value [4].
Predicting crop yields is neither simple nor easy. The methodology for predicting the area under cultivation is, according to Myers et al. [5] and Muriithi [6], a set of statistical and mathematical techniques useful in an evolving and improving optimization process. It also has important uses in the design, development, and formulation of new products, as well as in improving existing ones. Performing statistical analysis requires numerical data, on the basis of which conclusions are drawn about various phenomena and, further, binding economic decisions can be made. According to Muriithi [6], the better you describe certain phenomena in terms of numbers, the more you can say about them; with increasing data accuracy you can also obtain more accurate information and make more accurate decisions.
The biggest problem in the temperate climate zone is the assessment of agroclimatic factors that shape the yield of winter plant species, mainly cereals. Key factors influencing the yield of wintering crops include the number and frequency of days with temperatures above 5 °C, and the number of days in the wintering period with temperatures above 0 °C and 5 °C. A number of these can be estimated on the basis of public statistics and year-on-year yield regression statistics. Models developed to assess the situation can serve as a basis for state policy interventions in the cereal market. Efficient forecasting of productivity requires forecasting of agrometeorological factors, and the variability of these factors may pose a particular problem [7]. Many researchers have dealt with this issue with varying degrees of success [8]-[10].
Grabowska et al. [9] predicted narrow-leaf lupine yields for 2050-2060 using weather models and three climate change scenarios for Central Europe: the E-GISS, HadCM3, and GFDL models. The fit of the models was assessed by means of the coefficient of determination R², the corrected coefficient of determination R²adj, the standard error of estimation, and the coefficient of determination R²pred calculated using the cross-validation procedure. The selected equation was used to forecast lupine yield under the conditions of a doubled CO2 content in the atmosphere. These authors stated that the influence of meteorological factors on the yield of narrow-leaved lupine varied depending on the location of the station. The temperature (maximum, average, minimum) at the beginning of the growing season, as well as rainfall during the flowering to technical maturity period, most often had a significant influence on the yield. It was shown that the predicted climate changes will have a positive effect on the lupine yield. The simulated yields were higher than those observed in 1990-2008, and HadCM3 was the most favorable scenario.
Dąbrowska-Zielińska et al. [8] assessed the usefulness of plant biophysical parameters, calculated from the ranges of reflected electromagnetic radiation recorded by the new-generation satellites Sentinel-2 and Proba-V, for forecasting crop yields in Poland. In 2016-2018, ground measurements were carried out in arable fields in an area included in the global crop monitoring network GEO Joint Experiment of Crop Assessment and Monitoring (JECAM). Classification of crops was performed using optical and radar images from Sentinel-1 and RadarSat-2. The PROtotypical model of Biomass and Evapotranspiration (PRO) was used to simulate the growth of winter wheat and forecast its biomass. The modeled biomass matched the measured biomass with a high accuracy of 94%.
Li et al. [10] found that accurate, high-resolution yield maps are needed to identify spatial patterns of yield variability, to identify key factors influencing yield variability, and to provide detailed management information in precision farming. Varietal differences may significantly affect the forecasting of potato tuber yields with the use of remote sensing technologies. These authors argue that incorporating varietal information into machine learning methods currently offers the best prospect for improving potato yield forecasting with unmanned aerial vehicle (UAV) remote sensing.
There are different challenges in this research area. Current crop prediction [11] models generate satisfactory results, though they could perform better. This paper attempts to propose an enhanced crop prediction model that addresses these issues. The prediction process [12] depends on the two fundamental techniques of feature selection (FS) and classification. Prior to the application of FS techniques, sampling techniques are applied to balance an imbalanced dataset.

A. WIRELESS TECHNOLOGY USED IN AGRICULTURE
ZigBee is a wireless technology used for short-range communications and is one of the most common standards for smart applications. The chief merits of ZigBee arise from its low-cost and low-power functionalities. This makes ZigBee ideal for use in monitoring and data-gathering devices where a primary concern is to ensure the longevity of batteries.
ZigBee is widely applied in precision farming, where the Internet of Things is used for SMART field management by precisely monitoring factors affecting the cultivated crops to facilitate increased and better agricultural output. In such a system, various factors which affect cultivation, such as temperature, soil quality, pH, salinity, and humidity, are closely monitored to optimize the yield. For example, the nutrient quality of the soil may be assessed to optimize the use of fertilizers such that only areas with poor nutrient quality are sprayed. Not only does this curb the overuse of fertilizers, but it also reduces the time and money spent on excess fertilizing. Another example is improving the production of crops that require a constant amount of standing water, such as rice. Sensors can be laid out on the field to monitor the water level. If the water level falls below the recommended threshold, the system notifies the farmer, who may decide what action must be taken. In some cases, the sensor can also be programmed to adjust the water level automatically by communicating with a control device that regulates the water supply. This also reduces the manual labour that may be required to manage the crop.
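The water-level monitoring logic described above can be sketched in a few lines. This is a minimal illustration, not a real ZigBee deployment; the threshold value, function name, and readings are all hypothetical.

```python
# Hypothetical sketch of the water-level monitoring logic described above.
# The threshold and readings are illustrative, not from a real deployment.

RECOMMENDED_THRESHOLD_CM = 5.0  # assumed minimum standing-water depth

def check_water_level(reading_cm, auto_adjust=False):
    """Return the action the system would take for one sensor reading."""
    if reading_cm >= RECOMMENDED_THRESHOLD_CM:
        return "ok"
    # Below threshold: notify the farmer, or signal the supply valve directly.
    return "open_valve" if auto_adjust else "notify_farmer"

readings = [6.2, 4.1, 5.0, 3.8]              # periodic sensor samples (cm)
actions = [check_water_level(r) for r in readings]
print(actions)  # low readings trigger a notification
```

In a real system, `auto_adjust` would correspond to the control device mentioned above that regulates the water supply without farmer intervention.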
Z-Wave is a network communication protocol created by the Danish company Zensys. Z-Wave is a mesh network that uses low-energy radio waves for communication and is primarily used in SMART home and residential applications. It operates at 868.42 MHz (Europe) and 908.40 MHz (US), frequencies that give it a larger base range for communication and allow it to communicate through barriers such as concrete walls. Z-Wave provides a medium for the transmission of small data packets with a throughput of 40 kbit/s and is predominantly used in monitoring, transmission, and control applications, unlike Wi-Fi, which is chiefly used for high-speed data transfer.
The Z-Wave Alliance, established in 2005, is a conglomeration of Z-Wave-affiliated companies that develop appliances operating on Z-Wave in various home, industrial, and business settings. Z-Wave products feature interoperability at the application layer, due to which any product, irrespective of its manufacturer, can communicate and effectively co-operate with other Z-Wave products. Every Z-Wave product must pass an established conformance test to prove its interoperability with Z-Wave standards. The Z-Wave Alliance has also laid out strong security standards for devices seeking to receive its certification.
Some advantages of the Z-Wave network are, firstly, its large range as compared to ZigBee. Z-Wave is also less susceptible to disturbances than ZigBee, as it does not operate on the 2.4 GHz frequency band which ZigBee and Wi-Fi use. Z-Wave, like ZigBee, supports low-power-consumption devices and promotes battery longevity. Devices connected with Z-Wave may enter sleep mode whenever they are not in use to conserve power. Furthermore, all Z-Wave products are thoroughly tested, ensuring robust interoperability; the Z-Wave Alliance guarantees this, as all Z-Wave products must obtain a certificate to operate on Z-Wave. Security in Z-Wave is also enhanced by the inclusion of an additional encrypted security layer. Z-Wave, like ZigBee, operates in the form of a mesh network, which allows for an extended range of operations through the introduction of intermediate devices, enabling every Z-Wave device to connect to the network without directly connecting with, or being in the range of, the coordinator device. Moreover, every Z-Wave device can intercommunicate at the coordinator as well as the intermediate node levels, ensuring proper communication and smooth operation without the involvement of a central device.
Some disadvantages of Z-Wave are, firstly, its low coverage: to cover a larger area, more devices are required, increasing the total cost of implementation. The data transfer speed of Z-Wave is low (around 100 kbit/s), which restricts it to low-data-rate activities such as monitoring and control. Z-Wave can only support up to 232 nodes, while ZigBee can potentially support 65000+ nodes. Although security standards in Z-Wave have been enhanced considerably, it is still vulnerable to attacks by a skilled hacker.
Z-Wave, like ZigBee, can be used for multiple monitoring and control systems in agriculture. The interoperability of Z-Wave technology can be used to create interconnected agricultural systems which effectively communicate with each other to perform tasks. For example, in a greenhouse, a smart thermostat can be used to monitor the temperature. Whenever the temperature rises above what is considered safe, the ventilation system (vents, exhaust fans, etc.) can be signaled to operate, thus reducing the temperature of the greenhouse. This way, these appliances need not run the whole time and can be used optimally to save electricity costs.
Apart from this, Z-Wave is used extensively in home automation because of its increased security and its ability to penetrate walls. Z-Wave can be used in SMART locks for house doors, which can be opened on receiving a signal from the user's phone. Z-Wave is also used to make SMART sensors which add an additional level of security to homes. If motion is detected in the house while the family is away, they immediately receive a message on their phones. Sensors can also detect fire and smoke and turn on the sprinklers to contain the damage.
LoRa (Long Range) is a digital wireless communication network used in IoT. It was developed by Cycleo, a French company that was later acquired by Semtech. Transmission in LoRa occurs over license-free communication bands at 868 MHz (Europe), 915 MHz (North America), and 433 MHz (Asia). With the use of license-free spectral ranges, the cost for the network provider as well as the end-users is considerably lowered. The key feature of LoRa is its ability to allow low-power communications over a long range. LoRa signals can extend up to 10 miles in open, barrierless areas and up to 3 miles in cities.
The LoRa technology governs the physical layer of transmission while the upper layers of transmission are governed by LoRaWAN (Long Range Wide Area Network). LoRaWAN is an LPWAN networking protocol used for connecting IoT devices to the internet and facilitates bi-directional communication and end-to-end security.
The LoRa Alliance was founded in 2015 by a group of companies to ensure better utilization of LoRaWAN and ensure interoperability of LoRa devices and networks. The LoRa Alliance is a non-profit association dedicated to the promotion and betterment of the LoRaWAN network. Just like Z-Wave Alliance, the LoRa Alliance too has its certification program to ensure interoperability and better provision of services to users. It aims to deliver sustainable and effective IoT applications by developing and promoting the LoRaWAN system.
The chief merits of LoRa are its long range and low-power delivery, which make it ideal for use in sensors and control systems. Its low-bit-rate communication conserves the battery life of connected devices, enabling multi-year battery usage. Secondly, it is open-licensed, which reduces the price of usage. LoRaWAN is the most suitable network for outdoor IoT usage and permits inexpensive connectivity for devices in rural and remote areas. It is also of great use in mining and natural resources management operations. LoRa also enables advanced security by implementing a two-level cryptographic security system: all data transmitted over LoRa is encrypted twice, once by the nodes and once by the LoRaWAN protocol. LoRa is an open technology with an open and transparent standard, backed by tech giants like CISCO and IBM, who are members of the LoRa Alliance. Finally, LoRa technology is simple to implement and fast to deploy.
LoRa, however, cannot be used for the transmission of large data payloads. It is also not ideal for continuous monitoring applications. Because of its open frequency spectrum, LoRa may be vulnerable to transmission noise and disturbances. LoRa has been used for monitoring soil moisture content and optimizing irrigation. In a vineyard, all the irrigation valves are fitted with soil moisture sensors. Each sensor measures soil moisture content at regular intervals and sends the data to the LoRa gateway within its range. The gateway can support up to 1000 sensors in a six-mile radius and is connected to an internet router which transmits all the data to a vineyard management application (cloud-based or server-based). Depending upon the requirement, the irrigation valves can be regulated. This has allowed LoRa-based farms to save up to 50% more water. Apart from this, climatic conditions such as temperature and humidity can also be monitored using LoRa. LoRa has played a key role in bringing agriculture and IoT together and in the establishment of SMART farms.
In addition to agricultural monitoring, LoRa is also used in the installation of solar panels. LoRa enables the monitoring of miles-long solar panel networks using low power consumption devices. LoRa can also be used to detect water and gas consumption and be used to make flow adjustments. Furthermore, it can also be used to detect leaks. LoRa is also used in SMART buildings and energy metering. It enables monitoring of energy consumption of all floors of a building and is a step towards building a SMART City.
ZigBee, WiFi, Bluetooth, and LoRa are used in agriculture to collect real-time data for the prediction process. In this work, a static agricultural dataset of previous years' real-world measurements is used for the prediction process, so ML techniques are applied. This work uses the Random Over-Sampling Examples (ROSE), Synthetic Minority Over-sampling Technique (SMOTE), and Majority Weighted Minority Over-sampling Technique (MWMOTE) to help balance the given dataset. Feature selection is used to find salient features in the given dataset, resulting in better performance, while classification techniques help identify the target class. Wrapper feature selection techniques such as Boruta, Recursive Feature Elimination (RFE), and Modified Recursive Feature Elimination (MRFE) are used in this work to discover the dataset's salient features. Several supervised classification techniques, such as Naïve Bayes (NB), Decision Tree (DT), k-Nearest Neighbor (kNN), Support Vector Machine (SVM), Bagging, and Random Forest (RF), are trained with the selected features to predict a suitable outcome from the dataset.
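The workflow just described, balancing the data first, then selecting features, then training classifiers, can be sketched as follows. The stand-in functions below are illustrative only: naive random duplication stands in for ROSE/SMOTE/MWMOTE, and a fixed column list stands in for the wrapper feature selectors; they show only the order in which the stages run.

```python
# Illustrative pipeline skeleton: balance, then select features, then classify.
# These stand-ins are NOT the paper's implementations of ROSE/SMOTE/MWMOTE
# or Boruta/RFE/MRFE; they only show the order in which the stages run.
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Naive stand-in for ROSE/SMOTE/MWMOTE: duplicate minority rows."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    Xb, yb = list(X), list(y)
    for cls, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == cls]
        Xb += [rng.choice(pool) for _ in range(target - n)]
        yb += [cls] * (target - n)
    return Xb, yb

def select_features(X, keep):
    """Stand-in for a wrapper selector: keep only the chosen column indices."""
    return [[row[i] for i in keep] for row in X]

X = [[22.1, 81.0], [19.4, 64.5], [27.8, 90.2]]   # e.g. temperature, humidity
y = ["rice", "rice", "maize"]                     # imbalanced labels
Xb, yb = random_oversample(X, y)                  # 1) balance the classes
Xs = select_features(Xb, keep=[0])                # 2) keep the salient feature
print(len(yb), sorted(set(yb)))                   # 3) classifiers train on Xs
```

Any of the supervised classifiers listed above would then be fit to the balanced, reduced matrix `Xs` in the final stage.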

A. BASED ON SOIL CONDITIONS
Duro et al. [13] proposed pixel-based and object-based image analysis approaches for broad land cover classes, applying three machine learning classifiers: DT, RF, and SVM. Honawad et al. [14] proposed a digital image analysis approach to approximate the physical properties of soil.
The approach attempts to supplant conventional laboratory approaches in order to eliminate drawbacks such as manual involvement, time consumption, human error, and uncertain predictions. The signal processing method improved the quality of the original image through the use of filters and by computing features in the enhanced images. The proposed algorithm uses color quantization and texture-based feature extraction by applying the Gabor filter and Laws' masks. Matching is achieved by applying statistical measures such as the mean, standard deviation, skewness, and kurtosis. You et al. [15] proposed a scalable and accurate technique to predict yields by employing openly accessible remote sensing data.
The methodology enhances existing procedures in three ways. To begin with, a remote sensing network is applied to propose a working methodology. Next, a novel dimensionality reduction procedure is presented that uses a convolutional neural network (CNN) alongside a long short-term memory (LSTM) network. Finally, a Gaussian process is used to investigate and examine the spatio-temporal structure of the data and enhance its accuracy. Anantha et al. [16] implemented a recommendation system using an ensemble model with majority voting. The random tree, Chi-square Automatic Interaction Detection (CHAID), kNN, and Naive Bayes (NB) are used as learners to help determine the most appropriate crop, taking into consideration soil parameters, with the results showing high accuracy and potency. The classified image generated by these techniques consists of ground-truth statistical information. Further, it incorporates data such as area parameters in terms of weather and crop yield, as well as state- and district-wise crop produce. All of the above are employed to predict specific crop yields in a given set of circumstances. Rale et al. [17] developed a forecasting model which uses the default settings along with RF regression for crop yield production.

B. BASED ON ENVIRONMENTAL CONDITIONS
Jones et al. [18] modified the Decision Support System for Agrotechnology Transfer (DSSAT) crop model, using a decision support system algorithm. However, it is increasingly difficult to sustain DSSAT crop models, given the different sets of code in operation for different crops. The new design uses a multi-modular approach, comprising a cropping template as well as soil and weather modules. Further, there is a module that monitors light and water in the crops, soil, and environment.
Fernando et al. [19] studied data on annual coconut production from 1971 to 2001 in a particular region and assessed its economic impact. The research revealed that the loss sustained by the economy in terms of crop shortage was around US $50 million. Ji et al. [20] advanced an estimation technique to predict rice yields. The study attempted to determine the effectiveness of artificial neural networks (ANN) in predicting rice yield in mountainous regions. It assessed the efficacy of the ANN relative to biological parametric variations and compared the efficiency of multiple bilinear regression models with the ANN model. Boryan et al. [21] proposed a decision tree-based technique to depict openly accessible state-level crop cover groups, in accordance with guidelines laid down by the Cropland Data Layer (CDL) and National Agricultural Statistics Service (NASS), and utilizing ground truth collected during the June Agricultural Survey. The proposed work outlines the NASS CDL program. It presents information on processing strategies, classification and validation, accuracy assessment, CDL product specifications, and the product cost estimation procedure. Hansen and Loveland [22] proposed the use of Landsat to acquire satellite imagery that facilitates remote sensing of the environment. Current strategies for monitoring land cover changes across massive swathes of land commonly utilize Landsat information. Bolton and Friedl [23] created a precise model to forecast maize and soybean yield in the Central United States. Part of their examination included testing the capacity of the MODIS (Moderate Resolution Imaging Spectroradiometer) to capture inter-annual fluctuations in yields. Their outcomes demonstrate that the MODIS two-band Enhanced Vegetation Index outperforms the commonly used Normalized Difference Vegetation Index with respect to predicting maize yields.
Incorporating vegetation phenology data obtained from MODIS fundamentally enhanced model performance, both within and across the years.
Dempewolf et al. [24] designed and developed a practical wheat yield prediction model for the Punjab Province of Pakistan. Shannon and Motha [25] examined the agricultural lands of North America, Central America, and the Caribbean following various weather- and climate-related natural disasters. The latest climate and weather data are needed to help farmers manage agricultural risks. The study discusses climatic uncertainties in agriculture such as drought, flood, typhoons, extreme heat, and freezing temperatures. A decision support system is used to prepare adequately for hazard management prior to the occurrence of a disaster. The Agro Climate Research Centre and Agro Meteorological Department play a critical role in agriculture-based risk management activities. Manjula and Djodiltachoumy [26] analyzed crop yield prediction based on association rules for a chosen region, the district of Madras in India.
Eswari and Vinitha [27] employed the Bayesian network classification supervised learning model in their proposed approach. Environmental characteristics such as temperature and rainfall are analyzed alongside crop information to classify crops like rice, coconut, areca nut, black pepper, and dry ginger. Bayesian network classification is employed to explore the dataset.

C. SURVEY OF MACHINE LEARNING TECHNIQUES FOR CROP PREDICTION
Shivnath and Santanu [28] devised a machine learning approach to examine soil fertility and plant nutrient management. The backpropagation network (BPN) used is trained with inputs on crop growth characteristics, nutrient reserves in the soil, and external applications for crop production. The ML system follows the 3 steps of sampling (different soils with similar properties and completely different parameters), backpropagation, and weight change.
Paul et al. [29] designed a system that uses data mining techniques to predict the category of the analyzed soil datasets in terms of crop yields. The process of predicting crop yields is formalized as a classification rule, using NB and kNN. Pudumalar et al. [30] devised a precision agriculture approach, a smart farming technique that uses information on soil properties, soil type, and crop yields to help farmers determine the most appropriate cultivable crops based on soil parameters. A new ensemble model using the random tree, CHAID, kNN, and NB is proposed to recommend crops for a specific land area. Bodake et al. [31] developed a soil-based fertilizer guidance system that facilitates topical soil examination to help farmers cultivate the right crop. The tool is intended to be made available in the local language so farmers experience no difficulty in comprehension. Heupel et al. [32] proposed an unsupervised fuzzy classification approach that suggests crop types with produce harvested in early spring. The classification results are expected to improve with time. Liu et al. [33] investigated the possibility of using multi-temporal Sentinel-2 satellite images to discern heavy metal-induced stress (i.e., Cd stress) in rice crops in four study areas in Zhuzhou City of Hunan Province, China. Priya et al. [34] advocated a crop yield prediction approach using the RF algorithm. Real-time information from Tamil Nadu state in India was used to develop the models, which were tested on several samples. The predictions generated help farmers forecast crop yields prior to cultivation.
Archana et al. [35] proposed an ontology-based recommendation system for crop quality and fertilizer use, successfully bridging the gap between the farming community and technological applications. The system predicts a relevant crop, taking into consideration the geographical area and soil type, and offers guidance on appropriate fertilizer use. The recommendation system uses the RF algorithm and the k-means clustering algorithm. Brogi et al. [36] proposed a supervised classification methodology to classify apparent soil electrical conductivity (ECa) and map regions on the basis of similar soil characteristics. Ali Al-Naji et al. [39] proposed a method referred to as a non-contact vision system, based on a standard video camera, to solve irrigation-related problems in agriculture. The authors used a feedforward backpropagation neural network to analyze soil imagery captured at various times, distances, and illumination levels.

D. MOTIVATION AND JUSTIFICATION
Farming performs a vital function in everyday life. Crop prediction in farming, which is a challenge, is based on feature selection and classification. The literature survey above has revealed that crop prediction is best undertaken using feature selection techniques. Recursive feature elimination (RFE) is a wrapper feature selection method that searches through subsets of features in the training dataset for the most important ones, eliminating the rest until the desired target is obtained. The RFE technique predicts classification accuracy well. It is, however, limited by the fact that it demands dataset updating during the feature elimination process, and such updating in RFE is a difficult, time-consuming process. Motivated by these factors, this work proposes a new framework for selecting features from a crop dataset, following which classification is undertaken to predict the crop. While existing studies have resorted to a single prediction method, our work uses several classification techniques for crop prediction.
An analysis of most research papers shows that feature selection techniques like Recursive Feature Elimination, Boruta, and Modified Recursive Feature Elimination work more efficiently than other techniques. Likewise, the classification techniques k-nearest neighbor, decision tree, naïve Bayes, support vector machine, random forest, and bagging give better prediction rates. These techniques are therefore chosen for the prediction process. Although all the feature selection and classification techniques are existing ones, the dataset used in this work is the real-world Felin dataset. The Felin dataset contains the yield of potato tubers and their yield of dry matter and starch, given as 7-year averages expressed in dt/ha together with their coefficients of variation expressed as percentages. These crops were grown in the town of Felin, which is also the source of the meteorological data. The outline of the proposed work is given in Fig. 1.

E. OUTLINE OF THE PROPOSED WORK
The rest of the paper is organized as follows. Section 2 discusses the methodology and Section 3 the feature selection techniques. Section 4 describes the classification techniques and Section 5 the experimental design. Section 6 concludes the paper.

III. PREPROCESSING
Sampling techniques are applied during preprocessing to balance the dataset and maximize the prediction performance [37]. The sampling techniques used include the ROSE, SMOTE and MWMOTE. ROSE is used for binary classification in the presence of rare classes and SMOTE for better classifier performance in the ROC space, while MWMOTE handles imbalanced dataset issues in crop prediction.
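As a rough illustration of the idea behind SMOTE (not the full algorithm, and not ROSE or MWMOTE, which weight samples differently), each synthetic minority sample below is interpolated between a minority point and its nearest minority neighbour; the data and function name are hypothetical.

```python
# Rough illustration of the SMOTE idea (not the full algorithm): synthesize
# a minority sample on the segment between a minority point and its nearest
# minority neighbour. Data and names here are hypothetical.
import random

def smote_like(minority, n_new, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # nearest neighbour of a among the other minority points
        b = min((p for p in minority if p is not a),
                key=lambda p: sum((ai - pi) ** 2 for ai, pi in zip(a, p)))
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (2.0, 2.1)]  # under-represented class
new_pts = smote_like(minority, n_new=2)
print(new_pts)  # two synthetic points between existing minority samples
```

Because the synthetic points lie between real minority samples rather than duplicating them, the classifier sees a denser minority region instead of repeated rows.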

IV. FEATURE SELECTION TECHNIQUES
There are three commonly used types of feature selection techniques: filter, wrapper, and embedded. This work uses wrapper techniques to select salient features.

A. BORUTA
Boruta is a feature selection algorithm built around random forest classification [38], which relies on the voting of an ensemble of unbiased weak classifiers, namely decision trees. The importance of an attribute is estimated by calculating the loss of classification accuracy caused by randomly permuting its values across objects. The average and standard deviation of the accuracy loss are calculated, and the average loss is divided by the standard deviation to obtain the Z score, which measures average fluctuations in mean accuracy loss among attributes.
A 'shadow' attribute is created for each attribute by randomly permuting the values of the original attribute across objects. The importance of every attribute is determined by analyzing all the attributes in the system. Given the random nature of the fluctuations, the shadow attributes are used as a reference for deciding which attributes are truly important. As is to be expected, the degree of accuracy depends greatly on the shadow attributes; consequently, their values are re-shuffled constantly to obtain optimal results.
The Boruta algorithm comprises the following steps:
1. The information system is extended by adding copies of all attributes as shadow attributes; the system is always extended by at least 5 shadow attributes.
2. The added attributes are shuffled to remove their correlation with the response.
3. A random forest algorithm is run on the extended information system and the Z scores are computed.
4. The Maximum Z Score among Shadow Attributes (MZSA) is found, and every attribute scoring higher than the MZSA is assigned a "hit".
5. For each attribute with undetermined importance, a two-sided test of equality with the MZSA is carried out.
6. Attributes with importance significantly lower than the MZSA are deemed 'unimportant' and permanently removed from the information system.
7. Attributes with importance significantly higher than the MZSA are deemed 'important'.
8. All shadow attributes are then removed from the information system.
9. The process is repeated until all attributes have been assigned a level of importance.
Prior to these steps, however, the algorithm starts with 3 start-up rounds with a milder criterion of importance. These rounds help deal with the large Z score fluctuations that occur when large numbers of attributes are involved. In the 3 start-up rounds, the attributes are compared to the 5th, 3rd, and 2nd best shadow attributes, respectively. Rejections occur only at the end of each initial round, while confirmations occur in every round.
The time complexity of the algorithm is approximately O(P · N), where P and N are the number of attributes and the number of objects, respectively.
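The shadow-attribute comparison at the heart of these steps can be sketched in a few lines. This is a toy illustration, not the full algorithm: a simple absolute-correlation score stands in for the random-forest importance and Z score, only a single "hit" round is shown, and all names are illustrative.

```python
# Toy sketch of Boruta's shadow-attribute idea (single round).
# A real implementation would use random-forest importances (e.g. the
# BorutaPy package); |correlation| with the target stands in here.
import random

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def boruta_step(features, target, seed=0):
    rng = random.Random(seed)
    # Steps 1-2: extend the system with shuffled "shadow" copies.
    shadows = {"shadow_" + name: rng.sample(col, len(col))
               for name, col in features.items()}
    # Step 3: score every real and shadow attribute.
    real = {name: abs(correlation(col, target)) for name, col in features.items()}
    shad = {name: abs(correlation(col, target)) for name, col in shadows.items()}
    # Step 4: attributes beating the best shadow score (MZSA) get a "hit".
    mzsa = max(shad.values())
    return {name: score > mzsa for name, score in real.items()}
```

In repeated rounds, Boruta accumulates these hits and tests them statistically before confirming or rejecting an attribute.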

B. RECURSIVE FEATURE ELIMINATION (RFE)
RFE is a wrapper-type, greedy feature selection technique that starts with the entire set of features. A ranking method at the core of RFE orders the features from best to worst; at each iteration, the least important features are eliminated and the dataset is updated, and the process continues until only the most relevant features remain. In the wrapper approach, feature selection is driven by a core machine learning model that is fit to the dataset.
In the first step, the model is fit to the dataset, i.e., it is generalized. Next, the least significant features are identified and eliminated from the model until only the most important features remain. For a simpler understanding, the number of features can be plotted against the performance of the model: as relevant features are added, performance increases; once all the relevant features are in, performance starts decreasing as redundant features are added, which shows up as a drop on the graph. Thus, an optimal level of performance can be achieved by selecting the right number and type of features.
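The fit-and-eliminate loop described above can be sketched as follows. The ranking score here is a stand-in (the absolute correlation of each feature with the target) for whatever a fitted core model would provide, and the function and feature names are illustrative.

```python
# Minimal sketch of the RFE loop: score, drop the weakest, repeat.
# In practice the ranking comes from a fitted model's coefficients or
# importances (e.g. sklearn.feature_selection.RFE).

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def rank_features(features, target, n_to_keep):
    """Repeatedly drop the weakest feature until n_to_keep remain."""
    remaining = dict(features)
    eliminated = []                      # weakest first
    while len(remaining) > n_to_keep:
        scores = {name: abs(correlation(col, target))
                  for name, col in remaining.items()}
        worst = min(scores, key=scores.get)
        eliminated.append(worst)
        del remaining[worst]             # "update the dataset" and iterate
    return list(remaining), eliminated
```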
It is not known in advance how many features a model must keep. Therefore, to determine the optimal number of features, the RFE algorithm is cross-validated. Recursive Feature Elimination with Cross-Validation (RFECV) works just like RFE but additionally cross-validates the candidate feature subsets, automatically selecting the subset that gives the best performance. Not all models can be paired with RFE: since RFE starts by considering the entire set of predictors, it is problematic for models in which the number of predictors exceeds the number of samples. Furthermore, some models benefit more from RFE than others.
The main advantage RFE has over other methods is that it explicitly verifies every feature's role in producing the model's output and eliminates features only on the basis of their performance, thus producing better results than filter methods. RFE is also better suited to small-sample problems. Using multiple parameters such as soil texture, pH, wetness, topography, and gypsum content, machine learning can be used to assess land suitability, which helps plan appropriate agricultural use of the land. Here, multiple features may be considered in determining the suitability of land, but it is not known in advance which features play a key role in the final output and which only add redundancy. The RFE algorithm can therefore be used to eliminate insignificant features and improve the accuracy of the model. ML algorithms have also been used to forecast short-term soil moisture content in potato crop farming; RFE was one of the algorithms used to select, from the initial set, the features that most affected soil moisture. Such data may assist in agronomical decision-making.
Recursive Feature Elimination is often combined with the Random Forest algorithm to tackle correlated predictors, which inhibit the accuracy of Random Forest; this has shown positive results on smaller datasets. Monitoring pasture quality in farmlands is essential for efficient pasture management, and hyperspectral imaging can be used to determine the biological properties of vegetation in pasture areas. Airborne hyperspectral imaging was used to predict crude protein and metabolizable energy in farms; the measurements were developed into regression models based on Random Forest. The accuracy of the model improved significantly when RFE was used in conjunction with Random Forest.

C. MODIFIED RFE
The MRFE, a wrapper feature selection technique, removes non-salient features from the dataset. Initially, the MRFE technique permutes the dataset by shuffling it, after which it combines the shuffled and original datasets. The permuted dataset reduces computation time and eliminates the need to update the dataset. Thereafter, using an RF classifier, it ranks the features from best to worst and, based on the ranking result, selects the salient features for the prediction process.

V. CLASSIFICATION TECHNIQUES

A. NAÏVE BAYES (NB)
The Naive Bayes algorithm, derived from Bayes' theorem, is widely used in miscellaneous classification tasks. The three Naive Bayes variants are the multinomial, Bernoulli, and Gaussian. Bayes' theorem is shown in Equation 1:

P(y | X) = P(X | y) P(y) / P(X)   (1)

where P(y | X) is the posterior probability, P(X | y) the likelihood, P(X) the evidence, and P(y) the prior probability. The Naïve Bayes algorithm is a supervised machine learning algorithm mainly used for classification problems. It works under the assumption that the probability of occurrence of any feature is independent of the occurrence of the other features and that every feature contributes equally to the final outcome. It is based on Bayes' theorem, which calculates the probability of an event occurring given that another event is true. The Naïve Bayes algorithm is thus a probabilistic classifier, i.e., it predicts the outcome based on probability.
Given a labeled dataset and a target variable, the Naïve Bayes Algorithm would calculate the result based on probability. First, the entire dataset is preprocessed and organized into a frequency table by noting down the events and their frequencies. Then, a likelihood table is generated using the frequency table. Finally, the Bayes theorem is applied to calculate the posterior probability.
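The three steps above can be sketched for categorical features as follows. This is a minimal illustration without Laplace smoothing; the weather-style feature values and names in the usage example are invented.

```python
# From-scratch categorical Naive Bayes: frequency table -> likelihoods
# -> posterior via Bayes' theorem.
from collections import Counter, defaultdict

def train_nb(rows, labels):
    prior = Counter(labels)                 # frequency table of classes
    freq = defaultdict(Counter)             # (feature idx, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            freq[(i, y)][v] += 1
    return prior, freq

def predict_nb(prior, freq, row):
    total = sum(prior.values())
    best, best_p = None, -1.0
    for y, ny in prior.items():
        p = ny / total                      # prior P(y)
        for i, v in enumerate(row):         # naive independence assumption
            p *= freq[(i, y)][v] / ny       # likelihood P(x_i | y)
        if p > best_p:
            best, best_p = y, p
    return best
```

For example, training on rows like `("rainy", "mild")` with labels `"yes"`/`"no"` and then calling `predict_nb` on a new observation returns the class with the highest posterior.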
The advantages of the Naïve Bayes algorithm are, first, that it can be used for binary as well as multi-class classification. Second, it is fast and easier to implement than many other ML algorithms, and it does not require a lot of training data. It works with both discrete and continuous data, is highly scalable, and is not sensitive to irrelevant features. When the assumption of independence holds, the Naïve Bayes algorithm performs better than other algorithms. Its main disadvantage is that it does not work well with correlated variables, since it rests on the assumption of independence, and in practice there are few variables that do not correlate with each other.
In agriculture, the Naïve Bayes method can be used to make lucrative food crop recommendations. The Naïve Bayes Algorithm is used to classify weather data, agricultural products, and selling prices to recommend types of food crops to farmers. This recommendation of food crops would be extremely helpful to farmers especially in an era of climate change. Better choice of food crops would mean better income for farmers at the same time reducing the possibility of crop failures.

B. DECISION TREES
A decision tree is a flowchart-like tree structure generally used in supervised machine learning for classification and prediction. A DT can be turned into a set of rules, where each path from the root node to a leaf node is a rule. In a decision tree, every internal node represents a test/condition on an attribute, every branch is a result of the test, and every leaf node has a class ascribed to it that is reached if the attribute fulfils the condition of the branch leading to it. A famous example of a decision tree algorithm is C4.5 by Ross Quinlan. There are, broadly, two types of decision trees, categorical and continuous variable, based on the target attribute type. A decision tree starts with a root node whose attribute is compared with the other attributes/features in the dataset in search of a perfect split. A perfect split implies that the outputs of one class are on one side of the tree and those of the other class on the other. In this way, every node is split until a perfect split is reached, the outcome of which becomes a leaf node of the tree. The real challenge in constructing a DT is attribute selection: given the large number of attributes available, it is difficult to select the ones to be used as root or internal nodes. To this end, two measures can be applied, Information Gain and the Gini Index:

Information Gain(T, X) = Entropy(T) − Entropy(T, X),

where T refers to the current state and X to the selected attribute, and

Entropy(T) = −p+ log2(p+) − p− log2(p−),

where p+ represents the probability of Yes/Good and p− the probability of No/Bad. The Gini Index is computed as 1 − Σ pi², where pi is the proportion of class i.
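The two attribute-selection measures can be computed directly from class labels. The sketch below assumes a binary Yes/No-style labeling, though the formulas generalize to more classes.

```python
# Entropy, Gini index, and information gain for attribute selection.
from math import log2

def entropy(labels):
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(labels):
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def information_gain(labels, groups):
    """Entropy(T) minus the size-weighted entropy after splitting on X."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder
```

A perfect split of a balanced Yes/No set yields an information gain of 1 bit, the maximum; a split that leaves the groups as mixed as the parent yields a gain of 0.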

C. SUPPORT VECTOR MACHINE (SVM)
In machine learning, support vector machines (SVMs) are supervised learning models that analyze data for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two classes, the SVM builds a model that assigns new examples to one class or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use the SVM in a probabilistic setting). The SVM seeks to widen the gap between the two classes; new examples are then assigned to a class depending on which side of the gap they fall. SVMs help tackle an array of real-world problems. They are useful in text and hypertext classification, as their application reduces the need for labeled training examples in both the standard inductive and transductive settings. Several methods for shallow semantic parsing are based on support vector machines. SVMs have widespread uses in the natural sciences: they have been employed to classify proteins, with up to 90% of the compounds classified correctly. Permutation tests based on SVM weights have been proposed as a mechanism for interpreting SVM models, and support-vector machine weights have also been used to interpret SVM models in the past. Post-hoc interpretation of support-vector machine models, to identify the features used by the model to make predictions, is a relatively new area of research with special importance in the natural sciences. SVM is a supervised ML technique used to solve regression and classification problems but is better suited to classification. It works well on small datasets, though training becomes expensive as datasets grow large. Given a dataset with n features, SVM begins by plotting all points in the dataset in an n-dimensional space, each point assigned coordinates according to the values of its features.
From here, classification is conducted by determining a suitable hyperplane that best separates the points into two distinct classes. Support vectors are the points located closest to the hyperplane; they determine its position and orientation. The distance between the support vectors and the hyperplane is called the margin, and to generate the most accurate hyperplane, the margin must be maximized.
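For a fixed linear hyperplane w·x + b = 0, the classification and margin computations described above reduce to a few lines. Finding the maximum-margin w and b is the optimization a real SVM solver performs and is omitted here; the w and b in the usage example are chosen by hand purely for illustration.

```python
# Linear decision function of an SVM-style classifier for a given
# hyperplane w.x + b = 0 (the solver that finds w, b is not shown).

def decision(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    # sign of the decision value gives the predicted class
    return 1 if decision(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    # |w.x + b| / ||w|| : the geometric margin of point x
    norm = sum(wi * wi for wi in w) ** 0.5
    return abs(decision(w, b, x)) / norm
```

With w = (1, 1) and b = −3, the point (4, 4) falls on the positive side and (0, 0) on the negative side, at distance 3/√2 from the hyperplane.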
The advantages of the SVM algorithm are, firstly, that it is very effective in analyzing high-dimensional datasets, and it is of great use when the number of dimensions is greater than the number of samples. SVM uses only the support vectors for training and therefore consumes less memory.
Some disadvantages of the SVM algorithm are, firstly, that it is not suitable for very large datasets, as the time required to train the model increases. It also becomes inaccurate when the target classes overlap. Furthermore, the basic SVM algorithm does not provide probability estimates. SVM is used to classify agricultural data to allow for better decision-making: in a comparative study of classification techniques for agricultural data, SVM outperformed the Naïve Bayes and Artificial Neural Network methods.

D. K-NEAREST NEIGHBOR (KNN)
One of the most commonly used supervised and nonparametric machine learning techniques is the kNN, used in classification and regression problems. Supervised algorithms are the ones with labelled data. Labeled data refers to input data that is already tagged with the correct output in supervised learning. Supervised learning algorithms take the data and attempt to make models that predict the output data, given relevant inputs. There is, however, no actual learning in the k-NN algorithm, which follows the ''lazy learning'' principle, where all the work happens at the time a prediction is required.
The algorithm depends on distances between points, which can be calculated in one of several ways. A key consideration is that the distance must always be zero or positive, which is ensured by squaring differences, raising them to a power, or taking absolute values. Common distance measures include the following:
(i) Manhattan Distance: the sum of absolute coordinate differences, |x1 − y1| + |x2 − y2|. It is easier to calculate than the others.
(ii) Euclidean Distance: the straight-line distance between two points used in regular geometry, ((x2 − x1)² + (y2 − y1)²)^(1/2).
(iii) Hamming Distance: the number of positions at which two points differ; if x1 and y1 are of the same type, their contribution is 0, else it is 1.
(iv) Minkowski Distance: a generalization of the Euclidean distance requiring a parameter p, (|x2 − x1|^p + |y2 − y1|^p)^(1/p), where xi and yi are the coordinates of a point on an xy plane.
The k-nearest neighbors (kNN) method is a supervised machine learning algorithm used to solve classification and regression problems. kNN works on the premise that similar entities exist in close proximity; related data points therefore occur nearby, which helps in mapping similarities between a dataset and a given query. Before implementing the kNN method, all the labeled data must be pre-processed. First, all the data must be normalized. Next, feature selection must be employed to remove irrelevant features, as kNN does not work well when too many features are present. Missing values cannot be tolerated, so rows containing them must be deleted. The implementation of the kNN algorithm can then proceed.
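Written out for points of arbitrary dimension, the distance measures above are as follows (a sketch; the function names are illustrative):

```python
# The four distance measures, for points of any dimension.

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def hamming(a, b):
    # counts the positions where the two points differ
    return sum(1 for x, y in zip(a, b) if x != y)

def minkowski(a, b, p):
    # generalizes Manhattan (p = 1) and Euclidean (p = 2)
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)
```

Note that `minkowski(a, b, 1)` equals the Manhattan distance and `minkowski(a, b, 2)` the Euclidean distance.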
First, data is loaded into the model. kNN being a supervised learning method requires data to be loaded in labeled form. Next, K is declared according to the desired number of neighbours. Then, for every element in the dataset, the ''distance'' or ''relation'' with the query input is calculated by the machine learning algorithm. The distance between the element and the query input is then added to an ordered collection and is subsequently sorted in increasing order of the distances. Lastly, the first K items of the collection are selected and the output, depending upon the model being a regression or a classification problem, is returned by taking a mean, in case of the former, and taking the mode in case of the latter.
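These steps amount to a short classifier. The sketch below uses Euclidean distance and a majority vote (the classification case); for regression, the mean of the k nearest targets would be returned instead.

```python
# Minimal kNN classifier: rank training points by distance to the query,
# take the k nearest, and return the mode of their labels.
from collections import Counter

def knn_predict(points, labels, query, k=3):
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # distance of every training point to the query, sorted ascending
    ranked = sorted(zip(points, labels), key=lambda pl: dist(pl[0], query))
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]   # majority vote
```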
Choosing k is an important factor as it heavily influences the result of the ML model. If the value of k is too low, the model suffers from instability and the results become increasingly inaccurate. Conversely, an extremely high value of k introduces a growing number of errors into the model. Therefore, the value of k must be balanced between the two extremes. In a model where a vote is required to obtain the output, k should be an odd number to avoid ties. The chief advantages of the kNN algorithm are, firstly, that it is simple and relatively easy to implement. Next, the algorithm can serve multiple purposes, from classification and regression to search problems. Furthermore, the algorithm can be improved by adding additional training data. The main disadvantage of kNN is that the algorithm slows as the dataset grows larger, since the cost of computation keeps increasing; it is therefore not suitable where immediate results are required. Secondly, the value of k must be determined accurately to get appropriate results, which can sometimes be difficult. The kNN algorithm also requires a large amount of memory to store large datasets.
The kNN method can be used for predicting crop yield using a set of known parameters, namely, rainfall, temperature, humidity, and soil moisture. The value of crop yield is calculated by using the values of the nearest neighbors. kNN has yielded suitable accuracy in predicting crop yield. This model can be further enhanced by adding additional features and more data from all the seasons. kNN has also been applied for predictive analysis of paddy production and has shown better and faster results compared to the SVM algorithm.

E. BAGGING
Bagging, also known as bootstrap aggregation, is used with decision trees, where it significantly raises the stability of models by advancing accuracy and diminishing variance so as to eliminate overfitting. In ensemble machine learning, bagging takes numerous weak models specializing in distinct sections of the feature space and aggregates them to pick the best prediction. An ensemble set of learners is developed and built using the learning algorithm, but with each learner trained on a different set of data; this process is termed bootstrap aggregating, or bagging. Initially, numerous subsets of the data are created, each a subset of the initial data: a bag containing n'' values is drawn from an original dataset comprising n instances, with the n'' instances grabbed at random with replacement from the initial data. Because an instance that has been picked and bagged can be picked again, some recurrence results. Such recurrence, however, creates no problems and is to be anticipated. Altogether, m groups or bags are created, each holding n'' data instances randomly picked with replacement. Thus, n is the number of training instances in the original data, n'' the number of instances in an individual bag, and m the number of bags. We nearly always want n'' to be less than n, usually around 60% of it; as a rule of thumb, each bag therefore has about 60% as many training instances as the original data.
Each of the data groups is used to train a different model. There are m different models, each trained on different data, producing an ensemble of learners, along with an ensemble of models to be queried identically. Each model is queried with the same input x and all of their outputs are collected. The mean of the individual models' y outputs gives the y for the ensemble.
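The bag-construction and averaging scheme above can be sketched as below. A trivial "model" (the mean of its bag) stands in for a real learner, purely to show the structure; the names and the 60% fraction default are illustrative.

```python
# Bootstrap aggregation: m bags of n'' instances drawn with replacement,
# one stand-in model per bag, ensemble output = mean of per-bag outputs.
import random

def make_bags(data, m, frac=0.6, seed=0):
    rng = random.Random(seed)
    size = int(len(data) * frac)          # n'' ~ 60% of n
    return [[rng.choice(data) for _ in range(size)] for _ in range(m)]

def bagged_mean(data, m=10):
    bags = make_bags(data, m)
    per_bag = [sum(bag) / len(bag) for bag in bags]   # each "model"'s output
    return sum(per_bag) / m                            # aggregate by averaging
```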

F. RANDOM FOREST (RF)
The random forest (RF) is one of the most successful supervised machine learning algorithms. The RF algorithm embodies the essence of ensemble learning in that it links multiple classifiers to resolve a complicated problem, thereby enhancing the performance of the model. In this method, the ''forest'' that is built is a set of decision trees. Characteristics in the RF are randomly picked in each decision split. The correlation between trees is diminished by randomly picking features that promote prediction and result in greater efficiency.
Random Forest is an ML classification algorithm that works by dividing the dataset into subsets or decision trees and aggregating the outputs of all the trees to produce the final output. Random Forest comes under the Bagging category of ensemble learning techniques. The Row and Feature samples from the main dataset are randomly selected and fed into the decision trees in the Random Forest Technique. The analyst chooses the number of decision trees for the model. Each decision tree works on the data and predicts a result based upon its calculation. Random Forest doesn't take the result from any one of the decision trees but combines the outputs from all the decision trees. Random Forest takes the majority of the result (in case the result is in a Boolean form) or the mean/median of the result (in case the result is in numerical form). Thus, a higher number of decision trees gives a more accurate result and circumvents the problem of overfitting.
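The aggregation step described above can be sketched as follows, with stand-in callables in place of grown trees (growing the trees themselves is omitted):

```python
# Random-forest-style aggregation: majority vote for classification,
# mean for regression. The "trees" are placeholder callables.
from collections import Counter
from statistics import mean

def forest_classify(trees, x):
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]   # majority of the results

def forest_regress(trees, x):
    return mean(tree(x) for tree in trees)       # mean of the results
```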
The Random Forest technique provides several advantages. Firstly, Random Forest is simple and relatively easy to understand and is therefore extremely popular. It is also capable of performing both classification and regression tasks. It is also suitable for handling large sets of data with high dimensionality and most importantly, it makes the model much more precise and resolves the overfitting issue.
Random Forest cannot be used in case of extrapolation of data as it could produce inaccurate results. Although Random Forest can be used for both regression and classification, it is better suited for classification tasks. Also, it does not produce proper results when dealing with sparse data. Random Forest also needs more time for implementation and requires larger data and greater resources. In the presence of correlated predictors, Random Forest is known to produce inexact results.
Random Forest [40] can be used to predict pest attacks in cotton plants. Various factors were considered, and the correlation filter selection method was used to select the most important features. Random Forest was then used to determine the number of trees needed for a low error rate, and the important parameters were singled out and used for clustering to determine the outcome. Optimized usage of water for farms is essential for reducing wastage as well as enhancing productivity. The use of a precision irrigation system for furnishing the optimal water supply needed by plants or crops will lead to better output. The amount of water required by plants can be expressed in terms of pH. The Random Forest algorithm is used to determine the pH level, which in turn helps determine the water supply required by a piece of land. The concept of random forest is given in Fig. 2. Given that each bagged tree is identically distributed, the expectation of an average of B trees is the same as the expectation of any one of them. Since the bias of the bagged trees is therefore the same as that of the individual trees, improvement can come only through variance reduction. This contrasts with boosting, where the trees are grown adaptively to remove bias and hence are not identically distributed. An average of B independent identically distributed random variables, each with variance σ², has variance σ²/B. If the variables are identically distributed but with positive pairwise correlation ρ, the variance of the average is

ρσ² + ((1 − ρ)/B) σ².

As B increases, the second term shrinks while the first remains unchanged; consequently, the correlation between the bagged trees limits the benefit of averaging. The RF reduces this variance by cutting the correlation between the trees without increasing the variance excessively, which the tree-growing process achieves by picking input variables at random at each split.
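The variance expression for an average of B positively correlated trees can be checked numerically: with ρσ² fixed, only the (1 − ρ)σ²/B term shrinks as B grows, so the correlation ρ sets the floor on the achievable variance.

```python
# Variance of an average of B identically distributed variables with
# variance sigma2 and pairwise correlation rho:
#   Var = rho * sigma2 + (1 - rho) * sigma2 / B

def bagged_variance(sigma2, rho, B):
    return rho * sigma2 + (1 - rho) * sigma2 / B
```

For σ² = 1 and ρ = 0.3, a single tree has variance 1.0, while even a very large forest cannot go below 0.3; with ρ = 0 the variance falls as σ²/B, the independent case.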

VI. EXPERIMENTAL RESULTS AND ANALYSIS
This paper has used an agricultural dataset that incorporates soil and environmental characteristics and is not publicly available. Data collected manually and with care from the farming community was therefore used for the purposes of this research.
In this work, the performance of the feature selection and classification methods was assessed using the metrics of accuracy (ACC), specificity (S), recall (R), precision (P), F1 score, mean absolute error (MAE), log loss (LL), and area under the curve (AUC). The results are shown in the tables (Table 1 to Table 5). Table 1 shows that the random forest algorithm, which constructs many decision trees and outputs the majority class (classification) or the mean prediction (regression) of the individual trees, offers the highest accuracy, followed by the k-nearest neighbor and bagging classifiers. The Naïve Bayes classifier has an accuracy of 70.64, with a Kappa of 70.12, as shown in Table 2.
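For reference, the binary forms of these evaluation metrics follow directly from the confusion-matrix counts; the sketch below shows the two-class case (multi-class scores average the per-class values).

```python
# Evaluation metrics from binary confusion counts:
# tp/fp/tn/fn = true/false positives and negatives.

def metrics(tp, fp, tn, fn):
    acc = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # also called sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "P": precision, "R": recall,
            "S": specificity, "F1": f1}
```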
In Table 4, we evaluate the performance of the MRFE with the RF based on the fold validation method, taking 9 folds into consideration with 7 performance metrics for each fold. In Table 3, the best feature selection technique was identified. In this table, the following attributes were confirmed as significant according to the Boruta algorithm: soil temperature at various depths (5, 10, 20, 50 and 100 cm); air humidity; precipitation; and average, minimum and maximum air temperature. Attributes confirmed as irrelevant include cloud cover, visibility, wind direction, and snow cover.
According to recursive feature elimination (RFE), the six most important variables were selected out of eight: the monthly soil temperatures at various depths (5, 10, 20, 50 and 100 cm) and the minimum air temperature. According to the modified recursive feature elimination (MRFE), 6 variables were selected: average soil temperature, average air temperature, minimum and maximum air temperature, rainfall, and air humidity. Performance metrics such as accuracy, Kappa, precision, recall, specificity, F1 score, and AUC were at a high level.
Table 4 and Table 5 show the results of the random forest technique when used in conjunction with the fold validation and data splitting validation methods, respectively, giving the performance evaluation of the MRFE and RF methods. As the ranges of characteristics increased, the values of the measures decreased.

VII. CONCLUSION
Predicting which crops to cultivate is a difficult task in agriculture. This paper has used a range of feature selection and classification techniques to predict the yield of cultivated crops. The results show that an ensemble technique offers better prediction accuracy than the existing classification techniques. Forecasting the area of cereals, potatoes, and other energy crops can be used to plan the structure of their sowing at both the farm and country scale. The use of modern forecasting techniques can bring measurable financial benefits.