Towards Paddy Rice Smart Farming: A Review on Big Data, Machine Learning, and Rice Production Tasks

Big Data (BD), Machine Learning (ML) and the Internet of Things (IoT) are expected to have a large impact on smart farming across the whole supply chain, particularly for rice production. The increasing amount and variety of data captured by these emerging IoT technologies offer rice smart farming new abilities to predict changes and identify opportunities. The quality of the data collected from sensors greatly influences the performance of models built with ML algorithms. These three elements (i.e., BD, ML and IoT) have been used extensively to improve all areas of the rice production process, transforming traditional rice farming practices into a new era of rice smart farming, or rice precision agriculture. In this paper, we survey the latest research on intelligent data processing technology applied in agriculture, particularly in rice production. We describe the data captured and elaborate the role of machine learning algorithms in paddy rice smart agriculture by analyzing their applications in various scenarios: smart irrigation for paddy rice, paddy rice yield estimation, monitoring paddy rice growth, monitoring paddy rice disease, assessing the quality of paddy rice, and paddy rice sample classification. This paper also presents a framework that maps the activities defined in rice smart farming, the data used in data modelling, and the machine learning algorithms used for each activity in the production and post-production phases of paddy rice. Based on the proposed mapping framework, we conclude that an efficient and effective integration of these three technologies is crucial for transforming traditional rice cultivation practices into intelligent rice precision agriculture. Finally, this paper summarizes the challenges and technological trends in exploiting multiple data sources in the era of big data in agriculture.


I. INTRODUCTION
The global population of 7.8 billion persons (2020) is expected to reach 9.7 billion by 2050 [1]. The world is expected to require 70% more food than is available at the moment, with fewer natural resources such as land and water, due to urbanization, soil erosion, climatic changes, water shortages and excessive use by livestock. It is estimated that about 33% of agricultural production is wasted due to poor logistics and storage [2]-[4]. As a result, the key coping strategy in the context of climate change and food security is to implement precision agriculture or smart farming. Precision agriculture is a technology-enabled approach to farming management that observes, measures, and analyzes the needs of individual fields and crops [5]. Smart farming is defined as the application of information and data technologies for optimizing complex farming systems. It focuses on how the collected agriculture-related information can be used in a smart way, rather than on the storage of, access to and application of these data. Big data and machine learning algorithms are two main components of paddy rice smart farming. Big data can be described using three key concepts: volume, velocity, and variety. Volume refers to the size of the data, variety refers to the various types of data (e.g., text, numbers, images, videos and audio) and velocity refers to the increasing speed at which big data is created (e.g., live stream data). Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.
The associate editor coordinating the review of this manuscript and approving it for publication was Mario Luca Bernardi.
Applying smart farming technologies can assist farmers in various tasks to increase crop production. To narrow the scope of this paper, we focus on paddy rice smart farming, as rice is an increasingly important staple food in the Asia-Pacific region and other parts of the world.
Changes over time in land use and soil salinity levels have a significant impact on rice yields [6]. In addition, unpredictable weather conditions and inefficient techniques for predicting weather conditions are among the factors that reduce rice yields [7], [8]. For instance, most farmers in Myanmar face heavy rains during the rice growing season, and the resulting crop damage and yield reductions cause extensive losses among farmers [9]. As a result, the ability to predict weather or climate trends and environmental factors (e.g., soil nutrients) is very important in enhancing paddy farmers' productivity [7].
Current practices, which rely heavily on fertilizers and pesticides to increase productivity, do not support sustainable rice production because they are not environmentally friendly [10]. In addition, the timing of harvesting also influences rice yields, as the timing of paddy harvesting showed a linear relationship with grain loss [11]. As a result, monitoring the growth of paddy is crucial to sustaining rice production.
Rice production in coastal areas is frequently affected by typhoons. The lack of ability to manage the impacts of natural events and disasters, which include contamination of water bodies, loss of harvest, and destruction of irrigation systems and other agricultural infrastructure, is another shortcoming that requires attention [12], [13]. With smart farming, data mining and analytical techniques designed for prediction, detection and the development of appropriate disaster management strategies based on data collected from disasters can be used to manage these impacts and consequently support agriculture or farming activities more effectively and efficiently.
Variation within farms and regions in resource endowments, location, topography and farmers' circumstances makes it difficult to apply the same strategy for maximizing rice yields. Towards the end of the twentieth century, precision agriculture began to be adopted. It applies information technologies to capture and integrate data from multiple sources (e.g., farmers, sensors) in order to build a more robust crop management strategy and thus maximize agricultural productivity [14].
Unsustainable rice production [6], inefficient techniques for predicting weather conditions, lack of ability to manage calamities [7], [8], [12], variation within farms and regions [14] and poor logistics and storage [2]-[4] are among the reasons why smart farming should be adopted to sustain and optimize rice yields.
In this paper, we conduct a systematic literature review (SLR) of the latest research on intelligent data processing technology involved in rice smart farming, focusing on the rice production and post-production phases of the agriculture supply chain. We describe the main datasets or features extracted for data modelling. We then elaborate the role of machine learning algorithms in smart agriculture by analyzing the applications of machine learning in various scenarios in the rice production and post-production phases of the agriculture supply chain. This paper also presents a framework that maps the activities defined in rice smart farming, the datasets or features used in data modelling and the machine learning algorithms used to analyze these features for each activity defined in the early stage of the agriculture supply chain.
The remainder of this article is organized as follows. Section II provides the literature review of the most recent reviews conducted related to smart farming. Section III describes the existing frameworks related to the agriculture supply chain. Section IV provides an in-depth analysis of the types of big data used in rice smart farming, focusing on the variety of sources used, the variety of machine learning algorithms used and, finally, the variety of tasks involved in the rice production and post-production phases of smart farming. The research work presented in this section is classified based on the sources and types of data that are used, the types of tasks involved in smart farming and the types of machine learning algorithms used to model these data. Section V presents a framework that maps the activities defined in smart farming, the datasets or features used in data modelling and the machine learning algorithms used to analyze these features for each activity defined in the early stage of the agriculture supply chain. Finally, Section VI concludes this paper and presents challenges and technological trends in exploiting multiple sources in the era of big data in agriculture.

II. LITERATURE REVIEW
A survey has been conducted to examine the global coverage of innovation related to smart farming and the use of machine learning in smart farming. The survey used two methods: examining the trends in the number of scholarly works over time related to Smart Farming, Machine Learning in Smart Agriculture, Artificial Intelligence in Smart Agriculture and Internet of Things in Smart Agriculture; and reviewing the review studies that were conducted on several elements of Industry 4.0 (I4.0) and its applications in smart farming for improving productivity.
Firstly, the trends in the number of scholarly works over time related to Smart Farming, Machine Learning in Smart Agriculture, Artificial Intelligence in Smart Agriculture and Internet of Things in Smart Agriculture can be obtained by using the Lens website (https://www.lens.org). Lens provides open datasets of patent documents, scholarly research works and inventions related to machine learning, artificial intelligence, internet of things and smart farming disclosed in patents [15]. The Lens serves global patent and scholarly knowledge as a public resource to make science- and technology-enabled problem solving more effective, efficient and inclusive. This knowledge may help show ways forward such as new or repurposed ideas and inventions, better strategies and targeted partnerships for collective action. Based on the four search topics (smart farming research, the use of Machine Learning (ML) in smart farming, the use of Artificial Intelligence (AI) in smart farming and the use of Internet of Things (IoT) in smart farming), Fig. 1 through Fig. 4 display the increasing number of scholarly works over time.
VOLUME 9, 2021
Secondly, a few review studies have been conducted in the past on several elements of I4.0 and its applications in smart farming for improving productivity in agriculture sectors, as listed in Table 1. These studies focused on I4.0 applications in smart farming covering specific aspects such as Internet of Things, Cloud Computing and Big Data Analytics (Machine Learning). Table 1 shows several recent reviews related to smart farming or precision agriculture. Several reviews focused on the application of machine learning algorithms in smart farming [16], [17]. For instance, Sharma et al. investigated the current state of research on machine learning (ML) applications in the Agriculture Supply Chain (ASC), covering the application of ML in four different phases of the ASC: pre-production, production, processing and distribution [16].
It was concluded that all three ML algorithms can be leveraged to develop a sustainable ASC. A Machine Learning-Agriculture Supply Chain performance framework was introduced in which the machine learning algorithms are mapped into all four different phases in ASC based on the type of data used. However, these data are not explained and categorized comprehensively.
Mekonnen et al. conducted a review on the application of various machine learning methods in analyzing data captured from sensors within the agricultural ecosystem [17]. In this review, a limited number of machine learning algorithms are listed based on the data captured using different types of Wireless Sensor Networks (WSN) (e.g., ZigBee WSN, GSM and GPS WSN, LoRa WSN, Wi-Fi and MQTT sensor-based systems with Raspberry Pi and Arduino), as well as remotely sensed data (multispectral or hyperspectral data) and vegetation indices. Based on the trend identified in this review, more advanced techniques such as distributed (or edge) deep learning are expected to see increased use.
Several reviews have also focused solely on the application of deep learning algorithms in smart farming [18]-[21]. One of the findings from these reviews is that deep learning algorithms have been shown to provide higher accuracy than other machine learning algorithms when applied to various agricultural problems, such as disease detection and identification, fruit or plant classification and fruit counting, among other domains.
The evolution of agriculture systems involves the adoption of incoming data from various sources [27] and also the use of big data applications in smart farming [26]. Together, these big data technologies and the capability of machine learning algorithms to forecast certain outcomes will cause major changes in the scope and organization of smart farming [16], [17], [26]. Lytos et al. conducted a survey that covers state-of-the-art big data architectures and agriculture systems in order to bridge the knowledge gap between agriculture systems and the exploitation of big data. However, in this review, the authors only list the names of the databases and features used in the agriculture systems, without outlining how these data are processed or analyzed.
The quality and type of dataset collected from sensors greatly influence the performance of the forecasting algorithm in predicting the crop yields. For instance, in optimizing the performance of forecasting crop yields, Fabrizio Balducci et al. have investigated the performance of several machine learning algorithms based on different subsets of features extracted from the environmental sensors [29].
Quite a number of reviews have been conducted on Internet of Things (IoT) technologies [38]. The emergence of IoT technologies has also improved the performance of real-time monitoring of data related to smart farming [20]. IoT is mainly used in monitoring crops, soil and weather, forecasting disease and crop yields, and controlling irrigation machinery, autonomous vehicles and robots [22]. Based on several reviews, it can be concluded that incorporating precision livestock farming technologies (sensors), big data analytics, and the IoT in smart farming practices offers a possible solution for improving agricultural productivity and meeting projected global demand for agricultural products [23]-[25], [28]. The increasing amount and variety of data captured by these emerging IoT technologies offer the smart farming strategy new abilities to predict changes and identify opportunities.
However, to the best of our knowledge, no study has comprehensively reviewed the present mapping of datasets or features and machine learning algorithms to the different types of tasks involved in the paddy rice production and post-production phases of the ASC. There are several related works on the applications of machine learning in paddy rice smart farming. Several studies have applied machine learning algorithms (e.g., Support Vector Machine (SVM) [46], [98], [127], Convolutional Neural Network (CNN) [94], [95], and hybrid approaches [103], [124]) to paddy rice sample recognition and classification using high-resolution images. Remotely sensed data, vegetation indices and climate data are commonly used to estimate paddy rice yield [34], [35], [48], [76], [77], [109] and to monitor paddy rice growth [63], [73], [84] using artificial neural networks and their variants, as well as linear regression approaches. In addition, hyperspectral and high-resolution images have been used to accurately and effectively monitor paddy rice disease [40], [41], [87], [88], [118] and to assess the quality of paddy rice [93], [104], [105] by using deep learning algorithms.
In this paper, we present a framework that maps three elements: a) paddy rice production and post-production activities defined in the ASC, b) datasets or features related to agriculture components captured from sensors, and c) machine learning algorithms used to analyze these features for each activity defined in the early stage of the ASC. This is done by
i. Identifying the phases and tasks involved in paddy rice smart farming that require intelligent data processing technologies.
ii. Describing the main datasets or features captured and used by intelligent data processing technologies in each task identified in paddy rice smart farming.
iii. Elaborating the roles of machine learning technology in paddy rice smart agriculture, by analyzing the applications of machine learning in various tasks and phases of paddy rice smart farming.

III. PHASES AND TASKS IN PADDY RICE SMART FARMING
This paper focuses on the smart farming technologies used in paddy growth and production. This section elaborates the selected phases and tasks involved in paddy rice smart farming [16]. The applications of machine learning algorithms and smart technologies in the agriculture supply chain can be divided into four phases: pre-production, production, post-production and distribution [16], [30], [31]. However, in this work, we focus on several tasks that require intelligent data processing technologies that can be fully utilized to improve the production of paddy rice. Thus, this review focuses on the rice production and post-production phases.
In the rice production phase, several activities are conducted sequentially: planting, managing water, monitoring soil fertility, managing weeds and, finally, managing pests and diseases. In the rice post-production phase, the activities can be divided into harvesting, drying, storage, and milling and processing. Based on these phases and tasks, we look into the features or datasets used by machine learning algorithms in these rice production processes. The SLR framework used for presenting the review findings is shown in Fig. 5, which highlights two main categories: the paddy growing activities and the smart farming activities associated with them. The paddy growing activities can be divided into production and post-production phases. The first step in the rice production phase is planting. Rice crops can either be direct seeded or transplanted. Next, ensuring that the rice plant gets adequate water is very important, since rice is extremely sensitive to water shortages. Good practices for smart paddy irrigation are critical to maximize water efficiency and yield. Smart irrigation for paddy rice deals with automatically maintaining a predetermined water height in paddy fields based on the growth stages of the paddy rice [32], [33]. Managing weeds is crucial to reduce weed pressure in the field. Monitoring soil fertility is also essential to optimize the growth of a rice plant. At the same time, timely and accurate diagnosis of paddy diseases and pest management are required to reduce losses. Monitoring paddy rice disease involves activities such as detecting and recognizing diseases from paddy plant leaf images [37], [39] or classifying, detecting, and predicting infestation patterns of the Brown Planthopper in rice paddies [40], [41].
Generally, monitoring the growth of paddy rice involves analyzing growth based on climate data or remotely sensed data and vegetation indices. This also includes developing approaches for mapping rice-growing areas at field level using phenology-based rice crop classification or paddy growth stage classification [34]-[36]. Paddy rice yield estimation involves tasks such as yield assessment of paddy fields using machine learning algorithms [42] or mapping rice planted areas using hyperspectral data or remotely sensed data and vegetation indices [43].
In the rice post-production phase, paddy harvesting activities include reaping, stacking, handling, threshing, cleaning, and hauling. Harvesting should be performed efficiently, as the speed of paddy harvesting showed a linear relationship with grain loss [11]. When rice is harvested, it contains up to 25% moisture. A high moisture level during storage can lead to grain discoloration, encourage the development of molds, and increase the likelihood of attack from pests. It can also decrease the germination rate of the rice seed. The quality of paddy rice can be assessed using machine learning algorithms. This usually involves activities such as assessing the quality of the rice [44] or investigating the impact of climate change on paddy rice production [45]. Next, the paddy is dried using traditional or mechanical systems. It is important to dry rice grain as soon as possible after harvesting (ideally within 24 hours). After that, the dried rice grain is stored to prevent grain loss caused by adverse weather, moisture, rodents, birds, insects and micro-organisms such as fungi. Finally, the last activity in the post-production phase is the milling process, which removes the husk and the bran layers and produces edible white rice. Paddy rice sample recognition and classification can be applied in the milling process. In paddy rice sample recognition and classification, the main task is to separate and classify objects in a rice sample based on color and texture features with the help of image processing and machine learning techniques [46], [47].

IV. APPLICATION OF BIG DATA AND MACHINE LEARNING IN RICE PRODUCTION TASKS

A. BIG DATA USED IN RICE PRODUCTION TASKS
Data commonly used in paddy rice smart farming can be categorized into sensor data, remotely sensed data and vegetation indices, drone-based data and, finally, paddy rice leaf analysis data. Table 2 tabulates the types of data and features used in paddy rice smart farming according to the smart farming activities described in Fig. 5.

1) SENSOR DATA
Firstly, the sensor data typically captured for monitoring paddy rice growth or estimating paddy rice yield are meteorological data. Meteorological data (or climate data) can be used to monitor paddy rice growth [45], [48] and disease [41]. For instance, Guruprasad et al. modeled paddy crop yield estimation at different spatial resolution (SR) levels using weather and soil data as input features. These features include day and night temperature (min, max, mean), diffused irradiance, precipitation (cumulative), relative humidity, wind speed, rainfall, pH, soil moisture and temperature (0-40 cm) [48].
It was observed that disease incidence in paddy rice is also directly affected by temperature and wetness duration [50], [51]. Paddy rice production is also affected by the level of precipitation. For instance, paddy rice production was found to be affected by decreasing post-monsoon precipitation, as this period coincides with the sensitive fruiting and ripening stages of paddy [54]. In addition, strong winds are very detrimental to the growth and production of rice plants, especially when they occur during the flowering and ripening phases [49]. Rainfall was found to be the main climate driver of paddy rice yield [111]. The suitable soil pH for rice cultivation is 6.0 [52] or 6.25 [53]. Analyzing the nitrogen level of paddy rice can also be used to assess its quality [105].

2) REMOTELY SENSED DATA AND VEGETATION INDICES
Secondly, remotely sensed data and vegetation indices can be used in different ways to estimate paddy yield and to monitor paddy growth and diseases. Many studies are based on mapping rice-growing areas [35], [43], [57], mapping cropping patterns [57]-[59], and mapping paddy vulnerability to flooding [58].
The Moderate Resolution Imaging Spectroradiometer (MODIS) sensors have a total of 36 spectral bands, seven of which are related to vegetation and land surfaces and cover several spectral ranges [60] (see Table 2).
LSWI is sensitive to the total amount of liquid water in vegetation and its soil background. LSWI was developed by combining the shortwave infrared (SWIR) and NIR bands of the electromagnetic spectrum to estimate the water content of the land surface [61]. LSWI is computed based on Eq. 1;
$$\mathrm{LSWI} = \frac{\rho_{NIR} - \rho_{SWIR1}}{\rho_{NIR} + \rho_{SWIR1}} \tag{1}$$
where ρNIR is the reflectance in the NIR and ρSWIR1 is the reflectance in the Shortwave Infrared One. LSWI can be used to detect and classify paddy rice phenology in paddy fields with complex cropping patterns [35], [62]. It was also used to assess the damage to regional rainfed paddy rice after severe floods [44] and to monitor rice growth [63]. Liou and Sha found that the value of LSWI increases and becomes higher than NDVI and EVI [44].
EVI can be used to quantify vegetation greenness [64]. Son et al. constructed time-series EVI and LSWI data in order to perform phenology-based rice crop classification [35]. EVI can be measured as follows;
$$\mathrm{EVI} = G \times \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + C_1 \rho_{Red} - C_2 \rho_{Blue} + L} \tag{2}$$
where ρNIR is the reflectance in the NIR, ρRed is the reflectance in the red, ρBlue is the reflectance in the blue, C1, C2, and L are coefficients and G is the gain factor. The coefficients adopted in the MODIS-EVI algorithm are L = 1, C1 = 6, C2 = 7.5, and G = 2.5. EVI is normally combined with other vegetation indices (e.g., NDVI, LSWI) to estimate paddy rice yield [35], [58], [59], [62], assess damage to regional rainfed paddy rice [44] and monitor rice growth [62], [63]. The results of applying the MODIS-based paddy rice phenological detection algorithm to classify paddy growth stages are encouraging, and the algorithm can be used to monitor paddy rice agriculture at a larger scale [62], [63].
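The LSWI and EVI definitions above can be illustrated with a minimal Python sketch. The function names, the `l_coeff` parameter name and the sample reflectance values are assumptions introduced here for illustration only; the default coefficients are the MODIS-EVI values quoted in the text.

```python
def lswi(nir: float, swir1: float) -> float:
    """Land Surface Water Index: normalized difference of NIR and SWIR1 reflectance."""
    denominator = nir + swir1
    if denominator == 0:
        raise ValueError("NIR + SWIR1 reflectance must be non-zero")
    return (nir - swir1) / denominator


def evi(nir: float, red: float, blue: float,
        gain: float = 2.5, c1: float = 6.0, c2: float = 7.5,
        l_coeff: float = 1.0) -> float:
    """Enhanced Vegetation Index using the MODIS-EVI coefficients by default."""
    denominator = nir + c1 * red - c2 * blue + l_coeff
    if denominator == 0:
        raise ValueError("EVI denominator must be non-zero")
    return gain * (nir - red) / denominator


# Made-up surface reflectance values for one pixel (not taken from any cited study).
pixel = {"nir": 0.5, "red": 0.1, "blue": 0.05, "swir1": 0.2}
print("LSWI:", round(lswi(pixel["nir"], pixel["swir1"]), 3))
print("EVI:", round(evi(pixel["nir"], pixel["red"], pixel["blue"]), 3))
```

In practice the same arithmetic would be applied band-wise to whole image arrays rather than to single pixels; scalar inputs are used here only to keep the sketch dependency-free.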
Indices that correlate with vegetation cover, such as the NDVI, are also used in estimating paddy yield and monitoring paddy growth. NDVI is mostly used to estimate paddy rice yield [43], [57], [59], assess damage to regional rainfed paddy rice [44] and monitor rice growth [63]. NDVI is used to measure the level of greenness and biomass of vegetation. NDVI measurements are most often taken from satellites in orbit around the Earth. NDVI can be computed based on differences in the response patterns of vegetation in the red and NIR ranges as follows [65];
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{Red}}{\rho_{NIR} + \rho_{Red}} \tag{3}$$
where ρNIR is the reflectance in the NIR and ρRed is the reflectance in the red. NDVI ranges between -1 (no vegetation) and +1 (green vegetation). NDVI and EVI are the most commonly used vegetation indices for monitoring the health of vegetation in the fields [35], [43], [57], [59], [66]-[69]. However, some researchers have reported that EVI is often preferred over NDVI because EVI is more responsive to biophysical variables such as LAI [35], [67]. For instance, EVI is more robust in capturing differences in well-vegetated areas [67].
MNDWI is computed based on differences in the response patterns in the green and SWIR1 ranges for the enhancement of open water features [70] and can be measured as follows;
$$\mathrm{MNDWI} = \frac{\rho_{Green} - \rho_{SWIR1}}{\rho_{Green} + \rho_{SWIR1}} \tag{4}$$
where ρGreen is the reflectance in the green and ρSWIR1 is the reflectance in the Shortwave Infrared One. The integration of NDVI and MNDWI from Sentinel-2A images has been shown to increase the accuracy of paddy rice yield estimation [57].
LAI is a dimensionless quantity that characterizes plant canopies. It is typically defined as the ratio of one-sided leaf area per unit ground area (m²/m²) and can be considered a measure of paddy crop growth and productivity, since it characterizes plant canopy structure and gives an idea of the amount of biomass available in a field. LAI can be measured using a plant canopy analyzer [71].
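As a hedged illustration of the NDVI and MNDWI descriptions above, the sketch below computes both indices for a few pixels. Both are normalized differences, so they share one helper. The helper name, the pixel records and their reflectance values are hypothetical and are not taken from the cited studies.

```python
def normalized_difference(band_a: float, band_b: float) -> float:
    """Return (a - b) / (a + b), the form shared by NDVI and MNDWI."""
    denominator = band_a + band_b
    if denominator == 0:
        raise ValueError("sum of the two reflectances must be non-zero")
    return (band_a - band_b) / denominator


def ndvi(nir: float, red: float) -> float:
    """NDVI from NIR and red reflectance; values lie between -1 and +1."""
    return normalized_difference(nir, red)


def mndwi(green: float, swir1: float) -> float:
    """MNDWI from green and SWIR1 reflectance; enhances open water features."""
    return normalized_difference(green, swir1)


# Hypothetical pixels: a vegetated paddy pixel and an open-water pixel.
pixels = [
    {"name": "paddy", "nir": 0.5, "red": 0.1, "green": 0.1, "swir1": 0.3},
    {"name": "water", "nir": 0.05, "red": 0.06, "green": 0.1, "swir1": 0.02},
]
for p in pixels:
    print(p["name"],
          "NDVI:", round(ndvi(p["nir"], p["red"]), 3),
          "MNDWI:", round(mndwi(p["green"], p["swir1"]), 3))
```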
Some works have been conducted to estimate paddy rice LAI with a fixed point continuous observation of near infrared reflectance using a calibrated digital camera [71], [72]. Estimating paddy rice LAI can also be done using machine learning methods [69] and also statistical methods [73] based on hyperspectral data. The leaf area index (LAI) and plant nitrogen concentration (PNC) were also used to estimate the nitrogen nutritional index (NNI) in paddy rice [74].
Another type of remotely sensed data widely used in smart farming is C-Band Synthetic Aperture Radar (SAR) data. C-Band SAR data can be obtained from the Sentinel-1A satellite, which collects data in all weather conditions, day or night. C-Band SAR data has been used in a wide range of applications that include sea and land monitoring. For instance, C-Band SAR has been used in paddy rice yield estimation [82], monitoring paddy rice growth [83] and monitoring paddy rice disease [40].
For instance, an SVM classifier can be used to perform segmentation and classification of paddy rice samples [46]. Nitrogen deficiency in rice crops can also be predicted using deep learning methods in order to assess the quality of the rice [104].

B. APPLICATIONS OF MACHINE LEARNING ALGORITHMS IN PADDY RICE SMART FARMING
This section elaborates the roles of machine learning technology in paddy rice smart agriculture by analyzing the applications of machine learning algorithms and smart technologies in various scenarios in the paddy rice production and post-production phases of the ASC. As mentioned earlier, intelligent data processing technologies can be applied in various scenarios across the paddy rice production and post-production phases of the ASC. These tasks include smart irrigation for paddy rice, paddy rice yield estimation, monitoring paddy rice growth, monitoring paddy rice disease, assessing the quality of paddy rice, and paddy rice sample recognition and classification.
The state of the art for the tasks involved in smart paddy rice farming is illustrated in Fig. 6. First, all the acquired data (sensor data, remotely sensed data and vegetation indices, and drone-based data) are cleaned and fused or integrated. Then, the dimensionality of the data can be reduced using feature selection, construction, transformation and weighting processes [162]-[164]. Once the data are prepared, they are divided into training and testing data depending on the type of task (e.g., classification, regression or clustering) or the machine learning algorithms (e.g., estimation, linear and non-linear methods) used to model the data. Finally, model evaluation and interpretation are performed to extract knowledge that supports the tasks in smart paddy rice farming (e.g., paddy yield estimation, monitoring paddy growth, assessing the quality of paddy rice, determining paddy rice classes and monitoring paddy rice diseases).
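The workflow described above (cleaning, splitting into training and testing data, modelling and evaluation) can be sketched in a few lines of Python. This is an illustrative sketch only: the records are synthetic (NDVI, yield) pairs invented for this example, the model is a one-variable least-squares line chosen for brevity rather than any method from the reviewed studies, and the dimensionality reduction step is omitted because only one feature is used.

```python
import random
from statistics import mean


def clean(records):
    """Cleaning step: drop records that contain a missing (None) value."""
    return [r for r in records if None not in r]


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle reproducibly and split into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def rmse(model, pairs):
    """Evaluation step: root mean squared error on held-out pairs."""
    intercept, slope = model
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in pairs]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic (NDVI, yield in t/ha) records; two have missing values.
records = [(x / 10, 2 + 5 * (x / 10)) for x in range(2, 10)]
records += [(0.55, None), (None, 4.0)]

train, test = train_test_split(clean(records))
model = fit_line(train)
print("intercept, slope:", model, "test RMSE:", rmse(model, test))
```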

1) SMART IRRIGATION SYSTEM FOR PADDY RICE
An automatic drip irrigation system requires less water to maintain a predetermined water height in paddy fields [32], [33], and this system can be controlled based on climate data (e.g., temperature, humidity, light and rain) captured from sensors. Using a wireless sensor and actuator network (WSAN) to build a smart irrigation system for paddy fields can also conserve a significant amount of water [106], [107]. An automatic irrigation system can significantly increase rice production by making more arable land available for paddy rice plantation [33].
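The idea of maintaining a predetermined water height per growth stage can be sketched as a simple threshold rule. Everything specific in the sketch below is an assumption made for illustration: the stage names, the target depths, the tolerance and the command strings are hypothetical and are not taken from the cited systems, which may use quite different control logic.

```python
# Hypothetical target water depths (cm) per growth stage; illustrative values only.
TARGET_DEPTH_CM = {
    "seedling": 3.0,
    "tillering": 5.0,
    "flowering": 7.0,
    "ripening": 2.0,
}


def irrigation_command(stage: str, measured_depth_cm: float,
                       rain_detected: bool = False,
                       tolerance_cm: float = 0.5) -> str:
    """Decide what the actuator should do for one sensor reading."""
    target = TARGET_DEPTH_CM[stage]
    if measured_depth_cm > target + tolerance_cm:
        return "open_drain"   # too much standing water
    if measured_depth_cm < target - tolerance_cm and not rain_detected:
        return "open_inlet"   # too little water and no rain to rely on
    return "hold"             # within tolerance, or waiting for rain


# Example readings from a hypothetical water-level sensor.
for stage, depth, rain in [("tillering", 3.0, False),
                           ("tillering", 3.0, True),
                           ("ripening", 4.0, False)]:
    print(stage, depth, rain, "->", irrigation_command(stage, depth, rain))
```

Skipping irrigation when rain is detected is one simple way such a rule could save water; a deployed system would likely also smooth noisy sensor readings before acting on them.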
Besides that, smart sensors for climate and soil [36], [36], Radio-frequency identification (RFID), load Sensor and Global Positioning System (GPS) are also used in estimating paddy rice yield [113], [114]. Table 3 tabulates the applications of smart sensors (Internet of Things (IoT)) in various tasks involved in paddy rice smart farming.
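As a rough illustration of how sensor readings could drive such an irrigation system, the sketch below keeps the field water height inside a target band. The function name, the thresholds (3 cm and 5 cm) and the rain flag are hypothetical and are not taken from the cited systems.

```python
def pump_command(water_height_cm, pump_on, low_cm=3.0, high_cm=5.0, raining=False):
    """Decide whether the irrigation pump should run.

    Simple hysteresis control: start pumping below `low_cm`, stop at or
    above `high_cm`, and otherwise keep the current pump state so that the
    actuator does not switch on and off around a single threshold.
    """
    if raining or water_height_cm >= high_cm:
        return False          # enough water, or rain is refilling the field
    if water_height_cm <= low_cm:
        return True           # water height too low: irrigate
    return pump_on            # inside the band: keep the current state
```

For example, `pump_command(2.0, pump_on=False)` switches the pump on, while the same reading with `raining=True` leaves it off.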

2) PREDICTING PADDY RICE YIELD ESTIMATION
Most studies model paddy rice yield using hyperspectral and climate data (see Table 4). These studies use various types of remotely sensed data and vegetation indices to predict paddy rice yield [34], [35], [57], [58], [67]. Thus, one of the issues is determining the best combination of remotely sensed data and vegetation indices to improve the prediction accuracy. For instance, integrating NDVI and MNDWI from Sentinel-2A imagery with temporal backscatter increased the accuracy by 0.08 [57]. Combining indices such as NDVI and MNDWI also increases the accuracy of paddy rice yield estimation when using Classification And Regression Trees (CART) [57]. CART is one of the variants of Decision Tree (DT) classifiers that can be used for classification or regression predictive modeling problems [57], [66], [144]. DT is one of the important types of algorithm for supervised learning, particularly in predictive modeling [78], [81], [126]. DTs are constructed via an algorithmic approach that optimizes the splitting of a data set based on different conditions on the data features. In addition, using a multi-feature fusion method can also improve the accuracy of predicting paddy rice yields using a deep learning approach [112].
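To make the idea of an optimized split concrete, the following sketch searches for the single threshold on one feature that minimizes the summed squared error of the two resulting groups, which is the criterion commonly used for regression trees. The NDVI and yield values are synthetic and the function names are chosen for this example only.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split `x <= threshold`."""
    best = (None, float("inf"))
    candidates = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive feature values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (t, total)
    return best

# Synthetic example: low-NDVI plots yield about 2 t/ha, high-NDVI plots about 5 t/ha.
ndvi_values = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]
yields = [2.0, 2.1, 1.9, 5.0, 5.2, 4.8]
threshold, split_error = best_split(ndvi_values, yields)
```

A full tree repeats this search recursively on each side of the split and over all features; for classification, an impurity measure such as the Gini index replaces the squared error.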
The Partial Least Squares (PLS) algorithm appears in many studies that estimate paddy rice yields [34], [35], [75]. For instance, the short wave infrared region was found to be essential for estimating paddy yield using the PLS algorithm [34]. PLS builds on principal component regression and can be used to build models that predict more than one dependent variable [63], [69], [136]. PLS was also found to produce a higher R² of 0.984 compared to Principal Components Regression (PCR) in predicting paddy rice yield [75]. PCR is based on Principal Component Analysis (PCA) and is used to analyze multiple regression data that suffer from multicollinearity [132] (e.g., in predicting paddy rice yield [75]). Before any modelling is performed, PCA can be used to extract features from the datasets [128]. PCA is a well-known technique for reducing the dimensionality of datasets [129]. This is done to increase interpretability while minimizing information loss [130], [131].
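The sketch below illustrates the PCA step on two perfectly collinear, synthetic bands: it computes the covariance matrix and extracts the first principal component by power iteration. This is a simplified stand-in for a library routine, and the data and function name are assumptions of the example. Regressing the target on the resulting scores instead of the raw bands is the basic idea behind PCR.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores) of the first principal component.

    Uses power iteration on the sample covariance matrix. The start vector
    must not be orthogonal to the leading component for this to converge.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project every centered sample onto the component (one value per sample).
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores

# Two synthetic, multicollinear bands: the second is exactly twice the first.
bands = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(bands)
```

Here the two bands collapse into a single score per sample without losing information, which is the situation in which PCR is useful.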
A few variants of deep learning algorithms have also been used to predict paddy rice yield based on NDVI [109], climate data [48], [110]-[112], [155] and hyperspectral data (Bands 1-4) [75]-[77], with higher accuracy. These algorithms include the Artificial Neural Network (ANN) [48], [77], [110], [111], the Convolutional Neural Network (CNN) [76], [109], [112] and the Recurrent Neural Network (RNN) [155]. For instance, neural network algorithms achieved better overall accuracy than Random Forest (RF) and Support Vector Machine (SVM) using either hyperspectral or climate data [48], [109], [110]. Inspired by biological nervous systems, an ANN is an information processing technique that works in a way similar to how the human brain processes information [150]. An ordinary neural network consists of hidden layers and weights, while a CNN has filters that collectively make up its convolution layers. CNN is a class of deep neural networks that is most commonly applied to spatial data such as images. In contrast, RNN is suited to temporal data, which is also called sequential data. Compared to a feedforward ANN, an RNN is able to learn from time-series data because it has a recurrent connection on the hidden state, and this loop ensures that the sequential information in the input data is captured [151], [155]. Although deep learning algorithms are known to be very effective and robust for forecasting paddy rice yield [76], [77], [109], [111], [155], they require a large amount of time-series data to improve the prediction performance [112].
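The recurrent connection can be illustrated with a scalar, untrained Elman-style cell. The weights below are arbitrary values chosen for the example; the point is only that the hidden state carries information from earlier time steps, so the order of the inputs changes the result.

```python
import math

def rnn_forward(inputs, w_in, w_rec, bias, h0=0.0):
    """Run a scalar recurrent cell over a sequence.

    h_t = tanh(w_in * x_t + w_rec * h_{t-1} + bias)
    Returns the hidden state after every time step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)  # new state depends on the old one
        states.append(h)
    return states

# The same two values in a different order give a different final state.
early_signal = rnn_forward([1.0, 0.0], w_in=1.0, w_rec=1.0, bias=0.0)
late_signal = rnn_forward([0.0, 1.0], w_in=1.0, w_rec=1.0, bias=0.0)
```

A feedforward network that receives the two values as an unordered set cannot make this distinction, which is why recurrent models suit seasonal climate or index time series.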
RF requires two parameters, namely the number of trees and the number of features considered when splitting the data set at each node [143]. RF has been found to be effective in predicting paddy rice yield and monitoring paddy rice growth [48], [82]. Several works that apply SVM in paddy rice smart farming have also been reviewed in this paper [48], [109]; however, they produced lower accuracies than deep learning algorithms. SVM is a supervised machine learning model that can be used for binary classification tasks [146]. The objective of the SVM is to find the optimum hyperplane in an N-dimensional space that distinctly classifies the data points.
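Once a linear SVM has been trained, classification reduces to checking on which side of the hyperplane a sample lies. The sketch below shows only this decision step, with hand-picked (hypothetical) weights rather than weights learned from data.

```python
import math

def svm_decision(x, w, b):
    """Signed score of sample x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_predict(x, w, b):
    """Binary label: +1 on one side of the hyperplane, -1 on the other."""
    return 1 if svm_decision(x, w, b) >= 0 else -1

def margin_width(w):
    """Width of the margin between the planes w.x + b = +1 and w.x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))
```

Training chooses `w` and `b` so that this margin is as wide as possible while the training samples are still separated, which is what "optimum hyperplane" refers to.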
Unsupervised learning algorithms can also be used to predict paddy rice yield from hyperspectral data [58]. For instance, the Iterative Self-Organizing (ISO) algorithm has been used to generate paddy cropping patterns for predicting paddy rice yield [58]. The ISO algorithm is a modification of the k-means clustering algorithm in which clusters are merged and split according to thresholds predefined by the user. The decision to merge or split clusters is made by comparing distances in the multispectral feature space against these thresholds [153].
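The merging step can be sketched as follows. This is a simplified illustration of the merge rule only (the splitting rule and the k-means reassignment are omitted), and the threshold and centroid values are made up for the example.

```python
import math

def merge_close_clusters(centroids, sizes, merge_threshold):
    """Merge the closest pair of centroids while their distance is below the threshold.

    `sizes` holds the number of pixels in each cluster, so a merged centroid
    is the size-weighted mean of the two original centroids.
    """
    centroids = [list(c) for c in centroids]
    sizes = list(sizes)
    while len(centroids) > 1:
        best = None
        for i in range(len(centroids)):
            for j in range(i + 1, len(centroids)):
                d = math.dist(centroids[i], centroids[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d >= merge_threshold:
            break                      # no remaining pair is close enough
        total = sizes[i] + sizes[j]
        centroids[i] = [(sizes[i] * a + sizes[j] * b) / total
                        for a, b in zip(centroids[i], centroids[j])]
        sizes[i] = total
        del centroids[j]
        del sizes[j]
    return centroids, sizes

merged_centroids, merged_sizes = merge_close_clusters(
    [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]], [10, 30, 5], merge_threshold=2.0)
```

In this example the first two clusters are merged because their centroids are only one unit apart, while the third stays separate.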
There are several approaches that produce estimates of unknown variables or parameters from a series of measurements observed over time, such as the Extended Kalman Filter (EKF), the Unscented Kalman Filter (UKF) and Moving Horizon Estimation (MHE) [133], and these can be used to predict paddy rice yield. The Moving Horizon Estimator with Pre-Estimation (MHE-PE) is an optimization-based estimator that uses an auxiliary estimator to describe the dynamics of the state over the horizon [134], [135]. MHE-PE was found to be more effective than MHE for crop start date estimation in tropical areas [68].
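To show what estimating a variable from a series of noisy measurements looks like in its simplest form, the sketch below implements a scalar linear Kalman filter for a state assumed to be constant. It is not an EKF, UKF or MHE implementation; those extend the same predict-and-update idea to nonlinear models or to an optimization over a window of past measurements. The noise variances are arbitrary example values.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a (nearly) constant state.

    x is the current estimate and p its variance. Each measurement z pulls
    the estimate towards z by a gain that depends on how uncertain the
    estimate is relative to the measurement noise.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var          # predict: uncertainty grows slightly
        k = p / (p + meas_var)       # Kalman gain, between 0 and 1
        x = x + k * (z - x)          # update the estimate with the innovation
        p = (1.0 - k) * p            # uncertainty shrinks after the update
        estimates.append(x)
    return estimates

estimates = kalman_1d([5.0] * 20, process_var=1e-4, meas_var=0.5)
```

Starting from an initial guess of zero, the estimate moves steadily towards the measured value as more observations arrive.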
Some of the limitations found in these studies include resolution limitations, topographic effects and the limited size of the time-series data, all of which lead to estimation errors. For instance, the low fractional coverage of small rice paddies in complex and hilly landscapes can lower the probability of identification using OTSU's algorithm [108]. OTSU's method is an image segmentation algorithm that thresholds a gray-level image based on its gray-level histogram, choosing the threshold that maximizes the between-class variance [100], [148]. Stepwise classification (SW) is another classification approach that combines two heterogeneous data sets in a novel way, and it can be used in estimating rice yield [67]. Table 4 tabulates the applications of various machine learning algorithms found in some of the works to predict paddy rice yield.
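A compact version of Otsu's threshold selection is sketched below for a flat list of 8-bit gray levels. It is a plain-Python illustration with made-up pixel values; image libraries provide optimized equivalents.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form one class (background) and the rest the
    other class (foreground).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue                  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break                     # all pixels are already background
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

# Synthetic image: dark water pixels (10) and bright vegetation pixels (200).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
```

When one class covers only a small fraction of the image, as with small paddies in hilly terrain, the histogram is dominated by the other class and the selected threshold becomes less reliable.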

3) MONITORING PADDY RICE GROWTH
Monitoring the growth of paddy rice can be performed by mapping paddy rice and assessing its growth stages. One of the challenges in monitoring paddy rice growth using machine learning algorithms is determining the optimum feature combination, which can improve the overall accuracy of the classification results [115]. For instance, the optimum feature combination can be achieved by using the robust adaptive spatial temporal fusion model (RASTFM) [116]. NDVI [63], [66], [69], [115], EVI [63] and hyperspectral bands 1-4 [73], [78] are the most commonly used data in monitoring the growth of paddy rice.
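For reference, NDVI and EVI are simple band-ratio computations on surface reflectance. The sketch below uses the standard definitions (with the usual EVI coefficients 2.5, 6, 7.5 and 1); the reflectance values in the example are invented.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)

def evi(nir, red, blue):
    """Enhanced Vegetation Index with the commonly used coefficients."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

# Hypothetical reflectances for a dense rice canopy.
canopy_ndvi = ndvi(nir=0.5, red=0.1)
canopy_evi = evi(nir=0.5, red=0.1, blue=0.05)
```

Computed per pixel and per acquisition date, these values form the time series that growth-stage classifiers use as input features.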
The Multilayer Perceptron (MLP) [63], a class of feedforward ANN, and RF [69], [115] show better accuracies than PLS, SVM [63], [78], [83] and Support Vector Regression (SVR) [69] in classifying paddy growth stages. SVR is characterized by the use of kernels, a sparse solution, and control of the margin and the number of support vectors [141]. SVR is trained using a symmetrical loss function, which penalizes high and low misestimates equally, and it has been proven to be an effective algorithm for estimating real-valued outputs [69].
Least-squares support-vector machines (LS-SVM) were found to produce better results than Multiple Linear Regression (MLR) and PLS in estimating the leaf area index (LAI) of paddy rice from optimal hyperspectral bands [73]. LS-SVMs are least-squares versions of SVM that can be used for classification and regression analysis problems [73], [123], [140]. MLR is a statistical technique that uses several independent variables to predict the outcome of a dependent variable [34], [73], [111], [137]. Multiple regression is an extension of simple linear (OLS) regression, which uses only one independent variable.
Besides remotely sensed data and vegetation indices, climate and soil data obtained from smart sensors are also used in monitoring paddy rice growth [36] (see Table 3). Table 5 tabulates the applications of various machine learning algorithms in monitoring paddy rice growth.
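The symmetrical loss used by SVR is usually the epsilon-insensitive loss, sketched below. The tolerance of 0.1 is an arbitrary example value.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Loss used by SVR: errors inside the epsilon tube cost nothing,
    larger errors are penalized linearly and equally in both directions."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

over = epsilon_insensitive_loss(5.0, 5.5)    # overestimate by 0.5
under = epsilon_insensitive_loss(5.0, 4.5)   # underestimate by 0.5
inside = epsilon_insensitive_loss(5.0, 5.05) # within the tolerance
```

Because only the samples outside the tube contribute to the loss, only those samples become support vectors, which is the source of the sparse solution.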

4) MONITORING PADDY RICE DISEASE
The color of paddy rice leaves changes when they are infected by a disease, and colored spots appear on the leaves. For that reason, most studies used high-resolution images [37], [39], [41], [87]-[92], [117]-[120] and hyperspectral images [80], [81] to detect and assess paddy rice diseases. The ANN algorithm and its variant, CNN, are found to be very effective in classification tasks for monitoring paddy rice diseases [39]-[41], [81], [87], [88], [90], [118]. For instance, the ANN achieved better classification results than the FC and SVM algorithms [39], and a calibrated CNN model still showed good classification ability on a small-scale sample set and was selected as the best classification model compared to DT, k-NN and SVM [81]. However, CNN generally requires a large number of samples for training [88], [112]. In fuzzy classification (FC) applications, once a set of classes has been defined, one can determine the degree of membership of every object x under consideration [149]. Fuzzy classification allows an object x to belong to two or more classes.
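Degrees of membership can be illustrated with triangular membership functions over a single feature. The feature (a normalized leaf greenness score), the class names and the breakpoints below are hypothetical and serve only to show how one object can belong partly to two classes.

```python
def triangular(x, left, peak, right):
    """Triangular membership function: 0 outside [left, right], 1 at the peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def memberships(x, classes):
    """Degree of membership of object x in every fuzzy class."""
    return {name: triangular(x, *abc) for name, abc in classes.items()}

# Hypothetical fuzzy classes over a normalized greenness score.
LEAF_CLASSES = {
    "infected": (-0.2, 0.2, 0.6),
    "healthy": (0.4, 0.8, 1.2),
}
degrees = memberships(0.5, LEAF_CLASSES)
```

A leaf with a greenness score of 0.5 falls in the overlap of both classes and therefore receives a non-zero degree of membership in each.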
The k-Nearest Neighbour (k-NN) algorithm is also very effective in detecting diseases from paddy plant leaf images, identifying Brown Planthopper in paddy fields and other classification problems [37], [81], [92], [96], [117]. Given an unknown sample, k-NN finds the k samples nearest to it according to a distance function (e.g., Euclidean or cosine distance) and derives the label (class) of the unknown sample from these k neighbours, typically by majority vote for classification or by averaging the response variable for regression [145]. k-NN can also be used for paddy rice sample classification [59], [99], [103]. Compared to SVM, k-NN produces better accuracy in detecting and recognizing diseases from paddy plant leaf images [37].
Some combined approaches that involve deep learning [40], [41] and SVM algorithms [40], [91] show promising results. For instance, a combination of two machine learning algorithms (CNN + SVM) has been used to identify the cultivated paddy regions (using CNN) and to detect areas damaged by Brown Planthopper attacks (using SVM) [40]. Other works include building a semantic framework that models an ontology of rice plant knowledge and applying this framework to help farmers identify rice diseases, receive early warnings of possible spreadable diseases, and receive treatments based on multiple observations [121].
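A minimal k-NN classifier with Euclidean distance and majority voting (the usual choice for classification) is sketched below. The two-dimensional feature vectors and the class names are invented; real systems would use colour or texture features extracted from leaf images.

```python
import math
from collections import Counter

def knn_classify(sample, training, k=3):
    """Label a sample by majority vote among its k nearest training samples.

    `training` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical leaf features: (spot density, yellowness).
LEAVES = [
    ((0.10, 0.20), "healthy"),
    ((0.20, 0.10), "healthy"),
    ((0.15, 0.15), "healthy"),
    ((0.90, 0.80), "blast"),
    ((0.80, 0.90), "blast"),
]
```

For instance, `knn_classify((0.85, 0.85), LEAVES)` is decided by two "blast" neighbours against one "healthy" neighbour.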
The Minimum Distance Classifier (MDC) achieved better accuracy than k-NN in classifying high-resolution images for monitoring paddy rice disease [117]. MDC assigns an unknown sample to the class that minimizes the distance between the sample and the class in multi-feature space [147]. One of the works reviewed applied MDC to classify images in the task of monitoring and controlling rice diseases using image processing techniques [117].
There are also studies on developing expert systems using an optimized fuzzy inference system (OFIS) [122] and forward chaining [89] for monitoring paddy rice disease. Table 6 tabulates the applications of various machine learning algorithms in monitoring paddy rice disease.
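A minimum distance classifier can be sketched by representing each class with the mean of its training samples and assigning a new sample to the nearest mean. Using the class mean and Euclidean distance is one common choice and an assumption of this example, as are the feature values.

```python
import math

def class_means(training):
    """Compute the mean feature vector of every class."""
    groups = {}
    for features, label in training:
        groups.setdefault(label, []).append(features)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def mdc_classify(sample, means):
    """Assign the sample to the class whose mean is closest."""
    return min(means, key=lambda label: math.dist(sample, means[label]))

# Hypothetical training data: (spot density, yellowness) per leaf image.
TRAINING = [
    ((0.10, 0.20), "healthy"),
    ((0.20, 0.10), "healthy"),
    ((0.90, 0.80), "blast"),
    ((0.80, 0.90), "blast"),
]
MEANS = class_means(TRAINING)
```

Unlike k-NN, the classifier only stores one vector per class, so prediction cost does not grow with the size of the training set.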

5) ASSESSING QUALITY OF PADDY RICE
The quality of paddy rice can be assessed using hyperspectral data [74], [79], climate and soil data [105] and high-resolution images of the paddy rice [93], [104], [123]. SVM and CNN are the two most commonly used machine learning algorithms for assessing the quality of paddy rice [79], [93], [104], [105], [123]. CNN is found to be more effective than SVM in assessing the quality of paddy rice [93]. A combination of classical artificial neural networks and SVM has also been used to predict nitrogen deficiency of the rice crop [104].
Fuzzy c-means (FCM) has also been used to assess the quality of paddy rice. FCM is a clustering method that allows one piece of data to belong to two or more clusters [74], [154]. Table 7 tabulates the applications of various machine learning algorithms in assessing the quality of paddy rice.
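The soft assignment in FCM follows from its membership update, in which the membership of a point in a cluster depends on its distance to that cluster's centroid relative to its distances to all centroids. The sketch below shows this single step with the commonly used fuzzifier m = 2; the coordinates are invented, and a full FCM run would alternate this step with a centroid update.

```python
import math

def fcm_memberships(point, centroids, m=2.0):
    """Fuzzy c-means membership of one point in every cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the distance
    from the point to centroid i. The memberships sum to 1.
    """
    dists = [math.dist(point, c) for c in centroids]
    if any(d == 0.0 for d in dists):
        # The point coincides with a centroid: full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** power for dk in dists) for di in dists]

u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

The point is three times closer to the first centroid than to the second, so it belongs mostly, but not exclusively, to the first cluster.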

6) PADDY RICE SAMPLE CLASSIFICATION
Machine learning algorithms are normally combined with computer vision techniques to perform paddy rice sample classification more effectively. Applying computer vision and machine learning techniques to recognize and classify rice varieties can increase the accuracy of the classification process in real applications. Several studies have examined morphological and textural features of rice seed images to evaluate their efficacy in identifying rice varieties [97] and classifying paddy rice adulteration levels [96]. In most studies on the application of machine learning algorithms to paddy rice sample classification, deep learning algorithms are found to be very effective in classifying rice samples [94]-[97], [99], [101], [124].
The classification of paddy rice samples can be improved with PCA-based reduced features [96], [103], [124]. PCA can be combined with other classifiers to improve the accuracy of paddy rice sample classification [96], [103], [124] and also to perform qualitative analysis in monitoring paddy rice disease [80].
Deep learning algorithms (e.g., BPNN, CNN) produced better accuracy than the SVM algorithm [46], [101] in classifying paddy rice samples [95], [96]. When the labels or the number of varieties are not available, an unsupervised learning algorithm, such as a clustering algorithm, can be used to cluster paddy rice samples. For instance, the k-means clustering algorithm provides clusters with considerable separability, as measured using separability index measures [103], based on the PCA-based reduced features. In k-means clustering, n observations are partitioned into k clusters in which each observation is assigned to the nearest cluster centroid.
The k-means clustering is also known as a method of vector quantization [152]. By using the k-means clustering method in paddy rice sample classification, the H channel data can provide clusters with considerable separability as measured using separability index measures [103]. k-means clustering also can be used as part of the approach to classify the annual cropping patterns of paddy crop based on k number of classes [59].
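The assignment and update steps of k-means can be sketched as follows. The two-dimensional points stand in for reduced features of rice grain images and, like the initial centroids, are invented for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) with given initial centroids.

    Returns the final centroids and the cluster index of every point.
    """
    centroids = [list(c) for c in centroids]

    def nearest(p):
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)        # assignment step
        for i, members in enumerate(clusters):
            if members:                           # update step (skip empty clusters)
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p) for p in points]
    return centroids, labels

grains = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centres, labels = kmeans(grains, [(0.0, 0.0), (10.0, 10.0)])
```

The result depends on the initial centroids and on k, which must be chosen in advance, for example from the expected number of rice grades.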
Adaptive Boosting (AdaBoost) has also been used to classify paddy rice samples. The AdaBoost algorithm combines multiple weak classifiers to form a single strong classifier [91], [95], [142], and it is therefore known as an ensemble method. However, deep learning algorithms are found to be superior to the AdaBoost algorithm in classifying paddy rice samples [91], [95].
A multi-classifier cascade based rice spike detection method that consists of SVM, CNN and k-means algorithms has also been proposed [99]. Other works include training machine learning algorithms to predict the weight and size of rice kernels [125], applying machine learning algorithms to detect adulterated admixtures of white rice based on mass spectrometry data [126] and classifying organic rice samples using original rice elements [127]. Table 8 tabulates the applications of various machine learning algorithms in paddy rice sample classification.
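How weak classifiers are combined can be seen in the prediction step of AdaBoost, sketched below with decision stumps. The stump thresholds and error rates are made-up values; in a real run they are learned iteratively while the training samples are reweighted.

```python
import math

def classifier_weight(error):
    """AdaBoost weight of a weak classifier with weighted error rate `error`."""
    return 0.5 * math.log((1.0 - error) / error)

def adaboost_predict(x, stumps):
    """Weighted vote of decision stumps.

    Each stump is (alpha, feature_index, threshold) and votes +1 when
    x[feature_index] > threshold, otherwise -1.
    """
    score = sum(alpha * (1 if x[f] > t else -1) for alpha, f, t in stumps)
    return 1 if score >= 0 else -1

# One accurate stump (10% error) and two barely useful ones (40% error).
STUMPS = [
    (classifier_weight(0.10), 0, 0.5),
    (classifier_weight(0.40), 1, 0.5),
    (classifier_weight(0.40), 2, 0.5),
]
```

For the sample `[0.9, 0.1, 0.1]`, the accurate stump votes +1 and outweighs the two weaker stumps that vote -1, so the ensemble predicts +1 even though a simple majority would not.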

V. RESULTS AND CONCLUSION
Based on the works reviewed in this paper, a new framework is proposed that maps three entities: big data, machine learning and paddy rice smart farming tasks. In this review, the types of machine learning algorithms used are highly dependent on the availability of data. At the same time, the type of data required depends directly on the type of task in each production and post-production phase of paddy rice smart farming. These machine learning algorithms perform the intelligent data processing that assists farmers in the various tasks of the production and post-production phases. Based on the findings summarized in the previous sections, machine learning algorithms and smart technologies can be used to improve the overall efficiency of the paddy rice production system. The potential benefits lead to an improvement in the return on investment (ROI) for paddy rice production systems by minimizing the losses or costs involved in the production of paddy rice. We therefore use the findings in the literature to map these three components (datasets, machine learning algorithms and tasks in the production of paddy rice) and develop a Big Data-ML-Task applications framework that can be used by practitioners. The proposed framework shown in Fig. 7 covers the types of datasets, the types of ML algorithms, the types of tasks in paddy rice smart farming and paddy rice supply chain performance. With a smart irrigation system, water usage can be reduced and, at the same time, water can be fully utilized to increase the paddy rice yield [32], [106], [107]. An automatic irrigation system may also cause a significant increase in rice production by making more arable land available for paddy rice plantation [33].
Estimating the yield of paddy rice precisely is very important for national food security and development evaluation. The development of an integrated aerial crop monitoring solution using an Unmanned Aerial Vehicle (UAV) has motivated researchers to apply vegetation indices retrieved from hyperspectral images to estimate paddy rice yield [77]. Several studies have estimated paddy rice yield based on time-series climate data [48], [110]-[112], [155]. Rainfall was found to be the main climate driver of rice yield [111]. Other studies used hyperspectral data to estimate the yield of paddy rice [34], [35], [57], [58], [67], [68], [75], [76], [108], [109]. Deep learning algorithms were found to be more effective than other machine learning algorithms for modeling paddy rice yield [48], [76], [77], [109]-[112], [155]. The maximum quality of harvested paddy rice can be obtained by using sensors to monitor humidity, temperature, pH, soil moisture and light intensity in real time [113], [114].
Monitoring the growth of paddy rice is critical for understanding its growing status and for yield estimation. For instance, the self-sufficiency level (SSL) for paddy rice in Malaysia is only 70%. As the world population is increasing, intensifying paddy rice farming is preferable to expanding agricultural land because arable land is limited [156]. Monitoring the growth of paddy rice is difficult for traditional farmers due to climate change, soil conditions, the age of the farmers and the time needed to monitor the whole area. With remotely sensed data, a paddy rice crop growth map can be created using hyperspectral images [66] and synthetic aperture radar (SAR) data [83], [115]. For example, paddy rice growth based on rice growth parameters (e.g., rice height and biomass) can be monitored with the backscattering coefficient from RADARSAT-2 data [83]. The paddy rice leaf chlorophyll content can also be retrieved from rice canopy hyperspectral imagery to analyze paddy rice plant growth [63]. Leaf area index (LAI) is commonly used as a surrogate for productivity in precision agriculture (PA) and is widely used in plant growth studies [69], [73]. In short, the applications of machine learning algorithms have enabled timely and accurate monitoring of the paddy rice planting area for national food security and management [115]. Using smart sensors to monitor soil pH, lux and temperature also provides insight into the stages of paddy rice growth [36].
Due to the lack of knowledge and awareness of suitable management to rectify rice plant leaf diseases, rice production has been reduced in recent years [157]. The manual detection of plant diseases based on naked-eye observation by experts is very time consuming and expensive, and it sometimes produces errors when identifying the disease type [158]. Machine learning (ML) algorithms can be used to provide early warnings that anticipate rice blast and detect its presence, thus supporting the application of biocidal chemical compounds or biological organisms used to kill parasitic fungi or their spores. Based on several studies reviewed in this paper, the application of ML in detecting the presence of rice blast has also provided suitable solutions for preventive remedial actions targeting the mitigation of yield losses and the reduction of fungicide use [159]. This review will be beneficial for modelers, farmers and stakeholders, guiding them in developing and selecting the most suitable models for effective paddy rice disease detection and forecasting. The identification of paddy diseases may also assist farmers by providing them with remedies based on the type of disease [160].
The quality of paddy rice production depends highly on the quality of the soil properties. These soil properties include the pH, moisture, nitrogen and organic carbon content of the soil. For instance, a CNN produced promising results in assessing the nitrogen deficiency of the paddy rice crop [104]. These soil properties can be captured using sensors or retrieved from hyperspectral images [74]. SVM and CNN are the two most common machine learning algorithms used in assessing the quality of paddy rice. Compared to SVM, CNN produced better assessment accuracy [93].
Besides soil properties [105], some studies have conducted the assessment of the paddy rice quality based on the high-resolution images of the paddy rice leaf [93], [104], [123] and the hyperspectral images of the paddy rice field [74], [79].
Improving the management and productivity of paddy rice farming is important for strengthening food security initiatives. Due to the variation in economic value of different varieties of rice, rice quality identification is very important in the international and national rice markets [97], [100], [101]. The quality of the rice is used to evaluate the milling process. A rice sample may consist of full rice, broken rice, damaged rice, paddy, stones and foreign objects. Image processing and machine learning techniques can be used to separate and classify the objects in a rice sample [46]. Other than hyperspectral images [98], most of the studies related to paddy rice sample classification use high-resolution images and apply machine learning techniques such as SVM [46], [96], [98]-[100], [127] and deep learning algorithms [94]-[98], [101], [124]. Combining an efficient feature extraction method (e.g., PCA) [103] with a neural network algorithm (e.g., Back-propagation Neural Network (BPNN)) gives better accuracy in paddy rice sample classification [96], [124] and also better clustering results for paddy rice grade identification [103]. Other image pre-processing steps, such as the histogram of oriented gradients (HOG), also affect the performance of the classifiers [94], [99]. Combining features in paddy rice sample classification also improves the classification accuracy [96].
Multi-classifier cascades can also be used to improve the performance of paddy rice sample classification [99]. A good model requires both low bias and low variance in order to achieve high accuracy or low error. A model with an optimal balance of bias and variance neither overfits nor underfits the data. The variance of the final classifier can be reduced by fitting multiple final models, by using hybrid approaches [99], [103], [124] or by increasing the training set size. In addition, examples of low-bias machine learning algorithms include DT, k-NN and SVM. Based on the findings of this review, the performance of these three machine learning algorithms is very competitive in predicting paddy rice yield [109], monitoring paddy rice growth [78], [83], monitoring paddy rice disease [37], [40], [81], [91], [117], [119], [120], assessing the quality of paddy rice [79], [104], [123] and paddy rice sample classification [46], [98], [99], [127].

VI. CONCLUSION
This paper provides a structured overview of the recent applications of machine learning algorithms and smart devices for paddy rice smart farming. In addition, this paper has proposed a framework that maps big data, machine learning and paddy rice smart farming tasks. The review reveals considerable benefits for paddy rice production where machine learning techniques and smart devices have been applied in paddy rice smart farming. Based on the findings of this review, we summarize the following guidelines for future work.
First, there is a need to explore further the capability of ensemble models or hybrid models based on deep learning methods using multi-source data, as these have been shown to improve the performance of the base model. However, deep learning methods require a large number of samples to produce efficient models. For instance, in predicting paddy rice yield and monitoring paddy rice disease using the deep learning approach, a large amount of time-series data is required to improve the prediction performance [88], [112]. Since most of the studies conducted for paddy rice sample classification are based on image processing, the optimization of the classification accuracy (e.g., using a hybrid or ensemble approach) is another issue that requires more exploration. For instance, more work on the variety of wavelet transforms for texture analysis and on different classification techniques (decision tree, random forest) for paddy rice sample classification can be explored [96].
Second, only a limited number of investigations have applied machine learning algorithms to multi-source data, even though existing studies show that a more comprehensive understanding can be obtained by integrating multi-source data or determining the optimum feature combination. Better modelling results can be produced by analysing the complex relationships among multi-source data or by finding the optimum feature combination. For instance, using only the spectral reflectance, shape and texture of paddy rice will not provide better results, and additional ground truth data is required in order to classify and differentiate paddy rice accurately [161]. Using multi-feature fusion (e.g., combining Landsat and SAR time-series data) can also improve the accuracy of predicting paddy rice yield using a deep learning approach [112]. Limited work is found that explores and combines multiple sources of data (e.g., sensor data (climate and soil properties), remotely sensed data, vegetation indices and drone-based data such as high-resolution images) to improve data modelling for smart irrigation for paddy rice, predicting paddy rice yield, monitoring paddy rice growth, monitoring paddy rice disease, assessing the quality of paddy rice and paddy rice sample classification.
Finally, a more comprehensive analysis needs to be conducted to investigate the efficiency of the processing software used for image preprocessing before modelling. For example, monitoring the growth of paddy rice based on spectral reflectance is limited by the processing software and the complicated steps needed to process the images [66]. More research is needed on acquiring remotely sensed time-series imagery with high resolution in both time and space through an effective and efficient image segmentation process using data blending approaches [108].
in 2008. His Ph.D. degree focuses on intelligent techniques using machine learning to model and optimize the dynamic and distributed processes of knowledge discovery for structured and unstructured data.
He is currently an Associate Professor in computer science with the Faculty of Computing and Informatics, Universiti Malaysia Sabah, Malaysia, that focuses on data science and software engineering programmes. He leads and defines projects around knowledge discovery, information retrieval, and machine learning that focuses on building smarter mechanism that enables knowledge discovery in structured and unstructured data. His work addresses the challenges related to big data problem: How can we create and apply smarter collaborative knowledge discovery and machine learning technologies that bridge the structured and unstructured data mining and cope with the big data problem? He has authored or coauthored more than 150 journals/book chapters and conference papers, editorials, and served on the program and organizing committees of numerous national and international conferences and workshops.
Dr. Rayner is currently a Certified Software Tester (CTFL) from the International Software Testing Qualifications Board (ISTQB) and also a certified IBM DB2 Academic Associate (IBM DB2 AA). He leads the Advanced Machine Intelligence (AMI) research group, UMS, and he has led several projects related to knowledge discovery and machine learning on big data. He was a recipient of the Research Fellow of the Japan Advanced Institute of Science and Technology (JAIST), Japan. He was also a recipient of multiple Gold and Silver awards at national and international research exhibitions in data mining and machine learning-based solutions (face recognition and knowledge discovery), including the International Trade Fair Ideas (iENA 2018), Nuremberg, Germany; the International Invention Innovation Competition (iCAN 2018), Toronto, Canada; and the Seoul International Invention Exhibition (SIIF 2010), Seoul, South Korea. He has secured RM6,931.433.00 worth of project grants. He was also a recipient of the Myron M. Rosenthal Academic Achievement Award for outstanding academic achievement in computer science, in 1994.
JOE HENRY OBIT received the Ph.D. degree in computer science from the School of Computer Science, University of Nottingham. His Ph.D. thesis was developing a novel meta-heuristic, hyper-heuristic, and cooperative search. He is currently an Associate Professor of computer science with the Department of Data Science, Universiti Malaysia Sabah. His main research interests include interface of operational research and computer science. In particular, the exploration and development of innovative operational research, artificial intelligence, and distributed artificial intelligence models and methodologies for automatically producing high quality solutions to a wide range of real world combinatorial optimization and scheduling problems. VOLUME 9, 2021