Internet of Things Technology in Ecological Security Assessment System of Intelligent Land

The ecological security of land is closely tied to sustainable human development. As science advances, the standards for evaluating land ecological security are also changing. Traditional evaluation methods are time-consuming and laborious, with high data-acquisition costs, small data volumes, and large errors in the data themselves. Establishing an intelligent land ecological security evaluation system through the Internet of things helps to build a better understanding of the overall land situation in a region. Taking land data from Cangzhou City, China, as an example, this study uses Internet of things technology to establish an intelligent evaluation and grading model. The random forest algorithm is used to evaluate land ecological security and, combined with the existing data, to predict land ecological security in 2020–2025. Compared with other security evaluation methods, the intelligent land ecological security evaluation system in this study has the advantages of short operation time, low energy consumption, and high accuracy. The method offers strong guidance for future land ecological security management.


I. INTRODUCTION
As the basic environment for human survival, land provides abundant resources, and the quality and quantity of land resources constrain China's ecological security and economic development [1]. However, as society progresses, the population base keeps expanding. At the same time, in pursuit of high economic growth, human beings have developed land resources to varying degrees. Unreasonable development has led to the reduction of green space and to the desertification and salinization of land. A series of land ecological problems, such as soil erosion, reduce the function of the land ecosystem and its ability to handle and resist ecological risks, causing losses to the national economy and affecting the sustainable development of society [2]-[4]. Evaluating the ecological security of land is therefore of great significance.
In this context, Internet of things technology for land ecological security evaluation came into being. The Internet of things is the global infrastructure of the information society, capable of connecting physical and virtual objects to each other to provide advanced services based on existing and evolving interoperable information and communication technologies. The Internet of things has identification, data collection, processing, and communication capabilities, making full use of objects to serve a wide range of applications while ensuring the necessary privacy. The goal is to enable all kinds of objects, physical and virtual, to contact, interact, and communicate with each other through various networks [5]-[7]. The Internet of things contains many types of sensors and RFID devices, which differ in protocol, structure, and performance, and whose data structures also differ [8]. The data transmitted between modules of an Internet of things system include text data and multimedia data such as pictures and video, and may be either static or streaming. This data polymorphism, together with the heterogeneity of the sensing devices, leads to data heterogeneity [9], [10]. Such heterogeneity greatly increases the workload and difficulty of data analysis, data processing, and system development. Currently popular database management modes and systems struggle to handle massive heterogeneous data from multiple sources and face great problems in storage, processing, and analysis [11]. The explosion of information and the emergence of massive data processing force traditional databases to innovate technologically.
The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.
At the application level, massive data processing has become the basis of scientific discovery and research in information science. In terms of technology, massive data processing will revolutionize traditional database technology and is an inevitable trend in the development of the Internet of things and cloud computing. New computing technology is needed to filter and clean the data and to conduct data analysis and mining [12], [13]. Traditional databases have many shortcomings in real-time access and analysis, recovery and backup of massive data, and effective data mining. Massive data storage also faces constraints such as energy consumption and space limitations [14]. The random forest algorithm is a typical classification algorithm, proposed in 2001 by Professor Breiman, a member of the National Academy of Sciences. It has shown good classification and prediction performance in biological, chemical, agricultural, and other fields [15]. Because a random forest randomizes both the samples and the attributes used for each tree, the trees can be built independently, which makes the algorithm well suited to parallelization. In the classification stage of Internet of things data processing, a parallel random forest algorithm based on Hadoop can be used to mine and classify data [16], [17]. At the same time, there are still few studies on the application of the random forest algorithm to land ecological security evaluation and prediction. Using the random forest algorithm to mine the data effectively therefore has high research value for the evaluation of land ecological security.
Against this background, Cangzhou city is taken as the research area to build a land ecological security evaluation index system. A random forest algorithm is designed and applied to the evaluation of land ecological security in order to analyze the current situation of land in Cangzhou city and predict its future trend.

II. LITERATURE REVIEW
A. RESEARCH ON LAND ECOLOGICAL SECURITY EVALUATION
Land is an important part of the overall ecological environment, and many researchers have evaluated and analyzed its ecological security. Reference [18] took 11 prefecture-level cities in Shanxi Province as evaluation units and built a comprehensive evaluation index system based on the PSR model. The entropy weight method and the grey relational evaluation model were integrated to evaluate the comprehensive index, pressure index, state index, and response index of ecological security in the research area from 2005 to 2015, and the spatial and temporal evolution was analyzed. The results showed that, from 2005 to 2015, the comprehensive index of ecological security in the study area showed an overall upward trend, and the security level changed from levels II and III to levels IV and V. The regional ecological environment improved significantly, although the trend slowed down, and the pressure of human activities on the ecological environment was generally small and decreasing [18]. Based on the PSFR model, [19] selected 16 sub-indicators and 34 evaluation indicators to establish a 4-level indicator system. Considering both the unity and the internal differences of the Luanhe river basin, the part of the basin in Hebei province was divided into six assessment units, according to environmental characteristics and functional zoning, for ecological security assessment. The results showed that the ecological security level of the whole basin was between grade III and grade II [19]. Reference [20], based on the PSR model, constructed an evaluation system of land ecological security in Lianyungang city from three aspects: pressure, state, and response. Land ecological security in Lianyungang city from 2000 to 2016 was evaluated and analyzed with the matter-element analysis method. The results showed that the ecological security level of land in Lianyungang city rose from "unsafe" to "relatively safe" during the study period. The overall security level was improving, but the "relatively safe" state was unstable, and the outlook for the ecological security of land was not optimistic [20].

B. APPLICATION OF RANDOM FOREST ALGORITHM
Reference [21] applied the random forest algorithm to seismic reservoir prediction and established a random forest regression model between seismic attributes and porosity parameters. The results showed that the random forest model was robust and tolerant of abnormal data that differed from the sample data. The key requirement in applying the random forest algorithm to seismic reservoir prediction was that the training data contain no noise [21]. Reference [22] established a random forest model using the air quality monitoring network to predict the daily maximum 8-hour average ozone concentration ([O3]MDA8) in China in 2015. Compared with the chemical transport model, which requires a large number of variables and expensive calculation, the random forest model achieved comparable or higher prediction performance from only a few readily available variables, at a much lower calculation cost. The results showed that the national population-weighted annual [O3]MDA8 was estimated to be 84 ± 23 µg/m³, with the highest seasonal average in summer (103 ± 8 µg/m³) [22].
To sum up, the PSR model is widely used in the evaluation of land ecological security and has produced good evaluation results. However, these studies only evaluate local ecological security over a certain period and do not predict future land ecology from the evaluation results. As a data mining algorithm, random forest can process high-dimensional data with many attribute values and, compared with other algorithms, offers high prediction accuracy, low cost, and fast training. However, its application to the evaluation of land ecological security is still insufficient. To fill this gap, the random forest algorithm is used, on the basis of an evaluation and analysis of the current land ecology of Cangzhou city, to predict its future development trend, so as to provide a theoretical reference for the field of land ecology research.
VOLUME 8, 2020

III. LAND AND ECOLOGICAL SECURITY EVALUATION INDEX AND DATA MINING ALGORITHM DESIGN
A. OVERVIEW OF STUDY AREA
Cangzhou is located on the North China Plain, adjoining Beijing and Tianjin to the north and the Bohai Sea to the east. Its geographical coordinates are 37°29′ to 38°57′ N, 115°42′ to 117°50′ E. It covers an area of about 13,420 km² and has a total population of 7.8 million. The region lies on the Huang-Huai-Hai alluvial plain, with flat terrain and little relief, at an altitude between 0 and 17 m. Located in the mid-latitude coastal zone on the east coast of Eurasia, it has a temperate monsoon climate shaped by atmospheric circulation and thermal effects. Rain and heat occur in the same season, so precipitation is strongly concentrated; the annual average temperature is 12.5 °C, and the average annual precipitation is between 500 mm and 600 mm. The Cangzhou water system drains into the Bohai Sea; its rivers are short with low gradients and are subject to spring and summer floods. The city has average annual water resources of 2.85 billion m³, including 1.462 billion m³ of surface runoff and 1.369 billion m³ of groundwater. However, per capita water resources are only 385 m³, less than 1/6 of the national average, which makes Cangzhou a typical water-short city. With the overlap of important strategic opportunities such as the coordinated development of the Beijing-Tianjin-Hebei region, the "One Belt and One Road" initiative, and cooperative development in the Bohai Rim region, Cangzhou's comprehensive advantages in coastline, location, transportation, and industry have supported healthy and rapid economic development. Its strategic position in Hebei province has risen significantly, and it has gradually become an important open coastal city of the Bohai Rim region.
However, with the advance of regional urbanization and industrialization in recent years, the city's population has increased year by year, water areas and wetlands have been expropriated, and the large-scale industrial base remains backward, all of which have damaged the ecological environment of the land in the region. Taking Cangzhou as the research area, this research aims to clarify the spatial and temporal distribution characteristics of regional land and comprehensively evaluate its status, so as to provide basic information for regional land management and economic structure adjustment.

B. EVALUATION INDEX OF LAND ECOLOGICAL SECURITY
An index system can fully reflect regional ecological security only if it is based on the structural complexity of the ecosystem. To better analyze the impact of human activities on ecology, a pressure-state-response (PSR) model based on the connotation of ecological security was established, so that the problems of land ecological security and the relationship between economic and social development and resources, environment, and ecological security could be analyzed directly and comprehensively. The PSR model comprises three components: ecological pressure, ecological state, and ecological response. The data obtained for the three components determine the evaluation indices. According to the PSR conceptual framework, the land ecological security evaluation index system of Cangzhou city is constructed from the aspects of population, economy, resources, and environment, as shown in table 1.
The evaluation standard of land ecological security is the basis of quantitative and graded evaluation. At present, there is no unified understanding at home and abroad. Based on the research experience of relevant scholars [23], the evaluation standard of land ecological security is divided into five levels: safe, relatively safe, critical safe, less safe and unsafe, and the range of the corresponding standard magnitude is given, as shown in table 2.

C. DATA MINING ALGORITHM DESIGN OF LAND ECOLOGICAL SECURITY EVALUATION
The data processing flow of the Internet of things for land ecological security evaluation is shown in Figure 1. The Hadoop distributed framework is adopted for distributed data processing. First, the collected data are loaded and converted into a uniform format. The distributed file system HDFS is used to store the large volume of data. The Hive distributed data warehouse and the MapReduce framework are adopted to clean and filter the data as required. Redundant information is deleted, and finally the stored data are mined to provide efficient and predictable services for land ecological security evaluation.
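The load, convert, clean, and filter stages described above can be illustrated with a small single-process sketch. It is not the Hadoop, Hive, or MapReduce implementation used in the study; the record layout (`sensor_id`, `timestamp`, `value`), the function names, and the valid range are all assumptions made for illustration.

```python
def to_uniform(record):
    """Convert one raw reading (hypothetical dict layout) to a uniform tuple."""
    return (
        str(record["sensor_id"]).strip(),
        int(record["timestamp"]),
        float(record["value"]),
    )


def clean_and_filter(records, low, high):
    """Drop malformed or out-of-range readings and remove exact duplicates."""
    seen = {}  # dict keys keep insertion order, so output order is stable
    for raw in records:
        try:
            row = to_uniform(raw)
        except (KeyError, TypeError, ValueError):
            continue  # malformed record: skip it
        if not (low <= row[2] <= high):
            continue  # outside the assumed valid range
        seen.setdefault(row, None)  # redundant copies collapse to one key
    return list(seen)


raw_readings = [
    {"sensor_id": " s1 ", "timestamp": "100", "value": "0.42"},
    {"sensor_id": "s1", "timestamp": 100, "value": 0.42},   # duplicate once normalized
    {"sensor_id": "s2", "timestamp": 101, "value": "n/a"},  # malformed value
    {"sensor_id": "s3", "timestamp": 102, "value": 9.9},    # out of range
]
cleaned = clean_and_filter(raw_readings, 0.0, 1.0)
```

In a distributed setting, the conversion would typically run in the map phase and the de-duplication in the reduce phase, keyed on the normalized record.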
In the Internet of things for land ecological security evaluation, the data to be mined are huge and grow exponentially. Scalable data mining algorithms are therefore usually used to process large-scale data and improve the operating speed of the system. The MapReduce framework on the Hadoop platform was used to process and analyze the data. Each existing data mining algorithm was examined, according to its characteristics, to determine whether it could be expressed in MapReduce form and thus programmed in a distributed way to improve computing efficiency. The decision tree is an important concept in random forest: the unpruned decision tree is the base classifier, and hence the basic unit, of a random forest. Many decision trees together constitute the random forest, and because the trees are generated in a random way, they are called random decision trees. In a random forest, the random decision trees are not connected to one another. When test data are fed into the random forest, each random decision tree classifies them, and the output produced by the most trees is taken as the final result.
A decision tree is a single classifier, and its construction can be summarized in three steps. First, the samples are analyzed recursively to obtain a tree structure that grows downward from the root, like an inverted tree. Second, the paths through each part of the decision tree are analyzed to obtain a series of rules. Third, the rules are used to classify data and samples and to make related predictions. The core idea of decision tree classification is to generate classification rules through path analysis and to follow those rules when mining samples and related data.
Because a decision tree classifies and decides with a single classifier, decision tree classification has the following problems.
I. Cumbersome classification rules. Because a locally greedy method is used, the decision tree relies on only one attribute at a time when establishing rules. The resulting classification rules may therefore be very complicated, so the decision tree must be optimized, usually by pruning, to simplify it.
II. The result may be a locally optimal solution. Some decision tree algorithms lack a process of multiple selection and comparative analysis during the search. Such decision trees may therefore reach a locally optimal solution rather than an overall optimal one.
III. Overfitting. In the classification and decision-making process of a decision tree, given the variety of classifiers and rules, overfitting may occur when noise has an excessive influence on the learning process.
To make up for the shortcomings of the decision tree, the random forest algorithm was proposed. Random forest (RF) establishes a forest in a random way. The forest is composed of a large number of decision trees, and there is no correlation between the individual trees. Once the forest has been established, each new input sample is passed to every decision tree built in the previous stage, and each tree determines a category for the sample. The number of votes for each category is counted, and the category with the most votes is the predicted category.
Random forest is an ensemble algorithm that combines N classification and regression trees {p(X, Θ_k), k = 1, 2, ..., N}. Based on random subspace theory and bootstrap aggregation (bagging), the random vectors {Θ_k, k = 1, 2, ..., N} are selected randomly and independently. Let X and Y be the input and output vectors drawn from the random vector (X, Y). For the output p(X) of a prediction sample, the mean squared generalization error is:
$$E_{X,Y}\left(Y - p(X)\right)^2$$
The random forest output is the mean of the N regression trees {p(X, Θ_k), k = 1, 2, ..., N}. When N → ∞, then:
$$E_{X,Y}\left(Y - \frac{1}{N}\sum_{k=1}^{N} p(X, \Theta_k)\right)^2 \rightarrow E_{X,Y}\left(Y - E_{\Theta}\, p(X, \Theta)\right)^2$$
Its generalization error (RE) is:
$$RE_{forest} = E_{X,Y}\left(Y - E_{\Theta}\, p(X, \Theta)\right)^2$$
For all single trees, the average generalization error is:
$$RE_{tree} = E_{\Theta} E_{X,Y}\left(Y - p(X, \Theta)\right)^2$$
In these equations, Y − p(X, Θ) is the residual of a single tree, and the individual trees are relatively independent.
The attribute used to split each node of a decision tree is chosen from a randomly selected subset of attributes. For a sample to be tested, the random forest lets each tree built by bootstrap aggregation cast a vote, and the category with the most votes is the output result, namely:
$$P(x) = \arg\max_{Y} \sum_{i=1}^{N} I\left(p_i(x) = Y\right)$$
In the equation, P(x) is the result of the random forest combination model, p_i is the single tree classification model, Y is the output category, and I is the indicator function.
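The voting rule can be illustrated with a minimal sketch in which each single-tree model is represented by a plain Python function returning a category. The three toy trees and the function name `forest_predict` are assumptions for illustration only, not the trained models of this study.

```python
from collections import Counter


def forest_predict(trees, x):
    """Return the category that receives the most votes from the single trees."""
    votes = Counter(tree(x) for tree in trees)  # one vote per tree
    return votes.most_common(1)[0][0]


# Three toy single-tree classifiers (hypothetical decision rules).
trees = [
    lambda x: 1 if x[0] < 0.5 else 2,
    lambda x: 1 if x[1] < 0.5 else 2,
    lambda x: 2,
]

first = forest_predict(trees, (0.2, 0.3))   # votes: 1, 1, 2
second = forest_predict(trees, (0.9, 0.3))  # votes: 2, 1, 2
```

Counting votes with `Counter` plays the role of the sum of indicator functions, and `most_common(1)` plays the role of the arg max over categories.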
In the classification stage of Internet of things data processing for land ecological security assessment, the Hadoop-based random forest algorithm can be used to classify and process the data; the algorithm flow is shown in Figure 2. In this algorithm, the input data are preprocessed, the decision trees are established and stored in HDFS, the data are aggregated, and the prediction results are output. Compared with other algorithms, the Hadoop-based random forest algorithm can process data with more attribute values, and the relatively important attributes can be obtained from training without pruning each tree.

1) DATA PRE-PROCESSING
From the input data sample, two objects are generated: Data, which records every data point, and Dataset, which records the data format. The Dataset structure is composed of five fields: ignored represents the attributes ignored by the classification, values represents the specific values of the categorical attributes of the sample, nbinstance represents the number of records in the input sample, attribute represents the saved data type of each sample attribute, and labelId represents the saved value that identifies the label attribute in the sample. The process of data preprocessing is shown in Figure 3.
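As a rough illustration of such a description structure, the sketch below models the five fields as a Python dataclass and fills them from a few rows. The class name, the snake_case field names, the type strings, and the reading of the label field as a column index are all assumptions; the sketch does not reproduce the actual classes used on the Hadoop platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetDescription:
    attribute: List[str]     # saved data type of each column (assumed type strings)
    ignored: List[int]       # column indices skipped by the classifier
    values: List[List[str]]  # distinct values of each categorical column
    label_id: int            # index of the label column (assumed meaning)
    nb_instance: int         # number of input records


def describe(rows, attribute, label_id, ignored=()):
    """Build a description from in-memory rows (illustrative helper)."""
    values = []
    for col, kind in enumerate(attribute):
        if kind == "categorical" and col not in ignored:
            values.append(sorted({row[col] for row in rows}))
        else:
            values.append([])  # numerical or ignored column
    return DatasetDescription(list(attribute), list(ignored), values,
                              label_id, len(rows))


rows = [(0.31, "loam", "II"), (0.72, "sand", "IV"), (0.55, "loam", "II")]
info = describe(rows, ["numerical", "categorical", "categorical"], label_id=2)
```

Keeping the format description separate from the data points lets every worker parse its own shard consistently from one small shared file.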

2) BUILDING RANDOM FOREST
Step 1: A DecisionTreeBuilder object is created and the corresponding parameters, such as the input path, are set. The number of attributes m and the parallel configuration parameter conf are selected, and the description file generated in the previous stage is read through the distributed cache to parse the input sample file.
Step 2: Creation of a random forest. This step mainly uses the MapReduce programming model to transform the data of each shard in the map function and to build a tree in the clean function. The relevant data structures and the procedure for establishing the distributed decision trees are shown in Figure 4:
A: The description file of the training data and the parameters, such as the number of trees to build and the number of randomly selected attributes, are read, the parallel parameters are configured, and a new job is created.
B: The map function transforms the input samples of its data shard into instances and constructs the Data format for classification processing.
C: N pieces of data of equal size are randomly drawn from the input samples with replacement.
D: isIdentical(data) is used to determine whether these randomly selected data have the same attribute values. If they do, a leaf node is output.
E: The category is judged. The data.identicallabel() method is used to check whether all data under the branch have the same category value; if so, the data belong to the same category and the branch is judged to be a leaf node.
F: m attributes are randomly selected (attributes = randomAttributes), and the information gain of each of these m attributes is calculated and recorded to determine the split attribute.
G: The value range of the attribute selected in step F is obtained, and the data are divided into several parts according to that attribute.
Step 3: The random forest formed by the random trees created in the steps above is written to HDFS and saved.
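Sub-steps C to G can be sketched in a single process as follows. This is a minimal illustration assuming categorical attributes and in-memory data; it does not use MapReduce or HDFS, all function names are hypothetical, and choosing the attribute with the largest information gain is the usual convention rather than something the text states explicitly.

```python
import math
import random
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on categorical attribute attr."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, m, rng):
    if all(row == rows[0] for row in rows):  # step D: identical attribute values
        return majority(labels)
    if len(set(labels)) == 1:                # step E: identical category
        return labels[0]
    attributes = rng.sample(range(len(rows[0])), m)  # step F: m random attributes
    gains = {a: information_gain(rows, labels, a) for a in attributes}
    best = max(gains, key=gains.get)
    if gains[best] <= 1e-12:                 # no useful split: stop with a leaf
        return majority(labels)
    children = {}
    for value in {row[best] for row in rows}:  # step G: divide by attribute value
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree([r for r, _ in part],
                                     [l for _, l in part], m, rng)
    return (best, children)


def build_forest(rows, labels, n_trees, m, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step C: bootstrap sample of equal size, drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], m, rng))
    return forest


rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 2, 2]
tree = build_tree(rows, labels, 2, random.Random(0))
forest = build_forest(rows, labels, n_trees=5, m=2, seed=1)
```

Because each tree depends only on its own bootstrap sample and random attribute choices, the loop in `build_forest` is the part that can be distributed across workers.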

3) TEST PREDICTION CLASSIFICATION
Step 1: The parameters are configured, and the input file and algorithm path are defined.
Step 2: The MapReduce framework is adopted for prediction. The test data are first transformed; then all trees are traversed, and within each decision tree the path to the corresponding leaf node is followed to obtain its classification result.
Step 3: The corresponding results are output and the accuracy is calculated.

D. EXPERIMENTAL ENVIRONMENT
In this study, a Hadoop distributed cluster composed of four machines is used as the data processing platform for testing the Internet of things. One node serves as the Master node of HDFS and MapReduce, namely the NameNode and JobTracker node, which is mainly responsible for metadata management and task scheduling of the distributed data processing platform. The other nodes are Slave working nodes, namely DataNode and TaskTracker nodes, which are mainly responsible for data storage and the specific distributed computing processes.

B. TEST RESULTS OF RANDOM FOREST ALGORITHM
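To study the performance of the random forest algorithm, accuracy, average recall, number of iterations, and running time were used for analysis and comparison. Here, accuracy is the ratio of the number of correctly classified data to all data, and average recall is the average, over all categories, of the ratio of the number of data correctly assigned to a category to the number of data objects in that category. The formulas for accuracy and average recall are as follows:
$$\text{Accuracy} = \frac{\text{number of correctly classified data}}{\text{number of all data}}$$
$$\text{Average recall} = \frac{1}{K}\sum_{k=1}^{K} \frac{\text{number of data correctly assigned to category } k}{\text{number of data in category } k}$$
where K is the number of categories.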
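The two measures can be computed directly from the true and predicted categories. The sketch below follows the verbal definitions given above, reading average recall as the unweighted mean of the per-category recalls; the function names and the toy labels are assumptions for illustration.

```python
def accuracy(y_true, y_pred):
    """Share of all data whose predicted category equals the true category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_recall(y_true, y_pred):
    """Unweighted mean, over categories, of the per-category recall."""
    recalls = []
    for category in sorted(set(y_true)):
        total = sum(t == category for t in y_true)
        hits = sum(t == category and p == category
                   for t, p in zip(y_true, y_pred))
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
acc = accuracy(y_true, y_pred)        # 4 of 6 correct
rec = average_recall(y_true, y_pred)  # mean of 1/2, 2/2, 1/2
```

When the categories are equally sized, as in this toy example, the two measures coincide; they diverge when some categories are much rarer than others.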
The sample data are divided into training samples and detection samples to test the algorithm. The two sample sets are 1 GB in size, with a sample size of 10 million. Using the same training and detection samples, the random forest algorithm and the decision tree algorithm are compared with SVR [24] and the generalized neural network algorithm [25]. SVR maps the data to a higher-dimensional space through a nonlinear mapping, where linear regression can be carried out. Finding the optimal linear regression hyperplane reduces to solving a convex programming problem with convex constraints, which yields a globally optimal solution. However, SVR has shortcomings such as slow training speed, large training error, and poor generalization ability [26].
It can be concluded from the data in table 3 that the random forest algorithm has the shortest running time, the highest accuracy, the highest average recall rate, and the best performance.
Since the data used in this study form a time series, the performance of the random forest algorithm is also compared with the recurrent neural network (RNN) model and the long short-term memory (LSTM) network model. The results are shown in table 4. The accuracy and average recall of LSTM are higher than those of the random forest algorithm, but the difference between the two is relatively small, while the accuracy of RNN is low and its overall performance is relatively poor. Although the accuracy and average recall of the random forest algorithm are slightly lower than those of LSTM, its time consumption is the shortest, which greatly reduces the running time [27].

C. EVALUATION RESULTS OF LAND SECURITY
The land ecological security level is jointly affected by the index factors in all dimensions of PSR. The key to applying the random forest algorithm to evaluate ecological security is to construct the membership rules between the ecological security level and each single index in the indicator system according to the evaluation classification standard. The specific evaluation process is as follows:
Step 1: Each index was randomly interpolated into 200 groups of sample data within the interval of each classification standard, so the total sample size over the 5 evaluation grades was 1000 groups.
Step 2: sample settings. 100 groups of data were randomly selected from each of the 5 graded sample groups, so that 500 groups of data were set as training samples and the remaining 500 groups as test samples. The index factors in the samples are the input vectors, and the evaluation results I, II, III, IV, and V corresponding to the classification standard are represented by the numbers 1, 2, 3, 4, and 5, respectively, and taken as the output vectors; the random forest is then trained.
Step 3: parameter optimization and model optimization. In the process of model training, the sensitive parameters mtry and ntree can be selected according to the size of the out-of-bag (OOB) error. In the random forest algorithm, mtry is the number of candidate split attributes, usually set to the square root of the number of variables. In this study, the total number of variables is 20, whose square root (about 4.47) rounds to 4 or 5. Multiple experiments verified that the evaluation result is better when mtry is 5. ntree is the number of trees in the random forest, which affects the calculation speed and accuracy of the model. As shown in Figure 5, when its value is greater than 500, the OOB error is small and stable.
Step 4: indicator measurement. The random forest algorithm can determine the importance of an index from the change in OOB error after an index factor is added or removed. Because a sufficient number of samples are generated randomly within the classification intervals, without loss of generality and free of noise pollution, the measurement of the importance of the ecological security evaluation indices is more objective [28].
Step 5: model evaluation. The performance of the random forest was evaluated by comparing the output values generated after model training with the actual values, as shown in table 5. It can be concluded that the Re of the training samples and the detection samples is close to 1, and the MSE and RMSE are close to 0. The model has high fitting accuracy and good generalization ability and can be applied to the ecological security evaluation of the target region.
Step 6: threshold setting and model application. According to the critical value of each index classification in table 2, random forest algorithm is used for simulation calculation, and its simulation value is taken as the standard threshold of different classification. The thresholds are shown in table 6.
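Parts of Steps 1, 2, 3, and 5 can be sketched with the Python standard library. The grade intervals below are hypothetical values on a normalized 0 to 1 scale, applied identically to all 20 indices for simplicity; they are not the values of table 2, and the function names are assumptions. The sketch covers sample generation, the per-grade split, the mtry candidates, and the MSE and RMSE measures, but not the training itself.

```python
import math
import random

# Hypothetical grade intervals; the paper's table 2 values are not reproduced.
GRADE_INTERVALS = {1: (0.8, 1.0), 2: (0.6, 0.8), 3: (0.4, 0.6),
                   4: (0.2, 0.4), 5: (0.0, 0.2)}


def make_samples(intervals, n_indices, per_grade, rng):
    """Step 1: draw each index uniformly inside the interval of its grade."""
    samples = []
    for grade, (low, high) in intervals.items():
        for _ in range(per_grade):
            features = [rng.uniform(low, high) for _ in range(n_indices)]
            samples.append((features, grade))
    return samples


def split_by_grade(samples, train_per_grade, rng):
    """Step 2: pick the same number of training groups from every grade."""
    train, test = [], []
    for grade in sorted({g for _, g in samples}):
        group = [s for s in samples if s[1] == grade]
        rng.shuffle(group)
        train.extend(group[:train_per_grade])
        test.extend(group[train_per_grade:])
    return train, test


def mtry_candidates(n_variables):
    """Step 3: integers on either side of the square root of the variable count."""
    root = math.sqrt(n_variables)
    return sorted({math.floor(root), math.ceil(root)})


def mse(actual, predicted):
    """Step 5: mean squared error between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


rng = random.Random(42)
samples = make_samples(GRADE_INTERVALS, n_indices=20, per_grade=200, rng=rng)
train, test = split_by_grade(samples, train_per_grade=100, rng=rng)
candidates = mtry_candidates(20)
```

With 200 groups per grade and 100 training groups per grade, this reproduces the 1000, 500, and 500 sample counts stated in Steps 1 and 2, and the two mtry candidates for 20 variables.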

D. LAND SAFETY FORECAST RESULTS
Ecological security is restricted by many different factors. It appears random because it changes within a certain range of space and time. Therefore, to predict ecological security, its underlying rules must be obtained first. According to the random forest regression principle, the land ecological pressure, state, and response indices and the ecological security index of Cangzhou city from 1996 to 2016 were set as training samples to predict the security indices of land ecological pressure, state, and response of Cangzhou city from 2020 to 2030. As shown in Figure 6, the single prediction step size was set to 1. A program written in Matlab2016b was used to implement the simulation and prediction. After many experiments, the embedding dimension was set to 10 (that is, the indices of the preceding 10 years were used to predict the index of the next year), and the optimal parameters of each prediction model were selected. According to the calculation, the precision of the models is within the acceptable range, so the land ecological security of Cangzhou city from 2020 to 2025 is predicted.
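The embedding described above, in which 10 consecutive yearly values form the input and the following year's value forms the target, can be sketched as a sliding window. The function name `embed` is hypothetical, and the years themselves stand in for the index values, which are not reproduced here; the study's own implementation was written in Matlab.

```python
def embed(series, dimension=10, step=1):
    """Turn a yearly series into (input window, target) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - dimension - step + 1):
        inputs.append(series[start:start + dimension])
        targets.append(series[start + dimension + step - 1])
    return inputs, targets


# Placeholder series: the years 1996 to 2016 stand in for yearly index values.
series = list(range(1996, 2017))
inputs, targets = embed(series, dimension=10, step=1)
```

With 21 yearly values, an embedding dimension of 10, and a step of 1, this yields 11 training pairs; a multi-year forecast can then be produced by feeding each prediction back in as the newest window entry.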
Applying the evaluation method described above yielded data on various aspects of land ecological security. The main study periods for land pressure in Cangzhou city were 1996 to 2016 and 2020 to 2030, as shown in Figure 7. It can be concluded that the response, state, and pressure data over the study period ranged from 1 to 4, were unstable, and evolved from a safer level to a less safe level. This instability arises because Cangzhou city has developed rapidly since the beginning of the new century: a large amount of land has been expropriated, the land ecological environment has deteriorated, and forest coverage and agricultural output per unit area are low.
According to the prediction results, the pressure index tends to decrease over the next few years, and as a result land ecological security will also face great pressure. The area of farmland returned to forest will increase, and the area of saline-alkali land will decrease.

V. CONCLUSION
In this research, Cangzhou city is taken as the research area, a land ecological security evaluation index system is constructed, and a random forest algorithm is designed. The algorithm is compared with the decision tree algorithm in terms of running time, accuracy, and average recall, and is applied to land ecological security evaluation. The present situation of land in Cangzhou city is analyzed and its future trend is predicted. Compared with the decision tree algorithm, the SVM regression prediction algorithm, and the generalized neural network algorithm, the random forest algorithm has a short running time and high accuracy and average recall. Moreover, the algorithm has high fitting accuracy and good generalization ability and can be applied to the ecological security evaluation of the target region. The land ecology of Cangzhou city was unstable from 1996 to 2016. The prediction of the land ecology of Cangzhou city over the next 10 years using this algorithm shows that the pressure index tends to decrease, the area of farmland returned to forest will increase, and the area of saline-alkali land will decrease. The algorithm has a simple model, a fast learning rate, and high tolerance for dimensionality and data noise, and can exclude the interaction between internal dimensions. The out-of-bag error is used to evaluate the model. Its shortcoming is that it is not sensitive to attributes with many values, which can be considered in future research.