Knowledge Discovery of Global Landslides Using Automated Machine Learning Algorithms

Understanding the complex dynamics of global landslides is essential for disaster planners to make timely and effective decisions that save lives and reduce the economic impacts on society. Using NASA’s inventory of global landslide data, we developed a new machine learning (ML)–based system for town planners, disaster recovery strategists, and landslide researchers. Our system revealed hidden knowledge about a range of complex scenarios created from five landslide feature attributes. Users of our system can select from a list of $1.295\times {10}^{64}$ possible global landslide scenarios to discover valuable knowledge and predictions about the selected scenario in an interactive manner. Three ML algorithms—anomaly detection, decomposition analysis, and automated regression analysis—are used to elicit detailed knowledge about 25 scenarios selected from 14,532 global landslide records covering 12,220 injuries and 63,573 fatalities across 157 countries. Anomaly detection, logistic regression, and decomposition analysis performed well for all scenarios under study, with the area under the curve averaging 0.951, 0.911, and 0.896, respectively. Moreover, the prediction accuracy of linear regression had a mean absolute percentage error of 0.255. To the best of our knowledge, our scenario-based ML knowledge discovery system is the first of its kind to provide a comprehensive understanding of global landslide data.


I. INTRODUCTION
Landslides are natural events that have adverse effects on human life, infrastructure, the economy, and society [1]. To reduce the negative effects of landslides and increase the level of disaster preparedness, in-depth research on global landslides is essential [2].
In the past, strategic decision-makers needed a data scientist to prepare the data, develop machine learning (ML) models, and summarize the results. Depending on the complexity of the problem, the data scientist may need an information technology administrator to run high-performance computing on infrastructure capable of handling the ML load [3], enabling the data scientist to execute the ML model and manually summarize the results for the strategic decision-maker. This task delegation process can undergo several iterations until the required model satisfies the needs of the strategic decision-maker (see Fig. 1). The delays caused by this task delegation may become critical if additional roles (e.g., business analysts, data engineers, artificial intelligence (AI) engineers, statisticians, and database administrators) are introduced.
The associate editor coordinating the review of this manuscript and approving it for publication was Zijian Zhang.
As shown in Fig. 1, the proposed system eliminates the delays associated with task delegation. Users of the proposed system can access secure cloud-based solutions for specific scenarios using a range of mobile, tablet, and other web-enabled devices [5]. Because the system uses recently developed natural language processing algorithms, the user can select a landslide scenario and obtain instant insights in plain English text [6]. The fully automated summarized insights produced by the proposed system can support evidence-based decision-making to save lives, protect infrastructure, and reduce economic impacts on society.
In its current version, the system is connected to the National Aeronautics and Space Administration (NASA) landslide database, which contains 14,532 records of global landslides covering 12,220 injuries and 63,573 fatalities in 157 countries. From these data, $1.295\times {10}^{64}$ possible scenarios can be constructed (see Section III). From this vast number of scenarios, 25 were randomly selected to demonstrate the applicability of the system using laptops or web-enabled mobile devices, with a clustering accuracy (area under the curve (AUC)) of up to 0.951 and a prediction error (mean absolute percentage error (MAPE)) below 0.10 in the best cases.
FIGURE 1. Evidence-based decision-making allows town planners and strategic decision-makers to implement effective landslide policies that can save lives, protect infrastructure, and reduce economic impacts on society.
Based on the existing literature [7]-[15], the proposed system is the first to utilize clustering algorithms such as automated anomaly detection (AUC up to 0.951) and decomposition tree analysis (AUC up to 0.896) in the landslide domain.
Existing landslide research using ML is based on the use of R, Python, and other statistical programming languages that do not support the use of ML programs in the mobile environment. However, to make the proposed solution available to strategic decision-makers via mobile, tablet, and other web-enabled devices, the proposed system was coded using Microsoft .NET and Azure. The Microsoft documentation in [16] provides details of the ML algorithms supported in the Microsoft ecosystem.
Gaps in the literature on the use of ML algorithms in landslide research include the following:
1. Researchers require a complex understanding of ML models [14].
2. ML is not being utilized to facilitate strategic decision-making about disaster preparedness or risk management [15].
3. The results obtained from existing ML-based landslide research were not expanded in a natural language [14].
4. The potential for ML models to analyze the root cause of landslides is not being harnessed [17].
5. ML models and data must be manually handled, making knowledge discovery a time-consuming and labor-intensive task [7]-[14].
6. Existing implementations of ML algorithms are not suitable for use in integrated cloud-based applications or web-enabled mobile devices.
Table 1 shows the algorithms used in the literature and whether they may be implemented in a .NET-based cloud environment.
We have previously reported on the use of ML to solve problems ranging from abnormality detection [18]-[20] to person identification [21]. Here, ML is used in knowledge discovery and analysis of the root cause of global landslides.

III. METHODOLOGY
The primary motivation for designing a new ML-based knowledge discovery solution arose from the deficiencies in existing ML-based landslide research. The proposed solution has the following features:
1. System users (i.e., strategic decision-makers) do not need a deep understanding of ML models. Using natural language processing [6], the information is translated into a language that the strategic decision-maker can understand.
2. Multiple interactive interfaces facilitate strategic decision-making on disaster preparedness and risk management using multiple ML algorithms.
3. The insights obtained from one ML algorithm (e.g., regression) may be expanded to other algorithms (e.g., anomaly detection, decomposition tree analysis).
4. Decomposition tree analysis is used to analyze the root causes of landslides. This is the first time that decomposition tree analysis has been used in landslide research.
5. The solution is fully automated, and the appropriate ML algorithm is automatically executed on the correct set of data without the need for manual intervention from data scientists, data engineers, statisticians, or database programmers, minimizing delays.
6. The solution is programmed using .NET, facilitating its use in Microsoft Office 365 and Azure [5], [6], [16], [22]. This will allow strategic decision-makers to access the solution via laptops and mobile devices.
To develop the proposed solution, data were obtained from NASA's global landslide inventory [23] before being cleaned and transformed prior to modeling. Data modeling was then performed using best practice [24]. Finally, the data were visualized and analyzed using ML algorithms (see Section III-D). Fig. 2 shows the step-by-step process used to generate AI insights into global landslide data.
FIGURE 2. Methodology for knowledge discovery of global landslides using automated machine learning.
VOLUME 9, 2021
A. OBTAIN AND PREPARE DATA
Data may be accessed from a range of sources, including online databases, websites, Excel files, flat files, web-based application programming interfaces, and even PDF files. After identifying the data source, data integration tools (e.g., SQL Server Integration Services or Power BI Query Editor) may be used to facilitate the extraction, transformation, and loading of data from the source into a data warehouse. Data transformation and cleaning, also known as data preparation, convert the data into the correct format for modeling or analysis using ML. For this research, we obtained data from an online source in a comma-separated values format [23]. The data were then transformed into a suitable format, allowing a more rapid analysis and better understanding of the feature attributes of global landslides. Fig. 3 shows the categorization of the data fields, while Table 2 presents the detailed statistics of the landslide data. Understanding the statistics of landslide feature attributes is crucial before proceeding to the next step, namely data modeling, visualization, and analysis using ML.

B. MODEL DATA
Data modeling is the most important stage in the process of knowledge discovery using ML. With effective data modeling, ML-driven solutions can produce powerful insights with minimal delays. During this phase, the relationships among different sets of data are drawn with the correct cardinality. Fig. 4 shows that the data obtained in this study were arranged in a star schema [24], with the main factual data (fatality and injury counts) in the center, surrounded by the following dimensions: category, size, setting, trigger, and country. This arrangement enabled the analysis of the main facts by category, size, setting, trigger, and country using one-way filtering. The benefit of the star schema over other data modeling techniques (e.g., flattened tables, snowflakes) is that it provides faster and more accurate results [24].
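The one-way filtering described above can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the rows, the field names (`size`, `country`, `fatalities`, `injuries`), and the helpers `filter_facts` and `totals` are hypothetical and are not taken from the NASA inventory or from the Power BI data model used in this study.

```python
# Hypothetical fact rows; each row carries its dimension values.
# The numbers are illustrative only and are not taken from the NASA inventory.
FACTS = [
    {"size": "small", "country": "Italy", "fatalities": 1, "injuries": 0},
    {"size": "medium", "country": "Italy", "fatalities": 4, "injuries": 2},
    {"size": "large", "country": "Ecuador", "fatalities": 3, "injuries": 5},
]


def filter_facts(facts, **selected):
    """Keep the fact rows whose dimension values fall in every selected set."""
    return [
        row for row in facts
        if all(row[dim] in values for dim, values in selected.items())
    ]


def totals(facts):
    """Aggregate the two measures held in the fact table."""
    return {
        "fatalities": sum(row["fatalities"] for row in facts),
        "injuries": sum(row["injuries"] for row in facts),
    }


# Selecting a value in one dimension narrows the facts, never the reverse.
italy_only = totals(filter_facts(FACTS, country={"Italy"}))
```

Each keyword argument plays the role of a slicer on one dimension; the fact rows are filtered by the dimensions, which mirrors the one-way direction of the star schema.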
In our system, we created the scenario (S) using five dimensional features: category (G), size (I), setting (E), trigger (T), and country (C). Therefore, for each attribute $A \in \{G, I, E, T, C\}$, the number of possible selections is $(2^{|A|} - 1)$, which is the formula used to calculate the size of the power set of the attribute minus 1 (i.e., $|P(A)| - 1$), where $|A|$ is the cardinality of $A$. One is subtracted because the power set also includes the empty set, and the selection of an empty set is not supported by the proposed system.
Hence, the total number of possible scenarios for our global landslide data can be calculated as:

$$|S| = \left(2^{|G|} - 1\right)\left(2^{|I|} - 1\right)\left(2^{|E|} - 1\right)\left(2^{|T|} - 1\right)\left(2^{|C|} - 1\right) \approx 1.296\times {10}^{64} \quad (7)$$

The purpose of this study is not to produce an exhaustive list of global landslide data covering all $1.296\times {10}^{64}$ scenarios. Rather, we demonstrate the ability to dynamically discover knowledge based on automated ML algorithms for any possible scenario out of the $1.296\times {10}^{64}$ scenarios.
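The counting argument (one non-empty subset per attribute, multiplied across attributes) can be checked with a few lines of Python. This is a sketch under stated assumptions: the function name `scenario_count` and the toy cardinalities are hypothetical and chosen only so that the arithmetic is easy to verify by hand; they are not the attribute cardinalities of the NASA dataset.

```python
from math import prod


def scenario_count(cardinalities):
    """Number of scenarios when each attribute contributes a non-empty subset."""
    return prod(2 ** size - 1 for size in cardinalities.values())


# Hypothetical toy cardinalities, chosen only to keep the arithmetic checkable.
toy = {"category": 2, "size": 3, "setting": 1, "trigger": 2, "country": 2}
toy_total = scenario_count(toy)  # 3 * 7 * 1 * 3 * 3
```

Because each factor grows as a power of two, even modest attribute cardinalities produce a very large scenario space, which is why exhaustive enumeration is not attempted.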

C. VISUALIZE DATA
Once the data modeling was complete, we used category, size, setting, trigger, and country to filter the factual data to drive the ML-based knowledge discovery. A wide range of visualizations, including slicers, Bing Maps, key influencers, decomposition analysis, and anomaly detection on a line chart, were used in the dashboards. Changing the value of each filter (e.g., landslide size to small, medium, or large) filtered the fact table containing the number of injuries and fatalities, in turn changing the key influencers, anomaly detection, or decomposition analysis. Table 3 shows how a change in a filter such as landslide size affects the number of injuries and fatalities; for example, the total number of fatalities caused by medium-sized landslides was higher than that caused by large landslides. Table 3 was generated using the exploration dashboard of our ML-based knowledge discovery system (see Fig. 6). Fig. 6 shows that from 1915 to 2021, approximately 14,532 global landslides caused 12,220 injuries and 63,573 fatalities.
The dashboards also include anomaly detection to identify anomalies in the number of total casualties, fatalities, injuries, and countries by year (see Fig. 8) and decomposition analysis to identify root causes and explore data (see Fig. 9). The dashboards are publicly available and hosted in Microsoft Cloud [5] (for data exploration purposes without ML features only). To utilize the fully functional version of ML-based knowledge discovery, a user can download our solution from GitHub [22], which provides regression analysis (Fig. 7), anomaly detection (Fig. 8), and decomposition analysis (Fig. 9).

D. ANALYZE DATA USING ML
For this study, we conducted an extensive analysis of NASA's global landslide data, comprising 14,532 records [23]. We used the ML algorithms in ML.NET [16], including regression analysis [25]-[27], anomaly detection [28], [29], and decomposition analysis [30], to analyze the number of total casualties, fatalities, and injuries according to the following feature attributes: landslide category, size, setting, trigger, and country. As shown in Fig. 5, before executing any ML algorithm, the data must be prepared and cleaned for faster ML operations. To achieve this, a series of data transformations is undertaken [31]. Once the data transformation is complete, depending on the type of analysis and intent, the proposed solution will perform regression analysis [16], anomaly detection [28], [29], or decomposition tree analysis [30]. For regression analysis, if the data are numerical, then linear regression [25] is performed, and if the data are categorical, then logistic regression [26], [27] is performed.

1) TRANSFORMATION
Transformation was undertaken to prepare the global landslide data for regression analysis, anomaly detection, and decomposition analysis. During the transformation, the following three algorithms in Microsoft.ML.Transforms were executed:
1. The OneHotEncoding function converts categorical data into numerical values for efficient and effective processing of ML algorithms [32].
2. The ReplaceMissingValues function replaces missing values with default, minimum, maximum, mean, or most frequent values [33].
3. The NormalizeMeanVariance function adjusts values measured on different scales to a notionally common scale with computed mean and variance of the data [34].
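The effect of the three transformations can be illustrated in plain Python. This is a conceptual sketch only, not the ML.NET implementation: the helper names `one_hot`, `replace_missing`, and `standardize` are hypothetical, missing values are assumed to be filled with the mean, and the normalization shown is textbook standardization, which may differ from the default options of the ML.NET function.

```python
from statistics import fmean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator vector (categories sorted by name)."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def replace_missing(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = fmean(observed)
    return [mean if v is None else v for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation.

    Assumes the values are not all equal, otherwise the deviation is zero.
    """
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


categories, encoded = one_hot(["rain", "earthquake", "rain"])
filled = replace_missing([1.0, None, 3.0])
scaled = standardize(filled)
```

In this toy run the trigger column becomes two indicator columns, the gap is filled with the mean of its neighbors, and the filled column is rescaled around zero.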

2) REGRESSION ANALYSIS
In this paper, regression analysis was used to identify the most important landslide feature attributes associated with landslide-related fatalities. Regression analysis automatically ranks factors by their relative importance and displays them as key influencers of both categorical and numerical metrics. Two types of regression were used (see Fig. 5).
For numerical features, linear regression was performed using ML.NET's stochastic dual coordinate ascent function [25]. Linear regression is one of the simplest supervised ML algorithms and is used to solve regression problems and predict continuous dependent variables with the help of independent variables. The goal of linear regression is to identify the best-fit line that can accurately predict the output of the continuous dependent variable. By finding the best-fit line, the algorithm establishes a linear relationship between the dependent and independent variables in the form

$$y = a_0 + a_1 x + \varepsilon$$

where $y$ is the dependent variable, $x$ is the independent variable, $a_0$ and $a_1$ are the intercept and slope, and $\varepsilon$ is the error term.
In contrast, for categorical features, logistic regression was performed using ML.NET's L-BFGS logistic regression [26], [27]. Logistic regression is one of the most popular supervised ML algorithms. It can also be used for classification and regression problems. Logistic regression was used to predict the categorical dependent variable with the help of independent variables using

$$\log\left[\frac{y}{1 - y}\right] = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$

The output of the logistic regression problem can only be between 0 and 1; therefore, logistic regression may be used when the probabilities of two classes are required, such as whether it will rain or not, 0 or 1, true or false, etc.
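The two model forms discussed above, a best-fit line and a log-odds model, can be illustrated with a short Python sketch. It is not the ML.NET implementation: the system uses stochastic dual coordinate ascent and L-BFGS trainers, whereas this sketch assumes a single independent variable and uses the closed-form least-squares solution. The names `fit_line` and `logistic` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares estimates (a0, a1) for y = a0 + a1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    return mean_y - a1 * mean_x, a1


def logistic(z):
    """Inverse of the log-odds link: maps any real z to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Toy points lying exactly on y = 1 + 2x, so the fit should recover that line.
a0, a1 = fit_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
```

Applying the log-odds transform to `logistic(z)` returns `z`, which is why the output of a logistic model can be read as a probability between 0 and 1.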
MAPE has been used in previous ML-based landslide research [12] to evaluate the performance of prediction algorithms. Therefore, in this study, MAPE was used to evaluate the accuracy of linear regression. To measure the accuracy of logistic regression, AUC was used because it has been used in previous research to measure the performance of clustering algorithms.

3) ANOMALY DETECTION
Anomaly detection enhances line charts by automatically detecting anomalies within time-series data. It also provides explanations of anomalies to help with root cause analysis.
Before delving into the details of anomaly detection, we consider the problem definition.
Problem 1: Given a sequence of real values (i.e., $x = x_1, x_2, x_3, \ldots, x_n$), the task of time-series anomaly detection is to produce an output sequence ($y = y_1, y_2, y_3, \ldots, y_n$), where $y_i \in \{0, 1\}$ denotes whether $x_i$ is an anomaly point.
The implemented solution was informed by the spectral residual (SR) approach used in the visual saliency detection domain. Then, a convolutional neural network (CNN) was applied to the results produced by the SR model [28].
The SR algorithm consists of three major steps:
1. Fourier transform to obtain the log amplitude spectrum
2. Calculation of the SR
3. Inverse Fourier transform to transform the sequence back to the spatial domain:

$$A(f) = \mathrm{Amplitude}\left(\mathfrak{F}(x)\right)$$
$$P(f) = \mathrm{Phase}\left(\mathfrak{F}(x)\right)$$
$$L(f) = \log\left(A(f)\right)$$
$$AL(f) = h_q(f) \cdot L(f)$$
$$R(f) = L(f) - AL(f)$$
$$S(x) = \left\| \mathfrak{F}^{-1}\left(\exp\left(R(f) + iP(f)\right)\right) \right\|$$

where $\mathfrak{F}$ and $\mathfrak{F}^{-1}$ denote the Fourier transform and inverse Fourier transform, respectively; $x$ is the input sequence with shape $n \times 1$; $A(f)$ is the amplitude spectrum of sequence $x$; $P(f)$ is the corresponding phase spectrum of sequence $x$; $L(f)$ is the log representation of $A(f)$; and $AL(f)$ is the average spectrum of $L(f)$, which can be approximated by convolving the input sequence with $h_q(f)$, where $h_q(f)$ is a $q \times q$ matrix defined as:

$$h_q(f) = \frac{1}{q^2}\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{bmatrix}$$

$R(f)$ is the SR; that is, the log spectrum $L(f)$ minus the averaged log spectrum $AL(f)$. The SR serves as a compressed representation of the sequence, whereas the innovation part of the original sequence becomes more significant. Finally, the sequence was transferred back to the spatial domain using the inverse Fourier transform. The resultant sequence $S(x)$ is referred to as the saliency map [29]. The values of the anomaly points are calculated as follows:

$$x = (\bar{x} + \mathrm{mean})(1 + \mathrm{var}) \cdot r + x$$

where $\bar{x}$ is the local average of the preceding points, mean and var are the mean and variance, respectively, of all points in the current sliding window, and $r \sim N(0, 1)$ is randomly sampled. In this process, the CNN is applied to the saliency map rather than to the raw input, making the overall process of anomaly detection more efficient [28], [29].
Given that this paper is the first to report the use of anomaly detection in landslide research, AUC was used to measure its performance against that of clustering algorithms in previous studies [7], [9]-[11].
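The three SR steps can be illustrated with a dependency-free Python sketch. It is a simplified illustration under stated assumptions, not the implementation used by the system: the names `dft` and `saliency_map` are hypothetical, the averaging filter is a one-dimensional window of length `q` (truncated at the edges) rather than a $q \times q$ matrix, a naive discrete Fourier transform is used for clarity, and the CNN stage is not reproduced.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            v * cmath.exp(sign * 2j * math.pi * k * t / n)
            for t, v in enumerate(values)
        )
        out.append(total / n if inverse else total)
    return out


def saliency_map(x, q=3):
    """Spectral-residual saliency of a 1-D sequence x."""
    spectrum = dft(x)
    log_amp = [math.log(abs(c) + 1e-8) for c in spectrum]  # L(f), guarded against log(0)
    phase = [cmath.phase(c) for c in spectrum]             # P(f)
    half = q // 2
    avg = []                                               # AL(f): local mean of L(f)
    for k in range(len(log_amp)):
        window = log_amp[max(0, k - half): k + half + 1]
        avg.append(sum(window) / len(window))
    residual = [l - a for l, a in zip(log_amp, avg)]       # R(f) = L(f) - AL(f)
    back = dft(
        [cmath.exp(r + 1j * p) for r, p in zip(residual, phase)],
        inverse=True,
    )
    return [abs(c) for c in back]                          # S(x)


series = [1.0] * 32
series[20] = 10.0  # one injected spike in an otherwise flat series
saliency = saliency_map(series)
spike_index = max(range(len(saliency)), key=saliency.__getitem__)
```

On this toy series the saliency map peaks at the position of the injected spike, which is the property that makes the saliency map a convenient input for a downstream detector.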

4) DECOMPOSITION TREE ANALYSIS
Decomposition tree visualization is a valuable tool for ad hoc exploration and root cause analysis when visualizing data across multiple filter attributes or dimensions [30].
Our implementation of decomposition analysis enables the visualization of landslide casualty data over a range of landslide feature attributes, namely, trigger, category, setting, size, and country. As shown in Fig. 9, interactive root cause analysis and data exploration are supported by the aggregation of data and drilling down into the dimensions.
For filter attributes $T = \{T_1, T_2, T_3, \ldots, T_N\}$, where $N$ is the total number of filter attributes within a dataset (i.e., the cardinality of $T$, $|T| = N$), each filter attribute $T_n$ can form one or many filter conditions $T^{n}_{i}$, with $i \in \{1, 2, 3, \ldots\}$. Each filter condition can filter $r$ rows ($r \in \{1, 2, 3, \ldots, R\}$) from the dataset. For example, when Country_name = Ecuador was selected, the filter condition $T^{1}_{58}$ selected 39 records (i.e., $r = 39$) from the global landslide dataset (where $T_1$ is the country_name filter attribute, and Ecuador is the 58th item in that attribute). Continuing on, the landslide casualties $C^{n}_{i}$ are defined in Equation (20) over the $r$ rows affected by filter attribute condition $T^{n}_{i}$.
Our decomposition tree visualization (supported by AI) enables the user to find the next filter attribute condition in which to drill down based on either high or low values [30]:
1. High value: This mode considers all available filter attribute conditions and determines which one to drill into to obtain the highest value of the measure being analyzed. Therefore, the high-value AI split mode finds the most influential filter attribute condition $T^{n}_{i}$ for which the highest level of casualty occurs, as represented by:

$$\exists\, T^{n}_{i} \subseteq T \mid C^{n}_{i} > C^{m}_{j},\ \forall n, m \in \{1, 2, 3, \ldots, N\} \wedge \forall i, j \in \{1, 2, 3, \ldots\} \quad (21)$$

2. Low value: This mode considers all available filter attribute conditions and determines which one to drill into to obtain the lowest value of the measure being analyzed. Therefore, the low-value AI split mode finds the most influential filter attribute condition $T^{n}_{i}$ for which the lowest level of casualty occurs, as represented by:

$$\exists\, T^{n}_{i} \subseteq T \mid C^{n}_{i} < C^{m}_{j},\ \forall n, m \in \{1, 2, 3, \ldots, N\} \wedge \forall i, j \in \{1, 2, 3, \ldots\} \quad (22)$$

As shown by Fig. 10, selecting "High value" for the measure "Casualty" reveals that the highest number of casualties (30,142.50) occurred when the landslide setting was a natural slope (i.e., $C^{n}_{i} = 30142.50$, where $T^{n}_{i}$ represents filter attribute condition "Landslide_Setting = natural slope"). In this way, the AI split allows the user to delve into the root cause. AUC was used to measure the performance of the decomposition tree analysis algorithm.
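The high-value and low-value split modes can be illustrated with a short Python sketch that scans every filter condition and returns the one with the extreme aggregate. This is a sketch under stated assumptions, not the Power BI implementation: the rows and the function name `ai_split` are hypothetical, and the measure is assumed to be a simple sum per condition, which may differ from the casualty definition used in this study.

```python
from collections import defaultdict

# Hypothetical rows; attribute names and casualty values are illustrative only.
ROWS = [
    {"setting": "natural slope", "trigger": "rain", "casualties": 12.0},
    {"setting": "urban", "trigger": "rain", "casualties": 5.0},
    {"setting": "natural slope", "trigger": "earthquake", "casualties": 9.0},
    {"setting": "mine", "trigger": "earthquake", "casualties": 1.0},
]


def ai_split(rows, attributes, measure="casualties", mode="high"):
    """Return the (attribute, value) condition with the highest or lowest total."""
    totals = defaultdict(float)
    for row in rows:
        for attribute in attributes:
            # One running total per filter condition, e.g. ("setting", "urban").
            totals[(attribute, row[attribute])] += row[measure]
    pick = max if mode == "high" else min
    return pick(totals.items(), key=lambda item: item[1])


high = ai_split(ROWS, ["setting", "trigger"], mode="high")
low = ai_split(ROWS, ["setting", "trigger"], mode="low")
```

Drilling down then repeats the same search on the rows selected by the returned condition, using the remaining attributes.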

IV. RESULTS
We executed the ML algorithms described in the previous section (i.e., decomposition analysis, regression analysis, and anomaly detection) on global landslide data containing 14,532 records of landslide events worldwide. By selecting the filter settings for one or more landslide feature attributes, we created particular scenarios from the set of $1.296\times {10}^{64}$ scenarios, as shown in Equation (7). A change in an attribute filter causes the fact table containing injuries and fatalities to change, as shown in Fig. 4 and Table 2.
We used 25 of the $1.296\times {10}^{64}$ possible scenarios to demonstrate the applicability and usability of the proposed ML-based knowledge discovery solution. As shown in Table 4, Scenarios 1-3 were based on decomposition analysis, Scenarios 4-22 were based on automated regression analysis, and Scenarios 23-25 were based on anomaly detection.

A. DECOMPOSITION ANALYSIS
Using decomposition tree analysis in Scenarios 1-3, our system answered the following strategic questions:
1. What causes the highest number of casualties?
2. What causes the highest number of casualties when the landslide setting is urban?
3. What causes the highest number of casualties when the landslide trigger is a tropical cyclone?
Fig. 9 shows that we delved into the fifth level of detail to identify the causes of the highest number of casualties.

B. REGRESSION ANALYSIS
As shown in Fig. 7, the dataset may be filtered by any combination of values from feature attributes such as country, landslide setting, landslide trigger, landslide category, and landslide size. When country name was set to "Italy" and trigger was set to "continuous rain", the dataset was filtered to only six landslides (containing six injuries and seven fatalities). Therefore, regression analysis was only applied to these six landslide events, finding a positive correlation between fatality count and longitude and a negative correlation between fatality count and latitude (Scenario 10 of Table 5). When longitude increased by 3.23, the average fatality count increased by 0.76. When average latitude decreased by 2.71, the average fatality count increased by 0.28.
Using regression analysis in Scenarios 4 to 22, our system demonstrated that latitude and longitude exhibit a wide range of behaviors depending on the selected country and landslide features.

C. ANOMALY DETECTION
Anomaly detection automatically identifies anomalies in time-series data, along with supporting explanations and the strength of each explanation. As shown in Fig. 13, an anomaly was detected in 2010, when the fatality count was abnormally high (5,424). This value was substantially higher than the expected value of 4,787 and fell outside of the expected range of 4,361-5,212.
In Scenario 24 (Anomaly Case 2), the total number of landslide events in 2018 was 992, which was higher than the expected range of 664-980 (see Fig. 15). The anomaly detection algorithm in Fig. 15 attempts to explain this anomaly with three possible explanations (see Fig. 16).
Scenario 25 (Anomaly Case 3) is shown in Fig. 17. Here, the injury count in 2010 was exceptionally high at 1,317 (substantially higher than the range 41-159). Fig. 17 also shows a possible explanation for Scenario 25 with the corresponding strengths. One of these explanations is that in 2010, Congo observed a substantially higher number of injuries compared with the usual range, increasing the global number of injuries from landslides in 2010. Our automated knowledge discovery solution assigned a strength of 48% for this explanation (see Fig. 18).
Therefore, in Scenarios 23 to 25, our system detected and automatically provided explanations for the following three anomalies:
1. In 2010, the fatality count was abnormally high (5,424), which was substantially higher than the expected value of 4,787 and fell outside of the expected range of 4,361-5,212.
2. In 2018, the number of landslide events was 992, which was higher than the expected range of 664-980.
3. In 2010, the number of injuries was exceptionally high at 1,317 (substantially higher than the range of 41-159).
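The final decision in each of the three cases above reduces to checking an observed value against an expected range. The following Python sketch replays that check with the values reported for Scenarios 23 to 25. It is an illustration only: the helper name `outside_expected_range` and the dictionary keys are hypothetical, and the sketch does not reproduce how the system derives the expected range.

```python
def outside_expected_range(observed, low, high):
    """Flag a value that falls outside the closed interval [low, high]."""
    return observed < low or observed > high


# (observed value, expected low, expected high) as reported for Scenarios 23-25.
reported = {
    "fatalities_2010": (5424, 4361, 5212),
    "events_2018": (992, 664, 980),
    "injuries_2010": (1317, 41, 159),
}
flags = {name: outside_expected_range(*triple) for name, triple in reported.items()}
```

All three reported observations fall above the upper bound of their expected range, whereas the expected fatality count for 2010 (4,787) lies inside its range and would not be flagged.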

D. VALIDATION OF ML ALGORITHMS
In previous research, ML models have been evaluated using data splitting or cross-validation in which some of the data are used to estimate the model coefficients, and the remainder are used to measure a range of evaluation metrics [7], [9]-[14]. This process of model evaluation is suitable for studies based on a single static dataset. Unlike previous studies, this study reflects a comprehensive dataset of 14,532 global landslide records that dynamically updates to a smaller set of datasets based on the user's selected scenario. Hence, for the present study, there are $1.295\times {10}^{64}$ smaller sets of dynamic data on which multiple ML algorithms are executed concurrently. Given the feasibility limitations of evaluating models using extremely large dynamic datasets, our ML models were only evaluated using a selected set of scenarios associated with a selected set of data to demonstrate the feasibility of the system. Sensitivity, specificity, the receiver operating characteristic curve, and AUC were used during model evaluation for anomaly detection and decomposition analysis (see Fig. 19). In the worst-case scenarios, the AUC for anomaly detection, logistic regression, and decomposition tree analysis was 0.941, 0.911, and 0.896, respectively [11].
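The evaluation metrics named above can be computed with a few lines of Python. This is a generic sketch, not the evaluation code used in this study: the function names `auc` and `sensitivity_specificity` and the toy labels and scores are hypothetical. AUC is computed here through its pairwise-ranking interpretation, i.e., the fraction of (positive, negative) pairs in which the positive example receives the higher score.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, predictions):
    """True-positive rate and true-negative rate for binary predictions."""
    tp = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    tn = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 0)
    return tp / labels.count(1), tn / labels.count(0)


# Toy example: two negatives, two positives, one positive scored too low.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(labels, scores)
toy_rates = sensitivity_specificity(labels, [int(s >= 0.5) for s in scores])
```

An AUC of 1.0 means every positive is ranked above every negative, whereas 0.5 corresponds to random ranking.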
In contrast, when evaluating the performance of linear regression algorithms, MAPE was used as follows:

$$M = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{A_t - F_t}{A_t}\right|$$

where $M$ is MAPE, $n$ is the number of summation iterations, $A_t$ is the actual value, and $F_t$ is the forecast or predicted value. MAPE has been used for model evaluation by other researchers along with root-mean-square error (RMSE) [12].
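A minimal Python sketch of the MAPE calculation follows. The function name `mape` and the toy values are hypothetical; the result is returned as a fraction, matching the way MAPE values are reported in this paper (e.g., 0.255).

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.10 means 10%).

    Undefined when any actual value is zero, because of the division by A_t.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equally long")
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Toy values: two forecasts are off by 10%, one is exact.
example = mape([10.0, 20.0, 40.0], [11.0, 18.0, 40.0])
```

Because each term is scaled by the actual value, MAPE is comparable across scenarios with very different fatality counts, but it cannot be evaluated for records whose actual value is zero.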
In the best-case scenarios, M < 0.10 was obtained from Scenarios 4, 6, 9, and 10 (see Table 4). However, for the other scenarios, M was found to be 0.255 on average. Fig. 20 shows a user receiving an ML-based insight on her mobile device immediately after selecting a particular scenario. The ML-based insight states, "When longitude decreases by 0.66, the average fatality count increases by 4.4". The prediction accuracy of this insight was MAPE = 0.105, which is higher than that of other algorithms used in landslide research [12]. Therefore, using this instant ML-based insight, the user can decide to increase landslide preparedness in cities at lower longitudes.
In all previous studies, an expert data scientist was in charge of manually preparing and modeling the data and manually training and testing the ML model [7], [9]-[14]. Given that strategic decision-makers are often required to make quick decisions, delegating data science tasks to experts is often not feasible. The system presented in this paper completely automates the task of data preparation and ML modeling. Our decision support system is hosted in Microsoft Cloud, which can be accessed by users on their laptops, tablets, or mobile phones. The moment a user selects a scenario from our $1.295\times {10}^{64}$ possible scenarios, our system automatically prepares the data for that scenario and executes the appropriate ML algorithm to provide hidden insights about the selected scenario to the decision-maker. A fully automated ML-based decision support system has not been previously reported.

V. DISCUSSION
The benefits of the proposed system include the following:
1. It can be executed by users with no prior knowledge of data science and ML algorithms.
2. It instantly prepares the data and executes ML modeling without delay, supporting quick decisions.
3. It is completely scalable, supporting multiple data sources with unlimited concurrent users.
4. It uses two ML algorithms (anomaly detection and decomposition tree analysis), which have not been used in previous landslide research, with a clustering accuracy of up to AUC = 0.941.
5. It elaborates and explains the result in plain language to the strategic decision-maker.
Being a fully automated decision support system, it lacks the manual rigor of ML modeling with multiple algorithms, as demonstrated in previous research. Therefore, some studies have reported a higher accuracy in classification and prediction using a different set of ML algorithms (e.g., AUC of 0.951 in [9], AUC of up to 0.991 in [7], MAPE of 0.125 in [12]). In our future work, we will include modified versions of random forests and CNNs (as reported in [7] and [9]) as well as support vector regression (as reported in [12]) into our fully automated decision support system, which may improve the accuracy of the current version.
Moreover, the current version only analyzes textual information. In future versions, we plan to include multidimensional data, including imagery, light detection and ranging data, synthetic aperture radar (SAR), and interferometric SAR data [14]. We believe that adding multiple sources of data will increase our system's capability in terms of more insightful knowledge discovery with higher accuracy.

VI. USER NOTES
The ML-based knowledge discovery solution proposed in this study was implemented using Microsoft Power BI, which is freely available for download from https://app.powerbi.com/. The user can download the complete source files (.pbix) along with global landslide data (.csv) files from the authors' GitHub source control site at https://github.com/DrSufi/GlobalLandslide [17]. After downloading and opening the entire solution using MS Power BI Desktop, the user can host the solution either in Microsoft Cloud or a local network to make it available to other researchers or strategic planners.
Typical users of this system are town planners, policymakers, and disaster recovery strategists concerned about landslides in any region or country (the system is capable of generating insights for 157 countries). The system will allow users to understand the characteristics of landslides in a particular area and provide useful guidance for policy implementation to mitigate the risks associated with landslides in that area.

VII. CONCLUSION
Town planners and strategic decision-makers have traditionally relied on statistical analyses of regional landslide databases for strategic planning and policy implementation [2], [15], [17], [35], [36]. Previous research into ML-based algorithms has also relied on regional and local datasets, where the data were manually prepared and ML models were manually created by expert data scientists, researchers, and engineers.
In this study, we used advances in ML and AI on NASA's robust database of global landslides to create an automated ML-based solution that provides interactive knowledge discovery in user-defined scenarios with a higher degree of accuracy compared with other ML algorithms in the literature.
We found that anomaly detection had a higher level of accuracy (AUC = 0.941) than decomposition tree analysis (AUC = 0.896) and regression (AUC = 0.911, MAPE = 0.255). It should be noted that this study is the first to report on the use of anomaly detection and decomposition tree analysis algorithms for analyzing landslide data. Moreover, to the best of our knowledge, our system is the first of its kind to directly provide ML-based hidden trends and insights into a vast set of global landslide data containing $1.296\times {10}^{64}$ scenarios to strategic decision-makers.
In the future version of our solution, we plan to enhance system capacity and capability by studying the potential for:
1. additional landslide data sources
2. additional ML algorithms (e.g., random forest, CNN, support vector regression)
3. additional types of data (e.g., imagery, light detection and ranging data, SAR and interferometric SAR data).

He is currently an Assistant Professor of information systems with Umm Al-Qura University. His current research interests include enterprise resources planning (ERP), including ERP life cycles, implementation conflicts, stakeholders, and cloud-based ERP, as well as digital transformation in government organizations, software quality, and human-computer interactions.