Anomalies Prediction in Radon Time Series for Earthquake Likelihood Using Machine Learning-Based Ensemble Model

The ability to predict the concentration of radioactive soil radon gas is important because radon can serve as a precursor to earthquakes. Several studies across the globe have examined the correlation between radon emission dynamics and earthquakes, and they concluded that soil radon gas exhibits anomalous behaviour before several earthquakes. This anomalous behaviour can help to construct better prediction models for earthquake forecasting. This paper employs different ensemble and individual machine learning methods on real-time radon time series data, under different scenarios, to predict anomalies in the data caused by seismic activities. The ensemble methods include boosted tree, bagged cart and boosted linear model, while the standalone machine learning methods include support vector machines with linear and radial kernels and k-nearest neighbors (K-NN). We tested the methods on a dataset recorded on the fault line located in Muzaffarabad. The time series data were collected from March 1, 2017 to May 11, 2018, a period that includes nine earthquakes. The methods are tested in four different settings using 10-times-repeated 10-fold cross validation over time windows of 1 to 4. Repeating the 10-fold cross validation procedure 10 times reduces the noise in the estimate of model performance. Statistical performance measures, viz. root mean squared error (RMSE), root mean squared logarithmic error (RMSLE), mean absolute percentage error (MAPE), percentage bias (PB) and mean squared error (MSE), have been calculated to assess performance. In setting 1, the support vector machine with radial kernel performs best, with a minimum RMSE of 1381.023 compared with the other prediction models.
In setting 3, the RMSE values lie in the range [1262.864, 1409.616], the minimum among the prediction models for the soil radon gas concentration dataset. For setting 4, the boosted tree model yielded the minimum RMSE and MAPE scores of 1573.174 and 0.056, respectively. The findings of the study show that boosted tree and support vector machine with radial kernel are better regression models for predicting anomalies in soil radon gas concentration during seismic activities. An important finding of this study is that the boosted tree ensemble method makes it possible to predict soil radon gas concentration accurately and automatically from environmental parameters.


I. INTRODUCTION
The accuracy with which decision support systems (DSSs) make predictions, for example in earthquake forecasting or medical diagnosis, is a major concern in domains where human lives are at stake. Earthquakes are a major natural disaster, and their unpredictability causes loss of human lives and infrastructure [1]. Regarding earthquake prediction, there are two schools of thought: the first considers earthquakes impossible to predict in advance, while the second has invested considerable resources and effort in making them predictable. Various studies have tackled this challenging task from different angles [2]-[9]. The task is made harder by the lack of instruments that can accurately monitor the stress, pressure and other changes occurring deep beneath the earth's crust; such measurements would allow comprehensive seismic features to be extracted for analysis. During the earthquake preparation process beneath the surface, various geophysical and seismological processes occur. Radon and its radioactive isotope thoron, produced from uranium and thorium sources deep in the earth, may serve as predictors of impending earthquakes. Radon has three naturally occurring isotopes, viz. ²²²Rn (usually called radon, which originates from the radioactive ²³⁸U series), ²²⁰Rn (called thoron, which originates from the ²³²Th radioactive series) and ²¹⁹Rn (called actinon, which originates from the ²³⁵U radioactive series). The crustal abundances of ²³⁸U (uranium), ²³²Th (thorium) and ²³⁵U (parent of the actinium series) are 2.7, 8.5 and 0.02 mg kg⁻¹, respectively. Although the concentration of ²³²Th in the earth's crust is higher than that of ²³⁸U, the production rates of ²²²Rn and ²²⁰Rn are about the same because of the longer half-life of ²³²Th (14.1 × 10⁹ years) compared with ²³⁸U (4.5 × 10⁹ years).
Of the three naturally occurring isotopes, ²²²Rn is the most important because of its longer half-life (3.825 days) compared with ²²⁰Rn (55.6 s) and ²¹⁹Rn (actinon) [10]. The short half-lives of the latter two isotopes restrict their transport by diffusion to short distances. Thoron nevertheless reaches the earth's surface, but in smaller quantities than radon. In this article we focus on radon rather than the other isotopes.
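The effect of these half-lives can be illustrated with the decay law $N(t)/N_0 = 2^{-t/T_{1/2}}$. The Python sketch below uses only the half-lives quoted above: after ten minutes less than 0.1% of the thoron atoms remain, while more than 99% of the radon atoms survive an hour, which is why thoron can travel only a short distance before decaying. Transport itself (diffusion coefficients, soil porosity) is not modelled, since the text gives no values for it.

```python
# Half-lives quoted in the text, expressed in seconds.
HALF_LIFE_S = {
    "Rn-222 (radon)": 3.825 * 24 * 3600,  # 3.825 days
    "Rn-220 (thoron)": 55.6,              # 55.6 s
}


def surviving_fraction(elapsed_s: float, half_life_s: float) -> float:
    """Fraction of atoms not yet decayed after elapsed_s seconds: N/N0 = 2**(-t/T_half)."""
    return 2.0 ** (-elapsed_s / half_life_s)


for label, t_half in HALF_LIFE_S.items():
    for minutes in (1, 10, 60):
        frac = surviving_fraction(minutes * 60, t_half)
        print(f"{label}: after {minutes:>2} min, {frac:.2e} of the atoms remain")
```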
Several studies across the globe have focused on earthquake prediction based on the anomalous behaviour of radon gas in the atmosphere, soil and water [3], [11]-[14]. Irregular behaviour of radon in soil and water was first correlated with earthquakes in 1967 [15], and a study in 1976 also reported spikes in radon concentration before the occurrence of an earthquake [16]. In 1978, another study reported unusual behaviour of radon concentration prior to an earthquake [17], which prompted extensive research to further explore the correlation between earthquakes and radon emission dynamics [18]-[25]. In addition, the nature of carrier gases and meteorological parameters influences the forces underlying radon emission [25]-[28].
With recent advances in computer science, various computational intelligence techniques have been successfully introduced to predict radon concentration from meteorological parameters [3]. Regression trees have been used to predict soil radon gas concentration from environmental data such as pressure, rainfall, air temperature and soil temperature, and it was concluded that the prediction error increases a week before earthquakes with magnitudes ranging from 0.8 to 3.3 [29], [30]. A neural network system using radial basis functions (RBF) has been tested as an alternative to traditional regression methods for isolating radon emission anomalies [31]; the model was further evaluated on future data and achieved a prediction accuracy of 87.8%. Tareen et al. employed three different computational intelligence models to automatically detect anomalous behaviour in soil radon gas time series data by modelling radon concentration with different statistical and meteorological parameters [11], and found that the irregular behaviour of radon concentration is caused by seismic activities. A study optimizing an artificial neural network (ANN) for accurate prediction of radon dispersion in Vietnam concluded that the ANN predicted radon dispersion well, with low values of the performance metrics [32]. Soil radon gas concentration has also been estimated with a Deep Neural Network (DNN) using different environmental parameters, mapping the functional relationship between radon concentration and those parameters [33]. Another method, based on the Adaptive Linear Neuron (Adaline), was proposed to estimate soil radon gas concentration from the associated environmental parameters [34].
The proposed methodology can efficiently differentiate the temporal variation of radon concentration related to environmental parameters. Sikder et al. employed a decision tree algorithm to characterize premonitory factors of low seismic activity, and it outperformed other regression-based techniques [35].
Machine learning studies the structure of problems and the construction of algorithms that can learn from data and make predictions on it. It is a branch of artificial intelligence concerned with developing and studying algorithms that build models for predictions or decisions [36]. Machine learning methods have shown significant results in various fields, such as medical diagnosis [37]-[42], banking [43], [44] and market basket analysis [45], [46]. A variety of methods are available for classification and prediction, such as Diagonal Linear Discriminant Analysis (DLDA) [47], k-Nearest Neighbors (k-NN) [48], Support Vector Machine (SVM) [49] and Random Forest (RF) [50]. Ensemble methods have also been proposed, such as bagging [51], boosting [52], [53] and stacking [54], in which the final prediction is made not by a single model but by aggregating the outputs of several weak learners [55]. Ensemble methods have shown significant performance improvements over individual models in classification and prediction problems [56]-[59]. This improvement rests on the premise that the prediction made by the ensemble is more accurate than that of any individual model constituting the ensemble [55].
The core idea of this research work is to investigate ensemble methods and individual learning models for accurate prediction of soil radon gas concentration time series data. The ensemble methods used in this paper are boosted tree, bagged cart and boosted linear model; the individual learning models are support vector machines with linear and radial kernels and K-Nearest Neighbors (K-NN). The ensemble and individual learning methods are tested in four settings, numbered 1 to 4, and each setting is evaluated with time windows W1 to W4. The main interest of this study is predicting soil radon gas concentration during seismic activity, i.e. during anomalies, in a way that captures the variations of the observed concentration. A model that predicts soil radon concentration accurately makes it possible to identify anomalies in the time series. Instead of relying on a single test set, the testing phase is decomposed into different settings that incorporate different seismic activities, and each setting leads to a different composition of training and testing sets. The time window scheme is used to predict radon concentration over different periods of time. A window comprises the days before and after a seismic event: a window size of 1 means 1 day before and after the seismic event, and window sizes of 3 and 4 cover the samples from 3 and 4 days before and after the seismic activity, respectively. The impact of a seismic event ranges from its preparation phase (before the event occurs) to its aftershocks. For the experiments, the dataset was recorded on the fault line in Muzaffarabad, a city in Pakistan-administered Kashmir, from 1 March 2017 to 11 May 2018, a period that included nine seismic events (earthquakes).
The seismic events and their magnitudes are described in detail in Table 1. The models are trained with 10-times-repeated 10-fold cross validation and tested on the test set defined by each setting. In 10-fold cross-validation, the original sample is randomly divided into 10 subsamples of equal size. One of the 10 subsamples is kept as validation data for testing the model, while the remaining 9 subsamples are used for training. This is done 10 times, so that each of the 10 subsamples serves as validation data exactly once, and the 10 fold results are averaged to produce a single estimate. The whole 10-fold procedure is then repeated 10 times with different random partitions, and model performance is estimated by averaging across all folds and repeats. The basic idea behind repeated cross validation is to use all samples for both model training and validation while reducing the noise in the estimate of model performance. The experiments are performed in the R environment using the package caret (Classification and Regression Training) [60]. For performance evaluation, commonly used statistical metrics are computed: RMSE, RMSLE, MAPE, PB and MSE. The ensemble and individual models are assessed purely on how well they capture temporal variations and the functional relationship between radon concentration and environmental parameters.
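A minimal sketch of the repeated 10-fold procedure described above, written in Python with only the standard library rather than in R/caret, which the paper actually used. The `fit_predict` callback and the mean-only baseline are hypothetical placeholders for the trained models. In caret, this resampling scheme is normally requested with `trainControl(method = "repeatedcv", number = 10, repeats = 10)`, although the paper does not show its exact call.

```python
import random
import statistics


def repeated_kfold_indices(n, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, val_idx) for repeated k-fold cross validation.

    Within each repeat every sample appears in exactly one validation fold;
    each repeat uses a fresh random partition.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]  # k subsamples of (near-)equal size
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield r, f, train, folds[f]


def rmse(actual, predicted):
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5


def repeated_cv_rmse(y, fit_predict, k=10, repeats=10, seed=0):
    """Average validation RMSE over all k * repeats folds.

    fit_predict(train_idx, val_idx) must return predictions for val_idx.
    """
    scores = []
    for _, _, train, val in repeated_kfold_indices(len(y), k, repeats, seed):
        predictions = fit_predict(train, val)
        scores.append(rmse([y[i] for i in val], predictions))
    return statistics.mean(scores), len(scores)


# Usage with a hypothetical baseline that predicts the training-fold mean.
y = [float(v) for v in range(100)]


def mean_baseline(train, val):
    m = statistics.mean(y[i] for i in train)
    return [m] * len(val)


score, n_scores = repeated_cv_rmse(y, mean_baseline)
print(f"mean RMSE over {n_scores} folds: {score:.2f}")
```

Averaging over all 100 fold scores, rather than reporting a single 10-fold run, is what damps the dependence of the estimate on one particular random partition.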

II. MATERIAL AND METHODS
In this section, the statistical details of the soil radon gas time series dataset are presented along with information on the earthquakes (seismic activities). A basic understanding of the ensemble and individual machine learning methods is also provided. The proposed simulation plan for predicting soil radon gas concentration is presented pictorially and discussed in detail. Finally, the mathematical formulations of the performance metrics used to evaluate the ensemble and individual machine learning methods for predicting soil radon gas concentration are provided.

A. AREA OF STUDY
Muzaffarabad is the capital of Azad Jammu and Kashmir, the Pakistan-administered part of Jammu and Kashmir. It borders the Pakistani provinces of Khyber Pakhtunkhwa and Punjab to the west and south, respectively, and its eastern border adjoins the Indian-administered part of Kashmir. According to the 2017 census, the population of the city of Muzaffarabad was 149,913. Muzaffarabad suffered from the devastating 2005 earthquake of magnitude 7.6 Mw, which caused more than 80,000 casualties in and around the suburbs of the city. Muzaffarabad lies in a cup-shaped valley. Its air quality index (AQI) is rated unhealthy for sensitive groups, and the particulate matter concentration (PM2.5) in its air is 6.6 times above the WHO air quality standard [61]. Since Muzaffarabad is a seismically active area with a history of devastating earthquakes, forecasting possible future earthquakes there is an attractive field of study. We installed a radon measuring station over a fault line passing beneath Muzaffarabad.

B. DATA ACQUISITION
A SARAD RTM-1688-2 nuclear instrument was installed at Chehla (latitude 34.39621 N, longitude 73.47347 E) for continuous radiometric measurement of radon and meteorological parameters. Radioactive radon decays into short-lived daughter products, which are used to determine the radon concentration within the measurement chamber. Radon-222 decays into polonium-218 with the emission of an alpha particle. The freshly formed polonium-218 is momentarily positively charged because orbital electrons are scattered by the emitted alpha particle. The positive polonium-218 ions are collected in the radon measurement chamber, and the number of ions collected is proportional to the radon concentration. The RTM 1688-2 works in slow and fast modes and stores the data in non-volatile memory using a circular memory architecture. The acquired data are downloaded to a laptop through the serial interface [62].

C. DATA DESCRIPTION
The dataset used for this work is the "soil radon gas time series data", recorded on the fault line located in Muzaffarabad city in the Pakistan-administered part of Kashmir, as shown in Figure 1. A single reading was recorded every 40 minutes, resulting in 36 readings per day. Details of the radon measurement station and its instrumentation are reported elsewhere [3], [7], [11]. The dataset contains 15692 valid observations of radon concentration along with environmental parameters: thoron (Bq/m³), temperature (°C), relative humidity and pressure (mbar). During the data collection period, nine seismic activities were observed; their details and magnitudes are presented in Table 1. For the attribute of interest, radon concentration (RN), the minimum and maximum observed concentrations were 13743 Bq/m³ and 28085 Bq/m³, respectively, and the mean and median of the whole radon time series were 21364 Bq/m³ and 21569 Bq/m³, respectively. During the seismic activity periods, the radon concentration ranged from a minimum of 16132 Bq/m³ to a maximum of 26650 Bq/m³, while the thoron concentration varied from 2146 Bq/m³ to 3734 Bq/m³. Figure 2 presents the complete experimental framework for this work. The simulation is executed for two groups of machine learning methods, Group 1 and Group 2. Group 1 consists of the ensemble methods (boosted tree, bagged cart and boosted linear model), while Group 2 contains the individual learning methods (K-NN and SVMs with linear and radial kernels).
The simulation is executed in four settings, numbered 1 to 4. The purpose of these settings is to investigate the prediction capability of the learned models on different test sets, which together include almost every seismic activity. Apart from the different distributions of training and testing data, a time window is also incorporated: it extracts the data of a seismic activity together with all samples from the specified number of days before and after it. Investigations around the globe have confirmed the unusual behavior of soil radon gas concentration prior to several earthquakes, and capturing this behavior can lead to better forecasting models. By training and testing on multiple time windows, the forecasting model can capture the temporal fluctuations in the soil radon time series. Since one reading is taken every 40 minutes, a window of 1 corresponds to 36 readings before and after the seismic activity; the window thus adds non-seismic samples around the seismic time series so that the variations can be analyzed better. The novelty of this work is to introduce settings that incorporate seismic activities in both the training and testing sets in order to analyze the forecasting models better. Previously reported studies simply divided the dataset into seismic and non-seismic parts, trained the model on the non-seismic data, and used the trained model to predict soil radon gas concentration in the seismic data. In this work, time windows of 1 to 4 are used to extract the test set, and each setting leads to a different distribution of training and testing data.
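The window extraction can be sketched in Python as follows, assuming each seismic event is identified by the index of the reading at which it occurred (a hypothetical representation; the paper does not describe how event times are aligned to readings). With 36 readings per day, a window of w days keeps 36w readings on each side of the event, clipped at the ends of the series.

```python
READINGS_PER_DAY = 24 * 60 // 40  # one reading every 40 minutes -> 36 readings per day


def window_indices(event_positions, window_days, n_samples):
    """Sorted indices of all readings within window_days days of any seismic event.

    event_positions: index of the reading at which each seismic event occurred
    (hypothetical input; the paper does not say how events are aligned to readings).
    A window of 1 keeps 36 readings before and 36 after each event.
    """
    half_width = window_days * READINGS_PER_DAY
    selected = set()
    for pos in event_positions:
        start = max(0, pos - half_width)             # clip at the start of the series
        stop = min(n_samples, pos + half_width + 1)  # clip at the end; +1 keeps the event reading
        selected.update(range(start, stop))
    return sorted(selected)


# Usage: two events 100 readings apart, window of 1 day.
idx = window_indices([200, 300], window_days=1, n_samples=15692)
print(len(idx), idx[0], idx[-1])  # 146 164 336
```

Using a set means that overlapping windows of nearby events do not duplicate readings.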
Consider setting 1 in Figure 2: the training data consist of all non-seismic activity (NSA) samples along with seismic activities (SA) 1, 2, 5, 6, 7, 8 and 9, while the testing set is composed of the samples belonging to seismic activities 3 and 4, for time windows 1 to 4. Each setting combined with a time window therefore allows the performance of the models in Groups 1 and 2 to be assessed more thoroughly. The training set defined by each setting is used to train the ensemble and individual learning methods, resulting in their respective trained models. These ensemble and individual models are then tested by predicting the test set. The models are trained with 10-times-repeated 10-fold cross validation. The predictions made by each model in Groups 1 and 2 are assessed with statistical performance evaluation metrics: RMSE, RMSLE, MAPE, PB and MSE.
VOLUME 10, 2022
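A sketch of how setting 1 partitions the samples, assuming each sample has already been labelled with the seismic activity whose time window contains it (or None for non-seismic samples). Only setting 1 is spelled out in the text, so the other settings from Figure 2 are not encoded here; the labels in the usage example are a toy input.

```python
# Held-out seismic activities for setting 1, as stated in the text.
# The other settings are defined in Figure 2 and are not reproduced here.
SETTING_1_TEST_EVENTS = {3, 4}


def split_by_setting(event_labels, test_events):
    """Split sample indices into (train, test) for one setting.

    event_labels[i] is None for a non-seismic (NSA) sample, otherwise the id
    (1-9) of the seismic activity (SA) whose time window contains sample i.
    """
    train, test = [], []
    for i, label in enumerate(event_labels):
        if label in test_events:
            test.append(i)   # samples of the held-out seismic activities
        else:
            train.append(i)  # all NSA samples plus the remaining SA samples
    return train, test


labels = [None, 1, 3, None, 4, 9]  # toy labelling for illustration
train_idx, test_idx = split_by_setting(labels, SETTING_1_TEST_EVENTS)
print(train_idx, test_idx)  # [0, 1, 3, 5] [2, 4]
```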

E. ENSEMBLE METHODS
In machine learning and statistics, an ensemble is a collection of multiple models and is often more effective than individual base models [55]. Supervised learning algorithms search through a space of hypotheses to find one suitable for a given problem. An ensemble technique combines different hypotheses to form a better hypothesis; essentially, it obtains a strong learner by combining weak learners. Making predictions with an ensemble requires more computation than using a single model, so combining multiple models can be seen as a way of spending extra computation to help weak algorithms perform well. The ensemble method is also a form of supervised learning: it is first trained and then makes predictions, and the trained ensemble represents a single hypothesis. Empirically, ensemble methods tend to give more accurate results when there is considerable diversity among the models.

1) BOOSTING AND BAGGING
Sequential and parallel ensemble methods, such as boosting and bagging respectively, are used to generate the different base learners in an ensemble [63]. Sequential methods such as boosting exploit the dependence between the generated base learners, whereas parallel methods, with bagging as a representative, exploit their independence. Boosting improves the overall performance of a base learning algorithm by progressively decreasing the residual error [63]. Bagging, on the other hand, combines independent base learners to reduce the error. The word bagging is an abbreviation of Bootstrap AGGregatING [51]. Bagging is designed to improve the accuracy of predictions in decision support systems by model averaging, which reduces variance and mitigates overfitting. To perform bagging, m different bootstrap samples are drawn from the original training data. The base learning algorithm, for either classification or regression, is trained on each bootstrap sample, resulting in m individual base learners. For classification, the final class is obtained by combining the base learners' classifications through plurality voting or by averaging the estimated class probabilities. For regression, predictions are made by averaging the predictions of the individual models built on the different bootstrap samples. Let X be a sample for which a prediction is needed, and let $BL_1(X), BL_2(X), \ldots, BL_m(X)$ be the predictions of the individual base learners. The bagged prediction $P_{bag}$ is the aggregate of these predictions:
$$P_{bag}(X) = \frac{1}{m}\sum_{i=1}^{m} BL_i(X)$$
This aggregation reduces the variance of an individual base learner and mitigates overfitting, as discussed above.
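The bagging formula can be illustrated with a small Python sketch. It assumes a one-dimensional input and uses a decision stump (a one-split regression tree) as the base learner; the bagged cart model in caret grows full CART trees instead, so this only illustrates the bootstrap-and-average step, not the paper's implementation. The helper names `fit_stump` and `bagged_predict` are hypothetical.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree (decision stump) on 1-D inputs by minimising SSE."""
    best = None
    pairs = sorted(zip(xs, ys))
    for j in range(1, len(pairs)):
        if pairs[j - 1][0] == pairs[j][0]:
            continue  # no threshold can separate equal x values
        thr = (pairs[j - 1][0] + pairs[j][0]) / 2
        left = [y for _, y in pairs[:j]]
        right = [y for _, y in pairs[j:]]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # all x identical: fall back to a constant model
        m = statistics.mean(ys)
        return lambda x: m
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def bagged_predict(xs, ys, x_new, m=25, seed=0):
    """P_bag(x) = (1/m) * sum of the predictions of m stumps fit on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(m):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(stump(x_new))
    return sum(preds) / m


# Usage: a step function; bagged predictions should be close to 0 and 10.
xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
print(bagged_predict(xs, ys, 2.0), bagged_predict(xs, ys, 17.0))
```

A stump is a high-variance learner in the sense described below, which is exactly the case where averaging over bootstrap samples pays off.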
Bagging works very well for base learning algorithms with high variance, such as decision trees, whereas for algorithms with high bias, such as linear regression, it yields less improvement in classification and regression problems [55], [64]. High-variance base learners are those for which a small change in the training data can cause a large change in the predicted response values.
Boosting works by finding many rules of thumb, each from a subset of the training examples obtained by repeatedly sampling from a distribution over them [65]. In each subsequent iteration, a new rule is generated from a new subset of training examples. To make boosting work, one approach is to focus on the examples that are difficult to predict or classify by increasing the weights of the misclassified examples. The hardest examples are then more likely to be included in the next sample, so that the next rule of thumb can learn to predict them. The accuracy of each weak rule is measured by how accurately it classifies the examples. Finally, predictions for unseen samples are made by combining the predictions of all weak rules into a single prediction rule, with the expectation that the combination is better than any single rule. A general boosting procedure is given in Figure 3.
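The reweighting scheme above is described for classification. For a continuous target such as radon concentration, boosting is commonly implemented as residual fitting, matching the residual-decreasing view of boosting mentioned earlier. The sketch below is least-squares (L2) boosting with regression stumps on a one-dimensional input, written in Python; it is a simplified stand-in for, not a reproduction of, the boosted tree model used in the paper, and the learning rate and number of rounds are assumed values.

```python
import statistics


def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs; returns a prediction function."""
    pairs = sorted(zip(xs, residuals))
    best = None
    for j in range(1, len(pairs)):
        if pairs[j - 1][0] == pairs[j][0]:
            continue  # no threshold can separate equal x values
        thr = (pairs[j - 1][0] + pairs[j][0]) / 2
        left = [r for _, r in pairs[:j]]
        right = [r for _, r in pairs[j:]]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:
        c = statistics.mean(residuals)
        return lambda x: c
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """L2 boosting: each new stump is fit to the residuals of the ensemble built so far."""
    base = statistics.mean(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]  # what is still unexplained
        stumps.append(fit_stump(xs, residuals))
    return predict


# Usage: the residual on this step function shrinks by a factor (1 - learning_rate) per round.
xs = list(range(20))
ys = [0.0] * 10 + [10.0] * 10
model = boost(xs, ys, n_rounds=100, learning_rate=0.1)
print(round(model(2), 3), round(model(17), 3))
```

The shrinkage factor `learning_rate` trades the number of rounds against the risk of fitting noise, a choice that caret would normally tune by cross validation.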

F. K-NEAREST NEIGHBOR
K-NN is a non-parametric method first developed in 1951 [66] and further expanded by Thomas Cover [48]. The algorithm uses feature similarity to predict the values of test samples: the distance from a new sample to every sample in the training set is computed. A variety of distance measures exist, such as the Euclidean and Manhattan distances. The Euclidean distance between an existing point y and a new point x with p features is the square root of the sum of the squared differences:
$$d_E(x, y) = \sqrt{\sum_{j=1}^{p}(x_j - y_j)^2}$$
The Manhattan distance is the sum of the absolute differences between the existing and the new point:
$$d_M(x, y) = \sum_{j=1}^{p}\lvert x_j - y_j\rvert$$
After calculating the distance of a new sample from each sample in the training set, the K nearest neighbors are selected to obtain the classification or prediction for the new sample. The step-by-step working of the K-NN algorithm is given below.
1) Read the training and test datasets.
2) Initialize K to the optimal number of neighbors.
3) For every sample in the test data:
a) Compute the distance between the test sample and each sample in the training set.
b) Add the distance and the index of each training sample to an ordered collection.
c) Sort the ordered collection in ascending order of the distances computed in step 3a.
d) Choose the first K entries from the sorted collection.
e) Return the mean of the K response values as the predicted value for the current test sample.
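The numbered steps translate almost line by line into the following Python sketch of K-NN regression. The function names are hypothetical, K = 3 is an arbitrary choice, and features are used as given. In practice the environmental parameters (pressure in mbar, temperature in °C, and so on) are usually standardized before distances are computed, but the text does not say whether this was done.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def knn_predict(train_X, train_y, test_X, k=3, distance=euclidean):
    """K-NN regression: for each test sample, average the responses of its K nearest neighbors."""
    predictions = []
    for x in test_X:  # step 3
        # steps 3a-3c: (distance, index) pairs sorted by ascending distance
        ordered = sorted((distance(x, xi), i) for i, xi in enumerate(train_X))
        nearest = ordered[:k]  # step 3d
        predictions.append(sum(train_y[i] for _, i in nearest) / k)  # step 3e
    return predictions


# Usage: the three closest training points have responses 1, 2 and 3.
train_X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
train_y = [1.0, 2.0, 3.0, 100.0]
print(knn_predict(train_X, train_y, [(0.0, 0.0)], k=3))                      # [2.0]
print(knn_predict(train_X, train_y, [(0.0, 0.0)], k=3, distance=manhattan))  # [2.0]
```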

G. SUPPORT VECTOR MACHINE
The support vector machine (SVM) [67] is a deterministic technique and one of the most widely used machine learning tools for classification and regression tasks. It was originally designed for binary classification, separating the samples of different classes with a hyperplane that has the maximum margin [68], where the margin is the minimum distance between the instances of the classes and the separating hyperplane. With some modifications, the SVM can be used for regression tasks, in which the output is a real value; this variant is known as support vector regression (SVR). In epsilon-insensitive regression (ε-SVR), the training data consist of predictor variables and the associated observed response values. The goal is to find a function g(x) that deviates by no more than ε from the observed response at each training point x. For linear SVM regression, consider training data in which $x_n$ ($n = 1, \ldots, M$) is a multivariate set of M samples with associated response values $y_n$. The task is to find the linear function
$$g(x) = x'\beta + b$$
that is as flat as possible, i.e. the one with the minimum norm value $\beta'\beta$ [69].
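This leads to a convex optimization problem: minimize
$$J(\beta) = \frac{1}{2}\beta'\beta$$
subject to all residuals having a value less than ε:
$$\forall n: \; \lvert y_n - (x_n'\beta + b)\rvert \le \varepsilon$$
When no function g(x) satisfies all of these constraints, slack variables $\xi_n$ and $\xi_n^*$ are introduced for each point, and the objective becomes:
$$J(\beta) = \frac{1}{2}\beta'\beta + C\sum_{n=1}^{M}(\xi_n + \xi_n^*)$$
subject to
$$\forall n: \; y_n - (x_n'\beta + b) \le \varepsilon + \xi_n, \quad (x_n'\beta + b) - y_n \le \varepsilon + \xi_n^*, \quad \xi_n \ge 0, \quad \xi_n^* \ge 0$$
C is known as the box constraint, and it helps to control overfitting. It is a positive numeric value that controls the penalty imposed on samples lying outside the epsilon margin (ε) and determines the trade-off between the flatness of g(x) and the extent to which deviations larger than ε are tolerated. The loss is measured as the distance between the epsilon boundary and the observed value y:
$$L_\varepsilon = \begin{cases} 0 & \text{if } \lvert y - g(x)\rvert \le \varepsilon \\ \lvert y - g(x)\rvert - \varepsilon & \text{otherwise} \end{cases}$$
For linear SVM regression, the Lagrange dual $L_D$ of the primal function $L_P$ is obtained by introducing non-negative multipliers $\alpha_n$ and $\alpha_n^*$ for each instance $x_n$; the dual is minimized:
$$L_D(\alpha) = \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,x_i'x_j + \varepsilon\sum_{i=1}^{M}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y_i(\alpha_i^* - \alpha_i)$$
subject to
$$\sum_{n=1}^{M}(\alpha_n - \alpha_n^*) = 0, \quad \forall n: \; 0 \le \alpha_n \le C, \quad 0 \le \alpha_n^* \le C$$
Finally, the function used to predict new values (e.g. the test set) is:
$$g(x) = \sum_{n=1}^{M}(\alpha_n - \alpha_n^*)\,x_n'x + b$$
For the SVM with radial kernel, the inner products $x_i'x_j$ and $x_n'x$ are replaced by the Gaussian kernel $K(u, v) = \exp(-\gamma\lVert u - v\rVert^2)$.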
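To make the final prediction function and the ε-insensitive loss concrete, the Python sketch below evaluates them for given dual coefficients. It does not solve the dual optimization problem; the support vectors, the multipliers α and α*, the bias b and the kernel width `gamma` in the usage example are hypothetical values chosen only to exercise the formulas.

```python
import math


def eps_insensitive_loss(y, g_x, eps):
    """L_eps: zero inside the epsilon tube, |y - g(x)| - eps outside it."""
    return max(0.0, abs(y - g_x) - eps)


def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))  # inner product x_n' x


def rbf_kernel(u, v, gamma=0.5):
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))


def svr_predict(x, support_vectors, alpha, alpha_star, b, kernel=linear_kernel):
    """g(x) = sum_n (alpha_n - alpha_n*) K(x_n, x) + b."""
    return sum(
        (a - a_s) * kernel(xn, x)
        for xn, a, a_s in zip(support_vectors, alpha, alpha_star)
    ) + b


# Usage with a hypothetical dual solution (not obtained from the radon data).
svs = [(1.0, 0.0), (0.0, 1.0)]
alpha, alpha_star, b = [0.5, 0.0], [0.0, 0.25], 1.0
print(svr_predict((2.0, 2.0), svs, alpha, alpha_star, b))  # linear kernel: 1.5
print(svr_predict((1.0, 0.0), svs, alpha, alpha_star, b, kernel=rbf_kernel))
print(eps_insensitive_loss(3.0, 2.95, eps=0.1), eps_insensitive_loss(3.0, 2.5, eps=0.1))
```

Only samples with $\alpha_n \ne \alpha_n^*$ (the support vectors) contribute to the sum, which is why the prediction function is written over support vectors here.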

III. PERFORMANCE MEASURE
To assess the accuracy of the predictions of radon concentration (RN) from the other attributes (thoron, temperature, relative humidity and pressure), several commonly used performance metrics are computed. RMSE is a frequently used performance evaluation measure, applied in many fields where prediction models are of concern. It is sensitive to outliers because large differences between actual and predicted values have a disproportionately large effect on its value. RMSE is computed as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{V}\sum_{n=1}^{V}(\mathrm{Actual}_n - \mathrm{Predicted}_n)^2}$$
where V is the total number of samples. Outliers can inflate the RMSE, whereas the RMSLE scales them down and thus reduces their effect. The RMSLE is calculated as:
$$\mathrm{RMSLE} = \sqrt{\frac{1}{V}\sum_{n=1}^{V}\left(\log(\mathrm{Actual}_n + 1) - \log(\mathrm{Predicted}_n + 1)\right)^2}$$
where V is the total number of samples. It is mostly used to avoid the excessive effect of large differences between predicted and actual values when the values themselves are large. MAPE is another frequently used metric for assessing the accuracy of a prediction model, computed as:
$$\mathrm{MAPE} = \frac{1}{V}\sum_{n=1}^{V}\left\lvert\frac{\mathrm{Actual}_n - \mathrm{Predicted}_n}{\mathrm{Actual}_n}\right\rvert$$
MAPE is the average absolute percentage error. The features that make MAPE popular and useful are its scale independence and easy interpretation [70]. It also has disadvantages: it yields undefined or infinite values when actual values are zero or close to zero. Actual values with a magnitude less than 1 inflate MAPE to high percentage values, and actual values of zero result in infinite MAPE [71].
Mean squared error (MSE) measures how close the actual and predicted values are, computed over V samples as:
$$\mathrm{MSE} = \frac{1}{V}\sum_{n=1}^{V}(\mathrm{Actual}_n - \mathrm{Predicted}_n)^2$$
Put simply, it is the average squared difference between the actual and predicted values, and a lower MSE indicates a better fit of the prediction model. The tendency of the predicted values to be, on average, smaller or larger than the actual values is described by the percentage bias (PB), formulated over V samples as:
$$\mathrm{PB} = \frac{1}{V}\sum_{n=1}^{V}\frac{\mathrm{Predicted}_n - \mathrm{Actual}_n}{\mathrm{Actual}_n}$$
Larger positive values of PB indicate overestimation bias, whereas larger negative values indicate underestimation bias. A PB of 0 is optimal and represents an accurate model simulation.
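The five metrics can be computed with a few lines of plain Python following the formulas above. This is a sketch: the PB sign convention (Predicted minus Actual, so that a positive value means overestimation) follows the interpretation given in the text, and MAPE and PB are returned as fractions rather than multiplied by 100.

```python
import math


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def rmsle(actual, predicted):
    # log1p(v) == log(v + 1), but is more accurate for small v
    return math.sqrt(
        sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    # undefined when an actual value is 0, as noted in the text
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def percent_bias(actual, predicted):
    # positive -> predictions too high on average, negative -> too low
    return sum((p - a) / a for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mse(actual, predicted), rmse(actual, predicted))  # 100.0 10.0
print(mape(actual, predicted), percent_bias(actual, predicted), rmsle(actual, predicted))
```

In this toy example the absolute errors are equal, but PB is positive because the overestimate occurs on the smaller actual value, illustrating that PB weights errors relative to the actual magnitude.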

IV. RESULT AND DISCUSSION
The RMSE and MAPE statistics for the ensemble and individual learning methods are presented in Table 2. The ensemble methods include boosted tree, bagged cart and boosted linear model, and the individual learning models are K-NN and support vector machine (SVM) with linear and radial kernels. The statistics in Table 2 are obtained by applying all methods in Groups 1 and 2 to the soil radon gas concentration dataset in setting 1. As shown in Figure 2, setting 1 distributes the samples so that the training data consist of the non-seismic activity data (NSA) and seismic activities E1, E2, E5, E6, E7, E8 and E9, while the testing data consist of E3 and E4 for time windows 1 to 4. The statistics in Table 2 are complemented by Figures 4-7 (a-f), which present the actual and predicted soil radon gas concentration for setting 1 and time windows 1 to 4. The actual radon concentration is shown as a red curve and the predicted radon concentration as a black curve. Boosted tree and SVM with radial kernel stand out as the two most competent models: they perform very similarly and overlap most of the original radon time series. The boosted linear model performs worst in setting 1 (windows 1 to 4) and does not capture the temporal variations in the time series. Bagged cart and K-NN perform nearly equivalently; they capture some temporal variations and perform better than the boosted linear model and SVM with linear kernel. Table 3 presents the statistics comparing actual and predicted radon concentration for the different ensemble and individual learning methods in setting 2 (see Figure 2) with time windows 1 to 4.
As Table 3 shows, the boosted linear model achieves lower RMSE than the other machine learning models, in the range [1082.2, 1173.95] for windows 1 to 4. However, it does not capture the temporal variation of the original radon concentration (see Figure 8(c)). This is reflected in its percentage bias (PB) values of -0.002, -0.004 and -0.004 for time windows 2, 3 and 4, respectively. The negative bias is a clear indication that the model underestimates. Similar patterns can be observed in Figure 8(a-f), which presents the actual (red) and predicted (black) radon concentrations for setting 2 and a time window of 3 days. Despite the lowest RMSE of the boosted linear model, its predicted values do not follow the variations in the radon time series, which results in a negative percentage bias. The support vector machine (SVM) with a linear kernel is therefore the better option, because it overlaps the original radon time series and captures the temporal variations throughout the tested series. Table 3 also shows that SVM with a linear kernel has the lowest RMSE after the boosted linear model, together with a percentage bias closer to 0. These statistics lead to the conclusion that SVM with linear kernel performs best in setting 2 with time windows of 1 to 4. From Table 4, by experimenting with setting 3 (see Figure 2), [...] relatively promising when compared with two other models. The boosted linear model has the highest values, with average MAPE and MSE of 0.067 and 3976024. SVM with radial kernel has close values, with average MAPE and MSE of 0.05 and 1992770. Figure 9(a-f) shows the actual and predicted radon concentrations for the ensemble and individual learning methods as red and black curves, respectively.
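The reasoning above ranks models by RMSE but also reads the sign of PB: a negative value signals underestimation and a value near 0 signals an unbiased model. Both steps can be made explicit with a small helper. This is a minimal sketch. The tolerance `tol` below which a model is called unbiased is a hypothetical parameter, not a value from this study. The `scores` argument is a hypothetical mapping from model name to an (RMSE, PB) pair.

```python
def bias_direction(pb, tol=1e-3):
    """Interpret PB: positive -> overestimation, negative -> underestimation.

    `tol` is a hypothetical threshold below which |PB| is treated as unbiased.
    """
    if pb > tol:
        return "overestimation"
    if pb < -tol:
        return "underestimation"
    return "unbiased"

def rank_with_bias(scores, tol=1e-3):
    """Sort models by RMSE (ascending) and attach the PB interpretation.

    `scores` maps model name -> (rmse, pb); returns a list of
    (name, rmse, pb, direction) tuples, best RMSE first.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1][0])
    return [(name, rmse, pb, bias_direction(pb, tol)) for name, (rmse, pb) in ranked]
```

A model at the top of this ranking that is labelled "underestimation" mirrors the situation discussed for the boosted linear model in setting 2. In that situation, the next model with a near-zero PB may be preferred.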
As Figure 9a shows, the predicted radon gas concentration during the seismic activities overlaps the [...] and resulted in larger values of the error metrics presented in Table 4. Table 5 presents the RMSE and MAPE statistics for the ensemble and individual learning methods in setting 4 with time windows of 1 to 4. The boosted tree model achieves the minimum RMSE and MAPE scores of 1573.174 and 0.056, respectively. Figure 10(a-f) shows that its predictions overlap the original radon concentration, shown in red. Figure 10f shows that the support vector machine (SVM) with radial kernel behaves as it did in the other settings. Its performance is similar to that of the boosted tree model, overlapping most of the variations in the original radon concentration time series. Based on the statistics computed for settings 1 to 4 across all time windows, we conclude that the boosted tree ensemble method performs better than the individual models when predicting soil radon gas time series data during seismic activities. SVM with a radial kernel is the second choice after the boosted tree method for this task. Its performance is slightly better than that of boosted tree in setting 1, and close to that of boosted tree in settings 3 and 4.
VOLUME 10, 2022
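One way to condense per-setting, per-window results into an overall comparison is sketched below. It averages each model's RMSE over all (setting, window) cells and counts how often each model is best. This is an illustrative sketch only: the study does not state that it aggregated its results this way. `results` is a hypothetical mapping from (setting, window, model) to RMSE.

```python
from collections import defaultdict
from statistics import mean

def average_rmse_ranking(results):
    """Rank models by mean RMSE across all (setting, window) cells.

    `results` maps (setting, window, model) -> RMSE.
    Returns a list of (model, mean_rmse) sorted from best to worst.
    """
    per_model = defaultdict(list)
    for (_setting, _window, model), rmse in results.items():
        per_model[model].append(rmse)
    ranking = [(model, mean(values)) for model, values in per_model.items()]
    ranking.sort(key=lambda item: item[1])
    return ranking

def wins_per_model(results):
    """Count the (setting, window) cells in which each model has the lowest RMSE."""
    cells = defaultdict(dict)
    for (setting, window, model), rmse in results.items():
        cells[(setting, window)][model] = rmse
    wins = defaultdict(int)
    for scores in cells.values():
        best = min(scores, key=scores.get)
        wins[best] += 1
    return dict(wins)
```

The mean rewards consistently low errors. The win count shows whether a model is best in most cells, and it is less sensitive to a single setting with unusually large errors.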

A. COMPARISON WITH EXISTING LITERATURE
In this section, the methodology and experimental results obtained on the soil radon gas concentration data are compared with the most recent studies. Mir et al. [72] proposed a methodology that categorizes soil radon gas concentration data into seismically active and non-active periods using stacking and an automatic anomaly indication function. The radon concentration, together with the labeled anomaly data, was used to train a meta-learner that classifies it as seismic or non-seismic. These classifications are then passed to an automatic anomaly indication function, which labels the time series by calculating an indication percentage. Points where the indication percentage is greater than or equal to an indication factor are considered anomalies. Tareen et al. [7] proposed an earthquake prediction model based on boxplot interpretation of soil radon gas concentration data. They observed specific patterns in the boxplots, which they attributed to the geological and seismic activity preceding the earthquakes. Tareen et al. [11] also employed computational intelligence techniques to detect anomalous behavior in soil radon gas before seismic activities. The authors reported that either seismic activity or noise could be responsible for abnormalities in soil radon time-series data. Rafique et al. [3] proposed a delegation-based methodology for accurately predicting soil radon gas concentration. The methodology is tested by splitting the data into seismically active and non-active time series. The delegated regressor and the other methods were trained on the non-seismic time series, and the trained models were used to predict the seismically active time series. The root mean squared error between the actual and predicted soil radon concentration was then calculated for each model, and the delegated regressor outperformed the other machine learning models.
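The evaluation protocol described for Rafique et al. [3] can be sketched generically: train on non-seismic data, predict the seismically active data, and compare RMSE. This is a sketch of the protocol only, not their implementation, and the delegated regressor itself is not reproduced. `models` is assumed to map names to objects exposing scikit-learn-style `fit(X, y)` and `predict(X)` methods. `seismic` is a hypothetical boolean mask marking seismically active rows. `MeanRegressor` is a hypothetical trivial baseline included only so that the sketch runs on its own.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

class MeanRegressor:
    """Hypothetical baseline: always predicts the training mean."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def train_nonseismic_test_seismic(models, X, y, seismic):
    """Fit each model on non-seismic rows only and score it (RMSE) on seismic rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    seismic = np.asarray(seismic, dtype=bool)
    scores = {}
    for name, model in models.items():
        model.fit(X[~seismic], y[~seismic])
        scores[name] = rmse(y[seismic], model.predict(X[seismic]))
    return scores
```

The settings used in this study differ from this protocol in that some seismic activities are also placed in the training set.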
This study provides more exhaustive experimentation by introducing settings that lead to different compositions of the training and testing sets. Instead of training on non-seismic data only, as in the delegated regressor model, each setting incorporates different seismic activities for training and testing. These settings, together with the time windows, allow us to choose a better model for predicting soil radon gas concentration at radon measuring stations. This constitutes a novel methodology for gauging how effectively machine learning based ensemble and individual learning methods forecast radon concentration.

V. CONCLUSION
In order to predict radon concentration, a precursor of earthquakes, this study employed different ensemble and individual machine learning methods to predict soil radon gas concentration from environmental attributes. The performance of the methods is assessed more thoroughly by using different training and test set distributions through settings 1 to 4. The training set is composed of different seismic activities and normal data, while the testing data are based on seismic activities, each with its associated time window from 1 to 4. In setting 1, boosted tree and support vector machine (SVM) with radial kernel performed alike and captured the temporal variations in the time series more effectively. In setting 2, the boosted linear model had the lowest RMSE and other error metrics but did not capture the temporal variations in the time series. SVM with linear kernel and boosted tree performed relatively better than the other models in that setting. In settings 3 and 4, the boosted tree model outperformed the other ensemble and individual models by predicting soil radon gas concentration more accurately. This study concludes that ensemble methods result in relatively better regression models, and that SVM with radial kernel performs close to the boosted tree model in settings 3 and 4. This study therefore suggests the boosted tree method for automatically predicting soil radon gas concentration from environmental parameters in the soil radon time series. The prime focus of this study is to predict soil radon gas concentration during anomalies. The study can be extended to classify the anomalies in the predicted radon concentration, and post-processing methods such as the automatic anomaly indication function may also be applied.

He was also the Chairman at the Department of Physics, and the Director of quality enhancement and ORIC.
He is currently working as a Professor of physics with the Department of Physics, The University of Azad Jammu and Kashmir, Muzaffarabad. He also holds the additional charge of Director of Advanced Studies at The University of Azad Jammu and Kashmir. He has published more than 100 research articles in international and national journals of repute. He has also supervised or co-supervised more than 35 M.Phil./M.S. students and five Ph.D. students. His research interests include reactor physics, radiation physics, computational physics and mathematics, geophysics, and medical physics.

JABER S. ALZAHRANI received the Ph.D. degree from Lamar University, in 2015. He is currently an Associate Professor with the Department of Industrial Engineering, College of Engineering at Al-Qunfudhah, Umm Al-Qura University, Saudi Arabia. He has authored many peer-reviewed articles. His research interests include optimization, supply chain, scheduling, and AI.
HANY MAHGOUB received the Ph.D. degree in computer science from the Faculty of Computers and Information, Menoufia University, Egypt. In 2010, he was an Assistant Professor of computer science at the Department of Computer Science, Faculty of Computers and Information, Menoufia University. Since February 2017, he has been an Assistant Professor of computer science at the Department of Computer Science, Faculty of Science and Arts, King Khalid University, Saudi Arabia. He is the author of more than 20 articles, and many funded research projects. His research interests include AI, intelligent systems and bioinformatics, the IoT, smart cities, human computation, software testing, machine learning, data mining, text mining, web mining, information retrieval, information extraction, big data, semantic web, and distributed systems.
MANAR AHMED HAMZA received the Ph.D. degree from Omdurman Islamic University, Omdurman, Sudan, in March 2021. She is currently a Lecturer with the Department of Computer and Self Development, Prince Sattam Bin Abdulaziz University, and the Faculty of Computer Science and Information Technology, Omdurman Islamic University. Her research interests include data mining, text mining, and machine learning.